Método basado en aprendizaje automático para la calificación de ensayos cortos en inglés de una muestra de estudiantes de bachillerato

dc.contributor.advisorNiño Vásquez, Luis Fernandospa
dc.contributor.authorBofill Barrera, Joan Gabrielspa
dc.contributor.refereeLeón Guzmán, Elizabethspa
dc.contributor.researchgroupLaboratorio de Investigación en Sistemas Inteligentes LISIspa
dc.date.accessioned2024-05-28T22:15:47Z
dc.date.available2024-05-28T22:15:47Z
dc.date.issued2024
dc.descriptionilustraciones, diagramasspa
dc.description.abstractEste trabajo aborda el desafío de la calificación automática de ensayos argumentativos en inglés escritos por estudiantes de bachillerato que están aprendiendo el inglés como segunda lengua. El objetivo general es implementar un método automático basado en aprendizaje supervisado que permita resolver esta tarea para 6 indicadores en simultáneo: Cohesión, Sintaxis, Vocabulario, Gramática, Fraseología y Convenciones, en escala de 1 a 5. Para lograrlo, se realiza un análisis descriptivo de los datos, se aplican procedimientos de preprocesamiento y se extraen características relevantes; se exploran diferentes estrategias, técnicas de representación y modelos, desde algunos clásicos hasta aquellos con mejor desempeño en la actualidad, evaluando en cada iteración su rendimiento y contrastándolo con las calificaciones humanas. Luego, se presenta el modelo con menor error, basado principalmente en DeBERTa, al cual se le aplican distintas técnicas para mejorar su desempeño y se combina con un modelo SVR que toma como características los embeddings de los textos concatenados de 10 modelos preentrenados sin fine-tuning. Con esta estrategia, el resultado se acerca bastante a las calificaciones humanas, presentando un RMSE de 0.45 sobre todos los indicadores. (Texto tomado de la fuente).spa
dc.description.abstractThis work addresses the challenge of automatically grading argumentative essays in English written by high school students who are learning English as a second language. The general objective is to implement an automatic method based on supervised learning that solves this task for 6 indicators simultaneously: Cohesion, Syntax, Vocabulary, Grammar, Phraseology and Conventions, rated on a scale from 1 to 5. To achieve this, a descriptive analysis of the data is conducted, preprocessing procedures are applied and relevant features are extracted; different strategies, representation techniques and models are explored, from some classic ones to the currently best-performing models, evaluating their performance in each iteration by contrasting it with human ratings using a chosen measure. Then, the best-performing method is presented: it is based mainly on DeBERTa V3 Large, to which different techniques are applied to improve its performance, and it is combined with an SVR regressor that takes as features the concatenated embeddings of the texts from 10 different pretrained models. With this strategy, the result is quite close to human ratings, presenting a root mean square error of 0.45 over all indicators.eng
dc.description.degreelevelMaestríaspa
dc.description.degreenameMagíster en Ingeniería - Ingeniería de Sistemas y Computaciónspa
dc.description.researchareaSistemas inteligentesspa
dc.format.extentvii, 61 páginasspa
dc.format.mimetypeapplication/pdfspa
dc.identifier.instnameUniversidad Nacional de Colombiaspa
dc.identifier.reponameRepositorio Institucional Universidad Nacional de Colombiaspa
dc.identifier.repourlhttps://repositorio.unal.edu.co/spa
dc.identifier.urihttps://repositorio.unal.edu.co/handle/unal/86173
dc.language.isospaspa
dc.publisherUniversidad Nacional de Colombiaspa
dc.publisher.branchUniversidad Nacional de Colombia - Sede Bogotáspa
dc.publisher.facultyFacultad de Ingenieríaspa
dc.publisher.placeBogotá, Colombiaspa
dc.publisher.programBogotá - Ingeniería - Maestría en Ingeniería - Ingeniería de Sistemas y Computaciónspa
dc.relation.referencesP. Kline, The New Psychometrics: Science, Psychology and Measurement. Routledge, 1 ed., 1999.spa
dc.relation.referencesT. N. Fitria, “Artificial intelligence (AI) technology in OpenAI ChatGPT application: A review of ChatGPT in writing English essay.,” ELT Forum: Journal of English Language Teaching, vol. 12, no. 1, pp. 44–58, 2023.spa
dc.relation.referencesE. B. Page, “Grading Essays by Computer: Progress Report. Proceedings of the 1966 Invitational Conference on Testing Problems.,” Princeton, N.J. Educational Testing Service, pp. 87–100, 1967.spa
dc.relation.referencesE. Page, “The use of the computer in analyzing student essays,” Int Rev Educ, pp. 210–225, 1968.spa
dc.relation.referencesK. L. Gwet, “Handbook of inter-rater reliability: The definitive guide to measuring the extent of agreement among raters,” Advanced Analytics, LLC, 2014.spa
dc.relation.referencesM. D. Shermis, “Contrasting State-of-the-Art in the Machine Scoring of Short-Form Constructed Responses,” Educational Assessment, vol. 20, no. 1, pp. 46–65, 2015.spa
dc.relation.referencesA. Franklin, N. Rambis, M. Benner, P. Baffour, R. Holbrook, and S. Crossley, “Feedback Prize - English Language Learning,” Kaggle, 2022.spa
dc.relation.referencesS. A. Crossley, K. Kyle, and D. S. Mcnamara, “To Aggregate or Not? Linguistic Features in Automatic Essay Scoring and Feedback Systems,” Grantee Submission, vol. 8, no. 1, 2015.spa
dc.relation.referencesC. Ramineni and D. M. Williamson, “Automated essay scoring: Psychometric guidelines and practices,” Assessing Writing, pp. 25–39, 2013.spa
dc.relation.referencesS. P. Balfour, “Assessing Writing in MOOCs: Automated Essay Scoring and Calibrated PeerReview™,” Research & Practice in Assessment, pp. 40–48, 2013.spa
dc.relation.referencesS. Cushing Weigle, “Validation of automated scores of TOEFL iBT tasks against non-test indicators of writing ability,” Language Testing, vol. 27, no. 3, pp. 335–353, 2010.spa
dc.relation.referencesK. Taghipour, Robust trait-specific essay scoring using neural networks and density estimators. PhD thesis, National University of Singapore, Singapore, 2017.spa
dc.relation.referencesH. Shi and V. Aryadoust, “Correction to: A systematic review of automated writing evaluation systems,” Education and Information Technologies, vol. 28, pp. 6189–6190, 5 2023.spa
dc.relation.referencesP. C. Jackson, Toward human-level artificial intelligence: Representation and computation of meaning in natural language. Dover Publications, 11 2019.spa
dc.relation.referencesE. Mayfield and C. P. Rosé, “LightSIDE,” in Handbook of Automated Essay Evaluation, Routledge, 1 ed., 2013.spa
dc.relation.referencesM. Shermis and J. Burstein, Handbook of Automated Essay Evaluation: Current Applications and New Directions. Routledge, 1 ed., 2013.spa
dc.relation.referencesS. Burrows, I. Gurevych, and B. Stein, “The Eras and Trends of Automatic Short Answer Grading,” Int J Artif Intell Educ, vol. 25, pp. 60–117, 2015.spa
dc.relation.referencesK. Zupanc and Z. Bosnic, “Automated essay evaluation with semantic analysis,” Elsevier, vol. 120, pp. 118–132, 2017.spa
dc.relation.referencesD. Yan, A. A. Rupp, and P. W. Foltz, Handbook of Automated Scoring; Theory into Practice. Chapman and Hall/CRC., 1 ed., 2020.spa
dc.relation.referencesB. Kitchenham, O. Pearl Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, “Systematic literature reviews in software engineering – A systematic literature review,” Information and Software Technology, vol. 51, no. 1, pp. 7–15, 2009.spa
dc.relation.referencesY.-Y. Chen, C.-L. Liu, C.-H. Lee, and T.-H. Chang, “An unsupervised automated essay scoring system,” IEEE Intelligent Systems, vol. 25, no. 5, pp. 61–67, 2010.spa
dc.relation.referencesY. Wang, Z. Wei, Y. Zhou, and X. Huang, “Automatic essay scoring incorporating rating schema via reinforcement learning,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 791–797, 2018.spa
dc.relation.referencesC. Lu and M. Cutumisu, “Integrating Deep Learning into An Automated Feedback Generation System for Automated Essay Scoring,” International Educational Data Mining Society, 2021.spa
dc.relation.referencesK. S. McCarthy, R. D. Roscoe, L. K. Allen, A. D. Likens, and D. S. McNamara, “Automated writing evaluation: Does spelling and grammar feedback support high-quality writing and revision?,” Assessing Writing, vol. 52, 4 2022.spa
dc.relation.referencesA. Sharma and D. B. Jayagopi, “Automated grading of handwritten essays,” in Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR, vol. 2018-August, pp. 279–284, 2018.spa
dc.relation.referencesA. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” Neural Information Processing Systems, 2017.spa
dc.relation.referencesT. Pedersen, S. Patwardhan, and J. Michelizzi, “WordNet::Similarity-Measuring the Relatedness of Concepts,” AAAI, vol. 4, pp. 25–29, 7 2004.spa
dc.relation.referencesF. Dong and Y. Zhang, “Automatic Features for Essay Scoring – An Empirical Study. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing,” Association for Computational Linguistics., pp. 1072–1077, 11 2016.spa
dc.relation.referencesY. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” arXiv preprint arXiv:1907.11692, 2019.spa
dc.relation.referencesE. Mayfield and A. W. Black, “Should You Fine-Tune BERT for Automated Essay Scoring?,” Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics, pp. 151–162, 7 2020.spa
dc.relation.referencesD. Ramesh and S. K. Sanampudi, “An automated essay scoring systems: a systematic literature review,” Artificial Intelligence Review, vol. 55, pp. 2495–2527, 3 2022.spa
dc.relation.referencesH. Yannakoudakis and R. Cummins, “Evaluating the performance of automated text scoring systems,” Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 213–223, 2015.spa
dc.relation.referencesR. Bhatt, M. Patel, G. Srivastava, and V. Mago, “A Graph Based Approach to Automate Essay Evaluation,” in Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics, vol. 2020-October, pp. 4379–4385, 2020.spa
dc.relation.referencesZ. Ke and V. Ng, “Automated essay scoring: A survey of the state of the art,” in IJCAI International Joint Conference on Artificial Intelligence, vol. 2019-August, pp. 6300–6308, 2019.spa
dc.relation.referencesJ. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” arXiv preprint arXiv:1810.04805, 2018.spa
dc.relation.referencesD. Castro-Castro, R. Lannes-Losada, M. Maritxalar, I. Niebla, C. Pérez-Marqués, N. Álamo-Suárez, and A. Pons-Porrata, “A multilingual application for automated essay scoring,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5290 LNAI, pp. 243–251, 2008.spa
dc.relation.referencesP. U. Rodriguez, A. Jafari, and C. M. Ormerod, “Language models and Automated Essay Scoring,” ArXiv, 9 2019.spa
dc.relation.referencesS. Ghannay, B. Favre, Y. Estève, and N. Camelin, “Word Embeddings Evaluation and Combination,” Proceedings of the Tenth International Conference on Language Resources and Evaluation, pp. 300–305, 5 2016.spa
dc.relation.referencesM. Mars, “From Word Embeddings to Pre-Trained Language Models: A State-of-the-Art Walkthrough,” Applied Sciences (Switzerland), vol. 12, 9 2022.spa
dc.relation.referencesY. Zhang, R. Jin, and Z.-H. Zhou, “Understanding bag-of-words model: a statistical framework,” International journal of machine learning and cybernetics, vol. 1, pp. 43–52, 12 2010.spa
dc.relation.referencesK. W. Church, “Word2Vec,” Natural Language Engineering, vol. 23, pp. 155–162, 1 2017.spa
dc.relation.referencesJ. Pennington, R. Socher, and C. D. Manning, “GloVe: Global Vectors for Word Representation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1535–1543, 2014.spa
dc.relation.referencesV. Kumar and B. Subba, “A TfidfVectorizer and SVM based sentiment analysis framework for text data corpus,” in 2020 National Conference on Communications (NCC), pp. 1–6, IEEE, 2 2020.spa
dc.relation.referencesMicrosoft, “GitHub - Microsoft/LightGBM:Light Gradient Boosting Machine.”spa
dc.relation.referencesG. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “LightGBM: A Highly Efficient Gradient Boosting Decision Tree,” 31st Conference on Neural Information Processing Systems (NIPS 2017), pp. 3149–3157, 2017.spa
dc.relation.referencesT. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 13-17-August-2016, pp. 785–794, Association for Computing Machinery, 8 2016.spa
dc.relation.referencesJ. J. Espinosa-Zúñiga, “Aplicación de algoritmos Random Forest y XGBoost en una base de solicitudes de tarjetas de crédito,” Ingeniería Investigación y Tecnología, vol. 21, no. 3, pp. 1–16, 2020.spa
dc.relation.referencesC. Cortes, V. Vapnik, and L. Saitta, “Support-Vector Networks,” Machine Learning, vol. 20, pp. 273–297, 1995.spa
dc.relation.referencesM. Awad and R. Khanna, Support vector regression. Efficient learning machines: Theories, concepts, and applications for engineers and system designers. Apress, 2015.spa
dc.relation.referencesM.-C. Popescu, V. E. Balas, L. Perescu-Popescu, and N. Mastorakis, “Multilayer Perceptron and Neural Networks,” WSEAS Transactions on Circuits and Systems, vol. 8, no. 7, pp. 579–588, 2009.spa
dc.relation.referencesT. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, “Language Models are Few-Shot Learners,” ArXiv, 5 2020.spa
dc.relation.referencesT. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. Von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. M. Rush, “Transformers: State-of-the-Art Natural Language Processing,” Association for Computational Linguistics, pp. 38–45, 2020.spa
dc.relation.referencesP. He, X. Liu, J. Gao, and W. Chen, “DeBERTa: Decoding-enhanced BERT with disentangled attention,” Conference paper at ICLR, 2021.spa
dc.relation.referencesP. Zhang, “Longformer-based Automated Writing Assessment for English Language Learners Stanford CS224N Custom Project,” tech. rep., Stanford, 2023.spa
dc.relation.referencesK. K. Y. Chan, T. Bond, and Z. Yan, “Application of an Automated Essay Scoring engine to English writing assessment using Many-Facet Rasch Measurement,” Language Testing, vol. 40, pp. 61–85, 1 2023.spa
dc.relation.referencesA. Mizumoto and M. Eguchi, “Exploring the potential of using an AI language model for automated essay scoring,” Research Methods in Applied Linguistics, vol. 2, 8 2023.spa
dc.relation.referencesV. Mohan, M. J. Ilamathi, and M. Nithya, “Preprocessing Techniques for Text Mining-An Overview,” International Journal of Computer Science & Communication Networks, vol. 5, no. 1, pp. 7–16, 2015.spa
dc.relation.referencesM. Siino, I. Tinnirello, and M. La Cascia, “Is text preprocessing still worth the time? A comparative survey on the influence of popular preprocessing methods on Transformers and traditional classifiers,” Information Systems, vol. 121, p. 102342, 3 2024.spa
dc.relation.referencesS. A. Crossley, D. B. Allen, and D. S. McNamara, “Text readability and intuitive simplification: A comparison of readability formulas,” Reading in a foreign language, vol. 23, no. 1, pp. 84–101, 2011.spa
dc.relation.referencesF. Scarselli and A. C. Tsoi, “Universal Approximation Using Feedforward Neural Networks: A Survey of Some Existing Methods, and Some New Results,” Neural Networks, vol. 11, no. 1, pp. 15–37, 1998.spa
dc.relation.referencesP. He, J. Gao, and W. Chen, “DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing,” ArXiv, 11 2021.spa
dc.relation.referencesD. Wu, S.-t. Xia, and Y. Wang, “Adversarial Weight Perturbation Helps Robust Generalization,” ArXiv, 4 2020.spa
dc.relation.referencesH. Inoue, “Multi-Sample Dropout for Accelerated Training and Better Generalization,” ArXiv, 5 2019.spa
dc.relation.referencesA. Stanciu, I. Cristescu, E. M. Ciuperca, and C. E. Cîrnu, “Using an ensemble of transformer-based models for automated writing evaluation of essays,” in 14th International Conference on Education and New Learning Technologies, (Palma, Spain), pp. 5276–5282, IATED, 7 2022.spa
dc.relation.referencesH. Zhang, Y. Gong, Y. Shen, W. Li, J. Lv, N. Duan, and W. Chen, “Poolingformer: Long Document Modeling with Pooling Attention,” ArXiv, 2021.spa
dc.relation.referencesA. Aziz, M. Akram Hossain, and A. Nowshed Chy, “CSECU-DSG at SemEval-2023 Task 4: Fine-tuning DeBERTa Transformer Model with Cross-fold Training and Multisample Dropout for Human Values Identification,” tech. rep., Department of Computer Science and Engineering University of Chittagong, Chattogram, Bangladesh, 2023.spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.licenseAtribución-NoComercial 4.0 Internacionalspa
dc.rights.urihttp://creativecommons.org/licenses/by-nc/4.0/spa
dc.subject.ddc000 - Ciencias de la computación, información y obras generales::004 - Procesamiento de datos Ciencia de los computadoresspa
dc.subject.ddc370 - Educación::373 - Educación secundariaspa
dc.subject.proposalPLNspa
dc.subject.proposalAprendizaje supervisadospa
dc.subject.proposalTransformersspa
dc.subject.proposalEnsamble de modelosspa
dc.subject.proposalSVRspa
dc.subject.proposalNLPeng
dc.subject.proposalAutomatic essay gradingeng
dc.subject.proposalSupervised learningeng
dc.subject.proposalSVReng
dc.subject.proposalKaggle contesteng
dc.subject.proposalCalificación automática de ensayosspa
dc.subject.proposalEnsemble of modelseng
dc.subject.unescoMétodo de evaluaciónspa
dc.subject.unescoEvaluation methodseng
dc.subject.unescoProcesamiento de datosspa
dc.subject.unescoData processingeng
dc.subject.wikidataaprendizaje supervisadospa
dc.subject.wikidatasupervised learningeng
dc.titleMétodo basado en aprendizaje automático para la calificación de ensayos cortos en inglés de una muestra de estudiantes de bachilleratospa
dc.title.translatedMachine learning-based method for scoring short English essays from a high school student sampleeng
dc.typeTrabajo de grado - Maestríaspa
dc.type.coarhttp://purl.org/coar/resource_type/c_bdccspa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.redcolhttp://purl.org/redcol/resource_type/TMspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
dcterms.audience.professionaldevelopmentEstudiantesspa
dcterms.audience.professionaldevelopmentInvestigadoresspa
dcterms.audience.professionaldevelopmentMaestrosspa
dcterms.audience.professionaldevelopmentPúblico generalspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa

Files

Original bundle
Name: 1032469305.2024.pdf
Size: 1.99 MB
Format: Adobe Portable Document Format
Description: Master's thesis in Engineering - Systems and Computing Engineering

License bundle
Name: license.txt
Size: 5.74 KB
Format: Item-specific license agreed upon to submission