Asignación de puntajes en exámenes estandarizados mediante el uso de redes neuronales y técnicas de equiparación psicométricas compatibles: Caso examen Saber 11 en Colombia

dc.contributor.advisor: Niño Vásquez, Luis Fernando [spa]
dc.contributor.author: Duplat Durán, Ricardo René [spa]
dc.contributor.researchgroup: laboratorio de Investigación en Sistemas Inteligentes Lisi [spa]
dc.coverage.country: Colombia [spa]
dc.coverage.tgn: http://vocab.getty.edu/page/tgn/1000050
dc.date.accessioned: 2024-04-02T00:15:56Z
dc.date.available: 2024-04-02T00:15:56Z
dc.date.issued: 2024-01-28
dc.description: ilustraciones, diagramas [spa]
dc.description.abstract: Los exámenes estandarizados son valiosas herramientas para evaluar de manera objetiva tanto las características cognitivas como no cognitivas de una población específica. Para construir escalas de medición que reflejen con precisión los constructos que estos exámenes buscan evaluar, se recurre comúnmente a la Teoría de Respuesta al Ítem (TRI), una técnica estadística. Sin embargo, la TRI presenta limitaciones cuando sus supuestos no se cumplen, comprometiendo la comparabilidad a lo largo del tiempo y entre subpoblaciones. Este trabajo de grado se propone desarrollar una metodología innovadora que utiliza Redes Neuronales Artificiales (RNA), específicamente a través de AutoEncoders (AE), para preservar las ventajas de la TRI y aplicarla incluso cuando sus supuestos no se cumplen, buscando incluso mejorar la calidad de ajuste y pronóstico. La investigación se basa en el análisis del examen Saber 11 aplicado en los años 2018 y 2019, durante los calendarios A y B en el país. Se obtuvieron resultados que en algunos casos superan el rendimiento de un modelo clásico de la TRI, como el modelo logístico de 2 parámetros (2PL). Esta metodología propuesta no solo busca subsanar las limitaciones de la TRI en ciertos contextos, sino que también busca optimizar la precisión en la asignación de puntajes en exámenes estandarizados mediante técnicas de equiparación compatibles con la psicometría. La aplicación de RNA, en particular a través de AE, emerge como una prometedora alternativa que contribuye al avance de la evaluación estandarizada, ofreciendo mayor flexibilidad y robustez en la medición de constructos educativos. (Texto tomado de la fuente). [spa]
dc.description.abstract: Standardized exams are valuable tools for objectively assessing both the cognitive and non-cognitive characteristics of a specific population. To construct measurement scales that accurately reflect the constructs these exams aim to evaluate, Item Response Theory (IRT), a statistical technique, is commonly used. However, IRT has limitations when its assumptions are not met, which compromises comparability over time and among subpopulations. This thesis aims to develop an innovative methodology that uses Artificial Neural Networks (ANNs), specifically AutoEncoders (AEs), to preserve the advantages of IRT and apply it even when its assumptions are not met, while also seeking to improve the quality of fit and prediction. The research is based on the analysis of the Saber 11 exam administered in Colombia in 2018 and 2019, during calendars A and B. In some cases, the results outperform a classical IRT model such as the 2-parameter logistic model (2PL). The proposed methodology aims not only to address the limitations of IRT in certain contexts but also to improve the accuracy of score assignment in standardized exams through equating techniques compatible with psychometrics. The application of ANNs, particularly AEs, emerges as a promising alternative that contributes to the advancement of standardized assessment, offering greater flexibility and robustness in measuring educational constructs. [eng]
dc.description.degreelevel: Maestría [spa]
dc.description.degreename: Magíster en Ingeniería - Ingeniería de Sistemas y Computación [spa]
dc.description.researcharea: Sistemas inteligentes [spa]
dc.format.extent: x, 81 páginas [spa]
dc.format.mimetype: application/pdf [spa]
dc.identifier.instname: Universidad Nacional de Colombia [spa]
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia [spa]
dc.identifier.repourl: https://repositorio.unal.edu.co/ [spa]
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/85835
dc.language.iso: spa [spa]
dc.publisher: Universidad Nacional de Colombia [spa]
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá [spa]
dc.publisher.faculty: Facultad de Ingeniería [spa]
dc.publisher.place: Bogotá, Colombia [spa]
dc.publisher.program: Bogotá - Ingeniería - Maestría en Ingeniería - Ingeniería de Sistemas y Computación [spa]
dc.rights.accessrights: info:eu-repo/semantics/openAccess [spa]
dc.rights.license: Atribución-NoComercial 4.0 Internacional [spa]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [spa]
dc.subject.ddc: 000 - Ciencias de la computación, información y obras generales::005 - Programación, programas, datos de computación [spa]
dc.subject.ddc: 370 - Educación::373 - Educación secundaria [spa]
dc.subject.proposal: Calificación de exámenes estandarizados [spa]
dc.subject.proposal: Teoría de Respuesta al Ítem [spa]
dc.subject.proposal: Redes Neuronales Artificiales [spa]
dc.subject.proposal: AutoEncoders [eng]
dc.subject.proposal: Psicometría [spa]
dc.subject.proposal: Equiparación de puntajes [spa]
dc.subject.proposal: Modelo logístico de 2 parámetros [spa]
dc.subject.proposal: Standardized exam scoring [eng]
dc.subject.proposal: Item Response Theory [eng]
dc.subject.proposal: Artificial Neural Networks [eng]
dc.subject.proposal: Psychometrics [eng]
dc.subject.proposal: 2-parameter logistic model [eng]
dc.subject.proposal: Score equating [eng]
dc.subject.unesco: Evaluación del estudiante [spa]
dc.subject.unesco: Student evaluation [eng]
dc.subject.unesco: Psicometría [spa]
dc.subject.unesco: Psychometrics [eng]
dc.subject.unesco: Informática educativa [spa]
dc.subject.unesco: Computer uses in education [eng]
dc.title: Asignación de puntajes en exámenes estandarizados mediante el uso de redes neuronales y técnicas de equiparación psicométricas compatibles: Caso examen Saber 11 en Colombia [spa]
dc.title.translated: Assignment of standardized test scores using neural networks and compatible psychometric equating techniques: The case of the Saber 11 exam in Colombia. [eng]
dc.type: Trabajo de grado - Maestría [spa]
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc [spa]
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa [spa]
dc.type.content: Text [spa]
dc.type.driver: info:eu-repo/semantics/masterThesis [spa]
dc.type.redcol: http://purl.org/redcol/resource_type/TM [spa]
dc.type.version: info:eu-repo/semantics/acceptedVersion [spa]
dcterms.audience.professionaldevelopment: Estudiantes [spa]
dcterms.audience.professionaldevelopment: Investigadores [spa]
dcterms.audience.professionaldevelopment: Maestros [spa]
dcterms.audience.professionaldevelopment: Público general [spa]
oaire.accessrights: http://purl.org/coar/access_right/c_abf2 [spa]

Files

Original bundle

Name: 1032412151.2024.pdf
Size: 3.53 MB
Format: Adobe Portable Document Format
Description: Master's thesis in Engineering - Systems and Computing Engineering
