Show simple item record

dc.rights.license: Atribución-NoComercial 4.0 Internacional
dc.contributor.advisor: Niño Vásquez, Luis Fernando
dc.contributor.author: Duplat Durán, Ricardo René
dc.date.accessioned: 2024-04-02T00:15:56Z
dc.date.available: 2024-04-02T00:15:56Z
dc.date.issued: 2024-01-28
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/85835
dc.description: ilustraciones, diagramas
dc.description.abstract: Los exámenes estandarizados son valiosas herramientas para evaluar de manera objetiva tanto las características cognitivas como no cognitivas de una población específica. Para construir escalas de medición que reflejen con precisión los constructos que estos exámenes buscan evaluar, se recurre comúnmente a la Teoría de Respuesta al Ítem (TRI), una técnica estadística. Sin embargo, la TRI presenta limitaciones cuando sus supuestos no se cumplen, comprometiendo la comparabilidad a lo largo del tiempo y entre subpoblaciones. Este trabajo de grado se propone desarrollar una metodología innovadora que utiliza Redes Neuronales Artificiales (RNA), específicamente a través de AutoEncoders (AE), para preservar las ventajas de la TRI y aplicarla incluso cuando sus supuestos no se cumplen, buscando incluso mejorar la calidad de ajuste y pronóstico. La investigación se basa en el análisis del examen Saber 11 aplicado en los años 2018 y 2019, durante los calendarios A y B en el país. Se obtuvieron resultados que en algunos casos superan el rendimiento de un modelo clásico de la TRI, como el modelo logístico de 2 parámetros (2PL). Esta metodología propuesta no solo busca subsanar las limitaciones de la TRI en ciertos contextos, sino que también busca optimizar la precisión en la asignación de puntajes en exámenes estandarizados mediante técnicas de equiparación compatibles con la psicometría. La aplicación de RNA, en particular a través de AE, emerge como una prometedora alternativa que contribuye al avance de la evaluación estandarizada, ofreciendo mayor flexibilidad y robustez en la medición de constructos educativos. (Texto tomado de la fuente).
dc.description.abstract: Standardized exams are valuable tools for objectively assessing both cognitive and non-cognitive characteristics of a specific population. To construct measurement scales that accurately reflect the constructs these exams aim to evaluate, Item Response Theory (IRT), a statistical technique, is commonly employed. However, IRT has limitations when its assumptions are not met, compromising comparability over time and across subpopulations. This thesis develops an innovative methodology using Artificial Neural Networks (ANNs), specifically AutoEncoders (AEs), to preserve the advantages of IRT and apply it even when its assumptions are not met, while also seeking to improve the quality of fit and forecasting. The research is based on an analysis of the Saber 11 exam administered in Colombia in 2018 and 2019, under academic calendars A and B. The results obtained in some cases outperform those of a classical IRT model, the 2-parameter logistic (2PL) model. The proposed methodology not only addresses the limitations of IRT in certain contexts but also seeks to improve the accuracy of score assignment in standardized exams through psychometrically compatible equating techniques. The application of ANNs, particularly AEs, emerges as a promising alternative that contributes to the advancement of standardized assessment, offering greater flexibility and robustness in measuring educational constructs. (See the illustrative note following this record.)
dc.format.extent: x, 81 páginas
dc.format.mimetype: application/pdf
dc.language.iso: spa
dc.publisher: Universidad Nacional de Colombia
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.ddc: 000 - Ciencias de la computación, información y obras generales::005 - Programación, programas, datos de computación
dc.subject.ddc: 370 - Educación::373 - Educación secundaria
dc.title: Asignación de puntajes en exámenes estandarizados mediante el uso de redes neuronales y técnicas de equiparación psicométricas compatibles: Caso examen Saber 11 en Colombia
dc.type: Trabajo de grado - Maestría
dc.type.driver: info:eu-repo/semantics/masterThesis
dc.type.version: info:eu-repo/semantics/acceptedVersion
dc.publisher.program: Bogotá - Ingeniería - Maestría en Ingeniería - Ingeniería de Sistemas y Computación
dc.contributor.researchgroup: Laboratorio de Investigación en Sistemas Inteligentes (LISI)
dc.coverage.country: Colombia
dc.coverage.tgn: http://vocab.getty.edu/page/tgn/1000050
dc.description.degreelevel: Maestría
dc.description.degreename: Magíster en Ingeniería - Ingeniería de Sistemas y Computación
dc.description.researcharea: Sistemas inteligentes
dc.identifier.instname: Universidad Nacional de Colombia
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia
dc.identifier.repourl: https://repositorio.unal.edu.co/
dc.publisher.faculty: Facultad de Ingeniería
dc.publisher.place: Bogotá, Colombia
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.subject.proposal: Calificación de exámenes estandarizados
dc.subject.proposal: Teoría de Respuesta al Ítem
dc.subject.proposal: Redes Neuronales Artificiales
dc.subject.proposal: AutoEncoders
dc.subject.proposal: Psicometría
dc.subject.proposal: Equiparación de puntajes
dc.subject.proposal: Modelo logístico de 2 parámetros
dc.subject.proposal: Standardized exam scoring
dc.subject.proposal: Item Response Theory
dc.subject.proposal: Artificial Neural Networks
dc.subject.proposal: Psychometrics
dc.subject.proposal: 2-parameter logistic model
dc.subject.proposal: Score equating
dc.subject.unesco: Evaluación del estudiante
dc.subject.unesco: Student evaluation
dc.subject.unesco: Psicometría
dc.subject.unesco: Psychometrics
dc.subject.unesco: Informática educativa
dc.subject.unesco: Computer uses in education
dc.title.translated: Assignment of standardized test scores using neural networks and compatible psychometric equating techniques: The case of the Saber 11 exam in Colombia.
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content: Text
dc.type.redcol: http://purl.org/redcol/resource_type/TM
oaire.accessrights: http://purl.org/coar/access_right/c_abf2
dcterms.audience.professionaldevelopment: Estudiantes
dcterms.audience.professionaldevelopment: Investigadores
dcterms.audience.professionaldevelopment: Maestros
dcterms.audience.professionaldevelopment: Público general
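
Illustrative note (standard psychometric background, not part of the repository record): the 2-parameter logistic (2PL) model that the abstract uses as its baseline gives the probability that examinee j answers item i correctly, as a function of the latent ability \theta_j, the item discrimination a_i, and the item difficulty b_i:

P(X_{ij} = 1 \mid \theta_j) = \frac{1}{1 + e^{-a_i(\theta_j - b_i)}}

The thesis proposes estimating scores with autoencoders instead. The following is a minimal sketch of that idea only; the framework, layer sizes, and single latent dimension are hypothetical choices for exposition, not the architecture reported in the thesis.

# Minimal illustrative sketch (hypothetical architecture): an autoencoder
# over binary item-response vectors. The encoder maps a response pattern to
# a low-dimensional latent trait; the decoder maps the latent trait back to
# per-item probabilities of a correct answer.
import numpy as np
from tensorflow.keras import layers, Model

n_items = 50  # illustrative item count, not the Saber 11 test length

inputs = layers.Input(shape=(n_items,))           # one 0/1 entry per item
hidden = layers.Dense(16, activation="relu")(inputs)
theta = layers.Dense(1, name="ability")(hidden)   # scalar latent trait, analogous to theta in 2PL
# Each output unit computes sigmoid(w_i * theta + c_i), the same functional
# form as a 2PL item characteristic curve with a_i = w_i and b_i = -c_i / w_i.
outputs = layers.Dense(n_items, activation="sigmoid")(theta)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# Toy usage on simulated responses (examinees x items); real data would be
# the scored 0/1 response matrix of an exam administration.
X = np.random.binomial(1, 0.6, size=(1000, n_items)).astype("float32")
autoencoder.fit(X, X, epochs=5, batch_size=64, verbose=0)

# Ability estimates are read from the bottleneck layer.
encoder = Model(inputs, theta)
theta_hat = encoder.predict(X, verbose=0)

Because each decoder unit has the same functional form as a 2PL item characteristic curve, a model of this kind can retain an IRT-like interpretation of its outputs, which is one way to preserve the advantages of IRT that the abstract mentions.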

