Uso de información auxiliar en la estimación inicial de la habilidad de una prueba adaptativa computarizada
| dc.contributor.advisor | Torres Jiménez, Camilo José | |
| dc.contributor.author | Rodriguez Rivera, Nelson Andrés | |
| dc.date.accessioned | 2025-09-16T16:03:16Z | |
| dc.date.available | 2025-09-16T16:03:16Z | |
| dc.date.issued | 2025-09 | |
| dc.description | Ilustraciones, gráficos | spa |
| dc.description.abstract | Este trabajo se desarrolla en un escenario hipotético de implementación de pruebas adaptativas computarizadas (CAT, por sus siglas en inglés) en el contexto colombiano. Aunque el Instituto Colombiano para la Evaluación de la Educación (Icfes) no aplica actualmente este tipo de pruebas, ha realizado algunos pilotajes, lo cual motiva el análisis de sus posibles efectos y condiciones de aplicación. El objetivo del estudio es evaluar el uso de información auxiliar en la estimación inicial de la habilidad del evaluado, con el fin de seleccionar de manera más adecuada el primer ítem del examen. Para ello, se emplean modelos predictivos —específicamente regresión lineal, bosques aleatorios y redes neuronales artificiales— que permiten obtener una estimación inicial a partir de información contextual recopilada antes de la aplicación de la prueba. Posteriormente, se realizan simulaciones que comparan el desempeño del algoritmo adaptativo cuando se utiliza una estimación inicial basada en información auxiliar en comparación con escenarios en los que esta información no está disponible o no se emplea. Dichas simulaciones consideran factores clave como el tamaño del banco de ítems, el método de selección del primer ítem y los criterios de parada. Los resultados indican que contar con una estimación inicial cercana al valor real de la habilidad mejora la precisión final y reduce el número de ítems administrados. A partir de estos hallazgos, se presentan recomendaciones prácticas sobre las condiciones en las que el uso de información auxiliar podría incrementar de forma significativa la eficiencia y la precisión de futuras aplicaciones de pruebas adaptativas en el país. (Tomado de la fuente) | spa |
| dc.description.abstract | This study is developed within a hypothetical scenario of implementing computerized adaptive testing (CAT) in the Colombian context. Although the Colombian Institute for the Evaluation of Education (Icfes) does not currently administer this type of test, it has conducted several pilot studies, motivating the analysis of their potential effects and conditions for application. The objective of the study is to evaluate the use of auxiliary information in the initial estimation of examinee ability, with the aim of more appropriately selecting the first test item. To this end, predictive models—specifically linear regression, random forests, and artificial neural networks—are employed to obtain an initial estimate based on contextual information available prior to test administration. Subsequently, simulations are carried out to compare the performance of the adaptive algorithm when using an initial estimate based on auxiliary information against scenarios in which such information is unavailable or not used. These simulations consider key factors such as item bank size, first-item selection method, and stopping criteria. The results indicate that having an initial estimate close to the true ability value improves final precision and reduces the number of administered items. Based on these findings, practical recommendations are proposed regarding the conditions under which the use of auxiliary information could significantly enhance the efficiency and accuracy of future adaptive test implementations in the country. | eng |
| dc.description.curriculararea | Estadística.Sede Bogotá | |
| dc.description.degreelevel | Maestría | |
| dc.description.degreename | Magíster en Ciencias - Estadística | |
| dc.description.technicalinfo | Se usó el software de procesamiento y análisis estadístico R en su versión 4.3.1 (2023-06-16 ucrt) | spa |
| dc.format.extent | 99 páginas | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.instname | Universidad Nacional de Colombia | spa |
| dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
| dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
| dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/88804 | |
| dc.language.iso | spa | |
| dc.publisher | Universidad Nacional de Colombia | |
| dc.publisher.branch | Universidad Nacional de Colombia - Sede Bogotá | |
| dc.publisher.faculty | Facultad de Ciencias | |
| dc.publisher.place | Bogotá, Colombia | |
| dc.publisher.program | Bogotá - Ciencias - Maestría en Ciencias - Estadística | |
| dc.relation.indexed | LaReferencia | |
| dc.relation.references | AERA, APA, and NCME (2014). Standards for Educational and Psychological Testing. American Educational Research Association, Washington, DC. | |
| dc.relation.references | Allen, M. J. (2003). Assessing academic programs in higher education, volume 42. John Wiley & Sons. | |
| dc.relation.references | Bartram, D. (2017). Computer-based testing and the internet. The Blackwell handbook of personnel selection, pages 397–418. | |
| dc.relation.references | Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. Statistical theories of mental test scores. | |
| dc.relation.references | Bishop, C. M. and Nasrabadi, N. M. (2006). Pattern recognition and machine learning, volume 4. Springer. | |
| dc.relation.references | Bloom, B. S. et al. (1971). Handbook on formative and summative evaluation of student learning. ERIC. | |
| dc.relation.references | Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2):123–140. | |
| dc.relation.references | Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32. | |
| dc.relation.references | Casella, G. and Berger, R. L. (2002). Statistical Inference. Duxbury, Pacific Grove, CA, 2nd edition. | |
| dc.relation.references | Chang, H.-H. (2015). Psychometrics behind computerized adaptive testing. Psychometrika, 80:1–20. | |
| dc.relation.references | Chang, H.-H. and Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20(3):213–229. | |
| dc.relation.references | Chen, S.-Y., Ankenmann, R. D., and Chang, H.-H. (2000). A comparison of item selection rules at the early stages of computerized adaptive testing. Applied Psychological Measurement, 24(3):241–255. | |
| dc.relation.references | Cronbach, L. J. (1963). Course improvement through evaluation. Teachers college record, 64(8):1–13. | |
| dc.relation.references | de Andrade, D. F., Tavares, H. R., and da Cunha Valle, R. (2000). Teoria da Resposta ao Item: conceitos e aplicações. ABE, Sao Paulo. | |
| dc.relation.references | De Ayala, R. J. (2013). The theory and practice of item response theory. Guilford Publications. | |
| dc.relation.references | Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1–22. | |
| dc.relation.references | Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7:1–26. | |
| dc.relation.references | Elman, J. L. (1990). Finding structure in time. Cognitive science, 14(2):179–211. | |
| dc.relation.references | Embretson, S. E. and Reise, S. P. (2013). Item response theory for psychologists. Psychology Press. | |
| dc.relation.references | Faraway, J. J. (2016). Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. Chapman and Hall/CRC. | |
| dc.relation.references | Gholamy, A., Kreinovich, V., and Kosheleva, O. (2018). Why 70/30 or 80/20 relation between training and testing sets: A pedagogical explanation. Technical report, The University of Texas at El Paso. | |
| dc.relation.references | Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org. | |
| dc.relation.references | Gulliksen, H. (1950). Theory of mental tests. Wiley, New York. | |
| dc.relation.references | Hambleton, R. K. and Jones, R. W. (1993). Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice, 12(3):38–47. | |
| dc.relation.references | Hambleton, R. K., Swaminathan, H., and Rogers, H. J. (1991). Fundamentals of item response theory, volume 2. Sage. | |
| dc.relation.references | Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, 2nd edition. | |
| dc.relation.references | Haykin, S. (2009). Neural networks and learning machines, 3/E. Pearson Education India. | |
| dc.relation.references | He, W. and Reckase, M. D. (2014). Item pool design for an operational variable-length computerized adaptive test. Educational and Psychological Measurement, 74(3):473–494. | |
| dc.relation.references | Ho, T. K. (1995). Random decision forests. In Proceedings of 3rd international conference on document analysis and recognition, volume 1, pages 278–282. IEEE. | |
| dc.relation.references | Huff, K. L. and Sireci, S. G. (2001). Validity issues in computer-based testing. Educational measurement: Issues and practice, 20(3):16–25. | |
| dc.relation.references | Icfes (2019a). Boletín Saber al Detalle (edición 4) - ¿Cómo se construye el Índice de Nivel Socioeconómico (INSE) en el contexto de las pruebas Saber? https://www.icfes.gov.co/wp-content/uploads/2024/11/Edicion-4-boletin-saber-al-detalle-.pdf. Consultado el 12 de diciembre de 2024. | |
| dc.relation.references | Icfes (2019b). Boletín Saber al Detalle (edición 6) - ¿En qué consiste la aplicación de Pre Saber 11° en versión adaptativa (CAT)? https://www.icfes.gov.co/wp-content/uploads/2025/02/6-Edicion-boletin-saber-al-detalle.pdf. Consultado el 12 de diciembre de 2024. | |
| dc.relation.references | Icfes (2024). Guía de orientación examen Saber 11°. | |
| dc.relation.references | Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pages 448–456. PMLR. | |
| dc.relation.references | Kane, M. (2006). Validation. In R. L. Brennan (Ed.), Educational Measurement, 4th edition. | |
| dc.relation.references | LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444. | |
| dc.relation.references | LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324. | |
| dc.relation.references | Lehmann, E. L. and Casella, G. (2006). Theory of point estimation. Springer Science & Business Media. | |
| dc.relation.references | Lord, F. (1952). A theory of test scores. Psychometric monographs. | |
| dc.relation.references | Lord, F. M., Novick, M. R., and Birnbaum, A. (1968). Statistical theories of mental test scores. Addison-Wesley. | |
| dc.relation.references | Lord, F. M. (1986). Maximum likelihood and Bayesian parameter estimation in item response theory. Journal of Educational Measurement, pages 157–162. | |
| dc.relation.references | Lord, F. M. (2012). Applications of item response theory to practical testing problems. Routledge. | |
| dc.relation.references | Magis, D., Yan, D., and Von Davier, A. A. (2017). Computerized adaptive and multistage testing with R: Using packages catR and mstR. Springer. | |
| dc.relation.references | Martín, E. S., Del Pino, G., and De Boeck, P. (2006). IRT models for ability-based guessing. Applied Psychological Measurement, 30(3):183–203. | |
| dc.relation.references | Mendenhall, W. (2003). A Second Course in Statistics: Regression Analysis. Prentice Hall. | |
| dc.relation.references | Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51(2):177–195. | |
| dc.relation.references | Montgomery, D. C., Peck, E. A., and Vining, G. G. (2021). Introduction to linear regression analysis. John Wiley & Sons. | |
| dc.relation.references | Park, J. Y., de Jong, T., Koning, B. B., and van der Meijden, H. A. T. (2018). An explanatory item response theory method for alleviating the cold-start problem in adaptive learning environments. Behavior Research Methods, 51(2):895–909. | |
| dc.relation.references | Pliakos, K., Papamitsiou, Z., and Economides, A. A. (2019). Integrating machine learning into item response theory for addressing the cold-start problem in adaptive learning systems. Computers & Education, 137:91–106. | |
| dc.relation.references | Popham, W. J. (2003). What every teacher should know about educational assessment. Pearson Education. | |
| dc.relation.references | R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. | |
| dc.relation.references | Rasch, G. (1960). Studies in mathematical psychology: I. Probabilistic models for some intelligence and attainment tests. Nielsen & Lydiche. | |
| dc.relation.references | Reckase, M. (2003). Item pool design for computerized adaptive tests. In annual meeting of the National Council on Measurement in Education, Chicago, IL. | |
| dc.relation.references | Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural networks, 61:85–117. | |
| dc.relation.references | Shepard, L. A. (2000). The role of assessment in a learning culture. Educational researcher, 29(7):4–14. | |
| dc.relation.references | Shmueli, G. (2010). To explain or to predict? Statistical science, pages 289–310. | |
| dc.relation.references | Spearman, C. (1904). "General intelligence," objectively determined and measured. The American Journal of Psychology, 15(2):201–292. | |
| dc.relation.references | Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958. | |
| dc.relation.references | Suchman, E. (1968). Evaluative Research: Principles and Practice in Public Service and Social Action Progr. Russell Sage Foundation. | |
| dc.relation.references | Thissen, D. and Wainer, H. (1982). Some standard errors in item response theory. Psychometrika, 47(4):397–412. | |
| dc.relation.references | Van Der Linden, W. J. (1999). Empirical initialization of the trait estimator in adaptive testing. Applied Psychological Measurement, 23(1):21–29. | |
| dc.relation.references | Van der Linden, W. J. and Glas, C. A. (2010). Elements of adaptive testing, volume 10. Springer. | |
| dc.relation.references | Van der Linden, W. J., Glas, C. A., et al. (2000). Computerized adaptive testing: Theory and practice, volume 13. Springer. | |
| dc.relation.references | Van der Linden, W. J. and Hambleton, R. K. (2015). Handbook of item response theory. CRC press. | |
| dc.relation.references | von Davier, M. (2009). Is there need for the 3PL model? Guess what? Measurement: Interdisciplinary Research and Perspectives. | |
| dc.relation.references | Wainer, H., Dorans, N. J., Flaugher, R., Green, B. F., and Mislevy, R. J. (2000). Computerized adaptive testing: A primer. Routledge. | |
| dc.relation.references | Wang, T. and Kolen, M. J. (2001). Evaluating comparability in computerized adaptive testing: Issues, criteria and an example. Journal of Educational Measurement, 38(1):19–49. | |
| dc.relation.references | Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied psychological measurement, 6(4):473–492. | |
| dc.relation.references | Yao, L., Pommerich, M., and Segall, D. O. (2014). Using multidimensional cat to administer a short, yet precise, screening test. Applied Psychological Measurement, 38(8):614–631. | |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | |
| dc.rights.license | Atribución-NoComercial-CompartirIgual 4.0 Internacional | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | |
| dc.subject.ddc | 510 - Matemáticas::519 - Probabilidades y matemáticas aplicadas | |
| dc.subject.lemb | Estimación de parámetros | |
| dc.subject.lemb | Estadística matemática | |
| dc.subject.lemb | Análisis de regresión | |
| dc.subject.lemb | Redes neurales (Computadores) | |
| dc.subject.lemb | Mediciones y pruebas educativas | |
| dc.subject.proposal | Teoría de respuesta al ítem | spa |
| dc.subject.proposal | Pruebas adaptativas computarizadas | spa |
| dc.subject.proposal | Psicometría | spa |
| dc.subject.proposal | Modelado estadístico | spa |
| dc.subject.proposal | Item response theory | eng |
| dc.subject.proposal | Computer adaptive testing | eng |
| dc.subject.proposal | Psychometrics | eng |
| dc.subject.proposal | Statistical modeling | eng |
| dc.title | Uso de información auxiliar en la estimación inicial de la habilidad de una prueba adaptativa computarizada | spa |
| dc.title.translated | Use of auxiliary information in the initial ability estimation of a computerized adaptive test | eng |
| dc.type | Trabajo de grado - Maestría | |
| dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | |
| dc.type.content | Text | |
| dc.type.driver | info:eu-repo/semantics/masterThesis | |
| dc.type.redcol | http://purl.org/redcol/resource_type/TM | |
| dc.type.version | info:eu-repo/semantics/acceptedVersion | |
| dcterms.audience.professionaldevelopment | Investigadores | |
| dcterms.audience.professionaldevelopment | Estudiantes | |
| dcterms.audience.professionaldevelopment | Maestros | |
| oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | |

