Application of Machine Learning Models to credit scoring
| dc.contributor.advisor | Montenegro Diaz, Álvaro Mauricio | |
| dc.contributor.author | Castilla Reyes, Astrid Natalia | |
| dc.contributor.cvlac | Castilla Reyes, Astrid Natalia [0000040134] | |
| dc.contributor.googlescholar | Castilla Reyes, Astrid Natalia [fjh2ib4AAAAJ&hl] | |
| dc.date.accessioned | 2026-02-10T15:11:54Z | |
| dc.date.available | 2026-02-10T15:11:54Z | |
| dc.date.issued | 2025 | |
| dc.description | color illustrations, diagrams | eng |
| dc.description.abstract | This study designs and validates an end-to-end framework for developing credit scoring models, overcoming the traditional dichotomy between machine learning performance and the need for regulatory interpretability. The methodology is distinguished by its innovations, including an exhaustive feature engineering and selection framework that employs multiple methods. This framework is complemented by a risk-based economic validation to quantify financial impact and a hybrid interpretability system (SHAP, LIME, WoE) to explain complex model predictions. This process was applied to develop and compare four models: a baseline Logistic Regression (LR), two LR variants to mitigate class imbalance, and an optimized XGBoost model. Results revealed that the XGBoost model achieved superior performance, with an AUC of 0.7012 and a default recall of 70.5%. The economic analysis quantified the value of this accuracy at $4.2 million USD in potential savings. This work not only presents a superior predictive model but offers a replicable paradigm for financial institutions to responsibly adopt machine learning solutions, ensuring they are robust, economically viable, and transparent. | eng |
| dc.description.degreelevel | Master's | |
| dc.description.degreename | Maestra en Ciencias Estadística | |
| dc.description.notes | Meritorious distinction for a master's thesis | eng |
| dc.description.researcharea | Computational Statistics | |
| dc.format.extent | xi, 131 pages | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.instname | Universidad Nacional de Colombia | spa |
| dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
| dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
| dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/89448 | |
| dc.language.iso | eng | |
| dc.publisher | Universidad Nacional de Colombia | |
| dc.publisher.branch | Universidad Nacional de Colombia - Sede Bogotá | |
| dc.publisher.faculty | Facultad de Ciencias | |
| dc.publisher.place | Bogotá, Colombia | |
| dc.publisher.program | Bogotá - Ciencias - Maestría en Ciencias - Estadística | |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | |
| dc.rights.license | Attribution 4.0 International | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject.ddc | 330 - Economics::332 - Financial economics | |
| dc.subject.ddc | 000 - Computer science, information and general works::004 - Data processing, computer science | |
| dc.subject.lemb | DEPARTAMENTOS DE CREDITO | spa |
| dc.subject.lemb | Credit departments | eng |
| dc.subject.lemb | POLITICA CREDITICIA | spa |
| dc.subject.lemb | Credit policy | eng |
| dc.subject.lemb | APRENDIZAJE AUTOMATICO (INTELIGENCIA ARTIFICIAL) | spa |
| dc.subject.lemb | Machine learning | eng |
| dc.subject.lemb | PRONOSTICO DE LA ECONOMIA | spa |
| dc.subject.lemb | Economic forecasting | eng |
| dc.subject.lemb | PROYECCIONES ECONOMICAS | spa |
| dc.subject.lemb | Economic projections | eng |
| dc.subject.lemb | PRONOSTICO DE LOS NEGOCIOS | spa |
| dc.subject.lemb | Business forecasting | eng |
| dc.subject.proposal | Credit Scoring | eng |
| dc.subject.proposal | Binary Classification | eng |
| dc.subject.proposal | Credit Risk Analysis | eng |
| dc.subject.proposal | Model Interpretability | eng |
| dc.subject.proposal | Machine Learning | eng |
| dc.subject.proposal | SHAP | eng |
| dc.subject.proposal | Economic Value Analysis | eng |
| dc.title | Application of Machine Learning Models to credit scoring | eng |
| dc.title.translated | Aplicación de modelos de Machine Learning para la calificación crediticia | spa |
| dc.type | Master's thesis | |
| dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | |
| dc.type.content | Text | |
| dc.type.driver | info:eu-repo/semantics/masterThesis | |
| dc.type.redcol | http://purl.org/redcol/resource_type/TM | |
| dc.type.version | info:eu-repo/semantics/acceptedVersion | |
| dcterms.audience.professionaldevelopment | Specialized | |
| dcterms.audience.professionaldevelopment | Students | |
| dcterms.audience.professionaldevelopment | Researchers | |
| dcterms.audience.professionaldevelopment | Teachers | |
| oaire.accessrights | http://purl.org/coar/access_right/c_abf2 |
Files
Original bundle
- Name: Tesis_de_maestrýa_UNAL.pdf
- Size: 6.12 MB
- Format: Adobe Portable Document Format
- Description: Master's thesis, Master of Science in Statistics
License bundle
- Name: license.txt
- Size: 5.74 KB
- Format: Item-specific license agreed upon to submission