Application of Machine Learning Models to credit scoring

dc.contributor.advisor: Montenegro Diaz, Álvaro Mauricio
dc.contributor.author: Castilla Reyes, Astrid Natalia
dc.contributor.cvlac: Castilla Reyes, Astrid Natalia [0000040134]
dc.contributor.googlescholar: Castilla Reyes, Astrid Natalia [fjh2ib4AAAAJ&hl]
dc.date.accessioned: 2026-02-10T15:11:54Z
dc.date.available: 2026-02-10T15:11:54Z
dc.date.issued: 2025
dc.description: Color illustrations, diagrams
dc.description.abstract: This study designs and validates an end-to-end framework for developing credit scoring models, overcoming the traditional dichotomy between machine learning performance and the need for regulatory interpretability. The methodology is distinguished by its innovations, including an exhaustive feature engineering and selection framework that employs multiple methods. This framework is complemented by a risk-based economic validation to quantify financial impact and a hybrid interpretability system (SHAP, LIME, WoE) to explain complex model predictions. This process was applied to develop and compare four models: a baseline Logistic Regression (LR), two LR variants to mitigate class imbalance, and an optimized XGBoost model. Results revealed that the XGBoost model achieved superior performance, with an AUC of 0.7012 and a default recall of 70.5%. The economic analysis quantified the value of this accuracy at USD 4.2 million in potential savings. This work not only presents a superior predictive model but also offers a replicable paradigm for financial institutions to responsibly adopt machine learning solutions, ensuring they are robust, economically viable, and transparent. (Text taken from the source) [eng]
dc.description.degreelevel: Master's
dc.description.degreename: Master of Science in Statistics
dc.description.notes: Meritorious distinction awarded to the master's thesis
dc.description.researcharea: Computational Statistics
dc.format.extent: xi, 131 pages
dc.format.mimetype: application/pdf
dc.identifier.instname: Universidad Nacional de Colombia [spa]
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia [spa]
dc.identifier.repourl: https://repositorio.unal.edu.co/ [spa]
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/89448
dc.language.iso: eng
dc.publisher: Universidad Nacional de Colombia
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá
dc.publisher.faculty: Facultad de Ciencias
dc.publisher.place: Bogotá, Colombia
dc.publisher.program: Bogotá - Ciencias - Maestría en Ciencias - Estadística
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.rights.license: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.ddc: 330 - Economics::332 - Financial economics
dc.subject.ddc: 000 - Computer science, information and general works::004 - Data processing and computer science
dc.subject.lemb: DEPARTAMENTOS DE CREDITO [spa]
dc.subject.lemb: Credit departments [eng]
dc.subject.lemb: POLITICA CREDITICIA [spa]
dc.subject.lemb: Credit policy [eng]
dc.subject.lemb: APRENDIZAJE AUTOMATICO (INTELIGENCIA ARTIFICIAL) [spa]
dc.subject.lemb: Machine learning [eng]
dc.subject.lemb: PRONOSTICO DE LA ECONOMIA [spa]
dc.subject.lemb: Economic forecasting [eng]
dc.subject.lemb: PROYECCIONES ECONOMICAS [spa]
dc.subject.lemb: Economic projections [eng]
dc.subject.lemb: PRONOSTICO DE LOS NEGOCIOS [spa]
dc.subject.lemb: Business forecasting [eng]
dc.subject.proposal: Credit Scoring [eng]
dc.subject.proposal: Binary Classification [eng]
dc.subject.proposal: Credit Risk Analysis [eng]
dc.subject.proposal: Model Interpretability [eng]
dc.subject.proposal: Machine Learning [eng]
dc.subject.proposal: SHAP [eng]
dc.subject.proposal: Economic Value Analysis [eng]
dc.title: Application of Machine Learning Models to credit scoring [eng]
dc.title.translated: Aplicación de modelos de Machine Learning para la calificación crediticia [spa]
dc.type: Master's thesis
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content: Text
dc.type.driver: info:eu-repo/semantics/masterThesis
dc.type.redcol: http://purl.org/redcol/resource_type/TM
dc.type.version: info:eu-repo/semantics/acceptedVersion
dcterms.audience.professionaldevelopment: Specialized
dcterms.audience.professionaldevelopment: Students
dcterms.audience.professionaldevelopment: Researchers
dcterms.audience.professionaldevelopment: Teachers
oaire.accessrights: http://purl.org/coar/access_right/c_abf2

Files

Original bundle

Name: Tesis_de_maestrýa_UNAL.pdf
Size: 6.12 MB
Format: Adobe Portable Document Format
Description: Master of Science thesis in Statistics
