Machine learning model to determine whether there is an excess of deaths from cardiovascular disease in people over 60 years old related to short-term exposure to PM2.5 particulate matter
| dc.contributor.advisor | Niño Vasquez, Luis Fernando | spa |
| dc.contributor.advisor | Gutierrez Torres, Juan David | spa |
| dc.contributor.author | Roncancio Turriago, Jorge Luis | spa |
| dc.contributor.researchgroup | Laboratorio de Investigación en Sistemas Inteligentes (LISI) | |
| dc.date.accessioned | 2025-12-11T21:34:54Z | |
| dc.date.available | 2025-12-11T21:34:54Z | |
| dc.date.issued | 2025-09-10 | |
| dc.description | ilustraciones, gráficas, tablas | spa |
| dc.description.abstract | La baja calidad del aire en Colombia se está convirtiendo en un riesgo para la población mayor, debido al deterioro de su sistema inmunitario a causa del envejecimiento, lo que ocasiona que desarrollen con mayor facilidad enfermedades cardiovasculares. Por esto se plantea crear un modelo de Machine Learning que permita determinar si existe un exceso en muertes por enfermedades cardiovasculares, de personas mayores de 60 años, relacionadas con la exposición de corto plazo al material particulado PM2.5. Para lograr este objetivo, se realizó una recolección de datos ambientales y epidemiológicos, se aplicaron técnicas de preprocesamiento de datos, se entrenaron distintos modelos con diferentes técnicas de Machine Learning y se evaluó el desempeño de estos modelos, comparando diferentes métricas de evaluación, con el objetivo de seleccionar el modelo con mejores resultados; todo esto haciendo uso de la metodología CRISP-DM. Con el modelo final seleccionado, se ejecutó un análisis de interpretabilidad donde se evidenció que existe una relación entre las variables de estudio “pm25”, “prevalencia en muertes por hipertensión” y “exceso de muertes”; esta relación no necesariamente induce a afirmar que exista un exceso en muertes por enfermedades cardiovasculares, relacionadas con la exposición de corto plazo al material particulado PM2.5, pero sí muestra que el modelo es capaz de generar buenas predicciones para responder a dicha incógnita. El modelo final cuenta con un “Recall” para la etiqueta 0 (No hay exceso de muertes) de 70.72% y para la etiqueta 1 (Sí hay exceso de muertes) de 71.38%, lo cual es equivalente a que el modelo es capaz de acertar 18 de cada 25 veces que se le pregunte. (Texto tomado de la fuente). | spa |
| dc.description.abstract | Poor air quality in Colombia is becoming a risk for the elderly population because aging weakens the immune system, which makes older adults more susceptible to developing cardiovascular diseases. For this reason, this work proposes a Machine Learning model to determine whether there is an excess of deaths from cardiovascular diseases in people over 60 years old related to short-term exposure to PM2.5 particulate matter. To achieve this goal, environmental and epidemiological data were collected, data preprocessing techniques were applied, various models were trained using different Machine Learning techniques, and their performance was evaluated by comparing different evaluation metrics in order to select the model with the best results, all following the CRISP-DM methodology. With the final model selected, an interpretability analysis was performed, which showed a relationship between the study variables "pm25", "prevalence of deaths due to hypertension" and "excess deaths". This relationship does not necessarily lead to the conclusion that there is an excess of deaths from cardiovascular disease related to short-term exposure to PM2.5 particulate matter, but it does show that the model is capable of generating good predictions to answer this question. The final model has a "Recall" of 70.72% for label 0 (there are no excess deaths) and of 71.38% for label 1 (there are excess deaths), which is equivalent to the model being correct roughly 18 out of every 25 times it is asked. | eng |
| dc.description.degreelevel | Maestría | spa |
| dc.description.degreename | Magíster en Ingeniería - Ingeniería de Sistemas y Computación | spa |
| dc.description.researcharea | Sistemas inteligentes | spa |
| dc.format.extent | 72 páginas | spa |
| dc.format.mimetype | application/pdf | |
| dc.identifier.instname | Universidad Nacional de Colombia | spa |
| dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
| dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
| dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/89204 | |
| dc.language.iso | spa | |
| dc.publisher | Universidad Nacional de Colombia | spa |
| dc.publisher.branch | Universidad Nacional de Colombia - Sede Bogotá | spa |
| dc.publisher.faculty | Facultad de Ingeniería | spa |
| dc.publisher.place | Bogotá, Colombia | spa |
| dc.publisher.program | Bogotá - Ingeniería - Maestría en Ingeniería - Ingeniería de Sistemas y Computación | spa |
| dc.relation.indexed | Bireme | spa |
| dc.relation.references | Alaa, A. M., Bolton, T., Di Angelantonio, E., Rudd, J. H. F., & van der Schaar, M. (2019). Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants. PLOS ONE, 14(5), e0213653. https://doi.org/10.1371/journal.pone.0213653 | |
| dc.relation.references | Alman, B. L., Pfister, G., Hao, H., Stowell, J., Hu, X., Liu, Y., & Strickland, M. J. (2016). The association of wildfire smoke with respiratory and cardiovascular emergency department visits in Colorado in 2012: a case crossover study. Environmental Health, 15(1), 64. https://doi.org/10.1186/s12940-016-0146-8 | |
| dc.relation.references | Alves, L. (2020). Amazon fires coincide with increased respiratory illnesses in indigenous populations. The Lancet Respiratory Medicine, 8(11), e84. https://doi.org/10.1016/S2213-2600(20)30421-5 | |
| dc.relation.references | Arias, L. (2023). Metodología CRISP-DM: La guía definitiva para la Minería de Datos. https://www.linkedin.com/pulse/metodolog%C3%ADa-crisp-dm-la-gu%C3%ADa-definitiva-para-miner%C3%ADa-de-arias-xyusf | |
| dc.relation.references | Azmi, J., Arif, M., Nafis, M. T., Alam, M. A., Tanweer, S., & Wang, G. (2022). A systematic review on machine learning approaches for cardiovascular disease prediction using medical big data. Medical Engineering & Physics, 105, 103825. https://doi.org/10.1016/j.medengphy.2022.103825 | |
| dc.relation.references | Bălă, G.-P., Râjnoveanu, R.-M., Tudorache, E., Motișan, R., & Oancea, C. (2021). Air pollution exposure—the (in)visible risk factor for respiratory diseases. Environmental Science and Pollution Research, 28(16), 19615–19628. https://doi.org/10.1007/s11356-021-13208-x | |
| dc.relation.references | Barrera, M., Morales, A., Hernández, J., Hernández, D., Valencia, R., & Ramírez, M. (2017). Immunosenescence. Medicina Interna de México, 33(5), 696–704. https://doi.org/10.24245/mim.v33i5.1204 | |
| dc.relation.references | Bhatnagar, A. (2017). Environmental Determinants of Cardiovascular Disease. Circulation Research, 121(2), 162–180. https://doi.org/10.1161/CIRCRESAHA.117.306458 | |
| dc.relation.references | Burhan, E., & Mukminin, U. (2020). A systematic review of respiratory infection due to air pollution during natural disasters. Medical Journal of Indonesia, 29(1), 11–18. https://doi.org/10.13181/mji.oa.204390 | |
| dc.relation.references | Caldeira, D., Franco, F., Bravo Baptista, S., Cabral, S., Cachulo, M. do C., Dores, H., Peixeiro, A., Rodrigues, R., Santos, M., Timóteo, A. T., Vasconcelos, J., & Gonçalves, L. (2022). Air pollution and cardiovascular diseases: A position paper. Revista Portuguesa de Cardiologia, 41(8), 709–717. https://doi.org/10.1016/j.repc.2022.05.006 | |
| dc.relation.references | Cascio, W. E. (2018). Wildland fire smoke and human health. Science of The Total Environment, 624, 586–595. https://doi.org/10.1016/j.scitotenv.2017.12.086 | |
| dc.relation.references | Chen, G., Guo, Y., Yue, X., Tong, S., Gasparrini, A., Bell, M. L., Armstrong, B., Schwartz, J., Jaakkola, J. J. K., Zanobetti, A., Lavigne, E., Nascimento Saldiva, P. H., Kan, H., Royé, D., Milojevic, A., Overcenco, A., Urban, A., Schneider, A., Entezari, A., … Li, S. (2021). Mortality risk attributable to wildfire-related PM2·5 pollution: a global time series study in 749 locations. The Lancet Planetary Health, 5(9), e579–e587. https://doi.org/10.1016/S2542-5196(21)00200-X | |
| dc.relation.references | Chen, H., Samet, J. M., Bromberg, P. A., & Tong, H. (2021). Cardiovascular health impacts of wildfire smoke exposure. Particle and Fibre Toxicology, 18(1), 2. https://doi.org/10.1186/s12989-020-00394-8 | |
| dc.relation.references | Dane. (2020). Censos Nacionales de Población y Vivienda. https://systema59.dane.gov.co/bincol/rpwebengine.exe/PortalAction?lang=esp | |
| dc.relation.references | de Bont, J., Jaganathan, S., Dahlquist, M., Persson, Å., Stafoggia, M., & Ljungman, P. (2022). Ambient air pollution and cardiovascular diseases: An umbrella review of systematic reviews and meta‐analyses. Journal of Internal Medicine, 291(6), 779–800. https://doi.org/10.1111/joim.13467 | |
| dc.relation.references | EPA. (2023). La contaminación del aire y las enfermedades del corazón. https://espanol.epa.gov/espanol/la-contaminacion-del-aire-y-las-enfermedades-del-corazon | |
| dc.relation.references | Groot, E., Caturay, A., Khan, Y., & Copes, R. (2019). A systematic review of the health impacts of occupational exposure to wildland fires. International Journal of Occupational Medicine and Environmental Health. https://doi.org/10.13075/ijomeh.1896.01326 | |
| dc.relation.references | Hotz, N. (2024). ¿Qué es CRISPDM? https://www.datascience-pm.com/crisp-dm-2/ | |
| dc.relation.references | IBM. (2025a). ¿Qué es el algoritmo de k vecinos más cercanos? https://www.ibm.com/es-es/think/topics/knn | |
| dc.relation.references | IBM. (2025b). ¿Qué es un árbol de decisión? https://www.ibm.com/mx-es/think/topics/decision-trees | |
| dc.relation.references | IBM. (2025c). ¿Qué es XGBoost? https://www.ibm.com/es-es/think/topics/xgboost | |
| dc.relation.references | IBM. (2025d). ¿Qué son las redes neuronales? https://www.ibm.com/es-es/think/topics/neural-networks | |
| dc.relation.references | IDEAM. (n.d.). Calidad de Aire. Retrieved May 27, 2024, from http://www.ideam.gov.co/web/siac/calidadaire | |
| dc.relation.references | IEC. (2024). Tasa de mortalidad estandarizada. https://www.idescat.cat/pub/?id=tmee&lang=es#:~:text=La%20tasa%20de%20mortalidad%20estandarizada,una%20poblaci%C3%B3n%20tipo%20o%20est%C3%A1ndar. | |
| dc.relation.references | Krittanawong, C., Qadeer, Y. K., Hayes, R. B., Wang, Z., Thurston, G. D., Virani, S., & Lavie, C. J. (2023). PM2.5 and cardiovascular diseases: State-of-the-Art review. International Journal of Cardiology Cardiovascular Risk and Prevention, 19, 200217. https://doi.org/10.1016/j.ijcrp.2023.200217 | |
| dc.relation.references | Krittanawong, C., Virk, H. U. H., Bangalore, S., Wang, Z., Johnson, K. W., Pinotti, R., Zhang, H., Kaplin, S., Narasimhan, B., Kitai, T., Baber, U., Halperin, J. L., & Tang, W. H. W. (2020). Machine learning prediction in cardiovascular diseases: a meta-analysis. Scientific Reports, 10(1), 16057. https://doi.org/10.1038/s41598-020-72685-1 | |
| dc.relation.references | Liu, J. C., Pereira, G., Uhl, S. A., Bravo, M. A., & Bell, M. L. (2015). A systematic review of the physical health impacts from non-occupational exposure to wildfire smoke. Environmental Research, 136, 120–132. https://doi.org/10.1016/j.envres.2014.10.015 | |
| dc.relation.references | Mahsin, M. D., Cabaj, J., & Saini, V. (2022). Respiratory and cardiovascular condition-related physician visits associated with wildfire smoke exposure in Calgary, Canada, in 2015: a population-based study. International Journal of Epidemiology, 51(1), 166–178. https://doi.org/10.1093/ije/dyab206 | |
| dc.relation.references | Matz, C. J., Egyed, M., Xi, G., Racine, J., Pavlovic, R., Rittmaster, R., Henderson, S. B., & Stieb, D. M. (2020). Health impact analysis of PM2.5 from wildfire smoke in Canada (2013–2015, 2017–2018). Science of The Total Environment, 725, 138506. https://doi.org/10.1016/j.scitotenv.2020.138506 | |
| dc.relation.references | McGrath, A., & Jonker, A. (2024, October 8). ¿Qué es la interpretabilidad de la IA? https://www.ibm.com/es-es/think/topics/interpretability#:~:text=tipo%20de%20modelo.-,M%C3%A9todos%20de%20interpretabilidad,Individual%20Conditional%20Expectation%20(ICE)%20Plots | |
| dc.relation.references | MinSalud. (2025). Consulta a cubos y módulo geográfico. | |
| dc.relation.references | NASA. (2025). Giovanni. https://giovanni.gsfc.nasa.gov/giovanni/ | |
| dc.relation.references | Navarro, K. M., Kleinman, M. T., Mackay, C. E., Reinhardt, T. E., Balmes, J. R., Broyles, G. A., Ottmar, R. D., Naher, L. P., & Domitrovich, J. W. (2019). Wildland firefighter smoke exposure and risk of lung cancer and cardiovascular disease mortality. Environmental Research, 173, 462–468. https://doi.org/10.1016/j.envres.2019.03.060 | |
| dc.relation.references | Pal, M., Parija, S., Panda, G., Dhama, K., & Mohapatra, R. K. (2022). Risk prediction of cardiovascular disease using machine learning classifiers. Open Medicine, 17(1), 1100–1113. https://doi.org/10.1515/med-2022-0508 | |
| dc.relation.references | Qiu, H., Luo, L., Su, Z., Zhou, L., Wang, L., & Chen, Y. (2020). Machine learning approaches to predict peak demand days of cardiovascular admissions considering environmental exposure. BMC Medical Informatics and Decision Making, 20(1), 83. https://doi.org/10.1186/s12911-020-1101-8 | |
| dc.relation.references | Rosero, D. (2022). ¿Cómo afecta la calidad del aire la salud de los colombianos? https://www.radionacional.co/actualidad/medio-ambiente/calidad-del-aire-en-colombia-afectaciones-la-salud#:~:text=Uno%20de%20los%20riesgos%20ambientales,la%20mala%20calidad%20del%20aire. | |
| dc.relation.references | Saavedra, D., & García, B. (2014). Inmunosenescencia: efectos de la edad sobre el sistema inmune. http://scielo.sld.cu/scielo.php?script=sci_arttext&pid=S0864-02892014000400005 | |
| dc.relation.references | Sammut, C., & Webb, G. I. (2017). Accuracy. In Encyclopedia of Machine Learning and Data Mining (pp. 8–8). Springer US. https://doi.org/10.1007/978-1-4899-7687-1_3 | |
| dc.relation.references | Stowell, J. D., Geng, G., Saikawa, E., Chang, H. H., Fu, J., Yang, C.-E., Zhu, Q., Liu, Y., & Strickland, M. J. (2019). Associations of wildfire smoke PM2.5 exposure with cardiorespiratory events in Colorado 2011–2014. Environment International, 133, 105151. https://doi.org/10.1016/j.envint.2019.105151 | |
| dc.relation.references | Ting, K. M. (2017). Confusion Matrix. In Encyclopedia of Machine Learning and Data Mining (pp. 260–260). Springer US. https://doi.org/10.1007/978-1-4899-7687-1_50 | |
| dc.relation.references | Tsarapatsani, K., Sakellarios, A. I., Pezoulas, V. C., Tsakanikas, V. D., Kleber, M. E., Marz, W., Michalis, L. K., & Fotiadis, D. I. (2022). Machine Learning Models for Cardiovascular Disease Events Prediction. 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 1066–1069. https://doi.org/10.1109/EMBC48229.2022.9871121 | |
| dc.relation.references | Zhang, E., & Zhang, Y. (2018a). F-Measure. In Encyclopedia of Database Systems (pp. 1492–1493). Springer New York. https://doi.org/10.1007/978-1-4614-8265-9_483 | |
| dc.relation.references | Zhang, E., & Zhang, Y. (2018b). Precision. In Encyclopedia of Database Systems (pp. 2778–2779). Springer New York. https://doi.org/10.1007/978-1-4614-8265-9_480 | |
| dc.relation.references | Zhang, E., & Zhang, Y. (2018c). Recall. In Encyclopedia of Database Systems (pp. 3119–3120). Springer New York. https://doi.org/10.1007/978-1-4614-8265-9_479 | |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | |
| dc.rights.license | Atribución-NoComercial 4.0 Internacional | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
| dc.subject.ddc | 000 - Ciencias de la computación, información y obras generales::005 - Programación, programas, datos de computación | spa |
| dc.subject.ddc | 360 - Problemas y servicios sociales; asociaciones::361 - Problemas sociales y servicios | spa |
| dc.subject.ddc | 610 - Medicina y salud::616 - Enfermedades | spa |
| dc.subject.proposal | Datos | spa |
| dc.subject.proposal | Desempeño | spa |
| dc.subject.proposal | Enfermedades Cardiovasculares | spa |
| dc.subject.proposal | Inmunosenescencia | spa |
| dc.subject.proposal | Interpretabilidad | spa |
| dc.subject.proposal | Modelos de ML | spa |
| dc.subject.proposal | PM2.5 | spa |
| dc.subject.proposal | Cardiovascular Diseases | eng |
| dc.subject.proposal | Data | eng |
| dc.subject.proposal | Performance | eng |
| dc.subject.proposal | Immunosenescence | eng |
| dc.subject.proposal | Interpretability | eng |
| dc.subject.proposal | ML Models | eng |
| dc.subject.proposal | PM2.5 | eng |
| dc.subject.unesco | Enfermedad cardiovascular | spa |
| dc.subject.unesco | Cardiovascular diseases | eng |
| dc.subject.unesco | Epidemiología | spa |
| dc.subject.unesco | Epidemiology | eng |
| dc.subject.unesco | Medio ambiente | spa |
| dc.subject.unesco | Environment | eng |
| dc.subject.unesco | Contaminación | spa |
| dc.subject.unesco | Pollution | eng |
| dc.title | Modelo de Machine Learning para determinar si existe un exceso de muertes por enfermedades cardiovasculares, de personas mayores de 60 años, relacionadas con la exposición de corto plazo al material particulado PM2.5 | spa |
| dc.title.translated | Machine learning model to determine whether there is an excess of deaths from cardiovascular disease in people over 60 years old related to short-term exposure to PM2.5 particulate matter | eng |
| dc.type | Trabajo de grado - Maestría | spa |
| dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | |
| dc.type.content | Text | |
| dc.type.driver | info:eu-repo/semantics/masterThesis | |
| dc.type.redcol | http://purl.org/redcol/resource_type/TM | |
| dc.type.version | info:eu-repo/semantics/acceptedVersion | |
| dcterms.audience.professionaldevelopment | Estudiantes | spa |
| dcterms.audience.professionaldevelopment | Investigadores | spa |
| dcterms.audience.professionaldevelopment | Maestros | spa |
| oaire.accessrights | http://purl.org/coar/access_right/c_abf2 |
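
The abstract reports per-label recall and restates it as "18 out of every 25". The sketch below shows how per-label recall is computed from a confusion matrix and how the reported figures round to that ratio. The confusion-matrix counts are hypothetical and are not taken from the thesis; averaging the two recalls without weighting is also an assumption, since the record does not state the class balance.

```python
def recall_per_class(confusion):
    """Recall for each true label: correct predictions / all samples of that label.

    confusion[i][j] counts samples whose true label is i and predicted label is j.
    """
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical counts, chosen only to illustrate the computation (not thesis data)
confusion = [
    [70, 30],  # true label 0 (no excess deaths)
    [20, 80],  # true label 1 (excess deaths)
]
recalls = recall_per_class(confusion)

# Recall values as reported in the abstract
reported = {0: 0.7072, 1: 0.7138}

# Assumption: unweighted mean of the two recalls (i.e. balanced classes)
mean_recall = sum(reported.values()) / len(reported)
hits_per_25 = round(mean_recall * 25)

print(f"illustrative recalls: {recalls}")
print(f"mean reported recall: {mean_recall:.4f}")
print(f"about {hits_per_25} correct out of every 25")
```

Under these assumptions the mean recall is about 71%, which rounds to 18 of 25 (exactly 72%), so the abstract's ratio is a rounded approximation rather than an exact equivalence.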
Files
Original bundle
- Name: Trabajo Final Maestria.pdf
- Size: 2.62 MB
- Format: Adobe Portable Document Format
- Description: Master's thesis in Engineering - Systems and Computing Engineering
License bundle
- Name: license.txt
- Size: 5.74 KB
- Format: Item-specific license agreed upon to submission