Método para identificar y caracterizar variantes del SARS-CoV-2 mediante algoritmos de aprendizaje de máquinas y simulaciones moleculares
dc.contributor.advisor | Branch Bedoya, John William | |
dc.contributor.advisor | Hernández Ortíz, Juan Pablo | |
dc.contributor.author | López Carvajal, María Stella | |
dc.contributor.cvlac | https://scienti.minciencias.gov.co/cvlac/visualizador/generarCurriculoCv.do?cod_rh=0001956104 | |
dc.contributor.orcid | López Carvajal, María Stella [0000-0003-3053-1278] | |
dc.contributor.orcid | Branch Bedoya, John William [0000-0002-0378-028X] | |
dc.contributor.orcid | Hernández Ortíz, Juan Pablo [0000-0003-0404-9947] | |
dc.contributor.researchgroup | Gidia: Grupo de Investigación y Desarrollo en Inteligencia Artificial | |
dc.date.accessioned | 2025-09-19T14:00:34Z | |
dc.date.available | 2025-09-19T14:00:34Z | |
dc.date.issued | 2024-04-15 | |
dc.description | Ilustraciones, gráficos | spa |
dc.description.abstract | La vigilancia genómica del SARS-CoV-2 ha permitido la identificación de variantes de interés y de preocupación a nivel mundial, relevantes para el manejo de salud pública, el mejoramiento de pruebas de diagnóstico y el diseño de las vacunas. Aproximadamente 16 millones de secuencias del virus han sido reportadas a la fecha, un número de instancias varios órdenes de magnitud superior a las decenas de miles de secuencias que han podido ser analizadas con árboles filogenéticos usando estrategias de paralelización computacional. El aprendizaje automático constituye una alternativa para el procesamiento de grandes conjuntos de datos y la identificación de patrones en el genoma viral, características que pueden ser aprovechadas para la identificación de variantes y el reconocimiento de linajes emergentes. En el presente trabajo se construyó una herramienta para la identificación de variantes del SARS-CoV-2 a partir de la obtención y procesamiento automático de secuencias del virus, transformaciones numéricas de los datos y aprendizaje no supervisado. Además, se incorporaron herramientas bioinformáticas para el modelado y la caracterización de proteínas codificadas por los genomas representativos de los linajes identificados. (Tomado de la fuente) | spa |
dc.description.abstract | Genomic surveillance of SARS-CoV-2 has enabled the identification of variants of interest and of concern worldwide, which are relevant for public health management, the improvement of diagnostic tests, and vaccine design. Approximately 16 million sequences of the virus were reported from December 20, 2019 to June 15, 2024, a number of instances several orders of magnitude larger than the tens of thousands of sequences that could be analyzed with phylogenetic trees using strategies such as computational parallelization. Machine learning is an alternative for processing large data sets and identifying patterns in the viral genome, capabilities that can be used to identify variants and recognize emerging lineages. In the present work, a tool was built to automatically obtain and preprocess SARS-CoV-2 sequences, and unsupervised learning was implemented to identify virus variants. | eng |
dc.description.curriculararea | Ingeniería De Sistemas E Informática.Sede Medellín | |
dc.description.degreelevel | Maestría | |
dc.description.degreename | Magíster en Ingeniería - Analítica | |
dc.description.researcharea | Bioinformática, Aprendizaje de Máquinas | |
dc.format.extent | 62 páginas | |
dc.format.mimetype | application/pdf | |
dc.identifier.instname | Universidad Nacional de Colombia | spa |
dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/88923 | |
dc.language.iso | spa | |
dc.publisher | Universidad Nacional de Colombia | |
dc.publisher.branch | Universidad Nacional de Colombia - Sede Medellín | |
dc.publisher.faculty | Facultad de Minas | |
dc.publisher.place | Medellín, Colombia | |
dc.publisher.program | Medellín - Minas - Maestría en Ingeniería - Analítica | |
dc.relation.indexed | LaReferencia | |
dc.relation.references | Ben Hu et al. Characteristics of SARS-CoV-2 and COVID-19. Mar. de 2021. doi: 10.1038/s41579-020-00459-7. | |
dc.relation.references | WHO COVID-19 dashboard. 2024. url: https://data.who.int/dashboards/covid19/cases?n=c. | |
dc.relation.references | Social Impact of COVID-19. 2024. url: https://www.un.org/en/coronavirus. | |
dc.relation.references | Seguimiento de las variantes del SARS-CoV-2. 2024. url: https://www.who.int/es/activities/tracking-SARS-CoV-2-variants. | |
dc.relation.references | Kishan Kalia, Gayatri Saberwal y Gaurav Sharma. The lag in SARS-CoV-2 genome submissions to GISAID. Sep. de 2021. doi: 10.1038/s41587-021-01040-0. | |
dc.relation.references | Dylan Lebatteux et al. “Machine learning-based approach KEVOLVE efficiently identifies SARS-CoV-2 variant-specific genomic signatures”. En: PLoS ONE 19 (1 ene. de 2024). issn: 19326203. doi: 10.1371/journal.pone.0296627. | |
dc.relation.references | Andrew Melnyk et al. Clustering Based Identification of SARS-CoV-2 Subtypes. url: http://www.springer.com/series/5381. | |
dc.relation.references | Carla Mavian et al. Sampling bias and incorrect rooting make phylogenetic network tracing of SARS-COV-2 infections unreliable. Jun. de 2020. doi: 10.1073/pnas.2007295117. | |
dc.relation.references | Michael Ott. Inference of Large Phylogenetic Trees on Parallel Architectures. 2010. | |
dc.relation.references | Sarwan Ali et al. “PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences”. En: (2022). doi: 10.3390/biology. | |
dc.relation.references | Zahra Tayebi, Sarwan Ali y Murray Patterson. “Robust representation and efficient feature selection allows for effective clustering of SARS-CoV-2 variants”. En: Algorithms 14 (12 dic. de 2021). issn: 19994893. doi: 10.3390/a14120348. | |
dc.relation.references | Yawei Li et al. “Unsupervised clustering analysis of SARS-Cov-2 population structure reveals six major subtypes at early stage across the world”. En: (2021). doi: 10.1101/2020.09.04.283358. url: https://doi.org/10.1101/2020.09.04.283358. | |
dc.relation.references | Xiao Wen Cheng et al. “Identification of SARS-CoV-2 Variants and Their Clinical Significance in Hefei, China”. En: Frontiers in Medicine 8 (ene. de 2022). issn: 2296858X. doi: 10.3389/fmed.2021.784632. | |
dc.relation.references | Sergio E Rodriguez et al. SARS-CoV-2 variants: Impact on biological and clinical outcome. 2019 | |
dc.relation.references | Can rong Wu et al. Structure genomics of SARS-CoV-2 and its Omicron variant: drug design templates for COVID-19. Dic. de 2022. doi: 10.1038/s41401-021-00851-w. | |
dc.relation.references | Sarah Alsobaie. Understanding the molecular biology of SARS-CoV-2 and the COVID-19 pandemic: A review. 2021. doi: 10.2147/IDR.S306441. | |
dc.relation.references | Daniel Wrapp et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. 2019. url: https://www.science.org. | |
dc.relation.references | Dongwan Kim et al. “The Architecture of SARS-CoV-2 Transcriptome”. En: Cell 181 (4 mayo de 2020), 914-921.e10. issn: 10974172. doi: 10.1016/j.cell.2020.04.011. | |
dc.relation.references | Abbas Khan et al. SARS-CoV-2 new variants: Characteristic features and impact on the efficacy of different vaccines. Nov. de 2021. doi: 10.1016/j.biopha.2021.112176. | |
dc.relation.references | Alessandro M. Carabelli et al. SARS-CoV-2 variant biology: immune escape, transmission and fitness. Mar. de 2023. doi: 10.1038/s41579-022-00841-7. | |
dc.relation.references | Overview of Variants/Mutations. 2024. url: https://data.who.int/dashboards/covid19/cases?n=c. | |
dc.relation.references | Carlos Peña. “Métodos de inferencia filogenética”. En: Rev. peru. biol 18 (2 2011), págs. 265-267. issn: 1561-0837. | |
dc.relation.references | Federico Abascal et al. Filogenia y evolución molecular. | |
dc.relation.references | Javier Mendoza-Revilla y Universidad Peruana Cayetano Heredia Lima. Aportes de la filogenética a la investigación médica. Contributions of phylogenetics to medical research. 2012. | |
dc.relation.references | Genomic epidemiology of SARS-CoV-2 with subsampling focused globally over the past 6 months. 2024. url: https://nextstrain.org/ncov/gisaid/global/6m | |
dc.relation.references | James Hadfield et al. “NextStrain: Real-time tracking of pathogen evolution”. En: Bioinformatics 34 (23 dic. de 2018), págs. 4121-4123. issn: 14602059. doi: 10.1093/bioinformatics/bty407. | |
dc.relation.references | Iqbal H. Sarker. Machine Learning: Algorithms, Real-World Applications and Research Directions. Mayo de 2021. doi: 10.1007/s42979-021-00592-x | |
dc.relation.references | Fausto Pedro y García Márquez. Supervised Machine Learning Algorithm: A Review of Classification Techniques. 2022. url: https://link.springer.com/bookseries/8767. | |
dc.relation.references | Kamal Berahmand et al. “Autoencoders and their applications in machine learning: a survey”. En: Artificial Intelligence Review 57 (2 feb. de 2024). issn: 15737462. doi: 10.1007/s10462-023-10662-6. | |
dc.relation.references | Jesper E. van Engelen y Holger H. Hoos. “A survey on semi-supervised learning”. En: Machine Learning 109 (2 feb. de 2020), págs. 373-440. issn: 15730565. doi: 10.1007/s10994-019-05855-6. | |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | |
dc.rights.license | Atribución-NoComercial-CompartirIgual 4.0 Internacional | |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | |
dc.subject.ddc | 000 - Ciencias de la computación, información y obras generales::004 - Procesamiento de datos Ciencia de los computadores | |
dc.subject.ddc | 610 - Medicina y salud::616 - Enfermedades | |
dc.subject.decs | Covid-19 - Procesamiento de datos | |
dc.subject.lemb | Síndrome Respiratorio Agudo Grave - Procesamiento de datos | |
dc.subject.lemb | Infecciones por coronavirus - Procesamiento de datos | |
dc.subject.lemb | Infecciones respiratorias - Procesamiento de datos | |
dc.subject.lemb | Aprendizaje automático (Inteligencia artificial) | |
dc.subject.lemb | Vigilancia epidemiológica - Procesamiento de datos | |
dc.subject.lemb | Genómica - Procesamiento de datos | |
dc.subject.proposal | Clustering no supervisado | spa |
dc.subject.proposal | SARS-CoV-2 | eng |
dc.subject.proposal | K-means | eng |
dc.subject.proposal | k-mers | eng |
dc.subject.proposal | Unsupervised Clustering | eng |
dc.title | Método para identificar y caracterizar variantes del SARS-CoV-2 mediante algoritmos de aprendizaje de máquinas y simulaciones moleculares | spa |
dc.title.translated | Method for identifying and characterizing SARS-CoV-2 variants using machine learning algorithms and molecular simulations | eng |
dc.type | Trabajo de grado - Maestría | |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | |
dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | |
dc.type.content | Text | |
dc.type.driver | info:eu-repo/semantics/masterThesis | |
dc.type.redcol | http://purl.org/redcol/resource_type/TM | |
dc.type.version | info:eu-repo/semantics/acceptedVersion | |
dcterms.audience.professionaldevelopment | Estudiantes | |
dcterms.audience.professionaldevelopment | Investigadores | |
dcterms.audience.professionaldevelopment | Maestros | |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 |