Machine learning-based model for the classification of RNA viruses

dc.contributor.advisor: Bermúdez Santana, Clara Isabel
dc.contributor.advisor: Niño Vásquez, Luis Fernando
dc.contributor.author: Colmenares Celis, Carolina
dc.contributor.researchgroup: RNomica Teórica y Computacional
dc.contributor.researchgroup: Laboratorio de Investigación en Sistemas Inteligentes LISI
dc.date.accessioned: 2023-08-29T14:07:59Z
dc.date.available: 2023-08-29T14:07:59Z
dc.date.issued: 2023
dc.description: illustrations, diagrams
dc.description.abstract: Viruses are the most abundant biological entities on Earth, but detecting, isolating, and classifying them remains a significant scientific challenge. Pathogenic RNA viruses cause numerous human deaths, especially those involved in the transmission of zoonotic diseases, leading to viral emergencies and global pandemics such as the one associated with SARS-CoV-2. This study explores and describes theoretical representations of RNA viruses based on sequence and structure, namely the extended tree, HIT, and coarse-grained tree, and compares them to determine which shows the greatest potential as input to a classification model based on machine learning techniques. For model design, the algorithms investigated were multilayer perceptrons, suffix trees, hidden Markov models (HMMs), and convolutional neural networks combined with long short-term memory (CNN-LSTM). These algorithms were applied to two datasets. The training data consisted of sequences from RNA virus families, including Orthomyxoviridae, Sedoreoviridae, Spinareoviridae, Retroviridae, and Arteriviridae, obtained from the National Center for Biotechnology Information (NCBI) database. The test data comprised metaviromes collected during the "Biological Expedition in Representative Ecosystems of Colombia: Tropical Rainforest of the Sierra Nevada de Santa Marta," a project funded by Colciencias in collaboration with the Theoretical and Computational RNomics research group at the Universidad Nacional de Colombia. Both datasets were transformed into the structural representations mentioned above using the ViennaRNA package. The HIT representation exhibited the best characteristics for feature extraction, and the models based on HMMs and CNN-LSTM demonstrated superior performance and potential for classifying RNA virus metagenomes.
dc.description.degreelevel: Master's
dc.description.researcharea: Computational technologies in bioinformatics
dc.format.extent: 114 pages
dc.format.mimetype: application/pdf
dc.identifier.instname: Universidad Nacional de Colombia
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia
dc.identifier.repourl: https://repositorio.unal.edu.co/
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/84608
dc.language.iso: spa
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá
dc.publisher.faculty: Facultad de Ingeniería
dc.publisher.place: Bogotá, Colombia
dc.publisher.program: Bogotá - Ingeniería - Maestría en Bioinformática
dc.relation.referencesMarz M, Beerenwinkel N, Drosten C, et al. (2014) Challenges in RNA virus bioinformatics.30(13):1793-1799. doi:10.1093/bioinformatics/btu105spa
dc.relation.referencesVilla, T.G., Abril, A.G., Sanchez, S. et al. Animal and human RNA viruses: genetic variability and ability to overcome vaccines. Arch Microbiol 203, 443–464 (2021). https://doi.org/10.1007/s00203-020-02040-5spa
dc.relation.referencesCobo Paz, V. (2020). Protocolo computacional para la asignaci´on taxon´omica de virus en metadatos gen´omicos. Universidad Nacional de Colombiaspa
dc.relation.referencesMahmoudabadi, G., and Phillips, R. (2018). A comprehensive and quantitative exploration of thousands of viral genomes. eLife, 7, e31955. https://doi.org/10.7554/eLife.31955.spa
dc.relation.referencesStruck D, Lawyer G, Ternes AM, Schmit JC, Bercoff DP (2014). Comet: adaptive context-based modeling for ultrafast hiv-1 subtype identification. Nucleic Acids Res.42(18):e144.spa
dc.relation.referencesWagner, Edward K.; Hewlett, Martinez J. (1999). Basic virology. Malden, MA: Blackwell Science, Inc. p. 249. ISBN 0-632-04299-0.spa
dc.relation.referencesPatton JT (editor). (2008). Segmented Double-stranded RNA Viruses: Structure and Molecular Biology. Caister Academic Press. ISBN 978-1-904455-21-9.spa
dc.relation.referencesMerriam-Webster. (n.d.). Orthomyxoviridae. In Merriam-Webster.com medical dictionary. Retrieved January 19, 2023, from https://www.merriam-webster.com/medical/Orthomyxoviridae.spa
dc.relation.referencesMerriam-Webster. (n.d.). Retroviridae. In Merriam-Webster.com medical dictionary. Retrieved January 19, 2023, from https://www.merriam-webster.com/medical/Retroviridae.spa
dc.relation.referencesArteriviridae - ICTV. (s. f.). Retrieved January 19, 2023, from https://ictv.global/report_9th/RNApos/Nidovirales/Arteriviridae.spa
dc.relation.referencesMatthijnssens et al., (2022) ICTV Virus Taxonomy Profile: Sedoreoviridae, Journal of General Virology (2022) 103:001782.spa
dc.relation.referencesMatthijnssens et al., (2022) ICTV Virus Taxonomy Profile: Spinareoviridae, Journal of General Virology (2022) 103:001781.spa
dc.relation.referencesJabeen A., Ahmad N., Raza K. (2018) Machine Learning-Based Stateof-the-Art Methods for the Classification of RNA-Seq Data. In: Dey N., Ashour A., Borra S. (eds) Classification in BioApps. Lecture Notes in Computational Vision and Biomechanics, vol 26. Springer, Cham. https://doi.org/10.1007/978-3-319-65981-76.spa
dc.relation.referencesRemita MA, Halioui A, Malick Diouara AA, Daigle B, Kiani G, Diallo AB. (2017 Apr 11) A machine learning approach for viral genome classification. BMC Bioinformatics. 18(1):208. doi:10.1186/s12859-017-1602-3spa
dc.relation.referencesFontana, W., Stadler, P. F., Bornberg-Bauer, E. G., Griesmacher, T., Hofacker, I. L., Tacker, M., Tarazona, P., Weinberger, E. D., & Schuster, P. (1993). RNA folding and combinatory landscapes. Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics, 47(3), 2083–2099. https://doi.org/10.1103/ physreve.47.2083.spa
dc.relation.referencesShapiro B. A. (1988). An algorithm for comparing multiple RNA secondary structures. Computer applications in the biosciences: CABIOS, 4(3), 387–393. https: //doi.org/10.1093/bioinformatics/4.3.387.spa
dc.relation.referencesLorenz, Ronny and Bernhart, Stephan H. and H¨oner zu Siederdissen, Christian and Tafer, Hakim and Flamm, Christoph and Stadler, Peter F. and Hofacker, Ivo L. ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6:1 26, 2011, doi:10.1186/1748- 7188-6-26spa
dc.relation.referencesSikkema, R. S., y Koopmans, M. (2021). Preparing for Emerging Zoonotic Viruses. Encyclopedia of Virology, 256–266. https://doi.org/10.1016/ B978-0-12-814515-9.00150-8.spa
dc.relation.referencesAllen T. Global hotspots and correlates of emerging zoonotic diseases. Nature Communications. 2017;8(1)spa
dc.relation.referencesJones K.E. Global trends in emerging infectious diseases. Nature. 2008;451(7181):990–993.spa
dc.relation.referencesS. Shadab, M. T. Alam Khan, N. A. Neezi, S. Adilina, and S. Shatabda, “DeepDBP: deep neural networks for identification of DNA-binding proteins,” Informatics in Medicine Unlocked, vol. 19, article 100318, 2020.spa
dc.relation.referencesGunasekaran, H., Ramalakshmi, K., Rex Macedo Arokiaraj, A., Deepa Kanmani, S., Venkatesan, C., y Suresh Gnana Dhas, C. (2021). Analysis of DNA Sequence Classification Using CNN and Hybrid Models. Computational and mathematical methods in medicine, 2021, 1835056. https://doi.org/10.1155/2021/1835056.spa
dc.relation.referencesFu, L., Niu, B., Zhu, Z., Wu, S., y Li, W. (2012). CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics (Oxford, England), 28(23), 3150–3152. https://doi.org/10.1093/bioinformatics/bts565.spa
dc.relation.referencesLi, W., y Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics (Oxford, England), 22(13), 1658–1659. https://doi.org/10.1093/bioinformatics/btl158.spa
dc.relation.referencesEl Naqa, I., Murphy, M.J. (2015). What Is Machine Learning?. In: El Naqa, I., Li, R., Murphy, M. (eds) Machine Learning in Radiation Oncology. Springer, Cham. https://doi.org/10.1007/978-3-319-18305-3_1.spa
dc.relation.referencesLarsson, A. (2014). AliView: a fast and lightweight alignment viewer and editor for large data sets. Bioinformatics30(22): 3276-3278. http://dx.doi.org/10.1093/ bioinformatics/bt.spa
dc.relation.referencesPedro Larrañaga, Borja Calvo, Roberto Santana, Concha Bielza, Josu Galdiano, I˜naki Inza, Jose A. Lozano, Ruben Armananzas, Guzman Santafe, Aritz Perez, Victor Robles, Machine learning in bioinformatics, Briefings in Bioinformatics, Volume 7, Issue 1, March 2006, Pages 86–112, https://doi.org/10.1093/bib/bbk007.spa
dc.relation.referencesShastry, K.A., Sanjay, H.A. (2020). Machine Learning for Bioinformatics. In: Srinivasa, K., Siddesh, G., Manisekhar, S. (eds) Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/ 978-981-15-2445-5_3.spa
dc.relation.referencesSamuel AL. Some studies in machine learning using the game of checkers. IBM J Res Dev. 1959;3(3):210–229.spa
dc.relation.referencesQifang Bi, Katherine E Goodman, Joshua Kaminsky, Justin Lessler, What is Machine Learning? A Primer for the Epidemiologist, American Journal of Epidemiology, Volume 188, Issue 12, December 2019, Pages 2222–2239, https://doi.org/10.1093/ aje/kwz189.spa
dc.relation.referencesMadhu Chetty, Jennifer Hallinan, Gonzalo A. Ruz, Anil Wipat, Computational intelligence and machine learning in bioinformatics and computational biology, Biosystems, Volume 222, 2022, 104792, ISSN 0303-2647, https://doi.org/10.1016/j. biosystems.2022.104792.spa
dc.relation.referencesHennig C, Meila M, Murtagh F, et al. Handbook of Cluster Analysis. 1st ed. Boca Raton, FL: CRC Press; 2015:34.spa
dc.relation.referencesBishop CM. Pattern Recognition and Machine Learning. 1st ed. New York, NY: Springer Publishing Compnay; 2006:424.spa
dc.relation.referencesBellett, A. J. D. (1967). Preliminary classification of viruses based on quantitative comparisons of viral nucleic acids. Journal of Virology, 1(2), 245-259.spa
dc.relation.referencesLWOFF, A., & TOURNIER, P. (1971). Remarks on the Classification of Viruses. Comparative Virology, 1–42. https://doi.org/10.1016/B978-0-12-470260-8. 50006-3.spa
dc.relation.referencesLibretexts. (2021, 3 enero). 9.8A: Positive-Strand RNA Viruses of Animals. Biology LibreTexts. https://bio.libretexts.org/Bookshelves/Microbiology/ Microbiology_(Boundless)/09:_Viruses.spa
dc.relation.referencesPatton JT, ed. (2008). Segmented Double-stranded RNA Viruses: Structure and Molecular Biology. Caister Academic Press. ISBN 978-1-904455-21-9.spa
dc.relation.referencesSanjuan, R., Nebot, M. R., Chirico, N., Mansky, L. M., & Belshaw, R. (2010). Viral mutation rates. Journal of virology, 84(19), 9733-9748.spa
dc.relation.referencesKlein DW, Prescott LM, Harley J (1993). Microbiology. Dubuque, Iowa: Wm. C. Brown. ISBN 978-0-697-01372-9.spa
dc.relation.referencesDomingo E. (1997). Rapid evolution of viral RNA genomes. The Journal of nutrition, 127(5 Suppl), 958S–961S. https://doi.org/10.1093/jn/127.5.958S.spa
dc.relation.referencesRNA: The Versatile Molecule. (s. f.). https://learn.genetics.utah.edu/ content/basics/rna/spa
dc.relation.referencesMolnar, C. (2015, 14 mayo). 9.1 The Structure of DNA – Concepts of Biology – 1st Canadian Edition. Pressbooks. https://opentextbc.ca/biology/chapter/ 9-1-the-structure-of-dna/spa
dc.relation.referencesBerg, J. M., Tymoczko, J. L., Stryer, L., & National Center for Biotechnology Information (U.S.). (2002). Biochemistry, Fifth Edition. W. H. Freeman.spa
dc.relation.referencesDaros, J. A., Elena, S. F., & Flores, R. (2006). Viroids: an Ariadne’s thread into the RNA labyrinth. EMBO reports, 7(6), 593–598. https://doi.org/10.1038/sj.embor. 7400706.spa
dc.relation.referencesPayne S. (2017). Introduction to RNA Viruses. Viruses, 97–105. https://doi.org/ 10.1016/B978-0-12-803109-4.00010-6.spa
dc.relation.referencesWang D, Farhana A. Biochemistry, RNA Structure. [Updated 2022 May 8]. In: Stat- Pearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2022 Jan-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK558999/.spa
dc.relation.referencesWooley, J. C., Godzik, A., & Friedberg, I. (2010). A primer on metagenomics. PLoS computational biology, 6(2), e1000667. https://doi.org/10.1371/journal. pcbi.1000667.spa
dc.relation.referencesPaez-Espino, D., Eloe-Fadrosh, E. A., Pavlopoulos, G. A., Thomas, A. D., Huntemann, M., Mikhailova, N., Rubin, E., Ivanova, N. N., & Kyrpides, N. C. (2016). Uncovering Earth’s virome. Nature, 536(7617), 425–430. https://doi.org/10.1038/ nature19094.spa
dc.relation.referencesMetagenomics. (s. f.). En Metagenomics. Recuperado 24 de febrero de 2023, de https://en.wikipedia.org/wiki/Metagenomics#Virusesspa
dc.relation.referencesStaden R. (1979). A strategy of DNA sequencing employing computer programs. Nucleic acids research, 6(7), 2601–2610. https://doi.org/10.1093/nar/6.7.2601.spa
dc.relation.referencesEdwards, R., Rohwer, F. Viral metagenomics. Nat Rev Microbiol 3, 504–510 (2005). https://doi.org/10.1038/nrmicro1163.spa
dc.relation.referencesKristensen DM, Mushegian AR, Dolja VV, Koonin EV. New dimensions of the virus world discovered through metagenomics. Trends Microbiol. 2010 Jan;18(1):11- 9. doi: 10.1016/j.tim.2009.11.003. Epub 2009 Nov 26. PMID: 19942437; PMCID: PMC3293453.spa
dc.relation.referencesDelwart, E. L. (2007). Viral metagenomics. Reviews in medical virology, 17(2), 115- 131.spa
dc.relation.referencesSommers, P., Chatterjee, A., Varsani, A., & Trubl, G. (2021). Integrating Viral Metagenomics into an Ecological Framework. Annual review of virology, 8(1), 133–158. https://doi.org/10.1146/annurev-virology-010421-053015.spa
dc.relation.referencesGrasis J. A. (2018). Host-Associated Bacteriophage Isolation and Preparation for Viral Metagenomics. Methods in molecular biology (Clifton, N.J.), 1746, 1–25. https: //doi.org/10.1007/978-1-4939-7683-6_1.spa
dc.relation.referencesAlavandi SV, Poornima M. Viral metagenomics: a tool for virus discovery and diversity in aquaculture. Indian J Virol. 2012 Sep;23(2):88-98. doi: 10.1007/s13337-012- 0075-2. Epub 2012 Aug 14. PMID: 23997432; PMCID: PMC3550753.spa
dc.relation.referencesSievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, S¨oding J, Thompson JD, Higgins DG. Fast, scalable generation of highquality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011 Oct 11;7:539. doi: 10.1038/msb.2011.75. PMID: 21988835.spa
dc.relation.referencesSievers, F. and Higgins, D.G. (2018), Clustal Omega for making accurate alignments of many protein sequences. Protein Science, 27: 135-145. https://doi.org/10.1002/ pro.3290.spa
dc.relation.referencesHofacker, Ivo & Stadler, Peter. (2006). RNA Secondary Structures. 10.1002/3527600906.mcb.200500009.spa
dc.relation.referencesIUPAC. Compendium of Chemical Terminology, 2nd ed. (the Gold Book). Compiled by A. D. McNaught and A. Wilkinson. Blackwell Scientific Publications, Oxford (1997). Online version (2019-) created by S. J. Chalk. ISBN 0-9678550-9-8. https://doi.org/ 10.1351/goldbook.spa
dc.relation.referencesJones, C. P., & Ferr´e-D’Amar´e, A. R. (2015). RNA quaternary structure and global symmetry. Trends in biochemical sciences, 40(4), 211–220. https://doi.org/10.1016/ j.tibs.2015.02.004.spa
dc.relation.referencesXia, T., SantaLucia, J., Jr, Burkard, M. E., Kierzek, R., Schroeder, S. J., Jiao, X., Cox, C., & Turner, D. H. (1998). Thermodynamic parameters for an expanded nearestneighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry, 37(42), 14719–14735. https://doi.org/10.1021/bi9809425.spa
dc.relation.referencesMathews, D. H., Disney, M. D., Childs, J. L., Schroeder, S. J., Zuker, M., & Turner, D. H. (2004). Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proceedings of the National Academy of Sciences, 101(19), 7287-7292.spa
dc.relation.referencesZuker M. (1989). On finding all suboptimal foldings of an RNA molecule. Science (New York, N.Y.), 244(4900), 48–52. https://doi.org/10.1126/science.2468181.spa
dc.relation.referencesSato, K., Akiyama, M. & Sakakibara, Y. RNA secondary structure prediction using deep learning with thermodynamic integration. Nat Commun 12, 941 (2021). https: //doi.org/10.1038/s41467-021-21194-4.spa
dc.relation.referencesGruber, A. R., Findeiß, S., Washietl, S., Hofacker, I. L., & Stadler, P. F. (2010). RNAz 2.0: improved noncoding RNA detection. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing, 69–79.spa
dc.relation.referencesWashietl, S., Hofacker, I. L., & Stadler, P. F. (2005). Fast and reliable prediction of noncoding RNAs. Proceedings of the National Academy of Sciences of the United States of America, 102(7), 2454–2459. https://doi.org/10.1073/pnas.0409169102.spa
dc.relation.referencesRoux, S., Enault, F., Hurwitz, B. L., & Sullivan, M. B. (2015). VirSorter: mining viral signal from microbial genomic data. PeerJ, 3, e985. https://doi.org/10.7717/ peerj.985.spa
dc.relation.referencesThomas, T., Gilbert, J., & Meyer, F. (2012). Metagenomics - a guide from sampling to data analysis. Microbial informatics and experimentation, 2(1), 3. https://doi. org/10.1186/2042-5783-2-3.spa
dc.relation.referencesHofacker, I. L., Fekete, M., & Stadler, P. F. (2002). Secondary structure prediction for aligned RNA sequences. Journal of molecular biology, 319(5), 1059–1066. https: //doi.org/10.1016/S0022-2836(02)00308-X.spa
dc.relation.referencesWashietl, S., & Hofacker, I. L. (2004). Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics. Journal of molecular biology, 342(1), 19–30. https://doi.org/10.1016/j.jmb.2004.07.018.spa
dc.relation.referencesChiu, D. K., & Kolodziejczak, T. (1991). Inferring consensus structure from nucleic acid sequences. Computer applications in the biosciences : CABIOS, 7(3), 347–352. https://doi.org/10.1093/bioinformatics/7.3.347.spa
dc.relation.referencesGutell, R. R., & Woese, C. R. (1990). Higher order structural elements in ribosomal RNAs: pseudo-knots and the use of noncanonical pairs. Proceedings of the National Academy of Sciences of the United States of America, 87(2), 663–667. https://doi. org/10.1073/pnas.87.2.663.spa
dc.relation.referencesGutell, R. R., Power, A., Hertz, G. Z., Putz, E. J., & Stormo, G. D. (1992). Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic acids research, 20(21), 5785–5795.https://doi.org/10.1093/nar/20.21.5785.spa
dc.relation.referencesShang, L., Xu, W., Ozer, S., & Gutell, R. R. (2012). Structural constraints identified with covariation analysis in ribosomal RNA. PLoS One, 7(6), e39383.spa
dc.relation.referencesWaggener, Bill (1995). Pulse Code Modulation Techniques. Springer. p. 206. ISBN 9780442014360.spa
dc.relation.referencesI.L. Hofacker, W. Fontana, P.F. Stadler, S. Bonhoeffer, M. Tacker, P. Schuster (1994), ”Fast Folding and Comparison of RNA Secondary Structures”, Monatshefte f. Chemie: 125, pp 167-188spa
dc.relation.referencesZuker, M., & Stiegler, P. (1981). Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic acids research, 9(1), 133–148. https://doi.org/10.1093/nar/9.1.133.spa
dc.relation.referencesHofacker I. L. (2003). Vienna RNA secondary structure server. Nucleic acids research, 31(13), 3429–3431. https://doi.org/10.1093/nar/gkg599.spa
dc.relation.referencesGeron, A. (2022). Hands-on machine learning with Scikit-Learn, Keras, and Tensor- Flow. O’Reilly Media, Inc.spa
dc.relation.referencesSen, P.C., Hajra, M., Ghosh, M. (2020). Supervised Classification Algorithms in Machine Learning: A Survey and Review. In: Mandal, J., Bhattacharya, D. (eds) Emerging Technology in Modelling and Graphics. Advances in Intelligent Systems and Computing, vol 937. Springer, Singapore. https://doi.org/10.1007/978-981-13-7403-6_ 11.spa
dc.relation.referencesKotsiantis, S. B., Zaharakis, I., & Pintelas, P. (2007). Supervised machine learning: A review of classification techniques. Emerging artificial intelligence applications in computer engineering, 160(1), 3-24.spa
dc.relation.referencesKotsiantis, S. (2011). Feature selection for machine learning classification problems: a recent overview. Artificial Intelligence Review, 42(1), 157-176.spa
dc.relation.referencesViral Genomes in Nature. (2021, January 3). Boundless. https://bio.libretexts. org/@go/page/9330.spa
dc.relation.referencesVujovic, Z. (2021). Classification model evaluation metrics. International Journal of Advanced Computer Science and Applications, 12(6), 599-606.spa
dc.relation.referencesA short Tutorial on RNA Bioinformatics. The ViennaRNA Package and related Programs. (s. f.). Recuperado 10 de abril de 2023, de https://algosb2019.sciencesconf. org/data/RNAtutorial.pdf.spa
dc.relation.referencesMcQuarrie, A. (2000). Statistical Mechanics. Sausalito, CA: University Science Books.spa
dc.relation.referencesRaschka, S. (2017). Machine Learning. University of Wisconsin–Madison. Department of Statistics. Recuperado 11 de abril de 2023, de https://sebastianraschka. com/pdf/lecture-notes/stat479fs18/02_knn_notes.pdf.spa
dc.relation.referencesWikipedia contributors. (2023, March 31). K-nearest neighbors algorithm. In Wikipedia, The Free Encyclopedia. Retrieved 16:44, April 11, 2023, from https://en.wikipedia.org/w/index.php?title=K-nearest_neighbors_ algorithm&oldid=1147498657.spa
dc.relation.referencesLandau, S., Leese, M., Stahl, D., & Everitt, B. S. (2011). Cluster analysis. John Wiley & Sons.spa
dc.relation.references1.4. Support Vector Machines. (s. f.). scikit-learn. https://scikit-learn.org/ stable/modules/svm.htmlspa
dc.relation.referencesWikipedia contributors. (2023, March 12). Support vector machine. In Wikipedia, The Free Encyclopedia. Retrieved 22:16, April 11, 2023, from https://en.wikipedia. org/w/index.php?title=Support_vector_machine&oldid=1144271534.spa
dc.relation.referencesBoser, B. E., Guyon, I. M., & Vapnik, V. N. (1992, July). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on Computational learning theory (pp. 144-152).spa
dc.relation.referencesSong, Y. Y., & Lu, Y. (2015). Decision tree methods: applications for classification and prediction. Shanghai archives of psychiatry, 27(2), 130–135. https://doi.org/10. 11919/j.issn.1002-0829.215044.spa
dc.relation.referencesHastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2008). The Elements of Statistical Learning (2nd ed.). Springer. ISBN 0-387-95284-5.spa
dc.relation.referencesHaykin, S. S. (2009). Neural networks and learning machines. Upper Saddle River, NJ: Pearson Education.spa
dc.relation.referencesOjha, V. K., Abraham, A., & Snasel, V. (2017). Metaheuristic design of feedforward neural networks: A review of two decades of research. Engineering Applications of Artificial Intelligence, 60, 97-116.spa
dc.relation.referencesPedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.spa
dc.relation.referencesWikipedia contributors. (2023, March 2). Suffix tree. In Wikipedia, The Free Encyclopedia. Retrieved April 27 2023, from https://en.wikipedia.org/w/index.php? title=Suffix_tree&oldid=1142499280.spa
dc.relation.referencesUkkonen, E. (1995). On-line construction of suffix trees. Algorithmica, 14(3), 249- 260.spa
dc.relation.referencesWikipedia contributors. (2023, March 22). Hidden Markov model. In Wikipedia, The Free Encyclopedia. Retrieved April 28 2023, from https://en.wikipedia.org/ w/index.php?title=Hidden_Markov_model&oldid=1146111455.spa
dc.relation.referencesBlunsom, P. (2004). Hidden markov models. Lecture notes, August, 15(18-19), 48.spa
dc.relation.referencesYoon B. J. (2009). Hidden Markov Models and their Applications in Biological Sequence Analysis. Current genomics, 10(6), 402–415. https://doi.org/10.2174/ 138920209789177575.spa
dc.relation.referencesM. Przytycka, & Zheng, J. (2003). Encyclopedia of Life Sciences: Hidden Markov Models (TM in Nature Encyclopedia of the Human Genome Nature Publishing Group, Ed.). NCBI. Recuperado 28 de abril de 2023, de https://www.ncbi.nlm.nih.gov/ CBBresearch/Przytycka/index.cgi#publications.spa
dc.relation.referencesNelwamondo, F. V., Marwala, T., & Mahola, U. (2006). Early classifications of bearing faults using hidden Markov models, Gaussian mixture models, mel-frequency cepstral coefficients and fractals. International Journal of Innovative Computing, Information and Control, 2(6), 1281-1299.spa
dc.relation.referencesRyan, M. S., & Nudd, G. R. (1993). The viterbi algorithm.spa
dc.relation.referencesMuller, M. (2015). Fundamentals of music processing: Audio, analysis, algorithms, applications (Vol. 5, Pages 237-301). Cham: Springer.spa
dc.relation.referencesIan Goodfellow and Yoshua Bengio and Aaron Courville (2016). Deep Learning. MIT Press. p. 326.spa
dc.relation.referencesWikipedia contributors. (2023, April 30). Convolutional neural network. In Wikipedia, The Free Encyclopedia. Retrieved April 30, 2023, from https://en.wikipedia. org/w/index.php?title=Convolutional_neural_network&oldid=1152491486.spa
dc.relation.referencesMishra, M. (2021, 15 diciembre). Convolutional Neural Networks, Explained - Towards Data Science. Medium. https://towardsdatascience.com/ convolutional-neural-networks-explained-9cc5188c4939.spa
dc.relation.referencesHochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735spa
dc.relation.referencesGers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: continual prediction with LSTM. Neural computation, 12(10), 2451–2471. https://doi.org/10. 1162/089976600300015015.spa
dc.relation.referencesHochreiter, S., & Schmidhuber, J. (1996). LSTM can solve hard long time lag problems. Advances in neural information processing systems, 9.spa
dc.relation.referencesBrownlee, J. (2019). CNN Long Short-Term Memory Networks. https:// machinelearningmastery.com/cnn-long-short-term-memory-networks/.spa
dc.relation.referencesZill, D., & Shanahan, P. (2009). A First Course in Complex Analysis with Applications. Jones & Bartlett Learning.spa
dc.relation.referencesLefkowitz, E. J., Dempsey, D. M., Hendrickson, R. C., Orton, R. J., Siddell, S. G., & Smith, D. B. (2018). Virus taxonomy: the database of the International Committee on Taxonomy of Viruses (ICTV). Nucleic acids research, 46(D1), D708-D717.spa
dc.relation.referencesKing, A. M., Adams, M. J., Carstens, E. B., & Lefkowitz, E. J. (2012). Virus taxonomy. Ninth report of the International Committee on Taxonomy of Viruses, 9.spa
dc.relation.referencesSimmonds, P. (2015). Methods for virus classification and the challenge of incorporating metagenomic sequence data. Journal of General Virology, 96(6), 1193-1206.spa
dc.relation.referencesForterre, P. (2010). Giant viruses: conflicts in revisiting the virus concept. Intervirology, 53(5), 362-378.spa
dc.relation.referencesLwoff, A. (1959). Factors influencing the evolution of viral diseases at the cellular level and in the organism. Bacteriological reviews, 23(3), 109-124.spa
dc.relation.referencesYamada, T. (2011). Giant viruses in the environment: their origins and evolution. Current opinion in virology, 1(1), 58-62.spa
dc.relation.referencesDoolittle, R. F., & Feng, D. F. (1992). Tracing the origin of retroviruses. Genetic Diversity of RNA Viruses, 195-211.spa
dc.relation.referencesTemin, H. M. (1970). Malignant transformation of cells by viruses. Perspectives in biology and medicine, 14(1), 11-26.spa
dc.relation.referencesIllangasekare, M., Sanchez, G., Nickles, T., & Yarus, M. (1995). Aminoacyl-RNA synthesis catalyzed by an RNA. Science, 267(5198), 643-647.spa
dc.relation.referencesGilbert, W. (1986). Origin of life: The RNA world. nature, 319(6055), 618-618.spa
dc.relation.referencesLi, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics (Oxford, England), 31(10), 1674–1676. https: //doi.org/10.1093/bioinformatics/btv033.spa
dc.relation.referencesLi, D., Luo, R., Liu, C. M., Leung, C. M., Ting, H. F., Sadakane, K., Yamashita, H., & Lam, T. W. (2016). MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods (San Diego, Calif.), 102, 3–11. https://doi.org/10.1016/j.ymeth.2016.02.020.spa
dc.relation.referencesXiong, J. (2006). Protein Motifs and Domain Prediction. In Essential Bioinformatics (pp. 85-94). Cambridge: Cambridge UniversityPress.doi:10.1017/ CBO9780511806087.008.spa
dc.relation.referencesIqbal, T., Elahi, A., Wijns, W., & Shahzad, A. (2022). Exploring Unsupervised Machine Learning Classification Methods for Physiological Stress Detection. Frontiers in medical technology, 4, 782756. https://doi.org/10.3389/fmedt.2022.782756.spa
dc.relation.referencesMock, F., Kretschmer, F., Kriese, A., B¨ocker, S., & Marz, M. (2022). Taxonomic classification of DNA sequences beyond sequence similarity using deep neural networks. Proceedings of the National Academy of Sciences, 119(35), e2122636119.spa
dc.relation.referencesShang, J., & Sun, Y. (2021). CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning. Methods (San Diego, Calif.), 189, 95–103. https://doi.org/10.1016/j.ymeth.2020.05.018.spa
dc.relation.referencesDevlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.spa
dc.relation.referencesHuson, D. H., Auch, A. F., Qi, J., & Schuster, S. C. (2007). MEGAN analysis of metagenomic data. Genome research, 17(3), 377-386.spa
dc.relation.referencesMikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems, 26.spa
dc.relation.referencesZerbino, D. R., & Birney, E. (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome research, 18(5), 821–829. https://doi.org/10.1101/gr.074492.107.spa
dc.relation.referencesCompeau, P. E., Pevzner, P. A., & Tesler, G. (2011). How to apply de Bruijn graphs to genome assembly. Nature biotechnology, 29(11), 987–991. https://doi.org/10.1038/nbt.2023.spa
dc.relation.referencesMartin, J., & Wang, Z. (2011). Next-generation transcriptome assembly. Nature Reviews Genetics, 12, 671–682. https://doi.org/10.1038/nrg3068.spa
dc.relation.referencesDamelin, S. B., & Miller Jr, W. (2012). The mathematics of signal processing (No. 48). Cambridge University Press.spa
dc.relation.referencesWikipedia contributors (2023). Convolution. In Wikipedia, The Free Encyclopedia. Retrieved May 25, 2023, from https://en.wikipedia.org/w/index.php?title=Convolution&oldid=1155936911.spa
dc.relation.referencesBudach, S., & Marsico, A. (2018). pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks. Bioinformatics (Oxford, England), 34(17), 3035–3037. https://doi.org/10.1093/bioinformatics/bty222.spa
dc.relation.referencesGelderblom, H. R. (1996). Structure and Classification of Viruses. In S. Baron (Ed.), Medical Microbiology. (4th ed.). University of Texas Medical Branch at Galveston.spa
dc.relation.referencesLouten, J. (2016). Virus Structure and Classification. Essential Human Virology, 19–29. https://doi.org/10.1016/B978-0-12-800947-5.00002-8.spa
dc.relation.referencesAjami, N. J., Wong, M. C., Ross, M. C., Lloyd, R. E., & Petrosino, J. F. (2018). Maximal viral information recovery from sequence data using VirMAP. Nature communications, 9(1), 3205. https://doi.org/10.1038/s41467-018-05658-8.spa
dc.relation.referencesLin, J., Kramna, L., Autio, R., Hyöty, H., Nykter, M., & Cinek, O. (2017). Vipie: web pipeline for parallel characterization of viral populations from multiple NGS samples. BMC genomics, 18(1), 378. https://doi.org/10.1186/s12864-017-3721-7.spa
dc.relation.referencesLin, H. H., & Liao, Y. C. (2017). drVM: a new tool for efficient genome assembly of known eukaryotic viruses from metagenomes. GigaScience, 6(2), 1–10. https://doi.org/10.1093/gigascience/gix003.spa
dc.relation.referencesRampelli, S., Soverini, M., Turroni, S., Quercia, S., Biagi, E., Brigidi, P., & Candela, M. (2016). ViromeScan: a new tool for metagenomic viral community profiling. BMC genomics, 17, 165. https://doi.org/10.1186/s12864-016-2446-3.spa
dc.relation.referencesSegata, N., Waldron, L., Ballarini, A., Narasimhan, V., Jousson, O., & Huttenhower, C. (2012). Metagenomic microbial community profiling using unique clade-specific marker genes. Nature methods, 9(8), 811–814. https://doi.org/10.1038/nmeth.2066.spa
dc.relation.referencesTithi, S. S., Aylward, F. O., Jensen, R. V., & Zhang, L. (2018). FastViromeExplorer: a pipeline for virus and phage identification and abundance profiling in metagenomics data. PeerJ, 6, e4227. https://doi.org/10.7717/peerj.4227.spa
dc.relation.referencesYamashita, A., Sekizuka, T., & Kuroda, M. (2016). VirusTAP: Viral Genome-Targeted Assembly Pipeline. Frontiers in microbiology, 7, 32. https://doi.org/10.3389/fmicb.2016.00032.spa
dc.relation.referencesZhao, G., Wu, G., Lim, E. S., Droit, L., Krishnamurthy, S., Barouch, D. H., Virgin, H. W., & Wang, D. (2017). VirusSeeker, a computational pipeline for virus discovery and virome composition analysis. Virology, 503, 21–30. https://doi.org/10.1016/j.virol.2017.01.005.spa
dc.relation.referencesMenzel, P., Ng, K. L., & Krogh, A. (2016). Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature communications, 7, 11257. https://doi.org/10.1038/ncomms11257.spa
dc.relation.referencesWood, D. E., & Salzberg, S. L. (2014). Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome biology, 15(3), R46. https://doi.org/10.1186/gb-2014-15-3-r46.spa
dc.relation.referencesFiers, W., Contreras, R., Duerinck, F., Haegeman, G., Iserentant, D., Merregaert, J., Jou, W., Molemans, F., Raeymaekers, A., Berghe, A., Volckaert, G., & Ysebaert, M. (1976). Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature, 260, 500–507. https://doi.org/10.1038/260500a0.spa
dc.relation.referencesSanger, F., Air, G. M., Barrell, B. G., Brown, N. L., Coulson, A. R., Fiddes, C. A., Hutchison, C. A., Slocombe, P. M., & Smith, M. (1977). Nucleotide sequence of bacteriophage phi X174 DNA. Nature, 265(5596), 687–695. https://doi.org/10.1038/265687a0.spa
dc.relation.referencesCobbin, J. C., Charon, J., Harvey, E., Holmes, E. C., & Mahar, J. E. (2021). Current challenges to virus discovery by meta-transcriptomics. Current Opinion in Virology, 51, 48-55.spa
dc.relation.referencesBashiardes, S., Zilberman-Schapira, G., & Elinav, E. (2016). Use of Metatranscriptomics in Microbiome Research. Bioinformatics and biology insights, 10, 19–25. https://doi.org/10.4137/BBI.S34610.spa
dc.relation.referencesAguiar-Pulido, V., Huang, W., Suarez-Ulloa, V., Cickovski, T., Mathee, K., & Narasimhan, G. (2016). Metagenomics, Metatranscriptomics, and Metabolomics Approaches for Microbiome Analysis. Evolutionary bioinformatics online, 12(Suppl 1), 5–16. https://doi.org/10.4137/EBO.S36436.spa
dc.relation.referencesKelly, D., Yang, L., & Pei, Z. (2017). A review of the oesophageal microbiome in health and disease. Methods in microbiology, 44, 19-35.spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.licenseAtribución-NoComercial 4.0 Internacionalspa
dc.rights.urihttp://creativecommons.org/licenses/by-nc/4.0/spa
dc.subject.decsTransmisión de enfermedad infecciosaspa
dc.subject.decsDisease Transmission, Infectiouseng
dc.subject.decsARN viralspa
dc.subject.decsZoonosis viralesspa
dc.subject.decsViral Zoonoseseng
dc.subject.lembRNA, Viralspa
dc.subject.proposalVirus ARNspa
dc.subject.proposalMetagenómicaspa
dc.subject.proposalMetavirómicaspa
dc.subject.proposalAprendizaje de máquinaspa
dc.subject.proposalEstructuras secundariasspa
dc.subject.proposalClasificaciónspa
dc.subject.proposalRNA viruseseng
dc.subject.proposalMetagenomicseng
dc.subject.proposalMetaviromicseng
dc.subject.proposalMachine learningeng
dc.subject.proposalSecondary structureseng
dc.subject.proposalClassificationeng
dc.titleModelo basado en técnicas de machine learning para la clasificación de virus de ARNspa
dc.title.translatedModel based on machine learning techniques for the classification of RNA viruseseng
dc.typeTrabajo de grado - Maestríaspa
dc.type.coarhttp://purl.org/coar/resource_type/c_bdccspa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.redcolhttp://purl.org/redcol/resource_type/TMspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa

Files

Original bundle

Name:
1020808077.2023.pdf
Size:
2.92 MB
Format:
Adobe Portable Document Format
Description:
Master's thesis in Bioinformatics

License bundle

Name:
license.txt
Size:
5.74 KB
Format:
Item-specific license agreed upon to submission
Description: