Artificial intelligence model for gap filling in short-read assemblies of Acinetobacter baumannii genomes

dc.contributor.advisor: Barreto Hernández, Emiliano
dc.contributor.author: Navas Luquez, Mateo
dc.contributor.researchgroup: Bioinformática
dc.date.accessioned: 2026-02-10T13:57:47Z
dc.date.available: 2026-02-10T13:57:47Z
dc.date.issued: 2025
dc.description: Illustrations, diagrams, graphs [eng]
dc.description.abstract: Next-generation sequencing (NGS) techniques revolutionized genomics and whole-genome sequencing (WGS) because they make it possible to sequence large volumes of data at high depth and at an affordable cost. These methods rely on massively parallel sequencing of short reads, which allows the entire genome to be read in parallel. Despite this high throughput, short reads are small fragments of much larger sequences, so assembly techniques are essential for extending contigs and orienting scaffolds to obtain complete genomes. However, short-read NGS assemblers face technical and theoretical limitations caused by repetitive and low-complexity regions in genomes. These regions hinder the calculation of orientation between scaffolds, leaving gaps in the resulting assemblies. Despite these limitations, the number of sequencing projects and the availability of WGS data continue to grow because short-read sequencing is highly cost-effective. One organism with a marked increase in sequencing projects is the multidrug-resistant bacterium Acinetobacter baumannii, which poses a risk to global public health because of its ability to survive in hospital environments and cause serious infections. To improve assembly quality and take advantage of the large volume of available sequencing data, this work proposes using artificial intelligence methods to train models capable of closing gaps in de novo short-read assemblies of Acinetobacter baumannii. [eng]
dc.description.degreelevel: Master's
dc.description.degreename: Master in Bioinformatics
dc.description.researcharea: Functional and structural bioinformatics
dc.format.extent: xi, 71 pages
dc.format.mimetype: application/pdf
dc.identifier.instname: Universidad Nacional de Colombia [spa]
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia [spa]
dc.identifier.repourl: https://repositorio.unal.edu.co/ [spa]
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/89446
dc.language.iso: spa
dc.publisher: Universidad Nacional de Colombia
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá
dc.publisher.faculty: Faculty of Engineering
dc.publisher.place: Bogotá, Colombia
dc.publisher.program: Bogotá - Engineering - Master's in Bioinformatics
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.rights.license: Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.ddc: 000 - Computer science, information and general works
dc.subject.ddc: 570 - Biology
dc.subject.lemb: Bioinformatics [eng]
dc.subject.lemb: Artificial intelligence [eng]
dc.subject.proposal: Acinetobacter baumannii, De novo assembly, Artificial intelligence, Natural language processing, Gap filling [eng]
dc.subject.wikidata: Acinetobacter baumannii [spa]
dc.subject.wikidata: Comparative genomics [eng]
dc.title: Modelo de inteligencia artificial para realizar gap filling en ensambles de reads cortos de genomas de Acinetobacter baumannii [spa]
dc.title.translated: Artificial intelligence model for performing gap filling in assemblies of short reads of Acinetobacter baumannii genomes [eng]
dc.type: Master's thesis
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content: Text
dc.type.driver: info:eu-repo/semantics/masterThesis
dc.type.redcol: http://purl.org/redcol/resource_type/TM
dc.type.version: info:eu-repo/semantics/acceptedVersion
dcterms.audience.professionaldevelopment: Librarians
dcterms.audience.professionaldevelopment: Students
dcterms.audience.professionaldevelopment: Researchers
dcterms.audience.professionaldevelopment: Teachers
dcterms.audience.professionaldevelopment: General public
oaire.accessrights: http://purl.org/coar/access_right/c_abf2

Files

Original bundle

Name: Trabajo Final de Maestría en Bioinformática.2025.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 5.74 KB
Format: Item-specific license agreed upon to submission