Modelo de inteligencia artificial para realizar gap filling en ensambles de reads cortos de genomas de Acinetobacter baumannii

Navas Luquez, Mateo

dc.contributor.advisor	Barreto Hernández, Emiliano
dc.contributor.author	Navas Luquez, Mateo
dc.contributor.researchgroup	Bioinformática
dc.date.accessioned	2026-02-10T13:57:47Z
dc.date.available	2026-02-10T13:57:47Z
dc.date.issued	2025
dc.description	Ilustraciones, diagramas, gráficos	spa
dc.description.abstract	Las técnicas de secuenciación masiva de nueva generación (NGS) fueron revolucionarias en el campo de la genómica y en el proceso de secuenciación de genomas completos (WGS), debido a que permiten secuenciar un gran volumen de datos con gran profundidad y a un costo asequible. Estos procedimientos implementan secuenciación masiva de reads cortos, que permiten la lectura en paralelo de todo el genoma. Pese a la alta capacidad de lectura, los reads cortos son fragmentos pequeños de secuencias mucho más grandes. Por ello, las técnicas de ensamblaje son fundamentales en la extensión de contigs y la orientación de scaffolds para obtener genomas completos. No obstante, los ensambladores para secuenciación de reads cortos NGS presentan limitaciones técnicas y teóricas asociadas a las regiones repetitivas o de baja complejidad presentes en los genomas. Estas regiones limitan los cálculos de orientación entre scaffolds, lo que genera huecos o gaps en los procesos de ensamblaje. Pese a las limitantes del ensamblaje con reads cortos NGS, el número de proyectos de secuenciación y la disponibilidad de datos de WGS siguen en aumento debido a que es altamente costo-efectivo. Uno de los organismos con gran crecimiento en el número de proyectos de secuenciación es la bacteria multirresistente Acinetobacter baumannii, debido a que es un riesgo para la salud pública mundial dada su capacidad para sobrevivir en ambientes hospitalarios y generar infecciones graves. Con el fin de mejorar los procesos de calidad en los ensamblajes y aprovechar el gran número de datos de secuenciación, se propone implementar metodologías de inteligencia artificial para entrenar modelos capaces de cerrar huecos en los ensamblajes de novo de Acinetobacter baumannii que implementen metodologías de reads cortos. (Texto tomado de la fuente)	spa
dc.description.abstract	Next-generation sequencing (NGS) techniques revolutionized genomics and whole-genome sequencing (WGS) because they allow a large volume of data to be sequenced at great depth and at an affordable cost. These procedures rely on massive sequencing of short reads, which allows the entire genome to be read in parallel. Despite this high reading capacity, short reads are small fragments of much larger sequences. Assembly techniques are therefore fundamental for extending contigs and orienting scaffolds to obtain complete genomes. However, assemblers for short-read NGS data have technical and theoretical limitations associated with the repetitive or low-complexity regions present in genomes. These regions limit the calculation of orientation between scaffolds, which produces gaps in the assembly. Despite the limitations of short-read NGS assembly, the number of sequencing projects and the availability of WGS data continue to increase because the approach is highly cost-effective. One of the organisms with a significant increase in the number of sequencing projects is the multidrug-resistant bacterium Acinetobacter baumannii, which poses a risk to global public health because of its ability to survive in hospital environments and cause serious infections. To improve assembly quality and take advantage of the large amount of sequencing data, this work proposes implementing artificial intelligence methodologies to train models capable of closing gaps in de novo Acinetobacter baumannii assemblies built from short reads.	eng
dc.description.degreelevel	Maestría
dc.description.degreename	Maestro en Bioinformática
dc.description.researcharea	Bioinformática funcional y estructural
dc.format.extent	xi, 71 páginas
dc.format.mimetype	application/pdf
dc.identifier.instname	Universidad Nacional de Colombia	spa
dc.identifier.reponame	Repositorio Institucional Universidad Nacional de Colombia	spa
dc.identifier.repourl	https://repositorio.unal.edu.co/	spa
dc.identifier.uri	https://repositorio.unal.edu.co/handle/unal/89446
dc.language.iso	spa
dc.publisher	Universidad Nacional de Colombia
dc.publisher.branch	Universidad Nacional de Colombia - Sede Bogotá
dc.publisher.faculty	Facultad de Ingeniería
dc.publisher.place	Bogotá, Colombia
dc.publisher.program	Bogotá - Ingeniería - Maestría en Bioinformática
dc.relation.references	Ali, M., Dewan, A., Sahu, A. K., & Taye, M. M. (2023). Understanding of Machine Learning with Deep Learning: Architectures, Workflow, Applications and Future Directions. Computers, 12(5), 91. https://doi.org/10.3390/COMPUTERS12050091
dc.relation.references	Ali, Y. A., Awwad, E. M., Al-Razgan, M., & Maarouf, A. (2023). Hyperparameter Search for Machine Learning Algorithms for Optimizing the Computational Complexity. Processes, 11(2), 349. https://doi.org/10.3390/PR11020349
dc.relation.references	Badillo, S., Banfai, B., Birzele, F., Davydov, I. I., Hutchinson, L., Kam-Thong, T., Siebourg-Polster, J., Steiert, B., & Zhang, J. D. (2020). An Introduction to Machine Learning. Clinical Pharmacology and Therapeutics, 107(4), 871. https://doi.org/10.1002/CPT.1796
dc.relation.references	Bayat, A., Deshpande, N. P., Wilkins, M. R., & Parameswaran, S. (2020). Fast Short Read De-Novo Assembly Using Overlap-Layout-Consensus Approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17(1), 334–338. https://doi.org/10.1109/TCBB.2018.2875479
dc.relation.references	Boetzer, M., & Pirovano, W. (2012). Toward almost closed genomes with GapFiller. Genome Biology, 13(6). https://doi.org/10.1186/GB-2012-13-6-R56
dc.relation.references	Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114. https://doi.org/10.1093/BIOINFORMATICS/BTU170
dc.relation.references	Bush, S. J., Foster, D., Eyre, D. W., Clark, E. L., de Maio, N., Shaw, L. P., Stoesser, N., Peto, T. E. A., Crook, D. W., & Walker, A. S. (2020a). Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism–calling pipelines. GigaScience, 9(2), 1–21. https://doi.org/10.1093/GIGASCIENCE/GIAA007
dc.relation.references	Calin, O. (2020). Deep Learning Architectures. https://doi.org/10.1007/978-3-030-36721-3
dc.relation.references	Cappelletti, L., Fontana, T., Di Donato, G. W., Di Tucci, L., Casiraghi, E., & Valentini, G. (2020). Complex data imputation by auto-encoders and convolutional neural networks—A case study on genome gap-filling. Computers, 9(2). https://doi.org/10.3390/computers9020037
dc.relation.references	Chen, A., Field, M., Bhattacharya, A., Nabeel Asim Muhammad, M., Nabeel Asim, M., Ali Ibrahim, M., Zaib, A., & Dengel, A. (2025). DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models. Frontiers in Medicine, 12, 1503229. https://doi.org/10.3389/FMED.2025.1503229
dc.relation.references	Chen, E., Chu, J., Zhang, J., Warren, R., & Birol, I. (2021). GapPredict: A Language Model for Resolving Gaps in Draft Genome Assemblies. IEEE/ACM Transactions on Computational Biology and Bioinformatics. https://doi.org/10.1109/TCBB.2021.3109557
dc.relation.references	Chen, Y., Wang, G., & Zhang, T. (2024). Utilizing Deep Neural Networks to Fill Gaps in Small Genomes. International Journal of Molecular Sciences, 25(15), 8502. https://doi.org/10.3390/IJMS25158502
dc.relation.references	Chu, C., Li, X., & Wu, Y. (2019). GAPPadder: a sensitive approach for closing gaps on draft genomes with short sequence reads. BMC Genomics, 20(Suppl 5). https://doi.org/10.1186/S12864-019-5703-4
dc.relation.references	Coombe, L., Li, J. X., Lo, T., Wong, J., Nikolic, V., Warren, R. L., & Birol, I. (2021). LongStitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics, 22(1). https://doi.org/10.1186/s12859-021-04451-7
dc.relation.references	Coombe, L., Nikolić, V., Chu, J., Birol, I., & Warren, R. L. (2020). NtJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs. Bioinformatics, 36(12), 3885–3887. https://doi.org/10.1093/bioinformatics/btaa253
dc.relation.references	Crossley, B. M., Bai, J., Glaser, A., Maes, R., Porter, E., Killian, M. L., Clement, T., & Toohey-Kurth, K. (2020). Guidelines for Sanger sequencing and molecular assay monitoring. Journal of Veterinary Diagnostic Investigation, 32(6), 767–775. https://doi.org/10.1177/1040638720905833
dc.relation.references	Cuber, P., Chooneea, D., Geeves, C., Salatino, S., Creedy, T. J., Griffin, C., Sivess, L., Barnes, I., Price, B., & Misra, R. (2023). Comparing the accuracy and efficiency of third generation sequencing technologies, Oxford Nanopore Technologies, and Pacific Biosciences, for DNA barcode sequencing applications. Ecological Genetics and Genomics, 28, 100181. https://doi.org/10.1016/J.EGG.2023.100181
dc.relation.references	Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2020). Transformer-XL: Attentive language models beyond a fixed-length context. ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 2978–2988. https://doi.org/10.18653/v1/p19-1285
dc.relation.references	Darby, C. A., Gaddipati, R., Schatz, M. C., & Langmead, B. (2020). Vargas: heuristic-free alignment for assessing linear and graph read aligners. Bioinformatics, 36(12), 3712–3718. https://doi.org/10.1093/BIOINFORMATICS/BTAA265
dc.relation.references	Darling, A. C. E., Mau, B., Blattner, F. R., & Perna, N. T. (2004). Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements. Genome Research, 14(7), 1394. https://doi.org/10.1101/GR.2289704
dc.relation.references	Duraisamy, P., Abinaya Srijanani, A., Duraisamy, M., Amrit Candida, M., Dinesh Babu, P., & Karthik, S. (2024). Implementation of CNN-LSTM Integration for Advancing Human-Computer Dialogue through Precise Sign Language Gesture Interpretation. 5th International Conference on Recent Trends in Computer Science and Technology, ICRTCST 2024 - Proceedings, 5–9. https://doi.org/10.1109/ICRTCST61793.2024.10578503
dc.relation.references	Fang, Y., Quan, J., Hua, X., Feng, Y., Li, X., Wang, J., Ruan, Z., Shang, S., & Yu, Y. (2016). Complete genome sequence of Acinetobacter baumannii XH386 (ST208), a multi-drug resistant bacteria isolated from pediatric hospital in China. Genomics Data, 7, 269. https://doi.org/10.1016/J.GDATA.2015.12.002
dc.relation.references	Forouzan, E., Shariati, P., Mousavi Maleki, M. S., Karkhane, A. A., & Yakhchali, B. (2018). Practical evaluation of 11 de novo assemblers in metagenome assembly. Journal of Microbiological Methods, 151, 99–105. https://doi.org/10.1016/j.mimet.2018.06.007
dc.relation.references	Generalovic, T. N., McCarthy, S. A., Warren, I. A., Wood, J. M. D., Torrance, J., Sims, Y., Quail, M., Howe, K., Pipan, M., Durbin, R., & Jiggins, C. D. (2021). A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.). G3: Genes, Genomes, Genetics, 11(5). https://doi.org/10.1093/g3journal/jkab085
dc.relation.references	Giani, A. M., Gallo, G. R., Gianfranceschi, L., & Formenti, G. (2020). Long walk to genomics: History and current approaches to genome sequencing and assembly. Computational and Structural Biotechnology Journal, 18, 9–19. https://doi.org/10.1016/j.csbj.2019.11.002
dc.relation.references	Gourlé, H., Karlsson-Lindsjö, O., Hayer, J., & Bongcam-Rudloff, E. (2019). Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics, 35(3), 521–522. https://doi.org/10.1093/BIOINFORMATICS/BTY630
dc.relation.references	Gunasekaran, H., Ramalakshmi, K., Rex Macedo Arokiaraj, A., Kanmani, S. D., Venkatesan, C., & Dhas, C. S. G. (2021). Analysis of DNA Sequence Classification Using CNN and Hybrid Models. Computational and Mathematical Methods in Medicine, 2021, 1835056. https://doi.org/10.1155/2021/1835056
dc.relation.references	Gupta, Y. M., Kirana, S. N., & Homchan, S. (2024). Representing DNA for machine learning algorithms: A primer on one-hot, binary, and integer encodings. Biochemistry and Molecular Biology Education, 53(2), 142–146. https://doi.org/10.1002/BMB.21870
dc.relation.references	Gurevich, A., Saveliev, V., Vyahhi, N., & Tesler, G. (2013). QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), 1072. https://doi.org/10.1093/BIOINFORMATICS/BTT086
dc.relation.references	Huang, B., Wei, G., Wang, B., Ju, F., Zhong, Y., Shi, Z., Sun, S., & Bu, D. (2021). Filling gaps of genome scaffolds via probabilistic searching optical maps against assembly graph. BMC Bioinformatics, 22(1). https://doi.org/10.1186/s12859-021-04448-2
dc.relation.references	Iqbal, T., & Qureshi, S. (2022). The survey: Text generation models in deep learning. Journal of King Saud University - Computer and Information Sciences, 34(6), 2515–2528. https://doi.org/10.1016/J.JKSUCI.2020.04.001
dc.relation.references	Ji, Y., Zhou, Z., Liu, H., & Davuluri, R. V. (2021a). DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA language in genome. Bioinformatics, 37(15), 2112–2120. https://doi.org/10.1093/BIOINFORMATICS/BTAB083
dc.relation.references	Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260. https://doi.org/10.1126/SCIENCE.AAA8415
dc.relation.references	Kairi, A., Majumdar, P. G., & Rao, A. R. (2020). hAssembler: A hybrid de novo genome assembly approach for large genomes. Indian Journal of Agricultural Sciences, 90(10), 2000–2005. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85114150355&partnerID=40&md5=dc46ab77dac243c63638847f87cedbfc
dc.relation.references	Kaplan, N., & Dekker, J. (2013). High-throughput genome scaffolding from in vivo DNA interaction frequency. Nature Biotechnology, 31(12), 1143–1147. https://doi.org/10.1038/nbt.2768
dc.relation.references	Kuśmirek, W., & Nowak, R. (2018). De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application. BMC Bioinformatics, 19(1). https://doi.org/10.1186/S12859-018-2281-4
dc.relation.references	Kwon, N., Yoo, Y., & Lee, B. (2024). Class conditioned text generation with style attention mechanism for embracing diversity. Applied Soft Computing, 163. https://doi.org/10.1016/j.asoc.2024.111893
dc.relation.references	Lantz, H., Dominguez Del Angel, V., Hjerde, E., Sterck, L., Capella-Gutierrez, S., Notredame, C., Vinnere Pettersson, O., Amselem, J., Bouri, L., Bocs, S., Klopp, C., Gibrat, J. F., Vlasova, A., Leskosek, B. L., Soler, L., & Binzer-Panchal, M. (2018). Ten steps to get started in Genome Assembly and Annotation. F1000Research, 7, ELIXIR-148. https://doi.org/10.12688/F1000RESEARCH.13598.1
dc.relation.references	Letunic, I., & Bork, P. (2024). Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Research, 52(W1), W78. https://doi.org/10.1093/NAR/GKAE268
dc.relation.references	Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics (Oxford, England), 31(10), 1674–1676. https://doi.org/10.1093/BIOINFORMATICS/BTV033
dc.relation.references	Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094. https://doi.org/10.1093/BIOINFORMATICS/BTY191
dc.relation.references	Li, H., & Durbin, R. (2009a). Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics (Oxford, England), 25(14), 1754–1760. https://doi.org/10.1093/BIOINFORMATICS/BTP324
dc.relation.references	Liao, Y. C., Lin, S. H., & Lin, H. H. (2015). Completing bacterial genome assemblies: strategy and performance comparisons. Scientific Reports, 5, 8747. https://doi.org/10.1038/SREP08747
dc.relation.references	Liu, F., Zhu, Y., Yi, Y., Lu, N., Zhu, B., & Hu, Y. (2014). Comparative genomic analysis of Acinetobacter baumannii clinical isolates reveals extensive genomic variation and diverse antibiotic resistance determinants. BMC Genomics, 15(1), 1–14. https://doi.org/10.1186/1471-2164-15-1163
dc.relation.references	Liu, J., Yang, M., Yu, Y., Xu, H., Li, K., & Zhou, X. (2024). Large language models in bioinformatics: applications and perspectives. ArXiv, arXiv:2401.04155v1. https://pmc.ncbi.nlm.nih.gov/articles/PMC10802675/
dc.relation.references	Liu, Y. H., Luo, C., Golding, S. G., Ioffe, J. B., & Zhou, X. M. (2024). Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nature Communications, 15(1), 1–22. https://doi.org/10.1038/s41467-024-46614-z
dc.relation.references	Lu, P., Jin, J., Li, Z., Xu, Y., Hu, D., Liu, J., & Cao, P. (2020). PGcloser: Fast Parallel Gap-Closing Tool Using Long-Reads or Contigs to Fill Gaps in Genomes. Evolutionary Bioinformatics, 16. https://doi.org/10.1177/1176934320913859
dc.relation.references	Luhmann, N., Doerr, D., & Chauve, C. (2017). Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes. Microbial Genomics, 3(9). https://doi.org/10.1099/mgen.0.000123
dc.relation.references	Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y., Tang, J., Wu, G., Zhang, H., Shi, Y., Liu, Y., Yu, C., Wang, B., Lu, Y., Han, C., … Wang, J. (2012). SOAPdenovo2: An empirically improved memory efficient short-read de novo assembler. GigaScience, 1(1). https://doi.org/10.1186/2047-217X-1-18
dc.relation.references	McGinnis, S., & Madden, T. L. (2004). BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Research, 32(Web Server issue), W20. https://doi.org/10.1093/NAR/GKH435
dc.relation.references	Minkin, I., & Medvedev, P. (2020). Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ. Nature Communications, 11(1), 1–11. https://doi.org/10.1038/S41467-020-19777-8
dc.relation.references	Miyamoto, M., Motooka, D., Gotoh, K., Imai, T., Yoshitake, K., Goto, N., Iida, T., Yasunaga, T., Horii, T., Arakawa, K., Kasahara, M., & Nakamura, S. (2014). Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes. BMC Genomics, 15(1), 1–9. https://doi.org/10.1186/1471-2164-15-699
dc.relation.references	Moeckel, C., Mareboina, M., Konnaris, M. A., Chan, C. S. Y., Mouratidis, I., Montgomery, A., Chantzi, N., Pavlopoulos, G. A., & Georgakopoulos-Soares, I. (2024). A survey of k-mer methods and applications in bioinformatics. Computational and Structural Biotechnology Journal, 23, 2289–2303. https://doi.org/10.1016/J.CSBJ.2024.05.025
dc.relation.references	Morisse, P., Marchet, C., Limasset, A., Lecroq, T., & Lefebvre, A. (2021). Scalable long read self-correction and assembly polishing with multiple sequence alignment. Scientific Reports, 11(1), 1–13. https://doi.org/10.1038/s41598-020-80757-5
dc.relation.references	Nadalin, F., Vezzi, F., & Policriti, A. (2012). GapFiller: A de novo assembly approach to fill the gap within paired reads. BMC Bioinformatics, 13(Suppl 14), 1–16. https://doi.org/10.1186/1471-2105-13-S14-S8
dc.relation.references	Ng, P. (2017). dna2vec: Consistent vector representations of variable-length k-mers. https://arxiv.org/pdf/1701.06279
dc.relation.references	Nielsen, M. A. (2015). Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com
dc.relation.references	Ogunsanya, M., Isichei, J., & Desai, S. (2023). Grid search hyperparameter tuning in additive manufacturing processes. Manufacturing Letters, 35, 1031–1042. https://doi.org/10.1016/J.MFGLET.2023.08.056
dc.relation.references	Paulino, D., Warren, R. L., Vandervalk, B. P., Raymond, A., Jackman, S. D., & Birol, I. (2015). Sealer: A scalable gap-closing application for finishing draft genomes. BMC Bioinformatics, 16(1), 1–8. https://doi.org/10.1186/S12859-015-0663-4
dc.relation.references	Peona, V., Blom, M. P. K., Xu, L., Burri, R., Sullivan, S., Bunikis, I., Liachko, I., Haryoko, T., Jønsson, K. A., Zhou, Q., Irestedt, M., & Suh, A. (2020). Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Molecular Ecology Resources, 21(1), 263. https://doi.org/10.1111/1755-0998.13252
dc.relation.references	Pevzner, P. A., Tang, H., & Tesler, G. (2004). De novo repeat classification and fragment assembly. Genome Research, 14(9), 1786–1796. https://doi.org/10.1101/GR.2395204
dc.relation.references	Pourcel, C., Minandri, F., Hauck, Y., D’Arezzo, S., Imperi, F., Vergnaud, G., & Visca, P. (2011). Identification of variable-number tandem-repeat (VNTR) sequences in Acinetobacter baumannii and interlaboratory validation of an optimized multiple-locus VNTR analysis typing scheme. Journal of Clinical Microbiology, 49(2), 539–548. https://doi.org/10.1128/JCM.02003-10
dc.relation.references	Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A., & Korobeynikov, A. (2020a). Using SPAdes De Novo Assembler. Current Protocols in Bioinformatics, 70(1), e102. https://doi.org/10.1002/CPBI.102
dc.relation.references	Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A., & Korobeynikov, A. (2020b). Using SPAdes De Novo Assembler. Current Protocols in Bioinformatics, 70(1). https://doi.org/10.1002/CPBI.102
dc.relation.references	Rácz, A., Bajusz, D., & Héberger, K. (2019). Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics. Molecules (Basel, Switzerland), 24. https://doi.org/10.3390/molecules24152811
dc.relation.references	Rizzi, R., Beretta, S., Patterson, M., Pirola, Y., Previtali, M., Della Vedova, G., & Bonizzoni, P. (2019). Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era. Quantitative Biology, 7(4), 278–292. https://doi.org/10.1007/S40484-019-0181-X
dc.relation.references	Saha, S., Bridges, S., Magbanua, Z. V., & Peterson, D. G. (2008). Computational Approaches and Tools Used in Identification of Dispersed Repetitive DNA Sequences. Tropical Plant Biology, 1(1), 85–96. https://doi.org/10.1007/S12042-007-9007-5
dc.relation.references	Salmela, L., Sahlin, K., Mäkinen, V., & Tomescu, A. I. (2016). Gap filling as exact path length problem. Journal of Computational Biology, 23(5), 347–361. https://doi.org/10.1089/cmb.2015.0197
dc.relation.references	Salmela, L., & Tomescu, A. I. (2016). Safely filling gaps with partial solutions common to all solutions. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9838 LNCS, XIII. https://www.scopus.com/inward/record.uri?eid=2-s2.0-84984982154&partnerID=40&md5=152f5f5c325caa43d1074da1b3360ed1
dc.relation.references	Salzberg, S. L., Phillippy, A. M., Zimin, A., Puiu, D., Magoc, T., Koren, S., Treangen, T. J., Schatz, M. C., Delcher, A. L., Roberts, M., Marçais, G., Pop, M., & Yorke, J. A. (2012). GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Research, 22(3), 557–567. https://doi.org/10.1101/gr.131383.111
dc.relation.references	Sanabria, M., Hirsch, J., & Poetsch, A. R. (2024). Distinguishing word identity and sequence context in DNA language models. BMC Bioinformatics, 25(1), 1–12. https://doi.org/10.1186/S12859-024-05869-5
dc.relation.references	Schmeing, S., & Robinson, M. D. (2023). Gapless provides combined scaffolding, gap filling, and assembly correction with long reads. Life Science Alliance, 6(7). https://doi.org/10.26508/LSA.202201471
dc.relation.references	Schwengers, O., Hoek, A., Fritzenwanker, M., Falgenhauer, L., Hain, T., Chakraborty, T., & Goesmann, A. (2020). ASA3P: An automatic and scalable pipeline for the assembly, annotation and higher-level analysis of closely related bacterial isolates. PLoS Computational Biology, 16(3). https://doi.org/10.1371/journal.pcbi.1007134
dc.relation.references	Seemann, T. (2015). Snippy: rapid haploid variant calling and core SNP phylogeny. GitHub. Available at: github.com/tseemann/snippy
dc.relation.references	Shanthamallu, U. S., & Spanias, A. (2022). Machine and Deep Learning Algorithms and Applications (pp. 1–106). Springer Nature. https://asu.elsevierpure.com/en/publications/machine-and-deep-learning-algorithms-and-applications
dc.relation.references	She, X., & Zhang, D. (2018). Text Classification Based on Hybrid CNN-LSTM Hybrid Model. Proceedings - 2018 11th International Symposium on Computational Intelligence and Design, ISCID 2018, 2, 185–189. https://doi.org/10.1109/ISCID.2018.10144
dc.relation.references	Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J. M., & Birol, I. (2009). ABySS: A parallel assembler for short read sequence data. Genome Research, 19(6), 1117. https://doi.org/10.1101/GR.089532.108
dc.relation.references	Sohn, J. Il, & Nam, J. W. (2018). The present and future of de novo whole-genome assembly. Briefings in Bioinformatics, 19(1), 23–40. https://doi.org/10.1093/bib/bbw096
dc.relation.references	Song, S., Huang, H., & Ruan, T. (2019). Abstractive text summarization using LSTM-CNN based deep learning. Multimedia Tools and Applications, 78(1), 857–875. https://doi.org/10.1007/S11042-018-5749-3
dc.relation.references	Thomma, B. P. H. J., Seidl, M. F., Shi-Kunne, X., Cook, D. E., Bolton, M. D., van Kan, J. A. L., & Faino, L. (2016). Mind the gap; seven reasons to close fragmented genome assemblies. Fungal Genetics and Biology, 90, 24–30. https://doi.org/10.1016/J.FGB.2015.08.010
dc.relation.references	Tørresen, O. K., Star, B., Mier, P., Andrade-Navarro, M. A., Bateman, A., Jarnot, P., Gruca, A., Grynberg, M., Kajava, A. V., Promponas, V. J., Anisimova, M., Jakobsen, K. S., & Linke, D. (2019). Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Research, 47(21), 10994. https://doi.org/10.1093/NAR/GKZ841
dc.relation.references	Treangen, T. J., & Salzberg, S. L. (2011). Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nature Reviews Genetics, 13(1), 36–46. https://doi.org/10.1038/nrg3117
dc.relation.references	Turton, J. F., Matos, J., Kaufmann, M. E., & Pitt, T. L. (2009). Variable number tandem repeat loci providing discrimination within widespread genotypes of Acinetobacter baumannii. European Journal of Clinical Microbiology and Infectious Diseases, 28(5), 499–507. https://doi.org/10.1007/S10096-008-0659-3
dc.relation.references	Uguen, K., Michaud, J. L., & Génin, E. (2024). Short Tandem Repeats in the era of next-generation sequencing: from historical loci to population databases. European Journal of Human Genetics : EJHG, 32(9), 1037–1044. https://doi.org/10.1038/S41431-024-01666-Z
dc.relation.references	Vrigazova, B. (2021). The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems. Business Systems Research : International Journal of the Society for Advancing Innovation and Research in Economy, 12(1), 228–242. https://doi.org/10.2478/bsrj-2021-0015
dc.relation.references	Wang, Z., Sun, J., Gao, Y., Xue, Y., Zhang, Y., Li, K., Zhang, W., Zhang, C., Zu, J., & Zhang, L. (2023). Fusang: a framework for phylogenetic tree inference via deep learning. Nucleic Acids Research, 51(20), 10909–10923. https://doi.org/10.1093/NAR/GKAD805
dc.relation.references	Whibley, A., Kelley, J. L., & Narum, S. R. (2021). The changing face of genome assemblies: Guidance on achieving high-quality reference genomes. Molecular Ecology Resources, 21(3), 641–652. https://doi.org/10.1111/1755-0998.13312
dc.relation.references	Wright, M. S., Haft, D. H., Harkins, D. M., Perez, F., Hujer, K. M., Bajaksouzian, S., Benard, M. F., Jacobs, M. R., Bonomo, R. A., & Adams, M. D. (2014). New insights into dissemination and variation of the health care-associated pathogen Acinetobacter baumannii from genomic analysis. MBio, 5(1). https://doi.org/10.1128/MBIO.00963-13
dc.relation.references	Xavier, B. B., Sabirova, J., Pieter, M., Hernalsteens, J. P., De Greve, H., Goossens, H., & Malhotra-Kumar, S. (2014). Employing whole genome mapping for optimal de novo assembly of bacterial genomes. BMC Research Notes, 7(1), 1–4. https://doi.org/10.1186/1756-0500-7-484
dc.relation.references	Xu, C., Zhu, Z., Wang, J., Wang, J., Zhang, W., & Zhang, W. (2024). Understanding the Role of Cross-Entropy Loss in Fairly Evaluating Large Language Model-based Recommendation. Proceedings of ACM Conference (Conference’17), 1. https://doi.org/XXXXXXX.XXXXXXX
dc.relation.references	Yang, A., Zhang, W., Wang, J., Yang, K., Han, Y., & Zhang, L. (2020). Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA. Frontiers in Bioengineering and Biotechnology, 8, 1032. https://doi.org/10.3389/FBIOE.2020.01032
dc.relation.references	Yoon, S., Kim, D., Kang, K., & Park, W. J. (2018). TraRECo: A greedy approach based de novo transcriptome assembler with read error correction using consensus matrix. BMC Genomics, 19(1), 1–20. https://doi.org/10.1186/S12864-018-5034-X
dc.relation.references	Zhai, J., Sun, H., Xu, C., & Sun, W. (2023). ODTC: An online darknet traffic classification model based on multimodal self-attention chaotic mapping features. Electronic Research Archive, 31(8), 5056–5082. https://doi.org/10.3934/ERA.2023259
dc.relation.references	Zhang, D., Zhang, W., Zhao, Y., Zhang, J., He, B., Qin, C., & Yao, J. (2023). DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks. https://arxiv.org/pdf/2307.05628
dc.relation.references	Zhao, Z., Zhou, Y., Wang, S., Zhang, X., Wang, C., & Li, S. (2020). LDscaff: LD based scaffolding of de novo genome assemblies. BMC Bioinformatics, 21. https://doi.org/10.1186/s12859-020-03895-7
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.rights.license	Atribución-NoComercial 4.0 Internacional
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.ddc	000 - Ciencias de la computación, información y obras generales
dc.subject.ddc	570 - Biología
dc.subject.lemb	Bioinformática	spa
dc.subject.lemb	Bioinformatics	eng
dc.subject.lemb	Inteligencia artificial	spa
dc.subject.lemb	Artificial intelligence	eng
dc.subject.proposal	Acinetobacter baumannii, Ensamblaje de Novo, Inteligencia artificial, Procesamiento de Lenguaje Natural, Llenado de huecos	spa
dc.subject.proposal	Acinetobacter baumannii, Artificial Intelligence, Natural Language Processing, Gap Filling	eng
dc.subject.wikidata	Acinetobacter baumannii	spa
dc.subject.wikidata	Genómica comparativa	spa
dc.subject.wikidata	Comparative genomics	eng
dc.title	Modelo de inteligencia artificial para realizar gap filling en ensambles de reads cortos de genomas de Acinetobacter baumannii	spa
dc.title.translated	Artificial intelligence model for performing gap filling in assemblies of short reads of Acinetobacter baumannii genomes	eng
dc.type	Trabajo de grado - Maestría
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content	Text
dc.type.driver	info:eu-repo/semantics/masterThesis
dc.type.redcol	http://purl.org/redcol/resource_type/TM
dc.type.version	info:eu-repo/semantics/acceptedVersion
dcterms.audience.professionaldevelopment	Bibliotecarios
dcterms.audience.professionaldevelopment	Estudiantes
dcterms.audience.professionaldevelopment	Investigadores
dcterms.audience.professionaldevelopment	Maestros
dcterms.audience.professionaldevelopment	Público general
oaire.accessrights	http://purl.org/coar/access_right/c_abf2

Files

Original bundle

Name: Trabajo Final de Maestría en Bioinformática.2025.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format

Collections

Maestría en Bioinformática