Artificial intelligence model for performing gap filling in assemblies of short reads of Acinetobacter baumannii genomes
| dc.contributor.advisor | Barreto Hernández, Emiliano | |
| dc.contributor.author | Navas Luquez, Mateo | |
| dc.contributor.researchgroup | Bioinformatics | |
| dc.date.accessioned | 2026-02-10T13:57:47Z | |
| dc.date.available | 2026-02-10T13:57:47Z | |
| dc.date.issued | 2025 | |
| dc.description | Illustrations, diagrams, graphs | eng |
| dc.description.abstract | Las técnicas de secuenciación masiva de nueva generación (NGS) fueron revolucionarias en el campo de la genómica y en el proceso de secuenciación de genomas completos (WGS), debido a que permiten secuenciar un gran volumen de datos con gran profundidad y a un costo asequible. Estos procedimientos implementan secuenciación masiva de reads cortos, que permiten la lectura en paralelo de todo el genoma. Pese a la alta capacidad de lectura, los reads cortos son fragmentos pequeños de secuencias mucho más grandes. Por ello, las técnicas de ensamblaje son fundamentales en la extensión de contigs y la orientación de scaffolds para obtener genomas completos. No obstante, los ensambladores para secuenciación de reads cortos NGS presentan limitaciones técnicas y teóricas asociadas a las regiones repetitivas o de baja complejidad presentes en los genomas. Estas regiones limitan los cálculos de orientación entre scaffolds, lo que genera huecos o gaps en los procesos de ensamblaje. Pese a las limitantes del ensamblaje con reads cortos NGS, el número de proyectos de secuenciación y la disponibilidad de datos de WGS siguen en aumento debido a que es altamente costo-efectivo. Uno de los organismos con gran crecimiento en el número de proyectos de secuenciación es la bacteria multirresistente Acinetobacter baumannii, debido a que representa un riesgo para la salud pública mundial dada su capacidad para sobrevivir en ambientes hospitalarios y generar infecciones graves. Con el fin de mejorar la calidad de los ensamblajes y aprovechar el gran número de datos de secuenciación, se propone implementar metodologías de inteligencia artificial para entrenar modelos capaces de cerrar huecos en los ensamblajes de novo de Acinetobacter baumannii que implementen metodologías de reads cortos. (Texto tomado de la fuente) | spa |
| dc.description.abstract | Next-generation sequencing (NGS) techniques revolutionized genomics and whole-genome sequencing (WGS) because they make it possible to sequence large volumes of data at high depth and at an affordable cost. These procedures rely on massively parallel sequencing of short reads, which allows the entire genome to be read in parallel. Despite this high throughput, short reads are small fragments of much larger sequences. Assembly techniques are therefore essential for extending contigs and orienting scaffolds to obtain complete genomes. However, short-read NGS assemblers face technical and theoretical limitations associated with the repetitive or low-complexity regions present in genomes. These regions hinder the calculation of orientation between scaffolds, leaving gaps in the assemblies. Despite these limitations, the number of sequencing projects and the availability of WGS data continue to grow because short-read sequencing is highly cost-effective. One of the organisms with a marked increase in sequencing projects is the multidrug-resistant bacterium Acinetobacter baumannii, which poses a risk to global public health because of its ability to survive in hospital environments and cause serious infections. To improve assembly quality and take advantage of the large amount of available sequencing data, this work proposes using artificial intelligence methods to train models capable of closing gaps in de novo short-read assemblies of Acinetobacter baumannii. | eng |
| dc.description.degreelevel | Master's | |
| dc.description.degreename | Master in Bioinformatics | |
| dc.description.researcharea | Functional and structural bioinformatics | |
| dc.format.extent | xi, 71 pages | |
| dc.format.mimetype | application/pdf | |
| dc.identifier.instname | Universidad Nacional de Colombia | spa |
| dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
| dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
| dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/89446 | |
| dc.language.iso | spa | |
| dc.publisher | Universidad Nacional de Colombia | |
| dc.publisher.branch | Universidad Nacional de Colombia - Bogotá campus | |
| dc.publisher.faculty | Faculty of Engineering | |
| dc.publisher.place | Bogotá, Colombia | |
| dc.publisher.program | Bogotá - Engineering - Master's in Bioinformatics | |
| dc.relation.references | Ali, M., Dewan, A., Sahu, A. K., & Taye, M. M. (2023). Understanding of Machine Learning with Deep Learning: Architectures, Workflow, Applications and Future Directions. Computers, 12(5), 91. https://doi.org/10.3390/COMPUTERS12050091 | |
| dc.relation.references | Ali, Y. A., Awwad, E. M., Al-Razgan, M., & Maarouf, A. (2023). Hyperparameter Search for Machine Learning Algorithms for Optimizing the Computational Complexity. Processes, 11(2), 349. https://doi.org/10.3390/PR11020349 | |
| dc.relation.references | Badillo, S., Banfai, B., Birzele, F., Davydov, I. I., Hutchinson, L., Kam-Thong, T., Siebourg-Polster, J., Steiert, B., & Zhang, J. D. (2020). An Introduction to Machine Learning. Clinical Pharmacology and Therapeutics, 107(4), 871. https://doi.org/10.1002/CPT.1796 | |
| dc.relation.references | Bayat, A., Deshpande, N. P., Wilkins, M. R., & Parameswaran, S. (2020). Fast Short Read De-Novo Assembly Using Overlap-Layout-Consensus Approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17(1), 334–338. https://doi.org/10.1109/TCBB.2018.2875479 | |
| dc.relation.references | Boetzer, M., & Pirovano, W. (2012). Toward almost closed genomes with GapFiller. Genome Biology, 13(6). https://doi.org/10.1186/GB-2012-13-6-R56 | |
| dc.relation.references | Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114. https://doi.org/10.1093/BIOINFORMATICS/BTU170 | |
| dc.relation.references | Bush, S. J., Foster, D., Eyre, D. W., Clark, E. L., de Maio, N., Shaw, L. P., Stoesser, N., Peto, T. E. A., Crook, D. W., & Walker, A. S. (2020a). Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism–calling pipelines. GigaScience, 9(2), 1–21. https://doi.org/10.1093/GIGASCIENCE/GIAA007 | |
| dc.relation.references | Calin, O. (2020). Deep Learning Architectures. https://doi.org/10.1007/978-3-030-36721-3 | |
| dc.relation.references | Cappelletti, L., Fontana, T., Di Donato, G. W., Di Tucci, L., Casiraghi, E., & Valentini, G. (2020). Complex data imputation by auto-encoders and convolutional neural networks—A case study on genome gap-filling. Computers, 9(2). https://doi.org/10.3390/computers9020037 | |
| dc.relation.references | Chen, A., Field, M., Bhattacharya, A., Nabeel Asim Muhammad, M., Nabeel Asim, M., Ali Ibrahim, M., Zaib, A., & Dengel, A. (2025). DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models. Frontiers in Medicine, 12, 1503229. https://doi.org/10.3389/FMED.2025.1503229 | |
| dc.relation.references | Chen, E., Chu, J., Zhang, J., Warren, R., & Birol, I. (2021). GapPredict: A Language Model for Resolving Gaps in Draft Genome Assemblies. IEEE/ACM Transactions on Computational Biology and Bioinformatics. https://doi.org/10.1109/TCBB.2021.3109557 | |
| dc.relation.references | Chen, Y., Wang, G., & Zhang, T. (2024). Utilizing Deep Neural Networks to Fill Gaps in Small Genomes. International Journal of Molecular Sciences, 25(15), 8502. https://doi.org/10.3390/IJMS25158502 | |
| dc.relation.references | Chu, C., Li, X., & Wu, Y. (2019). GAPPadder: a sensitive approach for closing gaps on draft genomes with short sequence reads. BMC Genomics, 20(Suppl 5). https://doi.org/10.1186/S12864-019-5703-4 | |
| dc.relation.references | Coombe, L., Li, J. X., Lo, T., Wong, J., Nikolic, V., Warren, R. L., & Birol, I. (2021). LongStitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics, 22(1). https://doi.org/10.1186/s12859-021-04451-7 | |
| dc.relation.references | Coombe, L., Nikolić, V., Chu, J., Birol, I., & Warren, R. L. (2020). ntJoin: Fast and lightweight assembly-guided scaffolding using minimizer graphs. Bioinformatics, 36(12), 3885–3887. https://doi.org/10.1093/bioinformatics/btaa253 | |
| dc.relation.references | Crossley, B. M., Bai, J., Glaser, A., Maes, R., Porter, E., Killian, M. L., Clement, T., & Toohey-Kurth, K. (2020). Guidelines for Sanger sequencing and molecular assay monitoring. Journal of Veterinary Diagnostic Investigation, 32(6), 767–775. https://doi.org/10.1177/1040638720905833 | |
| dc.relation.references | Cuber, P., Chooneea, D., Geeves, C., Salatino, S., Creedy, T. J., Griffin, C., Sivess, L., Barnes, I., Price, B., & Misra, R. (2023). Comparing the accuracy and efficiency of third generation sequencing technologies, Oxford Nanopore Technologies, and Pacific Biosciences, for DNA barcode sequencing applications. Ecological Genetics and Genomics, 28, 100181. https://doi.org/10.1016/J.EGG.2023.100181 | |
| dc.relation.references | Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2020). Transformer-XL: Attentive language models beyond a fixed-length context. ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 2978–2988. https://doi.org/10.18653/v1/P19-1285 | |
| dc.relation.references | Darby, C. A., Gaddipati, R., Schatz, M. C., & Langmead, B. (2020). Vargas: heuristic free alignment for assessing linear and graph read aligners. Bioinformatics, 36(12), 3712–3718. https://doi.org/10.1093/BIOINFORMATICS/BTAA265 | |
| dc.relation.references | Darling, A. C. E., Mau, B., Blattner, F. R., & Perna, N. T. (2004). Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements. Genome Research, 14(7), 1394. https://doi.org/10.1101/GR.2289704 | |
| dc.relation.references | Duraisamy, P., Abinaya Srijanani, A., Duraisamy, M., Amrit Candida, M., Dinesh Babu, P., & Karthik, S. (2024). Implementation of CNN-LSTM Integration for Advancing Human-Computer Dialogue through Precise Sign Language Gesture Interpretation. 5th International Conference on Recent Trends in Computer Science and Technology, ICRTCST 2024 - Proceedings, 5–9. https://doi.org/10.1109/ICRTCST61793.2024.10578503 | |
| dc.relation.references | Fang, Y., Quan, J., Hua, X., Feng, Y., Li, X., Wang, J., Ruan, Z., Shang, S., & Yu, Y. (2016). Complete genome sequence of Acinetobacter baumannii XH386 (ST208), a multi-drug resistant bacteria isolated from pediatric hospital in China. Genomics Data, 7, 269. https://doi.org/10.1016/J.GDATA.2015.12.002 | |
| dc.relation.references | Forouzan, E., Shariati, P., Mousavi Maleki, M. S., Karkhane, A. A., & Yakhchali, B. (2018). Practical evaluation of 11 de novo assemblers in metagenome assembly. Journal of Microbiological Methods, 151, 99–105. https://doi.org/10.1016/j.mimet.2018.06.007 | |
| dc.relation.references | Generalovic, T. N., McCarthy, S. A., Warren, I. A., Wood, J. M. D., Torrance, J., Sims, Y., Quail, M., Howe, K., Pipan, M., Durbin, R., & Jiggins, C. D. (2021). A high-quality, chromosome-level genome assembly of the Black Soldier Fly (Hermetia illucens L.). G3: Genes, Genomes, Genetics, 11(5). https://doi.org/10.1093/g3journal/jkab085 | |
| dc.relation.references | Giani, A. M., Gallo, G. R., Gianfranceschi, L., & Formenti, G. (2020). Long walk to genomics: History and current approaches to genome sequencing and assembly. Computational and Structural Biotechnology Journal, 18, 9–19. https://doi.org/10.1016/j.csbj.2019.11.002 | |
| dc.relation.references | Gourlé, H., Karlsson-Lindsjö, O., Hayer, J., & Bongcam-Rudloff, E. (2019). Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics, 35(3), 521–522. https://doi.org/10.1093/BIOINFORMATICS/BTY630 | |
| dc.relation.references | Gunasekaran, H., Ramalakshmi, K., Rex Macedo Arokiaraj, A., Kanmani, S. D., Venkatesan, C., & Dhas, C. S. G. (2021). Analysis of DNA Sequence Classification Using CNN and Hybrid Models. Computational and Mathematical Methods in Medicine, 2021, 1835056. https://doi.org/10.1155/2021/1835056 | |
| dc.relation.references | Gupta, Y. M., Kirana, S. N., & Homchan, S. (2024). Representing DNA for machine learning algorithms: A primer on one-hot, binary, and integer encodings. Biochemistry and Molecular Biology Education, 53(2), 142–146. https://doi.org/10.1002/BMB.21870 | |
| dc.relation.references | Gurevich, A., Saveliev, V., Vyahhi, N., & Tesler, G. (2013). QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), 1072. https://doi.org/10.1093/BIOINFORMATICS/BTT086 | |
| dc.relation.references | Huang, B., Wei, G., Wang, B., Ju, F., Zhong, Y., Shi, Z., Sun, S., & Bu, D. (2021). Filling gaps of genome scaffolds via probabilistic searching optical maps against assembly graph. BMC Bioinformatics, 22(1). https://doi.org/10.1186/s12859-021-04448-2 | |
| dc.relation.references | Iqbal, T., & Qureshi, S. (2022). The survey: Text generation models in deep learning. Journal of King Saud University - Computer and Information Sciences, 34(6), 2515–2528. https://doi.org/10.1016/J.JKSUCI.2020.04.001 | |
| dc.relation.references | Ji, Y., Zhou, Z., Liu, H., & Davuluri, R. V. (2021a). DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA language in genome. Bioinformatics, 37(15), 2112–2120. https://doi.org/10.1093/BIOINFORMATICS/BTAB083 | |
| dc.relation.references | Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science, 349(6245), 255–260. https://doi.org/10.1126/SCIENCE.AAA8415 | |
| dc.relation.references | Kairi, A., Majumdar, P. G., & Rao, A. R. (2020). hAssembler: A hybrid de novo genome assembly approach for large genomes. Indian Journal of Agricultural Sciences, 90(10), 2000–2005. https://www.scopus.com/inward/record.uri?eid=2-s2.0-85114150355&partnerID=40&md5=dc46ab77dac243c63638847f87cedbfc | |
| dc.relation.references | Kaplan, N., & Dekker, J. (2013). High-throughput genome scaffolding from in vivo DNA interaction frequency. Nature Biotechnology, 31(12), 1143–1147. https://doi.org/10.1038/nbt.2768 | |
| dc.relation.references | Kuśmirek, W., & Nowak, R. (2018). De novo assembly of bacterial genomes with repetitive DNA regions by dnaasm application. BMC Bioinformatics, 19(1). https://doi.org/10.1186/S12859-018-2281-4 | |
| dc.relation.references | Kwon, N., Yoo, Y., & Lee, B. (2024). Class conditioned text generation with style attention mechanism for embracing diversity. Applied Soft Computing, 163. https://doi.org/10.1016/j.asoc.2024.111893 | |
| dc.relation.references | Lantz, H., Dominguez Del Angel, V., Hjerde, E., Sterck, L., Capella-Gutierrez, S., Notredame, C., Vinnere Pettersson, O., Amselem, J., Bouri, L., Bocs, S., Klopp, C., Gibrat, J. F., Vlasova, A., Leskosek, B. L., Soler, L., & Binzer-Panchal, M. (2018). Ten steps to get started in Genome Assembly and Annotation. F1000Research, 7, ELIXIR-148. https://doi.org/10.12688/F1000RESEARCH.13598.1 | |
| dc.relation.references | Letunic, I., & Bork, P. (2024). Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Research, 52(W1), W78. https://doi.org/10.1093/NAR/GKAE268 | |
| dc.relation.references | Li, D., Liu, C. M., Luo, R., Sadakane, K., & Lam, T. W. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics (Oxford, England), 31(10), 1674–1676. https://doi.org/10.1093/BIOINFORMATICS/BTV033 | |
| dc.relation.references | Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094. https://doi.org/10.1093/BIOINFORMATICS/BTY191 | |
| dc.relation.references | Li, H., & Durbin, R. (2009a). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England), 25(14), 1754–1760. https://doi.org/10.1093/BIOINFORMATICS/BTP324 | |
| dc.relation.references | Liao, Y. C., Lin, S. H., & Lin, H. H. (2015). Completing bacterial genome assemblies: strategy and performance comparisons. Scientific Reports, 5, 8747. https://doi.org/10.1038/SREP08747 | |
| dc.relation.references | Liu, F., Zhu, Y., Yi, Y., Lu, N., Zhu, B., & Hu, Y. (2014). Comparative genomic analysis of Acinetobacter baumannii clinical isolates reveals extensive genomic variation and diverse antibiotic resistance determinants. BMC Genomics, 15(1), 1–14. https://doi.org/10.1186/1471-2164-15-1163 | |
| dc.relation.references | Liu, J., Yang, M., Yu, Y., Xu, H., Li, K., & Zhou, X. (2024). Large language models in bioinformatics: applications and perspectives. ArXiv, arXiv:2401.04155v1. https://pmc.ncbi.nlm.nih.gov/articles/PMC10802675/ | |
| dc.relation.references | Liu, Y. H., Luo, C., Golding, S. G., Ioffe, J. B., & Zhou, X. M. (2024). Tradeoffs in alignment and assembly-based methods for structural variant detection with long-read sequencing data. Nature Communications, 15(1), 1–22. https://doi.org/10.1038/s41467-024-46614-z | |
| dc.relation.references | Lu, P., Jin, J., Li, Z., Xu, Y., Hu, D., Liu, J., & Cao, P. (2020). PGcloser: Fast Parallel Gap-Closing Tool Using Long-Reads or Contigs to Fill Gaps in Genomes. Evolutionary Bioinformatics, 16. https://doi.org/10.1177/1176934320913859 | |
| dc.relation.references | Luhmann, N., Doerr, D., & Chauve, C. (2017). Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes. Microbial Genomics, 3(9). https://doi.org/10.1099/mgen.0.000123 | |
| dc.relation.references | Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y., Tang, J., Wu, G., Zhang, H., Shi, Y., Liu, Y., Yu, C., Wang, B., Lu, Y., Han, C., … Wang, J. (2012). SOAPdenovo2: An empirically improved memory efficient short-read de novo assembler. GigaScience, 1(1). https://doi.org/10.1186/2047-217X-1-18 | |
| dc.relation.references | McGinnis, S., & Madden, T. L. (2004). BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Research, 32(Web Server issue), W20. https://doi.org/10.1093/NAR/GKH435 | |
| dc.relation.references | Minkin, I., & Medvedev, P. (2020). Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ. Nature Communications, 11(1), 1–11. https://doi.org/10.1038/S41467-020-19777-8 | |
| dc.relation.references | Miyamoto, M., Motooka, D., Gotoh, K., Imai, T., Yoshitake, K., Goto, N., Iida, T., Yasunaga, T., Horii, T., Arakawa, K., Kasahara, M., & Nakamura, S. (2014). Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes. BMC Genomics, 15(1), 1–9. https://doi.org/10.1186/1471-2164-15-699 | |
| dc.relation.references | Moeckel, C., Mareboina, M., Konnaris, M. A., Chan, C. S. Y., Mouratidis, I., Montgomery, A., Chantzi, N., Pavlopoulos, G. A., & Georgakopoulos-Soares, I. (2024). A survey of k-mer methods and applications in bioinformatics. Computational and Structural Biotechnology Journal, 23, 2289–2303. https://doi.org/10.1016/J.CSBJ.2024.05.025 | |
| dc.relation.references | Morisse, P., Marchet, C., Limasset, A., Lecroq, T., & Lefebvre, A. (2021). Scalable long read self-correction and assembly polishing with multiple sequence alignment. Scientific Reports, 11(1), 1–13. https://doi.org/10.1038/s41598-020-80757-5 | |
| dc.relation.references | Nadalin, F., Vezzi, F., & Policriti, A. (2012). GapFiller: A de novo assembly approach to fill the gap within paired reads. BMC Bioinformatics, 13(Suppl 14), 1–16. https://doi.org/10.1186/1471-2105-13-S14-S8 | |
| dc.relation.references | Ng, P. (2017). dna2vec: Consistent vector representations of variable-length k-mers. https://arxiv.org/pdf/1701.06279 | |
| dc.relation.references | Nielsen, M. A. (2015). Neural Networks and Deep Learning. Determination Press. http://neuralnetworksanddeeplearning.com | |
| dc.relation.references | Ogunsanya, M., Isichei, J., & Desai, S. (2023). Grid search hyperparameter tuning in additive manufacturing processes. Manufacturing Letters, 35, 1031–1042. https://doi.org/10.1016/J.MFGLET.2023.08.056 | |
| dc.relation.references | Paulino, D., Warren, R. L., Vandervalk, B. P., Raymond, A., Jackman, S. D., & Birol, I. (2015). Sealer: A scalable gap-closing application for finishing draft genomes. BMC Bioinformatics, 16(1), 1–8. https://doi.org/10.1186/S12859-015-0663-4 | |
| dc.relation.references | Peona, V., Blom, M. P. K., Xu, L., Burri, R., Sullivan, S., Bunikis, I., Liachko, I., Haryoko, T., Jønsson, K. A., Zhou, Q., Irestedt, M., & Suh, A. (2020). Identifying the causes and consequences of assembly gaps using a multiplatform genome assembly of a bird-of-paradise. Molecular Ecology Resources, 21(1), 263. https://doi.org/10.1111/1755-0998.13252 | |
| dc.relation.references | Pevzner, P. A., Tang, H., & Tesler, G. (2004). De novo repeat classification and fragment assembly. Genome Research, 14(9), 1786–1796. https://doi.org/10.1101/GR.2395204 | |
| dc.relation.references | Pourcel, C., Minandri, F., Hauck, Y., D’Arezzo, S., Imperi, F., Vergnaud, G., & Visca, P. (2011). Identification of variable-number tandem-repeat (VNTR) sequences in Acinetobacter baumannii and interlaboratory validation of an optimized multiple-locus VNTR analysis typing scheme. Journal of Clinical Microbiology, 49(2), 539–548. https://doi.org/10.1128/JCM.02003-10 | |
| dc.relation.references | Prjibelski, A., Antipov, D., Meleshko, D., Lapidus, A., & Korobeynikov, A. (2020a). Using SPAdes De Novo Assembler. Current Protocols in Bioinformatics, 70(1), e102. https://doi.org/10.1002/CPBI.102 | |
| dc.relation.references | Rácz, A., Bajusz, D., & Héberger, K. (2019). Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics. Molecules (Basel, Switzerland), 24. https://doi.org/10.3390/molecules24152811 | |
| dc.relation.references | Rizzi, R., Beretta, S., Patterson, M., Pirola, Y., Previtali, M., Della Vedova, G., & Bonizzoni, P. (2019). Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era. Quantitative Biology, 7(4), 278–292. https://doi.org/10.1007/S40484-019-0181-X | |
| dc.relation.references | Saha, S., Bridges, S., Magbanua, Z. V., & Peterson, D. G. (2008). Computational Approaches and Tools Used in Identification of Dispersed Repetitive DNA Sequences. Tropical Plant Biology, 1(1), 85–96. https://doi.org/10.1007/S12042-007-9007-5 | |
| dc.relation.references | Salmela, L., Sahlin, K., Mäkinen, V., & Tomescu, A. I. (2016). Gap filling as exact path length problem. Journal of Computational Biology, 23(5), 347–361. https://doi.org/10.1089/cmb.2015.0197 | |
| dc.relation.references | Salmela, L., & Tomescu, A. I. (2016). Safely filling gaps with partial solutions common to all solutions. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9838 LNCS, XIII. https://www.scopus.com/inward/record.uri?eid=2-s2.0-84984982154&partnerID=40&md5=152f5f5c325caa43d1074da1b3360ed1 | |
| dc.relation.references | Salzberg, S. L., Phillippy, A. M., Zimin, A., Puiu, D., Magoc, T., Koren, S., Treangen, T. J., Schatz, M. C., Delcher, A. L., Roberts, M., Marçais, G., Pop, M., & Yorke, J. A. (2012). GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Research, 22(3), 557–567. https://doi.org/10.1101/gr.131383.111 | |
| dc.relation.references | Sanabria, M., Hirsch, J., & Poetsch, A. R. (2024). Distinguishing word identity and sequence context in DNA language models. BMC Bioinformatics, 25(1), 1–12. https://doi.org/10.1186/S12859-024-05869-5 | |
| dc.relation.references | Schmeing, S., & Robinson, M. D. (2023). Gapless provides combined scaffolding, gap filling, and assembly correction with long reads. Life Science Alliance, 6(7). https://doi.org/10.26508/LSA.202201471 | |
| dc.relation.references | Schwengers, O., Hoek, A., Fritzenwanker, M., Falgenhauer, L., Hain, T., Chakraborty, T., & Goesmann, A. (2020). ASA3P: An automatic and scalable pipeline for the assembly, annotation and higher-level analysis of closely related bacterial isolates. PLoS Computational Biology, 16(3). https://doi.org/10.1371/journal.pcbi.1007134 | |
| dc.relation.references | Seemann, T. (2015). Snippy: rapid haploid variant calling and core SNP phylogeny. GitHub. https://github.com/tseemann/snippy | |
| dc.relation.references | Shanthamallu, U. S., & Spanias, A. (2022). Machine and Deep Learning Algorithms and Applications (pp. 1–106). Springer Nature. https://asu.elsevierpure.com/en/publications/machine-and-deep-learning-algorithms-and-applications | |
| dc.relation.references | She, X., & Zhang, D. (2018). Text Classification Based on Hybrid CNN-LSTM Hybrid Model. Proceedings - 2018 11th International Symposium on Computational Intelligence and Design, ISCID 2018, 2, 185–189. https://doi.org/10.1109/ISCID.2018.10144 | |
| dc.relation.references | Simpson, J. T., Wong, K., Jackman, S. D., Schein, J. E., Jones, S. J. M., & Birol, I. (2009). ABySS: A parallel assembler for short read sequence data. Genome Research, 19(6), 1117. https://doi.org/10.1101/GR.089532.108 | |
| dc.relation.references | Sohn, J. Il, & Nam, J. W. (2018). The present and future of de novo whole-genome assembly. Briefings in Bioinformatics, 19(1), 23–40. https://doi.org/10.1093/bib/bbw096 | |
| dc.relation.references | Song, S., Huang, H., & Ruan, T. (2019). Abstractive text summarization using LSTM-CNN based deep learning. Multimedia Tools and Applications, 78(1), 857–875. https://doi.org/10.1007/S11042-018-5749-3 | |
| dc.relation.references | Thomma, B. P. H. J., Seidl, M. F., Shi-Kunne, X., Cook, D. E., Bolton, M. D., van Kan, J. A. L., & Faino, L. (2016). Mind the gap; seven reasons to close fragmented genome assemblies. Fungal Genetics and Biology, 90, 24–30. https://doi.org/10.1016/J.FGB.2015.08.010 | |
| dc.relation.references | Tørresen, O. K., Star, B., Mier, P., Andrade-Navarro, M. A., Bateman, A., Jarnot, P., Gruca, A., Grynberg, M., Kajava, A. V., Promponas, V. J., Anisimova, M., Jakobsen, K. S., & Linke, D. (2019). Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Research, 47(21), 10994. https://doi.org/10.1093/NAR/GKZ841 | |
| dc.relation.references | Treangen, T. J., & Salzberg, S. L. (2011). Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nature Reviews Genetics, 13(1), 36–46. https://doi.org/10.1038/nrg3117 | |
| dc.relation.references | Turton, J. F., Matos, J., Kaufmann, M. E., & Pitt, T. L. (2009). Variable number tandem repeat loci providing discrimination within widespread genotypes of Acinetobacter baumannii. European Journal of Clinical Microbiology and Infectious Diseases, 28(5), 499–507. https://doi.org/10.1007/S10096-008-0659-3 | |
| dc.relation.references | Uguen, K., Michaud, J. L., & Génin, E. (2024). Short Tandem Repeats in the era of next-generation sequencing: from historical loci to population databases. European Journal of Human Genetics : EJHG, 32(9), 1037–1044. https://doi.org/10.1038/S41431-024-01666-Z | |
| dc.relation.references | Vrigazova, B. (2021). The Proportion for Splitting Data into Training and Test Set for the Bootstrap in Classification Problems. Business Systems Research : International Journal of the Society for Advancing Innovation and Research in Economy, 12(1), 228–242. https://doi.org/10.2478/bsrj-2021-0015 | |
| dc.relation.references | Wang, Z., Sun, J., Gao, Y., Xue, Y., Zhang, Y., Li, K., Zhang, W., Zhang, C., Zu, J., & Zhang, L. (2023). Fusang: a framework for phylogenetic tree inference via deep learning. Nucleic Acids Research, 51(20), 10909–10923. https://doi.org/10.1093/NAR/GKAD805 | |
| dc.relation.references | Whibley, A., Kelley, J. L., & Narum, S. R. (2021). The changing face of genome assemblies: Guidance on achieving high-quality reference genomes. Molecular Ecology Resources, 21(3), 641–652. https://doi.org/10.1111/1755-0998.13312 | |
| dc.relation.references | Wright, M. S., Haft, D. H., Harkins, D. M., Perez, F., Hujer, K. M., Bajaksouzian, S., Benard, M. F., Jacobs, M. R., Bonomo, R. A., & Adams, M. D. (2014). New insights into dissemination and variation of the health care-associated pathogen Acinetobacter baumannii from genomic analysis. MBio, 5(1). https://doi.org/10.1128/MBIO.00963-13 | |
| dc.relation.references | Xavier, B. B., Sabirova, J., Pieter, M., Hernalsteens, J. P., De Greve, H., Goossens, H., & Malhotra-Kumar, S. (2014). Employing whole genome mapping for optimal de novo assembly of bacterial genomes. BMC Research Notes, 7(1), 1–4. https://doi.org/10.1186/1756-0500-7-484 | |
| dc.relation.references | Xu, C., Zhu, Z., Wang, J., Wang, J., Zhang, W., & Zhang, W. (2024). Understanding the Role of Cross-Entropy Loss in Fairly Evaluating Large Language Model-based Recommendation. Proceedings of ACM Conference (Conference’17), 1. | |
| dc.relation.references | Yang, A., Zhang, W., Wang, J., Yang, K., Han, Y., & Zhang, L. (2020). Review on the Application of Machine Learning Algorithms in the Sequence Data Mining of DNA. Frontiers in Bioengineering and Biotechnology, 8, 1032. https://doi.org/10.3389/FBIOE.2020.01032 | |
| dc.relation.references | Yoon, S., Kim, D., Kang, K., & Park, W. J. (2018). TraRECo: A greedy approach based de novo transcriptome assembler with read error correction using consensus matrix. BMC Genomics, 19(1), 1–20. https://doi.org/10.1186/S12864-018-5034-X | |
| dc.relation.references | Zhai, J., Sun, H., Xu, C., & Sun, W. (2023). ODTC: An online darknet traffic classification model based on multimodal self-attention chaotic mapping features. Electronic Research Archive, 31(8), 5056–5082. https://doi.org/10.3934/ERA.2023259 | |
| dc.relation.references | Zhang, D., Zhang, W., Zhao, Y., Zhang, J., He, B., Qin, C., & Yao, J. (2023). DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks. https://arxiv.org/pdf/2307.05628 | |
| dc.relation.references | Zhao, Z., Zhou, Y., Wang, S., Zhang, X., Wang, C., & Li, S. (2020). LDscaff: LD based scaffolding of de novo genome assemblies. BMC Bioinformatics, 21. https://doi.org/10.1186/s12859-020-03895-7 | |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | |
| dc.rights.license | Attribution-NonCommercial 4.0 International | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
| dc.subject.ddc | 000 - Computer science, information and general works | |
| dc.subject.ddc | 570 - Biology | |
| dc.subject.lemb | Bioinformática | spa |
| dc.subject.lemb | Bioinformatics | eng |
| dc.subject.lemb | Inteligencia artificial | spa |
| dc.subject.lemb | Artificial intelligence | eng |
| dc.subject.proposal | Acinetobacter baumannii, Ensamblaje de novo, Inteligencia artificial, Procesamiento de Lenguaje Natural, Llenado de huecos | spa |
| dc.subject.proposal | Acinetobacter baumannii, Artificial Intelligence, Natural Language Processing, Gap Filling | eng |
| dc.subject.wikidata | Acinetobacter baumannii | spa |
| dc.subject.wikidata | Genómica comparativa | spa |
| dc.subject.wikidata | Comparative genomics | eng |
| dc.title | Modelo de inteligencia artificial para realizar gap filling en ensambles de reads cortos de genomas de Acinetobacter baumannii | spa |
| dc.title.translated | Artificial intelligence model for performing gap filling in assemblies of short reads of Acinetobacter baumannii genomes | eng |
| dc.type | Master's thesis | |
| dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | |
| dc.type.content | Text | |
| dc.type.driver | info:eu-repo/semantics/masterThesis | |
| dc.type.redcol | http://purl.org/redcol/resource_type/TM | |
| dc.type.version | info:eu-repo/semantics/acceptedVersion | |
| dcterms.audience.professionaldevelopment | Librarians | |
| dcterms.audience.professionaldevelopment | Students | |
| dcterms.audience.professionaldevelopment | Researchers | |
| dcterms.audience.professionaldevelopment | Teachers | |
| dcterms.audience.professionaldevelopment | General public | |
| oaire.accessrights | http://purl.org/coar/access_right/c_abf2 |
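
The abstract attributes assembly gaps to repetitive or low-complexity regions that short reads cannot resolve. As a minimal illustrative sketch (not the model developed in this thesis), the Python snippet below assumes the common scaffolding convention of marking unresolved gaps as runs of `N`, locates those gaps, extracts the flanking sequence on each side, and one-hot encodes it as the kind of input a sequence model could use to predict the missing bases. The function names, the flank length `k`, and the A/C/G/T column order are hypothetical choices made for the example.

```python
import re

# Hypothetical helpers for illustration only; they assume gaps are
# written as runs of 'N' in the scaffold, which is a common convention.
BASES = "ACGT"  # assumed column order for the one-hot encoding


def find_gaps(scaffold, min_len=1):
    """Return half-open (start, end) coordinates of every run of N."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", scaffold)
        if m.end() - m.start() >= min_len
    ]


def flanks(scaffold, gap, k):
    """Return up to k bases immediately left and right of a gap."""
    start, end = gap
    return scaffold[max(0, start - k):start], scaffold[end:end + k]


def one_hot(seq):
    """One-hot encode DNA; ambiguous bases (e.g. N) become all-zero rows."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    scaffold = "ACGTACGGTTNNNNNNGGCATTACA"
    for gap in find_gaps(scaffold):
        left, right = flanks(scaffold, gap, k=4)
        print(gap, left, right, one_hot(left)[0])
```

Running the script prints `(10, 16) GGTT GGCA [0, 0, 1, 0]`: one 6-base gap, its 4-base flanks, and the one-hot row for the first left-flank base (G). Encoding ambiguous bases as all-zero rows keeps every position at a fixed width of four channels, one of the simple encodings discussed by Gupta et al. (2024).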
Files
Original bundle
- Name: Trabajo Final de Maestría en Bioinformática.2025.pdf
- Size: 2.33 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 5.74 KB
- Format: Item-specific license agreed upon to submission