A Recurrent Neural Network approach for whole genome bacteria classification

dc.contributor: Barreto, Emiliano
dc.contributor.author: Lugo Martínez, Luis Eduardo
dc.date.accessioned: 2019-07-03T07:26:06Z
dc.date.available: 2019-07-03T07:26:06Z
dc.date.issued: 2018-09
dc.description.abstract: The classification of bacteria plays an essential role in multiple areas of research. Those areas include experimental biology, food and water industries, pathology, microbiology, and evolutionary studies. Although there exist methodologies for classification (such as mass spectrometry, single-nucleotide polymorphisms, microscopic morphology, and neural network approaches), a transition to a taxonomy based on whole genome sequences is already underway. Next Generation Sequencing helps the transition by producing DNA sequence data efficiently. However, the rate of DNA sequence data generation and the high dimensionality of such data call for faster computational methods. Machine learning, an area of artificial intelligence, can analyze high-dimensional data in a systematic, fast, and efficient way. Therefore, we propose a sequential deep learning model for bacteria classification. The proposed neural network exploits the vast amounts of information generated by Next Generation Sequencing in order to extract a classification model for whole genome bacteria sequences. A distributed representation based on k-mers of k={3,4,5} provided an efficient encoding for the bacterial sequences. The classification model relies on a bidirectional recurrent neural network architecture. It achieves an accuracy of 0.99455 +/- 0.00281 for 14 species, 0.95031 +/- 0.00469 for 48 species, and 0.89107 +/- 0.00392 for 111 species. In validation, the bidirectional recurrent neural network outperformed other classification approaches, such as Naive Bayes and a feedforward neural network. The proposed model provides an automated identification method. It infers species for bacterial whole genome sequences and does not require any manual feature extraction.
dc.description.degreelevel: Master's
dc.format.mimetype: application/pdf
dc.identifier.eprints: http://bdigital.unal.edu.co/69758/
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/68663
dc.language.iso: spa
dc.relation.ispartof: Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial
dc.relation.ispartof: Departamento de Ingeniería de Sistemas e Industrial
dc.relation.references: Lugo Martínez, Luis Eduardo (2018) A Recurrent Neural Network approach for whole genome bacteria classification. Master's thesis, Universidad Nacional de Colombia - Sede Bogotá.
dc.rights: All rights reserved - Universidad Nacional de Colombia
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.rights.license: Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.ddc: 0 Generalidades / Computer science, information and general works
dc.subject.ddc: 5 Ciencias naturales y matemáticas / Science
dc.subject.ddc: 6 Tecnología (ciencias aplicadas) / Technology
dc.subject.ddc: 62 Ingeniería y operaciones afines / Engineering
dc.subject.proposal: Recurrent neural network
dc.subject.proposal: Bacteria identification
dc.subject.proposal: Whole genome sequence
dc.title: A Recurrent Neural Network approach for whole genome bacteria classification
dc.type: Master's thesis
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content: Text
dc.type.driver: info:eu-repo/semantics/masterThesis
dc.type.redcol: http://purl.org/redcol/resource_type/TM
dc.type.version: info:eu-repo/semantics/acceptedVersion
oaire.accessrights: http://purl.org/coar/access_right/c_abf2
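
The abstract mentions an encoding based on k-mers with k={3,4,5}. As a rough illustration of what k-mer extraction can look like, here is a minimal Python sketch. It assumes overlapping k-mers taken with a stride of one and plain counting; the record does not say how the thesis tokenizes sequences or builds its distributed representation, so the function names `kmers` and `kmer_counts` and the toy sequence are hypothetical and not taken from the thesis.

```python
from collections import Counter


def kmers(sequence, k):
    """Return the overlapping k-mers of length k in a DNA sequence.

    Assumption: stride of 1; the thesis may tokenize differently.
    """
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_counts(sequence, ks=(3, 4, 5)):
    """Count k-mers for each k; the default k values follow the abstract."""
    return {k: Counter(kmers(sequence, k)) for k in ks}


if __name__ == "__main__":
    toy = "ACGTACGT"  # made-up fragment, not data from the thesis
    for k, counts in kmer_counts(toy).items():
        print(k, counts.most_common(2))
```

In a pipeline like the one the abstract describes, tokens of this kind would typically be mapped to learned vectors before being fed to the recurrent network; that step is not shown here because the record gives no detail about it.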

Files

Original bundle

Name: MastersFinalProject_LuisLugo.pdf
Size: 985.57 KB
Format: Adobe Portable Document Format