Analysis of the performance of DESeq2 for the detection of differentially expressed genes for genome sequencing data

dc.contributor.advisorLópez Kleine, Liliana
dc.contributor.authorBello Reyes, Nicolás
dc.date.accessioned2022-08-02T14:59:09Z
dc.date.available2022-08-02T14:59:09Z
dc.date.issued2022
dc.descriptionilustraciones, graficasspa
dc.description.abstractLas metodologías de secuenciación de ARN han acelerado en gran medida el entendimiento de los procesos biológicos a nivel molecular en diferentes organismos. Aun así, estas metodologías son costosas, lo que lleva a conjuntos de datos de alta dimensionalidad con tamaños de muestra reducidos. Actualmente DESeq2 es una de las metodologías más usadas para el análisis de expresión diferencial y, a pesar de tener una gran flexibilidad en términos de sus hiperparámetros, en la mayoría de los casos se usa con parámetros predeterminados. En este trabajo se analizan dos elementos importantes de esta metodología: se evalúa el desempeño cuando los conteos siguen una distribución Poisson en vez de binomial negativa y se muestra cómo la sensibilidad del método aumenta con esta distribución. Adicionalmente, se contrasta la corrección por pruebas múltiples de Benjamini y Hochberg con la propuesta de Boca y Leek, y se propone un gráfico para la identificación de la relación funcional con la covariable. (Texto tomado de la fuente)spa
dc.description.abstractRNA sequencing methods have dramatically accelerated our understanding of biological processes at the molecular level in different organisms. However, these methods are costly, which leads to high-dimensional datasets with small sample sizes. At present, DESeq2 is among the most widely used methods for differential expression analysis and, despite its great flexibility regarding its hyper-parameters, it is mostly used with default values. In this work we analyze two important elements of this methodology: we assess its performance when counts follow a Poisson distribution instead of a negative binomial, and we show how the sensitivity of the method increases with this distribution. Additionally, we contrast the multiple-testing correction proposed by Benjamini and Hochberg with that of Boca and Leek, and we propose a plot for identifying the functional relationship with the informative covariate.eng
dc.description.degreelevelMaestríaspa
dc.description.degreenameMagíster en Ciencias - Estadísticaspa
dc.format.extentv, 39 páginasspa
dc.format.mimetypeapplication/pdfspa
dc.identifier.instnameUniversidad Nacional de Colombiaspa
dc.identifier.reponameRepositorio Institucional Universidad Nacional de Colombiaspa
dc.identifier.repourlhttps://repositorio.unal.edu.co/spa
dc.identifier.urihttps://repositorio.unal.edu.co/handle/unal/81767
dc.language.isospaspa
dc.publisherUniversidad Nacional de Colombiaspa
dc.publisher.branchUniversidad Nacional de Colombia - Sede Bogotáspa
dc.publisher.departmentDepartamento de Estadísticaspa
dc.publisher.facultyFacultad de Cienciasspa
dc.publisher.placeBogotá, Colombiaspa
dc.publisher.programBogotá - Ciencias - Maestría en Ciencias - Estadísticaspa
dc.relation.indexedRedColspa
dc.relation.indexedLaReferenciaspa
dc.relation.referencesAl Mahi, Naim ; Begum, Munni: A two-step integrated approach to detect differentially expressed genes in RNA-Seq data. En: Journal of Bioinformatics and Computational Biology 14 (2016), Nr. 06, p. 1650034spa
dc.relation.referencesAnders, Simon ; Huber, Wolfgang: Differential expression analysis for sequence count data. En: Nature Precedings (2010), p. 1-1spa
dc.relation.referencesAuer, Paul L. ; Doerge, Rebecca W.: A two-stage Poisson model for testing RNA-seq data. En: Statistical applications in genetics and molecular biology 10 (2011), Nr. 1spa
dc.relation.referencesBenjamini, Yoav ; Hochberg, Yosef: Controlling the false discovery rate: a practical and powerful approach to multiple testing. En: Journal of the Royal statistical society: series B (Methodological) 57 (1995), Nr. 1, p. 289-300spa
dc.relation.referencesBoca, Simina M. ; Leek, Jeffrey T.: A direct approach to estimating false discovery rates conditional on covariates. En: bioRxiv (2018)spa
dc.relation.referencesCheung, Vivian G. ; Nayak, Renuka R. ; Wang, Isabel X. ; Elwyn, Susannah ; Cousins, Sarah M. ; Morley, Michael ; Spielman, Richard S.: Polymorphic cis-and trans-regulation of human gene expression. En: PLoS Biol 8 (2010), Nr. 9, p. e1000480spa
dc.relation.referencesDillies, Marie-Agnès ; Rau, Andrea ; Aubert, Julie ; Hennequet-Antier, Christelle ; Jeanmougin, Marine ; Servant, Nicolas ; Keime, Céline ; Marot, Guillemette ; Castel, David ; Estelle, Jordi [u. a.]: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. En: Briefings in bioinformatics 14 (2013), Nr. 6, p. 671-683spa
dc.relation.referencesGu, Jinghua ; Wang, Xiao ; Halakivi-Clarke, Leena ; Clarke, Robert ; Xuan, Jianhua: BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data. En: BMC bioinformatics Vol. 15 Springer, 2014, p. 1-11spa
dc.relation.referencesIgnatiadis, Nikolaos ; Klaus, Bernd ; Zaugg, Judith B. ; Huber, Wolfgang: Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. En: Nature methods 13 (2016), Nr. 7, p. 577-580spa
dc.relation.referencesKorthauer, Keegan ; Kimes, Patrick K. ; Duvallet, Claire ; Reyes, Alejandro ; Subramanian, Ayshwarya ; Teng, Mingxiang ; Shukla, Chinmay ; Alm, Eric J. ; Hicks, Stephanie C.: A practical guide to methods controlling false discoveries in computational biology. En: Genome biology 20 (2019), Nr. 1, p. 1-21spa
dc.relation.referencesKorthauer, Keegan D. ; Chu, Li-Fang ; Newton, Michael A. ; Li, Yuan ; Thomson, James ; Stewart, Ron ; Kendziorski, Christina: A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. En: Genome biology 17 (2016), Nr. 1, p. 1-15spa
dc.relation.referencesLonsdale, John ; Thomas, Jeffrey ; Salvatore, Mike ; Phillips, Rebecca ; Lo, Edmund ; Shad, Saboor ; Hasz, Richard ; Walters, Gary ; Garcia, Fernando ; Young, Nancy [u. a.]: The genotype-tissue expression (GTEx) project. En: Nature genetics 45 (2013), Nr. 6, p. 580-585spa
dc.relation.referencesLove, Michael ; Huber, W ; Anders, S: Assessment of DESeq2 performance through simulation. En: DESeq2 vignette (2014)spa
dc.relation.referencesLove, Michael I. ; Huber, Wolfgang ; Anders, Simon: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. En: Genome biology 15 (2014), Nr. 12, p. 550spa
dc.relation.referencesOshlack, Alicia ; Robinson, Mark D. ; Young, Matthew D.: From RNA-seq reads to differential expression results. En: Genome biology 11 (2010), Nr. 12, p. 220spa
dc.relation.referencesPickrell, Joseph K. ; Marioni, John C. ; Pai, Athma A. ; Degner, Jacob F. ; Engelhardt, Barbara E. ; Nkadori, Everlyne ; Veyrieras, Jean-Baptiste ; Stephens, Matthew ; Gilad, Yoav ; Pritchard, Jonathan K.: Understanding mechanisms underlying human gene expression variation with RNA sequencing. En: Nature 464 (2010), Nr. 7289, p. 768-772spa
dc.relation.referencesReyes, Alejandro: Count RNA-seq data used for benchmarking FDR control methods. October 2018spa
dc.relation.referencesReyes, Alejandro ; Huber, Wolfgang: Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues. En: Nucleic acids research 46 (2018), Nr. 2, p. 582-592spa
dc.relation.referencesRitchie, Matthew E. ; Phipson, Belinda ; Wu, Di ; Hu, Yifang ; Law, Charity W. ; Shi, Wei ; Smyth, Gordon K.: limma powers differential expression analyses for RNA-sequencing and microarray studies. En: Nucleic acids research 43 (2015), Nr. 7, p. e47-e47spa
dc.relation.referencesRobinson, Mark D. ; McCarthy, Davis J. ; Smyth, Gordon K.: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. En: Bioinformatics 26 (2010), Nr. 1, p. 139-140spa
dc.relation.referencesSchuster, Stephan C.: Next-generation sequencing transforms today's biology. En: Nature methods 5 (2008), Nr. 1, p. 16-18spa
dc.relation.referencesScott, James G. ; Kelly, Ryan C. ; Smith, Matthew A. ; Zhou, Pengcheng ; Kass, Robert E.: False discovery rate regression: an application to neural synchrony detection in primary visual cortex. En: Journal of the American Statistical Association 110 (2015), Nr. 510, p. 459-471spa
dc.relation.referencesSoneson, Charlotte: compcodeR - an R package for benchmarking differential expression methods for RNA-seq data. En: Bioinformatics 30 (2014), Nr. 17, p. 2517-2518spa
dc.relation.referencesSoneson, Charlotte ; Delorenzi, Mauro: A comparison of methods for differential expression analysis of RNA-seq data. En: BMC bioinformatics 14 (2013), Nr. 1, p. 1-18spa
dc.relation.referencesSun, Shiquan ; Hood, Michelle ; Scott, Laura ; Peng, Qinke ; Mukherjee, Sayan ; Tung, Jenny ; Zhou, Xiang: Differential expression analysis for RNAseq using Poisson mixed models. En: Nucleic acids research 45 (2017), Nr. 11, p. e106-e106spa
dc.relation.referencesWang, Tianyu ; Li, Boyang ; Nelson, Craig E. ; Nabavi, Sheida: Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. En: BMC bioinformatics 20 (2019), Nr. 1, p. 1-16spa
dc.relation.referencesWang, Zhong ; Gerstein, Mark ; Snyder, Michael: RNA-Seq: a revolutionary tool for transcriptomics. En: Nature reviews genetics 10 (2009), Nr. 1, p. 57-63spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.licenseAtribución-NoComercial 4.0 Internacionalspa
dc.rights.urihttp://creativecommons.org/licenses/by-nc/4.0/spa
dc.subject.ddc570 - Biología::576 - Genética y evoluciónspa
dc.subject.lembARN MENSAJEROspa
dc.subject.lembRNA, messengereng
dc.subject.lembBASES DE DATOS DE ACIDO NUCLEICOspa
dc.subject.lembDatabases, nucleic acideng
dc.subject.proposalRNA-Seqeng
dc.subject.proposalExpresión diferencialspa
dc.subject.proposalModelos lineales generalizadosspa
dc.subject.proposalPruebas múltiplesspa
dc.subject.proposalDifferential expressioneng
dc.subject.proposalGeneralized Linear Modelseng
dc.subject.proposalMultiple testingeng
dc.titleAnálisis del desempeño de DESeq2 para detección de genes diferencialmente expresados para datos de secuenciación genómicaspa
dc.title.translatedAnalysis of the performance of DESeq2 for the detection of differentially expressed genes for genome sequencing dataeng
dc.typeTrabajo de grado - Maestríaspa
dc.type.coarhttp://purl.org/coar/resource_type/c_bdccspa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.redcolhttp://purl.org/redcol/resource_type/TMspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
dcterms.audience.professionaldevelopmentEstudiantesspa
dcterms.audience.professionaldevelopmentInvestigadoresspa
dcterms.audience.professionaldevelopmentMaestrosspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa

Files

Original bundle

Name: 1031169106.2022.pdf
Size: 6.11 MB
Format: Adobe Portable Document Format
Description: Master's thesis in Sciences - Statistics

License bundle

Name: license.txt
Size: 3.98 KB
Format: Item-specific license agreed upon to submission