dc.rights.license | Attribution-NonCommercial 4.0 International |
dc.contributor.advisor | López Kleine, Liliana |
dc.contributor.author | Bello Reyes, Nicolás |
dc.date.accessioned | 2022-08-02T14:59:09Z |
dc.date.available | 2022-08-02T14:59:09Z |
dc.date.issued | 2022 |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/81767 |
dc.description | illustrations, graphs |
dc.description.abstract | RNA sequencing methods have greatly accelerated our understanding of biological processes at the molecular level in different organisms. Even so, these methods are costly, which leads to high-dimensional datasets with small sample sizes. DESeq2 is currently one of the most widely used methods for differential expression analysis, and despite its great flexibility in terms of hyper-parameters, in most cases it is run with default values. This work analyzes two important elements of the method: it assesses performance when counts follow a Poisson distribution instead of a negative binomial, and shows how the sensitivity of the method increases under this distribution. Additionally, it contrasts the multiple-testing correction of Benjamini and Hochberg with the proposal of Boca and Leek, and proposes a plot for identifying the functional relationship with the informative covariate. (Text taken from the source) |
dc.format.extent | v, 39 pages |
dc.format.mimetype | application/pdf |
dc.language.iso | spa |
dc.publisher | Universidad Nacional de Colombia |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ |
dc.subject.ddc | 570 - Biology::576 - Genetics and evolution |
dc.title | Análisis del desempeño de DESeq2 para detección de genes diferencialmente expresados para datos de secuenciación genómica |
dc.type | Master's thesis |
dc.type.driver | info:eu-repo/semantics/masterThesis |
dc.type.version | info:eu-repo/semantics/acceptedVersion |
dc.publisher.program | Bogotá - Ciencias - Maestría en Ciencias - Estadística |
dc.description.degreelevel | Master's |
dc.description.degreename | Master of Science - Statistics |
dc.identifier.instname | Universidad Nacional de Colombia |
dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia |
dc.identifier.repourl | https://repositorio.unal.edu.co/ |
dc.publisher.department | Departamento de Estadística |
dc.publisher.faculty | Facultad de Ciencias |
dc.publisher.place | Bogotá, Colombia |
dc.publisher.branch | Universidad Nacional de Colombia - Sede Bogotá |
dc.relation.indexed | RedCol |
dc.relation.indexed | LaReferencia |
dc.relation.references | Al Mahi, Naim ; Begum, Munni: A two-step integrated approach to detect differentially expressed genes in RNA-Seq data. In: Journal of Bioinformatics and Computational Biology 14 (2016), No. 6, p. 1650034 |
dc.relation.references | Anders, Simon ; Huber, Wolfgang: Differential expression analysis for sequence count data. In: Nature Precedings (2010), p. 1-1 |
dc.relation.references | Auer, Paul L. ; Doerge, Rebecca W.: A two-stage Poisson model for testing RNA-seq data. In: Statistical Applications in Genetics and Molecular Biology 10 (2011), No. 1 |
dc.relation.references | Benjamini, Yoav ; Hochberg, Yosef: Controlling the false discovery rate: a practical and powerful approach to multiple testing. In: Journal of the Royal Statistical Society: Series B (Methodological) 57 (1995), No. 1, p. 289-300 |
dc.relation.references | Boca, Simina M. ; Leek, Jeffrey T.: A direct approach to estimating false discovery rates conditional on covariates. In: bioRxiv (2018) |
dc.relation.references | Cheung, Vivian G. ; Nayak, Renuka R. ; Wang, Isabel X. ; Elwyn, Susannah ; Cousins, Sarah M. ; Morley, Michael ; Spielman, Richard S.: Polymorphic cis- and trans-regulation of human gene expression. In: PLoS Biol 8 (2010), No. 9, p. e1000480 |
dc.relation.references | Dillies, Marie-Agnès ; Rau, Andrea ; Aubert, Julie ; Hennequet-Antier, Christelle ; Jeanmougin, Marine ; Servant, Nicolas ; Keime, Céline ; Marot, Guillemette ; Castel, David ; Estelle, Jordi et al.: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. In: Briefings in Bioinformatics 14 (2013), No. 6, p. 671-683 |
dc.relation.references | Gu, Jinghua ; Wang, Xiao ; Hilakivi-Clarke, Leena ; Clarke, Robert ; Xuan, Jianhua: BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data. In: BMC Bioinformatics Vol. 15, Springer, 2014, p. 1-11 |
dc.relation.references | Ignatiadis, Nikolaos ; Klaus, Bernd ; Zaugg, Judith B. ; Huber, Wolfgang: Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. In: Nature Methods 13 (2016), No. 7, p. 577-580 |
dc.relation.references | Korthauer, Keegan ; Kimes, Patrick K. ; Duvallet, Claire ; Reyes, Alejandro ; Subramanian, Ayshwarya ; Teng, Mingxiang ; Shukla, Chinmay ; Alm, Eric J. ; Hicks, Stephanie C.: A practical guide to methods controlling false discoveries in computational biology. In: Genome Biology 20 (2019), No. 1, p. 1-21 |
dc.relation.references | Korthauer, Keegan D. ; Chu, Li-Fang ; Newton, Michael A. ; Li, Yuan ; Thomson, James ; Stewart, Ron ; Kendziorski, Christina: A statistical approach for identifying differential distributions in single-cell RNA-seq experiments. In: Genome Biology 17 (2016), No. 1, p. 1-15 |
dc.relation.references | Lonsdale, John ; Thomas, Jeffrey ; Salvatore, Mike ; Phillips, Rebecca ; Lo, Edmund ; Shad, Saboor ; Hasz, Richard ; Walters, Gary ; Garcia, Fernando ; Young, Nancy et al.: The genotype-tissue expression (GTEx) project. In: Nature Genetics 45 (2013), No. 6, p. 580-585 |
dc.relation.references | Love, Michael ; Huber, W ; Anders, S: Assessment of DESeq2 performance through simulation. In: DESeq2 vignette (2014) |
dc.relation.references | Love, Michael I. ; Huber, Wolfgang ; Anders, Simon: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. In: Genome Biology 15 (2014), No. 12, p. 550 |
dc.relation.references | Oshlack, Alicia ; Robinson, Mark D. ; Young, Matthew D.: From RNA-seq reads to differential expression results. In: Genome Biology 11 (2010), No. 12, p. 220 |
dc.relation.references | Pickrell, Joseph K. ; Marioni, John C. ; Pai, Athma A. ; Degner, Jacob F. ; Engelhardt, Barbara E. ; Nkadori, Everlyne ; Veyrieras, Jean-Baptiste ; Stephens, Matthew ; Gilad, Yoav ; Pritchard, Jonathan K.: Understanding mechanisms underlying human gene expression variation with RNA sequencing. In: Nature 464 (2010), No. 7289, p. 768-772 |
dc.relation.references | Reyes, Alejandro: Count RNA-seq data used for benchmarking FDR control methods. October 2018 |
dc.relation.references | Reyes, Alejandro ; Huber, Wolfgang: Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues. In: Nucleic Acids Research 46 (2018), No. 2, p. 582-592 |
dc.relation.references | Ritchie, Matthew E. ; Phipson, Belinda ; Wu, Di ; Hu, Yifang ; Law, Charity W. ; Shi, Wei ; Smyth, Gordon K.: limma powers differential expression analyses for RNA-sequencing and microarray studies. In: Nucleic Acids Research 43 (2015), No. 7, p. e47-e47 |
dc.relation.references | Robinson, Mark D. ; McCarthy, Davis J. ; Smyth, Gordon K.: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. In: Bioinformatics 26 (2010), No. 1, p. 139-140 |
dc.relation.references | Schuster, Stephan C.: Next-generation sequencing transforms today's biology. In: Nature Methods 5 (2008), No. 1, p. 16-18 |
dc.relation.references | Scott, James G. ; Kelly, Ryan C. ; Smith, Matthew A. ; Zhou, Pengcheng ; Kass, Robert E.: False discovery rate regression: an application to neural synchrony detection in primary visual cortex. In: Journal of the American Statistical Association 110 (2015), No. 510, p. 459-471 |
dc.relation.references | Soneson, Charlotte: compcodeR - an R package for benchmarking differential expression methods for RNA-seq data. In: Bioinformatics 30 (2014), No. 17, p. 2517-2518 |
dc.relation.references | Soneson, Charlotte ; Delorenzi, Mauro: A comparison of methods for differential expression analysis of RNA-seq data. In: BMC Bioinformatics 14 (2013), No. 1, p. 1-18 |
dc.relation.references | Sun, Shiquan ; Hood, Michelle ; Scott, Laura ; Peng, Qinke ; Mukherjee, Sayan ; Tung, Jenny ; Zhou, Xiang: Differential expression analysis for RNAseq using Poisson mixed models. In: Nucleic Acids Research 45 (2017), No. 11, p. e106-e106 |
dc.relation.references | Wang, Tianyu ; Li, Boyang ; Nelson, Craig E. ; Nabavi, Sheida: Comparative analysis of differential gene expression analysis tools for single-cell RNA sequencing data. In: BMC Bioinformatics 20 (2019), No. 1, p. 1-16 |
dc.relation.references | Wang, Zhong ; Gerstein, Mark ; Snyder, Michael: RNA-Seq: a revolutionary tool for transcriptomics. In: Nature Reviews Genetics 10 (2009), No. 1, p. 57-63 |
dc.rights.accessrights | info:eu-repo/semantics/openAccess |
dc.subject.lemb | RNA, messenger |
dc.subject.lemb | Databases, nucleic acid |
dc.subject.proposal | RNA-Seq |
dc.subject.proposal | Differential expression |
dc.subject.proposal | Generalized linear models |
dc.subject.proposal | Multiple testing |
dc.title.translated | Analysis of the performance of DESeq2 for the detection of differentially expressed genes for genome sequencing data |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc |
dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa |
dc.type.content | Text |
dc.type.redcol | http://purl.org/redcol/resource_type/TM |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 |
dcterms.audience.professionaldevelopment | Students |
dcterms.audience.professionaldevelopment | Researchers |
dcterms.audience.professionaldevelopment | Teachers |