An assessment of gene regulatory network inference algorithms

dc.contributor.advisor: Lopez-Kleine, Liliana [spa]
dc.contributor.author: Zuur Pedraza, Adrián Guillermo [spa]
dc.contributor.researchgroup: Grupo de Investigación en Bioinformática y Biología de Sistemas [spa]
dc.date.accessioned: 2021-02-16T15:26:28Z [spa]
dc.date.available: 2021-02-16T15:26:28Z [spa]
dc.date.issued: 2020-07-31 [spa]
dc.description.abstract: A conceptual issue regarding gene regulatory network (GRN) inference algorithms is establishing their validity or correctness. In this study, we argue that for this purpose it is useful to conceive these algorithms as estimators of graph-valued parameters of explicit models for gene expression data. On this basis, we perform an assessment of a selection of influential GRN inference algorithms as estimators for two types of models: (i) causal graphs with associated structural equations models (SEMs), and (ii) differential equations models based on the thermodynamics of gene expression. Our findings corroborate that networks of marginal dependence fail in estimating GRNs, but they also suggest that the strength of statistical association as measured by mutual information may be indicative of GRN structure. Also, in simulations, we find that the GRN inference algorithms GENIE3 and TIGRESS outperform competing algorithms. However, more importantly, we also find that many observed patterns hinge on the GRN topology and the assumed data generating mechanism. [spa]
dc.description.abstract: A conceptual problem regarding gene regulatory network (GRN) inference algorithms is how to establish their validity. In this study, we argue that for this purpose it is useful to conceive these algorithms as estimators of parameters of explicit statistical models for gene expression data. On this basis, we perform an assessment of a selection of GRN inference algorithms as estimators for two types of models: (i) causal graph models associated with structural equations models (SEMs), and (ii) differential equations models based on the thermodynamics of gene expression. Our findings corroborate that networks of marginal dependencies fail to estimate GRNs, but they also suggest that the strength of statistical association as measured by mutual information may reflect GRN structure to some degree. In addition, in a simulation study, we find that the inference algorithms GENIE3 and TIGRESS perform best. Crucially, however, we also find that many patterns observed in the simulations depend on the GRN topology and on the data-generating model. [spa]
dc.description.degreelevel: Master's [spa]
dc.format.extent: 1 online resource (130 pages) [spa]
dc.format.mimetype: application/pdf [spa]
dc.identifier.citation: Zuur Pedraza, A. G. (2020). An assessment of gene regulatory network inference algorithms [Master's thesis, Universidad Nacional de Colombia]. Repositorio Institucional. [spa]
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/79256
dc.language.iso: eng [spa]
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá [spa]
dc.publisher.department: Departamento de Estadística [spa]
dc.publisher.program: Bogotá - Ciencias - Maestría en Ciencias - Estadística [spa]
dc.rights: All rights reserved - Universidad Nacional de Colombia [spa]
dc.rights.accessrights: info:eu-repo/semantics/openAccess [spa]
dc.rights.license: Attribution-NonCommercial-NoDerivatives 4.0 International [spa]
dc.rights.spa: Open access [spa]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [spa]
dc.subject.ddc: 500 - Natural sciences and mathematics [spa]
dc.subject.ddc: 570 - Biology [spa]
dc.subject.ddc: 519 - Probabilities and applied mathematics [spa]
dc.subject.proposal: Gene regulatory network [eng]
dc.subject.proposal: Gene network inference [eng]
dc.subject.proposal: Gene regulation [eng]
dc.subject.proposal: Biological network [eng]
dc.subject.proposal: Relevance network [eng]
dc.subject.proposal: Structural equations model [eng]
dc.subject.proposal: Thermodynamic model [eng]
dc.title: An assessment of gene regulatory network inference algorithms [spa]
dc.type: Master's thesis [spa]
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc [spa]
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa [spa]
dc.type.content: Text [spa]
dc.type.driver: info:eu-repo/semantics/masterThesis [spa]
dc.type.version: info:eu-repo/semantics/acceptedVersion [spa]
oaire.accessrights: http://purl.org/coar/access_right/c_abf2 [spa]