An assessment of gene regulatory network inference algorithms
| dc.contributor.advisor | Lopez-Kleine, Liliana | spa |
| dc.contributor.author | Zuur Pedraza, Adrián Guillermo | spa |
| dc.contributor.researchgroup | Grupo de Investigación en Bioinformática y Biología de Sistemas | spa |
| dc.date.accessioned | 2021-02-16T15:26:28Z | spa |
| dc.date.available | 2021-02-16T15:26:28Z | spa |
| dc.date.issued | 2020-07-31 | spa |
| dc.description.abstract | A conceptual issue regarding gene regulatory network (GRN) inference algorithms is establishing their validity or correctness. In this study, we argue that for this purpose it is useful to conceive these algorithms as estimators of graph-valued parameters of explicit models for gene expression data. On this basis, we perform an assessment of a selection of influential GRN inference algorithms as estimators for two types of models: (i) causal graphs with associated structural equations models (SEMs), and (ii) differential equations models based on the thermodynamics of gene expression. Our findings corroborate that networks of marginal dependence fail in estimating GRNs, but they also suggest that the strength of statistical association as measured by mutual information may be indicative of GRN structure. Also, in simulations, we find that the GRN inference algorithms GENIE3 and TIGRESS outperform competing algorithms. However, more importantly, we also find that many observed patterns hinge on the GRN topology and the assumed data generating mechanism. | spa |
| dc.description.abstract | A conceptual problem regarding gene regulatory network (GRN) inference algorithms is how to establish their validity. In this study, we argue that for this purpose it is useful to conceive of these algorithms as estimators of parameters of explicit statistical models for gene expression data. On this basis, we assess a selection of GRN inference algorithms as estimators for two types of models: (i) causal graph models associated with structural equation models (SEMs), and (ii) differential equation models based on the thermodynamics of gene expression. Our findings corroborate that networks of marginal dependencies fail to estimate GRNs, but they also suggest that the strength of statistical association as measured by mutual information may reflect GRN structure to some degree. In addition, in a simulation study, we find that the inference algorithms GENIE3 and TIGRESS perform best. Crucially, however, we also find that many patterns observed in the simulations depend on the GRN topology and on the data-generating model. | spa |
| dc.description.degreelevel | Master's | spa |
| dc.format.extent | 1 online resource (130 pages) | spa |
| dc.format.mimetype | application/pdf | spa |
| dc.identifier.citation | Zuur Pedraza, A. G. (2020). An assessment of gene regulatory network inference algorithms [Master's thesis, Universidad Nacional de Colombia]. Repositorio Institucional. | spa |
| dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/79256 | |
| dc.language.iso | eng | spa |
| dc.publisher.branch | Universidad Nacional de Colombia - Sede Bogotá | spa |
| dc.publisher.department | Departamento de Estadística | spa |
| dc.publisher.program | Bogotá - Ciencias - Maestría en Ciencias - Estadística | spa |
| dc.relation.references | Siddhartha Mukherjee. The gene: an intimate history. Scribner, 2016. | spa |
| dc.relation.references | Greg Elgar and Tanya Vavouri. “Tuning in to the signals: noncoding sequence conservation in vertebrate genomes”. In: Trends in genetics 24.7 (2008), pp. 344–352. | spa |
| dc.relation.references | ENCODE Project Consortium et al. “An integrated encyclopedia of DNA elements in the human genome”. In: Nature 489.7414 (2012), pp. 57–74. | spa |
| dc.relation.references | Vân Anh Huynh-Thu and Guido Sanguinetti. “Gene Regulatory Network Inference: An Introductory Survey”. In: Gene Regulatory Networks: Methods and Protocols. Ed. by Guido Sanguinetti and Vân Anh Huynh-Thu. New York, NY: Springer New York, 2019, pp. 1–23. isbn: 978-1-4939-8882-2. | spa |
| dc.relation.references | Terry Brown. Understanding a Genome Sequence. Wiley-Liss, 2002. Chap. 7. | spa |
| dc.relation.references | Louis M. Staudt. Mouse cDNA Microarray. National Cancer Institute, 2001. url: https://visualsonline.cancer.gov/details.cfm?imageid=1849. | spa |
| dc.relation.references | Jessica C Mar. “The rise of the distributions: why non-normality is important for understanding the transcriptome and beyond”. In: Biophysical reviews 11.1 (2019), pp. 89–94. | spa |
| dc.relation.references | Albert-Laszlo Barabasi and Zoltan N Oltvai. “Network biology: understanding the cell’s functional organization”. In: Nature reviews genetics 5.2 (2004), pp. 101–113. | spa |
| dc.relation.references | Anna D Broido and Aaron Clauset. “Scale-free networks are rare”. In: Nature communications 10.1 (2019), pp. 1–10. | spa |
| dc.relation.references | Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ Newman. “Power-law distributions in empirical data”. In: SIAM review 51.4 (2009), pp. 661–703. | spa |
| dc.relation.references | Romualdo Pastor-Satorras, Eric Smith, and Ricard V Solé. “Evolving protein interaction networks through gene duplication”. In: Journal of Theoretical biology 222.2 (2003), pp. 199–210. | spa |
| dc.relation.references | Robert D Leclerc. “Survival of the sparsest: robust gene networks are parsimonious”. In: Molecular systems biology 4.1 (2008), p. 213. | spa |
| dc.relation.references | Z Burda et al. “Motifs emerge from function in model gene regulatory networks”. In: Proceedings of the National Academy of Sciences 108.42 (2011), pp. 17263–17268. | spa |
| dc.relation.references | Réka Albert, Hawoong Jeong, and Albert-László Barabási. “Error and attack tolerance of complex networks”. In: Nature 406.6794 (2000), pp. 378–382. | spa |
| dc.relation.references | Momoko Otsuka and Sho Tsugawa. “Robustness of network attack strategies against node sampling and link errors”. In: Plos one 14.9 (2019), e0221885. | spa |
| dc.relation.references | Reuven Cohen and Shlomo Havlin. “Scale-free networks are ultrasmall”. In: Physical review letters 90.5 (2003), p. 058701. | spa |
| dc.relation.references | Ulrike Von Luxburg. “A tutorial on spectral clustering”. In: Statistics and computing 17.4 (2007), pp. 395–416. | spa |
| dc.relation.references | M. E. J. Newman. “Modularity and community structure in networks”. In: Proceedings of the National Academy of Sciences 103.23 (2006), pp. 8577–8582. issn: 0027-8424. | spa |
| dc.relation.references | Vincent D Blondel et al. “Fast unfolding of communities in large networks”. In: Journal of statistical mechanics: theory and experiment 2008.10 (2008), P10008. | spa |
| dc.relation.references | Roger B Nelsen. An introduction to copulas. Springer Science & Business Media, 2007. | spa |
| dc.relation.references | Steffen L Lauritzen. Graphical models. Vol. 17. Clarendon Press, 1996. | spa |
| dc.relation.references | Rajen D Shah and Jonas Peters. “The hardness of conditional independence testing and the generalised covariance measure”. In: Annals of Statistics 48.3 (2020), pp. 1514–1538. | spa |
| dc.relation.references | Bin Zhang and Steve Horvath. “A general framework for weighted gene co-expression network analysis”. In: Statistical applications in genetics and molecular biology 4.1 (2005). | spa |
| dc.relation.references | Atul J Butte and Isaac S. Kohane. “Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements.” In: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing (2000), pp. 418–29. | spa |
| dc.relation.references | Kshitij Khare and Bala Rajaratnam. “Sparse matrix decompositions and graph characterizations”. In: Linear algebra and its applications 437.3 (2012), pp. 932–947. | spa |
| dc.relation.references | Adam A. Margolin et al. “ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context”. In: BMC Bioinformatics 7 (2006), S7. | spa |
| dc.relation.references | Patrick E Meyer et al. “Information-theoretic inference of large transcriptional regulatory networks”. In: EURASIP journal on bioinformatics and systems biology. | spa |
| dc.relation.references | Xiujun Zhang et al. “NARROMI: a noise and redundancy reduction technique improves accuracy of gene regulatory network inference”. In: Bioinformatics 29.1 | spa |
| dc.relation.references | Nicolai Meinshausen and Peter Bühlmann. “High-dimensional graphs and variable selection with the lasso”. In: The annals of statistics 34.3 (2006), pp. 1436–1462. | spa |
| dc.relation.references | Anne-Claire Haury et al. “TIGRESS: trustful inference of gene regulation using stability selection”. In: BMC systems biology 6.1 (2012), p. 145. | spa |
| dc.relation.references | Bradley Efron et al. “Least angle regression”. In: The Annals of statistics 32.2 (2004), pp. 407–499. | spa |
| dc.relation.references | Vân Anh Huynh-Thu et al. “Inferring regulatory networks from expression data using tree-based methods”. In: PloS one 5.9 (2010), pp. 1–10. | spa |
| dc.relation.references | Onureena Banerjee, Laurent El Ghaoui, and Alexandre d’Aspremont. “Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data”. In: Journal of Machine learning research 9.Mar (2008), pp. 485–516. | spa |
| dc.relation.references | Jerome Friedman, Trevor Hastie, and Robert Tibshirani. “Sparse inverse covariance estimation with the graphical lasso”. In: Biostatistics 9.3 (2008), pp. 432–441. | spa |
| dc.relation.references | Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search. 2nd ed. The MIT Press, 2000. | spa |
| dc.relation.references | Xiujun Zhang et al. “Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information”. In: Bioinformatics 28.1 (2012), pp. 98–104. | spa |
| dc.relation.references | James M Robins et al. “Uniform consistency in causal inference”. In: Biometrika 90.3 (2003), pp. 491–515. | spa |
| dc.relation.references | Diego Colombo and Marloes H Maathuis. “Order-independent constraint-based causal structure learning”. In: The Journal of Machine Learning Research 15.1 (2014), pp. 3741–3782. | spa |
| dc.relation.references | Jeremiah J Faith et al. “Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles”. In: PLOS Biology 5.1 (Jan. 2007), pp. 1–13. | spa |
| dc.relation.references | Daniel Marbach et al. “Revealing strengths and weaknesses of methods for gene network inference”. In: Proceedings of the national academy of sciences 107.14 (2010), pp. 6286–6291. | spa |
| dc.relation.references | Thomas Schaffter, Daniel Marbach, and Dario Floreano. “GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods”. In: Bioinformatics 27.16 (2011), pp. 2263–2270. | spa |
| dc.relation.references | Daniel Marbach et al. “Wisdom of crowds for robust gene network inference”. In: Nature methods 9.8 (2012), p. 796. | spa |
| dc.relation.references | Adriano V Werhli, Marco Grzegorczyk, and Dirk Husmeier. “Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical Gaussian models and Bayesian networks”. In: Bioinformatics 22.20 (2006), pp. 2523–2531. | spa |
| dc.relation.references | Wenbin Guo et al. “Evaluation and improvement of the regulatory inference for large co-expression networks with limited sample size”. In: BMC systems biology 11.1 (2017), p. 62. | spa |
| dc.relation.references | Jonathan Ish-Horowicz and John Reid. “Mutual information estimation for transcriptional regulatory network inference”. In: bioRxiv (2017). | spa |
| dc.relation.references | Shuonan Chen and Jessica C Mar. “Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data”. In: BMC bioinformatics 19.1 (2018), p. 232. | spa |
| dc.relation.references | Sisi Ma et al. “De-Novo Learning of Genome-Scale Regulatory Networks in S. cerevisiae”. In: PLOS ONE 9.9 (Sept. 2014), pp. 1–20. | spa |
| dc.relation.references | Jeffrey D Allen et al. “Comparing statistical methods for constructing large scale gene networks”. In: PloS one 7.1 (2012). | spa |
| dc.relation.references | Aditya Pratapa et al. “Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data”. In: Nature Methods (2020), pp. 1–8. | spa |
| dc.relation.references | Judea Pearl. Causality. Cambridge University Press, 2009. | spa |
| dc.relation.references | Diego Colombo et al. “Learning high-dimensional directed acyclic graphs with latent and selection variables”. In: The Annals of Statistics (2012), pp. 294–321. | spa |
| dc.relation.references | Stephan Bongers et al. “Theoretical Aspects of Cyclic Structural Causal Models”. In: arXiv.org preprint arXiv:1611.06221v2 [stat.ME] (Aug. 2018). | spa |
| dc.relation.references | Judea Pearl and Thomas Verma. “A theory of inferred causation.” In: KR 91 (1991), pp. 441–452. | spa |
| dc.relation.references | Nancy Cartwright. Hunting causes and using them: Approaches in philosophy and economics. Cambridge University Press, 2007. | spa |
| dc.relation.references | Holly Andersen. “When to expect violations of causal faithfulness and why it matters”. In: Philosophy of Science 80.5 (2013), pp. 672–683. | spa |
| dc.relation.references | Juliane Schäfer and Korbinian Strimmer. “A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics”. In: Statistical applications in genetics and molecular biology 4.1 (2005). | spa |
| dc.relation.references | Hiroyuki Toh and Katsuhisa Horimoto. “Inference of a genetic network by a combined approach of cluster analysis and graphical Gaussian modeling”. In: Bioinformatics 18.2 (2002), pp. 287–297. | spa |
| dc.relation.references | Oliver J Maclaren and Ruanui Nicholson. “What can be estimated? Identifiability, estimability, causal inference and ill-posed inverse problems”. In: arXiv preprint arXiv:1904.02826 (2019). | spa |
| dc.relation.references | Sunyong Kim, Seiya Imoto, and Satoru Miyano. “Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data”. In: Biosystems 75.1-3 (2004), pp. 57–65. | spa |
| dc.relation.references | Bruno-Edouard Perrin et al. “Gene networks inference using dynamic Bayesian networks”. In: Bioinformatics 19.suppl 2 (2003), pp. ii138–ii148. | spa |
| dc.relation.references | Stephan Bongers and Joris M Mooij. “From random differential equations to structural causal models: The stochastic case”. In: arXiv preprint arXiv:1803.08784 (2018). | spa |
| dc.relation.references | Alexander Sokol and Niels Richard Hansen. “Causal interpretation of stochastic differential equations”. In: Electronic Journal of Probability 19.100 (2014), pp. 1–24. | spa |
| dc.relation.references | Paul K Rubenstein et al. “From deterministic ODEs to dynamic structural causal models”. In: arXiv preprint arXiv:1608.08028 (2016). | spa |
| dc.relation.references | Gary K Ackers, Alexander D Johnson, and Madeline A Shea. “Quantitative model for gene regulation by lambda phage repressor”. In: Proceedings of the National Academy of Sciences 79.4 (1982), pp. 1129–1133. | spa |
| dc.relation.references | Lacramioara Bintu et al. “Transcriptional regulation by the numbers: models”. In: Current opinion in genetics & development 15.2 (2005), pp. 116–124. | spa |
| dc.relation.references | Arwen Meister et al. “Learning a nonlinear dynamical system model of gene regulation: A perturbed steady-state approach”. In: The Annals of Applied Statistics 7.3 (2013), pp. 1311–1333. | spa |
| dc.relation.references | William Chad Young, Ka Yee Yeung, and Adrian E Raftery. “Identifying dynamical time series model parameters from equilibrium samples, with application to gene regulatory networks”. In: Statistical Modelling 19.4 (2019), pp. 444–465. | spa |
| dc.relation.references | Lacramioara Bintu et al. “Transcriptional regulation by the numbers: applications”. In: Current opinion in genetics & development 15.2 (2005), pp. 125–135. | spa |
| dc.relation.references | Ruifei Cui et al. “Learning the Causal Structure of Copula Models with Latent Variables.” In: UAI. 2018, pp. 188–197. | spa |
| dc.relation.references | R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria, 2020. | spa |
| dc.relation.references | Ioannis Tsamardinos and Giorgos Borboudakis. “Permutation Testing Improves Bayesian Network Learning”. In: Machine Learning and Knowledge Discovery in Databases. Ed. by José Luis Balcázar et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 322–337. isbn: 978-3-642-15939-8. | spa |
| dc.relation.references | Jonathan Ish-Horowicz. fastGeneMI: A Suite of Mutual Information Estimators used for Gene Regulatory Network Inference from Microarray Expression Data. R package version 1.0. 2018. | spa |
| dc.relation.references | Gabriele Sales and Chiara Romualdi. parmigene: Parallel Mutual Information estimation for Gene Network reconstruction. R package version 1.0.2. 2012. | spa |
| dc.relation.references | Patrick E. Meyer, Frederic Lafitte, and Gianluca Bontempi. “MINET: An open source R/Bioconductor Package for Mutual Information based Network Inference”. In: BMC Bioinformatics 9 (2008). | spa |
| dc.relation.references | Xiujun Zhang. Website of Xiujun Zhang. url: https://sites.google.com/site/xiujunzhangcsb/software/narromi. | spa |
| dc.relation.references | Marco Scutari. “Learning Bayesian Networks with the bnlearn R Package”. In: Journal of Statistical Software 35.3 (2010), pp. 1–22. | spa |
| dc.relation.references | Osiris Ríos et al. “A Boolean network model of human gonadal sex determination”. In: Theoretical Biology and Medical Modelling 12.1 (2015), pp. 1–18. | spa |
| dc.relation.references | Jan Krumsiek et al. “Hierarchical differentiation of myeloid progenitors is encoded in the transcription factor network”. In: PloS one 6.8 (2011), e22649. | spa |
| dc.relation.references | Anna Lovrics et al. “Boolean modelling reveals new regulatory connections between transcription factors orchestrating the development of the ventral spinal cord”. In: PloS one 9.11 (2014), e111430. | spa |
| dc.relation.references | Clare E Giacomantonio and Geoffrey J Goodhill. “A Boolean model of the gene regulatory network underlying Mammalian cortical area development”. In: PLoS Comput Biol 6.9 (2010), e1000936. | spa |
| dc.relation.references | Daniel Marbach et al. “Generating realistic in silico gene networks for performance assessment of reverse engineering methods”. In: Journal of computational biology 16.2 (2009), pp. 229–239. | spa |
| dc.relation.references | Brendan McKay. Graphs. url: http://users.cecs.anu.edu.au/~bdm/data/graphs.html. | spa |
| dc.relation.references | Richard P Stanley. “Acyclic orientations of graphs”. In: Discrete Mathematics 5.2 (1973), pp. 171–178. | spa |
| dc.relation.references | P Hanlon. “The chromatic polynomial of an unlabeled graph”. In: Journal of Combinatorial Theory, Series B 38.3 (1985), pp. 226–239. | spa |
| dc.relation.references | Jeffrey M Wooldridge. Econometric analysis of cross section and panel data. MIT press, 2010. | spa |
| dc.relation.references | Markus Kalisch and Peter Bühlmann. “Estimating high-dimensional directed acyclic graphs with the PC-algorithm”. In: Journal of Machine Learning Research 8.Mar (2007), pp. 613–636. | spa |
| dc.rights | Rights reserved - Universidad Nacional de Colombia | spa |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
| dc.rights.license | Attribution-NonCommercial-NoDerivatives 4.0 International | spa |
| dc.rights.spa | Open access | spa |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | spa |
| dc.subject.ddc | 500 - Natural sciences and mathematics | spa |
| dc.subject.ddc | 570 - Biology | spa |
| dc.subject.ddc | 519 - Probabilities and applied mathematics | spa |
| dc.subject.proposal | Gene regulatory network | eng |
| dc.subject.proposal | Red de regulación génica | spa |
| dc.subject.proposal | Modelo termodinámico | spa |
| dc.subject.proposal | Gene network inference | eng |
| dc.subject.proposal | Gene regulation | eng |
| dc.subject.proposal | Modelo de ecuaciones estructurales | spa |
| dc.subject.proposal | Red de relevancia | spa |
| dc.subject.proposal | Biological network | eng |
| dc.subject.proposal | Red biológica | spa |
| dc.subject.proposal | Relevance network | eng |
| dc.subject.proposal | Inferencia de redes génicas | spa |
| dc.subject.proposal | Structural equations model | eng |
| dc.subject.proposal | Regulación génica | spa |
| dc.subject.proposal | Thermodynamic model | eng |
| dc.title | An assessment of gene regulatory network inference algorithms | spa |
| dc.type | Degree work - Master's | spa |
| dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
| dc.type.content | Text | spa |
| dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
| dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
| oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |

