R package for estimating parameters of some regression models with or without covariates using TensorFlow

Garcés Céspedes, Sara


dc.rights.license	Attribution-NonCommercial 4.0 International
dc.contributor.advisor	Hernández Barajas, Freddy
dc.contributor.author	Garcés Céspedes, Sara
dc.date.accessioned	2021-11-11T14:49:45Z
dc.date.available	2021-11-11T14:49:45Z
dc.date.issued	2021-11-10
dc.identifier.uri	https://repositorio.unal.edu.co/handle/unal/80677
dc.description	illustrations, diagrams, tables
dc.description.abstract	Parameter estimation is an important task in both scientific and industrial applications. The R programming language provides a wide variety of functions for finding the maximum likelihood estimates of the parameters of distributions and regression models. This work presents the estimtf package and its main functions, mle_tf and mlereg_tf. The package was designed to find the maximum likelihood estimates of distributional and regression parameters using TensorFlow, an open-source library for numerical computation created by Google. To achieve this goal, an iterative estimation process was designed in which the TensorFlow optimizers are used to maximize the likelihood function. To illustrate the use of the estimtf package and evaluate the performance of the estimation process, a simulation study was carried out and some applications using real datasets were presented. The simulation study showed that the sample size, the selected optimizer, and the initial value of the learning rate affect the estimates obtained with the mle_tf and mlereg_tf functions. Additionally, the estimates obtained with both functions were very close to the true values of the parameters and very similar to the estimates obtained with other popular R functions that are widely used for parameter estimation. (Text taken from the source)
dc.format.extent	xv, 106 pages
dc.format.mimetype	application/pdf
dc.language.iso	eng
dc.publisher	Universidad Nacional de Colombia
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.ddc	510 - Mathematics::519 - Probabilities and applied mathematics
dc.title	R package for estimating parameters of some regression models with or without covariates using TensorFlow
dc.type	Degree thesis - Master's
dc.type.driver	info:eu-repo/semantics/masterThesis
dc.type.version	info:eu-repo/semantics/acceptedVersion
dc.publisher.program	Medellín - Sciences - Master of Science - Statistics
dc.description.degreelevel	Master's
dc.description.degreename	Master of Science - Statistics
dc.identifier.instname	Universidad Nacional de Colombia
dc.identifier.reponame	Repositorio Institucional Universidad Nacional de Colombia
dc.identifier.repourl	https://repositorio.unal.edu.co/
dc.publisher.department	School of Statistics
dc.publisher.faculty	Faculty of Sciences
dc.publisher.place	Medellín, Colombia
dc.publisher.branch	Universidad Nacional de Colombia - Medellín Campus
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.subject.lemb	Parameter estimation
dc.subject.proposal	TensorFlow
dc.subject.proposal	Estimation of parameters
dc.subject.proposal	Maximum likelihood
dc.subject.proposal	Optimization algorithms
dc.title.translated	Propuesta de un paquete en R para la estimación de parámetros de algunos modelos de regresión con y sin covariables usando TensorFlow
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content	Text
dc.type.redcol	http://purl.org/redcol/resource_type/TM
oaire.accessrights	http://purl.org/coar/access_right/c_abf2
dcterms.audience.professionaldevelopment	Researchers
dc.description.curriculararea	Statistics Curricular Area

Files in this item

Name: 1037643159.2021.pdf
Size: 1.047Mb
Format: PDF
Description: Master of Science thesis - ...


This item appears in the following collection(s)

Master of Science - Statistics [134]


Attribution-NonCommercial 4.0 International

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license. This document has been deposited by the author(s) under the following deposit statement