R package for estimating parameters of some regression models with or without covariates using TensorFlow

dc.contributor.advisor: Hernández Barajas, Freddy
dc.contributor.author: Garcés Céspedes, Sara
dc.date.accessioned: 2021-11-11T14:49:45Z
dc.date.available: 2021-11-11T14:49:45Z
dc.date.issued: 2021-11-10
dc.description: ilustraciones, diagramas, tablas [spa]
dc.description.abstract: La tarea de estimar parámetros es muy importante tanto en aplicaciones científicas como de industria. El lenguaje de programación R provee una amplia variedad de funciones creadas para encontrar los estimadores de máxima verosimilitud de parámetros de distribuciones y de modelos de regresión. En este trabajo se presenta el paquete estimtf junto con sus principales funciones mle_tf y mlereg_tf. Este paquete fue diseñado con el objetivo de encontrar los estimadores de máxima verosimilitud de parámetros distribucionales y de regresión usando TensorFlow, una librería de código abierto para computación numérica creada por Google. Para alcanzar este objetivo se diseñó un proceso de estimación iterativo en el cual se utilizan los optimizadores incluidos en esta librería para maximizar la función de verosimilitud. Para ilustrar el uso del paquete estimtf y evaluar el desempeño del proceso de estimación, se llevó a cabo un estudio de simulación y se presentaron algunas aplicaciones usando bases de datos reales. A partir del estudio de simulación se observó que el tamaño de muestra, el optimizador seleccionado y el valor inicial de la tasa de aprendizaje afectan las estimaciones obtenidas con las funciones mle_tf y mlereg_tf. Adicionalmente, las estimaciones obtenidas con ambas funciones resultaron muy cercanas a los verdaderos valores de los parámetros y muy similares a las estimaciones obtenidas con otras funciones de R, las cuales son muy populares y comúnmente usadas para la estimación de parámetros. (Texto tomado de la fuente) [spa]
dc.description.abstract: Parameter estimation is important in both scientific and industrial applications. The R programming language provides a wide variety of functions for finding maximum likelihood estimates of the parameters of distributions and regression models. This work presents the estimtf package and its main functions, mle_tf and mlereg_tf. The package was designed to find maximum likelihood estimates of distributional and regression parameters using TensorFlow, an open-source library for numerical computation created by Google. To achieve this goal, an iterative estimation process was designed in which the TensorFlow optimizers are used to maximize the likelihood function. To illustrate the use of the estimtf package and evaluate the performance of the estimation process, a simulation study was performed and some applications using real datasets were presented. The simulation study showed that the sample size, the selected optimizer, and the initial value of the learning rate affect the estimates obtained with the mle_tf and mlereg_tf functions. Additionally, the estimates obtained with both functions were very close to the true values of the parameters and very similar to the estimates obtained with other popular, widely used R functions for parameter estimation. [eng]
dc.description.curriculararea: Área Curricular Estadística [spa]
dc.description.degreelevel: Maestría [spa]
dc.description.degreename: Magíster en Ciencias - Estadística [spa]
dc.format.extent: xv, 106 páginas [spa]
dc.format.mimetype: application/pdf [spa]
dc.identifier.instname: Universidad Nacional de Colombia [spa]
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia [spa]
dc.identifier.repourl: https://repositorio.unal.edu.co/ [spa]
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/80677
dc.language.iso: eng [spa]
dc.publisher: Universidad Nacional de Colombia [spa]
dc.publisher.branch: Universidad Nacional de Colombia - Sede Medellín [spa]
dc.publisher.department: Escuela de estadística [spa]
dc.publisher.faculty: Facultad de Ciencias [spa]
dc.publisher.place: Medellín, Colombia [spa]
dc.publisher.program: Medellín - Ciencias - Maestría en Ciencias - Estadística [spa]
dc.rights.accessrights: info:eu-repo/semantics/openAccess [spa]
dc.rights.license: Atribución-NoComercial 4.0 Internacional [spa]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [spa]
dc.subject.ddc: 510 - Matemáticas::519 - Probabilidades y matemáticas aplicadas [spa]
dc.subject.lemb: Estimación de parámetros
dc.subject.lemb: Parameter estimation
dc.subject.proposal: TensorFlow [eng]
dc.subject.proposal: Estimation of parameters [eng]
dc.subject.proposal: Maximum likelihood [eng]
dc.subject.proposal: Optimization algorithms [eng]
dc.subject.proposal: Estimación de parámetros [spa]
dc.subject.proposal: Máxima verosimilitud [spa]
dc.subject.proposal: Algoritmos de optimización [spa]
dc.title: R package for estimating parameters of some regression models with or without covariates using TensorFlow [eng]
dc.title.translated: Propuesta de un paquete en R para la estimación de parámetros de algunos modelos de regresión con y sin covariables usando TensorFlow [spa]
dc.type: Trabajo de grado - Maestría [spa]
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc [spa]
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa [spa]
dc.type.content: Text [spa]
dc.type.driver: info:eu-repo/semantics/masterThesis [spa]
dc.type.redcol: http://purl.org/redcol/resource_type/TM [spa]
dc.type.version: info:eu-repo/semantics/acceptedVersion [spa]
dcterms.audience.professionaldevelopment: Investigadores [spa]
oaire.accessrights: http://purl.org/coar/access_right/c_abf2 [spa]
