Regresión multivariada robusta: un enfoque para datos con alta dimensionalidad

dc.contributor.advisor: Guevara Gonzalez, Ruben Dario [spa]
dc.contributor.author: Herrera Santana, Juan Fernando [spa]
dc.date.accessioned: 2024-07-18T13:59:28Z
dc.date.available: 2024-07-18T13:59:28Z
dc.date.issued: 2024
dc.description: ilustraciones (algunas a color), diagramas [spa]
dc.description.abstract: La regresión lineal múltiple multivariada es una técnica estadística ampliamente utilizada para modelar las relaciones entre varias variables respuesta y varias variables predictoras. Los métodos tradicionales basados en la verosimilitud pueden producir resultados muy engañosos en presencia de valores atípicos. En este trabajo, proponemos dos métodos robustos de regresión multivariada diseñados para manejar datos con alta dimensionalidad: uno basado en el estimador MRCD, un estimador robusto de localización y dispersión para datos con alta dimensionalidad; y otro enfocado en reducir la dimensionalidad del problema mediante la utilización de la metodología ROSPCA. A través de simulaciones, evaluamos la robustez y eficiencia de los estimadores obtenidos, la capacidad de las metodologías para clasificar correctamente observaciones en conjuntos de datos contaminados, y el costo computacional. Una aplicación con datos reales ilustra el uso de las metodologías propuestas. (Texto tomado de la fuente) [spa]
dc.description.abstract: Multivariate multiple linear regression is a widely used statistical technique for modelling relationships between several response variables and several predictor variables. Traditional likelihood-based methods can produce very misleading results in the presence of outliers. In this work, we propose two robust multivariate regression methods designed to handle high-dimensional data: one based on the minimum regularized covariance determinant (MRCD) estimator, a robust estimator of location and scatter for high-dimensional data; and another based on dimensionality reduction using robust sparse principal component analysis (ROSPCA). Through a simulation study, we evaluate the robustness and efficiency of the resulting estimators, the ability of the methodologies to correctly classify observations in contaminated datasets, and the computational cost. A real data application illustrates the use of the proposed methodologies. (Text taken from the source) [eng]
dc.description.degreelevel: Maestría [spa]
dc.description.degreename: Magíster en Ciencias - Estadística [spa]
dc.format.extent: xv, 125 páginas [spa]
dc.format.mimetype: application/pdf [spa]
dc.identifier.instname: Universidad Nacional de Colombia [spa]
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia [spa]
dc.identifier.repourl: https://repositorio.unal.edu.co/ [spa]
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/86553
dc.publisher: Universidad Nacional de Colombia [spa]
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá [spa]
dc.publisher.faculty: Facultad de Ciencias [spa]
dc.publisher.place: Bogotá, Colombia [spa]
dc.publisher.program: Bogotá - Ciencias - Maestría en Ciencias - Estadística [spa]
dc.rights.accessrights: info:eu-repo/semantics/openAccess [spa]
dc.rights.license: Atribución-NoComercial 4.0 Internacional [spa]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [spa]
dc.subject.ddc: 510 - Matemáticas::519 - Probabilidades y matemáticas aplicadas [spa]
dc.subject.lemb: Análisis multivariante [spa]
dc.subject.lemb: Multivariate analysis [eng]
dc.subject.lemb: Estadística matemática [spa]
dc.subject.lemb: Mathematical statistics [eng]
dc.subject.lemb: Análisis de regresión - Procesamiento de datos [spa]
dc.subject.lemb: Regression analysis - Data processing [eng]
dc.subject.lemb: Teoría de estimación [spa]
dc.subject.lemb: Estimation theory [eng]
dc.subject.lemb: Métodos de simulación [spa]
dc.subject.lemb: Simulation methods [eng]
dc.subject.other: Modelos multinivel (Estadística) [spa]
dc.subject.other: Multilevel models (Statistics) [eng]
dc.subject.other: Estadística robusta [spa]
dc.subject.other: Robust statistics [eng]
dc.subject.proposal: Regresión lineal múltiple multivariada [spa]
dc.subject.proposal: Datos con alta dimensionalidad [spa]
dc.subject.proposal: Estimadores robustos [spa]
dc.subject.proposal: Datos atípicos [spa]
dc.subject.proposal: Multivariate multiple linear regression [eng]
dc.subject.proposal: High-dimensional data [eng]
dc.subject.proposal: Robust estimators [eng]
dc.subject.proposal: Outliers [eng]
dc.title: Regresión multivariada robusta: un enfoque para datos con alta dimensionalidad [spa]
dc.title.translated: Robust multivariate regression: an approach for high-dimensional data [eng]
dc.type: Trabajo de grado - Maestría [spa]
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc [spa]
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa [spa]
dc.type.content: Text [spa]
dc.type.driver: info:eu-repo/semantics/masterThesis [spa]
dc.type.redcol: http://purl.org/redcol/resource_type/TM [spa]
dc.type.version: info:eu-repo/semantics/acceptedVersion [spa]
dcterms.audience.professionaldevelopment: Bibliotecarios [spa]
dcterms.audience.professionaldevelopment: Estudiantes [spa]
dcterms.audience.professionaldevelopment: Investigadores [spa]
dcterms.audience.professionaldevelopment: Maestros [spa]
dcterms.audience.professionaldevelopment: Público general [spa]
oaire.accessrights: http://purl.org/coar/access_right/c_abf2 [spa]

Files

Original bundle

Name: 1076626278.2024.pdf
Size: 11.25 MB
Format: Adobe Portable Document Format
Description: Thesis, Magíster en Ciencias - Estadística