Regresión multivariada robusta: un enfoque para datos con alta dimensionalidad
dc.contributor.advisor | Guevara Gonzalez, Ruben Dario | spa |
dc.contributor.author | Herrera Santana, Juan Fernando | spa |
dc.date.accessioned | 2024-07-18T13:59:28Z | |
dc.date.available | 2024-07-18T13:59:28Z | |
dc.date.issued | 2024 | |
dc.description | ilustraciones (algunas a color), diagramas | spa |
dc.description.abstract | La regresión lineal múltiple multivariada es una técnica estadística ampliamente utilizada para modelar las relaciones entre varias variables respuesta y varias variables predictoras. Los métodos tradicionales basados en la verosimilitud pueden producir resultados muy engañosos en presencia de valores atípicos. En este trabajo, proponemos dos métodos robustos de regresión multivariada diseñados para manejar datos con alta dimensionalidad: uno basado en el estimador MRCD, un estimador robusto de localización y dispersión para datos con alta dimensionalidad; y otro enfocado en reducir la dimensionalidad del problema mediante la utilización de la metodología ROSPCA. A través de simulaciones, evaluamos la robustez y eficiencia de los estimadores obtenidos, la capacidad de las metodologías para clasificar correctamente observaciones en conjuntos de datos contaminados, y el costo computacional. Una aplicación con datos reales ilustra el uso de las metodologías propuestas. (Texto tomado de la fuente) | spa |
dc.description.abstract | Multivariate multiple linear regression is a widely used statistical technique for modelling relationships between several response variables and several predictor variables. Traditional likelihood-based methods can produce very misleading results in the presence of outliers. In this work, we propose two robust multivariate regression methods designed to handle high-dimensional data: one based on the minimum regularized covariance determinant (MRCD) estimator, a robust estimator of location and scatter for high-dimensional data; and another based on dimensionality reduction using robust sparse principal component analysis (ROSPCA). Through a simulation study, we evaluate the robustness and efficiency of the resulting estimators, the ability of the methodologies to correctly classify observations in contaminated datasets, and the computational cost. A real data application illustrates the use of the proposed methodologies. (Text taken from the source) | eng |
dc.description.degreelevel | Maestría | spa |
dc.description.degreename | Magíster en Ciencias - Estadística | spa |
dc.format.extent | xv, 125 páginas | spa |
dc.format.mimetype | application/pdf | spa |
dc.identifier.instname | Universidad Nacional de Colombia | spa |
dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/86553 | |
dc.publisher | Universidad Nacional de Colombia | spa |
dc.publisher.branch | Universidad Nacional de Colombia - Sede Bogotá | spa |
dc.publisher.faculty | Facultad de Ciencias | spa |
dc.publisher.place | Bogotá, Colombia | spa |
dc.publisher.program | Bogotá - Ciencias - Maestría en Ciencias - Estadística | spa |
dc.relation.references | Aylin Alin and Claudio Agostinelli. Robust iteratively reweighted SIMPLS. Journal of Chemometrics, 31(3):e2881, 2017. | spa |
dc.relation.references | Kris Boudt, Peter J Rousseeuw, Steven Vanduffel, and Tim Verdonck. The minimum regularized covariance determinant estimator. Statistics and Computing, 30(1):113--128, 2020. | spa |
dc.relation.references | Hasan Bulut. Mahalanobis distance based on minimum regularized covariance determinant estimators for high dimensional data. Communications in Statistics-Theory and Methods, 49(24):5897--5907, 2020. | spa |
dc.relation.references | Le Chang and AH Welsh. Robust multivariate lasso regression with covariance estimation. Journal of Computational and Graphical Statistics, pages 1--13, 2022. | spa |
dc.relation.references | P Laurie Davies. Asymptotic behaviour of S-estimates of multivariate location parameters and dispersion matrices. The Annals of Statistics, pages 1269--1292, 1987. | spa |
dc.relation.references | Sijmen De Jong. SIMPLS: an alternative approach to partial least squares regression. Chemometrics and intelligent laboratory systems, 18(3):251--263, 1993. | spa |
dc.relation.references | Jasper Engel, Lutgarde Buydens, and Lionel Blanchet. An overview of large-dimensional covariance and precision matrix estimators with applications in chemometrics. Journal of Chemometrics, 31(4):e2880, 2017. | spa |
dc.relation.references | Peter Filzmoser and Klaus Nordhausen. Robust linear regression for high-dimensional data: An overview. Wiley Interdisciplinary Reviews: Computational Statistics, 13(4):e1524, 2021. | spa |
dc.relation.references | Arthur E Hoerl and Robert W Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55--67, 1970. | spa |
dc.relation.references | Peter J Huber. Robust estimation of a location parameter. In Breakthroughs in statistics: Methodology and distribution, pages 492--518. Springer, 1992. | spa |
dc.relation.references | Mia Hubert. Robust methods for high-dimensional data. Comprehensive Chemometrics (Issue June). https://doi.org/10.1016/b978-0-12-409547-2.14883-8, 2020. | spa |
dc.relation.references | Mia Hubert and Michiel Debruyne. Minimum covariance determinant. Wiley interdisciplinary Reviews: Computational Statistics, 2(1):36--43, 2010. | spa |
dc.relation.references | Mia Hubert, Tom Reynkens, Eric Schmitt, and Tim Verdonck. Sparse PCA for high-dimensional data with outliers. Technometrics, 58(4):424--434, 2016. | spa |
dc.relation.references | Mia Hubert, Peter J Rousseeuw, and Karlien Vanden Branden. ROBPCA: a new approach to robust principal component analysis. Technometrics, 47(1):64--79, 2005. | spa |
dc.relation.references | Mia Hubert, Peter J Rousseeuw, and Tim Verdonck. A deterministic algorithm for robust location and scatter. Journal of Computational and Graphical Statistics, 21(3):618--637, 2012. | spa |
dc.relation.references | Mia Hubert and Sabine Verboven. A robust PCR method for high-dimensional regressors. Journal of Chemometrics: A Journal of the Chemometrics Society, 17(8-9):438--452, 2003. | spa |
dc.relation.references | Richard Arnold Johnson, Dean W Wichern, et al. Applied multivariate statistical analysis. 2002. | spa |
dc.relation.references | John T Kent and David E Tyler. Constrained M-estimation for multivariate location and scatter. The Annals of Statistics, 24(3):1346--1370, 1996. | spa |
dc.relation.references | Hendrik P Lopuhaa. On the relation between S-estimators and M-estimators of multivariate location and covariance. The Annals of Statistics, pages 1662--1683, 1989. | spa |
dc.relation.references | Ricardo Antonio Maronna. Robust M-estimators of multivariate location and scatter. The Annals of Statistics, pages 51--67, 1976. | spa |
dc.relation.references | Brian G Osborne, Thomas Fearn, Andrew R Miller, and Stuart Douglas. Application of near infrared reflectance spectroscopy to the compositional analysis of biscuits and biscuit doughs. Journal of the Science of Food and Agriculture, 35(1):99--105, 1984. | spa |
dc.relation.references | AHM Rahmatullah Imon. Identifying multiple influential observations in linear regression. Journal of Applied Statistics, 32(9):929--946, 2005. | spa |
dc.relation.references | Peter J Rousseeuw. Least median of squares regression. Journal of the American Statistical association, 79(388):871--880, 1984. | spa |
dc.relation.references | Peter J Rousseeuw. Multivariate estimation with high breakdown point. Mathematical statistics and applications, 8(283-297):37, 1985. | spa |
dc.relation.references | Peter J Rousseeuw and Christophe Croux. Alternatives to the median absolute deviation. Journal of the American Statistical association, 88(424):1273--1283, 1993. | spa |
dc.relation.references | Peter J Rousseeuw and Katrien Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212--223, 1999. | spa |
dc.relation.references | Peter J Rousseeuw and Annick M Leroy. Robust regression and outlier detection. John Wiley & Sons, 2005. | spa |
dc.relation.references | Peter J Rousseeuw, Stefan Van Aelst, Katrien Van Driessen, and Jose A Gulló. Robust multivariate regression. Technometrics, 46(3):293--305, 2004. | spa |
dc.relation.references | Peter J Rousseeuw and Bert C Van Zomeren. Unmasking multivariate outliers and leverage points. Journal of the American Statistical association, 85(411):633--639, 1990. | spa |
dc.relation.references | Yuliana Susanti, Hasih Pratiwi, Sri Sulistijowati, Twenty Liana, et al. M estimation, S estimation, and MM estimation in robust regression. International Journal of Pure and Applied Mathematics, 91(3):349--360, 2014. | spa |
dc.relation.references | Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58(1):267--288, 1996. | spa |
dc.relation.references | Stefan Van Aelst and Peter Rousseeuw. Minimum volume ellipsoid. Wiley Interdisciplinary Reviews: Computational Statistics, 1(1):71--82, 2009. | spa |
dc.relation.references | Siti Zahariah and Habshah Midi. Minimum regularized covariance determinant and principal component analysis-based method for the identification of high leverage points in high dimensional sparse data. Journal of Applied Statistics, pages 1--19, 2022. | spa |
dc.relation.references | Siti Zahariah, Habshah Midi, and Mohd Shafie Mustafa. An improvised SIMPLS estimator based on MRCD-PCA weighting function and its application to real data. Symmetry, 13(11):2211, 2021. | spa |
dc.relation.references | Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2):301--320, 2005. | spa |
dc.relation.references | Hui Zou, Trevor Hastie, and Robert Tibshirani. Sparse principal component analysis. Journal of computational and graphical statistics, 15(2):265--286, 2006. | spa |
dc.relation.references | Yijun Zuo and Hengjian Cui. Depth weighted scatter estimators. 2005. | spa |
dc.relation.references | Yijun Zuo, Hengjian Cui, and Xuming He. On the Stahel-Donoho estimator and depth-weighted means of multivariate data. The Annals of Statistics, pages 167--188, 2004. | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.rights.license | Atribución-NoComercial 4.0 Internacional | spa |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | spa |
dc.subject.ddc | 510 - Matemáticas::519 - Probabilidades y matemáticas aplicadas | spa |
dc.subject.lemb | Análisis multivariante | spa |
dc.subject.lemb | Multivariate analysis | eng |
dc.subject.lemb | Estadística matemática | spa |
dc.subject.lemb | Mathematical statistics | eng |
dc.subject.lemb | Análisis de regresión - Procesamiento de datos | spa |
dc.subject.lemb | Regression analysis - Data processing | eng |
dc.subject.lemb | Teoría de estimación | spa |
dc.subject.lemb | Estimation theory | eng |
dc.subject.lemb | Métodos de simulación | spa |
dc.subject.lemb | Simulation methods | eng |
dc.subject.other | Modelos multinivel (Estadística) | spa |
dc.subject.other | Multilevel models (Statistics) | eng |
dc.subject.other | Estadística robusta | spa |
dc.subject.other | Robust statistics | eng |
dc.subject.proposal | Regresión lineal múltiple multivariada | spa |
dc.subject.proposal | Datos con alta dimensionalidad | spa |
dc.subject.proposal | Estimadores robustos | spa |
dc.subject.proposal | Datos atípicos | spa |
dc.subject.proposal | Multivariate multiple linear regression | eng |
dc.subject.proposal | High-dimensional data | eng |
dc.subject.proposal | Robust estimators | eng |
dc.subject.proposal | Outliers | eng |
dc.title | Regresión multivariada robusta: un enfoque para datos con alta dimensionalidad | spa |
dc.title.translated | Robust multivariate regression: an approach for high-dimensional data | eng |
dc.type | Trabajo de grado - Maestría | spa |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
dc.type.content | Text | spa |
dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
dc.type.redcol | http://purl.org/redcol/resource_type/TM | spa |
dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
dcterms.audience.professionaldevelopment | Bibliotecarios | spa |
dcterms.audience.professionaldevelopment | Estudiantes | spa |
dcterms.audience.professionaldevelopment | Investigadores | spa |
dcterms.audience.professionaldevelopment | Maestros | spa |
dcterms.audience.professionaldevelopment | Público general | spa |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
Files
Original bundle
- Name: 1076626278.2024.pdf
- Size: 11.25 MB
- Format: Adobe Portable Document Format
- Description: Thesis for the degree of Magíster en Ciencias - Estadística