Comparación de la metodología BART con otros métodos no paramétricos en la construcción de intervalos de predicción

dc.contributor.advisorRamírez Guevara, Isabel Cristina
dc.contributor.authorOsorio Londoño, José Arturo
dc.date.accessioned2024-01-29T19:35:29Z
dc.date.available2024-01-29T19:35:29Z
dc.date.issued2023
dc.description.abstractEn los últimos años, el uso de algoritmos de aprendizaje automático ha experimentado un rápido crecimiento en una amplia variedad de aplicaciones prácticas, así como un gran interés en la investigación teórica. Estas aplicaciones se centran en gran medida en problemas de predicción, donde el valor desconocido de una variable se estima en función de variables conocidas vinculadas a través de alguna función. Estos modelos se han vuelto cruciales en diversos campos, desde la gestión de calidad y el control industrial de procesos hasta la gestión de riesgos y la detección de enfermedades en el ámbito de la salud. A pesar de sus propiedades ventajosas y su popularidad, estos modelos sufren de una desventaja significativa: solo producen predicciones puntuales, sin proporcionar ninguna medida de incertidumbre para estas predicciones. En esta investigación evaluamos la capacidad de los Árboles de Regresión Aditivos Bayesianos (BART) frente a técnicas diseñadas para modelos de Random Forest y Gradient Boosting, heurísticas (predicción conformal) y modelos clásicos como la regresión lineal y la regresión cuantílica, para generar intervalos de predicción. Se realizó un estudio de simulación bajo diferentes escenarios y los métodos se validaron con un conjunto de datos final de aseguramiento de calidad. Los estudios de simulación revelaron que BART puede proporcionar intervalos de predicción (con coberturas del 95% y del 90%) que contienen correctamente el verdadero valor en la mayoría de los casos. En el caso de estudio, BART fue el mejor modelo tanto en la generación de intervalos de predicción como en la precisión de las predicciones. Estos resultados resaltan el potencial de BART como una alternativa significativa para tareas de regresión en áreas críticas, donde se requieren predicciones precisas, modelamiento flexible y medidas de confianza sobre las predicciones. (texto tomado de la fuente)spa
dc.description.abstractIn recent years, the use of machine learning algorithms has rapidly expanded across a wide variety of practical applications and has garnered significant interest in theoretical research. These applications largely focus on prediction problems, where the unknown value of a variable is estimated from known variables linked through some function. Machine learning algorithms have become crucial in diverse domains, ranging from quality management and process control in industrial settings to risk management and disease detection in healthcare. Despite their advantageous properties and popularity, these models suffer from a significant drawback: they only produce point predictions, without any measure of prediction uncertainty. In this research, we assess the capability of Bayesian Additive Regression Trees (BART), compared to techniques designed for Random Forest and Gradient Boosting ensemble models, heuristics (conformal prediction), and classic models such as linear regression and quantile regression, for generating prediction intervals. A simulation study was conducted under various scenarios, and the methods were validated on a final quality-assurance dataset. The simulation studies revealed that BART generates prediction intervals (at the 95% and 90% coverage levels) that correctly contain the true value in most cases. In the case-study validation, BART was the best model both in prediction interval generation and in prediction accuracy. These results highlight BART's potential as a significant alternative for regression tasks in critical areas, where accurate predictions, flexible modeling, and confidence measures on the predictions are imperative.eng
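The record itself contains no code, so the following is a minimal, hypothetical Python sketch (not taken from the thesis) of one of the compared heuristics: split conformal prediction (Lei et al., 2018) wrapped around a Random Forest point predictor, together with the empirical coverage and mean interval width criteria by which the simulation study judges intervals at the 95% level. The synthetic data-generating process, the base learner, and all variable names are illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data standing in for one simulation scenario (illustrative only).
X = rng.uniform(-2.0, 2.0, size=(2000, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=2000)

# Proper training set, calibration set, and held-out test set.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Any point predictor works; a Random Forest mirrors the compared ensembles.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Conformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))

# Finite-sample-corrected quantile of the scores for (1 - alpha) coverage.
alpha = 0.05  # 95% nominal coverage, one of the two levels in the study
n = len(scores)
q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

# Symmetric prediction intervals and the two evaluation criteria:
# empirical coverage of the true values and mean interval width.
pred = model.predict(X_test)
lower, upper = pred - q, pred + q
coverage = np.mean((y_test >= lower) & (y_test <= upper))
width = np.mean(upper - lower)
print(f"empirical coverage: {coverage:.3f}, mean width: {width:.3f}")

# BART intervals would instead be read off the posterior predictive draws at
# each test point; given draws of shape (n_draws, n_test) from a BART sampler:
# lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)

BART's interval construction is sketched only in the closing comment because the exact sampler API varies by package, so none is assumed here.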
dc.description.curricularareaÁrea Curricular Estadísticaspa
dc.description.degreelevelMaestríaspa
dc.description.degreenameMaestría en Ciencias - Estadísticaspa
dc.format.extent69 páginasspa
dc.format.mimetypeapplication/pdfspa
dc.identifier.instnameUniversidad Nacional de Colombiaspa
dc.identifier.reponameRepositorio Institucional Universidad Nacional de Colombiaspa
dc.identifier.repourlhttps://repositorio.unal.edu.co/spa
dc.identifier.urihttps://repositorio.unal.edu.co/handle/unal/85493
dc.language.isospaspa
dc.publisherUniversidad Nacional de Colombiaspa
dc.publisher.branchUniversidad Nacional de Colombia - Sede Medellínspa
dc.publisher.facultyFacultad de Cienciasspa
dc.publisher.placeMedellín, Colombiaspa
dc.publisher.programMedellín - Ciencias - Maestría en Ciencias - Estadísticaspa
dc.relation.referencesAgresti, A. (2015). Foundations of linear and generalized linear models. John Wiley & Sons.spa
dc.relation.referencesAngelopoulos, A. N. & Bates, S. (2021). A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511.spa
dc.relation.referencesBertolini, M., Mezzogori, D., Neroni, M., & Zammori, F. (2021). Machine learning for industrial applications: A comprehensive literature review. Expert Systems with Applications, 175:114820.spa
dc.relation.referencesBogner, K., Pappenberger, F., & Zappa, M. (2019). Machine learning techniques for predicting the energy consumption/production and its uncertainties driven by meteorological observations and forecasts. Sustainability, 11(12):3328.spa
dc.relation.referencesBreiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.spa
dc.relation.referencesChen, T. & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794.spa
dc.relation.referencesChipman, H. A., George, E. I., & McCulloch, R. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1):266–298.spa
dc.relation.referencesChou, J.-S., Chiu, C.-K., Farfoura, M., & Al-Taharwa, I. (2011). Optimizing the prediction accuracy of concrete compressive strength based on a comparison of data-mining techniques. Journal of Computing in Civil Engineering, 25(3):242–253.spa
dc.relation.referencesDe Brabanter, K., De Brabanter, J., Suykens, J. A., & De Moor, B. (2010). Approximate confidence and prediction intervals for least squares support vector regression. IEEE Transactions on Neural Networks, 22(1):110–120.spa
dc.relation.referencesEhsan, B. M. A., Begum, F., Ilham, S. J., & Khan, R. S. (2019). Advanced wind speed prediction using convective weather variables through machine learning application. Applied Computing and Geosciences, 1:100002.spa
dc.relation.referencesFenske, N., Kneib, T., & Hothorn, T. (2011). Identifying risk factors for severe childhood malnutrition by boosting additive quantile regression. Journal of the American Statistical Association, 106(494):494–510.spa
dc.relation.referencesFriedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning, volume 1. Springer Series in Statistics, New York.spa
dc.relation.referencesFriedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232.spa
dc.relation.referencesGeraci, M. & Bottai, M. (2007). Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics, 8(1):140–154.spa
dc.relation.referencesGrinsztajn, L., Oyallon, E., & Varoquaux, G. (2022). Why do tree-based models still outperform deep learning on tabular data? arXiv preprint arXiv:2207.08815.spa
dc.relation.referencesHastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer.spa
dc.relation.referencesHe, J., Wanik, D. W., Hartman, B. M., Anagnostou, E. N., Astitha, M., & Frediani, M. E. (2017). Nonparametric tree-based predictive modeling of storm outages on an electric distribution network. Risk Analysis, 37(3):441–458.spa
dc.relation.referencesHernández, B., Raftery, A. E., Pennington, S. R., & Parnell, A. C. (2018). Bayesian additive regression trees using Bayesian model averaging. Statistics and Computing, 28(4):869–890.spa
dc.relation.referencesHeskes, T. (1996). Practical confidence and prediction intervals. Advances in neural information processing systems, 9.spa
dc.relation.referencesKapelner, A. & Bleich, J. (2013). bartMachine: Machine learning with Bayesian additive regression trees. arXiv preprint arXiv:1312.2171.spa
dc.relation.referencesKhosravi, A., Nahavandi, S., Creighton, D., & Atiya, A. F. (2011). Comprehensive review of neural network-based prediction intervals and new advances. IEEE Transactions on Neural Networks, 22(9):1341–1356.spa
dc.relation.referencesKoenker, R. (2005). Quantile Regression. Econometric Society Monographs. Cambridge University Press.spa
dc.relation.referencesKoenker, R., Portnoy, S., Ng, P. T., Zeileis, A., Grosjean, P., & Ripley, B. D. (2012). Package ‘quantreg’.spa
dc.relation.referencesKumar, S. & Srivastava, A. N. (2012). Bootstrap prediction intervals in non-parametric regression with applications to anomaly detection. In The 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, number ARC-E-DAA-TN6188.spa
dc.relation.referencesLei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., & Wasserman, L. (2018). Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113(523):1094–1111.spa
dc.relation.referencesLei, J., Rinaldo, A., & Wasserman, L. (2015). A conformal prediction approach to explore functional data. Annals of Mathematics and Artificial Intelligence, 74:29–43.spa
dc.relation.referencesLei, J. & Wasserman, L. (2014). Distribution-free prediction bands for non-parametric regression. Journal of the Royal Statistical Society: Series B: Statistical Methodology, pages 71–96.spa
dc.relation.referencesLi, Y., Chen, J., & Feng, L. (2012). Dealing with uncertainty: A survey of theories and practices. IEEE Transactions on Knowledge and Data Engineering, 25(11):2463–2482.spa
dc.relation.referencesMayr, A., Hothorn, T., & Fenske, N. (2012). Prediction intervals for future BMI values of individual children: a non-parametric approach by quantile boosting. BMC Medical Research Methodology, 12(1):6.spa
dc.relation.referencesMeinshausen, N. (2006). Quantile regression forests. Journal of Machine Learning Research, 7(Jun):983–999.spa
dc.relation.referencesMeinshausen, N. (2007). quantregForest: Quantile regression forests. R package version 0.2-2.spa
dc.relation.referencesPevec, D. & Kononenko, I. (2015). Prediction intervals in supervised learning for model evaluation and discrimination. Applied Intelligence, 42(4):790–804.spa
dc.relation.referencesPolikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3):21–45.spa
dc.relation.referencesSchapire, R. E. (2003). The boosting approach to machine learning: An overview. In Nonlinear estimation and classification, pages 149–171. Springer.spa
dc.relation.referencesSchmoyer, R. L. (1992). Asymptotically valid prediction intervals for linear models. Technometrics, 34(4):399–408.spa
dc.relation.referencesSeber, G. A. & Lee, A. J. (2012). Linear regression analysis. John Wiley & Sons.spa
dc.relation.referencesShafer, G. & Vovk, V. (2008). A tutorial on conformal prediction. Journal of Machine Learning Research, 9(Mar):371–421.spa
dc.relation.referencesShehab, M., Abualigah, L., Shambour, Q., Abu-Hashem, M. A., Shambour, M. K. Y., Alsalibi, A. I., & Gandomi, A. H. (2022). Machine learning in medical applications: A review of state-of-the-art methods. Computers in Biology and Medicine, 145:105458.spa
dc.relation.referencesStine, R. A. (1985). Bootstrap prediction intervals for regression. Journal of the American Statistical Association, 80(392):1026–1031.spa
dc.relation.referencesSu, D., Ting, Y. Y., & Ansel, J. (2018). Tight prediction intervals using expanded interval minimization. arXiv preprint arXiv:1806.11222.spa
dc.relation.referencesTan, Y. V. & Roy, J. (2019). Bayesian additive regression trees and the general BART model. Statistics in Medicine, 38(25):5048–5069.spa
dc.relation.referencesYeh, I.-C. (1998). Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797–1808.spa
dc.relation.referencesYu, K. & Moyeed, R. A. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54(4):437–447.spa
dc.relation.referencesZapranis, A. & Livanis, E. (2005). Prediction intervals for neural network models. In Proceedings of the 9th WSEAS International Conference on Computers, page 76. World Scientific and Engineering Academy and Society (WSEAS).spa
dc.relation.referencesZhang, H., Zimmerman, J., Nettleton, D., & Nordman, D. J. (2019). Random forest prediction intervals. The American Statistician, 74(4):392–406.spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.licenseReconocimiento 4.0 Internacionalspa
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/spa
dc.subject.ddc510 - Matemáticas::519 - Probabilidades y matemáticas aplicadasspa
dc.subject.lembAnálisis de regresión
dc.subject.lembTeoría Bayesiana de decisiones estadísticas
dc.subject.proposalÁrboles de regresión aditivos bayesianosspa
dc.subject.proposalBARTeng
dc.subject.proposalmodelos de ensamblespa
dc.subject.proposalintervalos de predicciónspa
dc.subject.proposalestudios de simulaciónspa
dc.subject.proposalensemble modelseng
dc.subject.proposalBayesian Additive Regression Treeseng
dc.subject.proposalprediction intervalseng
dc.subject.proposalstatistical simulationeng
dc.titleComparación de la metodología BART con otros métodos no paramétricos en la construcción de intervalos de predicciónspa
dc.title.translatedComparison of BART methodology with other nonparametric methods in the construction of prediction intervalseng
dc.typeTrabajo de grado - Maestríaspa
dc.type.coarhttp://purl.org/coar/resource_type/c_bdccspa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.redcolhttp://purl.org/redcol/resource_type/TMspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
dcterms.audience.professionaldevelopmentEstudiantesspa
dcterms.audience.professionaldevelopmentInvestigadoresspa
dcterms.audience.professionaldevelopmentMaestrosspa
dcterms.audience.professionaldevelopmentPúblico generalspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa

Archivos

Nombre: 1037631465.2023.pdf
Tamaño: 1009.18 KB
Formato: Adobe Portable Document Format
Descripción: Tesis de Maestría en Ciencias - Estadística
