Comparación de la metodología BART con otros métodos no paramétricos en la construcción de intervalos de predicción

dc.contributor.advisorRamírez Guevara, Isabel Cristina
dc.contributor.authorOsorio Londoño, José Arturo
dc.date.accessioned2024-01-29T19:35:29Z
dc.date.available2024-01-29T19:35:29Z
dc.date.issued2023
dc.description.abstractEn los últimos años, el uso de algoritmos de aprendizaje automático ha experimentado un rápido crecimiento en una amplia variedad de aplicaciones prácticas, así como un gran interés en la investigación teórica. Estas aplicaciones se centran en gran medida en problemas de predicción, donde el valor desconocido de una variable se estima en función de variables conocidas vinculadas a través de alguna función. Estos modelos se han vuelto cruciales en diversos campos, desde la gestión de calidad y el control industrial de procesos hasta la gestión de riesgos y la detección de enfermedades en el ámbito de la salud. A pesar de sus propiedades ventajosas y su popularidad, estos modelos sufren de una desventaja significativa: solo producen predicciones puntuales, sin proporcionar ninguna medida de incertidumbre para estas predicciones. En esta investigación evaluamos la capacidad de los Árboles de Regresión Aditivos Bayesianos (BART) frente a técnicas diseñadas para modelos de Random Forest y Gradient Boosting, heurísticas (predicción conformal) y modelos clásicos como la regresión lineal y la regresión cuantílica, para generar intervalos de predicción. Se realizó un estudio de simulación bajo diferentes escenarios y los métodos se validaron con un conjunto de datos final de aseguramiento de calidad. Los estudios de simulación revelaron que BART puede proporcionar intervalos de predicción (con coberturas del 95% y del 90%) que contienen correctamente el verdadero valor en la mayoría de los casos. En el caso de estudio, BART fue el mejor modelo tanto en la generación de intervalos de predicción como en la precisión de las predicciones. Estos resultados resaltan el potencial de BART como una alternativa significativa para tareas de regresión en áreas críticas, donde se requieren predicciones precisas, modelamiento flexible y medidas de confianza sobre las predicciones. (texto tomado de la fuente)spa
dc.description.abstractIn recent years, the use of machine learning algorithms has rapidly expanded across a wide variety of practical applications and has garnered significant interest in theoretical research. These applications largely focus on prediction problems, where the unknown value of a variable is estimated from known variables linked through some function. Machine learning algorithms have become crucial in diverse domains, ranging from quality management and process control in industrial settings to risk management and disease detection in healthcare. Despite their advantageous properties and popularity, these models suffer from a significant drawback: they only produce point predictions, without any measure of prediction uncertainty. In this research, we assess the capability of Bayesian Additive Regression Trees (BART), compared to techniques designed for Random Forest and Gradient Boosting ensemble models, heuristics (conformal prediction), and classic models such as linear regression and quantile regression, for generating prediction intervals. A simulation study was conducted under various scenarios, and the methods were validated on a final quality-assurance dataset. The simulation studies revealed that BART generates prediction intervals (at the 95% and 90% coverage levels) that correctly contain the true value in most cases. In the case-study validation, BART was the best model both in prediction interval generation and in prediction accuracy. These results highlight BART's potential as a significant alternative for regression tasks in critical areas, where accurate predictions, flexible modeling, and confidence measures on the predictions are imperative.eng
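The record itself contains no code, so the following is a minimal, hypothetical Python sketch (not taken from the thesis) of one of the compared heuristics: split conformal prediction (Lei et al., 2018) wrapped around a Random Forest point predictor, together with the empirical coverage and mean interval width criteria by which the simulation study judges intervals at the 95% level. The synthetic data-generating process, the base learner, and all variable names are illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data standing in for one simulation scenario (illustrative only).
X = rng.uniform(-2.0, 2.0, size=(2000, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=2000)

# Proper training set, calibration set, and held-out test set.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Any point predictor works; a Random Forest mirrors the compared ensembles.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Conformity scores: absolute residuals on the calibration set.
scores = np.abs(y_cal - model.predict(X_cal))

# Finite-sample-corrected quantile of the scores for (1 - alpha) coverage.
alpha = 0.05  # 95% nominal coverage, one of the two levels in the study
n = len(scores)
q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))

# Symmetric prediction intervals and the two evaluation criteria:
# empirical coverage of the true values and mean interval width.
pred = model.predict(X_test)
lower, upper = pred - q, pred + q
coverage = np.mean((y_test >= lower) & (y_test <= upper))
width = np.mean(upper - lower)
print(f"empirical coverage: {coverage:.3f}, mean width: {width:.3f}")

# BART intervals would instead be read off the posterior predictive draws at
# each test point; given draws of shape (n_draws, n_test) from a BART sampler:
# lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)

BART's interval construction is sketched only in the closing comment because the exact sampler API varies by package, so none is assumed here.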
dc.description.curricularareaÁrea Curricular Estadísticaspa
dc.description.degreelevelMaestríaspa
dc.description.degreenameMaestría en Ciencias - Estadísticaspa
dc.format.extent69 páginasspa
dc.format.mimetypeapplication/pdfspa
dc.identifier.instnameUniversidad Nacional de Colombiaspa
dc.identifier.reponameRepositorio Institucional Universidad Nacional de Colombiaspa
dc.identifier.repourlhttps://repositorio.unal.edu.co/spa
dc.identifier.urihttps://repositorio.unal.edu.co/handle/unal/85493
dc.language.isospaspa
dc.publisherUniversidad Nacional de Colombiaspa
dc.publisher.branchUniversidad Nacional de Colombia - Sede Medellínspa
dc.publisher.facultyFacultad de Cienciasspa
dc.publisher.placeMedellín, Colombiaspa
dc.publisher.programMedellín - Ciencias - Maestría en Ciencias - Estadísticaspa
dc.relation.referencesAgresti, A. (2015). Foundations of linear and generalized linear models. John Wiley & Sons.spa
dc.relation.referencesAngelopoulos, A. N. & Bates, S. (2021). A gentle introduction to conformal prediction and distribution-free uncertainty quantification. arXiv preprint arXiv:2107.07511.spa
dc.relation.referencesBertolini, M., Mezzogori, D., Neroni, M., & Zammori, F. (2021). Machine learning for industrial applications: A comprehensive literature review. Expert Systems with Applications, 175:114820.spa
dc.relation.referencesBogner, K., Pappenberger, F., & Zappa, M. (2019). Machine learning techniques for predicting the energy consumption/production and its uncertainties driven by meteorological observations and forecasts. Sustainability, 11(12):3328.spa
dc.relation.referencesBreiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.spa
dc.relation.referencesChen, T. & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794.spa
dc.relation.referencesChipman, H. A., George, E. I., & McCulloch, R. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1):266–298.spa
dc.relation.referencesChou, J.-S., Chiu, C.-K., Farfoura, M., & Al-Taharwa, I. (2011). Optimizing the prediction accuracy of concrete compressive strength based on a comparison of data-mining techniques. Journal of Computing in Civil Engineering, 25(3):242–253.spa
dc.relation.referencesDe Brabanter, K., De Brabanter, J., Suykens, J. A., & De Moor, B. (2010). Approximate confidence and prediction intervals for least squares support vector regression. IEEE Transactions on Neural Networks, 22(1):110–120.spa
dc.relation.referencesEhsan, B. M. A., Begum, F., Ilham, S. J., & Khan, R. S. (2019). Advanced wind speed prediction using convective weather variables through machine learning application. Applied Computing and Geosciences, 1:100002.spa
dc.relation.referencesFenske, N., Kneib, T., & Hothorn, T. (2011). Identifying risk factors for severe childhood malnutrition by boosting additive quantile regression. Journal of the American Statistical Association, 106(494):494–510.spa
dc.relation.referencesFriedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning, volume 1. Springer Series in Statistics, New York.spa
dc.relation.referencesFriedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232.spa
dc.relation.referencesGeraci, M. & Bottai, M. (2007). Quantile regression for longitudinal data using the asymmetric Laplace distribution. Biostatistics, 8(1):140–154.spa
dc.relation.referencesGrinsztajn, L., Oyallon, E., & Varoquaux, G. (2022). Why do tree-based models still outperform deep learning on tabular data? arXiv preprint arXiv:2207.08815.spa
dc.relation.referencesHastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: data mining, inference, and prediction, volume 2. Springer.spa
dc.relation.referencesHe, J., Wanik, D. W., Hartman, B. M., Anagnostou, E. N., Astitha, M., & Frediani, M. E. (2017). Nonparametric tree-based predictive modeling of storm outages on an electric distribution network. Risk Analysis, 37(3):441–458.spa
dc.relation.referencesHernández, B., Raftery, A. E., Pennington, S. R., & Parnell, A. C. (2018). Bayesian additive regression trees using Bayesian model averaging. Statistics and Computing, 28(4):869–890.spa
dc.relation.referencesHeskes, T. (1996). Practical confidence and prediction intervals. Advances in neural information processing systems, 9.spa
dc.relation.referencesKapelner, A. & Bleich, J. (2013). bartMachine: Machine learning with Bayesian additive regression trees. arXiv preprint arXiv:1312.2171.spa
dc.relation.referencesKhosravi, A., Nahavandi, S., Creighton, D., & Atiya, A. F. (2011). Comprehensive review of neural network-based prediction intervals and new advances. IEEE Transactions on Neural Networks, 22(9):1341–1356.spa
dc.relation.referencesKoenker, R. (2005). Quantile Regression. Econometric Society Monographs. Cambridge University Press.spa
dc.relation.referencesKoenker, R., Portnoy, S., Ng, P. T., Zeileis, A., Grosjean, P., & Ripley, B. D. (2012). Package ‘quantreg’.spa
dc.relation.referencesKumar, S. & Srivastava, A. N. (2012). Bootstrap prediction intervals in non-parametric regression with applications to anomaly detection. In The 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, number ARC-E-DAA-TN6188.spa
dc.relation.referencesLei, J., G’Sell, M., Rinaldo, A., Tibshirani, R. J., & Wasserman, L. (2018). Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113(523):1094–1111.spa
dc.relation.referencesLei, J., Rinaldo, A., & Wasserman, L. (2015). A conformal prediction approach to explore functional data. Annals of Mathematics and Artificial Intelligence, 74:29–43.spa
dc.relation.referencesLei, J. & Wasserman, L. (2014). Distribution-free prediction bands for non-parametric regression. Journal of the Royal Statistical Society: Series B: Statistical Methodology, pages 71–96.spa
dc.relation.referencesLi, Y., Chen, J., & Feng, L. (2012). Dealing with uncertainty: A survey of theories and practices. IEEE Transactions on Knowledge and Data Engineering, 25(11):2463–2482.spa
dc.relation.referencesMayr, A., Hothorn, T., & Fenske, N. (2012). Prediction intervals for future BMI values of individual children: a non-parametric approach by quantile boosting. BMC Medical Research Methodology, 12(1):6.spa
dc.relation.referencesMeinshausen, N. (2006). Quantile regression forests. Journal of Machine Learning Research, 7(Jun):983–999.spa
dc.relation.referencesMeinshausen, N. (2007). quantregForest: Quantile regression forests. R package version 0.2-2.spa
dc.relation.referencesPevec, D. & Kononenko, I. (2015). Prediction intervals in supervised learning for model evaluation and discrimination. Applied Intelligence, 42(4):790–804.spa
dc.relation.referencesPolikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3):21–45.spa
dc.relation.referencesSchapire, R. E. (2003). The boosting approach to machine learning: An overview. In Nonlinear estimation and classification, pages 149–171. Springer.spa
dc.relation.referencesSchmoyer, R. L. (1992). Asymptotically valid prediction intervals for linear models. Technometrics, 34(4):399–408.spa
dc.relation.referencesSeber, G. A. & Lee, A. J. (2012). Linear regression analysis. John Wiley & Sons.spa
dc.relation.referencesShafer, G. & Vovk, V. (2008). A tutorial on conformal prediction. Journal of Machine Learning Research, 9(Mar):371–421.spa
dc.relation.referencesShehab, M., Abualigah, L., Shambour, Q., Abu-Hashem, M. A., Shambour, M. K. Y., Alsalibi, A. I., & Gandomi, A. H. (2022). Machine learning in medical applications: A review of state-of-the-art methods. Computers in Biology and Medicine, 145:105458.spa
dc.relation.referencesStine, R. A. (1985). Bootstrap prediction intervals for regression. Journal of the American Statistical Association, 80(392):1026–1031.spa
dc.relation.referencesSu, D., Ting, Y. Y., & Ansel, J. (2018). Tight prediction intervals using expanded interval minimization. arXiv preprint arXiv:1806.11222.spa
dc.relation.referencesTan, Y. V. & Roy, J. (2019). Bayesian additive regression trees and the general BART model. Statistics in Medicine, 38(25):5048–5069.spa
dc.relation.referencesYeh, I.-C. (1998). Modeling of strength of high-performance concrete using artificial neural networks. Cement and Concrete Research, 28(12):1797–1808.spa
dc.relation.referencesYu, K. & Moyeed, R. A. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54(4):437–447.spa
dc.relation.referencesZapranis, A. & Livanis, E. (2005). Prediction intervals for neural network models. In Proceedings of the 9th WSEAS International Conference on Computers, page 76. World Scientific and Engineering Academy and Society (WSEAS).spa
dc.relation.referencesZhang, H., Zimmerman, J., Nettleton, D., & Nordman, D. J. (2019). Random forest prediction intervals. The American Statistician, 74(4):392–406.spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.licenseReconocimiento 4.0 Internacionalspa
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/spa
dc.subject.ddc510 - Matemáticas::519 - Probabilidades y matemáticas aplicadasspa
dc.subject.lembAnálisis de regresión
dc.subject.lembTeoría Bayesiana de decisiones estadísticas
dc.subject.proposalÁrboles de regresión aditivos bayesianosspa
dc.subject.proposalBARTeng
dc.subject.proposalmodelos de ensamblespa
dc.subject.proposalintervalos de predicciónspa
dc.subject.proposalestudios de simulaciónspa
dc.subject.proposalensemble modelseng
dc.subject.proposalBayesian Additive Regression Treeseng
dc.subject.proposalprediction intervalseng
dc.subject.proposalstatistical simulationeng
dc.titleComparación de la metodología BART con otros métodos no paramétricos en la construcción de intervalos de predicciónspa
dc.title.translatedComparison of BART methodology with other nonparametric methods in the construction of prediction intervalseng
dc.typeTrabajo de grado - Maestríaspa
dc.type.coarhttp://purl.org/coar/resource_type/c_bdccspa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.redcolhttp://purl.org/redcol/resource_type/TMspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
dcterms.audience.professionaldevelopmentEstudiantesspa
dcterms.audience.professionaldevelopmentInvestigadoresspa
dcterms.audience.professionaldevelopmentMaestrosspa
dcterms.audience.professionaldevelopmentPúblico generalspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa

Archivos

Nombre: 1037631465.2023.pdf
Tamaño: 1009.18 KB
Formato: Adobe Portable Document Format
Descripción: Tesis de Maestría en Ciencias - Estadística
