Comparación de metodologías utilizadas para abordar el problema de datos faltantes en estudios longitudinales
dc.contributor.advisor | Mazo Lopera, Mauricio Alejandro | |
dc.contributor.author | Viloria Rodriguez, Andres Felipe | |
dc.date.accessioned | 2024-07-31T16:49:15Z | |
dc.date.available | 2024-07-31T16:49:15Z | |
dc.date.issued | 2024 | |
dc.description | Illustrations, tables | eng |
dc.description.abstract | This study compared several methodologies used in the literature to handle missing data in longitudinal studies. Three missing-data mechanisms described by Rubin (1976) were considered: Missing Completely At Random (MCAR), Missing At Random (MAR), and Not Missing At Random (NMAR). Two longitudinal datasets from the Colombian State's Open Data Portal were used, and each was fitted with linear mixed models. Missing data were then generated under each of the three mechanisms, and different imputation methods were applied. The models fitted to the imputed data were compared using the Root Mean Squared Error (RMSE), and the Last Observation Carried Forward (LOCF) method performed best in most cases. In addition, the variability in model accuracy by department was analyzed, and significant differences were found between the imputation methods. The study concludes that the choice of imputation method can affect the interpretation of model results, and it recommends that future research explore other imputation methods and consider the impact of imputation on the accuracy of model predictions in longitudinal studies. In summary, the study highlights the importance of handling missing data carefully and selecting the most appropriate imputation method to obtain accurate and reliable results in longitudinal studies. | eng |
dc.description.curriculararea | Statistics Curricular Area | eng |
dc.description.degreelevel | Master's | eng |
dc.description.degreename | Magíster en Ciencias - Estadística | spa |
dc.format.extent | 97 pages | eng |
dc.format.mimetype | application/pdf | spa |
dc.identifier.instname | Universidad Nacional de Colombia | spa |
dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/86666 | |
dc.language.iso | spa | spa |
dc.publisher | Universidad Nacional de Colombia | spa |
dc.publisher.branch | Universidad Nacional de Colombia - Sede Medellín | spa |
dc.publisher.faculty | Facultad de Ciencias | spa |
dc.publisher.place | Medellín, Colombia | spa |
dc.publisher.program | Medellín - Ciencias - Maestría en Ciencias - Estadística | spa |
dc.relation.references | West, B. T., Welch, K. B., & Galecki, A. T. (2022). Linear mixed models: A practical guide using statistical software. CRC Press. | spa |
dc.relation.references | Verbyla, A. P. (1990). A conditional derivation of residual maximum likelihood. Australian Journal of Statistics, 32(2), 227–230. | spa |
dc.relation.references | Verbeke, G., & Molenberghs, G. (1997). Linear mixed models for longitudinal data. Springer. | spa |
dc.relation.references | van Ginkel, J. R., Linting, M., Rippe, R. C. A., & van der Voort, A. (2020). Rebutting existing misconceptions about multiple imputation as a method for handling missing data. Journal of Personality Assessment, 102(3), 297–308. https://doi.org/10.1080/00223891.2018.1530680 | spa |
dc.relation.references | Van Buuren, S., Boshuizen, H. C., & Knook, D. L. (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18(6), 681–694. | spa |
dc.relation.references | Van Buuren, S. (2012). Flexible imputation of missing data. CRC Press. | spa |
dc.relation.references | Troxel, A. B., Harrington, D. P., & Lipsitz, S. R. (1998). Analysis of longitudinal data with non-ignorable non-monotone missing values. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47(3), 425–438. https://doi.org/10.1111/1467-9876.00119 | spa |
dc.relation.references | Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177. | spa |
dc.relation.references | Rubin, D. B. (1986). Statistical matching using file concatenation with adjusted weights and multiple imputations. Journal of Business & Economic Statistics, 4(1), 87–94. | spa |
dc.relation.references | Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592. | spa |
dc.relation.references | Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90(431), 1112–1121. https://doi.org/10.1080/01621459.1995.10476615 | spa |
dc.relation.references | Rahim, J., Chen, M., & Lipsitz, S. (2001). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 88(469), 551–564. | spa |
dc.relation.references | Patterson, H. D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545–554. | spa |
dc.relation.references | Nooraee, N., Molenberghs, G., Ormel, J., & Van den Heuvel, E. R. (2018). Strategies for handling missing data in longitudinal studies with questionnaires. Journal of Statistical Computation and Simulation, 88(17), 3415–3436. https://doi.org/10.1080/00949655.2018.1520854 | spa |
dc.relation.references | McNeish, D. (2017). Missing data methods for arbitrary missingness with small samples. Journal of Applied Statistics, 44(1), 24–39. https://doi.org/10.1080/02664763.2016.1158246 | spa |
dc.relation.references | Laird, N. M., Ware, J. H., & Fitzmaurice, G. (2004). Missing data and dropout: Overview of concepts and methods. In Applied longitudinal analysis (2nd ed.). John Wiley & Sons. | spa |
dc.relation.references | Laird, N. M., & Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 38(4), 963–974. | spa |
dc.relation.references | Ibrahim, J. G., Chen, M.-H., Lipsitz, S. R., & Herring, A. H. (2005). Missing-data methods for generalized linear models. Journal of the American Statistical Association, 100(469), 332–346. https://doi.org/10.1198/016214504000001844 | spa |
dc.relation.references | Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72(358), 320–338. | spa |
dc.relation.references | Fitzmaurice, G., Davidian, M., Verbeke, G., & Molenberghs, G. (2008). Longitudinal data analysis. Chapman and Hall. | spa |
dc.relation.references | Faraway, J. J. (2016). Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models. Chapman and Hall/CRC. | spa |
dc.relation.references | Diggle, P., & Kenward, M. G. (1994). Informative drop-out in longitudinal data analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics), 43(1), 49–73. https://doi.org/10.2307/2986113 | spa |
dc.relation.references | Diggle, P., Heagerty, P., Liang, K., & Zeger, S. (2002). Analysis of longitudinal data. Oxford University Press. | spa |
dc.relation.references | Correa Morales, J. C., & Salazar Uribe, J. C. (2016). Introducción a los modelos mixtos. Escuela de Estadística. | spa |
dc.relation.references | Cooper, D., & Thompson, R. (1977). A note on the estimation of the parameters of the autoregressive-moving average process. Biometrika, 64(3), 625–628. | spa |
dc.relation.references | Casella, G., & Berger, R. (2002). Statistical inference. Duxbury Advanced Series. | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.rights.license | Attribution-NonCommercial-NoDerivatives 4.0 International | eng |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | spa |
dc.subject.armarc | Linear models (Statistics) - Data processing | |
dc.subject.ddc | 510 - Mathematics::519 - Probabilities and applied mathematics | eng |
dc.subject.jel | Linear models (Statistics) | |
dc.subject.proposal | Missing data | eng |
dc.subject.proposal | Longitudinal studies | eng |
dc.subject.proposal | Linear mixed models | eng |
dc.subject.proposal | Imputation methodologies | eng |
dc.subject.proposal | Missing data conditions | eng |
dc.title | Comparación de metodologías utilizadas para abordar el problema de datos faltantes en estudios longitudinales | spa |
dc.title.translated | Comparison of Methodologies Used to Address the Problem of Missing Data in Longitudinal Studies | eng |
dc.type | Master's thesis | eng |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
dc.type.content | Text | spa |
dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
dc.type.redcol | http://purl.org/redcol/resource_type/TM | spa |
dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
dcterms.audience.professionaldevelopment | General public | eng |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
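The abstract describes generating missing values under Rubin's three mechanisms before imputing. The Python sketch below shows one common way to do this on a simulated random-intercept panel. It is an illustration, not the thesis's own procedure. The model coefficients, the logistic missingness models, the `gamma` strength parameter, the choice to always keep the baseline observation, and the helper names `simulate_panel` and `ampute` are all assumptions made here. The resulting missingness is intermittent (non-monotone) rather than dropout.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_panel(n_subjects, n_times, seed=0):
    """Hypothetical complete data from y_ij = 10 + 0.5*t_j + u_i + e_ij."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_subjects):
        u_i = rng.gauss(0.0, 1.0)  # subject-specific random intercept
        panel.append([10.0 + 0.5 * t + u_i + rng.gauss(0.0, 0.5)
                      for t in range(n_times)])
    return panel


def ampute(panel, mechanism, rate=0.2, gamma=2.0, seed=1):
    """Return a copy of `panel` with some post-baseline values set to None.

    MCAR: every cell is missing with probability `rate`.
    MAR:  the probability depends on the baseline y_i0, which is always observed.
    MNAR: the probability depends on y_ij itself, the value that gets hidden.
    Only under MCAR is the expected missing fraction exactly `rate`.
    """
    rng = random.Random(seed)
    n_times = len(panel[0])
    # Per-occasion means of the complete data; used only to centre the logistic models.
    time_mean = [sum(row[t] for row in panel) / len(panel) for t in range(n_times)]
    intercept = logit(rate)
    amputed = []
    for row in panel:
        new_row = [row[0]]  # baseline (t = 0) is never removed
        for t in range(1, n_times):
            if mechanism == "MCAR":
                p = rate
            elif mechanism == "MAR":
                p = logistic(intercept + gamma * (row[0] - time_mean[0]))
            elif mechanism == "MNAR":
                p = logistic(intercept + gamma * (row[t] - time_mean[t]))
            else:
                raise ValueError(f"unknown mechanism: {mechanism}")
            new_row.append(None if rng.random() < p else row[t])
        amputed.append(new_row)
    return amputed


if __name__ == "__main__":
    complete = simulate_panel(500, 5)
    for mech in ("MCAR", "MAR", "MNAR"):
        amp = ampute(complete, mech)
        missing = sum(v is None for row in amp for v in row[1:])
        print(mech, round(missing / (500 * 4), 3))
```

Under MAR and MNAR the printed fraction only approximates `rate`, because shifting the logit by a subject-specific amount changes the marginal probability. What separates the three mechanisms is *what* the probability depends on, not how many values go missing.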
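The abstract reports that LOCF performed best under RMSE. This record does not say whether RMSE was computed on the imputed values themselves or on predictions from the mixed models refitted to imputed data. The sketch below takes the simpler reading, RMSE = sqrt((1/|M|) Σ over missing cells (ŷ_ij − y_ij)²), where M is the set of cells that were removed. It fills gaps with LOCF and, for contrast, with a per-subject mean. The per-subject mean is an assumed comparator and not necessarily one of the thesis's methods. The function names `locf`, `subject_mean` and `rmse_on_missing` are hypothetical.

```python
import math


def locf(row):
    """Last Observation Carried Forward: replace each None with the latest observed value.

    Values missing before the first observation stay None (nothing to carry forward).
    """
    filled, last = [], None
    for v in row:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def subject_mean(row):
    """Assumed comparator: replace each None with the mean of the subject's observed values."""
    observed = [v for v in row if v is not None]
    m = sum(observed) / len(observed) if observed else None
    return [m if v is None else v for v in row]


def rmse_on_missing(true_panel, amputed_panel, imputed_panel):
    """RMSE over the cells that were missing in `amputed_panel` and received a value."""
    sq_err, n = 0.0, 0
    for y_true, y_amp, y_imp in zip(true_panel, amputed_panel, imputed_panel):
        for t, v in enumerate(y_amp):
            if v is None and y_imp[t] is not None:
                sq_err += (y_imp[t] - y_true[t]) ** 2
                n += 1
    return math.sqrt(sq_err / n) if n else float("nan")


if __name__ == "__main__":
    true = [[10.0, 10.5, 11.0, 11.5]]
    amp = [[10.0, None, 11.0, None]]
    print(rmse_on_missing(true, amp, [locf(r) for r in amp]))          # → 0.5
    print(rmse_on_missing(true, amp, [subject_mean(r) for r in amp]))  # ≈ 0.7071
```

LOCF needs no fitted model, which partly explains its popularity. It implicitly assumes that each subject's outcome stays flat after the last observed occasion.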
Files
Original bundle
- Name: 1045741179.2024.pdf
- Size: 678.05 KB
- Format: Adobe Portable Document Format
- Description: Master's thesis (Maestría en Ciencias - Estadística)
License bundle
- Name: license.txt
- Size: 5.74 KB
- Format: Item-specific license agreed to upon submission
- Description: