Comparación de metodologías utilizadas para abordar el problema de datos faltantes en estudios longitudinales

dc.contributor.advisor: Mazo Lopera, Maurio Alejandro
dc.contributor.author: Viloria Rodriguez, Andres Felipe
dc.date.accessioned: 2024-07-31T16:49:15Z
dc.date.available: 2024-07-31T16:49:15Z
dc.date.issued: 2024
dc.description: Ilustraciones, tablas [spa]
dc.description.abstract: En este estudio se compararon distintas metodologías utilizadas en la literatura para abordar el problema de datos faltantes en estudios longitudinales. Se consideraron tres condiciones propuestas por Rubin (1976) para la generación de datos faltantes: Perdidos Completamente al Azar (MCAR), Perdidos al Azar (MAR) y No Perdidos al Azar (NMAR). Se utilizaron dos bases de datos longitudinales provenientes del Portal de Datos Abiertos del Estado Colombiano, las cuales fueron ajustadas mediante modelos lineales mixtos. Posteriormente, se generaron datos faltantes bajo las tres condiciones antes mencionadas y se aplicaron diferentes métodos de imputación. Se compararon los modelos imputados utilizando el Root Mean Squared Error (RMSE) y se observó que el método Last Observation Carried Forward (LOCF) tuvo un mejor rendimiento en la mayoría de los casos. Además, se analizó la variabilidad en la precisión del modelo por departamento y se encontraron diferencias significativas entre los métodos de imputación. Se concluyó que la elección del método de imputación puede tener un impacto en la interpretación de los resultados del modelo y se hicieron recomendaciones para futuras investigaciones, como explorar otros métodos de imputación y considerar el impacto de la imputación en la precisión de las predicciones del modelo en estudios longitudinales. En resumen, este estudio destaca la importancia de abordar cuidadosamente el problema de datos faltantes y seleccionar el método de imputación más adecuado para obtener resultados precisos y fiables en estudios longitudinales. [spa]
dc.description.abstract: This study compared methodologies used in the literature to address missing data in longitudinal studies. Three missing-data mechanisms proposed by Rubin (1976) were considered: Missing Completely At Random (MCAR), Missing At Random (MAR), and Not Missing At Random (NMAR). Two longitudinal datasets from the Colombian State's Open Data Portal were fitted with linear mixed models. Missing data were then generated under each of the three mechanisms, and different imputation methods were applied. The models fitted to the imputed data were compared using the Root Mean Squared Error (RMSE), and the Last Observation Carried Forward (LOCF) method performed better in most cases. The variability in model accuracy by department was also analyzed, and significant differences were found between the imputation methods. The study concluded that the choice of imputation method can affect the interpretation of model results, and it recommended that future research explore other imputation methods and consider the effect of imputation on the accuracy of model predictions in longitudinal studies. In summary, the study highlights the importance of handling missing data carefully and of selecting the most appropriate imputation method to obtain accurate and reliable results in longitudinal studies. [eng]
dc.description.curriculararea: Área Curricular Estadística [spa]
dc.description.degreelevel: Maestría [spa]
dc.description.degreename: Magíster en Ciencias - Estadística [spa]
dc.format.extent: 97 páginas [spa]
dc.format.mimetype: application/pdf [spa]
dc.identifier.instname: Universidad Nacional de Colombia [spa]
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia [spa]
dc.identifier.repourl: https://repositorio.unal.edu.co/ [spa]
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/86666
dc.language.iso: spa [spa]
dc.publisher: Universidad Nacional de Colombia [spa]
dc.publisher.branch: Universidad Nacional de Colombia - Sede Medellín [spa]
dc.publisher.faculty: Facultad de Ciencias [spa]
dc.publisher.place: Medellín, Colombia [spa]
dc.publisher.program: Medellín - Ciencias - Maestría en Ciencias - Estadística [spa]
dc.relation.references: West, B. T., Welch, K. B., y Galecki, A. T. (2022). Linear mixed models: a practical guide using statistical software. CRC Press. [spa]
dc.relation.references: Verbyla, A. P. (1990). A conditional derivation of residual maximum likelihood. Australian Journal of Statistics, 32 (2), 227–230. [spa]
dc.relation.references: Verbeke, G., y Molenberghs, G. (1997). Linear mixed models for longitudinal data. Springer. [spa]
dc.relation.references: van Ginkel, J. R., Linting, M., Rippe, R. C. A., y van der Voort, A. (2020). Rebutting existing misconceptions about multiple imputation as a method for handling missing data. Journal of Personality Assessment, 102 (3), 297–308. Descargado de https://doi.org/10.1080/00223891.2018.1530680 (PMID: 30657714) doi: 10.1080/00223891.2018.1530680 [spa]
dc.relation.references: Van Buuren, S., Boshuizen, H. C., y Knook, D. L. (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18 (6), 681–694. [spa]
dc.relation.references: Van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton: CRC Press. [spa]
dc.relation.references: Troxel, A. B., Harrington, D. P., y Lipsitz, S. R. (1998). Analysis of longitudinal data with non-ignorable non-monotone missing values. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47 (3), 425–438. Descargado de https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-9876.00119 doi: https://doi.org/10.1111/1467-9876.00119 [spa]
dc.relation.references: Schafer, J., y Graham, J. (2002). Missing data: our view of the state of the art. Psychological Methods, 7 (2), 147. [spa]
dc.relation.references: Rubin, D. B. (1986). Statistical matching using file concatenation with adjusted weights and multiple imputations. Journal of Business & Economic Statistics, 4 (1), 87–94. [spa]
dc.relation.references: Rubin, D. (1976). Inference and missing data. Biometrika, 63 (3), 581–592. [spa]
dc.relation.references: Little, R. J. A. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90 (431), 1112–1121. Descargado de https://www.tandfonline.com/doi/abs/10.1080/01621459.1995.10476615 doi: 10.1080/01621459.1995.10476615 [spa]
dc.relation.references: Rahim, J., Chen, M., y Lipsitz, S. (2001). A test of missing completely at random for multivariate data with missing values. Journal of the American Statistical Association, 88 (469), 551–564. [spa]
dc.relation.references: Patterson, H. D., y Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58 (3), 545–554. [spa]
dc.relation.references: Nooraee, N., Molenberghs, G., Ormel, J., y van den Heuvel, E. R. (2018). Strategies for handling missing data in longitudinal studies with questionnaires. Journal of Statistical Computation and Simulation, 88 (17), 3415–3436. Descargado de https://doi.org/10.1080/00949655.2018.1520854 doi: 10.1080/00949655.2018.1520854 [spa]
dc.relation.references: McNeish, D. (2017). Missing data methods for arbitrary missingness with small samples. Journal of Applied Statistics, 44 (1), 24–39. Descargado de https://doi.org/10.1080/02664763.2016.1158246 doi: 10.1080/02664763.2016.1158246 [spa]
dc.relation.references: Laird, N. M., Ware, J. H., y Fitzmaurice, G. (2004). Missing data and dropout: Overview of concepts and methods. Applied longitudinal analysis, 2nd edition. Hoboken, NJ: John Wiley & Sons, Inc. [spa]
dc.relation.references: Laird, N. M., y Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics, 963–974. [spa]
dc.relation.references: Ibrahim, J. G., Chen, M.-H., Lipsitz, S. R., y Herring, A. H. (2005). Missing-data methods for generalized linear models. Journal of the American Statistical Association, 100 (469), 332–346. Descargado de https://doi.org/10.1198/016214504000001844 doi: 10.1198/016214504000001844 [spa]
dc.relation.references: Harville, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72 (358), 320–338. [spa]
dc.relation.references: Fitzmaurice, G., Davidian, M., Verbeke, G., y Molenberghs, G. (2008). Longitudinal data analysis. London: Chapman and Hall. [spa]
dc.relation.references: Faraway, J. J. (2016). Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. Chapman and Hall/CRC. [spa]
dc.relation.references: Diggle, P., y Kenward, M. G. (1994). Informative drop-out in longitudinal data analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics), 43 (1), 49–73. Descargado de https://rss.onlinelibrary.wiley.com/doi/abs/10.2307/2986113 doi: https://doi.org/10.2307/2986113 [spa]
dc.relation.references: Diggle, P., Heagerty, P., Liang, K., y Zeger, S. (2002). Analysis of longitudinal data. London: Oxford University Press. [spa]
dc.relation.references: Correa Morales, J. C., y Salazar Uribe, J. C. (2016). Introducción a los modelos mixtos. Escuela de Estadística. [spa]
dc.relation.references: Cooper, D., y Thompson, R. (1977). A note on the estimation of the parameters of the autoregressive-moving average process. Biometrika, 64 (3), 625–628. [spa]
dc.relation.references: Casella, G., y Berger, R. (2002). Statistical inference. Duxbury Advanced Series. [spa]
dc.rights.accessrights: info:eu-repo/semantics/openAccess [spa]
dc.rights.license: Atribución-NoComercial-SinDerivadas 4.0 Internacional [spa]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [spa]
dc.subject.armarc: Modelos lineales (Estadística) - Procesamiento de datos
dc.subject.ddc: 510 - Matemáticas::519 - Probabilidades y matemáticas aplicadas [spa]
dc.subject.jel: Modelos lineales (Estadística)
dc.subject.proposal: Datos faltantes [spa]
dc.subject.proposal: Estudios longitudinales [spa]
dc.subject.proposal: Modelos lineales mixtos [spa]
dc.subject.proposal: Metodologías de imputación [spa]
dc.subject.proposal: Condiciones de datos faltantes [spa]
dc.subject.proposal: Missing data [eng]
dc.subject.proposal: Longitudinal studies [eng]
dc.subject.proposal: Mixed linear models [eng]
dc.subject.proposal: Imputation methodologies [eng]
dc.subject.proposal: Missing data conditions [eng]
dc.title: Comparación de metodologías utilizadas para abordar el problema de datos faltantes en estudios longitudinales [spa]
dc.title.translated: Comparison of Methodologies Used to Address the Problem of Missing Data in Longitudinal Studies [eng]
dc.type: Trabajo de grado - Maestría [spa]
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc [spa]
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa [spa]
dc.type.content: Text [spa]
dc.type.driver: info:eu-repo/semantics/masterThesis [spa]
dc.type.redcol: http://purl.org/redcol/resource_type/TM [spa]
dc.type.version: info:eu-repo/semantics/acceptedVersion [spa]
dcterms.audience.professionaldevelopment: Público general [spa]
oaire.accessrights: http://purl.org/coar/access_right/c_abf2 [spa]

Files

Original bundle

Name: 1045741179.2024.pdf
Size: 678.05 KB
Format: Adobe Portable Document Format
Description: Tesis de Maestría en Ciencias - Estadística
