Metodología para la gestión de la calidad de los datos empleando un enfoque data driven para implementar procesos de evaluación y mejoramiento de la calidad de los datos en iniciativas de gestión de datos maestros

dc.contributor.advisorBranch Bedoya, John Willian
dc.contributor.advisorIral Palomino, René
dc.contributor.authorMarín Benjumea, Yubar Daniel
dc.date.accessioned2022-05-31T20:26:19Z
dc.date.available2022-05-31T20:26:19Z
dc.date.issued2022
dc.descriptionilustraciones, diagramas, tablasspa
dc.description.abstractEn los últimos años, la toma de decisiones basada en datos ha experimentado un aumento vertiginoso. Esto ha develado un sinnúmero de problemas relacionados con la calidad de los datos, dejando clara la importancia de contar con estrategias para mejorar y garantizar la calidad de los conjuntos de datos a la hora de implementar iniciativas de datos maestros. Esta investigación se centra en plantear una metodología que permita evaluar y solucionar los problemas de calidad de datos directamente sobre los datos, dando en primer lugar una revisión y evaluación de los esfuerzos y metodologías encontrados en la literatura, seguido por la presentación de la metodología propuesta, detallando los procesos para su implementación de manera inmediata, y por último, realizando la implementación de la metodología en un conjunto de datos y presentando los resultados obtenidos en cada etapa. (Texto tomado de la fuente)spa
dc.description.abstractIn recent years, data-driven decision making has grown dramatically. This has revealed countless data quality problems, highlighting the importance of strategies to improve and ensure the quality of data sets when implementing master data initiatives. This research proposes a methodology for evaluating and solving data quality problems directly on the data. It first reviews and evaluates the efforts and methodologies found in the literature, then presents the proposed methodology and details the processes for its immediate implementation, and finally applies the methodology to a data set and presents the results obtained at each stage.eng
dc.description.curricularareaÁrea Curricular de Ingeniería de Sistemas e Informáticaspa
dc.description.degreelevelMaestríaspa
dc.description.degreenameMagíster en Ingeniería - Ingeniería de Sistemasspa
dc.format.extent69 páginasspa
dc.format.mimetypeapplication/pdfspa
dc.identifier.instnameUniversidad Nacional de Colombiaspa
dc.identifier.reponameRepositorio Institucional Universidad Nacional de Colombiaspa
dc.identifier.repourlhttps://repositorio.unal.edu.co/spa
dc.identifier.urihttps://repositorio.unal.edu.co/handle/unal/81466
dc.language.isospaspa
dc.publisherUniversidad Nacional de Colombiaspa
dc.publisher.branchUniversidad Nacional de Colombia - Sede Medellínspa
dc.publisher.departmentDepartamento de la Computación y la Decisiónspa
dc.publisher.facultyFacultad de Minasspa
dc.publisher.placeMedellín, Colombiaspa
dc.publisher.programMedellín - Minas - Doctorado en Ingeniería - Sistemasspa
dc.relation.referencesAmicis, F. D. (2004). A methodology for data quality assessment on financial data. Studies in Communication Sciences, 4(2), 115-137.spa
dc.relation.referencesBallou, D. P., & Pazer, H. L. (1985). Modeling data and process quality in multi-input, multi-output information systems. Management science, 31(2), 150-162.spa
dc.relation.referencesBallou, D. P., & Tayi, G. K. (1989). Methodology for allocating resources for data quality enhancement. Communications of the ACM, 32(3), 320-329.spa
dc.relation.referencesBarchard, K. A., & Pace, L. A. (2011). Preventing human error: The impact of data entry methods on data accuracy and statistical results. Computers in Human Behavior, 27(5), 1834-1839.spa
dc.relation.referencesBatini, C., Barone, D., Mastrella, M., Maurino, A., & Ruffini, C. (2007). A Framework And A Methodology For Data Quality Assessment And Monitoring. In ICIQ (pp. 333-346).spa
dc.relation.referencesBatini, C., Cabitza, F., Cappiello, C., & Francalanci, C. (2008). A comprehensive data quality methodology for web and structured data. International Journal of Innovative Computing and Applications, 1(3), 205-218.spa
dc.relation.referencesBatini, C., Cappiello, C., Francalanci, C., & Maurino, A. (2009). Methodologies for data quality assessment and improvement. ACM computing surveys (CSUR), 41(3), 1-52.spa
dc.relation.referencesCappiello, C., Ficiaro, P., & Pernici, B. (2006). HIQM: A Methodology for Information Quality Monitoring, Measurement, and Improvement. Lecture Notes in Computer Science, 339–351. doi:10.1007/11908883_41.spa
dc.relation.referencesCarlo, B., Daniele, B., Federico, C., & Simone, G. (2011). A data quality methodology for heterogeneous data. International Journal of Database Management Systems, 3(1), 60-79.spa
dc.relation.referencesCaro, A., Calero, C., & Piattini, M. (2007, November). A Portal Data Quality Model For Users And Developers. In ICIQ (pp. 462-476).spa
dc.relation.referencesCichy, C., & Rass, S. (2019). An overview of data quality frameworks. IEEE Access, 7, 24634-24648.spa
dc.relation.referencesCorrea-Morales, J. C., & Barrera-Causil, C. (2021). Elicitation of the Parameters of Multiple Linear Models. Revista Colombiana de Estadística, 44(1), 159-170.spa
dc.relation.referencesDel Pilar Angeles, M., & García-Ugalde, F. (2009). A data quality practical approach. International Journal on Advances in Software Volume 1, Numbers 2&3, 2009.spa
dc.relation.referencesEfron, B. (1994). Missing data, imputation, and the bootstrap. Journal of the American Statistical Association, 89(426), 463-475.spa
dc.relation.referencesElmagarmid, A. K., Ipeirotis, P. G., & Verykios, V. S. (2006). Duplicate record detection: A survey. IEEE Transactions on knowledge and data engineering, 19(1), 1-16.spa
dc.relation.referencesEnglish, L. P. (1999). Improving data warehouse and business information quality: methods for reducing costs and increasing profits. John Wiley & Sons, Inc.spa
dc.relation.referencesEppler, M., & Helfert, M. (2004, November). A classification and analysis of data quality costs. In International Conference on Information Quality (pp. 311-325).spa
dc.relation.referencesGuadalupe, M. (2017, diciembre). Pruebas de bondad de ajuste. Área Académica: Licenciatura en Ingeniería Industrial. https://www.uaeh.edu.mx/docencia/P_Presentaciones/Sahagun/industrial/2017/Pruebas_de_bondad_de_ajuste.pdfspa
dc.relation.referencesHernández, M. A., & Stolfo, S. J. (1998). Real-world data is dirty: Data cleansing and the merge/purge problem. Data mining and knowledge discovery, 2(1), 9-37.spa
dc.relation.referencesHickey, A. M., & Davis, A. M. (2003, January). Requirements elicitation and elicitation technique selection: model for two knowledge-intensive software development processes. In 36th Annual Hawaii International Conference on System Sciences, 2003. Proceedings of the (pp. 10-pp). IEEE.spa
dc.relation.referencesHuh, Y. U., Keller, F. R., Redman, T. C., & Watkins, A. R. (1990). Data quality. Information and software technology, 32(8), 559-565.spa
dc.relation.referencesJeusfeld, M. A., Quix, C., & Jarke, M. (1998, November). Design and analysis of quality information for data warehouses. In International Conference on Conceptual Modeling (pp. 349-362). Springer, Berlin, Heidelberg.spa
dc.relation.referencesJohnson, J. R., Leitch, R. A., & Neter, J. (1981). Characteristics of errors in accounts receivable and inventory audits. Accounting Review, 270-293.spa
dc.relation.referencesKPMG, (2017). KPMG: Disrupt and Grow, 2017 Global CEO Outlook. [Online]. Available: https://assets.kpmg.com/content/dam/kpmg/xx/pdf/2017/06/2017-globalceo-outlook.pdfspa
dc.relation.referencesLabeeb, K., Chowdhury, K. B. Q., Riha, R. B., Abedin, M. Z., Yesmin, S., & Khan, M. N. R. (2020, December). Pre-Processing Data In Weather Monitoring Application By Using Big Data Quality Framework. In 2020 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE) (pp. 284-287). IEEE.spa
dc.relation.referencesLaudon, K. C. (1986). Data quality and due process in large interorganizational record systems. Communications of the ACM, 29(1), 4-11.spa
dc.relation.referencesLee, Y. W., Strong, D. M., Kahn, B. K., & Wang, R. Y. (2002). AIMQ: a methodology for information quality assessment. Information & management, 40(2), 133-146.spa
dc.relation.referencesLevitin, A., & Redman, T. (1995). Quality dimensions of a conceptual view. Information Processing & Management, 31(1), 81-88.spa
dc.relation.referencesLong, J. A., Seko, C. E., & Wang, Y. R. (2005). A cyclic-hierarchical method for database data-quality evaluation and improvement. In Information quality. Routledge.spa
dc.relation.referencesLoshin, D. (2001). Enterprise knowledge management: The data quality approach. Morgan Kaufmann.spa
dc.relation.referencesMadnick, S. E., Wang, R. Y., Lee, Y. W., & Zhu, H. (2009). Overview and framework for data and information quality research. Journal of Data and Information Quality (JDIQ), 1(1), 1-22.spa
dc.relation.referencesMarin, Y. (2022). Algoritmos para implementar calidad de datos en Python. GitHub. https://github.com/ydmarinb/calidad-datos.spa
dc.relation.referencesMedina, F., & Galván, M. (2007). Imputación de datos: teoría y práctica. Cepal.spa
dc.relation.referencesMoges, H. T., Van Vlasselaer, V., Lemahieu, W., & Baesens, B. (2016). Determining the use of data quality metadata (DQM) for decision making purposes and its impact on decision outcomes: An exploratory study. Decision Support Systems, 83, 32-46.spa
dc.relation.referencesWand, Y., & Wang, R. Y. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39(11), 86–95. doi:10.1145/240455.240479spa
dc.relation.referencesWang, R. Y. (1998). A product perspective on total data quality management. Communications of the ACM, 41(2), 58-65.spa
dc.relation.referencesWang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of management information systems, 12(4), 5-33.spa
dc.relation.referencesWang, R. Y., Kon, H. B., & Madnick, S. E. (1993, April). Data quality requirements analysis and modeling. In Proceedings of IEEE 9th International Conference on Data Engineering (pp. 670-677). IEEE.spa
dc.relation.referencesWang, R. Y., Reddy, M. P., & Kon, H. B. (1995). Toward quality data: An attribute-based approach. Decision support systems, 13(3-4), 349-372.spa
dc.relation.referencesWang, R. Y., Storey, V. C., & Firth, C. P. (1995). A framework for analysis of data quality research. IEEE transactions on knowledge and data engineering, 7(4), 623-640.spa
dc.relation.referencesZhang, Z. (2016). Missing data imputation: focusing on single imputation. Annals of translational medicine, 4(1).spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.licenseAtribución-NoComercial-SinDerivadas 4.0 Internacionalspa
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/4.0/spa
dc.subject.ddc000 - Ciencias de la computación, información y obras generales::004 - Procesamiento de datos Ciencia de los computadoresspa
dc.subject.lembDatos
dc.subject.lembComprensión de datos
dc.subject.proposalCalidad de datosspa
dc.subject.proposalGestión de datos maestrosspa
dc.subject.proposalDimensiones de calidad de datosspa
dc.subject.proposalLimpieza de datosspa
dc.subject.proposalData qualityeng
dc.subject.proposalMaster data managementeng
dc.subject.proposalData quality dimensionseng
dc.subject.proposalData cleaningeng
dc.titleMetodología para la gestión de la calidad de los datos empleando un enfoque data driven para implementar procesos de evaluación y mejoramiento de la calidad de los datos en iniciativas de gestión de datos maestrosspa
dc.title.translatedMethodology for data quality management using a data driven approach to implement data quality assessment and improvement processes in master data management initiativeseng
dc.typeTrabajo de grado - Maestríaspa
dc.type.coarhttp://purl.org/coar/resource_type/c_bdccspa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.redcolhttp://purl.org/redcol/resource_type/TMspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
dcterms.audience.professionaldevelopmentEstudiantesspa
dcterms.audience.professionaldevelopmentInvestigadoresspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa

Files

Original bundle

Name: 1216722806.2022.pdf
Size: 1.14 MB
Format: Adobe Portable Document Format
Description: Tesis de Maestría en Ingeniería - Ingeniería de Sistemas
