Metodología para la gestión de la calidad de los datos empleando un enfoque data driven para implementar procesos de evaluación y mejoramiento de la calidad de los datos en iniciativas de gestión de datos maestros
dc.contributor.advisor | Branch Bedoya, John Willian | |
dc.contributor.advisor | Iral Palomino, René | |
dc.contributor.author | Marín Benjumea, Yubar Daniel | |
dc.date.accessioned | 2022-05-31T20:26:19Z | |
dc.date.available | 2022-05-31T20:26:19Z | |
dc.date.issued | 2022 | |
dc.description | ilustraciones, diagramas, tablas | spa |
dc.description.abstract | En los últimos años, la toma de decisiones basada en datos ha experimentado un aumento vertiginoso. Esto ha develado un sinnúmero de problemas relacionados con la calidad de los datos, dejando clara la importancia de contar con estrategias para mejorar y garantizar la calidad de los conjuntos de datos a la hora de implementar iniciativas de datos maestros. Esta investigación se centra en plantear una metodología que permita evaluar y solucionar los problemas de calidad de datos directamente sobre los datos, dando en primer lugar una revisión y evaluación de los esfuerzos y metodologías encontrados en la literatura, seguido por la presentación de la metodología propuesta, detallando los procesos para su implementación de manera inmediata, y por último, realizando la implementación de la metodología en un conjunto de datos y presentando los resultados obtenidos en cada etapa. (Texto tomado de la fuente) | spa |
dc.description.abstract | In recent years, data-driven decision making has grown rapidly. This has revealed numerous data quality problems, highlighting the importance of strategies to improve and ensure the quality of data sets when implementing master data initiatives. This research proposes a methodology for evaluating and solving data quality problems directly on the data. It first reviews and evaluates the efforts and methodologies found in the literature, then presents the proposed methodology and details the processes for its immediate implementation, and finally applies the methodology to a data set and presents the results obtained at each stage. | eng |
dc.description.curriculararea | Área Curricular de Ingeniería de Sistemas e Informática | spa |
dc.description.degreelevel | Maestría | spa |
dc.description.degreename | Magíster en Ingeniería - Ingeniería de Sistemas | spa |
dc.format.extent | 69 páginas | spa |
dc.format.mimetype | application/pdf | spa |
dc.identifier.instname | Universidad Nacional de Colombia | spa |
dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/81466 | |
dc.language.iso | spa | spa |
dc.publisher | Universidad Nacional de Colombia | spa |
dc.publisher.branch | Universidad Nacional de Colombia - Sede Medellín | spa |
dc.publisher.department | Departamento de la Computación y la Decisión | spa |
dc.publisher.faculty | Facultad de Minas | spa |
dc.publisher.place | Medellín, Colombia | spa |
dc.publisher.program | Medellín - Minas - Doctorado en Ingeniería - Sistemas | spa |
dc.relation.references | Amicis, F. D. (2004). A methodology for data quality assessment on financial data. Studies in Communication Sciences, 4(2), 115-137. | spa |
dc.relation.references | Ballou, D. P., & Pazer, H. L. (1985). Modeling data and process quality in multi-input, multi-output information systems. Management science, 31(2), 150-162. | spa |
dc.relation.references | Ballou, D. P., & Tayi, G. K. (1989). Methodology for allocating resources for data quality enhancement. Communications of the ACM, 32(3), 320-329. | spa |
dc.relation.references | Barchard, K. A., & Pace, L. A. (2011). Preventing human error: The impact of data entry methods on data accuracy and statistical results. Computers in Human Behavior, 27(5), 1834-1839. | spa |
dc.relation.references | Batini, C., Barone, D., Mastrella, M., Maurino, A., & Ruffini, C. (2007). A Framework And A Methodology For Data Quality Assessment And Monitoring. In ICIQ (pp. 333-346). | spa |
dc.relation.references | Batini, C., Cabitza, F., Cappiello, C., & Francalanci, C. (2008). A comprehensive data quality methodology for web and structured data. International Journal of Innovative Computing and Applications, 1(3), 205-218. | spa |
dc.relation.references | Batini, C., Cappiello, C., Francalanci, C., & Maurino, A. (2009). Methodologies for data quality assessment and improvement. ACM computing surveys (CSUR), 41(3), 1-52. | spa |
dc.relation.references | Cappiello, C., Ficiaro, P., & Pernici, B. (2006). HIQM: A Methodology for Information Quality Monitoring, Measurement, and Improvement. Lecture Notes in Computer Science, 339–351. doi:10.1007/11908883_41. | spa |
dc.relation.references | Carlo, B., Daniele, B., Federico, C., & Simone, G. (2011). A data quality methodology for heterogeneous data. International Journal of Database Management Systems, 3(1), 60-79. | spa |
dc.relation.references | Caro, A., Calero, C., & Piattini, M. (2007, November). A Portal Data Quality Model For Users And Developers. In ICIQ (pp. 462-476). | spa |
dc.relation.references | Cichy, C., & Rass, S. (2019). An overview of data quality frameworks. IEEE Access, 7, 24634-24648. | spa |
dc.relation.references | Correa-Morales, J. C., & Barrera-Causil, C. (2021). Elicitation of the Parameters of Multiple Linear Models. Revista Colombiana de Estadística, 44(1), 159-170. | spa |
dc.relation.references | Del Pilar Angeles, M., & García-Ugalde, F. (2009). A data quality practical approach. International Journal on Advances in Software Volume 1, Numbers 2&3, 2009. | spa |
dc.relation.references | Efron, B. (1994). Missing data, imputation, and the bootstrap. Journal of the American Statistical Association, 89(426), 463-475. | spa |
dc.relation.references | Elmagarmid, A. K., Ipeirotis, P. G., & Verykios, V. S. (2006). Duplicate record detection: A survey. IEEE Transactions on knowledge and data engineering, 19(1), 1-16. | spa |
dc.relation.references | English, L. P. (1999). Improving data warehouse and business information quality: methods for reducing costs and increasing profits. John Wiley & Sons, Inc. | spa |
dc.relation.references | Eppler, M., & Helfert, M. (2004, November). A classification and analysis of data quality costs. In International Conference on Information Quality (pp. 311-325). | spa |
dc.relation.references | Guadalupe, M. (2017, diciembre). Pruebas de bondad de ajuste. Área Académica: Licenciatura en Ingeniería Industrial. https://www.uaeh.edu.mx/docencia/P_Presentaciones/Sahagun/industrial/2017/Pruebas_de_bondad_de_ajuste.pdf | spa |
dc.relation.references | Hernández, M. A., & Stolfo, S. J. (1998). Real-world data is dirty: Data cleansing and the merge/purge problem. Data mining and knowledge discovery, 2(1), 9-37. | spa |
dc.relation.references | Hickey, A. M., & Davis, A. M. (2003, January). Requirements elicitation and elicitation technique selection: model for two knowledge-intensive software development processes. In Proceedings of the 36th Annual Hawaii International Conference on System Sciences, 2003 (pp. 10-pp). IEEE. | spa |
dc.relation.references | Huh, Y. U., Keller, F. R., Redman, T. C., & Watkins, A. R. (1990). Data quality. Information and software technology, 32(8), 559-565. | spa |
dc.relation.references | Jeusfeld, M. A., Quix, C., & Jarke, M. (1998, November). Design and analysis of quality information for data warehouses. In International Conference on Conceptual Modeling (pp. 349-362). Springer, Berlin, Heidelberg. | spa |
dc.relation.references | Johnson, J. R., Leitch, R. A., & Neter, J. (1981). Characteristics of errors in accounts receivable and inventory audits. Accounting Review, 270-293. | spa |
dc.relation.references | KPMG, (2017). KPMG: Disrupt and Grow, 2017 Global CEO Outlook. [Online]. Available: https://assets.kpmg.com/content/dam/kpmg/xx/pdf/2017/06/2017-globalceo-outlook.pdf | spa |
dc.relation.references | Labeeb, K., Chowdhury, K. B. Q., Riha, R. B., Abedin, M. Z., Yesmin, S., & Khan, M. N. R. (2020, December). Pre-Processing Data In Weather Monitoring Application By Using Big Data Quality Framework. In 2020 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering (WIECON-ECE) (pp. 284-287). IEEE. | spa |
dc.relation.references | Laudon, K. C. (1986). Data quality and due process in large interorganizational record systems. Communications of the ACM, 29(1), 4-11. | spa |
dc.relation.references | Lee, Y. W., Strong, D. M., Kahn, B. K., & Wang, R. Y. (2002). AIMQ: a methodology for information quality assessment. Information & management, 40(2), 133-146. | spa |
dc.relation.references | Levitin, A., & Redman, T. (1995). Quality dimensions of a conceptual view. Information Processing & Management, 31(1), 81-88. | spa |
dc.relation.references | Long, J. A., Seko, C. E., & Wang, Y. R. (2005). A cyclic-hierarchical method for database data-quality evaluation and improvement. In Information quality. Routledge. | spa |
dc.relation.references | Loshin, D. (2001). Enterprise knowledge management: The data quality approach. Morgan Kaufmann. | spa |
dc.relation.references | Madnick, S. E., Wang, R. Y., Lee, Y. W., & Zhu, H. (2009). Overview and framework for data and information quality research. Journal of Data and Information Quality (JDIQ), 1(1), 1-22. | spa |
dc.relation.references | Marin, Y. (2022). Algoritmos para implementar calidad de datos en Python. Github. https://github.com/ydmarinb/calidad-datos. | spa |
dc.relation.references | Medina, F., & Galván, M. (2007). Imputación de datos: teoría y práctica. Cepal. | spa |
dc.relation.references | Moges, H. T., Van Vlasselaer, V., Lemahieu, W., & Baesens, B. (2016). Determining the use of data quality metadata (DQM) for decision making purposes and its impact on decision outcomes—An exploratory study. Decision Support Systems, 83, 32-46. | spa |
dc.relation.references | Wand, Y., & Wang, R. Y. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39(11), 86–95. doi:10.1145/240455.240479 | spa |
dc.relation.references | Wang, R. Y. (1998). A product perspective on total data quality management. Communications of the ACM, 41(2), 58-65. | spa |
dc.relation.references | Wang, R. Y., & Strong, D. M. (1996). Beyond accuracy: What data quality means to data consumers. Journal of management information systems, 12(4), 5-33. | spa |
dc.relation.references | Wang, R. Y., Kon, H. B., & Madnick, S. E. (1993, April). Data quality requirements analysis and modeling. In Proceedings of IEEE 9th International Conference on Data Engineering (pp. 670-677). IEEE. | spa |
dc.relation.references | Wang, R. Y., Reddy, M. P., & Kon, H. B. (1995). Toward quality data: An attribute-based approach. Decision support systems, 13(3-4), 349-372. | spa |
dc.relation.references | Wang, R. Y., Storey, V. C., & Firth, C. P. (1995). A framework for analysis of data quality research. IEEE transactions on knowledge and data engineering, 7(4), 623-640. | spa |
dc.relation.references | Zhang, Z. (2016). Missing data imputation: focusing on single imputation. Annals of translational medicine, 4(1). | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.rights.license | Atribución-NoComercial-SinDerivadas 4.0 Internacional | spa |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | spa |
dc.subject.ddc | 000 - Ciencias de la computación, información y obras generales::004 - Procesamiento de datos Ciencia de los computadores | spa |
dc.subject.lemb | Datos | |
dc.subject.lemb | Comprensión de datos | |
dc.subject.proposal | Calidad de datos | spa |
dc.subject.proposal | Gestión de datos maestros | spa |
dc.subject.proposal | Dimensiones de calidad de datos | spa |
dc.subject.proposal | Limpieza de datos | spa |
dc.subject.proposal | Data quality | eng |
dc.subject.proposal | Master data management | eng |
dc.subject.proposal | Data quality dimensions | eng |
dc.subject.proposal | Data cleaning | eng |
dc.title | Metodología para la gestión de la calidad de los datos empleando un enfoque data driven para implementar procesos de evaluación y mejoramiento de la calidad de los datos en iniciativas de gestión de datos maestros | spa |
dc.title.translated | Methodology for data quality management using a data driven approach to implement data quality assessment and improvement processes in master data management initiatives | eng |
dc.type | Trabajo de grado - Maestría | spa |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
dc.type.content | Text | spa |
dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
dc.type.redcol | http://purl.org/redcol/resource_type/TM | spa |
dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
dcterms.audience.professionaldevelopment | Estudiantes | spa |
dcterms.audience.professionaldevelopment | Investigadores | spa |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
Files
Original bundle
- Name: 1216722806.2022.pdf
- Size: 1.14 MB
- Format: Adobe Portable Document Format
- Description: Master's thesis in Engineering - Systems Engineering