Metadatos estadísticos para el aseguramiento de la calidad de los datos
| dc.contributor.advisor | Jiménez Ramírez, Claudia | spa |
| dc.contributor.author | Rodríguez Flores, Ivonne Elizabeth | spa |
| dc.contributor.corporatename | Universidad Nacional de Colombia - Sede Medellín | spa |
| dc.contributor.researchgroup | GIDIA: Grupo de Investigación y Desarrollo en Inteligencia Artificial | spa |
| dc.date.accessioned | 2020-09-08T21:07:06Z | spa |
| dc.date.available | 2020-09-08T21:07:06Z | spa |
| dc.date.issued | 2020-08-18 | spa |
| dc.description.abstract | Organizations today operate in an environment shaped by the Data Revolution. On the one hand, they produce and share datasets; on the other, they can use these resources for tasks such as data analysis and knowledge discovery, integration, or decision-making. However, datasets are represented in a variety of formats, standards, vocabularies, and models, and this heterogeneity is compounded by quality problems that leave the data unfit for use. In response, this doctoral thesis proposes a statistical metadata model that supports data quality assurance. To that end, the definition of statistical metadata was adopted and expanded, and the conceptual model not only considers recognized standards but also represents additional properties, allowing greater detail about the data and the quality of their content. The model integrates three key components that support quality assurance in the organization: a metadata registry, a mapping capability, and the capability to measure the quality of a dataset before it is used. The mixed-methods research also contributed, empirically, a classification of data quality problems and dimensions, and 60 metrics to operationalize the quality measurement of a dataset. The model was validated with a prototype implemented at a public university in Ecuador, demonstrating its practical applicability. | spa |
| dc.description.abstract | Hoy en día la organización está inmersa en un entorno influenciado por la Revolución de los datos. Por un lado, las organizaciones producen y comparten datasets, y por el otro, también está la oportunidad del uso de estos recursos ya sea para tareas de análisis y descubrimiento de conocimiento, integración o toma de decisiones. Sin embargo, los datasets se representan en diversos formatos, estándares, vocabularios y modelos; y, a esta heterogeneidad se agregan problemas de calidad, lo que conlleva a que los datos no sean aptos para ser utilizados. Ante esta realidad, con la presente Tesis Doctoral se propone un modelo de metadatos estadísticos que permita el aseguramiento de la calidad de los datos. Para tal propósito, se adoptó y amplió la definición de los metadatos estadísticos, y el modelo conceptual no sólo considera estándares reconocidos, sino que también representa otras propiedades adicionales, permitiendo mayores niveles de detalle sobre los datos y la calidad de los contenidos. El modelo es el resultado de la integración de tres componentes claves para el apoyo del aseguramiento de la calidad en la organización, y estos son: el registro de metadatos, posibilidad de mapeo y capacidad de medición de calidad del dataset antes de su uso. El tipo de investigación mixta, además, facultó de manera empírica, contribuir con una clasificación de problemas y dimensiones de calidad de datos, y con 60 métricas para operacionalizar la medición de calidad de un dataset. El modelo fue validado mediante un prototipo implementado en una Universidad del sector gubernamental del Ecuador, demostrando su aplicabilidad práctica | spa |
| dc.description.additional | Línea de Investigación: Gestión de la información, Calidad de datos | spa |
| dc.description.degreelevel | Doctorado | spa |
| dc.format.extent | 137 | spa |
| dc.format.mimetype | application/pdf | spa |
| dc.identifier.citation | Rodríguez Flores, I. E. (2020). Metadatos estadísticos para el aseguramiento de la calidad de los datos. Doctoral thesis, Universidad Nacional de Colombia - Sede Medellín | spa |
| dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/78418 | |
| dc.language.iso | spa | spa |
| dc.publisher.branch | Universidad Nacional de Colombia - Sede Medellín | spa |
| dc.publisher.program | Medellín - Minas - Doctorado en Ingeniería - Sistemas | spa |
| dc.relation.references | Abedjan, Z., Golab, L., & Naumann, F. (2017). Data Profiling: A Tutorial. Proceedings of the 2017 ACM International Conference on Management of Data - SIGMOD ’17, 1747–1751. https://doi.org/10.1145/3035918.3054772 | spa |
| dc.relation.references | Assaf, A., Senart, A., & Troncy, R. (2016). Towards An Objective Assessment Framework for Linked Data Quality: Enriching Dataset Profiles with Quality Indicators. International Journal on Semantic Web and Information Systems (IJSWIS), 12(3), 111–133. https://doi.org/10.4018/IJSWIS.2016070104 | spa |
| dc.relation.references | Batini, C., & Scannapieco, M. (2006). Data Quality: Concepts, Methodologies and Techniques (1st ed.). https://doi.org/10.1007/3-540-33173-5 | spa |
| dc.relation.references | Berners-Lee, T. (1997). Web architecture: Metadata. Retrieved July 15, 2016, from https://www.w3.org/DesignIssues/Metadata.html | spa |
| dc.relation.references | Daas, P. J. H., & Ossen, S. J. L. (2011). Metadata quality evaluation of secondary data sources. Proceedings of 5th International Quality Conference, 823–836. | spa |
| dc.relation.references | European Commission-Eurostat. (n.d.). RAMON - Reference And Management Of Nomenclatures. Retrieved October 22, 2017, from http://ec.europa.eu/eurostat/ramon/index.cfm?TargetUrl=DSP_PUB_WELC | spa |
| dc.relation.references | Grossmann, W. (2002). Structures for Metadata. In J.-P. Kent (Ed.), MetaNet Work Package 1: Methodology and Tools (pp. 11–28). Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.203.5184&rep=rep1&type=pdf | spa |
| dc.relation.references | ISO/IEC 11179-3. (2013). Information Technology - Metadata Registries - Part 3: Registry metamodel and basic attributes. Retrieved from https://www.iso.org/standard/50340.html | spa |
| dc.relation.references | ISO/IEC 25024. (2015). Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Measurement of data quality. Retrieved from https://www.iso.org/standard/35749.html | spa |
| dc.relation.references | Liu, L., & Özsu, M. T. (Eds.). (2009). Encyclopedia of Database Systems. https://doi.org/10.1007/978-0-387-39940-9 | spa |
| dc.relation.references | Moges, H.-T., Van Vlasselaer, V., Lemahieu, W., & Baesens, B. (2016). Determining the use of data quality metadata (DQM) for decision making purposes and its impact on decision outcomes: An exploratory study. Decision Support Systems, 83, 32–46. https://doi.org/10.1016/j.dss.2015.12.006 | spa |
| dc.relation.references | OECD. (2006). Management of Statistical Metadata at the OECD. Retrieved from http://www.oecd.org/sdd/managementofstatisticalmetadataattheoecd.htm | spa |
| dc.relation.references | OMG. (2003). Common Warehouse Metamodel (CWM) - version v1.1 (Vol. 1). Retrieved from https://www.omg.org/spec/CWM/About-CWM/ | spa |
| dc.relation.references | Rahm, E., & Hai Do, H. (2000). Data Cleaning: Problems and Current Approaches. IEEE Data Eng. Bull., 23(4), 3–13. | spa |
| dc.relation.references | Runeson, P., & Höst, M. (2009). Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering, 14(2), 131–164. https://doi.org/10.1007/s10664-008-9102-8 | spa |
| dc.relation.references | Smith, K., Seligman, L., Rosenthal, A., Kurcz, C., Greer, M., Macheret, C., … Eckstein, A. (2014). Big Metadata: The Need for Principled Metadata Management in Big Data Ecosystems. Proceedings of Workshop on Data Analytics in the Cloud - DanaC’14, 1–4. https://doi.org/10.1145/2627770.2627776 | spa |
| dc.relation.references | UNECE. (2000). Terminology on Statistical Metadata. Conference of European Statisticians, Statistical Standards and Studies, No. 53, 47. Retrieved from http://www.unece.org/fileadmin/DAM/stats/publications/53metadaterminology.pdf | spa |
| dc.relation.references | Vaziri, R., Mohsenzadeh, M., & Habibi, J. (2017). Measuring data quality with weighted metrics. Total Quality Management & Business Excellence, 30(5–6), 708–720. https://doi.org/10.1080/14783363.2017.1332954 | spa |
| dc.relation.references | Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., & Auer, S. (2016). Quality assessment for Linked Data: A Survey. Semantic Web, 7(1), 63–93. https://doi.org/10.3233/SW-150175 | spa |
| dc.relation.references | Zuiderwijk, A., Janssen, M., & Davis, C. (2014). Innovation with open data: Essential elements of open data ecosystems. Information Polity, 19(1,2), 17–33. https://doi.org/10.3233/IP-140329 | spa |
| dc.rights | Derechos reservados - Universidad Nacional de Colombia | spa |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
| dc.rights.license | Atribución-NoComercial 4.0 Internacional | spa |
| dc.rights.spa | Acceso abierto | spa |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | spa |
| dc.subject.ddc | 000 - Ciencias de la computación, información y obras generales::003 - Sistemas | spa |
| dc.subject.proposal | calidad de datos | spa |
| dc.subject.proposal | data quality | eng |
| dc.subject.proposal | statistical metadata | eng |
| dc.subject.proposal | metadatos estadísticos | spa |
| dc.subject.proposal | conceptual model | eng |
| dc.subject.proposal | modelo conceptual | spa |
| dc.subject.proposal | registro de metadatos | spa |
| dc.subject.proposal | metadata registry | eng |
| dc.subject.proposal | data quality measurement | eng |
| dc.subject.proposal | medición de calidad de datos | spa |
| dc.title | Metadatos estadísticos para el aseguramiento de la calidad de los datos | spa |
| dc.title.alternative | Statistical metadata for data quality assurance | spa |
| dc.type | Trabajo de grado - Doctorado | spa |
| dc.type.coar | http://purl.org/coar/resource_type/c_db06 | spa |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
| dc.type.content | Text | spa |
| dc.type.driver | info:eu-repo/semantics/doctoralThesis | spa |
| dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
| oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
Files
Original bundle
- Name: 527779.2020.pdf
- Size: 6.03 MB
- Format: Adobe Portable Document Format
- Description: Doctoral thesis in Engineering - Systems

