An ontology-based information extractor for data-rich documents in the information technology domain

Jiménez Vargas, Sergio Gonzalo; González Osorio, Fabio Augusto

dc.rights.license	Attribution-NonCommercial 4.0 International
dc.contributor.author	Jiménez Vargas, Sergio Gonzalo
dc.contributor.author	González Osorio, Fabio Augusto
dc.date.accessioned	2019-06-25T22:35:59Z
dc.date.available	2019-06-25T22:35:59Z
dc.date.issued	2008
dc.identifier.uri	https://repositorio.unal.edu.co/handle/unal/24330
dc.description.abstract	This paper presents an information extraction method for data-rich documents, based on the knowledge represented in a domain ontology. The extractor combines a fuzzy string matcher and a word sense disambiguation (WSD) algorithm. The fuzzy string matcher finds mentions of terms by combining character-level and token-level similarity measures, which lets it handle non-standardized acronyms and inconsistent abbreviation styles. We propose a new prefix-sensitive character-level edit distance, called root distance, and a token-level similarity algorithm for fuzzy acronym detection. Additionally, a WSD strategy using an ontology-based semantic relatedness measure resolves the inherent ambiguity of some entities. The WSD module finds a combination of senses over the whole document that optimizes the document's semantic coherence. Our approach seems suitable for extracting information from data-rich documents that describe only one main object (i.e., a product) per document. The results showed a precision of 78.9% with 99.5% recall using documents and an ontology from the laptop computer domain.
dc.format.mimetype	application/pdf
dc.language.iso	spa
dc.publisher	Universidad Nacional de Colombia - Sede Medellín
dc.relation	http://revistas.unal.edu.co/index.php/avances/article/view/9972
dc.relation.ispartof	Universidad Nacional de Colombia Revistas electrónicas UN Avances en Sistemas e Informática
dc.relation.ispartof	Avances en Sistemas e Informática
dc.relation.ispartofseries	Avances en Sistemas e Informática; Vol. 5, No. 1 (2008) 1909-0056 1657-7663
dc.rights	All rights reserved - Universidad Nacional de Colombia
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/
dc.title	An ontology-based information extractor for data-rich documents in the information technology domain
dc.type	Journal article
dc.type.driver	info:eu-repo/semantics/article
dc.type.version	info:eu-repo/semantics/publishedVersion
dc.identifier.eprints	http://bdigital.unal.edu.co/15367/
dc.relation.references	Jiménez Vargas, Sergio Gonzalo and González Osorio, Fabio Augusto (2008) An ontology-based information extractor for data-rich documents in the information technology domain. Avances en Sistemas e Informática; Vol. 5, No. 1 (2008) 1909-0056 1657-7663.
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.subject.proposal	Knowledge Management
dc.subject.proposal	Information Extraction
dc.subject.proposal	Ontologies
dc.subject.proposal	Fuzzy String Searching
dc.subject.proposal	Word Sense Disambiguation
dc.subject.proposal	Semantic Relatedness
dc.type.coar	http://purl.org/coar/resource_type/c_6501
dc.type.coarversion	http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.content	Text
dc.type.redcol	http://purl.org/redcol/resource_type/ART
oaire.accessrights	http://purl.org/coar/access_right/c_abf2

Files in this document

Name: 9972-18047-1-PB.pdf
Size: 283.0Kb
Format: PDF

This document appears in the following collection(s)

Avances en Sistemas e Informática [299]

Attribution-NonCommercial 4.0 International

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license. This document has been deposited by the author(s) under the following deposit statement