An ontology-based information extractor for data-rich documents in the information technology domain
dc.rights.license | Attribution-NonCommercial 4.0 International |
dc.contributor.author | Jiménez Vargas, Sergio Gonzalo |
dc.contributor.author | González Osorio, Fabio Augusto |
dc.date.accessioned | 2019-06-25T22:35:59Z |
dc.date.available | 2019-06-25T22:35:59Z |
dc.date.issued | 2008 |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/24330 |
dc.description.abstract | This paper presents an information extraction method for data-rich documents, based on the knowledge represented in a domain ontology. The extractor combines a fuzzy string matcher and a word sense disambiguation (WSD) algorithm. The fuzzy string matcher finds mentions of terms by combining character-level and token-level similarity measures, which lets it handle non-standardized acronyms and inconsistent abbreviation styles. We propose a new prefix-sensitive character-level edit distance, called root distance, and a token-level similarity algorithm for fuzzy acronym detection. Additionally, a WSD strategy using an ontology-based semantic relatedness measure resolves the inherent ambiguity of some entities. The WSD module selects the combination of senses across the whole document that maximizes the document's semantic coherence. Our approach appears suitable for extracting information from data-rich documents that each describe only one main object (i.e., a product). Using documents and an ontology from the laptop computer domain, the results showed a precision of 78.9% with 99.5% recall. |
dc.format.mimetype | application/pdf |
dc.language.iso | spa |
dc.publisher | Universidad Nacional de Colombia - Sede Medellín |
dc.relation | http://revistas.unal.edu.co/index.php/avances/article/view/9972 |
dc.relation.ispartof | Universidad Nacional de Colombia Revistas electrónicas UN Avances en Sistemas e Informática |
dc.relation.ispartof | Avances en Sistemas e Informática |
dc.relation.ispartofseries | Avances en Sistemas e Informática; Vol. 5, no. 1 (2008); ISSN 1909-0056, 1657-7663 |
dc.rights | All rights reserved - Universidad Nacional de Colombia |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ |
dc.title | An ontology-based information extractor for data-rich documents in the information technology domain |
dc.type | Journal article |
dc.type.driver | info:eu-repo/semantics/article |
dc.type.version | info:eu-repo/semantics/publishedVersion |
dc.identifier.eprints | http://bdigital.unal.edu.co/15367/ |
dc.relation.references | Jiménez Vargas, Sergio Gonzalo and González Osorio, Fabio Augusto (2008). An ontology-based information extractor for data-rich documents in the information technology domain. Avances en Sistemas e Informática, Vol. 5, no. 1 (2008). ISSN 1909-0056, 1657-7663. |
dc.rights.accessrights | info:eu-repo/semantics/openAccess |
dc.subject.proposal | Knowledge Management |
dc.subject.proposal | Information Extraction |
dc.subject.proposal | Ontologies |
dc.subject.proposal | Fuzzy String Searching |
dc.subject.proposal | Word Sense Disambiguation |
dc.subject.proposal | Semantic Relatedness |
dc.type.coar | http://purl.org/coar/resource_type/c_6501 |
dc.type.coarversion | http://purl.org/coar/version/c_970fb48d4fbd8a85 |
dc.type.content | Text |
dc.type.redcol | http://purl.org/redcol/resource_type/ART |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 |
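The abstract describes a fuzzy string matcher that combines a character-level and a token-level similarity measure to catch non-standard acronyms and abbreviations. The sketch below illustrates that general idea only, under loudly stated assumptions: it uses plain Levenshtein distance instead of the paper's prefix-sensitive root distance (which this record does not define), the Monge-Elkan token combination as a stand-in for the paper's token-level algorithm, and a hypothetical initials-based acronym check. All function names are illustrative, not taken from the paper.

```python
"""Sketch of a fuzzy term matcher combining character-level and token-level
similarity. Assumption: plain Levenshtein + Monge-Elkan are stand-ins, NOT the
paper's root distance or its acronym-detection algorithm."""


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute if different
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Normalize edit distance to a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def is_acronym(short: str, tokens: list[str]) -> bool:
    """Hypothetical helper: True if `short` spells the initials of `tokens`."""
    return short == "".join(t[0] for t in tokens if t)


def token_similarity(mention: str, term: str) -> float:
    """Monge-Elkan combination: for each mention token take its best
    character-level match among the term's tokens, then average."""
    m_tokens = mention.lower().split()
    t_tokens = term.lower().split()
    if not m_tokens or not t_tokens:
        return 0.0
    # A one-token mention spelling the term's initials counts as a full match.
    if len(m_tokens) == 1 and is_acronym(m_tokens[0], t_tokens):
        return 1.0
    best = [max(char_similarity(m, t) for t in t_tokens) for m in m_tokens]
    return sum(best) / len(best)


if __name__ == "__main__":
    print(token_similarity("HDD", "hard disk drive"))       # acronym match
    print(token_similarity("wireles lan", "Wireless LAN"))  # misspelled token
```

With these stand-ins, `"HDD"` matches `"hard disk drive"` through the initials check, and the misspelled `"wireles lan"` still scores 0.9375 against `"Wireless LAN"` because each token is compared to its closest counterpart. A prefix-sensitive distance such as the paper's root distance would presumably replace `levenshtein` inside `char_similarity`.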