dc.rights.license: Attribution-NonCommercial 4.0 International
dc.contributor.author: Jiménez Vargas, Sergio Gonzalo
dc.contributor.author: González Osorio, Fabio Augusto
dc.date.accessioned: 2019-06-25T22:35:59Z
dc.date.available: 2019-06-25T22:35:59Z
dc.date.issued: 2008
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/24330
dc.description.abstract: This paper presents an information extraction method, suitable for data-rich documents, based on the knowledge represented in a domain ontology. The extractor combines a fuzzy string matcher and a word sense disambiguation (WSD) algorithm. The fuzzy string matcher finds mentions of terms by combining character-level and token-level similarity measures, which lets it handle non-standardized acronyms and inconsistent abbreviation styles. We propose a new character-level edit distance sensitive to prefixes, called root distance, and a token-level similarity algorithm for fuzzy acronym detection. Additionally, a WSD strategy using an ontology-based semantic relatedness measure is used to resolve the inherent ambiguity of some entities. The WSD module finds a combination of senses over the whole document that optimizes the document's semantic coherence. Our approach seems to be suitable for extracting information from data-rich documents that describe only one main object (i.e. product) per document. The results showed a precision of 78.9% with 99.5% recall using documents and an ontology from the laptop computer domain.
dc.format.mimetype: application/pdf
dc.language.iso: spa
dc.publisher: Universidad Nacional de Colombia - Sede Medellín
dc.relation: http://revistas.unal.edu.co/index.php/avances/article/view/9972
dc.relation.ispartof: Universidad Nacional de Colombia Revistas electrónicas UN Avances en Sistemas e Informática
dc.relation.ispartof: Avances en Sistemas e Informática
dc.relation.ispartofseries: Avances en Sistemas e Informática; Vol. 5, No. 1 (2008); ISSN 1909-0056, 1657-7663
dc.rights: All rights reserved - Universidad Nacional de Colombia
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.title: An ontology-based information extractor for data-rich documents in the information technology domain
dc.type: Journal article
dc.type.driver: info:eu-repo/semantics/article
dc.type.version: info:eu-repo/semantics/publishedVersion
dc.identifier.eprints: http://bdigital.unal.edu.co/15367/
dc.relation.references: Jiménez Vargas, Sergio Gonzalo and González Osorio, Fabio Augusto (2008) An ontology-based information extractor for data-rich documents in the information technology domain. Avances en Sistemas e Informática; Vol. 5, No. 1 (2008). ISSN 1909-0056, 1657-7663.
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.subject.proposal: Knowledge Management
dc.subject.proposal: Information Extraction
dc.subject.proposal: Ontologies
dc.subject.proposal: Fuzzy String Searching
dc.subject.proposal: Word Sense Disambiguation
dc.subject.proposal: Semantic Relatedness
dc.type.coar: http://purl.org/coar/resource_type/c_6501
dc.type.coarversion: http://purl.org/coar/version/c_970fb48d4fbd8a85
dc.type.content: Text
dc.type.redcol: http://purl.org/redcol/resource_type/ART
oaire.accessrights: http://purl.org/coar/access_right/c_abf2



This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International license. This document was deposited by the author(s) under the following deposit certificate