An information retrieval strategy for large multimodal data collections involving source code and natural language

Baquero Vargas, Juan Felipe

dc.contributor	González Osorio, Fabio Augusto	spa
dc.contributor	Restrepo Calle, Felipe	spa
dc.contributor.author	Baquero Vargas, Juan Felipe	spa
dc.date.accessioned	2020-03-30T06:22:16Z	spa
dc.date.available	2020-03-30T06:22:16Z	spa
dc.date.issued	2019-07-03	spa
dc.description.abstract	Source code repositories store data about software products, including the evolution of the source code, requirements, bug reports, and communication between developers. These repositories have grown rapidly in recent years, and with them the need to extract information from them. One repository that is growing in both usage and content is Stack Overflow (SO), one of the largest question-answering sites, used by thousands of developers every day. On SO, developers can ask any question related to a programming issue, and other users answer it. The result is a repository containing both source code and natural language, with thousands of samples and the possibility of combining both sources of information to extract useful information that is not apparent at first glance. In this thesis, we explore how to represent source code and natural language and how to combine these representations. We address the tasks of understanding how SO users talk about a programming language and of measuring how similar programming languages are to one another based on how users talk about them. Finally, we provide tools for building an information retrieval strategy by identifying duplicate posts.	spa
dc.description.abstract	Software repositories store data about software products: data related to the evolution of source code, software requirements, bug reports, and communication between developers. Software repositories have grown rapidly in recent years, and with them the need to extract meaningful information from them. One interesting software repository is Stack Overflow (SO), one of the largest question-answering sites, used daily by thousands of software developers. On SO, developers can ask any question related to programming and software, and other users will answer it. Like SO, there are many software repositories containing source code and text, with millions of samples and the possibility of combining both sources to extract information that is not visible at first glance. In this thesis, we explore how to represent source code and natural language and how to combine these representations. We attempt to understand how SO users talk about a programming language and how similar programming languages are based on how users talk about them, and finally to provide tools for building an information retrieval strategy to identify duplicate posts.	spa
dc.description.degreelevel	Master's	spa
dc.format.mimetype	application/pdf	spa
dc.identifier.eprints	http://bdigital.unal.edu.co/73062/	spa
dc.identifier.uri	https://repositorio.unal.edu.co/handle/unal/76556
dc.language.iso	spa	spa
dc.relation.haspart	0 Generalidades / Computer science, information and general works	spa
dc.relation.haspart	6 Tecnología (ciencias aplicadas) / Technology	spa
dc.relation.haspart	62 Ingeniería y operaciones afines / Engineering	spa
dc.relation.ispartof	Universidad Nacional de Colombia Sede Bogotá Facultad de Ingeniería Departamento de Ingeniería de Sistemas e Industrial Ingeniería de Sistemas	spa
dc.relation.ispartof	Ingeniería de Sistemas	spa
dc.relation.references	Baquero Vargas, Juan Felipe (2019) An information retrieval strategy for large multimodal data collections involving source code and natural language. Master's thesis, Universidad Nacional de Colombia - Sede Bogotá.	spa
dc.rights	All rights reserved - Universidad Nacional de Colombia	spa
dc.rights.accessrights	info:eu-repo/semantics/openAccess	spa
dc.rights.license	Attribution-NonCommercial 4.0 International	spa
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	spa
dc.subject.proposal	Stack Overflow	spa
dc.subject.proposal	source code analysis	spa
dc.subject.proposal	Duplication detection	spa
dc.subject.proposal	Predicting programming language	spa
dc.subject.proposal	Análisis de código fuente	spa
dc.subject.proposal	Detección de duplicados	spa
dc.subject.proposal	Predecir el lenguaje de programación	spa
dc.title	An information retrieval strategy for large multimodal data collections involving source code and natural language	spa
dc.type	Degree work - Master's	spa
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc	spa
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa	spa
dc.type.content	Text	spa
dc.type.driver	info:eu-repo/semantics/masterThesis	spa
dc.type.redcol	http://purl.org/redcol/resource_type/TM	spa
dc.type.version	info:eu-repo/semantics/acceptedVersion	spa
oaire.accessrights	http://purl.org/coar/access_right/c_abf2	spa

Files

Original bundle

Name: Tesis_Maestra_JFBV__Universidad_Nacional_de_Colombia.pdf
Size: 1.09 MB
Format: Adobe Portable Document Format

Collections

Maestría en Ingeniería - Sistemas y Computación