Aplicación de técnicas de detección de temáticas emergentes para establecer prioridades de Investigación y Desarrollo en el área de Machine Learning

dc.contributor.advisor: Velasquez Henao, Juan David
dc.contributor.author: Vásquez Hernandez, Valentina
dc.contributor.researchgroup: Big Data y Data Analytics (spa)
dc.date.accessioned: 2023-09-20T15:52:20Z
dc.date.available: 2023-09-20T15:52:20Z
dc.date.issued: 2023-05-28
dc.description: ilustraciones, diagramas (spa)
dc.description.abstract: La detección de temáticas emergentes es de gran relevancia para los equipos que se dedican a la Investigación y Desarrollo en Machine Learning, ya que les permite formular proyectos de investigación, generar nuevas oportunidades de negocio y aportar valor en términos de producto, tecnología y conocimiento. Sin embargo, estos equipos se enfrentan a varios obstáculos, como la rápida generación de información, la diversidad de fuentes de datos disponibles y la falta de implementaciones no comerciales escalables que permitan automatizar el análisis. Con el objetivo de abordar esta necesidad, se propone una discusión sobre las metodologías actuales para la detección de temáticas emergentes. Además, se presenta una propuesta metodológica que combina tres técnicas de procesamiento de lenguaje natural: el Clasificador de Ontología de Ciencias de la Computación, los mapas temáticos y BERTopic. Una vez establecida esta metodología, se aplica a textos de artículos científicos en el campo del Machine Learning, lo que permite obtener una lista de temas prioritarios. Finalmente, se realiza una discusión de los resultados, contrastándolos con los cambios estructurales dentro del área. Como resultado de este estudio, se logra identificar subáreas de investigación y temas específicos que se consideran emergentes en la actualidad. Se recomienda que los equipos de investigación aborden estos temas, ya que representan áreas de gran potencial y relevancia en el campo del Machine Learning. (Texto tomado de la fuente) (spa)
dc.description.abstract: Detecting emerging topics is highly relevant for teams engaged in Machine Learning research and development, as it enables them to formulate research projects, generate new business opportunities, and contribute value in terms of products, technologies, and knowledge. Nonetheless, these teams face several obstacles, including the rapid generation of information, the diversity of available data sources, and the scarcity of scalable non-commercial implementations that automate the analysis. To address this need, the work discusses current methodologies for detecting emerging topics and presents a methodological proposal that combines three natural language processing techniques: the Computer Science Ontology Classifier, thematic maps, and BERTopic. The methodology is then applied to the texts of scientific articles in the Machine Learning domain, yielding a list of priority topics. Finally, the results are discussed and contrasted with the structural changes occurring within the field. The study identifies research subareas and specific topics that are currently considered emerging. Research teams are encouraged to address these topics, as they represent areas of significant potential and relevance in the field of Machine Learning. (eng)
dc.description.curriculararea: Área Curricular de Ingeniería de Sistemas e Informática (spa)
dc.description.degreelevel: Maestría (spa)
dc.description.degreename: Magíster en Ingeniería - Analítica (spa)
dc.description.researcharea: Métodos computacionales para el análisis de datos (spa)
dc.format.extent: xvi, 63 páginas (spa)
dc.format.mimetype: application/pdf (spa)
dc.identifier.instname: Universidad Nacional de Colombia (spa)
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia (spa)
dc.identifier.repourl: https://repositorio.unal.edu.co/ (spa)
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/84718
dc.language.iso: spa (spa)
dc.publisher: Universidad Nacional de Colombia (spa)
dc.publisher.branch: Universidad Nacional de Colombia - Sede Medellín (spa)
dc.publisher.faculty: Facultad de Minas (spa)
dc.publisher.place: Medellín, Colombia (spa)
dc.publisher.program: Medellín - Minas - Maestría en Ingeniería - Analítica (spa)
dc.relation.indexed: RedCol (spa)
dc.relation.indexed: LaReferencia (spa)
dc.rights.accessrights: info:eu-repo/semantics/openAccess (spa)
dc.rights.license: Atribución-SinDerivadas 4.0 Internacional (spa)
dc.rights.uri: http://creativecommons.org/licenses/by-nd/4.0/ (spa)
dc.subject.ddc: 600 - Tecnología (Ciencias aplicadas) (spa)
dc.subject.ddc: 620 - Ingeniería y operaciones afines (spa)
dc.subject.ddc: 000 - Ciencias de la computación, información y obras generales::004 - Procesamiento de datos Ciencia de los computadores (spa)
dc.subject.lemb: Aprendizaje automático (Inteligencia artificial)
dc.subject.lemb: Tecnología de la información
dc.subject.lemb: Information technology
dc.subject.proposal: Natural Language Processing (eng)
dc.subject.proposal: Machine Learning (eng)
dc.subject.proposal: Topic Modeling (eng)
dc.subject.proposal: BERTOPIC (eng)
dc.subject.proposal: CSO Classifier (eng)
dc.subject.proposal: Thematic Maps (eng)
dc.title: Aplicación de técnicas de detección de temáticas emergentes para establecer prioridades de Investigación y Desarrollo en el área de Machine Learning (spa)
dc.title.translated: Application of emerging topics detection techniques to establish Research and Development priorities in the field of Machine Learning (eng)
dc.type: Trabajo de grado - Maestría (spa)
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc (spa)
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa (spa)
dc.type.content: Text (spa)
dc.type.driver: info:eu-repo/semantics/masterThesis (spa)
dc.type.redcol: http://purl.org/redcol/resource_type/TM (spa)
dc.type.version: info:eu-repo/semantics/acceptedVersion (spa)
dcterms.audience.professionaldevelopment: Investigadores (spa)
oaire.accessrights: http://purl.org/coar/access_right/c_abf2 (spa)

Files

Original bundle

Name: TrabajoFinalMaestría-ValentinaVasquezH-CC1017251620-ConCorecciones.pdf
Size: 2.21 MB
Format: Adobe Portable Document Format
Description: Master's thesis in Engineering - Analytics
