Nachabot - Modelo de inteligencia artificial como asistente al proceso de admisión de la Universidad Nacional de Colombia

Tafur Devia, Cristian David

dc.contributor.advisor	Niño Vásquez, Luis Fernando	spa
dc.contributor.author	Tafur Devia, Cristian David	spa
dc.contributor.researchgroup	Laboratorio de Investigación en Sistemas Inteligentes (LISI)	spa
dc.coverage.country	Colombia	spa
dc.coverage.tgn	http://vocab.getty.edu/page/tgn/1000050
dc.date.accessioned	2025-09-17T20:45:31Z
dc.date.available	2025-09-17T20:45:31Z
dc.date.issued	2025-07-17
dc.description	ilustraciones, diagramas	spa
dc.description.abstract	Este trabajo de grado presenta el desarrollo de Nachabot, un asistente conversacional inteligente diseñado para responder preguntas frecuentes sobre el proceso de admisión a programas de pregrado y posgrado en la Universidad Nacional de Colombia. El objetivo principal fue construir un sistema basado en la arquitectura de Generación Aumentada por Recuperación (RAG), utilizando fuentes oficiales como documentos institucionales y páginas web, integrando herramientas como LangChain, LangGraph, Ollama y Streamlit. El diseño incluyó el Web scraping y procesamiento de datos desde el portal de admisiones, la segmentación y vectorización del corpus, y la implementación de múltiples flujos conversacionales sobre grafos de estado. Se compararon tres configuraciones del sistema: GPT-4o con embeddings de OpenAI, LLaMA3.2 con embeddings de OpenAI, y LLaMA3.2 con embeddings propios. La validación se realizó con LangSmith, evaluando métricas como exactitud, fidelidad al contexto, relevancia y latencia. Los resultados muestran que GPT-4o obtuvo los mejores puntajes en groundedness (0.88) y correctness (0.71), mientras que LLaMA3.2 con embeddings propios fue la solución más rápida (latencia P50: 2.1 s), aunque con menor calidad en las respuestas. La aplicación fue desplegada exitosamente en la nube mediante Streamlit, permitiendo el acceso público al sistema. Nachabot constituye una solución adaptable y reproducible para instituciones educativas que deseen automatizar la atención a aspirantes. (Texto tomado de la fuente).	spa
dc.description.abstract	This thesis presents the development of Nachabot, an intelligent conversational assistant designed to answer frequently asked questions regarding undergraduate and graduate admissions at the National University of Colombia. The project aimed to build a system based on the Retrieval-Augmented Generation (RAG) architecture, combining official institutional sources—such as web content and PDF documents—with technologies like LangChain, LangGraph, Ollama, and Streamlit. The system was designed modularly to integrate web scraping, document segmentation, semantic vectorization, and conversational flows modeled as graphs. Three configurations of the system were implemented and evaluated: GPT-4o with OpenAIEmbeddings, LLaMA3.2 with OpenAIEmbeddings, and LLaMA3.2 with local OllamaEmbeddings. The evaluation used LangSmith's LLM-as-judge framework, analyzing correctness, groundedness, relevance, and latency. Results showed that the GPT-4o configuration achieved the best scores in groundedness (0.88) and correctness (0.71), while LLaMA3.2 with local embeddings yielded the lowest latency (P50: 2.1 s) but also the lowest response quality. The final system was successfully deployed using Streamlit Cloud, allowing public interaction with the assistant. Nachabot demonstrates the viability of building robust, low-cost, and extensible conversational agents for educational institutions aiming to automate and improve applicant support services.	eng
dc.description.degreelevel	Maestría	spa
dc.description.degreename	Magíster en Ingeniería - Ingeniería de Sistemas y Computación	spa
dc.description.researcharea	Sistemas inteligentes	spa
dc.format.extent	64 páginas	spa
dc.format.mimetype	application/pdf
dc.identifier.instname	Universidad Nacional de Colombia	spa
dc.identifier.reponame	Repositorio Institucional Universidad Nacional de Colombia	spa
dc.identifier.repourl	https://repositorio.unal.edu.co/	spa
dc.identifier.uri	https://repositorio.unal.edu.co/handle/unal/88885
dc.language.iso	spa
dc.publisher	Universidad Nacional de Colombia	spa
dc.publisher.branch	Universidad Nacional de Colombia - Sede Bogotá	spa
dc.publisher.faculty	Facultad de Ingeniería	spa
dc.publisher.place	Bogotá, Colombia	spa
dc.publisher.program	Bogotá - Ingeniería - Maestría en Ingeniería - Ingeniería de Sistemas y Computación	spa
dc.relation.references	I. B. Cruz, S. S. Martínez, A. R. Abed, R. G. Ábalo, and M. M. G. Lorenzo, “Redes neuronales recurrentes para el análisis de secuencias,” Revista Cubana de Ciencias Informáticas, vol. 1, no. 4, pp. 48–57, 2007
dc.relation.references	Estadísticas Aspirantes Unal, portal web: https://estadisticas.unal.edu.co/Aspirantes/
dc.relation.references	Jia, J. (2003). The Study of the Application of a Keywords-based Chatbot System on the Teaching of Foreign Languages. ArXiv preprint cs/0310018. Recuperado de: https://arxiv.org/abs/cs/0310018
dc.relation.references	Turing, A. M. (1950). Computing Machinery And Intelligence. Mind, LIX(236), 433–460. Recuperado de: http://doi.org/10.1093/mind/lix.236.433
dc.relation.references	«chatbot, neologismo válido (sic)» (html). Fundación del Español Urgente. 13 de junio de 2020. Archivado desde el original el 13 de junio de 2019. Recuperado en https://web.archive.org/web/20190613115414/https://www.fundeu.es/recomendacion/chatbot-neologismo-valido/
dc.relation.references	Alvarado Troncoso, Marco Antonio (diciembre de 2012). «Sistema para el Aprendizaje del Mapudungun. Incluyendo características de reconocimiento de voz y bot conversacional.». Pontificia Universidad Católica de Valparaíso.
dc.relation.references	Sansonnet, J.-P., Leray, D., & Martin, J.-C. (2006). Architecture of a Framework for Generic Assisting Conversational Agents. Intelligent Virtual Agents, Lecture Notes in Computer Science, 145–156. http://doi.org/10.1007/11821830_12
dc.relation.references	Hugging Face. Natural Language Processing. En línea. Recuperado en: https://huggingface.co/learn/nlp-course/chapter1/2?fw=pt
dc.relation.references	IA conversacional. Globant. Recuperado de https://www.globant.com/es/tech-terms/ia-conversacional
dc.relation.references	Zhang et al. DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation. https://arxiv.org/abs/1911.00536
dc.relation.references	D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986
dc.relation.references	Richaud, “Redes perceptrón multicapa (MLP): ¿Qué son, cómo funcionan y cuándo utilizarlas? (o no)”, https://antonio-richaud.com/blog/archivo/publicaciones/41-redes-perceptron-multicapa.html
dc.relation.references	https://www.researchgate.net/publication/28215712_Interfaz_visual_para_el_prototipado_rapido_de_clasificadores_de_gajos_de_mandarina_basados_redes_neuronales
dc.relation.references	J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Gated feedback recurrent neural networks,” in International conference on machine learning. PMLR, 2015, pp. 2067–2075
dc.relation.references	K. Zhan, Y. Li, R. Osmani, X. Wang, and B. Cao, “Data exploration and classification of news article reliability: Deep learning study,” JMIR infodemiology, vol. 2, no. 2, p. e38839, 2022
dc.relation.references	H. Wang and B. Raj, “On the origin of deep learning”. 2017
dc.relation.references	https://medium.com/deeplearningbrasilia/deep-learning-recurrent-neural-networks-f9482a24d010
dc.relation.references	M. Jabreel y A. Moreno, “A Deep Learning-Based Approach for Multi-Label Emotion Classification in Tweets”. 2019. Recuperado en: https://www.researchgate.net/publication/331848495_A_Deep_Learning-Based_Approach_for_Multi-Label_Emotion_Classification_in_Tweets
dc.relation.references	J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv:1412.3555, 2014
dc.relation.references	A. Agarwal and P. Meel, “Stacked bi-lstm with attention and contextual bert embeddings for fake news analysis,” in 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), vol. 1. IEEE, 2021, pp. 233–237.
dc.relation.references	J. Alghamdi, Y. Lin, and S. Luo, “Towards covid-19 fake news detection using transformer-based models,” Knowledge-Based Systems, vol. 274, p. 110642, 2023.
dc.relation.references	R. Abyaad, M. R. Kabir, and S. Hasan, “A novel approach to categorize news articles from headlines and short text,” in 2020 IEEE Region 10 Symposium (TENSYMP). IEEE, 2020, pp. 162–165
dc.relation.references	Luis José R. Desarrollo de una Red Neuronal Convulsional para Clasificación de Imágenes con TensorFlow, 2023. Recuperado en: https://www.linkedin.com/pulse/desarrollo-de-una-red-neuronal-convulsional-para-con-luis-jos%C3%A9-ser4e/
dc.relation.references	J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018
dc.relation.references	Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
dc.relation.references	Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “Xlnet: Generalized autoregressive pretraining for language understanding,” Advances in neural information processing systems, vol. 32, 2019.
dc.relation.references	Z. Dai, Z. Yang, Y. Yang, J. Carbonell, Q. V. Le, and R. Salakhutdinov, “Transformer-xl: Attentive language models beyond a fixed-length context,” arXiv preprint arXiv:1901.02860, 2019.
dc.relation.references	D. Liu, D. Greene, and R. Dong, “A novel perspective to look at attention: Bi-level attention-based explainable topic modeling for news classification,” arXiv preprint arXiv:2203.07216, 2022.
dc.relation.references	A. Elnagar, O. Einea, and R. Al-Debsi, “Automatic text tagging of arabic news articles using ensemble deep learning models,” in Proceedings of the 3rd international conference on natural language and speech processing, 2019, pp. 59–66
dc.relation.references	K. M. Alzhrani, “Political ideology detection of news articles using deep neural networks.” Intelligent Automation & Soft Computing, vol. 33, no. 1, 2022.
dc.relation.references	T. T. Nguyen, A. D. Le, H. T. Hoang y T. Nguyen, “NEU-chatbot: Chatbot for admission of National Economics University”, 2021. Recuperado en: https://doi.org/10.1016/j.caeai.2021.100036
dc.relation.references	S. Vidivelli, M. Ramachandran y A. Dharunbalaji, “Efficiency-Driven Custom Chatbot Development: Unleashing LangChain, RAG, and Performance-Optimized LLM Fusion”, 2024. Recuperado en: https://doi.org/10.32604/cmc.2024.054360
dc.relation.references	A. Aloqayli y H. Abdelhafez, “Intelligent Chatbot for Admission in Higher Education”, 2023. Recuperado en: https://www.ijiet.org/vol13/IJIET-V13N9-1937.pdf
dc.relation.references	Langchain, Build a Retrieval Augmented Generation (RAG) App: Part 1. Recuperado en: https://python.langchain.com/docs/tutorials/rag/
dc.relation.references	Langchain, Retrieval augmented generation (RAG). Recuperado en: https://python.langchain.com/docs/concepts/rag/
dc.relation.references	Chase, H. (2023). LangChain documentation. LangChain. https://docs.langchain.com/
dc.relation.references	Harris, R. (2024). LangGraph: State machines for LLM applications. LangChain. https://python.langgraph.org/
dc.relation.references	Streamlit. (2023). Streamlit: The fastest way to build and share data apps. Streamlit Inc. https://streamlit.io/
dc.rights.accessrights	info:eu-repo/semantics/openAccess
dc.rights.license	Atribución-NoComercial 4.0 Internacional
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.ddc	000 - Ciencias de la computación, información y obras generales::006 - Métodos especiales de computación	spa
dc.subject.proposal	LLMs	eng
dc.subject.proposal	Chatbot	eng
dc.subject.proposal	Asistente de IA	spa
dc.subject.proposal	RAG	eng
dc.subject.proposal	Llama	eng
dc.subject.proposal	Admisiones UNAL	spa
dc.subject.proposal	GPT-4	eng
dc.subject.proposal	AI Assistant	eng
dc.subject.proposal	UNAL admission	eng
dc.subject.unesco	Tecnología de la información	spa
dc.subject.unesco	Information technology	eng
dc.subject.unesco	Sistemas de información documental	spa
dc.subject.unesco	Documentary information systems	eng
dc.subject.unesco	Procesamiento de datos	spa
dc.subject.unesco	Data processing	eng
dc.subject.unesco	Administración de la educación	spa
dc.subject.unesco	Educational administration	eng
dc.title	Nachabot - Modelo de inteligencia artificial como asistente al proceso de admisión de la Universidad Nacional de Colombia	spa
dc.title.translated	Nachabot - Artificial intelligence model as an assistant to the admissions process at the National University of Colombia	eng
dc.type	Trabajo de grado - Maestría	spa
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content	Text
dc.type.driver	info:eu-repo/semantics/masterThesis
dc.type.redcol	http://purl.org/redcol/resource_type/TM
dc.type.version	info:eu-repo/semantics/acceptedVersion
dcterms.audience.professionaldevelopment	Investigadores	spa
oaire.accessrights	http://purl.org/coar/access_right/c_abf2

Files

Original bundle

Name: Tesis1030684081-Nachabot.pdf
Size: 2.83 MB
Format: Adobe Portable Document Format
Description: Master's thesis in Engineering - Systems and Computing Engineering

Collections

Maestría en Ingeniería - Sistemas y Computación