Visualización de datos categóricos empleando métodos de reducción de dimensionalidad enfocados en datos socioeconómicos
dc.contributor.advisor | Branch Bedoya, John Willian | |
dc.contributor.advisor | Iral Palomino, René | |
dc.contributor.author | Oliveros Duran, Daniel Alejandro | |
dc.date.accessioned | 2025-07-09T14:12:16Z | |
dc.date.available | 2025-07-09T14:12:16Z | |
dc.date.issued | 2025 | |
dc.description | Ilustraciones, gráficos | spa |
dc.description.abstract | En el contexto actual, la disponibilidad creciente de herramientas para el análisis y la visualización estadística ha facilitado significativamente la exploración de datos y sus relaciones. No obstante, el incremento exponencial en la complejidad y el volumen de los datos plantea desafíos considerables, especialmente en el tratamiento de variables categóricas. Estas variables exhiben desafíos particulares en términos de representación gráfica, integración en modelos analíticos y en la interpretación de los resultados. Entre los principales desafíos que se destacan se encuentran la alta cardinalidad, que genera combinaciones complejas y dificulta el análisis individual de cada categoría, así como el incremento de la dimensionalidad derivado de técnicas de codificación como el one-hot encoding. El propósito de este estudio es desarrollar un procedimiento de visualización para datos categóricos con alta cardinalidad y dimensionalidad. Para lograr este objetivo, se propone un enfoque que abarca el procesamiento y la selección de variables categóricas, con el propósito de facilitar la aplicación de técnicas de reducción de dimensionalidad. Posteriormente, se determinará un método de visualización adecuado para representar el conjunto de datos reducido, de manera que sea posible analizar las relaciones entre las variables categóricas en un espacio de menor dimensión. (Tomado de la fuente) | spa |
dc.description.abstract | In the current context, the increasing availability of tools for statistical analysis and visualization has significantly facilitated the exploration of data and their relationships. However, the exponential increase in the complexity and volume of data poses considerable challenges, especially in the treatment of categorical variables. These variables are particularly difficult to represent graphically, to integrate into analytical models, and to interpret in the results. The main challenges include high cardinality, which generates complex combinations and makes the individual analysis of each category difficult, and the increased dimensionality produced by encoding techniques such as one-hot encoding. The purpose of this study is to develop a visualization procedure for categorical data with high cardinality and dimensionality. To achieve this objective, an approach is proposed that encompasses the processing and selection of categorical variables in order to facilitate the application of dimensionality reduction techniques. Subsequently, a suitable visualization method will be determined to represent the reduced data set, so that the relationships between categorical variables can be analyzed in a lower-dimensional space. | eng |
dc.description.curriculararea | Ingeniería De Sistemas E Informática. Sede Medellín | spa |
dc.description.degreelevel | Maestría | spa |
dc.format.extent | 95 páginas | spa |
dc.format.mimetype | application/pdf | spa |
dc.identifier.instname | Universidad Nacional de Colombia | spa |
dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/88315 | |
dc.language.iso | spa | spa |
dc.publisher | Universidad Nacional de Colombia | spa |
dc.publisher.branch | Universidad Nacional de Colombia - Sede Medellín | spa |
dc.publisher.faculty | Facultad de Minas | spa |
dc.publisher.place | Medellín, Colombia | spa |
dc.publisher.program | Medellín - Minas - Maestría en Ingeniería - Analítica | spa |
dc.relation.indexed | LaReferencia | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.rights.license | Reconocimiento 4.0 Internacional | spa |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | spa |
dc.subject.armarc | Métodos estadísticos | |
dc.subject.armarc | Análisis estadístico | |
dc.subject.armarc | Reducción de datos | |
dc.subject.ddc | 510 - Matemáticas::519 - Probabilidades y matemáticas aplicadas | spa |
dc.subject.ddc | 000 - Ciencias de la computación, información y obras generales::006 - Métodos especiales de computación | spa |
dc.subject.lemb | Variables (Estadística) | |
dc.subject.lemb | Análisis multivariante | |
dc.subject.lemb | Sistemas de recolección automática de datos | |
dc.subject.lemb | Situación socioeconómica - Procesamiento de datos | |
dc.subject.lemb | Aplicaciones analíticas | |
dc.subject.proposal | Reducción de dimensionalidad | spa |
dc.subject.proposal | Visualización | spa |
dc.subject.proposal | Clasificación socioeconómica | spa |
dc.subject.proposal | Datos Categóricos | spa |
dc.subject.proposal | Socioeconomic classification | eng |
dc.subject.proposal | Dimensionality reduction | eng |
dc.subject.proposal | Embeddings | eng |
dc.subject.proposal | Socioeconomic data | eng |
dc.subject.proposal | Categorical data | eng |
dc.title | Visualización de datos categóricos empleando métodos de reducción de dimensionalidad enfocados en datos socioeconómicos | spa |
dc.title.translated | Visualizing categorical data using dimensionality reduction methods focused on socioeconomic data | eng |
dc.type | Trabajo de grado - Maestría | spa |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
dc.type.content | Text | spa |
dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
dc.type.redcol | http://purl.org/redcol/resource_type/TM | spa |
dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
dcterms.audience.professionaldevelopment | Estudiantes | spa |
dcterms.audience.professionaldevelopment | Investigadores | spa |
dcterms.audience.professionaldevelopment | Público general | spa |
dcterms.audience.professionaldevelopment | Responsables políticos | spa |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
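The abstract notes that one-hot encoding inflates dimensionality when categorical variables have many categories. The following is a minimal sketch of that effect, not the thesis's procedure: the `one_hot_encode` helper, the variable names, and the sample records are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def one_hot_encode(rows: List[Dict[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode categorical records; returns (column names, 0/1 matrix)."""
    # Collect the distinct categories observed for each variable.
    categories: Dict[str, Set[str]] = {}
    for row in rows:
        for variable, value in row.items():
            categories.setdefault(variable, set()).add(value)
    # One output column per (variable, category) pair, in a stable sorted order.
    columns = [(var, cat) for var in sorted(categories) for cat in sorted(categories[var])]
    names = [f"{var}={cat}" for var, cat in columns]
    matrix = [[1 if row.get(var) == cat else 0 for var, cat in columns] for row in rows]
    return names, matrix


# Hypothetical socioeconomic records (illustrative values only, not thesis data).
records = [
    {"municipality": "Medellín", "stratum": "2", "housing": "own"},
    {"municipality": "Bello", "stratum": "3", "housing": "rent"},
    {"municipality": "Envigado", "stratum": "2", "housing": "rent"},
    {"municipality": "Itagüí", "stratum": "1", "housing": "own"},
]

names, matrix = one_hot_encode(records)
print(len(records[0]), "categorical variables ->", len(names), "one-hot columns")
```

The number of encoded columns equals the sum of the cardinalities of the variables, so a single high-cardinality variable (here, the hypothetical `municipality`) dominates the width of the matrix that any subsequent dimensionality reduction step has to handle.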
Files
Original bundle
- Name: 1152208477.2025.pdf
- Size: 12.88 MB
- Format: Adobe Portable Document Format
- Description: Tesis de Maestría en Ingeniería - Analítica