Visualización de datos categóricos empleando métodos de reducción de dimensionalidad enfocados en datos socioeconómicos

dc.contributor.advisor: Branch Bedoya, John Willian
dc.contributor.advisor: Iral Palomino, René
dc.contributor.author: Oliveros Duran, Daniel Alejandro
dc.date.accessioned: 2025-07-09T14:12:16Z
dc.date.available: 2025-07-09T14:12:16Z
dc.date.issued: 2025
dc.description: Illustrations, graphs
dc.description.abstract: In the current context, the growing availability of tools for statistical analysis and visualization has significantly eased the exploration of data and their relationships. However, the exponential growth in the complexity and volume of data poses considerable challenges, especially in the treatment of categorical variables. These variables present particular difficulties for graphical representation, for integration into analytical models, and for the interpretation of results. The main challenges include high cardinality, which produces complex combinations of categories and makes analyzing each category individually difficult, and the increase in dimensionality caused by encoding techniques such as one-hot encoding. The purpose of this study is to develop a visualization procedure for categorical data with high cardinality and dimensionality. To this end, an approach is proposed that covers the processing and selection of categorical variables in order to facilitate the application of dimensionality reduction techniques. Subsequently, a suitable visualization method will be determined to represent the reduced dataset, so that the relationships between categorical variables can be analyzed in a lower-dimensional space. (Taken from the source)
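The abstract attributes part of the difficulty to the extra dimensionality that one-hot encoding introduces for high-cardinality variables. The following minimal Python sketch makes that growth concrete under stated assumptions: the records, field names and the helpers `one_hot_encode` and `collapse_rare` (including its `min_count` threshold) are hypothetical and invented for illustration, and collapsing rare categories is a generic mitigation shown for contrast, not the procedure developed in the thesis.

```python
from collections import Counter

# Hypothetical socioeconomic records, invented for illustration only;
# they are NOT drawn from the thesis dataset.
RECORDS = [
    {"municipality": "Medellín", "stratum": "3", "occupation": "teacher"},
    {"municipality": "Bello", "stratum": "2", "occupation": "merchant"},
    {"municipality": "Envigado", "stratum": "5", "occupation": "engineer"},
    {"municipality": "Medellín", "stratum": "1", "occupation": "merchant"},
    {"municipality": "Itagüí", "stratum": "3", "occupation": "teacher"},
]
FIELDS = ["municipality", "stratum", "occupation"]


def one_hot_encode(records, fields):
    """One-hot encode records: one 0/1 column per (field, category) pair."""
    columns = []
    for field in fields:
        # Each distinct category of a field becomes its own binary column,
        # so the width grows with the sum of the fields' cardinalities.
        columns.extend((field, c) for c in sorted({r[field] for r in records}))
    index = {col: i for i, col in enumerate(columns)}
    matrix = []
    for r in records:
        row = [0] * len(columns)
        for field in fields:
            row[index[(field, r[field])]] = 1  # exactly one 1 per field
        matrix.append(row)
    return columns, matrix


def collapse_rare(records, field, min_count, other="OTHER"):
    """Merge categories seen fewer than min_count times into one label.

    Hypothetical helper: a generic high-cardinality mitigation shown only
    for contrast, not necessarily the preprocessing used in the thesis.
    """
    counts = Counter(r[field] for r in records)
    return [
        {**r, field: r[field] if counts[r[field]] >= min_count else other}
        for r in records
    ]


columns, matrix = one_hot_encode(RECORDS, FIELDS)
print(len(FIELDS), "categorical fields ->", len(columns), "binary columns")
# prints: 3 categorical fields -> 11 binary columns

reduced = collapse_rare(RECORDS, "municipality", min_count=2)
reduced_columns, _ = one_hot_encode(reduced, FIELDS)
print("after collapsing rare municipalities ->", len(reduced_columns), "binary columns")
# prints: after collapsing rare municipalities -> 9 binary columns
```

Each field contributes as many columns as it has distinct categories, so a single high-cardinality field (for example a municipality code over a whole country) can dominate the encoded width; that is the kind of growth dimensionality reduction techniques are then meant to compress.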
dc.description.curriculararea: Ingeniería de Sistemas e Informática. Sede Medellín
dc.description.degreelevel: Master's
dc.format.extent: 95 pages
dc.format.mimetype: application/pdf
dc.identifier.instname: Universidad Nacional de Colombia
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia
dc.identifier.repourl: https://repositorio.unal.edu.co/
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/88315
dc.language.iso: spa
dc.publisher: Universidad Nacional de Colombia
dc.publisher.branch: Universidad Nacional de Colombia - Sede Medellín
dc.publisher.faculty: Facultad de Minas
dc.publisher.place: Medellín, Colombia
dc.publisher.program: Medellín - Minas - Maestría en Ingeniería - Analítica
dc.relation.indexed: LaReferencia
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.rights.license: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.armarc: Statistical methods
dc.subject.armarc: Statistical analysis
dc.subject.armarc: Data reduction
dc.subject.ddc: 510 - Mathematics::519 - Probabilities and applied mathematics
dc.subject.ddc: 000 - Computer science, information and general works::006 - Special computer methods
dc.subject.lemb: Variables (Statistics)
dc.subject.lemb: Multivariate analysis
dc.subject.lemb: Automatic data collection systems
dc.subject.lemb: Socioeconomic status - Data processing
dc.subject.lemb: Analytical applications
dc.subject.proposal: Dimensionality reduction
dc.subject.proposal: Visualization
dc.subject.proposal: Socioeconomic classification
dc.subject.proposal: Categorical data
dc.subject.proposal: Embeddings
dc.subject.proposal: Socioeconomic data
dc.title: Visualización de datos categóricos empleando métodos de reducción de dimensionalidad enfocados en datos socioeconómicos
dc.title.translated: Visualizing categorical data using dimensionality reduction methods focused on socioeconomic data
dc.type: Degree project - Master's
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content: Text
dc.type.driver: info:eu-repo/semantics/masterThesis
dc.type.redcol: http://purl.org/redcol/resource_type/TM
dc.type.version: info:eu-repo/semantics/acceptedVersion
dcterms.audience.professionaldevelopment: Students
dcterms.audience.professionaldevelopment: Researchers
dcterms.audience.professionaldevelopment: General public
dcterms.audience.professionaldevelopment: Policymakers
oaire.accessrights: http://purl.org/coar/access_right/c_abf2

Files

Original bundle

Name: 1152208477.2025.pdf
Size: 12.88 MB
Format: Adobe Portable Document Format
Description: Master's Thesis in Engineering - Analytics

License bundle

Name: license.txt
Size: 5.74 KB
Format: Item-specific license agreed upon to submission