Análisis de clúster automático

Correa Henao, Marisol

dc.contributor.advisor	Velasquez Henao, Juan David
dc.contributor.author	Correa Henao, Marisol
dc.date.accessioned	2021-12-15T15:30:20Z
dc.date.available	2021-12-15T15:30:20Z
dc.date.issued	2021-12-08
dc.description	ilustraciones, gráficas, tablas	spa
dc.description.abstract	En este documento se desarrolla el proceso de software de análisis de clúster automático, aunque en la actualidad, existen varias librerías que permiten realizar análisis de clúster, se busca automatizar el proceso y lograr diferentes opciones centralizadas en un mismo paquete; facilitando el análisis y la parametrización de los modelos. Para su elaboración, se utilizaron las librerías ya existentes en Python, tomando como base lo que se tiene en diferentes herramientas y software estadístico o de análisis de datos, de manera que se puedan usar tanto por una persona con conocimientos básicos como por una persona con conocimientos profundos que quiera parametrizar sus análisis. Los resultados de este trabajo muestran que es posible facilitar los procesos de agrupamiento y su respectivo análisis de datos a través de los algoritmos actuales, guiando al usuario de manera simple, gráfica, intuitiva en todo el proceso, llevando a concluir que los resultados del análisis de clúster se ve sujeto a la subjetividad o a los conocimientos del usuario sin embargo esta subjetividad es posible reducirla a través de estrategias, técnicas, análisis y el buen uso de las herramientas existentes. (Texto tomado de la fuente)	spa
dc.description.abstract	This document describes the development of software for automatic cluster analysis. Although several libraries currently support cluster analysis, the aim is to automate the process and bring different options together in a single package, making it easier to analyze and parameterize the models. The software was built on existing Python libraries, taking as a basis what is available in various statistical and data analysis tools and software, so that it can be used both by a person with basic knowledge and by a person with in-depth knowledge who wants to parameterize their analyses. The results of this work show that it is possible to facilitate clustering processes and the corresponding data analysis with current algorithms, guiding the user through the whole process in a simple, graphical, and intuitive way. This leads to the conclusion that the results of cluster analysis are subject to the user's subjectivity or knowledge; however, this subjectivity can be reduced through strategies, techniques, analysis, and the proper use of existing tools.	eng
dc.description.curriculararea	Área Curricular de Ingeniería de Sistemas e Informática	spa
dc.description.degreelevel	Maestría	spa
dc.description.degreename	Magíster en Ingeniería - Analítica	spa
dc.description.researcharea	Análisis de clúster	spa
dc.description.technicalinfo	Documento con detalle de funcionamiento de software	spa
dc.format.extent	xi, 63 páginas	spa
dc.format.mimetype	application/pdf	spa
dc.identifier.instname	Universidad Nacional de Colombia	spa
dc.identifier.reponame	Repositorio Institucional Universidad Nacional de Colombia	spa
dc.identifier.repourl	https://repositorio.unal.edu.co/	spa
dc.identifier.uri	https://repositorio.unal.edu.co/handle/unal/80784
dc.language.iso	spa	spa
dc.publisher	Universidad Nacional de Colombia	spa
dc.publisher.branch	Universidad Nacional de Colombia - Sede Medellín	spa
dc.publisher.department	Departamento de la Computación y la Decisión	spa
dc.publisher.faculty	Facultad de Minas	spa
dc.publisher.place	Medellín, Colombia	spa
dc.publisher.program	Medellín - Minas - Maestría en Ingeniería - Analítica	spa
dc.relation.references	Aguilar, L. J. (2016). Big Data, Análisis de grandes volúmenes de datos en organizaciones. Alfaomega Grupo Editor.	spa
dc.relation.references	Aldenderfer, M. S., & Blashfield, R. K. (1984). A review of clustering methods. Cluster analysis, 33-61.	spa
dc.relation.references	Anderberg, M. R. (1973). Cluster Analysis for applications. Academic Press. New York and London.	spa
dc.relation.references	Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. ACM Sigmod record, 28(2), 49-60.	spa
dc.relation.references	Ashenden, A., Ward-Dutton, N., & Wentworth, C. (2016). La nueva tendencia de automatización: Machine Learning y más. MWD Advisors. Disponible en: https://www.ibm.com/downloads/cas/M1PG1J23.	spa
dc.relation.references	Äyrämö, S., & Kärkkäinen, T. (2006). Introduction to partitioning-based clustering methods with a robust example. Reports of the Department of Mathematical Information Technology. Series C, Software engineering and computational intelligence, (1/2006).	spa
dc.relation.references	Birant, D., & Kut, A. (2007). ST-DBSCAN: An algorithm for clustering spatial–temporal data. Data & Knowledge Engineering, 60(1), 208-221.	spa
dc.relation.references	Chojnacki, A., Dai, C., Farahi, A., Shi, G., Webb, J., Zhang, D. T., Abernethy, J., & Schwartz, E. (2017). A Data Science Approach to Understanding Residential Water Contamination in Flint. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17 (pp. 1407–1416). ACM, New York, NY, USA. https://doi.org/10.1145/3097983.3098078	spa
dc.relation.references	Aliguliyev, R. M. (2009). Performance evaluation of density-based clustering methods. Information Sciences, 179(20), 3583-3602.	spa
dc.relation.references	Aluja, T. (2001). La minería de datos, entre la estadística y la inteligencia artificial. Qüestió: quaderns d'estadística i investigació operativa, 25(3), 479-498.	spa
dc.relation.references	Chou, Y. L., & Armer, V. A. (1977). Análisis estadístico (No. 04; RMD, HA29 C4 1977.). Interamericana.	spa
dc.relation.references	Cleveland, W. S. (2001). Data Science: an Action Plan for Expanding the Technical Areas of the Field of Statistics. Int. Stat. Rev. 69, 21–26. https://doi.org/10.1111/j.1751-5823.2001.tb00477.x	spa
dc.relation.references	Cluster (Mahout Map-Reduce 0.13.0 API). (2017, April 14). Apache.org. https://mahout.apache.org/docs/0.13.0/api/docs/mahoutmr/org/apache/mahout/clustering/Cluster.html	spa
dc.relation.references	Clustering | KNIME. (2021). KNIME. https://www.knime.com/nodeguide/analytics/clustering	spa
dc.relation.references	Dhillon, I. S., Guan, Y., & Kulis, B. (2004, August). Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 551-556). ACM.	spa
dc.relation.references	Correa, M., (2021, October 12). TDGMarisolCorreaHenao/docs at main · marcorhe/TDGMarisolCorreaHenao. GitHub. https://github.com/marcorhe/TDGMarisolCorreaHenao/tree/main/docs	spa
dc.relation.references	Dane, A. D., & Kateman, G. (1993). On k-medoid clustering of large data sets with the aid of a genetic algorithm: Background, feasibility and comparison. Analytica Chimica Acta, 282, 647–669.	spa
dc.relation.references	Díaz, M., León, Á., Alvin, H., & Díaz Mora, M. E. (2016). Introducción al análisis estadístico multivariado aplicado. Experiencia y casos en el Caribe colombiano. Universidad del Norte.	spa
dc.relation.references	Dubes, R., & Jain, A. K. (1979). Validity studies in clustering methodologies. Pattern recognition, 11(4), 235-254.	spa
dc.relation.references	Eluri, V. R., Ramesh, M., Al-Jabri, A. S. M., & Jane, M. (2016, March). A comparative study of various clustering techniques on big data sets using Apache Mahout. In 2016 3rd MEC International Conference on Big Data and Smart City (ICBDSC) (pp. 1-4). IEEE.	spa
dc.relation.references	Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In kdd (Vol. 96, No. 34, pp. 226-231).	spa
dc.relation.references	Fernández, S. F., Sánchez, J. M. C., Córdoba, A., & Largo, A. C. (2002). Estadística descriptiva. Esic Editorial.	spa
dc.relation.references	Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015). Efficient and robust automated machine learning. In Advances in neural information processing systems (pp. 2962-2970).	spa
dc.relation.references	Gelbard, R., Goldman, O., & Spiegler, I. (2007). Investigating diversity of clustering methods: An empirical comparison. Data & Knowledge Engineering, 63(1), 155-166.	spa
dc.relation.references	Gómez-Skarmeta, A. F., Delgado, M., & Vila, M. A. (1999). About the use of fuzzy clustering techniques for fuzzy model identification. Fuzzy sets and systems, 106(2), 179-188.	spa
dc.relation.references	Hazen, B. T., Boone, C. A., Ezell, J. D., & Jones-Farmer, L. A. (2014). Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. Int. J. Prod. Econ. 154, 72–80. https://doi.org/10.1016/j.ijpe.2014.04.018	spa
dc.relation.references	Hierarchical Clustering — Orange Visual Programming 3 documentation. (2021). Readthedocs.io. https://orange3.readthedocs.io/projects/orange-visualprogramming/en/latest/widgets/unsupervised/hierarchicalclustering.html	spa
dc.relation.references	Huang, Z. (1998). Extensions to the k-means algorithm for clustering large data sets with categorical values. Data mining and knowledge discovery, 2(3), 283-304.	spa
dc.relation.references	ipywidgets — Jupyter Widgets 7.6.5 documentation. (2021). Readthedocs.io. https://ipywidgets.readthedocs.io/en/stable/	spa
dc.relation.references	Ji, J., Bai, T., Zhou, C., Ma, C., & Wang, Z. (2013). An improved k-prototypes clustering algorithm for mixed numeric and categorical data. Neurocomputing, 120, 590-596	spa
dc.relation.references	SAS Institute. (2012). SAS/OR 9.3 User's Guide: Mathematical Programming Examples. SAS Institute.	spa
dc.relation.references	Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.	spa
dc.relation.references	López, C. P. (2007). Minería de datos: técnicas y herramientas. Editorial Paraninfo.	spa
dc.relation.references	Lückeheide, S., Velásquez, J. D., & Cerda, L. (2007). Segmentación de los contribuyentes que declaran iva aplicando herramientas de clustering. Revista de Ingeniería de Sistemas, 21, 87-110.	spa
dc.relation.references	MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 281–297). Berkeley: University of California Press.	spa
dc.relation.references	Maheswaran, G., Jayarajan, P., Jose, J., & Joseph, J. (2013). K Means Clustering Algorithms: A Comparitive Study.	spa
dc.relation.references	Matplotlib: Python plotting — Matplotlib 3.4.3 documentation. (2012). Matplotlib.org. https://matplotlib.org/	spa
dc.relation.references	Meilă, M., & Heckerman, D. (2001). An experimental comparison of model-based clustering methods. Machine learning, 42(1), 9-29.	spa
dc.relation.references	Morán, L. L., & Alonso, J. H. (2009). Estadística descriptiva. Ediciones Académicas.	spa
dc.relation.references	Müllner, D. (2013). fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53(9), 1-18.	spa
dc.relation.references	Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems (pp. 849-856).	spa
dc.relation.references	NumPy. (2021). Numpy.org. https://numpy.org/	spa
dc.relation.references	Pandas - Python Data Analysis Library. (2021). Pydata.org. https://pandas.pydata.org/	spa
dc.relation.references	Park, H. S., & Jun, C. H. (2009). A simple and fast algorithm for K-medoids clustering. Expert systems with applications, 36(2), 3336-3341.	spa
dc.relation.references	Peña, D. (2002). Análisis de datos multivariantes (Vol. 24). Madrid: McGraw-hill.	spa
dc.relation.references	Provost, F., & Fawcett, T. (2013). Data Science for Business: What you need to know about data mining and data-analytic thinking. O’Reilly Media, Inc.	spa
dc.relation.references	RapidMiner GmbH. (2021). k-Means - RapidMiner Documentation. Rapidminer.com. https://docs.rapidminer.com/latest/studio/operators/modeling/segmentation/k_means.html	spa
dc.relation.references	Reynolds, A. P., Richards, G., & Rayward-Smith, V. J. (2004, August). The application of k-medoids and pam to the clustering of rules. In International Conference on Intelligent Data Engineering and Automated Learning (pp. 173-178). Springer, Berlin, Heidelberg.	spa
dc.relation.references	Rodrigo, J. A. (2020). Clustering con Python. Disponible en https://www.cienciadedatos.net/documentos/py20-clustering-con-python.html [10-06-2021].	spa
dc.relation.references	Ross, S. M. (2007). Introducción a la estadística. Reverté.	spa
dc.relation.references	Sangüesa Solé, Ramón (coord.) (2000). Data mining: una introducción. Barcelona: Universitat Oberta de Catalunya.	spa
dc.relation.references	Santana, Ó. F. (1991). El análisis de clúster: aplicación, interpretación y validación. Papers: revista de sociologia, (37), 65-76.	spa
dc.relation.references	Scikit-learn. (s.f.). Scikit-learn: Clustering. Disponible en: https://scikit-learn.org/stable/modules/clustering.html	spa
dc.relation.references	SAS/STAT Cluster Analysis Procedures. (2018, November 20). Sas.com. https://support.sas.com/rnd/app/stat/procedures/ClusterAnalysis.html	spa
dc.relation.references	Shen, J., Hao, X., Liang, Z., Liu, Y., Wang, W., & Shao, L. (2016). Real-time superpixel segmentation by DBSCAN clustering algorithm. IEEE Transactions on Image Processing, 25(12), 5933-5942.	spa
dc.relation.references	Sys — Parámetros y funciones específicos del sistema — documentación de Python - 3.10.0. (2021). Python.org. https://docs.python.org/es/3.10/library/sys.html	spa
dc.relation.references	The Jupyter Notebook — Jupyter Notebook 6.4.5 documentation. (2021). Readthedocs.io. https://jupyter-notebook.readthedocs.io/en/stable/	spa
dc.relation.references	time — Time access and conversions — Python 3.10.0 documentation. (2021). Python.org. https://docs.python.org/3/library/time.html	spa
dc.relation.references	Tutorial de Python — documentación de Python - 3.10.0. (2021). Python.org. https://docs.python.org/es/3/tutorial/	spa
dc.relation.references	Uriel, E., & Aldás, J. (2005). Análisis multivariante aplicado. 1ª. Edición. Thomson. Madrid.	spa
dc.relation.references	Van der Aalst, W. M. (2016). Process mining: data science in action. Springer.	spa
dc.relation.references	Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and computing, 17(4), 395-416.	spa
dc.relation.references	Wang, K., Zhang, J., Li, D., Zhang, X., & Guo, T. (2008). Adaptive affinity propagation clustering. arXiv preprint arXiv:0805.1096.	spa
dc.relation.references	warnings — Warning control — Python 3.10.0 documentation. (2021). Python.org. https://docs.python.org/3/library/warnings.html	spa
dc.relation.references	Yellowbrick: Machine Learning Visualization — Yellowbrick v1.3.post1 documentation. (2021). Scikit-Yb.org. https://www.scikit-yb.org/en/latest/	spa
dc.relation.references	Zelnik-Manor, L., & Perona, P. (2005). Self-tuning spectral clustering. Advances in neural information processing systems (pp. 1601-1608).	spa
dc.relation.references	Zhang, T., Ramakrishnan, R., & Livny, M. (1996). BIRCH: an efficient data clustering method for very large databases. ACM sigmod record, 25(2), 103-114.	spa
dc.rights.accessrights	info:eu-repo/semantics/openAccess	spa
dc.rights.license	Atribución-CompartirIgual 4.0 Internacional	spa
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	spa
dc.subject.ddc	000 - Ciencias de la computación, información y obras generales::004 - Procesamiento de datos Ciencia de los computadores	spa
dc.subject.lemb	Cluster analysis
dc.subject.lemb	Análisis clúster
dc.subject.proposal	Análisis	spa
dc.subject.proposal	Clúster	spa
dc.subject.proposal	Software	eng
dc.subject.proposal	Python	eng
dc.subject.proposal	Librería	spa
dc.subject.proposal	Aprendizaje de máquinas automático	spa
dc.subject.proposal	Analysis	eng
dc.subject.proposal	Cluster	eng
dc.subject.proposal	Python	eng
dc.subject.proposal	Library	eng
dc.subject.proposal	Automatic machine learning	eng
dc.title	Análisis de clúster automático	spa
dc.title.translated	Automatic cluster analysis	eng
dc.type	Trabajo de grado - Maestría	spa
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc	spa
dc.type.coarversion	http://purl.org/coar/version/c_ab4af688f83e57aa	spa
dc.type.content	Text	spa
dc.type.driver	info:eu-repo/semantics/masterThesis	spa
dc.type.redcol	http://purl.org/redcol/resource_type/TM	spa
dc.type.version	info:eu-repo/semantics/acceptedVersion	spa
dcterms.audience.professionaldevelopment	Investigadores	spa
dcterms.audience.professionaldevelopment	Público general	spa
oaire.accessrights	http://purl.org/coar/access_right/c_abf2	spa

Files

Original bundle

Name: 1017230592.2021.pdf
Size: 2 MB
Format: Adobe Portable Document Format
Description: Tesis de Maestría en Ingeniería - Analítica

License bundle

Name: license.txt
Size: 3.98 KB
Format: Item-specific license agreed upon to submission

Collections

Maestría en Ingeniería - Analítica