Automatic cluster analysis

dc.contributor.advisorVelasquez Henao, Juan David
dc.contributor.authorCorrea Henao, Marisol
dc.date.accessioned2021-12-15T15:30:20Z
dc.date.available2021-12-15T15:30:20Z
dc.date.issued2021-12-08
dc.descriptionilustraciones, gráficas, tablasspa
dc.description.abstractEn este documento se desarrolla el proceso de software de análisis de clúster automático, aunque en la actualidad, existen varias librerías que permiten realizar análisis de clúster, se busca automatizar el proceso y lograr diferentes opciones centralizadas en un mismo paquete; facilitando el análisis y la parametrización de los modelos. Para su elaboración, se utilizaron las librerías ya existentes en Python, tomando como base lo que se tiene en diferentes herramientas y software estadístico o de análisis de datos, de manera que se puedan usar tanto por una persona con conocimientos básicos como por una persona con conocimientos profundos que quiera parametrizar sus análisis. Los resultados de este trabajo muestran que es posible facilitar los procesos de agrupamiento y su respectivo análisis de datos a través de los algoritmos actuales, guiando al usuario de manera simple, gráfica, intuitiva en todo el proceso, llevando a concluir que los resultados del análisis de clúster se ve sujeto a la subjetividad o a los conocimientos del usuario sin embargo esta subjetividad es posible reducirla a través de estrategias, técnicas, análisis y el buen uso de las herramientas existentes. (Texto tomado de la fuente)spa
dc.description.abstractThis document develops a software process for automatic cluster analysis. Although several libraries already support cluster analysis, the aim is to automate the process and centralize the different options in a single package, making the models easier to analyze and parameterize. The package was built on existing Python libraries, taking as a basis what is available in various statistical and data-analysis tools and software, so that it can be used both by someone with basic knowledge and by someone with in-depth knowledge who wants to parameterize their analyses. The results of this work show that current algorithms can make clustering and the accompanying data analysis easier, guiding the user through the whole process in a simple, graphical, and intuitive way. The conclusion is that the results of a cluster analysis are subject to the user's subjectivity or knowledge, but that this subjectivity can be reduced through strategies, techniques, analysis, and the proper use of existing tools.eng
dc.description.curricularareaÁrea Curricular de Ingeniería de Sistemas e Informáticaspa
dc.description.degreelevelMaestríaspa
dc.description.degreenameMagíster en Ingeniería - Analíticaspa
dc.description.researchareaAnálisis de clústerspa
dc.description.technicalinfoDocumento con detalle de funcionamiento de softwarespa
dc.format.extentxi, 63 páginasspa
dc.format.mimetypeapplication/pdfspa
dc.identifier.instnameUniversidad Nacional de Colombiaspa
dc.identifier.reponameRepositorio Institucional Universidad Nacional de Colombiaspa
dc.identifier.repourlhttps://repositorio.unal.edu.co/spa
dc.identifier.urihttps://repositorio.unal.edu.co/handle/unal/80784
dc.language.isospaspa
dc.publisherUniversidad Nacional de Colombiaspa
dc.publisher.branchUniversidad Nacional de Colombia - Sede Medellínspa
dc.publisher.departmentDepartamento de la Computación y la Decisiónspa
dc.publisher.facultyFacultad de Minasspa
dc.publisher.placeMedellín, Colombiaspa
dc.publisher.programMedellín - Minas - Maestría en Ingeniería - Analíticaspa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.licenseAtribución-CompartirIgual 4.0 Internacionalspa
dc.rights.urihttp://creativecommons.org/licenses/by-nc/4.0/spa
dc.subject.ddc000 - Ciencias de la computación, información y obras generales::004 - Procesamiento de datos Ciencia de los computadoresspa
dc.subject.lembCluster analysis
dc.subject.lembAnálisis clúster
dc.subject.proposalAnálisisspa
dc.subject.proposalClústerspa
dc.subject.proposalSoftwareeng
dc.subject.proposalPythoneng
dc.subject.proposalLibreríaspa
dc.subject.proposalAprendizaje de máquinas automáticospa
dc.subject.proposalAnalysiseng
dc.subject.proposalClustereng
dc.subject.proposalPythoneng
dc.subject.proposalLibraryeng
dc.subject.proposalAutomatic machine learningeng
dc.titleAnálisis de clúster automáticospa
dc.title.translatedAutomatic cluster analysiseng
dc.typeTrabajo de grado - Maestríaspa
dc.type.coarhttp://purl.org/coar/resource_type/c_bdccspa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.redcolhttp://purl.org/redcol/resource_type/TMspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
dcterms.audience.professionaldevelopmentInvestigadoresspa
dcterms.audience.professionaldevelopmentPúblico generalspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa

Files

Original bundle

Name: 1017230592.2021.pdf
Size: 2 MB
Format: Adobe Portable Document Format
Description: Master's thesis in Engineering - Analytics
