Analítica predictiva y desarrollo de un modelo cuantitativo para estudio y segmentación del mercado farmacéutico para patologías de alto costo
dc.contributor.advisor | Olaya Morales, Yris | |
dc.contributor.author | Trillos Paredes, Jose Antonio | |
dc.contributor.orcid | Trillos Paredes, Jose Antonio [0009-0007-6723-9867] | spa |
dc.date.accessioned | 2025-06-25T13:58:38Z | |
dc.date.available | 2025-06-25T13:58:38Z | |
dc.date.issued | 2025-06-23 | |
dc.description | Illustrations, graphs | eng |
dc.description.abstract | Colombia currently faces significant challenges in its healthcare system, particularly in the efficient management of resources, coverage, and the administration of healthcare service providers. Within this context, the pharmaceutical sector, which is responsible for marketing medications, faces specific challenges due to the system's limited coverage and the lack of robust information systems. To ensure adequate access to therapies according to the population's needs, information from the high-cost medication market and the Colombian healthcare system must be analyzed to support analytical, evidence-based decision-making. This study applies machine learning models and data mining techniques to address these needs, focusing on the development of a clustering model inspired by the RFM (Recency, Frequency, Monetary) methodology (Hughes, 1996) and adapted to the pharmaceutical sector. The classic RFM dimensions were adjusted by replacing recency with each client's inventory days and frequency with variations in purchases, while the monetary dimension was kept to represent the associated economic value. This adaptation better captures customers' purchasing dynamics in terms of inventory stability and financial impact. The classification is the outcome of a structured process that began with the collection and consolidation of a database of 110,322 commercial transaction records covering one year. A data preprocessing phase followed, including data cleaning, normalization, and variable transformation to ensure quality and consistency. Next, unsupervised clustering techniques were used to segment customers into homogeneous groups with the K-Means, Gaussian Mixture Model (GMM), and Agglomerative (hierarchical) clustering algorithms.
The models were validated by comparing the WCSS (Within-Cluster Sum of Squares) metric, adapted to the characteristics of each algorithm, and the model with the highest internal cohesion was selected. K-Means performed best, with the lowest WCSS (5.93), indicating more compact clusters and, consequently, a more precise segmentation. Finally, the trained model made it possible to classify customers based on their historical purchasing behavior and to predict their likely future evolution, helping identify segments with different financial risk profiles. This classification provides a practical tool for data-driven commercial decision-making, enabling differentiated strategies for each group and better management of the high-cost medication market in Colombia. The methodology is scalable and can be adapted to other pathologies, offering a replicable solution for various contexts within the pharmaceutical sector. (Taken from the source) | eng |
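The abstract describes the pipeline only at a high level. The Python sketch below illustrates one possible reading of it under explicit assumptions: the three adapted RFM features are computed per client with hypothetical definitions (inventory days as stock divided by average daily consumption over 30-day months, purchase variation as the coefficient of variation of monthly purchases, monetary value as units purchased times a single unit price), "normalization" is taken to mean min-max scaling, and the number of clusters is an arbitrary parameter. It uses scikit-learn (Pedregosa et al., 2011, in the references) to fit K-Means, GMM and agglomerative clustering, and computes WCSS from each model's hard labels so the three algorithms are scored with the same formula. The names `adapted_rfm`, `wcss` and `compare_models` are illustrative, not taken from the thesis, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import MinMaxScaler


def adapted_rfm(monthly_units, stock_units, unit_price):
    """Hypothetical adapted-RFM features for one client (definitions are assumptions).

    - inventory days (replaces recency): stock / average daily consumption,
      assuming 30-day months and positive average purchases
    - purchase variation (replaces frequency): coefficient of variation of
      monthly purchased units
    - monetary: total units purchased times a single unit price
    """
    units = np.asarray(monthly_units, dtype=float)
    mean_monthly = units.mean()
    if mean_monthly <= 0:
        raise ValueError("client must have positive average purchases")
    inventory_days = stock_units / (mean_monthly / 30.0)
    variation = units.std() / mean_monthly
    monetary = units.sum() * unit_price
    return float(inventory_days), float(variation), float(monetary)


def wcss(X, labels):
    """Within-cluster sum of squares computed from hard labels.

    Each centroid is the mean of the points assigned to it, so the same
    formula applies to K-Means, GMM (most likely component) and
    agglomerative clustering, which has no centroids of its own.
    """
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


def compare_models(features, n_clusters=4, seed=0):
    """Min-max scale the features, fit three clusterers, return {name: WCSS}."""
    X = MinMaxScaler().fit_transform(features)
    models = {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "gmm": GaussianMixture(n_components=n_clusters, random_state=seed),
        "agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
    }
    return {name: wcss(X, model.fit_predict(X)) for name, model in models.items()}


if __name__ == "__main__":
    # Synthetic feature matrix: 60 made-up clients around three centers.
    rng = np.random.default_rng(0)
    features = np.vstack([rng.normal(c, 1.0, size=(20, 3)) for c in (0, 10, 20)])
    scores = compare_models(features, n_clusters=3)
    for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{name}: WCSS = {score:.2f}")
```

Because K-Means minimizes WCSS by construction, it tends to be favored by this criterion; the silhouette coefficient (Rousseeuw, 1987, also in the references) could serve as a complementary check when comparing the three algorithms.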
dc.description.curriculararea | Systems and Informatics Engineering, Medellín campus | eng |
dc.description.degreelevel | Master's | eng |
dc.description.degreename | Master of Engineering - Analytics | eng |
dc.format.extent | 65 pages | eng |
dc.format.mimetype | application/pdf | spa |
dc.identifier.instname | Universidad Nacional de Colombia | spa |
dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/88246 | |
dc.language.iso | spa | spa |
dc.publisher | Universidad Nacional de Colombia | spa |
dc.publisher.branch | Universidad Nacional de Colombia - Sede Medellín | spa |
dc.publisher.faculty | Facultad de Minas | spa |
dc.publisher.place | Medellín, Colombia | spa |
dc.publisher.program | Medellín - Minas - Maestría en Ingeniería - Analítica | spa |
dc.relation.indexed | LaReferencia | spa |
dc.relation.references | Anitha, P., & Patil, M. M. (2022). RFM model for customer purchase behavior using K-Means algorithm. Journal of King Saud University-Computer and Information Sciences, 34(5), 1785-1792. | spa |
dc.relation.references | Asllani, A., & Halstead, D. (2015). A Multi-Objective Optimization Approach Using the RFM Model in Direct Marketing. Academy of Marketing Studies Journal, 19, 65. | spa |
dc.relation.references | Cheng, C.-H., & Chen, Y.-S. (2009). Classifying the segmentation of customer value via RFM model and RS theory. Expert Systems with Applications, 36, 4176-4184. https://doi.org/10.1016/j.eswa.2008.04.003 | spa |
dc.relation.references | Claycamp, H. J., & Massy, W. F. (1968). A theory of market segmentation. Journal of Marketing Research, 5(4), 388-394. | spa |
dc.relation.references | Colombo, R., & Jiang, W. (1999). A stochastic RFM model. Journal of Interactive Marketing, 13(3), 2-12. | spa |
dc.relation.references | Dumka, A., Ashok, A., Verma, P., & Verma, P. (2020). Advance object detection and clustering techniques used for big data. https://doi.org/10.1201/9780429351310-7 | spa |
dc.relation.references | Ernawati, E., Baharin, S. S. K., & Kasmin, F. (2021, April). A review of data mining methods in RFM-based customer segmentation. In Journal of Physics: Conference Series (Vol. 1869, No. 1, p. 012085). IOP Publishing. | spa |
dc.relation.references | Fernández-Huerga, E. (2023). La teoría de la segmentación del mercado de trabajo: enfoques, situación actual y perspectivas de futuro. Investigación Económica, 69(273), 115-150. | spa |
dc.relation.references | Hajibaba, H., Grün, B., & Dolnicar, S. (2019). Improving the stability of market segmentation analysis. International Journal of Contemporary Hospitality Management (ahead of print). https://doi.org/10.1108/IJCHM-02-2019-0137 | spa |
dc.relation.references | Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York: Springer. | spa |
dc.relation.references | Hughes, A. M. (1996). Boosting response with RFM. Marketing Tools, 3(3), 4-10. | spa |
dc.relation.references | IBM. (2024, December 19). K-Means Clustering. Retrieved from https://www.ibm.com/think/topics/k-means-clustering | spa |
dc.relation.references | IBM Developer. (2023). Retrieved from https://developer.ibm.com/articles/cc-unsupervised-learning-data-classification/ | spa |
dc.relation.references | Dunn, J. C. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3(3), 32-57. https://doi.org/10.1080/01969727308546046 | spa |
dc.relation.references | J Clin Pathol. 2007 Mar;60(3):336. doi: 10.1136/jcp.2006.032300.corr1. Erratum for: J Clin Pathol. 60:8. PMCID: PMC1860553. | spa |
dc.relation.references | Kubat, M. (2017). An introduction to machine learning (p. 273). Springer. | spa |
dc.relation.references | Likas, A., Vlassis, N., & Verbeek, J. J. (2003). The global k-means clustering algorithm. Pattern Recognition, 36(2), 451-461. https://doi.org/10.1016/S0031-3203(02)00060-2 | spa |
dc.relation.references | Liu, Y., Ram, S., Lusch, R. F., & Brusco, M. (2010). Multicriterion market segmentation: a new model, implementation, and evaluation. Marketing Science, 29(5), 880-894. | spa |
dc.relation.references | MacQueen, J. (1967, June). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 14, pp. 281-297). | spa |
dc.relation.references | Burinskiene, M., & Rudzkiene, V. (2007). Application of logit regression models for the identification of market segments. Journal of Business Economics and Management, 8(4), 253-258. https://doi.org/10.1080/16111699.2007.9636177 | spa |
dc.relation.references | McKinney, W. (2010). Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference (pp. 56-61). https://doi.org/10.25080/Majora-92bf1922-00a | spa |
dc.relation.references | Ministerio de Salud y Protección Social. (2023). Minsalud.gov.co. https://www.minsalud.gov.co. | spa |
dc.relation.references | Monaco, C., Nanchahal, J., Taylor, P., & Feldmann, M. (2015). Anti-TNF therapy: Past, present and future. International Immunology, 27(1), 55-62. https://doi.org/10.1093/intimm/dxu102 | spa |
dc.relation.references | Nwokeji, J. C., & Matovu, R. (2021). A systematic literature review on big data extraction, transformation and loading (etl). In Intelligent Computing: Proceedings of the 2021 Computing Conference, Volume 2 (pp. 308-324). Springer International Publishing. | spa |
dc.relation.references | Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65. https://doi.org/10.1016/0377-0427(87)90125-7 | spa |
dc.relation.references | Qian, Y., Jiang, Y., Du, Y., Sun, J., & Liu, Y. (2020). Segmenting market structure from multi-channel clickstream data: A novel generative model. Electronic Commerce Research, 20, 509-533. | spa |
dc.relation.references | Reich, M., Gordon, D. M., & Edwards, R. C. (1973). A Theory of Labor Market Segmentation. The American Economic Review, 63(2), 359–365. http://www.jstor.org/stable/1817097 | spa |
dc.relation.references | Roshan, H., & Afsharinezhad, M. (2017). The new approach in market segmentation by using RFM model. Journal of applied research on industrial engineering, 4(4), 259-267. | spa |
dc.relation.references | Safari, F., Safari, N., & Montazer, G. A. (2016). Customer lifetime value determination based on RFM model. Marketing Intelligence & Planning, 34(4), 446-461. | spa |
dc.relation.references | Pedregosa, F., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. | spa |
dc.relation.references | SISPRO - Sistema Integrado de Información de la Protección Social. (2023). Sispro.gov.co. https://www.sispro.gov.co. | spa |
dc.relation.references | SPSS Statistics Subscription - Classic. (2024, September 30). Retrieved from https://www.ibm.com/docs/es/spss-statistics/saas?topic=features-hierarchical-cluster-analysis | spa |
dc.relation.references | Maulik, U., & Bandyopadhyay, S. (2002). Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1650-1654. https://doi.org/10.1109/TPAMI.2002.1114856 | spa |
dc.relation.references | Ullmann, T., Hennig, C., & Boulesteix, A.-L. (2021). Validation of cluster analysis results on validation data: A systematic framework. arXiv preprint arXiv:2103.01281. https://doi.org/10.48550/arXiv.2103.01281 | spa |
dc.relation.references | Wei, J. T., Lin, S. Y., & Wu, H. H. (2010). A review of the application of RFM model. African Journal of Business Management, 4(19), 4199. | spa |
dc.relation.references | Ziegel, E. R. (2003). The elements of statistical learning. | spa |
dc.relation.references | Zufryden, F. S. (1979). ZIPMAP: A zero-one integer programming model for market segmentation and product positioning. Journal of the Operational Research Society, 30, 63-70. | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.rights.license | Attribution-NonCommercial-NoDerivatives 4.0 International | eng |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | spa |
dc.subject.ddc | 000 - Computer science, information and general works::004 - Data processing, computer science | eng |
dc.subject.ddc | 610 - Medicine and health::615 - Pharmacology and therapeutics | eng |
dc.subject.lemb | Drugs - Prices - Data processing | |
dc.subject.lemb | Data mining - Data processing | |
dc.subject.lemb | Machine learning (Artificial intelligence) | |
dc.subject.lemb | Inventory control - Data processing | |
dc.subject.proposal | High-cost industry | eng |
dc.subject.proposal | Data model | eng |
dc.subject.proposal | Decision making | eng |
dc.subject.proposal | RFM model | eng |
dc.subject.proposal | Emerging opportunities | eng |
dc.subject.proposal | Resource optimization | eng |
dc.subject.proposal | Customer segmentation | eng |
dc.title | Analítica predictiva y desarrollo de un modelo cuantitativo para estudio y segmentación del mercado farmacéutico para patologías de alto costo | spa |
dc.title.translated | Predictive analytics and development of a quantitative model for the study and segmentation of the pharmaceutical market for high-cost pathologies | eng |
dc.type | Master's thesis | eng |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
dc.type.content | Text | spa |
dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
dc.type.redcol | http://purl.org/redcol/resource_type/TM | spa |
dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
dcterms.audience.professionaldevelopment | Students | eng |
dcterms.audience.professionaldevelopment | Researchers | eng |
dcterms.audience.professionaldevelopment | Teachers | eng |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
Files
Original bundle
- Name: 1007015432.2025.pdf
- Size: 1.67 MB
- Format: Adobe Portable Document Format
- Description: Master's thesis in Engineering - Analytics
License bundle
- Name: license.txt
- Size: 5.74 KB
- Format: Item-specific license agreed upon to submission
- Description: