Detección de anomalías en series de tiempo utilizando métodos no supervisados
dc.contributor.advisor | Giraldo Gómez, Norman Diego | |
dc.contributor.author | Duque Granda, Carlos Andres | |
dc.date.accessioned | 2025-07-04T20:26:04Z | |
dc.date.available | 2025-07-04T20:26:04Z | |
dc.date.issued | 2025-03-04 | |
dc.description.abstract | Este trabajo de investigación se enfoca en el análisis y la comparación de diversos modelos no supervisados para la detección de anomalías en series temporales. Estas series son generadas a partir de patrones estacionales simulados y la introducción de anomalías utilizando cadenas de Markov. Las series temporales combinan comportamientos cíclicos y componentes estacionales, empleando funciones de coseno ajustadas y valores generados a partir de distribuciones de Poisson. Las anomalías son inyectadas mediante una matriz de transición que altera el comportamiento esperado de la serie, simulando eventos raros o atípicos. Este enfoque permite generar datos que imitan situaciones reales en las que las anomalías son eventos poco frecuentes y difíciles de predecir. Los modelos evaluados incluyen Isolation Forest, Autoencoders y K-Nearest Neighbors (KNN), los cuales fueron seleccionados por su eficacia en diferentes contextos de detección de anomalías. Cada uno de estos modelos se sometió a una evaluación exhaustiva utilizando métricas como la precisión, el recall, el F1-score, la exactitud, así como las tasas de falsos positivos y negativos. Los resultados obtenidos muestran que los Autoencoders son particularmente efectivos para detectar anomalías complejas y no lineales, mientras que el Isolation Forest sobresale en la identificación de outliers en conjuntos de datos con alta dimensionalidad. Por otro lado, el K-Nearest Neighbors (KNN) demostró ser útil en la detección de anomalías en entornos con menor dimensionalidad y patrones de proximidad bien definidos, donde las anomalías se caracterizan por estar alejadas de los puntos normales. (Texto tomado de la fuente) | spa |
dc.description.abstract | This research focuses on the analysis and comparison of various unsupervised models for anomaly detection in time series data. These series are generated from simulated seasonal patterns and the introduction of anomalies using Markov chains. The time series combine cyclical behaviors and seasonal components, using adjusted cosine functions and values generated from Poisson distributions. Anomalies are injected through a transition matrix that alters the expected behavior of the series, simulating rare or atypical events. This approach allows for the generation of data that mimic real-world situations where anomalies are infrequent and difficult to predict. The evaluated models include Isolation Forest, Autoencoders, and K-Nearest Neighbors (KNN), which were selected for their effectiveness in different anomaly detection contexts. Each of these models underwent a comprehensive evaluation using metrics such as precision, recall, F1-score, and accuracy, as well as false positive and false negative rates. The results show that Autoencoders are particularly effective in detecting complex and non-linear anomalies, while Isolation Forest excels at identifying outliers in high-dimensional datasets. On the other hand, K-Nearest Neighbors (KNN) proved to be useful for detecting anomalies in lower-dimensional environments with well-defined proximity patterns, where anomalies are characterized by being far from normal data points. | eng |
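To make the generation scheme in the abstract concrete, the following is a minimal, stdlib-only Python sketch of the same idea: a seasonal count series whose Poisson rate is modulated by a cosine term, with anomalies injected by a two-state Markov chain transition matrix, scored here by a simplified k-nearest-neighbor distance detector. All parameters (period, rates, transition probabilities, the 95th-percentile threshold) and the 1-D kNN scorer are illustrative assumptions, not the thesis's actual code or configuration.

```python
import math
import random

random.seed(42)

def poisson(lam):
    # Knuth's inversion method; adequate for the moderate rates used here
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def simulate(n=365, period=7, base_rate=20.0, amplitude=8.0,
             p_enter=0.02, p_stay=0.5, anomaly_shift=40.0):
    """Seasonal Poisson counts with a cosine-modulated rate; a two-state
    Markov chain (state 0 = normal, 1 = anomalous) injects rare regime
    shifts that inflate the rate. Returns (series, labels)."""
    P = [[1 - p_enter, p_enter],   # transition matrix rows sum to 1
         [1 - p_stay, p_stay]]
    state, series, labels = 0, [], []
    for t in range(n):
        lam = base_rate + amplitude * math.cos(2 * math.pi * t / period)
        if state == 1:
            lam += anomaly_shift   # anomalous regime inflates the rate
        series.append(poisson(lam))
        labels.append(state)
        state = 1 if random.random() < P[state][1] else 0
    return series, labels

def knn_scores(xs, k=5):
    # Anomaly score: distance to the k-th nearest neighbor (1-D values)
    return [sorted(abs(x - y) for j, y in enumerate(xs) if j != i)[k - 1]
            for i, x in enumerate(xs)]

def evaluate(pred, truth):
    # Precision, recall, and F1-score against the injected labels
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

series, labels = simulate()
scores = knn_scores(series)
threshold = sorted(scores)[int(0.95 * len(scores))]  # flag top ~5%
pred = [s > threshold for s in scores]
print(evaluate(pred, labels))
```

In the thesis itself the kNN scorer would be replaced by each compared model (Isolation Forest, Autoencoder, KNN) and scored with the same label-based metrics.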
dc.description.curriculararea | Área Curricular Estadística | spa |
dc.description.degreelevel | Maestría | spa |
dc.description.degreename | Magíster en Ciencias - Estadística | spa |
dc.description.researcharea | Analítica de Series de Tiempo | spa |
dc.format.extent | 76 páginas | spa |
dc.format.mimetype | application/pdf | spa |
dc.identifier.instname | Universidad Nacional de Colombia | spa |
dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/88298 | |
dc.language.iso | spa | spa |
dc.publisher | Universidad Nacional de Colombia | spa |
dc.publisher.branch | Universidad Nacional de Colombia - Sede Medellín | spa |
dc.publisher.faculty | Facultad de Ciencias | spa |
dc.publisher.place | Medellín, Colombia | spa |
dc.publisher.program | Medellín - Ciencias - Maestría en Ciencias - Estadística | spa |
dc.relation.references | [Aggarwal, 2016] Aggarwal, C. (2016). Outlier Analysis. Springer International Publishing. | spa |
dc.relation.references | [Ahmad et al., 2017] Ahmad, S., Lavin, A., Purdy, S., and Agha, Z. (2017). Unsupervised real-time anomaly detection for streaming data. Neurocomputing, 262:134–147. Online Real-Time Learning Strategies for Data Streams. | spa |
dc.relation.references | [Ahmed et al., 2016] Ahmed, M., Naser Mahmood, A., and Hu, J. (2016). A survey of network anomaly detection techniques. Journal of Network and Computer Applications, 60:19–31. | spa |
dc.relation.references | [Bahrpeyma et al., 2021] Bahrpeyma, F., Roantree, M., Cappellari, P., Scriney, M., and McCarren, A. (2021). A methodology for validating diversity in synthetic time series generation. MethodsX, 8. | spa |
dc.relation.references | [Bandaragoda et al., 2018] Bandaragoda, T., Ting, K., Albrecht, D., Liu, F. T., Zhu, Y., and Wells, J. (2018). Isolation-based anomaly detection using nearest-neighbor ensembles: iNNE. Computational Intelligence, 34. | spa |
dc.relation.references | [Bergstra and Bengio, 2012] Bergstra, J. and Bengio, Y. (2012). Random search for hyperparameter optimization. J. Mach. Learn. Res., 13:281–305. | spa |
dc.relation.references | [Blázquez-García et al., 2021] Blázquez-García, A., Conde, A., Mori, U., and Lozano, J. A. (2021). A review on outlier/anomaly detection in time series data. ACM Comput. Surv., 54(3). | spa |
dc.relation.references | [Box et al., 2015] Box, G., Jenkins, G., Reinsel, G., and Ljung, G. (2015). Time Series Analysis: Forecasting and Control. Wiley Series in Probability and Statistics. Wiley. | spa |
dc.relation.references | [Breunig et al., 2000] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. (2000). LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, volume 29, pages 93–104. | spa |
dc.relation.references | [Brockwell and Davis, 2013] Brockwell, P. and Davis, R. (2013). Introduction to Time Series and Forecasting. Springer Texts in Statistics. Springer New York. | spa |
dc.relation.references | [Canonical Ltd., 2024] Canonical Ltd. (2024). Ubuntu: The leading operating system for PCs, IoT devices, servers and the cloud. Accessed: 2024-10-01. | spa |
dc.relation.references | [Carletti et al., 2019] Carletti, M., Masiero, C., Beghi, A., and Susto, G. A. (2019). Explainable machine learning in industry 4.0: Evaluating feature importance in anomaly detection to enable root cause analysis. pages 21–26. | spa |
dc.relation.references | [Chalapathy and Chawla, 2019] Chalapathy, R. and Chawla, S. (2019). Deep learning for anomaly detection: A survey. | spa |
dc.relation.references | [Chandola et al., 2009] Chandola, V., Banerjee, A., and Kumar, V. (2009). Anomaly detection: A survey. ACM Comput. Surv., 41(3):9–20. | spa |
dc.relation.references | [Chatfield and Xing, 2019] Chatfield, C. and Xing, H. (2019). The Analysis of Time Series: An Introduction with R. Chapman & Hall/CRC Texts in Statistical Science. CRC Press. | spa |
dc.relation.references | [Chen et al., 2019] Chen, C., Liu, Y., Kumar, M., and Qin, J. (2019). Energy consumption modelling using deep learning technique—a case study of EAF. Procedia CIRP, 72:1063–1068. | spa |
dc.relation.references | [Cheng et al., 2009] Cheng, H., Tan, P.-N., Potter, C., and Klooster, S. (2009). Detection and characterization of anomalies in multivariate time series. In Proceedings of the 2009 SIAM international conference on data mining, pages 413–424. Society for Industrial and Applied Mathematics. | spa |
dc.relation.references | [Chicco and Jurman, 2020] Chicco, D. and Jurman, G. (2020). The advantages of the matthews correlation coefficient (mcc) over f1 score and accuracy in binary classification evaluation. BMC Genomics, 21(6):1–13. | spa |
dc.relation.references | [Choi and Kim, 2024] Choi, W.-H. and Kim, J. (2024). Unsupervised learning approach for anomaly detection in industrial control systems. Applied System Innovation, 7(2). | spa |
dc.relation.references | [Cryer and Chan, 2008] Cryer, J. and Chan, K. (2008). Time Series Analysis: With Applications in R. Springer Texts in Statistics. Springer New York. | spa |
dc.relation.references | [Cui et al., 2023] Cui, Y., Liu, Z., and Lian, S. (2023). A survey on unsupervised anomaly detection algorithms for industrial images. IEEE Access, 11:55297–55315. | spa |
dc.relation.references | [Dash et al., 2023] Dash, C. S. K., Behera, A. K., Dehuri, S., and Ghosh, A. (2023). An outliers detection and elimination framework in classification task of data mining. Decision Analytics Journal, 6:100164. | spa |
dc.relation.references | [Davis and Goadrich, 2006] Davis, J. and Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, ICML '06, pages 233–240. | spa |
dc.relation.references | [De Maesschalck et al., 2000] De Maesschalck, R., Jouan-Rimbaud, D., and Massart, D. (2000). The mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50(1):1–18. | spa |
dc.relation.references | [Ding and Fei, 2013] Ding, Z. and Fei, M. (2013). An anomaly detection approach based on isolation forest algorithm for streaming data using sliding window. IFAC Proceedings Volumes, 46(20):12–17. 3rd IFAC Conference on Intelligent Control and Automation Science ICONS 2013. | spa |
dc.relation.references | [Dowle and Srinivasan, 2023] Dowle, M. and Srinivasan, A. (2023). data.table: Extension of `data.frame`. R package version 1.14.8. | spa |
dc.relation.references | [Durbin and Koopman, 2012] Durbin, J. and Koopman, S. (2012). Time Series Analysis by State Space Methods: Second Edition. Oxford Statistical Science Series. OUP Oxford. | spa |
dc.relation.references | [Efron and Tibshirani, 1994] Efron, B. and Tibshirani, R. (1994). An Introduction to the Bootstrap. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis. | spa |
dc.relation.references | [Emmott et al., 2013] Emmott, A. F., Das, S., Dietterich, T., Fern, A., and Wong, W.-K. (2013). Systematic construction of anomaly detection benchmarks from real data. In Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description, ODD '13, pages 16–21, New York, NY, USA. Association for Computing Machinery. | spa |
dc.relation.references | [Enders, 2008] Enders, W. (2008). Applied Econometric Time Series, 2nd Ed. Wiley India Pvt. Limited. | spa |
dc.relation.references | [Fawcett, 2006] Fawcett, T. (2006). An introduction to roc analysis. Pattern Recognition Letters, 27(8):861–874. | spa |
dc.relation.references | [García et al., 2009] García, V., Mollineda, R. A., and Sánchez, J. S. (2009). Index of balanced accuracy: A performance measure for skewed class distributions. In Araújo, H., Mendonça, A. M., Pinho, A. J., and Torres, M. I., editors, Pattern Recognition and Image Analysis, pages 441–448, Berlin, Heidelberg. Springer Berlin Heidelberg. | spa |
dc.relation.references | [Goodfellow et al., 2016] Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org. | spa |
dc.relation.references | [Guha et al., 2016] Guha, S., Mishra, N., Roy, G., and Schrijvers, O. (2016). Robust random cut forest based anomaly detection on streams. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML'16, pages 2712–2721. JMLR.org. | spa |
dc.relation.references | [Gupta et al., 2021] Gupta, M., Gao, J., Aggarwal, C., and Han, J. (2021). A comprehensive survey on machine learning for anomaly detection. ACM Computing Surveys (CSUR). | spa |
dc.relation.references | [Gupta et al., 2014] Gupta, M., Gao, J., Aggarwal, C. C., and Han, J. (2014). Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering, 26(9):2250–2267. | spa |
dc.relation.references | [Wickham et al., 2015] Wickham, H., François, R., Henry, L., and Müller, K. (2015). dplyr: A Grammar of Data Manipulation. R package version 0.4.3. | spa |
dc.relation.references | [Hamilton, 2020] Hamilton, J. (2020). Time Series Analysis. Princeton University Press. | spa |
dc.relation.references | [Hand et al., 2001] Hand, D. J., Mannila, H., and Smyth, P. (2001). Principles of data mining. MIT Press, Cambridge, MA. | spa |
dc.relation.references | [Hariri et al., 2021] Hariri, S., Kind, M. C., and Brunner, R. J. (2021). Extended isolation forest. IEEE Transactions on Knowledge and Data Engineering, 33(4):1479–1489. | spa |
dc.relation.references | [Hodge and Austin, 2004] Hodge, V. J. and Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2):85–126. | spa |
dc.relation.references | [Hossen et al., 2024] Hossen, M. J., Hoque, J. M. Z., Aziz, N. A. B. A., Ramanathan, T. T., and Raja, J. E. (2024). Unsupervised novelty detection for time series using a deep learning approach. Heliyon, 10(3):3–5. | spa |
dc.relation.references | [Hyndman and Athanasopoulos, 2018] Hyndman, R. and Athanasopoulos, G. (2018). Forecasting: principles and practice. OTexts. | spa |
dc.relation.references | [Jia, 2018] Jia, Y. (2018). Some Models for Count Time Series. PhD thesis, Clemson University, Clemson, SC, USA. | spa |
dc.relation.references | [Laptev et al., 2015] Laptev, N., Amizadeh, S., and Flint, I. (2015). Generic and Scalable Framework for Automated Time-series Anomaly Detection. | spa |
dc.relation.references | [Li et al., 2012] Li, S.-H., Yen, D. C., Lu, W.-H., and Wang, C. (2012). Identifying the signs of fraudulent accounts using data mining techniques. Computers in Human Behavior, 28(3):1002–1013. | spa |
dc.relation.references | [Liu et al., 2008a] Liu, F. T., Ting, K. M., and Zhou, Z.-H. (2008a). Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pages 413–422. | spa |
dc.relation.references | [Liu et al., 2008b] Liu, F. T., Ting, K. M., and Zhou, Z.-H. (2008b). Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pages 413–422. IEEE. | spa |
dc.relation.references | [Liu et al., 2012] Liu, F. T., Ting, K. M., and Zhou, Z.-H. (2012). Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data, 6(1). | spa |
dc.relation.references | [Liu et al., 2010] Liu, S., Yamada, M., Collier, N., and Sugiyama, M. (2010). Change-point detection in time-series data by relative density-ratio estimation. Neural Networks, 43:72–83. | spa |
dc.relation.references | [Lu and Lysecky, 2017] Lu, S. and Lysecky, R. (2017). Time and sequence integrated runtime anomaly detection for embedded systems. ACM Transactions on Embedded Computing Systems, 17:1–27. | spa |
dc.relation.references | [Lütkepohl, 2007] Lütkepohl, H. (2007). New Introduction to Multiple Time Series Analysis. Springer Berlin Heidelberg. | spa |
dc.relation.references | [Malhotra et al., 2016] Malhotra, P., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., and Shroff, G. (2016). Lstm-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148. | spa |
dc.relation.references | [Manning et al., 2008] Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press, USA. | spa |
dc.relation.references | [Martí et al., 2015] Martí, L., Sánchez-Pi, N., Molina, J. M., and García, A. C. B. (2015). Anomaly detection based on sensor data in petroleum industry applications. Sensors, 15(2):2774–2797. | spa |
dc.relation.references | [McHugh, 2012a] McHugh, M. L. (2012a). Interrater reliability: the kappa statistic. Biochem Med (Zagreb), 22(3):276–282. | spa |
dc.relation.references | [McHugh, 2012b] McHugh, M. L. (2012b). Interrater reliability: the kappa statistic. Biochemia medica, 22(3):276–282. Available at: https://pubmed.ncbi.nlm.nih.gov/23092060/. | spa |
dc.relation.references | [Microsoft Corporation, 2020] Microsoft Corporation (2020). Windows subsystem for linux documentation. https://docs.microsoft.com/en-us/windows/wsl/about. Accessed: 2024- 10-01. | spa |
dc.relation.references | [Microsoft Corporation, 2024] Microsoft Corporation (2024). Github copilot. Accessed: 2024-04-03. Prompt: ‘Summarize the Geneva Convention in 50 words.’ Generated using https://copilot.microsoft.com/. | spa |
dc.relation.references | [Olivas et al., 2023] Olivas, E., Isla, M., Cruz, R., and Caballer, B. (2023). Sistemas de Aprendizaje Automático. Big data, Data Science e Inteligencia Artificial. Ra-Ma S.A. Editorial y Publicaciones. | spa |
dc.relation.references | [Pimentel et al., 2014] Pimentel, M. A., Clifton, D. A., Clifton, L., and Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99:215–249. | spa |
dc.relation.references | [Prado et al., 2021] Prado, R., Ferreira, M., and West, M. (2021). Time Series: Modeling, Computation, and Inference, Second Edition. Chapman & Hall/CRC Texts in Statistical Science. CRC Press. | spa |
dc.relation.references | [Priestley, 1981] Priestley, M. (1981). Spectral Analysis and Time Series: Multivariate series, prediction and control. Probability and mathematical statistics. Academic Press. | spa |
dc.relation.references | [R Core Team, 2024] R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. | spa |
dc.relation.references | [Ramaswamy et al., 2000] Ramaswamy, S., Rastogi, R., and Shim, K. (2000). Efficient Algorithms for Mining Outliers from Large Data Sets., volume 29. | spa |
dc.relation.references | [Adhikari and Agrawal, 2013] Adhikari, R. and Agrawal, R. K. (2013). An Introductory Study on Time Series Modeling and Forecasting. LAP Lambert Academic Publishing. | spa |
dc.relation.references | [Renz et al., 2023] Renz, P., Cutajar, K., Twomey, N., Cheung, G. K. C., and Xie, H. (2023). Low-count time series anomaly detection. In 2023 IEEE 33rd International Workshop on Machine Learning for Signal Processing (MLSP), pages 1–6. IEEE. | spa |
dc.relation.references | [Sakurada and Yairi, 2014] Sakurada, M. and Yairi, T. (2014). Anomaly detection using autoencoders with nonlinear dimensionality reduction. pages 4–11. | spa |
dc.relation.references | [Sarfraz et al., 2024] Sarfraz, M. S., Chen, M.-Y., Layer, L., Peng, K., and Koulakis, M. (2024). Position: Quo vadis, unsupervised time series anomaly detection? | spa |
dc.relation.references | [Shumway and Stoffer, 2017] Shumway, R. and Stoffer, D. (2017). Time Series Analysis and Its Applications: With R Examples. Springer Texts in Statistics. Springer International Publishing. | spa |
dc.relation.references | [Siffer et al., 2017] Siffer, A., Fouque, P.-A., Termier, A., and Largouët, C. (2017). Anomaly Detection in Streams with Extreme Value Theory. | spa |
dc.relation.references | [Takens, 1981] Takens, F. (1981). Detecting strange attractors in turbulence. In Rand, D. and Young, L.-S., editors, Dynamical Systems and Turbulence, Warwick 1980, pages 366–381, Berlin, Heidelberg. Springer Berlin Heidelberg. | spa |
dc.relation.references | [Tan et al., 2011] Tan, S. C., Ting, K. M., and Liu, T. F. (2011). Fast anomaly detection for streaming data. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two, IJCAI'11, pages 1511–1516. AAAI Press. | spa |
dc.relation.references | [RStudio Team, 2021] RStudio Team (2021). RStudio: Integrated Development Environment for R. RStudio, PBC, Boston, MA. | spa |
dc.relation.references | [Tsay, 2013] Tsay, R. (2013). Multivariate Time Series Analysis: With R and Financial Applications. Wiley Series in Probability and Statistics. Wiley. | spa |
dc.relation.references | [Van Rossum and Drake Jr, 1995] Van Rossum, G. and Drake Jr, F. L. (1995). Python reference manual. Centrum voor Wiskunde en Informatica Amsterdam. | spa |
dc.relation.references | [Wickham, 2016] Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. | spa |
dc.relation.references | [Xu et al., 2017] Xu, D., Yan, Y., Ricci, E., and Sebe, N. (2017). Detecting anomalous events in videos by learning deep representations of appearance and motion. Computer Vision and Image Understanding, 156:117–127. Image and Video Understanding in Big Data. | spa |
dc.relation.references | [Zhang et al., 2021] Zhang, Y., Chen, Y., Wang, J., and Pan, Z. (2021). Unsupervised deep anomaly detection for multi-sensor time-series signals. | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.rights.license | Reconocimiento 4.0 Internacional | spa |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | spa |
dc.subject.armarc | Análisis de series de tiempo | |
dc.subject.armarc | Procesos de Poisson | |
dc.subject.ddc | 510 - Matemáticas::519 - Probabilidades y matemáticas aplicadas | spa |
dc.subject.lemb | Procesos de Markov | |
dc.subject.proposal | Anomalías | spa |
dc.subject.proposal | Series de tiempo | spa |
dc.subject.proposal | Isolation Forest | eng |
dc.subject.proposal | Gaussian Mixture Model | eng |
dc.subject.proposal | Autoencoders | eng |
dc.subject.proposal | Simulación | spa |
dc.title | Detección de anomalías en series de tiempo utilizando métodos no supervisados | spa |
dc.title.translated | Anomaly detection in time series using unsupervised methods | eng |
dc.type | Trabajo de grado - Maestría | spa |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
dc.type.content | Text | spa |
dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
dc.type.redcol | http://purl.org/redcol/resource_type/TM | spa |
dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
dcterms.audience.professionaldevelopment | Público general | spa |
oaire.accessrights | http://purl.org/coar/access_right/c_14cb | spa |