Detección de anomalías en series de tiempo utilizando métodos no supervisados

dc.contributor.advisor: Giraldo Gómez, Norman Diego
dc.contributor.author: Duque Granda, Carlos Andres
dc.date.accessioned: 2025-07-04T20:26:04Z
dc.date.available: 2025-07-04T20:26:04Z
dc.date.issued: 2025-03-04
dc.description.abstract (spa): Este trabajo de investigación se enfoca en el análisis y la comparación de diversos modelos no supervisados para la detección de anomalías en series temporales. Estas series son generadas a partir de patrones estacionales simulados y la introducción de anomalías utilizando cadenas de Markov. Las series temporales combinan comportamientos cíclicos y componentes estacionales, empleando funciones de coseno ajustadas y valores generados a partir de distribuciones de Poisson. Las anomalías son inyectadas mediante una matriz de transición que altera el comportamiento esperado de la serie, simulando eventos raros o atípicos. Este enfoque permite generar datos que imitan situaciones reales en las que las anomalías son eventos poco frecuentes y difíciles de predecir. Los modelos evaluados incluyen Isolation Forest, Autoencoders y K-Nearest Neighbors (KNN), los cuales fueron seleccionados por su eficacia en diferentes contextos de detección de anomalías. Cada uno de estos modelos se sometió a una evaluación exhaustiva utilizando métricas como la precisión, el recall, el F1-score, la exactitud, así como las tasas de falsos positivos y negativos. Los resultados obtenidos muestran que los Autoencoders son particularmente efectivos para detectar anomalías complejas y no lineales, mientras que el Isolation Forest sobresale en la identificación de outliers en conjuntos de datos con alta dimensionalidad. Por otro lado, el K-Nearest Neighbors (KNN) demostró ser útil en la detección de anomalías en entornos con menor dimensionalidad y patrones de proximidad bien definidos, donde las anomalías se caracterizan por estar alejadas de los puntos normales. (Texto tomado de la fuente)
dc.description.abstract (eng): This research focuses on the analysis and comparison of various unsupervised models for anomaly detection in time series data. These series are generated from simulated seasonal patterns, with anomalies introduced using Markov chains. The time series combine cyclical behaviors and seasonal components, using adjusted cosine functions and values generated from Poisson distributions. Anomalies are injected through a transition matrix that alters the expected behavior of the series, simulating rare or atypical events. This approach allows for the generation of data that mimic real-world situations where anomalies are infrequent and difficult to predict. The evaluated models include Isolation Forest, Autoencoders, and K-Nearest Neighbors (KNN), which were selected for their effectiveness in different anomaly detection contexts. Each of these models underwent a comprehensive evaluation using metrics such as precision, recall, F1-score, and accuracy, as well as false positive and false negative rates. The results show that Autoencoders are particularly effective at detecting complex, non-linear anomalies, while Isolation Forest excels at identifying outliers in high-dimensional datasets. K-Nearest Neighbors (KNN), in turn, proved useful for detecting anomalies in lower-dimensional settings with well-defined proximity patterns, where anomalies are characterized by lying far from normal data points.
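The abstract describes a concrete pipeline: simulate a seasonal count series from a cosine-modulated Poisson rate, inject anomalies through a two-state Markov transition matrix, and then score the series with unsupervised detectors evaluated by precision, recall, F1-score, accuracy and false positive/negative rates. The Python sketch below illustrates one way such a pipeline can be assembled with NumPy and scikit-learn. It is not the thesis's code: the period, transition probabilities, window width, contamination level and thresholds are illustrative assumptions, and of the three models studied only Isolation Forest and a KNN distance score are shown (the autoencoder is omitted).

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             accuracy_score, confusion_matrix)

rng = np.random.default_rng(42)

# --- 1. Seasonal Poisson series: baseline rate plus an adjusted cosine component.
T, period = 2000, 24                      # assumed length and seasonal period
t = np.arange(T)
lam = 10 + 5 * np.cos(2 * np.pi * t / period)
normal_counts = rng.poisson(lam)

# --- 2. Anomaly injection with a two-state Markov chain (0 = normal, 1 = anomalous).
P = np.array([[0.98, 0.02],               # assumed transition matrix
              [0.70, 0.30]])
state = np.zeros(T, dtype=int)
for i in range(1, T):
    state[i] = rng.choice(2, p=P[state[i - 1]])
series = np.where(state == 1, rng.poisson(4 * lam), normal_counts)

# --- 3. Sliding-window features so each observation carries local seasonal context.
w = 24
X = np.lib.stride_tricks.sliding_window_view(series, w)
y = state[w - 1:]                         # label a window by its last point

# --- 4a. Isolation Forest: predict() returns -1 for points it isolates quickly.
iso = IsolationForest(n_estimators=200, contamination=0.05, random_state=0).fit(X)
pred_iso = (iso.predict(X) == -1).astype(int)

# --- 4b. KNN score: mean distance to the k nearest neighbours, thresholded at a quantile.
k = 10
dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
knn_score = dist[:, 1:].mean(axis=1)      # drop the zero self-distance
pred_knn = (knn_score > np.quantile(knn_score, 0.95)).astype(int)

# --- 5. Metrics named in the abstract, including false positive/negative rates.
def report(name, y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print(f"{name}: precision={precision_score(y_true, y_pred):.3f}, "
          f"recall={recall_score(y_true, y_pred):.3f}, "
          f"F1={f1_score(y_true, y_pred):.3f}, "
          f"accuracy={accuracy_score(y_true, y_pred):.3f}, "
          f"FPR={fp / (fp + tn):.3f}, FNR={fn / (fn + tp):.3f}")

report("Isolation Forest", y, pred_iso)
report("KNN distance", y, pred_knn)

In a setup like this, the decisive design choice is the anomaly threshold (the contamination parameter for Isolation Forest, the score quantile for the KNN distance): it directly trades false positives against false negatives, which is why the abstract reports both rates alongside precision, recall and F1.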
dc.description.curriculararea: Área Curricular Estadística
dc.description.degreelevel: Maestría
dc.description.degreename: Magíster en Ciencias - Estadística
dc.description.researcharea: Analítica Series de Tiempo
dc.format.extent: 76 páginas
dc.format.mimetype: application/pdf
dc.identifier.instname: Universidad Nacional de Colombia
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia
dc.identifier.repourl: https://repositorio.unal.edu.co/
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/88298
dc.language.iso: spa
dc.publisher: Universidad Nacional de Colombia
dc.publisher.branch: Universidad Nacional de Colombia - Sede Medellín
dc.publisher.faculty: Facultad de Ciencias
dc.publisher.place: Medellín, Colombia
dc.publisher.program: Medellín - Ciencias - Maestría en Ciencias - Estadística
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.rights.license: Reconocimiento 4.0 Internacional
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.armarc: Análisis de series de tiempo
dc.subject.armarc: Procesos de Poisson
dc.subject.ddc: 510 - Matemáticas::519 - Probabilidades y matemáticas aplicadas
dc.subject.lemb: Procesos de Markov
dc.subject.proposal: Anomalías
dc.subject.proposal: Series de tiempo
dc.subject.proposal: Isolation Forest
dc.subject.proposal: Gaussian Mixture Model
dc.subject.proposal: Autoencoders
dc.subject.proposal: Simulación
dc.title: Detección de anomalías en series de tiempo utilizando métodos no supervisados
dc.title.translated: Anomaly detection in time series using unsupervised methods
dc.type: Trabajo de grado - Maestría
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content: Text
dc.type.driver: info:eu-repo/semantics/masterThesis
dc.type.redcol: http://purl.org/redcol/resource_type/TM
dc.type.version: info:eu-repo/semantics/acceptedVersion
dcterms.audience.professionaldevelopment: Público general
oaire.accessrights: http://purl.org/coar/access_right/c_14cb

Files

Original bundle
Name: 1152704045.2025.pdf
Size: 3.4 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 5.74 KB
Format: Item-specific license agreed upon to submission