Interpretabilidad categórica de clasificadores automáticos sobre contenido relacionado a la percepción de la seguridad

dc.contributor.advisorGómez Jaramillo, Francisco Albeiro
dc.contributor.authorBermúdez García, Andrés Julián
dc.contributor.researchgroupComputational Modeling of Biological Systems Research Group - COMBIOSspa
dc.contributor.subjectmatterexpertChaparro, Luisa Fernanda
dc.date.accessioned2023-06-20T16:36:28Z
dc.date.available2023-06-20T16:36:28Z
dc.date.issued2023-06-06
dc.descriptionilustracionesspa
dc.description.abstractLa percepción de la seguridad está relacionada con los sentimientos de los ciudadanos ante el riesgo asociado a los sucesos de seguridad y la magnitud de sus consecuencias. Debido a esta naturaleza subjetiva, es un tema complejo de cuantificar. Por ello, las redes sociales surgieron como una alternativa para cuantificar estas opiniones. Recientemente, se han utilizado métodos de aprendizaje automático supervisado multiclase para cuantificar distintos niveles de percepción de la seguridad. Sin embargo, estos métodos carecen de interpretabilidad sobre por qué un grupo de tweets clasifica en el mismo nivel de percepción de seguridad. En este trabajo, se propone una estrategia novedosa de interpretabilidad categórica y selección agnóstica al modelo para un grupo de predicciones relacionadas con el mismo nivel de percepción de la seguridad. Los resultados sugieren que el modelo propuesto presenta altos niveles de interpretabilidad para las diferentes categorías de percepción de seguridad. Adicionalmente, las métricas de interpretabilidad introducidas mejoran el proceso de selección de los modelos. (Texto tomado de la fuente)spa
dc.description.abstractThe perception of security (PoS) relates to citizens’ feelings about the risk associated with security events and the magnitude of their consequences. Because of this subjective nature, it is difficult to quantify. Social networks have therefore emerged as an alternative source for quantifying these opinions. Recently, multiclass supervised machine learning methods have been used to quantify different levels of security perception. However, these methods lack interpretability: they do not explain why a group of tweets is classified at the same level of perception of security. This work proposes a novel strategy of categorical interpretability and model-agnostic selection for a group of predictions related to the same level of perception of security. The results suggest that the proposed model presents high levels of interpretability for the different PoS categories. Additionally, the introduced interpretability metrics improve the model selection process.eng
dc.description.degreelevelMaestríaspa
dc.description.degreenameMagíster en Ciencias - Matemática Aplicadaspa
dc.description.researchareaInterpretabilidad en aprendizaje automático.spa
dc.format.extentxiv, 70 páginasspa
dc.format.mimetypeapplication/pdfspa
dc.identifier.instnameUniversidad Nacional de Colombiaspa
dc.identifier.reponameRepositorio Institucional Universidad Nacional de Colombiaspa
dc.identifier.repourlhttps://repositorio.unal.edu.co/spa
dc.identifier.urihttps://repositorio.unal.edu.co/handle/unal/84030
dc.language.isospaspa
dc.publisherUniversidad Nacional de Colombiaspa
dc.publisher.branchUniversidad Nacional de Colombia - Sede Bogotáspa
dc.publisher.facultyFacultad de Cienciasspa
dc.publisher.placeBogotá,Colombiaspa
dc.publisher.programBogotá - Ciencias - Maestría en Ciencias - Matemática Aplicadaspa
dc.relation.referencesZschech, P., Weinzierl, S., Hambauer, N., Zilker, S., and Kraus, M. (2022). Gam (e) changer or not? an evaluation of interpretable machine learning models based on additive model constraints. arXiv preprint arXiv:2204.09123.spa
dc.relation.referencesZhu, Q. (2020). On the performance of matthews correlation coefficient (mcc) for imbalanced dataset. Pattern Recognition Letters, 136:71–80.spa
dc.relation.referencesZhang, Y., Chen, M., and Liu, L. (2015). A review on text mining. In 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS), pages 681–685. IEEE.spa
dc.relation.referencesYadav, S. and Sarkar, M. (2018). Enhancing sentiment analysis using domain-specific lexicon: A case study on gst. In 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pages 1109–1114. IEEE.spa
dc.relation.referencesYadav, R. and Sheoran, S. K. (2018). Crime prediction using auto regression techniques for time series data. In 2018 3rd International Conference and Workshops on Recent Advances and Innovations in Engineering (ICRAIE), pages 1–5. IEEE.spa
dc.relation.referencesWang, Z. J., Kale, A., Nori, H., Stella, P., Nunnally, M., Chau, D. H., Vorvoreanu, M., Vaughan, J. W., and Caruana, R. (2021). Gam changer: Editing generalized additive models with interactive visualization. arXiv preprint arXiv:2112.03245.spa
dc.relation.referencesWang, X., Gerber, M. S., and Brown, D. E. (2012). Automatic crime prediction using events extracted from twitter posts. In International conference on social computing, behavioral-cultural modeling, and prediction, pages 231–238. Springer.spa
dc.relation.referencesVillegas, A.J.R., Pabón, J.S.M., Rubio, M.D., Quintero, S., Vargas, J.G., and García, H. (2022). Spatio-temporal sparsity in homicide prediction models. IEEE Access, 10:14359–14367.spa
dc.relation.referencesVictorino, J., Rudas, J., Reyes, A. M., Pulido, C., Chaparro, L. F., Narváez, L. A., Martinez, D., and Gómez, F. (2020). Spatial-temporal patterns of aggressive behaviors: A case study Bogotá, Colombia. In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 685–691. IEEE.spa
dc.relation.referencesSoumya George, K. and Joseph, S. (2014). Text classification by augmenting bag of words (bow) representation with co-occurrence feature. IOSR Journal of Computer Engineering, 16(1):34–38.spa
dc.relation.referencesSchultz-Jones, B. (2009). Examining information behavior through social networks: An interdisciplinary review. Journal of Documentation.spa
dc.relation.referencesSánchez, M. (2008). La percepción de seguridad y la realidad social. Cuadernos de seguridad, 219.spa
dc.relation.referencesRutjens, B. T. and Brandt, M. J. (2019). Belief systems and the perception of reality. Routledge London, UK.spa
dc.relation.referencesRundmo, T. and Moen, B.-E. (2006). Risk perception and demand for risk mitigation in transport: A comparison of lay people, politicians and experts. Journal of Risk research, 9(6):623–640.spa
dc.relation.referencesRibeiro, M. T., Singh, S., and Guestrin, C. (2016b). “why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144.spa
dc.relation.referencesRibeiro, M. T., Singh, S., and Guestrin, C. (2016a). “why should I trust you?”: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pages 1135–1144.spa
dc.relation.referencesReyes, A. M., Rudas, J., Pulido, C., Victorino, J., Martínez, D., Narváez, L. Á., and Gómez, F. (2020). Characterization of temporal patterns in the occurrence of aggressive behaviors in Bogotá (Colombia). In 2020 7th International conference on behavioural and social computing (BESC), pages 1–4. IEEE.spa
dc.relation.referencesRefaeilzadeh, P., Tang, L., and Liu, H. (2009). Cross-validation. Encyclopedia of database systems, 5:532–538.spa
dc.relation.referencesQi, P., Zhang, Y., Zhang, Y., Bolton, J., and Manning, C. D. (2020). Stanza: A Python natural language processing toolkit for many human languages. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations.spa
dc.relation.referencesPulido, C., Prieto, J., and Gómez, F. (2019). How the social interactions in communities affect the fear of crime. Systems Research and Behavioral Science, 36(6):789–798.spa
dc.relation.referencesPulido, C., Chaparro, L. F., Rudas, J., Reyes, A. M., Victorino, J., Narváez, L. Á., Martínez, D., and Gómez, F. (2021). Data filtering and classification for the identification of texts related to security in Bogotá Colombia. 7th International Conference on Computational Social Science IC2S2 2021.spa
dc.relation.referencesPrieto Curiel, R., Cresci, S., Muntean, C. I., and Bishop, S. R. (2020). Crime and its fear in social media. Palgrave Communications, 6(1):1–12.spa
dc.relation.referencesPrieto Curiel, R. and Bishop, S. R. (2016). A metric of the difference between perception of security and victimisation rates. Crime Science, 5(1):1–15.spa
dc.relation.referencesPrathap, B. R. and Ramesha, K. (2018). Twitter sentiment for analysing different types of crimes. In 2018 International Conference on Communication, Computing and Internet of Things (IC3IoT), pages 483–488. IEEE.spa
dc.relation.referencesPorter, M. F. (2001). Snowball: A language for stemming algorithms.spa
dc.relation.referencesPedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.spa
dc.relation.referencesPabón, J. S. M., Rubio, M. D., Castaño, Y., Riascos, A. J., and Díaz, P. R. (2020). A manifold learning data enrichment methodology for homicide prediction. In 2020 7th International Conference on Behavioural and Social Computing (BESC), pages 1–4. IEEE.spa
dc.relation.referencesOrdoñez-Eraso, H.-A., Pardo-Calvache, C.-J., and Cobos-Lozada, C.-A. (2020). Detection of homicide trends in Colombia using machine learning. Learning, 29(54):e11740.spa
dc.relation.referencesNocedal, J. and Wright, S. J. (2006). Numerical optimization 2nd edition.spa
dc.relation.referencesNaser, M. and Alavi, A. (2020). Insights into performance fitness and error metrics for machine learning. arXiv preprint arXiv:2006.00887.spa
dc.relation.referencesMolnar, C. (2022). Interpretable Machine Learning. 2 edition.spa
dc.relation.referencesMiller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial intelligence, 267:1–38.spa
dc.relation.referencesMessalas, A., Kanellopoulos, Y., and Makris, C. (2019). Model-agnostic interpretability with shapley values. In 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA), pages 1–7. IEEE.spa
dc.relation.referencesMatthews, B. W. (1975). Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA)-Protein Structure, 405(2):442–451.spa
dc.relation.referencesMalleson, N. and Andresen, M. A. (2015). The impact of using social media data in crime rate calculations: shifting hot spots and changing spatial patterns. Cartography and Geographic Information Science, 42(2):112–121.spa
dc.relation.referencesMadasamy, K. and Ramaswami, M. (2017). Data imbalance and classifiers: Impact and solutions from a big data perspective. International Journal of Computational Intelligence Research, 13(9):2267–2281.spa
dc.relation.referencesLuque, C., Luna, J. M., Luque, M., and Ventura, S. (2019). An advanced review on text mining in medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(3):e1302.spa
dc.relation.referencesLe, Q. and Mikolov, T. (2014). Distributed representations of sentences and documents. In International conference on machine learning, pages 1188–1196. PMLR.spa
dc.relation.referencesLatané, B. (1981). The psychology of social impact. American psychologist, 36(4).spa
dc.relation.referencesLadani, D. J. and Desai, N. P. (2020). Stopword identification and removal techniques on tc and ir applications: A survey. In 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), pages 466–472. IEEE.spa
dc.relation.referencesKounadi, O., Lampoltshammer, T. J., Groff, E., Sitko, I., and Leitner, M. (2015). Exploring twitter to analyze the public’s reaction patterns to recently reported homicides in london. PloS one, 10(3):e0121848.spa
dc.relation.referencesKleck, G. and Barnes, J. C. (2014). Do more police lead to more crime deterrence? Crime & Delinquency, 60(5):716–738.spa
dc.relation.referencesKim, B., Khanna, R., and Koyejo, O. O. (2016). Examples are not enough, learn to criticize! criticism for interpretability. Advances in neural information processing systems, 29.spa
dc.relation.referencesKetkar, N. (2017). Stochastic gradient descent. In Deep learning with Python, pages 113–132. Springer.spa
dc.relation.referencesKaur, J. and Buttar, P. K. (2018). A systematic review on stopword removal algorithms. International Journal on Future Revolution in Computer Science & Communication Engineering, 4(4):207–210.spa
dc.relation.referencesJurman, G., Riccadonna, S., and Furlanello, C. (2012). A comparison of MCC and CEN error measures in multi-class prediction.spa
dc.relation.referencesJava, A., Song, X., Finin, T., and Tseng, B. (2007). Why we twitter: An analysis of a microblogging community. In International Workshop on Social Network Mining and Analysis, pages 118–138. Springer.spa
dc.relation.referencesHummelsheim, D., Hirtenlehner, H., Jackson, J., and Oberwittler, D. (2011). Social insecurities and fear of crime: A cross-national study on the impact of welfare state policies on crime-related anxieties. European sociological review, 27(3):327–345.spa
dc.relation.referencesHooker, G. (2007). Generalized functional anova diagnostics for high-dimensional functions of dependent variables. Journal of Computational and Graphical Statistics, 16(3):709–732.spa
dc.relation.referencesHooker, G. (2004). Discovering additive structure in black box functions. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 575–580.spa
dc.relation.referencesHollis, M. E., Downey, S., Del Carmen, A., and Dobbs, R. R. (2017). The relationship between media portrayals and crime: perceptions of fear of crime among citizens. Crime prevention and community safety, 19(1):46–60.spa
dc.relation.referencesHirschberg, J. and Manning, C. D. (2015). Advances in natural language processing. Science, 349(6245):261–266.spa
dc.relation.referencesGrandini, M., Bagli, E., and Visani, G. (2020). Metrics for multi-class classification: an overview. arXiv preprint arXiv:2008.05756.spa
dc.relation.referencesGhosh, S., Roy, S., and Bandyopadhyay, S. K. (2012). A tutorial review on text mining algorithms. International Journal of Advanced Research in Computer and Communication Engineering, 1(4):7.spa
dc.relation.referencesGerber, M. S. (2014). Predicting crime using twitter and kernel density estimation. Decision Support Systems, 61:115–125.spa
dc.relation.referencesGaisbauer, F., Pournaki, A., Banisch, S., and Olbrich, E. (2021). Ideological differences in engagement in public debate on twitter. Plos one, 16(3):e0249241.spa
dc.relation.referencesEfron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression. The Annals of statistics, 32(2):407–499.spa
dc.relation.referencesDrakulich, K. M. (2015). Social capital, information, and perceived safety from crime: The differential effects of reassuring social connections and vicarious victimization. Social Science Quarterly, 96(1):176–190.spa
dc.relation.referencesDoshi-Velez, F. and Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.spa
dc.relation.referencesDa Silva, N. F., Hruschka, E. R., and Hruschka Jr, E. R. (2014). Tweet sentiment analysis with classifier ensembles. Decision support systems, 66:170–179.spa
dc.relation.referencesCvetojevic, S. and Hochmair, H. H. (2018). Analyzing the spread of tweets in response to paris attacks. Computers, Environment and Urban Systems, 71:14–26.spa
dc.relation.referencesCuriel, R. and Bishop, S. (2017). Modelling the fear of crime. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Science, 473:20170156.spa
dc.relation.referencesCruz, F. L., Troyano, J. A., Pontes, B., and Ortega, F. J. (2014). Mlsenticon: Un lexicón multilingüe de polaridades semánticas a nivel de lemas. Procesamiento del Lenguaje Natural, 53:113–120.spa
dc.relation.referencesChristopher, D. M., Prabhakar, R., and Hinrich, S. (2008). Introduction to information retrieval.spa
dc.relation.referencesChowdhury, G. G. and Chowdhury, S. (2003). Introduction to digital libraries. Facet publishing.spa
dc.relation.referencesChowdhary, K. (2020). Natural language processing. Fundamentals of artificial intelligence, pages 603–649.spa
dc.relation.referencesChicco, D. and Jurman, G. (2020). The advantages of the matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC genomics, 21(1):1–13.spa
dc.relation.referencesChen, X., Cho, Y., and Jang, S. Y. (2015). Crime prediction using twitter sentiment and weather. In 2015 systems and information engineering design symposium, pages 63–68. IEEE.spa
dc.relation.referencesChaparro, L. F., Pulido, C., Rudas, J., Victorino, J., Reyes, A. M., Estrada, C., Narvaez, L. A., and Gómez, F. (2021b). Quantifying perception of security through social media and its relationship with crime. IEEE Access, 9:139201–139213.spa
dc.relation.referencesChaparro, L. F., Pulido, C., Rudas, J., Reyes, A. M., Victorino, J., Narváez, L., Martinez, D., and Gómez, F. (2021a). Interpretability of the perception of security based on tweets content. In 2021 International Conference on Applied Artificial Intelligence (ICAPAI), pages 1–6. IEEE.spa
dc.relation.referencesChaparro, L. F., Pulido, C., Rudas, J., Reyes, A., Victorino, J., Narváez, L. a., Gómez, F., and Martinez, D. (2020). Sentiment analysis of social network content to characterize the perception of security. In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 685–691. IEEE.spa
dc.relation.referencesCarvalho, D. V., Pereira, E. M., and Cardoso, J. S. (2019). Machine learning interpretability: A survey on methods and metrics. Electronics, 8(8):832.spa
dc.relation.referencesCambria, E., Xing, F., Thelwall, M., and Welsch, R. (2022). Sentiment analysis as a multidisciplinary research area. IEEE Transactions on Artificial Intelligence, 3(2):1–3.spa
dc.relation.referencesCamargo, J. E., Torres, C. A., Martínez, O. H., and Gómez, F. A. (2016). A big data analytics system to analyze citizens’ perception of security. In 2016 IEEE International Smart Cities Conference (ISC2), pages 1–5. IEEE.spa
dc.relation.referencesCámara de Comercio de Bogotá. (2022). Encuesta de percepción y victimización de Bogotá-2021.spa
dc.relation.referencesBrown, M. E., Dustman, P. A., and Barthelemy, J. J. (2021). Twitter impact on a community trauma: An examination of who, what, and why it radiated. Journal of community psychology, 49(3):838–853.spa
dc.relation.referencesBrooker, R. G. and Schaefer, T. (2015). Methods of measuring public opinion. Public opinion in the 21st century. https://www.uky.edu/AS/PoliSci/Peffley/pdf/473Measuring%20Public%20Opinion.pdf.spa
dc.relation.referencesBottou, L. (2010). Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer.spa
dc.relation.referencesBokinsky, H., McKenzie, A., Bayoumi, A., McCaslin, R., Patterson, A., Matthews, M., Schmidley, J., and Eisner, L. (2013). Application of natural language processing techniques to marine v-22 maintenance data for populating a cbm-oriented database. In AHS Airworthiness, CBM, and HUMS Specialists’ Meeting, Huntsville, AL.spa
dc.relation.referencesBishop, C. M. and Nasrabadi, N. M. (2006). Pattern recognition and machine learning, volume 4. Springer.spa
dc.relation.referencesBird, S. (2006). Nltk: the natural language toolkit. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, pages 69–72.spa
dc.relation.referencesBhagvat, S. (2011). Clustering of twitter technology tweets and the impact of stopwords on clusters.spa
dc.relation.referencesBendler, J., Brandt, T., Wagner, S., and Neumann, D. (2014). Investigating crime-to-twitter relationships in urban environments-facilitating a virtual neighborhood watch. Association for Information Systems (AIS) eLibrary.spa
dc.relation.referencesBarreras, F., Diaz, C., Riascos, A., and Ribero, M. (2016). Comparison of different crime prediction models in Bogotá. 2016.spa
dc.relation.referencesBalakrishnan, V. and Lloyd-Yemoh, E. (2014). Stemming and lemmatization: A comparison of retrieval performances.spa
dc.relation.referencesAyodele, T. O. (2010). Types of machine learning algorithms in new advances in machine learning. Croatia, Rijeka.spa
dc.relation.referencesAnguiano-Hernández, E. (2009). Naive bayes multinomial para clasificación de texto usando un esquema de pesado por clases.spa
dc.relation.referencesAgirre, E., Alegria, I., Arregi, X., Artola, X., de Ilarraza, A. D., Maritxalar, M., Sarasola, K., and Urkia, M. (1992). Xuxen: A spelling checker/corrector for basque based on two-level morphology. In Third Conference on Applied Natural Language Processing, pages 119–125.spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.licenseAtribución-NoComercial-CompartirIgual 4.0 Internacionalspa
dc.rights.urihttp://creativecommons.org/licenses/by-nc-sa/4.0/spa
dc.subject.ddc510 - Matemáticas::519 - Probabilidades y matemáticas aplicadasspa
dc.subject.ddc000 - Ciencias de la computación, información y obras generales::001 - Conocimientospa
dc.subject.lembSentimientos
dc.subject.proposalPercepción de Seguridad (PoS)spa
dc.subject.proposalInterpretabilidad Localspa
dc.subject.proposalInterpretabilidad Categóricaspa
dc.subject.proposalProcesamiento de Lenguaje Natural (NLP)spa
dc.subject.proposalLIMEspa
dc.subject.proposalPerception of Security (PoS)eng
dc.subject.proposalLocal and Categorical interpretabilityeng
dc.subject.proposalNatural Language Processing (NLP)eng
dc.subject.proposalLIMEeng
dc.titleInterpretabilidad categórica de clasificadores automáticos sobre contenido relacionado a la percepción de la seguridadspa
dc.title.translatedCategorical interpretability of automatic classifiers on content related to the perception of securityeng
dc.title.translatedInterpretabilidade categórica de classificadores automáticos sobre conteúdo relacionado à percepção de segurança.por
dc.typeTrabajo de grado - Maestríaspa
dc.type.coarhttp://purl.org/coar/resource_type/c_bdccspa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.redcolhttp://purl.org/redcol/resource_type/TMspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
dcterms.audience.professionaldevelopmentInvestigadoresspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa

Files

Original bundle

Name:
1010217881.2023.pdf
Size:
3.02 MB
Format:
Adobe Portable Document Format
Description:
Master's thesis in Sciences - Applied Mathematics

License bundle

Name:
license.txt
Size:
5.74 KB
Format:
Item-specific license agreed upon to submission