Show simple item record

dc.rights.license: Reconocimiento 4.0 Internacional
dc.contributor.advisor: Niño Vasquez, Luis Fernando
dc.contributor.author: Calero Espinosa, Juan Camilo
dc.date.accessioned: 2021-05-26T16:54:28Z
dc.date.available: 2021-05-26T16:54:28Z
dc.date.issued: 2021
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/79567
dc.description: diagramas, ilustraciones a color, tablas
dc.description.abstract: Topic detection on a large corpus of documents requires considerable computational resources, and the burden grows with the number of topics. Yet even a large number of topics may not be as specific as desired, and topic quality starts to degrade beyond a certain point. To overcome these obstacles, we propose a new methodology for hierarchical topic detection that uses multi-view clustering to link different topic models extracted from document named entities and part-of-speech tags. Results on three different datasets show that the methodology decreases the memory cost of topic detection, improves topic quality, and allows more topics to be detected.
dc.description.abstract: La detección de temas en grandes colecciones de documentos requiere una considerable cantidad de recursos computacionales, y el número de temas también puede aumentar la carga computacional. Incluso con un elevado número de temas, estos pueden no ser tan específicos como se desea, o simplemente la calidad de los temas comienza a disminuir después de cierto número. Para superar estos obstáculos, proponemos una nueva metodología para la detección jerárquica de temas, que utiliza agrupamiento multi-vista para vincular diferentes modelos de temas extraídos de las partes del discurso y de las entidades nombradas de los documentos. Los resultados en tres conjuntos de documentos muestran que la metodología disminuye el costo en memoria de la detección de temas, permitiendo detectar más temas y al mismo tiempo mejorar su calidad.
dc.format.extent: 1 recurso en línea (88 páginas)
dc.format.mimetype: application/pdf
dc.language.iso: eng
dc.publisher: Universidad Nacional de Colombia
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject.ddc: 000 - Ciencias de la computación, información y obras generales
dc.title: Multi-view learning for hierarchical topic detection on corpus of documents
dc.type: Trabajo de grado - Maestría
dc.type.driver: info:eu-repo/semantics/masterThesis
dc.type.version: info:eu-repo/semantics/acceptedVersion
dc.publisher.program: Bogotá - Ingeniería - Maestría en Ingeniería - Ingeniería de Sistemas y Computación
dc.contributor.researchgroup: Laboratorio de Investigación en Sistemas Inteligentes - LISI
dc.description.degreelevel: Maestría
dc.description.degreename: Magíster en Ingeniería – Sistemas y Computación
dc.description.researcharea: Procesamiento de lenguaje natural
dc.identifier.instname: Universidad Nacional de Colombia
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia
dc.identifier.repourl: https://repositorio.unal.edu.co/
dc.publisher.department: Departamento de Ingeniería de Sistemas e Industrial
dc.publisher.faculty: Facultad de Ingeniería
dc.publisher.place: Bogotá
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá
dc.relation.references: Stephen E. Palmer. “Hierarchical structure in perceptual representation”. In: Cognitive Psychology 9.4 (Oct. 1977), pp. 441–474. issn: 0010-0285. doi: 10.1016/0010-0285(77)90016-0. url: https://www.sciencedirect.com/science/article/pii/0010028577900160.
dc.relation.references: E. Wachsmuth, M. W. Oram, and D. I. Perrett. “Recognition of Objects and Their Component Parts: Responses of Single Units in the Temporal Cortex of the Macaque”. In: Cerebral Cortex 4.5 (Sept. 1994), pp. 509–522. issn: 1047-3211. doi: 10.1093/cercor/4.5.509. url: https://academic.oup.com/cercor/article-lookup/doi/10.1093/cercor/4.5.509.
dc.relation.references: N. K. Logothetis and D. L. Sheinberg. “Visual Object Recognition”. In: Annual Review of Neuroscience 19.1 (Mar. 1996), pp. 577–621. issn: 0147-006X. doi: 10.1146/annurev.ne.19.030196.003045. url: http://www.annualreviews.org/doi/10.1146/annurev.ne.19.030196.003045.
dc.relation.references: Daniel D. Lee and H. Sebastian Seung. “Learning the parts of objects by non-negative matrix factorization”. In: Nature 401.6755 (Oct. 1999), pp. 788–791. issn: 0028-0836. doi: 10.1038/44565. url: http://www.nature.com/articles/44565.
dc.relation.references: David M. Blei, Andrew Y. Ng, and Michael I. Jordan. “Latent Dirichlet Allocation”. In: Journal of Machine Learning Research 3.Jan (2003), pp. 993–1022. issn: 1533-7928. url: http://www.jmlr.org/papers/v3/blei03a.html.
dc.relation.references: Thomas L. Griffiths et al. “Hierarchical Topic Models and the Nested Chinese Restaurant Process”. In: Advances in Neural Information Processing Systems (2003), pp. 17–24. url: https://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nested-chinese-restaurant-process.pdf.
dc.relation.references: Stella X. Yu and Jianbo Shi. “Multiclass spectral clustering”. In: Proceedings of the IEEE International Conference on Computer Vision. Vol. 1. Institute of Electrical and Electronics Engineers Inc., 2003, pp. 313–319. doi: 10.1109/iccv.2003.1238361. url: https://ieeexplore.ieee.org/abstract/document/1238361.
dc.relation.references: S. Bickel and T. Scheffer. “Multi-View Clustering”. In: Fourth IEEE International Conference on Data Mining (ICDM’04). IEEE, 2004, pp. 19–26. isbn: 0-7695-2142-8. doi: 10.1109/ICDM.2004.10095. url: http://ieeexplore.ieee.org/document/1410262/.
dc.relation.references: Nevin L. Zhang. “Hierarchical Latent Class Models for Cluster Analysis”. In: Journal of Machine Learning Research 5 (2004), pp. 697–723. url: https://www.jmlr.org/papers/volume5/zhang04a/zhang04a.pdf.
dc.relation.references: David Newman, Chaitanya Chemudugunta, and Padhraic Smyth. “Statistical entity-topic models”. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Vol. 2006. Association for Computing Machinery, 2006, pp. 680–686. isbn: 1595933395. doi: 10.1145/1150402.1150487.
dc.relation.references: Wei Li and Andrew McCallum. “Pachinko allocation: DAG-structured mixture models of topic correlations”. In: ACM International Conference Proceeding Series. Vol. 148. 2006, pp. 577–584. isbn: 1595933832. doi: 10.1145/1143844.1143917. url: https://dl.acm.org/doi/abs/10.1145/1143844.1143917.
dc.relation.references: David M. Blei, Thomas L. Griffiths, and Michael I. Jordan. “The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies”. In: (Oct. 2007). url: https://arxiv.org/abs/0710.0845.
dc.relation.references: David Mimno, Wei Li, and Andrew McCallum. “Mixtures of hierarchical topics with Pachinko allocation”. In: ACM International Conference Proceeding Series. Vol. 227. 2007, pp. 633–640. doi: 10.1145/1273496.1273576. url: https://dl.acm.org/doi/abs/10.1145/1273496.1273576.
dc.relation.references: Tomoaki Nakamura, Takayuki Nagai, and Naoto Iwahashi. “Multimodal object categorization by a robot”. In: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, Oct. 2007, pp. 2415–2420. isbn: 978-1-4244-0911-2. doi: 10.1109/IROS.2007.4399634. url: http://ieeexplore.ieee.org/document/4399634/.
dc.relation.references: Chaitanya Chemudugunta et al. “Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning”. In: 2008, pp. 229–244. doi: 10.1007/978-3-540-88564-1_15.
dc.relation.references: Yi Wang, Nevin L. Zhang, and Tao Chen. “Latent tree models and approximate inference in Bayesian networks”. In: Journal of Artificial Intelligence Research 32 (Aug. 2008), pp. 879–900. issn: 1076-9757. doi: 10.1613/jair.2530. url: https://www.jair.org/index.php/jair/article/view/10564.
dc.relation.references: Nevin L. Zhang et al. “Latent tree models and diagnosis in traditional Chinese medicine”. In: Artificial Intelligence in Medicine 42.3 (Mar. 2008), pp. 229–245. issn: 0933-3657. doi: 10.1016/j.artmed.2007.10.004. url: https://www.sciencedirect.com/science/article/pii/S0933365707001443.
dc.relation.references: David Andrzejewski, Xiaojin Zhu, and Mark Craven. “Incorporating domain knowledge into topic modeling via Dirichlet forest priors”. In: ACM International Conference Proceeding Series. Vol. 382. 2009. isbn: 9781605585161. doi: 10.1145/1553374.1553378.
dc.relation.references: Jonathan Chang et al. Reading Tea Leaves: How Humans Interpret Topic Models. Tech. rep. 2009. url: http://rexa.info.
dc.relation.references: Tomoaki Nakamura, Takayuki Nagai, and Naoto Iwahashi. “Grounding of word meanings in multimodal concepts using LDA”. In: 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, Oct. 2009, pp. 3943–3948. isbn: 978-1-4244-3803-7. doi: 10.1109/IROS.2009.5354736. url: http://ieeexplore.ieee.org/document/5354736/.
dc.relation.references: Guangcan Liu et al. “Robust Recovery of Subspace Structures by Low-Rank Representation”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 35.1 (Oct. 2010), pp. 171–184. doi: 10.1109/TPAMI.2012.88. url: http://arxiv.org/abs/1010.2955.
dc.relation.references: James Petterson et al. Word Features for Latent Dirichlet Allocation. Tech. rep. 2010, pp. 1921–1929.
dc.relation.references: Nakatani Shuyo. Language Detection Library for Java. 2010. url: http://code.google.com/p/language-detection/.
dc.relation.references: Abhishek Kumar and Hal Daumé III. A Co-training Approach for Multi-view Spectral Clustering. Tech. rep. 2011. url: http://legacydirs.umiacs.umd.edu/~abhishek/cospectral.icml11.pdf.
dc.relation.references: Abhishek Kumar, Piyush Rai, and Hal Daumé III. Co-regularized Multi-view Spectral Clustering. Tech. rep. 2011.
dc.relation.references: David Mimno et al. Optimizing Semantic Coherence in Topic Models. Tech. rep. 2011, pp. 262–272. url: https://www.aclweb.org/anthology/D11-1024.pdf.
dc.relation.references: Tomoaki Nakamura, Takayuki Nagai, and Naoto Iwahashi. “Bag of multimodal LDA models for concept formation”. In: 2011 IEEE International Conference on Robotics and Automation. IEEE, May 2011, pp. 6233–6238. isbn: 978-1-61284-386-5. doi: 10.1109/ICRA.2011.5980324. url: http://ieeexplore.ieee.org/document/5980324/.
dc.relation.references: Ehsan Elhamifar and Rene Vidal. “Sparse Subspace Clustering: Algorithm, Theory, and Applications”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 35.11 (Mar. 2012), pp. 2765–2781. url: http://arxiv.org/abs/1203.1005.
dc.relation.references: Jagadeesh Jagarlamudi, Hal Daumé III, and Raghavendra Udupa. Incorporating Lexical Priors into Topic Models. Tech. rep. 2012, pp. 204–213. url: https://www.aclweb.org/anthology/E12-1021.pdf.
dc.relation.references: Xiao Cai, Feiping Nie, and Heng Huang. Multi-View K-Means Clustering on Big Data. Tech. rep. 2013. url: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.415.8610&rep=rep1&type=pdf.
dc.relation.references: Zhiyuan Chen et al. “Discovering Coherent Topics Using General Knowledge”. In: Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM). 2013, pp. 209–218. doi: 10.1145/2505515.2505519. url: http://dx.doi.org/10.1145/2505515.2505519.
dc.relation.references: Zhiyuan Chen et al. “Leveraging Multi-Domain Prior Knowledge in Topic Models”. In: IJCAI International Joint Conference on Artificial Intelligence. Nov. 2013, pp. 2071–2077.
dc.relation.references: Linmei Hu et al. “Incorporating entities in news topic modeling”. In: Communications in Computer and Information Science. Vol. 400. Springer Verlag, Nov. 2013, pp. 139–150. isbn: 9783642416439. doi: 10.1007/978-3-642-41644-6_14. url: https://link.springer.com/chapter/10.1007/978-3-642-41644-6_14.
dc.relation.references: Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. “Linguistic Regularities in Continuous Space Word Representations”. In: Proceedings of NAACL-HLT. June 2013, pp. 746–751.
dc.relation.references: Tomas Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality. Tech. rep. 2013. url: http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and.
dc.relation.references: Tomas Mikolov et al. “Efficient estimation of word representations in vector space”. In: 1st International Conference on Learning Representations, ICLR 2013 - Workshop Track Proceedings. International Conference on Learning Representations, ICLR, Jan. 2013.
dc.relation.references: Konstantinos N. Vavliakis, Andreas L. Symeonidis, and Pericles A. Mitkas. “Event identification in web social media through named entity recognition and topic modeling”. In: Data and Knowledge Engineering 88 (Nov. 2013), pp. 1–24. issn: 0169-023X. doi: 10.1016/j.datak.2013.08.006.
dc.relation.references: Yuening Hu et al. “Interactive topic modeling”. In: Machine Learning 95 (2014), pp. 423–469. doi: 10.1007/s10994-013-5413-0. url: http://www.policyagendas.org/page/topic-codebook.
dc.relation.references: Yeqing Li et al. Large-Scale Multi-View Spectral Clustering with Bipartite Graph. Tech. rep. 2015. url: https://dl.acm.org/doi/10.5555/2886521.2886704.
dc.relation.references: Zechao Li et al. “Robust structured subspace learning for data representation”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 37.10 (Oct. 2015), pp. 2085–2098. issn: 0162-8828. doi: 10.1109/TPAMI.2015.2400461. url: https://ieeexplore.ieee.org/document/7031960.
dc.relation.references: Andrew J. McMinn and Joemon M. Jose. “Real-time entity-based event detection for twitter”. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 9283. Springer Verlag, 2015, pp. 65–77. isbn: 9783319240268. doi: 10.1007/978-3-319-24027-5_6. url: https://link.springer.com/chapter/10.1007/978-3-319-24027-5_6.
dc.relation.references: John Paisley et al. “Nested hierarchical Dirichlet processes”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 37.2 (Feb. 2015), pp. 256–270. issn: 0162-8828. doi: 10.1109/TPAMI.2014.2318728. url: https://ieeexplore.ieee.org/abstract/document/6802355.
dc.relation.references: Zhao Zhang et al. “Joint low-rank and sparse principal feature coding for enhanced robust representation and visual classification”. In: IEEE Transactions on Image Processing 25.6 (June 2016), pp. 2429–2443. issn: 1057-7149. doi: 10.1109/TIP.2016.2547180. url: https://ieeexplore.ieee.org/document/7442126.
dc.relation.references: Mehdi Allahyari and Krys Kochut. “Discovering Coherent Topics with Entity Topic Models”. In: Proceedings - 2016 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2016. Institute of Electrical and Electronics Engineers Inc., Jan. 2017, pp. 26–33. isbn: 9781509044702. doi: 10.1109/WI.2016.0015.
dc.relation.references: Peixian Chen et al. “Latent Tree Models for Hierarchical Topic Detection”. In: Artificial Intelligence 250 (May 2017), pp. 105–124. url: http://arxiv.org/abs/1605.06650.
dc.relation.references: Zhourong Chen et al. Sparse Boltzmann Machines with Structure Learning as Applied to Text Analysis. Tech. rep. 2017. url: www.aaai.org.
dc.relation.references: Matthew Honnibal and Ines Montani. “spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing”. 2017.
dc.relation.references: Ashish Vaswani et al. “Attention Is All You Need”. In: Advances in Neural Information Processing Systems 30 (2017), pp. 5998–6008. issn: 1049-5258. url: https://arxiv.org/abs/1706.03762.
dc.relation.references: Jing Zhao et al. “Multi-view learning overview: Recent progress and new challenges”. In: Information Fusion 38 (2017), pp. 43–54. issn: 1566-2535. doi: 10.1016/j.inffus.2017.02.007. url: http://dx.doi.org/10.1016/j.inffus.2017.02.007.
dc.relation.references: Xiaojun Chen et al. “Spectral clustering of large-scale data by directly solving normalized cut”. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, July 2018, pp. 1206–1215. isbn: 9781450355520. doi: 10.1145/3219819.3220039. url: https://dl.acm.org/doi/10.1145/3219819.3220039.
dc.relation.references: Jacob Devlin et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. In: (Oct. 2018). url: http://arxiv.org/abs/1810.04805.
dc.relation.references: Zhao Kang et al. “Multi-graph Fusion for Multi-view Spectral Clustering”. In: Knowledge-Based Systems 189 (Sept. 2019). url: http://arxiv.org/abs/1909.06940.
dc.relation.references: Alec Radford et al. “Language Models are Unsupervised Multitask Learners”. In: (2019). url: http://www.persagen.com/files/misc/radford2019language.pdf.
dc.relation.references: Tom B. Brown et al. “Language Models are Few-Shot Learners”. In: arXiv (May 2020). url: http://arxiv.org/abs/2005.14165.
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.subject.proposal: Named entities
dc.subject.proposal: Topic detection
dc.subject.proposal: Multi-view clustering
dc.subject.proposal: Multi-view learning
dc.subject.proposal: Graph fusion
dc.subject.proposal: Entidades nombradas
dc.subject.proposal: Aprendizaje multi-vista
dc.subject.proposal: Agrupamiento multi-vista
dc.subject.proposal: Fusión de grafos
dc.subject.unesco: Indexación automática
dc.subject.unesco: Recuperación de información
dc.subject.unesco: Information processing
dc.subject.unesco: Automatic indexing
dc.title.translated: Aprendizaje multi-vista para la detección jerárquica de temas en corpus de documentos
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content: Text
dc.type.redcol: http://purl.org/redcol/resource_type/TM
oaire.accessrights: http://purl.org/coar/access_right/c_abf2


Files in this item


This item appears in the following collection(s)
