Estudio comparativo de técnicas de clasificación binaria con múltiples anotadores

dc.contributor.advisor: Gil González, Julian
dc.contributor.advisor: Espinosa Bedoya, Albeiro
dc.contributor.author: García Maya, Arles Felipe
dc.contributor.researchgroup: GIDIA: Grupo de Investigación y Desarrollo en Inteligencia Artificial
dc.coverage.sucursal: Universidad Nacional de Colombia - Sede Medellín
dc.date.accessioned: 2020-03-03T19:11:54Z
dc.date.available: 2020-03-03T19:11:54Z
dc.date.issued: 2019
dc.date.issued: 2019-12-02
dc.description.abstract: In recent years, the machine learning community has shown growing interest in learning from multiple annotators, because in some problems a dataset labeled by a single annotator is expensive, risky, or very difficult to obtain. As a result, the literature offers a range of techniques and algorithms that address the problem by extracting knowledge from the multiple annotators to produce a single-annotator dataset, known as the estimated ground truth. This work covers the selection, implementation, testing, and analysis of five machine learning techniques for multiple annotators, evaluated with the accuracy, recall, F1 score, and ROC metrics, in order to characterize how these techniques behave on different databases. The experimental results, based on the performance metrics obtained on those databases, show large differences among the techniques on the same database, which gives the academic and professional community additional criteria for choosing among the techniques implemented here.
dc.description.additional: Research work submitted in partial fulfillment of the requirements for the degree of Magister en Ingeniería de Sistemas (Master in Systems Engineering)
dc.description.comments: Magister en Ingeniería de Sistemas (Master in Systems Engineering)
dc.description.degreelevel: Master's
dc.format.extent: 95
dc.format.mimetype: application/pdf
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/75807
dc.language.iso: spa
dc.publisher.branch: Universidad Nacional de Colombia - Sede Medellín
dc.rights: All rights reserved - Universidad Nacional de Colombia
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.rights.license: Attribution-NonCommercial 4.0 International
dc.rights.spa: Acceso abierto (Open access)
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.ddc: Technology (Applied sciences)
dc.subject.proposal: Machine learning
dc.subject.proposal: Python
dc.subject.proposal: Binary classification
dc.subject.proposal: Ground truth estimation
dc.subject.proposal: True label
dc.subject.proposal: Machine learning multiple annotators
dc.title: Estudio comparativo de técnicas de clasificación binaria con múltiples anotadores
dc.type: Degree work - Master's
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content: Text
dc.type.driver: info:eu-repo/semantics/masterThesis
dc.type.version: info:eu-repo/semantics/acceptedVersion
oaire.accessrights: http://purl.org/coar/access_right/c_abf2

Files

Original bundle

Name: 1088294092.2019.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format
Description: Master's thesis in Engineering - Systems Engineering
