Comparative study of binary classification techniques with multiple annotators
| dc.contributor.advisor | Gil González, Julian | spa |
| dc.contributor.advisor | Espinosa Bedoya, Albeiro | spa |
| dc.contributor.author | García Maya, Arles Felipe | spa |
| dc.contributor.researchgroup | GIDIA: Grupo de Investigación y Desarrollo en Inteligencia Artificial | spa |
| dc.coverage.sucursal | Universidad Nacional de Colombia - Sede Medellín | spa |
| dc.date.accessioned | 2020-03-03T19:11:54Z | spa |
| dc.date.available | 2020-03-03T19:11:54Z | spa |
| dc.date.issued | 2019 | spa |
| dc.date.issued | 2019-12-02 | spa |
| dc.description.abstract | In recent years, the machine learning community has shown growing interest in learning from multiple annotators, because in many problems a dataset labeled by a single annotator is expensive, risky, or very difficult to obtain. As a result, the literature offers a range of techniques and algorithms that address the problem by extracting knowledge from the multiple annotators to produce a dataset with a single annotator, known as the estimated single-label set (the estimated ground truth). This work covers the selection, implementation, testing, and analysis of five machine learning techniques for multiple annotators, evaluated with the precision, recall, F1 score, and ROC metrics, in order to characterize how these techniques behave across different datasets. The experimental results, based on the performance metrics obtained on the different datasets, show large differences among the techniques on the same dataset, which gives the scientific and professional community additional criteria for choosing among the techniques studied here. | eng |
| dc.description.additional | Research work presented as a partial requirement for the degree of Master in Systems Engineering | spa |
| dc.description.comments | Master in Systems Engineering | spa |
| dc.description.degreelevel | Master's | spa |
| dc.format.extent | 95 | spa |
| dc.format.mimetype | application/pdf | spa |
| dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/75807 | |
| dc.language.iso | spa | spa |
| dc.publisher.branch | Universidad Nacional de Colombia - Sede Medellín | spa |
| dc.rights | All rights reserved - Universidad Nacional de Colombia | spa |
| dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
| dc.rights.license | Attribution-NonCommercial 4.0 International | spa |
| dc.rights.spa | Open access | spa |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | spa |
| dc.subject.ddc | Technology (Applied sciences) | spa |
| dc.subject.proposal | Machine learning | eng |
| dc.subject.proposal | Python | eng |
| dc.subject.proposal | Binary classification | eng |
| dc.subject.proposal | Ground truth estimation | eng |
| dc.subject.proposal | Machine learning multiple annotators | eng |
| dc.title | Estudio comparativo de técnicas de clasificación binaria con múltiples anotadores | spa |
| dc.type | Thesis - Master's | spa |
| dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
| dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
| dc.type.content | Text | spa |
| dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
| dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
| oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
Files
Original bundle
- Name: 1088294092.2019.pdf
- Size: 1.2 MB
- Format: Adobe Portable Document Format
- Description: Master's Thesis in Engineering - Systems Engineering

