Generative models: Audio generation in bioacoustics
dc.contributor.advisor | Hernandez-Romero, Freddy Rolando | |
dc.contributor.advisor | Gomez Jaramillo, Francisco Albeiro | |
dc.contributor.author | Ñungo Manrique, Jose Sebastián | |
dc.date.accessioned | 2025-03-11T13:56:33Z | |
dc.date.available | 2025-03-11T13:56:33Z | |
dc.date.issued | 2024 | |
dc.description | illustrations, diagrams, photographs, tables | eng |
dc.description.abstract | Este estudio aborda el desafío de la disponibilidad limitada y la baja calidad de datos de audio en bioacústica, centrándose específicamente en la generación de croares realistas de la rana Boana faber. Proponemos un enfoque novedoso utilizando modelos probabilísticos de difusión, una potente técnica de aprendizaje profundo para la síntesis de audio. Debido a las demandas computacionales de estos modelos, implementamos un proceso de selección sistemático basado en la Distancia de Incepción de Fréchet (FID) y la agrupación K-medias para identificar un subconjunto de muestras generadas de alta calidad de un grupo más amplio. Evaluamos las muestras de audio generadas a través de un experimento de percepción humana en formato de pruebas A/B. Los resultados demuestran que nuestro modelo entrenado genera croares convincentes de Boana faber, incluso con un entrenamiento truncado, destacando el potencial del modelo para generar datos bioacústicos realistas. Este enfoque ofrece posibilidades prometedoras para mejorar los conjuntos de datos existentes y mejorar el rendimiento de los sistemas automatizados de monitoreo de la biodiversidad (Texto tomado de la fuente) | spa |
dc.description.abstract | This study addresses the challenge of limited and low-quality audio data in bioacoustics, focusing specifically on the generation of realistic croaks of the frog species Boana faber. We propose a novel approach using diffusion probabilistic models, a powerful deep learning technique for audio synthesis. Because these models are computationally demanding, we implement a systematic selection process based on Fréchet Inception Distance (FID) and K-means clustering to identify a subset of high-quality generated samples from a larger pool. We evaluate the generated audio samples through a human perception experiment in an A/B testing format. The results demonstrate that our trained model generates convincing Boana faber croaks, even with truncated training, highlighting its potential for generating realistic bioacoustic data. This approach offers promising possibilities for enriching existing datasets and improving the performance of automated biodiversity monitoring systems. | eng |
dc.description.degreelevel | Maestría | spa |
dc.description.degreename | Magíster en Ciencias - Matemática Aplicada | spa |
dc.format.extent | vii, 26 pages | eng |
dc.format.mimetype | application/pdf | spa |
dc.identifier.instname | Universidad Nacional de Colombia | spa |
dc.identifier.reponame | Repositorio Institucional Universidad Nacional de Colombia | spa |
dc.identifier.repourl | https://repositorio.unal.edu.co/ | spa |
dc.identifier.uri | https://repositorio.unal.edu.co/handle/unal/87632 | |
dc.language.iso | eng | spa |
dc.publisher | Universidad Nacional de Colombia | spa |
dc.publisher.branch | Universidad Nacional de Colombia - Sede Bogotá | spa |
dc.publisher.faculty | Facultad de Ciencias | spa |
dc.publisher.place | Bogotá, Colombia | spa |
dc.publisher.program | Bogotá - Ciencias - Maestría en Ciencias - Matemática Aplicada | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.rights.license | Reconocimiento 4.0 Internacional | spa |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | spa |
dc.subject.ddc | 510 - Matemáticas::519 - Probabilidades y matemáticas aplicadas | spa |
dc.subject.proposal | Modelos Generativos | spa |
dc.subject.proposal | Bioacústica | spa |
dc.subject.proposal | Modelos de Difusión | spa |
dc.subject.proposal | Generative Models | eng |
dc.subject.proposal | Bioacoustics | eng |
dc.subject.proposal | Diffusion Models | eng |
dc.subject.wikidata | bioacoustics | eng |
dc.subject.wikidata | bioacústica | spa |
dc.subject.wikidata | Fréchet inception distance | eng |
dc.subject.wikidata | Distancia de inicio de Fréchet | spa |
dc.subject.wikidata | k-means clustering | eng |
dc.subject.wikidata | k-medias | spa |
dc.title | Modelos generativos: Generación de audio en bioacustica | spa |
dc.title.translated | Generative models: Audio generation in bioacoustics | eng |
dc.type | Trabajo de grado - Maestría | spa |
dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
dc.type.coarversion | http://purl.org/coar/version/c_ab4af688f83e57aa | spa |
dc.type.content | Text | spa |
dc.type.driver | info:eu-repo/semantics/masterThesis | spa |
dc.type.redcol | http://purl.org/redcol/resource_type/TM | spa |
dc.type.version | info:eu-repo/semantics/acceptedVersion | spa |
dcterms.audience.professionaldevelopment | Researchers | eng |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
Files
Original bundle
1 - 1 of 1
- Name: 1020835122.2024.pdf
- Size: 4.62 MB
- Format: Adobe Portable Document Format
- Description: Final Thesis version (Corrected version)
License bundle
1 - 1 of 1
- Name: license.txt
- Size: 5.74 KB
- Format: Item-specific license agreed upon to submission