Modelos generativos: Generación de audio en bioacustica

dc.contributor.advisor: Hernandez-Romero, Freddy Rolando
dc.contributor.advisor: Gomez Jaramillo, Francisco Albeiro
dc.contributor.author: Ñungo Manrique, Jose Sebastián
dc.date.accessioned: 2025-03-11T13:56:33Z
dc.date.available: 2025-03-11T13:56:33Z
dc.date.issued: 2024
dc.description: illustrations, diagrams, photographs, tables
dc.description.abstract: This study addresses the challenge of limited and low-quality audio data in bioacoustics, focusing on the generation of realistic croaks of the frog Boana faber. We propose a novel approach using diffusion probabilistic models, a powerful deep learning technique for audio synthesis. Because these models are computationally demanding, we implement a systematic selection process based on Fréchet Inception Distance (FID) and K-means clustering to identify a subset of high-quality generated samples from a larger pool. We evaluated the generated audio samples through a human perception experiment in an A/B testing format. The results demonstrate that our trained model generates convincing Boana faber croaks, even with truncated training, highlighting the model's potential for generating realistic bioacoustic data. This approach offers promising possibilities for augmenting existing datasets and improving the performance of automated biodiversity monitoring systems. (Text taken from the source) [eng]
dc.description.degreelevel: Master's
dc.description.degreename: Magíster en Ciencias - Matemática Aplicada (Master of Science in Applied Mathematics)
dc.format.extent: vii, 26 pages
dc.format.mimetype: application/pdf [spa]
dc.identifier.instname: Universidad Nacional de Colombia [spa]
dc.identifier.reponame: Repositorio Institucional Universidad Nacional de Colombia [spa]
dc.identifier.repourl: https://repositorio.unal.edu.co/ [spa]
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/87632
dc.language.iso: eng [spa]
dc.publisher: Universidad Nacional de Colombia [spa]
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá [spa]
dc.publisher.faculty: Facultad de Ciencias [spa]
dc.publisher.place: Bogotá, Colombia [spa]
dc.publisher.program: Bogotá - Ciencias - Maestría en Ciencias - Matemática Aplicada [spa]
dc.rights.accessrights: info:eu-repo/semantics/openAccess [spa]
dc.rights.license: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [spa]
dc.subject.ddc: 510 - Mathematics::519 - Probabilities and applied mathematics
dc.subject.proposal: Modelos Generativos [spa]
dc.subject.proposal: Bioacústica [spa]
dc.subject.proposal: Modelos de Difusión [spa]
dc.subject.proposal: Generative Models [eng]
dc.subject.proposal: Bioacoustics [eng]
dc.subject.proposal: Diffusion Models [eng]
dc.subject.wikidata: bioacoustics [eng]
dc.subject.wikidata: bioacústica [spa]
dc.subject.wikidata: Fréchet inception distance [eng]
dc.subject.wikidata: Distancia de inicio de Fréchet [spa]
dc.subject.wikidata: k-means clustering [eng]
dc.subject.wikidata: k-medias [spa]
dc.title: Modelos generativos: Generación de audio en bioacustica [spa]
dc.title.translated: Generative models: Audio generation in bioacoustics [eng]
dc.type: Master's thesis
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc [spa]
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa [spa]
dc.type.content: Text [spa]
dc.type.driver: info:eu-repo/semantics/masterThesis [spa]
dc.type.redcol: http://purl.org/redcol/resource_type/TM [spa]
dc.type.version: info:eu-repo/semantics/acceptedVersion [spa]
dcterms.audience.professionaldevelopment: Researchers
oaire.accessrights: http://purl.org/coar/access_right/c_abf2 [spa]

Files

Original bundle

Name: 1020835122.2024.pdf
Size: 4.62 MB
Format: Adobe Portable Document Format
Description: Final Thesis version (Corrected version)

License bundle

Name: license.txt
Size: 5.74 KB
Format: Item-specific license agreed upon to submission