Evaluating the impact of curriculum learning on the training process for an intelligent agent in a video game

dc.contributor.advisor: Camargo Mendoza, Jorge Eliécer
dc.contributor.author: Sáenz Imbacuán, Rigoberto
dc.date.accessioned: 2020-11-06T14:33:26Z
dc.date.available: 2020-11-06T14:33:26Z
dc.date.issued: 2020-07-07
dc.description.abstract: We want to measure the impact of the curriculum learning technique on a reinforcement learning training setup. Several experiments were designed with different training curriculums adapted to the video game chosen as a case study; all were then executed on a selected game simulation platform, using two reinforcement learning algorithms and the mean cumulative reward as the performance measure. Results suggest that curriculum learning has a significant impact on the training process, increasing training times in some cases and decreasing them by up to 40% in others.
dc.description.abstract: Se desea medir el impacto de la técnica de aprendizaje por currículos sobre el tiempo de entrenamiento de un agente inteligente que está aprendiendo a jugar un videojuego usando aprendizaje por refuerzo. Para esto se diseñaron varios experimentos con diferentes currículos adaptados al videojuego seleccionado como caso de estudio, y se ejecutaron en una plataforma de simulación de juegos seleccionada, usando dos algoritmos de aprendizaje por refuerzo y midiendo su desempeño con la recompensa media acumulada. Los resultados sugieren que usar aprendizaje por currículos tiene un impacto significativo sobre el proceso de entrenamiento, en algunos casos alargando los tiempos de entrenamiento y en otros disminuyéndolos hasta en un 40%.
dc.description.additional: Research line: Reinforcement learning in video games. In this document we present the results of several experiments with curriculum learning applied to a game AI learning process, in order to measure its effects on learning time. Specifically, we trained an agent with a reinforcement learning algorithm to play a video game running on a game simulation platform; we then trained another agent under the same conditions but with a training curriculum, a set of rules that modify the learning environment at specific times so that it is easier for the agent to master at the beginning, and compared both results. Our initial hypothesis is that in some cases a training curriculum allows the agent to learn faster, reducing the required training time. We describe in detail the main elements of our work: the choice of the game simulation platform used to run the training experiments, the review of the reinforcement learning algorithms used to train the agent, the description of the video game selected as the case study, the parameters used to design the training curriculums, and the discussion of the results obtained.
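The description above characterizes a training curriculum as a set of rules that change the learning environment at specific times, with the mean cumulative reward as the performance measure. As a minimal illustrative sketch only (hypothetical names, not the thesis code and not the Unity ML-Agents API), such a lesson controller could advance the curriculum once the mean cumulative reward over a sliding window of episodes crosses a per-lesson threshold:

```python
from collections import deque

class CurriculumController:
    """Hypothetical sketch: advance to the next curriculum lesson when the
    mean cumulative reward over the last `window` episodes reaches the
    threshold set for the current lesson."""

    def __init__(self, thresholds, window=100):
        # thresholds[i]: mean cumulative reward required to leave lesson i
        self.thresholds = thresholds
        self.rewards = deque(maxlen=window)  # sliding window of episode returns
        self.lesson = 0

    def record_episode(self, cumulative_reward):
        self.rewards.append(cumulative_reward)
        window_full = len(self.rewards) == self.rewards.maxlen
        if (self.lesson < len(self.thresholds) and window_full
                and sum(self.rewards) / len(self.rewards) >= self.thresholds[self.lesson]):
            self.lesson += 1          # make the environment harder
            self.rewards.clear()      # restart the window for the new lesson
        return self.lesson
```

The returned lesson index would then drive the environment-modification rules (e.g., obstacle density or level size), which is the mechanism the experiments vary.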
dc.description.degreelevel: Maestría
dc.description.project: Evaluating the impact of curriculum learning on the training process for an intelligent agent in a video game
dc.format.extent: 97
dc.format.mimetype: application/pdf
dc.identifier.uri: https://repositorio.unal.edu.co/handle/unal/78592
dc.language.iso: eng
dc.publisher.branch: Universidad Nacional de Colombia - Sede Bogotá
dc.publisher.program: Bogotá - Ingeniería - Maestría en Ingeniería - Ingeniería de Sistemas y Computación
dc.relation.references: Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. Proceedings of the 26th International Conference on Machine Learning, ICML 2009, 41–48. https://dl.acm.org/doi/10.1145/1553374.1553380
dc.relation.references: Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71–99. https://doi.org/10.1016/S0010-0277(02)00106-3
dc.relation.references: Harris, C. (1991). Parallel distributed processing models and metaphors for language and development. Ph.D. dissertation, University of California, San Diego. https://elibrary.ru/item.asp?id=5839109
dc.relation.references: Juliani, A. (2017, December 8). Introducing ML-Agents Toolkit v0.2: Curriculum Learning, new environments, and more. https://blogs.unity3d.com/2017/12/08/introducing-ml-agents-v0-2-curriculum-learning-new-environments-and-more/
dc.relation.references: Gulcehre, C., Moczulski, M., Visin, F., & Bengio, Y. (2017). Mollifying networks. 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings. http://arxiv.org/abs/1608.04980
dc.relation.references: Allgower, E. L., & Georg, K. (2003). Introduction to numerical continuation methods. In Classics in Applied Mathematics (Vol. 45). Colorado State University. https://doi.org/10.1137/1.9780898719154
dc.relation.references: Justesen, N., Bontrager, P., Togelius, J., & Risi, S. (2017). Deep Learning for Video Game Playing. IEEE Transactions on Games, 12(1), 1–20. https://doi.org/10.1109/tg.2019.2896986
dc.relation.references: Bellemare, M. G., Naddaf, Y., Veness, J., & Bowling, M. (2013). The arcade learning environment: An evaluation platform for general agents. IJCAI International Joint Conference on Artificial Intelligence, 2013, 4148–4152. https://doi.org/10.1613/jair.3912
dc.relation.references: Montfort, N., & Bogost, I. (2009). Racing the beam: The Atari video computer system. MIT Press, Cambridge, Massachusetts. https://pdfs.semanticscholar.org/2e91/086740f228934e05c3de97f01bc58368d313.pdf
dc.relation.references: Bhonker, N., Rozenberg, S., & Hubara, I. (2017). Playing SNES in the Retro Learning Environment. https://arxiv.org/pdf/1611.02205.pdf
dc.relation.references: Buşoniu, L., Babuška, R., & De Schutter, B. (2010). Multi-agent reinforcement learning: An overview. Studies in Computational Intelligence, 310, 183–221. https://doi.org/10.1007/978-3-642-14435-6_7
dc.relation.references: Kempka, M., Wydmuch, M., Runc, G., Toczek, J., & Jaskowski, W. (2016). ViZDoom: A Doom-based AI research platform for visual reinforcement learning. IEEE Conference on Computational Intelligence and Games, CIG. https://doi.org/10.1109/CIG.2016.7860433
dc.relation.references: Beattie, C., Leibo, J. Z., Teplyashin, D., Ward, T., Wainwright, M., Küttler, H., Lefrancq, A., Green, S., Valdés, V., Sadik, A., Schrittwieser, J., Anderson, K., York, S., Cant, M., Cain, A., Bolton, A., Gaffney, S., King, H., Hassabis, D., … Petersen, S. (2016). DeepMind Lab. https://arxiv.org/pdf/1612.03801.pdf
dc.relation.references: Johnson, M., Hofmann, K., Hutton, T., & Bignell, D. (2016). The Malmo platform for artificial intelligence experimentation. Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16), 4246–4247. http://stella.sourceforge.net/
dc.relation.references: Synnaeve, G., Nardelli, N., Auvolat, A., Chintala, S., Lacroix, T., Lin, Z., Richoux, F., & Usunier, N. (2016). TorchCraft: a Library for Machine Learning Research on Real-Time Strategy Games. https://arxiv.org/pdf/1611.00625.pdf
dc.relation.references: Silva, V. do N., & Chaimowicz, L. (2017). MOBA: a New Arena for Game AI. https://arxiv.org/pdf/1705.10443.pdf
dc.relation.references: Karpov, I. V., Sheblak, J., & Miikkulainen, R. (2008). OpenNERO: A game platform for AI research and education. Proceedings of the 4th Artificial Intelligence and Interactive Digital Entertainment Conference, AIIDE 2008, 220–221. https://www.aaai.org/Papers/AIIDE/2008/AIIDE08-038.pdf
dc.relation.references: Juliani, A., Berges, V.-P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M., & Lange, D. (2020). Unity: A General Platform for Intelligent Agents. https://arxiv.org/pdf/1809.02627.pdf
dc.relation.references: Juliani, A. (2017). Introducing: Unity Machine Learning Agents Toolkit. https://blogs.unity3d.com/2017/09/19/introducing-unity-machine-learning-agents/
dc.relation.references: Alpaydin, E. (2010). Introduction to Machine Learning (Second Edition). The MIT Press. https://kkpatel7.files.wordpress.com/2015/04/alppaydin_machinelearning_2010
dc.relation.references: Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (Second Edition). The MIT Press. http://incompleteideas.net/sutton/book/RLbook2018.pdf
dc.relation.references: Wolfshaar, J. van de. (2017). Deep Reinforcement Learning of Video Games [University of Groningen, The Netherlands]. http://fse.studenttheses.ub.rug.nl/15851/1/Artificial_Intelligence_Deep_R_1.pdf
dc.relation.references: Legg, S., & Hutter, M. (2007). Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4), 391–444. https://doi.org/10.1007/s11023-007-9079-x
dc.relation.references: Schaul, T., Togelius, J., & Schmidhuber, J. (2011). Measuring Intelligence through Games. https://arxiv.org/pdf/1109.1314.pdf
dc.relation.references: Ortega, D. B., & Alonso, J. B. (2015). Machine Learning Applied to Pac-Man [Barcelona School of Informatics]. https://upcommons.upc.edu/bitstream/handle/2099.1/26448/108745.pdf
dc.relation.references: Lample, G., & Chaplot, D. S. (2016). Playing FPS Games with Deep Reinforcement Learning. https://arxiv.org/pdf/1609.05521.pdf
dc.relation.references: Adil, K., Jiang, F., Liu, S., Grigorev, A., Gupta, B. B., & Rho, S. (2017). Training an Agent for FPS Doom Game using Visual Reinforcement Learning and VizDoom. (IJACSA) International Journal of Advanced Computer Science and Applications, 8(12). https://pdfs.semanticscholar.org/74c3/5bb13e71cdd8b5a553a7e65d9ed125ce958e.pdf
dc.relation.references: Wang, E., Kosson, A., & Mu, T. (2017). Deep Action Conditional Neural Network for Frame Prediction in Atari Games. http://cs231n.stanford.edu/reports/2017/pdfs/602.pdf
dc.relation.references: Karttunen, J., Kanervisto, A., Kyrki, V., & Hautamäki, V. (2020). From Video Game to Real Robot: The Transfer between Action Spaces. https://arxiv.org/pdf/1905.00741.pdf
dc.relation.references: Martinez, M., Sitawarin, C., Finch, K., Meincke, L., Yablonski, A., & Kornhauser, A. (2017). Beyond Grand Theft Auto V for Training, Testing and Enhancing Deep Learning in Self Driving Cars [Princeton University]. https://arxiv.org/pdf/1712.01397.pdf
dc.relation.references: Singh, S., Barto, A. G., & Chentanez, N. (2005). Intrinsically Motivated Reinforcement Learning. http://www.cs.cornell.edu/~helou/IMRL.pdf
dc.relation.references: Rockstar Games. (2020). https://www.rockstargames.com/
dc.relation.references: Mattar, M., Shih, J., Berges, V.-P., Elion, C., & Goy, C. (2020). Announcing ML-Agents Unity Package v1.0! Unity Blog. https://blogs.unity3d.com/2020/05/12/announcing-ml-agents-unity-package-v1-0/
dc.relation.references: Bertsekas, D., & Tsitsiklis, J. (1996). Neuro-Dynamic Programming. In Encyclopedia of Optimization. Springer US. https://doi.org/10.1007/978-0-387-74759-0_440
dc.relation.references: Shao, K., Tang, Z., Zhu, Y., Li, N., & Zhao, D. (2019). A Survey of Deep Reinforcement Learning in Video Games. https://arxiv.org/pdf/1912.10944.pdf
dc.relation.references: Wu, Y., & Tian, Y. (2017). Training agent for first-person shooter game with actor-critic curriculum learning. ICLR 2017. https://openreview.net/pdf?id=Hk3mPK5gg
dc.relation.references: Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous Methods for Deep Reinforcement Learning. 33rd International Conference on Machine Learning. https://arxiv.org/pdf/1602.01783.pdf
dc.relation.references: Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). A Brief Survey of Deep Reinforcement Learning. IEEE Signal Processing Magazine. https://doi.org/10.1109/MSP.2017.2743240
dc.relation.references: Narvekar, S., Peng, B., Leonetti, M., Sinapov, J., Taylor, M. E., & Stone, P. (2020). Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey. https://arxiv.org/pdf/2003.04960.pdf
dc.relation.references: Juliani, A., Berges, V.-P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M., & Lange, D. (2020). Unity ML-Agents Toolkit. https://github.com/Unity-Technologies/ml-agents
dc.relation.references: Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms. https://arxiv.org/pdf/1707.06347.pdf
dc.relation.references: Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. https://arxiv.org/pdf/1801.01290.pdf
dc.relation.references: Weng, L. (2018). A (Long) Peek into Reinforcement Learning. Lil'Log. https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html
dc.relation.references: Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236
dc.relation.references: Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic Policy Gradient Algorithms. Proceedings of the 31st International Conference on Machine Learning. https://hal.inria.fr/file/index/docid/938992/filename/dpg-icml2014.pdf
dc.relation.references: Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. ICLR 2016. https://arxiv.org/pdf/1509.02971.pdf
dc.relation.references: Barth-Maron, G., Hoffman, M. W., Budden, D., Dabney, W., Horgan, D., Tb, D., Muldal, A., Heess, N., & Lillicrap, T. (2018). Distributed distributional deterministic policy gradients. ICLR 2018. https://openreview.net/pdf?id=SyZipzbCb
dc.relation.references: Schulman, J., Levine, S., Moritz, P., Jordan, M. I., & Abbeel, P. (2015). Trust Region Policy Optimization. Proceedings of the 31st International Conference on Machine Learning. https://arxiv.org/pdf/1502.05477.pdf
dc.relation.references: Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., & de Freitas, N. (2017). Sample Efficient Actor-Critic with Experience Replay. ICLR 2017. https://arxiv.org/pdf/1611.01224.pdf
dc.relation.references: Wu, Y., Mansimov, E., Liao, S., Grosse, R., & Ba, J. (2017). Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. https://arxiv.org/pdf/1708.05144.pdf
dc.relation.references: Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing Function Approximation Error in Actor-Critic Methods. Proceedings of the 35th International Conference on Machine Learning. https://arxiv.org/pdf/1802.09477.pdf
dc.relation.references: Liu, Y., Ramachandran, P., Liu, Q., & Peng, J. (2017). Stein Variational Policy Gradient. https://arxiv.org/pdf/1704.02399.pdf
dc.relation.references: Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., Legg, S., & Kavukcuoglu, K. (2018). IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. https://arxiv.org/pdf/1802.01561.pdf
dc.relation.references: Schulman, J., Klimov, O., Wolski, F., Dhariwal, P., & Radford, A. (2017). Proximal Policy Optimization. https://openai.com/blog/openai-baselines-ppo/
dc.relation.references: Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., & Levine, S. (2019). Soft Actor-Critic Algorithms and Applications. https://arxiv.org/pdf/1812.05905.pdf
dc.relation.references: Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
dc.relation.references: Wydmuch, M., Kempka, M., & Jaskowski, W. (2018). ViZDoom Competitions: Playing Doom from Pixels. IEEE Transactions on Games, 11(3), 248–259. https://doi.org/10.1109/tg.2018.2877047
dc.rights: Derechos reservados - Universidad Nacional de Colombia
dc.rights.accessrights: info:eu-repo/semantics/openAccess
dc.rights.license: Atribución-NoComercial 4.0 Internacional
dc.rights.spa: Acceso abierto
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.ddc: 000 - Ciencias de la computación, información y obras generales
dc.subject.proposal: Aprendizaje por Currículos
dc.subject.proposal: Curriculum Learning
dc.subject.proposal: Aprendizaje por Refuerzo
dc.subject.proposal: Reinforcement Learning
dc.subject.proposal: Training Curriculum
dc.subject.proposal: Currículo de Entrenamiento
dc.subject.proposal: Media de Recompensa Acumulada
dc.subject.proposal: Mean Cumulative Reward
dc.subject.proposal: Proximal Policy Optimization
dc.subject.proposal: Optimización por Política Próxima
dc.subject.proposal: Videojuegos
dc.subject.proposal: Video Games
dc.subject.proposal: Game AI
dc.subject.proposal: Inteligencia Artificial en Videojuegos
dc.subject.proposal: Unity Machine Learning Agents
dc.subject.proposal: Agentes de Aprendizaje Automático de Unity
dc.subject.proposal: Kit de Herramientas de Aprendizaje Automático de Unity
dc.subject.proposal: Unity ML-Agents Toolkit
dc.subject.proposal: Unity Engine
dc.subject.proposal: Motor de Videojuegos Unity
dc.title: Evaluating the impact of curriculum learning on the training process for an intelligent agent in a video game
dc.title.alternative: Evaluando el impacto del aprendizaje por currículos en el proceso de entrenamiento de un agente inteligente en un videojuego
dc.type: Trabajo de grado - Maestría
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc
dc.type.coarversion: http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type.content: Text
dc.type.driver: info:eu-repo/semantics/masterThesis
dc.type.version: info:eu-repo/semantics/acceptedVersion
oaire.accessrights: http://purl.org/coar/access_right/c_abf2

Files

Original bundle
Showing 1 - 2 of 2

Name: 1018417749.2020.pdf
Size: 11.16 MB
Format: Adobe Portable Document Format

Name: 1018417749.2020.paper.pdf
Size: 1.25 MB
Format: Adobe Portable Document Format

License bundle
Showing 1 - 1 of 1

Name: license.txt
Size: 3.8 KB
Format: Item-specific license agreed upon to submission