Overcoming the Reality Gap: Imitation and Reinforcement Learning Algorithms for Bipedal Robotic Locomotion Problems

dc.contributor.advisorMojica Nava, Eduardo Alirio
dc.contributor.authorYanguas Rojas, David Reinerio
dc.contributor.cvlacYanguas Rojas, Davidspa
dc.contributor.googlescholarDavid Yanguas-Rojasspa
dc.contributor.orcid0000-0001-5874-721Xspa
dc.contributor.researchgateDavid R. Yanguas Rojasspa
dc.contributor.researchgroupPrograma de Investigación sobre Adquisición y Análisis de Señales PAAS-UNspa
dc.date.accessioned2024-01-24T20:37:35Z
dc.date.available2024-01-24T20:37:35Z
dc.date.issued2023
dc.descriptionilustraciones, diagramas, fotografíasspa
dc.description.abstractEsta tesis presenta una estrategia de entrenamiento de robots que utiliza técnicas de aprendizaje artificial para optimizar el rendimiento de los robots en tareas complejas. Motivado por los impresionantes logros recientes en el aprendizaje automático, especialmente en juegos y escenarios virtuales, el proyecto tiene como objetivo explorar el potencial de estas técnicas para mejorar las capacidades de los robots más allá de la programación humana tradicional a pesar de las limitaciones impuestas por la brecha de la realidad. El caso de estudio seleccionado para esta investigación es la locomoción bípeda, ya que permite dilucidar los principales desafíos y ventajas de utilizar métodos de aprendizaje artificial para el aprendizaje de robots. La tesis identifica cuatro desafíos principales en este contexto: la variabilidad de los resultados obtenidos de los algoritmos de aprendizaje artificial, el alto costo y riesgo asociado con la realización de experimentos en robots reales, la brecha entre la simulación y el comportamiento del mundo real, y la necesidad de adaptar los patrones de movimiento humanos a los sistemas robóticos. La propuesta consiste en tres módulos principales para abordar estos desafíos: Enfoques de Control No Lineal, Aprendizaje por Imitación y Aprendizaje por Refuerzos. El módulo de Enfoques de Control No Lineal establece una base al modelar robots y emplear técnicas de control bien establecidas. El módulo de Aprendizaje por Imitación utiliza la imitación para generar políticas iniciales basadas en datos de captura de movimiento de referencia o resultados preliminares de políticas para crear patrones de marcha similares a los humanos y factibles. El módulo de Aprendizaje por Refuerzos complementa el proceso mejorando de manera iterativa las políticas paramétricas, principalmente a través de la simulación pero con el rendimiento en el mundo real como objetivo final. Esta tesis enfatiza la modularidad del enfoque, permitiendo la implementación de los módulos individuales por separado o su combinación para determinar la estrategia más efectiva para diferentes escenarios de entrenamiento de robots. Al utilizar una combinación de técnicas de control establecidas, aprendizaje por imitación y aprendizaje por refuerzos, la estrategia de entrenamiento propuesta busca desbloquear el potencial para que los robots alcancen un rendimiento optimizado en tareas complejas, contribuyendo al avance de la inteligencia artificial en la robótica no solo en sistemas virtuales sino en sistemas reales.spa
dc.description.abstractThe thesis introduces a comprehensive robot training framework that utilizes artificial learning techniques to optimize robot performance in complex tasks. Motivated by recent impressive achievements in machine learning, particularly in games and virtual scenarios, the project aims to explore the potential of these techniques for improving robot capabilities beyond traditional human programming. The case study selected for this investigation is bipedal locomotion, as it allows for elucidating key challenges and advantages of using artificial learning methods for robot learning. The thesis identifies four primary challenges in this context: the variability of results obtained from artificial learning algorithms, the high cost and risk associated with conducting experiments on real robots, the reality gap between simulation and real-world behavior, and the need to adapt human motion patterns to robotic systems. The proposed approach consists of three main modules to address these challenges: Non-linear Control Approaches, Imitation Learning, and Reinforcement Learning. The Non-linear Control module establishes a foundation by modeling robots and employing well-established control techniques. The Imitation Learning module uses imitation to generate initial policies, based on reference motion capture data or preliminary policy results, that yield feasible, human-like gait patterns. The Reinforcement Learning module complements the process by iteratively improving parametric policies, primarily through simulation but with real-world performance as the ultimate goal. The thesis emphasizes the modularity of the approach, allowing the modules to be applied individually or in combination to determine the most effective strategy for different robot training scenarios. By combining established control techniques, imitation learning, and reinforcement learning, the framework seeks to unlock the potential for robots to achieve optimized performance in complex tasks, contributing to the advancement of artificial intelligence in robotics.eng
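The following self-contained Python sketch is illustrative only and is not code from the thesis: under assumed toy dynamics and a toy reward, it shows how an Imitation Learning warm start (here, behavior cloning by least squares on synthetic stand-ins for motion-capture state-action pairs) can be followed by Reinforcement Learning refinement of a parametric policy (here, a simple two-point random-search update in the spirit of the Mania et al., 2018 reference listed below). All names, dimensions, and hyperparameters are hypothetical.

import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 4  # hypothetical observation/action sizes

# Imitation Learning module (sketch): behavior cloning by least squares on
# synthetic state-action pairs that stand in for motion-capture reference data.
ref_states = rng.normal(size=(500, OBS_DIM))
ref_actions = 0.1 * (ref_states @ rng.normal(size=(OBS_DIM, ACT_DIM)))
W, *_ = np.linalg.lstsq(ref_states, ref_actions, rcond=None)  # initial policy weights

def policy(W, s):
    return np.tanh(s @ W)  # bounded joint commands from a linear parametric policy

def rollout_return(W, horizon=100):
    # Toy stand-in for a simulated gait episode: reward = forward term minus effort.
    s, total = rng.normal(size=OBS_DIM), 0.0
    for _ in range(horizon):
        a = policy(W, s)
        total += a[0] - 0.01 * float(np.sum(a ** 2))
        s = 0.9 * s + 0.1 * rng.normal(size=OBS_DIM)  # hypothetical dynamics
    return total

# Reinforcement Learning module (sketch): two-point random-search refinement
# of the imitation-initialized weights, evaluated entirely in the toy "simulation".
alpha, sigma = 0.02, 0.05
for _ in range(200):
    delta = rng.normal(size=W.shape)
    r_plus, r_minus = rollout_return(W + sigma * delta), rollout_return(W - sigma * delta)
    W = W + alpha * (r_plus - r_minus) * delta

print("toy return after refinement:", rollout_return(W))

In the framework described above, the rollout would instead come from a physics simulation of the bipedal robot, and the refined policy would ultimately be evaluated on the real platform.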
dc.description.degreelevelDoctoradospa
dc.description.degreenameDoctor en ingeniería mecánica y mecatrónicaspa
dc.format.extentxxi, 158 páginasspa
dc.format.mimetypeapplication/pdfspa
dc.identifier.instnameUniversidad Nacional de Colombiaspa
dc.identifier.reponameRepositorio Institucional Universidad Nacional de Colombiaspa
dc.identifier.repourlhttps://repositorio.unal.edu.co/spa
dc.identifier.urihttps://repositorio.unal.edu.co/handle/unal/85427
dc.language.isoengspa
dc.publisherUniversidad Nacional de Colombiaspa
dc.publisher.branchUniversidad Nacional de Colombia - Sede Bogotáspa
dc.publisher.facultyFacultad de Ingenieríaspa
dc.publisher.placeBogotá, Colombiaspa
dc.publisher.programBogotá - Ingeniería - Doctorado en Ingeniería - Ingeniería Mecánica y Mecatrónicaspa
dc.relation.references[Arcos-Legarda et al., 2019] Arcos-Legarda, J., Cortes-Romero, J., Beltran-Pulido, A., and Tovar, A. (2019). Hybrid disturbance rejection control of dynamic bipedal robots. Multibody System Dynamics, 46:281–306.spa
dc.relation.references[Belongie, 2023] Belongie, S. (2023). Rodrigues’ rotation formula. Created by Eric W. Weisstein.spa
dc.relation.references[Burda et al., 2018] Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., and Efros, A. A. (2018). Large-Scale Study of Curiosity-Driven Learning. ArXiv.spa
dc.relation.references[Carnegie Mellon University Graphics Lab, 2004] Carnegie Mellon University Graphics Lab (2004). CMU Motion Capture Database. The database was created with funding from NSF EIA-0196217.spa
dc.relation.references[Chebotar et al., 2018] Chebotar, Y., Handa, A., Makoviychuk, V., Macklin, M., Issac, J., Ratliff, N., and Fox, D. (2018). Closing the sim-to-real loop: Adapting simulation randomization with real world experience.spa
dc.relation.references[Chernova and Thomaz, 2014] Chernova, S. and Thomaz, A. L. (2014). Robot Learning from Human Teachers. Synthesis Lectures on Artificial Intelligence and Machine Learning, 8(3):1–121.spa
dc.relation.references[Chernova and Veloso, 2010] Chernova, S. and Veloso, M. (2010). Confidence-based multirobot learning from demonstration. International Journal of Social Robotics, 2(2):195–215.spa
dc.relation.references[Chevallereau et al., 2014] Chevallereau, C., Sinnet, R. W., and Ames, A. D. (2014). Models, feedback control, and open problems of 3D bipedal robotic walking. Automatica.spa
dc.relation.references[Collins et al., 2005] Collins, S., Ruina, A., Tedrake, R., and Wisse, M. (2005). Efficient bipedal robots based on passive-dynamic walkers. Science, 307:1082–1085.spa
dc.relation.references[Ding et al., 2019] Ding, J., Zhou, C., and Xiao, X. (2019). Energy-Efficient Bipedal Gait Pattern Generation via CoM Acceleration Optimization. IEEE-RAS International Conference on Humanoid Robots, 2018-November:238–244.spa
dc.relation.references[Duan et al., 2017] Duan, Y., Andrychowicz, M., Stadie, B. C., Ho, J., Schneider, J., Sutskever, I., Abbeel, P., and Zaremba, W. (2017). One-Shot Imitation Learning. ArXiv.spa
dc.relation.references[Duan et al., 2016] Duan, Y., Chen, X., Houthooft, R., Schulman, J., and Abbeel, P. (2016). Benchmarking Deep Reinforcement Learning for Continuous Control. ArXiv.spa
dc.relation.references[Erez et al., 2015] Erez, T., Lowrey, K., Tassa, Y., Kumar, V., Kolev, S., and Todorov, E. (2015). An integrated system for real-time model predictive control of humanoid robots. Proceedings of the IEEE-RAS International Conference on Humanoid Robots, 2015-February:292–299.spa
dc.relation.references[García et al., 1989] García, C. E., Prett, D. M., and Morari, M. (1989). Model predictive control: Theory and practice—a survey. Automatica, 25(3):335–348.spa
dc.relation.references[Google, 2015] Google (2015). AlphaGo — DeepMind. https://deepmind.com/research/alphago/. accessed 20-May-2019.spa
dc.relation.references[Heess et al., 2017] Heess, N., TB, D., Sriram, S., Lemmon, J., Merel, J., Wayne, G., Tassa, Y., Erez, T., Wang, Z., Eslami, S. M. A., Riedmiller, M., and Silver, D. (2017). Emergence of locomotion behaviours in rich environments.spa
dc.relation.references[Ijspeert, 2008] Ijspeert, A. J. (2008). Central pattern generators for locomotion control in animals and robots: A review. Neural Networks, 21(4):642–653.spa
dc.relation.references[Jumper et al., 2021] Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., Tunyasuvunakool, K., Bates, R., Žídek, A., Potapenko, A., Bridgland, A., Meyer, C., Kohl, S. A. A., Ballard, A. J., Cowie, A., Romera-Paredes, B., Nikolov, S., Jain, R., Adler, J., Back, T., Petersen, S., Reiman, D., Clancy, E., Zielinski, M., Steinegger, M., Pacholska, M., Berghammer, T., Bodenstein, S., Silver, D., Vinyals, O., Senior, A. W., Kavukcuoglu, K., Kohli, P., and Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589.spa
dc.relation.references[Kingma and Ba, 2015] Kingma, D. P. and Ba, J. L. (2015). Adam: A method for stochastic optimization.spa
dc.relation.references[Kobayashi et al., 2018] Kobayashi, T., Sekiyama, K., Hasegawa, Y., Aoyama, T., and Fukuda, T. (2018). Virtual-dynamics-based reference gait speed generator for limit-cycle-based bipedal gait. ROBOMECH Journal, 5(1).spa
dc.relation.references[Koenemann et al., 2014] Koenemann, J., Burget, F., and Bennewitz, M. (2014). Real-time imitation of human whole-body motions by humanoids. Proceedings - IEEE International Conference on Robotics and Automation, pages 2806–2812.spa
dc.relation.references[Koos et al., 2013] Koos, S., Mouret, J. B., and Doncieux, S. (2013). The transferability approach: Crossing the reality gap in evolutionary robotics. IEEE Transactions on Evolutionary Computation, 17(1):122–145.spa
dc.relation.references[Kuindersma et al., 2016] Kuindersma, S., Deits, R., Fallon, M., Valenzuela, A., Dai, H., Permenter, F., Koolen, T., Marion, P., and Tedrake, R. (2016). Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot. Autonomous Robots, 40:429–455.spa
dc.relation.references[Lee et al., 2019] Lee, K., Kim, S., Lim, S., Choi, S., and Oh, S. (2019). Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning. ArXiv.spa
dc.relation.references[Lillicrap et al., 2015] Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. ArXiv.spa
dc.relation.references[Lillicrap et al., 2019] Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2019). Continuous control with deep reinforcement learning.spa
dc.relation.references[Liu et al., 2022] Liu, S., Lever, G., Wang, Z., Merel, J., Eslami, S., Hennes, D., Czarnecki, W., Tassa, Y., Omidshafiei, S., Abdolmaleki, A., Graepel, T., and Heess, N. (2022). From motor control to team play in simulated humanoid football. Science Robotics, 7.spa
dc.relation.references[Loudon et al., 2008] Loudon, J. K., Swift, M., and Bell, S. (2008). The clinical orthopedic assessment guide. Human Kinetics.spa
dc.relation.references[Luenberger and Ye, 2021] Luenberger, D. G. and Ye, Y. (2021). Penalty and barrier methods. International Series in Operations Research and Management Science, 228.spa
dc.relation.references[Ma et al., 2021] Ma, L.-K., Yang, Z., Xin, T., Guo, B., and Yin, K. (2021). Learning and exploring motor skills with spacetime bounds. Computer Graphics Forum, 40(2):251–263.spa
dc.relation.references[Ma et al., 2018] Ma, W.-L., Or, Y., and Ames, A. D. (2018). Dynamic Walking on Slippery Surfaces: Demonstrating Stable Bipedal Gaits with Planned Ground Slippage. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA).spa
dc.relation.references[Mania et al., 2018] Mania, H., Guy, A., and Recht, B. (2018). Simple random search provides a competitive approach to reinforcement learning. Advances in Neural Information Processing Systems 31 (NeurIPS 2018).spa
dc.relation.references[Mark W. Spong, 2020] Spong, M. W., Hutchinson, S., and Vidyasagar, M. (2020). Robot Modeling and Control. John Wiley & Sons, Inc.spa
dc.relation.references[Merel et al., 2017] Merel, J., Tassa, Y., TB, D., Srinivasan, S., Lemmon, J., Wang, Z., Wayne, G., and Heess, N. (2017). Learning human behaviors from motion capture by adversarial imitation. eprint arXiv:1707.02201.spa
dc.relation.references[Mnih et al., 2016] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., Silver, D., and Kavukcuoglu, K. (2016). Asynchronous Methods for Deep Reinforcement Learning.spa
dc.relation.references[Mnih et al., 2015] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.spa
dc.relation.references[Müller et al., 2022] Müller, M., Jazdi, N., and Weyrich, M. (2022). Self-improving models for the intelligent digital twin: Towards closing the reality-to-simulation gap. IFAC-PapersOnLine, 55:126–131.spa
dc.relation.references[Nehaniv and Dautenhahn, 2002] Nehaniv, C. L. and Dautenhahn, K. (2002). The Correspondence Problem, pages 41–61. MIT Press, Cambridge, MA, USA.spa
dc.relation.references[Nguyen and La, 2019] Nguyen, H. and La, H. (2019). Review of Deep Reinforcement Learning for Robot Manipulation. Proceedings - 3rd IEEE International Conference on Robotic Computing, IRC 2019, pages 590–595.spa
dc.relation.references[Niu et al., 2022] Niu, J., Hu, Y., Li, W., Huang, G., Han, Y., and Li, X. (2022). Closing the dynamics gap via adversarial and reinforcement learning for high-speed racing. Proceedings of the International Joint Conference on Neural Networks, 2022-July.spa
dc.relation.references[OpenAI, 2018] OpenAI (2018). Openai five. https://blog.openai.com/openai-five/. accessed 20-May-2019.spa
dc.relation.references[Recht, 2018] Recht, B. (2018). A Tour of Reinforcement Learning: The View from Continuous Control. eprint arXiv:1806.09460.spa
dc.relation.references[Robbins and Monro, 1951] Robbins, H. and Monro, S. (1951). A Stochastic Approximation Method. The Annals of Mathematical Statistics, 22(3):400 – 407.spa
dc.relation.references[ROBOTIS, 2022a] ROBOTIS (2022a). Robotis engineer kit 1 e-manual (visited on 2022/04/01).spa
dc.relation.references[ROBOTIS, 2022b] ROBOTIS (2022b). Robotis Engineer Kit 2 E-Manual, (visited on 2022/04/01).spa
dc.relation.references[ROBOTIS, 2022c] ROBOTIS (2022c). Robotis mini e-manual (visited on 2022/04/01).spa
dc.relation.references[ROBOTIS, 2022d] ROBOTIS (2022d). Robotis xl-320 e-manual (visited on 2022/04/01).spa
dc.relation.references[Rodriguez et al., 2018] Rodriguez, D., Brandenburger, A., and Behnke, S. (2018). Combining Simulations and Real-robot Experiments for Bayesian Optimization of Bipedal Gait Stabilization. Lecture Notes in Computer Science book series.spa
dc.relation.references[Rosolia and Borrelli, 2016] Rosolia, U. and Borrelli, F. (2016). Learning Model Predictive Control for Iterative Tasks. eprint arXiv:1609.01387, 4.spa
dc.relation.references[Salvato et al., 2021] Salvato, E., Fenu, G., Medvet, E., and Pellegrino, F. A. (2021). Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning. IEEE Access, 9:153171–153187.spa
dc.relation.references[Schulman et al., 2015] Schulman, J., Levine, S., Moritz, P., Jordan, M. I., and Abbeel, P. (2015). Trust Region Policy Optimization. Proceedings of the 32nd International Conference on Machine Learning.spa
dc.relation.references[Siciliano and Khatib, 2016] Siciliano, B. and Khatib, O. (2016). Springer handbook of robotics. Springer eBooks.spa
dc.relation.references[Siegwart et al., 2011] Siegwart, R., Nourbakhsh, I. R., and Scaramuzza, D. (2011). Introduction to autonomous mobile robots. Intelligent robotics and autonomous agents. The MIT Press/Massachusetts Institute of Technology.spa
dc.relation.references[Simba et al., 2016] Simba, K. R., Uchiyama, N., and Sano, S. (2016). Real-time smooth trajectory generation for nonholonomic mobile robots using Bézier curves. Robotics and Computer-Integrated Manufacturing, 41.spa
dc.relation.references[Singh et al., 2019] Singh, A., Yang, L., Hartikainen, K., Finn, C., and Levine, S. (2019). End-to-End Robotic Reinforcement Learning without Reward Engineering. Robotics: Science and Systems.spa
dc.relation.references[Sutton and Barto, 2008] Sutton, R. S. and Barto, A. G. (2008). Reinforcement Learning. The MIT Press, London, England, second edition.spa
dc.relation.references[Świechowski et al., 2022] Świechowski, M., Godlewski, K., Sawicki, B., and Mańdziuk, J. (2022). Monte Carlo tree search: a review of recent modifications and applications. Artificial Intelligence Review, 56(3):2497–2562.spa
dc.relation.references[Tan et al., 2018] Tan, J., Zhang, T., Coumans, E., Iscen, A., Bai, Y., Hafner, D., Bohez, S., and Vanhoucke, V. (2018). Sim-to-Real: Learning Agile Locomotion For Quadruped Robots. eprint arXiv:1804.10332.spa
dc.relation.references[Tedrake, 2022] Tedrake, R. (2022). Underactuated Robotics. MIT.spa
dc.relation.references[Tevatia and Schaal, 2000] Tevatia, G. and Schaal, S. (2000). Inverse kinematics for humanoid robots. Proceedings-IEEE International Conference on Robotics and Automation, 1(April):294–299.spa
dc.relation.references[Thorp, 2023] Thorp, H. H. (2023). ChatGPT is fun, but not an author. Science, 379(6630):313.spa
dc.relation.references[Tobin et al., 2017] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017). Domain randomization for transferring deep neural networks from simulation to the real world. IEEE International Conference on Intelligent Robots and Systems, 2017-September:23–30.spa
dc.relation.references[Todorov, 2019] Todorov, E. (2019). Mujoco: Modeling, simulation and visualization of multi-joint dynamics with contact. http://www.mujoco.org/book/index.html. accessed 20-May-2019.spa
dc.relation.references[Uchibe, 2018] Uchibe, E. (2018). Cooperative and competitive reinforcement and imitation learning for a mixture of heterogeneous learning modules. Frontiers in Neurorobotics, 12(SEP).spa
dc.relation.references[Vinyals et al., 2019] Vinyals, O., Babuschkin, I., Chung, J., Mathieu, M., Jaderberg, M., Czarnecki, W. M., Dudzik, A., Huang, A., Georgiev, P., Powell, R., Ewalds, T., Horgan, D., Kroiss, M., Danihelka, I., Agapiou, J., Oh, J., Dalibard, V., Choi, D., Sifre, L., Sulsky, Y., Vezhnevets, S., Molloy, J., Cai, T., Budden, D., Paine, T., Gulcehre, C., Wang, Z., Pfaff, T., Pohlen, T., Wu, Y., Yogatama, D., Cohen, J., McKinney, K., Smith, O., Schaul, T., Lillicrap, T., Apps, C., Kavukcuoglu, K., Hassabis, D., and Silver, D. (2019). AlphaStar: Mastering the Real-Time Strategy Game StarCraft II. https://deepmind.com/blog/alphastar-mastering-real-time-strategy-game-starcraft-ii/. accessed 20-May-2019.spa
dc.relation.references[Vukobratovic and Borovac, 2004] Vukobratovic, M. and Borovac, B. (2004). Zero-moment point — thirty five years of its life. International Journal of Humanoid Robotics, 01:157–173.spa
dc.relation.references[Vukobratovic and Juricic, 1969] Vukobratovic, M. and Juricic, D. (1969). Contribution to the synthesis of biped gait. IEEE Transactions on Biomedical Engineering, BME-16(1):1–6.spa
dc.relation.references[Wampler, 1986] Wampler, C. W. (1986). Manipulator inverse kinematic solutions based on vector formulations and damped least-squares methods. IEEE Transactions on Systems, Man and Cybernetics, 16.spa
dc.relation.references[Watkins and Dayan, 1992] Watkins, C. J. C. H. and Dayan, P. (1992). Q-learning. Machine Learning, 8(3):279–292.spa
dc.relation.references[Westervelt et al., 2003] Westervelt, E. R., Grizzle, J. W., and Koditschek, D. E. (2003). Hybrid zero dynamics of planar biped walkers. IEEE Transactions on Automatic Control, 48:42–56.spa
dc.relation.references[Xie et al., 2019] Xie, Z., Clary, P., Dao, J., Morais, P., Hurst, J., and van de Panne, M. (2019). Iterative reinforcement learning based design of dynamic locomotion skills for cassie. 3rd Conference on Robot Learning (CoRL 2019), Osaka, Japan.spa
dc.relation.references[Xie et al., 2020] Xie, Z., Ling, H. Y., Kim, N. H., and van de Panne, M. (2020). ALLSTEPS: Curriculum-driven learning of stepping stone skills. ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA 2020, pages 213–224.spa
dc.relation.references[Zou et al., 2019] Zou, F., Shen, L., Jie, Z., Zhang, W., and Liu, W. (2019). A sufficient condition for convergences of Adam and RMSProp. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019-June.spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.licenseReconocimiento 4.0 Internacionalspa
dc.rights.urihttp://creativecommons.org/licenses/by/4.0/spa
dc.subject.armarcAutomation
dc.subject.ddc620 - Ingeniería y operaciones afines::629 - Otras ramas de la ingenieríaspa
dc.subject.lembRobótica
dc.subject.lembRobotics
dc.subject.lembAutomatización
dc.subject.lembAlgoritmos
dc.subject.lembAlgorithms
dc.subject.proposalReinforcement learningeng
dc.subject.proposalHumanoid Roboticseng
dc.subject.proposalImitation Learningeng
dc.subject.proposalNon-Linear Controleng
dc.subject.proposalRobot Trainingeng
dc.subject.proposalBipedal Locomotioneng
dc.subject.proposalHumanoid Locomotioneng
dc.subject.proposalArtificial Learning Techniqueseng
dc.subject.proposalReality Gapeng
dc.subject.proposalSim to Realeng
dc.titleOvercoming the Reality Gap: Imitation and Reinforcement Learning Algorithms for Bipedal Robotic Locomotion Problemseng
dc.title.translatedSuperando la brecha de la realidad: Algoritmos de aprendizaje por imitación y por refuerzos para problemas de locomoción robótica bípedaspa
dc.typeTrabajo de grado - Doctoradospa
dc.type.coarhttp://purl.org/coar/resource_type/c_db06spa
dc.type.coarversionhttp://purl.org/coar/version/c_ab4af688f83e57aaspa
dc.type.contentAudiovisualspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/doctoralThesisspa
dc.type.redcolhttp://purl.org/redcol/resource_type/TDspa
dc.type.versioninfo:eu-repo/semantics/acceptedVersionspa
dcterms.audience.professionaldevelopmentEstudiantesspa
dcterms.audience.professionaldevelopmentInvestigadoresspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa

Files

Original bundle

Name: 1016058516.2024.pdf
Size: 13.11 MB
Format: Adobe Portable Document Format
Description: Doctoral thesis in Mechanical and Mechatronic Engineering

License bundle

Name: license.txt
Size: 5.74 KB
Format: Item-specific license agreed upon to submission