Policy search for motor primitives in robotics
References
Andrieu, C., de Freitas, N., Doucet, A., & Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine Learning, 50(1), 5–43.
Atkeson, C. G. (1994). Using local trajectory optimizers to speed up global optimization in dynamic programming. In Advances in neural information processing systems (Vol. 6, pp. 503–521), Denver, CO, USA.
Attias, H. (2003). Planning by probabilistic inference. In Proceedings of the ninth international workshop on artificial intelligence and statistics (AISTATS), Key West, FL, USA.
Bagnell, J., & Schneider, J. (2003). Covariant policy search. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 1019–1024), Acapulco, Mexico.
Bagnell, J., Kakade, S., Ng, A., & Schneider, J. (2004). Policy search by dynamic programming. In Advances in neural information processing systems (Vol. 16), Vancouver, BC, Canada.
Binder, J., Koller, D., Russell, S., & Kanazawa, K. (1997). Adaptive probabilistic networks with hidden variables. Machine Learning, 29(2–3), 213–244.
Chiappa, S., Kober, J., & Peters, J. (2009). Using Bayesian dynamical systems for motion template libraries. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems (Vol. 21, pp. 297–304).
Dayan, P., & Hinton, G. E. (1997). Using expectation-maximization for reinforcement learning. Neural Computation, 9(2), 271–278.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39, 1–38.
El-Fakdi, A., Carreras, M., & Ridao, P. (2006). Towards direct policy search reinforcement learning for robot control. In Proceedings of the IEEE/RSJ 2006 international conference on intelligent robots and systems (IROS), Beijing, China.
Fantoni, I., & Lozano, R. (2001). Non-linear control for underactuated mechanical systems. New York: Springer.
Guenter, F., Hersch, M., Calinon, S., & Billard, A. (2007). Reinforcement learning for imitating constrained reaching movements. Advanced Robotics, Special Issue on Imitative Robots, 21(13), 1521–1544.
Gullapalli, V., Franklin, J., & Benbrahim, H. (1994). Acquiring robot skills via reinforcement learning. IEEE Control Systems Journal, Special Issue on Robotics: Capturing Natural Motion, 4(1), 13–24.
Hoffman, M., Doucet, A., de Freitas, N., & Jasra, A. (2007). Bayesian policy learning with trans-dimensional MCMC. In Advances in neural information processing systems (Vol. 20), Vancouver, BC, Canada.
Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2002). Movement imitation with nonlinear dynamical systems in humanoid robots. In Proceedings of IEEE international conference on robotics and automation (ICRA) (pp. 1398–1403), Washington, DC.
Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2003). Learning attractor landscapes for learning motor primitives. In Advances in neural information processing systems (Vol. 15, pp. 1547–1554), Vancouver, BC, Canada.
Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). Convergence of stochastic iterative dynamic programming algorithms. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems (Vol. 6, pp. 703–710). San Mateo: Morgan Kaufmann.
Kirk, D. E. (1970). Optimal control theory. Englewood Cliffs: Prentice-Hall.
Kober, J., & Peters, J. (2009a). Learning motor primitives for robotics. In Proceedings of IEEE international conference on robotics and automation (ICRA) (pp. 2112–2118).
Kober, J., & Peters, J. (2009b). Policy search for motor primitives in robotics. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems (Vol. 21, pp. 849–856).
Kober, J., Mohler, B., & Peters, J. (2008). Learning perceptual coupling for motor primitives. In Proceedings of the IEEE/RSJ 2008 international conference on intelligent robots and systems (IROS) (pp. 834–839), Nice, France.
Kormushev, P., Calinon, S., & Caldwell, D. G. (2010). Robot motor skill coordination with EM-based reinforcement learning. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS).
Kwee, I., Hutter, M., & Schmidhuber, J. (2001). Gradient-based reinforcement planning in policy-search methods. In M. A. Wiering (Ed.), Cognitieve Kunstmatige Intelligentie: Vol. 27. Proceedings of the 5th European workshop on reinforcement learning (EWRL) (pp. 27–29), Lugano. Manno: Onderwijsinstituut CKI, Utrecht University.
Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proceedings of the international conference on uncertainty in artificial intelligence (UAI) (pp. 354–361), Acapulco, Mexico.
Martín, H. J. A., de Lope, J., & Maravall, D. (2009). The kNN-TD reinforcement learning algorithm. In Proceedings of the 3rd international work-conference on the interplay between natural and artificial computation (IWINAC) (pp. 305–314). Berlin: Springer.
McLachlan, G. J., & Krishnan, T. (1997). The EM algorithm and extensions. Wiley series in probability and statistics. New York: Wiley.
Miyamoto, H., Schaal, S., Gandolfo, F., Gomi, H., Koike, Y., Osu, R., Nakano, E., Wada, Y., & Kawato, M. (1996). A Kendama learning robot based on bi-directional theory. Neural Networks, 9(8), 1281–1302.
Ng, A. Y., & Jordan, M. (2000). PEGASUS: A policy search method for large MDPs and POMDPs. In Proceedings of the international conference on uncertainty in artificial intelligence (UAI) (pp. 406–415), Palo Alto, CA.
Ng, A. Y., Kim, H. J., Jordan, M. I., & Sastry, S. (2004). Inverted autonomous helicopter flight via reinforcement learning. In Proceedings of the international symposium on experimental robotics (ISER). Cambridge: MIT Press.
Park, D. H., Hoffmann, H., Pastor, P., & Schaal, S. (2008). Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In IEEE international conference on humanoid robots (HUMANOIDS) (pp. 91–98).
Peshkin, L. (2001). Reinforcement learning by policy search. PhD thesis, Brown University, Providence, RI.
Peters, J. (2007). Machine learning of motor skills for robotics. PhD thesis, University of Southern California, Los Angeles, CA, 90089, USA.
Peters, J., & Schaal, S. (2006). Policy gradient methods for robotics. In Proceedings of the IEEE/RSJ 2006 international conference on intelligent robots and systems (IROS) (pp. 2219–2225), Beijing, China.
Peters, J., & Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control. In Proceedings of the international conference on machine learning (ICML), Corvallis, OR, USA.
Peters, J., Vijayakumar, S., & Schaal, S. (2003). Reinforcement learning for humanoid robotics. In Proceedings of the IEEE-RAS international conference on humanoid robots (HUMANOIDS) (pp. 103–123), Karlsruhe, Germany.
Peters, J., Vijayakumar, S., & Schaal, S. (2005). Natural actor-critic. In Proceedings of the European conference on machine learning (ECML) (pp. 280–291), Porto, Portugal.
Rückstieß, T., Felder, M., & Schmidhuber, J. (2008). State-dependent exploration for policy gradient methods. In Proceedings of the European conference on machine learning (ECML) (pp. 234–249), Antwerp, Belgium.
Sato, S., Sakaguchi, T., Masutani, Y., & Miyazaki, F. (1993). Mastering of a task with interaction between a robot and its environment: “kendama” task. Transactions of the Japan Society of Mechanical Engineers C, 59(558), 487–493.
Schaal, S., Atkeson, C. G., & Vijayakumar, S. (2002). Scalable techniques from nonparametric statistics for real-time robot learning. Applied Intelligence, 17(1), 49–60.
Schaal, S., Peters, J., Nakanishi, J., & Ijspeert, A. J. (2003). Control, planning, learning, and imitation with dynamic movement primitives. In Proceedings of the workshop on bilateral paradigms on humans and humanoids, IEEE international conference on intelligent robots and systems (IROS), Las Vegas, NV, October 27–31, 2003.
Schaal, S., Mohajerian, P., & Ijspeert, A. J. (2007). Dynamics systems vs. optimal control—a unifying view. Progress in Brain Research, 165(1), 425–445.
Sehnke, F., Osendorfer, C., Rückstieß, T., Graves, A., Peters, J., & Schmidhuber, J. (2010). Parameter-exploring policy gradients. Neural Networks, 23(4), 551–559.
Shone, T., Krudysz, G., & Brown, K. (2000). Dynamic manipulation of Kendama (Tech. rep.). Rensselaer Polytechnic Institute.
Strens, M., & Moore, A. (2001). Direct policy search using paired statistical tests. In Proceedings of the 18th international conference on machine learning (ICML).
Sumners, C. (1997). Toys in space: exploring science with the astronauts. New York: McGraw-Hill.
Sutton, R., & Barto, A. (1998). Reinforcement learning: an introduction. Cambridge: MIT Press.
Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the international machine learning conference (pp. 9–44).
Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems (NIPS) (Vol. 13, pp. 1057–1063), Denver, CO, USA.
Takenaka, K. (1984). Dynamical control of manipulator with vision: “cup and ball” game demonstrated by robot. Transactions of the Japan Society of Mechanical Engineers C, 50(458), 2046–2053.
Taylor, M. E., Whiteson, S., & Stone, P. (2007). Transfer via inter-task mappings in policy search reinforcement learning. In Proceedings of the sixth international joint conference on autonomous agents and multiagent systems (AAMAS).
Tedrake, R., Zhang, T. W., & Seung, H. S. (2004). Stochastic policy gradient reinforcement learning on a simple 3D biped. In Proceedings of the IEEE 2004 international conference on intelligent robots and systems (IROS) (pp. 2849–2854).
Theodorou, E. A., Buchli, J., & Schaal, S. (2010). Reinforcement learning of motor skills in high dimensions: a path integral approach. In Proceedings of IEEE international conference on robotics and automation (ICRA) (pp. 2397–2403).
Toussaint, M., & Goerick, C. (2007). Probabilistic inference for structured planning in robotics. In Proceedings of the IEEE/RSJ 2007 international conference on intelligent robots and systems (IROS), San Diego, CA, USA.
Van Der Maaten, L., Postma, E., & Van Den Herik, H. (2007). Dimensionality reduction: a comparative review. Preprint.
Vlassis, N., Toussaint, M., Kontes, G., & Piperidis, S. (2009). Learning model-free robot control by a Monte Carlo EM algorithm. Autonomous Robots, 27(2), 123–130.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
Wulf, G. (2007). Attention and motor skill learning. Champaign: Human Kinetics.