Barto, A. G., Sutton, R. S. & Anderson, C. W. (1983). Neuronlike Elements that can Solve Difficult Learning Control Problems. IEEE Transactions on Systems, Man and Cybernetics 13: 834–846.
Bertsekas, D. P. (1995a). A Counterexample to Temporal Differences Learning. Neural Computation 7: 270–279.
Bertsekas, D. P. (1995b). Dynamic Programming and Optimal Control, Vol. 1. Belmont, Massachusetts: Athena Scientific.
Brafman, R. I. & Tennenholtz, M. (2001). R-MAX: A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence 2: 953–958.
Brooks, R. A. (1991). Elephants Don't Play Chess. In Maes, P. (ed.) Designing Autonomous Agents. MIT Press, 3–15.
Chapman, D. & Kaelbling, L. P. (1991). Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI'91), 726–731.
Chrisman, L. (1992). Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach. Proceedings of the 10th National Conference on Artificial Intelligence, 183–188.
Crites, R. H. (1996). Large-scale Dynamic Optimization Using Teams of Reinforcement Learning Agents. PhD thesis, University of Massachusetts Amherst.
del R. Millán, J. (1996). Rapid, Safe and Incremental Learning of Navigation Strategies. IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics 26: 408–420.
Elman, J. L. (1990). Finding Structure in Time. Cognitive Science 14: 179–211.
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice-Hall.
Humphrys, M. (1996). Action Selection Methods Using Reinforcement Learning. PhD thesis, University of Cambridge.
Jaakkola, T., Jordan, M. I. & Singh, S. P. (1994). On the Convergence of Stochastic Iterative Dynamic Programming Algorithms. Neural Computation 6: 1185–1201.
Kalmár, Z., Szepesvári, C. & Lorincz, A. (1998). Module-based Reinforcement Learning: Experiments with a Real Robot. Machine Learning.
Lin, L.-J. & Mitchell, T. M. (1992). Memory Approaches to Reinforcement Learning in Non-Markovian Domains. Technical Report CMU-CS-92-138, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213.
Littman, M. L. & Szepesvári, C. (1996). A Generalized Reinforcement Learning Model: Convergence and Applications. Proceedings of the Thirteenth International Conference on Machine Learning (ICML'96), 310–318.
Mahadevan, S. & Connell, J. (1992). Automatic Programming of Behavior-Based Robots Using Reinforcement Learning. Artificial Intelligence 55: 311–365.
Mataric, M. (1998). Reinforcement Learning. Artificial Intelligence 3: 357–369.
McCallum, A. K. (1996a). Reinforcement Learning with Selective Perception and Hidden State. PhD thesis, University of Rochester.
McCallum, R. A. (1992). First Results with Utile Distinction Memory for Reinforcement Learning. Technical Report 446, Computer Science Department, The University of Rochester, Rochester, NY 14627.
McCallum, R. A. (1996b). Hidden State and Reinforcement Learning with Instance-based State Identification. IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics 26: 464–473.
Michie, D. & Chambers, R. A. (1968). BOXES: An Experiment in Adaptive Control. In Dale, E. & Michie, D. (eds.), Machine Intelligence 2. Edinburgh: Oliver and Boyd, 137–152.
Papadimitriou, C. & Tsitsiklis, J. (1987). The Complexity of Markov Decision Processes. Mathematics of Operations Research 12: 441–450.
Parr, R. E. (1998). Hierarchical Control and Learning for Markov Decision Processes. PhD thesis, University of California at Berkeley.
Peng, J. & Williams, R. J. (1996). Incremental Multi-step Q-learning. Machine Learning 22: 282–290.
Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley.
Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE 77.
Ribeiro, C. H. C. (1998). Embedding A Priori Knowledge in Reinforcement Learning. Journal of Intelligent and Robotic Systems 21: 51–71.
Ribeiro, C. H. C. & Hemerly, E. M. (1999). Autonomous Learning Based on Cost Assumptions: Theoretical Studies and Experiments in Robot Control. International Journal of Neural Systems 9: 243–250.
Robbins, H. & Monro, S. (1951). A Stochastic Approximation Method. Annals of Mathematical Statistics 22: 400–407.
Russell, S. J. & Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice-Hall.
Rylatt, M., Czarnecki, C. & Routen, T. (1998). Connectionist Learning in Behaviour-Based Mobile Robots: A Survey. Artificial Intelligence Review 12: 445–468.
Sen, S. & Sekaran, M. (1998). Individual Learning of Coordination Knowledge. Journal of Experimental and Theoretical Artificial Intelligence 3: 333–356.
Singh, S. P. & Bertsekas, D. (1997). Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems. In Mozer, M. C., Jordan, M. I. & Petsche, T. (eds.) Advances in Neural Information Processing Systems 9.
Singh, S. P. & Dayan, P. (1996). Analytical Mean Squared Error Curves for Temporal Difference Learning. Machine Learning, in press.
Singh, S. P., Jaakkola, T. & Jordan, M. I. (1995). Reinforcement Learning with Soft State Aggregation. In Tesauro, G., Touretzky, D. S. & Leen, T. K. (eds.) Advances in Neural Information Processing Systems 7: 361–368.
Striebel, C. T. (1965). Sufficient Statistics in the Optimal Control of Stochastic Systems. Journal of Mathematical Analysis and Applications 12: 576–592.
Sutton, R. S. (1988). Learning to Predict by the Method of Temporal Differences. Machine Learning 3: 9–44.
Sutton, R. S. (1990). Integrated Architectures for Learning, Planning and Reacting Based on Approximating Dynamic Programming. Proceedings of the 7th International Conference on Machine Learning, 216–224.
Sutton, R. S. (1996). Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. In Touretzky, D. S., Mozer, M. C. & Hasselmo, M. E. (eds.) Advances in Neural Information Processing Systems 8: 1038–1044.
Sutton, R. S. & Barto, A. G. (1990). Time-Derivative Models of Pavlovian Reinforcement. In Learning and Computational Neuroscience: Foundations of Adaptive Networks. MIT Press.
Sutton, R. S., Precup, D. & Singh, S. (1998). Between MDPs and Semi-MDPs: Learning, Planning and Representing Knowledge at Multiple Temporal Scales. Technical Report 98-74, Department of Computer Science, University of Massachusetts, Amherst.
Szepesvári, C. (1997). Static and Dynamic Aspects of Optimal Sequential Decision Making. PhD thesis, József Attila University, Szeged, Hungary.
Szepesvári, C. (1996). Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms. Technical Report CS-96-11, Department of Computer Science, Brown University, Providence, Rhode Island 02912.
Tadepalli, P. & Ok, D. (1998). Model-based Average Reward Reinforcement Learning. Artificial Intelligence 100: 177–224.
Tesauro, G. (1992). Practical Issues in Temporal Difference Learning. Machine Learning 8: 257–277.
Tesauro, G. (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM 38: 58–67.
Tsitsiklis, J. N. & Van Roy, B. (1996). Feature-based Methods for Large Scale Dynamic Programming. Machine Learning 22: 59–94.
Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, University of Cambridge.
Whitehead, S. D. & Ballard, D. H. (1990). Active Perception and Reinforcement Learning. Neural Computation 2: 409–419.
Whitehead, S. D. & Lin, L.-J. (1995). Reinforcement Learning of non-Markov Decision Processes. Artificial Intelligence 73: 271–306.
Wolpert, D., Sil, J. & Tumer, K. (2001). Reinforcement Learning in Distributed Domains: Beyond Team Games. Proceedings of the 17th International Joint Conference on Artificial Intelligence 2: 819–824.
Wyatt, J. (1997). Exploration and Inference in Learning from Reinforcement. PhD thesis, University of Edinburgh.
Wyatt, J., Hoar, J. & Hayes, G. (1998). Design Analysis and Comparison of Robot Learners. Robotics and Autonomous Systems 24: 17–32.