A. R. Cassandra, L. P. Kaelbling, and M. Littman, Acting Optimally in Partially Observable Stochastic Domains, Proceedings of the Twelfth National Conf. on Artificial Intelligence (AAAI-94), 1994.

T. Darrell, Reinforcement Learning of Active Recognition Behaviors, Interval Research Technical Report 1997-045 Available from: http://www.ai.mit, Portions of this paper previously appeared in Advances in Neural Information Processing Systems 8, (NIPS '95), pp.858-864, 1997.

M. Dorigo, Introduction to the Special Issue on Learning Autonomous Robots, IEEE Trans. on Systems, Man and Cyberneticspart B, vol.26, issue.3, pp.361-364, 1996.

J. Heemskerk and N. Sharkey, Learning Subsumptions for an Autonomous Robot, IEE seminar on self-learning robot, Digest No: 96/026, 1996.

M. L. Littman, Memoryless policies: theoretical limitations and practical results, From Animals to Animats 3: Proceedings of the Third Int, Conf. on Simulation of Adaptive Behavior, 1996.

R. A. Mccallum, Instance-based State Identification for Reinforcement Learning, Advances In Neural Information Processing Systems, 1995.

M. J. Mataric, Learning social behavior, Robotics and Autonomous Systems, vol.20, issue.2-4, pp.191-204, 1997.
DOI : 10.1016/S0921-8890(96)00068-1

F. Michaud and M. J. Mataric, Learning from History for Behavior-Based Mobile Robots in Non- Stationary Conditions, Special joint issue Machine Learning and Autonomous Robots Journals, 1998.

L. E. Parker, C. Touzet, and F. Fernandez, Techniques for Learning in Multi-Robot Teams, Robot Teams: From Diversity to Polymorphism, 2002.

L. E. Parker and C. Touzet, Multi-Robot Learning in a Cooperative Observation Task, Distributed Autonomous Robotic Systems 4, pp.391-401, 2000.

J. M. Santos and C. Touzet, Exploration tuned reinforcement function, Neurocomputing, vol.28, issue.1-3, pp.93-105, 1999.
DOI : 10.1016/S0925-2312(98)00117-9

J. W. Sheppard and S. L. Salzberg, A Teaching Strategy for Memory-Based Control, Lazy Learning, pp.343-370, 1997.
DOI : 10.1007/978-94-017-2053-3_13

J. Schmidhuber and J. Zhao, Multi-Agent Learning with the Success-Story Algorithm, Distributed Artificial Intelligence Meets Machine Learning-Learning in Multi-Agent Environments, Lecture Notes in Artificial Intelligence, vol.1221, pp.82-93, 1997.

S. P. Singh, T. Jaakkola, and M. Jordan, Learning Without State-Estimation in Partially Observable Markovian Decision Processes, Proceedings of the Eleventh Int. Machine Learning Conf, 1994.
DOI : 10.1016/B978-1-55860-335-6.50042-8

R. S. Sutton, Reinforcement Learning Architectures for Animats, Proceedings of the First Int. Conf. on Simulation of Adaptive Behavior, From Animals to Animats, pp.288-296, 1991.

C. Touzet, Robot Awareness in Cooperative Mobile Robotics, Autonomous Robots, vol.8, issue.1, pp.87-97, 2000.
DOI : 10.1023/A:1008945119734

C. Touzet, Neural Reinforcement Learning for Behaviour Synthesis, Special issue on Learning Robot: the New Wave, Robotics and Autonomous Systems, pp.3-4, 1997.

C. J. Watkins, King's College, Learning from Delayed Rewards, 1989.

S. Whitehead, J. Karlsson, and J. Tenenberg, Learning Multiple Goal Behavior via Task Decomposition and Dynamic Policy Merging, Robot Learning, 1993.