Credit Assignment Method for Learning Effective Stochastic Policies in Uncertain Domains (original) (raw)

In this paper, we i n troduce FirstVisit Pro t-Sharing (FVPS) as a credit assignment procedure, an important issue in classi er systems and reinforcement learning frameworks. FVPS reinforces e ective rules to make a n agent acquire stochastic policies that cause it to behave v ery robustly within uncertain domains, without pre-de ned knowledge or subgoals. We use an internal episodic memory, not only to identify perceptual aliasing states but also to discard looping behavior and to acquire e ective s t o c hastic policies to escape perceptual deceptive states. We demonstrate the e ectiveness of our method in some typical classes of Partially Observable Markov Decision Processes, comparing with Sarsa() using a replacing eligibility trace. We claim that this approach results in an e ective stochastic or deterministic policy which is appropriate for the environment.