Abstract

Delayed reinforcement learning is an attractive framework for the unsupervised learning of action policies for autonomous agents. Some existing delayed reinforcement learning techniques have shown promise in simple domains, but a number of hurdles must be cleared before they are applicable to realistic problems. This paper describes one such difficulty, the input generalization problem (whereby the system must generalize to produce similar actions in similar situations), and an implemented solution, the G algorithm. The algorithm recursively splits the state space based on statistical measures of differences in the reinforcement received. Connectionist backpropagation has previously been used for input generalization in reinforcement learning; we compare the two techniques analytically and empirically. The G algorithm's sound statistical basis makes it easy to predict when it should and should not work, whereas the behavior of backpropagation is unpredictable. We found that a previously successful use of backpropagation can be explained by the linearity of the application domain, and that in another domain G reliably found the optimal policy, whereas none of a set of backpropagation runs with many combinations of parameters did.
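The abstract only sketches the G algorithm's splitting criterion. As a rough illustration, assuming that the relevance of a candidate input bit is judged by a two-sample t-test on the reinforcement received under its two values, a minimal Python sketch of such a split test might look like the following. The function names, the Welch form of the test, and the threshold are illustrative assumptions, not the paper's exact formulation:

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Two-sample t statistic (Welch's form) comparing the reinforcement
    values observed under the two values of a candidate input bit.
    Returns 0.0 when either sample is too small to estimate a variance."""
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        return 0.0
    va, vb = variance(sample_a), variance(sample_b)
    denom = math.sqrt(va / na + vb / nb)
    if denom == 0:
        return 0.0
    return abs(mean(sample_a) - mean(sample_b)) / denom

def should_split(rewards_bit_0, rewards_bit_1, threshold=2.0):
    """Split the current state-space region on this bit only if the
    reinforcement received differs significantly between its two values."""
    return welch_t(rewards_bit_0, rewards_bit_1) > threshold

# Example: reinforcement collected while the bit was 0 vs. 1.
print(should_split([0.1, 0.0, 0.2, 0.1], [0.9, 1.0, 0.8, 1.1]))  # True
print(should_split([0.1, 0.0, 0.2, 0.1], [0.1, 0.2, 0.0, 0.1]))  # False
```

In the full algorithm, a region that passes such a test would be split on the offending bit and statistics re-collected recursively within each half; the sketch above shows only the statistical decision at a single candidate split.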

Keywords

Backpropagation, Reinforcement learning, Generalization, Computer science, Artificial intelligence, Set (abstract data type), Machine learning, Stability (learning theory), Artificial neural network, Connectionism, Algorithm, Mathematics

Publication Info

Year: 1991
Type: Article
Pages: 726-731
Citations: 249
Access: Closed

Citation Metrics

249 (OpenAlex)

Cite This

David Chapman, Leslie Pack Kaelbling (1991). Input generalization in delayed reinforcement learning: an algorithm and performance comparisons. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI-91), 726-731.