Abstract
Delayed reinforcement learning is an attractive framework for the unsupervised learning of action policies for autonomous agents. Some existing delayed reinforcement learning techniques have shown promise in simple domains, but a number of hurdles must be cleared before they are applicable to realistic problems. This paper describes one such difficulty, the input generalization problem (whereby the system must generalize to produce similar actions in similar situations), and an implemented solution, the G algorithm. The algorithm recursively splits the state space based on statistical measures of differences in the reinforcements received. Connectionist backpropagation has previously been used for input generalization in reinforcement learning, and we compare the two techniques analytically and empirically. The G algorithm's sound statistical basis makes it easy to predict when it should and should not work, whereas the behavior of backpropagation is unpredictable. We found that a previously reported success of backpropagation can be explained by the linearity of the application domain, and that in another domain G reliably found the optimal policy, whereas none of a set of backpropagation runs with many combinations of parameters did.
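The splitting criterion described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes binary input features and uses a Welch-style t statistic with a hypothetical fixed threshold to decide whether the reinforcements observed with a feature bit off versus on differ enough to warrant splitting the node.

```python
from statistics import mean, variance

def t_statistic(a, b):
    """Welch-style t statistic comparing two reward samples."""
    va, vb = variance(a), variance(b)
    return abs(mean(a) - mean(b)) / ((va / len(a) + vb / len(b)) ** 0.5)

def should_split(rewards_bit0, rewards_bit1, threshold=2.0):
    """Split a state-space node on an input bit when the reinforcement
    statistics for bit=0 vs bit=1 differ significantly (sketch only;
    the threshold is an illustrative stand-in for a proper t-test)."""
    if len(rewards_bit0) < 2 or len(rewards_bit1) < 2:
        return False  # not enough evidence collected yet
    return t_statistic(rewards_bit0, rewards_bit1) > threshold

# Rewards that clearly depend on the bit should trigger a split:
print(should_split([0.1, 0.0, 0.2, 0.1], [1.0, 0.9, 1.1, 1.0]))  # True
```

Applied recursively, a node that splits on one bit spawns two children, each of which repeats the test over the remaining bits, yielding a tree-structured partition of the state space.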
Publication Info
- Year: 1991
- Type: article
- Pages: 726-731
- Citations: 249
- Access: Closed