Abstract
In associative reinforcement learning, an environment generates input vectors, a learning system generates possible output vectors, and a reinforcement function computes feedback signals from the input-output pairs. The task is to discover and remember input-output pairs that generate rewards. Especially difficult cases occur when rewards are rare, since the expected time for any algorithm can grow exponentially with the size of the problem. Nonetheless, if a reinforcement function possesses regularities, and a learning algorithm exploits them, learning time can be reduced below that of non-generalizing algorithms. This paper describes a neural network algorithm called complementary reinforcement back-propagation (CRBP), and reports simulation results on problems designed to offer differing opportunities for generalization.
Keywords
Related Publications
Playing Atari with Deep Reinforcement Learning
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolu...
Generalization in Reinforcement Learning: Safely Approximating the Value Function
A straightforward approach to the curse of dimensionality inreinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximat...
Function Optimization using Connectionist Reinforcement Learning Algorithms
Any non-associative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function value...
On the use of backpropagation in associative reinforcement learning
A description is given of several ways that backpropagation can be useful in training networks to perform associative reinforcement learning tasks. One way is to train a second ...
Input generalization in delayed reinforcement learning: an algorithm and performance comparisons
Delayed reinforcement learning is an attractive framework for the unsupervised learning of action policies for autonomous agents. Some existing delayed reinforcement learning te...
Publication Info
- Year
- 1989
- Type
- article
- Volume
- 2
- Pages
- 550-557
- Citations
- 54
- Access
- Closed