Abstract
Any non-associative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley were sought using variants of REINFORCE algorithms. Some of the algorithms used here incorporated additional heuristic features resembling certain aspects of some of the algorithms used in Ackley's studies. Differing levels of performance were achieved by the various algorithms investigated, but a number of them performed at a level comparable to the best found in Ackley's studies on a number of the tasks, in spite of their simplicity. One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy. This was found to perform especially well on more hierarchically organized tasks.
Keywords
Affiliated Institutions
Related Publications
On the use of backpropagation in associative reinforcement learning
A description is given of several ways that backpropagation can be useful in training networks to perform associative reinforcement learning tasks. One way is to train a second ...
Pattern-recognizing stochastic learning automata
A class of learning tasks is described that combines aspects of learning automation tasks and supervised learning pattern-classification tasks. These tasks are called associativ...
Generalization and Scaling in Reinforcement Learning
In associative reinforcement learning, an environment generates input vectors, a learning system generates possible output vectors, and a reinforcement function computes feedbac...
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
In this paper, we introduce pilco, a practical, data-efficient model-based policy search method. Pilco reduces model bias, one of the key problems of model-based reinforcement l...
Temporal credit assignment in reinforcement learning
This dissertation describes computational experiments comparing the performance of a range of reinforcement-learning algorithms. The experiments are designed to focus on aspects...
Publication Info
- Year
- 1991
- Type
- article
- Volume
- 3
- Issue
- 3
- Pages
- 241-268
- Citations
- 296
- Access
- Closed
External Links
Social Impact
Social media, news, blog, policy document mentions
Citation Metrics
Cite This
Identifiers
- DOI
- 10.1080/09540099108946587