Function Optimization using Connectionist Reinforcement Learning Algorithms

Ronald J. Williams; Jing Peng

doi:10.1080/09540099108946587

Abstract

Any non-associative reinforcement learning algorithm can be viewed as a method for performing function optimization through (possibly noise-corrupted) sampling of function values. We describe the results of simulations in which the optima of several deterministic functions studied by Ackley were sought using variants of REINFORCE algorithms. Some of the algorithms used here incorporated additional heuristic features resembling certain aspects of some of the algorithms used in Ackley's studies. Differing levels of performance were achieved by the various algorithms investigated, but a number of them performed at a level comparable to the best found in Ackley's studies on a number of the tasks, in spite of their simplicity. One of these variants, called REINFORCE/MENT, represents a novel but principled approach to reinforcement learning in nontrivial networks which incorporates an entropy maximization strategy. This was found to perform especially well on more hierarchically organized tasks.
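As a rough illustration of the idea in the abstract, the sketch below treats a bit-vector search as a non-associative reinforcement learning problem: independent Bernoulli units sample a candidate, the function value is used as the reinforcement, and each unit's logit is nudged by a REINFORCE-style update of the form alpha * (r - b) * (x_i - p_i). It is not the authors' implementation. The names `reinforce_optimize` and `count_ones`, the toy objective, the running-average baseline, and all parameter values are assumptions made for this example, and the entropy term of REINFORCE/MENT is not modelled.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reinforce_optimize(f, n_bits, steps=2000, alpha=0.1, decay=0.9, seed=0):
    """Search for a bit vector with high f(x) using a REINFORCE-style update.

    Hypothetical helper for illustration only; f should return values
    on a modest scale (here, in [0, 1]).
    """
    rng = random.Random(seed)
    theta = [0.0] * n_bits   # one logit per Bernoulli unit (p starts at 0.5)
    baseline = 0.0           # running average of past reinforcement (assumed)
    best_x, best_r = None, -math.inf

    for _ in range(steps):
        p = [sigmoid(t) for t in theta]
        x = [1 if rng.random() < pi else 0 for pi in p]   # sample a candidate
        r = f(x)                                          # function value as reward
        if r > best_r:
            best_x, best_r = x, r
        # For a Bernoulli-logistic unit, d ln Pr(x_i) / d theta_i = x_i - p_i.
        for i in range(n_bits):
            theta[i] += alpha * (r - baseline) * (x[i] - p[i])
        baseline = decay * baseline + (1.0 - decay) * r

    return best_x, best_r, theta


def count_ones(x):
    # Toy objective (an assumption): fraction of bits set to 1.
    return sum(x) / len(x)


if __name__ == "__main__":
    best_x, best_r, theta = reinforce_optimize(count_ones, n_bits=8)
    print(best_x, best_r)
```

Subtracting a baseline from the reinforcement leaves the expected update direction unchanged while typically reducing its variance, which is why the sketch includes one. Any other scalar function of a bit vector could be passed in place of `count_ones`.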

Keywords

Reinforcement learning, Computer science, Connectionism, Artificial intelligence, Algorithm, Associative property, Heuristic, Maximization, Entropy (arrow of time), Function (biology), Artificial neural network, Machine learning, Mathematical optimization, Mathematics

Affiliated Institutions

Northeastern University, US

Related Publications

On the use of backpropagation in associative reinforcement learning

Williams

A description is given of several ways that backpropagation can be useful in training networks to perform associative reinforcement learning tasks. One way is to train a second ...

1988 IEEE International Conference on Neur... 74 citations

Pattern-recognizing stochastic learning automata

Andrew G. Barto, P. Anandan

A class of learning tasks is described that combines aspects of learning automaton tasks and supervised learning pattern-classification tasks. These tasks are called associativ...

1985 IEEE Transactions on Systems Man and ... 309 citations

Generalization and Scaling in Reinforcement Learning

David H. Ackley, Michael L. Littman

In associative reinforcement learning, an environment generates input vectors, a learning system generates possible output vectors, and a reinforcement function computes feedbac...

1989 54 citations

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Marc Peter Deisenroth, Carl Edward Rasmussen

In this paper, we introduce pilco, a practical, data-efficient model-based policy search method. Pilco reduces model bias, one of the key problems of model-based reinforcement l...

2011 Scientific Repository (Petra Christia... 1076 citations

Temporal credit assignment in reinforcement learning

Richard S. Sutton

This dissertation describes computational experiments comparing the performance of a range of reinforcement-learning algorithms. The experiments are designed to focus on aspects...

1984 Scholarworks (University of Massachus... 778 citations

Publication Info

Year: 1991
Type: article
Volume: 3
Issue: 3
Pages: 241-268
Citations: 296
Access: Closed

Cite This

APA Style

Ronald J. Williams, Jing Peng (1991). Function Optimization using Connectionist Reinforcement Learning Algorithms. Connection Science, 3(3), 241-268. https://doi.org/10.1080/09540099108946587

Identifiers

DOI: 10.1080/09540099108946587