Abstract
This dissertation describes computational experiments comparing the performance of a range of reinforcement-learning algorithms. The experiments are designed to focus on aspects of the credit-assignment problem having to do with determining when the behavior that deserves credit occurred. The issues of knowledge representation involved in developing new features or refining existing ones are not addressed. The algorithms considered include some from learning automata theory, mathematical learning theory, early "cybernetic" approaches to learning, Samuel's checker-playing program, Michie and Chambers's "Boxes" system, and a number of new algorithms. The tasks were selected so as to involve, first in isolation and then in combination, the issues of misleading generalizations, delayed reinforcement, unbalanced reinforcement, and secondary reinforcement. The tasks range from simple, abstract "two-armed bandit" tasks to a physically realistic pole-balancing task. The results indicate several areas where the algorithms presented here perform substantially better than those previously studied. An unbalanced distribution of reinforcement, misleading generalizations, and delayed reinforcement can greatly retard learning and in some cases even make it counterproductive. Performance can be substantially improved in the presence of these common problems through the use of mechanisms of reinforcement comparison and secondary reinforcement. We present a new algorithm similar to the "learning-by-generalization" algorithm used for altering the static evaluation function in Samuel's checker-playing program. Simulation experiments indicate that the new algorithm performs better than a version of Samuel's algorithm suitably modified for reinforcement learning tasks. Theoretical analysis in terms of an "ideal reinforcement signal" sheds light on the relationship between these two algorithms and other temporal credit-assignment algorithms.
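To make the reinforcement-comparison mechanism mentioned above concrete: the agent maintains a running prediction of reinforcement and credits an action by how far the actual reward departs from that prediction, rather than by the raw reward itself. The following is a minimal sketch in Python applying this idea to a two-armed Bernoulli bandit; the step sizes, payoff probabilities, and function names are illustrative assumptions, not values or code from the dissertation.

```python
import math
import random

# Minimal reinforcement-comparison sketch on a two-armed Bernoulli bandit.
# alpha, beta, and the payoff probabilities are illustrative assumptions.

def softmax(prefs):
    """Convert action preferences into selection probabilities."""
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def run(steps=10000, alpha=0.1, beta=0.1, payoff=(0.2, 0.8), seed=0):
    rng = random.Random(seed)
    prefs = [0.0, 0.0]   # action preferences
    baseline = 0.0       # running prediction of reinforcement
    total_reward = 0.0
    for _ in range(steps):
        probs = softmax(prefs)
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if rng.random() < payoff[a] else 0.0
        # Reinforcement comparison: credit the chosen action by how much
        # the reward exceeds the baseline, then update the baseline.
        prefs[a] += alpha * (r - baseline)
        baseline += beta * (r - baseline)
        total_reward += r
    return total_reward / steps

if __name__ == "__main__":
    print(f"mean reward: {run():.3f}")  # approaches 0.8 as the better arm dominates
```

Because preferences are driven by r - baseline rather than r alone, a uniformly high or uniformly low reinforcement level shifts the baseline instead of inflating every preference, which is the intuition behind the improvement on unbalanced-reinforcement tasks that the abstract reports.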