Abstract

This dissertation describes computational experiments comparing the performance of a range of reinforcement-learning algorithms. The experiments are designed to focus on aspects of the credit-assignment problem having to do with determining when the behavior that deserves credit occurred. The issues of knowledge representation involved in developing new features or refining existing ones are not addressed. The algorithms considered include some from learning automata theory, mathematical learning theory, early "cybernetic" approaches to learning, Samuel's checker-playing program, Michie and Chambers's "Boxes" system, and a number of new algorithms. The tasks were selected so as to involve, first in isolation and then in combination, the issues of misleading generalizations, delayed reinforcement, unbalanced reinforcement, and secondary reinforcement. The tasks range from simple, abstract "two-armed bandit" tasks to a physically realistic pole-balancing task. The results indicate several areas where the algorithms presented here perform substantially better than those previously studied. An unbalanced distribution of reinforcement, misleading generalizations, and delayed reinforcement can greatly retard learning and in some cases even make it counterproductive. Performance can be substantially improved in the presence of these common problems through the use of mechanisms of reinforcement comparison and secondary reinforcement. We present a new algorithm similar to the "learning-by-generalization" algorithm used for altering the static evaluation function in Samuel's checker-playing program. Simulation experiments indicate that the new algorithm performs better than a version of Samuel's algorithm suitably modified for reinforcement learning tasks. Theoretical analysis in terms of an "ideal reinforcement signal" sheds light on the relationship between these two algorithms and other temporal credit-assignment algorithms.
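The abstract's "reinforcement comparison" idea — judging each reward against a running average rather than in absolute terms — can be illustrated on the simplest task the dissertation mentions, a two-armed bandit. The sketch below is not taken from the dissertation; the function name, parameters, and softmax action selection are illustrative assumptions.

```python
import math
import random

def reinforcement_comparison_bandit(success_probs, steps=5000,
                                    alpha=0.1, beta=0.1, seed=0):
    """Learn which arm of a two-armed bandit pays off more often.

    Illustrative sketch of reinforcement comparison: each reward is
    judged against a running-average baseline, and the chosen action's
    preference moves up or down by that difference. Names and defaults
    here are assumptions, not the dissertation's notation.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(success_probs)   # action preferences
    baseline = 0.0                       # running average reward
    for _ in range(steps):
        # Softmax (Boltzmann) action selection over the preferences.
        exps = [math.exp(p) for p in prefs]
        r = rng.random() * sum(exps)
        action = 0
        while r > exps[action]:
            r -= exps[action]
            action += 1
        # Bernoulli payoff from the chosen arm.
        reward = 1.0 if rng.random() < success_probs[action] else 0.0
        # Reinforcement comparison: credit is relative to the baseline,
        # so even an "unbalanced" reward distribution still separates arms.
        prefs[action] += beta * (reward - baseline)
        baseline += alpha * (reward - baseline)
    return prefs

prefs = reinforcement_comparison_bandit([0.2, 0.8])
```

With these (assumed) settings, the higher-paying arm should end with the larger preference, which is the point of comparing rewards to a baseline rather than treating every payoff of 1 as equally informative.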

Keywords

Reinforcement learning, Reinforcement, Computer science, Artificial intelligence, Psychology, Social psychology

Related Publications

Temporal difference learning and TD-Gammon

Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10] the domain of complex board games such as Go, chess, che...

Communications of the ACM, 1995. 1457 citations.

Publication Info

Year: 1984
Type: Article
Citations: 778 (OpenAlex)
Access: Closed

Cite This

Richard S. Sutton (1984). Temporal credit assignment in reinforcement learning. Doctoral dissertation, University of Massachusetts Amherst. ScholarWorks.