Q-learning | RDL Research Database

Keywords

Sketch, Action (physics), Markov process, Markov decision process, Convergence (economics), Computer science, Q-learning, Simple (philosophy), Mathematical optimization, Dynamic programming, Artificial intelligence, Mathematics, Applied mathematics, Reinforcement learning, Algorithm, Statistics

Affiliated Institutions

University of Edinburgh GB

Related Publications

A Probabilistic Production and Inventory Problem

F. d'Epenoux

R. Howard (R. Howard. 1960. Dynamic Programming and Markov Processes. John Wiley and Sons, Inc., New York.) and A. Manne (A. Manne. 1960. Linear programming and sequential decisi...

1963 Management Science 200 citations

Decentralized learning in finite Markov chains

Richard M. Wheeler, Kumpati S. Narendra

The principal contribution of this paper is a new result on the decentralized control of finite Markov chains with unknown transition probabilities and rewards. One decentralize...

1986 IEEE Transactions on Automatic Control 100 citations

Markov Decision Processes: Discrete Stochastic Dynamic Programming.

Kasra Hazeghi, Martin L. Puterman

From the Publisher: The past decade has seen considerable theoretical and applied research on Markov decision processes, as well as the growing use of these models in ecology, ...

1995 Journal of the American Statistical A... 8422 citations

Actor-Critic Reinforcement Learning with Energy-Based Policies

Nicolas Heess, David Silver, Yee Whye Teh

We consider reinforcement learning in Markov decision processes with high dimensional state and action spaces. We parametrize policies using energy-based models (particularly re...

2012 48 citations

Generalization in Reinforcement Learning: Safely Approximating the Value Function

Justin A. Boyan, Andrew Moore

A straightforward approach to the curse of dimensionality in reinforcement learning and dynamic programming is to replace the lookup table with a generalizing function approximat...

1994 506 citations

Publication Info

Year: 1992
Type: article
Volume: 8
Issue: 3-4
Pages: 279-292
Citations: 8791
Access: Closed

Cite This

APA Style

                            
Christopher J. Watkins, Peter Dayan (1992). Q-learning. Machine Learning, 8(3-4), 279-292. https://doi.org/10.1007/bf00992698

Identifiers

DOI: 10.1007/bf00992698