Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
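For reference, the "particularly appealing form" mentioned above is the deterministic policy gradient theorem from the paper: for a deterministic policy \mu_\theta with discounted state distribution \rho^\mu,

\nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^\mu}\!\left[ \nabla_\theta \mu_\theta(s)\, \nabla_a Q^\mu(s, a)\big|_{a = \mu_\theta(s)} \right]

That is, the policy parameters move in the direction of the gradient of the action-value function, rather than weighting score-function samples as in the stochastic policy gradient. The sketch below illustrates one off-policy actor-critic update in this spirit, with a simple Q-learning critic and a noise-perturbed behaviour policy. It is a minimal illustration under stated assumptions, not the paper's exact algorithm: the linear features, dimensions, learning rates, and helper names are all assumptions made here for concreteness.

import numpy as np

d, k = 4, 1                # state-feature and action dimensions (assumed)
theta = np.zeros((d, k))   # actor weights:  mu(s)   = phi(s) @ theta
w = np.zeros(d + k)        # critic weights: Q(s, a) = psi(s, a) @ w

def phi(s):                # state features (assumed: identity)
    return s

def psi(s, a):             # state-action features (assumed: concatenation)
    return np.concatenate([phi(s), a])

def mu(s):                 # deterministic target policy
    return phi(s) @ theta

def update(s, a, r, s_next, alpha_w=1e-2, alpha_theta=1e-3, gamma=0.99):
    global theta, w
    # Q-learning critic: bootstrap with the *target* policy's action,
    # so learning is off-policy w.r.t. the exploratory behaviour action a.
    td_error = r + gamma * psi(s_next, mu(s_next)) @ w - psi(s, a) @ w
    w += alpha_w * td_error * psi(s, a)
    # Deterministic policy gradient step: chain rule through the critic.
    grad_a_Q = w[d:]               # dQ/da for this linear critic
    theta += alpha_theta * np.outer(phi(s), grad_a_Q)

# Behaviour policy: explore by adding noise to the deterministic action.
s = np.random.randn(d)
a = mu(s) + 0.1 * np.random.randn(k)
r, s_next = 1.0, np.random.randn(d)
update(s, a, r, s_next)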

Keywords

Computer science, Algorithm

Publication Info

Year
2014
Type
preprint
Citations
1738
Access
Closed

Citation Metrics

1738 (OpenAlex)

Cite This

David Silver, Guy Lever, Nicolas Heess, et al. (2014). Deterministic policy gradient algorithms. HAL (Le Centre pour la Communication Scientifique Directe).