Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
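For reference, the "particularly appealing form" mentioned above is the deterministic policy gradient theorem from the paper: for a deterministic policy \mu_\theta with discounted state distribution \rho^\mu,

\nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^\mu}\!\left[ \nabla_\theta \mu_\theta(s)\, \nabla_a Q^\mu(s, a)\big|_{a = \mu_\theta(s)} \right]

That is, the policy parameters move in the direction of the gradient of the action-value function, rather than weighting score-function samples as in the stochastic policy gradient. The sketch below illustrates one off-policy actor-critic update in this spirit, with a simple Q-learning critic and a noise-perturbed behaviour policy. It is a minimal illustration under stated assumptions, not the paper's exact algorithm: the linear features, dimensions, learning rates, and helper names are all assumptions made here for concreteness.

import numpy as np

d, k = 4, 1                # state-feature and action dimensions (assumed)
theta = np.zeros((d, k))   # actor weights:  mu(s)   = phi(s) @ theta
w = np.zeros(d + k)        # critic weights: Q(s, a) = psi(s, a) @ w

def phi(s):                # state features (assumed: identity)
    return s

def psi(s, a):             # state-action features (assumed: concatenation)
    return np.concatenate([phi(s), a])

def mu(s):                 # deterministic target policy
    return phi(s) @ theta

def update(s, a, r, s_next, alpha_w=1e-2, alpha_theta=1e-3, gamma=0.99):
    global theta, w
    # Q-learning critic: bootstrap with the *target* policy's action,
    # so learning is off-policy w.r.t. the exploratory behaviour action a.
    td_error = r + gamma * psi(s_next, mu(s_next)) @ w - psi(s, a) @ w
    w += alpha_w * td_error * psi(s, a)
    # Deterministic policy gradient step: chain rule through the critic.
    grad_a_Q = w[d:]               # dQ/da for this linear critic
    theta += alpha_theta * np.outer(phi(s), grad_a_Q)

# Behaviour policy: explore by adding noise to the deterministic action.
s = np.random.randn(d)
a = mu(s) + 0.1 * np.random.randn(k)
r, s_next = 1.0, np.random.randn(d)
update(s, a, r, s_next)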

Keywords

Computer science, Algorithm

Publication Info

Year
2014
Type
preprint
Citations
1738
Access
Closed

Citation Metrics

1738 (OpenAlex)

Cite This

David Silver, Guy Lever, Nicolas Heess, et al. (2014). Deterministic policy gradient algorithms. HAL (Le Centre pour la Communication Scientifique Directe).