Abstract

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
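To make the actor-critic structure described in the abstract concrete, the following is a minimal sketch of a DDPG-style update (deterministic policy gradient with target networks and soft updates). It assumes PyTorch; the network sizes, learning rates, and hyper-parameter values here are illustrative placeholders, not the ones reported in the paper.

```python
# Minimal DDPG-style actor-critic update sketch (PyTorch).
# Hyper-parameters and network sizes are illustrative, not from the paper.
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),  # deterministic action in [-1, 1]
        )
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),  # Q(s, a)
        )
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

obs_dim, act_dim, gamma, tau = 3, 1, 0.99, 0.005
actor, critic = Actor(obs_dim, act_dim), Critic(obs_dim, act_dim)
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(batch):
    obs, act, rew, next_obs, done = batch  # tensors shaped [B, dim]
    # Critic: regress Q(s, a) toward the bootstrapped target computed
    # with the target networks.
    with torch.no_grad():
        target_q = rew + gamma * (1 - done) * critic_target(next_obs, actor_target(next_obs))
    critic_loss = ((critic(obs, act) - target_q) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: follow the deterministic policy gradient by maximising
    # Q(s, pi(s)) with respect to the actor parameters.
    actor_loss = -critic(obs, actor(obs)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft (Polyak) update of the target networks.
    for target, online in ((actor_target, actor), (critic_target, critic)):
        for p_t, p in zip(target.parameters(), online.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)

# Example call with a dummy batch of transitions.
B = 32
batch = (torch.randn(B, obs_dim), torch.rand(B, act_dim) * 2 - 1,
         torch.randn(B, 1), torch.randn(B, obs_dim), torch.zeros(B, 1))
update(batch)
```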

Keywords

Reinforcement learning, Computer science, Domain (mathematical analysis), Artificial intelligence, Action (physics), Control (management), Swing, Architecture, Deep learning, Engineering, Mathematics

Publication Info

Year: 2016
Type: article
Citations: 6768 (OpenAlex)
Access: Closed

Cite This

Timothy Lillicrap, Jonathan J. Hunt, Alexander Pritzel, et al. (2016). Continuous control with deep reinforcement learning. arXiv (Cornell University).