TensorFlow: a system for large-scale machine learning

2016 Operating Systems Design and Implementation 6,305 citations

Abstract

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous parameter server designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
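As a rough illustration of the dataflow model the abstract describes, the following Python sketch builds a toy graph in which shared state (a single weight) is read and mutated by ordinary graph nodes, and each node carries a device label. It is a minimal sketch under stated assumptions, not TensorFlow's API: the `Node`, `Graph`, and `build_sgd_graph` names, the device strings, and the one-parameter model are all hypothetical and chosen only for illustration. Placement is recorded but not acted on; everything runs in one process.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One vertex of the toy dataflow graph (hypothetical, not TensorFlow's API)."""
    name: str
    fn: Callable[..., float]
    inputs: List[str] = field(default_factory=list)
    device: str = "/cpu:0"  # illustrative placement label only; never acted on


class Graph:
    """Holds the nodes plus the shared mutable state they may read or write."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.state: Dict[str, float] = {}  # shared state ("variables")

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def run(self, target: str, feeds: Dict[str, float]) -> float:
        """Evaluate `target`, computing each needed node at most once per run."""
        cache: Dict[str, float] = dict(feeds)  # fed values act as graph inputs

        def eval_node(name: str) -> float:
            if name not in cache:
                node = self.nodes[name]
                args = [eval_node(i) for i in node.inputs]
                cache[name] = node.fn(*args)
            return cache[name]

        return eval_node(target)


def build_sgd_graph(lr: float = 0.1) -> Graph:
    """Toy model pred = w * x with squared-error loss 0.5 * (pred - y)**2."""
    g = Graph()
    g.state["w"] = 0.0

    # Reading the variable is itself a node in the graph.
    g.add(Node("read_w", lambda: g.state["w"], device="/ps:0"))
    g.add(Node("pred", lambda w, x: w * x, ["read_w", "x"], device="/gpu:0"))
    # d(loss)/dw = (pred - y) * x
    g.add(Node("grad", lambda p, y, x: (p - y) * x,
               ["pred", "y", "x"], device="/gpu:0"))

    def apply_update(grad: float) -> float:
        # The mutation of shared state is also expressed as a graph node.
        g.state["w"] -= lr * grad
        return g.state["w"]

    g.add(Node("update_w", apply_update, ["grad"], device="/ps:0"))
    return g
```

Calling `g.run("update_w", {"x": 2.0, "y": 4.0})` repeatedly on a graph from `build_sgd_graph()` moves `w` toward 2.0, because the update node changes the state that the next run reads. Expressing the update rule as a user-built node, rather than fixing it inside the system, is the kind of flexibility the abstract contrasts with parameter server designs.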

Keywords

Dataflow, Computer science, Artificial intelligence, Multi-core processor, Machine learning, Computer architecture, Deep learning, Scalability, Inference, Artificial neural network, Dataflow architecture, Computation, Distributed computing, Parallel computing, Programming language, Operating system

Related Publications

GeePS

Large-scale deep learning requires huge computational resources to train a multi-layer neural network. Recent systems propose using 100s to 1000s of machines to train networks w...

2016 296 citations

Dryad

Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational "vertices" with communication "ch...

2007 2446 citations

Neural GPUs Learn Algorithms

Abstract: Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neura...

2016 arXiv (Cornell University) 63 citations

Publication Info

Year: 2016
Type: article
Pages: 265-283
Citations: 6305
Access: Closed

Citation Metrics

6305 (OpenAlex)

Cite This

Martín Abadi, Paul Barham, Jianmin Chen et al. (2016). TensorFlow: a system for large-scale machine learning. Operating Systems Design and Implementation, 265-283. https://doi.org/10.5555/3026877.3026899

Identifiers

DOI: 10.5555/3026877.3026899