Abstract

Neural Machine Translation (NMT) has shown remarkable progress over the past few years, with production systems now being deployed to end-users. As the field is moving rapidly, it has become unclear which elements of NMT architectures have a significant impact on translation quality. In this work, we present a large-scale analysis of the sensitivity of NMT architectures to common hyperparameters. We report empirical results and variance numbers for several hundred experimental runs, corresponding to over 250,000 GPU hours on a WMT English to German translation task. Our experiments provide practical insights into the relative importance of factors such as embedding size, network depth, RNN cell type, residual connections, attention mechanism, and decoding heuristics. As part of this contribution, we also release an open-source NMT framework in TensorFlow to make it easy for others to reproduce our results and perform their own experiments.
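For orientation only, the sketch below illustrates the kind of hyperparameter sweep the abstract describes, enumerating combinations of the factors named above (embedding size, depth, cell type, residual connections, attention, beam width). It is a hypothetical Python illustration, not the authors' released TensorFlow framework, and the names and value ranges are assumptions rather than the settings actually studied in the paper.

```python
# Hypothetical sketch of an NMT architecture hyperparameter sweep.
# All keys and value ranges below are illustrative assumptions; the
# actual settings explored are reported in the paper itself.
from itertools import product

search_space = {
    "embedding_dim": [128, 512, 2048],                      # token embedding size
    "encoder_depth": [1, 2, 4],                             # number of RNN layers
    "cell_type": ["lstm", "gru", "vanilla"],                 # RNN cell variant
    "residual": [False, True],                               # residual connections between layers
    "attention": ["none", "additive", "multiplicative"],     # attention mechanism
    "beam_width": [1, 5, 10],                                # decoding heuristic
}

def all_configs(space):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

if __name__ == "__main__":
    configs = list(all_configs(search_space))
    print(f"{len(configs)} candidate configurations")
    print(configs[0])  # one run's settings, to be handed to a training script
```

Even this toy grid yields hundreds of combinations, which is why the paper reports variance over several hundred runs rather than an exhaustive search.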

Keywords

Machine translation, Computer science, Hyperparameter, Artificial intelligence, Heuristics, Machine learning, Translation, Task, Decoding methods, Embedding, Artificial neural network, Residual, Variance, Usable, Field, Algorithm

Publication Info

Year
2017
Type
article
Citations
458
Access
Closed

Citation Metrics

OpenAlex
458

Cite This

Denny Britz, Anna Goldie, Minh-Thang Luong et al. (2017). Massive Exploration of Neural Machine Translation Architectures. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.18653/v1/d17-1151

Identifiers

DOI
10.18653/v1/d17-1151