Abstract

In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in different scripts, that transfer works best between typologically similar languages, that monolingual corpora can train models for code-switching, and that the model can find translation pairs. From these results, we can conclude that M-BERT does create multilingual representations, but that these representations exhibit systematic deficiencies affecting certain language pairs.
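
As an illustration of the zero-shot cross-lingual transfer setup the abstract describes (fine-tune on task annotations in one language, evaluate in another), here is a minimal sketch using the publicly released M-BERT checkpoint (bert-base-multilingual-cased) via the Hugging Face transformers library. The toy sentences, the sentence-level sentiment-style labels, and the training hyperparameters are illustrative assumptions, not the paper's setup; the paper's actual probing experiments use NER and part-of-speech tagging across many language pairs.

```python
# Sketch of zero-shot cross-lingual transfer with M-BERT:
# fine-tune on labeled examples in a source language (English),
# then evaluate directly on a target language (Spanish) with no
# target-language labels. Data and task are toy placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "bert-base-multilingual-cased"  # the public M-BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# Task-specific annotations exist only in the source language.
en_texts = ["I loved this film.", "This movie was terrible."]
en_labels = torch.tensor([1, 0])

# Fine-tune on the English examples only.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few toy epochs
    batch = tokenizer(en_texts, padding=True, return_tensors="pt")
    loss = model(**batch, labels=en_labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot evaluation on the target language: no Spanish labels were used.
model.eval()
es_texts = ["Me encantó esta película.", "Esta película fue terrible."]
with torch.no_grad():
    logits = model(**tokenizer(es_texts, padding=True, return_tensors="pt")).logits
print(logits.argmax(dim=-1))  # predicted labels for the Spanish sentences
```

The design point the paper probes is exactly this: nothing in the fine-tuning step sees the target language, so any accuracy on the Spanish inputs must come from the multilingual representations M-BERT learned during pre-training.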

Keywords

Computer science, Scripting language, Natural language processing, Artificial intelligence, Task (project management), Language model, Transfer (computing), Zero (linguistics), Multilingualism, Code (set theory), Translation (biology), Machine translation, Linguistics, Programming language

Publication Info

Year
2019
Type
Conference paper (ACL 2019)
Pages
4996-5001
Citations
1060
Access
Open (ACL Anthology)

Citation Metrics

1060 (OpenAlex)

Cite This

Telmo Pires, Eva Schlinger, Dan Garrette (2019). How Multilingual is Multilingual BERT? Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 4996-5001. https://doi.org/10.18653/v1/p19-1493

Identifiers

DOI
10.18653/v1/p19-1493