Abstract

BERT is a recent language representation model that has performed surprisingly well on a diverse set of language understanding benchmarks, which suggests that BERT networks capture structural information about language. In this work, we provide novel support for this claim with a series of experiments that unpack the elements of English language structure learned by BERT. Our findings are fourfold. First, BERT's phrasal representations capture phrase-level information in the lower layers. Second, the intermediate layers compose a rich hierarchy of linguistic information, with surface features at the bottom, syntactic features in the middle, and semantic features at the top. Third, BERT requires deeper layers to track subject-verb agreement when long-distance dependencies are involved. Finally, the compositional scheme underlying BERT mimics classical, tree-like structures.
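
As a rough illustration of the layer-wise analysis summarized above, the sketch below shows how per-layer BERT representations can be extracted with the HuggingFace Transformers library. This is not the authors' probing code: the model name, example sentence, and mean-pooling step are illustrative assumptions standing in for the probing classifiers used in the paper.

import torch
from transformers import BertModel, BertTokenizer

# Load a pretrained BERT and ask it to return every layer's hidden states.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

# Example sentence with a long-distance subject-verb dependency (illustrative).
sentence = "The keys to the cabinet are on the table."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states holds the embedding output plus one tensor per
# Transformer layer (13 tensors for bert-base), each of shape (1, seq_len, 768).
for layer_idx, layer in enumerate(outputs.hidden_states):
    # Mean-pool token vectors into a single sentence vector per layer; in a
    # probing setup these vectors would feed a simple diagnostic classifier.
    sentence_vector = layer.mean(dim=1).squeeze(0)
    print(f"layer {layer_idx}: sentence vector shape {tuple(sentence_vector.shape)}")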

Keywords

Computer science, Hierarchy, Natural language processing, Artificial intelligence, Representation, Dependency, Phrase, Language model, Information structure, Term, Verb phrase, Linguistics, Noun phrase, Noun

Publication Info

Year: 2019
Type: preprint
Pages: 3651–3657
Citations: 1155
Access: Closed

Citation Metrics

OpenAlex: 1155 citations

Cite This

Ganesh Jawahar, Benoît Sagot, Djamé Seddah (2019). What Does BERT Learn about the Structure of Language? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3651–3657. https://doi.org/10.18653/v1/p19-1356

Identifiers

DOI: 10.18653/v1/p19-1356