Abstract
BERT is a recent language representation model that performs surprisingly well on diverse language understanding benchmarks. This result suggests that BERT networks capture structural information about language. In this work, we provide novel support for this claim by performing a series of experiments to unpack the elements of English language structure learned by BERT. Our findings are fourfold. First, BERT's phrasal representations capture phrase-level information in the lower layers. Second, BERT's intermediate layers encode a rich hierarchy of linguistic information, with surface features at the bottom, syntactic features in the middle, and semantic features at the top. Third, BERT requires deeper layers to track subject-verb agreement when handling long-distance dependencies. Finally, the compositional scheme underlying BERT mimics classical, tree-like structures.
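Layer-wise findings like these are typically obtained by extracting representations from each BERT layer and training a simple probing classifier per layer. Below is a minimal sketch of that general technique, not the authors' released code: it assumes the HuggingFace transformers and scikit-learn libraries, uses bert-base-uncased, and the two-sentence dataset (`sentences`, `labels`) is a toy invention purely for illustration.

```python
# Minimal layer-wise probing sketch (illustrative only, not the paper's code):
# extract per-layer hidden states from bert-base-uncased, then fit a simple
# logistic-regression probe on each layer's representations.
import torch
from transformers import BertTokenizer, BertModel
from sklearn.linear_model import LogisticRegression

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

def layer_embeddings(sentence: str):
    """Return one mean-pooled sentence vector per layer (embedding layer + 12 Transformer layers)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states is a tuple of 13 tensors, each of shape (1, seq_len, 768)
    return [h.mean(dim=1).squeeze(0).numpy() for h in outputs.hidden_states]

# Hypothetical probing data: sentences labeled with some linguistic property,
# e.g. plural (1) vs. singular (0) subject for a subject-verb agreement probe.
sentences = ["The keys to the cabinet are on the table.",
             "The key to the cabinets is on the table."]
labels = [1, 0]

# Transpose so per_layer[i] holds all sentence vectors from layer i.
per_layer = list(zip(*[layer_embeddings(s) for s in sentences]))
for layer_idx, feats in enumerate(per_layer):
    probe = LogisticRegression(max_iter=1000).fit(feats, labels)
    print(f"layer {layer_idx}: train acc = {probe.score(feats, labels):.2f}")
```

In practice the probe would be trained and evaluated on a large labeled dataset; comparing probe accuracy across layers is what reveals where phrase-level, syntactic, and semantic information concentrates.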
Publication Info
- Year: 2019
- Type: conference paper (ACL 2019)
- Pages: 3651-3657
Identifiers
- DOI: 10.18653/v1/p19-1356