Abstract

We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various terminals to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems.
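The split-merge cycle the abstract describes can be illustrated with a toy sketch: each nonterminal is split into two subsymbols, and splits whose likelihood gain on the training treebank is too small are merged back. The per-symbol gain values below are made-up placeholders standing in for the real EM-estimated likelihoods, and all function names are hypothetical.

```python
def split(symbol):
    """Split one nonterminal into two subsymbols, e.g. NP -> NP-0, NP-1."""
    return [f"{symbol}-0", f"{symbol}-1"]

def split_merge_round(nonterminals, likelihood_gain, threshold=0.01):
    """One split-merge round: split every symbol, then keep a split only
    if its (estimated) likelihood gain exceeds the threshold; otherwise
    merge the subsymbols back into the parent symbol."""
    refined = []
    for sym in nonterminals:
        if likelihood_gain(sym) > threshold:
            refined.extend(split(sym))   # keep the split
        else:
            refined.append(sym)          # merge back to the parent
    return refined

# Hypothetical gains: NP and VP benefit from splitting, PRT does not,
# mirroring the paper's point that symbols are split to different degrees.
gains = {"NP": 0.2, "VP": 0.15, "PRT": 0.001}
print(split_merge_round(["NP", "VP", "PRT"], lambda s: gains[s]))
# -> ['NP-0', 'NP-1', 'VP-0', 'VP-1', 'PRT']
```

Iterating this round on the surviving subsymbols is what lets different nonterminals end up refined to different degrees.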

Keywords

Treebank, Terminal and nonterminal symbols, Computer science, Annotation, Natural language processing, Artificial intelligence, Tree (set theory), Rule-based machine translation, Grammar, Grammar induction, Mathematics, Linguistics

Publication Info

Year: 2006
Type: article
Pages: 433-440
Citations: 808
Access: Closed

Citation Metrics

808 (OpenAlex)

Cite This

Slav Petrov, Leon Barrett, Romain Thibaux et al. (2006). Learning accurate, compact, and interpretable tree annotation. 433-440. https://doi.org/10.3115/1220175.1220230

Identifiers

DOI
10.3115/1220175.1220230