Effective self-training for parsing
2006
615 citations
We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
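The abstract describes the method only at a high level; the sketch below outlines the self-training loop it implies, where reranker-selected parses of unlabeled text are added to the training set before retraining the first-stage parser. The helper names (`train_parser`, `parse`, `rerank`) are hypothetical placeholders, not the actual Charniak-Johnson parser-reranker API.

```python
def self_train(labeled_treebank, unlabeled_sentences, train_parser, parse, rerank):
    """Sketch of reranker-guided self-training for a two-phase parser."""
    # 1. Train the first-stage (generative) parser on labeled data only.
    parser = train_parser(labeled_treebank)

    # 2. Parse the unlabeled sentences: the first stage proposes n-best
    #    candidate parses; the discriminative reranker picks one per sentence.
    self_labeled = []
    for sentence in unlabeled_sentences:
        n_best = parse(parser, sentence, n=50)  # candidate parses
        self_labeled.append(rerank(n_best))     # reranker's top choice

    # 3. Retrain the first-stage parser on gold trees plus the
    #    reranker-selected trees (a single round of self-training).
    return train_parser(labeled_treebank + self_labeled)
```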