Effective self-training for parsing
2006
615 citations
We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
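The abstract describes the method only at a high level; the sketch below outlines the self-training loop it implies, where reranker-selected parses of unlabeled text are added to the training set before retraining the first-stage parser. The helper names (`train_parser`, `parse`, `rerank`) are hypothetical placeholders, not the actual Charniak-Johnson parser-reranker API.

```python
def self_train(labeled_treebank, unlabeled_sentences, train_parser, parse, rerank):
    """Sketch of reranker-guided self-training for a two-phase parser."""
    # 1. Train the first-stage (generative) parser on labeled data only.
    parser = train_parser(labeled_treebank)

    # 2. Parse the unlabeled sentences: the first stage proposes n-best
    #    candidate parses; the discriminative reranker picks one per sentence.
    self_labeled = []
    for sentence in unlabeled_sentences:
        n_best = parse(parser, sentence, n=50)  # candidate parses
        self_labeled.append(rerank(n_best))     # reranker's top choice

    # 3. Retrain the first-stage parser on gold trees plus the
    #    reranker-selected trees (a single round of self-training).
    return train_parser(labeled_treebank + self_labeled)
```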