Abstract
We investigate the effectiveness of self-training PCFG grammars with latent annotations (PCFG-LA) for parsing languages with different amounts of labeled training data. Compared to Charniak's lexicalized parser, the PCFG-LA parser was more effectively adapted to a language for which parsing has been less well developed (i.e., Chinese) and benefited more from self-training. We show for the first time that self-training is able to significantly improve the performance of the PCFG-LA parser, a single generative parser, on both small and large amounts of labeled training data. Our approach achieves state-of-the-art parsing accuracies for a single parser on both English (91.5%) and Chinese (85.2%).
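The self-training recipe the abstract evaluates follows the standard loop: train a parser on gold trees, parse unlabeled text, then retrain on the union of gold and automatically parsed trees. Below is a minimal, hedged sketch of that loop using NLTK's plain PCFG machinery as a stand-in for the PCFG-LA parser; the latent-annotation splitting, smoothing, and corpus weighting from the paper are omitted, and names such as `self_train` and the `S` start symbol are illustrative assumptions, not the paper's implementation.

```python
from nltk import Nonterminal, Tree, induce_pcfg
from nltk.parse import ViterbiParser

def train(trees):
    """Induce a PCFG from a collection of parse trees."""
    productions = [p for t in trees for p in t.productions()]
    # "S" as the start symbol is an assumption for this sketch.
    return induce_pcfg(Nonterminal("S"), productions)

def self_train(gold_trees, unlabeled_sentences):
    """One round of self-training: train on gold trees, parse raw text,
    then retrain on gold plus automatically parsed trees."""
    parser = ViterbiParser(train(gold_trees))
    auto_trees = []
    for tokens in unlabeled_sentences:            # each sentence is a token list
        try:
            tree = next(parser.parse(tokens), None)   # best parse, if any
        except ValueError:                        # grammar misses some words
            continue
        if tree is not None:
            auto_trees.append(tree)
    return train(gold_trees + auto_trees)

# Toy usage: one gold tree and one raw sentence.
gold = [Tree.fromstring("(S (NP (D the) (N dog)) (VP (V barks)))")]
grammar = self_train(gold, [["the", "dog", "barks"]])
```

In practice the auto-parsed trees are often down-weighted or filtered before retraining; this sketch simply concatenates them with the gold treebank.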
Related Publications
Accurate unlexicalized parsing
We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down ...
Effective self-training for parsing
We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of boots...
DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding
Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms ...
Enriching Word Vectors with Subword Information
Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Popular models that learn such representations ignore ...
Polyglot: Distributed Word Representations for Multilingual NLP
Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word e...
Publication Info
- Year: 2009
- Type: article
- Volume: 2
- Pages: 832-832
- Citations: 87
- Access: Closed
Identifiers
- DOI: 10.3115/1699571.1699621