Abstract

We investigate the effectiveness of self-training PCFG grammars with latent annotations (PCFG-LA) for parsing languages with different amounts of labeled training data. Compared to Charniak's lexicalized parser, the PCFG-LA parser was more effectively adapted to a language for which parsing has been less well developed (i.e., Chinese) and benefited more from self-training. We show for the first time that self-training is able to significantly improve the performance of the PCFG-LA parser, a single generative parser, on both small and large amounts of labeled training data. Our approach achieves state-of-the-art parsing accuracies for a single parser on both English (91.5%) and Chinese (85.2%).

Keywords

Computer science, Rule-based machine translation, Natural language processing, Artificial intelligence, Programming language

Publication Info

Year
2009
Type
article
Volume
2
Pages
832-832
Citations
87 (OpenAlex)
Access
Closed

Cite This

Zhongqiang Huang and Mary P. Harper (2009). Self-training PCFG grammars with latent annotations across languages. 2, 832-832. https://doi.org/10.3115/1699571.1699621

Identifiers

DOI
10.3115/1699571.1699621