Abstract

We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.
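To make the idea of a state split concrete, here is a minimal sketch of parent annotation, one well-known kind of split in which each phrasal category is relabeled with its parent's category. The abstract does not enumerate which splits are used, so treat this only as an illustration of the general technique. The nested-tuple tree encoding, the `^` separator, and the helper names `is_preterminal`, `parent_annotate`, and `rules` are assumptions made for this example.

```python
from collections import Counter

# Assumed tree encoding: (label, child, child, ...); a preterminal is (tag, word).

def is_preterminal(tree):
    # A preterminal has exactly one child, and that child is a word (a string).
    return len(tree) == 2 and isinstance(tree[1], str)

def parent_annotate(tree, parent=None):
    """Return a copy of `tree` with each phrasal label split by its parent's label."""
    if is_preterminal(tree):
        return tree  # part-of-speech tags are left unsplit in this sketch
    label, children = tree[0], tree[1:]
    new_label = f"{label}^{parent}" if parent is not None else label
    # Pass the ORIGINAL label down, so only one level of ancestry is recorded.
    return (new_label,) + tuple(parent_annotate(c, label) for c in children)

def rules(tree):
    """Yield the (lhs, rhs) phrasal rewrite rules used in `tree`."""
    if is_preterminal(tree):
        return
    yield (tree[0], tuple(child[0] for child in tree[1:]))
    for child in tree[1:]:
        yield from rules(child)

tree = ("S",
        ("NP", ("PRP", "She")),
        ("VP", ("VBD", "saw"),
               ("NP", ("DT", "the"), ("NN", "dog"))))

annotated = parent_annotate(tree)
plain_counts = Counter(rules(tree))       # both NP expansions share the symbol NP
split_counts = Counter(rules(annotated))  # subject NP^S and object NP^VP are distinct
```

In the plain tree, the rules `NP -> PRP` and `NP -> DT NN` compete under a single `NP` symbol, so a vanilla treebank grammar assumes an NP expands the same way regardless of where it appears. After the split, `NP^S -> PRP` and `NP^VP -> DT NN` are estimated separately, which removes one such independence assumption.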

Keywords

Treebank, Computer science, Parsing, Artificial intelligence, Independence (probability theory), Grammar, Natural language processing, Simple (philosophy), State (computer science), Algorithm, Mathematics, Linguistics

Publication Info

Year: 2003
Type: article
Volume: 1
Pages: 423-430
Citations: 3042
Access: Closed

Citation Metrics

3042 (OpenAlex)

Cite This

Dan Klein, Christopher D. Manning (2003). Accurate unlexicalized parsing. Vol. 1, pp. 423-430. https://doi.org/10.3115/1075096.1075150

Identifiers

DOI
10.3115/1075096.1075150