Abstract
Distributed word representations have recently been proven to be an invaluable resource for NLP. These representations are normally learned using neural networks and capture syntactic and semantic information about words. Information about word morphology and shape is normally ignored when learning word representations. However, for tasks like part-of-speech tagging, intra-word information is extremely useful, especially when dealing with morphologically rich languages. In this paper, we propose a deep neural network that learns character-level representations of words and associates them with the usual word representations to perform POS tagging. Using the proposed approach, while avoiding the use of any handcrafted features, we produce state-of-the-art POS taggers for two languages: English, with 97.32% accuracy on the Penn Treebank WSJ corpus; and Portuguese, with 97.47% accuracy on the Mac-Morpho corpus, where the latter represents an error reduction of 12.2% on the best previously known result.
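The idea of joining a character-level representation with a word-level one can be illustrated with a small sketch. The abstract does not describe the network's internals, so everything below is an assumption made for illustration: the character-level vector is computed by a convolution over character windows followed by max pooling over positions, and it is simply concatenated with the word embedding. All names, dimensions, and the random (untrained) weights are hypothetical, and the tagging layers that would consume the joint vector are omitted.

```python
import random

# Hypothetical sizes chosen for illustration only.
D_WRD, D_CHR, N_FILTERS, WINDOW = 4, 3, 5, 3
rng = random.Random(0)


def rand_vec(n):
    """Return n small random weights (stand-ins for learned parameters)."""
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]


def char_level_embedding(word, char_vecs, filters, window=WINDOW):
    """Convolve over character windows, then max-pool over positions.

    Assumes a non-empty word whose characters all appear in char_vecs.
    """
    d_chr = len(next(iter(char_vecs.values())))
    pad = [[0.0] * d_chr] * (window // 2)
    # One vector per character, zero-padded at both ends.
    seq = pad + [char_vecs[c] for c in word] + pad
    pooled = []
    for weights in filters:  # each filter has window * d_chr weights
        scores = []
        for i in range(len(word)):
            flat = [x for vec in seq[i:i + window] for x in vec]
            scores.append(sum(w * x for w, x in zip(weights, flat)))
        pooled.append(max(scores))  # fixed-size output for any word length
    return pooled


def joint_representation(word, word_vecs, char_vecs, filters):
    """Concatenate the word embedding with the character-level embedding."""
    return word_vecs[word] + char_level_embedding(word, char_vecs, filters)


char_vecs = {c: rand_vec(D_CHR) for c in "abcdefghijklmnopqrstuvwxyz"}
word_vecs = {w: rand_vec(D_WRD) for w in ("tagging", "tagger")}
filters = [rand_vec(WINDOW * D_CHR) for _ in range(N_FILTERS)]

rep = joint_representation("tagging", word_vecs, char_vecs, filters)
print(len(rep))  # D_WRD + N_FILTERS = 9
```

Max pooling is what makes the character-level vector independent of word length, so words of any size yield a vector that can be concatenated with a fixed-size word embedding. In a trained model the character vectors and filters would be learned jointly with the tagger rather than drawn at random.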
Publication Info
- Year: 2014
- Type: article
- Pages: 1818-1826
- Citations: 555
- Access: Closed