Abstract

Recurrent neural networks are convenient and efficient models for language modeling. However, when applied at the character level instead of the word level, they suffer from several problems. To successfully model long-term dependencies, the hidden representation needs to be large, which in turn implies higher computational costs that can become prohibitive in practice. We propose two alternative structural modifications to the classical RNN model. The first conditions the character-level representation on the previous word representation. The second uses the character history to condition the output probability. We evaluate the performance of the two proposed modifications on challenging, multilingual, real-world data.
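The abstract only sketches the two modifications at a high level. As a rough illustration of the first idea (conditioning the character-level representation on the previous word), here is a minimal PyTorch-style sketch. It is not the authors' implementation: the class name, layer sizes, and the choice to concatenate a previous-word embedding into the recurrent input are assumptions made purely for illustration.

```python
# Hypothetical sketch (not the paper's exact model): a character-level RNN whose
# recurrent update is additionally conditioned on an embedding of the previous word.
import torch
import torch.nn as nn

class CharRNNWithWordContext(nn.Module):
    def __init__(self, n_chars, n_words, char_dim=32, word_dim=64, hidden_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)   # character lookup table
        self.word_emb = nn.Embedding(n_words, word_dim)   # previous-word lookup table
        # the recurrent cell sees the current character AND the previous word's embedding
        self.cell = nn.RNNCell(char_dim + word_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_chars)         # predicts the next character

    def forward(self, char_ids, prev_word_ids, h=None):
        # char_ids, prev_word_ids: (batch, seq_len), aligned so that each character
        # position knows the identity of the word preceding it
        batch, seq_len = char_ids.shape
        if h is None:
            h = char_ids.new_zeros(batch, self.cell.hidden_size, dtype=torch.float)
        logits = []
        for t in range(seq_len):
            x = torch.cat([self.char_emb(char_ids[:, t]),
                           self.word_emb(prev_word_ids[:, t])], dim=-1)
            h = self.cell(x, h)
            logits.append(self.out(h))
        # (batch, seq_len, n_chars) logits over the next character, plus final state
        return torch.stack(logits, dim=1), h
```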

Keywords

Character, Representation, Computer science, Recurrent neural network, Artificial intelligence, Word, Language model, Natural language processing, Artificial neural network, Machine learning, Mathematics

Related Publications

Finding Structure in Time

Time underlies many interesting human behaviors. Thus, the question of how to represent time in connectionist models is very important. One approach is to represent time implici...

1990, Cognitive Science, 10,427 citations

Publication Info

Year: 2015
Type: preprint
Citations: 39
Access: Closed

Citation Metrics

39 citations (OpenAlex)

Cite This

Piotr Bojanowski, Armand Joulin, Tomáš Mikolov (2015). Alternative structures for character-level RNNs. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1511.06303

Identifiers

DOI
10.48550/arxiv.1511.06303