Abstract

quanteda is an R package providing a comprehensive workflow and toolkit for natural language processing tasks such as corpus management, tokenization, analysis, and visualization. It has extensive functions for applying dictionary analysis, exploring texts using keywords-in-context, computing document and feature similarities, and discovering multi-word expressions through collocation scoring. Based entirely on sparse operations, it provides highly efficient methods for compiling document-feature matrices and for manipulating these or using them in further quantitative analysis. Using C++ and multithreading extensively, quanteda is also considerably faster and more efficient than other R and Python packages in processing large textual data.
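To make "document-feature matrix based on sparse operations" concrete, here is a minimal sketch in Python. It is not quanteda's implementation (which is written in R and C++) and uses none of its API. It assumes a deliberately simple lowercase word tokenizer, and the names `tokenize`, `build_dfm`, and `row_counts` are hypothetical. Each row is a document, each column a feature (here, a word type), and only nonzero counts are stored, in compressed sparse row (CSR) layout.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of word characters.
    # A real NLP tokenizer handles punctuation, URLs, hyphens, etc.
    return re.findall(r"\w+", text.lower())

def build_dfm(docs):
    """Build a document-feature matrix in compressed sparse row (CSR) layout.

    Only nonzero counts are stored; zero cells cost nothing.
    Returns (features, indptr, indices, data).
    """
    vocab = {}    # feature -> column index, assigned in first-seen order
    indptr = [0]  # row i occupies indices/data[indptr[i]:indptr[i + 1]]
    indices = []  # column index of each stored count
    data = []     # the nonzero counts themselves
    for doc in docs:
        counts = Counter(tokenize(doc))
        # Map each feature to its column (adding unseen ones), then sort by column.
        row = sorted((vocab.setdefault(feat, len(vocab)), n)
                     for feat, n in counts.items())
        for col, n in row:
            indices.append(col)
            data.append(n)
        indptr.append(len(indices))
    features = sorted(vocab, key=vocab.get)  # feature names in column order
    return features, indptr, indices, data

def row_counts(dfm, i):
    """Return document i's nonzero feature counts as a dict."""
    features, indptr, indices, data = dfm
    return {features[indices[k]]: data[k] for k in range(indptr[i], indptr[i + 1])}

docs = ["The cat sat on the mat.", "The dog sat."]
dfm = build_dfm(docs)
features, indptr, indices, data = dfm
print(features)            # ['the', 'cat', 'sat', 'on', 'mat', 'dog']
print(row_counts(dfm, 1))  # {'the': 1, 'sat': 1, 'dog': 1}
print(len(data), "stored of", len(docs) * len(features), "cells")  # 8 stored of 12 cells
```

In this toy case the savings are small (8 stored values instead of 12 cells), but with thousands of documents and a vocabulary of tens of thousands of features nearly every cell is zero, which is why a sparse layout pays off.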

Keywords

R package, Computer science, Natural language processing, Information retrieval, Programming language

Publication Info

Year: 2018
Type: article
Volume: 3
Issue: 30
Pages: 774-774
Citations: 1221 (OpenAlex)
Access: Closed

Cite This

Kenneth Benoit, Kohei Watanabe, H. P. Wang et al. (2018). quanteda: An R package for the quantitative analysis of textual data. The Journal of Open Source Software, 3(30), 774-774. https://doi.org/10.21105/joss.00774

Identifiers

DOI: 10.21105/joss.00774