Abstract

In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced a little over one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently surpassed the level of non-expert humans, suggesting limited headroom for further research. In this paper we present SuperGLUE, a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, a software toolkit, and a public leaderboard. SuperGLUE is available at super.gluebenchmark.com.
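
As a rough illustration of the single-number metric the abstract mentions, the sketch below macro-averages per-task scores, assuming that metrics within a task are averaged first and that task scores are then averaged with equal weight. The task names are the SuperGLUE tasks, but the scores and the helper function are hypothetical placeholders, not figures from the paper or its official scoring code.

# Minimal sketch (not the paper's official scoring code): aggregate per-task
# scores into a single benchmark number. All scores are hypothetical placeholders.
def superglue_style_score(task_scores):
    """Average the metrics within each task, then macro-average across tasks."""
    per_task = [sum(metrics) / len(metrics) for metrics in task_scores.values()]
    return sum(per_task) / len(per_task)

example_scores = {
    "BoolQ":   [72.0],         # accuracy
    "CB":      [85.0, 90.0],   # F1, accuracy
    "COPA":    [70.0],         # accuracy
    "MultiRC": [60.0, 25.0],   # F1a, exact match
    "ReCoRD":  [71.0, 70.0],   # F1, exact match
    "RTE":     [68.0],         # accuracy
    "WiC":     [65.0],         # accuracy
    "WSC":     [64.0],         # accuracy
}

print(f"Single-number score: {superglue_style_score(example_scores):.1f}")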

Keywords

Benchmark (surveying), Computer science, Set (abstract data type), Metric (unit), Software, Machine learning, Language model, Artificial intelligence, Natural language processing, Programming language, Engineering, Operations management

Publication Info

Year: 2019
Type: Preprint
Citations: 985 (OpenAlex)
Access: Closed

Cite This

Alex Wang, Yada Pruksachatkun, Nikita Nangia et al. (2019). SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1905.00537

Identifiers

DOI
10.48550/arxiv.1905.00537