Abstract

Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context. Existing datasets either focus exclusively on answerable questions, or use automatically generated unanswerable questions that are easy to identify. To address these weaknesses, we present SQuADRUn, a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuADRUn, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. SQuADRUn is a challenging natural language understanding task for existing models: a strong neural system that gets 86% F1 on SQuAD achieves only 66% F1 on SQuADRUn. We release SQuADRUn to the community as the successor to SQuAD.
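
The scoring implied by "abstain from answering" can be illustrated with a small sketch. It assumes that an abstention is represented by an empty string and that F1 is computed over normalized tokens; the function names `normalize` and `f1_with_abstention` are hypothetical, and this is not presented as the authors' evaluation script.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def f1_with_abstention(prediction: str, gold: str) -> float:
    """Token-level F1 where an empty string means "no answer" (assumption).

    If either side is empty, the score is 1.0 only when both are empty,
    i.e. the system abstained on a question that has no answer.
    """
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a system that guesses a span for an unanswerable question scores 0 on that question, while one that abstains scores 1, which is why a model tuned only on answerable questions can lose a large share of its F1.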

Keywords

Context (archaeology), Question answering, Paragraph, Computer science, Task (project management), Reading (process), Successor cardinal, Comprehension, Focus (optics), Natural (archaeology), Artificial intelligence, Natural language, Epistemology, World Wide Web, Linguistics, Philosophy, Mathematics, History

Publication Info

Year: 2018
Type: article
Pages: 784-789
Citations: 2087
Access: Closed

Citation Metrics

OpenAlex: 2087
Influential: 758
CrossRef: 732

Cite This

Pranav Rajpurkar, Robin Jia, Percy Liang (2018). Know What You Don’t Know: Unanswerable Questions for SQuAD. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 784-789. https://doi.org/10.18653/v1/p18-2124

Identifiers

DOI: 10.18653/v1/p18-2124
arXiv: 1806.03822
