Abstract

Following the recent adoption by the machine translation community of automatic evaluation using the BLEU/NIST scoring process, we conduct an in-depth study of a similar idea for evaluating summaries. The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
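
The core idea is easy to sketch. Below is a minimal illustration of unigram co-occurrence scoring between a candidate summary and a reference summary, in the spirit of what the abstract describes (a ROUGE-1-style recall); the function name, the whitespace/lowercase tokenizer, and the clipping choice are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def unigram_cooccurrence_recall(candidate: str, reference: str) -> float:
    """Clipped unigram overlap between a candidate summary and one
    reference summary, normalized by reference length (recall).

    Mirrors the paper's core idea (n-gram co-occurrence, here n = 1);
    the naive tokenizer is a simplifying assumption.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    # Credit each candidate unigram at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[token]) for token, count in cand.items())
    return overlap / sum(ref.values())

if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "a cat was sitting on the mat"
    print(f"unigram recall: {unigram_cooccurrence_recall(cand, ref):.3f}")  # 0.571
```

Scores of this kind, computed over many candidate/reference summary pairs, can then be correlated with human judgments, which is the comparison the abstract reports.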

Keywords

NIST, BLEU, Computer science, n-gram, Machine translation, Artificial intelligence, Natural language processing, Evaluation of machine translation, Gram, Process (computing), Evaluation methods, Machine learning, Language model, Reliability engineering, Programming language

Related Publications

BLEU

Human evaluations of machine translation are extensive but expensive. Human evaluations can take months to finish and involve human labor that can not be reused. We propose a me...

2001 · 20,359 citations

Publication Info

Year: 2003
Type: Article
Volume: 1
Pages: 71–78
Citations: 1,573
Access: Closed

Citation Metrics

1,573 citations (OpenAlex)

Cite This

Chin-Yew Lin, Eduard Hovy (2003). Automatic evaluation of summaries using N-gram co-occurrence statistics. Proceedings of HLT-NAACL 2003, Volume 1, 71–78. https://doi.org/10.3115/1073445.1073465

Identifiers

DOI
10.3115/1073445.1073465