Abstract

We present a framework for information retrieval that combines document models and query models using a probabilistic ranking function based on Bayesian decision theory. The framework suggests an operational retrieval model that extends recent developments in the language modeling approach to information retrieval. A language model for each document is estimated, as well as a language model for each query, and the retrieval problem is cast in terms of risk minimization. The query language model can be exploited to model user preferences, the context of a query, synonymy, and word senses. While recent work has incorporated word translation models for this purpose, we introduce a new method using Markov chains defined on a set of documents to estimate the query models. The Markov chain method has connections to algorithms from link analysis and social networks. The new approach is evaluated on TREC collections and compared to the basic language modeling approach and vector space models together with query expansion using Rocchio. Significant improvements are obtained over standard query expansion methods for strong baseline TF-IDF systems, with the greatest improvements attained for short queries on Web data.
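
The idea of estimating one language model per document and one per query, then ranking by how well they match, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not stated in the abstract: the unigram models, the Dirichlet-prior smoothing with parameter `mu`, the use of negative cross entropy (rank-equivalent to negative KL divergence from the query model to the document model) as the score, and the hypothetical function names `doc_model`, `query_model`, and `rank`. The query model below is a plain maximum-likelihood estimate, not the Markov chain estimate the abstract introduces.

```python
import math
from collections import Counter


def doc_model(doc_tokens, collection_counts, collection_len, mu=2000.0):
    """Return p(w | d) as a function, using Dirichlet-prior smoothing (assumed)."""
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    def prob(word):
        # Background probability of the word in the whole collection.
        p_coll = collection_counts.get(word, 0) / collection_len
        return (counts.get(word, 0) + mu * p_coll) / (doc_len + mu)

    return prob


def query_model(query_tokens):
    """Maximum-likelihood query model p(w | q); a stand-in for richer estimates."""
    counts = Counter(query_tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def rank(query_tokens, docs, mu=2000.0):
    """Rank document ids by negative cross entropy between query and document models."""
    collection_counts = Counter(w for tokens in docs.values() for w in tokens)
    collection_len = sum(collection_counts.values())
    q = query_model(query_tokens)

    scores = {}
    for doc_id, tokens in docs.items():
        p = doc_model(tokens, collection_counts, collection_len, mu)
        score = 0.0
        for word, q_w in q.items():
            p_w = p(word)
            if p_w == 0.0:
                # Word never occurs in the collection; it is skipped for every
                # document alike, so the relative ordering is unaffected.
                continue
            score += q_w * math.log(p_w)
        scores[doc_id] = score
    # Higher score means lower divergence from the query model.
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank(["language", "model"], docs)` with `docs` as a dictionary mapping document ids to token lists returns the ids ordered from best to worst match. Replacing `query_model` with a richer estimate is where a method such as the Markov chain approach described above would plug in.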

Keywords

Computer science; Query expansion; Query language; Language model; Ranking (information retrieval); Information retrieval; Web query classification; Query optimization; RDF query language; Divergence-from-randomness model; Web search query; Vector space model; Artificial intelligence; Natural language processing; Probabilistic logic; Data mining; Search engine

Publication Info

Year: 2017
Type: article
Volume: 51
Issue: 2
Pages: 251-259
Citations: 772
Access: Closed


Cite This

John Lafferty, ChengXiang Zhai (2017). Document Language Models, Query Models, and Risk Minimization for Information Retrieval. ACM SIGIR Forum, 51(2), 251-259. https://doi.org/10.1145/3130348.3130375

Identifiers

DOI: 10.1145/3130348.3130375