Abstract

In this paper we study the order in which a crawler should visit the URLs it has seen so that it obtains more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time. We define several importance metrics, ordering schemes, and performance evaluation measures for this problem. We also experimentally evaluate the ordering schemes on the Stanford University Web. Our results show that a crawler with a good ordering scheme can obtain important pages significantly faster than one without.
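
The idea of ordering a crawl by estimated importance can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not name a metric, so the backlink count used here (the number of links to a URL seen so far) is an illustrative assumption, as are the function name `crawl_order`, the `max_pages` parameter, and the in-memory `toy_graph` standing in for real page fetches.

```python
from collections import defaultdict


def crawl_order(graph, seed, max_pages=None):
    """Return the visit order when the frontier is ordered by known backlinks.

    `graph` maps each URL to the list of URLs it links to (a hypothetical
    stand-in for downloading a page and extracting its links).
    """
    backlinks = defaultdict(int)  # links seen so far that point at each URL
    discovered = {seed: 0}        # URL -> discovery index, used to break ties
    frontier = {seed}             # URLs seen but not yet visited
    visited = []

    while frontier and (max_pages is None or len(visited) < max_pages):
        # Visit the frontier URL with the most known backlinks;
        # among equals, the one discovered earliest goes first.
        url = max(frontier, key=lambda u: (backlinks[u], -discovered[u]))
        frontier.remove(url)
        visited.append(url)

        for target in graph.get(url, []):
            backlinks[target] += 1
            if target not in discovered:
                discovered[target] = len(discovered)
                frontier.add(target)

    return visited


# Hypothetical five-page site: "d" is linked from three other pages.
toy_graph = {
    "a": ["b", "c", "d"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
}

print(crawl_order(toy_graph, "a"))
```

In this toy graph a plain breadth-first crawl from "a" would reach "d" after both "b" and "c", whereas the ordered frontier moves "d" ahead of "c" as soon as a second link to it has been seen. The linear scan with `max` keeps the sketch short; a larger crawl would likely want a priority queue that supports updating priorities.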

Keywords

Web crawler, Crawling, Computer science, Focused crawler, Order (exchange), Web page, Information retrieval, World Wide Web, Scheme (mathematics), Data mining, Static web page, Web navigation, Mathematics

Publication Info

Year: 1998
Type: article
Volume: 30
Issue: 1-7
Pages: 161-172
Citations: 840
Access: Closed

Citation Metrics

840 (OpenAlex)

Cite This

Junghoo Cho, Héctor García-Molina, Lawrence M. Page (1998). Efficient crawling through URL ordering. Computer Networks and ISDN Systems, 30(1-7), 161-172. https://doi.org/10.1016/s0169-7552(98)00108-1

Identifiers

DOI: 10.1016/s0169-7552(98)00108-1