Rankers
The ranker components that order URLs in the frontier.
A ranker determines the order in which queued URLs are selected for fetching. A job sets one ranker under config.ranker. This page catalogs the available kinds and their parameters.
How the ranker is configured
config.ranker is a single object with a kind and, for some kinds, a params object. There is one ranker per job. The ranker does not change which URLs are crawled, only the order in which they leave the queue, which matters most when a run is stopped before it finishes.
config:
  ranker:
    kind: breadth

breadth
Visits URLs closer to the seeds first, spreading coverage evenly across a site. This is a good default for site-wide crawls. It takes no parameters.
ranker:
  kind: breadth

depth
Follows a branch of links deep before widening, reaching distant pages sooner at the cost of even coverage. It takes no parameters.
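To make the difference between the breadth and depth orders concrete, here is a minimal Python sketch over a toy link graph. The graph, the `crawl_order` function, and its `budget` argument are hypothetical illustrations rather than part of the crawler, and the real rankers may score URLs or break ties differently.

```python
from collections import deque

# Hypothetical link graph: each URL maps to the URLs it links to.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": [],
    "/a/2": [],
    "/b/1": [],
}

def crawl_order(seed, kind, budget=None):
    """Return the order in which URLs leave the queue.

    kind="breadth" pops the oldest queued URL (FIFO);
    kind="depth" pops the newest queued URL (LIFO).
    budget caps the number of fetches, simulating a run stopped early.
    """
    frontier = deque([seed])
    seen = {seed}
    order = []
    while frontier and (budget is None or len(order) < budget):
        url = frontier.popleft() if kind == "breadth" else frontier.pop()
        order.append(url)
        links = LINKS[url]
        # For depth, push links in reverse so the first link is popped first.
        for link in (links if kind == "breadth" else reversed(links)):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

breadth_full = crawl_order("/", "breadth")
depth_full = crawl_order("/", "depth")
breadth_cut = crawl_order("/", "breadth", budget=3)
depth_cut = crawl_order("/", "depth", budget=3)
```

In this sketch both full runs fetch the same six URLs, only in a different order. With `budget=3`, the breadth order has fetched `/`, `/a`, `/b`, while the depth order has fetched `/`, `/a`, `/a/1`, which is why the choice of ranker matters most for runs that stop early.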
ranker:
  kind: depth

page_rank
Orders URLs by their link importance within the crawl, prioritizing well-connected pages.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| damping_factor | float | no | none | The damping factor used in the ranking calculation. |
| max_iterations | integer | no | none | The maximum number of iterations the ranking runs for. |
| tolerance | float | no | none | The convergence tolerance at which iteration stops. |
ranker:
  kind: page_rank
  params:
    damping_factor: 0.85
    max_iterations: 100
    tolerance: 0.000001
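The following Python sketch shows how the three parameters typically interact in a textbook power-iteration PageRank. It assumes the parameters carry their conventional meaning, which this page does not confirm, and the `page_rank` function and `LINKS` graph are hypothetical; the crawler's own calculation may differ.

```python
def page_rank(links, damping_factor=0.85, max_iterations=100, tolerance=1e-6):
    """Textbook power-iteration PageRank (illustrative, not the crawler's code).

    links maps each URL to the list of URLs it links to; every linked URL
    must also appear as a key.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[u] for u in nodes if not links[u])
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_rank = {u: base for u in nodes}
        for u in nodes:
            for v in links[u]:
                new_rank[v] += damping_factor * rank[u] / len(links[u])
        # Stop once the total change between iterations falls below tolerance.
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tolerance:
            break
    return rank

# Hypothetical link graph discovered during a crawl.
LINKS = {
    "/": ["/a", "/b", "/c"],
    "/a": ["/"],
    "/b": ["/", "/a"],
    "/c": [],
}

ranks = page_rank(LINKS, damping_factor=0.85, max_iterations=100, tolerance=0.000001)
# Higher-ranked URLs would leave the queue first.
queue_order = sorted(ranks, key=ranks.get, reverse=True)
```

In this sketch a higher `damping_factor` gives more weight to link structure, while `max_iterations` and `tolerance` bound how long the calculation runs: iteration ends at whichever limit is reached first.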