
Rankers

The ranker components that order URLs in the frontier.

A ranker determines the order in which queued URLs are selected for fetching. A job sets one ranker under config.ranker. This page catalogs the available kinds and their parameters.

How the ranker is configured

config.ranker is a single object with a kind and, for some kinds, a params object. There is one ranker per job. The ranker does not change which URLs are crawled, only the order in which they leave the queue, which matters most when a run is stopped before it finishes.

config:
  ranker:
    kind: breadth
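
The shape described above (a kind, plus a params object for some kinds) can be checked before a job is submitted. The following is a minimal sketch, not part of inndx: `validate_ranker` and `KNOWN_PARAMS` are hypothetical names, and the rule that undocumented parameters are reported as problems is an assumption. Only the kinds and parameter names are taken from this page.

```python
# Hypothetical validator. The kinds and parameter names come from this page;
# the checking rules themselves are assumptions, not documented inndx behavior.
KNOWN_PARAMS = {
    "breadth": set(),
    "depth": set(),
    "page_rank": {"damping_factor", "max_iterations", "tolerance"},
}


def validate_ranker(ranker: dict) -> list[str]:
    """Return a list of problems found in a config.ranker mapping."""
    problems = []
    kind = ranker.get("kind")
    if kind not in KNOWN_PARAMS:
        problems.append(f"unknown kind: {kind!r}")
        return problems
    # A missing or empty params object is treated as "no parameters given".
    params = ranker.get("params") or {}
    for name in sorted(set(params) - KNOWN_PARAMS[kind]):
        problems.append(f"{kind} does not document a parameter named {name!r}")
    return problems


print(validate_ranker({"kind": "breadth"}))
print(validate_ranker({"kind": "page_rank", "params": {"damping": 0.85}}))
```

The first call reports no problems; the second flags `damping`, which is not one of the page_rank parameters listed on this page.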

breadth

Visits URLs closer to the seeds first, spreading coverage evenly across a site. This is a good default for site-wide crawls. It takes no parameters.

ranker:
  kind: breadth

depth

Follows a branch of links deep before widening, reaching distant pages sooner at the cost of even coverage. It takes no parameters.

ranker:
  kind: depth
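
To make the difference between breadth and depth concrete, the sketch below runs both orderings over a small, made-up link graph. It is a simplification under stated assumptions: `LINKS`, `crawl_order`, and `budget` are hypothetical names, breadth is modeled as a first-in-first-out queue and depth as a last-in-first-out stack, and nothing here reflects how inndx implements its frontier.

```python
from collections import deque

# Toy link graph (hypothetical): each URL maps to the links found on that page.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": [],
    "/a/2": [],
    "/b/1": [],
}


def crawl_order(seed, kind, budget=None):
    """Return the order in which URLs leave the frontier.

    kind="breadth" pops the oldest queued URL (FIFO);
    kind="depth" pops the newest queued URL (LIFO).
    budget, if given, stops the run after that many URLs.
    """
    frontier = deque([seed])
    seen = {seed}
    order = []
    while frontier and (budget is None or len(order) < budget):
        url = frontier.popleft() if kind == "breadth" else frontier.pop()
        order.append(url)
        for link in LINKS[url]:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


print(crawl_order("/", "breadth"))
print(crawl_order("/", "depth"))
print(crawl_order("/", "breadth", budget=3))
print(crawl_order("/", "depth", budget=3))
```

Run to completion, both orderings visit the same six URLs. Stopped after three fetches, breadth has covered "/", "/a", and "/b", while depth has covered "/", "/b", and "/b/1", which is why the choice matters most for runs that end early.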

page_rank

Orders URLs by their link importance within the crawl, prioritizing well-connected pages.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| damping_factor | float | no | none | The damping factor used in the ranking calculation. |
| max_iterations | integer | no | none | The maximum number of iterations the ranking runs for. |
| tolerance | float | no | none | The convergence tolerance at which iteration stops. |

ranker:
  kind: page_rank
  params:
    damping_factor: 0.85
    max_iterations: 100
    tolerance: 0.000001
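
This page does not say how the ranking is computed, so the following is only a sketch of the textbook PageRank power iteration, included to show what the three parameters usually control. The function `page_rank`, the graph `LINKS`, and the handling of pages without outgoing links are assumptions for illustration and may differ from what inndx does.

```python
def page_rank(links, damping_factor=0.85, max_iterations=100, tolerance=1e-6):
    """Textbook PageRank by power iteration.

    links maps each URL to the list of URLs it links to; every link target
    must also appear as a key.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[u] for u in nodes if not links[u])
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_rank = {u: base for u in nodes}
        # Each page passes a damped, equal share of its rank to its targets.
        for u in nodes:
            for v in links[u]:
                new_rank[v] += damping_factor * rank[u] / len(links[u])
        # Stop once the total change between iterations falls below tolerance.
        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tolerance:
            break
    return rank


# Hypothetical link graph: "/c" links out but nothing links to it.
LINKS = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/", "/a"], "/c": ["/"]}
ranks = page_rank(LINKS)
print(sorted(ranks, key=ranks.get, reverse=True))
```

On this graph the well-linked "/" ranks first and the unlinked "/c" ranks last. In this formulation, a lower max_iterations or a looser tolerance ends the calculation sooner at the cost of less settled scores.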
