Stopping criteria

The components that decide when a crawl run is finished.

Stopping criteria tell the orchestrator when to end a run. A job lists them under config.stopping_criteria. This page catalogs the available kinds and their parameters.

How stopping criteria are configured

Each entry in config.stopping_criteria is an object with a kind and a params object. A run stops as soon as any one of the listed criteria is met, so listing several criteria sets several independent limits, and whichever is reached first ends the run.

config:
  stopping_criteria:
    - kind: max_urls
      params:
        max_urls: 1000
    - kind: max_age
      params:
        max_age: 6h
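
The "any one criterion ends the run" rule described above can be sketched in Python. This is an illustration only, not the orchestrator's actual code: `RunState`, its field names, and `first_met` are hypothetical, and treating a limit as met at "greater than or equal" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RunState:
    # Hypothetical snapshot of a run; the field names are assumptions.
    urls_processed: int
    elapsed_seconds: float

# A criterion maps a run state to True once its limit has been reached.
Criterion = Callable[[RunState], bool]

def max_urls(limit: int) -> Criterion:
    return lambda state: state.urls_processed >= limit

def max_age(limit_seconds: float) -> Criterion:
    return lambda state: state.elapsed_seconds >= limit_seconds

def first_met(criteria: list[tuple[str, Criterion]], state: RunState) -> Optional[str]:
    """Return the kind of the first criterion that is met, or None if none is."""
    for kind, is_met in criteria:
        if is_met(state):
            return kind
    return None

# Mirrors the example configuration: 1000 URLs or 6 hours, whichever comes first.
criteria = [("max_urls", max_urls(1000)), ("max_age", max_age(6 * 3600))]
```

Calling `first_met(criteria, state)` after each unit of work and stopping on the first non-`None` result gives the behavior described: each listed criterion acts as its own limit.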

max_urls

Stops the run after a maximum number of URLs has been processed.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| max_urls | integer | yes | none | The number of URLs after which the run stops. |

stopping_criteria:
  - kind: max_urls
    params:
      max_urls: 1000

max_depth

Stops the run once the crawl reaches a maximum link distance from the seeds.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| max_depth | integer | yes | none | The depth, counted in links from a seed, at which the run stops. |

stopping_criteria:
  - kind: max_depth
    params:
      max_depth: 5
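
To make "depth, counted in links from a seed" concrete, here is a minimal sketch over a hypothetical in-memory link graph. The function name, the breadth-first order, and the choice to stop before processing the first URL at `max_depth` are all assumptions made for illustration; the real crawler may treat the boundary differently.

```python
from collections import deque

def crawl_until_depth(seeds, links, max_depth):
    """Walk a link graph breadth-first and return the URLs processed
    before the run stopped. `links` maps a URL to the URLs it links to."""
    depth = {url: 0 for url in seeds}  # seeds sit at depth 0
    queue = deque(seeds)
    processed = []
    while queue:
        url = queue.popleft()
        if depth[url] >= max_depth:
            break  # assumption: reaching the limit ends the whole run
        processed.append(url)
        for target in links.get(url, []):
            if target not in depth:
                # Depth is the shortest link distance from any seed.
                depth[target] = depth[url] + 1
                queue.append(target)
    return processed
```

With a chain a → b → c → d and `max_depth=2`, this sketch processes a (depth 0) and b (depth 1) and stops when it reaches c at depth 2.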

max_age

Stops the run after a maximum wall-clock duration since it started.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| max_age | duration | yes | none | The run duration after which it stops. Durations are written like 30m, 6h. |

stopping_criteria:
  - kind: max_age
    params:
      max_age: 6h
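
As an illustration of the duration format, the sketch below converts strings like 30m and 6h into a `timedelta`. It handles only the two units shown on this page; whether other units or combined forms are accepted is not stated here, so this hypothetical `parse_duration` helper rejects them.

```python
import re
from datetime import timedelta

# Only the units that appear in the examples on this page.
_UNITS = {"m": "minutes", "h": "hours"}
_PATTERN = re.compile(r"(\d+)([mh])")

def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '30m' or '6h' into a timedelta."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

Under these assumptions, `parse_duration("6h")` yields a six-hour `timedelta`, which can be compared against the time elapsed since the run started.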

max_empty_evaluations

Stops the run after a number of consecutive scheduling evaluations have yielded no new work, which indicates the crawl has run out of URLs to visit.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| max_empty_evaluations | integer | yes | none | The number of consecutive empty evaluations after which the run stops. |

stopping_criteria:
  - kind: max_empty_evaluations
    params:
      max_empty_evaluations: 10
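
The word "consecutive" matters: any evaluation that yields work resets the count. A minimal sketch of that bookkeeping follows; the class and method names are hypothetical, and it assumes an evaluation reports how many new work items it produced.

```python
class EmptyEvaluationCounter:
    """Tracks consecutive scheduling evaluations that yielded no new work."""

    def __init__(self, max_empty_evaluations: int) -> None:
        self.limit = max_empty_evaluations
        self.consecutive_empty = 0

    def record(self, new_work_items: int) -> bool:
        """Record one evaluation; return True when the run should stop."""
        if new_work_items > 0:
            self.consecutive_empty = 0  # any new work resets the streak
        else:
            self.consecutive_empty += 1
        return self.consecutive_empty >= self.limit
```

With a limit of 10, nine empty evaluations followed by one that finds a URL start the count over, so the run continues until ten empty evaluations occur in a row.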
