Stopping criteria
The components that decide when a crawl run is finished.
Stopping criteria tell the orchestrator when to end a run. A job lists them under config.stopping_criteria. This page catalogs the available kinds and their parameters.
How stopping criteria are configured
Each entry in config.stopping_criteria is an object with a kind and a params object. A run stops as soon as any one of the listed criteria is met. Listing several criteria therefore sets independent limits, and whichever limit is reached first ends the run.
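The any-of rule can be sketched in a few lines of Python. This is a minimal illustration, not the orchestrator's code. RunState, should_stop, and the predicate-style criteria are hypothetical names. Treating a limit as met once the counter reaches it (>=) is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical snapshot of run progress; the field names are illustrative only.
@dataclass
class RunState:
    urls_processed: int
    elapsed_seconds: float


# In this sketch a criterion is simply a predicate over the run state.
Criterion = Callable[[RunState], bool]


def should_stop(state: RunState, criteria: list[Criterion]) -> bool:
    # The run ends as soon as any one criterion is met.
    return any(criterion(state) for criterion in criteria)


# Two independent limits, equivalent to max_urls: 1000 and max_age: 6h.
criteria: list[Criterion] = [
    lambda s: s.urls_processed >= 1000,
    lambda s: s.elapsed_seconds >= 6 * 3600,
]

print(should_stop(RunState(urls_processed=1000, elapsed_seconds=60.0), criteria))
print(should_stop(RunState(urls_processed=10, elapsed_seconds=60.0), criteria))
```

Because the check is an OR, adding an entry can only make a run end sooner, never later.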
config:
  stopping_criteria:
    - kind: max_urls
      params:
        max_urls: 1000
    - kind: max_age
      params:
        max_age: 6h
max_urls
Stops the run after a maximum number of URLs has been processed.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| max_urls | integer | yes | none | The number of URLs after which the run stops. |
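Here is a minimal sketch of how a max_urls entry might be validated and checked. It assumes params arrive as an already-parsed dict. MaxUrls, from_params, and is_met are hypothetical names, and the inclusive comparison (>=) is an assumption about where the limit falls.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class MaxUrls:
    max_urls: int

    @classmethod
    def from_params(cls, params: dict[str, Any]) -> "MaxUrls":
        # max_urls is required and has no default, so a missing key is an error.
        if "max_urls" not in params:
            raise ValueError("max_urls: missing required param 'max_urls'")
        value = params["max_urls"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise TypeError("max_urls: 'max_urls' must be an integer")
        return cls(value)

    def is_met(self, urls_processed: int) -> bool:
        # Assumed inclusive: processing the max_urls-th URL ends the run.
        return urls_processed >= self.max_urls


criterion = MaxUrls.from_params({"max_urls": 1000})
print(criterion.is_met(999), criterion.is_met(1000))
```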
stopping_criteria:
  - kind: max_urls
    params:
      max_urls: 1000
max_depth
Stops the run once the crawl reaches a maximum link distance from the seeds.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| max_depth | integer | yes | none | The depth, counted in links from a seed, at which the run stops. |
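To show how depth is counted in links from a seed, here is a breadth-first walk over an in-memory link graph. The links dict stands in for fetched pages, and crawl_until_depth is a hypothetical helper. The exact stopping point is also assumed: the run ends once a URL at max_depth has been processed.

```python
from collections import deque


def crawl_until_depth(
    links: dict[str, list[str]], seeds: list[str], max_depth: int
) -> list[tuple[str, int]]:
    """Return (url, depth) pairs in processing order, stopping at max_depth."""
    depth = {seed: 0 for seed in seeds}  # seeds are at depth 0
    queue = deque(depth)  # iterating the dict drops duplicate seeds
    processed: list[tuple[str, int]] = []
    while queue:
        url = queue.popleft()
        processed.append((url, depth[url]))
        if depth[url] >= max_depth:
            # Assumed semantics: stop once a URL at max_depth is processed.
            break
        for target in links.get(url, []):
            if target not in depth:
                depth[target] = depth[url] + 1  # one more link from a seed
                queue.append(target)
    return processed


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"]}
print(crawl_until_depth(graph, ["a"], max_depth=2))
```

Breadth-first order matters in this sketch. Every URL at depth d is processed before any URL at depth d + 1, so stopping at the first URL at max_depth leaves no shallower page unvisited.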
stopping_criteria:
  - kind: max_depth
    params:
      max_depth: 5
max_age
Stops the run after a maximum wall-clock duration since it started.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| max_age | duration | yes | none | The wall-clock time since the run started after which the run stops. Durations are written in a compact form such as 30m or 6h. |
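Here is a minimal sketch of parsing a max_age value and comparing it with elapsed wall-clock time. The accepted grammar is an assumption: an integer followed by one unit letter. Only m and h appear in this documentation, and s and d are included as a guess. parse_duration and max_age_met are hypothetical names.

```python
import re
from datetime import timedelta

# Assumed grammar: digits followed by a single unit letter.
# Only "m" and "h" are documented; "s" and "d" are guesses.
_DURATION = re.compile(r"(\d+)([smhd])")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> timedelta:
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(seconds=int(amount) * _UNIT_SECONDS[unit])


def max_age_met(started_at: float, now: float, max_age: str) -> bool:
    # Compare wall-clock time since the run started against the limit.
    return timedelta(seconds=now - started_at) >= parse_duration(max_age)


print(parse_duration("30m"), parse_duration("6h"))
```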
stopping_criteria:
  - kind: max_age
    params:
      max_age: 6h
max_empty_evaluations
Stops the run after a configured number of consecutive scheduling evaluations yield no new work, which indicates that the crawl has run out of URLs to visit.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| max_empty_evaluations | integer | yes | none | The number of consecutive empty evaluations after which the run stops. |
stopping_criteria:
  - kind: max_empty_evaluations
    params:
      max_empty_evaluations: 10
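The word "consecutive" implies a streak counter that resets whenever an evaluation finds work. Here is a minimal sketch of that counter. MaxEmptyEvaluations and record are hypothetical names, and counting new work as an integer per evaluation is an assumption.

```python
from dataclasses import dataclass


@dataclass
class MaxEmptyEvaluations:
    """Hypothetical tracker for consecutive scheduling evaluations with no new work."""

    max_empty_evaluations: int
    consecutive_empty: int = 0

    def record(self, new_work_items: int) -> bool:
        """Record one evaluation and return True once the run should stop."""
        if new_work_items == 0:
            self.consecutive_empty += 1
        else:
            # Any evaluation that finds work breaks the streak.
            self.consecutive_empty = 0
        return self.consecutive_empty >= self.max_empty_evaluations


tracker = MaxEmptyEvaluations(max_empty_evaluations=3)
print([tracker.record(n) for n in [0, 0, 5, 0, 0, 0]])
```

With a limit of 3, the two empty evaluations before the one that found 5 items do not count. Only the final unbroken run of three empty evaluations stops the run.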