Guards
Early-exit checks evaluated before a parser pipeline runs.
Guards run before anything else in a pipeline and can short-circuit processing of a page. A pipeline lists guards under its guards field. This page catalogs the available kinds and their parameters.
How guards are configured
Each entry in a pipeline's guards list is an object with a kind and a params object. Guards are evaluated before the pipeline's steps run. Every guard must pass for the pipeline to proceed; if any guard fails, the pipeline is skipped for that page. This lets a pipeline cheaply opt out of pages it should not process.
pipelines:
  - guards:
      - kind: expression
        params:
          expression: "content_type.subtype == 'html'"
    steps:
      - kind: extractor
        params:
          kind: markdown
expression
Evaluates a CEL (Common Expression Language) expression and proceeds only when it returns true.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| expression | string | yes | none | A CEL expression that must evaluate to a boolean. The pipeline proceeds when it is true. |
The expression has these variables available: url (the URL broken into its parts), content_type (the response content type), redirects (the list of redirects that were followed), and run_id (the current run identifier). The expression must evaluate to a boolean; any other result is an error.
guards:
  - kind: expression
    params:
      expression: "content_type.subtype == 'html'"
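
Because every guard must pass, a pipeline can list several expression guards and each one acts as an additional condition. The sketch below is illustrative only: it assumes the standard CEL size() function is available in this environment and that an empty redirects list means no redirect was followed. Only content_type.subtype is taken from the examples on this page.

```yaml
guards:
  # Guard 1: only process HTML responses.
  - kind: expression
    params:
      expression: "content_type.subtype == 'html'"
  # Guard 2 (assumption: size() is supported): skip pages reached via a redirect.
  - kind: expression
    params:
      expression: "size(redirects) == 0"
```

The same pass/fail outcome can likely be written as a single guard with the expression "content_type.subtype == 'html' && size(redirects) == 0". Separate guards keep each condition readable on its own, while a single expression keeps the list short.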