Sink actions
The actions that deliver parsed crawl results to their destinations.
Actions are the final step of the pipeline; they decide what happens to each parsed result. A pipeline lists actions under its actions field. This page catalogs the available kinds and their parameters.
How actions are configured
Each entry in a pipeline's actions list is an object with a kind and, for most kinds, a params object. Each result is passed to every action in the order they are listed, so a pipeline can deliver a result to more than one destination.
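The delivery order described above can be pictured as a simple loop. This is a minimal sketch, not the sink's implementation. `Action`, `deliver`, and the handler callables are hypothetical names chosen for illustration. The sketch assumes every action runs for every result and does not model what happens when an action fails.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical handler signature: (params, result) -> None
Handler = Callable[[Dict[str, Any], Dict[str, Any]], None]


@dataclass
class Action:
    kind: str
    params: Dict[str, Any] = field(default_factory=dict)


def deliver(result: Dict[str, Any], actions: List[Action], handlers: Dict[str, Handler]) -> None:
    """Pass one parsed result to every configured action, in list order."""
    for action in actions:
        handlers[action.kind](action.params, result)


# Usage: record the order in which the two actions from the example see a result.
calls: List[str] = []
handlers: Dict[str, Handler] = {
    "to_blob": lambda params, result: calls.append(f"to_blob:{params['directory']}"),
    "label_url": lambda params, result: calls.append(f"label_url:{result['url']}"),
}
actions = [
    Action("to_blob", {"directory": "output"}),
    Action("label_url", {"labels": {"delivered": "true"}}),
]
deliver({"url": "https://example.com/"}, actions, handlers)
print(calls)  # ['to_blob:output', 'label_url:https://example.com/']
```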
```yaml
pipelines:
  - actions:
      - kind: to_blob
        params:
          directory: output
      - kind: label_url
        params:
          labels:
            delivered: "true"
```
log
Emits a log line for each result instead of delivering it anywhere. It is a development aid for confirming that a pipeline produces results.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| level | string | no | info | The log level: trace, debug, info, warn, or error. |
| event | string | no | DataExtracted | The event name on the emitted log line, useful for filtering logs. |
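The defaults and allowed values in this table can be expressed as a small validation step. This is a sketch only. `resolve_log_params` is a hypothetical helper. The table does not say what the sink does with an unknown level, so the choice to reject one (rather than fall back to info) is an assumption here.

```python
from typing import Any, Dict, Optional

# The levels listed in the table above.
VALID_LEVELS = ("trace", "debug", "info", "warn", "error")


def resolve_log_params(params: Optional[Dict[str, Any]] = None) -> Dict[str, str]:
    """Apply the documented defaults for the log action and reject unknown levels."""
    params = params or {}
    level = params.get("level", "info")
    event = params.get("event", "DataExtracted")
    # Assumption: an unknown level is an error rather than silently defaulted.
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported log level: {level!r}")
    return {"level": level, "event": event}


print(resolve_log_params())                   # {'level': 'info', 'event': 'DataExtracted'}
print(resolve_log_params({"level": "warn"}))  # {'level': 'warn', 'event': 'DataExtracted'}
```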
```yaml
actions:
  - kind: log
    params:
      level: info
      event: ResultProduced
```
to_blob
Writes each result to blob storage. The result is stored at <directory>/<key>/data, where <key> is determined by key_strategy.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| directory | string | no | empty | A prefix applied to every key written, grouping a job's output under one path. |
| key_strategy | hash or 5min | no | hash | How the per-result key is chosen. hash derives the key from a hash of the URL, so a later crawl of the same URL overwrites the earlier copy. 5min places the key under a five-minute timestamp bucket (YYYYMMDD/HH_MM/<url-hash>), so earlier copies are preserved. |
| include_assets | boolean | no | true | Whether to also write assets resolved during parsing, stored under <directory>/<key>/assets/<asset-type>/<asset-hash>. |
| storage_identifier | string | no | none | The configured blob backend to write to. When omitted, the deployment's default backend is used. |
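The two key strategies and the resulting paths can be sketched as follows. Several details here are assumptions, not documented behavior. The URL hash is taken to be a SHA-256 hex digest (the actual hash function is not stated). Timestamp buckets are computed in UTC. An empty directory adds no prefix and no leading slash. `url_hash`, `result_key`, `data_path`, and `asset_path` are hypothetical helpers.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def url_hash(url: str) -> str:
    # Assumption: SHA-256 hex digest; the sink's real hash function may differ.
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def result_key(url: str, strategy: str = "hash", now: Optional[datetime] = None) -> str:
    if strategy == "hash":
        # Same URL -> same key, so a later crawl overwrites the earlier copy.
        return url_hash(url)
    if strategy == "5min":
        # Assumption: buckets are computed in UTC.
        now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        bucket = now.minute - now.minute % 5  # round down to the five-minute mark
        return f"{now:%Y%m%d}/{now.hour:02d}_{bucket:02d}/{url_hash(url)}"
    raise ValueError(f"unknown key_strategy: {strategy!r}")


def data_path(directory: str, key: str) -> str:
    # Assumption: an empty directory (the default) means no prefix at all.
    return f"{directory}/{key}/data" if directory else f"{key}/data"


def asset_path(directory: str, key: str, asset_type: str, asset_hash: str) -> str:
    prefix = f"{directory}/{key}" if directory else key
    return f"{prefix}/assets/{asset_type}/{asset_hash}"


t = datetime(2024, 5, 1, 13, 47, tzinfo=timezone.utc)
key = result_key("https://example.com/p/1", "5min", now=t)
print(key.split("/")[:2])  # ['20240501', '13_45']
```

The trade-off between the strategies is visible in `result_key`. The hash key depends only on the URL, so storage stays bounded. The 5min key adds a time component, so each crawl within a new five-minute window writes a separate copy.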
```yaml
actions:
  - kind: to_blob
    params:
      directory: output
      key_strategy: hash
      include_assets: true
```
to_file
Writes each result to a directory on the machine running the sink, using the same data and assets layout as to_blob. The destination directory is set in the sink's configuration, not in the manifest.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| include_assets | boolean | no | true | Whether to also write assets resolved during parsing. |
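The shared data and assets layout on local disk can be sketched with pathlib. This page does not say how to_file chooses the key, so the sketch takes the key as an input. `root` stands in for the directory set in the sink's configuration. `write_result` and the `(asset_type, asset_hash)` mapping are hypothetical, not the sink's API.

```python
import tempfile
from pathlib import Path
from typing import List, Mapping, Tuple


def write_result(root: Path, key: str, data: bytes,
                 assets: Mapping[Tuple[str, str], bytes],
                 include_assets: bool = True) -> List[Path]:
    """Write one result as <root>/<key>/data (plus assets) and return the paths written."""
    base = root / key
    base.mkdir(parents=True, exist_ok=True)
    data_file = base / "data"
    data_file.write_bytes(data)
    written = [data_file]
    if include_assets:
        # Each asset goes to <root>/<key>/assets/<asset-type>/<asset-hash>.
        for (asset_type, asset_hash), content in assets.items():
            asset_file = base / "assets" / asset_type / asset_hash
            asset_file.parent.mkdir(parents=True, exist_ok=True)
            asset_file.write_bytes(content)
            written.append(asset_file)
    return written


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    assets = {("image", "f00d"): b"\x89PNG"}
    without_assets = [p.relative_to(root).as_posix()
                      for p in write_result(root, "abc123", b"<parsed/>", assets, include_assets=False)]
    with_assets = [p.relative_to(root).as_posix()
                   for p in write_result(root, "def456", b"<parsed/>", assets)]

print(without_assets)  # ['abc123/data']
print(with_assets)     # ['def456/data', 'def456/assets/image/f00d']
```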
```yaml
actions:
  - kind: to_file
    params:
      include_assets: true
```
label_url
Writes key-value labels onto the URL's record for each result. The labels persist on the URL and can later be read by label-based filters and conditions.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| labels | map of string to string | yes | none | The labels to write onto the URL. |
```yaml
actions:
  - kind: label_url
    params:
      labels:
        category: product
        extracted: "true"
```
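How labels persist and are later read by a filter can be sketched with an in-memory store. This is illustrative only. `url_labels`, `label_url`, and `has_label` are hypothetical stand-ins for the URL record store and a label-based filter. Merging new labels over existing ones with the same key is an assumption.

```python
from typing import Dict

# Hypothetical in-memory stand-in for the labels stored on each URL record.
url_labels: Dict[str, Dict[str, str]] = {}


def label_url(url: str, labels: Dict[str, str]) -> None:
    # Assumption: new labels are merged in, replacing existing values for the same key.
    url_labels.setdefault(url, {}).update(labels)


def has_label(url: str, key: str, value: str) -> bool:
    """A hypothetical label-based filter: does the URL carry key=value?"""
    return url_labels.get(url, {}).get(key) == value


label_url("https://example.com/p/1", {"category": "product", "extracted": "true"})
print(has_label("https://example.com/p/1", "category", "product"))  # True
print(has_label("https://example.com/p/2", "category", "product"))  # False
```

Because labels is a map of string to string, the example manifest quotes "true". Unquoted, YAML would read true as a boolean rather than a string.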