
Sink actions

The actions that deliver parsed crawl results to their destinations.

Actions are the final step of the pipeline; they decide what happens to each parsed result. A pipeline lists actions under its actions field. This page catalogs the available kinds and their parameters.

How actions are configured

Each entry in a pipeline's actions list is an object with a kind and, for most kinds, a params object. Each result is passed to every action, in the order the actions are listed, so a pipeline can deliver a result to more than one destination.

pipelines:
  - actions:
      - kind: to_blob
        params:
          directory: output
      - kind: label_url
        params:
          labels:
            delivered: "true"

log

Emits a log line for each result instead of delivering it anywhere. It is a development aid for confirming a pipeline produces results.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| level | string | no | info | The log level: trace, debug, info, warn, or error. |
| event | string | no | DataExtracted | The event name on the emitted log line, useful for filtering logs. |

actions:
  - kind: log
    params:
      level: info
      event: ResultProduced

to_blob

Writes each result to blob storage. The result is stored at <directory>/<key>/data, where the key is set by the key strategy.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| directory | string | no | empty | A prefix applied to every key written, grouping a job's output under one path. |
| key_strategy | hash or 5min | no | hash | How the per-result key is chosen. hash keys by a hash of the URL, so a later crawl overwrites the earlier copy. 5min keys under a five-minute timestamp bucket (YYYYMMDD/HH_MM/<url-hash>), preserving history. |
| include_assets | boolean | no | true | Whether to also write assets resolved during parsing, stored under <directory>/<key>/assets/<asset-type>/<asset-hash>. |
| storage_identifier | string | no | none | The configured blob backend to write to. Defaults to the deployment's default backend. |

actions:
  - kind: to_blob
    params:
      directory: output
      key_strategy: hash
      include_assets: true
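
As an illustration of the history-preserving strategy, the sketch below switches key_strategy to 5min and skips assets. It uses only the fields documented for to_blob; the directory value and the archive backend name are hypothetical, and the path in the comments is derived from the <directory>/<key>/data layout described above rather than from a verified run.

```yaml
actions:
  - kind: to_blob
    params:
      directory: snapshots           # hypothetical prefix
      key_strategy: 5min             # key under a five-minute timestamp bucket
      include_assets: false          # write only the data object, no assets
      storage_identifier: archive    # hypothetical backend name; omit to use the default backend
      # Resulting object path, following the documented layout:
      #   snapshots/YYYYMMDD/HH_MM/<url-hash>/data
```

With the default hash strategy the same URL would instead map to a single key, so each crawl would replace the previous copy.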

to_file

Writes each result to a directory on the machine running the sink, using the same data and assets layout as to_blob. The destination directory is set in the sink's configuration, not in the manifest.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| include_assets | boolean | no | true | Whether to also write assets resolved during parsing. |

actions:
  - kind: to_file
    params:
      include_assets: true
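
For local development, to_file can be paired with log so that each result is both logged and written to disk. This is a sketch that combines only fields documented on this page: the event name is an arbitrary example, other pipeline fields are omitted, and the output location depends on the sink's own configuration, which is not shown here.

```yaml
pipelines:
  - actions:
      # Listed first, so it runs first: emit a debug line for each result.
      - kind: log
        params:
          level: debug
          event: LocalResult         # arbitrary example event name
      # Listed second: write the result to the sink's configured directory.
      - kind: to_file
        params:
          include_assets: false      # skip assets to keep local output small
```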

label_url

Writes key-value labels onto the URL's record for each result. The labels persist on the URL and can later be read by label-based filters and conditions.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| labels | map of string to string | yes | none | The labels to write onto the URL. |

actions:
  - kind: label_url
    params:
      labels:
        category: product
        extracted: "true"
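
Because labels is a map of string to string, values that YAML would otherwise read as booleans or numbers are safest written in quotes, as the example above does for "true". The sketch below shows the same idea with a numeric-looking value; the schema_version label is hypothetical, and whether unquoted values are rejected or coerced is not stated on this page.

```yaml
actions:
  - kind: label_url
    params:
      labels:
        extracted: "true"            # quoted: a string, not a YAML boolean
        schema_version: "2"          # hypothetical label; quoted: a string, not an integer
```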
