Seeds

The seed components that determine where a crawl begins.

Seeds provide the initial set of URLs a crawl starts from. A job lists one or more seeds under config.seeds, each selected by kind. This page catalogs the available kinds and their parameters.

How seeds are configured

Each entry in config.seeds is an object with a kind and, for most kinds, a params object. A job may list several seeds, and their URLs combine into the starting set:

config:
  seeds:
    - kind: static_list
      params:
        urls:
          - https://example.com/
    - kind: sitemap
      params:
        urls:
          - https://example.com/sitemap.xml

static_list

A fixed list of URLs supplied directly in the manifest.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | list of strings | yes | none | The URLs to start the crawl from. |

seeds:
  - kind: static_list
    params:
      urls:
        - https://example.com/docs
        - https://example.com/blog

sitemap

Seeds from the URLs listed in one or more sitemaps.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | list of strings | no | none | Sitemap URLs to read. |
| limit | integer | no | none | Maximum number of URLs to take from the sitemaps. |
| concurrency | integer | no | none | How many sitemaps to fetch in parallel. |

seeds:
  - kind: sitemap
    params:
      urls:
        - https://example.com/sitemap.xml
      limit: 500
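
The remaining field in the table, concurrency, only matters once more than one sitemap is listed. The fragment below is a sketch of that case; the second sitemap URL and the values 500 and 4 are assumptions chosen for illustration, not recommended settings:

```yaml
seeds:
  - kind: sitemap
    params:
      urls:
        - https://example.com/sitemap.xml
        # Hypothetical second sitemap, added only to show parallel fetching.
        - https://example.com/blog/sitemap.xml
      # Cap on the URLs taken from the listed sitemaps.
      limit: 500
      # Number of sitemaps fetched in parallel (illustrative value).
      concurrency: 4
```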

host_labels

Seeds from hosts that carry the given labels. Labels are key-value tags previously attached to host records, so this kind seeds from whichever known hosts match the given labels.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| labels | map of string to string | yes | none | Labels a host must carry to be selected. |
| limit | integer | no | none | Maximum number of hosts to seed from. |

seeds:
  - kind: host_labels
    params:
      labels:
        tier: priority
      limit: 100

host_labels_sitemap

Selects hosts by label, then seeds from each selected host's sitemap. It combines host_labels selection with sitemap seeding.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| labels | map of string to string | no | none | Labels a host must carry to be selected. |
| host_limit | integer | no | none | Maximum number of hosts to select. |
| link_limit | integer | no | none | Maximum number of URLs to take per host sitemap. |
| concurrency | integer | no | none | How many host sitemaps to fetch in parallel. |

seeds:
  - kind: host_labels_sitemap
    params:
      labels:
        tier: priority
      host_limit: 50
      link_limit: 200
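
Because host_limit and link_limit apply at different levels, it can help to see all four fields together. The fragment below is a sketch; the concurrency value is an assumption chosen for illustration:

```yaml
seeds:
  - kind: host_labels_sitemap
    params:
      labels:
        tier: priority
      # At most 50 hosts are selected.
      host_limit: 50
      # At most 200 URLs are taken from each selected host's sitemap.
      link_limit: 200
      # Number of host sitemaps fetched in parallel (illustrative value).
      concurrency: 8
```

Assuming one sitemap is read per selected host, these limits bound the seed set at 50 × 200 = 10,000 URLs.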
