Crawl logs

Query per-URL crawl log entries, metrics, and latency for a run.

Crawl logs record each URL's passage through the pipeline during a run. The metrics and latency endpoints aggregate those entries into buckets over time. All three endpoints are scoped to a single run by its {run_id} path segment.

These endpoints follow the API's shared pagination and error conventions.

List crawl log entries

GET /v1/runs/{run_id}/logs

Returns a page of crawl log entries for a run, most recent first.

Path parameters (in: path)
run_id
string, required

The ID of the run to read crawl log entries for.

format: uuid
Query parameters (in: query)
page
string

Pagination cursor. See the pagination conventions.

per_page
number, default: 20

Maximum number of entries to return.

min: 1, max: 100
hosts
string[]

Return only entries for these hosts.

steps
string[]

Return only entries recorded at these pipeline steps.

statuses
string[]

Return only entries with these outcomes.

error_reasons
string[]

Return only entries with these error reasons.

pipeline_ids
string[]

Return only entries for these pipelines.

occurred_after
string

Return only entries that occurred after this timestamp.

format: date-time
occurred_before
string

Return only entries that occurred before this timestamp.

format: date-time
Responses

The shape of each crawl log entry is:

Crawl log entry
id
string

The entry ID.

format: uuid
run_id
string

The run the entry belongs to.

format: uuid
job_id
string

The job the run belongs to.

format: uuid
url
string

The URL the entry was recorded for.

url_hash
string

A stable hash of the URL.

host
string

The host of the URL.

step
string

The pipeline step the entry was recorded at.

status
string

The outcome of the entry.

pipeline_id
string

The pipeline that processed the URL, or absent when none applies.

error_reason
string

A short reason describing a failure, or absent on success.

error_detail
string

A longer description of a failure, or absent on success.

http_status
number

The HTTP status returned for the URL, or absent when none applies.

latency_ms
number

The time the step took in milliseconds, or absent when not measured.

depth
number

The crawl depth of the URL, or absent when not tracked.

parent_url_hash
string

The hash of the URL this one was discovered from, or absent for seeds.

occurred_at
string

When the entry was recorded.

format: date-time
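
The field list above can be mirrored in a small typed record on the client side. The following is a minimal sketch, not part of the API: it assumes the response body is JSON and that timestamps are RFC 3339 strings, and the names `CrawlLogEntry` and `parse_entry` are hypothetical. Fields the list describes as possibly absent become `None`; the remaining fields are treated as always present only because the list does not say otherwise.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Mapping, Optional

# Fields documented as "or absent ..." in the field list.
OPTIONAL_FIELDS = (
    "pipeline_id", "error_reason", "error_detail",
    "http_status", "latency_ms", "depth", "parent_url_hash",
)


@dataclass(frozen=True)
class CrawlLogEntry:  # hypothetical client-side type
    id: str
    run_id: str
    job_id: str
    url: str
    url_hash: str
    host: str
    step: str
    status: str
    occurred_at: datetime
    # None stands in for "absent".
    pipeline_id: Optional[str] = None
    error_reason: Optional[str] = None
    error_detail: Optional[str] = None
    http_status: Optional[int] = None
    latency_ms: Optional[float] = None
    depth: Optional[int] = None
    parent_url_hash: Optional[str] = None


def parse_entry(raw: Mapping[str, Any]) -> CrawlLogEntry:
    """Build a CrawlLogEntry from one decoded JSON object."""
    # Assumption: occurred_at is an RFC 3339 string. A trailing "Z" is
    # rewritten because datetime.fromisoformat rejects it before Python 3.11.
    occurred_at = datetime.fromisoformat(raw["occurred_at"].replace("Z", "+00:00"))
    required = {
        name: raw[name]
        for name in ("id", "run_id", "job_id", "url", "url_hash", "host", "step", "status")
    }
    optional = {name: raw.get(name) for name in OPTIONAL_FIELDS}
    return CrawlLogEntry(occurred_at=occurred_at, **required, **optional)
```

Using `raw.get` for the optional fields means a missing key and an explicit JSON `null` both end up as `None`, so callers only have one "absent" case to handle.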
curl 'http://localhost:8022/v1/runs/3f1a…/logs?per_page=20' \
  -H 'X-Tenant-Id: acme'
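
For requests that combine several filters, building the query string in code avoids escaping mistakes. This is a sketch under stated assumptions: the helper name `logs_url` is hypothetical, the base URL is taken from the curl example above, and array parameters such as `hosts` are assumed to be sent as repeated keys, which this page does not confirm. Only a subset of the documented filters is shown.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8022"  # host and port from the curl example


def logs_url(
    run_id: str,
    *,
    per_page: int = 20,
    page: Optional[str] = None,
    hosts: Sequence[str] = (),
    statuses: Sequence[str] = (),
    occurred_after: Optional[datetime] = None,
    occurred_before: Optional[datetime] = None,
) -> str:
    """Return the URL for one page of crawl log entries (hypothetical helper)."""
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    params = [("per_page", str(per_page))]
    if page is not None:
        params.append(("page", page))
    # Assumption: string[] parameters are encoded as repeated keys.
    params += [("hosts", h) for h in hosts]
    params += [("statuses", s) for s in statuses]
    for name, value in (
        ("occurred_after", occurred_after),
        ("occurred_before", occurred_before),
    ):
        if value is not None:
            if value.tzinfo is None:
                raise ValueError(f"{name} must be timezone-aware")
            # Send timestamps in UTC, ISO 8601 / RFC 3339 style.
            params.append((name, value.astimezone(timezone.utc).isoformat()))
    return f"{BASE_URL}/v1/runs/{quote(run_id, safe='')}/logs?{urlencode(params)}"
```

The helper only builds the URL. The request itself still needs the `X-Tenant-Id` header shown in the curl example.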

Run metrics

GET /v1/runs/{run_id}/metrics

Returns aggregated counts bucketed over time for a run.

Path parameters (in: path)
run_id
string, required

The ID of the run to aggregate metrics for.

format: uuid
Query parameters (in: query)
window
string, required

The bucket size. One of minute, hour, day, or week.

group_by
string[]

Dimensions to group the counts by.

filters
object

Dimension name to value pairs that restrict which entries are counted.

occurred_after
string

Include only entries that occurred after this timestamp.

format: date-time
occurred_before
string

Include only entries that occurred before this timestamp.

format: date-time
Responses

curl 'http://localhost:8022/v1/runs/3f1a…/metrics?window=hour&group_by=status' \
  -H 'X-Tenant-Id: acme'
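
To make "counts bucketed over time" concrete, the sketch below reproduces the idea locally for `window` combined with `group_by=status`. It illustrates the concept and is not the server's implementation: the function names are hypothetical, buckets are assumed to be aligned to UTC, and weeks are assumed to start on Monday. None of these are stated on this page.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple

WINDOWS = ("minute", "hour", "day", "week")  # values accepted by `window`


def bucket_start(ts: datetime, window: str) -> datetime:
    """Floor a timezone-aware timestamp to the start of its bucket, in UTC."""
    if window not in WINDOWS:
        raise ValueError(f"window must be one of {WINDOWS}")
    ts = ts.astimezone(timezone.utc).replace(second=0, microsecond=0)
    if window == "minute":
        return ts
    ts = ts.replace(minute=0)
    if window == "hour":
        return ts
    ts = ts.replace(hour=0)
    if window == "day":
        return ts
    # Assumption: a week bucket starts on Monday at 00:00 UTC.
    return ts - timedelta(days=ts.weekday())


def count_by_status(
    entries: Iterable[Tuple[datetime, str]], window: str
) -> Counter:
    """Count (occurred_at, status) pairs per (bucket start, status)."""
    return Counter((bucket_start(ts, window), status) for ts, status in entries)
```

Each key of the returned counter pairs a bucket start with one value of the grouped dimension, which is the general form a bucketed, grouped count takes.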

Step latency

GET /v1/runs/{run_id}/latency

Returns step latency percentiles bucketed over time for a run.

Path parameters (in: path)
run_id
string, required

The ID of the run to aggregate latency for.

format: uuid
Query parameters (in: query)
window
string, required

The bucket size. One of minute, hour, day, or week.

steps
string[]

Include only these pipeline steps.

hosts
string[]

Include only these hosts.

occurred_after
string

Include only entries that occurred after this timestamp.

format: date-time
occurred_before
string

Include only entries that occurred before this timestamp.

format: date-time
Responses

curl 'http://localhost:8022/v1/runs/3f1a…/latency?window=hour' \
  -H 'X-Tenant-Id: acme'
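
As an illustration of what a step latency percentile summarizes, the sketch below computes percentiles per step from `latency_ms` values, skipping entries where the value is absent. It is a local approximation under assumptions: the function names are hypothetical, the nearest-rank method is one common definition and may differ from what the server uses, and the percentiles chosen (50 and 95) are examples only. Time bucketing is left out for brevity.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def nearest_rank(sorted_values: Sequence[float], pct: int) -> float:
    """Return the nearest-rank percentile of an ascending, non-empty sequence."""
    if not sorted_values:
        raise ValueError("sorted_values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(sorted_values) + 99) // 100
    return sorted_values[rank - 1]


def latency_percentiles(
    entries: Iterable[Tuple[str, Optional[float]]],
    pcts: Sequence[int] = (50, 95),  # example percentiles, not from the API
) -> Dict[str, Dict[int, float]]:
    """Group (step, latency_ms) pairs by step and compute percentiles."""
    by_step: Dict[str, List[float]] = {}
    for step, latency_ms in entries:
        if latency_ms is not None:  # latency_ms is absent when not measured
            by_step.setdefault(step, []).append(latency_ms)
    return {
        step: {p: nearest_rank(sorted(values), p) for p in pcts}
        for step, values in by_step.items()
    }
```

Steps whose entries all lack `latency_ms` do not appear in the result, since there is nothing to rank.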