Crawl logs
Query per-URL crawl log entries, metrics, and latency for a run.
Crawl logs record each URL's passage through the pipeline during a run. The metrics and latency endpoints aggregate those entries into buckets over time. All three endpoints are scoped to a single run by its {run_id} path segment.
These endpoints follow the API's common pagination and error conventions.
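Every endpoint in this section is addressed by the run's ID, so a client can build all three requests the same way. The standard-library Python sketch below builds a request but does not send it. `build_run_request` is a hypothetical helper. The base URL and the `X-Tenant-Id` header come from the curl examples in this section. Before using `run_id`, the helper checks it against the documented uuid format.

```python
import uuid
from urllib.request import Request

# The three run-scoped endpoints described in this section.
RUN_ENDPOINTS = ("logs", "metrics", "latency")


def build_run_request(base_url: str, run_id: str, endpoint: str,
                      tenant: str, query: str = "") -> Request:
    """Build (but do not send) a GET request for a run-scoped endpoint.

    Hypothetical helper; base_url and tenant mirror the curl examples.
    """
    if endpoint not in RUN_ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    # run_id is documented as format: uuid. uuid.UUID raises ValueError on
    # malformed input, and str() yields the canonical lowercase form.
    canonical_id = str(uuid.UUID(run_id))
    url = f"{base_url.rstrip('/')}/v1/runs/{canonical_id}/{endpoint}"
    if query:
        url += "?" + query  # query is expected to be already URL-encoded
    return Request(url, headers={"X-Tenant-Id": tenant}, method="GET")
```

To send the request, pass the returned `Request` to `urllib.request.urlopen`.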
List crawl log entries
GET /v1/runs/{run_id}/logs
Returns a page of crawl log entries for a run, most recent first.
- run_id string (required)
The ID of the run to read crawl log entries for.
format: uuid
- page string
Pagination cursor; see the API's pagination conventions.
- per_page number (default: 20)
Maximum number of entries to return.
min: 1, max: 100
- hosts string[]
Return only entries for these hosts.
- steps string[]
Return only entries recorded at these pipeline steps.
- statuses string[]
Return only entries with these outcomes.
- error_reasons string[]
Return only entries with these error reasons.
- pipeline_ids string[]
Return only entries for these pipelines.
- occurred_after string
Return only entries that occurred after this timestamp.
format: date-time
- occurred_before string
Return only entries that occurred before this timestamp.
format: date-time
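The following Python sketch encodes the list parameters as a query string. It enforces the documented `per_page` bounds and rejects timestamps that carry no timezone. Several parts are assumptions and are not stated in this reference:

- Array filters are sent by repeating the key once per value.
- `date-time` values are accepted as RFC 3339 timestamps normalized to UTC.
- The check that `occurred_after` precedes `occurred_before` is this sketch's own choice.

`logs_query` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional
from urllib.parse import urlencode


def logs_query(per_page: int = 20,
               page: Optional[str] = None,
               hosts: Iterable[str] = (),
               steps: Iterable[str] = (),
               statuses: Iterable[str] = (),
               error_reasons: Iterable[str] = (),
               pipeline_ids: Iterable[str] = (),
               occurred_after: Optional[datetime] = None,
               occurred_before: Optional[datetime] = None) -> str:
    """Encode filters for /v1/runs/{run_id}/logs as a query string."""
    # Mirror the documented per_page bounds before making a request.
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    params = [("per_page", per_page)]
    if page is not None:
        params.append(("page", page))  # cursor, passed through unchanged
    # Assumption: array filters repeat the key once per value.
    arrays = (("hosts", hosts), ("steps", steps), ("statuses", statuses),
              ("error_reasons", error_reasons), ("pipeline_ids", pipeline_ids))
    for key, values in arrays:
        params.extend((key, value) for value in values)
    # Assumption: date-time values are RFC 3339 timestamps, sent here in UTC.
    for key, ts in (("occurred_after", occurred_after),
                    ("occurred_before", occurred_before)):
        if ts is not None:
            if ts.tzinfo is None:
                raise ValueError(f"{key} must be timezone-aware")
            params.append((key, ts.astimezone(timezone.utc).isoformat()))
    # Client-side sanity check; a choice of this sketch, not a documented rule.
    if (occurred_after is not None and occurred_before is not None
            and occurred_after >= occurred_before):
        raise ValueError("occurred_after must be earlier than occurred_before")
    return urlencode(params)
```

For example, a call with `hosts=["a.example", "b.example"]` yields `hosts=a.example&hosts=b.example` in the result. `urlencode` also percent-encodes the `:` and `+` characters in the timestamps.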
The shape of each crawl log entry is:
- id string
The entry ID.
format: uuid
- run_id string
The run the entry belongs to.
format: uuid
- job_id string
The job the run belongs to.
format: uuid
- url string
The URL the entry was recorded for.
- url_hash string
A stable hash of the URL.
- host string
The host of the URL.
- step string
The pipeline step the entry was recorded at.
- status string
The outcome of the entry.
- pipeline_id string
The pipeline that processed the URL, or absent when none applies.
- error_reason string
A short reason describing a failure, or absent on success.
- error_detail string
A longer description of a failure, or absent on success.
- http_status number
The HTTP status returned for the URL, or absent when none applies.
- latency_ms number
The time the step took in milliseconds, or absent when not measured.
- depth number
The crawl depth of the URL, or absent when not tracked.
- parent_url_hash string
The hash of the URL this one was discovered from, or absent for seeds.
- occurred_at string
When the entry was recorded.
format: date-time
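The entry shape maps naturally onto a Python dataclass. The sketch below does two things. It parses one decoded JSON object into an entry, and it uses `parent_url_hash` to walk an entry back toward its seed URL. It rests on three assumptions:

- Fields that the reference does not describe as possibly absent are always present.
- `occurred_at` is an RFC 3339 timestamp.
- Unknown keys can be ignored.

`CrawlLogEntry` and `discovery_chain` are hypothetical names. The walk can only follow parents that appear in the entries you pass in, such as one page of results.

```python
from dataclasses import dataclass, fields as dc_fields
from datetime import datetime
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class CrawlLogEntry:
    # Fields the reference does not describe as possibly absent.
    id: str
    run_id: str
    job_id: str
    url: str
    url_hash: str
    host: str
    step: str
    status: str
    occurred_at: datetime
    # Fields that are absent from an entry when they do not apply.
    pipeline_id: Optional[str] = None
    error_reason: Optional[str] = None
    error_detail: Optional[str] = None
    http_status: Optional[int] = None
    latency_ms: Optional[float] = None
    depth: Optional[int] = None
    parent_url_hash: Optional[str] = None

    @classmethod
    def from_json(cls, obj: dict) -> "CrawlLogEntry":
        """Build an entry from one decoded JSON object, ignoring unknown keys."""
        known = {f.name for f in dc_fields(cls)}
        data = {key: value for key, value in obj.items() if key in known}
        # Python < 3.11's fromisoformat rejects a trailing "Z", so normalize it.
        stamp = data.pop("occurred_at").replace("Z", "+00:00")
        data["occurred_at"] = datetime.fromisoformat(stamp)
        return cls(**data)


def discovery_chain(entry: CrawlLogEntry,
                    entries: Iterable[CrawlLogEntry]) -> List[str]:
    """Follow parent_url_hash links back toward a seed; returns URLs seed-first."""
    by_hash: Dict[str, CrawlLogEntry] = {}
    for candidate in entries:
        # A URL can have several entries (one per step); any of them has its url.
        by_hash.setdefault(candidate.url_hash, candidate)
    chain = [entry.url]
    seen = {entry.url_hash}
    current = entry
    while current.parent_url_hash is not None:
        parent = by_hash.get(current.parent_url_hash)
        if parent is None or parent.url_hash in seen:
            break  # parent not among the given entries, or a cycle
        chain.append(parent.url)
        seen.add(parent.url_hash)
        current = parent
    chain.reverse()
    return chain
```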
curl 'http://localhost:8022/v1/runs/3f1a…/logs?per_page=20' \
-H 'X-Tenant-Id: acme'
Run metrics
GET /v1/runs/{run_id}/metrics
Returns counts of crawl log entries for a run, bucketed over time.
- run_id string (required)
The ID of the run to aggregate metrics for.
format: uuid
- window string (required)
The bucket size. One of minute, hour, day, or week.
- group_by string[]
Dimensions to group the counts by, such as status.
- filters object
A map of dimension names to values that restricts which entries are counted.
- occurred_after string
Include only entries that occurred after this timestamp.
format: date-time
- occurred_before string
Include only entries that occurred before this timestamp.
format: date-time
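To make `window`, `group_by`, and `filters` concrete, the Python sketch below computes comparable counts on the client. Its input is crawl log entries as decoded JSON dicts. It illustrates the idea only and is not the server's implementation. It makes three assumptions:

- Buckets are aligned in UTC.
- Weeks start on Monday.
- An entry is counted only if it matches every filter pair.

`bucket_start` and `count_entries` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable, Mapping, Optional, Sequence

WINDOWS = ("minute", "hour", "day", "week")


def bucket_start(ts: datetime, window: str) -> datetime:
    """Truncate an aware timestamp to the start of its bucket (in UTC)."""
    if window not in WINDOWS:
        raise ValueError(f"unsupported window: {window!r}")
    ts = ts.astimezone(timezone.utc)  # assumption: buckets are aligned in UTC
    if window == "minute":
        return ts.replace(second=0, microsecond=0)
    if window == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if window == "day":
        return day
    # window == "week"; assumption: weeks start on Monday, as in ISO 8601.
    return day - timedelta(days=day.weekday())


def count_entries(entries: Iterable[Mapping], window: str,
                  group_by: Sequence[str] = (),
                  filters: Optional[Mapping[str, object]] = None) -> Counter:
    """Count entries per (bucket start, *group_by values)."""
    filters = filters or {}
    counts: Counter = Counter()
    for entry in entries:
        # Assumption: an entry must match every filter pair to be counted.
        if any(entry.get(name) != value for name, value in filters.items()):
            continue
        occurred = datetime.fromisoformat(
            entry["occurred_at"].replace("Z", "+00:00"))
        key = (bucket_start(occurred, window),) + tuple(
            entry.get(dim) for dim in group_by)
        counts[key] += 1
    return counts
```

Calling `count_entries(entries, "hour", group_by=["status"])` corresponds to the query `window=hour&group_by=status`.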
curl 'http://localhost:8022/v1/runs/3f1a…/metrics?window=hour&group_by=status' \
-H 'X-Tenant-Id: acme'
Step latency
GET /v1/runs/{run_id}/latency
Returns per-step latency percentiles for a run, bucketed over time.
- run_id string (required)
The ID of the run to aggregate latency for.
format: uuid
- window string (required)
The bucket size. One of minute, hour, day, or week.
- steps string[]
Include only these pipeline steps.
- hosts string[]
Include only these hosts.
- occurred_after string
Include only entries that occurred after this timestamp.
format: date-time
- occurred_before string
Include only entries that occurred before this timestamp.
format: date-time
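This reference does not say which percentiles the endpoint returns or how it computes them. The Python sketch below therefore assumes the nearest-rank definition and p50, p95, and p99. It computes per-step percentiles for the entries of a single bucket and applies the `steps` and `hosts` filters. Entries without `latency_ms` are skipped, because that field is absent when not measured. `nearest_rank` and `step_latency_percentiles` are hypothetical names, and the server may use a different method, such as interpolation.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional, Sequence


def nearest_rank(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of an ascending, non-empty sequence."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[rank - 1]


def step_latency_percentiles(
        entries: Iterable[Mapping],
        percentiles: Sequence[float] = (50, 95, 99),
        steps: Optional[Iterable[str]] = None,
        hosts: Optional[Iterable[str]] = None) -> Dict[str, Dict[float, float]]:
    """Per-step latency percentiles over the entries of one bucket."""
    step_set = set(steps) if steps is not None else None
    host_set = set(hosts) if hosts is not None else None
    by_step: Dict[str, List[float]] = defaultdict(list)
    for entry in entries:
        if entry.get("latency_ms") is None:
            continue  # latency_ms is absent when not measured
        if step_set is not None and entry["step"] not in step_set:
            continue
        if host_set is not None and entry["host"] not in host_set:
            continue
        by_step[entry["step"]].append(entry["latency_ms"])
    result = {}
    for step, values in by_step.items():
        values.sort()
        result[step] = {p: nearest_rank(values, p) for p in percentiles}
    return result
```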
curl 'http://localhost:8022/v1/runs/3f1a…/latency?window=hour' \
-H 'X-Tenant-Id: acme'