inndx
GitHub

Links

Read the URLs a crawl has seen, their labels, and the link graph of a run.

A link is a URL the crawl has encountered. Each link is addressed by its hash, a hex-encoded value used as a path segment and as an identifier.

These endpoints share the pagination and error conventions.

GET/v1/links

Returns a page of links the crawl has seen.

Query parametersin: query
page
string

Pagination cursor. See pagination.

per_page
numberdefault: 10

Maximum number of links to return.

min: 1max: 100
url_hashes
string[]

Return only links with these hashes.

urls
string[]

Return only links with these exact URLs.

host_hashes
string[]

Return only links whose host has one of these hashes.

hosts
string[]

Return only links whose host is one of these domains.

labels
object

Return only links whose labels match every supplied key and value.

Responses

curl 'http://localhost:8022/v1/links?per_page=20' \
  -H 'X-Tenant-Id: acme'
PUT/v1/links

Inserts or updates a batch of links by URL.

Request bodyin: body
urls
string[]required

The URLs to insert or update.

min: 1max: 255
Responses

curl -X PUT 'http://localhost:8022/v1/links' \
  -H 'X-Tenant-Id: acme' \
  -H 'Content-Type: application/json' \
  -d '{ "urls": ["https://example.com/", "https://example.com/about"] }'
GET/v1/links/{link_hash}

Returns a single link by its hash. The {link_hash} path segment is the URL's hash.

Responses

curl 'http://localhost:8022/v1/links/7c4b…' \
  -H 'X-Tenant-Id: acme'
GET/v1/links/{link_hash}/labels

Returns the labels for a link. The {link_hash} path segment is the URL's hash.

Responses

curl 'http://localhost:8022/v1/links/7c4b…/labels' \
  -H 'X-Tenant-Id: acme'
PUT/v1/links/{link_hash}/labels

Merges the supplied labels into a link's labels, replacing the value of any key that already exists.

Request bodyin: body
labels
objectrequired

String key and value pairs to set on the link.

Responses

curl -X PUT 'http://localhost:8022/v1/links/7c4b…/labels' \
  -H 'X-Tenant-Id: acme' \
  -H 'Content-Type: application/json' \
  -d '{ "labels": { "section": "docs" } }'
DELETE/v1/links/{link_hash}/labels

Deletes the named label keys from a link.

Query parametersin: query
keys
string[]

The label keys to delete.

Responses

curl -X DELETE 'http://localhost:8022/v1/links/7c4b…/labels?keys=section' \
  -H 'X-Tenant-Id: acme'
GET/v1/runs/{run_id}/links

Returns a page of links visited during a run. The {run_id} path segment is the run's UUID.

Query parametersin: query
page
string

Pagination cursor. See pagination.

per_page
numberdefault: 10

Maximum number of visited links to return.

min: 1max: 100
depth
number[]

Return only links visited at these crawl depths.

visited_before
string

Return only links visited before this timestamp.

format: date-time
visited_after
string

Return only links visited after this timestamp.

format: date-time
url_hashes
string[]

Return only links with these hashes.

urls
string[]

Return only links with these exact URLs.

host_hashes
string[]

Return only links whose host has one of these hashes.

hosts
string[]

Return only links whose host is one of these domains.

Responses

curl 'http://localhost:8022/v1/runs/3f1a…/links?per_page=20' \
  -H 'X-Tenant-Id: acme'

List a run's edges

GET/v1/runs/{run_id}/edges

Returns a page of parent and child link edges, the link graph discovered during a run. The {run_id} path segment is the run's UUID.

Query parametersin: query
page
string

Pagination cursor. See pagination.

per_page
numberdefault: 10

Maximum number of edges to return.

min: 1max: 100
url_hashes
string[]

Return only edges touching links with these hashes.

urls
string[]

Return only edges touching these exact URLs.

host_hashes
string[]

Return only edges touching links whose host has one of these hashes.

hosts
string[]

Return only edges touching links whose host is one of these domains.

Responses

curl 'http://localhost:8022/v1/runs/3f1a…/edges?per_page=20' \
  -H 'X-Tenant-Id: acme'

Search docs

Search the Self-host documentation