Links
Read the URLs a crawl has seen, their labels, and the link graph of a run.
A link is a URL the crawl has encountered. Each link is addressed by its hash, a hex-encoded value used as a path segment and as an identifier.
These endpoints share the pagination and error conventions.
List links
/v1/linksReturns a page of links the crawl has seen.
- page string
Pagination cursor. See pagination.
- per_page numberdefault: 10
Maximum number of links to return.
min: 1max: 100- url_hashes string[]
Return only links with these hashes.
- urls string[]
Return only links with these exact URLs.
- host_hashes string[]
Return only links whose host has one of these hashes.
- hosts string[]
Return only links whose host is one of these domains.
- labels object
Return only links whose labels match every supplied key and value.
curl 'http://localhost:8022/v1/links?per_page=20' \
-H 'X-Tenant-Id: acme'Upsert links
/v1/linksInserts or updates a batch of links by URL.
- urls string[]required
The URLs to insert or update.
min: 1max: 255
curl -X PUT 'http://localhost:8022/v1/links' \
-H 'X-Tenant-Id: acme' \
-H 'Content-Type: application/json' \
-d '{ "urls": ["https://example.com/", "https://example.com/about"] }'Get a link
/v1/links/{link_hash}Returns a single link by its hash. The {link_hash} path segment is the URL's
hash.
curl 'http://localhost:8022/v1/links/7c4b…' \
-H 'X-Tenant-Id: acme'Get a link's labels
/v1/links/{link_hash}/labelsReturns the labels for a link. The {link_hash} path segment is the URL's
hash.
curl 'http://localhost:8022/v1/links/7c4b…/labels' \
-H 'X-Tenant-Id: acme'Set a link's labels
/v1/links/{link_hash}/labelsMerges the supplied labels into a link's labels, replacing the value of any key that already exists.
- labels objectrequired
String key and value pairs to set on the link.
curl -X PUT 'http://localhost:8022/v1/links/7c4b…/labels' \
-H 'X-Tenant-Id: acme' \
-H 'Content-Type: application/json' \
-d '{ "labels": { "section": "docs" } }'Delete a link's labels
/v1/links/{link_hash}/labelsDeletes the named label keys from a link.
- keys string[]
The label keys to delete.
curl -X DELETE 'http://localhost:8022/v1/links/7c4b…/labels?keys=section' \
-H 'X-Tenant-Id: acme'List a run's visited links
/v1/runs/{run_id}/linksReturns a page of links visited during a run. The {run_id} path segment is
the run's UUID.
- page string
Pagination cursor. See pagination.
- per_page numberdefault: 10
Maximum number of visited links to return.
min: 1max: 100- depth number[]
Return only links visited at these crawl depths.
- visited_before string
Return only links visited before this timestamp.
format: date-time- visited_after string
Return only links visited after this timestamp.
format: date-time- url_hashes string[]
Return only links with these hashes.
- urls string[]
Return only links with these exact URLs.
- host_hashes string[]
Return only links whose host has one of these hashes.
- hosts string[]
Return only links whose host is one of these domains.
curl 'http://localhost:8022/v1/runs/3f1a…/links?per_page=20' \
-H 'X-Tenant-Id: acme'List a run's edges
/v1/runs/{run_id}/edgesReturns a page of parent and child link edges, the link graph discovered during
a run. The {run_id} path segment is the run's UUID.
- page string
Pagination cursor. See pagination.
- per_page numberdefault: 10
Maximum number of edges to return.
min: 1max: 100- url_hashes string[]
Return only edges touching links with these hashes.
- urls string[]
Return only edges touching these exact URLs.
- host_hashes string[]
Return only edges touching links whose host has one of these hashes.
- hosts string[]
Return only edges touching links whose host is one of these domains.
curl 'http://localhost:8022/v1/runs/3f1a…/edges?per_page=20' \
-H 'X-Tenant-Id: acme'