Parse

Run extraction pipelines against supplied content.

The parse endpoints run one or more pipelines over content you supply, so you can test extraction without crawling.

These endpoints follow the API's shared pagination and error conventions.

Parse a document

POST /v1/parse

Runs the supplied pipelines over a single document and returns the results.

Request body

url: The URL the content came from. Format: uri.
body: The base64-encoded content to parse.
content_type: The MIME type of the content.
redirects: The redirects that were followed to reach the content. Each entry has url, location, side ("server" or "client"), and type ("permanent" or "temporary").
pipelines: The pipelines to evaluate against the content. Each pipeline has identifier (string, default "default"), optional guards, an optional navigator, steps (defaults to a single extractor), an optional priority, and behavior (string, default "continue"). See the parser components and the crawl manifest for the full shape of guards, navigators, and steps.
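As a minimal sketch of how the fields above fit together, the following Python builds a request body from raw bytes. The helper name build_parse_request is hypothetical, and treating redirects and pipelines as optional is an assumption based on the curl example, which omits them. The pipeline shown sets only identifier and behavior and relies on the documented defaults for the rest.

```python
import base64
import json


def build_parse_request(url, content, content_type, redirects=None, pipelines=None):
    """Assemble a JSON-serializable body for POST /v1/parse (hypothetical helper)."""
    body = {
        "url": url,
        # The body field carries the raw content base64-encoded as text.
        "body": base64.b64encode(content).decode("ascii"),
        "content_type": content_type,
    }
    # Assumption: redirects and pipelines may be left out when not needed.
    if redirects:
        body["redirects"] = redirects
    if pipelines:
        body["pipelines"] = pipelines
    return body


request_body = build_parse_request(
    url="https://example.com/article",
    content=b"<html>...</html>",
    content_type="text/html",
    redirects=[{
        "url": "http://example.com/article",
        "location": "https://example.com/article",
        "side": "server",      # "server" or "client"
        "type": "permanent",   # "permanent" or "temporary"
    }],
    pipelines=[{"identifier": "default", "behavior": "continue"}],
)
print(json.dumps(request_body, indent=2))
```

The encoded body here is the same string used in the curl example, so the printed JSON can be passed to curl with -d unchanged.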

Responses

curl -X POST 'http://localhost:8022/v1/parse' \
  -H 'X-Tenant-Id: acme' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/article",
    "body": "PGh0bWw+Li4uPC9odG1sPg==",
    "content_type": "text/html"
  }'

Parse a batch

POST /v1/parse/batch

Runs pipelines over multiple documents in a single request. Per-item failures are returned as error items rather than failing the whole request.

Request body

items: The documents to parse. Each item has the same shape as the single parse request body. Minimum 1 item, maximum 64.
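Because a batch accepts between 1 and 64 items, a larger set of documents has to be split across several requests. The sketch below shows one way to do that on the client side. The helper name chunk_items and the example URLs are hypothetical, and sending the resulting bodies is left out.

```python
MAX_BATCH_ITEMS = 64  # documented upper bound on "items"


def chunk_items(items, size=MAX_BATCH_ITEMS):
    """Yield batch request bodies of at most `size` items each.

    An empty input yields nothing, so no request with zero items is produced.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield {"items": items[start:start + size]}


# Hypothetical documents, each shaped like a single parse request body.
documents = [
    {
        "url": f"https://example.com/article/{n}",
        "body": "PGh0bWw+Li4uPC9odG1sPg==",
        "content_type": "text/html",
    }
    for n in range(150)
]

batches = list(chunk_items(documents))
print([len(batch["items"]) for batch in batches])  # [64, 64, 22]
```

Each yielded dictionary can be serialized as the body of one POST /v1/parse/batch request.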

Responses

curl -X POST 'http://localhost:8022/v1/parse/batch' \
  -H 'X-Tenant-Id: acme' \
  -H 'Content-Type: application/json' \
  -d '{
    "items": [
      {
        "url": "https://example.com/article",
        "body": "PGh0bWw+Li4uPC9odG1sPg==",
        "content_type": "text/html"
      }
    ]
  }'

Parse

Parse a document

200: The parse results.

422: The request body could not be processed.

Parse a batch

200: The batch results.

422: The request body failed validation.
