Output formats
The available output formats and guidance on choosing among them.
The Scrape API returns a page in one or more formats per call. Specify what you want in the formats array of your request. If you omit formats, the API defaults to markdown.
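As a rough illustration of the formats array, the sketch below builds a request body in Python. The formats array and the kind key come from this page; the helper name build_scrape_request and the top-level url request field are assumptions made for the example, and no endpoint is called.

```python
import json


def build_scrape_request(url, kinds=None):
    """Build a request body for a scrape call.

    Hypothetical helper: the "url" request field is an assumption;
    only "formats" and "kind" are taken from the documentation.
    """
    body = {"url": url}
    if kinds:
        # One format object per requested kind, e.g. {"kind": "markdown"}.
        body["formats"] = [{"kind": kind} for kind in kinds]
    # When kinds is omitted, "formats" is left out and the API default applies.
    return body


if __name__ == "__main__":
    print(json.dumps(build_scrape_request("https://example.com", ["markdown", "html"]), indent=2))
```

Leaving formats out of the body relies on the documented default of markdown instead of restating it on every call.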
Markdown
The page converted to clean markdown. Navigation, ads, cookie banners, and other page chrome are stripped. Headings, lists, code blocks, and links are preserved.
{ "kind": "markdown" }
Use markdown when feeding content into an LLM, building a retrieval index, or in any case where the text structure matters more than the original markup.
Skipping tags
Pass a skip_tags array to strip specific HTML elements before the markdown conversion:
{ "kind": "markdown", "skip_tags": ["nav", "footer", "aside"] }
This is useful when a page has persistent elements, such as sidebars or related-article sections, that add noise to the output.
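A minimal sketch of building a markdown format entry with skip_tags in Python. The helper name markdown_format is hypothetical, and lowercasing and de-duplicating the tag names is a client-side convenience assumed here, not documented API behavior.

```python
def markdown_format(skip_tags=None):
    """Return a markdown format object, optionally with skip_tags.

    Hypothetical helper; only "kind" and "skip_tags" come from the docs.
    """
    fmt = {"kind": "markdown"}
    if skip_tags:
        # Assumption: normalise tag names client-side (trim, lowercase)
        # and drop duplicates while keeping the original order.
        cleaned = (tag.strip().lower() for tag in skip_tags)
        fmt["skip_tags"] = list(dict.fromkeys(cleaned))
    return fmt


if __name__ == "__main__":
    print(markdown_format(["nav", "footer", "aside"]))
```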
HTML
The page's article content as cleaned HTML, without the surrounding page chrome.
{ "kind": "html" }
Use HTML when you need the structural markup, want to do further DOM processing, or are passing the content to a system that can handle HTML natively.
JSON
The page content as a structured JSON representation.
{ "kind": "json" }
Use JSON when you need machine-readable structure rather than a document.
Binary
The raw page bytes, base64-encoded in the HTTP response. The SDKs decode the value for you.
{ "kind": "binary" }
Use binary when you need the original content before any parsing or conversion.
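If you call the HTTP API directly instead of going through an SDK, the base64 value has to be decoded by hand. The sketch below shows one way to do that in Python. It assumes the binary result carries its payload in a content field, as the markdown and HTML results do; the helper name decode_binary_result is hypothetical.

```python
import base64


def decode_binary_result(result):
    """Decode the base64 payload of a binary result into raw bytes.

    Assumption: the payload lives under "content", mirroring the
    other result kinds; check the actual response before relying on it.
    """
    if result.get("kind") != "binary":
        raise ValueError("expected a result with kind 'binary'")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(result["content"], validate=True)


if __name__ == "__main__":
    # Stand-in result built locally; not real API output.
    sample = {"kind": "binary", "content": base64.b64encode(b"<html></html>").decode("ascii")}
    print(decode_binary_result(sample))
```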
Requesting multiple formats
You can request more than one format in a single call. The response includes one result per format:
{
  "url": "https://example.com",
  "results": [
    { "kind": "markdown", "content": "..." },
    { "kind": "html", "content": "..." }
  ]
}
A single call that requests two formats is still billed as one unit at the base price, since the page is only fetched once.
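A multi-format response shaped like the one above can be indexed by kind so each format is easy to look up. This is a minimal Python sketch; the helper name results_by_kind is hypothetical, and the sample response is a stand-in built from the documented shape, not real API output.

```python
def results_by_kind(response):
    """Map each result's kind to its content.

    Assumes every entry in "results" has "kind" and "content" keys,
    as in the documented example response.
    """
    return {result["kind"]: result["content"] for result in response["results"]}


if __name__ == "__main__":
    # Stand-in response following the documented shape.
    response = {
        "url": "https://example.com",
        "results": [
            {"kind": "markdown", "content": "# Example"},
            {"kind": "html", "content": "<h1>Example</h1>"},
        ],
    }
    by_kind = results_by_kind(response)
    print(by_kind["markdown"])
    print(by_kind["html"])
```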