Resolvers
Pipeline components that resolve referenced asset URLs for downloading.
Resolvers locate assets referenced by a page (such as images) so they can be downloaded. They are used by the asset_resolver step. This page catalogs the available kinds and their parameters.
How resolvers are configured
An asset_resolver step holds a list of resolvers and the settings that control how the resolved assets are downloaded. Each entry in resolvers is an object with a kind and a params object.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| resolvers | list of resolvers | no | empty | The resolvers that locate asset URLs. |
| concurrency | integer | no | 10 | How many assets to download in parallel. |
| chunk_size | integer | no | 1048576 | The download chunk size in bytes. |
| max_retries | integer | no | 3 | The maximum retry attempts for a failed asset download. |
| timeout | duration | no | 10s | The per-asset download timeout. |
steps:
  - kind: asset_resolver
    params:
      concurrency: 4
      timeout: 10s
      max_retries: 3
      resolvers:
        - kind: xpath
          params:
            xpaths:
              - //body/img/@src
            max_items: 3

xpath
Resolves asset URLs from the page using XPath expressions.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| xpaths | list of strings | no | empty | XPath expressions selecting asset URLs. At least one is required when the list is set. |
| max_items | integer | no | 5 | The maximum number of assets to resolve per page. |
resolvers:
  - kind: xpath
    params:
      xpaths:
        - //body/img/@src
      max_items: 3
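
Because xpaths is a list, a single resolver can carry more than one expression. The fragment below is a minimal sketch of that. The second expression (favicon links) and the max_items value are illustrative assumptions, not taken from this page, and how max_items is counted when several expressions match is not specified here.

```yaml
resolvers:
  - kind: xpath
    params:
      xpaths:
        # Image sources directly under <body>, as in the example above.
        - //body/img/@src
        # Hypothetical second expression: favicon links in the page head.
        - '//link[@rel="icon"]/@href'
      # Assumed to cap the total number of assets resolved per page.
      max_items: 10
```

The second expression is quoted so that the square brackets and double quotes inside it are read as part of a plain string.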