
Deployment modes

Single-process and distributed deployment, and what differs between them.

inndx can run as a single process or as a set of independently deployed services. The crawl behavior is identical in both modes; what changes is how the stages are hosted and what infrastructure they use. This page explains the difference so you can choose the right shape for your situation.

Single process

In single-process mode, all five services (orchestrator, fetcher, parser, sink, and analytics) run together inside one process. Infrastructure that would otherwise be external, such as the message broker and cache, runs in memory within that process.

You start this mode with the dev command:

inndx dev

Alternatively, you can run a crawl directly from a manifest file without standing up a server at all:

inndx run my-crawl.yml

Single-process mode is intended for evaluation, local testing, and trying out new crawl configurations quickly. It is not suitable for production: its in-process broker and cache do not persist across restarts, and it cannot scale beyond a single process.

Distributed

In distributed mode, each service runs as its own process, typically in separate containers or pods. They coordinate through shared external infrastructure: a message broker carries work between stages, a database holds crawl state, blob storage holds fetched content and results, and a cache is shared across services.

You start each service individually using its own subcommand:

inndx orchestrator
inndx fetcher
inndx parser
inndx sink
inndx analytics

Because each stage is independent, you can scale them separately. If fetching is your bottleneck, run more fetcher instances. If parsing is the bottleneck, run more parser instances. Each service scales horizontally without affecting the others.
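
As a rough local illustration of running several instances of one stage, the sketch below starts a given service a chosen number of times as background processes. It uses only the subcommands listed above. The INNDX_BIN variable and the start_instances function are hypothetical names belonging to this script, not part of inndx, and the sketch assumes the external infrastructure is already configured by whatever means your deployment uses. In production these instances would typically be separate containers or pods rather than background jobs on one host.

```shell
#!/bin/sh
# Sketch only: start N instances of one inndx service as background processes.
# INNDX_BIN is a hypothetical override for this script; it defaults to "inndx".
INNDX_BIN="${INNDX_BIN:-inndx}"

# start_instances SERVICE COUNT
# Launches COUNT copies of "inndx SERVICE" in the background and reports each PID.
start_instances() {
  service="$1"
  count="$2"
  i=1
  while [ "$i" -le "$count" ]; do
    "$INNDX_BIN" "$service" &
    echo "started $service instance $i (pid $!)"
    i=$((i + 1))
  done
}

# Example (not executed here):
#   start_instances fetcher 3
#   wait   # block until the background instances exit
```

Calling start_instances fetcher 3 followed by wait would keep three fetchers running in the foreground shell session until they exit.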

What changes between modes

The stages themselves are the same. The crawl logic, the components you configure in your manifest, and the HTTP APIs all behave identically. What differs is only the infrastructure configuration: which broker, database, cache, and blob storage backends are in use, and how they are addressed.

In single-process mode those backends are lightweight in-process implementations. In distributed mode they are real external services that you provision. Switching from one mode to the other is a matter of configuration, not a change to the crawl itself.

Figure: In single-process mode, the Orchestrator, Fetcher, Parser, Sink, and Analytics services share one process with in-process infrastructure. In distributed mode, each service runs separately against external infrastructure.

Choosing a mode

Use single-process mode when you are evaluating inndx, building and testing a new crawl configuration, or running a one-off job in a constrained environment.

Use distributed mode for production workloads where you need durability, horizontal scaling, and the ability to operate each stage independently.
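
The mapping from mode to commands can be summarized in a small helper. This is a sketch, not an inndx feature: the commands_for_mode function and its "single" and "distributed" arguments are hypothetical names for illustration. It only prints the commands described on this page and does not run them.

```shell
#!/bin/sh
# Sketch only: print the inndx commands that correspond to each deployment mode.
# The mode names "single" and "distributed" are a convention of this script.
commands_for_mode() {
  case "$1" in
    single)
      # All five services run together in one process.
      echo "inndx dev"
      ;;
    distributed)
      # Each service is started with its own subcommand.
      for service in orchestrator fetcher parser sink analytics; do
        echo "inndx $service"
      done
      ;;
    *)
      echo "unknown mode: $1 (expected 'single' or 'distributed')" >&2
      return 1
      ;;
  esac
}
```

For example, commands_for_mode distributed prints one line per service, which you could adapt into the start commands for your own containers.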

Production deployment

The supported path for a production distributed deployment, including infrastructure requirements, sizing guidance, and the Kubernetes install, is covered in your enterprise onboarding materials.
