Execution Backends

When you submit a training run with osmosis train submit, Osmosis provisions and runs your rollout for you, so you don’t pick or configure a backend. This page is an SDK-level reference for the open-source osmosis-ai package, useful when you want to drive rollouts yourself (e.g. custom local harnesses or self-hosted experimentation).

An execution backend determines where an AgentWorkflow and Grader run when you orchestrate them through the SDK directly. You can execute everything in the current Python process for simplicity, or isolate each execution inside a Docker container.

ExecutionBackend Interface

All backends implement the ExecutionBackend abstract class:

class ExecutionBackend(ABC):
    async def execute(
        self,
        request: ExecutionRequest,
        on_workflow_complete: ResultCallback,
        on_grader_complete: ResultCallback | None = None,
    ) -> None: ...

    @property
    def max_concurrency(self) -> int: ...  # 0 = no limit

    def health(self) -> dict[str, Any]: ...  # default: {"status": "ok"}

Method	Description
`execute()`	Runs an AgentWorkflow (and optionally a Grader) for a single request
`max_concurrency`	Maximum parallel executions. `0` means no limit.
`health()`	Returns backend health status. Default implementation returns `{"status": "ok"}`.
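To make the shape of the interface concrete, here is a minimal sketch of a custom backend. It mirrors the abstract class locally so that it runs without the SDK installed. Several details are assumptions rather than documented behavior: ExecutionRequest is stood in for by a plain dict, ResultCallback is assumed to be an awaitable callable taking one result, and EchoBackend is a hypothetical class invented for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable, Optional

# Stand-ins for the SDK types. The real ExecutionRequest and ResultCallback
# are not shown on this page, so their shapes here are assumptions.
ExecutionRequest = dict
ResultCallback = Callable[[Any], Awaitable[None]]


class ExecutionBackend(ABC):
    """Local mirror of the interface above, so the sketch runs on its own."""

    @abstractmethod
    async def execute(
        self,
        request: ExecutionRequest,
        on_workflow_complete: ResultCallback,
        on_grader_complete: Optional[ResultCallback] = None,
    ) -> None: ...

    @property
    def max_concurrency(self) -> int:
        return 0  # 0 = no limit

    def health(self) -> dict:
        return {"status": "ok"}


class EchoBackend(ExecutionBackend):
    """Hypothetical backend that echoes the prompt instead of running an agent."""

    async def execute(self, request, on_workflow_complete, on_grader_complete=None):
        await on_workflow_complete({"output": request["prompt"]})
        if on_grader_complete is not None:  # the grader callback is optional
            await on_grader_complete({"reward": 1.0})

    @property
    def max_concurrency(self) -> int:
        return 2  # allow at most two parallel executions


async def demo() -> list:
    events = []

    async def record(result):
        events.append(result)

    backend = EchoBackend()
    await backend.execute({"prompt": "hello"}, record, record)
    await backend.execute({"prompt": "no grader"}, record)
    return events
```

Running asyncio.run(demo()) collects one workflow result and one grader result for the first request, and only a workflow result for the second, since no grader callback was supplied.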

LocalBackend

Executes your AgentWorkflow and Grader directly in the current Python process. No additional infrastructure required.

from osmosis_ai.rollout.backend import LocalBackend

# MyWorkflow and MyGrader are your own AgentWorkflow and Grader subclasses.
backend = LocalBackend(
    workflow=MyWorkflow,
    grader=MyGrader,
)

Constructor parameters:

Parameter	Type	Description
`workflow`	class or `"module:attr"` string	Your `AgentWorkflow` subclass or a colon-separated import path (e.g. `"my_module:MyWorkflow"`)
`workflow_config`	`AgentWorkflowConfig | None`	Optional config passed to the workflow
`grader`	class or `"module:attr"` string	Your `Grader` subclass or a colon-separated import path
`grader_config`	`GraderConfig | None`	Optional config passed to the grader
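The "module:attr" form follows a common Python convention for naming an object by import path. As a rough illustration of how such a string might be resolved, here is a sketch using only the standard library. The helper name resolve_import_path is hypothetical, and this is not necessarily how osmosis-ai resolves the string internally.

```python
import importlib
from typing import Any


def resolve_import_path(target: Any) -> Any:
    """Return `target` unchanged unless it is a "module:attr" string.

    Hypothetical helper for illustration; not part of osmosis-ai.
    """
    if not isinstance(target, str):
        return target  # already a class or other object
    module_name, sep, attr_path = target.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attr', got {target!r}")
    obj = importlib.import_module(module_name)
    # Walk dotted attributes so "pkg.mod:Outer.Inner" also works.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj
```

For example, resolve_import_path("collections:OrderedDict") returns the OrderedDict class, while passing a class directly returns it as-is.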

Execution flow:

Receive request

The backend receives an ExecutionRequest containing the input prompt and reference answer for one dataset row.

Run workflow

Creates an AgentWorkflowContext, establishes a RolloutContext, and calls workflow.run(ctx).

Collect samples

Collects all RolloutSample objects from the RolloutContext after the workflow completes.

Run grader

Creates a GraderContext with the collected samples and that row’s reference answer, then calls grader.grade(ctx).

Return results

Returns an ExecutionResult containing the samples and their assigned rewards.
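The five steps above can be sketched end to end with plain stand-in classes. Everything in this block is hypothetical: Request, Sample, Rollout, EchoWorkflow and ExactMatchGrader are simplified placeholders for ExecutionRequest, RolloutSample, RolloutContext and your own workflow and grader, and the real SDK passes context objects (workflow.run(ctx), grader.grade(ctx)) rather than the loose arguments used here. The block also assumes both calls are coroutines.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Request:  # stand-in for ExecutionRequest
    prompt: str
    reference: str


@dataclass
class Sample:  # stand-in for RolloutSample
    text: str
    reward: Optional[float] = None


@dataclass
class Rollout:  # stand-in for RolloutContext
    samples: List[Sample] = field(default_factory=list)


class EchoWorkflow:
    """Toy workflow: upper-cases the prompt and records it as one sample."""

    async def run(self, request: Request, rollout: Rollout) -> None:
        rollout.samples.append(Sample(text=request.prompt.upper()))


class ExactMatchGrader:
    """Toy grader: reward 1.0 on an exact match with the reference, else 0.0."""

    async def grade(self, samples: List[Sample], reference: str) -> None:
        for sample in samples:
            sample.reward = 1.0 if sample.text == reference else 0.0


async def execute_locally(request, workflow, grader) -> List[Sample]:
    rollout = Rollout()                             # step 2: set up the context
    await workflow.run(request, rollout)            # step 2: run the workflow
    samples = list(rollout.samples)                 # step 3: collect samples
    await grader.grade(samples, request.reference)  # step 4: run the grader
    return samples                                  # step 5: samples with rewards
```

Calling asyncio.run(execute_locally(Request("hi", "HI"), EchoWorkflow(), ExactMatchGrader())) yields a single sample with a reward of 1.0.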

Concurrency is controlled via AgentWorkflowConfig.concurrency.max_concurrent (default: 4). The backend uses a ConcurrencyLimiter to cap parallel workflow executions.

Error categorization: LocalBackend maps exceptions to structured error types. TimeoutError becomes TIMEOUT, ValueError and TypeError become VALIDATION_ERROR, and all other exceptions become AGENT_ERROR.
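Both behaviors are easy to approximate with the standard library. The sketch below is an assumption-laden illustration, not the SDK's code: run_limited uses asyncio.Semaphore in place of ConcurrencyLimiter (whose implementation is not shown on this page), and categorize_error is a hypothetical function that reproduces the mapping described above using plain strings.

```python
import asyncio


def categorize_error(exc: BaseException) -> str:
    """Map an exception to the error categories described above."""
    if isinstance(exc, TimeoutError):
        return "TIMEOUT"
    if isinstance(exc, (ValueError, TypeError)):
        return "VALIDATION_ERROR"
    return "AGENT_ERROR"


async def run_limited(jobs, max_concurrent: int = 4) -> list:
    """Run zero-argument coroutine functions, at most `max_concurrent` at once.

    A semaphore stands in for ConcurrencyLimiter here (an assumption).
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        async with semaphore:  # wait for a free slot before starting
            return await job()

    # gather preserves the order of `jobs` in its results
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

With max_concurrent=4 and ten jobs, the first four start immediately and each remaining job starts as an earlier one finishes.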

HarborBackend

Executes your AgentWorkflow and Grader inside Docker containers via the harbor package. Each execution gets its own isolated container with an independent filesystem and process space.

from osmosis_ai.rollout.backend import HarborBackend

# orchestrator is created beforehand; MyWorkflow and MyGrader are your own subclasses.
backend = HarborBackend(
    orchestrator=orchestrator,
    task_dir="/path/to/task",
    user_code_dir="/path/to/code",
    workflow=MyWorkflow,
    grader=MyGrader,
)

Key characteristics:

Pre-builds a Docker image at construction time
Each execution runs in a fresh container
HarborAgentWorkflowContext extends AgentWorkflowContext with an environment object for container interaction — environment.exec(), environment.upload_file(), etc.

HarborBackend shells out to Docker when it is constructed, so Docker must be installed and running on whichever host instantiates it. If you are submitting a training run through osmosis train submit, this does not apply — the platform manages execution for you.
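Because construction depends on a working Docker installation, a preflight check can give a clearer failure than an error from deep inside the build. The helper below is a hypothetical sketch that is not part of osmosis-ai. It relies on the docker info command, which exits with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_available(timeout: float = 10.0) -> bool:
    """Return True if the docker CLI is on PATH and the daemon responds.

    Hypothetical preflight helper; not part of osmosis-ai.
    """
    if shutil.which("docker") is None:
        return False  # CLI not installed or not on PATH
    try:
        completed = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # could not launch the CLI, or the daemon hung
    return completed.returncode == 0
```

A harness could call docker_available() before instantiating HarborBackend and fall back to LocalBackend, or exit with a clear message, when it returns False.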

Comparison

	LocalBackend	HarborBackend
Execution environment	Current Python process	Docker container
Isolation	None — shares process memory	Full — isolated filesystem and process
Startup speed	Instant	Requires Docker image build
Dependency management	Shared Python environment	Independent per container
Debugging	Direct — breakpoints, print statements	Container logs, trace files
Best for	Development, eval, simple deployments	Production, complex agents

Choosing a Backend

When driving the SDK directly, start with LocalBackend — it has no infrastructure dependencies and is easiest to debug. Reach for HarborBackend only when you need filesystem/process isolation or want each rollout to manage its own Python dependencies.

osmosis eval run uses LocalBackend internally. If you embed osmosis-ai in your own harness, you can wire up either backend yourself.

Next Steps

Local Evaluation

Evaluate your rollout locally with osmosis eval run, or use it as a pre-training smoke test.

Strands Integration

Use the AWS Strands agent framework with Osmosis for training.
