Most users do not configure execution backends from the CLI. osmosis eval submit and osmosis train submit both hand execution to the Osmosis platform; the rollout entrypoint constructs the backend on the server side. This page is an SDK-level guide for users embedding the open-source osmosis-ai package in custom harnesses or self-hosted experiments.
An execution backend determines where an AgentWorkflow and Grader run when you orchestrate rollouts through the SDK directly.
| Backend | Runs where | Best for |
| --- | --- | --- |
| LocalBackend | Current Python process | Fast local development, custom eval harnesses, debugging |
| HarborBackend | Harbor-managed trial environments | Per-trial isolation, task environments, dependency separation |

Backend Responsibilities

Every backend has the same core responsibilities:
1. Receive an execution request

   The request contains the input prompt for one dataset row and, if grading is enabled, the reference label.

2. Run the workflow

   The backend creates an AgentWorkflowContext, installs a RolloutContext, and calls AgentWorkflow.run(ctx).

3. Collect samples

   Agent integrations such as OsmosisStrandsAgent and OsmosisAgent register sample sources on the rollout context. The backend collects those samples after the workflow finishes.

4. Run the grader

   If a grader and label are available, the backend creates a GraderContext and calls Grader.grade(ctx).

5. Return structured results

   The backend returns ExecutionResult objects with status, samples, rewards, and any categorized error.
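
The five steps above can be sketched as a single coroutine. This is a minimal illustration under stated assumptions, not the SDK's implementation: SketchRequest, SketchResult, run_one, the two example callables, and the "ok"/"error" status strings are hypothetical stand-ins, and the real backends report results through callbacks rather than a return value.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchRequest:
    """Hypothetical stand-in for an execution request (one dataset row)."""
    prompt: str
    label: Optional[str] = None  # reference label, set only when grading is enabled


@dataclass
class SketchResult:
    """Hypothetical stand-in for ExecutionResult."""
    status: str
    samples: list = field(default_factory=list)
    rewards: dict = field(default_factory=dict)
    error_category: Optional[str] = None


async def run_one(request, workflow, grader=None):
    samples = []  # plays the role of the sample sources on the rollout context
    try:
        await workflow(request.prompt, samples)  # steps 2 and 3: run, then collect
    except Exception:
        return SketchResult(status="error", error_category="AGENT_ERROR")
    rewards = {}
    if grader is not None and request.label is not None:  # step 4: grade if possible
        rewards = await grader(samples, request.label)
    return SketchResult(status="ok", samples=samples, rewards=rewards)  # step 5


async def upper_workflow(prompt, samples):
    # Toy workflow: records one sample derived from the prompt.
    samples.append(prompt.upper())


async def exact_match_grader(samples, label):
    # Toy grader: compares the last sample with the reference label.
    return {"exact_match": 1.0 if samples[-1] == label else 0.0}
```

Calling asyncio.run(run_one(SketchRequest("hi", "HI"), upper_workflow, exact_match_grader)) yields a result whose rewards are {"exact_match": 1.0}. Omitting the label skips the grader and leaves rewards empty.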

ExecutionBackend Interface

All backends implement this shape:
from abc import ABC
from typing import Any

# ExecutionRequest and ResultCallback are SDK types; their import path is not shown here.


class ExecutionBackend(ABC):
    async def execute(
        self,
        request: ExecutionRequest,
        on_workflow_complete: ResultCallback,
        on_grader_complete: ResultCallback | None = None,
    ) -> None: ...

    @property
    def max_concurrency(self) -> int: ...  # 0 = no limit

    def health(self) -> dict[str, Any]: ...  # default: {"status": "ok"}
| Method | Description |
| --- | --- |
| execute() | Runs an AgentWorkflow and optionally a Grader for one request |
| max_concurrency | Maximum parallel executions. 0 means no limit. |
| health() | Returns backend health information |
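
To make the callback-driven shape concrete, here is a toy backend that follows the same three-member outline without importing the SDK. Everything in it is hypothetical: SketchBackend and EchoBackend are illustrative names, requests and results are plain dicts, and the callbacks are assumed to be async callables, which this page does not confirm.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable, Optional

# Assumed callback type: the real ResultCallback signature is not documented here.
SketchCallback = Callable[[dict], Awaitable[None]]


class SketchBackend(ABC):
    """Stand-in for the interface shape: execute(), max_concurrency, health()."""

    @abstractmethod
    async def execute(
        self,
        request: dict,
        on_workflow_complete: SketchCallback,
        on_grader_complete: Optional[SketchCallback] = None,
    ) -> None: ...

    @property
    def max_concurrency(self) -> int:
        return 0  # 0 = no limit

    def health(self) -> dict:
        return {"status": "ok"}


class EchoBackend(SketchBackend):
    """Echoes the prompt as its only sample and scores it against the label."""

    async def execute(self, request, on_workflow_complete, on_grader_complete=None):
        # Results are delivered through callbacks; execute() itself returns None.
        await on_workflow_complete({"status": "ok", "samples": [request["prompt"]]})
        if on_grader_complete is not None and request.get("label") is not None:
            reward = 1.0 if request["prompt"] == request["label"] else 0.0
            await on_grader_complete({"status": "ok", "rewards": {"match": reward}})
```

A caller passes two coroutines that record or forward results. The grader callback only fires when both a callback and a label are present, mirroring the optional grading step.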

LocalBackend

LocalBackend executes your workflow and grader directly in the current Python process.
from osmosis_ai.rollout.backend import LocalBackend

backend = LocalBackend(
    workflow=MyWorkflow,
    workflow_config=my_workflow_config,
    grader=MyGrader,
    grader_config=my_grader_config,
)
Constructor fields:
| Parameter | Type | Description |
| --- | --- | --- |
| workflow | class or "module:attr" string | AgentWorkflow subclass or import path |
| workflow_config | AgentWorkflowConfig or None | Optional config passed to the workflow |
| grader | class or "module:attr" string | Optional Grader subclass or import path |
| grader_config | GraderConfig or None | Optional config passed to the grader |
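
A "module:attr" string can be turned into an object with the standard library's importlib. The helper below is a sketch of that general technique; resolve_target is a hypothetical name, and the SDK's own resolution logic may differ.

```python
import importlib


def resolve_target(target):
    """Return target unchanged if it is already an object, else import "module:attr"."""
    if not isinstance(target, str):
        return target
    module_name, sep, attr = target.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attr', got {target!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

For example, resolve_target("json:JSONDecoder") returns the JSONDecoder class from the standard json module, and passing a class returns that same class.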
Use LocalBackend when:
  • You want the shortest debug loop.
  • You need breakpoints, stack traces, or simple print debugging.
  • Your rollout can share the current Python environment.
  • You are building a custom harness or eval runner around the SDK.
LocalBackend uses AgentWorkflowConfig.concurrency.max_concurrent to limit parallel workflow executions. If no workflow config is provided, the default concurrency is 4.
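
One common way to enforce such a limit in asyncio code is a semaphore. The sketch below illustrates the idea with the default of 4 stated above and the interface's convention that 0 means no limit; run_bounded and DEFAULT_MAX_CONCURRENT are hypothetical names, and this is not necessarily how LocalBackend implements it.

```python
import asyncio

DEFAULT_MAX_CONCURRENT = 4  # default named on this page when no workflow config is given


async def run_bounded(jobs, max_concurrent=DEFAULT_MAX_CONCURRENT):
    """Run zero-argument coroutine functions with at most max_concurrent in flight."""
    # A limit of 0 means "no limit", so no semaphore is created in that case.
    semaphore = asyncio.Semaphore(max_concurrent) if max_concurrent > 0 else None

    async def guarded(job):
        if semaphore is None:
            return await job()
        async with semaphore:  # waits here while max_concurrent jobs are running
            return await job()

    # gather preserves the order of jobs in its result list.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```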

Local Error Categories

LocalBackend maps workflow and grader exceptions into structured categories:
| Exception | Category |
| --- | --- |
| TimeoutError | TIMEOUT |
| ValueError, TypeError, AssertionError | VALIDATION_ERROR |
| Other exceptions | AGENT_ERROR |
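
The mapping in the table can be expressed as a small function. This is a sketch only: categorize is a hypothetical name, and it assumes subclass-aware matching via isinstance, which the page does not specify.

```python
def categorize(exc: BaseException) -> str:
    """Map an exception to one of the category names listed in the table above."""
    if isinstance(exc, TimeoutError):
        return "TIMEOUT"
    if isinstance(exc, (ValueError, TypeError, AssertionError)):
        return "VALIDATION_ERROR"
    return "AGENT_ERROR"
```

Under this assumption, subclasses follow their parents: a UnicodeDecodeError, which derives from ValueError, would be reported as VALIDATION_ERROR.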

HarborBackend

HarborBackend runs workflows inside Harbor-managed trial environments. Use it only when you need Harbor-specific execution for platform training or SDK-level experiments. For platform-facing Harbor rollouts, use the Daytona-backed path used by the workspace template; the managed platform does not currently support Docker-backed Harbor execution.
from pathlib import Path

from osmosis_ai.rollout.backend import HarborBackend

backend = HarborBackend(
    orchestrator=trial_queue,
    task_dir=Path("tasks/my-task"),
    user_code_dir=Path("."),
    workflow="rollouts.my_rollout.main:MyWorkflow",
    grader="rollouts.my_rollout.main:MyGrader",
)
Key constructor fields:
| Parameter | Description |
| --- | --- |
| orchestrator | Harbor TrialQueue used to run trials |
| task_dir | Task directory used to build the Harbor environment |
| user_code_dir | Directory containing rollout code to copy into the Harbor workspace |
| workflow | AgentWorkflow class or import path |
| workflow_config | Optional workflow config class/object import path |
| grader | Optional Grader class or import path |
| grader_config | Optional grader config class/object import path |
| environment_config | Optional Harbor environment configuration; platform-facing Harbor rollouts should use Daytona |
| prebuild_local_image | Local Docker runtime option for SDK experiments; not supported by the managed platform |
| cleanup_successful_trials | Whether to remove successful trial artifacts after completion |
HarborBackend can shell out to Docker only when you explicitly use the local Docker runtime in an SDK harness. Docker-backed Harbor execution is not currently supported by the managed Osmosis platform; use Daytona for Harbor rollouts that are meant to run on the platform.

Harbor Runtime Notes

HarborBackend runs the workflow inside the Harbor-managed trial environment, but the workflow still receives the standard AgentWorkflowContext:
from osmosis_ai.rollout import AgentWorkflow, AgentWorkflowContext


class HarborWorkflow(AgentWorkflow):
    async def run(self, ctx: AgentWorkflowContext) -> None:
        # This code is already running inside the Harbor trial environment.
        ...
Do not expect a ctx.environment object in rollout code. If your workflow needs files, tools, or processes inside the trial environment, package them in the Harbor task environment and access them through normal Python code from within run().
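
In practice, "normal Python code" means standard-library calls such as pathlib and subprocess, made from inside run(). The helpers below are a sketch of that pattern; read_task_file, run_tool, demo, and the fixture.txt file name are all hypothetical, and a temporary directory stands in for the Harbor task environment.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def read_task_file(root: Path, relative: str) -> str:
    """Read a file that was packaged into the task environment."""
    return (root / relative).read_text(encoding="utf-8")


def run_tool(args, cwd: Path) -> str:
    """Run a command-line tool in the environment and return its stripped stdout."""
    done = subprocess.run(args, cwd=cwd, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def demo():
    # A temporary directory plays the role of the task environment here.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "fixture.txt").write_text("hello", encoding="utf-8")
        text = read_task_file(root, "fixture.txt")
        out = run_tool([sys.executable, "-c", "print('ok')"], cwd=root)
    return text, out
```

A workflow's run() method could call helpers like these directly, since the code is already executing inside the trial environment.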

Comparison

| Dimension | LocalBackend | HarborBackend |
| --- | --- | --- |
| Execution environment | Current Python process | Harbor trial environment |
| Isolation | Shared process and filesystem | Isolated process and filesystem per trial |
| Startup cost | Minimal | Prepares Harbor task environment |
| Dependencies | Current Python environment | Harbor task environment |
| Debugging | Direct debugger, stack traces, print output | Harbor logs and trial artifacts |
| Typical user | SDK harness author, eval tooling | Self-hosted experiments needing isolation |

Choosing a Backend

Start with LocalBackend unless you know you need Harbor isolation. It is faster to debug, has fewer moving parts, and matches the default local starter templates used for smoke tests.
Reach for HarborBackend when:
  • Untrusted or messy rollout code should not share the host process.
  • Tools write files or spawn processes that should be isolated per trial.
  • You need a reproducible task environment per rollout execution.
  • You are experimenting with Harbor outside the managed Osmosis training path.

Relationship to Eval and Training

osmosis eval submit and osmosis train submit both run the rollout server-side and do not expose these SDK backends as user-facing options. The entrypoint constructs the backend on the platform when the rollout server starts. These SDK-level backends only matter when you embed osmosis-ai in your own harness.

Next Steps

  • Evaluation: Submit an evaluation run before a training run.
  • Building AgentWorkflows: Learn how workflows register samples through supported integrations.
  • Strands Integration: Use Strands Agents inside an Osmosis workflow.
  • OpenAI Agents Integration: Use the OpenAI Agents SDK inside an Osmosis workflow.