A rollout is the unit of agent behavior that Osmosis evaluates during reinforcement learning training. It combines an AgentWorkflow, which produces one or more samples for a prompt, with a Grader, which turns those samples into reward signals. Rollouts are ordinary Python code in your workspace. They can use a simple single-call LLM workflow, a tool-using agent built with Strands Agents, an OpenAI Agents SDK workflow, or a custom harness that you drive through the open-source osmosis-ai SDK.

Training Loop

Training on Osmosis repeatedly runs the same four-part loop:
1. Select a dataset row

The training cluster selects one row from your dataset and sends its prompt fields to your AgentWorkflow. Datasets commonly contain system_prompt, user_prompt, and ground_truth fields.
2. Run the AgentWorkflow

Your workflow receives an AgentWorkflowContext, calls the current policy through an Osmosis-supported agent integration, uses any tools you provide, and records rollout samples.
3. Grade the samples

Your Grader receives the collected samples plus the row’s reference answer (ground_truth, exposed as ctx.label) and assigns a numerical reward to each sample.
4. Update the model

The reward signal drives the training update, moving the policy toward behavior that receives higher rewards on your task.
This loop is why rollout code must route model calls through Osmosis integrations. The training cluster needs to serve the current policy, attach rollout metadata, collect traces, and connect rewards back to the samples that produced them.
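To make the four steps concrete, here is a toy, framework-free simulation in Python. Every name in it (Row, Sample, toy_policy, run_workflow, grade, training_step) is a hypothetical stand-in, not the osmosis-ai SDK API. The multiplication task is only loosely borrowed from the multiply starter template names, and step 4 reports the mean reward instead of updating any model weights.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    # Hypothetical row shape using the common dataset fields.
    system_prompt: str
    user_prompt: str
    ground_truth: str


@dataclass
class Sample:
    completion: str
    reward: Optional[float] = None


def toy_policy(user_prompt: str, rng: random.Random) -> str:
    # Stand-in for the current policy: multiplies "a*b", but answers
    # off by one about a third of the time so rewards vary.
    a, b = (int(x) for x in user_prompt.split("*"))
    return str(a * b + rng.choice([0, 0, 1]))


def run_workflow(row: Row, n_samples: int, rng: random.Random) -> List[Sample]:
    # Step 2: call the policy and record one sample per attempt.
    return [Sample(toy_policy(row.user_prompt, rng)) for _ in range(n_samples)]


def grade(samples: List[Sample], label: str) -> None:
    # Step 3: exact-match reward against the row's reference answer.
    for s in samples:
        s.reward = 1.0 if s.completion.strip() == label.strip() else 0.0


def training_step(dataset: List[Row], rng: random.Random, n_samples: int = 4) -> float:
    row = rng.choice(dataset)                    # Step 1: select a dataset row
    samples = run_workflow(row, n_samples, rng)  # Step 2: run the workflow
    grade(samples, row.ground_truth)             # Step 3: grade the samples
    # Step 4: a real trainer would update the policy from these rewards;
    # this sketch only returns the mean reward that update would consume.
    return sum(s.reward for s in samples) / len(samples)


DATASET = [
    Row("You multiply integers.", "3*4", "12"),
    Row("You multiply integers.", "7*6", "42"),
]

rng = random.Random(0)
mean_rewards = [training_step(DATASET, rng) for _ in range(5)]
```

The reward is computed from the same samples the workflow produced, which is exactly the link the platform needs to preserve when it routes model calls through an integration.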

Files in a Rollout

Each rollout lives under rollouts/ and is referenced by evaluation and training configs:
repository/
├── rollouts/
│   └── my-rollout/
│       ├── main.py
│       └── pyproject.toml
├── configs/
│   ├── eval/
│   │   └── my-rollout.toml
│   └── training/
│       └── my-rollout.toml
└── data/
    └── test.jsonl
| File | Purpose |
|---|---|
| rollouts/my-rollout/main.py | Defines exactly one concrete AgentWorkflow and one concrete Grader for training and evaluation |
| rollouts/my-rollout/pyproject.toml | Declares rollout-local Python dependencies |
| configs/eval/my-rollout.toml | Points the evaluation run at the rollout, entrypoint, evaluation policy model, and platform dataset |
| configs/training/my-rollout.toml | Points the training run at the rollout code version and training settings |
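The tree above keeps local data in data/test.jsonl. A quick sanity check before using such a file might look like the sketch below. It assumes, as an illustration only, that each line is a JSON object carrying the common system_prompt, user_prompt, and ground_truth fields; the platform's actual dataset schema may differ, and load_rows is a hypothetical helper, not part of the osmosis CLI or SDK.

```python
import json
from typing import Dict, Iterable, List

# Assumed schema: the common fields named in the training loop description.
REQUIRED_FIELDS = ("system_prompt", "user_prompt", "ground_truth")


def load_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse JSONL lines and check that every row has the required fields."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in row]
        if missing:
            raise ValueError(f"line {lineno}: missing fields {missing}")
        rows.append(row)
    return rows


# Usage with the file from the tree (a file handle iterates over lines):
# with open("data/test.jsonl", encoding="utf-8") as f:
#     rows = load_rows(f)
```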
osmosis train submit and osmosis eval submit both discover rollout classes in the entrypoint file and run the rollout server-side. Helper classes, tools, and config objects can live wherever you like, but the entrypoint must expose the concrete AgentWorkflow and Grader classes that the platform validates.
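The "exactly one concrete class of each kind" rule can be sketched with the standard library's inspect module. This is an illustration under assumptions, not the platform's loader: the AgentWorkflow and Grader classes below are hypothetical stand-ins for the SDK base classes, the Grader method name grade is invented for the example, and the real validation may check more than this.

```python
import abc
import inspect
import types


class AgentWorkflow(abc.ABC):
    """Hypothetical stand-in for the SDK's AgentWorkflow base class."""

    @abc.abstractmethod
    def run(self, ctx): ...


class Grader(abc.ABC):
    """Hypothetical stand-in for the SDK's Grader base class."""

    @abc.abstractmethod
    def grade(self, ctx): ...  # method name is an assumption


def find_single_concrete(module: types.ModuleType, base: type) -> type:
    """Return the one concrete subclass of `base` visible in `module`."""
    found = [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if issubclass(obj, base) and obj is not base and not inspect.isabstract(obj)
    ]
    if len(found) != 1:
        names = [cls.__name__ for cls in found]
        raise LookupError(f"expected exactly one concrete {base.__name__}, found {names}")
    return found[0]


# Simulate an entrypoint module (main.py) that also holds a helper class.
class MultiplyWorkflow(AgentWorkflow):
    def run(self, ctx):
        return None


class ExactMatchGrader(Grader):
    def grade(self, ctx):
        return None


class CalculatorTool:  # helpers are not subclasses, so discovery ignores them
    pass


entrypoint = types.ModuleType("main")
for cls in (AgentWorkflow, Grader, MultiplyWorkflow, ExactMatchGrader, CalculatorTool):
    setattr(entrypoint, cls.__name__, cls)

workflow_cls = find_single_concrete(entrypoint, AgentWorkflow)
grader_cls = find_single_concrete(entrypoint, Grader)
```

Abstract intermediate base classes are skipped by inspect.isabstract, so a shared abstract helper workflow would not count toward the limit in this sketch.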

Core Abstractions

| Abstraction | What it does | Where to learn more |
|---|---|---|
| AgentWorkflow | Defines agent behavior: prompt handling, model calls, tool use, and sample creation | Building AgentWorkflows |
| Grader | Defines reward logic: exact matching, programmatic checks, LLM-as-judge, or custom scoring | Building Graders |
| Agent integration | Connects your agent framework to the active Osmosis rollout context | Strands Integration, OpenAI Agents Integration |
| Execution backend | Runs rollout code in-process or in a Harbor-managed environment when you drive the SDK yourself | Execution Backends |

Choose an Agent Framework

Most rollout authors start with one of the built-in agent integrations:

Strands Agents

Use OsmosisStrandsAgent when you want Strands tools, Strands message handling, and a direct migration path from an existing Strands Agent.

OpenAI Agents

Use OsmosisAgent when your workflow already uses the OpenAI Agents SDK, Runner.run, sessions, handoffs, or OpenAI-style tool orchestration.
Both integrations use an OsmosisRolloutModel placeholder. You do not hard-code the training model inside rollout code; Osmosis resolves the placeholder to the current policy at runtime.
Do not call provider SDKs directly from AgentWorkflow.run() with a fixed model such as openai/gpt-5.2. Direct calls bypass the active RolloutContext, so the platform cannot route policy requests, collect samples, or connect rewards to the right rollout.
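Why the placeholder and the rollout context matter can be shown with a toy model in plain Python. RolloutModelPlaceholder and ToyRolloutContext are hypothetical stand-ins for OsmosisRolloutModel and the SDK's RolloutContext, not their real interfaces. The sketch only demonstrates the principle: a call routed through the context reaches the current policy and leaves a recorded sample, while a direct call to a fixed model leaves nothing for the grader.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class RolloutModelPlaceholder:
    """Hypothetical stand-in for OsmosisRolloutModel: it names no concrete model."""


@dataclass
class ToyRolloutContext:
    """Hypothetical stand-in for the active rollout context."""

    rollout_id: str
    policy: Callable[[str], str]  # the current policy, supplied at runtime
    samples: List[Tuple[str, str]] = field(default_factory=list)

    def complete(self, model: object, prompt: str) -> str:
        # Refuse fixed model names so every call reaches the current policy.
        if not isinstance(model, RolloutModelPlaceholder):
            raise TypeError(f"expected the rollout placeholder, got {model!r}")
        completion = self.policy(prompt)           # placeholder -> current policy
        self.samples.append((prompt, completion))  # recorded for the grader
        return completion


def workflow_run(ctx: ToyRolloutContext, prompt: str) -> str:
    # Correct pattern: route the model call through the rollout context.
    return ctx.complete(RolloutModelPlaceholder(), prompt)


def direct_provider_call(prompt: str) -> str:
    # Anti-pattern: a fixed-model call never touches the context, so no
    # sample is recorded and no reward can be connected to it.
    return "answer from a fixed model"


ctx = ToyRolloutContext("rollout-1", policy=lambda p: p.upper())
answer = workflow_run(ctx, "hello")  # recorded in ctx.samples
direct_provider_call("hello")        # invisible to ctx
```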

Choose an Execution Backend

If you use osmosis eval submit or osmosis train submit, the platform manages execution and you do not choose a backend from the CLI. The entrypoint decides which SDK backend to construct when the rollout server starts, and the starter templates use LocalBackend unless you choose the Harbor template.
If you build on the Harbor template for platform training, use the Daytona-backed Harbor path; the managed platform does not currently support Docker-backed Harbor execution.
You only choose a backend explicitly when embedding the open-source SDK in your own harness:

| Backend | Use when |
|---|---|
| LocalBackend | You want fast in-process execution, easy debugging, and no Docker dependency |
| HarborBackend | You need Harbor-managed per-trial isolation and are using the Daytona-backed path required by the managed platform |
See Execution Backends for SDK-level examples and tradeoffs.

Start from a Template

If you already have a task or dataset, start with Create Your Own Rollout. Platform-created workspace repositories include project-local Agent Skills that guide an AI coding agent through dataset planning, rollout creation, evaluation runs, debugging, and training-run readiness.

To list the available starter templates:
osmosis template list
Apply a Strands starter:
osmosis template apply multiply-local-strands
Or apply an OpenAI Agents starter:
osmosis template apply multiply-local-openai
Templates are copied from the platform workspace template repository. They write rollout code under rollouts/ plus matching evaluation and training configs, and are the quickest way to see the expected file layout, dependency declaration, and end-to-end workflow.

Path to Training

Once your rollout exists, use this path:
1. Implement the workflow

Put policy calls behind OsmosisStrandsAgent or OsmosisAgent, pass the dataset prompt from ctx.prompt, and keep any task-specific tools close to the rollout.
2. Implement the grader

Score every sample in ctx.samples. Start with a deterministic grader when possible, then add LLM-as-judge logic only when the task is subjective.
3. Commit and sync

Push rollout changes to the default branch so Git Sync publishes the code version used by evaluation runs and training runs.
4. Submit an evaluation run

Run osmosis eval submit configs/eval/my-rollout.toml and inspect rewards, failures, and per-row results with osmosis eval info <name>. See Evaluation.
5. Submit training

Run osmosis train submit configs/training/my-rollout.toml. See Training Runs for submission behavior.
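For the grader step, a deterministic grader for a numeric-answer task might look like the sketch below. It assumes, hypothetically, that each sample exposes its final text and that the label holds the integer answer as a string. ToySample, extract_final_int, and grade_samples are illustrative names, not SDK API; a real Grader would read ctx.samples and ctx.label rather than take them as arguments.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySample:
    """Hypothetical stand-in for one entry of ctx.samples."""

    text: str
    reward: Optional[float] = None


_INT = re.compile(r"-?\d+")


def extract_final_int(text: str) -> Optional[int]:
    # Treat the last integer in the completion as the model's answer.
    matches = _INT.findall(text)
    return int(matches[-1]) if matches else None


def grade_samples(samples: List[ToySample], label: str) -> List[float]:
    """Give 1.0 to samples whose final integer equals the label, else 0.0."""
    expected = int(label.strip())
    rewards = []
    for sample in samples:
        sample.reward = 1.0 if extract_final_int(sample.text) == expected else 0.0
        rewards.append(sample.reward)
    return rewards


samples = [ToySample("3 * 4 = 12"), ToySample("I think it is 13"), ToySample("no idea")]
rewards = grade_samples(samples, "12")  # every sample receives a reward
```

Because the reward depends only on the extracted integer, identical completions always receive identical scores, which keeps the training signal stable. LLM-as-judge logic is only worth layering on top when answers are free-form.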

Next Steps

Create Your Own Rollout

Use project-local Agent Skills to create a task-specific rollout with evaluation run gates.

Building AgentWorkflows

Learn the AgentWorkflow.run(ctx) contract and common implementation patterns.

Building Graders

Define reward signals that can drive training.

Strands Integration

Build tool-using rollouts with AWS Strands Agents.

OpenAI Agents Integration

Build rollouts with the OpenAI Agents SDK.