AgentWorkflow, which produces one or more samples for a prompt, with a Grader, which turns those samples into reward signals.
Rollouts are normal Python code in your workspace. They can use a simple single-call LLM workflow, a tool-using agent built with Strands Agents, an OpenAI Agents SDK workflow, or a custom harness that you drive through the open source osmosis-ai SDK.
Training Loop
Training on Osmosis repeatedly runs the same four-part loop:Select a dataset row
The training cluster selects one row from your dataset and sends its prompt fields to your
AgentWorkflow. Common datasets contain system_prompt, user_prompt, and ground_truth.Run the AgentWorkflow
Your workflow receives an
AgentWorkflowContext, calls the current policy through an Osmosis-supported agent integration, uses any tools you provide, and records rollout samples.Grade the samples
Your
Grader receives the collected samples plus the row’s reference answer (ground_truth, exposed as ctx.label) and assigns a numerical reward to each sample.Files in a Rollout
Each rollout lives underrollouts/ and is referenced by evaluation and training configs:
| File | Purpose |
|---|---|
rollouts/my-rollout/main.py | Defines exactly one concrete AgentWorkflow and one concrete Grader for training and evaluation |
rollouts/my-rollout/pyproject.toml | Declares rollout-local Python dependencies |
configs/eval/my-rollout.toml | Points the evaluation run at the rollout, entrypoint, evaluation policy model, and platform dataset |
configs/training/my-rollout.toml | Points the training run at the rollout code version and training settings |
osmosis train submit and osmosis eval submit both discover rollout classes from the entrypoint file and run the rollout server-side. Keep helper classes, tools, and config objects wherever you like, but expose the concrete AgentWorkflow and Grader classes that the platform should validate.Core Abstractions
| Abstraction | What it does | Where to learn more |
|---|---|---|
AgentWorkflow | Defines agent behavior: prompt handling, model calls, tool use, and sample creation | Building AgentWorkflows |
Grader | Defines reward logic: exact matching, programmatic checks, LLM-as-judge, or custom scoring | Building Graders |
| Agent integration | Connects your agent framework to the active Osmosis rollout context | Strands Integration, OpenAI Agents Integration |
| Execution backend | Runs rollout code in-process or in a Harbor-managed environment when you drive the SDK yourself | Execution Backends |
Choose an Agent Framework
Most rollout authors start with one of the built-in agent integrations:Strands Agents
Use
OsmosisStrandsAgent when you want Strands tools, Strands message handling, and a direct migration path from an existing Strands Agent.OpenAI Agents
Use
OsmosisAgent when your workflow already uses the OpenAI Agents SDK, Runner.run, sessions, handoffs, or OpenAI-style tool orchestration.OsmosisRolloutModel placeholder. You do not hard-code the training model inside rollout code; Osmosis resolves the placeholder to the current policy at runtime.
Choose an Execution Backend
If you useosmosis eval submit or osmosis train submit, the platform manages execution and you do not choose a backend from the CLI. The entrypoint decides which SDK backend to construct when the rollout server starts, and the starter templates use LocalBackend unless you choose the Harbor template.
You only choose a backend explicitly when embedding the open source SDK in your own harness:
| Backend | Use when |
|---|---|
LocalBackend | You want fast in-process execution, easy debugging, and no Docker dependency |
HarborBackend | You need Harbor-managed per-trial isolation and are using the Daytona-backed path required by the managed platform |
Start from a Template
If you already have a task or dataset, start with Create Your Own Rollout. Platform-created workspace repositories include project-local Agent Skills that guide an AI coding agent through dataset planning, rollout creation, evaluation runs, debugging, and training run readiness. List available starter templates:rollouts/ plus matching evaluation and training configs, and are the quickest way to see the expected file layout, dependency declaration, and end-to-end workflow.
Path to Training
Once your rollout exists, use this path:Implement the workflow
Put policy calls behind
OsmosisStrandsAgent or OsmosisAgent, pass the dataset prompt from ctx.prompt, and keep any task-specific tools close to the rollout.Implement the grader
Score every sample in
ctx.samples. Start with a deterministic grader when possible, then add LLM-as-judge logic only when the task is subjective.Commit and sync
Push rollout changes to the default branch so Git Sync publishes the code version used by evaluation runs and training runs.
Submit an evaluation run
Run
osmosis eval submit configs/eval/my-rollout.toml and inspect rewards, failures, and per-row results with osmosis eval info <name>. See Evaluation.Submit training
Run
osmosis train submit configs/training/my-rollout.toml. See Training Runs for submission behavior.Next Steps
Create Your Own Rollout
Use project-local Agent Skills to create a task-specific rollout with evaluation run gates.
Building AgentWorkflows
Learn the
AgentWorkflow.run(ctx) contract and common implementation patterns.Building Graders
Define reward signals that can drive training.
Strands Integration
Build tool-using rollouts with AWS Strands Agents.
OpenAI Agents Integration
Build rollouts with the OpenAI Agents SDK.