AgentWorkflow and Grader are the two core abstractions in the Osmosis SDK for reinforcement learning training. Together they define a rollout — the unit of agent behavior and evaluation that the training cluster executes on every training step.

The Training Loop

RL training on Osmosis follows a four-step loop:

1. Prompt: The training cluster selects one row from your dataset and sends its input prompt to your AgentWorkflow. In most cases, that row contains just system_prompt, user_prompt, and ground_truth.

2. Rollout: Your AgentWorkflow processes the prompt — calling LLMs, using tools, executing multi-step reasoning — and produces output messages.

3. Grading: Your Grader evaluates the AgentWorkflow’s output against the row’s reference answer (ground_truth, exposed as ctx.label) and assigns a numerical reward (typically 0.0 to 1.0).

4. RL Update: The reward signal drives the model weight update, moving the model toward higher-reward behaviors (i.e. better outcomes).

This loop repeats across your entire dataset for each training step, progressively improving the model’s performance on your specific task.
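
To make the loop concrete, here is a minimal, runnable sketch of what one training step does conceptually. It is purely illustrative: the training cluster runs this loop for you, and none of the helper names below are part of the SDK.

# Conceptual sketch of one training step. Illustrative only --
# these helpers are stand-ins, not SDK functions.
dataset = [
    {
        "system_prompt": "You are a helpful assistant.",
        "user_prompt": "What is 2 + 2?",
        "ground_truth": "4",
    },
]

def run_agent_workflow(row):
    # 2. Rollout: stand-in for your AgentWorkflow's LLM calls and tool use.
    return "The answer is 4."

def grade(output, ground_truth):
    # 3. Grading: stand-in for your Grader's scoring logic.
    return 1.0 if ground_truth in output else 0.0

for row in dataset:
    # 1. Prompt: the cluster selects one row and sends it to the workflow.
    output = run_agent_workflow(row)
    reward = grade(output, row["ground_truth"])
    # 4. RL Update: the platform uses this reward to nudge model weights
    # toward higher-reward behaviors.
    print(reward)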

Core Abstractions

| Abstraction | Purpose | Base Class |
| --- | --- | --- |
| AgentWorkflow | Defines agent behavior — how the model processes prompts, calls tools, and produces output | AgentWorkflow |
| Grader | Defines evaluation logic — how agent outputs are scored to produce reward signals | Grader |

Both live in your workspace’s rollouts/ directory as Python classes that you subclass and implement.
rollouts/
└── my-rollout/
    └── main.py    # Defines one AgentWorkflow + one Grader

Quick Example

from osmosis_ai.rollout import (
    AgentWorkflow,
    AgentWorkflowContext,
    Grader,
    GraderContext,
)

class MyWorkflow(AgentWorkflow):
    async def run(self, ctx: AgentWorkflowContext) -> None:
        # ctx.prompt carries the dataset row's input messages.
        prompt = ctx.prompt
        # Call your LLM or agent logic here to produce output messages.

class MyGrader(Grader):
    async def grade(self, ctx: GraderContext) -> None:
        # Score each sampled rollout and report its reward.
        for sample_id, sample in ctx.samples.items():
            reward = 1.0 if "expected" in str(sample.messages) else 0.0
            ctx.set_sample_reward(sample_id, reward)

The SDK automatically discovers your AgentWorkflow and Grader subclasses from the entrypoint file. No registration or decorators are needed — just define exactly one AgentWorkflow subclass and zero or one Grader subclass in your module.
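
In practice, a grader usually compares the output against the row's reference answer, which the SDK exposes as ctx.label (see step 3 of the training loop). Here is a variant of MyGrader sketched under that assumption; the substring check is illustrative, not a required pattern:

from osmosis_ai.rollout import Grader, GraderContext

class ExactMatchGrader(Grader):
    async def grade(self, ctx: GraderContext) -> None:
        # ctx.label holds this row's ground_truth reference answer.
        label = str(ctx.label)
        for sample_id, sample in ctx.samples.items():
            # Reward 1.0 when the reference answer appears anywhere in the
            # output messages, 0.0 otherwise.
            reward = 1.0 if label in str(sample.messages) else 0.0
            ctx.set_sample_reward(sample_id, reward)

Remember the one-Grader rule above: this class would replace MyGrader in your entrypoint, not sit alongside it.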

From Code to Training

Once you’ve written a rollout, the path to a live training run is three simple steps:

1. Evaluate locally with osmosis eval run: Run your rollout against a local dataset using your own LLM API key and check that rewards, pass rates, and agent traces look right. Cap the dataset with --limit N when you just want a quick smoke test. See Local Evaluation.

2. Commit and sync: Commit your rollouts/<name>/ directory and push to the default branch of your connected GitHub repo. The Osmosis platform picks up the change through Git Sync — this synced copy is what osmosis train submit actually runs, so uncommitted local edits are not included.

3. Submit a training run: Run osmosis train submit with a training TOML that points at your rollout and entrypoint (a sketch follows this list). The platform provisions GPUs, deploys your rollout from the synced commit, and runs RL training for you. Pin a specific revision with commit_sha if you need reproducibility.
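
The exact TOML schema is documented under Training Runs; the sketch below only illustrates the shape, and every key name except commit_sha (mentioned above) is an assumption, not the confirmed schema.

# Illustrative training TOML -- key names other than commit_sha are
# assumptions; check the Training Runs docs for the real schema.
rollout = "my-rollout"       # points at rollouts/my-rollout/
entrypoint = "main.py"       # file defining your AgentWorkflow and Grader
commit_sha = "abc1234"       # optional: pin a synced commit for reproducibility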

Next Steps

Building AgentWorkflows

Implement the AgentWorkflow class to define your agent behavior.

Building Graders

Implement the Grader class to define reward signals for training.

Local Evaluation

Evaluate your rollout with osmosis eval run — or use it as a smoke test before submitting a training run.

Training Runs

Submit a training run once your rollout passes local eval.