Osmosis Platform is the central hub for managing reinforcement learning (RL) training of LLMs. It handles GPU provisioning, training orchestration, metrics collection, and model management, so you can focus on defining agent behavior and evaluation logic.

Core Concepts

| Concept | Description |
| --- | --- |
| Workspace | Top-level container for your team. Holds projects, members, API keys, and integrations. |
| Project | A training context within a workspace. Contains training runs, reward functions, tools, and datasets. |
| Training Run | A single RL training session. Configured with a base model, dataset, reward functions, and tools. |
| Reward Function | A Python function that scores LLM outputs deterministically and returns a float (see the example below this table). |
| Reward Rubric | Natural-language criteria evaluated by an LLM judge during training. |
| MCP Tools | Functions the agent can call during rollouts (calculators, search, code execution, etc.). |
| Rollout Server | An HTTP server you build that implements your custom agent loop. The training cluster sends rollout requests to your server, which orchestrates tool calls and LLM inference, then returns trajectories and rewards. Built with the Python SDK's RolloutAgentLoop API. |
| Checkpoint | A saved model state during training. Can be merged and exported to Hugging Face. |
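Because a reward function is deterministic Python returning a float, it can be as simple as string matching. The sketch below is illustrative only; the signature shown here (a completion plus the expected answer) is an assumption, not the SDK's documented interface.

```python
import re


def exact_answer_reward(completion: str, expected: str) -> float:
    """Deterministically score a completion against an expected answer.

    Hypothetical signature: the interface Osmosis actually expects for
    reward functions may differ.
    """
    # Full credit when the output ends with a line like "Answer: 42"
    # that matches the expected string exactly.
    match = re.search(r"Answer:\s*(.+?)\s*$", completion.strip())
    if match and match.group(1) == expected:
        return 1.0
    # Partial credit when the expected answer appears anywhere in the text.
    if expected in completion:
        return 0.5
    return 0.0
```

Because no LLM sits in the scoring loop, identical outputs always receive identical scores; criteria that require judgment belong in a reward rubric instead.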

Architecture

Osmosis supports two rollout modes. Choose one depending on where you want the agent loop to run during training.

Option A: Local Rollout

The agent loop runs inside the training cluster. Push reward functions, rubrics, and MCP tools to GitHub; the platform syncs and runs everything. No server to deploy.
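As a sketch of what a synced MCP tool can look like, the snippet below uses FastMCP from the official MCP Python SDK to expose two calculator tools. Whether Osmosis expects FastMCP specifically, and how tool files should be laid out in the GitHub repo, are assumptions; check the platform docs for its conventions.

```python
from mcp.server.fastmcp import FastMCP

# Illustrative tool server; Osmosis's conventions for tools synced
# from GitHub may differ.
mcp = FastMCP("calculator")


@mcp.tool()
def add(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    return a + b


@mcp.tool()
def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    return a * b


if __name__ == "__main__":
    mcp.run()
```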

Option B: Remote Rollout

The agent loop runs on your own server. You only need to implement two methods, get_tools() (define available tools) and run() (agent loop logic), by subclassing RolloutAgentLoop. The Python SDK automatically creates a trainer-compatible HTTP server that handles all protocol details (/v1/rollout/init, /v1/chat/completions, /v1/rollout/completed). You keep full control over agent logic and tool execution.
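A minimal sketch of such a server follows. Only the class name RolloutAgentLoop and the two method names come from this page; the import path, method signatures, and helpers like self.chat(), self.execute_tool(), and self.score() are hypothetical stand-ins for whatever the Python SDK actually provides.

```python
from osmosis_ai import RolloutAgentLoop  # hypothetical import path


class CalculatorAgent(RolloutAgentLoop):
    def get_tools(self):
        # Advertise the tools the agent may call during a rollout.
        # The tool schema shown here is an assumption.
        return [
            {
                "name": "add",
                "description": "Add two numbers.",
                "parameters": {"a": "number", "b": "number"},
            }
        ]

    def run(self, request):
        # Agent loop: alternate LLM inference and tool execution until the
        # model stops requesting tools, then hand back the trajectory.
        messages = list(request.messages)
        while True:
            # self.chat() stands in for the SDK call that routes inference
            # through the trainer (the /v1/chat/completions leg).
            reply = self.chat(messages)
            messages.append(reply)
            if not reply.tool_calls:
                break
            for call in reply.tool_calls:
                # self.execute_tool() is a hypothetical helper that runs
                # one tool call and wraps the result as a message.
                messages.append(self.execute_tool(call))
        # Return the trajectory and its reward to the trainer;
        # self.score() is a placeholder for your reward logic.
        return {"messages": messages, "reward": self.score(messages)}
```

The SDK wraps the subclass in the HTTP server itself, so /v1/rollout/init, /v1/chat/completions, and /v1/rollout/completed never have to be implemented by hand.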

Feature Summary