Why Osmosis
Hands-on Deployments
We work directly with customers to support the entire post-training workflow — from feature engineering to reward function creation.
Reinforcement Fine-Tuning
A comprehensive post-training platform that allows engineers to leverage cutting-edge reinforcement learning techniques (GRPO, DAPO, etc.).
Continuous Improvement
Integrate with your evaluation solutions to monitor performance and automatically start re-training runs — without the need for an engineer in the loop.
Use Cases
Data Extraction
Build domain-specific extraction models to capture the exact structure and content of any document.
Tool Use
Teach AI agents to use the exact tools they’ll have in production. Osmosis powers AI agents that stay reliable, even in the most complex multi-step, multi-tool tasks.
Code Generation
Train specialized coding models for blazing fast generation of domain-specific languages, front-end components, and context-aware tests.
How It Works
You Define
Provide the building blocks for training:
- Tools & Agent Logic — the actions your agent can take
- Reward Functions — how outputs are scored (see the sketch after this list)
- Training Data — the tasks your model learns from
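To make the reward-function item concrete: a reward function is typically just a function that takes the model's output (plus any reference data) and returns a numeric score. The sketch below is a minimal, hypothetical Python example for a structured-extraction task; the function name and signature are illustrative, not the exact interface Osmosis expects.

```python
# Hypothetical reward function for a data-extraction task (illustrative only,
# not the exact Osmosis interface): score a JSON extraction against a reference.
import json

def extraction_reward(model_output: str, expected: dict) -> float:
    """Return a score in [0, 1] for how closely the extraction matches the reference."""
    try:
        predicted = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0  # unparseable output earns no reward
    if not isinstance(predicted, dict) or not expected:
        return 0.0
    # Fraction of expected fields reproduced exactly.
    correct = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return correct / len(expected)
```

Deterministic scores like this are cheap to compute and easy to debug; rubric-based scoring plugs into the same slot.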
Osmosis Trains
The platform handles the heavy lifting:
- GPU Training Cluster — managed infrastructure, no setup needed
- RL Training Loop — GRPO, DAPO, and multi-turn tool training
- Checkpoints & Metrics — track progress in real time
Get Started
Platform Quickstart
New to Osmosis? Start here.
Platform Overview
Understand core concepts — workspaces, training runs, metrics, and model management.
Local Rollout
Sync reward functions, rubrics, and MCP tools from your GitHub repository.
Remote Rollout
Build custom agent servers that integrate with Osmosis training infrastructure.
What is a Rollout?
In reinforcement learning, a rollout is the process of running a policy in an environment to generate a trajectory — the complete sequence of actions, observations, and outcomes from start to finish. In the LLM context, a rollout is a single attempt by the model to solve a task, including any reasoning steps, tool usage, and final output. Think of it like a single ChatGPT conversation: if multiple users ask the same question to the same model, each interaction counts as a separate rollout.

Each rollout produces a trajectory that captures everything the model did during that attempt. A reward function then scores how well the model performed. Osmosis collects these trajectories and rewards, then uses reinforcement learning (GRPO, DAPO) to update the model’s policy — nudging it toward strategies that earn higher rewards.

By running thousands of rollouts per training iteration, the model discovers which reasoning patterns, tool-use strategies, and response styles lead to better outcomes — and improves measurably on your specific tasks over time.
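That cycle fits in a few lines of Python-style pseudocode. Every name here (run_rollout, reward_fn, grpo_update) is a placeholder for illustration, not an Osmosis API:

```python
# Conceptual sketch of one training iteration (placeholder names, not Osmosis APIs):
# collect many rollouts, score each one, then update the policy.
def train_iteration(policy, tasks, run_rollout, reward_fn, grpo_update, rollouts_per_task=8):
    trajectories, rewards = [], []
    for task in tasks:
        for _ in range(rollouts_per_task):
            # One rollout = one full attempt: reasoning steps, tool calls, final output.
            trajectory = run_rollout(policy, task)
            trajectories.append(trajectory)
            # The reward function scores how well that attempt went.
            rewards.append(reward_fn(trajectory, task))
    # GRPO/DAPO-style update: nudge the policy toward higher-reward behavior.
    return grpo_update(policy, trajectories, rewards)
```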
Choose Your Workflow
Osmosis supports two main workflows for connecting your code to the training platform:

| | Local Rollout | Remote Rollout |
|---|---|---|
| Best for | Reward functions, rubrics, MCP tools | Custom agent loops with complex logic |
| How it works | Push to GitHub → auto-synced to platform | Run your own HTTP server → platform connects |
| Setup | Add decorators + folder structure | Implement RolloutAgentLoop |
| When to use | Standard tool-use training | Multi-step reasoning, custom environments |
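For the remote workflow, the rough shape of an agent server is sketched below. It assumes a FastAPI app with a single /rollout route and made-up request and response fields; the actual RolloutAgentLoop interface and endpoints are defined in the Remote Rollout guide.

```python
# Illustrative shape of a remote rollout server. All routes, class names, and
# fields below are assumptions, not the actual Osmosis RolloutAgentLoop API.
from fastapi import FastAPI
from pydantic import BaseModel

class RolloutRequest(BaseModel):
    task: dict           # the task the platform asks the agent to attempt
    model_endpoint: str  # where to send inference calls for the policy being trained

class RolloutResponse(BaseModel):
    trajectory: list[dict]  # messages, tool calls, and observations from the attempt
    reward: float           # score assigned to this attempt

app = FastAPI()

@app.post("/rollout")
def run_rollout(request: RolloutRequest) -> RolloutResponse:
    # Your custom agent loop goes here: call the model at request.model_endpoint,
    # execute tools, iterate until the task is done, then score the result.
    trajectory = [{"role": "assistant", "content": "..."}]
    return RolloutResponse(trajectory=trajectory, reward=0.0)
```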