Welcome to Osmosis — the forward-deployed reinforcement learning platform. Define your tools, reward functions, and training data; Osmosis handles the rest — from GPU orchestration to RL training — and delivers a task-specific model tuned to your exact workflow.

Why Osmosis

Hands-on Deployments

We work directly with customers to support the entire post-training workflow — from feature engineering to reward function creation.

Reinforcement Fine-Tuning

A comprehensive post-training platform that allows engineers to leverage cutting-edge reinforcement learning techniques (GRPO, DAPO, etc.).

Continuous Improvement

Integrate with your evaluation solutions to monitor performance and automatically start re-training runs — without the need for an engineer in the loop.

Use Cases

Data Extraction

Build domain-specific extraction models that capture the exact structure and content of any document.

Tool Use

Teach AI agents to use the exact tools they’ll have in production. Osmosis powers AI agents that stay reliable, even in the most complex multi-step, multi-tool tasks.

Code Generation

Train specialized coding models for blazing-fast generation of domain-specific languages, front-end components, and context-aware tests.

Further reading: Open Source SLM Trained for MCP — see how Osmosis trained a small language model for tool use with reinforcement learning. Visit osmosis.ai for more use cases and demos.

How It Works

1. You Define

Provide the building blocks for training:
  • Tools & Agent Logic — the actions your agent can take
  • Reward Functions — how outputs are scored
  • Training Data — the tasks your model learns from
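The three building blocks above can be sketched in plain Python. Note that the function signature and the shape of the training examples here are illustrative assumptions for this sketch, not Osmosis's actual API:

```python
# Illustrative sketch only — names and signatures are assumptions,
# not the Osmosis platform API.

def exact_match_reward(solution_str: str, ground_truth: str) -> float:
    """Reward Function: score a model output against the expected answer."""
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0

# Training Data: the tasks the model learns from.
training_data = [
    {
        "prompt": "Extract the invoice total from: 'Total due: $42.00'",
        "ground_truth": "$42.00",
    },
]

# Tools & Agent Logic would be defined alongside these, for example as
# MCP tools the agent can call during a rollout.
```

During training, the platform runs the model on each task, passes the output through the reward function, and uses the resulting scores to update the model.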
2. Osmosis Trains

The platform handles the heavy lifting:
  • GPU Training Cluster — managed infrastructure, no setup needed
  • RL Training Loop — GRPO, DAPO, and multi-turn tool training
  • Checkpoints & Metrics — track progress in real time
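As a rough illustration of what the RL training loop computes: GRPO-style methods score a group of rollouts for the same prompt, then normalize each reward against the group's mean and standard deviation to get an advantage. This is a simplified sketch of that one step, not the platform's implementation:

```python
def group_advantages(rewards):
    """GRPO-style advantages: (reward - group mean) / group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # avoid division by zero when all rewards match
    return [(r - mean) / std for r in rewards]

print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

Rollouts that beat their group's average get positive advantages and are reinforced; below-average rollouts are discouraged.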
3. Deploy Your Model

Ship a model that’s better at your tasks:
  • Merge to HuggingFace — export trained weights
  • Deploy Anywhere — use your model in any environment

Get Started

Platform Quickstart

New to Osmosis? Start here.

Platform Overview

Understand core concepts — workspaces, training runs, metrics, and model management.

Local Rollout

Sync reward functions, rubrics, and MCP tools from your GitHub repository.

Remote Rollout

Build custom agent servers that integrate with Osmosis training infrastructure.

What is a Rollout?

In reinforcement learning, a rollout is the process of running a policy in an environment to generate a trajectory — the complete sequence of actions, observations, and outcomes from start to finish. In the LLM context, a rollout is a single attempt by the model to solve a task, including any reasoning steps, tool usage, and final output. Think of it like a single ChatGPT conversation: if multiple users ask the same question to the same model, each interaction counts as a separate rollout.

Each rollout produces a trajectory that captures everything the model did during that attempt. A reward function then scores how well the model performed. Osmosis collects these trajectories and rewards, then uses reinforcement learning (GRPO, DAPO) to update the model’s policy — nudging it toward strategies that earn higher rewards.

By running thousands of rollouts per training iteration, the model discovers which reasoning patterns, tool-use strategies, and response styles lead to better outcomes — and improves measurably on your specific tasks over time.
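The rollout → trajectory → reward cycle can be shown with a toy sketch. Every name here is illustrative (a stand-in policy instead of an LLM, a trivial task), not the Osmosis API:

```python
import random

def run_rollout(policy, task):
    """One rollout: a single attempt at the task, recorded as a trajectory."""
    output = policy(task["prompt"])
    return {"prompt": task["prompt"], "output": output}

def reward_fn(trajectory, task):
    """Score the trajectory: 1.0 for a correct final answer, else 0.0."""
    return 1.0 if trajectory["output"] == task["answer"] else 0.0

def policy(prompt):
    # Stand-in for an LLM; RL training would shift this distribution
    # toward outputs that earn higher rewards.
    return random.choice(["4", "5"])

task = {"prompt": "2 + 2 = ?", "answer": "4"}
trajectories = [run_rollout(policy, task) for _ in range(1000)]
rewards = [reward_fn(t, task) for t in trajectories]
mean_reward = sum(rewards) / len(rewards)
```

A training iteration runs many such rollouts, scores each one, and updates the policy so that high-reward trajectories become more likely.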

Choose Your Workflow

Osmosis supports two main workflows for connecting your code to the training platform:
| | Local Rollout | Remote Rollout |
| --- | --- | --- |
| Best for | Reward functions, rubrics, MCP tools | Custom agent loops with complex logic |
| How it works | Push to GitHub → auto-synced to platform | Run your own HTTP server → platform connects |
| Setup | Add decorators + folder structure | Implement RolloutAgentLoop |
| When to use | Standard tool-use training | Multi-step reasoning, custom environments |
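For the Remote Rollout workflow, a custom agent loop might look like the sketch below. The RolloutAgentLoop name comes from the docs above, but the method name, signature, and payload shape here are assumptions for illustration, not the actual interface:

```python
# Hypothetical sketch of a Remote Rollout agent loop. Method names and
# payload shapes are illustrative assumptions, not the real interface.

class RolloutAgentLoop:
    """Assumed base class: the platform invokes run() once per rollout."""

    def run(self, prompt: str) -> dict:
        raise NotImplementedError

class MyAgentLoop(RolloutAgentLoop):
    def run(self, prompt: str) -> dict:
        # Custom multi-step logic would go here: tool calls, retries,
        # intermediate reasoning, and so on.
        answer = prompt.upper()  # placeholder for real agent behavior
        return {"messages": [{"role": "assistant", "content": answer}]}

loop = MyAgentLoop()
result = loop.run("hello")
```

The agent server exposes this loop over HTTP so the training platform can drive rollouts against it; the trajectories it returns are then scored and used for policy updates.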