Welcome to Osmosis — the forward-deployed reinforcement learning platform. Osmosis helps companies create task-specific models that beat foundation models at a fraction of the cost.

Why Osmosis

Hands-on Deployments

We work directly with customers to support the entire post-training workflow — from feature engineering to reward function creation.

Reinforcement Fine-Tuning

A comprehensive post-training platform that lets engineers apply cutting-edge reinforcement learning techniques such as GRPO and DAPO.

Continuous Improvement

Integrate with your evaluation solutions to monitor performance and automatically start re-training runs — without the need for an engineer in the loop.

Use Cases

Further reading: Open Source SLM Trained for MCP — see how Osmosis trained a small language model for tool use with reinforcement learning. Visit osmosis.ai for more use cases and demos.

How It Works

1. You Define

Provide the building blocks for training:
  • Tools & Agent Logic — the actions your agent can take
  • Reward Functions — how outputs are scored (see the sketch after this list)
  • Training Data — the tasks your model learns from
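As a concrete example, a reward function is just code that scores one attempt. A minimal sketch in Python; the signature and names are illustrative assumptions, not the Osmosis API:

```python
# Illustrative only: the signature below is an assumption, not the Osmosis API.
def reward(final_answer: str, expected: str) -> float:
    """Score one rollout: full credit for an exact match, partial otherwise."""
    if final_answer.strip() == expected.strip():
        return 1.0
    # Partial credit when the expected value appears somewhere in the answer.
    if expected.strip() in final_answer:
        return 0.5
    return 0.0
```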
2. Osmosis Trains

The platform handles the heavy lifting:
  • GPU Training Cluster — managed infrastructure, no setup needed
  • RL Training Loop — GRPO, DAPO, and multi-turn tool training (see the sketch after this list)
  • Checkpoints & Metrics — track progress in real time
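To give a rough sense of what the training loop does, GRPO scores a group of rollouts for the same task and normalizes each reward against the group's mean and spread, so updates push the model toward above-average attempts. A minimal illustration of that advantage computation, not Osmosis internals:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each rollout's reward
    against the mean and spread of its group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        return [0.0 for _ in rewards]  # no learning signal if all rewards are equal
    return [(r - mu) / sigma for r in rewards]

# Example: four rollouts of the same task, scored by the reward function.
print(group_relative_advantages([1.0, 0.0, 0.5, 0.0]))
```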
3. Deploy Your Model

Ship a model that’s better at your tasks:
  • Merge to HuggingFace — export trained weights (see the loading example after this list)
  • Deploy Anywhere — use your model in any environment
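Once merged and exported, the trained weights load like any other Hugging Face checkpoint. A sketch using the transformers library; the repo name is a placeholder for your own exported model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "your-org/your-tuned-model" is a placeholder for your exported repo.
model = AutoModelForCausalLM.from_pretrained("your-org/your-tuned-model")
tokenizer = AutoTokenizer.from_pretrained("your-org/your-tuned-model")

inputs = tokenizer("List the open invoices for ACME Corp.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```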

Get Started

What is a Rollout?

In reinforcement learning, a rollout is the process of running a policy in an environment to generate a trajectory — the complete sequence of actions, observations, and outcomes from start to finish. In the LLM context, a rollout is a single attempt by the model to solve a task, including any reasoning steps, tool usage, and final output. Think of it like a single ChatGPT conversation: if multiple users ask the same question to the same model, each interaction counts as a separate rollout.

Each rollout produces a trajectory that captures everything the model did during that attempt. A reward function then scores how well the model performed. Osmosis collects these trajectories and rewards, then uses reinforcement learning (GRPO, DAPO) to update the model’s policy — nudging it toward strategies that earn higher rewards.

By running thousands of rollouts per training iteration, the model discovers which reasoning patterns, tool-use strategies, and response styles lead to better outcomes — and improves measurably on your specific tasks over time.
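To make the loop concrete, here is a stripped-down sketch of rollout collection: run the policy on a task several times, score each trajectory, and hand the (trajectory, reward) pairs to the trainer. All names are illustrative; run_policy stands in for the model plus its tools:

```python
def collect_rollouts(task, run_policy, reward_fn, n: int = 8):
    """Run the policy n times on one task and score each trajectory.

    run_policy(task) -> trajectory: the full record of one attempt
    (reasoning steps, tool calls, final output). Names are illustrative.
    """
    rollouts = []
    for _ in range(n):
        trajectory = run_policy(task)  # one attempt = one rollout
        score = reward_fn(trajectory)  # how well did this attempt do?
        rollouts.append((trajectory, score))
    return rollouts  # the trainer turns these into policy updates (GRPO, DAPO)
```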

Choose Your Workflow

Osmosis supports two main workflows for connecting your code to the training platform:
Local Rollout
  • Best for: Reward functions, rubrics, MCP tools
  • How it works: Push to GitHub → auto-synced to platform
  • Setup: Add decorators + folder structure
  • When to use: Standard tool-use training

Remote Rollout
  • Best for: Custom agent loops with complex logic
  • How it works: Run your own HTTP server → platform connects
  • Setup: Implement RolloutAgentLoop
  • When to use: Multi-step reasoning, custom environments
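For the remote workflow, your agent loop runs behind an HTTP endpoint that the platform calls with tasks. A minimal sketch using Python's standard library; the route, payload shape, and handler names are assumptions for illustration, not the actual RolloutAgentLoop interface:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class RolloutHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumed payload: {"task": ...}; the real schema comes from the Osmosis SDK.
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        trajectory = self.run_agent(body["task"])  # your custom agent loop
        response = json.dumps({"trajectory": trajectory}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(response)

    def run_agent(self, task):
        # Placeholder for multi-step reasoning, tool calls, custom environments.
        return [{"role": "assistant", "content": f"echo: {task}"}]

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), RolloutHandler).serve_forever()
```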