A training run takes a base model and iteratively improves it through reinforcement learning. You provide the AgentWorkflow, Graders, and dataset; the platform provisions GPUs, executes the training loop, and saves checkpoints automatically.
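
In outline, a run's life from the CLI looks like this (each command is covered in detail below; my-run stands in for whatever name your run gets):

osmosis train submit configs/training/default.toml   # submit a run from a TOML config
osmosis train status my-run                          # check where it is in its lifecycle
osmosis train metrics my-run                         # follow training metrics
osmosis train stop my-run                            # stop it gracefully, saving a final checkpoint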

Concepts

Training Configuration vs Training Run

A Training Configuration is the recipe — it defines which model, dataset, AgentWorkflow, and hyperparameters to use. A Training Run is a single execution of that configuration. You can submit multiple runs from the same configuration to experiment with different settings.
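
For example, submitting the same file twice produces two independent runs; tweaking a field (such as lr) between submissions is one simple way to compare settings:

# Run 1: baseline
osmosis train submit configs/training/default.toml

# Edit a hyperparameter in the TOML (e.g. lr), then submit run 2
# from the same configuration file
osmosis train submit configs/training/default.toml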

Training Strategies

Strategy   | Description
Batch      | Single pass through the dataset with RL optimization. Best for initial experiments and well-defined tasks with clear reward signals.
Continuous | Multiple epochs with ongoing monitoring. Best for production model improvement and tasks requiring gradual refinement.
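
How a strategy is selected in the TOML is not shown on this page, but the clearest documented knob separating the two is total_epochs (see Key Configuration Fields below). A rough sketch:

[training]
total_epochs = 1     # Batch-style: a single pass through the dataset
# total_epochs = 4   # Continuous-style: multiple epochs with ongoing monitoring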

Submitting a Training Run

Submit a training run using the CLI with a TOML configuration file:
osmosis train submit configs/training/default.toml
osmosis train submit runs against the rollout code that the platform has already synced from your connected GitHub repository via Git Sync, not against your local working tree. Any uncommitted local edits are ignored; only what has been pushed to the default branch (and successfully synced) is available to the training run.

By default, the platform uses the latest synced commit. Pin the run to a specific commit with the commit_sha field below if you need reproducibility or want to submit from a known-good revision while iterating locally.
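
A typical submit flow, assuming your default branch is main and your rollout lives under rollouts/my_workflow as in the config below:

# Push the rollout code you want the platform to train against
git add rollouts/my_workflow
git commit -m "Update rollout logic"
git push origin main

# Optionally pin the run to that exact revision
git rev-parse HEAD    # copy the output into commit_sha in your TOML
osmosis train submit configs/training/default.toml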

Key Configuration Fields

[experiment]
rollout = "my_workflow"                 # AgentWorkflow directory name (under rollouts/)
entrypoint = "main.py"                  # Entrypoint file name
model_path = "Qwen/Qwen3.5-35B-A3B"     # HuggingFace model path
dataset = "my_dataset"                  # Dataset name
# commit_sha = "abc123..."              # Optional: pin to a specific synced commit

[training]
lr = 1e-6                               # Learning rate
total_epochs = 1                        # Number of passes through the dataset
n_samples_per_prompt = 8                # Samples generated per prompt
global_batch_size = 64                  # Batch size for training

[sampling]
rollout_temperature = 1.0               # Sampling temperature during rollouts
rollout_top_p = 1.0                     # Top-p sampling during rollouts

[checkpoints]
checkpoint_save_freq = 20               # Save checkpoint every N steps
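
To see how these fields interact, assume (purely for illustration) a dataset of 512 prompts, and that one training step consumes one global batch: an epoch generates 512 × 8 = 4,096 samples, which is 4,096 / 64 = 64 steps, so checkpoint_save_freq = 20 saves checkpoints around steps 20, 40, and 60. The platform's actual step accounting may differ; treat this as a back-of-envelope sketch.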
See Config Files for the full TOML reference with all available fields.

Status Lifecycle

Every training run progresses through a series of statuses:
Status   | Description
pending  | Run is queued and waiting for GPU resources to be provisioned.
running  | Training is actively in progress. Metrics and checkpoints are being produced.
finished | Training completed successfully. Final checkpoint and metrics are available.
failed   | Training encountered an error during execution. Check logs for details.
stopped  | Training was manually stopped by a user via the CLI or dashboard.
crashed  | Training process terminated unexpectedly (e.g. OOM, hardware failure).
The internal lifecycle phases are: init → provision → setup → train → finalize → complete (or error / cleanup).

A run in failed or crashed status may still have usable checkpoints saved before the failure occurred.
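
To follow a run through these statuses from the terminal, the standard watch utility works with the status command documented below:

# Re-run the status check every 60 seconds; stop watching once the run
# reaches a terminal status (finished, failed, stopped, or crashed)
watch -n 60 osmosis train status my-run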

Monitoring

Track training progress through the CLI or the platform dashboard.

CLI Commands

# Check the current status of a training run
osmosis train status my-run

# View real-time training metrics
osmosis train metrics my-run

Platform Dashboard

The web dashboard at platform.osmosis.ai provides:
  • Real-time metrics — loss curves, reward trends, and learning rate schedules
  • Reward visualization — per-Grader reward scores over training steps
  • Checkpoint timeline — all saved checkpoints with their step number and metrics
  • Training logs — full output logs for debugging

LoRA Checkpoints

During training, LoRA checkpoints are saved at the interval specified by checkpoint_save_freq in your configuration. Checkpoints capture the adapter weights at a specific training step. You can:
  • Compare checkpoints by their reward scores to find the best-performing step
  • Resume training from a specific checkpoint if a run was stopped or crashed
  • Export checkpoints to HuggingFace for deployment
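
How a checkpoint is exported from the platform is not detailed on this page, but once you have an adapter directory locally, pushing it to the HuggingFace Hub works with the standard huggingface-cli (the repo name and local path here are hypothetical):

# Authenticate once, then upload the adapter directory to a model repo
huggingface-cli login
huggingface-cli upload my-org/my-model-lora ./checkpoints/step-60   # hypothetical path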

Managing Runs

Stopping a Run

osmosis train stop my-run
This gracefully stops the training process and saves a final checkpoint.

Deleting a Run

osmosis train delete my-run
Deleting a training run permanently removes all associated metrics, logs, and checkpoints. This action cannot be undone.

Next Steps

Datasets & Models

Upload datasets and manage base models for training.

Monitoring & Settings

Configure workspace settings and track training progress.