A training run takes a base model and improves it through reinforcement learning. You provide the rollout, grader, training config, and dataset; the platform provisions GPUs, pulls code from your synced workspace repository, executes the training loop, and saves checkpoints automatically.

Concepts

Training Configuration vs Training Run

A Training Configuration is the recipe — it defines which model, dataset, AgentWorkflow, and hyperparameters to use. A Training Run is a single execution of that configuration. You can submit multiple runs from the same configuration to experiment with different settings.

Training Behavior

Each submitted run is a single managed training job for the rollout, dataset, model, and hyperparameters in its TOML config. To run another experiment, submit the config again with updated fields such as total_epochs, sampling settings, or checkpoint cadence.

Submitting a Training Run

Submit a training run using the CLI with a TOML configuration file:
osmosis train submit configs/training/default.toml
Git Sync is the source of truth for your rollout code. The CLI reads config values from the local TOML file you pass, but rollout code comes from the synced workspace repository, not from your local working tree. Commit, push, and wait for the sync to finish before submitting a run that depends on code changes; set commit_sha when you need a specific synced revision.
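A pre-submit check along these lines can catch the common mistake of submitting before local changes are pushed. This is a minimal sketch, not part of the Osmosis CLI: the helper names `_git` and `pushed_head_sha` are hypothetical, and the check only inspects local git state (a clean working tree, and HEAD matching the upstream tracking ref). It cannot tell whether the platform has finished syncing.

```python
import subprocess
from typing import Callable


def _git(*args: str) -> str:
    """Run a git command in the current directory and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def pushed_head_sha(git: Callable[..., str] = _git) -> str:
    """Return HEAD's SHA, or raise if local work is not committed and pushed.

    `git` is injectable so the logic can be exercised without a real repository.
    """
    if git("status", "--porcelain"):
        raise RuntimeError("uncommitted changes: commit them before submitting")
    head = git("rev-parse", "HEAD")
    upstream = git("rev-parse", "@{u}")  # last known state of the tracked remote branch
    if head != upstream:
        raise RuntimeError("HEAD differs from upstream: push (or pull) first")
    return head
```

The returned SHA is a candidate value for commit_sha once the platform reports that revision as synced.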

Key Configuration Fields

[experiment]
rollout = "my-rollout"                  # Rollout directory name (under rollouts/)
entrypoint = "main.py"                  # Entrypoint file name
model_path = "Qwen/Qwen3.6-35B-A3B"     # Hugging Face model path
dataset = "my-dataset"                  # Dataset name
# commit_sha = "abc123..."              # Optional: pin to a specific synced commit

[training]
lr = 1e-6                               # Learning rate
total_epochs = 1                        # Number of training epochs
n_samples_per_prompt = 8                # Samples generated per prompt
rollout_batch_size = 32                 # Rollout batch size
agent_workflow_timeout_s = 450          # Agent rollout timeout per row, in seconds
grader_timeout_s = 150                  # Grader timeout per row, in seconds

[sampling]
rollout_temperature = 1.0               # Sampling temperature during rollouts
rollout_top_p = 1.0                     # Top-p sampling during rollouts

[checkpoints]
checkpoint_save_freq = 20               # Save checkpoint every N steps
See Config Files for the full TOML reference with all available fields.

Status Lifecycle

Every training run progresses through a series of statuses:
Status     Description
pending    Run is queued and waiting for GPU resources to be provisioned.
running    Training is actively in progress. Metrics and checkpoints are being produced.
finished   Training completed successfully. Final checkpoint and metrics are available.
failed     Training encountered an error during execution. Check logs for details.
stopped    Training was manually stopped by a user via the CLI or dashboard.
killed     Training was terminated during platform cleanup or stop handling.
crashed    Training process terminated unexpectedly (e.g. OOM, hardware failure).
unknown    The platform could not determine the current training state.
The internal lifecycle phases are: init → provision → setup → train → finalize → complete (or error / cleanup).
A run in failed or crashed status may still have usable checkpoints saved before the failure occurred.
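Scripts that react to run status can mirror the statuses listed above in code. This sketch is an illustration, not a client library: `RunStatus`, `TERMINAL`, and both helpers are hypothetical names, and treating `unknown` as non-terminal is a choice made here rather than documented platform behavior.

```python
from enum import Enum


class RunStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"
    STOPPED = "stopped"
    KILLED = "killed"
    CRASHED = "crashed"
    UNKNOWN = "unknown"


# Statuses after which this sketch assumes the run will make no further progress.
TERMINAL = {
    RunStatus.FINISHED,
    RunStatus.FAILED,
    RunStatus.STOPPED,
    RunStatus.KILLED,
    RunStatus.CRASHED,
}


def is_terminal(status: str) -> bool:
    """True if polling can stop; raises ValueError for an unrecognized status."""
    return RunStatus(status) in TERMINAL


def may_have_partial_checkpoints(status: str) -> bool:
    """True for failed or crashed runs, which may still hold usable checkpoints."""
    return RunStatus(status) in {RunStatus.FAILED, RunStatus.CRASHED}
```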

Monitoring

Track training progress through the CLI or the platform dashboard.

CLI Commands

# Show run details, checkpoints, and metrics
osmosis train info my-run

# Save metrics to a specific JSON file
osmosis train info my-run --output results/my-run.json
While a run is in flight, the train info command reports progress (current_step / total_steps) and the most recent reward. The train list command surfaces the same fields so you can scan runs at a glance.
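The saved JSON can be consumed from a script. The sketch below wraps the `osmosis train info ... --output` command shown above and formats a progress line. The JSON schema is not documented here, so the top-level `current_step` and `total_steps` keys are assumptions; inspect a real output file and adjust the key names before relying on this. `fetch_info` and `progress_line` are hypothetical helper names.

```python
import json
import subprocess
from pathlib import Path


def fetch_info(run_name: str, out_path: str) -> dict:
    """Run `osmosis train info <run> --output <file>` and load the saved JSON."""
    subprocess.run(
        ["osmosis", "train", "info", run_name, "--output", out_path],
        check=True,
    )
    return json.loads(Path(out_path).read_text())


def progress_line(info: dict) -> str:
    """Format progress from the assumed `current_step` / `total_steps` keys."""
    step = info.get("current_step")
    total = info.get("total_steps")
    if step is None or not total:
        return "progress unavailable"
    return f"step {step}/{total} ({100 * step / total:.0f}%)"
```

For example, `progress_line(fetch_info("my-run", "results/my-run.json"))` would yield a one-line summary if the file has the assumed shape.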

Platform Dashboard

The web dashboard at platform.osmosis.ai provides:
  • Run list — search and filter runs by status, dataset, base model, and rollout.
  • Overview metrics — view Duration, Reward, Improvement, Samples, Training Reward, Validation Reward, Model Entropy, Response Length, Total Length, and Truncation Ratio when available.
  • Checkpoints — view saved checkpoints with their step, reward, deployment status, and Hugging Face upload status.
  • Outputs — inspect output artifacts when they are available.
See Monitoring for the full list of dashboard metrics.

LoRA Checkpoints

During training, LoRA checkpoints are saved at the interval specified by checkpoint_save_freq in your configuration. Checkpoints capture the adapter weights at a specific training step. You can:
  • Compare checkpoints by their reward scores to find the best-performing step
  • Export checkpoints from the dashboard
  • Upload checkpoints to Hugging Face
  • Deploy LoRA models for inference with osmosis model deploy
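Comparing checkpoints by reward amounts to taking a maximum over (step, reward) pairs. A minimal sketch, assuming checkpoint records are available as dictionaries with `step` and `reward` keys (a hypothetical shape, not a documented export format); the sample values are made up for illustration and are not real training results.

```python
def best_checkpoint(checkpoints: list[dict]) -> dict:
    """Return the checkpoint with the highest reward; ties go to the earliest step."""
    scored = [c for c in checkpoints if c.get("reward") is not None]
    if not scored:
        raise ValueError("no checkpoint has a reward yet")
    return max(scored, key=lambda c: (c["reward"], -c["step"]))


# Hypothetical values for illustration only.
checkpoints = [
    {"step": 20, "reward": 0.41},
    {"step": 40, "reward": 0.58},
    {"step": 60, "reward": 0.55},
]
print(best_checkpoint(checkpoints)["step"])  # prints 40
```

Preferring the earliest step on a tie is a design choice in this sketch; a later step with equal reward may be equally valid.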

Managing Runs

Stopping a Run

osmosis train stop my-run
This requests a graceful stop for the training process. If the stop completes successfully, the run enters stopped status.

Next Steps

Datasets

Upload datasets for training.

Models

Manage base models and deploy trained LoRA models.