A training run takes a base model and iteratively improves it through reinforcement learning. You provide the AgentWorkflow, Graders, and dataset; the platform provisions GPUs, executes the training loop, and saves checkpoints automatically.

## Documentation Index
Fetch the complete documentation index at: https://docs.osmosis.ai/llms.txt
Use this file to discover all available pages before exploring further.
## Concepts

### Training Configuration vs Training Run
A Training Configuration is the recipe — it defines which model, dataset, AgentWorkflow, and hyperparameters to use. A Training Run is a single execution of that configuration. You can submit multiple runs from the same configuration to experiment with different settings.

### Training Strategies
| Strategy | Description |
|---|---|
| Batch | Single pass through the dataset with RL optimization. Best for initial experiments and well-defined tasks with clear reward signals. |
| Continuous | Multiple epochs with ongoing monitoring. Best for production model improvement and tasks requiring gradual refinement. |
## Submitting a Training Run
Submit a training run using the CLI with a TOML configuration file.

### Key Configuration Fields
See Config Files for the full TOML reference with all available fields.
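As an illustrative sketch only: apart from `checkpoint_save_freq`, which this page references below, the field names and values here are assumptions, not the platform's actual schema — consult the Config Files reference for the authoritative list of fields.

```toml
# Hypothetical training configuration sketch.
# Only checkpoint_save_freq is confirmed by this page; every other
# field name and value below is an illustrative assumption.
[training]
base_model = "my-org/my-base-model"  # assumed field name and placeholder model
strategy = "batch"                    # assumed field; "batch" or "continuous"
checkpoint_save_freq = 50             # save a LoRA checkpoint every 50 steps
```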
## Status Lifecycle
Every training run progresses through a series of statuses:

| Status | Description |
|---|---|
| pending | Run is queued and waiting for GPU resources to be provisioned. |
| running | Training is actively in progress. Metrics and checkpoints are being produced. |
| finished | Training completed successfully. Final checkpoint and metrics are available. |
| failed | Training encountered an error during execution. Check logs for details. |
| stopped | Training was manually stopped by a user via the CLI or dashboard. |
| crashed | Training process terminated unexpectedly (e.g. OOM, hardware failure). |
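The lifecycle above suggests a simple polling pattern. In this sketch, `get_run_status` is a hypothetical stand-in for a real status query; it is not part of the Osmosis CLI or API.

```python
# Illustrative polling sketch for the status lifecycle above.
# get_run_status is a hypothetical callable that returns the run's
# current status string; it stands in for a real CLI/API query.
import time

ACTIVE = {"pending", "running"}
TERMINAL = {"finished", "failed", "stopped", "crashed"}

def wait_for_terminal(get_run_status, poll_seconds=30):
    """Poll until the run leaves an active status, then return the final one."""
    while (status := get_run_status()) in ACTIVE:
        time.sleep(poll_seconds)
    return status

# Example with a canned sequence of statuses:
statuses = iter(["pending", "running", "finished"])
final = wait_for_terminal(lambda: next(statuses), poll_seconds=0)
print(final)  # finished
```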
A run in `failed` or `crashed` status may still have usable checkpoints saved before the failure occurred.

## Monitoring
Track training progress through the CLI or the platform dashboard.

### CLI Commands
### Platform Dashboard
The web dashboard at platform.osmosis.ai provides:

- Real-time metrics — loss curves, reward trends, and learning rate schedules
- Reward visualization — per-Grader reward scores over training steps
- Checkpoint timeline — all saved checkpoints with their step number and metrics
- Training logs — full output logs for debugging
## LoRA Checkpoints
During training, LoRA checkpoints are saved at the interval specified by `checkpoint_save_freq` in your configuration. Checkpoints capture the adapter weights at a specific training step.
You can:
- Compare checkpoints by their reward scores to find the best-performing step
- Resume training from a specific checkpoint if a run was stopped or crashed
- Export checkpoints to HuggingFace for deployment
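As a small illustration of the first point, selecting the best-performing step from checkpoint metrics might look like this. The record fields are assumptions for the sketch, not the platform's actual checkpoint schema.

```python
# Mocked checkpoint metrics; the "step" and "mean_reward" field names
# are illustrative assumptions, not the platform's actual schema.
checkpoints = [
    {"step": 50,  "mean_reward": 0.41},
    {"step": 100, "mean_reward": 0.57},
    {"step": 150, "mean_reward": 0.53},
]

# Highest mean reward wins; you might then resume from or export this step.
best = max(checkpoints, key=lambda c: c["mean_reward"])
print(best["step"])  # 100
```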
## Managing Runs

### Stopping a Run

### Deleting a Run
## Next Steps

### Datasets & Models

Upload datasets and manage base models for training.

### Monitoring & Settings

Configure workspace settings and track training progress.