## Metrics Dashboard
During an active training run, the dashboard displays key metrics updated in real time:

| Metric | Description |
|---|---|
| Training Reward | Average reward score across training rollouts |
| Validation Reward | Reward score on held-out validation data |
| Model Entropy | Measure of output diversity (higher = more exploration) |
| Response Length | Average token count of model responses |
| KL Divergence | Distance from the reference model (monitors catastrophic forgetting) |
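The entropy and KL metrics in the table can be made concrete with a small sketch. The helpers below are illustrative, computed over a single next-token probability distribution; they are not the platform's actual implementation:

```python
import math

def entropy(probs):
    """Shannon entropy of a next-token distribution.
    Higher entropy = more diverse, exploratory outputs."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def kl_divergence(p, q):
    """KL(p || q): how far the policy's distribution p has
    drifted from the reference model's distribution q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A peaked distribution has low entropy; uniform is maximal.
peaked = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
print(entropy(peaked) < entropy(uniform))   # True
print(kl_divergence(peaked, uniform) > 0)   # True: positive drift from reference
```

In training, these quantities are averaged over many tokens and rollouts; rising KL against the reference model is the signal the dashboard uses to flag catastrophic forgetting.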
### Time Range Controls
Filter metrics by time range:

- Last hour, last 6 hours, last 24 hours
- Full training run
- Custom range
## Training Logs
The logs panel shows detailed event-level information:

- Rollout logs — Individual rollout traces with prompts, responses, tool calls, and rewards
- System logs — Infrastructure events (GPU allocation, checkpoint saves, errors)
- Reward breakdowns — Per-sample reward scores with details from each reward function
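A per-sample reward breakdown of this kind typically sums the scores of several reward functions. The sketch below is hypothetical; the function names and weighting scheme are illustrative, not the platform's API:

```python
def combine_rewards(sample, reward_fns, weights=None):
    """Score one rollout with several reward functions and keep a
    per-function breakdown alongside the weighted total."""
    weights = weights or {name: 1.0 for name in reward_fns}
    breakdown = {name: fn(sample) for name, fn in reward_fns.items()}
    total = sum(weights[name] * score for name, score in breakdown.items())
    return total, breakdown

# Illustrative reward functions for a single rollout.
reward_fns = {
    "correctness": lambda s: 1.0 if s["answer"] == s["expected"] else 0.0,
    "brevity": lambda s: max(0.0, 1.0 - len(s["response"]) / 1000),
}
total, breakdown = combine_rewards(
    {"answer": "42", "expected": "42", "response": "x" * 200},
    reward_fns,
)
print(breakdown)  # per-function scores, as shown in the logs panel
```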
## Checkpoints
Checkpoints are saved automatically during training at configurable intervals. Each checkpoint captures:

- Model weights at that training step
- Training metrics at the time of save
- Configuration used for the run
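The captured fields can be pictured as one record per checkpoint. The shape below is a hypothetical illustration, not the platform's storage format:

```python
import json

# Hypothetical checkpoint record, based on the fields listed above.
checkpoint = {
    "step": 1200,                        # training step at save time
    "metrics": {                         # metrics at the time of save
        "training_reward": 0.71,
        "validation_reward": 0.64,
        "kl_divergence": 0.03,
    },
    "config": {                          # run configuration snapshot
        "learning_rate": 1e-5,
        "batch_size": 64,
    },
    "created_at": "2025-01-15T12:00:00Z",
}
print(json.dumps(checkpoint, indent=2))
```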
### Viewing Checkpoints
The checkpoint list shows:

- Step number — Training step when the checkpoint was saved
- Training reward — Average reward at that step
- Validation reward — Validation score at that step
- Timestamp — When the checkpoint was created
### Merging Checkpoints
To create a deployable model from a checkpoint:

- Select a checkpoint from the list
- Click Merge — this combines the RL adapter with the base model
- The merged model is saved and ready for export
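Conceptually, the merge step folds the low-rank RL adapter back into the base weights, producing a single dense model that needs no adapter at inference time. A minimal sketch, assuming a LoRA-style adapter (function and variable names are hypothetical):

```python
import numpy as np

def merge_lora(w_base, lora_a, lora_b, alpha, rank):
    """Fold a low-rank adapter into the base weight matrix:
    W_merged = W_base + (alpha / rank) * B @ A.
    Conceptual sketch only; the platform's adapter format may differ."""
    return w_base + (alpha / rank) * (lora_b @ lora_a)

rng = np.random.default_rng(0)
d, r = 8, 2
w_base = rng.standard_normal((d, d))
lora_a = rng.standard_normal((r, d))   # down-projection
lora_b = rng.standard_normal((d, r))   # up-projection
w_merged = merge_lora(w_base, lora_a, lora_b, alpha=16, rank=r)
# w_merged is a dense (d, d) matrix: the adapter is gone after merging.
```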
### Exporting to Hugging Face
After merging a checkpoint:

- Click Upload to Hugging Face on the merged model
- Configure the target repository and visibility (public/private)
- The model is uploaded with a model card containing training metadata
You must configure your Hugging Face integration in Workspace Settings before exporting.
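The generated model card typically carries its training metadata as YAML front matter. A hypothetical example is shown below; every field value is illustrative, and the exact metadata the platform writes may differ:

```yaml
---
license: apache-2.0                  # illustrative license
base_model: your-org/base-model      # hypothetical base model id
tags:
  - reinforcement-learning
pipeline_tag: text-generation
---
```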