The Osmosis CLI uses TOML files for evaluation runs and training runs. Configs must live inside the workspace directory:
| Config type | Required location | Command |
| --- | --- | --- |
| Eval | configs/eval/*.toml | osmosis eval submit |
| Training | configs/training/*.toml | osmosis train submit |
Required fields are shown un-commented. Optional fields are commented out in template files and can be omitted to use platform defaults.
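The location rules above can be checked before submitting. The following is a minimal sketch, not part of the Osmosis CLI: `config_kind` is a hypothetical helper, and it assumes the `*.toml` glob means only files directly inside each directory qualify.

```python
from pathlib import PurePosixPath
from typing import Optional

# Required locations from the table above, relative to the workspace root.
CONFIG_DIRS = {
    "eval": PurePosixPath("configs/eval"),
    "training": PurePosixPath("configs/training"),
}


def config_kind(relative_path: str) -> Optional[str]:
    """Return "eval" or "training" for a correctly placed config, else None.

    Hypothetical helper: `relative_path` is the config path relative to the
    workspace root, written with forward slashes.
    """
    path = PurePosixPath(relative_path)
    if path.suffix != ".toml":
        return None
    for kind, directory in CONFIG_DIRS.items():
        # Assumption: configs/<kind>/*.toml covers direct children only.
        if path.parent == directory:
            return kind
    return None


print(config_kind("configs/eval/my-rollout.toml"))      # eval
print(config_kind("configs/training/my-rollout.toml"))  # training
print(config_kind("my-rollout.toml"))                   # None
```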

Eval Config

Used by osmosis eval submit to submit an evaluation run. The platform clones the workspace repository identified by the origin remote and runs the rollout server-side against a platform dataset.
configs/eval/my-rollout.toml
[experiment]
rollout = "my-rollout"                         # Rollout directory under rollouts/
entrypoint = "main.py"                         # Entrypoint relative to rollout dir
model_path = "openai/gpt-5-mini"               # LiteLLM-style model name for the evaluation policy
dataset = "my-platform-dataset"                # Platform dataset name from `osmosis dataset list`
# commit_sha = "abc123..."                     # Pin code to a specific commit

[evaluation]
# Optional. Omit values to use platform defaults.
# limit = 200                                  # First N rows; omit for random 10% sample
# n = 1                                        # Evaluation attempts per row
# batch_size = 1                               # Rows evaluated per batch
# pass_threshold = 1.0                         # Minimum passing score
# agent_workflow_timeout_s = 450               # Agent workflow timeout per row
# grader_timeout_s = 150                       # Grader timeout per row

# [env]
# LOG_LEVEL = "INFO"                           # Non-secret literal env var

[secrets]
# Required for eval configs. Default OpenAI eval models need this.
# Use required = [] when the evaluation needs no secret refs.
required = ["OPENAI_API_KEY"]

[experiment]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| rollout | str | Yes | Rollout directory name under rollouts/ |
| entrypoint | str | Yes | Python entrypoint relative to the rollout directory |
| model_path | str | Yes | LiteLLM-style model name for the evaluation policy (e.g. openai/gpt-5-mini) |
| dataset | str | Yes | Platform dataset name from osmosis dataset list |
| commit_sha | str | No | Pin code to a specific commit. Defaults to the latest synced commit on the default branch. |

[evaluation]

All fields are optional. Omit values to use platform defaults.
| Field | Type | Description |
| --- | --- | --- |
| limit | int | Number of rows to evaluate (the first N rows). When omitted, the platform evaluates a random 10% sample of the dataset. |
| n | int | Number of evaluation attempts per row (use values > 1 for pass@n metrics) |
| batch_size | int | Rows evaluated per batch |
| pass_threshold | float | Score at or above which a sample counts as passing |
| agent_workflow_timeout_s | float | Timeout in seconds for AgentWorkflow.run() per row |
| grader_timeout_s | float | Timeout in seconds for Grader.grade() per row |
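To make the interaction between n and pass_threshold concrete, here is a small sketch of one common reading of pass@n: a row passes if any of its n attempts scores at or above the threshold. How the platform actually aggregates attempts is not specified here, so treat the function names and the aggregation as assumptions.

```python
def row_passes(scores: list[float], pass_threshold: float) -> bool:
    """True if any attempt for this row scores at or above the threshold."""
    return any(score >= pass_threshold for score in scores)


def pass_rate(rows: list[list[float]], pass_threshold: float) -> float:
    """Fraction of rows with at least one passing attempt (assumed pass@n)."""
    if not rows:
        return 0.0
    passed = sum(row_passes(scores, pass_threshold) for scores in rows)
    return passed / len(rows)


# Three rows evaluated with n = 2 attempts each (made-up scores).
attempt_scores = [[0.0, 1.0], [0.5, 0.5], [1.0, 1.0]]
print(pass_rate(attempt_scores, pass_threshold=1.0))  # 2 of 3 rows pass
```

Lowering pass_threshold to 0.5 in this example would make all three rows pass, since the comparison is "at or above".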

[env] and [secrets] (evaluation)

Eval configs accept optional [env] variables and require a [secrets] table for the evaluation run container. Use required = [] only when the evaluation needs no secret refs. See [env] and [secrets] below for the full ruleset.

Training Config

Used by osmosis train submit to submit a training run.
configs/training/my-rollout.toml
[experiment]
rollout = "my-rollout"                         # Rollout directory under rollouts/
entrypoint = "main.py"                         # Entrypoint file name
model_path = "Qwen/Qwen3.6-35B-A3B"            # Supported base model
dataset = "my-dataset"                         # Platform dataset name
# commit_sha = "abc123..."                     # Pin code to a commit

[training]
# lr = 1e-6                                    # Learning rate
# total_epochs = 1                             # Training epochs
# n_samples_per_prompt = 8                     # Rollout samples per prompt
# rollout_batch_size = 32                      # Rollout batch size
# max_prompt_length = 8192                     # Max prompt tokens
# max_response_length = 8192                   # Max response tokens
# agent_workflow_timeout_s = 450               # Agent timeout per row
# grader_timeout_s = 150                       # Grader timeout per row

[sampling]
# rollout_temperature = 1.0                    # Sampling temperature
# rollout_top_p = 1.0                          # Top-p sampling

[checkpoints]
# eval_interval = 10                           # Evaluate every N rollout steps
# checkpoint_save_freq = 20                    # Save checkpoint every N rollout steps

# [advanced]
# Backend-specific fields. Use only when instructed by Osmosis support.

# [env]
# LOG_LEVEL = "INFO"                           # Non-secret literal env var

# [secrets]
# required = ["OPENAI_API_KEY"]                # Optional in training; if set, must include `required`
Git Sync is the source of truth for your rollout code. The CLI reads config values from the local TOML file you pass, but rollout code comes from the synced workspace repository. Commit, push, and wait for sync before submitting code changes; set commit_sha when you need a specific synced revision.

[experiment]

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| rollout | str | Yes | Rollout directory name under rollouts/ |
| entrypoint | str | Yes | Python entrypoint file name, usually main.py |
| model_path | str | Yes | Supported base model path |
| dataset | str | Yes | Dataset name from osmosis dataset list |
| commit_sha | str | No | Git commit SHA to fetch from the workspace repository |

[training]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| lr | float | platform default | Learning rate |
| total_epochs | int | platform default | Number of training epochs |
| n_samples_per_prompt | int | platform default | Rollout samples generated per prompt |
| rollout_batch_size | int | platform default | Prompts processed per rollout batch |
| max_prompt_length | int | platform default | Maximum prompt tokens |
| max_response_length | int | platform default | Maximum response tokens |
| agent_workflow_timeout_s | number | platform default | Agent rollout timeout per row, in seconds |
| grader_timeout_s | number | platform default | Grader timeout per row, in seconds |

[sampling]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| rollout_temperature | number | platform default | Sampling temperature during rollouts |
| rollout_top_p | number | platform default | Top-p sampling threshold |

[checkpoints]

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| eval_interval | int | platform default | Evaluate every N rollout steps |
| checkpoint_save_freq | int | platform default | Save a LoRA checkpoint every N rollout steps |

[advanced]

Optional backend-specific fields. The CLI preserves unknown keys in this section and the platform validates them server-side.

[env] and [secrets]

Use these sections to inject environment variables into the rollout container during training runs or evaluation runs. The same shape applies to both training and evaluation configs.
| Section | Values | Use for |
| --- | --- | --- |
| [env] | Literal strings stored in the config file | Non-secret configuration |
| [secrets].required | List of platform environment_secret record names | API keys and private credentials |
[env]
LOG_LEVEL = "INFO"

[secrets]
required = ["OPENAI_API_KEY", "DATABASE_URL"]
Rules:
  • [env] keys must match ^[A-Z_][A-Z0-9_]*$; [secrets].required names must match ^[A-Z][A-Z0-9_]*$.
  • The same name cannot appear in both [env] and [secrets].required.
  • [env] keys starting with _OSMOSIS_ are reserved by the platform and cannot be used.
  • [secrets].required entries are record names only. The platform resolves each name to its encrypted value server-side and injects it as an env var of the same name. Secret values never appear in the config file, the API payload, or CLI output.
  • Eval configs must include [secrets]. Use required = [] only when the evaluation needs no secret refs.
  • Training configs may omit [secrets]. If you include the table, it must define required.
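The rules above translate fairly directly into a local check. The sketch below works on an already-parsed config dict and uses the two patterns quoted in the rules; `env_and_secrets_problems` and its messages are hypothetical, and the platform may enforce additional constraints.

```python
import re

# Patterns copied from the rules above.
ENV_KEY = re.compile(r"^[A-Z_][A-Z0-9_]*$")
SECRET_NAME = re.compile(r"^[A-Z][A-Z0-9_]*$")
RESERVED_PREFIX = "_OSMOSIS_"


def env_and_secrets_problems(config: dict, kind: str) -> list[str]:
    """Check a parsed config dict; `kind` is "eval" or "training"."""
    problems = []
    env = config.get("env", {})
    secrets = config.get("secrets")

    for key in env:
        if not ENV_KEY.fullmatch(key):
            problems.append(f"[env] key {key!r} has an invalid name")
        elif key.startswith(RESERVED_PREFIX):
            problems.append(f"[env] key {key!r} uses the reserved prefix")

    if secrets is None:
        # Only eval configs are required to include the table.
        if kind == "eval":
            problems.append("eval configs must include [secrets]")
        return problems
    if "required" not in secrets:
        problems.append("[secrets] must define required")
        return problems

    for name in secrets["required"]:
        if not SECRET_NAME.fullmatch(name):
            problems.append(f"secret name {name!r} has an invalid name")
        if name in env:
            problems.append(f"{name!r} appears in both [env] and [secrets].required")
    return problems


config = {
    "env": {"LOG_LEVEL": "INFO", "_OSMOSIS_DEBUG": "1"},
    "secrets": {"required": ["OPENAI_API_KEY", "LOG_LEVEL"]},
}
for problem in env_and_secrets_problems(config, kind="eval"):
    print(problem)
```

In this example the check reports the reserved `_OSMOSIS_DEBUG` key and the `LOG_LEVEL` name that appears in both tables. Note that a leading underscore is valid for [env] keys but not for secret names, which is the only difference between the two patterns.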
Secrets are scoped. A workspace secret is shared across the workspace; a personal secret is private to you and overrides the workspace secret of the same name at run time. Register secrets with osmosis secret set before submitting a run that references them.
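The override order can be pictured as a dictionary merge in which personal secrets are applied last. This is only an illustration of the precedence described above: the real resolution happens server-side on encrypted values, and `resolve_secrets` and the placeholder strings are hypothetical.

```python
def resolve_secrets(required: list[str], workspace: dict, personal: dict) -> dict:
    """Pick a value for each required name; personal entries win over workspace."""
    # Later mappings override earlier ones, so personal secrets take precedence.
    available = {**workspace, **personal}
    missing = [name for name in required if name not in available]
    if missing:
        raise KeyError(f"unregistered secrets: {missing}")
    return {name: available[name] for name in required}


# Placeholder strings stand in for encrypted values.
workspace_secrets = {"OPENAI_API_KEY": "workspace-key", "DATABASE_URL": "workspace-db"}
personal_secrets = {"OPENAI_API_KEY": "personal-key"}

resolved = resolve_secrets(
    ["OPENAI_API_KEY", "DATABASE_URL"], workspace_secrets, personal_secrets
)
print(resolved["OPENAI_API_KEY"])  # personal-key
print(resolved["DATABASE_URL"])    # workspace-db
```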
Start with only [experiment] (plus [secrets] for eval configs) and let the platform apply its defaults. Add optional fields only when you need to tune a run.