Datasets

Datasets provide the training prompts and ground truth that drive RL training. Each row in a dataset becomes a training example that the model learns from.

Dataset Format

Osmosis accepts datasets in JSONL, CSV, or Parquet format, up to 5 GB per file.
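As a rough illustration of the format and size limits above, a file can be checked before any upload is attempted. This is only a sketch: the accepted suffixes and the reading of "5 GB" as 5 * 10**9 bytes are assumptions, and `precheck_file` is a hypothetical helper, not part of the Osmosis CLI.

```python
from pathlib import Path

# Assumption: these suffixes correspond to the three accepted formats.
# The source names the formats, not the exact extensions.
ACCEPTED_SUFFIXES = {".jsonl", ".csv", ".parquet"}

# Assumption: "5 GB" is read as 5 * 10**9 bytes. The source does not say
# whether the limit is decimal (GB) or binary (GiB).
MAX_BYTES = 5 * 10**9


def precheck_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    p = Path(path)
    problems = []
    if p.suffix.lower() not in ACCEPTED_SUFFIXES:
        problems.append(f"unsupported extension: {p.suffix!r}")
    # The size check is skipped when the file does not exist yet.
    if p.exists() and p.stat().st_size > MAX_BYTES:
        problems.append(f"file is larger than {MAX_BYTES} bytes")
    return problems
```

An empty result only means this rough check found nothing; it does not replace the platform's own validation.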

Required Columns

Column          Description
system_prompt   The system prompt provided to the model for this example.
user_prompt     The user prompt or question the model must respond to.
ground_truth    The expected correct answer or reference output. The platform UI also accepts label as an alias for this column.
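The required columns can be checked per row with a few lines of Python. The sketch below uses a hypothetical helper, `missing_columns`. Because the source only says the platform UI accepts label as an alias, the alias is off by default here, and whether other upload paths accept it is not assumed.

```python
REQUIRED_COLUMNS = ("system_prompt", "user_prompt", "ground_truth")


def missing_columns(row: dict, allow_label_alias: bool = False) -> list[str]:
    """Return the required columns that are absent from a single row."""
    present = set(row)
    # Only treat `label` as `ground_truth` when the caller opts in.
    if allow_label_alias and "label" in present:
        present.add("ground_truth")
    return [col for col in REQUIRED_COLUMNS if col not in present]
```

For example, a row with system_prompt, user_prompt, and label is reported as missing ground_truth unless `allow_label_alias=True` is passed.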

Optional Columns

Column     Description
metadata   Arbitrary JSON metadata attached to each example. Accessible in your Grader via extra_info.

Example JSONL

{"system_prompt": "You are a helpful math tutor.", "user_prompt": "What is 15 * 23?", "ground_truth": "345"}
{"system_prompt": "You are a helpful math tutor.", "user_prompt": "Simplify 3/9.", "ground_truth": "1/3"}
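Rows like the two above can be generated with Python's standard json module. The sketch below writes one JSON object per line. The helper name `write_jsonl`, the output path, and the keys inside metadata are illustrative, and storing metadata as a nested JSON object is an assumption based on the column description.

```python
import json
from pathlib import Path


def write_jsonl(rows: list[dict], path: str) -> None:
    """Write one JSON object per line, UTF-8 encoded."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    with out.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


rows = [
    {
        "system_prompt": "You are a helpful math tutor.",
        "user_prompt": "What is 15 * 23?",
        "ground_truth": "345",
    },
    {
        "system_prompt": "You are a helpful math tutor.",
        "user_prompt": "Simplify 3/9.",
        "ground_truth": "1/3",
        # Optional column; the key and value here are made up for illustration.
        "metadata": {"topic": "fractions"},
    },
]

write_jsonl(rows, "data/train.jsonl")
```

Using json.dumps for each row, rather than formatting strings by hand, keeps quotes and newlines inside prompts correctly escaped.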

Uploading a Dataset

osmosis dataset upload data/train.jsonl

After upload, the dataset enters a processing pipeline and reports one of the following statuses:

Status       Description
pending      Upload received, waiting to be processed.
processing   Dataset is being validated and indexed.
uploaded     Dataset is ready for use in training runs.
error        Processing failed. Check column names and file format.
cancelled    Upload was cancelled before processing completed.
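If a script needs to react to these statuses, they can be modeled as a small enum. This is a sketch only: how the status string is obtained is left out because the source does not show it, and treating uploaded, error, and cancelled as final states is an assumption inferred from the descriptions.

```python
from enum import Enum


class DatasetStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    UPLOADED = "uploaded"
    ERROR = "error"
    CANCELLED = "cancelled"


# Assumption: these three states are final, inferred from their descriptions.
TERMINAL = {DatasetStatus.UPLOADED, DatasetStatus.ERROR, DatasetStatus.CANCELLED}


def is_ready(status: str) -> bool:
    """True only when the dataset can be used in a training run."""
    return DatasetStatus(status) is DatasetStatus.UPLOADED


def should_keep_waiting(status: str) -> bool:
    """True while the dataset is still moving through the pipeline."""
    return DatasetStatus(status) not in TERMINAL
```

An unrecognized status string raises ValueError, which surfaces any status value not covered here instead of silently ignoring it.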

Validating Locally

Before uploading, validate your dataset locally to catch format issues early:

osmosis dataset validate data/train.jsonl

This checks column names, data types, and file format without uploading to the platform.
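The validate-before-upload habit can be scripted so that an upload is only attempted when validation succeeds. The sketch below shells out to the two commands shown in this page. It assumes that osmosis dataset validate exits with a non-zero status on failure, which the source does not state, and `validate_then_upload` is a hypothetical wrapper.

```python
import subprocess
from typing import Callable, Sequence


def _run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def validate_then_upload(
    path: str,
    run: Callable[[Sequence[str]], int] = _run,
) -> bool:
    """Upload only if local validation exits with status 0."""
    # Assumption: the validate command signals failure via a non-zero exit code.
    if run(["osmosis", "dataset", "validate", path]) != 0:
        return False
    return run(["osmosis", "dataset", "upload", path]) == 0
```

The `run` parameter exists so the flow can be exercised in tests without the CLI installed; in normal use it is left at its default.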

Previewing a Dataset

Preview the first few rows of an uploaded dataset:

osmosis dataset preview my-dataset --rows 5

Managing Datasets

# List all datasets in the current workspace
osmosis dataset list

# Delete a dataset
osmosis dataset delete my-dataset

Deleting a dataset does not affect training runs that already used it, but a deleted dataset cannot be recovered.

Models

Supported Base Models

Osmosis uses models imported from Hugging Face as the starting point for training. We currently support:

Model                    Description
Qwen/Qwen3.5-35B-A3B     Qwen 3.5 35B with 3B active parameters (MoE)
Qwen/Qwen3.5-122B-A10B   Qwen 3.5 122B with 10B active parameters (MoE)

The list of supported models is expanding. Check the platform dashboard or run osmosis model list for the latest available models.

Model Management

# List available models in your workspace
osmosis model list

# Deploy a trained model
osmosis model deploy my-model

# Export a model to Hugging Face
osmosis model export my-model

Private Models

To use private models from Hugging Face, configure your Hugging Face access token on the Secrets page in your workspace settings. This allows the platform to pull gated or private models during training. See Monitoring & Settings for details on managing secrets.

Next Steps

Training Runs

Submit and manage training runs using your datasets and models.

Monitoring & Settings

Track training progress and configure workspace settings.