Datasets & Models

Datasets

Datasets provide the training prompts and ground truth that drive RL training. Each row in a dataset becomes a training example that the model learns from.

Dataset Format

Osmosis accepts datasets in JSONL, CSV, or Parquet format, up to 5 GB per file.
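A quick pre-flight check of the format and size limits above might look like the following. This is a minimal sketch, not the platform's own logic: the `check_file` helper, the file extensions, and the reading of "5 GB" as 5 * 1024**3 bytes are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List, Optional

# Assumption: "5 GB" is interpreted as 5 * 1024**3 bytes; the platform may count differently.
MAX_BYTES = 5 * 1024**3
# Assumption: these are the extensions conventionally used for the three accepted formats.
ALLOWED_SUFFIXES = {".jsonl", ".csv", ".parquet"}


def check_file(path: str, size_bytes: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means both checks passed.

    If size_bytes is None, the size is read from the file on disk.
    """
    p = Path(path)
    problems = []
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")
    size = p.stat().st_size if size_bytes is None else size_bytes
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

Calling `check_file("data/train.jsonl")` reads the size from disk; passing `size_bytes` explicitly is mainly useful for testing the helper without a real file.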

Required Columns

Column	Description
`system_prompt`	The system prompt provided to the model for this example.
`user_prompt`	The user prompt or question the model must respond to.
`ground_truth`	The expected correct answer or reference output. The platform UI also accepts `label` as an alias for this column.

Optional Columns

Column	Description
`metadata`	Arbitrary JSON metadata attached to each example. Accessible in your Grader via `extra_info`.

Example JSONL

{"system_prompt": "You are a helpful math tutor.", "user_prompt": "What is 15 * 23?", "ground_truth": "345"}
{"system_prompt": "You are a helpful math tutor.", "user_prompt": "Simplify 3/9.", "ground_truth": "1/3"}
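Rows like the two above can be generated with Python's standard `json` module. The sketch below is one possible approach; the `to_jsonl` helper name is hypothetical and not part of any Osmosis tooling.

```python
import json
from typing import Dict, Iterable

SYSTEM = "You are a helpful math tutor."

# Each dict becomes one line of the JSONL file, using the three required columns.
rows = [
    {"system_prompt": SYSTEM, "user_prompt": "What is 15 * 23?", "ground_truth": "345"},
    {"system_prompt": SYSTEM, "user_prompt": "Simplify 3/9.", "ground_truth": "1/3"},
]


def to_jsonl(examples: Iterable[Dict[str, str]]) -> str:
    """Serialize each example as one JSON object per line (hypothetical helper)."""
    # ensure_ascii=False keeps non-ASCII prompt text readable in the output.
    return "".join(json.dumps(ex, ensure_ascii=False) + "\n" for ex in examples)
```

To produce a file, write the result out with UTF-8 encoding, for example `Path("data/train.jsonl").write_text(to_jsonl(rows), encoding="utf-8")`, assuming the `data/` directory already exists.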

Uploading a Dataset

osmosis dataset upload data/train.jsonl

After upload, the dataset enters a processing pipeline and moves through the following statuses:

Status	Description
pending	Upload received, waiting to be processed.
processing	Dataset is being validated and indexed.
uploaded	Dataset is ready for use in training runs.
error	Processing failed. Check column names and file format.
cancelled	Upload was cancelled before processing completed.
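If you handle these status strings in your own scripts, it can help to separate in-progress states from final ones. The sketch below is based only on the descriptions in the table; the `DatasetStatus` enum and the helper functions are hypothetical names, and how you obtain the status string is not covered here.

```python
from enum import Enum


class DatasetStatus(str, Enum):
    """Status values as listed in the table above (enum itself is hypothetical)."""
    PENDING = "pending"
    PROCESSING = "processing"
    UPLOADED = "uploaded"
    ERROR = "error"
    CANCELLED = "cancelled"


# Assumption drawn from the descriptions: these states do not change further.
TERMINAL = {DatasetStatus.UPLOADED, DatasetStatus.ERROR, DatasetStatus.CANCELLED}


def is_terminal(status: str) -> bool:
    """True once processing has finished, whether it succeeded or not."""
    return DatasetStatus(status) in TERMINAL


def is_ready(status: str) -> bool:
    """True only when the dataset can be used in training runs."""
    return DatasetStatus(status) is DatasetStatus.UPLOADED
```

An unknown status string raises `ValueError`, which surfaces unexpected values rather than silently treating them as in progress.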

Validating Locally

Before uploading, validate your dataset locally to catch format issues early:

osmosis dataset validate data/train.jsonl

This checks column names, data types, and file format without uploading to the platform.
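For a sense of what a column check involves, here is a minimal Python sketch for JSONL input. It is not the implementation behind `osmosis dataset validate` and covers far less; the `find_problems` helper, its `accept_label_alias` flag, and the choice to skip blank lines are assumptions made for illustration.

```python
import json
from typing import Iterable, List, Tuple

REQUIRED = ("system_prompt", "user_prompt", "ground_truth")


def find_problems(lines: Iterable[str], accept_label_alias: bool = False) -> List[Tuple[int, str]]:
    """Return (line_number, message) for every row that fails the check."""
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored rather than reported
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((n, "row is not a JSON object"))
            continue
        # Optionally treat `label` as a stand-in for `ground_truth`.
        if accept_label_alias and "ground_truth" not in row and "label" in row:
            row = {**row, "ground_truth": row["label"]}
        missing = [col for col in REQUIRED if col not in row]
        if missing:
            problems.append((n, "missing columns: " + ", ".join(missing)))
    return problems
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `find_problems(open("data/train.jsonl", encoding="utf-8"))`.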

Previewing a Dataset

Preview the first few rows of an uploaded dataset:

osmosis dataset preview my-dataset --rows 5

Managing Datasets

# List all datasets in the current workspace
osmosis dataset list

# Delete a dataset
osmosis dataset delete my-dataset

Deleting a dataset does not affect training runs that already used it, but a deleted dataset cannot be recovered.

Models

Supported Base Models

Osmosis uses models imported from Hugging Face as the starting point for training. We currently support:

Model	Description
`Qwen/Qwen3.5-35B-A3B`	Qwen 3.5 35B with 3B active parameters (MoE)
`Qwen/Qwen3.5-122B-A10B`	Qwen 3.5 122B with 10B active parameters (MoE)

The list of supported models is expanding. Check the platform dashboard or run `osmosis model list` for the latest available models.

Model Management

# List available models in your workspace
osmosis model list

# Deploy a trained model
osmosis model deploy my-model

# Export a model to Hugging Face
osmosis model export my-model

Private Models

To use private models from Hugging Face, configure your Hugging Face access token on the Secrets page in your workspace settings. This allows the platform to pull gated or private models during training. See Monitoring & Settings for details on managing secrets.

Next Steps

Training Runs

Monitoring & Settings
