Datasets provide the prompts and optional reference answers that drive evaluation runs and training runs. Each row becomes an example that your rollout and Grader process.

Dataset Format

Osmosis accepts datasets in JSONL, CSV, or Parquet format, up to 5 GB per file. Each dataset must contain at least 4 rows.
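The limits above can be checked before you upload. The following is a minimal sketch, not part of the Osmosis tooling: `check_file_limits` is a hypothetical helper, the mapping from formats to the `.jsonl`, `.csv`, and `.parquet` extensions is an assumption, and 5 GB is read as 5 × 10^9 bytes. Rows are only counted for JSONL files, where each non-empty line is treated as one row.

```python
from pathlib import Path

# Assumptions (not confirmed by the Osmosis docs): formats are recognized
# by file extension, and "5 GB" means 5 * 10**9 bytes.
ALLOWED_SUFFIXES = {".jsonl", ".csv", ".parquet"}
MAX_BYTES = 5 * 10**9
MIN_ROWS = 4


def check_file_limits(path: str) -> list[str]:
    """Return a list of problems; an empty list means the basic limits are met."""
    p = Path(path)
    suffix = p.suffix.lower()
    problems = []

    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension: {p.suffix!r}")

    if p.stat().st_size > MAX_BYTES:
        problems.append("file is larger than 5 GB")

    # Row counting is only sketched for JSONL: one non-empty line per row.
    if suffix == ".jsonl":
        with p.open(encoding="utf-8") as f:
            rows = sum(1 for line in f if line.strip())
        if rows < MIN_ROWS:
            problems.append(f"only {rows} rows; at least {MIN_ROWS} required")

    return problems
```

Calling `check_file_limits("data/train.jsonl")` returns an empty list when the file passes these three checks. It does not replace the platform's own validation.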

Required Columns

| Column | Description |
| --- | --- |
| system_prompt | The system prompt provided to the model for this example. |
| user_prompt | The user prompt or question the model must respond to. |

Optional Columns

| Column | Description |
| --- | --- |
| ground_truth | The expected correct answer or reference output. The platform UI also accepts label as an alias for this column. When present, the value is passed to your Grader as context.label. |
| metadata | Arbitrary JSON metadata attached to each example. |

Include ground_truth (or label) when your Grader needs a reference answer to score against. Datasets that drive reward functions based purely on model behavior can omit it.
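The column rules above can be applied to each row before you write it to a file. This is a minimal sketch under stated assumptions: `normalize_row` is a hypothetical helper, not an Osmosis API. The docs only say the platform UI accepts label as an alias, so the sketch renames label to ground_truth so the file does not depend on alias handling elsewhere.

```python
REQUIRED_COLUMNS = ("system_prompt", "user_prompt")


def normalize_row(row: dict) -> dict:
    """Check required columns and rename a `label` key to `ground_truth`."""
    missing = [col for col in REQUIRED_COLUMNS if col not in row]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")

    out = dict(row)  # copy so the caller's row is left untouched
    # Assumption: renaming is the safe choice, because the docs describe
    # `label` as an alias accepted by the platform UI only.
    if "ground_truth" not in out and "label" in out:
        out["ground_truth"] = out.pop("label")
    return out
```

Rows without ground_truth or metadata pass unchanged, which matches the case where the reward depends only on model behavior.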

Example JSONL

{"system_prompt": "You are a helpful math tutor.", "user_prompt": "What is 15 * 23?", "ground_truth": "345"}
{"system_prompt": "You are a helpful math tutor.", "user_prompt": "Simplify 3/9.", "ground_truth": "1/3"}
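If you build the dataset programmatically, Python's standard json module can produce a file in the shape shown above. This is a minimal sketch: `write_jsonl` is a hypothetical helper name, and the two rows are the ones from the example.

```python
import json


def write_jsonl(rows, path):
    """Write one JSON object per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


# The two rows from the example above. A real dataset needs at least 4 rows.
rows = [
    {"system_prompt": "You are a helpful math tutor.",
     "user_prompt": "What is 15 * 23?", "ground_truth": "345"},
    {"system_prompt": "You are a helpful math tutor.",
     "user_prompt": "Simplify 3/9.", "ground_truth": "1/3"},
]
```

Calling `write_jsonl(rows, "data/train.jsonl")` writes the file. Using json.dumps instead of string formatting ensures quotes and newlines inside prompts are escaped correctly.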

Upload a Dataset

osmosis dataset upload data/train.jsonl
The uploaded dataset is named from the file stem (train in this example). After upload, the dataset enters a processing pipeline. You can check its status:
osmosis dataset info <dataset-name>

| Status | Description |
| --- | --- |
| uploading | File upload has started and is not complete yet. |
| pending | Upload received, waiting to be processed. |
| processing | Dataset is being validated and indexed. |
| uploaded | Dataset is ready for use in evaluation runs and training runs. |
| error | Processing failed. Check column names and file format. |
| cancelled | Upload was cancelled before processing completed. |
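A script that waits for a dataset can group these statuses into "keep waiting", "ready", and "failed". The sketch below covers only that classification. `classify_status` is a hypothetical helper, and how you obtain the status string (for example, from the output of osmosis dataset info) is not specified here and is left to you.

```python
# Status names are taken from the table above; the grouping is an
# interpretation of their descriptions, not an official Osmosis API.
IN_PROGRESS = {"uploading", "pending", "processing"}
READY = {"uploaded"}
FAILED = {"error", "cancelled"}


def classify_status(status: str) -> str:
    """Map a dataset status to 'wait', 'ready', or 'failed'."""
    s = status.strip().lower()
    if s in READY:
        return "ready"
    if s in IN_PROGRESS:
        return "wait"
    if s in FAILED:
        return "failed"
    raise ValueError(f"unknown dataset status: {status!r}")
```

Raising on unknown values keeps a script from silently waiting forever if a status appears that this grouping does not cover.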

Validate Locally

Before uploading, validate your dataset locally to catch format issues early:
osmosis dataset validate data/train.jsonl
This checks required columns, file format, and basic JSONL/CSV/Parquet structure without uploading to the platform.

Preview a Dataset

Preview the first few rows of an uploaded dataset:
osmosis dataset preview my-dataset --rows 5

Manage Datasets

# List all datasets in the current workspace
osmosis dataset list

# Download a dataset file
osmosis dataset download my-dataset

Next Steps

Training Runs

Use validated datasets in training configs.

Models

Choose base models and deploy trained LoRA models.