Dataset Format
Osmosis accepts datasets in JSONL, CSV, or Parquet format, up to 5 GB per file. Each dataset must contain at least 4 rows.
Required Columns
| Column | Description |
|---|---|
| system_prompt | The system prompt provided to the model for this example. |
| user_prompt | The user prompt or question the model must respond to. |
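The format rules and required columns above can be checked with a short script before any upload. The following is a minimal sketch for JSONL files only, not the platform's own validator: the function name `check_jsonl` is hypothetical, and the size check assumes that "5 GB" means 5 GiB.

```python
import json
import os

REQUIRED_COLUMNS = ("system_prompt", "user_prompt")
MIN_ROWS = 4
# Assumption: "5 GB" is read as 5 GiB; adjust if the platform counts decimal GB.
MAX_BYTES = 5 * 1024**3


def check_jsonl(path):
    """Return a list of problems found in the file; an empty list means none were found."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 5 GB")

    rows = 0
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append(f"line {line_number}: invalid JSON ({exc.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {line_number}: expected a JSON object")
                continue
            rows += 1
            for column in REQUIRED_COLUMNS:
                if column not in record:
                    problems.append(f"line {line_number}: missing {column}")

    if rows < MIN_ROWS:
        problems.append(f"only {rows} valid rows; at least {MIN_ROWS} are required")
    return problems
```

Calling `check_jsonl("train.jsonl")` on a well-formed file returns an empty list; any other result lists the lines to fix.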
Optional Columns
| Column | Description |
|---|---|
| ground_truth | The expected correct answer or reference output. The platform UI also accepts label as an alias for this column. When present, the value is passed to your Grader as context.label. |
| metadata | Arbitrary JSON metadata attached to each example. |
Include ground_truth (or label) when your Grader needs a reference answer to score against. Datasets that drive reward functions based purely on model behavior can omit it.
Example JSONL
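As an illustration, the following Python sketch writes a small JSONL file using the required and optional columns described above. The row contents, the `write_jsonl` helper, and the `train.jsonl` filename are made up for this example, and storing metadata as a nested JSON object (rather than a string) is an assumption.

```python
import json

# Illustrative rows only; the prompts and answers are invented for this sketch.
EXAMPLE_ROWS = [
    {
        "system_prompt": "You are a careful math tutor.",
        "user_prompt": "What is 12 * 7?",
        "ground_truth": "84",
        "metadata": {"topic": "arithmetic"},
    },
    {
        "system_prompt": "You are a careful math tutor.",
        "user_prompt": "What is 9 + 16?",
        "ground_truth": "25",
        "metadata": {"topic": "arithmetic"},
    },
    {
        "system_prompt": "You are a careful math tutor.",
        "user_prompt": "What is 100 / 4?",
        "ground_truth": "25",
        "metadata": {"topic": "arithmetic"},
    },
    {
        "system_prompt": "You are a careful math tutor.",
        "user_prompt": "What is 15 - 8?",
        "ground_truth": "7",
        "metadata": {"topic": "arithmetic"},
    },
]


def write_jsonl(rows, path):
    """Write one JSON object per line, which is the JSONL layout."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    write_jsonl(EXAMPLE_ROWS, "train.jsonl")
```

Four rows are used because that is the minimum dataset size stated earlier.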
Upload a Dataset
train in this example). After upload, the dataset enters a processing pipeline. You can check its status:
| Status | Description |
|---|---|
| uploading | File upload has started and is not complete yet. |
| pending | Upload received, waiting to be processed. |
| processing | Dataset is being validated and indexed. |
| uploaded | Dataset is ready for use in evaluation runs and training runs. |
| error | Processing failed. Check column names and file format. |
| cancelled | Upload was cancelled before processing completed. |
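A script that waits for processing to finish might treat the statuses above as a small state machine. The sketch below is one way to do that under stated assumptions: `get_status` is a hypothetical callable you supply (for example, a wrapper around whatever status check you use), and treating uploaded, error, and cancelled as terminal is an inference from the table descriptions rather than documented behavior.

```python
import time

# Status values from the table above. Treating these three as terminal is an
# inference from their descriptions, not something stated by the platform.
TERMINAL_STATUSES = {"uploaded", "error", "cancelled"}
KNOWN_STATUSES = TERMINAL_STATUSES | {"uploading", "pending", "processing"}


def wait_until_done(get_status, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll get_status() until it reports a terminal status or the timeout elapses.

    get_status is a hypothetical zero-argument callable that returns the
    current status string of the dataset.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected status: {status!r}")
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"dataset still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

The function returns the terminal status instead of raising on error, so the caller can decide whether to re-check column names and file format or retry the upload.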
Validate Locally
Before uploading, validate your dataset locally to catch format issues early:
Preview a Dataset
Preview the first few rows of an uploaded dataset:
Manage Datasets
Next Steps
Training Runs
Use validated datasets in training configs.
Models
Choose base models and deploy trained LoRA models.