CLI Quick Start
Learn how to batch-evaluate datasets with the osmosis-ai command-line tool.
What You Can Do
The osmosis-ai CLI enables you to:
- Test rubric stability - Run the same rubric against the same data multiple times to verify scoring consistency (see Multiple evaluation runs below)
- Compare rubrics - Evaluate different rubrics side by side to choose the best one for your use case (see Compare rubrics below)
Ready-to-use examples are available in the SDK’s examples/ folder. Let’s get started!
Installation
The CLI ships with the osmosis-ai SDK, so installing the package puts the osmosis command on your PATH.
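A minimal sketch, assuming the package is published on PyPI under the same osmosis-ai name used in this guide and that the CLI exposes a standard --help flag:

pip install osmosis-ai
osmosis --help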
5-Minute Workflow
Step 1: Create a Rubric Configuration
Create rubric_configs.yaml:
version: 1
default_score_min: 0.0
default_score_max: 1.0
rubrics:
  - id: helpfulness
    title: Response Helpfulness
    rubric: |
      Evaluate how helpful and actionable the response is.
      Consider accuracy, completeness, and practicality.
    model_info:
      provider: openai
      model: gpt-5
      api_key_env: OPENAI_API_KEY
Step 2: Prepare Your Dataset
Create sample_data.jsonl:
{"solution_str": "Click 'Forgot Password' on the login page.", "rubric_id": "helpfulness"}
{"solution_str": "Please contact support for assistance.", "rubric_id": "helpfulness"}
Step 3: Set API Key
export OPENAI_API_KEY="your-key-here"
Step 4: Preview Configuration
Validate your setup:
osmosis preview --path rubric_configs.yaml
osmosis preview --path sample_data.jsonl
Step 5: Run Evaluation
osmosis eval --rubric helpfulness --data sample_data.jsonl
Understanding Output
Console:
Evaluating: 100%|████████| 2/2 [00:03<00:00, 1.5s/record]
Results Summary:
  Average Score: 0.85
  Min Score: 0.70
  Max Score: 1.00
JSON File:
Results are saved to ~/.cache/osmosis/eval_result/helpfulness/
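The schema of the saved file isn't shown here, so inspect it before scripting against it. As an illustration only, if each run produces a JSON file containing per-record entries with a numeric score field (both the layout and the field names are assumptions), you could summarize it like this:

import json
from pathlib import Path
from statistics import mean

# Hypothetical layout: pick the newest JSON file in the cache directory.
result_dir = Path.home() / ".cache/osmosis/eval_result/helpfulness"
latest = sorted(result_dir.glob("*.json"))[-1]
results = json.loads(latest.read_text())

# "records" and "score" are assumed names; adjust to the actual schema.
scores = [r["score"] for r in results["records"]]
print(f"{len(scores)} records, average score {mean(scores):.2f}")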
Advanced Usage
Multiple evaluation runs
Test rubric stability by running the same evaluation multiple times:
osmosis eval --rubric helpfulness --data sample_data.jsonl --number 3
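If each run writes its own results file (an assumption, like the "records" and "score" field names above), you can quantify stability as the spread of per-run averages. A sketch under those assumptions:

import json
from pathlib import Path
from statistics import mean, pstdev

result_dir = Path.home() / ".cache/osmosis/eval_result/helpfulness"
per_run_averages = []
for path in sorted(result_dir.glob("*.json")):
    data = json.loads(path.read_text())
    per_run_averages.append(mean(r["score"] for r in data["records"]))

# A small standard deviation across runs indicates consistent scoring.
print(f"run averages: {per_run_averages}, spread: {pstdev(per_run_averages):.3f}")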
Compare rubrics
Compare two different rubrics to find the best one (a sample helpfulness_v2 definition follows these commands):
# First, run evaluation with the baseline rubric
osmosis eval --rubric helpfulness --data sample_data.jsonl --output baseline_results.json
# Then compare with a new rubric
osmosis eval --rubric helpfulness_v2 --data sample_data.jsonl --baseline baseline_results.json
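This comparison assumes rubric_configs.yaml defines a second rubric with the id helpfulness_v2, appended under rubrics:. For example (the wording of the variant rubric is illustrative):

  - id: helpfulness_v2
    title: Response Helpfulness (v2)
    rubric: |
      Rate how directly the response resolves the user's problem,
      penalizing generic deflections such as "contact support".
    model_info:
      provider: openai
      model: gpt-5
      api_key_env: OPENAI_API_KEY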
Custom output location:
osmosis eval --rubric helpfulness --data sample_data.jsonl --output ./results.json
Tips
- The CLI auto-discovers rubric_configs.yaml in your current directory, your data directory, or ./examples/.
- Never commit API keys; always use environment variables.
Next Steps