CLI Quick Start

Learn to batch evaluate datasets using the osmosis-ai command-line tool.

What You Can Do

The osmosis-ai CLI enables you to:
  • Test rubric stability - Run the same rubric against the same data multiple times to verify scoring consistency (see Multiple evaluation runs under Advanced Usage)
  • Compare rubrics - Evaluate different rubrics side-by-side to choose the best one for your use case (see Compare rubrics under Advanced Usage)
Ready-to-use examples are available in the SDK’s examples/ folder. Let’s get started!

Installation

pip install osmosis-ai
Access the CLI with:
osmosis --help
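If the osmosis command is not found after installation, make sure the Python scripts directory for your environment is on your PATH. You can confirm the package installed correctly with a standard pip check:
pip show osmosis-ai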

5-Minute Workflow

Step 1: Create a Rubric Configuration

Create rubric_configs.yaml:
version: 1
default_score_min: 0.0
default_score_max: 1.0

rubrics:
  - id: helpfulness
    title: Response Helpfulness
    rubric: |
      Evaluate how helpful and actionable the response is.
      Consider accuracy, completeness, and practicality.
    model_info:
      provider: openai
      model: gpt-5
      api_key_env: OPENAI_API_KEY

Step 2: Prepare Your Dataset

Create sample_data.jsonl:
{"solution_str": "Click 'Forgot Password' on the login page.", "rubric_id": "helpfulness"}
{"solution_str": "Please contact support for assistance.", "rubric_id": "helpfulness"}

Step 3: Set API Key

export OPENAI_API_KEY="your-key-here"
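On Windows PowerShell, the equivalent is:
$env:OPENAI_API_KEY = "your-key-here"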

Step 4: Preview Configuration

Validate your setup:
osmosis preview --path rubric_configs.yaml
osmosis preview --path sample_data.jsonl

Step 5: Run Evaluation

osmosis eval --rubric helpfulness --data sample_data.jsonl

Understanding Output

Console:
Evaluating: 100%|████████| 2/2 [00:03<00:00, 1.5s/record]

Results Summary:
Average Score: 0.85
Min Score: 0.70
Max Score: 1.00
JSON File: Results are saved to ~/.cache/osmosis/eval_result/helpfulness/
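To inspect a saved results file (file names may vary by run), list the directory and pretty-print the JSON with any tool you like, for example:
ls ~/.cache/osmosis/eval_result/helpfulness/
python -m json.tool ~/.cache/osmosis/eval_result/helpfulness/<result-file>.json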

Advanced Usage

Multiple evaluation runs

Test rubric stability by running multiple evaluations:
osmosis eval --rubric helpfulness --data sample_data.jsonl --number 3

Compare rubrics

Compare two different rubrics to find the best one:
# First, run evaluation with the baseline rubric
osmosis eval --rubric helpfulness --data sample_data.jsonl --output baseline_results.json

# Then compare with a new rubric
osmosis eval --rubric helpfulness_v2 --data sample_data.jsonl --baseline baseline_results.json
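The helpfulness_v2 rubric must be defined in rubric_configs.yaml alongside the original. A hypothetical second entry, following the same schema as Step 1, might look like:
  - id: helpfulness_v2
    title: Response Helpfulness (v2)
    rubric: |
      Evaluate how helpful and actionable the response is.
      Reward concrete, step-by-step guidance; penalize deflection.
    model_info:
      provider: openai
      model: gpt-5
      api_key_env: OPENAI_API_KEY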
Custom output location

Write results to a path of your choice:
osmosis eval --rubric helpfulness --data sample_data.jsonl --output ./results.json

Tips

The CLI auto-discovers rubric_configs.yaml in your current directory, the directory containing your data file, or ./examples/.
Never commit API keys. Always use environment variables.
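One common pattern (a convenience, not something the CLI requires) is to keep keys in a local .env file that is excluded from version control, then load it into your shell before running evaluations:
echo 'OPENAI_API_KEY="your-key-here"' > .env
echo '.env' >> .gitignore
set -a; source .env; set +a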

Next Steps