CLI Quick Start

Learn to batch evaluate datasets using the osmosis-ai command-line tool.

What You Can Do

The osmosis-ai CLI enables you to:
  • Test rubric stability - Run the same rubric against the same data multiple times to verify scoring consistency (see Multiple evaluation runs under Advanced Usage)
  • Compare rubrics - Evaluate different rubrics side-by-side to choose the best one for your use case (see Compare rubrics under Advanced Usage)
Ready-to-use examples are available in the SDK’s examples/ folder. Let’s get started!

Installation

pip install osmosis-ai
Access the CLI with:
osmosis --help
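If the osmosis command is not found after installation, make sure the Python scripts directory for your environment is on your PATH. You can confirm the package installed correctly with a standard pip check:
pip show osmosis-ai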

5-Minute Workflow

Step 1: Create a Rubric Configuration

Create rubric_configs.yaml:
version: 1
default_score_min: 0.0
default_score_max: 1.0

rubrics:
  - id: helpfulness
    title: Response Helpfulness
    rubric: |
      Evaluate how helpful and actionable the response is.
      Consider accuracy, completeness, and practicality.
    model_info:
      provider: openai
      model: gpt-5
      api_key_env: OPENAI_API_KEY

Step 2: Prepare Your Dataset

Create sample_data.jsonl:
{"solution_str": "Click 'Forgot Password' on the login page.", "rubric_id": "helpfulness"}
{"solution_str": "Please contact support for assistance.", "rubric_id": "helpfulness"}

Step 3: Set API Key

export OPENAI_API_KEY="your-key-here"
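On Windows PowerShell, the equivalent is:
$env:OPENAI_API_KEY = "your-key-here"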

Step 4: Preview Configuration

Validate your setup:
osmosis preview --path rubric_configs.yaml
osmosis preview --path sample_data.jsonl

Step 5: Run Evaluation

osmosis eval --rubric helpfulness --data sample_data.jsonl

Understanding Output

Console:
Evaluating: 100%|████████| 2/2 [00:03<00:00, 1.5s/record]

Results Summary:
Average Score: 0.85
Min Score: 0.70
Max Score: 1.00
JSON File: Results are saved to ~/.cache/osmosis/eval_result/helpfulness/
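To inspect a saved results file (file names may vary by run), list the directory and pretty-print the JSON with any tool you like, for example:
ls ~/.cache/osmosis/eval_result/helpfulness/
python -m json.tool ~/.cache/osmosis/eval_result/helpfulness/<result-file>.json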

Advanced Usage

Multiple evaluation runs

Test rubric stability by running multiple evaluations:
osmosis eval --rubric helpfulness --data sample_data.jsonl --number 3

Compare rubrics

Compare two different rubrics to find the best one:
# First, run evaluation with the baseline rubric
osmosis eval --rubric helpfulness --data sample_data.jsonl --output baseline_results.json

# Then compare with a new rubric
osmosis eval --rubric helpfulness_v2 --data sample_data.jsonl --baseline baseline_results.json
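The helpfulness_v2 rubric must be defined in rubric_configs.yaml alongside the original. A hypothetical second entry, following the same schema as Step 1, might look like:
  - id: helpfulness_v2
    title: Response Helpfulness (v2)
    rubric: |
      Evaluate how helpful and actionable the response is.
      Reward concrete, step-by-step guidance; penalize deflection.
    model_info:
      provider: openai
      model: gpt-5
      api_key_env: OPENAI_API_KEY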
Custom output location

Write results to a path of your choice:
osmosis eval --rubric helpfulness --data sample_data.jsonl --output ./results.json

Tips

The CLI auto-discovers rubric_configs.yaml in your current directory, the directory containing your data file, or ./examples/.
Never commit API keys. Always use environment variables.
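One common pattern (a convenience, not something the CLI requires) is to keep keys in a local .env file that is excluded from version control, then load it into your shell before running evaluations:
echo 'OPENAI_API_KEY="your-key-here"' > .env
echo '.env' >> .gitignore
set -a; source .env; set +a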

Next Steps