CLI Reference
The osmosis-ai CLI provides two main commands: `preview` for inspecting configurations and `eval` for running evaluations.
Installation
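Assuming the package is distributed on PyPI under the name `osmosis-ai`, a typical installation looks like:

```bash
pip install osmosis-ai
```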
Global Usage
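All commands follow the same general shape (the `osmosis-ai` entry-point name is an assumption; substitute whatever your installation provides):

```bash
osmosis-ai <command> [options]
```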
Commands
preview
Inspect and validate rubric configurations or dataset files.

Usage
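A sketch of the invocation, built from the option documented below:

```bash
osmosis-ai preview --path <file>
```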
Options
| Option | Type | Required | Description |
|---|---|---|---|
| `--path` | string | Yes | Path to the file to preview (YAML or JSONL) |
Examples
Preview a rubric configuration:
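For example (`rubric_configs.yaml` here is an illustrative filename, matching the auto-discovery default described below):

```bash
osmosis-ai preview --path rubric_configs.yaml
```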
Output

The command will:

- Validate the file structure
- Display parsed contents in a readable format
- Show count summary (number of rubrics or records)
- Report any validation errors
eval
Evaluate a dataset against a rubric configuration.

Usage
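A sketch of the invocation, using the required options below:

```bash
osmosis-ai eval --rubric <rubric-id> --data <dataset.jsonl> [options]
```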
Required Options
| Option | Short | Type | Description |
|---|---|---|---|
| `--rubric` | `-r` | string | Rubric ID from your configuration file |
| `--data` | `-d` | string | Path to JSONL dataset file |
Optional Parameters
| Option | Short | Type | Default | Description |
|---|---|---|---|---|
| `--config` | `-c` | string | Auto-discovered | Path to rubric configuration YAML |
| `--number` | `-n` | integer | 1 | Number of evaluation runs per record |
| `--output` | `-o` | string | `~/.cache/osmosis/...` | Output path for results JSON |
| `--baseline` | `-b` | string | None | Path to baseline evaluation for comparison |
Examples
Basic evaluation:
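A minimal run, assuming a rubric with ID `code_quality` defined in an auto-discovered config (both names are illustrative):

```bash
osmosis-ai eval --rubric code_quality --data dataset.jsonl
```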
Configuration Files

Rubric Configuration (YAML)
The rubric configuration file defines evaluation criteria and model settings.

Structure
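A sketch of the layout, assembled from the required and optional fields documented below (the rubric text and scores are illustrative):

```yaml
version: 1
rubrics:
  - id: code_quality
    title: Code Quality Review
    rubric: |
      Score the response on correctness, readability, and
      adherence to the stated requirements.
    model_info:
      provider: openai
      model: gpt-5
    score_min: 0.0
    score_max: 10.0
```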
Required Fields
- `version`: Configuration schema version (currently `1`)
- `rubrics`: List of rubric definitions
Rubric Definition Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Unique identifier for the rubric |
| `title` | string | Yes | Human-readable title |
| `rubric` | string | Yes | Evaluation criteria in natural language |
| `model_info` | object | Yes | LLM provider configuration |
| `score_min` | float | No | Minimum score (overrides default) |
| `score_max` | float | No | Maximum score (overrides default) |
Model Info Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | Provider name (see Supported Providers) |
| `model` | string | Yes | Model identifier |
| `api_key_env` | string | No | Environment variable name for API key |
| `timeout` | integer | No | Request timeout in seconds (default: 30) |
Auto-Discovery
If you don’t specify `--config`, the CLI searches for `rubric_configs.yaml` in:
- Same directory as the data file
- Current working directory
- The `./examples/` subdirectory
Dataset Format (JSONL)
Each line in the JSONL file represents one evaluation record.

Minimal Example
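Only `solution_str` is required (per the field reference below), so a minimal record could be:

```json
{"solution_str": "The capital of France is Paris."}
```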
Complete Example
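A record exercising the optional fields from the reference below (all values are illustrative; shown pretty-printed here, but in the file each record must occupy a single line):

```json
{
  "solution_str": "The capital of France is Paris.",
  "conversation_id": "conv-001",
  "rubric_id": "factual_accuracy",
  "original_input": "What is the capital of France?",
  "ground_truth": "Paris",
  "metadata": {"source": "geography_quiz"},
  "score_min": 0.0,
  "score_max": 1.0
}
```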
Field Reference
| Field | Type | Required | Description |
|---|---|---|---|
| `solution_str` | string | Yes | The text to be evaluated (must be non-empty) |
| `conversation_id` | string | No | Unique identifier for this record |
| `rubric_id` | string | No | Links to a specific rubric in config |
| `original_input` | string | No | Original user query/prompt for context |
| `ground_truth` | string | No | Reference answer for comparison |
| `metadata` | object | No | Additional context passed to evaluator |
| `extra_info` | object | No | Runtime configuration options |
| `score_min` | float | No | Override minimum score for this record |
| `score_max` | float | No | Override maximum score for this record |
Output Format
Console Output
During evaluation, you’ll see progress output in the console.

JSON Output File
The output JSON file contains detailed results for each record.

Supported Providers
| Provider | Value | API Key Env | Example Models |
|---|---|---|---|
| OpenAI | openai | OPENAI_API_KEY | gpt-5 |
| Anthropic | anthropic | ANTHROPIC_API_KEY | claude-sonnet-4-5 |
| Google Gemini | gemini | GOOGLE_API_KEY | gemini-2.5-flash |
| xAI | xai | XAI_API_KEY | grok-4 |
| OpenRouter | openrouter | OPENROUTER_API_KEY | 100+ models |
| Cerebras | cerebras | CEREBRAS_API_KEY | llama3.1-405b |
Provider Configuration Example
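A sketch of a `model_info` block using the fields above (the Anthropic values come from the providers table; `timeout` is optional):

```yaml
model_info:
  provider: anthropic
  model: claude-sonnet-4-5
  api_key_env: ANTHROPIC_API_KEY
  timeout: 60
```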
Advanced Usage
Baseline Comparison
Compare new evaluations against a baseline to detect regressions:
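For example, assuming a previous run was saved to `baseline.json` (illustrative path):

```bash
osmosis-ai eval -r code_quality -d dataset.jsonl --baseline baseline.json
```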
Variance Analysis

Run multiple evaluations per record to measure score consistency (see the example after this list). This is useful for:

- Understanding rubric stability
- Detecting ambiguous criteria
- A/B testing different prompts
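A sketch using the documented `--number` flag to run each record five times:

```bash
osmosis-ai eval -r code_quality -d dataset.jsonl --number 5
```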
Batch Processing
Process multiple datasets:
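One way is a plain shell loop (the `datasets/` directory and output naming are illustrative):

```bash
for file in datasets/*.jsonl; do
  osmosis-ai eval -r code_quality -d "$file" -o "results/$(basename "$file" .jsonl).json"
done
```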
Custom Cache Location

Override the default cache directory:
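One way, using the documented `--output` flag to redirect results away from the default `~/.cache/osmosis/...` location:

```bash
osmosis-ai eval -r code_quality -d dataset.jsonl --output ./results/run1.json
```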
Error Handling

Common Errors
API Key Not Found
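The fix is typically to export the variable named by the provider’s `api_key_env` (OpenAI shown as an example, per the providers table):

```bash
export OPENAI_API_KEY="your-key-here"
```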
Rubric Not Found
Check your `rubric_configs.yaml` and ensure the rubric ID matches exactly.
Invalid JSONL Format
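Each line must be a complete JSON object; a quick syntax check with `jq` (any JSON-aware tool works):

```bash
jq -c . dataset.jsonl > /dev/null && echo "valid"
```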
Model Not Found
Timeout Error
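If requests time out, one lever is the documented `timeout` field in `model_info` (default 30 seconds):

```yaml
model_info:
  provider: openai
  model: gpt-5
  timeout: 120
```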
Best Practices
Writing Effective Rubrics:

- Be specific and measurable
- Include clear criteria and examples
- Test with sample data before large-scale evaluation
- Include diverse examples with relevant metadata
- Validate JSONL syntax before evaluation
- Keep solution_str concise but complete
- Process datasets in batches for cost efficiency
- Start with small samples to test rubrics
- Monitor API usage through provider dashboards