API Reference

Browse the full osmosis-ai Python SDK surface below. The SDK is organized around two main capabilities:

Reward & Evaluation

Decorators and functions for scoring LLM outputs using local rules or LLM-based rubrics

Remote Rollout

Build custom agent loops for training infrastructure integration

Reward API

Decorators

@osmosis_reward

Decorator for local reward functions that compute scores without API calls. Use this for deterministic evaluation logic (exact match, regex, keyword matching). Signature:
@osmosis_reward
def function_name(
    solution_str: str,
    ground_truth: str,
    extra_info: dict | None = None,
    **kwargs
) -> float
Parameters:
  • solution_str (str, required) - Text to evaluate
  • ground_truth (str, required) - Reference answer
  • extra_info (dict, optional) - Additional context
  • **kwargs (required) - Reserved for forward compatibility
Returns: float - Score value
Example:
from osmosis_ai import osmosis_reward

@osmosis_reward
def exact_match(solution_str: str, ground_truth: str, extra_info: dict | None = None, **kwargs) -> float:
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0
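Regex and keyword checks follow the same signature. A minimal sketch (the pattern below is illustrative, not part of the SDK):
import re

from osmosis_ai import osmosis_reward

@osmosis_reward
def contains_number(solution_str: str, ground_truth: str, extra_info: dict | None = None, **kwargs) -> float:
    # Reward solutions that contain at least one digit sequence.
    return 1.0 if re.search(r"\d+", solution_str) else 0.0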

@osmosis_rubric

Decorator for LLM-based evaluation functions. Use this for subjective evaluation that requires semantic understanding (helpfulness, tone, quality). Signature:
@osmosis_rubric
def function_name(
    solution_str: str,
    ground_truth: str | None,
    extra_info: dict,
    **kwargs
) -> float
Parameters:
  • solution_str (str, required) - Text to evaluate
  • ground_truth (str | None, required) - Reference answer (can be None)
  • extra_info (dict, required) - Configuration and context
  • **kwargs (required) - Reserved for forward compatibility
Returns: float - Score value
Example:
from osmosis_ai import osmosis_rubric, evaluate_rubric

@osmosis_rubric
def quality_check(solution_str: str, ground_truth: str | None, extra_info: dict, **kwargs) -> float:
    return evaluate_rubric(
        rubric="Evaluate response quality",
        solution_str=solution_str,
        model_info={"provider": "openai", "model": "gpt-5.2"},
        ground_truth=ground_truth
    )

Rubric Evaluation

evaluate_rubric()

Evaluate text using an LLM-based rubric. This is the core function for LLM-powered evaluation. Signature:
def evaluate_rubric(
    rubric: str,
    solution_str: str,
    model_info: dict,
    ground_truth: str | None = None,
    original_input: str | None = None,
    metadata: dict | None = None,
    score_min: float = 0.0,
    score_max: float = 1.0,
    timeout: int | None = None,
    return_details: bool = False
) -> float | dict
Parameters:
  • rubric (str, required) - Natural language evaluation criteria
  • solution_str (str, required) - Text to evaluate
  • model_info (dict, required) - LLM provider configuration
  • ground_truth (str, optional) - Reference answer
  • original_input (str, optional) - Original user query
  • metadata (dict, optional) - Additional context
  • score_min (float, optional) - Minimum score (default: 0.0)
  • score_max (float, optional) - Maximum score (default: 1.0)
  • timeout (int, optional) - Request timeout in seconds
  • return_details (bool, optional) - Return full response (default: False)
model_info Structure:
{
    "provider": "openai",           # Required
    "model": "gpt-5.2",         # Required
    "api_key": "sk-...",            # Optional
    "api_key_env": "OPENAI_API_KEY", # Optional
    "timeout": 30                   # Optional
}
Returns:
  • float - Score (when return_details=False)
  • dict - Full response with score, explanation, raw payload (when return_details=True)
Example:
from osmosis_ai import evaluate_rubric

score = evaluate_rubric(
    rubric="Evaluate how helpful the response is.",
    solution_str="Click 'Forgot Password' to reset.",
    model_info={"provider": "openai", "model": "gpt-5.2"}
)

Exceptions

MissingAPIKeyError

Raised when an API key is not found for a provider.
from osmosis_ai import MissingAPIKeyError

try:
    score = evaluate_rubric(...)
except MissingAPIKeyError as e:
    print(f"API key not found: {e}")

ProviderRequestError

Raised when a provider request fails.
from osmosis_ai import ProviderRequestError

try:
    score = evaluate_rubric(...)
except ProviderRequestError as e:
    print(f"Provider error: {e}")

ModelNotFoundError

Raised when a specified model is not available (subclass of ProviderRequestError).
from osmosis_ai import ModelNotFoundError

try:
    score = evaluate_rubric(...)
except ModelNotFoundError as e:
    print(f"Model not found: {e}")

Types

ModelInfo (TypedDict)

from osmosis_ai.rubric_types import ModelInfo

model_info: ModelInfo = {
    "provider": "openai",
    "model": "gpt-5.2",
    "api_key_env": "OPENAI_API_KEY",
    "timeout": 30
}
Fields:
  • provider (str, required) - Provider name (see Supported Providers)
  • model (str, required) - Model identifier
  • api_key (str, optional) - API key value, passed directly
  • api_key_env (str, optional) - Environment variable name for the API key
  • timeout (float, optional) - Request timeout in seconds
  • score_min (float, optional) - Minimum score override
  • score_max (float, optional) - Maximum score override
  • system_prompt (str | None, optional) - Custom system prompt for the evaluator
  • original_input (str | None, optional) - Original user query for context
  • reasoning_effort (str | None, optional) - Reasoning effort level (provider-specific)

RewardRubricRunResult (TypedDict)

Returned when return_details=True:
from osmosis_ai.rubric_types import RewardRubricRunResult

result: RewardRubricRunResult = {
    "score": 0.85,              # float
    "explanation": "...",       # str
    "raw": {...}                # Any - raw LLM response
}
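A sketch of how this result is produced, passing return_details=True to evaluate_rubric (the rubric text and model choice are illustrative):
from osmosis_ai import evaluate_rubric

result = evaluate_rubric(
    rubric="Evaluate how helpful the response is.",
    solution_str="Click 'Forgot Password' to reset.",
    model_info={"provider": "openai", "model": "gpt-5.2"},
    return_details=True
)
print(result["score"])        # float score
print(result["explanation"])  # evaluator's reasoning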


Complete Reward Example

from osmosis_ai import osmosis_reward, osmosis_rubric, evaluate_rubric
from dotenv import load_dotenv

load_dotenv()

# Local reward function
@osmosis_reward
def exact_match(solution_str: str, ground_truth: str, extra_info: dict | None = None, **kwargs) -> float:
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0

# Remote rubric evaluator
@osmosis_rubric
def semantic_eval(solution_str: str, ground_truth: str | None, extra_info: dict, **kwargs) -> float:
    return evaluate_rubric(
        rubric="Compare semantic similarity (0-1 scale)",
        solution_str=solution_str,
        ground_truth=ground_truth,
        model_info={"provider": "openai", "model": "gpt-5.2"}
    )

# Usage
solution = "The capital of France is Paris"
truth = "Paris is France's capital"

local_score = exact_match(solution, truth)
semantic_score = semantic_eval(solution, truth, {})

print(f"Exact match: {local_score}")      # 0.0
print(f"Semantic: {semantic_score}")      # ~1.0

Remote Rollout API

Build custom agent loops that integrate with Osmosis training infrastructure. Your agent runs as an HTTP server while the training cluster handles LLM inference and trajectory collection.

Quick Overview

  • RolloutAgentLoop - Base class for implementing agents
  • RolloutContext - Execution context with chat(), complete(), error() methods
  • RolloutRequest - Initial request with messages, parameters, and metadata
  • create_app() - Factory function to create a FastAPI server
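For orientation only, a rough sketch of how these pieces might fit together. The import path, the run() hook name, and the create_app() call signature are assumptions, not confirmed by this page; see the Remote Rollout section for the real API:
from osmosis_ai.rollout import (  # import path assumed
    RolloutAgentLoop,
    RolloutContext,
    RolloutRequest,
    create_app,
)

class EchoAgent(RolloutAgentLoop):
    # Hook name "run" is an assumption; check the Remote Rollout docs.
    async def run(self, request: RolloutRequest, ctx: RolloutContext) -> None:
        # Ask the training cluster's LLM for a completion over the request messages.
        reply = await ctx.chat(request.messages)
        # Report the finished trajectory back to the training cluster.
        await ctx.complete(reply)

app = create_app(EchoAgent)  # serve with e.g. `uvicorn module:app`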

Remote Rollout Documentation

For complete Remote Rollout API documentation, guides, and examples, see the dedicated Remote Rollout section.

Next Steps

CLI Reference

Complete CLI documentation

Remote Rollout

Build custom agent loops