Reward Functions
Reward functions provide deterministic, numeric scoring for LLM outputs. They use the @osmosis_reward decorator and return a float representing the quality of the output.
Basic Example
File: reward_fn/compute_reward.py
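A minimal sketch of what reward_fn/compute_reward.py might contain. It assumes the decorator is importable as osmosis_ai.osmosis_reward and falls back to a no-op decorator so the file also runs without the package. The scoring rule (compare the last number in the output with the ground truth) is illustrative, not a required approach.

```python
import re

try:
    # Assumed import path for the @osmosis_reward decorator.
    from osmosis_ai import osmosis_reward
except ImportError:
    # Fallback so this sketch runs without the package: a no-op decorator.
    def osmosis_reward(func):
        return func


@osmosis_reward
def compute_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Return 1.0 if the last number in the output equals the ground truth, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", solution_str)
    if not numbers:
        return 0.0  # No numeric answer found in the output.
    try:
        return 1.0 if float(numbers[-1]) == float(ground_truth) else 0.0
    except ValueError:
        return 0.0  # Ground truth is not a number.
```

With this sketch, compute_reward("The answer is 42.", "42") returns 1.0, and an output containing no number scores 0.0.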
Function Signature
All reward functions must follow the same signature: they accept the parameters below and return a float.
Parameters
solution_str: str
The output generated by the LLM that you want to evaluate.
Example:
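A hypothetical value, for illustration only:

```python
# Hypothetical LLM output to be scored.
solution_str = "The capital of France is Paris."
```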
ground_truth: str
The expected correct answer or reference solution.
Example:
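A hypothetical value, for illustration only:

```python
# Hypothetical reference answer for a geography question.
ground_truth = "Paris"
```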
extra_info: dict
Optional dictionary containing additional metadata or context.
Example:
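A hypothetical dictionary; the keys are illustrative and not a schema defined by this page:

```python
# Hypothetical metadata; these keys are examples, not a required schema.
extra_info = {"source": "geography_quiz", "difficulty": "easy"}
```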
**kwargs
Captures any additional keyword arguments for future compatibility.
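A minimal sketch of a function with the full signature. Because it accepts **kwargs, a caller may pass extra keyword arguments (trace_id below is a hypothetical name) without causing a TypeError. The @osmosis_reward decorator is left off so the sketch runs without the Osmosis package; in a real reward file it would sit above the def.

```python
def example_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Return 1.0 for an exact match after trimming whitespace, else 0.0."""
    # extra_info is optional and unused here; kwargs absorbs any extra arguments.
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0


# trace_id is a hypothetical extra argument; **kwargs absorbs it without error.
score = example_reward("42", "42", extra_info=None, trace_id="run-001")
```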
Return Value
Return a float value representing the score:
- Typically between 0.0 (worst) and 1.0 (best)
- Can use other ranges if appropriate for your use case
- Should be deterministic (same inputs → same output)
Common Patterns
Exact Match
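A sketch of an exact-match reward (decorator omitted so it runs standalone). The normalization step, which lowercases, collapses whitespace, and trims surrounding punctuation, is an assumption; remove it if your task needs a strict character-for-character comparison.

```python
import string


def exact_match_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """1.0 if the normalized output equals the normalized ground truth, else 0.0."""

    def normalize(text: str) -> str:
        # Lowercase, collapse runs of whitespace, then trim surrounding punctuation.
        return " ".join(text.lower().split()).strip(string.punctuation + " ")

    return 1.0 if normalize(solution_str) == normalize(ground_truth) else 0.0
```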
Partial Credit
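One possible partial-credit scheme, chosen here for illustration: the score is the fraction of distinct ground-truth words that also appear in the output (decorator omitted). Splitting on whitespace is a simplification, so a word with attached punctuation, such as "Paris.", will not match "paris".

```python
def partial_credit_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Fraction of distinct ground-truth words found in the output, from 0.0 to 1.0."""
    expected = set(ground_truth.lower().split())
    if not expected:
        return 0.0  # Nothing to match against.
    found = set(solution_str.lower().split())
    return len(expected & found) / len(expected)
```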
Threshold-Based
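A threshold-based sketch: compute a continuous similarity with Python's difflib.SequenceMatcher, then award full credit only when it reaches a cutoff (decorator omitted). Both the similarity measure and the 0.8 cutoff are assumptions to tune per task.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # Hypothetical cutoff; tune for your task.


def threshold_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """1.0 if the output's similarity to the ground truth reaches THRESHOLD, else 0.0."""
    a = solution_str.strip().lower()
    b = ground_truth.strip().lower()
    similarity = SequenceMatcher(None, a, b).ratio()  # 0.0 (no overlap) to 1.0 (identical)
    return 1.0 if similarity >= THRESHOLD else 0.0
```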
Multi-Criteria
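A multi-criteria sketch that combines several 0-or-1 checks with weights summing to 1, so the total stays between 0.0 and 1.0 (decorator omitted). The three criteria, the 200-character limit, and the weights are all hypothetical.

```python
def multi_criteria_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Weighted sum of 0-1 criteria; the weights sum to 1 so the result is in [0, 1]."""
    text = solution_str.strip()
    criteria = {
        # Does the output mention the expected answer?
        "correct": 1.0 if ground_truth.strip().lower() in text.lower() else 0.0,
        # Is the output reasonably short? (200 characters is an arbitrary limit.)
        "concise": 1.0 if len(text) <= 200 else 0.0,
        # Does the output end like a complete sentence?
        "well_formed": 1.0 if text.endswith((".", "!", "?")) else 0.0,
    }
    weights = {"correct": 0.7, "concise": 0.2, "well_formed": 0.1}
    total = sum(weights[name] * score for name, score in criteria.items())
    return round(total, 6)  # Round away floating-point noise from summing weighted floats.
```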
Error Handling
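A minimal sketch, assuming the task expects a bare number as output: parsing failures are caught and mapped to 0.0 instead of propagating (decorator omitted). Catching only the exceptions you anticipate, here ValueError and AttributeError, keeps unrelated bugs visible instead of silently scoring them as zero.

```python
def safe_numeric_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Score 1.0 when the output parses to the expected number; never raises on bad input."""
    try:
        predicted = float(solution_str.strip())
        expected = float(ground_truth.strip())
    except (ValueError, AttributeError):
        # Unparseable text (ValueError) or a non-string input such as None
        # (AttributeError on .strip) earns the minimum score instead of crashing.
        return 0.0
    return 1.0 if abs(predicted - expected) < 1e-6 else 0.0
```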
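Always handle errors gracefully: catch the failures you expect, such as unparseable output, and return a low score instead of raising an exception.
Testing Locally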
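The page does not name a test tool; a low-friction option, assumed here, is a small script that calls the function directly on cases with known answers. exact_reward is a stand-in for your own function, and the cases are hypothetical. The same checks translate directly into pytest test functions if you already use pytest.

```python
def exact_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    # Stand-in reward function under test; replace with your own.
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0


# Hypothetical cases: (solution_str, ground_truth, expected score).
CASES = [
    ("42", "42", 1.0),
    (" 42 \n", "42", 1.0),
    ("41", "42", 0.0),
    ("", "42", 0.0),
]


def run_checks() -> int:
    """Run every case, print each mismatch, and return the number of failures."""
    failures = 0
    for solution, truth, expected in CASES:
        score = exact_reward(solution, truth)
        if not isinstance(score, float):
            print(f"non-float score {score!r} for {solution!r}")
            failures += 1
        elif score != expected:
            print(f"expected {expected}, got {score} for {solution!r}")
            failures += 1
    return failures


if __name__ == "__main__":
    print(f"{run_checks()} failure(s)")
```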
Test your reward functions locally before pushing them.
Best Practices
1. Be Deterministic
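A sketch contrasting a deterministic function with a noisy anti-pattern, plus a hypothetical is_deterministic helper that calls a function repeatedly on identical inputs and checks that every score matches.

```python
import random


def good_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    # Depends only on its inputs, so repeated calls always agree.
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0


def noisy_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    # Anti-pattern: random noise means identical inputs can get different scores.
    base = 1.0 if solution_str.strip() == ground_truth.strip() else 0.0
    return base * random.uniform(0.9, 1.0)


def is_deterministic(reward_fn, solution_str: str, ground_truth: str,
                     extra_info: dict = None, runs: int = 5) -> bool:
    """Call reward_fn repeatedly with identical inputs; True if every score matched."""
    scores = {reward_fn(solution_str, ground_truth, extra_info) for _ in range(runs)}
    return len(scores) == 1
```

A passing check is evidence, not proof: for example, Python's built-in hash() of a str is salted per process unless PYTHONHASHSEED is set, so logic that depends on it can agree within one run yet differ across runs.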
Reward functions should always return the same score for the same inputs.
2. Normalize Scores
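A small helper, hypothetical rather than part of any documented API, that rescales a raw score from a known range onto 0.0 to 1.0 and clamps anything outside it. The 1-5 rubric rating in the usage line is an illustrative input.

```python
def normalize_score(raw: float, low: float, high: float) -> float:
    """Map raw from [low, high] onto [0.0, 1.0], clamping values outside the range."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = (raw - low) / (high - low)
    return min(1.0, max(0.0, scaled))


# Example: a hypothetical 1-5 rubric rating mapped to the 0.0-1.0 range.
rubric_score = normalize_score(4, low=1, high=5)
```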
Keep scores in a consistent range.
3. Document Scoring Logic
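One way to document scoring logic is a docstring that states the expected input format, how credit is assigned, the output range, and edge cases. The comma-separated ground_truth format used here is a hypothetical convention (decorator omitted).

```python
def keyword_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Score how many required keywords the output contains.

    Scoring:
        - ground_truth is a comma-separated list of required keywords.
        - Each keyword found (case-insensitive substring match) earns an equal share.
        - Returns a float in [0.0, 1.0]; 1.0 means every keyword was found.
        - An empty keyword list scores 0.0.
    """
    keywords = [k.strip().lower() for k in ground_truth.split(",") if k.strip()]
    if not keywords:
        return 0.0
    text = solution_str.lower()
    return sum(k in text for k in keywords) / len(keywords)
```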
Clearly explain how scores are calculated.
4. Use extra_info When Appropriate
Leverage the extra_info parameter for per-example context.
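A sketch of a numeric reward that reads a per-example tolerance from extra_info (decorator omitted). The "tolerance" key is hypothetical; use whatever keys your data actually provides. Defaulting extra_info to an empty dict handles calls where it is omitted.

```python
def tolerant_numeric_reward(solution_str: str, ground_truth: str, extra_info: dict = None, **kwargs) -> float:
    """Numeric match whose tolerance can be overridden per example via extra_info."""
    extra_info = extra_info or {}  # extra_info is optional, so default to an empty dict.
    tolerance = float(extra_info.get("tolerance", 1e-6))  # "tolerance" is a hypothetical key.
    try:
        difference = abs(float(solution_str) - float(ground_truth))
    except (TypeError, ValueError):
        return 0.0  # Non-numeric or missing output scores the minimum.
    return 1.0 if difference <= tolerance else 0.0
```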