@osmosis_reward decorator and return a float value representing the quality of the output.
Basic Example
File: reward_fn/compute_reward.py
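A minimal sketch of what such a file could contain. The `osmosis_reward` defined here is a local no-op stand-in so the snippet runs on its own; in a real project the decorator comes from the Osmosis package, whose import is not shown in this section. The default of `None` for `extra_info` and the scoring rule are assumptions made for illustration.

```python
from typing import Optional


def osmosis_reward(func):
    """Local no-op stand-in for the real decorator (assumption, for runnability)."""
    return func


@osmosis_reward
def compute_reward(
    solution_str: str,
    ground_truth: str,
    extra_info: Optional[dict] = None,
    **kwargs,
) -> float:
    """Return 1.0 when the stripped output equals the reference, else 0.0."""
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0
```

The function takes the four parameters described below and returns a plain float.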
Function Signature
Parameters
solution_str: str
The output generated by the LLM that you want to evaluate.
Example:
ground_truth: str
The expected correct answer or reference solution.
Example:
extra_info: dict
Optional dictionary containing additional metadata or context.
Example:
**kwargs
Required parameter for future compatibility. See the warning above for details.
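To make the parameters concrete, here is a hypothetical call. Every value, the `difficulty` key, and the `data_source` keyword are invented for illustration, and the function body is only a placeholder.

```python
def compute_reward(solution_str, ground_truth, extra_info=None, **kwargs) -> float:
    # Placeholder scoring: exact match after trimming whitespace.
    return float(solution_str.strip() == ground_truth.strip())


solution_str = "The capital of France is Paris."  # output produced by the LLM
ground_truth = "The capital of France is Paris."  # reference answer
extra_info = {"difficulty": "easy"}               # hypothetical metadata key

# An extra keyword argument is absorbed by **kwargs instead of raising TypeError.
score = compute_reward(solution_str, ground_truth, extra_info, data_source="demo")
```

Accepting `**kwargs` is what lets a caller pass arguments the function does not yet know about.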
Return Value
Return a float value representing the score:
- Typically between 0.0 (worst) and 1.0 (best)
- Can use other ranges if appropriate for your use case
- Should be deterministic (same inputs → same output)
Common Patterns
Exact Match
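One possible exact-match reward, sketched under the assumption that case and whitespace differences should not count as mismatches. The helper `normalize` and the function name are hypothetical. The decorator is omitted so the snippet runs without the Osmosis package; a real reward function would carry `@osmosis_reward`.

```python
def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def exact_match_reward(solution_str, ground_truth, extra_info=None, **kwargs) -> float:
    """Return 1.0 if the normalized strings are identical, otherwise 0.0."""
    return 1.0 if normalize(solution_str) == normalize(ground_truth) else 0.0
```

Drop the normalization step if the task requires a byte-for-byte match.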
Multi-Criteria
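A multi-criteria reward can be sketched as a weighted sum of independent checks. The two criteria, the weights (0.75 and 0.25), and the 200-character limit are all assumptions chosen for illustration. The decorator is again omitted so the snippet runs standalone.

```python
def multi_criteria_reward(solution_str, ground_truth, extra_info=None, **kwargs) -> float:
    """Combine a correctness check and a conciseness check into one score."""
    text = solution_str.strip()

    # Criterion 1: the reference answer appears somewhere in the output.
    contains_answer = 1.0 if ground_truth.strip().lower() in text.lower() else 0.0

    # Criterion 2: the output is non-empty and at most 200 characters long.
    is_concise = 1.0 if 0 < len(text) <= 200 else 0.0

    # The weights sum to 1.0, so the result stays within [0.0, 1.0].
    return 0.75 * contains_answer + 0.25 * is_concise
```

Keeping the weights summing to 1.0 means no separate normalization step is needed.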
Error Handling
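A sketch of a reward function that returns a score instead of raising when the input is malformed. It assumes a numeric task where both strings should parse as floats, and it assumes 0.0 is the right fallback score; the function name is hypothetical.

```python
def numeric_reward(solution_str, ground_truth, extra_info=None, **kwargs) -> float:
    """Return 1.0 if both inputs parse to the same number, otherwise 0.0."""
    try:
        predicted = float(solution_str.strip())
        expected = float(ground_truth.strip())
    except (AttributeError, TypeError, ValueError):
        # AttributeError: an input was None or not a string (no .strip()).
        # ValueError: the text could not be parsed as a number.
        return 0.0
    return 1.0 if predicted == expected else 0.0
```

Catching only the expected exception types keeps genuine programming errors visible.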
Always handle errors gracefully.
Testing Locally
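This section does not show a specific test command, so the following is only one way to check a reward function locally: a table of cases run with plain assertions. The cases and the `run_local_checks` helper are hypothetical.

```python
def compute_reward(solution_str, ground_truth, extra_info=None, **kwargs) -> float:
    """Function under test: exact match after trimming whitespace."""
    return 1.0 if solution_str.strip() == ground_truth.strip() else 0.0


# (solution_str, ground_truth, expected_score)
TEST_CASES = [
    ("42", "42", 1.0),
    (" 42\n", "42", 1.0),
    ("41", "42", 0.0),
]


def run_local_checks() -> int:
    """Run every case and return how many were checked."""
    for solution, truth, expected in TEST_CASES:
        actual = compute_reward(solution, truth)
        assert actual == expected, f"{solution!r} vs {truth!r}: got {actual}"
    return len(TEST_CASES)


if __name__ == "__main__":
    print(f"{run_local_checks()} checks passed")
```

The same cases can be moved into a test framework later without changing the reward function.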
Test your reward functions before pushing.
Best Practices
1. Be Deterministic
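A sketch of a reward that depends only on its inputs (no randomness, clock, or network access), together with a hypothetical helper that calls it repeatedly and checks that the score never changes. The token-overlap metric is an assumption chosen for illustration.

```python
def token_overlap_reward(solution_str, ground_truth, extra_info=None, **kwargs) -> float:
    """Fraction of reference tokens that also appear in the output."""
    reference = set(ground_truth.lower().split())
    if not reference:
        return 0.0
    produced = set(solution_str.lower().split())
    return len(reference & produced) / len(reference)


def is_deterministic(fn, *args, runs: int = 5) -> bool:
    """Call fn repeatedly with the same arguments and compare the scores."""
    results = {fn(*args) for _ in range(runs)}
    return len(results) == 1
```

A repeated-call check like this cannot prove determinism, but it catches accidental use of random or time-based values.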
Reward functions should always return the same score for the same inputs.
2. Normalize Scores
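One way to keep scores in range is to clamp a raw value before returning it. The helper `clamp_score` and the length-ratio metric are hypothetical, and mapping NaN to the lower bound is an assumption.

```python
import math


def clamp_score(raw: float, low: float = 0.0, high: float = 1.0) -> float:
    """Force raw into [low, high]; NaN is mapped to low."""
    if math.isnan(raw):
        return low
    return max(low, min(high, float(raw)))


def length_ratio_reward(solution_str, ground_truth, extra_info=None, **kwargs) -> float:
    """Raw ratio can exceed 1.0 for long outputs, so it is clamped."""
    raw = len(solution_str) / max(len(ground_truth), 1)
    return clamp_score(raw)
```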
Keep scores in a consistent range.
3. Document Scoring Logic
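A docstring is a natural place to record the scoring rules. The three-tier scheme below is invented for illustration; what matters is that each possible score is listed next to the condition that produces it.

```python
def prefix_reward(solution_str, ground_truth, extra_info=None, **kwargs) -> float:
    """Score how closely the output matches the reference.

    Scoring (inputs are stripped of surrounding whitespace first):
        1.0  the output equals the reference
        0.5  the reference is non-empty and the output starts with it
        0.0  anything else
    """
    solution = solution_str.strip()
    truth = ground_truth.strip()
    if solution == truth:
        return 1.0
    if truth and solution.startswith(truth):
        return 0.5
    return 0.0
```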
Clearly explain how scores are calculated.
4. Use extra_info When Appropriate
Leverage the extra_info parameter for context.
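A sketch of reading per-example settings from `extra_info`. The `case_sensitive` key is hypothetical; the keys actually available depend on how your data is prepared. The function treats a missing dictionary or key as "use the default".

```python
def flexible_match_reward(solution_str, ground_truth, extra_info=None, **kwargs) -> float:
    """Exact match whose case sensitivity is controlled by extra_info."""
    info = extra_info or {}  # tolerate None
    case_sensitive = info.get("case_sensitive", True)  # hypothetical key

    solution = solution_str.strip()
    truth = ground_truth.strip()
    if not case_sensitive:
        solution, truth = solution.lower(), truth.lower()
    return 1.0 if solution == truth else 0.0
```

Reading keys with `.get()` and a default keeps the function working when an example carries no metadata.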