# Evaluation

> Submit evaluation runs from a workspace directory and view the results

An evaluation run submits one run against a platform dataset, using the same workspace, rollout, entrypoint, dataset, and optional `commit_sha` semantics as [`osmosis train submit`](/zh/cli/command-reference#train-submit). The platform clones the repository that the workspace directory's `origin` remote points to and runs the rollout server-side, so push your changes and confirm that [Git Sync](/zh/cli/workspace/git-sync) has completed before you submit.

Evaluation configs must live under `configs/eval/` inside a structured Osmosis workspace directory.

`osmosis eval submit` is also the recommended pre-flight before a training run: run it first to catch problems before you commit GPU time to training.

## Quick Start

Inside the workspace directory:

```bash
osmosis dataset list                              # Confirm the platform dataset name
git push                                          # Make sure the platform can see your commit
osmosis eval submit configs/eval/my-rollout.toml
```

Then view or manage the run:

```bash
osmosis eval list
osmosis eval info
osmosis eval stop
```

## Evaluation Config

For the complete field reference, see [Config Files](/zh/cli/config-files#eval-config).

```toml configs/eval/my-rollout.toml
[experiment]
rollout = "my-rollout"                # Rollout directory under rollouts/
entrypoint = "main.py"                # Entrypoint, relative to the rollout directory
model_path = "openai/gpt-5-mini"      # LiteLLM-style model name for the evaluation policy
dataset = "my-platform-dataset"       # Platform dataset name from `osmosis dataset list`
# commit_sha =                        # Optional: pin to a specific commit

[evaluation]
# Optional. Omitted fields use the platform defaults.
# limit = 200
# n = 1
# batch_size = 1
# pass_threshold = 1.0
# agent_workflow_timeout_s = 450
# grader_timeout_s = 150

# [env]
# LOG_LEVEL = "INFO"

[secrets]
# Required in eval configs. Write required = [] only when no secrets are needed.
required = ["OPENAI_API_KEY"]
```

When `[evaluation].limit` is omitted, the platform evaluates a random 10% sample of the dataset (at least one row). Set `limit` to evaluate a fixed number of rows: the first `N` rows of the dataset, in order.

Git Sync is the source of truth for your rollout code. The CLI reads the values in the local TOML config you pass in, but the rollout code itself comes from the synced workspace repository. Before submitting a run that depends on code changes, commit, push, and wait for the sync to finish; set `commit_sha` when you need a specific synced version.

## How It Works

1. The CLI reads the evaluation TOML, resolves the workspace from the Git `origin` remote, and validates the `[experiment]` and `[secrets]` sections (plus the optional `[evaluation]` and `[env]` sections) locally before submitting.
2. The CLI submits the evaluation run request. The platform clones the connected workspace repository (or the pinned `commit_sha`) and prepares the evaluation environment.
3. Before evaluating any rows, the platform runs a pre-flight check to confirm that `[experiment].model_path` is reachable with the credentials you configured. If the model is unreachable (wrong name, missing or invalid API key, or rate limiting by the provider), the run fails early instead of wasting evaluation resources. Register the model provider's API key with [`osmosis secret set`](/zh/cli/command-reference#secret) and list it in `[secrets].required` (see [Configuration Files](/zh/cli/config-files#env-and-secrets)).
4. The platform launches your rollout and, with `[experiment].model_path` as the evaluation policy, drives `AgentWorkflow.run(ctx)` for each selected row of the platform dataset, then runs `Grader.grade(ctx)` with that row's `ground_truth`.
5. The platform aggregates rewards, pass rates, and per-row results. View them with `osmosis eval info ` (or `osmosis --json eval info `).

## Commands

| Command                                 | Description                                                                               |
| --------------------------------------- | ----------------------------------------------------------------------------------------- |
| `osmosis eval submit .toml [--yes]`     | Submit an evaluation run from a TOML file under `configs/eval/`.                          |
| `osmosis eval list [--limit N] [--all]` | List the evaluation runs for the current workspace directory.                             |
| `osmosis eval info `                    | Show the details and results of an evaluation run.                                        |
| `osmosis eval stop [--yes]`             | Stop a pending or running evaluation run.                                                 |
| `osmosis eval rubric`                   | Run a local LLM-as-judge over a JSONL conversation file. Does not talk to the platform.   |

For the full list of flags, see the [command reference](/zh/cli/command-reference#eval).

## From Evaluation Run to Training Run

1. Run `osmosis eval submit configs/eval/my-rollout.toml`.
2. Use `osmosis eval list` and `osmosis eval info ` to track progress and view results.
3. Push fixes to the workspace repository and resubmit. To compare changes, you can use `commit_sha` to rerun the same evaluation against an older version.
4. Once the evaluation results are healthy, run `osmosis train submit configs/training/my-rollout.toml`. See [Training Runs](/zh/platform/training-runs).

## Local Rubric Scoring

`osmosis eval rubric` is a local tool that scores existing JSONL conversation files with an LLM judge. It does not need a workspace directory or platform authentication, and it does not run a rollout.

```bash
osmosis eval rubric -d conversations.jsonl \
  --rubric "Evaluate the assistant's helpfulness..." \
  --model openai/gpt-5-mini
```

For the full list of flags, see the [command reference](/zh/cli/command-reference#eval-rubric).

## Next Steps

- [Config Files](/zh/cli/config-files): Complete reference for evaluation and training config files.
- [Git Sync](/zh/cli/workspace/git-sync): Push and sync rollout code before submitting an evaluation run or a training run.
- [Training Runs](/zh/platform/training-runs): Submit a training run once the evaluation results are healthy.
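
## Appendix: Pre-Submit Check Sketch

The pre-submit habits this page recommends (keep the config under `configs/eval/`, commit and push before submitting) can be wrapped in a small shell helper. The sketch below is hypothetical: `is_eval_config_path`, `is_pushed`, and `presubmit` are illustrative names, not part of the `osmosis` CLI. It assumes you run it from the workspace root, pass a path relative to that root, and have an upstream branch configured. It only checks local Git state, so it cannot confirm that Git Sync has finished on the platform.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submit helper; none of these functions ship with osmosis.

# Succeeds only if the path looks like configs/eval/<name>.toml
# (assumption: the path is given relative to the workspace root).
is_eval_config_path() {
  case "$1" in
    configs/eval/?*.toml) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds only if the working tree is clean (untracked files count as dirty)
# and HEAD has no commits that are missing from the upstream branch.
is_pushed() {
  local dirty ahead
  dirty="$(git status --porcelain)" || return 1
  [ -z "$dirty" ] || return 1
  # Fails (returns 1) when no upstream branch is configured.
  ahead="$(git rev-list --count '@{u}..HEAD' 2>/dev/null)" || return 1
  [ "$ahead" -eq 0 ]
}

# Runs both checks, then hands the config to the real CLI command.
presubmit() {
  local config="$1"
  if ! is_eval_config_path "$config"; then
    echo "config must live under configs/eval/: $config" >&2
    return 1
  fi
  if ! is_pushed; then
    echo "commit and push your changes before submitting" >&2
    return 1
  fi
  osmosis eval submit "$config"
}
```

After sourcing the file, `presubmit configs/eval/my-rollout.toml` would submit only when both local checks pass. You still need to wait for Git Sync to complete, or pin `commit_sha`, as described above.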