Models are split into Base Models and LoRA Models. Base models are the starting point for training. LoRA models are trained checkpoints produced by training runs; deploy a LoRA model to serve it through Osmosis inference. LoRA model lifecycle lives under the osmosis model command group: list, inspect, deploy, and undeploy all act on a LoRA model by name.

Base Models

Base models are imported from Hugging Face and used as the starting point for training on Osmosis.

Supported Base Models

We currently support:
Model                    Description
Qwen/Qwen3.6-35B-A3B     Qwen 3.6 35B with 3B active parameters (MoE)
Qwen/Qwen3.5-122B-A10B   Qwen 3.5 122B with 10B active parameters (MoE)
The list of supported models is expanding. Check the platform dashboard or run osmosis model list --type base for the latest available base models.

List Base Models

osmosis model list --type base
The base model list shows model name, creation date, and creator.

LoRA Models

LoRA models are trained checkpoints produced by training runs. The Models page lists them separately from base models and shows the training run, checkpoint step, training reward, and creation date. When inference deployment is available for your account, it also shows deployment status.

Inspect LoRA Models

List LoRA models:
osmosis model list --type lora
Show details for a single LoRA model:
osmosis model info <lora-model-name>
Model details include the base model, training run, checkpoint step, training reward, Hugging Face export status, and deployment status when deployment info is available.
List base models and LoRA models side by side:
osmosis model list
When deployment info is available, the LoRA section also shows the workspace’s deployment-quota summary (for example, 2 of 5 inference deployments used).

Deploy a LoRA Model

After a training run finishes, list its LoRA models to find one to deploy:
osmosis model list --type lora
Deploy a LoRA model by name:
osmosis model deploy <lora-model-name>
Deploying an inactive LoRA model reactivates it. Deploying an already-active LoRA model is a no-op.
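The deploy step above can be wrapped in a small shell helper that deploys a model and then prints its details for a quick check. This is a minimal sketch, not part of the Osmosis CLI: the function name deploy_and_show and the OSMOSIS_BIN override (used to swap in a stub for dry runs) are hypothetical, and the sketch assumes the CLI exits with a non-zero status on failure.

```shell
# Hypothetical helper: deploy a LoRA model by name, then show its details.
# OSMOSIS_BIN is an assumption of this sketch; it defaults to the real CLI.
deploy_and_show() {
  local name
  local osmosis
  name="$1"
  osmosis="${OSMOSIS_BIN:-osmosis}"

  if [ -z "$name" ]; then
    echo "usage: deploy_and_show <lora-model-name>" >&2
    return 2
  fi

  # Stop here if the deploy fails (assumes a non-zero exit status on error).
  "$osmosis" model deploy "$name" || return 1

  # Print the model details, which include deployment status when available.
  "$osmosis" model info "$name"
}
```

Running OSMOSIS_BIN=echo deploy_and_show my-run-step-100 prints the two commands instead of executing them, which is a cheap way to preview what the helper would do.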

Call the Inference Endpoint

Deployed LoRA models are served through the OpenAI-compatible chat completions endpoint:
https://inference.osmosis.ai/v1/chat/completions
Use your Osmosis API key and the canonical model value from the model detail page or osmosis model info. The model value has the form <base_model_path>:<lora-model-name>.
curl -X POST https://inference.osmosis.ai/v1/chat/completions \
  -H "Authorization: Bearer $OSMOSIS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.6-35B-A3B:my-run-step-100",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
If inference deployment is not available for your account, deployment status, deployment quota, and endpoint snippets may be hidden.
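For scripted calls, the request above can be wrapped in two small shell functions: one that joins the base model path and LoRA model name into the documented <base_model_path>:<lora-model-name> form, and one that sends a single user message. This is a sketch under assumptions: the function names are hypothetical, and jq is assumed to be installed so that the prompt is escaped correctly as JSON. The canonical model value should still be taken from the model detail page or osmosis model info; the first helper only mirrors the documented form.

```shell
# Hypothetical helpers; neither function is part of the Osmosis CLI.

# Join the two parts as <base_model_path>:<lora-model-name>.
osmosis_model_value() {
  local base_model_path
  local lora_model_name
  base_model_path="$1"
  lora_model_name="$2"
  printf '%s:%s\n' "$base_model_path" "$lora_model_name"
}

# Send one user message to the chat completions endpoint.
# Requires curl and jq (jq is an assumption of this sketch).
osmosis_chat() {
  local model
  local prompt
  local payload
  model="$1"
  prompt="$2"

  if [ -z "${OSMOSIS_API_KEY:-}" ]; then
    echo "OSMOSIS_API_KEY is not set" >&2
    return 1
  fi

  # Build the JSON body with jq so quotes and newlines in the prompt are escaped.
  payload=$(jq -n --arg model "$model" --arg prompt "$prompt" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}') || return 1

  curl -sS -X POST https://inference.osmosis.ai/v1/chat/completions \
    -H "Authorization: Bearer $OSMOSIS_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload"
}
```

A call then looks like osmosis_chat "$(osmosis_model_value 'Qwen/Qwen3.6-35B-A3B' 'my-run-step-100')" 'Hello!'.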

Undeploy

To transition a LoRA model’s deployment to inactive:
osmosis model undeploy <lora-model-name>
The LoRA model remains in the training run history; undeploy only transitions the serving deployment to inactive. The command is idempotent: calling it on an already-inactive model is a no-op.
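Because undeploying an already-inactive model is a no-op, a cleanup script can be re-run safely over the same list of names. The loop below is a minimal sketch: undeploy_all and the OSMOSIS_BIN override are hypothetical names, and the sketch assumes the CLI returns a non-zero exit status when a command fails.

```shell
# Hypothetical helper: undeploy every LoRA model named on the command line.
# OSMOSIS_BIN is an assumption of this sketch; it defaults to the real CLI.
undeploy_all() {
  local osmosis
  local failures
  local name
  osmosis="${OSMOSIS_BIN:-osmosis}"
  failures=0

  for name in "$@"; do
    # Keep going after a failure so one bad name does not block the rest.
    if ! "$osmosis" model undeploy "$name"; then
      echo "failed to undeploy: $name" >&2
      failures=$((failures + 1))
    fi
  done

  # Succeed only if every undeploy call succeeded.
  [ "$failures" -eq 0 ]
}
```

For example, undeploy_all my-run-step-100 my-run-step-200 issues one undeploy command per name and returns a failure status if any of them failed.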

Requirements

  • Run model commands from the workspace directory so the CLI can resolve the connected workspace from the repository's Git origin remote.
  • The LoRA model must belong to a training run in the same workspace.
  • Inference deployment must be available for your account. Deploying models also requires an active subscription.
  • GitHub setup must be healthy before training runs can produce new LoRA models.
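The first requirement can be checked locally before running model commands. The sketch below only confirms that the current directory is a Git working tree with an origin remote; it does not verify workspace membership, subscription status, or GitHub setup, which the platform checks. The function name osmosis_preflight is hypothetical.

```shell
# Hypothetical pre-flight check: is this directory a Git working tree
# with an 'origin' remote that the CLI could resolve a workspace from?
osmosis_preflight() {
  if ! command -v git >/dev/null 2>&1; then
    echo "git not found on PATH" >&2
    return 1
  fi

  if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "not inside a Git working tree" >&2
    return 1
  fi

  if ! git remote get-url origin >/dev/null 2>&1; then
    echo "no 'origin' remote configured" >&2
    return 1
  fi

  # Show the remote so you can confirm it is the expected workspace repository.
  echo "origin: $(git remote get-url origin)"
}
```

Running osmosis_preflight from the workspace directory prints the origin URL on success and an explanatory message on stderr otherwise.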

Next Steps

Datasets

Upload and validate datasets for evaluation runs and training runs.

Training Runs

Submit training runs and inspect their LoRA models.

Command Reference

Review model and deployment commands and options.