Routing

Routing rules determine how jobs are matched to workers. You can route by provider, model, priority, tags, cost tier, or custom logic.

Default behavior

Without routing rules, jobs go to the first available worker that supports the requested provider.
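
The default can be pictured as a linear scan over the worker list. The sketch below is illustrative only: the Worker class, its fields, and pick_default are hypothetical names, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical shape; the real worker record may differ.
    name: str
    providers: set
    available: bool = True


def pick_default(workers, provider):
    """Return the first available worker that supports `provider`, or None."""
    for worker in workers:
        if worker.available and provider in worker.providers:
            return worker
    return None


workers = [
    Worker("laptop", {"ollama"}),
    Worker("cloud-1", {"claude"}, available=False),
    Worker("cloud-2", {"claude", "openrouter"}),
]
chosen = pick_default(workers, "claude")
print(chosen.name if chosen else "no worker")
```

Here cloud-1 is skipped because it is busy, so the job lands on cloud-2 even though cloud-1 appears earlier in the list.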

Tiered routing

Route by job complexity:

{
  "routing": {
    "tiers": {
      "low": { "provider": "ollama", "model": "llama3.2" },
      "medium": { "provider": "claude", "model": "haiku" },
      "high": { "provider": "claude", "model": "sonnet" }
    }
  }
}

Set the effort tier in the saddle’s command strip before dispatching. The dropdown next to the prompt input lets you pick low, medium, or high.
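
As a rough sketch of how a tier name could be resolved against the config above, assuming the router does a plain dictionary lookup (resolve_tier is a hypothetical helper, not an SDK function):

```python
import json

CONFIG = json.loads("""
{
  "routing": {
    "tiers": {
      "low": { "provider": "ollama", "model": "llama3.2" },
      "medium": { "provider": "claude", "model": "haiku" },
      "high": { "provider": "claude", "model": "sonnet" }
    }
  }
}
""")


def resolve_tier(config, tier):
    """Map an effort tier to its (provider, model) pair."""
    tiers = config["routing"]["tiers"]
    if tier not in tiers:
        # Assumption: an unknown tier is an error rather than a silent default.
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(tiers)}")
    entry = tiers[tier]
    return entry["provider"], entry["model"]


print(resolve_tier(CONFIG, "medium"))
```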

Tag-based routing

Match jobs to workers by tags:

{
  "routing": {
    "rules": [
      { "match": { "tag": "code-review" }, "workers": { "tag": "gpu-server" } },
      { "match": { "tag": "summarize" }, "workers": { "tag": "local" } }
    ]
  }
}
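
One way to read these rules is "the first rule whose tag appears on the job decides which workers are eligible". The sketch below assumes that first-match behavior and assumes that jobs with no matching rule stay eligible for every worker; neither is confirmed here. eligible_workers and the WORKERS map are hypothetical.

```python
RULES = [
    {"match": {"tag": "code-review"}, "workers": {"tag": "gpu-server"}},
    {"match": {"tag": "summarize"}, "workers": {"tag": "local"}},
]

# Hypothetical worker registry: worker name -> set of tags.
WORKERS = {
    "rig": {"gpu-server"},
    "laptop": {"local"},
    "mini": {"local", "gpu-server"},
}


def eligible_workers(job_tags, workers, rules):
    """Return the worker names allowed to take a job with the given tags."""
    for rule in rules:
        if rule["match"]["tag"] in job_tags:
            wanted = rule["workers"]["tag"]
            return [name for name, tags in workers.items() if wanted in tags]
    # Assumption: with no matching rule, every worker stays eligible.
    return list(workers)


print(eligible_workers({"code-review"}, WORKERS, RULES))
```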

Fallback chains

Try providers in order:

{
  "routing": {
    "strategy": "fallback",
    "chain": ["ollama", "claude", "openrouter"]
  }
}
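
A minimal sketch of the fallback idea, assuming that any failure from a provider moves the job to the next entry in the chain. The send callable stands in for whatever actually dispatches the job; it and dispatch_with_fallback are hypothetical.

```python
def dispatch_with_fallback(chain, send):
    """Try each provider in order; return (provider, result) from the first success."""
    failed = []
    for provider in chain:
        try:
            return provider, send(provider)
        except Exception:
            # Sketch only: a real router would likely distinguish error types.
            failed.append(provider)
    raise RuntimeError(f"all providers failed: {failed}")


def fake_send(provider):
    # Simulate a local Ollama instance that is not running.
    if provider == "ollama":
        raise ConnectionError("ollama not reachable")
    return f"handled by {provider}"


result = dispatch_with_fallback(["ollama", "claude", "openrouter"], fake_send)
print(result)
```

With the simulated outage, the job skips ollama and is handled by claude; openrouter is never contacted.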

Budget routing

Enforce a weekly spend limit. Once the limit is reached, jobs are downgraded automatically to a local provider:

{
  "routing": {
    "budget": {
      "weekly_limit_usd": 5.00,
      "over_budget_provider": "ollama"
    }
  }
}
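
The downgrade decision itself is a single comparison. The sketch below assumes the router tracks a running weekly total and switches once the total reaches the limit; how spend is measured and the exact threshold semantics are not specified here, and choose_provider is a hypothetical helper.

```python
BUDGET = {"weekly_limit_usd": 5.00, "over_budget_provider": "ollama"}


def choose_provider(requested, spent_this_week_usd, budget):
    """Return the requested provider, or the over-budget one once the limit is hit."""
    # Assumption: reaching the limit exactly already counts as over budget.
    if spent_this_week_usd >= budget["weekly_limit_usd"]:
        return budget["over_budget_provider"]
    return requested


print(choose_provider("claude", 4.20, BUDGET))
print(choose_provider("claude", 5.00, BUDGET))
```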

Automatic cap detection

When a cloud worker hits a provider’s rate limit or session cap (e.g., Claude’s “You’ve hit your limit” message), the router automatically:

1. Marks that worker as capped for 60 seconds
2. Routes the next dispatch to a different worker
3. Retries the capped worker after the cooldown

No manual intervention is needed. The SDK detects the cap message in the worker’s stdout and classifies the failure as rate_limited.
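
The cooldown described above can be sketched as a small tracker keyed by worker name. This is an illustration under assumptions, not the SDK's implementation: CapTracker, its methods, and the substring match on the cap message are all hypothetical. The clock is injectable so the behavior can be exercised without waiting a full minute.

```python
import time

COOLDOWN_SECONDS = 60
# Assumption: caps are recognized by a plain substring match on stdout.
CAP_MARKERS = ("You've hit your limit",)


class CapTracker:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._capped_until = {}

    def observe(self, worker, stdout):
        """Start the cooldown and return 'rate_limited' if stdout shows a cap."""
        text = stdout.replace("\u2019", "'")  # tolerate curly apostrophes
        if any(marker in text for marker in CAP_MARKERS):
            self._capped_until[worker] = self._clock() + COOLDOWN_SECONDS
            return "rate_limited"
        return None

    def is_capped(self, worker):
        return self._clock() < self._capped_until.get(worker, float("-inf"))

    def pick(self, workers):
        """Return the first worker that is not cooling down, or None."""
        for worker in workers:
            if not self.is_capped(worker):
                return worker
        return None


now = [100.0]  # fake clock for the demonstration
tracker = CapTracker(clock=lambda: now[0])
status = tracker.observe("cloud-1", "Error: You've hit your limit")
during = tracker.pick(["cloud-1", "laptop"])
now[0] += COOLDOWN_SECONDS + 1
after = tracker.pick(["cloud-1", "laptop"])
print(status, during, after)
```

While cloud-1 is cooling down the next dispatch goes to laptop; once the 60 seconds have passed, cloud-1 becomes eligible again.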

Direct worker targeting

The saddle’s target picker controls which workers receive a dispatch:

- Click any worker to pin all dispatches to that worker
- Click auto to let the router pick (default)
- Select two or more workers to fan out the same prompt to all of them

This is useful for testing new workers, A/B comparisons, and debugging routing issues. Under the hood, it adds assigned_to to the dispatch payload.
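
For illustration, here is one way the three picker modes could translate into dispatch payloads. Only the assigned_to field name comes from the text above. The prompt key, the idea of one payload per fan-out target, and the build_dispatches helper are assumptions made for this sketch.

```python
def build_dispatches(prompt, targets=None):
    """Build one payload per target; `targets` is None for auto or a list of worker names."""
    if not targets:
        # Auto mode: no assigned_to field, so the router picks the worker.
        return [{"prompt": prompt}]
    # Pinned or fan-out mode: one payload per selected worker (assumed shape).
    return [{"prompt": prompt, "assigned_to": name} for name in targets]


auto = build_dispatches("Summarize the changelog")
pinned = build_dispatches("Summarize the changelog", ["laptop"])
fan_out = build_dispatches("Summarize the changelog", ["laptop", "cloud-2"])
print(len(auto), len(pinned), len(fan_out))
```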

See Cost Optimization for strategies built on routing.