Routing

Routing rules determine how jobs are matched to workers. You can route by provider, model, priority, tags, cost tier, or custom logic.

Without routing rules, jobs go to the first available worker that supports the requested provider.

Route by job complexity:

{
  "routing": {
    "tiers": {
      "low": { "provider": "ollama", "model": "llama3.2" },
      "medium": { "provider": "claude", "model": "haiku" },
      "high": { "provider": "claude", "model": "sonnet" }
    }
  }
}
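To illustrate how a tier name maps to a provider and model, here is a minimal Python sketch that reads the config above. The `resolve_tier` helper is hypothetical and not part of the SDK; it only shows the lookup the tier table implies.

```python
import json

# The tier table from the config above, parsed as plain JSON.
CONFIG = json.loads("""
{
  "routing": {
    "tiers": {
      "low": { "provider": "ollama", "model": "llama3.2" },
      "medium": { "provider": "claude", "model": "haiku" },
      "high": { "provider": "claude", "model": "sonnet" }
    }
  }
}
""")


def resolve_tier(config, tier):
    """Return (provider, model) for an effort tier.

    Hypothetical helper for illustration; the real router may resolve
    tiers differently.
    """
    tiers = config["routing"]["tiers"]
    if tier not in tiers:
        raise ValueError(f"unknown tier {tier!r}; expected one of {sorted(tiers)}")
    entry = tiers[tier]
    return entry["provider"], entry["model"]
```

Calling `resolve_tier(CONFIG, "medium")` returns the provider and model configured for the medium tier.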

Set the effort tier in the saddle’s command strip before dispatching. The dropdown next to the prompt input lets you pick low, medium, or high.

Match jobs to workers by tags:

{
  "routing": {
    "rules": [
      { "match": { "tag": "code-review" }, "workers": { "tag": "gpu-server" } },
      { "match": { "tag": "summarize" }, "workers": { "tag": "local" } }
    ]
  }
}
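The rules above can be read as "a job carrying this tag goes to workers carrying that tag". The Python sketch below shows one way such matching could work. It assumes the first matching rule wins and that an unmatched job may go to any worker; neither behavior is confirmed by this page, and `eligible_workers` is a hypothetical helper.

```python
# The rule list from the config above.
RULES = [
    {"match": {"tag": "code-review"}, "workers": {"tag": "gpu-server"}},
    {"match": {"tag": "summarize"}, "workers": {"tag": "local"}},
]


def eligible_workers(job_tags, workers, rules):
    """Return the names of workers allowed to take a job.

    job_tags: set of tags on the job.
    workers:  dict mapping worker name -> set of worker tags.
    Assumption: the first rule whose job tag matches decides the outcome.
    """
    for rule in rules:
        if rule["match"]["tag"] in job_tags:
            wanted = rule["workers"]["tag"]
            return [name for name, tags in workers.items() if wanted in tags]
    # Assumption: with no matching rule, every worker stays eligible.
    return list(workers)
```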

Try providers in order:

{
  "routing": {
    "strategy": "fallback",
    "chain": ["ollama", "claude", "openrouter"]
  }
}
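As a rough illustration of the fallback strategy, the sketch below walks the chain in order and returns the first successful result. `ProviderError`, `dispatch_with_fallback`, and the `send` callback are hypothetical names used for this example only; which failures actually trigger a fallback in the router is not specified here.

```python
class ProviderError(Exception):
    """Hypothetical error raised when a provider cannot serve a job."""


def dispatch_with_fallback(prompt, chain, send):
    """Try each provider in `chain` in order; return (provider, result).

    `send(provider, prompt)` is a caller-supplied function that either
    returns a result or raises ProviderError.
    """
    errors = {}
    for provider in chain:
        try:
            return provider, send(provider, prompt)
        except ProviderError as exc:
            # Remember why this provider failed, then move down the chain.
            errors[provider] = str(exc)
    raise ProviderError(f"all providers failed: {errors}")
```

With the chain `["ollama", "claude", "openrouter"]`, a failure on `ollama` moves the job to `claude`, and `openrouter` is only tried if both fail.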

Enforce spend limits with automatic downgrade to local:

{
  "routing": {
    "budget": {
      "weekly_limit_usd": 5.00,
      "over_budget_provider": "ollama"
    }
  }
}
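The downgrade decision can be sketched as a single comparison. This Python example assumes the router tracks spend for the current week and switches once spend reaches the limit; whether the real comparison is "reaches" or "exceeds" is not stated here, and `pick_provider` is a hypothetical helper.

```python
# The budget block from the config above.
BUDGET = {"weekly_limit_usd": 5.00, "over_budget_provider": "ollama"}


def pick_provider(requested, spent_this_week_usd, budget):
    """Return the provider to use given the spend so far this week.

    Assumption: the downgrade applies as soon as spend reaches the limit.
    """
    if spent_this_week_usd >= budget["weekly_limit_usd"]:
        return budget["over_budget_provider"]
    return requested
```

For example, a job that requests `claude` after 5.25 USD of weekly spend would be sent to `ollama` instead.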

When a cloud worker hits a provider’s rate limit or session cap (e.g., Claude’s “You’ve hit your limit” message), the router automatically:

  1. Marks that worker as capped for 60 seconds
  2. Routes the next dispatch to a different worker
  3. Retries the capped worker after the cooldown

No manual intervention is required. The SDK detects the cap message in the worker’s stdout and classifies the failure as rate_limited.
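The cooldown behavior described above can be sketched as a small tracker. The 60-second cooldown and the rate_limited classification come from this page; the `CapTracker` class, its methods, and the substring used to spot the cap message are assumptions made for illustration, not the SDK's actual detection logic.

```python
import time

# Assumption: match on a fragment of the example message quoted above.
CAP_MARKER = "hit your limit"
COOLDOWN_SECONDS = 60


class CapTracker:
    """Hypothetical tracker for workers that hit a provider cap."""

    def __init__(self, cooldown=COOLDOWN_SECONDS, clock=time.monotonic):
        self._cooldown = cooldown
        self._clock = clock          # injectable so the logic can be tested
        self._capped_until = {}      # worker name -> time the cooldown ends

    def observe(self, worker, stdout):
        """Inspect worker stdout; return "rate_limited" if a cap is seen."""
        if CAP_MARKER in stdout:
            self._capped_until[worker] = self._clock() + self._cooldown
            return "rate_limited"
        return None

    def available(self, workers):
        """Return the workers that are not currently cooling down."""
        now = self._clock()
        return [
            w for w in workers
            if w not in self._capped_until or self._capped_until[w] <= now
        ]
```

A router built on this would call `observe` after each job and pick the next target from `available`, so a capped worker becomes eligible again once its cooldown has passed.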

The saddle’s target picker controls which workers receive a dispatch:

  • Click any worker to pin all dispatches to that worker
  • Click auto to let the router pick (default)
  • Select two or more workers to fan out the same prompt to all of them

This is useful for testing new workers, A/B comparisons, and debugging routing issues. Under the hood, it adds assigned_to to the dispatch payload.
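To make the picker's effect on the payload concrete, here is a hedged Python sketch. Only the `assigned_to` field name comes from this page; the `prompt` field, the overall payload shape, and the choice to emit one payload per worker when fanning out are assumptions, and `build_dispatches` is a hypothetical helper.

```python
def build_dispatches(prompt, targets):
    """Build dispatch payloads for a prompt.

    targets: list of pinned worker names; an empty list means "auto".
    Assumption: fan-out is modeled as one payload per selected worker.
    """
    if not targets:
        # Auto mode: no assigned_to, so the router picks the worker.
        return [{"prompt": prompt}]
    return [{"prompt": prompt, "assigned_to": worker} for worker in targets]
```

Under these assumptions, pinning to a single worker yields one payload with `assigned_to` set, while selecting two workers yields two payloads with the same prompt.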

See Cost Optimization for strategies built on routing.