Routing
Routing rules determine how jobs are matched to workers. You can route by provider, model, priority, tags, cost tier, or custom logic.
Default behavior
Without routing rules, jobs go to the first available worker that supports the requested provider.
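The default rule can be pictured as a first-match scan. The following is a minimal sketch, not the router's actual code: the `Worker` class, its `providers` and `busy` fields, and the `pick_default` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    # Hypothetical worker record; the real worker schema is not shown in this page.
    name: str
    providers: set[str]
    busy: bool = False


def pick_default(workers: list[Worker], provider: str) -> Optional[Worker]:
    """Return the first idle worker that supports the provider, or None."""
    for worker in workers:
        if not worker.busy and provider in worker.providers:
            return worker
    return None


workers = [
    Worker("laptop", {"ollama"}),
    Worker("cloud-a", {"claude"}, busy=True),
    Worker("cloud-b", {"claude", "openrouter"}),
]
chosen = pick_default(workers, "claude")
```

Here `cloud-a` supports the provider but is busy, so the scan continues and settles on `cloud-b`.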
Tiered routing
Route by job complexity:

    {
      "routing": {
        "tiers": {
          "low": { "provider": "ollama", "model": "llama3.2" },
          "medium": { "provider": "claude", "model": "haiku" },
          "high": { "provider": "claude", "model": "sonnet" }
        }
      }
    }

Set the effort tier in the saddle’s command strip before dispatching. The dropdown next to the prompt input lets you pick low, medium, or high.
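To make the tier lookup concrete, here is a small sketch of how a tier name could be resolved to a provider and model from the config above. The `resolve_tier` helper is hypothetical, and rejecting unknown tiers with an error is an assumption; the page does not say how the router handles them.

```python
import json

# The tier config from this page, parsed as JSON.
CONFIG = json.loads("""
{
  "routing": {
    "tiers": {
      "low": { "provider": "ollama", "model": "llama3.2" },
      "medium": { "provider": "claude", "model": "haiku" },
      "high": { "provider": "claude", "model": "sonnet" }
    }
  }
}
""")


def resolve_tier(config: dict, tier: str) -> tuple[str, str]:
    """Map an effort tier to its (provider, model) pair."""
    tiers = config["routing"]["tiers"]
    if tier not in tiers:
        # Assumption: unknown tiers are rejected rather than silently defaulted.
        raise ValueError(f"unknown tier: {tier!r}")
    entry = tiers[tier]
    return entry["provider"], entry["model"]


provider, model = resolve_tier(CONFIG, "medium")
```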
Tag-based routing
Match jobs to workers by tags:

    {
      "routing": {
        "rules": [
          { "match": { "tag": "code-review" }, "workers": { "tag": "gpu-server" } },
          { "match": { "tag": "summarize" }, "workers": { "tag": "local" } }
        ]
      }
    }

Fallback chains
Try providers in order:

    {
      "routing": {
        "strategy": "fallback",
        "chain": ["ollama", "claude", "openrouter"]
      }
    }

Budget routing
Enforce spend limits with automatic downgrade to local:

    {
      "routing": {
        "budget": {
          "weekly_limit_usd": 5.00,
          "over_budget_provider": "ollama"
        }
      }
    }

Automatic cap detection
When a cloud worker hits a provider’s rate limit or session cap (e.g., Claude’s “You’ve hit your limit” message), the router automatically:
- Marks that worker as capped for 60 seconds
- Routes the next dispatch to a different worker
- Retries the capped worker after the cooldown
No manual intervention. The SDK detects the cap message in the worker’s stdout and classifies the failure as rate_limited.
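The cooldown behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's implementation: `classify_failure`, `CapTracker`, and the substring match on the cap message are hypothetical. Only the 60-second cooldown, the `rate_limited` label, and the example message come from this page.

```python
import time
from typing import Callable, Optional

CAP_COOLDOWN_SECONDS = 60.0  # cooldown stated in this page

# Assumption: detection is a plain substring match on the example message,
# accepting both straight and curly apostrophes.
CAP_MARKERS = ("You've hit your limit", "You\u2019ve hit your limit")


def classify_failure(stdout: str) -> Optional[str]:
    """Return "rate_limited" if the worker output contains a cap message."""
    if any(marker in stdout for marker in CAP_MARKERS):
        return "rate_limited"
    return None


class CapTracker:
    """Tracks which workers are cooling down after hitting a cap."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._capped_until: dict[str, float] = {}

    def mark_capped(self, worker: str) -> None:
        self._capped_until[worker] = self._clock() + CAP_COOLDOWN_SECONDS

    def is_capped(self, worker: str) -> bool:
        return self._clock() < self._capped_until.get(worker, float("-inf"))

    def pick(self, workers: list[str]) -> Optional[str]:
        """Return the first worker that is not cooling down, or None."""
        for worker in workers:
            if not self.is_capped(worker):
                return worker
        return None
```

Passing the clock in as a parameter keeps the cooldown testable without waiting a real minute. Once the cooldown has elapsed, `pick` considers the capped worker again, which mirrors the retry step in the list above.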
Direct worker targeting
The saddle’s target picker lets you pin dispatches to a specific worker:
- Click any worker to pin all dispatches to that worker
- Click auto to let the router pick (default)
- Select two or more workers to fan out the same prompt to all of them
This is useful for testing new workers, A/B comparisons, and debugging routing issues. Under the hood, it adds assigned_to to the dispatch payload.
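As a rough illustration of how pinning and fan-out might translate into payloads, the sketch below builds one payload per selected worker. Only the `assigned_to` field name comes from this page. The `prompt` key, the `build_payloads` helper, and the choice to emit one payload per target (rather than a single payload with a list of workers) are assumptions.

```python
from typing import Optional


def build_payloads(prompt: str, targets: Optional[list[str]] = None) -> list[dict]:
    """Build dispatch payloads for auto, pinned, or fan-out targeting."""
    if not targets:
        # "auto": no assigned_to, so the router picks the worker.
        return [{"prompt": prompt}]
    # Pinned (one target) or fan-out (two or more): one payload per worker.
    return [{"prompt": prompt, "assigned_to": worker} for worker in targets]


auto = build_payloads("Summarize the changelog")
pinned = build_payloads("Summarize the changelog", ["gpu-server-1"])
fan_out = build_payloads("Summarize the changelog", ["gpu-server-1", "laptop"])
```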
See Cost Optimization for strategies built on routing.