Effort Tiers Explained
ModelReins has one dial: effort tier. You set how hard the job needs to work and the router picks everything else.
Five tiers. Each one is a complete plan: model preference, retry policy, quality gate, fan-out, urgency.
The five tiers
| Tier | Model | Retries | Quality gate | Fan-out | Urgency | Use when |
|---|---|---|---|---|---|---|
| trivial | local | 0 | off | 1 | batch | “What’s 2+2?”, quick lookups |
| quick | haiku-class | 1 | off | 1 | normal | one-off code questions |
| standard | sonnet-class | 2 | single review | 1 | normal | daily-driver work |
| deep | opus-class | 3 | multi-pass review | 1 | normal | architecture, refactors |
| critical | opus-class | 3 | review + human gate | 1 | urgent | production deploys |
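As a rough illustration, the table above can be read as a lookup from tier name to a complete plan. The sketch below is not ModelReins code: `TierPlan`, `TIER_PLANS`, and `plan_for` are hypothetical names, and only the values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPlan:
    """One row of the tier table (hypothetical structure, values from the docs)."""
    model: str
    retries: int
    quality_gate: str
    fan_out: int
    urgency: str


# Hypothetical lookup table mirroring the five documented tiers.
TIER_PLANS = {
    "trivial": TierPlan("local", 0, "off", 1, "batch"),
    "quick": TierPlan("haiku-class", 1, "off", 1, "normal"),
    "standard": TierPlan("sonnet-class", 2, "single review", 1, "normal"),
    "deep": TierPlan("opus-class", 3, "multi-pass review", 1, "normal"),
    "critical": TierPlan("opus-class", 3, "review + human gate", 1, "urgent"),
}


def plan_for(tier: str) -> TierPlan:
    """Return the plan for a tier name, rejecting anything not in the table."""
    try:
        return TIER_PLANS[tier]
    except KeyError:
        raise ValueError(
            f"unknown tier {tier!r}; expected one of {list(TIER_PLANS)}"
        ) from None
```

Keeping every field in one record is what lets a single dial stand in for five separate settings.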
Trivial — when speed matters more than quality
What it does:
- Routes to the smallest local model available (Ollama, LM Studio)
- No retries
- No quality gate
- Batch urgency (waits for available capacity)
When to use:
- Math questions
- Translations
- Quick lookups
- “Yes or no” type answers
- Anything where being slightly wrong is fine
What you shouldn’t expect:
- Code review
- Nuanced reasoning
- Complex multi-step tasks
Cost: essentially zero. Local model, no cloud cost.
Quick — single-pass cloud
What it does:
- Routes to a haiku-class cloud model
- One retry on failure
- No quality gate
- Normal urgency
When to use:
- Quick code questions
- “Explain this function”
- Boilerplate generation
- Drafts that you’ll refine yourself
Cost: very low. Haiku-class is the cheapest cloud tier.
Standard — the daily driver
What it does:
- Routes to a sonnet-class model
- Up to 2 retries on failure
- Single quality-gate review — after the worker finishes, a reviewer worker checks the output and either passes it or sends it back for one more attempt with feedback
- Normal urgency
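One way to picture how retries and the single review fit together is the sketch below. It rests on assumptions the docs do not confirm: `worker` and `reviewer` are hypothetical callables, a reviewer return value of `None` means "pass", the retry budget applies separately to each attempt, and the second attempt is not reviewed again.

```python
from typing import Callable, Optional


def run_standard(
    worker: Callable[[str, Optional[str]], str],
    reviewer: Callable[[str], Optional[str]],
    task: str,
    max_retries: int = 2,
) -> str:
    """Run a task with retries on failure, then one review pass (illustrative only)."""

    def attempt(feedback: Optional[str]) -> str:
        # Retries cover worker failures (exceptions), not review rejections.
        last_error: Optional[Exception] = None
        for _ in range(max_retries + 1):
            try:
                return worker(task, feedback)
            except Exception as exc:
                last_error = exc
        raise RuntimeError(
            f"worker failed after {max_retries + 1} attempts"
        ) from last_error

    output = attempt(None)
    feedback = reviewer(output)  # assumption: None means the reviewer passed it
    if feedback is None:
        return output
    # Single review: exactly one more attempt, this time with the feedback.
    return attempt(feedback)
```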
When to use:
- Most coding work
- Refactoring
- Bug fixes
- Documentation
- The default 90% of the time
Cost: moderate. Includes the review pass.
Deep — for hard problems
What it does:
- Routes to an opus-class model (best available)
- Up to 3 retries
- Multi-pass quality gate — the reviewer can send the job back multiple times, each time with cumulative feedback
- Normal urgency
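The "cumulative feedback" part of the multi-pass gate can be sketched as a loop in which every review note is appended to a log and the whole log is handed back to the worker. Everything here is hypothetical: the callables, the `None`-means-pass convention, and especially `max_passes`, since the docs do not state how many times the reviewer may send a job back.

```python
from typing import Callable, List, Optional, Tuple


def multi_pass_review(
    worker: Callable[[str, List[str]], str],
    reviewer: Callable[[str], Optional[str]],
    task: str,
    max_passes: int = 3,  # assumed cap; not specified in the docs
) -> Tuple[str, List[str]]:
    """Loop worker and reviewer, feeding back all notes so far (illustrative only)."""
    feedback_log: List[str] = []
    output = worker(task, [])
    for _ in range(max_passes):
        note = reviewer(output)
        if note is None:  # reviewer passed the output
            return output, feedback_log
        feedback_log.append(note)
        # The worker sees every note so far, not just the latest one.
        output = worker(task, list(feedback_log))
    # Review budget exhausted: return the last attempt and the full log.
    return output, feedback_log
```

Passing the full log rather than only the latest note is the point of the design: it keeps a later attempt from reintroducing a problem that an earlier review already flagged.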
When to use:
- Architecture decisions
- Complex refactors with cross-cutting concerns
- Anything you’d normally bring to a senior engineer
- Tasks where “almost right” isn’t good enough
Cost: higher. Best model + multiple passes.
Critical — production-grade with human approval
What it does:
- Routes to an opus-class model
- Up to 3 retries
- Multi-pass quality gate
- Human approval gate — the job pauses after the AI work is done and waits for you to approve or deny via the saddle
- Urgent priority
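The human approval gate behaves like a small state machine: the finished job sits in a waiting state until someone approves or denies it. The sketch below is a minimal model of that idea. The state names and the `CriticalJob` class are assumptions for illustration, not the actual ModelReins job schema.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class CriticalJob:
    """A finished critical-tier job waiting on a human decision (hypothetical)."""
    output: str
    state: JobState = JobState.AWAITING_APPROVAL

    def decide(self, approve: bool) -> None:
        # A decision is final; deciding twice is treated as an error.
        if self.state is not JobState.AWAITING_APPROVAL:
            raise RuntimeError(f"job already {self.state.value}")
        self.state = JobState.APPROVED if approve else JobState.DENIED

    @property
    def may_land(self) -> bool:
        # Only an explicit approval lets the result through.
        return self.state is JobState.APPROVED
```

Note that the default state blocks the result, so a job that nobody looks at never lands.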
When to use:
- Production deploys
- Database migrations
- Anything irreversible
- Anything where you want eyes on the result before it lands
Cost: highest. But the human gate means nothing reaches production without you.
How to switch tiers
Three ways:
- Saddle pills: click trivial · quick · standard · deep · critical
- Keyboard: Ctrl+Shift+] to cycle up, Ctrl+Shift+[ to cycle down
- Status bar: click the effort tier in the bottom right of VS Code
Your choice persists in settings (modelreins.defaultEffort).
Mode is the override
The five modes (auto · cheap · fast · smart · local) are override preferences within the tier:
- auto — let the router decide (default)
- cheap — within this tier, prefer the cheapest model
- fast — within this tier, prefer the fastest worker
- smart — within this tier, prefer the highest-quality worker
- local — within this tier, force a local worker
Example: you’re on deep tier (opus-class) but in cheap mode. The router will pick the cheapest opus-class subscription that has capacity, rather than the highest-quality one.
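The relationship between tier and mode can be sketched as a two-step selection: the tier fixes the pool of eligible workers, and the mode only chooses within that pool. The `Worker` fields and `pick_worker` function below are hypothetical. In particular, how `auto` decides and how `local` interacts with a cloud model class are not described in the docs, so both are placeholders here.

```python
from dataclasses import dataclass
from typing import List, Optional

MODES = ("auto", "cheap", "fast", "smart", "local")


@dataclass(frozen=True)
class Worker:
    """Hypothetical worker record; field names are illustrative."""
    name: str
    model_class: str
    cost: float
    latency_s: float
    quality: float
    is_local: bool
    has_capacity: bool = True


def pick_worker(
    workers: List[Worker], model_class: str, mode: str = "auto"
) -> Optional[Worker]:
    """Pick a worker for the tier's model class, using mode as a tie-breaker."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    # The tier decides the pool: right model class, with capacity.
    pool = [w for w in workers if w.model_class == model_class and w.has_capacity]
    if mode == "local":
        pool = [w for w in pool if w.is_local]  # assumed meaning of "force local"
    if not pool:
        return None
    if mode == "cheap":
        return min(pool, key=lambda w: w.cost)
    if mode == "fast":
        return min(pool, key=lambda w: w.latency_s)
    if mode == "smart":
        return max(pool, key=lambda w: w.quality)
    return pool[0]  # "auto" and "local": placeholder for the router's own choice
```

In the deep-tier, cheap-mode example above, `pick_worker(workers, "opus-class", "cheap")` would return the lowest-cost opus-class worker that still has capacity.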
What the router actually does with your tier
Internally, every dispatch calls tier_to_dispatch_args(tier, mode), which returns a plan. The plan then goes through negotiate_dispatch_plan(), which checks:
- Is the requested model available?
- Are the subscriptions at capacity?
- Do you have a worker with the right tags?
- Should anything be downgraded?
You see the negotiated plan in the dispatch response. If something got downgraded, the response includes a downgrades array explaining what changed and why. Use Preview Plan in the saddle to see what would happen before you actually dispatch.
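To make the downgrade idea concrete, here is a minimal sketch of one negotiation check, model availability, that records what changed and why. It is not the real `negotiate_dispatch_plan()`: the function name `negotiate_model`, the `FALLBACK_ORDER` list, and the shape of each `downgrades` entry are all assumptions.

```python
from typing import Any, Dict, List, Set

# Assumed fallback order, strongest model class first.
FALLBACK_ORDER = ["opus-class", "sonnet-class", "haiku-class", "local"]


def negotiate_model(plan: Dict[str, Any], available: Set[str]) -> Dict[str, Any]:
    """Return a copy of the plan, downgrading the model if it is unavailable."""
    negotiated = dict(plan)
    downgrades: List[Dict[str, str]] = []
    requested = plan["model"]
    if requested not in available:
        # Look only at weaker classes; raises ValueError for an unknown class.
        start = FALLBACK_ORDER.index(requested) + 1
        fallback = next(
            (m for m in FALLBACK_ORDER[start:] if m in available), None
        )
        if fallback is None:
            raise LookupError(f"no available model at or below {requested!r}")
        negotiated["model"] = fallback
        downgrades.append(
            {
                "field": "model",
                "from": requested,
                "to": fallback,
                "reason": "requested model unavailable",
            }
        )
    # An empty list means the plan went through unchanged.
    negotiated["downgrades"] = downgrades
    return negotiated
```

A caller would check whether `downgrades` is empty before trusting that the job ran at the tier it asked for.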