Effort Tiers Explained

ModelReins has one dial: the effort tier. You set how much effort a job deserves, and the router picks everything else.

Five tiers. Each one is a complete plan: model preference, retry policy, quality gate, fan-out, urgency.

The five tiers

Tier	Model	Retries	Quality gate	Fan-out	Urgency	Use when
trivial	local	0	off	1	batch	“What’s 2+2?”, quick lookups
quick	haiku-class	1	off	1	normal	one-off code questions
standard	sonnet-class	2	single review	1	normal	daily-driver work
deep	opus-class	3	multi-pass review	1	normal	architecture, refactors
critical	opus-class	3	multi-pass review + human gate	1	urgent	production deploys
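The table can be read as a lookup from tier name to plan. The Python sketch below shows one way to model that. The TierPlan class, its field names, and the plan_for helper are illustrative assumptions, not the actual ModelReins data model. The gate for the critical tier follows the Critical section further down (multi-pass review plus human approval).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPlan:
    """Illustrative container for one row of the tier table."""
    model: str          # model class the router prefers
    retries: int        # retries allowed on failure
    quality_gate: str   # "off", "single", or "multi-pass"
    human_gate: bool    # True if a person must approve the result
    fan_out: int        # fan-out value from the table
    urgency: str        # "batch", "normal", or "urgent"


# Values copied from the table; the structure itself is an assumption.
TIER_PLANS = {
    "trivial":  TierPlan("local",        0, "off",        False, 1, "batch"),
    "quick":    TierPlan("haiku-class",  1, "off",        False, 1, "normal"),
    "standard": TierPlan("sonnet-class", 2, "single",     False, 1, "normal"),
    "deep":     TierPlan("opus-class",   3, "multi-pass", False, 1, "normal"),
    "critical": TierPlan("opus-class",   3, "multi-pass", True,  1, "urgent"),
}


def plan_for(tier: str) -> TierPlan:
    """Look up the plan for a tier name, rejecting unknown tiers."""
    try:
        return TIER_PLANS[tier]
    except KeyError:
        raise ValueError(f"unknown effort tier: {tier!r}") from None
```

Calling plan_for("standard") returns the sonnet-class plan with 2 retries and a single review; an unknown tier name raises ValueError instead of silently falling back.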

Trivial — when speed matters more than quality

What it does:

Routes to the smallest local model available (Ollama, LM Studio)
No retries
No quality gate
Batch urgency (waits for available capacity)

When to use:

Math questions
Translations
Quick lookups
“Yes or no” type answers
Anything where being slightly wrong is fine

What you shouldn’t expect:

Code review
Nuanced reasoning
Complex multi-step tasks

Cost: essentially zero. Local model, no cloud cost.

Quick — single-pass cloud

What it does:

Routes to a haiku-class cloud model
One retry on failure
No quality gate
Normal urgency

When to use:

Quick code questions
“Explain this function”
Boilerplate generation
Drafts that you’ll refine yourself

Cost: very low. Haiku-class is the cheapest cloud tier.

Standard — the daily driver

What it does:

Routes to a sonnet-class model
Up to 2 retries on failure
Single quality-gate review — after the worker finishes, a reviewer worker checks the output and either passes it or sends it back for one more attempt with feedback
Normal urgency

When to use:

Most coding work
Refactoring
Bug fixes
Documentation
The right default for roughly 90% of work

Cost: moderate. Includes the review pass.

Deep — for hard problems

What it does:

Routes to an opus-class model (best available)
Up to 3 retries
Multi-pass quality gate — the reviewer can send the job back multiple times, each time with cumulative feedback
Normal urgency
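The multi-pass gate described above can be pictured as a loop in which reviewer feedback accumulates across attempts. The Python sketch below is a minimal illustration under stated assumptions: the Worker and Reviewer signatures, the run_with_review function, and the max_send_backs limit are all hypothetical, and this page does not say how many send-backs the real gate allows.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signatures: a worker receives the task plus all feedback so far;
# a reviewer returns None to pass the output, or a feedback note to reject it.
Worker = Callable[[str, List[str]], str]
Reviewer = Callable[[str, str], Optional[str]]


def run_with_review(task: str, worker: Worker, reviewer: Reviewer,
                    max_send_backs: int) -> Tuple[str, bool, List[str]]:
    """Run the worker, review the output, and send it back with cumulative
    feedback until it passes or the send-back limit is reached.

    Returns (last_output, passed, feedback_history).
    """
    feedback: List[str] = []
    output = worker(task, list(feedback))
    send_backs = 0
    while True:
        note = reviewer(task, output)
        if note is None:
            return output, True, feedback
        feedback.append(note)  # feedback accumulates across passes
        if send_backs == max_send_backs:
            return output, False, feedback
        send_backs += 1
        output = worker(task, list(feedback))


# Toy worker and reviewer, used only to exercise the loop.
def demo_worker(task: str, feedback: List[str]) -> str:
    return f"{task} (revision {len(feedback)})"


def demo_reviewer(task: str, output: str) -> Optional[str]:
    return None if output.endswith("(revision 2)") else "tighten error handling"
```

With max_send_backs=1 the same loop behaves like the single review on the standard tier: one rejection, one more attempt with feedback, then a final verdict.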

When to use:

Architecture decisions
Complex refactors with cross-cutting concerns
Anything you’d normally bring to a senior engineer
Tasks where “almost right” isn’t good enough

Cost: higher. Best model + multiple passes.

Critical — production-grade with human approval

What it does:

Routes to an opus-class model
Up to 3 retries
Multi-pass quality gate
Human approval gate — the job pauses after the AI work is done and waits for you to approve or deny via the saddle
Urgent priority
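The human approval gate amounts to a small state machine: the finished job waits, and only an explicit approval releases the result. The Python sketch below illustrates that idea. JobState, GatedJob, and their methods are hypothetical names chosen for this example, not part of the ModelReins API.

```python
from enum import Enum


class JobState(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    DENIED = "denied"


class GatedJob:
    """Holds a finished result until a person approves or denies it."""

    def __init__(self, result: str) -> None:
        self.result = result
        self.state = JobState.AWAITING_APPROVAL  # AI work done, paused here

    def decide(self, approve: bool) -> None:
        """Record the human decision; a job can only be decided once."""
        if self.state is not JobState.AWAITING_APPROVAL:
            raise RuntimeError("job has already been decided")
        self.state = JobState.APPROVED if approve else JobState.DENIED

    def release(self) -> str:
        """Return the result, but only after approval."""
        if self.state is not JobState.APPROVED:
            raise PermissionError(
                f"cannot release job in state {self.state.value}")
        return self.result
```

The point of the design is that release fails by default: forgetting to approve leaves the job paused rather than letting it through.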

When to use:

Production deploys
Database migrations
Anything irreversible
Anything where you want eyes on the result before it lands

Cost: highest. In exchange, the human gate means nothing reaches production without your approval.

How to switch tiers

Three ways:

Saddle pills — click trivial · quick · standard · deep · critical
Keyboard — Ctrl+Shift+] to cycle up, Ctrl+Shift+[ to cycle down
Status bar — click the effort tier in the bottom right of VS Code

Your choice persists in settings (modelreins.defaultEffort).

Mode is the override

The five modes (auto · cheap · fast · smart · local) override the router’s preferences within the current tier:

auto — let the router decide (default)
cheap — within this tier, prefer the cheapest model
fast — within this tier, prefer the fastest worker
smart — within this tier, prefer the highest-quality worker
local — within this tier, force a local worker

Example: you’re on deep tier (opus-class) but in cheap mode. The router will pick the cheapest opus-class subscription that has capacity, rather than the highest-quality one.
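One way to picture mode selection is as a choice among workers that already match the tier. The Python sketch below assumes a hypothetical Worker record with cost, latency, and quality scores; none of these fields or the pick_worker function come from ModelReins. The behavior of auto is an explicit guess (favor quality, then cost), since this page only says the router decides.

```python
from dataclasses import dataclass
from typing import List

MODES = ("auto", "cheap", "fast", "smart", "local")


@dataclass(frozen=True)
class Worker:
    name: str
    cost: float      # relative cost per job (lower is cheaper)
    latency: float   # expected seconds to a result (lower is faster)
    quality: float   # relative quality score (higher is better)
    is_local: bool
    has_capacity: bool = True


def pick_worker(candidates: List[Worker], mode: str = "auto") -> Worker:
    """Pick one worker from candidates that already match the tier."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    available = [w for w in candidates if w.has_capacity]
    if mode == "local":
        available = [w for w in available if w.is_local]
    if not available:
        raise LookupError(f"no worker with capacity for mode {mode!r}")
    if mode == "cheap":
        return min(available, key=lambda w: w.cost)
    if mode == "fast":
        return min(available, key=lambda w: w.latency)
    if mode == "smart":
        return max(available, key=lambda w: w.quality)
    # "auto" and "local": assumed default of best quality, then lowest cost.
    return max(available, key=lambda w: (w.quality, -w.cost))
```

In this sketch, a worker without capacity is skipped before the mode preference is applied, which mirrors the example above: cheap mode picks the cheapest option that has capacity.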

What the router actually does with your tier

Internally, every dispatch calls tier_to_dispatch_args(tier, mode), which returns a plan. The plan then goes through negotiate_dispatch_plan(), which checks:

Is the requested model available?
Are the subscriptions at capacity?
Do you have a worker with the right tags?
Should anything be downgraded?

You see the negotiated plan in the dispatch response. If something got downgraded, the response includes a downgrades array explaining what changed and why. Use Preview Plan in the saddle to see what would happen before you actually dispatch.
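To make the downgrade idea concrete, the Python sketch below walks a requested model down a fallback ladder and records each step in a downgrades list. It is not the implementation of negotiate_dispatch_plan(). The function name negotiate_sketch, the ladder order, and the shape of each downgrade entry are assumptions made for illustration, and the worker-tag check is left out.

```python
from typing import Dict, List, Set

# Assumed fallback order, largest to smallest; the real router may differ.
MODEL_LADDER = ["opus-class", "sonnet-class", "haiku-class", "local"]


def negotiate_sketch(requested_model: str, available: Set[str],
                     at_capacity: Set[str]) -> Dict[str, object]:
    """Return the first usable model at or below the requested one,
    plus a list describing every downgrade taken to get there."""
    if requested_model not in MODEL_LADDER:
        raise ValueError(f"unknown model class: {requested_model!r}")
    ladder = MODEL_LADDER[MODEL_LADDER.index(requested_model):]
    downgrades: List[Dict[str, str]] = []
    for i, candidate in enumerate(ladder):
        if candidate not in available:
            reason = f"{candidate} is not available"
        elif candidate in at_capacity:
            reason = f"{candidate} subscriptions are at capacity"
        else:
            return {"model": candidate, "downgrades": downgrades}
        if i + 1 == len(ladder):
            break  # nothing left to fall back to
        downgrades.append({
            "field": "model",
            "from": candidate,
            "to": ladder[i + 1],
            "reason": reason,
        })
    raise LookupError("no usable model at or below the requested class")
```

An empty downgrades list means the plan was accepted as requested; each entry otherwise says what changed and why, which is the information a plan preview would need to show.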