Effort Tiers Explained

ModelReins has one dial: effort tier. You set how hard the job needs to work and the router picks everything else.

Five tiers. Each one is a complete plan: model preference, retry policy, quality gate, fan-out, urgency.

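| Tier     | Model        | Retries | Quality gate        | Fan-out | Urgency | Use when                     |
|----------|--------------|---------|---------------------|---------|---------|------------------------------|
| trivial  | local        | 0       | off                 | 1       | batch   | “What’s 2+2?”, quick lookups |
| quick    | haiku-class  | 1       | off                 | 1       | normal  | one-off code questions       |
| standard | sonnet-class | 2       | single review       | 1       | normal  | daily-driver work            |
| deep     | opus-class   | 3       | multi-pass review   | 1       | normal  | architecture, refactors      |
| critical | opus-class   | 3       | review + human gate | 1       | urgent  | production deploys           |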
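One way to read the table above is as a lookup from tier name to a complete plan. The sketch below is illustrative only: `TierPlan`, `TIER_PLANS`, and `plan_for` are hypothetical names, not part of ModelReins, and only the values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPlan:
    """Hypothetical record for one row of the tier table."""
    model: str         # model class the router prefers
    retries: int       # retries allowed on failure
    quality_gate: str  # review policy applied to the output
    fan_out: int       # number of workers the job fans out to
    urgency: str       # queue priority


# Values copied from the table; the data structure itself is an assumption.
TIER_PLANS = {
    "trivial":  TierPlan("local",        0, "off",                 1, "batch"),
    "quick":    TierPlan("haiku-class",  1, "off",                 1, "normal"),
    "standard": TierPlan("sonnet-class", 2, "single review",       1, "normal"),
    "deep":     TierPlan("opus-class",   3, "multi-pass review",   1, "normal"),
    "critical": TierPlan("opus-class",   3, "review + human gate", 1, "urgent"),
}


def plan_for(tier: str) -> TierPlan:
    """Look up the plan for a tier, rejecting unknown tier names."""
    try:
        return TIER_PLANS[tier]
    except KeyError:
        raise ValueError(f"unknown effort tier: {tier!r}") from None


print(plan_for("standard"))
```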

Trivial — when speed matters more than quality

What it does:

  • Routes to the smallest local model available (Ollama, LM Studio)
  • No retries
  • No quality gate
  • Batch urgency (waits for available capacity)

When to use:

  • Math questions
  • Translations
  • Quick lookups
  • “Yes or no” type answers
  • Anything where being slightly wrong is fine

What you shouldn’t expect:

  • Code review
  • Nuanced reasoning
  • Complex multi-step tasks

Cost: essentially zero. Local model, no cloud cost.

Quick

What it does:

  • Routes to a haiku-class cloud model
  • One retry on failure
  • No quality gate
  • Normal urgency

When to use:

  • Quick code questions
  • “Explain this function”
  • Boilerplate generation
  • Drafts that you’ll refine yourself

Cost: very low. Haiku-class is the cheapest cloud tier.

Standard

What it does:

  • Routes to a sonnet-class model
  • Up to 2 retries on failure
  • Single quality-gate review — after the worker finishes, a reviewer worker checks the output and either passes it or sends it back for one more attempt with feedback
  • Normal urgency
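The single review described above can be pictured as a short control flow. This is a minimal sketch, not the ModelReins implementation: the `Worker` and `Reviewer` signatures and the function name are hypothetical, and it assumes the second attempt is returned without a further review, which the description does not spell out.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: a worker maps (task, feedback) to an output,
# and a reviewer maps an output to (passed, feedback).
Worker = Callable[[str, Optional[str]], str]
Reviewer = Callable[[str], Tuple[bool, str]]


def run_with_single_review(task: str, worker: Worker, reviewer: Reviewer) -> str:
    """Run the worker, review once, and allow one more attempt with feedback."""
    output = worker(task, None)
    passed, feedback = reviewer(output)
    if passed:
        return output
    # The reviewer sent the job back: one further attempt, given its feedback.
    # Assumption: this second attempt is not reviewed again.
    return worker(task, feedback)
```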

When to use:

  • Most coding work
  • Refactoring
  • Bug fixes
  • Documentation
  • The default 90% of the time

Cost: moderate. Includes the review pass.

Deep

What it does:

  • Routes to an opus-class model (best available)
  • Up to 3 retries
  • Multi-pass quality gate — the reviewer can send the job back multiple times, each time with cumulative feedback
  • Normal urgency
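The multi-pass gate differs from a single review in that feedback accumulates across passes. A minimal sketch under stated assumptions: the function name, the callable signatures, and the `max_passes` cap are all hypothetical (the description above says "multiple times" without giving a limit), and returning the last attempt when the cap is hit is also an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: the worker sees every piece of feedback so far.
Worker = Callable[[str, List[str]], str]
Reviewer = Callable[[str], Tuple[bool, str]]


def run_with_multi_pass_review(
    task: str,
    worker: Worker,
    reviewer: Reviewer,
    max_passes: int = 3,  # assumed cap, not a documented value
) -> Tuple[str, List[str]]:
    """Review up to max_passes times, feeding cumulative feedback back in."""
    feedback_log: List[str] = []
    output = worker(task, [])
    for _ in range(max_passes):
        passed, feedback = reviewer(output)
        if passed:
            break
        feedback_log.append(feedback)
        # Pass a copy so the worker cannot mutate the running log.
        output = worker(task, list(feedback_log))
    return output, feedback_log
```

Passing the whole log rather than only the latest note is what makes the feedback cumulative: a later attempt can still see issues raised in the first pass.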

When to use:

  • Architecture decisions
  • Complex refactors with cross-cutting concerns
  • Anything you’d normally bring to a senior engineer
  • Tasks where “almost right” isn’t good enough

Cost: higher. Best model + multiple passes.

Critical — production-grade with human approval

What it does:

  • Routes to an opus-class model
  • Up to 3 retries
  • Multi-pass quality gate
  • Human approval gate — the job pauses after the AI work is done and waits for you to approve or deny via the saddle
  • Urgent priority
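The human approval gate amounts to a job that stays paused until someone decides on it. The sketch below models that as a small state machine; `JobState`, `CriticalJob`, `decide`, and `may_land` are hypothetical names chosen for illustration, not ModelReins types.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class CriticalJob:
    """Hypothetical job record: AI work is done, a human decision is pending."""
    output: str
    state: JobState = JobState.AWAITING_APPROVAL

    def decide(self, approve: bool) -> None:
        """Record the human decision; a job can only be decided once."""
        if self.state is not JobState.AWAITING_APPROVAL:
            raise RuntimeError("job has already been decided")
        self.state = JobState.APPROVED if approve else JobState.DENIED

    @property
    def may_land(self) -> bool:
        """Only an explicitly approved job is allowed to proceed."""
        return self.state is JobState.APPROVED
```

The default state is the paused one, so forgetting to call `decide` can never let a result through.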

When to use:

  • Production deploys
  • Database migrations
  • Anything irreversible
  • Anything where you want eyes on the result before it lands

Cost: highest. But the human gate means nothing reaches production without you.

There are three ways to set the effort tier:

  1. Saddle pills — click trivial · quick · standard · deep · critical
  2. Keyboard: Ctrl+Shift+] to cycle up, Ctrl+Shift+[ to cycle down
  3. Status bar — click the effort tier in the bottom right of VSCode

Your choice persists in settings (modelreins.defaultEffort).

The five modes (auto · cheap · fast · smart · local) are override preferences within the tier:

  • auto — let the router decide (default)
  • cheap — within this tier, prefer the cheapest model
  • fast — within this tier, prefer the fastest worker
  • smart — within this tier, prefer the highest-quality worker
  • local — within this tier, force a local worker

Example: you’re on deep tier (opus-class) but in cheap mode. The router will pick the cheapest opus-class subscription that has capacity, rather than the highest-quality one.
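To make the example concrete, here is a rough sketch of how a mode could narrow the choice among workers that already match the tier. Everything in it is an assumption for illustration: the `Worker` fields, the `pick_worker` function, the sample numbers, and the choice to treat `auto` as "take the first available worker" as a stand-in for the router's own logic.

```python
from dataclasses import dataclass
from typing import List

MODES = ("auto", "cheap", "fast", "smart", "local")


@dataclass(frozen=True)
class Worker:
    """Hypothetical view of a worker that already matches the tier."""
    name: str
    cost: float        # relative cost per job (illustrative units)
    latency_s: float   # typical latency in seconds
    quality: float     # relative quality score, higher is better
    is_local: bool
    has_capacity: bool = True


def pick_worker(candidates: List[Worker], mode: str = "auto") -> Worker:
    """Apply a mode preference to workers within the current tier."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    available = [w for w in candidates if w.has_capacity]
    if mode == "local":
        available = [w for w in available if w.is_local]
    if not available:
        raise LookupError(f"no worker available for mode {mode!r}")
    if mode == "cheap":
        return min(available, key=lambda w: w.cost)
    if mode == "fast":
        return min(available, key=lambda w: w.latency_s)
    if mode == "smart":
        return max(available, key=lambda w: w.quality)
    # "auto" and "local": placeholder for the router's own decision.
    return available[0]


# Two made-up opus-class subscriptions, as in the deep + cheap example.
workers = [
    Worker("opus-sub-a", cost=15.0, latency_s=9.0, quality=0.97, is_local=False),
    Worker("opus-sub-b", cost=12.0, latency_s=12.0, quality=0.95, is_local=False),
]
print(pick_worker(workers, "cheap").name)  # opus-sub-b
```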

What the router actually does with your tier

Internally, every dispatch calls tier_to_dispatch_args(tier, mode), which returns a plan. The plan then goes through negotiate_dispatch_plan(), which checks:

  • Is the requested model available?
  • Are the subscriptions at capacity?
  • Do you have a worker with the right tags?
  • Should anything be downgraded?

You see the negotiated plan in the dispatch response. If something got downgraded, the response includes a downgrades array explaining what changed and why. Use Preview Plan in the saddle to see what would happen before you actually dispatch.
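As a rough illustration of what a negotiated plan with a downgrades array might look like, the sketch below handles only the first check (model availability). It is not the real negotiate_dispatch_plan(): the function name `negotiate_sketch`, the fallback order in `MODEL_LADDER`, and the shape of each downgrade entry are all assumptions.

```python
from typing import Any, Dict, List, Set

# Assumed fallback order, from most to least capable model class.
MODEL_LADDER = ["opus-class", "sonnet-class", "haiku-class", "local"]


def negotiate_sketch(requested_model: str, available_models: Set[str]) -> Dict[str, Any]:
    """Pick the requested model, or the next one down that is available."""
    start = MODEL_LADDER.index(requested_model)  # ValueError if unknown
    chosen = next(
        (m for m in MODEL_LADDER[start:] if m in available_models),
        None,
    )
    if chosen is None:
        raise LookupError("no model at or below the requested class is available")

    downgrades: List[Dict[str, str]] = []
    if chosen != requested_model:
        # Hypothetical entry shape: what changed, and why.
        downgrades.append({
            "field": "model",
            "from": requested_model,
            "to": chosen,
            "reason": f"{requested_model} is not available",
        })
    return {"model": chosen, "downgrades": downgrades}


plan = negotiate_sketch("opus-class", {"sonnet-class", "local"})
for change in plan["downgrades"]:
    print(f"{change['field']}: {change['from']} -> {change['to']} ({change['reason']})")
```

A caller would check whether `downgrades` is empty before dispatching, which is the same information Preview Plan surfaces in the saddle.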