Anthropic is basically saying, “You can get Opus-ish performance for a lot of day-to-day work, without paying Opus-ish money.” Big claim. They’re calling Claude Sonnet 4.6 a “full upgrade” across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. It also ships with a 1M token context window (beta), while keeping Sonnet pricing unchanged from Sonnet 4.5: $3 / $15 per million input/output tokens.
Source: [Anthropic announcement]
So what is Claude Sonnet 4.6, really?
Claude Sonnet 4.6 is Anthropic’s most capable Sonnet model so far, aimed at stronger coding and agent-style workflows without nudging you into Opus pricing.
Stuff you’ll want to know up front:
- Anthropic says it’s the default model for Free + Pro users in claude.ai and Claude Cowork.
- Pricing stays at $3/$15 per million tokens for input/output, same as Sonnet 4.5.
- Context goes up to 1M tokens in beta. Simon Willison notes both Opus and Sonnet often default to 200k max input, but can stretch higher in beta, usually at higher cost.
Source: [Simon Willison]. Simon also reports a “reliable knowledge cutoff” of August 2025 for Sonnet 4.6.
Why developers should care
The headline isn’t “new model shipped.” The real headline is, “More of the everyday dev grind might fit comfortably on the cheaper tier, and not feel like a compromise.”
Anthropic explicitly says the kind of work you used to reach for an Opus-class model for is now often doable on Sonnet 4.6. They point to the unglamorous time-sinks: office docs, forms, spreadsheets, small-but-irritating codebase changes, and those multi-step workflows that hop across tools.
Source: [Anthropic announcement]
And honestly? That “economically boring” stuff is most of real life on a team.
Also, if you’re already paying for a stack of AI subscriptions, this is the kind of release that makes you re-open the spreadsheet and ask, “Do we still need all of these?” I wrote about that tradeoff here: [open-source LLMs vs $200 AI plans]
Coding upgrades: what actually got better
Anthropic is really leaning into coding improvements this time. The theme is pretty clear: less duplicated code, better instruction-following, fewer “yep, done” moments where the model sounds confident but didn’t actually finish the job.
Two numbers from their evals stand out:
- In Claude Code testing, users preferred Sonnet 4.6 over Sonnet 4.5 about 70% of the time.
- Users preferred Sonnet 4.6 over Opus 4.5 59% of the time, and the reasons are telling. Less overengineering, less “laziness,” fewer hallucinations, more consistent follow-through.
Source: [Anthropic announcement]
A workflow where you’ll feel it fast
If you want to stress-test whether a model is genuinely “coding-good,” don’t give it a cute little function. Give it a repo-wide refactor where duplication is the enemy and context matters.
Try prompts like:
- “Find all places we validate JWTs and consolidate into one module.”
- “Replace ad-hoc retry loops with a shared `backoff()` helper.”
- “Update this API client to support pagination across the whole codebase.”
Those tasks punish shallow pattern-matching. Sonnet 4.6 is positioned as better at reading the situation first, then editing. That’s the difference between a clean refactor and a pile of spaghetti you’re stuck untangling later.
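To make the second prompt concrete, here’s the kind of shared helper you’d want a consolidation refactor to converge on. A minimal sketch; the `backoff()` name and retry parameters are illustrative, not from Anthropic’s announcement.

```python
import random
import time


def backoff(fn, *, retries=5, base_delay=0.5, max_delay=30.0):
    """Run fn(), retrying on failure with exponential backoff and jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the original error
            # exponential backoff, capped, with jitter so retries don't sync up
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay + random.uniform(0, delay / 2))
```

Every ad-hoc retry loop in the codebase then becomes `backoff(lambda: fetch(url))`, which is exactly the kind of mechanical-but-context-heavy edit these prompts are testing.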
Computer use
Anthropic also highlights “computer use” improvements. This is the mouse-and-keyboard style interaction with software, not custom APIs and clean little tool endpoints.
They point to OSWorld as a standard benchmark here. It runs tasks across real apps like Chrome, LibreOffice, and VS Code in a simulated environment, with no special connectors.
Source: [Anthropic announcement]
Here’s the refreshingly honest part: Anthropic says the model still trails skilled humans. But early users are reporting “human-level capability” for things like navigating complex spreadsheets, filling multi-step web forms, and coordinating work across multiple browser tabs.
Prompt injection is the price tag on “computer use”
If your model can browse and click, it can also get manipulated by hostile content. Anthropic explicitly calls out prompt injection attacks where hidden instructions live on websites, and says Sonnet 4.6 is a major improvement vs Sonnet 4.5, performing similarly to Opus 4.6 in their safety evaluations.
Source: [Anthropic announcement]
If you’re building with tool use, treat the docs as required reading, not vibes-based optional homework: [Anthropic API documentation]
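Here’s what that required reading cashes out to in practice: gate model-proposed browser actions before executing them. A minimal sketch of a domain allowlist plus confirm-before-submit, assuming you control the execution loop; the action shape and helper name are hypothetical, not part of Anthropic’s API.

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"github.com", "docs.anthropic.com"}  # hypothetical allowlist


def guard_action(action: dict) -> bool:
    """Return True only if a model-proposed browser action is safe to run."""
    host = urlparse(action.get("url", "")).hostname or ""
    if host not in ALLOWED_DOMAINS:
        print(f"blocked: {host} is not on the allowlist")  # log every refusal
        return False
    if action.get("type") == "submit_form":
        # confirm-before-submit: a human approves anything that writes data
        return input(f"Submit form on {host}? [y/N] ").strip().lower() == "y"
    print(f"allowed: {action.get('type')} on {host}")  # log every action
    return True
```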
Long context and agent planning: 1M tokens (beta)
Sure, 1M tokens (beta) is the flashy headline spec. But the more interesting claim is that it can still reason effectively across that much context.
Source: [Anthropic announcement]
And that’s what makes “agent planning” feel less like a demo and more like something you can run on a Tuesday:
- Load a big codebase or a mountain of docs
- Keep state across longer tasks
- Make multi-step plans without forgetting step 2 halfway through
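On earlier Sonnet models, the 1M window is gated behind the `context-1m-2025-08-07` beta flag on the API. Assuming Sonnet 4.6’s beta works the same way (that flag name carrying over is an assumption, so check the docs), enabling it looks like this:

```python
from anthropic import Anthropic

client = Anthropic()  # expects ANTHROPIC_API_KEY in env

# "context-1m-2025-08-07" is the beta flag documented for earlier Sonnet
# models; assuming it carries over to Sonnet 4.6 -- verify in the docs
msg = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1000,
    betas=["context-1m-2025-08-07"],
    messages=[{"role": "user", "content": "Summarize this codebase: ..."}],
)
print(msg.content[0].text)
```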
Anthropic also mentions a simulated business evaluation called “Vending-Bench Arena,” where Sonnet 4.6 used a longer-horizon strategy: invest heavily early, then pivot to profitability later, and beat competitors by timing the pivot well.
No, that doesn’t mean it “runs your company.” But it does hint the model is getting better at not faceplanting mid-task when the work stretches out.
How to use Sonnet 4.6 today (API + CLI)
Anthropic says Sonnet 4.6 is the default model for Free and Pro users on claude.ai, and pricing stays at Sonnet rates.
Source: [Anthropic announcement]
Call Claude Sonnet 4.6 via the Anthropic API (Python)
```python
from anthropic import Anthropic

client = Anthropic()  # expects ANTHROPIC_API_KEY in env

msg = client.messages.create(
    model="claude-sonnet-4-6",  # Anthropic API model IDs use hyphens, not dots
    max_tokens=800,
    messages=[
        {
            "role": "user",
            "content": "Refactor this Python module to remove duplicated parsing logic. Keep tests passing.",
        }
    ],
)
print(msg.content[0].text)
```

Use it from the llm CLI
Simon Willison shared an example using llm-anthropic with the model name explicitly set.
Source: [Simon Willison]
```bash
uvx --with llm-anthropic llm \
  'Generate an SVG of a pelican riding a bicycle' \
  -m claude-sonnet-4.6
```

Best practices that keep it useful
A few habits help Sonnet 4.6 shine, and also help you avoid blaming the model for… well, your own messy prompt.
- Give it repo structure first. Even a quick `tree -L 3` changes the quality of the plan.
- Ask for a plan, then ask for a patch. Two-step prompting cuts down “confident chaos.” (See the sketch after this list.)
- Put guardrails around computer use. Domain allowlists, confirm-before-submit for forms, and logging.
- Say “don’t duplicate logic” out loud. It feels obvious. Models still miss it.
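Here’s plan-then-patch as a loop, in a minimal sketch. It assumes the same hyphenated model ID as the API example above, and the point is that the second call only happens after you’ve eyeballed the plan.

```python
from anthropic import Anthropic

client = Anthropic()  # expects ANTHROPIC_API_KEY in env


def ask(prompt: str) -> str:
    msg = client.messages.create(
        model="claude-sonnet-4-6",  # assumed model ID, as above
        max_tokens=2000,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text


# Step 1: plan only -- no code yet, so bad assumptions surface early
plan = ask(
    "Plan a refactor that consolidates our JWT validation into one module. "
    "List the files to touch and the risks. No code yet."
)
print(plan)

# Step 2: only after a human reviews the plan, ask for the actual patch
if input("Plan look sane? [y/N] ").strip().lower() == "y":
    print(ask(f"Good. Now write the patch for this plan:\n\n{plan}"))
```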
Common mistakes people will still make
- Treating 1M tokens like a personality trait. Bigger context isn’t magic. Feed it junk, get more junk.
- Letting tool use run wild. Computer use plus web access without constraints is how prompt injection becomes your problem.
- Skipping evaluation. Anthropic shares strong numbers, like +15 points on Box’s heavy-reasoning Q&A vs Sonnet 4.5, and 94% on an internal insurance benchmark for computer use. Your workload is still the only benchmark that really matters (see the harness sketch below).
Source: [Anthropic announcement]
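Since your workload is the benchmark, the cheapest eval is a side-by-side harness: same prompts, both models, diff the outputs yourself. A minimal sketch; the prompt list is a placeholder and the model IDs follow the assumed hyphenated convention.

```python
from anthropic import Anthropic

client = Anthropic()  # expects ANTHROPIC_API_KEY in env

PROMPTS = [  # replace with real tasks from your own backlog
    "Consolidate the JWT validation in this module: ...",
]
MODELS = ["claude-sonnet-4-5", "claude-sonnet-4-6"]  # assumed model IDs

for prompt in PROMPTS:
    for model in MODELS:
        msg = client.messages.create(
            model=model,
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"\n=== {model} ===\n{msg.content[0].text}")
```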
Should we care?
Yeah. Probably.
Claude Sonnet 4.6 brings upgrades where devs actually feel it: more consistent coding, long-context reasoning up to 1M tokens in beta, and more capable computer use. And it does it while staying at Sonnet pricing and becoming the default in claude.ai for many users.
If you try it this week, pick one real task you normally reserve for a pricier model or a senior dev’s time, run it on Sonnet 4.6, then compare outputs side-by-side. If you notice a pattern, good or bad, drop a comment. Real notes beat benchmark chest-thumping every time.