Claude Opus 4.8

Anthropic's newest flagship. Released May 28, 2026 with stronger agentic coding, dynamic workflows that fan out hundreds of parallel subagents, and a 3× cheaper fast mode at the same regular price as Opus 4.7.

1M tokens · Text / Vision / Code · Prompt cache

Use Claude Opus 4.8 on VM0

Claude Opus 4.8 is Anthropic's flagship release on May 28, 2026, a direct upgrade to Opus 4.7 at the same $5/$25 vendor list price. It posts the highest SWE-bench Pro (69.2%), OSWorld-Verified (83.4%), MCP-Atlas (82.2%), and Humanity's Last Exam (57.9% with tools) scores Anthropic has ever shipped, and is the first model to break 10% on the legal-agent all-pass standard.

The two structural changes worth knowing about are dynamic workflows (plan a job, fan it out across hundreds of parallel subagents in a single session) and a fast-mode pricing cut to 2.5× speed at $10/$50 per 1M tokens — three times cheaper than fast mode on prior Claude models. Effort levels expand to high (default), extra, and max. Anthropic itself frames the release as a "modest but tangible improvement" rather than a leap.

What is Claude Opus 4.8?

May 28, 2026 · Top-tier of the Claude 4 family. Anthropic's recommended default for new agents; ships at the same ×2 multiplier as Opus 4.7.

Claude Opus 4.8 was released on May 28, 2026 as Anthropic's new flagship, 41 days after Opus 4.7. It targets the same coding, agentic-skills, reasoning, and knowledge-work workloads as 4.7, at the same regular list price ($5 input / $25 output per 1M tokens) and the same VM0 multiplier (×2). Anthropic positions the release as a "modest but tangible improvement on its predecessor" rather than a step-change.

Two structural changes matter for VM0 users. First, dynamic workflows: the model can plan a task and fan it out across hundreds of parallel subagents in a single session, which Anthropic describes as a step toward handling codebase-scale migrations across hundreds of thousands of lines of code in one run. Second, fast mode at 2.5× speed is now $10 / $50 per 1M tokens — three times cheaper than fast mode on prior Claude models. Effort levels expand to three tiers: high (default), extra (xhigh in Claude Code), and max.

Independent reads (LLM Stats, VentureBeat, Vellum) corroborate the relative ordering against 4.7 and competitors: 4.8 wins on every cell of Anthropic's published comparison set except Terminal-Bench 2.1, where GPT-5.5 still leads (78.2% vs 4.8's 74.6%). The 4.7-to-4.8 jump on SWE-bench Pro is +4.9 points; on USAMO 2026 it is +27.4; on the new 1M-token GraphWalks long-context F1 it is +27.8. Treat absolute scores as directional — SWE-bench Verified is approaching saturation across all frontier models.

What's notable about Claude Opus 4.8

Headline architecture and capability features.

Opus 4.8 keeps the 1M-token context window and 128K max output from Opus 4.7, billed at standard input pricing across the entire window. Effort control expands to three levels: high (the new default), extra (xhigh inside Claude Code), and max. The Messages API now accepts system entries mid-conversation without breaking prompt caching. Dynamic workflows let Claude plan and dispatch hundreds of parallel subagents in a single session. Fast mode runs at ~2.5× standard speed for $10 / $50 per 1M tokens. Multimodal inputs across text, vision, and code are unchanged.

Specs at a glance

FamilyClaude 4 generation

ModalitiesText, vision, code

LanguagesEnglish-first, multilingual

Prompt cachingSupported (Anthropic)

Context window1M tokens

Max outputUp to 128K tokens

Effort levelsHigh (default) / Extra / Max

Vendor list price$5 input / $25 output per 1M (fast mode $10/$50, 2.5× speed)

Claude Opus 4.8 benchmarks

Vendor-reported scores from Anthropic's Opus 4.8 system card, with comparisons against Opus 4.7, GPT-5.5, and Gemini 3.1 Pro at max effort and 5-trial averages. 4.8 leads in six of seven cells Anthropic publishes; GPT-5.5 retains the lead on Terminal-Bench 2.1. SWE-bench Verified is approaching saturation across all frontier models — the harder SWE-bench Pro set is the more durable signal.

SWE-bench Verifiedvendor-reported; up from Opus 4.7's 87.6%

88.6%

SWE-bench Proleads the field (4.7: 64.3%, GPT-5.5: 58.6%, Gemini 3.1 Pro: 54.2%)

69.2%

Terminal-Bench 2.1up from 4.7's 66.1% on 2.0; GPT-5.5 leads here at 78.2%

74.6%

OSWorld-Verified (computer use)leads the field (4.7: 82.8%, GPT-5.5: 78.7%)

83.4%

Online-Mind2Web (browser agent)vendor-reported

84%

MCP-Atlasup from Opus 4.7's 77.3%

82.2%

BrowseComp (single-agent)up from Opus 4.7's 79.3%

84.3%

GraphWalks long-context F1 (1M tokens)up from Opus 4.7's 40.3%

68.1%

Humanity's Last Exam (with tools)49.8% without tools; leads the field

57.9%

GPQA Diamondflat vs 4.7 — saturated across frontier models

~93%

USAMO 2026 (math)up from Opus 4.7's 69.3%

96.7%

GDPval-AA (knowledge work)leads (4.7: 1753, GPT-5.5: 1769)

1890 Elo

Finance Agent v2leads the field

53.9%

Legal-agent all-passfirst model to break this standard

>10%

Claude Opus 4.8 pricing

Provider list price, per 1M tokens.

Input$5.00

Output$25.00

Cache read$0.50

Cache write$6.25

How Claude Opus 4.8 behaves in practice

Observed behaviour from production agent runs.

Dynamic workflows

The headline new capability. Opus 4.8 can plan a task and then run hundreds of parallel subagents within the same session — Anthropic positions this as the path to codebase-scale migrations across hundreds of thousands of lines in one run. On VM0, this means a single agent run can orchestrate fan-out work that previously required external scheduling.

First-attempt code edits

Anthropic reports Opus 4.8 is around four times less likely than 4.7 to overlook flaws when reviewing code, and the +4.9 point SWE-bench Pro jump (69.2% vs 64.3%) backs that up on the harder, less-saturated coding set. Pick 4.8 for patches that have to apply cleanly across many files.

Long-context recall

GraphWalks F1 at 1M tokens jumps from 40.3% to 68.1% — the biggest single-benchmark gain in the release. The 1M-token window is now actually usable at the high end of its range, not just nominal.

Honesty and overconfidence

Anthropic reports a more than ten-fold reduction in overconfidence versus 4.7, 0% on uncritically reporting flawed results (a first for the Claude family), and a 3.7% rate of failing to raise important events to the user. Misalignment incidence is ~1.9, effectively tied with Anthropic's best-aligned Mythos Preview.

Speed and fast mode

Standard speed is comparable to Opus 4.7. The pricing change is the headline: fast mode at 2.5× speed costs $10 / $50 per 1M tokens, three times cheaper than fast mode on prior Claude models. Worth using for orchestration steps where wall-clock latency matters.

Prompt-injection caveat

Anthropic's system card notes 4.8 is somewhat less robust to agentic prompt injection than 4.7 — Gray Swan red-teaming shows a ~9.6% attack-success rate versus 6.0% on 4.7. Teams running 4.8 in pipelines that handle untrusted input should review their sandboxing approach.

Best agent tasks for Claude Opus 4.8

The codebase-scale migration that used to need a sprint

Hand Opus 4.8 a migration that touches a few hundred files — ORM swap, framework version bump, security fix across a monorepo — and let dynamic workflows fan the work out to parallel subagents within one session. The +4.9 point SWE-bench Pro jump and the four-fold reduction in missed flaws on code review are what cash out on this kind of run.

The 1M-token research run that actually holds together

Drop a 200-page contract draft, three competitor proposals, and last quarter's legal opinions into the window, then ask Opus 4.8 to flag every clause that's tighter than market. GraphWalks at 1M jumping from 40.3% to 68.1% is what makes this kind of cross-document synthesis newly reliable.

The agent orchestrator that doesn't lie about its work

Use 4.8 as the planner that breaks a request into ten steps, dispatches each to cheaper sub-agents, and reports the result. The 0% rate on uncritically reporting flawed results, combined with the ten-fold drop in overconfidence, is the reason production teams reach for 4.8 when the agent's own self-report has to be trustworthy.

The latency-sensitive flow that finally pencils out on fast mode

Fast mode at 2.5× speed used to cost three times what it does now ($10/$50 per 1M vs the prior tier). For interactive copilots, on-call summarisers, or any step where wall-clock latency dominates the experience, fast-mode 4.8 is now the default choice in the Claude family.

When to skip Claude Opus 4.8

Skip Opus 4.8 on high-volume routine work where Sonnet 4.6 hits the same quality bar at a fraction of the cost, on latency-critical chat replies where Kimi K2.7 Code is much faster, on agentic terminal coding where GPT-5.5 still leads Terminal-Bench 2.1 (78.2% vs 4.8's 74.6%), and on pipelines that ingest untrusted input without sandboxing — 4.8's prompt-injection robustness is slightly weaker than 4.7's.

Claude Opus 4.8 vs other models

Claude Opus 4.8 vs Claude Opus 4.7

Same ×2 multiplier, same context window, same regular price. Opus 4.8 leads on every cell Anthropic publishes (SWE-bench Verified +1, SWE-bench Pro +4.9, OSWorld-Verified +0.6, MCP-Atlas +4.9, BrowseComp +5.0, GraphWalks 1M +27.8, USAMO +27.4). The trade-off is a slightly weaker prompt-injection profile (~9.6% attack-success rate vs 6.0%). Migrate new agents to 4.8; pin 4.7 only if you've validated against it and don't want to re-run regressions.

Claude Opus 4.8 vs Claude Sonnet 4.6

Sonnet 4.6 (×1) is still the workhorse default for most agent loops. Promote to Opus 4.8 when Sonnet visibly fails on hard reasoning, long-context recall, or first-attempt code edits — usually as the planner that delegates to Sonnet- or cost-saving sub-agents. With dynamic workflows, Opus 4.8 as orchestrator + Sonnet 4.6 as workers is the new recommended pattern.

Claude Opus 4.8 vs GPT-5.5

Opus 4.8 leads on six of seven cells in Anthropic's comparison set, with the biggest gaps on SWE-bench Pro (69.2% vs 58.6%) and OSWorld-Verified (83.4% vs 78.7%). GPT-5.5 keeps the lead on Terminal-Bench 2.1 (78.2% vs 74.6%). Pick 4.8 for cross-file coding and computer-use agents; pick GPT-5.5 specifically when terminal-driven work dominates.

Claude Opus 4.8 vs Gemini 3.1 Pro

Opus 4.8 leads by wide margins on SWE-bench Pro (+15.0) and OSWorld-Verified (+7.2). The two models stay within noise on saturated science benchmarks like GPQA Diamond. Default to 4.8 for agentic work; consider Gemini specifically when you need Google's tool integration story.

Claude Opus 4.8 vs DeepSeek V4 Pro

DeepSeek V4 Pro (×0.1) remains the cost-optimised pick when raw token price dominates the decision. Opus 4.8 retains the lead on tool-routing reliability, long-context recall, alignment metrics, and computer-use, which is why most enterprise English-language agents still default to 4.8 despite the price gap.

Bottom line: should you use Claude Opus 4.8?

The new default for new agents in the Claude family. Migrate from 4.7 when you can re-validate; default to it directly for fresh work. Keep Sonnet 4.6 as the cheaper workhorse beneath it.

Frequently asked questions

When was Claude Opus 4.8 released?

Anthropic released Opus 4.8 on May 28, 2026, 41 days after Opus 4.7. It is available today across Claude products, the Claude API (model id claude-opus-4-8), Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and VM0.

How does Opus 4.8 pricing compare to 4.7?

Regular pricing is identical: $5 per 1M input tokens, $25 per 1M output tokens, $0.50 per 1M cached input. The change is fast mode, now $10 / $50 per 1M tokens at 2.5× speed — three times cheaper than fast mode on prior Claude models.

What are dynamic workflows?

A new capability that lets Opus 4.8 plan a task and then run hundreds of parallel subagents within a single session. Anthropic positions this as the path to codebase-scale migrations across hundreds of thousands of lines of code in one agent run.

What effort levels does Opus 4.8 support?

Three levels: high (the new default), extra (xhigh in Claude Code), and max. Higher settings spend more tokens on reasoning before producing a response; lower settings favour speed and rate-limit efficiency.

Should I migrate from Opus 4.7 to 4.8?

Yes for new work — same multiplier, same regular price, stronger behaviour across every published comparison cell except Terminal-Bench 2.1. Migrate pinned production agents only after running them through your regression suite, and review your sandboxing if the agent ingests untrusted input (4.8 is slightly less robust to prompt injection than 4.7).

Does Opus 4.8 support prompt caching?

Yes. Cached input bills at $0.50 per 1M tokens, a 10× discount on the cached portion. The Messages API now also accepts system entries mid-conversation without breaking the cache.

Alternatives

Claude Opus 4.7

Previous flagship; slightly more robust to prompt injection

Claude Sonnet 4.6

Cheaper default for most agent loops

GPT-5.5

Leads Terminal-Bench 2.1 for agentic terminal coding

Using Claude Opus 4.8 on VM0

Two ways to access Claude Opus 4.8 on VM0

VM0 supports Claude Opus 4.8 as a Built-in model billed in VM0 credits, and through bring-your-own with a Anthropic API key. The Built-in path uses VM0 Managed routing and the credit multiplier explained below; the bring-your-own path bills you directly with the upstream vendor and skips the VM0 credit conversion entirely.

VM0's recommendation

VM0 positions Claude Opus 4.8 as a core agent model, recommended alongside Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6 for the steps that drive the actual outcome of an agent run. These are the models we'd pick for the orchestrator role, for code-touching agents, and for any step where a wrong answer is expensive.

Credits and the ×2 multiplier

Every Built-in model on VM0 is priced as a multiple of Claude Sonnet 4.6, which sits at the ×1 credit baseline. Claude Opus 4.8 bills at ×2 credits. The multiplier is what shows up on your VM0 invoice; the vendor list price in the pricing table above is what the upstream provider charges before VM0 converts it into credits.

Claude Opus 4.8 bills at ×2, which means a step here costs 2× the credits of an equivalent step on Sonnet 4.6 (the ×1 baseline). It's a premium tier on VM0, so the cost-effective pattern is to default to a cheaper model and route only the steps that genuinely need the extra reasoning depth to Claude Opus 4.8.

Available on VM0 since May 28, 2026.