GPT-5.4 on VM0. The OpenAI workhorse

OpenAI's workhorse of the GPT-5 family. Sits at the ×1 credit baseline alongside Claude Sonnet 4.6 and is the right default for most Codex-framework agents.

400K tokens · Text / Vision / Code · Prompt cache

Use GPT-5.4 on VM0

GPT-5.4 is the workhorse of OpenAI's GPT-5 family — the model you keep running everywhere by default. Vendor-reported SWE-bench Verified at 74.9% places it in the same range as Claude Sonnet 4.6 on coding, and its tool-use accuracy is what most production Codex-framework agents are tuned against.

Vendor list price is $2.5 / $15 per 1M tokens with cached input at $0.25 / 1M. It sits at ×1 credits on VM0 Managed — the same baseline as Claude Sonnet 4.6 — which makes it the natural pick when your agent is already on the Codex framework and you want a balanced cost/quality default.

What is GPT-5.4?

April 2026 · Workhorse of the GPT-5 family. The recommended default for most Codex-framework agents.

GPT-5.4 is the workhorse of OpenAI's GPT-5 generation, released in April 2026 alongside the flagship GPT-5.5 and the cost-optimised GPT-5.4 Mini. OpenAI positions it as the everywhere-default for agents on the Codex framework — the model you keep running on every step unless a specific step justifies escalation to 5.5.

Architecturally GPT-5.4 shares the 400K-token context window, the reasoning_effort parameter, prompt caching and the Responses API surface with the rest of the GPT-5 family. The split versus GPT-5.5 is compute investment per token: 5.4 runs faster and cheaper, 5.5 invests more in reasoning depth. The split versus GPT-5.4 Mini is the opposite — 5.4 carries more quality for the steps that actually decide the agent run.

On VM0 it sits at the ×1 credit multiplier, the same baseline as Claude Sonnet 4.6, which makes side-by-side cost comparisons between Anthropic and OpenAI defaults trivial. The choice between the two usually comes down to framework (Codex vs Claude Code), ecosystem (existing integrations, tool definitions) and which model your team has more behavioural muscle memory for.

What's notable about GPT-5.4

Headline architecture and capability features.

GPT-5.4 uses the same architecture as the rest of the GPT-5 family: 400K-token context window, reasoning_effort parameter at four levels (minimal, low, medium, high), prompt caching where cached input bills at one-tenth the input rate, and the Responses API surface that codex CLI uses by default. Tool-use, structured outputs and computer-use are supported. Inputs are multimodal across text, vision and code.

Specs at a glance

FamilyGPT-5 generation

ModalitiesText, vision, code

LanguagesEnglish-first, multilingual

Prompt cachingSupported (OpenAI)

Context window400K tokens

Max outputUp to 128K tokens

Reasoning effortMinimal / Low / Medium / High

Vendor list price$2.5 input / $15 output per 1M

GPT-5.4 benchmarks

Vendor-reported scores from OpenAI's GPT-5 release materials, with deltas shown against the previous OpenAI generation. Independent reviews place GPT-5.4 in the same coding-quality band as Claude Sonnet 4.6. Treat absolute percentages as directional.

SWE-bench Verifiedvendor-reported

74.9%

Terminal-Bench 2.0vendor-reported tool use

~58%

AIME 2025 (no tools)vendor-reported competition math

~92%

GPQA Diamondvendor-reported graduate science

~85%

OSWorld (computer use)vendor-reported

~62%

SpeedArtificial Analysis, medium effort

~110 tokens/sec

GPT-5.4 pricing

Provider list price, per 1M tokens.

Input$2.50

Output$15.00

Cache read$0.25

Cache writeNot billed

How GPT-5.4 behaves in practice

Observed behaviour from production agent runs.

Tool routing

Solid baseline accuracy across the standard Codex-framework tool catalogue. Where 5.5 pulls ahead is on hard edge cases (conditional tool selection, deeply nested arguments) — for the routine cases 5.4 routes correctly with significantly lower latency.

Code edits

Comparable patch quality to Claude Sonnet 4.6 on standard refactor and bug-fix workloads. Where 5.5 starts to pull ahead is on multi-file changes where the patch has to apply cleanly on the first try.

Speed

Materially faster than 5.5 — around 110 tokens/sec at medium effort per Artificial Analysis. This is part of why 5.4 stays the default for interactive chat replies and short agent loops where user-visible latency matters.

Cost efficiency

×1 credits with output behaviour in the Sonnet 4.6 quality band. For teams already on the Codex framework, this is the cost/quality sweet spot — promote to 5.5 only on steps that visibly need it.

Hallucination behaviour

Inherits the calibration improvements OpenAI shipped with the GPT-5 generation. Less prone to confident wrong answers than the GPT-4 series, especially on questions outside its training horizon.

Best agent tasks for GPT-5.4

The default agent step on the Codex framework

If your agent is already built on codex CLI or any Codex-framework integration, GPT-5.4 is the natural everywhere-default. ×1 credits, fast enough for interactive use, accurate enough for the routine tool calls that dominate most agent runs.

The interactive chat with vision

Screenshot-driven UIs, document Q&A, image annotation — GPT-5.4 handles all three multimodally at workhorse speed. The ×1 multiplier keeps the per-turn cost in the same band as Sonnet 4.6, so you can A/B the two against each other on the same workload.

The cost/quality A/B against Claude Sonnet 4.6

Both models sit at ×1 credits on VM0 Managed, which makes them directly comparable on cost. Run the same agent on both for a week and pick by behaviour on your specific workload — neither is universally better, and the right default depends on your tool catalogue and prompt style.

When to skip GPT-5.4

Skip GPT-5.4 on the hardest reasoning, computer-use or multi-file code-edit steps where 5.5 noticeably leads, and on high-volume bulk classification or pre-filter work where 5.4 Mini is four times cheaper at the vendor level.

GPT-5.4 vs other models

GPT-5.4 vs GPT-5.5

Same family, different positioning. 5.5 (×2) gives you the strongest reasoning, computer-use and first-attempt code quality; 5.4 (×1) gives you the same context window and feature set at half the credit cost and noticeably higher speed. Default to 5.4; escalate to 5.5 only on the steps that visibly need it.

GPT-5.4 vs Claude Sonnet 4.6

The two ×1 baselines, one in each ecosystem. Sonnet 4.6 runs on the Claude Code framework; GPT-5.4 runs on Codex. Pick by which framework your existing agents and tool definitions target. On raw output quality they're close enough that A/B-testing on your workload is the right call.

GPT-5.4 vs GPT-5.4 Mini

Same family, different positioning. 5.4 (×1) carries more reasoning quality per token; 5.4 Mini (×0.3) gives you a much cheaper option for bulk and pre-filter work. Use 5.4 Mini for fan-out classification and 5.4 for the steps that decide the agent run.

Bottom line: should you use GPT-5.4?

GPT-5.4 is the everywhere-default for Codex-framework agents on VM0. Escalate to 5.5 for hard reasoning, drop to 5.4 Mini for bulk pre-filtering.

Frequently asked questions

What is GPT-5.4's context window?

400,000 tokens, with up to 128K tokens of output per response. The full window bills at standard rates.

Can GPT-5.4 handle images?

Yes. GPT-5.4 is multimodal. It accepts image inputs alongside text and code natively.

When should I pick GPT-5.4 over Claude Sonnet 4.6?

When your agent is already built on the Codex framework or you need the OpenAI ecosystem (tool catalogue, structured outputs, Responses API). Both sit at ×1 credits, so cost is identical and the choice comes down to framework and behaviour fit.

Does GPT-5.4 support prompt caching?

Yes. Cached input bills at $0.25 per 1M tokens — a 10× discount on the cached portion.

What framework does GPT-5.4 use on VM0?

Codex. VM0 routes all GPT-5 models through the Codex framework's Responses API surface.

Alternatives

GPT-5.5

Escalation tier for the hardest steps

GPT-5.4 Mini

Cheaper option for bulk work

Claude Sonnet 4.6

×1 peer on the Claude Code framework

Using GPT-5.4 on VM0

Two ways to access GPT-5.4 on VM0

VM0 supports GPT-5.4 as a Built-in model billed in VM0 credits, and through bring-your-own with a OpenAI API key. The Built-in path uses VM0 Managed routing and the credit multiplier explained below; the bring-your-own path bills you directly with the upstream vendor and skips the VM0 credit conversion entirely.

VM0's recommendation

VM0 positions GPT-5.4 as a core agent model, recommended alongside Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6 for the steps that drive the actual outcome of an agent run. These are the models we'd pick for the orchestrator role, for code-touching agents, and for any step where a wrong answer is expensive.

Credits and the ×1 multiplier

Every Built-in model on VM0 is priced as a multiple of Claude Sonnet 4.6, which sits at the ×1 credit baseline. GPT-5.4 bills at ×1 credits. The multiplier is what shows up on your VM0 invoice; the vendor list price in the pricing table above is what the upstream provider charges before VM0 converts it into credits.

GPT-5.4 sits at the ×1 baseline that every other Built-in model is priced against, so it's the unit you compare costs in when picking between models on VM0.

Available on VM0 since April 2026.