GPT-5.4 Mini on VM0. The cost-saving GPT-5
OpenAI's cost-optimised member of the GPT-5 family. ×0.3 credits, multimodal vision, and fast enough for high-volume routing, classification and pre-filter work.
400K tokens · Text / Vision / Code · Prompt cache
GPT-5.4 Mini is the cost-saving member of OpenAI's GPT-5 family — the one you reach for when unit cost matters more than peak reasoning quality. It keeps the 400K context window and multimodal inputs of the rest of the family but trims compute per token, which translates to lower price ($0.75 / $4.5 per 1M) and noticeably higher speed.
On VM0 it sits at ×0.3 credits, the same multiplier as Claude Haiku 4.5 and Kimi K2.6, which makes it the natural OpenAI-side pick for bulk classification, fan-out routing, pre-filters, and any agent step where dropping to a third of GPT-5.4's cost is the deciding factor.
What is GPT-5.4 Mini?
April 2026 · Cost-saving variant of the GPT-5 family. The OpenAI-side peer of Claude Haiku 4.5.
GPT-5.4 Mini is the cost-optimised member of OpenAI's GPT-5 generation, released in April 2026 alongside GPT-5.5 and GPT-5.4. OpenAI positions it as the high-throughput tier — the model you keep running on classification, routing and pre-filter steps where the bigger 5.4 or 5.5 would be wasted on routine decisions.
Architecturally it shares the GPT-5 family's 400K-token context window, the reasoning_effort parameter, prompt caching, and the Responses API surface that codex CLI uses by default. The trade-off versus 5.4 is reasoning depth: Mini handles standard tool calls, short summaries and structured-output workloads well, but starts to drift on the harder multi-step plans where 5.4 still holds up. The trade-off versus competitors at the same price point is ecosystem — if you're already on Codex, staying inside the OpenAI surface keeps tool definitions and structured-output schemas consistent.
On VM0 Mini sits at the ×0.3 credit multiplier, the same as Claude Haiku 4.5, Kimi K2.6 and DeepSeek V4 Pro. Within the cost-saving tier the choice depends mostly on framework and behaviour fit on your specific workload.
What's notable about GPT-5.4 Mini
Headline architecture and capability features.
GPT-5.4 Mini uses the same architecture as the rest of the GPT-5 family: 400K-token context window, the reasoning_effort parameter at four levels, prompt caching where cached input bills at one-tenth the input rate, and the Responses API surface. Tool-use, structured outputs and multimodal vision inputs are supported. The model is a smaller, faster sibling — fewer parameters per token, more throughput per dollar.
Specs at a glance
GPT-5.4 Mini benchmarks
Vendor-reported scores from OpenAI's GPT-5 Mini release materials. Independent reviews place 5.4 Mini in the same cost-saving band as Claude Haiku 4.5 on most agent benchmarks. Treat absolute percentages as directional.
GPT-5.4 Mini pricing
Provider list price, per 1M tokens.
How GPT-5.4 Mini behaves in practice
Observed behaviour from production agent runs.
Speed
Fastest model in the GPT-5 family — around 165 tokens/sec at medium effort per Artificial Analysis. This is the property that makes it viable for interactive chat replies and short fan-out tool calls where user-visible latency dominates.
Routine tool calls
Accurate on the standard Codex-framework tool catalogue. Where 5.4 pulls ahead is on hard edge cases (conditional tool selection, deeply nested arguments) — for the routine cases Mini handles tool routing cleanly at a third of the cost.
Bulk classification & pre-filter
Strongest cost/quality position in the GPT-5 family for fan-out work. Bulk PR triage, support-ticket categorisation, document-tier classification — all the workloads where you'd previously have hand-rolled regex are now affordable in a real model call.
Cost efficiency
×0.3 credits with multimodal vision included. At this price point Mini, Claude Haiku 4.5 and Kimi K2.6 all sit in the same band — the choice usually comes down to framework fit and behaviour on your specific workload.
When to escalate
Mini drifts on long multi-step plans, hard reasoning and first-attempt multi-file code edits. Build the agent so the orchestrator decides when to escalate to 5.4 or 5.5, not so Mini tries to carry the whole loop.
Best agent tasks for GPT-5.4 Mini
The fan-out classifier that runs on every event
Inbound support ticket, PR comment, sales-call transcript, document upload — Mini reads each one and routes it to the right downstream agent or human reviewer. ×0.3 credits and 165 tokens/sec mean the per-event cost is small enough that running it on every event (not just sampled batches) is actually viable.
The pre-filter step before the expensive model
Pin Mini at the top of the agent's tool call so it decides whether the request even needs to escalate. Most requests get a fast cheap answer; only the residual minority pays the full GPT-5.4 or 5.5 cost. This is where stacking cost-saving and core tiers genuinely changes what's affordable.
The interactive chat reply
Short multimodal turns where user-visible latency dominates the experience. Mini answers fast enough that streaming feels instant, and the multimodal support means a screenshot in the conversation Just Works.
When to skip GPT-5.4 Mini
Skip GPT-5.4 Mini on the hardest reasoning, multi-step agent orchestration, computer-use sequences and first-attempt multi-file code edits — escalate to 5.4 for routine versions of those tasks and 5.5 for the hardest ones.
GPT-5.4 Mini vs other models
GPT-5.4 Mini vs GPT-5.4
Same family, different positioning. 5.4 Mini (×0.3) wins on cost and speed; 5.4 (×1) wins on reasoning quality and tool-routing accuracy on hard cases. The standard pattern is to pre-filter with Mini and escalate residual cases to 5.4.
GPT-5.4 Mini vs Claude Haiku 4.5
Same multiplier (×0.3). Mini runs on the Codex framework; Haiku 4.5 runs on Claude Code. Both are multimodal and both target the same cost-saving slot. Pick by which framework your existing agents and tool definitions target.
GPT-5.4 Mini vs DeepSeek V4 Flash
DeepSeek V4 Flash (×0.02) is dramatically cheaper at the vendor level and is the right choice for pure bulk single-shot work. GPT-5.4 Mini (×0.3) carries more reasoning quality and stays inside the OpenAI ecosystem, which matters when your tool definitions and structured-output schemas are already tuned for Codex.
Bottom line: should you use GPT-5.4 Mini?
GPT-5.4 Mini is the cost-saving default on the OpenAI side. Pre-filter with Mini, escalate to GPT-5.4 for routine steps, escalate to GPT-5.5 only for the hardest reasoning.
Frequently asked questions
What is GPT-5.4 Mini's context window?
400,000 tokens, with up to 128K tokens of output per response — the same as the rest of the GPT-5 family.
Can GPT-5.4 Mini handle images?
Yes. Like the rest of the GPT-5 family it accepts image inputs alongside text and code.
When should I pick GPT-5.4 Mini over Claude Haiku 4.5?
When your agent is already built on the Codex framework or you need the OpenAI structured-output / tool-call ecosystem. Both sit at ×0.3 credits, so cost is identical and the choice comes down to framework and behaviour.
Does GPT-5.4 Mini support prompt caching?
Yes. Cached input bills at $0.075 per 1M tokens — a 10× discount on the cached portion.
What framework does GPT-5.4 Mini use on VM0?
Codex. VM0 routes all GPT-5 models through the Codex framework's Responses API surface.
Alternatives
Using GPT-5.4 Mini on VM0
Two ways to access GPT-5.4 Mini on VM0
VM0 supports GPT-5.4 Mini as a Built-in model billed in VM0 credits, and through bring-your-own with a OpenAI API key. The Built-in path uses VM0 Managed routing and the credit multiplier explained below; the bring-your-own path bills you directly with the upstream vendor and skips the VM0 credit conversion entirely.
VM0's recommendation
VM0 positions GPT-5.4 Mini as a cost-saving option rather than a core agent model. Use it to optimise unit cost on non-core work, such as bulk classification, pre-filters, latency-critical short replies, or pinned legacy agents, while keeping Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6 on the steps that decide the run.
Credits and the ×0.3 multiplier
Every Built-in model on VM0 is priced as a multiple of Claude Sonnet 4.6, which sits at the ×1 credit baseline. GPT-5.4 Mini bills at ×0.3 credits. The multiplier is what shows up on your VM0 invoice; the vendor list price in the pricing table above is what the upstream provider charges before VM0 converts it into credits.
GPT-5.4 Mini bills at ×0.3, which means a step here costs only 0.3× the credits of an equivalent step on Sonnet 4.6 (the ×1 baseline). That puts it well below the credit baseline and makes it the natural pick for high-volume background work where cost-per-step matters more than peak reasoning quality.
Available on VM0 since April 2026.