GPT-5.4 Mini on VM0. The cost-saving GPT-5

OpenAI's cost-optimised member of the GPT-5 family. ×0.3 credits, multimodal vision, and fast enough for high-volume routing, classification and pre-filter work.

400K tokens · Text / Vision / Code · Prompt cache

GPT-5.4 Mini is the cost-saving member of OpenAI's GPT-5 family: the one you reach for when unit cost matters more than peak reasoning quality. It keeps the 400K context window and multimodal inputs of the rest of the family but trims compute per token, which translates to a lower price ($0.75 input / $4.50 output per 1M tokens) and noticeably higher speed.

On VM0 it sits at ×0.3 credits, the same multiplier as Kimi K2.7 Code, which makes it the natural OpenAI-side pick for bulk classification, fan-out routing, pre-filters, and any agent step where dropping to a third of GPT-5.4's cost is the deciding factor.

What is GPT-5.4 Mini?

April 2026 · Cost-saving variant of the GPT-5 family. The OpenAI-side peer of Kimi K2.7 Code.

GPT-5.4 Mini is the cost-optimised member of OpenAI's GPT-5 generation, released in April 2026 alongside GPT-5.5 and GPT-5.4. OpenAI positions it as the high-throughput tier — the model you keep running on classification, routing and pre-filter steps where the bigger 5.4 or 5.5 would be wasted on routine decisions.

Architecturally it shares the GPT-5 family's 400K-token context window, the reasoning_effort parameter, prompt caching, and the Responses API surface that the Codex CLI uses by default. The trade-off versus 5.4 is reasoning depth: Mini handles standard tool calls, short summaries and structured-output workloads well, but starts to drift on the harder multi-step plans where 5.4 still holds up. The trade-off versus competitors at the same price point is ecosystem: if you're already on Codex, staying inside the OpenAI surface keeps tool definitions and structured-output schemas consistent.

On VM0 Mini sits at the ×0.3 credit multiplier, the same as Kimi K2.7 Code. DeepSeek V4 Pro sits lower at ×0.1, so within the cost-saving tier the choice depends mostly on framework and behaviour fit on your specific workload.

What's notable about GPT-5.4 Mini

Headline architecture and capability features.

GPT-5.4 Mini uses the same architecture as the rest of the GPT-5 family: 400K-token context window, the reasoning_effort parameter at four levels, prompt caching where cached input bills at one-tenth the input rate, and the Responses API surface. Tool-use, structured outputs and multimodal vision inputs are supported. The model is a smaller, faster sibling — fewer parameters per token, more throughput per dollar.
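
As a rough illustration of how these parameters fit together, the sketch below builds a Responses-API-style request body as a plain dictionary. The model identifier `gpt-5.4-mini` and the helper `build_request` are assumptions made for illustration and are not confirmed by this page; the `reasoning.effort` field follows the shape OpenAI's Responses API uses for reasoning models.

```python
import json

# Effort levels as listed in the specs for this model.
EFFORT_LEVELS = ("minimal", "low", "medium", "high")

# Hypothetical model identifier; check your provider's catalogue for the real one.
MODEL_ID = "gpt-5.4-mini"


def build_request(prompt: str, effort: str = "low", max_output_tokens: int = 256) -> dict:
    """Build a Responses-API-style request body as a plain dict (no network call)."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    return {
        "model": MODEL_ID,
        "input": prompt,
        "reasoning": {"effort": effort},
        "max_output_tokens": max_output_tokens,
    }


if __name__ == "__main__":
    body = build_request("Classify this ticket: 'refund not received'")
    print(json.dumps(body, indent=2))
```

For classification and routing steps, a low effort level and a small output cap are a reasonable starting point; raise either only when the step demonstrably needs it.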

Specs at a glance

Family: GPT-5 generation
Modalities: Text, vision, code
Languages: English-first, multilingual
Prompt caching: Supported (OpenAI)
Context window: 400K tokens
Max output: Up to 128K tokens
Reasoning effort: Minimal / Low / Medium / High
Vendor list price: $0.75 input / $4.50 output per 1M tokens

GPT-5.4 Mini benchmarks

Vendor-reported scores from OpenAI's GPT-5.4 Mini release materials. Independent reviews place 5.4 Mini in the same cost-saving band as Kimi K2.7 Code on most agent benchmarks. Treat the absolute percentages as directional rather than exact.

SWE-bench Verified (vendor-reported): ~60%
Terminal-Bench 2.0 (vendor-reported, tool use): ~42%
AIME 2025, no tools (vendor-reported, competition math): ~84%
GPQA Diamond (vendor-reported, graduate science): ~74%
Speed (Artificial Analysis, medium effort): ~165 tokens/sec

GPT-5.4 Mini pricing

Provider list price, per 1M tokens.

Input: $0.75
Output: $4.50
Cache read: $0.075
Cache write: Not billed
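
To make the effect of the cache discount concrete, here is a minimal cost estimator using the list prices on this page, with cached input billed at one-tenth of the input rate. The function name `request_cost_usd` and the token counts in the example are hypothetical, and the result is vendor list cost, not VM0 credits.

```python
# Vendor list prices from this page, in USD per 1M tokens.
INPUT_PER_M = 0.75
OUTPUT_PER_M = 4.50
CACHE_READ_PER_M = 0.075  # one-tenth of the input rate


def request_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate list cost for one request.

    cached_tokens is the portion of input_tokens served from the prompt cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: 10,000 input tokens, 200 output tokens.
    print(f"no cache:   ${request_cost_usd(10_000, 200):.4f}")
    print(f"80% cached: ${request_cost_usd(10_000, 200, cached_tokens=8_000):.4f}")
```

With a long shared system prompt, most of the input on each call is cached, so the output tokens tend to dominate the bill.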

How GPT-5.4 Mini behaves in practice

Observed behaviour from production agent runs.

Speed

Fastest model in the GPT-5 family — around 165 tokens/sec at medium effort per Artificial Analysis. This is the property that makes it viable for interactive chat replies and short fan-out tool calls where user-visible latency dominates.
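
A back-of-the-envelope way to turn the throughput figure into reply latency is sketched below. It assumes a steady ~165 tokens/sec and ignores time to first token, network overhead and queueing, so treat the result as a lower bound; `generation_seconds` is a hypothetical helper name.

```python
TOKENS_PER_SEC = 165.0  # approximate figure quoted above, medium effort


def generation_seconds(output_tokens: int, tokens_per_sec: float = TOKENS_PER_SEC) -> float:
    """Estimate pure generation time for a reply of the given length."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return output_tokens / tokens_per_sec


if __name__ == "__main__":
    for n in (50, 330, 1000):
        print(f"{n:>5} tokens: about {generation_seconds(n):.1f} s")
```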

Routine tool calls

Accurate on the standard Codex-framework tool catalogue. Where 5.4 pulls ahead is on hard edge cases (conditional tool selection, deeply nested arguments) — for the routine cases Mini handles tool routing cleanly at a third of the cost.

Bulk classification & pre-filter

Strongest cost/quality position in the GPT-5 family for fan-out work. Bulk PR triage, support-ticket categorisation, document-tier classification — all the workloads where you'd previously have hand-rolled regex are now affordable in a real model call.

Cost efficiency

×0.3 credits with multimodal vision included. Mini and Kimi K2.7 Code sit in the same band, while DeepSeek V4 Pro sits lower at ×0.1 — the choice usually comes down to framework fit and behaviour on your specific workload.

When to escalate

Mini drifts on long multi-step plans, hard reasoning and first-attempt multi-file code edits. Build the agent so the orchestrator decides when to escalate to 5.4 or 5.5, not so Mini tries to carry the whole loop.

Best agent tasks for GPT-5.4 Mini

The fan-out classifier that runs on every event

Inbound support ticket, PR comment, sales-call transcript, document upload — Mini reads each one and routes it to the right downstream agent or human reviewer. ×0.3 credits and 165 tokens/sec mean the per-event cost is small enough that running it on every event (not just sampled batches) is actually viable.
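
The fan-out pattern described here can be sketched as a thread pool that classifies every event independently. The `classify` function below is a keyword stub standing in for a real model call so that the example runs offline; the route names and `fan_out` helper are likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical downstream destinations.
ROUTES = ("billing", "bug", "sales", "human_review")


def classify(event: str) -> str:
    """Placeholder for a model call; returns one of ROUTES."""
    text = event.lower()
    if "invoice" in text or "refund" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "bug"
    if "pricing" in text or "demo" in text:
        return "sales"
    return "human_review"


def fan_out(events: list[str], max_workers: int = 8) -> list[tuple[str, str]]:
    """Classify events concurrently and return (event, route) pairs in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        labels = list(pool.map(classify, events))  # map preserves input order
    return list(zip(events, labels))


if __name__ == "__main__":
    sample = ["Refund not received", "App crash on login", "Can I get a demo?", "Hello"]
    for event, route in fan_out(sample):
        print(f"{route:<13} {event}")
```

In a real deployment the stub would be replaced by a call to the model, and anything it cannot place confidently would fall through to the human-review route.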

The pre-filter step before the expensive model

Put Mini at the front of the agent's call chain so it decides whether the request needs to escalate at all. Most requests get a fast, cheap answer; only the residual minority pays the full GPT-5.4 or 5.5 cost. This is where stacking cost-saving and core tiers genuinely changes what's affordable.
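
One way to express this pre-filter is a small router that tries the cheap tier first and escalates when that tier is unsure. This is a sketch under stated assumptions: the confidence score, the 0.8 threshold and both stub functions are hypothetical, and a self-reported confidence is only one of several possible escalation signals.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Triage:
    answer: str
    confidence: float  # 0.0 to 1.0, as reported by the cheap tier


def route(
    request: str,
    cheap: Callable[[str], Triage],
    strong: Callable[[str], str],
    threshold: float = 0.8,
) -> tuple[str, str]:
    """Return (tier, answer): use the cheap tier's answer unless it is unsure."""
    triage = cheap(request)
    if triage.confidence >= threshold:
        return "mini", triage.answer
    return "escalated", strong(request)


# Stubs standing in for real model calls.
def fake_mini(request: str) -> Triage:
    # Pretend short requests are easy and long ones are not.
    easy = len(request.split()) <= 8
    return Triage(answer="handled by mini", confidence=0.95 if easy else 0.4)


def fake_strong(request: str) -> str:
    return "handled by larger model"


if __name__ == "__main__":
    print(route("reset my password", fake_mini, fake_strong))
    print(route(
        "please refactor the billing module across all services and migrate the schema",
        fake_mini,
        fake_strong,
    ))
```

Keeping the escalation decision in the orchestrator, rather than inside either model call, makes the threshold easy to tune against real traffic.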

The interactive chat reply

Short multimodal turns where user-visible latency dominates the experience. Mini answers fast enough that streaming feels instant, and the multimodal support means a screenshot in the conversation Just Works.

When to skip GPT-5.4 Mini

Skip GPT-5.4 Mini on the hardest reasoning, multi-step agent orchestration, computer-use sequences and first-attempt multi-file code edits — escalate to 5.4 for routine versions of those tasks and 5.5 for the hardest ones.

GPT-5.4 Mini vs other models

GPT-5.4 Mini vs GPT-5.4

Same family, different positioning. 5.4 Mini (×0.3) wins on cost and speed; 5.4 (×1) wins on reasoning quality and tool-routing accuracy on hard cases. The standard pattern is to pre-filter with Mini and escalate residual cases to 5.4.
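
The arithmetic behind the pre-filter pattern is simple enough to check directly. The sketch below assumes every request runs one step on Mini (×0.3) and that escalated requests run one additional, equally sized step on GPT-5.4 (×1); the function names are hypothetical, and real steps will differ in token usage.

```python
MINI_MULT = 0.3  # GPT-5.4 Mini credit multiplier on VM0
FULL_MULT = 1.0  # GPT-5.4 credit multiplier on VM0


def blended_multiplier(escalation_rate: float) -> float:
    """Average credit multiplier per request when Mini pre-filters everything."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return MINI_MULT + escalation_rate * FULL_MULT


def break_even_rate() -> float:
    """Escalation rate at which pre-filtering costs the same as always using GPT-5.4."""
    return (FULL_MULT - MINI_MULT) / FULL_MULT


if __name__ == "__main__":
    for rate in (0.1, 0.2, 0.5):
        print(f"escalate {rate:.0%}: blended multiplier {blended_multiplier(rate):.2f}")
    print(f"break-even escalation rate: {break_even_rate():.0%}")
```

Under these assumptions the pre-filter saves credits as long as fewer than about 70% of requests escalate.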

GPT-5.4 Mini vs Claude Sonnet 4.6

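Claude Sonnet 4.6 is the catalog comparison target for this model and the ×1 credit baseline on VM0. GPT-5.4 Mini bills at ×0.3 of that baseline, so use Mini for high-volume, non-core steps and keep Sonnet 4.6 on the steps that decide the run.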

GPT-5.4 Mini vs DeepSeek V4 Pro

DeepSeek V4 Pro sits lower on VM0 credits (×0.1 versus ×0.3) and is the stronger cost-first reasoning choice. Use it when price dominates, and use GPT-5.4 Mini when its provider fit or tool-routing profile matters more.

Bottom line: should you use GPT-5.4 Mini?

GPT-5.4 Mini is the cost-saving default on the OpenAI side. Pre-filter with Mini, escalate to GPT-5.4 for routine steps, escalate to GPT-5.5 only for the hardest reasoning.

Frequently asked questions

What is GPT-5.4 Mini's context window?

400,000 tokens, with up to 128K tokens of output per response — the same as the rest of the GPT-5 family.

Can GPT-5.4 Mini handle images?

Yes. Like the rest of the GPT-5 family it accepts image inputs alongside text and code.

When should I pick GPT-5.4 Mini over Kimi K2.7 Code?

When your agent is already built on the Codex framework or you need the OpenAI structured-output / tool-call ecosystem. Both sit at ×0.3 credits, so cost is identical and the choice comes down to framework and behaviour.

Does GPT-5.4 Mini support prompt caching?

Yes. Cached input bills at $0.075 per 1M tokens — a 10× discount on the cached portion.

What framework does GPT-5.4 Mini use on VM0?

Codex. VM0 routes all GPT-5 models through the Codex framework's Responses API surface.

Alternatives

GPT-5.4

Step up for harder steps, same family

Using GPT-5.4 Mini on VM0

Two ways to access GPT-5.4 Mini on VM0

VM0 supports GPT-5.4 Mini as a Built-in model billed in VM0 credits, and through bring-your-own with an OpenAI API key. The Built-in path uses VM0 Managed routing and the credit multiplier explained below; the bring-your-own path bills you directly with the upstream vendor and skips the VM0 credit conversion entirely.

VM0's recommendation

VM0 positions GPT-5.4 Mini as a cost-saving option rather than a core agent model. Use it to optimise unit cost on non-core work, such as bulk classification, pre-filters, latency-critical short replies, or pinned legacy agents, while keeping Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6 on the steps that decide the run.

Credits and the ×0.3 multiplier

Every Built-in model on VM0 is priced as a multiple of Claude Sonnet 4.6, which sits at the ×1 credit baseline. GPT-5.4 Mini bills at ×0.3 credits. The multiplier is what shows up on your VM0 invoice; the vendor list price in the pricing table above is what the upstream provider charges before VM0 converts it into credits.

In practice, a step on GPT-5.4 Mini costs 0.3 times the credits of an equivalent step on Sonnet 4.6. That puts it well below the credit baseline and makes it the natural pick for high-volume background work where cost-per-step matters more than peak reasoning quality.
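
The multiplier arithmetic can be written out as a small lookup. The multipliers below are the ones quoted on this page; the helper `credits_for` and the baseline figure of 100 credits are hypothetical, since the page does not define what one credit buys.

```python
# Credit multipliers quoted on this page, relative to Claude Sonnet 4.6.
MULTIPLIERS = {
    "Claude Sonnet 4.6": 1.0,
    "GPT-5.4 Mini": 0.3,
    "Kimi K2.7 Code": 0.3,
    "DeepSeek V4 Pro": 0.1,
}


def credits_for(model: str, baseline_credits: float) -> float:
    """Scale the credits an equivalent step would cost on the x1 baseline."""
    if model not in MULTIPLIERS:
        raise KeyError(f"no multiplier known for {model!r}")
    return baseline_credits * MULTIPLIERS[model]


if __name__ == "__main__":
    baseline = 100.0  # illustrative: credits for some workload on Sonnet 4.6
    for name in MULTIPLIERS:
        print(f"{name:<18} {credits_for(name, baseline):6.1f} credits")
```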

Available on VM0 since April 2026.