All models

Kimi K2.5 on VM0. Moonshot's previous generation

The previous Kimi generation. Cheaper than K2.6 but with weaker tool-use; pin it only if a specific agent was validated on this version.

256K tokens · Text / Image / Code · Prompt cache

Kimi K2.5 is Moonshot's previous flagship, the open-weight model that K2.6 superseded in April 2026. It's still capable — strong on long-context summarisation — but K2.6 leads on every published benchmark at the same vendor price, and the hallucination rate gap is wide (K2.5 ~65% on Moonshot's evaluation versus K2.6's ~39%).

Vendor list price is $0.60 / $3 per 1M tokens, identical to K2.6. The honest pitch: if you built on K2.5 and it works, leave it; if you're starting fresh, start on K2.6.

What is Kimi K2.5?

Late 2025 (Kimi K2 series) · Previous generation of Moonshot's open-weight Kimi K2 series. Superseded by K2.6.

Kimi K2.5 was Moonshot's flagship Kimi model before K2.6. It was the first widely-deployed Kimi to combine long-context reasoning with a Claude-compatible API surface, and it remains a capable model for long-context summarisation work.

On VM0 it sits at the same vendor list price as K2.6 but a lower credit multiplier (×0.2). The lower multiplier reflects positioning rather than raw token cost. K2.6 is the recommended default for new work; K2.5 is the legacy pin.

K2.5 has a vendor-reported SWE-bench Pro score of 50.7 and a hallucination rate of ~65%, both meaningfully behind K2.6 (58.6 and 39% respectively). Behaviourally it remains stable for pinned production agents.

What's notable about Kimi K2.5

Headline architecture and capability features.

K2.5 is a Mixture-of-Experts model with 1T total parameters and 32B active per token from the same family as K2.6, fronted by a 256K-token context window and an Anthropic-compatible API surface. Open weights are published on Hugging Face.
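As a rough sketch of what an Anthropic-compatible surface looks like in practice, the snippet below assembles a Messages-API-shaped request. The base URL and model identifier are illustrative assumptions, not confirmed values; check Moonshot's documentation for the real endpoint and model name.

```python
# Sketch of a Messages-style request against an Anthropic-compatible
# endpoint. BASE_URL and MODEL are hypothetical placeholders.
import json

BASE_URL = "https://api.moonshot.example/anthropic"  # hypothetical
MODEL = "kimi-k2.5"                                  # hypothetical identifier

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble a Messages-API-shaped payload for a single user turn."""
    return {
        "model": MODEL,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarise this transcript in three paragraphs.")
print(json.dumps(payload, indent=2))
# Sending it would be an HTTP POST to f"{BASE_URL}/v1/messages"
# with your API key in the request headers.
```

Because the surface is Anthropic-compatible, the same payload shape works whether the model field names K2.5 or K2.6, which is what makes the pin-versus-upgrade decision a one-line change.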

Specs at a glance

Family: Kimi K2 series
Modalities: Image, text, code
Languages: Multilingual
Context window: 256K tokens
License: Modified MIT (open weights)
Available on VM0: Since launch

Kimi K2.5 benchmarks

K2.5's benchmarks are now most useful as the comparison baseline for K2.6. The newer model leads on every published metric at the same vendor cost.

SWE-bench Pro (vendor-reported): 50.7
BrowseComp (vendor-reported): 78.4
Hallucination rate (down to 39% in K2.6): ~65%

Kimi K2.5 pricing

Provider list price, per 1M tokens.

Input: $0.60
Output: $3.00
Cache read: $0.10
Cache write: $0.60
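The list prices above translate to per-call cost with simple per-million-token arithmetic. A minimal sketch (the token counts in the example are made up for illustration):

```python
# Estimate vendor list cost for a single K2.5 call using the per-1M-token
# prices quoted above.
PRICES = {  # USD per 1M tokens
    "input": 0.60,
    "output": 3.00,
    "cache_read": 0.10,
    "cache_write": 0.60,
}

def cost_usd(input_toks: int, output_toks: int,
             cache_read_toks: int = 0, cache_write_toks: int = 0) -> float:
    """Sum each token class at its per-1M rate."""
    per = lambda toks, key: toks / 1_000_000 * PRICES[key]
    return (per(input_toks, "input") + per(output_toks, "output")
            + per(cache_read_toks, "cache_read")
            + per(cache_write_toks, "cache_write"))

# A 100K-token transcript in, a 1K-token summary out, no cache:
print(round(cost_usd(100_000, 1_000), 4))  # → 0.063
```

Note how output tokens dominate only when the response is long; for the summarisation shape described later on this page, the input side carries almost all of the cost.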

How Kimi K2.5 behaves in practice

Observed behaviour from production agent runs.

Long-context

Strong, with a similar shape to K2.6, though K2.6 has the edge on harder recall benchmarks.

Tool use

Solid on common flows; K2.6 is meaningfully better on complex multi-tool agents.

Hallucinations

Vendor-reported hallucination rate of ~65%, much higher than K2.6's 39%. Expect more confident-but-wrong outputs.

Best agent tasks for Kimi K2.5

The legacy agent that already works

Your team validated an agent against K2.5 a few months ago, the prompts are tuned, the eval suite passes, customers are happy. Pinning to K2.5 keeps that exact behaviour in place while you decide whether the K2.6 upgrade is worth re-running the validation. Same Moonshot endpoint, same Anthropic-compatible interface — only the model weights move when you switch.

The bulk-summarisation job where K2.6's edge doesn't show

Hundred-thousand-token transcripts going in, three-paragraph summaries coming out. Tool-routing accuracy isn't part of the workload, hallucination resistance matters less when a human is going to skim the output anyway, and at the same vendor price as K2.6 you can run K2.5 on these jobs without touching the existing pipeline.

When to skip Kimi K2.5

Don't start new agents on K2.5, since K2.6 is a free upgrade in every meaningful way except the multiplier. Skip it on multi-tool English routing where Sonnet 4.6 leads, and on tasks where hallucination is costly because K2.5's rate is materially worse than K2.6's.

Kimi K2.5 vs other models

Kimi K2.5 vs Kimi K2.6

K2.6 is the newer generation with stronger tool-use, lower hallucination rate (39% vs 65%), and better reasoning. K2.5 (×0.2) is slightly cheaper. Pick K2.5 only for pinned legacy agents.

Kimi K2.5 vs DeepSeek V4 Pro

DeepSeek V4 Pro (×0.3) has stronger reasoning. K2.5 (×0.2) wins on context size and stays within the Moonshot API surface.

Bottom line: should you use Kimi K2.5?

Maintenance mode. Pin if you have an agent already validated on it; otherwise start on K2.6.

Frequently asked questions

Why does K2.5 have a lower multiplier than K2.6 at the same vendor price?

Multipliers reflect VM0's positioning of each model in the lineup, not just per-token cost. K2.6 is the recommended Kimi default at ×0.3; K2.5 is positioned as legacy at ×0.2.

Should I migrate from K2.5 to K2.6?

Yes for new work. Same vendor price, stronger tool-use and reasoning, much lower hallucination rate. Migrate pinned agents only after running them through your regression suite.

What's the hallucination rate?

Vendor-reported ~65%. Meaningfully higher than K2.6 (39%). If your agent reports facts to users, this matters; consider K2.6 instead.

What's K2.5's context window?

256K tokens. Same as K2.6.


Using Kimi K2.5 on VM0

Two ways to access Kimi K2.5 on VM0

VM0 supports Kimi K2.5 as a Built-in model billed in VM0 credits, and through bring-your-own with a Moonshot API key. The Built-in path uses VM0 Managed routing and the credit multiplier explained below; the bring-your-own path bills you directly with the upstream vendor and skips the VM0 credit conversion entirely.

VM0's recommendation

VM0 positions Kimi K2.5 as a cost-saving option rather than a core agent model. Use it to optimise unit cost on non-core work, such as bulk classification, pre-filters, latency-critical short replies, or pinned legacy agents, while keeping Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6 on the steps that decide the run.

Credits and the ×0.2 multiplier

Every Built-in model on VM0 is priced as a multiple of Claude Sonnet 4.6, which sits at the ×1 credit baseline. Kimi K2.5 bills at ×0.2 credits. The multiplier is what shows up on your VM0 invoice; the vendor list price in the pricing table above is what the upstream provider charges before VM0 converts it into credits.

At ×0.2, a step on K2.5 costs one-fifth the credits of an equivalent step on Sonnet 4.6 (the ×1 baseline). That puts it well below the credit baseline and makes it the natural pick for high-volume background work where cost-per-step matters more than peak reasoning quality.
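The multiplier arithmetic is straightforward; a small sketch using the multipliers quoted on this page (the model keys below are informal labels, not official identifiers):

```python
# Relative credit cost of a step on each model, with Sonnet 4.6 as the
# ×1 baseline. Multipliers are those quoted on this page.
MULTIPLIERS = {
    "claude-sonnet-4.6": 1.0,  # baseline
    "kimi-k2.6": 0.3,
    "kimi-k2.5": 0.2,
}

def credits(model: str, baseline_credits: float) -> float:
    """Credits consumed by a step that would cost `baseline_credits` on Sonnet 4.6."""
    return baseline_credits * MULTIPLIERS[model]

# A step that costs 10 credits on Sonnet 4.6:
print(credits("kimi-k2.5", 10))  # → 2.0
print(credits("kimi-k2.6", 10))  # → 3.0
```

In other words, the K2.5-vs-K2.6 decision on VM0 trades a 33% credit saving per step against the benchmark and hallucination gaps described above.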

Available on VM0 since launch.