
Kimi K2.6 on VM0: long-context agents

Moonshot's latest open-weight model. Best-in-class agentic benchmarks at the open-source frontier and a Claude-compatible interface.

256K tokens · Text / Vision / Code · Prompt cache

Kimi K2.6 is Moonshot's open-weight flagship and currently the strongest open-source agentic model on several public benchmarks. It sustains very long runs without losing the thread (Moonshot has documented unattended sessions of 12+ hours and 4,000+ tool calls) and accepts image and video input natively. Its vendor-reported SWE-bench Pro score is 58.6 (above Claude Opus 4.6 and GPT-5.4 on that benchmark), and its hallucination rate dropped from K2.5's ~65% to ~39%.

Vendor list price is $0.60 / $3 per 1M tokens, open weights ship under a Modified MIT license, and the API is Anthropic-compatible. Reach for Sonnet 4.6 when production tool-routing reliability matters more than benchmark scores, and for Haiku 4.5 when latency dominates.

What is Kimi K2.6?

April 20, 2026 · Top of Moonshot's open-weight Kimi K2 series. Successor to K2.5 and K2 Thinking.

Kimi K2.6 is Moonshot AI's open-weight agentic model released April 20, 2026. It's a 1-trillion-parameter Mixture-of-Experts (MoE) model with 32B active parameters per token. It shares an architecture family with K2.5 and K2 Thinking, with substantial gains on agentic coding and long-horizon reasoning.

K2.6 made a real splash at release. Vendor-reported scores put it ahead of GPT-5.4 (xhigh) and Claude Opus 4.6 (max effort) on SWE-bench Pro, with a hallucination rate of 39% (down from K2.5's 65%), and Artificial Analysis ranks it #4 on its Intelligence Index, making it the leading open-weight option.

On VM0 it's exposed via the Moonshot API key as the default model, through VM0 Managed at the same ×0.3 multiplier, and via OpenRouter. The API is Anthropic-compatible, so VM0 agents written for Claude work without code changes.
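Because the API follows the Anthropic Messages schema, the request an agent builds for Claude is the same request it sends to K2.6. A minimal sketch; the base URL and model id below are illustrative assumptions, not official values:

```python
# Build an Anthropic-style Messages payload for Kimi K2.6.
# BASE_URL and the model id are illustrative assumptions, not official values.
BASE_URL = "https://api.moonshot.example/anthropic"

def build_messages_request(prompt: str, model: str = "kimi-k2.6") -> dict:
    """Return an Anthropic Messages API payload. Code written for Claude
    produces exactly this shape, so only the endpoint and key change."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_messages_request("Summarise the last 100 Slack threads.")
```

Switching an existing Claude agent over is then a matter of pointing its client at the Moonshot endpoint and swapping the model id; the payloads themselves stay untouched.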

What's notable about Kimi K2.6

Headline architecture and capability features.

K2.6 is a Mixture-of-Experts model with 1T total parameters and 32B active per token, fronted by a 256K-token context window and multimodal input across image and video (text-only output). Moonshot pairs it with an Agent Swarm runtime that scales horizontally to 300 sub-agents and 4,000 coordinated steps, and has documented long-horizon coding sessions of 12 hours or more. Open weights are published on Hugging Face under a Modified MIT License.

Specs at a glance

Family: Kimi K2 series
Parameters: 1T total / 32B active (MoE)
Modalities: Image, video, text
Languages: Multilingual
Context window: 256K tokens
License: Modified MIT (open weights)
Available on VM0: April 2026

Kimi K2.6 benchmarks

Vendor-reported scores from Moonshot's K2.6 release blog. Independent third parties (Artificial Analysis, TokenMix) corroborate the relative ordering. K2.6's hallucination rate dropped to 39% from K2.5's 65%, a significant safety and reliability improvement.

SWE-bench Pro: 58.6 (vendor-reported; beats GPT-5.4, Opus 4.6)
SWE-bench Verified: 80.2 (vendor-reported)
Terminal-Bench 2.0: 66.7 (Terminus-2 framework)
LiveCodeBench (v6): 89.6 (vendor-reported)
HLE (with tools): 54.0 (leads GPT-5.4 and Opus 4.6)
BrowseComp (Agent Swarm): 86.3 (up from K2.5's 78.4)
Artificial Analysis Intelligence Index: 54 (#4 overall, leading open-weight)

Kimi K2.6 pricing

Provider list price, per 1M tokens.

Input: $0.60
Output: $3.00
Cache read: $0.10
Cache write: $0.60
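At list price, per-run cost is simple arithmetic over the four rates above; a minimal sketch (rates copied from the table, token counts are made-up examples):

```python
# Vendor list prices in USD per 1M tokens, from the pricing table above.
RATES = {"input": 0.60, "output": 3.00, "cache_read": 0.10, "cache_write": 0.60}

def run_cost(input_tok: int, output_tok: int,
             cache_read_tok: int = 0, cache_write_tok: int = 0) -> float:
    """USD cost of one run at Moonshot list price."""
    return (
        input_tok * RATES["input"]
        + output_tok * RATES["output"]
        + cache_read_tok * RATES["cache_read"]
        + cache_write_tok * RATES["cache_write"]
    ) / 1_000_000

# Example: a 200K-token prompt with an 8K-token reply.
cost = run_cost(200_000, 8_000)  # 0.12 + 0.024 = $0.144
```

The 6:1 gap between input and cache-read rates is why prompt caching matters for long-context agent loops: re-reading the same 200K-token transcript from cache costs $0.02 instead of $0.12.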

How Kimi K2.6 behaves in practice

Observed behaviour from production agent runs.

Long-context recall

The strongest long-context recall across the Built-in lineup in our internal evaluation. It maintains coherence across long agent transcripts where Anthropic Sonnet starts to drift.

Agentic benchmarks

Vendor-reported SWE-bench Pro 58.6 is the highest in the lineup at the time of writing, beating GPT-5.4 and Opus 4.6.

Long-horizon coding

Documented 12+ hour autonomous sessions completing 4,000+ tool calls. The model genuinely sustains performance across very long runs.

Tool use

Reliable across common VM0 tool flows. The Anthropic-compatible API means tool schemas designed for Claude work directly.
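Since tool schemas follow the Anthropic format (a name, a description, and a JSON-Schema `input_schema`), a Claude-style definition carries over unchanged. A sketch with a hypothetical `get_ticket` tool, invented here for illustration:

```python
# Anthropic-format tool definition. The get_ticket tool is hypothetical.
get_ticket_tool = {
    "name": "get_ticket",
    "description": "Fetch a support ticket by id.",
    "input_schema": {
        "type": "object",
        "properties": {"ticket_id": {"type": "string"}},
        "required": ["ticket_id"],
    },
}

def validate_tool(tool: dict) -> bool:
    """Minimal shape check for an Anthropic-format tool definition."""
    return ({"name", "description", "input_schema"} <= tool.keys()
            and tool["input_schema"]["type"] == "object")
```

A tool list built this way can be passed in the request's `tools` field exactly as it would be for Claude.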

Best agent tasks for Kimi K2.6

The investigation that has to read every old thread

Dig through six months of Slack conversations to find why a customer churned, comb the support-ticket backlog for a recurring bug pattern, or stitch together insights across a hundred RFCs. K2.6's long-context recall holds up across transcripts where Anthropic Sonnet starts dropping earlier turns, which is exactly what "reading the whole pile" workflows need.

The autonomous refactor that runs overnight

Moonshot has documented a 13-hour autonomous refactor of an eight-year-old matching engine, with K2.6 sustaining 4,000+ tool calls without drifting off task. That's the kind of run where most models lose the goal somewhere around hour two; K2.6's long-horizon stability is what makes "start it Friday evening, check Monday morning" actually work.

The multimodal agent that handles screenshots and clips

K2.6 accepts both image and video input through MoonViT, which is unusual outside the Claude family. Useful for screenshot-driven QA agents, document-vision pipelines, and any deployment where you'd otherwise have to splice in a separate vision model just to read images.
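In the Anthropic Messages format, a screenshot rides alongside text as an image content block; a sketch of one such message (the base64 payload is a stand-in, and video input may use a vendor-specific shape not shown here):

```python
import base64

# Stand-in bytes; in practice, read the real screenshot PNG from disk.
fake_png = base64.b64encode(b"\x89PNG...").decode("ascii")

message = {
    "role": "user",
    "content": [
        {
            "type": "image",
            "source": {"type": "base64", "media_type": "image/png",
                       "data": fake_png},
        },
        {"type": "text", "text": "What error does this screenshot show?"},
    ],
}
```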

When to skip Kimi K2.6

Skip K2.6 on the hardest tool-routing edge cases where Sonnet 4.6 still leads on production reliability, and on latency-critical chat replies where Haiku 4.5 is meaningfully faster.

Kimi K2.6 vs other models

Kimi K2.6 vs GLM-5.1

Both are long-context options. K2.6 wins on raw long-context recall in our internal evaluation; GLM-5.1 wins on context size (1M vs 256K). Default to K2.6 for long transcripts; reach for GLM-5.1 only when you need >256K tokens in a single prompt.

Kimi K2.6 vs Claude Sonnet 4.6

Sonnet (×1) leads on multi-tool English-language routing reliability. K2.6 (×0.3) wins on cost and on agentic benchmarks (SWE-bench Pro). Pair them: Sonnet for complex tool-routing, K2.6 for cost-sensitive agent work.

Kimi K2.6 vs Kimi K2.5

K2.6 is the newer generation with stronger tool-use, lower hallucination rate (39% vs 65%), and better reasoning. K2.5 (×0.2) is slightly cheaper. Prefer K2.6 for new work.

Bottom line: should you use Kimi K2.6?

K2.6 is the open-weight default for serious agent work: long-context and cost-effective. The remaining gaps versus Sonnet 4.6 are tool-routing reliability and enterprise support.

Frequently asked questions

When was Kimi K2.6 released?

Moonshot AI released Kimi K2.6 on April 20, 2026. Open weights are published on Hugging Face under a Modified MIT License.

What's the context window?

256K tokens. K2.6 differentiates on recall quality at that size, not raw window size. Recall starts to degrade past ~180K (similar to other 256K models).

Do I need to rewrite my agent to use Kimi?

No. Kimi K2.6 exposes an Anthropic-compatible API, so VM0 agents tuned for Claude work without code changes.

How does Kimi K2.6 compare to Claude Opus 4.6?

On vendor-reported agentic benchmarks, K2.6 leads: SWE-bench Pro 58.6 vs Opus 4.6's 53.4, and HLE with tools 54.0 vs 53.0. Opus 4.6 retains an edge on safety profile and English-language tool-routing reliability in production.

Does K2.6 support image input?

Yes. K2.6 accepts image and video input natively, though output is text-only, so multimodal agents work without splicing in a separate vision model.


Using Kimi K2.6 on VM0

Two ways to access Kimi K2.6 on VM0

VM0 supports Kimi K2.6 as a Built-in model billed in VM0 credits, and through bring-your-own with a Moonshot API key. The Built-in path uses VM0 Managed routing and the credit multiplier explained below; the bring-your-own path bills you directly with the upstream vendor and skips the VM0 credit conversion entirely.

VM0's recommendation

VM0 positions Kimi K2.6 as the cost-effective pick for long-context and long-horizon agent work. Use it to optimise unit cost on high-volume steps such as long-transcript investigation, overnight refactors, bulk classification, and pre-filters, while keeping Claude Opus 4.7, Claude Opus 4.6, or Claude Sonnet 4.6 on the tool-routing steps that decide the run.

Credits and the ×0.3 multiplier

Every Built-in model on VM0 is priced as a multiple of Claude Sonnet 4.6, which sits at the ×1 credit baseline. Kimi K2.6 bills at ×0.3 credits. The multiplier is what shows up on your VM0 invoice; the vendor list price in the pricing table above is what the upstream provider charges before VM0 converts it into credits.

In practice, a step on K2.6 costs 0.3× the credits of an equivalent step on Sonnet 4.6 (the ×1 baseline). That puts it well below the credit baseline and makes it the natural pick for high-volume agent work where cost-per-step matters more than peak reasoning quality.
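The multiplier arithmetic, sketched below; the 10-credit baseline step is a placeholder figure, while the multipliers are the ones quoted on this page:

```python
# Credit multipliers from this page; Claude Sonnet 4.6 is the ×1 baseline.
MULTIPLIERS = {"claude-sonnet-4.6": 1.0, "kimi-k2.6": 0.3, "kimi-k2.5": 0.2}

def step_credits(baseline_credits: float, model: str) -> float:
    """Credits billed for a step that would cost `baseline_credits`
    on the Sonnet 4.6 baseline."""
    return baseline_credits * MULTIPLIERS[model]

# A step billed at 10 credits on Sonnet costs 3 credits on K2.6.
k26_cost = step_credits(10, "kimi-k2.6")
```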

Available on VM0 since April 2026.