Kimi

Kimi is an AI chatbot and assistant platform developed by Moonshot AI (月之暗面), a Chinese AI startup based in Beijing. The platform features the Kimi K2.6 model, with capabilities including long-context processing, deep research, code generation (Kimi Code), and agent-based workflows (Agent Swarm). Kimi gained significant popularity in China for its long-context understanding, initially supporting contexts of 200K+ tokens. The platform also offers productivity tools for document creation, slide presentations, spreadsheets, and website building.

Website

Products

Models

Available

Benchmarks

Region: China

Updated: May 29, 2026

CODPL speed

Provider speed highlights

Kimi K2.6 · Product: Kimi

Median TPS: 358.95

TTFT p50: 816 ms

Samples: 3

Window: 1h · Rank #1

Product coverage

Products from this provider

Coding plan

Kimi

Kimi is an AI assistant platform offering subscription-based access to coding assistance, agent tools, and other AI features through plans like Kimi Code and Agent Swarm.

Kimi Code · Supported

Web UI · Supported

Plans

Models

Updated: May 11, 2026

Model coverage

Models from this provider

Kimi K2

Kimi K2 is a large language model developed by Moonshot AI as part of its Kimi series.

Input / 1M tokens

$0.585

Output tokens/s

24.02

First-token seconds

1.57s

Artificial Analysis Intelligence Index

26.3

Kimi K2

Kimi K2 0905

Kimi K2 0905 is a large language model developed by Moonshot AI as part of its Kimi series. It is listed here with long-context understanding and reasoning capabilities.

Long context · Reasoning

Input / 1M tokens

$0.60

Output tokens/s

24.01

First-token seconds

1.45s

Artificial Analysis Intelligence Index

30.9

Kimi K2

Kimi K2 Thinking

Kimi K2 Thinking is a reasoning-focused model from Moonshot AI, designed to excel in complex problem-solving and chain-of-thought tasks. It is optimized for deep analysis and logical inference.

Reasoning · Long context

Input / 1M tokens

$0.60

Output tokens/s

116.59

First-token seconds

0.87s

Artificial Analysis Intelligence Index

40.9

Kimi K2.5

Kimi K2.5 (Non-reasoning)

Kimi K2.5 (Non-reasoning) is a fast-response variant of the Kimi K2.5 series, optimized for low-latency interactions. It excels in rapid content generation, chat, and multimodal understanding tasks where immediate answers are prioritized over deep, step-by-step reasoning.

Fast · Multimodal · Long context

Input / 1M tokens

$0.60

Output tokens/s

33.6

First-token seconds

1.23s

Artificial Analysis Intelligence Index

37.3

Kimi K2.5

Kimi K2.5 (Reasoning)

Kimi K2.5 (Reasoning) is a large language model from Moonshot AI, specifically optimized for complex reasoning tasks. It excels in multi-step logical deduction, mathematical problem-solving, and code analysis, often employing a chain-of-thought approach to enhance accuracy. The model is designed to handle intricate queries that require deep analytical thinking.

Reasoning · Long context

Input / 1M tokens

$0.58

Output tokens/s

40.37

First-token seconds

1.3s

Artificial Analysis Intelligence Index

46.8

Kimi K2.6

Kimi K2.6 is a large language model developed by Moonshot AI, optimized for long-context understanding and complex reasoning tasks. It builds upon the Kimi K2 series, offering enhanced performance in processing extensive information and generating coherent, logical responses.

Long context · Reasoning · Coding

Input / 1M tokens

$0.95

Output tokens/s

32.06

First-token seconds

1.42s

Artificial Analysis Intelligence Index

53.9

Kimi K2.6

Kimi K2.6 (Non-reasoning)

A non-reasoning variant of the Kimi K2.6 model, optimized for fast response times and cost-efficiency. It is suitable for applications requiring quick, economical replies while maintaining the long-context capabilities of the Kimi K2 series.

Fast · Cheap · Long context

Input / 1M tokens

$0.95

Output tokens/s

27.16

First-token seconds

1.35s

Artificial Analysis Intelligence Index

42.9

Kimi

Kimi Linear 48B A3B Instruct

Kimi Linear 48B A3B Instruct is a large language model optimized for efficiency, likely utilizing a Mixture-of-Experts (MoE) architecture with 48 billion total parameters and 3 billion active parameters. The 'Linear' designation suggests it may employ linear attention mechanisms for enhanced performance on long sequences. As an instruction-tuned model, it is designed for strong instruction following and conversational capabilities.

Coding · Reasoning · Fast · Cheap

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

14.4
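
The per-model figures above (input price per 1M tokens, output tokens per second, first-token seconds) can be combined into rough cost and latency estimates. The sketch below is illustrative only: the `ModelStats` class and both helper functions are hypothetical names, not part of any Kimi or Moonshot AI API. It assumes output streams at the listed rate after the first token arrives, and it covers input cost only because this page lists no output price. For reasoning models, hidden reasoning tokens would add to the real wait.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    """Figures copied from the model cards on this page (hypothetical container)."""
    name: str
    input_usd_per_million: float  # "Input / 1M tokens"
    output_tokens_per_s: float    # "Output tokens/s"
    first_token_s: float          # "First-token seconds"


def input_cost_usd(stats: ModelStats, input_tokens: int) -> float:
    """Input-side cost only; output pricing is not listed on this page."""
    return stats.input_usd_per_million * input_tokens / 1_000_000


def estimated_response_s(stats: ModelStats, output_tokens: int) -> float:
    """Assumes a steady stream: first-token delay plus tokens / throughput."""
    return stats.first_token_s + output_tokens / stats.output_tokens_per_s


# Two of the cards listed above, entered by hand.
MODELS = [
    ModelStats("Kimi K2 Thinking", 0.60, 116.59, 0.87),
    ModelStats("Kimi K2.6", 0.95, 32.06, 1.42),
]

for m in MODELS:
    cost = input_cost_usd(m, 200_000)        # a 200K-token prompt
    wait = estimated_response_s(m, 1_000)    # a 1,000-token answer
    print(f"{m.name}: ~${cost:.3f} input, ~{wait:.1f}s to finish")
```

These are back-of-the-envelope numbers. Measured throughput varies by provider and time window, as the separate speed-highlights figures on this page show.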
