Models

GLM-4.7-Flash (Non-reasoning)

GLM-4.7-Flash is a lightweight, high-speed variant of the GLM-4 series optimized for low-latency and cost-effective inference. As a non-reasoning model, it focuses on direct and rapid response generation rather than complex chain-of-thought processes. It is well-suited for applications requiring quick, efficient text generation.

FastCheapCoding

Input price / 1M tokens: $0.07
Output price / 1M tokens: $0.40
Output speed: 122.97 tokens/s
Time to first token: 0.95 s
Supported plans: 3
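At these rates, per-request cost is simple arithmetic: tokens divided by one million, times the listed price. A minimal sketch, using the prices from the table above (the function name and the example token counts are illustrative, not part of any official SDK):

```python
# Listed GLM-4.7-Flash rates (USD per 1M tokens), taken from the table above.
INPUT_PRICE_PER_M = 0.07
OUTPUT_PRICE_PER_M = 0.40

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-token rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 4,000-token prompt with a 1,000-token completion.
# 4000 * 0.07 / 1e6 + 1000 * 0.40 / 1e6 = 0.00028 + 0.00040 = 0.00068 USD
print(f"${request_cost(4_000, 1_000):.6f}")
```

Note that output tokens cost roughly 5.7x more than input tokens here, so long completions dominate the bill for generation-heavy workloads.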

Benchmark history

Evaluations: 13 (all measured May 14, 2026)

TAU2: 0.92
Terminal-Bench Hard: 0.04
LCR: 0.15
IFBench: 0.46
SciCode: 0.26
HLE: 0.05
GPQA: 0.45
Artificial Analysis Coding Index: 11
Artificial Analysis Intelligence Index: 22.1
AIME 25: 0.95
LiveCodeBench: 0.89
MMLU-Pro: 0.86
Artificial Analysis Math Index: 95

CODPL speed

Provider ranking: 2

Plan availability

Products and plans that support this model: 1

GLM Coding Plan

GLM Coding Plan is a subscription service by Z AI (Zhipu AI) designed for AI-powered coding. It provides access to GLM models (GLM-5.1, GLM-5-Turbo, GLM-4.7, GLM-4.5-Air) through official integrations with 20+ coding tools including Claude Code, Cline, Kilo Code, Cursor, and VS Code. Plans include dedicated MCP tools for vision understanding, web search, and repository access.
