Models

GLM-4.7-Flash (Non-reasoning)

GLM-4.7-Flash is a lightweight, high-speed variant of the GLM-4 series optimized for low-latency and cost-effective inference. As a non-reasoning model, it focuses on direct and rapid response generation rather than complex chain-of-thought processes. It is well-suited for applications requiring quick, efficient text generation.

FastCheapCoding

Input price / 1M tokens: $0.07
Output price / 1M tokens: $0.40
Output speed: 122.97 tokens/s
Time to first token: 0.95 s
Supported plans: 3
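At these rates, per-request cost is simple arithmetic: tokens divided by one million, times the listed price. A minimal sketch, using the prices from the table above (the function name and the example token counts are illustrative, not part of any official SDK):

```python
# Listed GLM-4.7-Flash rates (USD per 1M tokens), taken from the table above.
INPUT_PRICE_PER_M = 0.07
OUTPUT_PRICE_PER_M = 0.40

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-token rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 4,000-token prompt with a 1,000-token completion.
# 4000 * 0.07 / 1e6 + 1000 * 0.40 / 1e6 = 0.00028 + 0.00040 = 0.00068 USD
print(f"${request_cost(4_000, 1_000):.6f}")
```

Note that output tokens cost roughly 5.7x more than input tokens here, so long completions dominate the bill for generation-heavy workloads.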

Benchmark history

Evaluations: 13 (all measured May 14, 2026)

TAU2: 0.92
Terminal-Bench Hard: 0.04
LCR: 0.15
IFBench: 0.46
SciCode: 0.26
HLE: 0.05
GPQA: 0.45
Artificial Analysis Coding Index: 11
Artificial Analysis Intelligence Index: 22.1
AIME 25: 0.95
LiveCodeBench: 0.89
MMLU-Pro: 0.86
Artificial Analysis Math Index: 95

CODPL speed

Provider ranking: 2

Plan availability

Products and plans that support this model: 1

GLM Coding Plan

GLM Coding Plan is a subscription service by Z AI (Zhipu AI) designed for AI-powered coding. It provides access to GLM models (GLM-5.1, GLM-5-Turbo, GLM-4.7, GLM-4.5-Air) through official integrations with 20+ coding tools including Claude Code, Cline, Kilo Code, Cursor, and VS Code. Plans include dedicated MCP tools for vision understanding, web search, and repository access.
