Z AI refers to Zhipu AI (智谱AI), a Chinese AI company that develops the GLM series of large language models and foundation models. It provides generative AI services, APIs, and applications such as ChatGLM, and positions itself as a key player in China's LLM ecosystem.
Region
China
Updated
May 14, 2026
CODPL speed
Median output speed (tokens/s)
51.98
Time to first token (p50)
625 ms
Samples
3
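As a rough illustration of how the two speed figures in the CODPL speed block combine, the sketch below estimates the wall-clock time of one response as time to first token plus output tokens divided by output speed. This is a simplifying assumption (constant decoding rate, no network or queueing overhead), not a method the listing describes, and the 500-token response length is hypothetical.

```python
def estimate_latency_s(ttft_ms: float, tokens_per_s: float, output_tokens: int) -> float:
    """Approximate wall-clock seconds for one response.

    Assumes (simplification) that tokens after the first arrive at a constant
    rate equal to the median output speed.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    # Convert TTFT from milliseconds to seconds, then add the decoding time.
    return ttft_ms / 1000.0 + output_tokens / tokens_per_s


# Median TTFT (625 ms) and median speed (51.98 tokens/s) from the block above;
# 500 output tokens is a hypothetical response length.
print(f"{estimate_latency_s(625, 51.98, 500):.2f} s")  # prints "10.24 s"
```

Under this model, long responses are dominated by the decoding term, so output speed matters more than TTFT once responses exceed a few dozen tokens.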
Model coverage
GLM 5V Turbo (Reasoning)
GLM 5V Turbo (Reasoning) is a multimodal model from Zhipu AI's GLM series, optimized for fast inference and strong reasoning capabilities. It is designed to handle complex tasks that require logical deduction and visual understanding.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
42.9
GLM-4.5 (Reasoning)
GLM-4.5 (Reasoning) is a model from Zhipu AI's GLM-4 series, specifically optimized for complex reasoning and problem-solving tasks. It likely employs chain-of-thought or similar techniques to enhance logical deduction and step-by-step analysis.
Input / 1M tokens
$0.60
Output tokens/s
47.24
First-token seconds
0.9s
Artificial Analysis Intelligence Index
26.4
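The "Input / 1M tokens" figures are list prices per million prompt tokens. A minimal sketch of turning one into a per-workload input cost is shown below; it covers input tokens only, because this listing does not state output-token prices, and the 250,000-token workload is a hypothetical example.

```python
def input_cost_usd(input_tokens: int, price_per_million_usd: float) -> float:
    """Cost of the input (prompt) tokens only, given a per-1M-token list price."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    # Scale the per-million price linearly by the number of tokens sent.
    return input_tokens / 1_000_000 * price_per_million_usd


# $0.60 per 1M input tokens is the GLM-4.5 (Reasoning) price listed above;
# 250,000 input tokens is a hypothetical workload.
print(f"${input_cost_usd(250_000, 0.60):.2f}")  # prints "$0.15"
```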
GLM-4.5-Air
GLM-4.5-Air is a lightweight and efficient variant of the GLM-4.5 series, optimized for fast response times and lower computational costs. It is suitable for applications requiring quick inference and high throughput.
Input / 1M tokens
$0.17
Output tokens/s
71.25
First-token seconds
1.45s
Artificial Analysis Intelligence Index
23.2
GLM-4.5V (Non-reasoning)
GLM-4.5V is a multimodal model from Z AI optimized for fast, non-reasoning tasks. It excels at processing visual inputs alongside text and is tuned for efficient, low-latency responses, particularly for Chinese language contexts.
Input / 1M tokens
$0.60
Output tokens/s
49.23
First-token seconds
30.02s
Artificial Analysis Intelligence Index
12.7
GLM-4.5V (Reasoning)
A multimodal reasoning model from Zhipu AI's GLM series. It is designed to process and reason across both text and visual inputs, excelling at tasks that require integrated understanding and logical deduction.
Input / 1M tokens
$0.60
Output tokens/s
49.92
First-token seconds
1.1s
Artificial Analysis Intelligence Index
15.1
GLM-4.6 (Non-reasoning)
GLM-4.6 (Non-reasoning) is a variant of the GLM-4 series optimized for general-purpose dialogue and content generation tasks, rather than complex reasoning. It offers fast response speeds and is suitable for high-throughput applications.
Input / 1M tokens
$0.60
Output tokens/s
37.89
First-token seconds
1.01s
Artificial Analysis Intelligence Index
30.2
GLM-4.6 (Reasoning)
GLM-4.6 (Reasoning) is a model from the GLM-4 series optimized for complex reasoning tasks. It excels at multi-step logical deduction and problem-solving, often employing chain-of-thought reasoning to enhance accuracy.
Input / 1M tokens
$0.55
Output tokens/s
35.27
First-token seconds
0.76s
Artificial Analysis Intelligence Index
32.5
GLM-4.6V (Non-reasoning)
GLM-4.6V is a multimodal model from Zhipu AI capable of processing both text and images. As a non-reasoning variant, it is optimized for general-purpose tasks, content generation, and multimodal understanding rather than complex chain-of-thought reasoning.
Input / 1M tokens
$0.30
Output tokens/s
28.76
First-token seconds
9.25s
Artificial Analysis Intelligence Index
17.1
GLM-4.6V (Reasoning)
A multimodal reasoning model from the GLM-4 series, designed for advanced visual understanding and complex logical inference tasks. It integrates vision capabilities with strong reasoning performance.
Input / 1M tokens
$0.30
Output tokens/s
37.2
First-token seconds
1.6s
Artificial Analysis Intelligence Index
23.4
GLM-4.7 (Non-reasoning)
GLM-4.7 (Non-reasoning) is a variant of the GLM-4 series from Z AI, optimized for general-purpose tasks without an explicit reasoning or chain-of-thought mode. It focuses on providing fast and cost-effective responses for standard conversational, coding, and everyday tasks.
Input / 1M tokens
$0.60
Output tokens/s
105.81
First-token seconds
0.69s
Artificial Analysis Intelligence Index
34.2
GLM-4.7 (Reasoning)
GLM-4.7 is a powerful reasoning model from Zhipu AI (Z AI), designed for complex logical and analytical tasks. It supports an ultra-long context window of 128K tokens and is capable of processing multimodal inputs.
Input / 1M tokens
$0.60
Output tokens/s
107.88
First-token seconds
0.8s
Artificial Analysis Intelligence Index
42.1
GLM-4.7-Flash (Non-reasoning)
GLM-4.7-Flash is a lightweight, high-speed variant of the GLM-4 series optimized for low-latency and cost-effective inference. As a non-reasoning model, it focuses on direct and rapid response generation rather than complex chain-of-thought processes. It is well-suited for applications requiring quick, efficient text generation.
Input / 1M tokens
$0.07
Output tokens/s
122.97
First-token seconds
0.95s
Artificial Analysis Intelligence Index
22.1
GLM-4.7-Flash (Reasoning)
GLM-4.7-Flash (Reasoning) is a lightweight, high-speed model from the GLM series, optimized for fast inference and strong reasoning capabilities. It is designed for applications requiring quick, logical responses and complex problem-solving.
Input / 1M tokens
$0.07
Output tokens/s
87.93
First-token seconds
0.87s
Artificial Analysis Intelligence Index
30.1
GLM-5 (Non-reasoning)
GLM-5 (Non-reasoning) is a variant of the GLM-5 series optimized for high-speed, low-latency responses. It excels in tasks requiring quick turnaround and cost efficiency, while maintaining strong capabilities in coding, multimodal understanding, and long-context processing.
Input / 1M tokens
$1.00
Output tokens/s
66.6
First-token seconds
1.36s
Artificial Analysis Intelligence Index
40.6
GLM-5 (Reasoning)
GLM-5 (Reasoning) is the latest generation large language model from Zhipu AI, specifically optimized for complex reasoning tasks. It features enhanced logical deduction and chain-of-thought capabilities, and is part of the multimodal GLM model family.
Input / 1M tokens
$1.00
Output tokens/s
84.04
First-token seconds
0.68s
Artificial Analysis Intelligence Index
49.8
GLM 5
Z AI develops the GLM series of large language models, including GLM-5 and GLM-5.1, designed for advanced AI applications like coding, reasoning, and multimodal tasks. These models are offered through the Z.ai platform and feature high parameter counts with efficient architectures.
GLM 5
Z.AI develops and offers advanced AI models, such as the GLM-5 series, which support multimodal inputs, complex coding, reasoning, and long-context tasks. The provider makes models available via API and through open-source releases on platforms like Hugging Face, focusing on research and deployment in the AI market.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
46.8
GLM-5.1 (Non-reasoning)
GLM-5.1 (Non-reasoning) is a variant of the GLM-5.1 model optimized for faster response times and cost-efficiency by omitting the dedicated reasoning/thinking process. It is suitable for general-purpose tasks, coding, and multimodal interactions where rapid output is prioritized over complex chain-of-thought reasoning.
Input / 1M tokens
$1.40
Output tokens/s
41.88
First-token seconds
1.16s
Artificial Analysis Intelligence Index
43.8
GLM-5.1 (Reasoning)
GLM-5.1 (Reasoning) is a large language model from Z AI (Zhipu AI) specifically optimized for complex reasoning tasks. It excels at multi-step logical deduction, problem-solving, and analysis, making it suitable for applications requiring deep thought and structured output.
Input / 1M tokens
$1.40
Output tokens/s
51.55
First-token seconds
0.88s
Artificial Analysis Intelligence Index
51.4
CodeGeeX4-ALL-9B
Z AI develops the CodeGeeX4 series of code models. CodeGeeX4-ALL-9B is a versatile model covering a range of software development scenarios, including code completion, code interpreter, web search, function calling, and repository-level Q&A.

Thinking... Make sure you are connected to GitHub server