China

阿里巴巴

Alibaba develops Tongyi Qianwen (Qwen) models and provides generative AI services via Alibaba Cloud. It is a major player in enterprise AI infrastructure and open-source LLMs.

Website

Products

Models

Available

Benchmarks

Region

China

Updated

May 14, 2026

Product coverage

Products from this provider

Token plan

阿里云百炼 Token Plan

Alibaba Cloud's Token Plan is an AI model subscription service that provides unified Credits for accessing multiple text and image generation models, compatible with mainstream AI programming and agent tools, offering tiered plans with monthly budget control and data security.

OpenClaude API · SupportedOpenAI API · SupportedCursor · Supported

Plans

Models

Updated

May 13, 2026

Model coverage

Models from this provider

阿里巴巴

QwQ 32B

QwQ 32B is a 32-billion parameter language model from Alibaba, designed to deliver strong reasoning and coding capabilities. It offers a balanced performance-to-cost ratio, making it suitable for a wide range of general-purpose and specialized tasks.

CodingReasoningCheap

Input / 1M tokens

$0.66

Output tokens/s

31.39

First-token seconds

0.46s

Artificial Analysis Intelligence Index

19.7

阿里巴巴

QwQ 32B-Preview

QwQ 32B-Preview is a 32-billion parameter reasoning model developed by Alibaba's Qwen team. It is specifically designed to excel at complex reasoning tasks, particularly in mathematics and coding, utilizing reinforcement learning to enhance its problem-solving capabilities. The model features a 'thinking' mode that allows it to break down problems step-by-step before providing a final answer.

ReasoningCodingFastCheap

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

15.2

阿里巴巴

Qwen Chat 14B

Qwen Chat 14B is a mid-sized, general-purpose conversational model from Alibaba's Qwen series. It offers a balanced performance between capability and efficiency, optimized for dialogue, reasoning, and code generation tasks.

CodingReasoningFast

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

7.4

阿里巴巴

Qwen Chat 72B

Qwen Chat 72B is a large-parameter chat model from Alibaba's Qwen series, optimized for conversational interactions. It features strong reasoning capabilities, supports long contexts, and is proficient in multiple languages including Chinese and English.

ReasoningLong contextCoding

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

8.8

Qwen Image

Qwen-Image-2.0

Alibaba's Qwen image generation model version 2.0.

Multimodal

Qwen Image

Qwen-Image-2.0-Pro

Alibaba's Qwen image generation model version 2.0 Pro.

Multimodal

Qwen1.5

Qwen1.5 Chat 110B

Qwen1.5 Chat 110B is a large-scale language model from Alibaba's Qwen series, featuring 110 billion parameters. It excels in complex reasoning, code generation, and supports long-context understanding and multi-modal inputs.

CodingReasoningLong contextMultimodal

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

9.5

Qwen2

Qwen2 Instruct 72B

Qwen2 Instruct 72B is a large-scale instruction-tuned language model from Alibaba's Qwen series. It features strong reasoning, code generation, and multilingual capabilities, optimized for complex instruction following and dialogue tasks.

CodingReasoningLong context

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

11.7

Qwen 2.5

Qwen2.5 Coder Instruct 32B

Qwen2.5 Coder Instruct 32B is a 32-billion parameter language model from Alibaba, specifically optimized for coding tasks. It excels at code generation, completion, and understanding across multiple programming languages, following instructions effectively for developer workflows.

CodingReasoning

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

12.9

Qwen 2.5

Qwen2.5 Coder Instruct 7B

Qwen2.5 Coder Instruct 7B is a specialized code generation model from Alibaba's Qwen series, optimized for tasks like code completion, generation, and debugging. As a 7B parameter model, it offers a balance of strong coding performance and efficient inference speed, making it suitable for deployment in resource-constrained environments.

CodingFastCheap

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

Qwen 2.5

Qwen2.5 Instruct 32B

Qwen2.5 Instruct 32B is a mid-sized, instruction-tuned language model from Alibaba's Qwen series. It excels at following instructions, multilingual tasks, and code generation while maintaining strong reasoning capabilities. The model supports a long context window of up to 128K tokens.

CodingReasoningLong context

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

13.2

Qwen 2.5

Qwen2.5 Instruct 72B

Qwen2.5 Instruct 72B is a large language model developed by Alibaba's Qwen team, optimized for instruction following and dialogue. It features strong multilingual capabilities, particularly in Chinese, and excels at complex reasoning and code generation tasks. The model supports long context windows for processing extensive information.

ReasoningCodingLong context

Input / 1M tokens

$0.36

Output tokens/s

55.29

First-token seconds

1.06s

Artificial Analysis Intelligence Index

15.6

Qwen 2.5

Qwen2.5 Max

Qwen2.5 Max is Alibaba Cloud's flagship large language model, excelling in complex reasoning, code generation, and multimodal understanding. It supports an extremely long context window and is designed for high-performance enterprise and research applications.

CodingReasoningLong contextMultimodal

Input / 1M tokens

$1.60

Output tokens/s

48.63

First-token seconds

1.14s

Artificial Analysis Intelligence Index

16.3

Qwen 2.5

Qwen2.5 Turbo

Qwen2.5 Turbo is a high-performance, cost-effective large language model optimized for rapid response times. It is part of Alibaba's Qwen series, designed to deliver strong general capabilities, including coding and reasoning, at a competitive price point.

FastCheapCodingReasoning

Input / 1M tokens

$0.05

Output tokens/s

70.19

First-token seconds

1.31s

Artificial Analysis Intelligence Index

Qwen3

Qwen3 0.6B (Non-reasoning)

Qwen3 0.6B is a lightweight, non-reasoning variant of the Qwen3 series with only 0.6 billion parameters. It is optimized for fast inference, low latency, and minimal resource consumption, making it suitable for edge deployment, simple conversational tasks, and applications requiring rapid response times.

FastCheap

Input / 1M tokens

$0.11

Output tokens/s

224.09

First-token seconds

0.89s

Artificial Analysis Intelligence Index

5.7

Qwen3

Qwen3 0.6B (Reasoning)

A lightweight reasoning model from the Qwen3 series, optimized for fast inference and cost-effective deployment. It excels in logical reasoning tasks with a focus on chain-of-thought capabilities.

ReasoningFastCheap

Input / 1M tokens

$0.11

Output tokens/s

225.26

First-token seconds

0.92s

Artificial Analysis Intelligence Index

6.5

Qwen3

Qwen3 1.7B (Non-reasoning)

Qwen3 1.7B is a lightweight language model from Alibaba's Qwen series, optimized for fast and efficient inference. It is designed for non-reasoning tasks, providing quick responses with minimal computational resources.

FastCheap

Input / 1M tokens

$0.11

Output tokens/s

140.28

First-token seconds

1.02s

Artificial Analysis Intelligence Index

6.8

Qwen3

Qwen3 1.7B (Reasoning)

A compact 1.7B parameter model from Alibaba's Qwen3 series, optimized for efficient reasoning tasks. It is designed to deliver strong logical and analytical performance in resource-constrained environments, offering a balance of speed and capability.

ReasoningFastCheap

Input / 1M tokens

$0.11

Output tokens/s

139.17

First-token seconds

0.99s

Artificial Analysis Intelligence Index

Qwen3

Qwen3 14B (Non-reasoning)

Qwen3 14B is a 14-billion parameter model from Alibaba's Qwen3 series, optimized for general-purpose dialogue and instruction following. As a non-reasoning variant, it focuses on efficient and responsive text generation, making it suitable for applications requiring quick, cost-effective, and high-quality conversational AI.

FastCheapLong context

Input / 1M tokens

$0.235

Output tokens/s

64.21

First-token seconds

1.16s

Artificial Analysis Intelligence Index

12.8

Qwen3

Qwen3 14B (Reasoning)

Qwen3 14B (Reasoning) is a 14-billion parameter model from Alibaba's Qwen3 series, specifically optimized for complex reasoning tasks. It excels at chain-of-thought and step-by-step logical problem-solving, offering a strong balance between advanced reasoning capabilities and computational efficiency.

ReasoningFastCheapLong context

Input / 1M tokens

$0.235

Output tokens/s

64.76

First-token seconds

1.14s

Artificial Analysis Intelligence Index

16.2

Qwen3

Qwen3 235B A22B (Non-reasoning)

Qwen3 235B A22B is a large-scale Mixture-of-Experts (MoE) language model from Alibaba's Qwen series, with a total of 235 billion parameters but only 22 billion activated per inference. This non-reasoning variant is optimized for general-purpose tasks, offering strong multilingual capabilities, coding proficiency, and efficient performance due to its MoE architecture.

CodingReasoningFastLong contextMultimodal

Input / 1M tokens

$0.45

Output tokens/s

69.11

First-token seconds

1.2s

Artificial Analysis Intelligence Index

Qwen3

Qwen3 235B A22B (Reasoning)

Qwen3 235B A22B (Reasoning) is a large-scale language model from Alibaba's Qwen3 series, optimized for complex reasoning tasks. It utilizes a Mixture-of-Experts (MoE) architecture with 235B total parameters and 22B activated parameters, balancing high performance with computational efficiency. The model excels in instruction following and multi-step logical reasoning.

ReasoningCodingLong context

Input / 1M tokens

$0.70

Output tokens/s

69.04

First-token seconds

1.31s

Artificial Analysis Intelligence Index

19.8

Qwen3

Qwen3 235B A22B 2507 (Reasoning)

This is a reasoning-optimized variant of the Qwen3 235B model from Alibaba Cloud. It is designed to excel in complex logical, mathematical, and coding tasks that require multi-step reasoning. As a large-scale model, it supports long context windows and is part of the advanced Qwen3 series.

ReasoningCodingLong context

Input / 1M tokens

$0.40

Output tokens/s

First-token seconds

1.21s

Artificial Analysis Intelligence Index

29.5

Qwen3

Qwen3 235B A22B 2507 Instruct

Qwen3 235B A22B is a large-scale Mixture-of-Experts (MoE) language model from Alibaba's Qwen series. It features 235 billion total parameters with 22 billion activated per token, designed for strong instruction following, complex reasoning, and multilingual tasks.

ReasoningCodingLong context

Input / 1M tokens

$0.20

Output tokens/s

68.67

First-token seconds

1.25s

Artificial Analysis Intelligence Index

Qwen3

Qwen3 30B A3B (Non-reasoning)

Qwen3 30B A3B is a 30-billion parameter model from Alibaba's Qwen3 series, optimized for general-purpose instruction following and fast response generation. As a non-reasoning variant, it prioritizes efficiency and speed over complex chain-of-thought tasks, making it suitable for cost-sensitive and latency-critical applications.

CodingFastCheapMultimodal

Input / 1M tokens

$0.08

Output tokens/s

67.46

First-token seconds

1.35s

Artificial Analysis Intelligence Index

12.5

Qwen3

Qwen3 30B A3B (Reasoning)

Qwen3 30B A3B is a reasoning-optimized language model from Alibaba, designed for enhanced logical inference and problem-solving tasks.

Reasoning

Input / 1M tokens

$0.09

Output tokens/s

67.28

First-token seconds

1.17s

Artificial Analysis Intelligence Index

15.3

Qwen3

Qwen3 30B A3B 2507 (Reasoning)

This is a 30-billion parameter reasoning model from Alibaba's Qwen3 series, optimized for complex logical and analytical tasks. It features enhanced chain-of-thought capabilities to improve accuracy in multi-step problem-solving.

ReasoningCodingLong context

Input / 1M tokens

$0.28

Output tokens/s

148.45

First-token seconds

1.05s

Artificial Analysis Intelligence Index

22.4

Qwen3

Qwen3 30B A3B 2507 Instruct

Qwen3 30B A3B is a 30-billion parameter instruction-tuned model from Alibaba's Qwen3 series, likely utilizing a Mixture-of-Experts architecture with 3 billion active parameters. It is optimized for strong instruction following, reasoning, and multilingual (especially Chinese) performance, balancing capability with inference efficiency.

CodingReasoningFastCheap

Input / 1M tokens

$0.15

Output tokens/s

122.46

First-token seconds

1.12s

Artificial Analysis Intelligence Index

Qwen3

Qwen3 32B (Non-reasoning)

Qwen3 32B (Non-reasoning) is a 32-billion parameter instruction-tuned model from Alibaba's Qwen series. It is designed for general-purpose dialogue and content generation, balancing performance and efficiency. This model excels at following instructions and handling a wide range of tasks without specialized reasoning modes.

ReasoningLong context

Input / 1M tokens

$0.15

Output tokens/s

104.69

First-token seconds

1.1s

Artificial Analysis Intelligence Index

14.5

Qwen3

Qwen3 32B (Reasoning)

Qwen3 32B (Reasoning) is a 32-billion parameter model from Alibaba's Qwen3 series, specifically optimized for complex reasoning tasks. It excels in chain-of-thought processes, logical deduction, and problem-solving, while also maintaining strong coding and long-context capabilities.

ReasoningCodingLong context

Input / 1M tokens

$0.195

Output tokens/s

103.45

First-token seconds

1.04s

Artificial Analysis Intelligence Index

16.5

Qwen3

Qwen3 4B (Non-reasoning)

Qwen3 4B (Non-reasoning) is a lightweight, 4-billion parameter language model from Alibaba's Qwen3 series, optimized for fast and cost-effective inference. It is designed for general-purpose tasks and edge deployment, offering a balance of performance and efficiency without the overhead of complex reasoning chains.

FastCheap

Input / 1M tokens

$0.11

Output tokens/s

104.23

First-token seconds

0.98s

Artificial Analysis Intelligence Index

12.5

Qwen3

Qwen3 4B (Reasoning)

Qwen3 4B (Reasoning) is a compact 4-billion parameter model from Alibaba's Qwen3 series, optimized for reasoning tasks. It likely incorporates a chain-of-thought or thinking mode to enhance logical problem-solving while maintaining low latency and cost. This model is suitable for deployment in resource-constrained environments requiring efficient reasoning capabilities.

ReasoningFastCheapLong context

Input / 1M tokens

$0.11

Output tokens/s

103.85

First-token seconds

Artificial Analysis Intelligence Index

14.2

Qwen3

Qwen3 4B 2507 (Reasoning)

A lightweight 4B-parameter reasoning model from Alibaba's Qwen3 series, optimized for instruction following and logical reasoning tasks. It offers a balance of performance and efficiency for resource-constrained deployments.

ReasoningFastCheapLong context

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

18.2

Qwen3

Qwen3 4B 2507 Instruct

Qwen3 4B is a lightweight, efficient instruction-tuned model from Alibaba's Qwen series. It is optimized for fast inference and low-cost deployment while maintaining strong performance in following instructions and general tasks, particularly for Chinese language processing.

FastCheapReasoning

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

12.9

Qwen3

Qwen3 8B (Non-reasoning)

Qwen3 8B (Non-reasoning) is an 8-billion parameter instruction-tuned model from Alibaba's Qwen3 series, optimized for general-purpose dialogue and instruction-following tasks. It is designed for fast response speeds and cost-effective deployment, making it suitable for applications requiring efficient and capable language understanding without complex reasoning chains.

CodingFastCheap

Input / 1M tokens

$0.18

Output tokens/s

92.36

First-token seconds

1.03s

Artificial Analysis Intelligence Index

10.6

Qwen3

Qwen3 8B (Reasoning)

Qwen3 8B (Reasoning) is a lightweight, 8-billion parameter model from Alibaba's Qwen3 series, optimized for instruction following and reasoning tasks. It delivers strong logical and analytical performance while maintaining fast inference speeds suitable for real-time applications.

ReasoningFastCheap

Input / 1M tokens

$0.11

Output tokens/s

87.15

First-token seconds

1.03s

Artificial Analysis Intelligence Index

13.2

Qwen3

Qwen3 Coder 30B A3B Instruct

Qwen3 Coder 30B A3B Instruct is a code-specialized model from Alibaba's Qwen3 series. It features a 30B total parameter size with a 3B active parameter architecture (likely a Mixture-of-Experts design), optimized for code generation, understanding, and instruction following.

CodingReasoningFast

Input / 1M tokens

$0.19

Output tokens/s

104.32

First-token seconds

1.5s

Artificial Analysis Intelligence Index

Qwen3

Qwen3 Coder 480B A35B Instruct

Qwen3 Coder 480B A35B Instruct is a large-scale, code-specialized language model from Alibaba's Qwen series. It features a Mixture-of-Experts (MoE) architecture with 480 billion total parameters and 35 billion active parameters, designed for high-performance code generation, understanding, and instruction following.

CodingReasoning

Input / 1M tokens

$0.30

Output tokens/s

68.64

First-token seconds

1.66s

Artificial Analysis Intelligence Index

24.8

Qwen3

Qwen3 Coder Next

Alibaba Group, through its Alibaba Cloud division, develops and offers the Qwen series of large language models for AI applications. The Qwen3-Coder-Next model is a coding-focused AI model with strong agentic capabilities, trained on executable task synthesis and reinforcement learning. Alibaba provides AI platform services via Alibaba Cloud Model Studio.

Input / 1M tokens

$0.35

Output tokens/s

152.98

First-token seconds

0.85s

Artificial Analysis Intelligence Index

28.3

Qwen3

Qwen3 Max

Qwen3 Max is Alibaba Cloud's flagship large language model, designed for high-performance general tasks. It features strong multimodal understanding, a 128K long context window, and excels in complex reasoning and code generation.

MultimodalLong contextCodingReasoning

Input / 1M tokens

$1.66

Output tokens/s

32.81

First-token seconds

1.93s

Artificial Analysis Intelligence Index

31.4

Qwen3

Qwen3 Max (Preview)

Qwen3 Max (Preview) is the latest flagship model from Alibaba's Qwen series, designed for high-performance enterprise applications. It features enhanced reasoning and coding capabilities, supports an ultra-long context window, and is optimized for complex analytical tasks.

ReasoningCodingLong contextMultimodal

Input / 1M tokens

$1.20

Output tokens/s

44.87

First-token seconds

1.77s

Artificial Analysis Intelligence Index

26.1

Qwen3

Qwen3 Max Thinking

Qwen3 Max Thinking is a high-end model from Alibaba's Qwen3 series, optimized for complex reasoning tasks. It features an enhanced thinking mode for deeper analysis and supports long-context processing.

ReasoningCodingLong context

Input / 1M tokens

$1.20

Output tokens/s

48.94

First-token seconds

1.44s

Artificial Analysis Intelligence Index

39.8

Qwen3

Qwen3 Max Thinking (Preview)

Qwen3 Max Thinking (Preview) is an advanced AI model from Alibaba, designed for enhanced reasoning and chain-of-thought capabilities. It excels in complex problem-solving and logical tasks, with a focus on deep thinking modes.

Reasoning

Input / 1M tokens

$1.20

Output tokens/s

45.61

First-token seconds

1.95s

Artificial Analysis Intelligence Index

32.5

Qwen3

Qwen3 Next 80B A3B (Reasoning)

Qwen3 Next 80B A3B (Reasoning) is a large language model from Alibaba's Qwen series, optimized for complex reasoning tasks. It utilizes a Mixture-of-Experts (MoE) architecture with 80 billion total parameters but only 3 billion active parameters per inference, offering a strong balance between high performance and computational efficiency.

ReasoningFastCheap

Input / 1M tokens

$0.50

Output tokens/s

169.83

First-token seconds

1.07s

Artificial Analysis Intelligence Index

26.7

Qwen3

Qwen3 Next 80B A3B Instruct

Qwen3 Next 80B A3B Instruct is a large language model from Alibaba's Qwen series, featuring a sparse activation architecture (likely 80B total parameters with ~3B active parameters per token). This design aims to deliver strong reasoning and coding capabilities while significantly improving inference speed and cost-efficiency compared to dense models of similar total size.

CodingReasoningFastCheapLong context

Input / 1M tokens

$0.50

Output tokens/s

167.41

First-token seconds

1.06s

Artificial Analysis Intelligence Index

20.1

Qwen3

Qwen3 Omni 30B A3B (Reasoning)

Qwen3 Omni 30B A3B (Reasoning) is a multimodal model from Alibaba's Qwen3 series, optimized for complex reasoning tasks. It processes both text and images, leveraging a 30-billion parameter architecture with 3 billion active parameters for efficient inference.

ReasoningMultimodal

Input / 1M tokens

$0.25

Output tokens/s

89.62

First-token seconds

1.07s

Artificial Analysis Intelligence Index

15.6

Qwen3

Qwen3 Omni 30B A3B Instruct

Qwen3 Omni 30B A3B Instruct is a multimodal model from Alibaba's Qwen3 series, designed for instruction-following tasks. It features a 30-billion parameter architecture with 3 billion active parameters, balancing performance and efficiency. The model supports both text and image inputs, making it suitable for diverse multimodal applications.

MultimodalReasoningCoding

Input / 1M tokens

$0.25

Output tokens/s

110.46

First-token seconds

0.98s

Artificial Analysis Intelligence Index

10.7

Qwen3

Qwen3 VL 235B A22B (Reasoning)

A large vision-language model from Alibaba with 235B total parameters and 22B activated parameters, focused on enhanced reasoning capabilities. It combines visual understanding with language generation, suitable for complex multimodal tasks requiring reasoning.

ReasoningMultimodal

Input / 1M tokens

$0.84

Output tokens/s

34.36

First-token seconds

1.29s

Artificial Analysis Intelligence Index

27.6

Qwen3

Qwen3 VL 235B A22B Instruct

Qwen3 VL 235B A22B Instruct is a large-scale multimodal model from Alibaba's Qwen series, featuring a 235B total parameter Mixture-of-Experts (MoE) architecture with 22B active parameters. It is designed for advanced visual and language understanding tasks, offering strong reasoning capabilities while maintaining efficiency.

MultimodalReasoning

Input / 1M tokens

$0.30

Output tokens/s

46.74

First-token seconds

1.17s

Artificial Analysis Intelligence Index

20.8

Qwen3

Qwen3 VL 30B A3B (Reasoning)

A multimodal vision-language model from Alibaba's Qwen3 series, optimized for reasoning tasks. It features a 30B total parameter architecture with 3B activated parameters, suggesting a Mixture-of-Experts design for efficient inference.

MultimodalReasoningFast

Input / 1M tokens

$0.20

Output tokens/s

125.65

First-token seconds

1.07s

Artificial Analysis Intelligence Index

19.7

Qwen3

Qwen3 VL 30B A3B Instruct

Qwen3 VL 30B A3B Instruct is a multimodal vision-language model from Alibaba's Qwen3 series. It is designed to process both image and text inputs, likely leveraging a Mixture-of-Experts architecture (30B total parameters, 3B active) for efficient inference. The model is instruction-tuned for following user prompts in visual and language tasks.

MultimodalFastCheap

Input / 1M tokens

$0.20

Output tokens/s

122.57

First-token seconds

1.03s

Artificial Analysis Intelligence Index

Qwen3

Qwen3 VL 32B (Reasoning)

Qwen3 VL 32B (Reasoning) is a multimodal vision-language model from Alibaba's Qwen series, optimized for complex reasoning tasks. It integrates visual understanding with strong logical and analytical capabilities, suitable for tasks requiring visual input and step-by-step reasoning.

MultimodalReasoning

Input / 1M tokens

$0.70

Output tokens/s

96.7

First-token seconds

1.4s

Artificial Analysis Intelligence Index

24.7

Qwen3

Qwen3 VL 32B Instruct

A 32-billion parameter vision-language model from Alibaba's Qwen3 series, excelling at image understanding, visual question answering, and multimodal reasoning while maintaining good response speed and cost efficiency.

MultimodalReasoningFastCheap

Input / 1M tokens

$0.70

Output tokens/s

54.88

First-token seconds

1.15s

Artificial Analysis Intelligence Index

17.2

Qwen3

Qwen3 VL 4B (Reasoning)

A compact 4-billion parameter multimodal model from the Qwen3 VL series, optimized for visual reasoning tasks. It processes both images and text to perform complex reasoning, making it suitable for applications requiring visual understanding and logical inference.

MultimodalReasoningFastCheap

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

13.7

Qwen3

Qwen3 VL 4B Instruct

A lightweight vision-language model from Alibaba's Qwen3 series with 4 billion parameters. It is designed for efficient image understanding and multimodal tasks, offering fast inference and low deployment costs, suitable for edge or resource-constrained scenarios.

MultimodalFastCheap

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

9.6

Qwen3

Qwen3 VL 8B (Reasoning)

Qwen3 VL 8B (Reasoning) is a lightweight, multimodal vision-language model from Alibaba's Qwen series, optimized for enhanced reasoning capabilities. It efficiently processes both text and images, making it suitable for tasks requiring visual understanding and logical inference.

MultimodalReasoningFastCheapCoding

Input / 1M tokens

$0.18

Output tokens/s

134.73

First-token seconds

1.08s

Artificial Analysis Intelligence Index

16.7

Qwen3

Qwen3 VL 8B Instruct

Qwen3 VL 8B Instruct is a lightweight, 8-billion parameter vision-language model from Alibaba's Qwen series. It is designed for efficient multimodal understanding and instruction following, excelling at tasks that require processing both text and images.

MultimodalFastCheap

Input / 1M tokens

$0.18

Output tokens/s

145.61

First-token seconds

1.09s

Artificial Analysis Intelligence Index

14.3

Qwen3.5

Qwen3.5 0.8B (Non-reasoning)

A lightweight, 0.8 billion parameter model from the Qwen3.5 series, optimized for fast inference and low-cost deployment. It is designed for simple, non-reasoning tasks and is suitable for edge devices or applications requiring rapid response times.

FastCheapReasoning

Input / 1M tokens

$0.01

Output tokens/s

105.07

First-token seconds

0.29s

Artificial Analysis Intelligence Index

9.9

Qwen3.5

Qwen3.5 0.8B (Reasoning)

Qwen3.5 0.8B (Reasoning) is a lightweight, 0.8-billion parameter model from Alibaba's Qwen3.5 series, specifically optimized for reasoning tasks. It is designed to deliver strong logical and analytical performance while maintaining a small footprint suitable for edge deployment or low-latency applications.

ReasoningFastCheap

Input / 1M tokens

$0.01

Artificial Analysis Intelligence Index

10.5

Qwen3.5

Qwen3.5 122B A10B (Non-reasoning)

A large language model from Alibaba's Qwen3.5 series, featuring 122B total parameters with 10B activated (A10B). This non-reasoning variant is optimized for high efficiency and low latency in general-purpose tasks, offering strong multilingual and multimodal capabilities without the overhead of complex reasoning chains.

CodingFastCheapMultimodal

Input / 1M tokens

$0.40

Output tokens/s

154.57

First-token seconds

1.11s

Artificial Analysis Intelligence Index

35.9

Qwen3.5

Qwen3.5 122B A10B (Reasoning)

Qwen3.5 122B A10B is a large-scale Mixture-of-Experts (MoE) model from Alibaba's Qwen series, optimized for complex reasoning tasks. It features a 122 billion parameter architecture with 10 billion active parameters, balancing high performance with computational efficiency. The model supports an extremely long context window and excels in code generation and logical analysis.

CodingReasoningLong context

Input / 1M tokens

$0.40

Output tokens/s

153.85

First-token seconds

1.02s

Artificial Analysis Intelligence Index

41.6

Qwen3.5

Qwen3.5 27B (Non-reasoning)

Qwen3.5 27B (Non-reasoning) is a mid-sized language model from Alibaba's Qwen series, optimized for general-purpose tasks without specialized reasoning capabilities. It supports long contexts and is designed for efficient deployment.

CodingFastCheapLong context

Input / 1M tokens

$0.28

Output tokens/s

89.68

First-token seconds

1.39s

Artificial Analysis Intelligence Index

37.2

Qwen3.5

Qwen3.5 27B (Reasoning)

Qwen3.5 27B (Reasoning) is a 27-billion parameter language model from Alibaba, optimized for reasoning tasks with enhanced chain-of-thought capabilities. It is designed for complex problem-solving and logical inference.

ReasoningCodingLong context

Input / 1M tokens

$0.30

Output tokens/s

90.12

First-token seconds

1.36s

Artificial Analysis Intelligence Index

42.1

Qwen3.5

Qwen3.5 2B (Non-reasoning)

A lightweight, 2-billion parameter model from the Qwen3.5 series optimized for fast and cost-effective inference. It is designed for general-purpose conversational tasks and simple applications where low latency and high throughput are prioritized over complex reasoning.

FastCheap

Input / 1M tokens

$0.02

Output tokens/s

324.32

First-token seconds

0.23s

Artificial Analysis Intelligence Index

14.7

Qwen3.5

Qwen3.5 2B (Reasoning)

Qwen3.5 2B (Reasoning) is a lightweight, 2-billion parameter model from Alibaba's Qwen series, specifically optimized for reasoning tasks. It delivers efficient and fast inference while maintaining strong performance on logical and analytical problems.

ReasoningFastCheapLong context

Input / 1M tokens

$0.02

Artificial Analysis Intelligence Index

16.3

Qwen3.5

Qwen3.5 35B A3B (Non-reasoning)

This is a high-efficiency Mixture-of-Experts (MoE) model from the Qwen3.5 series, featuring 35 billion total parameters with only 3 billion activated per inference. It is optimized for fast response speeds and is the non-reasoning variant, suitable for general-purpose tasks.

FastCheapLong context

Input / 1M tokens

$0.25

Output tokens/s

129.72

First-token seconds

1.17s

Artificial Analysis Intelligence Index

30.7

Qwen3.5

Qwen3.5 35B A3B (Reasoning)

A 35B parameter reasoning model from the Qwen3.5 series, utilizing a Mixture-of-Experts architecture with 3B activated parameters for efficient inference. Optimized for complex reasoning tasks while maintaining strong performance in coding and long-context understanding.

ReasoningCodingFastLong contextMultimodal

Input / 1M tokens

$0.25

Output tokens/s

122.98

First-token seconds

1.09s

Artificial Analysis Intelligence Index

37.1

Qwen3.5

Qwen3.5 397B A17B (Non-reasoning)

Qwen3.5 397B A17B (Non-reasoning) is a large-scale Mixture-of-Experts (MoE) model from Alibaba's Qwen series. As a non-reasoning variant, it is optimized for general-purpose tasks, offering high performance and efficiency without the overhead of extended chain-of-thought processes. It is well-suited for applications requiring fast and capable responses across coding, instruction following, and general knowledge tasks.

CodingFastCheap

Input / 1M tokens

$0.60

Output tokens/s

52.04

First-token seconds

1.95s

Artificial Analysis Intelligence Index

40.1

Qwen3.5

Qwen3.5 397B A17B (Reasoning)

Qwen3.5 397B A17B is a large-scale reasoning model from Alibaba's Qwen series, featuring 397 billion total parameters with 17 billion active parameters. It is optimized for complex reasoning tasks, multi-step problem solving, and supports a wide range of languages including Chinese and English.

CodingReasoningLong context

Input / 1M tokens

$0.60

Output tokens/s

51.78

First-token seconds

1.86s

Artificial Analysis Intelligence Index

Qwen3.5

Qwen3.5 4B (Non-reasoning)

A lightweight, 4-billion parameter model from the Qwen3.5 series, optimized for efficiency and speed. It is designed for fast inference and low-cost deployment, suitable for edge devices and applications requiring quick responses.

CodingFastCheap

Input / 1M tokens

$0.03

Output tokens/s

203.41

First-token seconds

0.23s

Artificial Analysis Intelligence Index

22.6

Qwen3.5

Qwen3.5 4B (Reasoning)

Qwen3.5 4B (Reasoning) is a lightweight, 4-billion parameter model from Alibaba's Qwen3.5 series, specifically optimized for enhanced reasoning and chain-of-thought capabilities. It offers a strong balance of performance and efficiency, making it suitable for fast inference tasks that require logical deduction.

ReasoningFastCheap

Input / 1M tokens

$0.03

Output tokens/s

201.95

First-token seconds

0.24s

Artificial Analysis Intelligence Index

27.1

Qwen3.5

Qwen3.5 9B (Non-reasoning)

Qwen3.5 9B (Non-reasoning) is a compact language model from Alibaba's Qwen series, optimized for fast inference and low-cost deployment. It supports coding and multilingual tasks with a 9B parameter size, making it suitable for edge devices and real-time applications.

CodingFastCheap

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

27.3

Qwen3.5

Qwen3.5 9B (Reasoning)

Qwen3.5 9B (Reasoning) is a lightweight, efficient model from Alibaba's Qwen series, specifically optimized for enhanced reasoning and chain-of-thought capabilities. It offers a strong balance of performance, speed, and cost-effectiveness for complex problem-solving tasks.

ReasoningCodingFastCheapLong context

Input / 1M tokens

$0.10

Output tokens/s

59.31

First-token seconds

0.45s

Artificial Analysis Intelligence Index

32.4

Qwen3.5

Qwen3.5 Omni Flash

Qwen3.5 Omni Flash is a multimodal model from Alibaba's Qwen series, designed for fast and efficient processing of text, images, and potentially other modalities. It is optimized for low-latency applications, making it suitable for real-time interactive scenarios.

MultimodalFastCodingReasoningLong context

Input / 1M tokens

$0.10

Output tokens/s

253.73

First-token seconds

1.08s

Artificial Analysis Intelligence Index

25.9

Qwen3.5

Qwen3.5 Omni Plus

Qwen3.5 Omni Plus is a multimodal large language model from Alibaba's Qwen series, designed for enhanced performance across text, image, and potentially audio inputs. It features strong reasoning and coding capabilities, suitable for complex tasks requiring integrated understanding of different data types.

CodingReasoningMultimodal

Input / 1M tokens

$0.40

Output tokens/s

54.68

First-token seconds

1.43s

Artificial Analysis Intelligence Index

38.6

Qwen3.5

Qwen3.5 Plus 2026-02-15

Alibaba's Qwen3.5 Plus multimodal model supporting text, image, and video inputs with 1M-token context window for reasoning and agent workflows.

CodingReasoningLong contextMultimodal

Input / 1M tokens

$0.40

Qwen3.6

Qwen3.6 27B (Non-reasoning)

Qwen3.6 27B (Non-reasoning) is a mid-sized, general-purpose language model from Alibaba's Qwen3 series. It is optimized for fast response times and cost-efficiency, making it suitable for a wide range of standard tasks without the overhead of complex reasoning chains.

CodingFastCheapMultimodalLong context

Input / 1M tokens

$0.60

Output tokens/s

63.62

First-token seconds

1.39s

Artificial Analysis Intelligence Index

37.1

Qwen3.6

Qwen3.6 27B (Reasoning)

Qwen3.6 27B (Reasoning) is a mid-sized language model from Alibaba's Qwen series, optimized for complex reasoning tasks. It balances strong logical and analytical capabilities with efficient inference speed, making it suitable for applications requiring step-by-step problem-solving.

ReasoningCodingFastCheap

Input / 1M tokens

$0.60

Output tokens/s

65.57

First-token seconds

1.54s

Artificial Analysis Intelligence Index

45.8

Qwen3.6

Qwen3.6 35B A3B (Non-reasoning)

This is a 35-billion parameter model from Alibaba's Qwen series, optimized for fast and efficient inference without the overhead of a dedicated reasoning mode. It maintains strong capabilities in coding and multimodal tasks while offering a balance between performance and cost-effectiveness.

CodingMultimodalFastCheap

Input / 1M tokens

$0.375

Output tokens/s

183.3

First-token seconds

1.4s

Artificial Analysis Intelligence Index

31.5

Qwen3.6

Qwen3.6 35B A3B (Reasoning)

Qwen3.6 35B A3B is a reasoning-focused model from Alibaba's Qwen series. It is designed for complex problem-solving tasks, leveraging a 35B parameter architecture with optimized activation (A3B) for efficient inference.

ReasoningLong context

Input / 1M tokens

$0.248

Output tokens/s

181.98

First-token seconds

1.45s

Artificial Analysis Intelligence Index

43.5

Qwen3.6

Qwen3.6 Max Preview

Qwen3.6 Max Preview is the latest flagship model in Alibaba's Qwen series, designed to deliver top-tier performance in complex reasoning and code generation tasks. As a preview of the Max variant, it likely represents the most capable version within the Qwen3.6 family, optimized for high-quality, long-form outputs.

CodingReasoningLong context

Input / 1M tokens

$1.30

Output tokens/s

35.75

First-token seconds

2.49s

Artificial Analysis Intelligence Index

51.8

Qwen3.6

Qwen3.6 Plus

Qwen3.6 Plus is a large language model developed by Alibaba as part of the Qwen series. It is designed for general-purpose tasks and likely supports long-context processing and multilingual capabilities.

MultimodalLong context

Input / 1M tokens

$0.50

Output tokens/s

52.42

First-token seconds

1.88s

Artificial Analysis Intelligence Index

Wan2.7

Wan2.7-Image

Alibaba's Wan2.7 image generation model.

Multimodal

Wan2.7

Wan2.7-Image-Pro

Alibaba's Wan2.7 image generation model Pro version.

Multimodal

Discussion

Thinking... Make sure you are connected to GitHub server