Qwen3 VL 30B A3B Instruct

Alibaba, China

Qwen3 VL 30B A3B Instruct is a multimodal vision-language model from Alibaba's Qwen3 series. It accepts both image and text inputs. As the "30B A3B" in its name indicates, it uses a Mixture-of-Experts architecture with about 30B total parameters, of which about 3B are active per token, which keeps inference efficient. The model is instruction-tuned to follow user prompts in visual and language tasks.

Multimodal · Fast · Cheap

Input / 1M tokens

$0.20

Output / 1M tokens

$0.60

Output tokens/s

123.83

First-token seconds

1.01s
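The listed rates and speed figures can be turned into a per-request estimate. The following is a minimal sketch, assuming the prices apply uniformly per token and that the throughput figure holds steadily after the first token; the function names and the example request size are hypothetical, and the page does not say how image inputs are counted as tokens.

```python
# Listed rates from the page, in USD per 1M tokens.
INPUT_USD_PER_M = 0.20
OUTPUT_USD_PER_M = 0.60

# Listed speed figures from the page.
OUTPUT_TOKENS_PER_S = 123.83
FIRST_TOKEN_S = 1.01


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def estimated_latency_s(output_tokens: int) -> float:
    """Rough estimate: time to first token plus steady-state decoding time."""
    return FIRST_TOKEN_S + output_tokens / OUTPUT_TOKENS_PER_S


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 500 output tokens.
    print(f"cost: ${request_cost_usd(2_000, 500):.6f}")
    print(f"latency: {estimated_latency_s(500):.2f} s")
```

At these rates, output tokens cost three times as much as input tokens, so long generations tend to dominate the bill more than long prompts do.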

Benchmark history

Evaluations

Benchmark: Score (all measured May 29, 2026)

TAU2: 0.19
Terminal-Bench Hard: 0.06
LCR: 0.24
IFBench: 0.33
AIME 25: 0.72
SciCode: 0.31
LiveCodeBench: 0.48
HLE: 0.06
GPQA: 0.7
MMLU-Pro: 0.76
Artificial Analysis Math Index: 72.3
Artificial Analysis Coding Index: 14.3
Artificial Analysis Intelligence Index: (no score listed)
AIME: 0.94
MATH-500: 0.98
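The scores above appear to use two scales: most benchmarks are reported as fractions between 0 and 1, while the Artificial Analysis indices look like 0 to 100 values. The sketch below puts them on a common percent scale for comparison. The split into two groups is an assumption read off the magnitudes, not something the page states, and the helper name `to_percent` is hypothetical. The Intelligence Index is omitted because no score is listed for it.

```python
# Scores copied from the listing. Assumption: these are fractions in [0, 1].
FRACTION_SCORES = {
    "TAU2": 0.19,
    "Terminal-Bench Hard": 0.06,
    "LCR": 0.24,
    "IFBench": 0.33,
    "AIME 25": 0.72,
    "SciCode": 0.31,
    "LiveCodeBench": 0.48,
    "HLE": 0.06,
    "GPQA": 0.7,
    "MMLU-Pro": 0.76,
    "AIME": 0.94,
    "MATH-500": 0.98,
}

# Assumption: these indices are already on a 0 to 100 scale.
INDEX_SCORES = {
    "Artificial Analysis Math Index": 72.3,
    "Artificial Analysis Coding Index": 14.3,
}


def to_percent(fractions: dict[str, float], indices: dict[str, float]) -> dict[str, float]:
    """Return every score on a 0 to 100 scale, rounded to one decimal place."""
    percent = {name: round(value * 100, 1) for name, value in fractions.items()}
    percent.update({name: round(value, 1) for name, value in indices.items()})
    return percent


if __name__ == "__main__":
    scores = to_percent(FRACTION_SCORES, INDEX_SCORES)
    # Print from strongest to weakest result.
    for name, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
        print(f"{name}: {value:.1f}")
```

Putting everything on one scale only makes the numbers easier to scan; the benchmarks measure different things, so the percentages are not directly comparable across rows.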

Plan availability

Products and plans that support this model

No products or plans have been linked to this model yet.
