Step3 VL 10B

StepFun (China)

Step3 VL 10B is a 10-billion-parameter multimodal vision-language model developed by StepFun. It is designed to understand and process both visual and textual inputs.

Multimodal

Input: $0.00 per 1M tokens
Output: $0.00 per 1M tokens


Evaluations

All scores below were measured on May 29, 2026.

TAU2: 0.16
Terminal-Bench Hard: 0.05
LCR: no score listed
IFBench: 0.5
SciCode: 0.31
HLE: 0.1
GPQA: 0.69
Artificial Analysis Coding Index: 13.9
Artificial Analysis Intelligence Index: 15.5

Plan availability

Products and plans that support this model

No products or plans have been linked to this model yet.
