TAU2
Measured May 14, 2026
Score: 0.16
Step3 VL 10B is a multimodal vision-language model developed by StepFun. With 10 billion parameters, it is designed to understand and process both visual and textual information for various tasks.
Benchmark history
Scores: 0.16, 0.05, 0, 0.5, 0.31, 0.1, 0.69, 13.9, 15.5