China
StepFun
StepFun is a Chinese AI company focused on developing advanced multimodal AI models and platforms. Its product suite includes large language models (like Step-3.5-Flash), multimodal reasoning models (Step-R1-V-mini), and specialized tools for image creation, editing, and knowledge base Q&A. The company offers an API platform and an 'Agent Studio' for building AI agents, positioning itself in the competitive generative AI market.
Region
China
Updated
May 14, 2026
Product coverage
Products from this provider
Coding plan
Step Plan
Step Plan is StepFun's subscription-based AI service for Agent and Coding scenarios. It offers standardized APIs with OpenAI and Anthropic compatibility, intelligent multi-model routing via step-router-v1, and out-of-the-box multimodal capabilities including text, speech, and image models. Designed for developers building intelligent workflows with high-frequency model calls.
Plans
4
Models
7
Updated
May 13, 2026
Coding plan
Step Plan
Step Plan is a subscription-based AI service from StepFun for high-frequency developers, offering access to flagship models via standardized APIs for coding and agent scenarios with intelligent routing and multimodal capabilities.
Plans
4
Models
4
Updated
May 13, 2026
Model coverage
Models from this provider
Step 3.5 Flash
Step 3.5 Flash
Step 3.5 Flash is a fast-response language model optimized for Chinese language understanding and generation. It is designed for quick inference and efficient performance in conversational and text-based tasks.
Input / 1M tokens
$0.10
Output tokens/s
153.02
First-token seconds
0.88s
Artificial Analysis Intelligence Index
37.8
Step 3.5 Flash
Step 3.5 Flash 2603
Step 3.5 Flash is a fast and efficient language model from StepFun, optimized for low-latency responses. It is part of the Flash series, designed to balance speed with strong reasoning capabilities for general-purpose tasks.
Input / 1M tokens
$0.00
Output tokens/s
155.65
First-token seconds
0.83s
Artificial Analysis Intelligence Index
38.5
Step 3 VL 10B
Step3 VL 10B
Step3 VL 10B is a multimodal vision-language model developed by StepFun. With 10 billion parameters, it is designed to understand and process both visual and textual information for various tasks.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
15.5
StepAudio 2.5
StepAudio 2.5 ASR
Automatic speech recognition model for streaming and near-realtime transcription.
StepAudio 2.5
StepAudio 2.5 TTS
Text-to-speech model with zero-shot voice cloning and natural-language control.
Step Image Edit 2
step-image-edit-2
Lightweight generative editing model for text-to-image and image editing with fast response.
Step Router V1
step-router-v1
Intelligent routing model for automatic switching between deepseek-v4-pro and step-3.5-flash based on task complexity.
Discussion

Thinking... Make sure you are connected to GitHub server

