China

StepFun

StepFun is a Chinese AI company focused on developing advanced multimodal AI models and platforms. Its product suite includes large language models (like Step-3.5-Flash), multimodal reasoning models (Step-R1-V-mini), and specialized tools for image creation, editing, and knowledge base Q&A. The company offers an API platform and an 'Agent Studio' for building AI agents, positioning itself in the competitive generative AI market.

Website

Products

Models

Available

Benchmarks

Region

China

Updated

May 29, 2026

Product coverage

Products from this provider

Coding plan

Step Plan

Step Plan is StepFun's subscription-based AI service for Agent and Coding scenarios. It offers standardized APIs with OpenAI and Anthropic compatibility, intelligent multi-model routing via step-router-v1, and out-of-the-box multimodal capabilities including text, speech, and image models. Designed for developers building intelligent workflows with high-frequency model calls.

RooCode · SupportedOpenAI API · SupportedOpenClaude API · Supported

Plans

Models

Updated

May 13, 2026

Coding plan

Step Plan

Step Plan is a subscription-based AI service from StepFun for high-frequency developers, offering access to flagship models via standardized APIs for coding and agent scenarios with intelligent routing and multimodal capabilities.

OpenClaude API · SupportedCodex · SupportedHermes Agent · Supported

Plans

Models

Updated

May 13, 2026

Model coverage

Models from this provider

Step 3.5 Flash

Step 3.5 Flash is a fast-response language model optimized for Chinese language understanding and generation. It is designed for quick inference and efficient performance in conversational and text-based tasks.

FastReasoningCoding

Input / 1M tokens

$0.10

Output tokens/s

183.87

First-token seconds

0.81s

Artificial Analysis Intelligence Index

37.8

Step 3.5 Flash

Step 3.5 Flash 2603

Step 3.5 Flash is a fast and efficient language model from StepFun, optimized for low-latency responses. It is part of the Flash series, designed to balance speed with strong reasoning capabilities for general-purpose tasks.

FastReasoning

Input / 1M tokens

$0.00

Output tokens/s

194.97

First-token seconds

0.83s

Artificial Analysis Intelligence Index

38.5

Step 3 VL 10B

Step3 VL 10B

Step3 VL 10B is a multimodal vision-language model developed by StepFun. With 10 billion parameters, it is designed to understand and process both visual and textual information for various tasks.

Multimodal

Input / 1M tokens

$0.00

Artificial Analysis Intelligence Index

15.5

StepAudio 2.5

StepAudio 2.5 ASR

Automatic speech recognition model for streaming and near-realtime transcription.

StepAudio 2.5

StepAudio 2.5 TTS

Text-to-speech model with zero-shot voice cloning and natural-language control.

Step Image Edit 2

step-image-edit-2

Lightweight generative editing model for text-to-image and image editing with fast response.

Multimodal

Step Router V1

step-router-v1

Intelligent routing model for automatic switching between deepseek-v4-pro and step-3.5-flash based on task complexity.

Discussion

Thinking... Make sure you are connected to GitHub server