TAU2
Measured May 14, 2026
Score
0.73
Claude 4 Opus (Reasoning) is Anthropic's most capable model, optimized for complex reasoning and extended thinking tasks. It excels at multi-step problem solving, analysis, and generating nuanced, well-structured responses for challenging queries.
Benchmark history
Scores: 0.73, 0.31, 0.34, 0.54, 0.73, 0.76, 0.98, 0.4, 0.64, 0.12, 0.8, 0.87, 73.3, 34, 39
Plan availability
