Terminalbench Hard
Measured May 14, 2026
Score
0
This enterprise-grade model from NVIDIA is built on the Llama 3.3 architecture and optimized for conversational and instruction-following tasks. It balances capability and efficiency for applications that need fast response times and high throughput.
Benchmark history
Score
0
Score
0.11
Score
0.39
Score
0.08
Score
0.19
Score
0.78
Score
0.23
Score
0.28
Score
0.04
Score
0.52
Score
0.7
Score
7.7
Score
7.6
Score
14.3
Score
0.27