Llama 3.2 Instruct 11B (Vision)

Meta · United States

Llama 3.2 Instruct 11B (Vision) is an 11-billion-parameter multimodal instruction-tuned model from Meta's Llama 3.2 series, optimized for vision tasks. It balances capability with efficiency, delivering faster inference than larger models, and is well suited to conversational and instruction-following workloads that require image understanding.
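
For orientation, here is a minimal sketch of querying the model through an OpenAI-compatible chat completions endpoint, the interface many hosts expose for Llama vision models. The base URL, API key, and model identifier below are placeholders, not details from this page; check your provider's documentation for the actual values.

```python
from openai import OpenAI

# Hypothetical endpoint and credentials; substitute your provider's values.
client = OpenAI(
    base_url="https://api.example-provider.com/v1",
    api_key="YOUR_API_KEY",
)

# Multimodal messages mix text and image parts in a single user turn.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",  # common Hugging Face id; providers may differ
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is shown in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```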

Multimodal · Reasoning · Fast · Cheap

Input price:          $0.245 / 1M tokens
Output price:         $0.245 / 1M tokens
Output speed:         85.21 tokens/s
Time to first token:  0.46 s
Supported plans:      0
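
As a sanity check on these figures, a short script can turn the listed prices and throughput into per-request estimates. The 100,000-token prompt and 2,000-token completion below are arbitrary illustration values, not figures from this page.

```python
# Per-request cost and latency estimates from the listed figures.
INPUT_PRICE_PER_M = 0.245    # USD per 1M input tokens (from the pricing above)
OUTPUT_PRICE_PER_M = 0.245   # USD per 1M output tokens (from the pricing above)
OUTPUT_TOKENS_PER_S = 85.21  # listed output speed
FIRST_TOKEN_S = 0.46         # listed time to first token

def estimate(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, rough end-to-end latency in seconds)."""
    cost = (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
    latency = FIRST_TOKEN_S + output_tokens / OUTPUT_TOKENS_PER_S
    return cost, latency

cost, latency = estimate(input_tokens=100_000, output_tokens=2_000)
print(f"cost ~ ${cost:.4f}, latency ~ {latency:.1f}s")  # cost ~ $0.0250, latency ~ 23.9s
```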

Benchmark history

Evaluations: 15

Benchmark                                Measured        Score
TAU2                                     May 14, 2026    0.15
Terminal-Bench Hard                      May 14, 2026    0.01
LCR                                      May 14, 2026    0.12
IFBench                                  May 14, 2026    0.3
AIME 25                                  May 14, 2026    0.02
AIME                                     May 14, 2026    0.09
MATH 500                                 May 14, 2026    0.52
SciCode                                  May 14, 2026    0.11
LiveCodeBench                            May 14, 2026    0.11
HLE                                      May 14, 2026    0.05
GPQA                                     May 14, 2026    0.22
MMLU-Pro                                 May 14, 2026    0.46
Artificial Analysis Math Index           May 14, 2026    1.7
Artificial Analysis Coding Index         May 14, 2026    4.2
Artificial Analysis Intelligence Index   May 14, 2026    8.7

Plan availability

Products and plans that support this model: 0

No products or plans have been linked to this model yet.
