Models

Meta

Llama 3.2 Instruct 90B (Vision)

Meta · United States

This is the largest multimodal instruction-tuned model in the Meta Llama 3.2 series, with 90 billion parameters and support for both image and text inputs. It excels at visual understanding and complex reasoning, making it suitable for applications that process both images and text.

Multimodal · Reasoning
Input / 1M tokens: $1.38
Output / 1M tokens: $1.38
Output tokens/s: 42.93
First-token latency: 0.51 s
Supported plans: 0
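The listed prices and speeds make back-of-the-envelope estimates straightforward: cost is tokens times the per-1M-token rate, and wall-clock time is roughly first-token latency plus output tokens divided by throughput. A minimal sketch (the function names and example token counts are illustrative, not part of any API):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float = 1.38,
                      output_price_per_m: float = 1.38) -> float:
    """Estimate request cost from the listed per-1M-token prices."""
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

def estimate_latency_s(output_tokens: int,
                       first_token_s: float = 0.51,
                       tokens_per_s: float = 42.93) -> float:
    """Rough wall-clock time: time to first token plus streaming time."""
    return first_token_s + output_tokens / tokens_per_s

# e.g. a request with 2,000 prompt tokens and 500 completion tokens:
print(round(estimate_cost_usd(2_000, 500), 6))   # about $0.00345
print(round(estimate_latency_s(500), 1))         # roughly 12 seconds
```

These are idealized figures; real throughput and latency vary with load and provider.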

Benchmark history

Evaluations (8)

Benchmark                                 Measured       Score
AIME                                      May 14, 2026   0.05
MATH-500                                  May 14, 2026   0.63
SciCode                                   May 14, 2026   0.24
LiveCodeBench                             May 14, 2026   0.21
HLE                                       May 14, 2026   0.05
GPQA                                      May 14, 2026   0.43
MMLU-Pro                                  May 14, 2026   0.67
Artificial Analysis Intelligence Index    May 14, 2026   11.9

Plan availability

Products and plans that support this model: 0
No products or plans have been linked to this model yet.
