Models

Meta

Llama 3.2 Instruct 90B (Vision)

Meta · United States

This is the largest multimodal instruction-tuned model in the Meta Llama 3.2 series, with 90 billion parameters and support for both image and text inputs. It excels at visual understanding and complex reasoning, making it suitable for applications that process both images and text.

Multimodal · Reasoning
Input / 1M tokens: $1.38
Output / 1M tokens: $1.38
Output tokens/s: 42.93
First-token latency: 0.51 s
Supported plans: 0
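The listed prices and speeds make back-of-the-envelope estimates straightforward: cost is tokens times the per-1M-token rate, and wall-clock time is roughly first-token latency plus output tokens divided by throughput. A minimal sketch (the function names and example token counts are illustrative, not part of any API):

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_m: float = 1.38,
                      output_price_per_m: float = 1.38) -> float:
    """Estimate request cost from the listed per-1M-token prices."""
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

def estimate_latency_s(output_tokens: int,
                       first_token_s: float = 0.51,
                       tokens_per_s: float = 42.93) -> float:
    """Rough wall-clock time: time to first token plus streaming time."""
    return first_token_s + output_tokens / tokens_per_s

# e.g. a request with 2,000 prompt tokens and 500 completion tokens:
print(round(estimate_cost_usd(2_000, 500), 6))   # about $0.00345
print(round(estimate_latency_s(500), 1))         # roughly 12 seconds
```

These are idealized figures; real throughput and latency vary with load and provider.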

Benchmark history

Evaluations (8)

Benchmark                                 Measured       Score
AIME                                      May 14, 2026   0.05
MATH-500                                  May 14, 2026   0.63
SciCode                                   May 14, 2026   0.24
LiveCodeBench                             May 14, 2026   0.21
HLE                                       May 14, 2026   0.05
GPQA                                      May 14, 2026   0.43
MMLU-Pro                                  May 14, 2026   0.67
Artificial Analysis Intelligence Index    May 14, 2026   11.9

Plan availability

Products and plans that support this model: 0
No products or plans have been linked to this model yet.
