NVIDIA
NVIDIA provides core AI infrastructure including GPUs, CUDA, and AI platforms like DGX and NIM, powering global model training and inference.
Region
United States
Updated
May 14, 2026
Model coverage
Models from this provider
Llama 3.1 Nemotron
Llama 3.1 Nemotron Instruct 70B
A 70-billion parameter instruction-tuned model from NVIDIA's Nemotron family, based on Meta's Llama 3.1. It is optimized for strong instruction following, reasoning, and general-purpose enterprise tasks, with a focus on high-performance inference.
Input / 1M tokens
$1.20
Output tokens/s
292.44
First-token seconds
0.26s
Artificial Analysis Intelligence Index
13.4
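The per-model metrics above (input price per 1M tokens, output tokens/s, first-token seconds) support simple back-of-envelope estimates. A minimal sketch, using the figures listed for this model ($1.20 per 1M input tokens, 292.44 output tokens/s, 0.26 s time-to-first-token); note the listing shows input pricing only, so output cost is not estimated here:

```python
# Back-of-envelope helpers for the metrics listed in these model cards.

def input_cost_usd(input_tokens: int, usd_per_million: float) -> float:
    """Cost of sending `input_tokens` at the listed input price."""
    return input_tokens / 1_000_000 * usd_per_million

def generation_seconds(output_tokens: int, tokens_per_sec: float,
                       first_token_sec: float) -> float:
    """Rough wall-clock time: time-to-first-token plus streaming time."""
    return first_token_sec + output_tokens / tokens_per_sec

# Using this model's listed figures:
print(round(input_cost_usd(50_000, 1.20), 2))             # 0.06
print(round(generation_seconds(1_000, 292.44, 0.26), 2))  # 3.68
```

The same two helpers apply to every card below by swapping in that model's listed price and speed figures.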
Llama 3.1 Nemotron
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning)
A compact 4B parameter model from NVIDIA's Nemotron family, fine-tuned from Llama 3.1 for enhanced reasoning and chain-of-thought capabilities. It is optimized for fast inference and low-cost deployment while maintaining strong performance on reasoning tasks.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
14.4
Llama 3.1 Nemotron
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)
A large-scale reasoning model from NVIDIA's Nemotron family, built upon the Llama 3.1 architecture. It is optimized for complex, multi-step reasoning tasks and is designed to deliver high accuracy in logical inference and problem-solving.
Input / 1M tokens
$0.60
Output tokens/s
41.57
First-token seconds
0.73s
Artificial Analysis Intelligence Index
15
Llama 3.3 Nemotron
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning)
This is a high-performance, enterprise-grade model from NVIDIA, built on the Llama 3.3 architecture. It is optimized for conversational and instruction-following tasks, offering a strong balance of capability and efficiency for applications requiring fast response times and high throughput.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
14.3
Llama 3.3 Nemotron
Llama 3.3 Nemotron Super 49B v1 (Reasoning)
An NVIDIA-optimized 49B parameter model from the Nemotron family, built on the Llama 3.3 architecture. It is specifically fine-tuned and enhanced for advanced reasoning and problem-solving tasks, likely employing chain-of-thought or similar techniques to improve logical inference.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
18.5
NVIDIA
Llama Nemotron Super 49B v1.5 (Non-reasoning)
This is a 49B parameter model based on the Llama architecture, optimized for general-purpose tasks. It offers fast inference speed and lower operational costs, making it suitable for high-throughput applications.
Input / 1M tokens
$0.10
Output tokens/s
50.74
First-token seconds
0.31s
Artificial Analysis Intelligence Index
14.6
NVIDIA
Llama Nemotron Super 49B v1.5 (Reasoning)
An NVIDIA-optimized reasoning model from the Llama Nemotron family, built on the Llama architecture. It is specifically fine-tuned and enhanced for complex reasoning, problem-solving, and instruction-following tasks.
Input / 1M tokens
$0.10
Output tokens/s
50.6
First-token seconds
0.31s
Artificial Analysis Intelligence Index
18.7
NVIDIA Nemotron 3
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning)
A lightweight, efficient 30B parameter model from NVIDIA's Nemotron series, optimized for instruction following and dialogue. It is designed for fast inference and low-cost deployment, suitable for general-purpose conversational AI tasks.
Input / 1M tokens
$0.05
Output tokens/s
83.53
First-token seconds
0.32s
Artificial Analysis Intelligence Index
13.2
NVIDIA Nemotron 3
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)
A 30-billion parameter reasoning model from NVIDIA's Nemotron family, optimized for complex logical and analytical tasks. It features a 3B active parameter architecture for efficient inference while maintaining strong reasoning capabilities.
Input / 1M tokens
$0.055
Output tokens/s
105.56
First-token seconds
1.53s
Artificial Analysis Intelligence Index
24.3
NVIDIA Nemotron 3
NVIDIA Nemotron 3 Nano 4B
NVIDIA Nemotron 3 Nano 4B is a compact, 4-billion parameter language model from NVIDIA's Nemotron family. It is optimized for fast inference and low resource consumption, making it suitable for edge deployment and applications requiring low latency.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
14.7
NVIDIA Nemotron 3
NVIDIA Nemotron 3 Super 120B A12B (Reasoning)
NVIDIA Nemotron 3 Super 120B A12B is a large-scale reasoning model from NVIDIA's Nemotron family. It features 120 billion total parameters with 12 billion activated parameters, optimized for complex reasoning and instruction-following tasks.
Input / 1M tokens
$0.30
Output tokens/s
186.98
First-token seconds
0.97s
Artificial Analysis Intelligence Index
36
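The "120B A12B" naming above reflects a Mixture-of-Experts design: 120 billion total parameters stored, but only about 12 billion activated per token, so per-token inference compute scales with the active count rather than the total. A rough sketch of that arithmetic, using the standard approximation of ~2 FLOPs per active weight for a forward pass:

```python
# Intuition for MoE "total / active" parameter naming (e.g. 120B A12B).

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of stored parameters that participate in each token."""
    return active_b / total_b

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token: ~2 per active weight."""
    return 2 * active_params

print(active_fraction(120, 12))                # 0.1
print(f"{forward_flops_per_token(12e9):.1e}")  # 2.4e+10
```

This is why the 30B A3B models in this list can post high output tokens/s despite their total parameter counts: only the active subset is computed per token.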
NVIDIA
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning)
This is a lightweight multimodal model from NVIDIA's Nemotron family, supporting both visual and language inputs. It is optimized for edge deployment, offering fast response times and efficient performance.
Input / 1M tokens
$0.20
Output tokens/s
245.2
First-token seconds
0.7s
Artificial Analysis Intelligence Index
10.1
NVIDIA
NVIDIA Nemotron Nano 12B v2 VL (Reasoning)
A 12-billion parameter multimodal model from NVIDIA's Nemotron family, optimized for visual reasoning tasks. It combines vision and language understanding to perform complex analysis on images and text, with a focus on logical inference and step-by-step reasoning.
Input / 1M tokens
$0.20
Output tokens/s
137.84
First-token seconds
0.44s
Artificial Analysis Intelligence Index
14.9
NVIDIA
NVIDIA Nemotron Nano 9B V2 (Non-reasoning)
NVIDIA Nemotron Nano 9B V2 is a compact, non-reasoning language model optimized for fast and efficient inference. It is part of the Nemotron family, designed for low-latency applications and easy deployment on NVIDIA hardware.
Input / 1M tokens
$0.05
Output tokens/s
138.93
First-token seconds
0.74s
Artificial Analysis Intelligence Index
13.2
NVIDIA
NVIDIA Nemotron Nano 9B V2 (Reasoning)
A lightweight 9B parameter model from NVIDIA's Nemotron series, optimized for enhanced reasoning and chain-of-thought capabilities. It is designed for efficient inference, making it suitable for deployment on edge devices or for applications requiring fast, cost-effective responses with strong logical reasoning.
Input / 1M tokens
$0.04
Output tokens/s
119.68
First-token seconds
0.25s
Artificial Analysis Intelligence Index
14.8
Nemotron 3 Nano Omni
Nemotron 3 Nano Omni 30B A3B Reasoning
An efficient multimodal model from NVIDIA's Nemotron family, built for agentic reasoning. It features 30 billion total parameters with 3 billion active parameters, enabling fast, low-cost inference across multiple input modalities.
Input / 1M tokens
$0.075
Output tokens/s
306.01
First-token seconds
0.56s
Artificial Analysis Intelligence Index
21.4
NVIDIA
Nemotron Cascade 2 30B A3B
Nemotron Cascade 2 30B A3B is a large language model from NVIDIA's Nemotron family, featuring a Mixture-of-Experts (MoE) architecture with 30 billion total parameters and 3 billion active parameters per token. This design enables efficient, high-speed inference while maintaining strong performance on coding and reasoning tasks.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
28.4
NVIDIA
nvidia/Kimi-K2.5-NVFP4
An NVFP4-quantized build of the Kimi K2.5 model, packaged by NVIDIA for efficient low-precision inference on NVIDIA GPUs via services like NVIDIA NIM.
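The hosted models listed above are served through NVIDIA NIM, which exposes an OpenAI-compatible chat-completions API. The sketch below constructs (without sending) such a request; the endpoint URL and model id are assumptions drawn from NVIDIA's published conventions and should be verified against build.nvidia.com before use:

```python
import json
import urllib.request

# Assumed endpoint and model id for NVIDIA's hosted NIM service; verify
# both against build.nvidia.com before relying on them.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "nvidia/llama-3.1-nemotron-70b-instruct"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) an OpenAI-style chat-completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize NVFP4 quantization in one sentence.", "demo-key")
print(req.full_url)  # https://integrate.api.nvidia.com/v1/chat/completions
```

Sending the request requires an API key from NVIDIA; the pricing listed per model above is what such calls are billed against.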