NVIDIA
NVIDIA provides core AI infrastructure including GPUs, CUDA, and AI platforms like DGX and NIM, powering global model training and inference.
Region
United States
Updated
May 14, 2026
Model coverage
Models from this provider
Llama 3.1 Nemotron
Llama 3.1 Nemotron Instruct 70B
A 70-billion parameter instruction-tuned model from NVIDIA's Nemotron family, based on Meta's Llama 3.1. It is optimized for strong instruction following, reasoning, and general-purpose enterprise tasks, with a focus on high-performance inference.
Input / 1M tokens
$1.20
Output tokens/s
292.44
First-token seconds
0.26s
Artificial Analysis Intelligence Index
13.4
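The per-model metrics above (input price per 1M tokens, output tokens/s, first-token seconds) support simple back-of-envelope estimates. A minimal sketch, using the figures listed for this model ($1.20 per 1M input tokens, 292.44 output tokens/s, 0.26 s time-to-first-token); note the listing shows input pricing only, so output cost is not estimated here:

```python
# Back-of-envelope helpers for the metrics listed in these model cards.

def input_cost_usd(input_tokens: int, usd_per_million: float) -> float:
    """Cost of sending `input_tokens` at the listed input price."""
    return input_tokens / 1_000_000 * usd_per_million

def generation_seconds(output_tokens: int, tokens_per_sec: float,
                       first_token_sec: float) -> float:
    """Rough wall-clock time: time-to-first-token plus streaming time."""
    return first_token_sec + output_tokens / tokens_per_sec

# Using this model's listed figures:
print(round(input_cost_usd(50_000, 1.20), 2))             # 0.06
print(round(generation_seconds(1_000, 292.44, 0.26), 2))  # 3.68
```

The same two helpers apply to every card below by swapping in that model's listed price and speed figures.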
Llama 3.1 Nemotron
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning)
A compact 4B parameter model from NVIDIA's Nemotron family, fine-tuned from Llama 3.1 for enhanced reasoning and chain-of-thought capabilities. It is optimized for fast inference and low-cost deployment while maintaining strong performance on reasoning tasks.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
14.4
Llama 3.1 Nemotron
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)
A large-scale reasoning model from NVIDIA's Nemotron family, built upon the Llama 3.1 architecture. It is optimized for complex, multi-step reasoning tasks and is designed to deliver high accuracy in logical inference and problem-solving.
Input / 1M tokens
$0.60
Output tokens/s
41.57
First-token seconds
0.73s
Artificial Analysis Intelligence Index
15
Llama 3.3 Nemotron
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning)
This is a high-performance, enterprise-grade model from NVIDIA, built on the Llama 3.3 architecture. It is optimized for conversational and instruction-following tasks, offering a strong balance of capability and efficiency for applications requiring fast response times and high throughput.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
14.3
Llama 3.3 Nemotron
Llama 3.3 Nemotron Super 49B v1 (Reasoning)
An NVIDIA-optimized 49B parameter model from the Nemotron family, built on the Llama 3.3 architecture. It is specifically fine-tuned and enhanced for advanced reasoning and problem-solving tasks, likely employing chain-of-thought or similar techniques to improve logical inference.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
18.5
NVIDIA
Llama Nemotron Super 49B v1.5 (Non-reasoning)
This is a 49B parameter model based on the Llama architecture, optimized for general-purpose tasks. It offers fast inference speed and lower operational costs, making it suitable for high-throughput applications.
Input / 1M tokens
$0.10
Output tokens/s
50.74
First-token seconds
0.31s
Artificial Analysis Intelligence Index
14.6
NVIDIA
Llama Nemotron Super 49B v1.5 (Reasoning)
An NVIDIA-optimized reasoning model from the Llama Nemotron family, built on the Llama architecture. It is specifically fine-tuned and enhanced for complex reasoning, problem-solving, and instruction-following tasks.
Input / 1M tokens
$0.10
Output tokens/s
50.6
First-token seconds
0.31s
Artificial Analysis Intelligence Index
18.7
NVIDIA Nemotron 3
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning)
A lightweight, efficient 30B parameter model from NVIDIA's Nemotron series, optimized for instruction following and dialogue. It is designed for fast inference and low-cost deployment, suitable for general-purpose conversational AI tasks.
Input / 1M tokens
$0.05
Output tokens/s
83.53
First-token seconds
0.32s
Artificial Analysis Intelligence Index
13.2
NVIDIA Nemotron 3
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)
A 30-billion parameter reasoning model from NVIDIA's Nemotron family, optimized for complex logical and analytical tasks. It features a 3B active parameter architecture for efficient inference while maintaining strong reasoning capabilities.
Input / 1M tokens
$0.055
Output tokens/s
105.56
First-token seconds
1.53s
Artificial Analysis Intelligence Index
24.3
NVIDIA Nemotron 3
NVIDIA Nemotron 3 Nano 4B
NVIDIA Nemotron 3 Nano 4B is a compact, 4-billion parameter language model from NVIDIA's Nemotron family. It is optimized for fast inference and low resource consumption, making it suitable for edge deployment and applications requiring low latency.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
14.7
NVIDIA Nemotron 3
NVIDIA Nemotron 3 Super 120B A12B (Reasoning)
NVIDIA Nemotron 3 Super 120B A12B is a large-scale reasoning model from NVIDIA's Nemotron family. It features 120 billion total parameters with 12 billion activated parameters, optimized for complex reasoning and instruction-following tasks.
Input / 1M tokens
$0.30
Output tokens/s
186.98
First-token seconds
0.97s
Artificial Analysis Intelligence Index
36
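The "120B A12B" naming above reflects a Mixture-of-Experts design: 120 billion total parameters stored, but only about 12 billion activated per token, so per-token inference compute scales with the active count rather than the total. A rough sketch of that arithmetic, using the standard approximation of ~2 FLOPs per active weight for a forward pass:

```python
# Intuition for MoE "total / active" parameter naming (e.g. 120B A12B).

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of stored parameters that participate in each token."""
    return active_b / total_b

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token: ~2 per active weight."""
    return 2 * active_params

print(active_fraction(120, 12))                # 0.1
print(f"{forward_flops_per_token(12e9):.1e}")  # 2.4e+10
```

This is why the 30B A3B models in this list can post high output tokens/s despite their total parameter counts: only the active subset is computed per token.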
NVIDIA
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning)
This is a lightweight multimodal model from NVIDIA's Nemotron family, supporting both visual and language inputs. It is optimized for edge deployment, offering fast response times and efficient performance.
Input / 1M tokens
$0.20
Output tokens/s
245.2
First-token seconds
0.7s
Artificial Analysis Intelligence Index
10.1
NVIDIA
NVIDIA Nemotron Nano 12B v2 VL (Reasoning)
A 12-billion parameter multimodal model from NVIDIA's Nemotron family, optimized for visual reasoning tasks. It combines vision and language understanding to perform complex analysis on images and text, with a focus on logical inference and step-by-step reasoning.
Input / 1M tokens
$0.20
Output tokens/s
137.84
First-token seconds
0.44s
Artificial Analysis Intelligence Index
14.9
NVIDIA
NVIDIA Nemotron Nano 9B V2 (Non-reasoning)
NVIDIA Nemotron Nano 9B V2 is a compact, non-reasoning language model optimized for fast and efficient inference. It is part of the Nemotron family, designed for low-latency applications and easy deployment on NVIDIA hardware.
Input / 1M tokens
$0.05
Output tokens/s
138.93
First-token seconds
0.74s
Artificial Analysis Intelligence Index
13.2
NVIDIA
NVIDIA Nemotron Nano 9B V2 (Reasoning)
A lightweight 9B parameter model from NVIDIA's Nemotron series, optimized for enhanced reasoning and chain-of-thought capabilities. It is designed for efficient inference, making it suitable for deployment on edge devices or for applications requiring fast, cost-effective responses with strong logical reasoning.
Input / 1M tokens
$0.04
Output tokens/s
119.68
First-token seconds
0.25s
Artificial Analysis Intelligence Index
14.8
Nemotron 3 Nano Omni
Nemotron 3 Nano Omni 30B A3B Reasoning
An efficient multimodal model from NVIDIA's Nemotron family, built for agentic reasoning. It features 30 billion total parameters with 3 billion active parameters, enabling fast, low-cost inference across multiple input modalities.
Input / 1M tokens
$0.075
Output tokens/s
306.01
First-token seconds
0.56s
Artificial Analysis Intelligence Index
21.4
NVIDIA
Nemotron Cascade 2 30B A3B
Nemotron Cascade 2 30B A3B is a large language model from NVIDIA's Nemotron family, featuring a Mixture-of-Experts (MoE) architecture with 30 billion total parameters and 3 billion active parameters per token. This design enables efficient, high-speed inference while maintaining strong performance on coding and reasoning tasks.
Input / 1M tokens
$0.00
Artificial Analysis Intelligence Index
28.4
NVIDIA
nvidia/Kimi-K2.5-NVFP4
An NVFP4-quantized build of the Kimi K2.5 model, packaged by NVIDIA for efficient low-precision inference on NVIDIA GPUs via services like NVIDIA NIM.
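The hosted models listed above are served through NVIDIA NIM, which exposes an OpenAI-compatible chat-completions API. The sketch below constructs (without sending) such a request; the endpoint URL and model id are assumptions drawn from NVIDIA's published conventions and should be verified against build.nvidia.com before use:

```python
import json
import urllib.request

# Assumed endpoint and model id for NVIDIA's hosted NIM service; verify
# both against build.nvidia.com before relying on them.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
MODEL_ID = "nvidia/llama-3.1-nemotron-70b-instruct"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Construct (but do not send) an OpenAI-style chat-completion request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Summarize NVFP4 quantization in one sentence.", "demo-key")
print(req.full_url)  # https://integrate.api.nvidia.com/v1/chat/completions
```

Sending the request requires an API key from NVIDIA; the pricing listed per model above is what such calls are billed against.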