Llama 3.1 Nemotron Ultra 253B

by nvidia

Reasoning

Description

Llama 3.1 Nemotron Ultra 253B is NVIDIA's flagship reasoning model, derived from Meta's Llama 3.1 405B Instruct through neural architecture search (NAS). It is a dense decoder-only model with 253B parameters, not a mixture-of-experts model. It supports Dynamic Thinking (reasoning that can be toggled on or off), tool calling, and structured outputs.
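
To illustrate the reasoning toggle, here is a minimal sketch of a message builder. It assumes the toggle is controlled through a system prompt of `detailed thinking on` or `detailed thinking off`, as NVIDIA's model card describes; whether this endpoint forwards that prompt unchanged is an assumption. The helper name `buildMessages` is hypothetical.

```javascript
// Sketch only: assumes reasoning is switched via the system prompt
// ("detailed thinking on" / "detailed thinking off") per NVIDIA's model card.
// buildMessages is a hypothetical helper, not part of any SDK.
function buildMessages(userPrompt, { thinking = true } = {}) {
  return [
    { role: "system", content: thinking ? "detailed thinking on" : "detailed thinking off" },
    { role: "user", content: userPrompt },
  ];
}
```

The returned array can be passed as the `messages` field of a chat completions request. Turning thinking off may suit short, latency-sensitive prompts.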

Pricing

Input: $0.60 / 1M tokens
Output: $1.80 / 1M tokens
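
As a rough guide to what these rates mean per request, the sketch below estimates cost from token counts. It uses the prices listed above; the function name `estimateCostUsd` is hypothetical, and actual billing may round or meter differently.

```javascript
// Listed prices in USD per 1M tokens (taken from the pricing above).
const PRICE_PER_MILLION_USD = { input: 0.60, output: 1.80 };

// Hypothetical helper: estimates the cost of one request in USD.
function estimateCostUsd(inputTokens, outputTokens) {
  const total =
    inputTokens * PRICE_PER_MILLION_USD.input +
    outputTokens * PRICE_PER_MILLION_USD.output;
  return total / 1_000_000;
}
```

For example, a request with 2,000 input tokens and 500 output tokens comes to roughly $0.0021. Reasoning output, when enabled, counts toward output tokens under this estimate.
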
Get Your API Key

Specifications

Context: 128K tokens
Parameters: 253B
License: nvidia-open
Released: Mar 2025

Capabilities

Reasoning, Tools, JSON Mode, Streaming
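
The capabilities above could be enabled per request along these lines. This is a sketch under assumptions: the field names `stream`, `response_format`, and `tools` follow OpenAI Chat Completions conventions and are not confirmed for this endpoint, and both `buildRequestBody` and the `get_weather` tool are hypothetical.

```javascript
// Sketch only: field names follow OpenAI Chat Completions conventions and are
// assumed (not confirmed) to be accepted by this endpoint.
function buildRequestBody(messages, { stream = false, jsonMode = false, tools = [] } = {}) {
  const body = { model: "llama-3.1-nemotron-ultra-253b", messages };
  if (stream) body.stream = true;                                // Streaming
  if (jsonMode) body.response_format = { type: "json_object" };  // JSON Mode
  if (tools.length > 0) body.tools = tools;                      // Tools
  return body;
}

// Hypothetical tool definition, for illustration only.
const getWeatherTool = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Look up the current weather for a city.",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};
```

Calling `buildRequestBody(messages, { tools: [getWeatherTool] })` yields an object that can be serialized with `JSON.stringify` and sent as the request body. Options left at their defaults are omitted from the body entirely.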

EU-Compliant API Access

Use Llama 3.1 Nemotron Ultra 253B with a simple API call. OpenAI-compatible endpoint, EU data residency guaranteed.

JavaScript
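// Assumes Node 18+ (built-in fetch) running as an ES module, so top-level await is available.
const response = await fetch("https://api.eurouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // Read the API key from the environment rather than hard-coding it.
    "Authorization": `Bearer ${process.env.EUROUTER_API_KEY}`,
  },
  body: JSON.stringify({
    model: "llama-3.1-nemotron-ultra-253b",
    messages: [
      { role: "user", content: "Hello!" }
    ],
  }),
});

// Surface HTTP errors (invalid key, rate limit) instead of failing later on a missing `choices` field.
if (!response.ok) {
  throw new Error(`Request failed: ${response.status} ${await response.text()}`);
}

const data = await response.json();
console.log(data.choices[0].message.content);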
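
Because the model lists Streaming as a capability, a streamed response might be read as sketched below. This assumes the endpoint emits OpenAI-style server-sent events (`data: {...}` lines carrying `choices[0].delta.content`, ending with `data: [DONE]`), which the page does not confirm. The helper name `extractDeltas` is hypothetical.

```javascript
// Sketch only: assumes OpenAI-style SSE framing, which is not confirmed for this endpoint.
// Takes a block of SSE text made of complete lines and returns the concatenated content.
function extractDeltas(sseText) {
  const parts = [];
  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;   // skip blank lines and comments
    const payload = trimmed.slice(5).trim();
    if (payload === "[DONE]") break;              // end-of-stream sentinel
    const event = JSON.parse(payload);
    const delta = event.choices?.[0]?.delta?.content;
    if (delta) parts.push(delta);
  }
  return parts.join("");
}
```

In a real stream, a network chunk can end mid-line, so incoming text should be buffered and only complete lines passed to a parser like this one.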

Integrate AI without GDPR risk.

You need AI that won’t create compliance headaches. Your data stays in the EU, GDPR is enforced by default, and every request is routed for the best balance of cost, latency, and uptime, reducing risk while improving performance.

GDPR by default
EU data residency
Smart routing