Nvidia Nemotron 3 Super

nvidia/NVIDIA-Nemotron-3-Super-120B-A12B

Nemotron-3-Super-120B-A12B-FP8 is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning behavior can be configured through a flag in the chat template. On the Geodd network, the model is served with optimizations for high-performance inference, targeting speed and reliability for production workloads.

Serverless API

Pay per token via our optimized endpoints.

Available Serverless

Run queries immediately, pay only for usage

Input: $0.090 / M Tokens

Output: $0.500 / M Tokens
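
As a rough illustration of the per-token pricing, the sketch below estimates the cost of a workload from its token counts. The function name `estimate_cost` and the sample token counts are hypothetical; only the two prices come from the listing above, and actual billing may round or meter differently.

```shell
#!/usr/bin/env bash
# Sketch: estimate serverless cost in USD from token counts,
# using the listed per-million-token prices.

INPUT_PRICE_PER_M=0.090   # USD per million input tokens (from the listing)
OUTPUT_PRICE_PER_M=0.500  # USD per million output tokens (from the listing)

# estimate_cost is a hypothetical helper, not part of any Geodd tooling.
estimate_cost() {
  # $1 = input tokens, $2 = output tokens
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_price="$INPUT_PRICE_PER_M" -v out_price="$OUTPUT_PRICE_PER_M" \
      'BEGIN { printf "%.6f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# Example: 200,000 input tokens and 50,000 output tokens (illustrative values).
estimate_cost 200000 50000   # prints 0.043000
```

Because output tokens cost more than five times as much as input tokens, the reasoning trace the model emits before its final answer can dominate the bill for short prompts.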

API Usage

cURL

curl --location 'https://api.geodd.io/gateway/v1/chat/completions' \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B",
  "messages": [
    { "role": "user", "content": "Hello, how are you?" }
  ]
}'

Info

Provider: nvidia

Quantization: fp4

Created: 5/13/2026

Available Regions: US

Supported Functionality

Context Length: 1,000,000

Max Output: 1,000,000

Serverless: Supported

Input Capabilities: text

Output Capabilities: text

Parameters

temperature, top_p, top_k, frequency_penalty, presence_penalty, seed, max_tokens, stop
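
The following sketch shows one way some of these parameters might be set in a request body. It assumes the endpoint accepts them as top-level JSON fields with OpenAI-style types (numbers for `temperature`, `top_p`, `max_tokens`, and `seed`), which this page does not confirm. The values, the `build_payload` helper, and the `GEODD_API_TOKEN` environment variable are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build a chat-completions request body that sets a few of the
# listed sampling parameters. All parameter values are illustrative.

# build_payload is a hypothetical helper.
build_payload() {
  # $1 = user prompt. This sketch does no JSON escaping, so the prompt
  # must not contain double quotes or backslashes.
  printf '{
  "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B",
  "messages": [
    { "role": "user", "content": "%s" }
  ],
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 512,
  "seed": 42
}\n' "$1"
}

payload="$(build_payload "Hello, how are you?")"
echo "$payload"

# Send the request only when a token is available.
# GEODD_API_TOKEN is an assumed variable name, not a documented one.
if [ -n "${GEODD_API_TOKEN:-}" ]; then
  curl --location 'https://api.geodd.io/gateway/v1/chat/completions' \
    --header "Authorization: Bearer ${GEODD_API_TOKEN}" \
    --header 'Content-Type: application/json' \
    --data "$payload"
fi
```

Setting `max_tokens` explicitly is a sensible default here, since the listed maximum output is very large and the model produces a reasoning trace before its final response.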