DeepSeek V4 Pro | Model Details | Geodd AI

DeepSeek V4 Pro

deepseek-ai/DeepSeek-V4-Pro

DeepSeek-V4-Pro: 1.6T total parameters, 49B active
DeepSeek-V4-Flash: 284B total parameters, 13B active

Both support a 1 million token context length. Key improvements include:

Hybrid Attention Architecture: Uses Compressed Sparse Attention and Heavily Compressed Attention to make long-context inference much cheaper. At 1M context, V4-Pro uses only 27% of the single-token inference FLOPs and 10% of the KV cache compared with DeepSeek-V3.2.

Manifold-Constrained Hyper-Connections: Improves residual connections, helping stability across layers while keeping model expressiveness.

Muon Optimizer: Used to improve training speed, convergence, and stability.

The models were pretrained on 32T+ tokens, then post-trained through a two-stage process: first training domain-specific experts using SFT and RL, then merging their strengths through on-policy distillation.

The strongest mode, DeepSeek-V4-Pro-Max, is positioned as a major step forward for open-source models, especially in coding, reasoning, knowledge, and agentic tasks. DeepSeek-V4-Flash-Max can approach Pro-level reasoning with a larger thinking budget, but is weaker on pure knowledge and very complex agent workflows due to its smaller size.

This model is optimized for high-performance inference on the Geodd network, providing speed and reliability for production workloads.
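The parameter counts above imply that only a small share of each model's weights is used per token. The sketch below works that ratio out from the quoted figures. It assumes the 1.6T / 49B pair refers to DeepSeek-V4-Pro; the function and dictionary names are illustrative only.

```python
# Parameter counts quoted in the description, in billions: (total, active).
# Assumption: the 1.6T / 49B figures belong to DeepSeek-V4-Pro.
MODELS = {
    "DeepSeek-V4-Pro": (1600, 49),
    "DeepSeek-V4-Flash": (284, 13),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active for a single token."""
    return active_b / total_b


for name, (total_b, active_b) in MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

Both models come out in the low single digits, which is the arithmetic behind the gap between total and active parameter counts.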


Features

Serverless API

Pay per token via our optimized endpoints.

Available Serverless
Run queries immediately, pay only for usage
Input: $0.78 / M Tokens
Output: $3.40 / M Tokens
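As a rough illustration of per-token billing, the following sketch estimates the cost of one request from the listed input and output prices. It assumes billing is linear in token count with no minimums, rounding rules, or cached-token discounts, none of which are stated on this page. The function name is hypothetical.

```python
# Listed serverless prices, in USD per million tokens.
INPUT_USD_PER_M = 0.78
OUTPUT_USD_PER_M = 3.40


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming purely linear per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a 200,000-token prompt with a 4,000-token reply.
print(f"${estimate_cost_usd(200_000, 4_000):.4f}")
```

Output tokens cost more than four times as much as input tokens, so long generations dominate the bill sooner than long prompts do.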

API Usage

cURL
curl --location 'https://api.geodd.io/gateway/v1/chat/completions' \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "deepseek-ai/DeepSeek-V4-Pro",
  "messages": [
    { "role": "user", "content": "Hello, how are you?" }
  ]
}'
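The same request can be sketched in Python using only the standard library. This mirrors the cURL call's URL, headers, and body. The helper names `build_request` and `send` are illustrative, and the assumption that the response body is JSON is not confirmed by this page.

```python
import json
import urllib.request

# Endpoint and model ID taken from the cURL example.
API_URL = "https://api.geodd.io/gateway/v1/chat/completions"
MODEL_ID = "deepseek-ai/DeepSeek-V4-Pro"


def build_request(token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL call."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse the body, assuming the API returns JSON."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Replace <token> with a real API token, then call send(req).
    req = build_request("<token>", "Hello, how are you?")
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made.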

Info

Provider: deepseek-ai
Quantization: fp8
Created: 5/2/2026
Available Regions: US

Supported Functionality

Context Length: 1,000,000
Max Output: 80,000
Serverless: Supported
Input Capabilities: text
Output Capabilities: text
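The context length and max output limits above can be combined into a simple client-side check before a request is sent. The sketch below assumes that prompt and output tokens share the 1,000,000-token context window, which this page does not state explicitly. The function name is hypothetical.

```python
# Limits listed under Supported Functionality.
CONTEXT_LENGTH = 1_000_000
MAX_OUTPUT = 80_000


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Return a max_tokens value that respects both listed limits.

    Assumption: the prompt and the generated output share one context window.
    """
    if requested < 1:
        raise ValueError("requested max_tokens must be at least 1")
    if prompt_tokens < 0 or prompt_tokens >= CONTEXT_LENGTH:
        raise ValueError("prompt does not fit in the context window")
    remaining = CONTEXT_LENGTH - prompt_tokens
    return min(requested, MAX_OUTPUT, remaining)


print(clamp_max_tokens(10_000, 100_000))   # capped by the max output limit
print(clamp_max_tokens(950_000, 80_000))   # capped by the remaining context
```

If the service counts the two limits independently, the `remaining` term can be dropped and only the `MAX_OUTPUT` cap applies.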

Parameters

temperature, top_p, max_tokens, seed, stop, top_k, frequency_penalty, presence_penalty, repetition_penalty, min_p
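A client can guard against typos by accepting only the parameter names listed above when building a request body. The following is a minimal sketch: the helper name is hypothetical, the example values are illustrative, and valid ranges and defaults for each parameter are not documented on this page.

```python
# Parameter names listed for this model.
SUPPORTED_PARAMS = {
    "temperature", "top_p", "max_tokens", "seed", "stop",
    "top_k", "frequency_penalty", "presence_penalty",
    "repetition_penalty", "min_p",
}


def build_payload(prompt: str, **params) -> dict:
    """Build a chat completion body, rejecting parameter names not listed above."""
    unknown = set(params) - SUPPORTED_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    return {
        "model": "deepseek-ai/DeepSeek-V4-Pro",
        "messages": [{"role": "user", "content": prompt}],
        **params,
    }


# Illustrative values only; the page does not give recommended settings.
payload = build_payload("Hello, how are you?", temperature=0.7, max_tokens=512, seed=42)
print(sorted(payload))
```

Rejecting unknown names locally surfaces a misspelled parameter immediately, rather than relying on how the server handles it.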