DeepSeek-V4-Pro: 1.6T total parameters, 49B active
DeepSeek-V4-Flash: 284B total parameters, 13B active

Both support a context length of 1 million tokens.

Key improvements include:

Hybrid Attention Architecture: Uses Compressed Sparse Attention and Heavily Compressed Attention to make long-context inference much cheaper. At 1M context, V4-Pro uses only 27% of the single-token inference FLOPs and 10% of the KV cache compared with DeepSeek-V3.2.

Manifold-Constrained Hyper-Connections: Improves residual connections, helping stability across layers while preserving model expressiveness.

Muon Optimizer: Used to improve training speed, convergence, and stability.

The models were pretrained on 32T+ tokens, then post-trained in two stages: first, domain-specific experts were trained using SFT and RL; then their strengths were merged through on-policy distillation.

The strongest mode, DeepSeek-V4-Pro-Max, is positioned as a major step forward for open-source models, especially in coding, reasoning, knowledge, and agentic tasks. DeepSeek-V4-Flash-Max can approach Pro-level reasoning when given a larger thinking budget, but it is weaker on pure knowledge and very complex agent workflows because of its smaller size.

This model is optimized for high-performance inference on the Geodd network, providing exceptional speed and reliability for production workloads.
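The parameter counts and efficiency figures quoted above can be turned into simple ratios. The sketch below is illustrative arithmetic only: it assumes the 1.6T/49B figures describe V4-Pro, and the function names are hypothetical helpers, not part of any Geodd or DeepSeek tooling.

```python
# Figures quoted in the description above (parameter counts in billions).
# Assumption: the 1.6T total / 49B active pair refers to V4-Pro.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek-V4-Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b, active_b):
    """Share of the total parameters that is active."""
    return active_b / total_b


def reduction_factor(relative_cost):
    """Convert 'uses X of the baseline' into an 'N times less' factor."""
    return 1.0 / relative_cost


for name, params in MODELS.items():
    print(f"{name}: {active_fraction(**params):.1%} of parameters active")

# V4-Pro at 1M context, relative to DeepSeek-V3.2 (27% FLOPs, 10% KV cache).
print(f"Single-token FLOPs: about {reduction_factor(0.27):.1f}x lower")
print(f"KV cache: about {reduction_factor(0.10):.1f}x smaller")
```

Read this way, both models activate only a few percent of their weights, and the quoted long-context savings for V4-Pro correspond to roughly 3.7x fewer single-token FLOPs and a 10x smaller KV cache than DeepSeek-V3.2.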
Pay per token via our optimized endpoints.
curl --location 'https://api.geodd.io/gateway/v1/chat/completions' \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
      {
        "role": "user",
        "content": "Hello, how are you?"
      }
    ]
  }'
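For callers working in Python, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names `build_chat_request` and `send_chat_request` and the `GEODD_API_TOKEN` environment variable are hypothetical, and the response is simply decoded as JSON because this page does not document its shape.

```python
import json
import os
import urllib.request

# Endpoint and model name are copied from the curl example above.
API_URL = "https://api.geodd.io/gateway/v1/chat/completions"
MODEL = "deepseek-ai/DeepSeek-V4-Pro"


def build_chat_request(prompt, token, model=MODEL):
    """Build (but do not send) the POST request for a single user message."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON body (shape not documented here)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # GEODD_API_TOKEN is an assumed variable name; supply your own token.
    token = os.environ.get("GEODD_API_TOKEN")
    if token:
        request = build_chat_request("Hello, how are you?", token)
        print(send_chat_request(request))
    else:
        print("Set GEODD_API_TOKEN to send the request.")
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without spending tokens, and reading the token from the environment keeps it out of source control.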