DeepSeek V4 Flash deepseek-ai/DeepSeek-V4-Flash | DeepSeek-V4-Flash is a Mixture-of-Experts (MoE) language model in the DeepSeek-V4 series, with 284B parameters (13B activated) and a supported context length of one million tokens. | 1049k | $0.140 | $0.300
DeepSeek V4 Pro deepseek-ai/DeepSeek-V4-Pro | The DeepSeek-V4 series includes two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated). Both support a context length of one million tokens. | 1000k | $1.480 | $3.400
Kimi K2.6 moonshotai/Kimi-K2.6 | Kimi K2.6 is an open-source, native multimodal agentic model that advances practical capabilities in long-horizon coding, coding-driven design, proactive autonomous execution, and swarm-based task orchestration. | 262k | $0.600 | $3.200
Gemma 4 31B google/gemma-4-31B-it | Gemma is a family of open models built by Google DeepMind. Gemma 4 models are multimodal, handling text and image input (with audio supported on small models) and generating text output. This release includes open-weights models in both pre-trained and instruction-tuned variants. Gemma 4 features a context window of up to 256K tokens and maintains multilingual support in over 140 languages. | 262k | $0.130 | $0.370
Kimi K2.5 moonshotai/Kimi-K2.5 | Kimi K2.5 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed visual and text tokens atop Kimi-K2-Base. It seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms. | 262k | $0.400 | $2.600
MiniMax M2.7 MiniMaxAI/MiniMax-M2.7 | MiniMax-M2.7 is our first model deeply participating in its own evolution. M2.7 is capable of building complex agent harnesses and completing highly elaborate productivity tasks, leveraging Agent Teams, complex Skills, and dynamic tool search. | 131k | $0.300 | $1.200
Qwen3 Coder Next Qwen/Qwen3-Coder-Next | Today, we're announcing Qwen3-Coder-Next, an open-weight language model designed specifically for coding agents and local development. | 262k | $0.078 | $0.980
GLM 5 zai-org/GLM-5 | We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity. | 200k | $0.600 | $1.600
openai/gpt-oss-120b openai/gpt-oss-120b | Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. | 131k | $0.039 | $0.182
Nvidia Nemotron 3 Super nvidia/NVIDIA-Nemotron-3-Super-120B-A12B | Nemotron-3-Super-120B-A12B-FP8 is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template. | 1000k | $0.090 | $0.500
openai/gpt-oss-20b openai/gpt-oss-20b | Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. | 131k | $0.030 | $0.140
NVIDIA: Nemotron 3 Nano 30B A3B nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 | Nemotron-3-Nano-30B-A3B-BF16 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. | 262k | $0.050 | $0.200