Chat Completion

This endpoint generates chat completions based on a list of messages.

POST
https://api.geodd.io/inference/v1/chat/completions

Authorizations

Authorization string header required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
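As a sketch, the header can be assembled like this in Python; `YOUR_AUTH_TOKEN` is a placeholder, and only the endpoint URL comes from this page:

```python
# Build the Authorization header for the chat completions endpoint.
# "YOUR_AUTH_TOKEN" is a placeholder for your real auth token.
API_URL = "https://api.geodd.io/inference/v1/chat/completions"

def auth_headers(token: str) -> dict:
    """Return headers carrying Bearer authentication plus the JSON content type."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

headers = auth_headers("YOUR_AUTH_TOKEN")
```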

Body

application/json
model string required

Model name to use for text generation.

Example: thedrummer/unslopnemo-12b
messages array<object> required

A list of messages comprising the conversation so far.

role enum<string> required

The role of the message's author.

Available options: user, assistant
content string required

The contents of the user's input query or the assistant's previous response.
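A minimal sketch of assembling the messages array in Python, using only the two role values documented above (the conversation text is illustrative):

```python
# Build a conversation history alternating the documented roles.
messages = [
    {"role": "user", "content": "What is nucleus sampling?"},
    {"role": "assistant", "content": "It limits sampling to a probability mass."},
    {"role": "user", "content": "And top_k?"},
]

# Every message must carry both required fields.
for m in messages:
    assert m["role"] in ("user", "assistant")
    assert isinstance(m["content"], str)
```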

max_tokens integer optional

Maximum number of tokens to generate.

temperature float optional

Controls randomness.

0 → deterministic
1 → more creative
top_p float optional

Nucleus sampling: restricts sampling to the smallest set of tokens whose cumulative probability reaches top_p.

stop string | array<string> optional

Stops generation as soon as any one of the given sequences is produced.

seed integer optional

Makes output reproducible (best effort).

top_k integer optional

Limits sampling to the K most probable candidate tokens.

min_p float optional

Filters tokens below a minimum probability threshold.

frequency_penalty float optional

Penalizes tokens in proportion to how often they have already appeared, reducing verbatim repetition.

presence_penalty float optional

Penalizes tokens that have already appeared at all, encouraging the model to introduce new tokens.

repetition_penalty float optional

Discourages repeating tokens or phrases.
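The sampling parameters above can be combined in a single request body. A sketch in Python; the values are illustrative only, not documented defaults or recommendations:

```python
# Illustrative request body combining the optional sampling parameters.
# None of these values are defaults stated by the API docs.
payload = {
    "model": "thedrummer/unslopnemo-12b",
    "messages": [{"role": "user", "content": "Write a haiku about rain."}],
    "max_tokens": 128,
    "temperature": 0.8,        # randomness
    "top_p": 0.95,             # nucleus sampling probability mass
    "top_k": 40,               # top-K candidate cutoff
    "min_p": 0.05,             # minimum probability threshold
    "seed": 1234,              # best-effort reproducibility
    "stop": ["\n\n"],          # stop sequence(s)
    "frequency_penalty": 0.2,
    "presence_penalty": 0.1,
    "repetition_penalty": 1.1,
}
```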

response_format object optional

Constrains output format.

JSON Object
json
{ "type": "json_object" }
JSON Schema
json
{
  "type": "json_schema",
  "json_schema": {
    "name": "response",
    "schema": {
      "type": "object",
      "properties": {
        "answer": { "type": "string" }
      },
      "required": ["answer"]
    }
  }
}
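A sketch of building the schema-constrained request value and parsing the returned content in Python; the assumption (hedged) is that the model's message content arrives as a JSON string when the schema is honored:

```python
import json

# Request value asking for output constrained to the documented example schema.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "response",
        "schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
}

# If the model honors the schema, the message content parses as JSON.
content = '{"answer": "42"}'   # illustrative model output, not real API data
parsed = json.loads(content)
```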
structured_outputs boolean optional

Enables strict schema-constrained generation. When enabled, the model will follow JSON/schema constraints more reliably.

tools array<object> optional

List of available tools (functions). Allows the model to request external function execution.

json
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather by city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": { "type": "string" }
        },
        "required": ["city"]
      }
    }
  }
]
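The same tool definition expressed as a Python structure, ready to drop into the request body under the `tools` key:

```python
# A single function tool, mirroring the JSON example above.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather by city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
```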
tool_choice string | object optional

Controls how tools are used.

Modes
  • "none" never call tools
  • "auto" model decides
  • "required" must call a tool
Force Specific Tool
json
{ "type": "function", "function": { "name": "get_weather" } }

Responses

200
OK
Request processed successfully. Returns a chat completion object.
401
Unauthorized
Invalid or missing API Key. Check the Authorization header.
429
Too Many Requests
Token limit exceeded. Upgrade to a dedicated instance for higher limits.

Example Request:

json
{
  "model": "thedrummer/unslopnemo-12b",
  "messages": [
    { "role": "user", "content": "Weather in Colombo?" }
  ],
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 256,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
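Assuming only the Python standard library, the example above can be assembled into an HTTP request like this; the sketch builds the request but leaves sending it to the caller, and the token is a placeholder:

```python
import json
from urllib import request as urlrequest

API_URL = "https://api.geodd.io/inference/v1/chat/completions"

def build_request(token: str, payload: dict) -> urlrequest.Request:
    """Assemble (but do not send) the POST request for the example body."""
    return urlrequest.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

payload = {
    "model": "thedrummer/unslopnemo-12b",
    "messages": [{"role": "user", "content": "Weather in Colombo?"}],
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 256,
    "tool_choice": "auto",
}

req = build_request("YOUR_AUTH_TOKEN", payload)
# Sending is left to the caller, e.g.:
# with urlrequest.urlopen(req) as resp:
#     body = json.load(resp)
```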

Notes

  • Tool execution is not handled by the model — your application must run it locally and return the results.
  • Structured outputs rely on constrained decoding and may vary slightly by underlying model architecture.
  • Sampling parameters not set explicitly in the request JSON fall back to framework defaults.
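The first note describes a round trip your application must implement. A minimal sketch with a stubbed model turn; the `tool_calls` shape and the `"tool"` result role shown here are assumptions modeled on common chat-completions conventions, not details taken from this page:

```python
import json

def get_weather(city: str) -> dict:
    """Local tool implementation; the model never executes this itself."""
    return {"city": city, "forecast": "sunny"}  # stubbed data for the sketch

# Stubbed assistant turn requesting a tool call. The exact tool_calls
# structure is an assumption based on common chat-completions conventions.
assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "Colombo"}),
            },
        }
    ],
}

messages = [{"role": "user", "content": "Weather in Colombo?"}, assistant_turn]

# Your application runs each requested tool locally and appends the result
# so the model can use it on the next request.
for call in assistant_turn["tool_calls"]:
    args = json.loads(call["function"]["arguments"])
    result = get_weather(**args)
    messages.append({
        "role": "tool",                 # assumed role name for tool results
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    })
```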