Chat Completion

This endpoint generates chat completions based on a list of messages.

POST
https://api.geodd.io/inference/v1/chat/completions

Authorizations

Authorization string header required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
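As a sketch, the header can be assembled like this in Python; `YOUR_AUTH_TOKEN` is a placeholder, and only the endpoint URL comes from this page:

```python
# Build the Authorization header for the chat completions endpoint.
# "YOUR_AUTH_TOKEN" is a placeholder for your real auth token.
API_URL = "https://api.geodd.io/inference/v1/chat/completions"

def auth_headers(token: str) -> dict:
    """Return headers carrying Bearer authentication plus the JSON content type."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

headers = auth_headers("YOUR_AUTH_TOKEN")
```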

Body

application/json
model string required

Model name to use for text generation.

Example: thedrummer/unslopnemo-12b
messages array<object> required

A list of messages comprising the conversation so far.

role enum<string> required

The role of the message's author.

Available options: user, assistant
content string required

The contents of the user's input query or the assistant's previous response.
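A minimal sketch of assembling the messages array in Python, using only the two role values documented above (the conversation text is illustrative):

```python
# Build a conversation history alternating the documented roles.
messages = [
    {"role": "user", "content": "What is nucleus sampling?"},
    {"role": "assistant", "content": "It limits sampling to a probability mass."},
    {"role": "user", "content": "And top_k?"},
]

# Every message must carry both required fields.
for m in messages:
    assert m["role"] in ("user", "assistant")
    assert isinstance(m["content"], str)
```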

max_tokens integer optional

Maximum number of tokens to generate.

temperature float optional

Controls randomness.

0 → deterministic
1 → more creative
top_p float optional

Nucleus sampling: restricts sampling to the smallest set of tokens whose cumulative probability reaches top_p.

stop string | array<string> optional

Stops generation as soon as any one of the given sequences is produced.

seed integer optional

Makes output reproducible (best effort).

top_k integer optional

Limits sampling to the K most probable candidate tokens.

min_p float optional

Filters tokens below a minimum probability threshold.

frequency_penalty float optional

Penalizes tokens in proportion to how often they have already appeared, reducing verbatim repetition.

presence_penalty float optional

Penalizes tokens that have already appeared at all, encouraging the model to introduce new tokens.

repetition_penalty float optional

Discourages repeating tokens or phrases.
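The sampling parameters above can be combined in a single request body. A sketch in Python; the values are illustrative only, not documented defaults or recommendations:

```python
# Illustrative request body combining the optional sampling parameters.
# None of these values are defaults stated by the API docs.
payload = {
    "model": "thedrummer/unslopnemo-12b",
    "messages": [{"role": "user", "content": "Write a haiku about rain."}],
    "max_tokens": 128,
    "temperature": 0.8,        # randomness
    "top_p": 0.95,             # nucleus sampling probability mass
    "top_k": 40,               # top-K candidate cutoff
    "min_p": 0.05,             # minimum probability threshold
    "seed": 1234,              # best-effort reproducibility
    "stop": ["\n\n"],          # stop sequence(s)
    "frequency_penalty": 0.2,
    "presence_penalty": 0.1,
    "repetition_penalty": 1.1,
}
```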

response_format object optional

Constrains output format.

JSON Object
json
{ "type": "json_object" }
JSON Schema
json
{
  "type": "json_schema",
  "json_schema": {
    "name": "response",
    "schema": {
      "type": "object",
      "properties": {
        "answer": { "type": "string" }
      },
      "required": ["answer"]
    }
  }
}
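A sketch of building the schema-constrained request value and parsing the returned content in Python; the assumption (hedged) is that the model's message content arrives as a JSON string when the schema is honored:

```python
import json

# Request value asking for output constrained to the documented example schema.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "response",
        "schema": {
            "type": "object",
            "properties": {"answer": {"type": "string"}},
            "required": ["answer"],
        },
    },
}

# If the model honors the schema, the message content parses as JSON.
content = '{"answer": "42"}'   # illustrative model output, not real API data
parsed = json.loads(content)
```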
structured_outputs boolean optional

Enables strict schema-constrained generation. When enabled, the model will follow JSON/schema constraints more reliably.

tools array<object> optional

List of available tools (functions). Allows the model to request external function execution.

json
[
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather by city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": { "type": "string" }
        },
        "required": ["city"]
      }
    }
  }
]
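The same tool definition expressed as a Python structure, ready to drop into the request body under the `tools` key:

```python
# A single function tool, mirroring the JSON example above.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather by city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
```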
tool_choice string | object optional

Controls how tools are used.

Modes
  • "none" never call tools
  • "auto" model decides
  • "required" must call a tool
Force Specific Tool
json
{ "type": "function", "function": { "name": "get_weather" } }

Responses

200
OK
Request processed successfully. Returns a chat completion object.
401
Unauthorized
Invalid or missing API Key. Check the Authorization header.
429
Too Many Requests
Token limit exceeded. Upgrade to a dedicated instance for higher limits.

Example Request:

json
{
  "model": "thedrummer/unslopnemo-12b",
  "messages": [
    { "role": "user", "content": "Weather in Colombo?" }
  ],
  "temperature": 0.7,
  "top_p": 0.9,
  "max_tokens": 256,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
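Assuming only the Python standard library, the example above can be assembled into an HTTP request like this; the sketch builds the request but leaves sending it to the caller, and the token is a placeholder:

```python
import json
from urllib import request as urlrequest

API_URL = "https://api.geodd.io/inference/v1/chat/completions"

def build_request(token: str, payload: dict) -> urlrequest.Request:
    """Assemble (but do not send) the POST request for the example body."""
    return urlrequest.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

payload = {
    "model": "thedrummer/unslopnemo-12b",
    "messages": [{"role": "user", "content": "Weather in Colombo?"}],
    "temperature": 0.7,
    "top_p": 0.9,
    "max_tokens": 256,
    "tool_choice": "auto",
}

req = build_request("YOUR_AUTH_TOKEN", payload)
# Sending is left to the caller, e.g.:
# with urlrequest.urlopen(req) as resp:
#     body = json.load(resp)
```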

Notes

  • Tool execution is not handled by the model — your application must run it locally and return the results.
  • Structured outputs rely on constrained decoding and may vary slightly by underlying model architecture.
  • Sampling parameters not set explicitly in the request JSON fall back to framework defaults.
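The first note describes a round trip your application must implement. A minimal sketch with a stubbed model turn; the `tool_calls` shape and the `"tool"` result role shown here are assumptions modeled on common chat-completions conventions, not details taken from this page:

```python
import json

def get_weather(city: str) -> dict:
    """Local tool implementation; the model never executes this itself."""
    return {"city": city, "forecast": "sunny"}  # stubbed data for the sketch

# Stubbed assistant turn requesting a tool call. The exact tool_calls
# structure is an assumption based on common chat-completions conventions.
assistant_turn = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "Colombo"}),
            },
        }
    ],
}

messages = [{"role": "user", "content": "Weather in Colombo?"}, assistant_turn]

# Your application runs each requested tool locally and appends the result
# so the model can use it on the next request.
for call in assistant_turn["tool_calls"]:
    args = json.loads(call["function"]["arguments"])
    result = get_weather(**args)
    messages.append({
        "role": "tool",                 # assumed role name for tool results
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    })
```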