# AI Model Service FAQ

Frequently asked questions about the Phoeniqs AI Model Service — hosted open-source LLMs, API access, subscriptions, and inference.

For product documentation, see Phoeniqs AI Model Service.


# Questions

Every active model is exposed through an OpenAI-compatible API at https://maas.phoeniqs.com/. To send an inference request you need three things: a Base URL, a Model Name from the Active Models table (for example, inference-llama4-maverick), and an API Key passed in the Authorization: Bearer YOUR_API_KEY header. You can use standard OpenAI client libraries in Python, Node.js, or cURL — just point the client at the Phoeniqs base URL and swap in your model name and key.

Yes. The free Evaluation plan gives you 1 million tokens for 30 days with access to the full model catalogue through a single API key — no credit card and no monthly commitment. Use it to test model quality, compare models, or validate a use case before moving to the Exploratory plan (CHF 25/month), which includes 25 usage credits (1 credit = CHF 1 of model usage) and a basic SLA. If you need more credits than your monthly allowance, submit a service request — additional credits are not self-provisioned in the portal.

Models are curated open-source LLMs hosted entirely on Swiss sovereign infrastructure, served with vLLM for optimized inference. Usage is billed per token: each model has its own input and output credit rates per million tokens, listed in the Active Models table. Credits are consumed based on the number of tokens processed, and different models consume credits at different rates. Billing is predictable and tied to actual token volume rather than fixed per-request pricing.

Some active models are available in two variants through the same API: the standard model and a guardrailed variant with a -GRC suffix (Governance, Risk & Compliance). Use the same base URL and API key — only the model name changes. The -GRC variant screens every request at the gateway before it reaches the model, blocking high-risk content (such as violence or leaked credentials) and masking sensitive identifiers (such as emails or phone numbers). Use -GRC when you need content safety and data filtering enforced in production; use the standard variant for research, evaluation, or internal tools where you want raw model behaviour.

The most common causes include:

  • Excessive context passed through MCP (Model Context Protocol)
  • Tool-calling loops, where each tool result is injected back into the prompt
  • Long conversation history that is continuously appended without truncation
  • Large prompts or system instructions
  • Token estimation mismatches due to tokenizer differences or fields added late in the request pipeline

When combined, these factors can quickly exceed the model's context window.

A 400 Bad Request error related to tokens typically occurs when the total estimated input tokens exceed the model's context window.

Phoeniqs' AI gateway (LiteLLM) performs a protective pre-check before sending the request to the model. If the request would exceed the model's allowed context window, LiteLLM rejects the request early to avoid unnecessary compute usage. This situation can result in zero or negative tokens remaining for the model's response, which triggers the error.