Agent Q Limits
Agent Q enforces limits at multiple layers to ensure fair usage, predictable costs, and platform stability. This page documents every constraint with exact values.
Rate Limits
Rate limiting is applied per authenticated user using a token bucket algorithm.
| Limit | Default |
|---|---|
| Requests per minute | 10 |
| Concurrent streaming sessions | 2 |
Exceeding either limit returns HTTP 429 Too Many Requests. These limits apply equally to the built-in Agent Q chat and to the REST API.
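The page names the token bucket algorithm but does not state the bucket capacity or refill rate. The sketch below assumes a capacity of 10 tokens refilled at 10 tokens per 60 seconds, which would match the 10 requests per minute default; the `TokenBucket` class and its injectable `clock` are hypothetical and are not Agent Q's actual implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical per-user bucket; capacity and refill rate are assumptions."""

    def __init__(
        self,
        capacity: int = 10,
        refill_per_second: float = 10 / 60,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def try_acquire(self) -> bool:
        """Return True if a request may proceed, False if it would get HTTP 429."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Under these assumptions a user can send a burst of 10 requests immediately, after which roughly one additional request becomes available every 6 seconds.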
Tokens and Usage
Every user message can trigger multiple LLM calls as the agent reasons, calls tools, and processes results. Token counts accumulate across all LLM calls within a single execution.
What consumes tokens in a single execution:
- System prompt. Loaded once per execution.
- Tool schemas. One short description per active tool.
- Conversation history. All prior messages, subject to compression for long sessions.
- Tool inputs. Parameters sent to each tool call.
- Tool outputs. Results from each tool, capped at 8,000 characters per result.
- LLM response. Text and reasoning generated on each step, capped at 4,000 tokens per individual LLM call.
Per-execution limits:
| Limit | Value |
|---|---|
| LLM API requests per execution | 300 |
| Input tokens per execution | 15,000,000 |
| Output tokens per execution | 15,000,000 |
| Total tokens per execution | 30,500,000 |
| Max tokens per individual LLM call | 4,000 |
These limits apply per user message (or per API call). Exceeding them returns an error indicating which limit was reached.
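To illustrate how counts accumulate across LLM calls within one execution, here is a minimal sketch of a usage tracker. The limit values come from the table above; the `ExecutionUsage` class, the `UsageLimitExceeded` exception, and the error wording are hypothetical.

```python
from dataclasses import dataclass

# Values from the per-execution limits table.
MAX_REQUESTS = 300
MAX_INPUT_TOKENS = 15_000_000
MAX_OUTPUT_TOKENS = 15_000_000
MAX_TOTAL_TOKENS = 30_500_000


class UsageLimitExceeded(Exception):
    """Hypothetical error that names the limit that was reached."""


@dataclass
class ExecutionUsage:
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one LLM call to the running totals, then check every limit."""
        self.requests += 1
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        checks = [
            ("LLM API requests", self.requests, MAX_REQUESTS),
            ("Input tokens", self.input_tokens, MAX_INPUT_TOKENS),
            ("Output tokens", self.output_tokens, MAX_OUTPUT_TOKENS),
            ("Total tokens", self.input_tokens + self.output_tokens, MAX_TOTAL_TOKENS),
        ]
        for name, used, limit in checks:
            if used > limit:
                raise UsageLimitExceeded(f"{name} limit exceeded: {used} > {limit}")
```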
Timeouts
Timeouts are layered for defense-in-depth:
| Layer | Timeout | Description |
|---|---|---|
| Individual LLM API request | 120 seconds | Per call to the LLM provider |
| Agent execution | 300 seconds (5 min) | Full agent.run() including all tool calls |
| HTTP route handler | 360 seconds (6 min) | Outer safety net on the HTTP request |
If the agent execution timeout is reached, the agent stops and returns whatever it produced up to that point. For long-running tasks, break them into smaller sequential requests.
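The "returns whatever it produced up to that point" behavior can be pictured as a deadline that is checked between steps. This is only a sketch under that assumption: the page does not say how the timeout is enforced internally, and `run_steps`, its parameters, and the cooperative check are hypothetical.

```python
import time
from typing import Callable, Iterable, List, Tuple

AGENT_TIMEOUT_SECONDS = 300  # agent execution timeout from the table above


def run_steps(
    steps: Iterable[Callable[[], str]],
    timeout: float = AGENT_TIMEOUT_SECONDS,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Run steps in order; stop before any step that would start past the deadline.

    Returns the partial output and a flag telling whether the deadline was hit.
    """
    deadline = clock() + timeout
    produced: List[str] = []
    for step in steps:
        if clock() >= deadline:
            return produced, True  # timed out: keep what was produced so far
        produced.append(step())
    return produced, False
```

Splitting a long task into several requests gives each request its own 300-second budget, which is why sequential requests are recommended for long-running work.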
Tool Output Truncation
Individual tool results are truncated at 8,000 characters before being passed to the LLM. This prevents a single large result (e.g., a container with hundreds of fields) from consuming the entire context window.
Truncation strategy:
- The largest list-shaped fields in the result are trimmed first, repeatedly, until the result fits in the budget.
- Each list is reduced to a small sample so Agent Q still sees the shape of the data.
- A truncation note is appended to the result so Agent Q knows data was cut off.
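The strategy above can be sketched roughly as follows. Several details are assumptions: the result is treated as a JSON-serializable dictionary, only top-level lists are considered, the "small sample" is taken to be the first 3 items, and the `_truncation_note` key is a hypothetical name.

```python
import json
from typing import Any, Dict, List

MAX_CHARS = 8_000  # per-result cap from this page
SAMPLE_SIZE = 3    # assumption: the page only says "a small sample"


def truncate_result(result: Dict[str, Any], max_chars: int = MAX_CHARS) -> Dict[str, Any]:
    """Trim the largest list-shaped fields until the serialized result fits."""
    out = dict(result)
    trimmed: List[str] = []
    while len(json.dumps(out)) > max_chars:
        # Candidate fields: lists that still hold more than the sample size.
        lists = {k: v for k, v in out.items() if isinstance(v, list) and len(v) > SAMPLE_SIZE}
        if not lists:
            break  # nothing left to trim in this simplified version
        largest = max(lists, key=lambda k: len(json.dumps(lists[k])))
        out[largest] = lists[largest][:SAMPLE_SIZE]
        trimmed.append(largest)
    if trimmed:
        # Tell the model that data was cut off and where.
        out["_truncation_note"] = "Trimmed list fields: " + ", ".join(trimmed)
    return out
```

Keeping a sample instead of dropping the field entirely preserves the shape of the data, so the model can still reason about what each list contains.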
History truncation:
Tool results from turns older than the last 2 user messages are further truncated to 300 characters in the conversation history. This prevents large outputs from earlier in the conversation from re-consuming tokens on every subsequent turn.
The saved chat history keeps a shortened copy of each tool output (up to 500 characters) so older sessions reload quickly.
Conversation Context and Compression
| Parameter | Value |
|---|---|
| Recent turns kept verbatim | 4 turns (user + assistant pairs) |
| LLM summary trigger | 10 messages |
| Max summary length | 3,000 characters |
| User message truncation (mechanical summary) | 250 characters |
| Assistant message truncation (mechanical summary) | 300 characters |
Compression works in two stages:
- Mechanical compression. Turns beyond the most recent 4 are summarized mechanically: user messages are truncated to 250 characters, assistant messages to 300 characters, and tool interactions to 100 characters.
- LLM-generated summary. Once a session reaches 10 messages, the LLM generates a structured summary (up to 3,000 characters) that is stored with the session and used on all subsequent turns instead of the full history.
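The mechanical stage can be sketched as a per-role truncation of everything older than the last 4 turns. This sketch assumes a turn begins at each user message and that messages are dictionaries with `role` and `content` keys; both the message shape and the `compress_mechanically` name are hypothetical.

```python
from typing import Dict, List

RECENT_TURNS = 4                                         # turns kept verbatim
LIMITS = {"user": 250, "assistant": 300, "tool": 100}    # characters per role


def compress_mechanically(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Truncate every message that precedes the last RECENT_TURNS user turns."""
    user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(user_positions) <= RECENT_TURNS:
        return [dict(m) for m in messages]  # nothing is old enough to compress
    cutoff = user_positions[-RECENT_TURNS]  # first message of the oldest kept turn
    compressed = []
    for i, message in enumerate(messages):
        copy = dict(message)
        limit = LIMITS.get(copy["role"])
        if i < cutoff and limit is not None:
            copy["content"] = copy["content"][:limit]
        compressed.append(copy)
    return compressed
```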
Starting a new session resets the context window entirely. This is the most reliable option for very long or unrelated workflows.
SQL Constraints
The preview_query tool and all computed asset SQL inputs only accept SELECT statements and CTEs (WITH clauses).
Blocked statements:
| Statement | Blocked |
|---|---|
| INSERT | Yes |
| UPDATE | Yes |
| DELETE | Yes |
| DROP | Yes |
| CREATE | Yes |
| ALTER | Yes |
| TRUNCATE | Yes |
These are blocked at the API level regardless of what the LLM generates.
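As an illustration of this kind of check, the sketch below accepts a query only if it starts with SELECT or WITH and contains none of the blocked keywords. It is a deliberately naive, regex-based approximation and not the platform's validator: it does not parse SQL, so it would reject a blocked keyword that appears inside a string literal, and it does not handle leading comments or parentheses. The `is_allowed` name is hypothetical.

```python
import re

BLOCKED = ("INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER", "TRUNCATE")
_BLOCKED_RE = re.compile(r"\b(" + "|".join(BLOCKED) + r")\b", re.IGNORECASE)


def is_allowed(sql: str) -> bool:
    """Return True for SELECT / WITH queries that contain no blocked keyword."""
    stripped = sql.strip()
    if not stripped:
        return False
    first_word = stripped.split(None, 1)[0].upper()
    if first_word not in ("SELECT", "WITH"):
        return False
    # Whole-word match, so a column such as updated_at is not flagged.
    return _BLOCKED_RE.search(stripped) is None
```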
Non-deterministic function warnings:
Agent Q generates a warning (but does not block) when SQL contains non-deterministic functions such as NOW(), CURRENT_TIMESTAMP, RANDOM(), and similar. This is because computed assets are expected to produce consistent, reproducible results.
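A warn-but-do-not-block check might look like the sketch below. The function list contains only the three names given on this page, so it is not exhaustive ("and similar" is left open); the `nondeterminism_warnings` name and the warning text are hypothetical.

```python
import re
from typing import List

# Only the functions named on this page; the real list is likely longer.
NON_DETERMINISTIC = ("NOW", "CURRENT_TIMESTAMP", "RANDOM")


def nondeterminism_warnings(sql: str) -> List[str]:
    """Return one warning per non-deterministic function found; never raises."""
    warnings = []
    for name in NON_DETERMINISTIC:
        if re.search(rf"\b{name}\b", sql, re.IGNORECASE):
            warnings.append(f"Query uses non-deterministic function {name}")
    return warnings
```

Because the function returns warnings rather than raising, the caller can still run the query and surface the warnings alongside the result.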
Query execution timeout: 30 seconds by default (range: 5–150 seconds).
Scope Constraints
Datastore types for computed assets:
| Asset Type | Supported Datastores |
|---|---|
| Computed Table | JDBC datastores (PostgreSQL, Snowflake, BigQuery, MySQL, etc.) |
| Computed File | DFS datastores (S3, ADLS, GCS) |
| Computed Join | Any combination of JDBC and/or DFS |
RBAC enforcement:
Agent Q operates strictly within your user account's permissions. It cannot access datastores, containers, or fields that your role does not permit, and cannot trigger operations (profile, scan, export) beyond your permission scope.
Web search (if enabled):
When web search is enabled for your LLM configuration, Agent Q's search is restricted to the following domains only:
- userguide.qualytics.io
- qualytics.com
- qualytics.ai
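A domain restriction like this one is commonly implemented as an allowlist check on the URL's hostname. The sketch below assumes an exact hostname match against userguide.qualytics.io, qualytics.com, and qualytics.ai; the page does not say whether subdomains are also permitted, and `is_allowed_url` is a hypothetical helper.

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"userguide.qualytics.io", "qualytics.com", "qualytics.ai"}


def is_allowed_url(url: str) -> bool:
    """Return True only if the URL's hostname is exactly one of the allowed domains."""
    host = (urlparse(url).hostname or "").lower()
    return host in ALLOWED_DOMAINS
```

Matching the parsed hostname rather than searching the raw string avoids accepting look-alike URLs such as `https://qualytics.com.example.org/`.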
Session ownership:
Users can only access their own chat sessions. Administrators with the Admin role can audit user sessions through the admin API.
Discovery and Listing Limits
These are the default limits for tool calls that list platform assets. They can be adjusted within the allowed range per call.
| Tool | Default Limit | Max |
|---|---|---|
| list_datastores | 50 | 200 |
| list_containers | 100 | 500 |
| list_fields | 200 | 500 |
| Containers fetched for suggestions | 5 | n/a |
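The default and maximum columns can be read as in the sketch below. It assumes that an out-of-range request is clamped to the allowed range rather than rejected and that the minimum is 1; the page states neither, and `resolve_limit` is a hypothetical helper.

```python
from typing import Optional

# (default, max) per tool, taken from the table above.
LISTING_LIMITS = {
    "list_datastores": (50, 200),
    "list_containers": (100, 500),
    "list_fields": (200, 500),
}


def resolve_limit(tool: str, requested: Optional[int] = None) -> int:
    """Return the effective limit for a listing call."""
    default, maximum = LISTING_LIMITS[tool]
    if requested is None:
        return default
    # Assumption: clamp to [1, maximum] instead of raising an error.
    return max(1, min(requested, maximum))
```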
Input Limits
| Parameter | Value |
|---|---|
| Paste threshold (triggers attachment mode) | 1,000 characters |
| Max file attachment size | 20 MB |
| Max file attachments per message | 1 |
Very short follow-ups like "yes" or "that one" are recognized as continuations of the conversation and skip the topic check.
Pastes of 1,000 characters or more into the input are captured as an attachment panel rather than inserted inline.
File attachments are accepted only when the active LLM provider supports file uploads. See Attach a File for the supported formats and provider list.
Error Reference
| Error | Cause |
|---|---|
| HTTP 429 Too Many Requests | Exceeded 10 requests/min or 2 concurrent sessions |
| Usage limit exceeded | Hit 300 LLM requests, 15M input tokens, 15M output tokens, or 30.5M total tokens in a single execution |
| Agent execution timed out | Agent ran for more than 5 minutes |
| SQL not allowed | Attempted INSERT, UPDATE, DELETE, or DDL in a query |
| Unsupported file format | The active LLM provider cannot read the attached file (typically a binary Office document sent to a model without binary support). The error is delivered as an event in the SSE chat stream (the response itself is still HTTP 200); the message names the configured model and suggests converting the file to CSV or switching to a supported model. |
| HTTP 400 Bad Request | When calling the agent-chat API directly, the request body was empty, not valid JSON, or not a JSON object. Send a valid JSON object. |
| HTTP 422 Unprocessable Content | The request body was valid JSON but failed schema validation (for example, wrong field types, missing required fields, or an unreachable provider during LLM configuration). The response body lists which field(s) failed. |
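For clients that call the API directly, the HTTP-level rows above suggest a simple dispatch on the status code. The sketch below is one possible client-side policy, not part of the Agent Q API; the `next_action` function and its return labels are hypothetical.

```python
def next_action(status: int) -> str:
    """Map an HTTP status from the agent-chat API to a hypothetical client action."""
    if status == 429:
        return "wait_and_retry"  # rate limit: back off, then resend
    if status in (400, 422):
        return "fix_request"     # invalid body: resending it unchanged is unlikely to help
    if 200 <= status < 300:
        # Some errors, such as "Unsupported file format", arrive as events
        # inside the SSE stream even though the response is HTTP 200.
        return "read_stream"
    return "report_error"
```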