Agent Q Limits
Agent Q enforces limits at multiple layers to ensure fair usage, predictable costs, and platform stability. This page documents every constraint with exact values.
Rate Limits
Rate limiting is applied per authenticated user using a token bucket algorithm.
| Limit | Default |
|---|---|
| Requests per minute | 10 |
| Concurrent streaming sessions | 2 |
Exceeding either limit returns HTTP 429 Too Many Requests. These limits apply equally to the built-in Agent Q chat and to the REST API.
Note
Rate limiting is in-memory and per-pod. In multi-replica deployments, effective limits scale with the number of replicas.
Tokens and Usage
Every user message can trigger multiple LLM calls as the agent reasons, calls tools, and processes results. Token counts accumulate across all LLM calls within a single execution.
What consumes tokens in a single execution:
- System prompt — loaded once per execution (~2,000 tokens)
- Tool schemas — each active tool description (100–400 tokens each)
- Conversation history — all prior messages, subject to compression for long sessions
- Tool inputs — parameters sent to each tool call
- Tool outputs — results from each tool, capped at 8,000 characters per result
- LLM response — text and reasoning generated on each step, capped at 4,000 tokens per individual LLM call
Per-execution limits:
| Limit | Value |
|---|---|
| LLM API requests per execution | 50 |
| Input tokens per execution | 1,000,000 |
| Output tokens per execution | 500,000 |
| Total tokens per execution | 1,500,000 |
| Max tokens per individual LLM call | 4,000 |
These limits apply per user message (or per API call). Exceeding them returns an error indicating which limit was reached.
Timeouts
Timeouts are layered for defense-in-depth:
| Layer | Timeout | Description |
|---|---|---|
| Individual LLM API request | 120 seconds | Per call to the LLM provider |
| Agent execution | 300 seconds (5 min) | Full agent.run() including all tool calls |
| HTTP route handler | 360 seconds (6 min) | Outer safety net on the HTTP request |
If the agent execution timeout is reached, the agent stops and returns whatever it produced up to that point. For long-running tasks, break them into smaller sequential requests.
Tool Output Truncation
Individual tool results are truncated at 8,000 characters before being passed to the LLM. This prevents a single large result (e.g., a container with hundreds of fields) from consuming the entire context window.
Truncation strategy:
- The largest list/array fields in the result are trimmed first, iteratively, up to 20 iterations.
- Each list is reduced to at most 5 items (or half the original length, whichever is smaller).
- A truncation note is appended to the result so Agent Q knows data was cut off.
History truncation:
Tool results from turns older than the last 2 user messages are further truncated to 300 characters in the conversation history. This prevents large outputs from earlier in the conversation from re-consuming tokens on every subsequent turn.
Tool output metadata stored in the database is also truncated to 500 characters to keep metadata columns lean.
Conversation Context and Compression
| Parameter | Value |
|---|---|
| Recent turns kept verbatim | 4 turns (user + assistant pairs) |
| LLM summary trigger | 10 messages |
| Max summary length | 3,000 characters |
| User message truncation (mechanical summary) | 100 characters |
| Assistant message truncation (mechanical summary) | 150 characters |
Compression works in two stages:
- Mechanical compression — Turns beyond the most recent 4 are summarized mechanically: user messages truncated to 100 chars, assistant messages to 150 chars, tool interactions to 100 chars.
- LLM-generated summary — Once a session reaches 10 messages, the LLM generates a structured summary (up to 3,000 chars) stored with the session and used on all subsequent turns instead of the full history.
Starting a new session resets the context window entirely — the most reliable option for very long or unrelated workflows.
SQL Constraints
The preview_query tool and all computed asset SQL inputs only accept SELECT statements and CTEs (WITH clauses).
Blocked statements:
| Statement | Blocked |
|---|---|
INSERT |
Yes |
UPDATE |
Yes |
DELETE |
Yes |
DROP |
Yes |
CREATE |
Yes |
ALTER |
Yes |
TRUNCATE |
Yes |
These are blocked at the API level regardless of what the LLM generates.
Non-deterministic function warnings:
Agent Q generates a warning (but does not block) when SQL contains non-deterministic functions such as NOW(), CURRENT_TIMESTAMP, RANDOM(), and similar. This is because computed assets are expected to produce consistent, reproducible results.
Query execution timeout: 30 seconds by default (range: 5–150 seconds).
Scope Constraints
Datastore types for computed assets:
| Asset Type | Supported Datastores |
|---|---|
| Computed Table | JDBC datastores (PostgreSQL, Snowflake, BigQuery, MySQL, etc.) |
| Computed File | DFS datastores (S3, ADLS, GCS) |
| Computed Join | Any combination of JDBC and/or DFS |
RBAC enforcement:
Agent Q operates strictly within your user account's permissions. It cannot access datastores, containers, or fields that your role does not permit, and cannot trigger operations (profile, scan, export) beyond your permission scope.
Web search (if enabled):
When web search is enabled for your LLM configuration, Agent Q's search is restricted to the following domains only:
userguide.qualytics.ioqualytics.comqualytics.ai
Session ownership:
Users can only access their own chat sessions. Administrators can access any user's session via the API using an optional user_id parameter.
Discovery and Listing Limits
These are the default limits for tool calls that list platform assets. They can be adjusted within the allowed range per call.
| Tool | Default Limit | Max |
|---|---|---|
list_datastores |
50 | 200 |
list_containers |
100 | 500 |
list_fields |
200 | 500 |
discover_tools results |
8 tools | — |
| Containers fetched for suggestions | 5 | — |
Input Limits
| Parameter | Value |
|---|---|
| Paste threshold (triggers attachment mode) | 200 characters |
| Guardrail bypass threshold (very short messages) | 5 characters |
Messages shorter than 5 characters (e.g., "hi", "yes") bypass the topic guardrail classifier entirely for speed. Messages of 200 characters or more pasted into the input are captured as an attachment panel rather than inserted inline.
Error Reference
| Error | Cause |
|---|---|
HTTP 429 Too Many Requests |
Exceeded 10 requests/min or 2 concurrent sessions |
Usage limit exceeded |
Hit 50 LLM requests, 1M input tokens, 500K output tokens, or 1.5M total tokens in a single execution |
Agent execution timed out |
Agent ran for more than 5 minutes |
SQL not allowed |
Attempted INSERT, UPDATE, DELETE, or DDL in a query |