
Agent Q Limits

Agent Q enforces limits at multiple layers to ensure fair usage, predictable costs, and platform stability. This page documents every constraint with exact values.

Rate Limits

Rate limiting is applied per authenticated user using a token bucket algorithm.

Limit | Default
Requests per minute | 10
Concurrent streaming sessions | 2

Exceeding either limit returns HTTP 429 Too Many Requests. These limits apply equally to the built-in Agent Q chat and to the REST API.
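
The token bucket idea can be sketched as follows. This is a minimal illustration, not Agent Q's implementation: the class name, the bucket capacity of 10, and the refill rate of 10 tokens per 60 seconds are assumptions chosen to match the "10 requests per minute" default.

```python
import time


class TokenBucket:
    """Illustrative token bucket: up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity=10, refill_per_second=10 / 60, clock=time.monotonic):
        # Assumed mapping of "10 requests per minute" onto bucket parameters.
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.last_refill = clock()

    def try_acquire(self):
        """Return True if a request may proceed, False if it would get a 429."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Add the tokens earned since the last check, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Under these assumptions a user can send a burst of 10 requests, after which roughly one new request becomes available every 6 seconds.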

Tokens and Usage

Every user message can trigger multiple LLM calls as the agent reasons, calls tools, and processes results. Token counts accumulate across all LLM calls within a single execution.

What consumes tokens in a single execution:

  1. System prompt. Loaded once per execution.
  2. Tool schemas. One short description per active tool.
  3. Conversation history. All prior messages, subject to compression for long sessions.
  4. Tool inputs. Parameters sent to each tool call.
  5. Tool outputs. Results from each tool, capped at 8,000 characters per result.
  6. LLM response. Text and reasoning generated on each step, capped at 4,000 tokens per individual LLM call.

Per-execution limits:

Limit | Value
LLM API requests per execution | 300
Input tokens per execution | 15,000,000
Output tokens per execution | 15,000,000
Total tokens per execution | 30,500,000
Max tokens per individual LLM call | 4,000

These limits apply per user message (or per API call). Exceeding them returns an error indicating which limit was reached.
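
To make the accumulation concrete, here is a small sketch of how usage could be tracked across the LLM calls of one execution. The limit values come from the table above; the `ExecutionUsage` class, the `UsageLimitExceeded` exception, and the error wording are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Values from the per-execution limits table.
MAX_REQUESTS = 300
MAX_INPUT_TOKENS = 15_000_000
MAX_OUTPUT_TOKENS = 15_000_000
MAX_TOTAL_TOKENS = 30_500_000


class UsageLimitExceeded(Exception):
    """Hypothetical error type; the real error name is not documented here."""


@dataclass
class ExecutionUsage:
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    def record_call(self, input_tokens, output_tokens):
        """Add one LLM call to the running totals, then check every limit."""
        self.requests += 1
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        checks = [
            ("LLM API requests", self.requests, MAX_REQUESTS),
            ("input tokens", self.input_tokens, MAX_INPUT_TOKENS),
            ("output tokens", self.output_tokens, MAX_OUTPUT_TOKENS),
            ("total tokens", self.input_tokens + self.output_tokens, MAX_TOTAL_TOKENS),
        ]
        for name, used, limit in checks:
            if used > limit:
                # Name the limit that was hit, as the page describes.
                raise UsageLimitExceeded(f"{name} limit reached: {used} > {limit}")
```

Because totals are per execution, a fresh `ExecutionUsage()` would be created for each user message or API call.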

Timeouts

Timeouts are layered for defense-in-depth:

Layer | Timeout | Description
Individual LLM API request | 120 seconds | Per call to the LLM provider
Agent execution | 300 seconds (5 min) | Full agent.run() including all tool calls
HTTP route handler | 360 seconds (6 min) | Outer safety net on the HTTP request

If the agent execution timeout is reached, the agent stops and returns whatever it produced up to that point. For long-running tasks, break them into smaller sequential requests.
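
The layering of the two inner timeouts can be sketched with nested `asyncio.wait_for` calls. This is an assumption-laden illustration: `run_agent` and the `steps` callables are hypothetical stand-ins for the real agent loop, and the sketch does not distinguish which layer timed out.

```python
import asyncio

LLM_REQUEST_TIMEOUT = 120.0      # seconds, per call to the LLM provider
AGENT_EXECUTION_TIMEOUT = 300.0  # seconds, for the whole execution


async def run_agent(steps, execution_timeout=AGENT_EXECUTION_TIMEOUT,
                    request_timeout=LLM_REQUEST_TIMEOUT):
    """Run hypothetical async `steps` in order under two nested timeouts."""
    produced = []  # partial output that survives a timeout

    async def run_all():
        for step in steps:
            # Inner layer: each individual call has its own deadline.
            produced.append(await asyncio.wait_for(step(), timeout=request_timeout))

    try:
        # Outer layer: the whole execution shares one deadline.
        await asyncio.wait_for(run_all(), timeout=execution_timeout)
        timed_out = False
    except asyncio.TimeoutError:
        # Either layer's timeout ends the run; keep what was produced so far.
        timed_out = True
    return {"timed_out": timed_out, "output": produced}
```

The HTTP route handler's 360-second timeout would wrap a call like this one more time, which is why it is set longer than the agent execution timeout.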

Tool Output Truncation

Individual tool results are truncated at 8,000 characters before being passed to the LLM. This prevents a single large result (e.g., a container with hundreds of fields) from consuming the entire context window.

Truncation strategy:

  1. The largest list-shaped fields in the result are trimmed first, repeatedly, until the result fits in the budget.
  2. Each list is reduced to a small sample so Agent Q still sees the shape of the data.
  3. A truncation note is appended to the result so Agent Q knows data was cut off.
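
The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions: it only looks at top-level list fields of a JSON-serializable dict, the sample size of 3 is invented, and the `_truncation_note` key and its wording are hypothetical.

```python
import json

MAX_CHARS = 8_000  # per-result budget from this page
SAMPLE_SIZE = 3    # assumed sample size; the real value is not documented


def truncate_tool_result(result, max_chars=MAX_CHARS, sample_size=SAMPLE_SIZE):
    """Trim the largest list fields until the serialized result fits the budget."""
    result = dict(result)  # shallow copy so the caller's dict is untouched
    trimmed = False
    while len(json.dumps(result)) > max_chars:
        # Lists that can still be shortened.
        lists = {k: v for k, v in result.items()
                 if isinstance(v, list) and len(v) > sample_size}
        if not lists:
            break  # nothing left to trim; this sketch gives up here
        # Step 1: pick the list that takes the most space.
        biggest = max(lists, key=lambda k: len(json.dumps(lists[k])))
        # Step 2: keep a small sample so the shape of the data stays visible.
        result[biggest] = lists[biggest][:sample_size]
        trimmed = True
    if trimmed:
        # Step 3: tell the reader that data was cut off.
        result["_truncation_note"] = "Some list fields were shortened to fit the size limit."
    return result
```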

History truncation:

Tool results from turns older than the last 2 user messages are further truncated to 300 characters in the conversation history. This prevents large outputs from earlier in the conversation from re-consuming tokens on every subsequent turn.

The saved chat history keeps a shortened copy of each tool output (up to 500 characters) so older sessions reload quickly.
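
As an illustration of the history rule, the sketch below shortens tool results that appear before the second-to-last user message. The message shape (dicts with `role` and `content` keys) and the function name are assumptions, not the stored format.

```python
def truncate_old_tool_results(messages, keep_user_messages=2, max_chars=300):
    """Shorten tool results that precede the last `keep_user_messages` user messages."""
    user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(user_positions) <= keep_user_messages:
        return list(messages)  # conversation is still short; nothing is old yet
    cutoff = user_positions[-keep_user_messages]
    out = []
    for i, message in enumerate(messages):
        if i < cutoff and message["role"] == "tool":
            # Copy the message with a shortened result; the original is not mutated.
            message = {**message, "content": message["content"][:max_chars]}
        out.append(message)
    return out
```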

Conversation Context and Compression

Parameter | Value
Recent turns kept verbatim | 4 turns (user + assistant pairs)
LLM summary trigger | 10 messages
Max summary length | 3,000 characters
User message truncation (mechanical summary) | 250 characters
Assistant message truncation (mechanical summary) | 300 characters

Compression works in two stages:

  1. Mechanical compression. Turns beyond the most recent 4 are summarized mechanically: user messages truncated to 250 chars, assistant messages to 300 chars, and tool interactions to 100 chars.
  2. LLM-generated summary. Once a session reaches 10 messages, the LLM generates a structured summary (up to 3,000 chars) stored with the session and used on all subsequent turns instead of the full history.
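
A rough sketch of the first stage, plus the trigger check for the second, is shown below. The message shape (dicts with `role` and `content`), the way a turn boundary is detected (each user message starts a turn), and the function name are assumptions; generating the LLM summary itself is out of scope here.

```python
KEEP_TURNS = 4        # recent turns kept verbatim
SUMMARY_TRIGGER = 10  # messages in the session before an LLM summary is generated
MECHANICAL_LIMITS = {"user": 250, "assistant": 300, "tool": 100}  # characters


def compress_history(messages):
    """Return (mechanically compressed messages, whether an LLM summary is due)."""
    user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
    # Everything before the start of the 4th-most-recent turn counts as old.
    cutoff = user_positions[-KEEP_TURNS] if len(user_positions) > KEEP_TURNS else 0
    compressed = []
    for i, message in enumerate(messages):
        limit = MECHANICAL_LIMITS.get(message["role"])
        if i < cutoff and limit is not None:
            message = {**message, "content": message["content"][:limit]}
        compressed.append(message)
    return compressed, len(messages) >= SUMMARY_TRIGGER
```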

Starting a new session resets the context window entirely. This is the most reliable option for very long or unrelated workflows.

SQL Constraints

The preview_query tool and all computed asset SQL inputs accept only SELECT statements, including SELECT statements that begin with a CTE (WITH clause).

Blocked statements:

Statement | Blocked
INSERT | Yes
UPDATE | Yes
DELETE | Yes
DROP | Yes
CREATE | Yes
ALTER | Yes
TRUNCATE | Yes

These are blocked at the API level regardless of what the LLM generates.
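
A check of this kind might look like the sketch below. It is a deliberately naive, regex-based illustration and not the platform's validator: the function name is hypothetical, and a production check would normally use a real SQL parser, since this version can be fooled by unusual quoting and would reject a column literally named after a blocked keyword.

```python
import re

BLOCKED = {"INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER", "TRUNCATE"}


def is_allowed_sql(sql):
    """Return True only for text that starts with SELECT/WITH and has no blocked keyword."""
    # Drop line comments and block comments.
    text = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.DOTALL)
    # Blank out single-quoted string literals so quoted words are not treated as keywords.
    text = re.sub(r"'(?:[^']|'')*'", "''", text)
    tokens = re.findall(r"\w+", text.upper())
    if not tokens or tokens[0] not in {"SELECT", "WITH"}:
        return False
    return not any(token in BLOCKED for token in tokens)
```

For example, `is_allowed_sql("SELECT updated_at FROM orders")` passes because `UPDATED_AT` is a single token, while `is_allowed_sql("SELECT 1; DROP TABLE orders")` is rejected.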

Non-deterministic function warnings:

Agent Q generates a warning (but does not block the query) when SQL contains non-deterministic functions such as NOW(), CURRENT_TIMESTAMP, RANDOM(), and similar. Computed assets are expected to produce consistent, reproducible results, and these functions can return a different value on each run.
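
A warn-but-allow check could be sketched like this. The pattern list covers only the three functions named above, and the function name and warning text are hypothetical; the sketch also does not skip string literals or comments.

```python
import re

# Only the functions named on this page; the real list is longer ("and similar").
NON_DETERMINISTIC_PATTERNS = {
    "NOW()": r"\bNOW\s*\(",
    "CURRENT_TIMESTAMP": r"\bCURRENT_TIMESTAMP\b",
    "RANDOM()": r"\bRANDOM\s*\(",
}


def non_deterministic_warnings(sql):
    """Return one warning per matched function; an empty list means no warnings."""
    return [
        f"Non-deterministic function used: {name}"
        for name, pattern in NON_DETERMINISTIC_PATTERNS.items()
        if re.search(pattern, sql, flags=re.IGNORECASE)
    ]
```

The query is still accepted when the list is non-empty; the warnings are informational.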

Query execution timeout: 30 seconds by default (range: 5–150 seconds).

Scope Constraints

Datastore types for computed assets:

Asset Type | Supported Datastores
Computed Table | JDBC datastores (PostgreSQL, Snowflake, BigQuery, MySQL, etc.)
Computed File | DFS datastores (S3, ADLS, GCS)
Computed Join | Any combination of JDBC and/or DFS

RBAC enforcement:

Agent Q operates strictly within your user account's permissions. It cannot access datastores, containers, or fields that your role does not permit, and cannot trigger operations (profile, scan, export) beyond your permission scope.

Web search (if enabled):

When web search is enabled for your LLM configuration, Agent Q's search is restricted to the following domains only:

  • userguide.qualytics.io
  • qualytics.com
  • qualytics.ai

Session ownership:

Users can access only their own chat sessions. Users with the Admin role can audit other users' sessions through the admin API.

Discovery and Listing Limits

These are the default limits for tool calls that list platform assets. Each call can override the default, up to the maximum shown.

Tool | Default Limit | Max
list_datastores | 50 | 200
list_containers | 100 | 500
list_fields | 200 | 500
Containers fetched for suggestions | 5 | n/a
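
How a requested limit resolves against these defaults can be sketched as below. The defaults and maximums come from the table; the function name, the lower bound of 1, and the choice to clamp out-of-range values (rather than reject them) are assumptions.

```python
# (default, maximum) per listing tool, from the table above.
LISTING_LIMITS = {
    "list_datastores": (50, 200),
    "list_containers": (100, 500),
    "list_fields": (200, 500),
}


def effective_limit(tool, requested=None):
    """Return the limit a call would use: the default, or the request clamped to range."""
    default, maximum = LISTING_LIMITS[tool]
    if requested is None:
        return default
    # Assumed behavior: clamp instead of rejecting out-of-range values.
    return max(1, min(requested, maximum))
```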

Input Limits

Parameter | Value
Paste threshold (triggers attachment mode) | 1,000 characters
Max file attachment size | 20 MB
Max file attachments per message | 1

Very short follow-ups like "yes" or "that one" are recognized as continuations of the conversation and skip the topic check.

Pastes of 1,000 characters or more into the input are captured as an attachment panel rather than inserted inline.

File attachments are accepted only when the active LLM provider supports file uploads. See Attach a File for the supported formats and provider list.

Error Reference

Error | Cause
HTTP 429 Too Many Requests | Exceeded 10 requests/min or 2 concurrent sessions
Usage limit exceeded | Hit 300 LLM requests, 15M input tokens, 15M output tokens, or 30.5M total tokens in a single execution
Agent execution timed out | Agent ran for more than 5 minutes
SQL not allowed | Attempted INSERT, UPDATE, DELETE, or DDL in a query
Unsupported file format | The active LLM provider cannot read the attached file (typically a binary Office document sent to a model without binary support). The error is delivered as an event in the SSE chat stream (the response itself is still HTTP 200); the message names the configured model and suggests converting the file to CSV or switching to a supported model.
HTTP 400 Bad Request | When calling the agent-chat API directly, the request body was empty, not valid JSON, or not a JSON object. Send a valid JSON object.
HTTP 422 Unprocessable Content | The request body was valid JSON but failed schema validation (for example, wrong field types, missing required fields, or an unreachable provider during LLM configuration). The response body lists which field(s) failed.
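
For API clients, HTTP 429 is the error in this table that can typically be handled by waiting and retrying. The sketch below shows one possible client-side approach using exponential backoff. Everything in it is an assumption for illustration: `send` is a hypothetical callable that performs the request and returns a `(status, body)` pair, and the 6-second base delay is simply 60 seconds divided by the 10 requests-per-minute default.

```python
import time


def call_with_retry(send, max_attempts=4, base_delay=6.0, sleep=time.sleep):
    """Call `send()` and retry on HTTP 429, doubling the wait after each attempt."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body  # success, or an error that retrying will not fix
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 6s, 12s, 24s, ...
    return status, body  # still rate limited after the final attempt
```

Errors such as HTTP 400 and HTTP 422 are returned immediately, since they indicate a problem with the request itself.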