AgentQ Limits

AgentQ enforces limits at multiple layers to ensure fair usage, predictable costs, and platform stability. This page documents every constraint with exact values.

Rate Limits

Rate limiting is applied per authenticated user using a token bucket algorithm.

Limit	Default
Requests per minute	10
Concurrent streaming sessions	2

Exceeding either limit returns HTTP 429 Too Many Requests. These limits apply equally to the built-in AgentQ chat and to the REST API.
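
As an illustration of how a token bucket yields the 10 requests per minute limit, here is a minimal sketch. The class name, the bucket capacity of 10, and the continuous refill rate are assumptions made for the example; AgentQ's actual parameters beyond the documented 10 requests per minute are not stated here.

```python
import time


class TokenBucket:
    """Token bucket holding up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity=10, refill_per_second=10 / 60, clock=time.monotonic):
        # Assumed values: a burst of 10, refilled at 10 tokens per 60 seconds.
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated_at = clock()

    def try_acquire(self):
        """Return True if a request may proceed, False if it would get a 429."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        # Add the tokens earned since the last call, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Under these assumptions a user can send a burst of 10 requests, after which roughly one more request becomes available every 6 seconds.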

Tokens and Usage

Every user message can trigger multiple LLM calls as the agent reasons, calls tools, and processes results. Token counts accumulate across all LLM calls within a single execution.

What consumes tokens in a single execution:

System prompt. Loaded once per execution.
Tool schemas. One short description per active tool.
Conversation history. All prior messages, subject to compression for long sessions.
Tool inputs. Parameters sent to each tool call.
Tool outputs. Results from each tool, capped at 8,000 characters per result.
LLM response. Text and reasoning generated on each step, capped at 4,000 tokens per individual LLM call.

Per-execution limits:

Limit	Value
LLM API requests per execution	300
Input tokens per execution	15,000,000
Output tokens per execution	15,000,000
Total tokens per execution	30,500,000
Max tokens per individual LLM call	4,000

These limits apply per user message (or per API call). Exceeding them returns an error indicating which limit was reached.
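
The accumulation described above can be sketched as a small per-execution counter. This is illustrative only: the class and exception names are hypothetical, and treating a limit as breached when the running total goes strictly above the value is an assumption.

```python
from dataclasses import dataclass

# Values taken from the per-execution limits table.
EXECUTION_LIMITS = {
    "requests": 300,
    "input_tokens": 15_000_000,
    "output_tokens": 15_000_000,
    "total_tokens": 30_500_000,
}


class UsageLimitExceeded(Exception):
    """Hypothetical error raised when one execution goes over a limit."""


@dataclass
class ExecutionUsage:
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    def record_call(self, input_tokens, output_tokens):
        """Add one LLM call to the running totals and check every limit."""
        self.requests += 1
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        totals = {
            "requests": self.requests,
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "total_tokens": self.input_tokens + self.output_tokens,
        }
        for name, limit in EXECUTION_LIMITS.items():
            if totals[name] > limit:
                # The message names the limit that was reached.
                raise UsageLimitExceeded(f"{name} limit of {limit:,} exceeded")
```

One `ExecutionUsage` instance would be created per user message or API call, so the counters start from zero on every execution.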

Timeouts

Timeouts are layered for defense-in-depth:

Layer	Timeout	Description
Individual LLM API request	120 seconds	Per call to the LLM provider
Agent execution	300 seconds (5 min)	Full `agent.run()` including all tool calls
HTTP route handler	360 seconds (6 min)	Outer safety net on the HTTP request

If the agent execution timeout is reached, the agent stops and returns whatever it produced up to that point. For long-running tasks, break them into smaller sequential requests.
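
To show how the three layers nest, the sketch below wraps each layer in `asyncio.wait_for`. Only the three timeout values come from the table above; `call_llm`, `run_agent`, `bounded_agent`, and `handle_request` are hypothetical names, and the real service may structure this differently.

```python
import asyncio


async def call_llm(delay):
    """Stand-in for one LLM provider call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return "chunk"


async def run_agent(call_delays, llm_timeout, partial):
    """Run LLM calls in sequence; each call is bounded by the innermost timeout."""
    for delay in call_delays:
        chunk = await asyncio.wait_for(call_llm(delay), timeout=llm_timeout)
        partial.append(chunk)


async def bounded_agent(call_delays, llm_timeout, agent_timeout):
    """Middle layer: stop the whole run at `agent_timeout`, keep partial output."""
    partial = []
    try:
        await asyncio.wait_for(
            run_agent(call_delays, llm_timeout, partial), timeout=agent_timeout
        )
    except asyncio.TimeoutError:
        # Either a single LLM call or the whole run hit its deadline.
        pass
    return partial


async def handle_request(call_delays, llm_timeout=120, agent_timeout=300, route_timeout=360):
    """Outer layer: safety net around everything the route handler does."""
    return await asyncio.wait_for(
        bounded_agent(call_delays, llm_timeout, agent_timeout), timeout=route_timeout
    )
```

With scaled-down timeouts, `asyncio.run(handle_request([0.01, 0.01, 0.5], llm_timeout=1, agent_timeout=0.1, route_timeout=1))` returns the two chunks produced before the agent deadline, which mirrors the "returns whatever it produced" behavior.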

Tool Output Truncation

Individual tool results are truncated at 8,000 characters before being passed to the LLM. This prevents a single large result (e.g., a container with hundreds of fields) from consuming the entire context window.

Truncation strategy:

The largest list-shaped fields in the result are trimmed first, repeatedly, until the result fits in the budget.
Each list is reduced to a small sample so AgentQ still sees the shape of the data.
A truncation note is appended to the result so AgentQ knows data was cut off.
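
The strategy above can be sketched roughly as follows. This is not AgentQ's implementation: the function name, the sample size of 3, the wording of the note, measuring size on the JSON serialization, and only looking at top-level lists are all assumptions. Only the 8,000-character budget comes from this page.

```python
import json


def truncate_tool_result(result, budget=8000, sample_size=3):
    """Trim the largest top-level list fields until the serialized result fits."""
    result = dict(result)  # shallow copy; the caller's dict is left untouched
    trimmed = []
    while len(json.dumps(result)) > budget:
        candidates = [
            (len(json.dumps(value)), key)
            for key, value in result.items()
            if isinstance(value, list) and len(value) > sample_size
        ]
        if not candidates:
            break  # nothing list-shaped left to trim
        _, key = max(candidates)  # largest serialized list first
        result[key] = result[key][:sample_size]  # keep a small sample
        trimmed.append(key)

    text = json.dumps(result)
    note = ""
    if trimmed:
        note = f"\n[truncated: {', '.join(trimmed)} reduced to {sample_size} items each]"
    if len(text) + len(note) > budget:
        # Last resort: hard cut, still leaving room for a note.
        note = note or "\n[truncated]"
        text = text[: budget - len(note)]
    return text + note
```

Keeping a few items from each list, rather than dropping the field, lets the model still see the record shape (field names and types) at a fraction of the size.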

History truncation:

Tool results from turns older than the last 2 user messages are further truncated to 300 characters in the conversation history. This prevents large outputs from earlier in the conversation from re-consuming tokens on every subsequent turn.
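
A minimal sketch of this rule, assuming the history is a list of dictionaries with `role` and `content` keys (a hypothetical message shape, as is the function name):

```python
def truncate_old_tool_results(messages, keep_user_turns=2, max_chars=300):
    """Shorten tool results that precede the last `keep_user_turns` user messages."""
    user_indexes = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(user_indexes) <= keep_user_turns:
        return list(messages)  # everything still counts as recent
    cutoff = user_indexes[-keep_user_turns]  # index of the oldest "recent" user message
    out = []
    for i, message in enumerate(messages):
        is_old_tool = i < cutoff and message["role"] == "tool"
        if is_old_tool and len(message["content"]) > max_chars:
            # Copy the message so the original history is not mutated.
            message = {**message, "content": message["content"][:max_chars]}
        out.append(message)
    return out
```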

The saved chat history keeps a shortened copy of each tool output (up to 500 characters) so older sessions reload quickly.

Conversation Context and Compression

Parameter	Value
Recent turns kept verbatim	4 turns (user + assistant pairs)
LLM summary trigger	10 messages
Max summary length	3,000 characters
User message truncation (mechanical summary)	250 characters
Assistant message truncation (mechanical summary)	300 characters
Tool interaction truncation (mechanical summary)	100 characters

Compression works in two stages:

Mechanical compression. Turns beyond the most recent 4 are summarized mechanically: user messages truncated to 250 chars, assistant messages to 300 chars, and tool interactions to 100 chars.
LLM-generated summary. Once a session reaches 10 messages, the LLM generates a structured summary (up to 3,000 chars) stored with the session and used on all subsequent turns instead of the full history.
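
The mechanical stage and the summary trigger can be sketched as below. The data shape (a list of turns, each a list of `role`/`content` messages) and the function names are assumptions for illustration. The character limits and thresholds are the ones documented in this section.

```python
RECENT_TURNS = 4
SUMMARY_TRIGGER_MESSAGES = 10
MECHANICAL_LIMITS = {"user": 250, "assistant": 300, "tool": 100}


def compress_history(turns):
    """Truncate messages in all but the most recent turns, by role."""
    older, recent = turns[:-RECENT_TURNS], turns[-RECENT_TURNS:]
    compressed = []
    for turn in older:
        compressed.append([
            # Roles without a configured limit are left unchanged.
            {**m, "content": m["content"][: MECHANICAL_LIMITS.get(m["role"], len(m["content"]))]}
            for m in turn
        ])
    return compressed + recent  # recent turns are kept verbatim


def needs_llm_summary(message_count):
    """Stage two starts once the session reaches the trigger size."""
    return message_count >= SUMMARY_TRIGGER_MESSAGES
```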

Starting a new session resets the context window entirely. This is the most reliable option for very long or unrelated workflows.

SQL Constraints

The `preview_query` tool and all computed asset SQL inputs accept only `SELECT` statements and CTEs (`WITH` clauses).

Blocked statements:

Statement	Blocked
`INSERT`	Yes
`UPDATE`	Yes
`DELETE`	Yes
`DROP`	Yes
`CREATE`	Yes
`ALTER`	Yes
`TRUNCATE`	Yes

These are blocked at the API level regardless of what the LLM generates.
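
For readers who want to pre-check SQL before sending it, here is a deliberately naive sketch of a read-only check. It is an assumption-laden approximation, not the API's actual validator, which most likely relies on a real SQL parser. The function name is hypothetical, and the regex approach is conservative: for example, it rejects a query that uses a blocked keyword as a double-quoted column name.

```python
import re

BLOCKED_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER", "TRUNCATE"}


def is_allowed_sql(sql):
    """Return True for a single SELECT or WITH statement without blocked keywords."""
    # Strip line and block comments (naive: ignores comment markers inside strings).
    text = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.S)
    # Blank out single-quoted literals so keywords inside strings are ignored.
    text = re.sub(r"'(?:[^']|'')*'", "''", text)
    statements = [s.strip() for s in text.split(";") if s.strip()]
    if len(statements) != 1:
        return False  # empty input or several statements
    words = re.findall(r"[A-Za-z_]+", statements[0].upper())
    if not words or words[0] not in {"SELECT", "WITH"}:
        return False
    # Whole-word match only, so a column such as update_time is fine.
    return not BLOCKED_KEYWORDS.intersection(words)
```

The check also scans inside CTE bodies, so a data-modifying statement wrapped in a `WITH` clause is rejected along with top-level ones.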

Non-deterministic function warnings:

AgentQ warns, but does not block, when SQL contains non-deterministic functions such as `NOW()`, `CURRENT_TIMESTAMP`, or `RANDOM()`. Computed assets are expected to produce consistent, reproducible results, and these functions return a different value on each run.
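
A warning check of this kind could look like the sketch below. The function name and message text are hypothetical, and the list contains only the three functions named on this page; the full set AgentQ checks is not documented here.

```python
import re

# Only the functions named in this section; the real list is likely longer.
NON_DETERMINISTIC = ("NOW", "CURRENT_TIMESTAMP", "RANDOM")


def non_deterministic_warnings(sql):
    """Return one warning per non-deterministic function found; never blocks."""
    warnings = []
    for name in NON_DETERMINISTIC:
        # Whole-word, case-insensitive match, so "known" does not match NOW.
        if re.search(rf"\b{name}\b", sql, flags=re.IGNORECASE):
            warnings.append(f"{name} is non-deterministic; results may differ between runs")
    return warnings
```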

Query execution timeout: 30 seconds by default (range: 5–150 seconds).

Scope Constraints

Datastore types for computed assets:

Asset Type	Supported Datastores
Computed Table	JDBC datastores (PostgreSQL, Snowflake, BigQuery, MySQL, etc.)
Computed File	DFS datastores (S3, ADLS, GCS)
Computed Join	Any combination of JDBC and/or DFS

RBAC enforcement:

AgentQ operates strictly within your user account's permissions. It cannot access datastores, containers, or fields that your role does not permit, and cannot trigger operations (profile, scan, export) beyond your permission scope.

Web search (if enabled):

When web search is enabled for your LLM configuration, AgentQ's search is restricted to the following domains:

userguide.qualytics.io
qualytics.com
qualytics.ai

Session ownership:

Users can access only their own chat sessions. Users with the Admin role can audit other users' sessions through the admin API.

Discovery and Listing Limits

These are the default limits for tool calls that list platform assets. Each call can override the default, up to the maximum shown.

Tool	Default Limit	Max
`list_datastores`	50	200
`list_containers`	100	500
`list_fields`	200	500
Containers fetched for suggestions	5	n/a
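
The defaults and maximums above can be expressed as a small lookup. Whether AgentQ clamps an out-of-range value or rejects it is not stated on this page. The sketch below assumes clamping, with a floor of 1, and `resolve_limit` is a hypothetical helper.

```python
# (default, max) per listing tool, from the table above.
LIST_LIMITS = {
    "list_datastores": (50, 200),
    "list_containers": (100, 500),
    "list_fields": (200, 500),
}


def resolve_limit(tool, requested=None):
    """Return the effective page size for a listing tool call."""
    default, maximum = LIST_LIMITS[tool]
    if requested is None:
        return default
    # Assumed behavior: clamp into [1, maximum] instead of raising an error.
    return max(1, min(requested, maximum))
```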

Input Limits

Parameter	Value
Paste threshold (triggers attachment mode)	1,000 characters
Max file attachment size	20 MB
Max file attachments per message	1

Very short follow-ups like "yes" or "that one" are recognized as continuations of the conversation and skip the topic check.

Pastes of 1,000 characters or more into the input are captured as an attachment panel rather than inserted inline.

File attachments are accepted only when the active LLM provider supports file uploads. See Attach a File for the supported formats and provider list.

Error Reference

Error	Cause
`HTTP 429 Too Many Requests`	Exceeded 10 requests per minute or 2 concurrent streaming sessions
`Usage limit exceeded`	Hit 300 LLM requests, 15M input tokens, 15M output tokens, or 30.5M total tokens in a single execution
`Agent execution timed out`	Agent ran for more than 5 minutes
`SQL not allowed`	Attempted `INSERT`, `UPDATE`, `DELETE`, or a DDL statement (`DROP`, `CREATE`, `ALTER`, `TRUNCATE`) in a query
`Unsupported file format`	The active LLM provider cannot read the attached file (typically a binary Office document sent to a model without binary support). The error is delivered as an event in the SSE chat stream (the response itself is still `HTTP 200`); the message names the configured model and suggests converting the file to CSV or switching to a supported model.
`HTTP 400 Bad Request`	When calling the agent-chat API directly, the request body was empty, not valid JSON, or not a JSON object. Send a valid JSON object.
`HTTP 422 Unprocessable Content`	The request body was valid JSON but failed schema validation (for example, wrong field types, missing required fields, or an unreachable provider during LLM configuration). The response body lists which field(s) failed.
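
When calling the API directly, a client can treat `HTTP 429` as retryable and every other status as final. The sketch below is one way to do that under stated assumptions: `send` is a hypothetical callable returning a `(status, body)` pair, and the 6-second base delay is a guess derived from the 10 requests per minute limit, not a documented value.

```python
import time


def send_with_retry(send, max_attempts=4, base_delay=6.0, sleep=time.sleep):
    """Call `send()` until it returns a status other than 429."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            # 200, 400, 422 and so on are returned as-is; retrying will not fix them.
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 6 s, 12 s, 24 s
    return status, body  # still rate limited after the final attempt
```

Errors such as `HTTP 400` and `HTTP 422` indicate a problem with the request itself, so the sketch returns them immediately instead of retrying.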