Aggregation Comparison Check Examples
Three real-world scenarios that show how the Aggregation Comparison check is typically used in production: reconciling totals across two tables in the same warehouse, comparing row counts between a source system and the downstream warehouse, and enforcing a roll-up relationship between two aggregates on the same container.
The situation: The orders table stores a denormalized total_price per order; the lineitem table stores per-line extended prices that, when summed and adjusted for discount and tax, should equal the rounded total_price roll-up. Any drift between the two indicates a calculation or ingestion problem.
Check configuration
| Field | Value |
|---|---|
| Rule | Aggregation Comparison |
| Aggregation | ROUND(SUM(o_totalprice)) |
| Comparison | Equal To (eq) |
| Datastore | (same as target) |
| Container | lineitem |
| Aggregation (right reference) | ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax))) |
| Filter | (none) |
| Filter Clause (right reference) | (none) |
| Status | Active |
| Owner | (check creator) |
| Anomaly Assignee | (Finance Data on-call) |
| Tags | reconciliation, finance |
| Additional Metadata | jira: DATA-4012 |
| Description | Order totals must equal the line-item roll-up. |
Payload
{
"description": "Order totals must equal the line-item roll-up.",
"rule": "aggregationComparison",
"fields": [],
"container_id": 145,
"filter": null,
"properties": {
"expression": "ROUND(SUM(o_totalprice))",
"comparison": "eq",
"ref_container_id": 178,
"ref_expression": "ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)))",
"ref_filter": null
},
"tags": ["reconciliation", "finance"],
"additional_metadata": {"jira": "DATA-4012"},
"template_id": null,
"status": "Active",
"owner_id": 7,
"default_anomaly_assignee_id": 12
}
Sample Aggregates
| Side | Expression | Evaluated Value |
|---|---|---|
Target (orders) |
ROUND(SUM(o_totalprice)) |
5,000,000 |
Reference (lineitem) |
ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax))) |
4,999,800 |
What gets flagged
The two aggregates differ by 200, so the eq comparison fails and a Shape Anomaly is emitted. There is no row-level output: Aggregation Comparison reports the disagreement at the partition level only.
Shape Anomaly
The target expression ROUND(SUM(o_totalprice)) evaluates to 5000000 which is not equal to the reference expression ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax))) value of 4999800
Flowchart
graph TD
A["Evaluate target aggregation on orders"] --> B["Evaluate reference aggregation on lineitem"]
B --> C{"Does target = reference?"}
C -->|Yes| D["Check passes"]
C -->|No| E["Shape Anomaly:<br/>target value vs reference value"]
Equivalent SQL
-- The check evaluates each side as one aggregate and compares them.
SELECT
(SELECT ROUND(SUM(o_totalprice)) FROM orders) AS target_value,
(SELECT ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax))) FROM lineitem) AS reference_value;
-- The Shape Anomaly fires when target_value <> reference_value.
The situation: A nightly batch lands customer rows into the analytics warehouse from the operational source system. The two row counts must agree after the batch completes. Both filters scope the comparison to today's load window so backfills do not influence the assertion.
Check configuration
| Field | Value |
|---|---|
| Rule | Aggregation Comparison |
| Aggregation | COUNT(*) |
| Comparison | Equal To (eq) |
| Datastore | operational_postgres |
| Container | customers |
| Aggregation (right reference) | COUNT(*) |
| Filter | load_date = current_date() |
| Filter Clause (right reference) | updated_at >= current_date() |
| Status | Active |
| Owner | (check creator) |
| Anomaly Assignee | (Ingestion on-call) |
| Tags | ingestion, reconciliation |
| Additional Metadata | jira: DATA-4138 |
| Description | Warehouse row count must match the source row count for today's window. |
Payload
{
"description": "Warehouse row count must match the source row count for today's window.",
"rule": "aggregationComparison",
"fields": [],
"container_id": 212,
"filter": "load_date = current_date()",
"properties": {
"expression": "COUNT(*)",
"comparison": "eq",
"ref_datastore_id": 9,
"ref_container_id": 47,
"ref_expression": "COUNT(*)",
"ref_filter": "updated_at >= current_date()"
},
"tags": ["ingestion", "reconciliation"],
"additional_metadata": {"jira": "DATA-4138"},
"template_id": null,
"status": "Active",
"owner_id": 7,
"default_anomaly_assignee_id": 18
}
Sample Aggregates
| Side | Expression | Filter | Evaluated Value |
|---|---|---|---|
Target (warehouse customers) |
COUNT(*) |
load_date = current_date() |
10,418 |
Reference (source customers) |
COUNT(*) |
updated_at >= current_date() |
10,500 |
Why both filters matter
Each filter restricts its own side to today's load window. Without the target filter, prior-day rows in the warehouse would inflate the target count. Without the reference filter, untouched source rows would inflate the reference count. Both filters are applied independently before the aggregation runs.
What gets flagged
Eighty-two source rows did not land in the warehouse, so the counts disagree and the eq check fails.
Shape Anomaly
The target expression COUNT(*) evaluates to 10418 which is not equal to the reference expression COUNT(*) value of 10500 [filter: load_date = current_date()] [reference filter: updated_at >= current_date()]
Flowchart
graph TD
A["Apply filter:<br/>load_date = current_date()"] --> B["Evaluate COUNT(*) on warehouse customers"]
A2["Apply reference filter:<br/>updated_at >= current_date()"] --> B2["Evaluate COUNT(*) on source customers"]
B --> C{"Do counts match?"}
B2 --> C
C -->|Yes| D["Check passes"]
C -->|No| E["Shape Anomaly with both<br/>values and filter suffixes"]
Equivalent SQL
-- Target side (warehouse).
SELECT COUNT(*) AS target_count
FROM warehouse.customers
WHERE load_date = current_date();
-- Reference side (source).
SELECT COUNT(*) AS reference_count
FROM source.customers
WHERE updated_at >= current_date();
-- The Shape Anomaly fires when target_count <> reference_count.
The situation: The transactions table stores both gross and net amounts on every row. By definition, the sum of net_amount for any closed batch must be less than or equal to the sum of gross_amount. A violation indicates an upstream calculation error or a corrupted row. Both aggregations target the same container, with a shared filter that restricts the assertion to closed batches.
Check configuration
| Field | Value |
|---|---|
| Rule | Aggregation Comparison |
| Aggregation | SUM(net_amount) |
| Comparison | Less Than Or Equal To (lte) |
| Datastore | (same as target) |
| Container | transactions |
| Aggregation (right reference) | SUM(gross_amount) |
| Filter | batch_status = 'closed' |
| Filter Clause (right reference) | batch_status = 'closed' |
| Status | Active |
| Owner | (check creator) |
| Anomaly Assignee | (Finance Data on-call) |
| Tags | invariant, finance |
| Additional Metadata | jira: DATA-4271 |
| Description | Net sales must never exceed gross sales for closed batches. |
Payload
{
"description": "Net sales must never exceed gross sales for closed batches.",
"rule": "aggregationComparison",
"fields": [],
"container_id": 318,
"filter": "batch_status = 'closed'",
"properties": {
"expression": "SUM(net_amount)",
"comparison": "lte",
"ref_container_id": 318,
"ref_expression": "SUM(gross_amount)",
"ref_filter": "batch_status = 'closed'"
},
"tags": ["invariant", "finance"],
"additional_metadata": {"jira": "DATA-4271"},
"template_id": null,
"status": "Active",
"owner_id": 7,
"default_anomaly_assignee_id": 24
}
Sample Aggregates
| Side | Expression | Filter | Evaluated Value |
|---|---|---|---|
| Target | SUM(net_amount) |
batch_status = 'closed' |
1,250,400.00 |
| Reference | SUM(gross_amount) |
batch_status = 'closed' |
1,248,300.00 |
What gets flagged
The net total is 2,100 higher than the gross total, which is impossible under the business rule. The lte comparison fails and a Shape Anomaly is emitted.
Shape Anomaly
The target expression SUM(net_amount) evaluates to 1250400 which is not less than or equal to the reference expression SUM(gross_amount) value of 1248300 [filter: batch_status = 'closed'] [reference filter: batch_status = 'closed']
Flowchart
graph TD
A["Apply filter: batch_status = 'closed' (both sides)"] --> B["Evaluate SUM(net_amount)"]
A --> B2["Evaluate SUM(gross_amount)"]
B --> C{"Is target <= reference?"}
B2 --> C
C -->|Yes| D["Check passes"]
C -->|No| E["Shape Anomaly:<br/>net is higher than gross"]
Equivalent SQL
Related
- Introduction: formal definition, field scope, and general/anomaly properties.
- How It Works: full semantics, NULL handling, filter behavior, and anomaly templates.
- API: payload shape and field notes for creating an Aggregation Comparison check programmatically.
- FAQ: short answers to the most frequent questions.