Skip to content

Aggregation Comparison Check Examples

Three real-world scenarios that show how the Aggregation Comparison check is typically used in production: reconciling totals across two tables in the same warehouse, comparing row counts between a source system and the downstream warehouse, and enforcing a roll-up relationship between two aggregates on the same container.

The situation: The orders table stores a denormalized total_price per order; the lineitem table stores per-line extended prices that, when summed and adjusted for discount and tax, should equal the rounded total_price roll-up. Any drift between the two indicates a calculation or ingestion problem.

Check configuration

Field Value
Rule Aggregation Comparison
Aggregation ROUND(SUM(o_totalprice))
Comparison Equal To (eq)
Datastore (same as target)
Container lineitem
Aggregation (right reference) ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)))
Filter (none)
Filter Clause (right reference) (none)
Status Active
Owner (check creator)
Anomaly Assignee (Finance Data on-call)
Tags reconciliation, finance
Additional Metadata jira: DATA-4012
Description Order totals must equal the line-item roll-up.

Payload

{
    "description": "Order totals must equal the line-item roll-up.",
    "rule": "aggregationComparison",
    "fields": [],
    "container_id": 145,
    "filter": null,
    "properties": {
        "expression": "ROUND(SUM(o_totalprice))",
        "comparison": "eq",
        "ref_container_id": 178,
        "ref_expression": "ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)))",
        "ref_filter": null
    },
    "tags": ["reconciliation", "finance"],
    "additional_metadata": {"jira": "DATA-4012"},
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 12
}

Sample Aggregates

Side Expression Evaluated Value
Target (orders) ROUND(SUM(o_totalprice)) 5,000,000
Reference (lineitem) ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax))) 4,999,800

What gets flagged

The two aggregates differ by 200, so the eq comparison fails and a Shape Anomaly is emitted. There is no row-level output: Aggregation Comparison reports the disagreement at the partition level only.

Shape Anomaly

The target expression ROUND(SUM(o_totalprice)) evaluates to 5000000 which is not equal to the reference expression ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax))) value of 4999800

Flowchart

graph TD
    A["Evaluate target aggregation on orders"] --> B["Evaluate reference aggregation on lineitem"]
    B --> C{"Does target = reference?"}
    C -->|Yes| D["Check passes"]
    C -->|No| E["Shape Anomaly:<br/>target value vs reference value"]

Equivalent SQL

-- The check evaluates each side as one aggregate and compares them.
SELECT
    (SELECT ROUND(SUM(o_totalprice)) FROM orders) AS target_value,
    (SELECT ROUND(SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax))) FROM lineitem) AS reference_value;
-- The Shape Anomaly fires when target_value <> reference_value.

The situation: A nightly batch lands customer rows into the analytics warehouse from the operational source system. The two row counts must agree after the batch completes. Both filters scope the comparison to today's load window so backfills do not influence the assertion.

Check configuration

Field Value
Rule Aggregation Comparison
Aggregation COUNT(*)
Comparison Equal To (eq)
Datastore operational_postgres
Container customers
Aggregation (right reference) COUNT(*)
Filter load_date = current_date()
Filter Clause (right reference) updated_at >= current_date()
Status Active
Owner (check creator)
Anomaly Assignee (Ingestion on-call)
Tags ingestion, reconciliation
Additional Metadata jira: DATA-4138
Description Warehouse row count must match the source row count for today's window.

Payload

{
    "description": "Warehouse row count must match the source row count for today's window.",
    "rule": "aggregationComparison",
    "fields": [],
    "container_id": 212,
    "filter": "load_date = current_date()",
    "properties": {
        "expression": "COUNT(*)",
        "comparison": "eq",
        "ref_datastore_id": 9,
        "ref_container_id": 47,
        "ref_expression": "COUNT(*)",
        "ref_filter": "updated_at >= current_date()"
    },
    "tags": ["ingestion", "reconciliation"],
    "additional_metadata": {"jira": "DATA-4138"},
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 18
}

Sample Aggregates

Side Expression Filter Evaluated Value
Target (warehouse customers) COUNT(*) load_date = current_date() 10,418
Reference (source customers) COUNT(*) updated_at >= current_date() 10,500

Why both filters matter

Each filter restricts its own side to today's load window. Without the target filter, prior-day rows in the warehouse would inflate the target count. Without the reference filter, untouched source rows would inflate the reference count. Both filters are applied independently before the aggregation runs.

What gets flagged

Eighty-two source rows did not land in the warehouse, so the counts disagree and the eq check fails.

Shape Anomaly

The target expression COUNT(*) evaluates to 10418 which is not equal to the reference expression COUNT(*) value of 10500 [filter: load_date = current_date()] [reference filter: updated_at >= current_date()]

Flowchart

graph TD
    A["Apply filter:<br/>load_date = current_date()"] --> B["Evaluate COUNT(*) on warehouse customers"]
    A2["Apply reference filter:<br/>updated_at >= current_date()"] --> B2["Evaluate COUNT(*) on source customers"]
    B --> C{"Do counts match?"}
    B2 --> C
    C -->|Yes| D["Check passes"]
    C -->|No| E["Shape Anomaly with both<br/>values and filter suffixes"]

Equivalent SQL

-- Target side (warehouse).
SELECT COUNT(*) AS target_count
FROM warehouse.customers
WHERE load_date = current_date();

-- Reference side (source).
SELECT COUNT(*) AS reference_count
FROM source.customers
WHERE updated_at >= current_date();
-- The Shape Anomaly fires when target_count <> reference_count.

The situation: The transactions table stores both gross and net amounts on every row. By definition, the sum of net_amount for any closed batch must be less than or equal to the sum of gross_amount. A violation indicates an upstream calculation error or a corrupted row. Both aggregations target the same container, with a shared filter that restricts the assertion to closed batches.

Check configuration

Field Value
Rule Aggregation Comparison
Aggregation SUM(net_amount)
Comparison Less Than Or Equal To (lte)
Datastore (same as target)
Container transactions
Aggregation (right reference) SUM(gross_amount)
Filter batch_status = 'closed'
Filter Clause (right reference) batch_status = 'closed'
Status Active
Owner (check creator)
Anomaly Assignee (Finance Data on-call)
Tags invariant, finance
Additional Metadata jira: DATA-4271
Description Net sales must never exceed gross sales for closed batches.

Payload

{
    "description": "Net sales must never exceed gross sales for closed batches.",
    "rule": "aggregationComparison",
    "fields": [],
    "container_id": 318,
    "filter": "batch_status = 'closed'",
    "properties": {
        "expression": "SUM(net_amount)",
        "comparison": "lte",
        "ref_container_id": 318,
        "ref_expression": "SUM(gross_amount)",
        "ref_filter": "batch_status = 'closed'"
    },
    "tags": ["invariant", "finance"],
    "additional_metadata": {"jira": "DATA-4271"},
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 24
}

Sample Aggregates

Side Expression Filter Evaluated Value
Target SUM(net_amount) batch_status = 'closed' 1,250,400.00
Reference SUM(gross_amount) batch_status = 'closed' 1,248,300.00

What gets flagged

The net total is 2,100 higher than the gross total, which is impossible under the business rule. The lte comparison fails and a Shape Anomaly is emitted.

Shape Anomaly

The target expression SUM(net_amount) evaluates to 1250400 which is not less than or equal to the reference expression SUM(gross_amount) value of 1248300 [filter: batch_status = 'closed'] [reference filter: batch_status = 'closed']

Flowchart

graph TD
    A["Apply filter: batch_status = 'closed' (both sides)"] --> B["Evaluate SUM(net_amount)"]
    A --> B2["Evaluate SUM(gross_amount)"]
    B --> C{"Is target <= reference?"}
    B2 --> C
    C -->|Yes| D["Check passes"]
    C -->|No| E["Shape Anomaly:<br/>net is higher than gross"]

Equivalent SQL

-- Both sides aggregate the same container with the same filter.
SELECT SUM(net_amount) AS net_total,
       SUM(gross_amount) AS gross_total
FROM transactions
WHERE batch_status = 'closed';
-- The Shape Anomaly fires when net_total > gross_total.
  • Introduction: formal definition, field scope, and general/anomaly properties.
  • How It Works: full semantics, NULL handling, filter behavior, and anomaly templates.
  • API: payload shape and field notes for creating an Aggregation Comparison check programmatically.
  • FAQ: short answers to the most frequent questions.