How Aggregation Comparison Checks Work
This page covers everything the Aggregation Comparison check does, in detail: how it evaluates the two sides, how NULL or non-finite results are treated, how filters scope each side, the resulting anomaly, and how the rule relates to other reconciliation rule types.
If you only need a quick reference, the Introduction page covers the formal definition, field scope, and general/anomaly properties. This page is the detailed reference.
Evaluation Flow
Every Aggregation Comparison check follows the same four-step evaluation flow:
- Apply the target filter. If the check has a
filterset, the platform restricts the target container to matching rows before computing the target aggregation. Rows outside the filter are excluded from the target aggregate. - Apply the reference filter. If
ref_filteris set, the same rule applies to the reference container. - Evaluate both aggregations. The target expression runs against the filtered target container; the reference expression runs against the filtered reference container. Each side must produce a single aggregate value (one row, one column).
- Compare the two values. The comparison operator is applied as
target <op> reference. When the relationship holds, the check passes. When it does not, a Shape Anomaly is emitted for the partition.
Aggregation Comparison emits Shape Anomalies only. The rule has no row-level evaluation, so it never produces Record Anomalies and the Custom Anomaly Description option is not applicable.
NULL and Non-Finite Aggregates
The check evaluates two aggregate values, not a column of rows, so NULL semantics apply to the aggregate result rather than to individual cells:
- If the target aggregation evaluates to
NULL,NaN, or an infinite value, the side is rendered asnull/NaNin the anomaly message and the comparison fails. - The same applies to the reference side.
- A failure on either side produces a Shape Anomaly so the disagreement is surfaced rather than silently passing.
In practice, a NULL aggregate usually means the filtered set is empty (for example, SUM(amount) over zero rows yields NULL) or the expression references a column that is itself null for every row in scope. When that is expected, lift the filter or add a COALESCE around the expression so the aggregate returns a defined value.
The Filter Clauses
Aggregation Comparison has two independent filter clauses, both expressed as SQL WHERE expressions:
filterscopes the target aggregation. Defined on the main check form.ref_filterscopes the reference aggregation. Defined on the Right Reference panel.
Each filter applies to its own side before the aggregation runs. Filtered-out rows on one side never influence the aggregate on the other side.
When either filter is set, the anomaly message includes the filter expression so the evaluated scope is visible in the alert. The exact message format is described in The Resulting Anomaly below.
Coverage
Aggregation Comparison does not use a coverage threshold. The check evaluates one boolean condition between two aggregate values; either the relationship holds or it does not. The coverage field on the API payload is ignored.
The Resulting Anomaly
Anomaly message format
When the comparison fails and both aggregates evaluated successfully, the Shape Anomaly message is:
The target expression '<expression>' evaluates to <target_value> which is not <comparison> the reference expression '<ref_expression>' value of <reference_value>
<comparison> is the long form of the operator: less than, less than or equal to, equal to, greater than or equal to, or greater than.
When either aggregate yields NULL, NaN, or an infinite value, that side renders as null/NaN in the message and the comparison is reported as failing.
When the target filter is set, the message ends with [filter: <expression>]. When the reference filter is set, the message ends with [reference filter: <expression>]. Both can appear at once.
Source Records Behavior
Aggregation Comparison is a Shape-only rule and does not emit Record Anomalies, so the Source Records view does not apply. The Shape Anomaly carries the two evaluated values directly in the message.
Relationship with Other Rule Types
Aggregation Comparison is one of several rule types that compare data across two sources. Use them together when a single relationship is not enough.
| Rule Type | Why pair it with Aggregation Comparison |
|---|---|
| Volumetric | Validates that a partition's row count stays within an expected range based on historical volume. Aggregation Comparison covers cross-source reconciliation; Volumetric covers within-source drift. |
| Data Diff | Performs a row-level diff between two containers. Choose Data Diff when you need to know which rows differ; choose Aggregation Comparison when you only need an aggregate-level reconciliation. |
| Metric | Asserts that a single aggregate falls inside an expected interval. Aggregation Comparison requires two aggregates that must agree with each other; Metric pins one aggregate to a fixed boundary. |
Performance Considerations
Aggregation Comparison computes two aggregate values, one per side. The cost is dominated by how each aggregation is evaluated and the rows the underlying engine has to scan:
- Filters narrow the data scanned on each side. A target filter that selects today's partition, or a reference filter that limits scope to the matching segment, can reduce the work substantially.
- Cross-datastore reconciliations load the reference container from its own datastore. Keep the reference aggregation simple (one
SUM, oneCOUNT) so the engine can push the aggregation down rather than fetching rows. - Aggregations that depend on Check Variables (for example,
{{ batch_id }}) are resolved before evaluation, so they do not add runtime overhead beyond the variable lookup.
Related
- Introduction: formal definition, field scope, and general/anomaly properties.
- Examples: three production scenarios with sample data and resulting anomalies.
- API: payload shape and field notes for creating an Aggregation Comparison check programmatically.
- FAQ: short answers to the most frequent questions.