Aggregation Comparison Check FAQ
Common questions about how the Aggregation Comparison check evaluates two aggregates, how it handles NULL or non-finite results, how filters scope each side, and how anomalies are reported.
Behavior
Which comparison operators are supported?
Five operators are accepted: lt (less than), lte (less than or equal to), eq (equal to), gte (greater than or equal to), and gt (greater than). The comparison is always applied as target <op> reference.
Can the reference container live in a different datastore?
Yes. Set ref_datastore_id in the payload to the datastore holding the reference container. When the reference container is in the same datastore as the target, omit ref_datastore_id (or send null) and only set ref_container_id.
Can the target and reference be the same container?
Yes. Set ref_container_id to the same ID as container_id and use the Right Reference panel to define the reference aggregation. This is useful when asserting an invariant between two aggregates on the same dataset (for example, SUM(net) <= SUM(gross)).
What happens when an aggregation returns NULL?
If the target or reference aggregation evaluates to NULL, NaN, or an infinite value, that side renders as null/NaN in the anomaly message and the comparison fails. A common cause is an empty filtered set; lift the filter or wrap the expression in COALESCE so the aggregate returns a defined value.
Do the two aggregations need to produce comparable types?
Yes. Both expressions are evaluated as numeric values for the comparison. Wrap non-numeric outputs (for example, MIN(some_date)) so they reduce to a numeric form that the operator can compare.
Anomaly Reporting
What does the anomaly message look like?
When both aggregates evaluate successfully:
The target expression '<expression>' evaluates to <target_value> which is not <comparison> the reference expression '<ref_expression>' value of <reference_value>
When either aggregate fails to evaluate, that side renders as null/NaN and the same template is used.
When the target filter is set, the message ends with [filter: <expression>]. When the reference filter is set, the message ends with [reference filter: <expression>]. Both suffixes can appear together.
Does Aggregation Comparison emit Record Anomalies?
No. The rule is Shape-only. It evaluates one boolean condition between two aggregate values and reports a single Shape Anomaly when the relationship does not hold. There are no per-row anomalies.
Does Custom Anomaly Description work for Aggregation Comparison?
No. Custom Anomaly Description (and the anomaly_message_field payload field) applies only to Record Anomalies. Aggregation Comparison emits Shape Anomalies, which always use the fixed template described above.
Why does the message show null/NaN instead of a number?
That side's aggregation returned NULL, NaN, or an infinite value. The most common causes are an empty filtered set, a column that is null for every row in scope, or an expression that produced an undefined result (for example, division by zero).
Configuration
Can I lower the coverage on an Aggregation Comparison check?
No. The rule does not use a coverage threshold. The coverage field on the API payload is ignored. Either the comparison holds for the two aggregates or it does not.
Can I change the comparison operator on an existing check?
Yes. A PUT to /api/quality-checks/{id} can update properties.comparison along with both expressions and both filters. The rule type, the target container, and the associated Check Template stay immutable. See the API page for the editable/immutable matrix.
How do I assert that two row counts must match?
Use COUNT(*) as both the target and reference aggregations and set the comparison to eq. Apply filters on each side independently to scope each count to the same logical set of rows.
Can I reference Check Variables inside the aggregation expressions?
Yes. Both expression and ref_expression accept Check Variables (for example, {{ batch_id }}). The variables are resolved before the aggregation runs.
How does Aggregation Comparison differ from Metric?
Aggregation Comparison compares two aggregates that must satisfy a relationship with each other. Metric pins a single aggregate to a fixed boundary (for example, between 0 and 1). Use Aggregation Comparison for cross-source reconciliation; use Metric when the bound is a constant.
How does Aggregation Comparison differ from Data Diff?
Data Diff performs a row-level diff between two containers and reports each mismatched row. Aggregation Comparison only checks an aggregate-level relationship. Choose Data Diff when you need to know which rows differ; choose Aggregation Comparison when an aggregate-level reconciliation is enough.
Related
- Introduction: formal definition, field scope, and general/anomaly properties.
- How It Works: full semantics, NULL handling, filter behavior, and edge cases.
- Examples: three production scenarios with sample data and resulting anomalies.
- API: payload example and field notes for creating an Aggregation Comparison check programmatically.