Skip to content

Aggregation Comparison Check FAQ

Common questions about how the Aggregation Comparison check evaluates two aggregates, how it handles NULL or non-finite results, how filters scope each side, and how anomalies are reported.

Behavior

Which comparison operators are supported?

Five operators are accepted: lt (less than), lte (less than or equal to), eq (equal to), gte (greater than or equal to), and gt (greater than). The comparison is always applied as target <op> reference.

Can the reference container live in a different datastore?

Yes. Set ref_datastore_id in the payload to the datastore holding the reference container. When the reference container is in the same datastore as the target, omit ref_datastore_id (or send null) and only set ref_container_id.

Can the target and reference be the same container?

Yes. Set ref_container_id to the same ID as container_id and use the Right Reference panel to define the reference aggregation. This is useful when asserting an invariant between two aggregates on the same dataset (for example, SUM(net) <= SUM(gross)).

What happens when an aggregation returns NULL?

If the target or reference aggregation evaluates to NULL, NaN, or an infinite value, that side renders as null/NaN in the anomaly message and the comparison fails. A common cause is an empty filtered set; lift the filter or wrap the expression in COALESCE so the aggregate returns a defined value.

Do the two aggregations need to produce comparable types?

Yes. Both expressions are evaluated as numeric values for the comparison. Wrap non-numeric outputs (for example, MIN(some_date)) so they reduce to a numeric form that the operator can compare.

Anomaly Reporting

What does the anomaly message look like?

When both aggregates evaluate successfully:

The target expression '<expression>' evaluates to <target_value> which is not <comparison> the reference expression '<ref_expression>' value of <reference_value>

When either aggregate fails to evaluate, that side renders as null/NaN and the same template is used.

When the target filter is set, the message ends with [filter: <expression>]. When the reference filter is set, the message ends with [reference filter: <expression>]. Both suffixes can appear together.

Does Aggregation Comparison emit Record Anomalies?

No. The rule is Shape-only. It evaluates one boolean condition between two aggregate values and reports a single Shape Anomaly when the relationship does not hold. There are no per-row anomalies.

Does Custom Anomaly Description work for Aggregation Comparison?

No. Custom Anomaly Description (and the anomaly_message_field payload field) applies only to Record Anomalies. Aggregation Comparison emits Shape Anomalies, which always use the fixed template described above.

Why does the message show null/NaN instead of a number?

That side's aggregation returned NULL, NaN, or an infinite value. The most common causes are an empty filtered set, a column that is null for every row in scope, or an expression that produced an undefined result (for example, division by zero).

Configuration

Can I lower the coverage on an Aggregation Comparison check?

No. The rule does not use a coverage threshold. The coverage field on the API payload is ignored. Either the comparison holds for the two aggregates or it does not.

Can I change the comparison operator on an existing check?

Yes. A PUT to /api/quality-checks/{id} can update properties.comparison along with both expressions and both filters. The rule type, the target container, and the associated Check Template stay immutable. See the API page for the editable/immutable matrix.

How do I assert that two row counts must match?

Use COUNT(*) as both the target and reference aggregations and set the comparison to eq. Apply filters on each side independently to scope each count to the same logical set of rows.

Can I reference Check Variables inside the aggregation expressions?

Yes. Both expression and ref_expression accept Check Variables (for example, {{ batch_id }}). The variables are resolved before the aggregation runs.

How does Aggregation Comparison differ from Metric?

Aggregation Comparison compares two aggregates that must satisfy a relationship with each other. Metric pins a single aggregate to a fixed boundary (for example, between 0 and 1). Use Aggregation Comparison for cross-source reconciliation; use Metric when the bound is a constant.

How does Aggregation Comparison differ from Data Diff?

Data Diff performs a row-level diff between two containers and reports each mismatched row. Aggregation Comparison only checks an aggregate-level relationship. Choose Data Diff when you need to know which rows differ; choose Aggregation Comparison when an aggregate-level reconciliation is enough.

  • Introduction: formal definition, field scope, and general/anomaly properties.
  • How It Works: full semantics, NULL handling, filter behavior, and edge cases.
  • Examples: three production scenarios with sample data and resulting anomalies.
  • API: payload example and field notes for creating an Aggregation Comparison check programmatically.