Data Diff Check API
The Data Diff check is created and managed through the standard Quality Checks API by setting rule to dataDiff and listing the compared fields under fields. The reference container, Row Identifiers, Passthrough Fields, Comparators, and diff_change_types are all configured through the properties object.
Tip
For complete API documentation, including request and response schemas, visit the API docs.
Endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /api/quality-checks | Create a new Data Diff check. |
| GET | /api/quality-checks/{id} | Retrieve a Data Diff check by ID. |
| PUT | /api/quality-checks/{id} | Update an existing Data Diff check. |
| DELETE | /api/quality-checks/{id} | Archive a Data Diff check (soft delete). The check stops being evaluated by Scans and can be restored from the archive view. |
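The four endpoints share one collection path (POST) and one item path (GET, PUT, DELETE). The sketch below builds those requests with Python's standard library. It is a minimal sketch under stated assumptions: the host in BASE_URL is a placeholder, the bearer-token Authorization header is an assumed auth scheme, and build_request is a hypothetical helper name. Check the API docs for the real host and authentication method.

```python
import json
import urllib.request

# Placeholder host (assumption): replace with your instance's URL.
BASE_URL = "https://example.invalid"
COLLECTION_PATH = "/api/quality-checks"


def build_request(method, check_id=None, payload=None, token="YOUR_API_TOKEN"):
    """Build, without sending, a request for one of the Data Diff endpoints."""
    method = method.upper()
    if method == "POST":
        if check_id is not None:
            raise ValueError("POST targets the collection path; do not pass check_id")
        path = COLLECTION_PATH
    elif method in ("GET", "PUT", "DELETE"):
        if check_id is None:
            raise ValueError(f"{method} requires check_id")
        # DELETE on this path archives (soft-deletes) the check.
        path = f"{COLLECTION_PATH}/{check_id}"
    else:
        raise ValueError(f"unsupported method: {method}")

    # Assumed auth scheme: bearer token in the Authorization header.
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Usage: build a request, then send it with urllib.request.urlopen(req).
req = build_request("GET", check_id=42)
print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the path and method rules testable without network access.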
What PUT can change
Editable: description, fields, filter, tags, additional_metadata, anomaly_message_field, status, owner_id, default_anomaly_assignee_id, and the properties keys ref_datastore_id, ref_container_id, id_field_names, passthrough_field_names, diff_change_types, numeric_comparator, duration_comparator, string_comparator.
Immutable: rule, container_id, template_id. To change any of these, archive the check with DELETE and create a new one.
Permission: Author team permission (or above) on the target container's team for POST, PUT, and DELETE; Reporter team permission (or above) for GET.
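A client-side pre-flight check on PUT bodies can catch attempts to change immutable fields before the request is sent. This is a minimal sketch: check_put_payload is a hypothetical helper whose key lists are copied from the editable and immutable lists above, and it assumes (not confirmed here) that resending an immutable field with its current value is harmless.

```python
# Keys PUT may change, copied from the "What PUT can change" list.
EDITABLE_TOP_LEVEL = {
    "description", "fields", "filter", "tags", "additional_metadata",
    "anomaly_message_field", "status", "owner_id",
    "default_anomaly_assignee_id", "properties",
}
EDITABLE_PROPERTIES = {
    "ref_datastore_id", "ref_container_id", "id_field_names",
    "passthrough_field_names", "diff_change_types",
    "numeric_comparator", "duration_comparator", "string_comparator",
}
IMMUTABLE = {"rule", "container_id", "template_id"}


def check_put_payload(existing, update):
    """Return a list of problems in a PUT body; an empty list means none found."""
    problems = []
    for key, value in update.items():
        if key in IMMUTABLE:
            # Assumption: echoing an immutable key with its current value is harmless.
            if existing.get(key) != value:
                problems.append(f"{key} is immutable: archive the check and create a new one")
        elif key not in EDITABLE_TOP_LEVEL:
            problems.append(f"{key} is not an editable field")
    for key in update.get("properties") or {}:
        if key not in EDITABLE_PROPERTIES:
            problems.append(f"properties.{key} is not an editable property")
    return problems


existing = {"rule": "dataDiff", "container_id": 145, "template_id": None}
print(check_put_payload(existing, {"container_id": 146}))
```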
Payload Example
The following POST /api/quality-checks payload creates a Data Diff check that compares N_NATIONKEY and N_NATIONNAME between NATION (the target) and NATION_BACKUP (the reference), matching rows on N_NATIONKEY. It sets diff_change_types to ["removed", "changed"] so that unmatched reference rows are not reported as added anomalies, a typical choice when the reference is a superset of the target (such as a long-lived backup).
```json
{
  "description": "Ensure NATION matches NATION_BACKUP on N_NATIONKEY and N_NATIONNAME",
  "rule": "dataDiff",
  "fields": ["N_NATIONKEY", "N_NATIONNAME"],
  "container_id": 145,
  "filter": null,
  "properties": {
    "ref_datastore_id": 22,
    "ref_container_id": 803,
    "id_field_names": ["N_NATIONKEY"],
    "passthrough_field_names": [],
    "diff_change_types": ["removed", "changed"]
  },
  "tags": ["replication"],
  "additional_metadata": {"jira": "DATA-1234"},
  "anomaly_message_field": null,
  "template_id": null,
  "status": "Active",
  "owner_id": 7,
  "default_anomaly_assignee_id": 12
}
```
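To see why ["removed", "changed"] suppresses reference-only rows, the toy model below diffs a few NATION-style rows. It is illustrative only and not the product's implementation: it follows the payload description in treating reference-only rows as added, assumes target-only rows are removed, compares only the listed fields, and falls back to a symmetric set difference over the compared fields when no Row Identifiers are given. The helper name data_diff and the sample rows are hypothetical.

```python
def data_diff(target_rows, reference_rows, fields, id_fields=(),
              diff_change_types=("added", "removed", "changed")):
    """Toy keyed diff returning sorted (status, key) pairs that would raise anomalies."""
    # With Row Identifiers, match on them; without, key on all compared fields,
    # which degrades to a symmetric set difference (added/removed only).
    key_fields = list(id_fields) or list(fields)

    def key(row):
        return tuple(row[f] for f in key_fields)

    target = {key(r): r for r in target_rows}
    reference = {key(r): r for r in reference_rows}

    diffs = []
    for k in reference.keys() - target.keys():
        diffs.append(("added", k))    # reference-only row (per the payload description)
    for k in target.keys() - reference.keys():
        diffs.append(("removed", k))  # target-only row (assumed)
    for k in target.keys() & reference.keys():
        if any(target[k][f] != reference[k][f] for f in fields):
            diffs.append(("changed", k))  # same key, different compared values

    # diff_change_types filters which statuses become anomalies.
    allowed = set(diff_change_types)
    return sorted(d for d in diffs if d[0] in allowed)


# Hypothetical sample rows.
target = [
    {"N_NATIONKEY": 0, "N_NATIONNAME": "ALGERIA"},
    {"N_NATIONKEY": 1, "N_NATIONNAME": "ARGENTINA"},
    {"N_NATIONKEY": 2, "N_NATIONNAME": "BRAZIL"},
]
reference = [
    {"N_NATIONKEY": 0, "N_NATIONNAME": "ALGERIA"},
    {"N_NATIONKEY": 1, "N_NATIONNAME": "Argentina"},
    {"N_NATIONKEY": 3, "N_NATIONNAME": "CANADA"},
]
fields = ["N_NATIONKEY", "N_NATIONNAME"]

print(data_diff(target, reference, fields, ["N_NATIONKEY"]))
# [('added', (3,)), ('changed', (1,)), ('removed', (2,))]
print(data_diff(target, reference, fields, ["N_NATIONKEY"], ["removed", "changed"]))
# [('changed', (1,)), ('removed', (2,))]
```

With the restricted list, the reference-only CANADA row no longer produces an anomaly, while the changed and target-only rows still do.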
Field Notes
| Field | Required | Notes |
|---|---|---|
| description | Yes | Free-text description shown in the UI. |
| rule | Yes | Must be "dataDiff". |
| fields | Yes | Array of field names to compare between target and reference. Order does not affect evaluation. |
| container_id | Yes | ID of the target container (the dataset the check runs on). |
| filter | No | Spark SQL WHERE expression applied to the target container before matching. Send null for no filter. The reference container is always read in full. |
| properties.ref_datastore_id | Yes | ID of the datastore that holds the reference container. |
| properties.ref_container_id | Yes | ID of the reference container (table, view, or file) to compare against. |
| properties.id_field_names | No | Array of field names that form the compound key used to match target rows to reference rows. Required to produce changed diffs and to enable the Comparison Source Records view. Omit it (or send []) to fall back to a symmetric set difference that produces only added and removed diffs. |
| properties.passthrough_field_names | No | Array of extra field names carried into the source-records output for context. Passthrough fields appear alongside the diffed fields but never cause an anomaly on their own. |
| properties.diff_change_types | No | Subset of ["added", "removed", "changed"] that restricts which diff statuses produce an anomaly. Defaults to all three when omitted. An empty list is rejected with HTTP 422; at least one status must be selected. Sending this property on an isReplicaOf check is also rejected. See How It Works → Restricting Anomalies by Status. |
| properties.numeric_comparator | No | Numeric Comparator tolerance object. See How It Works → Comparators. |
| properties.duration_comparator | No | Duration Comparator tolerance object. See How It Works → Comparators. |
| properties.string_comparator | No | String Comparator tolerance object. See How It Works → Comparators. |
| tags | No | List of tag names applied to the check for filtering and organization. |
| additional_metadata | No | Free-form key-value pairs (typically links to catalog entries, tickets, or governance records). |
| anomaly_message_field | No | Not applicable to Data Diff. Data Diff emits only Shape Anomalies, which use a fixed message template, so this field is silently ignored during evaluation. Send null. |
| template_id | No | ID of a Check Template to associate the check with, or null if not using a template. |
| status | No | "Active" (default) or "Draft". Draft checks are not evaluated by Scans. |
| owner_id | No | ID of the user who owns the check. Defaults to the user creating the check when omitted. |
| default_anomaly_assignee_id | No | ID of the user automatically assigned to anomalies produced by the check. When omitted, anomalies are created unassigned and must be triaged manually. |
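The constraints in the Field Notes table can be mirrored in a pre-flight validator so malformed create payloads fail before they reach the API. This is a minimal sketch: validate_data_diff_payload is a hypothetical helper that covers only the rules listed in the table, and the server remains the authority (for example, only it knows whether the referenced container and datastore IDs exist).

```python
ALLOWED_CHANGE_TYPES = {"added", "removed", "changed"}
REQUIRED_FIELDS = ("description", "rule", "fields", "container_id")
REQUIRED_PROPERTIES = ("ref_datastore_id", "ref_container_id")


def validate_data_diff_payload(payload):
    """Return a list of problems with a create payload; empty means none found."""
    problems = []
    for key in REQUIRED_FIELDS:
        if key not in payload:
            problems.append(f"missing required field: {key}")
    if payload.get("rule") != "dataDiff":
        problems.append('rule must be "dataDiff"')

    props = payload.get("properties") or {}
    for key in REQUIRED_PROPERTIES:
        if key not in props:
            problems.append(f"missing required property: properties.{key}")

    # Omitting diff_change_types means all three statuses; an empty list is a 422.
    if "diff_change_types" in props:
        types = props["diff_change_types"]
        if not types:
            problems.append("diff_change_types must not be empty (HTTP 422)")
        else:
            unknown = sorted(set(types) - ALLOWED_CHANGE_TYPES)
            if unknown:
                problems.append(f"unknown diff_change_types: {unknown}")

    # status defaults to "Active" when omitted.
    if payload.get("status", "Active") not in ("Active", "Draft"):
        problems.append('status must be "Active" or "Draft"')
    if payload.get("anomaly_message_field") is not None:
        problems.append("anomaly_message_field is ignored by Data Diff; send null")
    return problems


example = {
    "description": "Ensure NATION matches NATION_BACKUP on N_NATIONKEY and N_NATIONNAME",
    "rule": "dataDiff",
    "fields": ["N_NATIONKEY", "N_NATIONNAME"],
    "container_id": 145,
    "properties": {
        "ref_datastore_id": 22,
        "ref_container_id": 803,
        "id_field_names": ["N_NATIONKEY"],
        "diff_change_types": ["removed", "changed"],
    },
}
print(validate_data_diff_payload(example))
```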
Related
- Introduction: formal definition, field scope, and general/anomaly properties.
- How It Works: full semantics, Row Identifiers, Comparators, and edge cases.
- Examples: three production scenarios with sample data and resulting anomalies.
- FAQ: short answers to the most frequent questions.