Data Diff Check API

The Data Diff check is created and managed through the standard Quality Checks API by setting rule to dataDiff and listing the compared fields under fields. The reference container, Row Identifiers, Passthrough Fields, Comparators, and diff_change_types are all configured through the properties object.

Tip

For complete API documentation, including request and response schemas, visit the API docs.

Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/quality-checks | Create a new Data Diff check. |
| GET | /api/quality-checks/{id} | Retrieve a Data Diff check by ID. |
| PUT | /api/quality-checks/{id} | Update an existing Data Diff check. |
| DELETE | /api/quality-checks/{id} | Archive a Data Diff check (soft delete). The check stops being evaluated by Scans and can be restored from the archive view. |
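
As a rough illustration of how the four endpoints map onto HTTP requests, the sketch below builds (but does not send) a request for each method using Python's standard library. The host name, the `build_request` helper, and the bearer-token header are assumptions made for the example; consult the API docs for the actual authentication scheme.

```python
import json
import urllib.request

BASE_URL = "https://your-instance.example.com"  # hypothetical host
COLLECTION_PATH = "/api/quality-checks"
METHODS = {"POST", "GET", "PUT", "DELETE"}


def build_request(method, check_id=None, payload=None, token="YOUR_TOKEN"):
    """Build, without sending, a request for one of the four endpoints."""
    if method not in METHODS:
        raise ValueError(f"unsupported method: {method}")
    if method == "POST":
        # Creation targets the collection path; no ID exists yet.
        url = BASE_URL + COLLECTION_PATH
    elif check_id is None:
        raise ValueError(f"{method} requires a check ID")
    else:
        url = f"{BASE_URL}{COLLECTION_PATH}/{check_id}"

    # Assumed auth scheme: a bearer token. Not confirmed by this page.
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


archive = build_request("DELETE", check_id=42)
print(archive.get_method(), archive.full_url)
```

A request built this way could be sent with `urllib.request.urlopen(archive)`; building and sending are kept separate here so the mapping can be inspected without network access.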

What PUT can change

Editable: description, fields, filter, tags, additional_metadata, anomaly_message_field, status, owner_id, default_anomaly_assignee_id, and the properties keys ref_datastore_id, ref_container_id, id_field_names, passthrough_field_names, diff_change_types, numeric_comparator, duration_comparator, string_comparator.

Immutable: rule, container_id, template_id. To change any of these, delete the check and create a new one.

Permission: Author team permission (or above) on the target container's team for POST, PUT, and DELETE; Reporter team permission (or above) for GET.
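
The editable and immutable lists above can be checked client-side before a PUT is sent. The following is a minimal sketch, not part of the API: `check_update` is a hypothetical helper, and it assumes a PUT body may repeat an immutable field as long as its value is unchanged, which this page does not confirm.

```python
EDITABLE_TOP_LEVEL = {
    "description", "fields", "filter", "tags", "additional_metadata",
    "anomaly_message_field", "status", "owner_id",
    "default_anomaly_assignee_id",
}
EDITABLE_PROPERTIES = {
    "ref_datastore_id", "ref_container_id", "id_field_names",
    "passthrough_field_names", "diff_change_types", "numeric_comparator",
    "duration_comparator", "string_comparator",
}
IMMUTABLE = {"rule", "container_id", "template_id"}


def check_update(current, update):
    """Return a list of problems with a proposed PUT body (empty if none)."""
    problems = []
    for key, value in update.items():
        if key in IMMUTABLE:
            # Assumption: resending an unchanged immutable value is harmless.
            if current.get(key) != value:
                problems.append(
                    f"{key} is immutable; delete the check and create a new one"
                )
        elif key == "properties":
            for prop in value or {}:
                if prop not in EDITABLE_PROPERTIES:
                    problems.append(f"properties.{prop} is not listed as editable")
        elif key not in EDITABLE_TOP_LEVEL:
            problems.append(f"{key} is not listed as editable")
    return problems


current = {"rule": "dataDiff", "container_id": 145, "template_id": None}
print(check_update(current, {"container_id": 146}))
```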

Payload Example

The following POST /api/quality-checks payload creates a Data Diff check that compares N_NATIONKEY and N_NATIONNAME between NATION (the target) and NATION_BACKUP (the reference), matching rows by N_NATIONKEY. It sets diff_change_types to ["removed", "changed"] so that unmatched reference rows are not reported as added anomalies. This is a typical choice when the reference is a superset of the target, such as a long-lived backup.

{
    "description": "Ensure NATION matches NATION_BACKUP on N_NATIONKEY and N_NATIONNAME",
    "rule": "dataDiff",
    "fields": ["N_NATIONKEY", "N_NATIONNAME"],
    "container_id": 145,
    "filter": null,
    "properties": {
        "ref_datastore_id": 22,
        "ref_container_id": 803,
        "id_field_names": ["N_NATIONKEY"],
        "passthrough_field_names": [],
        "diff_change_types": ["removed", "changed"]
    },
    "tags": ["replication"],
    "additional_metadata": {"jira": "DATA-1234"},
    "anomaly_message_field": null,
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 12
}
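
A payload like the one above could also be assembled programmatically. The sketch below uses a hypothetical `build_data_diff_payload` helper (not part of any client library) that fixes rule to "dataDiff", nests the reference settings under properties, and leaves out optional keys that were not supplied so the documented defaults apply.

```python
import json


def build_data_diff_payload(description, container_id, fields,
                            ref_datastore_id, ref_container_id,
                            id_field_names=None, diff_change_types=None,
                            **optional):
    """Assemble a Data Diff check payload as a plain dict."""
    properties = {
        "ref_datastore_id": ref_datastore_id,
        "ref_container_id": ref_container_id,
    }
    # Optional properties are added only when the caller supplies them.
    if id_field_names is not None:
        properties["id_field_names"] = list(id_field_names)
    if diff_change_types is not None:
        properties["diff_change_types"] = list(diff_change_types)

    payload = {
        "description": description,
        "rule": "dataDiff",  # fixed for Data Diff checks
        "fields": list(fields),
        "container_id": container_id,
        "properties": properties,
    }
    # Remaining top-level fields (tags, status, owner_id, ...) pass through.
    payload.update(optional)
    return payload


payload = build_data_diff_payload(
    description="Ensure NATION matches NATION_BACKUP on N_NATIONKEY and N_NATIONNAME",
    container_id=145,
    fields=["N_NATIONKEY", "N_NATIONNAME"],
    ref_datastore_id=22,
    ref_container_id=803,
    id_field_names=["N_NATIONKEY"],
    diff_change_types=["removed", "changed"],
    tags=["replication"],
    status="Active",
)
body = json.dumps(payload, indent=4)
print(body)
```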

Field Notes

| Field | Required | Notes |
| --- | --- | --- |
| description | Yes | Free-text description shown in the UI. |
| rule | Yes | Must be "dataDiff". |
| fields | Yes | Array of field names to compare between target and reference. Order does not affect evaluation. |
| container_id | Yes | ID of the target container (the dataset the check runs on). |
| filter | No | Spark SQL WHERE expression applied to the target container before matching. Send null for no filter. The reference container is always read in full. |
| properties.ref_datastore_id | Yes | ID of the datastore that holds the reference container. |
| properties.ref_container_id | Yes | ID of the reference container (table, view, or file) to compare against. |
| properties.id_field_names | No | Array of field names that form the compound key used to match target rows to reference rows. Required to produce changed diffs and to enable the Comparison Source Records view. Omit (or []) to fall back to a symmetrical set difference that produces only added/removed. |
| properties.passthrough_field_names | No | Array of extra field names carried into the source-records output for context. Passthrough fields appear alongside diffed fields but are never themselves a reason for the anomaly to fire. |
| properties.diff_change_types | No | Subset of ["added", "removed", "changed"] that restricts which diff statuses produce an anomaly. Defaults to all three when omitted. An empty list is rejected with HTTP 422; at least one status must be selected. Sending this property on an isReplicaOf check is also rejected. See How It Works → Restricting Anomalies by Status. |
| properties.numeric_comparator | No | Numeric Comparator tolerance object. See How It Works → Comparators. |
| properties.duration_comparator | No | Duration Comparator tolerance object. See How It Works → Comparators. |
| properties.string_comparator | No | String Comparator tolerance object. See How It Works → Comparators. |
| tags | No | List of tag names applied to the check for filtering and organization. |
| additional_metadata | No | Free-form key-value pairs (typically links to catalog, tickets, governance records). |
| anomaly_message_field | No | Not applicable to Data Diff. Data Diff emits only Shape Anomalies, which use a fixed message template, so this field is silently ignored at evaluation. Send null. |
| template_id | No | ID of a Check Template to associate the check with. null if not using a template. |
| status | No | "Active" (default) or "Draft". Draft checks are not evaluated by Scans. |
| owner_id | No | ID of the user who owns the check. Defaults to the user creating the check when omitted. |
| default_anomaly_assignee_id | No | ID of the user automatically assigned to anomalies produced by the check. When omitted, anomalies are created unassigned and must be triaged manually. |
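
Several of the rules in the field notes can be checked before a request is sent. The sketch below is a hypothetical client-side pre-check, not the server's validation logic: `validate_payload` and its messages are illustrative names, and the warning about "changed" without Row Identifiers is an inference from the id_field_names note rather than documented API behavior.

```python
REQUIRED_TOP_LEVEL = ("description", "rule", "fields", "container_id")
REQUIRED_PROPERTIES = ("ref_datastore_id", "ref_container_id")
CHANGE_TYPES = {"added", "removed", "changed"}
STATUSES = {"Active", "Draft"}


def validate_payload(payload):
    """Return (errors, warnings) for a Data Diff check payload."""
    errors, warnings = [], []
    props = payload.get("properties") or {}

    for key in REQUIRED_TOP_LEVEL:
        if payload.get(key) is None:
            errors.append(f"missing required field: {key}")
    for key in REQUIRED_PROPERTIES:
        if props.get(key) is None:
            errors.append(f"missing required field: properties.{key}")

    rule = payload.get("rule")
    if rule is not None and rule != "dataDiff":
        errors.append('rule must be "dataDiff"')

    status = payload.get("status")
    if status is not None and status not in STATUSES:
        errors.append('status must be "Active" or "Draft"')

    types = props.get("diff_change_types")
    if types is not None:
        if len(types) == 0:
            # The field notes say the API rejects an empty list with HTTP 422.
            errors.append("diff_change_types must contain at least one status")
        elif not set(types) <= CHANGE_TYPES:
            errors.append("diff_change_types contains an unknown status")

    # Inference: without id_field_names only added/removed are produced,
    # so a check restricted to "changed" could never raise an anomaly.
    effective = set(types) if types else CHANGE_TYPES
    if not props.get("id_field_names") and effective == {"changed"}:
        warnings.append("'changed' diffs require id_field_names")

    return errors, warnings


example = {
    "description": "Ensure NATION matches NATION_BACKUP",
    "rule": "dataDiff",
    "fields": ["N_NATIONKEY", "N_NATIONNAME"],
    "container_id": 145,
    "properties": {
        "ref_datastore_id": 22,
        "ref_container_id": 803,
        "id_field_names": ["N_NATIONKEY"],
        "diff_change_types": ["removed", "changed"],
    },
    "status": "Active",
}
print(validate_payload(example))
```
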
  • Introduction: formal definition, field scope, and general/anomaly properties.
  • How It Works: full semantics, Row Identifiers, Comparators, and edge cases.
  • Examples: three production scenarios with sample data and resulting anomalies.
  • FAQ: short answers to the most frequent questions.