How to Create a Unique Check

Configure a Unique check inside the Authored Check Details modal. For the navigation steps that get you to the modal (selecting the datastore, opening the Checks tab, clicking Add Check), see Authored Check.

Prerequisites

  • At least the Author team permission on the datastore.
  • A container (table or file) already loaded into the datastore.
  • The field (or fields) on which you want to enforce uniqueness, already present on that container.

Configure the Unique Check

Step 1: Set the Rule Type to Unique.

The form simplifies because Unique has no rule-specific Properties section: the rule is fully defined by the fields, filter, and coverage.

Step 2: Select the File (or table) the check should run against.

Step 3: Select one or more Fields.

  • For a single-field Unique check, pick the column whose value must not repeat across rows (for example, customer_id).
  • For a composite-key Unique check, pick two or more columns. The platform evaluates uniqueness on the combination of values, not on each column independently.
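The difference between the two modes can be sketched in plain Python. This is only an illustration of the semantics described above, not the platform's implementation; the helper name duplicate_rows and the sample rows are hypothetical.

```python
from collections import Counter

def duplicate_rows(rows, fields):
    """Return the rows whose combination of `fields` values occurs more than once."""
    # Build one key per row from the selected fields, in the order given.
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return [row for row, key in zip(rows, keys) if counts[key] > 1]

# Hypothetical line-item rows: order_id repeats, but (order_id, line_number) does not.
rows = [
    {"order_id": 1, "line_number": 1},
    {"order_id": 1, "line_number": 2},
    {"order_id": 2, "line_number": 1},
]

single = duplicate_rows(rows, ["order_id"])                    # two rows share order_id 1
composite = duplicate_rows(rows, ["order_id", "line_number"])  # no combination repeats
print(len(single), len(composite))
```

In this sketch, a single-field check on order_id would flag the first two rows, while the composite check on order_id and line_number flags none, which is why a junction or line-item table usually needs the composite form.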

Step 4: (Optional) Set a Filter Clause to scope the check to a subset of records.

The filter is a Spark SQL WHERE expression evaluated before the uniqueness check. Use it to:

  • Limit the check to active or current rows (status = 'active', event_date = current_date()).
  • Work around NULL handling (email IS NOT NULL) when you want SQL-style uniqueness semantics.
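The order of operations (filter first, uniqueness second) can be sketched as follows. The helper flagged_after_filter and the sample rows are hypothetical, and the sketch assumes that rows sharing a NULL value fall into the same group when no filter is applied; the How It Works page describes the platform's actual NULL handling.

```python
from collections import Counter

def flagged_after_filter(rows, field, keep):
    """Apply the filter predicate first, then flag values that repeat among the kept rows."""
    kept = [row for row in rows if keep(row)]
    counts = Counter(row[field] for row in kept)
    return [row for row in kept if counts[row[field]] > 1]

# Hypothetical rows; None stands in for a SQL NULL.
rows = [
    {"email": "a@example.com"},
    {"email": None},
    {"email": None},
    {"email": "b@example.com"},
]

# No filter: in this sketch the two None values land in one group and are flagged.
no_filter = flagged_after_filter(rows, "email", lambda row: True)

# Stand-in for the filter `email IS NOT NULL`: NULL rows never reach the uniqueness check.
not_null = flagged_after_filter(rows, "email", lambda row: row["email"] is not None)
print(len(no_filter), len(not_null))
```

Because the filter removes rows before any grouping happens, filtered-out rows can neither be flagged nor cause another row to be flagged.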

Step 5: (Optional) Adjust Coverage.

The default is 100%, meaning every row that passes the filter must hold a value (or combination of values) that no other row shares. Lower it (for example, to 99.5%) only when a small fraction of duplicates is expected and tolerated.
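As a rough way to reason about the threshold, the sketch below treats coverage as the share of filtered rows whose value occurs exactly once. That formula is an assumption made for illustration, and unique_check_passes is a hypothetical helper; the platform's exact calculation may differ.

```python
from collections import Counter

def unique_check_passes(values, coverage=1.0):
    """Return (passes, observed), where observed is the share of values occurring exactly once."""
    if not values:
        return True, 1.0  # assumption: an empty input passes trivially
    counts = Counter(values)
    singletons = sum(1 for v in values if counts[v] == 1)
    observed = singletons / len(values)
    return observed >= coverage, observed

# 1,000 hypothetical legacy_id values in which one id appears twice.
values = list(range(998)) + [5000, 5000]

strict_ok, observed = unique_check_passes(values, coverage=1.0)
relaxed_ok, _ = unique_check_passes(values, coverage=0.995)
print(strict_ok, relaxed_ok, observed)
```

Under this assumption, 998 of 1,000 rows are unique (99.8%), so the check would fail at the default 100% coverage and pass at 99.5%.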

Step 6: (Optional) Add a Description, Tags, and Additional Metadata for catalog and triage purposes. These do not affect evaluation.

Validate and Save

Step 7: Click Validate at the bottom of the modal.

The platform runs the check against the data without saving. A green confirmation message appears when the rule is well-formed.

Step 8: Click Save to create the check.

The new Unique check appears in the Checks list with the Authored badge. The next Scan operation will evaluate it.

Preview the results before saving

For a richer preview that lists exactly which rows would be flagged, use Dry Run from the check's actions menu. Dry Run returns sample anomalies and source records, which is helpful when tuning the filter or coverage.

Common Variations

The table below summarizes five common Unique check configurations. For end-to-end worked scenarios with sample data, see the Examples page.

| Goal | Fields | Filter | Coverage |
| --- | --- | --- | --- |
| Enforce a primary key on a single identifier | customer_id | (none) | 100% |
| Enforce a composite key on a junction or line-item table | order_id, line_number | (none) | 100% |
| Scope uniqueness to a time window | user_id, event_type | event_date = current_date() | 100% |
| Make Unique behave like SQL UNIQUE (NULLs distinct) | email | email IS NOT NULL | 100% |
| Tolerate a small known fraction of duplicates | legacy_id | (none) | 99.5% |

Related pages:

  • Introduction: formal definition, modes overview, field scope, and general/anomaly properties.
  • How It Works: full semantics, NULL handling, filter behavior, and edge cases.
  • Examples: three production scenarios with sample data and resulting anomalies.
  • API: payload shape for creating a Unique check programmatically.