How to Create a Unique Check
Configure a Unique check inside the Authored Check Details modal. For the navigation steps that get you to the modal (selecting the datastore, opening the Checks tab, clicking Add → Check), see Authored Check.
Prerequisites
- At least the Author team permission on the datastore.
- A container (table or file) already loaded into the datastore.
- The field or fields you want to enforce uniqueness on exist on the container.
Configure the Unique Check
Step 1: Set the Rule Type to Unique.
The form simplifies because Unique has no rule-specific Properties section: the rule is fully defined by the fields, filter, and coverage.
Step 2: Select the File (or table) the check should run against.
Step 3: Select one or more Fields.
- For a single-field Unique check, pick the column that must be unique row-by-row (for example, customer_id).
- For a composite-key Unique check, pick two or more columns. The platform evaluates uniqueness on the combination of values, not on each column independently.
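To make the difference between single-field and composite-key evaluation concrete, here is a minimal Python sketch. It is not the platform's implementation; the helper `duplicate_rows` and the sample `line_items` data are hypothetical and only illustrate grouping on a combination of values.

```python
from collections import Counter

def duplicate_rows(rows, fields):
    """Return rows whose combination of `fields` values occurs more than once.

    Hypothetical helper for illustration only, not a platform API.
    """
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return [row for row, key in zip(rows, keys) if counts[key] > 1]

# Hypothetical line-item data: order 2 has line 1 recorded twice.
line_items = [
    {"order_id": 1, "line_number": 1},
    {"order_id": 1, "line_number": 2},
    {"order_id": 2, "line_number": 1},
    {"order_id": 2, "line_number": 1},
]

# Composite key: only the repeated (order_id, line_number) pair is flagged.
composite_dupes = duplicate_rows(line_items, ["order_id", "line_number"])

# Single field: every row is flagged, because each order_id repeats.
single_dupes = duplicate_rows(line_items, ["order_id"])
```

Selecting only order_id on a line-item table would flag every row, which is why the composite selection is the right fit for junction and line-item containers.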
Step 4: (Optional) Set a Filter Clause to scope the check to a subset of records.
The filter is a Spark SQL WHERE expression evaluated before the uniqueness check. Use it to:
- Limit the check to active or current rows (status = 'active', event_date = current_date()).
- Work around NULL handling (email IS NOT NULL) when you want SQL-style uniqueness semantics.
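The effect of filtering before the uniqueness check can be sketched in Python as follows. This is an illustration under stated assumptions, not the platform's code: the helper `unique_violations` and the `users` data are hypothetical, and the sketch assumes that, without a filter, two NULL values are treated as equal to each other (the behavior the email IS NOT NULL workaround implies; see How It Works for the actual semantics).

```python
from collections import Counter

def unique_violations(rows, field, predicate=lambda row: True):
    """Apply the filter first, then flag rows whose `field` value repeats.

    Hypothetical helper; None stands in for NULL and compares equal to itself.
    """
    scoped = [row for row in rows if predicate(row)]
    counts = Counter(row[field] for row in scoped)
    return [row for row in scoped if counts[row[field]] > 1]

# Hypothetical data: two users have no email on record.
users = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": None},
    {"id": 4, "email": "b@example.com"},
]

# No filter: the two NULL emails collide and both rows are flagged.
unfiltered = unique_violations(users, "email")

# Equivalent of the filter `email IS NOT NULL`: NULL rows never reach the check.
filtered = unique_violations(users, "email", lambda row: row["email"] is not None)
```

Because the filter runs first, rows it excludes can neither be flagged nor cause other rows to be flagged.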
Step 5: (Optional) Adjust Coverage.
The default is 100%, meaning every filtered row must be part of a singleton group. Lower it (for example, to 99.5%) only when a small fraction of duplicates is expected and tolerated.
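The arithmetic behind a coverage threshold can be sketched as below. This is an assumption-laden illustration: it reads coverage as the fraction of filtered rows that belong to a singleton group, which follows the wording above, but the exact formula the platform uses is not confirmed here. The helpers `singleton_fraction` and `passes` and the sample `legacy_ids` are hypothetical.

```python
from collections import Counter

def singleton_fraction(values):
    """Fraction of values that occur exactly once (assumed coverage measure)."""
    if not values:
        return 1.0  # assumption: an empty input has nothing to violate
    counts = Counter(values)
    singletons = sum(1 for v in values if counts[v] == 1)
    return singletons / len(values)

def passes(values, coverage):
    """True when the singleton fraction meets the coverage threshold (0.0 to 1.0)."""
    return singleton_fraction(values) >= coverage

# Hypothetical column: 1,000 rows, one value duplicated once (998 singletons).
legacy_ids = list(range(998)) + [9999, 9999]

strict = passes(legacy_ids, 1.0)     # 100% coverage
relaxed = passes(legacy_ids, 0.995)  # 99.5% coverage
```

In this sketch, 998 of 1,000 rows are singletons (99.8%), so the column fails at 100% coverage and passes at 99.5%.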
Step 6: (Optional) Add a Description, Tags, and Additional Metadata for catalog and triage purposes. These do not affect evaluation.
Validate and Save
Step 7: Click Validate at the bottom of the modal.
The platform runs the check against the data without saving. A green confirmation message appears when the rule is well-formed.
Step 8: Click Save to create the check.
The new Unique check appears in the Checks list with the Authored badge. The next Scan operation will evaluate it.
Preview the results before saving
For a richer preview that lists exactly which rows would be flagged, use Dry Run from the check's actions menu. Dry Run returns sample anomalies and source records, which is helpful when tuning the filter or coverage.
Common Variations
The table below summarizes the most common Unique check configurations. For end-to-end worked scenarios with sample data, see the Examples page.
| Goal | Fields | Filter | Coverage |
|---|---|---|---|
| Enforce a primary key on a single identifier | customer_id | (none) | 100% |
| Enforce a composite key on a junction or line-item table | order_id, line_number | (none) | 100% |
| Scope uniqueness to a time window | user_id, event_type | event_date = current_date() | 100% |
| Make Unique behave like SQL UNIQUE (NULLs distinct) | email | email IS NOT NULL | 100% |
| Tolerate a small known fraction of duplicates | legacy_id | (none) | 99.5% |
Related
- Introduction: formal definition, modes overview, field scope, and general/anomaly properties.
- How It Works: full semantics, NULL handling, filter behavior, and edge cases.
- Examples: three production scenarios with sample data and resulting anomalies.
- API: payload shape for creating a Unique check programmatically.