Unique Check Examples
Three real-world scenarios that show how the Unique check is typically used in production: enforcing a primary key on a single field, enforcing a composite key on a line-item table, and scoping uniqueness to a time window with a filter clause.
The situation: Your customers table holds one row per customer. The customer_id column must be unique across the whole table. Duplicate IDs would break downstream joins in the data warehouse.
Check configuration
| Field | Value |
|---|---|
| Rule | Unique |
| Fields | customer_id |
| Filter | (none) |
| Coverage | 100% |
| Custom Anomaly Description | Off |
| Status | Active |
| Owner | (check creator) |
| Anomaly Assignee | (data-quality engineer) |
| Tags | pii, critical |
| Additional Metadata | jira: DATA-1234 |
| Description | Customer ID must be unique across the customers table |
Payload
{
"description": "Customer ID must be unique across the customers table",
"rule": "unique",
"fields": ["customer_id"],
"container_id": 145,
"coverage": 1,
"filter": null,
"properties": null,
"tags": ["pii", "critical"],
"additional_metadata": {"jira": "DATA-1234"},
"anomaly_message_field": null,
"template_id": null,
"status": "Active",
"owner_id": 7,
"default_anomaly_assignee_id": 12
}
Sample Data
| customer_id | name | |
|---|---|---|
| 1001 | Alice Cohen | alice@example.com |
| 1002 | Ben Dias | ben@example.com |
| 1001 | Alice C. | alice.c@example.com |
| 1003 | Cara Mota | cara@example.com |
What gets flagged
customer_id = 1001 repeats on rows 1 and 3, so both rows are reported as part of the duplicate group.
Shape Anomaly
For the field customer_id, 50.000% of 4 records (2) contain duplicate values
Flowchart
graph TD
A["No filter, evaluate all rows"] --> B["Group rows by customer_id"]
B --> C{"Any customer_id<br/>appears more than once?"}
C -->|No| D["All rows pass"]
C -->|Yes| E["Flag every row in the duplicate group.<br/>customer_id 1001 appears twice, both flagged."]
Equivalent SQL
The situation: Your order_items table holds the individual lines of each customer order. The combination of (order_id, line_number) must be unique. order_id legitimately repeats (one row per line of the same order) and line_number legitimately repeats (every order has a line 1), but the pair must be unique on each row.
Check configuration
| Field | Value |
|---|---|
| Rule | Unique |
| Fields | order_id, line_number |
| Filter | (none) |
| Coverage | 100% |
| Custom Anomaly Description | Off |
| Status | Active |
| Owner | (check creator) |
| Anomaly Assignee | (orders-domain owner) |
| Tags | primary-key, orders |
| Additional Metadata | jira: DATA-2045 |
| Description | Each (order_id, line_number) pair must be unique in order_items |
Payload
{
"description": "Each (order_id, line_number) pair must be unique in order_items",
"rule": "unique",
"fields": ["order_id", "line_number"],
"container_id": 212,
"coverage": 1,
"filter": null,
"properties": null,
"tags": ["primary-key", "orders"],
"additional_metadata": {"jira": "DATA-2045"},
"anomaly_message_field": null,
"template_id": null,
"status": "Active",
"owner_id": 7,
"default_anomaly_assignee_id": 18
}
Sample Data
| order_id | line_number | sku | quantity |
|---|---|---|---|
| 5001 | 1 | SKU-A | 2 |
| 5001 | 2 | SKU-B | 1 |
| 5001 | 1 | SKU-C | 3 |
| 5002 | 1 | SKU-A | 1 |
What gets flagged
order_id = 5001 legitimately repeats on rows 1, 2, and 3 (a 3-line order). line_number = 1 legitimately repeats on rows 1, 3, and 4 (every order has a line 1). The violation is the combination (5001, 1), which appears twice on rows 1 and 3, so both rows are reported as part of the duplicate group.
Shape Anomaly
For the field order_id, line_number, 50.000% of 4 records (2) contain duplicate values
Flowchart
graph TD
A["No filter, evaluate all rows"] --> B["Group rows by the order_id, line_number tuple"]
B --> C{"Any tuple<br/>appears more than once?"}
C -->|No| D["All rows pass"]
C -->|Yes| E["Flag every row in the duplicate group.<br/>(5001, 1) appears twice, both flagged."]
Equivalent SQL
-- Rows the Unique check would flag for the order_items table.
SELECT oi.*
FROM order_items oi
INNER JOIN (
SELECT order_id, line_number
FROM order_items
GROUP BY order_id, line_number
HAVING COUNT(*) > 1
) dup ON oi.order_id = dup.order_id
AND oi.line_number = dup.line_number
ORDER BY oi.order_id, oi.line_number;
The situation: Your events table receives a daily feed of user activity. Within a single day, each (user_id, event_type) pair must appear only once, since an idempotent ingestion job is supposed to guarantee no same-day duplicates. The same pair is allowed to repeat across days, so a date-scoped uniqueness check is what you want.
Check configuration
| Field | Value |
|---|---|
| Rule | Unique |
| Fields | user_id, event_type |
| Filter | event_date = current_date() |
| Coverage | 100% |
| Custom Anomaly Description | Off |
| Status | Active |
| Owner | (check creator) |
| Anomaly Assignee | (ingestion on-call) |
| Tags | ingestion, idempotency |
| Additional Metadata | jira: DATA-3071 |
| Description | Within a single day, each (user_id, event_type) pair must be unique |
Payload
{
"description": "Within a single day, each (user_id, event_type) pair must be unique",
"rule": "unique",
"fields": ["user_id", "event_type"],
"container_id": 318,
"coverage": 1,
"filter": "event_date = current_date()",
"properties": null,
"tags": ["ingestion", "idempotency"],
"additional_metadata": {"jira": "DATA-3071"},
"anomaly_message_field": null,
"template_id": null,
"status": "Active",
"owner_id": 7,
"default_anomaly_assignee_id": 24
}
Why the filter matters
The filter runs before the uniqueness evaluation. Each scan only groups and compares rows from today, so cross-day repeats (which are expected) are intentionally ignored. The check fires only when the ingestion job produces a genuine same-day duplicate.
Sample Data (filtered to event_date = current_date())
| user_id | event_type | event_date | event_id |
|---|---|---|---|
| u-77 | login | today | e-001 |
| u-77 | checkout | today | e-002 |
| u-77 | login | today | e-003 |
| u-88 | login | today | e-004 |
What gets flagged
(u-77, login) appears twice on rows 1 and 3, so both rows are reported as part of the duplicate group. A separate (u-77, login) recorded yesterday would not affect today's scan because the filter scopes evaluation to the current day.
Shape Anomaly
For the field user_id, event_type, 50.000% of 4 records (2) contain duplicate values [filter: event_date = current_date()]
Flowchart
graph TD
A["Apply filter: event_date = current_date()"] --> B["Group filtered rows by the user_id, event_type tuple"]
B --> C{"Any tuple appears<br/>more than once within today's rows?"}
C -->|No| D["All filtered rows pass"]
C -->|Yes| E["Flag every row in the duplicate group.<br/>(u-77, login) appears twice today, both flagged."]
Equivalent SQL
-- Rows the Unique check would flag for today's events.
SELECT e.*
FROM events e
INNER JOIN (
SELECT user_id, event_type
FROM events
WHERE event_date = current_date()
GROUP BY user_id, event_type
HAVING COUNT(*) > 1
) dup ON e.user_id = dup.user_id
AND e.event_type = dup.event_type
WHERE e.event_date = current_date()
ORDER BY e.user_id, e.event_type;
Related
- Introduction: formal definition, modes overview, field scope, and general/anomaly properties.
- How It Works: full semantics, NULL handling, filter behavior, and edge cases.
- API: payload shape and field notes for creating a Unique check programmatically.
- FAQ: short answers to the most frequent questions.