Unique Check Examples

Three real-world scenarios that show how the Unique check is typically used in production: enforcing a primary key on a single field, enforcing a composite key on a line-item table, and scoping uniqueness to a time window with a filter clause.

Single-Field Primary KeyComposite Key on Order ItemsScoped Uniqueness with Filter

The situation: Your customers table holds one row per customer. The customer_id column must be unique across the whole table. Duplicate IDs would break downstream joins in the data warehouse.

Check configuration

Field	Value
Rule	Unique
Fields	`customer_id`
Filter	(none)
Coverage	100%
Custom Anomaly Description	Off
Status	Active
Owner	(check creator)
Anomaly Assignee	(data-quality engineer)
Tags	`pii`, `critical`
Additional Metadata	`jira: DATA-1234`
Description	Customer ID must be unique across the customers table

Payload

{
    "description": "Customer ID must be unique across the customers table",
    "rule": "unique",
    "fields": ["customer_id"],
    "container_id": 145,
    "coverage": 1,
    "filter": null,
    "properties": null,
    "tags": ["pii", "critical"],
    "additional_metadata": {"jira": "DATA-1234"},
    "anomaly_message_field": null,
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 12
}

Sample Data

customer_id	name	email
1001	Alice Cohen	alice@example.com
1002	Ben Dias	ben@example.com
1001	Alice C.	alice.c@example.com
1003	Cara Mota	cara@example.com

What gets flagged

customer_id = 1001 repeats on rows 1 and 3, so both rows are reported as part of the duplicate group.

Shape Anomaly

For the field customer_id, 50.000% of 4 records (2) contain duplicate values

Flowchart

graph TD
    A["No filter, evaluate all rows"] --> B["Group rows by customer_id"]
    B --> C{"Any customer_id<br/>appears more than once?"}
    C -->|No| D["All rows pass"]
    C -->|Yes| E["Flag every row in the duplicate group.<br/>customer_id 1001 appears twice, both flagged."]

Equivalent SQL

-- Rows the Unique check would flag for the customers table.
SELECT c.*
FROM customers c
INNER JOIN (
    SELECT customer_id
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
) dup ON c.customer_id = dup.customer_id
ORDER BY c.customer_id;

The situation: Your order_items table holds the individual lines of each customer order. The combination of (order_id, line_number) must be unique. order_id legitimately repeats (one row per line of the same order) and line_number legitimately repeats (every order has a line 1), but the pair must be unique on each row.

Check configuration

Field	Value
Rule	Unique
Fields	`order_id`, `line_number`
Filter	(none)
Coverage	100%
Custom Anomaly Description	Off
Status	Active
Owner	(check creator)
Anomaly Assignee	(orders-domain owner)
Tags	`primary-key`, `orders`
Additional Metadata	`jira: DATA-2045`
Description	Each (order_id, line_number) pair must be unique in order_items

Payload

{
    "description": "Each (order_id, line_number) pair must be unique in order_items",
    "rule": "unique",
    "fields": ["order_id", "line_number"],
    "container_id": 212,
    "coverage": 1,
    "filter": null,
    "properties": null,
    "tags": ["primary-key", "orders"],
    "additional_metadata": {"jira": "DATA-2045"},
    "anomaly_message_field": null,
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 18
}

Sample Data

order_id	line_number	sku	quantity
5001	1	SKU-A	2
5001	2	SKU-B	1
5001	1	SKU-C	3
5002	1	SKU-A	1

What gets flagged

order_id = 5001 legitimately repeats on rows 1, 2, and 3 (a 3-line order). line_number = 1 legitimately repeats on rows 1, 3, and 4 (every order has a line 1). The violation is the combination (5001, 1), which appears twice on rows 1 and 3, so both rows are reported as part of the duplicate group.

Shape Anomaly

For the field order_id, line_number, 50.000% of 4 records (2) contain duplicate values

Flowchart

graph TD
    A["No filter, evaluate all rows"] --> B["Group rows by the order_id, line_number tuple"]
    B --> C{"Any tuple<br/>appears more than once?"}
    C -->|No| D["All rows pass"]
    C -->|Yes| E["Flag every row in the duplicate group.<br/>(5001, 1) appears twice, both flagged."]

Equivalent SQL

-- Rows the Unique check would flag for the order_items table.
SELECT oi.*
FROM order_items oi
INNER JOIN (
    SELECT order_id, line_number
    FROM order_items
    GROUP BY order_id, line_number
    HAVING COUNT(*) > 1
) dup ON oi.order_id = dup.order_id
     AND oi.line_number = dup.line_number
ORDER BY oi.order_id, oi.line_number;

The situation: Your events table receives a daily feed of user activity. Within a single day, each (user_id, event_type) pair must appear only once, since an idempotent ingestion job is supposed to guarantee no same-day duplicates. The same pair is allowed to repeat across days, so a date-scoped uniqueness check is what you want.

Check configuration

Field	Value
Rule	Unique
Fields	`user_id`, `event_type`
Filter	`event_date = current_date()`
Coverage	100%
Custom Anomaly Description	Off
Status	Active
Owner	(check creator)
Anomaly Assignee	(ingestion on-call)
Tags	`ingestion`, `idempotency`
Additional Metadata	`jira: DATA-3071`
Description	Within a single day, each (user_id, event_type) pair must be unique

Payload

{
    "description": "Within a single day, each (user_id, event_type) pair must be unique",
    "rule": "unique",
    "fields": ["user_id", "event_type"],
    "container_id": 318,
    "coverage": 1,
    "filter": "event_date = current_date()",
    "properties": null,
    "tags": ["ingestion", "idempotency"],
    "additional_metadata": {"jira": "DATA-3071"},
    "anomaly_message_field": null,
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 24
}

Why the filter matters

The filter runs before the uniqueness evaluation. Each scan only groups and compares rows from today, so cross-day repeats (which are expected) are intentionally ignored. The check fires only when the ingestion job produces a genuine same-day duplicate.

Sample Data (filtered to event_date = current_date())

user_id	event_type	event_date	event_id
u-77	login	today	e-001
u-77	checkout	today	e-002
u-77	login	today	e-003
u-88	login	today	e-004

What gets flagged

(u-77, login) appears twice on rows 1 and 3, so both rows are reported as part of the duplicate group. A separate (u-77, login) recorded yesterday would not affect today's scan because the filter scopes evaluation to the current day.

Shape Anomaly

For the field user_id, event_type, 50.000% of 4 records (2) contain duplicate values [filter: event_date = current_date()]

Flowchart

graph TD
    A["Apply filter: event_date = current_date()"] --> B["Group filtered rows by the user_id, event_type tuple"]
    B --> C{"Any tuple appears<br/>more than once within today's rows?"}
    C -->|No| D["All filtered rows pass"]
    C -->|Yes| E["Flag every row in the duplicate group.<br/>(u-77, login) appears twice today, both flagged."]

Equivalent SQL

-- Rows the Unique check would flag for today's events.
SELECT e.*
FROM events e
INNER JOIN (
    SELECT user_id, event_type
    FROM events
    WHERE event_date = current_date()
    GROUP BY user_id, event_type
    HAVING COUNT(*) > 1
) dup ON e.user_id = dup.user_id
     AND e.event_type = dup.event_type
WHERE e.event_date = current_date()
ORDER BY e.user_id, e.event_type;

Introduction: formal definition, modes overview, field scope, and general/anomaly properties.
How It Works: full semantics, NULL handling, filter behavior, and edge cases.
API: payload shape and field notes for creating a Unique check programmatically.
FAQ: short answers to the most frequent questions.

Unique Check Examples

Related