Skip to content

Unique Check Examples

Three real-world scenarios that show how the Unique check is typically used in production: enforcing a primary key on a single field, enforcing a composite key on a line-item table, and scoping uniqueness to a time window with a filter clause.

The situation: Your customers table holds one row per customer. The customer_id column must be unique across the whole table. Duplicate IDs would break downstream joins in the data warehouse.

Check configuration

Field Value
Rule Unique
Fields customer_id
Filter (none)
Coverage 100%
Custom Anomaly Description Off
Status Active
Owner (check creator)
Anomaly Assignee (data-quality engineer)
Tags pii, critical
Additional Metadata jira: DATA-1234
Description Customer ID must be unique across the customers table

Payload

{
    "description": "Customer ID must be unique across the customers table",
    "rule": "unique",
    "fields": ["customer_id"],
    "container_id": 145,
    "coverage": 1,
    "filter": null,
    "properties": null,
    "tags": ["pii", "critical"],
    "additional_metadata": {"jira": "DATA-1234"},
    "anomaly_message_field": null,
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 12
}

Sample Data

customer_id name email
1001 Alice Cohen alice@example.com
1002 Ben Dias ben@example.com
1001 Alice C. alice.c@example.com
1003 Cara Mota cara@example.com

What gets flagged

customer_id = 1001 repeats on rows 1 and 3, so both rows are reported as part of the duplicate group.

Shape Anomaly

For the field customer_id, 50.000% of 4 records (2) contain duplicate values

Flowchart

graph TD
    A["No filter, evaluate all rows"] --> B["Group rows by customer_id"]
    B --> C{"Any customer_id<br/>appears more than once?"}
    C -->|No| D["All rows pass"]
    C -->|Yes| E["Flag every row in the duplicate group.<br/>customer_id 1001 appears twice, both flagged."]

Equivalent SQL

-- Rows the Unique check would flag for the customers table.
SELECT c.*
FROM customers c
INNER JOIN (
    SELECT customer_id
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
) dup ON c.customer_id = dup.customer_id
ORDER BY c.customer_id;

The situation: Your order_items table holds the individual lines of each customer order. The combination of (order_id, line_number) must be unique. order_id legitimately repeats (one row per line of the same order) and line_number legitimately repeats (every order has a line 1), but the pair must be unique on each row.

Check configuration

Field Value
Rule Unique
Fields order_id, line_number
Filter (none)
Coverage 100%
Custom Anomaly Description Off
Status Active
Owner (check creator)
Anomaly Assignee (orders-domain owner)
Tags primary-key, orders
Additional Metadata jira: DATA-2045
Description Each (order_id, line_number) pair must be unique in order_items

Payload

{
    "description": "Each (order_id, line_number) pair must be unique in order_items",
    "rule": "unique",
    "fields": ["order_id", "line_number"],
    "container_id": 212,
    "coverage": 1,
    "filter": null,
    "properties": null,
    "tags": ["primary-key", "orders"],
    "additional_metadata": {"jira": "DATA-2045"},
    "anomaly_message_field": null,
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 18
}

Sample Data

order_id line_number sku quantity
5001 1 SKU-A 2
5001 2 SKU-B 1
5001 1 SKU-C 3
5002 1 SKU-A 1

What gets flagged

order_id = 5001 legitimately repeats on rows 1, 2, and 3 (a 3-line order). line_number = 1 legitimately repeats on rows 1, 3, and 4 (every order has a line 1). The violation is the combination (5001, 1), which appears twice on rows 1 and 3, so both rows are reported as part of the duplicate group.

Shape Anomaly

For the field order_id, line_number, 50.000% of 4 records (2) contain duplicate values

Flowchart

graph TD
    A["No filter, evaluate all rows"] --> B["Group rows by the order_id, line_number tuple"]
    B --> C{"Any tuple<br/>appears more than once?"}
    C -->|No| D["All rows pass"]
    C -->|Yes| E["Flag every row in the duplicate group.<br/>(5001, 1) appears twice, both flagged."]

Equivalent SQL

-- Rows the Unique check would flag for the order_items table.
SELECT oi.*
FROM order_items oi
INNER JOIN (
    SELECT order_id, line_number
    FROM order_items
    GROUP BY order_id, line_number
    HAVING COUNT(*) > 1
) dup ON oi.order_id = dup.order_id
     AND oi.line_number = dup.line_number
ORDER BY oi.order_id, oi.line_number;

The situation: Your events table receives a daily feed of user activity. Within a single day, each (user_id, event_type) pair must appear only once, since an idempotent ingestion job is supposed to guarantee no same-day duplicates. The same pair is allowed to repeat across days, so a date-scoped uniqueness check is what you want.

Check configuration

Field Value
Rule Unique
Fields user_id, event_type
Filter event_date = current_date()
Coverage 100%
Custom Anomaly Description Off
Status Active
Owner (check creator)
Anomaly Assignee (ingestion on-call)
Tags ingestion, idempotency
Additional Metadata jira: DATA-3071
Description Within a single day, each (user_id, event_type) pair must be unique

Payload

{
    "description": "Within a single day, each (user_id, event_type) pair must be unique",
    "rule": "unique",
    "fields": ["user_id", "event_type"],
    "container_id": 318,
    "coverage": 1,
    "filter": "event_date = current_date()",
    "properties": null,
    "tags": ["ingestion", "idempotency"],
    "additional_metadata": {"jira": "DATA-3071"},
    "anomaly_message_field": null,
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 24
}

Why the filter matters

The filter runs before the uniqueness evaluation. Each scan only groups and compares rows from today, so cross-day repeats (which are expected) are intentionally ignored. The check fires only when the ingestion job produces a genuine same-day duplicate.

Sample Data (filtered to event_date = current_date())

user_id event_type event_date event_id
u-77 login today e-001
u-77 checkout today e-002
u-77 login today e-003
u-88 login today e-004

What gets flagged

(u-77, login) appears twice on rows 1 and 3, so both rows are reported as part of the duplicate group. A separate (u-77, login) recorded yesterday would not affect today's scan because the filter scopes evaluation to the current day.

Shape Anomaly

For the field user_id, event_type, 50.000% of 4 records (2) contain duplicate values [filter: event_date = current_date()]

Flowchart

graph TD
    A["Apply filter: event_date = current_date()"] --> B["Group filtered rows by the user_id, event_type tuple"]
    B --> C{"Any tuple appears<br/>more than once within today's rows?"}
    C -->|No| D["All filtered rows pass"]
    C -->|Yes| E["Flag every row in the duplicate group.<br/>(u-77, login) appears twice today, both flagged."]

Equivalent SQL

-- Rows the Unique check would flag for today's events.
SELECT e.*
FROM events e
INNER JOIN (
    SELECT user_id, event_type
    FROM events
    WHERE event_date = current_date()
    GROUP BY user_id, event_type
    HAVING COUNT(*) > 1
) dup ON e.user_id = dup.user_id
     AND e.event_type = dup.event_type
WHERE e.event_date = current_date()
ORDER BY e.user_id, e.event_type;
  • Introduction: formal definition, modes overview, field scope, and general/anomaly properties.
  • How It Works: full semantics, NULL handling, filter behavior, and edge cases.
  • API: payload shape and field notes for creating a Unique check programmatically.
  • FAQ: short answers to the most frequent questions.