Skip to content

Not Exists In

Definition

Asserts that values assigned to this field do not exist as values in another field.

In-Depth Overview

The Not ExistsIn rule allows you to ensure data exclusivity between different sources, whether it’s object storage systems or databases.

While databases might utilize unique constraints to maintain data distinctiveness between related tables, the Not ExistsIn rule extends this capability in two significant ways:

  1. Cross-System Exclusivity: it enables checks to ensure data does not overlap across different databases or even entirely separate systems. This can be essential in scenarios where data should be partitioned or isolated across platforms.
  2. Flexible Data Formats: Not just limited to databases, this rule can validate values against various data formats, such as ensuring values in a file do not coincide with those in a table.

These functionalities enable businesses to maintain data exclusivity even in intricate, multi-system settings.

Field Scope

Single: The rule evaluates a single specified field.

Accepted Types

Type
Date
Timestamp
Integral
Fractional
String
Boolean

Specific Properties

Define the datastore, table/file, and field where the rule should look for non-matching values.

Name Description
Datastore
The source datastore where the profile of the reference field is located.
Table/file
The profile (e.g. table, view or file) containing the reference field.
Field
The field name whose values should not match those of the selected field.

Anomaly Types

Type Supported
Record
Flag inconsistencies at the row level
Shape
Flag inconsistencies in the overall patterns and distributions of a field

Example

Scenario: A shipping company needs to ensure that all NATION_NAME entries in the NATION table aren't listed in an external unsupported regions file, which lists countries they don't ship to.

Sample Data

N_NATIONKEY N_NATIONNAME
1 Antarctica
2 Argentina
3 Atlantida

Unsupported Regions File Sample

UNSUPPORTED_REGION
Antarctica
Mars
...
{
    "description": "A shipping company needs to ensure that all NATION_NAME entries in the NATION table aren't listed in an external unsupported regions file, which lists countries they don't ship to",
    "coverage": 1,
    "properties": {
        "field_name":"UNSUPPORTED_REGION",
        "ref_container_id": {ref_container_id},
        "ref_datastore_id": {ref_datastore_id}
    },
    "tags": [],
    "fields": ["NATION_NAME"],
    "additional_metadata": {"key 1": "value 1", "key 2": "value 2"},
    "rule": "notExistsIn",
    "container_id": {container_id},
    "template_id": {template_id},
    "filter": "1=1"
}

Anomaly Explanation

In the sample data above, the entry with N_NATIONKEY 1 does not satisfy the rule because the N_NATIONNAME "Antarctica" is listed as an UNSUPPORTED_REGION in the unsupported regions file, indicating the company doesn't ship there.

graph TD
A[Start] --> B[Retrieve UNSUPPORTED_REGION]
B --> C[Retrieve N_NATIONNAME]
C --> D{Is N_NATIONNAME listed in UNSUPPORTED_REGION?}
D -->|No| E[Move to Next Record/End]
D -->|Yes| F[Mark as Anomalous]
F --> E
-- An illustrative SQL query demonstrating the rule applied to example dataset(s).
select
    n_nationkey
    , n_nationname
from nation 
where
    n_nationname in ('Antarctica', 'Mars', ... /* other unsupported regions */)

Potential Violation Messages

Record Anomaly

The N_NATIONNAME value of 'Antarctica' is an UNSUPPORTED_REGION.

Shape Anomaly

In N_NATIONNAME, 33.333% of 3 filtered records (1) do exist in UNSUPPORTED_REGION.