Not Exists In
Definition
Asserts that values assigned to this field do not exist as values in another field.
In-Depth Overview
The Not ExistsIn
rule allows you to ensure data exclusivity between different sources, whether it’s object storage systems or databases.
While databases might utilize unique constraints to maintain data distinctiveness between related tables, the Not ExistsIn
rule extends this capability in two significant ways:
- Cross-System Exclusivity: it enables checks to ensure data does not overlap across different databases or even entirely separate systems. This can be essential in scenarios where data should be partitioned or isolated across platforms.
- Flexible Data Formats: Not just limited to databases, this rule can validate values against various data formats, such as ensuring values in a file do not coincide with those in a table.
These functionalities enable businesses to maintain data exclusivity even in intricate, multi-system settings.
Field Scope
Single: The rule evaluates a single specified field.
Accepted Types
Type | |
---|---|
Date |
|
Timestamp |
|
Integral |
|
Fractional |
|
String |
|
Boolean |
Specific Properties
Define the datastore, table/file, and field where the rule should look for non-matching values.
Name | Description |
---|---|
Datastore |
The source datastore where the profile of the reference field is located. |
Table/file |
The profile (e.g. table, view or file) containing the reference field. |
Field |
The field name whose values should not match those of the selected field. |
Anomaly Types
Type | Supported |
---|---|
Record Flag inconsistencies at the row level |
|
Shape Flag inconsistencies in the overall patterns and distributions of a field |
Example
Scenario: A shipping company needs to ensure that all NATION_NAME entries in the NATION table aren't listed in an external unsupported regions file, which lists countries they don't ship to.
Sample Data
N_NATIONKEY | N_NATIONNAME |
---|---|
1 | Antarctica |
2 | Argentina |
3 | Atlantida |
Unsupported Regions File Sample
UNSUPPORTED_REGION |
---|
Antarctica |
Mars |
... |
{
"description": "A shipping company needs to ensure that all NATION_NAME entries in the NATION table aren't listed in an external unsupported regions file, which lists countries they don't ship to",
"coverage": 1,
"properties": {
"field_name":"UNSUPPORTED_REGION",
"ref_container_id": {ref_container_id},
"ref_datastore_id": {ref_datastore_id}
},
"tags": [],
"fields": ["NATION_NAME"],
"additional_metadata": {"key 1": "value 1", "key 2": "value 2"},
"rule": "notExistsIn",
"container_id": {container_id},
"template_id": {template_id},
"filter": "1=1"
}
Anomaly Explanation
In the sample data above, the entry with N_NATIONKEY
1 does not satisfy the rule because the N_NATIONNAME
"Antarctica" is listed as an UNSUPPORTED_REGION
in the unsupported regions file, indicating the company doesn't ship there.
graph TD
A[Start] --> B[Retrieve UNSUPPORTED_REGION]
B --> C[Retrieve N_NATIONNAME]
C --> D{Is N_NATIONNAME listed in UNSUPPORTED_REGION?}
D -->|No| E[Move to Next Record/End]
D -->|Yes| F[Mark as Anomalous]
F --> E
Potential Violation Messages
Record Anomaly
The N_NATIONNAME
value of 'Antarctica
' is an UNSUPPORTED_REGION
.
Shape Anomaly
In N_NATIONNAME
, 33.333% of 3 filtered records (1) do exist in UNSUPPORTED_REGION
.