Expected Schema
Definition
Asserts that all of the selected fields must be present in the datastore.
Behavior
The expected schema check is the first check evaluated during a scan operation. If it fails, the scan operation ends with a Failure result and the following message:
<container-name>: Aborted because schema check anomalies were identified.
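The ordering described above can be pictured with a small Python sketch. It is an illustrative model only, not the scan engine's actual code: the `run_scan` function, the `(rule, callable)` representation of a check, and the "Success" status are all assumptions made for this example. Only the `expectedSchema` rule name and the abort message come from this page.

```python
def run_scan(container_name, checks):
    """Hypothetical scan loop: `checks` is a list of (rule_name, check_fn)
    pairs, where check_fn() returns True when the check passes."""
    schema_checks = [c for c in checks if c[0] == "expectedSchema"]
    other_checks = [c for c in checks if c[0] != "expectedSchema"]

    # Schema checks are evaluated before anything else.
    for _, check_fn in schema_checks:
        if not check_fn():
            message = (
                f"{container_name}: Aborted because schema check "
                "anomalies were identified."
            )
            return "Failure", message

    # Remaining checks run only if every schema check passed.
    results = [(rule, check_fn()) for rule, check_fn in other_checks]
    return "Success", results


status, detail = run_scan(
    "LINEITEM",
    [("notNull", lambda: True), ("expectedSchema", lambda: False)],
)
print(status, detail)
```

In this model, a failing schema check stops the scan before any other check is evaluated, which mirrors the abort behavior described above.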
General Properties
Name | Supported |
---|---|
Filter: Allows the targeting of specific data based on conditions | |
Coverage Customization: Allows adjusting the percentage of records that must meet the rule's conditions | |
Specific Properties
Specify the fields that must be present in the schema, and determine if a schema change caused by additional fields should fail or pass the assertion.
Name | Description |
---|---|
Fields | List of fields that must be present in the schema. |
Allow other fields | If true, new fields are allowed in the schema. If false, the assertion is stricter: a schema change that adds fields fails it. |
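The interaction between the two properties can be sketched in Python. This is a minimal illustration, not the product's implementation: the `check_expected_schema` function and its return shape are hypothetical, and the sketch assumes that "other fields" means any field outside the expected list, which may differ from how the product detects additional fields.

```python
def check_expected_schema(actual_fields, expected_fields, allow_other_fields):
    """Hypothetical model of the assertion.

    Returns (passed, missing, unexpected), where `missing` lists expected
    fields absent from the schema and `unexpected` lists extra fields
    (only reported when allow_other_fields is False).
    """
    actual = set(actual_fields)
    expected = set(expected_fields)

    # Every expected field must be present, regardless of the flag.
    missing = [f for f in expected_fields if f not in actual]

    # Assumption: with the flag off, any field outside the list is a violation.
    if allow_other_fields:
        unexpected = []
    else:
        unexpected = [f for f in actual_fields if f not in expected]

    passed = not missing and not unexpected
    return passed, missing, unexpected


schema = ["L_ORDERKEY", "L_PARTKEY", "L_LINENUMBER"]
required = ["L_ORDERKEY", "L_PARTKEY", "L_SUPPKEY"]
print(check_expected_schema(schema, required, allow_other_fields=True))
# (False, ['L_SUPPKEY'], [])
```

A missing expected field fails the assertion under either setting; the flag only changes how fields outside the list are treated.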
Anomaly Types
Type | Supported |
---|---|
Record: Flag inconsistencies at the row level | |
Shape: Flag inconsistencies in the overall patterns and distributions of a field | |
Example
Objective: Ensure that expected fields such as L_ORDERKEY, L_PARTKEY, and L_SUPPKEY are always present in the LINEITEM table.
Sample Data
Valid
FIELD_NAME | FIELD_TYPE |
---|---|
L_ORDERKEY | NUMBER |
L_PARTKEY | NUMBER |
L_SUPPKEY | NUMBER |
L_LINENUMBER | NUMBER |
L_QUANTITY | NUMBER |
L_EXTENDEDPRICE | NUMBER |
... | ... |
Invalid
L_SUPPKEY is missing from the schema
FIELD_NAME | FIELD_TYPE |
---|---|
L_ORDERKEY | NUMBER |
L_PARTKEY | NUMBER |
L_LINENUMBER | NUMBER |
L_QUANTITY | NUMBER |
L_EXTENDEDPRICE | NUMBER |
... | ... |
{
    "description": "Ensure that expected fields such as L_ORDERKEY, L_PARTKEY, and L_SUPPKEY are always present in the LINEITEM table",
    "coverage": 1,
    "properties": {
        "allow_other_fields": false,
        "list": ["L_ORDERKEY", "L_PARTKEY", "L_SUPPKEY"]
    },
    "tags": [],
    "fields": null,
    "additional_metadata": {"key 1": "value 1", "key 2": "value 2"},
    "rule": "expectedSchema",
    "container_id": {container_id},
    "template_id": {template_id},
    "filter": "1=1"
}
Anomaly Explanation
Of the two sample schemas, the second is missing one of the expected fields (L_SUPPKEY). Only the first schema contains all of the expected fields.
graph TD
A[Start] --> B{Check for Field Presence}
B -.->|Field is missing| C[Mark as Shape Anomaly]
B -.->|All fields present| D[End]
Potential Violation Messages
Shape Anomaly
The required fields (L_SUPPKEY) are not present.