Min Partition Size
Definition
Asserts the minimum number of records that should be loaded from each file or table partition.
In-Depth Overview
Large datasets are often partitioned for better performance and scalability. In that setting, it can be important to confirm that a minimum number of records was loaded from each partition. The goal may be to ensure that every partition is well represented in the analysis, to maintain data consistency, or to verify that data ingestion or migration processes are working properly.
The Min Partition Size rule lets users set a threshold and checks that each partition has loaded at least that many records.
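The core idea of the check can be sketched in Python, assuming each loaded record is a mapping that carries its partition identifier. The function name `undersized_partitions` and its `partition_key` and `expected_partitions` parameters are hypothetical illustrations, not part of the product's API.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def undersized_partitions(
    rows: Iterable[Mapping[str, object]],
    partition_key: str,
    min_size: int,
    expected_partitions: Optional[Iterable[object]] = None,
) -> dict:
    """Return {partition: loaded_count} for each partition below min_size.

    Hypothetical helper: assumes every row is a mapping that stores its
    partition identifier under `partition_key`.
    """
    # Count how many records were loaded for each partition.
    counts = Counter(row[partition_key] for row in rows)

    # A partition that loaded zero rows never shows up in the Counter,
    # so seed every expected partition with a count of 0.
    for partition in expected_partitions or ():
        counts.setdefault(partition, 0)

    # Keep only the partitions that fall short of the threshold.
    return {p: n for p, n in counts.items() if n < min_size}


# Illustrative data: P1 has 1000 rows, P2 loaded nothing, P3 has 900.
rows = [{"part": "P1"}] * 1000 + [{"part": "P3"}] * 900
print(undersized_partitions(rows, "part", 1000, expected_partitions=["P1", "P2", "P3"]))
# → {'P3': 900, 'P2': 0}
```

Passing `expected_partitions` matters because a partition that loaded nothing is the most extreme case of under-loading, yet it produces no rows to count. How the actual rule discovers the set of partitions is not described on this page.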
General Properties
Name | Description | Supported |
---|---|---|
Filter | Allows the targeting of specific data based on conditions | |
Coverage Customization | Allows adjusting the percentage of records that must meet the rule's conditions | |
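When a filter is configured, one plausible reading is that it narrows which records are counted before the threshold is applied. The sketch below assumes that behavior and models the filter as a Python predicate; the actual rule takes a filter expression string (such as the `"1=1"` in the example further down), and the helper name `filtered_partition_counts` is hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Mapping

Row = Mapping[str, object]


def filtered_partition_counts(
    rows: Iterable[Row],
    partition_key: str,
    predicate: Callable[[Row], bool] = lambda row: True,
) -> Counter:
    """Count records per partition, keeping only rows the predicate accepts.

    Assumption: the rule's filter is applied before counting. The default
    predicate accepts every row, analogous to the filter expression "1=1".
    """
    return Counter(row[partition_key] for row in rows if predicate(row))


rows = [
    {"part": "P1", "status": "shipped"},
    {"part": "P1", "status": "returned"},
    {"part": "P2", "status": "shipped"},
]
# Unfiltered: every row counts toward its partition.
print(filtered_partition_counts(rows, "part"))
# Filtered: only "shipped" rows count, so P1 drops from 2 to 1.
print(filtered_partition_counts(rows, "part", lambda r: r["status"] == "shipped"))
```

Under this assumption, a restrictive filter can make a partition fail the minimum even when its raw row count would pass.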
Specific Properties
Sets the required minimum record count for each data partition.
Name | Description |
---|---|
Minimum partition size | Specifies the minimum number of records that should be loaded from each partition |
Anomaly Types
Type | Description | Supported |
---|---|---|
Record | Flag inconsistencies at the row level | |
Shape | Flag inconsistencies in the overall patterns and distributions of a field | |
Example
Objective: Ensure that each partition of the LINEITEM table has at least 1000 records.
Sample Data for Partition P3
Row Number | L_ITEM |
---|---|
1 | Data |
2 | Data |
... | ... |
900 | Data |
```json
{
    "description": "Ensure that each partition of the LINEITEM table has at least 1000 records",
    "coverage": 1,
    "properties": {
        "value": 1000
    },
    "tags": [],
    "fields": null,
    "additional_metadata": {"key 1": "value 1", "key 2": "value 2"},
    "rule": "minPartitionSize",
    "container_id": {container_id},
    "template_id": {template_id},
    "filter": "1=1"
}
```
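For scripted rule creation, a payload with the same fields can be assembled in code. This is a sketch only: `build_min_partition_size_rule` is a hypothetical helper, the integer types for `container_id` and `template_id` are an assumption, and the values `123` and `456` are placeholders standing in for the example's `{container_id}` and `{template_id}`.

```python
import json


def build_min_partition_size_rule(
    container_id: int,
    template_id: int,
    min_size: int,
    description: str = "",
    filter_expr: str = "1=1",
) -> dict:
    """Assemble a dict with the same fields as the example rule payload.

    Hypothetical helper; the integer id types are an assumption.
    """
    # Assumption: a negative minimum record count is meaningless.
    if min_size < 0:
        raise ValueError("min_size must be non-negative")
    return {
        "description": description,
        "coverage": 1,
        "properties": {"value": min_size},  # minimum records per partition
        "tags": [],
        "fields": None,  # serialized as JSON null
        "additional_metadata": {},
        "rule": "minPartitionSize",
        "container_id": container_id,
        "template_id": template_id,
        "filter": filter_expr,
    }


payload = build_min_partition_size_rule(
    container_id=123,  # hypothetical placeholder value
    template_id=456,  # hypothetical placeholder value
    min_size=1000,
    description="Ensure that each partition of the LINEITEM table has at least 1000 records",
)
print(json.dumps(payload, indent=2))
```

Serializing with `json.dumps` also yields valid JSON, which the literal example is not until its `{container_id}` and `{template_id}` placeholders are replaced with real values.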
Partition P3 in the sample data above does not satisfy the rule because it contains only 900 records, 100 fewer than the required minimum of 1000.
```mermaid
graph TD
A[Start] --> B[Retrieve Number of Records for Each Partition]
B --> C{Does Partition have >= 1000 records?}
C -->|Yes| D[Move to Next Partition/End]
C -->|No| E[Mark as Anomalous]
E --> D
```
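The flowchart maps almost step for step onto a loop. This sketch assumes the per-partition record counts (the "Retrieve Number of Records" step) are already available as a dict; `check_partitions` is a hypothetical name, and the count for P1 is invented for illustration, while P3's 900 comes from the sample data.

```python
def check_partitions(partition_counts: dict, min_size: int) -> list:
    """Walk partitions as in the flowchart; return those marked anomalous."""
    anomalous = []
    for partition, count in partition_counts.items():
        # Decision node: does the partition have >= min_size records?
        if count >= min_size:
            continue  # Yes: move to the next partition
        # No: mark as anomalous, recording the shortfall, then move on.
        anomalous.append(
            {"partition": partition, "count": count, "shortfall": min_size - count}
        )
    return anomalous  # End: every partition has been visited


counts = {"P1": 1200, "P3": 900}  # P1 is hypothetical; P3 matches the sample data
print(check_partitions(counts, 1000))
# → [{'partition': 'P3', 'count': 900, 'shortfall': 100}]
```

Note the comparison is `>=`, so a partition with exactly the minimum number of records passes.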
Potential Violation Messages
Shape Anomaly
In LINEITEM, fewer than 1000 records were loaded.