Max Partition Size
Definition
Asserts the maximum number of records that should be loaded from each file or table partition.
In-Depth Overview
Managing the volume of data in each partition is critical when working with partitioned datasets. Processing systems have limits on how much data they can handle efficiently, and a single oversized partition can exceed those limits and slow down or stall processing.
The Max Partition Size rule is designed to set an upper limit on the number of records each partition can contain.
General Properties
Name | Supported |
---|---|
Filter: Allows the targeting of specific data based on conditions | |
Coverage Customization: Allows adjusting the percentage of records that must meet the rule's conditions | |
Specific Properties
Specifies the maximum allowable record count for each data partition.
Name | Description |
---|---|
Maximum partition size | The maximum number of records that can be loaded from each partition. |
Anomaly Types
Type | Supported |
---|---|
Record: Flag inconsistencies at the row level | |
Shape: Flag inconsistencies in the overall patterns and distributions of a field | |
Example
Objective: Ensure that no partition of the LINEITEM table contains more than 10,000 records to prevent data processing bottlenecks.
Sample Data for Partition P3
Row Number | L_ITEM |
---|---|
1 | Data |
2 | Data |
... | ... |
10,050 | Data |
{
    "description": "Ensure that no partition of the LINEITEM table contains more than 10,000 records to prevent data processing bottlenecks",
    "coverage": 1,
    "properties": {
        "value": 10000
    },
    "tags": [],
    "fields": null,
    "additional_metadata": {"key 1": "value 1", "key 2": "value 2"},
    "rule": "maxPartitionSize",
    "container_id": {container_id},
    "template_id": {template_id},
    "filter": "1=1"
}
In the sample data above, the rule is violated because partition P3 contains 10,050 records, which exceeds the set maximum of 10,000 records.
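The `{container_id}` and `{template_id}` entries in the definition above are placeholders, so the text is not valid JSON until they are filled in. The following is a minimal sketch of one way to fill them in with Python, assuming both identifiers are plain values such as integers. The helper name `build_check_payload` and the IDs `101` and `202` are hypothetical and used only for illustration. The sketch builds the document only; how it is submitted is outside its scope.

```python
import json


def build_check_payload(container_id, template_id, max_records):
    """Build the maxPartitionSize check definition as a JSON string.

    container_id and template_id stand in for the {container_id} and
    {template_id} placeholders; their real types are an assumption here.
    """
    payload = {
        "description": (
            "Ensure that no partition of the LINEITEM table contains more "
            f"than {max_records:,} records to prevent data processing bottlenecks"
        ),
        "coverage": 1,
        "properties": {"value": max_records},
        "tags": [],
        "fields": None,  # serialized as JSON null
        "additional_metadata": {"key 1": "value 1", "key 2": "value 2"},
        "rule": "maxPartitionSize",
        "container_id": container_id,
        "template_id": template_id,
        "filter": "1=1",
    }
    return json.dumps(payload, indent=4)


# Hypothetical IDs, chosen only to make the example concrete.
print(build_check_payload(container_id=101, template_id=202, max_records=10_000))
```

Building the definition from a dictionary and serializing it with `json.dumps` keeps the result well-formed, which plain string substitution into the template does not guarantee.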
graph TD
A[Start] --> B[Retrieve Number of Records for Each Partition]
B --> C{Does Partition have <= 10,000 records?}
C -->|Yes| D[Move to Next Partition/End]
C -->|No| E[Mark as Anomalous]
E --> D
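The flow above can be sketched in a few lines of Python. This is an illustration of the logic only, not the product's implementation: the function name `find_oversized_partitions` is hypothetical, and the record counts for P1 and P2 are invented (only P3's 10,050 comes from the example).

```python
from typing import Dict, List


def find_oversized_partitions(record_counts: Dict[str, int], max_size: int) -> List[str]:
    """Return the names of partitions whose record count exceeds max_size."""
    anomalous = []
    for partition, count in record_counts.items():
        # A partition passes when it holds at most max_size records;
        # anything above the limit is marked as anomalous.
        if count > max_size:
            anomalous.append(partition)
    return anomalous


# Hypothetical counts for P1 and P2; P3 matches the sample data.
counts = {"P1": 9_200, "P2": 10_000, "P3": 10_050}
oversized = find_oversized_partitions(counts, max_size=10_000)
print(oversized)  # ['P3']
```

A partition holding exactly 10,000 records passes, because the limit is inclusive (`<=` in the flowchart).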
Potential Violation Messages
Shape Anomaly
In LINEITEM, more than 10,000 records were loaded.