Time Distribution Size
Definition
Asserts that the count of records for each interval of a timestamp is between two numbers.
In-Depth Overview
The Time Distribution Size
rule helps in identifying irregularities in the distribution of records over time intervals such as hours, days, or months.
For instance, in a retail context, it could ensure that there’s a consistent number of orders each month to meet business targets. A sudden drop in orders might highlight operational issues or shifts in market demand that require immediate attention.
Field Scope
Single: The rule evaluates a single specified field.
Accepted Types
Type | |
---|---|
Timestamp |
|
Date |
Specific Properties
Name | Description |
---|---|
Interval |
Defines the time interval for segmentation. |
Min Count |
Specifies the minimum count of records in each segment. |
Max Count |
Specifies the maximum count of records in each segment. |
Anomaly Types
Type | Supported |
---|---|
Record Flag inconsistencies at the row level |
|
Shape Flag inconsistencies in the overall patterns and distributions of a field |
Example
Objective: Ensure that the number of orders for each month is consistently between 5 and 10.
Sample Data
O_ORDERKEY | O_ORDERDATE |
---|---|
1 | 2023-01-01 |
2 | 2023-01-15 |
3 | 2023-01-20 |
4 | 2023-01-25 |
5 | 2023-02-01 |
6 | 2023-02-05 |
7 | 2023-02-10 |
8 | 2023-02-15 |
9 | 2023-02-20 |
10 | 2023-02-25 |
11 | 2023-02-28 |
{
"description": "Ensure that the number of orders for each month is consistently between 5 and 10",
"coverage": 1,
"properties": {
"interval_name": "Monthly",
"min_size": 5,
"max_size": 10
},
"tags": [],
"fields": ["O_ORDERDATE"],
"additional_metadata": {"key 1": "value 1", "key 2": "value 2"},
"rule": "timeDistributionSize",
"container_id": {container_id},
"template_id": {template_id},
"filter": "1=1"
}
Anomaly Explanation
In the sample data above, the January segment fails the rule because there are only 4 orders, which is below the specified minimum count of 5.
graph TD
A[Start] --> B[Retrieve O_ORDERDATE]
B --> C{Segment data by month}
C --> D{Is count of records in each segment between 5 and 10?}
D -->|Yes| E[End]
D -->|No| F[Mark as Anomalous]
F --> E
Potential Violation Messages
Shape Anomaly
50.000% of the monthly segments of O_ORDERDATE
have record counts not between 5 and 10.