Skip to content

Time Distribution Size

Definition

Asserts that the count of records for each interval of a timestamp is between two numbers.

In-Depth Overview

The Time Distribution Size rule helps in identifying irregularities in the distribution of records over time intervals such as hours, days, or months.

For instance, in a retail context, it could ensure that there’s a consistent number of orders each month to meet business targets. A sudden drop in orders might highlight operational issues or shifts in market demand that require immediate attention.

Field Scope

Single: The rule evaluates a single specified field.

Accepted Types

Type
Timestamp
Date

Specific Properties

Name Description
Interval
Defines the time interval for segmentation.
Min Count
Specifies the minimum count of records in each segment.
Max Count
Specifies the maximum count of records in each segment.

Anomaly Types

Type Supported
Record
Flag inconsistencies at the row level
Shape
Flag inconsistencies in the overall patterns and distributions of a field

Example

Objective: Ensure that the number of orders for each month is consistently between 5 and 10.

Sample Data

O_ORDERKEY O_ORDERDATE
1 2023-01-01
2 2023-01-15
3 2023-01-20
4 2023-01-25
5 2023-02-01
6 2023-02-05
7 2023-02-10
8 2023-02-15
9 2023-02-20
10 2023-02-25
11 2023-02-28
{
    "description": "Ensure that the number of orders for each month is consistently between 5 and 10",
    "coverage": 1,
    "properties": {
        "interval_name": "Monthly",
        "min_size": 5,
        "max_size": 10
    },
    "tags": [],
    "fields": ["O_ORDERDATE"],
    "additional_metadata": {"key 1": "value 1", "key 2": "value 2"},
    "rule": "timeDistributionSize",
    "container_id": {container_id},
    "template_id": {template_id},
    "filter": "1=1"
}

Anomaly Explanation

In the sample data above, the January segment fails the rule because there are only 4 orders, which is below the specified minimum count of 5.

graph TD
A[Start] --> B[Retrieve O_ORDERDATE]
B --> C{Segment data by month}
C --> D{Is count of records in each segment between 5 and 10?}
D -->|Yes| E[End]
D -->|No| F[Mark as Anomalous]
F --> E
-- An illustrative SQL query demonstrating the rule applied to example dataset(s).
select 
    extract(month from o_orderdate) as month,
    count(*) as order_count
from 
    orders 
group by 
    extract(month from o_orderdate)
having 
    count(*) < 5
    or count(*) > 10;

Potential Violation Messages

Shape Anomaly

50.000% of the monthly segments of O_ORDERDATE have record counts not between 5 and 10.


Last update: June 14, 2024