Skip to content

Anomaly Types

Anomalies in Qualytics are classified into two primary types, Record Anomalies and Shape Anomalies, each targeting different aspects of data integrity. Record anomalies flag individual rows that fail specific quality checks, such as missing or invalid values. Shape anomalies, on the other hand, detect structural issues in the dataset, like missing columns or schema mismatches. Together, these types provide a comprehensive approach to identifying both value-level and schema-level data quality issues.

Let’s get started 🚀

Record Anomaly

A record anomaly identifies a single record (row) as anomalous and provides specific details regarding why it is considered anomalous. The simplest form of a record anomaly is a row that lacks an expected value for a field.

Example Use Case

Scenario

We have an Employee dataset used for payroll.

Rules:

  • Every employee must have a Salary greater than 40,000.
  • The dataset must contain these four columns: id, name, age, salary.
  • The name must follow the "First Last" format.

Rule Checked: Salary > 40,000

Input Table

id name age salary
1 John Doe 28 50,000
2 Jane Smith 35 75,000
3 Bob Johnson 22 30,000

Detection Result (Record Anomaly)

id name age salary anomaly_reason
3 Bob Johnson 22 30,000 Salary is less than the required 40,000

Why this is a Record Anomaly:

The table structure is correct. Only one row’s value violates the rule.

Shape Anomaly

A shape anomaly identifies an anomalous structure within the analyzed data. The simplest shape anomaly is a dataset that doesn't match the expected schema because it lacks one or more fields. Some shape anomalies only apply to a subset of the data being analyzed and can therefore produce a count of the number of rows that reflect the anomalous concern. Where that is possible, the shape anomaly's anomalous_record_count is populated.

Note

Sometimes, shape anomalies only affect a subset of the dataset. This means that only certain rows exhibit the structural issue, rather than the entire dataset.

Example Use Case

Scenario

We have a Sales Orders dataset.

Rules:

  • Required columns: order_id, customer_id, order_date, total_amount.
  • order_date must be in YYYY-MM-DD format.

Input Table

order_id customer_id order_date
101 C001 2025-08-10
102 C002 08/11/2025
103 C003 2025-08-12

Detection Result (Shape Anomalies)

order_id customer_id order_date total_amount anomaly_reason
101 C001 2025-08-10 Missing total_amount column
102 C002 08/11/2025 Missing total_amount column; Date format incorrect
103 C003 2025-08-12 Missing total_amount column

Why this is a Shape Anomaly:

  • A required column (total_amount) is completely missing from the structure.
  • A field format (order_date in row 102) does not match the required YYYY-MM-DD pattern.
  • The problem is with the shape/structure of the dataset, not just a wrong value.

Note

When a shape anomaly affects only a portion of the dataset, Qualytics can count the number of rows that have the structural problem. This count is stored in the anomalous_record_count field, providing a clear measure of how widespread the issue is within the dataset. Example: Imagine a dataset that is supposed to have columns for id, name, age, and salary. If some rows are missing the salary column, this would be flagged as a shape anomaly. If this issue only affects 50 out of 1,000 rows, the anomalous_record_count would be 50, indicating that 50 rows have a structural issue.