Anomalies in Datastore
Anomalies in Qualytics are indicators of data that deviate from expected patterns or defined quality rules, often pointing to inconsistencies, errors, or unusual behavior within your datasets. These anomalies can arise when records or columns fail to meet validation checks—whether those checks are inferred by the system or authored manually by users.
Let’s get started 🚀
Navigation
Step 1: Log in to your Qualytics account and, from the left menu, select the datastore whose anomalies you want to manage.
Step 2: Click “Anomalies” in the navigation tab.
Anomaly Status Categories
Anomalies exist in one of five distinct statuses, which are grouped into two broad categories:
Open Anomalies
By selecting Open Anomalies, you can view anomalies that have been detected but remain unacknowledged or unresolved. These anomalies require attention and may need further investigation or corrective action.
This option helps focus on unaddressed issues while allowing seamless navigation to All, Active, or Acknowledged anomalies as needed.
1. Active: By selecting Active Anomalies, you can focus on anomalies that are currently unresolved or require immediate attention. These are the anomalies that are still in play and have not yet been acknowledged, archived, or resolved.
2. Acknowledged: By selecting Acknowledged Anomalies, you can see all anomalies that have been reviewed and marked as acknowledged. This status indicates that the anomalies have been noted, though they may still require further action.
3. All: By selecting All Anomalies, you can view the complete list of anomalies, regardless of their status. This option helps you get a comprehensive overview of all issues that have been detected, whether they are currently active, acknowledged, or archived.
Archived Anomalies
By selecting Archived Anomalies, you can view anomalies that have been resolved or moved out of active consideration. Archiving anomalies allows you to keep a record of past issues without cluttering the active list.
You can also filter archived anomalies by status (Resolved, Duplicate, or Invalid) to manage and review them effectively.
1. Resolved: This indicates that the anomaly was a legitimate data quality concern and has been addressed.
2. Duplicate: This indicates that the anomaly is a duplicate of an existing anomaly that has already been addressed.
3. Invalid: This indicates that the anomaly is not a legitimate data quality concern and does not require further action.
4. All: Displays all archived anomalies, including those marked as Resolved, Duplicate, and Invalid, giving a comprehensive view of all past issues.
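The grouping of the five statuses into the two categories above can be sketched in Python. This is an illustrative model only: the names `AnomalyStatus`, `OPEN_STATUSES`, `ARCHIVED_STATUSES`, and `category` are hypothetical and are not part of any Qualytics API.

```python
from enum import Enum


class AnomalyStatus(Enum):
    """The five statuses described in this section (hypothetical model)."""
    ACTIVE = "Active"
    ACKNOWLEDGED = "Acknowledged"
    RESOLVED = "Resolved"
    DUPLICATE = "Duplicate"
    INVALID = "Invalid"


# Open anomalies still need attention; archived ones are kept for the record.
OPEN_STATUSES = frozenset({AnomalyStatus.ACTIVE, AnomalyStatus.ACKNOWLEDGED})
ARCHIVED_STATUSES = frozenset(
    {AnomalyStatus.RESOLVED, AnomalyStatus.DUPLICATE, AnomalyStatus.INVALID}
)


def category(status: AnomalyStatus) -> str:
    """Return the broad category ("Open" or "Archived") for a status."""
    return "Open" if status in OPEN_STATUSES else "Archived"


if __name__ == "__main__":
    for status in AnomalyStatus:
        print(f"{status.value}: {category(status)}")
```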
Anomaly Details
Anomaly Details View provides key insights into a specific data anomaly, including its status, anomalous record count, failed checks, and weight. It also shows when the anomaly was detected, the triggering scan, and the related datastore, table, and location. This view helps users quickly understand the scope and source of the anomaly for easier investigation and resolution.
Step 1: Click the anomaly whose details you want to view.
The anomaly's detail view opens, where you can view the Summary, Failed Checks, Source Records, and Activity information.
Summary Section
The Summary section provides a quick overview of the anomaly's key attributes. It includes the anomaly’s status, total anomalous records, failed checks, weight, detection time, scan information, and the corresponding datastore and table. This section helps users quickly understand where the anomaly occurred and its potential impact.
1. Status and Type: Shows the current state and category of the anomaly. In this case, the anomaly is Active and of type Shape, indicating it relates to the structure or distribution of the data.
2. Anomalous Records: Indicates the total number of records affected by the anomaly. Here, 102 records were identified as anomalous.
3. Failed Checks: Displays the number of data quality checks that were violated and triggered this anomaly. In this instance, 1 check failed.
4. Weight: Represents the significance or impact of the anomaly. A higher weight value implies a more critical issue. This anomaly has a weight of 8.
5. Detected: Shows how long ago the anomaly was first detected.
When you hover over the time the anomaly was detected, a pop-up appears displaying the complete date and time.
6. Scan: Indicates the scan operation that detected the anomaly. Scan ID #21379 is shown here, and it was an incremental scan.
Info
Clicking the expand icon takes you to the Scan Results page, where you can view the specific scan that detected the anomaly.
7. Source Datastore: Identifies the datastore where the anomaly was found. This anomaly was found in the Qualytics Databricks POC datastore.
Info
Clicking the expand icon navigates to the datastore’s page, which provides more information about the source datastore where the anomaly was found.
8. Table: Points to the specific table involved in the anomaly. The affected table is raw_order.
Info
Clicking on the expand icon navigates to the table’s page, providing more in-depth information about the table structure and contents.
9. Location: Displays the full path of the table in the datastore. This helps users trace the exact location of the anomaly within the data pipeline.
You can click on the copy icon to copy the full location path of the table where the anomaly was detected.
10. Tags: Highlights the severity or categorization of the anomaly. The tag High indicates a high-priority issue.
You can add or remove tags from the anomaly by clicking on the tag badge.
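As a reading aid, the summary fields listed above can be modeled as a small record. The sketch below fills it with the example values from this section. The class and field names (`AnomalySummary`, `anomaly_type`, and so on) are assumptions made for illustration and do not reflect the actual Qualytics data model. The second instance, `minor`, is invented to show how weight can drive triage order.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class AnomalySummary:
    """Hypothetical container for the fields shown in the Summary section."""
    status: str
    anomaly_type: str
    anomalous_records: int
    failed_checks: int
    weight: int
    scan_id: int
    scan_type: str
    source_datastore: str
    table: str
    tags: List[str] = field(default_factory=list)


# Values taken from the example described in this section.
example = AnomalySummary(
    status="Active",
    anomaly_type="Shape",
    anomalous_records=102,
    failed_checks=1,
    weight=8,
    scan_id=21379,
    scan_type="incremental",
    source_datastore="Qualytics Databricks POC",
    table="raw_order",
    tags=["High"],
)

# Invented second anomaly, used only to demonstrate ordering by weight.
minor = replace(example, weight=3, tags=["Low"])


def by_weight_desc(anomalies):
    """Order anomalies so that the highest weight (most critical) comes first."""
    return sorted(anomalies, key=lambda a: a.weight, reverse=True)


if __name__ == "__main__":
    for anomaly in by_weight_desc([minor, example]):
        print(anomaly.weight, anomaly.tags)
```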
Copy Anomaly Link
Click the Copy Anomaly Link icon (represented by the share icon) located at the top-right corner of the Summary section to copy a direct link to the selected anomaly.
You can then use this link for easy access to the specific anomaly.
Copy Anomaly UUID
Click the Copy Anomaly UUID icon (represented by the key icon) located at the top-right corner of the Summary section to copy the unique identifier of the anomaly. This UUID can be used for reference, tracking, or integration with other tools.
Copy Anomaly Fingerprint
Click the Copy Anomaly Fingerprint icon (represented by the fingerprint icon) located at the top-right corner of the Summary section to copy a unique identifier that represents the structural and behavioral characteristics of the anomaly. This fingerprint helps in tracking and comparing anomalies across different datasets or timeframes.
View Related Anomalies
The View Related Anomalies option helps users identify and analyze anomalies that share the same fingerprint. It groups together anomalies that share the same violation rule, affected field, and anomalous record pattern.
Step 1: Click the View Related Anomalies option to view anomalies associated with the same fingerprint.
A panel will open on the right side, listing related anomalies with the same violation rule, field and record pattern.
Step 2: Click on the anomaly from the list of related anomalies to view its details.
A modal window titled “Anomaly Details” will appear, displaying all the details of the selected anomaly.
For more details on Anomaly Details, please refer to the Anomaly Details section in the documentation.
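The idea of grouping anomalies by a shared fingerprint can be illustrated with a short sketch. How Qualytics actually computes a fingerprint is not described here, so hashing the violation rule, field, and record pattern with SHA-256 is purely an assumption. The function names, dictionary keys, and sample values below are all hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(rule: str, field: str, record_pattern: str) -> str:
    """Derive a stable identifier from the three grouping attributes.

    Assumption: SHA-256 over the joined values. The real algorithm may differ.
    """
    key = "\x1f".join((rule, field, record_pattern))  # unit separator avoids ambiguity
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def group_related(anomalies):
    """Map each fingerprint to the IDs of the anomalies that share it."""
    groups = defaultdict(list)
    for anomaly in anomalies:
        fp = fingerprint(anomaly["rule"], anomaly["field"], anomaly["record_pattern"])
        groups[fp].append(anomaly["id"])
    return dict(groups)


# Invented sample data for illustration only.
sample = [
    {"id": 1, "rule": "betweenTimes", "field": "LAST_UPDATED_DATE", "record_pattern": "out-of-range"},
    {"id": 2, "rule": "betweenTimes", "field": "LAST_UPDATED_DATE", "record_pattern": "out-of-range"},
    {"id": 3, "rule": "notNull", "field": "ORDER_ID", "record_pattern": "missing"},
]

if __name__ == "__main__":
    for fp, ids in group_related(sample).items():
        print(fp[:12], ids)
```

Anomalies 1 and 2 land in the same group because all three attributes match, which mirrors the list shown in the related anomalies panel.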
Failed Checks
The Failed Checks section lists the data quality checks that were violated and subsequently triggered the anomaly. Each listed item displays the check ID, type of violation, and a summarized description of the failure condition.
Click on a failed check to view the corresponding quality check information.
A right-side panel will open, allowing you to view the details without navigating to a different page.
No. | Field | Description |
---|---|---|
1 | Check ID & Name | The unique ID and name of the check. In this example, the check verifies that dates fall within a specific range. |
2 | Description | Explains what the check does. Here, it ensures that values fall between a minimum and a maximum time. |
3 | Tags | Shows how important the issue is. In this case, it is marked as High priority. |
4 | Table | The name of the dataset or table where this issue was found. |
5 | Field | The specific column being checked. Here it is LAST_UPDATED_DATE. |
6 | Filter Clause | Lets you narrow down the data being checked. No filter is applied here. |
7 | Min | The earliest allowed date/time value. Anything before this is marked as failed. |
8 | Max | The latest allowed date/time value. Anything after this is marked as failed. |
9 | Coverage | Defines how many records must meet the condition. A 100% coverage means all records must comply with this check. |
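The interplay of Min, Max, and Coverage in the table above can be sketched as follows. This is a simplified illustration rather than the Qualytics implementation: the function name `between_times_check`, the inclusive bounds, and the sample dates are assumptions.

```python
from datetime import datetime


def between_times_check(values, min_value, max_value, coverage=1.0):
    """Evaluate a min/max time range check with a coverage threshold.

    A value fails if it is earlier than min_value or later than max_value
    (bounds are treated as inclusive, which is an assumption). The check
    passes when the fraction of compliant values is at least `coverage`.
    Returns (passed, failed_values).
    """
    failed = [v for v in values if v < min_value or v > max_value]
    if not values:
        return True, failed  # nothing to violate the rule
    compliant_fraction = (len(values) - len(failed)) / len(values)
    return compliant_fraction >= coverage, failed


# Invented sample data for illustration only.
min_allowed = datetime(2020, 1, 1)
max_allowed = datetime(2024, 12, 31, 23, 59, 59)
last_updated = [
    datetime(2019, 6, 1),   # before Min: fails
    datetime(2021, 3, 15),  # within range
    datetime(2023, 7, 4),   # within range
]

if __name__ == "__main__":
    # With 100% coverage, a single out-of-range value fails the check.
    print(between_times_check(last_updated, min_allowed, max_allowed, coverage=1.0))
    # With a lower threshold, two compliant values out of three are enough.
    print(between_times_check(last_updated, min_allowed, max_allowed, coverage=0.6))
```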
Filter by Anomalous Fields
The Filter by anomalous fields section enables users to refine failed checks by selecting specific fields where anomalies were detected, helping to focus the analysis on relevant data issues.
No. | Filter | Description |
---|---|---|
1 | All | Selects or deselects all anomalous fields at once for bulk filtering. |
2 | LAST_UPDATED_DATE | Filters records based on anomalies detected in the LAST_UPDATED_DATE field. A clock icon indicates it's a date/time field, and the red badge shows active checks or issues associated with it. |
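Conceptually, the field filter narrows the list of failed checks to those tied to the selected fields. A minimal sketch follows, assuming each failed check is a dictionary with a `field` key. The function names, check IDs, and the `ORDER_ID` field are hypothetical.

```python
def filter_failed_checks(failed_checks, selected_fields):
    """Keep only the checks whose field is among the selected anomalous fields."""
    selected = set(selected_fields)
    return [check for check in failed_checks if check["field"] in selected]


def toggle_all(anomalous_fields, currently_selected):
    """Mimic the 'All' option: select every field, or clear the selection
    if every field is already selected."""
    all_fields = set(anomalous_fields)
    return set() if set(currently_selected) == all_fields else all_fields


# Invented sample data for illustration only.
checks = [
    {"id": 101, "field": "LAST_UPDATED_DATE"},
    {"id": 102, "field": "ORDER_ID"},
]

if __name__ == "__main__":
    print(filter_failed_checks(checks, ["LAST_UPDATED_DATE"]))
    print(toggle_all(["LAST_UPDATED_DATE", "ORDER_ID"], []))
```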
Failed Check Description
Hover over the information icon next to a failed check to view a detailed explanation of it, which helps you better understand the nature of the violation.
Source Records
The Source Records section displays the records and fields from the dataset that are related to the detected anomaly. These records are kept in an Enrichment Datastore, which stores the analyzed results, including any anomalies and additional metadata, in files. It is therefore recommended to link an enrichment datastore with your connected source datastore.
Note
In anomaly detection, source records are displayed as part of the Anomaly Details. For a Record anomaly, the specific record is highlighted. For a Shape anomaly, 10 samples from the underlying anomalous records are highlighted.
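The difference between the two anomaly types in the note above can be expressed as a small helper. How the 10 samples are chosen for a Shape anomaly is not stated here, so taking the first 10 records is an assumption. The function name `highlighted_records` is hypothetical.

```python
def highlighted_records(anomaly_type, anomalous_records, sample_size=10):
    """Return the source records that would be highlighted for an anomaly.

    Record anomaly: the single specific record.
    Shape anomaly: up to `sample_size` samples from the anomalous records
    (selection strategy assumed to be "first N" for this sketch).
    """
    if anomaly_type == "Record":
        return anomalous_records[:1]
    if anomaly_type == "Shape":
        return anomalous_records[:sample_size]
    raise ValueError(f"Unknown anomaly type: {anomaly_type!r}")


if __name__ == "__main__":
    # 102 anomalous records, as in the Shape anomaly example from the Summary section.
    records = [{"row": i} for i in range(102)]
    print(len(highlighted_records("Shape", records)))
    print(len(highlighted_records("Record", records[:1])))
```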
Sort Options
The Sort By dropdown allows you to organize the failed source records based on the selected criteria.
No. | Sort By | Description |
---|---|---|
1 | Name | Sorts the list by name. |
2 | Weight | Sorts the list by weight. |
3 | Quality Score | Sorts the list by quality score. |
Download Source Records
Download all source records (up to 250 MB, as a compressed .csv file) for further analysis or external use.
Activity Section
The Activity section provides a complete timeline of actions and events related to the anomaly. It helps users track how the anomaly has been handled and by whom, ensuring better collaboration and accountability.
Users can leave comments to discuss the issue, add context, or communicate decisions. All comments are timestamped and attributed to the respective user.