Enrichment Datastore Overview
An Enrichment Datastore is a user-managed storage location where the Qualytics platform records and accesses metadata through a set of system-defined tables. It is purpose-built to capture metadata generated by the platform's profiling and scanning operations.
Key Points
- Metadata Storage: The Enrichment Datastore acts as a dedicated mechanism for writing and retaining metadata that the platform generates. This includes information about anomalies, quality checks, field profiling, and additional details that enrich the source data.
- Feature Enablement: By using the Enrichment Datastore, the platform unlocks features such as previewing source records. For instance, when an anomaly is detected, the platform typically previews a limited set of affected records. For a comprehensive view and persistent access, the Enrichment Datastore captures and maintains a complete snapshot of the source records associated with the anomalies.
- User-Managed Location: While the Qualytics platform handles the generation and processing of metadata, the actual storage is user-managed. This means the user maintains control over the Enrichment Datastore, deciding where and how this data is stored, adhering to their governance and compliance requirements.
- Insight and Reporting: Beyond storing metadata, the Enrichment Datastore allows users to derive actionable insights and develop custom reports for a variety of use cases, from compliance tracking to data quality improvement initiatives.
Table Types
The Enrichment Datastore contains several types of tables, each serving a specific purpose in the data enrichment and remediation process. These tables are categorized into:
- Enrichment Tables
- Remediation Tables
- Metadata Tables
Enrichment Tables
When anomalies are detected, the platform writes metadata into three primary enrichment tables:
- <enrichment_prefix>_failed_checks
- <enrichment_prefix>_source_records
- <enrichment_prefix>_scan_operations
_FAILED_CHECKS Table
Acts as an associative entity that consolidates information on failed checks, associating anomalies with their respective quality checks.
Columns
Name | Data Type | Description |
---|---|---|
QUALITY_CHECK_ID | NUMBER | Unique identifier for the quality check. |
ANOMALY_UUID | STRING | UUID for the anomaly detected. |
QUALITY_CHECK_MESSAGE | STRING | Message describing the quality check outcome. |
SUGGESTED_REMEDIATION_FIELD | STRING | Field for which a remediation is suggested. |
SUGGESTED_REMEDIATION_VALUE | STRING | Suggested value for remediation. |
SUGGESTED_REMEDIATION_SCORE | FLOAT | Score indicating confidence in remediation. |
QUALITY_CHECK_RULE_TYPE | STRING | Type of rule applied for quality check. |
QUALITY_CHECK_TAGS | STRING | Tags associated with the quality check. |
QUALITY_CHECK_PARAMETERS | STRING | Parameters used for the quality check. |
QUALITY_CHECK_DESCRIPTION | STRING | Description of the quality check. |
OPERATION_ID | NUMBER | Identifier for the operation that detected the anomaly. |
DETECTED_TIME | TIMESTAMP | Timestamp when the anomaly was detected. |
SOURCE_CONTAINER | STRING | Name of the source data container. |
SOURCE_PARTITION | STRING | Partition of the source data. |
SOURCE_DATASTORE | STRING | Datastore where the source data resides. |
Info
This table is not characterized by unique ANOMALY_UUID or QUALITY_CHECK_ID values alone. Instead, the combination of ANOMALY_UUID and QUALITY_CHECK_ID serves as a composite key, uniquely identifying each record in the table.
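Because the composite key allows several rows per anomaly, a quick way to explore this table is to count failed checks per anomaly. A minimal sketch in standard SQL, assuming a hypothetical enrichment prefix of _qualytics (substitute your configured prefix):

```sql
-- Count how many quality checks each anomaly failed.
-- "_qualytics" is a hypothetical enrichment prefix; substitute your own.
SELECT
    ANOMALY_UUID,
    COUNT(DISTINCT QUALITY_CHECK_ID) AS failed_check_count
FROM _qualytics_failed_checks
GROUP BY ANOMALY_UUID
ORDER BY failed_check_count DESC;
```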
_SOURCE_RECORDS Table
Stores source records in JSON format, primarily to enable the source record preview feature in the Qualytics App.
Columns
Name | Data Type | Description |
---|---|---|
SOURCE_CONTAINER | STRING | Name of the source data container. |
SOURCE_PARTITION | STRING | Partition of the source data. |
ANOMALY_UUID | STRING | UUID for the associated anomaly. |
CONTEXT | STRING | Contextual information for the anomaly. |
RECORD | STRING | JSON representation of the source record. |
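Because RECORD holds the full source row as a JSON string, individual fields can be extracted with your warehouse's JSON functions. A minimal sketch using PostgreSQL-style JSON operators (function names and identifier quoting vary by platform; the _qualytics prefix, field names, and UUID are illustrative, taken from the API examples later in this page):

```sql
-- Extract selected fields from the JSON RECORD for one anomaly.
-- Identifier casing/quoting may need adjusting for your warehouse.
SELECT
    SOURCE_CONTAINER,
    ANOMALY_UUID,
    (RECORD::json ->> 'P_PARTKEY')     AS p_partkey,
    (RECORD::json ->> 'P_RETAILPRICE') AS p_retailprice
FROM _qualytics_source_records
WHERE ANOMALY_UUID = 'f11d4e7c-e757-4bf1-8cd6-d156d5bc4fa5';
```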
_SCAN_OPERATIONS Table
Captures and stores the results of every scan operation conducted on the Qualytics Platform.
Columns
Name | Data Type | Description |
---|---|---|
OPERATION_ID | NUMBER | Unique identifier for the scan operation. |
DATASTORE_ID | NUMBER | Identifier for the source datastore associated with the operation. |
CONTAINER_ID | NUMBER | Identifier for the container associated with the operation. |
CONTAINER_SCAN_ID | NUMBER | Identifier for the container scan associated with the operation. |
PARTITION_NAME | STRING | Name of the source partition on which the scan operation is performed. |
INCREMENTAL | BOOLEAN | Boolean flag indicating whether the scan operation is incremental. |
RECORDS_PROCESSED | NUMBER | Total number of records processed during the scan operation. |
ENRICHMENT_SOURCE_RECORD_LIMIT | NUMBER | Maximum number of records written to the Enrichment Datastore for each anomaly detected. |
MAX_RECORDS_ANALYZED | NUMBER | Maximum number of records analyzed in the scan operation. |
ANOMALY_COUNT | NUMBER | Total number of anomalies identified in the scan operation. |
START_TIME | TIMESTAMP | Timestamp marking the start of the scan operation. |
END_TIME | TIMESTAMP | Timestamp marking the end of the scan operation. |
RESULT | STRING | Textual representation of the scan operation's status. |
MESSAGE | STRING | Detailed message regarding the process of the scan operation. |
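Since each row records start and end timestamps, the table supports simple operational reporting, such as listing recent scans with their duration and anomaly counts. A minimal sketch in PostgreSQL syntax (hypothetical _qualytics prefix):

```sql
-- Summarize the most recent scan operations.
SELECT
    OPERATION_ID,
    PARTITION_NAME,
    RECORDS_PROCESSED,
    ANOMALY_COUNT,
    END_TIME - START_TIME AS duration,  -- timestamp subtraction yields an interval in PostgreSQL
    RESULT
FROM _qualytics_scan_operations
ORDER BY START_TIME DESC
LIMIT 20;
```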
Remediation Tables
When anomalies are detected in a container, the platform can create remediation tables in the Enrichment Datastore. These tables are detailed snapshots of the affected container, capturing the state of the data at the time of anomaly detection, and include additional columns for metadata and remediation purposes. Whether these tables are created depends on the remediation strategy chosen for the scan operation.
Currently, there are three types of remediation strategies:
- None: No remediation tables will be created, regardless of anomaly detection.
- Append: Replicate source containers using an append-first strategy.
- Overwrite: Replicate source containers using an overwrite strategy.
Note
The naming convention for remediation tables follows the pattern <enrichment_prefix>_remediation_<container_id>, where <enrichment_prefix> is user-defined during the Enrichment Datastore configuration and <container_id> corresponds to the ID of the original source container.
Illustrative Table
_{ENRICHMENT_CONTAINER_PREFIX}_REMEDIATION_{CONTAINER_ID}
This remediation table is an illustrative snapshot of the "Orders" container for reference purposes.
Name | Data Type | Description |
---|---|---|
_QUALYTICS_SOURCE_PARTITION | STRING | The partition from the source data container. |
ANOMALY_UUID | STRING | Unique identifier of the anomaly. |
_QUALYTICS_APPEARS_IN | STRING | Indicates whether the record came from the target or reference container in relation to the check definition. |
ORDERKEY | NUMBER | Unique identifier of the order. |
CUSTKEY | NUMBER | The customer key related to the order. |
ORDERSTATUS | CHAR | The status of the order (e.g., 'F' for 'finished'). |
TOTALPRICE | FLOAT | The total price of the order. |
ORDERDATE | DATE | The date when the order was placed. |
ORDERPRIORITY | STRING | Priority of the order (e.g., 'urgent'). |
CLERK | STRING | The clerk who took the order. |
SHIPPRIORITY | INTEGER | The priority given to the order for shipping. |
COMMENT | STRING | Comments related to the order. |
Note
In addition to capturing the original container fields, the platform includes three metadata columns designed to assist in the analysis and remediation process:
- _QUALYTICS_SOURCE_PARTITION
- ANOMALY_UUID
- _QUALYTICS_APPEARS_IN: this optional column appears based on specific check rule types, such as Is Replica Of, to provide additional context.
The ANOMALY_UUID column ties each remediation record back to the enrichment tables, as the join sketch below shows.
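A minimal join sketch; the _qualytics prefix and the container ID 123 are illustrative:

```sql
-- Attach failed-check details to each remediated record via ANOMALY_UUID.
SELECT
    r.ORDERKEY,
    r.ANOMALY_UUID,
    fc.QUALITY_CHECK_RULE_TYPE,
    fc.QUALITY_CHECK_DESCRIPTION,
    fc.SUGGESTED_REMEDIATION_FIELD,
    fc.SUGGESTED_REMEDIATION_VALUE
FROM _qualytics_remediation_123 AS r
JOIN _qualytics_failed_checks AS fc
    ON fc.ANOMALY_UUID = r.ANOMALY_UUID;
```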
Understanding Remediation Tables vs. Source Record Tables
When managing data anomalies in containers, it's important to understand the structures of Remediation Tables and Source Record Tables in the Enrichment Datastore.
Remediation Tables
Purpose: Remediation tables are designed to capture detailed snapshots of the affected containers at the time of anomaly detection. They serve as a primary tool for remediation actions.
Creation: These tables are generated based on the remediation strategy selected during the scan operation:
- None: No tables are created.
- Append: Tables are created with new data appended.
- Overwrite: Tables are created and existing data is overwritten.
Structure: The structure includes all columns from the source container, along with additional columns for metadata and remediation purposes. The naming convention for these tables is <enrichment_prefix>_remediation_<container_id>, where <enrichment_prefix> is defined during the Enrichment Datastore configuration.
Source Record Tables
Purpose: The Source Record Table is mainly used within the Qualytics App to display anomalies directly to users by showing the source records.
Structure: Unlike remediation tables, the Source Record Table stores each record in JSON format within a single column named RECORD, along with other metadata columns such as SOURCE_CONTAINER, SOURCE_PARTITION, ANOMALY_UUID, and CONTEXT.
Key Differences
- Format: Remediation tables are structured with separate columns for each data field, making them easier to use for querying and remediation processes. Source Record Tables store data in JSON format within a single column, which can be less convenient for direct data operations.
- Usage: Remediation tables are optimal for performing corrective actions and are designed to integrate easily with data workflows. Source Record Tables are best suited for reviewing specific anomalies within the Qualytics App due to their format and presentation.
Recommendation
For users who need to run queries or keep detailed snapshots for audit purposes, Remediation Tables are recommended.
For those who need to quickly review anomalies directly within the Qualytics App, Source Record Tables are more suitable due to their straightforward presentation of data.
Metadata Tables
The Qualytics platform enables users to manually export metadata into the Enrichment Datastore, providing a structured approach to data analysis and management. These metadata tables are structured to reflect the evolving characteristics of data entities, primarily focusing on aspects that are subject to change.
Currently, the following assets are available for exporting:
- _<enrichment_prefix>_export_anomalies
- _<enrichment_prefix>_export_checks
- _<enrichment_prefix>_export_check_templates
- _<enrichment_prefix>_export_field_profiles
Note
The strategy used for managing these metadata tables employs a create or replace approach: the export process creates a new table if one does not exist, or replaces it entirely if it does, meaning any previous data will be overwritten.
For more detailed information on exporting metadata, please refer to the export documentation.
_EXPORT_ANOMALIES Table
Contains metadata from anomalies in a distinct normalized format. This table is specifically designed to capture the mutable states of anomalies, emphasizing their status changes.
Columns
Name | Data Type | Description |
---|---|---|
ID | NUMBER | Unique identifier for the anomaly. |
CREATED | TIMESTAMP | Timestamp of anomaly creation. |
UUID | UUID | Universal Unique Identifier of the anomaly. |
TYPE | STRING | Type of the anomaly (e.g., 'shape'). |
STATUS | STRING | Current status of the anomaly (e.g., 'Active'). |
GLOBAL_TAGS | STRING | Tags associated globally with the anomaly. |
CONTAINER_ID | NUMBER | Identifier for the associated container. |
SOURCE_CONTAINER | STRING | Name of the source container. |
DATASTORE_ID | NUMBER | Identifier for the associated datastore. |
SOURCE_DATASTORE | STRING | Name of the source datastore. |
GENERATED_AT | TIMESTAMP | Timestamp when the export was generated. |
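Since this table captures the mutable status of anomalies, a common use is tracking active versus resolved anomalies per datastore. A minimal sketch (hypothetical _qualytics prefix; adjust the table name to your configured prefix):

```sql
-- Count exported anomalies by status for each source datastore.
SELECT
    SOURCE_DATASTORE,
    STATUS,
    COUNT(*) AS anomaly_count
FROM _qualytics_export_anomalies
GROUP BY SOURCE_DATASTORE, STATUS
ORDER BY SOURCE_DATASTORE, anomaly_count DESC;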
_EXPORT_CHECKS Table
Contains metadata from quality checks.
Columns
Name | Data Type | Description |
---|---|---|
ADDITIONAL_METADATA | STRING | JSON-formatted string containing additional metadata for the check. |
COVERAGE | FLOAT | Represents the expected tolerance of the rule. |
CREATED | STRING | Timestamp when the check was created. |
DELETED_AT | STRING | Timestamp when the check was deleted. |
DESCRIPTION | STRING | Description of the check. |
FIELDS | STRING | Fields involved in the check, separated by commas. |
FILTER | STRING | Criteria used to filter data when asserting the check. |
GENERATED_AT | STRING | Indicates when the export was generated. |
GLOBAL_TAGS | STRING | Global tags of the check, separated by commas. |
HAS_PASSED | BOOLEAN | Boolean indicator of whether the check passed its last assertion. |
ID | NUMBER | Unique identifier for the check. |
INFERRED | BOOLEAN | Indicates whether the check was inferred by the platform. |
IS_NEW | BOOLEAN | Flags if the check is new. |
LAST_ASSERTED | STRING | Timestamp of the last assertion performed on the check. |
LAST_EDITOR | STRING | Represents the last editor of the check. |
LAST_UPDATED | STRING | Represents the last updated timestamp of the check. |
NUM_CONTAINER_SCANS | NUMBER | Number of container scans that have asserted the check. |
PROPERTIES | STRING | Specific properties for the check in a JSON format. |
RULE_TYPE | STRING | Type of rule applied in the check. |
WEIGHT | FLOAT | Represents the weight of the check. |
DATASTORE_ID | NUMBER | Identifier of the datastore used in the check. |
CONTAINER_ID | NUMBER | Identifier of the container used in the check. |
TEMPLATE_ID | NUMBER | Identifier of the template associated with the check. |
IS_TEMPLATE | BOOLEAN | Indicates whether the check is a template or not. |
SOURCE_CONTAINER | STRING | Name of the container used in the check. |
SOURCE_DATASTORE | STRING | Name of the datastore used in the check. |
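One straightforward report from this table is a pass/fail breakdown by rule type. A minimal sketch (hypothetical _qualytics prefix):

```sql
-- Pass/fail counts per rule type from the exported checks.
SELECT
    RULE_TYPE,
    HAS_PASSED,
    COUNT(*) AS check_count
FROM _qualytics_export_checks
GROUP BY RULE_TYPE, HAS_PASSED
ORDER BY RULE_TYPE;
```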
_EXPORT_CHECK_TEMPLATES Table
Contains metadata from check templates.
Columns
Name | Data Type | Description |
---|---|---|
ADDITIONAL_METADATA | STRING | JSON-formatted string containing additional metadata for the check. |
COVERAGE | FLOAT | Represents the expected tolerance of the rule. |
CREATED | STRING | Timestamp when the check was created. |
DELETED_AT | STRING | Timestamp when the check was deleted. |
DESCRIPTION | STRING | Description of the check. |
FIELDS | STRING | Fields involved in the check, separated by commas. |
FILTER | STRING | Criteria used to filter data when asserting the check. |
GENERATED_AT | STRING | Indicates when the export was generated. |
GLOBAL_TAGS | STRING | Global tags of the check, separated by commas. |
ID | NUMBER | Unique identifier for the check. |
IS_NEW | BOOLEAN | Flags if the check is new. |
IS_TEMPLATE | BOOLEAN | Indicates whether the check is a template or not. |
LAST_EDITOR | STRING | Represents the last editor of the check. |
LAST_UPDATED | STRING | Represents the last updated timestamp of the check. |
PROPERTIES | STRING | Specific properties for the check in a JSON format. |
RULE_TYPE | STRING | Type of rule applied in the check. |
TEMPLATE_CHECKS_COUNT | NUMBER | The count of checks associated with the template. |
TEMPLATE_LOCKED | BOOLEAN | Indicates whether the check template is locked or not. |
WEIGHT | FLOAT | Represents the weight of the check. |
_EXPORT_FIELD_PROFILES Table
Contains metadata from field profiles.
Columns
Name | Data Type | Description |
---|---|---|
APPROXIMATE_DISTINCT_VALUES | FLOAT | Estimated number of distinct values in the field. |
COMPLETENESS | FLOAT | Ratio of non-null entries to total entries in the field. |
CONTAINER_ID | NUMBER | Identifier for the container holding the field. |
SOURCE_CONTAINER | STRING | Name of the container holding the field. |
CONTAINER_STORE_TYPE | STRING | Storage type of the container. |
CREATED | STRING | Date when the field profile was created. |
DATASTORE_ID | NUMBER | Identifier for the datastore containing the field. |
SOURCE_DATASTORE | STRING | Name of the datastore containing the field. |
DATASTORE_TYPE | STRING | Type of datastore. |
ENTROPY | FLOAT | Measure of randomness in the information being processed. |
FIELD_GLOBAL_TAGS | STRING | Global tags associated with the field. |
FIELD_ID | NUMBER | Unique identifier for the field. |
FIELD_NAME | STRING | Name of the field being profiled. |
FIELD_PROFILE_ID | NUMBER | Identifier for the field profile record. |
FIELD_QUALITY_SCORE | FLOAT | Score representing the quality of the field. |
FIELD_TYPE | STRING | Data type of the field. |
FIELD_WEIGHT | NUMBER | Weight assigned to the field for quality scoring. |
GENERATED_AT | STRING | Date when the field profile was generated. |
HISTOGRAM_BUCKETS | STRING | Distribution of data within the field represented as buckets. |
IS_NOT_NORMAL | BOOLEAN | Indicator of whether the field data distribution is not normal. |
KLL | STRING | Sketch summary of the field data distribution. |
KURTOSIS | FLOAT | Measure of the tailedness of the probability distribution. |
MAX | FLOAT | Maximum value found in the field. |
MAX_LENGTH | FLOAT | Maximum length of string entries in the field. |
MEAN | FLOAT | Average value of the field's data. |
MEDIAN | FLOAT | Middle value in the field's data distribution. |
MIN | FLOAT | Minimum value found in the field. |
MIN_LENGTH | FLOAT | Minimum length of string entries in the field. |
NAME | STRING | Descriptive name of the field. |
Q1 | FLOAT | First quartile in the field's data distribution. |
Q3 | FLOAT | Third quartile in the field's data distribution. |
SKEWNESS | FLOAT | Measure of the asymmetry of the probability distribution. |
STD_DEV | FLOAT | Standard deviation of the field's data. |
SUM | FLOAT | Sum of all numerical values in the field. |
TYPE_DECLARED | BOOLEAN | Indicator of whether the field type is explicitly declared. |
UNIQUE_DISTINCT_RATIO | FLOAT | Ratio of unique distinct values to the total distinct values. |
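For example, to flag fields with poor completeness for follow-up, a minimal sketch (hypothetical _qualytics prefix; the 0.9 threshold is illustrative):

```sql
-- List the least complete fields across all profiled containers.
SELECT
    SOURCE_DATASTORE,
    SOURCE_CONTAINER,
    FIELD_NAME,
    COMPLETENESS
FROM _qualytics_export_field_profiles
WHERE COMPLETENESS < 0.9
ORDER BY COMPLETENESS ASC;
```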
Diagram
The diagram below provides a visual representation of the associations between various tables in the Enrichment Datastore. It illustrates how tables can be joined to track and analyze data across different processes.
Handling JSON and string splitting
The example queries below flatten the JSON-formatted and comma-separated columns of the exported checks table into individual columns. Equivalent variants are shown for Snowflake, PostgreSQL, and MySQL.
Snowflake:
SELECT
PARSE_JSON(ADDITIONAL_METADATA):metadata_1::string AS Metadata1_Key1,
PARSE_JSON(ADDITIONAL_METADATA):metadata_2::string AS Metadata2_Key1,
PARSE_JSON(ADDITIONAL_METADATA):metadata_3::string AS Metadata3_Key1,
-- Add more lines as needed up to MetadataN
CONTAINER_ID,
COVERAGE,
CREATED,
DATASTORE_ID,
DELETED_AT,
DESCRIPTION,
SPLIT_PART(FIELDS, ',', 1) AS Field1,
SPLIT_PART(FIELDS, ',', 2) AS Field2,
-- Add more lines as needed up to FieldN
FILTER,
GENERATED_AT,
SPLIT_PART(GLOBAL_TAGS, ',', 1) AS Tag1,
SPLIT_PART(GLOBAL_TAGS, ',', 2) AS Tag2,
-- Add more lines as needed up to TagN
HAS_PASSED,
ID,
INFERRED,
IS_NEW,
IS_TEMPLATE,
LAST_ASSERTED,
LAST_EDITOR,
LAST_UPDATED,
NUM_CONTAINER_SCANS,
PARSE_JSON(PROPERTIES):allow_other_fields::string AS Property_AllowOtherFields,
PARSE_JSON(PROPERTIES):assertion::string AS Property_Assertion,
PARSE_JSON(PROPERTIES):comparison::string AS Property_Comparison,
PARSE_JSON(PROPERTIES):datetime_::string AS Property_Datetime,
    -- Add more lines as needed up to PropertyN
RULE_TYPE,
SOURCE_CONTAINER,
SOURCE_DATASTORE,
TEMPLATE_ID,
WEIGHT
FROM "_EXPORT_CHECKS";
PostgreSQL:
SELECT
(ADDITIONAL_METADATA::json ->> 'metadata_1') AS Metadata1_Key1,
(ADDITIONAL_METADATA::json ->> 'metadata_2') AS Metadata2_Key1,
(ADDITIONAL_METADATA::json ->> 'metadata_3') AS Metadata3_Key1,
-- Add more lines as needed up to MetadataN
CONTAINER_ID,
COVERAGE,
CREATED,
DATASTORE_ID,
DELETED_AT,
DESCRIPTION,
(string_to_array(FIELDS, ','))[1] AS Field1,
(string_to_array(FIELDS, ','))[2] AS Field2,
-- Add more lines as needed up to FieldN
FILTER,
GENERATED_AT,
(string_to_array(GLOBAL_TAGS, ','))[1] AS Tag1,
(string_to_array(GLOBAL_TAGS, ','))[2] AS Tag2,
-- Add more lines as needed up to TagN
HAS_PASSED,
ID,
INFERRED,
IS_NEW,
IS_TEMPLATE,
LAST_ASSERTED,
LAST_EDITOR,
LAST_UPDATED,
NUM_CONTAINER_SCANS,
(PROPERTIES::json ->> 'allow_other_fields') AS Property_AllowOtherFields,
(PROPERTIES::json ->> 'assertion') AS Property_Assertion,
(PROPERTIES::json ->> 'comparison') AS Property_Comparison,
(PROPERTIES::json ->> 'datetime_') AS Property_Datetime,
-- Add more lines as needed up to PropertyN
RULE_TYPE,
SOURCE_CONTAINER,
SOURCE_DATASTORE,
TEMPLATE_ID,
WEIGHT
FROM "_EXPORT_CHECKS";
MySQL:
SELECT
(ADDITIONAL_METADATA->>'$.metadata_1') AS Metadata1_Key1,
(ADDITIONAL_METADATA->>'$.metadata_2') AS Metadata2_Key1,
(ADDITIONAL_METADATA->>'$.metadata_3') AS Metadata3_Key1,
-- Add more lines as needed up to MetadataN
CONTAINER_ID,
COVERAGE,
CREATED,
DATASTORE_ID,
DELETED_AT,
DESCRIPTION,
SUBSTRING_INDEX(FIELDS, ',', 1) AS Field1,
-- Add more lines as needed up to FieldN
SUBSTRING_INDEX(GLOBAL_TAGS, ',', 1) AS Tag1,
-- Add more lines as needed up to TagN
HAS_PASSED,
ID,
INFERRED,
IS_NEW,
IS_TEMPLATE,
LAST_ASSERTED,
LAST_EDITOR,
LAST_UPDATED,
NUM_CONTAINER_SCANS,
(PROPERTIES->>'$.allow_other_fields') AS Property_AllowOtherFields,
(PROPERTIES->>'$.assertion') AS Property_Assertion,
(PROPERTIES->>'$.comparison') AS Property_Comparison,
(PROPERTIES->>'$.datetime_') AS Property_Datetime,
-- Add more lines as needed up to PropertyN
RULE_TYPE,
SOURCE_CONTAINER,
SOURCE_DATASTORE,
TEMPLATE_ID,
WEIGHT
FROM "_EXPORT_CHECKS";
Usage Notes
- Both metadata tables and remediation tables are designed to be ephemeral and are therefore best treated as temporary datasets. Users are advised to move this data to a more permanent dataset for long-term storage and reporting.
- The anomaly UUID in the remediation tables acts as a link to the detailed data in the _failed_checks enrichment table. This connection not only shows the number of failed checks but also provides insight into each one, such as the nature of the issue, the type of rule violated, and associated check tags. Additionally, when available, suggested remediation actions, including suggested field modifications and values, are presented alongside a score indicating the suggested action's potential effectiveness. This information helps users better understand the specifics of each anomaly related to the remediation tables.
- The Qualytics platform captures and writes a maximum of 10 rows of data per anomaly by default, for both the _source_records enrichment table and the remediation tables. To adjust this limit, users can set the enrichment_source_record_limit parameter within the Scan Operation settings. This parameter accepts a minimum value of 10 and allows higher limits, up to an unrestricted number of rows per anomaly. Note that if an anomaly is associated with fewer than 10 records, the platform writes only the actual number of records where the anomaly was detected.
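To verify how many rows were actually captured per anomaly (for example, after raising the limit), a minimal sketch (hypothetical _qualytics prefix):

```sql
-- Rows captured per anomaly in the source-records enrichment table.
SELECT
    ANOMALY_UUID,
    COUNT(*) AS captured_rows
FROM _qualytics_source_records
GROUP BY ANOMALY_UUID
ORDER BY captured_rows DESC;
```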
API Payload Examples
Retrieving Enrichment Datastore Tables
Endpoint (Get)
/api/datastores/{enrichment-datastore-id}/listing
[
{
"name":"_datastore_prefix_scan_operations",
"label":"scan_operations",
"datastore":{
"id":123,
"name":"My Datastore",
"store_type":"jdbc",
"type":"postgresql",
"enrich_only":false,
"enrich_container_prefix":"_datastore_prefix",
"favorite":false
}
},
{
"name":"_datastore_prefix_source_records",
"label":"source_records",
"datastore":{
"id":123,
"name":"My Datastore",
"store_type":"jdbc",
"type":"postgresql",
"enrich_only":false,
"enrich_container_prefix":"_datastore_prefix",
"favorite":false
}
},
{
"name":"_datastore_prefix_failed_checks",
"label":"failed_checks",
"datastore":{
"id":123,
"name":"My Datastore",
"store_type":"jdbc",
"type":"postgresql",
"enrich_only":false,
"enrich_container_prefix":"_datastore_prefix",
"favorite":false
}
},
{
"name": "_datastore_prefix_remediation_container_id",
"label": "table_name",
"datastore": {
"id": 123,
"name": "My Datastore",
"store_type": "jdbc",
"type": "postgresql",
"enrich_only": false,
"enrich_container_prefix": "_datastore_prefix",
"favorite": false
}
}
]
Retrieving Enrichment Datastore Source Records
Endpoint (Get)
/api/datastores/{enrichment-datastore-id}/source-records?path={_source-record-table-prefix}
Endpoint With Filters (Get)
/api/datastores/{enrichment-datastore-id}/source-records?filter=anomaly_uuid='{uuid}'&path={_source-record-table-prefix}
{
"source_records": "[{\"source_container\":\"table_name\",\"source_partition\":\"partition_name\",\"anomaly_uuid\":\"f11d4e7c-e757-4bf1-8cd6-d156d5bc4fa5\",\"context\":null,\"record\":\"{\\\"P_NAME\\\":\\\"\\\\\\\"strategize intuitive systems\\\\\\\"\\\",\\\"P_TYPE\\\":\\\"\\\\\\\"Radiographer, therapeutic\\\\\\\"\\\",\\\"P_RETAILPRICE\\\":\\\"-24.69\\\",\\\"LAST_MODIFIED_TIMESTAMP\\\":\\\"2023-09-29 11:17:19.048\\\",\\\"P_MFGR\\\":null,\\\"P_COMMENT\\\":\\\"\\\\\\\"Other take so.\\\\\\\"\\\",\\\"P_PARTKEY\\\":\\\"845004850\\\",\\\"P_SIZE\\\":\\\"4\\\",\\\"P_CONTAINER\\\":\\\"\\\\\\\"MED BOX\\\\\\\"\\\",\\\"P_BRAND\\\":\\\"\\\\\\\"PLC\\\\\\\"\\\"}\"}]"
}
Retrieving Enrichment Datastore Remediation
Endpoint (Get)
/api/datastores/{enrichment-datastore-id}/source-records?path={_remediation-table-prefix}
Endpoint With Filters (Get)
/api/datastores/{enrichment-datastore-id}/source-records?filter=anomaly_uuid='{uuid}'&path={_remediation-table-prefix}
{
"source_records": "[{\"source_container\":\"table_name\",\"source_partition\":\"partition_name\",\"anomaly_uuid\":\"f11d4e7c-e757-4bf1-8cd6-d156d5bc4fa5\",\"context\":null,\"record\":\"{\\\"P_NAME\\\":\\\"\\\\\\\"strategize intuitive systems\\\\\\\"\\\",\\\"P_TYPE\\\":\\\"\\\\\\\"Radiographer, therapeutic\\\\\\\"\\\",\\\"P_RETAILPRICE\\\":\\\"-24.69\\\",\\\"LAST_MODIFIED_TIMESTAMP\\\":\\\"2023-09-29 11:17:19.048\\\",\\\"P_MFGR\\\":null,\\\"P_COMMENT\\\":\\\"\\\\\\\"Other take so.\\\\\\\"\\\",\\\"P_PARTKEY\\\":\\\"845004850\\\",\\\"P_SIZE\\\":\\\"4\\\",\\\"P_CONTAINER\\\":\\\"\\\\\\\"MED BOX\\\\\\\"\\\",\\\"P_BRAND\\\":\\\"\\\\\\\"PLC\\\\\\\"\\\"}\"}]"
}
Retrieving Enrichment Datastore Failed Checks
Endpoint (Get)
/api/datastores/{enrichment-datastore-id}/source-records?path={_failed-checks-table-prefix}
Endpoint With Filters (Get)
/api/datastores/{enrichment-datastore-id}/source-records?filter=anomaly_uuid='{uuid}'&path={_failed-checks-table-prefix}
{
"source_records": "[{\"quality_check_id\":155481,\"anomaly_uuid\":\"1a937875-6bce-4bfe-8701-075ba66be364\",\"quality_check_message\":\"{\\\"SNPSHT_TIMESTAMP\\\":\\\"2023-09-03 10:26:15.0\\\"}\",\"suggested_remediation_field\":null,\"suggested_remediation_value\":null,\"suggested_remediation_score\":null,\"quality_check_rule_type\":\"greaterThanField\",\"quality_check_tags\":\"Time-Sensitive\",\"quality_check_parameters\":\"{\\\"field_name\\\":\\\"SNPSHT_DT\\\",\\\"inclusive\\\":false}\",\"quality_check_description\":\"Must have a value greater than the value of SNPSHT_DT\",\"operation_id\":28162,\"detected_time\":\"2024-03-29T15:08:07.585Z\",\"source_container\":\"ACTION_TEST_CLIENT_V3\",\"source_partition\":\"ACTION_TEST_CLIENT_V3\",\"source_datastore\":\"DB2 Dataset\"}]"
}
Retrieving Enrichment Datastore Scan Operations
Endpoint (Get)
/api/datastores/{enrichment-datastore-id}/source-records?path={_scan-operations-table-prefix}
Endpoint With Filters (Get)
/api/datastores/{enrichment-datastore-id}/source-records?filter=operation_id='{operation-id}'&path={_scan-operations-table-prefix}
{
"source_records": "[{\"operation_id\":22871,\"datastore_id\":850,\"container_id\":7239,\"container_scan_id\":43837,\"partition_name\":\"ACTION_TEST_CLIENT_V3\",\"incremental\":true,\"records_processed\":0,\"enrichment_source_record_limit\":10,\"max_records_analyzed\":-1,\"anomaly_count\":0,\"start_time\":\"2023-12-04T20:35:54.194Z\",\"end_time\":\"2023-12-04T20:35:54.692Z\",\"result\":\"success\",\"message\":null}]"
}
Retrieving Enrichment Datastore Exported Metadata
Endpoint (Get)
/api/datastores/{enrichment-datastore-id}/source-records?path={_export-metadata-table-prefix}
Endpoint With Filters (Get)
/api/datastores/{enrichment-datastore-id}/source-records?filter=container_id='{container-id}'&path={_export-metadata-table-prefix}
{
"source_records": "[{\"container_id\":13511,\"created\":\"2024-06-10T17:07:20.751438Z\",\"datastore_id\":1198,\"generated_at\":\"2024-06-11 18:42:31+0000\",\"global_tags\":\"\",\"id\":224818,\"source_container\":\"PARTSUPP-FORMATTED.csv\",\"source_datastore\":\"TPCH GCS\",\"status\":\"Active\",\"type\":\"shape\",\"uuid\":\"f2d4fae3-982b-45a1-b289-5854b7af4b03\"}]"
}
{
"source_records": "[{\"additional_metadata\":null,\"container_id\":13515,\"coverage\":1.0,\"created\":\"2024-06-10T16:27:05.600041Z\",\"datastore_id\":1198,\"deleted_at\":null,\"description\":\"Must have a numeric value above >= 0\",\"fields\":\"L_QUANTITY\",\"filter\":null,\"generated_at\":\"2024-06-11 18:42:38+0000\",\"global_tags\":\"\",\"has_passed\":false,\"id\":196810,\"inferred\":true,\"is_new\":false,\"is_template\":false,\"last_asserted\":\"2024-06-11T18:04:24.480899Z\",\"last_editor\":null,\"last_updated\":\"2024-06-10T17:07:43.248644Z\",\"num_container_scans\":4,\"properties\":null,\"rule_type\":\"notNegative\",\"source_container\":\"LINEITEM-FORMATTED.csv\",\"source_datastore\":\"TPCH GCS\",\"template_id\":null,\"weight\":7.0}]"
}
{
"source_records": "[{\"approximate_distinct_values\":106944.0,\"completeness\":0.7493389459,\"container_container_type\":\"file\",\"container_id\":13509,\"created\":\"2024-06-10T16:23:48.457907Z\",\"datastore_id\":1198,\"datastore_type\":\"gcs\",\"entropy\":null,\"field_global_tags\":\"\",\"field_id\":145476,\"field_name\":\"C_ACCTBAL\",\"field_profile_id\":882170,\"field_quality_score\":\"{\\\"total\\\": 81.70052209952111, \\\"completeness\\\": 74.93389459101233, \\\"coverage\\\": 66.66666666666666, \\\"conformity\\\": null, \\\"consistency\\\": 100.0, \\\"precision\\\": 100.0, \\\"timeliness\\\": null, \\\"volumetrics\\\": null, \\\"accuracy\\\": 100.0}\",\"field_type\":\"Fractional\",\"field_weight\":1,\"generated_at\":\"2024-06-11 18:42:32+0000\",\"histogram_buckets\":null,\"is_not_normal\":true,\"kll\":null,\"kurtosis\":-1.204241522,\"max\":9999.99,\"max_length\":null,\"mean\":4488.8079264033,\"median\":4468.34,\"min\":-999.99,\"min_length\":null,\"name\":\"C_ACCTBAL\",\"q1\":1738.87,\"q3\":7241.17,\"skewness\":0.0051837205,\"source_container\":\"CUSTOMER-FORMATTED.csv\",\"source_datastore\":\"TPCH GCS\",\"std_dev\":3177.3005493585,\"sum\":5.0501333575999904E8,\"type_declared\":false,\"unique_distinct_ratio\":null}]"
}