Quality Scores
Quality Scores are quantified measures of data quality calculated at the field and container levels, recorded as time series to enable tracking of changes over time. Scores range from 0 to 100, with higher values indicating superior quality for the intended purpose. These scores integrate eight distinct dimensions, providing a granular analysis of the attributes that impact overall data quality. The overall score is a composite reflecting the relative importance and configured weights of these factors:
- Completeness: Measures the average percentage of non-null values in a field throughout the measurement period. For example, if a "phone_number" field has values present in 90 out of 100 records, its completeness score for the measurement would be 90%.
- Coverage: Measures the number of quality checks defined for monitoring the field's quality.
- Conformity: Measures how well the data adheres to specified formats, patterns, and business rules. For example, checking if dates follow the required format (YYYY-MM-DD) or if phone numbers match the expected pattern.
See Appendix: Conformity Rule Types for the full Conformity rule type list.
- Consistency: Measures uniformity in type and scale across all data representations. Verifies that data maintains the same type and representation over time. For example, ensuring that a numeric column does not change to a string type over time.
- Precision: Evaluates the resolution of field values against defined quality checks.
See Appendix: Precision Rule Types for the full Precision rule type list.
- Timeliness: Gauges data availability according to schedule.
See Appendix: Timeliness Rule Types for the full Timeliness rule type list.
- Volumetrics: Analyzes consistency in data size and shape over time.
See Appendix: Volumetric Rule Types for the full Volumetrics rule type list.
- Accuracy: Determines the fidelity of field values to their real-world counterparts or expected values.
How Completeness, Precision, and Accuracy Differ
| Dimension | Focus | Example Question It Answers |
|---|---|---|
| Completeness | Are values present? | What % of rows in phone_number are non-null? |
| Precision | Are values within the expected level of detail or granularity? | Are all age values between 0–120? Do decimals have required 2-digit precision? |
| Accuracy | Are values correct compared to real-world truth or integrity checks? | Is the relationship between square_footage and price maintained? |
Important
A data asset's quality score is a measure of its fitness for the intended use case. It is not a simple measure of error, but instead a holistic confidence measure that considers the eight fundamental dimensions of data quality as described below. Quality scores are dynamic and will evolve as your data and business needs change over time.
Field-Level Quality Scoring
Each field receives individual scores for eight quality dimensions, each evaluated on a 0-100 scale.
Completeness Dimension
The Completeness score measures the average percentage of non-null values in a field over the measurement period.
How Completeness is Calculated
- Scale: 0 to 100, representing the average completeness percentage
- Measurement period: Defined by the configured decay time (default 180 days)
- Formula: Average of (non-null values / total records) × 100 across all measurements in the period
- Example: If a "phone_number" field averages 90% completeness over the measurement period, its completeness score would be 90
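As a quick illustration, the averaging step can be sketched in a few lines (a minimal sketch, not the product's implementation):

```python
def completeness_score(measurements):
    """Average completeness percentage over the measurement period.

    measurements: list of (non_null_count, total_count) pairs,
    one per profiling measurement inside the decay window.
    """
    percentages = [non_null / total * 100 for non_null, total in measurements]
    return sum(percentages) / len(percentages)

# A "phone_number" field with 90 of 100 values present in each
# measurement scores 90.
print(completeness_score([(90, 100), (90, 100)]))  # -> 90.0
```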
Coverage Dimension
The Coverage score measures how many distinct quality checks have been applied to a field. It is designed to reward the first few checks heavily, then taper off as more checks are added, following a curve of diminishing returns.
How Coverage is Calculated
- Scale: The score ranges continuously from 0 to 100
- Anchor points:
- 0 checks → score of 0
- 1 check → score of approximately 60
- Diminishing returns: Each additional check contributes less than the previous one. As the number of checks grows, the score approaches 100 but never exceeds it
Mathematically, the scoring curve follows an exponential growth model:
score = 100 × (1 − e^(−k × n))
where n is the number of checks and k is tuned so that 1 check scores 60 (k = −ln(0.4) ≈ 0.916). The first few checks have the greatest impact, and additional checks contribute progressively less:
| Checks | Approximate Score |
|---|---|
| 0 | 0 |
| 1 | ~60 |
| 2 | ~84 |
| 3 | ~94 |
| 4+ | ~97–100 |
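The table above is easy to reproduce. In this sketch the constant k is derived from the "1 check → 60" anchor point (k = −ln(0.4)); the exact constant used by the product is not documented here, so treat it as an inference:

```python
import math

# Tuned so that a single check scores 60: 100 * (1 - e^-K) = 60
K = -math.log(0.4)  # approximately 0.916

def coverage_score(n_checks):
    """Exponential growth with diminishing returns: 0 checks -> 0,
    approaching (but never exceeding) 100 as checks are added."""
    return 100.0 * (1.0 - math.exp(-K * n_checks))

for n in range(4):
    print(n, round(coverage_score(n)))  # 0->0, 1->60, 2->84, 3->94
```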
Why This Model?
- Strong early reward: The first check dramatically increases confidence in field coverage
- Fair balance: More checks always improve the score, but the improvement diminishes as coverage becomes robust, preventing runaway inflation
- Practical takeaway: The first check has the biggest impact. 3 checks per field gets you to ~94%
Field vs. Container Coverage
At the field level, Coverage reflects the number of distinct quality checks defined for that field. At the container level, Coverage is an aggregate of field-level coverage scores, further adjusted by scan frequency (more frequent scans → greater confidence).
Conformity Dimension
The Conformity score measures how well the data adheres to specified formats, patterns, and business rules.
How Conformity is Calculated
- Scale: 0 to 100 based on the ratio of conforming values
- Formula: (1 - (rows with anomalous values as specified by conformity checks / min(scanned rows, container rows))) × 100
- Denominator: Uses the smaller of scanned row count or container row count to prevent score inflation
- Applicable rule types: Pattern matching, length constraints, type validation, schema expectations, and format-specific validations
See Appendix: Conformity Rule Types for the full Conformity rule type list.
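The formula above is a straightforward ratio; the Precision and Accuracy dimensions use the same formula shape with their own check types. A minimal sketch:

```python
def ratio_score(anomalous_rows, scanned_rows, container_rows):
    """(1 - (anomalous rows / min(scanned rows, container rows))) * 100.

    Using the smaller of the two row counts as the denominator
    prevents score inflation when only part of the container
    was scanned.
    """
    denominator = min(scanned_rows, container_rows)
    return (1.0 - anomalous_rows / denominator) * 100.0

# 5 rows fail an email-pattern check out of 100 scanned rows
print(round(ratio_score(5, 100, 100)))  # -> 95
```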
Examples
- Email field where 95% of scanned/total rows match valid email pattern → Score ~95
- Date field with consistent YYYY-MM-DD format → Score ~100
- Phone field with mixed formats and invalid entries → Score ~60
Consistency Dimension
The Consistency score measures how stable a field's values remain over time compared to their expected statistical profile. This highlights fields that are "drifting" (changing shape, format, or density).
How Consistency is Calculated
1. Check for type changes
   - If a field flips between types (e.g., sometimes a number, sometimes a string), the score is set to 0
2. Collect summary statistics per field type:
   - Numeric fields: median and interquartile range (IQR)
   - String fields: distinct count, min/max length, Shannon entropy
   - Datetime fields: earliest timestamp, distinct timestamp count
3. Measure stability
   - Track variation of each statistic across the analysis window
   - Normalize changes for fair comparison across different scales
4. Apply thresholds and weights
   - Each change type has an expected tolerance (e.g., ±10% for numeric medians)
   - Variations within tolerance incur little or no penalty
   - Larger variations reduce the score proportionally
5. Combine into final score
   - 100: Field stayed fully consistent
   - 60-90: Mild to moderate changes worth monitoring
   - Below 60: Meaningful shift requiring investigation
   - 0: Type change detected
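For a numeric field, the steps above can be sketched as follows. Only the type-change-to-zero rule and the ±10% median tolerance come from the description; the penalty slope and the single-statistic focus are illustrative assumptions:

```python
def consistency_score(profiles, tolerance=0.10):
    """profiles: list of {"dtype": str, "median": float}, one entry
    per profiling run in the analysis window."""
    # Step 1: a type flip immediately zeroes the score
    if len({p["dtype"] for p in profiles}) > 1:
        return 0.0
    # Steps 2-3: normalized drift of the median across the window
    medians = [p["median"] for p in profiles]
    baseline = medians[0]
    drift = max(abs(m - baseline) / abs(baseline) for m in medians)
    # Step 4: within tolerance -> no penalty; beyond -> proportional
    if drift <= tolerance:
        return 100.0
    # Step 5: illustrative slope mapping excess drift to a penalty
    return max(0.0, 100.0 - (drift - tolerance) * 200.0)

stable = [{"dtype": "int", "median": 10.0}, {"dtype": "int", "median": 10.5}]
flipped = [{"dtype": "int", "median": 10.0}, {"dtype": "str", "median": 0.0}]
print(consistency_score(stable), consistency_score(flipped))  # -> 100.0 0.0
```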
Consistency vs. Accuracy
Consistency checks whether a field’s statistical shape and distribution remain stable over time (e.g., numeric medians, string entropy).
Accuracy, by contrast, evaluates whether values are correct and aligned to real-world truths or integrity rules.
Together, they capture different aspects of trustworthiness.
Examples
- Numeric "Price" field with stable median and IQR → Score ~100
- String "Country" field where distinct values double unexpectedly → Score ~75
- Datetime field with sudden two-year backfill → Score ~60
- ID field alternating between numeric and string types → Score = 0
Precision Dimension
The Precision score evaluates the resolution and granularity of field values against defined quality checks.
How Precision is Calculated
- Scale: 0 to 100 based on the ratio of values meeting precision requirements
- Formula: (1 - (rows with anomalous values as specified by precision checks / min(scanned rows, container rows))) × 100
- Denominator: Uses the smaller of scanned row count or container row count to prevent score inflation
- Applicable rule types: Range validations, comparisons, mathematical constraints, and temporal boundaries
See Appendix: Precision Rule Types for the full Precision rule type list.
Examples
- Decimal field maintaining required 2-digit precision → Score ~100
- Timestamp field with appropriate granularity (no future dates) → Score ~95
- Age field with values outside valid range (0-120) → Score ~85
Accuracy Dimension
The Accuracy score determines the fidelity of field values to their real-world counterparts or expected values.
How Accuracy is Calculated
- Scale: 0 to 100 based on the overall anomaly rate across all data integrity check types (excludes metadata checks such as schema, volume, and freshness)
- Formula: (1 - (rows with anomalous values as specified by accuracy checks / min(scanned rows, container rows))) × 100
- Denominator: Uses the smaller of scanned row count or container row count to prevent score inflation
- Comprehensive: Considers anomalies from all data integrity rule types
- Represents: Overall correctness and trustworthiness of the field data
Interpretation
- 95-100: Highly accurate data suitable for critical decisions
- 80-94: Generally reliable with some known issues
- 60-79: Moderate accuracy requiring validation for important uses
- Below 60: Significant accuracy concerns requiring remediation
Timeliness & Volumetrics Dimensions
Both the Timeliness and Volumetrics dimensions are measured at the container level as described below. Field-level scores are inherited from their container-level scores.
Container-Level Quality Scoring
A container (table, view, file, or other structured data asset or any aggregation of data assets such as assets that share a common tag) receives an overall quality score derived from its constituent fields and additional container-specific metrics.
Which Containers Get a Score?
A container receives a quality score only if it has both a completed profile and a completed scan. Containers that have not been both profiled and scanned display "-" in the UI and are completely excluded from the datastore aggregate — they do not participate in the calculation.
| Container State | UI Display | In Datastore Aggregate? |
|---|---|---|
| Cataloged only (no profile, no scan) | - | No |
| Profiled only (no scan) | - | No |
| Scanned only (no profile) | - | No |
| Profiled + Scanned, no checks | Numeric score (low) | Yes |
| Profiled + Scanned, with checks | Numeric score | Yes |
Common Misconception
Containers showing "-" are not scored as 0 and do not drag down your datastore score. They are completely excluded from the aggregate — they appear in neither the numerator nor the denominator. However, a container that was profiled and scanned but has no quality checks will receive a low numeric score (because unmonitored dimensions like coverage and accuracy are penalized heavily) and will participate in the aggregate.
How Container Scores Are Calculated
Your container's total Quality Score starts at a baseline of 70. Each of the eight data quality dimensions then adjusts this baseline:
- Dimension aggregation:
- Completeness: Weighted average of all field completeness scores
- Coverage: Weighted average of field coverage scores, adjusted for scan frequency
- Conformity: Weighted average of field conformity scores, adjusted for schema-level conformity checks
- Consistency: Weighted average of field consistency scores, adjusted for profiling frequency
- Precision: Weighted average of field precision scores
- Accuracy: Weighted average of field accuracy scores
- Timeliness: Calculated using process described below
- Volumetrics: Calculated using process described below
- Multiplicative adjustment: Each dimension applies a multiplicative factor to the baseline. Dimensions with strong quality signals boost the score slightly, while dimensions with poor signals or missing data can reduce it significantly.
  score = baseline × f(coverage) × f(accuracy) × f(conformity) × f(precision) × f(consistency) × f(completeness) × f(timeliness) × f(volumetrics)
  Each factor is bounded — no single dimension can take over the entire score. The system caps both the maximum penalty and the maximum boost per dimension.
- Weight controls: Higher weights make dimensions more influential; setting a weight to zero removes that dimension's effect entirely
- Missing value handling: When a dimension cannot be measured (e.g., no checks of that type exist), the system applies a default assumption. For some dimensions this means "assumed fine" (no penalty), while for others it means "unverified" (score reduction). See How Dimensions Influence the Score for details.
- Special case: If only one dimension is weighted, the Quality Score mirrors that dimension's rating directly
- Final clipping: Result is always constrained between 0 and 100
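A structural sketch of the model described above. The factor bounds (a 0.5 penalty floor and a 1.1 boost cap per dimension) and the linear factor shape are illustrative assumptions; the 70-point baseline, the multiplicative combination, weight-zero neutrality, and the final clipping come from the description:

```python
def dimension_factor(score, weight, lo=0.5, hi=1.1):
    """Map a 0-100 dimension score to a bounded multiplier.

    None means the dimension is unmeasured; the real default
    handling varies per dimension, simplified here to neutral 1.0.
    """
    if score is None:
        return 1.0
    factor = lo + (hi - lo) * score / 100.0  # 0 -> lo, 100 -> hi
    return 1.0 + weight * (factor - 1.0)     # weight 0 -> neutral 1.0

def container_score(dim_scores, weights=None, baseline=70.0):
    weights = weights or {}
    total = baseline
    for dim, score in dim_scores.items():
        total *= dimension_factor(score, weights.get(dim, 1.0))
    return max(0.0, min(100.0, total))       # final clipping to 0-100

# Uniformly strong dimensions push the score to the cap; uniformly
# weak ones pull it far below the 70-point baseline.
dims = ("coverage", "accuracy", "conformity", "precision",
        "consistency", "completeness", "timeliness", "volumetrics")
print(container_score({d: 100 for d in dims}),
      round(container_score({d: 0 for d in dims}), 1))  # -> 100.0 0.3
```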
How Dimensions Influence the Score
Each dimension has a bounded range of influence — it can penalize the score when quality is poor and boost it slightly when quality is strong. The system is calibrated so that some dimensions have much more impact than others, reflecting their importance to overall data trustworthiness.
| Dimension | Influence Level | When No Checks Exist |
|---|---|---|
| Coverage | High | Score drops significantly — you can't trust what you don't measure |
| Accuracy | High | Score drops significantly — unvalidated data is assumed unreliable |
| Consistency | High | Score drops significantly — without profiling history, stability is unknown |
| Conformity | Low | No penalty — absence of format-specific checks doesn't imply bad data |
| Precision | Low | No penalty — absence of granularity checks doesn't imply bad data |
| Completeness | Moderate | Small penalty — completeness is measured directly from profiling, not checks |
| Timeliness | Moderate | Small penalty — without freshness checks, a slight confidence reduction is applied |
| Volumetrics | Moderate | Small penalty — without volumetric checks, a slight confidence reduction is applied |
Why Coverage and Accuracy Have the Biggest Impact
Coverage and Accuracy are the two most influential dimensions because they represent fundamental aspects of data governance:
- Coverage measures whether you are actively monitoring your data at all. A field with zero quality checks provides zero assurance — the score reflects this lack of observability.
- Accuracy measures whether your data passes the checks you've defined. If no accuracy-relevant checks exist, the system cannot confirm data correctness, so confidence is low.
By contrast, Conformity and Precision address specific formatting and granularity concerns. The absence of these checks doesn't mean the data is wrong — it simply means those particular aspects aren't being monitored. The system gives you the benefit of the doubt for these dimensions.
Consistency is also heavily weighted because it detects data drift. Without sufficient profiling history, the system cannot confirm that your data is stable, which reduces confidence.
Why a 70-Point Baseline?
The 70-point baseline represents a neutral confidence starting point.
- Dimensions then adjust the baseline downward when issues are found or upward when strong quality signals exist.
- This calibration ensures that new containers without extensive checks or history begin from a reasonable midpoint rather than 0.
Timeliness Dimension
The Timeliness score gauges whether data is available according to its expected schedule.
How Timeliness is Calculated
- Scale: 0 to 100 based on adherence to freshness requirements
- Field level: Directly inherited from the container's timeliness score
- Anomaly counting: Counts distinct anomalies from the relevant check types within the measurement period (cutoff date)
- Scoring model: Scores start at 100 and decrease based on anomaly count
- The first anomaly has the largest impact
- Each additional anomaly has diminishing impact (exponential decay)
- As anomaly count grows, the score approaches 0 but the marginal penalty per anomaly shrinks
- Applicable rule types: Time distribution size, freshness constraints
See Appendix: Timeliness Rule Types for the full Timeliness rule type list.
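The decay model can be sketched as follows; the 0.75 decay rate is an illustrative assumption chosen to produce plausible bands, not a documented constant:

```python
def anomaly_penalty_score(anomaly_count, decay=0.75):
    """Start at 100 and multiply by a decay rate per distinct anomaly.

    The first anomaly removes the most points; each additional
    anomaly's marginal penalty shrinks as the score approaches 0.
    """
    if anomaly_count is None:
        return None  # no checks of this type configured (unmeasured)
    return 100.0 * decay ** anomaly_count

for n in (0, 1, 2, 4):
    print(n, round(anomaly_penalty_score(n), 1))  # 0->100.0, 1->75.0, 2->56.2, 4->31.6
```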
Score Interpretation
- 100: No timeliness anomalies detected
- 60–80: A small number of anomalies detected — worth investigating
- 40–60: Multiple anomalies indicating recurring timeliness issues
- Below 40: Significant and frequent anomaly counts indicating serious issues
- None/Null: No checks of this type configured (unmeasured)
Volumetrics Dimension
The Volumetrics score analyzes consistency in data size and shape over time.
Shared Scoring Formula
Timeliness and Volumetrics both use the same exponential penalty formula for anomaly counts. This consistency ensures comparable scoring behavior across dimensions, even though the anomalies being measured differ.
How Volumetrics is Calculated
- Scale: 0 to 100 based on volumetric stability
- Field level: Directly inherited from the container's volumetrics score
- Anomaly counting: Counts distinct anomalies from the relevant check types within the measurement period (cutoff date)
- Scoring model: Scores start at 100 and decrease based on anomaly count
- The first anomaly has the largest impact
- Each additional anomaly has diminishing impact (exponential decay)
- As anomaly count grows, the score approaches 0 but the marginal penalty per anomaly shrinks
- Applicable rule types: Row count size, partition size constraints
See Appendix: Volumetric Rule Types for the full Volumetric rule type list.
Examples
- Container with consistent record counts per partition → Score ~100
- Container showing unexpected spikes or drops in volume → Score ~75
- Container with erratic or missing time distributions → Score ~50
Additional Container-Level Factors
Beyond the eight dimensions, containers incorporate:
- Scanning frequency: More frequent scanning improves confidence and boosts coverage scores. Infrequent scanning reduces the coverage modifier; daily or more frequent scanning maximizes it. This applies when both metadata and integrity scans exist.
- Profiling frequency: Regular profiling ensures statistics remain current and boosts consistency scores. Weekly or more frequent profiling gives the maximum boost. The consistency score is capped at 100 after the modifier is applied.
- Field tag weights: Field weights (derived from tags) are used when calculating weighted averages for container-level dimensions. Fields with higher-weight tags have more influence on their container's dimension scores.
Most Impactful Dimensions
While specific scoring weights can be customized, dimensions that typically most influence quality scores are:
- Coverage: Asserting frequent, comprehensive quality checks is critical
- Accuracy: Large volumes of anomalies severely impact scores
- Consistency: Erratic or unstable data characteristics reduce confidence
Datastore-Level Quality Scoring
The datastore quality score is a weighted average of all scored containers:
datastore score = sum(container score × container weight) / sum(container weight)
Only containers with a score participate. Containers that have not been both profiled and scanned are excluded from both the numerator and the denominator.
How Tags Affect the Datastore Score
Each container's weight in the aggregate formula is derived from its tags:
container weight = sum(tag weight modifiers on this container), normalized so the minimum is at least 1
- A container with no tags has weight = 1 (default)
- Tags with positive weight modifiers increase a container's influence on the datastore score
- Tags with negative weight modifiers decrease a container's influence
Example: If you tag your 2 most important tables with a Critical (weight: +10) tag, each gets weight ~11 while untagged tables remain at weight 1. Your important tables now dominate the aggregate.
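The weighting rule and the aggregate can be combined in a short sketch. Reading "normalized so the minimum is at least 1" as a base weight of 1 plus the tag modifiers, floored at 1, is an assumption consistent with the "weight ~11" example above:

```python
def container_weight(tag_modifiers):
    """Base weight 1 plus the sum of tag weight modifiers,
    floored so the minimum is at least 1 (an assumed reading)."""
    return max(1.0, 1.0 + sum(tag_modifiers))

def datastore_score(containers):
    """containers: list of (score_or_None, weight). Unscored containers
    (not both profiled and scanned) are excluded entirely: they appear
    in neither the numerator nor the denominator."""
    scored = [(s, w) for s, w in containers if s is not None]
    return sum(s * w for s, w in scored) / sum(w for _, w in scored)

critical = container_weight([10])  # Critical tag: weight 11
default = container_weight([])     # untagged: weight 1
# A high-weight tag lets two monitored tables outweigh an untagged one,
# while an unscored container is ignored entirely:
tables = [(100, critical), (0, critical), (12, default), (None, default)]
print(round(datastore_score(tables), 1))  # -> 48.3
```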
Historical Daily Scores
Daily datastore scores use the last 10 unique dates with quality score activity. For each date, each container's latest score up to that date is used (not just scores from that exact day), providing a continuous trend even when not all containers are scored daily.
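The carry-forward behavior can be sketched as follows (equal container weights are assumed for brevity; in practice the tag-derived weights also apply):

```python
def daily_scores(history, report_dates):
    """history: {container: [(iso_date, score), ...]} sorted by date.

    For each report date, take each container's latest score on or
    before that date (carry-forward), then average what was found.
    """
    results = {}
    for day in report_dates:
        carried = []
        for runs in history.values():
            eligible = [score for run_date, score in runs if run_date <= day]
            if eligible:
                carried.append(eligible[-1])  # latest score up to `day`
        if carried:
            results[day] = sum(carried) / len(carried)
    return results

history = {
    "orders":    [("2024-01-01", 80), ("2024-01-03", 90)],
    "customers": [("2024-01-02", 60)],
}
# On Jan 2, "orders" carries forward its Jan 1 score of 80.
print(daily_scores(history, ["2024-01-02", "2024-01-03"]))
# -> {'2024-01-02': 70.0, '2024-01-03': 75.0}
```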
Worked Example: Understanding an Unexpected Datastore Score
Scenario
- A datastore has 100 cataloged tables
- The user is actively monitoring 2 tables (one scores 100, one scores 0)
- The datastore score shows 8
- The user expects ~50 (the average of 100 and 0)
What's Actually Happening
Since containers showing - are excluded, a score of 8 means more than 2 tables have scores. The most common cause is running a profile and scan operation on the entire datastore (not just the 2 monitored tables). Every table that was both profiled and scanned now has a score — even tables nobody intended to monitor.
Profiled+scanned tables with no checks score very low. With no quality checks defined, each table's score is heavily penalized by several dimensions:
- Coverage: No checks defined → treated as unmonitored → significant penalty
- Accuracy: No accuracy checks → data correctness is unverified → significant penalty
- Consistency: No profiling history → stability is unknown → significant penalty
- Completeness: Measured from profiling data → minimal penalty
- Conformity: No conformity checks → assumed fine → no penalty
- Precision: No precision checks → assumed fine → no penalty
- Timeliness: No freshness checks → slight confidence reduction → small penalty
- Volumetrics: No volumetric checks → slight confidence reduction → small penalty
The combined effect of these penalties brings an unchecked table's score down to roughly 10–15 (from the 70-point baseline). When 98 tables score this low and only 2 monitored tables score 100 and 0, a datastore score of 8 is entirely plausible.
Recommended Actions
When your datastore score is lower than expected because unmonitored tables are pulling down the average, here are your options:
| Approach | Effort | Description |
|---|---|---|
| Explore + tag filter | Low | Create a tag (e.g., "Monitored"), apply it to your tables, and filter by it in Explore. The aggregate score shown will only include those tables. |
| Tag weights | Low | Apply a high-weight tag (e.g., +10) to your monitored tables so they dominate the datastore aggregate. |
| Dimension weights | Low | Set Coverage and/or Accuracy weight to 0 in Settings → Score to remove the "no checks defined" penalty for all tables. |
| Shorten decay period | Low | Reduce the decay period (default 180 days) to 60–90 days so tables profiled long ago drop out of the aggregate. |
| Selective profiling/scanning | Medium | Only run profile and scan operations on the tables you intend to monitor. This prevents the problem entirely. |
Root Cause
The most common cause of unexpectedly low datastore scores is a datastore-wide profile + scan during initial setup. This gives every table a score — even ones nobody intended to monitor. Those check-less tables receive very low scores and flood the average. Going forward, consider running operations on selected tables only.
How to Interpret and Use Quality Scores
Quality scores are dynamic measures of confidence that reflect the intrinsic quality of your data. It's important to recognize that different types of data will have varying levels of inherent quality. To illustrate this point, let's consider a standard mailing address in the USA. A typical schema representing a mailing address includes fields such as:
- Addressee
- Street
- Street 2
- City
- State
- Postal Code
The "State" field, which is naturally constrained by a limited set of known values, will inherently have a higher level of quality compared to the "Street 2" field. "Street 2" typically holds free-form text ranging from secondary unit designations to "care of" instructions and may often be left blank. In contrast, "State" is a required field for any valid mailing address.
Consider the level of confidence you would have in making business decisions based on the values held in the "State" field versus the "Street 2" field. This thought exercise demonstrates how the Qualytics Quality Score (with default configuration) should be interpreted.
While there are steps you can take to improve the quality score of the "Street 2" field, it would be unrealistic to expect it to meet the same standards as the "State" field. Instead, your efforts should focus on the change in measured quality score over time, with the goal of raising scores to an acceptable level of quality that meets your specific business needs.
To further explore how to respond to Quality Scores, let's consider the business requirements for capturing "Street 2" and its downstream use:
-
If the primary use case for this address is to support credit card payment processing, where "Street 2" is rarely, if ever, considered, there may be no business need to focus on improving the quality of this field over time. In this case, you can reduce the impact of this field on the overall measured quality of the Address by applying a Tag with a negative weight modifier.
-
On the other hand, if the primary use case for this address is to reliably ship a physical product to an intended recipient, ensuring a higher level of quality for the "Street 2" field becomes necessary. In this scenario, you may take actions such as defining additional data quality checks for the field, increasing the frequency of profiling and scanning, establishing a completeness goal, and working with upstream systems to enforce it over time.
Important
The key to effectively adopting Qualytics's Quality Scores into your data quality management efforts is to understand that it reflects both the intrinsic quality of the data and the steps taken to improve confidence that the data is fit for your specific business needs.
Fitness for Purpose in Practice
Remember: Quality Scores are not absolute “grades.” They reflect how well your data is suited for its intended business use, influenced by weighting, tagging, and anomaly detection. Two datasets may have different scores but still both be "fit for purpose" depending on use case.
Customizing Quality Score Weights and Decay Time
The default quality score weightings and decay time represent best practice considerations as codified by the data quality experts at Qualytics and our work with enterprises of all shapes, sizes, and sectors. We recommend that both be left in their default state for all customers and use cases.
That said, we recognize that customers may desire to alter our default scoring algorithms for a variety of reasons, and we support that optionality by allowing administrators to tailor the impact of each quality dimension on the total score by adjusting their weights. This alters the scoring algorithm to align with customized governance priorities. Additionally, the decay period for considering past data events defaults to 180 days but can be customized to fit your operational needs, ensuring the scores reflect the most relevant data quality insights for your organization.
Use Caution When Customizing Weights
We strongly recommend retaining default weights unless governance priorities clearly justify changes.
- Adjusting weights can significantly alter how anomalies impact overall scores.
- Misaligned weights may cause misleading signals about data quality.
Proceed carefully, and document any custom weighting rationale.
Quality Score Settings Reference
The following settings are configurable at the container or datastore level. Container settings override datastore settings.
| Setting | Default | Description |
|---|---|---|
| Decay Period | 180 days | How far back to look for profiles and scans |
| Coverage Weight | 1.0 | Coverage dimension weight (0 = disable) |
| Accuracy Weight | 1.0 | Accuracy dimension weight |
| Conformity Weight | 1.0 | Conformity dimension weight |
| Precision Weight | 1.0 | Precision dimension weight |
| Consistency Weight | 1.0 | Consistency dimension weight |
| Completeness Weight | 1.0 | Completeness dimension weight |
| Timeliness Weight | 1.0 | Timeliness dimension weight |
| Volumetrics Weight | 1.0 | Volumetrics dimension weight |
Setting any weight to 0 disables that dimension's penalty. The factor becomes a slight boost instead.
Single-Dimension Mode
If exactly one dimension has a non-zero weight (all others set to 0), the container score directly mirrors that dimension's score instead of using the multiplicative formula. This is useful when you only care about one aspect of quality.
Appendix: Rule Types
The following lists summarize which rule types contribute to each dimension’s quality score.
Conformity Rule Types
| No. | Rule Type |
|---|---|
| 1. | Matches Pattern |
| 2. | Min Length |
| 3. | Max Length |
| 4. | Data Diff |
| 5. | Is Type |
| 6. | Entity Resolution |
| 7. | Expected Schema |
| 8. | Field Count |
| 9. | Is Credit Card |
| 10. | Is Address |
| 11. | Contains Credit Card |
| 12. | Contains URL |
| 13. | Contains Email |
| 14. | Contains Social Security Number |
Precision Rule Types
| No. | Rule Type |
|---|---|
| 1. | After Date Time |
| 2. | Before Date Time |
| 3. | Between |
| 4. | Between Times |
| 5. | Equal To |
| 6. | Equal To Field |
| 7. | Greater Than |
| 8. | Greater Than Field |
| 9. | Less Than |
| 10. | Less Than Field |
| 11. | Max Value |
| 12. | Min Value |
| 13. | Not Future |
| 14. | Not Negative |
| 15. | Positive |
| 16. | Predicted By |
| 17. | Sum |
Volumetric Rule Types
| No. | Rule Type |
|---|---|
| 1. | Volumetric |
| 2. | Min Partition Size |
| 3. | Max Partition Size |
Timeliness Rule Types
| No. | Rule Type |
|---|---|
| 1. | Freshness |
| 2. | Time Distribution Size |