Expected Values Check FAQ

Common questions about how the Expected Values check evaluates values, how it handles NULLs and types, and how anomalies are reported.

Behavior

How are NULLs treated?

NULL values pass the check. The engine evaluates column IS NULL OR column IN (list), so a row with a NULL in the selected field is never reported as a violation. If missing values are also a problem, pair Expected Values with a Not Null check on the same field.
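
The rule above can be sketched in plain Python, with None standing in for NULL. The function name passes_expected_values and the sample values are hypothetical; this only models the predicate column IS NULL OR column IN (list) and is not the platform's implementation.

```python
def passes_expected_values(value, expected):
    """Model of: column IS NULL OR column IN (list). None stands in for NULL."""
    return value is None or value in expected

expected = ["O", "F", "P"]
rows = ["O", None, "X"]

# Only non-NULL values outside the list count as violations.
violations = [v for v in rows if not passes_expected_values(v, expected)]
```

In this sketch the None row is not reported, which is why a separate Not Null check is needed to catch missing values.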

Is matching case-sensitive?

Yes. "O" does not match "o". The check uses Spark's IN operator, which compares strings exactly. If you need case-insensitive matching, normalize the data upstream or use a Satisfies Expression check with lower(field) IN ('o', 'f', 'p').
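
As a rough illustration, the difference between exact matching and the lower(field) workaround looks like this in plain Python. The helper names are hypothetical, and only non-NULL strings are modelled, since how a Satisfies Expression treats NULL is not covered here.

```python
expected = ["o", "f", "p"]

def exact_in(value, expected):
    # Exact comparison: "O" and "o" are different values.
    return value in expected

def lowered_in(value, expected):
    # Analogue of lower(field) IN ('o', 'f', 'p') for a non-NULL string.
    return value.lower() in expected

# Each entry maps a sample value to (exact result, case-insensitive result).
results = {v: (exact_in(v, expected), lowered_in(v, expected)) for v in ["o", "O", "x"]}
```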

Is whitespace trimmed?

No. " O " does not match "O". The Authored Check modal shows a warning when an entry has leading or trailing whitespace, but it does not trim the entry for you. Clean the list before saving the check, or normalize the column upstream.
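
If the list is maintained outside the app, a small pre-save check can catch padded entries. This is a sketch in plain Python; the helper name entries_with_stray_whitespace is hypothetical and not part of the product.

```python
def entries_with_stray_whitespace(entries):
    """Return the entries that have leading or trailing whitespace."""
    return [e for e in entries if e != e.strip()]

candidate_list = ["O", " F", "P "]
padded = entries_with_stray_whitespace(candidate_list)

# Cleaned list that could be saved instead.
cleaned = [e.strip() for e in candidate_list]
```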

Does the filter clause run before or after the comparison?

Before. The platform applies the filter first and then evaluates the value comparison only on the rows that pass the filter. Rows outside the filter cannot trigger an Expected Values anomaly.
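
The ordering can be sketched in plain Python as two steps: narrow the rows with the filter, then compare only what remains. The row layout, the region filter, and the evaluate helper are all hypothetical and chosen for illustration.

```python
rows = [
    {"region": "EU", "status": "O"},
    {"region": "EU", "status": "X"},
    {"region": "US", "status": "Z"},  # outside the filter, never evaluated
]

def evaluate(rows, row_filter, field, expected):
    # Step 1: apply the filter.
    in_scope = [r for r in rows if row_filter(r)]
    # Step 2: compare only the rows that passed the filter (NULLs pass).
    failing = [r for r in in_scope if r[field] is not None and r[field] not in expected]
    return in_scope, failing

in_scope, failing = evaluate(rows, lambda r: r["region"] == "EU", "status", ["O", "F", "P"])
```

The US row holds a value outside the list, but it is never compared, so it cannot produce an anomaly.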

What happens when the list type does not match the field type?

Spark cannot coerce strings to numbers or booleans through IN, so every non-NULL row in the column will fail the check. For example, an Expected Values check on an integer column with a string list ["1", "2", "3"] flags every row. Always match the list type to the field type: numeric lists for numeric fields, boolean lists for boolean fields, string lists for string/date/timestamp fields.
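
A plain-Python analogy of the outcome described above: Python's in operator also compares without coercing types, so an integer never equals a string. This is not Spark code, and the helper name failing_values is hypothetical.

```python
int_column = [1, 2, 3, None]
string_list = ["1", "2", "3"]
numeric_list = [1, 2, 3]

def failing_values(column, expected):
    # NULLs (None) pass; everything else must be in the list.
    return [v for v in column if v is not None and v not in expected]

mismatched = failing_values(int_column, string_list)  # every non-NULL value fails
matched = failing_values(int_column, numeric_list)    # nothing fails
```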

Array Fields

Can Expected Values evaluate every element of an array?

Yes, for Array[String] fields. The platform auto-enables element-wise evaluation when the target field is an Array[String], equivalent to array_forall(field, element -> element IN (list)). Every element must be in the list; a single out-of-vocabulary element invalidates the row.

Element-wise evaluation is currently supported only for arrays of strings. For other array element types, the check falls back to scalar comparison, and using it on those fields is not currently recommended.
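
A minimal Python sketch of the array_forall(field, element -> element IN (list)) semantics for non-NULL string arrays. The function name array_passes and the sample tags are hypothetical; NULL elements inside an array are not modelled.

```python
def array_passes(elements, expected):
    """True only when every element of the array is in the expected list."""
    return all(element in expected for element in elements)

expected = ["red", "green", "blue"]

all_known = array_passes(["red", "blue"], expected)
one_unknown = array_passes(["red", "purple"], expected)  # one bad element fails the row
```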

Do empty or NULL arrays pass?

Yes. An empty array ([]) has no elements to evaluate and passes by definition. A NULL array also passes (the standard NULL handling rule applies). If you want to flag empty arrays as a violation, pair Expected Values with a separate check (such as a Satisfies Expression on size(field) > 0).
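
The empty and NULL cases can be illustrated in plain Python, where all() over an empty sequence is True. Both helper names are hypothetical; has_elements models a companion rule like size(field) > 0 for non-NULL arrays only, since the result of size() on a NULL array is not covered here.

```python
def expected_values_passes(arr, expected):
    # NULL arrays pass; an empty array passes because there is nothing to fail.
    return arr is None or all(e in expected for e in arr)

def has_elements(arr):
    # Companion rule for non-NULL arrays, modelled on size(field) > 0.
    return len(arr) > 0

expected = ["red", "green", "blue"]
empty_passes = expected_values_passes([], expected)
null_passes = expected_values_passes(None, expected)
empty_flagged_by_companion = not has_elements([])
```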

Anomaly Reporting

Does Expected Values produce Record or Shape Anomalies?

Both, controlled by coverage:

  • Coverage = 1 (100%, default): every failing row emits a Record Anomaly with the offending value in the message.
  • Coverage < 1: failures are rolled up into a single Shape Anomaly for the dataset, which fires only when the share of failing rows exceeds what the coverage setting tolerates.
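
The two modes can be sketched as follows. This is an illustration, not the platform's code: the report function is hypothetical, and it assumes the Shape Anomaly fires when the passing fraction drops below the configured coverage, which is one reading of the threshold described above.

```python
def report(failing_values, evaluated_count, coverage=1.0):
    """Return a list of (anomaly_type, payload) tuples for one scan."""
    if evaluated_count == 0:
        return []
    if coverage >= 1.0:
        # One Record Anomaly per failing row.
        return [("record", value) for value in failing_values]
    # Assumption: fire when the passing fraction is below coverage.
    passing_fraction = 1 - len(failing_values) / evaluated_count
    if passing_fraction < coverage:
        return [("shape", len(failing_values))]
    return []

per_row = report(["X", "Y"], 100)                  # coverage = 1
rolled_up = report(["X"] * 10, 100, coverage=0.95)  # 90% pass, below 95%
tolerated = report(["X"] * 2, 100, coverage=0.95)   # 98% pass, nothing fires
```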

What does the Record Anomaly message look like?

The field '<field>' has value '<value>', which is not in the list of expected values

When a filter is set, the message is followed by [filter: <expression>].
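
For anyone parsing or testing against these messages, the template can be reproduced with a small formatter. The function name is hypothetical, and the single space before the filter suffix is an assumption; the text above only says the suffix follows the message.

```python
def record_anomaly_message(field, value, filter_expression=None):
    message = (
        f"The field '{field}' has value '{value}', "
        "which is not in the list of expected values"
    )
    if filter_expression:
        # Assumption: a single space separates the message from the suffix.
        message += f" [filter: {filter_expression}]"
    return message

plain = record_anomaly_message("status", "X")
filtered = record_anomaly_message("status", "X", "region = 'EU'")
```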

What does the Shape Anomaly message look like?

For the field '<field>', X.XXX% of N records (K) have values not in the list of expected values
  • X.XXX%: the percentage of evaluated rows that failed the comparison (K out of N).
  • N: the number of rows evaluated (after the filter, if any).
  • K: the number of rows whose value was non-NULL and not in the list.

When a filter is set, the message is followed by [filter: <expression>].
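
A sketch of how the three placeholders relate, assuming the percentage is K divided by N, rendered with three decimal places as the X.XXX pattern suggests. The function name is hypothetical, and details such as thousands separators or the spacing before the filter suffix are assumptions.

```python
def shape_anomaly_message(field, evaluated, failing, filter_expression=None):
    # X.XXX% is taken to be 100 * K / N with three decimals (assumption).
    percent = 100 * failing / evaluated
    message = (
        f"For the field '{field}', {percent:.3f}% of {evaluated} records "
        f"({failing}) have values not in the list of expected values"
    )
    if filter_expression:
        # Assumption: a single space separates the message from the suffix.
        message += f" [filter: {filter_expression}]"
    return message

example = shape_anomaly_message("status", evaluated=200, failing=3)
```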

How are source records highlighted in the app?

The platform highlights only the offending cell in the field the check applies to. Other columns in the same row are rendered normally. This matches the rule's semantics: the violation is about a single field's value, not the entire row. The Sample Data tables in Examples mirror this convention.

Configuration

Can I paste a list of values from a spreadsheet?

Yes. The List field in the Authored Check modal accepts pasted values; the platform splits the pasted text on line breaks, trims leading and trailing whitespace from each line, and drops empty lines. This is convenient for importing a vocabulary maintained in a spreadsheet column.
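
The paste handling described above (split on line breaks, trim each line, drop empty lines) can be approximated in Python to preview what a pasted column would become. The function name parse_pasted_list is hypothetical and this is not the modal's actual code.

```python
def parse_pasted_list(text):
    """Split on line breaks, trim each line, and drop empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# A spreadsheet column copied with padding, a blank row, and a Windows line ending.
pasted = "O\n  F  \n\nP\r\n"
values = parse_pasted_list(pasted)
```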

Can I use Expected Values together with a Check Template?

Yes. Set template_id to the ID of an existing Check Template. The template's properties (including the list) are reused by the check, and any future updates to the template propagate to every check linked to it.

Does Custom Anomaly Description work for Expected Values?

Yes, for Record Anomalies. Setting anomaly_message_field to the name of a source-record field uses that field's value as the anomaly message instead of the system-generated one. The setting is silently ignored for Shape Anomalies (coverage < 1), because Shape messages use a fixed template.