Entity Resolution API

The Entity Resolution check is created and managed through the standard Quality Checks API by setting `rule` to `"entityResolution"`. The check is multi-field: instead of listing fields in the top-level `fields` array, you add one entry per evaluated field to `properties.target_fields` and name the distinction field in `properties.distinct_field_name`. The top-level `fields` array is auto-populated from `target_fields` and can be sent as an empty list.

Tip

For complete API documentation, including request and response schemas, visit the API docs.

Endpoints

Method	Path	Purpose
`POST`	`/api/quality-checks`	Create a new Entity Resolution check.
`GET`	`/api/quality-checks/{id}`	Retrieve an Entity Resolution check by ID.
`PUT`	`/api/quality-checks/{id}`	Update an existing Entity Resolution check.
`DELETE`	`/api/quality-checks/{id}`	Delete (or archive) an Entity Resolution check.

Permission: Author (or above) on the target container's team for POST, PUT, and DELETE; Reporter (or above) for GET.

Payload Example

The following `POST /api/quality-checks` request body creates a multi-field Entity Resolution check on `full_name` (fuzzy) and `address` (fuzzy), distinguished by `customer_id`:

{
    "description": "Customers with similar names and addresses must share a customer_id",
    "rule": "entityResolution",
    "fields": [],
    "container_id": 145,
    "filter": null,
    "properties": {
        "distinct_field_name": "customer_id",
        "composite_match_threshold": 0.75,
        "target_fields": [
            {
                "upickle_type": "StringTargetField",
                "field_name": "full_name",
                "match_type": "fuzzy",
                "pair_substrings": true,
                "pair_homophones": false,
                "consider_term_frequency": false,
                "weight": 1.0
            },
            {
                "upickle_type": "StringTargetField",
                "field_name": "address",
                "match_type": "fuzzy",
                "pair_substrings": false,
                "pair_homophones": false,
                "consider_term_frequency": false,
                "weight": 0.8
            }
        ]
    },
    "tags": ["pii", "master-data"],
    "additional_metadata": {"jira": "DATA-4101"},
    "anomaly_message_field": null,
    "template_id": null,
    "status": "Active",
    "owner_id": 7,
    "default_anomaly_assignee_id": 12
}
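As a minimal sketch, the payload above could be assembled and wrapped in a request from Python using only the standard library. The helper names, the `https://example.invalid` base URL, and the bearer-token `Authorization` header are assumptions made for illustration; the actual host and authentication scheme are described in the API docs.

```python
import json
import urllib.request


def string_target(field_name, weight=1.0, pair_substrings=False):
    """Build one fuzzy StringTargetField entry (hypothetical helper)."""
    return {
        "upickle_type": "StringTargetField",
        "field_name": field_name,
        "match_type": "fuzzy",
        "pair_substrings": pair_substrings,
        "pair_homophones": False,
        "consider_term_frequency": False,
        "weight": weight,
    }


def build_check_payload(container_id, distinct_field, target_fields,
                        description, threshold=0.75):
    """Assemble the request body for an Entity Resolution check."""
    return {
        "description": description,
        "rule": "entityResolution",
        "fields": [],  # computed server-side from properties.target_fields
        "container_id": container_id,
        "filter": None,
        "properties": {
            "distinct_field_name": distinct_field,
            "composite_match_threshold": threshold,
            "target_fields": target_fields,
        },
    }


def build_create_request(base_url, token, payload):
    """Wrap the payload in a POST request object; nothing is sent here."""
    return urllib.request.Request(
        url=f"{base_url}/api/quality-checks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token auth; check the API docs for the real scheme.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


payload = build_check_payload(
    container_id=145,
    distinct_field="customer_id",
    target_fields=[
        string_target("full_name", weight=1.0, pair_substrings=True),
        string_target("address", weight=0.8),
    ],
    description="Customers with similar names and addresses must share a customer_id",
)
request = build_create_request("https://example.invalid", "YOUR_TOKEN", payload)
```

Passing `request` to `urllib.request.urlopen` would submit it. Optional top-level fields such as `tags` or `owner_id` can be added to the returned dictionary before it is serialized.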

Top-Level Field Notes

Field	Required	Notes
`description`	Yes	Free-text description shown in the UI.
`rule`	Yes	Must be `"entityResolution"`.
`fields`	Yes	Send `[]`. The list of evaluated fields is computed from `properties.target_fields`.
`container_id`	Yes	ID of the container (table or file) the check runs against.
`filter`	No	Spark SQL `WHERE` expression. Applied before entity resolution runs, so only filtered rows are clustered. Send `null` for no filter.
`properties.distinct_field_name`	Yes	Name of the field that must hold a single value within each resolved entity cluster. Accepted types: `Integral`, `Fractional`, `Boolean`, `String`, `Date`, `Timestamp`.
`properties.composite_match_threshold`	Yes	Fractional value between `0.0` and `1.0`. Pairs whose weighted composite score is greater than or equal to this value are treated as matches. Default `0.7`.
`properties.target_fields`	Yes	Non-empty array. Each entry configures one field with its `match_type`, `weight`, and (for strings) optional substring/homophone/term-frequency knobs. See Target Field Notes below.
`tags`	No	List of tag names applied to the check for filtering and organization.
`additional_metadata`	No	Free-form key-value pairs (typically links to catalog, tickets, governance records).
`anomaly_message_field`	No	Name of a source-record field whose value should be used as the anomaly message instead of the system-generated one. Not applicable to Entity Resolution: because the rule emits only Shape Anomalies (which use a fixed message template), this field is silently ignored. Send `null`.
`template_id`	No	ID of a Check Template to associate the check with. `null` if not using a template.
`status`	No	`"Active"` (default) or `"Draft"`. Draft checks are not evaluated by Scans.
`owner_id`	No	ID of the user who owns the check. Defaults to the user creating the check when omitted.
`default_anomaly_assignee_id`	No	ID of the user automatically assigned to anomalies produced by the check.
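
To make the role of `composite_match_threshold` concrete, here is a small sketch of the threshold comparison. This page does not spell out how per-field scores are combined, so the weighted average used below (sum of weight times score, divided by the sum of weights) is an assumption, and both function names are hypothetical.

```python
def composite_score(field_scores, weights):
    """Weighted average of per-field scores.

    ASSUMPTION: the platform's exact combination formula is not documented
    here; a weighted mean is used purely for illustration.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(score * weight for score, weight in zip(field_scores, weights))
    return weighted_sum / total_weight


def is_match(field_scores, weights, threshold=0.7):
    """A pair matches when its composite score is >= the threshold."""
    return composite_score(field_scores, weights) >= threshold


# Scores for (full_name, address) with the weights from the payload example.
scores = [0.9, 0.5]
weights = [1.0, 0.8]
matches_at_default = is_match(scores, weights)        # threshold 0.7
matches_at_strict = is_match(scores, weights, 0.75)   # threshold 0.75
```

Under this assumed formula the composite is roughly 0.72, so the pair matches at the default threshold of `0.7` but not at `0.75`.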

Coverage is not supported

Entity Resolution does not accept a coverage value. The rule evaluates clusters as compliant or non-compliant; there is no fractional tolerance to set.

Target Field Notes

Each entry in `target_fields` is one of three shapes, identified by its `upickle_type` discriminator: `"StringTargetField"`, `"NumericTargetField"`, or `"DateTimeTargetField"`. The platform validates that the declared `upickle_type` matches the actual data type of the field on the container.
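
Because the platform rejects a mismatched discriminator, a client-side pre-check before sending the request can be handy. The sketch below maps field data types to the shape this page documents for them. The `container_field_types` dictionary (field name to type name) is an assumption: it stands in for whatever schema information the caller already has, and the function name is hypothetical.

```python
# Data type of the container field -> upickle_type expected for it,
# as described in the three target field sections of this page.
EXPECTED_UPICKLE_TYPE = {
    "String": "StringTargetField",
    "Integral": "NumericTargetField",
    "Fractional": "NumericTargetField",
    "Date": "DateTimeTargetField",
    "Timestamp": "DateTimeTargetField",
}


def mismatched_target_fields(target_fields, container_field_types):
    """Return (field_name, declared_upickle_type, actual_type) for each bad entry.

    An entry is bad when its field is unknown, has an unsupported type,
    or declares a upickle_type that does not fit the field's data type.
    """
    problems = []
    for entry in target_fields:
        name = entry["field_name"]
        actual_type = container_field_types.get(name)
        expected = EXPECTED_UPICKLE_TYPE.get(actual_type)
        if expected != entry["upickle_type"]:
            problems.append((name, entry["upickle_type"], actual_type))
    return problems


schema = {"full_name": "String", "registered_at": "Timestamp"}
entries = [
    {"upickle_type": "StringTargetField", "field_name": "full_name"},
    {"upickle_type": "NumericTargetField", "field_name": "registered_at"},
]
problems = mismatched_target_fields(entries, schema)
```

Here `problems` contains one tuple for `registered_at`, which is a Timestamp field declared with the numeric shape.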

String Target Field

{
    "upickle_type": "StringTargetField",
    "field_name": "full_name",
    "match_type": "fuzzy",
    "pair_substrings": true,
    "pair_homophones": false,
    "consider_term_frequency": false,
    "weight": 1.0
}

Field	Required	Notes
`upickle_type`	Yes	Must be `"StringTargetField"`. Identifies the shape so the platform can deserialize this entry.
`field_name`	Yes	Name of the string field on the container.
`match_type`	No	`"fuzzy"` (default) or `"exact"`. `exact` turns the field into a blocking pre-filter: pairs disagreeing on this field are never scored.
`pair_substrings`	No	When `true`, a pair where one string contains the other scores `1.0` on this field. Default `false`. Applies only to `fuzzy`.
`pair_homophones`	No	When `true`, a pair whose values sound alike (phonetic similarity) scores `1.0` on this field. Default `false`. Applies only to `fuzzy`.
`consider_term_frequency`	No	When `true`, rare tokens carry more weight than common tokens. Default `false`. Applies only to `fuzzy`.
`weight`	No	Non-negative number. Controls this field's contribution to the composite score. Default `1.0`. Ignored when `match_type` is `exact`.
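
The effect of `pair_substrings` on a fuzzy string field can be sketched as follows. The platform's fuzzy similarity measure is not described on this page, so `difflib.SequenceMatcher` from the Python standard library is swapped in as a stand-in. The function name is hypothetical, and case-sensitive comparison is also an assumption.

```python
from difflib import SequenceMatcher


def string_field_score(a, b, pair_substrings=False):
    """Score one fuzzy string field for a pair of values, in [0.0, 1.0]."""
    # pair_substrings: containment in either direction scores 1.0.
    # Empty strings are excluded so that "" does not match everything.
    if pair_substrings and a and b and (a in b or b in a):
        return 1.0
    # ASSUMPTION: SequenceMatcher is only a stand-in for the real fuzzy measure.
    return SequenceMatcher(None, a, b).ratio()


with_flag = string_field_score("Jon Smith", "Jon Smith Jr", pair_substrings=True)
without_flag = string_field_score("Jon Smith", "Jon Smith Jr")
```

With the flag enabled the pair scores `1.0` because one value contains the other; without it, the pair falls back to the similarity measure and scores below `1.0`.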

Numeric Target Field

{
    "upickle_type": "NumericTargetField",
    "field_name": "phone_number",
    "match_type": "absolute",
    "offset": 0.0,
    "weight": 1.0
}

Field	Required	Notes
`upickle_type`	Yes	Must be `"NumericTargetField"`. Identifies the shape so the platform can deserialize this entry.
`field_name`	Yes	Name of the numeric field (Integral or Fractional) on the container.
`match_type`	No	`"absolute"` (default), `"relative"`, or `"exact"`. `"absolute"` compares with a fixed `offset`; `"relative"` compares with a percentage tolerance (e.g. `0.05` for 5%); `"exact"` turns the field into a blocking pre-filter.
`offset`	No	Non-negative numeric tolerance. With `match_type: "absolute"`, the pair scores `1.0` if `|a − b| ≤ offset`, otherwise `0.0`. With `match_type: "relative"`, the value is interpreted as a fraction (e.g. `0.05` for 5%). Default `0.0`.
`weight`	No	Non-negative number controlling contribution to the composite. Default `1.0`. Ignored when `match_type` is `exact`.
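
The two tolerance modes for numeric fields can be sketched as below. The `absolute` branch follows the rule stated in the table. For `relative`, this page only says that `offset` is a fraction, so measuring the tolerance against the larger magnitude of the two values is an assumption. The function name is hypothetical.

```python
def numeric_field_score(a, b, match_type="absolute", offset=0.0):
    """Score one numeric field for a pair of values: 1.0 or 0.0."""
    difference = abs(a - b)
    if match_type == "absolute":
        # Documented rule: 1.0 if |a - b| <= offset, otherwise 0.0.
        return 1.0 if difference <= offset else 0.0
    if match_type == "relative":
        # ASSUMPTION: the fraction is applied to the larger magnitude.
        tolerance = offset * max(abs(a), abs(b))
        return 1.0 if difference <= tolerance else 0.0
    # "exact" fields act as a blocking pre-filter and are not scored.
    raise ValueError(f"match_type {match_type!r} is not scored")


absolute_hit = numeric_field_score(100, 104, "absolute", offset=5)
absolute_miss = numeric_field_score(100, 106, "absolute", offset=5)
relative_hit = numeric_field_score(100, 104, "relative", offset=0.05)
relative_miss = numeric_field_score(100, 110, "relative", offset=0.05)
```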

Datetime Target Field

{
    "upickle_type": "DateTimeTargetField",
    "field_name": "registered_at",
    "match_type": "offset",
    "offset_seconds": 3600,
    "weight": 1.0
}

Field	Required	Notes
`upickle_type`	Yes	Must be `"DateTimeTargetField"`. Identifies the shape so the platform can deserialize this entry.
`field_name`	Yes	Name of the Date or Timestamp field on the container.
`match_type`	No	`"offset"` (default), `"granularity"`, or `"exact"`. `"offset"` compares within a number of seconds; `"granularity"` compares whether both timestamps fall in the same bucket; `"exact"` turns the field into a blocking pre-filter.
`offset_seconds`	No	Non-negative integer tolerance in seconds. Applies when `match_type` is `"offset"`: the pair scores `1.0` if the two timestamps are within `offset_seconds` of each other. Default `0`.
`granularity`	No	Bucket applied before comparison. Applies when `match_type` is `"granularity"`. Accepted values: `"Day"`, `"Week"`, `"Month"`, `"Year"`. Omit (or send `null`) when `match_type` is not `"granularity"`.
`weight`	No	Non-negative number controlling contribution to the composite. Default `1.0`. Ignored when `match_type` is `exact`.
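
The `offset` and `granularity` modes can be illustrated with the standard `datetime` module. Several details below are assumptions rather than documented behavior: a pair outside the tolerance or in different buckets scores `0.0`, the `Week` bucket follows ISO week numbering, and both function names are hypothetical.

```python
from datetime import datetime


def _bucket(ts, granularity):
    """Reduce a timestamp to its Day, Week, Month, or Year bucket."""
    if granularity == "Day":
        return ts.date()
    if granularity == "Week":
        iso = ts.isocalendar()  # ASSUMPTION: ISO weeks (Monday start)
        return (iso[0], iso[1])
    if granularity == "Month":
        return (ts.year, ts.month)
    if granularity == "Year":
        return ts.year
    raise ValueError(f"unsupported granularity {granularity!r}")


def datetime_field_score(a, b, match_type="offset", offset_seconds=0, granularity=None):
    """Score one datetime field for a pair of values: 1.0 or 0.0."""
    if match_type == "offset":
        within = abs((a - b).total_seconds()) <= offset_seconds
        return 1.0 if within else 0.0
    if match_type == "granularity":
        same_bucket = _bucket(a, granularity) == _bucket(b, granularity)
        return 1.0 if same_bucket else 0.0
    # "exact" fields act as a blocking pre-filter and are not scored.
    raise ValueError(f"match_type {match_type!r} is not scored")


first = datetime(2024, 1, 31, 10, 0, 0)
second = datetime(2024, 1, 31, 10, 30, 0)
next_day = datetime(2024, 2, 1, 9, 0, 0)

within_hour = datetime_field_score(first, second, "offset", offset_seconds=3600)
same_month = datetime_field_score(first, next_day, "granularity", granularity="Month")
same_year = datetime_field_score(first, next_day, "granularity", granularity="Year")
```

In this example `first` and `second` are 30 minutes apart, so they match with a one-hour offset. `first` and `next_day` fall in different months but the same year, so the result depends on the chosen bucket.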

Introduction: formal definition, target field types, field scope, and general/anomaly properties.
How It Works: full semantics, clustering behavior, threshold tuning, and source-records behavior.
Examples: three production scenarios with sample data, source records, and resulting anomalies.
FAQ: short answers to the most frequent questions.