Entity Resolution API
The Entity Resolution check is created and managed through the standard Quality Checks API by setting rule to entityResolution. The check is multi-field: instead of listing fields under fields, you add one entry per evaluated field to properties.target_fields and name the distinct field in properties.distinct_field_name. The fields array on the check itself is auto-populated from target_fields and can be sent as an empty list.
Tip
For complete API documentation, including request and response schemas, visit the API docs.
Endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /api/quality-checks | Create a new Entity Resolution check. |
| GET | /api/quality-checks/{id} | Retrieve an Entity Resolution check by ID. |
| PUT | /api/quality-checks/{id} | Update an existing Entity Resolution check. |
| DELETE | /api/quality-checks/{id} | Delete (or archive) an Entity Resolution check. |
Permission: Author (or above) on the target container's team for POST, PUT, and DELETE; Reporter (or above) for GET.
Payload Example
Create a multi-field Entity Resolution check on full_name (fuzzy) and address (fuzzy), distinguished by customer_id, with POST /api/quality-checks:
{
  "description": "Customers with similar names and addresses must share a customer_id",
  "rule": "entityResolution",
  "fields": [],
  "container_id": 145,
  "filter": null,
  "properties": {
    "distinct_field_name": "customer_id",
    "composite_match_threshold": 0.75,
    "target_fields": [
      {
        "upickle_type": "StringTargetField",
        "field_name": "full_name",
        "match_type": "fuzzy",
        "pair_substrings": true,
        "pair_homophones": false,
        "consider_term_frequency": false,
        "weight": 1.0
      },
      {
        "upickle_type": "StringTargetField",
        "field_name": "address",
        "match_type": "fuzzy",
        "pair_substrings": false,
        "pair_homophones": false,
        "consider_term_frequency": false,
        "weight": 0.8
      }
    ]
  },
  "tags": ["pii", "master-data"],
  "additional_metadata": {"jira": "DATA-4101"},
  "anomaly_message_field": null,
  "template_id": null,
  "status": "Active",
  "owner_id": 7,
  "default_anomaly_assignee_id": 12
}
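The payload above can be wrapped in an HTTP request along the following lines. This is a minimal sketch, not the platform's client: the base URL, the bearer-token header, and the helper name build_create_request are assumptions made for illustration, since this page does not specify a host or an authentication scheme. The function only builds the request so that it can be inspected before anything is sent.

```python
import json
import urllib.request

# Assumption: placeholder host. Replace with your deployment's base URL.
BASE_URL = "https://example.invalid"


def build_create_request(payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/quality-checks request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/quality-checks",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; the real scheme may differ.
            "Authorization": f"Bearer {token}",
        },
    )


# A shortened version of the payload shown above (one target field only).
payload = {
    "description": "Customers with similar names must share a customer_id",
    "rule": "entityResolution",
    "fields": [],  # auto-populated from properties.target_fields
    "container_id": 145,
    "properties": {
        "distinct_field_name": "customer_id",
        "composite_match_threshold": 0.75,
        "target_fields": [
            {
                "upickle_type": "StringTargetField",
                "field_name": "full_name",
                "match_type": "fuzzy",
                "weight": 1.0,
            }
        ],
    },
}

request = build_create_request(payload, token="<your-token>")
# To actually send it: urllib.request.urlopen(request)
```

Separating request construction from sending keeps the payload easy to log or unit-test before it reaches the API.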
Top-Level Field Notes
| Field | Required | Notes |
|---|---|---|
| description | Yes | Free-text description shown in the UI. |
| rule | Yes | Must be "entityResolution". |
| fields | Yes | Send []. The list of evaluated fields is computed from properties.target_fields. |
| container_id | Yes | ID of the container (table or file) the check runs against. |
| filter | No | Spark SQL WHERE expression. Applied before entity resolution runs, so only filtered rows are clustered. Send null for no filter. |
| properties.distinct_field_name | Yes | Name of the field that must hold a single value within each resolved entity cluster. Accepted types: Integral, Fractional, Boolean, String, Date, Timestamp. |
| properties.composite_match_threshold | Yes | Fractional value between 0.0 and 1.0. Pairs whose weighted composite score is greater than or equal to this value are treated as matches. Default 0.7. |
| properties.target_fields | Yes | Non-empty array. Each entry configures one field with its match_type, weight, and (for strings) optional substring/homophone/term-frequency knobs. See Target Field Notes below. |
| tags | No | List of tag names applied to the check for filtering and organization. |
| additional_metadata | No | Free-form key-value pairs (typically links to catalog, tickets, governance records). |
| anomaly_message_field | No | Name of a source-record field whose value should be used as the anomaly message instead of the system-generated one. Not applicable to Entity Resolution: because the rule emits only Shape Anomalies (which use a fixed message template), this field is silently ignored. Send null. |
| template_id | No | ID of a Check Template to associate the check with. null if not using a template. |
| status | No | "Active" (default) or "Draft". Draft checks are not evaluated by Scans. |
| owner_id | No | ID of the user who owns the check. Defaults to the user creating the check when omitted. |
| default_anomaly_assignee_id | No | ID of the user automatically assigned to anomalies produced by the check. |
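To make the role of composite_match_threshold and weight more concrete, the sketch below combines per-field scores into one composite and compares it with the threshold. The table only states that the composite is weighted and that a pair matches when the score is greater than or equal to the threshold; dividing by the sum of the weights is an assumption here, and the function names are hypothetical.

```python
def composite_score(field_scores: dict, weights: dict) -> float:
    """Weighted average of per-field scores (each score in [0.0, 1.0]).

    Assumption: the composite is normalized by the total weight. The
    platform's exact formula is not documented on this page.
    """
    total_weight = sum(weights[name] for name in field_scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(field_scores[name] * weights[name] for name in field_scores)
    return weighted_sum / total_weight


def is_match(field_scores: dict, weights: dict, threshold: float) -> bool:
    """A pair matches when its composite score is >= the threshold."""
    return composite_score(field_scores, weights) >= threshold


# Weights taken from the payload example: full_name 1.0, address 0.8.
weights = {"full_name": 1.0, "address": 0.8}

# (0.9 * 1.0 + 0.5 * 0.8) / 1.8 is about 0.72, below the 0.75 threshold.
weak_pair = {"full_name": 0.9, "address": 0.5}

# (0.9 * 1.0 + 0.6 * 0.8) / 1.8 is about 0.77, above the 0.75 threshold.
strong_pair = {"full_name": 0.9, "address": 0.6}

print(is_match(weak_pair, weights, 0.75))
print(is_match(strong_pair, weights, 0.75))
```

Under this assumed formula, raising a field's weight pulls the composite toward that field's score, which is why a low-weight field can disagree without sinking an otherwise strong pair.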
Coverage is not supported
Entity Resolution does not accept a coverage value. The rule evaluates clusters as compliant or non-compliant; there is no fractional tolerance to set.
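The top-level rules above lend themselves to a client-side check before the payload is submitted. The following is a rough sketch under stated assumptions: the function name is hypothetical, the key name "coverage" is assumed from the note above, and the server remains the authority on what is actually accepted.

```python
def validate_entity_resolution_payload(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if not payload.get("description"):
        errors.append("description is required")
    if payload.get("rule") != "entityResolution":
        errors.append('rule must be "entityResolution"')
    if not isinstance(payload.get("fields"), list):
        errors.append("fields must be a list (send [])")
    if "container_id" not in payload:
        errors.append("container_id is required")
    # Assumption: the rejected key is literally named "coverage".
    if "coverage" in payload:
        errors.append("coverage is not supported for Entity Resolution")

    props = payload.get("properties") or {}
    if not props.get("distinct_field_name"):
        errors.append("properties.distinct_field_name is required")

    threshold = props.get("composite_match_threshold")
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or not 0.0 <= threshold <= 1.0:
        errors.append("properties.composite_match_threshold must be in [0.0, 1.0]")

    if not props.get("target_fields"):
        errors.append("properties.target_fields must be a non-empty array")
    return errors


valid_payload = {
    "description": "Similar customers must share a customer_id",
    "rule": "entityResolution",
    "fields": [],
    "container_id": 145,
    "properties": {
        "distinct_field_name": "customer_id",
        "composite_match_threshold": 0.75,
        "target_fields": [
            {"upickle_type": "StringTargetField", "field_name": "full_name"}
        ],
    },
}

print(validate_entity_resolution_payload(valid_payload))
```

Collecting every problem instead of raising on the first one makes it easier to fix a payload in a single pass.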
Target Field Notes
Each entry in target_fields is one of three shapes, identified by its upickle_type discriminator: "StringTargetField", "NumericTargetField", or "DateTimeTargetField". The platform validates that the declared upickle_type matches the actual data type of the field on the container.
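Because the platform rejects a upickle_type that does not match the field's data type, it can help to derive the discriminator from the type instead of typing it by hand. The mapping below is a small sketch built from the type names used on this page; the function name is hypothetical, and how you obtain a field's data type from the container is outside its scope.

```python
# Data type names as used on this page, mapped to the matching discriminator.
UPICKLE_TYPE_BY_DATA_TYPE = {
    "String": "StringTargetField",
    "Integral": "NumericTargetField",
    "Fractional": "NumericTargetField",
    "Date": "DateTimeTargetField",
    "Timestamp": "DateTimeTargetField",
}


def upickle_type_for(data_type: str) -> str:
    """Return the upickle_type for a container field's data type."""
    try:
        return UPICKLE_TYPE_BY_DATA_TYPE[data_type]
    except KeyError:
        # Boolean, for example, is accepted for distinct_field_name but has
        # no target-field shape described on this page.
        raise ValueError(f"no target field shape for data type {data_type!r}")


print(upickle_type_for("Fractional"))
print(upickle_type_for("Timestamp"))
```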
String Target Field
{
  "upickle_type": "StringTargetField",
  "field_name": "full_name",
  "match_type": "fuzzy",
  "pair_substrings": true,
  "pair_homophones": false,
  "consider_term_frequency": false,
  "weight": 1.0
}
| Field | Required | Notes |
|---|---|---|
| upickle_type | Yes | Must be "StringTargetField". Identifies the shape so the platform can deserialize this entry. |
| field_name | Yes | Name of the string field on the container. |
| match_type | No | "fuzzy" (default) or "exact". exact turns the field into a blocking pre-filter: pairs disagreeing on this field are never scored. |
| pair_substrings | No | When true, a pair where one string contains the other scores 1.0 on this field. Default false. Applies only to fuzzy. |
| pair_homophones | No | When true, a pair whose values sound alike (phonetic similarity) scores 1.0 on this field. Default false. Applies only to fuzzy. |
| consider_term_frequency | No | When true, rare tokens carry more weight than common tokens. Default false. Applies only to fuzzy. |
| weight | No | Non-negative number. Controls this field's contribution to the composite score. Default 1.0. Ignored when match_type is exact. |
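The effect of pair_substrings can be illustrated with a small scoring function. This is only a sketch: the platform's fuzzy similarity measure is not documented on this page, so Python's difflib.SequenceMatcher stands in for it, and the function name is hypothetical. Homophone pairing and term-frequency weighting are left out, as is any case or whitespace normalization the platform may apply.

```python
from difflib import SequenceMatcher


def string_field_score(a: str, b: str, pair_substrings: bool = False) -> float:
    """Score one fuzzy string field for a pair of values, in [0.0, 1.0]."""
    # With pair_substrings enabled, containment short-circuits to a full score.
    # Empty strings are excluded because "" is a substring of everything.
    if pair_substrings and a and b and (a in b or b in a):
        return 1.0
    # Stand-in similarity measure (assumption, not the platform's algorithm).
    return SequenceMatcher(None, a, b).ratio()


# "Jon Smith" is contained in "Jon Smith Jr", so the flag lifts the score to 1.0.
with_flag = string_field_score("Jon Smith", "Jon Smith Jr", pair_substrings=True)
without_flag = string_field_score("Jon Smith", "Jon Smith Jr")
print(with_flag, without_flag)
```

The flag matters most for fields where one record holds a truncated or suffixed variant of the other, which a plain similarity ratio would score below 1.0.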
Numeric Target Field
{
  "upickle_type": "NumericTargetField",
  "field_name": "phone_number",
  "match_type": "absolute",
  "offset": 0.0,
  "weight": 1.0
}
| Field | Required | Notes |
|---|---|---|
| upickle_type | Yes | Must be "NumericTargetField". Identifies the shape so the platform can deserialize this entry. |
| field_name | Yes | Name of the numeric field (Integral or Fractional) on the container. |
| match_type | No | "absolute" (default), "relative", or "exact". "absolute" compares with a fixed offset; "relative" compares with a percentage tolerance (e.g. 0.05 for 5%); "exact" turns the field into a blocking pre-filter. |
| offset | No | Non-negative numeric tolerance. With match_type: "absolute", the pair scores 1.0 if \|a − b\| ≤ offset, otherwise 0.0. With match_type: "relative", the value is interpreted as a fraction (e.g. 0.05 for 5%). Default 0.0. |
| weight | No | Non-negative number controlling contribution to the composite. Default 1.0. Ignored when match_type is exact. |
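The two tolerance modes can be sketched as follows. The absolute rule follows the table directly. For relative, this page does not say which value the fraction is measured against or that the score is binary, so the sketch assumes a 1.0/0.0 score with the tolerance taken relative to the larger magnitude of the pair. The function name is hypothetical, and exact is not scored here because it acts as a blocking pre-filter.

```python
def numeric_field_score(a: float, b: float,
                        match_type: str = "absolute",
                        offset: float = 0.0) -> float:
    """Score one numeric field for a pair of values: 1.0 or 0.0."""
    diff = abs(a - b)
    if match_type == "absolute":
        # Documented rule: 1.0 if |a - b| <= offset, otherwise 0.0.
        return 1.0 if diff <= offset else 0.0
    if match_type == "relative":
        # Assumption: offset is a fraction of the larger magnitude.
        scale = max(abs(a), abs(b))
        return 1.0 if diff <= offset * scale else 0.0
    raise ValueError(f"unsupported match_type for scoring: {match_type!r}")


# Absolute: a difference of 3 is within an offset of 5.
print(numeric_field_score(100, 103, "absolute", 5))

# Relative: a difference of 10 exceeds 5% of 110 under the assumption above.
print(numeric_field_score(100, 110, "relative", 0.05))
```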
Datetime Target Field
{
  "upickle_type": "DateTimeTargetField",
  "field_name": "registered_at",
  "match_type": "offset",
  "offset_seconds": 3600,
  "weight": 1.0
}
| Field | Required | Notes |
|---|---|---|
| upickle_type | Yes | Must be "DateTimeTargetField". Identifies the shape so the platform can deserialize this entry. |
| field_name | Yes | Name of the Date or Timestamp field on the container. |
| match_type | No | "offset" (default), "granularity", or "exact". "offset" compares within a number of seconds; "granularity" compares whether both timestamps fall in the same bucket; "exact" turns the field into a blocking pre-filter. |
| offset_seconds | No | Non-negative integer tolerance in seconds. Applies when match_type is "offset": the pair scores 1.0 if the two timestamps are within offset_seconds of each other. Default 0. |
| granularity | No | Bucket applied before comparison. Applies when match_type is "granularity". Accepted values: "Day", "Week", "Month", "Year". Omit (or send null) when match_type is not "granularity". |
| weight | No | Non-negative number controlling contribution to the composite. Default 1.0. Ignored when match_type is exact. |
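The offset and granularity modes can be compared side by side with a short sketch. The offset rule follows the table. The bucket boundaries are assumptions: calendar days, months, and years, and ISO weeks starting on Monday, since this page does not define them. The function names are hypothetical, and both timestamps are expected to be either naive or timezone-aware, not mixed.

```python
from datetime import datetime


def _bucket(ts: datetime, granularity: str) -> tuple:
    """Map a timestamp to a comparable bucket key."""
    if granularity == "Day":
        return (ts.year, ts.month, ts.day)
    if granularity == "Week":
        iso = ts.isocalendar()  # assumption: ISO weeks, Monday start
        return (iso[0], iso[1])
    if granularity == "Month":
        return (ts.year, ts.month)
    if granularity == "Year":
        return (ts.year,)
    raise ValueError(f"unsupported granularity: {granularity!r}")


def datetime_field_score(a: datetime, b: datetime,
                         match_type: str = "offset",
                         offset_seconds: int = 0,
                         granularity: str = None) -> float:
    """Score one datetime field for a pair of values: 1.0 or 0.0."""
    if match_type == "offset":
        # Documented rule: 1.0 when the timestamps are within offset_seconds.
        return 1.0 if abs((a - b).total_seconds()) <= offset_seconds else 0.0
    if match_type == "granularity":
        return 1.0 if _bucket(a, granularity) == _bucket(b, granularity) else 0.0
    raise ValueError(f"unsupported match_type for scoring: {match_type!r}")


first = datetime(2024, 1, 1, 10, 0)
second = datetime(2024, 1, 1, 10, 30)

# 30 minutes apart: inside a 3600-second offset, outside a 60-second one.
print(datetime_field_score(first, second, "offset", offset_seconds=3600))
print(datetime_field_score(first, second, "offset", offset_seconds=60))
```

Granularity matching is the better fit when records are known to land in the same period but at arbitrary times within it, while offset matching suits events that should be close together regardless of calendar boundaries.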
Related
- Introduction: formal definition, target field types, field scope, and general/anomaly properties.
- How It Works: full semantics, clustering behavior, threshold tuning, and source-records behavior.
- Examples: three production scenarios with sample data, source records, and resulting anomalies.
- FAQ: short answers to the most frequent questions.