Skip to content

Build a computed join

Goal

Join two existing containers (tables or computed tables) into a new virtual container, without writing the join SQL by hand. Computed joins are how you analyze cross-table relationships in Qualytics: for example, joining orders with customers so a quality check can validate that every order has a non-null customer email.

Permissions

Step Endpoint Role Team permission
Validate the join shape POST /api/containers/validate Member N/A
Create the join POST /api/containers Member Editor on the datastore's team
Profile the new join POST /api/operations/run (profile) Member Editor

Prerequisites

  • The CLI is installed and authenticated.
  • Both source containers exist in the same datastore.
  • You know the join keys (left field and right field) and the desired join type (inner, left, right, full).

CLI workflow

graph LR
    L[(Left container: orders)] --> J[Computed join]
    R[(Right container: customers)] --> J
    J --> Profile[Profile]
    Profile --> Checks[Add quality checks]
qualytics containers validate \
    --type computed_join \
    --datastore-id 42 \
    --left-container-id 100 \
    --right-container-id 105 \
    --left-key-field "customer_id" \
    --right-key-field "id" \
    --select-clause "ord_id, ord_total, cust_email, cust_name" \
    --left-prefix "ord_" \
    --right-prefix "cust_" \
    --join-type inner

A successful validation confirms the field references resolve and the SQL is parseable.

2. Create the join

qualytics containers create \
    --type computed_join \
    --name "orders_with_customers" \
    --datastore-id 42 \
    --left-container-id 100 \
    --right-container-id 105 \
    --left-key-field "customer_id" \
    --right-key-field "id" \
    --left-prefix "ord_" \
    --right-prefix "cust_" \
    --join-type inner \
    --select-clause "ord_id, ord_total, cust_email, cust_name" \
    --description "Orders joined with their customer records" \
    --tags "joins,orders"

3. Profile the join

qualytics operations profile --datastore-id 42 --container-names "orders_with_customers"

4. Add a quality check on the joined view

For example, "every order must have a non-null customer email":

# checks/orders_with_customers/email_not_null.yaml
rule_type: isNotNull
container: orders_with_customers
fields:
  - cust_email
status: Active
qualytics checks create --datastore-id 42 --file ./checks/orders_with_customers/email_not_null.yaml

Behind the scenes

CLI step Method Path Notes
containers validate POST /api/containers/validate The body matches the create payload but is rejected at validation time without persisting.
containers create POST /api/containers The CLI sends the join definition; the create endpoint ignores description and tags and applies them in a follow-up PUT.
containers create (apply description/tags) PUT /api/containers/{container_id} Sets description and tags after creation.
operations profile POST /api/operations/run (profile) Standard async operation.

Python equivalent

import os
import time
import httpx

BASE_URL = os.environ["QUALYTICS_URL"].rstrip("/")
TOKEN    = os.environ["QUALYTICS_TOKEN"]
HEADERS  = {"Authorization": f"Bearer {TOKEN}"}

with httpx.Client(headers=HEADERS, timeout=60.0) as client:
    body = {
        "type":                "computed_join",
        "name":                "orders_with_customers",
        "datastore_id":        42,
        "left_container_id":   100,
        "right_container_id":  105,
        "left_key_field":      "customer_id",
        "right_key_field":     "id",
        "left_prefix":         "ord_",
        "right_prefix":        "cust_",
        "join_type":           "inner",
        "select_clause":       "ord_id, ord_total, cust_email, cust_name",
    }
    # Validate first
    client.post(f"{BASE_URL}/api/containers/validate", json=body).raise_for_status()
    # Create
    r = client.post(f"{BASE_URL}/api/containers", json=body)
    r.raise_for_status()
    container_id = r.json()["id"]

    # Apply description and tags
    client.put(f"{BASE_URL}/api/containers/{container_id}", json={
        "description": "Orders joined with their customer records",
        "tags":        ["joins", "orders"],
    }).raise_for_status()

    # Profile
    op = client.post(f"{BASE_URL}/api/operations/run", json={
        "type":          "profile",
        "datastore_ids": [42],
        "container_ids": [container_id],
    }).json()
    while True:
        s = client.get(f"{BASE_URL}/api/operations/{op['id']}").json()
        if s["result"] in ("success", "failure", "aborted"):
            print(f"profile: {s['result']}")
            break
        time.sleep(10)

Variations and advanced usage

Outer joins

--join-type left   # all rows from left, matching from right (or NULL)
--join-type right
--join-type full

Useful when you want to find rows in one container that have no match in the other (orphans).

Multi-key joins

The CLI flags accept a single key per side. For composite keys, define a computed table on each side that concatenates the keys, then join on those.

Computed join on top of computed tables

You can join two computed tables, or a computed table with a regular table. The pattern is identical; just pass the IDs of whichever containers you have.

Troubleshooting

Symptom Likely cause Fix
validate fails with Field not found Field name in --select-clause doesn't exist on either side Use qualytics containers get --id 100 --profiles to confirm field names.
Profile completes but the join has zero rows Inner join misses (no matching keys) Try --join-type left to see how many left rows have no match.
Materialize fails after the join was created Materialization needs the source database to support the SQL Qualytics emits Some warehouses (e.g., older Hive) restrict join syntax; check with materialize against a smaller container first.
Description and tags don't appear in the response The create endpoint ignores those fields; the CLI applies them via a follow-up PUT Confirm the PUT call ran (check verbose output); on Python, send the PUT explicitly.