Build a computed join
Goal
Join two existing containers (tables or computed tables) into a new virtual container, without writing the join SQL by hand. Computed joins are how you analyze cross-table relationships in Qualytics: for example, joining orders with customers so a quality check can validate that every order has a non-null customer email.
Permissions
| Step | Endpoint | Role | Team permission |
|---|---|---|---|
| Validate the join shape | POST /api/containers/validate |
Member |
N/A |
| Create the join | POST /api/containers |
Member |
Editor on the datastore's team |
| Profile the new join | POST /api/operations/run (profile) |
Member |
Editor |
Prerequisites
- The CLI is installed and authenticated.
- Both source containers exist in the same datastore.
- You know the join keys (left field and right field) and the desired join type (
inner,left,right,full).
CLI workflow
graph LR
L[(Left container: orders)] --> J[Computed join]
R[(Right container: customers)] --> J
J --> Profile[Profile]
Profile --> Checks[Add quality checks]
1. Validate the join shape (optional but recommended)
qualytics containers validate \
--type computed_join \
--datastore-id 42 \
--left-container-id 100 \
--right-container-id 105 \
--left-key-field "customer_id" \
--right-key-field "id" \
--select-clause "ord_id, ord_total, cust_email, cust_name" \
--left-prefix "ord_" \
--right-prefix "cust_" \
--join-type inner
A successful validation confirms the field references resolve and the SQL is parseable.
2. Create the join
qualytics containers create \
--type computed_join \
--name "orders_with_customers" \
--datastore-id 42 \
--left-container-id 100 \
--right-container-id 105 \
--left-key-field "customer_id" \
--right-key-field "id" \
--left-prefix "ord_" \
--right-prefix "cust_" \
--join-type inner \
--select-clause "ord_id, ord_total, cust_email, cust_name" \
--description "Orders joined with their customer records" \
--tags "joins,orders"
3. Profile the join
4. Add a quality check on the joined view
For example, "every order must have a non-null customer email":
# checks/orders_with_customers/email_not_null.yaml
rule_type: isNotNull
container: orders_with_customers
fields:
- cust_email
status: Active
Behind the scenes
| CLI step | Method | Path | Notes |
|---|---|---|---|
containers validate |
POST | /api/containers/validate |
The body matches the create payload but is rejected at validation time without persisting. |
containers create |
POST | /api/containers |
The CLI sends the join definition; the create endpoint ignores description and tags and applies them in a follow-up PUT. |
containers create (apply description/tags) |
PUT | /api/containers/{container_id} |
Sets description and tags after creation. |
operations profile |
POST | /api/operations/run (profile) |
Standard async operation. |
Python equivalent
import os
import time
import httpx
BASE_URL = os.environ["QUALYTICS_URL"].rstrip("/")
TOKEN = os.environ["QUALYTICS_TOKEN"]
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
with httpx.Client(headers=HEADERS, timeout=60.0) as client:
body = {
"type": "computed_join",
"name": "orders_with_customers",
"datastore_id": 42,
"left_container_id": 100,
"right_container_id": 105,
"left_key_field": "customer_id",
"right_key_field": "id",
"left_prefix": "ord_",
"right_prefix": "cust_",
"join_type": "inner",
"select_clause": "ord_id, ord_total, cust_email, cust_name",
}
# Validate first
client.post(f"{BASE_URL}/api/containers/validate", json=body).raise_for_status()
# Create
r = client.post(f"{BASE_URL}/api/containers", json=body)
r.raise_for_status()
container_id = r.json()["id"]
# Apply description and tags
client.put(f"{BASE_URL}/api/containers/{container_id}", json={
"description": "Orders joined with their customer records",
"tags": ["joins", "orders"],
}).raise_for_status()
# Profile
op = client.post(f"{BASE_URL}/api/operations/run", json={
"type": "profile",
"datastore_ids": [42],
"container_ids": [container_id],
}).json()
while True:
s = client.get(f"{BASE_URL}/api/operations/{op['id']}").json()
if s["result"] in ("success", "failure", "aborted"):
print(f"profile: {s['result']}")
break
time.sleep(10)
Variations and advanced usage
Outer joins
--join-type left # all rows from left, matching from right (or NULL)
--join-type right
--join-type full
Useful when you want to find rows in one container that have no match in the other (orphans).
Multi-key joins
The CLI flags accept a single key per side. For composite keys, define a computed table on each side that concatenates the keys, then join on those.
Computed join on top of computed tables
You can join two computed tables, or a computed table with a regular table. The pattern is identical; just pass the IDs of whichever containers you have.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
validate fails with Field not found |
Field name in --select-clause doesn't exist on either side |
Use qualytics containers get --id 100 --profiles to confirm field names. |
| Profile completes but the join has zero rows | Inner join misses (no matching keys) | Try --join-type left to see how many left rows have no match. |
| Materialize fails after the join was created | Materialization needs the source database to support the SQL Qualytics emits | Some warehouses (e.g., older Hive) restrict join syntax; check with materialize against a smaller container first. |
| Description and tags don't appear in the response | The create endpoint ignores those fields; the CLI applies them via a follow-up PUT | Confirm the PUT call ran (check verbose output); on Python, send the PUT explicitly. |
Related
- Containers command reference
- Bulk import computed tables from CSV: for SQL-defined containers (not joins).
- Quality Checks command reference: write checks against the joined view.