Bulk-create quality checks
Goal
You have a library of quality check definitions in YAML or JSON and want to create them all in one Qualytics datastore in a single command. This is the fastest way to onboard a set of standardized checks (for example, "every PII column must have isNotNull and matchesPattern") onto a new datastore, or to migrate handwritten checks from a previous data quality tool.
Permissions
The CLI calls POST /api/quality-checks once per check in the file. The endpoint requires:
| Layer | Minimum | Notes |
|---|---|---|
| User role | Member | Manager and Admin also work. |
| Team permission on the target datastore | Author (for status: Active) or Drafter (for status: Draft) | The permission check is delegated to the create logic, which compares the check's status against your team permission. |
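The status-versus-permission rule in the table can be turned into a small pre-flight check before you run the command. The sketch below is illustrative only: PERMISSION_RANK, can_create, and blocked_checks are hypothetical names, the ranking assumes Author covers everything Drafter allows, and a missing status is treated as Active (an assumption borrowed from the Python example on this page). The server remains the source of truth.

```python
# Hypothetical ordering of team permissions. Only Drafter and Author are
# named in the table; the assumption is that Author also covers Drafter.
PERMISSION_RANK = {"Drafter": 1, "Author": 2}

# Minimum team permission per check status, as stated in the table.
REQUIRED_FOR_STATUS = {"Draft": "Drafter", "Active": "Author"}


def can_create(status: str, team_permission: str) -> bool:
    """Return True if team_permission is enough to create a check with this status."""
    required = REQUIRED_FOR_STATUS[status]
    return PERMISSION_RANK.get(team_permission, 0) >= PERMISSION_RANK[required]


def blocked_checks(checks: list[dict], team_permission: str) -> list[dict]:
    """Return the checks that would likely be rejected for this team permission."""
    return [
        check
        for check in checks
        # Assumption: a check without an explicit status is created as Active.
        if not can_create(check.get("status", "Active"), team_permission)
    ]
```

Running blocked_checks over the loaded file with your own team permission gives a quick list of entries to switch to Draft or hand to someone with Author permission.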
Team membership is required
If you hold the Member role but don't belong to the team that owns the target datastore, every create call fails with 403 Forbidden. Either join the team or have a Manager create the checks for you.
Prerequisites
- The CLI is installed and authenticated (run qualytics doctor to confirm).
- The target datastore exists and has been synced at least once, so the containers (orders, customers, etc.) the checks reference are known to Qualytics.
- A YAML or JSON file containing one or more check definitions.
# checks/orders/all.yaml
- rule_type: isNotNull
  container: orders
  fields: [order_id]
  description: Order ID must not be null
  coverage: 1.0
  tags: [production, orders]
  status: Active
- rule_type: isUnique
  container: orders
  fields: [order_id]
  status: Active
- rule_type: satisfiesExpression
  container: orders
  fields: [total]
  properties:
    expression: "total >= 0"
  status: Active
Use container names, not IDs
Reference containers by name (container: orders) rather than by ID. The same file then applies to dev, staging, and prod even though their container IDs differ.
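A quick lint pass over the loaded definitions can catch missing keys and numeric container references before any API call is made. This is a minimal sketch, not part of the CLI: lint_checks and REQUIRED_KEYS are hypothetical names, and the set of required keys is an assumption inferred from the example file above rather than from the API schema.

```python
# Assumption: these keys appear in every entry of the example file, so the
# sketch treats them as required. The real schema may differ.
REQUIRED_KEYS = ("rule_type", "container", "fields")


def lint_checks(checks: list[dict]) -> list[str]:
    """Return human-readable problems found in a list of check definitions."""
    problems = []
    for i, check in enumerate(checks, start=1):
        for key in REQUIRED_KEYS:
            if key not in check:
                problems.append(f"check {i}: missing '{key}'")
        container = check.get("container")
        # A numeric container ties the file to one environment's IDs.
        if isinstance(container, int):
            problems.append(
                f"check {i}: container is an ID ({container}); use the container name"
            )
    return problems
```

An empty result means every entry has the expected keys and refers to its container by name, so the same file should be portable across environments.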
CLI workflow
graph LR
F[checks.yaml] --> CLI[qualytics checks create]
CLI -->|per check| API[POST /api/quality-checks]
API --> DS[(Target datastore)]
CLI --> S[Per-check status output]
Run the command against the target datastore:
qualytics checks create --datastore-id 42 --file ./checks/orders/all.yaml
Sample output:
Loading 3 check definitions from ./checks/orders/all.yaml
[1/3] Creating isNotNull on orders.order_id ... OK (id=501)
[2/3] Creating isUnique on orders.order_id ... OK (id=502)
[3/3] Creating satisfiesExpression on orders.total ... OK (id=503)
3 created, 0 failed
Override the owner or default anomaly assignee for every check in the file:
qualytics checks create \
--datastore-id 42 \
--file ./checks/orders/all.yaml \
--owner-id 18 \
--default-anomaly-assignee-id 12
Behind the scenes
Each check in the file produces one create call, after the CLI has resolved the target datastore and the container names:
| CLI step | Method | Path | Notes |
|---|---|---|---|
| Resolve target datastore | GET | /api/datastores/{datastore_id} | Looks up the target so the CLI can resolve container names to IDs. |
| Resolve container names | GET | /api/containers?datastore_id={id} | One per unique container name in the file. |
| Create each check | POST | /api/quality-checks | One call per check. The body matches the YAML entry, with container_id resolved. |
Failure at any step is surfaced inline; subsequent checks are still attempted.
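The continue-on-failure behaviour can be sketched as a loop that reports each result and keeps a running tally. The code below is an illustration of that pattern, not the CLI's implementation: create_all is a hypothetical name, and create stands in for whatever function performs the actual API call and returns the new check's ID.

```python
from typing import Callable


def create_all(checks: list[dict], create: Callable[[dict], int]) -> tuple[int, int]:
    """Try to create every check; report failures inline and keep going."""
    created = failed = 0
    total = len(checks)
    for i, check in enumerate(checks, start=1):
        label = f"[{i}/{total}] {check['rule_type']} on {check['container']}"
        try:
            check_id = create(check)
        except Exception as exc:
            # One bad entry must not stop the remaining checks.
            failed += 1
            print(f"{label} ... FAILED ({exc})")
        else:
            created += 1
            print(f"{label} ... OK (id={check_id})")
    print(f"{created} created, {failed} failed")
    return created, failed
```

Returning the counts lets a calling script exit non-zero when failed is greater than zero, which is useful in CI.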
Python equivalent
The same workflow, done programmatically. This is useful if you want to integrate Qualytics into existing Python automation that already loads YAML and you don't want to shell out:
import os

import httpx
import yaml

BASE_URL = os.environ["QUALYTICS_URL"].rstrip("/")
TOKEN = os.environ["QUALYTICS_TOKEN"]
DATASTORE_ID = 42

headers = {"Authorization": f"Bearer {TOKEN}"}


def resolve_container_id(client: httpx.Client, name: str) -> int:
    """Look up a container ID by name within the target datastore."""
    r = client.get(
        f"{BASE_URL}/api/containers",
        params={"datastore_id": DATASTORE_ID, "name": name},
    )
    r.raise_for_status()
    items = r.json()
    if not items:
        raise RuntimeError(f"Container '{name}' not found in datastore {DATASTORE_ID}")
    return items[0]["id"]


def create_check(client: httpx.Client, check: dict) -> dict:
    """Create one check from a YAML entry and return the API response."""
    payload = {
        "datastore_id": DATASTORE_ID,
        "container_id": resolve_container_id(client, check["container"]),
        "rule_type": check["rule_type"],
        "fields": check["fields"],
        "description": check.get("description"),
        "coverage": check.get("coverage", 1.0),
        "filter": check.get("filter"),
        "properties": check.get("properties", {}),
        "tags": check.get("tags", []),
        "status": check.get("status", "Active"),
    }
    r = client.post(f"{BASE_URL}/api/quality-checks", json=payload)
    r.raise_for_status()
    return r.json()


with open("checks/orders/all.yaml") as f:
    checks = yaml.safe_load(f)

# The file may hold a single check (dict) or many checks (list of dicts).
if isinstance(checks, dict):
    checks = [checks]

with httpx.Client(headers=headers, timeout=30.0) as client:
    for i, check in enumerate(checks, start=1):
        try:
            created = create_check(client, check)
            print(f"[{i}/{len(checks)}] OK id={created['id']}")
        except httpx.HTTPStatusError as e:
            print(f"[{i}/{len(checks)}] FAILED: {e.response.status_code} {e.response.text}")
        except (httpx.RequestError, RuntimeError) as e:
            # Network errors and unresolved container names: report and continue.
            print(f"[{i}/{len(checks)}] FAILED: {e}")
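The script above looks up the container for every check, whereas the CLI is described as resolving each unique container name once. A small memoizing wrapper gets the script closer to that behaviour. This is a sketch under assumptions: make_cached_resolver is a hypothetical helper, and lookup is any function that maps a container name to its ID.

```python
from typing import Callable


def make_cached_resolver(lookup: Callable[[str], int]) -> Callable[[str], int]:
    """Wrap a name-to-ID lookup so each container name is resolved only once."""
    cache: dict[str, int] = {}

    def resolve(name: str) -> int:
        if name not in cache:
            # First time this name is seen: perform the real lookup.
            cache[name] = lookup(name)
        return cache[name]

    return resolve
```

In the script, this could be wired in as resolve = make_cached_resolver(lambda name: resolve_container_id(client, name)), with create_check calling resolve instead of resolve_container_id. Failed lookups are not cached, so a missing container raises on each attempt.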
Why the CLI is still usually the right tool
The CLI gives you _qualytics_check_uid tracking, idempotent re-runs, dry-run preview, and per-team permission errors with friendly messages. The Python equivalent above is a starting point; it doesn't reproduce all of that. Reach for Python only when you genuinely need to embed the workflow into a larger application.
Variations and advanced usage
One file with all checks vs. one file per check
The file can be a single dict (one check) or a list of dicts (many checks). Many teams keep one file per check on disk and concatenate them at runtime, which makes git diffs precise:
yq ea '. as $i ireduce ([]; . + [$i])' checks/orders/*.yaml > /tmp/orders-all.yaml
qualytics checks create --datastore-id 42 --file /tmp/orders-all.yaml
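If the concatenation happens inside a Python script instead of yq, the same "single dict or list of dicts" rule can be applied to documents that are already parsed. The sketch below is an assumption-laden illustration: merge_check_documents is a hypothetical name, and each element of documents is expected to be the result of parsing one file (for example with yaml.safe_load).

```python
def merge_check_documents(documents: list) -> list[dict]:
    """Flatten parsed check files into one list of check definitions."""
    merged: list[dict] = []
    for doc in documents:
        if doc is None:
            continue  # an empty file parses to None; nothing to add
        if isinstance(doc, dict):
            merged.append(doc)  # one check per file
        elif isinstance(doc, list):
            merged.extend(doc)  # many checks in one file
        else:
            raise TypeError(f"Expected a dict or a list of dicts, got {type(doc).__name__}")
    return merged
```

Unlike the yq one-liner, this version does not nest a file that already contains a list, so one-check files and multi-check files can be mixed in the same directory.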
Bulk-create across multiple datastores
checks create writes to one datastore at a time. To target many datastores, prefer checks import, which accepts a repeated --datastore-id flag and uses upsert semantics:
qualytics checks import \
--datastore-id 42 \
--datastore-id 43 \
--datastore-id 44 \
--input ./checks/
See Promote checks Dev to Prod for the full multi-environment flow.
Drafts vs. active checks
If your governance flow requires human review before a check runs, set status: Draft in YAML. Drafts are created but skipped by scans until activated.
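When a whole file should go through review, the status can be overridden in code rather than edited entry by entry. This is a minimal sketch with a hypothetical helper name, as_drafts; it only rewrites the status key described above and leaves the input untouched.

```python
def as_drafts(checks: list[dict]) -> list[dict]:
    """Return copies of the check definitions with status forced to Draft."""
    # Build new dicts so the definitions loaded from disk are not mutated.
    return [{**check, "status": "Draft"} for check in checks]
```

The returned list can be written back to a temporary file and passed to the create command, or fed straight into the Python workflow shown earlier.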
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| 403 Forbidden on every create | You're not a member of the datastore's team, or your team permission is below Author | Run qualytics teams list and confirm membership; ask a Manager to grant Author permission. |
| Container 'X' not found | The datastore has not been synced, or the container name in the YAML is wrong | Run qualytics operations sync --datastore-id 42 first; confirm the name with qualytics containers list --datastore-id 42. |
| Some checks succeed, others fail with 422 | Rule-specific properties are missing or malformed | Read the error message; it names the offending field. The most common cause is missing properties.expression for satisfiesExpression. |
| Every call hangs for 30s then fails | Wrong QUALYTICS_URL or VPN required | Run qualytics doctor to confirm reachability; check auth status. |
| 400 Bad Request: status not allowed | You set status: Active but only have Drafter team permission | Either change to status: Draft, or escalate your team permission. |
Related
- Quality Checks command reference: every flag for checks create, plus update, list, delete.
- Promote checks Dev to Prod: once you have the checks, here is how to move them between environments.
- Audit and clean up draft checks: the natural follow-up if you create with status: Draft.
- Team Permissions: the conceptual model behind the permissions table above.