
Bulk-create quality checks

Goal

You have a library of quality check definitions in YAML or JSON and want to create them all in one Qualytics datastore in a single command. This is the fastest way to onboard a set of standardized checks (for example, "every PII column must have isNotNull and matchesPattern") onto a new datastore, or to migrate handwritten checks from a previous data quality tool.

Permissions

The CLI calls POST /api/quality-checks once per check in the file. The endpoint requires:

Layer | Minimum | Notes
User role | Member | Manager and Admin also work.
Team permission on the target datastore | Author (for status: Active) or Drafter (for status: Draft) | The permission check is delegated to the create logic, which compares the check's status against your team permission.

Team membership is required

If you're a Member but not a member of the team that owns the target datastore, every create call will fail with 403 Forbidden. Either join the team or have a Manager create the checks for you.
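
The status-to-permission rule in the table can be sketched as a small pre-flight lookup. This is an illustration only: `required_team_permission` and `permissions_needed` are hypothetical helpers, not part of the Qualytics CLI or API, and the default of `Active` for a missing status is an assumption borrowed from the Python example later on this page.

```python
# Hypothetical helpers: map a check's status to the minimum team permission
# described in the table above. Not part of the Qualytics CLI or API.
REQUIRED_PERMISSION = {
    "Active": "Author",
    "Draft": "Drafter",
}


def required_team_permission(status: str) -> str:
    """Return the minimum team permission needed to create a check with this status."""
    try:
        return REQUIRED_PERMISSION[status]
    except KeyError:
        raise ValueError(f"Unknown status: {status!r}") from None


def permissions_needed(checks: list[dict]) -> set[str]:
    """Collect the permissions a file of checks requires (missing status assumed Active)."""
    return {required_team_permission(c.get("status", "Active")) for c in checks}
```

Running `permissions_needed` over a loaded file before calling the CLI tells you whether Drafter permission is enough or whether at least one entry needs Author.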

Prerequisites

  • The CLI is installed and authenticated (run qualytics doctor to confirm).
  • The target datastore exists and has been synced at least once, so the containers (orders, customers, etc.) the checks reference are known to Qualytics.
  • A YAML or JSON file containing one or more check definitions, such as the one below.

# checks/orders/all.yaml
- rule_type: isNotNull
  container: orders
  fields: [order_id]
  description: Order ID must not be null
  coverage: 1.0
  tags: [production, orders]
  status: Active

- rule_type: isUnique
  container: orders
  fields: [order_id]
  status: Active

- rule_type: satisfiesExpression
  container: orders
  fields: [total]
  properties:
    expression: "total >= 0"
  status: Active

Use container names, not IDs

Reference containers by name (container: orders) rather than by ID. The same file then applies to dev, staging, and prod even though their container IDs differ.
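
A quick portability check on the loaded definitions might look like the sketch below. It assumes the file has already been parsed into a list of dicts; `validate_checks` and the `REQUIRED_KEYS` tuple are hypothetical, and the set of required keys is inferred from the example file rather than from an official schema.

```python
# Assumed minimum keys, inferred from the example file above (not an official schema).
REQUIRED_KEYS = ("rule_type", "container")


def validate_checks(checks: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the file looks portable."""
    problems = []
    for i, check in enumerate(checks, start=1):
        for key in REQUIRED_KEYS:
            if key not in check:
                problems.append(f"check {i}: missing '{key}'")
        container = check.get("container")
        # A numeric container is almost certainly an environment-specific ID.
        if isinstance(container, int) or (isinstance(container, str) and container.isdigit()):
            problems.append(f"check {i}: container {container!r} looks like an ID; use the name")
    return problems
```

An empty result does not guarantee the API will accept every check; it only catches the name-versus-ID mistake and obviously incomplete entries before any call is made.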

CLI workflow

graph LR
    F[checks.yaml] --> CLI[qualytics checks create]
    CLI -->|per check| API[POST /api/quality-checks]
    API --> DS[(Target datastore)]
    CLI --> S[Per-check status output]

Run the command against the target datastore:

qualytics checks create \
    --datastore-id 42 \
    --file ./checks/orders/all.yaml

Sample output:

Loading 3 check definitions from ./checks/orders/all.yaml
[1/3] Creating isNotNull on orders.order_id ...  OK (id=501)
[2/3] Creating isUnique on orders.order_id ...   OK (id=502)
[3/3] Creating satisfiesExpression on orders.total ... OK (id=503)
3 created, 0 failed

Override the owner or default anomaly assignee for every check in the file:

qualytics checks create \
    --datastore-id 42 \
    --file ./checks/orders/all.yaml \
    --owner-id 18 \
    --default-anomaly-assignee-id 12

Behind the scenes

The CLI first resolves the target datastore and the container names, then makes one create call per check in the file:

CLI step | Method | Path | Notes
Resolve target datastore | GET | /api/datastores/{datastore_id} | Looks up the target so the CLI can resolve container names to IDs.
Resolve container names | GET | /api/containers?datastore_id={id} | One per unique container name in the file.
Create each check | POST | /api/quality-checks | One call per check. The body matches the YAML entry, with container_id resolved.

Failure at any step is surfaced inline; subsequent checks are still attempted.
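
The "one lookup per unique container name" behavior can be sketched as follows. This is not the CLI's actual implementation; `resolve_containers` is a hypothetical helper, and `lookup` stands in for whatever function performs the real GET /api/containers request.

```python
from typing import Callable


def resolve_containers(checks: list[dict], lookup: Callable[[str], int]) -> dict[str, int]:
    """Resolve each distinct container name once, preserving first-seen order."""
    resolved: dict[str, int] = {}
    for check in checks:
        name = check["container"]
        if name not in resolved:
            resolved[name] = lookup(name)  # one lookup per unique name
    return resolved
```

For the three-check example file, every entry references orders, so `lookup` would be called exactly once.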

Python equivalent

The same workflow can be done programmatically, which is useful when you want to integrate Qualytics into an existing Python automation that already loads YAML and you don't want to shell out:

import os
import httpx
import yaml

BASE_URL = os.environ["QUALYTICS_URL"].rstrip("/")
TOKEN    = os.environ["QUALYTICS_TOKEN"]
DATASTORE_ID = 42

headers = {"Authorization": f"Bearer {TOKEN}"}

def resolve_container_id(client: httpx.Client, name: str) -> int:
    r = client.get(
        f"{BASE_URL}/api/containers",
        params={"datastore_id": DATASTORE_ID, "name": name},
    )
    r.raise_for_status()
    items = r.json()
    if not items:
        raise RuntimeError(f"Container '{name}' not found in datastore {DATASTORE_ID}")
    return items[0]["id"]

def create_check(client: httpx.Client, check: dict) -> dict:
    payload = {
        "datastore_id": DATASTORE_ID,
        "container_id": resolve_container_id(client, check["container"]),
        "rule_type":   check["rule_type"],
        "fields":      check["fields"],
        "description": check.get("description"),
        "coverage":    check.get("coverage", 1.0),
        "filter":      check.get("filter"),
        "properties":  check.get("properties", {}),
        "tags":        check.get("tags", []),
        "status":      check.get("status", "Active"),
    }
    r = client.post(f"{BASE_URL}/api/quality-checks", json=payload)
    r.raise_for_status()
    return r.json()

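with open("checks/orders/all.yaml") as f:
    checks = yaml.safe_load(f)

# A file may hold a single check (a dict) or many checks (a list of dicts).
if isinstance(checks, dict):
    checks = [checks]

with httpx.Client(headers=headers, timeout=30.0) as client:
    for i, check in enumerate(checks, start=1):
        try:
            created = create_check(client, check)
            print(f"[{i}/{len(checks)}] OK id={created['id']}")
        except httpx.HTTPStatusError as e:
            print(f"[{i}/{len(checks)}] FAILED: {e.response.status_code} {e.response.text}")
        except (httpx.RequestError, RuntimeError, KeyError) as e:
            # Network errors, unknown containers, and missing keys are reported
            # inline so the remaining checks are still attempted.
            print(f"[{i}/{len(checks)}] FAILED: {e!r}")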

Why the CLI is still usually the right tool

The CLI gives you _qualytics_check_uid tracking, idempotent re-runs, dry-run preview, and per-team permission errors with friendly messages. The Python equivalent above is a starting point; it doesn't reproduce all of that. Reach for Python only when you genuinely need to embed the workflow into a larger application.

Variations and advanced usage

One file with all checks vs. one file per check

The file can be a single dict (one check) or a list of dicts (many checks). Many teams keep one file per check on disk and concatenate them at runtime, which makes git diffs precise:

yq ea '. as $i ireduce ([]; . + [$i])' checks/orders/*.yaml > /tmp/orders-all.yaml
qualytics checks create --datastore-id 42 --file /tmp/orders-all.yaml
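
If the concatenation happens inside a Python script rather than the shell, a minimal sketch could look like this. `merge_check_documents` is a hypothetical helper that works on already-parsed documents, so it is independent of whether the files were YAML or JSON; it also accepts files that already hold a list of checks.

```python
from typing import Any, Iterable


def merge_check_documents(documents: Iterable[Any]) -> list[dict]:
    """Flatten parsed documents (each a dict or a list of dicts) into one list of checks."""
    merged: list[dict] = []
    for doc in documents:
        if doc is None:  # an empty file parses to None
            continue
        if isinstance(doc, dict):
            merged.append(doc)
        elif isinstance(doc, list):
            merged.extend(doc)
        else:
            raise TypeError(f"Expected a dict or list, got {type(doc).__name__}")
    return merged
```

With PyYAML installed, the documents could come from something like `yaml.safe_load(p.read_text()) for p in sorted(Path("checks/orders").glob("*.yaml"))`; sorting the paths keeps the merged order stable between runs.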

Bulk-create across multiple datastores

checks create writes to one datastore at a time. To target several datastores, prefer checks import, which accepts a repeated --datastore-id flag and uses upsert semantics:

qualytics checks import \
    --datastore-id 42 \
    --datastore-id 43 \
    --datastore-id 44 \
    --input ./checks/

See Promote checks Dev to Prod for the full multi-environment flow.
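
When the list of target datastores comes from configuration rather than being typed by hand, the repeated flag can be assembled programmatically. The sketch below only builds the argument list, using the flags shown in the command above; `build_import_command` is a hypothetical helper.

```python
def build_import_command(datastore_ids: list[int], input_path: str) -> list[str]:
    """Build the argv for `qualytics checks import` with one --datastore-id per target."""
    if not datastore_ids:
        raise ValueError("At least one datastore ID is required")
    cmd = ["qualytics", "checks", "import"]
    for ds_id in datastore_ids:
        cmd += ["--datastore-id", str(ds_id)]
    cmd += ["--input", input_path]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.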

Drafts vs. active checks

If your governance flow requires human review before a check runs, set status: Draft in the YAML. Draft checks are created but skipped by scans until you activate them:

qualytics checks activate --ids "501,502,503"
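
If the IDs of the reviewed drafts are collected in a script, the comma-separated --ids value can be built from them. This is a sketch under that assumption; `build_activate_command` is a hypothetical helper that mirrors the command above.

```python
def build_activate_command(check_ids: list[int]) -> list[str]:
    """Build the argv for `qualytics checks activate` from a list of check IDs."""
    if not check_ids:
        raise ValueError("No check IDs to activate")
    ids = ",".join(str(check_id) for check_id in check_ids)
    return ["qualytics", "checks", "activate", "--ids", ids]
```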

Troubleshooting

Symptom | Likely cause | Fix
403 Forbidden on every create | You're not a member of the datastore's team, or your team permission is below Author | Run qualytics teams list and confirm membership; ask a Manager to grant Author permission.
Container 'X' not found | The datastore has not been synced, or the container name in the YAML is wrong | Run qualytics operations sync --datastore-id 42 first; confirm the name with qualytics containers list --datastore-id 42.
Some checks succeed, others return 422 | Rule-specific properties are missing or malformed | Read the error message; it names the offending field. The most common cause is a missing properties.expression for satisfiesExpression.
Every call hangs for 30s, then fails | Wrong QUALYTICS_URL, or a VPN is required | Run qualytics doctor to confirm reachability; check auth status.
400 Bad Request: status not allowed | You set status: Active but only have Drafter team permission | Either change to status: Draft, or escalate your team permission.
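
In a script built on the Python equivalent, the status codes from the table can be turned into inline hints. The mapping below is a sketch that paraphrases the table; `explain_failure` is a hypothetical helper, and the fallback message is an assumption for codes the table does not cover.

```python
# Hints paraphrased from the troubleshooting table above.
HINTS = {
    400: "Status not allowed: use status: Draft or ask for Author team permission.",
    403: "Confirm you belong to the datastore's team and hold Author permission.",
    422: "Rule-specific properties are missing or malformed; the error names the field.",
}


def explain_failure(status_code: int) -> str:
    """Return a troubleshooting hint for an HTTP status code from a failed create call."""
    return HINTS.get(status_code, "Unexpected status; inspect the response body.")
```

Printing `explain_failure(e.response.status_code)` alongside the raw response text keeps the original error visible while pointing at the likely fix.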