# Daily sync, profile, and scan

## Goal
Run the standard data quality pipeline on a schedule: refresh the catalog (sync), refresh statistics and regenerate AI Managed checks (profile), then run all active checks against current data (scan). This is the recurring workflow most production users hit every day.
## Permissions

| Step | Endpoint | Role | Team permission |
|---|---|---|---|
| Trigger any operation | `POST /api/operations/run` | Member | Editor on the datastore's team |
| Poll operation status | `GET /api/operations/{id}` | Member | Reporter |
| List anomalies after the scan | `GET /api/anomalies` | Member | Reporter |
## Prerequisites

- The datastore exists and has been synced at least once.
- Your token has `Editor` permission on the datastore's owning team.
- The datastore has at least one active check (otherwise `scan` runs without producing anomalies). See Bulk-create quality checks.
## CLI workflow

```mermaid
graph LR
    Sync[1. sync] --> Profile[2. profile]
    Profile --> Scan[3. scan]
    Scan --> Review[4. List anomalies]
```
### Foreground

Good for ad-hoc runs and CI jobs you want to gate on success:

```shell
qualytics operations sync --datastore-id 42
qualytics operations profile --datastore-id 42
qualytics operations scan --datastore-id 42
qualytics anomalies list --datastore-id 42 --status Active
```
Each operation prints progress while it runs and exits non-zero on failure.
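That non-zero exit is all a CI gate needs. The same chaining can be sketched in Python; the argv lists mirror the commands above, and `run_pipeline` is an illustrative helper, not part of the CLI:

```python
import subprocess


def run_pipeline(commands):
    """Run each argv list in order; stop at the first non-zero exit.

    Mirrors `cmd1 && cmd2 && cmd3`: returns True only if every step succeeded.
    """
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            return False
    return True


# The three pipeline steps from above, as argv lists:
PIPELINE = [
    ["qualytics", "operations", "sync", "--datastore-id", "42"],
    ["qualytics", "operations", "profile", "--datastore-id", "42"],
    ["qualytics", "operations", "scan", "--datastore-id", "42"],
]
```

Call `run_pipeline(PIPELINE)` from a CI step and exit non-zero on `False` to fail the job.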
### Background

Good when you don't need to wait or you want parallelism:

```shell
qualytics operations sync --datastore-id 42 --background
qualytics operations profile --datastore-id 42 --background
qualytics operations scan --datastore-id 42 --background
```
Then check status whenever you want:
### Across many datastores in one command

`operations sync`/`profile`/`scan` accept a comma-separated list of datastore IDs:

```shell
qualytics operations sync --datastore-id 42,43,44
qualytics operations profile --datastore-id 42,43,44 --ai-effort medium
qualytics operations scan --datastore-id 42,43,44
```
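On the wire, the comma-separated flag becomes a single `datastore_ids` array in the `POST /api/operations/run` body. A sketch of that translation, with `parse_ids` and `build_run_payload` as hypothetical helpers (not part of the CLI or API client):

```python
def parse_ids(raw: str) -> list[int]:
    """Split a comma-separated ID string, as the --datastore-id flag accepts."""
    return [int(part) for part in raw.split(",") if part.strip()]


def build_run_payload(op_type: str, raw_ids: str) -> dict:
    """Request body for POST /api/operations/run: type plus datastore_ids."""
    if op_type not in ("sync", "profile", "scan"):
        raise ValueError(f"unsupported operation type: {op_type}")
    return {"type": op_type, "datastore_ids": parse_ids(raw_ids)}


print(build_run_payload("sync", "42,43,44"))
```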
## Behind the scenes

| CLI step | Method | Path | Notes |
|---|---|---|---|
| `operations sync`/`profile`/`scan` (start) | POST | `/api/operations/run` | Body includes `type` (`sync`, `profile`, `scan`) and `datastore_ids`. Returns the operation ID. |
| Poll status | GET | `/api/operations/{operation_id}` | The CLI polls every `--poll-interval` seconds (default 5) up to `--timeout` (default 1800). |
| List anomalies | GET | `/api/anomalies` | Filterable by datastore, status, container, tag, dates. |
Each operation runs asynchronously on the Qualytics side; the CLI is just polling its status.
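That polling loop can be factored out so the HTTP fetch is injected as a callable. This is a sketch of the logic only, not the CLI's actual implementation; the terminal `result` values and the 5 s / 1800 s defaults come from the table above:

```python
import time


def wait_for_operation(fetch_status, poll_interval=5.0, timeout=1800.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until the operation reaches a terminal result.

    fetch_status() returns the operation dict from GET /api/operations/{id};
    "success", "failure", and "aborted" are treated as terminal.
    """
    deadline = clock() + timeout
    while True:
        op = fetch_status()
        if op.get("result") in ("success", "failure", "aborted"):
            return op
        if clock() >= deadline:
            raise TimeoutError(f"operation still running after {timeout}s")
        sleep(poll_interval)
```

Injecting `clock` and `sleep` keeps the loop unit-testable without real waiting.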
## Python equivalent

```python
import os
import time

import httpx

BASE_URL = os.environ["QUALYTICS_URL"].rstrip("/")
TOKEN = os.environ["QUALYTICS_TOKEN"]
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
DATASTORE_ID = 42


def run_op(client, op_type: str, timeout: int = 1800) -> dict:
    r = client.post(
        f"{BASE_URL}/api/operations/run",
        json={"type": op_type, "datastore_ids": [DATASTORE_ID]},
    )
    r.raise_for_status()
    op_id = r.json()["id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        op = client.get(f"{BASE_URL}/api/operations/{op_id}").json()
        if op["result"] in ("success", "failure", "aborted"):
            return op
        time.sleep(5)
    raise TimeoutError(f"{op_type} did not finish within {timeout}s")


with httpx.Client(headers=HEADERS, timeout=60.0) as client:
    for op_type in ("sync", "profile", "scan"):
        result = run_op(client, op_type)
        print(f"{op_type:>8}: {result['result']}")

    anomalies = client.get(
        f"{BASE_URL}/api/anomalies",
        params={"datastore_id": DATASTORE_ID, "status": "Active"},
    ).json()
    print(f"{len(anomalies)} active anomalies")
```
## Variations and advanced usage

### AI-assisted profiling

```shell
qualytics operations profile --datastore-id 42 --ai-effort high
qualytics operations profile --datastore-id 42 --ai-effort high --infer-as-draft
```

`--infer-as-draft` means new AI Managed checks land in Draft status, awaiting human review. Use it whenever a person should sign off on the generated rules before they start producing anomalies. See AI Managed Checks for the full conceptual model.
### Auto-resolve fixed anomalies

When a previously failing check passes again, you can optionally close out its anomaly automatically; see the Operations command reference for the relevant `scan` flag.
### Long timeouts for huge datastores

The default operation timeout is 30 minutes (`--timeout 1800`). For warehouses with billions of rows, pass a larger ceiling, e.g. `--timeout 7200` for two hours. Or run with `--background` and don't tie up the shell at all.
### Targeted runs

Don't profile or scan the whole datastore if only a few tables changed:

```shell
qualytics operations scan --datastore-id 42 --container-names "orders,customers"
qualytics operations scan --datastore-id 42 --container-tags "critical"
```

See Targeted scans by container or tag.
### Crontab on Linux/macOS

```shell
# Every day at 3 AM, run the full pipeline on datastore 42.
# Load credentials from a restricted file instead of inline:
#   echo 'export QUALYTICS_TOKEN=...' > /etc/qualytics-secrets && chmod 600 /etc/qualytics-secrets
# Braces group the three steps so the log redirect captures all of them,
# not just the last command.
0 3 * * * . /etc/qualytics-secrets && export QUALYTICS_NO_BANNER=1 && { /usr/local/bin/qualytics operations sync --datastore-id 42 && /usr/local/bin/qualytics operations profile --datastore-id 42 && /usr/local/bin/qualytics operations scan --datastore-id 42; } >> /var/log/qualytics.log 2>&1
```
For more sophisticated scheduling and exports, see Scheduled metadata exports.
## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Operation times out at 30 minutes | Default timeout reached | Add `--timeout 7200` or run with `--background`. |
| `403 Forbidden` on `operations run` | Missing Editor team permission | Confirm team membership; ask a team admin to grant Editor. |
| `scan` reports "no checks to run" | The datastore has no active checks | Confirm with `qualytics checks list --datastore-id 42 --status Active`; if empty, profile with `--ai-effort high` or create checks manually. |
| `profile` runs forever | No record limit is set and the table is huge | Set `--max-records-analyzed-per-partition`; see Operations. |
| Anomaly count is huge after the first scan | The first scan after profile inference can produce many anomalies | Triage with Bulk anomaly triage. |
## Related

- Operations command reference: every flag for `sync`, `profile`, `scan`, `materialize`, `export`.
- Targeted scans by container or tag: run on subsets, not the whole datastore.
- Incremental scans for large tables: scan only what changed since the last run.
- Bulk anomaly triage: handling the output of scans.