Onboard a single datastore

Goal

You have a database (Postgres, Snowflake, BigQuery, etc.) you want Qualytics to start monitoring. This page walks through the full sequence: create a connection, create a datastore, run the first sync, profile, and scan, and review the first set of anomalies.

Permissions

| Step | Endpoint | Role | Team permission |
| --- | --- | --- | --- |
| Create connection | POST /api/connections | Manager | N/A (system-wide) |
| Test connection | POST /api/connections/{id}/test | Manager | N/A |
| Create datastore | POST /api/datastores | Manager | N/A |
| Run sync / profile / scan | POST /api/operations/run | Member | Editor on the datastore's team |
| List anomalies | GET /api/anomalies | Member | Reporter |

You need the Manager role to create connections and datastores

A plain Member cannot onboard a new source. Either escalate temporarily or have a Manager run the first three steps and a Member run the operations afterwards.

Prerequisites

  • The CLI is installed and authenticated (see Installing Qualytics CLI).
  • Network connectivity from the Qualytics instance to the source database (test from a VM in the same network if unsure).
  • Database credentials with at least SELECT on the schemas you plan to monitor. See Available Connectors for per-database minimum permissions.
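A quick way to confirm the network prerequisite is a plain TCP connect attempted from a machine in the same network as the Qualytics instance. A minimal sketch (the host and port are placeholders for your own database):

```python
# Minimal TCP reachability check; run it from the same network as the
# Qualytics instance. Host and port are placeholders for your database.
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach("warehouse.example.com", 5432)` should return True before you attempt `connections create`; if it doesn't, fix the network path first.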

CLI workflow

graph LR
    C[1. Create connection] --> T[2. Test connection]
    T --> D[3. Create datastore]
    D --> S[4. Sync]
    S --> P[5. Profile]
    P --> SC[6. Scan]
    SC --> A[7. Review anomalies]

1. Create the connection

Put credentials in environment variables so they aren't recorded in shell history:

export DB_HOST=warehouse.example.com
export DB_USER=qualytics_reader
export DB_PASSWORD='S3cur3p@ss'

qualytics connections create \
    --type postgresql \
    --name "warehouse-prod-db" \
    --host '${DB_HOST}' \
    --port 5432 \
    --username '${DB_USER}' \
    --password '${DB_PASSWORD}'

The single quotes around '${DB_HOST}' are intentional. They prevent the local shell from expanding the variable; the CLI does the expansion at runtime. Secrets never land on disk in plaintext.
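Conceptually, this deferred expansion works like Python's `string.Template`. A hypothetical sketch of the mechanism (not the CLI's actual code):

```python
# Hypothetical sketch of deferred ${VAR} expansion, not the CLI's real code:
# the shell passes '${DB_HOST}' through literally, and the receiving process
# resolves the placeholder from its own environment at runtime.
import os
from string import Template

def expand(value: str) -> str:
    # Substitute ${NAME} placeholders from the process environment.
    return Template(value).substitute(os.environ)
```

With `DB_HOST=warehouse.example.com` exported, `expand("${DB_HOST}")` resolves to the hostname inside the process, while the literal placeholder is all that ever appears in shell history.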

2. Test the connection

qualytics connections test --id 17

A successful test returns the database version and the reachable schemas. Replace 17 with the connection id returned when you created the connection.

3. Create the datastore

qualytics datastores create \
    --name "warehouse-prod" \
    --connection-name "warehouse-prod-db" \
    --database "analytics" \
    --schema "public" \
    --tags "production,warehouse"

4. Sync the catalog

qualytics operations sync --datastore-id 42

sync populates the list of tables and schemas Qualytics knows about. It must succeed before profile or scan will return useful results.

5. Profile the data

qualytics operations profile --datastore-id 42 --ai-effort medium

Profile inspects each container, infers per-field statistics, and, when --ai-effort is set, suggests quality checks based on observed patterns.

6. Run the first scan

qualytics operations scan --datastore-id 42

scan runs every active check and creates an anomaly for each violation.

7. Review the anomalies

qualytics anomalies list --datastore-id 42 --status Active

Behind the scenes

| CLI step | Method | Path |
| --- | --- | --- |
| connections create | POST | /api/connections |
| connections test | POST | /api/connections/{connection_id}/test |
| datastores create | POST | /api/datastores |
| operations sync/profile/scan | POST | /api/operations/run (with type: sync \| profile \| scan in the body) |
| operations get (polling) | GET | /api/operations/{operation_id} |
| anomalies list | GET | /api/anomalies |

Python equivalent

import os
import time
import httpx

BASE_URL = os.environ["QUALYTICS_URL"].rstrip("/")
TOKEN    = os.environ["QUALYTICS_TOKEN"]
HEADERS  = {"Authorization": f"Bearer {TOKEN}"}

def run_operation(client, datastore_id: int, op_type: str) -> dict:
    """Trigger an operation and poll until it completes."""
    r = client.post(
        f"{BASE_URL}/api/operations/run",
        json={"type": op_type, "datastore_id": datastore_id},
    )
    r.raise_for_status()
    op_id = r.json()["id"]

    while True:
        r = client.get(f"{BASE_URL}/api/operations/{op_id}")
        r.raise_for_status()
        op = r.json()
        if op["result"] in ("success", "failure", "aborted"):
            return op
        time.sleep(5)

with httpx.Client(headers=HEADERS, timeout=60.0) as client:
    # 1. Connection
    r = client.post(f"{BASE_URL}/api/connections", json={
        "type": "postgresql",
        "name": "warehouse-prod-db",
        "host": os.environ["DB_HOST"],
        "port": 5432,
        "username": os.environ["DB_USER"],
        "password": os.environ["DB_PASSWORD"],
    })
    r.raise_for_status()
    connection_id = r.json()["id"]

    # 2. Test
    client.post(f"{BASE_URL}/api/connections/{connection_id}/test").raise_for_status()

    # 3. Datastore
    r = client.post(f"{BASE_URL}/api/datastores", json={
        "name": "warehouse-prod",
        "connection_id": connection_id,
        "database": "analytics",
        "schema": "public",
        "tags": ["production", "warehouse"],
    })
    r.raise_for_status()
    datastore_id = r.json()["id"]

    # 4-6. Operations pipeline; each step must succeed before the next runs
    for op_type in ("sync", "profile", "scan"):
        op = run_operation(client, datastore_id, op_type)
        print(f"{op_type:>8}: {op['result']}")
        if op["result"] != "success":
            raise SystemExit(f"{op_type} did not succeed; aborting pipeline")

Variations and advanced usage

DFS (S3 / GCS / Azure) instead of JDBC

The pattern is identical; only the connection type and fields change:

qualytics connections create \
    --type amazon-s3 \
    --name "data-lake-prod" \
    --uri 's3://my-bucket/data/' \
    --access-key '${AWS_ACCESS_KEY}' \
    --secret-key '${AWS_SECRET_KEY}'

IAM role authentication

For S3, Athena, and Redshift, prefer an IAM role over static credentials:

qualytics connections create \
    --type amazon-s3 \
    --name "data-lake-prod" \
    --uri 's3://my-bucket/data/' \
    --authentication-type IAM_ROLE \
    --role-arn arn:aws:iam::123456789012:role/QualyticsReader \
    --external-id qualytics-prod-external-id

See IAM Role Authentication.

Dry-run before creating

qualytics connections create --type postgresql ... --dry-run
qualytics datastores create --name ... --connection-name ... --dry-run

--dry-run prints the exact API payload without making the call. Useful for review or to copy into Python.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| connections test fails with Connection refused | Network path missing (no VPN, security group, or peering) | Test connectivity from the same network the Qualytics instance runs in. |
| connections test fails with authentication failed | Wrong username/password, or a missing privilege | Verify the credentials with a SQL client outside Qualytics first. |
| sync succeeds but no containers appear | Schema name typo, or no tables in the schema | Confirm with qualytics datastores get --id 42; the schema is part of the datastore record. |
| profile runs forever on huge tables | Default per-partition record limit is too low or unset | Set --max-records-analyzed-per-partition; see Operations. |
| scan returns "no checks to run" | Profile produced zero AI Managed checks | Re-run with qualytics operations profile --ai-effort high, or bulk-create quality checks manually. |