Daily triage automation

Goal

Automate the parts of anomaly triage that are clearly mechanical: archiving stale Active anomalies, auto-acknowledging known-good patterns, and routing fresh anomalies to the right on-call person. The goal is to keep the queue small enough that human attention is reserved for the cases that need it.

Permissions

Step                            Endpoint                       Role    Team permission
List anomalies                  GET /api/anomalies             Member  N/A
Bulk update                     PATCH /api/anomalies           Member  Editor
Bulk archive                    DELETE /api/anomalies?ids=...  Member  Author
Look up users (for assignment)  GET /api/users                 Member  N/A

Prerequisites

  • The CLI is installed and authenticated as a service account with the permissions above.
  • A scheduled runner (cron, systemd timer, or a CI scheduled workflow).
  • A clear policy for what's automatic vs. human (otherwise you'll automate away signal you needed).

Automate carefully

Auto-archiving "stale" anomalies is convenient until it hides a real but unattended problem. Pair every auto-archive rule with an alert that fires when archive volume spikes.
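One way to wire that guard, as a sketch: count the candidate ids before archiving and stop past a threshold. `OLD_IDS` refers to the comma-separated list built by the archive script below; the threshold of 50 is an assumption to tune per datastore.

```shell
#!/usr/bin/env bash
# Guard sketch: bail out (and alert) instead of archiving a suspiciously
# large batch. THRESHOLD is an assumed per-datastore tuning knob.
THRESHOLD=50

count_ids() {                      # count a comma-separated id list; "" is 0
    [[ -z "${1:-}" ]] && { echo 0; return; }
    echo "$1" | tr ',' '\n' | wc -l | tr -d ' '
}

N=$(count_ids "${OLD_IDS:-}")
if (( N > THRESHOLD )); then
    echo "refusing to auto-archive $N anomalies (> $THRESHOLD); page a human" >&2
    # exit 1   # or post to your alerting webhook here instead
fi
```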

CLI workflow

graph LR
    Cron[Daily cron] --> Stale[Archive stale Active >30d]
    Cron --> Routine[Auto-acknowledge known patterns]
    Cron --> Fresh[Assign fresh ones to on-call]
    Cron --> Report[Email summary]

Auto-archive stale Active anomalies

#!/usr/bin/env bash
set -euo pipefail

THIRTY_DAYS_AGO=$(date -u -d '30 days ago' +%Y-%m-%d 2>/dev/null \
                || date -u -v-30d +%Y-%m-%d)  # macOS fallback

OLD_IDS=$(qualytics anomalies list \
    --datastore-id 42 \
    --status Active \
    --end-date "$THIRTY_DAYS_AGO" \
    --format json \
  | jq -r '.[].id' | paste -sd, -)

if [[ -n "$OLD_IDS" ]]; then
    qualytics anomalies archive --ids "$OLD_IDS" --status Discarded
    echo "Archived $(echo "$OLD_IDS" | tr ',' '\n' | wc -l) stale anomalies"
fi

Auto-acknowledge a known-good pattern

If a specific check generates anomalies you always investigate but never resolve until much later, auto-acknowledge them so they leave the unattended queue but stay visible:

: "${ONCALL_USER_ID:?export the user id that should own auto-acked anomalies}"

ACK_IDS=$(qualytics anomalies list \
    --datastore-id 42 \
    --check-id 555 \
    --status Active \
    --format json | jq -r '.[].id' | paste -sd, -)

if [[ -n "$ACK_IDS" ]]; then
    qualytics anomalies update \
        --ids "$ACK_IDS" \
        --status Acknowledged \
        --description "Auto-acknowledged for check 555 (latency-sensitive; investigated weekly)" \
        --assignee-ids "$ONCALL_USER_ID" \
        --tags "auto-ack"
fi

Route fresh anomalies to the current on-call

TODAY=$(date -u +%Y-%m-%d)
ONCALL=$(./scripts/lookup-oncall-user-id.sh)  # however you resolve this

NEW_IDS=$(qualytics anomalies list \
    --datastore-id 42 \
    --status Active \
    --start-date "$TODAY" \
    --format json | jq -r '.[].id' | paste -sd, -)

if [[ -n "$NEW_IDS" ]]; then
    # Status stays Active; this update only changes the assignee.
    qualytics anomalies update \
        --ids "$NEW_IDS" \
        --status Active \
        --assignee-ids "$ONCALL"
fi

Cron entry

# Run daily at 6 AM
# Load credentials from a restricted file instead of inline:
#   echo 'export QUALYTICS_TOKEN=...' > /etc/qualytics-secrets && chmod 600 /etc/qualytics-secrets
0 6 * * * cd /opt/qualytics && . /etc/qualytics-secrets && QUALYTICS_NO_BANNER=1 ./triage.sh >> /var/log/qualytics-triage.log 2>&1

Behind the scenes

Same endpoints as Bulk anomaly triage, called on a schedule:

Action                       Method  Path
List by date / status        GET     /api/anomalies?...
Bulk acknowledge / reassign  PATCH   /api/anomalies
Bulk archive                 DELETE  /api/anomalies?ids=...&archive=true&status=...
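To make the wire format concrete, the archive call can be sketched with curl. The base URL fallback, the token variable, and the example ids are assumptions; nothing is actually sent here.

```shell
# Construct the same DELETE the CLI issues behind the scenes.
BASE="${QUALYTICS_URL:-https://acme.qualytics.io}"
URL="$BASE/api/anomalies?ids=101,102&archive=true&status=Discarded"
echo "DELETE $URL"
# To actually send it:
#   curl -s -X DELETE -H "Authorization: Bearer $QUALYTICS_TOKEN" "$URL"
```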

Python equivalent

A more sophisticated triage script is much easier in Python than Bash. This one auto-archives stale anomalies and routes fresh ones to the on-call, with a final summary:

import os
import httpx
from datetime import datetime, timedelta, timezone

BASE_URL = os.environ["QUALYTICS_URL"].rstrip("/")
TOKEN    = os.environ["QUALYTICS_TOKEN"]
HEADERS  = {"Authorization": f"Bearer {TOKEN}"}
DATASTORE_ID = 42
ONCALL_USER_ID = 18

today        = datetime.now(timezone.utc).date()
stale_cutoff = (today - timedelta(days=30)).isoformat()
today_iso    = today.isoformat()

with httpx.Client(headers=HEADERS, timeout=30.0) as client:
    # Stale archive
    resp = client.get(f"{BASE_URL}/api/anomalies", params={
        "datastore_id": DATASTORE_ID, "status": "Active",
        "end_date":     stale_cutoff,
    })
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    stale_ids = [a["id"] for a in resp.json()]
    if stale_ids:
        client.request(
            "DELETE", f"{BASE_URL}/api/anomalies",
            params={"ids": ",".join(map(str, stale_ids)),
                    "archive": "true", "status": "Discarded"},
        ).raise_for_status()

    # Route fresh
    resp = client.get(f"{BASE_URL}/api/anomalies", params={
        "datastore_id": DATASTORE_ID, "status": "Active",
        "start_date":   today_iso,
    })
    resp.raise_for_status()
    fresh_ids = [a["id"] for a in resp.json()]
    if fresh_ids:
        client.patch(f"{BASE_URL}/api/anomalies", json={
            "ids": fresh_ids, "status": "Active",
            "assignee_ids": [ONCALL_USER_ID],
        }).raise_for_status()

    print(f"archived {len(stale_ids)} stale, assigned {len(fresh_ids)} fresh to user {ONCALL_USER_ID}")

Variations and advanced usage

Slack the daily summary

Pipe a humanized summary into a webhook:

SUMMARY=$(qualytics anomalies list --datastore-id 42 --status Active --format json \
          | jq -r 'group_by(.container.name) | map("\(.[0].container.name): \(length)") | join("\n")')

curl -s -X POST -H 'Content-Type: application/json' \
     -d "$(jq -n --arg s "$SUMMARY" '{text: ("Active anomalies by container:\n" + $s)}')" \
     "$SLACK_WEBHOOK"

Open Jira tickets for fresh high-severity anomalies

If you've configured Jira via the Ticketing integration, tagged anomalies can flow into Jira automatically through Flows. The CLI's job is then just routing and tagging.
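A hedged sketch of the CLI half: pick out the ids worth a ticket, then tag them so the Flow reacts. The `weight` field, the threshold of 8, and the `jira-intake` tag name are all assumptions to adapt to your instance.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Filter a JSON anomaly list down to the ids at or above a severity floor.
high_severity_ids() {
    jq -r --argjson floor "$1" '.[] | select(.weight >= $floor) | .id' \
        | paste -sd, -
}

# FRESH_JSON=$(qualytics anomalies list --datastore-id 42 --status Active \
#     --start-date "$(date -u +%Y-%m-%d)" --format json)
# IDS=$(printf '%s' "$FRESH_JSON" | high_severity_ids 8)
# [[ -n "$IDS" ]] && qualytics anomalies update --ids "$IDS" --tags "jira-intake"
```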

Per-team triage rules

Run the same script with different filters per team, each with its own assignees and on-call user. Use --container-tags to scope.
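A minimal sketch of that pattern; the team tags (payments, ingest) and the on-call user ids are placeholders for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# team:on-call-user-id pairs; both sides are placeholders.
TEAMS="payments:18 ingest:27"

triage_team() {   # $1 = container tag, $2 = assignee user id
    echo "triaging containers tagged '$1' -> assignee $2"
    # qualytics anomalies list --datastore-id 42 --status Active \
    #     --container-tags "$1" --format json | jq -r '.[].id' | paste -sd, -
    # ...then update/archive with the team's own rules, as in the scripts above.
}

for pair in $TEAMS; do
    triage_team "${pair%%:*}" "${pair##*:}"
done
```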

Troubleshooting

  • Cron job runs but produces empty output. Likely cause: the CLI banner is being captured ahead of the script's real output. Fix: set QUALYTICS_NO_BANNER=1 and CI=true in the cron environment.
  • Daily summary always shows zeros. Likely cause: wrong timezone in date (cron uses the system TZ). Fix: anchor to UTC explicitly with date -u +%Y-%m-%d.
  • Auto-archive surge after a deploy. Likely cause: real anomalies got swept up. Fix: tighten the filter to archive only anomalies whose --check-id is in a known-noisy list, not everything stale.
  • Bash script fails on macOS but works on Linux. Likely cause: date -d '30 days ago' is GNU-specific. Fix: use date -u -v-30d +%Y-%m-%d on macOS, or run in a Linux container.