Scan FAQ

Answers to common questions about the Scan Operation, grouped by topic. For step-by-step instructions, see the how-tos; for conceptual references, see the deep dive.

Read strategy

When should I choose Incremental over Full?

Use Incremental for routine scans where only new or changed records need to be re-validated. It reads less data, finishes faster, and is the right choice for high-frequency post-load runs. Use Full when you need a complete re-validation across the entire dataset, when checks were edited and you want every record re-evaluated, or when you want Auto-Resolve to clear anomalies that no longer reproduce. See Read Strategies for the conceptual reference.

What happens if a selected container doesn't have an incremental identifier?

The scan falls back to a Full read on that specific container, even when the Read Strategy is set to Incremental. Containers in the same scan that do have an incremental identifier still use Incremental; the fallback is per container, not per scan.
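The per-container fallback can be pictured with a small sketch. This is an illustration only, not the product's implementation; the `Container` class, its `incremental_field` attribute, and `effective_strategy` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Container:
    name: str
    # Hypothetical attribute: the field used as incremental identifier, if any.
    incremental_field: Optional[str] = None


def effective_strategy(requested: str, container: Container) -> str:
    """Return the read strategy that would apply to a single container."""
    if requested == "incremental" and container.incremental_field is None:
        # The fallback is decided per container, not for the whole scan.
        return "full"
    return requested


containers = [Container("orders", "updated_at"), Container("lookup_codes")]
plan = {c.name: effective_strategy("incremental", c) for c in containers}
print(plan)
```

In this sketch only `lookup_codes` falls back to a Full read; `orders` stays Incremental within the same scan.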

Does the first Incremental scan really run a Full scan?

Yes. The first Incremental scan has no prior baseline, so it processes every record in the container and stores the highest incremental identifier value. From the second run on, only records with a higher identifier are processed.
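The baseline behavior resembles a high-water mark. The sketch below is a simplified model under that assumption, with hypothetical names (`incremental_read`, `baseline`); it does not describe how the scan engine stores or compares identifiers internally.

```python
from typing import Any, List, Optional, Tuple

Record = Tuple[int, Any]  # (incremental identifier, payload)


def incremental_read(
    records: List[Record], baseline: Optional[int]
) -> Tuple[List[Record], Optional[int]]:
    """Select records above the baseline and return the updated baseline."""
    # With no baseline (first run), every record is selected.
    selected = [r for r in records if baseline is None or r[0] > baseline]
    if selected:
        baseline = max(r[0] for r in selected)
    return selected, baseline


rows: List[Record] = [(1, "a"), (2, "b"), (3, "c")]
first_run, mark = incremental_read(rows, None)    # processes all three rows
rows.append((4, "d"))
second_run, mark = incremental_read(rows, mark)   # processes only the new row
```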

What is a Starting Threshold and when do I need one?

The Starting Threshold is an optional value (a timestamp for time-based incremental, a numeric value for batch-based) that tells the scan where to start reading. You only need it when you want to override the automatic baseline, for example after a backfill where you want to re-scan a specific range of history.

Auto Resolve Anomalies

Why didn't Auto-Resolve run on my Incremental scan?

Auto-Resolve only runs after a successful Full scan. Incremental scans, by definition, do not read every record in the container, so they cannot confirm whether records that previously caused an open anomaly still violate the check. To prevent false resolutions, the Auto Resolve Anomalies toggle is hidden during configuration of Incremental scans, and any value sent through the API is forced off before the scan starts.

To auto-resolve anomalies, run the scan with the Full read strategy.

Can I disable Auto Resolve Anomalies for a specific scan?

Yes. The Auto Resolve Anomalies toggle in the Scan Settings step controls Auto-Resolve per scan run, and the same field is exposed on schedule create/update. Setting it to off keeps existing open anomalies untouched after the scan finishes. See Scan Settings.

Which anomalies are auto-resolved?

Only anomalies currently in Active or Acknowledged status are eligible. Anomalies already in Resolved, Invalid, Duplicate, or Discarded are left untouched. For an anomaly to be resolved, all the checks that originally flagged it must run successfully in this scan, and none of those checks may raise the same issue against the same fingerprint again. See Auto-Resolve on Full Scans for the full eligibility rules.

How is the resolving scan attributed to the anomaly?

When a scan auto-resolves an anomaly, the resolution is recorded as a status change in the anomaly's history. The change is attributed to Qualytics with the scan operation referenced, so analysts can open the history of any auto-resolved anomaly and trace it back to the scan that produced the resolution.

What if a check that previously flagged an anomaly isn't included in the new scan?

The anomaly is left as-is. Auto-Resolve requires every check that originally flagged the anomaly to run successfully in the current scan; if even one of those checks was skipped, archived, or excluded by the table or category selection, the anomaly is not resolved. This protects against accidental resolutions when the scope of a scan narrows.
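The eligibility rules above can be summarized as a predicate. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and "re-flagged" stands for a check raising the same issue against the same fingerprint in the current scan.

```python
from typing import Set

OPEN_STATUSES = {"Active", "Acknowledged"}


def should_auto_resolve(
    status: str,
    flagging_checks: Set[str],
    succeeded_checks: Set[str],
    reflagged_checks: Set[str],
) -> bool:
    """Decide whether one anomaly would be auto-resolved by a Full scan."""
    if status not in OPEN_STATUSES:
        return False  # already Resolved, Invalid, Duplicate, or Discarded
    if not flagging_checks <= succeeded_checks:
        return False  # at least one original check was skipped or failed
    # None of the original checks may raise the same issue again.
    return not (flagging_checks & reflagged_checks)


# Both original checks ran and neither re-flagged the anomaly.
print(should_auto_resolve("Active", {"c1", "c2"}, {"c1", "c2", "c3"}, set()))
# Check c2 was excluded from the scan, so the anomaly is left as-is.
print(should_auto_resolve("Active", {"c1", "c2"}, {"c1"}, set()))
```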

Does Auto-Resolve change the anomaly's status to Resolved or to something else?

Resolved. The auto-resolved anomaly's final status is Resolved, the same status used by manual resolution and Flow actions. Downstream Flows and reports that filter on Resolved status will include auto-resolved anomalies.

Anomaly handling

What's the difference between Archive Duplicate Anomalies and Reactivate Recurring Anomalies?

Archive Duplicate Anomalies sends a brand-new anomaly that matches an existing open one straight to archived status, keeping the open list focused on truly new findings. Reactivate Recurring Anomalies handles the reverse case: when a new anomaly matches an archived one, the original is reactivated (its status moves back to Active) instead of staying archived. Together they cover opposite sides of the same de-duplication problem.
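One way to picture how the two settings interact is a small decision function. This is a rough sketch: the function name, the `"open"`/`"archived"` labels, and the returned action strings are all hypothetical, and the behavior when a setting is off (a regular Active anomaly is created) is an assumption rather than something stated here.

```python
from typing import Optional


def handle_new_finding(
    existing: Optional[str],
    archive_duplicates: bool,
    reactivate_recurring: bool,
) -> str:
    """Return the action taken for a new finding.

    `existing` is None when no earlier anomaly matches, "open" when a
    matching anomaly is still open, or "archived" when the match is archived.
    """
    if existing == "open" and archive_duplicates:
        return "archive_new"          # new anomaly goes straight to archived
    if existing == "archived" and reactivate_recurring:
        return "reactivate_original"  # original moves back to Active
    return "create_active"            # assumed default behavior


print(handle_new_finding("open", True, True))
print(handle_new_finding("archived", True, True))
```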

Why are anomaly counts higher than I expected after enabling Reactivate Recurring?

Reactivate Recurring moves anomalies back from Archived to Active when the same issue resurfaces. If a check repeatedly flags the same fingerprint over time, the same anomaly will keep flipping back to Active rather than producing new entries. Check the anomaly's history to confirm whether it has been reactivated multiple times.

Does increasing Maximum Source Examples per Anomaly affect storage?

Yes. Source examples are written to the Enrichment Datastore at scan time. A limit of 10,000 captures up to 1,000x more rows per anomaly than the default of 10, and storage in the Enrichment Datastore grows proportionally. Use a higher limit only when you actually need many examples for debugging, then lower it again.

Can I change Maximum Source Examples after the scan finished?

No. The limit applies at scan time, so once the scan finishes the captured source records are fixed. To capture more, raise the limit and re-run the scan.

How does Maximum Record Anomalies per Check work?

When a single check produces more anomalies than the configured limit in one container, the additional anomalies are rolled up into a single anomaly that preserves the total count. This keeps the open anomaly list manageable while still recording the magnitude of the violation.
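The roll-up can be modeled roughly as follows. The shape of the roll-up record (`rolled_up`, `total_count`) is hypothetical, as is whether the roll-up anomaly counts toward the limit; the sketch only illustrates that the overflow collapses into one entry while the magnitude is preserved.

```python
from typing import Dict, List, Optional, Tuple


def apply_record_anomaly_limit(
    anomalies: List[str], limit: int
) -> Tuple[List[str], Optional[Dict[str, int]]]:
    """Keep up to `limit` anomalies and roll the rest into a single record."""
    if len(anomalies) <= limit:
        return list(anomalies), None
    kept = list(anomalies[:limit])
    rollup = {
        "rolled_up": len(anomalies) - limit,  # anomalies folded into one entry
        "total_count": len(anomalies),        # magnitude of the violation
    }
    return kept, rollup


found = [f"row-{i}" for i in range(250)]
kept, rollup = apply_record_anomaly_limit(found, limit=100)
print(len(kept), rollup)
```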

Scheduling

What timezone do scheduled scans use if I don't pick one?

UTC. It is the default for every schedule, and schedules created before timezone support was added continue to run in UTC unless explicitly edited.

Can a deactivated schedule be reactivated without re-entering the cron expression?

Yes. Deactivating a schedule keeps its cron expression on file, so reactivating it later resumes the same schedule without any additional setup. Exception: schedules deactivated before May 7, 2026 may need the cron expression re-entered once after reactivation.

How does Daylight Saving Time affect scheduled scans?

Schedules in DST-observant timezones (for example, America/New_York) automatically shift across DST transitions. A job set to 9:00 AM in America/New_York runs at 9:00 AM local time year-round, regardless of whether the zone is in EST or EDT.
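The effect is easy to verify with Python's standard `zoneinfo` module. The snippet is independent of the scheduler itself; it only shows that 9:00 AM local time in America/New_York maps to different UTC instants in winter and summer (the dates are arbitrary examples).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ny = ZoneInfo("America/New_York")

# The same 9:00 AM wall-clock time on a winter date and a summer date.
winter = datetime(2025, 1, 15, 9, 0, tzinfo=ny)
summer = datetime(2025, 7, 15, 9, 0, tzinfo=ny)

winter_utc = winter.astimezone(timezone.utc)
summer_utc = summer.astimezone(timezone.utc)

print(winter.tzname(), winter_utc.strftime("%H:%M"))  # EST 14:00
print(summer.tzname(), summer_utc.strftime("%H:%M"))  # EDT 13:00
```

A schedule pinned to UTC would instead drift by one hour in local time across each transition.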

Can I run multiple schedules on the same datastore?

Yes. Each schedule is independent, with its own cron expression, timezone, and Scan Settings. Use this to run, for example, a fast Incremental scan every hour and a Full scan with Auto-Resolve once a day on the same datastore.

Permissions

Who can run, schedule, or configure a scan?

Users with Editor team permission on the target datastore can run ad-hoc scans, create and edit schedules, and configure all Scan Settings (including Auto Resolve). Viewer, Reporter, and Drafter cannot. See Permissions for the full matrix.

Are masked field values revealed during a scan?

Scans run normally against masked fields, and masking does not affect anomaly detection or check execution. However, when results are displayed, masked values are obfuscated in anomaly source records, anomaly descriptions, and the Enrichment Datastore. Users with Editor permission can reveal masked source-record values per anomaly.

Do I need Editor on every datastore, or just one?

You need Editor on each datastore you want to scan. Permissions are evaluated per datastore, so a user can be Editor on one datastore and Viewer on another.

Results and history

Where do I see the operation summary after the scan finishes?

In the Activity tab of the datastore. Each row shows the operation status, duration, and inline counters (including Anomalies Identified and Anomalies Auto-Resolved when applicable). Click the row to expand the full summary card.

Can I rerun a scan with the same settings as a previous run?

Yes. Open the operation from the Activity tab and click Rerun. The new operation inherits the source containers, check categories, read strategy, and Scan Settings of the original.

Can I resume a scan that was aborted?

Yes, for Scan and Profile operations. Open the aborted operation in Activity and click Resume, and the system continues from where it stopped instead of restarting.

What do the Identified and Auto-Resolved tabs in the Scan Results modal mean?

The Identified tab lists anomalies that this scan detected (whether they ended up Active, Acknowledged, or were rolled up). The Auto-Resolved tab lists previously open anomalies that this scan automatically resolved because the same checks ran again and no longer found the issue. The Auto-Resolved tab is hidden when the scan ran as Incremental or when Auto Resolve was disabled.

API

How do I run a scan via the API?

POST to /api/operations/run with "type": "scan" and the datastore ID. See Running a Scan operation for the full payload, including the optional auto_resolve_passed_anomalies field.
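A request could be assembled along these lines. Only the endpoint path, `"type": "scan"`, and `auto_resolve_passed_anomalies` come from this page; the base URL, the `datastore_id` key, and the bearer-token header are assumptions for illustration, so check the Running a Scan operation reference for the actual payload.

```python
import json
from urllib.request import Request

BASE_URL = "https://your-instance.example.com"  # placeholder, not a real host


def build_scan_request(datastore_id: int, token: str, auto_resolve: bool = True) -> Request:
    """Build (but do not send) a POST request that starts a scan."""
    payload = {
        "type": "scan",
        "datastore_id": datastore_id,  # assumed key name for the datastore ID
        "auto_resolve_passed_anomalies": auto_resolve,
    }
    return Request(
        f"{BASE_URL}/api/operations/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_scan_request(datastore_id=42, token="YOUR_TOKEN")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```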

Can I pass runtime variables when running a scan via the API?

Yes. Reference variables in your check definitions with {{ variable_name }} and pass values in the check_variables field of the scan payload. See Use Runtime Variables.
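Before submitting a scan, it can help to confirm that every `{{ variable_name }}` placeholder has a value in `check_variables`. The helper below is a client-side convenience sketch, not part of the API; the function names and the sample expressions are hypothetical, and it assumes variable names are plain identifiers.

```python
import re
from typing import Dict, Iterable, Set

# Matches {{ name }} with optional whitespace inside the braces.
VAR_PATTERN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def referenced_variables(expression: str) -> Set[str]:
    """Return the variable names referenced in one check expression."""
    return set(VAR_PATTERN.findall(expression))


def missing_variables(expressions: Iterable[str], check_variables: Dict[str, str]) -> Set[str]:
    """Return referenced variables that have no value in check_variables."""
    needed: Set[str] = set()
    for expression in expressions:
        needed |= referenced_variables(expression)
    return needed - set(check_variables)


checks = ["amount > {{ min_amount }}", "region = '{{region}}'"]
check_variables = {"min_amount": "100"}
print(missing_variables(checks, check_variables))  # {'region'}
```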

How do I check the status of a running scan via the API?

Send a GET request to /api/operations/{id} to retrieve the current state. The response includes the status object with all counters (records processed, anomalies identified, anomalies auto-resolved), which update as the scan progresses.
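A polling loop around that endpoint might look like the sketch below. Because the response schema is not spelled out here, the loop takes the HTTP call (`fetch`) and the completion test (`is_done`) as parameters; the `"state"` key in the demo is a made-up stand-in for whatever field the real response uses.

```python
import time
from typing import Any, Callable, Dict

Operation = Dict[str, Any]


def wait_for_operation(
    fetch: Callable[[int], Operation],
    operation_id: int,
    is_done: Callable[[Operation], bool],
    interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Operation:
    """Poll an operation until is_done(response) is true or polls run out."""
    for _ in range(max_polls):
        operation = fetch(operation_id)  # e.g. GET /api/operations/{id}
        if is_done(operation):
            return operation
        sleep(interval)
    raise TimeoutError(f"operation {operation_id} did not finish in time")


# Demo with a fake fetch; "state" is a hypothetical field name.
fake_responses = iter([{"state": "running"}, {"state": "running"}, {"state": "done"}])
final = wait_for_operation(
    fetch=lambda op_id: next(fake_responses),
    operation_id=7,
    is_done=lambda op: op["state"] == "done",
    interval=0,
)
print(final)
```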

Can I schedule a scan via the API?

Yes. POST to /api/operations/schedule with the scan payload plus a crontab field. The same auto_resolve_passed_anomalies rules apply: it is forced to false when incremental is true.
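A schedule payload can be derived from a scan payload, as in this sketch. The field names `crontab`, `incremental`, and `auto_resolve_passed_anomalies` come from this page; `build_schedule_payload` and the `datastore_id` key are hypothetical. Mirroring the forced-off rule on the client side is optional, since the rule is applied regardless; it simply makes the submitted payload match what will actually run.

```python
from typing import Any, Dict


def build_schedule_payload(scan_payload: Dict[str, Any], crontab: str) -> Dict[str, Any]:
    """Return a copy of the scan payload extended with a crontab field."""
    payload = dict(scan_payload)  # do not mutate the caller's dict
    payload["crontab"] = crontab
    if payload.get("incremental"):
        # Auto-Resolve only applies to Full scans.
        payload["auto_resolve_passed_anomalies"] = False
    return payload


hourly_incremental = build_schedule_payload(
    {
        "type": "scan",
        "datastore_id": 42,  # assumed key name
        "incremental": True,
        "auto_resolve_passed_anomalies": True,
    },
    crontab="0 * * * *",  # top of every hour
)
print(hourly_incremental["auto_resolve_passed_anomalies"])  # False
```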