
Athena Connector

Amazon Athena is a serverless query service that runs standard SQL over data in Amazon S3, using the AWS Glue Data Catalog for table metadata. Many teams use it as the query layer that exposes S3-based data lakes (Parquet, ORC, Avro, JSON, and CSV files, plus Iceberg tables) as governed, queryable tables without provisioning a separate warehouse.

The Qualytics Athena connector reuses that same access path. Point it at a workgroup, database, and S3 query-results bucket, and Qualytics will profile tables, run scheduled scans, and surface record- and schema-level anomalies on data that already lives in your lake — alongside the rest of your governed sources.


Deep Dive

  • Permissions


    Minimum IAM permissions for Athena, Glue, and the S3 query-results bucket, plus a ready-to-paste example IAM policy.


  • Authentication


    Choose between Basic credentials (Access Key + Secret) and IAM Role (assumed role with External ID) for Athena connections.


  • Troubleshooting


    Common Athena connection errors and how to resolve them — credentials, permissions, or S3 output configuration.



Managing

  • Add Source Datastore


    Step-by-step UI walkthrough for adding Athena as a source datastore, with new or existing connections.


  • Create via API


    REST and CLI payload examples for creating Athena source datastores, with both Access Key and IAM Role authentication.



Notes

Supported tables and formats. The connector reads any database and table accessible through Athena via the AWS Glue Data Catalog — Parquet, ORC, Avro, JSON, CSV, Iceberg, and Hive-style external tables included. No format-specific configuration is required on the Qualytics side.

Cost model. Athena bills by the amount of data each query scans (priced per TB), so every profile and scan operation adds to your AWS bill. Columnar formats (Parquet, ORC) let Athena read only the columns a query references, and partitioning lets it skip partitions that a query's filter excludes, so these tables scan far less data than row-formatted or unpartitioned ones. Prefer partitioned, columnar tables for the data Qualytics monitors to keep scan costs predictable.
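
To make the cost model concrete, here is a minimal sketch of the arithmetic in Python. It is not part of the Qualytics connector or any AWS SDK: the function names are hypothetical, and the price per TB, the decimal-terabyte convention, and the per-query minimum are assumptions that should be checked against the current Athena pricing for your region. The table sizes are made-up illustration values.

```python
"""Rough Athena scan-cost estimate (illustrative; every constant is an assumption)."""

BYTES_PER_TB = 10**12            # assumption: decimal terabyte
PRICE_PER_TB_USD = 5.00          # assumption: confirm the current rate for your region
MIN_BILLABLE_BYTES = 10 * 10**6  # assumption: 10 MB minimum billed per query


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated USD cost of one query scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    # Queries that scan less than the minimum are billed at the minimum.
    billable = max(bytes_scanned, MIN_BILLABLE_BYTES)
    return billable / BYTES_PER_TB * price_per_tb


def estimate_monthly_cost(bytes_per_run: int, runs_per_month: int,
                          price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the estimated USD cost of repeating the same scan each run."""
    if runs_per_month < 0:
        raise ValueError("runs_per_month must be non-negative")
    return estimate_query_cost(bytes_per_run, price_per_tb) * runs_per_month


if __name__ == "__main__":
    # Hypothetical sizes: a full read of an unpartitioned CSV table versus
    # a partition-pruned, column-pruned read of the same data in Parquet.
    scenarios = {
        "unpartitioned CSV, full scan": 2 * 10**12,   # 2 TB per run
        "partitioned Parquet, pruned scan": 50 * 10**9,  # 50 GB per run
    }
    for label, size in scenarios.items():
        monthly = estimate_monthly_cost(size, runs_per_month=30)
        print(f"{label}: ${monthly:,.2f} per month")
```

Under these assumed numbers, a daily scan of the unpartitioned table comes to about $300 per month, while the pruned scan comes to about $7.50. To replace the made-up sizes with real ones, the data-scanned figure that Athena reports for each completed query can be fed into the same functions.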