Athena Connector

Amazon Athena is a serverless query service that runs standard SQL over data in Amazon S3, using the AWS Glue Data Catalog for table metadata. Many teams use it as the query layer that exposes S3-based data lakes (Parquet, ORC, JSON, and CSV files, as well as Apache Iceberg tables) as governed, queryable tables without provisioning a separate warehouse.

The Qualytics Athena connector reuses that same access path. Point it at a workgroup, database, and S3 query-results bucket, and Qualytics will profile tables, run scheduled scans, and surface record- and schema-level anomalies on data that already lives in your lake — alongside the rest of your governed sources.
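To make the three connection settings concrete, here is a minimal sketch of how a workgroup, database, and S3 query-results location map onto a plain Athena query request. It illustrates the Athena side only and is not how the Qualytics connector is implemented. The helper name `build_athena_query_params` and all sample values are hypothetical; the dictionary keys follow the keyword arguments of boto3's Athena `start_query_execution` call.

```python
from typing import Any, Dict


def build_athena_query_params(
    workgroup: str,
    database: str,
    output_location: str,
    sql: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's Athena start_query_execution.

    Hypothetical helper for illustration only; Qualytics may build its
    requests differently.
    """
    # Athena writes query results to S3, so the location must be an S3 URI.
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")

    return {
        "QueryString": sql,
        # The database (Glue Data Catalog database) the query runs against.
        "QueryExecutionContext": {"Database": database},
        # Where Athena stores the result files for this query.
        "ResultConfiguration": {"OutputLocation": output_location},
        # The workgroup that governs settings and cost tracking for the query.
        "WorkGroup": workgroup,
    }


# Sample values below are placeholders, not defaults of any product.
params = build_athena_query_params(
    workgroup="primary",
    database="sales_lake",
    output_location="s3://example-athena-results/qualytics/",
    sql="SELECT 1",
)

# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   boto3.client("athena").start_query_execution(**params)
```

The same three values are what the connector asks for, which is why an existing Athena setup usually needs no new infrastructure before it can be connected.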

Deep Dive

Permissions

Minimum IAM permissions for Athena, Glue, and the S3 query-results bucket, plus a ready-to-paste example IAM policy.

Authentication

Choose between Basic credentials (Access Key + Secret) and IAM Role (assumed role with External ID) for Athena connections.

Troubleshooting

Common Athena connection errors caused by credentials, permissions, or S3 output configuration, and how to resolve them.


Managing

Add Source Datastore

Step-by-step UI walkthrough for adding Athena as a source datastore, with new or existing connections.

Create via API

REST and CLI payload examples for creating Athena source datastores, with both Access Key and IAM Role authentication.


Notes

Supported tables and formats. The connector reads any database and table accessible through Athena via the AWS Glue Data Catalog — Parquet, ORC, Avro, JSON, CSV, Iceberg, and Hive-style external tables included. No format-specific configuration is required on the Qualytics side.

Cost model. Athena bills per TB scanned, so profile and scan operations directly affect your AWS bill. Tables that are partitioned and stored in a columnar format (Parquet, ORC) scan far less data than unpartitioned or row-oriented tables, so favoring them for the tables Qualytics monitors keeps scan costs lower and more predictable.
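The per-TB billing model can be turned into a rough estimate with a few lines of arithmetic. The sketch below rests on assumptions rather than values taken from this page: a list price of 5.00 USD per TB scanned, a 10 MB minimum charge per query, and a binary definition of TB. Check the Athena pricing page for your region before relying on any of these. The function names and the sample table sizes are hypothetical.

```python
# Assumed list price in USD per TB scanned; it varies by region.
PRICE_PER_TB_USD = 5.00
# Assumed minimum billable size per query (10 MB).
MIN_BYTES_PER_QUERY = 10 * 1024 * 1024
# Assumption: 1 TB is treated as 1024^4 bytes in this estimate.
BYTES_PER_TB = 1024 ** 4


def estimate_scan_cost(bytes_scanned: int,
                       price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the cost in USD of one query that scans `bytes_scanned` bytes."""
    # Small queries are billed at the assumed per-query minimum.
    billable_bytes = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable_bytes / BYTES_PER_TB * price_per_tb


def estimate_monthly_cost(bytes_per_scan: int, scans_per_month: int,
                          price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Estimate the monthly cost of a scan that runs on a fixed schedule."""
    return estimate_scan_cost(bytes_per_scan, price_per_tb) * scans_per_month


# Hypothetical comparison: a daily scan over a 1 TB unpartitioned CSV table
# versus the same scan reading 20 GB after partition pruning on Parquet.
full_table = estimate_monthly_cost(1 * 1024 ** 4, scans_per_month=30)
pruned = estimate_monthly_cost(20 * 1024 ** 3, scans_per_month=30)
print(f"unpartitioned: ${full_table:.2f}/month, pruned: ${pruned:.2f}/month")
```

Actual bytes scanned per query are reported by Athena in the query execution statistics, so estimates like this can be checked against real runs.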