Athena Connector
Amazon Athena is a serverless query service that runs standard SQL over data in Amazon S3, using the AWS Glue Data Catalog for table metadata. Many teams use it as the query layer that exposes S3-based data lakes (Parquet, ORC, Avro, JSON, and CSV files, as well as Iceberg tables) as governed, queryable tables without provisioning a separate warehouse.
The Qualytics Athena connector reuses that same access path. Point it at a workgroup, database, and S3 query-results bucket, and Qualytics will profile tables, run scheduled scans, and surface record- and schema-level anomalies on data that already lives in your lake — alongside the rest of your governed sources.
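As a rough illustration of the three settings named above, the sketch below groups them in one place and checks that the query-results location is an S3 URI. The class and field names (`AthenaTarget`, `workgroup`, `database`, `output_location`) and the bucket name are hypothetical and are not part of the Qualytics API. The actual field names appear in the Add Source Datastore and Create via API pages.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AthenaTarget:
    """Hypothetical holder for the connection settings (not a Qualytics type)."""

    workgroup: str        # Athena workgroup that will run the queries
    database: str         # Glue Data Catalog database to profile and scan
    output_location: str  # S3 URI where Athena writes query results

    def __post_init__(self) -> None:
        if not self.workgroup or not self.database:
            raise ValueError("workgroup and database must be non-empty")
        parsed = urlparse(self.output_location)
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError("output_location must look like s3://bucket/prefix/")


# "primary" is the default Athena workgroup; the bucket name is a placeholder.
target = AthenaTarget(
    workgroup="primary",
    database="analytics",
    output_location="s3://example-athena-results/qualytics/",
)
```

Validating the S3 URI up front is only a convenience here. The Troubleshooting page covers the errors Athena itself reports for a misconfigured output location.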
Deep Dive
- Permissions: Minimum IAM permissions for Athena, Glue, and the S3 query-results bucket, plus a ready-to-paste example IAM policy.
- Authentication: Choose between Basic credentials (Access Key + Secret) and IAM Role (assumed role with External ID) for Athena connections.
- Troubleshooting: Common Athena connection errors and how to resolve them, whether the cause is credentials, permissions, or S3 output configuration.
Managing
- Add Source Datastore: Step-by-step UI walkthrough for adding Athena as a source datastore, with new or existing connections.
- Create via API: REST and CLI payload examples for creating Athena source datastores, with both Access Key and IAM Role authentication.
Notes
Supported tables and formats. The connector reads any database and table accessible through Athena via the AWS Glue Data Catalog, including Parquet, ORC, Avro, JSON, CSV, Iceberg, and Hive-style external tables. No format-specific configuration is required on the Qualytics side.
Cost model. Athena bills per TB scanned, so profile and scan operations directly affect your AWS bill. Partitioned tables stored in a columnar format (Parquet, ORC) let Athena skip partitions and columns a query does not need, so they scan far less data than row-formatted or unpartitioned tables. Where you can, use partitioned, columnar storage for the tables Qualytics monitors to keep scan cost predictable.
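The arithmetic behind the cost model can be sketched as follows. The default rate of 5.00 USD per TB, the decimal definition of MB and TB, and the table sizes are assumptions for illustration only. The rounding to the nearest megabyte with a 10 MB minimum per query reflects our reading of the AWS pricing page and should be checked against the current rate for your region.

```python
import math

MB = 10**6   # assumption: decimal megabyte
TB = 10**12  # assumption: decimal terabyte
MIN_BILLED_BYTES = 10 * MB  # assumed per-query minimum; verify against AWS pricing


def estimate_query_cost(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Estimate the cost of one Athena query from the bytes it scans.

    usd_per_tb is an assumed rate; look up the actual rate for your region.
    """
    # Round up to a whole megabyte, then apply the per-query minimum.
    billed = max(math.ceil(bytes_scanned / MB) * MB, MIN_BILLED_BYTES)
    return billed / TB * usd_per_tb


# Illustrative sizes only: a full scan of a 2 TB unpartitioned CSV table versus
# a scan that touches 5 GB after partition and column pruning on Parquet.
full_scan = estimate_query_cost(2 * TB)
pruned_scan = estimate_query_cost(5 * 10**9)
print(f"full scan:   ${full_scan:.2f}")
print(f"pruned scan: ${pruned_scan:.3f}")
```

Multiplying the per-query estimate by the number of scheduled scans gives a rough recurring figure. The actual bytes scanned for each query are reported by Athena in its query execution statistics.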