Amazon S3 Connector
Amazon S3 is the object storage service commonly used as the foundation of an AWS data lake, and it is typically where raw and curated data lands before being modeled into a warehouse. Data in S3, whether stored as plain files (Parquet, ORC, JSON, CSV, Avro, and more) or as table formats such as Delta and Iceberg, can be queried with Athena, EMR, Glue, and Spark.
The Qualytics Amazon S3 connector reads files directly from S3 buckets. You provide a bucket URI, a root path, and an authentication mode. Qualytics then profiles the files, runs scheduled scans, and surfaces record-level and schema-level anomalies. You do not need to specify a region, because it is inferred from the bucket. The same connector can also be used for an enrichment datastore.
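S3 has no real directories, so a bucket URI plus a root path amounts to a bucket name and a key prefix under which objects are read. The helper below is a hypothetical illustration, not Qualytics code, and assumes an `s3://` URI whose host part is the bucket name:

```python
from urllib.parse import urlparse


def resolve_scan_location(bucket_uri: str, root_path: str = "") -> tuple[str, str]:
    """Hypothetical helper: split an s3:// URI plus a root path into (bucket, key_prefix)."""
    parsed = urlparse(bucket_uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {bucket_uri!r}")
    if not parsed.netloc:
        raise ValueError(f"no bucket name in {bucket_uri!r}")
    # Join any path already on the URI with the root path, dropping empty
    # segments produced by leading, trailing, or doubled slashes.
    segments = [s for s in (parsed.path + "/" + root_path).split("/") if s]
    prefix = "/".join(segments)
    # S3 "folders" are only key prefixes, so a non-empty prefix ends with "/".
    return parsed.netloc, prefix + "/" if prefix else ""


print(resolve_scan_location("s3://my-data-lake", "/curated/orders/"))
# ('my-data-lake', 'curated/orders/')
```

Normalizing slashes up front means `curated/orders`, `/curated/orders/`, and `curated//orders` all resolve to the same prefix.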
Deep Dive
- Permissions: minimum IAM permissions for source (read-only) and enrichment (read-write) S3 buckets, plus ready-to-paste example IAM policies.
- Authentication: choose between Access Key (static IAM user credentials) and IAM Role (assumed role with an optional External ID) for S3 connections.
- Troubleshooting: common S3 connection errors and how to resolve them, covering credentials, permissions, bucket configuration, and SSE-KMS keys.
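The two authentication modes carry different settings: a key pair for Access Key, and a role ARN with an optional External ID for IAM Role. As a minimal sketch, with hypothetical field names that are not the Qualytics API, the choice can be modeled as a tagged union and sanity-checked before a connection is saved:

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# IAM role ARN shape: arn:<partition>:iam::<12-digit account ID>:role/<optional path/>name
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


@dataclass(frozen=True)
class AccessKeyAuth:
    """Static IAM user credentials (hypothetical field names)."""
    access_key_id: str
    secret_access_key: str


@dataclass(frozen=True)
class IamRoleAuth:
    """Role to assume, with an optional External ID (hypothetical field names)."""
    role_arn: str
    external_id: Optional[str] = None


S3Auth = Union[AccessKeyAuth, IamRoleAuth]


def validate_auth(auth: S3Auth) -> list[str]:
    """Return a list of problems; an empty list means the settings look well-formed."""
    problems = []
    if isinstance(auth, AccessKeyAuth):
        if not auth.access_key_id or not auth.secret_access_key:
            problems.append("Access Key mode needs both an access key ID and a secret key")
    elif isinstance(auth, IamRoleAuth):
        if not ROLE_ARN.match(auth.role_arn):
            problems.append(f"not an IAM role ARN: {auth.role_arn!r}")
        # The External ID is optional, but an empty string is not a valid value.
        if auth.external_id == "":
            problems.append("External ID, if set, must not be empty")
    else:
        problems.append(f"unknown authentication mode: {type(auth).__name__}")
    return problems
```

These checks only catch malformed input; whether the credentials or role actually grant access can only be confirmed by connecting to the bucket.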
How-tos
- Add Source Datastore: step-by-step UI walkthrough for adding S3 as a source datastore, using either a new or an existing connection.
- Create via API: REST and CLI payload examples for creating S3 source and enrichment datastores, with both Access Key and IAM Role authentication.
Notes
Supported file formats
The connector reads any file format Qualytics supports for DFS sources, including Parquet, ORC, Avro, JSON, CSV, Delta, and Iceberg. See Supported File Formats for the full list. Schema inference and partition discovery work the same way they do for any other DFS source.
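A common partition layout on S3 is the Hive-style `column=value` directory convention used by Spark and Hive. The function below is a sketch under the assumption that the data follows that layout; it is not the Qualytics partition-discovery implementation:

```python
from urllib.parse import unquote


def hive_partitions(key: str, root_prefix: str = "") -> dict[str, str]:
    """Extract Hive-style column=value partition values from an S3 object key."""
    if root_prefix and key.startswith(root_prefix):
        key = key[len(root_prefix):]
    # Every directory segment except the final file name is a candidate.
    directories = key.strip("/").split("/")[:-1]
    partitions = {}
    for segment in directories:
        name, sep, value = segment.partition("=")
        if sep and name:
            # Hive and Spark percent-encode special characters in partition values.
            partitions[name] = unquote(value)
    return partitions


print(hive_partitions("curated/orders/year=2024/month=05/part-0000.parquet", "curated/orders/"))
# {'year': '2024', 'month': '05'}
```

Segments without an `=` (plain folder names) are skipped, so files outside a partitioned layout simply yield no partition values.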
Public vs private buckets
Public buckets do not require credentials, but most production deployments use a private bucket. With a private bucket, the IAM identity behind the connection must have the read permissions listed on the Permissions page (plus write permissions if the bucket is used as an enrichment store).
Object vs bucket permissions
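S3 distinguishes bucket-level actions (s3:ListBucket, s3:GetBucketLocation), which are granted on the bucket ARN (arn:aws:s3:::bucket-name), from object-level actions (s3:GetObject, plus s3:PutObject when the bucket is used as an enrichment store), which are granted on the object ARN (arn:aws:s3:::bucket-name/*). A working policy needs both. Granting only object-level permissions is the most common cause of permission errors that look like authentication failures, because both kinds of failure surface as HTTP 403 responses.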
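To show the bucket ARN versus object ARN split in practice, the hypothetical helper below builds a minimal IAM policy document from the actions named in this section. It assumes the standard `aws` partition, and the Permissions page remains the authoritative list of required actions:

```python
import json


def s3_policy(bucket: str, prefix: str = "", enrichment: bool = False) -> dict:
    """Hypothetical helper: build a minimal IAM policy for an S3 source or enrichment bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"  # assumes the standard "aws" partition
    clean = prefix.strip("/")
    # Object ARNs cover every key under the root path, or the whole bucket.
    object_arn = f"{bucket_arn}/{clean}/*" if clean else f"{bucket_arn}/*"
    object_actions = ["s3:GetObject"]
    if enrichment:
        object_actions.append("s3:PutObject")  # write access only for enrichment stores
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions must target the bucket ARN itself...
                "Sid": "BucketLevel",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # ...while object-level actions must target bucket/key ARNs.
                "Sid": "ObjectLevel",
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": object_arn,
            },
        ],
    }


print(json.dumps(s3_policy("my-data-lake", enrichment=True), indent=2))
```

Dropping the first statement reproduces the failure described above: object reads are allowed, but listing the bucket returns 403. The bucket-level statement here is not scoped to the root path; IAM can narrow s3:ListBucket with the `s3:prefix` condition key if that is required.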