
Amazon S3 Connector

Amazon S3 is the object storage service that commonly serves as the foundation of an AWS data lake, and it is typically where raw and curated data lands before being modeled into a warehouse. Data stored in S3, whether as files (Parquet, ORC, JSON, CSV, Avro, and more) or in table formats such as Delta and Iceberg, can be queried by Athena, EMR, Glue, and Spark.

The Qualytics Amazon S3 connector reads files directly from S3 buckets. You provide a bucket URI, a root path, and an authentication mode. Qualytics then profiles the files, runs scheduled scans, and surfaces record- and schema-level anomalies. The region is inferred from the bucket. The same connector can also serve as an enrichment store.


Deep Dive

  • Permissions

    Minimum IAM permissions for source (read-only) and enrichment (read-write) S3 buckets, plus ready-to-paste example IAM policies.

  • Authentication

    Choose between Access Key (static IAM user credentials) and IAM Role (assumed role with optional External ID) for S3 connections.

  • Troubleshooting

    Common S3 connection errors and how to resolve them, covering credentials, permissions, bucket configuration, and SSE-KMS keys.


How-tos

  • Add Source Datastore

    Step-by-step UI walkthrough for adding S3 as a source datastore, using either a new or existing connection.

  • Create via API

    REST and CLI payload examples for creating S3 source and enrichment datastores, with both Access Key and IAM Role authentication.


Notes

Supported file formats

The connector reads any file format Qualytics supports for DFS sources, including Parquet, ORC, Avro, JSON, CSV, Delta, and Iceberg. See Supported File Formats for the full list. Schema inference and partition discovery work the same way they do for any other DFS source.

Public vs private buckets

Public buckets do not require credentials, but most production deployments use a private bucket. With a private bucket, the IAM identity behind the connection must have the read permissions listed on the Permissions page (plus write permissions if the bucket is used as an enrichment store).

Object vs bucket permissions

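S3 access requires permissions at two levels: bucket-level actions (s3:ListBucket, s3:GetBucketLocation), which are granted on the bucket itself, and object-level actions (s3:GetObject, plus s3:PutObject when the bucket is used as an enrichment store), which are granted on the objects inside the bucket. Granting only object-level permissions is the most common cause of permission errors that look like authentication failures.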
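
The split between bucket-level and object-level permissions can be sketched as an IAM policy document built in Python. This is a minimal illustration, not the policy from the Permissions page: it includes only the four actions named in this section, the helper name build_s3_policy and the bucket name example-bucket are hypothetical, and s3:PutObject is added only when the bucket is treated as an enrichment store. The point it shows is that bucket-level actions target the bucket ARN, while object-level actions target the objects under it (the ARN with a trailing /*).

```python
import json

# Actions named in this section; the Permissions page remains the
# authoritative list and may require more than these.
BUCKET_ACTIONS = ["s3:ListBucket", "s3:GetBucketLocation"]
READ_OBJECT_ACTIONS = ["s3:GetObject"]
WRITE_OBJECT_ACTIONS = ["s3:PutObject"]


def build_s3_policy(bucket: str, enrichment: bool = False) -> dict:
    """Build an illustrative IAM policy for one bucket (hypothetical helper).

    Bucket-level actions are attached to the bucket ARN; object-level
    actions are attached to the objects inside it (ARN ending in /*).
    """
    object_actions = list(READ_OBJECT_ACTIONS)
    if enrichment:
        # Enrichment stores are written to, so add write access.
        object_actions += WRITE_OBJECT_ACTIONS

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BucketLevel",
                "Effect": "Allow",
                "Action": BUCKET_ACTIONS,
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "ObjectLevel",
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder, not a real bucket.
    policy = build_s3_policy("example-bucket", enrichment=True)
    print(json.dumps(policy, indent=2))
```

A policy that contains only the ObjectLevel statement is the misconfiguration described above: object reads are allowed, but listing the bucket is not.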