
Add an Amazon S3 Source Datastore

A source datastore is a storage location Qualytics connects to so it can profile, scan, and monitor data. Adding S3 as a source lets Qualytics read files directly from your bucket and run quality operations on the data they contain.

Before you start, review the Amazon S3 Permissions and the available Authentication methods.

This page covers two ways to add an S3 source datastore: with a new connection or by reusing a saved one. Both flows use the same form, but when you reuse a saved connection its connection and credential fields are read-only. If this is your first S3 datastore in Qualytics, use the New Connection tab in Field reference below.

For the generic step-by-step walkthrough of the Add Source Datastore modal (opening it, toggling the connection mode, testing, and finishing), see Add Source Datastore. The fields described below apply to the S3-specific portion of that flow.

Field reference

The Add Source Datastore form changes depending on whether you create a new connection or reuse a saved one. Pick the tab below that matches your flow.

When Add New Connection is toggled ON, the form shows five groups of fields: Connection Properties, Authentication, Secrets Management (optional Vault integration), Datastores Extraction, and Datastore Properties.

Connection Properties

These fields define where the S3 bucket lives.


REF. FIELD REQUIRED DESCRIPTION
1 Connection Name Yes A label for the saved connection (e.g., acme_s3_lake). Appears in the Connection dropdown when you create future datastores.
2 URI Yes The bucket-level S3 URI in the format s3://<bucket-name>. Do not include a file path or subfolder here; to limit the datastore to a subfolder, set Root Path in Datastores Extraction below.
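If you want to check a URI before pasting it into the form, the sketch below accepts only a bucket-level s3:// URI and rejects anything with a path after the bucket name. It is an illustrative pre-check, not Qualytics' own validation: the function name bucket_from_uri is hypothetical, tolerating a single trailing slash is an assumption, and the bucket-name pattern covers only the basic AWS naming shape.

```python
import re
from urllib.parse import urlparse

# Basic S3 bucket-name shape: 3-63 characters of lowercase letters, digits,
# dots, and hyphens, starting and ending with a letter or digit. AWS has a few
# extra rules (for example, no adjacent dots) that this sketch does not check.
BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def bucket_from_uri(uri: str) -> str:
    """Return the bucket name if `uri` is a bucket-level S3 URI, else raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URI, got {uri!r}")
    if not BUCKET_NAME.fullmatch(parsed.netloc):
        raise ValueError(f"invalid bucket name in {uri!r}")
    # Anything after the bucket name (other than one trailing slash, which this
    # sketch tolerates as an assumption) is a path and belongs in Root Path.
    if parsed.path not in ("", "/"):
        raise ValueError(f"URI must not include a path: {uri!r}; use Root Path instead")
    return parsed.netloc


if __name__ == "__main__":
    print(bucket_from_uri("s3://acme-lake"))
```

For example, s3://acme-lake passes, while s3://acme-lake/raw/orders/ is rejected because /raw/orders/ should go in Root Path.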

Authentication

Choose how Qualytics authenticates to AWS. Setting Type changes the credential fields shown below it.

Access Key (default): Supply static AWS credentials for an IAM user that has the S3 permissions listed in Permissions.


REF. FIELD REQUIRED DESCRIPTION
1 Type Yes Set to Access Key.
2 Access Key Yes The AWS Access Key ID for the IAM user.
3 Secret Key Yes The matching AWS Secret Access Key.
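The access key pair only works if the IAM user behind it can read the bucket. As a rough illustration, the sketch below builds a minimal read-only IAM policy document that lists one bucket and reads its objects. This is an assumption for illustration only: the authoritative list of actions Qualytics needs is on the Amazon S3 Permissions page, and the helper name read_only_policy is hypothetical.

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build an illustrative read-only IAM policy for a single bucket.

    Hypothetical helper; compare against the Amazon S3 Permissions page
    before attaching anything to a real IAM user.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # s3:ListBucket is granted on the bucket ARN itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # s3:GetObject is granted on the object ARNs inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_policy("acme-lake"), indent=2))
```

The two statements are split because S3 evaluates listing against the bucket ARN and object reads against object ARNs; a single statement on only one of them would leave the other operation denied.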

AWS-only

The IAM Role option in Type is shown only on AWS and local Qualytics deployments. On Azure and GCP deployments, only Access Key authentication is available.

IAM Role: Assume an IAM role in your AWS account through AWS STS. Qualytics uses short-lived credentials that it refreshes automatically. See Authentication for a walkthrough of the assume-role flow.


REF. FIELD REQUIRED DESCRIPTION
1 Type Yes Set to IAM Role.
2 Role ARN Yes The IAM role ARN Qualytics will assume via AWS STS.
3 External ID No Required only if your role's trust policy has an sts:ExternalId condition. Enter the exact value that condition expects.
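To see where Role ARN and External ID come from, here is a sketch of the trust policy you would attach to the role. The principal ARN below uses the AWS documentation placeholder account 111122223333 and is hypothetical; the actual principal Qualytics assumes the role from is described on the Authentication page. The helper name trust_policy is also hypothetical.

```python
import json
from typing import Optional


def trust_policy(trusted_principal_arn: str, external_id: Optional[str] = None) -> dict:
    """Build an IAM role trust policy that lets one principal call sts:AssumeRole."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": trusted_principal_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # With this condition, AssumeRole calls that do not pass this exact
        # ExternalId value are denied, so the form's External ID must match it.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Placeholder principal; replace with the one given on the Authentication page.
    print(json.dumps(trust_policy("arn:aws:iam::111122223333:root", "acme-ext-id"), indent=2))
```

If you build the policy without an external ID, leave External ID empty in the form; if you add the condition, the form value must match it character for character.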

Secrets Management (optional)

Use this group only if you want Qualytics to pull credentials from HashiCorp Vault instead of typing them into the form. Toggle HashiCorp Vault ON to expose the fields below.


REF. FIELD REQUIRED DESCRIPTION
1 Login URL Yes The Vault endpoint Qualytics uses to authenticate (e.g., https://vault.example.com/v1/auth/approle/login).
2 Credentials Payload Yes A JSON body containing the credentials Vault expects (e.g., {"role_id":"...","secret_id":"..."}).
3 Token JSONPath Yes The JSONPath that extracts the client token from Vault's response. Defaults to $.auth.client_token.
4 Secret URL Yes The full Vault URL of the secret to read (e.g., https://vault.example.com/v1/secret/data/s3).
5 Token Header Name Yes The HTTP header name used to send the token. Defaults to X-Vault-Token.
6 Data JSONPath Yes The JSONPath that extracts the secret payload from Vault's response. Defaults to $.data.
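The two JSONPath fields pick values out of Vault's JSON responses: Token JSONPath reads the token from the login response, and Data JSONPath reads the secret from the response at Secret URL. The sketch below applies a deliberately minimal resolver (dotted keys only, not full JSONPath) to trimmed sample responses. The sample shapes follow Vault's AppRole login and KV version 2 read responses; the token and key values are made up. Note that with a KV version 2 engine the default $.data returns a wrapper holding both the secret and its metadata, with the key-value pairs one level deeper at $.data.data. Whether you need to change the default depends on how Qualytics reads that object, so treat this as something to verify against your own engine.

```python
from typing import Any


def resolve_jsonpath(document: Any, path: str) -> Any:
    """Resolve a simple dotted JSONPath such as "$.auth.client_token".

    Only "$" followed by dot-separated object keys is supported; wildcards,
    filters, and array indexes are out of scope for this sketch.
    """
    if not path.startswith("$"):
        raise ValueError(f"JSONPath must start with '$': {path!r}")
    current = document
    for key in (part for part in path[1:].split(".") if part):
        current = current[key]  # raises KeyError if the response lacks this key
    return current


# Trimmed shape of a Vault AppRole login response (POST to the Login URL).
login_response = {"auth": {"client_token": "hvs.EXAMPLE", "lease_duration": 3600}}

# Trimmed shape of a KV version 2 read (GET on the Secret URL): the secret's
# key-value pairs sit one level deeper, next to the version metadata.
kv2_response = {
    "data": {
        "data": {"access_key": "EXAMPLE-ACCESS-KEY", "secret_key": "EXAMPLE-SECRET-KEY"},
        "metadata": {"version": 3},
    }
}

token = resolve_jsonpath(login_response, "$.auth.client_token")  # "hvs.EXAMPLE"
wrapper = resolve_jsonpath(kv2_response, "$.data")               # secret + metadata
secrets = resolve_jsonpath(kv2_response, "$.data.data")          # key-value pairs only
```

The token extracted this way is what gets sent in the header named by Token Header Name (X-Vault-Token by default) when the secret is read.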

Note

Once Vault is configured, you can reference any secret value in the Connection Properties or Authentication fields with ${key} (e.g., ${secret_key}). Qualytics resolves secrets each time it opens a connection, so a rotated secret takes effect on the next connection.
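The ${key} syntax and the "resolved when the connection is opened" behavior can be modeled with Python's string.Template, which uses the same ${name} placeholder form. This is a hypothetical model of the behavior described above, not Qualytics' implementation: open_connection, the in-memory vault dictionary, and the field names are all illustrative.

```python
from string import Template
from typing import Callable, Dict


def resolve_placeholders(value: str, secrets: Dict[str, str]) -> str:
    """Replace ${key} references in a field value with secrets from Vault."""
    # substitute() raises KeyError for an unknown key instead of passing the
    # literal "${key}" text through as a credential.
    return Template(value).substitute(secrets)


def open_connection(fields: Dict[str, str], fetch_secrets: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Resolve every field at open time, so a rotated secret is used on the next open."""
    secrets = fetch_secrets()  # fetched fresh on every call, never cached
    return {name: resolve_placeholders(value, secrets) for name, value in fields.items()}


# Hypothetical in-memory stand-in for the secret stored in Vault.
vault = {"access_key": "EXAMPLE-ACCESS-KEY", "secret_key": "old-secret"}
fields = {"Access Key": "${access_key}", "Secret Key": "${secret_key}"}

first = open_connection(fields, lambda: dict(vault))
vault["secret_key"] = "rotated-secret"  # rotate the secret in Vault
second = open_connection(fields, lambda: dict(vault))
```

Because the secret is fetched on each open rather than cached, first still holds old-secret while second picks up rotated-secret without any change to the form.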

Datastores Extraction

Pick the subfolder inside the bucket Qualytics should read from.


REF. FIELD REQUIRED DESCRIPTION
1 Root Path Yes The subfolder inside the bucket where the data lives (e.g., /raw/orders/). Use / to read from the bucket root.
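S3 has no real folders: a "subfolder" is a key prefix, and object keys do not start with a slash. The sketch below shows one plausible way a Root Path such as /raw/orders/ maps onto a key prefix inside the bucket from the URI field. The mapping is an assumption for illustration, and the helper names to_prefix and scoped_uri are hypothetical.

```python
def to_prefix(root_path: str) -> str:
    """Convert a Root Path like "/raw/orders/" into an S3 key prefix like "raw/orders/"."""
    stripped = root_path.strip("/")
    # S3 keys have no leading slash; "/" (the bucket root) maps to the empty prefix.
    # Keeping a trailing slash stops "raw/orders" from also matching "raw/orders_archive".
    return f"{stripped}/" if stripped else ""


def scoped_uri(bucket: str, root_path: str) -> str:
    """Combine a bucket name and a Root Path into the S3 location being read."""
    return f"s3://{bucket}/{to_prefix(root_path)}"


if __name__ == "__main__":
    print(scoped_uri("acme-lake", "/raw/orders/"))
    print(scoped_uri("acme-lake", "/"))
```

The trailing slash matters because S3 prefix matching is plain string matching: without it, a prefix of raw/orders would also pick up objects under raw/orders_archive/.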

Datastore Properties

These fields are common to every source datastore and appear below Datastores Extraction in the same form.


REF. FIELD REQUIRED DESCRIPTION
1 Name Yes A label for the datastore (e.g., acme_lake_orders). Appears on the datastore cards in the workspace.
2 Teams Yes Select one or more teams to associate with this source datastore.
3 Initiate Sync No Automatically sync the datastore to detect containers and fields after creation.
4 Connection Info No Read-only banner that shows the IP address the Qualytics dataplane uses to reach your S3 bucket. If your bucket policy restricts access by source IP (for example, with an aws:SourceIp condition), add this IP to the allowed addresses so the dataplane can connect.
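S3 buckets are not protected by security groups. Access by IP is restricted through the bucket policy instead, using the aws:SourceIp condition key. The sketch below builds one common pattern, a Deny statement for any request that does not come from an allowed address range. It is illustrative only: 203.0.113.10/32 is a documentation-range placeholder for the dataplane IP shown in Connection Info, and the helper name is hypothetical. A Deny with Principal "*" also blocks your own users and tools, so include every range that still needs access. Also note that aws:SourceIp does not match requests that arrive through a VPC endpoint.

```python
import json
from typing import List


def source_ip_deny_statement(bucket: str, allowed_cidrs: List[str]) -> dict:
    """Build a bucket-policy statement that denies S3 access from outside `allowed_cidrs`."""
    return {
        "Sid": "DenyOutsideAllowedIps",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        # Cover both the bucket itself and every object in it.
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        # The Deny applies only when the caller's IP is NOT in one of these ranges.
        "Condition": {"NotIpAddress": {"aws:SourceIp": allowed_cidrs}},
    }


if __name__ == "__main__":
    # Placeholder ranges: the dataplane IP from Connection Info plus your own egress range.
    statement = source_ip_deny_statement("acme-lake", ["203.0.113.10/32", "198.51.100.0/24"])
    print(json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2))
```

A Deny-based statement is used here because it restricts access even for principals whose IAM policies would otherwise allow it, which is usually the point of an IP restriction.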

When Add New Connection is toggled OFF and you pick a saved S3 connection, the Connection Properties, Authentication, and Secrets Management sections are collapsed and read-only. Those values come from the saved connection, which Qualytics has already validated, so you fill in only the Datastores Extraction and Datastore Properties fields below.

To change a saved connection's credentials, edit the connection itself from Settings → Connections. Edits there apply to every datastore that reuses the connection.

Datastores Extraction


REF. FIELD REQUIRED DESCRIPTION
1 Root Path Yes The subfolder inside the bucket where the data lives. Each datastore that reuses the saved connection can set its own Root Path, overriding the value stored with the connection.

Datastore Properties


REF. FIELD REQUIRED DESCRIPTION
1 Name Yes A label for the datastore.
2 Teams Yes Select one or more teams to associate with this source datastore.
3 Initiate Sync No Automatically sync the datastore to detect containers and fields after creation.
4 Connection Info No Read-only banner that shows the IP address the Qualytics dataplane uses to reach your S3 bucket. If your bucket policy restricts access by source IP (for example, with an aws:SourceIp condition), add this IP to the allowed addresses so the dataplane can connect.