IAM Role authentication

Qualytics supports IAM Role authentication for Amazon S3, Athena, and Redshift. Instead of storing static AWS credentials in Qualytics, the dataplane calls sts:AssumeRole to obtain temporary credentials each time it connects — eliminating the need to share or rotate long-lived access keys.

AWS only

IAM Role authentication is an AWS-only feature. It is not available on Azure or GCP deployments because it relies on AWS STS and (for Self-Hosted) the EKS-managed OIDC issuer used by IRSA.

Two roles, two places

IAM Role authentication involves two AWS roles. Knowing which goes where avoids most setup errors:

Target role: the role with the actual S3, Athena, or Redshift permissions. You enter its ARN in the Qualytics connection form (UI / API).
Base role: the AWS identity the dataplane uses to call sts:AssumeRole against the target role. It is set up in infrastructure, not in the Qualytics UI; the one exception is the self-hosted static-key path, where the base keys are entered in the connection form. Who manages it depends on your deployment type.

The target role's trust policy lists the base role as Principal.
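The hand-off between the two roles can be sketched as follows. This is an illustration rather than the dataplane's actual code: `assume_target_role` and the session name are hypothetical, and the client is passed in so that any object exposing boto3's `assume_role` method (for example `boto3.client("sts")` running under the base identity) can be used.

```python
from typing import Any, Dict


def assume_target_role(sts_client: Any, target_role_arn: str, external_id: str,
                       session_name: str = "qualytics-sketch") -> Dict[str, str]:
    """Exchange the base identity's credentials for temporary target-role credentials.

    `sts_client` must already be authenticated as the base role (or base IAM user).
    """
    response = sts_client.assume_role(
        RoleArn=target_role_arn,       # the ARN entered in the connection form
        RoleSessionName=session_name,  # hypothetical session label
        ExternalId=external_id,        # must equal the trust policy's sts:ExternalId
    )
    creds = response["Credentials"]
    # These credentials are temporary; a fresh set is requested on the next connect.
    return {
        "access_key_id": creds["AccessKeyId"],
        "secret_access_key": creds["SecretAccessKey"],
        "session_token": creds["SessionToken"],
    }
```

The call succeeds only if the target role's trust policy names the caller as Principal and the External ID matches.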

Pick your deployment

The setup flow differs based on who manages the dataplane.

Deployment	Who manages the dataplane	Base role provisioning	Where to start
Managed	Qualytics — your dataplane runs in the Qualytics AWS account (`315179787488`) on a dedicated, isolated EKS cluster.	Automatic — Qualytics creates a per-tenant IAM role and wires IRSA for you. You receive the ARN from your account manager.	Managed Deployments
Self-Hosted	You — the dataplane runs in your own AWS account on a Kubernetes cluster you operate.	Manual — you create the base role and (on EKS) wire IRSA on the Spark ServiceAccount yourself.	Self-Hosted Deployments

Managed Deployments

Qualytics provisions one dedicated EKS cluster per tenant in its own AWS account (315179787488). When your tenant is created, Qualytics also creates an IAM role of the form <tenant>-dataplane-role and binds it to your Spark ServiceAccount via IRSA. You do not need to configure any infrastructure — just create the target role in your AWS account and trust the Qualytics-provided dataplane role.

Step 1 — Get your dataplane role ARN

Contact your Qualytics account manager. They will give you:

Item	Format
Dataplane role ARN	`arn:aws:iam::315179787488:role/<tenant>-dataplane-role`

You will use this ARN as the Principal in your target role's trust policy (Step 3) and nowhere else.
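Before pasting the ARN into a trust policy, a quick shape check can catch copy-paste mistakes. The sketch below assumes the ARN format shown in the table above; `parse_dataplane_role_arn` is a hypothetical helper, and the tenant character set is simply the set IAM allows in role names.

```python
import re

QUALYTICS_ACCOUNT_ID = "315179787488"  # account ID stated in this guide

# IAM role names may contain letters, digits, and the characters _+=,.@-
_DATAPLANE_ARN = re.compile(
    r"arn:aws:iam::(?P<account>\d{12}):role/(?P<tenant>[\w+=,.@-]+)-dataplane-role",
    re.ASCII,
)


def parse_dataplane_role_arn(arn: str) -> str:
    """Return the tenant name, or raise ValueError if the ARN has the wrong shape."""
    match = _DATAPLANE_ARN.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not a <tenant>-dataplane-role ARN: {arn!r}")
    if match.group("account") != QUALYTICS_ACCOUNT_ID:
        raise ValueError(f"unexpected account ID: {match.group('account')}")
    return match.group("tenant")
```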

Step 2 — Create the target IAM role

In your own AWS account, create an IAM role with permissions for the connector you're configuring. This role does not need any STS or AssumeRole permissions of its own — it only needs to read or write the data being connected to.

The four policies below are, in order: S3 source (read-only), S3 enrichment (read-write), Athena, and Redshift.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::<YOUR_BUCKET>",
                "arn:aws:s3:::<YOUR_BUCKET>/*"
            ]
        }
    ]
}

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::<YOUR_BUCKET>",
                "arn:aws:s3:::<YOUR_BUCKET>/*"
            ]
        }
    ]
}

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AthenaQueryAccess",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:StopQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:ListDatabases",
                "athena:ListTableMetadata",
                "athena:GetTableMetadata",
                "athena:GetWorkGroup"
            ],
            "Resource": "*"
        },
        {
            "Sid": "GlueCatalogReadOnly",
            "Effect": "Allow",
            "Action": [
                "glue:GetCatalog",
                "glue:GetCatalogs",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetPartition",
                "glue:GetPartitions",
                "glue:BatchGetPartition"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AthenaQueryResultsBucket",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:ListBucketMultipartUploads",
                "s3:ListMultipartUploadParts",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<YOUR_RESULTS_BUCKET>",
                "arn:aws:s3:::<YOUR_RESULTS_BUCKET>/*"
            ]
        }
    ]
}

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "redshift:GetClusterCredentials",
                "redshift:DescribeClusters"
            ],
            "Resource": [
                "arn:aws:redshift:<REGION>:<YOUR_ACCOUNT_ID>:cluster:<CLUSTER_IDENTIFIER>",
                "arn:aws:redshift:<REGION>:<YOUR_ACCOUNT_ID>:dbuser:<CLUSTER_IDENTIFIER>/<DB_USER>",
                "arn:aws:redshift:<REGION>:<YOUR_ACCOUNT_ID>:dbname:<CLUSTER_IDENTIFIER>/<DB_NAME>"
            ]
        }
    ]
}

For Redshift Serverless, replace the actions with redshift-serverless:GetCredentials and scope the resource to your workgroup ARN (arn:aws:redshift-serverless:<region>:<account>:workgroup/<workgroup-id>).
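As a rough illustration, the Redshift Serverless variant of the policy could be generated like this. The function name is hypothetical; the action and ARN format are the ones given in the sentence above.

```python
def redshift_serverless_policy(region: str, account_id: str, workgroup_id: str) -> dict:
    """Build a permissions policy allowing credential retrieval for one workgroup."""
    workgroup_arn = (
        f"arn:aws:redshift-serverless:{region}:{account_id}:workgroup/{workgroup_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["redshift-serverless:GetCredentials"],
                "Resource": [workgroup_arn],  # scoped to a single workgroup
            }
        ],
    }
```

Passing the result through `json.dumps(policy, indent=4)` yields a document that can be pasted into the IAM console.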

If your S3 bucket (source, enrichment, or Athena query results) uses SSE-KMS, the role also needs kms:Decrypt for read paths and kms:GenerateDataKey for write paths, scoped to the KMS key ARN.
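One way to add the KMS permissions to any of the policies above is sketched below. The helper names and the `Sid` value are hypothetical; the actions follow the read/write split described in the preceding paragraph.

```python
def kms_statement(key_arn: str, writes: bool = False) -> dict:
    """Return an IAM statement granting the KMS access an SSE-KMS bucket needs.

    Read paths need kms:Decrypt; write paths additionally need kms:GenerateDataKey.
    """
    actions = ["kms:Decrypt"]
    if writes:
        actions.append("kms:GenerateDataKey")
    return {
        "Sid": "BucketKmsAccess",  # hypothetical label; any unique Sid works
        "Effect": "Allow",
        "Action": actions,
        "Resource": key_arn,       # scoped to the single key that encrypts the bucket
    }


def with_kms(policy: dict, key_arn: str, writes: bool = False) -> dict:
    """Return a copy of `policy` with the KMS statement appended."""
    return {**policy, "Statement": [*policy["Statement"], kms_statement(key_arn, writes)]}
```

For an S3 source bucket `writes=False` is enough; enrichment buckets and Athena results buckets are written to, so they would use `writes=True`.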

Step 3 — Attach the trust policy to the target role

Attach this trust policy to the role you created in Step 2. The Principal is the Qualytics dataplane role ARN from Step 1. You choose the External ID; AWS requires 2 to 1224 characters drawn from letters, digits, and the characters _+=,.@:/- (no spaces).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::315179787488:role/<tenant>-dataplane-role"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<YOUR_EXTERNAL_ID>"
                }
            }
        }
    ]
}

The External ID is customer-supplied — you choose it and use the same value in both the trust policy and the Qualytics connection form. Qualytics does not generate it.
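Since you pick the External ID yourself, it can help to generate a random one and check it locally before use. The sketch below uses the pattern and length limits documented for the STS AssumeRole `ExternalId` parameter; the function names are illustrative.

```python
import re
import secrets

# Allowed characters and length (2 to 1224) for the STS ExternalId parameter.
_EXTERNAL_ID = re.compile(r"[\w+=,.@:/-]{2,1224}", re.ASCII)


def is_valid_external_id(value: str) -> bool:
    """True if `value` is acceptable to STS as an External ID."""
    return _EXTERNAL_ID.fullmatch(value) is not None


def new_external_id(nbytes: int = 32) -> str:
    """Generate a random External ID; hex digits are all within the allowed set."""
    return secrets.token_hex(nbytes)
```

A value that passes this check still has to be entered identically in the trust policy and in the connection form.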

In the AWS console, Steps 2 and 3 are entered together in the role-creation wizard (trusted entity first, then permissions).

Step 4 — Configure the connection in Qualytics

When adding or editing a connection of type Amazon S3, Athena, or Redshift:

Set Authentication Type to IAM Role.
Enter the Role ARN (the target role from Step 2) and External ID.
Leave the static credential fields empty — the dataplane uses its IRSA identity automatically.
Click Test Connection.

Athena and Redshift hostnames

The connection's Host must be the canonical AWS endpoint (for example, athena.<region>.amazonaws.com or <workgroup>.<account>.<region>.redshift-serverless.amazonaws.com). VPC endpoint URLs (vpce-…) and custom Route 53 / private DNS aliases are not supported on the IAM Role path. Use the canonical endpoint and rely on VPC routing to keep traffic private.
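A local pre-check along these lines can flag hostnames that are unlikely to be canonical before you click Test Connection. It is a heuristic sketch, not how Qualytics validates the field: the patterns cover the two endpoint shapes named above plus the provisioned Redshift cluster endpoint, assume the standard `amazonaws.com` partition, and the helper name is hypothetical.

```python
import re

# Region names such as us-east-1, ap-southeast-2, or us-gov-west-1.
_REGION = r"[a-z]{2}(?:-[a-z]+)+-\d"

_CANONICAL_HOSTS = [
    re.compile(rf"athena\.{_REGION}\.amazonaws\.com"),
    re.compile(rf"[a-z0-9-]+\.\d{{12}}\.{_REGION}\.redshift-serverless\.amazonaws\.com"),
    re.compile(rf"[a-z0-9-]+\.[a-z0-9]+\.{_REGION}\.redshift\.amazonaws\.com"),
]


def is_canonical_host(host: str) -> bool:
    """Heuristically decide whether `host` looks like a canonical AWS endpoint."""
    host = host.strip().lower()
    if host.startswith("vpce-"):
        return False  # VPC endpoint URLs are not supported on the IAM Role path
    return any(pattern.fullmatch(host) for pattern in _CANONICAL_HOSTS)
```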

Self-Hosted Deployments

Self-hosted customers operate their own Kubernetes cluster, so they own the base identity that the dataplane uses to call sts:AssumeRole. The recommended path is AWS EKS with IRSA (no static credentials anywhere); a fallback path using static base keys is available for non-EKS clusters.

Path	Cluster	Static keys stored in Qualytics?
IRSA (keyless)	AWS EKS only	No
Static base keys	Any Kubernetes (vanilla, GKE/AKS, on-prem)	Yes — base credentials only, never used to access data directly

Step 1 — Set up the dataplane base identity

Choose one of the two paths below, then continue with Steps 2–4.

Path A — EKS with IRSA (recommended, keyless)

This path uses IAM Roles for Service Accounts (IRSA) so the Spark pod has an ambient AWS identity.

Prerequisites

An EKS cluster running the Qualytics Helm chart with the Spark Operator enabled.

The cluster's OIDC issuer is registered as an IAM OIDC provider. Retrieve the issuer URL with:

aws eks describe-cluster --name <CLUSTER_NAME> \
    --query "cluster.identity.oidc.issuer" --output text

Compare that URL against the output of aws iam list-open-id-connect-providers. If no IAM OIDC provider is registered for it, create one:

eksctl utils associate-iam-oidc-provider \
    --cluster <CLUSTER_NAME> --approve

helm and kubectl access to the cluster running Qualytics.

A.1 — Create the base IAM role

Create a base role in your AWS account with a permissions policy that only allows sts:AssumeRole. The base role does not access S3, Athena, or Redshift directly — it only assumes into target roles.

Permissions policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "*"
        }
    ]
}

You may scope Resource to specific target role ARNs once you know them — "*" is shown for first-time setup.
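Once the target role ARNs are known, the scoped version of this policy could be produced with a helper like the one below. The function name is hypothetical; with no ARNs it reproduces the first-time setup policy shown above.

```python
from typing import Iterable


def base_role_policy(target_role_arns: Iterable[str] = ()) -> dict:
    """Build the base identity's permissions policy.

    With no ARNs the policy allows assuming any role ("*"); otherwise it is
    limited to the listed target roles.
    """
    arns = sorted(set(target_role_arns))  # de-duplicate and keep output stable
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": arns if arns else "*",
            }
        ],
    }
```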

Trust policy (allows the Spark ServiceAccount to assume the base role via IRSA):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::<YOUR_ACCOUNT_ID>:oidc-provider/oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:sub": "system:serviceaccount:qualytics:qualytics-spark",
                    "oidc.eks.<REGION>.amazonaws.com/id/<OIDC_ID>:aud": "sts.amazonaws.com"
                }
            }
        }
    ]
}

<OIDC_ID> is the trailing path segment of the issuer URL returned by aws eks describe-cluster above. The sub condition pins the role to the qualytics-spark ServiceAccount in the qualytics namespace — these are the chart defaults; only change them if you've overridden sparkoperator.spark.serviceAccount.name or the namespace in your Helm values.
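To avoid hand-editing the placeholders, the trust policy can be derived directly from the issuer URL. This is a sketch under the chart defaults named above; `irsa_trust_policy` is a hypothetical helper, not part of the Qualytics chart.

```python
def irsa_trust_policy(account_id: str, issuer_url: str,
                      namespace: str = "qualytics",
                      service_account: str = "qualytics-spark") -> dict:
    """Build the base role's trust policy from the cluster's OIDC issuer URL."""
    # The IAM provider path is the issuer URL without its https:// scheme.
    provider = issuer_url.strip().rstrip("/")
    if provider.startswith("https://"):
        provider = provider[len("https://"):]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Pins the role to one ServiceAccount in one namespace.
                        f"{provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }
```

If you have overridden the ServiceAccount name or namespace in your Helm values, pass those values instead of the defaults.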

A.2 — Annotate the Spark ServiceAccount

Add the IRSA annotation via the Qualytics Helm chart at value path sparkoperator.spark.serviceAccount.annotations:

sparkoperator:
  spark:
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<BASE_ROLE_NAME>"

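Apply with helm upgrade. The --set flag below sets the same annotation inline, with the dots in the annotation key escaped so that Helm does not treat them as path separators: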

helm upgrade qualytics qualytics/qualytics \
    --namespace qualytics --reuse-values \
    --set sparkoperator.spark.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<BASE_ROLE_NAME>"

Verify the annotation landed:

kubectl get sa qualytics-spark -n qualytics -o yaml | grep eks.amazonaws.com

Existing Spark pods do not pick up new annotations — restart any running Spark drivers (or wait for the next operation) so they receive the IRSA-projected token.
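If you prefer a scripted check over grep, the annotation can be read from the ServiceAccount's JSON form (`kubectl get sa qualytics-spark -n qualytics -o json`). The helper below is a hypothetical sketch that takes that JSON as text.

```python
import json
from typing import Optional

IRSA_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_role_arn(service_account_json: str) -> Optional[str]:
    """Return the IRSA role ARN annotated on a ServiceAccount, or None if absent."""
    service_account = json.loads(service_account_json)
    # `annotations` is missing or null when the ServiceAccount has none.
    annotations = service_account.get("metadata", {}).get("annotations") or {}
    return annotations.get(IRSA_ANNOTATION)
```

A result of `None` means the Helm value did not reach the ServiceAccount; any other value should equal the base role ARN from A.1.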

The base role ARN you just created is what you'll use as Principal in Step 3.

Path B — Non-EKS with static base keys

Use this path if you are not on EKS or cannot use IRSA. The keys stored in Qualytics are used only to call sts:AssumeRole — they never read or write data directly.

B.1 — Create an IAM user

Create an IAM user with a single permissions policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Resource": "*"
        }
    ]
}

Generate an Access Key ID and Secret Access Key for this user. The user needs no S3, Athena, or Redshift permissions — only sts:AssumeRole.

The IAM user ARN is what you'll use as Principal in Step 3, and the keys are what you'll enter in Step 4.

Step 2 — Create the target IAM role

The target role is identical to the managed flow. Create the role with one of the connector-specific permission sets shown in Managed Deployments → Step 2.

Step 3 — Attach the trust policy to the target role

Attach this trust policy to the target role. The Principal is the base identity you set up in Step 1:

Path A (IRSA) — the base role ARN from Step A.1.
Path B (static keys) — the IAM user ARN from Step B.1.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<BASE_ROLE_NAME>"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "<YOUR_EXTERNAL_ID>"
                }
            }
        }
    ]
}

For Path B, replace role/<BASE_ROLE_NAME> with user/<IAM_USERNAME>.

The External ID is customer-supplied — you choose it and use the same value in both the trust policy and the Qualytics connection form. Qualytics does not generate it.

Repeat this trust-policy update for every target role you connect to (one per S3 bucket, Athena workgroup, or Redshift cluster).
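Because the same trust policy is repeated for each target role, generating it can reduce copy-paste drift. The sketch below covers both paths; the function names are hypothetical, and the `path` argument ("A" for the IRSA base role, "B" for the static-key IAM user) mirrors the naming used in this guide.

```python
def base_principal_arn(account_id: str, name: str, path: str = "A") -> str:
    """Return the base identity ARN: a role for Path A, an IAM user for Path B."""
    kinds = {"A": "role", "B": "user"}
    try:
        kind = kinds[path.upper()]
    except KeyError:
        raise ValueError(f"path must be 'A' or 'B', got {path!r}") from None
    return f"arn:aws:iam::{account_id}:{kind}/{name}"


def target_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Build the trust policy to attach to a target role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }
```

For example, `target_trust_policy(base_principal_arn("111122223333", "qualytics-base"), external_id)` produces the Path A document; whether each target role shares one External ID or gets its own is your choice.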

Step 4 — Configure the connection in Qualytics

When adding or editing a connection of type Amazon S3, Athena, or Redshift:

Set Authentication Type to IAM Role.
Enter the Role ARN (the target role from Step 2) and External ID.
Static credential fields:
- Path A (IRSA) — leave empty. The dataplane uses its IRSA identity automatically.
- Path B (static keys) — enter the IAM user's Access Key ID and Secret Access Key from Step B.1.
Click Test Connection.

Athena and Redshift hostnames

The connection's Host must be the canonical AWS endpoint (for example, athena.<region>.amazonaws.com or <workgroup>.<account>.<region>.redshift-serverless.amazonaws.com). VPC endpoint URLs (vpce-…) and custom Route 53 / private DNS aliases are not supported on the IAM Role path. Use the canonical endpoint and rely on VPC routing to keep traffic private.

Troubleshooting

The two most common failures:

AccessDenied on sts:AssumeRole: either the trust policy doesn't list the dataplane identity as Principal, or the External ID in Qualytics doesn't match the sts:ExternalId condition byte for byte. Watch for whitespace and copy-paste artifacts.
KMS.AccessDeniedException: the role can read S3 objects but is not allowed to use the KMS key that encrypts the bucket (SSE-KMS). Add kms:Decrypt (and kms:GenerateDataKey for write paths) on that key's ARN.
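External ID mismatches are often invisible to the eye. A small comparison helper, sketched below with a hypothetical name, reports where two values first diverge and names the offending characters.

```python
import unicodedata
from typing import Optional


def explain_external_id_mismatch(configured: str, expected: str) -> Optional[str]:
    """Return None if the values are identical, else a short description of the difference."""
    if configured == expected:
        return None
    if configured.strip() == expected.strip():
        return "values differ only in leading or trailing whitespace"
    for index, (a, b) in enumerate(zip(configured, expected)):
        if a != b:
            # Fall back to repr() for characters without a Unicode name.
            return (f"first difference at index {index}: "
                    f"{unicodedata.name(a, repr(a))} vs {unicodedata.name(b, repr(b))}")
    # One value is a prefix of the other.
    return f"lengths differ: {len(configured)} vs {len(expected)}"
```

Pass it the value stored in the connection and the value from the trust policy's `sts:ExternalId` condition.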

To isolate trust-policy issues from Qualytics-side issues, run the same call locally:

aws sts assume-role \
    --role-arn "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/<YOUR_ROLE_NAME>" \
    --role-session-name "qualytics-test" \
    --external-id "<YOUR_EXTERNAL_ID>"

If this succeeds for an identity equivalent to the dataplane but fails inside Qualytics, the issue is on the Qualytics side. AWS CloudTrail logs every sts:AssumeRole attempt and is the definitive source for the denial reason.

Self-hosted IRSA-specific checks

If you're on Path A (EKS + IRSA) and seeing AccessDenied on sts:AssumeRole or Not authorized to perform sts:AssumeRoleWithWebIdentity:

Confirm the annotation reached the ServiceAccount:

kubectl get sa qualytics-spark -n qualytics -o yaml | grep eks.amazonaws.com

Confirm the OIDC provider exists in IAM:

```
aws iam list-open-id-connect-providers
```

The output should include a provider ARN that ends with the issuer URL (minus the https:// prefix) returned by aws eks describe-cluster.

Confirm the base role's trust policy sub matches the actual ServiceAccount path:

Format must be system:serviceaccount:<namespace>:<serviceaccount-name> — for the chart defaults that's system:serviceaccount:qualytics:qualytics-spark.

Restart Spark drivers so they receive the projected token:

kubectl delete pod -n qualytics -l spark-role=driver
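Two of the checks above can also be scripted. The sketch below assumes you have captured the JSON output of `aws iam list-open-id-connect-providers` and the base role's trust policy document; the function names are hypothetical.

```python
import json


def oidc_provider_registered(issuer_url: str, list_providers_json: str) -> bool:
    """True if the provider listing contains an ARN for the cluster's issuer."""
    issuer_path = issuer_url.strip().rstrip("/")
    if issuer_path.startswith("https://"):
        issuer_path = issuer_path[len("https://"):]
    providers = json.loads(list_providers_json).get("OpenIDConnectProviderList", [])
    suffix = f":oidc-provider/{issuer_path}"
    return any(provider.get("Arn", "").endswith(suffix) for provider in providers)


def trust_policy_sub_matches(trust_policy: dict, namespace: str = "qualytics",
                             service_account: str = "qualytics-spark") -> bool:
    """True if some StringEquals ":sub" condition names the given ServiceAccount."""
    wanted = f"system:serviceaccount:{namespace}:{service_account}"
    for statement in trust_policy.get("Statement", []):
        conditions = statement.get("Condition", {}).get("StringEquals", {})
        for key, value in conditions.items():
            # IAM allows a condition value to be a string or a list of strings.
            values = value if isinstance(value, list) else [value]
            if key.endswith(":sub") and wanted in values:
                return True
    return False
```

If either function returns `False`, fix the corresponding item before restarting the Spark drivers.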