Google Cloud Storage

Adding and configuring a Google Cloud Storage connection lets Qualytics connect to your file system and perform operations such as data discovery, visualization, reporting, syncing, profiling, scanning, and anomaly detection.

This documentation provides a step-by-step guide on how to add Google Cloud Storage as both a source and enrichment datastore in Qualytics. It covers the entire process, from initial connection setup to testing and finalizing the configuration.

By following these instructions, you can ensure your Google Cloud Storage environment is properly connected with Qualytics, so that you can proactively manage your full data quality lifecycle.

Let's get started 🚀

Google Cloud Storage Setup Guide

This guide will walk you through the steps to set up Google Cloud Storage, including how to retrieve the necessary URIs, access keys, and secret keys, which are essential for integrating this datastore into Qualytics.

Retrieve the Google Cloud Storage URI

To retrieve the Cloud Storage URI, follow the given steps:

  1. Go to the Cloud Storage Console.
  2. Navigate to the location of the object (file) that holds the source data.
  3. At the top of the Cloud Storage console, locate and note down the path to the object.
  4. Create the URI using the following format:
gs://bucket/file
  • bucket is the name of the Cloud Storage bucket.

  • file is the name of the object (file) containing the data.
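The URI format above can be sketched in a few lines of Python. This is only an illustration: the function name `build_gcs_uri` and the sample bucket and object names are hypothetical, not part of Qualytics or Google Cloud.

```python
def build_gcs_uri(bucket: str, object_name: str = "") -> str:
    """Return a gs:// URI for a bucket and an optional object path."""
    bucket = bucket.strip().strip("/")
    if not bucket:
        raise ValueError("bucket name must not be empty")
    # Drop a leading slash so the result never contains "//" after the bucket.
    object_name = object_name.strip().lstrip("/")
    return f"gs://{bucket}/{object_name}" if object_name else f"gs://{bucket}"


# Hypothetical bucket and object names, for illustration only.
print(build_gcs_uri("my-bucket", "exports/orders.csv"))
print(build_gcs_uri("my-bucket"))
```

Passing only the bucket name yields the bucket-level form of the URI.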

Retrieve the Access Key and Secret Key

You need these keys when integrating Google Cloud Storage with other applications or services, such as when adding it as a datastore in Qualytics. The keys allow you to reuse existing code to access Google Cloud Storage without needing to implement a different authentication mechanism.

To retrieve the access key and secret key in the Google Cloud Storage Console account, follow the given steps:

Step 1: Log in to the Google Cloud Console and navigate to Cloud Storage > Settings. The Settings page opens.

navigate-to-gcs

Step 2: Click on the Interoperability tab.

interoperability-tab

Step 3: Scroll down the Interoperability page and under Access keys for your user account, click the CREATE A KEY button to generate a new Access Key and Secret Key.

create-a-key

Step 4: Use the generated Access Key and Secret Key values when adding your Google Cloud Storage account as a datastore in Qualytics.

generate-keys

For example, once you generate the keys, they might look like this:

  • Access Key: GOOG1234ABCDEFGH5678

  • Secret Key: abcd1234efgh5678ijklmnopqrstuvwx

Warning

Make sure to store these keys securely, as they provide access to your Google Cloud Storage resources.
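One common way to keep the keys out of source code is to read them from environment variables and to mask them before logging. The sketch below assumes two hypothetical variable names, `GCS_ACCESS_KEY` and `GCS_SECRET_KEY`; neither is defined by Qualytics or Google Cloud.

```python
import os
from typing import Mapping, Tuple


def load_hmac_keys(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the HMAC key pair from environment variables (names are hypothetical)."""
    names = ("GCS_ACCESS_KEY", "GCS_SECRET_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["GCS_ACCESS_KEY"], env["GCS_SECRET_KEY"]


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the first few characters so the value is safe to log."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demo with the sample values from this guide, passed in explicitly
# instead of being read from the real environment.
access_key, secret_key = load_hmac_keys(
    {
        "GCS_ACCESS_KEY": "GOOG1234ABCDEFGH5678",
        "GCS_SECRET_KEY": "abcd1234efgh5678ijklmnopqrstuvwx",
    }
)
print(access_key, mask(secret_key))
```

In real use, call `load_hmac_keys()` without arguments so the values come from the process environment or from a secrets manager that populates it.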

Datastore Google Cloud Storage Privileges

The permissions required depend on whether you are using Google Cloud Storage as a source or enrichment datastore. Qualytics accesses GCS using HMAC keys (Access Key / Secret Key) or a Service Account Key.

Minimum Permissions (Source Datastore)

The service account or HMAC key must have the following permissions:

Permission | Purpose
storage.buckets.get | Validate the bucket exists and retrieve its metadata
storage.objects.get | Read file contents for profiling and scanning
storage.objects.list | List files in the bucket to discover data assets

Tip

You can grant these permissions by assigning the Storage Object Viewer (roles/storage.objectViewer) role to the service account on the target bucket.

Additional Permissions for Enrichment Datastore

When using Google Cloud Storage as an enrichment datastore, the following additional permissions are required:

Permission | Purpose
storage.objects.create | Write enrichment result files
storage.objects.delete | Remove temporary or outdated enrichment files

Tip

You can grant all required permissions (read + write) by assigning the Storage Object Admin (roles/storage.objectAdmin) role to the service account on the target bucket.
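The two permission lists can be combined into a simple check that reports which permissions are still missing for a given use. This is a minimal sketch: the function name is hypothetical, and how you obtain the set of granted permissions (for example from your IAM tooling) is left out.

```python
from typing import Iterable, List

# Permissions listed in this guide for a source datastore.
SOURCE_PERMISSIONS = frozenset(
    {"storage.buckets.get", "storage.objects.get", "storage.objects.list"}
)
# Additional permissions listed for an enrichment datastore.
ENRICHMENT_EXTRA_PERMISSIONS = frozenset(
    {"storage.objects.create", "storage.objects.delete"}
)


def missing_permissions(granted: Iterable[str], enrichment: bool = False) -> List[str]:
    """Return the required permissions that are absent from `granted`, sorted."""
    required = set(SOURCE_PERMISSIONS)
    if enrichment:
        required |= ENRICHMENT_EXTRA_PERMISSIONS
    return sorted(required - set(granted))


granted = {"storage.objects.get", "storage.objects.list"}
print(missing_permissions(granted))
print(missing_permissions(granted, enrichment=True))
```

An empty list means the granted set covers everything this guide lists for that datastore type.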

Example IAM Policy

Replace <SERVICE_ACCOUNT_EMAIL> and <BUCKET_NAME> with your actual values.

Source Datastore (Read-Only)

{
  "bindings": [
    {
      "role": "roles/storage.objectViewer",
      "members": [
        "serviceAccount:<SERVICE_ACCOUNT_EMAIL>"
      ]
    }
  ]
}

Enrichment Datastore (Read-Write)

{
  "bindings": [
    {
      "role": "roles/storage.objectAdmin",
      "members": [
        "serviceAccount:<SERVICE_ACCOUNT_EMAIL>"
      ]
    }
  ]
}
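If you generate these policy documents from a script, the two bindings above differ only in the role. The following sketch builds the same structure in Python; the helper name and the sample service account email are hypothetical.

```python
import json

# Role per datastore usage, as given in the two example policies above.
ROLE_BY_USAGE = {
    "source": "roles/storage.objectViewer",
    "enrichment": "roles/storage.objectAdmin",
}


def build_policy(service_account_email: str, usage: str) -> dict:
    """Return an IAM policy dict with a single binding for the given usage."""
    if usage not in ROLE_BY_USAGE:
        raise ValueError(f"usage must be one of {sorted(ROLE_BY_USAGE)}")
    return {
        "bindings": [
            {
                "role": ROLE_BY_USAGE[usage],
                "members": [f"serviceAccount:{service_account_email}"],
            }
        ]
    }


# Hypothetical service account email, for illustration only.
policy = build_policy("qualytics-reader@my-project.iam.gserviceaccount.com", "source")
print(json.dumps(policy, indent=2))
```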

Tip

If you need both storage.buckets.get and object-level permissions but want to avoid a broader role, you can create a custom role with only the specific permissions listed in the Minimum Permissions section.

Assigning via the gsutil CLI

# Source Datastore (Read-Only)
gsutil iam ch \
  serviceAccount:<SERVICE_ACCOUNT_EMAIL>:roles/storage.objectViewer \
  gs://<BUCKET_NAME>

# Enrichment Datastore (Read-Write)
gsutil iam ch \
  serviceAccount:<SERVICE_ACCOUNT_EMAIL>:roles/storage.objectAdmin \
  gs://<BUCKET_NAME>

Tip

You can also assign roles through the Google Cloud Console by navigating to the bucket, selecting Permissions, and clicking Grant Access.

GCS Roles Summary

Role | Use Case | Relevant Permissions Included
roles/storage.objectViewer | Source Datastore | storage.objects.get, storage.objects.list
roles/storage.objectAdmin | Enrichment Datastore | storage.objects.get, storage.objects.list, storage.objects.create, storage.objects.delete

Note: storage.buckets.get is not part of either predefined role. If your connection requires it, add it through a custom role as described in the tip above.

Troubleshooting Common Errors

Error | Likely Cause | Fix
403 Forbidden | The service account or HMAC key lacks the required permissions on the bucket | Assign the appropriate role (Storage Object Viewer or Storage Object Admin) to the service account on the target bucket
404 Not Found: Bucket not found | The bucket name in the URI is incorrect or the bucket does not exist | Verify the bucket name and ensure the URI follows the format gs://bucket-name
Invalid credentials | The Access Key / Secret Key pair is incorrect or the service account key file is malformed | Regenerate the HMAC keys from Cloud Storage > Settings > Interoperability or re-download the service account key
The caller does not have storage.objects.list access | The service account has object-level access but lacks bucket-level list permission | Assign the Storage Object Viewer role at the bucket level (not just object level)
The caller does not have storage.objects.create access | The enrichment service account lacks write permissions | Upgrade the role assignment from Storage Object Viewer to Storage Object Admin

Detailed Troubleshooting Notes

Authentication Errors

The error Invalid credentials indicates that the HMAC keys or service account key are incorrect or malformed.

Common causes:

  • Incorrect Access Key / Secret Key: the HMAC key pair was copied incorrectly or has been deleted.
  • Malformed service account key: the JSON key file is corrupted, truncated, or belongs to a different project.
  • Service account disabled: the service account has been disabled in the Google Cloud Console.

Note

HMAC keys are tied to a specific service account or user account. If that account is deleted or disabled, its HMAC keys stop working even if they have not been explicitly revoked.
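A malformed service account key can often be caught before it is uploaded. The sketch below is a local sanity check only, under the assumption that the key file is a standard Google Cloud service account JSON key; the function name and the `expected_project` parameter are hypothetical, and passing the check does not prove the key is accepted by Google Cloud.

```python
import json
from typing import List, Optional

# Fields a Google Cloud service account key file normally contains.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw: str, expected_project: Optional[str] = None) -> List[str]:
    """Return a list of problems found in the key text; empty means none found."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        # A truncated or corrupted file fails here.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    if expected_project and key.get("project_id") and key["project_id"] != expected_project:
        problems.append(f"key belongs to project {key['project_id']}, expected {expected_project}")
    return problems


truncated = '{"type": "service_acc'
print(check_service_account_key(truncated))
```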

Permission Errors

The error 403 Forbidden or The caller does not have storage.objects.list access means the credentials are valid but lack the required IAM permissions.

Common causes:

  • Missing IAM role: the service account does not have Storage Object Viewer (source) or Storage Object Admin (enrichment) assigned on the target bucket.
  • Role assigned at wrong level: the role is assigned at the project level but a bucket-level policy overrides it.
  • Uniform bucket-level access: if the bucket uses uniform bucket-level access (recommended), ensure IAM policies are set at the bucket level, not through ACLs.
  • Source vs. enrichment mismatch: the service account has Storage Object Viewer but the operation requires write access (enrichment).

Connection Errors

The error 404 Not Found: Bucket not found indicates a configuration issue with the bucket name or URI.

Common causes:

  • Bucket does not exist: the bucket name was misspelled or the bucket has been deleted.
  • Wrong project: the service account belongs to a different Google Cloud project than the bucket.
  • Invalid URI format: the URI must follow gs://bucket-name. Extra path segments or incorrect formatting will cause failures.
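The URI format cause above lends itself to a quick local check. The sketch below is an approximation: the function name is hypothetical, and the bucket-name pattern is a simplified version of Google Cloud's naming rules (3 to 63 characters, lowercase letters, digits, dashes, underscores, and dots, starting and ending with a letter or digit), so it may reject some valid dotted names.

```python
import re
from typing import List

# Simplified bucket-name pattern; an assumption, not the full Google Cloud rule set.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def check_gcs_uri(uri: str) -> List[str]:
    """Return a list of problems with a gs://bucket-name URI; empty means none found."""
    if not uri.startswith("gs://"):
        return ["URI must start with gs://"]
    bucket, _, path = uri[len("gs://"):].partition("/")
    problems = []
    if not _BUCKET_RE.fullmatch(bucket):
        problems.append(f"bucket name looks invalid: {bucket!r}")
    if path:
        # Flagged because the note above says extra path segments cause failures.
        problems.append(f"extra path segment after the bucket name: {path!r}")
    return problems


for candidate in ("gs://my-bucket", "s3://my-bucket", "gs://My_Bucket", "gs://my-bucket/data"):
    print(candidate, check_gcs_uri(candidate))
```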

Tip

Start by confirming credentials are valid (authentication errors), then verify IAM role assignments (permission errors), and finally check the bucket name and URI format (connection errors).
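The triage order in this tip can be expressed as a small classifier that maps an error message to one of the three categories. This is a sketch based only on the error strings listed in this guide; the function name is hypothetical and real error messages may be worded differently.

```python
# Order in which the tip suggests investigating each category.
CHECK_ORDER = ["authentication", "permission", "connection"]


def classify_error(message: str) -> str:
    """Map an error message to a troubleshooting category from this guide."""
    text = message.lower()
    if "invalid credentials" in text:
        return "authentication"
    if "403" in text or "does not have storage." in text:
        return "permission"
    if "404" in text or "bucket not found" in text:
        return "connection"
    return "unknown"


errors = [
    "404 Not Found: Bucket not found",
    "The caller does not have storage.objects.list access",
    "Invalid credentials",
]
# Sort the observed errors so they are handled in the recommended order.
for message in sorted(errors, key=lambda m: CHECK_ORDER.index(classify_error(m))):
    print(classify_error(message), "->", message)
```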

Add a Source Datastore

A source datastore is a storage location used to connect and access data from external sources. Google Cloud Storage is an example of a source datastore, specifically a type of Distributed File System (DFS) datastore that is designed to handle data stored in distributed file systems. Configuring a DFS datastore enables the Qualytics platform to access and perform operations on the data, thereby generating valuable insights.

Step 1: Log in to your Qualytics account and click on the Add Source Datastore button located at the top-right corner of the interface.

add-datastore

Step 2: A modal window - Add Datastore will appear, providing you with the options to connect a datastore.

select-a-connector

REF. | FIELDS | ACTIONS
1. | Name (Required) | Specify the name of the datastore. The specified name will appear on the datastore cards.
2. | Toggle Button | Toggle ON to create a new source datastore from scratch, or toggle OFF to reuse credentials from an existing connection.
3. | Connector (Required) | Select Google Cloud Storage from the dropdown list.

Option I: Create a Datastore with a new Connection

If the toggle for Add New connection is turned on, then this will prompt you to add and configure the source datastore from scratch without using existing connection details.

Step 1: Select the Google Cloud Storage connector from the dropdown list and add connection details such as Secrets Management, URI, service account key, root path, and teams.

add-datastore-credentials

Secrets Management: This is an optional connection property that allows you to securely store and manage credentials by integrating with HashiCorp Vault and other secret management systems. Toggle it ON to enable Vault integration for managing secrets.

Note

After configuring HashiCorp Vault integration, you can use ${key} in any Connection property to reference a key from the configured Vault secret. Each time the Connection is initiated, the corresponding secret value will be retrieved dynamically.
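Conceptually, the `${key}` reference works like a placeholder substitution over the connection properties. The sketch below illustrates that idea only; it is not Qualytics's implementation, and the function name, property names, and secret values are hypothetical.

```python
import re
from typing import Any, Dict, Mapping

# Matches ${key} and captures the key name.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_placeholders(properties: Mapping[str, Any], secrets: Mapping[str, Any]) -> Dict[str, Any]:
    """Replace ${key} in every string property with the matching secret value."""

    def lookup(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in secrets:
            raise KeyError(f"secret key not found: {key}")
        return str(secrets[key])

    return {
        name: _PLACEHOLDER.sub(lookup, value) if isinstance(value, str) else value
        for name, value in properties.items()
    }


# Hypothetical connection properties and Vault secret data.
properties = {"uri": "gs://${bucket}", "secret_key": "${gcs_key}"}
secrets = {"bucket": "my-bucket", "gcs_key": "example-secret"}
print(resolve_placeholders(properties, secrets))
```

Because the lookup happens on every call, a rotated secret is picked up the next time the properties are resolved, which mirrors the dynamic retrieval described in the note.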

REF. | FIELDS | ACTIONS
1. | Login URL | Enter the URL used to authenticate with HashiCorp Vault.
2. | Credentials Payload | Input a valid JSON containing credentials for Vault authentication.
3. | Token JSONPath | Specify the JSONPath to retrieve the client authentication token from the response (e.g., $.auth.client_token).
4. | Secret URL | Enter the URL where the secret is stored in Vault.
5. | Token Header Name | Set the header name used for the authentication token (e.g., X-Vault-Token).
6. | Data JSONPath | Specify the JSONPath to retrieve the secret data (e.g., $.data).
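To see what the two JSONPath fields select, the sketch below walks a simple dotted path such as `$.auth.client_token` through a parsed JSON response. It handles only plain dotted keys, not the full JSONPath syntax, and the sample responses are made-up stand-ins for what Vault returns.

```python
from typing import Any


def extract(document: Any, json_path: str) -> Any:
    """Follow a simple dotted JSONPath ($.a.b) through nested dictionaries."""
    if not json_path.startswith("$"):
        raise ValueError("path must start with $")
    current = document
    for part in json_path[1:].split("."):
        if part:  # skip the empty piece produced by the leading dot
            current = current[part]
    return current


# Made-up responses shaped like the examples in the table above.
login_response = {"auth": {"client_token": "example-token"}}
secret_response = {"data": {"gcs_key": "example-secret"}}

token = extract(login_response, "$.auth.client_token")  # Token JSONPath
secret_data = extract(secret_response, "$.data")        # Data JSONPath
print(token, secret_data)
```

The extracted token is what would be sent in the header named by Token Header Name when reading the secret from the Secret URL.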

secret-management

Step 2: The configuration form will expand, requesting credential details before establishing the connection.

add-datastore-credentials-explain

REF. | FIELDS | ACTIONS
1. | URI (Required) | Enter the Uniform Resource Identifier (URI) of the Google Cloud Storage.
2. | Service Account Key (Required) | Upload a JSON file that contains the credentials required for accessing the Google Cloud Storage.
3. | Root Path (Required) | Specify the root path where the data is stored.
4. | Teams (Required) | Select one or more teams from the dropdown to associate with this source datastore.
5. | Initiate Sync (Optional) | Tick the checkbox to automatically perform sync operation on the configured source datastore to detect new, changed, or removed containers and fields.

Step 3: After adding the source datastore details, click on the Test Connection button to check and verify its connection.

test-datastore-connection

If the credentials and provided details are verified, a success message will be displayed indicating that the connection has been verified.

Option II: Use an Existing Connection

If the toggle for Add New connection is turned off, then this will prompt you to configure the source datastore using the existing connection details.

Step 1: Select a connection to reuse existing credentials.

use-existing-datastore

Note

If you are using existing credentials, you can only edit the details such as Root Path, Teams, and Initiate Sync.

Step 2: Click on the Test Connection button to check and verify the source data connection. If connection details are verified, a success message will be displayed.

test-connection-for-existing-datastore

Note

Clicking on the Finish button will create the source datastore and bypass the enrichment datastore configuration step.

Tip

It is recommended to click on the Next button, which will take you to the enrichment datastore configuration page.

Add Enrichment Datastore

Once you have successfully tested and verified your source datastore connection, you have the option to add the enrichment datastore (recommended). This datastore is used to store the analyzed results, including any anomalies and additional metadata files. This setup provides full visibility into your data quality, helping you manage and improve it effectively.

Step 1: Whether you have added a source datastore by creating a new datastore connection or using an existing connection, click on the Next button to start adding the Enrichment Datastore.

next-button-for-enrichment

Step 2: A modal window - Link Enrichment Datastore will appear, providing you with the options to configure an enrichment datastore.

select-enrichment-connector

REF. | FIELDS | ACTIONS
1. | Prefix | Add a prefix name to uniquely identify tables/files when Qualytics writes metadata from the source datastore to your enrichment datastore.
2. | Caret Down Button | Click the caret down to select either Use Enrichment Datastore or Add Enrichment Datastore.
3. | Enrichment Datastore | Select an enrichment datastore from the dropdown list.

Option I: Create an Enrichment Datastore with a new Connection

If the toggle for Add New connection is turned on, then this will prompt you to add and configure the enrichment datastore from scratch without using an existing enrichment datastore and its connection details.

Step 1: Click on the caret button and select Add Enrichment Datastore.

select-enrichment

A modal window - Link Enrichment Datastore will appear. Enter the following details to create an enrichment datastore with a new connection.

enrichment-detail

REF. | FIELDS | ACTIONS
1. | Prefix | Add a prefix name to uniquely identify tables/files when Qualytics writes metadata from the source datastore to your enrichment datastore.
2. | Name | Give a name for the enrichment datastore.
3. | Toggle Button for Add New Connection | Toggle ON to create a new enrichment datastore from scratch or toggle OFF to reuse credentials from an existing connection.
4. | Connector | Select a datastore connector from the dropdown list.

Step 2: Add connection details for your selected enrichment datastore connector.

enrichment-datastore-explain

REF. | FIELDS | ACTIONS
1. | URI (Required) | Enter the Uniform Resource Identifier (URI) for the Google Cloud Storage.
2. | Service Account Key (Required) | Upload a JSON file that contains the credentials required for accessing the Google Cloud Storage.
3. | Root Path (Required) | Specify the root path where the data is stored.
4. | Teams (Required) | Select one or more teams from the dropdown to associate with this enrichment datastore.

Step 3: Click on the Test Connection button to verify the selected enrichment datastore connection. If the connection is verified, a flash message will indicate that the connection with the datastore has been successfully verified.

test-connection-for-enrichment-datastore

Step 4: Click on the Finish button to complete the configuration process.

finish-configuration

When the configuration process is finished, a modal will display a success message indicating that your datastore has been successfully added.

success-message

Step 5: Close the Success dialog and the page will automatically redirect you to the Source Datastore Details page where you can perform data operations on your configured source datastore.

data-operation-page

Option II: Use an Existing Connection

If the toggle for Use an existing enrichment datastore is turned on, you will be prompted to configure the enrichment datastore using existing connection details.

Step 1: Click on the caret button and select Use Enrichment Datastore.

select-enrichment-details

Step 2: A modal window - Link Enrichment Datastore will appear. Add a prefix name and select an existing enrichment datastore from the dropdown list.

add-enrichment-details

REF. | FIELDS | ACTIONS
1. | Prefix | Add a prefix name to uniquely identify tables/files when Qualytics writes metadata from the source datastore to your enrichment datastore.
2. | Enrichment Datastore | Select an enrichment datastore from the dropdown list.

Step 3: After selecting an existing enrichment datastore connection, you will view the following details related to the selected enrichment:

  • Teams: The teams associated with the enrichment datastore, which determine whether it is public or private. For example, a datastore marked as Public is accessible to all users.

  • URI: Uniform Resource Identifier (URI) points to the specific location of the source data and should be formatted accordingly (e.g., gs://bucket/file for Google Cloud Storage).

  • Root Path: Specify the root path where the data is stored. This path defines the base directory or folder from which all data operations will be performed.

use-existing-enrichment-datastore

Step 4: Click on the Finish button to complete the configuration process for the existing enrichment datastore.

finish-configuration-for-existing-enrichment-datastore

When the configuration process is finished, a modal will display a success message indicating that your datastore has been successfully added.

success-message

Close the success message and you will be automatically redirected to the Source Datastore Details page where you can perform data operations on your configured source datastore.

data-operation-page

API Payload Examples

This section provides detailed examples of API payloads to guide you through the process of creating and managing datastores using Qualytics API. Each example includes endpoint details, sample payloads, and instructions on how to replace placeholder values with actual data relevant to your setup.

Creating a Source Datastore

This section provides sample payloads for creating the Google Cloud Storage datastore. Replace the placeholder values with actual data relevant to your setup.

Endpoint: /api/datastores (post)

With a new connection:

    {
        "name": "your_datastore_name",
        "teams": ["Public"],
        "trigger_catalog": true,
        "root_path": "/gcs_root_path",
        "enrich_only": false,
        "connection": {
            "name": "your_connection_name",
            "type": "gcs",
            "uri": "gs://<bucket_name>",
            "secret_key": "gcs_service_account_key"
        }
    }

With an existing connection (replace connection-id with the ID of the connection to reuse):

    {
        "name": "your_datastore_name",
        "teams": ["Public"],
        "trigger_catalog": true,
        "root_path": "/gcs_root_path",
        "enrich_only": false,
        "connection_id": connection-id
    }

Using the Qualytics CLI:

# Step 1: Create a Connection
qualytics connections create \
    --type gcs \
    --name "your_connection_name" \
    --uri "gs://<bucket_name>" \
    --secret-key "${GCS_SERVICE_ACCOUNT_KEY}"

# Step 2: Create a Source Datastore
qualytics datastores create \
    --name "your_datastore_name" \
    --connection-name "your_connection_name" \
    --database . \
    --schema /
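The JSON payload for a new connection can also be sent from a script. The sketch below only builds the HTTP request with Python's standard library and does not send it. The base URL, the bearer-token authorization header, and the helper name are assumptions for illustration; check your Qualytics deployment for the actual host and authentication scheme.

```python
import json
import urllib.request


def build_create_datastore_request(base_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to /api/datastores."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/datastores",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same fields as the sample payload above; all values are placeholders.
payload = {
    "name": "your_datastore_name",
    "teams": ["Public"],
    "trigger_catalog": True,
    "root_path": "/gcs_root_path",
    "enrich_only": False,
    "connection": {
        "name": "your_connection_name",
        "type": "gcs",
        "uri": "gs://<bucket_name>",
        "secret_key": "gcs_service_account_key",
    },
}

request = build_create_datastore_request("https://qualytics.example.com", "your_api_token", payload)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```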

Creating an Enrichment Datastore

This section provides sample payloads for creating an enrichment datastore. Replace the placeholder values with actual data relevant to your setup.

Endpoint: /api/datastores (post)

With a new connection:

    {
        "name": "your_datastore_name",
        "teams": ["Public"],
        "trigger_catalog": true,
        "root_path": "/gcs_root_path",
        "enrich_only": true,
        "connection": {
            "name": "your_connection_name",
            "type": "gcs",
            "uri": "gs://<bucket_name>",
            "secret_key": "gcs_service_account_key"
        }
    }

With an existing connection (replace connection-id with the ID of the connection to reuse):

    {
        "name": "your_datastore_name",
        "teams": ["Public"],
        "trigger_catalog": true,
        "root_path": "/gcs_root_path",
        "enrich_only": true,
        "connection_id": connection-id
    }

Using the Qualytics CLI:

# Step 1: Create a Connection
qualytics connections create \
    --type gcs \
    --name "your_connection_name" \
    --uri "gs://<bucket_name>" \
    --secret-key "${GCS_SERVICE_ACCOUNT_KEY}"

# Step 2: Create an Enrichment Datastore
qualytics datastores create \
    --name "your_datastore_name" \
    --connection-name "your_connection_name" \
    --database . \
    --schema /your_enrichment_path \
    --enrichment-only

Use the provided endpoint to link an enrichment datastore to a source datastore:

Endpoint Details: /api/datastores/{datastore-id}/enrichment/{enrichment-id} (patch)
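As a rough illustration of how the two IDs fit into this endpoint, the sketch below builds the PATCH request without sending it. The base URL, the bearer-token header, the helper name, and the sample IDs are assumptions, not values taken from the Qualytics API reference.

```python
import urllib.request


def build_link_enrichment_request(
    base_url: str, token: str, datastore_id: int, enrichment_id: int
) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that links an enrichment datastore."""
    url = f"{base_url.rstrip('/')}/api/datastores/{datastore_id}/enrichment/{enrichment_id}"
    return urllib.request.Request(
        url=url,
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="PATCH",
    )


# Hypothetical host, token, and IDs, for illustration only.
request = build_link_enrichment_request("https://qualytics.example.com", "your_api_token", 12, 34)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```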