
Getting Started with Datastores

What is a Datastore?

A datastore is a general term for any system or service that persists data — whether it's a relational database, a distributed file system, a data warehouse, or a cloud object storage bucket. The concept has evolved alongside the data landscape: from traditional on-premises databases in the 1970s–90s, through the rise of data warehouses and data lakes in the 2000s, to today's modern cloud-native platforms that span multiple engines and formats.

Learn more

For a broader understanding of datastores and how they fit into the modern data stack, see What is a Data Store? by AWS.

Datastores in Qualytics

In Qualytics, a Source Datastore is a unified abstraction that represents any structured data source — regardless of the underlying storage engine. Rather than building separate integrations for each platform, Qualytics uses Apache Spark to read and write Fields and Records across all supported connectors, providing a consistent data quality experience whether your data lives in a relational database, a distributed file system, or a cloud storage service.
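To make the abstraction concrete, here is a minimal sketch (not the Qualytics API) of how one engine-agnostic reader specification can cover both a relational database and an object storage bucket. The `kind` values, option keys, and the `reader_spec` helper are illustrative; the format and option names mirror Apache Spark's `DataFrameReader` conventions.

```python
# Illustrative sketch, not the Qualytics implementation: a single function
# that translates a datastore description into Spark-style reader arguments,
# regardless of the underlying storage engine.

def reader_spec(datastore):
    """Map a datastore description to DataFrameReader-style arguments."""
    if datastore["kind"] == "jdbc":
        # Relational sources: read through the JDBC format with a URL + table.
        return {
            "format": "jdbc",
            "options": {
                "url": datastore["url"],
                "dbtable": datastore["table"],
            },
        }
    if datastore["kind"] == "dfs":
        # File-based sources: the format comes from the file type.
        return {"format": datastore["file_format"], "path": datastore["path"]}
    raise ValueError(f"unsupported datastore kind: {datastore['kind']}")

pg = reader_spec({"kind": "jdbc",
                  "url": "jdbc:postgresql://db.example.com:5432/sales",
                  "table": "public.orders"})
s3 = reader_spec({"kind": "dfs", "file_format": "parquet",
                  "path": "s3a://example-bucket/orders/"})
print(pg["format"], s3["format"])  # → jdbc parquet
```

Either spec yields a plain DataFrame of Fields and Records, which is why the same profiling and scanning logic applies downstream.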

This abstraction enables organizations to:

  • Connect with a wide range of source datastores through verified connectors.
  • Support both traditional databases and modern object storage.
  • Profile and monitor structured data across systems.
  • Provide fast, reliable, and secure access to data.
  • Scale data quality operations across platforms.
  • Manage data quality centrally across all sources.

These source datastore integrations ensure comprehensive quality management across your entire data landscape, regardless of where your data resides.

Datastore Lifecycle

Every datastore in Qualytics follows a structured lifecycle — from initial creation through ongoing data quality operations.

```mermaid
graph TD
    A["<b>Create</b><br/>Add datastore & test connection"] --> B["<b>Link Enrichment</b><br/>Connect enrichment datastore"]
    B --> C["<b>Sync</b><br/>Discover tables, files & fields"]
    C --> D["<b>Profile</b><br/>Analyze data & infer checks"]
    D --> E["<b>Scan</b><br/>Run checks & detect anomalies"]
    E -->|"Repeat"| C
```
| Stage | Description |
|-------|-------------|
| Create | Add a new source datastore by selecting a connector, providing connection credentials, and testing the connection. |
| Link Enrichment | Optionally link an enrichment datastore to persist scan results, anomalies, and remediation data. |
| Sync | Discover the schema — tables, files, views, and fields — from your source datastore. This is the first operation after creation. |
| Profile | Analyze field patterns across records, detect data types, compute statistics, and automatically infer quality checks. |
| Scan | Execute quality checks against the data, measure data quality metrics, and detect anomalies from failed checks. |

Tip

The Sync → Profile → Scan cycle is repeatable. As your data evolves, re-running these operations keeps your quality checks and anomaly detection up to date.
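The repeatable cycle can be sketched as a simple pipeline. This is a hypothetical illustration, not the Qualytics API: the three functions are stand-ins, each returning the artifact its stage produces (discovered containers, inferred checks, detected anomalies).

```python
# Hypothetical sketch of the Sync → Profile → Scan cycle; function names
# and data shapes are illustrative only.

def sync(datastore):
    # Sync: discover the containers (tables or files) in the source.
    return sorted(datastore["containers"])

def profile(containers):
    # Profile: infer one illustrative quality check per container.
    return [f"not_null({name}.id)" for name in containers]

def scan(checks, failing):
    # Scan: run the checks; any check in `failing` yields an anomaly.
    return [c for c in checks if c in failing]

store = {"containers": ["orders", "customers"]}
checks = profile(sync(store))
anomalies = scan(checks, failing={"not_null(orders.id)"})
print(anomalies)  # → ['not_null(orders.id)']
```

Re-running the same three calls after the source changes picks up new containers and refreshes the inferred checks, which is what keeps anomaly detection current.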

Architecture

The Qualytics Datastore framework is organized into four layers:

| Layer | Description |
|-------|-------------|
| Qualytics I/O | Spark-enabled reads and writes of Fields and Records — the common interface for all datastores. |
| Namespace | How data is organized within the storage engine: Tables in a Schema (JDBC), Files in a Folder (DFS), or Topics in a Broker (Streaming). |
| Schema | The data definition layer: RDBMS DDL for relational databases, AVRO/Parquet/CSV/JSON for file systems, or AVRO/JSON/Protobuf for streaming platforms. |
| Storage Engine | The actual platform where data resides (see supported engines below). |

Supported Storage Engines


For the full list of supported JDBC and DFS connectors, see the Available Datastore Connectors page.

How It All Connects

```mermaid
graph LR
    IO["<b>Qualytics I/O</b><br/>Spark-enabled reads and writes<br/>of Fields and Records"]

    IO --> NS1["<b>Tables in a Schema</b>"]
    IO --> NS2["<b>Files in a Folder</b>"]

    NS1 --> SC1["<b>RDBMS DDL</b>"]
    NS2 --> SC2["<b>AVRO, Parquet, CSV, JSON</b>"]

    SC1 --> SE1["Oracle DB"]
    SC1 --> SE2["PostgreSQL"]
    SC1 --> SE3["MySQL"]
    SC1 --> SE4["Snowflake"]
    SC1 --> SE5["Redshift"]
    SC1 --> SE6["...and more"]

    SC2 --> SE7["Amazon S3"]
    SC2 --> SE8["Azure Storage"]
    SC2 --> SE9["GCS"]
```

The key insight is that Qualytics treats all datastores uniformly at the I/O layer. Whether you connect a PostgreSQL database or an S3 bucket, Qualytics reads and writes Fields and Records the same way — enabling consistent profiling, scanning, and anomaly detection across your entire data landscape.
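A small runnable sketch of that uniformity claim: once every source yields plain records (mappings of field to value), one profiling function serves all of them. The record lists below are toy stand-ins for rows from a JDBC table and records parsed from a file in object storage; `profile_field` is illustrative, not a Qualytics function.

```python
# Sketch: identical profiling logic over records from two different sources.

def profile_field(records, field):
    """Compute basic statistics for one field, skipping nulls."""
    values = [r[field] for r in records if r.get(field) is not None]
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "min": min(values),
        "max": max(values),
    }

jdbc_records = [{"amount": 10}, {"amount": 25}, {"amount": 10}]   # e.g. PostgreSQL rows
dfs_records  = [{"amount": 7}, {"amount": None}, {"amount": 25}]  # e.g. Parquet on S3

print(profile_field(jdbc_records, "amount"))  # → {'count': 3, 'distinct': 2, 'min': 10, 'max': 25}
print(profile_field(dfs_records, "amount"))   # → {'count': 2, 'distinct': 2, 'min': 7, 'max': 25}
```

Because the storage engine is abstracted away below the I/O layer, the same statistics, inferred checks, and anomaly logic apply whether the records came from a database table or a cloud bucket.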

Available Connectors

Qualytics supports 19 JDBC relational database connectors and 3 DFS cloud storage connectors out of the box.

  • All Connectors


    See the complete list of supported connectors with links to their individual setup guides.

    View Connectors

Deep Dive

Explore the details of each datastore type — how JDBC and DFS connectors work, their configuration, and supported features.

  • JDBC


    Learn how to connect relational databases using JDBC connectors like PostgreSQL, Snowflake, Oracle, and more.

    Understanding JDBC

  • DFS


    Learn how to connect distributed file systems like Amazon S3, Azure Data Lake Storage, and Google Cloud Storage.

    Understanding DFS

Connection

Learn how to set up and manage connections to your datastores — create new connections from scratch or reuse existing credentials.

  • Connections Overview


    Set up new connections or reuse existing credentials to connect your datastores.

    Connections Overview

Multiple-Schema

Discover and onboard multiple schemas from a single connection at once — including schema discovery, name templates, supported connectors, and API reference.

  • Introduction


    Discover and onboard multiple schemas from a single connection in one step.

    Getting Started

  • How It Works


    Understand the multi-schema creation flow, schema discovery, and name templates.

    How It Works

  • Supported Connectors


    See which connectors support multi-schema discovery and their catalog/schema mappings.

    Supported Connectors

  • Permissions


    Understand the roles and permissions required for multi-schema creation.

    Permissions

  • API


    API endpoints for bulk datastore creation, schema discovery, and validation.

    API

  • FAQ


    Answers to common questions about multi-schema source datastore creation.

    FAQ

Managing

Add, edit, and delete datastores — whether creating from scratch with a new connection or reusing an existing one.

  • Add Datastore with new connection


    Create a new source datastore by setting up a new connection from scratch.

    New Connection

  • Add Datastore with existing connection


    Create a new source datastore by reusing credentials from an existing connection.

    Existing Connection

  • Edit Datastore


    Modify the settings and connection details of an existing datastore.

    Edit Datastore

  • Delete Datastore


    Remove a datastore and its associated configuration from Qualytics.

    Delete Datastore