# Getting Started with Datastores
## What is a Datastore?
A datastore is a general term for any system or service that persists data — whether it's a relational database, a distributed file system, a data warehouse, or a cloud object storage bucket. The concept has evolved alongside the data landscape: from traditional on-premise databases in the 1970s–90s, through the rise of data warehouses and data lakes in the 2000s, to today's modern cloud-native platforms that span multiple engines and formats.
> **Learn more:** For a broader understanding of datastores and how they fit into the modern data stack, see *What is a Data Store?* by AWS.
## Datastores in Qualytics
In Qualytics, a Source Datastore is a unified abstraction that represents any structured data source — regardless of the underlying storage engine. Rather than building separate integrations for each platform, Qualytics uses Apache Spark to read and write Fields and Records across all supported connectors, providing a consistent data quality experience whether your data lives in a relational database, a distributed file system, or a cloud storage service.
This abstraction enables organizations to:
- Connect with a wide range of source datastores through verified connectors.
- Support both traditional databases and modern object storage.
- Profile and monitor structured data across systems.
- Provide secure, fast, and reliable access to data.
- Scale data quality operations across platforms.
- Manage data quality centrally across all sources.
These source datastore integrations ensure comprehensive quality management across your entire data landscape, regardless of where your data resides.
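The "one interface, many engines" idea can be sketched in plain Python. This is an illustration of the concept only, not Qualytics code: the function names are invented for the example, with SQLite standing in for a JDBC engine and an in-memory CSV standing in for cloud object storage.

```python
import csv
import io
import sqlite3

# Hypothetical sketch: one read-records interface over two very different
# storage engines, mirroring how a single I/O layer can span JDBC and DFS.

def read_records_jdbc(conn, table):
    """Read rows from a relational table as a list of dicts (Fields -> values)."""
    cur = conn.execute(f"SELECT * FROM {table}")
    fields = [col[0] for col in cur.description]
    return [dict(zip(fields, row)) for row in cur.fetchall()]

def read_records_dfs(csv_text):
    """Read rows from a CSV 'file' as a list of dicts with the same shape."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# A relational source (SQLite stands in for any JDBC engine) ...
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'ada'), (2, 'lin')")

# ... and a file-based source (an in-memory CSV stands in for S3/GCS/Azure).
csv_data = "id,name\n1,ada\n2,lin\n"

# Downstream logic (profiling, scanning) sees Records of the same shape
# either way; only type fidelity differs between the two engines.
assert read_records_jdbc(db, "users") == [
    {"id": 1, "name": "ada"},
    {"id": 2, "name": "lin"},
]
assert read_records_dfs(csv_data) == [
    {"id": "1", "name": "ada"},
    {"id": "2", "name": "lin"},
]
```

Because both readers return the same record shape, a single profiling or scanning routine can run unchanged against either source.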
## Datastore Lifecycle
Every datastore in Qualytics follows a structured lifecycle — from initial creation through ongoing data quality operations.
```mermaid
graph TD
    A["<b>Create</b><br/>Add datastore & test connection"] --> B["<b>Link Enrichment</b><br/>Connect enrichment datastore"]
    B --> C["<b>Sync</b><br/>Discover tables, files & fields"]
    C --> D["<b>Profile</b><br/>Analyze data & infer checks"]
    D --> E["<b>Scan</b><br/>Run checks & detect anomalies"]
    E -->|"Repeat"| C
```
| Stage | Description |
|---|---|
| Create | Add a new source datastore by selecting a connector, providing connection credentials, and testing the connection. |
| Link Enrichment | Optionally link an enrichment datastore to persist scan results, anomalies, and remediation data. |
| Sync | Discover the schema — tables, files, views, and fields — from your source datastore. This is the first operation after creation. |
| Profile | Analyze field patterns across records, detect data types, compute statistics, and automatically infer quality checks. |
| Scan | Execute quality checks against the data, measure data quality metrics, and detect anomalies from failed checks. |
> **Tip:** The Sync → Profile → Scan cycle is repeatable. As your data evolves, re-running these operations keeps your quality checks and anomaly detection up to date.
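The lifecycle above can be sketched as a minimal state flow in Python. The stage names come from the table; the function itself is an illustration of the ordering, not the Qualytics API.

```python
# Hypothetical sketch of the datastore lifecycle described above.
# Create and Link Enrichment happen once; Sync -> Profile -> Scan repeats.

SETUP_STAGES = ["Create", "Link Enrichment"]
CYCLE_STAGES = ["Sync", "Profile", "Scan"]

def lifecycle(cycles):
    """Yield lifecycle stages for a datastore run through `cycles` quality cycles."""
    for stage in SETUP_STAGES:
        yield stage
    for _ in range(cycles):          # the repeatable quality loop
        for stage in CYCLE_STAGES:
            yield stage

# Two cycles: set up once, then run Sync/Profile/Scan twice over.
assert list(lifecycle(2)) == [
    "Create", "Link Enrichment",
    "Sync", "Profile", "Scan",
    "Sync", "Profile", "Scan",
]
```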
## Architecture
The Qualytics Datastore framework is organized in four layers:
| Layer | Description |
|---|---|
| Qualytics I/O | Spark-enabled reads and writes of Fields and Records — the common interface for all datastores. |
| Namespace | How data is organized within the storage engine: Tables in a Schema (JDBC), Files in a Folder (DFS), or Topics in a Broker (Streaming). |
| Schema | The data definition layer: RDBMS DDL for relational databases, AVRO/Parquet/CSV/JSON for file systems, or AVRO/JSON/Protobuf for streaming platforms. |
| Storage Engine | The actual platform where data resides (see supported engines below). |
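The four layers can be made concrete with a small sketch. The two example mappings follow the table above; the data structure itself is invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical sketch: resolving two datastores through the four layers.
@dataclass
class DatastoreLayers:
    storage_engine: str   # the actual platform where data resides
    schema: str           # the data definition layer
    namespace: str        # how data is organized within the engine
    io: str = "Qualytics I/O: Spark reads/writes of Fields and Records"

postgres = DatastoreLayers(
    storage_engine="PostgreSQL",
    schema="RDBMS DDL",
    namespace="Tables in a Schema (JDBC)",
)
s3 = DatastoreLayers(
    storage_engine="Amazon S3",
    schema="AVRO/Parquet/CSV/JSON",
    namespace="Files in a Folder (DFS)",
)

# Different engines, schemas, and namespaces, but both converge on the
# same top layer: the common I/O interface.
assert postgres.io == s3.io
```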
### Supported Storage Engines
For the full list of supported JDBC and DFS connectors, see the Available Datastore Connectors page.
### How It All Connects
```mermaid
graph LR
    IO["<b>Qualytics I/O</b><br/>Spark-enabled reads and writes<br/>of Fields and Records"]
    IO --> NS1["<b>Tables in a Schema</b>"]
    IO --> NS2["<b>Files in a Folder</b>"]
    NS1 --> SC1["<b>RDBMS DDL</b>"]
    NS2 --> SC2["<b>AVRO, Parquet, CSV, JSON</b>"]
    SC1 --> SE1["Oracle DB"]
    SC1 --> SE2["PostgreSQL"]
    SC1 --> SE3["MySQL"]
    SC1 --> SE4["Snowflake"]
    SC1 --> SE5["Redshift"]
    SC1 --> SE6["...and more"]
    SC2 --> SE7["Amazon S3"]
    SC2 --> SE8["Azure Storage"]
    SC2 --> SE9["GCS"]
```
The key insight is that Qualytics treats all datastores uniformly at the I/O layer. Whether you connect a PostgreSQL database or an S3 bucket, Qualytics reads and writes Fields and Records the same way — enabling consistent profiling, scanning, and anomaly detection across your entire data landscape.
## Available Connectors
Qualytics supports 19 JDBC relational database connectors and 3 DFS cloud storage connectors out of the box.
- **All Connectors**: See the complete list of supported connectors with links to their individual setup guides.
## Deep Dive
Explore the details of each datastore type — how JDBC and DFS connectors work, their configuration, and supported features.
- **JDBC**: Learn how to connect relational databases using JDBC connectors like PostgreSQL, Snowflake, Oracle, and more.
- **DFS**: Learn how to connect distributed file systems like Amazon S3, Azure Data Lake Storage, and Google Cloud Storage.
## Connection
Learn how to set up and manage connections to your datastores — create new connections from scratch or reuse existing credentials.
- **Connections Overview**: Set up new connections or reuse existing credentials to connect your datastores.
## Multiple-Schema
Discover and onboard multiple schemas from a single connection at once — including schema discovery, name templates, supported connectors, and API reference.
- **Introduction**: Discover and onboard multiple schemas from a single connection in one step.
- **How It Works**: Understand the multi-schema creation flow, schema discovery, and name templates.
- **Supported Connectors**: See which connectors support multi-schema discovery and their catalog/schema mappings.
- **Permissions**: Understand the roles and permissions required for multi-schema creation.
- **API**: API endpoints for bulk datastore creation, schema discovery, and validation.
- **FAQ**: Answers to common questions about multi-schema source datastore creation.
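A name template for multi-schema onboarding might behave like the sketch below. The `{schema_name}` placeholder syntax and the function are assumptions made for this illustration, not necessarily the product's actual template syntax.

```python
# Hypothetical sketch: deriving one datastore name per discovered schema
# from a single template. The "{schema_name}" placeholder is an assumption
# for this example only.

def expand_names(template, schemas):
    """Return one datastore name per schema, in discovery order."""
    return [template.format(schema_name=s) for s in schemas]

discovered = ["sales", "finance", "hr"]
names = expand_names("prod_{schema_name}_ds", discovered)
assert names == ["prod_sales_ds", "prod_finance_ds", "prod_hr_ds"]
```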
## Managing
Add, edit, and delete datastores — whether creating from scratch with a new connection or reusing an existing one.
- **Add Datastore with new connection**: Create a new source datastore by setting up a new connection from scratch.
- **Add Datastore with existing connection**: Create a new source datastore by reusing credentials from an existing connection.
- **Edit Datastore**: Modify the settings and connection details of an existing datastore.
- **Delete Datastore**: Remove a datastore and its associated configuration from Qualytics.