Understanding JDBC
What is JDBC?
JDBC (Java Database Connectivity) is a standard Java API that provides a uniform interface for connecting to relational databases. It acts as a bridge between Java applications and databases, allowing programs to execute SQL queries, retrieve results, and manage data. Each database vendor supplies a driver that implements this interface, so the same API calls work regardless of which database is used.
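As a rough illustration of what this looks like in application code, the sketch below uses only the standard `java.sql` interfaces. The host, database name, `orders` table, `status` column, and the `postgresUrl` helper are hypothetical, and the URL follows the PostgreSQL driver's format; other vendors use different URL formats. Without a matching driver on the classpath, the connection attempt simply reports a failure.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Properties;

final class JdbcBasics {

    // Hypothetical helper: builds a PostgreSQL-style JDBC URL.
    // Other vendors use their own URL formats.
    static String postgresUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    // Runs a parameterized query through the vendor-neutral JDBC interfaces.
    // The "orders" table and "status" column are assumptions for illustration.
    static long countByStatus(Connection conn, String status) throws SQLException {
        String sql = "SELECT COUNT(*) FROM orders WHERE status = ?";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, status);
            try (ResultSet rs = stmt.executeQuery()) {
                rs.next(); // COUNT(*) always returns exactly one row
                return rs.getLong(1);
            }
        }
    }

    public static void main(String[] args) {
        String url = postgresUrl("localhost", 5432, "sales");
        Properties props = new Properties();
        props.setProperty("user", "app_user");
        props.setProperty("password", System.getenv().getOrDefault("DB_PASSWORD", ""));

        // try-with-resources closes the connection even if the query fails.
        try (Connection conn = DriverManager.getConnection(url, props)) {
            System.out.println("Shipped orders: " + countByStatus(conn, "shipped"));
        } catch (SQLException e) {
            System.err.println("Connection or query failed: " + e.getMessage());
        }
    }
}
```

Only the URL and the driver on the classpath are vendor-specific; the `Connection`, `PreparedStatement`, and `ResultSet` calls stay the same across databases.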
How JDBC Works in Qualytics
Qualytics uses JDBC connectors powered by Apache Spark to connect to relational databases. When you add a JDBC datastore, Qualytics:
- Establishes a connection to your database using the credentials and connection properties you provide (host, port, username, password, database, schema).
- Discovers your schema by reading the database catalog — tables, views, columns, and data types are automatically detected during the Sync operation.
- Reads data through Spark using optimized JDBC queries, enabling parallel reads across partitions for large datasets.
- Writes enrichment data (for connectors with enrichment support) back to the database to persist scan results, anomalies, and remediation records.
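To make the idea of parallel reads across partitions more concrete, the sketch below splits a numeric column's range into non-overlapping SQL predicates, one per partition, so that each can be fetched by a separate query. This is a simplified illustration of range partitioning in general (Spark's JDBC data source exposes comparable `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options); it is not the Qualytics implementation, and the `partitionPredicates` helper, the `id` column, and the `orders` table are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionedRead {

    // Splits [lower, upper) into numPartitions WHERE-clause predicates.
    // The first partition is open below (and takes NULLs) and the last is
    // open above, so rows outside the stated bounds are still read once.
    static List<String> partitionPredicates(String column, long lower, long upper, int numPartitions) {
        if (numPartitions < 1 || upper <= lower) {
            throw new IllegalArgumentException("need numPartitions >= 1 and upper > lower");
        }
        List<String> predicates = new ArrayList<>();
        if (numPartitions == 1) {
            predicates.add("1 = 1"); // single partition: no range filter
            return predicates;
        }
        long stride = Math.max(1, (upper - lower) / numPartitions);
        long start = lower;
        for (int i = 0; i < numPartitions; i++) {
            long end = start + stride;
            if (i == 0) {
                predicates.add(column + " < " + end + " OR " + column + " IS NULL");
            } else if (i == numPartitions - 1) {
                predicates.add(column + " >= " + start);
            } else {
                predicates.add(column + " >= " + start + " AND " + column + " < " + end);
            }
            start = end;
        }
        return predicates;
    }

    public static void main(String[] args) {
        // Each generated query could be issued by a different worker.
        for (String predicate : partitionPredicates("id", 0, 100, 4)) {
            System.out.println("SELECT * FROM orders WHERE " + predicate);
        }
    }
}
```

Because the predicates do not overlap and together cover every row, the partitions can be read concurrently without duplicating or dropping records.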
Connections and Security
For details on connection configuration, authentication methods (Basic, Keypair, Service Principal, OAuth), and secrets management (HashiCorp Vault integration), see the How Connections Work documentation.
Data Organization
In JDBC datastores, data is organized as:
- Containers — Tables and views in the database. Each container represents a structured dataset that Qualytics can profile, scan, and monitor.
- Fields — Columns within each table/view. Qualytics detects field names, data types, and constraints automatically.
- Records — Rows of data within each container, analyzed during Profile and Scan operations.
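The container and field hierarchy above maps naturally onto the catalog information that any JDBC driver exposes through `java.sql.DatabaseMetaData`. The sketch below is one way such a catalog could be read with the standard API; it is not the Qualytics Sync implementation, and the `CatalogDiscovery` class and its `discover` and `summarize` methods are hypothetical names.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

final class CatalogDiscovery {

    // Returns container name -> (field name -> declared type name) for one schema.
    static Map<String, Map<String, String>> discover(Connection conn, String schema) throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        Map<String, Map<String, String>> containers = new LinkedHashMap<>();

        // Containers: tables and views in the schema.
        try (ResultSet tables = meta.getTables(null, schema, "%", new String[] {"TABLE", "VIEW"})) {
            while (tables.next()) {
                containers.put(tables.getString("TABLE_NAME"), new LinkedHashMap<>());
            }
        }

        // Fields: columns of each container with their declared types.
        // Note: getColumns treats the table name as a LIKE pattern, so "_" and "%"
        // in a name can match other tables; a production version would escape them.
        for (Map.Entry<String, Map<String, String>> entry : containers.entrySet()) {
            try (ResultSet cols = meta.getColumns(null, schema, entry.getKey(), "%")) {
                while (cols.next()) {
                    entry.getValue().put(cols.getString("COLUMN_NAME"), cols.getString("TYPE_NAME"));
                }
            }
        }
        return containers;
    }

    // Formats one line per container with its field count.
    static String summarize(Map<String, Map<String, String>> containers) {
        StringBuilder sb = new StringBuilder();
        containers.forEach((name, fields) ->
                sb.append(name).append(": ").append(fields.size()).append(" fields\n"));
        return sb.toString();
    }
}
```

Records are not part of the catalog; they are the rows returned when a container is actually queried.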
Containers
For a detailed understanding of how Qualytics manages containers in JDBC datastores, see the Containers Overview documentation.
Field Type Inference
During the Sync operation, Qualytics uses weighted histogram analysis to infer field types automatically. This detects data types such as integers, decimals, dates, timestamps, and text fields based on actual data distribution — not just the declared database column type. Inferred types can be reviewed and overridden manually on each field.
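The general idea of inferring a type from the data rather than from the declared column type can be sketched as follows. The weighting scheme Qualytics uses is not described here, so this example makes simplifying assumptions: it builds a plain, unweighted histogram of candidate types over sample values and picks a type once its share reaches a hypothetical threshold. The `FieldTypeInference` class, its `FieldType` enum, and the parsing rules are illustrative only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class FieldTypeInference {

    enum FieldType { INTEGER, DECIMAL, DATE, TEXT }

    // Classifies one raw value by the most specific type it parses as.
    static FieldType classify(String value) {
        String v = value.trim();
        if (v.matches("[+-]?\\d+")) return FieldType.INTEGER;
        if (v.matches("[+-]?\\d*\\.\\d+")) return FieldType.DECIMAL;
        try {
            LocalDate.parse(v); // ISO-8601 dates such as 2024-01-15
            return FieldType.DATE;
        } catch (DateTimeParseException e) {
            return FieldType.TEXT;
        }
    }

    // Counts how many non-blank samples fall into each candidate type.
    static Map<FieldType, Integer> histogram(List<String> samples) {
        Map<FieldType, Integer> counts = new EnumMap<>(FieldType.class);
        for (String s : samples) {
            if (s == null || s.isBlank()) continue; // missing values carry no type signal
            counts.merge(classify(s), 1, Integer::sum);
        }
        return counts;
    }

    // Picks the type whose share of samples reaches the threshold, else TEXT.
    // Integers also count toward DECIMAL, since every integer is a valid decimal.
    static FieldType infer(List<String> samples, double threshold) {
        Map<FieldType, Integer> counts = histogram(samples);
        int total = counts.values().stream().mapToInt(Integer::intValue).sum();
        if (total == 0) return FieldType.TEXT;
        int ints = counts.getOrDefault(FieldType.INTEGER, 0);
        int decimals = counts.getOrDefault(FieldType.DECIMAL, 0);
        int dates = counts.getOrDefault(FieldType.DATE, 0);
        if ((double) ints / total >= threshold) return FieldType.INTEGER;
        if ((double) (ints + decimals) / total >= threshold) return FieldType.DECIMAL;
        if ((double) dates / total >= threshold) return FieldType.DATE;
        return FieldType.TEXT;
    }

    public static void main(String[] args) {
        // A text column that mostly holds numbers, with one stray value.
        System.out.println(infer(List.of("1", "2", "3", "x"), 0.75));
    }
}
```

A threshold below 1.0 lets a column declared as text still be recognized as numeric or date-like when a few values are malformed, which is why the inferred type can differ from the declared one.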
Multi-Schema Support
JDBC datastores support multi-schema creation, allowing you to discover and select multiple schemas from a single connection and create a separate source datastore for each selected schema in one step. This eliminates the need to add each schema individually.
Multiple-Schema
For detailed instructions on multi-schema creation, refer to the Multiple-Schema documentation.
Getting Started
- Add with New Connection: Create a new JDBC datastore by setting up a new connection from scratch.
- Add with Existing Connection: Create a new JDBC datastore by reusing credentials from an existing connection.
- Available JDBC Connectors: Browse the full list of supported JDBC connectors and multi-schema support.
- Connections: Configure connection details, authentication, and secrets management.
Available Operations
Once a JDBC datastore is added, you can run the following operations to monitor and maintain data quality:
| Operation | Description |
|---|---|
| Sync | Discovers tables, views, and fields from your database. Detects new, changed, or removed containers incrementally. This is always the first operation after adding a datastore. |
| Profile | Analyzes records across containers to compute statistics, detect data patterns, and automatically infer quality checks using the Qualytics Inference Engine. |
| Scan | Executes quality checks against the data, measures data quality metrics, and detects anomalies at the record and schema levels. |
| External Scan | Runs scan operations using externally provided data files instead of reading directly from the database. |
Tip
The recommended sequence is Sync → Profile → Scan. This cycle is repeatable — as your data evolves, re-running these operations keeps your quality checks and anomaly detection up to date.