Skip to content

Types of Transformations

Cast

The Cast transformation converts field values to a specified target data type using Spark SQL casting rules. This is commonly used when a field is ingested as a string but must behave as a numeric, date, timestamp, or boolean value for profiling, checks, or downstream computations.

When to Use Cast

Use Cast when:

  • A numeric field is ingested as a string (for example, "1000" instead of 1000)
  • Dates or timestamps need explicit typing
  • Boolean logic is required on string values
  • Quality checks or aggregations require strict data types

Options for Cast

REF. FIELD WHAT IT MEANS
1 Name A clear, descriptive name for the new computed field you are creating.
2 Transformation Type Choose Cast to convert the data type of an existing field.
3 Field The existing column whose values you want to convert into another data type.
4 Target Type The data type you want the field to become (for example: number, date, timestamp, decimal, or boolean).
5 Format Optional, Used only when converting text into dates or timestamps, so the system knows how to read the date format correctly.
6 Additional Metadata Optional information you can attach to the field to provide extra context or documentation.

cast-options

Target Type Examples

Input Value Target Type Result
"1000" int 1000
"1234.56" decimal(10,2) 1234.56
"true" boolean true
"2023-12-31" date 2023-12-31

Format Examples (Date / Timestamp Casting)

The Format field is only required when casting string values into date or timestamp.

Common examples:

Format Pattern Example Input
MM/dd/yyyy 12/31/2023
dd/MM/yyyy 31/12/2023
yyyy-MM-dd HH:mm:ss 2023-12-31 14:30:00

Cleaned Entity Name

This transformation removes common business signifiers from entity names, making your data cleaner and more uniform.

Options for Cleaned Entity Name

Computed Field Configuration

No. Field Name Description
1 Name Defines a unique name for the computed field. This name is used to identify and reference the computed field.
2 Transformation Type Select Cleaned Entity Name to create a standardized and cleaned version of the selected source field value.
3 Field Specifies the source field whose values will be cleaned and transformed.
4 Term Settings Defines how specific terms are removed from the source field value. This includes:
Drop from Prefix – Removes specified terms from the beginning of the entity name.
Drop from Suffix – Removes specified terms from the end of the entity name.
Drop from Interior – Removes specified terms from anywhere within the entity name.
5 Additional Terms to Drop Allows you to define custom terms that should be removed from the field value.
6 Terms to Ignore Specifies terms that should be excluded from the cleaning process and retained in the final output.
7 Additional Metadata Enables you to add custom metadata to enhance the computed field definition, such as descriptions or key-value attributes.

cleaned-entity

Example for Cleaned Entity Name

Example Input Transformation Output
1 "TechCorp, Inc." Drop from Suffix: "Inc." "TechCorp"
2 "Global Services Ltd." Drop from Prefix: "Global" "Services Ltd."
3 "Central LTD & Finance Co." Drop from Interior: "LTD" "Central & Finance Co."
4 "Eat & Drink LLC" Additional Terms to Drop: "LLC", "&" "Eat Drink"
5 "ProNet Solutions Ltd." Terms to Ignore: "Ltd." "ProNet Solutions"

Convert Formatted Numeric

This transformation converts formatted numeric values into a plain numeric format, stripping out any characters like commas or parentheses that are not numerically significant.

Options for Converted Formatted Numeric

No. Field Name Description
1 Name Specifies the name of the computed field. This name is used to identify the computed field.
2 Transformation Type Select Convert Formatted Numeric to convert formatted numeric values into a standard numeric format.
3 Field Selects the source field on which the transformation is applied.
4 Additional Metadata Allows adding custom metadata to enhance the computed field definition (for example, descriptions or key-value attributes).

converted-formatted-numeric

Example for Convert Formatted Numeric

Example Input Transformation Output
1 "$1,234.56" Remove non-numeric characters: ",", "$" "1234.56"
2 "(2020)" Remove non-numeric characters: "(", ")" "-2020"
3 "100%" Remove non-numeric characters: "%" "100"

Custom Expression

Enables the creation of a field based on a custom computation using Spark SQL. This is useful for applying complex logic or transformations that are not covered by other types.

Options for Using Custom Expression

No. Field Name Description
1 Name Specifies the name of the computed field. This name is used to identify the computed field.
2 Transformation Type Select Custom Expression to define a computed field using a custom SQL expression.
3 SQL Expression Enter the SQL logic used to compute the field. Use Ctrl + Space to view available hints and functions.
4 Additional Metadata Allows adding custom metadata to enhance the computed field definition (for example, descriptions or key-value attributes).

custom-expression

Using Custom Expression:

You can combine multiple fields, apply conditional logic, or use any valid Spark SQL functions to derive your new computed field.

Example: To create a field that sums two existing fields, you could use the expression SUM(field1, field2).

Advanced Example: You need to ensure that a log of leases has no overlapping dates for an asset, but your data only captures a single lease's details like:

LeaseID AssetID Lease_Start Lease_End
1 42 1/1/2025 2/1/2026
2 43 1/1/2025 2/1/2026
3 42 1/1/2026 2/1/2026
4 43 2/2/2026 2/1/2027

You can see in this example that Lease 1 has overlapping dates with Lease 3 for the same Asset. This can be difficult to detect without a full transformation of the data. However, we can accomplish our goal easily with a Computed Field. We'll simply add a Computed Field to our table named "Next_Lease_Start" and define it with the following custom expression so that our table will now hold the new field and render it as shown below.

LEAD(Lease_Start, 1) OVER (PARTITION BY AssetID ORDER BY Lease_Start)

LeaseID AssetID Lease_Start Lease_End Next_Lease_Start
1 42 1/1/2025 2/1/2026 1/1/2026
2 43 1/1/2025 2/1/2026 2/2/2026
3 42 1/1/2026 2/1/2026
4 43 2/2/2026 2/1/2027

Now you can author a Quality Check stating that Lease_End should always be less than "Next_Lease_Start" to catch any errors of this type. In fact, Qualytics will automatically infer that check for you at Level 3 Inference!

More Examples for Custom Expression

Example Input Fields Custom Expression Output
1 field1 = 10, field2 = 20 SUM(field1, field2) 30
2 salary = 50000, bonus = 5000 salary + bonus 55000
3 hours = 8, rate = 15.50 hours * rate 124
4 status = 'active', score = 85 CASE WHEN status = 'active' THEN score ELSE 0 END 85