Skip to content

Types of Transformations

Cleaned Entity Name

This transformation removes common business signifiers from entity names, making your data cleaner and more uniform.

Options for Cleaned Entity Name

REF. FIELDS ACTIONS
1 Drop from Suffix Add a unique name for your computed field.
2 Drop from Prefix Removes specified terms from the beginning of the entity name.
3 Drop from Interior Removes specified terms from the beginning of the entity name.
4 Additional Terms to Drop (Custom) Allows you to specify additional terms that should be dropped from the entity name.
5 Terms to Ignore (Custom) Designate terms that should be ignored during the cleaning process.

Example for Cleaned Entity Name

Example Input Transformation Output
1 "TechCorp, Inc." Drop from Suffix: "Inc." "TechCorp"
2 "Global Services Ltd." Drop from Prefix: "Global" "Services Ltd."
3 "Central LTD & Finance Co." Drop from Interior: "LTD" "Central & Finance Co."
4 "Eat & Drink LLC" Additional Terms to Drop: "LLC", "&" "Eat Drink"
5 "ProNet Solutions Ltd." Terms to Ignore: "Ltd." "ProNet Solutions"

Convert Formatted Numeric

This transformation converts formatted numeric values into a plain numeric format, stripping out any characters like commas or parentheses that are not numerically significant.

Example for Convert Formatted Numeric

Example Input Transformation Output
1 "$1,234.56" Remove non-numeric characters: ",", "$" "1234.56"
2 "(2020)" Remove non-numeric characters: "(", ")" "-2020"
3 "100%" Remove non-numeric characters: "%" "100"

Custom Expression

Enables the creation of a field based on a custom computation using Spark SQL. This is useful for applying complex logic or transformations that are not covered by other types.

Using Custom Expression:

You can combine multiple fields, apply conditional logic, or use any valid Spark SQL functions to derive your new computed field.

Example: To create a field that sums two existing fields, you could use the expression SUM(field1, field2).

Advanced Example: You need to ensure that a log of leases has no overlapping dates for an asset, but your data only captures a single lease's details like:

LeaseID AssetID Lease_Start Lease_End
1 42 1/1/2025 2/1/2026
2 43 1/1/2025 2/1/2026
3 42 1/1/2026 2/1/2026
4 43 2/2/2026 2/1/2027

You can see in this example that Lease 1 has overlapping dates with Lease 3 for the same Asset. This can be difficult to detect without a full transformation of the data. However, we can accomplish our goal easily with a Computed Field. We'll simply add a Computed Field to our table named "Next_Lease_Start" and define it with the following custom expression so that our table will now hold the new field and render it as shown below.

LEAD(Lease_Start, 1) OVER (PARTITION BY AssetID ORDER BY Lease_Start)

LeaseID AssetID Lease_Start Lease_End Next_Lease_Start
1 42 1/1/2025 2/1/2026 1/1/2026
2 43 1/1/2025 2/1/2026 2/2/2026
3 42 1/1/2026 2/1/2026
4 43 2/2/2026 2/1/2027

Now you can author a Quality Check stating that Lease_End should always be less than "Next_Lease_Start" to catch any errors of this type. In fact, Qualytics will automatically infer that check for you at Level 3 Inference!

More Examples for Custom Expression

Example Input Fields Custom Expression Output
1 field1 = 10, field2 = 20 SUM(field1, field2) 30
2 salary = 50000, bonus = 5000 salary + bonus 55000
3 hours = 8, rate = 15.50 hours * rate 124
4 status = 'active', score = 85 CASE WHEN status = 'active' THEN score ELSE 0 END 85