Data transformations are often the most complex and, in terms of processing time, the most costly part of the ETL process. They can range from simple data scrubbing techniques. Data transformation can occur within a database or implemented outside of the database (for example, on flat files) as well.
From an architectural perspective, There are ways to transform data brought from external resources to the data warehouse repository:
Transformation logic consists of multiple steps. i.e in transfering new records to be inserted into a sales table, there may be separate logical transformation steps to validate each dimension key.
A common strategy is to implement each different transformation as a seperate SQL operation and to create a separate, temporary staging table (for example: table new_sales_step1 and new_sales_step2 ) to store incremental results for each steps
This load-then-transform strategy also provides a natural check pointing scheme to the entire transformation process, which enables to the process to be more easily monitored and restricted.
It may also be possible to combine many simple logical transformations into a single SQL statement or single PL/SQL procedure.
However it may also introduce difficulties in modifying adding or dropping individual transformations, as well as recovering from failed transformation.
Pipeline Data Transformation:
Capabilities can be significantly enhanced to address specifically some of the tasks in ETL environments.
Process flow changed dramatically and database becomes an inegral part of the ETL solution.
This shifts from serial transform then load or load then transform to an enhanced transform while loading process.
Before any data transformations can occur within the database, the raw data must become accessible for the database
We can use :
- SQL *Loader (Flat files into Warehouse)
- External Tables
- OCI and Direct-Path APIs (frequently used)
- Export /Import (no large volumes of data should be handled and no complex restrictions are possible)
Example External Table:
CREATE TABLE sales_transactions_ext(PROD_ID NUMBER(6) CUST_ID NUMBER,
CHANNEL_ID CHAR (1),
DEFAULT DIRECTORY data_file_dir
(RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII BADFILE
LOGFILE LOG_FILE_DIR: ‘sh_sales_log.xt’
FIELDS TERMINATED by “|” LDRTRIM)
}REJECT LIMIT UNLIMITED;
INSERT /*+APEND*/ INTO COSTS
GROUP BY time_id.prod_id;
Another technique for implenting a data substitution is to use an UPDATE statement to modify the sales.channel_id column. An UPDATE will provide the correct result. However if transformation require a very large percentage of the rows or all of the rows , then, it may be more efficient to use a CTAS statement then an UPDATE
ORACLE WAREHOUSE BUILDER (OWB)
it use one of the following methods to transform:
- Predefined Transformations
- Custom Transformations
Part of ORACLE Library, can directly transform data. These transformations are organised into the following categories:
- Control centre