Extract Transform and Load Process (ETL) Explained

Data transformations are often the most complex and, in terms of processing time, the most costly part of the ETL process. They can range from simple data scrubbing techniques. Data transformation can occur within a database or implemented outside of the database (for example, on flat files) as well.

Transformation flow:

From an architectural perspective, There are ways to transform data brought from external resources to the data warehouse repository:

Multistage transformation: 

Transformation logic consists of multiple steps. i.e in transfering new records to be inserted into a sales table, there may be separate logical transformation steps to validate each dimension key.

A common strategy is to implement each different transformation as a seperate SQL operation and to create a separate, temporary staging table (for example: table new_sales_step1 and new_sales_step2 ) to store incremental results for each steps

This load-then-transform strategy also provides a natural check pointing scheme to the entire transformation process, which enables to the process to be more easily monitored and restricted.

It may also be possible to combine many simple logical transformations into a single SQL statement or single PL/SQL procedure.

However it may also introduce difficulties in modifying adding or dropping individual transformations, as well as recovering from failed transformation.

Pipeline Data Transformation:

Capabilities can be significantly enhanced to address specifically some of the tasks in ETL environments.

Process flow changed dramatically and database becomes an inegral part of the ETL solution.

This shifts from serial transform then load or load then transform to an enhanced transform while loading process.

Loading Machanism:

Before any data transformations can occur within the database, the raw data must become accessible for the database

We can use :

  • SQL *Loader (Flat files into Warehouse)
  • External Tables
  • OCI and Direct-Path APIs (frequently used)
  • Export /Import (no large volumes of data should be handled and no complex restrictions are possible)

Example External Table:

CREATE TABLE sales_transactions_ext(PROD_ID NUMBER(6) CUST_ID NUMBER,
{
TIME_ID DATE,

CHANNEL_ID CHAR (1),
PROMOT_ID NUMBER(6),
QUANTITY_SOLD NUMBER(3),
AMOUNT_SOLD NUMBER(10,2),
UNIT_COST NUMBER(10,2),
UNIT_PRICE NUMBER(10,2)}

ORGANIZATION EXTERNAL

{TYPE oracle_loader
DEFAULT DIRECTORY data_file_dir
ACCESS PARAMETERS
(RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII BADFILE
log_file_dir ‘sh_sales_log.xt’
LOGFILE LOG_FILE_DIR: ‘sh_sales_log.xt’
FIELDS TERMINATED by “|” LDRTRIM)
location
(‘sh_sales.dat’)
}REJECT LIMIT UNLIMITED;

External Tables:

INSERT /*+APEND*/ INTO COSTS

(

TIME_ID,

PROD_ID,
UNIT_COST,
UNIT_PRICE

)
SELECT
TIME_ID,
PROD_ID,
SUM(UNIT_COST),
SUM (UNIT_PRICE)
FROM sales_transaction_ext
GROUP BY time_id.prod_id;

Another technique for implenting a data substitution is to use an UPDATE statement to modify the sales.channel_id column. An UPDATE will provide the correct result. However if transformation require a very large percentage of the rows or all of the rows , then, it may be more efficient to use a CTAS statement then an UPDATE

Alternate OPTION:

ORACLE WAREHOUSE BUILDER (OWB)

it use one of the following methods to transform:

  • Transformations 
  • Operators

Transformation types:

  • Predefined Transformations
  • Custom Transformations

Predefined Transformations:

Part of ORACLE Library, can directly transform data. These transformations are organised into the following categories:

  • Administration
  • Character
  • Control centre
  • Conversion
  • Date
  • Numberic
  • OLAP
  • Others
  • SYS
  • Spatial
  • Streams
  • XML