ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to...

12
ETL Extract Transform Load

Transcript of ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to...

Page 1: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

ETL

Extract Transform Load

Page 2: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Introduction of ETL• ETL is used to migrate data from one database to another, to

form data marts and data warehouses and also to convert database from one format or type to another

Page 3: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Process of ETL• Extract • Process of reading data from a database

• Transform• Process of converting the extracted data from its previous

form into the form it need to be• By using rules or lookup tables or by combing the data

with other data

• Load• Process of writing the data into the target database

Page 4: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Operations of Transform

• Selecting only certain columns to load• Translating coded values• Encoding free-form values• Sorting• Joining data from multiple sources• Aggregation• Splitting a column into multiple columns• Deriving a new calculated values• …

Page 5: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Pentaho Data Integration

• Pentaho data integration (PDI, also called Kettle) is for ETL processes.

• Download: http://sourceforge.net/projects/pentaho/files/Data Integration/

• Two parts of PDI– Transformation: transformation is the process of ETL– Job: job is used to run transformation

Page 6: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Transformation• Definition

Transformation Description

Value Values are part of a row and can contain any type of data: strings, floating point numbers, integers, dates or boolean values

Row A row consists of 0 or more values that are processed together as a single entry

Input Stream A stack of rows that enters a step

Hop A graphical representation of one or more data streams between two steps; a hop always represents the output stream for one step and the input stream for another

Note Descriptive text that can be added to a transformation

Step

HopNote

Page 7: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Main ComponentsAll components Input Output Transformation

Page 8: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Job

• DefinitionTransformation Description

Job Entry A part of job that performs a special task

Hop A graphical representation of one or more data streams between two steps; a hop always represents the output stream for one step and the input stream for another

Note Descriptive text that can be added to a job

Job EntryHopNote

Page 9: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Components of PDI

• Spoon– GUI tool to design the ETL process transformations.– Creating jobs which automate the database update process– Performing the typical data flow functions including: reading, validating,

refining, transforming, writing data

• Pan– Application to run data transformations designed in Spoon

• Kitchen– Application helps execute the jobs in a batch mode, usually using a schedule

• Carte– A web Server which allows remote monitoring of the running PDI ETL

processes through a web browser

Page 10: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Feature of PDI• Simple Visual Designer• Graphic ETL tool• Dynamic transformations• Integrated debugger for testing and tuning job execution

Page 11: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Feature of PDI

• Drag and Drop Integration• Rich library of pre-build components to access• Integration with Zero-Coding Required

• Powerful Administration and Management• Data Profiling and Data Quality• Identify data that fails to comply with business rules and

standards• Manager data quality with partners such as human

interface

Page 12: ETL Extract Transform Load. Introduction of ETL ETL is used to migrate data from one database to another, to form data marts and data warehouses and also.

Feature of PDI• Support for Any Big Data Source