ETL TESTING-Handling Heterogeneous Data Formats

Date posted: 11-Nov-2014

Rajasimman Selvaraj Simanchal Sahu Tithi Mukherjee

2009 Wipro Ltd - Confidential

Agenda

  1. ETL Basic Concept
  2. Source & Target Systems
  3. Interpretation of Mapping Document
  4. Creation of DSN
  5. General Cases of Data Comparison

ETL Basic Concept

ETL is the automated and auditable data acquisition process from heterogeneous source systems that involves one or more of the sub-processes listed below:
  • Data extraction
  • Data transportation
  • Data transformation
  • Data consolidation
  • Data integration
  • Data cleaning
  • Data loading

  • A source system can be any application or data store that creates or stores data and acts as a data source to other systems. This topic is covered in detail later.
  • Automation is critical; without it the very purpose of ETL is defeated. ETL is no good if processes need to be manually scheduled, executed, or monitored.
  • Extraction is the first major step in the physical implementation of ETL. Extraction initiates or triggers further downstream processes.
  • Once data is extracted it has to be transported to the target, because the physical location of the source system might be different from that of the target warehouse.
  • Data cleansing is essential, because the data pulled from various source systems can contain unwanted data, unprintable characters, extra blank spaces, etc. Left uncleaned, these can cause absurd results when the data is loaded into the data warehouse.
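
The cleansing step described above can be sketched in a few lines. This is a minimal illustration, not a prescribed routine: the function name `clean_field` and the sample value are assumptions, and replacing each unprintable character with a space (rather than dropping it) is one possible design choice.

```python
import re


def clean_field(value: str) -> str:
    """Replace unprintable characters with spaces, then collapse extra blanks."""
    # str.isprintable() is False for control characters such as NUL, BEL, and tab.
    printable = "".join(ch if ch.isprintable() else " " for ch in value)
    # Collapse runs of spaces into one and trim both ends.
    return re.sub(r" +", " ", printable).strip()


raw = "  ACME\x00  Corp \x07 Ltd\t "
print(repr(clean_field(raw)))
```

Replacing rather than deleting keeps two words apart when the only separator between them was a tab or another control character.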

  • Transformation is the series of tasks that prepares the data for loading into the warehouse. Once data is secured, you have to worry about its format or structure, because it will not be in the format needed for the target. For example, the grain level or data type might be different. Data cannot be used as it is; some rules and functions need to be applied to transform it.
  • One of the purposes of ETL is to consolidate the data in a central repository, or to bring it to one logical or physical place. Data can be consolidated from similar systems, different subject areas, etc.
  • ETL must support data integration for data coming from multiple sources and data coming at different times. This has to be a seamless operation, which avoids overwriting existing data, creating duplicate data or, even worse, being unable to load the data into the target at all.
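
The grain and data type differences mentioned above can be illustrated with a small sketch. It assumes a hypothetical source that holds one row per sale with the amount stored as text, and a target that expects one row per day with a decimal total; the function name and sample rows are illustrative only.

```python
from collections import defaultdict
from decimal import Decimal


def to_daily_totals(rows):
    """Convert text amounts to Decimal and roll sales up to one row per day."""
    totals = defaultdict(Decimal)  # Decimal() starts each day at 0
    for sale_date, amount_text in rows:
        # Data type change: text -> Decimal. Grain change: per sale -> per day.
        totals[sale_date] += Decimal(amount_text.strip())
    return dict(sorted(totals.items()))


source_rows = [
    ("2009-01-05", " 10.50"),
    ("2009-01-05", "4.25 "),
    ("2009-01-06", "7.00"),
]
print(to_daily_totals(source_rows))
```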

  • The loading part of the process is critical to integration and consolidation. The loading process decides how data is added to the warehouse, or whether it is simply rejected. Operations such as adding, updating, or deleting are executed at this step. What happens to the existing data? Should the old data be deleted because of new information, or should it be archived? Should the new data be treated as additional to the existing data?
  • Data should be loaded with great care. Does that mean data loaded in the warehouse is incorrect? What is the confidence level in the data? Only a data auditing process can establish the confidence level. This auditing normally happens after the data is loaded.
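
As a rough sketch of the add, update, and delete operations and the audit that follows them, the example below applies an incremental load to an in-memory target and returns row counts. The dictionary-based target, the function name `load`, and the sample vendor rows are all assumptions made for illustration; a real load would run against the warehouse tables.

```python
def load(target, incoming, deleted_keys=()):
    """Apply an incremental load to `target`, a dict keyed by business key.

    Returns counts of inserted, updated, and deleted rows for the audit step.
    """
    audit = {"inserted": 0, "updated": 0, "deleted": 0}
    for key, row in incoming.items():
        if key not in target:
            audit["inserted"] += 1
        elif target[key] != row:
            audit["updated"] += 1
        else:
            continue  # unchanged row: nothing to write
        target[key] = row
    for key in deleted_keys:
        if key in target:
            del target[key]
            audit["deleted"] += 1
    return audit


warehouse = {"V1": "12 Main St", "V2": "PO Box 9", "V4": "1 Old Rd"}
counts = load(
    warehouse,
    incoming={"V2": "PO Box 10", "V3": "4 Elm Rd"},
    deleted_keys=["V4"],
)
print(counts)
print(warehouse)
```

The returned counts can be reconciled against the source extract after the load, which is one simple form of the auditing mentioned above.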

Figure: A generic pictorial representation of the ETL process.

Source & Target Systems

SOURCE SYSTEMS
  • SAP
  • RDBMS: Oracle, SQL Server, DB2, Teradata
  • Flat files: .TSV, .TXT, .CSV
  • MS-Access: .MDB
  • Temporary storage

TARGET SYSTEMS
  • RDBMS: Oracle, Teradata, SQL Server
  • Flat file

Interpretation of Mapping Document

The mapping document is an Excel sheet that acts as a reference for the testing team to understand the data flow; test scripts are prepared based on this understanding. A mapping document generally provides the following information:
  • Details about the source and target systems (location, connection, etc.)
  • Details of the source and target tables involved
  • Various attributes of the source and target fields (field name, data type, size, etc.)
  • Dependencies between source systems/tables for fetching the source data
  • All transformation rules to be applied to the data before loading it into the target tables
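
One way to turn a mapping sheet row into a test is to hold it as a small record and check target values against the declared attributes. The sketch below is an assumption-laden illustration: the class `FieldMapping`, the function `oversized_values`, and the data type and size shown (VARCHAR2, 35) are hypothetical; only the table and field names come from the queries later in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """One row of a mapping sheet (field layout is hypothetical)."""
    source_table: str
    source_field: str
    target_table: str
    target_field: str
    target_type: str
    target_size: int
    rule: str


def oversized_values(mapping, target_values):
    """Return target values longer than the size the mapping sheet allows."""
    return [
        value
        for value in target_values
        if value is not None and len(value) > mapping.target_size
    ]


str_addr = FieldMapping(
    source_table="DW_R0001_T.VNDR",
    source_field="STR_ADDR",
    target_table="AMB_CARE_T.MSS_VENDOR_MAST_STG",
    target_field="STR_ADDR",
    target_type="VARCHAR2",  # assumed for illustration
    target_size=35,          # assumed for illustration
    rule="Trim spaces; fall back to 'PO Box ' + PO_BOX when STR_ADDR is null",
)
print(oversized_values(str_addr, ["12 Main St", None, "X" * 40]))
```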

Figure: A sample mapping sheet.

Creating a Data Source Name (DSN)

Step 1: Go to Start > Run. Type odbcad32 and click OK.
Step 2: The ODBC Data Source Administrator opens. Select the System DSN tab and click the Add button. Another window, Create New Data Source, opens.

Step 3: Select SQL Server or Microsoft ODBC for Oracle from the list and click OK. A small window opens.

Step 4: Enter any name in the Data Source Name text field. Enter your user name for that database. Enter the name of the server exactly as given in the tnsnames.ora file. Click OK.
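
Once the DSN exists, client programs refer to it by name in an ODBC connection string. The sketch below only builds such a string from the standard `DSN`, `UID`, and `PWD` keywords; the function name and the sample DSN, user, and password are hypothetical, and no connection is opened here.

```python
def odbc_connection_string(dsn, user, password):
    """Build an ODBC connection string that refers to an existing DSN."""
    # DSN, UID and PWD are the standard ODBC keywords for the data source
    # name, the user name and the password.
    parts = {"DSN": dsn, "UID": user, "PWD": password}
    return ";".join(f"{keyword}={value}" for keyword, value in parts.items())


print(odbc_connection_string("ORA_TEST", "etl_tester", "secret"))
```

A string of this form can be passed to an ODBC client library, for example as the argument of `pyodbc.connect`, assuming that library is installed. Values that themselves contain a semicolon would need extra quoting, which this sketch does not handle.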

General Cases of Data Comparison

Case 1:
  Source: Oracle
  Target: Oracle
  Other tools: Edit Plus, Beyond Compare

Executing the source query in PL/SQL Developer

Executing the target query in PL/SQL Developer

Methods of comparing the data

Excel macro or third-party tool verification

SRC:
  SELECT * FROM (
    SELECT VNDR_KEY,
           NVL2(STR_ADDR,
                ltrim(rtrim(STR_ADDR)),
                NVL2(PO_BOX, 'PO Box'||' '||ltrim(rtrim(PO_BOX)), ltrim(rtrim(STR_ADDR)))) AS STR_ADDR
    FROM DW_R0001_T.VNDR V
    WHERE VNDR_KEY >= '0000100000' AND VNDR_KEY < '0000400000' AND DEL_F IS NULL
    ORDER BY VNDR_KEY
  ) SRC_VALUE

TGT:
  SELECT * FROM (
    SELECT VNDR_NUM, STR_ADDR
    FROM AMB_CARE_T.MSS_VENDOR_MAST_STG
    ORDER BY VNDR_NUM
  ) TGT_VALUE
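
What a compare tool does with the two exported result sets can be approximated in a few lines. The sketch below is an illustration rather than the deck's own method: it assumes both query results were saved as sorted, pipe-delimited text lines (the sample rows are made up) and uses Python's standard `difflib` module to list only the lines that differ.

```python
import difflib


def diff_exports(src_lines, tgt_lines):
    """Return only the removed (-) and added (+) lines between two exports."""
    diff = list(
        difflib.unified_diff(src_lines, tgt_lines, "SRC", "TGT", n=0, lineterm="")
    )
    # The first two entries are the "--- SRC" / "+++ TGT" file headers;
    # lines starting with "@@" mark hunk positions and carry no data.
    return [line for line in diff[2:] if not line.startswith("@@")]


src_export = ["0000100001|12 Main St", "0000100002|PO Box 9"]
tgt_export = ["0000100001|12 Main St", "0000100002|PO Box 90"]
for line in diff_exports(src_export, tgt_export):
    print(line)
```

Both queries already sort by the vendor key, which is what makes a line-by-line comparison of the exports meaningful.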

Using MINUS

  SELECT * FROM (
    SELECT VNDR_KEY,
           NVL2(STR_ADDR,
                ltrim(rtrim(STR_ADDR)),
                NVL2(PO_BOX, 'PO Box'||' '||ltrim(rtrim(PO_BOX)), ltrim(rtrim(STR_ADDR)))) AS STR_ADDR
    FROM DW_R0001_T.VNDR V
    WHERE VNDR_KEY >= '0000100000' AND VNDR_KEY < '0000400000' AND DEL_F IS NULL
    ORDER BY VNDR_KEY
  ) SRC_VALUE
  MINUS
  SELECT * FROM (
    SELECT VNDR_NUM, STR_ADDR
    FROM AMB_CARE_T.MSS_VENDOR_MAST_STG
    ORDER BY VNDR_NUM
  ) TGT_VALUE
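
A MINUS in one direction only reports rows that are in the source but not in the target; rows that exist only in the target need the reverse query. The sketch below demonstrates both directions with Python's built-in `sqlite3` module. It is an illustration under stated assumptions: SQLite spells the operator `EXCEPT` where Oracle uses `MINUS`, and the tables `src` and `tgt` and their rows are made-up stand-ins for the real source and target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src (vndr_key TEXT, str_addr TEXT);
    CREATE TABLE tgt (vndr_num TEXT, str_addr TEXT);
    INSERT INTO src VALUES ('0000100001', '12 Main St'), ('0000100002', 'PO Box 9');
    INSERT INTO tgt VALUES ('0000100001', '12 Main St'), ('0000100003', '4 Elm Rd');
    """
)


def minus(first_query, second_query):
    """Rows returned by first_query but not by second_query."""
    return conn.execute(f"{first_query} EXCEPT {second_query}").fetchall()


src_query = "SELECT vndr_key, str_addr FROM src"
tgt_query = "SELECT vndr_num, str_addr FROM tgt"

missing_in_target = minus(src_query, tgt_query)  # source MINUS target
extra_in_target = minus(tgt_query, src_query)    # target MINUS source
print(missing_in_target)
print(extra_in_target)
```

Both operators compare distinct rows, so a row duplicated in only one of the two tables is not reported; a separate row-count check covers that case.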

Using Full Outer Join

  SELECT SRC.VNDR_KEY AS SRC_VNDR_KEY,
         TGT.VNDR_NUM AS TGT_VNDR_NUM,
         SRC.STR_ADDR AS SRC_STR_ADDR,
         TGT.STR_ADDR AS TGT_STR_ADDR
  FROM (
    SELECT VNDR_KEY,
           NVL2(STR_ADDR,
                ltrim(rtrim(STR_ADDR)),
                NVL2(PO_BOX, 'PO Box'||' '||ltrim(rtrim(PO_BOX)), ltrim(rtrim(STR_ADDR)))) AS STR_ADDR
    FROM DW_R0001_T.VNDR V
    WHERE VNDR_KEY >= '0000100000' AND VNDR_KEY < '0000400000' AND DEL_F IS NULL
    ORDER BY VNDR_KEY
  ) SRC
  FULL OUTER JOIN (
    SELECT VNDR_NUM, STR_ADDR
    FROM AMB_CARE_T.MSS_VENDOR_MAST_STG
    ORDER BY VNDR_NUM
  ) TGT
  ON (ltrim(rtrim(SRC.VNDR_KEY)) = TGT.VNDR_NUM
      OR (ltrim(rtrim(SRC.VNDR_KEY)) IS NULL AND TGT.VNDR_NUM IS NULL))
  AND (ltrim(rtrim(SRC.STR_ADDR)) = TGT.STR_ADDR
      OR (ltrim(rtrim(SRC.STR_ADDR)) IS NULL AND TGT.STR_ADDR IS NULL))
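
The idea behind the full outer join, lining up source and target side by side so that a missing partner shows up as an empty value, can be sketched outside the database as well. The example below is a simplified variant, not a translation of the query: it pairs rows on the key alone and reports only the pairs whose addresses differ, and the function name and sample data are hypothetical.

```python
def full_outer_compare(src, tgt):
    """Pair source and target values by key; None marks a missing side.

    Both arguments are dicts of key -> address. Only mismatches are returned.
    """
    report = []
    for key in sorted(src.keys() | tgt.keys()):
        src_value, tgt_value = src.get(key), tgt.get(key)
        if src_value != tgt_value:
            report.append((key, src_value, tgt_value))
    return report


src_rows = {"0000100001": "12 Main St", "0000100002": "PO Box 9"}
tgt_rows = {
    "0000100001": "12 Main St",
    "0000100002": "PO Box 90",
    "0000100003": "4 Elm Rd",
}
for row in full_outer_compare(src_rows, tgt_rows):
    print(row)
```

In this simplified form a key that is absent and a key whose address is null both appear as None, so the two cases cannot be told apart from the report alone.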

Case 2:
  Source: RDBMS / flat file
  Target: SQL Server
  Other tools: Edit Plus, Beyond Compare, VIM Editor

Importing an Oracle Table into Access

Step 1: Click the New menu; a window named New Table opens. Select the Import Table option as shown in the slide. Click OK.

Step 2: A window named Import opens. Select ODBC Data Sources from the Files of Type drop-down list. Click OK.

Step 3: The Select Data Source window opens. Select the Machine Data Source tab. Select the name of the data source from which you want to import a table. Click OK.

Step 4: A login window opens. Enter your login credentials for that database. Click OK.

Step 5: A window named Import Objects opens. Select the table from the list. Click OK; the table is then imported into MS Access.

Importing a Flat File into Access

After clicking the Import Table menu in MS Access, the following screen appears. Select the appropriate text file and click Import.

Select the appropriate radio button based on whether the text file is delimited or fixed width.
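
The difference matters because a fixed-width file has no separator character: each field is found by its position in the line. A minimal sketch of fixed-width parsing follows; the function name, the column widths (10 and 20), and the sample record are assumptions for illustration.

```python
def parse_fixed_width(line, widths):
    """Cut one fixed-width record into fields and strip the padding."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


# A 10-character vendor number followed by a 20-character, space-padded address.
record = "0000100001" + "12 Main St".ljust(20)
print(parse_fixed_width(record, [10, 20]))
```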

Select the proper radio button for the delimiter used in the flat file. For flat files that have the column names as the first record, select the First Row Contains Field Names check box.
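
The same two choices, the delimiter and whether the first row holds the field names, appear when a delimited file is read in a script. The sketch below uses Python's standard `csv` module; the function name `read_delimited`, its parameters, and the sample data are hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter, first_row_has_names=True, names=None):
    """Parse delimited text into a list of dicts, one per record."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return []
    if first_row_has_names:
        names, rows = rows[0], rows[1:]
    elif names is None:
        raise ValueError("names are required when the file has no header row")
    return [dict(zip(names, row)) for row in rows]


tsv_text = "VNDR_NUM\tSTR_ADDR\n0000100001\t12 Main St\n"
print(read_delimited(tsv_text, "\t"))

pipe_text = "0000100001|12 Main St\n"
print(read_delimited(pipe_text, "|", first_row_has_names=False,
                     names=["VNDR_NUM", "STR_ADDR"]))
```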

Select the new table option.

Click on the individual columns and enter the name and data type of each field in the designated text fields. Click Finish once you are done, and the table will be imported.
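
The choice of data type per column affects the later comparison. A vendor number such as '0000100001' keeps its leading zeros only if the column is treated as text. The sketch below applies a name and a type to each imported column; apart from VNDR_NUM, the column names, the schema layout, and the sample record are assumptions for illustration.

```python
from datetime import date
from decimal import Decimal

# (field name, converter) per column, in file order.
# CREDIT_LIMIT and CREATED_ON are hypothetical columns.
SCHEMA = [
    ("VNDR_NUM", str),                    # text, so leading zeros survive
    ("CREDIT_LIMIT", Decimal),
    ("CREATED_ON", date.fromisoformat),   # expects YYYY-MM-DD
]


def apply_schema(record, schema):
    """Name each raw column and convert it to its declared type."""
    return {
        name: convert(raw.strip())
        for (name, convert), raw in zip(schema, record)
    }


typed = apply_schema(["0000100001", " 2500.00", "2009-03-15"], SCHEMA)
print(typed)
```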
