Data Dictionary based Testing (DDbT) for Business Intelligence Applications

Author(s):

Narendra Parihar (nparihar@microsoft.com), BIE US, MSIT

Anandam Sarcar ([email protected]), BIE India, MSIT

Disclaimer: The authors have used pseudocode, examples, and tools for illustration purposes only.


Abstract

This paper presents a test methodology intended to help us test Business Intelligence (BI) and Data Warehouse (DW) applications with more coverage, a more data-centric approach, and less time. The idea opens up many directions to explore.

Assume that we want to test a BI or DW application. We normally start by thinking of it as black box testing, data quality testing, or both.

Our test plans focus on making sure that data is correct, functionality works, jobs complete, and so on. How do we achieve that? Normally with black box techniques. This is where the twist lies.

Do we really test a BI/DW application using black box techniques? Theoretically the answer may be yes, but practically it is no. What we need is essentially a data verification methodology for BI/DW applications. In more technical terms, we call this the 'Data Dictionary Based Testing Framework':

"A test framework wherein the data dictionary is leveraged to describe each and every element of the data residing in database server objects, the relationships within the data, and the data lineage between the elements, which in turn, when rationalized, helps us test the data quality and functionality of the application."


Problems with conventional BI/DW test strategies

o Black box testing with manual execution is time consuming, and test coverage of the application under test is difficult to determine.

o Data Quality (DQ) checks are repeated throughout BI/DW testing.

o Automation at the level of database objects and transformations is difficult.

o Most tests don't cover all objects or apply DQ tests to them.

o Most tests focus on functional scenarios and pay less attention to referential checks, whose absence can eventually break the system.

o A large portion of BI/DW testing happens only after the required ETL jobs are complete, so long hours are lost waiting for jobs to finish.

o Limited or no data growth checks in test.

Why DDbT?


Details on DDbT

Fundamental thought process behind this approach:

Understand the data, its types, and its relationships first, before preparing test designs, automation, or DQ checks.

Map the data, data elements, relationships, and lineage in metadata tables.

Use the derived metadata to test the data inside the application.

Let's assume the following simple BI/DW application as a running example:

Source -> Staging -> Data Mart -> Cube

What is DDbT?


Let us say there are only two tables in the source (to keep the example understandable), with the sample data below.

Note that the Order table has a foreign key referencing the ProductID column of the Product table.

Product

ProductID  Name             UnitPrice
1          Mango            100
2          Orange           0
3          Apple            100
4          Grape            10
5          Junkproductname  -100

Product column types

Column_name  Type
ProductID    int
Name         char
UnitPrice    int

Order

OrderID  ProductID  OrderQty
1        1          5
2        1          5
3        2          -3
4        3          5
5        4          100
6        4          100
7        5          0

Order column types

Column_name  Type
OrderID      int
ProductID    int
OrderQty     int


In the target Staging server, we simply pull the tables from Source, append surrogate keys such as productkeyid and orderkeyid, add information needed for BI/DW such as recordinserttime and recordmodifiedate, and implement delta logic based on productid.

Now say that, once data is present in the Staging server, we have a Salesfact table, which is the fact table in our Data Mart, as below:

Salesfact

ProductKey  OrderQuantity  UnitPrice  SalesAmount
11          10             100        1000
22          -3             0          0
33          5              100        500
44          200            10         2000
55          0              -100       0
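The arithmetic behind the Salesfact rows can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the SQL Server instances in the slides. The surrogate key rule (ProductKey = ProductID * 11) is an assumption inferred from the sample keys 11 to 55; the slides do not define how keys are generated.

```python
import sqlite3

def build_source(conn):
    """Create the Product and Order source tables with the sample rows."""
    conn.executescript("""
        CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT, UnitPrice INTEGER);
        CREATE TABLE "Order" (
            OrderID INTEGER PRIMARY KEY,
            ProductID INTEGER REFERENCES Product(ProductID),
            OrderQty INTEGER
        );
    """)
    conn.executemany("INSERT INTO Product VALUES (?, ?, ?)", [
        (1, "Mango", 100), (2, "Orange", 0), (3, "Apple", 100),
        (4, "Grape", 10), (5, "Junkproductname", -100),
    ])
    conn.executemany('INSERT INTO "Order" VALUES (?, ?, ?)', [
        (1, 1, 5), (2, 1, 5), (3, 2, -3), (4, 3, 5),
        (5, 4, 100), (6, 4, 100), (7, 5, 0),
    ])

def derive_sales_fact(conn):
    """Aggregate orders per product. ProductID * 11 is an assumed surrogate key rule."""
    return conn.execute("""
        SELECT p.ProductID * 11              AS ProductKey,
               SUM(o.OrderQty)               AS OrderQuantity,
               p.UnitPrice                   AS UnitPrice,
               SUM(o.OrderQty) * p.UnitPrice AS SalesAmount
        FROM "Order" o
        JOIN Product p ON p.ProductID = o.ProductID
        GROUP BY p.ProductID, p.UnitPrice
        ORDER BY p.ProductID
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_source(conn)
sales_fact = derive_sales_fact(conn)
for row in sales_fact:
    print(row)
```

Note that the deliberately bad source values (a zero price, a negative price, a negative quantity) flow straight into the fact table, which is exactly the kind of data the later DQ steps are meant to catch.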


Let's get on with DDbT on the sample app

So we have a simple enough BI/DW app. Now let's assume some reports are built once data moves from the data mart into Analysis Services cubes, and from the cubes to reports.

Now consider the scenario where this is a V1 project: you would have to create the data dictionary on your own, as you cannot reverse engineer it at that point.

So, let's get to the meat of the topic, the exact steps to perform DDbT:

1. Create a data dictionary for the different data sources; the idea is to organize the dictionaries.

2. Create a generic table whose columns drive your ETL logic, e.g. last modified date, product IDs, etc. in the case of delta ETLs; for a full ETL, table and column names alone may be sufficient.

3. Create a stored procedure that performs an object-existence build verification test (BVT) against the data dictionary in one shot. This SP can always be called after installation, so your BVT of objects is automated.

4. Create a stored procedure that takes parameters from the table created in Step 2 and compares them with the pulled data; this pits the Dev logic for pulling data against the Test logic. We can run this stored procedure after one ETL run in test to check automatically in every release.

5. Create a stored procedure that monitors data growth once one round of ETL is complete in test, to check whether data in some table has suddenly dropped or grown due to release changes. It runs in one shot and shows all tables where the data has changed by more than X percent compared with before the ETL run, at table level.

6. Create a stored procedure that performs DQ checks automatically (and also identifies data patterns for certain columns, to be set by the user) by looping through each parameter in the table from Step 2 and putting the results into a different table.

7. Create a metadata lineage framework to test dimensions and facts (explained in more detail, with sub-steps, in a subsequent section).

How to do DDbT?


New topology of the sample app with DDbT

Source -> Staging -> Data Mart -> Cube

DDbT: This database holds data dictionaries, test logic, stored procedures, data lineage information, and test results.


Step 1: Create a data dictionary for the different data sources; the idea is to organize the dictionaries.

Schema  Table      Description
dbo     Order
dbo     Product
dbo     SalesFact

Schema  Table      Number  Column         Datatype  Size  Nullable  InPrimaryKey  IsForeignKey  Description
dbo     Order      1       OrderID        Int       4     N         Y             N
dbo     Order      2       ProductID      Int       4     N         N             Y
dbo     Order      3       OrderQty       Int       4     N         N             N
dbo     Product    1       ProductID      Int       4     N         Y             N
dbo     Product    2       Name           Char(20)  20    N         N             N
dbo     Product    3       UnitPrice      Int       4     N         N             N
dbo     SalesFact  1       ProductKey     Int       4     N         N             N
dbo     SalesFact  2       OrderQuantity  SmallInt  2     Y         N             N
dbo     SalesFact  3       UnitPrice      Money     8     Y         N             N
dbo     SalesFact  4       SalesAmount    Money     8     Y         N             N

Above is a simple data dictionary created using the DCT tool from www.CodePlex.com. Notice that it has no descriptions for columns or tables. Ideally they would be present for a better understanding of the objects, but in our sample app the names are self-explanatory.
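A column-level dictionary like the one in Step 1 can also be generated from the database catalog. The sketch below is only an illustration: it reads SQLite's catalog through `PRAGMA table_info` and `PRAGMA foreign_key_list` (on SQL Server the equivalent information would come from its own catalog views), and the function name `build_data_dictionary` and the output field names are hypothetical.

```python
import sqlite3

def build_data_dictionary(conn):
    """Return one dictionary row per column, read from SQLite's catalog."""
    rows = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        quoted = '"' + table.replace('"', '""') + '"'
        # Field 3 of each foreign_key_list row is the referencing ("from") column.
        fk_cols = {fk[3] for fk in conn.execute(f"PRAGMA foreign_key_list({quoted})")}
        for cid, name, col_type, notnull, _default, pk in conn.execute(
                f"PRAGMA table_info({quoted})"):
            rows.append({
                "table": table,
                "number": cid + 1,
                "column": name,
                "datatype": col_type,
                "nullable": "N" if notnull else "Y",
                "in_primary_key": "Y" if pk else "N",
                "is_foreign_key": "Y" if name in fk_cols else "N",
            })
    return rows

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (
        ProductID INT NOT NULL PRIMARY KEY,
        Name CHAR(20) NOT NULL,
        UnitPrice INT NOT NULL
    );
    CREATE TABLE "Order" (
        OrderID INT NOT NULL PRIMARY KEY,
        ProductID INT NOT NULL REFERENCES Product(ProductID),
        OrderQty INT NOT NULL
    );
""")
dictionary = build_data_dictionary(conn)
for entry in dictionary:
    print(entry)
```

Descriptions still have to be written by people; a generated dictionary only covers the structural columns.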


Step 2: Create a generic table whose columns drive your ETL logic, e.g. last modified date, product IDs, etc. in the case of delta ETLs; for a full ETL, table and column names alone may be sufficient.

-- Creating the generic table below for our sample application
CREATE TABLE ddbt_tw11
(
    sourceserver         CHAR(40) NOT NULL,
    sourcedatabase       CHAR(40) NOT NULL,
    sourcetable          CHAR(40) NOT NULL,
    destinationserver    CHAR(40) NOT NULL,
    destinationtdatabase CHAR(40) NOT NULL,
    destinationtable     CHAR(40) NOT NULL,
    deltacolumn          CHAR(40) NULL,
    rulestring           VARCHAR(MAX) -- If there are other transformations, enter them here so that dynamic SQL can be formed later for testing
);

INSERT INTO ddbt_tw11
VALUES ('SalesOLTP', 'SalesOLTPDB', 'Product', 'TW11Server', 'TW11Staging', 'Product', 'Productid', NULL);

SELECT * FROM ddbt_tw11;

Result:

SourceServer          SalesOLTP
Sourcedatabase        SalesOLTPDB
Sourcetable           Product
DestinationServer     TW11Server
Destinationtdatabase  TW11Staging
Destinationtable      Product
DeltaColumn           Productid
RuleString            NULL


Step 3: Create a stored procedure that performs an object-existence build verification test (BVT) against the data dictionary in one shot. This SP can always be called after installation, so your BVT of objects is automated.
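The slides do not show the Step 3 procedure, so the following is only a sketch of the idea, again using Python's sqlite3 in place of a T-SQL stored procedure. The function name `missing_objects` and the shape of the dictionary input (a list of table/column pairs) are assumptions made for illustration.

```python
import sqlite3

def missing_objects(conn, dictionary):
    """Return (table, column) pairs listed in the dictionary but absent from the database.

    A column value of None means the whole table is missing.
    """
    missing = []
    for table, column in dictionary:
        quoted = '"' + table.replace('"', '""') + '"'
        # table_info yields no rows when the table does not exist.
        columns = [row[1] for row in conn.execute(f"PRAGMA table_info({quoted})")]
        if not columns:
            if (table, None) not in missing:
                missing.append((table, None))
        elif column not in columns:
            missing.append((table, column))
    return missing

conn = sqlite3.connect(":memory:")
# UnitPrice is left out on purpose, and SalesFact is never created.
conn.execute("CREATE TABLE Product (ProductID INT, Name CHAR(20))")
expected = [
    ("Product", "ProductID"), ("Product", "Name"), ("Product", "UnitPrice"),
    ("SalesFact", "ProductKey"), ("SalesFact", "SalesAmount"),
]
bvt_failures = missing_objects(conn, expected)
print(bvt_failures)
```

An empty result means every object in the dictionary exists, which is the pass condition for the BVT.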

Step 4: Create a stored procedure that takes parameters from the table created in Step 2 and compares them with the pulled data; this pits the Dev logic for pulling data against the Test logic. We can run this stored procedure after one ETL run in test to check automatically in every release. Think of it as a wrapper SP that holds all your test logic (the "counter Dev logic" tests) and can be run as a single unit.

Step 5: Create a stored procedure that monitors data growth once one round of ETL is complete in test, to check whether data in some table has suddenly dropped or grown due to release changes. It runs in one shot and shows all tables where the data has changed by more than X percent compared with before the ETL run, at table level. The data dictionary from Step 1 makes this easy to implement.
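One way to picture the Step 4 wrapper is a loop over the driver table that compares key values between each source and destination pair. This is a minimal sketch under several assumptions: source and destination live in one SQLite database, the driver table `ddbt_driver` is a cut-down, hypothetical version of ddbt_tw11, and the table names `src_Product` and `stg_Product` are invented for the example.

```python
import sqlite3

def q(name):
    """Quote an identifier taken from the driver table before embedding it in SQL."""
    return '"' + name.replace('"', '""') + '"'

def run_counter_dev_tests(conn):
    """For each driver row, list key values present on one side but not the other."""
    results = []
    driver = conn.execute(
        "SELECT sourcetable, destinationtable, deltacolumn FROM ddbt_driver").fetchall()
    for src, dst, key in driver:
        missing_in_dst = conn.execute(
            f"SELECT {q(key)} FROM {q(src)} EXCEPT SELECT {q(key)} FROM {q(dst)}").fetchall()
        extra_in_dst = conn.execute(
            f"SELECT {q(key)} FROM {q(dst)} EXCEPT SELECT {q(key)} FROM {q(src)}").fetchall()
        results.append({
            "source": src,
            "destination": dst,
            "missing_in_destination": sorted(r[0] for r in missing_in_dst),
            "unexpected_in_destination": sorted(r[0] for r in extra_in_dst),
        })
    return results

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ddbt_driver (sourcetable TEXT, destinationtable TEXT, deltacolumn TEXT);
    INSERT INTO ddbt_driver VALUES ('src_Product', 'stg_Product', 'ProductID');
    CREATE TABLE src_Product (ProductID INT, Name TEXT);
    CREATE TABLE stg_Product (ProductKeyID INT, ProductID INT, Name TEXT);
    INSERT INTO src_Product VALUES (1, 'Mango'), (2, 'Orange'), (3, 'Apple');
    INSERT INTO stg_Product VALUES (11, 1, 'Mango'), (22, 2, 'Orange');
""")
report = run_counter_dev_tests(conn)
print(report)
```

Because the comparison is driven entirely by rows in the driver table, covering another table is a matter of adding a row rather than writing a new test.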

T-SQL is ESSENTIAL
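The data growth check in Step 5 comes down to comparing row counts taken before and after an ETL run against a threshold. The sketch below assumes a simple percentage rule and uses hypothetical helper names (`snapshot_counts`, `growth_alerts`); the slides leave the threshold X and the storage of snapshots open.

```python
import sqlite3

def snapshot_counts(conn, tables):
    """Record the current row count of each table."""
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0] for t in tables}

def growth_alerts(before, after, threshold_pct):
    """Return tables whose row count changed by more than threshold_pct percent."""
    alerts = {}
    for table, old in before.items():
        new = after.get(table, 0)
        if old == 0:
            # Any growth from an empty table counts as an unbounded change.
            change = 0.0 if new == 0 else float("inf")
        else:
            change = abs(new - old) / old * 100.0
        if change > threshold_pct:
            alerts[table] = (old, new)
    return alerts

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INT);
    CREATE TABLE SalesFact (ProductKey INT);
    INSERT INTO Product VALUES (1), (2), (3), (4);
    INSERT INTO SalesFact VALUES (11), (22), (33), (44);
""")
tables = ["Product", "SalesFact"]
before = snapshot_counts(conn, tables)

# Simulate an ETL run that adds one product but wipes most of the fact table.
conn.execute("INSERT INTO Product VALUES (5)")
conn.execute("DELETE FROM SalesFact WHERE ProductKey > 11")

after = snapshot_counts(conn, tables)
alerts = growth_alerts(before, after, threshold_pct=30)
print(alerts)
```

With a 30 percent threshold, the 25 percent growth in Product passes while the 75 percent drop in SalesFact is flagged. The table list would naturally come from the Step 1 data dictionary.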


Step 6: Create a stored procedure that performs DQ checks automatically (and also identifies data patterns for certain columns, to be set by the user) by looping through each parameter in the table from Step 2 and putting the results into a different table.

Example of simple DQ checks

DECLARE @srcTable sysname, @tgtTable sysname, @pksrcTable sysname, @pktgtTable sysname;
DECLARE @SQL nvarchar(max), @DIFFSQL nvarchar(max);

SET @srcTable = 'tblCustomer';
SET @tgtTable = 'tblCustomerTgt';
SET @pksrcTable = 'intID';
SET @pktgtTable = 'intID';

-- Find records that are missing on either side
SET @SQL = 'SELECT * FROM ' + @srcTable + ' src' +
           ' FULL OUTER JOIN ' + @tgtTable + ' tgt' +
           ' ON src.' + @pksrcTable + ' = tgt.' + @pktgtTable +
           ' WHERE src.' + @pksrcTable + ' IS NULL OR tgt.' + @pktgtTable + ' IS NULL';

SELECT @SQL;  -- show the generated statement
EXECUTE sp_executesql @SQL;

-- Find records present on both sides whose values differ
SET @DIFFSQL = 'SELECT * FROM tblCustomer src
                INNER JOIN tblCustomerTgt tgt
                ON src.intID = tgt.intID
                WHERE src.txtName <> tgt.txtName';

EXECUTE sp_executesql @DIFFSQL;
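Beyond source-to-target comparison, Step 6 also calls for looping over configured checks and storing the outcome in a results table. A minimal sketch of that loop follows, run against the sample Product and Order data. The rule list, the results table name `ddbt_dq_results`, and the decision to treat a zero price or zero quantity as a violation are all assumptions made for this example.

```python
import sqlite3

# Hypothetical rule set: (table, column, rule name, SQL predicate that marks a bad row)
DQ_RULES = [
    ("Product", "UnitPrice", "non_positive_price", "UnitPrice <= 0"),
    ("Product", "Name", "missing_name", "Name IS NULL OR TRIM(Name) = ''"),
    ("Order", "OrderQty", "non_positive_quantity", "OrderQty <= 0"),
]

def run_dq_checks(conn, rules):
    """Count violating rows per rule and store the counts in ddbt_dq_results."""
    conn.execute("""CREATE TABLE IF NOT EXISTS ddbt_dq_results
                    (tablename TEXT, columnname TEXT, rulename TEXT, failed_rows INT)""")
    for table, column, rule, predicate in rules:
        failed = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE {predicate}').fetchone()[0]
        conn.execute("INSERT INTO ddbt_dq_results VALUES (?, ?, ?, ?)",
                     (table, column, rule, failed))
    return conn.execute(
        "SELECT tablename, rulename, failed_rows FROM ddbt_dq_results ORDER BY rowid"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INT, Name TEXT, UnitPrice INT);
    CREATE TABLE "Order" (OrderID INT, ProductID INT, OrderQty INT);
    INSERT INTO Product VALUES (1,'Mango',100),(2,'Orange',0),(3,'Apple',100),
                               (4,'Grape',10),(5,'Junkproductname',-100);
    INSERT INTO "Order" VALUES (1,1,5),(2,1,5),(3,2,-3),(4,3,5),(5,4,100),(6,4,100),(7,5,0);
""")
dq_results = run_dq_checks(conn, DQ_RULES)
for row in dq_results:
    print(row)
```

Keeping the rules as data rather than code means the same loop serves every table listed in the dictionary.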


Step 7: Create a metadata lineage framework to test dimensions and facts (and also intermediate tables)

Typical OLTP data in Staging

Product

ProductID  Name    Unit Price
1          Mango   100
2          Orange  200

Customer

CustomerID  CustomerName
1           Anandam
2           Narendra

Order

OrderID  ProductID  CustomerID  OrderQty
100      1          1           5
200      2          2           6

Data in Dimensional Model: Dimensions

Product dimension

ProductKeyID  Name    Unit Price  ..
11            Mango   100         ..
22            Orange  200         ..

Customer dimension

CustomerKeyID  CustomerName  ..
55             Anandam       ..
66             Narendra      ..


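Sales Fact

ProductKeyID  CustomerKeyID  OrderQty  Unit Price  Sales Amount
11            55             5         100         500
22            66             6         200         1200

Step 7-a: For dimensions, map each and every column in the table (except the surrogate key) and create the data lineage. If there are intermediate tables, create the data lineage for them as well. Populate tblDataLineageDimensions as below for the example above.

Row 1
  tblStagingName: Product
  StagingColumnName: Name
  tblDimensionName: Dimension Product
  DimensionColumnName: Name
  Free-text SQL (for filter, concatenation, or any logic) at Staging:
  Criteria for uniquely identifying the row in Staging (normally the source natural key): Name
  Criteria for uniquely identifying the row in Dimension (normally the source natural key): Name
  Free-text SQL (for filter, concatenation, or any logic) at Dimension:

Row 2
  tblStagingName: Product
  StagingColumnName: Unit Price
  tblDimensionName: Dimension Product
  DimensionColumnName: Unit Price
  Free-text SQL (for filter, concatenation, or any logic) at Staging:
  Criteria for uniquely identifying the row in Staging (normally the source natural key): Name
  Criteria for uniquely identifying the row in Dimension (normally the source natural key): Name
  Free-text SQL (for filter, concatenation, or any logic) at Dimension:

Further rows: ..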


Step 7-b: Run a stored procedure that compares values according to tblDataLineageDimensions for a set of rows, forming dynamic SQL from the staging and dimension columns. If the values match, data quality is preserved; otherwise, flag the suspect record.

Step 7-c: Populate tblDataLineageFacts

Row 1
  tblStagingName: Order
  StagingColumnName: OrderQty
  tblFactName: Sales Fact
  FactColumnName: OrderQty
  Free-text SQL (for filter, concatenation, or any logic) at Staging: JOIN tblProduct->ProductID, JOIN tblCustomer->CustomerID
  Criteria for uniquely identifying the row in Staging (normally the natural key): ProductID, CustomerID
  Criteria for uniquely identifying the row in Fact (normally the composite primary key): ProductKeyID, CustomerKeyID
  Free-text SQL (for filter, concatenation, or any logic) at Fact: JOIN tblDimProduct->ProductKeyID, JOIN tblDimCustomer->CustomerKeyID

Row 2
  tblStagingName: Product
  StagingColumnName: Unit Price
  tblFactName: Sales Fact
  FactColumnName: Unit Price
  Remaining columns: ..


Step 7-d: For computed columns, which do not exist in the source database, the process is as follows. Populate the lineage row as below:

  tblStagingName (list all tables that help derive the computed fact): Order, Product
  StagingColumnName: OrderQty, UnitPrice
  tblFactName: Sales Fact
  FactColumnName: Sales Amount
  Free-text SQL (for filter, concatenation, or any logic) at Staging: JOIN tblProduct->ProductID, JOIN tblCustomer->CustomerID
  Criteria for uniquely identifying the row in Staging (normally the natural key): ProductID, CustomerID
  Criteria for uniquely identifying the row in Fact (normally the composite primary key): ProductKeyID, CustomerKeyID
  Free-text SQL (for filter, concatenation, or any logic) at Fact: JOIN tblDimProduct->ProductKeyID, JOIN tblDimCustomer->CustomerKeyID; SalesAmount = (Order.OrderQty * Product.UnitPrice)

Step 7-e: Run a stored procedure that compares values according to tblDataLineageFacts for a set of rows, forming dynamic SQL from the staging and fact columns. If the values match, data quality is preserved; otherwise, flag the suspect record.
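The comparison in Step 7-e can be illustrated by building a query from a lineage entry and listing the rows where the fact value disagrees with the value recomputed from staging. This is a sketch, not the authors' procedure: the lineage entry is held in a Python dictionary rather than in tblDataLineageFacts, the dimension tables are assumed to carry the natural keys (ProductID, CustomerID) alongside the surrogate keys, and one fact value is deliberately seeded as wrong.

```python
import sqlite3

# One hypothetical lineage entry for the computed fact column SalesAmount.
LINEAGE = {
    "fact_table": "SalesFact",
    "fact_column": "SalesAmount",
    "staging_expression": "o.OrderQty * p.UnitPrice",
    "staging_from": ('"Order" o '
                     "JOIN Product p ON p.ProductID = o.ProductID "
                     "JOIN DimProduct dp ON dp.ProductID = o.ProductID "
                     "JOIN DimCustomer dc ON dc.CustomerID = o.CustomerID"),
    "fact_join": "f.ProductKeyID = dp.ProductKeyID AND f.CustomerKeyID = dc.CustomerKeyID",
}

def suspect_rows(conn, lineage):
    """Build the comparison query from the lineage entry and return mismatching rows."""
    sql = (
        f"SELECT dp.ProductKeyID, dc.CustomerKeyID, "
        f"{lineage['staging_expression']} AS expected, "
        f"f.{lineage['fact_column']} AS actual "
        f"FROM {lineage['staging_from']} "
        f"LEFT JOIN {lineage['fact_table']} f ON {lineage['fact_join']} "
        f"WHERE f.{lineage['fact_column']} IS NULL "
        f"OR f.{lineage['fact_column']} <> {lineage['staging_expression']} "
        f"ORDER BY dp.ProductKeyID"
    )
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (ProductID INT, Name TEXT, UnitPrice INT);
    CREATE TABLE Customer (CustomerID INT, CustomerName TEXT);
    CREATE TABLE "Order" (OrderID INT, ProductID INT, CustomerID INT, OrderQty INT);
    CREATE TABLE DimProduct (ProductKeyID INT, ProductID INT, Name TEXT, UnitPrice INT);
    CREATE TABLE DimCustomer (CustomerKeyID INT, CustomerID INT, CustomerName TEXT);
    CREATE TABLE SalesFact (ProductKeyID INT, CustomerKeyID INT, OrderQty INT,
                            UnitPrice INT, SalesAmount INT);
    INSERT INTO Product VALUES (1,'Mango',100),(2,'Orange',200);
    INSERT INTO Customer VALUES (1,'Anandam'),(2,'Narendra');
    INSERT INTO "Order" VALUES (100,1,1,5),(200,2,2,6);
    INSERT INTO DimProduct VALUES (11,1,'Mango',100),(22,2,'Orange',200);
    INSERT INTO DimCustomer VALUES (55,1,'Anandam'),(66,2,'Narendra');
    -- The second fact row stores 1000 instead of 6 * 200 = 1200 (seeded defect).
    INSERT INTO SalesFact VALUES (11,55,5,100,500),(22,66,6,200,1000);
""")
suspects = suspect_rows(conn, LINEAGE)
print(suspects)
```

The LEFT JOIN also surfaces staging rows that never reached the fact table, since their fact value comes back as NULL.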


Data Dictionary Lifecycle – when does it start and end?

Business

• Creates the data dictionary for the business process, e.g. data in flat files, Excel sheets, basic mapping of subject areas, entities, and rules.

QA duty at this phase: review for completeness, relationships, and business rules.

SD/PM

• Maps the data dictionary of the processes to subject areas.

• Creates logical data models, maintaining the relationships and data definitions.

QA duty at this phase: review for logical data entities and relationships, exceptions, rules, definitions, and hierarchy.

Dev

• Creates the physical data model and the data flow diagram.

• Documents the metadata for each data dictionary element's flow.

QA duty at this phase: review for metadata, rules, data flow, and order of data flow rules.

QA

• Verifies, uses, and builds upon the metadata created by the Dev team, mapping it to the proposed metadata framework and enhancing it for DDbT.

QA duty at this phase: test design using DDbT, DQ checks, execution using the DDbT steps, and refinement of the dictionary.

One Version of Truth


DDbT in Scenario Focused Engineering

One of the challenges in adopting Scenario Focused Test Design (SFTD) for BI/DW applications is how to design data flow and data quality test cases using the SFTD recommended steps. Often, use cases for data flow are generic and stay at a very high level, which does not help the tester design SFTD for data flows and DQ checks at the object level.

DDbT can be included in a QA strategy that uses SFTD and can help drive the adoption of SFTD. Below is the proposed way to make DDbT part of SFTD:

Understand Use cases

Understand Data Dictionaries provided by Business / SD / Dev

Validate Mappings and rules in Test Data Dictionary

Design Data flows from data dictionary (already in step 1-7 of DDbT) [How about a GUI to do so?]

Design data flows in SFTD vision using pre-defined 6 controls

Color code Positive and Negative data test cases

Number the flows

Generate pseudo code for automation

Finally, test more DQ scenarios with the SFTD/DDbT combination. The value is that QA teams can understand the entire data flow while doing DDbT along with SFTD. This can even prompt program managers and developers to maintain their own data dictionaries and move more quality upstream.


Big Wins with Data Dictionary Testing

o As the number of sources in the DW increases, complexity is abstracted to a minimum from the DW tester's perspective.

o Proactively identify DQ and performance issues as soon as data patterns deviate from normal.

o More test coverage; moreover, the test team knows for sure what has been tested and what has not.

o BVT can be fully automated.

o The regression suite is a good candidate for automation with DDbT.

o After the one-time setup of metadata, a drastic reduction in testing effort and time for subsequent releases of the project.

o Know the data well, more specifically the domain data of the application!


Limitations

DDbT requires a one-time setup effort.

Unless the test team is well versed in the application, implementing DDbT may be a challenge; however, following this methodology helps in getting to know the application and its data better.

Rework is anticipated if the setup is not correct.

Maintenance effort is required.

Not yet tried in a project!


Closing thoughts: Are we ready to implement the DDbT process for BI/DW testing?

One common definition of data dictionary will solve many problems.

Contact [email protected] and [email protected]