B.I. Testing

.......torture the data

By: Nikhil Bajaj (Bachelor of Engineering in Information Technology)

(B.I. tester in iGATE Patni)

B.I. Testing……………….torture the data

Mail your queries to [email protected] Page 1

Version 1.0------------August 2011

INDEX

Sr. No.  Topic

1   Challenges in the BI Testing field
2   Expectations from the IT Industry

BI/DW
3   What is BI?
4   Business Intelligence and Data Warehouse
5   What is a Data Warehouse?
6   Generally how does the data flow in a Data Warehouse?
7   What is a Data Mart?
8   What is ETL?

BI/DW Testing
9   What is the need to test a Data Warehouse?
10  Data Warehouse Testing and Database Testing
11  Type of testing done in a Data Warehouse project
12  Who all are involved in testing a data warehouse?
13  What are the phases undergone by the QA team?
14  How does the QA team prepare test cases?
15  Query Format
16  Example
17  What are the tools that a QA team may use?

Challenges in the BI Testing field:

There are many challenges to the development of the specialized skills

required for BI testing:

1. Unwillingness on the part of DW developers

Any IT professional planning to build a career in this exciting field aims

to be an expert ETL developer, OLAP specialist, dimensional data

modeler or DW architect; DW tester doesn't even make the list of

desirable roles.

This is due to the false perception that only such roles carry premium

rates in the job market and only such roles get to face the technical

challenges associated with a BI project.

This has left the BI project team with very few takers for the challenging

and critical role of tester.

2. Lack of awareness

As a general practice, testers plan their career in such a way that they

specialize and equip themselves with technical skills for the tools

involved in test execution (e.g., WinRunner, SilkTest) and test

management (e.g., Quality Center), with very little endeavor to develop

skills in the underlying technology.

But a good understanding of ETL/OLAP tools and technologies is an

essential skill for BI testing and, so far, testers have not developed a

keen interest in this skill.

3. Absence of tools

The BI marketplace is flooded with many tools and vendors, each

attempting to replace the other in the three layers of BI: database, ETL

and OLAP.

But there are no popular ETL/OLAP testing tools in the market that offer

features for automated testing or functional testing.

4. Lack of standard approach/methodology

While standard methodologies exist for testing as a whole, there seems

to be no industry-wide view on the suggested approach or methodology

for BI testing.

An ideal methodology should include a test strategy, a test plan and test

cases that cover thorough testing of the various phases of data

movement.

Creating test cases and test data that provide adequate coverage to

each of the phases is very critical for ensuring a comprehensive quality

assurance (QA) of the DW.

Expectations from the IT Industry

Listed below are some initiatives that can provide a much-needed boost to the BI testing field:

1. Promote awareness within the DW community that BI testing is a

challenging proposition requiring highly valued skills, thereby

encouraging ETL and BI developers to assume these roles.

Moreover, leading IT players with extensive experience in the DW/BI

area should promote well-defined career options and career

progression plans to the ETL/OLAP developers and conventional

testers.

2. Invest in research to formalize methodologies covering the entire

spectrum of DW/BI testing in full detail.

3. Invest in building assets, tools and job aids to strengthen this

function and provide productivity gains.

4. Develop training courses and course content to cross-train ETL/OLAP

developers in testing nuances and testers in DW and ETL/OLAP tools

and technology concepts.

5. Build strong testing teams with complementary skills.

The topics in this document are arranged with the above challenges and expectations in mind. We first look at what Business Intelligence is and why the term is so often used together with Data Warehouse. We then cover the Data Warehouse, the Data Mart and ETL, the difference between database testing and data warehouse testing, and finally the details of BI testing.

To test an object, we first need to understand what that object is. So let us start by understanding Business Intelligence so that we can learn how to test it.

BI/DW:

What is BI?

BI stands for Business Intelligence: bringing the right information at the right time to the right people in the right format.

Definition:

It is a broad category of applications and technologies for gathering,

storing, analyzing, and providing access to data to help enterprise users

make better business decisions.

Explanation (What is BI about?):

The five key stages of Business Intelligence:

1. Data Sourcing

2. Data Analysis

3. Situation Awareness

4. Risk Assessment

5. Decision Support

1. Data sourcing

Business Intelligence is about extracting information from multiple

sources of data.

The data might be: text documents - e.g. memos or reports or email

messages, photographs and images, sounds, formatted tables, web

pages and URL lists.

The key to data sourcing is to obtain the information in electronic form.

So typical sources of data might include: scanners, digital cameras,

database queries, web searches, computer file access, etc.

2. Data analysis

Business Intelligence is about synthesizing useful knowledge from

collections of data.

It is about estimating current trends, integrating and summarizing

disparate information, validating models of understanding, and

predicting missing information or future trends.

This process of data analysis is also called data mining or knowledge

discovery.

3. Situation awareness

Business Intelligence is about filtering out irrelevant information, and

setting the remaining information in the context of the business and its

environment.

The user needs the key items of information relevant to his or her

needs, and summaries that are syntheses of all the relevant data

(market forces, government policy etc.).

Situation awareness is the grasp of the context in which to understand

and make decisions.

4. Risk assessment

Business Intelligence is about discovering what plausible actions might

be taken, or decisions made, at different times.

It is about helping you weigh up the current and future risk, cost or

benefit of taking one action over another, or making one decision versus

another.

It is about inferring and summarizing your best options or choices.

5. Decision support

Business Intelligence is about using information wisely.

It aims to provide you warning of important events, such as takeovers,

market changes, and poor staff performance, so that you can take

preventative steps.

It seeks to help you analyze and make better business decisions, to

improve sales or customer satisfaction or staff morale.

It presents the information you need, when you need it.

Business Intelligence and Data Warehouse

Business intelligence is a term commonly associated with data

warehousing.

In fact, many of the tool vendors position their products as business

intelligence software rather than data warehousing software.

This is because often BI applications use data gathered from a data

warehouse or a data mart.

However, not all data warehouses are used for business intelligence, nor do

all business intelligence applications require a data warehouse.

In this document, we treat DW testing and BI testing as the same.

Difference:

Business intelligence usually refers to the information that is available for

the enterprise to make decisions on.

A data warehousing (or data mart) system is the backend, or the

infrastructural component for achieving business intelligence.

What is a Data Warehouse?

A data warehouse, abbreviated DW, is a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time.

A data warehouse is a place where data is stored for archival, analysis

and security purposes. Usually a data warehouse is either a single

computer or many computers (servers) tied together to create one giant

computer system.

Definition:

A data warehouse is a subject-oriented, integrated, time-variant and non-

volatile collection of data in support of management's decision making

process.

Explanation:

The important characteristics of a Data Warehouse:

1. Subject Oriented

2. Integrated

3. Time-variant

4. Non-volatile

1. Subject Oriented

It contains data that gives information about a particular subject instead

of about a company's ongoing operations.

2. Integrated

It contains data that is gathered into the data warehouse from a variety

of sources and merged into a coherent whole.

3. Time-variant

All data in the data warehouse is identified with a particular time period.

4. Non-volatile

Data is stable in a data warehouse. More data is added but data is never

removed. This enables management to gain a consistent picture of the

business.

Generally how does the data flow in a Data Warehouse?

[Data flow diagram: Source (flat file, e.g. a .txt file) -> ETL -> Staging (1. Staging History, 2. Staging Error) -> Data Warehouse -> Data Mart]

What is a Data Mart?

A data mart is a subset of the data warehouse.

It is a repository of data that holds information on a specific business area,

for example – Sales.

Data marts have the same definition as a data warehouse but have a limited audience and/or data content.

So now the question is: how do we move or copy the data from the everyday transactional database to the data warehouse?

This is where ETL comes into play.

What is ETL?

ETL is short for Extract, Transform, Load: three database functions that are combined into one tool to pull data out of one database and place it into another database.

Definition:

ETL is a process used to collect data from various sources, transform the

data depending on business rules/needs and load the data into a

destination database.

Explanation:

The ETL process has 3 main steps, which are Extract, Transform and Load.

1. Extract

The first step in the ETL process is extracting the data from various

sources.

Each of the source systems may store its data in a completely different format from the rest.

The sources are usually flat files or RDBMS, but almost any data storage

can be used as a source for an ETL process.

2. Transform

Once the data has been extracted and converted into the expected format, the next step in the ETL process is to transform it according to a set of business rules.

The data transformation may include various operations including but

not limited to filtering, sorting, aggregating, joining data, cleaning data,

generating calculated data based on existing values, validating data, etc.

3. Load

The final ETL step involves loading the transformed data into the

destination target, which might be a database or data warehouse.
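
The three steps can be sketched in a few lines of Python. This is only an illustrative sketch, not how an ETL tool works internally: the in-memory list SOURCE_ROWS, the table name target and the two business rules (keep rows with a percentage of 60 or more, upper-case the name) are all assumptions made for this example, and SQLite stands in for the destination database.

```python
import sqlite3

# Hypothetical source rows standing in for a flat file or a source RDBMS table.
SOURCE_ROWS = [
    {"name": "asha", "roll_no": 1, "percentage": 72.5},
    {"name": "ravi", "roll_no": 2, "percentage": 48.0},
    {"name": "meena", "roll_no": 3, "percentage": 60.0},
]


def extract():
    """Extract: read every row from the source as-is."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: apply the assumed business rules (filter, then clean)."""
    return [
        (row["name"].upper(), row["roll_no"], row["percentage"])
        for row in rows
        if row["percentage"] >= 60
    ]


def load(conn, rows):
    """Load: write the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE target (name TEXT, roll_no INTEGER PRIMARY KEY, percentage REAL)"
    )
    conn.executemany("INSERT INTO target VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute("SELECT name, roll_no FROM target ORDER BY roll_no").fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the way a tester reasons about them: each step can be checked on its own before the end-to-end load is verified.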

BI/DW Testing:

The main difference between normal testing and testing a data warehouse is that the test case documents for a data warehouse are built mainly around SQL queries.

What is the need to test a Data Warehouse?

1. Selecting data from multiple source systems, and the analysis that follows, poses a great challenge.

2. The volume and complexity of the data.

3. Inconsistent and redundant data in the data warehouse.

4. Loss of data during the ETL process.

5. Non-availability of a comprehensive test bed.

6. The data is critical to the business.

Data Warehouse Testing and Database Testing

All data warehouses are databases, but not all databases are data warehouses.

A Data Warehouse is a database that is designed for facilitating querying

and analysis. Often designed as OLAP (On-Line Analytical Processing)

systems, these databases contain read-only data that can be queried and

analyzed far more efficiently as compared to your regular OLTP application

databases.

Testing a database and testing a data warehouse are more or less the same

except for some points as follows:

1. ETL processes populate the DW, so ETL testing is the main component of DW testing.

2. Since a data warehouse is mainly used for reporting, its reporting functionality must also be tested.

3. Data warehouses store very large amounts of data compared to typical application databases, so performance testing of a DW is also recommended. In application databases, performance under data volume is usually less of a concern.

4. Data warehouses have to store historic data, and this feature has to be checked in DW testing. Application databases rarely keep historic data.

This document mainly focuses on the ETL testing part.

Type of testing done in a Data Warehouse project:

The type and number of tests performed on a data warehouse vary from project to project.

Some of the common ones are:

1. Requirement testing:

Requirement testing is conducted before any other level of testing.

It verifies whether or not all the requirements provided in the

specification are fulfilled.

2. ETL testing:

In the ETL testing stage, we make sure that appropriate changes in the

source system are captured properly and propagated correctly into the

data warehouse.

3. Smoke Testing:

A smoke test is a collection of written tests that are performed on a

system prior to being accepted for further testing.

This is also known as a build verification test.

4. Functional Testing:

In the functional testing stage, we make sure all the business

requirements are fulfilled.

5. Unit Testing:

Developers perform tests on their deliverables during and after their

development process.

The unit test is performed on individual components and is based on the

developer's knowledge of what should be developed.

6. Integration Testing:

Here we validate the business and functional requirements: data processed according to the correct business rules should result in the correct number of rows being transferred. We also verify the data load volumes.

7. Regression Testing:

Validate that the system continues to function correctly after being

changed.

It is performed after a reported defect has been fixed by the developer.

8. End-to-end testing:

In the end-to-end testing stage, we let the system run for a few days to

simulate production situations.

9. System Testing:

System Testing is performed to prove that the system meets the

functional specifications from an end to end perspective.

We as a testing team will verify that the data in the source system databases and the data in the target are consistent throughout the process.

The QA environment should be a replica of Production before the system test is run.

10. User Acceptance Testing:

The objective of user acceptance testing is to certify that a release

meets user expectations and is ready for production.

Who all are involved in testing a data warehouse?

1. Business Analysts gather and document requirements

2. QA Testers develop and execute test plans and test scripts

3. Infrastructure people set up test environments

4. Developers perform unit tests of their deliverables

5. DBAs test for performance and stress

6. Business Users perform functional tests including User Acceptance

Tests (UAT)

QA, short for Quality Assurance, is any systematic process of checking to

see whether a product or service being developed is meeting specified

requirements. Many companies have a separate department devoted to

quality assurance, known as the QA team.

What are the phases undergone by the QA team?

While implementing testing best practices, the QA team follows these phases in data warehouse testing:

1. Business understanding

a. High Level Test Approach

b. Test Estimation

c. Review Business Specification

d. Attend Business Specification and Technical Specification Walkthroughs

2. Test plan creation, review and walkthrough

3. Test case creation, review and walkthrough

4. Test Bed & Environment setup

5. Receiving test data file from the developers

6. Test predictions creation, review (Setting up the expected results)

7. Test case execution and regression testing if required.

a. Comparing the predictions with the actual results by testing

the business rules in the test environment.

b. Displaying the comparison result in a separate worksheet.

8. Deployment

a. Validating the business rule in the production environment.

How does the QA team prepare test cases?

This topic is very important for a test engineer who is responsible for

writing the test cases.

There are certain types of checks that can be done on the data under

review:

1. Attribute check

2. Current Row check

3. Duplicate check

4. Original Key check

5. Reconciliation check

6. Relationship check

1. Attribute check

Attribute check means verifying that the data is moving correctly from

source table to target table.

2. Current Row check

Current Row check means verifying that the current indicator is “Y” (an

indicator for latest record) for all the latest rows (with latest time

stamp).

3. Duplicate check

Duplicate check means checking that there are no duplicate values for

columns that are required to be unique.

4. Original Key check

Original Key check means checking whether the NOT NULL columns have

some value in them.

5. Reconciliation check

Reconciliation check means verifying that the number of rows in target

and the number of rows coming from source are the same.

6. Relationship check

Relationship check means checking that every key value in the child table that refers to the parent table is actually present in the parent table.

Query Format

Given below are the formats for writing SQL queries to perform all types

of checks.

1. Attribute check

Select count(1)
From ( Select source table attributes
       From source table
       Where list of conditions
       Except
       Select corresponding target table attributes
       From target table
       Where list of conditions
     ) alias (alternate name)

Expected output: Count=0

In the above query, we first retrieve the rows of mapped attributes from the source table and then remove from this result every row that is also present in the target table.

So the result count should be zero, meaning that all the mapped source data is present in the target table, and the test case can be passed.
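
Note that the attribute check as written finds source rows that are missing or altered in the target, but not rows that exist only in the target. A minimal sketch of running the check in both directions is given below, using Python's built-in sqlite3 module. The tables src and tgt, their columns and the helper count_missing are hypothetical names chosen for this illustration. Some databases, Oracle for example, spell EXCEPT as MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, city TEXT);
    CREATE TABLE tgt (id INTEGER, city TEXT);
    INSERT INTO src VALUES (1, 'Pune'), (2, 'Mumbai');
    -- Row 2 was altered in flight and row 3 has no source row.
    INSERT INTO tgt VALUES (1, 'Pune'), (2, 'Mumbay'), (3, 'Delhi');
""")


def count_missing(left, right):
    """Count rows of `left` that have no identical row in `right`."""
    query = f"""
        SELECT COUNT(1)
        FROM (SELECT id, city FROM {left}
              EXCEPT
              SELECT id, city FROM {right}) a
    """
    return conn.execute(query).fetchone()[0]


source_not_in_target = count_missing("src", "tgt")
target_not_in_source = count_missing("tgt", "src")
print(source_not_in_target, target_not_in_source)
```

Here the first count flags the altered row, and the second count additionally flags the row that has no source at all, so both counts must be zero for a clean load.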

2. Current Row check

Select count(1)
From ( Select records
       From table_1
       Where list of conditions (records with current time stamp but having indicator N)
     ) alias

Assumption: Indicator for current record: Y

Indicator for old record: N

Expected output: Count=0

In the above query, we retrieve those records which have the current (latest) timestamp but whose indicator is still ‘N’.

If the result count is zero, there are no records that are current but flagged as old, and the test case can be passed.
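
As a concrete illustration of the current row check, the sketch below builds a small history table in SQLite and counts the latest rows that are wrongly flagged as old. The table customer_hist and its columns (cust_id, load_ts, current_ind) are hypothetical, and "latest" is assumed to mean the highest load_ts for each cust_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_hist (
        cust_id INTEGER, city TEXT, load_ts TEXT, current_ind TEXT
    );
    INSERT INTO customer_hist VALUES
        (1, 'Pune',   '2011-01-10', 'N'),
        (1, 'Mumbai', '2011-06-15', 'Y'),
        (2, 'Delhi',  '2011-03-01', 'N'),
        (2, 'Nagpur', '2011-07-20', 'N');  -- latest row wrongly flagged 'N'
""")

# Latest row per cust_id (highest load_ts) that still carries indicator 'N'.
CURRENT_ROW_CHECK = """
    SELECT COUNT(1)
    FROM (SELECT h.cust_id
          FROM customer_hist h
          WHERE h.load_ts = (SELECT MAX(m.load_ts)
                             FROM customer_hist m
                             WHERE m.cust_id = h.cust_id)
            AND h.current_ind = 'N') a
"""
bad_rows = conn.execute(CURRENT_ROW_CHECK).fetchone()[0]
print(bad_rows)
```

With this sample data the check reports one offending row (customer 2), so the test case would fail until the indicator is corrected.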

3. Duplicate check

Select count(1)
From ( Select attribute_list_1
       From table_1
       Where list of conditions
       Group by attribute_list_1
       Having count(1)>1
     ) alias

Expected output: Count=0

In the above query, we are retrieving the attributes which are supposed

to be unique and then grouping them in the same order in which they

were retrieved.

This will group all the records which have these attributes duplicated

and so the count will be greater than 1 for such records.

When we take the count of such duplicate records and we get zero

output, then this shows that there are no duplicate values for unique

columns and the test case can be passed.

4. Original Key check

Select count(1)
From table
Where list of conditions
And (any of NOT NULL values are NULL)

Expected output: Count=0

In the above query, we are retrieving all the records which have any of

the NOT NULL columns as NULL and then taking count of it.

If the count is zero, this means there are no such records and the test

case can be passed.

5. Reconciliation check

Select count(*)
From source table
Where list of conditions

Select count(*)
From target table
Where list of conditions

Expected output: Source count = Target count

In the above check, there are two queries, one fetching the count of

total number of records in source table and the other fetching the count

of total number of records in target table.

If both counts are the same, there is an equal number of records in source and target, and the test case can be passed.
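
The two counts can also be fetched and compared in a single statement, which is convenient when the result has to be pasted into a test sheet. The sketch below is one possible way to do it in SQLite; the tables src and tgt and the load condition amount > 0 are assumptions made only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, amount REAL);
    CREATE TABLE tgt (id INTEGER, amount REAL);
    INSERT INTO src VALUES (1, 10.0), (2, 20.0), (3, -5.0);
    INSERT INTO tgt VALUES (1, 10.0), (2, 20.0);
""")

# Source count (with the assumed load condition), target count, and difference.
RECONCILIATION = """
    SELECT s.cnt, t.cnt, s.cnt - t.cnt
    FROM (SELECT COUNT(*) AS cnt FROM src WHERE amount > 0) s
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM tgt) t
"""
source_count, target_count, difference = conn.execute(RECONCILIATION).fetchone()
print(source_count, target_count, difference)
```

A difference of zero means the counts reconcile. Equal counts alone do not prove the rows are the same, which is why the attribute check is run as well.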

6. Relationship check

Select count(child_id)
From ( Select p.parent_attribute_to_be_checked parent_id,
              c.child_attribute_to_be_checked child_id
       From ( Select distinct attributes from child table ) c
       Left outer join
            ( Select distinct attributes from parent table ) p
       On join conditions
     ) alias
Where parent_id IS NULL

Expected output: Count=0

In the above query, we are retrieving all the records in the child (target) table which have no parent in the parent (source) table and then taking their count.

If the count is zero, this means that there are no such records and the

test case can be passed.

Checking lookup condition is the most common example for this type of

check.
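
A lookup condition can be illustrated as follows. In this sketch a fact table looks up customers in a dimension table, and the check counts the distinct customer ids that have no match. The tables sales_fact and customer_dim and their columns are hypothetical and are not part of the example used later in this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_dim (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_fact (sale_id INTEGER, cust_id INTEGER);
    INSERT INTO customer_dim VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO sales_fact VALUES (100, 1), (101, 2), (102, 9);  -- 9 is an orphan
""")

# Child keys (sales_fact) left-joined to parent keys (customer_dim);
# a NULL parent_id marks a child key with no parent.
RELATIONSHIP_CHECK = """
    SELECT COUNT(child_id)
    FROM (SELECT c.cust_id AS child_id, p.cust_id AS parent_id
          FROM (SELECT DISTINCT cust_id FROM sales_fact) c
          LEFT OUTER JOIN (SELECT DISTINCT cust_id FROM customer_dim) p
            ON c.cust_id = p.cust_id) a
    WHERE parent_id IS NULL
"""
orphans = conn.execute(RELATIONSHIP_CHECK).fetchone()[0]
print(orphans)
```

The sample data contains one sale whose customer id is missing from the dimension, so the count is one and the test case would fail.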

Example

The example below will make the above queries easier to understand.

Consider a source table STUDENTS and a target table FRST_CLAS_STUDS.

We have to test whether the transformations between these two tables given in the mapping document are working properly.

The table below shows the mapping between the two tables.

Source table  Source column  Target table     Target column   Transformation
STUDENTS      SR. NO         -                -               -
STUDENTS      NAME           FRST_CLAS_STUDS  NAME            Capitalize each letter
STUDENTS      ROLL_NO        FRST_CLAS_STUDS  ROLL_NO (P.K.)  It should be present in the source table
STUDENTS      PERCENTAGE     FRST_CLAS_STUDS  PERCENT         Direct mapping
STUDENTS      CLASS          -                -               -
STUDENTS      ADDRESS        -                -               -
STUDENTS      DOB            -                -               -

Attribute check: In the target table, 3 columns are mapped from the source table, each with its own transformation.

We have to test each attribute that is present in the target table, setting aside the source attributes that are not mapped. The queries in this example filter the source on PERCENTAGE >= 60, which implies that only such students are loaded into the target.

Select count(*) from
(Select upper(S.NAME), S.ROLL_NO, S.PERCENTAGE
 From STUDENTS S
 Where S.PERCENTAGE >= 60
 Except
 Select F.NAME, F.ROLL_NO, F.PERCENT
 From FRST_CLAS_STUDS F) A;

Expected output: Count=0

Duplicate check: In target table, attribute ROLL_NO is the primary key.

So it has to be unique.

We have to test whether this attribute is unique or not.

Select count(*) from

( Select ROLL_NO from FRST_CLAS_STUDS

Group by ROLL_NO

Having count(*)>1) A;

Expected output: Count=0

Original key check: In target table attribute ROLL_NO is the primary key.

So it has to be NOT NULL.

We have to test whether this attribute has values for all the records or not.

Select count(*) from FRST_CLAS_STUDS

Where ROLL_NO is NULL;

Expected output: Count=0

Reconciliation check: We have to test whether the correct number of records has been moved from source to target.

Select count(*) from STUDENTS

Where PERCENTAGE >= 60;

Select count(*) from FRST_CLAS_STUDS;

Expected output: count from 1st query = count from 2nd query

Relationship check: In target, the attribute ROLL_NO is derived from

attribute ROLL_NO in source.

So we have to check whether all roll numbers in target are present in

source or not.

Select count(F.ROLL_NO) from
(Select distinct ROLL_NO from FRST_CLAS_STUDS) F
Left outer join
(Select distinct ROLL_NO from STUDENTS) S
On F.ROLL_NO = S.ROLL_NO
Where S.ROLL_NO is NULL;

Expected output: Count=0
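
To try the example end to end, the checks can be run against a small in-memory SQLite database, as sketched below. The sample rows, the column types and the simulated load are assumptions made for illustration; the filter PERCENTAGE >= 60 is taken from the queries in this example. The column SR. NO is written as SR_NO, the reconciliation check is folded into a single difference, and the relationship check is written with an explicit join condition on ROLL_NO.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENTS (SR_NO INTEGER, NAME TEXT, ROLL_NO INTEGER,
                           PERCENTAGE REAL, CLASS TEXT, ADDRESS TEXT, DOB TEXT);
    CREATE TABLE FRST_CLAS_STUDS (NAME TEXT, ROLL_NO INTEGER, PERCENT REAL);
    INSERT INTO STUDENTS VALUES
        (1, 'asha',  11, 72.5, 'X', 'Pune',   '1996-02-01'),
        (2, 'ravi',  12, 48.0, 'X', 'Mumbai', '1996-05-17'),
        (3, 'meena', 13, 60.0, 'X', 'Nagpur', '1995-11-30');
    -- Simulated ETL load (assumed rules): keep PERCENTAGE >= 60, upper-case NAME.
    INSERT INTO FRST_CLAS_STUDS
        SELECT UPPER(NAME), ROLL_NO, PERCENTAGE
        FROM STUDENTS
        WHERE PERCENTAGE >= 60;
""")

# Each query returns a single number; 0 means the check passes.
CHECKS = {
    "attribute": """
        SELECT COUNT(*) FROM
        (SELECT UPPER(S.NAME), S.ROLL_NO, S.PERCENTAGE
         FROM STUDENTS S WHERE S.PERCENTAGE >= 60
         EXCEPT
         SELECT F.NAME, F.ROLL_NO, F.PERCENT FROM FRST_CLAS_STUDS F) A""",
    "duplicate": """
        SELECT COUNT(*) FROM
        (SELECT ROLL_NO FROM FRST_CLAS_STUDS
         GROUP BY ROLL_NO HAVING COUNT(*) > 1) A""",
    "original_key": """
        SELECT COUNT(*) FROM FRST_CLAS_STUDS WHERE ROLL_NO IS NULL""",
    "reconciliation": """
        SELECT (SELECT COUNT(*) FROM STUDENTS WHERE PERCENTAGE >= 60)
             - (SELECT COUNT(*) FROM FRST_CLAS_STUDS)""",
    "relationship": """
        SELECT COUNT(F.ROLL_NO) FROM
        (SELECT DISTINCT ROLL_NO FROM FRST_CLAS_STUDS) F
        LEFT OUTER JOIN
        (SELECT DISTINCT ROLL_NO FROM STUDENTS) S
        ON F.ROLL_NO = S.ROLL_NO
        WHERE S.ROLL_NO IS NULL""",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in CHECKS.items()}
for name, value in results.items():
    print(f"{name}: {'PASS' if value == 0 else 'FAIL'} ({value})")
```

Changing one row in FRST_CLAS_STUDS by hand (for example, putting a name in lower case) and re-running the checks is a quick way to see which check catches which kind of defect.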

What are the tools that a QA team may use?

1. Data access tools (e.g., TOAD, WinSQL), used to analyze the content of tables and the results of loads.

2. ETL tools (e.g., Informatica, DataStage).

3. Test management tools (e.g., TestDirector, Quality Center), which maintain and track the requirements, test cases, defects and traceability matrix.

All the best for your future as a data warehouse or database tester!