A system integration test approach mitigating the unavailability of good quality data in DWT
Saurabh Shinde
Partha Majhi
Infosys Limited (NASDAQ: INFY)
Abstract
Retail banks face an ever-increasing challenge of servicing millions of customers with varied needs, and hence must maintain sound functional systems that deliver on time. Key challenges in testing such systems include proper knowledge of the interacting systems, business awareness, and the availability of good quality test data. A bad production release might impact the overall functioning of the bank, affect credit risk analysis, or even lose the trust of its customers. It is therefore imperative for the bank's quality management team to have a system integration test approach that also performs due diligence to ensure good test data.
Abstract (cont.)
We would like to share our experience with an approach to such quality testing for a banking organization whose need was to ensure that its data is Basel compliant. Typical challenges in this area of testing are lack of domain and technical expertise, stringent delivery timelines with no expected issues, etc. Hence a testing approach is required that encompasses testing techniques driven by business rules and data, and provides the means to validate with good quality test data.
Target Audience
This tutorial can benefit:
o Test Managers, to plan an approach for creating test data and performing system integration tests
o Test Leads/Engineers, to practice the approach of performing the system integration test while creating valid test data
Outline of the Tutorial
1. Introduction
   a. Objectives of the session
   b. User expectations
   c. Context setting
2. Banking Operations
   a. Overview and relation to context
3. Data warehouse testing
   a. Overview and relation to context
4. Limitations of test data encountered in testing
   a. Limitations
5. Managing test data
   a. Approach to manage test data
6. System Integration test approach
   a. System integration test approach
7. Case Study (application of the approach)
   a. Case studies
8. Summarization
9. Closure
   a. Q&A
Objectives, Expectations…

Objectives of this session – to prepare you for the situations we come across daily in our testing life with respect to test data, and that we tend to ignore.

Your expectations for this session – it won't solve all of your test data related woes, but we promise to handle a handful that trouble you most.
Banking Operations
Retail Banks provide varied payment services to their customers:
• Personal accounts (checking, saving)
• Cards (Credit, Debit)
• Mortgage loans
• Home equity loans
• Personal loans
They have to manage the related data for day-to-day functioning.
They charge interest on these services and generate the revenue that keeps them functioning. They carry the risk of 'failure to generate revenue' in the event that a customer defaults. Hence, it is critical for every bank to analyze and manage its risks to remain profitable. Banks process the data through such 'risk application' systems. Typically, the data flows from the banking systems into these risk applications.
Banking Operations
The application systems are supported by data warehouses. Testing these systems requires having sound understanding of the data processing done by them.
Data Warehouse testing
Typically, DWT project testing is done at 3 phases:
o EXTRACT
o TRANSFORM
o LOAD
The testing to be done in the EXTRACT and LOAD phases is predominantly process oriented and has little to no data dependency. A subset of production data is sufficient to test these two phases.

The challenge comes when we are to test the TRANSFORM phase. This is the process of altering data based on a set of business rules.
o Simple conversion: the value of a field is converted based on another. E.g. a currency field is converted from its native currency to the local currency value.
o Complex conversion: the value of a field is derived based on the relation between multiple fields. This might also necessitate joining different tables.
o Filter out: some data is filtered out based on defined business criteria. E.g. duplicate records are dropped.
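The three conversion kinds above can be sketched as small transform functions. This is a minimal illustration only; all field names and the exposure rule are hypothetical, not taken from any real system:

```python
# A minimal sketch of the three TRANSFORM rule kinds described above.
# Field names (native_amount, currency, exposure_class, ...) are
# hypothetical stand-ins.

def simple_conversion(record, fx_rates):
    """Simple conversion: derive local_amount from the native amount and currency."""
    record["local_amount"] = record["native_amount"] * fx_rates[record["currency"]]
    return record

def complex_conversion(record, customer_lookup):
    """Complex conversion: derive a field from multiple fields, joining a lookup table."""
    customer = customer_lookup[record["customer_id"]]
    record["exposure_class"] = (
        "Retail"
        if customer["segment"] == "personal" and record["local_amount"] < 1_000_000
        else "Commercial"
    )
    return record

def filter_out(records):
    """Filter out: drop duplicate records, keyed on a business key."""
    seen, kept = set(), []
    for r in records:
        key = (r["customer_id"], r["account_id"])
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept
```

Each rule kind needs its own kind of test data: boundary values for the simple conversion, joined combinations for the complex one, and intentional duplicates for the filter.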
Limitations of test data encountered in testing
To visualize an example, consider a transform process that has many logical branches based on which the source data is transformed, e.g.:

If (0 <= amount_field < 100) then attribute_1 = 1
else If (100 <= amount_field < 200) then attribute_1 = 2
else If (200 <= amount_field < 300) then attribute_1 = 3
else If (300 <= amount_field < 400) then attribute_1 = 4
else If (400 <= amount_field < 500) then attribute_1 = 5
...
else If (1000 <= amount_field < 1100) then attribute_1 = A

We require test data that can exercise all of these conditions in order to validate the logical branches.
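A branch structure like the one above lends itself to boundary-value test data generation. Here is a minimal sketch; the bucket width of 100 and the overall [0, 1100) domain come from the example, while the helper itself is illustrative:

```python
# Generate boundary-value test data for the bucketed transform above:
# for each [low, low + 100) branch, emit the lower bound, an interior
# value, and the largest value below the upper bound, plus two values
# just outside the overall domain as negative test data.

BUCKET_WIDTH = 100
DOMAIN_LOW, DOMAIN_HIGH = 0, 1100

def boundary_values():
    values = []
    for low in range(DOMAIN_LOW, DOMAIN_HIGH, BUCKET_WIDTH):
        values += [low, low + BUCKET_WIDTH // 2, low + BUCKET_WIDTH - 1]
    # negative test data: just outside the defined branches
    values += [DOMAIN_LOW - 1, DOMAIN_HIGH]
    return values
```

Injecting one record per generated value guarantees every branch, and both off-by-one boundaries of each branch, is exercised at least once.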
Limitations of test data encountered in testing
But most often, we experience:
o Test data does not contain values to test all scenarios
o Test data is not available for certain fields that have restricted permissions
o Test data sourced from upstream may not be available in time for the testing cycle, leading to a reduced testing window for the release
o Test data is not production-like, leading to possibly missing out on identifying specific production issues
o Even if we get sanitized production test data, it may not provide coverage for all business rules covering the application domain

To mitigate this unavailability of test data, we need to manage the test data so that it resembles production data and at the same time covers the business scenarios and is based on our requirements.
Managing test data

Managing test data is a process involving several activities.

o Creating instances of test environment layers
• Layer 1: Procuring data from production
• Layer 2: Manipulating data as per test requirements
• Layer 3: Storing data as a 'Regression Set' for future regression use
• Layer 4: Test instance where data is made available for testing

o Procure data from production
• Analyze the test scope and identify the data that is to be fetched from production
• Batch job programs could be created (with assistance from the Development team) to fetch the data from production into the Test Layer 1 instance
• Handling production data is difficult if it is of huge volume; hence, based on the test scope and test requirements, a proper subset should be chosen
o Masking the data for security
• Customer-sensitive information such as customer name, account number, address, phone number, identification number (country specific), etc. should be masked in order to protect such confidential data

o Analyze data for test coverage
• Analyze the data fetched from production to verify it is sufficient to cover all the business scenarios of the test requirements
• Usually it is found that the data does not provide coverage for all business scenarios, and performing tests with such data potentially lets defects move into production
• In order to manipulate the data as per the business scenarios under test, copy the Test Layer 1 instance data to the Test Layer 2 instance
• Data could be modified directly in Layer 2, or, if the volume of data is small, Excel files could be used for modification
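One way to implement the masking step is deterministic pseudonymization with a keyed hash, so the same real value always masks to the same token and joins across tables still reconcile. A minimal sketch; the field names and the per-environment secret are illustrative:

```python
# Mask customer-sensitive fields with a keyed hash (HMAC-SHA256).
# Deterministic: the same input always yields the same token, which
# preserves referential integrity across masked tables.

import hashlib
import hmac

MASK_SECRET = b"test-env-secret"  # illustrative per-environment secret
SENSITIVE_FIELDS = ["customer_name", "account_number", "phone_number", "national_id"]

def mask_value(value: str) -> str:
    """Deterministically pseudonymize one value with a keyed hash."""
    digest = hmac.new(MASK_SECRET, value.encode(), hashlib.sha256).hexdigest()
    return digest[:12]  # short, stable token

def mask_record(record: dict) -> dict:
    """Return a copy of the record with sensitive fields replaced."""
    masked = dict(record)
    for field in SENSITIVE_FIELDS:
        if field in masked:
            masked[field] = mask_value(str(masked[field]))
    return masked
```

Non-sensitive fields (amounts, dates, flags) pass through untouched, so the masked data remains usable for the count-and-sum reconciliation checks described later.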
o Manipulating data as per test requirements
• Study the business scenarios
• Identify the data that is the nearest match to test the business scenarios
• Update or create test data as needed
• The test data covering the business scenarios per the test requirements is now available in Layer 2

o Test data regression set
• Identify regression test scenarios/test cases
• Store the data created for these in Layer 3 for future use
• Create a mapping between test scenarios/business scenarios and test data
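The scenario-to-data mapping for the Layer 3 regression set can be as simple as a keyed registry. A minimal sketch; the scenario IDs and file paths are purely illustrative:

```python
# A minimal sketch of the Layer 3 regression-set mapping: each
# regression scenario is keyed to the stored test-data sets that
# reproduce it. Scenario IDs and file names are illustrative.

REGRESSION_SET = {
    "SC-001-duplicate-drop": ["layer3/duplicates.csv"],
    "SC-002-currency-conversion": ["layer3/fx_amounts.csv"],
    "SC-003-null-amount": ["layer3/null_amounts.csv"],
}

def data_for_scenarios(scenario_ids):
    """Resolve the stored data sets needed to rerun a list of scenarios."""
    missing = [s for s in scenario_ids if s not in REGRESSION_SET]
    if missing:
        raise KeyError(f"no regression data mapped for: {missing}")
    files = []
    for s in scenario_ids:
        for f in REGRESSION_SET[s]:
            if f not in files:  # de-duplicate while preserving order
                files.append(f)
    return files
```

Failing fast on an unmapped scenario makes gaps in the regression set visible at planning time instead of mid-cycle.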
o Move test data to the testing environment
• Once the Layer 2 data is ready, it can be copied to the desired test environment instance (Layer 4)

Other salient activities
o A timely clean-up mechanism should be planned
• Layer 1 and Layer 2 instances serve as temporary instances for creating the required test data
• A periodical cleanup plan, as applicable to the testing cycles, should be in place
o New project initiative
• If testing is to be performed for a completely new project, and hence no production data is available, one should understand the business scenarios and create data that meets the coverage

o Multiple project requirements
• A mechanism for provisioning simultaneous projects requiring different sets of data should be planned
System Integration test approach

An integration test can be made more effective with the following approach.

Reference
o Get the count of records sourced from the input source
o Get the sum of critical amount fields
o These will form the reference to check the loading process into the staging area
o The data at this stage is selected from production

Test 1
o Compare the count of records and the sum of critical amount fields against the reference values
o Take into account records that are designed to be dropped (e.g. duplicate records, null-value records, etc.)
o This validates the loading process into the staging area
o Manipulate the data to cover all business scenarios
o This data forms the input to perform the functional test on the system
o Get the count of records and the sum of critical amount fields
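The count-and-sum reconciliation in the Reference and Test 1 steps can be sketched as follows. The record layout and the name of the critical amount field are illustrative assumptions:

```python
# Count/sum reconciliation between a source extract and the staging
# area, allowing for records the design intentionally drops
# (duplicates, null amounts). Field names are illustrative.

def reference_totals(records, amount_field="amount"):
    """Reference: record count and sum of the critical amount field."""
    count = len(records)
    total = sum(r[amount_field] for r in records if r[amount_field] is not None)
    return count, total

def reconcile(source_records, staged_records, expected_dropped, amount_field="amount"):
    """Test 1: staging must equal source minus the designed drops."""
    src_count, src_sum = reference_totals(source_records, amount_field)
    stg_count, stg_sum = reference_totals(staged_records, amount_field)
    drop_count, drop_sum = reference_totals(expected_dropped, amount_field)
    return (stg_count == src_count - drop_count
            and abs(stg_sum - (src_sum - drop_sum)) < 1e-6)
```

The same pair of functions can be reused at each subsequent stage (Test 2, Test 3), comparing each stage's totals against the previous stage rather than the original source.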
Test 2
o Compare the count of records and the sum of critical amount fields against the values from the staging area
o Take into account records that are designed to be dropped or modified as per the business design
o Perform thorough business functionality testing; some of the tests here would include:
• Straight move
• Valid values / lookup values validation
• Data derivation
Test 3
o Compare the count of records and the sum of critical amount fields against the values from the system intermediate area. In most systems, these should match
o Perform tests for critical business functionalities
o This is the final system area, which is available to business users and downstream applications

Benefits of the approach
o Identification of defects earlier in the system process
o Identification of the precise defect origination stage within the system
o Uncovering defects not just related to data, but also related to processes within the system
o Early defect fixes reduce the cost to fix the defect, improve system quality and accelerate the release cycle
o Gains the confidence of the end user that the system is delivered with the expected quality and functionality
Case Study (application of the approach)
Consider a credit risk reporting system of a financial institution that gets data from multiple sources; this data is transformed and used further for regulatory reporting. Numerous processes are used to transform this incoming data. We explain here the difficulty we faced when we tried to test a process whose purpose was to set a flag called the Asset flag. The Asset flag indicated what kind of asset we are dealing with: based on a combination of parameters, assets (in this case loans) were classified into a "Commercial" or "Retail" category. The Asset flag was very important, as further Basel calculations were done based on this flag's value, and these calculations are critical for regulatory reporting. These kinds of things make or break a bank in today's world. We have represented a subset of scenarios and test data requirements using the grid below.
• Column 1 gives the scenario number
• Columns 2 to 9 are the input parameters
• Column 10 is the output
Here we are showing only a sample of the actual grid; in practice, this grid had a few thousand rows. Due to the sheer volume of data (millions of records), each row cannot be validated individually.

In order to test, one approach was to execute these processes against month-end (production) data and validate the output using exception queries. The month-end data may or may not satisfy all the scenarios given in the grid above. So even when test scenarios were not satisfied by the data, the test was passed as long as no exception was returned by the exception query. Hence many of the logical branches were not tested, yet passed. When you consider a few hundred sources of data, the problem multiplies by that factor, and you end up with a huge amount of code that has not been tested but is still QA certified.

To overcome this, we had to resort to test data management. We created a golden copy of data: a subset of production data that satisfied the maximum number of our test scenarios. Based on testing requirements, this data was cloned and sanitized further. So whenever a new source of data was introduced, the data was sanitized and manipulated to satisfy all the test scenarios and was made available for testing.
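A grid like the one in the case study can be turned directly into data-driven tests: each row supplies the input parameters and the expected Asset flag. The sketch below shows the shape of such a test; the parameter names, the classification rule, and the grid rows are all hypothetical stand-ins for the real Basel logic:

```python
# Data-driven testing from a decision grid: each row is one scenario,
# the leading columns are inputs and the last column is the expected
# output. The classification rule here is a hypothetical stand-in.

def derive_asset_flag(counterparty_type, exposure_amount):
    """Hypothetical stand-in for the Asset flag derivation under test."""
    if counterparty_type == "individual" and exposure_amount < 1_000_000:
        return "Retail"
    return "Commercial"

# scenario grid: (scenario_no, counterparty_type, exposure_amount, expected_flag)
GRID = [
    (1, "individual", 250_000, "Retail"),
    (2, "individual", 2_500_000, "Commercial"),
    (3, "corporate", 250_000, "Commercial"),
]

def run_grid(grid):
    """Execute every scenario row; return the scenario numbers that fail."""
    failures = []
    for scenario_no, ctype, amount, expected in grid:
        if derive_asset_flag(ctype, amount) != expected:
            failures.append(scenario_no)
    return failures
```

Because every grid row is executed explicitly, a branch with no matching test data shows up as a missing row rather than silently passing, which is exactly the failure mode the exception-query approach suffered from.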
Summarization

To summarize, in today's time, where data warehouse testing is an inseparable part of the banking industry, there is an urgent need to incorporate test data management into the QA process so that no part of the code remains untested.

o Incorporating test data management will result in:
• Data for all test scenarios
• Increased data quality
• Better testing
• Better test coverage
Closure
Did we meet the objectives?
References
Infosys project experience
Infosys resources (www.infosys.com)