BI/DWH Test specifics - Davinci software blog
BI/DWH Test specifics
26/05/2016
me =>
• more than 16 years in IT
• more than 10 years of experience in testing in various roles: Test
Manager, Test Coordinator, Test Analyst, Quality Assurance Engineer
• ISTQB certifications: foundation+agile+advanced TM
• experienced in test and analysis trainings, user consultancy,
application support, quality assurance, process improvement,
analysis, business analysis and test analysis
• co-founder of ProTest public test community (e.g. www.pro-test.info)
Test motto: Inadequate test scope definition? No problem. The only problem could be a bad test strategy.
SELECT TestSpecifics FROM dwh_test_approach
What is DWH?
- data, layers, transformation steps
- huge volume, long processing
- DWH is not a private world
Test approach
- data is the highest priority
- no data, but data error is our fast food
- SQL as the team communication language
- development patterns
- strange, but not only data are tested
DWH test specifics
- local point of view
- group point of view
- how to test business use cases
Test Environment and Test Data Management
- not so easy
- not so clear
- but it works if you know how to do it
http://czechtest.com/select-testspecifics-dwhtestapproach
Quote1
“The price of light is less than the cost of darkness.”
~ Arthur C. Nielsen ~
What is DWH?
Data warehouse (DWH): … system used for reporting and data
analysis, and is considered as a core component of Business
Intelligence … central repositories of integrated data from one
or more disparate sources … store current and historical data …
used for creating analytical reports for knowledge workers … data
… uploaded from the operational systems … data may pass
through an operational data store for additional operations before
it is used in the DWH …
source: Wikipedia®
DWH - data, layers, transformation steps
T1 (Source – Landing):
- data transmission
T2 (Landing – Base): end-of-day processing
- full snapshot in Base
- cross-system unification
T3 (Base – Core):
- transformation with business logic
- verification that all sources have been loaded
- surrogation (primary + foreign key)
- historization
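The T3 step "verification that all sources have been loaded" can be checked mechanically before the Base to Core transformation starts. The following is only a minimal sketch: the `load_log` table, its columns, and the list of expected sources are hypothetical and not taken from the slides, and Python's built-in `sqlite3` is used just to keep the example self-contained.

```python
import sqlite3

# Hypothetical list of sources that must arrive before Base -> Core (T3) starts.
EXPECTED_SOURCES = ["deals", "customers", "accounts"]

def missing_sources(conn, cob_dt, expected=EXPECTED_SOURCES):
    """Return expected sources without a successful load for the given close-of-business date."""
    rows = conn.execute(
        "SELECT source_name FROM load_log WHERE cob_dt = ? AND status = 'OK'",
        (cob_dt,),
    ).fetchall()
    loaded = {row[0] for row in rows}
    return sorted(set(expected) - loaded)

# Hypothetical load log: one row per source delivery and business date.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE load_log (source_name TEXT, cob_dt TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO load_log VALUES (?, ?, ?)",
    [
        ("deals", "2015-05-29", "OK"),
        ("customers", "2015-05-29", "FAILED"),
        # "accounts" has not been delivered at all
    ],
)

missing = missing_sources(conn, "2015-05-29")
print(missing)  # ['accounts', 'customers']
```

An empty result would mean the T3 precondition is met for that date; any listed source blocks the transformation.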
DWH – data layers/zones
Zone: Landing
- Description: Source delivery layer
- Supported processes: Data delivery
- Update frequency: Daily
- History management 1): 1 day
Zone: Base
- Description: Source data vault
- Supported processes: Data historization, preparation, data unification
- Update frequency: Daily
- History management 1): Meta Data Driven
Zone: Core
- Description: Single version of truth for analytical and reporting purposes; data processing stage for all types of data and information transformations based on all types of mappings
- Supported processes: Data harmonization; reporting and analytics
- Update frequency: Daily
- History management 1): Meta Data Driven, minimum 7 years
DWH - huge volume, long processing
1. DWH contains a huge amount of data:
- various business entities/areas (deal – customer – account – provision – collateral)
- complex data relations (bridge tables)
- stored for a long time (many years) with data historization (valid from-to, time-stamps)
2. data processing ensures:
- data transformation from sources to targets usable for subsequent business analysis
- transformation goes from one layer/zone to another
- transformation is specified by business and technical mapping
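Historization with valid from-to intervals invites one typical technical check: no two versions of the same record may be valid at the same time. A minimal sketch, assuming a hypothetical `core_cust` table with half-open intervals `[valid_from, valid_to)` stored as ISO date strings; the table, columns, and sample rows are illustrative only.

```python
import sqlite3

# Self-join: two different versions of the same customer whose validity intervals intersect.
OVERLAP_SQL = """
SELECT a.cust_id, a.valid_from, a.valid_to, b.valid_from, b.valid_to
FROM core_cust a
JOIN core_cust b
  ON a.cust_id = b.cust_id
 AND a.rowid < b.rowid            -- report each pair only once
 AND a.valid_from < b.valid_to    -- half-open intervals [valid_from, valid_to)
 AND b.valid_from < a.valid_to
ORDER BY a.cust_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_cust (cust_id INTEGER, valid_from TEXT, valid_to TEXT)")
conn.executemany(
    "INSERT INTO core_cust VALUES (?, ?, ?)",
    [
        (1, "2014-01-01", "2015-01-01"),
        (1, "2015-01-01", "9999-12-31"),  # adjacent to the previous version, no overlap
        (2, "2014-01-01", "2015-06-01"),
        (2, "2015-03-01", "9999-12-31"),  # starts before the previous version ends
    ],
)

overlaps = conn.execute(OVERLAP_SQL).fetchall()
for row in overlaps:
    print(row)
```

Only customer 2 is reported here. Whether intervals are half-open or closed is a project convention, so the two comparison operators may need adjusting.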
DWH is not a private world
Quote Test approach
“If you can’t explain it simply, you don’t understand it well enough.”
~ Albert Einstein ~
(Tyres) Test approach
[Diagram: Release scope, Data features, Data flows, Test Types, Test Schedule, Data sources, Data changes, Data processing, Data availability]
goal: improved data quality and availability
Data = highest priority
1. all expected data have to be loaded
2. no duplicates are loaded
3. data are aggregated appropriately
4. results are based on business rules
5. data are transformed within data flows:
- accurately
- completely
- correctly
- on time
- integrated
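The first two points of the list above (everything expected is loaded, nothing is loaded twice) translate directly into two SQL checks. This is a sketch under assumptions: the `landing_cust` and `base_cust` tables and the business key `(cust_no, cob_dt)` are hypothetical, and `sqlite3` stands in for the real DWH database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE landing_cust (cust_no TEXT, cob_dt TEXT);
CREATE TABLE base_cust    (cust_no TEXT, cob_dt TEXT);
INSERT INTO landing_cust VALUES ('C1', '2015-05-29'), ('C2', '2015-05-29'), ('C3', '2015-05-29');
INSERT INTO base_cust    VALUES ('C1', '2015-05-29'), ('C2', '2015-05-29'), ('C2', '2015-05-29');
""")

# Check 1 (completeness): records delivered to Landing that never reached Base.
missing_in_base = conn.execute("""
    SELECT cust_no, cob_dt FROM landing_cust
    EXCEPT
    SELECT cust_no, cob_dt FROM base_cust
""").fetchall()

# Check 2 (no duplicates): business keys loaded more than once into Base.
duplicates = conn.execute("""
    SELECT cust_no, cob_dt, COUNT(*) AS cnt
    FROM base_cust
    GROUP BY cust_no, cob_dt
    HAVING COUNT(*) > 1
""").fetchall()

print(missing_in_base)  # [('C3', '2015-05-29')]
print(duplicates)       # [('C2', '2015-05-29', 2)]
```

Note that a plain row-count comparison would pass on this data (3 rows in each table), which is why the two checks are kept separate.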
SQL as the team communication language
SELECT COUNT(*) as RECORD_CNT_BASE_CUST FROM L_BASE.LC_CUST
WHERE s_num <> 0
AND cob_dt = TO_DATE('29.05.2015', 'DD.MM.YYYY')
Note: business testers/users in specific roles use SQL too (data stewards,
data specialists, data analysts, …)
Test object + mapping specification + test code
Test Object:
Target Table: RT_ACC
Target Attribute: ACC123
Mapping specification:
For all (take IFRS):
Everything is based on the Group chart of accounts
(DOM_GCOA)
1. Digit 1 (the first position) must be 1 (1 = only assets)
2. Check position 5 + 6:
a) 31, 32 or 33 is HFT (Held for Trading) Fair value
b) 41, 71, 72, 73 is FV (Fair value)
c) 51 is AFS (Available for Sale) Fair value
d) 52 is AFS at-cost
e) 11 is LAR (Loans and receivables) at cost
f) 61 is HTM (Held to Maturity) at cost
Code:
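CASE
  WHEN substr(dacIFRS.DOM_GCOA, 1, 1) = '1' THEN  -- digit 1 = '1': assets only
    CASE
      WHEN substr(dacIFRS.DOM_GCOA, 5, 2) IN ('31', '32', '33', '41', '71', '72', '73', '51') THEN 0  -- fair value (HFT, FV, AFS)
      WHEN substr(dacIFRS.DOM_GCOA, 5, 2) IN ('52', '11', '61') THEN 1  -- at cost (AFS at-cost, LAR, HTM)
      ELSE 2  -- positions 5 + 6 not covered by the mapping specification
    END
  WHEN anc.ACC000 = 11 THEN 1
  ELSE 0
END ACC123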
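One way to test such a mapping is to re-implement it independently from the specification and compare the outcome with what the implementation produced. The function below is a sketch that mirrors the CASE expression above in Python; the function name and the sample account numbers are made up for illustration.

```python
# Codes at positions 5 + 6 of DOM_GCOA, taken from the mapping specification above.
FAIR_VALUE_CODES = {"31", "32", "33", "41", "71", "72", "73", "51"}  # HFT, FV, AFS fair value
AT_COST_CODES = {"52", "11", "61"}                                   # AFS at-cost, LAR, HTM

def expected_acc123(dom_gcoa, acc000):
    """Independent re-implementation of the ACC123 mapping (hypothetical helper)."""
    if dom_gcoa is not None and dom_gcoa[:1] == "1":
        code = dom_gcoa[4:6]  # positions 5 and 6 (1-based), like substr(x, 5, 2)
        if code in FAIR_VALUE_CODES:
            return 0
        if code in AT_COST_CODES:
            return 1
        return 2
    # A NULL DOM_GCOA falls through to this branch, as it does in the SQL CASE.
    if acc000 == 11:
        return 1
    return 0

# Made-up account numbers, only to exercise each branch.
print(expected_acc123("1000310", None))  # 0
print(expected_acc123("1000520", None))  # 1
print(expected_acc123("1000990", None))  # 2
print(expected_acc123("2000310", 11))    # 1
```

Feeding real source rows through such a function gives expected values that can be compared row by row with the target attribute.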
Test Steps + Test Results
Test Steps:
- Static testing: code review of the implementation (PL/SQL package)
- Plausibility checks on resulting records after data processing
SELECT ACC123, COUNT(*) AS NUM_RECORDS,
       CASE WHEN ACC123 IN (0, 1, 2) THEN 'OK' ELSE 'NOK' END AS BASIC_MAP_CHECK_RESULT
FROM LDDADMIN.AUR_ACC_AC
GROUP BY ACC123
ORDER BY ACC123
Results in the database:
31 records with ACC123 IS NULL were subject to further investigation and discussion …
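The plausibility query flags NULL values as well, because `NULL IN (0, 1, 2)` is not true and the CASE falls through to 'NOK'. The sketch below replays the same query on a tiny, made-up data set using `sqlite3`; the table content is hypothetical and the schema prefix is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aur_acc_ac (acc123 INTEGER)")
# Made-up sample: valid codes 0, 1, 2, one NULL and one unexpected value 7.
conn.executemany(
    "INSERT INTO aur_acc_ac VALUES (?)",
    [(0,), (0,), (1,), (2,), (None,), (7,)],
)

result = conn.execute("""
    SELECT acc123, COUNT(*) AS num_records,
           CASE WHEN acc123 IN (0, 1, 2) THEN 'OK' ELSE 'NOK' END AS basic_map_check_result
    FROM aur_acc_ac
    GROUP BY acc123
    ORDER BY acc123
""").fetchall()

# Keep only the groups that need investigation.
nok_groups = [row for row in result if row[2] == "NOK"]
for row in result:
    print(row)
```

SQLite sorts NULL first in ascending order, while Oracle sorts it last by default, so the NULL group appears at the opposite end of the result on an Oracle database.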
Test Steps + Test Results
[Diagram labels: development, Implementation, SELECT, SQL script, SQL result]
Question: What is the goal of a mapping test?
What does such a test actually verify?
1. transformed data (SQL result)
2. mapping specification
3. mapping implementation
[Diagram labels: analysis (Requirement specification, Mapping specification); test (SELECT, SQL script, SQL result); development (Implementation, SQL result); the two SQL results are compared]
mapping: TargetField = SourceField + 1
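For the example mapping `TargetField = SourceField + 1`, the comparison of the two SQL results can be sketched with a pair of EXCEPT queries, one in each direction. The `src` and `tgt` tables, their columns, and the sample rows are assumptions made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (id INTEGER PRIMARY KEY, source_field INTEGER);
CREATE TABLE tgt (id INTEGER PRIMARY KEY, target_field INTEGER);
INSERT INTO src VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO tgt VALUES (1, 11), (2, 21), (3, 30);  -- id 3 is deliberately wrong
""")

# Rows the test SQL expects but the implementation did not produce.
expected_not_in_target = conn.execute("""
    SELECT id, source_field + 1 FROM src
    EXCEPT
    SELECT id, target_field FROM tgt
""").fetchall()

# Rows the implementation produced but the test SQL does not expect.
target_not_expected = conn.execute("""
    SELECT id, target_field FROM tgt
    EXCEPT
    SELECT id, source_field + 1 FROM src
""").fetchall()

print(expected_not_in_target)  # [(3, 31)]
print(target_not_expected)     # [(3, 30)]
```

A difference only shows that the two results disagree. It does not say whether the test SQL, the implementation, or the mapping specification is wrong, which is the point of the question on the slide.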
Test development patterns (reusable SQL patterns)
- repeated checks/SQL statements make it possible to define SQL patterns
- SQL patterns = test development unification
- could be used as basis for test automation
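A reusable SQL pattern can be as simple as a function that fills table and column names into a fixed statement. The sketch below shows two such hypothetical patterns; the helper names are invented here, and using `s_num` and `cob_dt` as a key of `L_BASE.LC_CUST` is only an assumption for the example.

```python
import re

# Accept only plain (optionally schema-qualified) identifiers, since names are
# pasted into the SQL text and cannot be passed as bind parameters.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")

def _ident(name):
    if not _IDENT.fullmatch(name):
        raise ValueError(f"not a plain identifier: {name!r}")
    return name

def duplicate_check(table, key_columns):
    """Pattern: key combinations that occur more than once."""
    keys = ", ".join(_ident(col) for col in key_columns)
    return (f"SELECT {keys}, COUNT(*) AS cnt FROM {_ident(table)} "
            f"GROUP BY {keys} HAVING COUNT(*) > 1")

def not_null_check(table, column):
    """Pattern: number of rows where a mandatory column is NULL."""
    return f"SELECT COUNT(*) AS null_cnt FROM {_ident(table)} WHERE {_ident(column)} IS NULL"

dup_sql = duplicate_check("L_BASE.LC_CUST", ["s_num", "cob_dt"])
null_sql = not_null_check("L_BASE.LC_CUST", "cob_dt")
print(dup_sql)
print(null_sql)
```

Generated statements like these could be driven from the mapping metadata, which is one possible starting point for the test automation mentioned above.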
Strange, but not only data are tested
- primary focus => test data
- BI/DWH applications have GUI too => so we need standard
tests (functionality, usability, performance, ...)
- DWH could be surrounded by other applications => data
corrections, manual data uploads, specific calculations and
recalculations
Quote BI/DWH test specifics
“Not everything that can be counted counts, and not everything
that counts can be counted.”
~ Albert Einstein ~
BI/DWH Test approach
1. standard test activities have to be done:
- test planning
- test scope definition
- traceability setup
- roles & responsibilities
- test scheduling
but
BI/DWH test specifics have to be considered
BI/DWH test specifics
Planning & Organization:
- many teams, complex environment
- dependency on data processing planning
Test environments, test data:
- many data sources and localities
- long data processing
- data features & specific test types
Activities:
- essential technical (SQL) data check
- business data confirmation during and after data processing
- data quality as the ongoing activity
Test plan – live structure example
BI/DWH Test approach
[Diagram: Release scope, Data features, Data flows, Test Types, Test Schedule, Data sources, Data changes, Data processing, Data availability]
goal: improved data quality and availability
Data features
- Data Quality: components should check data quality issues
- Logging and Audit: components have to provide log and audit trail info
- Load Type: components should be able to handle full loads and/or delta
- Loading Dependency: loading dependency between source deliveries, e.g. corrections cannot be loaded if the original source file is missing
- Transformations: components with business transformation (framework) should be provided
- Generation: components should be based on a common metadata set
- Metadata: all components access and utilize the same common metadata set
- Restartability: components are fully restartable
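The loading dependency feature lends itself to a small test oracle: given a set of deliveries, decide which ones may be loaded. The sketch below assumes a hypothetical representation of a delivery as a `(source, cob_dt, kind)` tuple with `kind` being `"original"` or `"correction"`; none of these names come from the slides.

```python
def split_loadable(deliveries):
    """Split deliveries into loadable ones and corrections blocked by a missing original."""
    # Every (source, date) pair for which the original file has arrived.
    originals = {(source, cob_dt) for source, cob_dt, kind in deliveries if kind == "original"}
    loadable, blocked = [], []
    for source, cob_dt, kind in deliveries:
        if kind == "correction" and (source, cob_dt) not in originals:
            blocked.append((source, cob_dt, kind))
        else:
            loadable.append((source, cob_dt, kind))
    return loadable, blocked

# Made-up deliveries: the correction for "accounts" has no original file.
deliveries = [
    ("deals", "2015-05-29", "original"),
    ("deals", "2015-05-29", "correction"),
    ("accounts", "2015-05-29", "correction"),
]
loadable, blocked = split_loadable(deliveries)
print(blocked)  # [('accounts', '2015-05-29', 'correction')]
```

A test could then compare the blocked list with what the loading component actually rejected and logged.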
Test types I.
nonBI world:
- functionality
- usability
- reliability
- performance
- supportability
BI world:
- Accuracy
- Timeliness
- Consistency
- Integrity
- Conformity
- Completeness
- Correctness
Test types II.
- data: 1.a. Accuracy
- timing: 1.b. Timeliness
- source data: 1.c. Consistency
- relations: 1.d. Integrity
- format: 1.e. Conformity
- data: 1.f. Completeness
- data: 1.g. Correctness
other/derived technical test types:
• Plausibility Test
• Interface Operability Test
• Interface Code Review
• Checks on Landing-Zone Interface
• Regression Tests for Fundamental Source Changes
• Data Type Changes
• Data Quality Log Checks
• ...
DWH test specifics - local vs group point of view
Local specifics:
- local data sources only
- interactivity between local
teams is more flexible
- data processing is shorter
- data reprocessing is usually
possible without limitations
Group specifics:
- more data sources (multi-entity)
- interactivity between local
teams is less flexible
- data processing is longer (from
a few days to weeks)
- data reprocessing is usually
possible with limitations (e.g.
different interface versions, …)
How to test business use cases
Use case testing could be based on:
1. final outputs (e.g. reports, DM smart cubes):
advantage: full test of business requirements
disadvantage: testable only at the end of data processing (short time for test
execution, data reprocessing is hugely time-consuming, …)
2. ongoing outputs from data processing (e.g. core layer tables):
advantage: tests can run before data processing finishes; defects are easier to
analyse at data level
disadvantages: business testers need technical knowledge (SQL, DB
structure, …); tested outputs are sometimes incomplete and not presented in the
same structure/form as the final outputs
QuoteTestEnvironment
“If you can’t describe what you are doing as a process, you don’t
know what you are doing.”
~ Dr. W. Edwards Deming ~
Test Data Management
1. based on:
- data request specification (more projects)
2. not all data are processable (volume, time dependency, ...)
3. test data types:
- new version - data structure or content is changed
- prod_copy – if fresh data are needed
4. data testability depends on data processing status
Test Environment Management
1. based on:
- data request specification
- list of jobs
2. data processing is part of test environment preparation
3. test environment is shared between multiple projects
4. test manager has to monitor current testability
Data processing – general example
[Table: data processing steps Step1 to Step12 (rows) against source_1, source_2, source_3, source_4, source_5, source_6, …, source_x (columns); source_3 is highlighted for day 1, day 2 and day 3]
Data source_3 testability:
- 1. day: testable for step 1 and step 2
- 2. day: testable up to step 6
- 3. day: testable up to step 11
Data processing – real anonymized example
pictures
General Recap
DWH: data, layers, transformation steps,
huge volume, long processing
SQL is essential for technical checks
Business testers use SQL too
Test approach = data vs data processing vs
data features + test types
BI/DWH test specifics
Planning & Organization:
- many teams, complex environment
- dependency on data processing planning
Test environments, test data:
- many data sources and localities
- long data processing
- data features & specific test types
Activities:
- essential technical (SQL) data check
- business data confirmation during and after data
processing
- data quality as the ongoing activity
Q & A
Many thanks for your attention.
? car test or wall test ?
real gap :o)
nice spring leafs or bug ?
Vocabulary
Data warehouse (DWH): … system used for reporting and data analysis, and is considered as a core component of Business Intelligence … central repositories of integrated data from one or more disparate sources … store current and historical data … used for creating analytical reports for knowledge workers … data … uploaded from the operational systems … data may pass through an operational data store for additional operations before it is used in the DWH …
Business intelligence (BI): … "a set of techniques and tools for the acquisition and transformation of raw data into meaningful and useful information for business analysis purposes" … BI technologies are capable of handling large amounts of unstructured data to help identify, develop and otherwise create new strategic business opportunities … allow easy interpretation of these large volumes of data … provide historical, current and predictive views of business operations … functions of BI technologies are reporting, online analytical processing, analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics and prescriptive analytics.
source: Wikipedia®
Vocabulary
Extract, Transform and Load (ETL): … process in database usage and
especially in data warehousing that:
• Extracts data from homogeneous or heterogeneous data sources
• Transforms the data for storing it in the proper format or structure for
the purposes of querying and analysis
• Loads it into the final target (database, more specifically, operational
data store, data mart, or data warehouse)
Data mart: … the access layer of the data warehouse ... is a subset of
the data warehouse that is usually oriented to a specific business line or
team… where conformed dimensions are used … The reasons why …
because the information in the database is not organized … Also,
complicated queries … While transactional databases are designed to
be updated, data warehouses or marts are read only.
source: Wikipedia®