Essentials of Test Data Management and Data Privacy · 2019-04-17 · Load utility for large...

IBM® © 2009 IBM Corporation Essentials of Test Data Management and Data Privacy Dean Compher, Sr IT Specialist Shaji Chandrashekhar, Sr IT Specialist



Disclaimers

IBM customers are responsible for ensuring their own compliance with legal requirements. It is the customer's sole responsibility to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer's business and any actions the customer may need to take to comply with such laws.

IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer is in compliance with any law.

The information contained in this documentation is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information provided, it is provided “as is” without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this documentation or any other documentation. Nothing contained in this documentation is intended to, nor shall have the effect of, creating any warranties or representations from IBM (or its suppliers or licensors), or altering the terms and conditions of the applicable license agreement governing the use of IBM software.


Agenda

TDM – What is it? Why is it important?

Current approaches

Key requirements for an effective approach for TDM

Technical Deep Dive

Questions


Enterprise Application/Database Snapshot in Time

[Figure: Development and QA Environments (Development/Unit Testing, QA/Regression Testing, User Acceptance Testing) running versions V3, V2, and V2, alongside Production at Version 1.]


Multiple “Consumers” For Test Environments

Consumers: Developers; Testers on project teams; Testers in central group; Customers; IT operations

Unit Sole

Integration Sole

Component Primary Secondary

System Secondary Primary Secondary

System integration Secondary Primary

UAT Primary

Implementation Secondary Primary

Internal and External “consumers” such as off-shore teams and partners


Multiple Requirements for Test Environments

Functionality – Features and capabilities

Performance – Speed, availability, tolerance for load

Usability – Ease with which the software can be employed

Security – Vulnerability to unauthorized usage

Compliance – Conformance to internal standards or external regulations


Key Business Goals

Reduce Business Downtime

Get to Market Faster

Maximize Process Efficiencies

Improve Quality


Some Key Considerations

Infrastructure Costs – higher HW storage costs

Development Labor – higher costs

Defects – can be expensive
• Cost to resolve defects in the production environment can be 10 to 100 times greater than for those caught in the development environment

Data Privacy/Compliance
• Data breaches can put you out of business


Test Data Management

Strategy and approach to creating and managing test environments to meet the needs of various stakeholders and business requirements.


Enterprise Data Management


Current Approaches

#1 - Clone Production
• Request for copy, wait, then copy the production database
• Share test database with everyone else
• After changes, manual examination: Right data? What changed? Correct results? Unintended result? Did someone else modify it?

#2 - Write SQL
• Extract with hand-written SQL: complex, subject to change
• RI accuracy? Right data?
• Expensive, dedicated staff, ongoing responsibility


Cloning And Data Multiplier Effect


1. Production 500 GB

2. Training 500 GB

3. Unit 500 GB

4. System 500 GB

5. UAT 500 GB

6. Integration 500 GB

Total 3,000 GB
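The multiplier effect above is plain arithmetic, sketched below. The 500 GB figures come from the slide; the 10 percent subset size is an assumption made only for illustration, not a number from the presentation.

```python
# Storage consumed when every environment is a full clone of production.
PRODUCTION_GB = 500
CLONED_ENVIRONMENTS = ["Training", "Unit", "System", "UAT", "Integration"]


def total_storage_gb(production_gb, environments, subset_percent=100):
    """Production plus one copy per environment, each sized at subset_percent."""
    copy_gb = production_gb * subset_percent // 100
    return production_gb + len(environments) * copy_gb


full_clones = total_storage_gb(PRODUCTION_GB, CLONED_ENVIRONMENTS)

# Hypothetical: each non-production copy holds a 10% subset instead of a clone.
with_subsets = total_storage_gb(PRODUCTION_GB, CLONED_ENVIRONMENTS, subset_percent=10)

print(full_clones, with_subsets)
```

Under that assumed ratio, the footprint is dominated by production itself rather than by its copies.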


Some Key Issues With Current Approaches

Cloning can create duplicate copies of large databases
– Large storage requirements and associated expenses
– Time consuming to create
– Difficult to manage on an on-going basis

Data privacy not addressed
– Production data without production level security
– Data breach risk

Internally developed approaches not cost effective
– Lengthy development cycles
– Dedicated staff
– On-going maintenance


In Oracle’s Own Words...

Tip #27—Test with a Representative Subset of Production Data

“When performing the development upgrade, it is important to leverage a representative subset of production data instead of an exact copy; this is because the development environment usually has less capacity in both memory and hard drive space than the test and production environments. Limiting the size of the conversion files during the development upgrade will better ensure that the processes will complete in a timely manner.”


Using Subsetting For Effective TDM

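[Figure: PROD is cloned once (PROD CLONED), the clone is reduced (REDUCED CLONE: database resized* and re-indexed) into a GOLD copy, and TEST, TRAINING, and DEV are populated from it by Extract & Load.]

The cloning is performed only once!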


Test Data Management – Building Blocks

Run Test

Refresh Test Data

Go Production!

Extract Complete Business Object(s)

Compare Before & After Image

• Transform / Mask

• Edit Data


Effective Test Data Management Solution

Subsetting capabilities to create realistic and manageable test databases

De-identify (mask) data to protect privacy

Quickly and easily refresh test environments

Edit data to create targeted test cases

Audit/Compare ‘before’ and ‘after’ images of the test data


Key Aspects of an Effective TDM Approach

[Figure: Production Environment (Production Database) → Subset → De-Identify? → Refresh → Analyze → multiple Test Environments, each with its own Test Database.]


Subset: Key Capabilities

Precise subsets to build realistic “right-sized” test databases

• Application Aware
• Flexible criteria for determining record sets
• Business Logic Driven
• Complete Business Object: Referentially intact subsets
• Across heterogeneous environments

[Figure: source applications: Oracle ERP, Legacy, CRM, Custom Order Entry.]


Subset: Complete Business Object

• Referentially-intact subset of data

• Example: All Open –DN Call Back related to Cust_ID 27645 (Karen Smith)

CUSTOMERS (Cust_ID is Primary Key)

19101 Joe Pitt
02134 John Jones
27645 Karen Smith

ORDERS

27645 80-2382 20 June 2006
27645 86-4538 10 October 2006

DETAILS

86-4538 DR1001 System Outage
86-4538 CL2010 Broken Cup Holder


Subset: Complete Business Object

CUSTOMERS

19101 Joe Pitt
02134 John Jones
27645 Karen Smith

ORDERS

27645 80-2382 20 June 2006
27645 86-4538 10 October 2006

DETAILS

86-4538 DR1001 System Outage
86-4538 CL2010 Broken Cup Holder

ITEMS

DR1001 Widget #1 25.00
CL2010 Widget #PG13 30.00
CM3002 Widget #45 28.00

• ITEMS is a “Reference Table”
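A complete business object can be illustrated with a small in-memory database. The sketch below uses Python's built-in `sqlite3` module and the sample rows from the slide; the column names, the extra order for customer 19101, and the `extract_business_object` helper are hypothetical, and this is not how Optim itself is implemented. The sketch pulls only the ITEMS rows that the selected details reference; whether a reference table is copied whole or filtered is a design choice the slide does not spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id TEXT PRIMARY KEY,
                     cust_id TEXT REFERENCES customers(cust_id),
                     order_date TEXT);
CREATE TABLE items (item_id TEXT PRIMARY KEY, description TEXT, price REAL);
CREATE TABLE details (order_id TEXT REFERENCES orders(order_id),
                      item_id TEXT REFERENCES items(item_id),
                      note TEXT);
INSERT INTO customers VALUES
  ('19101', 'Joe Pitt'), ('02134', 'John Jones'), ('27645', 'Karen Smith');
INSERT INTO orders VALUES
  ('80-2382', '27645', '20 June 2006'),
  ('86-4538', '27645', '10 October 2006'),
  ('77-0001', '19101', '1 May 2006');  -- hypothetical row, not on the slide
INSERT INTO items VALUES
  ('DR1001', 'Widget #1', 25.00),
  ('CL2010', 'Widget #PG13', 30.00),
  ('CM3002', 'Widget #45', 28.00);
INSERT INTO details VALUES
  ('86-4538', 'DR1001', 'System Outage'),
  ('86-4538', 'CL2010', 'Broken Cup Holder');
""")


def extract_business_object(conn, cust_id):
    """Follow the relationships from one customer down to orders, details, items."""
    customers = conn.execute(
        "SELECT cust_id, name FROM customers WHERE cust_id = ?", (cust_id,)
    ).fetchall()
    orders = conn.execute(
        "SELECT order_id, cust_id, order_date FROM orders "
        "WHERE cust_id = ? ORDER BY order_id", (cust_id,)
    ).fetchall()
    details = conn.execute(
        "SELECT d.order_id, d.item_id, d.note FROM details d "
        "JOIN orders o ON o.order_id = d.order_id "
        "WHERE o.cust_id = ? ORDER BY d.item_id", (cust_id,)
    ).fetchall()
    items = conn.execute(
        "SELECT DISTINCT i.item_id, i.description, i.price FROM items i "
        "JOIN details d ON d.item_id = i.item_id "
        "JOIN orders o ON o.order_id = d.order_id "
        "WHERE o.cust_id = ? ORDER BY i.item_id", (cust_id,)
    ).fetchall()
    return {"customers": customers, "orders": orders,
            "details": details, "items": items}


subset = extract_business_object(conn, "27645")
for table, rows in subset.items():
    print(table, rows)
```

Every child row in the result has its parent in the result, which is what makes the subset referentially intact.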


Data De-Identification

• De-identify for privacy protection
• Deploy multiple masking algorithms
• Substitute real data with fictionalized yet contextually accurate data
• Provide consistency across environments and iterations
• No value to hackers
• Enable off-shore testing

[Figure: Production applications X, Y, and Z (Oracle, SQL Server, DB2) pass through Subset and De-Identify; only de-identified subsets are moved to the Test copies of the applications, followed by Validate, Compare, Audit.]

Only de-identified data moved to test

Ensure Data Privacy Across Non-Production Environments!


The Latest on Data Privacy

2007 statistics
– $197: cost to companies per compromised record
– $6.3 million: average cost per data breach “incident”
– 40%: share of breaches where the responsibility was with outsourcers, contractors, consultants and business partners
– 217 million: total number of records containing sensitive personal information involved in security breaches in the U.S. since 2005

* Sources: Ponemon Institute, Privacy Rights Clearinghouse, 2007


Did You Hear?

UK gov’t suffered a massive data breach in Nov. 07 – HMRC (Her Majesty's Revenue & Customs), the UK equivalent of the IRS

Lost 2 disks containing personal information on 25 million people (ALMOST ½ of UK population!)

Information has a criminal value of $3.1 Billion

No reported criminal activity to date


Refresh

Load test environment with precise set of data

Subset further as required

Load utility for large volumes of data

Easily refresh environments

[Figure: a subset of the tables AP_INVOICES, INVOICE DIST, and ACCT EVENTS shown in TESTDB and QADB, populated by Insert, Update, or Load.]


Analyzing Test Data

Both Invoices total $100

Composition is different

Could an error have been missed?

Version 1

INVOICES
27645 86-4538 Widget#1 $80.00
27645 86-4538 Widget#PG13 $20.00
Invoice Total $100.00

Version 2

INVOICES
27645 86-4538 Widget#1 $50.00
27645 86-4538 Widget#PG13 $50.00
Invoice Total $100.00
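The invoice example can be made concrete with a short sketch: a check on totals alone passes, while a row-level compare exposes the change in composition. The row layout and the `row_differences` helper are assumptions made for illustration; only the amounts come from the slide.

```python
# Invoice rows as (cust_id, order_id, item, amount), taken from the slide.
version_1 = [
    ("27645", "86-4538", "Widget#1", 80.00),
    ("27645", "86-4538", "Widget#PG13", 20.00),
]
version_2 = [
    ("27645", "86-4538", "Widget#1", 50.00),
    ("27645", "86-4538", "Widget#PG13", 50.00),
]


def invoice_total(rows):
    """Sum of the amount column."""
    return sum(row[3] for row in rows)


def row_differences(before, after):
    """Return (key, before_amount, after_amount) for every row that changed."""
    before_by_key = {row[:3]: row[3] for row in before}
    after_by_key = {row[:3]: row[3] for row in after}
    differences = []
    for key in sorted(before_by_key.keys() | after_by_key.keys()):
        old, new = before_by_key.get(key), after_by_key.get(key)
        if old != new:
            differences.append((key, old, new))
    return differences


totals_match = invoice_total(version_1) == invoice_total(version_2)
changed_rows = row_differences(version_1, version_2)
print(totals_match, changed_rows)
```

A test that asserts only on the total would report success here, which is the kind of missed error the slide asks about.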


Analyze Test Data

[Figure: SOURCE 1 and SOURCE 2 feed the COMPARE PROCESS, which writes a COMPARE FILE.]

Compare the "before" and "after" data from an application test

Compare results after running modified application during regression testing

Identify differences between separate databases

Audit changes to a database

Compare should analyze complete sets of data, finding changes in rows in tables
• Single-table or multi-table compare
• Compare file of results

Edit Data to Create Test Cases


Effective TDM: Example ROI Benefits

Projected ROI = 504% (3 years), Payback Period = 13 months


Summary: An Effective TDM Solution

Ability to extract precise subsets of related data to build realistic, “right-sized” test databases
– Complete business object
– Create referentially intact subsets
– Flexible criteria for determining record sets

De-identify sensitive data in the test environment to ensure compliance with regulatory requirements for data privacy

Easily refresh test environments

Analyze test data.


Success: Data Privacy & Test Data Management

Application:
– Custom Insurance Applications

Challenges:
– Protecting confidential customer information required by GLB by addressing privacy vulnerabilities in the application development and testing environments
– Creating realistic “federated” testing environments by extracting test data across complex DB2, Oracle, Informix, IMS, VSAM databases
– Ensuring valid testing results by retaining the data integrity after sensitive information is de-identified

Solution:
– IBM Optim Test Data Management Solution

Client Value:
– Mitigated risk of data breaches by implementing a consistent strategy for de-identifying sensitive data in development and testing environments
– Improved enterprise-wide testing processes by using subsetting and transformation capabilities across applications, databases and operating systems
– Ensured test validity by using a variety of masking techniques that preserved the data integrity, while propagating the de-identified data throughout the test environment

About the Client: $10 Billion Insurance Company


Why Do Something? TDM Saves Money

Eliminated downtime associated with rebuilding test environments - savings of up to $250,000 per year. Achieved more than $100,000 annual savings collectively for 10 to 15 projects.

Reduced the time needed to create a test environment by up to 90% (from 20 days to just 2 days). Improved time-to-deployment of new application functionality, contributing to critical business/financial initiatives.

Reduced operational cost and improved efficiencies by reducing the size of test database from 1.2TB to 24GB
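The percentages behind these claims can be checked with a one-line formula. The sketch below assumes decimal units (1.2 TB taken as 1,200 GB); the slide does not say which convention was used.

```python
def percent_reduction(before, after):
    """Reduction from before to after, as a percentage of before."""
    return (before - after) * 100 / before


# Test environment creation time: 20 days down to 2 days.
creation_time_reduction = percent_reduction(20, 2)

# Test database size: 1.2 TB (assumed to be 1200 GB) down to 24 GB.
database_size_reduction = percent_reduction(1200, 24)

print(creation_time_reduction, database_size_reduction)
```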


Enterprise Architecture

Single, scalable, interoperable EDM solution provides a central point to deploy policies to extract, store, port, and protect application data records from creation to deletion

Production Environments (Archive) and Non Production Environments (Subset & Mask), each covering: OEM/ISV, Custom, Amdocs, SAP, JDEdwards, PeopleSoft, Oracle, Siebel

Storage: NAS, SAN, ATA, CAS, Optical, Tape

Operating systems: Windows XP/2000, Solaris, HP/UX, Linux, AIX, OS/390, z/OS, i-Series

Data sources: Oracle, SQL Server, Sybase, Informix, DB2 LUW, XML, IMS, VSAM, Adabas, DB2 z/OS, Teradata

Optim™

Data Growth, Data Privacy, Test Data Management, Application Upgrades, Application Retirement


The Relational Extract Facility

Saves:
• Programmer/DBA time
• Disk space utilization
• Testing interference

[Figure: EXTRACT reads CUSTOMERS, ORDERS, and DETAILS from PRODDB into an Extract File. From the Extract File, INSERT/UPDATE or LOAD (via Load Files) populates TESTDB and QADB, and Create builds New_DB with CUST, ORD, and DETL tables.]


Defining the Extract...

Required:
• Start Table
• Set of Tables

Optional:
• Selection Criteria
• Data Sampling
• Data Partitioning
• Relationship Usage

Object types: Tables, Views, Synonyms, Aliases

[Figure: CUSTOMERS, ORDERS, and DETAILS in PRODDB are extracted into an Extract File.]
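The required and optional parts of an extract definition can be pictured as a small data structure. The sketch below is purely illustrative: the class name, field names, and validation rules are assumptions, not the format or API of any IBM product.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractDefinition:
    """Hypothetical model of an extract definition (not a product format)."""
    start_table: str                      # required: where traversal begins
    tables: list                          # required: the set of tables to extract
    selection_criteria: dict = field(default_factory=dict)  # optional, per table
    sample_every_nth: int = 1             # optional data sampling; 1 keeps every row
    follow_relationships: bool = True     # optional relationship usage

    def __post_init__(self):
        if self.start_table not in self.tables:
            raise ValueError("start table must be one of the listed tables")
        if self.sample_every_nth < 1:
            raise ValueError("sample_every_nth must be at least 1")
        unknown = set(self.selection_criteria) - set(self.tables)
        if unknown:
            raise ValueError(f"criteria given for unlisted tables: {sorted(unknown)}")


definition = ExtractDefinition(
    start_table="CUSTOMERS",
    tables=["CUSTOMERS", "ORDERS", "DETAILS"],
    selection_criteria={"CUSTOMERS": "CUST_ID = '27645'"},
)
print(definition)
```

Validating at construction time means a definition that names a start table outside the table set fails before any extract runs.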


Extract Process - Extract Parameters

Extract from source tables using dynamic SQL

Extract data and/or object definitions

[Figure: EXTRACT reads CUSTOMERS, ORDERS, and DETAILS from PRODDB and produces an Extract File and a Process Report.]

Use BROWSE to verify extracted data

Point & Shoot


Application Testing Data

Extract a relationally intact subset from production database(s)

• Extract data and/or object definitions
• From multiple tables (files) that are related
• From multiple tables (files) that are not related
• From single tables (files)
• All data or subset

• Define a new set of test tables
• Populate Target databases
• Refresh Target databases

[Figure: CUSTOMERS, ORDERS, and DETAILS are extracted into an Extract File, which feeds TESTDB and QADB (CUST, ORD, DETL) through INSERT/UPDATE or LOAD (via Load Files), and New_DB through Create.]


De-Identify Test Data

Transform or mask sensitive data:
• During Extract Process, or
• Standalone Convert Process, or
• During Insert/Load Process

Standard mapping rules: Literals, Special Registers, Expressions, Default Values, Look-up tables

Complex mapping rules: User exits written by R2K

[Figure: Production Data → Extract and Convert → Masked Test Data]


Data Privacy in Application Testing

Extract a relationally intact subset from production database(s)

• Extract data and/or object definitions
• Define a new set of test tables
• Apply masking during population process
• Extract file may be reused but contains unmasked data
• Good practice for testing masks

[Figure: CUSTOMERS, ORDERS, and DETAILS are extracted into an Extract File; sensitive data is transformed or masked while TESTDB and QADB (CUST, ORD, DETL) are populated through INSERT/UPDATE or LOAD (via Load Files), and NewDB through Create.]


Data Privacy in Application Testing

Extract a relationally intact subset from production database(s)

Data transformation functions:
• Hard-code literals, special registers such as date, time
• Arithmetic calculations
• Sequential number generation
• Random number generation
• Substring and/or concatenation of values
• Lookup table functions: Random, Specific, or HASH
• Intelligent TRANSformation Library – SSN, CCN, email, …
• Access to client-defined exit routines to apply complex algorithms, encryption, …
• Propagation of masked primary keys to dependent foreign keys

[Figure: CUSTOMERS, ORDERS, and DETAILS are extracted into an Extract File; sensitive data is transformed or masked while TESTDB and QADB (CUST, ORD, DETL) are populated through INSERT/UPDATE or LOAD (via Load Files).]
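Several of the transformation functions listed for this slide can be sketched as small per-column callables. Everything below is a hypothetical illustration in plain Python: the function names, the column names, and the sample row are assumptions, and none of it reflects a product API.

```python
import itertools
import random


def literal(value):
    """Hard-coded literal: every row gets the same value."""
    return lambda row: value


def sequence(start=1):
    """Sequential number generation."""
    counter = itertools.count(start)
    return lambda row: next(counter)


def random_number(low, high, rng):
    """Random number generation within [low, high]."""
    return lambda row: rng.randint(low, high)


def keep_last(column, keep=4, fill="X"):
    """Substring plus concatenation: hide all but the last `keep` characters."""
    def transform(row):
        text = row[column]
        return fill * max(len(text) - keep, 0) + text[-keep:]
    return transform


def apply_column_map(row, column_map):
    """Return a copy of row with mapped columns replaced; others pass through."""
    masked = dict(row)
    for column, transform in column_map.items():
        masked[column] = transform(row)
    return masked


rng = random.Random(0)  # seeded so runs are repeatable
column_map = {
    "cust_id": sequence(1000),
    "email": literal("test@example.com"),
    "phone": keep_last("phone"),
    "credit_limit": random_number(1000, 5000, rng),
}

# Hypothetical source row.
row = {"cust_id": 27645, "name": "Karen Smith",
       "email": "karen.smith@example.com", "phone": "6095550142",
       "credit_limit": 2500}
masked = apply_column_map(row, column_map)
print(masked)
```

Columns without an entry in the map, such as `name` here, pass through unchanged, so a real mapping would need an explicit rule for every sensitive column.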


Propagating Keys

Before masking

CUSTOMERS
08054 Jim Jackson ----------------
19101 John Jones ----------------
27645 Mary Smith ----------------

ORDERS
27645 80-2382 20 June 2002
27645 86-4538 10 October 2002

DETAILS
86-4538 Merrill Lynch MER
86-4538 Citigroup C

After masking

CUSTOMERS2
55555 Jim Jackson ----------------
33333 John Jones ----------------
88888 Mary Smith ----------------

ORDERS2
88888 80-2382 20 June 2002
88888 86-4538 10 October 2002

DETAILS2
86-4538 Merrill Lynch MER
86-4538 Citigroup C

Referential integrity is maintained
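Key propagation amounts to applying one old-to-new key mapping to the parent table and to every child table that references it. The sketch below uses the customer and order values from the slide; the `propagate_key` helper and the tuple layout are assumptions for illustration.

```python
# Rows taken from the slide; the first element of each tuple is the customer key.
customers = [("08054", "Jim Jackson"), ("19101", "John Jones"), ("27645", "Mary Smith")]
orders = [("27645", "80-2382", "20 June 2002"), ("27645", "86-4538", "10 October 2002")]

# Old primary key -> masked primary key, as shown in the slide.
key_map = {"08054": "55555", "19101": "33333", "27645": "88888"}


def propagate_key(rows, key_index, key_map):
    """Replace the key column in every row using the same mapping."""
    return [
        row[:key_index] + (key_map[row[key_index]],) + row[key_index + 1:]
        for row in rows
    ]


customers2 = propagate_key(customers, 0, key_map)
orders2 = propagate_key(orders, 0, key_map)

# Referential integrity check: every order must still point at a customer.
parent_keys = {customer[0] for customer in customers2}
orphans = [order for order in orders2 if order[0] not in parent_keys]
print(customers2, orders2, orphans)
```

Because parent and child share the mapping, the masked orders still join to the masked customer.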


Consistent Masking across the Enterprise

Data is masked

Masked fields are consistent

[Figure: SSN values shown for a Client Billing Application and a DB2 database; the pairs 132009824 / 157342266 (labeled "SS#s") and 323457245 / 134235489 (labeled "SSN#s") each appear in both systems.]
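One common way to get the same masked value in every system is to derive it deterministically from the source value with a keyed hash. The sketch below uses Python's standard `hmac` and `hashlib` modules; the key, the `mask_ssn` name, and the choice of HMAC-SHA256 are assumptions, and the slides do not say which technique the product uses. The output is nine digits but is not guaranteed to be a structurally valid SSN, and distinct inputs can in principle collide.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-only-key"  # hypothetical; a real key would be kept secret


def mask_ssn(ssn, key=SECRET_KEY):
    """Map an SSN to a 9-digit string; the same input always gives the same output."""
    digest = hmac.new(key, ssn.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{int(digest, 16) % 10**9:09d}"


# The same source value masked independently in two systems.
billing_value = mask_ssn("132009824")
db2_value = mask_ssn("132009824")
print(billing_value == db2_value)
```

Since no lookup state is shared between the two runs, consistency depends only on both systems using the same key and function.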


First Names and Last Names Data Sets

1) Client is a university that wishes to mask the first and last name fields in its admissions database

2) Optim now has a first name lookup table with over 5,000 male/female names and a last name lookup table with over 80,000 names

3) Use Lookup Tables to randomly replace first and last names in the table

First Name Lookup Table (sample): Stacey, Dave, Danielle, Bob, John

Last Name Lookup Table (sample): Reese, Howell, Kline, Nelson, Newton

Production Database

First Name  Last Name  GPA  High School  Advisor  State
Paul        Smith      3.2  Princeton    Johnson  NJ
Kate        Jones      2.7  Albany       Kline    NY

Test Database

First Name  Last Name  GPA  High School  Advisor  State
Stacey      Nelson     3.2  Princeton    Johnson  NJ
Dave        Reese      2.7  Albany       Kline    NY
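Random replacement from lookup tables can be sketched in a few lines. The lookup lists below hold only the sample names visible on the slide, and the `mask_names` helper and the dictionary field names are hypothetical; the real lookup tables are described as far larger.

```python
import random

# Sample values from the slide; the real tables hold thousands of names.
FIRST_NAMES = ["Stacey", "Dave", "Danielle", "Bob", "John"]
LAST_NAMES = ["Reese", "Howell", "Kline", "Nelson", "Newton"]


def mask_names(rows, rng):
    """Return copies of rows with first and last names drawn from the lookup tables."""
    return [
        {**row,
         "first_name": rng.choice(FIRST_NAMES),
         "last_name": rng.choice(LAST_NAMES)}
        for row in rows
    ]


production = [
    {"first_name": "Paul", "last_name": "Smith", "gpa": 3.2,
     "high_school": "Princeton", "advisor": "Johnson", "state": "NJ"},
    {"first_name": "Kate", "last_name": "Jones", "gpa": 2.7,
     "high_school": "Albany", "advisor": "Kline", "state": "NY"},
]

# Seeded only to make the example repeatable.
test_rows = mask_names(production, random.Random(42))
for row in test_rows:
    print(row)
```

Purely random selection gives a different replacement on each unseeded run, so it suits cases where the same person does not need the same masked name across refreshes.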


Intelligent Masking Capability

154-74-7788254-77-6644

SSN#

4324115574123654JonesVanessa5298774132478855DenverJohn

Credit Card#L. NameF. Name

154-74-7788854-77-6644

SSN#

4972584612457744JonesVanessa5326458711224956DenverJohn

Credit Card#L. NameF. Name

Production Database

Data before Masking

Data after Masking…

Masked with Valid CC# and SS#

How are these numbers valid?

Test DatabaseValidValid

For Credit Card Numbers

Most credit card numbers are encoded with a "check digit". A check digit is a digit added to a number (either at the end or the beginning) that validates the authenticity of the number. A simple algorithm is applied to the other digits of the number, which yields the check digit.

For Social Security Numbers

A Social Security Number (SSN) consists of nine digits. The first three digits are called the "area number". The central two-digit field is called the "group number". The final four-digit field is called the "serial number". All numbers must fit the latest available criteria for each section.
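The slide does not name the check-digit algorithm. The sketch below assumes the Luhn (mod 10) algorithm, which is the usual check-digit scheme for payment card numbers, and shows how a masked number can be generated so that it still passes that check. The function names, the choice to keep the leading digit, and the seeded random generator are assumptions made for the example, not Optim behaviour. The sample value 4111111111111111 is a commonly used test number, not one taken from the slide. `split_ssn` only separates the three SSN fields described above; it does not apply any validity criteria.

```python
import random


def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of digits (without the check digit)."""
    total = 0
    # Walk from the rightmost payload digit, doubling every other digit.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return (10 - total % 10) % 10


def is_luhn_valid(number: str) -> bool:
    """True if the last digit of number is the Luhn check digit of the rest."""
    if not number.isdigit() or len(number) < 2:
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])


def mask_card_number(number: str, rng: random.Random) -> str:
    """Replace a card number with random digits of the same length.

    Assumption for this sketch: the leading digit is kept and the final
    digit is recomputed so the result passes the Luhn check.
    """
    middle = "".join(str(rng.randrange(10)) for _ in range(len(number) - 2))
    payload = number[0] + middle
    return payload + str(luhn_check_digit(payload))


def split_ssn(ssn: str):
    """Split 'AAA-GG-SSSS' into (area number, group number, serial number)."""
    area, group, serial = ssn.split("-")
    return area, group, serial


masked = mask_card_number("4324115574123654", random.Random(0))
print(masked, is_luhn_valid(masked))
print(split_ssn("154-74-7788"))
```

Recomputing the check digit after randomizing the other digits is what lets an application's own input validation accept the masked value.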

Populate Destination Tables - Column Map

Map unlike column names
Transform/mask sensitive data
Datatype conversions
Column-level date aging
Column-level currency conversion
Source columns:
– Literals
– Special registers
– Expressions
– Default values
– User exits
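As a rough illustration of what a column map does, the sketch below expresses one as a Python dictionary that maps each destination column to a rule applied to the source row. This is not Optim's column map syntax; the column names, the two-year aging amount, and the helper names are hypothetical and chosen only to show unlike column names, a datatype conversion, date aging, and a literal side by side.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Age a date by a number of years, moving Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 does not exist in the target year.
        return d.replace(year=d.year + years, day=28)


# Destination column -> rule that derives its value from the source row.
COLUMN_MAP = {
    "CUSTOMER_ID": lambda src: src["CUST_ID"],                # unlike column names
    "BALANCE": lambda src: float(src["BALANCE"]),             # datatype conversion
    "SHIP_DATE": lambda src: add_years(src["SHIP_DATE"], 2),  # column-level date aging
    "REGION": lambda src: "TEST",                             # literal value
}


def apply_column_map(row, column_map):
    """Build a destination row by evaluating each rule against the source row."""
    return {dest: rule(row) for dest, rule in column_map.items()}


source_row = {"CUST_ID": 7, "BALANCE": "12.50", "SHIP_DATE": date(2008, 2, 29)}
dest_row = apply_column_map(source_row, COLUMN_MAP)
print(dest_row)
```

Keeping the rules in one table per destination makes it easy to see, column by column, where each value comes from.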

Populate Destination Tables - Control File

If INSERT/UPDATE errors occur:

– BROWSE the control file for error information
– RETRY/RESTART the INSERT/UPDATE process

[Diagram: the Extract File feeds the INSERT/UPDATE process, which populates the CUST, ORD, and DETL tables in TESTDB. Other labels: Control File, Process Report, Statistical information, Error information.]

The Relational Extract Facility - Summary

Creating and maintaining test data bases
Migrating data
Populating decision support data bases

[Diagram: EXTRACT (with Point & Shoot) reads the CUSTOMERS, ORDERS, and DETAILS tables in PRODDB into an Extract File. INSERT/UPDATE, or LOAD via Load Files, then populates the CUST, ORD, and DETL tables in TESTDB and QADB.]

Traditional vs. Relational Tools

Single Table Editors

• One table/view at a time
• No edit of related data from multiple tables

[Diagram: three separate single-table sessions, each following the same steps. FIND CUSTOMER, NOTE INFO, EXIT TABLE; FIND ORDERS, NOTE INFO, EXIT TABLE; FIND DETAILS, NOTE INFO, EXIT TABLE.]

The Relational Editor

• Simultaneous browse/edit of related data from multiple tables

[Diagram: CUSTOMERS, ORDERS, and DETAILS displayed together.]

Product Editor - The Programmer’s Solution

Editor helps you to:

– Understand the data your application is to process
– Create data values to test program logic
– Inspect and correct data that is causing problems
– Verify execution results

Compare

Single-table or multi-table compare

Creates compare file of results

Displays results on screen

[Diagram: SOURCE 1 and SOURCE 2 feed the COMPARE PROCESS, which uses a TABLE MAP and a COLUMN MAP and produces a COMPARE FILE.]
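To make the compare step concrete, here is a minimal Python sketch of a single-table compare keyed on a primary key. It is an illustration only and does not reflect how the product's compare process or compare file works: the function name `compare_tables`, the result layout, and the sample rows are all assumptions.

```python
def compare_tables(source1, source2, key):
    """Compare two lists of row dictionaries that share a key column.

    Returns keys found only in one source and, for keys in both,
    the columns whose values differ as (source1 value, source2 value).
    """
    rows1 = {row[key]: row for row in source1}
    rows2 = {row[key]: row for row in source2}
    result = {"only_in_source1": [], "only_in_source2": [], "changed": []}

    for k in sorted(rows1.keys() | rows2.keys()):
        if k not in rows2:
            result["only_in_source1"].append(k)
        elif k not in rows1:
            result["only_in_source2"].append(k)
        elif rows1[k] != rows2[k]:
            columns = sorted(rows1[k].keys() | rows2[k].keys())
            diffs = {
                col: (rows1[k].get(col), rows2[k].get(col))
                for col in columns
                if rows1[k].get(col) != rows2[k].get(col)
            }
            result["changed"].append((k, diffs))
    return result


# Hypothetical before/after snapshots of a CUST table.
before = [
    {"id": 1, "name": "Jones", "state": "NJ"},
    {"id": 2, "name": "Denver", "state": "NY"},
    {"id": 3, "name": "Kline", "state": "NY"},
]
after = [
    {"id": 1, "name": "Jones", "state": "NJ"},
    {"id": 2, "name": "Denver", "state": "CO"},
    {"id": 4, "name": "Reese", "state": "NJ"},
]

report = compare_tables(before, after, key="id")
print(report)
```

Running a compare like this before and after a test is one way to verify that a program changed only the rows and columns it was expected to change.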

Business Benefits of Test Data Management (TDM)

More time for testing
– In many organizations, 30-40% of test script execution time is spent on defining and creating new test data. A TDM strategy reduces the time spent creating new and current data, thereby allowing more tests to be executed.

Increase data quality
– Refreshing test data from a baseline minimizes the manual intervention currently required when creating new test data, reducing triaging efforts and increasing test repeatability.

Enforce data ownership
– Often the “honor system” and spreadsheets are used to control test data ownership. A TDM strategy offers role-driven security to support level segmentation of the development and testing teams.

Reduce data dependencies across test sets
– Multiple test sets often use the same data, but different tests can negatively impact other tests using the same data if they are not refreshed together. TDM strategies allow for the creation of an unlimited number of test data sets and can create unique and timely test sets to ensure clean data is used when testing.
