Posted on 11-May-2015
© 2011 IBM Corporation
Information Management
InfoSphere Optim Test Data Management Solution: IMS Focus
Peter Costigan, Product Line Manager, Optim Solutions
9/28/2011
Agenda
Information Governance Overview
Risks and Challenges of Poor Test Data Management
Best Practices in Test Data Management
InfoSphere Optim Test Data Management
Data Privacy Concerns with Non-Production Data
IMS and z/OS Considerations
Other InfoSphere Optim Solutions: Discovery, Archiving, Application Retirement
Conclusion
Mastering information across the Information Supply Chain
[Diagram] Sources: Transactional & Collaborative Applications; Business Analytics Applications; External Information Sources
Integrate, Analyze, Manage: Cubes, Streams, Big Data, Master Data, Content, Data, Streaming Information, Data Warehouses, Content Analytics
Information Governance (Govern): Quality, Security & Privacy, Lifecycle, Standards
Goal: information that is Trusted, Relevant and Governed
Requirements to manage data across its lifecycle
Information Governance Core Disciplines: Lifecycle Management
Discover & Define
– Discover where data resides
– Classify & define data and relationships
– Define policies
Develop & Test
– Develop database structures & code
– Create & refresh test data
– Validate test results
Optimize, Archive & Access
– Enhance performance
– Manage data growth
– Report & retrieve archived data
– Enable compliance with retention & e-discovery
Consolidate & Retire
– Move only the needed information
– Integrate into single data source
How test data creation is often accomplished: Clone
[Diagram: Production Database cloned to Test Database and Development]
Positives:
– Simple to do
– Requires little knowledge of the data model or infrastructure
– Creates an exact duplicate of production
Negatives:
– Uses more storage than needed, multiple times over
– Production data is a privacy risk
– Data model changes are expected in Dev/Test, but require significant manual rework
– Takes a long time to create and refresh
– No way to compare to the original after the test is complete
– Cannot span multiple data sources/applications
– Developer/tester downtime when access to the data must be shared
Test Data Management Best Practices
TDM refers to the need to manage data used in testing and other non-production environments
Extract related subsets of production data that are targeted to functionality under test
De-identify / mask related test data to protect privacy
Quickly and easily refresh test environments
Edit data to create error and boundary conditions
Compare “before” and “after” images of test data
Benefits: Improving application quality & customer satisfaction
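The first practice, extracting a related subset targeted to the functionality under test, can be illustrated with a minimal sketch. It assumes hypothetical in-memory tables (`customers`, `orders`) linked by a single foreign key; Optim itself works from relationship definitions across databases, which this sketch does not model.

```python
from typing import Callable

Row = dict  # each row is a plain dict of column -> value


def extract_subset(parents: list[Row], children: list[Row], parent_key: str,
                   child_fk: str, predicate: Callable[[Row], bool]) -> tuple[list[Row], list[Row]]:
    """Select parent rows matching `predicate`, then pull only the child rows
    that reference them, so the subset stays referentially intact."""
    selected = [p for p in parents if predicate(p)]
    keys = {p[parent_key] for p in selected}
    related = [c for c in children if c[child_fk] in keys]
    return selected, related


# Hypothetical data: orders.cust_id references customers.cust_id.
customers = [
    {"cust_id": 1, "state": "IL"},
    {"cust_id": 2, "state": "TX"},
    {"cust_id": 3, "state": "IL"},
]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 2},
    {"order_id": 12, "cust_id": 3},
    {"order_id": 13, "cust_id": 3},
]

subset_customers, subset_orders = extract_subset(
    customers, orders, "cust_id", "cust_id", lambda c: c["state"] == "IL")
print([c["cust_id"] for c in subset_customers])  # [1, 3]
print([o["order_id"] for o in subset_orders])    # [10, 12, 13]
```

Starting from the parent rows and following the key outward is what keeps the subset small without leaving orphaned child rows.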
Optim Captures Complete Business Objects
Business data is related across a wide variety of data sources
InfoSphere Optim Test Data Management Solution
Create targeted, right-sized test environments
Automate support for data model changes
Replace sensitive data with masked data
Refresh, reset and maintain test environments
Compare and resolve application defects
Accelerate release schedules
[Diagram: Extract related subsets from a 2 TB Production or Production Clone, Mask / Remap them, then Insert / Update / Load into right-sized test environments (Development, Unit Test, Training, Integration Test) of 25 GB to 100 GB each; Compare results after testing]
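The Compare step can be pictured as a keyed diff of "before" and "after" row images. This is a minimal sketch over hypothetical in-memory rows keyed by a single column, not the Optim Compare process.

```python
def compare_images(before: list[dict], after: list[dict], key: str):
    """Return (inserted keys, deleted keys, changed columns per key)."""
    b = {row[key]: row for row in before}
    a = {row[key]: row for row in after}
    inserted = sorted(k for k in a if k not in b)
    deleted = sorted(k for k in b if k not in a)
    changed = {}
    for k in sorted(a.keys() & b.keys()):
        # Record (before, after) for every column whose value differs.
        diffs = {col: (b[k].get(col), a[k].get(col))
                 for col in b[k].keys() | a[k].keys()
                 if b[k].get(col) != a[k].get(col)}
        if diffs:
            changed[k] = diffs
    return inserted, deleted, changed


# Hypothetical test run: account 1 was updated, 2 deleted, 3 inserted.
before = [{"id": 1, "balance": 100}, {"id": 2, "balance": 50}]
after = [{"id": 1, "balance": 90}, {"id": 3, "balance": 20}]
inserted, deleted, changed = compare_images(before, after, "id")
print(inserted, deleted, changed)
```

Keying the comparison on a stable identifier is what lets a tester tell an expected update apart from an unexpected insert or delete.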
Business benefits of Test Data Management
More time for testing
– In many organizations, 30-40% of test script execution time is spent manufacturing new test data. Test Data Management reduces the time spent creating new data, allowing more tests to be executed.
Reduce cost
– Maximize allocated disk space
– Catch errors earlier in the testing cycle
– Shift errors from production to test
Increase data quality
– Refreshing test data from a baseline minimizes the manual intervention currently required to create new test data, reducing triage effort and increasing test repeatability.
Enforce data ownership
– The "honor system" and spreadsheets are often used to control test data ownership. Test Data Management offers role-driven security to support segmentation of the development and testing teams.
Reduce data dependencies across test sets
– Multiple test sets often use the same data, and one test can negatively impact other tests that use the same data. Test Data Management allows the creation of an unlimited number of test data sets and can create unique IDs each time to ensure clean data is used in testing.
TDM Business Value Assessment: Detailed Financial Analysis
Sensitive Production Data: What's the risk?
Hackers obtained personal information on 70 million subscribers.
April 2011: Malicious outsiders stole name, address (city, state, zip), country, email address, birth date, PlayStation Network/Qriocity password and login, and handle/PSN online ID, and possibly credit card numbers, from 70 million Sony PlayStation users.
SQL injection is fast becoming one of the biggest and most high-profile web security threats.
April 2011: A mass SQL injection attack that initially compromised 28,000 websites shows no sign of slowing down. Known as LizaMoon, this malicious code is after anything stored in a database.
Unprotected test data is sent to and used by test/development teams as well as third-party consultants.
February 2009: An FAA server used for application development & testing was breached, exposing the personally identifiable information of 45,000+ employees.
Hundreds of thousands of secret reports regarding the US wars in Iraq and Afghanistan were published on WikiLeaks.
December 2010: A private in the US military downloaded top secret military documents and passed them to a journalist for publication. This put US national security at risk, as well as the lives of those named in the reports.
What is data masking?
Definition: A method for creating a structurally similar but inauthentic version of an organization's data. The purpose is to protect the actual data while having a functional substitute for occasions when the real data is not required.
Requirement: Effective data masking requires data to be altered so that the actual values cannot be determined or reverse-engineered, while functional appearance is maintained.
Other terms used: obfuscation, scrambling, data de-identification
Commonly masked data types: name, address, telephone, SSN/national identity number, credit card number
Methods:
– Static masking: extracts rows from production databases, obfuscating data values that are ultimately stored in the columns of the test databases
– Dynamic masking: masks specific data elements on the fly without touching applications or the physical production data store
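Static masking with a consistent mapping can be sketched as follows. The HMAC key, the `mask_ssn` and `static_mask` helper names and the digit-derivation scheme are all hypothetical. The output keeps the SSN's ddd-dd-dddd shape but does not enforce real SSN issuance rules, and two different inputs can, rarely, map to the same output. Because the mapping is deterministic, the same production SSN masks to the same test value wherever it appears.

```python
import hashlib
import hmac

# Hypothetical masking key; in practice it would be kept out of the test environment.
MASK_KEY = b"hypothetical-masking-key"


def mask_ssn(ssn: str, key: bytes = MASK_KEY) -> str:
    """Deterministically map an SSN to a fake value with the same ddd-dd-dddd shape."""
    digest = hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).digest()
    n = int.from_bytes(digest[:8], "big") % 10**9  # nine pseudo-random digits
    d = f"{n:09d}"
    return f"{d[:3]}-{d[3:5]}-{d[5:]}"


def static_mask(rows: list[dict], column: str) -> list[dict]:
    """Static masking: return masked copies of rows; the source rows are untouched."""
    return [{**row, column: mask_ssn(row[column])} for row in rows]


production_rows = [{"patient_no": "112233", "ssn": "123-45-6789"}]
test_rows = static_mask(production_rows, "ssn")
print(test_rows)
```

A keyed hash is used here rather than a plain hash so that someone who sees the test data cannot recompute the mapping by hashing candidate SSNs.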
InfoSphere Optim Data Masking Solution / Option
Example 1: Referential integrity is maintained with key propagation
Personal Info Table (original)
PersNbr  FirstName  LastName
08054    Alice      Bennett
19101    Carl       Davis
27645    Elliot     Flynn
Event Table (original)
PersNbr  FstNEvtOwn  LstNEvtOwn
27645    Elliot      Flynn
27645    Elliot      Flynn
Personal Info Table (masked)
PersNbr  FirstName  LastName
10000    Jeanne     Renoir
10001    Claude     Monet
10002    Pablo      Picasso
Event Table (masked)
PersNbr  FstNEvtOwn  LstNEvtOwn
10002    Pablo       Picasso
10002    Pablo       Picasso
Example 2: Data is masked with contextually correct data to preserve the integrity of test data
Patient Information (original)
Patient No.  SSN          Name            Address            City   State  Zip
112233       123-45-6789  Amanda Winters  40 Bayberry Drive  Elgin  IL     60123
Patient Information (masked)
Patient No.  SSN          Name           Address          City    State  Zip
123456       333-22-4444  Erica Schafer  12 Murray Court  Austin  TX     78704
Data masking techniques include:
– String literal values
– Character substrings
– Random or sequential numbers
– Arithmetic expressions
– Concatenated expressions
– Date aging
– Lookup values
– Generic mask
Benefits: satisfy privacy regulations, reduce the risk of data breaches, maintain the value of test data
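The key-propagation example on this slide can be reproduced with a short sketch. It assumes sequential replacement keys starting at 10000 and a hypothetical lookup list of replacement names; both are illustrative choices that happen to reproduce the masked rows shown, not Optim's masking functions.

```python
# Hypothetical lookup table of replacement names (cycled if there are more people than names).
LOOKUP_NAMES = [("Jeanne", "Renoir"), ("Claude", "Monet"), ("Pablo", "Picasso")]


def mask_with_propagation(persons, events, lookup=LOOKUP_NAMES, start=10000):
    """Mask the parent table with sequential keys and lookup names, then propagate
    the same replacements to every child row that references the old key."""
    ordered = sorted(persons, key=lambda r: r["PersNbr"])
    replacement = {}
    for i, row in enumerate(ordered):
        first, last = lookup[i % len(lookup)]
        replacement[row["PersNbr"]] = (f"{start + i:05d}", first, last)

    masked_persons = []
    for r in persons:
        new_key, first, last = replacement[r["PersNbr"]]
        masked_persons.append({**r, "PersNbr": new_key, "FirstName": first, "LastName": last})

    masked_events = []
    for e in events:
        new_key, first, last = replacement[e["PersNbr"]]  # same mapping as the parent
        masked_events.append({**e, "PersNbr": new_key, "FstNEvtOwn": first, "LstNEvtOwn": last})
    return masked_persons, masked_events


persons = [
    {"PersNbr": "08054", "FirstName": "Alice", "LastName": "Bennett"},
    {"PersNbr": "19101", "FirstName": "Carl", "LastName": "Davis"},
    {"PersNbr": "27645", "FirstName": "Elliot", "LastName": "Flynn"},
]
events = [
    {"PersNbr": "27645", "FstNEvtOwn": "Elliot", "LstNEvtOwn": "Flynn"},
    {"PersNbr": "27645", "FstNEvtOwn": "Elliot", "LstNEvtOwn": "Flynn"},
]
masked_persons, masked_events = mask_with_propagation(persons, events)
print(masked_persons)
print(masked_events)
```

The point of building one `replacement` map and reusing it for both tables is that the masked Event rows still join to the masked Personal Info rows.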
What is IMS Data to InfoSphere Optim?
IMS is a hierarchical database:
– A database consists of segments
– Segments are related (physically)
Optim uses a relational model of tables, rows and columns.
– Optim Distributed uses middleware to access IMS and is more closely tied to the relational model.
– Optim z/OS uses native (DL/I) access to IMS data.
[Diagram: IMS segments DEPARTMENT, EMPLOYEE and JOB, each represented as a table of rows and columns]
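One way to picture the hierarchical-to-relational mapping is to walk an IMS record in hierarchic sequence (top to bottom, left to right) and emit one row per segment, carrying the keys of its ancestors as extra columns. The DEPARTMENT > EMPLOYEE > JOB parentage, field names and key fields below are hypothetical, and the record is a nested Python tuple rather than data read through DL/I.

```python
from collections import defaultdict

# Hypothetical key field for each segment type.
KEY_FIELD = {"DEPARTMENT": "DEPTNO", "EMPLOYEE": "EMPNO", "JOB": "JOBCODE"}


def flatten(segment, parent_keys=(), tables=None):
    """Pre-order walk of (seg_type, fields, children); each segment becomes a row
    in its own 'table', prefixed with the key values of all its ancestors."""
    if tables is None:
        tables = defaultdict(list)
    seg_type, fields, children = segment
    row = dict(parent_keys)  # ancestor keys act like foreign-key columns
    row.update(fields)
    tables[seg_type].append(row)
    own_keys = parent_keys + ((KEY_FIELD[seg_type], fields[KEY_FIELD[seg_type]]),)
    for child in children:
        flatten(child, own_keys, tables)
    return tables


department = ("DEPARTMENT", {"DEPTNO": "D01", "DEPTNAME": "Sales"}, [
    ("EMPLOYEE", {"EMPNO": "E1", "NAME": "Ann"}, [
        ("JOB", {"JOBCODE": "J1", "TITLE": "Rep"}, []),
    ]),
    ("EMPLOYEE", {"EMPNO": "E2", "NAME": "Bob"}, []),
])
tables = flatten(department)
for name, rows in tables.items():
    print(name, rows)
```

Carrying the ancestor keys into each row is what allows a physical parent-child link to be expressed as a relationship between two tables.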
InfoSphere Optim z/OS IMS Definitions
Legacy Table Definition(s):
– Describe the physical layout of a segment
– Created from a COBOL or PL/I copybook
– Associated with an IMS segment
– Definition stored in the Optim Directory
– Related to other tables (DB2 or Legacy) via Optim Relationships
– Segment treated as a virtual DB2 table by any Optim process
[Diagram: the Optim Directory holds IMS Definitions, Legacy Tables (e.g. EMPLOYEE, VENDITEM), Relationships and Maps, built from copybooks for the IMS database OPT.PROD.PSTDEPDB and the VSAM file OPT.PROD.VENDITEM]
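A Legacy Table definition essentially records where each field sits in a segment's bytes. The sketch below assumes a hypothetical three-field copybook, EBCDIC text (Python's `cp037` codec) and COMP-3 packed decimal; real copybooks add REDEFINES, OCCURS and other data types that it does not handle.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Field:
    name: str
    offset: int     # 0-based byte offset within the segment
    length: int     # length in bytes
    kind: str       # "char" for PIC X, "packed" for COMP-3
    scale: int = 0  # implied decimal places (the V in the PICTURE clause)


# Hypothetical layout for an EMPLOYEE segment, as it might be derived from:
#   05 EMP-NO      PIC X(5).
#   05 EMP-NAME    PIC X(10).
#   05 EMP-SALARY  PIC S9(7)V99 COMP-3.
EMPLOYEE_LAYOUT = [
    Field("EMP_NO", 0, 5, "char"),
    Field("EMP_NAME", 5, 10, "char"),
    Field("EMP_SALARY", 15, 5, "packed", scale=2),
]


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode packed decimal: two digits per byte, the last low nibble is the sign."""
    nibbles = []
    for b in raw:
        nibbles.extend((b >> 4, b & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid packed-decimal digit")
    value = int("".join(str(d) for d in digits))
    if sign == 0x0D:  # D = negative; anything else is treated as positive in this sketch
        value = -value
    return Decimal(value).scaleb(-scale)


def parse_segment(record: bytes, layout) -> dict:
    """Turn one fixed-layout EBCDIC segment into a row dict (one 'virtual table' row)."""
    row = {}
    for f in layout:
        raw = record[f.offset:f.offset + f.length]
        if len(raw) != f.length:
            raise ValueError(f"record too short for field {f.name}")
        if f.kind == "char":
            row[f.name] = raw.decode("cp037").rstrip()
        elif f.kind == "packed":
            row[f.name] = unpack_comp3(raw, f.scale)
        else:
            raise ValueError(f"unsupported field kind {f.kind}")
    return row


sample = ("00042".encode("cp037") + "SMITH".ljust(10).encode("cp037")
          + bytes([0x00, 0x01, 0x23, 0x45, 0x6C]))
row = parse_segment(sample, EMPLOYEE_LAYOUT)
print(row)
```

Once a segment decodes to a dictionary of named columns like this, it can be treated like any other table row, which is the idea behind presenting a segment as a virtual table.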
InfoSphere Optim z/OS Platform Access to Data Sources
[Diagram: InfoSphere Optim & DB2 for z/OS with native client access to DB2, IMS, VSAM / Sequential files and DB2 on AS400]
Excluded for IMS/VSAM/Seq:
– TDM Compare
– TDM Edit
– Archive
– Application Retirement
InfoSphere Optim Distributed Platform Access to Data Sources
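[Diagram: the Optim Server uses native client access to some sources and leverages middleware for others. IBM Federation Server exposes data sources/tables as nicknames, and Classic Federation (via ODBC clients) reaches IMS/VSAM on z/OS. Sources shown: Oracle 9 on HP-UX, DB2 on AIX, SQL Server on Windows 2003, DB2 on Linux, IMS/VSAM on z/OS, DB2 on AS400.]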
InfoSphere Optim z/OS Requirements for IMS / VSAM / Sequential
Available:
– IMS V12 support (Optim z/OS V6 and V7)
– Support for masking data in fixed-length arrays (OCCURS)
– IMS Sequential Dependent (SDEP) segment support
– Support for multiple record layouts for an IMS segment
– Batch IMS/VSAM/Seq table definition utility
– Date/Time/Timestamp data types in IMS/VSAM/Seq table definitions
– IMS Compression Exit
High Priority:
– VSAM, Sequential and IMS Related Compare
– Support for masking data in variable-length arrays (OCCURS DEPENDING ON, ODO)
– More flexible Optim relationship support
– Tester productivity enhancements via self-service
– Improvements in unkeyed segment support (over time)
– Improvements in IMS access path selection (over time)
– Extract IMS data during IMS Unload
– Archive IMS, VSAM and Sequential natively on z/OS
– Common Eclipse-based UI (Optim Designer and Manager)
Discovery: You can't manage what you don't understand
Challenges:
– How do I know what data is needed for test cases?
– Lack of understanding of where data is located and how the data is related
– Limited understanding of confidential data elements
– Manual analysis and hand coding are cost-prohibitive
Result:
– Lack of agility in testing
– Poor data governance
– Bad data = bad business decisions
– Inadvertent exposure of sensitive information
InfoSphere Discovery Speeds Understanding of Data
Table 1
Row      Member     SS #         Age  Phone           Sex
1        595846226  123-45-6789  15   (123) 456-7890  M
2        567472596  138-27-1604  8    (138) 271-6037  F
3        540450092  154-86-4196  22   (154) 864-1961  M
4        514714372  173-44-7900  55   (173) 447-8996  F
5        490204164  194-26-1648  4    (194) 261-6476  F
6        466861109  217-57-3046  66   (217) 573-0453  M
...
987,623  444629628  243-68-1812  25   (243) 681-8107  F
987,624  423456789  272-92-3629  87   (272) 923-6280  M
Table 25
ID         Demo1
595846226  0
567472596  1
540450091  2
514714372  3
490204164  1
466861109  0
444629628  3
423456789  2
The Discovery Engine analyzes data values to automatically discover the columns that relate rows across data sources, and the columns that contain sensitive data.
IBM InfoSphere Discovery: hit rate 98%
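Two value-based checks of the kind this slide describes can be sketched as follows, assuming the columns are available as Python lists. The `hit_rate` metric (share of distinct source values found in the target column) and the SSN pattern heuristic are our own simplified definitions, not InfoSphere Discovery's documented algorithm. On the eight sample rows shown, Member against ID matches 7 of 8 values, because 540450092 does not appear in the ID column; the slide's 98% presumably reflects the full tables.

```python
import re

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def hit_rate(source_values, target_values) -> float:
    """Fraction of distinct source values that also occur in the target column."""
    src = set(source_values)
    if not src:
        return 0.0
    return len(src & set(target_values)) / len(src)


def looks_like_ssn(values, threshold: float = 0.9) -> bool:
    """Flag a column as a likely SSN column if most values match ddd-dd-dddd."""
    values = list(values)
    if not values:
        return False
    matches = sum(1 for v in values if SSN_PATTERN.fullmatch(v))
    return matches / len(values) >= threshold


# Sample values from the two tables above.
member = ["595846226", "567472596", "540450092", "514714372",
          "490204164", "466861109", "444629628", "423456789"]
ids = ["595846226", "567472596", "540450091", "514714372",
       "490204164", "466861109", "444629628", "423456789"]
ssn = ["123-45-6789", "138-27-1604", "154-86-4196", "173-44-7900",
       "194-26-1648", "217-57-3046", "243-68-1812", "272-92-3629"]

print(hit_rate(member, ids))
print(looks_like_ssn(ssn), looks_like_ssn(member))
```

A high overlap suggests a candidate join column, and a pattern match suggests a candidate for masking; both are only hints that an analyst would still confirm.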
InfoSphere Optim Data Growth Solution
[Diagram: data tiers managed by InfoSphere Optim, with Archive and Restore between production and the archives]
– Current data (1-2 years): production data, native access
– Active/historical online (2-4 years): compressed archives
– On/near-line archive (4-6 years): non-DBMS retention platform (ATA file server, EMC Centera™, DR550, etc.)
– Off-line archive (6+ years): off-line retention platform (CD, tape, optical, WORM, IBM TSM, NetApp NearStore® SnapLock™, IBM Total Storage® solutions including the DR550, EMC Centera™)
Universal access to archived data through additional options: ODBC / JDBC, XML, SQL, Excel, Access
Business Value:
– Saves production storage costs
– Improves production performance
– Manages archive files through their lifecycle: retention policy compliance
– Mitigates the risks of removing data from production
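The age-based placement on this slide can be expressed as a simple rule. The tier names and the half-open boundaries at 2, 4 and 6 years are assumptions read from the slide's year ranges; an actual Optim archive policy is defined in the product, not in code like this.

```python
from datetime import date

# Assumed tier boundaries (age in whole years, upper bound exclusive).
TIERS = [
    (2, "current (production)"),
    (4, "online archive"),
    (6, "near-line archive"),
]
OFFLINE = "off-line archive"


def age_in_years(d: date, today: date) -> int:
    """Whole years elapsed since d, not counting a year whose anniversary hasn't passed."""
    return today.year - d.year - ((today.month, today.day) < (d.month, d.day))


def tier_for(d: date, today: date) -> str:
    """Pick the first tier whose upper bound exceeds the record's age."""
    age = age_in_years(d, today)
    for limit, name in TIERS:
        if age < limit:
            return name
    return OFFLINE


today = date(2011, 9, 28)
for d in [date(2011, 1, 1), date(2009, 9, 28), date(2006, 1, 1), date(2000, 1, 1)]:
    print(d, tier_for(d, today))
```

Making the boundaries half-open avoids the overlap in the slide's labels (for example, a record exactly 2 years old falls into the online archive tier, not both).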
InfoSphere Optim Application Retirement
Preserve application data in its business context
Retire out-of-date packaged applications as well as legacy custom applications
Shut down legacy system without a replacement
[Diagram: "Infrastructure before Retirement" shows each user working through a separate application and database; "Archived Data after Consolidation" shows the users accessing archive data through a single Archive Engine]
Conclusion
Test Data Management allows development teams to accelerate testing activities on a project
Test Data Management exploits production data while ensuring security of confidential data
Providing testers and developers with access to test data can improve operational efficiency and optimize resources on a project
A comprehensive Test Data Management solution is needed to minimize cost and shorten development cycles
Learn more
Product Family Webpage
Solution Sheet: InfoSphere Test Data Management Solution brief
Whitepaper: Integrated Strategies to Improve Application Testing
Case Study: InfoSphere Test Data Management