Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Requirements
QuerySurge™
Presenter: Bill Hayduk, Founder / President
Presenter: Jeff Bocarsly, Ph.D., Senior Architect
Moderator: Laura Poggi, Marketing Manager
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Requirements
The average organization loses $8.2 million annually through poor Data Quality.
- Gartner
46% of companies cite Data Quality as a barrier for adopting Business Intelligence products.
- InformationWeek
The cost per patient data of Phase 3 clinical studies of new pharmaceuticals exceeds $26,000.
- Journal of Clinical Research Best Practices
Pharma’s 2 Largest Data Warehousing Concerns:
(1) Data Integrity (2) Compliance
Pharma’s Largest DWH Concerns: (1) Data Integrity
High risk of defects that are not readily visible:
Missing Data
Truncation of Data
Data Type Mismatch
Null Translation Errors
Incorrect Type Translation
Misplaced Data
Extra Records
Transformation Logic Errors/Holes
Simple/Small Errors
Sequence Generator Errors
Undocumented Requirements
Not Enough Records
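Several of the defect classes above can be caught mechanically by comparing a source extract against the warehouse target. The following is a minimal sketch of that idea (not QuerySurge itself); the table and column names are invented, and an in-memory SQLite database stands in for the real source and target stores.

```python
# Sketch: catch "Not Enough Records", "Null Translation Errors", and
# "Missing Data" by comparing a source table against its warehouse target.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE source_patients (id INTEGER, name TEXT, dose TEXT);
    CREATE TABLE target_patients (id INTEGER, name TEXT, dose TEXT);
    INSERT INTO source_patients VALUES
        (1, 'Anderson', '50mg'), (2, 'Blackwell', NULL), (3, 'Carmichael', '25mg');
    -- Deliberate defects in the target:
    INSERT INTO target_patients VALUES
        (1, 'Anderson', '50mg'),   -- clean row
        (2, 'Blackwell', 'NULL');  -- NULL mistranslated to the string 'NULL'
        -- row 3 is missing entirely
""")

defects = []

# Not enough records: compare row counts
src_count = cur.execute("SELECT COUNT(*) FROM source_patients").fetchone()[0]
tgt_count = cur.execute("SELECT COUNT(*) FROM target_patients").fetchone()[0]
if src_count != tgt_count:
    defects.append(f"row count mismatch: source={src_count} target={tgt_count}")

# Null translation errors: NULLs that became the literal string 'NULL'
bad_nulls = cur.execute(
    "SELECT COUNT(*) FROM target_patients WHERE dose = 'NULL'").fetchone()[0]
if bad_nulls:
    defects.append(f"{bad_nulls} literal 'NULL' string(s) in target.dose")

# Missing data: keys present in the source but absent from the target
missing = cur.execute("""
    SELECT id FROM source_patients
    EXCEPT SELECT id FROM target_patients""").fetchall()
if missing:
    defects.append(f"ids missing from target: {[r[0] for r in missing]}")

for d in defects:
    print("DEFECT:", d)
```

Each check is one query pair per defect class; in practice the same pattern is repeated across every mapping in the ETL.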
Pharma’s Largest DWH Concerns: (2) Compliance
Need to comply with Part 11 mandates:
historical test information
test version history
test execution data: who, what & when
test cycle information
visibility of assets
archived test results
Why is this Important?
Periodic data reporting to FDA
Periodic data reporting to int’l bodies
(1) Data Integrity (2) Compliance
Announced FDA audits
Unannounced FDA audits
Consequences: severe financial and business consequences
Pharma’s Testing and Reporting Needs
Data Integrity needs
Need a test tool that can…
automate the manual testing of data
compare millions of rows of data quickly
flag mismatches and inconsistencies in data sets
provide flexibility in scheduling test runs
generate informative reports that can easily be shared with the team
validate up to 100% of all data, mitigating the risk
Part 11 Reporting needs
Need a test tool that can…
track test history
provide reporting on test version history
record all test executions by testing owner’s name and date
deliver auditable reports of test cycles
store all test outcomes and test data
offer a read-only user type for reviewing test assets
support archiving of results
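The Part 11 bookkeeping listed above boils down to an append-only record of who ran what, when, and with which result. A minimal sketch of that record-keeping follows; the names (`TestRun`, `AuditLog`, the analyst id) are illustrative, not QuerySurge's API.

```python
# Sketch of Part 11-style test-execution bookkeeping: every run is recorded
# append-only with who, what (name + version), when, and the outcome.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

@dataclass(frozen=True)  # frozen: a record cannot be edited after the fact
class TestRun:
    test_name: str
    test_version: int
    executed_by: str
    executed_at: str
    outcome: str

class AuditLog:
    def __init__(self):
        self._runs: List[TestRun] = []

    def record(self, test_name, test_version, executed_by, outcome):
        self._runs.append(TestRun(
            test_name, test_version, executed_by,
            datetime.now(timezone.utc).isoformat(), outcome))

    def history(self, test_name):
        """Version/execution history for one test: who, what & when."""
        return [r for r in self._runs if r.test_name == test_name]

log = AuditLog()
log.record("patients_rowcount", 1, "analyst_a", "FAIL")
log.record("patients_rowcount", 2, "analyst_a", "PASS")
print([(r.test_version, r.outcome) for r in log.history("patients_rowcount")])
# [(1, 'FAIL'), (2, 'PASS')]
```

A production version would persist these records to a database and expose them through read-only, auditable reports rather than holding them in memory.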
The solution…
What is QuerySurge™?
QuerySurge is the premier test tool built to automate Data Warehouse testing and the ETL Testing Process.
QuerySurge…
…automates the testing effort: the kickoff, the tests, the comparison, emailing the results
…speeds up testing: up to 1,000 times faster than manual testing
…simplifies the scheduling of test runs: run now, every Tuesday at 11pm, or right after the ETL process
…tests across different platforms: any JDBC-compliant database, DWH, Data Mart, flat file, XML
…provides reports, shares & stores results: covering Part 11 requirements
…verifies more data: verifies upwards of 100% of all data
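The "every Tuesday at 11pm" scheduling idea reduces to computing the next occurrence of a weekday/hour slot. A small stdlib-only sketch of that calculation (purely illustrative; QuerySurge's own scheduler is configured in its UI):

```python
# Sketch: find the next "Tuesday at 11pm" run time after a given moment.
from datetime import datetime, timedelta

def next_weekly_run(now, weekday=1, hour=23):  # weekday 1 = Tuesday
    """Next occurrence of the given weekday/hour strictly after `now`."""
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:          # slot already passed this week
        candidate += timedelta(days=7)
    return candidate

run_at = next_weekly_run(datetime(2013, 5, 6, 12, 0))  # a Monday at noon
print(run_at)  # 2013-05-07 23:00:00 (the following Tuesday, 11pm)
```

The same helper covers "run now" (schedule for the current time) and could be chained behind an ETL-completion signal for the "right after ETL" case.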
QuerySurge™ Modules
Design Library: create Query Pairs (source & target SQLs); test queries
Scheduling: build groups of Query Pairs; schedule tests to run unattended
Run Dashboard: view real-time execution; analyze real-time results
Deep-Dive Reporting: examine, share and store test results
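The Query Pair concept above can be sketched in a few lines: run one SQL against the source, one against the target, and diff the result multisets. This is an illustration of the idea, not QuerySurge's implementation; sqlite3 stands in for any JDBC-reachable store, and the table names are invented.

```python
# Sketch of a "Query Pair": source SQL vs. target SQL, results diffed.
import sqlite3
from collections import Counter

def run_query_pair(src_conn, tgt_conn, source_sql, target_sql):
    """Run one query per side; return rows missing from / extra in the target."""
    src_rows = Counter(src_conn.execute(source_sql).fetchall())
    tgt_rows = Counter(tgt_conn.execute(target_sql).fetchall())
    return {"missing_in_target": dict(src_rows - tgt_rows),
            "extra_in_target": dict(tgt_rows - src_rows)}

# Toy source and target stores.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute("CREATE TABLE trials (site TEXT, enrolled INTEGER)")
src.executemany("INSERT INTO trials VALUES (?, ?)",
                [("Boston", 120), ("Leeds", 85)])
tgt.execute("CREATE TABLE dwh_trials (site TEXT, enrolled INTEGER)")
tgt.executemany("INSERT INTO dwh_trials VALUES (?, ?)",
                [("Boston", 120), ("Leeds", 85), ("Leeds", 85)])  # duplicate row

diff = run_query_pair(src, tgt,
                      "SELECT site, enrolled FROM trials",
                      "SELECT site, enrolled FROM dwh_trials")
print(diff)  # flags the duplicated Leeds row as an "Extra Records" defect
```

Using `Counter` rather than `set` matters here: a set-based diff would silently dedupe and miss the extra-record defect.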
QuerySurge™ Architecture
(Architecture diagram showing QuerySurge connected to the data Sources and the Target)
Case Study
Fortune 500 firm, Clinical Trial Data
Case Study: Fortune 500 Pharma
Challenge
How can a Data Warehouse team assure data integrity over multiple builds when the cost per patient data of Phase 3 clinical studies exceeds $26,000 and the volume of live case data is > 1 TB?
Strategy
Implement QuerySurge™ to dramatically increase coverage of data that is verified for each build.
Implementation
• 1,000 SQL queries written to compare case data from the source systems to the DWH after ETL.
• QuerySurge™ automated the scheduling, test runs, comparisons and reporting for each build.
Metrics
500 mappings
2.5 million data items
1.25 billion verifications
Complete run finished in 7 days
45% of data was covered
14 builds were deployed
115 defects were discovered and remediated
Case Study: Fortune 500 Pharma
Benefits
• 10-fold increase in the speed of testing
• Huge increase in coverage of data (from less than 1/10% to 45%)
• Production defects discovered that were missed in previous cycles
• Huge savings on clean records (115 defects x $26,000/record)
• A huge time savings (3.6 years x 10 people)
• Avoidance of lawsuits and FDA fines
QuerySurge™: DEMO
How to Find Bad Data while Meeting Regulatory Requirements
www.QuerySurge.com
Questions and Answers