Octagon Research Solutions, Inc.

Leading the Electronic Transformation of Clinical R&D

© 2009 Octagon Research Solutions, Inc. All Rights Reserved.

Data Profiling

Octagon Research Solutions

Metadata Profiling

• Metadata (structure)
  – Likeness of nomenclature among study databases
  – Answer some planning questions:
    • Claim: “The studies are 90% identical.” Are they?
    • If they indeed are, can you create pool(s) of source data to gain efficiency?

Not our main focus today
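A claim such as “the studies are 90% identical” can be checked by comparing variable names dataset by dataset. The following is a minimal Python sketch, assuming each study's metadata is available as a mapping from dataset name to variable names. The function name, the scoring rule (Jaccard index), and the sample variable lists are all hypothetical, not taken from the case study.

```python
def metadata_overlap(study_a, study_b):
    """Return the share of (dataset, variable) pairs common to both studies.

    Each study is a dict: dataset name -> iterable of variable names.
    The score is the Jaccard index: |A & B| / |A | B|.
    """
    pairs_a = {(ds.upper(), var.upper()) for ds, vars_ in study_a.items() for var in vars_}
    pairs_b = {(ds.upper(), var.upper()) for ds, vars_ in study_b.items() for var in vars_}
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # two empty studies are trivially identical
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical metadata for two studies (illustration only)
study_1 = {"CHEM": ["SUBJID", "LPARM", "LVALUE", "LUNIT"], "AEV": ["SUBJID", "START", "STOP"]}
study_2 = {"CHEM": ["SUBJID", "LPARM", "LVALUE"], "AEV": ["SUBJID", "START", "STOP", "SEV"]}

overlap = metadata_overlap(study_1, study_2)
print(f"Structural overlap: {overlap:.0%}")
```

A score well below the claimed figure suggests the studies should not be pooled without a closer look.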

Data Profiling

• Data (content)
  – Statistics, e.g., min, max, average
  – Relationship
  – Pattern

Fact: Data are often “bad, worse, or ugly”

Goal: Get a realistic pulse on quality of the data

Case Study (“Slightly” Altered for Illustration Purposes)

• Background
  – Central lab, i.e., eDT
    • CHEM for biochemistry (20807 records), along with 4 other labs
  – No annotated CRF
    • Mapping document initially authored using variable label

Case Study (cont’d)

• Sponsor decisions:
  – Match standard results with original results, i.e., no unit conversion; therefore, LBSTRESC = LBORRES
  – LPARM to (LBTEST and LBTESTCD) will be done through a sponsor-supplied lookup table

Easy enough, right?

High-level mapping based on source dataset metadata
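The high-level mapping implied by the sponsor decisions can be sketched in a few lines. This is a minimal Python illustration, not the actual mapping program: the lookup entries are hypothetical stand-ins for the sponsor-supplied table, and it assumes CHEM.LVALUE is the source of LBORRES.

```python
# Hypothetical stand-in for the sponsor-supplied lookup table:
# LPARM -> (LBTESTCD, LBTEST)
LOOKUP = {
    "GLU": ("GLUC", "Glucose"),
    "ALT": ("ALT", "Alanine Aminotransferase"),
}


def map_chem_record(record, lookup=LOOKUP):
    """Map one CHEM source record (a dict) to LB variables."""
    testcd, test = lookup[record["LPARM"]]  # a KeyError flags an unmapped LPARM
    return {
        "LBTESTCD": testcd,
        "LBTEST": test,
        "LBORRES": record["LVALUE"],   # assumption: LVALUE feeds LBORRES
        "LBSTRESC": record["LVALUE"],  # no unit conversion: LBSTRESC = LBORRES
    }


lb = map_chem_record({"LPARM": "GLU", "LVALUE": "5.4"})
print(lb)
```

Nothing in this mapping looks at the content of LVALUE, which is exactly why it appears easy on paper.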

Case Study (cont’d)

• Programmer noticed errors
  – LBSTRESN is a numeric variable, but CHEM.LVALUE contains non-numeric data
• Programmer determined that the mapping specifications document was not detailed enough and began to involve the analyst

Case Study (cont’d)

• Let’s look at some options at their disposal (novice to veteran):
  – SAS System Viewer
  – A creative method by an Excel-savvy user
  – SAS PROC FREQ

Case Study (cont’d)

• SAS System Viewer
  – Read-only, great for displaying data
  – Unreliable as a data browser
• Analyze data in Excel
  – Very manual
  – Changes of data ownership, possible “lost in translation”?
    • “Smart” behaviors, e.g., “01JAN2009 12:00” to “1/1/2009 12:00:00 PM”, auto-trimming, etc.
• SAS PROC FREQ
  – CHEM.LVALUE: 20807 records reduced to 1237 unique values
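For readers without SAS at hand, the frequency-count idea behind PROC FREQ can be sketched in Python. This is only an analogue of the technique, not the SAS procedure itself, and the sample values are hypothetical.

```python
from collections import Counter


def value_frequencies(values):
    """Count occurrences of each distinct value, most frequent first."""
    return Counter(values).most_common()


# Hypothetical LVALUE content (illustration only)
sample = ["5.4", "5.4", "<0.1", "7", "5.4", "7"]

freq = value_frequencies(sample)
for value, count in freq:
    print(f"{value!r}: {count}")
```

A frequency table shrinks the review list, but every distinct value still has to be read by eye, which is the limitation the next option addresses.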

Case Study (cont’d)

• 4th option
  – A data pattern analyzer

Case Study (cont’d)

– Reduced 20807 records to only 11 patterns

Aha, we found the needle in the haystack! 0.3% of LVALUE is not numeric.
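The slides do not describe how the pattern analyzer masks values, so the following Python sketch rests on an assumption: every run of digits becomes 9, every run of letters becomes A, and all other characters are kept. The sample values are hypothetical.

```python
import re
from collections import Counter


def value_pattern(value):
    """Mask a value: digit runs -> '9', letter runs -> 'A', other characters kept."""
    masked = re.sub(r"[0-9]+", "9", value)
    return re.sub(r"[A-Za-z]+", "A", masked)


def pattern_frequencies(values):
    """Count how many values share each pattern, most frequent first."""
    return Counter(value_pattern(v) for v in values).most_common()


# Hypothetical LVALUE content (illustration only)
sample = ["5.4", "12.30", "7", "<0.1", "118", "QNS"]

patterns = pattern_frequencies(sample)
for pattern, count in patterns:
    print(f"{pattern}: {count}")
```

Because many distinct values share one shape, the pattern list is far shorter than the frequency table, and any shape other than a plain number stands out immediately.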

Case Study (cont’d)

– Drilled down to the actual values with non-numeric data patterns

Through issue/resolution with the sponsor, added detailed instructions for LVALUE to accommodate the non-numeric values
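The drill-down and one possible resolution can be sketched as follows. The actual instructions agreed with the sponsor are not given in the slides, so the rule below is an assumption: keep the text in LBSTRESC and populate LBSTRESN only when the value is a plain decimal number. Sample values are hypothetical.

```python
import re

# Plain decimal number: optional sign, digits, optional fraction
NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def non_numeric_values(values):
    """Return the distinct non-numeric values in first-seen order."""
    seen = {}
    for value in values:
        if not NUMERIC.fullmatch(value.strip()):
            seen.setdefault(value, None)
    return list(seen)


def derive_results(lvalue):
    """Return (LBSTRESC, LBSTRESN); LBSTRESN is None when LVALUE is not numeric.

    Assumed rule for illustration, not the sponsor's actual instruction.
    """
    text = lvalue.strip()
    number = float(text) if NUMERIC.fullmatch(text) else None
    return text, number


# Hypothetical LVALUE content (illustration only)
sample = ["5.4", "<0.1", "7", "QNS", "<0.1"]

flagged = non_numeric_values(sample)
print(flagged)
print([derive_results(v) for v in sample])
```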

Another Data Pattern Example #1

• Source: Character variable AEV.STOP (AE stop date), being mapped to AE

• Realized source is “somewhat” a free-form field
  – Critical data point, must handle case-by-case using regular expression (regex) technique
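A case-by-case regex approach might look like the Python sketch below. The input shapes (DDMONYYYY, MONYYYY, YYYY) are hypothetical, since the slides do not list the patterns actually found in AEV.STOP, and the ISO 8601 output assumes the target is a date/time variable such as AEENDTC. Anything unmatched is returned as None for manual review.

```python
import re

MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

# Hypothetical input shapes, tried from most to least complete
FULL = re.compile(r"(\d{1,2})[\s/-]?([A-Za-z]{3})[\s/-]?(\d{4})")  # 01JAN2009
MON_YEAR = re.compile(r"([A-Za-z]{3})[\s/-]?(\d{4})")              # JAN2009
YEAR = re.compile(r"(\d{4})")                                      # 2009


def to_iso8601(raw):
    """Convert a free-form stop date to ISO 8601, or None if unrecognized.

    The day is not validated against the month (e.g. 31FEB passes through).
    """
    text = raw.strip().upper()
    m = FULL.fullmatch(text)
    if m and m.group(2) in MONTHS:
        return f"{m.group(3)}-{MONTHS[m.group(2)]:02d}-{int(m.group(1)):02d}"
    m = MON_YEAR.fullmatch(text)
    if m and m.group(1) in MONTHS:
        return f"{m.group(2)}-{MONTHS[m.group(1)]:02d}"
    m = YEAR.fullmatch(text)
    if m:
        return m.group(1)
    return None


for raw in ["01JAN2009", "1-Feb-2009", "MAR 2009", "2009", "ONGOING"]:
    print(raw, "->", to_iso8601(raw))
```

Partial dates keep only the components that are present, so no day or month is invented for a critical data point.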

Another Data Pattern Example #2

• Source: Character variable DOSE.DOSE_ACT (Actual dose), being mapped to EX

• Realized source does not always contain numbers
  – Used both EX.EXDOSE and EX.EXDOSTXT
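Using both variables amounts to routing each source value by its shape. The Python sketch below assumes a simple rule that is not spelled out in the slides: a plain number goes to EXDOSE, anything else goes to EXDOSTXT. The sample doses are hypothetical.

```python
import re

# Unsigned decimal number, e.g. "50" or "2.5"
NUMERIC = re.compile(r"\d+(\.\d+)?")


def split_dose(dose_act):
    """Return (EXDOSE, EXDOSTXT); at most one of the two is populated.

    Assumed routing rule for illustration only.
    """
    text = dose_act.strip()
    if not text:
        return None, None       # blank source: populate neither variable
    if NUMERIC.fullmatch(text):
        return float(text), None
    return None, text


# Hypothetical DOSE_ACT content (illustration only)
for raw in ["50", "2.5", "10-20", "NOT DONE", ""]:
    print(repr(raw), "->", split_dose(raw))
```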

Wrapping Up

• Integrated data profiling – a tool demo

• The bigger picture:
  – Data rules (e.g., pre-defined business rules, data standards, etc.)
  – Data corrections

• Although ETL is a solution platform for CDISC SDTM data conversion, too much of it is a symptom of a problem

Thank you!

Anthony Chow

[email protected]

(610) 535-6500 x5526