Project Lead: Jyotishman Pathak, PhD PI: Christopher G. Chute, MD, DrPH

Strategic Health IT Advanced Research Projects (SHARP)
Area 4: Secondary Use of EHR Data
Project 3: High-Throughput Phenotyping
Project Lead: Jyotishman Pathak, PhD
PI: Christopher G. Chute, MD, DrPH

June 12, 2012

SHARPn High-Throughput Phenotyping

Electronic health records (EHRs) driven phenotyping

• Overarching goal
  • To develop high-throughput automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings

©2012 MFMER | slide-2


Current HTP project themes

• Standardization of phenotype definitions

• Library of phenotyping algorithms

• Phenotyping workbench

• Machine learning techniques for phenotyping

• Just-in-time phenotyping



Algorithm Development Process - Modified

[Diagram: Data, Transform, Transform, Phenotype Algorithm, Visualization, Evaluation; NLP, SQL; Rules; Mappings; Semi-Automatic Execution]

• Standardized representation of clinical data

• Create new and re-use existing clinical element models (CEMs)

• Standardized and structured representation of phenotype definition criteria

• Use the NQF Quality Data Model (QDM)

• Conversion of structured phenotype criteria into executable queries

• Use JBoss® Drools (DRLs)

[Welch et al. 2012] [Thompson et al., submitted 2012]

[Li et al., submitted 2012]


NQF Quality Data Model (QDM)

• Standard of the National Quality Forum (NQF)

• A structure and grammar to represent quality measures in a standardized format

• Groups of codes in a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"

• Supports temporality & sequences
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"

• Implemented as set of XML schemas

• Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm etc.)
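The two QDM ideas on this slide, value-set membership and a temporal constraint, can be illustrated with a minimal Python sketch. This is not the QDM XML format or any NQF tooling: the `ClinicalEvent` class, both helper functions, and the member codes are hypothetical, and only the value set OID is taken from the slide. The temporal check reads "> 1 year(s) starts before or during" as "the event starts more than 365 days before the reference date", which is an assumption about the intended semantics.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical value set: the OID is from the slide, the member codes are placeholders.
STEROID_INDUCED_DIABETES = {
    "oid": "2.16.840.1.113883.3.464.0001.113",
    "codes": {"CODE-A", "CODE-B"},  # placeholder codes, not real ICD-9 entries
}


@dataclass
class ClinicalEvent:
    category: str  # e.g. "Diagnosis, Active" or "Procedure, Performed"
    code: str
    start: date


def in_value_set(event, category, value_set):
    """True when the event has the given category and its code is in the value set."""
    return event.category == category and event.code in value_set["codes"]


def starts_more_than_before(event, reference, days):
    """True when the event starts more than `days` days before `reference`."""
    return reference - event.start > timedelta(days=days)


measurement_end = date(2011, 12, 31)
dx = ClinicalEvent("Diagnosis, Active", "CODE-A", date(2010, 3, 1))
exam = ClinicalEvent("Procedure, Performed", "EYE-EXAM", date(2010, 6, 15))

has_dx = in_value_set(dx, "Diagnosis, Active", STEROID_INDUCED_DIABETES)
exam_over_a_year_old = starts_more_than_before(exam, measurement_end, 365)
```

Keeping the value set as data and the temporal operator as a separate function mirrors the slide's split between groups of codes and temporality.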



116 Meaningful Use Phase I Quality Measures


Example: Diabetes & Lipid Mgmt. - I


Human readable HTML


Example: Diabetes & Lipid Mgmt. - II


Computable XML





Drools-based Phenotyping Architecture

[Diagram: Clinical Element Database; Data Access Layer; Transformation Layer; Business Logic; Inference Engine (Drools); Service for Creating Output (File, Database, etc.); List of Diabetic Patients]

Transform physical representation → Normalized logical representation (Fact Model)
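The layering in this architecture can be sketched in a few lines of Python. Drools itself is a Java rule engine, so this is only an illustration of the data flow and not the project's implementation: every function name, the `Fact` class, and the sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """Normalized logical representation of one clinical observation."""
    patient_id: str
    kind: str
    code: str


def data_access_layer():
    """Stand-in for reading physical rows from the clinical element database."""
    return [
        ("p1", "dx", "DIABETES"),
        ("p2", "dx", "ASTHMA"),
        ("p3", "dx", "DIABETES"),
    ]


def transformation_layer(rows):
    """Turn physical rows into the fact model used by the rules."""
    return [Fact(pid, kind, code) for pid, kind, code in rows]


def has_diabetes_dx(patient_facts):
    """Example rule: the patient has at least one diabetes diagnosis fact."""
    return any(f.kind == "dx" and f.code == "DIABETES" for f in patient_facts)


def inference_engine(facts, rules):
    """Return the sorted IDs of patients whose facts satisfy every rule."""
    by_patient = {}
    for fact in facts:
        by_patient.setdefault(fact.patient_id, []).append(fact)
    return sorted(
        pid for pid, patient_facts in by_patient.items()
        if all(rule(patient_facts) for rule in rules)
    )


def output_service(patient_ids):
    """Stand-in for writing the result to a file or database."""
    return "\n".join(patient_ids)


facts = transformation_layer(data_access_layer())
diabetic_patients = inference_engine(facts, [has_diabetes_dx])
report = output_service(diabetic_patients)
```

Because the rules only see `Fact` objects, the physical storage can change without touching the phenotype logic, which is the point of the transformation layer.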


Automatic translation from NQF QDM criteria to Drools



The “executable” Drools flow



Phenotype library and workbench - I

1. Converts QDM to Drools
2. Rule execution by querying the CEM database
3. Generate summary reports

http://phenotypeportal.org


Phenotype library and workbench - II

http://phenotypeportal.org


Phenotype library and workbench - III


Machine learning and HTP - I

• Machine learning and association rule mining
  • Manual creation of algorithms takes time
  • Let computers do the “hard work”
  • Validate against expert-developed ones


[Caroll et al. 2011]


Machine learning and HTP - II

• Origins from sales data
• Items (columns): co-morbid conditions
• Transactions (rows): patients
• Itemsets: sets of co-morbid conditions
• Goal: find all itemsets (sets of conditions) that frequently co-occur in patients.
  • One of those conditions should be DM.

• Support: # of transactions the itemset I appeared in
  • Support({TB, DLM, ND})=3

• Frequent: an itemset I is frequent, if support(I)>minsup

Patient TB DLM ND … IEC

001 Y Y Y Y

002 Y Y Y Y

003 Y Y

004 Y

005 Y Y Y

X: infrequent

[Simon et al. 2012]
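The support and frequent-itemset definitions above can be made concrete with a small brute-force Python sketch. The extracted table no longer shows which columns the "Y" marks for patients 003 and 004 belong to, so their placement below is an assumption, chosen so that Support({TB, DLM, ND}) = 3 as stated on the slide. The function names and the `required` parameter are illustrative, and this exhaustive enumeration is not the mining algorithm used in the cited work.

```python
from itertools import combinations

# Conditions per patient. Rows 003 and 004 are assumed placements.
transactions = {
    "001": {"TB", "DLM", "ND", "IEC"},
    "002": {"TB", "DLM", "ND", "IEC"},
    "003": {"TB", "DLM"},          # assumption
    "004": {"TB"},                 # assumption
    "005": {"TB", "DLM", "ND"},
}


def support(itemset, transactions):
    """Number of transactions (patients) that contain every item in the itemset."""
    items = set(itemset)
    return sum(1 for conditions in transactions.values() if items <= conditions)


def frequent_itemsets(transactions, minsup, required=None):
    """Enumerate all itemsets with support > minsup, optionally containing `required`."""
    all_items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(all_items) + 1):
        for combo in combinations(all_items, size):
            if required is not None and required not in combo:
                continue
            count = support(combo, transactions)
            if count > minsup:
                result[frozenset(combo)] = count
    return result


tb_dlm_nd_support = support({"TB", "DLM", "ND"}, transactions)
frequent = frequent_itemsets(transactions, minsup=2)
frequent_with_nd = frequent_itemsets(transactions, minsup=2, required="ND")
```

The `required` argument plays the role of the slide's constraint that one condition in each itemset should be DM; the toy data has no DM column, so `ND` stands in for it here.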

Electronic Health Records and Phenomics

Just-in-Time phenotyping - I

Transfusion-related Acute Lung Injury (TRALI)
Transfusion-associated Circulatory Overload (TACO)


Just-in-Time phenotyping - II


TRALI/TACO “sniffer”



Active Surveillance for TRALI and TACO

Of the 88 TRALI cases correctly identified by the CART algorithm, only 11 (12.5%) of these were reported to the blood bank by the clinical service.

Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.
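The reporting percentages above follow directly from the counts. A short Python check, using only the figures on this slide (the function name is illustrative):

```python
def reporting_rate(reported, identified):
    """Percentage of algorithm-identified cases that were reported to the blood bank."""
    return 100.0 * reported / identified


trali_rate = reporting_rate(11, 88)   # TRALI: 11 of 88 reported
taco_rate = reporting_rate(5, 45)     # TACO: 5 of 45 reported

# Cases the algorithm identified that were never reported by the clinical service.
trali_unreported = 88 - 11
taco_unreported = 45 - 5
```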


Publications to date (conservative)

[Bar chart. X-axis: Year 1 (2011), Year 2 (2012), Year 3 (2013). Y-axis: 0 to 14. Series: Papers, Abstracts, Under review. Data labels: 8, 66, 2, 12]



2011 Milestones

• Standardized definitions for phenotype criteria

• Rules-based environment for phenotype algorithm execution

• National library for standardized phenotype definitions (collaboration with eMERGE)

• Machine learning techniques for algorithm definitions

• Online, real-time phenotype execution

• Phenotyping algorithm authoring environment



2012 Milestones

• Machine learning techniques for algorithm definitions

• Online, real-time phenotype execution

• Collaboration with NQF, Query Health and i2b2 infrastructures

• Use cases and demonstrations
  • MU quality metrics (w/ NQF, Query Health)
  • Cohort identification (w/ eMERGE, PGRN)
  • Value analysis (w/ Mayo CSHCD, REP)
  • Clinical trial alerting (w/ Mayo Cancer Ctr./CTSA)



Project 3: Collaborators & Acknowledgments

• CDISC (Clinical Data Interchange Standards Consortium)
  • Rebecca Kush, Landen Bain

• Centerphase Solutions
  • Gary Lubin, Jeff Tarlowe

• Group Health Seattle
  • David Carrell

• Harvard University/MIT
  • Guergana Savova, Peter Szolovits

• Intermountain Healthcare/University of Utah
  • Susan Welch, Herman Post, Darin Wilcox, Peter Haug

• Mayo Clinic
  • Cory Endle, Rick Kiefer, Sahana Murthy, Gopu Shrestha, Dingcheng Li, Gyorgy Simon, Matt Durski, Craig Stancl, Kevin Peterson, Cui Tao, Lacey Hart, Erin Martin, Kent Bailey, Scott Tabor, Chris Chute

