Guoqian Jiang, PhD - Quality Evaluation of Cancer Study
Transcript of Guoqian Jiang, PhD - Quality Evaluation of Cancer Study
8/7/2019 Guoqian Jiang, PhD - Quality Evaluation of Cancer Study
http://slidepdf.com/reader/full/guoqian-jiang-phd-quality-evaluation-of-cancer-study 1/25
Quality Evaluation of Cancer Study Common Data Elements
Using the UMLS Semantic Network
Guoqian Jiang, PhD, Harold R. Solbrig,
Christopher G. Chute, MD, DrPH
Division of Biomedical Statistics and Informatics,
Department of Health Sciences Research,
Mayo Clinic College of Medicine, Rochester, MN, 55905
AMIA CRI Summit 2011. March 10, 2011. San Francisco, CA
Introduction
• Semantic interoperability among terminologies, data elements, and information models is fundamental and critical for sharing information.
• Consistent use of controlled terminology is essential to support efficient, end-to-end data flows, including the aggregation and analysis of large data sets as well as timely response to important clinical events.
NCI Cancer Common Ontologic Representation Environment (caCORE)
Komatsoulis GA, et al. caCORE version 3: Implementation of a model driven, service-oriented architecture for semantic interoperability. J Biomed Inform. 2008; 41(1):106-23.
ISO/IEC 11179 Data Element Structure
Issues
• The potential of the binding has not yet been fully explored.
• There is a very limited toolbox at present for quality assurance (QA) of meta-data registered in a repository such as the caDSR.
The UMLS Semantic Network (SN)
• aims to provide a consistent categorization of all concepts represented in the UMLS Metathesaurus
• and to provide a set of useful relationships between these concepts
• It has been widely used in terminology quality assurance, structure validation, and new relationship discovery
Objective
• To explore the role of terminological annotations in quality evaluation for the caDSR CDEs.
• We profiled the terminological concepts associated with the standard structure of the caDSR CDEs using the UMLS SN.
The linkage between the data element
constructs and the UMLS SN
Data Collection
• We accessed the caDSR CDE Browser
• Root node "caDSR Contexts"
• Workflow Status "RELEASED"
• We extracted mappings between NCIt codes, UMLS Concept Unique Identifiers (CUIs), and semantic types
• Data file "MRSTY.RRF" from NCI Metathesaurus (NCIM) version 200904D
• Data file of the mappings
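As a rough illustration of the semantic type extraction, the sketch below reads rows in the pipe-delimited MRSTY.RRF layout. It assumes the standard UMLS column order (CUI|TUI|STN|STY|ATUI|CVF|); the function name parse_mrsty and the sample rows are made up for this example and are not taken from the study.

```python
from collections import defaultdict


def parse_mrsty(lines):
    """Map each CUI to the set of (TUI, semantic type name) pairs in MRSTY rows."""
    cui_to_types = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("|")
        # Assumed column order: CUI|TUI|STN|STY|ATUI|CVF| (with a trailing pipe)
        cui, tui, sty = fields[0], fields[1], fields[3]
        cui_to_types[cui].add((tui, sty))
    return dict(cui_to_types)


# Made-up rows in MRSTY layout, for illustration only
sample_rows = [
    "C0000001|T061|B1.3.1.3|Therapeutic or Preventive Procedure|AT00000001||",
    "C0000002|T033|A2.2|Finding|AT00000002||",
    "C0000002|T184|A2.2.2|Sign or Symptom|AT00000003||",
]
mrsty = parse_mrsty(sample_rows)
```

With a real file, the same function could be fed an open file handle, since it only iterates over lines.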
Data Processing
• Extract the data element concept annotations.
• Link the NCIt concept annotations with the semantic types
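The linking step can be pictured as a two-stage lookup: NCIt code to CUI, then CUI to semantic types. The sketch below is only an assumption about how such a join might look; link_semantic_types, the dictionaries, and the placeholder codes (NCIT_A, CUI_1, and so on) are hypothetical.

```python
def link_semantic_types(annotations, ncit_to_cui, cui_to_tuis):
    """Attach semantic type identifiers (TUIs) to each (category, NCIt code) annotation."""
    linked = []
    for category, ncit_code in annotations:
        cui = ncit_to_cui.get(ncit_code)
        # Codes without a CUI mapping keep an empty list rather than being dropped
        tuis = sorted(cui_to_tuis.get(cui, ())) if cui else []
        linked.append((category, ncit_code, tuis))
    return linked


# Placeholder data, for illustration only
annotations = [("ObjectClass", "NCIT_A"), ("Property", "NCIT_B"), ("Property", "NCIT_C")]
ncit_to_cui = {"NCIT_A": "CUI_1", "NCIT_B": "CUI_2"}
cui_to_tuis = {"CUI_1": {"T061"}, "CUI_2": {"T080", "T033"}}

linked = link_semantic_types(annotations, ncit_to_cui, cui_to_tuis)
```

Keeping unmapped codes in the output makes it possible to count how many annotations could not be profiled.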
Profiling analysis and evaluation
• We then calculated the frequency of the semantic types for the object class concepts (category ObjectClass) and the property concepts (category Property).
• To distinguish the category specific semantic type group, we ranked the semantic types for each category by setting the filtering criteria,
• i.e. frequency greater than 100 and ratio of the frequency between the two categories greater than 2.
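The filtering criteria above can be sketched as follows. This is a minimal illustration, not the study's code: the function name, the toy counts, and the treatment of a semantic type that is absent from the other category (counted as passing the ratio test) are all assumptions.

```python
from collections import Counter


def category_specific_types(object_class_tuis, property_tuis, min_freq=100, min_ratio=2.0):
    """Return (ObjectClass-specific TUIs, Property-specific TUIs), ranked by frequency."""
    oc_counts = Counter(object_class_tuis)
    pr_counts = Counter(property_tuis)

    def dominant(own, other):
        selected = []
        for tui, freq in own.most_common():  # most frequent first
            # freq / other > min_ratio, written as a product to avoid dividing by zero
            if freq > min_freq and freq > min_ratio * other.get(tui, 0):
                selected.append(tui)
        return selected

    return dominant(oc_counts, pr_counts), dominant(pr_counts, oc_counts)


# Toy frequencies, for illustration only
object_class_tuis = ["T061"] * 250 + ["T033"] * 120 + ["T080"] * 30
property_tuis = ["T080"] * 300 + ["T033"] * 110 + ["T061"] * 40

oc_profile, pr_profile = category_specific_types(object_class_tuis, property_tuis)
```

In this toy input, T033 is frequent in both categories but fails the ratio test in each, so it ends up in neither profile.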
Profiling analysis and evaluation
• We then isolated the set of Object Class and Property concepts that did not fit into the resulting profile and
• performed a preliminary evaluation on a small sample of these to determine
• whether, in fact, these elements may have been misclassified and,
• by inference, whether the category specific semantic type might be a useful auditing tool for data element curation.
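One possible reading of "did not fit into the resulting profile" is that a concept used in one category carries a semantic type that is specific to the other category. The sketch below implements that reading only; flag_for_audit, the placeholder codes, and the profile sets are hypothetical, and the actual selection rule used in the study may differ.

```python
def flag_for_audit(concepts, other_profile):
    """Return codes of concepts carrying a semantic type specific to the opposite category."""
    flagged = []
    for code, tuis in concepts:
        if set(tuis) & other_profile:
            flagged.append(code)
    return flagged


# Placeholder data: Property concepts checked against an assumed ObjectClass profile
object_class_profile = {"T061"}
property_concepts = [
    ("NCIT_P1", ["T080"]),
    ("NCIT_P2", ["T061"]),           # procedure-typed concept used as a property
    ("NCIT_P3", ["T033", "T061"]),
]

to_review = flag_for_audit(property_concepts, object_class_profile)
```

The flagged codes are candidates for manual review, not confirmed misclassifications.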
Results
• In total, there are 42,426 data elements registered in the caDSR database as of February 1, 2010.
• Of them, 17,798 data elements have a workflow status "RELEASED" while 17,526 primary object class/property concept pairs were identified.
• Of the pairs, there are 6,625 distinct pairs, comprising 1,801 distinct object class concepts and 1,759 property concepts.
Profiling by semantic types
Object Class
Property
20 sample data elements
From T061 (Therapeutic or Preventive Procedure)
18 sample data elements
From the other 9 semantic types
in category ObjectClass
Discussion
• The dominant semantic types can be used to trigger an auditing process for the curation of the CDEs.
• Our preliminary evaluation results validated the observation that a data element whose semantic annotation did not observe the principle of disjointness had a high probability of having issues with its modeling and curation.
Lack of constraints in ISO/IEC 11179 standard
Disjointness Principle
• Upper level ontologies
• the Basic Formal Ontology (BFO) and Relation Ontology by B. Smith,
• or the four-category ontology by E.J. Lowe
Constraint Example by Basic Formal Ontology
• Linking to the ISO/IEC 11179 model, a constraint can be made like “an object class concept has to be an independent continuant or processual entity whereas a property concept cannot be such an entity”.
• Accordingly, it would be ideal if the structure of the UMLS SN could follow the disjointness principles defined in the BFO.
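The quoted constraint can be expressed as a simple check, sketched below. It assumes each concept has already been assigned an upper-level kind; that assignment, the function name violates_constraint, and the kind labels used as strings are assumptions made for this example.

```python
# Upper-level kinds that the quoted constraint allows for object class concepts
ALLOWED_FOR_OBJECT_CLASS = {"independent continuant", "processual entity"}


def violates_constraint(category, upper_level_kind):
    """True if a concept's upper-level kind conflicts with its ISO/IEC 11179 category."""
    is_object_like = upper_level_kind in ALLOWED_FOR_OBJECT_CLASS
    if category == "ObjectClass":
        return not is_object_like   # object classes must be one of the allowed kinds
    if category == "Property":
        return is_object_like       # properties must not be one of those kinds
    raise ValueError(f"unknown category: {category}")


# Illustrative checks with hypothetical kind assignments
ok_object = violates_constraint("ObjectClass", "independent continuant")
bad_property = violates_constraint("Property", "processual entity")
ok_property = violates_constraint("Property", "dependent continuant")
```

Because the two sets of allowed kinds are disjoint, any concept used in both categories would violate the constraint in at least one of them.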
Contributing Factors of Misclassification Issue
• Certain structural problems of the UMLS SN itself may cause false positive results.
• The current content distribution of the meta-data repository may represent only a portion of cancer study domains, so the dominant semantic types identified in this study may not be complete.
Summary
• The UMLS SN based profiling approach is feasible for the quality assurance of the cancer study CDEs.
• We consider that this approach could provide useful insight into how to build mechanisms of quality assurance in a meta-data repository, and would be useful for semantic infrastructure development in the next generation of the NCI caDSR.
Acknowledgement
• This study is supported in part by the NCI caBIG Vocabulary Knowledge Center.