Automating the formalization of clinical guidelines using information extraction
Automating the formalization of clinical guidelines using information extraction: an overview of recent lexical approaches
05 August 2011
Phil Gooch
Centre for Health Informatics
City University, London UK
Clinical guidelines
• Contain recommendations for best practice based on systematic
reviews of clinical evidence, consensus statements and expert opinion.
• Goal is to reduce variation in medical care by promoting the most
effective treatments, and to provide a means of quality control in clinical
practice via audit
• Produced by a variety of organizations (e.g. NICE, RCP, SIGN) in a
variety of document formats usually not conducive to use at the point of
care.
Clinical decision support (CDS)
• Aims to provide diagnostic and treatment recommendations and
advice at the point of care, i.e. information tailored for the specific
patient under consideration by the clinician during a consultation
• CDS systems require a knowledge base (KB), usually derived from
guidelines, consisting of declarative knowledge (penicillin is-a
antibiotic) and procedural (if…then) rules, and some sort of electronic
patient record system (EPR)
Computer-interpretable guidelines
• Early systems ‘computerized’ guidelines by making them available ‘on
the computer’, e.g. as HTML or PDF
• Did not lead to improved guideline compliance or use!
• To standardize the format of the knowledge-base, ease development
of CDS, and to improve guideline use at the point of care, a number of
formalisms for representing guidelines have been developed
Computer-interpretable guidelines (CIGs)
Rule-based: ‘if ... then’, e.g. Arden Syntax for individual clinical decisions
    LET Last_HgA1C BE READ LATEST {"HgA1C Value"};
    LET Diabetic_Patient BE READ LATEST {"Problem: Diabetes"};
    if Diabetic_Patient
       and Last_HgA1C Occurred not within past 6 months
       and Last_HgA1C is less than or equal 7
    then conclude true;
Document based, e.g. GEM, for complete guideline documents in XML
OO expression query languages, e.g. GELLO:
    observation.code == ‘SBP’ AND observation.value > 140 AND assessment.code == ‘LVF’
Task-network models (TNM), e.g. GLIF, Asbru, PROforma, for workflow-like modelling of decisions over time
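The logic of the Arden Syntax rule above can be sketched in Python to show what a rule-based CIG evaluates at run time. This is only an illustration: the function name and arguments are hypothetical, the patient data are passed in directly rather than read from an EPR, and "6 months" is approximated as 183 days.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "6 months" is approximated as 183 days for this sketch.
SIX_MONTHS = timedelta(days=183)


def hba1c_rule_fires(is_diabetic: bool,
                     last_value: Optional[float],
                     last_measured: Optional[date],
                     today: date) -> bool:
    """Mirror the rule: diabetic, last HbA1c older than 6 months, value <= 7."""
    if not is_diabetic or last_value is None or last_measured is None:
        return False
    not_recent = (today - last_measured) > SIX_MONTHS
    return not_recent and last_value <= 7.0
```

A real Arden Syntax engine would resolve the curly-brace expressions against the institution's own data store; here that step is replaced by plain function arguments.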
Formalization of guidelines into a CIG model
• Declarative: Mapping clinical concepts in the guideline to terms within a
controlled vocabulary (e.g. UMLS) or ‘virtual medical record’
• Procedural: Identification and extraction of eligibility criteria, clinical
actions (tests, treatment regimes, referrals), temporal constraints and
if…then decision rules
• Translation to a formal model, e.g. PROforma, GLIF, Asbru
• Time-consuming, iterative, manual process as the guideline text tends to
assume background knowledge, is incomplete or contains ambiguity and
vague terms
Example CIG fragment (Asbru)
<plan name="Doxycycline : 100 mg orally twice a day for 7 days" plan_id="plan52769441">
  <cyclical_plan plan_id="plan5675512">
    <frequency value="12" unit="hour"/>
  </cyclical_plan>
  <duration>
    <min value="7" unit="day"/>
    <max value="7" unit="day"/>
  </duration>
</plan>
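To illustrate the text-to-model step, the sketch below derives the frequency and duration elements of such a fragment from the prescription string. It is a minimal example under stated assumptions: the phrase lookup table and the function name are hypothetical, the plan_id attributes are omitted, and the output is Asbru-like rather than validated Asbru.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lookup: hours between doses for a few frequency phrases.
FREQUENCY_HOURS = {"once a day": 24, "twice a day": 12, "three times a day": 8}


def plan_from_text(text: str) -> ET.Element:
    """Build an Asbru-like <plan> element from a simple prescription string."""
    freq = next((p for p in FREQUENCY_HOURS if p in text), None)
    dur = re.search(r"for (\d+) days?", text)
    if freq is None or dur is None:
        raise ValueError(f"cannot parse frequency/duration from: {text!r}")
    plan = ET.Element("plan", name=text)
    cyclical = ET.SubElement(plan, "cyclical_plan")
    ET.SubElement(cyclical, "frequency",
                  value=str(FREQUENCY_HOURS[freq]), unit="hour")
    duration = ET.SubElement(plan, "duration")
    # A fixed-length course has identical minimum and maximum durations.
    for bound in ("min", "max"):
        ET.SubElement(duration, bound, value=dur.group(1), unit="day")
    return plan


plan = plan_from_text("Doxycycline : 100 mg orally twice a day for 7 days")
print(ET.tostring(plan, encoding="unicode"))
```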
Examples of vague guideline statements
Underspecification:
• Avoid the use of highly intensive management strategies to achieve
an HbA1c level less than 6.5% (48 mmol/mol)
• Monitor HbA1c every 2–6 months (according to individual need) until it is stable on unchanging treatment
Qualitative terms requiring mapping to numeric values or ranges:
• The moderate use of alcohol may increase HDL-cholesterol
• If blood pressure remains uncontrolled on adequate doses of three drugs, consider adding a fourth and/or seeking expert advice
Information extraction for guideline formalization
• Helpful to automate:
- Knowledge base construction: text to formal model translation
- Identification of opportunities for decision support: mapping
guideline concepts and rules to concepts in the EPR
- Measurement of guideline compliance
Information extraction approaches
• Bottom-up: identification of individual clinical terms, temporal expressions, units of measure
- Look-up lists, regular expressions
- Shallow parsing to identify noun phrases
- Terminology services: UMLS, MetaMap
- Co-reference resolution: WordNet
• Top-down: identification of guideline structure: preamble, eligibility, recommendations, ‘action’ sentences and rules
- Shallow parsing to identify verb phrases
- Ontologies for semantic relations, e.g. UMLS Semantic Network
- Use of linguistic guideline patterns (see later)
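As a small illustration of the bottom-up approach, regular expressions can pick out measurements with units and simple temporal expressions. The patterns and function names below are assumptions made for this sketch and cover only a handful of units and interval phrasings.

```python
import re

# Illustrative patterns only; real guidelines need far broader coverage.
UNIT = r"(?:mmol/mol|mmol/l|mmHg|mg|kg|%)"
MEASUREMENT = re.compile(rf"(\d+(?:\.\d+)?)\s*({UNIT})")
# Matches "every 3 months" or a range such as "every 2-6 months"
# (hyphen or en dash between the two numbers).
TEMPORAL = re.compile(
    r"every\s+(\d+)(?:\s*[\u2013-]\s*(\d+))?\s+(day|week|month|year)s?"
)


def extract_measurements(text):
    """Return (value, unit) pairs found in the text."""
    return [(float(v), u) for v, u in MEASUREMENT.findall(text)]


def extract_intervals(text):
    """Return (min, max, unit) triples; min equals max when no range is given."""
    return [(int(lo), int(hi or lo), unit)
            for lo, hi, unit in TEMPORAL.findall(text)]
```

Applied to the HbA1c statements quoted earlier, the first function would yield the percentage and mmol/mol values, and the second the 2 to 6 month monitoring interval.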
Mapping text to UMLS concepts - problems
• Identification of clinical terms is dependent on context:
- family history of congestive heart failure
- probable diagnosis of congestive heart failure
- no evidence of congestive heart failure
- patient does not have established cardiovascular disease
• Clearly just identifying the raw concepts congestive heart failure and
cardiovascular disease and mapping them to UMLS terms is
inadequate.
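One simple way to capture this context is to look for trigger phrases in the text preceding the concept. The sketch below is a deliberately naive illustration, not the method used in the work described here: the trigger list is hypothetical, drawn from the examples above, and the labels are invented for this example.

```python
import re

# Hypothetical trigger phrases based on the examples above; not a full lexicon.
CONTEXT_TRIGGERS = [
    ("family_history", re.compile(r"\bfamily history of\b", re.I)),
    ("possible", re.compile(r"\b(?:probable|possible|suspected)\b", re.I)),
    ("negated", re.compile(r"\b(?:no evidence of|does not have|without)\b", re.I)),
]


def classify_context(phrase: str, concept: str) -> str:
    """Label a concept mention by the first trigger found before it."""
    idx = phrase.lower().find(concept.lower())
    if idx < 0:
        raise ValueError("concept not found in phrase")
    prefix = phrase[:idx]
    for label, pattern in CONTEXT_TRIGGERS:
        if pattern.search(prefix):
            return label
    return "affirmed"
```

Only the text before the concept is inspected, so triggers that follow the concept ("heart failure was ruled out") would be missed by this sketch.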
Mapping guideline text to UMLS concepts - problems
• Guideline documents are typically large (100 pages), in PDF or XML
format
• Requires guideline text to be segmented to enable efficient processing
- How best to segment the text so as to maximize contextual clinical concept
identification?
Solutions: Text segmentation
• Customised phrase chunker to identify candidate terms:
- Noun phrases (NP), prepositional phrases (PP), verb phrases (VP)
- Neoclassical combining forms phrases (token groups containing Latin/Greek prefixes, roots, suffixes)
- Past-participle and gerund NPs:
  - 'results in increased blood pressure', 'fasting blood glucose'
- List expansion:
  - 'mild, moderate and severe hypertension' → 'mild hypertension, moderate hypertension and severe hypertension'
  - 'lowering of heart rate and blood pressure' → 'lowering of heart rate and lowering of blood pressure'
- Abbreviation expansion: 'waist circumference (WC)'
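List expansion and abbreviation expansion can be sketched with plain string handling. The code below is a simplified illustration rather than the chunker described here: both function names are hypothetical, the list expander assumes single-word modifiers sharing one head noun (so it handles the hypertension example but not the 'lowering of ...' one, which needs phrase structure), and the abbreviation finder assumes one initial letter per word.

```python
import re
from typing import Dict, List


def expand_modifier_list(phrase: str) -> List[str]:
    """Expand 'a, b and c HEAD' into ['a HEAD', 'b HEAD', 'c HEAD'].

    Assumption: each modifier is a single word and the head noun is
    everything after the first word of the final conjunct.
    """
    parts = re.split(r",\s*|\s+and\s+", phrase.strip())
    if len(parts) < 2:
        return [phrase.strip()]
    last_modifier, _, head = parts[-1].partition(" ")
    if not head:
        return [phrase.strip()]
    return [f"{m} {head}" for m in parts[:-1] + [last_modifier]]


def find_abbreviations(text: str) -> Dict[str, str]:
    """Pair '(ABC)' with the preceding words whose initials spell ABC."""
    pairs = {}
    for m in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbr = m.group(1)
        words = text[:m.start()].split()[-len(abbr):]
        if len(words) == len(abbr) and all(
            w[0].upper() == c for w, c in zip(words, abbr)
        ):
            pairs[abbr] = " ".join(words)
    return pairs
```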
Solutions: GATE-MetaMap Server integration plugin
- Extracts clinical concepts, in context, from large guideline texts in
multiple formats and encodings (PDF, XML, RTF, ASCII, UTF-8)
- Exchanges data/annotations with a MetaMap server
- Implements Unicode Normalization Forms for UTF-8 → ASCII
- Provides flexible text chunking options
- Optimises input data to MetaMap for mapping to UMLS concepts
- Integrates with other information extraction pipelines
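The UTF-8 to ASCII step can be illustrated with Python's standard unicodedata module. This is a sketch of the general technique, not the plugin's own implementation; the punctuation table and the function name are assumptions, and characters with no ASCII equivalent are silently dropped, which is lossy.

```python
import unicodedata

# Punctuation with no compatibility decomposition, mapped by hand.
# Assumption: this small table is illustrative, not exhaustive.
PUNCTUATION = str.maketrans({
    "\u2013": "-",   # en dash
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2192": "->",  # rightwards arrow
})


def to_ascii(text: str) -> str:
    """Fold text to ASCII using NFKD decomposition plus a punctuation table."""
    text = text.translate(PUNCTUATION)
    # NFKD splits accented letters into base letter + combining mark and
    # expands compatibility characters such as ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop whatever still falls outside ASCII (e.g. the combining marks).
    return decomposed.encode("ascii", "ignore").decode("ascii")
```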
GATE-MetaMap integration module
Guideline patterns
Serban et al. (2007), examples:
(med_context, target_group, recommendation_operator, med_action)
In the event of [pregnancy]med_context, [patients with diabetes]target_group
[should]recommendation_op be [prescribed calcium channel blocker]med_action
(target_group, med_context, med_goal)
For [diabetic patients]target_group with [kidney damage]med_context the [blood
pressure target is 130/80]med_goal
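A pattern such as (target_group, med_context, med_goal) can be approximated, very roughly, by a surface regular expression with named groups. The sketch below is an assumption-laden simplification for illustration: the regex and function name are hypothetical, and the original patterns operate over linguistic and semantic annotations rather than raw strings.

```python
import re

# Hypothetical surface pattern for (target_group, med_context, med_goal):
# "For <target_group> with <med_context> the <med_goal>"
GOAL_PATTERN = re.compile(
    r"^For (?P<target_group>.+?) with (?P<med_context>.+?),? "
    r"the (?P<med_goal>.+?)\.?$",
    re.IGNORECASE,
)


def match_goal_pattern(sentence: str):
    """Return the three slots as a dict, or None if the sentence does not fit."""
    m = GOAL_PATTERN.match(sentence.strip())
    return m.groupdict() if m else None
```

Sentences that do not follow this exact word order return None, which is one reason annotation-based patterns are more robust than surface matching.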
Extracting guideline recommendations
… and rules from guideline text
Information extraction from patient data
Patient data: automatic spelling correction
Patient data: WordNet mappings for coreferencing