Knowledge-based Information Management for Biomedical Applications Wesley Chu Computer Science...
Date posted: 20-Jan-2016
Knowledge-based Information Management for Biomedical
Applications
Wesley Chu
Computer Science Department
University of California
Los Angeles, CA
www.kmed.cs.ucla.edu
Outline
Data types
Uses of knowledge bases to enhance information management
Sample systems: structured data, multi-media, free-text
Conclusion
Information Formats used in Biomedical Applications
Structured Data
Multi-media Images
Semi-structured
Free-text
Uses of Knowledge Bases to Enhance Information Management
Approximate matching
Query conditions
Image features
Similar conceptual terms
Uses of Knowledge Bases to Enhance Information Management
KB query processing
Similarity query answering
Associative query answering
Scenario-specific query answering
Sentinel: triggering and alerting
Examples of KB Information Systems
CoBase (1990-1998), DARPA: a database that cooperates with the user for structured data
KMeD (1991-2000), NSF: a knowledge-based medical multi-media database
Medical Digital Library (2001-2005), NIH: a knowledge-based digital file room for patient care, education, and research
CoBase www.cobase.cs.ucla.edu
Graduate students: K. Chiang, C. Larson, R. Lee, M. Merzbacher, M. Minock, Frank Meng, Wenlei Mao, Mark Yang, K. Zhang
Staff: Q. Chen, Gladys Chow, Hua Yang
Project leader: Wesley W. Chu
CoBase: Cooperative Databases
Conventional query answering
Need to know the detailed database schema
Cannot get approximate answers
Cannot answer conceptual queries
Cooperative query answering
Derive approximate answers
Answer conceptual queries
Provide additional relevant answers that the user does not (or does not know how to) ask for
Cooperative Queries
Find a seaport with railway facility in Los Angeles
Find a nearby friendly airport that can land F-15
Find hospitals with facility similar to St. John’s near LAX
[Figure: cooperative queries are sent to CoBase servers, which draw on domain knowledge and heterogeneous information sources.]
CoBase provides: relaxation, approximation, association, explanation
Generalization and Specialization
[Figure: generalization moves a specific query up to a conceptual query and then to a more conceptual query; specialization moves a conceptual query back down to a specific query.]
Cooperative Querying for Medical Applications
Query: Find the treatment used for the tumor similar-to (loc, size) X1 on 12 year-old Korean males.
Relaxed Query: Find the treatment used for the tumor Class X on preteen Asians.
Association: The success rate, side effects, and cost of the treatment.
Type Abstraction Hierarchies for Medical Domain
Age: Preteens (9, 10, 11, 12), Teen, Adult
Ethnic Group: Asian (Korean, Chinese, Japanese, Filipino), African, European
Tumor (location, size):
Class X [loc1 .. loc3], [s1 .. s3]: X1 [loc1, s1], X2 [loc2, s2], X3 [loc3, s3]
Class Y [locY, sY]
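The age and ethnic-group hierarchies above can be held in a simple child-to-parent map. The following is a minimal sketch under that assumption; the names `PARENT`, `generalize`, `specialize`, and `relax` are hypothetical and are not taken from CoBase.

```python
# Hypothetical child -> parent map built from the hierarchies on the slide.
# The roots "Ethnic Group" and "Age" are the slide's own hierarchy titles.
PARENT = {
    "Korean": "Asian", "Chinese": "Asian", "Japanese": "Asian", "Filipino": "Asian",
    "Asian": "Ethnic Group", "African": "Ethnic Group", "European": "Ethnic Group",
    9: "Preteens", 10: "Preteens", 11: "Preteens", 12: "Preteens",
    "Preteens": "Age", "Teen": "Age", "Adult": "Age",
}


def generalize(value):
    """Move one level up the hierarchy; returns None at a root."""
    return PARENT.get(value)


def specialize(concept):
    """Return the leaf values covered by a concept (the concept itself if it is a leaf)."""
    children = [child for child, parent in PARENT.items() if parent == concept]
    if not children:
        return [concept]
    leaves = []
    for child in children:
        leaves.extend(specialize(child))
    return leaves


def relax(value):
    """Generalize one level, then specialize back down to concrete values."""
    return specialize(generalize(value))
```

With this map, `relax("Korean")` returns the four Asian groups and `relax(12)` returns the four preteen ages, which mirrors how the relaxed query replaces "12 year-old Korean" with "preteen Asians".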
KB: Type Abstraction Hierarchy
Uses a clustering technique to group similar attribute values, image features, and spatial relationships among objects
Provides multi-level knowledge (conceptual) representation
Data mining for TAH for Numerical Attribute Values
Clustering metric: relaxation error
Difference between the exact value and the returned approximate value
Relaxation error is weighted by the probability of occurrence of each value
Can be extended to multiple attributes
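One plausible reading of the relaxation error described above is the expected absolute difference between the requested value and the value returned from the same cluster, with both weighted by their frequency of occurrence. The sketch below assumes that reading for a single numerical attribute; the function name `relaxation_error` is hypothetical.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |exact - returned| when both values are drawn from the
    cluster according to their observed frequency of occurrence."""
    counts = Counter(values)
    total = sum(counts.values())
    error = 0.0
    for exact, n_exact in counts.items():
        for returned, n_returned in counts.items():
            # Each pair is weighted by the probability of both values.
            error += (n_exact / total) * (n_returned / total) * abs(exact - returned)
    return error
```

Under this definition a tighter cluster has a lower error: `relaxation_error([9, 10])` is 0.5, while `relaxation_error([9, 10, 11, 12])` is 1.25. A clustering procedure could use such a score to choose where to split a range of values.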
Query Relaxation
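[Flowchart: a query is run against the database. If it returns answers, they are displayed. If not, an attribute is relaxed using the TAHs, the query is modified, and it is run again.]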
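The relaxation loop can be sketched as follows, assuming a single relaxable attribute and a one-level TAH. The records are made up for illustration, and `PARENT`, `MEMBERS`, `run_query`, and `relaxed_query` are hypothetical names.

```python
# Hypothetical one-attribute TAH: value -> parent concept, concept -> member values.
PARENT = {"Korean": "Asian", "Chinese": "Asian", "Japanese": "Asian", "Filipino": "Asian"}
MEMBERS = {"Asian": ["Korean", "Chinese", "Japanese", "Filipino"]}

# Made-up records, purely for illustration.
RECORDS = [
    {"patient": "p1", "ethnic_group": "Chinese", "treatment": "T1"},
    {"patient": "p2", "ethnic_group": "European", "treatment": "T2"},
]


def run_query(records, attribute, accepted):
    """Return the records whose attribute value is in the accepted set."""
    return [r for r in records if r[attribute] in accepted]


def relaxed_query(records, attribute, value, max_steps=3):
    """Run the query; while it returns nothing, relax the attribute one
    TAH level and retry. Returns (answers, number of relaxation steps)."""
    accepted = [value]
    concept = value
    for step in range(max_steps + 1):
        answers = run_query(records, attribute, accepted)
        if answers:
            return answers, step
        concept = PARENT.get(concept)
        if concept is None:  # nothing left to relax
            break
        accepted = MEMBERS[concept]
    return [], step
```

Here `relaxed_query(RECORDS, "ethnic_group", "Korean")` finds no exact match, relaxes Korean to Asian, and returns patient p1 after one relaxation step.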
Summary: CoBase
Derive Approximate Answers
Answer Conceptual Queries
Provide Associative Query Answers
KMeD www.kmed.cs.ucla.edu
Graduate students: Alex Bui, Christina Chu, John Dionisio, T. Plattner, D. Johnson, C. Hsu, T. Ieong
Consultants: Denise Aberle, M.D.; C.M. Breant, Ph.D
PI: Wesley Chu, Ph.D, Computer Science Department
Co-PIs: A. Cardenas, Ph.D, Computer Science Department; Ricky Taira, Ph.D, School of Medicine
KMeD Goal: Retrieval of Images by Features & Content
Features: size, shape, texture, density, histology
Spatial Relations: angle of coverage, shortest distance, overlapping ratio, contact ratio, relative direction
Evolution of Object Growth: fusion, fission
Characteristics of Medical Queries
Multimedia, Temporal, Evolutionary, Spatial, Imprecise
Knowledge-Based Image Model
[Figure: three-level image model for the objects brain, tumor, and lateral ventricle.]
Representation Level (features and content): Brain, Tumor, Lateral Ventricle
Schema Level: SR(t,b), SR(t,l)
Knowledge Level: TAH SR(t,b), TAH Tumor Size, TAH SR(t,l), TAH Lateral Ventricle
SR: Spatial Relation; b: Brain; t: Tumor; l: Lateral Ventricle
Knowledge-Based Query Processing
[Flowchart: queries; query analysis and feature selection; knowledge-based content matching via TAHs; query relaxation; query answers.]
User Model
Customizes query answering to each user's interests, preferences, needs, and goals (e.g., query conditions, relaxation control).
User type
Default Parameter Values
Feature and Content Matching Policies: Complete Match, Partial Match
User Model (cont.)
Relaxation Control Policies: Relaxation Order, Unrelaxable Object, Preference List
Measure for Ranking
Triggering conditions
Query Preprocessing
Segment and label contours for objects of interest
Determine relevant features and spatial relationships (e.g., location, containment, intersection) of the selected objects
Organize the features and spatial relationships of objects into a feature database
Classify the feature database into a Type Abstraction Hierarchy (TAH)
Similarity Query Answering
Determine relevant features based on query input
Select TAH based on these features
Traverse through the TAH nodes to match all the images with similar features in the database
Present the images and rank their similarity (e.g., by mean square error)
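The final ranking step can be illustrated with a short sketch. It assumes each image is already reduced to a numeric feature vector of the same length as the query; the image ids, the feature values, and the function names are hypothetical, and a real system would likely normalize features before comparing them.

```python
def mean_square_error(a, b):
    """Mean of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rank_by_similarity(query_features, candidates):
    """Rank candidate images (id -> feature vector) from most to least
    similar to the query, i.e. by increasing mean square error."""
    scored = [
        (mean_square_error(query_features, features), image_id)
        for image_id, features in candidates.items()
    ]
    return [(image_id, error) for error, image_id in sorted(scored)]


# Made-up feature vectors, e.g. (tumor size, distance to lateral ventricle).
candidates = {"img_a": [10.0, 2.0], "img_b": [12.0, 2.0], "img_c": [30.0, 9.0]}
ranking = rank_by_similarity([10.5, 2.0], candidates)
```

In this example `ranking` lists `img_a` first (error 0.125), then `img_b` (1.125), then `img_c`.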
Visual Query Language and Interface
Point-click-drag interface
Objects may be represented by icons
Spatial relationships among objects are represented graphically
Visual Query Example
Retrieve brain tumor cases where a tumor is located in the region as indicated in the picture
Summary: KMeD
Image retrieval by feature and content
Matching images based on features
Processing of queries based on spatial relationships among objects
Answering of imprecise queries
Expression of queries via visual query language
Integrated view of temporal multimedia data in a timeline metaphor
Medical Digital Library www.kmed.cs.ucla.edu
Graduate students: Victor Z. Liu, Wenlei Mao, Qinghua Zou
Consultants: Hooshang Kangarloo, M.D.; Denise Aberle, M.D.
Project leader: Wesley W. Chu
Data Types Used in a Medical Digital Library
Structured data (patient lab data, demographic data, …): CoBase
Images (X-rays, MRI, CT scans): KMeD
Free-text (patient reports, teaching files, literature, news articles): FTRS (Free-text retrieval system)
A Free-Text Retrieval System (FTRS)
[Figure: the knowledge-based Free-Text Retrieval System (FTRS) takes either an ad hoc query or a patient report for content correlation, searches patient reports, medical literature, teaching materials, and news articles, and returns query results.]
A Sample Patient Report
…
Tissue Source: LUNG (FINE NEEDLE ASPIRATION) (LEFT LOWER LOBE)
…
FINAL DIAGNOSIS:
- LUNG NODULE, LEFT LOWER LOBE (FINE NEEDLE ASPIRATION):
- LUNG CANCER, SMALL CELL, STAGE II.
…
Scenario-Specific Retrieval
[Figure: the sample patient report (lung cancer, small cell, stage II) is matched against treatment-related articles (how to treat the disease) and diagnosis-related articles (how to diagnose the disease).]
Challenge I: Indexing for Free-Text
Extracting key concepts in the free-text for indexing
Free-text: Lung cancer, small cell, stage II
Concept terms in knowledge source: stage II small cell lung cancer
Conventional methods use NLP, which is not scalable
Challenge II: Mismatch between terms used in query and documents
Example
Query: … lung cancer, …
Document 1: … lung carcinoma … (match?)
Document 2: … lung neoplasm … (match?)
Document 3: … anti-cancer drug combinations … (match?)
Challenge III: Terms used in the query are too general
Expanding the general terms in the query to specific terms that are used in the document
Query: lung cancer, diagnosis options (does not match the document below)
Document: … the effectiveness of chest x-ray and bronchography on patients with lung cancer …
Expanded query: lung cancer, chest x-ray, bronchography, … (matches the document)
A Medical KB: Unified Medical Language System (UMLS)
Metathesaurus: controlled vocabulary (1.6M biomedical phrases, representing 800K concepts)
Semantic Network: classifies concepts into semantic types and the relations between them (e.g. disease or syndrome, treated by, therapeutic procedure, etc.)
SPECIALIST Lexicon
Using knowledge sources to resolve these challenges
Challenge I: Automatic indexing of free text
Challenge II: Mismatch between terms in the query and the documents
Challenge III: Terms in the query are too general
IndexFinder: Extracting domain-specific key concepts
Technique
Permute words from text to generate concept candidates.
Use knowledge base to select the valid candidates.
Problem
Valid candidates may be irrelevant to the document.
Redundant concepts
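The technique above can be sketched by generating word permutations explicitly and looking each one up in a phrase table. This is a toy illustration, not the IndexFinder implementation: the `KNOWLEDGE_BASE` table and its concept labels are made up and are not UMLS data, and explicit enumeration is only practical for short sentences.

```python
import re
from itertools import permutations

# Hypothetical miniature knowledge base: phrase -> concept label (not UMLS data).
KNOWLEDGE_BASE = {
    "stage ii small cell lung cancer": "concept-A",
    "lung cancer": "concept-B",
    "small cell": "concept-C",
}


def extract_concepts(sentence, kb=KNOWLEDGE_BASE, max_words=6):
    """Permute the words of one sentence and keep the permutations
    that appear as phrases in the knowledge base."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    unique_words = list(dict.fromkeys(words))  # drop repeats, keep order
    found = {}
    for size in range(1, min(max_words, len(unique_words)) + 1):
        for candidate in permutations(unique_words, size):
            phrase = " ".join(candidate)
            if phrase in kb:
                found[phrase] = kb[phrase]
    return found
```

For the sample report text "Lung cancer, small cell, stage II" this returns all three phrases in the toy table. The two shorter phrases are redundant next to the full concept, which illustrates the redundancy problem noted above.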
Filtering out Irrelevant Concepts
Syntactic filter:
Limit permutation of words within a sentence.
Semantic filter:
Use the semantic type (e.g. body part, disease, treatment, diagnosis) to filter out irrelevant concepts
Use ISA relationship to filter out general concepts and yield specific concepts.
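The ISA-based filter can be sketched as dropping every extracted concept that is an ancestor of another extracted concept. The `ISA` links below are hypothetical and assumed to be acyclic; they are not taken from UMLS.

```python
# Hypothetical ISA links (child -> parent), assumed acyclic; not UMLS data.
ISA = {
    "stage ii small cell lung cancer": "small cell lung cancer",
    "small cell lung cancer": "lung cancer",
    "lung cancer": "cancer",
}


def ancestors(concept, isa=ISA):
    """Follow ISA links upward and return every more general concept."""
    seen = []
    while concept in isa:
        concept = isa[concept]
        seen.append(concept)
    return seen


def keep_most_specific(concepts, isa=ISA):
    """Drop any concept that is an ancestor of another extracted concept."""
    general = set()
    for concept in concepts:
        general.update(ancestors(concept, isa))
    return [c for c in concepts if c not in general]
```

Given the extracted concepts "lung cancer", "stage ii small cell lung cancer", and "left lower lobe", the filter drops "lung cancer" because the more specific cancer concept already implies it.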
Phrase-based Vector Space Model (VSM)
Query: … lung cancer, …
Document: … lung carcinoma … (match: the knowledge source has lung cancer = lung carcinoma)
Document: … lung neoplasm … (match: the knowledge source links lung neoplasm and lung cancer through parent_of)
Document: … anti-cancer drug combinations … (uncertain: the link to lung cancer is missing from the knowledge source)
Phrase-based VSM Examples
Query: “lung cancer …”
Phrases: [(C0242379); “lung” “cancer”] …
Document: “anti-cancer drug combinations …”
Phrases: [(C0003393); “anti” “cancer” “drug” “combin”] …
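One way to see why a phrase carries both a concept and its word stems is to score a pair of phrases on both. The sketch below is a simplified illustration and not the scoring function of the original system: the 0.5 weight for related concepts, the Jaccard stem overlap, and the identifier `HYPOTHETICAL_ID` are all assumptions.

```python
def phrase_similarity(query_phrase, doc_phrase, related=frozenset()):
    """Score two phrases, each a (concept id, set of word stems) pair.

    `related` holds frozenset pairs of concept ids that the knowledge
    source links (e.g. through parent_of)."""
    q_concept, q_stems = query_phrase
    d_concept, d_stems = doc_phrase
    if q_concept == d_concept:
        return 1.0
    # 0.5 is an arbitrary illustrative weight for related concepts.
    concept_score = 0.5 if frozenset((q_concept, d_concept)) in related else 0.0
    union = q_stems | d_stems
    stem_score = len(q_stems & d_stems) / len(union) if union else 0.0
    return max(concept_score, stem_score)


query = ("C0242379", {"lung", "cancer"})
document = ("C0003393", {"anti", "cancer", "drug", "combin"})
score = phrase_similarity(query, document)
```

Here the two concepts differ and no relation is supplied, yet the shared stem "cancer" still gives a nonzero score of 0.2, so the document is not lost when the knowledge source is missing a link.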
Query Expansion (QE)
Queries in the following form benefit from expansion:
<key concept> + <general supporting concept(s)> (e.g. lung cancer + treatment options)
is expanded to
<key concept> + <specific supporting concept(s)> (e.g. lung cancer + chemotherapy, radiotherapy)
Statistical Expansion
[Figure: "lung cancer" is linked to the terms that co-occur with it: study, patient, survive, result, increase, mediastinoscopy, bronchoscopy, chemotherapy, radiotherapy.]
Knowledge-based Scenario-specific Expansion
[Figure: the same statistical term set is combined with the knowledge source, in which a Therapeutic or Preventive Procedure (e.g. heart surgery) treats a Disease or Syndrome (e.g. heart disease).]
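The combination of statistical and knowledge-based expansion can be sketched as a filter: co-occurring terms are kept only if the knowledge source relates their semantic type to the key concept's type through the relation that fits the scenario. The tables below are hypothetical. The slide shows only the treats relation, so the "diagnoses" relation and the "Diagnostic Procedure" type assignments are assumptions added for illustration.

```python
# Hypothetical semantic type assignments (illustrative, not UMLS data).
SEMANTIC_TYPE = {
    "lung cancer": "Disease or Syndrome",
    "chemotherapy": "Therapeutic or Preventive Procedure",
    "radiotherapy": "Therapeutic or Preventive Procedure",
    "mediastinoscopy": "Diagnostic Procedure",
    "bronchoscopy": "Diagnostic Procedure",
}

# (source type, relation, target type); only "treats" appears on the slide.
RELATIONS = {
    ("Therapeutic or Preventive Procedure", "treats", "Disease or Syndrome"),
    ("Diagnostic Procedure", "diagnoses", "Disease or Syndrome"),
}

SCENARIO_RELATION = {"treatment": "treats", "diagnosis": "diagnoses"}


def expand(key_concept, scenario, cooccurring_terms):
    """Keep only the co-occurring terms whose semantic type is linked to
    the key concept's type by the relation for the chosen scenario."""
    relation = SCENARIO_RELATION[scenario]
    key_type = SEMANTIC_TYPE.get(key_concept)
    kept = [
        term
        for term in cooccurring_terms
        if (SEMANTIC_TYPE.get(term), relation, key_type) in RELATIONS
    ]
    return [key_concept] + kept


STATISTICAL_TERMS = [
    "study", "patient", "survive", "mediastinoscopy", "bronchoscopy",
    "chemotherapy", "radiotherapy", "increase", "result",
]
treatment_query = expand("lung cancer", "treatment", STATISTICAL_TERMS)
```

With these tables the treatment scenario keeps chemotherapy and radiotherapy and drops general terms such as "study" and "patient", which have no semantic type linked to the disease.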
Retrieval Effectiveness Comparison (Corpus: OHSUMED, KB: UMLS)
[Figure: three plots of precision (y-axis) against recall (x-axis), both from 0 to 1. The first compares statistical expansion (Stem VSM) with Stem VSM (no expansion). The second adds Phrase VSM (no expansion). The third adds knowledge-based expansion (Phrase VSM).]
Overall improvement: 33%, 100 queries vs. 5%, 50 queries
Template: “<disease>, treatment”
FTRS: Scenario-specific Query Answering
Sample templates: “<disease>, treatment”, “<disease>, diagnosis”
[Figure: the query "lung cancer, treatment" passes through IndexFinder and Query Expansion, giving lung cancer, radiotherapy, chemotherapy, cisplatin, …; the Phrase-based VSM Engine then returns relevant documents.]
FTRS: Scenario-specific Content Correlation
IndexFinder extracts key concepts from free-text for content correlation
[Figure: a patient report passes through IndexFinder; scenario selection (e.g. treatment, diagnosis, etc.) picks from the query templates; Query Expansion and the Phrase-based VSM Engine then return relevant documents.]
Summary: KB Free-text retrieval
Technologies
IndexFinder: extracts key concepts from the free-text
Phrase-based VSM: a new document indexing paradigm (concept and its word stems) to improve retrieval effectiveness
Knowledge-based query expansion: match query with scenario-specific documents
Provides scenario-specific free-text retrieval
Conclusions
Knowledge sources provide:
Approximate matching: query conditions, image features
Query processing: similarity query answering, user modeling, associative answering, triggering and alerting
Document retrieval: convert ad hoc free-text into controlled vocabulary, phrase-based VSM, content correlation, scenario-specific retrieval
They increase the capabilities and effectiveness of information management
Acknowledgement
This research is supported by DARPA, NSF Grant # 9619345, and NIC/NIH Grant # 4442511-33780