Enabling semantic search in a bio-specimen repository - ICBO 2013
Enabling semantic search in a bio-specimen repository
July 9th, 2013
ICBO 2013
Shahim Essaid, Carlo Torniai, and Melissa Haendel
OHSU’s Biolibrary Search Engine
Data aggregated from four repositories with plans for additional repositories
A web-based search engine over de-identified data
Our goal was to develop a controlled application ontology to support search capabilities
OHSU Biolibrary system
Search application
Two search interfaces (with no data integration)
Limited free text search
Search application
Search through anatomy and histology lists
Multiple wizard-like forms
Example coded data vs. pathology report
(Available structured data from one case)
However, the pathology report also includes:
• Low grade pancreatic intraepithelial neoplasia
• Extensive perineural invasion
• Acute and chronic cholecystitis
• Bile duct tissue with chronic inflammation
• Chronic pancreatitis
• Acute gastric serositis
Entity recognition with MetaMap
Selected mapping examples (the same report from earlier)
Final Pathologic Diagnosis:
A: Gallbladder, cholecystectomy:
- Acute and chronic cholecystitis
- Negative for malignancy
B: Bile ductular tissue, biopsy:
- Bile duct tissue with chronic inflammation
- Negative for malignancy
C: Superior mesenteric vein margin, biopsy:
- Vascular tissue with no diagnostic abnormality
- Negative for malignancy
D: Portal vein margin, biopsy:
- Fibroconnective tissue with no diagnostic abnormality
- Negative for malignancy
Selected mapping examples (the same report from earlier)
E: Pancreas, stomach, duodenum, pancreaticogastroduodenectomy:
- Pancreatic ductal adenocarcinoma, grade 2/3, invading peripancreatic fat
- Size: 3 cm in greatest dimension
- Pancreatic neck margin positive for invasive carcinoma (please see comment)
- Superior mesenteric artery margin negative at 0.2 cm from invasive tumor, deep pancreatic margin negative at 0.6 cm from invasive tumor
- Extensive perineural invasion present
- No angiolymphatic invasion identified
- Metastatic pancreatic ductal adenocarcinoma present in two of ten peripancreatic lymph nodes (2/10)
…
Deriving an OWL ontology for DL queries
Adding relationships (developing an application ontology to support search)
“subclass of” axioms were generated based on the UMLS hierarchy table
Mapped entities were augmented with transitive closure of parents
“part of” axioms were generated by aggregating many mereological relationships from the UMLS relationship table
Anatomy, pathology, and disease entities were related using SNOMED-CT disorder/disease definitions
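The augmentation step above (adding the transitive closure of parents to the mapped entities) can be sketched as follows. This is only an illustration under assumptions: the `parents` dictionary stands in for rows of the UMLS hierarchy table, and the concept names are hypothetical placeholders rather than real CUIs or the project's actual data.

```python
from collections import deque

def ancestor_closure(mapped, parents):
    """Return the mapped concepts plus every ancestor reachable via `parents`."""
    closure = set(mapped)
    queue = deque(mapped)
    while queue:
        concept = queue.popleft()
        for parent in parents.get(concept, ()):
            # The visited check also keeps the walk finite if the table has cycles.
            if parent not in closure:
                closure.add(parent)
                queue.append(parent)
    return closure

# Hypothetical child -> parents table (placeholder identifiers, not real CUIs).
parents = {
    "acute_cholecystitis": ["cholecystitis"],
    "cholecystitis": ["gallbladder_disorder"],
    "gallbladder_disorder": ["biliary_disorder"],
}

closure = ancestor_closure({"acute_cholecystitis"}, parents)
print(sorted(closure))
```

Each concept in the closure could then be emitted as a class, with one “subclass of” axiom per child/parent pair.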
Adding relationships (developing an application ontology to support search)
Problematic multiple and cyclic inheritance resolved manually
Resulted in an OWL ontology that supports useful DL queries along the “subclass of” and “part of/has part” axes. Examples:
• Retrieve all pathologies (limited to a type if needed) that affect an anatomical site (± all parts)
• Retrieve all anatomical sites with a specific type of pathology
• List all pathologies/sites for a disease
• Etc.
The MetaMap mappings were saved in a database table. After relevant concepts are identified with a DL query, a database query can find actual reports.
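The two-step lookup described above (a DL query yields concepts, then a database query finds the reports) might look roughly like this. The table name `mapping`, its columns, and the report identifiers are assumptions made for illustration; the slides do not describe the actual schema. The DL query result is represented simply as a set of CUIs.

```python
import sqlite3

# Hypothetical mapping table: one row per MetaMap annotation in a report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mapping (report_id TEXT, cui TEXT, start_pos INTEGER, end_pos INTEGER)"
)
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?, ?, ?)",
    [
        ("report-1", "C0016976", 32, 44),
        ("report-1", "C0205178", 63, 70),
        ("report-2", "C0205178", 10, 17),
    ],
)

def reports_for_concepts(conn, cuis):
    """Return the IDs of reports annotated with any of the given CUIs."""
    cuis = sorted(cuis)
    if not cuis:
        return []
    placeholders = ",".join("?" for _ in cuis)
    rows = conn.execute(
        "SELECT DISTINCT report_id FROM mapping "
        f"WHERE cui IN ({placeholders}) ORDER BY report_id",
        cuis,
    )
    return [row[0] for row in rows]

# Stand-in for the concepts returned by a DL query on the OWL file.
dl_result = {"C0016976"}
hits = reports_for_concepts(conn, dl_result)
print(hits)
```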
SNOMED-CT examples of disorder definitions(used to relate anatomy to pathology in the application ontology)
Application integration
Integration with the existing application was limited to appending the annotations to the text of pathology reports:
| C1521733 C0332144 0:26 | C0016976 32:44 | C0205178 63:70 | …
Annotations (CUIs and location) are then indexed in Solr and can be searched with the existing free text search form (after a DL query on the OWL file).
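For illustration, the appended annotation string could be read back with a small parser like the one below. The layout is inferred from the one example on the slide: segments separated by `|`, each holding one or more CUIs followed by a pair of numbers joined by `:`. How those two numbers should be interpreted (for example start and end offsets) is an assumption, so the sketch keeps them as plain integers.

```python
import re
from typing import NamedTuple

class Annotation(NamedTuple):
    cuis: tuple   # one or more UMLS CUIs mapped to the same text span
    first: int    # first number of the location pair (assumed to be a start offset)
    second: int   # second number of the location pair (assumed to be an end offset)

CUI_PATTERN = re.compile(r"C\d{7}")

def parse_annotations(text):
    """Parse '| CUI [CUI ...] n:m | ...' into a list of Annotation tuples."""
    annotations = []
    for segment in text.split("|"):
        tokens = segment.split()
        if len(tokens) < 2:
            continue  # empty segment or elided content
        first, sep, second = tokens[-1].partition(":")
        if not (sep and first.isdigit() and second.isdigit()):
            continue  # last token is not a location pair
        cuis = tuple(t for t in tokens[:-1] if CUI_PATTERN.fullmatch(t))
        if cuis:
            annotations.append(Annotation(cuis, int(first), int(second)))
    return annotations

sample = "| C1521733 C0332144 0:26 | C0016976 32:44 | C0205178 63:70 |"
parsed = parse_annotations(sample)
for annotation in parsed:
    print(annotation)
```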
A simple DL query for anatomy
(linked to actual report in the mapping table)
Difficulties and limitations
“Structured” text in pathology reports is not natural language, so MetaMap performs less well on it
Named entity recognition helps with document retrieval but extraction of structured data is more valuable
Negation detection is poor but very important
Significant multiple inheritance and subsumption cycles (inappropriate equivalences) when several UMLS vocabularies are used to derive an OWL representation
Short project, no access to full reports, limited computational resources
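The subsumption cycles mentioned above were resolved manually in this project. As a hedged sketch of how candidate cycles might be located for such a review, a depth-first search over the “subclass of” edges can report one cycle at a time. The function name and the toy class names are hypothetical.

```python
def find_subsumption_cycle(subclass_of):
    """Return one cycle in a child -> parents map as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for parent in subclass_of.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GREY:
                # Parent is already on the current path: a cycle is closed.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(subclass_of):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Toy hierarchy in which three classes subsume each other in a loop.
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
cycle = find_subsumption_cycle(cyclic)
print(cycle)
```

In OWL, classes on such a cycle are inferred to be equivalent, which is why the cycles show up as inappropriate equivalences. Multiple inheritance alone (a diamond shape) is not reported by this check.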
Conclusions
OHSU Biolibrary is adding many other specimen collections, so the need for better search will increase
Can use NER to enhance the data with SNOMED-CT
Interest in identifying references in pathology reports to specimen blocks and slides to annotate these resources as well
Still limited resources for supporting sophisticated terminology and semantic efforts….
Thanks
Dr. Chris Corless
Rob Schuff
Medical Research Foundation of Oregon