The medGIFT project and medical image retrieval

medGIFT - medical image retrieval by the visual image content. Presentation at UCLA, 20.2.2004. Henning Müller, Service of Medical Informatics, Geneva University Hospitals.

Transcript of The medGIFT project and medical image retrieval

Page 1: The medGIFT project and medical image retrieval

medGIFT - medical image retrieval by the visual image content

Presentation at UCLA, 20.2.2004

Henning Müller
Service of Medical Informatics
Geneva University Hospitals

Page 2: The medGIFT project and medical image retrieval

©2003 Hôpitaux Universitaires de Genève

Overview

• University Hospitals of Geneva
• Medical Informatics Service
• Medical Imaging Unit
• The medGIFT project
  • Viper/GIFT/Casimage
  • Technologies
  • Clinical goals
• Some future projects
• Conclusions

Page 3: The medGIFT project and medical image retrieval

University Hospitals of Geneva

• 2,200 beds, 6 hospitals
  • 900 beds in the main clinic
• 780,000 hospital days
• ~9,000 employees
  • 1,300 MDs
• 22,000 operations per year
• 5,400 computers
• Budget $1 billion/year
• Research and teaching have high importance
  • Geneva is strong in bioinformatics, genetics, neurosciences
• Service for medical informatics - management informatics

Page 4: The medGIFT project and medical image retrieval

Service of medical informatics

• ~60 employees, part of the radiology department
  • ~10 persons in research
• Research areas:
  • Multimedia electronic patient record
  • Decision support systems
  • Telemedicine, especially with African countries
  • Knowledge representation, natural language processing, data mining
  • Image processing, PACS, operation planning
• Teaching
  • Postgraduate course in medical informatics
  • Virtual campus for medical students in medical informatics

Page 5: The medGIFT project and medical image retrieval

Medical Imaging Unit

• PACS
  • Management of images, prefetching, hardware
  • Integration of image storage into several departments
• LAVIM
  • Service for other divisions in the hospital
  • Visualization, 3D rendering
  • Morphological image processing
  • Segmentation
• Research projects
  • fMRI
  • Liver tumours (time analysis of contrast agents)

Page 6: The medGIFT project and medical image retrieval

PACS and image management

• PACS system with several layers
  • ~13,000 images per day (>3 million per year)
  • Since 2002 accessible in the electronic patient record
  • Connection of cardiology this year (1 TB/year)
  • Connection of dermatology is planned; psychology is thinking about storage of patient videos
  • Big part of the material costs of the hospital
  • Access to the images almost exclusively by patient ID
• Casimage
  • Radiological teaching file system (based on the 4D database system)
  • Typical or interesting cases
  • > 50,000 images stored by MDs
  • Search with key words or hierarchically

Page 7: The medGIFT project and medical image retrieval

Why content-based data access?

Page 8: The medGIFT project and medical image retrieval

Why content-based access?

• An increasing amount of data is produced in digital form and becomes accessible by computer
• Much knowledge is stored in the visual data (and the attached descriptions)
  • Not everything is accessible in DICOM
  • Combination of free text, structured data and visual features
• Tools will be needed to extract this knowledge
  • Control of text
  • Control of coding (or semi-automatic coding)
• Case-based reasoning and evidence-based medicine need similar example cases
• Teaching and research can use the new search tools

Page 9: The medGIFT project and medical image retrieval

medGIFT

• http://www.sim.hcuge.ch/medgift/ (open source)
• Project for content-based search in medical image databases
• Goals of the project
  • Better management of visual medical data (retrieval)
  • Visual Knowledge Management
  • Textual and visual data
  • Diagnostic aid
  • Specialized retrieval (lung CTs, dermatologic images)
  • Access to PACS data
• In the short term:
  • Research
  • Teaching

Page 10: The medGIFT project and medical image retrieval

Casimage – a radiological case database

• Case database, especially for teaching
• http://www.casimage.com/, interface developed with the proprietary 4D software
• >50,000 images, 9,000 externally accessible and anonymized
• Case descriptions (textual) available in XML
  • Very varying quality
  • Mix of French and English
• Interface is compatible with the MIRC (Medical Image Resource Centre) standard of the RSNA

Page 11: The medGIFT project and medical image retrieval

Viper/GIFT

• http://viper.unige.ch/
• http://www.gnu.org/software/gift/
• Visual Information Processing for Enhanced Retrieval, project of the University of Geneva
  • The outcome of the project is GIFT, which is open source
  • Continuation at several universities
• Several activities in the project
  • Evaluation of image retrieval systems
  • Multimedia Retrieval Markup Language (MRML)
  • Annotation of images
  • Segmentation of images for retrieval
  • Video retrieval

Page 12: The medGIFT project and medical image retrieval

Characteristics of GIFT

• MRML-based communication interface
  • KMRML in Konqueror
  • Plugin for Gimp
  • User interfaces in Java, PHP, CGI/Perl
• Components can be exchanged relatively easily
  • For example the feature extractor
• Uses technologies known from text retrieval
  • Frequency-based weights
  • Relevance feedback
  • Inverted file for quick data access
• Tools to index directory trees, generate inverted files, etc.

Page 13: The medGIFT project and medical image retrieval

Component-based structure

[Figure: component-based structure of GIFT. Components shown: query engine, image collection, stored index, storage method, feature extraction (regions, wavelets, …), feedback algorithm, feature weighting, MRML interface, new images.]

Page 14: The medGIFT project and medical image retrieval

• Standardized access to visual search engines
  • http://www.mrml.net/
• Communication is in XML, server waits at a port
• Component-based structure

<mrml session-id="1" transaction-id="44">
  <query-step session-id="1" resultsize="30" algorithm-id="algorithm-default">
    <user-relevance-list>
      <user-relevance-element image-location="http://viper/1.jpg" user-relevance="1"/>
      <user-relevance-element image-location="http://viper/2.jpg" user-relevance="-1"/>
    </user-relevance-list>
  </query-step>
</mrml>
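The query-step message on this slide can be generated programmatically. The following is a minimal Python sketch, not part of GIFT itself: the function name `build_query_step` and its parameters are hypothetical, and the element and attribute names are copied from the slide's example rather than from the MRML specification. Sending the message to the server's port is left out, since the slide does not give the connection details.

```python
import xml.etree.ElementTree as ET


def build_query_step(session_id, transaction_id, feedback,
                     result_size=30, algorithm_id="algorithm-default"):
    """Build an MRML query-step message like the one on the slide.

    `feedback` maps an image URL to a user relevance:
    1 for a positive example, -1 for a negative example.
    (Hypothetical helper; attribute names follow the slide.)
    """
    mrml = ET.Element("mrml", {
        "session-id": str(session_id),
        "transaction-id": str(transaction_id),
    })
    step = ET.SubElement(mrml, "query-step", {
        "session-id": str(session_id),
        "resultsize": str(result_size),
        "algorithm-id": algorithm_id,
    })
    relevance_list = ET.SubElement(step, "user-relevance-list")
    for url, relevance in feedback.items():
        ET.SubElement(relevance_list, "user-relevance-element", {
            "image-location": url,
            "user-relevance": str(relevance),
        })
    # Return the message as a text string, ready to be written to a socket.
    return ET.tostring(mrml, encoding="unicode")


message = build_query_step(
    1, 44,
    {"http://viper/1.jpg": 1, "http://viper/2.jpg": -1},
)
print(message)
```

Because the message is plain XML, any client that can open a connection and write text can talk to the search engine, which is what allows the Konqueror, Gimp, Java, PHP and Perl front ends mentioned earlier.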

Page 15: The medGIFT project and medical image retrieval

Visual features used

• Global colour histogram (HSV: 18 hues, 3 saturations, 3 values; 4 grey levels)
• Colour blocks at different scales and locations
• Histogram of Gabor filter responses
  • 4 directions, 3 scales, quantized in 9 strengths
• Gabor blocks at different scales and locations
• ~85,000 possible features, 1,000-3,000 features per image, distribution similar to words in text collections
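To make the colour quantisation concrete, here is a minimal Python sketch of a global histogram with 18 x 3 x 3 HSV bins plus 4 grey bins (166 bins in total). It illustrates the idea only and is not the GIFT feature extractor: the saturation threshold that decides when a pixel counts as grey, the uniform bin boundaries, and the function names are all assumptions.

```python
import colorsys

H_BINS, S_BINS, V_BINS, GREY_BINS = 18, 3, 3, 4
COLOUR_BINS = H_BINS * S_BINS * V_BINS  # 162 colour bins
GREY_SATURATION = 0.1  # assumption: below this saturation a pixel is grey


def bin_index(r, g, b):
    """Map an RGB pixel (0-255 per channel) to one of 166 histogram bins."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if s < GREY_SATURATION:
        # Grey pixels are binned by brightness only.
        return COLOUR_BINS + min(int(v * GREY_BINS), GREY_BINS - 1)
    hi = min(int(h * H_BINS), H_BINS - 1)
    si = min(int(s * S_BINS), S_BINS - 1)
    vi = min(int(v * V_BINS), V_BINS - 1)
    return (hi * S_BINS + si) * V_BINS + vi


def colour_histogram(pixels):
    """Count how many pixels fall into each bin."""
    histogram = [0] * (COLOUR_BINS + GREY_BINS)
    for r, g, b in pixels:
        histogram[bin_index(r, g, b)] += 1
    return histogram


histogram = colour_histogram([(0, 0, 0), (255, 255, 255), (255, 0, 0)])
print(len(histogram), sum(histogram))
```

Each non-empty bin acts like a "word" occurring in the image, which is why the feature distribution resembles that of words in text collections.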

Page 16: The medGIFT project and medical image retrieval

The inverted file

[Figure: the inverted file. Each feature (Feature 1, Feature 2, ..., Feature n-1, Feature n) points to the list of images that contain it, for example Image 5, Image 7, Image 1, Image 25.]
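The inverted file maps every feature to the images that contain it, so a query only touches images sharing at least one feature with it. A minimal Python sketch of that data structure follows; the function names and the toy image identifiers are hypothetical, and GIFT's on-disk format is not shown here.

```python
from collections import defaultdict


def build_inverted_file(image_features):
    """Turn {image: [features]} into {feature: [images containing it]}."""
    inverted = defaultdict(list)
    for image, features in image_features.items():
        for feature in sorted(set(features)):
            inverted[feature].append(image)
    return dict(inverted)


def candidate_images(inverted, query_features):
    """Collect every image that shares at least one feature with the query."""
    found = set()
    for feature in query_features:
        found.update(inverted.get(feature, []))
    return found


# Toy collection: feature numbers per image (hypothetical values).
collection = {
    "image1": [1, 2],
    "image5": [1],
    "image17": [2, 3],
}
inverted = build_inverted_file(collection)
print(inverted)
print(candidate_images(inverted, [3]))
```

With roughly 85,000 possible features and only 1,000-3,000 present per image, most feature lists are short, which is what makes this lookup fast.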

Page 17: The medGIFT project and medical image retrieval

Feature weighting

• Classical idf
  • tf - term frequency
  • cf - collection frequency
  • j - feature number
  • Q - query with i=1..N input images
  • k - possible result image
  • R - relevance of an image in a query

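$$\mathrm{relevance}_j = \frac{1}{N} \sum_{i=1}^{N} \left( tf_{ij} \cdot R_i \right) \cdot \log^2\!\left( \frac{1}{cf_j} \right)$$

$$\mathrm{score}_{kq} = \sum_{j} \mathrm{relevance}_j$$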
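Reading the weighting as relevance_j = (1/N) Σ_i (tf_ij · R_i) · log²(1/cf_j) and score_kq = Σ_j relevance_j, a minimal Python sketch looks as follows. Several points are assumptions: cf_j is taken as the fraction of collection images containing feature j (between 0 and 1), the logarithm is natural, the sum in the score runs over the features present in the candidate image k, and R_i is +1 or -1 as in the MRML example. The function names and the toy values are hypothetical.

```python
import math


def feature_relevance(query_images, collection_frequency):
    """relevance_j = (1/N) * sum_i(tf_ij * R_i) * log(1/cf_j)**2

    query_images: list of (tf, R) pairs, one per input image i, where
    tf maps feature j -> term frequency and R is the user relevance.
    collection_frequency: maps feature j -> cf_j, assumed in (0, 1].
    """
    n = len(query_images)
    weighted_tf = {}
    for tf, relevance in query_images:
        for j, tf_ij in tf.items():
            weighted_tf[j] = weighted_tf.get(j, 0.0) + tf_ij * relevance
    return {
        j: (total / n) * math.log(1.0 / collection_frequency[j]) ** 2
        for j, total in weighted_tf.items()
    }


def score(candidate_features, relevance):
    """score_kq: sum of relevance_j over the features of candidate image k."""
    return sum(relevance.get(j, 0.0) for j in candidate_features)


# One positive and one negative input image (toy values).
query = [
    ({"a": 1.0, "b": 0.5}, 1),
    ({"b": 0.5}, -1),
]
cf = {"a": 0.5, "b": 0.1}
relevance = feature_relevance(query, cf)
print(relevance)
print(score({"a", "b"}, relevance))
```

In this toy query the negative example cancels feature "b" completely, so only feature "a" contributes to the score of a candidate image.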

Page 18: The medGIFT project and medical image retrieval

Relevance feedback

• One-image queries normally do not lead to good results
  • Several input images improve the query quality
• Negative feedback is extremely important
  • Positive feedback is often only a reordering of the highest-ranked results
• Log files make it possible to analyze the behaviour of users
  • Learning of feature weightings as an additional factor
  • Long-term learning from the user interaction
  • Learning on several levels (user/database/global)
• Changes of feature sets during feedback
  • First results seem to be very good

Page 19: The medGIFT project and medical image retrieval

Changes for the medical domain

• Adaptation of the user interface in PHP
  • Display of the diagnosis in the interface
  • Link to the complete case description and other images of the case in full size
• Adaptation of the visual characteristics
  • More grey levels, fewer colours
  • 32 grey levels deliver the best overall results
  • More directions and scales for the Gabor filters
  • Scales have less influence than the directions
  • Quantisation of filter responses has not been tested so far
  • Implementation of more visual characteristics is planned
• Creation of a reference data set
  • imageCLEF 2004

Page 20: The medGIFT project and medical image retrieval

The user interface

[Figure: screenshot of the user interface, annotated with: similarity score; diagnosis; choice of user (pos, neg, neutral); link to the case description.]

Page 21: The medGIFT project and medical image retrieval

Main projects

• Reference databases for content-based search
  • Casimage database for PACS-like retrieval
  • Specialized database for diagnostic aid
  • Creation of gold standards, ground truth
• Evaluation of clinical use of image retrieval
  • Study on the practical use of image retrieval
  • Comments and feedback from MDs
• Specialization of medGIFT for several domains
  • Lung CTs
  • Dermatologic images
• Combination of visual and textual features
• Integration into the PACS (long-term)

Page 22: The medGIFT project and medical image retrieval

Reference data sets, evaluation

• Need for a big database (currently 8,751 images)
  • Free of copyright and of charge
  • Specialized databases are planned as well
• Definition of query topics
• Definition of measures for a system comparison
  • Speed
  • Several measures for retrieval quality
  • Relevance feedback, user interface, …
• Definition of a perfect response by a domain expert
  • Visual control of a large number of images
  • Pooling can make this process easier
  • Technology from text retrieval (TREC)
• Comparison of several systems
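Pooling and a retrieval-quality measure can be sketched in a few lines of Python. This is an illustration in the spirit of TREC-style evaluation, not the procedure used by the project: the system names, image identifiers, pool depth and the choice of precision at k as the measure are all assumptions.

```python
def pool(runs, depth):
    """Union of the top `depth` results from every system's ranked list.

    Only the pooled images need to be judged by the domain expert,
    instead of the whole collection.
    """
    pooled = set()
    for ranked in runs.values():
        pooled.update(ranked[:depth])
    return pooled


def precision_at(ranked, relevant, k):
    """Fraction of the first k results that the expert judged relevant."""
    return sum(1 for image in ranked[:k] if image in relevant) / k


# Ranked results of two hypothetical systems for one query topic.
runs = {
    "system_a": ["i1", "i2", "i3"],
    "system_b": ["i2", "i4", "i5"],
}
to_judge = pool(runs, depth=2)
relevant = {"i2", "i4"}  # expert judgements on the pooled images
print(sorted(to_judge))
for name, ranked in runs.items():
    print(name, precision_at(ranked, relevant, k=2))
```

The expert here inspects three pooled images rather than all five, and the same judgements are then reused to compare every system on the topic.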

Page 23: The medGIFT project and medical image retrieval

Lung image retrieval

• Creation of a database with high-resolution CTs of the lung (currently 100 cases)
• Manual choice of one or several (representative) layers
• Marking of regions that describe the disease best
• Search for similar cases; the MD then has to decide on the diagnosis
  • Diagnostic aid (case-based reasoning, evidence-based medicine)
• Stepwise refinement

Page 24: The medGIFT project and medical image retrieval

Retrieving dermatologic images

• Very early status
• Cooperation with Monash University, Melbourne, Australia
  • Melanomas are a big problem in Australia
  • Switzerland has the highest melanoma rate in Europe
• Analysis of border areas
• Textures are important
• Development over time is important
• Calibration is a problem

Page 25: The medGIFT project and medical image retrieval

Combination of visual and textual features

• Figure out what each one can do that the other one cannot
• Text contains semantics, but it describes the context rather than the image content
• For teaching, visually similar cases with differing diagnoses are interesting
• Visual characteristics are language-independent
  • But MeSH terms can also be obtained in several languages
• Data/text mining should include visual data
• What can the two do together that neither of them can do alone?

Page 26: The medGIFT project and medical image retrieval

Conclusions

• Content-based visual search can play an important role in medicine as it is getting increasingly visual
• Evaluation and comparison of systems and techniques is the basis for success
• We need user studies and we need to get people used to the new tools
• Many techniques exist but they need to be optimized for certain application domains
• Open source software allows the exchange with various research groups
  • Knowledge should be publicly accessible
  • Components can be reused to minimize redevelopments
  • Databases need to be made accessible as well