Use of Data Provenance and the Grid in Medical Image Analysis and Drug Discovery – an IXI exemplar

Kelvin K. Leung¹, Mark Holden¹, Rolf A. Heckemann², Nadeem Saeed³, Keith J. Brooks³, Jacky B. Buckton⁴, Kumar Changani³, David G. Reid³, Daniel Rueckert⁵, Joseph V. Hajnal², Derek L.G. Hill¹

¹ Division of Imaging Sciences, King's College London, UK

² Imaging Sciences Department, Imperial College (Hammersmith Hospital Campus), UK

³ Imaging Centre, ⁴ RA Disease Biology, ri-CEDD, GlaxoSmithKline, UK

⁵ Department of Computing, Imperial College, UK

Overview

• Background

– Motivations

– Virtual data system

– Automatic delineation of multiple bones in serial MR images of joints in a disease model of Rheumatoid Arthritis (RA)

– Image registration and segmentation propagation

• Methods

– Prototype

• Results

• Conclusions

Motivations

• Medical imaging is expected to play an important part in drug discovery

– Recent £76m investment by GlaxoSmithKline (GSK) and Imperial College in a new clinical imaging centre

• Automatic analysis of medical image data requires:

– Lots of storage space (each image is about 32 MB in this work)

– Computational power (processing one image takes about 20-24 hours on a single desktop computer in this work)

• Motivated by the need for computational resources

Motivations

• The Grid has the potential to improve collaboration between industry and universities through the idea of the virtual organisation

– Universities can provide image analysis algorithms as services to industry, such as GSK, over the Grid

• Motivated by the need for better and more effective collaboration with industry

Motivations

• Detailed and reliable documentation of the data provenance of every analysis is very important for obtaining regulatory approval for a new drug

– Part 11 of the guidance for industry issued by the US Food and Drug Administration (FDA)

– Good Laboratory Practice (GLP) and Good Clinical Practice (GCP)

• Motivated by the need for data provenance

Virtual data system (VDS or Chimera)

• A system to “enable documentation of data provenance, discovery of available methods and on-demand data generation (so-called ‘virtual data’)”

– Developed by I. Foster, J. Vöckler, M. Wilde and Y. Zhao of the University of Chicago

• It consists of:

– A virtual data catalogue: a virtual data schema that provides a representation of computational procedures and their invocations

– A virtual data language interpreter that handles all requests for constructing and querying the database entries

• Data objects, such as input and output files, are described by logical file names (LFN), which are mapped to physical files via the Globus replica catalog (RC) or the Globus replica location service (RLS)
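The mapping from logical file names to physical files can be pictured with a small sketch. This is not the Globus RC or RLS interface: the class, its method names, and the host names and paths are all hypothetical, and an in-memory dictionary stands in for the real service.

```python
# Hypothetical, in-memory stand-in for a replica catalogue: it maps each
# logical file name (LFN) to the physical locations that hold a copy.
from collections import defaultdict


class ReplicaCatalogue:
    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, lfn, physical_url):
        """Record that a copy of the logical file exists at physical_url."""
        if physical_url not in self._replicas[lfn]:
            self._replicas[lfn].append(physical_url)

    def lookup(self, lfn):
        """Return all known physical locations of lfn (empty list if none)."""
        return list(self._replicas.get(lfn, []))


catalogue = ReplicaCatalogue()
# The host names and paths below are made up for illustration.
catalogue.register("file_a", "gsiftp://grid.example.org/data/file_a.img")
catalogue.register("file_a", "gsiftp://mirror.example.org/data/file_a.img")

locations = catalogue.lookup("file_a")  # two replicas of the same LFN
missing = catalogue.lookup("file_b")    # never registered
```

Because procedures refer only to the logical name, a file can be moved or replicated without changing any recorded procedure or invocation.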

Virtual data system

• Virtual data language (VDL) is used to describe computational procedures and their invocations

• Computational procedures are defined by transformation (TR) statements. Example:

– TR foo(input file1, output file2) { … }

• Invocations are defined by derivation (DV) statements. Example:

– To invoke foo with logical filenames file_a (input) and file_b (output)

– DV call_foo->foo(file1=@{input:"file_a"},file2=@{output:"file_b"});

• The virtual data schema allows TRs and DVs to be stored
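As a rough illustration of what storing TRs and DVs amounts to, the sketch below models them as plain Python records. It is not VDL and not the VDS schema: the class names, field names and the dictionary used as a catalogue are assumptions made for this example; only foo, call_foo, file1, file2, file_a and file_b come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Transformation:
    """A TR: a named procedure with declared input and output arguments."""
    name: str
    inputs: tuple
    outputs: tuple


@dataclass
class Derivation:
    """A DV: one invocation of a TR, binding its arguments to LFNs."""
    name: str
    transformation: Transformation
    bindings: dict  # formal argument name -> logical file name

    def input_files(self):
        return [self.bindings[arg] for arg in self.transformation.inputs]

    def output_files(self):
        return [self.bindings[arg] for arg in self.transformation.outputs]


# Counterparts of the TR foo and DV call_foo examples.
foo = Transformation("foo", inputs=("file1",), outputs=("file2",))
call_foo = Derivation("call_foo", foo, {"file1": "file_a", "file2": "file_b"})

# A dictionary standing in for the catalogue that stores both kinds of entry.
catalogue = {"TR": {foo.name: foo}, "DV": {call_foo.name: call_foo}}
```

Keeping the procedure (TR) separate from its invocation (DV) means the same procedure can be recorded once and invoked many times with different files.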

Virtual data system

• Compound TRs can be built so that workflows can be defined. Example:

– To call foo twice and pass the output of the first call to the input of the second call

– TR compound_foo(input file_in, output file_out, io file_io) {

    call foo(file1=@{input:"file_in"}, file2=@{output:"file_io"});

    call foo(file1=@{input:"file_io"}, file2=@{output:"file_out"}); };

• When an output file is requested from the system, an abstract DAG (directed acyclic graph, containing only LFNs) is generated

• A planner called Pegasus (“Planning for Execution in Grids”) converts the abstract DAG into a Condor DAGMan script and submits it to the Globus universe of Condor
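The idea of deriving an abstract DAG from a requested output file can be sketched as a walk backwards through the recorded invocations. The code below is only an illustration under assumed data structures: it is not how VDS or Pegasus is implemented, and the step names foo_1 and foo_2 are hypothetical labels for the two calls in compound_foo.

```python
# Each derivation is (name, input LFNs, output LFNs). The two entries mirror
# the compound_foo example: foo is called twice, chained through file_io.
derivations = [
    ("foo_2", ["file_io"], ["file_out"]),
    ("foo_1", ["file_in"], ["file_io"]),
]


def abstract_dag(requested_lfn, derivations):
    """Return the derivations needed to produce requested_lfn, in run order."""
    producer = {out: dv for dv in derivations for out in dv[2]}
    ordered, visited = [], set()

    def visit(lfn):
        dv = producer.get(lfn)
        if dv is None or dv[0] in visited:
            return  # a raw input file, or a step that is already scheduled
        visited.add(dv[0])
        for parent_lfn in dv[1]:
            visit(parent_lfn)  # schedule the producers of the inputs first
        ordered.append(dv[0])

    visit(requested_lfn)
    return ordered


plan = abstract_dag("file_out", derivations)
```

The plan names only logical files and steps; choosing physical files and execution sites is the job the slide assigns to the planner.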

Automatic delineation of multiple bones

• Rheumatoid Arthritis (RA)

– Is a chronic, systemic, autoimmune inflammatory disease

– Targets synovial joints, in which there is a massive accumulation of blood-borne cells such as T cells and macrophages

– Blood vessels are formed to support this new tissue, and the whole mass is called pannus

– Progressive erosion of cartilage and bone leads to disability in patients

• MR images were acquired in a disease model of RA

• Interested in the talus bone and the calcaneus bone in the ankle

• Delineate them from the MR images and study them, e.g. calculate volume to measure any erosion
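Once a bone has been delineated, its volume follows from counting the labelled voxels and multiplying by the volume of one voxel. The sketch below shows that calculation on a tiny made-up label image; the function name, the label value and the voxel size are illustrative assumptions, not values from this work.

```python
def segmented_volume(label_image, label, voxel_size_mm):
    """Volume in mm^3 of all voxels in a 3D nested list that carry `label`."""
    dx, dy, dz = voxel_size_mm
    count = sum(
        1
        for slice_ in label_image
        for row in slice_
        for value in row
        if value == label
    )
    return count * dx * dy * dz


# Tiny made-up label image: 2 slices of 2 x 3 voxels, label 1 = bone.
labels = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 0]],
]
volume = segmented_volume(labels, label=1, voxel_size_mm=(0.5, 0.5, 1.0))
```

Comparing this number across the serial images of one subject is one way a change such as erosion could show up.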

Image registration

• Refers to the spatial alignment of two images so that corresponding features in the two images are matched

• The result is a spatial mapping or transformation that transforms positions from one image to positions in another image.

• Example: Movie showing the rigid registration of two 3D MR images of a knee

[Movie panels: sagittal plane of image 1; sagittal plane of image 2; sagittal, transaxial and coronal views]
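A spatial mapping of the rigid kind can be written down with six numbers: three translations and three rotation angles. The sketch below applies such a mapping to a single position. The function name, the order in which the rotations are applied (x, then y, then z) and the parameter values are assumptions for illustration; registration software may use a different convention.

```python
import math


def rigid_transform(point, translation, rotation_deg):
    """Rotate `point` about the x, y and z axes in turn, then translate it."""
    x, y, z = point
    rx, ry, rz = (math.radians(a) for a in rotation_deg)
    # Rotation about the x axis.
    y, z = y * math.cos(rx) - z * math.sin(rx), y * math.sin(rx) + z * math.cos(rx)
    # Rotation about the y axis.
    x, z = x * math.cos(ry) + z * math.sin(ry), -x * math.sin(ry) + z * math.cos(ry)
    # Rotation about the z axis.
    x, y = x * math.cos(rz) - y * math.sin(rz), x * math.sin(rz) + y * math.cos(rz)
    tx, ty, tz = translation
    return (x + tx, y + ty, z + tz)


# A quarter turn about z followed by a shift of 5 along z (made-up values).
mapped = rigid_transform(
    (1.0, 0.0, 0.0), translation=(0.0, 0.0, 5.0), rotation_deg=(0.0, 0.0, 90.0)
)
```

Registration is the search for the parameter values that make the mapped features of one image line up with the other; the sketch only shows how a given set of parameters moves a position.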

Image registration

• Rigid registration: translation + rotation = 6 degrees of freedom (dof)

• Affine registration: rigid + skewing + scaling = 12 dof

• Nonrigid registration: warps one image into another

– Very computationally demanding because of the large number of dof

– Example: free-form deformation (FFD) models local deformation as the translation of a regularly spaced grid of points (control points)

[Movie: the green MR image of a knee overlaid on the grey MR image of a knee, before and after warping. White arrows show the amount of translation of the control points.]
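The control-point idea behind FFD can be sketched in 2D: each control point carries a translation, and the displacement anywhere else is interpolated from the surrounding control points. Note the simplification: FFD models in the literature commonly interpolate with cubic B-splines, whereas this sketch uses bilinear interpolation to stay short. The function names, grid size and shift values are assumptions for illustration.

```python
def ffd_displacement(x, y, spacing, control_shifts):
    """Displacement at (x, y), bilinearly interpolated from the control points.

    control_shifts[j][i] is the (dx, dy) translation of the control point
    located at (i * spacing, j * spacing).
    """
    i, j = int(x // spacing), int(y // spacing)
    # Clamp so that points on the far edge use the last grid cell.
    i = min(max(i, 0), len(control_shifts[0]) - 2)
    j = min(max(j, 0), len(control_shifts) - 2)
    u, v = x / spacing - i, y / spacing - j  # position inside the cell, 0..1
    weights = [((1 - u) * (1 - v), (i, j)), (u * (1 - v), (i + 1, j)),
               ((1 - u) * v, (i, j + 1)), (u * v, (i + 1, j + 1))]
    dx = sum(w * control_shifts[cj][ci][0] for w, (ci, cj) in weights)
    dy = sum(w * control_shifts[cj][ci][1] for w, (ci, cj) in weights)
    return dx, dy


def warp_point(x, y, spacing, control_shifts):
    """Move a position by the interpolated control-point displacement."""
    dx, dy = ffd_displacement(x, y, spacing, control_shifts)
    return x + dx, y + dy


# 2 x 2 control points, 10 units apart; only the one at (10, 0) is moved.
shifts = [[(0.0, 0.0), (2.0, 0.0)],
          [(0.0, 0.0), (0.0, 0.0)]]
at_control_point = warp_point(10.0, 0.0, 10.0, shifts)
at_cell_centre = warp_point(5.0, 5.0, 10.0, shifts)
```

Every control point adds its own translation parameters, which is why the number of dof, and with it the computational cost, grows quickly as the grid gets finer.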

Segmentation propagation

• Makes use of the spatial mapping calculated from the registration of two images to perform segmentation

• Requires an atlas

– An atlas is a reference image with labelled structures

[Figure: reference image with manual segmentation of the calcaneus; apply spatial mapping; target image with computed boundary of the calcaneus]
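Segmentation propagation can be sketched as a lookup: for every position in the target image, follow the spatial mapping into the atlas and copy the label found there. The 2D example below assumes the mapping runs from target positions to atlas positions and uses nearest-neighbour lookup; the function names, the toy atlas and the one-pixel shift are illustrative assumptions, not the registration results of this work.

```python
def propagate_labels(atlas_labels, target_shape, target_to_atlas):
    """Label each target pixel with the atlas label found at its mapped position."""
    rows, cols = target_shape
    atlas_rows, atlas_cols = len(atlas_labels), len(atlas_labels[0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ar, ac = target_to_atlas(r, c)
            ar, ac = round(ar), round(ac)  # nearest neighbour keeps labels discrete
            if 0 <= ar < atlas_rows and 0 <= ac < atlas_cols:
                result[r][c] = atlas_labels[ar][ac]
    return result


# Made-up atlas with a 2 x 2 labelled structure (label 1).
atlas = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


# Hypothetical mapping standing in for a registration result: the target
# anatomy sits one pixel further right than in the atlas.
def shift_right_by_one(r, c):
    return r, c - 1


target_labels = propagate_labels(atlas, (4, 4), shift_right_by_one)
```

The manual work is done once, on the atlas; each new image then needs only a registration to obtain its own segmentation.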

Segmentation propagation

[Figure labels: calcaneus; atlas; rigid + non-rigid registration; spatial mapping]

All image analysis workflows were entered into VDS

Prototype

• Simple web interface to replace some command line tools of VDS, Globus Toolkit 2.4 and Condor

– Researchers or clinicians working on medical image analysis may not be comfortable with command line tools and the virtual data language

– Developed using Java servlets on Apache Tomcat

– Web pages for:

  • Querying VDS for transformations and derivations

  • Invoking transformations in VDS

  • Querying, uploading and downloading files to and from Globus RLS

  • Displaying job status in Condor

Prototype

[Architecture diagram]

• Web portal machine running Apache Tomcat, Globus client and personal Condor (job submission site)

• Grid machine running Globus Gatekeeper, GridFTP server, Globus RLS and Condor, with an experimental Condor pool of 4 machines (storage and execution site)

Results

Services

Results

Service to delineate the calcaneus and talus from the target image

[Workflow diagram with one rigid registration step and two segmentation propagation steps (talus and calcaneus). File labels: target, reference_image, aregdof, talus, tal_dof, talus_seg, calcaneus, cal_dof, cal_seg]

Results

Jobs generated

Results

Job status in Condor

Results

Click to download files and view in vtkview

Results

Service to render the surfaces of the bones

Results

Job submitted

Job status

Results

Browse all the executed services and click on a file to view its provenance
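Viewing the provenance of a file amounts to asking which invocation produced it, which files that invocation read, and so on back to the raw inputs. The sketch below shows that lookup over a hypothetical table of derivation records. It does not use the VDS query interface; the table layout and the names first_foo and second_foo are assumptions, and the file names reuse the earlier compound_foo example.

```python
# Hypothetical derivation records: output LFN -> (derivation name, input LFNs).
produced_by = {
    "file_io": ("first_foo", ["file_in"]),
    "file_out": ("second_foo", ["file_io"]),
}


def provenance(lfn, produced_by):
    """Return the chain of (output, derivation, inputs) that led to lfn."""
    record = produced_by.get(lfn)
    if record is None:
        return []  # lfn is a raw input with no recorded derivation
    derivation, inputs = record
    history = [(lfn, derivation, inputs)]
    for parent in inputs:
        history.extend(provenance(parent, produced_by))
    return history


trail = provenance("file_out", produced_by)
for output, derivation, inputs in trail:
    print(f"{output} was produced by {derivation} from {', '.join(inputs)}")
```

A listing like this is the kind of record an audit trail needs: every result can be traced to the procedure and inputs that generated it.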

Conclusions

• We integrated Grid middleware and a data provenance tool with medical image processing software in a prototype system, in collaboration with GSK

• The data provenance of the results was kept in VDS, where it can be queried and retrieved easily

– Aims to satisfy the US FDA, GLP and GCP guidelines on maintaining an “audit trail” of electronic records

• The total processing time for delineating 12 bones from 6 subjects was cut from about 132 hours to about 33 hours (a factor of 4) by running the computing tasks on a Condor pool instead of on a single desktop computer
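The reported timings can be cross-checked with a few lines of arithmetic. The hours, the subject count and the pool size of 4 machines come from the slides; treating each subject as one image and assuming all four machines were used throughout are assumptions of this sketch.

```python
single_desktop_hours = 132  # total on one desktop computer
condor_pool_hours = 33      # total on the Condor pool
subjects = 6
pool_machines = 4           # size of the experimental Condor pool

# Speed-up: how many times faster the pool finished the same work.
speedup = single_desktop_hours / condor_pool_hours

# Per-subject cost on one desktop, assuming one image per subject.
hours_per_subject = single_desktop_hours / subjects

# Fraction of the ideal speed-up achieved, assuming all machines were used.
parallel_efficiency = speedup / pool_machines

print(speedup, hours_per_subject, parallel_efficiency)
```

Under these assumptions the per-subject figure of 22 hours sits inside the 20-24 hours quoted earlier for one image, and a speed-up of 4 on 4 machines corresponds to near-ideal scaling for this workload.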

Further work

• More user feedback is required to evaluate and improve the system

• Further validation and application to a larger number of subjects are required to determine the sensitivity of the delineation technique to disease progression

Acknowledgements

• EPSRC

• GlaxoSmithKline (GSK)

• Links

– IXI: www.ixi.org.uk

– VDS: www.griphyn.org/chimera