Post on 10-May-2015
Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
The present and future role of computers in medicine
Credits
Thanks for support from: Chan Soon-Shiong Foundation, Department of Energy, National Institutes of Health, National Science Foundation
And for many helpful conversations: Carl Kesselman, Jonathan Silverstein, Steve Tuecke, Stephan Erberich, Steve Graham, Ravi Madduri, and Patrick Soon-Shiong
Biology is shifting from being an observational science to a quantitative molecular science
Old biology: measure one/two things in two/three conditions
High cost per measurement
Analysis straightforward as little data
Enormously difficult to work out pathways due to inadequate data
New biology: measure 10,000 things under many conditions
Low cost per measurement
Analysis no longer straightforward
Payoff can be bigger: potential to understand a complex system
Ajay Jain, UCSF
Change health care from an empirical, qualitative system of silos of information to a model of predictive, quantitative, shared, evidence-based outcomes
The health care information technology chasm
Health care IT [is] rarely used to provide clinicians with evidence-based decision support and feedback; to support data-driven process improvement; or to link clinical care and research.
Computational Technology for Effective Health Care, NRC, 2009
Digital power = computing x communication x storage x content
Moore's law: doubles every 18 months
Fiber law: doubles every 9 months
Disk law: doubles every 12 months
Community law: 2^n, where n is # people
John Seely Brown
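The doubling times on this slide compound very differently over a few years. A minimal sketch of that arithmetic, assuming steady exponential growth at the quoted rates (the function and dictionary names are illustrative, not from the talk):

```python
# Doubling times in months, as quoted on the slide.
DOUBLING_MONTHS = {"moore": 18, "disk": 12, "fiber": 9}


def growth_factor(doubling_months: float, years: float) -> float:
    """Multiplicative growth after `years` of steady doubling."""
    return 2 ** (years * 12 / doubling_months)


def compare(years: float) -> dict[str, float]:
    """Growth factor for each law over the same time span."""
    return {law: growth_factor(m, years) for law, m in DOUBLING_MONTHS.items()}


if __name__ == "__main__":
    for law, factor in compare(9).items():
        print(f"{law:>5}: x{factor:,.0f} over 9 years")
```

Over the same nine years, a 9-month doubling time yields 64 times more growth than an 18-month one, which is why communication and storage outpace raw computing in this framing.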
(Intel)
Marching towards manycore
Intel's 80 core prototype: 2-D mesh interconnect, 62 W power
Tilera 64 core system: 8x8 grid of cores, 5 MB coherent cache, 4 DDR2 controllers, 2 10 GbE interfaces
IBM Cell: PowerPC and 8 cores
Dan Reed, Microsoft
The evolution of the fastest supercomputer
[Figure: peak speed (flops, 1E+2 to 1E+17) versus year introduced (1940 to 2010). Doubling time = 1.5 yr. Machines shown include ENIAC (vacuum tubes), UNIVAC, IBM 701, IBM 704, IBM 7090 (transistors), IBM Stretch, CDC 6600 (ICs), CDC 7600, ILLIAC IV, CDC STAR-100 (vectors), CRAY-1, Cyber 205, X-MP2 (parallel vectors), CRAY-2, X-MP4, Y-MP8, S-810/20, SX-2, SX-3/44, VP2600/10, i860 (MPPs), Delta, CM-5, Paragon, NWT, T3D, T3E, SX-4, SX-5, CP-PACS, ASCI Red, ASCI Red Option, Blue Pacific, ASCI White, ASCI Q, Earth, Thunder, Red Storm, Blue Gene/L, Petaflop, and multi-Petaflop. "Argonne" and "My laptop" are marked for comparison.]
The Argonne IBM BG/P
[Figure from www.top500.org; surviving legend labels: 1, 3-42, >128K]
Simulation of the human arterial tree on the TeraGrid
G. Karniadakis et al.
Storage costs
(PC Magazine, Oct 2, 2007)
Growth of GenBank (1982-2005)
Broad Institute
More data does not always mean more knowledge
Folker Meyer, Genome Sequencing vs. Moore’s Law: Cyber Challenges for the Next Decade, CTWatch, August 2006.
The Red Queen’s race
"Well, in our country," said Alice … "you'd generally get to somewhere else — if you run very fast for a long time, as we've been doing.”
"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"
Computing on demand
Public PUMA knowledge base: information about proteins analyzed against ~2 million gene sequences
Back office analysis on Grid: millions of BLAST, BLOCKS, etc., on OSG and TeraGrid
Natalia Maltsev et al.
Capacity increase and new economics
[Figure: cost per gigabit-mile by year (axis years 1984 to 2002); capacity labels 50 Mbps, 2.5 Gbps, 320 Gbps, 1.6 Tbps; annotations "Moore's Law", "Revolution", and "Optical networking breakthrough!"]
Nortel
Optical switches
Lucent
New ways of knowing
300 BCE 1700 1950 1990
Empiricism
Data
Theory
Simulation
Enhanced by the power of collaboration
Quantitative medicine is the key to reducing healthcare costs and improving healthcare outcomes
Patients with same diagnosis
Misdiagnosed
Non-responders
Toxic responders
Non-toxic responders
Leukemia and Lymphoma
After Mara Aspinall, Genzyme Genetics; Felix W. Frueh, FDA
Currently, 17% of Burkitt's Lymphoma are incorrectly diagnosed as Diffuse Large B Cell Lymphoma
Classic Burkitt's Lymphoma
Atypical Burkitt's Lymphoma
Diffuse Large B Cell Lymphoma
Louis Staudt, National Cancer Institute
Survival estimates for patients with Burkitt's Lymphoma
Best treatment for Diffuse Large B Cell Lymphoma
Best treatment for Burkitt's Lymphoma
Dave et al., NEJM, June 8, 2006.
Burkitt's Lymphoma (Classic, Atypical)
Diffuse Large B-cell Lymphoma
Louis Staudt, National Cancer Institute
Imaging biomarkers: Diffusion Tensor Imaging and brain injury
Kraus et al., Brain (2007), 130, 2508-2519
Enabling quantitative medicine
Collect a lot of patient data
Analyze data to infer effective treatments
Identify personalized treatment plans
Clinical practice
Basic research
Clinical trials
Challenges
Increasing volumes of data, types of data: genomics, blood proteins, imaging, …
New science and treatments are hidden in the data, not the biology (biomarkers)
Too much for the individual physician or researcher to absorb
… have to pay attention to cognitive support … computer-based tools and systems that offer clinicians and patients assistance for thinking about and solving problems related to specific instances of health care.
NRC Report on Computational Technology for Effective Health Care: Immediate Steps and Strategic Directions, 2009
Bridging silos to enable quantitative medicine
[Figure: Basic research, Clinical practice, and Clinical trials, connected by flows labeled: trial subjects, outcomes; library; outcomes, tissue bank; screening tests; ongoing investigative studies; pathways]
Addressing urban health needs
Important characteristics
We must integrate systems that may not have worked together before
These are human systems, with differing goals, incentives, capabilities
All components are dynamic—change is the norm, not the exception
Processes are evolving rapidly too
We are not building something simple like a bridge or an airline reservation system
Healthcare is a complex adaptive system
A complex adaptive system is a collection of individual agents that have the freedom to act in ways that are not always predictable and whose actions are interconnected such that one agent's actions changes the context for other agents.
Crossing the Quality Chasm, IOM, 2001; pp 312-13
Non-linear and dynamic
Agents are independent and intelligent
Goals and behaviors often in conflict
Self-organization through adaptation and learning
No single point(s) of control
Hierarchical decomposition has limited value
We need to function in the zone of complexity
[Figure: axes are agreement about outcomes (low to high) and certainty about outcomes (low to high); regions are labeled Plan and control, Zone of complexity, and Chaos]
Ralph Stacey, Complexity and Creativity in Organizations, 1996
We call these groupings virtual organizations (VOs)
Healthcare = dynamic, overlapping VOs, linking
Patient – primary care
Sub-specialist – hospital
Pharmacy – laboratory
Insurer – …
A set of individuals and/or institutions engaged in the controlled sharing of resources in pursuit of a common goal
But U.S. health system is marked by fragmented and inefficient VOs with insufficient mechanisms for controlled sharing
I advocate … a model of virtual integration rather than true vertical integration … G. Halvorson, CEO Kaiser
The Grid paradigm
Principles and mechanisms for dynamic VOs
Leverage service oriented architecture (SOA)
Loose coupling of data and services
Open software, architecture
[Figure: timeline from 1995 to 2010 with the labels Computer science, Physics, Astronomy, Engineering, Biology, Biomedicine, Healthcare]
The Grid paradigm and healthcare information integration
[Figure: layered architecture. Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO. Platform services: Make data accessible over the network; Make data usable and useful; Name data and move it around. Cross-cutting: Manage who can do what.]
[Grid architecture joint work with Carl Kesselman, Steve Tuecke, Stephan Erberich, and others]
The Grid paradigm and healthcare information integration
[Figure: layered architecture. Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO. Platform services: Publication, Integration, Management. Above the platform: Transform data into knowledge; Enhance user cognitive processes; Incorporate into business processes. Cross-cutting: Security and policy.]
The Grid paradigm and healthcare information integration
[Figure: layered architecture. Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO. Platform services: Publication, Integration, Management. Value services: Analysis, Cognitive support, Applications. Cross-cutting: Security and policy.]
We partition the multi-faceted interoperability problem
Process interoperability: integrate work across healthcare enterprise
Data interoperability
Syntactic: move structured data among system elements
Semantic: use information across system elements
Systems interoperability: communicate securely, reliably among system elements
[Figure labels: Analysis, Management, Integration, Publication, Applications]
Publication: Make information accessible
Make data available in a remotely accessible, reusable manner
Leave mediation for integration layer
Gateway from local policy/protocol into wide area mechanisms (transport, security, …)
Imaging clinical trials
Children's Oncology Group
Neuroblastoma Cancer Foundation
NANT, COG
Stephan Erberich, Carl Kesselman, et al.
As of Oct 19, 2008:
122 participants
105 services (70 data, 35 analytical)
Data movement in clinical trials
(Center for Health Informatics)
Community public health: Digital retinopathy screening network
(Center for Health Informatics)
Integration: Making data usable and useful
[Figure: degree of communication (0% to 100%) plotted against degree of prior syntactic and semantic agreement (0% to 100%), comparing a rigid standards-based approach, a loosely coupled approach, and an adaptive approach]
Integration via mediation
Map between models
Scoped to domain use
Multiple concurrent use
Bottom up mediation: between standards and versions; between local versions; in absence of agreement
[Figure labels: Global Data Model; Query reformulation; Query in union of exported source schema; Query optimization; Distributed query execution; Query execution engine; Query in the source schema; Wrapper; Wrapper]
Alon Halevy, 2000
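The mediation idea above can be illustrated with a toy example: a query posed against a global data model is rewritten into each source's own schema by a wrapper, and the answers are mapped back. This is only a sketch under stated assumptions. The `Wrapper` class, the `mediate` function, the field names, and the sample records are all hypothetical, and no query optimization or distributed execution is attempted.

```python
Record = dict[str, object]


class Wrapper:
    """Translates between one source's local schema and the global model."""

    def __init__(self, name: str, rows: list[Record], field_map: dict[str, str]):
        self.name = name
        self.rows = rows            # records in the source's own schema
        self.field_map = field_map  # global field name -> local field name

    def query(self, global_filters: Record) -> list[Record]:
        # Reformulate: rewrite the global filter into the source schema.
        local_filters = {self.field_map[k]: v
                         for k, v in global_filters.items()
                         if k in self.field_map}
        if len(local_filters) < len(global_filters):
            return []  # this source has no field for one of the filters
        hits = [r for r in self.rows
                if all(r.get(k) == v for k, v in local_filters.items())]
        # Map each matching record back into the global model.
        return [{g: r.get(l) for g, l in self.field_map.items()}
                | {"source": self.name}
                for r in hits]


def mediate(wrappers: list[Wrapper], global_filters: Record) -> list[Record]:
    """Send one global query to every wrapper and union the answers."""
    results: list[Record] = []
    for w in wrappers:
        results.extend(w.query(global_filters))
    return results


# Hypothetical sources whose local schemas disagree on field names.
radiology = Wrapper("radiology",
                    [{"pid": "p1", "study": "MRI"}, {"pid": "p2", "study": "CT"}],
                    {"patient_id": "pid", "procedure": "study"})
pathology = Wrapper("pathology",
                    [{"patient": "p1", "dx": "BL"}],
                    {"patient_id": "patient", "diagnosis": "dx"})

if __name__ == "__main__":
    for row in mediate([radiology, pathology], {"patient_id": "p1"}):
        print(row)
```

Each source keeps its own schema, and only the per-source field map has to change when a source evolves, which is the appeal of mediation when no prior agreement on a standard exists.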
Analytics: Transform data into knowledge
“The overwhelming success of genetic and genomic research efforts has created an enormous backlog of data with the potential to improve the quality of patient care and cost effectiveness of treatment.”
— US Presidential Council of Advisors on Science and Technology, Personalized Medicine Themes, 2008
The image pyramid
[Figure: pyramid levels labeled Created, Eligible patients, Enrolled/evaluated, Published]
Michael Vannier
Microarray clustering using Taverna
1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
2. Normalize microarray data using GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
3. Hierarchical clustering using geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
[Figure legend: workflow in/output; caGrid services; "shim" services; others]
Wei Tan et al.
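The three workflow steps (retrieve, normalize, cluster) compose as a simple pipeline. The sketch below is a local, pure-Python stand-in for that shape and does not call the caGrid, GenePattern, or geWorkbench services. The z-score normalization, the single-linkage clustering, the function names, and the sample profiles are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def normalize(rows):
    """Z-score each expression profile (stand-in for the normalization step)."""
    out = []
    for row in rows:
        mu, sd = mean(row), pstdev(row)
        out.append([(x - mu) / sd if sd else 0.0 for x in row])
    return out


def distance(a, b):
    """Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def hierarchical_clusters(rows, k):
    """Single-linkage agglomerative clustering, merging until k clusters remain."""
    k = max(1, k)
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


def run_workflow(fetch, k):
    """Step 1: fetch data; step 2: normalize; step 3: cluster."""
    return hierarchical_clusters(normalize(fetch()), k)


def sample_profiles():
    # Hypothetical expression profiles: two rising, two falling.
    return [[1, 2, 3], [2, 4, 6], [3, 2, 1], [6, 4, 2]]


if __name__ == "__main__":
    print(run_workflow(sample_profiles, k=2))
```

In the actual workflow each step is a remote service invocation orchestrated by Taverna; here each step is an ordinary function so the data flow between the three stages is easy to follow.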
Many many tasks: Identifying potential drug targets
2M+ ligands x protein target(s)
Benoit Roux et al.
[Figure: docking workflow for one protein target, from start to report]
Inputs: PDB protein descriptions (1 protein, 1 MB); ZINC 3-D structures (2M structures, 6 GB)
Manually prep DOCK6 rec file; DOCK6 Receptor (1 per protein: defines pocket to bind to)
Manually prep FRED rec file; FRED Receptor (1 per protein: defines pocket to bind to)
DOCK6, FRED: ~4M x 60s x 1 cpu, ~60K cpu-hrs; select best ~5K from each
Amber prep: 2. AmberizeReceptor, 4. perl: gen nabscript
NAB script parameters (defines flexible residues, #MDsteps); BuildNABScript; NAB Script Template; NAB Script
Amber Score: 1. AmberizeLigand, 3. AmberizeComplex, 5. RunNABScript
Amber: ~10K x 20m x 1 cpu, ~3K cpu-hrs; select best ~500
GCMC: ~500 x 10hr x 100 cpu, ~500K cpu-hrs
Outputs: ligands, complexes
For 1 target: 4 million tasks, 500,000 cpu-hrs (50 cpu-years)
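The per-stage estimates above are products of task count, time per task, and CPUs per task. A minimal sketch of that arithmetic, using the stage figures as printed on the slide (the `Stage` class and variable names are illustrative and not part of the original workflow):

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    tasks: int
    seconds_per_task: float
    cpus_per_task: int = 1

    @property
    def cpu_hours(self) -> float:
        """CPU-hours consumed by every task in this stage."""
        return self.tasks * self.seconds_per_task * self.cpus_per_task / 3600


# Stage figures as given on the slide.
STAGES = [
    Stage("DOCK6/FRED", 4_000_000, 60),                 # ~4M x 60s x 1 cpu
    Stage("Amber", 10_000, 20 * 60),                    # ~10K x 20m x 1 cpu
    Stage("GCMC", 500, 10 * 3600, cpus_per_task=100),   # ~500 x 10hr x 100 cpu
]


def total_cpu_hours(stages) -> float:
    return sum(s.cpu_hours for s in stages)


if __name__ == "__main__":
    for s in STAGES:
        print(f"{s.name:>10}: {s.cpu_hours:>10,.0f} cpu-hrs")
    print(f"{'total':>10}: {total_cpu_hours(STAGES):>10,.0f} cpu-hrs")
```

The unrounded products come out slightly above the slide's rounded per-stage figures, and the GCMC stage dominates the total, which is presumably why the slide summarizes the whole pipeline as roughly 500,000 cpu-hrs.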
DOCK on BG/P: ~1M tasks on 118,000 CPUs
CPU cores: 118784
Tasks: 934803
Elapsed time: 7257 sec
Compute time: 21.43 CPU years
Average task time: 667 sec
Relative Efficiency: 99.7% (from 16 to 32 racks)
Utilization: Sustained: 99.6%, Overall: 78.3%
[Figure x-axis: Time (secs)]
Ioan Raicu et al.
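The overall utilization figure can be sanity-checked from the other numbers on the slide. The sketch below assumes utilization means useful compute time divided by available core-time (cores x elapsed time) and assumes a 365-day year; both are assumptions, since the slide does not define either.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

# Figures as printed on the slide.
CPU_CORES = 118_784
TASKS = 934_803
ELAPSED_SEC = 7_257
COMPUTE_CPU_YEARS = 21.43


def overall_utilization(cores: int, elapsed_sec: float, compute_cpu_years: float) -> float:
    """Useful compute time as a fraction of available core-seconds."""
    available = cores * elapsed_sec
    used = compute_cpu_years * SECONDS_PER_YEAR
    return used / available


def throughput(tasks: int, elapsed_sec: float) -> float:
    """Average number of tasks completed per second of wall-clock time."""
    return tasks / elapsed_sec


if __name__ == "__main__":
    u = overall_utilization(CPU_CORES, ELAPSED_SEC, COMPUTE_CPU_YEARS)
    print(f"overall utilization: {u:.1%}")
    print(f"throughput: {throughput(TASKS, ELAPSED_SEC):.1f} tasks/sec")
```

Under these assumptions the result lands close to the 78.3% overall utilization quoted on the slide, which suggests the gap from the 99.6% sustained figure comes from ramp-up and tail-off at the start and end of the run rather than from inefficiency in steady state.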
The health care information technology chasm
Health care IT [is] rarely used to provide clinicians with evidence-based decision support and feedback; to support data-driven process improvement; or to link clinical care and research.
Computational Technology for Effective Health Care, NRC, 2009
Six research challenges for information technology and healthcare
Patient-centered cognitive support
Modeling: an individualized virtual patient
Automation: integrated use, adaptivity
Data sharing and collaboration
Data management at scale
Automated full capture of physician-patient interactions
Computational Technology for Effective Health Care, NRC, 2009
Functioning in the zone of complexity
[Figure: axes are agreement about outcomes (low to high) and certainty about outcomes (low to high); regions are labeled Plan and control and Chaos]
Ralph Stacey, Complexity and Creativity in Organizations, 1996
The Grid paradigm and healthcare information integration
[Figure: layered architecture. Data sources: Radiology, Medical records, Pathology, Genomics, Labs, RHIO. Platform services: Publication, Integration, Management. Value services: Analysis, Cognitive support, Applications. Cross-cutting: Security and policy.]
“People tend to overestimate the short-term impact of change, and underestimate the long-term impact.”
— Roy Amara
“The computer revolution hasn’t happened yet.”
— Alan Kay, 1997
Thank you!
Computation Institute
www.ci.uchicago.edu