“Collaborations Between Calit2, SIO, and the Venter Institute: A Beginning”
Talk to the Venter Institute Board
La Jolla, CA
December 5, 2005
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology;
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
Driving Cyberinfrastructure with Environmental Metagenomics
Samples Collected by Sorcerer II
How did Calit2, SIO, and VI Arrive at This Unified Vision?
Funded Today! $24.5 M
Over 7 Years
J. Craig Venter, et al.
Science 2 April 2004:
Vol. 304. pp. 66 - 74
Figure labels: Prochlorococcus; Microbacterium; Burkholderia; Rhodobacter; SAR-86; unknown; unknown
Metagenomics “Extreme Assembly” Requires Large Amount of Pixel Real Estate
Source: Karin Remington, J. Craig Venter Institute
Metagenomics Requires a Global View of Data and the Ability to Zoom Into Detail Interactively
Overlay of Metagenomics Data onto Sequenced Reference Genomes (This Image: Prochlorococcus marinus MED4)
Source: Karin Remington, J. Craig Venter Institute
The OptIPuter – Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data
Green: Purkinje Cells; Red: Glial Cells; Light Blue: Nuclear DNA
Source: Mark Ellisman, David Lee, Jason Leigh
300 MPixel Image!
Calit2 (UCSD, UCI) and UIC Lead Campuses; Larry Smarr PI
Partners: SDSC, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Scalable Displays Allow Both Global Content and Fine Detail
Source: Mark Ellisman, David Lee, Jason Leigh
30 MPixel SunScreen Display Driven by a 20-node Sun Opteron Visualization Cluster
Allows for Interactive Zooming from Cerebellum to Individual Neurons
Source: Mark Ellisman, David Lee, Jason Leigh
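As a rough illustration of why scalable displays matter here, the sketch below compares the 300 MPixel image with the 30 MPixel display using only the figures quoted on these slides. The 3 bytes per pixel (uncompressed 24-bit RGB) and the even split of pixels across the 20 cluster nodes are assumptions made for illustration, not details from the talk.

```python
# Back-of-the-envelope figures for the image and display on these slides.
IMAGE_MPIXELS = 300      # "300 MPixel Image!"
DISPLAY_MPIXELS = 30     # "30 MPixel SunScreen Display"
CLUSTER_NODES = 20       # "20-node Sun Opteron Visualization Cluster"
BYTES_PER_PIXEL = 3      # ASSUMPTION: uncompressed 24-bit RGB


def display_fills(image_mpixels: float, display_mpixels: float) -> float:
    """How many full screens are needed to show the image at native resolution."""
    return image_mpixels / display_mpixels


def mpixels_per_node(display_mpixels: float, nodes: int) -> float:
    """Pixels each node drives, ASSUMING an even split across nodes."""
    return display_mpixels / nodes


def uncompressed_megabytes(mpixels: float, bytes_per_pixel: int = BYTES_PER_PIXEL) -> float:
    """Uncompressed size in decimal megabytes (1 MPixel * 3 bytes = 3 MB)."""
    return mpixels * bytes_per_pixel


if __name__ == "__main__":
    print("Screens to cover the image:", display_fills(IMAGE_MPIXELS, DISPLAY_MPIXELS))
    print("MPixels per node:", mpixels_per_node(DISPLAY_MPIXELS, CLUSTER_NODES))
    print("Uncompressed image size (MB):", uncompressed_megabytes(IMAGE_MPIXELS))
```

Under these assumptions the full image is ten times larger than the display, which is why interactive zooming and panning are needed to move between the global view and individual neurons.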
Why Optical Networks Will Become the 21st Century Driver
Scientific American, January 2001
Chart axes: Performance per Dollar Spent (vertical) vs. Number of Years, 0 to 5 (horizontal)
Data Storage (bits per square inch): Doubling time 12 Months
Optical Fiber (bits per second): Doubling time 9 Months
Silicon Computer Chips (Number of Transistors): Doubling time 18 Months
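The doubling times on this chart can be turned into growth factors over the five-year span of its horizontal axis. The sketch below is a minimal worked calculation, assuming smooth exponential growth at the quoted doubling times; the function and variable names are illustrative only.

```python
# Growth implied by the doubling times quoted on the chart.
DOUBLING_TIME_MONTHS = {
    "Data Storage (bits per square inch)": 12,
    "Optical Fiber (bits per second)": 9,
    "Silicon Computer Chips (number of transistors)": 18,
}


def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years`, ASSUMING steady exponential doubling."""
    return 2 ** (years * 12 / doubling_months)


if __name__ == "__main__":
    for name, months in DOUBLING_TIME_MONTHS.items():
        print(f"{name}: x{growth_factor(5, months):.1f} over 5 years")
```

With these figures, five years gives roughly a 32-fold gain for storage, about 100-fold for optical fiber, and about 10-fold for chips, which is the gap the slide title points to.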
Challenge: Average Throughput of NASA Data Products to End User is Only < 50 Megabits/s
Tested from GSFC-ICESAT, January 2005
http://ensight.eos.nasa.gov/Missions/icesat/index.shtml
c = λ * f
Solution: Individual 1 or 10Gbps Lightpaths -- “Lambdas on Demand”
(WDM)
Source: Steve Wallach, Chiaro Networks
“Lambdas”
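To make the gap between the measured throughput and a dedicated lightpath concrete, the sketch below estimates the time to move a data set at 50 Megabits/s versus 1 and 10 Gbps. The 1 TB data set size is a hypothetical value chosen for illustration, and the calculation ignores protocol overhead and assumes the link rate is fully sustained.

```python
# Idealized transfer times at the rates quoted on these slides.
RATES_BITS_PER_S = {
    "50 Megabits/s (measured average)": 50e6,
    "1 Gbps lightpath": 1e9,
    "10 Gbps lightpath": 10e9,
}

DATASET_BYTES = 1e12  # ASSUMPTION: hypothetical 1 TB data set, not from the talk


def transfer_hours(size_bytes: float, rate_bits_per_s: float) -> float:
    """Hours to move `size_bytes` at a fully sustained rate, with no overhead."""
    return size_bytes * 8 / rate_bits_per_s / 3600


if __name__ == "__main__":
    for label, rate in RATES_BITS_PER_S.items():
        print(f"{label}: {transfer_hours(DATASET_BYTES, rate):.2f} hours")
```

Under these assumptions the same transfer drops from nearly two days to a few hours at 1 Gbps and to minutes at 10 Gbps.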
National Lambda Rail (NLR) and TeraGrid Provide Cyberinfrastructure Backbone for U.S. Researchers
Map nodes: San Francisco; Pittsburgh; Cleveland; San Diego; Los Angeles; Portland; Seattle; Pensacola; Baton Rouge; Houston; San Antonio; Las Cruces/El Paso; Phoenix; New York City; Washington, DC; Raleigh; Jacksonville; Dallas; Tulsa; Atlanta; Kansas City; Denver; Ogden/Salt Lake City; Boise; Albuquerque; Chicago
UC-TeraGrid; UIC/NW-Starlight
International Collaborators
NLR: 4 x 10Gb Lambdas Initially, Capable of 40 x 10Gb Wavelengths at Buildout
NSF’s TeraGrid Has 4 x 10Gb Lambda Backbone
Links Two Dozen State and Regional Optical Networks
DOE, NSF, & NASA Using NLR
Extending Telepresence with Remote Interactive Analysis of Data Over NLR
HDTV Over Lambda
OptIPuter Visualized Data
SIO/UCSD
NASA Goddard
www.calit2.net/articles/article.php?id=660
August 8, 2005
25 Miles
Venter Institute
First Trans-Pacific Super High Definition Telepresence Meeting in New Calit2 Digital Cinema Auditorium
Keio University President Anzai
UCSD Chancellor Fox
Sony NTT SGI
Lays Technical Basis for Global Scientific Collaboration
September 26-30, 2005
Calit2 @ University of California, San Diego
California Institute for Telecommunications and Information Technology
Calit2@UCSD Is Connected to the World at 10,000 Mbps
iGrid 2005
THE GLOBAL LAMBDA INTEGRATED FACILITY
Maxine Brown, Tom DeFanti, Co-Chairs
www.igrid2005.org
50 Demonstrations, 20 Countries, 10 Gbps/Demo
Calit2 is Partnering with SIO to Prototype a Digital Environment Research System
• Viewing and Analyzing Earth Satellite Data Sets
• Earth Topography
• Atmospheric Brown Clouds
• Climate Modeling
• Surface, Subsurface, and Ocean Floor Observatories
• Coastal Zone Data Assimilation
• Ocean Environmental Metagenomics
John Orcutt, Director CEOA, Deputy Director, SIO
Smarr March 2005 Talk to SIO Council Led to Calit2 Discussions with Craig Venter
First Remote Interactive High Definition Video Exploration of Deep Sea Vents
Source: John Delaney & Deborah Kelley, UWash
Canadian-U.S. Collaboration
A Near Future Metagenomics Fiber Optic-Enabled Data Generator
Source: John Delaney, UWash
www.sccoos.org
Use SCCOOS As Prototype for Coastal Zone Data Assimilation Testbed
Goal: Link SCCOOS Sites with LambdaGrid to Prototype Future Ocean and Earth Sciences Observing System
Yellow—Proposed Initial Lambda Backbone
Use OptIPuter to Couple Data Assimilation Models to Remote Data Sources Including Biology
Regional Ocean Modeling System (ROMS) http://ourocean.jpl.nasa.gov/
NASA MODIS Mean Primary Productivity for April 2001 in California Current System
Marine Microbial Metagenomics: From Species Genomes to Ecological Genomes
• Each Sequence is a Part of an Entire Biological Community
• Sequences, Genes and Gene Families, Coupled With Environmental Metadata
– Tremendous Potential to Better Understand the Functioning of Natural Ecosystems
• Challenge
– Much More Powerful Information Infrastructure Required to Support Metagenomics
Scripps Genome Center
Dr. Terry Gaasterland
Evolution is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World
You Are Here
Source: Carl Woese, et al
Much of Genome Work Has Occurred in Animals
Comparative Genomics Can Reveal Biological Facts That Are Not Visible Within a Species
“After sequencing these three genomes, it is clear that substantial rearrangements in the human genome happen only once in a million years, while the rate of rearrangements in the rat and mouse is much faster.” -- Glenn Tesler, UCSD Dept. of Mathematics
www.calit2.net/culture/features/2004/4-1_pevzner.html
Co-Authors Pavel Pevzner and Glenn Tesler, UCSD
April 1, 2004; December 05, 2002; December 9, 2004
Advanced Algorithmic Techniques Reveal Unexpected Results
“Many of the chicken–human aligned, non-coding sequences occur far from genes, frequently in clusters that seem to be under selection for functions that are not yet understood.”
Nature 432, 695 - 716 (09 December 2004)
David A. Hinds, Laura L. Stuve, Geoffrey B. Nilsen, Eran Halperin, Eleazar Eskin, Dennis G. Ballinger, Kelly A. Frazer, David R. Cox. “Whole-Genome Patterns of Common DNA Variation in Three Human Populations.” Science 18 February 2005: 307(5712):1072-1079.
Calit2 Researcher Eskin Collaborates with Perlegen Sciences on Map of Human Genetic Variation Across Populations
“We have characterized whole-genome patterns of common human DNA variation by genotyping 1,586,383 single-nucleotide polymorphisms (SNPs) in 71 Americans of European, African, and Asian ancestry.”
“Although knowledge of a single genetic risk factor can seldom be used to predict the treatment outcome of a common disease, knowledge of a large fraction of all the major genetic risk factors contributing to a treatment response or common disease could have immediate utility, allowing existing treatment options to be matched to individual patients without requiring additional knowledge of the mechanisms by which the genetic differences lead to different outcomes.”
“More detailed haplotype analysis results are available at http://research.calit2.net/hap/wgha/”
The Bioinformatics Core of the Joint Center for Structural Genomics will be Housed in the Calit2@UCSD Building
Extremely Thermostable -- Useful for Many Industrial Processes (e.g. Chemical and Food)
173 Structures (122 from JCSG)
• Determining the Protein Structures of the Thermotoga maritima Genome
• 122 T.M. Structures Solved by JCSG (75 Unique In The PDB)
• Direct Structural Coverage of 25% of the Expressed Soluble Proteins
• Probably Represents the Highest Structural Coverage of Any Organism
Source: John Wooley, UCSD
Providing Integrated Grid Software and Infrastructure for Multi-Scale BioModeling
Diagram labels: Web Portal; Rich Clients; Telescience Portal; Grid Middleware and Web Services; Workflow Middleware; PMV; ADT; Vision; Continuity; APBSCommand; Grid and Cluster Computing Applications Infrastructure; Rocks; Grid of Clusters; APBS; Continuity; Gtomo2; TxBR; Autodock; GAMESS; QMView
National Biomedical Computation Resource, an NIH-Supported Resource Center
Located in Calit2@UCSD Building
Calit2 Intends to Jump Beyond Traditional Web-Accessible Databases
Diagram labels: Data Backend (DB, Files); Web Portal (pre-filtered, queries metadata); Response; Request; BIRN; PDB; NCBI Genbank; + many others
Source: Phil Papadopoulos, SDSC, Calit2
Diagram labels: Flat File Server Farm; Web Portal; Traditional User; Response; Request; Dedicated Compute Farm (100s of CPUs); TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs); Web (other service); Local Cluster; Local Environment; Direct Access Lambda Cnxns; OptIPuter Cluster Cloud; Data-Base Farm; 10 GigE Fabric
Calit2’s Direct Access Core Architecture Will Create Next Generation Metagenomics Server
Source: Phil Papadopoulos, SDSC, Calit2
+ Web Services
What Will Our Core Data Sets Be?
• Metagenomic
– Sargasso Sea + Sorcerer II Expedition (GOS)
– JGI Community Sequencing Project
• Microbial Genomes
– Moore Marine Microbial Project
– JGI Community Sequencing Project
– Other Relevant Genomes (e.g., from Genbank)
• Standard
– Non-Redundant Nucleotide and AA Databases
• Environmental and Satellite Data
– NOAA Oceans and NASA Goddard Satellite Data
Source: Saul Kravitz, Director of Software Engineering, J. Craig Venter Institute
Looking Back Nearly 4 Billion Years In the Evolution of Microbe Genomics
Science Falkowski and Vargas 304 (5667): 58