“Campus Success Stories: Research Cyberinfrastructure at Calit2 & UCSD”
Panel Presentation Internet2 Joint Techs meeting
Baton Rouge, LA 24 January 2012
Tad Reynales, Chief Infrastructure Officer California Institute for Telecommunications and Information Technology
University of California, San Diego http://www.calit2.net
Abstract
Campuses are experiencing an enormous increase in the quantity of data generated by scientific instruments and computational clusters and stored in massive data repositories. The shared Internet, engineered to enable interaction with megabyte-sized data objects, is not capable of dealing with the gigabytes, terabytes, and petabytes of modern scientific “big data”. Instead, a high-performance cyberinfrastructure is emerging to support data-intensive research, which requires affordable, reliable network, compute, storage, and archive resources, along with staff CI expertise. Calit2, SDSC, and the UCSD campus are engaged in a multi-year effort to design and deploy campus cyberinfrastructure that will integrate data generation, transmission, storage, analysis, visualization, curation, and sharing, driven by applications such as high-throughput genomics -- DNA and RNA sequencing and gene expression profiling -- for scientific and medical research.
UCSD Network Backbone
[Diagram: campus backbone linking ~300 buildings, 75,000 IPs, and 1,000 switches through distribution nodes that provide the default gateway / L3 routing, packet-capture forwarding, NetFlow statistics, a stateful firewall, and MAC drops. A primary path carries all routed traffic, with a backup path and a research-path VLAN extension.]
Source: ACT, UCSD
UCSD Colocation and Research Networks
[Diagram: colocation and research network topology. Arista switches (Thunder, Lightning) and Juniper EX switches connect colocation and research nodes (CMMe, CMMw, OptIPuter, Calit2, Bio-E, Leichtag), backbone routers BB-1 and BB-2, research cores RC-1 and RC-2, and the M-core and B-core campus routers.]
Source: ACT, UCSD
UCSD Campus
CENIC Connections
[Diagram: UCSD border routers MX-0 and MX-1 connect over 1-20 Gb/s links to the CENIC HPR, DC, and ISP (commodity Internet) networks through Los Angeles, Riverside, San Diego (diverse path), and Tustin. The DC network peers with K-12, community colleges, state universities, other UCs, Akamai, Google, and Amazon S3.]
Source: ACT, UCSD
CENIC HPR Backbone
[Map: CENIC HPR backbone with nodes at SAC, SVL, LAX, and RIV and 1G/10G/20G links connecting UC campuses (UCB, UCD, UCDMC, UCSF, UCSC, UCSB, UCM, UCLA, UCI, UCR, UCSD), UCOP, Stanford, USC, Caltech, Los Nettos, NPS, UCCSN, UofA, ASU, and a NASA archive.]
Source: ACT, UCSD
Current UCSD Prototype Optical Core: Bridging End-Users to CENIC L1, L2, L3 Services
[Diagram: the Quartzite communications core (Year 3) bridges campus research clusters and the campus research cloud to the CalREN-HPR research cloud. GigE switches with dual 10GigE uplinks fan out to cluster nodes; the hybrid core combines a production OOO (all-optical) switch, a wavelength-selective switch, and a 32-port 10GigE packet switch, with a Juniper T320 (4 GigE, 4 fiber pairs) at the edge.]
Source: Phil Papadopoulos, SDSC/Calit2 (Quartzite PI, OptIPuter co-PI)
Quartzite Network MRI #CNS-0421555; OptIPuter #ANI-0225642
Switch vendors: Lucent, Glimmerglass, Force10
Endpoints: >= 60 endpoints at 10 GigE; >= 32 packet switched; >= 32 switched wavelengths; >= 300 connected endpoints
Approximately 0.5 Tbit/s arrives at the “optical” center of campus; switching is a hybrid of packet, lambda, and circuit (OOO and packet switches).
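As a rough sanity check on the "~0.5 Tbit/s" aggregate figure, the port counts from the slide can be totaled directly (a sketch, assuming each packet-switched port and each switched wavelength carries 10 Gb/s and that not every endpoint is active at once):

```python
# Aggregate capacity at the Quartzite optical core, from slide figures.
GBPS_PER_PORT = 10          # each port/lambda is 10 Gb/s
packet_switched = 32        # 10GigE ports on the packet switch
switched_wavelengths = 32   # lambdas through the OOO / wavelength-selective core

raw_capacity_gbps = (packet_switched + switched_wavelengths) * GBPS_PER_PORT
# 640 Gb/s raw -- consistent with "approximately 0.5 Tbit/s" delivered
print(raw_capacity_gbps / 1000, "Tbit/s raw")
```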
The Global Lambda Integrated Facility-- Creating a Planetary-Scale High Bandwidth Collaboratory
Research Innovation Labs Linked by 10G Dedicated Lambdas
www.glif.is/publications/maps/GLIF_5-11_World_2k.jpg
Academic Research OptIPlanet Collaboratory: A 10Gbps “End-to-End” Lightpath Cloud
[Diagram: end-user OptIPortals connect over 10G lightpaths, via the National LambdaRail and campus optical switches, to data repositories and clusters, HPC systems, HD/4K video repositories, HD/4K live video, and local or remote instruments.]
The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data
Picture Source: Mark Ellisman, David Lee, Jason Leigh
Calit2 (UCSD, UCI), SDSC, and UIC Leads—Larry Smarr PI. Univ. Partners: NCSA, USC, SDSU, Northwestern, Texas A&M, UvA, SARA, KISTI, AIST. Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
Scalable Adaptive Graphics Environment (SAGE)
OptIPortal
The Latest OptIPuter Innovation: Quickly Deployable Nearly Seamless OptIPortables
45 minute setup, 15 minute tear-down with two people (possible with one)
Shipping Case
Image From the Calit2 KAUST Lab
High Definition Video Connected OptIPortals: Virtual Working Spaces for Data Intensive Research
Source: Falko Kuester, Kai Doerr Calit2; Michael Sims, Larry Edwards, Estelle Dodson NASA
Calit2@UCSD 10Gbps Link to NASA Ames Lunar Science Institute, Mountain View, CA
NASA Supports Two Virtual Institutes
LifeSize HD
2010
“Blueprint for the Digital University”--Report of the UCSD Research Cyberinfrastructure Design Team
• A Five-Year Process; Pilot Deployment Began in 2009-10
research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf
No Data Bottlenecks -- Design for Gigabit/s Data Flows
* COLOCATION The CIDT recommends that UCSD fund the use of at least 45 racks in this facility for near‐term needs of campus researchers to freely host their equipment, and begin discussions on how to meet long term needs.
* CENTRALIZED DISK STORAGE The CIDT recommends an initial purchase of 2 PB of raw storage capacity to supplement the Data Oasis component of SDSC’s Triton Resource, and operating funds to manage and scale up the UCSD storage resource to meet demand.
* DIGITAL CURATION AND SERVICES The CIDT recommends the establishment of the Research Data Depot, a suite of three core services designed to meet the needs of modern researchers. The three services are 1) data curation, 2) data discovery and integration, and 3) data analysis and visualization.
* RCI Network The CIDT recommends that the current RCN pilot be expanded, and requests funds to connect 25 buildings using 10 Gb/s Ethernet networking within the next several years. Funding and access philosophy would aim to encourage usage of the network.
* CONDO CLUSTERS The CIDT recommends UCSD embrace the concept of condo clusters and exploit the deployment of the Triton Resource to launch the initiative.
* CI EXPERTISE The CIDT recommends that a coordinating body be established to maintain a labor pool of such experts and work out mechanisms that would allow customers to pay for their services.
UCSD CI Design Team Recommendations
Data Oasis – 3 Different Types of Storage
HPC Storage (Lustre-Based PFS) • Purpose: Transient Storage to Support HPC, HPD, and Visualization • Access Mechanisms: Lustre Parallel File System Client
Project (Traditional File Server) Storage • Purpose: Typical Project / User Storage Needs • Access Mechanisms: NFS/CIFS “Network Drives”
Cloud Storage • Purpose: Long-Term Storage of Data that will be Infrequently Accessed • Access Mechanisms: S3 interfaces, Dropbox-like web interface, CommVault
Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable
2005: ~$80K/port, Chiaro (60 ports max)
2007: ~$5K/port, Force10 (40 ports max)
2009: ~$500/port, Arista (48 ports); ~$1,000/port at 300+ port scale
2010: ~$400/port, Arista (48 ports)
• Port Pricing is Falling • Density is Rising – Dramatically • Cost of 10GbE Approaching Cluster HPC Interconnects
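The price decline above can be compared against Moore's law directly (a sketch using only the slide's endpoint figures of ~$80K/port in 2005 and ~$400/port in 2010):

```python
# 10GbE port-price decline vs. a Moore's-law pace of improvement.
price_2005, price_2010 = 80_000, 400   # $/port, from the slide
years = 2010 - 2005

price_drop = price_2005 / price_2010   # 200x cheaper in 5 years
moore_factor = 2 ** (years / 2)        # Moore's law: 2x every 2 years, ~5.7x over 5 years
print(f"{price_drop:.0f}x price drop vs. {moore_factor:.1f}x Moore's-law pace")
```

Port pricing fell roughly 35 times faster than a Moore's-law pace over the same period, which is what makes campus-scale 10 Gbps CI affordable.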
Source: Philip Papadopoulos, SDSC/Calit2
Arista Enables SDSC’s Massive Parallel 10G Switched Data Analysis Resource
[Diagram: an Arista switch fabric links the OptIPuter (32 ports), the co-location facility (12 ports), UCSD RCI, CENIC/NLR, Trestles (100 TF, 8 ports), Dash, Gordon (128 ports), and the Oasis storage procurement (RFP).]
• Phase 0: > 8 GB/s sustained today • Phase I: > 50 GB/s for Lustre (May 2011) • Phase II: > 100 GB/s (Feb 2012)
40 → 128
Source: Philip Papadopoulos, SDSC/Calit2
Radical Change Enabled by Arista 7508 10G Switch
[Diagram: the Arista 7508 (384 10G-capable ports) links Triton (32 ports), 8 existing connections, and commodity storage (1/3 PB, scaling to 2000 TB at > 50 GB/s) over 10 Gb/s links.]
NSF Funds a Big Data Supercomputer: SDSC’s Gordon-Dedicated Dec. 5, 2011
• Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW
– Emphasizes MEM and IOPS over FLOPS
– Supernode has Virtual Shared Memory: 2 TB RAM aggregate, 8 TB SSD aggregate
– Total Machine = 32 Supernodes
– 4 PB Disk Parallel File System, > 100 GB/s I/O
• System Designed to Accelerate Access to Massive Datasets being Generated in Many Fields of Science, Engineering, Medicine, and Social Science
Source: Mike Norman, Allan Snavely SDSC
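Gordon's per-supernode figures aggregate to substantial machine-wide totals; a quick calculation from the slide's numbers:

```python
# Gordon machine-wide capacities, computed from per-supernode figures.
supernodes = 32
ram_tb_per_supernode = 2   # TB DRAM per supernode (virtual shared memory)
ssd_tb_per_supernode = 8   # TB flash per supernode

total_ram_tb = supernodes * ram_tb_per_supernode   # 64 TB DRAM machine-wide
total_ssd_tb = supernodes * ssd_tb_per_supernode   # 256 TB flash machine-wide
print(total_ram_tb, "TB RAM,", total_ssd_tb, "TB SSD")
```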
UCSD Campus Investment in Fiber Enables Consolidation of Energy Efficient Computing & Storage
Source: Philip Papadopoulos, SDSC, UCSD
[Diagram: N x 10 Gb/s fiber links connect OptIPortal tiled display walls, campus lab clusters, digital data collections, scientific instruments, and cluster condos to Triton (petascale data analysis), Gordon (the HPD system), DataOasis central storage, the GreenLight data center, and the 10 Gb WAN (CENIC, NLR, I2).]
Making University Campuses Living Laboratories for the Greener Future
www.educause.edu/EDUCAUSE+Review/EDUCAUSEReviewMagazineVolume44/CampusesasLivingLaboratoriesfo/185217
Calit2 Microbial Metagenomics Cluster- Next Generation Optically Linked Science Data Server
512 processors, ~5 teraflops; ~200 TB storage (Sun X4500); 1 GbE and 10 GbE switched/routed core
Source: Phil Papadopoulos, SDSC, Calit2
Grant Announced January 17, 2006
Calit2 CAMERA: Over 4000 Registered Users From Over 80 Countries
Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis
http://camera.calit2.net/
The greening of the data center
Multiple strategies are possible:
– More energy-efficient compute/storage/network designs
– Local energy-efficient data center designs
– Remote data centers near less expensive energy sources
– Cloud resources and cloud-based services (local, remote)
– Novel cooling technologies: liquid-cooled CPUs, computers, and racks; oil-immersion designs
– Containerized data centers (HP POD, IBM PMDC, Cirrascale Forest)
– Energy-efficient algorithms; software to move workloads around
– UCSD campus energy production (solar, co-generation, fuel cell)
* 2 MW solar, 5 MW renewable produced on campus; 40 MW peak demand
* 2.8 MW fuel cell planned, along with a 2.8 MW advanced storage facility
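The campus figures above translate into a modest but growing on-site fraction of peak demand (a sketch, assuming the 2 MW solar and 5 MW renewable figures are additive, which the slide leaves ambiguous):

```python
# Fraction of UCSD's 40 MW peak demand covered by on-campus generation.
solar_mw, renewable_mw, peak_mw = 2, 5, 40
planned_fuel_cell_mw = 2.8

current_fraction = (solar_mw + renewable_mw) / peak_mw
with_fuel_cell = (solar_mw + renewable_mw + planned_fuel_cell_mw) / peak_mw
print(f"{current_fraction:.1%} now, {with_fuel_cell:.1%} with the planned fuel cell")
```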
The GreenLight Project: Instrumenting the Energy Cost of Computational Science • Focus on 5 Communities with At-Scale Computing Needs:
– Metagenomics – Ocean Observing – Microscopy – Bioinformatics – Digital Media
• Measure, Monitor, & Web Publish Real-Time Sensor Outputs – Via Service-oriented Architectures – Allow Researchers Anywhere To Study Computing Energy Cost – Enable Scientists To Explore Tactics For Maximizing Work/Watt
• Develop Middleware that Automates Optimal Choice of Compute/RAM Power Strategies for Desired Greenness
• Data Center for School of Medicine Illumina Next Gen Sequencer Storage and Processing
Source: Tom DeFanti, Calit2; GreenLight PI
GreenLight Project: Remote Visualization of Data Center
Source: Virtual Reality Lab, Calit2
GreenLight Project Heat Distribution
Combined heat + fans
Realistic correlation
Source: glimpse.calit2.net
GreenLight Project Allows for Testing of Novel Architectures on Bioinformatics Algorithms
“Our version of MS-Alignment [a proteomics algorithm] is more than 115x faster than a single core of an Intel Nehalem processor, is more than 15x faster than an eight-core version, and reduces the runtime for a few samples from 24 hours to just a few hours.” —From “Computational Mass Spectrometry in a Reconfigurable Coherent Co-processing Architecture,” IEEE Design & Test of Computers, Yalamarthy (ECE), Coburn (CSE), Gupta (CSE), Edwards (Convey), and Kelly (Convey) (2011)
June 23, 2009
http://research.microsoft.com/en-us/um/cambridge/events/date2011/msalignment_dateposter_2011.pdf
Using UCSD RCI to Store and Analyze Next Gen Sequencer Datasets
Source: Chris Misleh, SOM/Calit2 UCSD
Stream Data from Genomics Lab to GreenLight Storage, NFS Mount Over 10Gbps to Triton Compute Cluster
Cost Per Megabase in Sequencing DNA is Falling Much Faster Than Moore’s Law
www.genome.gov/sequencingcosts/
BGI—The Beijing Genome Institute is the World’s Largest Genomic Institute
• Main Facilities in Shenzhen and Hong Kong, China – Branch Facilities in Copenhagen, Boston, UC Davis
• 137 Illumina HiSeq 2000 Next Generation Sequencing Systems – Each Illumina Next Gen Sequencer Generates 25 Gigabases/Day
• Supported by High Performance Computing and Storage – ~160TF, 33TB Memory – Large-Scale (12PB) Storage
From 10,000 Human Genomes Sequenced in 2011 to 1 Million by 2015 in Less Than 5,000 sq. ft.!
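A back-of-envelope check that 137 HiSeq 2000s can sustain the sequencing rates claimed above (a sketch; the 25 Gbases/day figure is from the slide, while the ~3.1 Gbase genome size and 30x coverage are standard assumptions, and instrument downtime is ignored):

```python
# Can BGI's sequencer fleet produce ~10,000+ human genomes per year?
sequencers = 137
gbases_per_day = 25        # output per HiSeq 2000, from the slide
genome_gbases = 3.1        # approximate human genome size
coverage = 30              # typical whole-genome sequencing depth

daily_output = sequencers * gbases_per_day   # 3,425 Gbases/day fleet-wide
per_genome = genome_gbases * coverage        # ~93 Gbases per 30x genome
genomes_per_year = daily_output / per_genome * 365
print(round(genomes_per_year), "genomes/year")
```

At these rates the fleet's theoretical ceiling is well above 10,000 genomes per year, consistent with the trajectory the slide describes.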
4 Million Newborns / Year in U.S.
Needed: Interdisciplinary Teams Made From Computer Science, Data Analytics, and Genomics
Future 100 Gbps Cancer Genomics network
http://gigaom.com/cloud/fighting-cancer-at-100-gigabits-per-second/
UCSD Planned Optical Networked Biomedical Researchers and Instruments
• Connected buildings and facilities: Cellular & Molecular Medicine West; National Center for Microscopy & Imaging; Leichtag Biomedical Research; Center for Molecular Genetics; Pharmaceutical Sciences Building; Cellular & Molecular Medicine East; CryoElectron Microscopy Facility; Radiology Imaging Lab; Bioengineering; Calit2@UCSD; San Diego Supercomputer Center; GreenLight Data Center
• Connects at 10 Gbps: microarrays, genome sequencers, mass spectrometry, light and electron microscopes, whole-body imagers, computing, storage
Summary -- 6 Ingredients of Campus RCI
RCI requires sustainable funding models to enable affordable, reliable:
I. High-performance networks – lab, campus → regional, national, global
II. Storage resources and services to store and preserve research data
III. Compute resources – shared services, colocation or condo clusters
IV. Data curation and management to share and archive research data - Federated with metadata standards and Digital Asset Management system - Domain-specific repositories (Protein Data Bank, GenBank, Cancer Genome)
V. Energy source(s) and data center space (several possible strategies)
VI. Cyberinfrastructure expertise (training, experience, collaboration)
…one more ingredient of RCI: Microbrews!
Acknowledgements
Sponsors:
• Larry Smarr, Director, Calit2 (a UC San Diego/UC Irvine partnership)
• Ramesh Rao, Director, UCSD Division, Calit2
Sources:
• Larry Smarr, Professor of Computer Science and Engineering, UCSD
• Tom DeFanti, Director of Visualization, Calit2
• Philip Papadopoulos, Division Director, SDSC, and Researcher, Calit2
• Brian Dunne, Network Engineer, Calit2
• Chris Misleh, Sysadmin, High-Throughput Genomics, Calit2
• UCSD School of Medicine, High-Throughput Genomics Core
• Valerie Polichar, Infrastructure Liaison, Administrative Computing & Telecommunications, UCSD
• Mark Shinn, (former) Network Architect, ACT, UCSD
• Corporation for Education Network Initiatives in California (CENIC)
www.calit2.net  rci.ucsd.edu  www.optiputer.net  greenlight.calit2.net