Campus Success Stories: Research Cyberinfrastructure at Calit2 … · 2012-01-23 · “Campus...

Campus Success Stories: Research Cyberinfrastructure at Calit2 & UCSD. Panel Presentation, Internet2 Joint Techs meeting, Baton Rouge, LA, 24 January 2012. Tad Reynales, Chief Infrastructure Officer, California Institute for Telecommunications and Information Technology, University of California, San Diego. http://www.calit2.net


Page 1: Campus Success Stories: Research Cyberinfrastructure at Calit2 … · 2012-01-23 · “Campus Success Stories: Research Cyberinfrastructure at Calit2 & UCSD” Panel Presentation

“Campus Success Stories: Research Cyberinfrastructure at Calit2 & UCSD”

Panel Presentation Internet2 Joint Techs meeting

Baton Rouge, LA 24 January 2012

Tad Reynales, Chief Infrastructure Officer California Institute for Telecommunications and Information Technology

University of California, San Diego http://www.calit2.net


Page 2

Abstract

Campuses are experiencing an enormous increase in the quantity of data generated by scientific instruments and computational clusters and stored in massive data repositories. The shared Internet, engineered to enable interaction with megabyte-sized data objects, is not capable of handling the gigabytes, terabytes, and petabytes of modern scientific “big data”. Instead, a high-performance cyberinfrastructure is emerging to support data-intensive research, which requires affordable, reliable network, compute, storage, and archive resources, as well as staff CI expertise. Calit2, SDSC, and the UCSD campus are engaged in a multi-year effort to design and deploy a campus cyberinfrastructure that will integrate data generation, transmission, storage, analysis, visualization, curation, and sharing. The effort is driven by applications such as high-throughput genomics (DNA and RNA sequencing and gene expression profiling) for scientific and medical research.
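To make the scale gap in the abstract concrete, the sketch below estimates ideal transfer times for data objects of different sizes. The two link rates (100 Mb/s for a share of a commodity connection, 10 Gb/s for a dedicated research path) and the object sizes are illustrative assumptions rather than figures from the presentation, and the arithmetic ignores protocol overhead and congestion.

```python
# Ideal (overhead-free) transfer times for data objects of different sizes.
# Link rates and object sizes are illustrative assumptions, not slide data.

MB, GB, TB, PB = 1e6, 1e9, 1e12, 1e15  # decimal byte units

SHARED_LINK_BPS = 100e6    # assumed share of a commodity connection, bits/s
RESEARCH_LINK_BPS = 10e9   # a dedicated 10 Gb/s research path, bits/s


def transfer_seconds(size_bytes: float, link_bits_per_s: float) -> float:
    """Return the ideal time in seconds to move size_bytes over the link."""
    return size_bytes * 8 / link_bits_per_s


if __name__ == "__main__":
    for label, size in [("1 MB", MB), ("1 GB", GB), ("1 TB", TB), ("1 PB", PB)]:
        shared = transfer_seconds(size, SHARED_LINK_BPS)
        research = transfer_seconds(size, RESEARCH_LINK_BPS)
        print(f"{label}: {shared:,.2f} s shared, {research:,.2f} s at 10 Gb/s")
```

Under these assumptions a terabyte takes 80,000 seconds (about 22 hours) on the shared link and 800 seconds on the 10 Gb/s path.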

Page 3

UCSD Network Backbone

[Diagram labels, as extracted: V SIO LJ J N Muir K B M U S A CALIT2 BIOE EBU1 EBU3B HEP; B-720 M-720 B M]

Distribution Nodes: Default Gateway / L3 routing Packet Capture Forwarding Netflow Statistics Stateful Firewall MAC drops

Legend: Primary Path for all routed traffic; Backup Path; Research Path; VLAN extension

300 Buildings, 75,000 IPs, 1,000 Switches

Source: ACT, UCSD

Page 4

UCSD Colocation and Research Networks

[Diagram labels, as extracted: Thunder, Lightning; Arista, Arista; Jun.EX, Jun.EX, Jun.EX; CMMe; Optiput; Calit1101; Bio-e; Leichtag; BB-1; BB-2; CMMw; RC-1, RC-2; M-core, B-core]

Source: ACT, UCSD

Page 5

UCSD Campus

CENIC Connections

[Diagram labels, as extracted: HPR, DC, ISP; Commodity Internet; K-12, Community Colleges, State Universities, Akamai, Google, Amazon S3; CENIC UCs; MX-0, MX-1; link labels 1, 1, 10, 2, 20, 10; sites LA, Rvsd, San Diego, San Diego (diverse), Tustin, LA]

Source: ACT, UCSD

Page 6

CENIC HPR Backbone

[Diagram labels, as extracted: UCSF, UCB, UCOP, UCSC, STAN, UCD, UCDMC; NASA; archive; NPS; UCSB; UCCSN; UCM; UofA; ASU; UCSD, UCR, UCI, UCLA, Los Nettos; USC, Caltech; RIV; SAC, SVL; LAX]

Link legend: 20G, 10G, 1G

Source: ACT, UCSD

Page 7

Current UCSD Prototype Optical Core: Bridging End-Users to CENIC L1, L2, L3 Services

Quartzite Communications Core, Year 3

[Diagram labels, as extracted: Quartzite Core; CalREN-HPR Research Cloud; Campus Research Cloud; GigE Switch with Dual 10GigE Uplinks, to cluster nodes (three instances); GigE; 10GigE; to other nodes; Production OOO Switch; Juniper T320; 4 GigE, 4 pair fiber; Wavelength Selective Switch; to 10GigE cluster node interfaces; to 10GigE cluster node interfaces and other switches; Packet Switch; 32 10GigE]

Source: Phil Papadopoulos, SDSC/Calit2 (Quartzite PI, OptIPuter co-PI)

Quartzite Network MRI #CNS-0421555; OptIPuter #ANI-0225642

Switch vendor labels: Lucent, Glimmerglass, Force10

Endpoints:
•  >= 60 endpoints at 10 GigE
•  >= 32 packet switched
•  >= 32 switched wavelengths
•  >= 300 connected endpoints

Approximately 0.5 Tbit/s Arrive at the “Optical” Center of Campus. Switching Is a Hybrid of Packet, Lambda, and Circuit: OOO and Packet Switches
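As a rough cross-check of the aggregate figure, the sketch below multiplies an endpoint count by the 10 GigE line rate. It assumes every endpoint runs at full nominal rate, which is an idealization; the function name is hypothetical, and the slide does not say how its own figure was derived.

```python
# Nominal edge capacity implied by an endpoint count at 10 GigE.
# Assumes all endpoints run at full line rate (an idealization).

ENDPOINT_RATE_GBPS = 10  # each endpoint is 10 GigE


def aggregate_tbps(endpoints: int, rate_gbps: float = ENDPOINT_RATE_GBPS) -> float:
    """Return nominal aggregate capacity in Tbit/s for the given endpoints."""
    return endpoints * rate_gbps / 1000


if __name__ == "__main__":
    # 60 is the slide's minimum count of 10 GigE endpoints.
    print(f"{aggregate_tbps(60)} Tbit/s nominal for 60 endpoints")
```

Sixty endpoints give 0.6 Tbit/s of nominal capacity, the same order of magnitude as the roughly 0.5 Tbit/s quoted on the slide.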

Page 8

The Global Lambda Integrated Facility: Creating a Planetary-Scale High Bandwidth Collaboratory

Research Innovation Labs Linked by 10G Dedicated Lambdas

www.glif.is/publications/maps/GLIF_5-11_World_2k.jpg

Page 9

Academic Research OptIPlanet Collaboratory: A 10Gbps “End-to-End” Lightpath Cloud

National LambdaRail

Campus Optical Switch

Data Repositories & Clusters

HPC

HD/4k Video Repositories

End User OptIPortal

10G Lightpaths

HD/4k Live Video

Local or Remote Instruments

Page 10

The OptIPuter Project: Creating High Resolution Portals Over Dedicated Optical Channels to Global Science Data

Picture Source: Mark Ellisman, David Lee, Jason Leigh

Calit2 (UCSD, UCI), SDSC, and UIC Leads; Larry Smarr, PI

Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST

Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent

Scalable Adaptive Graphics Environment (SAGE)

OptIPortal

Page 11

The Latest OptIPuter Innovation: Quickly Deployable Nearly Seamless OptIPortables

45 minute setup, 15 minute tear-down with two people (possible with one)

Shipping Case

Image From the Calit2 KAUST Lab

Page 12

High Definition Video Connected OptIPortals: Virtual Working Spaces for Data Intensive Research

Source: Falko Kuester, Kai Doerr Calit2; Michael Sims, Larry Edwards, Estelle Dodson NASA

Calit2@UCSD 10Gbps Link to NASA Ames Lunar Science Institute, Mountain View, CA

NASA Supports Two Virtual Institutes

LifeSize HD

2010

Page 13

“Blueprint for the Digital University”: Report of the UCSD Research Cyberinfrastructure Design Team

•  A Five Year Process Began Pilot Deployment in 2009-10

research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf

No Data Bottlenecks: Design for Gigabit/s Data Flows

Page 14

* COLOCATION The CIDT recommends that UCSD fund the use of at least 45 racks in this facility for near‐term needs of campus researchers to freely host their equipment, and begin discussions on how to meet long term needs.

* CENTRALIZED DISK STORAGE The CIDT recommends an initial purchase of 2 PB of raw storage capacity to supplement the Data Oasis component of SDSC’s Triton Resource, and operating funds to manage and scale up the UCSD storage resource to meet demand.

* DIGITAL CURATION AND SERVICES The CIDT recommends the establishment of the Research Data Depot, a suite of three core services designed to meet the needs of modern researchers. The three services are 1) data curation, 2) data discovery and integration, and 3) data analysis and visualization.

* RCI NETWORK The CIDT recommends that the current RCN pilot be expanded, and requests funds to connect 25 buildings using 10 Gb/s Ethernet networking within the next several years. Funding and access philosophy would aim to encourage usage of the network.

* CONDO CLUSTERS The CIDT recommends UCSD embrace the concept of condo clusters and exploit the deployment of the Triton Resource to launch the initiative.

* CI EXPERTISE The CIDT recommends that a coordinating body be established to maintain a labor pool of such experts and work out mechanisms that would allow customers to pay for their services.

UCSD CI Design Team Recommendations

Page 15

Data Oasis – 3 Different Types of Storage

HPC Storage (Lustre-Based PFS)
•  Purpose: Transient Storage to Support HPC, HPD, and Visualization
•  Access Mechanisms: Lustre Parallel File System Client

Project (Traditional File Server) Storage
•  Purpose: Typical Project / User Storage Needs
•  Access Mechanisms: NFS/CIFS “Network Drives”

Cloud Storage
•  Purpose: Long-Term Storage of Data that will be Infrequently Accessed
•  Access Mechanisms: S3 interfaces, Dropbox-esque web interface, CommVault
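The three storage types can be summarized as a small lookup, sketched below. The tier descriptions come from the slide; the `StorageTier` class and the `choose_tier` rule of thumb are hypothetical illustrations and not part of any Data Oasis interface.

```python
# A hypothetical rule of thumb for picking among the three Data Oasis
# storage types. Purposes and access mechanisms are taken from the slide;
# the class and the selection function are illustrative only.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StorageTier:
    name: str
    purpose: str
    access: Tuple[str, ...]


HPC = StorageTier(
    "HPC Storage (Lustre-based PFS)",
    "Transient storage to support HPC, HPD, and visualization",
    ("Lustre parallel file system client",),
)
PROJECT = StorageTier(
    "Project (traditional file server) Storage",
    "Typical project / user storage needs",
    ("NFS", "CIFS"),
)
CLOUD = StorageTier(
    "Cloud Storage",
    "Long-term storage of infrequently accessed data",
    ("S3 interfaces", "web interface", "CommVault"),
)


def choose_tier(transient: bool, infrequently_accessed: bool) -> StorageTier:
    """Pick a tier from two coarse properties of the data (assumed rule)."""
    if transient:
        return HPC
    if infrequently_accessed:
        return CLOUD
    return PROJECT


if __name__ == "__main__":
    tier = choose_tier(transient=False, infrequently_accessed=True)
    print(tier.name, "via", ", ".join(tier.access))
```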

Page 16

Rapid Evolution of 10GbE Port Prices Makes Campus-Scale 10Gbps CI Affordable

2005: $80K/port, Chiaro (60 max)

2007: $5K/port, Force10 (40 max)

2009: $500/port, Arista (48 ports)

2010: ~$1,000/port (300+ max); $400/port, Arista (48 ports)

•  Port Pricing is Falling
•  Density is Rising – Dramatically
•  Cost of 10GbE Approaching Cluster HPC Interconnects

Source: Philip Papadopoulos, SDSC/Calit2
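The size of the price drop can be worked out from the two ends of the series. The sketch below uses the $80K and $400 per-port figures from the slide; pairing $400 with 2010 is a reading of the flattened slide layout and should be treated as an assumption, as should the idea of a constant annual rate of decline.

```python
# Overall and annualized decline in 10GbE port price, 2005 to 2010.
# Prices are from the slide; assigning $400/port to 2010 is an assumption.

PRICE_2005 = 80_000  # dollars per port (Chiaro)
PRICE_2010 = 400     # dollars per port (Arista), assumed to be the 2010 figure
YEARS = 2010 - 2005


def decline_factor(start_price: float, end_price: float) -> float:
    """How many times cheaper a port became over the period."""
    return start_price / end_price


def annual_price_ratio(start_price: float, end_price: float, years: int) -> float:
    """Constant year-over-year price ratio that yields the same decline."""
    return (end_price / start_price) ** (1 / years)


if __name__ == "__main__":
    factor = decline_factor(PRICE_2005, PRICE_2010)
    ratio = annual_price_ratio(PRICE_2005, PRICE_2010, YEARS)
    print(f"{factor:.0f}x cheaper; each year's price is about {ratio:.0%} of the last")
```

With these inputs the port price falls by a factor of 200, equivalent to each year's price being roughly a third of the previous year's.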

Page 17

Arista Enables SDSC’s Massive Parallel 10G Switched Data Analysis Resource

Radical Change Enabled by Arista 7508 10G Switch: 384 10G Capable

Oasis Procurement (RFP)
•  Phase 0: > 8 GB/s Sustained Today
•  Phase I: > 50 GB/s for Lustre (May 2011)
•  Phase II: > 100 GB/s (Feb 2012)

[Diagram labels, as extracted: OptIPuter; Co-Lo; UCSD RCI; CENIC/NLR; Trestles 100 TF; Dash; Gordon; Triton; Existing Commodity Storage 1/3 PB; 2000 TB > 50 GB/s; 10Gbps. Link counts, as extracted: 2, 12, 32, 8, 128, 40→128, 32, 8, 5, 8, 2, 4]

Source: Philip Papadopoulos, SDSC/Calit2
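To give the Oasis bandwidth targets a physical meaning, the sketch below computes how long one full pass over 2000 TB would take at each phase's rate. Pairing the 2000 TB figure with all three rates is an assumption made for illustration, decimal units are assumed, and because the slide states each rate as a lower bound the resulting times are upper bounds.

```python
# Time for one full sequential pass over the storage at each Oasis phase rate.
# Capacity and rates are from the slide; applying all three rates to the
# same 2000 TB is an illustrative assumption. Units are decimal.

TB = 1e12
GB = 1e9

CAPACITY_BYTES = 2000 * TB
PHASE_RATES_GB_PER_S = {"Phase 0": 8, "Phase I": 50, "Phase II": 100}


def scan_hours(capacity_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours needed to read capacity_bytes once at a sustained rate."""
    return capacity_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    for phase, rate in PHASE_RATES_GB_PER_S.items():
        hours = scan_hours(CAPACITY_BYTES, rate * GB)
        print(f"{phase}: at most {hours:.1f} hours at {rate} GB/s")
```

At 50 GB/s the pass takes about 11 hours; at 8 GB/s it takes close to three days.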

Page 18

NSF Funds a Big Data Supercomputer: SDSC’s Gordon, Dedicated Dec. 5, 2011

•  Data-Intensive Supercomputer Based on SSD Flash Memory and Virtual Shared Memory SW
–  Emphasizes MEM and IOPS over FLOPS
–  Supernode has Virtual Shared Memory:
    – 2 TB RAM Aggregate
    – 8 TB SSD Aggregate
–  Total Machine = 32 Supernodes
–  4 PB Disk Parallel File System >100 GB/s I/O

•  System Designed to Accelerate Access to Massive Datasets being Generated in Many Fields of Science, Engineering, Medicine, and Social Science

Source: Mike Norman, Allan Snavely SDSC
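The per-supernode figures imply machine-wide totals, which the sketch below computes. It assumes the 2 TB RAM and 8 TB SSD aggregates are per supernode, as the slide's nesting suggests, and it counts only what is listed on the slide.

```python
# Machine-wide memory totals implied by the per-supernode figures.
# Assumes the RAM and SSD aggregates on the slide are per supernode.

SUPERNODES = 32
RAM_TB_PER_SUPERNODE = 2
SSD_TB_PER_SUPERNODE = 8


def machine_total_tb(per_supernode_tb: int, supernodes: int = SUPERNODES) -> int:
    """Total capacity in TB across all supernodes."""
    return per_supernode_tb * supernodes


if __name__ == "__main__":
    print(f"RAM: {machine_total_tb(RAM_TB_PER_SUPERNODE)} TB")
    print(f"SSD: {machine_total_tb(SSD_TB_PER_SUPERNODE)} TB")
```

Under that reading the full machine offers 64 TB of RAM and 256 TB of flash across its supernodes.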

Page 19

UCSD Campus Investment in Fiber Enables Consolidation of Energy Efficient Computing & Storage

Source: Philip Papadopoulos, SDSC, UCSD

OptIPortal Tiled Display Wall

Campus Lab Cluster

Digital Data Collections

N x 10Gb/s

Triton – Petascale Data Analysis

Gordon – HPD System

Cluster Condo

WAN 10Gb: CENIC, NLR, I2

Scientific Instruments

DataOasis (Central) Storage

GreenLight Data Center

Page 20

Making University Campuses Living Laboratories for the Greener Future

www.educause.edu/EDUCAUSE+Review/EDUCAUSEReviewMagazineVolume44/CampusesasLivingLaboratoriesfo/185217

Page 21

Calit2 Microbial Metagenomics Cluster: Next Generation Optically Linked Science Data Server

512 Processors, ~5 Teraflops

~200 Terabytes Storage

1GbE and 10GbE Switched/Routed Core

~200TB Sun X4500 Storage

10GbE

Source: Phil Papadopoulos, SDSC, Calit2

Grant Announced January 17, 2006

Page 22

Calit2 CAMERA: Over 4000 Registered Users From Over 80 Countries

Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis

http://camera.calit2.net/

Page 23

The greening of the data center

Multiple strategies are possible:
–  More energy efficient compute/storage/network designs
–  Local energy efficient data center designs
–  Remote data centers near less expensive energy sources
–  Cloud resources and cloud-based services (local, remote)
–  Novel cooling technologies
    –  Liquid-cooled CPUs, computers and racks
    –  Oil immersion designs
–  Containerized data centers (HP Pod, IBM PMDC, Cirrascale Forest)
–  Energy-efficient algorithms; software to move workload around
–  UCSD campus energy production (solar, co-generation, fuel cell)
    * 2 MW solar, 5 MW renewable produced on campus; 40 MW peak demand
    * 2.8 MW fuel cell planned, along with 2.8 MW advanced storage facility

Page 24

The GreenLight Project: Instrumenting the Energy Cost of Computational Science

•  Focus on 5 Communities with At-Scale Computing Needs:
–  Metagenomics
–  Ocean Observing
–  Microscopy
–  Bioinformatics
–  Digital Media

•  Measure, Monitor, & Web Publish Real-Time Sensor Outputs
–  Via Service-oriented Architectures
–  Allow Researchers Anywhere To Study Computing Energy Cost
–  Enable Scientists To Explore Tactics For Maximizing Work/Watt

•  Develop Middleware that Automates Optimal Choice of Compute/RAM Power Strategies for Desired Greenness

•  Data Center for School of Medicine Illumina Next Gen Sequencer Storage and Processing

Source: Tom DeFanti, Calit2; GreenLight PI
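One way to turn published sensor readings into a work-per-watt style metric is sketched below. It integrates sampled power over time to get energy, then divides completed work by that energy (work per joule, which is the same quantity as throughput per watt). The sample values, the unit of work, and both function names are hypothetical; the actual GreenLight sensor interfaces are not described in the slides.

```python
# Work per unit energy from timestamped power samples.
# Sample data and function names are hypothetical illustrations.
from typing import Sequence


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power over time with the trapezoidal rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching time/power samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def work_per_joule(work_units: float, times_s: Sequence[float],
                   power_w: Sequence[float]) -> float:
    """Completed work divided by the energy consumed while doing it."""
    return work_units / energy_joules(times_s, power_w)


if __name__ == "__main__":
    times = [0, 10, 20, 30]           # seconds (made-up samples)
    power = [200, 260, 250, 210]      # watts (made-up samples)
    jobs_done = 12                    # hypothetical unit of work
    print(f"energy: {energy_joules(times, power):.0f} J")
    print(f"work per joule: {work_per_joule(jobs_done, times, power):.5f}")
```

Comparing this ratio across hardware or power settings for the same workload is one way to explore the work/watt tactics the slide mentions.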

Page 25

GreenLight Project: Remote Visualization of Data Center

Source: Virtual Reality Lab, Calit2

Page 26

GreenLight Project Heat Distribution

Combined heat + fans

Realistic correlation

Source: glimpse.calit2.net

Page 27

GreenLight Project Allows for Testing of Novel Architectures on Bioinformatics Algorithms

“Our version of MS-Alignment [a proteomics algorithm] is more than 115x faster than a single core of an Intel Nehalem processor, is more than 15x faster than an eight-core version, and reduces the runtime for a few samples from 24 hours to just a few hours.” —From “Computational Mass Spectrometry in a Reconfigurable Coherent Co-processing Architecture,” IEEE Design & Test of Computers, Yalamarthy (ECE), Coburn (CSE), Gupta (CSE), Edwards (Convey), and Kelly (Convey) (2011)

June 23, 2009

http://research.microsoft.com/en-us/um/cambridge/events/date2011/msalignment_dateposter_2011.pdf
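The two speedups quoted above can be combined to estimate how well the eight-core CPU version scales. The sketch below divides one by the other; since both figures are stated as "more than", the result is only indicative, and the function names are hypothetical.

```python
# Implied scaling of the eight-core CPU version, from the quoted speedups.
# Both inputs are lower bounds in the source, so the result is indicative.

SPEEDUP_VS_ONE_CORE = 115    # co-processor version vs. a single Nehalem core
SPEEDUP_VS_EIGHT_CORES = 15  # co-processor version vs. the eight-core version
CORES = 8


def implied_multicore_speedup(vs_one: float, vs_many: float) -> float:
    """Speedup of the multi-core version over a single core."""
    return vs_one / vs_many


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling achieved."""
    return speedup / cores


if __name__ == "__main__":
    s = implied_multicore_speedup(SPEEDUP_VS_ONE_CORE, SPEEDUP_VS_EIGHT_CORES)
    print(f"eight cores: about {s:.2f}x one core, "
          f"efficiency about {parallel_efficiency(s, CORES):.0%}")
```

The ratio suggests the eight-core version already ran near linear scaling (roughly 7.7x), so the remaining gain came from the reconfigurable co-processor rather than from adding more CPU cores.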

Page 28

Using UCSD RCI to Store and Analyze Next Gen Sequencer Datasets

Source: Chris Misleh, SOM/Calit2 UCSD

Stream Data from Genomics Lab to GreenLight Storage, NFS Mount Over 10Gbps to Triton Compute Cluster

Page 29

Cost Per Megabase in Sequencing DNA is Falling Much Faster Than Moore’s Law

www.genome.gov/sequencingcosts/

Page 30

BGI: The Beijing Genomics Institute is the World’s Largest Genomic Institute

•  Main Facilities in Shenzhen and Hong Kong, China
–  Branch Facilities in Copenhagen, Boston, UC Davis

•  137 Illumina HiSeq 2000 Next Generation Sequencing Systems
–  Each Illumina Next Gen Sequencer Generates 25 Gigabases/Day

•  Supported by High Performance Computing and Storage
–  ~160TF, 33TB Memory
–  Large-Scale (12PB) Storage
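The sequencer count and per-instrument output multiply into a facility-wide throughput, sketched below. The daily figure follows directly from the slide; the yearly figure additionally assumes every instrument runs every day of the year, which is an idealization.

```python
# Facility-wide sequencing throughput from the figures on the slide.
# The yearly total assumes continuous operation (an idealization).

SEQUENCERS = 137
GIGABASES_PER_SEQUENCER_PER_DAY = 25


def daily_gigabases(sequencers: int = SEQUENCERS,
                    per_day: int = GIGABASES_PER_SEQUENCER_PER_DAY) -> int:
    """Total gigabases generated per day across all sequencers."""
    return sequencers * per_day


def yearly_gigabases(days: int = 365) -> int:
    """Total gigabases per year if every sequencer runs every day."""
    return daily_gigabases() * days


if __name__ == "__main__":
    print(f"{daily_gigabases():,} gigabases/day")
    print(f"{yearly_gigabases():,} gigabases/year")
```

That is 3,425 gigabases per day, or about 1.25 million gigabases (1.25 petabases) per year under the continuous-operation assumption.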

Page 31

From 10,000 Human Genomes Sequenced in 2011 to 1 Million by 2015 in Less Than 5,000 sq. ft.!

4 Million Newborns / Year in U.S.
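The projection in the slide title implies a steep compound growth rate, which the sketch below works out. It assumes steady year-over-year growth between the 2011 and 2015 figures, which the slide does not claim; the function name is hypothetical.

```python
# Compound annual growth implied by 10,000 genomes (2011) to 1 million (2015).
# Assumes a constant yearly growth factor, which is a simplification.

GENOMES_2011 = 10_000
GENOMES_2015 = 1_000_000
YEARS = 2015 - 2011


def annual_growth_factor(start: float, end: float, years: int) -> float:
    """Constant yearly multiplier that takes start to end in the given years."""
    return (end / start) ** (1 / years)


if __name__ == "__main__":
    factor = annual_growth_factor(GENOMES_2011, GENOMES_2015, YEARS)
    print(f"about {factor:.2f}x more genomes each year")
```

A hundredfold increase over four years corresponds to output roughly tripling every year.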

Page 32

Needed: Interdisciplinary Teams Made From Computer Science, Data Analytics, and Genomics

Page 33

Future 100 Gbps Cancer Genomics network

http://gigaom.com/cloud/fighting-cancer-at-100-gigabits-per-second/

Page 34

UCSD Planned Optical Networked Biomedical Researchers and Instruments

Cellular & Molecular Medicine West

National Center for Microscopy & Imaging

Leichtag Biomedical Research

Center for Molecular Genetics

Pharmaceutical Sciences Building

Cellular & Molecular Medicine East

CryoElectron Microscopy Facility

Radiology Imaging Lab

Bioengineering

Calit2@UCSD

San Diego Supercomputer Center

GreenLight Data Center

•  Connects at 10 Gbps:
–  Microarrays
–  Genome Sequencers
–  Mass Spectrometry
–  Light and Electron Microscopes
–  Whole Body Imagers
–  Computing
–  Storage

Page 35

Summary: 6 Ingredients of Campus RCI

RCI requires sustainable funding models to enable affordable, reliable:

I.  High performance networks – lab, campus → regional, national, global

II.  Storage resources and services to store and preserve research data

III.  Compute resources – shared services, colocation or condo clusters

IV.  Data curation and management to share and archive research data
    -  Federated with metadata standards and Digital Asset Management system
    -  Domain-specific repositories (Protein Data Bank, GenBank, Cancer Genome)

V.  Energy source(s) and data center space (several possible strategies)

VI.  Cyberinfrastructure expertise (training, experience, collaboration)

Page 36

…one more ingredient of RCI: Microbrews!

Page 37

Acknowledgements

Sponsors:
•  Larry Smarr, Director, Calit2 (a UC San Diego/UC Irvine partnership)
•  Ramesh Rao, Director, UCSD Division, Calit2

Sources:
•  Larry Smarr, Professor of Computer Science and Engineering, UCSD
•  Tom DeFanti, Director of Visualization, Calit2
•  Philip Papadopoulos, Division Director, SDSC and Researcher, Calit2
•  Brian Dunne, network engineer, Calit2
•  Chris Misleh, sysadmin, high-throughput genomics, Calit2
•  UCSD School of Medicine, High-Throughput Genomics Core
•  Valerie Polichar, Infrastructure Liaison, Administrative Computing & Telecommunications, UCSD
•  Mark Shinn, (former) network architect, ACT, UCSD
•  Corporation for Education Network Initiatives in California (CENIC)

•  www.calit2.net  rci.ucsd.edu  www.optiputer.net  greenlight.calit2.net