Infrastructure Requirements for Discovery Research
Chris Dagdigian
2009 PRISM Forum
Kicking Things Off
• Thanks for inviting me!
• Warning:
– I speak very fast
– Infamous for massive slide decks
• Goals for today
– Objective assessment of the state of IT for discovery informatics
– Some specifics on:
• Green IT & Virtualization
• Storage
• Compute & HPC trends
• Networking
• Cloud Computing
– Some practical tips & advice
– Plenty of time for questions and conversation
BioTeam Inc.
Independent Consulting Shop
• Vendor/technology agnostic
Staffed by:
• Scientists forced to learn
High Performance IT
• Many years of industry &
academic experience
Our specialty:
• Bridging the gap between
Science & IT
I am not a “visionary”
cluster building in ‘02
Recipe for current career arc:
1. Find people smarter than myself
2. Watch what they do
3. Try to understand the ‘why’
4. Shamelessly copy them
What this means for today’s talk:
• I have no interest in being an ‘expert’, ‘visionary’, ‘talking-head’ or pundit
• My interest lies in practical methods for applying IT to solve research problems
• I’ll tell you what I think along with what I’ve seen, done and broken in my daily work
• Happy to be challenged & questioned
First: Infrastructure Picture Tour
Single instrument scale: Self-contained lab-local cluster & storage for Illumina
Live Cell Imaging: High-speed confocal microscope rig
Workgroup Scale:
100 terabyte storage system and 10-node / 40-core Linux cluster supporting multiple NGS instruments
Large Core Facility
Computing, Virtualization & Scale-out
Compute Power, continued
• Trend
– Physical size of compute infrastructure is shrinking rapidly
– Virtualization, consolidation & multi-core are among the reasons
– Why build large clusters when you can couple a smaller number of 8- or 16-core servers/blades, each supporting 0.5 TB of RAM?
• Result
– Shrinking floorspace needs for HPC
– Facility issues are now the primary compute constraint
– High density computing is now restricted by available power density and cooling capacity
– Research IT staff need to comprehend facility, power and cooling issues in 2009 & beyond. This is critical.
Virtualization In Research IT
Still the lowest hanging fruit in most shops
• Tremendous benefits for:
• Operators, end-users,
environment & budgets
• Tipping point for me was:
• Live migration of running VMs
without requiring a proprietary
file system underneath
Virtualization In Research IT
Seen in 2009:
• Campus “Virtual Colocation Service”
• Deployed when HVAC/power hit facility limits
• Available campus-wide to all researchers & groups
• Built with VMware & NetApp on Sun hardware
• Aggressive thin-provisioning & content optimization
• Truly significant payoff:
• ~400 servers currently virtualized
• Large # of physical servers retired & shut down
• Storage savings from de-dup, compression & thin provisioning
• Significant electrical & HVAC savings
• Full delegation of administrative control to owners
Virtualization Can Greatly Assist Research IT
Another benefit for Research IT shops:
• Lets scientists design, deploy and manage apps and services that are
not part of the “enterprise” portfolio
• Solves a common problem in large IT environments:
• Scientists routinely building web apps and services to satisfy
individual or workgroup level requirements
• Often need administrative control over the web server and
elevated access permissions on the base OS
• Apps and services do not meet enterprise standards for support,
security, documentation & lifecycle management
Learn from the big guys: “Trickle-Down” Scale-out Tips
• Google, Microsoft & Amazon all operate at extreme scales that few of us come close to matching
• Datacenter scaling & efficiency measures taken by these companies are tightly held trade secrets for competitive reasons
• This is starting to change and we will all benefit
• Both from “trickle-down” best practices & hard data
• And with vendor products improved to meet these demanding customers
“Trickle-Down” Scale-out Tips
April 2009:
• Google Datacenter Efficiency Summit
• Presentations now online
• Google video on YouTube showing ’04-era 780W/sq ft containerized datacenter
• If Google was doing this stuff in 2004, what are they up to now?
And from server vendors:
• Dell warranting servers at 95F input temperature
• Rackable Systems warranting its C2 rack at 104F inlet temperature
September 2009 Report:
• Microsoft Dublin Datacenter
• Evaporative cooling & air-side economization (NO CHILLERS!)
• 95F Operating Temperature
October 2009 Report:
• Microsoft Chicago datacenter (reported operating since July)
• One floor for containers (224,000 servers)
• 112 containers, each with 2,000 servers - 56MW critical load
• Two container stacks moved manually on air skates by 4 humans
Green IT for Cynics
Hyped beyond all reasonable measure
• Reminds me of dotcom-era WAN-scale grid computing PR
– Promising more than can be realistically delivered
Marketing aside, still worth pursuing:
• Shrinking physical footprint, reducing power consumption and better managing HVAC are better for the planet
• … and have tangible results on the bottom line
Green IT for Cynics
Where Green IT matters to me:
• Putting more capability into a smaller space
• Putting capability into spaces that previously could not support it
• Telco closets, underneath wet-lab benches, etc.
• Working within power- or HVAC-starved machine rooms
• Reducing power draw & increasing power use efficiency
• Reducing cooling costs & increasing air handling efficiency
Green IT for Cynics
My 2008 “aha!” moment:
• NexSan SATABeast Storage Arrays
• 48x 1TB SATA disks in 4U enclosure w/ FC interconnects
• “AutoMAID” options built into management console
• What we had:
• Three 48-terabyte SATABeasts in two racks
• APC PDUs with per-outlet monitoring features
• What we saw:
• 30% reduction in power draw
• No appreciable impact on cluster throughput
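A rough sketch of what a 30% reduction can be worth on one rack of arrays. The baseline wattage and electricity rate below are illustrative assumptions, not measured BioTeam figures:

```python
# Hypothetical back-of-envelope: annual savings from a 30% reduction
# in power draw. Baseline wattage and utility rate are assumptions.
baseline_watts = 4000.0   # assumed combined draw of the three arrays
reduction = 0.30          # reduction observed via per-outlet PDU metering
usd_per_kwh = 0.12        # assumed electricity rate

saved_kwh_per_year = baseline_watts * reduction / 1000.0 * 24 * 365
print("~{:,.0f} kWh/year saved (~${:,.0f}/year, before HVAC savings)".format(
    saved_kwh_per_year, saved_kwh_per_year * usd_per_kwh))
```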
Storage
The stakes are high if you don’t solve the research storage problem …
~200 TB of USB disks stored on lab benches
Storage Trends
Seen in 2008-2009
• First 100TB single-namespace project
• First Petabyte+ storage project
• First large Cloud data transit project
• 4x increase in “technical storage audit”
work
• First time witnessing 10+TB catastrophic
data loss
• First time witnessing job dismissals due to
data loss
• Data Triage discussions are spreading
well beyond cost-sensitive industry
organizations
82TB Folder. Still satisfying. Single-namespace is good for science.
1PB volume - Even more satisfying
Data Drift - Real World Example
• Non-scalable storage islands add complexity
• Example:
– Volume “Caspian” hosted on server “Odin”
– “Odin” replaced by “Thor”
– “Caspian” migrated to “Asgard”
– Relocated to “/massive/”
• Resulted in file paths that look like this:
/massive/Asgard/Caspian/blastdb
/massive/Asgard/old_stuff/Caspian/blastdb
/massive/Asgard/can-be-deleted/do-not-delete…
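As one way to surface this kind of drift, a minimal sketch (not a BioTeam tool) that walks a namespace such as /massive and reports directory names appearing in more than one place:

```python
#!/usr/bin/env python
# Minimal drift-spotting sketch: report directory names that appear at
# more than one path under a root, e.g. several "blastdb" copies.
import os
from collections import defaultdict

def find_drift(root):
    seen = defaultdict(list)          # basename -> every path it occurs at
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            seen[name].append(os.path.join(dirpath, name))
    return {n: p for n, p in seen.items() if len(p) > 1}

if __name__ == "__main__":
    for name, paths in sorted(find_drift("/massive").items()):
        print(name)
        for path in paths:
            print("   " + path)
```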
Data Management
• Very difficult
• Lab protocols changing faster than LIMS development cycles
• We have seen many different workarounds
– Notebooks, spreadsheets, file structures & wikis
• BioTeam is actively using the MediaWiki platform for cost-effective
data management in rapidly changing research IT environments
General Observations
Storage is a commodity in 2009
Cheap storage is easy
Big storage getting easier every day
Big, cheap & SAFE is much harder …
Data movement & management can
be the hardest problem
Traditional backup methods may no
longer apply
• Or even be possible …
Observations cont.
• End users still have no clue about
the true costs of keeping data
accessible & available
• “I can get a terabyte from Costco for $220!” (Aug 08)
• “I can get a terabyte from Costco for $160!” (Oct 08)
• “I can get a terabyte from Costco for $124!” (April 09)
• IT needs to be involved in setting
expectations and educating on true
cost of keeping data online &
accessible
• Organizations need forward-looking
research storage roadmaps
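To make the gap concrete, a back-of-envelope sketch; every multiplier below is an illustrative assumption, not a BioTeam figure:

```python
# Back-of-envelope: consumer shelf price vs. a fully loaded cost of
# keeping the same terabyte online. All multipliers are assumptions.
raw_usd_per_tb = 124.0     # the April '09 retail quote above

raid_overhead = 1.25       # assumed parity / hot-spare capacity
protected_copies = 2.0     # assumed snapshot/backup copy
power_cooling = 1.5        # assumed facility, power & HVAC burden
admin_support = 1.5        # assumed staff time, monitoring, migrations

loaded = (raw_usd_per_tb * raid_overhead * protected_copies
          * power_cooling * admin_support)
print("Raw disk:     ${:.0f}/TB".format(raw_usd_per_tb))
print("Kept online: ~${:.0f}/TB".format(loaded))   # roughly 5-6x shelf price
```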
Observations cont.
• The rise of “terabyte instruments” is
already having a major disruptive influence
on existing environments
• We see individual labs deploying
100TB+ systems
• Data movement is hard, especially to/from
wet labs
• I was wrong when I said
• “petabyte-scale storage needs will appear within the decade …”
• That time is now for some large
organizations
Homework Exercise
• Select three vendors
• Build quotes for 100TB single-
namespace NAS solution
• My results:
• $100K to $1.5M range
• Repeat every six months
Follow-up:
Price a Petabyte Disk Array:
$125,000 - $4M USD
Capacity Dilemma: Data Triage
Data Triage
• The days of unlimited storage
for research are over
• Rate of consumption
increasing unsustainably
• First saw triage acts in 2007
(industry client)
• Becoming acceptable practice
in 2008
• Absolutely a given in 2009 for
most projects we see
Architecting Storage & Data Movement For Research IT
• First principle:
– Understand the data you will produce
– Understand the data you will keep
– Understand how the data will move
• Second principle:
– One platform or many?
– One vendor or many?
– One lab/core or many?
Final thoughts on storage for 2009
• Yes, the problem is real
• More and more “terabyte instruments” are coming to market
• Some of us have peta-scale storage issues today
• “Data Deluge” & “Tsunami” are apt terms
But:
• The problem does not feel as scary as it once did
• Chemistry, reagent cost & human factors are natural bottlenecks
• Data Triage is an accepted practice, no longer heresy
• Data-reduction starting to happen within instruments
• Customers starting to trust instrument vendor software more
• We see large & small labs dealing successfully with these issues
• Many ways to tackle IT requirements
Mix and match solutions to fit local need …
Stolen Broad Slide - Future trend regarding downstream data …
Utility / Cloud Computing
Cloud/Utility Computing - Setting the stage
• Burned by “OMG!! GRID Computing” hype: in 2009 I will try hard never to use the word “cloud”
in any serious technical conversation. Vocabulary matters.
• Understand My Bias:
– Speaking of “utility computing” as it resonates with infrastructure people
– My building blocks are servers or groups of systems, not software
stacks, developer APIs or commercial products
• Goal:
• Replicate, duplicate, improve or relocate complex systems
Let’s Be Honest
• Not rocket science
• Easy to learn, prototype &
understand
• Easy to learn
strengths/weaknesses
While I’m being honest
• Amazon Web Services (“AWS”) IS the cloud
– Simple, practical, understandable and usable today by just about anyone
– Rollout of features and capabilities continues to be impressive
• Competitors are years behind
– … and tend to believe too much of their own marketing materials
• The cloud is real, usable and useful TODAY
• If you are just starting out in this space:
– Lowest hanging fruit: dev, test & pilot projects
• Almost immediate payback in many cases
– Next step: CPU bound scientific problems
– Future: Archive/deep storage with cloud providers
Amazon Web Services
• A collection of agile infrastructure services available on-demand
• New products and features added almost monthly
• Recent enhancements:
– Two-factor Authentication & Rotating Credentials
– Virtual Private Cloud (“VPC”) Product
– EC2 auto-scaling & load-balancing
– http://aws.amazon.com/about-aws/whats-new/
AWS Products/Services
• EC2 - Elastic Compute Cloud
– Scalable on-demand virtual servers
• SimpleDB - Simple Database Service
– Simple queries on structured data
• S3 - Simple Storage Service
– Bucket/object based storage
• EBS - Elastic Block Store
– Persistent block storage (looks like a disk)
AWS Products/Services
• SQS - Simple Queue Service
– Message passing service
• Elastic MapReduce
– Hadoop on AWS
– Terabyte scale data mining & processing
• VPC - Virtual Private Cloud
– Connect your infrastructure to AWS via VPN tunnel
– (more important than it sounds …)
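For flavor, a minimal sketch of driving two of these services from Python. It uses the boto3 SDK, which postdates this talk; the AMI ID, instance type and bucket/key names are placeholder assumptions:

```python
# Sketch: launch one EC2 virtual server and drop a file into S3
# object storage. boto3 is a modern SDK (anachronistic for 2009);
# AMI, instance type and bucket/key names are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
s3 = boto3.client("s3")

# EC2: one on-demand virtual server.
resp = ec2.run_instances(
    ImageId="ami-12345678",        # placeholder machine image
    InstanceType="m5.large",       # placeholder instance type
    MinCount=1,
    MaxCount=1,
)
print("Launched", resp["Instances"][0]["InstanceId"])

# S3: bucket/object storage for results.
s3.upload_file("results.tar.gz", "my-research-bucket", "run42/results.tar.gz")
```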
Cloud Sobriety
McKinsey presentation “Clearing the Air on Cloud Computing” is a must-read
• Tries to deflate the hype a bit
• James Hamilton has a nice reaction:
• http://perspectives.mvdirona.com/
Both conclude:
• IT staff needs to understand “the cloud”
• Critical to quantify your own internal costs
• Perform your own due diligence
Cloud Security
• Lots of overblown fears (and some political posturing)
• My personal take:
• Amazon, Google & Microsoft quite probably have better internal operating controls than you do
• All of them are happy to talk as deeply as you like about all issues relating to security
• Do your own due diligence & don’t let politics or IT empire issues cloud decision making
• Biggest issue for me may be per-country data protection and patient privacy rules
http://aws.amazon.com/security/
State of Amazon AWS
New features are being rolled out fast and furious
But …
– EC2 nodes still poor on disk IO operations
• EBS service can use some enhancements
• Many readers, one-writer on EBS volumes would be fantastic
• Poor support for latency-sensitive applications and workflows that prefer tight network topologies
This matters because:
• Compute power is easy to acquire
• Life science tends to be IO bound
• Life science is currently being buried in data
AWS & HPC: A whole new world
• For cluster people, some radical changes
– Years spent tuning systems for shared access
– Utility model offers dedicated resources
– EC2 not architected for our needs
– Best practices & reference architectures will change
• Current State: Transition Period
– Still hard to achieve seamless integration with local clusters & remote utility clouds
– Most people are moving entire workflows into the cloud rather than linking grids
– Some work being done on ‘transfer queues’
HPC Informatics & AWS: Summary
• Virtualized networking is ‘reasonable’ but there are certainly issues that
need to be worked around
• Network latency can be high
• Virtualized storage I/O is far slower than anything we can do with local
resources. Absolute fact.
• Still hard to share data/storage across many systems
• Inability to currently request EC2 nodes that are “close” in network
topology terms is problematic (but likely to change)
• MapReduce is not a viable solution for everyone
• Amazon has a deep interest in HPC workflows, expect them to address
all of our concerns
Cloud Storage
• It is quite probable that the “internet-scale” providers can provide
storage far more cheaply than we can ourselves
– Especially if we are honest about facility, power, continuity and operational costs
• Some people estimate the cost at ~$0.80/GB/year and falling fast for Amazon
and others to provide 3x geographically replicated raw storage (a back-of-envelope sketch follows this slide)
– Can you seriously match this?
• These prices come from operating at extreme efficiency scales that we
will never be able to match ourselves
• Question: how best to leverage this?
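The back-of-envelope comparison, at 100 TB; the in-house rate is a pure assumption for illustration:

```python
# Back-of-envelope: utility storage at the quoted ~$0.80/GB/year vs. a
# hypothetical fully loaded in-house rate, for 100 TB. Both rates are
# assumptions for illustration.
capacity_gb = 100 * 1000            # 100 TB in (decimal) gigabytes
cloud_rate = 0.80                   # quoted estimate, 3x replicated
inhouse_rate = 2.50                 # assumed fully loaded local cost

print("Utility:  ${:,.0f}/year".format(capacity_gb * cloud_rate))
print("In-house: ${:,.0f}/year".format(capacity_gb * inhouse_rate))
```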
When the ingest problem is solved …
• I think there may be petabytes of life science data that would flock to
utility storage services
• Public and private data stores
• Massive amounts of grant-funded study data
• Archive store, HSM target and DR store
• “Downloader Pays” model is compelling for people required to
share large data sets
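The “downloader pays” idea maps onto what Amazon S3 calls Requester Pays. A minimal sketch, again using boto3 (which postdates this talk); the bucket name is a placeholder:

```python
# Sketch: turn on S3 "Requester Pays" (Amazon's name for the
# downloader-pays model) so downloaders, not the bucket owner, pay
# data transfer costs. Bucket name is a placeholder.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_request_payment(
    Bucket="public-study-data",
    RequestPaymentConfiguration={"Payer": "Requester"},
)
```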
Putting It All Together
Putting It All Together - Agile IT for Biomarker Discovery
• Understand that science changes faster than IT infrastructures
• Designing for flexibility is key
• Large groups: consider copying the large Next-Gen labs
– IT budgeted via “cost per gigabase sequenced”
– IT represents ~27% of sequencing cost on a per-gigabase basis (a toy sketch follows this slide)
– Storage/compute for production is fixed in sequencing budget
– Storage/compute for post-capture analysis is a consumable
• Compute power is easy to acquire in 2009
– Concentrate on: Density, Flexibility, Virtualization, “Green IT” gains
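A toy illustration of the per-gigabase budgeting model above; the sequencing cost and throughput are made-up inputs, and only the ~27% IT share comes from the slide:

```python
# Toy per-gigabase IT budget. Sequencing cost and throughput are
# hypothetical; the ~27% IT share is the figure quoted above.
usd_per_gigabase = 1000.0        # assumed all-in sequencing cost
it_share = 0.27                  # IT fraction of per-gigabase cost
gigabases_per_quarter = 500      # assumed sequencing throughput

it_budget = gigabases_per_quarter * usd_per_gigabase * it_share
print("Quarterly IT budget: ${:,.0f}".format(it_budget))  # -> $135,000
```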
Putting It All Together - Agile IT for Biomarker Discovery
• Storage is often our biggest headache
– Concentrate on: speed, scaling and large namespaces
• Data Movement & Management equally hard
– Concentrate on: data movement (networks), data lifecycle & curation
– Can your LIMS handle rapid protocol changes and new data?
• Cloud Computing is real & useful today
– In 2009, Amazon Web Services IS “the cloud”
– Lowest hanging fruit: dev, pilot & test programs
– Potential use case: long term, deep archive store + “downloader pays”
model for distribution of public-funded data sets
End;
• Thanks!
• Presentation slides will appear here:
– http://blog.bioteam.net
• Comments/feedback:
– chris@bioteam.net