Mapping Life Science Informatics to the Cloud
chris-dagdigian
![Page 2: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/2.jpg)
I’m Chris.
I’m an infrastructure geek.
I work for the BioTeam.
![Page 3: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/3.jpg)
The “C” Word.
![Page 4: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/4.jpg)
When I say “cloud” I’m talking IaaS.
![Page 5: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/5.jpg)
Amazon AWS is the IaaS cloud.
Most others are fooling themselves. (Has-beens, also-rans & delusional marketing zombies)
![Page 6: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/6.jpg)
A message for the pretenders…
![Page 7: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/7.jpg)
No APIs? Not a cloud.
![Page 8: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/8.jpg)
No self-service? Not a cloud.
![Page 9: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/9.jpg)
I have to email a human?
Not a cloud.
![Page 10: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/10.jpg)
~50% failure rate when provisioning new servers?
Stupid cloud.
![Page 11: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/11.jpg)
Block storage and virtual servers only?
(Barely) a cloud.
![Page 12: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/12.jpg)
Private Clouds: My $.02
![Page 13: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/13.jpg)
Private Clouds in 2012:
• Hype vs. Reality ratio still wacky
• Sensible only for certain shops
• Have you seen what you have to do to your networks & gear?
• There are easier ways
![Page 14: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/14.jpg)
Private Clouds: My Advice for ‘12
• Remain cynical (test vendor claims)
• Due diligence still essential
• I personally would not deploy/buy anything that does not explicitly provide Amazon API compatibility
![Page 15: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/15.jpg)
Private Clouds: My Advice for ‘12
• Most people are better off:
  • Adding VM platforms to existing HPC clusters & environments
  • Extending enterprise VM platforms to allow user self-service & server catalogs
![Page 16: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/16.jpg)
Enough Bloviating. Advice time.
![Page 17: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/17.jpg)
Tip #1
![Page 18: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/18.jpg)
HPC & Clouds: Whole New World
![Page 19: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/19.jpg)
• We have spent decades learning to tune research HPC systems for shared access & many users.
• The cloud upends this model
![Page 20: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/20.jpg)
• Far more common to see …
  • Dedicated cloud resources spun up for each app or use case
  • Each system gets individually tuned & optimized
![Page 21: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/21.jpg)
Tip #2
![Page 22: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/22.jpg)
Hybrid Clouds & Cloud Bursting
![Page 23: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/23.jpg)
• Lots of aggressive marketing
• Lots of carefully constructed “case studies” and prototypes
• The truth?
  • Less usable than you’ve been told
  • Possible? Heck yeah.
  • Practical? Only sometimes.
![Page 24: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/24.jpg)
• Advice
  • Be cynical
  • Demand proof
  • Test carefully
![Page 25: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/25.jpg)
• Still want to do it?
  • Buy it, don’t build it
    • Cycle Computing
    • Univa
    • Bright Computing
    • …
![Page 26: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/26.jpg)
• Follow the crowd
• In the real world we see:
  • Separation between local and cloud HPC resources
  • Send your work to the system most suitable
![Page 27: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/27.jpg)
Tip #3
![Page 28: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/28.jpg)
You can’t rewrite EVERYTHING.
![Page 29: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/29.jpg)
• Salesfolk will just glibly tell you to rewrite your apps so you can use whatever big data analysis framework they happen to be selling today
![Page 30: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/30.jpg)
• They have no clue.
![Page 31: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/31.jpg)
• In life science informatics we have hundreds of codes that will never be rewritten.
• We’ll be needing them for years to come.
![Page 32: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/32.jpg)
• Advice:
  • MapReduce-ish methods are the future for big-data informatics
  • It will take years to get there
  • We still have to deal with legacy algorithms and codes
![Page 33: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/33.jpg)
• You will need:
  • A process for figuring out when it’s worthwhile to rewrite/re-architect
  • Tested cloud strategies for handling three use cases
![Page 34: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/34.jpg)
You need 3 cloud architectures:
1. Legacy HPC
2. “Cloudy” HPC
3. Big Data HPC (Hadoop)
![Page 35: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/35.jpg)
Legacy HPC on the cloud
• MIT StarCluster
  • http://web.mit.edu/star/cluster/
  • This is your baseline
  • Extend as needed
![Page 36: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/36.jpg)
“Cloudy” HPC
• Use this method when …
  • It makes sense to rewrite or re-architect an HPC workflow to better leverage modern cloud capabilities
![Page 37: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/37.jpg)
“Cloudy” HPC, continued
• Ditch the legacy compute farm model
• Leverage elastic scale-out tools (***)
  • Spot Instances for elastic & cheap compute
  • SimpleDB for job statekeeping
  • SQS for job queues & workflow “glue”
  • SNS for message passing & monitoring
  • S3 for input & output data
  • Etc.
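The division of labor on this slide can be sketched without touching AWS at all. The following is a minimal illustration, assuming in-memory stand-ins: a `queue.Queue` plays the role of SQS, and plain dicts play the roles of SimpleDB (job state) and S3 (input and output data). The names `run_worker`, `job_state` and `object_store` are hypothetical and do not come from the talk or from any AWS SDK.

```python
import queue


def run_worker(job_queue, job_state, object_store, process):
    """Drain job_queue, recording per-job state and writing outputs.

    job_queue    -- stand-in for an SQS queue of job ids (assumption)
    job_state    -- stand-in for a SimpleDB domain: job id -> status (assumption)
    object_store -- stand-in for an S3 bucket: key -> data (assumption)
    process      -- the code to run on each input
    """
    completed = 0
    while True:
        try:
            job_id = job_queue.get_nowait()
        except queue.Empty:
            # Nothing left to do; an elastic worker would shut itself down here.
            return completed
        job_state[job_id] = "running"
        try:
            data = object_store["input/" + job_id]
            object_store["output/" + job_id] = process(data)
            job_state[job_id] = "done"
            completed += 1
        except Exception:
            # Record the failure instead of crashing so the job can be retried.
            job_state[job_id] = "failed"
```

Because the worker keeps no state of its own, any number of identical workers could be started or killed (as happens with Spot Instances) without losing track of which jobs are done.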
![Page 38: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/38.jpg)
Big Data HPC
• It’s gonna be a MapReduce world
• Little need to roll your own
• Ecosystem already healthy
• Multiple providers today
• Often a slam-dunk cloud use case
![Page 39: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/39.jpg)
Tip #4
![Page 40: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/40.jpg)
The Cloud was not designed for “us”
![Page 41: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/41.jpg)
• HPC is an edge case for the hyperscale IaaS clouds
• We need to deal with this and engineer around it.
![Page 42: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/42.jpg)
• Many examples
  • Eventual consistency
  • Networking & subnets
  • Latency
  • Node placement
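Eventual consistency is the easiest of these to illustrate: a read issued right after a write may not see the new data yet, so HPC pipelines that assume "written means readable" need a polling step. A minimal sketch, assuming a caller-supplied `fetch` function that returns `None` until the object becomes visible; `wait_until_visible` and its parameters are hypothetical names, not part of any cloud SDK.

```python
import time


def wait_until_visible(fetch, attempts=10, delay=0.5, sleep=time.sleep):
    """Poll fetch() until it returns something other than None.

    fetch is assumed to return None while the object is not yet visible.
    sleep is injectable so the loop can be tested without real waiting.
    """
    for attempt in range(attempts):
        value = fetch()
        if value is not None:
            return value
        if attempt < attempts - 1:
            sleep(delay)
    raise TimeoutError("object still not visible after %d attempts" % attempts)
```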
![Page 43: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/43.jpg)
• Advice
  • Manage expectations
  • Benchmark & test
  • Evangelize
  • (pester the cloud sales reps …)
![Page 44: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/44.jpg)
Tip #5
![Page 45: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/45.jpg)
Data Movement Is Still Hard
![Page 46: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/46.jpg)
• Consistently getting easier
• Amazon is not a bottleneck
• AWS Import/Export
• AWS Direct Connect
• Aspera has some amazing stuff out right now
![Page 47: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/47.jpg)
• Advice
  • AWS Import/Export works well
  • Size of pipe is not everything
  • Sweat the small stuff
    • Tracking, checksums, disk speed
    • Dedicated workstations
    • Secure media storage
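The "tracking, checksums" advice can be sketched as a manifest that is built before the data leaves and compared after it arrives. This is a minimal illustration using only the Python standard library; the function names and the choice of SHA-256 are assumptions, not something the talk prescribes.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so terabyte-scale files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(source_manifest, dest_manifest):
    """Return source files that are missing or different at the destination."""
    return sorted(
        name
        for name, checksum in source_manifest.items()
        if dest_manifest.get(name) != checksum
    )
```

Running `build_manifest` on both ends and passing the two results to `find_mismatches` gives a list of files to resend, whether the data travelled on disks or over the network.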
![Page 48: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/48.jpg)
Dedicated data movement station
![Page 49: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/49.jpg)
‘naked’ Terabyte-scale data movement
![Page 50: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/50.jpg)
Don’t overlook media storage …
![Page 51: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/51.jpg)
• Advice for 2012
  • BioTeam is dialing down our advocacy of physical data ingestion into the cloud
  • Why?
    • Operationally hard, expensive and no longer strictly needed
![Page 52: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/52.jpg)
Real world cross-country internet-based data movement
March 2012
![Page 53: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/53.jpg)
700Mb/sec into Amazon, stress-free & zero tuning
March 2012
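To put that rate in perspective, here is a small back-of-the-envelope helper. It assumes "Mb" means megabits, uses decimal units (1 TB = 10^12 bytes), and adds a hypothetical `efficiency` factor for protocol overhead; none of these assumptions are stated on the slide.

```python
def transfer_hours(size_bytes, link_megabits_per_sec, efficiency=1.0):
    """Estimate hours needed to move size_bytes over a sustained link rate.

    efficiency is the fraction of the link rate actually achieved (0 < e <= 1).
    """
    if link_megabits_per_sec <= 0 or not 0 < efficiency <= 1:
        raise ValueError("rate must be positive and efficiency in (0, 1]")
    bits = size_bytes * 8
    seconds = bits / (link_megabits_per_sec * 1_000_000 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # One decimal terabyte at a sustained 700 megabits per second.
    print(round(transfer_hours(1e12, 700), 2), "hours")
```

Under those assumptions, a terabyte moves in roughly 3.2 hours at a sustained 700 Mb/sec, which is typically faster than preparing, shipping and ingesting a physical disk.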
![Page 54: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/54.jpg)
• People trying to move data via physical media quickly realize the operational difficulties
• Bandwidth is cheaper than hiring another body to manage physical data ingestion & movement
• In 2012 we strongly recommend network-based data movement when at all possible
![Page 55: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/55.jpg)
u r doing it wrong
![Page 56: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/56.jpg)
cool data movement, bro!
![Page 57: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/57.jpg)
Tips #6 & 7
![Page 58: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/58.jpg)
Cloud storage. Still slow.
![Page 59: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/59.jpg)
Big shared storage. Still hard.
![Page 60: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/60.jpg)
• Not much we can do except engineer around it
• AWS compute cluster instances are a huge step forward
• AWS competitors take note
![Page 61: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/61.jpg)
• We are not database nerds
• We care about more than just random IO performance
• We need it all
  • Random I/O
  • Long sequential read/write
![Page 62: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/62.jpg)
• Faster Storage Options
  • Software RAID on EBS
  • Various GlusterFS options
  • Even if you optimize everything, the virtual NICs are still a bottleneck
![Page 63: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/63.jpg)
• Big Shared Storage
  • 10GbE nodes and NFS
  • Software RAID sets
  • GlusterFS or similar
  • 2012: pNFS finally?
![Page 64: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/64.jpg)
Tip #8
![Page 65: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/65.jpg)
Things fail differently in the cloud.
![Page 66: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/66.jpg)
• Stuff breaks
• It breaks in weird ways
• Transient/temporary issues more common than what we see “at home”
![Page 67: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/67.jpg)
• Advice
  • Pessimism is good
  • Design for failure
  • Think hard about
    • How will you detect?
    • How will you respond?
![Page 68: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/68.jpg)
• Advice
  • Remove humans from loop
  • Automate recovery
  • Automate your backups
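One common way to automate recovery from the transient failures described above is to retry with exponential backoff and jitter. The sketch below is a generic illustration of that technique, not something shown in the talk; `call_with_retries` and its parameters are hypothetical names.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on any exception with exponential backoff.

    The wait before retry n is base_delay * 2**(n-1) plus random jitter of up
    to the same amount, so many workers do not all retry at the same instant.
    The final failure is re-raised so that monitoring can detect it.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

In real use it would be worth catching only the exception types known to be transient, so that genuine bugs fail fast instead of being retried.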
![Page 69: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/69.jpg)
Tip #9
![Page 70: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/70.jpg)
Serial/batch computing at-scale
![Page 71: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/71.jpg)
• Loosely coupled workflows are ideal
• Break the pipeline into discrete components
• Components should be able to scale up|down independently
![Page 72: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/72.jpg)
• Component = Opportunity to:
  • … Make a scaling decision
    • (# nodes in use)
  • … Make sizing decision
    • (instance type in use)
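Both decisions can be expressed as small pure functions that each pipeline component calls on its own. This is a minimal sketch under stated assumptions: the throughput figure, the node limits and the instance catalog are all hypothetical inputs that would come from your own benchmarks and provider.

```python
import math


def nodes_needed(queued_jobs, jobs_per_node_hour, target_hours,
                 min_nodes=0, max_nodes=100):
    """Scaling decision: how many nodes drain the queue within target_hours."""
    if queued_jobs <= 0:
        return min_nodes
    wanted = math.ceil(queued_jobs / (jobs_per_node_hour * target_hours))
    return max(min_nodes, min(max_nodes, wanted))


def pick_instance_type(memory_gb_needed, catalog):
    """Sizing decision: the smallest instance type with enough memory.

    catalog maps a type name to its memory in GB (hypothetical values).
    """
    fitting = [(mem, name) for name, mem in catalog.items()
               if mem >= memory_gb_needed]
    if not fitting:
        raise ValueError("no instance type has enough memory")
    return min(fitting)[1]
```

For example, with 1000 queued jobs, 10 jobs per node-hour and a 2-hour target, `nodes_needed` asks for 50 nodes, and the cap in `max_nodes` keeps a runaway queue from producing a runaway bill.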
![Page 73: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/73.jpg)
Nirvana is …
![Page 74: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/74.jpg)
… independent loosely connected components that can self-scale and communicate asynchronously
![Page 75: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/75.jpg)
Advice:
• Many people already doing this
• Best practices are well known
• Steal from the best:
  • RightScale, Opscode & Cycle Computing
![Page 76: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/76.jpg)
Phew. Think I’m done now.
![Page 78: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/78.jpg)
End;
![Page 79: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/79.jpg)
Backup Slides
![Page 80: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/80.jpg)
Private Clouds: Pick Your Poison
• OpenStack - http://openstack.org
  • Pro: Super smart developers; significant mindshare; true open source
  • Con: Commitment to AWS API compatibility (?) & stability
![Page 81: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/81.jpg)
Private Clouds: Pick Your Poison
• CloudStack - http://cloudstack.org
  • Pro: Explicit AWS API support; very recent move away from “open-core” model; usability
  • Con: Developer mindshare? Sudden switch to Apache
![Page 82: Mapping Life Science Informatics to the Cloud](https://reader037.fdocuments.in/reader037/viewer/2022103115/55756a2cd8b42a2e248b4be0/html5/thumbnails/82.jpg)
Private Clouds: Pick Your Poison
• Eucalyptus - http://eucalyptus.com
  • Pro: Direct AWS API compatibility; lots of hypervisor support
  • Con: Open-core model; mindshare; recent resurrection