Cloud present, future and trajectory (Amazon Web Services) - Jisc Digifest 2016


Page 1: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Cloud present, future & trajectory
Brendan Bouffler (@boofla), #scico
Global Scientific Computing

02-Mar-16

*Does not apply to mathematicians with specialties in Cantorian set theory who should immediately ask for a copy of my very long disclaimer.

Page 2: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

We are Psycho SciCo

No, not that one.

Page 3: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Scientific Computing Group (SciCo)

Science is one of the greatest areas of computation and can benefit from a democratization in cost and global accessibility that the cloud brings.

It’s also where we think Amazon can make a huge, really disruptive, impact on the world by participating - which is, at the most basic level, what we are about as a company.

Page 4: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

“… the online book and decorative pillow seller Amazon.com swooped in and, in 2006, launched its own computer rental system—the future Amazon Web Services. The once-fledgling service has since turned cloud computing into a mainstream phenomenon …”

Source: Bloomberg Business - April 22, 2015

$7B retail business
10,000 employees

A whole lot of servers

2006 2015

Every day, AWS adds enough server capacity to power this $7B enterprise

Page 5: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Global AWS Regions

Existing:
1. Oregon
2. California
3. Virginia
4. Dublin
5. Frankfurt
6. Singapore
7. Sydney
8. Seoul
9. Tokyo
10. Sao Paulo
11. Beijing
12. US GovCloud

2016/17:
13. Ohio
14. India
15. UK
16. Canada
17. China+1

AWS Region = A cluster of Availability Zones
Availability Zone = A cluster of data centers

All regions are sovereign, meaning your data never leaves that location unless you cause it to.

Page 7: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Map of scientific collaboration between researchers - Olivier H. Beauchesne - http://bit.ly/e9ekP2

Science means Collaboration

Page 8: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Public Data Sets

Page 10: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Cray Supercomputer

Page 11: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Beowulf Cluster

Page 12: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

A top500 supercomputer

Ready in ~100 seconds

For ~ $100/hr

Page 13: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Time travel for job queues

Wall clock time: ~1 hour
Wall clock time: ~1 week

Cost: the same
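The arithmetic behind this slide can be sketched in a few lines. This is a rough illustration only: it assumes a perfectly parallel workload, billing per instance-hour, and a hypothetical price of 1.0 per instance-hour (none of these figures come from the slide).

```python
def job_cost(instances: int, hours: float, price_per_instance_hour: float) -> float:
    """Pay-per-use cost: billed for instance-hours, however they are arranged."""
    return instances * hours * price_per_instance_hour


PRICE = 1.0  # hypothetical price per instance-hour, for illustration only

# The same 168 instance-hours of work, arranged two ways.
wide = job_cost(instances=168, hours=1, price_per_instance_hour=PRICE)    # ~1 hour on the wall clock
narrow = job_cost(instances=1, hours=168, price_per_instance_hour=PRICE)  # ~1 week on the wall clock

print(wide, narrow)
```

Both arrangements consume the same number of instance-hours, so under these assumptions the bill is identical and only the wall clock time changes.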

Page 14: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Cost Control & Budgeting

Page 15: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Spot Market AWSome-ness

Spot Bid Advisor

The Spot Bid Advisor analyzes Spot price history to help you determine a bid price that suits your needs.

You should weigh your application’s tolerance for interruption and your cost saving goals when selecting a Spot instance and bid price.

The lower your frequency of being outbid, the longer your Spot instances are likely to run without interruption.

https://aws.amazon.com/ec2/spot/bid-advisor/

Bid Price & Savings

Your bid price affects your ranking when it comes to acquiring resources in the Spot market, and is the maximum price you will pay.

But frequently you’ll pay a lot less.
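A toy model of the bid behaviour described above, as a sketch only. It assumes one price per hour, that you run (and pay the market price) whenever the market price is at or below your bid, and that you are simply not running when outbid. The price history and bid are made-up numbers, and `simulate_spot` is a hypothetical helper, not an AWS API.

```python
from typing import Sequence, Tuple


def simulate_spot(bid: float, hourly_prices: Sequence[float]) -> Tuple[float, int]:
    """Return (total paid, hours run) for a bid against an hourly price history."""
    paid = 0.0
    hours_run = 0
    for price in hourly_prices:
        if price <= bid:
            # We pay the market price for this hour, not our bid.
            paid += price
            hours_run += 1
        # Otherwise we are outbid: no compute and no charge for this hour.
    return paid, hours_run


prices = [0.10, 0.12, 0.30, 0.11]  # hypothetical Spot price history, $/hour
paid, hours_run = simulate_spot(bid=0.25, hourly_prices=prices)
print(f"ran {hours_run} of {len(prices)} hours, paid ${paid:.2f}")
```

With these made-up numbers the bid of $0.25 is only a ceiling: the hours that run are billed at the lower market price, and the one hour above the bid is an interruption rather than a charge.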

Page 16: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Choices

When you only pay for what you use …
• If you’re only able to use your compute, say, 30% of the time, you only pay for that time.

1. Pocket the savings
• Buy chocolate
• Buy a spectrometer
• Hire a scientist.

2. Go faster
• Use 3x the cores to run your jobs at 3x the speed.

3. Go large
• Do 3x the science, or consume 3x the data.

… you have options.
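The three options follow from one ratio. A minimal sketch, assuming the same (hypothetical) price per core-hour whether capacity is always on or paid per use; `pay_per_use_options` is an illustrative name, not part of any AWS tool.

```python
def pay_per_use_options(utilisation: float) -> dict:
    """Compare paying only for used time against paying for always-on capacity.

    Assumes the same hypothetical price per core-hour in both cases.
    """
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return {
        # Option 1: keep the workload the same and pocket the difference.
        "fraction_saved": 1 - utilisation,
        # Options 2 and 3: spend the same budget on more cores or more work.
        "scale_up_factor": 1 / utilisation,
    }


opts = pay_per_use_options(0.30)
print(f"save {opts['fraction_saved']:.0%}, or scale up {opts['scale_up_factor']:.1f}x")
```

At 30% utilisation the factor is a little over 3, which the slide rounds to 3x.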

Page 17: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

[Diagram labels: AWS – Frankfurt (EC2, S3); AWS – Dublin; inter-region; over research network (Janet/GÉANT); over commercial internet. Legend: data egress / not data egress. Annotation: data egress waiver applies.]

Data egress is: data transferred out from AWS, over the Internet, to the end user

Page 18: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Global Data Egress Waiver at a Glance

Who
• Available to degree-granting / research institutions
• Permanent program, unlike previous pilots
• North America, Europe, APAC, Japan & GovCloud regions (but not including Latin America, Middle East, China, India, and Africa)
• Excludes MOOCs or other egress-as-a-service situations
• Must use a research network we peer with (e.g. Janet or GÉANT)

What
• Waives data egress charges from qualified accounts
• Capped at waiving no more than 15% of the customer’s bill

How
• Contract addendum required
• Can also procure through a reseller (e.g. Arcus)

Why
• Researchers strongly need predictable budgets

Page 19: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

39 years of computational chemistry in 9 hours (Novartis)

Novartis ran a project that involved virtually screening 10 million compounds against a common cancer target in less than a week. They calculated that it would take 50,000 cores and close to a $40 million investment if they wanted to run the experiment internally.

Partnering with Cycle Computing and Amazon Web Services (AWS), Novartis built a platform that ran across 10,600 Spot Instances (~87,000 cores) and allowed Novartis to conduct 39 years of computational chemistry in 9 hours for a cost of $4,232. Out of the 10 million compounds screened, three were successfully identified.
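A back-of-envelope check on those figures. It uses only the numbers quoted on the slide and assumes, purely for illustration, that all ~87,000 cores were busy for the full 9 hours.

```python
CORES = 87_000    # "~87,000 cores" from the slide
HOURS = 9         # run duration from the slide
COST_USD = 4_232  # total cost from the slide

# Assumption: every core ran for the whole 9 hours.
core_hours = CORES * HOURS
cost_per_core_hour = COST_USD / core_hours

print(f"{core_hours:,} core-hours at about ${cost_per_core_hour:.4f} per core-hour")
```

Under that assumption the run works out to roughly half a cent per core-hour.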

Page 20: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Stars in the Cloud

CHILES will produce the first HI deep field, to be carried out with the VLA in B array and covering a redshift range from z=0 to z=0.45. The field is centered at the COSMOS field. It will produce neutral hydrogen images of at least 300 galaxies spread over the entire redshift range.

The team at ICRAR in Australia have been able to implement the entire processing pipeline in the cloud for around $2,000 per month by exploiting the Spot market, which means the $1.75M they otherwise needed to spend on an HPC cluster can be spent on way cooler things that impact their research … like astronomers.

Page 21: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Finding what you’re not looking for

http://blog.csiro.au/wtf-is-that-how-were-trawling-the-universe-for-the-unknown/

WTF’s cloud-based backend is hosted on Amazon Web Services servers, where the researchers are able to access software for data reduction, calibration and viewing right from their desktop. The team is currently issuing a challenge using data peppered with “EMU (Easter) Eggs” – objects that might pose a challenge to data mining algorithms.

This way they hope to train the system to recognise things that systematically depart from known categories of astronomical objects, to help better prepare for unanticipated discoveries that would otherwise remain hidden.

Page 22: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Zooniverse

“The Zooniverse is heavily reliant on Amazon Web Services (AWS), particularly Elastic Compute Cloud (EC2) virtual private servers and Simple Storage Service (S3) data storage. AWS is the most cost-effective solution for the dynamic needs of Zooniverse’s infrastructure …”
http://wwwconference.org/proceedings/www2014/companion/p1049.pdf

The World’s Largest Citizen Science Platform

… cost is a factor – running a central API means that when the Zooniverse is quiet and there aren’t many people about we can scale back the number of servers we’re running (automagically on Amazon Web Services) to a minimal level.

Page 23: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

C4

Intel Xeon E5-2666 v3, custom built for AWS.

Intel Haswell, 16 FLOPS/tick

2.9 GHz, turbo to 3.5 GHz

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/c4-instances.html

Feature                                                     Specification
Processor Number                                            E5-2666 v3
Intel® Smart Cache                                          25 MiB
Instruction Set                                             64-bit
Instruction Set Extensions                                  AVX 2.0
Lithography                                                 22 nm
Processor Base Frequency                                    2.9 GHz
Max All Core Turbo Frequency                                3.2 GHz
Max Turbo Frequency                                         3.5 GHz (available on c4.2xLarge)
Intel® Turbo Boost Technology                               2.0
Intel® vPro Technology                                      Yes
Intel® Hyper-Threading Technology                           Yes
Intel® Virtualization Technology (VT-x)                     Yes
Intel® Virtualization Technology for Directed I/O (VT-d)    Yes
Intel® VT-x with Extended Page Tables (EPT)                 Yes
Intel® 64                                                   Yes
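The "16 FLOPS/tick" figure turns into a theoretical peak with one multiplication. A small sketch, assuming the figure is per physical core; real applications will usually land well below these numbers.

```python
FLOPS_PER_CYCLE = 16  # from the slide; assumed here to be per physical core


def peak_gflops_per_core(clock_ghz: float) -> float:
    """Theoretical peak = FLOPs per cycle x cycles per second (in GHz -> GFLOPS)."""
    return FLOPS_PER_CYCLE * clock_ghz


# Clock speeds taken from the specification table on this slide.
for label, ghz in [("base", 2.9), ("all-core turbo", 3.2), ("max turbo", 3.5)]:
    print(f"{label}: {peak_gflops_per_core(ghz):.1f} GFLOPS per core")
```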

Page 24: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

cfnCluster - provision an HPC cluster in minutes

#cfncluster
https://github.com/awslabs/cfncluster

cfncluster is a sample code framework that deploys and maintains clusters on AWS. It is reasonably agnostic to what the cluster is for and can easily be extended to support different frameworks. The CLI is stateless, everything is done using CloudFormation or resources within AWS.

10 minutes

http://boofla.io/u/cfnCluster – (Boof’s HOWTO slides)

Page 26: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

[Architecture diagram labels: one headnode instance and four compute node instances; 10G network; Auto-scaling group; Virtual Private Cloud; /shared filesystem.]

Head Instance
2 or more cores (as needed)
CentOS 6.x
OpenMPI, gcc etc…

Choice of scheduler: Torque, SGE, OpenLava, Slurm

Compute Instances
2 or more cores (as needed)
CentOS 6.x

Auto Scaling group driven by scheduler queue length.

Can start with 0 (zero) nodes and only scale when there are jobs.
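The idea of scaling on queue length can be sketched as a single function. This is not cfncluster's actual logic, just a minimal illustration; the function name, its parameters and the notion of "slots" (one per core) are assumptions made for the example.

```python
import math


def desired_compute_nodes(queued_slots: int, slots_per_node: int,
                          max_nodes: int, min_nodes: int = 0) -> int:
    """Pick a node count that covers the queued work, within the group limits."""
    if slots_per_node <= 0:
        raise ValueError("slots_per_node must be positive")
    # Enough nodes to run everything currently waiting in the scheduler queue.
    needed = math.ceil(queued_slots / slots_per_node)
    # Clamp to the limits of the Auto Scaling group.
    return max(min_nodes, min(max_nodes, needed))


print(desired_compute_nodes(queued_slots=0, slots_per_node=8, max_nodes=10))     # empty queue
print(desired_compute_nodes(queued_slots=17, slots_per_node=8, max_nodes=10))    # a few jobs waiting
print(desired_compute_nodes(queued_slots=1000, slots_per_node=8, max_nodes=10))  # more work than the cap
```

With `min_nodes` left at zero, an empty queue means no compute nodes at all, which is the "start with 0 nodes" behaviour described above.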

It's a real cluster

Page 27: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

Infrastructure as code

#cfncluster

The creation process might take a few minutes (maybe up to 5 mins or so, depending on how you configured it).

Because the API to CloudFormation (the service that does all the orchestration) is asynchronous, we can kill the terminal session if we want to and watch the whole show from the AWS console (where you’ll find it all under the “CloudFormation” dashboard, in the events tab for this stack).

$ cfncluster create boof-cluster
Starting: boof-cluster
Status: cfncluster-boof-cluster - CREATE_COMPLETE
Output:"MasterPrivateIP"="10.0.0.17"
Output:"MasterPublicIP"="54.66.174.113"
Output:"GangliaPrivateURL"="http://10.0.0.17/ganglia/"
Output:"GangliaPublicURL"="http://54.66.174.113/ganglia/"

Page 28: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

This cluster intentionally left blank.

Your cluster is ephemeral.

Yes, that’s right, you’ve created a disposable cluster.

But it’s 100% recyclable.

It’s worth noting that anything you put into this cluster will vaporize when you issue the command

$ cfncluster delete <your cluster name>

… which might not be what you first expect.

It’s easy to save your data tho, and pick up from where you left off later.

Before you delete your cluster, take a snapshot of the EBS (block storage) volume that you used for your /shared filesystem using the AWS EC2 console (see the pic on the right).

The EBS volume you care most about is the one attached to the headnode instance (hint: it’s probably the largest one).
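The slide uses the EC2 console for this step; the same thing can be scripted. A rough sketch using boto3's EC2 client calls `describe_volumes` and `create_snapshot`: the helper name, the instance ID and the description below are placeholders, and the "largest volume" rule is only the heuristic from the hint above, so check the volume before relying on it.

```python
def snapshot_largest_volume(ec2, instance_id: str, description: str) -> str:
    """Snapshot the largest EBS volume attached to instance_id; return the snapshot ID.

    `ec2` is expected to behave like boto3.client("ec2").
    Pagination of describe_volumes is ignored in this sketch.
    """
    response = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )
    volumes = response["Volumes"]
    if not volumes:
        raise ValueError(f"no EBS volumes attached to {instance_id}")
    # Heuristic from the slide: the /shared volume is probably the largest one.
    largest = max(volumes, key=lambda volume: volume["Size"])
    snapshot = ec2.create_snapshot(VolumeId=largest["VolumeId"], Description=description)
    return snapshot["SnapshotId"]


# Example use (needs boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2")
#   print(snapshot_largest_volume(ec2, "i-0123456789abcdef0", "/shared before cluster delete"))
```

Run it before `cfncluster delete`, and wait for the snapshot to complete before deleting the cluster.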

Page 29: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

How do I join the Data Egress Waiver Program?

Your AWS account manager will work with you to sign you up.

Sign up for an AWS Account using the Jisc/Arcus Portal (coming soon)

Peter Meagher
AWS [email protected]

Page 30: Cloud present, future and trajectory (Amazon Web Services) - JIsc Digifest 2016

How will this impact me?

• Simple, predictable budgets: you will not be charged for data egress out from AWS over the internet to you. This makes it easier to write grant proposals, and plan your research budget.
• Discount: this program lowers your monthly bill.
• Retrieving data: there is no cost to access your data or to retrieve it to your local site.
• Tailored to academia: We understand that predictable budgets are important because of how research funding works. And we know that National Research and Education Networks provide most research institutions with a reliable, fast network connection to the AWS cloud for your compute and big data needs.
• Volume Discount: AWS will apply the waiver to your institution’s aggregated AWS account, which averages out data egress use – and gives you access to further volume discounts.

** Data egress charges waived up to 15% of your total bill, or >3x typical usage.
** Data ingress (uploading data to AWS) is always free.
** Data egress waived is from AWS out over the internet. Glacier, CloudFront, DirectConnect port speed fees, or traffic between AWS regions are not waived.
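The 15% cap is easy to reason about with a small calculation. This is a sketch only: it assumes "total bill" means the monthly bill before the waiver is applied, which the slide does not spell out, and the dollar amounts are invented for the example.

```python
def waived_egress(egress_charges: float, total_bill: float,
                  cap_fraction: float = 0.15) -> float:
    """Egress charges waived, capped at a fraction of the total bill.

    Assumption: total_bill is the bill before the waiver is applied.
    """
    return min(egress_charges, cap_fraction * total_bill)


# Hypothetical monthly bills of $1,000 with different amounts of egress.
print(waived_egress(egress_charges=50.0, total_bill=1000.0))   # under the cap: fully waived
print(waived_egress(egress_charges=200.0, total_bill=1000.0))  # over the cap: waived up to 15%
```

In the second case the remaining egress above the cap would still be charged.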
