Locuz · Pre-processing / Meshing Simulation Post Processing Visualization ... Reserved Instances...

Converge to Cloud · Locuz.com · HPC on AWS

Page 1:

HPC on AWS

Page 2:

How is Cloud Helping Enterprise HPC?

› Faster Time to Results: Access computing infrastructure in minutes

› Lower Total Cost: Pay-as-you-go pricing

› Elastic and Powerful: Easily add or remove capacity

› Globally Accessible: Easily collaborate with teams around the world

› Secure: A collection of tools to protect data and privacy

› Scalable: Access to effectively limitless capacity

Page 3:

Cloud Computing in a Simulation-Driven World

› Scalability and agility

› Secure global collaboration

› Enterprise data governance

Page 4:

Scalability for Simulations

Design, engineering, analysis, visualization

› Simulation-driven design, discovery, optimization

Sample use cases

› Antenna and power simulations

› Genomics and proteomics

› Computational fluid dynamics

› Structural and finite element analysis

› Molecular modeling for drug discovery

› Oil and gas reservoir simulations

Cloud unlocks simulation at massive scale

Page 5:

The Old Way: Low Utilization, High Costs

[Chart: actual demand for computing vs. total servers deployed, over time; labels: Unused IT Resources, Server acquisition (three steps)]

› Typical server utilization rates are low due to the need to deploy for peak needs
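
The utilization point can be made concrete with a little arithmetic. The sketch below is illustrative only: the hourly demand figures are hypothetical and do not come from the slides, and `utilization` is a helper named here for the example. It shows how sizing a fixed cluster for the peak leaves most of the capacity idle on average.

```python
def utilization(demand, capacity):
    """Average fraction of deployed capacity that is actually used."""
    used = sum(min(d, capacity) for d in demand)
    return used / (capacity * len(demand))

# Hypothetical demand in cores for six time periods (illustrative numbers only).
demand = [200, 250, 300, 1000, 400, 250]

# Fixed on-premises capacity sized for the single peak period.
peak_capacity = max(demand)
fixed_utilization = utilization(demand, peak_capacity)

print(f"Capacity sized for peak: {peak_capacity} cores")
print(f"Average utilization: {fixed_utilization:.0%}")
```

With these made-up numbers, one busy period forces a 1000-core purchase while the other periods use well under half of it.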

Page 6:

The Hidden Cost of Managing HPC Utilization

[Chart: actual demand for computing over time; labels: Managing with high utilization…, Project Delay (twice)]

› Higher utilization rates result in hidden costs

› Longer queue wait times, and delayed projects
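
One way to see why high utilization lengthens queue waits is a textbook single-server queue (M/M/1). This is a simplifying assumption made for illustration, not a model taken from the slides; real clusters have many nodes and a scheduler, and the job length used below is hypothetical. Under that model the mean time a job waits in the queue is the mean job length multiplied by ρ / (1 − ρ), where ρ is utilization.

```python
def mean_queue_wait(utilization, mean_job_hours):
    """Mean wait in queue for an M/M/1 system, in hours.

    utilization: fraction of time the resource is busy (0 <= rho < 1).
    mean_job_hours: average run time of one job (hypothetical input).
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return mean_job_hours * utilization / (1 - utilization)

# Wait time grows non-linearly as utilization approaches 100%.
for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"utilization {rho:.0%}: mean wait {mean_queue_wait(rho, 2.0):.1f} h")
```

The exact numbers matter less than the shape: each step closer to full utilization costs disproportionately more waiting time.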

Page 7:

Conflicting goals

› Cluster users seek fastest possible time-to-results

› Simulations are not steady-state workloads

› IT support team seeks highest possible utilization

Result

› The job queue becomes the capacity buffer

› Job completion times are hard to predict

› Users are frustrated and run fewer jobs

HPC Queues Are Evil!

Page 8:

The World as Seen by Central IT

High utilization is viewed as a good thing

Page 9:

The World as Seen by the HPC User

Schedule impact!

Page 10:

In a secure Virtual Private Cloud

Automation and Auto Scaling allow easier cluster management and monitoring

Page 11:

High Performance, High Throughput Computing

HPC: High Performance Computing (Cluster Computing)

› Requires large numbers of compute cores arranged in a tight cluster, normally more than are available in a single server

› Latency-sensitive: requires a high degree of communication between the individual tasks running on each compute core

HTC: High Throughput Computing (Grid Computing)

› Like HPC, also requires large numbers of compute cores; however, there is minimal need for communication between the tasks

Cloud supports both HPC cluster and HTC grid use cases

› Traditional HPC cluster applications can scale well on EC2, with the added benefit of higher scale for parallelizing HPC jobs

› HTC applications run extremely well on AWS
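
The HTC pattern described above, many tasks that never talk to each other, can be sketched in a few lines. This is a minimal local illustration, not a cloud deployment: `simulate` is a hypothetical stand-in for one simulation run, and a thread pool stands in for what would be a fleet of independent workers.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate(param):
    """Hypothetical stand-in for one independent simulation run."""
    return param * param

def run_sweep(params, workers=4):
    # Tasks share no state and never communicate, so they can run on any
    # worker in any order; map() still returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, params))

results = run_sweep(range(5))
print(results)
```

Because no task depends on another, adding workers raises throughput without any change to the task code, which is the property that separates HTC from tightly coupled HPC jobs.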

Page 12:

Cluster HPC and Grid HTC on the Cloud

Cluster HPC

› Tightly coupled, latency-sensitive applications

› Use larger EC2 compute instances, placement groups, Enhanced Networking

Grid HTC

› Loosely coupled, pleasingly parallel

› Use a variety of EC2 instances, multiple AZs, Spot, Auto Scaling, SQS

HPC + HTC

› Use a grid strategy on the cloud to run a group of parallel, individually clustered HPC jobs

Page 13:

Industry Examples

Page 14:

HGST applications for engineering:

› Molecular dynamics, CAD, CFD, EDA

› Collaboration tools for engineering

› Big data for manufacturing yield analysis

Partner:

Example in Electronics Manufacturing

Molecular Dynamics Simulation at HGST:

› Millions of parallel parameter sweeps, running months of simulations in just hours

› Over 70,000 Intel cores running at peak, using EC2 Spot instances

Page 15:

Example in Life Sciences

Baylor CHARGE project:

› Genomics analysis on 14,000 participants

› 24 terabases of sequencer content each month

› 1 PB of raw data storage

› 21,000 AWS compute cores at peak

› Initial analysis completed in 10 days

Page 16:

Example in Financial Regulation

Page 17:

Example in Animation Rendering

Large-scale animation rendering on AWS:

• Cloud Rendering at Walt Disney Animation Studios (available on SlideShare)

• Automated environment leveraging Spot Fleet

• Launched 40K cores in 20 minutes at less than $0.02 per core-hour

Page 18:

Rendering Farm Architecture

[Diagram components: Shared File Storage; Cloud-Based, Auto-Scaling Render Farm on EC2; License Managers and Cluster Head Nodes; 3D Graphics Virtual Workstation; Remote Graphics; AWS Direct Connect; On-Premises IT Resources; Client Devices (no local data); Storage Cache; Amazon S3]

Page 19:

Altair HyperWorks on AWS

Page 20:

ANSYS Enterprise Cloud on AWS

Page 21:

Locuz Competency

› HPC Lifecycle Management: a unique methodology to provision a complete HPC environment on AWS for the entire lifecycle of an HPC infrastructure or application

› Automation: Using AWS automation, we quickly provision an HPC cluster (CPU / GPU), performance storage, high-speed network, etc., along with HPC middleware tools

› IP-Led Management: the Ganana job submission portal further reduces the end user's learning curve for running HPC jobs on the cloud

› Containerization: Using container technology for faster application provisioning

Workflow stages: Pre-processing / Meshing, Simulation, Post-processing, Visualization (2D/3D)

Page 22:

Service Offerings

HPC Assessment and consulting services

› HPC capacity planning.

› HPC requirement analysis with a detailed roadmap for migration to AWS

› HPC consulting services for Hybrid and Cloud only models

HPC Deployment and Managed services

› HPC infrastructure deployment and onboarding services

› HPC application workflow optimization

› 24/7 remote management services with uptime commitments at the middleware level through the NOC

› GUI job submission portal

› Checkpointing at the scheduler level

› Hadoop / Dask analytics cluster services

HPC Application services

› Application benchmarking.

› Application porting to multiple OS and cloud platforms (Linux, Windows, GPUs, etc.) using Docker containers

› Application optimization for performance

› Application migration to accelerator technologies (GPU)

› Deployment and optimization services for CUDA-enabled applications on certified platforms

Page 23:

AWS Consumption Models

On-Demand

› Pay for compute capacity by the hour with no long-term commitments

› For spiky workloads, or to define needs

Reserved

› Make a low, one-time payment and receive a significant discount on the hourly charge

› For committed utilization

Spot

› Bid for unused capacity, charged at a Spot Price that fluctuates based on supply and demand

› For time-insensitive or transient workloads
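
The trade-off between On-Demand and Reserved pricing comes down to a break-even point in hours of use. The sketch below works that out under the pricing shape the slide describes (a one-time payment plus a discounted hourly charge). All prices are hypothetical placeholders chosen for round numbers, not AWS list prices, and the function names are invented for this example.

```python
def on_demand_cost(hours, hourly_rate):
    """Total cost when paying the full hourly rate with no commitment."""
    return hours * hourly_rate

def reserved_cost(hours, upfront, discounted_hourly_rate):
    """Total cost with a one-time payment plus a discounted hourly charge."""
    return upfront + hours * discounted_hourly_rate

def break_even_hours(hourly_rate, upfront, discounted_hourly_rate):
    """Hours of use above which the reservation becomes cheaper."""
    return upfront / (hourly_rate - discounted_hourly_rate)

# Hypothetical prices for illustration only.
HOURLY = 1.00             # on-demand price per instance-hour
UPFRONT = 2000.00         # one-time reservation payment
DISCOUNTED_HOURLY = 0.50  # hourly charge after reserving

hours = break_even_hours(HOURLY, UPFRONT, DISCOUNTED_HOURLY)
print(f"Reservation pays off after {hours:.0f} hours of use")
```

Below the break-even point the workload is "spiky" in the slide's sense and On-Demand is cheaper; above it, the usage is committed enough to justify a reservation.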

Page 24:

Optimize Utilization on AWS with RI, On-Demand, Spot

[Chart: capacity over time; labels: Reserved Instances, 100%, On-Demand, Spot, Scale up, Scale down]
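
A layered allocation of this kind can be sketched as a simple rule: fill demand from reserved capacity first, then On-Demand up to some limit, and send the remainder to Spot. The ordering and the On-Demand cap are assumptions made for this illustration, not a policy stated in the slides, and `split_demand` is a hypothetical helper.

```python
def split_demand(demand, reserved, on_demand_cap):
    """Split one period's demand (in cores) across the three purchase models.

    Returns (from_reserved, from_on_demand, from_spot).
    Assumed policy: reserved first, then On-Demand up to a cap, then Spot.
    """
    from_reserved = min(demand, reserved)
    remaining = demand - from_reserved
    from_on_demand = min(remaining, on_demand_cap)
    from_spot = remaining - from_on_demand
    return from_reserved, from_on_demand, from_spot

# Hypothetical demand per period, with 200 reserved cores and a 50-core On-Demand cap.
for demand in (100, 230, 400):
    print(demand, split_demand(demand, reserved=200, on_demand_cap=50))
```

Keeping the reserved layer at or below the steady baseline is what lets it stay fully used while the upper layers scale up and down.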

Page 25:

With Spot the Rules are Simple

Spot is a market in which the price of compute changes based on supply and demand

You’ll never pay more than your bid. When the market exceeds your bid you get 2 minutes to wrap up your work
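
The two-minute warning is exposed to the instance through the EC2 instance metadata service, at the path shown in the constant below; the path returns 404 until an interruption is scheduled. The sketch covers only the parsing step: fetching the document (and any session token the metadata service requires) is left out, and the sample body and timestamps are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Metadata path where EC2 publishes a pending Spot interruption.
INSTANCE_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def seconds_until_interruption(body, now):
    """Parse an instance-action document and return the seconds remaining."""
    notice = json.loads(body)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()

# Made-up sample in the documented shape: an action and a UTC deadline.
sample = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample, now)
print(f"{remaining:.0f} seconds left to checkpoint and drain work")
```

A worker would typically poll this path every few seconds and, once a notice appears, stop taking new tasks and save its state.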

Page 26:

Best Practices for Using Spot

› Fault tolerance

› Stateless

› Multi-AZ

› Loosely coupled

› Instance Flexibility
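
Fault tolerance on Spot usually means a job can be interrupted and resumed without redoing finished work. The following is a minimal sketch of checkpoint-and-resume under stated assumptions: the "work" is a trivial running sum, the checkpoint is a local JSON file standing in for durable shared storage, and `stop_after` simulates an interruption. None of these names come from the slides.

```python
import json
import os
import tempfile

def run_with_checkpoint(path, total_steps, stop_after=None):
    """Run steps 0..total_steps-1, saving progress after each step.

    Returns the final result, or None if the simulated interruption hit.
    """
    state = {"next_step": 0, "acc": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume from the last saved step
    done_now = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and done_now >= stop_after:
            return None  # simulated Spot interruption
        state["acc"] += state["next_step"]  # hypothetical unit of work
        state["next_step"] += 1
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)  # atomic swap so a crash never leaves a partial file
        done_now += 1
    return state["acc"]

ckpt = os.path.join(tempfile.mkdtemp(), "job.json")
first = run_with_checkpoint(ckpt, 10, stop_after=4)  # interrupted after 4 steps
second = run_with_checkpoint(ckpt, 10)               # resumes at step 4
print(first, second)
```

The second call picks up from the saved step rather than starting over, so an interruption costs at most one step of work.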

Page 27:

Thank You!