(CMP306) Dynamic, On-Demand Windows HPC Clusters On AWS
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Timothy DiLauro, AWS Solutions Architect
Julien Lépine, AWS Solutions Architect
October 2015
CMP306
On-Demand Windows HPC on AWS
Windows Clusters for Dynamic Needs
What to Expect from the Session
HPC on AWS
AWS Architecture for Windows HPC
AWS Architecture for HPC
Best Practices for Windows HPC
Demonstration
HPC on AWS
Why AWS for HPC?
• Low cost with flexible pricing
• Efficient clusters
• Unlimited infrastructure
• Faster time to results
• Concurrent clusters on demand
• Increased collaboration
Popular HPC workloads on AWS
• Genome processing
• Modeling and Simulation
• Government and Educational Research
• Monte Carlo Simulations
• Transcoding and Encoding
• Computational Chemistry
Benefits of Agility
[Chart: Rigid On-Premises Resources. Predicted demand plotted against actual demand; the gaps are labeled "Waste" and "Customer Dissatisfaction".]
[Chart: Elastic Cloud-Based Resources. Resources scaled to demand, plotted against actual demand.]
Cost Benefits of HPC in the Cloud
Pay As You Go Model
Use only what you need
Multiple pricing models
On-Premises
Capital Expense Model
High upfront capital cost
High cost of ongoing support
AWS Journey for HPC Customer
• Dev, Test, Eval: Development and test; eval and training
• True Production: Build new production apps; migrate production apps
• Mission Critical: Build mission-critical apps; migrate mission-critical apps
• All-in: Corporate standard; “Cloud First”
AWS Architecture for HPC
On-Demand HPC on AWS
With AWS, deploy multiple clusters running at the same time and match the architectures to the jobs
AWS Architecture for HPC
• Amazon Virtual Private Cloud
• Amazon Simple Storage Service
• Amazon Elastic Block Store
• Amazon Elastic Compute Cloud
• Amazon CloudWatch
• AWS CloudFormation
• Auto Scaling
Amazon Elastic Compute Cloud
Continuing to enable customer choice and right sizing of clusters
[Figure: timeline of EC2 instance types by year, 2006 to 2015, marking new and existing types. It starts with m1.small in 2006 and grows to include, by 2015, the c3, c4, d2, g2, i2, m3, m4, r3, and t2 families alongside the earlier types.]
Auto Scaling and Amazon CloudWatch
Match demands of cluster queue with appropriate compute needs
CloudWatch
Auto Scaling group
Windows HPC Job Manager
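A minimal sketch of the sizing decision behind this slide, assuming the job manager reports how many cores are waiting in the queue and that an Auto Scaling group is then resized to the returned value. The function name, its parameters, and the one-node-per-`cores_per_node` rule are illustrative assumptions, not the logic used in the session's demo.

```python
import math


def desired_node_count(queued_cores: int, cores_per_node: int,
                       min_nodes: int, max_nodes: int) -> int:
    """Return how many compute nodes would cover the queued work.

    Hypothetical rule: one node per `cores_per_node` requested cores,
    clamped to the group's minimum and maximum size.
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    needed = math.ceil(max(queued_cores, 0) / cores_per_node)
    return max(min_nodes, min(needed, max_nodes))


# 100 queued cores on 16-core nodes need 7 nodes (6.25 rounded up).
print(desired_node_count(100, 16, min_nodes=0, max_nodes=8))
```

In practice the queue depth would be published as a CloudWatch custom metric and the result applied as the group's desired capacity; both steps are left out here so the sketch stays runnable offline.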
Amazon Elastic Block Store
Network attached persistent block storage volumes for Amazon EC2
• Designed for five nines of availability
• Attaches to Amazon EC2 within the same Availability Zone
• Point-in-time snapshots to Amazon S3
• Checkbox enabled encryption
Volume types: Magnetic, General Purpose (SSD), Provisioned IOPS (SSD)
When performance matters, use SSD-backed volumes!
Windows Boot Volume
Decrease launch time of instances by leveraging General Purpose SSD
• Default 30 GB volume
• Gets an initial I/O credit balance of 5.4 million
• Bursts for up to 30 minutes at 3,000 IOPS
• Accumulates 90 I/O credits per second
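The burst figures for the boot volume can be checked with a little arithmetic. The sketch below assumes a General Purpose (SSD) baseline of 3 IOPS per GB, which is what yields the 90 credits per second quoted for a 30 GB volume; the function name is illustrative.

```python
def gp2_burst_seconds(credits: float, burst_iops: float,
                      refill_per_second: float) -> float:
    """Seconds a volume can sustain `burst_iops` before its credits run out."""
    drain = burst_iops - refill_per_second
    if drain <= 0:
        return float("inf")  # refill keeps up, so the bucket never empties
    return credits / drain


volume_gb = 30
baseline_iops = 3 * volume_gb      # assumed 3 IOPS/GB, so 90 credits/second
initial_credits = 5_400_000
burst_iops = 3000

# Ignoring refill: 5.4M credits / 3000 IOPS = 1800 s, the 30 minutes quoted.
no_refill = gp2_burst_seconds(initial_credits, burst_iops, 0)
# Counting the 90 credits/second earned during the burst gives slightly longer.
with_refill = gp2_burst_seconds(initial_credits, burst_iops, baseline_iops)

print(no_refill / 60)               # 30.0
print(round(with_refill / 60, 1))   # 30.9
```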
Amazon Simple Storage Service
Store input and result datasets for dynamic and transient Windows HPC clusters
Redundancy
• Durability: designed for 99.999999999%
• Availability: designed for 99.9%
Capacity
• Consumption-based storage model
• Virtually unlimited capacity
Security
• Encryption in Transit: HTTPS/TLS
• Encryption at Rest: SSE, SSE-C, SSE-KMS
Ease of use
• Storage Classes: Standard, RRS, Glacier
• Lifecycle Policies: archive, expiration
Amazon S3
Migrate data to AWS and Windows HPC clusters with AWS Tools for PowerShell
• Bucket: mybucket
• Key namespace (prefix): SampleScripts
• Local folder: .\Scripts
Copy data to Amazon S3 and enable SSE
Write-S3Object -BucketName mybucket -Folder .\Scripts -KeyPrefix SampleScripts\ -ServerSideEncryption AES256
Copy data from Amazon S3 to a local folder
Read-S3Object -BucketName mybucket -KeyPrefix SampleScripts -Folder .\
AWS CloudFormation
Consistently and easily deploy Windows HPC clusters based on workflow needs
Templated resource provisioning
• Create templates to describe the AWS resources used to run your application
• Provision identical copies of a stack
Infrastructure as code
• Templates can be stored in a source control system
• Track all changes made to your infrastructure stack
• Modify and update resources in a controlled and predictable way
Declarative and flexible
• Just choose what resources and configurations you need
• Customize your template via parameters
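As a rough illustration of "customize your template via parameters", the sketch below builds a very small CloudFormation template as a Python dictionary and serializes it to JSON. It is not the reference template from the session: the logical names (`ComputeNode`, `ComputeInstanceType`, `BaseImageId`, `SubnetId`) and the default instance type are assumptions chosen for the example.

```python
import json


def build_compute_node_template() -> dict:
    """Return an illustrative one-instance template driven by parameters."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative single compute node (hypothetical example)",
        "Parameters": {
            # Hypothetical parameter names; the values are supplied at stack creation.
            "ComputeInstanceType": {"Type": "String", "Default": "c4.8xlarge"},
            "BaseImageId": {"Type": "AWS::EC2::Image::Id"},
            "SubnetId": {"Type": "AWS::EC2::Subnet::Id"},
        },
        "Resources": {
            "ComputeNode": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    # Each property refers back to a parameter rather than a literal.
                    "InstanceType": {"Ref": "ComputeInstanceType"},
                    "ImageId": {"Ref": "BaseImageId"},
                    "SubnetId": {"Ref": "SubnetId"},
                },
            }
        },
    }


template = build_compute_node_template()
template_body = json.dumps(template, indent=2, sort_keys=True)
print(template_body)
```

Because the template is plain text, the JSON output can be committed to source control and diffed like any other code, which is the point of the "infrastructure as code" bullets.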
AWS Architecture for HPC
Core Infrastructure
• Users directory
• Bastion host
Cluster Infrastructure
• Head node
• Compute nodes
[Diagram: an Amazon VPC containing the core infrastructure (users, bastion) and a cluster with one head node and eight compute nodes.]
AWS Architecture for HPC
Choose the right deployment architecture for the use case
Component | Hybrid or “burst” | All-in AWS
Core infrastructure
• Users directory | On-premises | AWS Directory Service
• Bastion host | AWS | Amazon EC2
Cluster infrastructure
• Head node | AWS | Amazon EC2
• Compute node | AWS | Amazon EC2
• Storage | On-premises/AWS | Amazon S3
User workstations | On-premises | Amazon WorkSpaces
AWS Architecture for HPC
“Burst” to virtually unlimited compute capacity in AWS
[Diagram: on-premises HPC users, workstations, core infrastructure, and a cluster head node, connected to an Amazon VPC that holds users, a bastion, core infrastructure, and a cluster with one head node and eight compute nodes.]
AWS Architecture for HPC
Deploy users, infrastructure, and cluster all in AWS
[Diagram: an Amazon VPC containing workstations, users, a bastion, core infrastructure, and a cluster with one head node and eight compute nodes.]
AWS Architecture for Windows HPC
Windows Server on AWS
• Easy Licensing: OS $/Hr, BYOL
• Optimized AWS Software for Windows: EC2Config, drivers
• Experience: October 2008, every use case, every industry
• OS Choice: 2003R2; 2008, 2008R2; 2012, 2012R2
• Microsoft Portfolio: SQL Server, SharePoint, Exchange, Lync
• Customize Systems: 50+ EC2 instances; 32, 64 bits; CPU, GPU
AWS Architecture for Windows HPC
Networking best practices for Windows HPC clusters
• Network Design – Leverage both public and private subnets, manage sizing
• Availability – Use multi-AZ design
• Access Control – Use VPC endpoint and NAT for external access
[Diagram: Availability Zones A and B; public subnets 10.0.0.0/24 and 10.0.1.0/24, each with a NAT; private subnets 10.0.10.0/24 and 10.0.11.0/24 holding the core infrastructure; a VPC endpoint.]
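To make "manage sizing" concrete, the sketch below checks the four subnet ranges from the diagram with Python's standard `ipaddress` module. The enclosing VPC range `10.0.0.0/16` and the subnet labels are assumptions (the slide shows only the subnet CIDRs); the count of five reserved addresses per subnet reflects how Amazon VPC reserves the first four addresses and the last one.

```python
import ipaddress
from itertools import combinations

AWS_RESERVED_PER_SUBNET = 5  # first four addresses and the last one

# Assumption: the VPC CIDR is not shown on the slide.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Subnet CIDRs from the diagram; the labels are hypothetical.
subnets = {
    "public-1": ipaddress.ip_network("10.0.0.0/24"),
    "public-2": ipaddress.ip_network("10.0.1.0/24"),
    "private-1": ipaddress.ip_network("10.0.10.0/24"),
    "private-2": ipaddress.ip_network("10.0.11.0/24"),
}


def usable_hosts(net: ipaddress.IPv4Network) -> int:
    """Addresses left for instances after the reserved ones are removed."""
    return net.num_addresses - AWS_RESERVED_PER_SUBNET


all_inside = all(net.subnet_of(vpc) for net in subnets.values())
any_overlap = any(a.overlaps(b) for a, b in combinations(subnets.values(), 2))
capacity = {name: usable_hosts(net) for name, net in subnets.items()}

print(all_inside, any_overlap)
print(capacity)
```

Each /24 leaves 251 addresses, which caps the number of compute nodes a single private subnet can hold; a larger cluster would need a wider range.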
AWS Architecture for Windows HPC
Core infrastructure best practices for Windows HPC clusters
• Domain Controller – Highly available extension of your existing environment
• Remote Desktop Gateway – Increase security posture
[Diagram: Availability Zones A and B with public subnets 10.0.0.0/24 and 10.0.1.0/24 and private subnets 10.0.10.0/24 and 10.0.11.0/24; two domain controllers (DC) in the core and a Remote Desktop Gateway (RDGW).]
AWS Architecture for Windows HPC
Cluster infrastructure best practices for Windows HPC clusters
• Head Node – Size independent of Compute Node, General Purpose family
• Compute Nodes – Use Auto Scaling groups and cluster instances
• S3 Bucket – Persistent, secure, available storage of cluster input and results
[Diagram: Availability Zones A and B with public subnets 10.0.0.0/24 and 10.0.1.0/24 and private subnets 10.0.10.0/24 and 10.0.11.0/24; core infrastructure, a cluster with one head node and eight compute nodes, and an S3 bucket reached through a VPC endpoint.]
AWS Architecture for Windows HPC
All at once, complete Windows HPC infrastructure on AWS
[Diagram: Availability Zones A and B with public subnets 10.0.0.0/24 and 10.0.1.0/24 and private subnets 10.0.10.0/24 and 10.0.11.0/24; two domain controllers (DC) in the core, an RDGW, two NATs, a cluster with one head node and eight compute nodes, and an S3 bucket reached through a VPC endpoint.]
AWS Architecture for Windows HPC
Launch multiple clusters, each right-sized to complete its work in the specified amount of time
[Diagram: the same two-Availability-Zone layout (public subnets 10.0.0.0/24 and 10.0.1.0/24, private subnets 10.0.10.0/24 and 10.0.11.0/24, two DCs, RDGW, two NATs, S3 bucket, VPC endpoint), now with three clusters, each with its own head node and compute nodes.]
Best Practices for Windows HPC
Secure Windows HPC Workloads on AWS
Leverage native AWS security features to enhance the security posture of Windows HPC
• AWS Resource Access: Enable access to AWS resources through policies in IAM roles
• Encryption at Rest: Enable encryption on EBS volumes and specify server-side encryption for objects in Amazon S3
• Create private access to input and output results stored in Amazon S3 via VPC endpoints
• Ensure auditability of the AWS account by enabling AWS CloudTrail
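One way to picture "policies in IAM roles" is the policy document a cluster node's role might carry to read inputs from and write results to Amazon S3. The sketch below only builds and prints such a document; the bucket name `mybucket` and prefix `SampleScripts/` are reused from the earlier PowerShell slide as placeholders, and the function name and statement IDs are hypothetical.

```python
import json


def s3_dataset_policy(bucket: str, prefix: str = "") -> dict:
    """Return an IAM policy document scoped to one bucket and key prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Sid": "ListDatasetBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reads and writes are granted only under the given prefix.
                "Sid": "ReadWriteDatasets",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


policy = s3_dataset_policy("mybucket", "SampleScripts/")
print(json.dumps(policy, indent=2))
```

Attaching a policy like this to the instance role means no access keys need to be stored on the compute nodes.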
Optimized network for Windows HPC
Get the most out of AWS networking for your HPC workloads
• Enhanced Networking: SR-IOV provides higher packets-per-second (PPS) performance, lower latencies, and very low network jitter
• Placement Groups: All instances get low-latency, full-bisection, 10 Gbps bandwidth between instances
• EBS Optimization: Get up to 4,000 Mbps of additional throughput dedicated to your storage needs
• AWS PV Drivers / Intel Drivers: Make sure you stay current with the latest versions
Optimized processing with Windows HPC
Get the most out of your instances for your HPC workloads
• Hyper-threading: Most current-generation AWS instances provide hyper-threading; keep it or deactivate it based on your needs
• Turbo Boost: The latest generation of instances lets you control the C-state and P-state registers of your processors
• The right instance: Choose your constraints (price, CPU, GPU, RAM, network) and get the instance type that fits your use case
• The right storage: Choose the amount and type of instance storage or Amazon EBS storage required, and leverage storage services such as Amazon S3
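The "choose your constraints" idea can be sketched as a simple filter over a catalog. The four instance types below appear earlier in the deck, with their published vCPU and memory sizes; treating a physical core as two vCPUs assumes hyper-threading is on, and picking the candidate with the least memory is only a stand-in for a price comparison, since no prices are given in the session.

```python
from typing import List, NamedTuple, Optional


class InstanceSpec(NamedTuple):
    name: str
    vcpus: int
    memory_gib: float


CATALOG = [
    InstanceSpec("c3.8xlarge", 32, 60),
    InstanceSpec("c4.8xlarge", 36, 60),
    InstanceSpec("r3.8xlarge", 32, 244),
    InstanceSpec("m4.10xlarge", 40, 160),
]


def physical_cores(spec: InstanceSpec) -> int:
    """Assumes two hyper-threads (vCPUs) per physical core."""
    return spec.vcpus // 2


def pick_instance(min_cores: int, min_gib_per_core: float,
                  catalog: List[InstanceSpec] = CATALOG) -> Optional[InstanceSpec]:
    """Return the smallest instance meeting both constraints, or None."""
    candidates = [
        s for s in catalog
        if physical_cores(s) >= min_cores
        and s.memory_gib / physical_cores(s) >= min_gib_per_core
    ]
    # Hypothetical tie-breaker: least memory first, then fewest vCPUs.
    return min(candidates, key=lambda s: (s.memory_gib, s.vcpus), default=None)


print(pick_instance(min_cores=16, min_gib_per_core=3))
print(pick_instance(min_cores=16, min_gib_per_core=8))
```

A job that runs one MPI rank per physical core with hyper-threading disabled would size on `physical_cores`, not on the vCPU count.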
Automated Windows HPC computing
Get your cluster as code, running in minutes from scratch
• Windows PowerShell®: You can get all the installation and configuration of the instances done automatically
• AWS Tools for Windows PowerShell: Your cluster can become aware of the infrastructure it is running on
• Auto Scaling: Automate provisioning and scaling of your cluster to have your workloads finished when you need them
• AWS CloudFormation: Deploy your clusters in a few clicks, create test clusters in minutes
Demonstration
Windows HPC AWS CloudFormation Template
Enable automated deployments of clusters with pre-built template
[Diagram: an Amazon VPC containing the core infrastructure (DC, RDGW) and a cluster with one head node and eight compute nodes.]
AWS CloudFormation Templates: Prerequisites
Things to do before starting the template
Select your region and base image
• VPC + Subnet: Just input selected CIDR
• Instance Types: for all instances
• (Optional) Placement Group: Create a VPC placement group
Prepare installation media then snapshot
• Download Microsoft HPC Pack and unzip to \HPCPack2012R2-Full
• Extract SQL Server installation to \SQLInstall
• Download Intel SR-IOV drivers and extract to \PROWinx64
• Download latest AWS PV drivers and extract to \AWSPVDriverSetup
Select installation configuration
• Define domain configuration and credentials
AWS CloudFormation Template: Core
Building the core Windows infrastructure
Base Network
• VPC + Public Subnet: Select your CIDR
• DHCP Option Set: Configured to use DC
• Security Groups: For bastion and cluster
Core Infrastructure
• Domain Controller in new forest
• Remote Desktop Bastion Host (outside of domain)
• Domain User “Join Computer to Domain” privileges
AWS CloudFormation Template: Cluster
Building the Microsoft HPC cluster on AWS
Head Node
• Multi-role: database, HPC Head node, Share
• Monitored: Amazon CloudWatch Custom metrics
Compute Nodes
• Automated: Automatic configuration to join the cluster
• Scalable: Auto Scaling group resizing the cluster based on load
• Up-to-date: auto upgrade of AWS and Intel Drivers
Windows HPC AWS CloudFormation Template
In < 30 minutes, your cluster will be ready to accept jobs.
Getting Started Collateral
QwikLAB: Launching Microsoft HPC Pack on AWS:
https://www.qwiklab.com/focuses/preview/1604?search=19103
Reference CloudFormation Template:
https://github.com/awslabs/aws-cfn-windows-hpc--template
Remember to complete your evaluations!
Thank you!