The UberCloud
From Project to Product
From HPC Experiment to HPC Marketplace
From HPC Shop to HPC Shopping Mall
Wolfgang Gentzsch President, The UberCloud
Burak Yenier CEO, The UberCloud
HPC 2014 , Cetraro, July 7 – 11, 2014
Product innovation and scientific insight require computing
Summary: UberCloud Progress
Traction: 1,500 registered orgs, 72 countries, 155 experiment teams exploring Computing as a Service
Visible: 60+ articles; 40+ trade shows; prestigious 2013 HPCwire Readers’ Choice Award
Powerful sponsors: Intel, Autodesk, Bull, IDC, ANSYS; talking to 10 more.
Powerful participants: 4 of the Top 5 CAE ISVs (80+ ISVs in total); 100+ sw/hw providers; hundreds of end-users; 600+ renowned experts
Compendium I & II: 25 + 17 best case studies from Rounds 1 – 5 of the HPC Experiment, sponsored by Intel, over 1,000 downloads
Hired Linda Treiman (from Bright) to take care of our providers and sponsors
Container technology and run time environment
UberCloud Marketplace and AppStore
3 options to use technical compute power
Engineers' & scientists' computing tools: workstations, servers, and clouds
Benefits of HPC in the Cloud
Continue using your workstation for your daily design, and use Cloud resources with additional benefits:
An HPC system at your fingertips, on demand
Pay per use (no CAPital EXpenditure)
Scaling resources up and down (business flexibility)
Low risk by working with multiple cloud providers.
The challenges
Workstation: slow, limited capacity
HPC server: expensive (TCO!), complex
HPC in the Cloud: security, licensing, data transfer, expertise, and …
Very crowded cloud services market, difficult to find your ideal service
It all started in June 2012 with the free, voluntary UberCloud Experiments
HPC as a Service, on demand, in a team experiment
For SMBs and their engineering applications
to explore the end-to-end process
of using remote computing resources,
as a service, on demand, at your fingertips,
and learning how to resolve the roadblocks.
How does the Experiment work?
End-User registers
Software Vendor joins
We select a Team Expert
Matching a Resource Provider
152 UberCloud Experiments so far
42 case studies in Compendium I & II
Assigning an UberCloud mentor
Now, the team is ready to go
Finally, writing the Case Study
22 Steps Towards a Successful Project
Step 1: Define the end-user project
1.1: TE & EU fill out "Project definition" docu
1.2: UC assigns SP based on "Project definition" docu
1.3: UC + TM assign RP based on "Project definition" docu
1.4: TE calls for a kick-off meeting over Skype via Doodle
1.5: RP fills out "Computing resources" docu
1.6: SP fills out "Software resources" docu
1.7: If custom code, EU fills out "Software resources" docu
1.8: TE + TM review UC Exhibit, consider additional services
EU = end user, SP = software provider, RP = resource provider, TE = team expert, TM = team mentor, UC = UberCloud
22 Steps Towards a Successful Project
Steps 2 & 3: Resources & execution
Step 2: Contact the resources, set up the project environment
2.1: TE gets resources using "Computing resources" docu
2.2: TE & RP set up software using "Software resources" docu
2.3: TE & RP set up EU code using "Software resources" docu
2.4: TE & RP configure project environment
2.5: TE performs a trial run
Step 3: Initiate project execution on cloud resources
3.1: TE & EU upload data to the project environment
3.2: TE & RP queue the job(s) for the project
22 Steps Towards a Successful Project
Steps 4-6: Monitor, review, report
Step 4: Monitor the project
4.1: TE monitors the job status
4.2: TE & EU re-set parameters between runs as needed
4.3: TE & RP perform post-processing, such as remote viz
Step 5: Review your results
5.1: TE makes results available to EU and, if needed, repeats Steps 2-5
5.2: TE & RP remove EU data from project environment
Step 6: Document your findings
6.1: TE initiates docu "Template for UC Experiment Use Cases"
6.2: TE requests team to contribute to and review the docu
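The 22 sub-steps and the roles acting in each can be written down as a small lookup table, which makes it easy to see how much of the process rests on the team expert. The following is only an illustrative sketch: the role assignments are transcribed from the slides (the acting party of each sub-step), while the names `ROLES`, `SUBSTEPS`, and `substeps_for` are hypothetical and not part of any UberCloud tooling.

```python
# Role abbreviations, taken from the legend on the slides.
ROLES = {
    "EU": "end user",
    "SP": "software provider",
    "RP": "resource provider",
    "TE": "team expert",
    "TM": "team mentor",
    "UC": "UberCloud",
}

# (sub-step id, acting roles), transcribed from the slides.
# Sub-step 1.7 only applies if the end user brings custom code.
SUBSTEPS = [
    ("1.1", ("TE", "EU")), ("1.2", ("UC",)), ("1.3", ("UC", "TM")),
    ("1.4", ("TE",)), ("1.5", ("RP",)), ("1.6", ("SP",)),
    ("1.7", ("EU",)), ("1.8", ("TE", "TM")),
    ("2.1", ("TE",)), ("2.2", ("TE", "RP")), ("2.3", ("TE", "RP")),
    ("2.4", ("TE", "RP")), ("2.5", ("TE",)),
    ("3.1", ("TE", "EU")), ("3.2", ("TE", "RP")),
    ("4.1", ("TE",)), ("4.2", ("TE", "EU")), ("4.3", ("TE", "RP")),
    ("5.1", ("TE",)), ("5.2", ("TE", "RP")),
    ("6.1", ("TE",)), ("6.2", ("TE",)),
]


def substeps_for(role):
    """Return the ids of all sub-steps in which the given role acts."""
    if role not in ROLES:
        raise KeyError(f"unknown role: {role}")
    return [step_id for step_id, roles in SUBSTEPS if role in roles]


if __name__ == "__main__":
    print(len(SUBSTEPS), "sub-steps in total")
    for role, name in ROLES.items():
        print(f"{role} ({name}): {len(substeps_for(role))} sub-steps")
```

Under this reading of the slides, the team expert acts in 17 of the 22 sub-steps, while the end user acts in 4.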
Step by Step process
Basecamp project management platform for each team
The UberCloud HPC Experiments Started July 2012, 1500 participants, 72 countries
Example: Amazon AWS in the UberCloud:
Team 2: Simulation of a Multi-resonant Antenna System
Team 20: Turbo-machinery Application Benchmarks
Team 30: Heat Transfer Use Case
Team 40: Simulation of Spatial Hearing
Team 65: Weather Research with WRF
Team 70: Next Generation Sequencing Data Analysis
Team 116: Quantitative Finance Historical Data Modeling
Team 142: Virtual Testing of Severe Service Control Valve
Team 147: Compressor Map Generation Using Cloud-Based CFD
The UberCloud HPC Experiments Started July 2012, 1500 participants, 72 countries
Example: Bull extreme factory in the UberCloud:
Team 5: 2-phase Flow Simulation of a Separation Column
Team 8: Flash Dryer with Hot Gas to Evaporate Water from a Solid
Team 32: 2-phase Flow Simulation of a Separation Column
Team 52: Simulations of Blow-off in Combustion Systems
Team 85: Combustion Simulations of Power Plant Equipment
Team 89: Simulations of Enzyme-Substrate Reactions
Team 120: Simulation of Water Flow Around a Self-propelled Ship
© 2013 ANSYS, Inc. July 16, 2014
Some Lessons Learned - UberCloud HPC Experiment
Team 8: Flash Dryer Simulation (ANSYS Fluent)
Simulation throughput criterion was met
‼ Remote visualization solution required
‼ Time for downloading results
‼ IP concern
Team 9: Irrigation Simulation (ANSYS CFX)
Timely, high fidelity results were obtained
‼ Windows preferred over Linux
‼ HPC workshop services for SMEs requested
Team 34: Wind Turbine Simulation (ANSYS Fluent)
Ability to conduct parametric simulations
‼ Sufficient number of licenses needed
‼ Remote visualization solution required
‼ Disappointing hardware performance results
Source: The UberCloud HPC Experiment: Compendium of Case Studies
Some Lessons Learned - UberCloud HPC Experiment
Team 36: IC-Engine Simulation (ANSYS Fluent)
Smooth setup of environment and sw
‼ Appropriate cloud licensing required
‼ Network bandwidth not good for graphics
‼ Customized sw needs to be recompiled
Team 54: Pool Plant Simulation (ANSYS CFX)
Ability to easily burst into the Cloud
Accelerated file transfer and 3D graphics
‼ Cost of the commercial CFD licenses
Team 56: Axial Fan Simulation (ANSYS Fluent)
Ease of use
Good remote visualization
‼ File uploading time
‼ Stress test with multiple users required
Source: The UberCloud HPC Experiment: Compendium of Case Studies
Team 1: Heavy Duty ABAQUS Structural Analysis in the Cloud
The Team:
Frank Ding is the Engineering Analysis and Computing Manager at Simpson Strong-Tie in Northern California. The end user problem space…..
Matt Dunbar is now the Chief Architect and CAE technical specialist at Simulia Dassault Systems, in Rhode Island on the East Coast. He represents the application-level expertise in this experiment.
Steve Hebert is one of the founders and CEO of Nimbix, located in Texas, which in this team is the provider of cloud-based HPC infrastructure and application hosting.
Rob Sherrard is the other co-founder of Nimbix and VP of Service Delivery.
Sharan Kalwani, HPC Segment Architect with Intel Corporation, located in Michigan (Midwest), is the overall Subject Matter Expert in this project.
Team 1: The problem to be solved
The Use Case: ABAQUS/Explicit and ABAQUS/Standard are the major applications
HPC cluster at Simpson Strong-Tie is modest, 32 cores of Intel x86-based gear.
Cloud bursting is critical.
Also challenging is the issue of sudden large data transfers
Need to perform visualization to ensure the design simulation is proceeding correctly
Workflow
Pre-processing happens on end user’s workstation to prepare the CAE model
Files transferred to HPC cloud data staging area using a secured FTP process
Job is submitted through the Nimbix.net web portal
Result files can be transferred back for post-processing,
or the post-processing can be done using a remote desktop tool such as HP RGS on the HPC provider’s visualization node.
Team 1: Challenges!
A weekly schedule – was not the first challenge!
Needed a fast interconnect (e.g. InfiniBand), which was not available.
Solved with “fat” nodes. As this cluster is a sandbox for testing the cloud workflow, the actual interconnect performance of this 12-core cluster was not a concern.
The second challenge was to address the need for simple and secure file storage and transfer. This was accomplished very quickly using GLOBUS technology. These days cloud-based storage is mature and ready for prime-time HPC, especially in the CAE arena.
The third challenge was to push the limits and stream several jobs simultaneously to the remote HPC cloud resource. To the whole team’s surprise it worked admirably and had no overall impact whatsoever. This provided solid evidence that “bursting” was indeed feasible.
The fourth and final challenge was perhaps the most critical: the end user's perception and acceptance of the cloud as a smooth part of the workflow.
Remote visualization was necessary to see if the simulation results (left remotely in the cloud)
Team 1: What the end user saw…..
With the right tuning, useful remote visualization!
Team 1: What did we learn?
Benefits:
Clearly established - HPC cloud model can indeed be made to work.
Recommendations:
A few key necessary factors emerged:
Result file transfers: most CAE result files easily exceed several gigabytes, so a minimum of 2-4 MB/sec of sustained, delivered bandwidth is necessary
The same applies to remote visualization; in this case, 4 MB/sec is the threshold. Latency is also a key concern.
Beyond the Cloud service provider, a network-savvy ISP is perhaps a necessary part of the infrastructure team in order to deliver a robust, production-like HPC cloud
Remote visualization provides a convenient collaboration platform for a CAE analyst to access the analysis results anywhere they are needed, but it requires a secure “behind the firewall” remote workspace
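To make the bandwidth recommendation concrete, the arithmetic can be sketched as below. The 5 GB result file is an assumed example size (the slides only say "several gigabytes"), decimal units (1 GB = 1000 MB) are assumed, and `transfer_minutes` is a hypothetical helper name.

```python
def transfer_minutes(size_gb, rate_mb_per_s):
    """Minutes needed to move size_gb gigabytes at a sustained rate in MB/sec.

    Assumes decimal units (1 GB = 1000 MB) and ignores latency and
    protocol overhead, so real transfers will take somewhat longer.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000 / rate_mb_per_s / 60


if __name__ == "__main__":
    size_gb = 5  # assumed example size, not a figure from the case study
    for rate in (2, 4):  # the 2-4 MB/sec range recommended by Team 1
        minutes = transfer_minutes(size_gb, rate)
        print(f"{size_gb} GB at {rate} MB/sec: about {minutes:.0f} minutes")
```

Under these assumptions a 5 GB result file takes roughly 42 minutes at 2 MB/sec and roughly 21 minutes at 4 MB/sec, which illustrates why slower links quickly become impractical for result files of this size.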
Team 2: Simulating new probe design for a medical device
HPC Expert:
End User: wanted to stay anonymous
Credits from:
Team 70 Case Study: Next Generation Sequencing Data Analysis
MEET TEAM 70:
End User - Thomas Dyar, Senior Genomics Data Scientist, Betty Diegel, Senior Software Engineer, medical devices company
Software Provider - Brian O'Connor, CEO Nimbus Inform.. Cloud services for workflows utilizing SeqWare
Resource Provider - Amazon Web Services
HPC Cloud Experts - Cycle Computing
Team 142 Case Study: Virtual testing of severe service control valve
MEET TEAM 142:
End User – Mark Lobo, Lobo Engineering;
Software Provider – Derrek Cooper, Autodesk CFD 360
Resource Provider - Amazon Web Services
HPC Cloud Experts – Jon den Hartog and Heath Houghton Autodesk
Challenges with the experiments
HPC is complex; at times it requires multiple experts
Reaching out to industry end-users
No standards: access and usage of hw & sw providers are different, some are complex
Lack of automation: Currently the end-to-end process of the HPC experiment is manual (intentionally).
Time delays: vacation, conferences, and everybody has a day job (busy!)
Barriers: Complexity, data transfer, security, IP, software licenses, performance, interoperability…
AND: we learn a lot . . . .
Bumps on the road
Time delays: Vacation times in July/August and December
No standards: Access and usage processes of hw & sw providers are different, some complex
Hands-on: Process automation at providers varies greatly.
Lack of automation: Currently the end-to-end process of the HPC experiment is manual (intentionally).
Participants spend a relatively small portion of their time; some are responsive, others are not: it is not their day job!
Getting regular updates from Team Experts is a challenge because this is not their day job!
Building a marketplace demands building an ecosystem
[Timeline figure, 06/12 (HPC Cetraro) through 06/14. Milestones shown: Start (rough idea), Experiment, Community, Exhibition, MarCom, Technology, β Product, App store, UC Marketplace. Annotation: workflow impact]
Problem: today’s crowded and ineffective cloud ‘market’
Supply: Cloud providers, ISVs, Consultants, Trainers
Demand: Engineers, Scientists, Data analysts, Experts
Roadblocks: Complexity, Data Transfer, Security, Licensing, Uncertain Cost
Solution: The UberCloud Marketplace
Supply: Cloud providers, ISVs, Consultants, Trainers, …
Demand: Engineers, Scientists, Data analysts, Experts
UberCloud Marketplace
Solution: The UberCloud Marketplace
UberCloud Marketplace
for 20+ million engineers and scientists
and their service providers
to discover, try, buy, and sell
computing time, storage, software and expertise on demand
Announcement at HPC Cetraro
Technology solution:
Standard Cloud run-time environment
Building a thin, lightweight run-time environment (RTE) on top of Linux kernel features and open source tools, which
provides a standard platform across distributed in-house, grid, and cloud resources
facilitates access to all kinds of resources (workstations, servers, and private, hybrid, and public clouds)
moves portable, stackable units (including the end user's app, data, and tools) seamlessly between in-house and external resources
enables portability across different in-house and external resources (federation)
reduces or removes many of the cloud challenges
Builder: Stackable units with tools (e.g. encryption) and ISV application codes (e.g. OpenFOAM). Just add your own codes and data.
Launcher: Run anywhere with the UberCloud Run Time. Scale the compute power up or down as needed.
Controller: Collect granular usage data and logs. Monitor, alert, report.
Unit contents: ISV, Data, Tools
Run Time on: Any Workstation, Any Cluster, Any Cloud
Build once, run anywhere
Portable Units are like containers
Standard software units (with the user’s app, data, tools, etc.) can be moved seamlessly across any set of resources. Units are
stackable and portable,
built from a base unit with standard functionality (security, encryption, compression, monitoring, data transfer, etc.),
extended by the ISV’s software as the next layer,
topped by the end user's configuration and data.
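The layering idea can be illustrated with a toy model in which each layer is a dictionary of settings and later layers are stacked on top of earlier ones. This is only a conceptual sketch and not the UberCloud run-time: the function `stack_unit`, all keys, and the rule that an upper layer overrides a lower one are assumptions made for illustration.

```python
def stack_unit(*layers):
    """Stack layers bottom-up; a setting in a later layer wins (assumed rule)."""
    unit = {}
    for layer in layers:
        unit.update(layer)
    return unit


# Base unit: standard functionality named on the slide (keys are hypothetical).
base_layer = {"encryption": True, "compression": True, "monitoring": True}

# ISV layer: the application code, e.g. OpenFOAM as mentioned on the slides.
isv_layer = {"application": "OpenFOAM"}

# Top layer: the end user's configuration and data (hypothetical values).
user_layer = {"input_data": "case_files/", "compression": False}

unit = stack_unit(base_layer, isv_layer, user_layer)

if __name__ == "__main__":
    for key in sorted(unit):
        print(f"{key}: {unit[key]}")
```

In this sketch the same `unit` description could be handed unchanged to any resource that understands it, which is the portability argument the slide makes; the base and ISV layers stay reusable while only the top layer differs per end user.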
Next Steps: Reducing / Removing Cloud Challenges
Challenge *) | Addressed today | With UberCloud **)
Portability | low | high
Security | medium | high
Software Licenses | low | medium
Data Transfer | low | medium
Compliance | low | medium
Standardization | low | high
Cost & ROI Transparency | low | high
Resource Availability | medium | high
Transparency of Market | low | high
Cloud Computing Expertise | low | medium
*) Cloud challenges are addressed low, medium, or high
**) When UberCloud is fully developed one year from now
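The table can also be read programmatically to see where the largest projected gains lie. The sketch below transcribes the ratings from the slide; the numeric scale (low = 0, medium = 1, high = 2) and the names `LEVELS`, `CHALLENGES`, and `improvement` are assumptions introduced only for this illustration.

```python
# Assumed ordinal scale for the three ratings used on the slide.
LEVELS = {"low": 0, "medium": 1, "high": 2}

# challenge: (addressed today, projected with UberCloud), as on the slide.
CHALLENGES = {
    "Portability": ("low", "high"),
    "Security": ("medium", "high"),
    "Software Licenses": ("low", "medium"),
    "Data Transfer": ("low", "medium"),
    "Compliance": ("low", "medium"),
    "Standardization": ("low", "high"),
    "Cost & ROI Transparency": ("low", "high"),
    "Resource Availability": ("medium", "high"),
    "Transparency of Market": ("low", "high"),
    "Cloud Computing Expertise": ("low", "medium"),
}


def improvement(challenge):
    """Number of rating levels gained between today and the projection."""
    today, projected = CHALLENGES[challenge]
    return LEVELS[projected] - LEVELS[today]


# Challenges projected to jump two levels, from low to high.
two_level_gains = sorted(c for c in CHALLENGES if improvement(c) == 2)

if __name__ == "__main__":
    for name in two_level_gains:
        print(name)
```

On this scale, four challenges are projected to move from low to high, and every challenge in the table improves by at least one level.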
It’s your turn now
Download 2013 Compendium of case studies from HPCwire
Download 2014 Compendium of case studies
Register at TheUberCloud.com
Try the UberCloud Marketplace with $1 voucher and you get
The UberCloud Community and Marketplace
Thank You!
Register free at
http://www.TheUberCloud.com