COSMIC: Middleware for Xeon Phi Servers and Clusters

S Cadambi, G Coviello, C Li, K Rao, M Sankaradas, S Chakradhar
Computing Systems Architecture, NEC Laboratories America, Princeton, NJ (www.nec-labs.com)
January 2014
Pre-commercialization, name subject to change

description

Now in beta, COSMIC is NEC’s system software that enables seamless sharing of Xeon Phi coprocessors. It is completely transparent to applications and to all other system software components. COSMIC is useful in organizations where several users share one or more Xeon Phi-based servers, and it can reduce capital costs by making efficient use of fewer servers. Learn more: http://www.nec-labs.com/research/system/systems_arch-website/cosmic.php Watch the video presentation: http://wp.me/p3RLHQ-bbE

Transcript of COSMIC: Middleware for Xeon Phi Servers and Clusters

1. COSMIC: Middleware for Xeon Phi Servers and Clusters. Pre-commercialization, name subject to change. S Cadambi, G Coviello, C Li, K Rao, M Sankaradas, S Chakradhar. Computing Systems Architecture, NEC Laboratories America, Princeton, NJ. January 2014. www.nec-labs.com

2. The Xeon Phi Coprocessor (MIC). Launched by Intel at ISC 2012. An x86-based coprocessor attached to a multicore host over PCIe: 60+ cores, 240+ threads, 512-bit vector units, 8+ GB memory (7120P). Supports OpenMP. Runs Linux, which allows multi-processing and memory management. Good for scientific applications.

3. Xeon Phi Servers and Clusters. Fast ramp-up: many hardware vendors, and many clusters already commissioned. NEC also offers a Xeon Phi server, the Express5800/HR120b-1, in a 1U form factor with 2 Xeon Phi coprocessors. Some very high-performance clusters too! Top500 #1: Tianhe-2. Top500 #7: Stampede.

4. Managing Xeon Phi Clusters. Most clusters follow an exclusive allocation policy for the Xeon Phi: one Phi is dedicated to a single user until that user's job completes. [Diagram: a 4-node cluster, each node a host with one Xeon Phi (60 cores, 8 GB). Active users Amy and Charlie (one needs 3 Xeon Phis, the other needs 1) occupy all four; Bob needs 1 Xeon Phi and has to wait for one to become available.]

5. Why the Conservative Policy? It avoids resource oversubscription.

6. What is Resource Oversubscription? Say Amy and Bob each want to run a program that uses a single Xeon Phi intermittently (the coprocessor offload model). Do they each need a device, or can they share one? [Diagram: Amy's and Bob's programs, each running from Begin to End and alternating between portions on the host processor and portions on the Xeon Phi coprocessor; can they share one Xeon Phi?]

7. What is Resource Oversubscription? The first problem with sharing a Phi: together, the programs oversubscribe its hardware threads. This can cause a 2-3x slowdown! [Same diagram as slide 6.]

8. What is Resource Oversubscription? The second problem with sharing a Phi: the programs can oversubscribe physical device memory. This causes random crashes. [Same diagram as slide 6.]

9. Why the Conservative Policy? It avoids resource oversubscription. Safe: no crashes. Easier management. BUT...

10. Downsides of the Conservative Policy. Poorly utilized Xeon Phi coprocessors: dynamic utilization averages only around 40%! Only 40% of cores are doing useful work on average, because of intermittent use and the conservative scheduling policy.

11. Downsides of the Conservative Policy. The cluster has to be larger than necessary. THIS CAN GET EXPENSIVE: capital cost, power, maintenance, administration.

12. Downsides of the Conservative Policy. Long wait times if all Xeon Phis are busy. Annoyed users, who have to wait even if their jobs are short. Running jobs cannot be pre-empted, so even though the Phis may be underutilized or only intermittently used, users must wait. [Diagram: a Xeon Phi cluster in which running programs have occupied all the Xeon Phis.]

13. COSMIC. Middleware that allows safe Xeon Phi sharing. It transparently discovers resource requirements and schedules jobs to share the Xeon Phis as fully as possible. [Diagram: software stack across the host processor and the Xeon Phi coprocessor. Applications run in user space; COSMIC sits beneath them, invisible to both applications and the kernel; the kernel layer is Linux with MPSS (modified Linux + drivers).]

14. COSMIC lets users share the Phi. Instead of making Amy and Bob wait for each other, COSMIC co-runs their programs by interspersing their host and Phi portions. Device sharing: users don't wait, and utilization improves. [Diagram: Amy's and Bob's programs, each alternating host and Xeon Phi portions from Begin to End, interleaved on one Xeon Phi.]

15. COSMIC also resolves conflicting user directives. Without COSMIC, user-specified core affinity may conflict during sharing: User 1's and User 2's Xeon Phi portions may request the same Xeon Phi cores. With COSMIC, the conflicts are resolved transparently and the Phi load is spread across cores.

16. Utilization: 1-device server. [Chart: average utilization (%) over time. With COSMIC (black), average utilization is 70.6%; without COSMIC (blue), it is 41.7%.]

17. Performance: 2-device server, 64 randomly arriving jobs. Average latency: 1099 s without COSMIC, 119 s with COSMIC. Makespan: 3144 s without COSMIC, 1238 s with COSMIC. Average core utilization: 19.9% without COSMIC, 56.9% with COSMIC. Major improvements through device sharing and load balancing.

18. COSMIC Demo.

19. Easy to Use on Clusters. Easy to interface with third-party software. An optional COSMIC cluster component gives even better utilization. Up to 50% footprint reduction by Phi sharing! [Diagram: third-party cluster management software and the optional COSMIC cluster component managing four nodes, each a COSMIC host with one Xeon Phi (60 cores, 8 GB).]

20. COSMIC Summary. We are ready to engage with beta customers. Do you manage Xeon Phi servers or clusters? Do you use off-the-shelf cluster management software with exclusive allocation policies? If so, you will likely benefit from COSMIC. It improves Xeon Phi utilization by sharing, is transparent to users and to the underlying system software, and is easy to add on to third-party cluster tools.

21. How to Get More Info. Contact us at NEC Japan (Y Hirotani, [email protected]) or NEC Labs America (S Cadambi, [email protected]). We make onsite presentations and demos. If you are interested in evaluating COSMIC, just ask us. See our demo online: http://www.nec-labs.com/research/system/systems_arch-website/cosmic.php
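Slides 7, 8, 13 and 15 name three things a sharing layer has to guard against when offloads from different users co-run on one Xeon Phi: too many hardware threads, too much device memory, and conflicting core affinity. The Python sketch below is a hypothetical toy model of one way to enforce all three on a single coprocessor; it is not COSMIC's algorithm, which the slides do not describe. The names Offload, Device, submit and finish are invented for illustration, and the model assumes that each offload declares its core count and memory need up front (COSMIC is said to discover these transparently), that cores are handed out whole, and that waiting offloads start in arrival order.

```python
from collections import deque
from dataclasses import dataclass, field

# Device size taken from the slides: 60 cores (x 4 threads = 240 hardware threads), 8 GB.
CORES = 60
MEM_MB = 8 * 1024


@dataclass
class Offload:
    """Hypothetical model of one Xeon Phi portion of a host program."""
    owner: str   # assumed unique among running and waiting offloads
    cores: int   # whole cores requested (assumption: declared up front)
    mem_mb: int  # device memory the offload will allocate (assumption: declared up front)


@dataclass
class Device:
    """Toy admission control for one coprocessor shared by several users."""
    free_mem_mb: int = MEM_MB
    free_cores: list = field(default_factory=lambda: list(range(CORES)))
    running: dict = field(default_factory=dict)   # owner -> (Offload, assigned core ids)
    waiting: deque = field(default_factory=deque)

    def _fits(self, job):
        # True only if starting the job oversubscribes neither memory (crashes)
        # nor cores (slowdown).
        return job.mem_mb <= self.free_mem_mb and job.cores <= len(self.free_cores)

    def _start(self, job):
        # Assign disjoint core ids, ignoring any user-requested affinity.
        ids = self.free_cores[:job.cores]
        del self.free_cores[:job.cores]
        self.free_mem_mb -= job.mem_mb
        self.running[job.owner] = (job, ids)

    def submit(self, job):
        """Start the offload now if it fits and nobody is queued ahead; else queue it."""
        if not self.waiting and self._fits(job):
            self._start(job)
            return True
        self.waiting.append(job)
        return False

    def finish(self, owner):
        """Release an offload's cores and memory, then start queued offloads in order."""
        job, ids = self.running.pop(owner)
        self.free_cores = sorted(self.free_cores + ids)
        self.free_mem_mb += job.mem_mb
        while self.waiting and self._fits(self.waiting[0]):
            self._start(self.waiting.popleft())


if __name__ == "__main__":
    phi = Device()
    print(phi.submit(Offload("amy", cores=30, mem_mb=3000)))      # True
    print(phi.submit(Offload("bob", cores=30, mem_mb=3000)))      # True
    print(phi.submit(Offload("charlie", cores=10, mem_mb=4000)))  # False: queued
    phi.finish("amy")             # frees cores 0-29, so charlie can start
    print(sorted(phi.running))    # ['bob', 'charlie']
```

Handing out disjoint core ranges is what keeps two users' offloads off the same cores, in the spirit of slide 15. Strict arrival order keeps the sketch short, although a real scheduler might let a small offload that fits start ahead of a large one at the head of the queue.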