Post on 03-Jan-2021
Performance & Energy Optimization @
Md Abdullah Shahneous Bari, Abid M. Malik, Millad Ghane, Ahmad Qawasmeh, Barbara M. Chapman
11/28/15
Layout of the talk
Ø Overview
Ø Motivation
Ø Factors that affect performance & energy optimization
Ø Experimental results
Ø Conclusion & future work
OpenMP
Ø De-facto standard for shared-memory parallel programming
Ø Thread-based parallelism
Ø Mainly two kinds of parallelism:
  Ø Regular parallelism (work-sharing constructs)
  Ø Irregular parallelism (task-based constructs)
Main Barrier Towards Exascale Computing…
Ø Power, power, and power
Ø 20 MW power limit for exascale machines (DOE)
Ø Usually a processor-vendor concern
Ø But to reach the exascale limit, the software stack has to chip in
Ø Any solution????
Power-Constrained Computing (Overprovisioning)
Ø Usually, not all applications use the maximum node power all the time
Ø Capping the power at a lower limit
Ø Allows extra nodes to be added within the same power budget
Extra node → extra compute power
Power-Constrained Computing (Contd.)
Ø More focus on overall system-level performance
Ø Some related work:
  Ø Sarood et al. [1]
  Ø Patki et al. [2]
  Ø Rountree et al. [3]
1. Sarood, Osman, et al. "Optimizing power allocation to CPU and memory subsystems in overprovisioned HPC systems." Cluster Computing (CLUSTER), 2013 IEEE International Conference on. IEEE, 2013.
2. Patki, Tapasya, et al. "Exploring hardware overprovisioning in power-constrained, high performance computing." Proceedings of the 27th International ACM Conference on Supercomputing. ACM, 2013.
3. Rountree, Barry, et al. "Beyond DVFS: A first look at performance under a hardware-enforced power bound." Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International. IEEE, 2012.
Why OpenMP???
Ø Current issue: less focus on per-node performance
Ø Challenge: to reach peak throughput, per-node performance must be improved
Ø OpenMP is the most popular language of choice for intra-node parallelism
Factors That Impact Work-Sharing Parallelism…
Ø How many workers are working? ~ Threads
Ø How is the work scheduled? ~ Scheduling policy
Ø How much work are they given at one time? ~ Chunk size
Ø How is the data laid out for the workers? ~ Thread affinity
Ø What do the workers do during their break? ~ Wait policy
Experimental Details
Ø Selected parameters:
  Ø No. of threads (2, 4, 8, 16, 24, 32)
  Ø Scheduling policy (STATIC, DYNAMIC, GUIDED)
  Ø Chunk size (1, 8, 32, 64, 128, 256, 512)
  Ø Wait policy (active, passive)
  Ø Thread affinity (OMP_PLACES + OMP_PROC_BIND)
Ø Power-cap levels: (55, 70, 85, 100, 115) W
Ø Used technology:
  Ø Intel RAPL (for power capping & energy measurement)
  Ø OMPT (for kernel-level measurement)
Ø Benchmark ~ NPB
[Figure: Performance improvement (%) using the best configuration compared to default, across all NPB kernels (CG, EP, FT, IS, LU, MG, SP, UA); series: 55W, 70W, 85W, 100W, 115W power caps]
[Figure: Energy consumption improvement (%) using the best configuration compared to default, across all NPB kernels; series: 55W, 70W, 85W, 100W, 115W power caps]
[Figure: Execution time (sec) comparison among different configurations for an LU kernel, across power-cap levels 55W–115W; series: best configuration, default configuration, default configuration without power cap. Bars are labeled with the (threads, scheduling policy, chunk size) of each configuration, e.g. (24, GUIDED, 8) at low caps and (32, STATIC, 1) at higher caps]
OpenMP ICVs on DRAM Power
Ø Developing a model for power consumption of OpenMP applications
[Figure: DRAM power (W) vs. data size, three panels. Data size X means the array size for the STREAM benchmark is 19,200,000*X; these results are based on the STREAM benchmark. Courtesy: Millad Ghane]
[Figure: Impact of threads & scheduling policy in task-based parallelism, on the UTS and FloorPlan benchmarks. Courtesy: Ahmad Qawasmeh]
Ongoing Work
Ø Dynamic adaptation (APEX)
  Ø Active Harmony
Ø Modeling
  Ø Across different software stacks (OpenMP runtimes):
    Ø OpenUH
    Ø GCC
    Ø Intel
  Ø Across different hardware architectures:
    Ø Intel Sandy Bridge
    Ø IBM POWER8
Future Work
Ø More concrete configuration selection
Ø DRAM capping
Ø Fine-grain (core-level) control
Ø Other energy-efficient techniques:
  Ø DVFS, frequency modulation, etc.
Ø Combining it with an inter-node (MPI) programming model for hybrid applications
Summary
Ø Overview
Ø Motivation
Ø Factors that affect performance & energy optimization
Ø Experimental results
Ø Conclusion & future work