A Component Infrastructure for Performance and Power Modeling of Parallel Scientific Applications
Boyana Norris (Argonne National Laboratory)
Van Bui, Lois Curfman McInnes, Li Li (Argonne National Laboratory)
Oscar Hernandez, Barbara Chapman (University of Houston)
Kevin Huck (University of Oregon)
Outline
Motivation
Performance/Power Models
Component Infrastructure
Experiments
Conclusions and Future Work
Acknowledgments
CBHPC, Karlsruhe, Germany, October 17, 2008
Component-Based Software Engineering
Functional unit with well-defined interfaces and dependencies
Components interact through ports
Benefits: software reuse, complex software management, code generation, available “services”
Drawback: more restrictive software engineering, need for a runtime framework
Motivation
CBSE is increasing in HPC
Power is increasing in importance
A need for simpler processes for performance/power measurement and analysis
― Performance tools can be applied at the component abstraction layer
― Opportunities for automation
Power vs. Energy
Power: the rate at which a system performs work
Power = Work / ΔTime
Energy: the total work performed over a period of time
Energy = Power * ΔTime
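As a small illustration of these two definitions, the sketch below integrates sampled power readings over time to obtain energy, then divides by the elapsed time to recover average power. The function names and sample values are hypothetical and are not measurements from this work.

```python
def energy_joules(times_s, power_w):
    """Integrate power samples (watts) over time (seconds) with the trapezoid rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def average_power_watts(times_s, power_w):
    """Average power is energy divided by elapsed time."""
    return energy_joules(times_s, power_w) / (times_s[-1] - times_s[0])


# Hypothetical samples: a constant 100 W draw observed for 10 s.
times = [0.0, 5.0, 10.0]
power = [100.0, 100.0, 100.0]
print(energy_joules(times, power))        # 1000.0 joules
print(average_power_watts(times, power))  # 100.0 watts
```

The trapezoid rule is only one reasonable choice here; with evenly spaced samples from a power meter, a simple sum of power times the sampling interval would give nearly the same result.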
Power Trends
Cameron, K. W., Ge, R., and Feng, X. 2005. High-Performance, Power-Aware Distributed Computing for Scientific Applications. Computer 38, 11 (Nov. 2005), 40-47.
Power Reduction Techniques
Circuit and logic level
Low-power interconnect
Low-power memories and memory hierarchy
Low-power processor architecture adaptations
Dynamic voltage scaling
Resource hibernation
Compiler-level power management
Application-level power management
Goals and Approach
Provide a component-based system
― Facilitates performance/power measurement and analysis
― Computes high-level performance metrics
― Integrates existing tools into a uniform interface
― End goal: static and dynamic optimizations based on offline/online analyses
System Diagram
[Figure: system diagram spanning an analysis infrastructure and a control infrastructure. Labeled elements: interactive analysis and model building; substitution assertion database; instrumented component application runs; control system (parameter changes and component substitution); CQoS-enabled component application; components A, B, and C; substitution set; machine learning; performance/power databases (persistent and runtime).]
Performance Model I
Metric                   Formula
Global Stalls            stall_cycles / total_cycles
% FLP Stalls             FLP_stalls / stall_cycles
FLP Inefficiency – PD    FLP_OPS * (stall_cycles / total_cycles)
FLP Inefficiency – PI    (FLP_OPS / retired_inst) * (stall_cycles / total_cycles)

FLP Inefficiency – PD: problem-size-dependent variant
FLP Inefficiency – PI: problem-size-independent variant
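The four metrics can be computed directly from raw counter totals. The sketch below is a minimal illustration, assuming that the stall-to-cycle ratio in the two inefficiency rows is the same global stall ratio (stall_cycles / total_cycles) used in the first row. The dictionary keys and counter values are hypothetical, not readings from the experiments.

```python
def model_one_metrics(c):
    """Compute the four Performance Model I metrics from raw counter totals."""
    # Assumption: the inefficiency metrics reuse the global stall ratio.
    stall_fraction = c["stall_cycles"] / c["total_cycles"]
    return {
        "global_stalls": stall_fraction,
        "flp_stall_fraction": c["flp_stalls"] / c["stall_cycles"],
        # Problem-size-dependent: scales with the raw floating-point op count.
        "flp_inefficiency_pd": c["flp_ops"] * stall_fraction,
        # Problem-size-independent: normalizes by retired instructions.
        "flp_inefficiency_pi": (c["flp_ops"] / c["retired_inst"]) * stall_fraction,
    }


# Hypothetical counter totals for one instrumented region.
counters = {
    "total_cycles": 1000,
    "stall_cycles": 250,
    "flp_stalls": 100,
    "flp_ops": 400,
    "retired_inst": 800,
}
for name, value in model_one_metrics(counters).items():
    print(name, value)
```

Normalizing by retired instructions is what lets the PI variant be compared across runs with different problem sizes, whereas the PD variant grows with the amount of floating-point work.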
Performance Model II
Core Logic Stalls = L1D_register_stalls + branch_misprediction + instruction_miss
    + stack_engine_stalls + floating_point_stalls
    + pipeline_inter_register_dependency + processor_frontend_flush
Memory Stalls = L1_hits * L1_latency + L2_hits * L2_latency + L3_hits * L3_latency
    + local_mem_access * local_mem_latency + remote_mem_access * remote_mem_latency
    + TLB_miss * TLB_miss_penalty
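Both sums are straightforward to evaluate once the event counts and latencies are known. The sketch below mirrors the two formulas term by term; the lowercase key names, the event counts, and the latency values (in cycles) are hypothetical placeholders rather than values for any particular machine.

```python
CORE_LOGIC_EVENTS = (
    "l1d_register_stalls",
    "branch_misprediction",
    "instruction_miss",
    "stack_engine_stalls",
    "floating_point_stalls",
    "pipeline_inter_register_dependency",
    "processor_frontend_flush",
)

# Each memory term pairs an event count with the latency it is weighted by.
MEMORY_TERMS = (
    ("l1_hits", "l1_latency"),
    ("l2_hits", "l2_latency"),
    ("l3_hits", "l3_latency"),
    ("local_mem_access", "local_mem_latency"),
    ("remote_mem_access", "remote_mem_latency"),
    ("tlb_miss", "tlb_miss_penalty"),
)


def core_logic_stalls(counts):
    """Sum the stall-cycle contributions of the core logic events."""
    return sum(counts[name] for name in CORE_LOGIC_EVENTS)


def memory_stalls(counts, latencies):
    """Weight each memory event count by its latency and sum the products."""
    return sum(counts[event] * latencies[lat] for event, lat in MEMORY_TERMS)


# Hypothetical event counts and latencies (cycles).
mem_counts = {"l1_hits": 100, "l2_hits": 10, "l3_hits": 2,
              "local_mem_access": 1, "remote_mem_access": 0, "tlb_miss": 3}
latencies = {"l1_latency": 1, "l2_latency": 5, "l3_latency": 12,
             "local_mem_latency": 200, "remote_mem_latency": 400,
             "tlb_miss_penalty": 30}
print(memory_stalls(mem_counts, latencies))  # 464
```

Keeping the event lists as data makes it easy to adapt the same code to an architecture with a different cache hierarchy or a different set of stall counters.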
Power Model
Based on on-die components
Leverages performance hardware counters
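The model equation itself does not appear in this transcript. To make the general idea concrete, the sketch below shows one generic way a counter-driven, per-component estimate can be structured: each on-die component contributes its maximum power scaled by a utilization figure derived from hardware counters, on top of a fixed idle term. This is an illustrative assumption, not the model used in this work, and every name and number in it is hypothetical.

```python
def estimate_power_watts(access_rates, max_power_w, idle_power_w):
    """Estimate total power as idle power plus utilization-scaled component power.

    access_rates: component name -> counter-derived utilization in [0, 1]
    max_power_w:  component name -> assumed maximum power in watts
    """
    total = idle_power_w
    for name, rate in access_rates.items():
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"access rate for {name} must be in [0, 1]")
        total += rate * max_power_w[name]
    return total


# Hypothetical components and values, for illustration only.
rates = {"fp_unit": 0.5, "l2_cache": 0.25}
max_power = {"fp_unit": 10.0, "l2_cache": 8.0}
print(estimate_power_watts(rates, max_power, idle_power_w=20.0))  # 27.0
```

In practice the per-component maximum power and idle terms would have to be calibrated against measurements for the target processor.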
Performance Measurement and Analysis System
Components
― TAU: Performance measurement
  http://www.cs.uoregon.edu/research/tau/home.php
― Performance Database Component(s)
― PerfExplorer: Performance and power analysis
  http://www.cs.uoregon.edu/research/tau/docs/perfexplorer/
[Figure: component diagram. Labeled elements: PerfExplorer component, TAU component, component application, database components, runtime optimization, compiler feedback, user/tool analysis.]
PerfExplorer Component
Loads a Python analysis script
Performance and power analysis
Data mining, inference rules, comparing different experimental runs
Study I: Performance-Power Trade-offs
Experiment – Effect of compiler optimization levels on performance and power
Experimental Details
― Machine: SGI Altix 300
― MPI Processes: 16
― Compiler: OpenUH
― Code: GenIDLEST
― Optimization levels: -O0, -O1, -O2, -O3
― Performance tools: TAU, PerfExplorer, and PAPI
Results
Aggressive optimizations lead to higher power: IPC ~ power dissipation
Aggressive optimizations lead to lower energy: operation count ~ energy consumption
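Higher power and lower energy can hold at the same time because energy is power multiplied by runtime. The short sketch below works through the arithmetic with purely hypothetical numbers (they are not results from this study): an optimized run draws more power but finishes sooner, so it consumes less energy overall.

```python
def energy_joules(avg_power_w, runtime_s):
    """Energy is average power multiplied by runtime."""
    return avg_power_w * runtime_s


# Hypothetical (average power in watts, runtime in seconds) pairs.
runs = {
    "unoptimized": (80.0, 100.0),
    "optimized": (90.0, 60.0),
}

for label, (power_w, runtime_s) in runs.items():
    print(label, power_w, "W", energy_joules(power_w, runtime_s), "J")
# The optimized run draws 90 W instead of 80 W,
# yet uses 5400 J instead of 8000 J because it runs for less time.
```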
Performance/Power Study With PETSc Codes
PETSc: Portable, Extensible Toolkit for Scientific Computation
  http://www.mcs.anl.gov/petsc/
Experimental Details
― Machine: SGI Altix 3600
― Compiler: GCC
― MPI Processes: 32
― Application: 2-D simulation of cavity flow
  Krylov subspace linear solvers: FGMRES, GMRES, BiCGStab (BCGS)
  Preconditioner: Block Jacobi
  Problem size: 16x16 per processor (weak scaling)
― Performance tools: TAU, PerfExplorer, PAPI
Inefficiency
― Bottlenecks in methods used in solution of linear system
― Bottleneck also in preconditioner
Results
FGMRES has good performance initially
― Not very power efficient
BCGS is optimal for performance and power efficiency
Conclusions
Little or no hardware and software support for detailed power measurement and analysis on modern systems
Need for more integrated toolsets supporting both performance and power measurements, analysis, and optimizations
Combining tools with component-based software engineering can improve the efficiency and effectiveness of the tuning process
Future Directions
Integration of components into a framework
Dynamic selection of algorithms and parameters based on offline/online analyses
Compiler-based performance/power cost modeling
Continue performance and power analysis of PETSc-based codes
Extension of the performance and power models to more modern architectures
References
Jarp, S. A Methodology for Using the Itanium-2 Performance Counters for Bottleneck Analysis. Tech. rep., HP Labs, August 2002.
Bircher, W. L. and John, L. K. Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events. International Symposium on Performance Analysis of Systems & Software, pp. 158-168, 2007.
Isci, C. and Martonosi, M. Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (December 3-5, 2003).
Huck, K., Hernandez, O., Bui, V., Chandrasekaran, S., Chapman, B., Malony, A. D., McInnes, L. C., and Norris, B. Capturing Performance Knowledge for Automated Analysis. Supercomputing, 2008. http://www2.cs.uh.edu/~vtbui/sc.pdf
Acknowledgments
Professors/Advisors: Boyana Norris, Lois Curfman McInnes, Barbara Chapman, Allen Malony, Danesh Tafti
Students: Oscar Hernandez, Kevin Huck, Sunita Chandrasekaran, Li Li
SiCortex: Lawrence Stuart and Dan Jackson
MCS Division, Argonne National Laboratory
NSF, DOE, NCSA, NASA