Intel Software College
Tuning Threading Code with Intel® Thread Profiler
for Explicit Threads
Copyright © 2006, Intel Corporation. All rights reserved.
Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.
Objectives
After successful completion of this module you will be able to…
• Use Thread Profiler to recognize and fix common performance problems in applications using Windows* threads
Agenda
Look at Intel® Thread Profiler features
Define Critical Path Analysis
Examine Thread Profiler data views available
Review common performance issues of multithreaded applications
• Focus on load imbalance
• Focus on synchronization contention
Describe general optimizations to gain better performance
Motivation
Developing efficient multithreaded applications is hard
New performance problems are caused by the interaction between concurrent threads
• Load imbalance
• Contention on synchronization objects
• Threading overhead
Intel® Thread Profiler
Plugs into the VTune™ performance environment
• Instrumentation-based data collector in VTune
Identifies performance issues in OpenMP* or threaded applications using the Win32* API, POSIX* threads, and Intel® Threading Building Blocks
Pinpoints performance bottlenecks that directly affect execution time
Intel® Thread Profiler Features
Supports several different compilers
• Intel® C++ and Fortran Compilers, v7 and higher
• Microsoft* Visual* C++ .NET* 2002, 2003 & 2005 Editions
• Integrated into Microsoft Visual Studio .NET* IDE
Binary instrumentation of applications
Different views and filters available to assist and organize analysis
Uses critical path analysis
What is the Critical Path?
Threaded applications contain multiple execution flows
• A new flow is created when a thread is created or resumes
• A flow ends when a thread terminates or blocks on a synchronization primitive
The critical path is the longest execution flow
[Figure: timeline of Threads 1, 2, and 3 over time steps T0 to T15. Events shown include acquiring, waiting for, and releasing lock L; Thread 1 waiting for Threads 2 & 3; and the termination of Threads 2, 3, and 1.]
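To make "longest execution flow" concrete, here is a minimal C sketch under a simplified, hypothetical model (not Thread Profiler's internal algorithm): execution is split into segments that each run without blocking, and each segment lists the segments it must wait for (the previous segment on its thread, a lock release, or a joined thread's exit). Assuming enough processors that no ready segment ever waits for a core, the critical path is the longest chain of dependent segments. `Segment` and `critical_path_length` are illustrative names.

```c
#include <stddef.h>

#define MAX_DEPS 4

/* Hypothetical model: one segment = a stretch of a single thread's
 * execution that runs without blocking. Segments must be listed in
 * topological order: every index in deps[] is smaller than the
 * segment's own index. */
typedef struct {
    int duration;        /* time units the segment runs */
    int ndeps;           /* number of entries used in deps[] */
    int deps[MAX_DEPS];  /* segments that must finish first */
} Segment;

/* Computes each segment's finish time into finish[] and returns the
 * length of the longest dependency chain, i.e. the critical path.
 * Assumes unlimited processors (no waiting for a free core). */
int critical_path_length(const Segment *seg, size_t n, int *finish)
{
    int longest = 0;
    for (size_t i = 0; i < n; i++) {
        int start = 0;  /* a segment starts once all predecessors finish */
        for (int d = 0; d < seg[i].ndeps; d++) {
            int f = finish[seg[i].deps[d]];
            if (f > start)
                start = f;
        }
        finish[i] = start + seg[i].duration;
        if (finish[i] > longest)
            longest = finish[i];
    }
    return longest;
}
```

For example, with segments of length 3, 4, 2, and 1, where segments 1 and 2 wait for segment 0 and segment 3 waits for both, the critical path is 3 + 4 + 1 = 8. Shortening segment 2 leaves the total at 8, while shortening segment 1 to 2 cuts it to 6.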
Critical Path Analysis
System Utilization
• Relative to the system executing the application:
  - Idle: no threads active
  - Serial: a single thread active
  - Under Utilized: more than one thread active, but fewer threads than cores
  - Fully Utilized: # threads == # cores
  - Over Utilized: # threads > # cores
Thread interaction categories
  - Cruise: threads running without interference
  - Overhead: thread operation overhead
  - Blocking: thread waiting on an external event
  - Impact: thread preventing some other thread from executing
If the critical path is shortened, the application will run in less time
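The utilization categories follow directly from the number of active threads and the core count. A minimal C sketch with hypothetical names; note that on a single-core machine one active thread matches both "Serial" and "Fully Utilized", and this sketch reports it as Serial, which is an assumption.

```c
/* Hypothetical encoding of the utilization categories. */
typedef enum { UTIL_IDLE, UTIL_SERIAL, UTIL_UNDER, UTIL_FULLY, UTIL_OVER } Utilization;

/* Classifies one instant by how many threads are active on `cores` cores.
 * Checks run in the slide's order, so with cores == 1 a single active
 * thread is reported as UTIL_SERIAL (an assumption). */
Utilization classify_utilization(int active_threads, int cores)
{
    if (active_threads <= 0)
        return UTIL_IDLE;
    if (active_threads == 1)
        return UTIL_SERIAL;
    if (active_threads < cores)
        return UTIL_UNDER;
    if (active_threads == cores)
        return UTIL_FULLY;
    return UTIL_OVER;
}
```

On the 2-processor configuration used in these slides, "Under Utilized" cannot occur, since the only count between one thread and the core count would have to be strictly between 1 and 2.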
System Utilization
Examines processor utilization to determine the concurrency level of the application
Concurrency is the number of active threads
Categorization shown for a system configuration with 2 processors
[Figure: the same Thread 1 to 3 timeline (T0 to T15) alongside a chart of concurrency level over time, colored by category: Idle, Serial, Under Utilized, Fully Utilized, Over Utilized]
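A concurrency-level profile can be derived from when each thread is active. The sketch below assumes a hypothetical input format: each stretch of thread activity is an interval of integer ticks. It counts how many ticks had exactly k threads active; each level can then be mapped to the utilization categories above for a given processor count. A per-tick scan is used for clarity; a real tool would sweep over sorted events instead.

```c
#include <stddef.h>

#define MAX_LEVEL 16

/* One stretch during which a thread is active (running, not blocked),
 * covering ticks [start, end). Hypothetical input format. */
typedef struct { int start, end; } Interval;

/* time_at_level[k] receives the number of ticks in [0, horizon) during
 * which exactly k intervals overlap. Levels of MAX_LEVEL-1 or more are
 * clamped into the last bucket. */
void concurrency_histogram(const Interval *iv, size_t n, int horizon,
                           int time_at_level[MAX_LEVEL])
{
    for (int k = 0; k < MAX_LEVEL; k++)
        time_at_level[k] = 0;
    for (int t = 0; t < horizon; t++) {
        int active = 0;
        for (size_t i = 0; i < n; i++)
            if (iv[i].start <= t && t < iv[i].end)
                active++;
        if (active >= MAX_LEVEL)
            active = MAX_LEVEL - 1;
        time_at_level[active]++;
    }
}
```

For intervals [0, 10), [2, 6), and [4, 8) over 12 ticks, it reports 2 ticks with no thread active, 4 with one, 4 with two, and 2 with three; on the 2-processor system of the slide, those last 2 ticks count as over utilized.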
Execution Time Categories
Analyzes thread interaction and behavior along the critical path
Records the objects that cause critical path (CP) transitions
Categorization shown for a system configuration with 2 processors
[Figure: the same Thread 1 to 3 timeline (T0 to T15) alongside a chart of critical-path time colored by thread interaction: Cruise time, Overhead, Blocking time, Impact time]
Merging Concurrency and Behavior
[Figure: critical-path time over time, categorized first by concurrency level and then by thread behavior]
Start with system utilization
Further categorize by behavior
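Merging the two categorizations amounts to a two-level breakdown of critical-path time. A minimal sketch, assuming (hypothetically) that each critical-path stretch has already been labeled with a utilization category and a behavior category; the type and function names are illustrative.

```c
#include <stddef.h>

/* Hypothetical encodings of the slides' categories. */
typedef enum { U_IDLE, U_SERIAL, U_UNDER, U_FULLY, U_OVER, N_UTIL } Util;
typedef enum { B_CRUISE, B_OVERHEAD, B_BLOCKING, B_IMPACT, N_BEHAVIOR } Behavior;

/* One stretch of the critical path, already labeled. */
typedef struct {
    int duration;
    Util util;
    Behavior behavior;
} CpSegment;

/* out[u][b] receives the critical-path time spent at utilization u with
 * behavior b: time is first split by utilization, then by behavior. */
void cp_breakdown(const CpSegment *seg, size_t n, int out[N_UTIL][N_BEHAVIOR])
{
    for (int u = 0; u < N_UTIL; u++)
        for (int b = 0; b < N_BEHAVIOR; b++)
            out[u][b] = 0;
    for (size_t i = 0; i < n; i++)
        out[seg[i].util][seg[i].behavior] += seg[i].duration;
}
```

Summing a row gives the total time at that utilization level; the row's entries show how much of it was cruise, overhead, blocking, or impact.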
Thread Profiler Views
Critical Path View
• Shows breakdown of the critical path
Profile View
• Shows the breakdown of selected critical paths
• User can select other views of the selected profile
• Concurrency level, threads, objects
Timeline View
• Shows thread activity and critical path transitions for the entire application
Source View
• Transition source view, creation source view
Activity 1a
Threaded version of potential code
• Is there a performance issue?
Goal
• Run application through Thread Profiler
• Examine thread activities by reviewing different views
Thread Profiler Profile View
Profile Pane
Timeline Pane
Profile Pane – Concurrency Level View
Concurrency Level View
Two threads ran in parallel ~33% of the time
Ran single-threaded ~65% of the time
Let’s look at the Thread View
Profile Pane – Thread View
Time on the Critical Path
Active time of the thread
Lifetime of the thread
Let’s look at the Object View
Profile Pane – Object View
This object caused all of the impact
Let’s look at Timeline View
Timeline Pane
Source View
Activity 1b
Threaded version of potential code
• Is there a performance issue?
Goal
• Examine thread activities by reviewing different views
• Determine system utilization
• Identify any performance issues
Review Activity 1
Concurrency Level view can be used to determine system utilization by the application
Timeline view enables you to understand the thread activity in your application
Instrumentation time will be included in first-run results; thus, for applications running in a short amount of time, a second run may produce more realistic timings.
Common Performance Issues
Load imbalance
• Improper distribution of parallel work
Synchronization
• Excessive use of global data, contention for the same synchronization object
Parallel Overhead
• Due to thread creation, scheduling, etc.
Granularity
• Insufficient parallel work
Load Imbalance
Unequal work loads lead to idle threads and wasted time
[Figure: Threads 0 to 3 on a time axis from "Start threads" to "Join threads", each thread's bar split into Busy and Idle time]
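The cost of the imbalance in the figure can be quantified: every thread waits at the join until the busiest thread finishes, so the total idle time is the thread count times the longest busy time, minus the sum of all busy times. A minimal C sketch; the function name is hypothetical.

```c
#include <stddef.h>

/* busy[i] is thread i's busy time between "start threads" and
 * "join threads". Returns the total idle time spent waiting at the join.
 * Assumes n > 0. */
long idle_time_at_join(const long *busy, size_t n)
{
    long max = busy[0], sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (busy[i] > max)
            max = busy[i];
        sum += busy[i];
    }
    return (long)n * max - sum;  /* every thread is held until max */
}
```

For busy times of 4, 2, 3, and 1 units on four threads, the idle time is 4 × 4 − 10 = 6 units; perfectly balanced threads give 0.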
Redistribute Work to Threads
Static assignment
• Are the same number of tasks assigned to each thread?
• Do tasks take different processing time?
  • Do tasks change in a predictable pattern?
    • Rearrange (static) order of assignment to threads
  • Use dynamic assignment of tasks
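The effect of assignment order can be estimated without running any threads by simulating how long each thread stays busy. This sketch (hypothetical helper names, no scheduling overhead modeled) compares static contiguous blocks, static round-robin (cyclic) order, and dynamic assignment from a shared queue, and reports the makespan, i.e. when the last thread finishes.

```c
#include <stddef.h>

#define MAX_THREADS 64

static long max_load(const long *load, int nthreads)
{
    long m = 0;
    for (int t = 0; t < nthreads; t++)
        if (load[t] > m)
            m = load[t];
    return m;
}

/* Static, contiguous blocks of about n/nthreads tasks per thread.
 * All three functions return -1 unless 1 <= nthreads <= MAX_THREADS. */
long makespan_block(const long *cost, size_t n, int nthreads)
{
    long load[MAX_THREADS] = {0};
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (size_t i = 0; i < n; i++)
        load[i * nthreads / n] += cost[i];
    return max_load(load, nthreads);
}

/* Static, cyclic (round-robin): task i goes to thread i % nthreads. */
long makespan_cyclic(const long *cost, size_t n, int nthreads)
{
    long load[MAX_THREADS] = {0};
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (size_t i = 0; i < n; i++)
        load[i % nthreads] += cost[i];
    return max_load(load, nthreads);
}

/* Dynamic: tasks are taken in order, each by the thread that becomes
 * free first (lowest index wins ties), like a shared work queue. */
long makespan_dynamic(const long *cost, size_t n, int nthreads)
{
    long load[MAX_THREADS] = {0};
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        int first_free = 0;
        for (int t = 1; t < nthreads; t++)
            if (load[t] < load[first_free])
                first_free = t;
        load[first_free] += cost[i];
    }
    return max_load(load, nthreads);
}
```

With task costs 1 through 8 (a predictable, increasing pattern) on two threads, contiguous blocks give 26 while cyclic and dynamic assignment both give 20. With one large task followed by seven unit tasks (8, 1, 1, 1, 1, 1, 1, 1), both static schemes give 11 and dynamic assignment gives 8.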
Redistribute Work to Threads
Dynamic assignment
• Is there one big task being assigned?
  • Break up the large task into smaller parts
• Are small computations agglomerated into a larger task?
  • Adjust the number of computations in a task
    • More small computations in a single task?
    • Fewer small computations in a single task?
  • Bin packing heuristics
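"Bin packing heuristics" covers many schemes. One common choice, swapped in here as an example rather than taken from the slides, is longest-processing-time-first (LPT): hand out the biggest tasks first, each to the currently least-loaded thread. A minimal C sketch with hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 64

/* qsort comparator: larger costs first. */
static int by_cost_desc(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x < y) - (x > y);
}

/* LPT: sort tasks by decreasing cost (cost[] is sorted in place), then
 * give each task to the least-loaded thread. Returns the makespan, or -1
 * unless 1 <= nthreads <= MAX_THREADS. */
long makespan_lpt(long *cost, size_t n, int nthreads)
{
    long load[MAX_THREADS] = {0};
    long makespan = 0;
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    qsort(cost, n, sizeof cost[0], by_cost_desc);
    for (size_t i = 0; i < n; i++) {
        int least = 0;
        for (int t = 1; t < nthreads; t++)
            if (load[t] < load[least])
                least = t;
        load[least] += cost[i];
    }
    for (int t = 0; t < nthreads; t++)
        if (load[t] > makespan)
            makespan = load[t];
    return makespan;
}
```

For costs 1 through 8 on two threads, LPT reaches 18, the ideal 36 / 2. It remains a heuristic: for costs 5, 5, 4, 3, 3 it yields 11, although splitting them as {5, 5} and {4, 3, 3} would give 10.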
Unbalanced Workloads
Threads are unbalanced
Active times are not equal
Activity 2 – Load Imbalance
Threaded version of potential code with thread pools
• Has a load balance performance issue
Review Activity 2
Threads view can be used to determine activity levels of each thread within the application
Timeline view enables you to understand the thread activity in your application
Synchronization
By definition, synchronization serializes execution
Lock contention means more idle time for threads
[Figure: Threads 0 to 3 over time, with intervals marked Busy, Idle, and In Critical (inside the critical region)]
Synchronization Fixes
Eliminate synchronization
• Expensive but necessary “evil”
• Use storage local to threads
  • Use a local variable for partial results; update the global after local computations
  • Allocate space on the thread stack (alloca)
  • Use the thread-local storage API (TlsAlloc)
• Use atomic updates whenever possible
  • Some global data updates can use atomic operations (Interlocked API family)
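The "local variable for partial results" fix can be sketched with POSIX threads, used here in place of the Win32 API so the example is portable. Each thread integrates its share of 4/(1 + x²) over [0, 1], whose value is π, into a private variable and enters the critical region once at the end instead of on every step. The names and the integrand are illustrative choices, not code from the course labs.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

static double pi_total;  /* shared result, protected by pi_lock */
static pthread_mutex_t pi_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    long first, last;  /* this thread's steps [first, last) */
    long steps;        /* total number of integration steps */
} PiWork;

/* Midpoint rule for the integral of 4/(1+x*x) over [0, 1]. */
static void *pi_worker(void *arg)
{
    const PiWork *w = arg;
    double h = 1.0 / (double)w->steps;
    double local = 0.0;  /* private partial sum: no locking needed */
    for (long i = w->first; i < w->last; i++) {
        double x = ((double)i + 0.5) * h;
        local += 4.0 / (1.0 + x * x);
    }
    pthread_mutex_lock(&pi_lock);  /* one short update per thread */
    pi_total += local * h;
    pthread_mutex_unlock(&pi_lock);
    return NULL;
}

/* Returns the estimate of pi, or -1.0 on bad arguments or if a thread
 * cannot be created. */
double parallel_pi(long steps, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    PiWork work[MAX_THREADS];
    int created = 0;
    if (steps < 1 || nthreads < 1 || nthreads > MAX_THREADS)
        return -1.0;
    pi_total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        work[t].first = steps * t / nthreads;
        work[t].last = steps * (t + 1) / nthreads;
        work[t].steps = steps;
        if (pthread_create(&tid[t], NULL, pi_worker, &work[t]) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return created == nthreads ? pi_total : -1.0;
}
```

Locking inside the loop instead would give the same answer but make every step contend for `pi_lock`, which is the pattern the Object View flags as impact.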
Atomic Updates
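Use Win32 Interlocked* intrinsics in place of a synchronization object
#include <windows.h>

static volatile LONG counter;
static CRITICAL_SECTION cs;  // call InitializeCriticalSection(&cs) once before use

// Fast: one atomic increment, no lock
void increment_fast(void) { InterlockedIncrement(&counter); }

// Slower: acquire the critical section, increment, release
void increment_slower(void)
{
    EnterCriticalSection(&cs);
    counter++;
    LeaveCriticalSection(&cs);
}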
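The same contrast in portable C, as a sketch assuming a C11 compiler with `<stdatomic.h>` and POSIX threads: `atomic_fetch_add` plays the role of `InterlockedIncrement`, and a `pthread_mutex_t` plays the role of the critical section. Both counters end with the same value; the difference the slide refers to is the cost per update, which this sketch does not measure. The helper names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 64

static atomic_long atomic_counter;  /* updated with atomic_fetch_add */
static long locked_counter;         /* updated only under counter_lock */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long increments_per_thread;

/* Counterpart of InterlockedIncrement: one atomic read-modify-write. */
static void *atomic_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < increments_per_thread; i++)
        atomic_fetch_add(&atomic_counter, 1);
    return NULL;
}

/* Counterpart of Enter/LeaveCriticalSection around counter++. */
static void *locked_worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < increments_per_thread; i++) {
        pthread_mutex_lock(&counter_lock);
        locked_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs nthreads copies of fn and joins them; -1 if creation fails. */
static int run_threads(void *(*fn)(void *), int nthreads)
{
    pthread_t tid[MAX_THREADS];
    int created = 0;
    while (created < nthreads &&
           pthread_create(&tid[created], NULL, fn, NULL) == 0)
        created++;
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return created == nthreads ? 0 : -1;
}

/* Runs both versions; returns 0 on success, -1 on error. */
int count_both(int nthreads, long per_thread,
               long *atomic_result, long *locked_result)
{
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    increments_per_thread = per_thread;
    atomic_store(&atomic_counter, 0);
    locked_counter = 0;
    if (run_threads(atomic_worker, nthreads) != 0 ||
        run_threads(locked_worker, nthreads) != 0)
        return -1;
    *atomic_result = atomic_load(&atomic_counter);
    *locked_result = locked_counter;
    return 0;
}
```

With four threads doing 100000 increments each, both counters reach 400000; timing the two runs is where the atomic version's advantage would show.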
Synchronization Fixes
Reduce size of critical regions protected by synchronization object
• Larger critical regions tie up sync objects longer; other threads sit idle longer waiting to acquire objects
• Only accesses to shared variables need to be protected
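A sketch of a small critical region, using POSIX threads and a hypothetical `score_item` function standing in for expensive per-item work on private data: the work runs outside the lock, and the mutex protects only the read and update of the shared maximum. Holding the lock across `score_item` would give the same answer but serialize the expensive part.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

static long shared_best;  /* shared: highest score seen so far */
static pthread_mutex_t best_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct { long first, last; } Range;  /* items [first, last) */

/* Hypothetical stand-in for expensive work that touches only private data. */
long score_item(long x)
{
    long s = 0;
    for (long k = 1; k <= 100; k++)
        s = (s + x * k) % 1009;
    return s;
}

static void *best_worker(void *arg)
{
    const Range *r = arg;
    for (long x = r->first; x < r->last; x++) {
        long s = score_item(x);          /* outside the critical region */
        pthread_mutex_lock(&best_lock);  /* protect only the shared access */
        if (s > shared_best)
            shared_best = s;
        pthread_mutex_unlock(&best_lock);
    }
    return NULL;
}

/* Returns the highest score over items [0, n), or -1 on bad arguments
 * or if a thread cannot be created. */
long parallel_best(long n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    Range range[MAX_THREADS];
    int created = 0;
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    shared_best = -1;
    for (int t = 0; t < nthreads; t++) {
        range[t].first = n * t / nthreads;
        range[t].last = n * (t + 1) / nthreads;
        if (pthread_create(&tid[t], NULL, best_worker, &range[t]) != 0)
            break;
        created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return created == nthreads ? shared_best : -1;
}
```

Each thread could also keep a private best and update `shared_best` once at the end, which combines this fix with the thread-local storage approach above.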
Synchronization Fixes
Use best synchronization object for job
• Critical Section
  • Local object
  • Available to threads within the same process
  • Lower overhead (~8X faster than mutex)
• Mutex
  • Kernel object
  • Accessible to threads within different processes
  • Deadlock safety (can only be released by owner)
Other synchronization objects are available (e.g., events and semaphores)
Object Contention
This object caused all of the impact
What is all this? These four threads…
…are impacting threads through this object
Activity 3
Threaded version of numerical integration
• Has serious performance issues
Goal
• Understand thread activity
• Use the Thread Profiler groupings
• Examine synchronization and its effect on performance
• Fix performance issue
Review Activity 3
Grouping objects and threads shows which objects impact which threads
Apply the heuristics from the labs to locate bottlenecks in the source code
For longer-running applications, the difference between first and second run times is negligible
General Optimizations
Serial Optimizations
• Serial optimizations along the critical path should reduce execution time
Parallel Optimizations
• Reduce synchronization object contention
• Balance workload
• Functional parallelism
Analyze the benefit of increasing the number of processors
Analyze the effect of increasing the number of threads on scaling performance
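One standard way to reason about adding processors, not covered in these slides, is Amdahl's law: if a fraction s of the run time is serial, the predicted speedup on p processors is 1 / (s + (1 − s) / p). A minimal C sketch; the serial fraction is an input you would have to estimate, for example from the share of time the Concurrency Level view reports as serial (an approximation).

```c
/* Amdahl's law: predicted speedup on p processors when a fraction
 * `serial` (between 0 and 1) of the run time cannot be parallelized.
 * Assumes p >= 1. */
double amdahl_speedup(double serial, int p)
{
    return 1.0 / (serial + (1.0 - serial) / p);
}
```

With s = 0.5, two processors give about 1.33×, and no processor count can exceed 1 / s = 2×.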
Intel® Thread Profiler for Explicit Threads: What’s Been Covered
Identifying performance issues can be time consuming without tools
Tools are required to understand and to optimize parallel efficiency and hardware utilization
Thread Profiler helps you understand your application's thread activity, system utilization, and scaling performance