Game Threading Analysis & Methodology
Session:
SOFTWARE AND SERVICES Copyright © 2014, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.
Objectives
At the end of this module you will be able to:
• Describe two strategies for parallelizing a game using two different threading implementations
• Evaluate the effectiveness of each strategy with respect to how well it uses the underlying cores
This module will NOT teach you how to program with the Windows* API, Intel® Threading Building Blocks, or DirectX*/Direct3D*. It is intended primarily to show a higher-level method of attack for threading games and how to use tools to evaluate the effectiveness of a threading strategy.
Agenda
• Introduction to Intel® Parallel Amplifier
• Usual Game Structure
• Parallelization with Windows* Threads
• What is Intel® Threading Building Blocks?
• Parallelization with Intel® Threading Building Blocks
• Curriculum Application & Summary
* Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are the property of their respective owners.
Motivation for a Threading Tool
Developing efficient multithreaded applications is hard. New performance problems are caused by the interaction between concurrent threads:
• Load imbalance
• Contention on synchronization objects
• Threading overhead
Need a tool to help!
Intel® Parallel Amplifier
• A component of Intel® Parallel Studio
• Has 3 main charters:
  – Identify hotspots in an application
  – Identify the level of concurrency in an application
  – Identify locks & waits
• In this module, we look specifically at locks & waits
Amplifier Locks & Waits – Synchronization View
Synchronization Object View: This view shows the kinds of synchronization objects (threads, critical sections, etc.) and the effect each one has on performance.
Color legend:
• Green: Ideal – fully utilized CPU cores
• Orange: OK – acceptably utilized CPU cores
• Red: Poor – underutilized CPU cores
• Blue: Over – over-utilized CPU cores
Amplifier Locks & Waits – Wait Thread View
Wait Thread View: This view shows the threads and functions that are waiting. It also depicts the relative proportion of application time spent undersubscribed, fully subscribed, or oversubscribed.
Summary View: This view shows a graph of how much of the application's run time is spent with the cores fully subscribed.
Concurrency Profile
Measures core utilization, relative to the system executing the application, so the user can see how parallel the program really is.
Concurrency level is the number of threads that are active (not waiting, sleeping, blocked, etc.) at a given time:
• Idle: no active threads
• Under-subscribed: # threads > 1 && # threads < # cores
• OK: more than a single thread
• Fully-subscribed*: # threads == # cores
• Oversubscribed: # threads > # cores
* Example reflects a 4-core machine
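The classification above can be sketched as a small function. This is a minimal illustration, not Parallel Amplifier's implementation: the names `Concurrency`, `classify`, and `fully_subscribed_fraction` are hypothetical, each sample is assumed to be an equal-length time slice, and because the slide does not assign a clear category to exactly one active thread, the sketch labels that case Serial (an assumption).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Concurrency levels from the slide. "Serial" (exactly one active thread)
// is an assumption of this sketch; the slide leaves that case unclear.
enum class Concurrency { Idle, Serial, UnderSubscribed, FullySubscribed, Oversubscribed };

// Classify one sample by its number of active (running, not waiting) threads.
Concurrency classify(unsigned active_threads, unsigned cores) {
    if (active_threads == 0) return Concurrency::Idle;
    if (active_threads > cores) return Concurrency::Oversubscribed;
    if (active_threads == cores) return Concurrency::FullySubscribed;
    if (active_threads == 1) return Concurrency::Serial;
    return Concurrency::UnderSubscribed;  // 1 < active_threads < cores
}

// Fraction of equal-length samples spent fully subscribed, i.e. the kind of
// number the Summary View graphs.
double fully_subscribed_fraction(const std::vector<unsigned>& samples, unsigned cores) {
    if (samples.empty()) return 0.0;
    std::size_t hits = 0;
    for (unsigned n : samples)
        if (classify(n, cores) == Concurrency::FullySubscribed) ++hits;
    return static_cast<double>(hits) / samples.size();
}
```

For example, on a 4-core machine the samples {4, 4, 2, 0} are fully subscribed half of the time, so `fully_subscribed_fraction` returns 0.5.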
Usual Game Structure
A game consists of a loop called the "Game Loop": Get Input → Simulate → Render
DTC (Destroy the Castle) uses the usual game loop: Get Input → Physics → AI → Particles → Render
http://softwarecommunity.intel.com/articles/eng/1363.htm
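A serial version of the DTC game loop might look like the sketch below. The stage functions are hypothetical placeholders that only record their names; the real DTC code is not shown in this module. The `frames` parameter stands in for a real loop's run-until-quit condition.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stage functions standing in for DTC's subsystems; each one
// only records its name so the execution order can be checked.
std::vector<std::string> trace;

void get_input()        { trace.push_back("GetInput"); }
void update_physics()   { trace.push_back("Physics"); }
void update_ai()        { trace.push_back("AI"); }
void update_particles() { trace.push_back("Particles"); }
void render()           { trace.push_back("Render"); }

// Serial game loop: every stage of every frame runs on one thread,
// strictly one after another.
void run_game_loop(int frames) {
    for (int frame = 0; frame < frames; ++frame) {
        get_input();
        update_physics();
        update_ai();
        update_particles();
        render();
    }
}
```

Because each stage waits for the previous one on a single thread, this structure cannot use more than one core no matter how many the machine has.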
Lab Activity 1
Build Destroy the Castle
• Follow the steps for Lab Activities 1A & 1B in the student guide to build and run Destroy the Castle
Limitation of Serial Games for Multi-Core Systems
With clock rates already in the multi-GHz range, further frequency increases are becoming harder to achieve.
Parallel hardware has gone mainstream on the desktop. To exploit the performance potential of multi-core processors, applications must be threaded.
Serial games get no benefit from multi-core.
Parallelization with Windows* Threads
• Updating with double-buffered data structures
• Decoupling rendering from frame processing
• Asynchronous update of parts of the scene
• Task decomposition
[Figure: task decomposition of a frame into Render, Physics, Particles, and AI]
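Double buffering, the first technique listed, can be sketched as follows. This is a minimal illustration under assumed names (`ObjectState`, `DoubleBuffer`, `step`), not DTC's actual data structures: writers fill the back buffer using only the front buffer's contents, and the swap happens once per frame after all writers finish, so readers such as the renderer always see a complete frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-object state; DTC's real structures are not shown here.
struct ObjectState { float x = 0.0f; float vx = 0.0f; };

// Two copies of the scene state: readers use front() (last completed frame),
// writers fill back() (next frame).
class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t n) {
        buffers_[0].resize(n);
        buffers_[1].resize(n);
    }
    const std::vector<ObjectState>& front() const { return buffers_[current_]; }
    std::vector<ObjectState>& back() { return buffers_[1 - current_]; }
    // Call once per frame, only after every writer has finished.
    void swap() { current_ = 1 - current_; }
private:
    std::vector<ObjectState> buffers_[2];
    std::size_t current_ = 0;
};

// One simulation step: read only the previous frame, write only the next one.
void step(DoubleBuffer& db, float dt) {
    const std::vector<ObjectState>& prev = db.front();
    std::vector<ObjectState>& next = db.back();
    for (std::size_t i = 0; i < prev.size(); ++i) {
        next[i].vx = prev[i].vx;
        next[i].x = prev[i].x + prev[i].vx * dt;
    }
}
```

The cost is twice the memory for the buffered state; the benefit is that readers and writers never touch the same copy during a frame, so individual objects need no locks.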
ISC (Intel Software College) has an 8-hour course on how to thread DTC with Windows* Threads.
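Task decomposition can be sketched with `std::thread`, used here as a portable stand-in for Windows* threads created with CreateThread or _beginthreadex. The stage functions and counters are hypothetical. For brevity the sketch creates threads every frame; the profiled DTC version instead uses a pool of threads, which avoids the per-frame creation cost.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical per-frame work, reduced to counters so the sketch can be checked.
std::atomic<int> physics_frames{0}, ai_frames{0}, particle_frames{0}, render_frames{0};

void update_physics()   { ++physics_frames; }
void update_ai()        { ++ai_frames; }
void update_particles() { ++particle_frames; }
void render()           { ++render_frames; }

// Task decomposition: each simulation subsystem runs on its own thread for
// the frame; the main thread renders only after all three are joined.
void run_frame_task_decomposed() {
    std::thread physics(update_physics);
    std::thread ai(update_ai);
    std::thread particles(update_particles);
    physics.join();
    ai.join();
    particles.join();
    render();
}
```

The frame cannot finish until the slowest of the three tasks does, and simulation can never use more than three cores; this is the kind of load balance issue the Locks & Waits profiles in this module reveal.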
DTC Baseline Analysis
Locks & Waits for "Single Thread" run
• Serial execution dominates
• Terrible frame rate
Parallel Amplifier shows serial code dominating the execution
Performance Profile: Task Decomposition
Locks & Waits for the multithreaded run: a thread pool for Render plus 3 "simulate" threads
• Good frame rate
• Load balance issue
Some parallelism, but low utilization of the 4 cores
Data Level Parallelism
Nested parallelism:
• Top level: task decomposition
• Next level: data decomposition
[Figure: Render, Physics, Particles, and AI tasks, with the AI task further split into four "update several AI units" subtasks]
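The next level, data decomposition inside the AI task, can be sketched as splitting the AI units into contiguous chunks with one thread per chunk. `AIUnit`, `think`, and `update_ai_parallel` are hypothetical names, and the sketch assumes each unit can be updated independently (no shared writes), so no locks are needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical AI unit; think() stands in for the real per-unit AI update.
struct AIUnit { int decisions = 0; };
void think(AIUnit& unit) { ++unit.decisions; }

// Data decomposition: split the units into contiguous chunks and update each
// chunk on its own thread. Each unit is touched by exactly one thread.
void update_ai_parallel(std::vector<AIUnit>& units, unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    const std::size_t chunk = (units.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(units.size(), begin + chunk);
        if (begin >= end) break;  // fewer units than threads
        workers.emplace_back([&units, begin, end] {
            for (std::size_t i = begin; i < end; ++i) think(units[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

`num_threads` is a fixed knob here. In the Windows* threads version these AI threads are added on top of the render and simulate threads, so raising the count on a 4-core machine quickly leads to oversubscription.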
Lab Activity 2
Use Parallel Amplifier to Display Baseline Locks & Waits Information for the Windows* Threads Version
• Follow the steps for Lab Activity 2 in the student guide to analyze the baseline and task decomposition profiles you created in Activity 1
Performance Profile: AI Decomposition with 2 Threads
Locks & Waits for the Multithreaded plus 2 AI threads run
• Better core utilization
• Best frame rate
• Starting to see oversubscription
• Load balancing an issue
More parallelism, and higher utilization of the 4 cores
Performance Profile: AI Decomposition with 4 Threads
Locks & Waits for the Multithreaded plus 4 AI threads run
• Oversubscription a problem
• Load balancing issue
• Frame rate drops
Too many threads for the number of cores
One More Problem: Nested Parallelism
Software components are built from smaller components.
If each turtle (component) specifies its own threads...
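The multiplication of threads can be demonstrated with a two-level sketch in which each component creates its own threads. `run_on_new_threads` and `nested_components` are hypothetical helpers. With 4 outer threads that each start 4 inner threads, 20 threads are created in total, and up to 16 leaf threads may be runnable at once (the outer threads are blocked in join) on a machine that may have only 4 cores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts every OS thread the sketch creates.
std::atomic<int> threads_created{0};

// Runs `work` on `n` freshly created threads and waits for all of them:
// the "each component specifies its own threads" style.
template <typename F>
void run_on_new_threads(int n, F work) {
    std::vector<std::thread> pool;
    for (int i = 0; i < n; ++i) {
        ++threads_created;
        pool.emplace_back(work);
    }
    for (auto& t : pool) t.join();
}

// Hypothetical two-level component: an outer component uses `outer` threads,
// and each of them calls an inner component that uses `inner` threads.
void nested_components(int outer, int inner) {
    run_on_new_threads(outer, [inner] {
        run_on_new_threads(inner, [] { /* leaf work */ });
    });
}
```

Neither component is wrong on its own; the oversubscription only appears when they are composed, which is why it is hard to manage with raw threads.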
Disadvantages of Using Windows* and POSIX Threads for Games
• Low-level details (not intuitive)
  – Hard to come up with a good design
  – Code often becomes very dependent on a particular OS's threading facilities
• Load imbalance
  – Has to be managed manually
• Oversubscription
  – Multiple components create threads that compete for CPU resources
  – Hard to manage nested parallelism
Hard to achieve scalability
What is Intel® Threading Building Blocks?
It is open source now!
• http://www.intel.com/software/products/tbb/
• http://threadingbuildingblocks.org/
• Component in Intel® Parallel Composer
Threading abstraction library
• Relies on generic programming
• Provides high-level generic implementations of parallel design patterns and concurrent data structures
You specify task patterns instead of threads
• The library maps your logical tasks onto physical threads, using caches efficiently and balancing load
• Full support for nested parallelism
Targets threading for robust performance
• Designed to provide scalable performance for computationally intense portions of shrink-wrapped applications
• Portable across Linux*, Mac OS*, and Windows*
Emphasizes scalable data-parallel programming
• Solutions based on task decomposition usually do not scale
Components of Intel® Threading Building Blocks
• Parallel algorithms
• Concurrent containers
• Synchronization primitives
• Memory allocation
• Task scheduler
Problem – Intel® TBB approach:
• Low-level details – operate with task patterns instead of threads
• Load imbalance – work-stealing balances load
• Oversubscription – one scheduled thread per hardware thread
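The load-balancing and "one scheduled thread per hardware thread" rows can be illustrated with a simplified stand-in for a parallel loop: one worker per hardware thread pulls fixed-size chunks of the iteration range from a shared atomic counter, so faster workers simply take more chunks. This is dynamic chunk scheduling, a simpler technique than TBB's work-stealing scheduler, and `parallel_for_chunks` and its `grain` parameter are hypothetical names. In TBB itself the equivalent loop would be written with `tbb::parallel_for` over a `tbb::blocked_range`.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for a parallel loop: one worker per hardware thread; each worker
// repeatedly claims the next `grain`-sized chunk of [0, n) from a shared
// counter until the range is exhausted. NOT TBB's work-stealing scheduler.
void parallel_for_chunks(std::size_t n, std::size_t grain,
                         const std::function<void(std::size_t, std::size_t)>& body) {
    const unsigned hw = std::max(1u, std::thread::hardware_concurrency());
    grain = std::max<std::size_t>(1, grain);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < hw; ++t) {
        workers.emplace_back([&] {
            for (;;) {
                const std::size_t begin = next.fetch_add(grain);
                if (begin >= n) break;
                body(begin, std::min(n, begin + grain));  // half-open chunk
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

Unlike TBB, this sketch starts fresh threads on every call and would still oversubscribe if called from inside another parallel loop; a production scheduler keeps one persistent pool and runs nested work on it.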
Lab Activity 3
Build the TBB version of Destroy the Castle, collect profile data and then analyze the data to compare this parallel strategy to the previous one.
Parallelization with TBB
[Figure: Scheme of parallelization with Windows*. Render, Physics, and Particles each run as a single task; only AI is split into several "update several AI units" subtasks.]
[Figure: Scheme of parallelization with Intel® TBB. Alongside Render, the frame's work is split into many small subtasks: "update several blocks", "update several AI units", and "update several particles".]
Task Graph
[Figure: task graph with MainTask, AITask, three AIBodyTasks, AIFinalTask, PhysicsTask and ParticlesTask (not expanded), and SyncTask. Arrows distinguish task creation order from task completion signals.]
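The dependency structure of the task graph can be sketched with `std::async`, with futures standing in for completion signals. This is not TBB's task API, and the edges are an assumption read from the figure labels: AITask fans out to three AIBodyTasks, AIFinalTask runs once they complete, and SyncTask runs after the AI, physics, and particles branches have all finished. The helper `done` and the `completed` log are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Records the order in which tasks finish (guarded: tasks run concurrently).
std::mutex log_mutex;
std::vector<std::string> completed;
void done(const std::string& name) {
    std::lock_guard<std::mutex> lock(log_mutex);
    completed.push_back(name);
}

// AITask: start the AIBodyTasks, wait for every completion signal (future),
// then run AIFinalTask.
void ai_task(int bodies) {
    std::vector<std::future<void>> body_tasks;
    for (int i = 0; i < bodies; ++i)
        body_tasks.push_back(std::async(std::launch::async, [] { done("AIBodyTask"); }));
    for (auto& f : body_tasks) f.get();
    done("AIFinalTask");
}

// MainTask: start AI, physics, and particles concurrently; SyncTask runs
// only after every branch has signalled completion.
void main_task() {
    auto ai = std::async(std::launch::async, ai_task, 3);
    auto physics = std::async(std::launch::async, [] { done("PhysicsTask"); });
    auto particles = std::async(std::launch::async, [] { done("ParticlesTask"); });
    ai.get();
    physics.get();
    particles.get();
    done("SyncTask");
}
```

`std::launch::async` typically starts a new thread per task, which is exactly the overhead TBB's scheduler avoids by running tasks on its fixed pool of worker threads; the sketch only shows the ordering constraints of the graph.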
Performance Profile – Multithreaded TBB with AI Run
Locks & Waits for the multithreaded TBB run with 2 extra AI tasks
• Good frame rate
• Good core utilization
Good utilization of the 4 cores
Limitation for Games
Intel® TBB is not intended for:
– I/O-bound processing
– Hard real-time processing
– Excessive use of explicit synchronization
However, it is compatible with other threading packages: it can be used in concert with Windows* threads, POSIX threads, etc.
Advantages for Games
Generic parallel algorithms
– You specify task patterns instead of threads
– Cross-platform implementation
Load balancing
– Adaptive tuning to variable computation
– Full support for nested parallelism
Efficient use of resources
– One scheduled thread per hardware thread
– Effective cache reuse
Easy to achieve scalability
Lab Activity 4
Analyze the TBB version with Parallel Amplifier
Summary
• Serial games get no benefit from multi-core
• We analyzed two different parallelization strategies, using core utilization as a key metric
• We made you aware of some of the game-threading materials we offer:
  – this 2-hour module, plus
  – 8 more hours covering game threading in depth, available through the Intel Software College academic community
Use this material in your game or 3D graphics curricula now
Call To Action
• Think about multithreading at the beginning of your project
• Think about scalable performance (for N cores, not just 2 or 4) for years to come
• Plan now how you can use this methodology in your curriculum