Cars, Boats and Planes: Optimizing Sonic & All-Stars Racing
Transformed for Ultrabook™ PCs with Touch and Sensors
Abstract
Sonic & All-Stars Racing Transformed is a game from Sumo Digital* published by Sega* on the PC and
several other gaming platforms including Xbox* 360, PS3*, PS Vita* and Wii* U. Optimizing the PC
version of the game proved a sizable task for Sumo Digital that yielded additional benefits for other
target platforms. Intel® engineers worked with Sumo Digital to ensure the PC version runs on par with
the other platforms and is optimized to take full advantage of PC technology. This case study describes
the techniques used to identify and overcome some of the obstacles encountered in adding sensor
control, touch support, and Intel® 3rd Generation Core™ optimization. GPUView and Intel® Graphics
Performance Analyzers (Intel® GPA) were used to identify GPU stall periods and track down the causes.
Eliminating these nearly doubled the average frame rate from 13 FPS to over 25 FPS on our test PC.
Figure 1. Screenshot of initial gameplay displaying frame rate of 13 FPS in upper-left corner
Overview
Sonic & All-Stars Racing Transformed is a fast-paced cross-platform multiplayer racing game. This case
study illustrates how the issues in performance and creating a balanced control experience were
identified, addressed, and resolved. Similar approaches can be useful in your own game development.
To maximize the reach of this game, Sumo Digital needed to port it to PC and touch-enabled and sensor-
enabled devices including tablets and Ultrabooks. Some touch and tilt support was available from
another game, but time constraints had limited previous development. This project allowed us to build
on that foundation.
In developing a PC title, it’s most efficient to support the widest range of platforms and environments
possible. Ideally, the game should run on all versions of the Windows* OS that are still used by a
considerable number of users - everything from XP* up to and including Windows 8. This was an
important factor in the choice of touch API, as the Windows 7 version was more widely supported than
that of Windows 8. Since the Windows 7 API is ‘lower level’ with access to the raw touch data, this
allowed existing touch functionality from the console versions to be repurposed.
The goal of setting a high bar for a PC product targeting processor graphics gave us a new opportunity: to
go beyond simply tweaking it for generic PC gaming and instead, fully optimize it for Intel 3rd Generation
Core. The main tools we used for performance analysis were GPUView, a tool to monitor GPU and CPU
activity, and Intel Graphics Performance Analyzers (Intel GPA). These utilities proved invaluable in
identifying and addressing GPU stalls which were causing drops in frame rate.
Touch Controls
None of the console versions were purely controlled by touch, so the design starting point was a UI that
treated touch as an ‘additive’ control system. The need for a touch-only front end led to the addition of
back buttons, large touch zones, and the rework of many screens to enlarge buttons for touch. A more
advanced in-game control settings screen was added, as seen in Figure 2.
Figure 2. Custom controls for touch
With the removal of a requirement to use a gamepad or keyboard, racing controls are primarily driven by
a virtual joystick or tilting the device, augmented by buttons displayed on the touch screen as seen in
Figures 2 and 3.
Figure 3. In-game controls showing virtual joystick and primary action buttons
To implement touch in a Windows 8 Desktop app with backwards compatibility, there are two event
models to choose from: WM_GESTURE and WM_TOUCH. A detailed article on Windows 8 touch input is
available in References.
WM_GESTURE has many gestures already defined, but it is more appropriate for navigation and
manipulation than real-time game control for multiple reasons. It uses a time delay to determine
whether a touch is a single press or the start of a gesture, such as a pan gesture, and it doesn’t allow for
multiple simultaneous gestures to be tracked. Sumo Digital had designed the touch interface to use both
hands for independent controls and wanted to repurpose as much of their existing console code as
possible.
The more suitable method was WM_TOUCH, which registers the raw touch events themselves. This
allows not only finer control but more robust options as multiple individual fingers can be tracked,
limited only by the touch screen hardware itself. The tradeoff is more control at the cost of a more
complex implementation.
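To illustrate why raw touch events suit two-handed control, here is a minimal C++ sketch of per-contact tracking. It is platform-neutral and is not Sumo Digital's code: TouchTracker, TouchPhase, and TouchPoint are hypothetical names. In a Windows message handler, the events would typically come from the TOUCHINPUT records that GetTouchInputInfo returns for a WM_TOUCH message, with the dwID field serving as the contact ID.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Phase of a raw touch event. On Windows these would map to the
// TOUCHEVENTF_DOWN / TOUCHEVENTF_MOVE / TOUCHEVENTF_UP flags.
enum class TouchPhase { Down, Move, Up };

struct TouchPoint {
    float x;
    float y;
};

// Hypothetical helper: keeps the latest position of every active contact,
// keyed by the contact ID reported with each raw event.
class TouchTracker {
public:
    void onTouch(std::uint32_t id, TouchPhase phase, float x, float y) {
        if (phase == TouchPhase::Up) {
            active_.erase(id);               // finger lifted: stop tracking it
        } else {
            active_[id] = TouchPoint{x, y};  // new contact or updated position
        }
    }

    std::size_t activeCount() const { return active_.size(); }

    // Returns nullptr if the contact is not currently down.
    const TouchPoint* find(std::uint32_t id) const {
        auto it = active_.find(id);
        return it == active_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<std::uint32_t, TouchPoint> active_;
};
```

Because each contact keeps its own entry, a steering thumb and an action-button thumb can be down at the same time without interfering with each other, which a single-gesture model cannot express.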
Due to a wide range of devices (and hands!), Sumo Digital opted to go with dynamically repositionable
controls, tied to the dominant steering hand’s touch contact point. Using dynamically repositionable
controls meant the key controls were always in a place suitable for both the hand size and the posture of
the person holding the device and could adapt if the player changed grip on the device while playing.
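One way to picture a dynamically repositionable control is a virtual joystick whose center is wherever the steering thumb first lands. The sketch below assumes that behavior; FloatingJoystick, Vec2, and the pixel radius are hypothetical and chosen for illustration, not taken from the game.

```cpp
#include <cmath>

struct Vec2 {
    float x;
    float y;
};

// Hypothetical floating joystick: the stick is re-centered on every new
// touch-down, so it follows the player's grip instead of a fixed screen spot.
class FloatingJoystick {
public:
    explicit FloatingJoystick(float radiusPixels) : radius_(radiusPixels) {}

    void onDown(Vec2 p) { origin_ = p; current_ = p; held_ = true; }
    void onMove(Vec2 p) { if (held_) current_ = p; }
    void onUp() { held_ = false; }

    // Stick deflection in [-1, 1] on each axis; zero while released.
    Vec2 value() const {
        if (!held_) return Vec2{0.f, 0.f};
        float dx = current_.x - origin_.x;
        float dy = current_.y - origin_.y;
        const float len = std::sqrt(dx * dx + dy * dy);
        if (len > radius_) {          // clamp the drag to the stick's radius
            dx *= radius_ / len;
            dy *= radius_ / len;
        }
        return Vec2{dx / radius_, dy / radius_};
    }

private:
    float radius_;
    Vec2 origin_{0.f, 0.f};
    Vec2 current_{0.f, 0.f};
    bool held_ = false;
};
```

If the player shifts grip and touches down somewhere else, the next onDown call simply moves the origin, so no explicit repositioning step is needed.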
Using the older Win7 touch API meant limited touch points, but Sumo Digital had already intended to
keep the controls simple and intuitive. The number of buttons was reduced by implementing auto-
accelerate and using drift to act as a brake when not turning. Clever clustering of key controls allows one
touch zone to be used to detect multiple button presses. Sumo Digital also added simple gesture
support; the player can begin a swipe gesture on the stunt button to control in-game stunts, which
removed the need for a second analogue joystick.
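A swipe that starts on a button can be reduced to a direction with very little code once raw touch positions are available. The following is a sketch under assumed rules, not the game's actual logic: the Rect hit test, the minimum swipe distance, and the four-way classification are all illustrative choices, and classifyStuntSwipe is a hypothetical name.

```cpp
#include <cmath>

enum class Swipe { None, Left, Right, Up, Down };

struct Rect {
    float left, top, right, bottom;
    bool contains(float x, float y) const {
        return x >= left && x < right && y >= top && y < bottom;
    }
};

// Classifies a drag that began on the stunt button. Screen coordinates are
// assumed to have y growing downward. Drags shorter than minDistance count
// as a plain button press and yield Swipe::None.
inline Swipe classifyStuntSwipe(const Rect& stuntButton,
                                float startX, float startY,
                                float endX, float endY,
                                float minDistance) {
    if (!stuntButton.contains(startX, startY)) return Swipe::None;
    const float dx = endX - startX;
    const float dy = endY - startY;
    if (std::sqrt(dx * dx + dy * dy) < minDistance) return Swipe::None;
    if (std::fabs(dx) >= std::fabs(dy)) {
        return dx > 0.f ? Swipe::Right : Swipe::Left;
    }
    return dy > 0.f ? Swipe::Down : Swipe::Up;
}
```

Mapping each direction to a stunt gives the button the expressive range of a second stick while occupying a single touch zone.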
Sensor Integration
To make use of the Ultrabook PC’s additional input methods, control was expanded beyond simple touch
to include inclinometers for steering the vehicles. These sensors measure the tilt of the device on all 3
axes. Since the vehicles in the game transform among land, sea, and air modes, this tilt control is ideal
to seamlessly transform 2-dimensional racing into 3 dimensions when the players take to the air.
Much of the underlying sensor code was created by Intel. The sensor library directly polls the sensor
once per frame providing a very fast response time when detecting changes in the device orientation.
Details on the sensor library can be found in the "Blackfoot Blade" case study listed in References with
many of the lessons learned in that title benefiting Sonic & All-Stars Racing Transformed. An article with
sample code that uses the library is also listed in References.
With the technical challenges of adding sensor support mostly solved using the Intel libraries, the actual
game play issues were expected to be the next area in need of attention. Unfortunately, sensor control
raised some interesting problems. First, many users would hold the same device in different ways; some
held the device like a wheel, some like a tray. They’d also steer by turning the device about different
axes. The problem wasn’t too hard to solve for the car and boat, as the game only has to worry about
steering in one axis, but planes were another story. This required dynamic recalibration of the default
sensor position, both whilst playing and at key points, for example when the vehicle transforms or
when the game is paused. Dynamic recalibration also handled ‘fringe’ cases like when the device is
passed to a friend so they can have a go, or the player lies back in bed, or pauses the game, puts the
device down, then comes back and holds the device in a different way.
A second problem occurred when comparing the responsiveness of touch and sensor controls to the
gamepad experience. With touch and sensor input, minor user errors lead to exaggerated negative
results. This was remedied by adding a steering assistance mechanic. This is an ‘additive’ system that
purely adds a varying amount of input to what the player is inputting, but in a way that doesn’t ‘fight’
the player or play the game for them.
GPU Optimization
Once the game’s component systems were largely complete, GPUView was used to check the GPU
performance. We noticed that there were significant gaps in the GPU hardware queue, where the
graphics are processed and rendered, as depicted in Figure 4. Ideally, the GPU would be running all the
time unless deliberately limited to conserve power, constantly queuing new frames while the current
frame is being rendered to maximize the frame rate.
Figure 4. GPUView shows GPU stalls as gaps among the top green bars
The bars on the top represent GPU activity, with recurring patterns indicating frames. The GPU should be
constantly active when running, but here we see gaps of 5-6 milliseconds per frame. It may seem like a
small delay, but this constitutes about 20% of the total frame drawing time and has a significant
impact on the game’s frame rate. These gaps coincided with stop-start behavior on the CPU. Thus the
CPU and GPU were virtually serialized, causing the delays. Using GPUView to investigate the DirectX
events around the stall points, it was found that Lock Allocation events were happening that were
configured for the CPU to wait for the GPU to complete its work. See Figure 5 for the event details that
line up with the red line in Figure 4.
Figure 5. GPUView metric for the Lock Allocation event most likely to be the cause of the stall.
GPUView also allows the developer to show details of the memory that was being locked at this point.
This is shown below in Figure 6. Note the allocation handle is the same at 0xFFFFFFA800B784330. The
important thing to notice is the lock is on a resource that is a D3DDDIFMT_R32F texture format and is
1x1 pixels in size.
Figure 6. GPUView metric for the memory being locked.
This was enough information to investigate the likely cause of the lock in GPA. In GPA we could view all
the Render Targets and Textures to find anything that was 1x1 and a 32-bit float. We could also look at
the API log to find problematic “LockRect” calls that caused the CPU to wait on the GPU. The Lock calls
are shown below in Figure 7.
Figure 7. GPA API log call showing all LockRect calls
The problem was traced to the CPU polling the GPU for data every frame. In this case, the CPU was
waiting until the GPU had rendered data into a 1x1 texture that was being used to calculate the average
luminosity of the screen for a technique called tone mapping. The GPU would then sit idle while the
CPU calculated the data needed for the tone mapping post-processing effect, and then built up
enough information to create a new DMA packet to send to the GPU and restart the
hardware queue. Ideally, the GPU should always have data prepared so that the CPU does not have to
wait to retrieve the data.
This problem was fixed by ensuring the CPU worked on data from a previous frame. The GPU resource
was first copied to a CPU readable resource using the DirectX function StretchRect. Two frames later, this
resource was locked, ensuring the GPU had completed the work before the CPU requested it. The CPU
lockable rendering surface would be selected from several spare surfaces in a “round robin” manner,
ensuring that the CPU was never asking for data that the GPU had not yet calculated.
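The round-robin scheme can be modeled without any graphics API. In the sketch below each slot stands in for one CPU-lockable surface and holds a single float (the 1x1 luminance value); DelayedReadback is a hypothetical name, and a real Direct3D 9 renderer would queue a GPU copy into the slot and lock the oldest surface instead of storing a float directly.

```cpp
#include <array>
#include <cstddef>

// Simulates a ring of CPU-readable staging surfaces used round-robin.
template <std::size_t Slots>
class DelayedReadback {
public:
    // Call once per frame: stands in for queuing the GPU copy of this
    // frame's result into the next surface in the ring.
    void submit(float gpuResult) {
        slots_[frame_ % Slots] = gpuResult;
        ++frame_;
    }

    // Reads the value submitted `latency` frames before the most recent
    // submit. Returns false when that frame does not exist yet or its slot
    // has already been reused, so the caller can fall back to a default.
    bool read(std::size_t latency, float& out) const {
        if (latency >= Slots || frame_ <= latency) return false;
        out = slots_[(frame_ - 1 - latency) % Slots];
        return true;
    }

private:
    std::array<float, Slots> slots_{};
    std::size_t frame_ = 0;
};
```

With three slots and a latency of two, the CPU always consumes a result the GPU finished two frames earlier, so a lock on that slot should not have to wait. The cost is that tone mapping reacts two frames late, which is normally imperceptible for an average-luminance value.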
Figure 8. Optimized code metrics show smoother performance
As shown in Figure 8, the result is a much smoother frame workload from having removed the gaps in
both the GPU and CPU processing.
The optimization was further enhanced when Sumo Digital streamlined the post processing by
combining techniques. The original shadow and lighting calculation system generated and used a stencil
buffer for a three-pass system. A new platform-specific version of the code was created using a different
set of shader and zbuffer commands that streamlined the processing to a two-pass system without any
visual compromise.
In addition, GPA hardware metrics showed the pixel shaders to be bandwidth-limited in the texture
samplers. This code was reworked to allow some of the less complex shaders to pre-calculate the values
and store these into unused alpha channels. This allowed use of fewer textures in the post-process
shaders, giving a better ratio of math instructions to texture fetch instructions (which introduce latency).
The combination of the improved post processing, new shadow and lighting system, the elimination of
the GPU stall, and many other smaller optimizations resulted in the frame seen in Figure 9. Not
only is the frame rate more than doubled, the visual quality has also been improved with higher quality
lighting and with additional post processing effects including Ambient Occlusion.
Figure 9. Screenshot of completed gameplay, with frame rate of 29 FPS denoted in the upper-left corner
Conclusion
This case study demonstrates some solutions for typical obstacles in creating and optimizing touch-based
games. The work done on the PC version allowed Sumo Digital to back port many of the control
improvements to other versions of the game. The PC, with its larger, heavier devices compared to
phones and devices such as the PS Vita, raised control issues that weren’t previously noted. Solving these
problems benefitted all devices. The self-calibration of the inclinometer happened in time to ship with
the PS Vita version and made a big difference to the control. Making the right decisions in
implementation of sensors and touch can solve many problems in performance and user experience.
Tools such as Intel GPA are vital to find and capitalize on opportunities for optimization, preventing
unnecessary delays and taking full advantage of the hardware.
About the Authors
Brad Hill is a Software Engineer at Intel in the Developer Relations Division. Brad investigates new
technologies on Intel hardware and shares the best methods with software developers via the Intel
Developer Zone and at developer conferences. He is currently pursuing a Master of Science degree in
Computer Science at Arizona State University.
Leigh Davies is a senior application engineer at Intel with over 15 years of programming experience in the PC gaming industry. He is a member of the European Visual Computing Software Enabling Team, providing technical support to game developers. Areas of expertise include 3D graphics and, more recently, touch and sensors.
References
Comparing Touch Coding Techniques - Windows 8 Desktop Touch Sample:
http://software.intel.com/en-us/articles/comparing-touch-coding-techniques-windows-8-desktop-touch-sample
Implementing Touch and Sensors for Windows* 8 Desktop Games: Confetti Interactive’s* experiences developing "Blackfoot Blade":
http://software.intel.com/en-us/articles/implementing-touch-and-sensors-for-windows-8-desktop-games-confetti-interactive-s
Accessing Microsoft Windows* 8 Desktop Sensors:
http://software.intel.com/en-us/articles/accessing-microsoft-windows-8-desktop-sensors
Test PC Specifications
Ultrabook, Intel® Core™ i7-3667U CPU @ 2.00 GHz with HD4000 Graphics, 4 GB Memory. Windows 8 Pro
64-Bit OS. 5 point Touch support.
Notices
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,
EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS
GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR
SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR
IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR
INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly,
in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION
CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES,
SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH,
HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS'
FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY,
OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL
OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL
PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not
rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves
these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from
future changes to them. The information here is subject to change without notice. Do not finalize a design with this
information.
The products described in this document may contain design defects or errors known as errata which may cause the
product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your
product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may
be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions. Any change to any of those factors may cause the results
to vary. You should consult other information and performance tests to assist you in fully evaluating your
contemplated purchases, including the performance of that product when combined with other products.
Ultrabook™ products are offered in multiple models. Some models may not be available in your market. Consult your Ultrabook™ manufacturer. For more information and details, visit http://www.intel.com/ultrabook
*Other names and brands may be claimed as the property of others.
Copyright© 2013 Intel Corporation. All rights reserved.
Optimization Notice
http://software.intel.com/en-us/articles/optimization-notice/
Performance Notice
For more complete information about performance and benchmark results, visit www.intel.com/benchmarks