Cars, Boats and Planes: Optimizing Sonic & All-Stars Racing

Transformed for Ultrabook™ PCs with Touch and Sensors

Abstract

Sonic & All-Stars Racing Transformed is a game from Sumo Digital* published by Sega* on the PC and

several other gaming platforms including Xbox* 360, PS3*, PS Vita* and Wii* U. Optimizing the PC

version of the game proved a sizable task for Sumo Digital that yielded additional benefits for other

target platforms. Intel® engineers worked with Sumo Digital to ensure the PC version runs on par with

the other platforms and is optimized to take full advantage of PC technology. This case study describes

the techniques used to identify and overcome some of the obstacles encountered in adding sensor

control, touch support, and Intel® 3rd Generation Core™ optimization. GPUView and Intel® Graphics

Performance Analyzers (Intel® GPA) were used to identify GPU stall periods and track down the causes.

Eliminating these nearly doubled the average frame rate from 13FPS to over 25FPS on our test PC.

Figure 1. Screenshot of initial gameplay displaying frame rate of 13 FPS in upper-left corner

Overview

Sonic & All-Stars Racing Transformed is a fast-paced cross-platform multiplayer racing game. This case

study illustrates how the issues in performance and creating a balanced control experience were

identified, addressed, and resolved. Similar approaches can be useful in your own game development.

To maximize the reach of this game, Sumo Digital needed to port it to PC and touch-enabled and sensor-

enabled devices including tablets and Ultrabooks. Some touch and tilt support was available from

another game, but time constraints had limited previous development. This project allowed us to build

on that foundation.

In developing a PC title, it’s most efficient to support the widest range of platforms and environments

possible. Ideally, the game should run on all versions of the Windows* OS that are still used by a

considerable number of users - everything from XP* up to and including Windows 8. This was an

important factor in the choice of touch API, as the Windows 7 version was more widely supported than

that of Windows 8. Since the Windows 7 API is ‘lower level’ with access to the raw touch data, this

allowed existing touch functionality from the console versions to be repurposed.

The goal of setting a high bar for a PC product targeting processor graphics gave us a new opportunity: to

go beyond simply tweaking it for generic PC gaming and instead, fully optimize it for Intel 3rd Generation

Core. The main tools we used for performance analysis were GPUView, a tool for monitoring GPU and CPU

activity, and Intel Graphics Performance Analyzers (Intel GPA). These utilities proved invaluable in

identifying and addressing GPU stalls which were causing drops in frame rate.

Touch Controls

None of the console versions were purely controlled by touch, so the design starting point was a UI that

treated touch as an ‘additive’ control system. The need for a touch-only front end led to the addition of

back buttons, large touch zones, and the rework of many screens to enlarge buttons for touch. A more

advanced in-game control settings screen was added, as seen in Figure 2.

Figure 2. Custom controls for touch

With the removal of a requirement to use a gamepad or keyboard, racing controls are primarily driven by

a virtual joystick or tilting the device, augmented by buttons displayed on the touch screen as seen in

Figures 2 and 3.

Figure 3. In-game controls showing virtual joystick and primary action buttons

To implement touch in a Windows 8 Desktop app with backwards compatibility, there are two event

models to choose from: WM_GESTURE and WM_TOUCH. A detailed article on Windows 8 touch input is

available in References.

WM_GESTURE has many gestures already defined, but it is more appropriate for navigation and

manipulation than real-time game control for multiple reasons. It uses a time delay to determine

whether a touch is a single press or the start of a gesture, such as a pan gesture, and it doesn’t allow for

multiple simultaneous gestures to be tracked. Sumo Digital had designed the touch interface to use both

hands for independent controls and wanted to repurpose as much of their existing console code as

possible.

The more suitable method was WM_TOUCH, which registers the raw touch events themselves. This

allows not only finer control but more robust options as multiple individual fingers can be tracked,

limited only by the touch screen hardware itself. The tradeoff is finer control at the cost of a more

complex implementation.
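
To illustrate why raw touch events suit two-handed game controls, the sketch below tracks each finger independently by its contact ID. It is plain Python rather than Win32 code and is not taken from the game: the TouchEvent record and TouchTracker class are hypothetical, and only the three flag values mirror the TOUCHEVENTF_MOVE, TOUCHEVENTF_DOWN, and TOUCHEVENTF_UP constants defined for the Windows TOUCHINPUT structure.

```python
from dataclasses import dataclass

# Flag values as defined for TOUCHINPUT.dwFlags in the Windows SDK.
TOUCHEVENTF_MOVE = 0x0001
TOUCHEVENTF_DOWN = 0x0002
TOUCHEVENTF_UP = 0x0004


@dataclass
class TouchEvent:
    """Hypothetical stand-in for one TOUCHINPUT record."""
    contact_id: int   # plays the role of TOUCHINPUT.dwID
    x: float          # position in pixels (assumed already converted)
    y: float
    flags: int


class TouchTracker:
    """Hypothetical helper: latest position of every finger on the screen."""

    def __init__(self):
        self.active = {}  # contact_id -> (x, y)

    def handle(self, event):
        if event.flags & TOUCHEVENTF_UP:
            # Finger lifted: forget the contact.
            self.active.pop(event.contact_id, None)
        elif event.flags & (TOUCHEVENTF_DOWN | TOUCHEVENTF_MOVE):
            # New or moving finger: record where it is now.
            self.active[event.contact_id] = (event.x, event.y)


tracker = TouchTracker()
tracker.handle(TouchEvent(1, 100.0, 500.0, TOUCHEVENTF_DOWN))  # left thumb
tracker.handle(TouchEvent(2, 900.0, 520.0, TOUCHEVENTF_DOWN))  # right thumb
tracker.handle(TouchEvent(1, 130.0, 480.0, TOUCHEVENTF_MOVE))  # left thumb steers
tracker.handle(TouchEvent(2, 900.0, 520.0, TOUCHEVENTF_UP))    # right thumb lifts
print(tracker.active)  # {1: (130.0, 480.0)}
```

Because each contact carries its own ID, the steering thumb and the button thumb never interfere with each other, which is the property WM_GESTURE could not offer.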

Due to a wide range of devices (and hands!), Sumo Digital opted to go with dynamically repositionable

controls, tied to the dominant steering hand’s touch contact point. Using dynamically repositionable

controls meant the key controls were always in a place suitable for both the hand size and the posture of

the person holding the device and could adapt if the player changed grip on the device while playing.
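
A minimal sketch of a repositionable virtual joystick follows, assuming the stick simply re-centres on whatever point the steering thumb first touches. The VirtualJoystick class, its 80-pixel travel radius, and the axis naming are illustrative assumptions, not Sumo Digital's implementation.

```python
import math


class VirtualJoystick:
    """Hypothetical joystick whose centre is wherever the steering thumb lands."""

    def __init__(self, radius_px=80.0):
        self.radius_px = radius_px  # assumed finger travel for full deflection
        self.centre = None          # no centre until a finger touches down

    def touch_down(self, x, y):
        # Re-anchor the stick under the finger on every new press.
        self.centre = (x, y)

    def touch_up(self):
        self.centre = None

    def axes(self, x, y):
        """Return (horizontal, vertical) deflection, each in [-1, 1]."""
        if self.centre is None:
            return (0.0, 0.0)
        dx = (x - self.centre[0]) / self.radius_px
        dy = (y - self.centre[1]) / self.radius_px
        length = math.hypot(dx, dy)
        if length > 1.0:
            # Clamp to the unit circle so diagonal input is not stronger.
            dx, dy = dx / length, dy / length
        return (dx, dy)


stick = VirtualJoystick(radius_px=80.0)
stick.touch_down(200.0, 600.0)
print(stick.axes(240.0, 600.0))   # (0.5, 0.0): half deflection to the right
stick.touch_up()
stick.touch_down(350.0, 450.0)    # player shifts grip; the stick follows
print(stick.axes(350.0, 450.0))   # (0.0, 0.0): centred at the new position
```

Anchoring on touch-down rather than at a fixed screen position is what lets the same layout work for different hand sizes and grips.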

Using the older Win7 touch API meant limited touch points, but Sumo Digital had already intended to

keep the controls simple and intuitive. The number of buttons was reduced by implementing auto-

accelerate and using drift to act as a brake when not turning. Clever clustering of key controls allows one

touch zone to be used to detect multiple button presses. Sumo Digital also added simple gesture

support; the player can begin a swipe gesture on the stunt button to control in-game stunts, which

removed the need for a second analogue joystick.
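
One way a swipe that begins on a button could be distinguished from a plain tap is sketched below. The function name, the 40-pixel threshold, and the four-direction result are assumptions made for illustration; the game's actual gesture rules are not described in this case study.

```python
def classify_stunt_input(start, end, button_rect, min_swipe_px=40.0):
    """Return 'none', 'tap', or a swipe direction for a touch that began at start.

    Hypothetical helper. button_rect is (left, top, right, bottom) in screen
    pixels, with y growing downwards.
    """
    left, top, right, bottom = button_rect
    if not (left <= start[0] <= right and top <= start[1] <= bottom):
        return "none"            # touch did not begin on the stunt button
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if max(abs(dx), abs(dy)) < min_swipe_px:
        return "tap"             # too short to count as a swipe
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


stunt_button = (880.0, 480.0, 960.0, 560.0)
print(classify_stunt_input((910.0, 510.0), (915.0, 512.0), stunt_button))  # tap
print(classify_stunt_input((910.0, 510.0), (910.0, 420.0), stunt_button))  # up
```

Only the start point has to lie on the button, so the finger is free to travel outside it while the swipe is in progress.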

Sensor Integration

To make use of the Ultrabook PC’s additional input methods, control was expanded beyond simple touch

to include inclinometers for steering the vehicles. These sensors measure the tilt of the device on all 3

axes. Since the vehicles in the game transform among land, sea, and air modes, this tilt control is ideal

to seamlessly transform 2-dimensional racing into 3 dimensions when the players take to the air.

Much of the underlying sensor code was created by Intel. The sensor library directly polls the sensor

once per frame providing a very fast response time when detecting changes in the device orientation.

Details on the sensor library can be found in the "Blackfoot Blade" case study listed in References with

many of the lessons learned in that title benefiting Sonic & All-Stars Racing Transformed. An article with

sample code that uses the library is also listed in References.

With the technical challenges of adding sensor support mostly solved using the Intel libraries, the actual

game play issues were expected to be the next area in need of attention. Unfortunately, sensor control

raised some interesting problems. First, many users would hold the same device in different ways; some

held the device like a wheel, some like a tray. They also steered by turning the device about different

axes. The problem wasn’t too hard to solve for the car and boat as the game only has to worry about

steering in one axis, but planes were another story. This required dynamic recalibration of the default

sensor position, both whilst playing and at key points, for example when the vehicle transforms or

when the game is paused. Dynamic recalibration also handled ‘fringe’ cases like when the device is

passed to a friend so they can have a go, or the player lies back in bed, or pauses the game, puts the

device down, then comes back and holds the device in a different way.
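
The sketch below shows one possible shape for dynamic recalibration on a single tilt axis. It assumes two mechanisms: a hard reset of the neutral pose at key points, and a slow per-frame drift of the neutral pose towards the current reading while playing. The TiltCalibrator class, the 30-degree full-lock angle, and the drift rate are all hypothetical values chosen for illustration, not figures from the game.

```python
class TiltCalibrator:
    """Hypothetical helper: steering relative to a moving neutral pose."""

    def __init__(self, full_lock_deg=30.0, drift_rate=0.01):
        self.full_lock_deg = full_lock_deg  # assumed tilt for full steering lock
        self.drift_rate = drift_rate        # assumed per-frame pull of neutral towards the reading
        self.neutral_deg = None

    def recalibrate(self, raw_deg):
        """Snap neutral to the current pose (e.g. on transform or un-pause)."""
        self.neutral_deg = raw_deg

    def update(self, raw_deg):
        """Call once per frame with the raw angle; returns steering in [-1, 1]."""
        if self.neutral_deg is None:
            self.recalibrate(raw_deg)
        # Slowly follow the player's posture so a gradual change of grip is absorbed.
        self.neutral_deg += (raw_deg - self.neutral_deg) * self.drift_rate
        steer = (raw_deg - self.neutral_deg) / self.full_lock_deg
        return max(-1.0, min(1.0, steer))


calibrator = TiltCalibrator(full_lock_deg=30.0, drift_rate=0.0)  # drift off for clarity
print(calibrator.update(10.0))   # 0.0: first reading becomes the neutral pose
print(calibrator.update(25.0))   # 0.5: 15 degrees away from neutral
calibrator.recalibrate(25.0)     # e.g. the game was paused and resumed
print(calibrator.update(25.0))   # 0.0: the new grip is now neutral
```

With a non-zero drift rate, a sustained turn is gradually absorbed into the neutral pose, so in practice the rate would have to be kept small or suspended while the player is clearly steering.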

A second problem occurred when comparing the responsiveness of touch and sensor controls to the

gamepad experience. With touch and sensor input, minor user errors lead to exaggerated negative

results. This was remedied by adding a steering assistance mechanic. This is an ‘additive’ system that

purely adds a varying amount of input to what the player is inputting, but in a way that doesn’t ‘fight’

the player or play the game for them.
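
As a rough sketch of an additive assist, the function below adds a bounded nudge only when it points the same way the player is already steering, and scales that nudge by how hard the player is steering. The function name, the suggested_steer input (a value the game would have to derive from the track ahead), and the max_assist limit are assumptions for illustration rather than the shipped mechanic.

```python
def assisted_steering(player_steer, suggested_steer, max_assist=0.4):
    """Hypothetical additive assist. Inputs and result are in [-1, 1]."""
    error = suggested_steer - player_steer
    # Never oppose the player, and do nothing when they give no input,
    # so the assist neither 'fights' them nor plays the game for them.
    if player_steer * error <= 0.0:
        return player_steer
    # The harder the player steers, the more help they may receive.
    limit = max_assist * abs(player_steer)
    nudge = max(-limit, min(limit, error))
    return max(-1.0, min(1.0, player_steer + nudge))


print(assisted_steering(0.5, 0.1))   # 0.5: assist would oppose the player, so it stays out
print(assisted_steering(0.0, 0.8))   # 0.0: no player input, no assist
```

When the player under-steers in the right direction, for example assisted_steering(0.5, 0.9), the result moves part of the way towards the suggested value, limited by max_assist.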

GPU Optimization

Once the game’s component systems were largely complete, GPUView was used to check the GPU

performance. We noticed that there were significant gaps in the GPU hardware queue, where the

graphics are processed and rendered, as depicted in Figure 4. Ideally, the GPU would be running all the

time unless deliberately limited to conserve power, constantly queuing new frames while the current

frame is being rendered to maximize the frame rate.

Figure 4. GPUView shows GPU stalls as gaps among the top green bars

The bars on the top represent GPU activity, with recurring patterns indicating frames. The GPU should be

constantly active when running, but here we see gaps of 5-6 milliseconds per frame. It may seem like a

small delay, but this constitutes about 20% of the total frame drawing time and has a significant

impact on the game’s frame rate. These gaps coincided with stop-start behavior on the CPU. Thus the

CPU and GPU were virtually serialized, causing the delays. Using GPUView to investigate the DirectX

events around the stall points, it was found that Lock Allocation events were happening that were

configured for the CPU to wait for the GPU to complete its work. See Figure 5 for the event details that

line up with the red line in Figure 4.

Figure 5. GPUView metric for the Lock Allocation event most likely to be the cause of the stall.

GPUView also allows the developer to show details of the memory that was being locked at this point.

This is shown below in Figure 6. Note the allocation handle is the same at 0xFFFFFFA800B784330. The

important thing to notice is the lock is on a resource that is a D3DDDIFMT_R32F texture format and is

1x1 pixels in size.

Figure 6. GPUView metric for the memory being locked.

This was enough information to investigate the likely cause of the lock in GPA. In GPA we could view all

the Render Targets and Textures to find anything that was 1x1 and a 32-bit float. We could also look at

the API log to find problematic “LockRect” calls that caused the CPU to wait on the GPU. The Lock calls

are shown below in Figure 7.

Figure 7. GPA API log call showing all LockRect calls

The problem was traced to the CPU polling the GPU for data every frame. In this case, the CPU was

waiting until the GPU had rendered data into a 1x1 texture that was being used to calculate the average

luminosity of the screen for a technique called tone mapping. The GPU would then sit idle while the

CPU calculated the data needed for the tone mapping post-processing effect, and then built up

enough information to create a new DMA packet of information to send to the GPU and restart the

hardware queue. Ideally, the GPU should always have data prepared so that the CPU does not have to

wait to retrieve the data.

This problem was fixed by ensuring the CPU worked on data from a previous frame. The GPU resource

was first copied to a CPU readable resource using the DirectX function StretchRect. Two frames later, this

resource was locked, ensuring the GPU had completed the work before the CPU requested it. The CPU

lockable rendering surface would be selected from several spare surfaces in a “round robin” manner,

ensuring that the CPU was never asking for data that the GPU had not yet calculated.
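
The round-robin idea can be modelled without any graphics API. In the sketch below a small ring of slots stands in for the spare CPU-lockable surfaces: each frame writes one slot (the GPU-side copy) and reads the slot written a fixed number of frames earlier (the lock). The LuminanceReadbackRing class and its method names are hypothetical, and the two-frame latency is taken from the description above.

```python
class LuminanceReadbackRing:
    """Hypothetical model: the CPU reads a slot written `latency` frames ago."""

    def __init__(self, latency=2):
        self.latency = latency
        # One more slot than the latency, so the slot being written this frame
        # is never the one being read.
        self.slots = [None] * (latency + 1)
        self.frame = 0

    def end_frame(self, gpu_luminance):
        """Queue this frame's copy, then return the newest value that is safely old.

        Returns None for the first `latency` frames, before any result has aged enough.
        """
        write_index = self.frame % len(self.slots)
        self.slots[write_index] = gpu_luminance      # stands in for the GPU-side copy
        read_frame = self.frame - self.latency
        result = None
        if read_frame >= 0:
            result = self.slots[read_frame % len(self.slots)]  # stands in for the lock
        self.frame += 1
        return result


ring = LuminanceReadbackRing(latency=2)
readings = [ring.end_frame(lum) for lum in [0.40, 0.42, 0.45, 0.50, 0.48]]
print(readings)  # [None, None, 0.4, 0.42, 0.45]
```

The cost of this approach is that the luminance value used for tone mapping lags the image by a couple of frames, which is generally acceptable for a value that is meant to change smoothly.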

Figure 8. Optimized code metrics show smoother performance

As shown in Figure 8, the result is a much smoother frame workload from having removed the gaps in

both the GPU and CPU processing.

The optimization was further enhanced when Sumo Digital streamlined the post processing by

combining techniques. The original shadow and lighting calculation system generated and used a stencil

buffer for a three-pass system. A new platform-specific version of the code was created using a different

set of shader and zbuffer commands that streamlined the processing to a two-pass system without any

visual compromise.

In addition, GPA hardware metrics showed the pixel shaders to be bandwidth-limited in the texture

samplers. This code was reworked to allow some of the less complex shaders to pre-calculate the values

and store these into unused alpha channels. This allowed use of fewer textures in the post-process

shaders, giving a better ratio of math instructions to texture fetch instructions (which introduce latency).

The combination of the improved post processing, the new shadow and lighting system, and the elimination of

the GPU stall, together with many other smaller optimizations, resulted in the frame seen in Figure 9. Not

only is the frame rate more than doubled, the visual quality has also been improved with higher quality

lighting and with additional post processing effects including Ambient Occlusion.

Figure 9. Screenshot of completed gameplay, with frame rate of 29FPS denoted in the upper-left corner

Conclusion

This case study demonstrates some solutions for typical obstacles in creating and optimizing touch-

based games. The work done on the PC version allowed Sumo Digital to back port many of the control

improvements to other versions of the game. The PC, with larger, heavier devices than

phones and devices such as the PS Vita, raised control issues that weren’t previously noted. Solving these

problems benefitted all devices. The self-calibration of the inclinometer happened in time to ship with

the PS Vita version and made a big difference to the control. Making the right decisions in

implementation of sensors and touch can solve many problems in performance and user experience.

Tools such as Intel GPA are vital to find and capitalize on opportunities for optimization, preventing

unnecessary delays and taking full advantage of the hardware.

About the Authors

Brad Hill is a Software Engineer at Intel in the Developer Relations Division. Brad investigates new

technologies on Intel hardware and shares the best methods with software developers via the Intel

Developer Zone and at developer conferences. He is currently pursuing a Master of Science degree in

Computer Science at Arizona State University.

Leigh Davies is a senior application engineer at Intel with over 15 years of programming experience in the PC gaming industry. He is a member of the European Visual Computing Software Enabling Team, providing technical support to game developers. His areas of expertise include 3D graphics and, recently, touch and sensors.

References

Comparing Touch Coding Techniques - Windows 8 Desktop Touch Sample: http://software.intel.com/en-us/articles/comparing-touch-coding-techniques-windows-8-desktop-touch-sample

Implementing Touch and Sensors for Windows* 8 Desktop Games: Confetti Interactive’s* experiences developing "Blackfoot Blade": http://software.intel.com/en-us/articles/implementing-touch-and-sensors-for-windows-8-desktop-games-confetti-interactive-s

Accessing Microsoft Windows* 8 Desktop Sensors: http://software.intel.com/en-us/articles/accessing-microsoft-windows-8-desktop-sensors

Test PC Specifications

Ultrabook, Intel Core™ i7-3667U CPU @ 2.00GHz with HD4000 Graphics, 4GB Memory. Windows 8 Pro

64-Bit OS. 5 point Touch support.

Notices

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE,

EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS

GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR

SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR

IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR

WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR

INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly,

in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION

CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES,

SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH,

HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS'

FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY,

OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL

OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL

PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not

rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves

these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from

future changes to them. The information here is subject to change without notice. Do not finalize a design with this

information.

The products described in this document may contain design defects or errors known as errata which may cause the

product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your

product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may

be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Software and workloads used in performance tests may have been optimized for performance only on Intel

microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer

systems, components, software, operations and functions. Any change to any of those factors may cause the results

to vary. You should consult other information and performance tests to assist you in fully evaluating your

contemplated purchases, including the performance of that product when combined with other products.

Ultrabook™ products are offered in multiple models. Some models may not be available in your market. Consult your Ultrabook™ manufacturer. For more information and details, visit http://www.intel.com/ultrabook

*Other names and brands may be claimed as the property of others.

Copyright© 2013 Intel Corporation. All rights reserved.

Optimization Notice

http://software.intel.com/en-us/articles/optimization-notice/

Performance Notice

For more complete information about performance and benchmark results, visit www.intel.com/benchmarks