Migrating Mesh Skinning Deformation Functionality from RSX to SPUs on the PlayStation® 3

Anders Rånes

September 29, 2010
Master's Thesis in Computing Science, 30 credits

Supervisor at CS-UmU: Stefan Johansson
Supervisors at Coldwood: Andreas Asplund & Olof Häggström
Examiner: Fredrik Georgsson

Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN


Abstract

In game development, performance is everything, and the Playstation 3 provides a unique platform for using parallelized code to achieve extremely high performance. In this master's thesis, animation with smooth skinning is migrated from being a GPU process to being a parallelized process that is 358% faster. The method is incorporated into an existing commercial game engine and integrated into a title currently in development for the Playstation 3. An in-depth study covers parallel processors, the CELL processor used in the Playstation 3, and how contemporary industry-leading game developers utilize the same unique architecture to increase the performance of their own games.


Contents

1 Introduction
2 Problem Description
  2.1 Problem Statement
  2.2 Goals
  2.3 Purposes
  2.4 Methods
  2.5 Related Work
3 Creating a frame with skinned objects
  3.1 Scene graph
  3.2 Vertex data
  3.3 Skeleton
  3.4 Culling
  3.5 Deformation
  3.6 Rendering
4 Parallelized computation on the Playstation 3
  4.1 Parallel processors
  4.2 The Playstation 3 processor and memory layout
  4.3 Leading industry Playstation 3 SPU utilization
    4.3.1 Santa Monica Studios
    4.3.2 Guerrilla Games
    4.3.3 Dice
  4.4 Emergent
    4.4.1 Gamebryo
    4.4.2 Floodgate
  4.5 Reflections
5 Accomplishment
  5.1 Preliminaries
  5.2 Existing skinning frameworks
    5.2.1 SPU skinning in Gamebryo 2.4
    5.2.2 Skinning in Gamebryo 2.3
    5.2.3 Proposed SPU-skinning solution
  5.3 How the work was done
6 Results
  6.1 Performance in a stand alone demo
  6.2 Performance gain in the actual game
7 Conclusions
  7.1 Limitations
  7.2 Future work
References

List of Figures

3.1 Example of a skinned mesh with skeleton, left: bindpose, right: animated.
4.1 A simplified view of the Playstation 3 processor and memory layout.
5.1 SPU-Skinning in Gamebryo 2.4.
5.2 Software Skinning in Gamebryo 2.3.
5.3 Hardware Skinning in Gamebryo 2.3.
5.4 SPU-skinning solution proposed for Gamebryo 2.3.

Chapter 1

Introduction

Parallelization is becoming more and more common in computing science in general and in game development in particular. It is an ever-present area of work in game development, with a constant need for refinement. One part of game development is the animation of on-screen objects, which for a long time has been done by skinning. Skinning gets its name from the analogy to the human body, where a collection of connected bones forms a skeleton and each point of the skin is connected to one or several bones. When the bones are moved, the points of skin move along with them. This creates much less work for animators, which has enabled them to animate much more complex models.

In this master's thesis, I propose a method for smooth skinning and incorporate it as a parallelized process in an existing game engine for the gaming platform Sony PlayStation® 3 (hereafter referred to as the Playstation 3). The work is done for the game developer Coldwood Interactive, Umeå, for one of their upcoming commercial titles.

In Chapter 2, the problem is described in more detail and the goals required by Coldwood are presented. In Chapter 3, basic concepts of 3D graphics are explained and the steps to create a frame with regard to skinning are presented. In Chapter 4, an in-depth study of how the industry utilizes the parallel nature of the Playstation 3 is presented. Chapter 5 describes the existing skinning code in the game engine used, Gamebryo, and my proposed solution. The performance results of the solution are presented in Chapter 6, and in Chapter 7 conclusions are drawn from the work and limitations of the solution are presented along with suggestions for future work.


Chapter 2

Problem Description

Games generally use as much processing power as possible to show as much, and as beautiful, graphics as possible in each frame, and the framerate must be high; 30 to 60 fps¹ is standard. Each frame leaves very little time in which many tasks have to be handled, such as input, animation, game logic, AI, and rendering. In recent years parallel processor architectures have become more and more common for both PCs and consoles. One current research topic in computing science is to determine which algorithms and processes can be parallelized and which cannot. Rendering has long been processed in parallel because of its natural ability to be split into non-interfering parts. Specialized graphics hardware has been standard in computer entertainment for a long time, but the rest of the game loop is only now starting to catch up. Most major game companies are making most of their performance progress by utilizing the parallel power of modern processors. The animation part is of high priority because it is large compared to the other non-rendering events in the game loop. It is, however, not something that can be easily parallelized.

2.1 Problem Statement

Coldwood Interactive AB (hereafter referred to as Coldwood) is an independent game studio, founded in 2003 in Umeå, Sweden. They develop video games for platforms such as PC, Playstation 2, Playstation 3, PSP and XBOX. The studio uses the rendering middleware Gamebryo for some projects; it provides solutions to several common game development tasks, including a scene graph handler. This master's thesis only considers the use of Gamebryo on PC and the Playstation 3.

The Playstation 3 is equipped with one CPU (PPU), one GPU (RSX), and seven general purpose SIMD processing units (SPUs). When used traditionally, the RSX easily becomes choked and the total frame time suffers. The frame time is bound by MAXTIME(PPU, RSX), that is, by whichever of the two units takes longest, so the key is to balance the workload between all units so that no single unit exceeds the given time limit. Thus, on the Playstation 3 specifically, it would make sense to let the SPUs do GPU work, even though they would perform worse in a direct comparison.
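The balancing argument can be illustrated with a small sketch. All timings below are hypothetical numbers chosen for illustration, not measurements from the thesis, and the function name `frame_time_ms` is an assumption.

```python
def frame_time_ms(ppu_ms, rsx_ms):
    """PPU and RSX run in parallel, so the slower unit bounds the frame."""
    return max(ppu_ms, rsx_ms)

# Hypothetical per-frame costs in milliseconds (not measured values).
ppu_ms = 12.0
rsx_ms = 20.0        # assumed to include 5.0 ms of skinning in vertex shaders
skinning_ms = 5.0

before = frame_time_ms(ppu_ms, rsx_ms)

# Moving the skinning to the SPUs removes it from the RSX. The SPUs run
# alongside both units, so their cost only matters if it exceeds the others.
spu_ms = 8.0         # slower than the RSX at the same task, yet off the critical path
after = max(frame_time_ms(ppu_ms, rsx_ms - skinning_ms), spu_ms)

print(before, after)  # 20.0 15.0
```

In this made-up case the SPUs need more time for the skinning than the RSX did, yet the frame still gets shorter because the work no longer sits on the longest-running unit.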

Coldwood has identified smooth skinning as a particularly problematic task. The current implementation does one scene traversal to calculate world matrices for every object, but then the RSX is used to do the actual deformation, transforming the objects into the correct space for both shadowing and normal rendering. Traversal has a tendency to invoke "pointer chasing" that easily results in L2 cache misses, inflicting PPU penalties, which in turn delay RSX execution. The skinning also affects the RSX directly through extensive use of vertex shaders, because the skinning must be performed twice for objects that are both shadow casters and visible in the normal rendering. Finding a way to redirect this work to the SPUs could have a very positive impact on the performance of the RSX.

¹ Frames per second.

2.2 Goals

The following goals are required by Coldwood:

– Implement a smooth skinning solution that can be distributed and run on SPUs.

– Profile, track bottlenecks (stalls, inefficient code, cache misses, etc.) and compare the new implementation to the old.

– The solution should be integrated into the existing code base and will be used in a commercial product on the Playstation 3. The product uses peripherals which require a steady 60 Hz refresh rate (60 fps).

Additional optional goals include:

– Reduce cache miss penalties and use interleaved DMA transfers to further optimize memory usage.

– Propose future development to maximize the use of multiple processors.

– Introduce dual quaternion skinning [7] to eliminate skinning artifacts.

All required goals and all but the last optional goal are reached with the solution proposed and implemented in this master's thesis.

2.3 Purposes

The main purpose of the solution proposed in this master's thesis is to reduce the time the animation takes each frame, so that more PPU and GPU power is free for other tasks, such as more advanced graphics or simply more animated entities in a scene.

2.4 Methods

The goals expressed by Coldwood gave a clear path for the work to follow. First, the game engine Gamebryo had to be analysed along with the platform and several techniques for SPU parallelization. The extent of support for smooth skinning and SPU work in the engine would dictate which parts could be reused or reshaped, and which would have to be created from scratch. After the design phase, the solution would be implemented and integrated into the existing game.


2.5 Related Work

Coldwood does not currently use the latest version of Gamebryo, so more recent Gamebryo versions, which utilize SPU skinning, are important related work. Other interesting work relating to this master's thesis in a more general way is the progress that other Playstation 3 developers, and SCE² themselves, are making in regard to SPU utilization. Companies like Santa Monica Studios, Dice, Guerrilla Games, and many more are all trying to harness the powerful SPUs on the Playstation 3.

² Sony Computer Entertainment.

Chapter 3

Creating a frame with skinned objects

This chapter introduces the reader to common terminology and techniques in modern 3D graphics. In particular, it focuses on how skinned objects are created and processed for each frame in a computer game. Each frame actually involves many more calculations, such as physics simulation, game logic, sound effects, and input handling. Several books have been written about each of those subjects, so to keep this chapter short, all subjects not concerning how skinned objects are processed are omitted completely or mentioned only briefly. For the same reason several subjects are simplified and are in reality much more complex.

3.1 Scene graph

A very common approach to storing an entire scene of 3D objects is to use a scene graph. The scene graph starts with a root node positioned at the origin of the world, and all objects know only their position relative to their parent node. This way a complex object can be placed easily in the scene without explicitly placing every object it is composed of. For example, a car wheel only needs to know where on the car it is positioned, not where the car itself is positioned in the world.
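The car wheel example can be sketched as follows. This is a minimal illustration that assumes nodes carry translations only (a real scene graph also stores rotation and scale), and the class and node names are hypothetical.

```python
class Node:
    """Scene graph node storing only its position relative to its parent."""

    def __init__(self, name, local_pos, parent=None):
        self.name = name
        self.local_pos = local_pos
        self.parent = parent

    def world_pos(self):
        # Walk up to the root, accumulating the relative offsets.
        x, y, z = self.local_pos
        node = self.parent
        while node is not None:
            px, py, pz = node.local_pos
            x, y, z = x + px, y + py, z + pz
            node = node.parent
        return (x, y, z)


root = Node("root", (0.0, 0.0, 0.0))
car = Node("car", (10.0, 0.0, 5.0), parent=root)
wheel = Node("front_left_wheel", (1.5, -0.5, 1.0), parent=car)

print(wheel.world_pos())  # (11.5, -0.5, 6.0)
```

Moving the car only requires changing `car.local_pos`; the wheel follows without being touched.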

3.2 Vertex data

In any application which displays 3D graphics with polygons, each triangle polygon is stored as three points in space. These points are called vertices. Each vertex is stored in an indexed list. To store the polygons, a second list is created which contains indices into the first list, where each 3-tuple describes one polygon. Since most vertices are used by several polygons, this approach ensures that each vertex is stored only once. This collection of polygons is called a mesh.
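As an illustration of the two lists, the sketch below stores a square as two triangles that share two vertices. The data and the helper name `triangles` are made up for the example.

```python
# Four vertices of a unit square in the z = 0 plane.
vertices = [
    (0.0, 0.0, 0.0),  # index 0
    (1.0, 0.0, 0.0),  # index 1
    (1.0, 1.0, 0.0),  # index 2
    (0.0, 1.0, 0.0),  # index 3
]

# Two triangles; each 3-tuple holds indices into `vertices`.
indices = [(0, 1, 2), (0, 2, 3)]


def triangles(vertices, indices):
    """Expand the index list into explicit triangles."""
    return [tuple(vertices[i] for i in tri) for tri in indices]


mesh = triangles(vertices, indices)
stored = len(vertices)        # 4 shared vertices
unindexed = 3 * len(indices)  # 6 vertices if every triangle stored its own copy
print(stored, unindexed)      # 4 6
```

The saving grows with mesh size, since in a closed mesh most vertices are shared by several triangles.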

Depending on how the polygons will be rendered, different kinds of vertex data accompany the vertex position. Very common is the normal vector, which is needed for basic lighting. It is a vector which is orthogonal to the plane of the polygon and gives the polygon an upside and a downside. For more advanced rendering effects other data is needed; most common are the tangent and binormal vectors. These are orthogonal to each other and to the normal vector, which makes them tangent to the plane of the polygon.

For skinned objects the vertex data is stored in what is called the bindpose. For humanoid objects the bindpose is called the T-pose because of the way the humanoids are positioned, with their arms straight out from their bodies. The purpose of the bindpose is to store the vertex data in a neutral pose, which the skinning can manipulate each frame.

3.3 Skeleton

Each skinned mesh has an associated skeleton. This skeleton is composed of a tree of bones all originating from a single root bone, which is placed in the scene graph. Thus, traversing the scene graph will traverse each bone in the skeleton. The vertices in the mesh are associated with one or several bones in the skeleton, and when the bones move the associated vertices move according to how the influences of the bones are weighted. See Figure 3.1 for an example of a skeleton illustrated with its associated skinned mesh.

Figure 3.1: Example of a skinned mesh with skeleton, left: bindpose, right: animated.

The animations for the skinned objects are stored as keyframes, each positioning the bones of the skeleton in a specific pose. Each keyframe has an associated time, starting at zero for the first keyframe. During an animation, at each frame, the current time is compared with the keyframe data to find the two keyframes closest to the current time on the keyframe timeline. These are then interpolated to find a pose for the skeleton which is a blend of the two keyframes' poses.
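The keyframe lookup and blend can be sketched for a single bone as follows. The sketch assumes linear interpolation of a position; bone rotations are commonly blended with quaternion interpolation instead, which is left out here. The function name `sample` and the data are hypothetical.

```python
from bisect import bisect_right


def sample(keyframes, t):
    """Linearly interpolate a bone position at time t.

    keyframes: list of (time, position) sorted by time, first time is 0.
    Times outside the timeline clamp to the first or last keyframe.
    """
    times = [k[0] for k in keyframes]
    i = bisect_right(times, t)
    if i == 0:
        return keyframes[0][1]
    if i == len(keyframes):
        return keyframes[-1][1]
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    a = (t - t0) / (t1 - t0)  # 0 at the earlier keyframe, 1 at the later
    return tuple((1 - a) * p + a * q for p, q in zip(v0, v1))


keys = [
    (0.0, (0.0, 0.0, 0.0)),
    (1.0, (2.0, 0.0, 0.0)),
    (2.0, (2.0, 4.0, 0.0)),
]
print(sample(keys, 0.5))  # (1.0, 0.0, 0.0)
print(sample(keys, 1.5))  # (2.0, 2.0, 0.0)
```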


The bone positions from the interpolated keyframe data are relative to parent bones in the skeleton tree and must, after the entire skeleton is updated, be recalculated in absolute, or global, terms to be used in the deformation later on.
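One way to perform this recalculation is a single pass over the bones in an order where every parent precedes its children. The sketch below uses 2D homogeneous matrices and pure translations to stay short; the bone layout and all names are assumptions made for the example, not Gamebryo's data structures.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def translate(x, y):
    """2D homogeneous translation matrix."""
    return [[1.0, 0.0, x], [0.0, 1.0, y], [0.0, 0.0, 1.0]]


# Bones listed so that every parent precedes its children; -1 marks the root.
parents = [-1, 0, 1]
local = [
    translate(0.0, 1.0),  # root, relative to the mesh origin
    translate(2.0, 0.0),  # upper arm, relative to the root
    translate(3.0, 0.0),  # forearm, relative to the upper arm
]


def to_global(parents, local):
    """global[i] = global[parent of i] * local[i], computed in one pass."""
    glob = []
    for i, p in enumerate(parents):
        glob.append(local[i] if p < 0 else matmul(glob[p], local[i]))
    return glob


glob = to_global(parents, local)
print(glob[2][0][2], glob[2][1][2])  # 5.0 1.0
```

The last bone ends up at (5, 1) in global terms, the sum of the offsets along its chain of parents.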

3.4 Culling

To render only what is shown on the screen, cullers are used to decide whether an object will be shown or not. If the object is not shown during the current frame, there is no need to deform the vertices and render the polygons of that object's meshes. One major culler is the view frustum culler, which culls all objects not inside the view frustum, the space visible from the camera. Other cullers can cull objects occluded by other objects and vertices which face away from the camera. There is a myriad of cullers, and they are executed at different times during the frame calculation to cull vertices as soon as possible and reduce the workload on later systems.
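A common way to implement a view frustum culler is to test a bounding sphere of each object against the frustum planes. The sketch below assumes that representation; the plane values describe a simple box-shaped volume rather than a real perspective frustum, and the object names are made up.

```python
def sphere_visible(center, radius, planes):
    """Return False if the sphere lies fully outside any frustum plane.

    planes: list of (nx, ny, nz, d) with unit normals pointing into the
    frustum, so that nx*x + ny*y + nz*z + d >= 0 for points inside.
    """
    x, y, z = center
    for nx, ny, nz, d in planes:
        if nx * x + ny * y + nz * z + d < -radius:
            return False
    return True


# Hypothetical box-shaped "frustum": -10 <= x <= 10 and 1 <= z <= 100.
planes = [
    (1.0, 0.0, 0.0, 10.0),    # x >= -10
    (-1.0, 0.0, 0.0, 10.0),   # x <= 10
    (0.0, 0.0, 1.0, -1.0),    # z >= 1 (near plane)
    (0.0, 0.0, -1.0, 100.0),  # z <= 100 (far plane)
]

objects = {
    "hero": ((0.0, 0.0, 20.0), 1.0),
    "behind_camera": ((0.0, 0.0, -5.0), 1.0),
    "edge": ((10.5, 0.0, 50.0), 1.0),
}

# Only objects that pass the culler need to be deformed and rendered.
to_skin = [name for name, (c, r) in objects.items()
           if sphere_visible(c, r, planes)]
print(to_skin)  # ['hero', 'edge']
```

The object named `edge` has its center outside the volume but is kept, because part of its bounding sphere still reaches inside.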

3.5 Deformation

The deformation is the most costly step in the skinning animation, and is appropriately placed after most cullers. The deformation takes all vertex data, the bone matrices and the weights of all bone influences for each vertex. Each vertex knows what position it should have relative to one or several bones. Each vertex also has a weight associated with each bone influencing it, and these weights always sum to one. The weighted mean of these relative vertex positions is calculated and stored as skinned vertex data. For example, vertices in the middle of the forearm of a humanoid skinned mesh are influenced only by the forearm bone in the skeleton. The vertices positioned at the elbow are influenced by both the forearm and the upper arm, creating a seamless bend in the mesh at the elbow.
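The weighted mean described above is commonly known as linear blend skinning. The following is a minimal 2D sketch of it, assuming each bone is represented by one matrix that moves bindpose positions to their animated positions; the two-bone arm, the pivot and all names are hypothetical.

```python
import math


def rot(deg, px, py):
    """2D homogeneous matrix rotating by `deg` degrees about pivot (px, py)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, px - c * px + s * py],
            [s, c, py - s * px - c * py],
            [0.0, 0.0, 1.0]]


def apply(m, v):
    """Transform the 2D point v by the homogeneous matrix m."""
    x, y = v
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def skin(vertex, influences, bone_matrices):
    """Weighted mean of the vertex as moved by each influencing bone.

    influences: list of (bone_index, weight); the weights must sum to one.
    """
    assert abs(sum(w for _, w in influences) - 1.0) < 1e-6
    x = y = 0.0
    for bone, w in influences:
        bx, by = apply(bone_matrices[bone], vertex)
        x, y = x + w * bx, y + w * by
    return (x, y)


# Bone 0 (upper arm) stays in bindpose; bone 1 (forearm) bends 90 degrees
# about the elbow, which is assumed to sit at (2, 0).
bones = [rot(0.0, 0.0, 0.0), rot(90.0, 2.0, 0.0)]

mid_forearm = skin((3.0, 0.0), [(1, 1.0)], bones)
elbow = skin((2.0, 0.5), [(0, 0.5), (1, 0.5)], bones)
```

With these inputs `mid_forearm` comes out at approximately (2, 1), following the forearm rigidly, while `elbow` lands at approximately (1.75, 0.25), halfway between where each bone alone would place it.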

3.6 Rendering

Rendering is the process that actually draws the pixels on the screen. It takes the vertices, which up until now have been in world space, and transforms them to screen space. It is also responsible for applying all effects such as lighting. This step is always performed in hardware on the GPU, which is highly parallelized in a hierarchical design for vertex and pixel processing. Traditionally the deformation step is also performed on the GPU, as it is composed of matrix and vector operations, which the GPU specializes in.


Chapter 4

Parallelized computation on the Playstation 3

This in-depth study was done to gain an understanding of how the Playstation 3 works, and delves into its hardware, mainly the CELL processor. From there it continues with a survey of how leading Playstation 3 developers and SCE's own teams are utilizing the unique architecture to create games today.

4.1 Parallel processors

Processors can be divided into four distinct categories with regard to their parallel computation capabilities. These categories state whether the processor can apply a single instruction or multiple instructions each cycle, and whether it can apply them to a single piece or to multiple pieces of data. This division is called Flynn's Taxonomy and was proposed by Michael J. Flynn in 1966. The first processor category is SISD, Single Instruction Single Data. This is a processor that each cycle applies a single instruction to a single piece of data. This is often what is thought of when speaking generally of processors.

The second processor category is MISD, Multiple Instruction Single Data. Such processors apply multiple instructions to a single piece of data each cycle. This is not a widely used architecture, since SIMD and MIMD perform most parallel tasks much more efficiently. One could claim that all pipelined processors belong to the MISD category, but since each pipeline stage changes the data, the different instructions are not being applied to the same data.

The third processor category is SIMD, Single Instruction Multiple Data. Such processors perform a single instruction on several different pieces of data each cycle. This kind of processor exploits data level parallelism, which distributes data to different processing units. This is in contrast to task level parallelism, which distributes different executing threads to different processing units, such as in multi-core SISD processors. Modern specialized graphics hardware is an example of SIMD processing, where the input data to be rendered is divided, in each rendering step, between a larger and larger number of smaller and smaller specialized processing units.


The fourth processor category is MIMD, Multiple Instruction Multiple Data. Such processors perform multiple instructions on several different pieces of data each cycle. Almost all modern supercomputers are clusters of this kind of processor. This category almost overlaps with having several different processors, which raises the question of whether the processing units should share a common memory or whether memory should be distributed among them. Most common is a combination of the two, with different levels of the memory hierarchy being shared differently among the processors.

4.2 The Playstation 3 processor and memory layout

The Playstation 3 features two processors, the CELL [9] and the RSX [9], see Figure 4.1. The RSX 'Reality Synthesizer' was developed by Nvidia and Sony specifically for the Playstation 3 and is based on the NV47 chip from Nvidia's GeForce 7800 architecture [10]. It is the graphics accelerator of the Playstation 3. It is only discussed briefly, as it does not feature the unique parallel qualities which this in-depth study covers and plays a peripheral part in the thesis itself.

The main CPU of the Playstation 3 is the CELL Broadband Engine (CELL), which was co-developed by Sony, Toshiba, and IBM. The development started in 2001, and the Playstation 3, released in 2006, was the first commercial product featuring it. The CELL has a unique architecture which incorporates both MIMD and SIMD principles. The main processing unit is the POWER Processing Element (PPE), which incorporates the PowerPC Processing Unit (PPU). When the CELL is used as a standard SISD processor, this is the unit that performs all computation. Communication between all elements in the CELL goes through the Element Interconnect Bus (EIB). The EIB is a ring-structured bus with a theoretical bandwidth of 204.8 GB/s and a demonstrated bandwidth of 197 GB/s [11].

The EIB connects the Memory Interface Controller (MIC), which handles the main memory, to the CELL elements. Also connected to the EIB is the Flexible IO system (FlexIO), which acts both as a southbridge, connecting to all peripherals, and as a northbridge, connecting to the RSX. Additionally, eight Synergistic Processing Elements (SPEs) are present on the CELL chip, each containing a single Synergistic Processing Unit (SPU). On the Playstation 3 only seven of the SPEs are enabled, to gain a higher manufacturing yield. One of these seven is used only by the operating system and is thus never available to developers [11].

At this point it seems that the CELL is a MIMD processor with memory shared through the EIB, but that does not give the complete picture, as each SPE features a 256 KB Local Store (LS), making the memory layout both shared and distributed. The LS should not be confused with an L2 cache, as utilizing it in such a manner is highly inefficient. The greatest performance on an SPU is achieved by transferring a chunk of data that fits in the LS using DMA and then letting the SPU work independently on the data. The SPU then never has to stall on data fetches caused by cache misses [11] [1].
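The consequence for a task such as skinning is that the input has to be cut into pieces that fit in the LS before it is transferred. The sketch below only computes those pieces; it does not model DMA itself. The vertex size, the vertex count and the share of the LS set aside for code, stack and output are all assumptions made for the example.

```python
LOCAL_STORE_BYTES = 256 * 1024


def chunk_ranges(item_count, item_bytes, budget_bytes):
    """Yield (start, end) index ranges whose data fits in budget_bytes."""
    per_chunk = budget_bytes // item_bytes
    if per_chunk == 0:
        raise ValueError("a single item does not fit in the budget")
    for start in range(0, item_count, per_chunk):
        yield start, min(start + per_chunk, item_count)


# Hypothetical numbers: 20,000 vertices of 32 bytes each, with half of the
# local store assumed to be reserved for code, stack and output buffers.
budget = LOCAL_STORE_BYTES // 2
ranges = list(chunk_ranges(20_000, 32, budget))
print(len(ranges), ranges[0], ranges[-1])  # 5 (0, 4096) (16384, 20000)
```

Each range can then be handed to an SPU as an independent piece of work, since the skinning of one vertex does not depend on any other vertex.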

The CELL architecture as a whole resembles a MIMD architecture, but each SPU is in itself a SIMD processor specialized in vector, floating point and integer operations, while also having a complete instruction set for general purpose computations [11]. It becomes clear that the CELL processor is quite different from almost all other commercial multi-core processors, which feature either a SIMD or a MIMD architecture. This introduces many new parallelization opportunities as well as many challenges.

[Figure: block diagram of the CELL Broadband Engine. Eight SPEs (each an SPU with its LS), the PPE (PPU with L1 and L2 cache), the MIC (to main memory) and the FlexIO (to the RSX and peripherals) are connected through the EIB.]

Figure 4.1: A simplified view of the Playstation 3 processor and memory layout.

4.3 Leading industry Playstation 3 SPU utilization

There is certainly scientific research being performed on the parallel computational capabilities of the CELL processor, and quite often it is performed on clusters of Playstation 3s. Others utilize its processing capabilities by distributing applications to Playstation 3 units all around the world, in projects like Folding@Home and Rosetta@Home, to solve protein folding problems that require vast amounts of processing power. The in-depth study of this master's thesis instead concerns game development, and how market-leading developers utilize the parallel capabilities of the Playstation 3. The three developers studied are Santa Monica Studios, Guerrilla Games, and Dice. Game development is a very competitive market, so it is safe to assume that the developers will not divulge all information to anyone outside the company, as it is most likely regarded as highly confidential. They do, however, hold talks at game developer conferences, where they describe in general terms how they use the SPUs on the Playstation 3.


4.3.1 Santa Monica Studios

Santa Monica Studio is an internal SCE development studio that was started in 1999. At the Game Developers Conference in San Francisco in 2009, Jim Tilander and Vassily Filippov gave a presentation about their then-current game God of War III, which was later released in March 2010 [4]. The presentation showed how they utilized the SPUs in their engine.

They identified the three major stages in each frame as Simulation, Scene and Render. They start out by showing how the frames can be double buffered by pipelining the stages between a multi-core processor and a GPU: rendering on the GPU, and the simulation and scene stages on separate cores. Since all stages are processed in parallel, but for three different frames, the total frame time is bound only by the most time-consuming stage.
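The pipelining can be pictured with a small sketch that shows which frame each stage works on at a given step, together with the resulting frame interval. The stage costs are hypothetical and the function names are assumptions; the sketch is not taken from the presentation.

```python
STAGES = ["simulation", "scene", "render"]


def schedule(tick):
    """Frame index each stage works on at `tick`; None while the pipeline fills."""
    return {stage: (tick - i if tick >= i else None)
            for i, stage in enumerate(STAGES)}


def frame_interval_ms(stage_ms):
    """With all stages overlapped, a new frame completes every max(stage) ms."""
    return max(stage_ms.values())


stage_ms = {"simulation": 9.0, "scene": 6.0, "render": 14.0}  # hypothetical costs

print(schedule(0))  # {'simulation': 0, 'scene': None, 'render': None}
print(schedule(2))  # {'simulation': 2, 'scene': 1, 'render': 0}
print(sum(stage_ms.values()), frame_interval_ms(stage_ms))  # 29.0 14.0
```

The price of the shorter interval is latency: a frame still passes through all three stages before it reaches the screen.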

For the Playstation 3 they argue for using the SPUs as helper CPUs to relieve both the PPU and the RSX. Even though an SPU might be slower than the processor it relieves, there is a total gain due to the parallelization. The game is profiled, and costly operations are identified as candidates for SPU migration. Having code that is easily moved between the PPU and the SPUs is essential for development time, so they point out that it is important to keep memory accesses on the PPU limited, so that they can easily be swapped for DMA calls on the SPU. The RSX runs shader code, so RSX work will always need to be rewritten for the SPU. This approach enables on-demand optimization utilizing the SPUs.

4.3.2 Guerrilla Games

Guerrilla Games was founded in 2000 and has been a subsidiary of SCE since 2005. It is based in Amsterdam, is most famous for its Killzone series, and is currently developing the third installment. In February 2009 they released Killzone 2, and in March the same year they gave a presentation at the Game Developers Conference in San Francisco on the rendering technology used in the game [8].

The presentation mostly concerns deferred rendering and the graphics buffer layout, but at the end they show how they utilize the SPUs to generate display lists. To generate the display lists they saw a choice between a double-buffered approach and a ring-buffered approach, in which the display lists are generated just in time before the RSX needs them. However, both methods present problems. Double buffering requires a lot of memory, and ring buffering requires a lot of synchronization between the CPU and the GPU to prevent data not yet consumed from being overwritten. Moreover, on the Playstation 3 this synchronization cannot be achieved efficiently, since the SPUs execute asynchronously and possibly out of order.

They instead opted for a dynamic memory block allocation system for the display lists and the rendering resources, and had the RSX signal with a simple fence command when a block of memory was free to write to. So when an SPU starts a new task, it goes over the blocks and looks for one which is marked free. When it finds one it locks the block and goes on to perform whichever calculations the task includes. When the task is completed it writes back to the memory block, marks it free and continues to search for another free block. This works in perfect unison with the 256 KB LS on the SPEs, since each block in main memory can be made to fit exactly in the LS.
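The block search described above can be sketched as a small pool with a free/locked flag per block. This is an illustration only: the class and method names are hypothetical, the pool is single-threaded, and the real system would rely on atomic operations and the RSX fence to release blocks.

```python
from typing import List, Optional


class BlockPool:
    """Hypothetical fixed pool of equally sized memory blocks."""

    def __init__(self, block_count: int, block_size: int):
        self.block_size = block_size
        self.locked: List[bool] = [False] * block_count

    def acquire(self) -> Optional[int]:
        """Scan for the first free block, lock it and return its index."""
        for index, is_locked in enumerate(self.locked):
            if not is_locked:
                self.locked[index] = True
                return index
        return None  # no free block right now; the caller has to retry later

    def release(self, index: int) -> None:
        """Mark a block free again (assumed to be triggered by the consumer's fence)."""
        self.locked[index] = False


# Blocks sized to fit the 256 KB local store mentioned in the text.
pool = BlockPool(block_count=4, block_size=256 * 1024)
block = pool.acquire()
print("working in block", block)
pool.release(block)
```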


4.3.3 Dice

Dice was founded in 1988 under the name Digital Illusions by four students in a dorm room at Växjö University and later grew to become Sweden's biggest game developer. In 2004 EA bought the company and it became EA Dice. Their current main franchise is the Battlefield series, which started in 2002 and continues to be a best seller worldwide with its current installment. At SIGGRAPH 2009 in New Orleans, Johan Andersson gave a presentation called Parallel Graphics in Frostbite - Current & Future, where he gave a glimpse of how their game engine works, not just on the Playstation 3 but on parallel platforms in general[5].

They have designed the entire engine in terms of asynchronous jobs, and all cores execute these jobs, whether on the Playstation 3 with its two executing PPU threads and six SPUs, the XBOX 360 with 6 executing threads, or a PC with 2 to 8 executing threads. This view is very interesting as it compares the SPU setup of the Playstation 3 with other contemporary hardware setups. Some jobs are dependent on earlier jobs, and these dependencies can be used to generate a job graph for the entire engine. This job graph not only describes execution order but also shows sync points and how the workload is balanced at specific times during the execution of a frame. This makes bottlenecks easy to spot, so that they can be balanced out between all job consumers.
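A job graph of this kind can be sketched with the standard library's `graphlib` module (Python 3.9 or later). The job names and dependencies below are hypothetical and not taken from the Frostbite presentation; each "wave" is a set of jobs that may run in parallel, and the boundary between two waves corresponds to a sync point.

```python
from graphlib import TopologicalSorter


def schedule_in_waves(dependencies):
    """Group jobs into waves; every job in a wave has all its dependencies done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to get a stable order
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical jobs: each key maps to the set of jobs it depends on.
jobs = {
    "animation": set(),
    "physics": set(),
    "culling": {"animation", "physics"},
    "build_display_list": {"culling"},
}

for number, wave in enumerate(schedule_in_waves(jobs)):
    print("wave", number, wave)
```

A wave containing a single long job while many consumers are available is exactly the kind of bottleneck such a graph makes visible.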

4.4 Emergent

Emergent Game Technologies is a middleware developer, that is to say they only build the infrastructure other games need but no games themselves. This sets them apart from most game engine retailers, which are often part of a gaming company that utilizes and develops the engine for itself as well as selling it to other game developers. Their core product is the game engine Gamebryo and its toolset, a multi-platform engine which has been released in different versions since 1999 (between 1999 and 2005 by Numerical Design Limited).

4.4.1 Gamebryo

The Gamebryo game engine [2, 3] contains all components needed to create a game. Most notable among the engine components are a scene graph and a renderer for each supported console and version of DirectX. It also contains a physics framework, a sound system and, for smaller projects, a base application which can be used to quickly set up a test environment, demo or small game. The engine is accompanied by several tools for creating graphical assets such as models, levels and animations. In the exporter of the model creation tools it is possible to specify whether a particular model should be hardware or software skinned, as each renderer supports both methods. Gamebryo also includes a streaming framework called Floodgate to take advantage of platforms which offer multi-core processors.

4.4.2 Floodgate

As one of the platforms supported is the Playstation 3, support for parallelized execution on the SPUs has been developed for the Playstation 3 specific parts of Floodgate. Unlike, for example, Insomniac Games' approach to SPU utilization[6], Floodgate does not take direct control of all SPUs at launch, controlling synchronization and issuing itself. Instead Floodgate utilizes the thread library SPURS¹ developed by SCE. This allows other non-Gamebryo SPURS tasks to be executed alongside the tasks issued by Floodgate.

The main components of Floodgate are the kernel programs which are compiled and run on the SPUs. These accept streams of data as input and return an equally long stream of data as output. A task describes where to get the data for the input, where to store the data from the output, and which kernel is to be run on the SPU when the task is executed. Tasks can be chained together by setting one task's output stream as the input stream of another, which Floodgate then automatically schedules to run in the logically correct order.
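The relationship between kernels, streams and tasks can be sketched as follows. The `Task` class and the `chain` helper are hypothetical names chosen for illustration and are not the Floodgate API; the sketch only shows a kernel applied item by item and a second task consuming the first task's output stream.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical task: which kernel to run, where to read, where to write."""
    kernel: Callable[[float], float]
    input_stream: List[float]
    output_stream: List[float] = field(default_factory=list)

    def execute(self) -> None:
        # The output stream gets exactly one item per input item.
        self.output_stream[:] = [self.kernel(item) for item in self.input_stream]


def chain(first: Task, kernel: Callable[[float], float]) -> Task:
    """Create a task whose input stream is the output stream of 'first'."""
    return Task(kernel=kernel, input_stream=first.output_stream)


scale = Task(kernel=lambda x: x * 2.0, input_stream=[1.0, 2.0, 3.0])
offset = chain(scale, lambda x: x + 1.0)

# Run in dependency order; a real scheduler would derive this order itself.
for task in (scale, offset):
    task.execute()

print(offset.output_stream)
```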

Tasks are stored in workflows which contain one or more linked tasks. These workflows are what is submitted to the SPUs by the stream processor, which handles the status polling and eventual return of each workflow. Each workflow is scheduled both by the program, which sets a specific priority, and by the stream processor, according to when the workflow is submitted and when its results are needed. Results which are needed earlier in the game loop are issued before those which are needed later, and within each such synchronization group the priority decides which workflows are issued first.
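The two-level ordering described above amounts to sorting by synchronization group first and by priority second. The sketch below assumes integer sync points (lower means needed earlier) and integer priorities (higher means issued first); both conventions and all names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    sync_point: int  # assumed: lower value = result needed earlier in the game loop
    priority: int    # assumed: higher value = issued first within its sync group


def issue_order(workflows):
    """Earlier sync points first; within one sync point, higher priority first."""
    return sorted(workflows, key=lambda w: (w.sync_point, -w.priority))


pending = [
    Workflow("deform_crowd", sync_point=2, priority=1),
    Workflow("bone_matrices", sync_point=1, priority=5),
    Workflow("deform_hero", sync_point=2, priority=9),
]

print([w.name for w in issue_order(pending)])
```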

This explicit synchronization poses the non-trivial problem of avoiding stalls on the PPU by keeping it busy while the kernels are processing the streams on the SPUs. If the stream processor is asked for the result too soon, it will stall the PPU while waiting for the kernel to complete. Some reordering of the PPU code might be required to avoid stalls by keeping all processors busy at all times.
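The submit-early, wait-late pattern can be sketched with a worker thread standing in for an SPU. This is an analogy using Python's `concurrent.futures`, not Floodgate code; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(stream):
    """Stand-in for an SPU kernel: one output item per input item."""
    return [value * value for value in stream]


def frame(stream, other_work):
    """Submit the kernel first, do unrelated work, and only then wait for the result."""
    with ThreadPoolExecutor(max_workers=1) as worker:
        pending = worker.submit(kernel, stream)        # issued as early as possible
        side_results = [job() for job in other_work]   # keeps the caller busy meanwhile
        return pending.result(), side_results          # blocks only if the kernel is still running


result, side = frame([1, 2, 3], [lambda: "culling", lambda: "audio"])
print(result, side)
```

Calling `pending.result()` directly after `submit` would be the equivalent of asking the stream processor for the result too soon: the caller would simply sit idle until the kernel finished.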

4.5 Reflections

It is clear that the parallelization of code and full utilization of the SPUs is the single most important aspect of making Playstation 3 games faster, and thus of affording a deeper graphical expression, whether it be realism or non-realistic shading. The companies examined are all closely tied to SCE and are thus encouraged to divulge their thoughts on SPU utilization on one premise: if everyone shares their knowledge, all Playstation 3 games get better, which sells more Playstation 3 hardware, which in turn gives a bigger client base, which enables even more sales for upcoming titles.

Sharing knowledge is weighed against the notion that gaming is a highly competitive market, with top tier sales vastly outweighing all other game sales, so being number one often means success and everything else results in failure. This allows one to ponder how much they really divulge to each other about their truly cutting-edge technology, since this is a huge part of what sets their games apart from those of other companies and hopefully ranks them at the top.

Since most top tier games take several years to develop and remain a secret to the world until trademarks, domain addresses, and such are registered, and since all data found covers games which have already been released, I am confident in presuming that the data presented is in fact not their state-of-the-art research or, at most, only conceptually covers their current utilization of the Playstation 3 hardware.

¹ SPURS stands for "SPU Runtime System".

The same is true even for Emergent Game Technologies' Gamebryo engine. The version Coldwood uses is somewhat outdated, and it cannot be determined whether their latest software, for example, schedules tasks in a more efficient manner than the version used. That being said, the insight I have gained from Gamebryo is much larger than what I have learned of other companies' engines, mainly because I have had access to the full source code, documentation, and samples for Gamebryo. I also had the opportunity to work with people who have used it for a long time in several commercial game titles.


Chapter 5

Accomplishment

The work was carried out as in most general software development: by finding the most time-efficient way to migrate the functionality without compromising the efficiency of the code.

5.1 Preliminaries

One prerequisite was of course to use the game engine already in use, Gamebryo 2.3, which does not include SPU-skinning functionality. The Gamebryo 2.4 engine does, but it has a radically different approach to handling content, with mesh classes to which modifiers can be applied. So the SPU-kernel code was given, but the framework for using it was not. However, the threading framework Floodgate of Gamebryo 2.4 was deprecated because of an update to the Playstation 3 firmware, so Floodgate had to be imported from an even newer Gamebryo version, 2.6, which was released after the firmware update.

5.2 Existing skinning frameworks

The conclusion of the in-depth study (see Section 4.5) was that implementing a new threading framework to utilize the EDGE¹ SPU API directly would be too time-consuming and most likely would not give a very noticeable gain in performance. An even lower-level API called SPURS² is also available, which is the direct Playstation 3 API from the Playstation 3 SDK. But trying to re-invent the wheel would be even more time-consuming. The smartest and definitely the fastest solution was determined to be to import the SPU-kernel code from Gamebryo 2.4 and Floodgate 2.6 into Gamebryo 2.3, and then implement an efficient modifier framework to use the imported functionality with the Gamebryo 2.3 asset-handling code.

5.2.1 SPU skinning in Gamebryo 2.4

The first step was to analyse Gamebryo 2.4 by stepping through the executing code to see how the kernel jobs were issued and how the modifiers worked. The modifier of interest is the SkinningMeshModifier, which issues SPU jobs that calculate the bone matrices and then skin the model accordingly, see Figure 5.1.

Figure 5.1: SPU-Skinning in Gamebryo 2.4. (Sequence diagram spanning the PPU, SPU and RSX, with the participants SceneGraph, Mesh, SkinningMeshModifier, WorkflowManager, CalculateBoneMatrices Task, Deform Task and Renderer. During Update and UpdateDownWardPass, SubmitTasks adds the CalculateBoneMatrices task to the workflow as delayed; FlushTaskGroup submits the outstanding workflows to Floodgate and the tasks execute asynchronously. OnVisible, when the mesh has passed culling, calls CompleteTasks to wait for task completion and then SubmitTasks, which adds the Deform task to the workflow immediately. RenderImmediate calls CompleteTasks to wait for the Deform task and then RenderMesh.)

¹ EDGE, or Efficiently Distributed renderinG Engine, was designed by SCE to be an example engine for the Playstation 3, but is now a collection of game engine parts available to licensed Playstation 3 developers.

² SPU Runtime System

The first time the SkinningMeshModifier is applied is during the update traversal of the scene graph. For each bone, which is represented by an AVObject (the base object for all scene graph nodes in Gamebryo), the location data is updated by an animation controller. This data is part of the input to the CalculateBoneMatricesKernel, which calculates a transformation matrix for each bone, but since the data is relative to the parent object in the scene graph the kernel cannot be run until the entire scene graph is traversed. Luckily the global workflow manager allows tasks to be issued with a delay flag, which holds tasks in the workflow until a flush command is issued. This command is issued by the game loop itself after the update traversal is done, and causes the workflow manager to send the workflow with the CalculateBoneMatrices tasks to be executed on the SPUs.
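The delay flag and flush command can be sketched as a manager that parks delayed tasks until the traversal is finished. The class and method names below are hypothetical and simplified; they are not the Gamebryo workflow manager interface.

```python
class WorkflowManager:
    """Hypothetical manager that holds delayed tasks until flush() is called."""

    def __init__(self):
        self.delayed = []    # tasks waiting for the flush command
        self.submitted = []  # tasks already handed over for execution

    def add_task(self, task, delayed=False):
        if delayed:
            self.delayed.append(task)
        else:
            self.submitted.append(task)

    def flush(self):
        """Hand over every delayed task, in the order it was added."""
        self.submitted.extend(self.delayed)
        self.delayed.clear()


manager = WorkflowManager()

# During the update traversal: bone data is not final yet, so the tasks are delayed.
for bone_set in ("hero_bones", "enemy_bones"):
    manager.add_task(("calculate_bone_matrices", bone_set), delayed=True)

# After the traversal the game loop flushes, and only now do the tasks start.
manager.flush()
print(manager.submitted)
```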

After the update of the scene graph is completed, a culler culls each item in the scene graph. If a skinned mesh is deemed visible by the culler, it asks the SkinningMeshModifier, which asks the workflow manager, which in turn asks the kernel whether it has completed all the tasks in the workflow. This operation stalls the PPU until the tasks are done, so it is important to give the PPU enough work to complete before trying to get the result from the SPUs. When the kernel is complete, the SkinningMeshModifier issues a deformation task to the workflow manager. This task can be issued to the SPUs instantly, so the delayed flag is not set. The deformation task deforms each vertex, normal, binormal, and tangent (hereafter referred to as vertex data) according to the weight of each bone influencing that particular vertex data.
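The weighted deformation can be sketched for a single vertex position. The kernel source is not reproduced in this text, so the sketch assumes the common weighted-sum formulation (often called linear blend skinning), 3x4 row-major bone matrices and weights that sum to one; it handles positions only, whereas normals, binormals and tangents would use only the rotational part of each matrix.

```python
def transform_point(matrix, point):
    """Apply a 3x4 row-major affine matrix to a 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def skin_vertex(bind_position, influences, bone_matrices):
    """Blend the bone-transformed bind pose position by the bone weights.

    influences is a list of (bone_index, weight) pairs for this vertex.
    """
    result = [0.0, 0.0, 0.0]
    for bone_index, weight in influences:
        moved = transform_point(bone_matrices[bone_index], bind_position)
        for axis in range(3):
            result[axis] += weight * moved[axis]
    return tuple(result)


identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
shift_x = [[1, 0, 0, 2], [0, 1, 0, 0], [0, 0, 1, 0]]  # translate by 2 along x

# A vertex influenced equally by both bones ends up halfway between the two results.
skinned = skin_vertex((1.0, 1.0, 0.0), [(0, 0.5), (1, 0.5)], [identity, shift_x])
print(skinned)
```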

When it becomes time to render the mesh at the end of the game loop, each mesh waits for the completion of its deform task and then sends the deformed vertex data to the renderer, which displays it on the screen. This is possible since each mesh has both bindpose vertex data, which is sent to the deformation kernel each frame, and regular vertex data, which the kernel writes back to.

5.2.2 Skinning in Gamebryo 2.3

Figure 5.2: Software Skinning in Gamebryo 2.3. (Sequence diagram spanning the PPU and RSX, with the participants SceneGraph, Geometry, SkinInstance and Renderer. Update and UpdateDownWardPass lead to CalculateBoneMatrices in the renderer; RenderImmediate calls Deform on the skin instance and then RenderGeometry.)

The SPU-skinning solution in Gamebryo 2.4 utilizes a large set of functionality which does not exist in Gamebryo 2.3, and the features in the two versions are implemented very differently. So the next step was to compare the SPU-skinning in Gamebryo 2.4 with the skinning in Gamebryo 2.3. In Gamebryo 2.3 the skinning of each geometry can be chosen to be done by either software or hardware skinning, see Figures 5.2 and 5.3.

As in Gamebryo 2.4, the update traversal of the scene graph in Gamebryo 2.3 is performed at the beginning of each game loop. When it comes across a skinned geometry it tells the renderer to perform the PPU side operation CalculateBoneMatrices to calculate the bone matrices for that geometry. A culler is used in Gamebryo 2.3 too, but it is not important to the skinning and is therefore omitted from the figures.

Later in the game loop, when it is time to render the geometry, either the software or the hardware skinning is called. For software skinning each geometry tells its skin instance to deform each vertex data according to the positions of the bones that influence that vertex data. Since the geometry class only contains bindpose data, the result of the deformation is written directly to the RSX buffer, from where it is sent to the renderer to be drawn on the screen. The renderer renders the data it receives without modification.

Figure 5.3: Hardware Skinning in Gamebryo 2.3. (Sequence diagram spanning the PPU and RSX, with the participants SceneGraph, Geometry and Renderer. Update and UpdateDownWardPass lead to CalculateBoneMatrices in the renderer; RenderImmediate leads to RenderSkinnedGeometry.)

The hardware skinning in Gamebryo 2.3 is even simpler on the PPU side, hiding the more complex parts on the RSX. The calculation of the bone matrices is performed identically to the software skinning. However, when it becomes time to render, it uses a different render call to the RSX, which sends the bindpose vertex data of the geometry along with all the bones influencing the vertex data. The deformation is then performed on the RSX. This is a bad design choice for the Playstation 3, even though the RSX has specialized hardware for performing the vector and matrix operations needed by the deformation operation, because the RSX is tasked with all the shader code for a game. When there are SPUs available they should be utilized for exactly these kinds of operations, relieving the PPU and RSX of vector and matrix heavy operations. This skinning design is, however, the natural choice for the standard setup with CPU and GPU, e.g. on a PC or XBOX360, where the GPU is many times faster than the CPU at dealing with any vector and matrix heavy operations.

5.2.3 Proposed SPU-skinning solution

To integrate SPU-skinning in Gamebryo 2.3, one of the skinning methods would need to be modified to perform SPU-skinning where previously the game had used the hardware implementation. Otherwise a third variant would need to be implemented. To introduce as few potential errors as possible, it was decided that making few major changes and additions to the game engine's code-base was the best approach. Implementing a third rendering state would require changes throughout the code-base, so modifying an existing rendering state was chosen. Since the software skinning was not being used, it was the obvious candidate to modify. This also gives the possibility of having both implementations, SPU- and hardware-based, running side-by-side in the game and leaving it up to the designers to choose which method to use for each element. Another advantage of modifying the software skinning into SPU-skinning is that no changes to the RSX code are needed, since it already only renders the vertices it is given.

The functionality marked with green in Figure 5.4 had to be imported from Gamebryo 2.4, the functionality marked with yellow had to be heavily modified, and the functionality marked with red had to be implemented from scratch. The threading framework Floodgate from Gamebryo 2.6 had to be imported into the solution due to compatibility issues with the Playstation 3 SDK, but the kernels and all the needed specialized classes could be imported from Gamebryo 2.4, since Floodgate showed no noticeable functional difference between versions 2.4 and 2.6.

The main thing lacking from Gamebryo 2.3 which was needed for utilizing the imported SPU skinning was a framework that could feed the data to the kernels each frame. It would need to be attachable to geometries in the same way as SkinningMeshModifiers were attachable to meshes in Gamebryo 2.4, but also incorporate the functionality of meshes which is lacking in the geometry classes in Gamebryo 2.3. The major lacking functionality was a straight memory layout for bone data, such as weights and indices. These are originally placed in a tree structure in the skin instance bone data, making it impossible to send them as a contiguous stream to the skinning kernel. Thus they had to be extracted, straightened and stored in the kernel feeder at load time for each geometry.
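The straightening step can be sketched as follows. The input layout (one list of vertex/weight pairs per bone) and the fixed number of influences per vertex are assumptions made for illustration; the actual skin instance structure and the layout expected by the kernel are not specified here.

```python
def flatten_bone_data(per_bone_influences, vertex_count, max_influences=4):
    """Turn per-bone (vertex_index, weight) lists into two flat per-vertex arrays.

    Returns (indices, weights); entry vertex * max_influences + slot holds the
    bone index and weight of that vertex's slot. Unused slots keep weight 0.0.
    """
    indices = [0] * (vertex_count * max_influences)
    weights = [0.0] * (vertex_count * max_influences)
    used_slots = [0] * vertex_count

    for bone_index, influences in enumerate(per_bone_influences):
        for vertex_index, weight in influences:
            slot = used_slots[vertex_index]
            if slot >= max_influences:
                raise ValueError(f"vertex {vertex_index} has too many influences")
            offset = vertex_index * max_influences + slot
            indices[offset] = bone_index
            weights[offset] = weight
            used_slots[vertex_index] = slot + 1

    return indices, weights


# Hypothetical skin: bone 0 influences vertices 0 and 1, bone 1 influences vertex 1.
per_bone = [[(0, 1.0), (1, 0.25)], [(1, 0.75)]]
flat_indices, flat_weights = flatten_bone_data(per_bone, vertex_count=2, max_influences=2)
print(flat_indices, flat_weights)
```

Doing this once at load time means the per-frame cost is limited to streaming two flat arrays to the kernel.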

Heavy modifications were also needed in classes already used in Gamebryo 2.3 and in the deformation kernel from 2.6. The geometry classes needed to allow the kernel feeder mentioned previously to be attached. The updating of the geometry classes needed to be modified to issue CalculateBoneMatrices tasks through the kernel feeder instead of calculating them in the renderer class. When a culler deemed a geometry visible, it needed to complete that task and then issue a deformation task with the result of the CalculateBoneMatrices kernel together with the vertex data and the bone data as input.

In Gamebryo 2.3 the deform method in the skin instance deforms and braids the different streams of vertex data into the format that the RSX wants. However, it is possible to write directly to the graphics buffer on the RSX from the deformation kernel, so the deformation kernel had to be modified to write a single braided output stream and also take new input data for texture coordinates, as they were also needed in the stream. The texture coordinates could not be set up just once as in the original code, because when the LS of an SPU is written back from the kernel the entire memory block is copied to its destination; any holes in the stream not written would contain trash data. Any data that was needed in the braided vertex data stream in the graphics buffer would need to go through the deformation kernel.
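Braiding (interleaving) separate vertex streams into a single output stream can be sketched as below. The field order and the choice of three streams are assumptions made for illustration; the actual vertex format expected by the RSX is not given here.

```python
def braid(positions, normals, tex_coords):
    """Interleave per-vertex attributes into one flat stream.

    Every vertex contributes position, normal and texture coordinate in that
    order, so no gaps are left in the output block.
    """
    if not (len(positions) == len(normals) == len(tex_coords)):
        raise ValueError("all streams must describe the same number of vertices")

    stream = []
    for position, normal, tex_coord in zip(positions, normals, tex_coords):
        stream.extend(position)
        stream.extend(normal)
        stream.extend(tex_coord)
    return stream


braided = braid(
    positions=[(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
    normals=[(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)],
    tex_coords=[(0.5, 0.5), (0.0, 1.0)],
)
print(braided)
```

Because the whole block is copied out at once, passing the texture coordinates through the same function each frame is what keeps their slots in the stream from holding stale data.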

5.3 How the work was done

A small test environment was created which only included Gamebryo and some helper classes that quickly start up the environment for testing and creating smaller projects. Firstly, all software skinning was stripped from the engine and then the system was implemented for the engine's Microsoft Windows environment. When that was running, a switch to the Playstation 3 environment was made, still in the test environment. When that ran successfully it could be established that Gamebryo now provided SPU-skinning, and the work continued with making the real game take advantage of the new functionality. During the development on the Playstation 3, profiling was performed to find and fix bottlenecks.

Page 33: Migrating Mesh Skinning Deformation Functionality from RSX ... · parallilized process in an existing game engine for the gaming platform Sony PlayStation⃝R 3 (hereafter referred

5.3. How the work was done 25

Figure 5.4: SPU-skinning solution proposed for Gamebryo 2.3. With regard to Gamebryo 2.3, the colors of the functionalities have the following meaning. White: unmodified; yellow: the functionality is modified; red: new implementation; green: imported from Gamebryo 2.4. (Sequence diagram spanning the PPU, SPU and RSX, with the participants SceneGraph, Geometry, TestKernelFeeder, WorkflowMgr, CalcBoneMatrices, Deform and Renderer. During Update and UpdateDownWardPass, SubmitTasks adds the CalculateBoneMatrices task to the workflow as delayed; FlushTaskGroup submits the outstanding workflows to Floodgate and the tasks execute asynchronously. OnVisible, when the geometry has passed culling, calls CompleteTasks to wait for task completion and then SubmitTasks, which adds the Deform task to the workflow immediately. RenderImmediate calls CompleteTasks to wait for the Deform task, which writes to the G-buffer, and then RenderGeometry.)


Chapter 6

Results

Performance analysis was performed both in the test environment and in the actual title. The results in the test environment show how well the parallelization increased performance, and the results from the actual game show what kind of contribution the system gave to the final product.

6.1 Performance in a stand alone demo

In the test environment the application was almost entirely made up of animations: an entirely blank background with 16 non-textured characters with 80 bones each, running a looping animation over and over. The characters were of the same kind as the main characters in the actual game and contained 30428 vertices each, which formed 47184 triangles. So the system would calculate 1280 bone matrices and then deform 486848 vertices. Since the hardware skinning had not been removed, two meshes could be created, one for each skinning system. The implementations could then be examined head-to-head in the same application. With hardware skinning the application ran at 17 fps and with the new SPU-skinning it ran at 61 fps, so the absolute speedup¹ was 358% (61/17 ≈ 3.588, i.e. roughly 3.6 times the hardware-skinned frame rate). As mentioned in the in-depth study, there are 7 guaranteed SPUs on the Playstation 3, one of which is claimed by the operating system. Another is claimed by a different system running in Gamebryo, which leaves 5 SPUs. So it is clear that each single SPU is not as fast as the RSX, but combined they outperform it, even when working with a system whose goal is to fill the graphics buffer as fast as possible, work that the RSX was specifically designed for.
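The workload and speedup figures quoted above follow directly from the stated counts and frame rates, as this small check shows. Only the numbers given in the text are used; the per-frame times are derived values.

```python
characters = 16
bones_per_character = 80
vertices_per_character = 30428

bone_matrices = characters * bones_per_character
deformed_vertices = characters * vertices_per_character

hardware_fps = 17
spu_fps = 61
speedup = spu_fps / hardware_fps  # ratio of the two frame rates

# Frame time in milliseconds is the reciprocal of the frame rate.
hardware_frame_ms = 1000 / hardware_fps
spu_frame_ms = 1000 / spu_fps

print(bone_matrices, "bone matrices,", deformed_vertices, "vertices per frame")
print(f"speedup: {speedup:.3f}x")
print(f"frame time: {hardware_frame_ms:.1f} ms -> {spu_frame_ms:.1f} ms")
```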

6.2 Performance gain in the actual game

When the development of the game started, and up until the very last stages of testing and performance tweaking, the SPUs were hardly utilized at all. One of the 6 available SPUs was allocated by Gamebryo, but the other five were mostly idle. When the SPU-skinning solution was about to be merged into the game, a new fullscreen MLAA² algorithm had been implemented which used all the SPUs. The algorithm was originally intended for a game which runs at a lower frame rate than Coldwood's title, which has to run at 60 fps for its advanced physics to work properly. As a result, all the SPUs became occupied for almost the entire frame. There was simply no time to perform skinning for the large main character models while the MLAA was running. There were, however, a number of smaller models in each scene which could be skinned instead. There was no time to skin them on the RSX, but they had few enough bones and vertices to be skinned alongside the MLAA on the SPUs without creating a stall.

¹ Absolute speedup is determined by comparing with the execution time of the best sequential algorithm.

² Morphological Anti-Aliasing

Before the MLAA was introduced, the main characters were, however, for a short while skinned with the new SPU skinning solution, and a comparison was made between the old hardware skinning method and the new SPU method for the main characters. The difference between these actual characters and the versions in the test environment is that in the actual game the models have textures, normals, binormals, and tangents, which also need to be deformed. On average the frame rate increased from 61.7 fps to 65.25 fps, which is an increase of about 5.7%.

Page 37: Migrating Mesh Skinning Deformation Functionality from RSX ... · parallilized process in an existing game engine for the gaming platform Sony PlayStation⃝R 3 (hereafter referred

Chapter 7

Conclusions

The solution works very well in my opinion, and even though it did not get used to its full capacity in the title it still contributes to the title in a smaller way. The actual performance is irrefutable: 358% faster than hardware skinning in a test environment leaves no room for speculation on the value of parallelized code in gaming. All required goals have been reached and, thanks to the Floodgate library, all but one of the optional goals have been met; the exception is the introduction of dual quaternion skinning. I regard this as a complete success.

7.1 Limitations

The models which were created for this new SPU skinning were exported as software skinned models, which prevented the exporter from stripifying¹ the models. A possible solution would be to rewrite parts of the exporter to enable stripifying for software skinned models, or to export them as hardware skinned models and create a workaround in the program to identify and perform SPU skinning on those specific hardware skinned models only.
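The potential saving from stripifying can be estimated with a short calculation. It assumes an indexed mesh and the ideal case of one unbroken strip, which real exporters rarely achieve, so the strip figure is a lower bound rather than a measured result.

```python
def triangle_list_indices(triangle_count):
    """A plain triangle list stores three indices per triangle."""
    return 3 * triangle_count


def ideal_strip_indices(triangle_count):
    """One unbroken strip: two indices to start, then one new index per triangle."""
    return triangle_count + 2 if triangle_count > 0 else 0


triangles = 47184  # triangle count of one test character, as stated in the results

print("triangle list:", triangle_list_indices(triangles), "indices")
print("ideal strip:  ", ideal_strip_indices(triangles), "indices")
```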

7.2 Future work

Another kind of animation is blendshape animation, which combines different pre-posed shapes to achieve animation. This is performed on the RSX in Gamebryo 2.3, but there exists an SPU solution in later versions. It should be possible to migrate that functionality using much the same method as for the SPU smooth skinning, but with a slightly modified kernel feeder. Blendshape animation is used in the title for facial animations. Another thing that could be done is to modify the deformation kernel to perform dual quaternion skinning, a slightly more expensive algorithm that removes almost all anomalous artifacts that come from twisting bones too much.

¹ The process of storing all triangles in strips so that each new triangle uses the last two vertices and only one new vertex, which helps reduce the size of the model data.


Acknowledgements

Thanks to:

Christopher Holmberg
Andreas Asplund
Olof Häggström
Stefan Johansson
Dick Adolfsson
The rest of the Coldwood team


References

[1] Abraham Arevalo, Ricardo M. Matinata, Maharaja Pandian, Eitan Peri, Kurtis Ruby, Francois Thomas, and Chris Almond. Programming the Cell Broadband Engine Architecture. http://www.redbooks.ibm.com/redbooks/pdfs/sg247575.pdf.

[2] Emergent. Gamebryo 2.3 Documentation, 2007.

[3] Emergent. Gamebryo 2.5 Documentation, 2008.

[4] Jim Tilander and Vassily Filippov. Practical SPU Programming in God of War III. Game Developers Conference, March 2009. http://www.tilander.org/aurora/comp/gdc2009 Tilander Filippov SPU.pdf.

[5] Johan Andersson. Parallel Graphics in Frostbite – Current and Future. SIGGRAPH, New Orleans, 2009. http://s09.idav.ucdavis.edu/talks/04JAndersson-ParallelFrostbiteSiggraph09.pdf.

[6] Jonathan Garrett. SPU wrangling, job management and debugging. Game Developers Conference, 2009. http://www.insomniacgames.com/tech/articles/0809/files/gdc2009 gpu wrangling.pdf.

[7] Ladislav Kavan, Steven Collins, Jiri Zara, and Carol O'Sullivan. Geometric Skinning with Approximate Dual Quaternion Blending. ACM Transactions on Graphics 27(4), 2008.

[8] Michal Valient. The Rendering Technology of Killzone 2. Game Developers Conference, March 2009.

[9] Sony Computer Entertainment Inc. Press release, May 2005. http://www.scei.co.jp/corporate/release/pdf/050517e.pdf.

[10] Sony Computer Entertainment Inc. Game developer conference, 2006.

[11] Thomas Chen, Ram Raghaven, Jason Dale, and Eiji Iwata. Cell Broadband Engine Architecture and its first implementation. Technical report, IBM, Systems Performance, http://www.ibm.com/developerworks/power/library/pacellperf/, 2005.
