Umbra Ignite 2015: Rulon Raymond – The State of Skinning – a dive into modern approaches to...

Post on 10-Jan-2017

The State of Skinning

… Or How To Maintain Your Physique

Welcome! Tervetuloa!

Rulon Raymond, Sr. Engine Programmer

Introduction

1) Review
2) Evolution of techniques on console HW
3) The new hotness (hint: it’s a Clifford Algebra)
4) Extensions

DISCLAIMER: All screenshots and techniques presented are not associated with any specific title, project, or organization, unless otherwise stated.

Outline

What is Skinning?

I Was Skinning Long Before 3D Animated Models Were All The Rage

Step 2: Skinning!

What is Skinning?

Skinned Model, ready for drawing

Model Vertices

Bone Weights

Bone Transforms

What is Skinning?

• $v$: The initial vertex transform
• $w_i$: Array of bone weighting values
• $M_{j_i}$: Array of bone transforms
• $v'$: The final vertex transform
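The quantities listed above (initial vertex, bone weights, bone transforms, final vertex) combine as a weighted sum of bone-transformed positions. A minimal C sketch of that sum for a single vertex follows. The `Mat3x4` type, the `LinearBlendSkin` name, and the row-major layout are assumptions made for illustration, not code from the talk.

```c
#include <assert.h>

/* Hypothetical row-major 3x4 bone matrix: a 3x3 rotation plus a translation column. */
typedef struct { float m[3][4]; } Mat3x4;

/* Linear blend skinning for one vertex: out = sum_i w[i] * (bones[idx[i]] * v).
   The weights are assumed to sum to 1. 'out' must not alias 'v'. */
static void LinearBlendSkin( const Mat3x4 *bones, const int *idx, const float *w,
                             int numInfluences, const float v[3], float out[3] )
{
    assert( numInfluences > 0 );
    out[0] = out[1] = out[2] = 0.0f;
    for ( int i = 0; i < numInfluences; ++i )
    {
        const Mat3x4 *M = &bones[idx[i]];
        for ( int r = 0; r < 3; ++r )
        {
            /* Row r of M times the homogeneous point (v, 1). */
            const float p = M->m[r][0]*v[0] + M->m[r][1]*v[1]
                          + M->m[r][2]*v[2] + M->m[r][3];
            out[r] += w[i] * p;
        }
    }
}
```

With two bones weighted 0.5 each, one the identity and one a translation by (2, 0, 0), the vertex (1, 1, 1) lands halfway between the two transformed positions.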

Skinning on Consoles

• Sony PlayStation (1995)
• Geometry Transform Engine (GTE)

Skinning on Consoles

• Sony PlayStation 2 (2000)
• Vector Unit 0 (VU0)

Skinning on Consoles

• Microsoft Xbox (2001)
• NVIDIA GPU (DirectX 8.x)

Skinning on Consoles

• Sony PS3 (2006)
• Synergistic Processing Units (SPUs)

Skinning on Consoles

Why not use the GPU for skinning on Xbox 360 and PS3?

The CPUs/SPUs are actually quite fast.

Skinning Implementation

3X @ 3.2 GHz

6X @ 3.2 GHz (with many restrictions…)

Why not use the GPU for skinning on Xbox 360 and PS3?

Split Vertex Streams

Skinning on Consoles

Vertex

Stream 0: Position, Tangent Space
• Skinned

Stream 1: Colors, UVs, etc.
• Constant: sent straight to GPU
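The two-stream split above can be sketched as a pair of C structs. The field choices and names (`SkinnedVertex`, `ConstantVertex`, `SkinnedStreamBytes`) are hypothetical; the talk does not give an exact layout. The point is only that the skinning pass rewrites stream 0 every frame, while stream 1 is uploaded once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stream 0 (hypothetical layout): everything the skinning pass rewrites each frame. */
typedef struct {
    float position[4];
    float normal[4];
    float tangent[4];
} SkinnedVertex;

/* Stream 1 (hypothetical layout): written once at load time, never touched by skinning. */
typedef struct {
    uint8_t color[4];
    float   uv[2];
} ConstantVertex;

/* Bytes the skinning pass must write per frame for one model. */
static size_t SkinnedStreamBytes( size_t numVerts )
{
    assert( numVerts <= (size_t)-1 / sizeof( SkinnedVertex ) );  /* guard against overflow */
    return numVerts * sizeof( SkinnedVertex );
}
```

In this sketch each skinned vertex is 48 bytes, so splitting out the constant attributes keeps them out of the per-frame write traffic entirely.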

Why not use the GPU for skinning on Xbox 360 and PS3?

Unified Memory Architecture

Skinning on Consoles

// Just skinned a vertex. Now write it out as
// three 16-byte vectors
__stvx( skinnedVertexData0, vertsOutBuffer, 0 );
__stvx( skinnedVertexData1, vertsOutBuffer, 16 );
__stvx( skinnedVertexData2, vertsOutBuffer, 32 );
// Gah, why’d that take so long?

// ~20% faster!
// (F*&^% write-combine memory)
__stvx( skinnedVertexData0, vertsOutBuffer, 0 );
_WriteBarrier();
__stvx( skinnedVertexData1, vertsOutBuffer, 16 );
_WriteBarrier();
__stvx( skinnedVertexData2, vertsOutBuffer, 32 );

Why not use the GPU for skinning on Xbox 360 and PS3?

So you can use the GPU for other things.

Skinning on Consoles

• Microsoft Xbox One (2013)
• Sony PS4 (2013)
• AMD GCN GPU

Skinning on Consoles

[Diagram: GPU frame on two GCN Compute Units: Draw Calls, IDLE, Draw Calls, Post FX, IDLE]

Async Compute Skinning

Skinning on Consoles

[Diagram: GPU frame on two GPU Compute Units, with Skinning occupying the idle slots: Draw Calls, Skinning, Draw Calls, Post FX, Skinning]

[Diagram: pipelined frames]
• Generate Draw List (frame N): outputs Visible Models
• Async Compute Dispatch Thread: takes Model Skinning Workloads, outputs Skinned Model (frame N)
• GPU rendering (frame N-1)

Skinning on Consoles

Async Compute Skinning

Skinning on Consoles

MATH WARNING!

The standard approach to real-time skinning, used in almost every modern 3D game.

Linear Matrix Blend Skinning

Suffers from some well-documented problems...

The “candy wrapper” effect

Linear Matrix Blend Skinning

Mesh Volume Preservation

Example: “flat ass syndrome”

Linear Matrix Blend Skinning

Q: Why do these problems exist?
A: Let’s take a closer look at the underlying math…

Linear Matrix Blend Skinning

Apply the property of distributivity:
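The step can be written out as follows. This is a reconstruction in standard linear-blend notation rather than the slide's own equation: $v$ and $v'$ are the initial and final vertex, $M_{j_i}$ is the transform of the $i$-th influencing bone, and $w_i$ is an assumed name for its weight.

```latex
v' \;=\; \sum_{i} w_i \left( M_{j_i}\, v \right)
   \;=\; \left( \sum_{i} w_i\, M_{j_i} \right) v ,
\qquad \sum_{i} w_i = 1
```

Read this way, linear blend skinning first forms a single blended matrix $\sum_i w_i M_{j_i}$ and then applies it to the vertex, so the properties of that blended matrix determine the quality of the result.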

Linear Matrix Blend Skinning

To keep it simple: Let each $M_j$ represent a rigid transform.
• No scale, shear, …
• Most common scenario for skinning in games.

A linear combination of rigid transforms DOES NOT yield a rigid transform!
• Orthonormal matrices aren’t closed under addition.
• Scaling values can creep into the final vertex transforms.
• Extreme cases can result in rank-deficient matrices.
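A small numeric check makes these points concrete. The sketch below blends the identity with a rotation about the z axis and looks at the determinant of the result, which is 1 for a true rotation. The helper names `Mat3Blend` and `Mat3Det` and the flat row-major storage are assumptions for illustration.

```c
#include <assert.h>

/* out = wa*a + wb*b for row-major 3x3 matrices stored as 9 floats. */
static void Mat3Blend( const float a[9], float wa, const float b[9], float wb, float out[9] )
{
    assert( a && b && out );
    for ( int i = 0; i < 9; ++i )
        out[i] = wa*a[i] + wb*b[i];
}

/* Determinant: 1 for a rotation, 0 for a rank-deficient matrix. */
static float Mat3Det( const float m[9] )
{
    return m[0]*( m[4]*m[8] - m[5]*m[7] )
         - m[1]*( m[3]*m[8] - m[5]*m[6] )
         + m[2]*( m[3]*m[7] - m[4]*m[6] );
}
```

Blending the identity 50/50 with a 90 degree rotation about z gives a determinant of 0.5, so scale has crept in. Blending with a 180 degree rotation gives a determinant of 0: the x and y axes collapse onto the twist axis.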

Linear Matrix Blend Skinning

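[Figure: the vertex $v$, its two bone-transformed positions $M_{j_1} v$ and $M_{j_2} v$, and the blended result $v'$]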

Example: The “candy wrapper” artifact

The most common workaround to these issues is the addition of new bones.
• Hand-animated or procedural.
• Split the rotation of a joint, relative to its parent, into even increments, for a single axis only.
• Example: Arm Twist Bone
  • Parented to the shoulder and consistently represents exactly half its twist (roll) motion.

Linear Matrix Blend Skinning

Adding these bones is not free!

Memory and processing overhead.

Exact amount depends on actual implementation.

Linear Matrix Blend Skinning

Dual Quaternions to the rescue!
• But what exactly are they?
• Let’s start with a quick review of the vanilla variety of quaternions…

Linear Matrix Blend Skinning

Quaternions

Hamilton - 1843

A 4D extension of complex numbers

For our purposes all we care about is unit quaternions.
• Conveniently represent rotations.
• Conjugate:

Quaternions

$q^* = q^{-1}, \quad \|q\| = 1$

One important quaternion equation to note:

Applies a rotation to a 3D point
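The rotation of a 3D point by a unit quaternion is conventionally written $v' = q\,v\,q^*$, with $v$ treated as a quaternion whose scalar part is zero. A minimal C sketch of that product follows. It assumes the (x, y, z, w) storage order used by the code later in the talk; the function name `QuatRotatePoint` is hypothetical.

```c
#include <assert.h>

/* Rotate point v by the unit quaternion q = (x, y, z, w): out = q v q*.
   Expanded form: out = v + w*t + (u x t), where u = q.xyz and t = 2*(u x v).
   'out' must not alias 'v'. */
static void QuatRotatePoint( const float q[4], const float v[3], float out[3] )
{
    float t[3], c[3];
    assert( out != v );

    /* t = 2 * (u x v) */
    t[0] = 2.0f*( q[1]*v[2] - q[2]*v[1] );
    t[1] = 2.0f*( q[2]*v[0] - q[0]*v[2] );
    t[2] = 2.0f*( q[0]*v[1] - q[1]*v[0] );

    /* c = u x t */
    c[0] = q[1]*t[2] - q[2]*t[1];
    c[1] = q[2]*t[0] - q[0]*t[2];
    c[2] = q[0]*t[1] - q[1]*t[0];

    for ( int i = 0; i < 3; ++i )
        out[i] = v[i] + q[3]*t[i] + c[i];
}
```

A quaternion for a 90 degree rotation about z, (0, 0, sin 45°, cos 45°), maps the x axis onto the y axis.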

Quaternions

Similar in form to complex numbers. Stored as:

Dual Numbers

Conjugate

Multiplication
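In the standard definition, a dual number is $a + \varepsilon b$ with $\varepsilon^2 = 0$, its conjugate is $a - \varepsilon b$, and the product is $(a + \varepsilon b)(c + \varepsilon d) = ac + \varepsilon(ad + bc)$. The C sketch below implements exactly that; the `Dual` struct and function names are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical dual number type: value = re + eps*du, where eps*eps = 0. */
typedef struct { float re, du; } Dual;

/* Conjugate: negate the dual part. */
static Dual DualConj( Dual x )
{
    x.du = -x.du;
    return x;
}

/* Product: the eps*eps term vanishes, leaving re*re + eps*(re*du + du*re). */
static Dual DualMul( Dual x, Dual y )
{
    Dual r;
    r.re = x.re * y.re;
    r.du = x.re * y.du + x.du * y.re;
    return r;
}
```

Multiplying $\varepsilon$ by itself with `DualMul` returns zero in both fields, which is the property that separates dual numbers from complex numbers.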

Dual Numbers

Basically a quaternion whose elements are dual numbers.

Quaternion form:
• The scalar part is a dual number.
• The vector part is a dual vector.

Dual number form: $\hat{q} = q_a + \varepsilon q_b$
• $q_a$: “non-dual part”
• $q_b$: “dual part”
• Most useful for skinning.

Dual Quaternions

Multiplication:

Quaternion Conjugate:

Dual Conjugate:

Quaternion & Dual Conjugate:
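The operations named above follow from treating a dual quaternion as a non-dual quaternion plus $\varepsilon$ times a dual quaternion part, with $\varepsilon^2 = 0$. A C sketch under that standard definition follows. The `DualQuat` struct and all function names are hypothetical; the (x, y, z, w) component order matches the `dq[2][4]` arrays in the talk's code.

```c
#include <assert.h>

/* Hypothetical layout: real = non-dual part, dual = dual part, each (x, y, z, w). */
typedef struct { float real[4]; float dual[4]; } DualQuat;

/* Hamilton product out = a * b. 'out' must not alias 'a' or 'b'. */
static void QuatMul( const float a[4], const float b[4], float out[4] )
{
    assert( out != a && out != b );
    out[0] = a[3]*b[0] + a[0]*b[3] + a[1]*b[2] - a[2]*b[1];
    out[1] = a[3]*b[1] - a[0]*b[2] + a[1]*b[3] + a[2]*b[0];
    out[2] = a[3]*b[2] + a[0]*b[1] - a[1]*b[0] + a[2]*b[3];
    out[3] = a[3]*b[3] - a[0]*b[0] - a[1]*b[1] - a[2]*b[2];
}

/* (a0 + eps*a1)(b0 + eps*b1) = a0*b0 + eps*(a0*b1 + a1*b0). */
static DualQuat DualQuatMul( DualQuat a, DualQuat b )
{
    DualQuat r;
    float t0[4], t1[4];
    QuatMul( a.real, b.real, r.real );
    QuatMul( a.real, b.dual, t0 );
    QuatMul( a.dual, b.real, t1 );
    for ( int i = 0; i < 4; ++i )
        r.dual[i] = t0[i] + t1[i];
    return r;
}

/* Quaternion conjugate: negate the vector part of both halves. */
static DualQuat DualQuatConj( DualQuat q )
{
    for ( int i = 0; i < 3; ++i )
    {
        q.real[i] = -q.real[i];
        q.dual[i] = -q.dual[i];
    }
    return q;
}

/* Dual conjugate: negate the entire dual part. */
static DualQuat DualQuatDualConj( DualQuat q )
{
    for ( int i = 0; i < 4; ++i )
        q.dual[i] = -q.dual[i];
    return q;
}

/* Quaternion and dual conjugate combined. */
static DualQuat DualQuatBothConj( DualQuat q )
{
    return DualQuatDualConj( DualQuatConj( q ) );
}
```

Two pure translations make an easy sanity check: their product is the translation by the summed offset, and a unit dual quaternion times its quaternion conjugate is the identity.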

Dual Quaternions

$\mathrm{Norm}(\hat{q}) = \|q_a\| + \varepsilon \, \dfrac{\langle q_a, q_b \rangle}{\|q_a\|}$

Dual Quaternions

$\hat{q}^* = \hat{q}^{-1}, \quad \|\hat{q}\| = 1$

Rigid Transforms:

Dual Quaternions

Transforming a 3D point

Dual Quaternions

Geometric Interpretation

Recall:
• A dual quaternion representing only a rotation
• A translation vector, in quaternion form

The dual angle:
• Angle of rotation
• Translation along the rotation axis

A unit dual quaternion with a 0 scalar part:
• Direction of axis of rotation
• Moment of rotation axis

Screw Transform!
• Rotation about an axis followed by translation along that axis.
• All rigid transforms can be described this way.

Dual Quaternions

Simple Case:

Dual Quaternion Blend Skinning

[Figure: blending $q_0$ and $q_1$ into $q_{DQB}$]

Dual Quaternion Blend Skinning
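For the rotation-only case, the blend reduces to a weighted sum of two quaternions followed by a renormalized rotation. The C sketch below blends the identity with a 180 degree twist about z, the same inputs that collapse a blended matrix to rank 2. It is illustrative only: the function names are hypothetical, and dividing by the squared length instead of normalizing is a choice made here to avoid a square root, not something stated in the talk.

```c
#include <assert.h>

/* out = (1 - w1)*q0 + w1*q1, deliberately left un-normalized. */
static void QuatBlend2( const float q0[4], const float q1[4], float w1, float out[4] )
{
    assert( w1 >= 0.0f && w1 <= 1.0f );
    for ( int i = 0; i < 4; ++i )
        out[i] = ( 1.0f - w1 )*q0[i] + w1*q1[i];
}

/* Rotate v by a non-zero, not necessarily unit, quaternion q = (x, y, z, w):
   out = q v q* / |q|^2
       = ((w^2 - |u|^2) v + 2 (u.v) u + 2 w (u x v)) / |q|^2, with u = q.xyz. */
static void QuatRotateUnnormalized( const float q[4], const float v[3], float out[3] )
{
    const float uu = q[0]*q[0] + q[1]*q[1] + q[2]*q[2];
    const float n2 = uu + q[3]*q[3];
    const float uv = q[0]*v[0] + q[1]*v[1] + q[2]*v[2];
    float c[3];
    assert( n2 > 0.0f );

    /* c = u x v */
    c[0] = q[1]*v[2] - q[2]*v[1];
    c[1] = q[2]*v[0] - q[0]*v[2];
    c[2] = q[0]*v[1] - q[1]*v[0];

    for ( int i = 0; i < 3; ++i )
        out[i] = ( ( q[3]*q[3] - uu )*v[i] + 2.0f*uv*q[i] + 2.0f*q[3]*c[i] ) / n2;
}
```

With equal weights the blended quaternion is a 90 degree rotation about z, so the point (1, 0, 0) moves to (0, 1, 0) and keeps its distance from the axis.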

Unlike with matrix blending, the result is always a rigid transform!

Very accurate, but not perfect.
• Can introduce accelerations when input dual quaternions differ greatly.
  • 8.15 degrees: maximum rotational deviation
  • 15.1%: maximum translational deviation
• Modified SLERP can be used if absolute accuracy is required.
  • Efficiency tradeoff usually not worth it.

Dual Quaternion Blend Skinning

Must handle antipodality!
• Polarity rule:
• We want:
• Fix up all dual quaternions prior to skinning.

Dual Quaternion Blend Skinning

[Figure: $\hat{q}$ and $-\hat{q}$]

for ( all bones’ unit dual quaternions, dq[i] )
    if ( InnerProduct( dq[i], dq[parent[i]] ) < 0.0 )
        Negate( dq[i] );

Dual Quaternion Blend Skinning
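The parent-relative fix-up in the pseudocode above can be made concrete as follows. This sketch makes three assumptions the slide does not spell out: the inner product is taken over the non-dual (rotation) parts only, root bones have a parent index of -1, and every parent is stored before its children so one forward pass is enough. The name `FixAntipodality` is hypothetical.

```c
#include <assert.h>

/* Negate dq[i] whenever its rotation part points away from its parent's.
   Assumes parent[i] < i for every non-root bone and parent[i] == -1 for roots. */
static void FixAntipodality( float dq[][2][4], const int parent[], int numBones )
{
    assert( numBones > 0 );
    for ( int i = 0; i < numBones; ++i )
    {
        const int p = parent[i];
        if ( p < 0 )
            continue;               /* root bone: nothing to compare against */
        assert( p < i );            /* parent was already fixed up */

        float dot = 0.0f;
        for ( int k = 0; k < 4; ++k )
            dot += dq[i][0][k] * dq[p][0][k];

        if ( dot < 0.0f )
        {
            /* -dq encodes the same rigid transform, so flipping is safe. */
            for ( int k = 0; k < 4; ++k )
            {
                dq[i][0][k] = -dq[i][0][k];
                dq[i][1][k] = -dq[i][1][k];
            }
        }
    }
}
```

Because each bone is compared against its already-corrected parent, a flip propagates down a chain in a single pass.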

// Input: unit quaternion 'q0', translation vector 't'
// Output: unit dual quaternion 'dq'
static void QuatTrans2UDQ( const float q0[4], const float t[3], float dq[2][4] )
{
    // Non-Dual Part: dq[0] = q0
    for ( int i=0; i<4; i++ )
        dq[0][i] = q0[i];

    // Dual Part: dq[1] = ((0,t[0],t[1],t[2])/2)*q0
    dq[1][3] = -0.5f*(t[0]*q0[0] + t[1]*q0[1] + t[2]*q0[2]); // Scalar Component
    dq[1][0] = 0.5f*( t[0]*q0[3] + t[1]*q0[2] - t[2]*q0[1]); // Vector Component 0
    dq[1][1] = 0.5f*(-t[0]*q0[2] + t[1]*q0[3] + t[2]*q0[0]); // Vector Component 1
    dq[1][2] = 0.5f*( t[0]*q0[1] - t[1]*q0[0] + t[2]*q0[3]); // Vector Component 2
}

Generating a Dual Quaternion
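Going the other way is a useful check on the construction: since the dual part is $\tfrac{1}{2}\,t\,q_0$, the translation can be recovered as the vector part of $2\,q_\varepsilon\,q_0^*$. The sketch below assumes the same `dq[2][4]` layout as `QuatTrans2UDQ`, with (x, y, z, w) component order; the function name `UDQ2Trans` is hypothetical and not part of the talk.

```c
#include <assert.h>

/* Recover the translation from a unit dual quaternion:
   dq[0] = non-dual part, dq[1] = dual part, each stored as (x, y, z, w).
   t = vector part of 2 * dq[1] * conjugate(dq[0]). */
static void UDQ2Trans( const float dq[2][4], float t[3] )
{
    const float *r = dq[0];
    const float *d = dq[1];
    assert( t != 0 );
    t[0] = 2.0f*( -d[3]*r[0] + d[0]*r[3] - d[1]*r[2] + d[2]*r[1] );
    t[1] = 2.0f*( -d[3]*r[1] + d[0]*r[2] + d[1]*r[3] - d[2]*r[0] );
    t[2] = 2.0f*( -d[3]*r[2] - d[0]*r[1] + d[1]*r[0] + d[2]*r[3] );
}
```

Feeding in the dual quaternion for a 90 degree rotation about z combined with the translation (1, 2, 3) returns that translation.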

Dual Quaternion Blending

Dual Quaternion Blend Skinning

// Input: array of dual quaternions 'dqIn'
// Input: array of weights 'w', totaling 1.0
// Input: size of the above two arrays (> 1)
// Output: the blended dual quaternion 'dqOut'
static void DQB( const float dqIn[][2][4], float w[], int numDQ, float dqOut[2][4] )
{
    // dqOut = w[0]*dqIn[0]
    Vec4Scale( dqIn[0][0], w[0], dqOut[0] );
    Vec4Scale( dqIn[0][1], w[0], dqOut[1] );
    for( int i = 1; i < numDQ; ++i )
    {
        // dqOut += w[i]*dqIn[i]
        Vec4Mad( dqOut[0], w[i], dqIn[i][0], dqOut[0] );
        Vec4Mad( dqOut[1], w[i], dqIn[i][1], dqOut[1] );
    }
}

Transformation Using a Dual Quaternion

Dual Quaternion Blend Skinning

// Input: unit dual quaternion 'dq'
// Input: input position 'vecIn'
// Output: rigidly transformed position 'vecOut'
static void DQTransform( const float dq[2][4], const vec3_t vecIn, vec3_t vecOut )
{
    vec4_t q0, q1;
    float a0, ae, recipDeLen;
    vec3_t d0, de, temp1, temp2, temp3, temp4, temp5;
    vec3_t temp6, temp7, temp8, temp9, temp10, temp11;

    recipDeLen = 1.0f / I_sqrt( dq[0][3]*dq[0][3] + dq[0][0]*dq[0][0] +
                                dq[0][1]*dq[0][1] + dq[0][2]*dq[0][2] );

    // Normalize both parts of the dual quaternion, based
    // on the length of the non-dual part.
    Vec4Scale( dq[0], recipDeLen, q0 );
    Vec4Scale( dq[1], recipDeLen, q1 );

    // Isolate the scalar and vector parts of both
    // quaternions. This is just for code clarity and can
    // be omitted for SIMD optimization.
    a0 = q0[3];
    ae = q1[3];
    memcpy( d0, &q0[0], sizeof( d0 ));
    memcpy( de, &q1[0], sizeof( de ));

    // Transform 'vecIn' by the dual quaternion
    // to produce 'vecOut'. vecOut = dq*v*dq^-1
    Vec3Cross( d0, vecIn, temp1 );
    Vec3Mad( temp1, a0, vecIn, temp2 );
    Vec3Scale( de, a0, temp3 );
    Vec3Scale( d0, ae, temp4 );
    Vec3Cross( d0, de, temp5 );
    Vec3Sub( temp3, temp4, temp6 );
    Vec3Add( temp6, temp5, temp7 );
    Vec3Scale( temp7, 2.0f, temp8 );
    Vec3Scale( d0, 2.0f, temp9 );
    Vec3Cross( temp9, temp2, temp10 );
    Vec3Add( vecIn, temp10, temp11 );
    Vec3Add( temp11, temp8, vecOut );
}

[Bar chart: instruction counts for Blending (2), Blending (3), Blending (4), Transform Pos, and Transform Vec; vertical axis 0 to 35; series: Matrix Skinning (column-major) vs. DQB Skinning]

Dual Quaternion Blend Skinning

Instruction Counts (XB360 VMX )

[Bar chart: instruction counts; vertical axis 0 to 35; series: Matrix Skinning (row-major) vs. DQB Skinning]

Dual Quaternion Blend Skinning

Instruction Counts (XB360 GPU)

Dual Quaternion Blend Skinning

On GCN GPU

[Chart: DQ vs. Matrix Skinning, comparing DQ Skinning and Matrix Skinning on Aggregate $ Efficiency, VGPR Count, Memory Stalls, and DRAM Footprint]

DQ Skinning is ~24% faster***

Dual Quaternion Blend Skinning

***: Depends heavily on vertex layout, tangent space quality, number of bones, and weighting distributions.

Optional Optimizations:
• Compress quaternions
  • 10:10:10:2 format for non-dual component
• Tune max waves/SIMD
• Generate skinning transforms on the GPU

Dual Quaternion Blend Skinning

Procedural Motions

Dual Quaternion Blend Skinning

Spore © EA (2008)

IK

Dual Quaternion Blend Skinning

Especially when animations are played on characters with different or custom proportions.

Ragdolls: Can you spot all the artifacts DQB would resolve?

Dual Quaternion Blend Skinning

Pros
• GPU/SIMD friendly
• No asset changes required
• Cheaper transform blending
• More cache friendly
• Requires less memory/constants
• Conducive to procedural motions
• (Mostly) replaces the need for the rotational split bones mentioned earlier.
• Can be enabled selectively (per-LOD, per-submesh, high end machines only)

Dual Quaternion Blend Skinning

Cons
• Less intuitive than matrices
• Local scaling must be handled separately
• Actual vertex transform is more ALU
• Still not 100% accurate
• Potential bulge artifacts
• Not widely adopted in games (yet)

No more flat asses!

Skinning

Blend Shapes

Skinning

Geometry Caching

Skinning

“Bulging-free dual quaternion skinning” (Kim, 2014)

Skinning

Skinning

1. Solve for: Bone weights on to minimize for all t.

2. Re-weight artist-selected vertices in Maya/Max.

Skinning

• The optimal model skinning approach can vary per platform.
• Give dual quaternion skinning a look.
• Don’t assume skinning is a “solved problem”. (Unless you’re Leatherface)

Conclusion