GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
-
Upload
amd-developer-central -
Category
Technology
-
view
1.358 -
download
4
description
Transcript of GS-4150, Bullet 3 OpenCL Rigid Body Simulation, by Erwin Coumans
BULLET 3 OPENCL™ RIGID BODY SIMULATIONERWIN COUMANS, AMD
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL2
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATIONCONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL™ is a trademark of Apple Inc. Windows® and DirectX® are trademarks of Microsoft Corp. Linux is a trademark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their respective owners.
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL3
AGENDA
Introduction, Particles, Rigid Bodies
GPU Collision Detection
GPU Constraint Solving
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL4
BULLET 2.82 AND BULLET 3 OPENCL™ ALPHA
Real-time C++ collision detection and rigid body dynamics library
Used in movies
‒ Maya, Houdini, Cinema 4D, Blender, Lightwave, Carrara, Posed 3D, thinking Particles, etc
‒ Disney Animation (Bolt), PDI Dreamworks (Shrek, How to train your dragon), Sony Imageworks (2012),
Games
‒ GTA IV, Disney Toystory 3, Cars 2, Riptide GP, GP2
Industrial applications, Robotics
‒ Siemens NX9 MCD, Gazebo
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL5
PARTICLES AND RIGID BODIES
Position (Center of mass, float3)
Orientation
‒ (Inertia basis frame, float4)
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL6
UPDATING THE TRANSFORM
Linear velocity (float3)
Angular velocity (float3)
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL7
C/C++ VERSUS OPENCL™
void integrateTransforms(Body* bodies, int numNodes, float timeStep)
{
for (int nodeID=0;nodeId<numNodes;nodeID++) {
if( bodies[nodeID].m_invMass != 0.f) {
bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;
}}
__kernel void integrateTransformsKernel( __global Body* bodies, int numNodes, float timeStep)
{
int nodeID = get_global_id(0);
if( nodeID < numNodes && (bodies[nodeID].m_invMass != 0.f)) {
bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep;
}
}
One to One mapping
Read WriteCompute
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL8
OPENCL™ PARTICLES
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL9
UPDATE ORIENTATION
__kernel void integrateTransformsKernel( __global Body* bodies,const int numNodes, float timeStep, float angularDamping, float4 gravityAcceleration)
{
int nodeID = get_global_id(0);
if( nodeID < numNodes && (bodies[nodeID].m_invMass != 0.f))
{
bodies[nodeID].m_pos += bodies[nodeID].m_linVel * timeStep; //linear velocity
bodies[nodeID].m_linVel += gravityAcceleration * timeStep; //apply gravity
float4 angvel = bodies[nodeID].m_angVel; //angular velocity
bodies[nodeID].m_angVel *= angularDamping; //add some angular damping
float4 axis;
float fAngle = native_sqrt(dot(angvel, angvel));
if(fAngle*timeStep> BT_GPU_ANGULAR_MOTION_THRESHOLD) //limit the angular motion
fAngle = BT_GPU_ANGULAR_MOTION_THRESHOLD / timeStep;
if(fAngle < 0.001f)
axis = angvel * (0.5f*timeStep-(timeStep*timeStep*timeStep)*0.020833333333f * fAngle * fAngle);
else
axis = angvel * ( native_sin(0.5f * fAngle * timeStep) / fAngle);
float4 dorn = axis;
dorn.w = native_cos(fAngle * timeStep * 0.5f);
float4 orn0 = bodies[nodeID].m_quat;
float4 predictedOrn = quatMult(dorn, orn0);
predictedOrn = quatNorm(predictedOrn);
bodies[nodeID].m_quat=predictedOrn; //update the orientation
}
}
See opencl/gpu_rigidbody/kernels/integrateKernel.cl
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL10
UPDATE TRANSFORMS, HOST SETUP
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 0, sizeof(cl_mem), &bodies);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 1, sizeof(int), &numBodies);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 1, sizeof(float), &deltaTime);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 1, sizeof(float), &angularDamping);
ciErrNum = clSetKernelArg(g_integrateTransformsKernel, 1, sizeof(float4), &gravityAcceleration);
size_t workGroupSize = 64;
size_t numWorkItems = workGroupSize*((m_numPhysicsInstances + (workGroupSize)) / workGroupSize);
if (workGroupSize>numWorkItems)
workGroupSize=numWorkItems;
ciErrNum = clEnqueueNDRangeKernel(g_cqCommandQue, g_integrateTransformsKernel, 1, NULL, &numWorkItems, &workGroupSize,0 ,0 ,0);
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL11
MOVING THE CODE TO GPU
Create an OpenCL™ wrapper‒ Easier use, fits code style, extra features, learn the API
Replace C++ by C
Move data to contiguous memory
Replace pointers by indices
Exploit the GPU hardware…
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL12
SHARING DATA STRUCTURES AND CODE BETWEEN OPENCL™ AND C/C++
#include "Bullet3Collision/NarrowPhaseCollision/shared/b3RigidBodyData.h"
#include "Bullet3Dynamics/shared/b3IntegrateTransforms.h"
__kernel void integrateTransformsKernel( __global b3RigidBodyData_t* bodies,const int numNodes, float timeStep, float angularDamping, float4 gravityAcceleration)
{
int nodeID = get_global_id(0);
if( nodeID < numNodes)
{
integrateSingleTransform(bodies,nodeID, timeStep, angularDamping,gravityAcceleration);
}
}
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL13
PREPROCESSING OF KERNELS WITH INCLUDES IN SINGLE HEADER FILE
We want the option of embedding kernels in our C/C++ program
Expand all #include files, recursively into a single stringified header file
‒ This header can be used in OpenCL™ kernels and in regular C/C++ files too
‒ Kernel binary is cached and cached version is unvalidated based on time stamp of embedded kernel file
Premake, Lua and a lcpp: very small and simple C pre-processor written in Lua
‒ See https://github.com/willsteel/lcpp
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL14
HOST, DEVICE, KERNELS, WORK ITEMS
Global Device Memory
Global Host Memory
L2 cache
Host Device (GPU)
CPU
GPU Collision Detection
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL16
RIGID BODY PIPELINE
timeStart End
Narrow Phase CD
Detect
pairs
Constraint Solving
Setup
constraints
Solve
constraints
Integrate
position
Collision Data Dynamics Data
Compute
world space
Object AABB
Collision
shapes
Object
AABB
Overlapping
pairs
World
transforms
velocities
Mass
Inertia
Constraints
(contacts,
joints)
Compute
contact
points
Contact
points
Integration
Forces,
Gravity
Broad PhaseCollision Detection (CD)
Mid PhaseCD
Cull complex
shapes
local space
Object
local space
BVH
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL17
BOUNDING VOLUMES AND DETECT PAIRS
X min
Y min
Z min
*
X max
Y max
Z max
Object ID
MIN (X,Y,Z)
MAX (X,Y,Z)
Object ID A Object ID B
Object ID A Object ID B
Object ID A Object ID B
Output pairs
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL18
COMPUTE PAIRS BRUTE FORCE
__kernel void computePairsKernelOriginal( __global const btAabbCL* aabbs,
__global int2* pairsOut, volatile __global int* pairCount,
int numObjects, int axis, int maxPairs)
{
int i = get_global_id(0);
if (i>=numObjects)
return;
for (int j=0;j<numObjects;j++)
{
if ( i != j && TestAabbAgainstAabb2GlobalGlobal(&aabbs[i],&aabbs[j])) {
int2 myPair;
myPair.x = aabbs[i].m_minIndices[3]; myPair.y = aabbs[j].m_minIndices[3];
int curPair = atomic_inc (pairCount);
if (curPair<maxPairs)
pairsOut[curPair] = myPair; //flush to main memory
}
}
Scatter operation
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL19
DETECT PAIRS
Uniform Grid
‒ Very fast
‒ Suitable for GPU
‒ Object size restrictions
Can be mixed with other algorithms
See bullet3\src\Bullet3OpenCL\BroadphaseCollision\b3GpuGridBroadphase.cpp
0 1 2 3
12 13 14 15
5 7
8 10 11
B
C E
D
F
A
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL20
UNIFORM GRID AND PARALLEL PRIMITIVES
Radix Sort the particles based on their cell index
Use a prefix scan to compute the cell size and offset
Fast OpenCL™ and DirectX® 11 Direct Compute implementation
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL21
1 AXIS SORT, SWEEP AND PRUNE
Find best sap axis
Sort aabbs along this axis
For each object, find and add overlapping pairs
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL22
COMPUTE PAIRS 1-AXIS SORT
__kernel void computePairsKernelOriginal( __global const btAabbCL* aabbs,
__global int2* pairsOut, volatile __global int* pairCount,
int numObjects, int axis, int maxPairs)
{
int i = get_global_id(0);
if (i>=numObjects)
return;
for (int j=i+1;j<numObjects;j++)
{
if(aabbs[i].m_maxElems[axis] < (aabbs[j].m_minElems[axis]))
break;
if (TestAabbAgainstAabb2GlobalGlobal(&aabbs[i],&aabbs[j])) {
int2 myPair;
myPair.x = aabbs[i].m_minIndices[3]; myPair.y = aabbs[j].m_minIndices[3];
int curPair = atomic_inc (pairCount);
if (curPair<maxPairs)
pairsOut[curPair] = myPair; //flush to main memory
}
}
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL23
GPU MEMORY HIERARCHY
Global Device Memory
Shared Local Memory
Shared Local MemoryShared Local Memory
Compute Unit
Private Memory
(registers)
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL24
BARRIER
A point in the program where all threads stop and wait
When all threads in the Work Group have reached the barrier, they can proceed
Barrier
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL25
KERNEL OPTIMIZATIONS FOR 1-AXIS SORTCONTENT SUBHEADER
AVOID GLOBAL ATOMICS
Use private memory to accumulate overlapping pairs (append buffer)
LOCAL ATOMICS Determine early exit condition for all work items within a workgroup
LOCAL MEMORY block to fetch AABBs and re-use them within a workgroup (barrier)
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL26
KERNEL OPTIMIZATIONS (1-AXIS SORT)
Load balancing‒ One work item per object, multiple work items for large objects
See opencl/gpu_broadphase/kernels/sapFast.cl and sap.cl
(contains un-optimized and optimized version of the kernel for comparison)
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL27
SEQUENTIAL INCREMENTAL 3-AXIS SAP
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL28
PARALLEL INCREMENTAL 3-AXIS SAP
Parallel sort 3 axis
Keep old and new sorted axis‒6 sorted axis in total
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL29
If begin or endpoint has same index do nothing
Otherwise, range scan on old AND new axis‒adding or removing pairs, similar to original SAP
Read-only scan is embarrassingly parallel
PARALLEL INCREMENTAL 3-AXIS SAP
Sorted x-axis old
Sorted x-axis new
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL30
HYBRID CPU/GPU PAIR SEARCH
0 1 2 3
12 13 14 15
5 7
8 10 11
B
C E
D
F
A
Small
Small
Large
Large
GPU
either
either
CPU
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL31
TRIANGLE MESH COLLISION DETECTION
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL32
GPU BVH TRAVERSAL
Create skip indices forfaster traversal
Create subtrees thatfit in Local Memory
Stream subtrees forentire wavefront/warp
Quantize Nodes
‒ 16 bytes/node
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL33
COMPOUND VERSUS COMPOUND COLLISION DETECTION
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL34
TREE VERSUS TREE: TANDEM TRAVERSAL
See __kernel void findCompoundPairsKernel( __global const int4* pairs … in
‒ in bullet3\src\Bullet3OpenCL\NarrowphaseCollision\kernels/sat.cl
for (int p=0;p<numSubTreesA;p++) {
for (int q=0;q<numSubTreesB;q++) {
b3Int2 node0; node0.x = startNodeIndexA;node0.y = startNodeIndexB;
nodeStack[depth++]=node0; depth = 1;
do {
b3Int2 node = nodeStack[--depth];
if (nodeOverlap){
if(isInternalA && isInternalB){
nodeStack[depth++] = b3MakeInt2(nodeAleftChild, nodeBleftChild);nodeStack[depth++] = b3MakeInt2(nodeArightChild, nodeBleftChild);
nodeStack[depth++] = b3MakeInt2(nodeAleftChild, nodeBrightChild);
nodeStack[depth++] = b3MakeInt2(nodeArightChild, nodeBrightChild);
} else {
if (isLeafA && isLeafB) processLeaf(…)
else { …} //see actual code
}
} while (depth);
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL35
CONTACT GENERATION: GPU CONVEX HEIGHTFIELD
Dual representation
SATHE, R. 2006. Collision detection shader using cube-maps. In ShaderX5, Charles River Media
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL36
SEPARATING AXIS TEST
Face normal A
Face normal B
Edge-edge normal
Uniform work suits GPU very well: one work unit processes all SAT tests for one pair
Precise solution and faster than height field approximation for low-resolution convex shapes
See opencl/gpu_sat/kernels/sat.cl
A B
axis
plane
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL37
COMPUTING CONTACT POSITIONS
Given the separating normal find incident face
Clip incident face using Sutherland Hodgman clipping
One work unit performs clipping for one pair, reduces contacts and appends to contact buffer
See opencl/gpu_sat/kernels/satClipHullContacts.cl
n
incident
reference face
n
clipping planes
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL38
SAT ON GPU
Break the algorithm into pipeline stages, separated into many kernels
‒ findSeparatingAxisKernel
‒ findClippingFacesKernel
‒ clipFacesKernel
‒ contactReductionKernel
Concave and compound cases produce even more stages
‒ bvhTraversalKernel,findConcaveSeparatingAxisKernel,findCompoundPairsKernel,processCompoundPairsPrimitivesKernel,processCompoundPairsKernel,findConcaveSphereContactsKernel,clipHullHullConcaveConvexKernel
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL39
GPU CONTACT REDUCTION
See newContactReductionKernel in opencl/gpu_sat/kernels/satClipHullContacts.cl
GPU Constraint Solving
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL41
REORDERING CONSTRAINTS REVISITED
A
B D
1 4
A B C D
1 1
2 2
3 3
4 4
A B C D
Batch 0 1 1 3 3
Batch 1 4 2 2 4
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL42
while( nIdxSrc ) {nIdxDst = 0; int nCurrentBatch = 0;for(int i=0; i<N_FLG/32; i++) flg[i] = 0; //clear flagfor(int i=0; i<nIdxSrc; i++) {
int idx = idxSrc[i]; btAssert( idx < n );//check if it can goint aIdx = cs[idx].m_bodyAPtr & FLG_MASK; int bIdx = cs[idx].m_bodyBPtr & FLG_MASK;u32 aUnavailable = flg[ aIdx/32 ] & (1<<(aIdx&31));u32 bUnavailable = flg[ bIdx/32 ] & (1<<(bIdx&31));if( aUnavailable==0 && bUnavailable==0 ) {
flg[ aIdx/32 ] |= (1<<(aIdx&31)); flg[ bIdx/32 ] |= (1<<(bIdx&31));cs[idx].getBatchIdx() = batchIdx;sortData[idx].m_key = batchIdx; sortData[idx].m_value = idx;nCurrentBatch++;if( nCurrentBatch == simdWidth ) {
nCurrentBatch = 0;for(int i=0; i<N_FLG/32; i++) flg[i] = 0;
}}else {
idxDst[nIdxDst++] = idx;}
}swap2( idxSrc, idxDst ); swap2( nIdxSrc, nIdxDst );batchIdx ++;
}
CPU SEQUENTIAL BATCH CREATION
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL43
GPU ITERATIVE BATCHING
Parallel threads in workgroup (same SIMD) use local atomics to lock rigid bodies
Before locking attempt, first check if bodies are already used in previous iterations
See “A parallel constraint solver for a rigid body simulation”, Takahiro Harada, http://dl.acm.org/citation.cfm?id=2077378.2077406
and opencl\gpu_rigidbody\kernels\batchingKernels.cl
A B C D
unused unused unused unused
1 1 2 3
A B C D
Batch 0 1 1
For
each
bat
ch
For
each
un
assi
gned
co
nst
rain
t
Try to reserve bodies
Append constraint to batch
A
B D
1 4
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL44
GPU PARALLEL TWO STAGE BATCH CREATION
Cell size > maximum dynamic object size
Constraint are assigned to a cell
‒ based on the center-of-mass location of the first active rigid body of the pair-wise constraint
Non-neighboring cells can be processed in parallel
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL45
MASS SPLITTING+JACOBI ~= PGS
See “Mass Splitting for Jitter-Free Parallel Rigid Body Simulation” by Tonge et. al.
A B0 B1 C0 C1 D1 D1 A
1 1 2 2 3 3 4 4
B D
A
1
2 3
4
B C D
B1
B0
Parallel Jacobi
Averaging velocities
C1
C0
C1
C0
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL46
GPU NON-CONTACT CONSTRAINTS, JOINTS
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL47
GPU NON-CONTACT CONSTRAINTS, JOINTS
getInfo1Kernel and getInfo2Kernel with switch statement replaces virtual methods in Bullet 2.x
See bullet3\src\Bullet3OpenCL\RigidBody\kernels\jointSolver.cl
__kernel void getInfo1Kernel(__global unsigned int* infos, __global b3GpuGenericConstraint* constraints, int numConstraints)
__kernel void getInfo2Kernel(__global b3SolverConstraint* solverConstraintRows, ..
switch (constraint->m_constraintType)
{
case B3_GPU_POINT2POINT_CONSTRAINT_TYPE:
case B3_GPU_FIXED_CONSTRAINT_TYPE:
}
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL48
DETERMINISTIC RESULTS
Projected Gauss Seidel requires solving rows in the same order
Sort the constraint rows (contacts, joints)
Solve constraint batches in the same order
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL49
DYNAMICA PLUGIN FOR MAYA WITH OPENCL™
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL50
AMD CODEXL OPENCL™ DEBUGGER AND PROFILER
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL51
STACKING TEST
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL52
FUTURE WORK
DirectX®11 DirectCompute port
Multi GPU, multi-core, MPI
Move over Bullet 2 to Bullet 3, hybrid of CPU and GPU
‒ Featherstone, direct solvers on CPU
Cloth and Fluid simulation, TressFX hair, with two-way interaction
Extend GPU-PGS solver to GPU-NNCG
‒ Non-smooth non-linear conjugate gradient solver
Improve GPU Ray intersection tests
| BULLET 3 OpenCL™ RIGID BODY SIMULATION | NOVEMBER 21, 2013 | CONFIDENTIAL53
THANK YOU!
Visit http://bulletphysics.org for more information. All source code is available:
http://github.com/erwincoumans/bullet3
‒ Lets you fork, report issues and request features
Windows®, Linux®, Mac OSX
AMD and NVIDIA GPU
‒ Preferably high-end desktop GPU