Collision Detection Design & Final Project Topic

Brandon Smith, November 5, 2008

ME 964

contact_data Allocation

• Possible ways to allocate the contact_data array:
  – Allocate contact_data[ N(N-1)/2 ]
  – Allocate contact_data[ n_contacts ]

• To avoid creating a huge array, I chose the second method:
  – 1st kernel call: find the number of contacts.
  – 2nd kernel call: calculate the contact_data for each contact.
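The second allocation strategy can be illustrated with a serial host-side sketch: pass 1 only counts, the output is then sized to exactly n_contacts, and pass 2 repeats the tests to fill it. This is not the author's kernel code; `two_pass_contacts` and the `hit(i, j)` predicate (a stand-in for the __device__ contact test) are hypothetical, and each loop iteration plays the role of one test done by a GPU thread.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Serial sketch of the two-kernel scheme. Pass 1 counts contacts only,
// contact_data is then allocated with exactly n_contacts entries, and
// pass 2 repeats the same tests to fill it. `hit(i, j)` is a hypothetical
// stand-in for the __device__ contact test.
template <class HitTest>
std::vector<std::pair<int, int>> two_pass_contacts(int n, HitTest hit) {
    std::size_t n_contacts = 0;
    for (int j = 1; j < n; ++j)                 // pass 1 ("1st kernel call")
        for (int i = 0; i < j; ++i)
            if (hit(i, j)) ++n_contacts;

    // Exact-size allocation instead of n*(n-1)/2 slots.
    std::vector<std::pair<int, int>> contact_data(n_contacts);

    std::size_t next = 0;
    for (int j = 1; j < n; ++j)                 // pass 2 ("2nd kernel call")
        for (int i = 0; i < j; ++i)
            if (hit(i, j)) contact_data[next++] = {i, j};
    return contact_data;
}
```

The price of the smaller array is running every contact test twice; with few contacts relative to N(N-1)/2 pairs, that trade usually favors the second method.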

Kernel Call Setup

• The total number of contact tests is: n_tests = N(N-1)/2

• The total number of concurrent threads is: n_concurrent_threads = N_SMs * BLOCKS_PER_SM * THREADS_PER_BLOCK

• Each thread performs several tests: tests_per_thread = n_tests / n_concurrent_threads + 1 (integer division; the + 1 rounds up so that every test is assigned to some thread)
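The launch arithmetic above is easy to get wrong by one, so here it is as a small function. This is a sketch: `Launch` and `plan_launch` are hypothetical names, and the device figures in the usage note are made-up values chosen only to show the rounding.

```cpp
#include <cstdint>

// Work split for the collide kernel: n_tests = N(N-1)/2 pair tests shared by
// n_concurrent_threads threads. Integer division plus one rounds up, so
// tests_per_thread * n_concurrent_threads >= n_tests always holds.
struct Launch {
    std::int64_t n_tests;
    std::int64_t n_concurrent_threads;
    std::int64_t tests_per_thread;
};

Launch plan_launch(std::int64_t N, std::int64_t n_sms,
                   std::int64_t blocks_per_sm, std::int64_t threads_per_block) {
    Launch L;
    L.n_tests = N * (N - 1) / 2;
    L.n_concurrent_threads = n_sms * blocks_per_sm * threads_per_block;
    L.tests_per_thread = L.n_tests / L.n_concurrent_threads + 1;
    return L;
}
```

For example, with N = 1000 and a hypothetical device of 16 SMs running 3 blocks of 256 threads each, n_tests = 499,500, n_concurrent_threads = 12,288 and tests_per_thread = 40 + 1 = 41; 40 alone would leave 7,980 tests unassigned. When n_tests happens to be an exact multiple, the + 1 just gives each thread one spare slot that the kernel skips.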

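Collide Kernel: Indexing

• Given the block number bx and thread number tx, each thread generates a range of test numbers (ki, kf):

    thread_id = bx*THREADS_PER_BLOCK + tx;
    ki = tests_per_thread*thread_id + 1;
    kf = ki + tests_per_thread - 1;

• Because tests_per_thread is rounded up, kf can exceed n_tests for the last threads, so tests with k > n_tests must be skipped.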

Test number k for each body pair (i, j), i < j:

    i \ j |  2   3   4   5  ...
    ------+-------------------
      1   |  1   2   4   7
      2   |      3   5   8
      3   |          6   9
      4   |             10

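• Given a test number k, the indices (i, j) can be calculated from

$$k = \frac{(j-1)^2 - (j-1)}{2} + i, \qquad 1 \le i < j,$$

where j is the smallest integer satisfying

$$k \le \frac{j^2 - j}{2},$$

and then i = k - ((j-1)^2 - (j-1))/2.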
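The inversion can be coded directly. Solving j^2 - j - 2k >= 0 gives the estimate j = ceil((1 + sqrt(1 + 8k)) / 2), which this sketch corrects with integer checks in case of floating-point rounding. `test_to_pair` and `run_thread` are hypothetical names; `run_thread` shows one thread walking its range ki..kf and skipping k > n_tests.

```cpp
#include <cmath>
#include <cstdint>

// (m^2 - m)/2: number of tests for pairs with j <= m.
static std::int64_t tri(std::int64_t m) { return (m * m - m) / 2; }

// Invert k = ((j-1)^2 - (j-1))/2 + i with 1 <= i < j (1-based, k >= 1).
// j is the smallest integer with k <= (j^2 - j)/2.
void test_to_pair(std::int64_t k, std::int64_t& i, std::int64_t& j) {
    j = static_cast<std::int64_t>(
        std::ceil((1.0 + std::sqrt(1.0 + 8.0 * static_cast<double>(k))) / 2.0));
    while (tri(j) < k) ++j;                   // estimate too small
    while (j > 2 && tri(j - 1) >= k) --j;     // estimate too large
    i = k - tri(j - 1);
}

// One thread's share of the work: tests ki..kf, skipping k > n_tests.
template <class Visit>
void run_thread(std::int64_t thread_id, std::int64_t tests_per_thread,
                std::int64_t n_tests, Visit visit) {
    const std::int64_t ki = tests_per_thread * thread_id + 1;
    const std::int64_t kf = ki + tests_per_thread - 1;
    for (std::int64_t k = ki; k <= kf && k <= n_tests; ++k) {
        std::int64_t i, j;
        test_to_pair(k, i, j);
        visit(i, j);                          // the contact test for bodies i and j
    }
}
```

For example, k = 7 gives j = 5 and i = 7 - 6 = 1, matching the table.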

Collide Kernel: Contact Testing

• The __global__ kernel calls a __device__ function that performs the actual contact test.

• In the first pass it only tests for contact.
• In the second pass it calculates contact_data.

• atomicAdd is used to count the number of contacts:
  – Keeps one contact tally for all concurrent threads
  – No need to condense results from each thread
  – Hassle to compile: atomic functions on global memory require compute capability 1.1 or higher, hence -arch sm_11:

    nvcc.exe -ccbin "C:\Program Files\Microsoft Visual Studio 8\VC\bin" -c -arch sm_11 -D_CONSOLE -Xcompiler "/EHsc /W3 /nologo /Wp64 /O2 /Zi /MT " -I"C:\CUDA\include" -I"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\inc" -o Release\collide.obj collide.cu
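The single shared tally can be mimicked on the CPU with `std::atomic<int>::fetch_add`, which, like CUDA's atomicAdd, returns the previous value. This is a CPU analogue built on `std::thread`, not the author's kernel; `count_contacts_parallel` and the precomputed `is_contact` flags are hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// One global contact tally shared by all threads. Each thread scans its own
// slice of test results and bumps the shared counter, so there are no
// per-thread partial counts to condense afterwards.
int count_contacts_parallel(const std::vector<bool>& is_contact, int n_threads) {
    std::atomic<int> tally{0};
    std::vector<std::thread> workers;
    const int n = static_cast<int>(is_contact.size());
    const int per_thread = n / n_threads + 1;   // same rounding as the kernel setup
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            const int begin = t * per_thread;
            for (int k = begin; k < begin + per_thread && k < n; ++k)
                if (is_contact[k]) tally.fetch_add(1);
        });
    }
    for (auto& w : workers) w.join();
    return tally.load();
}
```

Because fetch_add (and atomicAdd) returns the old value, the same call could also hand each contact a unique slot in a pre-sized array; whether the second pass uses it that way is not stated in the slides.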

Final Project: Monte Carlo Radiation Transport

• Objective:
  – Compute radiation flux or derived quantities over a spatial/temporal domain.

• Method:
  – Follow the life of individual particles through the domain.

[Figure: 1D Half Absorber - Half Scatter Benchmark. Log(flux) [n/(cm^2*s)] versus distance [cm], comparing Diffusion Theory and Monte Carlo.]

• Quality of Results:
  – Statistical error is proportional to 1/sqrt(n_particles)
  – Difficult to get an even particle distribution across the domain
  – Many particles are required to achieve low statistical error
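The 1/sqrt(n_particles) scaling fixes the cost of extra accuracy. For a tally estimated as the mean of N independent particle scores x_n with per-particle standard deviation σ (standard Monte Carlo statistics, given here as background rather than taken from the slides), the relative error R behaves as:

```latex
\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}, \qquad
R = \frac{\sigma_{\bar{x}}}{\bar{x}} \propto \frac{1}{\sqrt{N}}
\;\Longrightarrow\;
\frac{N_2}{N_1} = \left(\frac{R_1}{R_2}\right)^{2}
```

Halving the error, e.g. from 10% to 5%, therefore needs four times as many particles, and a tenfold reduction needs a hundred times as many.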

Example: Fusion Reactor Shielding

• The GPU Advantage:
  – Increase the number of simulated particles
  – Decrease statistical error

Tasks during a Particle’s Life

• Birth: particles are created at a source

• Ray-cast: the distance to the next surface is calculated

• Collision: the particle interacts with matter

• Next volume: the particle crosses a boundary into another material

• Death: if the particle is absorbed, it is killed.
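The life cycle above maps onto a short loop per particle. This is a minimal sketch, not the existing Fortran code: one homogeneous slab replaces the 3-D box/sphere geometry, the only other "volume" is the complement (so crossing a surface means leaking out), and particles are born at x = 0 moving in +x. `run_slab` and `SlabTally` are hypothetical names; `scatter_prob = 0.5` mimics a half-absorber, half-scatter medium.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <random>

// Outcome counts over all simulated histories.
struct SlabTally { std::int64_t transmitted = 0, reflected = 0, absorbed = 0; };

// Follow n_particles histories through a slab 0 <= x <= T with total cross
// section sigma_t [1/cm]. A collision scatters with probability scatter_prob
// (isotropic: new direction cosine uniform on [-1, 1]), otherwise absorbs.
SlabTally run_slab(std::int64_t n_particles, double T, double sigma_t,
                   double scatter_prob, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> U(0.0, 1.0);
    const double inf = std::numeric_limits<double>::infinity();
    SlabTally tally;
    for (std::int64_t p = 0; p < n_particles; ++p) {
        double x = 0.0, mu = 1.0;                          // birth at the source
        while (true) {
            // Ray-cast: distance along the track to the surface ahead.
            double to_surface = mu > 0 ? (T - x) / mu : (mu < 0 ? -x / mu : inf);
            // Sampled distance to the next collision (1 - U avoids log(0)).
            double s = -std::log(1.0 - U(rng)) / sigma_t;
            if (s >= to_surface) {                         // next volume: the complement
                if (mu > 0) ++tally.transmitted; else ++tally.reflected;
                break;                                     // death by leakage
            }
            x += s * mu;                                   // move to the collision site
            if (U(rng) < scatter_prob) {
                mu = 2.0 * U(rng) - 1.0;                   // collision: scatter
            } else {
                ++tally.absorbed;                          // death by absorption
                break;
            }
        }
    }
    return tally;
}
```

Each history touches only its own variables, so a GPU version could give every thread its own loop of histories and random stream and sum the tallies at the end. With scatter_prob = 0 the transmitted fraction should approach exp(-sigma_t * T).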

[Figure: Particle Tracking - No Collisions, No Overlaps. Regions: Volume 1, Volume 2, Complement; track points labeled a, b, c.]

[Figure: Particle Tracking - Collision at a, No Overlaps. Regions: Volume 1, Volume 2, Complement; track points labeled a, b, c, d.]

Existing Fortran Code

• Geometry:
  – 3-D geometry supporting boxes and spheres

• Physics:
  – Only neutral particles (neutrons, photons)
  – No energy dependence
  – No time dependence

• Materials:
  – Simple materials (only a few isotopes)

• Sources:
  – Point, line, area, volume

• Results:
  – Mesh tallies and volume tallies

Potential for Parallelism

• Usually we can assume each particle is independent, unless:
  – criticality, weight windows, etc.

• Each thread could calculate independent particle trajectories
  – embarrassingly parallel

• When enough particles have been simulated, condense the results from each thread

Implementation Challenges

• Current code is in Fortran 90
  – ~1700 lines
  – Has anyone tried F2C? (It was designed for Fortran 77.)

• Particles are tracked on a large mesh
  – ~1 M mesh elements, accessed once per particle
  – Mesh will need to be in global memory
  – Should the mesh be accessed with atomic functions so that threads can share it?

• Ensure that random numbers are not repeated
  – Use a pseudo-random number generator for each thread
  – Each thread will need a different random seed
  – Check that the stride between threads' streams is sufficiently large

• Could schedule rendezvous points to check for solution convergence
  – Stop the simulation once the statistical error falls below a set value (5%)
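One common way to give every thread a non-overlapping random stream (one option, not necessarily what the existing code does) is a single linear congruential generator where thread t jumps ahead by t × stride steps. The jump takes O(log stride) time using Brown's arbitrary-stride method. The multiplier and increment are Knuth's MMIX constants modulo 2^64, chosen here only for illustration; `Lcg64` and `make_thread_rng` are hypothetical names.

```cpp
#include <cstdint>

// 64-bit LCG: x_{n+1} = a*x_n + c (mod 2^64), Knuth's MMIX constants.
struct Lcg64 {
    static constexpr std::uint64_t a = 6364136223846793005ULL;
    static constexpr std::uint64_t c = 1442695040888963407ULL;
    std::uint64_t state;

    std::uint64_t next() { state = a * state + c; return state; }

    // Uniform double in [0, 1) from the top 53 bits (the low bits of a
    // power-of-two LCG are weak).
    double uniform() { return (next() >> 11) * (1.0 / 9007199254740992.0); }

    // Advance by `steps` draws in O(log steps) by repeatedly squaring the
    // affine map x -> a*x + c (Brown's arbitrary-stride method).
    void skip(std::uint64_t steps) {
        std::uint64_t acc_mult = 1, acc_plus = 0;
        std::uint64_t cur_mult = a, cur_plus = c;
        while (steps > 0) {
            if (steps & 1) {
                acc_mult *= cur_mult;
                acc_plus = acc_plus * cur_mult + cur_plus;
            }
            cur_plus = (cur_mult + 1) * cur_plus;
            cur_mult *= cur_mult;
            steps >>= 1;
        }
        state = acc_mult * state + acc_plus;
    }
};

// Thread t starts stride draws after thread t-1, so streams cannot overlap
// as long as no thread consumes more than `stride` numbers.
Lcg64 make_thread_rng(std::uint64_t seed, std::uint64_t thread_id, std::uint64_t stride) {
    Lcg64 g{seed};
    g.skip(thread_id * stride);
    return g;
}
```

This turns "sufficiently large stride" into a concrete check: the stride must exceed the largest number of draws any thread can make.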

ME 964: Project Proposal

Vikalp Mishra

Collision Detection

• Aim
  – Solve the collision detection problem for N rigid spheres in 3D space

• Approach: brute force
  – Compare each sphere with every other sphere: O(N^2)
  – If the distance between centers is
      • more than the sum of radii: no collision
      • less than the sum of radii: collision
  – When a collision is detected, compute the contact normal and record the object IDs
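A compact version of this brute-force test, as a sketch: `Sphere`, `Contact` and `brute_force_contacts` are hypothetical names, and spheres that exactly touch (distance equal to the sum of radii) are treated as not colliding, a case the slide leaves open.

```cpp
#include <cmath>
#include <vector>

struct Sphere { double x, y, z, r; };
struct Contact { int id_a, id_b; double nx, ny, nz; };  // unit normal from a toward b

// O(N^2) pass over all pairs. Squared distances are compared first so the
// sqrt is only taken for pairs that actually collide.
std::vector<Contact> brute_force_contacts(const std::vector<Sphere>& s) {
    std::vector<Contact> out;
    const int n = static_cast<int>(s.size());
    for (int a = 0; a < n; ++a)
        for (int b = a + 1; b < n; ++b) {
            double dx = s[b].x - s[a].x, dy = s[b].y - s[a].y, dz = s[b].z - s[a].z;
            double dist2 = dx * dx + dy * dy + dz * dz;
            double rsum = s[a].r + s[b].r;
            if (dist2 >= rsum * rsum) continue;          // no collision
            double d = std::sqrt(dist2);
            if (d > 0.0) out.push_back({a, b, dx / d, dy / d, dz / d});
            else         out.push_back({a, b, 1.0, 0.0, 0.0});  // coincident centers: arbitrary normal
        }
    return out;
}
```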

Final Project: Bone FEA

• Title:
  – GPU-based Finite Element Analysis of the Femur

• Femur
  – Thigh bone: the bone between the hip and knee joints
  – Longest/strongest bone in the body

Why study the femur?

• To better understand bone mechanics/properties
  – Across species

• To understand the impact and extent of injury under various loadings
  – Use in sports medicine and surgery

• To study the impact of DNA changes on bone formation/growth
  – Improve the process of cloning to develop better species

• To study the effect of the nutrition cycle on bone development

Background

• In the past
  – Experiments were done to study bone behavior/material properties

• Tests performed
  – Fracture test
  – Bending test
  – Torsion test

• Experiments on mouse/pig bones
  – Costly and time consuming
  – Only one experiment possible per sample

• Alternative
  – Capture bone geometry and material properties
  – Use computational tools for the various analyses
      • Saves time/money

Typical approach

• Given:
  – CT scan data of the bone (geometry)
  – Material property distribution
  – Loading scheme
      • 3- or 4-point loading / torsion test / bending test

Use of FEA

• Use the Finite Element Method to capture
  – Geometry (hexahedral or tetrahedral elements)
  – Physical properties

• Formulate the FE problem
  – Use boundary conditions to define the element-level
      • stiffness matrix (Ke)
      • load vector (Fe)

• Assemble the elements into the global matrix and vector (Kg, Fg)

• Solve the FE problem
  – Obtain the deflection $u = K_g^{-1} F_g$ (in practice by solving the linear system $K_g u = F_g$ rather than forming the inverse)

• Compare with experimental results
  – Verify the model
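The element-level / assembly / solve sequence is easiest to see on the smallest possible case: a 1D bar of 2-node elements instead of the 3D hexahedral or tetrahedral femur mesh. This sketch assumes a uniform bar fixed at node 0 with an axial tip load P; `bar_displacements` and `solve` are hypothetical names. Each element's Ke is computed independently (the part proposed for the GPU), assembly adds it into Kg, and Kg u = Fg is solved by Gaussian elimination.

```cpp
#include <cmath>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solve A u = f by Gaussian elimination with partial pivoting.
std::vector<double> solve(Matrix A, std::vector<double> f) {
    const int n = static_cast<int>(f.size());
    for (int col = 0; col < n; ++col) {
        int piv = col;
        for (int r = col + 1; r < n; ++r)
            if (std::fabs(A[r][col]) > std::fabs(A[piv][col])) piv = r;
        std::swap(A[col], A[piv]);
        std::swap(f[col], f[piv]);
        for (int r = col + 1; r < n; ++r) {
            double m = A[r][col] / A[col][col];
            for (int c = col; c < n; ++c) A[r][c] -= m * A[col][c];
            f[r] -= m * f[col];
        }
    }
    std::vector<double> u(n);
    for (int r = n - 1; r >= 0; --r) {                    // back substitution
        double s = f[r];
        for (int c = r + 1; c < n; ++c) s -= A[r][c] * u[c];
        u[r] = s / A[r][r];
    }
    return u;
}

// 1D bar: n_elem equal elements, node 0 fixed, axial load P at the tip.
std::vector<double> bar_displacements(int n_elem, double length, double E,
                                      double area, double P) {
    const int n_nodes = n_elem + 1;
    const double k = E * area / (length / n_elem);        // element stiffness EA/Le
    Matrix Kg(n_nodes, std::vector<double>(n_nodes, 0.0));
    std::vector<double> Fg(n_nodes, 0.0);
    for (int e = 0; e < n_elem; ++e) {                    // element level: independent per element
        const double Ke[2][2] = {{k, -k}, {-k, k}};
        const int dof[2] = {e, e + 1};
        for (int a = 0; a < 2; ++a)                       // assembly into the global matrix
            for (int b = 0; b < 2; ++b) Kg[dof[a]][dof[b]] += Ke[a][b];
    }
    Fg[n_nodes - 1] = P;
    // Boundary condition u[0] = 0: drop row/column 0 and solve the rest.
    Matrix Kr(n_nodes - 1, std::vector<double>(n_nodes - 1));
    std::vector<double> Fr(n_nodes - 1);
    for (int r = 1; r < n_nodes; ++r) {
        Fr[r - 1] = Fg[r];
        for (int c = 1; c < n_nodes; ++c) Kr[r - 1][c - 1] = Kg[r][c];
    }
    std::vector<double> ur = solve(Kr, Fr);
    std::vector<double> u(n_nodes, 0.0);
    for (int r = 1; r < n_nodes; ++r) u[r] = ur[r - 1];
    return u;
}
```

The tip displacement should equal the textbook result P·L/(E·A), which gives a quick check of assembly and boundary conditions before moving to 3D elements.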

Bottleneck

• Bone geometry is complex
  – Large number of elements required
      • For a pig bone: ~0.5 to 1 million elements (coarse mesh)

GPU based approach

• Potential for GPU-based computation
  – Same set of computations for each element
      • Stiffness matrix computation (Ke)
      • Load vector computation (Fe)
  – Different data sets for each element: SIMD

• Approach
  – Use the GPU for element-level computation
      • Accounts for 67% of total time
  – Use the CPU for the global matrix inversion

• Compare results with a MATLAB-based model

ME 964 – Midterm and Final Projects

Saigopal Nelaturi

• Problem – Given n spheres in 3D space, compute all pairwise collisions

• Approach – Brute force algorithm with quadratic complexity

• Idea – every pair of spheres can be tested independently, and in parallel

CUDA Collision detection

Task Parallelism – pseudo code

• Constructive operators in SE(3)

• SE(3) is the group of rigid-body transformations, represented as 4x4 homogeneous transformation matrices

• Point in SE(3) = matrix

• Set in SE(3) = set of matrices

• Can devise operators using Boolean algebra and matrix multiplication (group operation)

Final Project

Example

How to compute the workspace? (position + orientation of a coordinate frame on the coupler)

Use the set formulation in SE(3): intersection of sets, an embarrassingly parallel process!

Many other applications in design, geometric modeling, motion planning, …

Goals

• For very large sets of 4x4 transformation matrices, implement
  – Intersection: pairwise comparison between matrices
  – Convolution: pairwise multiplication between matrices

• Show some workspace computations (hopefully in 3D)

• If possible, implement
  – Deconvolution: a combination of pairwise intersection/multiplication
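A serial sketch of the two pairwise operations, assuming sets are stored as plain arrays of row-major 4x4 homogeneous matrices and that "intersection" keeps the members of one set that match some member of the other within a tolerance `tol` (a hypothetical parameter, since exact floating-point equality is fragile). `Mat4`, `mul`, `approx_equal`, `convolve` and `intersect` are illustrative names. Every (a, b) iteration is independent, which is what makes a one-thread-per-pair CUDA kernel attractive.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Mat4 = std::array<double, 16>;   // row-major 4x4 homogeneous transform

// Group operation in SE(3): matrix product C = A * B.
Mat4 mul(const Mat4& A, const Mat4& B) {
    Mat4 C{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k) C[4 * r + c] += A[4 * r + k] * B[4 * k + c];
    return C;
}

// Entry-wise comparison with tolerance.
bool approx_equal(const Mat4& A, const Mat4& B, double tol) {
    for (int i = 0; i < 16; ++i)
        if (std::fabs(A[i] - B[i]) > tol) return false;
    return true;
}

// Convolution: all pairwise products a*b with a in S, b in T.
std::vector<Mat4> convolve(const std::vector<Mat4>& S, const std::vector<Mat4>& T) {
    std::vector<Mat4> out;
    out.reserve(S.size() * T.size());
    for (const Mat4& a : S)
        for (const Mat4& b : T) out.push_back(mul(a, b));
    return out;
}

// Intersection: members of S that match some member of T within tol.
std::vector<Mat4> intersect(const std::vector<Mat4>& S, const std::vector<Mat4>& T,
                            double tol) {
    std::vector<Mat4> out;
    for (const Mat4& a : S)
        for (const Mat4& b : T)
            if (approx_equal(a, b, tol)) { out.push_back(a); break; }
    return out;
}
```

The convolution output grows as |S|·|T|, so for very large sets memory for the result, not arithmetic, is likely to be the first limit.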

Midterm Project

Ram Subramanian

The Task

To solve a collision detection problem: given an arbitrary number of rigid spheres with known radii distributed in 3D space, find out which spheres are in contact with (or penetrating) which other spheres.

The Algorithm

• One pass over the array to determine collisions.

• One pass over all the colliding bodies to compute the required collision values.

• Two kernel calls.

• O(n(n-1)/2) pair tests, i.e., O(n^2).

Indexing

Every thread gets a reference body (Body A) and a comparison body (Body B).

• Each block has 512 threads (assumption 1).
• Each row in the grid has 512 blocks (assumption 2).
• The total number of threads is n(n-1)/2.
• Compute the index value from the thread ID and block ID.
• Using this index value and the number of bodies, the indices of Body A and Body B can be determined with integer division (div) and modulo (mod).
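The slides do not give the exact div/mod formula, so here is one possibility, offered as an assumption rather than Ram's implementation: a well-known folding trick that uses exactly n(n-1)/2 threads. The linear index is split as row = idx / n and column = idx % n. Entries above the diagonal are used directly, and the rest are reflected to (n-2-row, n-1-column), which fills the rows the index never reaches. Bodies are 0-based, the 512 x 512 layout follows assumptions 1 and 2, and the function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::int64_t THREADS_PER_BLOCK = 512;  // assumption 1
constexpr std::int64_t BLOCKS_PER_ROW = 512;     // assumption 2

// Linear thread index for a 2D grid of 1D blocks.
std::int64_t thread_index(std::int64_t block_y, std::int64_t block_x, std::int64_t thread_x) {
    return (block_y * BLOCKS_PER_ROW + block_x) * THREADS_PER_BLOCK + thread_x;
}

// Map idx in [0, n(n-1)/2) to a unique pair a < b using only div and mod.
// Returns false for padding threads beyond the last pair.
bool index_to_pair(std::int64_t idx, std::int64_t n, std::int64_t& a, std::int64_t& b) {
    if (n < 2 || idx >= n * (n - 1) / 2) return false;
    a = idx / n;            // reference body (Body A)
    b = idx % n;            // comparison body (Body B)
    if (b <= a) {           // on or below the diagonal: fold into the unused rows
        a = n - a - 2;
        b = n - b - 1;
    }
    return true;
}
```

For n = 4, indices 0..5 map to (2,3), (0,1), (0,2), (0,3), (1,3), (1,2): every pair exactly once, with no idle threads inside the range.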

Final Project - Image Processing on the GPU

Goal – Implement image processing algorithms for the GPU, and eventually build an image processing library for GPUs using CUDA.

Motivation – Most image processing tasks involve operating on individual pixels or a region of the image. Many of these tasks are embarrassingly parallel.

Proposed Implementations

• Harris Corner Detector

Motivation – The Harris corner detector is used in the first processing stage of many other image processing and computer vision algorithms (e.g., 3D reconstruction, scene stitching, object tracking, visual servoing).

Ambitious Goal – Implement an image stitching or 3D reconstruction algorithm that uses the Harris corner detector, e.g., to stitch two images together.

Harris Corner Detector

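• At every pixel in the image, place a window W (e.g., 5x5; larger windows smooth out noise at the cost of less precise corner localization)

• Use either the 4- or the 8-neighborhood of the current pixel position

• Slide the window to each neighboring pixel, giving W1, W2, …, Wm (where m = 4 or 8)

Harris Corner Detector (cont.)

• Compute the sum of squared differences (SSD) between W and each Wi

• A corner is detected when all SSD values are above a user-set threshold (equivalently, when the smallest SSD is above it): the window content changes in every shift direction. A low SSD along one direction indicates an edge, and low SSD in all directions indicates a flat region. (This shifted-window SSD test is Moravec's operator; the Harris detector refines it using image gradients.)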
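A serial sketch of the shifted-window SSD test (Moravec's formulation; the full Harris detector uses the gradient structure tensor instead). Assumptions: a grayscale image stored row-major as `float`, a (2h+1)x(2h+1) window, the 8-neighborhood shifts, and a user threshold; `min_shift_ssd` and `detect_corners` are hypothetical names. Each pixel's response depends only on its neighborhood, so one CUDA thread per pixel is a natural mapping.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Corner response at (x, y): the smallest SSD between the window W centered
// at (x, y) and the 8 windows shifted by one pixel. A large minimum means
// the patch changes in every direction.
float min_shift_ssd(const std::vector<float>& img, int width, int x, int y, int h) {
    static const int shifts[8][2] = {{1,0},{-1,0},{0,1},{0,-1},{1,1},{1,-1},{-1,1},{-1,-1}};
    float best = std::numeric_limits<float>::max();
    for (const auto& s : shifts) {
        float ssd = 0.0f;
        for (int dy = -h; dy <= h; ++dy)
            for (int dx = -h; dx <= h; ++dx) {
                float a = img[(y + dy) * width + (x + dx)];
                float b = img[(y + dy + s[1]) * width + (x + dx + s[0])];
                ssd += (a - b) * (a - b);
            }
        best = std::min(best, ssd);
    }
    return best;
}

// Flag pixels whose response exceeds the threshold. A border of h + 1 pixels
// is skipped so every shifted window stays inside the image.
std::vector<bool> detect_corners(const std::vector<float>& img, int width, int height,
                                 int h, float threshold) {
    std::vector<bool> corner(img.size(), false);
    for (int y = h + 1; y < height - h - 1; ++y)
        for (int x = h + 1; x < width - h - 1; ++x)
            corner[y * width + x] = min_shift_ssd(img, width, x, y, h) > threshold;
    return corner;
}
```

On a bright square, a pixel on a straight edge gets a response of 0 (shifting along the edge changes nothing), while the square's corner pixel gets a positive minimum, which is why the threshold must be a lower bound.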

Midterm and Final Projects

Toby Heyn

ME 964

11/06/08

Midterm Project

• Spatial Subdivision
  – Partition space into a uniform grid (cells)
  – For each object, determine which cells the object overlaps
  – Objects can only collide if they occupy the same cell or adjacent cells

Midterm Project

• Construct Cell ID Array
  – Each thread determines the IDs of the cells its sphere occupies and loads them into the Cell ID Array

• Sort Cell ID Array
  – Radix sort algorithm

• Create Collision Cell List
  – Scan the sorted Cell ID Array, looking for changes in cell ID
  – Write the Collision Cell List with Cell ID Array indices and the number of objects in each cell

• Traverse Collision Cell List
  – One thread per collision cell
  – Each thread checks all collision pairs in its collision cell
  – Collisions are written to output
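A serial sketch of the four stages, under stated assumptions: cubic cells of edge `cell_size`, non-negative coordinates with fewer than 1024 cells per axis, and a hypothetical cell ID that packs the integer cell coordinates into 32 bits. `std::stable_sort` stands in for the GPU radix sort. Each sphere is registered in every cell its bounding box overlaps, so touching spheres always share a cell; a pair that shares several cells is de-duplicated here with a `std::set`, which is only one possible choice.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

struct Sphere { double x, y, z, r; };

// Hypothetical cell ID: 10 bits per axis (cell coordinates in 0..1023).
static std::uint32_t cell_id(int cx, int cy, int cz) {
    return (static_cast<std::uint32_t>(cx) << 20) | (static_cast<std::uint32_t>(cy) << 10) |
           static_cast<std::uint32_t>(cz);
}

std::set<std::pair<int, int>> grid_collisions(const std::vector<Sphere>& s, double cell_size) {
    // 1. Cell ID array: (cell ID, sphere index) for every cell a sphere's box overlaps.
    std::vector<std::pair<std::uint32_t, int>> ids;
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        auto lo = [&](double c) { return static_cast<int>(std::floor((c - s[i].r) / cell_size)); };
        auto hi = [&](double c) { return static_cast<int>(std::floor((c + s[i].r) / cell_size)); };
        for (int cx = lo(s[i].x); cx <= hi(s[i].x); ++cx)
            for (int cy = lo(s[i].y); cy <= hi(s[i].y); ++cy)
                for (int cz = lo(s[i].z); cz <= hi(s[i].z); ++cz)
                    ids.push_back({cell_id(cx, cy, cz), i});
    }
    // 2. Sort by cell ID (stable, like an LSD radix sort).
    std::stable_sort(ids.begin(), ids.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    // 3 + 4. Each run of equal cell IDs is a collision cell; test all pairs in it.
    std::set<std::pair<int, int>> hits;
    for (std::size_t start = 0; start < ids.size();) {
        std::size_t end = start;
        while (end < ids.size() && ids[end].first == ids[start].first) ++end;
        for (std::size_t p = start; p < end; ++p)
            for (std::size_t q = p + 1; q < end; ++q) {
                const int ip = ids[p].second, iq = ids[q].second;
                const Sphere& a = s[ip];
                const Sphere& b = s[iq];
                double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z, rs = a.r + b.r;
                if (dx * dx + dy * dy + dz * dz < rs * rs)
                    hits.insert({std::min(ip, iq), std::max(ip, iq)});
            }
        start = end;
    }
    return hits;
}
```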

Midterm Project

• Radix Sort
  – Sorts cell IDs in several passes
  – Sorts lower-order bits before higher-order bits, retaining the relative order of entries with the same cell ID (a stable sort)
      • This helps in a later step
  – Takes 4 passes (8 bits per pass) to sort the 32-bit (4-byte) integers
  – Makes use of the parallel scan operation
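The four-pass scheme can be written as a serial least-significant-digit radix sort: each pass is a stable counting sort on one byte, and the bucket offsets come from an exclusive prefix sum, the operation a GPU version would perform with a parallel scan. `radix_sort_cells` is a hypothetical name; the pairs hold (cell ID, object index).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// LSD radix sort of (cell ID, object index) pairs by cell ID: four passes of
// 8 bits each, lowest byte first. Each pass is a stable counting sort.
void radix_sort_cells(std::vector<std::pair<std::uint32_t, int>>& v) {
    std::vector<std::pair<std::uint32_t, int>> tmp(v.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 256> count{};            // histogram of this byte
        for (const auto& e : v) ++count[(e.first >> shift) & 0xFF];
        std::size_t sum = 0;                              // exclusive scan -> bucket offsets
        for (auto& c : count) { std::size_t n = c; c = sum; sum += n; }
        for (const auto& e : v)                           // stable scatter into buckets
            tmp[count[(e.first >> shift) & 0xFF]++] = e;
        v.swap(tmp);
    }
}
```

Stability is what lets later passes preserve the order established by earlier ones, and it also keeps objects with equal cell IDs in their original order, which is the property the slide relies on.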

Final Project

• Default final project: granular dynamics using the collision detection from the midterm

• Incorporate midterm collision detection into the Chrono::Engine multibody dynamics engine

• Simulate a Mars rover with many (millions of) bodies

Final Project

• Chrono::Engine
  – C++ API
  – Commands for creating the simulation environment, populating it with bodies, creating constraints, etc.
  – Uses Bullet for collision detection
  – Has been used to solve systems with ~100,000 bodies
  – Has a CUDA-parallelized dynamics solver (based on an LCP formulation)

Final Project

• Each wheel is a union of primitives

• Terrain consists of ~5000 spheres (much too coarse)

• Obstacles:
  – Non-spherical bodies in the wheels
  – Large mass difference between the small grains and the large rover

Final Project

• Handling non-spherical bodies
  – Represent the surface of the body as a composite of smaller spheres
  – The new representation has more bodies, but only spheres
  – Maintain the same dimensions, mass, and inertia properties

Final Project

• Parallelism
  – Collision detection
      • Many bodies/collision pairs to check
      • Spatial subdivision: geometric decomposition, task decomposition
  – Dynamics
      • Many equations of motion to solve
  – Geometric decomposition
      • Potentially many non-spherical bodies to process in parallel

Final Project

• Remaining Issues
  – Re-use of data
      • After solving the collision detection problem once, can data be reused to reduce the size of the problem solved in subsequent steps?
  – Automate handling of non-spherical geometry
      • Can an automated method be created to represent arbitrary geometry with spheres?

ME 964 Midterm & Final Project

Justin Madsen

Outline

• Midterm & final are the same project
  – “Default scheme”

• Collision detection method
  – Baraff
  – Brief overview of the 2-phase algorithm
  – Ideas for a CUDA implementation

• Ideas for the final project
  – Integrating CUDA collision detection with other dynamics programs

Efficient collision detection

• Baraff method
  – Axis-aligned bounding boxes (AABB)
  – Simple yet efficient
  – Only dealing with spheres
      • Can be extended to convex polyhedra
      • (Bounding boxes are not actually needed for spheres; they are a special case)

Figure 1. AABB size and orientation depend on the local coordinate system

Overview of method

• One-dimensional case (x-axis): Sort & Sweep
  – Each object has a length along the axis given by its AABB
  – Data: beginning and end values (b and e) of each box
  – Sorted from lowest to highest according to these values

Figure 2. Six objects and their AABB axes [1]

Determine possible contacts

• After sorting, collision detection happens in two phases

• Phase 1: broad phase
  – Traverse the sorted axis; add object i to the “possible contact list” when b_i is encountered, and remove it when e_i is encountered
  – In the one-dimensional case, adding object i at b_i means it is in contact with every other object currently in the list
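The 1D sweep can be sketched as follows, assuming each object's interval [b_i, e_i] on the x-axis is already known from its AABB (for a sphere, b = x - r and e = x + r). Endpoints are sorted; at each b_i the object is paired with everything in the active list, and at e_i it leaves the list. `Interval` and `sweep_1d` are hypothetical names.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Interval { double b, e; };   // begin/end of an object's AABB along x

// Sort and sweep along one axis. Returns all pairs (i, j), i < j, whose
// intervals overlap. Begin points sort before end points at equal values,
// so touching intervals count as overlapping.
std::vector<std::pair<int, int>> sweep_1d(const std::vector<Interval>& obj) {
    struct Endpoint { double value; bool is_begin; int id; };
    std::vector<Endpoint> pts;
    for (int i = 0; i < static_cast<int>(obj.size()); ++i) {
        pts.push_back({obj[i].b, true, i});
        pts.push_back({obj[i].e, false, i});
    }
    std::sort(pts.begin(), pts.end(), [](const Endpoint& a, const Endpoint& b) {
        if (a.value != b.value) return a.value < b.value;
        return a.is_begin && !b.is_begin;
    });
    std::vector<int> active;                 // objects whose interval is open
    std::vector<std::pair<int, int>> pairs;
    for (const Endpoint& p : pts) {
        if (p.is_begin) {
            for (int other : active)         // overlaps everything still open
                pairs.push_back({std::min(other, p.id), std::max(other, p.id)});
            active.push_back(p.id);
        } else {
            active.erase(std::find(active.begin(), active.end(), p.id));
        }
    }
    return pairs;
}
```

After the O(n log n) sort, the sweep's cost grows with the number of overlaps rather than with n^2, which is the point of the method.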

Three dimensional case

• Phase 1 for 3-D:
  – Extend the one-dimensional contact check by also comparing the b and e values along the y and z axes with those of the other objects in the list
  – If the contact check comes back positive for all 3 axes, add the object to the “possible contact list”

• “Possible” because…

Need to verify collision

• Tested positive for collision along all 3 axes…

Figure 3. Left to right: XY, XZ and YZ axes testing positive for collision

Verifying collision

• Phase 2: narrow phase
  – Overlap along all 3 axes does not necessarily mean that contact has occurred
  – Remember, the broad phase checks bounding boxes, not the actual objects
  – For spheres, compare the distance between centers with the sum of the radii
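The 3D broad phase and the sphere narrow phase fit together as below: a pair survives the broad phase only if its AABB intervals overlap on x, y and z, and is then confirmed by comparing center distance with the sum of radii. This is a sketch with hypothetical names, and for brevity the x-axis check is a plain double loop rather than the sorted sweep. The test data includes two spheres placed diagonally whose boxes overlap although the spheres do not, the case the narrow phase exists to reject.

```cpp
#include <utility>
#include <vector>

struct Sphere { double x, y, z, r; };

// Do the intervals [c1 - r1, c1 + r1] and [c2 - r2, c2 + r2] intersect?
static bool overlap(double c1, double r1, double c2, double r2) {
    return c1 - r1 <= c2 + r2 && c2 - r2 <= c1 + r1;
}

struct Result {
    std::vector<std::pair<int, int>> possible;   // broad phase: AABBs overlap on x, y, z
    std::vector<std::pair<int, int>> contacts;   // narrow phase: spheres really touch
};

Result broad_and_narrow(const std::vector<Sphere>& s) {
    Result res;
    const int n = static_cast<int>(s.size());
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            const Sphere &a = s[i], &b = s[j];
            // Phase 1 (broad): x overlap, then y and z as well.
            if (!overlap(a.x, a.r, b.x, b.r) || !overlap(a.y, a.r, b.y, b.r) ||
                !overlap(a.z, a.r, b.z, b.r))
                continue;
            res.possible.push_back({i, j});
            // Phase 2 (narrow): squared center distance vs. squared sum of radii.
            double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z, rs = a.r + b.r;
            if (dx * dx + dy * dy + dz * dz <= rs * rs) res.contacts.push_back({i, j});
        }
    return res;
}
```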

Implementation in CUDA

• Can parallelize both the broad and the narrow phase
  – Accomplish this by assigning each object a thread
  – Same method, but requires two broad-phase sweeps

• Sweep 1: determine and save the number of collisions, but don’t save the collision pairs
  – Do a prefix sum to determine the amount of memory needed and the memory location where each collision pair is stored

• Sweep 2: determine the collision pairs and save them to the correct memory location
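The two sweeps and the prefix sum can be sketched serially: sweep 1 stores one collision count per object, an exclusive scan turns the counts into write offsets and the total output size, and sweep 2 writes each object's pairs starting at its own offset, so no two threads write the same slot and no atomics are needed. `collect_pairs` is a hypothetical name, `collides(i, j)` is a placeholder for the broad and narrow phase tests, and each outer-loop iteration corresponds to one thread.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Two sweeps plus an exclusive prefix sum. Object i owns the pairs (i, j), j > i.
template <class Collides>
std::vector<std::pair<int, int>> collect_pairs(int n, Collides collides) {
    std::vector<std::size_t> count(n, 0);
    for (int i = 0; i < n; ++i)                       // sweep 1: counts only
        for (int j = i + 1; j < n; ++j)
            if (collides(i, j)) ++count[i];

    std::vector<std::size_t> offset(n, 0);            // exclusive scan of the counts
    std::size_t total = 0;
    for (int i = 0; i < n; ++i) { offset[i] = total; total += count[i]; }

    std::vector<std::pair<int, int>> pairs(total);    // exact allocation
    for (int i = 0; i < n; ++i) {                     // sweep 2: write to own slots
        std::size_t w = offset[i];
        for (int j = i + 1; j < n; ++j)
            if (collides(i, j)) pairs[w++] = {i, j};
    }
    return pairs;
}
```

Compared with a single atomic counter, the scan keeps the output ordered by object index and makes the memory layout deterministic from run to run.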

Extending midterm to final project

• Collision detection to be used for granular dynamics
  – Use existing parallel algorithms to determine the dynamics of a system with many contacts
  – Integrate my collision detection program into existing software
      • Bullet, Chrono::Engine

References

• [1] David Baraff. An Introduction to Physically Based Modeling: Rigid Body Simulation II - Nonpenetration Constraints. SIGGRAPH Course Notes, 1997.
