Thinking Parallel, Part 2

Thinking Parallel, Part II: Tree Traversal on the GPU

By Tero Karras, posted Nov 26 2012 at 09:09PM

Tags: Algorithms, Parallel Programming

In the first part of this series, we looked at collision detection on the GPU and discussed two
commonly used algorithms that find potentially colliding pairs in a set of 3D objects using their
axis-aligned bounding boxes (AABBs). Each of the two algorithms has its weaknesses: sort and
sweep suffers from high execution divergence, while uniform grid relies on too many simplifying
assumptions that limit its applicability in practice.

In this part we will turn our attention to a more sophisticated approach, hierarchical tree
traversal, that avoids these issues to a large extent. In the process, we will further explore the
role of divergence in parallel programming, and show a couple of practical examples of how to
reduce it.

Bounding Volume Hierarchy

We will build our approach around a bounding volume hierarchy (BVH), which is a commonly
used acceleration structure in ray tracing (for example). A bounding volume hierarchy is
essentially a hierarchical grouping of 3D objects, where each group is associated with a
conservative bounding box.

[Figure: bounding volume hierarchy (bvh-tree-thumbnail.jpg)]

Suppose we have eight objects, O1-O8, the green triangles in the figure above. In a BVH,
individual objects are represented by leaf nodes (green spheres in the figure), groups of objects
by internal nodes (N1-N7, orange spheres), and the entire scene by the root node (N1). Each
internal node (e.g. N2) has two children (N4 and N5), and is associated with a bounding volume
(orange rectangle) that fully contains all the underlying objects (O1-O4). The bounding volumes
can basically be any 3D shapes, but we will use axis-aligned bounding boxes (AABBs) for
simplicity.
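The post does not show how an AABB or the overlap test used by the traversal code is defined. As a minimal sketch in plain C++, assuming a box is stored as per-axis minima and maxima, the two properties relied on here (a parent box conservatively contains its children, and overlap is tested axis by axis) might look as follows. The field names lo and hi and the helpers mergeBoxes and contains are illustrative assumptions, not part of the post's BVH class.

```cpp
#include <algorithm>

// Axis-aligned bounding box stored as per-axis minimum and maximum.
struct AABB {
    float lo[3];
    float hi[3];
};

// Two AABBs overlap when their intervals intersect on all three axes.
// In this sketch, boxes that merely touch count as overlapping.
bool checkOverlap(const AABB& a, const AABB& b) {
    for (int axis = 0; axis < 3; ++axis)
        if (a.hi[axis] < b.lo[axis] || b.hi[axis] < a.lo[axis])
            return false;
    return true;
}

// Smallest AABB enclosing both inputs. The box of an internal node can be
// formed this way from the boxes of its two children.
AABB mergeBoxes(const AABB& a, const AABB& b) {
    AABB out;
    for (int axis = 0; axis < 3; ++axis) {
        out.lo[axis] = std::min(a.lo[axis], b.lo[axis]);
        out.hi[axis] = std::max(a.hi[axis], b.hi[axis]);
    }
    return out;
}

// True if 'outer' fully contains 'inner' (the conservative-bound property).
bool contains(const AABB& outer, const AABB& inner) {
    for (int axis = 0; axis < 3; ++axis)
        if (inner.lo[axis] < outer.lo[axis] || inner.hi[axis] > outer.hi[axis])
            return false;
    return true;
}
```

Because a parent box contains everything below it, a query that misses the parent cannot hit any descendant, which is what allows a traversal to skip whole subtrees.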

Our overall approach is to first construct a BVH over the given set of 3D objects, and then use it
to accelerate the search for potentially colliding pairs. We will postpone the discussion of
efficient hierarchy construction to the third part of this series. For now, let's just assume that we
already have the BVH in place.

Independent Traversal

Given the bounding box of a particular object, it is straightforward to formulate a recursive
algorithm to query all the objects whose bounding boxes it overlaps. The following function takes
a BVH in the parameter bvh and an AABB to query against it in the parameter queryAABB. It tests
the AABB against the BVH recursively and returns a list of potential collisions.

void traverseRecursive(CollisionList& list,
                       const BVH&     bvh,
                       const AABB&    queryAABB,
                       int            queryObjectIdx,
                       NodePtr        node)
{
    // Bounding box overlaps the query => process node.
    if (checkOverlap(bvh.getAABB(node), queryAABB))
    {
        // Leaf node => report collision.
        if (bvh.isLeaf(node))
            list.add(queryObjectIdx, bvh.getObjectIdx(node));

        // Internal node => recurse to children.
        else
        {
            traverseRecursive(list, bvh, queryAABB, queryObjectIdx,
                              bvh.getLeftChild(node));
            traverseRecursive(list, bvh, queryAABB, queryObjectIdx,
                              bvh.getRightChild(node));
        }
    }
}

The first line of the kernel computes a linear 1D index for the current thread. We do not make any
assumptions about the block and grid sizes. It is enough to launch at least numObjects threads in
one way or another; any excess threads will get terminated by the second line. The third line
fetches the bounding box of the corresponding object, and calls our function to perform
recursive traversal, passing the object's index and the pointer to the root node of the BVH in the
last two arguments.
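One common way to launch "at least numObjects threads" is to round the block count up. The sketch below emulates the index computation and bounds check on the CPU in plain C++, so it can be checked without a GPU. The block size of 256 used in the usage note and the names numBlocksFor and emulateLaunch are assumptions for illustration only.

```cpp
#include <vector>

// Round up so that numBlocks * blockSize >= numObjects.
int numBlocksFor(int numObjects, int blockSize) {
    return (numObjects + blockSize - 1) / blockSize;
}

// CPU emulation of the first two lines of such a kernel: every
// (block, thread) pair computes its linear index, and threads whose index
// falls past the last object do nothing.
// Returns how many times each object was processed.
std::vector<int> emulateLaunch(int numObjects, int blockSize) {
    std::vector<int> visits(numObjects, 0);
    const int numBlocks = numBlocksFor(numObjects, blockSize);
    for (int blockIdx = 0; blockIdx < numBlocks; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < blockSize; ++threadIdx) {
            int idx = threadIdx + blockSize * blockIdx;  // linear 1D index
            if (idx < numObjects)                        // excess threads exit
                ++visits[idx];  // stands in for the per-object traversal call
        }
    }
    return visits;
}
```

With 12,486 objects and 256 threads per block this gives 49 blocks, that is 12,544 threads, of which the last 58 fail the bounds check and exit immediately.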

To test our implementation, we will run a dataset taken from APEX Destruction using a GeForce
GTX 690 GPU. The data set contains 12,486 objects representing debris falling from the walls of a
corridor, and 73,704 pairs of potentially colliding objects, as shown in the following screenshot.

The total execution time of our kernel for this dataset is 3.8 milliseconds. Not very good
considering that this kernel is just one part of collision detection, which is only one part of a
simulation that we would ideally like to run at 60 FPS (16 ms). We should be able to do better.

Minimizing Divergence

The most obvious problem with our recursive implementation is high execution divergence. The
decision of whether to skip a given node or recurse to its children is made independently by each
thread, and there is nothing to guarantee that nearby threads will remain in sync once they have
made different decisions. We can fix this by performing the traversal in an iterative fashion, and
managing the recursion stack explicitly, as in the following function.

__device__ void traverseIterative(CollisionList& list,
                                  BVH&           bvh,
                                  AABB&          queryAABB,
                                  int            queryObjectIdx)
{
    // Allocate traversal stack from thread-local memory,
    // and push NULL to indicate that there are no postponed nodes.
    NodePtr stack[64];
    NodePtr* stackPtr = stack;
    *stackPtr++ = NULL; // push

    // Traverse nodes starting from the root.
    NodePtr node = bvh.getRoot();
    do
    {
        // Check each child node for overlap.
        NodePtr childL = bvh.getLeftChild(node);
        NodePtr childR = bvh.getRightChild(node);
        bool overlapL = (checkOverlap(queryAABB,
                                      bvh.getAABB(childL)));
        bool overlapR = (checkOverlap(queryAABB,
                                      bvh.getAABB(childR)));

        // Query overlaps a leaf node => report collision.
        if (overlapL && bvh.isLeaf(childL))
            list.add(queryObjectIdx, bvh.getObjectIdx(childL));

        if (overlapR && bvh.isLeaf(childR))
            list.add(queryObjectIdx, bvh.getObjectIdx(childR));

        // Query overlaps an internal node => traverse.
        bool traverseL = (overlapL && !bvh.isLeaf(childL));
        bool traverseR = (overlapR && !bvh.isLeaf(childR));

        if (!traverseL && !traverseR)
            node = *--stackPtr; // pop
        else
        {
            node = (traverseL) ? childL : childR;
            if (traverseL && traverseR)
                *stackPtr++ = childR; // push
        }
    }
    while (node != NULL);
}

The loop is executed once for every internal node that overlaps the query box. We begin by
checking the children of the current node for overlap, and report an intersection if one of them is
a leaf. We then check whether the overlapped children are internal nodes that need to be
processed in a subsequent iteration. If there is only one child, we simply set it as the current
node and start over. If there are two children, we set the left child as the current node and push
the right child onto the stack. If there are no children to be traversed, we pop a node that was
previously pushed to the stack. The traversal ends when we pop NULL, which indicates that there
are no more nodes to process.
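To check the traversal logic without a GPU, the same loop can be written in plain C++ against a small hand-built tree. This is only a sketch: the flat node array, the convention that node 0 is the root and -1 plays the role of NULL, and the helpers boxX and makeExampleBVH are assumptions made for illustration, not the post's BVH class. As in the function above, the root is assumed to be an internal node.

```cpp
#include <utility>
#include <vector>

struct AABB { float lo[3], hi[3]; };

// Boxes overlap when their intervals intersect on every axis.
bool checkOverlap(const AABB& a, const AABB& b) {
    for (int k = 0; k < 3; ++k)
        if (a.hi[k] < b.lo[k] || b.hi[k] < a.lo[k]) return false;
    return true;
}

// Hypothetical node layout: children are indices into the node array.
struct Node {
    AABB box;
    int left, right;  // -1 for a leaf
    int objectIdx;    // object stored in a leaf, -1 for internal nodes
};

using CollisionList = std::vector<std::pair<int, int>>;

void traverseIterative(CollisionList& list, const std::vector<Node>& nodes,
                       const AABB& query, int queryObjectIdx) {
    int stack[64];
    int* stackPtr = stack;
    *stackPtr++ = -1;  // sentinel: no postponed nodes
    int node = 0;      // start from the root
    do {
        const int childL = nodes[node].left;
        const int childR = nodes[node].right;
        const bool overlapL = checkOverlap(query, nodes[childL].box);
        const bool overlapR = checkOverlap(query, nodes[childR].box);
        const bool leafL = nodes[childL].left < 0;
        const bool leafR = nodes[childR].left < 0;
        // Overlapped leaf => report a potential collision.
        if (overlapL && leafL) list.push_back({queryObjectIdx, nodes[childL].objectIdx});
        if (overlapR && leafR) list.push_back({queryObjectIdx, nodes[childR].objectIdx});
        // Overlapped internal node => traverse it now or postpone it.
        const bool traverseL = overlapL && !leafL;
        const bool traverseR = overlapR && !leafR;
        if (!traverseL && !traverseR) {
            node = *--stackPtr;  // pop
        } else {
            node = traverseL ? childL : childR;
            if (traverseL && traverseR) *stackPtr++ = childR;  // push
        }
    } while (node != -1);
}

AABB boxX(float x0, float x1) { return AABB{{x0, 0.f, 0.f}, {x1, 1.f, 1.f}}; }

// Four boxes in a row along x; only neighbouring boxes overlap.
std::vector<Node> makeExampleBVH() {
    return {
        {boxX(0.f, 4.5f), 1, 2, -1},   // 0: root
        {boxX(0.f, 2.5f), 3, 4, -1},   // 1: covers objects 0 and 1
        {boxX(2.f, 4.5f), 5, 6, -1},   // 2: covers objects 2 and 3
        {boxX(0.f, 1.5f), -1, -1, 0},  // 3: leaf, object 0
        {boxX(1.f, 2.5f), -1, -1, 1},  // 4: leaf, object 1
        {boxX(2.f, 3.5f), -1, -1, 2},  // 5: leaf, object 2
        {boxX(3.f, 4.5f), -1, -1, 3},  // 6: leaf, object 3
    };
}
```

Querying with the box of object 1 reports the pairs (1, 0), (1, 1) and (1, 2) in that order: both neighbours, plus object 1 itself.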

The total execution time of this kernel is 0.91 milliseconds, a rather substantial improvement
over 3.8 ms for the recursive kernel! The reason for the improvement is that each thread is now
simply executing the same loop over and over, regardless of which traversal decisions it ends up
making. This means that nearby threads execute every iteration in sync with each other, even if
they are traversing completely different parts of the tree.

But what if threads are indeed traversing completely different parts of the tree? That means that
they are accessing different nodes (data divergence) and executing a different number of
iterations (execution divergence). In our current algorithm, there is nothing to guarantee that
nearby threads will actually process objects that are nearby in 3D space. The amount of
divergence is therefore very sensitive to the order in which the objects are specified.
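The effect of thread ordering on execution divergence can be illustrated with a deliberately simplified cost model. It assumes that threads are grouped into fixed-size warps that advance in lockstep, so each warp stays busy for as many loop iterations as its slowest thread. The function name warpIterations and the per-thread iteration counts are made up for illustration, and real hardware behaviour is more nuanced than this model.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sum, over all warps, of the iteration count of the slowest thread in the
// warp. Threads are assigned to warps in consecutive groups of warpSize.
long warpIterations(const std::vector<int>& itersPerThread, std::size_t warpSize) {
    long total = 0;
    const std::size_t n = itersPerThread.size();
    for (std::size_t base = 0; base < n; base += warpSize) {
        int slowest = 0;
        const std::size_t end = std::min(base + warpSize, n);
        for (std::size_t i = base; i < end; ++i)
            slowest = std::max(slowest, itersPerThread[i]);
        total += slowest;  // the whole warp waits for its slowest thread
    }
    return total;
}
```

With a toy warp size of 2, the counts {1, 1, 9, 9} cost 1 + 9 = 10 warp iterations, while the same work ordered as {1, 9, 1, 9} costs 9 + 9 = 18. NVIDIA GPUs use 32 threads per warp, so grouping threads with similar workloads matters even more there.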

Fortunately, we can exploit the fact that the objects we want to query are the same objects from
which we constructed the BVH. Due to the hierarchical nature of the BVH, objects close to each
other in 3D are also likely to be located in nearby leaf nodes. So let's order our queries the same
way, as shown in the following kernel code.

__global__ void findPotentialCollisions(CollisionList list,
                                        BVH           bvh)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < bvh.getNumLeaves())
    {
        NodePtr leaf = bvh.getLeaf(idx);
        traverseIterative(list, bvh,
                          bvh.getAABB(leaf),
                          bvh.getObjectIdx(leaf));
    }
}

Instead of launching one thread per object, as we did previously, we are now launching one
thread per leaf node. This does not affect the behavior of the kernel, since each object will still get
processed exactly once. However, it changes the ordering of the threads to minimize both
execution and data divergence. The total execution time is now 0.43 milliseconds; this trivial
change improved the performance of our algorithm by another 2x!

There is still one minor problem with our algorithm: every potential collision will be reported
twice (once by each participating object), and objects will also report collisions with themselves.
Reporting twice as many collisions also means that we have to perform twice as much work.
Fortunately, this can be avoided through a simple modification to the algorithm. In order for
object A to report a collision with object B, we require that A must appear before B in the tree.
To avoid traversing the hierarchy all the way to the leaves in order to find out whether this is the
case, we can store two additional pointers for every internal node, to indicate the rightmost leaf
that can be reached through each of its children. During the traversal, we can then skip a node
whenever we notice that it cannot be used to reach any leaves that would be located after our
query node in the tree.
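The ordering rule can be tried out on the CPU with a small self-contained sketch in plain C++. It makes several assumptions that are not in the post. Nodes live in a flat array with node 0 as the root. Leaves are numbered from left to right, and the leaf number doubles as the object index. Each node stores the rightmost leaf of its own subtree, which carries the same information as storing one value per child in the parent. The names traverseAfter, findAllPairs, boxX and makeExampleBVH are hypothetical.

```cpp
#include <utility>
#include <vector>

struct AABB { float lo[3], hi[3]; };

bool checkOverlap(const AABB& a, const AABB& b) {
    for (int k = 0; k < 3; ++k)
        if (a.hi[k] < b.lo[k] || b.hi[k] < a.lo[k]) return false;
    return true;
}

struct Node {
    AABB box;
    int left, right;    // child node indices, -1 for a leaf
    int rightmostLeaf;  // largest leaf number in this subtree (own number for a leaf)
};

using CollisionList = std::vector<std::pair<int, int>>;

// Reports (queryLeaf, other) only for leaves located after queryLeaf.
void traverseAfter(CollisionList& list, const std::vector<Node>& nodes, int queryNode) {
    const AABB& query = nodes[queryNode].box;
    const int queryLeaf = nodes[queryNode].rightmostLeaf;
    int stack[64];
    int* stackPtr = stack;
    *stackPtr++ = -1;  // sentinel: no postponed nodes
    int node = 0;      // root, assumed to be an internal node
    do {
        const int child[2] = {nodes[node].left, nodes[node].right};
        bool traverse[2];
        for (int i = 0; i < 2; ++i) {
            const Node& c = nodes[child[i]];
            // Ignore subtrees whose leaves all lie at or before the query leaf.
            const bool overlap = c.rightmostLeaf > queryLeaf && checkOverlap(query, c.box);
            const bool leaf = c.left < 0;
            if (overlap && leaf) list.push_back({queryLeaf, c.rightmostLeaf});
            traverse[i] = overlap && !leaf;
        }
        if (!traverse[0] && !traverse[1]) {
            node = *--stackPtr;  // pop
        } else {
            node = traverse[0] ? child[0] : child[1];
            if (traverse[0] && traverse[1]) *stackPtr++ = child[1];  // push
        }
    } while (node != -1);
}

AABB boxX(float x0, float x1) { return AABB{{x0, 0.f, 0.f}, {x1, 1.f, 1.f}}; }

// Four boxes in a row along x; only neighbouring boxes overlap.
std::vector<Node> makeExampleBVH() {
    return {
        {boxX(0.f, 4.5f), 1, 2, 3},    // 0: root
        {boxX(0.f, 2.5f), 3, 4, 1},    // 1: leaves 0 and 1
        {boxX(2.f, 4.5f), 5, 6, 3},    // 2: leaves 2 and 3
        {boxX(0.f, 1.5f), -1, -1, 0},  // 3: leaf 0
        {boxX(1.f, 2.5f), -1, -1, 1},  // 4: leaf 1
        {boxX(2.f, 3.5f), -1, -1, 2},  // 5: leaf 2
        {boxX(3.f, 4.5f), -1, -1, 3},  // 6: leaf 3
    };
}

// Runs one query per leaf, as the one-thread-per-leaf kernel would.
CollisionList findAllPairs(const std::vector<Node>& nodes) {
    CollisionList list;
    for (int n = 0; n < static_cast<int>(nodes.size()); ++n)
        if (nodes[n].left < 0) traverseAfter(list, nodes, n);
    return list;
}
```

On this four-leaf example, findAllPairs returns (0, 1), (1, 2) and (2, 3): each overlapping pair appears once and no leaf reports itself.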

__device__ void traverseIterative(CollisionList& list,
                                  BVH&           bvh,
                                  AABB&          queryAABB,
                                  int            queryObjectIdx,
                                  NodePtr        queryLeaf)
{
    ...

    // Ignore overlap if the subtree is fully on the
    // left-hand side of the query.
    if (bvh.getRightmostLeafInLeftSubtree(node)

So, the parallel implementation of simultaneous traversal does less work than independent
traversal, and it does not lack in parallelism, either. Sounds good, right? Wrong. It actually
performs a lot worse than independent traversal. How is that possible?

The answer is, you guessed it, divergence. In simultaneous traversal, each thread is working on
a completely different portion of the tree, so the data divergence is high. There is no correlation
between the traversal decisions made by nearby threads, so the execution divergence is also
high. To make matters even worse, the execution times of the individual threads vary wildly:
threads that are given a non-overlapping initial pair will exit immediately, whereas the ones
given a node paired with itself are likely to execute the longest.

Maybe there is a way to organize the computation differently so that simultaneous traversal
would yield better results, similar to what we did with independent traversal? There have been
many attempts to accomplish something like this in other contexts, using clever work
assignment, packet traversal, warp-synchronous programming, dynamic load balancing, and so
on. Long story short, you can get pretty close to the performance of independent traversal, but it
is extremely difficult to actually beat it.

Discussion

We have looked at two ways of performing broad-phase collision detection by traversing a
hierarchical data structure in parallel, and we have seen that minimizing divergence through
relatively simple algorithmic modifications can lead to substantial performance improvements.

Comparing independent traversal and simultaneous traversal is interesting because it highlights
an important lesson about parallel programming. Independent traversal is a simple algorithm,
but it performs more work than necessary overall. Simultaneous traversal, on the other hand, is
more intelligent about the work it performs, but this comes at the price of increased complexity.
Complex algorithms tend to be harder to parallelize, are more susceptible to divergence, and
offer less flexibility when it comes to optimization. In our example, these effects end up
completely nullifying the benefits of reduced overall computation.

Parallel programming is often less about how much work the program performs than it is about
whether that work is divergent or not. Algorithmic complexity often leads to divergence, so it is
important to try the simplest algorithm first. Chances are that after a few rounds of optimization,
the algorithm runs so well that more complex alternatives have a hard time competing with it.

In my next post, I will focus on parallel BVH construction, talk about the problem of occupancy,
and present a recently published algorithm that explicitly aims to maximize it.

About the author: Tero Karras is a graphics research scientist at NVIDIA Research.

%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4860?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4852?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4917?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4905?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4786?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/436?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4842?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4868?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4867?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/496?display[%24ne]=char%28119%2
C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4843?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/321?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttp://developer.nvidia.com/blog/feed/2http://developer.nvidia.com/content/contact-parallel-forallhttp://developer.nvidia.com/about-parallel-forallhttps://developer.nvidia.com/blog?term=2https://developer.nvidia.com/registered-developer-programs


