https://developer.nvidia.com/content/thinking-parallel-part-iii-tree-construction-gpu

Thinking Parallel, Part III: Tree Construction on the GPU

By Tero Karras, posted Dec 19 2012 at 10:49PM

Tags: Algorithms, Occupancy, Parallel Programming

In part II of this series, we looked at hierarchical tree traversal as a means of quickly identifying pairs of potentially colliding 3D objects, and we demonstrated how optimizing for low divergence can result in substantial performance gains on massively parallel processors. Having a fast traversal algorithm is not very useful, though, unless we also have a tree to go with it. In this part, we will close the circle by looking at tree building; specifically, parallel bounding volume hierarchy (BVH) construction. We will also see an example of an algorithmic optimization that would be completely pointless on a single-core processor, but leads to substantial gains in a parallel setting.

There are many use cases for BVHs, and also many ways of constructing them. In our case, construction speed is of the essence. In a physics simulation, objects keep moving from one time step to the next, so we will need a different BVH for each step. Furthermore, we know that we are going to spend only about 0.25 milliseconds in traversing the BVH, so it makes little sense to spend much more time on constructing it. One well-known approach for handling dynamic scenes is to essentially recycle the same BVH over and over. The basic idea is to only recalculate the bounding boxes of the nodes according to the new object locations, while keeping the hierarchical structure of nodes the same. It is also possible to make small incremental modifications to improve the node structure around objects that have moved the most. However, the main problem plaguing these algorithms is that the tree can deteriorate in unpredictable ways over time, which can result in arbitrarily bad traversal performance in the worst case. To ensure predictable worst-case behavior, we instead choose to build a new tree from scratch every time step. Let's look at how.

EXPLOITING THE Z-ORDER CURVE

The most promising current parallel BVH construction approach is to use a so-called linear BVH (LBVH). The idea is to simplify the problem by first choosing the order in which the leaf nodes (each corresponding to one object) appear in the tree, and then generating the internal nodes in a way that respects this order. We generally want objects that are located close to each other in 3D space to also reside nearby in the hierarchy, so a reasonable choice is to sort them along a space-filling curve. We will use the Z-order curve for simplicity.

The Z-order curve is defined in terms of Morton codes. To calculate the Morton code for a given 3D point, we start by looking at the binary fixed-point representation of its coordinates, as shown in the top left part of the figure. First, we take the fractional part of each coordinate and expand it by inserting two gaps after each bit. Second, we interleave the bits of all three coordinates together to form a single binary number. If we step through the Morton codes obtained this way in increasing order, we are effectively stepping along the Z-order curve in 3D (a 2D representation is shown on the right-hand side of the figure). In practice, we can determine the order of the leaf nodes by assigning a Morton code to each object and then sorting the objects accordingly. As mentioned in the context of sort and sweep in part I, parallel radix sort is just the right tool for this job. A good way to assign the Morton code for a given object is to use the centroid point of its bounding box, and express it relative to the bounding box of the scene. The expansion and interleaving of bits can then be performed efficiently by exploiting the arcane bit-swizzling properties of integer multiplication, as shown in the following code.

// Expands a 10-bit integer into 30 bits
// by inserting 2 zeros after each bit.
unsigned int expandBits(unsigned int v)
{
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// Calculates a 30-bit Morton code for the
// given 3D point located within the unit cube [0,1].
unsigned int morton3D(float x, float y, float z)
{
    x = min(max(x * 1024.0f, 0.0f), 1023.0f);
    y = min(max(y * 1024.0f, 0.0f), 1023.0f);
    z = min(max(z * 1024.0f, 0.0f), 1023.0f);
    unsigned int xx = expandBits((unsigned int)x);
    unsigned int yy = expandBits((unsigned int)y);
    unsigned int zz = expandBits((unsigned int)z);
    return xx * 4 + yy * 2 + zz;
}

In our example dataset with 12,486 objects, assigning the Morton codes this way takes 0.02 milliseconds on GeForce GTX 690, whereas sorting the objects takes 0.18 ms. So far so good, but we still have a tree to build.

TOP-DOWN HIERARCHY GENERATION

One of the great things about LBVH is that once we have fixed the order of the leaf nodes, we can think of each internal node as just a linear range over them. To illustrate this, suppose that we have N leaf nodes in total. The root node contains all of them, i.e. it covers the range [0, N-1]. The left child of the root must then cover the range [0, γ], for some appropriate choice of γ, and the right child covers the range [γ+1, N-1]. We can continue this all the way down to obtain the following recursive algorithm.

Node* generateHierarchy(unsigned int* sortedMortonCodes,
                        int*          sortedObjectIDs,
                        int           first,
                        int           last)
{
    // Single object => create a leaf node.
    if (first == last)
        return new LeafNode(&sortedObjectIDs[first]);

    // Determine where to split the range.
    int split = findSplit(sortedMortonCodes, first, last);

    // Process the resulting sub-ranges recursively.
    Node* childA = generateHierarchy(sortedMortonCodes, sortedObjectIDs,
                                     first, split);
    Node* childB = generateHierarchy(sortedMortonCodes, sortedObjectIDs,
                                     split + 1, last);
    return new InternalNode(childA, childB);
}

We start with a range that covers all objects (first=0, last=N-1), and determine an appropriate position to split the range in two (split=γ). We then repeat the same thing for the resulting sub-ranges, and generate a hierarchy where each such split corresponds to one internal node. The recursion terminates when we encounter a range that contains only one item, in which case we create a leaf node.


The only remaining question is how to choose γ. LBVH determines γ according to the highest bit that differs between the Morton codes within the given range. In other words, we aim to partition the objects so that the highest differing bit will be zero for all objects in childA, and one for all objects in childB. The intuitive reason that this is a good idea is that partitioning objects by the highest differing bit in their Morton codes corresponds to classifying them on either side of an axis-aligned plane in 3D. In practice, the most efficient way to find out where the highest bit changes is to use binary search. The idea is to maintain a current best guess for the position, and try to advance it in exponentially decreasing steps. On each step, we check whether the proposed new position would violate the requirements for childA, and either accept or reject it accordingly. This is illustrated by the following code, which uses the __clz() intrinsic function available on NVIDIA Fermi and Kepler GPUs to count the number of leading zero bits in a 32-bit integer.

int findSplit(unsigned int* sortedMortonCodes,
              int           first,
              int           last)
{
    // Identical Morton codes => split the range in the middle.
    unsigned int firstCode = sortedMortonCodes[first];
    unsigned int lastCode  = sortedMortonCodes[last];

    if (firstCode == lastCode)
        return (first + last) >> 1;

    // Calculate the number of highest bits that are the same
    // for all objects, using the count-leading-zeros intrinsic.
    int commonPrefix = __clz(firstCode ^ lastCode);

    // Use binary search to find where the next bit differs.
    // Specifically, we are looking for the highest object that
    // shares more than commonPrefix bits with the first one.
    int split = first; // initial guess
    int step = last - first;

    do
    {
        step = (step + 1) >> 1;      // exponential decrease
        int newSplit = split + step; // proposed new position

        if (newSplit < last)
        {
            unsigned int splitCode = sortedMortonCodes[newSplit];
            int splitPrefix = __clz(firstCode ^ splitCode);
            if (splitPrefix > commonPrefix)
                split = newSplit;    // accept proposal
        }
    }
    while (step > 1);

    return split;
}

How should we go about parallelizing this kind of recursive algorithm? One way is to use the approach presented by Garanzha et al., which processes the levels of nodes sequentially, starting from the root. The idea is to maintain a growing array of nodes in a breadth-first order, so that every level in the hierarchy corresponds to a linear range of nodes. On a given level, we launch one thread for each node that falls into this range. The thread starts by reading first and last from the node array and calling findSplit(). It then appends the resulting child nodes to the same node array using an atomic counter and writes out their corresponding sub-ranges. This process iterates so that each level outputs the nodes contained on the next level, which then get processed in the next round.

OCCUPANCY

The algorithm just described (Garanzha et al.) is surprisingly fast when there are millions of objects. The algorithm spends most of the execution time at the bottom levels of the tree, which contain more than enough work to fully employ the GPU. There is some amount of data divergence on the higher levels, as the threads are accessing distinct parts of the Morton code array. But those levels are also less significant considering the overall execution time, since they do not contain as many nodes to begin with. In our case, however, there are only 12K objects (recall the example seen in the last post). Note that this is actually less than the number of threads we would need to fill our GTX 690, even if we were able to parallelize everything perfectly. GTX 690 is a dual-GPU card, where each of the two GPUs can run up to 16K threads in parallel. Even if we restrict ourselves to only one of the GPUs (the other one can, e.g., handle rendering while we do physics), we are still in danger of running low on parallelism.

The top-down algorithm takes 1.04 ms to process our workload, which is more than twice the total time taken by all the other processing steps. To explain this, we need to consider another metric in addition to divergence: occupancy. Occupancy is a measure of how many threads are executing on average at any given time, relative to the maximum number of threads that the processor can theoretically support. When occupancy is low, it translates directly to performance: dropping occupancy in half will also drop performance by 2x. This dependence gets gradually weaker as the number of active threads increases. The reason is that when occupancy is high enough, the overall performance starts to become limited by other factors, such as instruction throughput and memory bandwidth.

To illustrate, consider the case with 12K objects and 16K threads. If we launch one thread per object, our occupancy is at most 75%. A bit low, but not by any means catastrophic. How does the top-down hierarchy generation algorithm compare to this? There is only one node on the first level, so we launch only one thread. This means that the first level will run at 0.006% occupancy! The second level has two nodes, so it runs at 0.013% occupancy. Assuming a balanced hierarchy, the third level runs at 0.025% and the fourth at 0.05%. Only when we get to the 13th level can we even hope to reach a reasonable occupancy of 25%. But right after that we will already run out of work. These numbers are somewhat discouraging: due to the low occupancy, the first level will cost roughly as much as the 13th level, even though there are 4096 times fewer nodes to process.

FULLY PARALLEL HIERARCHY GENERATION

There is no way to avoid this problem without somehow changing the algorithm in a fundamental way. Even if our GPU supports dynamic parallelism (as NVIDIA Tesla K20 does), we cannot avoid the fact that every node is dependent on the results of its parent. We have to finish processing the root before we know which ranges its children cover, and we cannot even hope to start processing them until we do. In other words, regardless of how we implement top-down hierarchy generation, the first level is doomed to run at 0.006% occupancy. Is there a way to break the dependency between nodes?

In fact there is, and I recently presented the solution at High Performance Graphics 2012 (paper, slides). The idea is to number the internal nodes in a very specific way that allows us to find out which range of objects any given node corresponds to, without having to know anything about the rest of the tree. Utilizing the fact that any binary tree with N leaf nodes always has exactly N-1 internal nodes, we can then generate the entire hierarchy as illustrated by the following pseudocode.

Node* generateHierarchy(unsigned int* sortedMortonCodes,
                        int*          sortedObjectIDs,
                        int           numObjects)
{
    LeafNode*     leafNodes     = new LeafNode[numObjects];
    InternalNode* internalNodes = new InternalNode[numObjects - 1];

    // Construct leaf nodes.
    // Note: This step can be avoided by storing
    // the tree in a slightly different way.
    for (int idx = 0; idx < numObjects; idx++) // in parallel
        leafNodes[idx].objectID = sortedObjectIDs[idx];

    // Construct internal nodes.
    for (int idx = 0; idx < numObjects - 1; idx++) // in parallel
    {
        // Find out which range of objects the node corresponds to.
        // (This is where the magic happens!)
        int2 range = determineRange(sortedMortonCodes, numObjects, idx);
        int first = range.x;
        int last  = range.y;

        // Determine where to split the range.
        int split = findSplit(sortedMortonCodes, first, last);

        // Select childA.
        Node* childA;
        if (split == first)
            childA = &leafNodes[split];
        else
            childA = &internalNodes[split];

        // Select childB.
        Node* childB;
        if (split + 1 == last)
            childB = &leafNodes[split + 1];
        else
            childB = &internalNodes[split + 1];

        // Record parent-child relationships.
        internalNodes[idx].childA = childA;
        internalNodes[idx].childB = childB;
        childA->parent = &internalNodes[idx];
        childB->parent = &internalNodes[idx];
    }

    // Node 0 is the root.
    return &internalNodes[0];
}

The algorithm simply allocates an array of N-1 internal nodes, and then processes all of them in parallel. Each thread starts by determining which range of objects its node corresponds to, with a bit of magic, and then proceeds to split the range as usual. Finally, it selects children for the node according to their respective sub-ranges. If a sub-range has only one object, the child must be a leaf, so we reference the corresponding leaf node directly. Otherwise, we reference another internal node from the array.

The way the internal nodes are numbered is already evident from the pseudocode. The root has index 0, and the children of every node are located on either side of its split position. Due to some nice properties of the sorted Morton codes, this numbering scheme never results in any duplicates or gaps. Furthermore, it turns out that we can implement determineRange() in much the same way as findSplit(), using two similar binary searches over the nearby Morton codes. For further details on how and why this works, please see the paper.

How does this algorithm compare to the recursive top-down approach? It clearly performs more work: we now need three binary searches per node, whereas the top-down approach only needs one. But it does all of the work completely in parallel, reaching the full 75% occupancy in our example, and this makes a huge difference. The execution time of parallel hierarchy generation is merely 0.02 ms, a 50x improvement over the top-down algorithm!

You might think that the top-down algorithm should start to win when the number of objects is sufficiently high, since the lack of occupancy is no longer a problem. However, this is not the case in practice: the parallel algorithm consistently performs better on all problem sizes. The explanation for this is, as always, divergence. In the parallel algorithm, nearby threads are always accessing nearby Morton codes, whereas the top-down algorithm spreads out the accesses over a wider area.

BOUNDING BOX CALCULATION

Now that we have a hierarchy of nodes in place, the only thing left to do is to assign a conservative bounding box for each of them. The approach I adopt in my paper is to do a parallel bottom-up reduction, where each thread starts from a single leaf node and walks toward the root. To find the bounding box of a given node, the thread simply looks up the bounding boxes of its children and calculates their union. To avoid duplicate work, the idea is to use an atomic flag per node to terminate the first thread that enters it, while letting the second one through. This ensures that every node gets processed only once, and not before both of its children are processed. The bounding box calculation has high execution divergence: only half of the threads remain active after processing one node, one quarter after processing two nodes, one eighth after processing three nodes, and so on. However, this is not really a problem in practice, for two reasons. First, bounding box calculation takes only 0.06 ms, which is still reasonably low compared to e.g. sorting the objects. Second, the processing mainly consists of reading and writing bounding boxes, and the amount of computation is minimal. This means that the execution time is almost entirely dictated by the available memory bandwidth, and reducing execution divergence would not really help that much.

SUMMARY

Our algorithm for finding potential collisions among a set of 3D objects consists of the following 5 steps (times are for the 12K-object scene used in the previous post).

1. 0.02 ms, one thread per object: Calculate bounding box and assign Morton code.
2. 0.18 ms, parallel radix sort: Sort the objects according to their Morton codes.
3. 0.02 ms, one thread per internal node: Generate BVH node hierarchy.
4. 0.06 ms, one thread per object: Calculate node bounding boxes by walking the hierarchy toward the root.
5. 0.25 ms, one thread per object: Find potential collisions by traversing the BVH.

The complete algorithm takes 0.53 ms, out of which 53% goes to tree construction and 47% to tree traversal.

DISCUSSION

We have presented a number of algorithms of varying complexity in the context of broad-phase collision detection, and identified some of the most important considerations when designing and implementing them on a massively parallel processor. The comparison between independent traversal and simultaneous traversal illustrates the importance of divergence in algorithm design: the best single-core algorithm may easily turn out to be the worst one in a parallel setting. Relying on time complexity as the main indicator of a good algorithm can sometimes be misleading or even harmful: it may actually be beneficial to do more work if it helps in reducing divergence.

The parallel algorithm for BVH hierarchy generation brings up another interesting point. In the traditional sense, the algorithm is completely pointless: on a single-core processor, the dependencies between nodes were not a problem to begin with, and doing more work per node only makes the algorithm run slower. This shows that parallel programming is indeed fundamentally different from traditional single-core programming: it is not so much about porting existing algorithms to run on a parallel processor; it is about re-thinking some of the things that we usually take for granted, and coming up with new algorithms specifically designed with massive parallelism in mind.

And there is still a lot to be accomplished on this front.

About the author: Tero Karras is a graphics research scientist at NVIDIA Research.

Parallel Forall is the NVIDIA Parallel Programming blog. If you enjoyed this post, subscribe to the Parallel Forall RSS feed! You may contact us via the contact form.

104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/466?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4873?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/456?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4501?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4933?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4631?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/166?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4857?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4845?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4864?display[%24ne]=char%28119%2C104
%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4860?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4852?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4917?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4905?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4786?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/436?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4842?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4868?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4867?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/496?display[%24ne]=char%28119%2C104%2
C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4843?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/321?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttp://developer.nvidia.com/blog/feed/2http://developer.nvidia.com/content/contact-parallel-forallhttp://developer.nvidia.com/about-parallel-forallhttps://developer.nvidia.com/blog?term=2https://developer.nvidia.com/registered-developer-programshttps://developer.nvidia.com/content/contact-parallel-forallhttp://nvdrupws18.nvidia.com/blog/feed/2
    Developer Blogs

    Parallel Forall Blog

    About

    Contact

    Copyright 2013 NVIDIA Corporation

    Legal Information

    Privacy Policy

    Code of Conduct
