Thinking Parallel, Part 2


  • 7/27/2019 Thinking Parallel, Part 2


    10/10/13 Thinking Parallel, Part II: Tree Traversal on the GPU | NVIDIA Developer Zone

    https://developer.nvidia.com/content/thinking-parallel-part-ii-tree-traversal-gpu 1/10


    Thinking Parallel, Part II: Tree Traversal on the GPU

    By Tero Karras, posted Nov 26 2012 at 09:09PM

    Tags: Algorithms, Parallel Programming

    In the first part of this series, we looked at collision detection on the GPU and discussed two commonly used algorithms that find potentially colliding pairs in a set of 3D objects using their axis-aligned bounding boxes (AABBs). Each of the two algorithms has its weaknesses: sort and sweep suffers from high execution divergence, while uniform grid relies on too many simplifying assumptions that limit its applicability in practice.

    In this part we will turn our attention to a more sophisticated approach, hierarchical tree traversal, that avoids these issues to a large extent. In the process, we will further explore the role of divergence in parallel programming, and show a couple of practical examples of how to improve it.

    Bounding Volume Hierarchy

    We will build our approach around a bounding volume hierarchy (BVH), which is a commonly used acceleration structure in ray tracing, for example. A bounding volume hierarchy is essentially a hierarchical grouping of 3D objects, where each group is associated with a conservative bounding box.

    [Figure: bounding volume hierarchy over eight objects; image not preserved]

    Suppose we have eight objects, O1-O8, the green triangles in the figure above. In a BVH, individual objects are represented by leaf nodes (green spheres in the figure), groups of objects by internal nodes (N1-N7, orange spheres), and the entire scene by the root node (N1). Each internal node (e.g. N2) has two children (N4 and N5), and is associated with a bounding volume (orange rectangle) that fully contains all the underlying objects (O1-O4). The bounding volumes can basically be any 3D shapes, but we will use axis-aligned bounding boxes (AABBs) for simplicity.
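A BVH of AABBs rests on two small operations: testing whether two boxes overlap, and forming the conservative box that encloses a group. The following is a minimal host-side C++ sketch of both. The AABB layout (minimum and maximum corners) and the helpers `merge` and `contains` are assumptions for illustration, and `checkOverlap` is only a guess at what the post's function of the same name does.

```cpp
#include <algorithm>
#include <cassert>

// Axis-aligned bounding box: minimum corner lo, maximum corner hi.
// (Hypothetical layout; the post does not show its AABB type.)
struct AABB {
    float lo[3];
    float hi[3];
};

// Two AABBs overlap iff their extents overlap on all three axes.
// Boxes that merely touch count as overlapping, which keeps the test
// conservative.
bool checkOverlap(const AABB& a, const AABB& b) {
    for (int axis = 0; axis < 3; ++axis) {
        if (a.hi[axis] < b.lo[axis] || b.hi[axis] < a.lo[axis])
            return false;
    }
    return true;
}

// Smallest AABB enclosing both inputs. Applied bottom-up, this gives a
// conservative box for every internal node (e.g. N2 from N4 and N5).
AABB merge(const AABB& a, const AABB& b) {
    AABB r;
    for (int axis = 0; axis < 3; ++axis) {
        r.lo[axis] = std::min(a.lo[axis], b.lo[axis]);
        r.hi[axis] = std::max(a.hi[axis], b.hi[axis]);
    }
    return r;
}

// True if 'outer' fully contains 'inner' on every axis.
bool contains(const AABB& outer, const AABB& inner) {
    for (int axis = 0; axis < 3; ++axis) {
        if (inner.lo[axis] < outer.lo[axis] || inner.hi[axis] > outer.hi[axis])
            return false;
    }
    return true;
}
```

Because a parent's box always contains its children's boxes, a query that misses the parent cannot hit anything below it, which is what allows whole subtrees to be skipped.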

    Our overall approach is to first construct a BVH over the given set of 3D objects, and then use it to accelerate the search for potentially colliding pairs. We will postpone the discussion of efficient hierarchy construction to the third part of this series. For now, let's just assume that we already have the BVH in place.

    Independent Traversal

    Given the bounding box of a particular object, it is straightforward to formulate a recursive algorithm to query all the objects whose bounding boxes it overlaps. The following function takes a BVH in the parameter bvh and an AABB to query against it in the parameter queryAABB. It tests the AABB recursively against the subtree rooted at node, and appends each potential collision, as a pair of queryObjectIdx and the overlapping object's index, to list.

    __device__ void traverseRecursive(CollisionList& list,
                                      const BVH&     bvh,
                                      const AABB&    queryAABB,
                                      int            queryObjectIdx,
                                      NodePtr        node)
    {
        // Bounding box overlaps the query => process node.
        if (checkOverlap(bvh.getAABB(node), queryAABB))
        {
            // Leaf node => report collision.
            if (bvh.isLeaf(node))
                list.add(queryObjectIdx, bvh.getObjectIdx(node));

            // Internal node => recurse to children.
            else
            {
                NodePtr childL = bvh.getLeftChild(node);
                NodePtr childR = bvh.getRightChild(node);
                traverseRecursive(list, bvh, queryAABB, queryObjectIdx, childL);
                traverseRecursive(list, bvh, queryAABB, queryObjectIdx, childR);
            }
        }
    }
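The kernel that the next paragraph walks through is not reproduced in this copy. To make the one-thread-per-object scheme concrete, here is a host-side C++ sketch in which each loop iteration stands in for one GPU thread. The array-based BVH (int indices instead of NodePtr, -1 marking leaves), the toy builder `buildRange`, and `CollisionList` as a vector of index pairs are hypothetical stand-ins, not the types used in the post.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct AABB { float lo[3]; float hi[3]; };

bool checkOverlap(const AABB& a, const AABB& b) {
    for (int k = 0; k < 3; ++k)
        if (a.hi[k] < b.lo[k] || b.hi[k] < a.lo[k]) return false;
    return true;
}

AABB merge(const AABB& a, const AABB& b) {
    AABB r;
    for (int k = 0; k < 3; ++k) {
        r.lo[k] = std::min(a.lo[k], b.lo[k]);
        r.hi[k] = std::max(a.hi[k], b.hi[k]);
    }
    return r;
}

// Hypothetical array-based BVH: node 0 is the root, int indices stand in
// for NodePtr, and leaves are marked by left == -1.
struct Node { AABB box; int left; int right; int objectIdx; };
struct BVH { std::vector<Node> nodes; };

// Toy builder (not the construction method of this series): recursively
// halves the index range objs[first..last], so the tree is only spatially
// coherent if the input is already sorted in space.
int buildRange(BVH& bvh, const std::vector<AABB>& objs, int first, int last) {
    int idx = static_cast<int>(bvh.nodes.size());
    bvh.nodes.push_back(Node{});
    if (first == last) {
        bvh.nodes[idx] = Node{objs[first], -1, -1, first};
        return idx;
    }
    int mid = (first + last) / 2;
    int l = buildRange(bvh, objs, first, mid);
    int r = buildRange(bvh, objs, mid + 1, last);
    bvh.nodes[idx] = Node{merge(bvh.nodes[l].box, bvh.nodes[r].box), l, r, -1};
    return idx;
}

using CollisionList = std::vector<std::pair<int, int>>;

// Same control flow as traverseRecursive above.
void traverseRecursive(CollisionList& list, const BVH& bvh,
                       const AABB& queryAABB, int queryObjectIdx, int node) {
    const Node& n = bvh.nodes[node];
    if (!checkOverlap(n.box, queryAABB)) return;
    if (n.left < 0) {
        list.emplace_back(queryObjectIdx, n.objectIdx);  // leaf: report
    } else {
        traverseRecursive(list, bvh, queryAABB, queryObjectIdx, n.left);
        traverseRecursive(list, bvh, queryAABB, queryObjectIdx, n.right);
    }
}

// Host stand-in for a one-thread-per-object kernel: each iteration plays
// one thread, and "threads" with idx >= numObjects do nothing.
CollisionList findPotentialCollisions(const BVH& bvh,
                                      const std::vector<AABB>& objectAABBs,
                                      int numThreads) {
    CollisionList list;
    const int numObjects = static_cast<int>(objectAABBs.size());
    for (int idx = 0; idx < numThreads; ++idx) {
        if (idx >= numObjects) continue;  // excess thread exits
        traverseRecursive(list, bvh, objectAABBs[idx], idx, /*root=*/0);
    }
    return list;
}
```

On four boxes that form two overlapping pairs, this reports 8 pairs: each overlapping pair twice and every object against itself, which is the duplication discussed later in this post.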

    The first line of the kernel computes a linear 1D index for the current thread. We do not make any assumptions about the block and grid sizes. It is enough to launch at least numObjects threads in one way or another; any excess threads will get terminated by the second line. The third line fetches the bounding box of the corresponding object, and calls our function to perform recursive traversal, passing the object's index and the pointer to the root node of the BVH in the last two arguments.
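Launching "at least numObjects threads" in a 1D launch usually means rounding up to a whole number of blocks. A small C++ sketch of that arithmetic follows; the function names are hypothetical, and the post itself fixes no block size.

```cpp
#include <cassert>

// Number of blocks needed so that numBlocks * blockSize >= numObjects
// (integer ceiling division). Assumes numObjects >= 0 and blockSize > 0.
int numBlocksFor(int numObjects, int blockSize) {
    return (numObjects + blockSize - 1) / blockSize;
}

// Threads launched beyond numObjects; in the kernel these fail the
// idx < numObjects check and return immediately.
int excessThreads(int numObjects, int blockSize) {
    return numBlocksFor(numObjects, blockSize) * blockSize - numObjects;
}
```

For example, with an arbitrarily chosen 256 threads per block, the 12,486 objects of the dataset described next need 49 blocks (12,544 threads), so 58 threads would exit at the guard.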

    To test our implementation, we will run a dataset taken from APEX Destruction using a GeForce GTX 690 GPU. The dataset contains 12,486 objects representing debris falling from the walls of a corridor, and 73,704 pairs of potentially colliding objects, as shown in the following screenshot.

    The total execution time of our kernel for this dataset is 3.8 milliseconds. Not very good, considering that this kernel is just one part of collision detection, which is only one part of a simulation that we would ideally like to run at 60 FPS (16 ms). We should be able to do better.

    Minimizing Divergence

    The most obvious problem with our recursive implementation is high execution divergence. The decision of whether to skip a given node or recurse to its children is made independently by each thread, and there is nothing to guarantee that nearby threads will remain in sync once they have made different decisions. We can fix this by performing the traversal in an iterative fashion, and managing the recursion stack explicitly, as in the following function.

    __device__ void traverseIterative(CollisionList& list,
                                      const BVH&     bvh,
                                      const AABB&    queryAABB,
                                      int            queryObjectIdx)
    {
        // Allocate traversal stack from thread-local memory,
        // and push NULL to indicate that there are no postponed nodes.
        NodePtr stack[64];
        NodePtr* stackPtr = stack;
        *stackPtr++ = NULL; // push

        // Traverse nodes starting from the root
        // (assumed to be an internal node).
        NodePtr node = bvh.getRoot();
        do
        {
            // Check each child node for overlap.
            NodePtr childL = bvh.getLeftChild(node);
            NodePtr childR = bvh.getRightChild(node);
            bool overlapL = checkOverlap(queryAABB, bvh.getAABB(childL));
            bool overlapR = checkOverlap(queryAABB, bvh.getAABB(childR));

            // Query overlaps a leaf node => report collision.
            if (overlapL && bvh.isLeaf(childL))
                list.add(queryObjectIdx, bvh.getObjectIdx(childL));

            if (overlapR && bvh.isLeaf(childR))
                list.add(queryObjectIdx, bvh.getObjectIdx(childR));

            // Query overlaps an internal node => traverse.
            bool traverseL = (overlapL && !bvh.isLeaf(childL));
            bool traverseR = (overlapR && !bvh.isLeaf(childR));

            if (!traverseL && !traverseR)
                node = *--stackPtr; // pop
            else
            {
                node = (traverseL) ? childL : childR;
                if (traverseL && traverseR)
                    *stackPtr++ = childR; // push
            }
        }
        while (node != NULL);
    }

    The loop is executed once for every internal node that overlaps the query box. We begin by checking the children of the current node for overlap, and report an intersection if one of them is a leaf. We then check whether the overlapped children are internal nodes that need to be processed in a subsequent iteration. If only one child needs to be traversed, we simply set it as the current node and start over. If both do, we set the left child as the current node and push the right child onto the stack. If neither does, we pop a node that was previously pushed to the stack. The traversal ends when we pop NULL, which indicates that there are no more nodes to process.
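The stack discipline described above can be checked on the host with a small C++ sketch that mirrors traverseIterative. The array-based BVH (int indices for NodePtr, -1 for NULL) and the toy builder `buildRange` are hypothetical stand-ins; like the CUDA version, the loop assumes the root is an internal node.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct AABB { float lo[3]; float hi[3]; };

bool checkOverlap(const AABB& a, const AABB& b) {
    for (int k = 0; k < 3; ++k)
        if (a.hi[k] < b.lo[k] || b.hi[k] < a.lo[k]) return false;
    return true;
}

AABB merge(const AABB& a, const AABB& b) {
    AABB r;
    for (int k = 0; k < 3; ++k) {
        r.lo[k] = std::min(a.lo[k], b.lo[k]);
        r.hi[k] = std::max(a.hi[k], b.hi[k]);
    }
    return r;
}

// Hypothetical array-based BVH; node 0 is the root, leaves have left == -1.
struct Node { AABB box; int left; int right; int objectIdx; };
struct BVH { std::vector<Node> nodes; };

// Toy builder: recursively halves the index range objs[first..last].
int buildRange(BVH& bvh, const std::vector<AABB>& objs, int first, int last) {
    int idx = static_cast<int>(bvh.nodes.size());
    bvh.nodes.push_back(Node{});
    if (first == last) {
        bvh.nodes[idx] = Node{objs[first], -1, -1, first};
        return idx;
    }
    int mid = (first + last) / 2;
    int l = buildRange(bvh, objs, first, mid);
    int r = buildRange(bvh, objs, mid + 1, last);
    bvh.nodes[idx] = Node{merge(bvh.nodes[l].box, bvh.nodes[r].box), l, r, -1};
    return idx;
}

using CollisionList = std::vector<std::pair<int, int>>;

void traverseIterative(CollisionList& list, const BVH& bvh,
                       const AABB& queryAABB, int queryObjectIdx) {
    int stack[64];
    int* stackPtr = stack;
    *stackPtr++ = -1;  // push sentinel: no postponed nodes
    int node = 0;      // root
    do {
        const Node& n = bvh.nodes[node];
        const Node& cl = bvh.nodes[n.left];
        const Node& cr = bvh.nodes[n.right];
        bool overlapL = checkOverlap(queryAABB, cl.box);
        bool overlapR = checkOverlap(queryAABB, cr.box);
        bool leafL = cl.left < 0;
        bool leafR = cr.left < 0;
        // Query overlaps a leaf => report collision.
        if (overlapL && leafL) list.emplace_back(queryObjectIdx, cl.objectIdx);
        if (overlapR && leafR) list.emplace_back(queryObjectIdx, cr.objectIdx);
        // Query overlaps an internal node => traverse.
        bool traverseL = overlapL && !leafL;
        bool traverseR = overlapR && !leafR;
        if (!traverseL && !traverseR) {
            node = *--stackPtr;  // pop (ends on the sentinel)
        } else {
            node = traverseL ? n.left : n.right;
            if (traverseL && traverseR) {
                assert(stackPtr - stack < 64 && "traversal stack overflow");
                *stackPtr++ = n.right;  // postpone the right child
            }
        }
    } while (node != -1);
}
```

Every thread runs the same do/while body regardless of its traversal decisions; only the node it works on and the number of iterations differ.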

    The total execution time of this kernel is 0.91 milliseconds, a rather substantial improvement (roughly 4x) over the 3.8 ms of the recursive kernel! The reason for the improvement is that each thread is now simply executing the same loop over and over, regardless of which traversal decisions it ends up making. This means that nearby threads execute every iteration in sync with each other, even if they are traversing completely different parts of the tree.

    But what if threads are indeed traversing completely different parts of the tree? That means that they are accessing different nodes (data divergence) and executing a different number of iterations (execution divergence). In our current algorithm, there is nothing to guarantee that nearby threads will actually process objects that are nearby in 3D space. The amount of divergence is therefore very sensitive to the order in which the objects are specified.

    Fortunately, we can exploit the fact that the objects we want to query are the same objects from which we constructed the BVH. Due to the hierarchical nature of the BVH, objects close to each other in 3D are also likely to be located in nearby leaf nodes. So let's order our queries the same way, as shown in the following kernel code.

    __global__ void findPotentialCollisions(CollisionList list,
                                            BVH           bvh)
    {
        int idx = threadIdx.x + blockDim.x * blockIdx.x;
        if (idx < bvh.getNumLeaves())
        {
            NodePtr leaf = bvh.getLeaf(idx);
            traverseIterative(list, bvh,
                              bvh.getAABB(leaf),
                              bvh.getObjectIdx(leaf));
        }
    }

    Instead of launching one thread per object, as we did previously, we are now launching one thread per leaf node. This does not affect the behavior of the kernel, since each object will still get processed exactly once. However, it changes the ordering of the threads to minimize both execution and data divergence. The total execution time is now 0.43 milliseconds; this trivial change improved the performance of our algorithm by another 2x!
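To see why one thread per leaf preserves the work while changing its order, consider how thread indices map to objects. The post does not show how bvh.getLeaf(idx) is implemented, so in this hypothetical C++ sketch the leaves are simply enumerated left to right by a depth-first walk over a minimal node array; the names `collectLeaves` and `isPermutation` are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Minimal hypothetical tree: leaves have left == -1 and carry an object index.
struct Node { int left; int right; int objectIdx; };

// Collects object indices of the leaves in left-to-right order, i.e. the
// objects that threads idx = 0, 1, 2, ... would query when each thread
// is assigned the idx-th leaf.
void collectLeaves(const std::vector<Node>& nodes, int node, std::vector<int>& out) {
    if (nodes[node].left < 0) {
        out.push_back(nodes[node].objectIdx);
        return;
    }
    collectLeaves(nodes, nodes[node].left, out);
    collectLeaves(nodes, nodes[node].right, out);
}

// True if every object 0..numObjects-1 appears exactly once: switching to
// one thread per leaf changes which thread handles an object, not whether
// it is handled.
bool isPermutation(std::vector<int> v, int numObjects) {
    if (static_cast<int>(v.size()) != numObjects) return false;
    std::sort(v.begin(), v.end());
    for (int i = 0; i < numObjects; ++i)
        if (v[i] != i) return false;
    return true;
}
```

Suppose objects 0 and 2 lie near x = 10 and objects 1 and 3 near x = 0, so a spatially coherent BVH groups {1, 3} and {0, 2}. Threads 0 and 1 then receive objects 1 and 3, which are neighbors in space, whereas one thread per object would put object 0 (near x = 10) and object 1 (near x = 0) in neighboring threads.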

    There is still one minor problem with our algorithm: every potential collision will be reported twice (once by each participating object), and objects will also report collisions with themselves. Reporting twice as many collisions also means that we have to perform twice as much work. Fortunately, this can be avoided through a simple modification to the algorithm. In order for object A to report a collision with object B, we require that A must appear before B in the tree.

    To avoid traversing the hierarchy all the way to the leaves in order to find out whether this is the case, we can store two additional pointers for every internal node, to indicate the rightmost leaf that can be reached through each of its children. During the traversal, we can then skip a node whenever we notice that it cannot be used to reach any leaves that would be located after our query node in the tree.

    __device__ void traverseIterative(CollisionList& list,
                                      const BVH&     bvh,
                                      const AABB&    queryAABB,
                                      int            queryObjectIdx,
                                      NodePtr        queryLeaf)
    {
        ...
        // Ignore overlap if the subtree is fully on the
        // left-hand side of the query.
        if (bvh.getRightmostLeafInLeftSubtree(node) <= queryLeaf)
            overlapL = false;
        // ... and likewise for overlapR, using the rightmost leaf
        // reachable through the right child.
        ...
    }
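The pruning rule can be sketched on the host as a variant of the stack-based loop. Assumptions: a hypothetical array-based BVH whose leaves are numbered 0..n-1 from left to right (so in this toy a leaf's rank doubles as its object index), and each node storing the rank of the rightmost leaf in its subtree; reading that value from the two children plays the role of the two per-node pointers described above. This is an illustration, not the post's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct AABB { float lo[3]; float hi[3]; };

bool checkOverlap(const AABB& a, const AABB& b) {
    for (int k = 0; k < 3; ++k)
        if (a.hi[k] < b.lo[k] || b.hi[k] < a.lo[k]) return false;
    return true;
}

AABB merge(const AABB& a, const AABB& b) {
    AABB r;
    for (int k = 0; k < 3; ++k) {
        r.lo[k] = std::min(a.lo[k], b.lo[k]);
        r.hi[k] = std::max(a.hi[k], b.hi[k]);
    }
    return r;
}

// Hypothetical node: leaves have left == -1; rightmostLeaf is the rank of
// the rightmost leaf in this node's subtree (a leaf's own rank for leaves).
struct Node { AABB box; int left; int right; int objectIdx; int rightmostLeaf; };
struct BVH { std::vector<Node> nodes; };

// Toy builder over objs[first..last]; leaf rank equals object index.
int buildRange(BVH& bvh, const std::vector<AABB>& objs, int first, int last) {
    int idx = static_cast<int>(bvh.nodes.size());
    bvh.nodes.push_back(Node{});
    if (first == last) {
        bvh.nodes[idx] = Node{objs[first], -1, -1, first, first};
        return idx;
    }
    int mid = (first + last) / 2;
    int l = buildRange(bvh, objs, first, mid);
    int r = buildRange(bvh, objs, mid + 1, last);
    bvh.nodes[idx] = Node{merge(bvh.nodes[l].box, bvh.nodes[r].box), l, r, -1, last};
    return idx;
}

using CollisionList = std::vector<std::pair<int, int>>;

// Reports (queryLeaf, other) only when other's leaf comes strictly after
// the query leaf: each pair once, no self-collisions.
void traverseOrdered(CollisionList& list, const BVH& bvh,
                     const AABB& queryAABB, int queryLeaf) {
    int stack[64];
    int* stackPtr = stack;
    *stackPtr++ = -1;  // sentinel: no postponed nodes
    int node = 0;      // root (assumed internal)
    do {
        const Node& n = bvh.nodes[node];
        const Node& cl = bvh.nodes[n.left];
        const Node& cr = bvh.nodes[n.right];
        // Ignore a child whose whole subtree lies at or left of the query.
        bool overlapL = cl.rightmostLeaf > queryLeaf && checkOverlap(queryAABB, cl.box);
        bool overlapR = cr.rightmostLeaf > queryLeaf && checkOverlap(queryAABB, cr.box);
        bool leafL = cl.left < 0;
        bool leafR = cr.left < 0;
        if (overlapL && leafL) list.emplace_back(queryLeaf, cl.objectIdx);
        if (overlapR && leafR) list.emplace_back(queryLeaf, cr.objectIdx);
        bool traverseL = overlapL && !leafL;
        bool traverseR = overlapR && !leafR;
        if (!traverseL && !traverseR) {
            node = *--stackPtr;  // pop
        } else {
            node = traverseL ? n.left : n.right;
            if (traverseL && traverseR) {
                assert(stackPtr - stack < 64 && "traversal stack overflow");
                *stackPtr++ = n.right;  // push
            }
        }
    } while (node != -1);
}
```

On four boxes forming two overlapping pairs, querying every leaf reports exactly (0, 1) and (2, 3), and the query for the last leaf terminates after inspecting only the root's children.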

    So, the parallel implementation of simultaneous traversal does less work than independent traversal, and it does not lack in parallelism, either. Sounds good, right? Wrong. It actually performs a lot worse than independent traversal. How is that possible?

    The answer is, you guessed it, divergence. In simultaneous traversal, each thread is working on a completely different portion of the tree, so the data divergence is high. There is no correlation between the traversal decisions made by nearby threads, so the execution divergence is also high. To make matters even worse, the execution times of the individual threads vary wildly: threads that are given a non-overlapping initial pair will exit immediately, whereas the ones given a node paired with itself are likely to execute the longest.

    Maybe there is a way to organize the computation differently so that simultaneous traversal would yield better results, similar to what we did with independent traversal? There have been many attempts to accomplish something like this in other contexts, using clever work assignment, packet traversal, warp-synchronous programming, dynamic load balancing, and so on. Long story short, you can get pretty close to the performance of independent traversal, but it is extremely difficult to actually beat it.

    Discussion

    We have looked at two ways of performing broad-phase collision detection by traversing a hierarchical data structure in parallel, and we have seen that minimizing divergence through relatively simple algorithmic modifications can lead to substantial performance improvements.

    Comparing independent traversal and simultaneous traversal is interesting because it highlights an important lesson about parallel programming. Independent traversal is a simple algorithm, but it performs more work than necessary overall. Simultaneous traversal, on the other hand, is more intelligent about the work it performs, but this comes at the price of increased complexity. Complex algorithms tend to be harder to parallelize, are more susceptible to divergence, and offer less flexibility when it comes to optimization. In our example, these effects end up completely nullifying the benefits of reduced overall computation.

    Parallel programming is often less about how much work the program performs than about whether that work is divergent or not. Algorithmic complexity often leads to divergence, so it is important to try the simplest algorithm first. Chances are that after a few rounds of optimization, the algorithm runs so well that more complex alternatives have a hard time competing with it.

    In my next post, I will focus on parallel BVH construction, talk about the problem of occupancy, and present a recently published algorithm that explicitly aims to maximize it.

    About the author: Tero Karras is a graphics research scientist at NVIDIA Research.

    Parallel Forall is the NVIDIA Parallel Programming blog. If you enjoyed this post, subscribe to the Parallel Forall RSS feed!


    Copyright 2013 NVIDIA Corporation

r%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/466?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4873?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/456?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4501?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4933?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4631?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/166?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4857?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4845?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4864?display[%24ne]=char%2
8119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4860?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4852?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4917?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4905?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4786?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/436?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4842?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4868?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4867?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/496?display[%24ne]=char%281
19%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4843?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/321?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttp://developer.nvidia.com/blog/feed/2http://developer.nvidia.com/content/contact-parallel-forallhttp://developer.nvidia.com/about-parallel-forallhttps://developer.nvidia.com/blog?term=2https://developer.nvidia.com/registered-developer-programs
  • 7/27/2019 Thinking Parallel, Part 2

    10/10