Accelerating Robust Normal Estimation
A comparison of CPU vs. GPU performance
Rohit Gupta & Ian Endres
HERE Technologies
[email protected] — +31 (40) 744 1368

Abstract
We present an implementation for calculating surface normals on the CPU and GPU. The GPU implementation can be up to an order of magnitude faster than the CPU implementation, measured on a desktop workstation with a 6-core (12-thread) Intel Xeon processor and an NVIDIA Quadro M4000 with 8 GB of on-board memory.

Introduction
In order to understand the contents of a scene from point clouds (Figure 1), it is often required to calculate normals of the surface the point cloud represents. This calculation is both time-consuming and recurring, so optimizing it can have a significant impact on the total runtime.

Algorithm
We use the RANSAC approach for calculating surface normals on our point clouds. In Algorithm 1 we list the computational steps in our implementation.

Algorithm 1: Robust L1 normals estimation

 1: Inputs
      1. Point cloud: X = {x_i}, N number of points
      2. No. of neighbors: K_neigh
      3. No. of normal hypotheses/point: K_samp = K_neigh / 2
    Outputs
      1. Surface normals/point: V = {v_i}
      2. Goodness of fit score/point: S = {s_i}
 2: Calculate the K_neigh nearest neighbors for each point in X.
 3: for i = 0, ..., N do
 4:   y_ij = x_i − x_j, where j = 0, ..., K_neigh   {compute relative vectors and store in Y = {y_ij}}
 5:   ŷ_ij = y_ij / |y_ij|   {normalize Y}
 6: end for
 7: S = 0; m, n ~ uniform random distribution over K_neigh
 8: for k = 0, ..., K_samp do
 9:   u = ŷ_m × ŷ_n,  û = u / |u|   {normalize u}
10:   s′ = Σ_j |ŷ_j^T û|   {compute hypothesis quality score (minimum value)}
11:   if s′ < s_i then
12:     v_i = û, s_i = s′
13:   end if
14: end for
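To make the per-point work concrete, here is a minimal CPU-side sketch of the hypothesis loop (Algorithm 1, lines 7 through 14). It is an illustrative reimplementation, not the poster's code; the Vec3 type, the function name robustNormal, and the RNG handling are our assumptions.

```cpp
#include <array>
#include <cmath>
#include <limits>
#include <random>
#include <vector>

using Vec3 = std::array<float, 3>;

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

static float dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Robust L1 normal for one query point (Algorithm 1, lines 7-14):
// pick random pairs of normalized relative vectors, take their cross
// product as a normal hypothesis, and keep the hypothesis with the
// smallest summed |y_j^T u| score.
Vec3 robustNormal(const std::vector<Vec3>& yHat, int kSamp,
                  std::mt19937& rng, float& bestScore) {
    std::uniform_int_distribution<int> pick(0, (int)yHat.size() - 1);
    bestScore = std::numeric_limits<float>::max();
    Vec3 best{0.f, 0.f, 0.f};
    for (int k = 0; k < kSamp; ++k) {
        Vec3 u = cross(yHat[pick(rng)], yHat[pick(rng)]);
        float len = std::sqrt(dot(u, u));
        if (len == 0.f) continue;  // degenerate pair (parallel vectors)
        Vec3 uhat = {u[0] / len, u[1] / len, u[2] / len};
        float s = 0.f;  // L1 score: sum over all neighbors
        for (const Vec3& y : yHat) s += std::fabs(dot(y, uhat));
        if (s < bestScore) { bestScore = s; best = uhat; }
    }
    return best;
}
```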

We chose to implement our normal estimation using the RANSAC method since it is robust to some edge effects which PCA is prone to. The RANSAC method shows a marked improvement (Figure 3) over the PCA scheme (Figure 2) at discontinuities and corners in the point cloud (Figure 1): it exhibits a change in direction of the normals which mimics the physical surface.
The PCA method relies on calculating the eigenvectors and eigenvalues of covariance matrices generated from the neighbors of a query point. Query points can be all the points in the given point cloud.
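For contrast, the PCA baseline computes the normal as the eigenvector for the smallest eigenvalue of the 3×3 neighborhood covariance. A minimal sketch, assuming the Eigen library (the poster does not name its PCA implementation):

```cpp
#include <Eigen/Dense>
#include <vector>

// PCA normal of one query point from its K nearest neighbors: the
// eigenvector for the smallest eigenvalue of the 3x3 covariance matrix.
Eigen::Vector3f pcaNormal(const std::vector<Eigen::Vector3f>& neighbors) {
    Eigen::Vector3f mean = Eigen::Vector3f::Zero();
    for (const auto& p : neighbors) mean += p;
    mean /= (float)neighbors.size();

    Eigen::Matrix3f cov = Eigen::Matrix3f::Zero();
    for (const auto& p : neighbors) {
        Eigen::Vector3f d = p - mean;
        cov += d * d.transpose();
    }

    // SelfAdjointEigenSolver sorts eigenvalues in increasing order, so
    // column 0 is the direction of least variance: the surface normal.
    Eigen::SelfAdjointEigenSolver<Eigen::Matrix3f> es(cov);
    return es.eigenvectors().col(0);
}
```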

RANSAC (random sample consensus), on the other hand, chooses a random subset of the relative vectors to the neighbors of each query point. We then create a set of hypotheses, the cross products of pairs of these (randomly chosen) relative vectors. For each point we compute the inner product of each hypothesis with the relative vectors to arrive at a set of scores per point; the hypothesis with the lowest score gives us our candidate normal.

Figure 1: Point Cloud with region of interest

Figure 2: PCA

Figure 3: Robust

Division into Kernels
The code was broken down into kernels. A brief description of the kernels used follows:

- FLANN NN search – This kernel is a call to the nearest neighbor search as implemented in FLANN (see the host-side sketch after this list). Line 2 in Algorithm 1.
- Populate Y – This kernel generates a matrix whose rows contain the relative vectors from the current query point to all its neighbors, so the matrix size is N × K_neigh. It also normalizes each of these vectors. Lines 3 through 6 in Algorithm 1.
- Calculate normals – This kernel calculates the cross product of two randomly selected vectors from the Y matrix for all points. The number of generated cross products is K_samp. They are also normalized. Line 9 in Algorithm 1.
- Dot products – This kernel calculates the dot products of each query point's normalized relative vectors with each of its candidate cross products, to generate K_samp scores per point in the point cloud (see the kernel sketch after this list). Line 10 in Algorithm 1.
- Best normal search – The normal vector (cross product from the Calculate normals kernel) with the minimum of these scores is chosen as the normal for a query point in this kernel. Lines 11 through 13 in Algorithm 1.
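A host-side sketch of the nearest-neighbor step using FLANN's C++ interface; the k-d tree parameters (4 trees, 128 checks) are illustrative assumptions, not values from the poster:

```cpp
#include <flann/flann.hpp>
#include <vector>

// points: N x 3 row-major float array; kNeigh: neighbors per point.
void knnSearch(float* points, int n, int kNeigh,
               std::vector<int>& indicesOut, std::vector<float>& distsOut) {
    flann::Matrix<float> dataset(points, n, 3);
    flann::Index<flann::L2<float>> index(dataset, flann::KDTreeIndexParams(4));
    index.buildIndex();

    indicesOut.resize((size_t)n * kNeigh);
    distsOut.resize((size_t)n * kNeigh);
    flann::Matrix<int> indices(indicesOut.data(), n, kNeigh);
    flann::Matrix<float> dists(distsOut.data(), n, kNeigh);

    // Query the dataset against itself: kNeigh neighbors per point.
    index.knnSearch(dataset, indices, dists, kNeigh, flann::SearchParams(128));
}
```

Note that when the dataset queries itself, each point comes back as its own nearest neighbor; that entry is typically skipped when building Y, since it yields a zero relative vector.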
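The Dot products stage is embarrassingly parallel over (point, hypothesis) pairs, which is why it benefits most from the GPU. A sketch of such a CUDA kernel with one thread per hypothesis; the memory layout and launch configuration are our assumptions, since the poster does not specify them:

```cuda
// Scores one normal hypothesis against all K_neigh normalized relative
// vectors of its point (Algorithm 1, line 10). Layouts (row-major float):
//   yHat:   N * kNeigh * 3   normalized relative vectors
//   hyp:    N * kSamp  * 3   normalized cross-product hypotheses
//   scores: N * kSamp        L1 score per hypothesis
__global__ void dotProductScores(const float* __restrict__ yHat,
                                 const float* __restrict__ hyp,
                                 float* __restrict__ scores,
                                 int n, int kNeigh, int kSamp) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n * kSamp) return;
    int point = tid / kSamp;  // which query point this thread serves

    const float* u = hyp + 3 * (size_t)tid;
    const float* y = yHat + 3 * (size_t)point * kNeigh;

    float s = 0.0f;
    for (int j = 0; j < kNeigh; ++j)  // sum_j |y_j . u|
        s += fabsf(y[3 * j] * u[0] + y[3 * j + 1] * u[1] + y[3 * j + 2] * u[2]);
    scores[tid] = s;
}

// Example launch: one thread per (point, hypothesis) pair.
// dotProductScores<<<(n * kSamp + 255) / 256, 256>>>(yHat, hyp, scores,
//                                                    n, kNeigh, kSamp);
```

A subsequent per-point minimum reduction (the Best normal search kernel) then selects the winning hypothesis.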

Results
In the adjoining plots we show how the execution times vary for both the robust normal calculation and the preceding nearest neighbor calculation. In Figure 4 we show the time taken by the FLANN implementation to calculate nearest neighbors; this is common to both the CPU and GPU implementations. In Figures 5 and 6 we plot the execution times for the CPU and GPU implementations of normal estimation.

Figure 4: Nearest neighbor calc. using FLANN

Figure 5: Robust normal calculation on the CPU

Figure 6: Robust normal calculation on the GPU

Discussion
In Figure 7 we can see that the speedup can be more than an order of magnitude when comparing the CPU and GPU implementations. However, for larger point clouds and larger neighborhoods our method requires more memory on the GPU.
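As a rough estimate, assuming single-precision storage (the poster does not state the precision or layout): the normalized relative vectors Y alone take N × K_neigh × 3 floats, i.e. about 2·10⁶ × 63 × 12 B ≈ 1.5 GB for the Table 1 dataset, before the hypotheses and scores are added; that is already a sizeable fraction of the Quadro M4000's 8 GB.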

Figure 7: Speedup for robust normal calculation

Figure 8: Comparing error distribution with respect to reference

In Figure 8 we plot the distribution of errors of our method versus the reference, which comes from a PCA-based normal estimation implementation. We did this for many different runs of our method and saw the same kind of error distribution each time.

Conclusion
In conclusion, using the GPU for calculating surface normals can be beneficial. Consider the profiling information for the two implementations (Table 1):

Kernel Name        | CPU (%) | GPU (%)
-------------------|---------|--------
FLANN NN search    | 54      | 78
Populate Y         | 13.2    | 11.7
Calculate normals  | 5.9     | 7.4
Dot products       | 46      | 2.5
Best normal search | 0.7     | 0.2

Table 1: Percentage of execution time per kernel on a dataset with 2M points (63 neighbors), for the CPU and GPU implementations

Notice that most of the time on the GPU is spent calculating the nearest neighbors. The normal calculation subroutines, on the other hand, are dominated by the data-intensive steps at the beginning of the calculation. On the CPU, the calculation of dot products loses heavily to the GPU, since the GPU computes all dot products in parallel whereas the CPU has only 12 threads for parallel execution.