FAST MAP PROJECTION ON CUDA.ppt


Transcript of FAST MAP PROJECTION ON CUDA.ppt

Page 1: FAST MAP PROJECTION ON CUDA.ppt

FAST MAP PROJECTION ON CUDA

Yanwei Zhao

Institute of Computing Technology

Chinese Academy of Sciences

July 29, 2011

Page 2: FAST MAP PROJECTION ON CUDA.ppt

Outline


Page 3: FAST MAP PROJECTION ON CUDA.ppt

Outline


Page 4: FAST MAP PROJECTION ON CUDA.ppt

Map Projection

Establishes the relationship between two different coordinate systems: geographical coordinates → planar Cartesian map coordinate system.

Involves complicated and time-consuming arithmetic operations; a fast answer with the desired accuracy is preferable to a slow exact answer.

It needs to be accelerated for interactive GIS scenarios.


Page 5: FAST MAP PROJECTION ON CUDA.ppt

GPGPU (general-purpose computing on graphics processing units)

GPGPU is a young area of research.

Advantages of the GPU: flexibility, powerful processing, low cost.

GPGPU uses the GPU in applications other than 3D graphics; the GPU accelerates the critical path of the application.


Page 6: FAST MAP PROJECTION ON CUDA.ppt

CUDA (Compute Unified Device Architecture)

NVIDIA's parallel computing architecture: a C-based programming language and development toolkit.

Advantage: the programmer can focus on the important issues of parallel programming rather than on an unfamiliar language, and can write efficient parallel code without going through graphics APIs.


Page 7: FAST MAP PROJECTION ON CUDA.ppt

The characteristics of map projection

Huge amount of coordinates to handle

Complex arithmetic operations

The requirement of a real-time response


Page 8: FAST MAP PROJECTION ON CUDA.ppt

Our proposals

Use the new CUDA technology on the GPU, taking the Universal Transverse Mercator (UTM) projection as an example.

Performance: improvement of 6x to 8x (including transfer time); speedup of 70x to 90x (excluding transfer time).
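For concreteness, here is a minimal sketch of what a per-coordinate projection kernel could look like. It uses the simpler spherical transverse Mercator formulas rather than the full ellipsoidal UTM series the slides refer to; the kernel name, argument layout and false easting are illustrative assumptions, not the authors' code.

```cuda
#include <math.h>

#define K0 0.9996          /* UTM scale factor on the central meridian      */
#define R  6371007.0       /* mean Earth radius in metres (spherical model) */

/* Forward projection of one (lon, lat) pair, in radians, about the central
   meridian lon0.  Spherical transverse Mercator; real UTM adds ellipsoidal
   series terms.                                                            */
__device__ void tm_forward(double lon, double lat, double lon0,
                           double *x, double *y)
{
    double b = cos(lat) * sin(lon - lon0);
    *x = K0 * R * atanh(b) + 500000.0;               /* false easting       */
    *y = K0 * R * atan2(tan(lat), cos(lon - lon0));
}

/* One coordinate per thread, with a grid-stride loop so any launch
   configuration (e.g. <<<64, 512>>>) covers all n coordinates.             */
__global__ void project_kernel(const double *lon, const double *lat,
                               double *x, double *y, int n, double lon0)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        tm_forward(lon[i], lat[i], lon0, &x[i], &y[i]);
}
```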

Page 9: FAST MAP PROJECTION ON CUDA.ppt

Outline


Page 10: FAST MAP PROJECTION ON CUDA.ppt

Algorithm framework

CPU:
1. Open the shapefile
2. Read the coordinates of all features
3. Copy the data from CPU to GPU global memory

GPU:
4. Execute the kernel function on a grid of blocks (Block 0 … Block m), each running threads (Thread 0 … Thread n), with the work assigned by striped partitioning or matrix distribution

CPU:
5. Copy the result from GPU back to CPU
6. Free up the device memory
7. Save or display the result
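A host-side sketch of steps 3 to 6, assuming the shapefile has already been opened and its coordinates read into the host arrays h_lon/h_lat (steps 1 and 2), and reusing the hypothetical project_kernel sketched earlier; error checking is omitted.

```cuda
#include <cuda_runtime.h>

/* Kernel sketched earlier (one coordinate per thread, grid-stride loop). */
__global__ void project_kernel(const double *lon, const double *lat,
                               double *x, double *y, int n, double lon0);

/* Steps 3-6 of the framework for n coordinates. */
void project_on_gpu(const double *h_lon, const double *h_lat,
                    double *h_x, double *h_y, int n, double lon0)
{
    size_t bytes = (size_t)n * sizeof(double);
    double *d_lon, *d_lat, *d_x, *d_y;

    /* 3. Copy the data from CPU to GPU global memory */
    cudaMalloc(&d_lon, bytes);  cudaMalloc(&d_lat, bytes);
    cudaMalloc(&d_x, bytes);    cudaMalloc(&d_y, bytes);
    cudaMemcpy(d_lon, h_lon, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_lat, h_lat, bytes, cudaMemcpyHostToDevice);

    /* 4. Execute the kernel (Block_num = 64, Thread_num = 512 as in the
       experiments reported later) */
    project_kernel<<<64, 512>>>(d_lon, d_lat, d_x, d_y, n, lon0);

    /* 5. Copy the result from GPU back to CPU */
    cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

    /* 6. Free up the device memory */
    cudaFree(d_lon); cudaFree(d_lat); cudaFree(d_x); cudaFree(d_y);
}
```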


Page 11: FAST MAP PROJECTION ON CUDA.ppt

Striped partitioning

Define the number of blocks and threads: Block_num, Thread_num

CUDA built-in parameters: gridDim, blockDim

Number of geographic features: fn

Each block handles fn/gridDim.x features


[Figure: the relationship between blocks and features (Block 0 → feature 0, Block 1 → feature 1, …, Block m → feature m, then feature m+1, feature m+2, …, feature 2m, …) and the relationship between threads and coordinates (thread 0 → coord 0, thread 1 → coord 1, …, thread n → coord n)]

Page 12: FAST MAP PROJECTION ON CUDA.ppt

Striped partitioning

Outer loop (blocks and features):
Block → Feature[i], i = blockIdx.x * (fn / gridDim.x)   (1)
Block → next Feature[k], k = i + fn / gridDim.x   (2)

Inner loop (threads and coordinates):
thread → coord[j], j = threadIdx.x
thread → next coord[k], k = j + Thread_num


Page 13: FAST MAP PROJECTION ON CUDA.ppt

Striped partitioning

Outer loop (blocks and features):
Block → Feature[i], i = blockIdx.x * (fn / gridDim.x)
Block → next Feature[k], k = i + fn / gridDim.x

Inner loop (threads and coordinates):
thread → coord[j], j = threadIdx.x   (1)
thread → next coord[k], k = j + Thread_num   (2)
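A kernel-level sketch of the striped assignment as I read these formulas: each block takes a contiguous run of fn/gridDim.x features (outer loop), and within a feature each thread starts at coordinate threadIdx.x and strides by the thread count (inner loop). The Feature struct and the reuse of tm_forward are assumptions for illustration, not the authors' data layout.

```cuda
/* Hypothetical flattened per-feature layout on the device. */
struct Feature {
    double *lon, *lat;   /* input geographic coordinates */
    double *x,   *y;     /* projected output             */
    int     size;        /* number of coordinates        */
};

__device__ void tm_forward(double lon, double lat, double lon0,
                           double *x, double *y);   /* as sketched earlier */

__global__ void project_striped(const Feature *features, int fn, double lon0)
{
    /* Outer loop: this block's contiguous run of fn/gridDim.x features. */
    int per_block = fn / gridDim.x;               /* fn assumed divisible */
    int first     = blockIdx.x * per_block;
    for (int f = first; f < first + per_block && f < fn; ++f) {
        Feature fe = features[f];
        /* Inner loop: coord[j] with j = threadIdx.x, next j + Thread_num. */
        for (int j = threadIdx.x; j < fe.size; j += blockDim.x)
            tm_forward(fe.lon[j], fe.lat[j], lon0, &fe.x[j], &fe.y[j]);
    }
}
```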



Page 14: FAST MAP PROJECTION ON CUDA.ppt

Matrix distribution

Define the number of blocks and threads: grid(br, bc), block(tr, tc)

Each block handles k features, where:
k = ceil(fn / (gridDim.x * gridDim.y))   (1)

Feature[i], where:
i starts at (blockIdx.y * gridDim.x + blockIdx.x) * k   (2)
and runs up to (blockIdx.y * gridDim.x + blockIdx.x) * k + k   (3)

Page 15: FAST MAP PROJECTION ON CUDA.ppt

Matrix distribution

Each thread handles s coordinates of Feature[i], where:
s = ceil(Feature[i].size / (blockDim.x * blockDim.y))   (1)

coord[j], where:
j starts at (threadIdx.y * blockDim.x + threadIdx.x) * s
and runs up to (threadIdx.y * blockDim.x + threadIdx.x) * s + s
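The earlier sketch adapted to the matrix distribution as reconstructed above: blocks are linearised as blockIdx.y*gridDim.x + blockIdx.x and take k consecutive features each; threads are linearised as threadIdx.y*blockDim.x + threadIdx.x and take s consecutive coordinates each. The Feature struct and tm_forward are the hypothetical helpers from the striped sketch.

```cuda
__global__ void project_matrix(const Feature *features, int fn, double lon0)
{
    /* k features per block, equation (1) above. */
    int blocks  = gridDim.x * gridDim.y;
    int k       = (fn + blocks - 1) / blocks;                 /* ceil */
    int first_f = (blockIdx.y * gridDim.x + blockIdx.x) * k;  /* (2)  */

    for (int f = first_f; f < first_f + k && f < fn; ++f) {
        Feature fe = features[f];
        /* s consecutive coordinates per thread. */
        int threads = blockDim.x * blockDim.y;
        int s       = (fe.size + threads - 1) / threads;
        int first_c = (threadIdx.y * blockDim.x + threadIdx.x) * s;
        for (int j = first_c; j < first_c + s && j < fe.size; ++j)
            tm_forward(fe.lon[j], fe.lat[j], lon0, &fe.x[j], &fe.y[j]);
    }
}
```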


Page 16: FAST MAP PROJECTION ON CUDA.ppt

Outline


Page 17: FAST MAP PROJECTION ON CUDA.ppt

Experiment Environment

Hardware:
CPU: Intel Core 2 Duo E8500 at 3.18 GHz with 2 GB of main memory
GPU: NVIDIA GeForce 9800 GTX+ graphics card with 512 MB of memory, 128 CUDA cores and 16 multiprocessors

Software:
Microsoft Windows XP Pro SP2, Microsoft Visual Studio 2005, NVIDIA driver 2.2, CUDA SDK 2.2 and CUDA Toolkit 2.2


Page 18: FAST MAP PROJECTION ON CUDA.ppt

The data parallel degree

Total CPU time = initialization and file-reading time + serial projection time


Page 19: FAST MAP PROJECTION ON CUDA.ppt

The data parallel degree

Total CPU time = initialization and file-reading time + serial projection time

Map projection can achieve more than 90 percent parallelism.
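As a back-of-the-envelope check (my arithmetic, not a figure from the slides): with a parallel fraction of p ≈ 0.9 and the projection kernel itself accelerated by s ≈ 80x, Amdahl's law bounds the whole-program speedup at roughly

$$S = \frac{1}{(1-p) + p/s} = \frac{1}{0.1 + 0.9/80} \approx 9,$$

which is consistent with the 6x to 8x total speedups reported later once transfer time is also counted.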


Page 20: FAST MAP PROJECTION ON CUDA.ppt

Comparison with the CPU

Block_num=64 Thread_num=512


Page 21: FAST MAP PROJECTION ON CUDA.ppt

Comparison with the CPU

Total time = map projection time + data transfer time


Page 22: FAST MAP PROJECTION ON CUDA.ppt

Comparison with the CPU

Considering the total time, the performance gain is 6x to 8x.


Page 23: FAST MAP PROJECTION ON CUDA.ppt

Comparison with the CPU

Comparing only the map projection time, we obtain 70x to 90x speedups.


Page 24: FAST MAP PROJECTION ON CUDA.ppt

The performance of different task assignments

Striped partitioning: Block_num = 64, Thread_num = 512

Matrix distribution: dim_grid(32, 32) = 32*32 blocks, dim_block(256, 256) = 256*256 threads


Page 25: FAST MAP PROJECTION ON CUDA.ppt

The performance of different task assignments

Striped partitioning: Block_num = 64, Thread_num = 512

Matrix distribution: dim_grid(32, 32) = 32*32 blocks, dim_block(256, 256) = 256*256 threads

Striped: 6x to 8x

Matrix: 4x to 6x


Page 26: FAST MAP PROJECTION ON CUDA.ppt

The performance of different task assignments

[Figure: global memory access patterns of the two schemes. Striped: blocks Block 0 … Block m-1, each with threads thread 0 … thread n-1, reading consecutive global-memory addresses (0, 1, …, n-1, n, n+1, …, 2n, …, mn, …, mn+n) with stride BlockDim.x*GridDim.x. Matrix: Grid 0 of blocks B(0,0) … B(m,m), each a 2D block of threads t(0,0) … t(n,n), mapped to global memory.]


Page 27: FAST MAP PROJECTION ON CUDA.ppt

The performance of different task assignments


Striped partitioning lets all threads in a block access consecutive memory; matrix distribution can only ensure that each row of threads in a block handles consecutive data (see the figure on the previous slide).
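A toy sketch of the two access patterns (illustration only, not the authors' code): in the striped scheme neighbouring threads read neighbouring elements in every step, which the hardware can coalesce into few memory transactions; in the matrix scheme each thread owns its own chunk of s elements, so neighbouring threads read addresses s apart.

```cuda
/* Striped: in each iteration thread t reads element base + t, so a warp
   touches a contiguous range -> coalesced global memory access.          */
__global__ void copy_striped(const double *in, double *out, int n)
{
    for (int j = threadIdx.x; j < n; j += blockDim.x)
        out[j] = in[j];
}

/* Matrix: each thread owns s consecutive elements, so within one
   iteration neighbouring threads read addresses s apart -> accesses are
   scattered and mostly uncoalesced.                                      */
__global__ void copy_matrix(const double *in, double *out, int n)
{
    int threads = blockDim.x * blockDim.y;
    int s       = (n + threads - 1) / threads;
    int first   = (threadIdx.y * blockDim.x + threadIdx.x) * s;
    for (int j = first; j < first + s && j < n; ++j)
        out[j] = in[j];
}
```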


Page 28: FAST MAP PROJECTION ON CUDA.ppt

Outline


Page 29: FAST MAP PROJECTION ON CUDA.ppt

Conclusion and future work

Implemented a fast map projection method on CUDA-enabled GPUs, with a high speedup compared to the CPU-based method.

The power of the modern GPU can considerably speed up other tasks in the field of geoscience, such as DEM-based spatial interpolation and raster-based spatial analysis.

Future work: GPU implementations of other GIS applications.


Page 30: FAST MAP PROJECTION ON CUDA.ppt

Thank you! Q & A

Yanwei Zhao

Institute of Computing Technology

Contact: [email protected]
