Thinking in Gi*(d) calculation with Map-Reduce


Page 1

Thinking in Gi*(d) calculation with Map-Reduce

2010-3-29

Page 2

• Preprocessing
  – Generate Data Table
    • Divide the domain into cells and count the number of points in every cell;
    • Accumulate cells into quads;
    • Put all points into quads (I/O-intensive operation? need Map-Reduce?)
  – Generate Index Table: O(n^2.5)?
    • For every quad, increase its boundary step by step until it covers the whole domain.
      – In every step, find the quads that intersect the enlarged boundary (need spatial index?);
      – Store the deduplicated index items in the index table.
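The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a uniform square grid, quads identified by hypothetical (column, row) tuples, and a "step" that grows a quad's boundary by one quad on every side. The slides do not fix any of these choices, and the function names `build_data_table` and `build_index_table` are made up for this sketch.

```python
from collections import defaultdict


def build_data_table(points, cell_size, cells_per_quad):
    """Bin (x, y, z) points into cells, then group cells into square quads.

    Assumption: a quad is a block of cells_per_quad x cells_per_quad cells.
    """
    cell_counts = defaultdict(int)   # (cell_col, cell_row) -> number of points
    quads = defaultdict(list)        # (quad_col, quad_row) -> list of points
    for x, y, z in points:
        cx, cy = int(x // cell_size), int(y // cell_size)
        cell_counts[(cx, cy)] += 1
        quads[(cx // cells_per_quad, cy // cells_per_quad)].append((x, y, z))
    return cell_counts, quads


def build_index_table(quad_ids, max_step):
    """For each quad, list the quads first reached at each expansion step.

    Step d grows the boundary by d quads on every side. A quad is stored only
    at the first step that reaches it, which keeps the index deduplicated.
    """
    existing = set(quad_ids)
    index = {}
    for qx, qy in existing:
        rings = {}
        for d in range(1, max_step + 1):
            ring = [
                (nx, ny)
                for nx in range(qx - d, qx + d + 1)
                for ny in range(qy - d, qy + d + 1)
                if max(abs(nx - qx), abs(ny - qy)) == d and (nx, ny) in existing
            ]
            if ring:
                rings[d] = sorted(ring)
        index[(qx, qy)] = rings
    return index
```

The brute-force ring scan stands in for the spatial index the slide asks about; with a real spatial index the inner loops would become a range query.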

• Calculation of Gi*(d)
  – Algorithm of Gi*(d) in M-R(?)
    • Use the index table to count how many neighbor quads are needed;
    • Copy the current quad to the nodes where its neighbor quads reside;
    • Do the map task to calculate Gi*(d) on all neighboring nodes;
    • Do the reduce task to calculate Gi*(d).
  – C/C++ should be used in the Gi*(d) calculation
  – GPU may be helpful in the calculation.
  – Hotspot cells/quads should reside in memory / on most of the nodes
  – How to accelerate the calculation by tuning MR parameters / Gi*(d) algorithm parameters?
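To make the map and reduce steps concrete, here is a minimal single-process sketch. It assumes the original (unstandardized) Getis-Ord form, Gi*(d) = Σ_j w_ij(d) x_j / Σ_j x_j, where w_ij(d) = 1 when quad j lies within distance d of quad i (j = i included) and 0 otherwise. The slides do not say which form of the statistic is intended, and the per-quad value x_j, the quad centers, and the names `map_quad`, `reduce_quad`, and `gi_star` are all hypothetical.

```python
from collections import defaultdict
from math import hypot


def map_quad(center, value, all_centers, d):
    """Map step: emit this quad's value to every quad within distance d.

    The quad itself is included (distance 0); that is the "star" in Gi*.
    """
    for target_id, (tx, ty) in all_centers.items():
        if hypot(center[0] - tx, center[1] - ty) <= d:
            yield target_id, value


def reduce_quad(values, total):
    """Reduce step: sum the values received by one quad, divide by the global sum."""
    return sum(values) / total


def gi_star(quads, d):
    """quads maps quad_id -> ((x, y) center, value). Assumes a nonzero total."""
    centers = {qid: c for qid, (c, _) in quads.items()}
    total = sum(v for _, v in quads.values())
    grouped = defaultdict(list)  # plays the role of the shuffle phase
    for center, value in quads.values():
        for target_id, v in map_quad(center, value, centers, d):
            grouped[target_id].append(v)
    return {qid: reduce_quad(vals, total) for qid, vals in grouped.items()}
```

In a real job the scan over `all_centers` would presumably be replaced by a lookup in the index table, so that each quad is sent only to the nodes holding its neighbors.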

Page 3

Structure of Tables
• DATA_TABLE
  – Row: Quad_id
  – Family: data
    • Count: points in Quad
    • Body
      – point info: point1/point2/point3/……
      – Each point record: x/y/z (3 floating-point numbers, 12 bytes)
• INDEX_TABLE
  – Row: Quad_id
  – Family: border
    • XS
    • XE
    • YS
    • YE
  – Family: D
    • D1
    • D2
    • …
    • Dn
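As an illustration of the 12-byte point record, the sketch below packs each point as three 32-bit floats and assembles a DATA_TABLE row as a plain dictionary. The little-endian byte order, the `family:column` key spelling, and the helper names are assumptions made for this example; the slides only specify the row key, the family, and the record size.

```python
import struct

# x, y, z as three 32-bit floats: 3 * 4 = 12 bytes per point.
# Little-endian ("<") is an assumption; the slides do not specify byte order.
POINT = struct.Struct("<3f")


def encode_points(points):
    """Concatenate fixed-size point records into one Body value."""
    return b"".join(POINT.pack(x, y, z) for x, y, z in points)


def decode_points(body):
    """Split a Body value back into (x, y, z) tuples."""
    return [POINT.unpack_from(body, off) for off in range(0, len(body), POINT.size)]


def data_row(quad_id, points):
    """Build one DATA_TABLE row (hypothetical in-memory representation)."""
    return {
        "row": quad_id,
        "data:Count": len(points),
        "data:Body": encode_points(points),
    }
```

Because every record has the same size, the point count can also be recovered as `len(body) // 12`, which gives a cheap consistency check against the Count column.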

Page 4

Storage model
• Data distribution strategies
  – Evenly distributed in all nodes
  – Locality distributed
• Data Cache Strategies
  – ??
  – ??
• Application model
  – Batch processing of Gi*(d) (per cell/per quad)
  – Interactive processing of Gi*(d) (per point)
• Support for different storage strategies

Figure: two example layouts of IDs 0 … 999 over node0 … node9 (figure labels: "Evenly distributed", "Locality distributed")

node0: 0 1 … 99
node1: 100 101 … 199
node9: 900 901 … 999

node0: 0 10 … 990
node1: 1 11 … 991
node9: 9 19 … 999
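The two layouts in the figure correspond to two simple placement rules, sketched below. The transcript does not preserve which figure label belongs to which layout, so the code uses neutral, hypothetical names: `block_node` for the contiguous ranges (0 to 99 on node0) and `striped_node` for the round-robin pattern (0, 10, … on node0). Ten nodes and 1000 IDs are taken from the figure.

```python
def block_node(item_id, items_per_node=100):
    """Contiguous ranges: IDs 0..99 on node 0, 100..199 on node 1, and so on."""
    return item_id // items_per_node


def striped_node(item_id, num_nodes=10):
    """Round-robin striping: IDs 0, 10, 20, ... on node 0; 1, 11, ... on node 1."""
    return item_id % num_nodes


def nodes_touched(item_ids, placement):
    """Number of distinct nodes that hold the given IDs under a placement rule."""
    return len({placement(i) for i in item_ids})
```

For a run of ten consecutive IDs such as 100 to 109, the block rule touches one node while the striped rule touches all ten. If neighboring quads have nearby IDs, that is the trade-off between keeping a neighborhood on one node and spreading its load across the cluster.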