Large-Scale Spatial Query Processing on GPU-Accelerated Big Data...
Transcript of Large-Scale Spatial Query Processing on GPU-Accelerated Big Data...
![Page 1: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/1.jpg)
Large-Scale Spatial Query Processing
on GPU-Accelerated Big Data Systems
Jianting Zhang1,2 Simin You2
1 Depart of Computer Science, CUNY City College (CCNY)
2 Department of Computer Science, CUNY Graduate Center
http://www-cs.ccny.cuny.edu/~jzhang/
![Page 2: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/2.jpg)
Outline•Introduction
•Spatial data, GIS, BigData and HPC
•Taxi trip data in NYC and Global Biodiversity Applications
•Spatial query processing on GPUs
•ISP-GPU
•Architecture and Implementations
•Experiment Results
•Alternative Techniques
•SpatialSpark
•Lightweight Distributed Execution (LDE) Engine
•Summary an Future Work
![Page 3: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/3.jpg)
Computational
Geometry
Spatial Databases:
data modeling,
indexing, query
processing
Scientific Data/Information Visualization
Statistics/Machine learning
Geographical Information System
GIS
Environmental
Modeling
Air quality
Remote
Sensing
Hydrology
Ecology
Social-
Economic
Modeling Urban planning
Transportation
Census/Taxation
Social Studies
Computer
Graphics
Image Processing/Computer Vision
![Page 4: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/4.jpg)
Big Geospatial Data Challenges
– Event Locations, trajectories and O-D data
• E.g., Taxi trip records (GPS traces or O-D locations)
• 0.5 million in NYC (medallion taxi cab only) and 1.2 million in Beijing per day
• From O-D locations to trajectories to frequent patterns
– Satellite: e.g., from GOES to GOES-R (2015/2016) [$11B]
• http://www.goes-r.gov/downloads/GOES-R-Tri-10-06-09_v7.pdf
• Spectral (3X)*spatial (4X)* temporal (5X)=60X
• 2km*2km*5min*16bands(360*60)*(180*60)*(12*24)*16~ 1+ trillion pixels per day
• Derived thematic data products (vector)
– http://www.goes-r.gov/products/baseline.html
– http://www.goes-r.gov/products/option2.html
– Species distributions
• E.g. 400+ million occurrence records (GBIF)
• E.g. 717,057 polygons and 78,929,697 vertices for 4148 birds distribution data (NatureServe)
![Page 5: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/5.jpg)
Cloud computing+MapReduce+Hadoop
A
B
C
Thread Block
CPU Host (CMP)
Core
Local Cache
Shared Cache
DRAM
HDD SSD
GPU
SIMD
PCI-E
Ring Bus
Local Cache
Core ... Core
Core Core
GDRAM GDRAM
Core Core Core... ...
MIC
PCI-E
T0 T1
T2 T3
4-Threads
In-Order
16 Intel Sandy Bridge CPU cores+ 128GB RAM + 8TB disk + GTX TITAN + Xeon Phi 3120A ~ $9,994
![Page 6: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/6.jpg)
ASCI Red: 1997 First 1 Teraflops
(sustained) system with 9298 Intel Pentium
II Xeon processors (in 72 Cabinets)
•Feb. 2013
•7.1 billion transistors (551mm²)
•2,688 processors •4.5 TFLOPS SP and 1.3 TFLOPS DP
•Max bandwidth 288.4 GB/s
•PCI-E peripheral device
•250 W (17.98 GFLOPS/W -SP)
• Suggested retail price: $999
What can we do today using a device that is more
powerful than ASCI Red 16 years ago?
![Page 7: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/7.jpg)
Geospatial Technologies and
Environmental CyberInfrastructure
(GeoTECI) Lab
Dr. Jianting Zhang
Department of Computer Science
The City College of New York
Affiliated Institutions
Collaborating Institutions
Students: Simin You (Ph.D. 2009 -), Siyu Liao (Ph.D. 2014-), Costin Vicoveanu (Undergraduate, 2014-)
Bharat Rosanlall (Undergraduate, 2014), Jay Yao (MS-thesis, 2011-2012), Chandrashekar Singh (MS
2013), Agniva Banerjee (MS, 2012), Roger King (MS, 2012), Wahyu Nugroho (MS, 2011), Xiao Quan
Cen Feng (MS 2011), Chetram Dasrat (Undergraduate, 2008)
![Page 8: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/8.jpg)
GeoTECI@CCNY
•HIGHEST-DB
•HIgh-performance GrapHics units based Engine for Spatial-Temporal data
•Spatial and Spatiotemporal indexing, query processing and optimization
•Trajectory data management on GPUs
•Segmentation/simplification/compression/Aggregation/Warehousing
•Map matching with road networks
•Data mining (moving cluster, convoy, swarm...)
… when yellow cabs,
green cabs and MTA
buses meet with multi-
core CPUs, GPUs and
MICs in NYC …
$449,845/4yr (08/01/2013-07/31/2017)
![Page 9: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/9.jpg)
GeoTECI@CCNY
High-resolution Satellite ImageryT
In-situ Observation Sensor Data
T
Global and Regional Climate Model Outputs
Data AssimilationZonal Statistics
TEcological, environmental and administrative zones
T
T
V
Temporal Trends
High-End Computing Facility
A
B
C
Thread Block
ROIs
… when GOES-R satellites, extratropical
cyclones and hummingbirds meet with TITAN …
![Page 10: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/10.jpg)
GeoTECI@CCNYCCNY Computer Science LAN
Microway
Dual 8-core
128GB memory
Nvidia GTX Titan
Intel Xeon Phi 3120A
8 TB storage
DIY
*2
SGI Octane III
Dual Quadcore
48GB memory
Nvidia C2050*2
8 TB storage
Dual-core
8GB memory
Nvidia GTX Titan
3 TB storage
Dell T5400
Dual Quadcore
16GB memory
Nvidia Quadro 6000
1.5 TB storage
Lenovo T400s
Dell T7500
Dual 6-core
24 GB memory
Nvidia Quadro 6000
Dell T7500
Dual 6-core
24 GB memory
Nvidia GTX 480
Dual Quadcore
16GB memory
Nvidia FX3700*2
Dell T5400
DIY
Quadcore (Haswell)
16 GB memory
AMD/ATI 7970
Quadcore
8 GB memory
Nvidia Quadro 5000m
HP 8740w
HP 8740w
CUNY HPCC
KVM
“Brawny” GPU cluster
“Wimmy” GPU cluster
Web Server/
Linux App Server Windows
App Server
...building a highly-configurable experimental computing
environment for innovative BigData technologies…
![Page 11: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/11.jpg)
Taxi trip data in NYC
11
Taxi trip records
•~170 million trips (300 million
passengers) in 2009
•1/5 of that of subway riders and
1/3 of that of bus riders in NYC
Taxicabs
•13,000 Medallion taxi cabs
•License priced at > $1M
•Car services and taxi services
are separate
![Page 12: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/12.jpg)
Taxi trip data in NYC
Count-Distance Distribution
0
5000000
10000000
15000000
20000000
<=
0
.0
( 0
.8,
1.0
]
( 1
.8,
2.0
]
( 2
.8,
3.0
]
( 3
.8,
4.0
]
( 4
.8,
5.0
]
( 5
.8,
6.0
]
( 6
.8,
7.0
]
( 7
.8,
8.0
]
( 8
.8,
9.0
]
( 9
.8,
10
.0]
( 1
0.8
, 1
1.0
]
( 1
1.8
, 1
2.0
]
( 1
2.8
, 1
3.0
]
( 1
3.8
, 1
4.0
]
( 1
4.8
, 1
5.0
]
( 1
5.8
, 1
6.0
]
( 1
6.8
, 1
7.0
]
( 1
7.8
, 1
8.0
]
( 1
8.8
, 1
9.0
]
( 1
9.8
, 2
0.0
]
Trip Distance (mile)
Co
un
t
Count-Time Distribution
0
5000000
10000000
15000000
20000000
<=
0
.0
( 2
.0,
3.0
]
( 5
.0,
6.0
]
( 8
.0,
9.0
]
( 1
1.0
, 1
2.0
]
( 1
4.0
, 1
5.0
]
( 1
7.0
, 1
8.0
]
( 2
0.0
, 2
1.0
]
( 2
3.0
, 2
4.0
]
( 2
6.0
, 2
7.0
]
( 2
9.0
, 3
0.0
]
( 3
2.0
, 3
3.0
]
( 3
5.0
, 3
6.0
]
( 3
8.0
, 3
9.0
]
( 4
1.0
, 4
2.0
]
( 4
4.0
, 4
5.0
]
( 4
7.0
, 4
8.0
]
> 5
0.0
TripTime (Minute)
Co
un
t
Count-Speed Distribution
0
5000000
10000000
15000000
20000000
<=
0
.0
( 1
.0, 2
.0]
( 3
.0, 4
.0]
( 5
.0, 6
.0]
( 7
.0, 8
.0]
( 9
.0, 1
0.0
]
( 1
1.0
, 1
2.0
]
( 1
3.0
, 1
4.0
]
( 1
5.0
, 1
6.0
]
( 1
7.0
, 1
8.0
]
( 1
9.0
, 2
0.0
]
( 2
1.0
, 2
2.0
]
( 2
3.0
, 2
4.0
]
( 2
5.0
, 2
6.0
]
( 2
7.0
, 2
8.0
]
( 2
9.0
, 3
0.0
]
( 3
1.0
, 3
2.0
]
( 3
3.0
, 3
4.0
]
( 3
5.0
, 3
6.0
]
( 3
7.0
, 3
8.0
]
( 3
9.0
, 4
0.0
]
( 4
1.0
, 4
2.0
]
( 4
3.0
, 4
4.0
]
( 4
5.0
, 4
6.0
]
( 4
7.0
, 4
8.0
]
( 4
9.0
, 5
0.0
]
Speed (MPH)
Co
un
t
Count-Fare Distribution
0
5000000
1000000015000000
20000000
25000000
30000000
<=
0.0
( 1
.0,
2.0
]
( 3
.0,
4.0
]
( 5
.0,
6.0
]
( 7
.0,
8.0
]
( 9
.0,
10.0
]
( 11.0
, 12.0
]
( 13.0
, 14.0
]
( 15.0
, 16.0
]
( 17.0
, 18.0
]
( 19.0
, 20.0
]
( 21.0
, 22.0
]
( 23.0
, 24.0
]
( 25.0
, 26.0
]
( 27.0
, 28.0
]
( 29.0
, 30.0
]
( 31.0
, 32.0
]
( 33.0
, 34.0
]
( 35.0
, 36.0
]
( 37.0
, 38.0
]
( 39.0
, 40.0
]
( 41.0
, 42.0
]
( 43.0
, 44.0
]
( 45.0
, 46.0
]
( 47.0
, 48.0
]
( 49.0
, 50.0
]
Fare ($)
Co
un
t
Over all distributions of trip distance, time, speed and fare (2009)
![Page 13: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/13.jpg)
Taxi trip data in NYC
• How to manage taxi trip data?
– Geographical Information System (GIS)
– Spatial Databases (SDB)
– Moving Object Databases (MOD)
• How good are they?
– Pretty good for small amount of data
– But, rather poor for large-scale data
![Page 14: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/14.jpg)
Taxi trip data in NYC
• Example 1: – Loading 170 million taxi pickup locations into PostgreSQL
– UPDATE t SET PUGeo = ST_SetSRID(ST_Point("PULong","PuLat"),4326);
– 105.8 hours!
• Example 2: – Finding the nearest tax blocks for 170 million taxi pickup locations
using open source libspatiaindex+GDAL
– 30.5 hours!
I do not have time to wait...
Can we do better?
Intel Xeon 2.26 GHz processors with 48G memory
![Page 15: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/15.jpg)
Global
Biodiversity
Data at GBIF
15
SELECT aoi_id, sp_id, sum (ST_area (inter_geom))
FROM
(
SELECT aoi_id, sp_id,
ST_Intersection (sp_geom, qw_geom)
AS inter_geom
FROM SP_TB, QW_TB
WHERE ST_Intersects (sp_geometry, qw_geom)
)
GROUP BY aoi_id, sp_id
HAVING sum(ST_area(inter_geom)) >T;
http://gbif.org
![Page 16: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/16.jpg)
Spatial Data Processing on GPUs
http://www-cs.ccny.cuny.edu/~jzhang/papers/gpu_spatial_tr.pdf
![Page 17: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/17.jpg)
Spatial query processing on GPUs
Single-Level Grid-File based Spatial Filtering
Vertices
(polygon/
polyline)
Points
•Perfect coalesced
memory accesses
•Utilizing GPU floating
point computing power
Nested-Loop based Refinement
J. Zhang, S. You and L. Gruenwald, "Parallel Online Spatial and Temporal Aggregations on Multi-core CPUs and Many-Core GPUs," Information Systems, vol. 44, p. 134–154, 2014.
![Page 18: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/18.jpg)
Spatial query processing on GPUs
Top: grid size =256*256resolution=128 feet Right: grid size =8192*8192resolution=4 feet
Spatial Aggregation
9,424 /326=30X (8192*8192)
Temporal Aggregation
1709/198=8.6X (minute)
1598 /165 = 9.7X (hour)
![Page 19: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/19.jpg)
Spatial query processing on GPUs
P2P-TP2N-D P2P-D
147,011
street
segments
38,794
census
blocks
(470,941
points)
735,488 tax
blocks
(4,698,986
points)
P2N-D P2P-T P2P-D
- 15.2 h 30.5 h
10.9 s 11.2 s 33.1 s
- 4,900X 3,200X
CPU time
GPU Time
Speedup
Algorithmic
improvement: 3.7X
Using main-memory
data structures: 37.4X
GPU Acceleration:
24.3X
![Page 20: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/20.jpg)
Outline•Introduction
•Spatial data, GIS, BigData and HPC
•Taxi trip data in NYC and Global Biodiversity Applications
•Spatial query processing on GPUs
•ISP-GPU
•Architecture and Implementations
•Experiment Results
•Alternative Techniques
•SpatialSpark
•Lightweight Distributed Execution (LDE) Engine
•Summary an Future Work
![Page 21: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/21.jpg)
ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
![Page 22: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/22.jpg)
ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
http://www.slideshare.net/hadooparchbook/impala-architecture-presentation
• SQL Frontend: translate SQL
queries into execution plans
• C/C++ backend with SSE4
support (for strings operations)
• Efficient implementations of
hash-joins (partitioned and non-
partitioned)
• LLVM-based JIT
• ….
Attractive Features
Extension is challenging!
![Page 23: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/23.jpg)
ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
class SpatialJoinNode : public BlockingJoinNode {
public:
SpatialJoinNode(ObjectPool* pool, const TPlanNode& tnode, const DescriptorTbl& descs);
virtual Status Prepare(RuntimeState* state);
virtual Status GetNext(RuntimeState* state, RowBatch* row_batch, bool* eos);
virtual void Close(RuntimeState* state);
protected:
virtual Status InitGetNext(TupleRow* first_left_row);
virtual Status ConstructBuildSide(RuntimeState* state);
private:
boost::scoped_ptr<TPlanNode> thrift_plan_node_;
RuntimeState* runtime_state_;
…
}
create_rtree(…)
pip_join(…)
nearest_join(…)
![Page 24: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/24.jpg)
ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
http://www-cs.ccny.cuny.edu/~jzhang/papers/isp_gpu_tr.pdf
Scalable and Efficient Spatial Data Management on Multi-Core CPU and GPU Clusters.
IEEE HardBD’15 Workshop
![Page 25: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/25.jpg)
ISP-GPU: Scaling out Geospatial Data Processing to GPU Clusters
ISP-GPU ISP-MC+ GPU-Standalone MC-Standalone
taxi-nycb (s) 96 130 50 89
GBIF-WWF(s) 1822 2816 1498 2664
Taxi-nycb: ~170 million points, ~40 thousand polygons (9 vertices/polygon)
GBF-WWF: ~375 million points, ~15 thousand polygons (279 vertices/polygon)
Single-node results: 16core CPU/128GB, GTX Titan
Cluster results: 2-10 nodes each with 8 vCPU cores/15GB, 1536 CUDA cores/4 GB
(50 million species locations used due to memory constraint)
![Page 26: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/26.jpg)
Outline•Introduction
•Spatial data, GIS, BigData and HPC
•Taxi trip data in NYC and Global Biodiversity Applications
•Spatial query processing on GPUs
•ISP-GPU
•Architecture and Implementations
•Experiment Results
•Alternative Techniques
•SpatialSpark
•Lightweight Distributed Execution (LDE) Engine
•Summary an Future Work
![Page 27: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/27.jpg)
Alternative
Techniques
val sc = new SparkContext(conf)
//reading left side data from HDFS and perform pre-processing
val leftData = sc.textFile(leftFile, numPartitions).map(x => x.split(SEPARATOR)).zipWithIndex()
val leftGeometryById = leftData.map(x => (x._2, Try(new WKTReader().read(x._1.apply(leftGeometryIndex)))))
.filter(_._2.isSuccess).map(x => (x._1, x._2.get))
//similarly for right-side data….
//ready for spatial query (broadcast-based)
val joinPredicate =SpatialOperator.Within // NearestD can be applied similarly
var matchedPairs:RDD[(Long, Long)] = BroadcastSpatialJoin(sc, leftGeometryById, rightGeometryById, joinPredicate)
SpatialSpark: Just Open-Sourced
http://simin.me/projects/spatialspark/
http://www-cs.ccny.cuny.edu/~jzhang/papers/spatial_cc_tr.pdf
Large-Scale Spatial Join Query Processing in Cloud (Comparison with ISP-MC)
IEEE CloudDM’15 Workshop
![Page 28: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/28.jpg)
Alternative
Techniques
Lightweight Distributed Execution Engine for Large-Scale Spatial Join Query Processinghttp://www-cs.engr.ccny.cuny.edu/~jzhang/papers/lde_spatial_tr.pdf
![Page 29: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/29.jpg)
Spatial Data Processing and IoT
http://www.sensor-networks.org/
• Cell-phone based sensing and querying 3D world (personal navigation)
• Crowd-sourcing 3D urban infrastructure/traffic monitoring using
RGB-D videos
• Emergency response
and disaster relief
• Building
Information
System and
energy control
![Page 30: Large-Scale Spatial Query Processing on GPU-Accelerated Big Data …geoteci.engr.ccny.cuny.edu/temp/GTC_Talk_03172015.pdf · 2018. 10. 30. · Big Geospatial Data Challenges –Event](https://reader034.fdocuments.in/reader034/viewer/2022052614/6058be859a6dd56905462b04/html5/thumbnails/30.jpg)
Summary and Future Work
• Designs and implementations of an in-memory
spatial data management system on multi-core
CPU and many-core GPU clusters by
extending Cloudera Impala for distributed
spatial join query processing
• Experiments on the initial implementations
have revealed both advantages and
disadvantages of extending a tightly-coupled
big data system to support spatial data types
and their operations.
• Alternative techniques are being developed to
further improve efficiency, scalability,
extensibility and portability.