
U2SOD-DB: A Database System to Manage Large-Scale Ubiquitous Urban Sensing Origin-Destination Data

Jianting Zhang (1,3,4), Hongmian Gong (2,3,4), Camille Kamga (2,4), Le Gruenwald (5)

(1) CUNY City College (CCNY), (2) CUNY Hunter College, (3) CUNY Graduate Center, (4) University Transportation Research Center, Region II, (5) University of Oklahoma

Outline
• Introduction & Background
• System Architecture and Implementation
  – Time-Segmented Column-Oriented Data Layout
  – Efficient Spatial-Temporal Aggregations
  – Spatial Join with Infrastructural Data
• Case Studies and Performance Evaluations
• Conclusion and Future Work

Introduction


Ubiquitous Urban Sensing Origin-Destination Data (U2SOD)

Examples: taxi trips, cellular phone calls, social network activities

Introduction
• What do they have in common?
  – They are produced and collected by end users with commodity sensing devices, and they are abundant in urban areas
  – They are a special type of spatial-temporal data: the intermediate locations between origins and destinations are unavailable, inaccessible, or unimportant
  – They can help us understand the real dynamics of urban areas more effectively, with better spatial/temporal resolution and representativeness

Introduction

• How can U2SOD data be managed?
  – Geographical Information Systems (GIS)
  – Spatial Databases (SDB)
  – Moving Object Databases (MOD)

• How good are they?
  – Pretty good for small amounts of data
  – But rather poor for large-scale data

Introduction
• Example 1:
  – Loading 170 million taxi pickup locations into PostgreSQL:
    UPDATE t SET PUGeo = ST_SetSRID(ST_Point("PULong","PuLat"),4326);
  – 105.8 hours!

• Example 2:
  – Finding the nearest tax blocks for 170 million taxi pickup locations using open-source libspatialindex + GDAL
  – 30.5 hours!

I do not have time to wait...

Can we do better?

Introduction

• Cloud computing + MapReduce + Hadoop
• Multicore CPUs
• GPGPU computing: from Fermi to Kepler

• The combination of architectural and organizational enhancements led to 16 years of sustained performance growth, at an annual rate of 50%, from 1986 to 2002.

• However, due to the combined power, memory, and instruction-level parallelism problems, the growth rate dropped to about 20% per year from 2002 to 2006.

• In contrast, GPU performance continues to grow at 50% per year.

Figure: GPU price points (Quadro 6000: $4,000; other pictured devices: $500 and $2,500)

• Nvidia GTX 690: 3,072 cores (915 MHz), 4 GB GDDR5 memory, 384 GB/s bandwidth; under $1,000

Introduction

• So, the goal is to design a data management system that efficiently manages large-scale U2SOD data on massively data-parallel GPUs

• And to cut runtimes from hours to seconds on a single commodity GPU device

• With the help of new data models, data structures, and algorithms

System Design and Implementation

Figure: U2SOD-DB architecture. Components: raw data (organized by year, month, and day); compression, aggregation, and indexing; physical data layout; spatial joins and shortest path computation.
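One way to read the Year/Month/Day layering in the architecture figure is as time segmentation: records are bucketed by day so that a query over a date range reads only the matching segments. The sketch below assumes in-memory Python lists and a simplified (pickup_time, trip_id) record; `segment_by_day` and `trips_between` are hypothetical names, not U2SOD-DB's actual API.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical sketch: records are simplified to (pickup_time, trip_id) pairs.


def segment_by_day(trips):
    """Group trips into per-day segments keyed by (year, month, day).

    Tuple keys sort chronologically, and their prefixes give the
    month and year levels of the hierarchy.
    """
    segments = defaultdict(list)
    for pickup_time, trip_id in trips:
        key = (pickup_time.year, pickup_time.month, pickup_time.day)
        segments[key].append((pickup_time, trip_id))
    return dict(segments)


def trips_between(segments, first_day, last_day):
    """Return trips from first_day..last_day (inclusive), in day order.

    Only segments whose key falls in the range are read; the records
    of every other segment are skipped without being scanned.
    """
    lo = (first_day.year, first_day.month, first_day.day)
    hi = (last_day.year, last_day.month, last_day.day)
    result = []
    for key in sorted(segments):
        if lo <= key <= hi:
            result.extend(segments[key])
    return result


trips = [
    (datetime(2009, 1, 1, 8, 30), 101),
    (datetime(2009, 1, 1, 23, 59), 102),
    (datetime(2009, 1, 2, 0, 5), 103),
    (datetime(2009, 2, 1, 12, 0), 104),
]
segments = segment_by_day(trips)
print(sorted(segments))  # [(2009, 1, 1), (2009, 1, 2), (2009, 2, 1)]
january = trips_between(segments, date(2009, 1, 1), date(2009, 1, 31))
print([trip_id for _, trip_id in january])  # [101, 102, 103]
```

Because day keys are ordered tuples, rolling a segment up to its month or year is just a key-prefix comparison.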

System Design and Implementation

Figure: NYC taxi trip record fields
• Medallion#, Shift#, Trip#
• Trip_Pickup_DateTime, Trip_Dropoff_DateTime
• Trip_Pickup_Location, Trip_Dropoff_Location
• Start_Lon, Start_Lat, End_Lon, End_Lat
• Payment_Type, Surcharge, Total_Amt, Rate_Code
• Passenger_Count, Fare_Amt, Tolls_Amt, Tip_Amt, Trip_Time
• Trip_Distance
• vendor_name, date_loaded, store_and_forward
• time_between_service, distance_between_service
• Start_Zip_Code, End_Zip_Code
• start_x, start_y, end_x, end_y (local projection)
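A column-oriented layout stores each field of the schema above as its own contiguous array, so a query that touches one attribute (say, Tip_Amt) scans only that array rather than whole rows. The sketch below is a minimal illustration using Python's `array` module; `to_columns` is a hypothetical helper, and the field values are toy inputs, not taxi data.

```python
from array import array


def to_columns(rows, schema):
    """Transpose row records (dicts) into one typed array per field.

    schema maps a field name to an array typecode ('d' = float64, 'i' = int).
    """
    columns = {name: array(code) for name, code in schema.items()}
    for row in rows:
        for name, col in columns.items():
            col.append(row[name])
    return columns


# Field names follow the trip schema; the values are illustrative only.
schema = {"Passenger_Count": "i", "Trip_Distance": "d", "Fare_Amt": "d", "Tip_Amt": "d"}
rows = [
    {"Passenger_Count": 1, "Trip_Distance": 2.5, "Fare_Amt": 9.3, "Tip_Amt": 1.0},
    {"Passenger_Count": 3, "Trip_Distance": 0.8, "Fare_Amt": 4.5, "Tip_Amt": 0.0},
    {"Passenger_Count": 2, "Trip_Distance": 5.0, "Fare_Amt": 15.7, "Tip_Amt": 3.0},
]
cols = to_columns(rows, schema)

# A single-attribute query reads one contiguous column.
total_tips = sum(cols["Tip_Amt"])
print(total_tips)  # 4.0
print(cols["Passenger_Count"].tolist())  # [1, 3, 2]
```

Contiguous, homogeneously typed columns are also what GPU kernels prefer, since neighboring threads then read neighboring memory.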

System Design and Implementation

Figure: NYC taxi trip records and their temporal, spatial, and auxiliary dimensions
• Temporal (from pickup/drop-off timestamps): 15/30-minutes, hour, peak/off-peak, day, day of the week, day of the year, week of the year, month, year
• Spatial (from pickup/drop-off locations): level 0 grid through level k grid up to the top-level grid; tax lot, tax block, census block, census tract, street segment, police precinct, community district, borough, city
• Auxiliary data (weather, events…)
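The temporal attributes and grid levels in the figure above can be derived from each record with cheap arithmetic. The sketch below assumes ISO-8601 weeks and a hypothetical grid pyramid in which the cell size doubles at each level (level 0 finest); `temporal_keys` and `grid_cell` are illustrative names. Peak/off-peak and the polygon units (tax lot through city) are not covered, since they need external definitions or spatial joins.

```python
from datetime import datetime


def temporal_keys(ts, bin_minutes=15):
    """Derive calendar attributes of one pickup/drop-off timestamp."""
    _, iso_week, iso_weekday = ts.isocalendar()
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        # Index of the 15- or 30-minute slot within the day.
        "bin": (ts.hour * 60 + ts.minute) // bin_minutes,
        "day_of_year": ts.timetuple().tm_yday,
        "week_of_year": iso_week,  # ISO-8601 week numbering (assumption)
        "day_of_week": iso_weekday,  # 1 = Monday ... 7 = Sunday
    }


def grid_cell(x, y, x0, y0, base_size, level):
    """(col, row) of projected point (x, y) at a given grid level.

    Hypothetical pyramid: cell size = base_size * 2**level, origin (x0, y0).
    """
    size = base_size * (2 ** level)
    return int((x - x0) // size), int((y - y0) // size)


keys = temporal_keys(datetime(2009, 3, 15, 17, 40))
print(keys["bin"], keys["day_of_year"])  # 70 74
print(grid_cell(1000.0, 300.0, 0.0, 0.0, 128.0, 0))  # (7, 2)
print(grid_cell(1000.0, 300.0, 0.0, 0.0, 128.0, 2))  # (1, 0)
```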

System Design and Implementation

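Three types of spatial joins:
• P2P-T: point-to-polygon test
• P2N-D: point-to-network distance
• P2P-D: point-to-polygon distance

All three types of spatial joins are now supported by U2SOD-DB entirely on GPUs, with significant speedups.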
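The three join types named on this slide (P2P-T, P2N-D, P2P-D) reduce to two geometric primitives: a point-in-polygon test and a point-to-segment distance. The sketch below assumes the expansions point-to-polygon test, point-to-network distance, and point-to-polygon distance, and uses brute-force loops with hypothetical function names. A practical system would first filter candidates with a spatial index (the role libspatialindex plays in the CPU baseline) and, on a GPU, evaluate many points in parallel.

```python
import math


def point_in_polygon(px, py, ring):
    """P2P-T: even-odd ray casting against one polygon ring [(x, y), ...]."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            # x where this edge crosses the horizontal line y = py.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def point_segment_distance(px, py, x1, y1, x2, y2):
    """Euclidean distance from (px, py) to segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - x1, py - y1)
    t = ((px - x1) * dx + (py - y1) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


def nearest_segment(px, py, segments):
    """P2N-D: index of the nearest street segment, given (x1, y1, x2, y2) tuples."""
    return min(range(len(segments)),
               key=lambda i: point_segment_distance(px, py, *segments[i]))


def polygon_distance(px, py, ring):
    """Distance to a polygon: 0 inside, else distance to the nearest edge."""
    if point_in_polygon(px, py, ring):
        return 0.0
    n = len(ring)
    return min(point_segment_distance(px, py, *ring[i], *ring[(i + 1) % n])
               for i in range(n))


def nearest_polygon(px, py, polygons):
    """P2P-D: index of the nearest polygon (e.g., a tax block)."""
    return min(range(len(polygons)),
               key=lambda i: polygon_distance(px, py, polygons[i]))


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, square), point_in_polygon(15, 5, square))  # True False
print(polygon_distance(15, 5, square))  # 5.0
```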

Case Studies and Performance Evaluations

• Data
  – Taxi trip records: 300 million in two years (2008-2010); ~170 million in 2009 (~150 million in Manhattan)
  – NYC DCP LION street network data: 147,011 street segments
  – NYC Census 2000 blocks: 38,794
  – NYC MapPLUTO tax blocks: 735,488 in four boroughs (excluding Staten Island) and 43,252 in Manhattan
• Hardware
  – Dell T5400 workstation with dual quad-core CPUs and 16 GB memory
  – Nvidia Quadro 6000 with 448 cores and 6 GB memory

Case Studies and Performance Evaluations

Figure (spatial aggregation results). Top: grid size 256 × 256, resolution 128 feet; right: grid size 8192 × 8192, resolution 4 feet.

• Spatial aggregation (8192 × 8192 grid): 9,424/326 ≈ 29X speedup
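The two grids in the caption cover the same extent (256 × 128 ft = 8192 × 4 ft = 32,768 ft per side); only the cell size changes. Spatial aggregation then means mapping each point to a cell and counting. This is a minimal sketch assuming projected coordinates in feet and a hypothetical `grid_counts` helper; it uses a sparse `Counter` so the 67-million-cell fine grid stays cheap in Python, whereas a GPU implementation would more likely fill a dense array.

```python
from collections import Counter


def grid_counts(points, x0, y0, cell_size, n):
    """Count points per cell of an n x n grid with origin (x0, y0).

    Returns a Counter keyed by (row, col); points outside the grid are dropped.
    """
    counts = Counter()
    for x, y in points:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        if 0 <= col < n and 0 <= row < n:
            counts[(row, col)] += 1
    return counts


print(256 * 128, 8192 * 4)  # 32768 32768

# Toy points in feet; the last one lies outside the 32,768 ft extent.
pts = [(10.0, 10.0), (100.0, 20.0), (130.0, 5.0), (40_000.0, 1.0)]
coarse = grid_counts(pts, 0.0, 0.0, 128.0, 256)
fine = grid_counts(pts, 0.0, 0.0, 4.0, 8192)
print(coarse[(0, 0)], coarse[(0, 1)])  # 2 1
print(sum(fine.values()), len(fine))  # 3 3
```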

• Temporal aggregation: 1,709/198 ≈ 8.6X (per minute); 1,598/165 ≈ 9.7X (per hour)
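The slides do not say how temporal aggregation is implemented on the GPU. A common data-parallel pattern is to sort the time keys and then collapse runs of equal keys (Thrust, for example, provides `sort` and `reduce_by_key`). The sketch below emulates that pattern serially in Python, as an assumption rather than U2SOD-DB's code; `count_by_key`, `minute_key`, and `hour_key` are hypothetical names, and the timestamps are toy inputs.

```python
from datetime import datetime
from itertools import groupby


def count_by_key(keys):
    """Sort the keys, then collapse runs of equal keys into (key, count) pairs.

    Serial emulation of the sort + reduce-by-key pattern.
    """
    ordered = sorted(keys)
    return [(k, sum(1 for _ in run)) for k, run in groupby(ordered)]


def minute_key(ts):
    return ts.replace(second=0, microsecond=0)


def hour_key(ts):
    return ts.replace(minute=0, second=0, microsecond=0)


pickups = [
    datetime(2009, 6, 1, 8, 5, 12),
    datetime(2009, 6, 1, 8, 5, 40),
    datetime(2009, 6, 1, 8, 47, 3),
    datetime(2009, 6, 1, 9, 0, 0),
]
per_minute = count_by_key(minute_key(t) for t in pickups)
per_hour = count_by_key(hour_key(t) for t in pickups)
print([c for _, c in per_minute])  # [2, 1, 1]
print([(k.hour, c) for k, c in per_hour])  # [(8, 3), (9, 1)]
```

Switching between per-minute and per-hour aggregation only changes the key function; the sort and the run reduction stay the same.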

Case Studies and Performance Evaluations

T-Drive dataset (17,762,489 GPS point locations): aggregation takes 47.25 milliseconds on the GPU versus 4,110 ms on the CPU using STL, an 87X speedup

Case Studies and Performance Evaluations

Join type   Infrastructure data                       CPU time     GPU time       Speedup
P2P-T       38,794 census blocks (470,941 points)     -            10.9 seconds   -
P2N-D       147,011 street segments                   15.2 hours   11.2 seconds   4,900X
P2P-D       735,488 tax blocks (4,698,986 points)     30.5 hours   33.1 seconds   3,200X

Conclusion and Future Work

• We reported our design and implementation of U2SOD-DB, a column-oriented, GPU-accelerated, in-memory data management system targeted at large-scale ubiquitous urban sensing origin-destination data

• Experiments have demonstrated significant speedups over serial CPU implementations in main memory (10-100X) and over traditional disk-resident systems (3,000-5,000X) for processing 170 million taxi trip records and their spatial joins with various types of urban infrastructure data

Conclusion and Future Work

• Extend U2SOD-DB to handle other types of OD data as well as trajectory data

• Further improve the performance by designing and implementing more efficient data structures and algorithms on GPUs

• Apply U2SOD-DB to in-depth analysis of trip purposes and urban dynamics in NYC by collaborating with transportation researchers and urban geographers.