Capstone Project. NYC Taxi DataSet The data is stored in CSV format, organized by year and month. In...


Capstone Project


NYC Taxi DataSet

• The data is stored in CSV format, organized by year and month.

• In each file, each row represents a single taxi trip.

• Table 1 below gives a small sample of this data.

• The data contains several trip records per second, spanning four years.

• The raw trip data takes up about 116 GB in text CSV format.


• Each trip record contains the following attributes:

• Medallion (car ID).

• Hack license (driver ID).

• Vendor id.

• Rate_code (taximeter rate).

• Store_and_fwd_flag (unknown attribute).


• Pickup datetime: start time of the trip, mm-dd-yyyy hh24:mm:ss EDT.

• Dropoff datetime: end time of the trip, mm-dd-yyyy hh24:mm:ss EDT.

• Passenger count: number of passengers on the trip, default value is one.

• Trip time in secs: trip time measured by the taximeter in seconds.


• Trip distance: trip distance measured by the taximeter in miles.

• Pickup longitude and pickup latitude: GPS coordinates at the start of the trip.

• Dropoff longitude and dropoff latitude: GPS coordinates at the end of the trip.
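As a rough illustration of the trip attributes listed above, one row of a trip file could be parsed as sketched below. The column names, the timestamp format string, and the sample row are assumptions made for this example (the row is invented, not taken from the dataset); the real files should be checked before relying on any of them.

```python
import csv
from dataclasses import dataclass
from datetime import datetime

# Assumed timestamp layout, following the "mm-dd-yyyy hh24:mm:ss" description.
TIME_FORMAT = "%m-%d-%Y %H:%M:%S"


@dataclass
class Trip:
    medallion: str            # car ID
    hack_license: str         # driver ID
    pickup: datetime
    dropoff: datetime
    passenger_count: int
    trip_time_secs: int
    trip_distance_miles: float
    pickup_lon: float
    pickup_lat: float
    dropoff_lon: float
    dropoff_lat: float


def parse_trip(row):
    """Convert one CSV row (a dict keyed by the assumed column names) to a Trip."""
    return Trip(
        medallion=row["medallion"],
        hack_license=row["hack_license"],
        pickup=datetime.strptime(row["pickup_datetime"], TIME_FORMAT),
        dropoff=datetime.strptime(row["dropoff_datetime"], TIME_FORMAT),
        # An empty passenger count falls back to the stated default of one.
        passenger_count=int(row["passenger_count"] or 1),
        trip_time_secs=int(row["trip_time_in_secs"]),
        trip_distance_miles=float(row["trip_distance"]),
        pickup_lon=float(row["pickup_longitude"]),
        pickup_lat=float(row["pickup_latitude"]),
        dropoff_lon=float(row["dropoff_longitude"]),
        dropoff_lat=float(row["dropoff_latitude"]),
    )


def read_trips(lines):
    """Parse an iterable of CSV lines whose first line is the header."""
    return [parse_trip(row) for row in csv.DictReader(lines)]


# Invented sample row, for illustration only.
SAMPLE = """
medallion,hack_license,vendor_id,rate_code,store_and_fwd_flag,pickup_datetime,dropoff_datetime,passenger_count,trip_time_in_secs,trip_distance,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude
M1,H1,VTS,1,,01-15-2013 13:05:00,01-15-2013 13:15:00,,600,2.5,-73.99,40.75,-73.97,40.76
"""

trips = read_trips(SAMPLE.strip().splitlines())
print(trips[0])
```

For the full 116 GB of raw data, a row-by-row reader like this would likely be too slow on its own; it is meant only to make the record layout concrete.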


• Fare data is also available for 2010-2014. A sample of the fare data is shown in Table 2 below. This dataset contains the following attributes:

• Medallion: car ID.

• Hack license: driver ID.

• Vendor id.

• Pickup datetime: start time of the trip, mm-dd-yyyy hh24:mm:ss EDT.


• Fare amount: the meter fare, which should include the Newark surcharge, in USD.

• Surcharge: extra fees, such as rush hour and overnight surcharges, in USD.

• MTA tax: Metropolitan Commuter Transportation Mobility Tax, in USD.

• Tip amount: tip amount, in USD.


• Tolls amount: total price paid for tolls, summed across all tolls for the trip, in USD.

• Total amount: all charges that are presented to the passenger at time of fare payment (includes tip for non-cash trips), in USD.
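The trip data and the fare data share four attributes (medallion, hack license, vendor id, pickup datetime), so one plausible way to combine them is to join on those four fields. The sketch below assumes that this combination identifies a trip uniquely and that the columns carry the names shown; both are assumptions for illustration, and the sample records are invented. It also sums the individual charges so the result can be compared against the total amount; whether the two always agree in the real data is something to verify, not a given.

```python
# Assumed names of the attributes shared by the trip and fare files.
KEY_FIELDS = ("medallion", "hack_license", "vendor_id", "pickup_datetime")

# Assumed names of the individual charge columns in the fare file.
CHARGE_FIELDS = ("fare_amount", "surcharge", "mta_tax", "tip_amount", "tolls_amount")


def record_key(record):
    """Build the join key from the shared attributes."""
    return tuple(record[field] for field in KEY_FIELDS)


def join_trips_with_fares(trips, fares):
    """Inner join: keep each trip that has a fare record with the same key."""
    fares_by_key = {record_key(fare): fare for fare in fares}
    joined = []
    for trip in trips:
        fare = fares_by_key.get(record_key(trip))
        if fare is not None:
            joined.append({**trip, **fare})
    return joined


def charges_sum(fare):
    """Sum the individual charges of one fare record, in USD."""
    return sum(float(fare[field]) for field in CHARGE_FIELDS)


# Invented sample records, for illustration only.
sample_trips = [
    {"medallion": "M1", "hack_license": "H1", "vendor_id": "VTS",
     "pickup_datetime": "01-15-2013 13:05:00", "trip_distance": "2.5"},
    {"medallion": "M2", "hack_license": "H2", "vendor_id": "CMT",
     "pickup_datetime": "01-15-2013 13:10:00", "trip_distance": "1.0"},
]
sample_fares = [
    {"medallion": "M1", "hack_license": "H1", "vendor_id": "VTS",
     "pickup_datetime": "01-15-2013 13:05:00", "fare_amount": "10.0",
     "surcharge": "0.5", "mta_tax": "0.5", "tip_amount": "2.0",
     "tolls_amount": "0.0", "total_amount": "13.0"},
]

joined = join_trips_with_fares(sample_trips, sample_fares)
for record in joined:
    print(record["medallion"], charges_sum(record), record["total_amount"])
```

A dictionary lookup keeps the join linear in the number of records, but it holds all fares in memory; at the scale of the full dataset the same join would more realistically be done month by month or in a database.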


Trajectory Data Query Model

• Existing query models for trajectory data focus on searching for trajectories or trips with respect to a given range or point.

• (e.g. “find all objects within a given area (or at a given point) sometime during a given time interval” or “find the k-closest objects with respect to a given point at a given time interval”)


• The coordinate-based queries:

• Point Queries: (e.g. find the location of a specific object between 1:00pm and 1:30pm).

• Region Queries: (e.g. find all trajectories or trips that passed through region R between 1:00pm and 1:30pm).

• K-Nearest Neighbor Queries: (e.g. find the k trajectories or trips nearest to a gas station between 1:00pm and 1:30pm).
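To make the coordinate-based queries concrete, here is a minimal sketch of a region query and a k-nearest-neighbor query over pickup points. It uses a plain linear scan and the haversine great-circle distance; the record layout (dictionaries with "id", "pickup", "lon", "lat") and the sample points are hypothetical, chosen only for this example. The taxi data records only the start and end of each trip, so the sketch treats a trip as its pickup point rather than a full trajectory.

```python
import heapq
import math
from datetime import datetime

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance between two (lon, lat) points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_window(trip, start, end):
    """True if the trip's pickup time lies in the closed interval [start, end]."""
    return start <= trip["pickup"] <= end


def region_query(trips, min_lon, min_lat, max_lon, max_lat, start, end):
    """Trips picked up inside a bounding box during the time window."""
    return [
        t for t in trips
        if in_window(t, start, end)
        and min_lon <= t["lon"] <= max_lon
        and min_lat <= t["lat"] <= max_lat
    ]


def knn_query(trips, lon, lat, k, start, end):
    """The k trips whose pickup point is closest to (lon, lat) in the window."""
    candidates = [t for t in trips if in_window(t, start, end)]
    return heapq.nsmallest(
        k, candidates, key=lambda t: haversine_m(lon, lat, t["lon"], t["lat"])
    )


# Invented sample points, for illustration only.
sample = [
    {"id": "a", "pickup": datetime(2013, 1, 15, 13, 5), "lon": -73.990, "lat": 40.750},
    {"id": "b", "pickup": datetime(2013, 1, 15, 13, 20), "lon": -73.980, "lat": 40.760},
    {"id": "c", "pickup": datetime(2013, 1, 15, 14, 0), "lon": -73.990, "lat": 40.750},
    {"id": "d", "pickup": datetime(2013, 1, 15, 13, 10), "lon": -73.900, "lat": 40.700},
]
start = datetime(2013, 1, 15, 13, 0)
end = datetime(2013, 1, 15, 13, 30)

in_region = region_query(sample, -74.0, 40.74, -73.97, 40.77, start, end)
nearest = knn_query(sample, -73.990, 40.750, 2, start, end)
print([t["id"] for t in in_region], [t["id"] for t in nearest])
```

A linear scan is fine for a sample but not for several records per second over four years; a spatial index (for example a grid or a tree-based structure) would be the usual next step.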


• The trajectory-based queries:

• Topological Queries: (e.g. “When did vehicle X enter street Y most recently?”).

• Navigational Queries: (e.g. “What is the current speed of vehicle X?”).
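The taxi data does not record instantaneous speed, so a navigational query such as the one above can only be approximated from trip-level attributes. A minimal sketch, assuming the trip distance (miles) and trip time (seconds) fields described earlier, computes the average speed over a whole trip; the function name and the sample values are hypothetical.

```python
def average_speed_mph(trip_distance_miles, trip_time_secs):
    """Average speed over a trip in miles per hour, or None if the time is not positive."""
    if trip_time_secs <= 0:
        # Records with zero or negative duration cannot yield a meaningful speed.
        return None
    return trip_distance_miles * 3600.0 / trip_time_secs


# Invented example: 2.5 miles covered in 600 seconds (10 minutes).
speed = average_speed_mph(2.5, 600)
print(speed)
```

Returning None for non-positive durations is one design choice among several; filtering such records out before analysis would work equally well.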


A Study of New York City Taxi Trips

From: Visual Exploration of Big Spatio-Temporal Urban Data: A Study of New York City Taxi Trips. Nivan Ferreira, Jorge Poco, Huy T. Vo, Juliana Freire, and Claudio T. Silva


• For NYC DataSet:

• 2013 : http://www.andresmh.com/nyctaxitrips/

• 2010 – 2013: https://uofi.app.box.com/NYCtaxidata

• NYC TaxiVis Paper: http://vgc.poly.edu/projects/taxivis/


Questions