Statistical Analysis of UK’s Traffic Dataset
UK Accidents Analysis - Piyush Ramnani
UK Accidents Dataset
• Data is in CSV format
• It contains 1,000,000+ records
• Each record has 22 attributes
• Attributes: Accident_Index, Longitude, Latitude, Police_force, Accident_Severity, Number_of_Vehicles, Number_of_Casualties, Date, Day_of_week, Time, Road_Number1, Road_Type, Speed_Limit, Junction_Detail, Road_Number2, Pedestrian_Crossing, Light_Conditions, Weather_Conditions, Road_Surface_Conditions, Special_Condition, Urban_or_Rural_Area, Police_at_Site
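As a rough illustration of the record layout, the sketch below reads CSV rows into dictionaries keyed by the 22 attribute names. It assumes the file has no header row and that columns appear in the order listed on the slide. The sample record is invented for illustration and is not taken from the dataset.

```python
import csv
import io

# Column names as listed on the slide, in the order given there.
COLUMNS = [
    "Accident_Index", "Longitude", "Latitude", "Police_force",
    "Accident_Severity", "Number_of_Vehicles", "Number_of_Casualties",
    "Date", "Day_of_week", "Time", "Road_Number1", "Road_Type",
    "Speed_Limit", "Junction_Detail", "Road_Number2", "Pedestrian_Crossing",
    "Light_Conditions", "Weather_Conditions", "Road_Surface_Conditions",
    "Special_Condition", "Urban_or_Rural_Area", "Police_at_Site",
]


def read_accidents(handle):
    """Yield one dict per CSV row, keyed by the 22 attribute names."""
    # Assumption: no header row, so the field names are supplied explicitly.
    reader = csv.DictReader(handle, fieldnames=COLUMNS)
    for row in reader:
        yield row


# Hypothetical record, invented purely for illustration.
sample = io.StringIO(
    "A001,-0.19,51.49,1,2,1,1,04/01/2005,3,17:42,3218,6,30,0,0,0,1,2,2,0,1,1\n"
)
records = list(read_accidents(sample))
```

All values arrive as strings; numeric fields such as Speed_Limit would need an explicit `int(...)` before any arithmetic.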
Technologies Used:
• Pig
• Hive
• Cassandra
• MapReduce using MongoDB
• Mahout
Pig
• Loading the data from HDFS into Pig (MapReduce mode)
• Analysis performed using a Pig script
• Output stored on HDFS
• Graphical representation of statistics using JFreeChart
• Integrated user-defined function (UDF)
Analysis Performed:
1. Number of accidents depending on area from 2005-2010
2. Day of the week with the highest accident rate
3. Speed limit with the highest number of accidents
4. Average severity of accidents depending on number of vehicles involved
5. Accident with maximum casualties involved, its date and location
6. UDF for the meaning of the values of weather condition
7. Number of accidents in each weather condition
8. Road types ranked from highest to lowest number of accidents
9. Accidents where there were no street lights on the road
Analysis using Pig Script
1) Number of accidents depending on Area from 2005-2010
• Output:
• Urban = 667,882, Rural = 388,588 and Unknown = 143
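The original Pig script is not reproduced in the transcript. The grouping it describes (in Pig terms, a GROUP BY on the area column followed by a COUNT) can be sketched in plain Python as follows. The numeric codes in `AREA_LABELS` are an assumption, since the slide shows only the labels.

```python
from collections import Counter

# Assumed code-to-label mapping; the slide shows only the labels.
AREA_LABELS = {1: "Urban", 2: "Rural", 3: "Unknown"}


def accidents_by_area(area_codes):
    """Count accidents per area label; unrecognised codes count as Unknown."""
    return Counter(AREA_LABELS.get(code, "Unknown") for code in area_codes)


# Hypothetical Urban_or_Rural_Area values for six accidents.
counts = accidents_by_area([1, 1, 2, 3, 1, 2])
```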
2) Day of the week with the highest accident rate
• Output:
• Friday had the highest number of accidents (171,918)
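One way to find the busiest weekday, sketched here in Python rather than Pig, is to derive the weekday from the Date field and count occurrences. The dd/mm/yyyy format is an assumption based on dates such as 20/02/2009 elsewhere in the slides. The dataset's own Day_of_week column could be used instead.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def busiest_weekday(dates):
    """Return (weekday name, accident count) for the most frequent weekday."""
    # datetime.weekday() numbers Monday as 0, matching the WEEKDAYS order.
    names = (WEEKDAYS[datetime.strptime(d, "%d/%m/%Y").weekday()] for d in dates)
    return Counter(names).most_common(1)[0]


# Hypothetical dates: two Fridays and one Saturday.
result = busiest_weekday(["20/02/2009", "27/02/2009", "21/02/2009"])
```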
3) Speed limit with the highest number of accidents
• Output:
• Roads with a speed limit of 30 mph had the highest number of accidents (667,700)
4) Average severity of accidents depending on number of vehicles involved
Output:
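The output for this step did not survive in the transcript. As a hedged sketch of the computation it names, the Python below groups accidents by number of vehicles and averages the severity within each group. The input pairs are hypothetical.

```python
from collections import defaultdict


def average_severity_by_vehicles(rows):
    """Map number of vehicles to mean Accident_Severity.

    rows is an iterable of (Number_of_Vehicles, Accident_Severity) pairs.
    """
    totals = defaultdict(lambda: [0, 0])  # vehicles -> [severity sum, count]
    for vehicles, severity in rows:
        totals[vehicles][0] += severity
        totals[vehicles][1] += 1
    return {v: s / n for v, (s, n) in sorted(totals.items())}


# Hypothetical (vehicles, severity) pairs.
averages = average_severity_by_vehicles([(1, 3), (1, 2), (2, 3), (2, 3), (3, 1)])
```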
5) Accident with maximum casualties involved, its date and location
Output:
• No. of casualties involved = 68
• Occurrence = 1
• Date = 3/1/07
• Location = (-0.496697, 51.497547) (longitude, latitude)
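A minimal Python sketch of this lookup is shown below: it finds the record with the most casualties and counts how many records share that maximum, which corresponds to the "Occurrence" value above. The rows are hypothetical and carry only the fields needed here.

```python
def worst_accident(rows):
    """Return (row with most casualties, number of rows sharing that maximum)."""
    worst = max(rows, key=lambda r: r["Number_of_Casualties"])
    peak = worst["Number_of_Casualties"]
    occurrences = sum(1 for r in rows if r["Number_of_Casualties"] == peak)
    return worst, occurrences


# Hypothetical rows, invented for illustration.
rows = [
    {"Accident_Index": "A001", "Number_of_Casualties": 2, "Date": "01/01/2007"},
    {"Accident_Index": "A002", "Number_of_Casualties": 7, "Date": "02/01/2007"},
    {"Accident_Index": "A003", "Number_of_Casualties": 1, "Date": "03/01/2007"},
]
worst, occurrences = worst_accident(rows)
```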
6) UDF that maps weather condition codes to their meanings
• Pig Script
• Output:
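The UDF source is not included in the transcript. The idea can be sketched in Python as a lookup function from a numeric code to a readable label. The code table below is an assumption based on the published STATS19 coding of Weather_Conditions and may differ from what the original UDF used.

```python
# Assumed mapping, following the STATS19 Weather_Conditions code list.
WEATHER_LABELS = {
    1: "Fine no high winds",
    2: "Raining no high winds",
    3: "Snowing no high winds",
    4: "Fine + high winds",
    5: "Raining + high winds",
    6: "Snowing + high winds",
    7: "Fog or mist",
    8: "Other",
    9: "Unknown",
}


def weather_meaning(code):
    """Translate a Weather_Conditions code into a label; fall back to Unknown."""
    try:
        return WEATHER_LABELS.get(int(code), "Unknown")
    except (TypeError, ValueError):
        # Empty or non-numeric field values are treated as Unknown.
        return "Unknown"
```

For example, `weather_meaning("7")` returns "Fog or mist", and an empty field falls back to "Unknown".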
7) Number of accidents in each weather condition
• Output, using the output from the previous slide:
• The highest number of accidents occurred in fine weather conditions (831,083).
8) Road types ranked from highest to lowest number of accidents
• Output:
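The ranking itself is missing from the transcript. As an illustration of the ordering step only, the Python sketch below sorts per-road-type counts from highest to lowest. The counts are hypothetical placeholders and are not results from the dataset.

```python
def rank_descending(counts):
    """Return (road type, count) pairs ordered from most to fewest accidents."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts, invented for illustration only.
toy_counts = {"Roundabout": 3, "Single carriageway": 12, "Dual carriageway": 5}
ranking = rank_descending(toy_counts)
```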
9) Accidents on roads with no street lights
Output:
UK Accidents 2005-2010
Analysis from Pig:
• The number of accidents fell from 2005 to 2010
• Urban areas see almost double the number of accidents of rural areas every year
• Friday has been the most accident-prone day of the week
• Roads with a speed limit of 30 account for the majority of accidents
• Approximately 79% of all accidents occurred in fine weather conditions
• Single carriageways require more police on site than any other road type
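The 79% figure can be checked against the counts reported earlier in the slides. The short calculation below assumes the urban, rural and unknown area counts together cover every accident in the dataset.

```python
# Counts as reported in the Pig outputs earlier in the slides.
urban, rural, unknown = 667_882, 388_588, 143
fine_weather = 831_083
speed_30 = 667_700

total = urban + rural + unknown            # 1,056,613 accidents in all
fine_share = fine_weather / total * 100    # about 78.7%, i.e. roughly 79%
speed_30_share = speed_30 / total * 100    # about 63%, a majority
urban_to_rural = urban / rural             # about 1.7, "almost double"
```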
Hive
• Hive schema
• Loading data from HDFS into Hive
• Analysis performed on Hive
Analysis Performed:
1. Top 10 Accidents on 20/02/2009 depending on severity (low-max)
2. Number of accidents with speed limit over 65
3. Number of accidents in which police did and did not attend the site
Analysis using HQL
1) Top 10 accidents on 20/02/2009 ordered by severity (low-max)
Output:
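The HQL statement is not preserved in the transcript. Because HQL is SQL-like, the query shape can be sketched with Python's built-in sqlite3 module. The table name, column names and rows below are hypothetical, and the date is assumed to be stored as a dd/mm/yyyy string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the original Hive schema is not shown in the transcript.
conn.execute(
    "CREATE TABLE accidents ("
    "accident_index TEXT, accident_date TEXT, accident_severity INTEGER)"
)
conn.executemany(
    "INSERT INTO accidents VALUES (?, ?, ?)",
    [("A001", "20/02/2009", 3), ("A002", "20/02/2009", 1),
     ("A003", "19/02/2009", 1), ("A004", "20/02/2009", 2)],
)

# Filter on the date, order by severity from low to high, keep ten rows.
TOP_10 = """
    SELECT accident_index, accident_severity
    FROM accidents
    WHERE accident_date = '20/02/2009'
    ORDER BY accident_severity ASC, accident_index ASC
    LIMIT 10
"""
top_rows = conn.execute(TOP_10).fetchall()
```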
2) Number of accidents with speed limit over 65
Output:
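A filter-and-count query of this kind can be sketched as follows, again using sqlite3 as a stand-in for Hive. The table, column name and speed values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-column table for illustration.
conn.execute("CREATE TABLE accidents (speed_limit INTEGER)")
conn.executemany(
    "INSERT INTO accidents VALUES (?)",
    [(30,), (70,), (60,), (70,), (40,)],
)

# Count only the rows whose speed limit exceeds 65.
over_65 = conn.execute(
    "SELECT COUNT(*) FROM accidents WHERE speed_limit > 65"
).fetchone()[0]
```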
3) Number of accidents in which police did and did not attend the site
Output:
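Splitting the count by police attendance is a GROUP BY over the Police_at_Site column. The sketch below uses sqlite3 with hypothetical data. It groups on the raw codes and does not assume which code means "attended".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding only the Police_at_Site code.
conn.execute("CREATE TABLE accidents (police_at_site INTEGER)")
conn.executemany(
    "INSERT INTO accidents VALUES (?)",
    [(1,), (1,), (2,), (1,), (2,)],
)

# One output row per distinct code, with the number of accidents for each.
attendance = conn.execute(
    "SELECT police_at_site, COUNT(*) FROM accidents "
    "GROUP BY police_at_site ORDER BY police_at_site"
).fetchall()
```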
Cassandra
• Cassandra schema
• Load the data into Cassandra from the local system
• Queries performed using CQL
Analysis Performed:
1. No. of accidents on road no. 3218
2. Top 25 accidents which occurred in the presence of oil or diesel on the road
3. Accidents with pedestrians on zebra crossing
Analysis using CQL
1) No. of accidents on road no. 3218
Output:
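The CQL query and schema are not preserved. Cassandra generally restricts WHERE clauses to primary key or indexed columns, so a lookup by road number is cheapest when the road number is part of the key. The Python sketch below models that idea with a dictionary keyed by road number. The use of Road_Number1 and the rows themselves are assumptions.

```python
from collections import defaultdict


def partition_by_road(rows):
    """Group accident ids under their road number, like a partition key."""
    partitions = defaultdict(list)
    for accident_index, road_number in rows:
        partitions[road_number].append(accident_index)
    return partitions


# Hypothetical rows: (Accident_Index, Road_Number1).
rows = [("A001", 3218), ("A002", 40), ("A003", 3218)]
partitions = partition_by_road(rows)

# A lookup by key touches only that road's accidents.
count_3218 = len(partitions[3218])
```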
2) Top 25 accidents which occurred in the presence of oil or diesel on the road
Output:
3) Accidents with pedestrians on zebra crossing
Output:
MapReduce using MongoDB
• Mongo client connection
• Loading and tokenizing the dataset
• Function takes the input keyword
• Generates output with all the records matching the keyword
• Populating the result
• Accidents for a user-specified junction
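The MapReduce code is not included in the transcript. The steps listed above can be sketched in plain Python, without a MongoDB connection: a map step tokenizes each record and emits it when a token equals the keyword, and a reduce step collects the matches under that keyword. The function names and sample lines are hypothetical.

```python
from collections import defaultdict


def map_record(line, keyword):
    """Tokenize one CSV line and emit (keyword, line) if any token matches."""
    tokens = [token.strip() for token in line.split(",")]
    if keyword in tokens:
        yield keyword, line


def reduce_matches(pairs):
    """Collect all emitted lines under their keyword."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def search(lines, keyword):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_record(line, keyword))
    return reduce_matches(pairs)


# Hypothetical shortened records: (Accident_Index, junction label).
lines = ["A001,Roundabout", "A002,Crossroads", "A003,Roundabout"]
matches = search(lines, "Roundabout")
```

Matching whole tokens rather than substrings keeps a keyword such as "30" from matching "3030"; a substring search would be a different design choice.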
Mahout
• Probability of accidents in 2011 depending on different types of weather conditions
• Analysis performed on generated data
• Columns: weather, year, number of accidents
THANK YOU !!