Statistical Analysis of UK’s Traffic Dataset

UK Accidents Analysis - Piyush Ramnani

Transcript of Statistical Analysis of UK’s Traffic Dataset

Page 1: Statistical Analysis of UK’s Traffic Dataset

UK Accidents Analysis - Piyush Ramnani

Page 2: Statistical Analysis of UK’s Traffic Dataset

UK Accidents Dataset
• Data is in CSV format
• It contains 1,000,000+ records
• Each record has 22 attributes
• Attributes: Accident_Index, Longitude, Latitude, Police_force, Accident_Severity, Number_of_Vehicles, Number_of_Casualties, Date, Day_of_week, Time, Road_Number1, Road_Type, Speed_Limit, Junction_Detail, Road_Number2, Pedestrian_Crossing, Light_Conditions, Weather_Conditions, Road_Surface_Conditions, Special_Condition, Urban_or_Rural_Area, Police_at_Site
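As a rough illustration of the record layout, the sketch below parses CSV text into one dictionary per accident using the 22 attribute names from the slide. It assumes the file has no header row and uses plain Python rather than the tools named in the deck; the sample row is made up for illustration and `read_records` is a hypothetical helper name.

```python
import csv
import io

# The 22 attribute names listed on the slide, in the same order.
COLUMNS = [
    "Accident_Index", "Longitude", "Latitude", "Police_force",
    "Accident_Severity", "Number_of_Vehicles", "Number_of_Casualties",
    "Date", "Day_of_week", "Time", "Road_Number1", "Road_Type",
    "Speed_Limit", "Junction_Detail", "Road_Number2", "Pedestrian_Crossing",
    "Light_Conditions", "Weather_Conditions", "Road_Surface_Conditions",
    "Special_Condition", "Urban_or_Rural_Area", "Police_at_Site",
]


def read_records(text):
    """Parse headerless CSV text into one dict per accident record."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=COLUMNS))


# One made-up row, for illustration only (not taken from the dataset).
sample = "A001,-0.19,51.48,1,2,1,1,04/01/2005,3,17:42,3218,6,30,0,0,1,1,2,2,0,1,1\n"
records = read_records(sample)
print(records[0]["Speed_Limit"])
```

All values come back as strings, so numeric fields such as Speed_Limit would need converting before any arithmetic.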

Page 3: Statistical Analysis of UK’s Traffic Dataset

Technologies Used:

• Pig
• Hive
• Cassandra
• MapReduce using MongoDB
• Mahout

Page 4: Statistical Analysis of UK’s Traffic Dataset

Pig
• Loading the data from HDFS into Pig (MapReduce mode)
• Analysis performed using a Pig script
• Output stored on HDFS
• Graphical representation of statistics using JFreeChart
• Integrated user-defined function (UDF)

Page 5: Statistical Analysis of UK’s Traffic Dataset

Analysis Performed:

1. Number of accidents depending on area from 2005-2010
2. Day of the week with the highest accident rate
3. Speed limit with the highest no. of accidents
4. Average severity of accidents depending on no. of vehicles involved
5. Accident with maximum casualties involved, its date and location
6. UDF for the meaning of the values of weather condition
7. No. of accidents in each weather condition
8. Road types ranked from highest to lowest no. of accidents
9. Accidents where there were no street lights on the road

Page 6: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script

1) Number of accidents depending on Area from 2005-2010

• Output:

• Urban = 667,882, Rural = 388,588 and Unknown = 143
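The grouping behind this count can be sketched in plain Python as follows. This is not the Pig script from the slide, only an assumed equivalent of a group-and-count on Urban_or_Rural_Area; the rows are made up, and the text labels stand in for whatever codes the dataset actually stores.

```python
from collections import Counter


def accidents_by_area(records):
    """Count records per Urban_or_Rural_Area value."""
    return Counter(r["Urban_or_Rural_Area"] for r in records)


# Made-up rows for illustration; labels stand in for the dataset's area codes.
rows = [{"Urban_or_Rural_Area": a} for a in ["Urban", "Rural", "Urban", "Unknown"]]
counts = accidents_by_area(rows)
print(counts.most_common())
```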

Page 7: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

2) Day of the week with the highest accident rate

• Output:

Page 8: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

• Friday had the highest no. of accidents (171,918)

Page 9: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

3) Speed limit with the highest no. of accidents

• Output

Page 10: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

• Roads with a speed limit of 30 mph had the highest no. of accidents (667,700)

Page 11: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

4) Average severity of accidents depending on no. of vehicles involved
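A minimal sketch of this grouped average, assuming numeric Accident_Severity and Number_of_Vehicles fields. It is a plain-Python stand-in for the Pig script, which is not reproduced in the transcript; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def average_severity_by_vehicles(records):
    """Return {number of vehicles: mean accident severity}."""
    totals = defaultdict(lambda: [0, 0])  # vehicles -> [severity sum, count]
    for r in records:
        entry = totals[r["Number_of_Vehicles"]]
        entry[0] += r["Accident_Severity"]
        entry[1] += 1
    return {vehicles: s / n for vehicles, (s, n) in totals.items()}


# Made-up rows for illustration only.
rows = [
    {"Number_of_Vehicles": 1, "Accident_Severity": 3},
    {"Number_of_Vehicles": 1, "Accident_Severity": 2},
    {"Number_of_Vehicles": 2, "Accident_Severity": 3},
]
averages = average_severity_by_vehicles(rows)
print(averages)
```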

Output:

Page 12: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

5) Accident with maximum casualties involved, its date and location
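Finding this record amounts to taking the maximum over Number_of_Casualties and reading the date and coordinates from the winning row. The sketch below shows that idea in plain Python, not the original Pig script. The first row reuses the figures reported on this slide; the second row and the function name are made up.

```python
def worst_accident(records):
    """Return the record with the largest Number_of_Casualties."""
    return max(records, key=lambda r: r["Number_of_Casualties"])


rows = [
    # Figures reported on the slide.
    {"Number_of_Casualties": 68, "Date": "3/1/07",
     "Longitude": -0.496697, "Latitude": 51.497547},
    # Made-up row for contrast.
    {"Number_of_Casualties": 2, "Date": "4/1/07",
     "Longitude": -1.5, "Latitude": 53.8},
]
worst = worst_accident(rows)
print(worst["Date"], worst["Longitude"], worst["Latitude"])
```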

Output:

No. of casualties involved = 68
Occurrence = 1
Date = 3/1/07
Location (longitude, latitude) = ({(-0.496697)},{(51.497547)})

Page 13: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

6) UDF for the meaning of the values of weather condition
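The UDF itself is not reproduced in this transcript. As a hedged illustration of what such a lookup does, the sketch below maps a Weather_Conditions code to a label. The code-to-label table is an assumption based on the published STATS19 coding for UK road accident data and may differ from the mapping used in the original UDF; `weather_label` is a hypothetical name.

```python
# Assumed mapping, following the published STATS19 weather condition codes.
WEATHER_LABELS = {
    1: "Fine, no high winds",
    2: "Raining, no high winds",
    3: "Snowing, no high winds",
    4: "Fine with high winds",
    5: "Raining with high winds",
    6: "Snowing with high winds",
    7: "Fog or mist",
    8: "Other",
    9: "Unknown",
    -1: "Data missing",
}


def weather_label(code):
    """Translate a numeric weather code (int or numeric string) to text."""
    return WEATHER_LABELS.get(int(code), "Unrecognised code")


print(weather_label("1"))
```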

Page 14: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

• Pig Script

• Output:

Page 15: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

7) No. of accidents in each weather condition

• The highest no. of accidents occurred in fine weather conditions (831,083).

• Output (from previous slide):

Page 16: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

8) Road types ranked from highest to lowest no. of accidents

• Output:

Page 17: Statistical Analysis of UK’s Traffic Dataset

Analysis using Pig Script (cont)

9) Accidents caused where there were no street lights on the road.

Output:

Page 18: Statistical Analysis of UK’s Traffic Dataset

UK Accidents 2005-2010

Page 19: Statistical Analysis of UK’s Traffic Dataset

Analysis from Pig:
• There has been a fall in the no. of accidents from 2005-2010
• Urban areas see almost double the number of accidents of rural areas every year
• Friday has been the most accident-prone day of the week
• Roads with a speed limit of 30 account for the majority of accidents
• Approximately 79% of all accidents occurred in fine weather conditions
• Single carriageways require more police on site than any other road type
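The 79% figure and the urban-to-rural comparison can be checked against the counts reported on the earlier slides. The short calculation below uses only those reported numbers; note that it works on the 2005-2010 totals, so it says nothing about the per-year breakdown.

```python
# Counts reported on the earlier slides.
urban, rural, unknown = 667_882, 388_588, 143
fine_weather = 831_083

total = urban + rural + unknown             # all accidents, 2005-2010
fine_share = 100 * fine_weather / total     # percentage in fine weather
urban_to_rural = urban / rural              # ratio of urban to rural counts

print(f"total={total}, fine weather={fine_share:.1f}%, urban/rural={urban_to_rural:.2f}")
```

The total comes to 1,056,613 records, the fine-weather share to roughly 78.7%, and the urban count to about 1.7 times the rural count.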

Page 20: Statistical Analysis of UK’s Traffic Dataset

Hive
• Hive schema

• Loading data from HDFS into Hive

• Analysis performed on Hive

Page 21: Statistical Analysis of UK’s Traffic Dataset

Analysis Performed:

1. Top 10 Accidents on 20/02/2009 depending on severity (low-max)

2. Number of accidents with speed limit over 65

3. No. of accidents in which police attended or did not attend the site

Page 22: Statistical Analysis of UK’s Traffic Dataset

Analysis using HQL

1) Top 10 Accidents on 20/02/2009 depending on severity (low-max)
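The HQL query is not reproduced in this transcript. Its logic (filter by date, order by severity, keep the first ten rows) can be sketched in plain Python as below. The sketch assumes dates are stored as DD/MM/YYYY strings and reads "low-max" as ascending order of the severity value; both are assumptions, and the rows are made up.

```python
def top_accidents_on(records, date, limit=10):
    """Return up to `limit` records for `date`, lowest severity value first."""
    same_day = [r for r in records if r["Date"] == date]
    return sorted(same_day, key=lambda r: r["Accident_Severity"])[:limit]


# Made-up rows for illustration only.
rows = [
    {"Accident_Index": "A", "Date": "20/02/2009", "Accident_Severity": 3},
    {"Accident_Index": "B", "Date": "20/02/2009", "Accident_Severity": 1},
    {"Accident_Index": "C", "Date": "21/02/2009", "Accident_Severity": 2},
]
top = top_accidents_on(rows, "20/02/2009")
print([r["Accident_Index"] for r in top])
```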

Output:

Page 23: Statistical Analysis of UK’s Traffic Dataset

Analysis using HQL (cont)

2) Number of accidents with speed limit over 65
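This query is a filtered count. A plain-Python sketch of the same idea follows; it is not the HQL from the slide, and the function name and rows are made up. Since 70 mph is the only standard UK limit above 65, the filter in practice selects roads with a 70 mph limit.

```python
def count_over_speed_limit(records, threshold=65):
    """Count records whose Speed_Limit is strictly above `threshold`."""
    return sum(1 for r in records if int(r["Speed_Limit"]) > threshold)


# Made-up rows for illustration only; values are strings, as read from CSV.
rows = [{"Speed_Limit": s} for s in ["30", "70", "60", "70"]]
print(count_over_speed_limit(rows))
```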

Output:

Page 24: Statistical Analysis of UK’s Traffic Dataset

Analysis using HQL (cont)

3) No. of accidents in which police attended or did not attend the site

Output:

Output:

Page 25: Statistical Analysis of UK’s Traffic Dataset

Cassandra
• Cassandra schema
• Load the data into Cassandra from the local system

• Queries performed using CQL

Page 26: Statistical Analysis of UK’s Traffic Dataset

Analysis Performed:

1. No. of accidents on road no. 3218

2. Top 25 accidents which occurred in the presence of oil or diesel on the road

3. Accidents with pedestrians on zebra crossing

Page 27: Statistical Analysis of UK’s Traffic Dataset

Analysis using CQL

1) No. of accidents on road no. 3218

Output:-

Page 28: Statistical Analysis of UK’s Traffic Dataset

Analysis using CQL (cont)

2) Top 25 accidents which occurred in the presence of oil or diesel on the road

Output:

Page 29: Statistical Analysis of UK’s Traffic Dataset

Analysis using CQL (cont)

3) Accidents with pedestrians on zebra crossing

Output:

Page 30: Statistical Analysis of UK’s Traffic Dataset

MapReduce using MongoDB
• Mongo client connection

• Loading and tokenizing the dataset

• Function takes the input keyword
• Generates output with all the records matching the keyword
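The tokenize-and-match step described above can be sketched without a database as follows. This is an assumed simplification in plain Python: it does not use MongoDB or its MapReduce facility, the helper names are hypothetical, and the sample lines (including the junction labels) are made up.

```python
def tokenize(line):
    """Split one CSV line into trimmed tokens."""
    return [token.strip() for token in line.split(",")]


def find_matching(lines, keyword):
    """Return the tokenized records that contain `keyword` as a whole token."""
    return [tokens for tokens in map(tokenize, lines) if keyword in tokens]


# Made-up lines for illustration only.
lines = [
    "A001, T or staggered junction, 30",
    "A002, Roundabout, 40",
    "A003, Roundabout, 30",
]
matches = find_matching(lines, "Roundabout")
print([m[0] for m in matches])
```

Matching on whole tokens keeps a keyword such as "30" from also hitting longer values that merely contain those digits.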

Page 31: Statistical Analysis of UK’s Traffic Dataset

MapReduce using MongoDB
• Populating result

Page 32: Statistical Analysis of UK’s Traffic Dataset

MapReduce using MongoDB (cont)

Accidents for user specific Junction

Page 33: Statistical Analysis of UK’s Traffic Dataset

Mahout
• Probability of accidents in 2011 depending on different types of weather conditions
• Analysis performed on generated data
• Columns (Weather, year, no. of accidents)

Page 34: Statistical Analysis of UK’s Traffic Dataset

Mahout (cont)

Page 35: Statistical Analysis of UK’s Traffic Dataset

THANK YOU !!