I-405 Corridor Analysis


Washington State University Data Analytics focus MBA Program

I-405 Corridor Analysis

Project Sponsor – Neal Analytics

Torey Bearly, Andrew Kealoha, Justin Rath, & Wade Rogerson

7-31-2016


Acknowledgement

We would like to thank several individuals for their help and contributions to this project. We could not have accomplished it without their help, guidance, and information. We have listed these individuals below with a brief description of how they supported the team.

David McClellan (WSU MBA Alumnus, 2014) - Neal Analytics

David McClellan was the project sponsor and provided the team with invaluable guidance, numerous resources, and technological training along the way.

Joe St. Charles, John Anderson & Jeremy Bertrand - Washington State Department of Transportation

These individuals from the Washington State Department of Transportation helped supply the team with the data necessary to analyze Interstate 405 and complete our analysis.

Mauricio Featherman – Washington State University MBA Director

Dr. Featherman played a vital role in the inception of this project and helped the team get on its feet while also providing support and guidance along the way.


Table of Contents

Introduction ................................ 3
Problem ..................................... 3
Goal ........................................ 4
Hypothesis .................................. 5
Literature Review ........................... 6
Methods ..................................... 8
    Early Stages ............................ 8
    Data Extraction Stage ................... 9
    Hypothesis/Question Stage ............... 9
    Selecting Analysis Tools ................ 10
    Data Analysis Stage ..................... 11
Findings .................................... 12
Conclusion .................................. 14
Recommendations ............................. 15
Relation to Course Material ................. 16
Utilized Technologies ....................... 17
Work Allotment .............................. 20
Resources ................................... 24


Introduction

Time is arguably one of the most valuable resources an individual can control over a lifetime, so it seems obvious that people get upset when they feel their time is wasting away. This is certainly true when it comes to sitting in traffic: waiting helplessly through mind-numbing congestion to arrive somewhere just minutes away. This project was put together with the hope of addressing that issue and helping individuals across the Seattle metropolitan area save their most valuable resource, time. Serving as our capstone project for the Washington State University Data Analytics Focused MBA program, the project was provided by David McClellan of Neal Analytics, an advanced business and data analytics firm in the Seattle-Bellevue area. David is an alumnus of the program and hoped to provide a valuable project that challenged its members to solve complex problems with data. With this in mind, our team signed on to the project hoping to learn, analyze, and provide valuable insights into traffic behavior and congestion on Interstate 405, which has drawn increasing complaints from local citizens and residents of the area. This report includes an introduction to the problem, discussion of the hypothesis, a review of the technologies and methods utilized, and an overview of our team's findings, conclusion, and recommendations.

Problem

Over the last several years, the Washington State Department of Transportation (WSDOT) has been tasked with improving traffic control in the Puget Sound region, specifically in the Seattle-Tacoma-Bellevue Metropolitan Area. In 2015, the estimated population in this area was 3,733,580, nearly half the population of the entire state. Seattle has become one of the fastest growing cities in the United States, and the impact of an increasing population has been felt by local commuters. According to the TomTom navigation company, Seattle ranked 5th among United States cities for worst traffic congestion in 2015, with drivers in the Seattle-Tacoma-Bellevue Metropolitan Area spending an average of 89 hours delayed in traffic jams. Commutes that require only 30 minutes when driving the speed limit now average 53 minutes. Ultimately, local interstates have been forced to handle an increasing number of commuters while road capacity has remained unchanged or worsened as a result of ongoing construction projects. Additional pressure from local residents and media outlets has pushed WSDOT to find a solution that will resolve traffic congestion sooner rather than later.

After several years of development, new toll lanes opened in both directions on Interstate 405 on September 27th, 2015. The 30-mile freeway known as I-405 takes travelers east of Lake Washington from Tukwila to Lynnwood and serves as a main artery for commuters in the Puget Sound. Tolls were added to the 17-mile stretch between Northeast Sixth Street in Bellevue and Interstate 5 in Lynnwood. All drivers have the option to pay the toll to travel at speeds of 45 mph or faster, at a rate adjusted based on traffic in the system.

Reports from commuters have made the effectiveness of the tolls on I-405 unclear. Frustration felt by many drivers stems from low occupancy in the toll lanes while heavy congestion remains in the main lanes. According to WSDOT, the toll lanes are "meeting federal and state standards," which require that the toll lanes maintain a speed of 45 mph 90 percent of the time. Nothing about the addition of toll lanes ensures that all drivers will benefit from shorter commutes and less congestion; the lanes are also designed to increase revenue for the state. It is unknown how the addition of toll lanes has impacted the overall flow of traffic in all lanes on Interstate 405.

Goal

This project is sponsored by Neal Analytics, a consulting firm headquartered in Bellevue, Washington. Neal Analytics helps its customers get the most out of their data, combining traditional practices of Data Warehousing and Business Intelligence with the impact of Predictive Analytics. As a premier featured partner of Microsoft, Neal Analytics utilizes Azure Machine Learning to provide analytics on a massive scale. Neal Analytics has provided high-performance computing along with access to various analytical and modeling applications in support of this project.

Reaching the end goal of this project, actionable insights based on observations made through strategic analysis, will require a deep understanding of database management, ETL (Extract, Transform, and Load) processing, and statistical modeling applications. Using a large data set will drive more accurate predictive analytics but requires an efficient ETL process to ensure that data is organized in a consistent and usable format.

As a byproduct of extensive research, our goal as a team is to better understand the impact of implementing tolls on Interstate 405, which is believed to have increased overall congestion levels. Using a large set of traffic data collected by WSDOT, statistical modeling measures the impact of a variety of factors on overall traffic flow on I-405. Working with multiple years of traffic data not only makes it possible to validate models but also provides a benchmark against which current commuting times can be compared to historical averages. Ultimately, recommendations will be made based on the findings of our analysis, including numerous perspectives on the impact of toll lanes on I-405.

Those familiar with current changes in the Puget Sound region concerning congestion recognize the level of attention surrounding WSDOT's decision to implement tolls on I-405. Measuring the overall impact of the toll lanes will provide closure to local commuters from an unbiased source.

Hypothesis

In order to fully understand the issues caused by the new I-405 tolls, each member of the group created a list of questions that might provide valuable insights into traffic behaviors. Each member wrote ten questions that were later reviewed as a group, and the group then rated each question on its complexity and the potential value it would add to the project. Finally, the list was narrowed down to the following questions that our team hoped to answer.

1. Will removing the toll lanes have any effect on throughput?

2. Have there been increases in collisions on the toll lanes?

3. How long does it take an average driver on an average day to drive from Bellevue to Lynnwood (post-toll lane vs. pre-toll lane)?

4. Is there an observable/significant difference in traffic speed before and after toll lane construction began?

5. Is there an overall increase in collisions on all of 405 due to the heavy congestion?

6. Are the tolls discouraging commuters from driving rather than increasing flow?

7. Is the process of getting into the toll lane causing congestion?

8. Is there an observable/significant difference between travel times before and after the toll lanes went live?

9. What are the busiest points on the 405, and are collisions a bigger issue after or before the toll lanes?

10. How has the flow of traffic changed in the main lanes?

11. What time of day is busiest, and what is the algorithm for calculating the toll rate during a certain time?

The process of brainstorming questions allowed the group to articulate the main issues for I-405 and the toll lanes. These questions were good starting points for considering how different variables might arise and be testable. Multiple rounds of scoring the questions based on complexity and value produced a short list that truly reflects the group's largest concerns about the toll lanes. These questions form the basis of the hypothesis for the entire project. After deliberating as a group, the research hypothesis for this project states that the addition of the toll lanes has had a significant and measurable impact on vehicle volume throughput and average traveling speed on the 405. The null hypothesis is that these variables have not changed significantly since the toll lanes were added.
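One conventional way to test such a hypothesis is a two-sample comparison of mean speeds before and after tolling. The sketch below uses Welch's t-statistic; the speed samples are hypothetical illustrations, not WSDOT data.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t-statistic comparing two sample means (unequal variances).

    A large positive t suggests sample_a's mean exceeds sample_b's, which
    would favor rejecting the null hypothesis of no change.
    """
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    return (mean(sample_a) - mean(sample_b)) / (va / na + vb / nb) ** 0.5

# Hypothetical average general-lane speeds (mph) at one milepost,
# before and after tolling; these numbers are illustrative only.
pre_toll = [42.0, 44.5, 41.0, 43.2, 45.1, 40.8]
post_toll = [38.5, 39.0, 41.2, 37.8, 40.1, 38.9]

t = welch_t(pre_toll, post_toll)  # large |t| favors the research hypothesis
```

In practice the statistic would be compared against a t-distribution with Welch-adjusted degrees of freedom to obtain a p-value.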

Literature Review

At 7 million people, the state of Washington is populous and growing. Numerous sources cite Seattle and Bellevue as rapidly growing cities, with major companies expanding their dominance in the area, including Amazon, Microsoft, and Costco to name a few. In addition, Business Insider cites Seattle as one of the best cities to start a career. Since 2000, Washington has grown by over 1 million constituents (Bureau). Aside from the weather, one of the largest public outcries concerns the congestion consistently growing along major highways and interstates. The Washington State Department of Transportation (WSDOT) has worked to develop solutions that have proven beneficial in other congested states, including California and Florida. Given Washington's rapid growth rate, a solution needs to be developed that benefits constituents.

As of September 27th, 2015, WSDOT has implemented express toll lanes along I-405 between Bellevue and Lynnwood (WSDOT1). This development was supported federally by both the Obama and Bush administrations (Seattle Times 405 Toll). The goal was to create an opportunity for constituents who would like to utilize the carpool lane but do not meet the high occupancy vehicle (HOV) occupancy requirement to pay a variable fee instead. Congestion management is achieved through dynamic and variable tolling. The result is essentially a hybrid of high occupancy toll (HOT) and HOV lanes, with fees adjusted based on real-time traffic conditions, time of day, and day of the week (WSDOT2). Theoretically, this gives more constituents access to HOV lanes they once did not have, at a price.
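Dynamic tolling of this kind can be sketched as a pricing rule that raises the rate as express-lane speed falls below the 45 mph target. The function below is a toy illustration, not WSDOT's actual algorithm; the linear scaling is an assumption, though the $0.75 floor and $10 cap mirror the corridor's published bounds.

```python
def dynamic_toll(avg_speed_mph, base_rate=0.75, max_rate=10.0, target_mph=45.0):
    """Toy dynamic-toll rule (illustrative, not WSDOT's algorithm).

    When the express lane meets its speed target, charge the floor rate;
    otherwise scale the price linearly with the speed shortfall, capped
    at the corridor maximum.
    """
    if avg_speed_mph >= target_mph:
        return base_rate
    shortfall = (target_mph - avg_speed_mph) / target_mph  # 0..1
    return round(min(max_rate, base_rate + shortfall * (max_rate - base_rate)), 2)
```

For example, free-flowing traffic at 50 mph yields the $0.75 floor, while a lane crawling at 30 mph prices at a few dollars and a stopped lane hits the $10 cap.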


According to WSDOT, the objective was to reduce congestion by adding lanes and creating the ability for express lane drivers to reach speeds of at least 45 mph. The initiative is relatively new, so amendments and lane improvements have continued to be made since implementation began. Based on reviews, the creation of express lanes has left drivers concerned about the traffic that remains despite these renovations. This literature review is aimed at identifying the known pros and cons of the implementation of express toll lanes.

Express toll lanes are considered a beneficial addition to road improvements. According to WSDOT, there are four key reasons for the changes: efficiency, traffic congestion, demand management, and future improvements. The state of Washington saw success prior to the I-405 express toll implementation through the SR 167 HOT lanes between Auburn and Renton. Since the implementation of those HOT lanes in 2008, vehicles have exceeded an average of 50 miles per hour and drive time has decreased by approximately 6 minutes, at the cost of an average toll of $1.75.

Since the implementation of express toll lanes, Community Transit has reported a 6-minute savings at peak time with a 4% ridership gain. King County Transit reports an 8-minute savings with a 6% ridership gain. Toll payers are saving approximately 14 minutes, and traffic south of Bellevue is moving 7 minutes faster (Seattle Times). According to the Bellevue Reporter, as of May 10, 2016, King County Transit has seen a 10% increase in ridership.

Since the express toll lanes in this sector are relatively new, changes are still being made, and WSDOT is making an effort to see the project through (governor.wa.gov). Since December 18, 2015, six changes have already been implemented, including lengthened access points, skip stripes, and repaving for clarity. In addition, the algorithm for corridor calculations is being adjusted to gain a clearer understanding of I-405 traffic. Kate Elliot wrote a report for WSDOT acknowledging that the solution is not yet perfect but that improvements have been made. Trips between SR 522 and SR 527 are seeing slower speeds in the northern corridor and will be adjusted accordingly. Emily Pace, speaking on behalf of WSDOT, explained that because the roadway there narrows to only 3 lanes, this is a capacity issue rather than a toll issue (seattlepi.com); these have been noted as future projects to be considered. Otherwise, WSDOT has identified travel times improving by up to 6 minutes during peak hours when constituents are returning home, as identified by Figure 1.


While wait times have decreased, the financial gains aimed at funding future improvements have also been significant: over the first 3 months of the express toll lanes, $3.7 million was raised from the initiative. This is beneficial, as the money could go toward improvements to the NB I-405 section, which requires an estimated $5-$50 million depending on the timeline and environmental impacts (governor.wa.gov).

Although WSDOT data has shown evidence of improvement, suburban drivers noticed that congestion shifted from Kirkland to Bothell. The Bothell chokepoint spurred a petition of 29,000 signatures to repeal the toll lanes. Citing the failure to deliver an efficient I-405 toll setup, Republicans ousted Transportation Secretary Lynn Peterson from her duties (Seattle Times 405 Toll). Although this has been viewed as politically motivated, it does acknowledge that thousands of motorists do not agree with the change.

Contrary to WSDOT's analysis, independent assessments by INRIX found that the HOV lanes did in fact see faster speeds than previously, but the normal lanes suffered as a whole, leaving miserable conditions for the bulk of drivers. The report considered 4 sections of I-405 pre- and post-toll: "one each direction north of Bothell near the state Route 527 interchange, one northbound near the state Route 520 interchange, and one southbound near Northeast 68th Street in Kirkland" (seattlepi.com).

In addition to factual data, numerous constituents have taken their thoughts to blogging. Aside from the 3-lane bottleneck in the northern sector that is causing major backups, the Interstate 405 express lanes have hit their toll maximum of $10. Questions are being raised as to whether the toll corridor was properly designed. Tolls are adjusted to ensure a 45 mph minimum in the express lanes, in hopes of also alleviating the main lanes; by reaching the maximum charge, the 45 mph promise cannot be kept. Spokesman Ethan Bergerson confirmed this, reporting a savings of 16-24 minutes rather than the 30 minutes reported by WSDOT. Additional public resentment can be found in the growing petition to "Stop 405 Tolls," with over 19,000 supporters. Critics also note that funding for the new toll lanes came from the gas tax, meaning those utilizing the regular lanes are subsidizing the cost for those who can actually afford to use the new lanes.

Demand for the toll lanes is also indicated by the rise in prices. Bergerson stated that higher prices mean more constituents are willing to pay for the drive than previously thought. Looking at other express toll lanes, $11 tolls have been spotted in Atlanta, meaning the solution and the promise to uphold 45 mph may be hard to achieve (www.seattletimes.com). The project took seven months to complete and cost $484 million. Revenue will be around $7.6 million the first year, but after expenses this will leave $1.2 million in net income.


Methods

Early Stages

When the project and problem were initially presented, it was clear that contacting the Washington State Department of Transportation would need to be the first priority in order to acquire sufficient data. After some discussion, it was decided that a large amount of historical traffic volume data would be needed, and that the best place to find the data would be directly through the department of transportation, as it is all public information. Over the course of a few weeks, many emails were exchanged with employees at the department of transportation about how to extract the data that was needed. After talking to several employees, WSDOT directed us to download a program created by the department and to download traffic volume data from their database.

Data Extraction Stage

Once the program and the data files were downloaded, each team member began learning how the program was set up and how the most value could be extracted from the files that had been provided. Another major reason time was spent learning the program was so that each team member would have congruent data during the extraction phase, making the data merging and cleansing phase more efficient.

After much experimentation with the program, it was decided that the team should examine historical data for the entirety of I-405 from 2010 to the present day. From there it became clear that roughly 45 gigabytes of data (specific to I-405) would have to be extracted, equating to about 1.3 billion rows. After much discussion, the team reached a consensus that analyzing the entirety of I-405 might not yield the most accurate answers to the problem that needed to be solved. Instead, the team chose to examine the corridor from 1 mile before the toll lanes start to 1 mile after they end. This drastically reduced the amount of data that would have to be extracted and analyzed (roughly ¼ of the previous estimate), while producing valid answers in a more efficient manner.
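The scoping step above amounts to a milepost filter with a 1-mile buffer on each end of the tolled section. The sketch below illustrates the idea on hypothetical sensor records; the 13.0-30.0 milepost bounds are placeholders, not WSDOT's exact values.

```python
# Hypothetical sensor records: (milepost, direction, timestamp, speed_mph, volume)
records = [
    (5.2, "N", "2015-10-01 17:00", 58.0, 310),
    (12.4, "N", "2015-10-01 17:00", 31.5, 540),
    (20.1, "S", "2015-10-01 17:00", 24.0, 610),
    (33.0, "S", "2015-10-01 17:00", 55.0, 290),
]

# Keep only mileposts from 1 mile before the toll section to 1 mile after;
# these start/end mileposts are illustrative assumptions.
TOLL_START, TOLL_END = 13.0, 30.0
window = [r for r in records if TOLL_START - 1 <= r[0] <= TOLL_END + 1]
```

Applied to the full extract, a filter like this is what cut the workload to roughly a quarter of the original 45 GB estimate.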

Hypothesis/Question Stage

Although only one question/problem was presented to the team, it became clear very quickly that many subsets of questions would need to be answered to fully solve the overall problem. To accomplish this task, each team member was individually tasked with thinking of 5-10 questions/hypotheses. Once all team members had generated their questions, they were compiled into one document and a brainstorming session was held to discuss all of the questions/hypotheses and how well they could help solve the traffic congestion problem. After a new list of questions had been compiled, each team member rated every question on a scale of 1-5 on both the complexity of the problem (1 being easy to solve, 5 being incredibly difficult to solve) and the value or insight that could be derived from answering it (1 being little to no value, 5 being great value). This process was repeated twice, each time discussing the new results. The reasoning for repeating the process was that none of the team members had any prior background in traffic science, and the only way to sufficiently solve the problem was to generate as many insights as possible. The team finally decided on twelve questions and hypotheses that needed to be answered to successfully address the problem; the other questions were not discarded, but the team felt they were not as substantial or critical to the project's success. Over the course of the project, the team recognized that a lack of time and resources would not allow for answering all twelve questions, so they were condensed into two questions/hypotheses: Is there an observable difference in traffic speeds before and after the toll lanes were constructed, and how would traffic behave were the toll lanes removed?
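The complexity/value rating scheme described above can be sketched as a simple scoring function. The question texts, ratings, and the value-minus-complexity weighting below are all illustrative assumptions; the report does not specify how the two scores were combined.

```python
# Hypothetical ratings: question -> list of (complexity, value) pairs,
# one pair per team member, each on the 1-5 scale from the text.
ratings = {
    "Speed difference pre/post toll?": [(2, 5), (3, 5), (2, 4), (3, 5)],
    "Effect of removing toll lanes?":  [(4, 5), (5, 5), (4, 4), (4, 5)],
    "Toll-rate algorithm?":            [(5, 2), (5, 3), (4, 2), (5, 2)],
}

def score(pairs):
    """Average value minus average complexity: prefer high-value questions
    the team can feasibly answer (this weighting is an assumption)."""
    comp = sum(c for c, _ in pairs) / len(pairs)
    val = sum(v for _, v in pairs) / len(pairs)
    return val - comp

# Highest-scoring questions first; the short list is the top of this ranking.
ranked = sorted(ratings, key=lambda q: score(ratings[q]), reverse=True)
```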

Selecting Analysis Tools

Because most of the team had little to no experience working with data sets of this size, the project mentor, David McClellan, was consulted for insight as to which data analysis tools would work best for a data set of this size. Fortunately for the team, David was very informative about which analysis tools and software would be best fit to handle both the scale and complexity of the data.

The first tool that was recommended was a virtual machine, or VM, which is a computer that can be logged in to remotely from any computer connected to the Internet. The reason many data analysts use VMs is that they have superior memory and processing power in comparison to the average home computer or laptop, which greatly reduces the time it takes to run large experiments or data sets. Once access to the VM had been received, team members began loading industry-standard data tools such as Microsoft SQL Server, RStudio, Excel, and Data Tools 2013 onto the VM. Once the programs were loaded, the next step was to figure out how to quickly and efficiently upload the data to the VM. David suggested a program called CloudXplorer as a starting point because it is free software that everyone on the team could access, while also having exceptional transfer speeds for moving the raw data files.

During the data-cleansing phase, David McClellan offered many suggestions of programs that could be used to efficiently and correctly cleanse the data. Since all of the data was extracted into Excel spreadsheets, writing a VBA macro that eliminated missing values and bad data and consolidated all of the files into the same format was the most effective way to ensure all of our data was cleansed correctly. Many other programs could have been used to cleanse the data, such as RStudio, but every team member had a good level of experience working with Excel and writing VBA code. Because the macro had to sift through roughly 550k rows of data per spreadsheet file (total of __ files), all of the files had to be cleansed on the VM because of its ability to process large amounts of data quickly.
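The cleansing logic the macro implemented (drop missing values and bad readings, normalise every row to one format) can be sketched in a few lines. This is a Python stand-in for the team's VBA macro, and the field names and validity thresholds are assumptions for illustration.

```python
def cleanse(rows):
    """Sketch of the cleansing pass: drop rows with missing fields or
    implausible sensor values, and normalise every surviving row to the
    same (milepost, speed, volume) shape."""
    cleaned = []
    for row in rows:
        milepost, speed, volume = row.get("mp"), row.get("speed"), row.get("vol")
        if None in (milepost, speed, volume):
            continue                        # missing value
        if not (0 < speed <= 100) or volume < 0:
            continue                        # bad sensor reading (thresholds assumed)
        cleaned.append((float(milepost), float(speed), int(volume)))
    return cleaned

# Hypothetical raw rows as they might come out of one spreadsheet.
raw = [
    {"mp": 14.2, "speed": 31.0, "vol": 520},
    {"mp": 14.2, "speed": None, "vol": 510},   # missing speed
    {"mp": 14.2, "speed": 180.0, "vol": 495},  # implausible speed
]
```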

Data Analysis Stage

Once the data was aggregated, cleansed, and uploaded to the data warehouse, the team was ready to begin the analysis phase. Before starting, a team meeting was held with David to discuss how to efficiently and effectively analyze the data. After some suggestions and discussion, the team agreed that it was necessary to build a model to process and generate insight from the data. Per David's suggestion, Azure Machine Learning (AML) was chosen to build the model.

Prior to David's recommendation, none of the team members had any experience using AML. Each team member spent an extensive amount of time learning how to properly build models in AML, along with deciding what type of statistical analysis would be conducted on the data. Based on research and several recommendations from David, basic linear and boosted decision tree regression models were used to determine which variables were the largest contributors to congestion on I-405. In addition to experimenting with different types of statistical analyses, team members also wrote queries in SQL Server that were then copied over to AML to build models for each milepost. The reason for creating specific data sets was to analyze different traffic factors, such as how on/off ramps affected the overall flow of traffic and congestion; other factors such as time of day, location of the milepost, and direction of the freeway (northbound and southbound) were also analyzed.

Once the models were created, each model was scored on its coefficient of determination (CoD), the statistical measure also known as R-squared, which shows how closely the data fits the regression line and how much of the variation is explained. Once all of the CoDs were collected, the team decided to focus on the 5 mileposts with the highest CoDs. After selecting the mileposts, the team focused on analyzing the traffic data from 3 PM to 6 PM (peak rush hours), as it provided a smaller and more focused data set for the problem to be solved.
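The per-milepost scoring and selection step can be illustrated with a small computation. A simple least-squares fit stands in here for the AML linear and boosted-tree models, and the per-milepost (volume, speed) observations are hypothetical.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = my - b * mx                    # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical per-milepost (volume, speed) observations.
mileposts = {
    13.5: ([400, 500, 600, 700], [45, 40, 34, 28]),   # strong linear relation
    20.0: ([400, 500, 600, 700], [45, 20, 44, 21]),   # noisy relation
}

# Rank mileposts by CoD; the team kept the top 5 of its full list.
ranked = sorted(mileposts, key=lambda mp: r_squared(*mileposts[mp]), reverse=True)
```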

After examining the data provided by the models, the team used Excel to derive insights and determine whether congestion had worsened since the completion of the corridor expansion. Wade and Torey created a comprehensive Excel workbook that incorporated an Azure Machine Learning add-on, which allowed the team to model several predictive scenarios; the workbook was also built with a simple user interface so that any end user could easily use it.

In order to make the workbook and model predictive, traffic data from January 2016 through March 2016 was fed into the workbook to create scenarios such as changing all lanes to non-toll lanes, 1 toll lane and 4 non-toll lanes, only having tolls enabled during non-peak hours, etc. Once the team had derived the desired results from the data, a combination of Excel graphs and Power BI was used to visualize the results, as they provided a simple way to show others the effect of the toll lanes since their completion.
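The scenario loop the workbook ran can be sketched as calling a trained model once per lane configuration. The real workbook invoked an AML web service; the toy predictor below is entirely a stand-in, and its capacity formula, coefficients, and scenario labels are invented for illustration.

```python
def predict_avg_speed(toll_lanes, general_lanes, hour):
    """Toy stand-in for the trained AML model (not the team's actual model)."""
    # Assume toll lanes carry fewer vehicles, so they add less usable capacity.
    capacity = general_lanes + 0.4 * toll_lanes
    # Heavier congestion penalty during the 3-6 PM peak studied in the report.
    penalty = 8.0 if 15 <= hour <= 18 else 2.0
    return round(60.0 - penalty * (5.0 / capacity), 1)

# Hypothetical lane configurations mirroring the scenarios named in the text.
scenarios = {
    "current (1 toll + 4 general)": (1, 4),
    "all general lanes":            (0, 5),
    "tolls off-peak only":          (0, 5),   # at 5 PM, toll lanes act as general
}
results = {name: predict_avg_speed(t, g, hour=17) for name, (t, g) in scenarios.items()}
```

Looping hypothetical configurations through one predictor like this is how a what-if comparison (for example, removing the toll lanes entirely) can be produced from a single trained model.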

Findings

Analysis was conducted in two ways: further exploration of the raw data, and predictive analysis using the AML models that were developed. As a result, multiple hypothesis questions originally developed to help guide the analysis were addressed, including "Will removing the toll lanes have any effect on throughput?", "Is there an observable/significant difference in traffic speed before and after toll lane construction began?", and "How has the flow of traffic changed in the main lanes?" Ultimately, the findings of this analysis served as the foundation for additional recommendations for improving congestion on the I-405 corridor.

Using the raw data set, peak congestion hours were identified using aggregated Average Calculated Speed and Time in both directions for each milepost on I-405 during the months of October, November, and December. As expected, this analysis showed the highest congestion levels at 5:00 PM, with slowdowns beginning at 1:00 PM and lasting until after 7:00 PM. During the slowest hours of the day, vehicles were traveling at speeds of less than 25 mph on average. Slower speeds during these hours are explained by the increased volume of commuters on I-405, resulting in heavy congestion.
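Identifying the peak hour from the raw data is a group-and-average operation: pool the speed readings by hour and take the hour with the lowest average. The readings below are hypothetical stand-ins for the aggregated WSDOT data.

```python
from collections import defaultdict

# Hypothetical (hour-of-day, speed_mph) readings pooled across mileposts and days.
readings = [(13, 44), (13, 46), (17, 22), (17, 24), (19, 50), (19, 52)]

by_hour = defaultdict(list)
for hour, speed in readings:
    by_hour[hour].append(speed)

# Average Calculated Speed per hour; the slowest hour marks peak congestion.
avg_speed = {h: sum(v) / len(v) for h, v in by_hour.items()}
peak_hour = min(avg_speed, key=avg_speed.get)
```

On the real data set this same aggregation surfaced 5:00 PM as the most congested hour.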


Based on the findings regarding peak congestion, the analysis was scoped to only include data between the hours of 3:00 PM and 6:00 PM. Again, filters were applied to only show data in October, November, and December for all mileposts in each direction of I-405. Since the toll lanes are open to all drivers on weekends, filters were applied to only show data for normal lanes during the week. In this portion of the analysis, Average Calculated Speed and Average Volume were measured for dates before and after implementation of the toll lanes. In this case, findings suggested that commutes before the addition of the toll lanes were worse than they are now.

According to the data, the addition of the toll lanes has decreased the average number of drivers on the road and increased the average speed those drivers are traveling by almost 4 mph. This suggests that traffic behavior has improved after the addition of toll lanes; however, including the toll lanes in the analysis proved to skew the results. It was determined that a more significant analysis would measure the impact of toll lanes on the average commuter.

In this stage of the analysis, additional filters were

added so that the data only included normal lanes.

In this case, the data told a very different story, displaying a drop in the number of drivers in normal lanes after the toll lanes were added along with a decrease in speed of almost 2 mph on average. While it may be argued that 2 mph is hard to notice while on the road, pair that decrease in speed with a decrease in drivers and the result is a decrease in throughput and much slower commute times. Further, this data shows that at any given hour from 3:00 to 6:00 PM, the average number of vehicles in any of the normal lanes is less than it was before the tolls were applied. By the definition of throughput, if all other factors remain constant, fewer vehicles on the road should allow those vehicles to travel at faster speeds; the findings of this analysis, however, show the opposite.
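The before-and-after comparison described in this section can be sketched as follows. This is an illustrative sketch, not the team's actual pipeline: the record layout is hypothetical, and the toll-lane opening date used as the cutoff (September 27, 2015) is an assumption stated here for the example.

```python
from datetime import datetime
from statistics import mean

# Assumed cutoff: the opening date of the I-405 express toll lanes.
TOLL_START = datetime(2015, 9, 27)

def scoped(records):
    """Keep weekday observations between 3:00 PM and 6:00 PM in the
    regular (non-toll) lanes, mirroring the filters described above.
    Each record is a (timestamp, lane_type, speed_mph) tuple."""
    return [r for r in records
            if r[0].weekday() < 5          # Monday-Friday only
            and 15 <= r[0].hour < 18       # 3:00 PM to 6:00 PM
            and r[1] == "regular"]         # exclude toll lanes

def before_after_speed(records):
    """Mean speed in the scoped window before vs. after the toll lanes
    opened; a drop in the second value indicates slower commutes."""
    rows = scoped(records)
    before = [s for t, _, s in rows if t < TOLL_START]
    after = [s for t, _, s in rows if t >= TOLL_START]
    return mean(before), mean(after)
```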


Although it is important to note that drivers who choose to pay a premium to use the I-405 toll lanes will receive much faster travel times, it is more meaningful to demonstrate the negative impact the toll lanes have had on everyday travelers who use the normal lanes. As a result of the addition of toll lanes, throughput has decreased for the thousands of drivers that use standard lanes on I-405.

The second phase of analysis for this project focused on using AML predictive models to simulate different

scenarios on the road. This process involved developing predictive models to help estimate traffic volumes and

speeds based upon data inserted into the model. Once these models were completed, the team moved

forward with an analysis to simulate opening up the express toll lanes. Using the diagram below, the goal was to simulate opening up an express lane on I-405 and treating it as a regular lane. On the left of the diagram is the current system in place for milepost 12.72 (south Bellevue). The green columns represent regular traffic lanes and the yellowish-orange column is an express toll lane. Using the predictive model developed for milepost 12.72, traffic was then estimated assuming that the express toll lane was treated as a regular lane (right side of diagram).

Utilizing data from January to March of 2016, raw

traffic data was adjusted to simulate the scenario

above and then a comparison was made. This

comparison can be viewed in the chart below, which provides the estimated change in the number of vehicles that can travel through this milepost over the course of any given hour. For example, at 12:00 noon on Mondays, these results demonstrate that 183 more vehicles can travel through the 12.72 milepost if the express lanes were open. This indicates that opening the express lanes would help increase vehicle throughput for this milepost during that time and day. With that in mind, each number highlighted green in the chart represents an increase in throughput assuming the express lanes were open. This analysis therefore demonstrates that throughout most of the day, traffic sees an increase in throughput should the express lanes be opened and treated as regular lanes.

Estimated change in hourly vehicle throughput at milepost 12.72 if the express lane were open:

Day         12 PM  1 PM  2 PM  3 PM  4 PM  5 PM  6 PM  7 PM  8 PM  9 PM  10 PM
Monday        183   157   135    36   -25    -8   101   117    92    74     65
Tuesday       184   179   105    12   -71   -71    50    97    99   105     81
Wednesday     185   169    81   -27   -55   -51    30    91   113    99     72
Thursday      167   153    36   -41   -44   -41     8    81    94   104     81
Friday        156    87   -17   -11    -9    -2    50    75    84    94     96
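At bottom, this chart is a per-cell difference between two sets of model outputs. A minimal sketch of that comparison step follows; the predictions themselves came from the AML models, and the sample values in the usage below are hypothetical.

```python
def throughput_change(predicted_open, predicted_current):
    """For each (day, hour) key, the change in vehicles per hour if the
    express toll lane were treated as a regular lane. Positive values
    (highlighted green in the chart) mean an increase in throughput."""
    return {key: predicted_open[key] - predicted_current[key]
            for key in predicted_current}

def hours_with_gain(changes):
    """The (day, hour) cells where opening the lane helps."""
    return sorted(k for k, v in changes.items() if v > 0)
```

For example, if the model predicted 1,183 vehicles per hour with the lane open versus 1,000 under the current configuration for Monday at noon, the cell value would be 183.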

Conclusion


Based upon the action taken by the WSDOT, average drivers commuting across the I-405 corridor have

experienced an increase in traffic congestion since the implementation of the express toll lanes. Commuters

who cannot afford or choose not to participate in the express lanes are experiencing an average decrease in driving speed of 1.5 miles per hour, and on average 49.21 fewer vehicles travel through various mileposts during peak congestion times. This suggests that the WSDOT has actually harmed the experience of commuters driving along I-405, despite having set out to help them. It appears that

the balance between generating revenue for the I-405 corridor versus helping decrease traffic congestion has

leaned towards generating revenues for the WSDOT. This is further supported by the WSDOT claiming they

have received revenues higher than expected since the opening of the express lanes. Therefore, the WSDOT

must reconsider how they intend to manage the I-405 if their number one goal is to help commuters by

decreasing congestion on the I-405 corridor.

Recommendations

Based upon our findings, the team has developed three recommendations for the WSDOT with the goal of

helping to decrease traffic congestion on the I-405 corridor. These recommendations are complementary rather than mutually exclusive, but should be prioritized in the order described below.

1. Minimum Four Regular Lanes

Along the I-405 corridor there are up to five lanes at a time across the road, ranging from three regular lanes and one express lane to four regular lanes and one express lane. It is recommended that the WSDOT maintain a minimum of four regular lanes along the I-405 corridor. This would involve opening up one express lane wherever there are two express lanes and converting it into a regular lane. As shown by our earlier analysis, converting an express lane into a regular lane increases the road's overall vehicle throughput. This approach also allows the WSDOT to maintain at least one express lane reserved for carpooling and drivers who pay the toll.

2. Open Express Lanes During Non-Peak Hours

Throughout most of the day, the express toll lanes have little impact on congestion along the I-405 corridor, and minimal tolls are charged to use these lanes. It is therefore recommended that the WSDOT open all toll lanes as regular lanes during non-peak hours of congestion. This would allow for up to two additional regular lanes on the road


and, in return, help drive down congestion. Similar to the previous recommendation, our analysis shows that if the toll lanes are opened to the public during non-peak hours, traffic volume throughput increases. If managed well, opening these lanes can help curb the buildup of congestion before and after peak hours.

3. Construct New Lanes

Assuming either that the WSDOT will not convert or open the express lanes or that opening the lanes would not decrease congestion to a large enough extent, the WSDOT should consider investing in the construction of additional lanes along the I-405 corridor. A large reason the toll lanes were constructed was to help the WSDOT raise funds for additional changes to the I-405 road. The WSDOT has already stated that it is receiving revenues higher than anticipated. It would therefore make sense to reinvest these funds back into I-405 to help alleviate congestion, given the lack of relief the express lanes have provided. However, this recommendation is a last resort, as the public may resent additional lanes being built given that past construction resulted in little improvement.

Overall these recommendations were developed to help the WSDOT make informed decisions regarding the I-

405 corridor. While each recommendation will benefit the average driver in the long run, recommendations

one and two should be considered before recommendation three is approached, due to the significant costs of recommendation three. Additional in-depth analysis of areas beyond the physical locations of the express lanes is also encouraged. If the WSDOT manages to greatly decrease congestion along the road, new bottlenecks could emerge where the express lanes are not in place. There could also be new bottlenecks near on- and off-ramps that are currently without issue because of the congestion levels farther away from their location. Regardless, these recommendations are in the best interest of the WSDOT as it works to solve the congestion problem on I-405.

Relation to Course Material

To accomplish this project, the team was required to call upon many of the skills, techniques, and concepts

taught over this last year of study. More specifically, the team had to rely on the numerous hard skills used for managing and analyzing data. Most of these hard skills were taught in Marketing Analysis 555, Management Information Systems 593, and Management Operations 557. In addition, this project demanded a strong understanding of statistics and regression modeling, all of which were discussed in BA###. Furthermore, social media analytics was utilized to explore where current I-405 consumers stand with regard to the traffic toll lanes. Within this section of the paper, the major concepts used throughout the project


which were covered within our course material are explained. These concepts included the hard skills and

conceptual material used for data management, data analysis and statistics, and social media analytics.

Due to the nature of this project, the work required heavy technical experience from its members, focusing on deriving valuable information from large datasets. Across both the Marketing Analysis 555 and Management Information Systems 593 courses, the team had to rely on techniques taught for data management and manipulation. In this case, these courses taught skills related to SQL Server and its many attached products (such as SQL Data Tools and SQL Server Integration Services). For the majority of the team, exposure to these

technologies began this year in courses that covered major purposes of each software application and how to

approach problems for data management. A simple example is using SQL Server to store and manipulate data

for integration across three separate systems. The team had to develop processes which moved data from Excel files to a local SQL Server and then from that local SQL Server to a cloud-based solution. Within these two processes, the data was also aggregated and cleansed to ensure that only the necessary data was moved

across each system. These skills were all taught over the course of the year, focusing on SQL Server

management in the first semester followed by learning SQL Server Integration Services and Data Tools the

latter half of the program. Beyond these hard skills for data management, the team was also exposed to the value of Excel's Visual Basic coding functionality. Excel's VBA technology provided a resource for data cleansing which allowed the team to prepare over 1,300 data files for loading into SQL Server. Without this material from Management Operations 593, the entire project would have struggled to move forward due to the manual work that would have been required. Instead, the team was able to automate data cleansing by developing VBA processes that looped through the files and standardized the data across all 1,300 files.

Once the team had prepared data for analysis, the next steps focused on statistical and data analysis

technologies and methodologies. These materials were taught across all courses, focusing both on the hard skills required for this form of work and on the thought processes required when approaching complex business problems. In the case of hard skills, each member of the team was required to have a foundation in statistics and regression models. These concepts were taught heavily within our BA516 course, where proper

procedures for collecting and analyzing variables for the purpose of regression were taught. For example,

when trying to determine what could impact traffic volumes and speed on I-405, it was important to account both for external variables such as weather, sporting events, and time of year, and for how this data was structured for our statistical analysis. The team utilized binary variables and data transformations to help scope down variables and provide specific answers, such as whether an individual weekday had an impact versus all days across the week. Although a subtle difference, this type of data transformation was essential to accomplishing the data analysis required for the project.
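The weekday example above can be sketched as a standard dummy-variable encoding. The column names below are hypothetical; the idea is that one category (here Sunday) is omitted as the regression baseline.

```python
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def weekday_dummies(day):
    """Encode a day of week as six binary (0/1) variables, with Sunday
    as the omitted baseline. A regression fit on these columns can test
    whether an individual weekday has an effect, rather than treating
    all days of the week alike."""
    if day not in WEEKDAYS:
        raise ValueError(f"unknown day: {day}")
    return {f"is_{d}": int(day == d) for d in WEEKDAYS[:-1]}
```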

The final topic to cover is social media analytics, which allowed the team to understand where consumers in the market currently stand. This material focused on how to collect and analyze data from Twitter to understand consumers.

Utilized Technologies

To complete this project, the team was required to work with a variety of applications and coding languages

which supported data collection, management, and analysis. Within this section, each technology will be

briefly summarized; including its general purpose and a short description of how the team drew value from its

use.

Compact Disc Data Retrieval (CDR)

The Compact Disc Data Retrieval (CDR) tool was provided by WSDOT as a software application which allows users to extract data from DAT files in the form of text and Excel output files. The DAT files themselves were also provided by WSDOT and were compressed for ease of download and access by users. The CDR would then take these files and output an Excel spreadsheet based upon user requirements

entered into the program. These requirements would then help scope the data (date range, granularity, road,

milepost, etc.) as well as define the output file’s format and data structure. These two features allowed the

team to then standardize its data collection resulting in a uniform data structure for data cleansing. Overall,

CDR was used as our team’s main source of data collection from the WSDOT.

Excel

Throughout the project, Microsoft Office’s Excel platform was leveraged to assist with project management,

data cleansing, data analysis, and more. In order to effectively manage the team's time and efforts, Excel was used to analyze and scope out data collection for various members of the team. For example, I-405 has numerous mileposts used to help analyze traffic, and each milepost has various elements of analysis provided to users. Due to the vast quantity of mileposts and elements, Excel provided the team a method of organizing mileposts and their subgroups to help divide work reasonably amongst team members. Excel's data analysis tools were also used for initial analysis, focusing on regression and correlation tests on the WSDOT data and numerous external variables. Regarding data cleansing, the team wrote VBA processes to loop through numerous Excel files and clear summary data, add column headings, and filter and delete irrelevant or invalid


data points. In addition, VBA was used to loop through and test files to ensure they were prepared for various

stages of the cleansing processes. Each cleansing run often took anywhere from a couple of hours to more than eight hours of processing. In the end, Excel provided a platform for data cleansing and management while also

ensuring quality of work for file management.

R Programming

R is a statistical programming language often used by data scientists, mathematicians, researchers, and many others to manage, cleanse, and analyze data. Within our project, the team utilized R for initial data analysis, file validation and management, and social media analytics. At the start of the project, the team ran

initial regression models in R to help drive the project forward, exploring what various factors could have

impact on traffic in the Seattle area. In addition, R was used to help with file validation and organization during

the data cleansing process. When running Excel cleansing processes, R would run in parallel to help speed

along the process. In this case R was used to loop through files and verify the data had been cleansed

properly. Once a file was cleansed it would then be converted into CSV (comma separated values) format and

organized based upon the file type (traffic volume or speed). This CSV format would then allow computers to

process files at a higher rate than normal Excel files. Lastly, R was used to help scrape data from Twitter to

provide social media data regarding what citizens think about the current conditions on the road. In the end, R

provided an additional tool to help manage and process data for the team’s variety of purposes.

SQL Server

SQL Server is a data storage and management software suite provided by the Microsoft Corporation. This software is used as a server and database management toolset which provides many sub-products to help with data management, processing, and analysis. For this project specifically, the primary applications used were SQL Server Management Studio (SSMS), SQL Server Integration Services (SSIS), and SQL Server Business Intelligence Data Tools 2013 (Data Tools). SSMS was used to develop staging databases for data cleansing and aggregation. Once data had been cleansed through Excel and R, it was loaded from files into the SQL Server databases the team had developed. These data loading processes were completed through SSIS, which helps manage data extraction, transformation, and loading. In essence, once data was cleansed, the team utilized SSIS to move data from files into SQL Server. SSMS was then used to aggregate data before it was ready for movement into the cloud. Data Tools helped manage Azure's SQL Data Warehouse, allowing the team to develop data tables and views of the data.

Microsoft Azure


Microsoft Azure is a cloud-based software-as-a-service platform which provides a variety of applications to businesses, including online servers and databases, machine learning technologies, data visualization platforms, and more. In our project, Azure was used for two major purposes: its SQL Data Warehouse service and its machine learning capabilities. The SQL Data Warehouse was used to store data in the cloud and provide a data source for Azure Machine Learning. To clarify, Azure Machine Learning is an analytical platform provided by Microsoft, and it requires access to data for its use. This data was stored in

Azure SQL Data Warehouse. From this point, Azure Machine Learning was leveraged as a streamlined tool for

data science. This tool provided the team with mathematical and computer science algorithms that are used

to understand trends and behaviors within large data sets. To summarize, Azure was used as a cloud based

storage and analytical platform once data cleansing and aggregation had been completed.

CloudApp VM

CloudApp VM was a service the team utilized to access a virtual machine in the cloud. This virtual machine

(VM) acted as an online computer with higher volumes of memory, storage, and processing power than any of

the personal machines the team had. Using this VM, the team would log in through the internet and run its data cleansing and management processes on the VM alone, because the VM could complete these processes in far less time than any local machine. Overall, the CloudApp VM was leveraged to manage datasets too large for our local resources to handle.

Cloudxplorer

Cloudxplorer is an online file sharing software which allows users to save data and files in the cloud. These files

are then available to any users who have been given access to upload, view, and download files from the tool.

The team used this software to centralize its resources and transfer data to and from the CloudApp VM in an efficient manner.

Power BI

Power BI is data visualization software provided by Microsoft. Similar to other products on the market (Tableau, MicroStrategy, D3.js, etc.), this tool allows users to feed in data from various sources and develop

visualizations (charts, tables, graphs, etc.) to help explain the data. We utilized this technology to develop

several visualizations which helped explain traffic behavior over various periods of time.

Work Allotment


This project was accomplished by individuals from different professional backgrounds and experiences, and

below are descriptions of how each member contributed to the completion of this project.

Torey Bearly

When the project team was being put together, Torey Bearly was brought onboard to act as both the project

lead and technical lead. At a high level, Torey’s job was to delegate work and drive forward the project’s

technical work. This involved identifying the necessary processes and technologies required to accomplish the

project’s work; while communicating with David McClellan to help with clearing up ambiguity throughout the

project.

At the beginning of the project, Torey communicated with the WSDOT to help identify the required data that

the team would need. He also directed the team towards other potential datasets which were collected as a

means to help with project analysis. From this point, Torey learned how to use WSDOT’s CDR tool in order to

extract traffic data from the WSDOT. He then taught his fellow teammates and handed off sections of the data

extraction to various members of the team (Wade and Andrew). While this work was being accomplished,

Torey then developed several coding scripts to help cleanse and standardize the data in preparation for aggregation and integration into the cloud. Torey worked with Andrew to help cleanse data in batches on the

team’s Virtual Machine. Alongside this work, Torey communicated and supported Wade as Wade worked with

Twitter and R Studio to extract social media for potential analysis. Once these tasks were complete, the team

moved into the data integration and aggregation phase.

Torey then worked in SQL Server Integration Services to develop Extract, Transform, and Load (ETL) packages to integrate data into a SQL Server located on the team's virtual machine. Torey then developed SQL views which aggregated the data based upon requirements for analysis. After this was accomplished, Torey moved data from the virtual machine into the team's Azure SQL Data Warehouse, proceeding to learn and teach the

team how to interact with Azure through SQL Server Data Tools. Once data was in Azure, Torey supported and

developed various Azure Machine Learning models alongside each team member for the purposes of analysis.

Finally, Torey worked with the team to help make decisions regarding the analysis, ending the project by summarizing our findings and conclusions.

Andrew Kealoha

Each team member was proficient in different areas, which helped the overall success of the project. Andrew, for instance, helped with many parts of the project, from data extraction all the way to analysis. During the


beginning phases of the project Andrew did research on the I-405 corridor to determine which areas of the

freeway would yield the best results for the analysis portion of the project. Once he had determined which

areas of the corridor the team should focus on, he, along with the rest of the group, began extracting his portion of data from the CDR program provided by WSDOT.

Once the data extraction was completed, Andrew helped reduce the workload for Torey by assisting with the data cleansing process on the VM (virtual machine). He researched different options for cleansing the data with R, SSIS, and Excel. Because the files were so large and in Excel format, Torey and the team felt it best to cleanse the data using VBA code. The files took upwards of an hour for each load to process, which required constant monitoring of the VM to ensure that the files were being cleansed correctly while also starting the next batch of files.

Once the data cleansing was complete the analysis phase of the data began which involved using Azure

Machine Learning (AML). Because some of the team members were unavailable during the initial portions of the analysis phase, Andrew spent time researching and learning AML while also building a useful model with the project sponsor, David McClellan. After the model was built, Andrew explained to the team how the model worked and how to interpret the results it derived from the data sets. Once AML provided results from the data sets, Andrew experimented with Power BI to determine how best to visualize the insights derived from the data in a simple manner that could be presented to a group of individuals with no background on the problem being solved.

Finally, once the data had been collected and analyzed, Andrew, with the help of Justin, utilized Slide Bean to

create the final presentation. To create the presentation, Andrew took various screenshots of the area and

problem the team focused on, the programs utilized, and multiple graphs displaying the results of the project.

After the presentation was completed, Andrew helped proofread and edit the paper with the rest of the team.

Justin Rath

For the Washington State Department of Transportation project, Justin was committed mainly on the written

portion and presentation slide deck, taking on a support and advisory role for this capstone. When brought on,

the expectation was for Justin to ensure completion of the paper while assigning parts to each team member with Torey Bearly's approval. In addition to managing the Capstone Final Paper, Justin was involved in all meetings from the time he joined the team. If he was unable to make a meeting, Justin followed up the same day or the day after with Torey Bearly to get up to date. These meetings also included utilizing


Microsoft Azure to get familiar with the team's activities and provide insight. He also helped with any small tasks and provided ideas when requested. For the capstone paper, Justin was committed to reviewing and

researching all information pertinent to the project to develop the Literature Review. He also aided in editing

the entire paper, reviewing each member’s work. For the presentation, Justin organized the slide deck to

appropriately demonstrate the work done by the team. He utilized Slide Bean, an online application that

creates templates for professional presentations.

Wade Rogerson

In the beginning of this project, the team was focused on extracting data from CDR (Compact Disc Data Retrieval), software provided by the WSDOT. This tool allowed the team to extract detailed information for

each milepost along I-405, but required numerous reports to pull the necessary data. Wade was asked to take

on a portion of the data and work in CDR to pull reports for assigned locations. Before any data could be

collected, it was crucial that the original files that would be used in CDR were organized in a readable format.

Wade organized the files that were needed to pull data in CDR and loaded the files onto an external hard drive

to be shared by other members of the team. Data collected by Wade and each member of the team required

over 500 individual reports.

After data had been collected, substantial time was spent cleansing and preparing the data for analysis. Once

the data had been transformed into a consistent and usable format it was ready to be used for intensive

modeling. Wade reached out to our project sponsor David McClellan and scheduled a working session where

the team was trained to use Microsoft’s Azure Machine Learning. Wade also explored various possibilities with

Azure and became comfortable building simple models with sample data provided by Microsoft. Ultimately,

one model was used to work with data sets for 25 different mileposts along I-405. As a result of the training

provided, Wade was able to work with each model and adjust the filters accordingly. Further, he analyzed the

statistical results provided for each milepost and ultimately selected the mileposts that would provide the

strongest analytical representation. Wade then utilized another Microsoft application, Power BI, to display the accuracy of our model in an effective manner.

Another way Wade impacted this project was through his efforts in building a Twitter Scraper that was used to

collect public opinion regarding the I-405 tolls. Wade spent many hours educating himself on R and different

ways to pull data from Twitter. Wade created a Twitter application profile specific to this project and found an

R script that was able to reference that application and pull tweets based on specified criteria. These tweets allowed the team to speak on behalf of the general public without applying any bias of their own.


Lastly, Wade played a large role in the analysis portion of this project through his ability to aggregate and filter

data in a way that supported recommendations for improving the I-405 corridor. As a result of his analysis,

Wade was able to uncover the true impact of adding toll lanes to I-405 on everyday commuters.

