I-405 Corridor Analysis
Washington State University Data Analytics focus MBA Program
I-405 Corridor Analysis
Project Sponsor – Neal Analytics
Torey Bearly, Andrew Kealoha, Justin Rath, & Wade Rogerson
7-31-2016
Acknowledgement
We would like to thank several individuals for their help and contributions to this project. We could not have
accomplished this project without their help, guidance, and information. We have listed these individuals
below and provided a brief description of how they supported the team.
David McClellan (WSU MBA Alumnus, 2014) - Neal Analytics
David McClellan was the project sponsor and provided the team with invaluable guidance, numerous
resources, and technological training along the way.
Joe St. Charles, John Anderson & Jeremy Bertrand - Washington State Department of Transportation
These individuals from the Washington State Department of Transportation helped supply the
team with the data necessary to analyze Interstate 405 and complete our analysis.
Mauricio Featherman – Washington State University MBA Director
Dr. Featherman played a vital role in the birth of this project and helped the team get on its feet
while also providing support and guidance along the way.
Table of Contents
Introduction
Problem
Goal
Hypothesis
Literature Review
Methods
    Early Stages
    Data Extraction Stage
    Hypothesis/Question Stage
    Selecting Analysis Tools
    Data Analysis Stage
Findings
Conclusion
Recommendations
Relation to Course Material
Utilized Technologies
Work Allotment
Resources
Introduction
Time is arguably one of the most valuable resources an individual can control over their lifetime, so it seems
obvious that people often get upset when they feel their time is wasting away. This is certainly true when it
comes to sitting in traffic: waiting helplessly through mind-numbing congestion to arrive somewhere just
minutes away. This project was put together in the hope of addressing that issue and helping individuals
across the Seattle metropolitan area save their most valuable resource, time. Serving as our capstone project
for the Washington State University Data Analytics Focused MBA program, the project was provided by David
McClellan of Neal Analytics, an advanced business and data analytics firm in the Seattle-Bellevue area. David
is an alumnus of the program and hoped to provide a valuable project that challenged its members to solve
complex problems with data. With this in mind, our team signed on to the project hoping to learn, analyze,
and provide valuable insights into traffic behavior and congestion on Interstate 405, which has drawn
increasing complaints from local citizens and residents of the area. This report includes an introduction to the
problem, a discussion of the hypothesis, a review of the technologies and methods utilized, and an overview
of our team's findings, conclusion, and recommendations.
Problem
Over the last several years, the Washington State Department of Transportation (WSDOT) has been tasked
with improving traffic control in the Puget Sound region, specifically in the Seattle-Tacoma-Bellevue
Metropolitan Area. In 2015, the estimated population in this area was 3,733,580, nearly half of the
population of the entire state. Seattle has become one of the fastest growing cities in the United States, and
the impact of an increasing population has been felt by local commuters. According to the TomTom
navigation company, Seattle ranked 5th among cities in the United States for worst traffic congestion in 2015,
with drivers in the Seattle-Tacoma-Bellevue Metropolitan Area spending an average of 89 hours delayed in
traffic jams. Commutes that require only 30 minutes when driving the speed limit now average 53 minutes.
Ultimately, local interstates have been forced to handle an increasing number of commuters while road
capacity has remained unchanged or worsened as a result of ongoing construction projects. Additional
pressure from local residents and media outlets has pushed WSDOT to find a solution that will resolve traffic
congestion sooner rather than later.
After several years of development, new toll lanes opened in both directions on Interstate 405 on September
27, 2015. The 30-mile freeway known as I-405 takes travelers east of Lake Washington from Tukwila to
Lynnwood and serves as a main source of transportation for commuters in the Puget Sound. Tolls were added
to the 17-mile stretch between Northeast Sixth Street in Bellevue and Interstate 5 in Lynnwood. All drivers
have the option to pay the toll to travel at speeds of 45 mph or faster at a rate adjusted based on traffic in
the system.
Reports from commuters have made the effectiveness of the tolls on I-405 unclear. Much of the frustration
felt by drivers stems from low occupancy in the toll lanes while heavy congestion remains in the main lanes.
According to WSDOT, the toll lanes are "meeting federal and state standards," which require that the toll
lanes maintain a speed of 45 mph 90 percent of the time. Nothing about the addition of toll lanes ensures
that all drivers will benefit from shorter commutes and less congestion; the system is also designed to
increase revenue for the state. It is unknown how the addition of toll lanes has impacted the overall flow of
traffic in all lanes on Interstate 405.
Goal
This project is sponsored by Neal Analytics, a consulting firm headquartered in Bellevue, Washington. Neal
Analytics helps its customers get the most out of their data, combining traditional practices of Data
Warehousing and Business Intelligence with Predictive Analytics. As a premier featured partner of Microsoft,
Neal Analytics utilizes Azure Machine Learning to provide analytics at massive scale. Neal Analytics has
provided high-performance computing along with access to various analytical and modeling applications in
support of this project.
Reaching the end goal of this project, actionable insights based on observations made through strategic
analysis, will require a deep understanding of database management, ETL (Extract, Transform, Load)
processing, and statistical modeling applications. Using a large data set will drive more accurate predictive
analytics but requires an efficient ETL process to ensure that data is organized in a consistent and usable
format.
Our goal as a team is to better understand the impact of implementing tolls on Interstate 405, which is
believed to have increased overall congestion levels. Using a large set of traffic data collected by WSDOT,
statistical modeling measures the impact of a variety of factors on overall traffic flow on I-405. Working with
multiple years of traffic data not only makes it possible to validate models but also provides a benchmark for
comparing current commute times to historical averages. Ultimately, recommendations will be made based
on the findings of our analysis, including multiple perspectives on the impact of the toll lanes on I-405.
Those familiar with recent changes in the Puget Sound region concerning congestion recognize the level of
attention surrounding WSDOT's decision to implement tolls on I-405. Measuring the overall impact of the toll
lanes will provide local commuters closure from an unbiased source.
Hypothesis
In order to fully understand the issues caused by the new I-405 tolls, each member of the group created a list
of questions that might yield valuable insights into traffic behaviors. Each member wrote ten questions that
were later reviewed as a group, and the group then rated each question on its complexity and the potential
value it added to the project. Finally, the list was narrowed down to the following questions that our team
hoped to answer.
1. Will removing the toll lanes have any effect on throughput?
2. Have there been increases in collisions on the toll lanes?
3. How long does it take an average driver on an average day to drive from Bellevue to Lynnwood
(post toll lane vs. pre-toll lane)?
4. Is there an observable/significant difference in traffic speed before and after toll lane
construction began?
5. Is there an overall increase in collisions on all of 405 due to the heavy congestion?
6. Are the tolls discouraging commuters from driving rather than increasing flow?
7. Is the process of getting into the toll lane causing congestion?
8. Is there an observable/significant difference between travel times before and after the toll lanes
went live?
9. What are the busiest points on the 405, and are collisions a bigger issue after or before the toll
lanes?
10. How has the flow of traffic changed in the main lanes?
11. What time of day is busiest and what is the algorithm for calculating the toll rate during a certain
time?
The process of brainstorming questions allowed the group to articulate the main issues for I-405 and the toll
lanes. These questions were good starting points for considering how different variables might arise and be
tested. Multiple rounds of scoring the questions on complexity and value produced a short list that truly
reflects the group's largest concerns about the toll lanes, and these questions form the basis of the
hypothesis for the entire project. After deliberating as a group, the research hypothesis for this project states
that the addition of the toll lanes on I-405 has had a significant and measurable impact on vehicle volume
throughput and average traveling speed on the 405. The null hypothesis is that these variables have not
changed significantly since the toll lanes were added.
Literature Review
At 7 million people, the state of Washington is a populous and growing state. Numerous sources cite Seattle
and Bellevue as rapidly growing cities, with major companies including Amazon, Microsoft, and Costco
expanding their dominance in the area. In addition, Business Insider cites Seattle as one of the best cities to
start a career. Since 2000, Washington has grown by over 1 million constituents (Bureau). Aside from the
weather, one of the largest public outcries concerns the congestion consistently growing along major
highways and interstates. The Washington State Department of Transportation (WSDOT) has worked to
develop solutions that have proven beneficial in other congested states, including California and Florida.
Given Washington's rapid growth rate, a solution needs to be developed that benefits constituents.
As of September 27, 2015, WSDOT has implemented express toll lanes along I-405 between Bellevue and
Lynnwood (WSDOT1). This development was supported federally by both the Obama and Bush
administrations (Seattle Times 405 Toll). The goal was to give constituents who would like to use the carpool
lane but do not meet the high occupancy vehicle (HOV) occupancy requirement the option to pay a variable
fee. Congestion management is achieved through dynamic and variable tolling. This is essentially a hybrid of
high occupancy toll (HOT) and HOV lanes, adjusting fees based on real-time traffic conditions, time of day,
and day of the week (WSDOT2). Theoretically, this gives more constituents access to HOV lanes that they
once lacked, at a price.
According to WSDOT, the objective was to reduce congestion by adding lanes and creating the ability for
express lane drivers to reach speeds of at least 45 mph. The initiative is relatively new, so amendments and
lane improvements have continued since implementation began. Based on reviews, the renovations still
leave drivers concerned about the existing traffic. This literature review aims to identify the known pros and
cons of implementing express toll lanes.
Express toll lanes are considered a beneficial addition to road improvements. According to WSDOT, there are
four key reasons for the changes: efficiency, traffic congestion, demand management, and future
improvements. The state of Washington saw success prior to the I-405 Express Toll implementation through
the SR 167 HOT lanes between Auburn and Renton. After those HOT lanes were implemented in 2008,
vehicles have averaged over 50 miles per hour and travel time has decreased by approximately 6 minutes at
the cost of an average toll of $1.75.
Since the express toll lanes opened, Community Transit has reported a 6-minute savings at peak times with a
4% ridership gain, and King County Transit reports an 8-minute savings with a 6% ridership gain. Toll payers
are saving approximately 14 minutes, and traffic south of Bellevue is moving 7 minutes faster (Seattle Times).
According to the Bellevue Reporter, as of May 10, 2016, King County Transit has seen a 10% increase in
ridership.
Since the express toll lanes in this sector are relatively new, changes are still being made, and WSDOT is
making an effort to see them through (governor.wa.gov). Since December 18, 2015, six changes have already
been implemented, including lengthened access points, skip stripes, and repaving for clarity. In addition, the
algorithm for corridor calculations is being adjusted to gain a clearer understanding of I-405 traffic. Kate
Elliot wrote a report for WSDOT acknowledging that the solution is not yet perfect but that improvements
have been made. Trips between SR 522 and SR 527 are seeing slower speeds in the northern corridor and
will be adjusted accordingly. Emily Pace, speaking on behalf of WSDOT, explained that because the roadway
narrows to only three lanes there, this is a capacity issue rather than a toll issue (seattlepi.com); widening
has been noted as a future project to be considered. Otherwise, WSDOT has identified travel times
improving by up to 6 minutes during peak hours when constituents are returning home, as identified in
Figure 1.
While wait times have decreased, the financial gains aimed at funding future improvements have also been
significant. Over the first 3 months of the express toll lanes, $3.7 million was raised from the initiative. This is
beneficial because it could go toward improvements to the NB I-405 section, which carries an estimate of $5
- $50 million depending on the timeline and environmental impacts (governor.wa.gov).
Although WSDOT data has shown evidence of improvement, suburban drivers noticed congestion shifting
from Kirkland to Bothell. The Bothell chokepoint spawned a petition of 29,000 signatures to repeal the toll
lanes. Citing the failure to deliver an efficient I-405 toll setup, Republicans ousted Transportation Secretary
Lynn Peterson from her duties (Seattle Times 405 Toll). Although this has been viewed as politically
motivated, it does acknowledge that thousands of motorists do not agree with the change.
Contrary to WSDOT's analysis, independent assessments by INRIX found that the HOV lanes did in fact see
faster speeds than previously, but the normal lanes suffered as a whole, leaving miserable conditions for the
bulk of drivers. The report considered four sections of I-405 pre- and post-toll: "one each direction north of
Bothell near the state Route 527 interchange, one northbound near the state Route 520 interchange, and
one southbound near Northeast 68th Street in Kirkland (seattlepi.com)."
In addition to the factual data, numerous constituents have taken their thoughts to blogs. Aside from the
three-lane bottleneck in the northern sector that is causing major backups, the Interstate 405 express lanes
have hit their toll maximum of $10, raising questions about whether the toll corridor was properly designed.
Tolls are adjusted to ensure a 45 mph minimum in the express lanes, in the hope of also relieving the main
lanes; by reaching the maximum charge, the 45 mph promise cannot be kept. Spokesman Ethan Bergerson
confirmed this, reporting a savings of 16-24 minutes rather than the 30 minutes reported by WSDOT.
Additional public resentment can be found in the growing petition to "Stop 405 Tolls," with over 19,000
supporters. Bloggers also note that the new toll lanes were funded from the gas tax, meaning those using the
regular lanes are subsidizing the cost for those who can actually afford the new lanes.
The argument for toll lanes is also indicated by the climb in price. Bergerson stated that higher prices mean
more constituents are willing to pay for the drive than previously thought. Looking at other express toll
lanes, $11 tolls have been spotted in Atlanta, meaning the solution and the promise to uphold 45 mph may
be hard to achieve (www.seattletimes.com). The project took seven months to complete and cost $484
million. Revenue will be around $7.6 million the first year, but after expenses that will leave $1.2 million in
net income.
Methods
Early Stages
When the project and problem were initially presented, it was clear that contacting the Washington State
Department of Transportation would need to be the first priority in order to acquire sufficient data. After
some discussion it was decided that a large amount of historical traffic volume data would be needed, and
that the best place to find it would be directly through the department of transportation, as it is all public
information. Over the course of a few weeks, many emails were exchanged with department employees
about how to extract the data that was needed. After numerous emails and conversations with several
employees, WSDOT directed us to download a program created by the department and to pull traffic volume
data from their database.
Data Extraction Stage
Once the program and the data files were downloaded, each team member began learning how the program
was set up and how the most value could be extracted from the files provided. Another major reason time
was spent learning the program was so that each team member would have congruent data during the
extraction phase, making the data merging and cleansing phase more efficient.

After much experimentation with the program, it was decided that the team should examine historical data
for the entirety of I-405 from 2010 to the present. It then became clear that roughly 45 gigabytes of data
(specific to I-405) would have to be extracted, equating to about 1.3 billion rows. After much discussion, the
team reached a consensus that analyzing the entirety of I-405 might not yield the most accurate answers to
the problem at hand. Instead, the team chose to examine the corridor from 1 mile before the toll lanes start
to 1 mile after they end. This drastically reduced the amount of data that would have to be extracted and
analyzed (to roughly ¼ of the previous estimate) while producing the most valid answers in a more efficient
manner.
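The scoping decision above amounts to a milepost filter. The sketch below shows one way such a filter could look; the segment bounds and the record layout are hypothetical stand-ins, since the actual extraction was done through WSDOT's own program.

```python
# Illustrative milepost filter: keep only records within one mile of the
# tolled segment. TOLL_START/TOLL_END and the record fields are assumed
# values for illustration, not the team's actual extraction parameters.
TOLL_START = 11.0   # hypothetical milepost where the toll lanes begin
TOLL_END = 28.0     # hypothetical milepost where the toll lanes end

def in_study_range(milepost, buffer_miles=1.0):
    """True if a record falls inside the toll segment plus a one-mile buffer."""
    return (TOLL_START - buffer_miles) <= milepost <= (TOLL_END + buffer_miles)

records = [
    {"milepost": 5.3, "speed": 58},    # well south of the toll lanes: dropped
    {"milepost": 12.72, "speed": 31},  # south Bellevue: kept
    {"milepost": 27.5, "speed": 44},   # near the north end: kept
]
study_set = [r for r in records if in_study_range(r["milepost"])]
```

Applied over the full extract, a filter like this is what cuts the working set to roughly a quarter of the original volume.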
Hypothesis/Question Stage
Although only one question was presented to the team, it became clear very quickly that many subsidiary
questions would need to be answered to fully solve the overall problem. To accomplish this, each team
member was tasked with thinking of 5-10 questions or hypotheses. Once all team members had generated
their questions, they were compiled into one document and a brainstorming session was held to discuss all
of the questions and how well they could help solve the traffic congestion problem. After a new list of
questions had been compiled, each team member rated every question on a scale of 1-5 for both the
complexity of the problem (1 being easy to solve, 5 being incredibly difficult) and the value or insight that
could be derived from answering it (1 being little to no value, 5 being great value). This process was run
twice, with the new results discussed each time, both because none of the team members had any prior
background in traffic science and because the only way to sufficiently solve the problem was to generate as
many insights as possible. The team finally decided on twelve questions and hypotheses that needed to be
answered to successfully address the problem; the other questions were not discarded, but the team felt
they were not as substantial or critical to the project's success. Over the course of the project, the team
recognized that limited time and resources would not allow all twelve questions to be answered, so they
were condensed into two: Is there an observable difference in traffic speeds before and after the toll lanes
were constructed, and how would traffic behave if the toll lanes were removed?
Selecting Analysis Tools
Because most of the team had little to no experience working with data sets of this size, the project mentor,
David McClellan, was consulted for insight into which data analysis tools would work best. Fortunately for
the team, David was very informative about which tools and software would best handle both the scale and
complexity of the data.

The first tool recommended was a virtual machine, or VM: a computer that can be logged in to remotely
from any computer connected to the Internet. Many data analysts use VMs because they have superior
memory and processing power compared to the average home computer or laptop, which greatly reduces
the time it takes to run large experiments or data sets. Once access to the VM had been granted, team
members began loading industry standard data tools such as SQL Server, RStudio, Excel, and Data Tools
2013 onto it. The next step was to figure out how to quickly and efficiently upload the data to the VM. David
suggested a program called CloudXplorer as a starting point because it is free software that everyone on the
team could access, with exceptional transfer speeds for the raw data files.
During the data-cleansing phase, David McClellan suggested various programs that could be used to cleanse
the data efficiently and correctly. Since all of the data was extracted into Excel spreadsheets, writing a VBA
macro that eliminated missing values and bad data and consolidated all of the files into the same format was
the most effective way to ensure all of our data was cleansed correctly. Many other programs could have
been used to cleanse the data, such as RStudio, but every team member had a good level of experience
working with Excel and writing VBA code. Because the macro had to sift through roughly 550k rows of data
per spreadsheet file (total of __ files), all of the files had to be cleansed on the VM because of its ability to
process large amounts of data quickly.
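The cleansing logic described above can be sketched in a few lines. The team's actual work was an Excel VBA macro; the Python below is only an analogue of the same steps, and the column names and validity thresholds are assumptions for illustration.

```python
# Minimal analogue of the team's VBA cleansing macro (illustrative only):
# drop rows with missing or implausible values, normalize the types.
def cleanse(rows):
    """Return only rows with plausible, complete speed/volume readings."""
    clean = []
    for row in rows:
        speed, volume = row.get("speed"), row.get("volume")
        if speed is None or volume is None:
            continue                           # missing value: drop the row
        if not (0 < speed <= 100) or volume < 0:
            continue                           # sensor glitch / bad data: drop
        clean.append({"speed": float(speed), "volume": int(volume)})
    return clean

raw = [
    {"speed": 62, "volume": 140},   # valid reading
    {"speed": None, "volume": 90},  # missing speed
    {"speed": 255, "volume": 30},   # likely a sensor error code
]
cleaned = cleanse(raw)
```

The consolidation step then simply appends each file's cleaned rows into one common layout before loading.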
Data Analysis Stage
Once the data was aggregated, cleansed, and uploaded to the data warehouse, the team was ready to begin
the analysis phase. Before starting, a team meeting was held with David to discuss how to analyze the data
efficiently and effectively. After some suggestions and discussion, the team agreed that it was necessary to
build a model to process and generate insight from the data. Per David's suggestion, Azure Machine Learning
(AML) was chosen to build the model.

Prior to David's recommendation, none of the team members had any experience using AML. Each team
member spent an extensive amount of time learning how to properly build models in AML and deciding what
type of statistical analysis to conduct on the data. Based on research and several recommendations from
David, basic linear and boosted decision tree regression models were used to determine which variables
were the largest contributors to congestion on I-405. In addition to experimenting with different types of
statistical analyses, team members wrote queries in SQL Server that were then copied over to AML to build
models for each milepost. These specific data sets were created to analyze different traffic factors, such as
how on/off ramps affected the overall flow of traffic and congestion; other factors such as time of day,
milepost location, and direction of travel (northbound or southbound) were also analyzed.
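To make the modeling step concrete, the sketch below fits the simplest of the model types named above, a basic linear regression, using ordinary least squares in plain Python. The team's actual models were built in AML, and the speed/volume values here are invented for illustration.

```python
# Illustrative ordinary-least-squares fit of speed on volume (y = a + b*x).
# Stand-in for the basic linear regression models the team built in AML;
# the data points below are made up.
def fit_line(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

volumes = [100, 200, 300, 400]   # vehicles per interval (illustrative)
speeds = [60, 50, 40, 30]        # mph; speed falls as volume rises
intercept, slope = fit_line(volumes, speeds)
```

A negative slope, as here, would indicate that higher volume is associated with lower speed; the boosted decision tree models capture the same relationships non-linearly.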
Once the models were created, each was scored on its coefficient of determination (CoD), better known as R-
squared; this statistical measure shows how closely the data fits the regression line and how much of the
variance is explained. Once all of the CoDs were collected, the team decided to focus on the 5 mileposts with
the highest CoDs. After selecting the mileposts, the team focused on analyzing the traffic data from 3 PM to
6 PM (peak rush hours), as it provided a smaller and more focused data set for the problem to be solved.
After examining the data provided by the models, the team used Excel to derive insights and determine
whether congestion had worsened since the completion of the corridor expansion. Wade and Torey created
a comprehensive Excel workbook that incorporated an Azure Machine Learning add-in, which allowed the
team to model several predictive scenarios; the workbook was also built with a simple user interface so that
any end user could easily use it.

To make the workbook and model predictive, traffic data from January 2016 through March 2016 was fed
into the workbook to create scenarios such as converting all lanes to non-toll lanes, running 1 toll lane and 4
non-toll lanes, enabling tolls only during non-peak hours, etc. Once the team had derived the desired results
from the data, a combination of Excel graphs and Power BI was used to visualize the results, as they
provided a simple way to show others the effect of the toll lanes since their completion.
Findings
Analysis was conducted in two ways: further exploration of the raw data and predictive analysis using the
AML models that were developed. As a result, multiple hypothesis questions originally developed to help
guide the analysis were addressed, including "Will removing the toll lanes have any effect on throughput?"
"Is there an observable/significant difference in traffic speed before and after toll lane construction began?"
and "How has the flow of traffic changed in the main lanes?" Ultimately, the findings of this analysis served
as the foundation for additional recommendations for improving congestion on the I-405 corridor.
Using the raw data set, peak congestion hours were identified using aggregated Average Calculated Speed
and Time in both directions for each milepost on I-405 during the months of October, November, and
December. As expected, this analysis showed the highest congestion levels at 5:00 PM, with slowdowns
beginning at 1:00 PM and lasting until after 7:00 PM. During the slowest hours of the day, vehicles were
traveling at less than 25 mph on average. Slower speeds during these hours are explained by the increased
volume of commuters on I-405, resulting in heavy congestion.
Based on the findings regarding peak congestion, the analysis was scoped to only include data between the
hours of 3:00 PM and 6:00 PM. Again, filters were applied to show only data from October, November, and
December for all mileposts in each direction of I-405. Since the toll lanes are open to all drivers on
weekends, filters were applied to show only weekday data for normal lanes. In this portion of the analysis,
Average Calculated Speed and Average Volume were measured for dates before and after implementation of
the toll lanes. Here, the findings suggested that commutes before the addition of the toll lanes were worse
than they are now: according to the data, the addition of the toll lanes has decreased the average number of
drivers on the road and increased their average speed by almost 4 mph. This suggests that traffic behavior
has improved since the toll lanes were added; however, including the toll lanes in the analysis proved to
skew the results, and it was determined that a more meaningful analysis would measure the impact of the
toll lanes on the average commuter.
In this stage of the analysis, additional filters were added so that the data included only the normal lanes. In this case, the data told a very different story: it showed a drop in the number of drivers in the normal lanes after the toll lanes were added, along with a decrease in speed of almost 2 mph on average. While it may be argued that a 2 mph decrease is hard to notice on the road, pair that decrease in speed with a decrease in drivers and the result is a decrease in throughput and much slower commute times. Further, the data shows that at any given hour between 3:00 and 6:00 PM, the average number of vehicles in any of the normal lanes is lower than it was before the tolls were applied. By the definition of throughput, if all other factors remain constant, fewer vehicles on the road should allow those vehicles to travel at faster speeds; the findings of this analysis are contrary.
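The throughput argument can be made concrete with the fundamental traffic-flow identity, flow = density × speed. A minimal sketch follows, with illustrative numbers rather than the project's actual measurements:

```python
def flow(density_veh_per_mile, speed_mph):
    """Fundamental traffic-flow identity: flow (vehicles/hour past a point)
    equals density (vehicles/mile) times speed (mph). When BOTH density
    and speed fall, the loss in throughput compounds."""
    return density_veh_per_mile * speed_mph

# Illustrative normal-lane figures before vs. after tolling (not the
# project's measured values):
before = flow(60, 27.0)   # vehicles/hour before the toll lanes opened
after  = flow(58, 25.0)   # vehicles/hour after: fewer cars AND slower
print(before - after)     # throughput lost at this point per hour
```

This is why a 2 mph slowdown paired with fewer vehicles per lane cannot be dismissed: the two effects multiply rather than add.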
Although drivers who choose to pay a premium to use the I-405 toll lanes do receive much faster travel times, it is more meaningful to demonstrate the negative impact the toll lanes have had on everyday travelers who use the normal lanes. As a result of the addition of the toll lanes, throughput has decreased for the thousands of drivers who use the standard lanes on I-405.
The second phase of analysis for this project focused on using Azure Machine Learning (AML) predictive models to simulate different scenarios on the road. This involved developing predictive models to estimate traffic volumes and speeds based on the data fed into them. Once these models were completed, the team moved forward with an analysis simulating the opening of the express toll lanes. Using the diagram below, the goal was to simulate opening an express lane on I-405 and treating it as a regular lane. The left side of the diagram shows the current system in place at milepost 12.72 (south Bellevue): the green columns represent regular traffic lanes and the yellowish-orange column is an express toll lane. Using the predictive model developed for milepost 12.72, traffic was then estimated assuming the express toll lane was treated as a regular lane (right side of the diagram).
Utilizing data from January to March of 2016, raw traffic data was adjusted to simulate the scenario above and a comparison was made, shown in the chart below. The chart provides the estimated change in the number of vehicles that can travel through this milepost over the course of any given hour. For example, at 12:00 noon on Mondays, the results show that 183 more vehicles could travel through the 12.72 milepost if the express lanes were open, indicating that opening the express lanes would increase vehicle throughput for this milepost during that time and day. With that in mind, each number highlighted green in the chart represents an increase in throughput assuming the express lanes were open. This analysis therefore demonstrates that throughout most of the day, traffic would see an increase in throughput should the express lanes be opened and treated as regular lanes.
            12 PM  1 PM  2 PM  3 PM  4 PM  5 PM  6 PM  7 PM  8 PM  9 PM  10 PM
Monday        183   157   135    36   -25    -8   101   117    92    74     65
Tuesday       184   179   105    12   -71   -71    50    97    99   105     81
Wednesday     185   169    81   -27   -55   -51    30    91   113    99     72
Thursday      167   153    36   -41   -44   -41     8    81    94   104     81
Friday        156    87   -17   -11    -9    -2    50    75    84    94     96
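The comparison behind these figures can be sketched as follows. This is a hypothetical Python fragment (the team's actual models ran in Azure Machine Learning); the Monday 12 PM, 4 PM, and 7 PM deltas are taken from the table above, but the underlying volume figures are invented for illustration:

```python
def throughput_delta(actual, simulated):
    """Per-hour change in vehicles through a milepost if the express toll
    lane were opened as a regular lane. Positive deltas correspond to the
    green (throughput-increasing) cells in the table."""
    return {hour: simulated[hour] - actual[hour] for hour in actual}

# Illustrative hourly volumes at milepost 12.72 on a Monday:
actual    = {"12 PM": 5200, "4 PM": 6100, "7 PM": 5400}
simulated = {"12 PM": 5383, "4 PM": 6075, "7 PM": 5517}
deltas = throughput_delta(actual, simulated)
print({h: d for h, d in deltas.items() if d > 0})  # hours that gain throughput
```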
Conclusion
Based upon the actions taken by the WSDOT, average drivers commuting along the I-405 corridor have experienced an increase in traffic congestion since the implementation of the express toll lanes. Commuters who cannot afford, or choose not, to use the express lanes are experiencing an average decrease in driving speed of 1.5 miles per hour, and on average 49.21 fewer vehicles are traveling through the various mileposts during peak congestion times. This suggests that the WSDOT has actually harmed the experience of commuters driving along I-405, despite having set out to help them. It appears that the balance between generating revenue for the I-405 corridor and helping decrease traffic congestion has leaned toward generating revenue for the WSDOT. This is further supported by the WSDOT's own statements that it has received higher-than-expected revenues since the opening of the express lanes. Therefore, the WSDOT must reconsider how it manages I-405 if its number one goal is to help commuters by decreasing congestion on the corridor.
Recommendations
Based upon our findings, the team has developed three recommendations for the WSDOT with the goal of helping to decrease traffic congestion on the I-405 corridor. These recommendations are complementary rather than mutually exclusive, but should be prioritized in the order described below.
1. Minimum Four Regular Lanes
Along the I-405 corridor there are up to five lanes at a time across the road, with configurations ranging from three regular lanes and two express lanes to four regular lanes and one express lane. It is recommended that the WSDOT maintain a minimum of four regular lanes along the I-405 corridor. This would involve converting one express lane into a regular lane wherever there are currently two express lanes. As shown by our earlier analysis, converting an express lane into a regular lane increases the road's overall vehicle throughput. This approach also allows the WSDOT to keep at least one express lane reserved for carpooling and drivers who pay the toll.
2. Open Express Lanes During Non-Peak Hours
Throughout most of the day, congestion has minimal impact on the I-405 corridor and minimal tolls are charged to use the express lanes. It is therefore recommended that the WSDOT open all toll lanes as regular lanes during non-peak hours. This would allow for up to two additional regular lanes on the road and, in return, help drive down congestion. Similar to the previous recommendation, our analysis shows that traffic volume throughput increases during non-peak hours when the toll lanes are opened to the public. If managed well, opening these lanes can help curb the impact of congestion in the hours just before and after the peak period.
3. Construct New Lanes
Assuming that the WSDOT will neither convert nor open the express lanes, or that opening the lanes would not decrease congestion to a large enough extent, the WSDOT should consider investing in the construction of additional lanes along the I-405 corridor. A large reason the toll lanes were constructed was to help the WSDOT raise funds for additional changes to I-405. The WSDOT has already stated that it is receiving higher revenues than it anticipated. It would therefore make sense to reinvest these funds back into I-405 to help alleviate congestion, given the lack of relief the express lanes have provided. However, this recommendation is a last resort, since the public may resent additional lanes being built given that past construction resulted in little improvement.
Overall, these recommendations were developed to help the WSDOT make informed decisions regarding the I-405 corridor. While each recommendation would benefit the average driver in the long run, recommendations one and two should be considered before recommendation three, due to the significant costs of the third. Additional in-depth analysis of areas beyond the physical locations of the express lanes is also encouraged: should the WSDOT manage to greatly decrease congestion along the road, new bottlenecks could be discovered where the express lanes are not in place. There could also be new bottlenecks near on- and off-ramps that currently have no issues because of the congestion levels farther from their location. Regardless, these recommendations are in the best interest of the WSDOT as it works to solve the congestion problem on I-405.
Relation to Course Material
To accomplish this project, the team was required to call upon many of the skills, techniques, and concepts taught over this last year of study. More specifically, the team had to rely on numerous hard skills for managing and analyzing data. Most of these hard skills were taught in Marketing Analysis 555, Management Information Systems 593, and Management Operations 557. In addition, this project demanded a strong understanding of statistics and regression modeling, which were discussed in BA###. Furthermore, social media analytics was utilized to explore where current I-405 commuters stand with regard to the toll lanes. Within this section of the paper, the major concepts used throughout the project that were covered in our course material are explained. These concepts include the hard skills and conceptual material used for data management, data analysis and statistics, and social media analytics.
Due to the nature of this project, the work required heavy technical experience from its members, focused on deriving valuable information from large datasets. Across both the Marketing Analysis 555 and Management Information Systems 593 courses, the team relied on the techniques taught for data management and manipulation. These courses taught skills related to SQL Server and its many attached products (such as SQL Server Data Tools and SQL Server Integration Services). For the majority of the team, exposure to these technologies began this year, in courses that covered the major purposes of each software application and how to approach data management problems. A simple example is using SQL Server to store and manipulate data for integration across three separate systems: the team had to develop processes which moved data from Excel files to a local SQL Server, and then from that local SQL Server to a cloud-based solution. Within these two processes, the data was also aggregated and cleansed to ensure that only the necessary data moved across each system. These skills were taught over the course of the year, focusing on SQL Server management in the first semester, followed by SQL Server Integration Services and Data Tools in the latter half of the program. Beyond these hard skills for data management, the team was also exposed to the value of Excel's Visual Basic coding functionality. Excel's VBA provided a resource for data cleansing which allowed the team to prepare over 1,300 data files for loading into SQL Server. Without learning this material in the management operations course, the entire project would have struggled to move forward due to the manual work that would otherwise have been required. Instead, the team was able to automate data cleansing by developing VBA processes that looped through the files and standardized the data across all 1,300 of them.
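The paper does not reproduce the VBA cleansing loop itself; the following Python sketch shows the general shape of such a batch standardization pass. The column schema and the number of leading summary rows are assumptions, not the team's actual layout:

```python
import csv
import pathlib

HEADER = ["timestamp", "milepost", "lane", "volume", "speed"]  # assumed schema

def standardize(src: pathlib.Path, dst: pathlib.Path, skip_rows=3):
    """Drop the leading summary rows, prepend a uniform header, and
    discard rows with missing or extra fields. Returns the number of
    data rows kept, mirroring the loop-and-cleanse pattern described
    above (the team's version was written in VBA)."""
    with src.open(newline="") as f:
        rows = list(csv.reader(f))[skip_rows:]
    cleaned = [r for r in rows if len(r) == len(HEADER) and all(r)]
    with dst.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(cleaned)
    return len(cleaned)

# Applied to a folder of exports (paths are hypothetical):
# for p in pathlib.Path("cdr_exports").glob("*.csv"):
#     standardize(p, pathlib.Path("clean") / p.name)
```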
Once the team had prepared the data for analysis, the next steps focused on statistical and data analysis technologies and methodologies. These materials were taught across all courses, focusing both on the hard skills required for this kind of work and on the thought processes required when approaching complex business problems. In the case of hard skills, the team needed a foundation in statistics and regression modeling. These concepts were taught heavily in our BA516 course, which covered proper procedures for collecting and analyzing variables for regression. For example, when trying to determine what could impact traffic volumes and speeds on I-405, it was important to account for external variables such as weather, sporting events, and time of year, but also to consider how the data was structured for the statistical analysis. The team utilized binary variables and data transformations to narrow down the variables and answer specific questions, such as whether an individual weekday had an impact versus all days of the week pooled together. Although a subtle difference, this type of data transformation was essential to the data analysis required for the project.
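A minimal sketch of the weekday dummy-variable encoding just described, in Python for illustration (the team's regressions were built in R and Azure ML, and the choice of baseline day here is an assumption):

```python
def weekday_dummies(weekday: str):
    """Binary (dummy) encoding of the day of week: one 0/1 column per day
    lets a regression estimate a separate congestion effect for each
    weekday instead of a single pooled 'weekday' effect. Friday is the
    omitted baseline category."""
    days = ["Monday", "Tuesday", "Wednesday", "Thursday"]
    return {f"is_{d.lower()}": int(weekday == d) for d in days}

print(weekday_dummies("Wednesday"))
# {'is_monday': 0, 'is_tuesday': 0, 'is_wednesday': 1, 'is_thursday': 0}
```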
The final topic is social media analytics, which allowed the team to understand where consumers in the market currently stand. This material focused on how to collect and analyze data from Twitter in order to understand consumers.
Utilized Technologies
To complete this project, the team was required to work with a variety of applications and coding languages
which supported data collection, management, and analysis. Within this section, each technology will be
briefly summarized; including its general purpose and a short description of how the team drew value from its
use.
Compact Disc Data Retrieval (CDR)
The Compact Disc Data Retrieval (CDR) tool was provided by WSDOT as a software application that allows users to extract data from DAT files into text and Excel output files. The DAT files themselves were also provided by WSDOT, compressed for ease of download and access. The CDR tool takes these files and outputs an Excel spreadsheet based upon requirements the user enters into the program. These requirements scope the data (date range, granularity, road, milepost, etc.) as well as define the output file's format and data structure. These two features allowed the team to standardize its data collection, resulting in a uniform data structure for cleansing. Overall, CDR was the team's main tool for collecting data from the WSDOT.
Excel
Throughout the project, Microsoft Excel was leveraged to assist with project management, data cleansing, data analysis, and more. To effectively manage the team's time and effort, Excel was used to analyze and scope out data collection for the various members of the team. For example, I-405 has numerous mileposts used to analyze traffic, and each milepost has various elements of analysis available to users. Due to the vast quantity of mileposts and elements, Excel provided the team a method of organizing the mileposts and their subgroups to divide the work reasonably amongst team members. Excel's data analysis tools were also used for initial analysis, focusing on regression and correlation tests on the WSDOT data and numerous external variables. Regarding data cleansing, the team wrote VBA processes to loop through numerous Excel files and clear summary data, add column headings, and filter out and delete irrelevant or invalid data points. In addition, VBA was used to loop through and test files to ensure they were prepared for the various stages of the cleansing process. Each cleansing run often took anywhere from a couple of hours to more than eight hours of processing. In the end, Excel provided a platform for data cleansing and management while also ensuring quality of work for file management.
R Programming
R is a statistical programming language often used by data scientists, mathematicians, researchers, and many others to manage, cleanse, and analyze data. Within our project, the team utilized R for initial data analysis, file validation and management, and social media analytics. At the start of the project, the team ran initial regression models in R to help drive the project forward, exploring what factors could have an impact on traffic in the Seattle area. In addition, R was used for file validation and organization during the data cleansing process: while the Excel cleansing processes ran, R ran in parallel to speed the work along, looping through files and verifying the data had been cleansed properly. Once a file was cleansed, it was converted into CSV (comma-separated values) format and organized by file type (traffic volume or speed); the CSV format allowed files to be processed much faster than native Excel files. Lastly, R was used to scrape data from Twitter, providing social media data on what citizens think about current conditions on the road. In the end, R provided an additional tool to help manage and process data for the team's variety of purposes.
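The validate-and-sort step can be sketched as a simple classification rule. This is shown in Python for illustration only; the team's script was written in R, and the header convention and filename pattern here are assumptions:

```python
def classify(filename: str, header_line: str):
    """Decide what to do with a file during the cleansing pipeline:
    a file whose first line is not the expected header still needs
    cleansing; otherwise it is filed by type based on its name
    (traffic volume vs. speed), mirroring the organization step
    described above."""
    if not header_line.lower().startswith("timestamp"):
        return "needs_cleansing"
    return "volume" if "volume" in filename.lower() else "speed"

print(classify("MP12.72_Volume_Oct.csv", "timestamp,milepost,lane,volume"))
```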
SQL Server
SQL Server is a data storage and management software suite provided by Microsoft. It is used as a server and database management toolset and provides many sub-products to help with data management, processing, and analysis. For this project, the primary applications used were SQL Server Management Studio (SSMS), SQL Server Integration Services (SSIS), and SQL Server Business Intelligence Data Tools 2013 (Data Tools). SSMS was used to develop staging databases for data cleansing and aggregation. Once data had been cleansed through Excel and R, it was loaded from files into the SQL Server databases the team had developed. These data loading processes were completed through SSIS, which manages data extraction, transformation, and loading. In essence, once data was cleansed, the team used SSIS to move it from files into SQL Server, and SSMS to aggregate it before it was ready for movement into the cloud. Data Tools helped manage Azure SQL Data Warehouse, allowing the team to develop data tables and views.
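An aggregation view of the kind built in SSMS can be sketched as follows, using SQLite (via Python) in place of SQL Server so the example is self-contained; the table and column names are assumptions, not the team's actual schema:

```python
import sqlite3

# In-memory database standing in for the team's staging SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (milepost REAL, hour INTEGER, speed REAL, volume INTEGER);
    INSERT INTO readings VALUES
        (12.72, 17, 22.0, 310), (12.72, 17, 24.0, 290), (12.72, 12, 55.0, 180);
    -- A view like this collapses raw sensor rows into one aggregated
    -- row per milepost and hour, ready for export to the cloud.
    CREATE VIEW hourly_summary AS
        SELECT milepost, hour,
               AVG(speed)  AS avg_speed,
               AVG(volume) AS avg_volume
        FROM readings
        GROUP BY milepost, hour;
""")
for row in conn.execute("SELECT * FROM hourly_summary ORDER BY hour"):
    print(row)  # one aggregated row per milepost/hour
```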
Microsoft Azure
Microsoft Azure is a cloud-based software-as-a-service platform which provides a variety of applications to businesses, including online servers and databases, machine learning technologies, data visualization platforms, and more. In our project, Azure was used for two major purposes: its SQL Data Warehouse service and its machine learning capabilities. The SQL Data Warehouse was used to store data in the cloud and provide a data source for Azure Machine Learning, an analytical platform provided by Microsoft which requires access to data for its use. From this point, Azure Machine Learning was leveraged as a streamlined tool for data science, providing the team with mathematical and computer science algorithms used to understand trends and behaviors within large data sets. To summarize, Azure was used as a cloud-based storage and analytical platform once data cleansing and aggregation had been completed.
CloudApp VM
CloudApp VM was a service the team utilized to access a virtual machine in the cloud. This virtual machine (VM) acted as an online computer with far more memory, storage, and processing power than any of the team's personal machines. Logging into the VM over the internet, the team ran all of its data cleansing and management processes there, since the VM could complete these processes in much less time than any local machine. Overall, the CloudApp VM was leveraged to manage datasets too large for the team's own resources to handle.
Cloudxplorer
CloudXplorer is an online file-sharing application which allows users to save data and files in the cloud. These files are then available to any users who have been given access to upload, view, and download files from the tool. The team used this software to centralize its resources and transfer data to and from the CloudApp VM in an efficient manner.
Power BI
Power BI is a data visualization application provided by Microsoft. Similar to other products on the market (Tableau, MicroStrategy, D3.js, etc.), this tool allows users to feed in data from various sources and develop visualizations (charts, tables, graphs, etc.) that help explain the data. We utilized this technology to develop several visualizations which helped explain traffic behavior over various periods of time.
Work Allotment
This project was accomplished by individuals from different professional backgrounds and experiences, and
below are descriptions of how each member contributed to the completion of this project.
Torey Bearly
When the project team was being put together, Torey Bearly was brought onboard to act as both the project
lead and technical lead. At a high level, Torey’s job was to delegate work and drive forward the project’s
technical work. This involved identifying the necessary processes and technologies required to accomplish the
project's work, while communicating with David McClellan to help clear up ambiguity throughout the
project.
At the beginning of the project, Torey communicated with the WSDOT to help identify the required data that
the team would need. He also directed the team towards other potential datasets which were collected as a
means to help with project analysis. From this point, Torey learned how to use WSDOT’s CDR tool in order to
extract traffic data from the WSDOT. He then taught his fellow teammates and handed off sections of the data
extraction to various members of the team (Wade and Andrew). While this work was being accomplished,
Torey then developed several coding scripts to help cleanse and standardize the data in preparation for aggregation and integration into the cloud. Torey worked with Andrew to cleanse data in batches on the
team's Virtual Machine. Alongside this work, Torey communicated with and supported Wade as Wade worked with Twitter and RStudio to extract social media data for potential analysis. Once these tasks were complete, the team
moved into the data integration and aggregation phase.
Torey then worked in SQL Server Integration Services to develop Extract, Transform, and Load packages to integrate data into a SQL Server instance located on the team's virtual machine. He then developed SQL views which aggregated the data based upon the requirements for analysis. After this was accomplished, Torey moved data from the virtual machine into the team's Azure SQL Data Warehouse, proceeding to learn, and teach the team, how to interact with Azure through SQL Server Data Tools. Once the data was in Azure, Torey supported and developed various Azure Machine Learning models alongside each team member for the purposes of analysis. Finally, Torey worked with the team to make decisions regarding the analysis, ending the project by summarizing our findings and conclusions.
Andrew Kealoha
Each team member was proficient in different areas, which helped the overall success of the project. Andrew, for instance, helped with many parts of the project, from data extraction all the way to analysis. During the beginning phases of the project, Andrew researched the I-405 corridor to determine which areas of the freeway would yield the best results for the analysis portion of the project. Once he had determined which areas of the corridor the team should focus on, he, along with the rest of the group, began extracting his portion of the data from the CDR program provided by WSDOT.
Once the data extraction was completed, Andrew helped reduce the workload for Torey by assisting with the data cleansing process on the VM (virtual machine). He researched different options for cleansing the data with R, SSIS, and Excel. Because the files were so large and in Excel format, Torey and the team felt it best to cleanse the data using VBA code. Each load of files took upwards of an hour to process, which required constant monitoring of the VM to ensure that the files were being cleansed correctly while also starting the next batch of files.
Once the data cleansing was complete, the analysis phase began, which involved using Azure Machine Learning (AML). Because some team members were unavailable during the initial portions of the analysis phase, Andrew spent time researching and learning AML while also building a useful model with the project sponsor, David McClellan. After the model was built, Andrew explained to the team how the model worked and how to interpret the results it derived from the data sets. Once AML provided results, Andrew experimented with Power BI to determine how to best visualize the insights derived from the data sets in a simple manner that could be presented to a group of individuals with no background on the problem being solved.
Finally, once the data had been collected and analyzed, Andrew, with the help of Justin, utilized Slide Bean to
create the final presentation. To create the presentation, Andrew took various screenshots of the area and
problem the team focused on, the programs utilized, and multiple graphs displaying the results of the project.
After the presentation was completed, Andrew helped proofread and edit the paper with the rest of the team.
Justin Rath
For the Washington State Department of Transportation project, Justin focused mainly on the written portion and the presentation slide deck, taking on a support and advisory role for this capstone. When brought on, the expectation was for Justin to ensure completion of the paper while delegating sections to each team member with Torey Bearly's approval. In addition to managing the Capstone Final Paper, Justin was involved in all meetings from the time he joined the team; if he missed a meeting, he followed up the same day or the day after with Torey Bearly to get up to date. Between meetings, he also used Microsoft Azure to get familiar with the team's activities and provide insight, helped with small tasks, and provided ideas when requested. For the capstone paper, Justin was committed to reviewing and researching all information pertinent to the project in order to develop the Literature Review. He also aided in editing the entire paper, reviewing each member's work. For the presentation, Justin organized the slide deck to appropriately demonstrate the work done by the team, utilizing Slide Bean, an online application that creates templates for professional presentations.
Wade Rogerson
At the beginning of this project, the team was focused on extracting data from the Compact Disc Data Retrieval (CDR) software provided by the WSDOT. This tool allowed the team to extract detailed information for each milepost along I-405, but required numerous reports to pull the necessary data. Wade was asked to take on a portion of the data and work in CDR to pull reports for his assigned locations. Before any data could be collected, it was crucial that the original files used in CDR were organized in a readable format. Wade organized the files needed to pull data in CDR and loaded them onto an external hard drive to be shared with other members of the team. Collecting the data required Wade and each member of the team to run over 500 individual reports.
After data had been collected, substantial time was spent cleansing and preparing the data for analysis. Once
the data had been transformed into a consistent and usable format it was ready to be used for intensive
modeling. Wade reached out to our project sponsor David McClellan and scheduled a working session where
the team was trained to use Microsoft’s Azure Machine Learning. Wade also explored various possibilities with
Azure and became comfortable building simple models with sample data provided by Microsoft. Ultimately,
one model was used to work with data sets for 25 different mileposts along I-405. As a result of the training
provided, Wade was able to work with each model and adjust the filters accordingly. Further, he analyzed the
statistical results provided for each milepost and ultimately selected the mileposts that would provide the
strongest analytical representation. Wade then utilized another Microsoft application, Power BI, to display the accuracy of our model in an effective manner.
Another way Wade impacted this project was through his efforts in building a Twitter Scraper that was used to
collect public opinion regarding the I-405 tolls. Wade spent many hours educating himself on R and different
ways to pull data from Twitter. Wade created a Twitter application profile specific to this project and found an R script that was able to reference that application and pull tweets based on specified criteria. These tweets allowed the team to speak on behalf of the general public without applying any bias of their own.
Lastly, Wade played a large role in the analysis portion of this project through his ability to aggregate and filter
data in a way that supported recommendations for improving the I-405 corridor. As a result of his analysis,
Wade was able to uncover the true impact of adding toll lanes to I-405 on everyday commuters.