Imperial College London · Web viewto achieve traffic equilibrium from the network level based on...

22
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < Abstract—With the rapid development of urbanization, the problem of urban traffic congestion has become increasingly prominent. Dynamic route guidance will be a powerful way to improve the capacity of urban traffic management and mitigate traffic congestion in big cities. In the design of simulation-based experiments for most dynamic route guidance methods, the simulation data is generally estimated from a specific traffic scenario in the real-world, therefore, they are strongly dependent on the simulation data due to lack of adaptive capabilities. However, highly dynamic traffic in the city means that traffic scenarios in real systems are diverse. When out of the specific traffic scenario, the traffic efficiency improvement is limited for most dynamic route guidance methods. Thus, the ideal dynamic route guidance methods should have highly adaptive learning ability under diverse traffic scenarios, so as to have extensive improvement capabilities for different traffic scenarios. In this paper, an A ¿ trajectory Rejection method based on multi- agent Reinforcement learning ( A ¿ R 2 ) is proposed, which integrates both system and user perspective to mitigate traffic congestion and reduce travel time (TT) and travel distance (TD). First, due to the adaptive learning ability, the A ¿ R 2 can comprehensively analyze the traffic conditions for different traffic scenarios and intelligently evaluate the road congestion index from a system perspective. Then the A ¿ R 2 determines the routes for all the vehicles in the road network from a user perspective according to the road network congestion index. An extensive of simulation experiments show that under various traffic scenarios, the A ¿ R 2 can rely on adaptive learning ability to achieve better traffic efficiency. Moreover, even if many drivers are not fully complied with the route guidance, the traffic efficiency can also be improved significantly by A ¿ R 2 . This work was supported in part by the National Natural Science Foundation of China under Grant 61572369, and Grant 61711530238 (Corresponding author: Wenbin Hu, Simon Hu) C. Tang is with the School of Computer Science, Wuhan University, Wuhan 430072, China (e-mail: [email protected]). W. Hu is with the School of Computer Science, Wuhan University, Wuhan 430072, China (e-mail: [email protected]). S. Hu is with the School of Civil Engineering, ZJU-UIUC institute, Zhejiang University, Hangzhou 310058, China. He is also with the Civil and Environmental Engineering Department, Imperial College London, London, UK (e-mail: [email protected]). M. Stettler is with the Civil and Environmental Engineering Department, Imperial College London, London, UK. (e-mail: [email protected]). Index Terms—Urban transportation network; traffic congestion; dynamic route guidance; reinforcement learning; A ¿ algorithm I. INTRODUCTION odern urban transportation network is a complex system with high dynamic and uncertainty. With the fast M An Urban Traffic Route Guidance Method with High Adaptive Learning Ability under Diverse Traffic Scenarios Chuanhui Tang, Wenbin Hu, Simon Hu, and Marc Stettler 1

Transcript of Imperial College London · Web viewto achieve traffic equilibrium from the network level based on...

13

> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) <

An Urban Traffic Route Guidance Method with High Adaptive Learning Ability under Diverse Traffic Scenarios

Chuanhui Tang, Wenbin Hu, Simon Hu, and Marc Stettler

Abstract—With the rapid development of urbanization, the problem of urban traffic congestion has become increasingly prominent. Dynamic route guidance will be a powerful way to improve the capacity of urban traffic management and mitigate traffic congestion in big cities. In the design of simulation-based experiments for most dynamic route guidance methods, the simulation data is generally estimated from a specific traffic scenario in the real-world, therefore, they are strongly dependent on the simulation data due to lack of adaptive capabilities. However, highly dynamic traffic in the city means that traffic scenarios in real systems are diverse. When out of the specific traffic scenario, the traffic efficiency improvement is limited for most dynamic route guidance methods. Thus, the ideal dynamic route guidance methods should have highly adaptive learning ability under diverse traffic scenarios, so as to have extensive improvement capabilities for different traffic scenarios. In this paper, an trajectory Rejection method based on multi-agent Reinforcement learning () is proposed, which integrates both system and user perspective to mitigate traffic congestion and reduce travel time (TT) and travel distance (TD). First, due to the adaptive learning ability, the can comprehensively analyze the traffic conditions for different traffic scenarios and intelligently evaluate the road congestion index from a system perspective. Then the determines the routes for all the vehicles in the road network from a user perspective according to the road network congestion index. An extensive of simulation experiments show that under various traffic scenarios, the can rely on adaptive learning ability to achieve better traffic efficiency. Moreover, even if many drivers are not fully complied with the route guidance, the traffic efficiency can also be improved significantly by .

This work was supported in part by the National Natural Science Foundation of China under Grant 61572369, and Grant 61711530238 (Corresponding author: Wenbin Hu, Simon Hu)

C. Tang is with the School of Computer Science, Wuhan University, Wuhan 430072, China (e-mail: [email protected]).

W. Hu is with the School of Computer Science, Wuhan University, Wuhan 430072, China (e-mail: [email protected]).

S. Hu is with the School of Civil Engineering, ZJU-UIUC institute, Zhejiang University, Hangzhou 310058, China. He is also with the Civil and Environmental Engineering Department, Imperial College London, London, UK (e-mail: [email protected]).

M. Stettler is with the Civil and Environmental Engineering Department, Imperial College London, London, UK. (e-mail: [email protected]).

Index Terms—Urban transportation network; traffic congestion; dynamic route guidance; reinforcement learning; algorithm

INTRODUCTION

M

odern urban transportation network is a complex system with high dynamic and uncertainty. With the fast development of modern society, urban traffic congestion has become a daily problem in many cities due to the increasing of traffic demands. Route guidance in dynamic environment is a hot research topic in the field of intelligent transportation system (ITS), will be a powerful way to improve the capacity of urban traffic management and mitigate traffic congestion in big cities. One consensus in many route guidance methods is to avoid traffic congestion as much as possible [1]–[4]. Traditionally, using traffic flow guidance may improve traffic efficiency to a certain degree; hence many big cities have been vastly installing variable message signs (VMS) or by means of radio to broadcast real-time traffic flow information. But the information in these systems is open to all drivers, and congestion can easily be transferred to different parts of the network. These drawbacks can be overcome by predicting future traffic conditions [5]–[7], but in some of the existing route guidance methods, the origin-destination (OD) set is assumed to be known by the system, which would obviously be unrealistic assumption [8], [9]. In addition, the application scenarios of many of the existing route guidance methods, require various conditions as a prerequisite, for example, need to know the road impedance function [10], [11].

The route guidance of urban traffic is essentially the shortest path problem in graph theory. The algorithm is the basic method to solve this problem, and quite a lot of research has been done based on algorithm [12]–[15]. In [13], an improved algorithm is proposed to improve the reliability of route guidance by adding penalties to links with high probability of congestion. The algorithm is widely used because it is easy to be improved and high efficiency in searching for optimal solutions. For instance, in [14] a bidirectional time-varying optimal path algorithm based on improved heuristic function and backward search technology is proposed, which greatly speeds up the path solving speed. However, these methods do not take into account the different interests of system perspective and user perspective. Many methods try to estimate the current TT or predict the anticipated TT. These methods can be divided into two categories: methods based on historical traffic data [16]–[18] and methods based on real-time traffic data [19]–[21]. Methods of first category may suffer from high computational complexity, and in addition, it is possible that a best match cannot be found in the historical database due to changes in traffic patterns. In addition, for methods of second category some of the prerequisites required are often difficult to meet in an actual transportation system, such as the speed-flow-density relationship.

The research of route guidance methods includes system perspective and user perspective. The user-oriented route guidance methods which focus on user’s preference compared to that a system-oriented have long been studied [22]–[24]. User’s preference that mainly refers to the shortest TT or travel distance (TD) is the primary consideration for the user-oriented guidance system. This may lead to traffic congestion because the user-optimal path only considers maximizing the user’s interest. Route guidance methods for finding system-optimal traffic distribution is also considered by many research [25]–[27]. It is well-known that a system-oriented guidance system sometimes seems to lack of fairness because certain users’ interests were sacrificed for the sake of system’s interests. Thus, in a system-oriented guidance system, when the unfairness reaches a certain level, the user may not follow the route guidance, thereby violating the original intention of the system optimization. Preventing the difference between the route proposed by the system-oriented system and the shortest route is too large, and a compromise can be achieved between the system perspective and the user perspective [1], [3], [28]. However, the complex parameter settings of these methods under different traffic scenarios make them lack certain adaptive ability.

Multi-agent system (MAS) has been widely used in path guidance [29]–[31], because it has the characteristics of self-adaptation and cooperation, and can exchange information effectively and real-time based on VANET [32]. For instance, Cao [2] propose a pheromone-based MAS, in which infrastructure agents collect pheromones left by vehicle agents on the road to predict the road congestion level in the near future. Currently, with the continuous development of machine learning, MAS combined with reinforcement learning (RL) bring new ideas to the traditional traffic control problem. Optimization algorithms based on RL has been a research frontier of ITS in recent years [33]–[36]. Optimization strategies applied in real networks are mostly applied to traffic signals to minimize traffic delays or maximize network capacity [37]–[39]. In addition, some researches have explored the application of RL in dynamic route guidance. In [35], the authors introduce the concept of node pressure into the traffic control system, and proposes a static and dynamic solution based on MAS and RL for reducing the node pressure. Basha [40] uses RL to solve the dynamic traffic routing problem based on a cell transmission model (CTM), so that the transportation network can achieve an equilibrium state. In [41], a Q-based dynamic programming and a Boltzmann distribution are used to create a priority route for the vehicle. However, in these methods, the agent is isolated from each other, and cannot achieve the overall coordination and cooperation of the agent. In some cases, even the coordination and cooperation of the agent is realized, the time and space cost is very high and the convergence speed of the algorithm is low.

As the development of system simulation technology, traffic simulation is a powerful tool for obtaining actual effect that cannot be obtained from analytical methods. It is often evaluated by a simulation-based experiment whether a route guidance method is good or bad. However, the simulation data is generally estimated from a specific traffic scenario in the real-world, as it is difficult to estimate all traffic scenarios in the real-world. In fact, any particular simulation is an imperfect approximation of a real-world system, and no single traffic scenario can accurately represent a real system. Therefore, most dynamic route guidance methods are strongly dependent on the simulation data due to lack of adaptive capabilities. For instance, dynamic random shortest path (DRkSP) algorithm cannot adaptively adjust the value of to achieve the best performance in different traffic scenarios. Moreover, dynamic shortest path (DSP) algorithm performs well if the network with less traffic, but when there is a lot of traffic on the network, it is easy to transfer congestion to a different part of the network.

In this paper, we propose an trajectory Rejection method based on multi-agent Reinforcement learning () to effectively mitigate traffic congestion and to reduce TT and TD under diverse traffic scenarios. To achieve this goal, the is designed to integrate both system and user perspective, i.e., to mitigate the traffic congestion of the system and reduce the TT and TD for the user. The method consists of two parts, the Module Q-learning () algorithm and the trajectory Rejection () algorithm. First, based on algorithm, which is an improved multi-agent RL (MARL), the can comprehensively analyze the traffic conditions and intelligently evaluate the road congestion index from network level. Then, based on algorithm, the determines the routes for all the vehicles in the road network from a user perspective according to congestion index of road network. Because RL has lower requirements for system prior knowledge, it is good at mastering a new traffic scenario from scratch, and has strong adaptive ability for different traffic scenarios. Thus, no matter which traffic scenario it is, the can learn the best strategy to mitigate traffic congestion from system perspective and increase traffic efficiency from user perspective adaptively.

The remainder of the paper is organized as follows: Section II presents the system architecture of method. Section III describes our proposed method. Section IV presents the simulation of method inner city transportation network and provides experimental results. Section V concludes the full paper.

System architecture

This section presents the system architecture of the method. From Fig. 1, the system architecture comprises of three parts: Location Services, Traffic Control Center, and Vehicle Agents (VA).

Traffic conditions in the system can be described by the total number of vehicles in the transportation network. Therefore, Location Services is used to obtain the real-time location of vehicles, and then calculate the number of vehicles on each road in the road network.

Fig. 1. System architecture.

Traffic Control Center helps to improve traffic efficiency from system perspective, and dedicate to mitigate traffic congestion in high dynamic cities. In this system architecture, Traffic Control Center generates Virtual Road Agent (VRA) for every single road in the network. Each VRA first obtains information about the road it controls from Location Services, then applies the algorithm to calculate the travel cost of each road of network. Thus, Traffic Control Center relies on a hybrid architecture, that is, even if it runs centralized on a single server, the decisions of VRAs are decentralized. There are various criteria to define the travel cost of roads in road networks, such as the TT of roads. In our study, we define the travel cost of a road as congestion index, determined by the action taken by the corresponding VRA (detailed in section III), and its value is equal to the free flow TT multiplied by a real number greater than one. The congestion index is set by the VRA to achieve traffic equilibrium from the network level based on real-time traffic condition. Because can grasp the potential regular patterns of various traffic scenarios and is good at mastering a new traffic scenario from scratch, so as to learn the optimal control strategy that will neither result in congestions nor low utilization of the road. Thus, no matter which traffic scenario it is, the can learn the best strategy to mitigate traffic congestion and increase traffic efficiency adaptively from system perspective.

The VAs elaborate on how to improve traffic efficiency from user perspective, and dedicate to minimize TT and TD. Each VA deployed on a vehicle, applies algorithm provides vehicle with capabilities to find the user-optimal route according to the congestion index of road network. Thus, VAs rely on a decentralized architecture.

Thus, this system is hierarchical architecture in which Traffic Control Center relies on a hybrid architecture to calculate the congestion index of road network from system perspective, while all VAs rely on a decentralized architecture to determine user-optimal routes.

The workflow is as shown in Fig. 1. First, Location Services is used to get the number of vehicles on each road in the road network. Then the data, which represents the current traffic condition, is transmitted to Traffic Control Center. After gathering all the necessary information, the Traffic Control Center will set congestion index for all roads to manage traffic. Finally, the congestion index of each road will be delivered to the VAs of all vehicles to provide decision-making support for each vehicle. It should be noted that to simplify the analysis, we start with an assumption of 100% compliance rate, i.e., all vehicles follow the routes provided by route guidance method completely.

Proposed Method

This section describes the methods we proposed in detail. The method consists of two parts, the algorithm and the algorithm. Because each VRA and each VA can work independently and will not interfere with each other, algorithm and algorithm can run in parallel, therefore increasing efficiency. In the next few subsections, we will introduce the and algorithm separately.

Algorithm

The detailed process of adjusting the network congestion index is shown in TABLE I.

TABLE I

Process of Adjusting the Network Congestion Index

Adjust the network congestion index

for time =1,2,3… do

Get the number of vehicles on each road in the road network.

for each VRA do

Use algorithm to calculate the congestion index of the corresponding road.

end for

end for

The process is repeated until all vehicles have reached the destination. It should be noted that the time interval in this paper is represented by , and its optimal setting is discussed in the experimental part.

In this subsection, we first present the definition and notation about algorithm. Then we will introduce the mechanism of 2-agent RL, for it is the basic of algorithm. At last, we will present the implement of algorithm.

1) Definition and Notation: Given an urban road network , which consists of a set of road sections (directed edge, namely two different directions of a real road will be regarded as two different road sections) , (M is the number of road sections) and a set of intersections , (N is the number of intersections). The goal of algorithm is to obtain accurate estimated travel cost of roads from available surveillance data according to the spatial and temporal characteristics of different traffic scenarios. We use , and other symbols to represent the roads, and the VRA of is denoted as . Where the agent acts as a process controller, the environment of transportation system can be modeled as a Markov decision process (MDP) to describe the process of interaction between the transportation system environment and the agent. The optimal strategy for MDP can be solved by RL. Single-agent RL, namely -learning algorithm, is the main type of most work. The optimal strategy for agent under arbitrary state is to select action for the largest , and then by any value function, if the agent select the action at each iteration according to the following formula, then after an infinite number of iterations, the sequence will converge to the optimal value.

(1)

Where is the learning rate and is the discount factor.

However, in -learning algorithm, the concern is to maximize payoffs of a lone agent, without taking into account the impact of decisions made by other agents. To overcome this shortcoming, the system should be modeled as a MARL model, in which each VRA can coordinate with other agents to optimize the road state by analyzing the possible impacts of other agents and making decisions. Before describing our model in detail, we first introduce the definitions of state, action, and reward in our model, because it is one of the main challenges in designing RL systems.

•State Definition: Number of Vehicles. On each road, the number of vehicles is a key factor in whether the road is congested. So in order to greatly simplify the design and reduce space complexity of MARL, the VAR’s state is designed to be represented by the total number of vehicles at the corresponding road.

•Action Definition: Congestion Index of the Road. Action of VAR is designed to reduce congestion from system perspective, will effectively manage the traffic flow of each road within the next time. So the VRA’s action is designed to represent the variable congestion index of the road. The congestion index is set by the VRA to achieve traffic equilibrium from the network level, it can be used as the travel cost of the road. The road congestion index is negatively correlated with the expectation for road traffic in next time, that is, the larger the value, the smaller expectation for road traffic. In this paper, any action in the action space is the free flow TT multiplied by a real number greater than one.

•Reward Definition: Reduction in the Distance of the Corresponding Road between Number of Vehicles and Optimal Flow. Where distance is defined as |Number of Vehicles - Optimal Flow|. In this paper, the difference between the distance of two successive decision points is defined as the reward for the corresponding VRA. After executing the selected action, the reward can be positive or negative. Only when the distance between the number of vehicles and the optimal flow is reduced, the reward is positive. Otherwise, a negative reward means that the action results in a low road utilization or congestion.

• (i.e., Optimal Flow of Road ): The Optimal Traffic Flow of Road . In a road, a very small traffic flow will lead to low utilization of the road (waste of road resources). On the contrary, a very large traffic flow will result in congestions, which will reduce traffic efficiency. We define Optimal Flow as the optimal traffic flow of road, where ‘optimal’ means the vehicles number will neither result in congestions nor low utilization of the road. The optimal traffic flow depends on the objective properties of the road, and the related attributes of optimal flow are shown in TABLE II.

TABLE II

Attributes of Optimal Flow

Parameter

Value

Description

lan

Lanes of the road section.

len

Length of the road section.

sth

Smoothness of the road section, the larger the value, the more flat the road is.

rto

The ratio of the speed limit of the road section to the max speed limit of the road network.

grd

Gradient of the road section, the steeper the road, the smaller the value.

sfy

Road safety. The larger the value, the smaller the possibility of accident happening.

According to the attributes described in TABLE II, the calculation of is defined as follows

(2)

Where is the optimal flow coefficient of transportation network, which adjust the calculation of optimal flow. If the number of vehicles in the whole transportation network is very small, then the can be set at a smaller value. If the number of vehicles is very large, then should be set at a larger value. The setting of has great influence on the effect of the algorithm, which is explored in the experimental part.

TABLE III shows the notation system to be used the algorithm.

TABLE III

Notation

Notation of algorithm

G

Urban road network.

E

Set of road sections,.

V

Set of intersections,.

The time interval until the next calculation of the congestion index.

Road section .

VRA of road section .

State space of ,

Action space of , .

The optimal traffic flow of road .

Optimal flow coefficient of road network.

Neighbor roads of road , furthermore, the roads that are close to road and have a direct impact on road .

The th neighbor road of road .

An event of , .

An event set of , .

A set of sequence that contains a joint state and an action, .

The number of occurrences of event in up to time t.

The number of occurrences of events of in up to time t.

2) 2-agent RL: Due to the inherent high dynamics and uncertainty nature of the urban traffic system, route guidance is an excellent application scenario for MARL. MARL is composed of multiple agents, which may be competitive or cooperative in the environment. The simplest MARL is combined by each lone single-agent, which assuming a stationary environment and only maximize their own payoffs. However, problem of dimensionality is the major challenge associated with the application of MARL in route guidance problem. In road network , each agent need maintain a table whose size is, where represent the state spaces for road , and represent the action spaces. Thus, with the increase of agents, MARL encounters the dimensionality problem that arises because both joint state and joint action spaces are exponentially growing. The problem of dimensionality not only cause a large occupation of memory space, but also leads to slow convergence of functions. So a new coordination mechanism must be designed to avoid these problem.

As background for next presentation, we introduce the definitions of 2-agent RL.

Definition 1: 2-agent RL in this paper can be represented by the tuple where:

· , is discrete joint state space of and , where , are referred to as the local state space of and , respectively. Local in here means that the components are belong to just one VRA.

· , is discrete joint action space of and , where , are referred to as the action space of and , respectively.

· is the transition probability function, means the probability of take the action when take the action at the joint state , where , and . Obviously the transition probability map subject to constraint: .

· is the reward function, means the reward when and take joint action at joint stage and then joint stage transit to .

Definition 2: A 2-agent RL is said to be reward independent if there exist and such that

(3)

Where , and , respectively. In other words, the overall reward of and is composed of two independent reward functions, each of which depends only on the local state and next local state of the VRAs.

Definition 3: A 2-agent RL is said to be transition independent if transition probability function subject that

(4)

(5)

Where . That is, the new action of each VRA depends only on its previous joint stage.

Obviously, if a 2-agent RL is both transition independent and reward independent, then it can be solved as two separate MDPs.

Definition 4: An event is a triplet that includes a joint state and an action. An event set is a set of events at joint state .

Definition 5: A record is a set of sequence that contains a joint state and an action, which means take the action at joint state. Let symbol represent the th record , and in the joint state subject that is outcome state of .

So, transition probability at time can be derived by

(6)

Where represents the number of occurrences of event in record , represents the number of occurrences of events of in record . So and can be derived by

(7)

(8)

Where is indicator function.

3) Implement of Algorithm: In order to reduce congestion from system perspective, algorithm is designed to apply RL to calculate the congestion index of each roads of network based on collected real-time traffic condition information. Congestion index is used as travel cost of the road, the larger value, the smaller expectation for road traffic. In the ideal case all vehicles in the transportation network comply with the expectation of the traffic flow, which provides global analysis and decision-making support for determining optimal routes. The principle of algorithm is that although the vehicle may choose not the user-optimal route, but it can meet the transportation network traffic expectations, thus mitigate congestion and reduce the overall TT delay.

Instead of consider equilibrium as necessary, the goal of the algorithm is to get an optimal payoff in the presence of other VRAs. To overcome the complexity barrier, for any , only some VRAs which have the greatest impact on the should be considered. We think the neighbor roads have greatest impact on road , and its definition is as follows.

• (i.e., Neighbor Roads of Road ): Roads that are close to Road and have a direct impact on Road . Taking the road in Fig. 2 as an example, is {,,,,, ,}. The traffic condition of road is directly affected by the any road of the neighbor roads. Although , , , ,, and are also close to , but their traffic condition cannot directly affect the traffic condition of , so they are non-neighbor roads. In order to better describe, we use to represent the th neighbor road of road .

Fig. 2. Neighbor roads of .

For any , can form a module with , this module consists of multiple independent 2-agent RL. Taking the road in Fig. 2 as an example, the VRA of road can form a module with neighbor roads’ VRA. In this module, VRA of road and each VRA in forms independent 2-agent RL in turn.

To simplify the presentation, for any road , the th neighbor road of road ( ) is represented by symbol . At time , not only the states , of and can be observed, but also the information prior to time has been recorded, such as the best action selection , and the reward value of and at time .

So, the best response at joint state can be derived by

(9)

can be updated by

=

(10)

Where is the learning rate and is the discount factor.

The challenge is how to coordinate these multiple independent 2-agent RL to get the best action . Following formula will demonstrate the procedure.

(11)

The best action is the congestion index of road at time , it can be used as the travel cost of the road. Congestion index of each road will then deliver to VAs of all vehicles for determining optimal routes.

Algorithm

The detailed process of finding the user-optimal route for each vehicle is shown in TABLE IV. Where is referred to as the all of vehicles at any time in network. The process is repeated until all vehicles have reached the destination.

TABLE IV

Process of Finding the User-optimal Route for Each Vehicle

Find the user-optimal route for each vehicle

for VA of f do

if f will leave the current road do

Apply algorithm to compute user-optimal route of vehicle .

end if

end for

In the field of transportation, there are many researches on the efficiency and application of the shortest path search problem. The is the leading algorithm that combines breadth-first search algorithm with greedy strategy. Based on the algorithm, the heuristic function is further improved, and the algorithm is proposed. In this subsection, we first introduce the network representation in algorithm for it is the basic of the algorithm. Then we will introduce the implementation of the algorithm. At last, we will analyze the advantage of the algorithm.

1)Network Representation: The urban transportation network is large and complex, with numerous intersections. During the driving of the vehicle, a large number of intersections is an important factor affecting the TT of the vehicle. To better solve the problem of determining the best route, it is necessary not only to know the TT of the roads, but also to know the wasted time within intersections. In the most widely used network representation, each intersection is represented by only one node, the direction of travel between intersections is represented by edges, and the weight of the edges represents a certain cost, as shown in Fig. 2; However, it is difficult to represent the 16 different vehicle travel trajectories as shown in Fig. 3 within the intersection. Therefore, the widely used network representation should be improved to avoid these problems.

Fig. 3. 16 different travel trajectories.

In order to represent the travel cost within the intersection, we can expand the intersection node one-to-many according to the actual construction of the intersection, then add the virtual edge to represent the sixteen movements in the intersection, and the weight of the virtual edge represents the travel cost of the corresponding movement. As shown in the left part of Fig. 4, node (intersection) of original network representation is expanded to 8 nodes () of new network representation, and 16 virtual edges added represent the corresponding movements within the intersection . Therefore, after this conversion, the new network representation of original transportation network in Fig. 2 is shown in the right part of Fig.4.

Fig. 4. Network representation.

Use notation to represent all nodes in the new network representation. To indicate the relationship between the node in the new network representation and the node in the original network representation, for any node , we define to represent the original network’s node whose extension contains node. For example, in the right part of Fig. 4, , and .

In the following section, whether it is a virtual edge or a real edge, it will be called a ‘road’.

2) Implement of Algorithm: Compared to other shortest path algorithms (such as the Dijkstra algorithm), the advantage of the is that the algorithm combines breadth-first search algorithm with greedy strategy, which greatly improves the efficiency of search. In algorithm, heuristic function is used to reduce a large number of invalid searches. Given current location , destination , for any node , the estimation function of node is obtained by

(12)

Where is distance function, that represent the shortest TT from the current location to , and is the heuristic estimate of the TT between and destination . In each step, among the nodes that have been accessed but not expanded, the node selected for expanding is the node with the smallest value, which means that the node is most promising to reach the destination faster. In algorithm, the heuristic estimate function is very important.

algorithm is the basic method to solve the shortest path problem, but we can't directly use it in high dynamic urban road network, because the Traffic Control Center’s system-oriented traffic flow induction strategy may have a conflict with the interests of individual vehicles. For example, if the Traffic Control Center guides the vehicle to change its route, which contains a large number of road sections that the vehicle has traveled, then the vehicle needs to consider whether it is worth taking this route. The algorithm does not take into account the effects of the road sections that has already traveled. Therefore, algorithm must be further improved.

We think that re-selecting the roads that have already been driven is a waste of road resources. The principle of the algorithm is to add penalties to this behavior. In the algorithm, the notation represents all the roads that the vehicle has traveled, and the notation represents the all the roads of the optimal route from the current location to the node. Use notation to denote the set of previously traveled roads contained in , then is calculated as follows

(13)

Use notation to represent the total time the vehicle spent on all the roads, and represent the total time the vehicle spent on road . Then we define as follow

(14)

So , and if it satisfies that and , then the following equation is satisfied

(15)

Then the function is defined as the total penalty for calculating the optimal path from the starting node to the node, which can be obtained by

(16)

Where is the penalty coefficient of algorithm. The value of has great influence on the effect of the algorithm. If is too small, the algorithm will degenerate into the algorithm, and the setting of the value will be discussed in the experiment.

If the node is extended by the node through the road , then the recursive formula of the can be obtained as follows

(17)

Then the recursive formula of the can be obtained from the following equation

(18)

Then the recursive formula of the functions can be obtained from the following equation

(19)

Where is indicator function.

Finally, redefine the estimation function of the algorithm as

(20)

Where are referred to as the distance function and the heuristic estimate of the algorithm, respectively. And they can be derived by

(21)

(22)

Where , in this paper, .

Thus the recursive formula of the can be obtained as follows

(23)

Where represent the TT of road .

3) Advantage of Algorithm: Fig. 5 shows how the algorithm works. At time , vehicle departs from the node and the destination is the node. According to the congestion index of all the roads of the time , the optimal route calculated by the vehicle by the algorithm is .

Suppose that the vehicle arrives at the node at time , thus the , where represent of vehicle . At the same time, the vehicle departs from the node at time , and the destination is also the node. In addition, for some reasons, road is congested at time , so the congestion index of is very large. At time , there are two paths from the current position () to the node, i.e., and .

For vehicle , ,so which path to choose just depends on which path is shorter, that is, the total congestion index of the path is the smallest. Because the congestion index of is very large, the vehicle will select the as long as the congestion index of the is sufficiently large.

Fig. 5. Principle of A*R.

However for vehicle , and , so. If vehicle choose , because contains road , vehicle must suffer not only the total congestion index of but also the penalty of road . So vehicle will select the as long as the penalty of road is sufficiently large.

As we all know, many route guidance methods may lead to inevitable congestion by providing the same alternative for multiple vehicles while congestion occurs. From this small example, we see that algorithm can assign a variety of routes to vehicles that even with the same current location and destination.

Experiments

The major objective of this section is to investigate the performance of the method under various traffic scenarios by comparison with other methods, and study whether the method has strong adaptive ability for different traffic scenarios. Specifically, we focus on the following questions.

• How do parameters (time interval , penalty coefficient , optimal flow coefficient ) influence the performance under various scenarios?

• What is the performance from user perspective under various scenarios (i.e., minimize TT and TD)?

• What is the performance from system perspective under various scenarios (i.e., mitigating congestion)?

• Is the system still robust at low compliance rates (i.e., many drivers are not fully complied with the route guidance)?

The following content will answer these questions. We start with an assumption of 100% compliance rate, i.e., all vehicles follow the routes provided by route guidance method completely. But in the final part of the experiment, the assumptions will be relaxed to study the robustness of the system at low compliance rates.

Experiment Setup

VANET Simulator is used to simulate the movement of vehicles in the transportation network. VANET Simulator is an open-source simulator for microsimulation that supports easy adaptation and extension of features. Each vehicle in VANET Simulator is simulated individually and takes decisions on its own so that road traffic is simulated as realistic as possible.

1) Dataset of the Transportation Network: First, we introduce the dataset of the transportation network we used in the experiments.

Fig. 6. Yizhuang economic and technological development zone, Beijing, China.

As shown in Fig. 6, a real transportation network of Yizhuang economic and technological development zone of the Beijing city, China is obtained from OpenStreetMap. The dataset describes the topology of the road network, and the information about the lanes, length, gradient and maximum speed limit of each road. To simplify our experiment, all roads are set to be bidirectional and all roads are divided into two types: the main road and the secondary road (the thicker lines in Fig. 6 represents the main roads, and the thinner lines represents the secondary roads), the same type of roads have the same attribute values except for the length. The associated attribute (defined in TABLE II) values of the two road types are shown in the TABLE V.

TABLE V

Attribute Values of the Two Road Types

main road

secondary road

lan

3

2

sth

0.9

0.7

speed limit

60km/h

50km/h

grd

1

0.9

sfy

0.9

0.7

2) The Methods for Comparison: Then, we will introduce the route guidance methods for comparison with the method. By constructing different scenarios, the method is compared with four different route guidance methods: shortest path (SP) algorithm, DSP algorithm, DRkSP algorithm and dynamic traffic assignment (DTA) algorithm.

The SP will calculate the shortest TT path from origin to destination of each vehicle based on the speed limit. Thus SP is a no-reroute route guidance method that does not require any rerouting during the travel time.

The DSP is a classical dynamic route guidance method that computes the shortest path based on real-time speed in the road network, thus it is a current shortest time path. The advantage of DSP is its simplicity, the service periodically collects real-time traffic condition, then assigns to each vehicle a new current shortest time path. The drawback of DSP is that it only considers the current traffic condition when performing rerouting, regardless of the road sections that has already traveled and the future impact of current rerouting. Thus, it is possible to generate new congestion or choose a farther route.

The DRkSP is another classical dynamic route guidance method, at each time the service according to real-time TT in the road network assigns each vehicles to one of the paths randomly for balancing the traffic. In the experiments, to prevent the difference between the best path and the worst path is too large, we do not limit to a fixed value, but we require that the difference between the length of the best path and the worst path does not exceed 20%. Thus, the DRkSP method achieves a compromise between the system perspective and the user perspective to a certain extent. This is why the DRkSP method is used.

In the field of traffic assignment and optimization, the DTA models are widely used. The current research on DTA can be broadly divided into two categories, i.e., simulation-based methods and analytical methods. In this paper, simulation-based DTA deployed in [11] is used to achieve user equilibrium through iterative simulation. The DTA first calculate the time-dependent costs of the road segments by simulation and then updates the route choices of some vehicles. The process is repeated until certain conditions are met (i.e., the routes of all vehicles no longer changes, the TT of the road network does not change or the number of iterations reaches the maximum). Thus, the DTA is a method that has adaptive ability for different traffic scenarios.

In addition, because all the algorithms except the SP algorithm have certain randomness, in order to make the experimental results more statistically significant, we average the results over 10 runs.

3) Traffic Scenarios: Finally, we will introduce several traffic scenarios that we designed in our experiments. In order to study whether the method has adaptive ability for different traffic scenarios, we need to design several traffic scenarios with different vehicle behavior patterns.

(a)

(b) (c)

(d) (e)

Fig. 7. The traffic scenarios.

As shown in Fig. 7, we have five traffic scenarios with different vehicle behavior patterns. All information in the OD set (departure time, origin, destination, etc. of each vehicle) is randomly generated according to different vehicle behavior patterns. In traffic scenario (a), the origin and destination of all vehicles are evenly distributed throughout the road network. In traffic scenario (b), if the origin of a vehicle is on the left part of the road network, its destination will be on the right part of the road network. Conversely, if the origin is on the right part of the road network, its destination will be on the left part of the road network. In traffic scenario (c), if the origin of a vehicle is on the upper left part of the road network, its destination will be on the lower right part of the road network. Conversely, if the origin is on the lower right part of the road network, its destination will be on the upper left part of the road network. In traffic scenario (d), the origin of a vehicle will only be on the left part of the road network, and its destination will only be on the right part of the road network. In traffic scenario (e), the origin of a vehicle will only be on the upper left part of the road network, and its destination will only be on the lower right part of the road network.

Optimal Setting of Parameters

TABLE VI shows the key parameters involved in the method, i.e., time interval , penalty coefficient , optimal flow coefficient . These three parameters have a great impact on the performance of method. We performed a lot of experiments to determine the optimal setting of the parameters. For simplicity, the experimental process of and are not described in detail. In the following experiments, we set and as 20s and 2, respectively. This is because when and are set to these two values, good results can be achieved under various traffic scenarios. In traffic scenario (b), we design a variety of OD sets, then set different values for and calculates the average TT and average TD of all vehicles. As shown in Fig. 8, the results of the average TT and average TD of the vehicles with the number of vehicles of 2000, 4000 and 6000 are given respectively.

TABLE VI

Key Parameters of

time interval; by default .

penalty coefficient of algorithm; by default .

optimal flow coefficient.

(a)

(b)

(c)

Fig. 8. Average TT and TD for different . (a) veh=2000. (b) veh=4000. (c) veh=6000.

In the case of a small number of vehicles, the average TT and TD decrease with the increase of at first, and then tend to be stable. This is because when the value of is too small, transportation network is easily considered to be congested, and the method will guide the vehicle to travel those routes which are not optimal, resulting in the increase of TT and TD. When the value reaches a certain value of , because the number of vehicles is relatively small, most roads cannot meet the congestion conditions prescribed by method, so most vehicles will still choose the optimal route. When > , most of roads are hard considered to be congested, so the average TT and TD tend to be stable. When the number of vehicles is 2000, 8.

In the case of a moderate number of vehicles, the average TT of the vehicle decreases rapidly before the increase of to and then increases slowly with the increase of . This is because when the reaches a certain value, it improves the criteria for judging road congestion. But because of the relatively large traffic flow on some roads, congestion actually occurs in the transportation network. When the number of vehicles is 4000, =12.

In the case of a large number of vehicles, the traffic flow of the whole transportation network is very large, so the must be very large, and the average TT is at a higher level before the increases to for the small will lead to a greater probability of road congestion, causing the state of the transportation network vibration. When > , it is easy to generate actual congestion, and the average TT increases rapidly. When the number of vehicles is 6000, =20.

TT and TD

In order to verify the performance of the method in reducing the TT, we compares the method with DSP, DRkSP and DTA under different scenarios. In these five traffic scenarios, the size of OD set is 5000, i.e., the number of vehicles is 5000. As shown in Fig. 9, In traffic scenario (a) and (b), the performance of all methods are similar. But in other scenarios, advantages of method gradually emerged. The method can reduce the average TT of vehicles in varying degrees regardless of the traffic scenarios. DSP method can dynamically update the driving path according to real-time traffic conditions. But it ignores the impact of the road sections that has already traveled and the future impact of current rerouting. So it has poor performance in some traffic scenarios. The reason why the average TT of method increases less is that with the increase of the number of vehicles it can obtain better strategies through RL. This trend eventually leads to the average TT of all method being satisfied:. This shows that the method has strong adaptive ability for different traffic scenarios.

Fig. 9. Average TT for different traffic scenarios.

In addition, we also studied the winners and losers in traffic scenario (b), that is how much time each vehicle saved or wasted against the SP method. We use negative numbers to represent wasted time, positive numbers to indicate time saved. As shown in Fig. 10, we show the results in a histogram, where the horizontal represents time segmentation and the vertical represents the number of corresponding vehicles.

Fig. 10. The winners and losers compared to the SP method. TS01=[-165,-110),TS02=[-110,-55),TSX=(-220+X*55,-165+X*55),…,TS17=[715,770).

Vehicle average TT and average TD are the key indicators to evaluate route guidance method. Experiments show that for most route guidance method, the average TT and average TD have a strong positive correlation in the scenario of fewer vehicles. Fig. 11 shows the TD comparison of with SP, DSP, DRkSP, DTA under different traffic scenarios. The SP algorithm plans the shortest path of all vehicles according to the maximum speed limit, and the path does not change any more, so the average TD is the smallest. The TD of DSP is the biggest, the main reason is that only the current state of transportation network is considered in path planning, regardless of the road sections that has already traveled, which causes the state of transportation network to oscillate. The DRkSP avoids the shortcomings of DSP algorithm by randomly selecting multi routes, but obviously it is not enough to use random method only. The method considers the penalty for the roads that the vehicle has traveled, avoiding the re-selection of the road that has been traveled, thus greatly reducing the average TD.

Fig.11. Average TD for different traffic densities.

The experimental results show that the method has strong adaptive ability for different traffic scenarios. Thus, the can increase traffic efficiency from user perspective under no matter which traffic scenario it is.

Traffic Congestion

The principle of method is to balance the traffic flow on all roads of the network to improve the utilization of road resources and reduce the number of congested roads.

As shown in the Fig. 12, by detecting the number of congested roads in the network in each iteration (every 20 seconds), we have obtained the ability of all methods to alleviate traffic network congestion in traffic scenario (b), the size of OD set is 5000. As can be seen from Fig. 12, the SP method cannot dynamically adjust the path selection, the number of congested roads have been at a high level, and the downward trend is very slow. The DSP and the DTA algorithm have the ability of re-route the path, so it performs better than the SP algorithm. The method has higher congestion number than DTA in the early period, but because of its adaptive learning ability, the trend of mitigate traffic congestion in the middle and late period is larger than that of DTA. This shows that the can increase traffic efficiency from system perspective under no matter which traffic scenario it is.

Fig. 12. The number of congested roads varies with the number of iterations.

Performance at Different Compliance Rates

In this paper, we assume that all vehicles are fully comply with the route guidance methods, but it is unrealistic, because not all vehicles in real life will install route guidance system and some users will not accept routes that do not meet the user’s preferences. The compliance rate is a key factor that must be considered in all dynamic route guidance methods design. In this paper, the compliance rate indicates that in each iteration, the probability that each vehicle participates in the new route calculation (perhaps the new route is the same as the original route) is , so the probability of continuing to travel on the original route is 1- . By changing the compliance rate, we measure the average vehicle TT as an indicator of performance of each route guidance method.

In scenario (b), and the size of OD set is 5000. Fig. 13 indicates that even under low compliance rates, all methods except the SP method can significantly improve the average TT. This is because as long as there are still vehicles comply with the route guidance methods, these vehicles not only can get a better route, but also give way to the rest of the vehicles.

Regardless of the compliance rate, the method has better performance than other methods. The reason for this is that, can adaptive adjust reroute strategy. The compliant vehicles avoid competing with the non-compliant vehicles for road resources, thereby obtaining better routes and reducing the likelihood of congestion.

Fig.13. Average TT varies with the compliance rate.

CONCLUSION

In this paper, we have proposed an urban traffic route guidance method with high adaptive learning ability in diverse traffic scenarios (). The adopts algorithm to assess real-time traffic conditions from system perspective based on real-time traffic information. Then algorithm is used to search the optimal route from the user’s perspective. The process is repeated periodically during travel. The has strong adaptive ability, no matter which traffic scenario it is, the is capable of leading to congestion, TT and TD reduction.

Despite the progress made in this paper, there is still some work that needs further research. In this paper, we did not consider the impact of traffic lights on traffic conditions. But it is well-known that the phases of the traffic lights in the city has a great influence on the traffic condition. In future work, we will study how to integrate the signal lights into the route guidance system, which will improve the performance of the system.

References

Angelelli E, Arsik I, Morandi V, et al. Proactive route guidance to avoid congestion[J]. Transportation Research Part B Methodological, 2016, 94:1-21.

Cao Z , Jiang S , Zhang J , et al. A Unified Framework for Vehicle Rerouting and Traffic Light Control to Reduce Traffic Congestion[J]. IEEE Transactions on Intelligent Transportation Systems, 2017, 18(7):1958-1973.

Wang S , Djahel S , Mcmanis J . A Multi-Agent Based Vehicles Re-routing System for Unexpected Traffic Congestion Avoidance[C]// IEEE International Conference on Intelligent Transportation Systems. 0.

Pan J , Popa I S , Borcea C . DIVERT: A Distributed Vehicular Traffic Re-routing System for Congestion Avoidance[J]. IEEE Transactions on Mobile Computing, 2017, 16(1):58-72.

Vlahogianni E I, Karlaftis M G, Golias J C. Short-term traffic forecasting: Where we are and where we’re going[J]. Transportation Research Part C Emerging Technologies, 2014, 43:3-19.

Li L , Su X , Wang Y , et al. Robust causal dependence mining in big data network and its application to traffic flow predictions[J]. Transportation Research Part C: Emerging Technologies, 2015:S0968090X15000820.

Alvarez-Marquez A, Aguilera I, Gentil M A, et al. Traffic Flow Prediction for Road Transportation Networks With Limited Traffic Data[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(2):653-662.

Y. C. Chiu, J. Bottom, M. Mahut, A. Paz, R. Balakrishna, T. Waller, and J. Hicks, "Dynamic traffic assignment: A primer." Transportation Research E-Circular E-C153 (2011).

Ben-Akiva M E, Gao S, Wei Z, et al. A dynamic traffic assignment model for highly congested urban networks[J]. Transportation Research Part C, 2012, 24(24):62-82.

Kachroo P , Sastry S . Traffic Assignment Using a Density-Based Travel-Time Function for Intelligent Transportation Systems[J]. IEEE Transactions on Intelligent Transportation Systems, 2016:1-10.

Juan Pan, Iulian Sandu Popa, Karine Zeitouni, et al. Proactive Vehicular Traffic Rerouting for Lower Travel Time[J]. IEEE Transactions on Vehicular Technology, 2013, 62(8):3551-3568.

Chen B Y , Lam W H K , Li Q . Efficient solution algorithm for finding spatially-dependent reliable shortest path in road networks[J]. Journal of advanced transportation, 2016, 50(7):1413-1431.

Kaparias I, Bell M G H, Bogenberger K, et al. An approach to time-dependence and reliability in dynamic route guidance[C]// Meeting of the Transportation Research Board. 2007.

Demiryurek U , Kashani F B , Shahabi C , et al. Online Computation of Fastest Path in Time-Dependent Spatial Networks[J]. 2011.

Chen B Y , Lam W H K , Agachai Sumalee…. Finding Reliable Shortest Paths in Road Networks Under Uncertainty[J]. Networks and Spatial Economics, 2013, 13(2):123-148.

Marković, Hrvoje & Dalbelo Bašić, Bojana & Gold, Hrvoje & Dong, Fangyan & Hirota, Kaoru. (2010). GPS Data-based Non-parametric Regression for Predicting Travel Times in Urban Traffic Networks. PROMET - Traffic&Transportation. 22. 10.7307/ptt.v22i1.159.

Wu C H , Ho J M , Lee D T . Travel-time prediction with support vector regression[J]. IEEE Transactions on Intelligent Transportation Systems, 2004, 5(4):276-281.

Elhenawy M , Chen H , Rakha H A . Dynamic travel time prediction using data clustering and genetic programming[J]. Transportation Research Part C: Emerging Technologies, 2014, 42:82-98.

Chen H , Rakha H A . Real-time travel time prediction using particle filtering with a non-explicit state-transition model[J]. Transportation Research Part C: Emerging Technologies, 2014, 43:112-126.

Lee W H , Tseng S S , Tsai S H . A knowledge based real-time travel time prediction system for urban network[J]. Expert Systems with Applications, 2009, 36(3-part-P1):4239-4247.

Balakrishna R , Benakiva M , Bottom J , et al. Information Impacts on Traveler Behavior and Network Performance: State of Knowledge and Future Directions[M]// Advances in Dynamic Network Modeling in Complex Transportation Systems. Springer New York, 2013.

Chen H K, Hsueh C F. A model and an algorithm for the dynamic user-optimal route choice problem[J]. Transportation Research B, 1998, 32(3):219-234.

Aiko S, Thaithatkul P, Asakura Y. Incorporating User Preference into Optimal Vehicle Routing Problem of Integrated Sharing Transport System[J]. Asian Transport Studies, 2018, 5.

Khodadadi F, Javad S, Harounabadi A. Improve Traffic Management in the Vehicular Ad Hoc Networks by Combining Ant Colony Algorithm and Fuzzy System[J]. International Journal of Advanced Computer Science & Applications, 2016, 7(4).

Stéphane Lafortune and Raja SenguptaDavid E. Kaufman and Robert L. Smith. Dynamic system-optimal traffic assignment using a state space model[J]. Transportation Research Part B Methodological, 1993, 27(6):451-472.

Groot N, Schutter B D, Hellendoorn H. Toward System-Optimal Routing in Traffic Networks: A Reverse Stackelberg Game Approach[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(1):29-40.

Mahmassani H S , Peeta S . System Optimal Dynamic Assignment for Electronic Route Guidance in a Congested Traffic Network[M]// Urban Traffic Networks: Dynamic Flow Modeling and Control. Springer Berlin Heidelberg, 1995.

Yan L P, Wen-Bin H U, Wang H, et al. Dynamic Real-Time Algorithm for Multi-Intersection Route Selection in Urban Traffic Networks[J]. Journal of Software, 2016.

Cao Z , Guo H , Zhang J . A Multiagent-Based Approach for Vehicle Routing by Considering Both Arriving on Time and Total Travel Time[J]. ACM Transactions on Intelligent Systems and Technology, 2017, 9(3):1-21.

Zhiguang Cao, Hongliang Guo, Jie Zhang, Ulrich Fastenrath. Multiagent-Based Route Guidance for Increasing the Chance of Arrival on Time. AAAI 2016: 3814-3820

Nakashima H , Noda I , Ohta M , et al. Smooth Traffic Flow with a Cooperative Car Navigation System[C]// International Joint Conference on Autonomous Agents & Multiagent Systems. ACM, 2005.

Godt A . Comparative Analysis of Various Routing Protocols in VANET[C]// Fifth International Conference on Advanced Computing & Communication Technologies. IEEE, 2015.

Jahangiri A, Rakha H A. Applying Machine Learning Techniques to Transportation Mode Recognition Using Mobile Phone Sensor Data[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(5):2406-2417.

Atallah R, Assi C, Yu J Y. A Reinforcement Learning Technique for Optimizing Downlink Scheduling in an Energy-Limited Vehicular Network[J]. IEEE Transactions on Vehicular Technology, 2017, PP(99):1-1.

Seshadri, Madhavan & Cao, Zhiguang & Guo, Hongliang & Zhang, Jie & Fastenrath, Ulrich. (2017). Multiagent-based Cooperative Vehicle Routing Using Node Pressure and Auctions. 10.1109/ITSC.2017.8317671.

Li Z , Liu P , Xu C , et al. Reinforcement Learning-Based Variable Speed Limit Control Strategy to Reduce Traffic Congestion at Freeway Recurrent Bottlenecks[J]. IEEE Transactions on Intelligent Transportation Systems, 2017:1-14.

Wiering M, Vreeken J, Van Veenen J, et al. Simulation and optimization of traffic in a city[C]// Intelligent Vehicles Symposium. IEEE, 2004:453-458.

El-Tantawy S, Abdulhai B, Abdelgawad H. Multiagent Reinforcement Learning for Integrated Network of Adaptive Traffic Signal Controllers (MARLIN-ATSC): Methodology and Large-Scale Application on Downtown Toronto[J]. IEEE Transactions on Intelligent Transportation Systems, 2013, 14(3):1140-1150.

Zhao B, Zhang C, Zhang L. Real-Time Traffic Light Scheduling Algorithm Based on Genetic Algorithm and Machine Learning[C]// International Conference on Internet of Vehicles. Springer International Publishing, 2015:385-398.

Basha N, Sadek A. INTELLIGENT AGENTS APPROACH TO THE ON-LINE MANAGEMENT AND CONTROL OF TRANSPORTATION NETWORKS[J]. 2015.

Zolfpour-Arokhlo M, Selamat A, Mohd Hashim S Z, et al. Modeling of route planning system based on Q value-based dynamic programming with multi-agent reinforcement learning algorithms[J]. Engineering Applications of Artificial Intelligence, 2014, 29(3):163-177.

Chuanhui Tang is currently a graduate student at the School of Computer Science, Wuhan University, Wuhan, China. His research interests include intelligent transport systems, optimization and scheduling, and uncertainty artificial intelligence.

Wenbin Hu is currently a professor at the School of Computer Science, Wuhan University, Wuhan, China. His research interests include intelligent transport systems, data mining and big data, uncertainty artificial intelligence and complex network, optimization and scheduling. Until now, he has authored or corresponding authored over 80 refereed journal and conference papers, such as the IEEE TKDE, IEEE TMC, ACM TKDD, Pattern Recognition, Information Sciences, Information System, WWW and IJCAI in these areas.

Simon Hu is an Assistant Professor at the ZJU-UIUC Institute at Zhejiang University, Hangzhou, China. He is also an Honorary Research Fellow at Imperial College London. His research interests lie in transport telematics, traffic simulation modelling, dynamic route guidance system, traffic management and data fusion.

Marc Stettler is a Senior Lecturer in the Centre for Transport Studies at Imperial College London. His main research interests are in intelligent transport systems, vehicle emissions measurement and modelling, environmental impacts of freight transport.

BCGAEFHD

ijnkmj4j8i6i2i4i8k6k2i3i7n5n1m3m7i5i1ABCDEFGH

62483751624837516248375162483751DCAB

57005800590060006100700705710715720725730481216202428Average travel distance(m)Average travel time(s)average travel timeaverage travel distance

5800590060006100620063006400700750800850900481216202428Average travel distance(m)Average travel time(s)average travel timeaverage travel distance

660068007000720074007600780080008200800850900950100010501100481216202428Average travel distance(m)Average travel time(s)average travel timeaverage travel distance

050010001500200025003000scenario(a)scenario(b)scenario(c)scenario(d)scenario(e)Average travel time(s)DRkSPDSPDTAA*R2

0100200300400500600700800Number of vehicesTime segmentation (TS)

4000600080001000012000140001600018000scenario(a)scenario(b)scenario(c)scenario(d)scenario(e)Average travel distance(m)SPDRkSPDSPDTAA*R2

051015202530354045506121824303642485460667278849096102108The number of congested roadsIteration numberDRkSPDSPSPDTAA*R2

70080090010001100120013001400150010080604020Average travel time(s)Compliance rateDRkSPDSPSPDTAA*R2

Vehicle AgentVRAVRAVRAVRAVRAVRAVRAVRAVRAVRAVRAVRAVRAVRA123Traffic Control CenterLocation Services