[IEEE 2013 10th Web Information System and Application Conference (WISA) - Yangzhou, China...


Page 1: [IEEE 2013 10th Web Information System and Application Conference (WISA) - Yangzhou, China (2013.11.10-2013.11.15)] 2013 10th Web Information System and Application Conference - An

An Efficient Trip Planning Algorithm Under Constraints

Jinling Bao 1,2, Xiaochun Yang 1, Bin Wang 1, Jiaying Wang 1

1 (College of Information Science and Engineering, Northeastern University, Shenyang, 110819, China)

2 (Department of Computer Science, Baicheng Normal College, Baicheng, 137000, China)

{baojinling,wangjiaying}@research.neu.edu.cn, {yangxc,binwang}@mail.neu.edu.cn

Abstract—Trip planning has received wide attention in recent years, and more and more people want a service that automatically determines the optimal tour route. Given a source, a destination, and a time limit for the tour, how can we automatically decide the tour route with the highest total popularity score of scenic spots? Current trip planning methods return a route in which every scenic spot must be visited. They work poorly for this problem when no such route satisfies the constraints. We therefore relax the setting so that a route consists of scenic spots that the user either visits or simply passes by. The modified problem incurs a larger search cost because each scenic spot on a route now has two states. We show that the new problem is NP-hard, making it difficult to find an efficient exact algorithm. In this paper, we propose an algorithm based on a greedy strategy to solve the trip planning problem, and we also present an improved algorithm with better performance. Experimental results on synthetic and real data sets show that our algorithms find an approximately optimal path efficiently.

Key words: Constraints; Path Searching; Trip Planning; Benefit Score; Cost Score

I. INTRODUCTION

Trip planning has received wide attention in recent years, and more and more people want a service that automatically determines the optimal tour route. For example, a tourist traveling in an unfamiliar city may need help planning a trip: “How can I efficiently visit the most popular attractions if I depart from the hotel at 8:00 and need to catch a train at 18:00?”

To answer this question, the first problem we face is “which attractions are popular and interesting”. A number of studies discuss how to recommend attractions to tourists. In the works [1-6], the popularity of an attraction is calculated from users' historical trajectories through certain rules and expressed as a numerical value; a higher value indicates a more popular, and hence more attractive, attraction. The second problem is “how to plan these attractions as a trip under constraints”. Many approaches to trip planning have been proposed; the works [3, 7-12] give solutions for different constraints.

When a user specifies the source (hotel), the destination (railway station), and the time limit (10 hours) of the tour, how can the tour route with the highest total popularity value be determined automatically? Current trip planning methods [3, 7-12] return a route in which every attraction must be visited. If the user must visit all the attractions on a route, the trip time of every route may exceed the limit. These methods therefore work poorly for this problem when no route from the hotel to the railway station satisfies the constraints. We thus propose a new solution in which the user selects which attractions on a route to visit according to the constraints.

A tourist would ideally visit all the attractions on a route within the specified time. If the time is not enough, however, he or she has to make a choice. For example, suppose a route contains five attractions but the time limit does not allow visiting all five. The tourist then has to find a satisfactory plan that gives up two of the attractions and visits only the remaining three, taking into account the popularity score and the play time of every attraction on the route. The modified rules incur a larger search cost, since each attraction on a given route has two states (visited or passed by). It can be shown that this new problem is NP-hard, which makes it difficult to find an efficient exact algorithm.

How can we search for the optimal path and effectively reduce the search space under the new rules? To solve the trip planning problem, we propose two algorithms based on the greedy strategy. The contributions of this paper are fivefold:

• We define a trip planning problem under constraints and show that the problem is NP-hard.

• To address practical needs, we define new optimal path planning rules that consider the constraints and the impact of node weights.

• To effectively reduce the search space under the new rules, we propose the MostBenifit algorithm based on the greedy strategy.

• We also propose an improved algorithm, LeastCost, with better performance.

• Experimental results on synthetic and real data sets show that our algorithms find an approximately optimal path with high efficiency.

The rest of the paper is organized as follows. We briefly review related work in Section 2. In Section 3, we formally define the problem. In Section 4, we present the basic approach and establish the computational complexity of the problem. In Section 5, we present the algorithm based on the greedy strategy and its improved variant. We report on the empirical studies in Section 6. Finally, we offer conclusions in Section 7.

2013 10th Web Information System and Application Conference

978-0-7695-5134-0/13 $26.00 © 2013 IEEE

978-1-4799-3219-1/13 $31.00 © 2013 IEEE

DOI 10.1109/WISA.2013.87

II. RELATED WORK

The trip planning problem has received much attention as an instance of the multi-constrained optimal path problem. The works [3, 7-12] propose solutions for different constraints.

Unlike the works [7-11], we consider two weight attributes on each node: one is used as the cost score and the other as the benefit score. Because of these node weights and the multiple constraints, each node on a route has two states (visited or passed by), which enlarges the search space for the optimal route. The solutions in [7-11] cannot effectively reduce this enlarged search space, so their algorithms are not applicable to our problem.

Lu et al. [12] proposed Trip-Mine, an exact algorithm based on data mining that efficiently finds the optimal trip with the highest total popularity score, starting from a source node s and finally returning to s. They also proposed three optimization mechanisms on top of Trip-Mine to further improve the mining efficiency and reduce the memory requirement of optimal trip finding. The problem in [12] is similar to ours in that it considers two weight attributes on each node. However, [12] assumes that there is always a direct path between any two nodes, whereas in practice two nodes are often reachable only indirectly through other nodes. In our setting, a node on a route can also be in different states (visited or simply passed by), which enlarges the search space for the optimal route. Trip-Mine cannot effectively reduce this enlarged search space, so it is not applicable to our problem.

Lu et al. [3] collected geo-tagged photos from Flickr and built travel routes from them. They defined popularity scores for each attraction and each path, and recommended the route with the highest popularity score within a given travel duration across the whole city. The recommendation in that work was not formulated as a query, and the recommendation algorithm, based on dynamic programming, had an extremely long running time. The problem in [3] is similar to ours in that it considers two weight attributes on each node, but it does not consider the source and the destination of the trip. Under multiple constraints, we have to consider the different states of each node on each route, which enlarges the search space for the optimal route. The solution in [3] cannot effectively reduce this enlarged search space, so its algorithms are not applicable to our problem.

III. PROBLEM STATEMENT

This section describes the trip planning problem under constraints and its related definitions.

Definition 1: Graph. A graph $G=(V,E)$ consists of a set of nodes $V$ and a set of edges $E \subseteq V \times V$. Each node $v \in V$ represents a location; each edge in $E$ represents a directed route between two locations in $V$, and the edge from $v_i$ to $v_j$ is represented by $(v_i, v_j)$.

Definition 2: Route. A route $R=\langle v_0, v_1, \ldots, v_n \rangle$ is a path that goes through $v_0$ to $v_n$ sequentially, following the relevant edges in $G$. We define the optimal route based on one attribute on each edge $(v_i, v_j)$, used as the cost score of the edge (e.g., the travel time), and two attributes on each node: one is used as the cost score of the node (e.g., the time spent at the node), and the other is used as the benefit score of the node (e.g., the popularity). Note that any two attributes can be chosen to define the optimal route, depending on the application.

In the above definition of $R$, a plain $v_i$ denotes that the node $v_i$ is visited, and $v_i$ inside brackets, i.e., $(v_i)$, denotes that the node $v_i$ is simply passed by (not visited).

Fig.1 Example of G

Definition 3: Benefit Score and Cost Score. Given a route $R=\langle v_0, v_1, \ldots, v_n \rangle$, $V_R$ is the set of the nodes in the route $R$ and $V_A$ is the set of the nodes which are visited, with $V_A \subseteq V_R$, $v_i \in V_R$, and $v_j \in V_A$. The benefit score of $R$ is defined as the sum of the benefit scores of all the nodes in $V_A$, i.e.,

$$BS(R) = \sum_{\forall v_j \in V_A} BS(v_j) \qquad (1)$$

and the cost score is defined as the sum of the cost values of all the edges and the sum of the cost values of all the nodes in $V_A$, i.e.,

$$CS(R) = \sum_{\forall v_i \in V_R} CS(v_i, v_{i+1}) + \sum_{\forall v_j \in V_A} CS(v_j) \qquad (2)$$

Figure 1 shows an example of the graph G. On each node, the score inside a bracket is the cost value, and the other number is the benefit value.
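As a minimal sketch of Definition 3, the following Python snippet computes BS(R) and CS(R) for a route whose nodes carry a visited or passed-by flag. The toy graph, the node scores, and the names `NODE`, `EDGE`, `benefit_score` and `cost_score` are illustrative assumptions; they are not the data of Figure 1.

```python
# Hypothetical toy data (not Figure 1): node -> (benefit score, cost score).
NODE = {"s": (0, 0), "a": (3, 2), "b": (5, 4), "t": (0, 0)}
# Undirected edges with their travel cost.
EDGE = {("s", "a"): 1, ("a", "b"): 2, ("b", "t"): 1}


def edge_cost(u, v):
    """Cost score of the edge between u and v (the graph is undirected)."""
    return EDGE[(u, v)] if (u, v) in EDGE else EDGE[(v, u)]


def benefit_score(route):
    """Eq. (1): sum of benefit scores over the visited nodes only."""
    return sum(NODE[v][0] for v, visited in route if visited)


def cost_score(route):
    """Eq. (2): all edge costs on the route plus the costs of visited nodes."""
    travel = sum(edge_cost(route[i][0], route[i + 1][0])
                 for i in range(len(route) - 1))
    stay = sum(NODE[v][1] for v, visited in route if visited)
    return travel + stay


# Route <s, (a), b, t>: node a is passed by, node b is visited.
route = [("s", True), ("a", False), ("b", True), ("t", True)]
print(benefit_score(route), cost_score(route))
```

Passing by node a still pays for the edges into and out of it, but contributes neither its benefit nor its node cost, which is exactly the distinction the two sums in Eq. (2) encode.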

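Problem Formulation. Given a source node $v_s$, a destination node $v_t$, and a specified cost $\theta$, we aim to find the optimal route $R$ starting from $v_s$ and ending at $v_t$, where $V_R$ is the set of the nodes in the route $R$ and $V_A \subseteq V_R$ is the set of the nodes which are visited ($v_i \in V_R$, $v_j \in V_A$). This optimal route $R$ satisfies the following two conditions:

(a) its benefit score $BS(R) = \sum_{\forall v_j \in V_A} BS(v_j)$ is the highest among all routes from $v_s$ to $v_t$ that satisfy (b);

(b) $CS(R) = \sum_{\forall v_i \in V_R} CS(v_i, v_{i+1}) + \sum_{\forall v_j \in V_A} CS(v_j) \le \theta$.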

Example 1: Consider the example graph in Figure 1 and the query $q = (v_0, v_7, 10, f)$. The task is to determine the tour route $R_{opt}$ with the highest sum of the popularity scores of attractions.

As Figure 1 shows, the most popular route is $R=\langle v_0, v_1, v_2, v_3, v_4, v_5, v_6, v_7 \rangle$, but its trip time exceeds the specified cost $\theta$. For the query $q$, there are 37 routes from $v_0$ to $v_7$, but not all of the nodes on each route can be visited because of the time constraint $\theta$. If a route has $m$ nodes and each node has two states (visited or passed by), there are $2^{m-2}$ choices for this route, excluding the source and the destination.

For example, in the route $R=\langle v_0, v_1, v_2, v_3, v_4, v_5, v_6, v_7 \rangle$, there are $2^6 = 64$ combinations for visiting the nodes $v_1, v_2, v_3, v_4, v_5, v_6$. If we consider whether each node is visited or not, there can be 916 choices from $v_0$ to $v_7$, so the search space increases greatly. To determine which choice is the best, we must verify every one.
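The count of $2^{m-2}$ labellings can be checked with a short enumeration. This is only an illustrative sketch: the helper name `visit_states` is hypothetical, the route is given as a list of at least two node names, and the source and the destination are assumed to be fixed as visited.

```python
from itertools import product


def visit_states(route):
    """Yield every visited/passed-by labelling of a route.

    The source and the destination are kept fixed (assumed visited here),
    so a route with m >= 2 nodes yields 2 ** (m - 2) labellings.
    """
    first, inner, last = route[0], route[1:-1], route[-1]
    for flags in product((True, False), repeat=len(inner)):
        yield [(first, True), *zip(inner, flags), (last, True)]


route = [f"v{i}" for i in range(8)]      # v0 ... v7, as in the example above
labellings = list(visit_states(route))
print(len(labellings))
```

Every additional intermediate node doubles the number of labellings, which is why the exhaustive approach becomes expensive even for a single route.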

After verifying all the routes, we obtain the optimal route $R=\langle v_0, (v_1), v_3, v_5, v_7 \rangle$, whose cost score is 10 and whose benefit score is 8. It has the highest benefit score among the routes from $v_0$ to $v_7$: it passes by $v_1$ and visits $v_3$ and $v_5$. The node $v_1$ must be passed on the way from $v_0$ to $v_3$ but, because of the time constraint, is not visited, so $v_1$ appears in brackets.

We define $G$ as a general graph. It can be a road network graph or a graph extracted from users' historical trajectories. For example, if $G$ is a traffic network, the attributes can be travel duration, travel distance, popularity, travel cost, and so on. If a node in $G$ is not an attraction, we can set its benefit score and cost score to 0, and our method still applies. To keep the discussion concrete, we consider only undirected graphs in this paper; however, the discussion extends to directed graphs straightforwardly.

IV. BASIC APPROACH

This section describes the basic approach to solving the trip planning problem under constraints and proves that the problem is NP-hard.

A. Approach Description

A basic approach to the trip planning problem under constraints is exhaustive search: we enumerate all candidate paths from the source node together with all states of each node to which a path is extended. We use a queue to store the partial paths. In each step, we select one partial path from the queue and extend it to generate more candidate partial paths, keeping those whose cost scores do not exceed the specified cost. When a path reaches the target node, we check whether it satisfies the cost constraint. We record all the feasible routes and, after all the candidate routes from the source node to the target node have been checked, select the one with the highest benefit score as the answer to the query.

At each node in the graph $G$, we maintain a list of paths, each of which stores the information of a corresponding partial route from the source node to this node, including the nodes in the path, the benefit score, and the cost score of the partial route. In this paper, a partial path is denoted as $P_i^k = (path, BS, CS)$. Many paths may exist between any two nodes, and thus each node may be associated with a large number of partial paths.

The path generating step extends a partial route at node $v_i$ forward to all the outgoing neighbor nodes of $v_i$, which generates longer partial routes. Given a path $P_i^k$ at node $v_i$, for each outgoing neighbor $v_j$ of $v_i$ in $G$, we create a new path $P_j^k$ for $v_j$. If the node $v_j$ is visited, the path is

$$P_j^k = \big(P_i^k.path \circ v_j,\; P_i^k.BS + BS(v_j),\; P_i^k.CS + CS(v_i, v_j) + CS(v_j)\big);$$

otherwise, if the node is only passed by, the path is

$$P_j^k = \big(P_i^k.path \circ (v_j),\; P_i^k.BS,\; P_i^k.CS + CS(v_i, v_j)\big),$$

where $\circ$ denotes appending a node to the path.
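The queue-based enumeration and the two extension rules can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function name `exhaustive_search`, the toy graph, and the scores are assumptions, and the sketch restricts itself to simple paths (no node repeated) to keep the enumeration finite.

```python
from collections import deque


def exhaustive_search(adj, bs, cs, s, t, theta):
    """Brute-force search over simple paths and visit states.

    adj maps node -> {neighbor: edge cost}; bs and cs map node -> score.
    Returns (path, BS, CS) with the highest BS among feasible routes, or None.
    """
    best = None
    queue = deque([([(s, True)], bs[s], cs[s])])   # partial paths (path, BS, CS)
    while queue:
        path, p_bs, p_cs = queue.popleft()
        u = path[-1][0]
        if u == t:                                  # reached the destination
            if best is None or p_bs > best[1]:
                best = (path, p_bs, p_cs)
            continue
        on_path = {v for v, _ in path}
        for v, w in adj[u].items():
            if v in on_path:                        # assumption: simple paths only
                continue
            if p_cs + w + cs[v] <= theta:           # state 1: v is visited
                queue.append((path + [(v, True)], p_bs + bs[v], p_cs + w + cs[v]))
            if p_cs + w <= theta:                   # state 2: v is only passed by
                queue.append((path + [(v, False)], p_bs, p_cs + w))
    return best


# Hypothetical toy instance (not Figure 1).
EDGES = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1,
         ("s", "c"): 2, ("c", "t"): 2, ("a", "c"): 1}
adj = {}
for (u, v), w in EDGES.items():
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w
BS = {"s": 0, "a": 1, "b": 5, "c": 7, "t": 0}
CS = {"s": 0, "a": 2, "b": 3, "c": 6, "t": 0}

best = exhaustive_search(adj, BS, CS, "s", "t", theta=10)
print(best)
```

Each neighbor contributes up to two new partial paths, one per state, so the queue grows roughly twice as fast as in a plain path enumeration.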

The basic approach enumerates not only all the paths from the source node $s$ to the destination node $t$ but also the states of each node on each route. Its main problem is that too many partial paths need to be stored at each node. Given a query with a specified cost $\theta$, the number of edges in a route explored by the search is at most $\lfloor \theta / CS_{min} \rfloor$, where $CS_{min}$ is the smallest cost value of all edges in $G$. If the states of the nodes are not considered, the complexity of an exhaustive search is $O\big(d^{\lfloor \theta / CS_{min} \rfloor}\big)$, where $d$ is the maximum outdegree in $G$. When we need to consider the states (visited or not) of each node in the route, the complexity of an exhaustive search becomes $O\big(2^{\lfloor \theta / CS_{min} \rfloor} d^{\lfloor \theta / CS_{min} \rfloor}\big)$. So the cost of considering the states of each node in the route is $2^{\lfloor \theta / CS_{min} \rfloor}$ times as much as not considering the states.
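To make this blow-up concrete, consider a purely illustrative setting. The values $\theta = 10$, $CS_{min} = 1$ and $d = 3$ are assumptions chosen for arithmetic convenience, not figures from the experiments:

$$\left\lfloor \frac{\theta}{CS_{min}} \right\rfloor = 10, \qquad d^{10} = 3^{10} = 59049, \qquad 2^{10} \cdot 3^{10} = 1024 \cdot 59049 = 60466176.$$

Under these assumed values, accounting for the two node states multiplies the bound on the number of explored paths by $2^{10} = 1024$.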

B. Complexity analysis of the problem

Theorem 1: The trip planning problem under constraints defined in this paper is NP-hard.

Proof Sketch: The K-shortest path problem is NP-hard [13], and it can be reduced to the trip planning problem under constraints defined in this paper.

The K-shortest path problem can be stated as follows: given a graph $G=(V,E)$, a length $l(e) \in Z^+$ for each $e \in E$, specified vertices $s, t \in V$, and positive integers $B$ and $K$, are there $K$ or more distinct simple paths from $s$ to $t$ in $G$, each having total length $B$ or less? In this paper, the length of a path is expressed as its cost score, and every node of the graph $G$ has two attributes (a benefit score and a cost score). The goal is to find the paths whose cost score satisfies $CS(R) \le \theta$ and then select the one with the highest benefit score. The K-shortest path problem can thus be reduced to the trip planning problem under constraints defined in this paper, so the problem is NP-hard. □

V. TRIP PLANNING ALGORITHM BASED ON GREEDY STRATEGY

This section elaborates the MostBenifit algorithm, which is based on the greedy strategy, and the improved LeastCost algorithm.


A. MostBenifit algorithm

The main problem of the basic approach is that too many partial paths need to be stored at each node. Because the problem is NP-hard, we propose a planning scheme based on the greedy strategy in this paper.

Before detailing the algorithm, we introduce the following related notations.

$\tau_{i,j}$: the minimum cost between two nodes $v_i$ and $v_j$, i.e., the cost of the shortest path between them.

$\kappa_{i,j}$: the key node, i.e., the next-to-last node on the current shortest path from $v_i$ to $v_j$.

$\rho_{i,j}$: the benefit score per unit cost for two nodes $v_i$ and $v_j$, taken along the shortest path between them:

$$\rho_{i,j} = \frac{BS(v_j)}{CS(v_i, v_j) + CS(v_j)} \qquad (3)$$
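One way to obtain these three quantities is an all-pairs shortest path computation with a predecessor matrix. The sketch below uses the standard Floyd-Warshall recurrence on a hypothetical toy graph. It assumes that $CS(v_i, v_j)$ in Eq. (3) is the shortest-path cost $\tau_{i,j}$; the function name `preprocess` and the data are illustrative.

```python
import math


def preprocess(nodes, edges, bs, cs):
    """Return (tau, kappa, rho) for an undirected graph.

    tau[i][j]   : cost of the shortest path from i to j
    kappa[i][j] : next-to-last node on that path (the key node)
    rho[(i, j)] : BS(j) / (tau[i][j] + CS(j)), assuming CS(v_i, v_j) = tau
    """
    tau = {i: {j: 0 if i == j else math.inf for j in nodes} for i in nodes}
    kappa = {i: {j: None for j in nodes} for i in nodes}
    for (u, v), w in edges.items():
        tau[u][v] = tau[v][u] = w
        kappa[u][v], kappa[v][u] = u, v
    for k in nodes:                       # Floyd-Warshall relaxation
        for i in nodes:
            for j in nodes:
                if tau[i][k] + tau[k][j] < tau[i][j]:
                    tau[i][j] = tau[i][k] + tau[k][j]
                    kappa[i][j] = kappa[k][j]
    rho = {}
    for i in nodes:
        for j in nodes:
            denom = tau[i][j] + cs[j]
            if i != j and math.isfinite(denom) and denom > 0:
                rho[(i, j)] = bs[j] / denom
    return tau, kappa, rho


# Hypothetical toy instance (not Figure 1).
NODES = ["s", "a", "b", "c", "t"]
EDGES = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1,
         ("s", "c"): 2, ("c", "t"): 2, ("a", "c"): 1}
BS = {"s": 0, "a": 1, "b": 5, "c": 7, "t": 0}
CS = {"s": 0, "a": 2, "b": 3, "c": 6, "t": 0}

tau, kappa, rho = preprocess(NODES, EDGES, BS, CS)
print(tau["s"]["t"], kappa["s"]["t"], rho[("s", "b")])
```

Following `kappa` backwards from the target recovers the intermediate nodes of a shortest path, which is what is needed later to fill in the passed-by nodes of a route.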

Theorem 2: Given a route $R$ and a node $v_i$ that is not visited in the route, the cost score and the benefit score of $R$ do not depend on the cost score and the benefit score of $v_i$.

Proof Sketch: Let $V_A$ be the set of the nodes visited in the route $R=\langle v_0, v_1, \ldots, (v_i), \ldots, v_n \rangle$. By Definition 3, $BS(R)$ equals the sum of the benefit scores of all nodes in $V_A$. Because $v_i \notin V_A$, $BS(R)$ does not include $BS(v_i)$; similarly, $CS(R)$ does not include $CS(v_i)$. Hence $BS(R)$ and $CS(R)$ do not depend on $BS(v_i)$ and $CS(v_i)$. □

Theorem 3: If two nodes are reachable from each other, the cost score of their shortest path is less than or equal to the cost score of any path between them.

Proof Sketch: Assume that two nodes $v_i$ and $v_j$ are reachable in the graph $G$, that the shortest path between them is $R$, and that $R'$ is any path between them. Because the shortest path between two nodes has the minimum cost among all paths, $CS(R) = \tau_{i,j} \le CS(R')$. □

The basic idea of the MostBenifit algorithm is to sort $\rho_{i,j}$ in descending order. We first determine the nodes to be visited in the route, extending the route from the current node to the reachable node with the highest $\rho_{i,j}$.

1) According to Theorem 2, a node that is simply passed by but not visited does not affect the cost score and the benefit score of the entire route, so we do not consider such nodes at first. When we extend the partial path from the current node, we greedily try to visit, among the nodes reachable from the current node, the one with the highest $\rho_{i,j}$, traveling along the shortest path. According to Theorem 3, the cost between two nodes is minimal along the shortest path. In this way we obtain the sequence of all visited nodes in the approximately optimal route.

2) If two adjacent nodes in this sequence are not directly connected, we add the nodes of the shortest path between them to the route. According to Theorem 2, these nodes do not affect the cost score and the benefit score of the whole route, so a complete route is obtained.

The pseudocode is presented in Algorithm 1. We initialize the queue $Q$ and the route $R$ as NULL (lines 1-2). When we get a query $q=(v_s, v_t, \theta)$ (line 3), we compare $\tau_{s,t}$ and $\theta$. If $\tau_{s,t} \le \theta$, we create a path $P_i$ at the starting node $v_s$ and enqueue it into $Q$ (lines 4-7). We keep dequeuing paths from $Q$ (line 8) and terminate the algorithm when $Q$ is empty.

We first dequeue the path $P_i$ (line 9). Among the nodes reachable from the current node $v_i$, we consider the node $v_j$ with the highest $\rho_{i,j}$. If the condition $P_i.CS + \tau_{i,j} + CS(v_j) + \tau_{j,t} + CS(v_t) \le \theta$ holds, we create a new path $P_j$ (line 13) and enqueue it into $Q$ (lines 11-15). Otherwise, we give up $v_j$ and try the node with the second highest $\rho_{i,j}$, and so on, until a node satisfies the condition. When the route has been extended to the destination $v_t$, we verify whether the remaining cost is enough to visit other attractions: if the current path is $R=\langle v_0, v_1, \ldots, v_k, v_t \rangle$ with $CS(R) \le \theta$, while the extended route $R'=\langle v_0, v_1, \ldots, v_k, v_{k+1}, v_t \rangle$ has $CS(R') > \theta$, we take $R = P_t$ as the resulting route. If two adjacent nodes in the sequence of $R$ are not directly connected, we add the nodes of the shortest path between them to $R$, so that a complete route is obtained (lines 17-18).

To accelerate the algorithms, we use pre-processing results: we run the Floyd-Warshall algorithm [9], a well-known algorithm for finding all-pairs shortest paths, and store $\tau_{i,j}$, $\kappa_{i,j}$ and $\rho_{i,j}$.
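The greedy procedure described above can be sketched in Python as follows. This is one possible reading of MostBenifit, not the authors' VC++ implementation: the names `all_pairs` and `most_benefit` and the toy data are assumptions, $\rho$ is computed with $\tau_{i,j}$ as the travel cost, and the destination is used as the fallback when no other candidate satisfies the cost condition.

```python
import math


def all_pairs(nodes, edges):
    """Floyd-Warshall on an undirected graph: returns (tau, kappa)."""
    tau = {i: {j: 0 if i == j else math.inf for j in nodes} for i in nodes}
    kappa = {i: {j: None for j in nodes} for i in nodes}
    for (u, v), w in edges.items():
        tau[u][v] = tau[v][u] = w
        kappa[u][v], kappa[v][u] = u, v
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if tau[i][k] + tau[k][j] < tau[i][j]:
                    tau[i][j] = tau[i][k] + tau[k][j]
                    kappa[i][j] = kappa[k][j]
    return tau, kappa


def most_benefit(nodes, edges, bs, cs, s, t, theta):
    """Greedy route from s to t; returns (route, BS, CS) or None."""
    tau, kappa = all_pairs(nodes, edges)
    if cs[s] + tau[s][t] + cs[t] > theta:
        return None                      # even the shortest path is too costly
    stops, total_bs, total_cs, cur = [s], bs[s], cs[s], s
    while cur != t:
        # Unvisited attractions reachable from cur, highest rho first.
        cands = [j for j in nodes
                 if j not in stops and j != t and math.isfinite(tau[cur][j])]
        cands.sort(key=lambda j: bs[j] / (tau[cur][j] + cs[j]), reverse=True)
        nxt = t                          # fall back to the destination
        for j in cands:
            if total_cs + tau[cur][j] + cs[j] + tau[j][t] + cs[t] <= theta:
                nxt = j
                break
        total_cs += tau[cur][nxt] + cs[nxt]
        total_bs += bs[nxt]
        stops.append(nxt)
        cur = nxt
    # Load the key nodes: fill in passed-by nodes between consecutive stops.
    route = [(s, True)]
    for u, v in zip(stops, stops[1:]):
        inner, x = [], kappa[u][v]
        while x != u:
            inner.append(x)
            x = kappa[u][x]
        route += [(n, False) for n in reversed(inner)] + [(v, True)]
    return route, total_bs, total_cs


# Hypothetical toy instance (not Figure 1).
NODES = ["s", "a", "b", "c", "t"]
EDGES = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1,
         ("s", "c"): 2, ("c", "t"): 2, ("a", "c"): 1}
BS = {"s": 0, "a": 1, "b": 5, "c": 7, "t": 0}
CS = {"s": 0, "a": 2, "b": 3, "c": 6, "t": 0}

route, total_bs, total_cs = most_benefit(NODES, EDGES, BS, CS, "s", "t", 10)
print(route, total_bs, total_cs)
```

On this toy instance the greedy route passes by node a on the way to b and later travels back to visit a, so a appears twice in the route. This kind of detour is the redundancy that the improved algorithm in the next subsection is meant to avoid.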

Algorithm 1: MostBenifit Algorithm

Input: graph G
Output: route R

1. Initialize queue Q;
2. Initialize R = NULL;
3. Get q = (vs, vt, θ);
4. if τs,t <= θ
5.   vi ← vs;
6.   create the path Pi at vi;
7.   Q.enqueue(Pi);
8. while (!Q.empty()) do
9.   Pi ← Q.dequeue();
10.  if (vi = vt) verify the remaining cost;
11.  for every reachable node vj from vi do
12.    if (Pi.CS + τi,j + CS(vj) + τj,t + CS(vt) <= θ)
13.      create the path Pj at vj;
14.      Q.enqueue(Pj);
15.      break;
16. R = Pi;
17. R = R load κi,j;
18. return R;

B. LeastCost algorithm

In the MostBenifit algorithm, when we extend the path from the current node $v_i$, we select, among the nodes reachable from $v_i$, the node $v_j$ with the highest $\rho_{i,j}$ to visit. If $v_j$ is already on the current partial path $P_i$ but has not been visited, MostBenifit spends additional cost traveling from $v_i$ back to $v_j$ to visit it.

To avoid the additional cost caused by such redundant paths, this paper presents an improved algorithm. When we extend the path from $v_i$, we select the node $v_j$ with the highest $\rho_{i,j}$ to visit. If $v_j$ is on the current partial path but has not been visited, we treat $v_j$ as visited by adding its benefit score and cost score to the current partial path, and then select the node reachable from $v_i$ with the second highest $\rho_{i,j}$ to visit. This requires more space, because all nodes of the partial path must be stored, including both the visited nodes and the passed-by nodes. In return, we no longer need to load the key nodes into the route $R$ at the end.

The pseudocode of the improved algorithm is shown in Algorithm 2. When we extend the path from $v_i$, we select the node $v_j$ with the highest $\rho_{i,j}$ to visit. If $v_j$ has been passed by but not visited in the current partial path, we treat $v_j$ as visited by adding its benefit score and cost score to the current partial path and setting its state to visited, and then select the next candidate node reachable from $v_i$ to visit (lines 11-18).

Algorithm 2: LeastCost Algorithm

Input: graph G
Output: route R

1. Initialize queue Q;
2. Initialize R = ∅;
3. Get q = (vs, vt, θ);
4. if τs,t <= θ
5.   vi ← vs;
6.   create the path Pi at vi;
7.   Q.enqueue(Pi);
8. while (!Q.empty()) do
9.   Pi ← Q.dequeue();
10.  if (vi = vt) verify the remaining cost;
11.  for every reachable node vj from vi do
12.    if (Pi.CS + τi,j + CS(vj) + τj,t + CS(vt) <= θ)
13.      if (vj is in the current path from vs to vi)
14.        Pi.BS += BS(vj); Pi.CS += CS(vj);
15.        continue;
16.      createPath(Pj);
17.      Q.enqueue(Pj);
18.      break;
19. R = Pi;
20. return R;
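A minimal Python sketch of the LeastCost idea, under one reading of Algorithm 2, is given below. The names `all_pairs` and `least_cost` and the toy data are assumptions. The sketch stores the full node sequence with a visited flag per node, applies the cost condition of line 12 before checking whether the candidate is already on the path, and falls back to the destination when no candidate fits.

```python
import math


def all_pairs(nodes, edges):
    """Floyd-Warshall on an undirected graph: returns (tau, kappa)."""
    tau = {i: {j: 0 if i == j else math.inf for j in nodes} for i in nodes}
    kappa = {i: {j: None for j in nodes} for i in nodes}
    for (u, v), w in edges.items():
        tau[u][v] = tau[v][u] = w
        kappa[u][v], kappa[v][u] = u, v
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if tau[i][k] + tau[k][j] < tau[i][j]:
                    tau[i][j] = tau[i][k] + tau[k][j]
                    kappa[i][j] = kappa[k][j]
    return tau, kappa


def least_cost(nodes, edges, bs, cs, s, t, theta):
    """Greedy route that upgrades passed-by nodes instead of revisiting them."""
    tau, kappa = all_pairs(nodes, edges)
    if cs[s] + tau[s][t] + cs[t] > theta:
        return None
    path = [[s, True]]                   # full node sequence: [node, visited]
    total_bs, total_cs, cur = bs[s], cs[s], s

    def hop(u, v):
        """Nodes strictly between u and v on the shortest path, in order."""
        inner, x = [], kappa[u][v]
        while x != u:
            inner.append(x)
            x = kappa[u][x]
        return inner[::-1]

    while cur != t:
        visited = {n for n, seen in path if seen}
        cands = [j for j in nodes
                 if j not in visited and j != t and math.isfinite(tau[cur][j])]
        cands.sort(key=lambda j: bs[j] / (tau[cur][j] + cs[j]), reverse=True)
        nxt = t                          # fall back to the destination
        for j in cands:
            if total_cs + tau[cur][j] + cs[j] + tau[j][t] + cs[t] > theta:
                continue
            passed = next((e for e in path if e[0] == j and not e[1]), None)
            if passed is not None:       # already on the path: mark as visited
                passed[1] = True
                total_bs += bs[j]
                total_cs += cs[j]
                continue                 # then look at the next candidate
            nxt = j
            break
        path += [[n, False] for n in hop(cur, nxt)] + [[nxt, True]]
        total_cs += tau[cur][nxt] + cs[nxt]
        total_bs += bs[nxt]
        cur = nxt
    return [tuple(e) for e in path], total_bs, total_cs


# Hypothetical toy instance (not Figure 1).
NODES = ["s", "a", "b", "c", "t"]
EDGES = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1,
         ("s", "c"): 2, ("c", "t"): 2, ("a", "c"): 1}
BS = {"s": 0, "a": 1, "b": 5, "c": 7, "t": 0}
CS = {"s": 0, "a": 2, "b": 3, "c": 6, "t": 0}

route, total_bs, total_cs = least_cost(NODES, EDGES, BS, CS, "s", "t", 10)
print(route, total_bs, total_cs)
```

Because a node that was passed by is upgraded in place, no extra travel cost is paid for it; the price is that the whole node sequence, not only the visited stops, has to be kept for every partial path.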

Example 3: Consider the query of Example 1, $q=(v_0, v_7, 10)$. Because $\tau_{0,7} = 5 < 10$, we create the path $P_0 = (\langle v_0 \rangle, 0, 0)$ at the node $v_0$ and enqueue $P_0$ into $Q$. We then dequeue the path $P_0$. Among all nodes reachable from $v_0$, $v_3$ has the highest $\rho_{i,j}$, so $v_3$ is the candidate node for the route. Because $P_0.CS + \tau_{0,3} + CS(v_3) + \tau_{3,7} + CS(v_7) = 8 \le 10$, we create the path $P_3 = (\langle v_0, (v_1), v_3 \rangle, 5, 6)$. Next, we search for the next candidate node of the route. The node $v_6$ has the highest $\rho_{i,j}$, but $P_3.CS + \tau_{3,6} + CS(v_6) + \tau_{6,7} + CS(v_7) = 11 > 10$, so we give up $v_6$ and consider the next candidate node $v_5$. Its cost condition is satisfied, so we obtain the path $P_5 = (\langle v_0, (v_1), v_3, v_5 \rangle, 8, 9)$. In a similar way, we extend the path from $v_5$, where $v_7$ must be selected as the next node, and we obtain the route $R = \langle v_0, (v_1), v_3, v_5, v_7 \rangle$.

VI. EXPERIMENTAL STUDY

This section studies the running time and the approximation ratio of the objective function value for the proposed algorithms, MostBenifit and LeastCost.

A. Experimental Settings

We use eight datasets in our experimental study, all derived from downloaded real map data of Beijing. To match the data required in this paper, we add two weight attributes, the popularity score and the stay time, to the nodes of the graph, and obtain eight datasets of different sizes (shown in Table I). The cost score of an edge is its travel time, while the cost scores and benefit scores of the nodes are randomly generated.

TABLE I. DESCRIPTION OF DATASETS

Dataset      Nodes    Edges
Dataset 1    5k       41k
Dataset 2    8k       65k
Dataset 3    10k      101k
Dataset 4    30k      401k
Dataset 5    800      9k
Dataset 6    500      5k
Dataset 7    300      2.5k
Dataset 8    100      1k
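As an illustration of how randomly generated node weights might be attached to a road graph, the following Python sketch draws a benefit score (popularity) and a cost score (stay time) for every node. The value ranges, the seed, and the function name `attach_random_scores` are assumptions made for this example; the paper does not state the distribution it used.

```python
import random

def attach_random_scores(nodes, max_benefit=10, max_stay=5, seed=0):
    """Return (bs, cs): random benefit and cost scores per node.

    The ranges [1, max_benefit] and [1, max_stay] are hypothetical.
    A fixed seed keeps the generated dataset reproducible.
    """
    rng = random.Random(seed)
    bs = {v: rng.randint(1, max_benefit) for v in nodes}
    cs = {v: rng.randint(1, max_stay) for v in nodes}
    return bs, cs
```

Fixing the seed means repeated runs compare the algorithms on identical weights.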

All algorithms were implemented in VC++ and run on an Intel(R) Xeon(TM)2 CPU [email protected] with 4GB RAM.

B. Experimental Results

1) Varying the dataset size and the time limit

The objective of this set of experiments is to study the running time of the MostBenifit and LeastCost algorithms as the dataset size and the time limit vary. Figures 2(a), (b), (c), and (d) show the running time of the two algorithms when processing 50 queries with a varying time limit on Dataset 1, Dataset 2, Dataset 3, and Dataset 4, respectively. Figure 3 shows the running time of the two algorithms when processing 50 queries as the number of nodes varies.

The running time of both MostBenifit and LeastCost is affected by the dataset size and the time limit. Both algorithms become slower as the dataset size and the time limit increase.

Fig.2 Runtime of MostBenifit and LeastCost on Different Dataset: (a) Dataset 1, (b) Dataset 2, (c) Dataset 3, (d) Dataset 4

Fig.3 Runtime of MostBenifit and LeastCost with θ Varying: (a) θ=24, (b) θ=48

2) Query approximation ratio

Definition 4: Query approximation ratio. The approximation ratio is the popularity score BS(Rg) of the route obtained by the approximation algorithm divided by the popularity score BS(Ropt) of the optimal route obtained by the enumeration approach, i.e.:

$$\mathrm{Ratio} = \frac{BS(R_g)}{BS(R_{opt})} \qquad (4)$$
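The ratio of Definition 4 is a single division; a minimal Python sketch is given below. The function name `approximation_ratio` and the guard against a non-positive optimal score are assumptions added for illustration.

```python
def approximation_ratio(bs_greedy, bs_optimal):
    """Ratio = BS(Rg) / BS(Ropt), as in Definition 4.

    bs_greedy:  benefit score of the route found by the approximation algorithm
    bs_optimal: benefit score of the optimal route found by enumeration
    """
    if bs_optimal <= 0:
        # Assumption: the ratio is undefined when the optimal score is not positive.
        raise ValueError("optimal benefit score must be positive")
    return bs_greedy / bs_optimal
```

For example, with hypothetical scores BS(Rg)=6 and BS(Ropt)=8, the ratio is 0.75.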

The objective of this set of experiments is to study the accuracy of the proposed MostBenifit and LeastCost algorithms by comparing their query approximation ratios. As the number of nodes in the graph and the specified time increase, the running time of the enumeration method grows rapidly. We implemented a basic enumeration approach; however, it is at least three orders of magnitude slower than MostBenifit and may not finish even after one day. To obtain the ratio, we therefore implemented the enumeration approach with a time pruning strategy, which returns the same optimal path as the basic enumeration approach.
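One way such an enumeration with time pruning could look is sketched below in Python. The paper does not give the details of its pruning strategy, so this is an assumption: a partial route is abandoned as soon as its accumulated cost plus the shortest travel time to vt plus CS(vt) exceeds θ. Since τ is a shortest-path metric and cost scores are non-negative, no extension of a pruned route can be feasible, so the optimum is preserved. The names `all_pairs_tau` and `enumerate_optimal` are hypothetical, and the source is assumed to contribute no benefit or cost.

```python
def all_pairs_tau(nodes, edges):
    """Floyd-Warshall shortest travel times; edges = [(u, v, time), ...], undirected."""
    inf = float("inf")
    tau = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for u, v, w in edges:
        tau[u][v] = min(tau[u][v], w)
        tau[v][u] = min(tau[v][u], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if tau[i][k] + tau[k][j] < tau[i][j]:
                    tau[i][j] = tau[i][k] + tau[k][j]
    return tau

def enumerate_optimal(nodes, tau, bs, cs, vs, vt, theta):
    """Exhaustive search over visit orders; returns (order, best_bs) or None."""
    best = {"bs": -1, "order": None}

    def dfs(cur, visited, order, total_bs, total_cs):
        # Time pruning: even heading straight to vt would exceed the limit.
        if total_cs + tau[cur][vt] + cs[vt] > theta:
            return
        # Closing the route at vt from here is feasible; record it if better.
        if total_bs + bs[vt] > best["bs"]:
            best["bs"] = total_bs + bs[vt]
            best["order"] = order + [vt]
        for vj in nodes:
            if vj in visited or vj == vt:
                continue
            dfs(vj, visited | {vj}, order + [vj],
                total_bs + bs[vj], total_cs + tau[cur][vj] + cs[vj])

    dfs(vs, {vs}, [vs], 0, 0)
    if best["order"] is None:
        return None
    return best["order"], best["bs"]
```

The returned order lists only the visited nodes; the nodes passed by between consecutive visits are those on the corresponding shortest paths.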

Figure 4 shows the ratio as the number of nodes increases from 100 to 1000, and Figure 5 shows the ratio as the parameter θ varies when the number of nodes is 1000.

Fig.4 Ratio (θ=8)

Fig.5 Ratio (number of nodes=1k)

On all datasets in this paper, the running time of the MostBenifit algorithm is less than that of the LeastCost algorithm; however, the accuracy of LeastCost is better than that of MostBenifit. As shown in Figures 4 and 5, the accuracy of LeastCost is clearly better than that of MostBenifit, whether the number of nodes or the time limit θ is increasing. The experimental results show that the accuracy of the LeastCost algorithm is not less than 60%.

VII. CONCLUSION AND FUTURE WORK

To address the trip planning problem under constraints, we define a new optimal path planning rule according to the weights of the nodes on the path. To search for the optimal path and effectively reduce the search space under the new rule, we propose the MostBenifit algorithm, which is based on a greedy strategy, and also present an improved algorithm with better performance. The experimental results on synthesized and real data sets show that our algorithm finds an approximately optimal path with high efficiency.

In future work, we would like to improve the accuracy of the algorithm and the current pre-processing approach.

ACKNOWLEDGEMENT

This work is partially supported by the National Natural Science Foundation of China (Nos. 61173031, 61272178), the Joint Research Fund for Overseas Natural Science of China (No. 61129002), the Doctoral Fund of Ministry of Education of China (No. 20110042110028), and the Fundamental Research Funds for the Central Universities (Nos. N120504001, N110404015).

REFERENCES [1] Huang Y.,Bian L., A Bayesian Network and Analyti Hierarchy Process

Based Personalized Recommendations for Tourist Attractionsover the Internet[J]. Expert Systems with Applications, 2009,36(1):933-943.

[2] T. Horozov,N. Narasimhan,V. Vasudevan, Using Location for Personalized POI Recommendations in Mobile Environments. In SAINT. Phoenix, USA: 2006. 124-129.

[3] X. Lu, C. Wang, J.M. Yang, Y. Pang, and L. Zhang. Photo2trip: generating travel routes from geotagged photos for trip planning.In MM.Firenze, Italy: 2010.143-152.

[4] X. Liu, X. Yang.A generalization based approach for anonymizing weighted social network graphs.In WAIM.WuHan,China:2011.118-130.

[5] Q. Hao , R. Cai, et al. Generating Location Overviews with Images and Tags by Mining User-Generated Travelogues.Q. In MM.Beijing, China:2009.801-804.

[6] Y. Zheng, L. Zhang, X. Xie, W.-Y. Ma.Mining interesting locations and travel sequences from gps trajectories..In WWW.Madrid, Spain:2009.791–800.

[7] X.Cao. L. Chen .G. Cong .X. Xiao.Keyword aware Optimal Route Search. In VLDB Endowment, 5(11). Istanbul, Turkey:VLDB Endowment, 2012.1136-1147.

[8] search in relational databases using nearly duplicate records[J]. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2010,48(5):1-7.

[9] F. Li, D. Cheng, M. Hadjieleftheriou, G. Kollios, S.H. Teng.On trip planning queries in spatial databases[C].In ASTD. Berlin, Germany: 2005. 273–290.

[10] M. Sharifzadeh, M. R. Kolahdouzan, C. Shahabi. The optimal sequenced route query[J]. VLDB Journal, 2008,7(4):765–787.

[11] Z. Chen, H. T. Shen, X. Zhou. Discovering popular routes from trajectories.In ICDE. Hannover,Germany: ,2011.900–911.

[12] E. H.C. Lu, C.Y. Lin, V. S. Tseng. Tripmine: An efficient trip planning approach with travel time constraints.In MDM. Lulea, Sweden: 2011.152–161.

[13] M.R. Garey, D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness[D]. San Francisco,CA :W.H. Freeman and Company,1979.
