TRUSTS: Scheduling Randomized Patrols for Fare Inspection in …sandholm/TRUSTS.AIMag13.pdf ·...

In the Los Angeles (LA) Metro Rail system and other proof-of-payment transit systems worldwide, passengers are legallyrequired to buy tickets before boarding, but there are no

gates or turnstiles. There are, quite literally, no barriers to entry,as illustrated in figure 1. Instead, security personnel are dynam-ically deployed throughout the transit system, randomlyinspecting passenger tickets. This proof-of-payment fare collec-tion method is typically chosen as a more cost-effective alter-native to direct fare collection, that is, when the revenue lost tofare evasion is believed to be less than what it would cost tomake fare evasion impossible.

For the LA Metro, with approximately 300,000 riders daily,this revenue loss can be significant; the annual cost has beenestimated at $5.6 million.1 The Los Angeles Sheriff’s Department(LASD) deploys uniformed patrols onboard trains and at sta-tions for fare checking (and for other purposes such as crimeprevention), in order to discourage fare evasion. With limitedresources to devote to patrols, it is impossible to cover all loca-

Articles

WINTER 2012 59Copyright © 2012, Association for the Advancement of Artificial Intelligence. All rights reserved. ISSN 0738-4602

TRUSTS: Scheduling Randomized Patrols for Fare Inspection

in Transit Systems Using Game Theory

Zhengyu Yin, Albert Xin Jiang, Milind Tambe, Christopher Kiekintveld, Kevin Leyton-Brown, Tuomas Sandholm, John P. Sullivan

n In proof-of-payment transit systems, passen-gers are legally required to purchase ticketsbefore entering but are not physically forced todo so. Instead, patrol units move about thetransit system, inspecting the tickets of passen-gers, who face fines if caught fare evading. Thedeterrence of fare evasion depends on the unpre-dictability and effectiveness of the patrols. Inthis article, we present TRUSTS, an applicationfor scheduling randomized patrols for fareinspection in transit systems. TRUSTS modelsthe problem of computing patrol strategies as aleader-follower Stackelberg game where theobjective is to deter fare evasion and hence max-imize revenue. This problem differs from previ-ously studied Stackelberg settings in that theleader strategies must satisfy massive temporaland spatial constraints; moreover, unlike inthese counterterrorism-motivated Stackelbergapplications, a large fraction of the ridershipmight realistically consider fare evasion, and sothe number of followers is potentially huge. Athird key novelty in our work is deliberate sim-plification of leader strategies to make patrolseasier to execute. We present an efficient algo-rithm for computing such patrol strategies andpresent experimental results using real-worldridership data from the Los Angeles Metro Railsystem. The Los Angeles County Sheriff’sDepartment is currently carrying out trials ofTRUSTS.

tions at all times. The LASD thus requires somemechanism for choosing times and locations forinspections. Any predictable patterns in such apatrol schedule are likely to be observed andexploited by potential fare evaders. The LASD’scurrent approach relies on humans for schedulingthe patrols. However, human schedulers are poorat generating unpredictable schedules (Wagenaar1972, Tambe 2011); furthermore such schedulingfor LASD is a tremendous cognitive burden on thehuman schedulers who must take into account allof the scheduling complexities (for example, traintimings, switching time between trains, and sched-ule lengths). Indeed, the sheer difficulty of evenenumerating the trillions of potential patrolsmakes any simple automated approach — such asa simple dice roll — inapplicable.

The result of our investigation is a novel appli-cation called TRUSTS (tactical randomization forurban security in transit systems) for fare-evasiondeterrence in urban transit systems, carried out incollaboration with the LASD. We take a game-the-oretic approach, which studies systems with mul-tiple self-interested parties and aims to predict thelikely outcomes of the system under rationalbehavior of the players. In particular, we modelthis problem as a Stackelberg game with one leader(the LASD) and many followers, in which eachmetro rider (a follower) takes a fixed route at afixed time. The leader precommits to a mixed

patrol strategy (a probability distribution over allpure strategies), and riders observe this mixed strat-egy before deciding whether to buy the ticket ornot (the decision to ride having already beenmade) in order to minimize their expected totalcost, following for simplicity the classic economicanalysis of rational crime (Becker and Landes1974). Both ticket sales and fines issued for fareevasion translate into revenue to the government.Therefore the optimization objective we choose forthe leader is to maximize total revenue (total tick-et sales plus penalties).

There are exponentially many possible purepatrol strategies, each subject to both the spatialand temporal constraints of travel within the tran-sit network. Explicitly representing a mixed strate-gy would be impractical. To remedy this difficulty,TRUSTS uses the transition graph, which capturesthe spatial as well as temporal structure of thedomain, and solves for the optimal (fractional)flow through this graph, using linear programming(LP). Such a flow can be interpreted as a marginalcoverage vector. Additionally, we show that astraightforward approach to extracting patrolstrategies from the marginals faces important chal-lenges: it can create infeasible patrols that violatethe constraint on patrol length, and it can gener-ate patrols that switch too frequently betweentrains, which can be difficult for patrol personnelto carry out. Thus, we present a novel technique to

Articles

60 AI MAGAZINE

Figure 1. Entrance of an LA Metro Rail Station.

overcome these difficulties using an extended for-mulation on a history-duplicate transition graphthat (1) forbids patrols that are too long and (2)penalizes patrols with too many switches.

Finally, we perform simulations based on actualridership data provided by the LASD for four LAMetro train lines (Blue, Gold, Green, and Red). Ourresults suggest the possibility of significant fare-evasion deterrence and hence prevention of rev-enue loss with very few resources. The LASD is cur-rently testing TRUSTS in the LA Metro system bydeploying patrols according to our schedules andmeasuring the revenue recovered.

Related WorkThere has been research on a wide range of prob-lems related to game-theoretic patrolling ongraphs. One line of work considers games in whichone player, the patroller, patrols the graph todetect and catch the other player, the evader, whotries to minimize the detection probability. Thisincludes work on hider-seeker games (Halvorson,Conitzer, and Parr 2009) for the case of mobileevaders, and search games (Gal 1979) for the caseof immobile evaders.

Another line of research considers games inwhich the patroller deploys resources (static ormobile) on the graph to prevent the other player,the attacker, from reaching certain target vertices.There are a few variations depending on the set ofpossible sources and targets of the attacker. Infil-tration games (Alpern 1992) considered one sourceand target. Asset protection problems (Dickersonet al. 2010) consider multiple sources and multipleequally weighted targets. Networked securitygames (Tsai et al. 2010) consider targets with dif-ferent weights.

The leader-follower Stackelberg game model hasbeen the topic of much recent research (Tambe2011) and has been applied to a number of real-world security domains, including the Los AngelesInternational Airport (Jain et al. 2010), the FederalAir Marshals Service (Jain et al. 2010), and theTransportation Security Administration (Pita et al.2011).

Urban transit systems, however, present uniquecomputational challenges. First, unlike in existingwork on graph patrolling games, and unlike in pre-vious deployed applications on counterterrorism,here the followers we seek to influence are poten-tially very many: large numbers of train ridersmight plausibly consider fare evasion. Booz AllenHamilton (see note 1) estimates that 6 percent ofriders are ticketless in the metro system overall;anecdotal reports suggest that on some lines thispercentage could be far greater, even a majority.Second, the leader has exponentially many possi-ble patrol strategies, corresponding to all the feasi-

ble trips within the transit network subject to cer-tain restrictions and preferences. Similar to FAMS(Jain et al. 2010), we represent patrol strategiescompactly as a marginal coverage vector. Butunlike the FAMS problem in which a patrol con-sists of a very limited number of flights (often apair of flights), TRUSTS allows much more com-plex patrols and thus uses a novel compact repre-sentation based on history-duplicate transitiongraphs.

Game Theory BackgroundWe begin with a brief overview of the game-theo-retic concepts that we use in this article. At a highlevel, game theory studies games, which are math-ematical models of interactions among multipleplayers, each trying to advance their self-interestby choosing among a set of strategies. Formally, agame has a set N of players; each player i N has aset Si of pure strategies; the outcome of the game isdetermined by the strategy profile s S, where S isthe Cartesian product of the player’s sets of purestrategies. Each player i’s preference over the out-comes of the game is specified by her utility func-tion ui : S �, and her objective is to maximizeher expected utility. Each player can choose a purestrategy, or a mixed strategy that is a probabilitydistribution over pure strategies. Player i’s bestresponse strategy, given strategies of the otherplayers, is a strategy of i that maximizes i’s expect-ed utility. It is always possible to pick a bestresponse that is a pure strategy.

In a Stackelberg game between a leader and a fol-lower, the leader commits to a mixed strategy, andthe follower observes the leader’s mixed strategyand plays a best response (Conitzer and Sandholm2006). This setting has been applied to securitydomains where the defender’s daily patrol strate-gies are observed by attackers. The optimal mixedstrategy for the leader to commit to, together withthe follower’s best response, forms a Stackelbergequilibrium of the game.

Problem SettingTRUSTS addresses the challenge of generating ran-domized schedules for LASD patrols for four LAMetro lines (Blue, Gold, Green, and Red). For sim-plicity, TRUSTS treats the LA Metro system’s multi-ple lines as independent. Indeed, currently, the LAMetro lines have just a few transfer points. Dealingwith the impact of transfers will be a topic forfuture work. We model the patrol scheduling prob-lem on each line as a leader-follower Stackelberggame with one leader (the LASD) and multiple fol-lowers (riders). In this game, a pure leader strategyis a patrol, that is, a sequence of patrol actions(defined below), of constant bounded duration.

Articles

WINTER 2012 61

The two possible pure follower strategies are buy-ing and not buying. There are many types of fol-lowers, one for each source, destination, anddeparture time triple (corresponding to the set ofall riders who take such a trip).

Train SystemThe train system consists of a single line (forexample, the Gold line as shown in figure 2a onwhich trains travel back and forth, in general withmultiple trains traveling simultaneously. The sys-tem operates according to a fixed daily schedule (asnippet is shown in figure 2b), with trains arrivingat stations at (finitely many) designated timesthroughout the day. Therefore we can model timeas discrete, focusing only on the time steps atwhich some train arrival/departure event occurs.We use the (directed) transition graph G = ⟨V, E⟩ toencode the daily timetable of the metro line,where a vertex v = ⟨s, t⟩ corresponds to some pairof station s and time point t. An edge in G repre-sents a possible (minimal) action. In particular,there is an edge from (s, t⟩ to ⟨s, t⟩ if: s is eitherthe predecessor or successor of s in the stationsequence and ⟨s, t⟩ and ⟨s, t⟩ are two consecutivestops for some train in the train schedule (travel-ing action), or s = s, t < t, and there is no vertex⟨s, t⟩ with t < t < t (staying action). We refer tothe entire path that a given train takes through G,from the start station to the terminal station, as atrain path.

PatrolsThere are a fixed number of deployable patrolunits, each of which may be scheduled on a patrolof duration at most hours (with, for example, =7). There are two sorts of patrol actions, which agiven patrol unit can alternate between on its shift:on-train inspections (in which patrollers ride thetrain, inspecting passengers), and in-stationinspections (in which they inspect passengers asthey exit the station). A pure patrol strategy is rep-resented mathematically as a path in G for eachpatrol unit, in which an edge e represents an atom-ic patrol action, that is, inspecting in-station fromthe time of one train event at that station to thenext (at that station) or inspecting on-train as ittravels from one station to the next. Each edge ehas a length le equal to the corresponding patrolaction duration and an effectiveness value fe,which represents the percentage of the relevantridership inspected by this action. For both in-sta-tion and on-train inspections, fe depends on theridership volume at that location, the time of day,and the duration. A valid pure patrol strategy is aset of paths P1 , ..., P, each of size at most .

ExampleA simple scenario with three stations (A, B, C) andfour discrete time points (6pm, 7pm, 8pm, 9pm) isgiven in figure 3. The dashed lines represent stay-ing actions; the solid lines represent travelingactions. There are four trains in the system; all edgedurations are 1 hour. A sample train path here is(A, 6pm) (B, 7pm) (C, 8pm). In this example,if = 2 and = 1, then the valid pure leader strate-

Articles

62 AI MAGAZINE

Los Angeles River

PASADENA

DOWNTOWNLOS ANGELES

SOUTHPASADENA

MONTECITOHEIGHTS

LINCOLNHEIGHTS

CYPRESSPARK

MOUNTWASHINGTON

HIGHLANDPARKLOS ANGELES

EAST LOS ANGELES

BOYLEHEIGHTS

Amtrak/M

etrolink

Metrolink

Am

trak

/M

etro

link

HERITAGE SQUARE

SOUTH PASADENA

FILLMORE

DEL MAR

LAKE ALLEN

LINCOLN/CYPRESS

SOUTHWESTMUSEUM

HIGHLANDPARK

MEMORIAL PARK

SIERRA MADREVILLA

UNION STATION

CHINATOWN

PICO/ALISOMARIACHI PLAZA

LITTLE TOKYO/ARTS DISTRICT SOTO

INDIANA

ATLANTICMARAVILLA

EAST LA CIVIC

CENTER

Metro Gold LineMetro Rail StationEdward R. Roybal Linea de OroEastside ExtensionFreeway

LEGEND

ROUTE MAP

CIVICCENTERSTATION

Figure 2. The Los Angeles Metro’s Gold Line.

Atla

ntic

East

LA

Civ

ic C

ente

r

Indi

ana

Soto

Mar

iach

i Pla

za

Litt

le T

okyo

/Art

s D

istr

ict

Uni

on S

tatio

n

Chi

nato

wn

— — — — — — 3:40A 3:43A— — — — — — 4:10 4:12— — — — — — 4:28 4:304:21A 4:23A 4:31A 4:34A 4:35A 4:41A 4:45 4:47— — — — — — 4:56 4:584:41 4:43 4:51 4:54 4:55 5:01 5:05 5:07— — — — — — 5:14 5:165:00 5:02 5:10 5:13 5:14 5:20 5:24 5:26— — — — — — 5:32 5:345:18 5:20 5:28 5:31 5:32 5:38 5:42 5:44— — — — — — 5:48 5:505:30 5:32 5:40 5:43 5:44 5:50 5:54 5:56— — — — — — 6:00 6:025:42 5:44 5:52 5:55 5:56 6:02 6:06 6:085:48 5:50 5:58 6:01 6:02 6:08 6:12 6:145:54 5:56 6:04 6:07 6:08 6:14 6:18 6:20

A

B

gies (pure patrol strategies) consist of all paths oflength 2.

RidersThe riders are assumed to be daily commuters whotake a fixed route at a fixed time. Horizon ResearchCorporation2 estimates that more than 82 percentof riders use the system at least three days a week.The ticket price (for any trip within the transit sys-tem) is a nominal fee , with the fine for fare eva-sion much greater. As the riders follow the sameroute every day, they could estimate the likelihoodof being inspected, based on which they make adecision as to whether to buy a ticket. We assumethe riders know the inspection probability perfect-ly and are rational, risk-neutral economic actors(Becker and Landes 1974) who make this choice inorder to minimize expected cost.

A rider’s type is defined by the path he or shetakes in the graph. Because there is a single trainline, riders never linger in stations, that is, do notfollow any ‘‘stay’’ edges (staying at a station) mid-journey; the last edge of every follower type is a(short) stay edge, representing the action of ‘‘exit-ing’’ the destination station, during which the rid-er may be subject to in-station inspection. There-fore the space of rider types corresponds to theset of all subpaths of train paths. (When G is drawnas in figure 3, all rider paths are ‘‘diagonal’’ exceptfor the last edge.) A metro line with N stops and Mscheduled trains will have O(M N2) rider types.

Given a pure patrol strategy of the units, (P1,…, P), the inspection probability for a rider of type is:

(1)

and therefore his or her expected utility is the neg-ative of the expected amount he or she pays: − ifshe or he buys the ticket and − multiplied by theexpression in equation (1) otherwise. The inspec-tion probability for a mixed strategy is then theexpectation of equation (1), taken over the distri-bution of pure strategies.

We justify the inspection probability in equation(1) as follows. First, consider on-train inspections.The fraction of the train that is inspected in a giv-en inspection action is determined by fe (whichdepends on ridership volume). The key is that inthe next inspection action, a patrol will not rein-spect the fraction of the train that is alreadyinspected in a previous inspection action. There-fore, unlike in settings where patrollers mayrepeatedly draw a random sample from the sameset of train passengers to inspect, in our setting, theprobabilities fe are added rather than multiplied.Now also consider in-station inspections. Since arider taking a journey only exits a single station, arider will encounter at most one in-station inspec-tion. Finally, when multiple patrol units cover thesame edge e, the inspection probability given by

min{1, fee!Pi"#$

i=1

%

$ },

Articles

WINTER 2012 63

A

B

C

6PM 7PM 8PM 9PM

A, 6PM A, 7PM A, 8PM A, 9PM

C, 6PM C, 7PM C, 8PM C, 9PM

B, 6PM B, 7PM B, 8PM B, 9PM

Figure 3. The Transition Graph of a Toy Problem Instance.

(1) is the sum of the contributions from each patrolunit, capped at 1. This is a reasonable assumptionwhen the number of patrol units on each edge e issmall, as multiple patrol units on the same traincould check different cars or different portions ofthe same car, and multiple patrol units inspectingat the same station could check different exits.

ObjectiveThe leader’s utility, equal to total expected rev-enue, can be decomposed into utilities from bilat-eral interactions with each individual follower; fur-thermore the followers do not directly affect eachother’s utilities. As a result, the game can be equiv-alently formulated as a two-player Bayesian Stack-elberg game, in which there is just one followertaking on the role of one of the passengers. Thetype of this passenger is known to the follower butnot to the leader and is drawn from a probabilitydistribution such that the probability p of a fol-lower type is proportional to its ridership vol-ume.

Furthermore, although in general solvingBayesian Stackelberg games is NP-complete(Conitzer and Sandholm 2006), the utility func-tions of this game satisfy the zero-sum property,that is, the utility gained by the leader alwaysequals the utility lost by the follower. For suchzero-sum Bayesian games, the Stackelberg equilib-rium is equivalent to the maximin solution, andthe games are solvable by the linear programmingformulations of Ponssard and Sorin (1980) orKoller, Megiddo, and von Stengel (1994). However,these LP formulations explicitly enumerate thepure strategies of the leader; since our game ofinterest has an exponential number of leader purestrategies, even storing these formulations wouldrequire exponential space.

Linear Program FormulationIn this section, we introduce our linear-program-ming-based approach for finding a maximum-rev-enue (mixed) patrol strategy. As noted above, theleader’s space of pure strategies is exponentiallylarge, even with a single patrol unit. We avoid thisdifficulty by compactly representing mixed patrolstrategies by marginal coverage on edges of thetransition graph, that is, by the expected numbersof inspections that will occur on these edges. Sub-sequently, we construct a mixed strategy (that is, aprobability distribution over pure strategies) con-sistent with the marginal coverage.

For expository purposes, we first present a basicformulation of our approach of compactly repre-senting the problem using marginal coverage. Thisbasic formulation also illustrates the key issues thatmake it difficult for the end user to deploy thepatrol strategies computed. We then introduce an

extended formulation to address these issues. Wewill focus on the high-level ideas and refer inter-ested readers to our technical paper (Yin et al.2012) for further details.

Basic FormulationFor algorithmic convenience, we add to the transi-tion graph a source v+ with edges to all possiblestarting vertices in the transition graph and a sinkv− with edges from all possible ending vertices. Weassign these additional dummy edges zero dura-tion and zero effectiveness.

Let xe be the expected number of inspectionsthat will occur on edge e of this transition graph.We call the vector x of marginal coverage on alledges the marginal strategy. Since the number ofedges in the graph is exponentially fewer than thenumber of paths, the marginal strategies are amuch more compact representation of mixedstrategies. A valid vector x must satisfy the follow-ing constraints, which are the constraints defininga fractional flow on the graph when we interpret xeto be the amount of flow on edge e: the total flowsentering and exiting the system are bounded by ,the number of total patrol units allowed; the totalflow entering each intermediate node equals thetotal flow out of that node. Furthermore, theexpected total number of time units spent by thepatrols must be bounded by · .

Recall that our objective is to maximize theleader’s expected revenue, which equals the riders’expected total payment. Since the riders are ration-al players who minimize their expected costs, theexpected payment of each rider is the minimum ofthe ticket price, and his or her expected paymentif he or she decides to evade, which equals the finetimes the probability of the rider being captured. Itturns out that we could upper-bound the probabil-ity of capture using a linear expression in terms ofx, leading to a linear program that provides anupper bound on the optimal revenue. Fortunately,once we generate the patrols from the marginalswe are able to compute the actual best-responseexpected utilities of the riders. Our experimentsshow that the differences between the actualexpected utilities and the upper-bounds given bythe LP formulation are small. Meanwhile the sizeof this LP grows only polynomially in the size ofthe transition graph. Once we solve the LP to findthe optimal marginal strategy x, we can efficientlyconstruct a -unit mixed strategy whose marginalsmatch x.

Issues with the Basic FormulationThere are two fundamental issues with the basicformulation. First, the mixed strategy constructedcan fail to satisfy the patrol length limit of ,notwithstanding the constraint on the sum of thelengths of all patrols, and hence be infeasible. In

Articles

64 AI MAGAZINE

fact, the marginal strategy computed in the basicformulation may not correspond to any feasiblemixed strategy in which all patrols have length atmost . Consider the counterexample in figure 4.Edges v1 v2 and v2 v3 represent two real atom-ic actions, each with duration 1. Patrols must startfrom either v1 or v3, but can terminate at any of v1,v2, and v3. This is specified using v+ and v−, thedummy source and sink, respectively. We assume = 1 and = 1. It can be verified that the marginalstrategy shown in figure 4 satisfies the constraintsof our basic LP. However, the only correspondingmixed strategy is to take v+ v3 v− with 50 per-cent probability and v+ v1 v2 v3 v− with50 percent probability. (To see this, observe that wecannot put any probability mass on the path v+

v1 v2 v− because the edge v2 v− has zero flow;the path v+ v1 v− is excluded by a similar argu-ment.) This mixed strategy is infeasible since itssecond patrol has duration greater than 1. Thispatrol length violation arises because the basic for-mulation only constrains the average patrollength, and therefore allows the use of overlongpatrols as long as some short patrols are also used.

Second, the paths selected according to the con-structed mixed strategy may switch between trainsor between in-station and on-train inspection animpractically large number of times, making thepatrol path difficult to implement and error prone.This is an important issue as we want real LASDofficers to be able to carry out these strategies. Themore switches there are in a patrol strategy, themore instructions the patrol unit has to remember,and the more likely the unit will miss a switch dueto imperfections in the train schedule and/or theunit’s misexecution of the instructions. For exam-ple, in the example, ⟨A, 6pm⟩ ⟨B, 7pm⟩ ⟨A,

8pm⟩ and ⟨C, 6pm⟩ ⟨B, 7pm⟩ ⟨C, 8pm⟩ each doone switch while ⟨A, 6pm⟩ ⟨B, 7pm⟩ ⟨C, 8pm⟩and ⟨C, 6pm⟩ ⟨B, 7pm⟩ ⟨A, 8pm⟩ each do none.Both path pairs cover the same set of edges, mak-ing the second preferable because it is easier toimplement.

Extended FormulationNow we present a more sophisticated formulationdesign to address the two aforementioned issues.The difficulty involved in imposing constraints onthe patrol paths (that is, penalizing or forbiddingcertain paths) in the marginal representation isthat paths themselves are not represented, insteadbeing encoded only as marginal coverage.

Hence the key idea is to preserve sufficient pathhistory information within vertices to be able toevaluate our constraints, while avoiding the expo-nential blowup creating a node for every pathwould cause. We construct a new graph, called thehistory-duplicate transition (HTD) graph, by creat-ing multiple copies of the original vertices, eachcorresponding to different values of history infor-mation.

We first explain how to construct the HDT graphfrom a transition graph G in order to forbid patrolpaths longer than . The HDT graph is composedof multiple restricted copies of G (that is, sub-graphs of G), corresponding to different possiblestarting time points. For the copy corresponding tostarting time point t*, we only keep the subgraphon vertices v = ⟨s, t⟩ V where t* ≤ t ≤ t* + . Thus,in each restricted copy of G, the length of any pathis guaranteed to be less than or equal to . Sincethere are a finite number of distinct possible start-ing time points (that is, all distinct discrete timepoints in V+ ), the new graph is a linear expansion

Articles

WINTER 2012 65

v+ v1 v2 v3 v-

0.5

0.5 0.50.5 1

000

Figure 4. Example of an Infeasible Marginal Strategy.

of G. An approximation can be obtained by takingone starting time point every time units. In thiscase, an original vertex (edge) will be kept in atmost / copies, implying the new graph is atmost / times larger than G.

Figure 5a shows the HDT graph (the shaded por-tion further explained below) of the example with = 2 and two starting time points, 6pm and 7pm.The HDT graph is thus composed of two restrictedcopies of the original transition graph. In each ver-tex, the time shown in parenthesis indicates thestarting time point. For example, the original ver-tex ⟨A, 7pm⟩ now has two copies ⟨A, 7pm, (6pm)⟩and ⟨A, 7pm, (7pm)⟩ in the HDT graph. For thestarting time point of 6pm, the patrol must end ator before 8pm, hence we do not need to keep ver-tices whose discrete time point is 9pm. For thestarting time point of 7pm, the patrol must start ator after 7pm, hence we do not need to keep verticeswhose discrete time point is 6pm. The two restrict-ed copies are not two separate graphs but a singlegraph that will be tied together by the dummysource and sink.

Next, we explain how to further extend the HDTgraph to penalize complex patrol paths. The idea isto have each vertex encode the last action occur-ring prior to it. Specifically, we create multiplecopies of a vertex v, each corresponding to a differ-ent edge that leads to it. If v is a possible startingvertex, we create an additional copy representingno prior action. If there is an edge from v to v, weconnect all copies of v to the specific copy of vwhose last action was (v, v). A new edge is called aswitching edge if the recorded last actions of itstwo vertices are of different types (for example,inspecting different trains), unless one of the twovertices is a ‘‘no prior action’’ vertex. As can be ver-ified, the number of switches of a patrol path inthe new graph is the number of switching edges ithas. To favor simple patrol paths, we impose a costfor using switching edges.

In figure 5b, we show how to apply this exten-sion using the subgraph shown in the shaded boxof figure 5a. Since there is only one edge leading to⟨A, 7pm, (6pm)⟩, we create one copy of it repre-senting the action of staying at A. There are threeedges leading to ⟨B, 7pm, (6pm)⟩, so we create threecopies of it representing the actions of taking trainfrom A, staying at B, and taking train from C. Theoriginal edges are also duplicated. For example, ⟨B,7pm, (6pm)⟩ ⟨B, 8pm, (6pm)⟩ has three copiesconnecting the three copies of ⟨B, 7pm, (6pm)⟩ tothe copy of ⟨B, 8pm, (6pm)⟩ representing the stay-ing at B action. Among the three copies, only the‘‘Stay’’ to ‘‘Stay’’ edge is not a switching edge.

Given the final HDT graph G = ⟨V, E⟩, we pro-vide an extended linear program formulation, withvariables ye representing the marginal coverage ofan HDT graph edge e E being selected. To penal-

ize switches, we add to the objective function apenalty for each unit of marginal coverage usedby switching edges. Varying the value of lets ustrade off between solution quality (greater rev-enue) and patrol preference (lower average num-ber of switches). A path in the HDT graph G triv-ially corresponds to a path in the transition graphG, since any edge in G is a duplicate of some edgein G. Because the length of any patrol path in theHDT graph is bounded by , the mixed strategymust be feasible.

Real-World EvaluationWe present our evaluation based on real metroschedules and rider traffic data provided by theLASD. We solved the LP in the extended formula-tion using CPLEX 12.2 on a standard 2.8 GHzmachine with 4 GB memory. We first describe thedata sets we used, followed by our experimentalresults.

Data SetsWe created four data sets, each based on a differentLos Angeles Metro Rail line: Red (including Pur-ple), Blue, Gold, and Green. For each line, we cre-ated its transition graph using the correspondingtimetable from www.metro.net. Implementing theLP requires a distribution of types of potential fareevaders (recall that a rider type corresponds to a 4-tuple of boarding station / time and disembarkingstation / time). We have access to hourly boardingand alighting counts provided by the Los AngelesSheriff’s Department, and from this can estimatethe distribution of population on the train systemat each hour; but we do not know what percentageof the population are potential fare evaders. In ourexperiments, we assumed that potential fareevaders were evenly distributed among the gener-al population. Specifically, suppose the percentageof riders boarding in hour i is di

+ and the percent-age of riders alighting in hour i is di

−. Denote theset of rider types that board in hour i by i

+ andthose that alight in hour i by i

−. Then we wouldlike to compute a fine-grained ridership distribu-tion p to match the hourly boarding and alightingpercentages, that is, to find a point within the fol-lowng convex region ,

We estimate the fare-evader distribution by findingthe analytic center of , which is efficiently com-putable.

The inspection effectiveness fe of an edge isassigned based on the assumption that 10 passen-gers can be inspected per minute. The inspectioneffectiveness fe is capped at 0.5 to capture the factthat the inspector cannot switch between cars

!= {p | p! 0" p##$%i

+& = di

+ " p##$%i

'& = di

',(i}.

Articles

66 AI MAGAZINE

while the train is moving. (Trains contain at leasttwo cars.) The ticket fare was set to $1.50 (the actu-al current value) while the fine was set to $100.00(Fare evaders in Los Angeles can be fined $200.00,but they also may be issued warnings.) If we couldincrease the fine dramatically the riders wouldhave much less incentive for fare evasion, and wecould achieve better revenue. However a larger fineis infeasible legally. Table 1 summarizes thedetailed statistics for the LA Metro lines.

Experimental ResultsThroughout our experiments, we fixed to 1. Inour first set of experiments, we fixed the penalty for using patrol paths with more switches to 0, andvaried the maximum number of hours that aninspector can patrol from 4 to 7 hours. To create

the HDT graph, we took one starting time pointevery hour.

Figure 6a shows the expected revenue per riderof the mixed patrol strategy we generated, which isthe total revenue divided by the number of daily

Articles

WINTER 2012 67

From A

From CStay

Stay

Stay

From A

From B

From B

From CStay

Stay

Stay A, 8PM(6PM)

B, 8PM(6PM)

C, 8PM(6PM)

A, 7PM(6PM)

B, 7PM(6PM)

C, 7PM(6PM)

A, 7PM(6PM)

B, 7PM(6PM)

C, 7PM(6PM)

A, 8PM(6PM)

B, 8PM(6PM)

C, 8PM(6PM)

B

A, 7PM(6PM)

A, 6PM(6PM)

AA, 8PM(6PM)

A, 7PM(7PM)

A, 8PM(7PM)

A, 9PM(7PM)

B, 7PM(6PM)

B, 6PM(6PM)

B, 8PM(6PM)

B, 7PM(7PM)

B, 8PM(7PM)

B, 9PM(7PM)

C, 7PM(6PM)

C, 6PM(6PM)

C, 8PM(6PM)

C, 7PM(7PM)

C, 8PM(7PM)

C, 9PM(7PM)

Figure 5. HDT Graph of Example 1.

(a) Two starting time points. (b) Extension storing the last action occurring.

Line Stops Trains Daily Riders Types

Red 16 433 149991.5 26033

Blue 22 287 76906.2 46630

Gold 19 280 30940.0 41910

Green 14 217 38442.6 19559

Table 1. Statistics of Los Angeles Metro Lines.

Articles

68 AI MAGAZINE

4 5 6 71.2

1.3

1.4

1.5

Number of patrol hours

Reve

nue

per

rid

er

Cum

ulat

ive

dist

ribut

ion

0 20 40 60 800

0.2

0.4

0.6

0.8

1

Number of switches

β = 0β = 0.001β = 0.01

4 5 6 7

102

103

104


Solv

ing

time

(in s

econ

ds)

BLUEGOLDGREENRED

BLUEGOLDGREENRED

4 5 6 796%

97%

98%

99%

100%


Perc

enta

ge o

f up

per

bou

nd

BLUEGOLDGREENRED

4 5 6 70

10%

20%

30%

40%


pre

fer

fare

−eva

sion

BLUEGOLDGREENRED

5 10 15 20 2595%

96%

97%

98%

99%

100%

Number of switches

Nor

mal

ized

rev

enue

BLUEGOLDGREENRED

10 100 100085%

90%

95%

100%

Solving time (in seconds)

Nor

mal

ized

rev

enue

BLUEGOLDGREENRED

4 5 6 70

20%

40%

60%

80%

100%


Perc

enta

ge o

f rid

ers

prefer purchasing

prefer fare evasion

Indifferent

A

B

C

D

E

F

G

H

Figure 6. Experimental Results.

riders. Since the LP only returns an upper boundof the attainable revenue, the true expected rev-enue of the mixed patrol strategy was computed byevaluating the riders’ best responses for all ridertypes. A rider can always pay the ticket price of$1.50 and will only evade the ticket when theexpected fine is lower. Hence the theoretical max-imum achievable value is $1.50, which is achievedwhen every rider purchases a ticket. As we can see,the per rider revenue increases as the number ofpatrol hours increases, almost converging to thetheoretical upper bound of $1.50 for the Gold andGreen line. Specifically, a 4-hour patrol strategyalready provides reasonably good expected value:1.31 for the Blue line (87.4 percent of the maxi-mum), 1.45 for the Gold line (97.0 percent), 1.48for the Green line (98.8 percent), and 1.22 for theRed line (81.3 percent). Among the four lines, theRed line has the lowest revenue per rider. This isbecause the effectiveness of fare inspectiondecreases as the volume of daily riders increases,and the Red line has a significantly higher numberof daily riders than the other lines.

We depict in figure 6b the percentage of the trueexpected revenue versus the theoretical upperbound returned by the LP. Strategies generated byour method are near optimal; for example, our 4-hour strategies for the Blue, Gold, Green, and Redlines provided expected revenues of 96.5 percent,98.5 percent, 99.5 percent, and 97.0 percent of theupper bound (and thus at least as much of the opti-mum), respectively.

To study riders’ responses to the computed strat-egy, we partitioned the entire population of ridersinto three groups depending on their expected fineif fare evading: riders who prefer purchasing tickets(expected fine is greater than 1.7 — 13.3 percentabove the ticket price), riders who prefer fare eva-sion (expected fine is less than 1.3 — 13.3 percentbelow the ticket price), and indifferent riders(expected fine is between 1.3 and 1.7). In figure 6c,we show the distribution of the three groupsagainst the strategies computed for the Red line.The three dashed lines inside the region of indif-ferent riders represent, from top to bottom, thepercentages of riders whose expected fine is lessthan 1.6, 1.5, and 1.4, respectively. As the numberof patrol hours increases from 4 to 7, the percent-age of riders who prefer fare evasion decreases from38 percent to 7 percent, the percentage of riderswho prefer purchasing tickets increases from 17percent to 43 percent, and the percentage of indif-ferent riders remains stable between 45 percentand 50 percent.

Zooming in on the fare evasion, figure 6d showsthe percentage of riders who preferred fare evasionagainst the patrol strategies computed. As we cansee, this percentage decreased almost linearly inthe number of additional patrol hours beyond 4.

Our 7-hour patrol strategy lowered this percentageto 4.2 percent for the Blue line, 0.01 percent for theGold line, 0.01 percent for the Green line, and 6.8percent for the Red line. Again, due to having thehighest daily volume, the Red line had the highestpercentage of riders who preferred fare evasion.

Finally, figure 6e shows the run time required byCPLEX to solve the LPs we created. As we can see,the run time increased as the number of patrolhours increased for all the metro lines. This isbecause the size of the HDT graph constructed isroughly proportional to the maximum length ofthe patrols, and a larger HDT graph requires an LPwith more variables and constraints. Among thefour lines, the Red and the Green lines have sig-nificantly fewer types, and are thus easier to solvethan the other two lines.

In our second experiment, we varied the interval of taking starting time points, trading off solu-tion quality for efficiency. We fixed the patrollength to 4 hours and penalty parameter to 0.For each line, we tested six interval () settingsranging from 0.5 hour to 4 hours. In figure 6f, thex-axis is the run time (in log-scale) and the y-axis isthe normalized revenue against the expected rev-enue of = 0.5 within each line. For each line, adata point from left to right corresponds to = 4,3, 2, 1.5, 1, and 0.5 respectively. Increasing the runtime (by decreasing ) always led to a better solu-tion; however, the quality gain diminished. Forexample, for the Blue line, it took 20 seconds ofadditional run time to increase the solution quali-ty from 87.9 percent ( = 4 hours) to 92.9 percent( = 3 hours), whereas it took 1456 seconds of addi-tional run time to increase the solution qualityfrom 99.1 percent( = 1 hour) to 100 percent ( = 0.5 hour).

In the final experiment, we varied the penalty for using more switches, trading off between thesolution quality and the average number of switch-es. We fixed the patrol length to 4 hours andstarting time interval to one hour. For each line,we tested seven penalty settings from = 0 to =0.01. Figure 6g plots the average number of switch-es against the normalized revenue against theexpected revenue of = 0 within each line. For alllines, higher values led to both lower solutionquality and fewer number of switches. For exam-ple, the average number of switches in the solutionof the highest revenue ( = 0) ranged from 18.6(Gold line) to 26.7 (Red line). However, by allow-ing 3 percent quality loss, this number could belowered to less than 10 for all the four lines.

To further understand the patrol paths returnedin these solutions, we show, in figure 6h, thecumulative probability distributions of the numberof switches for the Red line given three settings of: 0, 0.001, and 0.01. Choosing a lower tended tolead to more complex patrol paths. For example,

Articles

WINTER 2012 69

the solution of = 0 used patrol paths whose num-ber of switches is greater than 20 with 68.9 percentprobability; the solution of = 0.001 (99.7 percentof the optimum) only used such paths with 31.2percent probability. And the solution of = 0.01(97.0 percent of the optimum) never used patrolpaths that had more than 20 switches.

LASD Evaluation of TRUSTSSince January 2012, LASD has been testing ourgenerated patrol strategies on the Red Line. Forexample, in initial test runs on Thursday, January4, and Friday, January 5, 2012, one patrol unit con-ducted a 4-hour fare-inspection patrol on each day.A total of 851 fare checks were made, with 41 fareevaders cited and felons felons arrested. Thepatrols implemented in the two days had four andfive switches, respectively, and the officers wereable to make the switches we requested. The LASDofficers have given us valuable feedback and wehave incorporated their suggestions, includinginserting breaks in the middle of patrol shifts.Dozens of test patrol runs have been carried out sofar, and more tests are scheduled in the future toprovide a more thorough evaluation of the effec-tiveness of our strategies. Figure 7 shows two offi-

cers doing fare checks at a station exit during oneof the test patrols.

SummaryIn this paper we presented TRUSTS, a novel appli-cation for fare-evasion deterrence in urban transitsystems. Our development of TRUSTS opens thedoor to applying game-theoretical randomizationbeyond previous applications of counterterrorism,to a much broader setting in which common indi-viduals and daily routines are involved. We mod-eled the domain as a Stackelberg game, providinga novel compact representation of the leader’smixed strategies as flows in the history-duplicatetransition graph. We found in our simulations thatour method computed close-to-optimal strategies,which effectively deterred fare evasion andensured high levels of revenue with few patrolhours. We are currently evaluating TRUSTS withinthe LA Metro Rail system in collaboration withLASD.

We briefly mention a couple of future directions.One of the observations from the test runs is thatLASD officers often need to deviate from the patrolschedule due to making felony arrests or being

Articles

70 AI MAGAZINE

Figure 7. Fare Check at a Station Exit.

called to deal with emergencies at other parts ofthe train system. After dealing with the emergency,which could take a significant amount of time, it isoften difficult to catch up to the original patrolschedule. We are currently investigating more effi-cient ways of utilizing resources in case of suchemergencies. Another future direction is to inves-tigate bounded rational models of the passengers,since they are human decision makers and thusmight not be perfect optimizers. Incorporatingsuch behavior models of the adversary in securitygames (Pita et al. 2010; Yang et al. 2011) maypotentially increase the robustness of our solu-tions. And we ultimately want to combine farechecking with LASD’s other tasks of crime preven-tion and counterterrorism. We plan to make use ofmultiobjective optimization techniques to intelli-gently allocate resources that makes trade-offsamong these different objectives.

AcknowledgementsThis research is supported by MURI grantW911NF-11-1-0332. Tuomas Sandholm was fund-ed by the National Science Foundation undergrants IIS-0905390, IIS-0964579, and CCF-1101668. Kevin Leyton-Brown was funded by theNatural Sciences and Engineering Research Coun-cil of Canada under the Discovery Grants program.

Notes1. See the Booz Allen Hamilton 2007 report, FaregatingAnalysis commissioned by LA Metro(boardarchives.metro.net/Items/2007/11˙Novem-ber/20071115EMACItem2%7.pdf)

2. See the 2002 Horizon Research Corporation Metropol-itan Transit Authority Fare Evasion Study. (librar-yarchives.metro.net/DPGTL/studies/2002˙horizon˙fare˙evasio%n˙study.pdf)

ReferencesAlpern, S. 1992. Infiltration Games on Arbitrary Graphs.Journal of Mathematical Analysis and Applications 163(1):286–288.

Becker, G., and Landes, W. 1974. Essays in the Economicsof Crime and Punishment. New York: Columbia UniversityPress.

Conitzer, V., and Sandholm, T. 2006. Computing theOptimal Strategy to Commit To. In Proceedings of the 7thACM Conference on Electronic Commerce. New York: Asso-ciation for Computing Machinery.

Dickerson, J. P.; Simari, G. I.; Subrahmanian, V. S.; andKraus, S. 2010. A Graph-Theoretic Approach to ProtectStatic and Moving Targets from Adversaries. In Proceed-ings of the 9th International Conference on AutonomousAgents and Multiagent Systems. Richland, SC: Internation-al Foundation for Autonomous Agents and MultiagentSystems.

Gal, S. 1979. Search Games with Mobile and ImmobileHider. SIAM Journal on Control and Optimization 17(1): 99–122.

Halvorson, E.; Conitzer, V.; and Parr, R. 2009. Multi-StepMulti-Sensor Hider-Seeker Games. In Proceedings of the21st International Joint Conference on Artificial Intelligence.Menlo Park, CA: AAAI Press.

Jain, M.; Tsai, J.; Pita, J.; Kiekintveld, C.; Rathi, S.; Tambe,M.; and Ordóñez, F. 2010. Software Assistants for Ran-domized Patrol Planning for the LAX Airport Police andthe Federal Air Marshals Service. Interfaces 40(4): 267–290.

Koller, D.; Megiddo, N.; and von Stengel, B. 1994. FastAlgorithms for Finding Randomized Strategies in GameTrees. In Proceedings of the Twenty-Sixth Annual ACM Sym-posium on Theory of Computing. New York: Association forComputing Machinery.

Pita, J.; Jain, M.; Tambe, M.; Ordóñez, F.; and Kraus, S.2010. Robust Solutions to Stackelberg Games: AddressingBounded Rationality and Limited Observations inHuman Cognition. Artificial Intelligence Journal 174(15):1142–1171.

Pita, J.; Tambe, M.; Kiekintveld, C.; Cullen, S.; andSteigerwald, E. 2011. Guards — Game Theoretic SecurityAllocation on a National Scale. In Proceedings of the 10thInternational Conference on Autonomous Agents and Multia-gent Systems. Richland, SC: International Foundation forAutonomous Agents and Multiagent Systems.

Ponssard, J., and Sorin, S. 1980. The L.P. Formulation ofFinite Zero-Sum Games with Incomplete Information.International Journal of Game Theory 9(2): 99–105.

Tambe, M. 2011. Security and Game Theory: Algorithms,Deployed Systems, Lessons Learned. New York: CambridgeUniversity Press.

Tsai, J.; Yin, Z.; Kwak, J.; Kempe, D.; Kiekintveld, C.; andTambe, M. 2010. Urban Security: Game-TheoreticResource Allocation in Networked Physical Domains. InProceedings of the Twenty-Fourth AAAI Conference on Artifi-cial Intelligence. Menlo Park, CA: AAAI Press.

Wagenaar, W. A. 1972. Generation of Random Sequencesby Human Subjects: A Critical Survey of Literature. Psy-chological Bulletin 77(1): 65–72.

Yang, R.; Kiekintveld, C.; Ordóñez, F.; Tambe, M.; andJohn, R. 2011. Improved Computational Models ofHuman Behavior in Security Games. In Proceedings of the10th International Conference on Autonomous Agents andMultiagent Systems. Richland, SC: International Founda-tion for Autonomous Agents and Multiagent Systems.

Yin, Z.; Jiang, A. X.; Johnson, M.; Tambe, M.; Kiekintveld,C.; Leyton-Brown, K.; Sandholm, T.; and Sullivan, J.2012. TRUSTS: Scheduling Randomized Patrols for FareInspection in Transit Systems. In Proceedings of the Twen-ty-Fourth AAAI Conference on Innovative Applications ofArtificial Intelligence (IAAI). Menlo Park, CA: AAAI Press.

Zhengyu Yin is a Ph.D. student in the Department ofComputer Science at the University of Southern Califor-nia. His research interests include algorithmic game the-ory and robust optimization, with applications to securi-ty domains. He has published and presented at AAMAS,IJCAI, and AAAI, and others, and published in JAIR andothers.

Albert Xin Jiang is a postdoctoral researcher in theDepartment of Computer Science at the University ofSouthern California. Much of his research is addressing

Articles

WINTER 2012 71

computational problems arising in game theory, includ-ing the efficient computation of solution concepts. In2011 he received the best student paper award at theACM Conference on Electronic Commerce. He receivedthe 2012 CAIAC Doctoral Dissertation Award and therunner up IFAAMAS-11 Victor Lesser Distinguished Dis-sertation Award.

Milind Tambe is Helen N. and Emmett H. Jones Profes-sor in Engineering and a professor of computer scienceand industrial and systems engineering at the Universityof Southern California (USC). He leads the TEAMCOREResearch Group at USC, with research focused on agent-based and multiagent systems. He is a fellow of AAAI andrecipient of the ACM/SIGART Autonomous AgentsResearch Award and the Christopher Columbus Fellow-ship Foundation Homeland security award. In addition,he is also the recipient of the ”influential paper award”from the International Foundation for Agents and Multi-agent Systems.

Christopher Kiekintveld is an assistant professor at theUniversity of Texas at El Paso (UTEP). His research focus-es on intelligent agents and decision making, includingcomputational game theory with applications to securityand trading agents.

Kevin Leyton-Brown is an associate professor in com-

puter science at the University of British Columbia. Heworks at the intersection of computer science and micro-economics, addressing computational problems in eco-nomic contexts and incentive issues in multiagent sys-tems.

Tuomas Sandholm is a professor in the Computer Sci-ence Department at Carnegie Mellon University. He wasfounder, chairman, and CTO/chief scientist of Com-bineNet, Inc. from 1997 until its acquisition in 2010. Heholds a Ph.D. and MS in computer science and an MS (BSincluded) with distinction in industrial engineering andmanagement science. He is recipient of the NSF CareerAward, the inaugural ACM Autonomous Agents ResearchAward, the Alfred P. Sloan Foundation Fellowship, theCarnegie Science Center Award for Excellence, and theComputers and Thought Award. He is Fellow of ACM andAAAI.

John P. Sullivan is a career police officer. He currentlyserves as a lieutenant with the Los Angeles Sheriff’sDepartment. He is coeditor of Countering Terrorism andWMD: Creating a Global Counter-Terrorism Network (Rout-ledge, 2006) and Global Biosecurity: Threats and Responses(Routledge, 2010). He is coauthor of Policing Transporta-tion Facilities (Charles C. Thomas, 1994), and EmergencyPreparedness for Transit Terrorism (Transportation ResearchBoard, 1997).

Articles

72 AI MAGAZINE

AAAI to Colocate with Cognitive Science Society in 2014!AAAI is pleased to announce that it will colocate with the 2014 Cognitive Science Society Conference in pictur-esque Québec City, Québec, July 27-31, 2014. The conference will be held at the beautiful Centre des congrès deQuébec, and attendees can stay at the adjacent Hilton Québec. More details will be available at the AAAI web-site soon!

TRUSTS: Scheduling Randomized Patrols for Fare Inspection in …sandholm/TRUSTS.AIMag13.pdf ·...

Documents

Transcript of TRUSTS: Scheduling Randomized Patrols for Fare Inspection in …sandholm/TRUSTS.AIMag13.pdf ·...