
Revealing graph bandits for maximizing local influence

ALEXANDRA CARPENTIER and MICHAL VALKO [email protected] and [email protected]

Experiments


Figure 1: Regret of BARE and GraphMOSS as a function of the round t (100 runs, revelation p = 0.80). Left: Barabási-Albert ($\hat{D}_\star = 134$, $\hat{T}_\star = 36$). Middle left: Facebook ($\hat{D}_\star = 125$, $\hat{T}_\star = 28$). Middle right: Enron ($\hat{D}_\star = 564$, $\hat{T}_\star = 107$). Right: Gnutella ($\hat{D}_\star = 3916$, $\hat{T}_\star = 779$).

Figure 2: Barabási-Albert model with the revelation probability p varying between 0.2 and 1 (regret as a function of the round t, 100 runs): p = 0.20 ($\hat{D}_\star = 529$, $\hat{T}_\star = 147$), p = 0.40 ($\hat{D}_\star = 230$, $\hat{T}_\star = 64$), p = 0.60 ($\hat{D}_\star = 161$, $\hat{T}_\star = 50$), p = 0.80 ($\hat{D}_\star = 134$, $\hat{T}_\star = 36$), p = 1.00 ($\hat{D}_\star = 133$, $\hat{T}_\star = 34$).

We first performed an experiment on a graph generated by a 10-out-degree Barabási-Albert model with d = 1000 nodes. Figure 1 (left) compares BARE with GraphMOSS. As expected, GraphMOSS suffers linear regret up to time t = d, since there is no sharing of information and for t ≤ d, GraphMOSS pulls each arm once. While the regret of GraphMOSS is no longer linear for t > d and it eventually detects the best node, BARE is able to detect promising nodes much sooner, during its global exploration phase, and we can see the benefit of the revealed information already around t = 300.
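For concreteness, here is a minimal Python sketch of how such an instance can be generated and played. It assumes that every edge of the generated Barabási-Albert graph carries the same influence probability p (the revelation probability shown in the plots) and that a node always influences itself; the exact construction of $(p_{i,j})$ used for the experiments may differ, and barabasi_albert_instance and play are hypothetical helpers introduced only for this sketch.

import numpy as np
import networkx as nx

def barabasi_albert_instance(d=1000, m=10, p=0.8, seed=0):
    # Influence-probability matrix M for a d-node, m-out-degree Barabasi-Albert graph.
    # Assumption: every (undirected) edge carries the same influence probability p
    # and every node influences itself; the paper may use a different construction.
    graph = nx.barabasi_albert_graph(d, m, seed=seed)
    M = np.zeros((d, d))
    for i, j in graph.edges():
        M[i, j] = M[j, i] = p
    np.fill_diagonal(M, 1.0)
    return M

def play(M, k, rng):
    # One round of the environment: the random 0/1 indicator vector S_{k,t}.
    return (rng.random(M.shape[0]) < M[k]).astype(int)

rng = np.random.default_rng(0)
M = barabasi_albert_instance()
S = play(M, k=0, rng=rng)      # revealed set of influenced nodes
reward = int(S.sum())          # |S_{k,t}|, the quantity BARE and GraphMOSS accumulate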

In Figure 2, we varied the probability of revelation p for a Barabási-Albert graph. When p is close to one, more of the graph structure is revealed and the problem becomes easier. On the other hand, with p close to zero, we do not get as much information about the structure and the performances of BARE and GraphMOSS are similar.

We also performed the experiments on the Enron mail graph (Klimt & Yang, 2004) with d = 36692 and a snapshot of the symmetrized version of the Gnutella network from August 4th, 2002 (Ripeanu et al., 2002) with d = 10879, obtained from the Stanford Large Network Dataset Collection (Leskovec & Krevl, 2014). Furthermore, we evaluated BARE on a subset of the Facebook network with d = 4039 (Viswanath et al., 2009). We used the same parameters as for the Barabási-Albert case.

As expected, Figure 1 (middle left, middle right, right) shows that the performance gains of BARE over GraphMOSS depend heavily on the structure. In Enron and Facebook, the gain of BARE is significant, which suggests that the graphs from these networks feature a relatively small number of influential nodes. On the other hand, the gain of BARE on Gnutella was much smaller, which again suggests that this network is more decentralized.

In all the plots we also include the empirical estimates of the detectable dimension $\hat{D}_\star$ and the detectable horizon $\hat{T}_\star$. Notice that the smaller $\hat{D}_\star$ is compared to d, and the smaller $\hat{T}_\star$ is compared to n, the sooner BARE is able to learn the most influential node compared to GraphMOSS.

6 Conclusion

We hope that our work on local revelation incites extensions to more elaborate propagation models on graphs (Kempe et al., 2015). One way to directly extend to more general propagation models is to consider that a more distant neighbor is a direct neighbor with contamination probability being the sum of the path products. Moreover, if we allow for more feedback, e.g., the identity of the influencing paths, our results could extend more efficiently. Note that in our setting, we were completely agnostic to the graph structure. Realistic networks often exhibit some additional structural properties that are captured by graph generator models, such as various stochastic block models (Girvan & Newman, 2002). In the future, we would like to extend our approach to cases where we can take advantage of the assumptions stemming from these models and consider the subclasses of graph structures where we can further improve the learning rates.

Acknowledgements We thank Alan Mislove for the Facebook dataset. The research presented in this paper was supported by French Ministry of Higher Education and Research, Nord-Pas-de-Calais Regional Council, French National Research Agency project ExTra-Learn (n.ANR-14-CE24-0010-01), and by German Research Foundation's Emmy Noether grant MuSyAD (CA 1488/1-1).



nodes, then $D_\star$ converges to the number of most influenced nodes as n tends to infinity.

Finally, let us write the influential-influenced gap as

$$\varepsilon_\star \stackrel{\text{def}}{=} r_\star - \max_{k \in \mathcal{D}^\leftarrow} r_k,$$

where $\mathcal{D}^\leftarrow \stackrel{\text{def}}{=} \{i : r^\leftarrow_i = \max_k r^\leftarrow_k\}$. The quantity $\varepsilon_\star$ quantifies the gap between the most influential node overall vs. the most influential node in the set of most influenced nodes.

Remark 2. The quantity $\varepsilon_\star$ is small when one of the most influenced nodes is also very influential. It is exactly zero when one of the most influential nodes happens to also be one of the most influenced nodes. For instance, the case $\varepsilon_\star = 0$ appears in undirected social network models with mutual influence.

The graph structure is helpful when the D function decreases quickly with n. To give an intuition about how D is linked to the graph topology, consider a star-shaped graph, which is the most helpful and can have $D_\star = 1$ even for a small n. On the other hand, a bad case is a graph with many small cliques. The worst case is one where all nodes are disconnected except two; there, $D_\star$ will be of order d even for a large n.

The detectable dimension $D_\star$ is a problem-dependent quantity that represents the complexity of the problem instead of d. In real networks, $D_\star$ is typically smaller than the number of nodes d and we give several examples of the empirical value of $D_\star$ in Section 5 and Appendix ??. As our analysis will show, $D_\star$ represents the number of nodes that we can efficiently extract from the d nodes in less than n rounds of the time budget. Our bandit revelator algorithm, BARE (Algorithm 2), starts with the global exploration phase and extracts a subset of cardinality less than or equal to $D_\star$ that contains a very influential node, one that is at most $\varepsilon_\star$ away from the most influential node. BARE does this extraction without scanning all the d nodes, which could be impossible anyway, since we do not restrict to $d \le n$. In the subsequent bandit phase, BARE proceeds with scanning this smaller set of selected nodes to find the most influential one.

We now state our main theoretical result that proves a bound on the regret of BARE.

Theorem 2 (proof in Section 4). In the unrestricted local influence setting with information about the neighbors, BARE satisfies, for a constant C > 0,

$$\mathbb{E}[R_n] \le C \min\left(r_\star n,\; D_\star r_\star + \sqrt{r_\star n D_\star} + n\varepsilon_\star\right).$$

Remark 3. Note that BARE does not need preliminary information about G, which a classic multi-arm bandit strategy as described in Section 2.2 would require in order to attain this rate.

Algorithm 2 BARE: Bandit revelator
Input
  d: the number of nodes
  n: time horizon
Initialization
  $T_{k,t} \leftarrow 0$ and $\widehat{r^\leftarrow}_{k,t} \leftarrow 0$, for all $k \le d$
  $t \leftarrow 1$, $\hat{T}_\star \leftarrow 0$, $\hat{D}_{\star,t} \leftarrow d$, $\hat{\sigma}_{\star,1} \leftarrow d$
Global exploration phase
while $t\left(\hat{\sigma}_{\star,t} - 4\sqrt{d\log(dn)/t}\right) \le \sqrt{\hat{D}_{\star,t}\, n}$ do
  Influence a node at random (choose $k_t$ uniformly at random) and get $S_{k_t,t}$ from this node
  $\widehat{r^\leftarrow}_{k,t+1} \leftarrow \frac{t}{t+1}\,\widehat{r^\leftarrow}_{k,t} + \frac{d}{t+1}\, S_{k_t,t}(k)$, for all $k \le d$
  $\hat{\sigma}_{\star,t+1} \leftarrow \max_{k'} \sqrt{\widehat{r^\leftarrow}_{k',t+1} + 8d\log(nd)/(t+1)}$
  $w_{\star,t+1} \leftarrow 8\,\hat{\sigma}_{\star,t+1}\sqrt{\frac{d\log(nd)}{t+1}} + \frac{24\,d\log(nd)}{t+1}$
  $\hat{D}_{\star,t+1} \leftarrow \left\{k : \max_{k'}\widehat{r^\leftarrow}_{k',t+1} - \widehat{r^\leftarrow}_{k,t+1} \le w_{\star,t+1}\right\}$
  $t \leftarrow t+1$
end while
$\hat{T}_\star \leftarrow t$
Bandit phase
  Run a minimax-optimal bandit algorithm on the $\hat{D}_{\star,\hat{T}_\star}$ chosen nodes (e.g., Algorithm 1)
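As an illustration, a minimal Python sketch of the global exploration phase of Algorithm 2 follows. It assumes the environment is available as a function play(k) returning the 0/1 indicator vector $S_{k,t}$ (a hypothetical helper for this sketch), it follows the constants of the pseudocode above, and it only indicates the subsequent bandit phase.

import numpy as np

def bare_global_exploration(play, d, n, rng):
    # Global exploration phase of Algorithm 2 (BARE), sketched.
    # `play(k)` is assumed to return the 0/1 indicator vector S_{k,t} of length d.
    r_dual_hat = np.zeros(d)          # estimates of the dual influences r^<-_k
    sigma_hat = float(d)              # conservative initial value of sigma_hat_star
    D_hat = np.arange(d)              # all d nodes survive initially
    t = 1
    while t * (sigma_hat - 4 * np.sqrt(d * np.log(d * n) / t)) <= np.sqrt(len(D_hat) * n):
        k_t = rng.integers(d)                         # influence a node chosen uniformly at random
        S = np.asarray(play(k_t))                     # revealed set of influenced nodes
        r_dual_hat = t / (t + 1) * r_dual_hat + d / (t + 1) * S
        sigma_hat = np.sqrt(np.max(r_dual_hat) + 8 * d * np.log(n * d) / (t + 1))
        w = 8 * sigma_hat * np.sqrt(d * np.log(n * d) / (t + 1)) + 24 * d * np.log(n * d) / (t + 1)
        D_hat = np.flatnonzero(np.max(r_dual_hat) - r_dual_hat <= w)
        t += 1
        if t >= n:                                    # safety stop so the phase stays within the budget
            break
    T_hat = t                                         # estimate of the detectable horizon
    return D_hat, T_hat                               # bandit phase: run Algorithm 1 on the nodes in D_hat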

Corollary 1. For an undirected social network model, the expected regret of BARE is

$$\mathbb{E}[R_n] \le C \min\left(r_\star n,\; D_\star r_\star + \sqrt{r_\star n D_\star}\right),$$

which is the minimax-optimal regret in the case where there are $D_\star$ instead of d nodes. This highlights the dimensionality-reduction potential of our method.

Finally, we state a lower bound for our setting. Notice that the influential-influenced gap also appears here.

Theorem 3 (proof in Appendix ??). Let $d \ge Cn > 0$, where C > 0 is a universal constant. Consider the set of unrestricted local influence settings with information about the neighbors, and the set of all problems that have maximal influence bounded by r, detectable dimension smaller than $D \le d/2$, and influential-influenced gap smaller than $\varepsilon$. Then the expected regret of the best possible algorithm in the worst case of these problems is lower bounded as

$$C'' \min\left(rn,\; Dr + \sqrt{rnD} + n\varepsilon\right),$$

where C'' is a universal constant.

4 Proof of Theorem 2

For any node $k \le d$ and any round t that is during the global exploration phase, let us define the following


$\gamma > 0$ such that for n large enough, depending on $\varepsilon$, we have that

$$\inf \sup \mathbb{E}[R_n] \ge \gamma \min\left(r_\star n,\; r_\star d + \sqrt{r_\star n d}\right),$$

where inf sup means the best possible algorithm on the worst possible graph bandit problem.

• Upper bound. There exists a constant U > 0 such that the regret of Algorithm 1 is bounded as

$$\mathbb{E}[R_n] \le U \min\left(r_\star n,\; r_\star d + \sqrt{r_\star n d}\right).$$

Algorithm 1 GraphMOSS
Input
  d: the number of nodes
  n: time horizon
Initialization
  Sample each arm twice
  Update $\hat{r}_{k,2d}$, $\hat{\sigma}_{k,2d}$, and $T_{k,2d} \leftarrow 2$, for all $k \le d$
for $t = 2d+1, \ldots, n$ do
  $C_{k,t} \leftarrow 2\hat{\sigma}_{k,t}\sqrt{\frac{\max(\log(n/(d\,T_{k,t})),\,0)}{T_{k,t}}} + \frac{2\max(\log(n/(d\,T_{k,t})),\,0)}{T_{k,t}}$, for all $k \le d$
  $k_t \leftarrow \arg\max_k\, \hat{r}_{k,t} + C_{k,t}$
  Sample node $k_t$ and receive $|S_{k_t,t}|$
  Update $\hat{r}_{k,t+1}$, $\hat{\sigma}_{k,t+1}$, and $T_{k,t+1}$, for all $k \le d$
end for
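A minimal Python sketch of Algorithm 1 follows, assuming only the count feedback $|S_{k_t,t}|$ is available through a hypothetical helper pull(k), that n ≥ 2d, and that the empirical standard deviations $\hat{\sigma}_{k,t}$ are the sample standard deviations of the observed counts (one plausible reading of the update step).

import numpy as np

def graph_moss(pull, d, n):
    # Sketch of Algorithm 1 (GraphMOSS): MOSS-style index built from count feedback.
    # `pull(k)` is assumed to return the observed count |S_{k,t}| for node k; assumes n >= 2d.
    observations = [[] for _ in range(d)]
    for k in range(d):                             # initialization: sample each arm twice
        observations[k] += [pull(k), pull(k)]
    for t in range(2 * d + 1, n + 1):
        T = np.array([len(obs) for obs in observations], dtype=float)
        r_hat = np.array([np.mean(obs) for obs in observations])
        sigma_hat = np.array([np.std(obs) for obs in observations])
        log_term = np.maximum(np.log(n / (d * T)), 0.0)
        C = 2 * sigma_hat * np.sqrt(log_term / T) + 2 * log_term / T
        k_t = int(np.argmax(r_hat + C))            # optimistic index
        observations[k_t].append(pull(k_t))        # sample node k_t, receive |S_{k_t,t}|
    return sum(sum(obs) for obs in observations)   # total number of influences collected, L_n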

The lower bound holds also in the specific case where the graph G is undirected (i.e., symmetric M), as is explained in the proof. This is an important remark, as undirected graphs are a canonical and “perfect” example of graphs where influencing and being influenced are correlated and where the dual influence is equal to the influence for each node.

3 The BARE algorithm and results

In this section we treat the unrestricted setting described in Section 2.1, where we get revealed the identity of the influenced nodes, while the reward stays the same as in Section 2.2. First, note that the minimax-optimal rate in this setting is the same as in the restricted-information case above. To see that, one can, for instance, consider a network composed of isolated nodes with only a very small clique of most influential nodes, connected only to each other. Another example is a graph where the fact of being influential is uncorrelated with the fact of being influenced and where, for instance, the most influential node is not influenced by any node. For the same reasons as the ones described in Theorem 1, when n ≤ d, there is no adaptive strategy in a minimax sense, also in this unrestricted setting.

However, the cases where the identity of the influenced nodes does not help are somewhat pathological. Intuitively, they correspond to cases where the graph structure is not very informative for finding the most influential node. This is the case when there are many isolated nodes, and also in the case where observing nodes that are very influenced does not provide information on these nodes' influence. In many typical and more interesting situations, this is not the case. First, in these problems, the nodes that have high influence are also very likely to be subject to being influenced; for instance, many interesting networks are symmetric and then it is immediately the case. Second, in realistic graphs, there is typically a small portion of the nodes that are noticeably more connected than the others (Barabási & Albert, 1999).

In order to rigorously define these non-degenerate cases, let us first define the function D that controls the number of nodes with a given dual gap, i.e., a given suboptimality with respect to the most influenced node,

$$D(\Delta) \stackrel{\text{def}}{=} \left|\left\{i \le d : r^\leftarrow_\star - r^\leftarrow_i \le \Delta\right\}\right|.$$

The function $D(\Delta)$ is a non-decreasing quantity dual to the arm gaps. Note that $D(\Delta) = d$ for any $\Delta \ge r^\leftarrow_\star$ and that D(0) is the number of most influenced nodes. We now define the problem-dependent quantities that express the difficulty of the problem and allow us to state our results.

Definition 1. We define the detectable horizon as the smallest integer $T_\star > 0$ such that

$$T_\star\, r^\leftarrow_\star \ge \sqrt{D_\star\, n\, r^\leftarrow_\star},$$

when such a $T_\star$ exists, and $T_\star = n$ otherwise. Here, $D_\star$ is the detectable dimension defined as

$$D_\star \stackrel{\text{def}}{=} D(\Delta_\star),$$

where the detectable gap $\Delta_\star$ is defined as

$$\Delta_\star \stackrel{\text{def}}{=} 16\sqrt{\frac{r^\leftarrow_\star\, d \log(nd)}{T_\star}} + \frac{80\, d\log(nd)}{T_\star}.$$

Remark 1. From the definitions above, the detectable dimension is the $D_\star$ that corresponds to the smallest integer $T_\star > 0$ such that

$$T_\star\, r^\leftarrow_\star \ge \sqrt{D\!\left(16\sqrt{\frac{r^\leftarrow_\star\, d\log(nd)}{T_\star}} + \frac{80\, d\log(nd)}{T_\star}\right) n\, r^\leftarrow_\star},$$

or $D_\star = d$ if such a $T_\star$ does not exist. It is therefore a well-defined quantity. Moreover, since D is non-decreasing and D(0) is the number of most influenced
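To make Definition 1 and Remark 1 concrete, here is a minimal oracle sketch in Python that evaluates $D(\Delta)$ and searches for the smallest $T_\star$; it assumes access to the true influence matrix M (for illustration only, e.g., to trace $D_\star$ as a function of n), and detectable_quantities is a hypothetical helper for this sketch.

import numpy as np

def detectable_quantities(M, n):
    # Oracle evaluation of Definition 1 from a known influence matrix M and budget n.
    d = M.shape[0]
    r_dual = M.sum(axis=0)                    # dual influences r^<-_k = sum_j p_{j,k}
    r_dual_star = r_dual.max()                # dual influence of the most influenced node

    def D(delta):                             # D(Delta) = |{i <= d : r^<-_star - r^<-_i <= Delta}|
        return int(np.sum(r_dual_star - r_dual <= delta))

    for T in range(1, n + 1):                 # smallest integer T_star of Remark 1
        delta = 16 * np.sqrt(r_dual_star * d * np.log(n * d) / T) + 80 * d * np.log(n * d) / T
        if T * r_dual_star >= np.sqrt(D(delta) * n * r_dual_star):
            return T, D(delta)                # (T_star, D_star)
    return n, d                               # no such T_star: T_star = n and D_star = d

Sweeping n in this sketch traces how $D_\star$ shrinks with the budget, in the spirit of the $D_\star$-versus-n plots for Enron and Barabási-Albert shown later.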

GraphMOSS

- We only receive |S| instead of S
- Can be mapped to multi-arm bandits
  - rewards are 0, …, d
  - variance bounded by $r_{k_t}$
- We adapt MOSS to GraphMOSS
- Regret upper bound of GraphMOSS, with a matching lower bound:

$$\mathbb{E}[R_n] \le U \min\left(r_\star n,\; r_\star d + \sqrt{r_\star n d}\right)$$


Baseline

Detectable dimension

- number of nodes we can efficiently extract in less than n rounds
- the function D controls the number of nodes within a given gap: $D(\Delta) \stackrel{\text{def}}{=} |\{i \le d : r^\leftarrow_\star - r^\leftarrow_i \le \Delta\}|$
- $D(\Delta) = d$ for $\Delta \ge r^\leftarrow_\star$ and D(0) = number of most influenced nodes
- Detectable dimension $D_\star = D(\Delta_\star)$
- Detectable gap $\Delta_\star = 16\sqrt{r^\leftarrow_\star d \log(nd)/T_\star} + 80\, d\log(nd)/T_\star$, with constants coming from the analysis and the Bernstein inequality
- Detectable horizon $T_\star$, the smallest integer such that $T_\star\, r^\leftarrow_\star \ge \sqrt{D_\star\, n\, r^\leftarrow_\star}$
- Equivalently: $D_\star$ corresponds to the smallest $T_\star$ such that $T_\star\, r^\leftarrow_\star \ge \sqrt{D\!\left(16\sqrt{r^\leftarrow_\star d\log(nd)/T_\star} + 80\, d\log(nd)/T_\star\right) n\, r^\leftarrow_\star}$
- For (easy, structured) star graphs $D_\star = 1$ even for small n (big gain)
- For (difficult) empty graphs $D_\star = d$ even for large n (no gain)
- In general: $D_\star$ roughly decreases with n and it is small when D decreases quickly
- For n large enough, $D_\star$ is the number of the most influenced nodes
- Example: $D_\star$ for the Barabási–Albert model and the Enron graph as a function of n


Figure: the detectable dimension $D_\star$ as a function of the number of rounds n, at revelation p = 0.80, for the Enron graph (36692 nodes, 1 run) and the Barabási-Albert model (1000 nodes, 10 runs).


BAndit REvelator: 2-phase algorithm

- global exploration phase
  - super-efficient exploration 😸
  - linear regret 😿 — needs to be short!
  - extracts $D_\star$ nodes
- bandit phase
  - uses a minimax-optimal bandit algorithm
  - GraphMOSS is a little brother of MOSS
  - has a “square root” regret on $D_\star$ nodes
- $D_\star$ realizes the optimal trade-off!
  - different from the exploration/exploitation tradeoff

Our solution

Upper bound on the regret of BARE, with a matching lower bound:

$$\mathbb{E}[R_n] \le C \min\left(r_\star n,\; D_\star r_\star + \sqrt{r_\star n D_\star}\right)$$

Guarantees

d = number of nodes, n = time horizon, $p_{ij}$ = probability that node i influences node j

Motivation

- product placement

- information campaign

- local model of influence and unknown graphs

- information gathered through “likes” or promotional codes

Setting

- Unknown $(p_{ij})_{ij}$ — (symmetric) probabilities of influence
- In each time step t = 1, …, n
  - the learner picks a node $k_t$
  - the environment reveals the set of influenced nodes $S_{k_t,t}$
- Select influential people = find the strategy maximizing $L_n = \sum_{t=1}^{n} |S_{k_t,t}|$


2 Local influence bandit settings

2.1 Description of the problem

Let G be a graph with d nodes. When a node i is selected, it can influence the nodes of G, including itself. Node i influences each node j with a fixed but unknown probability $p_{i,j}$. Let $M = (p_{i,j})_{i,j}$ be the d × d matrix that represents G.

We consider the following online, active setting. At each round (time) t, the learner chooses a node $k_t$ and observes which nodes are influenced by $k_t$, i.e., the set $S_{k_t,t}$ of influenced nodes is revealed. Let us also write $S_{k_t,t}(r)$ for the r-th coordinate of $S_{k_t,t}$, i.e., it is 1 if $k_t$ influences r at time t and 0 otherwise. Given a budget of n rounds, the objective is to maximize the number of influences that the selected node exerts. Formally, our goal is to find the strategy maximizing the performance

$$L_n = \sum_{t=1}^{n} |S_{k_t,t}|.$$

The influence of node k, i.e., the expected number of nodes that node k exerts influence on, is by definition

$$r_k = \mathbb{E}\left[|S_{k,t}|\right] = \sum_{j \le d} p_{k,j}.$$

We also define the dual influence of node k as

$$r^\leftarrow_k = \sum_{j \le d} p_{j,k}.$$

This quantity is the expected number of nodes that exert influence on node k. For an undirected graph G, M is symmetric and $r^\leftarrow_k = r_k$. However, in general, this is not the case, but we assume that the influence is, up to a certain degree, mutual. In other words, we assume that if a node is very influential, it is also subject to the influence of many other nodes. We make this precise in Section 3.

As the performance measure, we compare any adaptive strategy for this setting with the optimal oracle that knows M. The oracle strategy always chooses one of the most influential nodes, which are the nodes whose expected number of influences $r_k$ is maximal. We call one of these nodes $k_\star$, such that

$$k_\star = \arg\max_k \mathbb{E}\left[\sum_{t=1}^{n} |S_{k,t}|\right] = \arg\max_k\, n r_k.$$

Let the reward of this node be $r_\star = r_{k_\star}$. Then, its expected performance, if it consistently sampled $k_\star$ over n rounds, is equal to

$$\mathbb{E}[L^\star_n] = n r_\star.$$

The expected regret of any adaptive strategy that is unaware of M, with respect to the oracle strategy, is defined as the expected difference of the two,

$$\mathbb{E}[R_n] = \mathbb{E}[L^\star_n] - \mathbb{E}[L_n].$$

Dually, we define $r^\leftarrow_\star$ as the average number of influences received by the most influenced node,

$$r^\leftarrow_\star = \max_k r^\leftarrow_k.$$
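For concreteness, a minimal Python sketch of how these quantities relate, assuming the matrix M were known (which the learner of course is not); problem_quantities is a hypothetical helper introduced only for this illustration.

import numpy as np

def problem_quantities(M, n):
    # Influences, dual influences, oracle value, and per-node expected regret, given M.
    r = M.sum(axis=1)                        # r_k = sum_j p_{k,j}, expected influences of node k
    r_dual = M.sum(axis=0)                   # r^<-_k = sum_j p_{j,k}, dual influence of node k
    k_star = int(np.argmax(r))               # one of the most influential nodes
    r_star = r[k_star]
    oracle_value = n * r_star                # E[L_n^star] = n r_star
    regret_if_always_k = n * (r_star - r)    # expected regret of playing a fixed node k for all n rounds
    return r, r_dual, k_star, r_star, oracle_value, regret_if_always_k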

2.2 Baseline comparison: observing only $|S_{k_t}|$, the number of influenced nodes

For a meaningful baseline comparison that shows the benefit of the graph structure, we first consider a restricted version of the setting from Section 2.1. The restriction is that the learner, at round t, does not observe the set of influenced nodes $S_{k_t,t}$, but only the number of elements in $S_{k_t,t}$, denoted by $|S_{k_t,t}|$. In other words, once we select a node, we receive as feedback only the number of influenced nodes, but not their identity. In this setting, we do not observe enough information about the graph structure to exploit it, since we do not observe the links between the nodes. As a result, this setting can be mapped to a classic multi-arm bandit setting without an underlying graph structure, where the reward that the learner observes for node $k_t$ is equal to $|S_{k_t,t}|$.

If n ≥ d, it is possible to directly apply classic multi-arm bandit reasoning. Since we never receive any information about the graph structure, we cannot exploit it and we can only consider the quantity $|S_{k_t,t}|$ as the standard bandit reward, which is a noisy version of $r_{k_t}$. Such a problem is a standard bandit problem with rewards $|S_{k_t,t}|$ that are integers between 0 and d and have a variance bounded by $r_{k_t}$.
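To see why the variance is bounded by $r_{k_t}$, note that, assuming the influence indicators are independent across the nodes j (the natural reading of the model in Section 2.1), $|S_{k,t}|$ is a sum of independent Bernoulli variables, so that

$$\operatorname{Var}\left(|S_{k,t}|\right) = \sum_{j \le d} p_{k,j}\,(1 - p_{k,j}) \le \sum_{j \le d} p_{k,j} = r_k.$$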

Directly building on upper and lower bound arguments for the classic bandit strategies (Lai & Robbins, 1985; Audibert & Bubeck, 2009), we give the following result. This result's upper bound holds for a specific bandit algorithm that we call GraphMOSS, a slight adaptation of the MOSS algorithm by Audibert & Bubeck (2009) to our specific setting.

Theorem 1 (proof in Appendix A). In the graph bandit problem from Section 2.2, with the reward equal to the number of influenced nodes $|S_{k_t,t}|$ instead of $S_{k_t,t}$, the regret is bounded as follows.

• Lower bound. If for some fixed $\varepsilon > 0$ we have $\varepsilon d < r_\star < (1-\varepsilon)d$, then there exists a constant

Definitions

- The number of expected influences of node k is by definition $r_k = \mathbb{E}[|S_{k,t}|] = \sum_{j \le d} p_{k,j}$
- The oracle strategy always selects the best node: $k_\star = \arg\max_k \mathbb{E}\left[\sum_{t=1}^{n} |S_{k,t}|\right] = \arg\max_k\, n r_k$
- Expected performance of the oracle strategy: $\mathbb{E}[L^\star_n] = n r_\star$
- Expected regret of any adaptive strategy unaware of $(p_{ij})_{ij}$: $\mathbb{E}[R_n] = \mathbb{E}[L^\star_n] - \mathbb{E}[L_n]$

Alexandra Carpentier, Michal Valko

2 Local influence bandit settings

2.1 Description of the problem

Let G be a graph with d nodes. When a node i is se-lected, it can influence the nodes of G, including itself.Node i influences each node j with fixed but unknownprobability pi,j . Let M = (pi,j)i,j be the d⇥ d matrixthat represents G.

We consider the following online, active setting. Ateach round (time) t, the learner chooses a node kt andobserves which nodes are influenced by kt, i.e., theset Skt,t of influenced nodes is revealed. Let us alsowrite Skt,t(r) for the rth coordinate of Skt,t, i.e., itis 1 if kt influences r at time t and 0 otherwise. Givena budget of n rounds, the objective is to maximize thenumber of influences that the selected node exerts.Formally, our goal is to find the strategy maximizingthe performance

Ln =nX

t=1

|Skt,t| .

The influence of node k, i.e., the expected number ofnodes that node k exerts influence on, is by definition

rk = E [|Sk,t|] =X

jd

pk,j .

We also define the dual influence of node k as

r�k =X

jd

pj,k.

This quantity is the expected number of nodes thatexert influence on node k. For an undirected graph G,M is symmetric and r�k = rk. However, in general, thisis not the case, but we assume that the influence is upto a certain degree mutual. In other words, we assumethat if a node is very influential, it also is subject to theinfluence of many other nodes. We make this precisein Section 3.

As the performance measure, we compare any adaptivestrategy for this setting with the optimal oracle thatknows M. The oracle strategy always chooses one ofthe most influential nodes, which are the nodes whoseexpected number of influences rk is maximal. We callone of these node k?, such that

k? = argmaxk

E"

nX

t=1

|Sk,t|#

= argmaxk

nrk.

Let the reward of this node be

r? = rk? .

Then, its expected performance, if it consistently sam-pled k? over n rounds, is equal to

E [L?n] = nr?.

The expected regret of any adaptive strategy that isunaware of M, with respect to the oracle strategy, isdefined as the expected di↵erence of the two,

E [Rn] = E [L?n]� E [Ln] .

Dually, we define r�? as the average number of influ-ences received by the most influenced node,

r�? = maxk

r�k.

2.2 Baseline comparison: Observing only|Skt |, the number of influenced nodes

For a meaningful baseline comparison that shows thebenefit of the graph structure, we first consider a re-stricted version of the setting from Section 2.1. Therestriction is that the learner, at round t, does notobserve the set of influenced nodes Skt,t, but only thenumber number of elements in Skt,t, denoted by |Skt,t|.In other words, once we select a node, we receive asa feedback only the number of influenced nodes, butnot their identity. In this setting, we do not observeenough information about the graph structure to ex-ploit it, since we do not observe the links between thenodes. As a result, this setting can be mapped toa classic multi-arm bandit setting without underlyinggraph structure, where the reward that the learner ob-serves for node kt is equal to |Skt,t|.

If n � d, it is possible to directly apply classic multi-arm bandit reasoning. Since we never receive any in-formation about the graph structure, we cannot ex-ploit it and we can only consider the quantity |Skt,t|as the standard bandit reward, which is a noisy ver-sion of rkt . Such problem is a standard bandit problemwith rewards |Skt,t|, that are integers between 0 and dand have a variance bounded by rkt .

Directly building on upper and lower bounds argu-ments for the classic bandit strategies (Lai & Rob-bins, 1985; Audibert & Bubeck, 2009), we give thefollowing result. This result’s upper bound holds fora specific bandit algorithm that we call GraphMOSS, aslight adaptation of the MOSS algorithm by Audibert& Bubeck (2009) to our specific setting.

Theorem 1 (proof in Appendix A). In the graph ban-dit problem from Section 2.2, with the reward equal tothe number of influenced nodes |Skt,t| instead of Skt,t,the regret is bounded as follows.

• Lower bound. If for some fixed " > 0, we have"d < r? < (1 � ")d, then there exists a constant

Alexandra Carpentier, Michal Valko

2 Local influence bandit settings

2.1 Description of the problem

Let G be a graph with d nodes. When a node i is se-lected, it can influence the nodes of G, including itself.Node i influences each node j with fixed but unknownprobability pi,j . Let M = (pi,j)i,j be the d⇥ d matrixthat represents G.

We consider the following online, active setting. Ateach round (time) t, the learner chooses a node kt andobserves which nodes are influenced by kt, i.e., theset Skt,t of influenced nodes is revealed. Let us alsowrite Skt,t(r) for the rth coordinate of Skt,t, i.e., itis 1 if kt influences r at time t and 0 otherwise. Givena budget of n rounds, the objective is to maximize thenumber of influences that the selected node exerts.Formally, our goal is to find the strategy maximizingthe performance

Ln =nX

t=1

|Skt,t| .

The influence of node k, i.e., the expected number ofnodes that node k exerts influence on, is by definition

rk = E [|Sk,t|] =X

jd

pk,j .

We also define the dual influence of node k as

r�k =X

jd

pj,k.

This quantity is the expected number of nodes thatexert influence on node k. For an undirected graph G,M is symmetric and r�k = rk. However, in general, thisis not the case, but we assume that the influence is upto a certain degree mutual. In other words, we assumethat if a node is very influential, it also is subject to theinfluence of many other nodes. We make this precisein Section 3.

As the performance measure, we compare any adaptivestrategy for this setting with the optimal oracle thatknows M. The oracle strategy always chooses one ofthe most influential nodes, which are the nodes whoseexpected number of influences rk is maximal. We callone of these node k?, such that

k? = argmaxk

E"

nX

t=1

|Sk,t|#

= argmaxk

nrk.

Let the reward of this node be

r? = rk? .

Then, its expected performance, if it consistently sam-pled k? over n rounds, is equal to

E [L?n] = nr?.

The expected regret of any adaptive strategy that isunaware of M, with respect to the oracle strategy, isdefined as the expected di↵erence of the two,

E [Rn] = E [L?n]� E [Ln] .

Dually, we define $r^{-}_\star$ as the average number of influences received by the most influenced node,

r^{-}_\star = \max_k \, r^{-}_k .
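To keep the protocol concrete, the following sketch (an illustration of ours, not part of the paper) simulates the feedback model under the additional assumption that, once node $k_t$ is selected, the nodes are influenced independently of one another; the class and function names are purely illustrative.

import numpy as np

class InfluenceEnvironment:
    """Minimal sketch of the local-influence feedback model.

    M[i, j] is the probability (unknown to the learner) that selecting
    node i influences node j at a given round.
    """

    def __init__(self, M, seed=0):
        self.M = np.asarray(M, dtype=float)    # d x d matrix of p_{i,j}
        self.d = self.M.shape[0]
        self.rng = np.random.default_rng(seed)

    def influence(self, k):
        """Select node k and return the revealed set S_{k,t} as a 0/1 vector."""
        return (self.rng.random(self.d) < self.M[k]).astype(int)

    def r(self, k):
        """Expected influence r_k = sum_j p_{k,j} (known only to the oracle)."""
        return self.M[k].sum()

def run_uniform_baseline(env, n):
    """Select nodes uniformly at random for n rounds.

    Returns the realized performance L_n and the regret n * r_star - L_n
    measured against the oracle's expected performance.
    """
    r_star = max(env.r(k) for k in range(env.d))
    L_n = 0
    for t in range(n):
        k_t = int(env.rng.integers(env.d))
        S = env.influence(k_t)       # revealed set of influenced nodes
        L_n += S.sum()               # the performance counts |S_{k_t,t}|
    return L_n, n * r_star - L_n

Passing any $d \times d$ matrix of probabilities for M reproduces the protocol above; the learner only ever sees the return value of influence(k).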

2.2 Baseline comparison: Observing only $|S_{k_t}|$, the number of influenced nodes

For a meaningful baseline comparison that shows the benefit of the graph structure, we first consider a restricted version of the setting from Section 2.1. The restriction is that the learner, at round t, does not observe the set of influenced nodes $S_{k_t,t}$, but only the number of elements in $S_{k_t,t}$, denoted by $|S_{k_t,t}|$. In other words, once we select a node, we receive as feedback only the number of influenced nodes, but not their identity. In this setting, we do not observe enough information about the graph structure to exploit it, since we do not observe the links between the nodes. As a result, this setting can be mapped to a classic multi-arm bandit setting without an underlying graph structure, where the reward that the learner observes for node $k_t$ is equal to $|S_{k_t,t}|$.

If $n \ge d$, it is possible to directly apply classic multi-arm bandit reasoning. Since we never receive any information about the graph structure, we cannot exploit it and can only treat the quantity $|S_{k_t,t}|$ as the standard bandit reward, which is a noisy version of $r_{k_t}$. This is a standard bandit problem with rewards $|S_{k_t,t}|$ that are integers between 0 and d and have a variance bounded by $r_{k_t}$.
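The variance bound is immediate if, as in the sketch above, the per-node influences are assumed independent given the selected node (a reading we adopt here for the derivation):

\operatorname{Var}\!\left( |S_{k,t}| \right)
  = \sum_{j \le d} \operatorname{Var}\!\big( S_{k,t}(j) \big)
  = \sum_{j \le d} p_{k,j}\,(1 - p_{k,j})
  \;\le\; \sum_{j \le d} p_{k,j}
  \;=\; r_k .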

Directly building on upper and lower bound arguments for classic bandit strategies (Lai & Robbins, 1985; Audibert & Bubeck, 2009), we give the following result. Its upper bound holds for a specific bandit algorithm that we call GraphMOSS, a slight adaptation of the MOSS algorithm of Audibert & Bubeck (2009) to our specific setting.
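For intuition only, the sketch below shows what a MOSS-type index looks like once the rewards live in $[0, d]$ rather than $[0, 1]$: it reuses the InfluenceEnvironment sketch above and simply rescales the standard MOSS exploration term by d. It is our simplification, not the exact GraphMOSS algorithm analyzed in the paper.

import numpy as np

def moss_style_baseline(env, n):
    """MOSS-type index strategy on the scalar rewards |S_{k_t,t}| in [0, d].

    Illustrative simplification:
    index_k = empirical_mean_k + d * sqrt(max(log(n / (d * T_k)), 0) / T_k).
    """
    d = env.d
    T = np.zeros(d)          # number of times each node was selected
    means = np.zeros(d)      # empirical mean of |S_{k,t}| for each node

    for t in range(n):
        if t < d:
            k = t            # initialization: select every node once
        else:
            bonus = d * np.sqrt(np.maximum(np.log(n / (d * T)), 0.0) / T)
            k = int(np.argmax(means + bonus))
        reward = env.influence(k).sum()          # only |S_{k,t}| is used
        T[k] += 1
        means[k] += (reward - means[k]) / T[k]   # incremental mean update
    return means, T

Scaling the exploration width by d is the crudest way to account for the reward range; a variance-aware width exploiting the bound by $r_{k_t}$ would be tighter, but we keep the sketch minimal.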

Theorem 1 (proof in Appendix A). In the graph bandit problem from Section 2.2, with the reward equal to the number of influenced nodes $|S_{k_t,t}|$ instead of $S_{k_t,t}$, the regret is bounded as follows.

• Lower bound. If for some fixed $\varepsilon > 0$ we have $\varepsilon d < r^\star < (1 - \varepsilon)\, d$, then there exists a constant
