
k-median: exact recovery in the extended stochastic ball model∗

Alberto Del Pia † Mingchen Ma ‡

September 3, 2021

Abstract

We study exact recovery conditions for the linear programming relaxation of the k-median problem in the stochastic ball model (SBM). In Awasthi et al. (2015), the authors give a tight result for the k-median LP in the SBM, saying that exact recovery can be achieved as long as the balls are pairwise disjoint. We give a counterexample to their result, thereby showing that the k-median LP is not tight in low dimension. Instead, we give a near optimal result showing that the k-median LP in the SBM is tight in high dimension. We also show that, if the probability measure satisfies some concentration assumptions, then the k-median LP in the SBM is tight in every dimension. Furthermore, we propose a new model of data called the extended stochastic ball model (ESBM), which significantly generalizes the well-known SBM. We then show that exact recovery can still be achieved in the ESBM.

1 Introduction

Clustering problems form a fundamental class of problems in data science with a wide range of applications in computational biology, social science, and engineering. Although clustering problems are often NP-hard in general, recent results in the literature show that we may be able to solve these problems efficiently if the data exhibits a good structure. More specifically, we may be able to solve these problems in polynomial time if the problem data is generated according to some reasonable model of data. These models of data are defined in such a way that there is a ground-truth that reveals which cluster a data point comes from. In this way, for each instance of the clustering problem generated according to such a model of data, it is clear which optimal solution our algorithm should return. If the algorithm returns the correct solution, we say that the algorithm “achieves exact recovery”. Examples of models of data include the stochastic block model and the stochastic ball model.

One of the most successful types of algorithms to achieve exact recovery in polynomial time are convex relaxation techniques, including linear programming (LP) relaxations and semidefinite programming (SDP) relaxations. When these algorithms achieve exact recovery, the optimal solution to the convex relaxation is an integer vector which is also the optimal solution to the underlying integer linear programming problem which models the clustering problem. Some recent LP relaxations

∗This work is supported by ONR grant N00014-19-1-2322. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Office of Naval Research.

†Department of Industrial and Systems Engineering & Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, USA. E-mail: [email protected].

‡Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, USA. E-mail: [email protected].


that achieve exact recovery are [8, 15, 16, 29], while SDP relaxations that achieve exact recovery include [1–5, 8, 18–20, 23, 25].

In this paper we study the k-median problem, which is one of the most well-known and studied clustering problems. We are given a set P of n different points in a metric space (X, d) and a positive integer k ≤ n, and our goal is to partition these n points into k different sets A_1, A_2, . . . , A_k, also known as clusters. Each cluster A_i has a center a_i ∈ P, which satisfies ∑_{p∈A_i} d(a_i, p) = min{∑_{p∈A_i} d(q, p) | q ∈ A_i}, and each point in P is assigned to the cluster with the closest center. Formally, the k-median problem is defined as the following optimization problem:

    min ∑_{p∈P} min_{i∈[k]} d(p, a_i)
    s.t. a_1, . . . , a_k ∈ P.

The k-median problem is NP-hard even in some very restrictive settings, like the Euclidean k-median problem on the plane [26], and only a few very special cases of the k-median problem are known to be solvable in polynomial time, like the k-median problem on trees [21, 30]. Several papers study approximation algorithms for the k-median problem, including [6, 7, 12, 13, 22, 24].

The model of data that we consider in this paper, and that is arguably the one used the most in the study of the k-median problem, is the stochastic ball model (SBM), formally introduced in Definition 2. In the SBM, we consider k probability measures, each one supported on a unit ball in R^m, and n data points are sampled from each of them. In this paper we study the effectiveness of the LP relaxation in achieving exact recovery. The main goal of this paper is then to seek the minimum pairwise distance ∆ between the ball centers which is needed for the LP relaxation to achieve exact recovery with high probability when the number of input data points n is large enough. To the best of our knowledge, the only known result in this direction is Theorem 7 in [8] (or Theorem 6 in the conference version of the paper [9]). Unfortunately, as we discuss later, this result is false.

The SBM has also been used as a model of data for other closely related clustering problems, including k-means and k-medoids clustering. In Table 1 we summarize the known exact recovery results for clustering problems in the SBM, including some of our results that we will discuss later. For more details about the results in the table, including the additional assumptions required, we refer the reader to the corresponding cited paper. We remark that the problem considered in [29] differs from the k-median defined in this paper because in the objective function the sum of the squared distances is considered.

1.1 Our contribution

In [8], the authors study the k-median problem in the SBM. In the model of data considered in the paper, there are k unit balls and n points are sampled from each ball. The probability measures, supported on each ball, are translations of each other. Moreover, each probability measure is invariant under rotations centered in the ball center and every neighborhood of each ball center has positive probability measure. In Theorem 7 in [8], the authors claim that, if the unit balls are pairwise disjoint, then the LP relaxation of the k-median problem achieves exact recovery with high probability. Unfortunately, this result is false. In Example 2 in Appendix B, we present an example in R^2 where the balls are pairwise disjoint and the probability measures satisfy the assumptions of Theorem 7 in [8], but when n is large enough, with high probability the LP relaxation does not achieve exact recovery. Our example implies that to achieve exact recovery, a significant distance between the ball centers is needed. In Appendix C we also point out the key problem in the proof


Problem              Method        Sufficient Condition             Reference
k-means / k-median   Thresholding  ∆ > 4                            Simple Algorithm
k-means              SDP           ∆ > 2√2 (1 + 1/√m)               Theorem 3 in [8]
k-means              SDP           ∆ > 2 + k²/m                     Theorem 9 in [20]
k-means              SDP           ∆ > 2 + O(√(k/m))                Corollary 2 in [23]
k-means              SDP           ∆ > O(√(log n/m))                Corollary of Theorem 3 in [18]
k-means              LP            ∆ > 4                            Theorem 9 in [8]
k-means              LP            ∆ > 1 + √3                       Theorem 4 in [15]
k-median             LP            ∆ > 3.75                         Theorem 6 in [29]
k-median             LP            ∆ > 2                            Theorem 7 in [8]
k-median             LP            ∆ > 3.29                         Theorem 6
k-median             LP            ∆ > 2 + O(√(k log m/m))          Theorem 7

Table 1: Exact recovery results for clustering problems in the SBM.

of Theorem 7 in [8]. Furthermore, we notice that the techniques used in [8] highly depend on the assumptions that we draw the same number of points from each ball, and that the balls have the same radius and the same probability measure. These observations naturally lead to two questions, which are at the heart of this paper.

Question 1. What is the minimum pairwise distance ∆ between the ball centers which guarantees that the k-median LP relaxation in the SBM achieves exact recovery with high probability?

Question 2. If we relax some of the assumptions in the model of data, will exact recovery still happen for the k-median LP relaxation?

In this paper, we provide the first answers to Question 1 and Question 2. We propose a more general version of the SBM called the ESBM, which is a natural model for Question 2 and is formally defined in Definition 3. In the ESBM, the number of points drawn from each ball can be different, and the balls can have different radii and different probability measures. We study exact recovery for the k-median problem in the ESBM. Informally, we obtain the following results, where we denote by c_i the center and by r_i the radius of ball i.

• Theorem 5: In the ESBM, if for every i ≠ j we have d(c_i, c_j) > (1 + β)R + max{r_i, r_j} + O(√(k log m/m)), then the k-median LP achieves exact recovery with high probability. Here, R := max_{i∈[k]} r_i and β is a parameter that measures the difference between the numbers of points sampled from the balls.

• Theorem 6: In the SBM, if ∆ > 3.29, then the k-median LP achieves exact recovery with high probability.

• Theorem 7: In the SBM, if ∆ > 2 + O(√(k log m/m)), then the k-median LP achieves exact recovery with high probability.

• Theorem 8: In the SBM, if ∆ > 2 and the density function decreases as we increase the distance from the center, then the k-median LP achieves exact recovery with high probability.

We remark that exact recovery can only be considered when the balls are pairwise disjoint. Moreover, we need to assume that d(c_i, c_j) > 2 max{r_i, r_j} for every i ≠ j, otherwise the ground-truth


solution may not be optimal to the k-median problem. In particular, in the SBM we need to have ∆ > 2.

For the ESBM, Theorem 5 provides sufficient conditions for exact recovery. For the SBM, Theorems 6 and 7 provide the condition ∆ > min{3.29, 2 + O(√(k log m/m))} to guarantee exact recovery. This result implies that the k-median LP is tight in high dimension. Furthermore, Theorem 8 implies that, if we add strong assumptions on the probability measures, then ∆ > 2 also guarantees exact recovery.

The rest of the paper is organized as follows. In Section 2 we introduce the integer programming formulation (IP) of the k-median problem and the corresponding linear programming relaxation (LP). We then provide deterministic necessary and sufficient conditions which guarantee that a feasible solution to (IP) is optimal to (LP) (Theorem 1). In Section 3 we introduce the definitions of the SBM, the ESBM, and exact recovery. In Section 4, we introduce a very general sufficient condition which ensures that exact recovery happens with high probability (Theorem 2). Finally, in Section 5, we present our main theorems for exact recovery (Theorems 3 to 8).

2 The k-median problem via linear programming

The k-median problem can be formulated as an integer linear program as follows.

    min ∑_{p,q∈P} d(p, q) z_{pq}
    s.t. ∑_{p∈P} z_{pq} = 1            ∀q ∈ P
         z_{pq} ≤ y_p                  ∀p, q ∈ P
         ∑_{p∈P} y_p = k
         y_p, z_{pq} ∈ {0, 1}          ∀p, q ∈ P.                  (IP)

Here, y_p = 1 if and only if p is a center, and z_{pq} = 1 if and only if p is the center of q. The first constraint says that each point is assigned to exactly one center. The second constraint says that z_{pq} = 1 can happen only if p is a center. The third constraint says that there are exactly k centers. It is simple to check that an optimal solution to (IP) provides an optimal solution to the k-median problem.

In this paper we consider the linear programming relaxation of (IP) obtained from (IP) by replacing the constraints y_p, z_{pq} ∈ {0, 1} with y_p, z_{pq} ≥ 0. This linear program, which is given below, has been used in other works in the literature, including [13].

    min ∑_{p,q∈P} d(p, q) z_{pq}
    s.t. ∑_{p∈P} z_{pq} = 1            ∀q ∈ P
         z_{pq} ≤ y_p                  ∀p, q ∈ P
         ∑_{p∈P} y_p = k
         y_p, z_{pq} ≥ 0               ∀p, q ∈ P.                  (LP)


The linear program (LP) is called a linear programming relaxation of (IP) because each feasible solution to (IP) is also feasible to (LP). The main advantage of (LP) over (IP) is that the first can be solved in polynomial time, while the second is NP-hard.
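For concreteness, the following Python sketch (ours, not code from the paper) builds the relaxation (LP) for a small point set and solves it with SciPy's linprog; the toy points and the value of k are arbitrary choices made only for illustration.

# A minimal sketch, assuming SciPy is available: build and solve (LP) for a
# toy instance.  Variables are ordered as all z_pq (row-major) followed by y_p.
import numpy as np
from scipy.optimize import linprog

def solve_kmedian_lp(points, k):
    P = np.asarray(points, dtype=float)
    n = len(P)
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)   # d(p, q)
    nz = n * n
    c = np.concatenate([d.flatten(), np.zeros(n)])              # sum d(p,q) z_pq

    # Equalities: sum_p z_pq = 1 for every q, and sum_p y_p = k.
    A_eq = np.zeros((n + 1, nz + n))
    for q in range(n):
        for p in range(n):
            A_eq[q, p * n + q] = 1.0
    A_eq[n, nz:] = 1.0
    b_eq = np.concatenate([np.ones(n), [k]])

    # Inequalities: z_pq - y_p <= 0 for every p, q.
    A_ub = np.zeros((nz, nz + n))
    for p in range(n):
        for q in range(n):
            A_ub[p * n + q, p * n + q] = 1.0
            A_ub[p * n + q, nz + p] = -1.0
    b_ub = np.zeros(nz)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun, res.x[nz:], res.x[:nz].reshape(n, n)

# Two well separated groups in R^2; for such an instance we expect the LP
# optimum to be integral, in line with the recovery results discussed later.
val, y, z = solve_kmedian_lp([(0, 0), (0.1, 0), (0, 0.1),
                              (5, 5), (5.1, 5), (5, 5.1)], k=2)
print(val, np.round(y, 3))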

2.1 Conditions for the integrality of (LP)

Let (y, z) be a feasible solution to (IP). The main goals of this section are twofold. First, we provide necessary and sufficient conditions for (y, z) to be an optimal solution to (LP). Second, we give sufficient conditions for (y, z) to be the unique optimal solution to (LP). In particular, under these sufficient conditions the k-median problem is polynomially solvable.

We start by writing down the dual linear program of (LP). To do so, we associate the dual variables α_q, ∀q ∈ P, to the first block of constraints, the dual variables β_{pq}, ∀p, q ∈ P, to the second block of constraints, and the dual variable ω to the single constraint ∑_{p∈P} y_p = k. We obtain the dual linear program

    max ∑_{q∈P} α_q − kω
    s.t. α_q ≤ β_{pq} + d(p, q)        ∀p, q ∈ P
         ∑_{q∈P} β_{pq} ≤ ω            ∀p ∈ P
         β_{pq} ≥ 0                    ∀p, q ∈ P.                  (DLP)

It is simple to see that (LP) always has a finite optimum, thus by the Strong Duality Theorem, so does (DLP). In particular, (DLP) is always feasible.

Let (y, z) be a feasible solution to (LP), and let (α, β, ω) be a feasible solution to (DLP). The Complementary Slackness Theorem (see, e.g., Theorem 4.5 in [11]) says that the vector (y, z) is optimal to (LP) and (α, β, ω) is optimal to (DLP) if and only if

    β_{pq} (z_{pq} − y_p) = 0                       ∀p, q ∈ P     (1)
    z_{pq} (α_q − β_{pq} − d(p, q)) = 0             ∀p, q ∈ P     (2)
    y_p (∑_{q∈P} β_{pq} − ω) = 0                    ∀p ∈ P.       (3)

Now let (y, z) be a feasible solution to (IP). Clearly, the vector (y, z) is feasible to (LP). Furthermore, let (α, β, ω) be a feasible solution to (DLP). From complementary slackness, the vector (y, z) is optimal to (LP) and (α, β, ω) is optimal to (DLP) if and only if

    β_{pq} = 0                    ∀p, q ∈ P such that y_p = 1, z_{pq} = 0    (4)
    β_{pq} = α_q − d(p, q)        ∀p, q ∈ P such that z_{pq} = 1             (5)
    ∑_{q∈P} β_{pq} = ω            ∀p ∈ P such that y_p = 1.                  (6)

Next, we provide an interpretation of the dual variables. We can interpret α_q as the maximum distance a point q can “see”. We can then interpret β_{pq} as the “contribution” from q to p. The above conditions (4)–(6), together with (DLP) feasibility, can then be interpreted as follows. When q is not assigned to a center p, condition (4) says that q does not contribute to p, and the first constraint in (DLP) implies that q cannot see p. Vice versa, when q is assigned to a center p, condition (5)


and the third constraint in (DLP) imply that q can see p, and that q contributes to p. Hence, a center p is seen exactly by the points in its cluster, which are also the points that contribute to p. Finally, condition (6) says that the centers of the clusters all get the same contribution ω.

In the remainder of the paper, we denote by t_+ the positive part of a number t, i.e., t_+ := max{t, 0}. We obtain the following observation regarding (DLP).

Observation 1. Suppose (α, β, ω) is a feasible solution to (DLP). For each p, q ∈ P, let β′_{pq} := (α_q − d(p, q))_+. Then (α, β′, ω) is a feasible solution to (DLP) with the same objective value.

In particular, Observation 1 implies that there is always an optimal solution to (DLP) where β_{pq} = (α_q − d(p, q))_+. Next, we define the contribution function.

Definition 1 (Contribution function). Given α ∈ R^P, the contribution function C_α(z) : R^m → R is defined by

    C_α(z) := ∑_{q∈P} (α_q − d(z, q))_+.

According to Observation 1, the contribution function can be seen as the contribution that a point p ∈ P gets from all points in P. We are now ready to present our main deterministic result.

Theorem 1. Let (y, z) be a feasible solution to (IP). Let a_i, i ∈ [k], be the k points in P such that y_{a_i} = 1. For every i ∈ [k], let A_i := {q ∈ P | z_{a_i q} = 1}. Then (y, z) is optimal to (LP) if and only if there exists α ∈ R^P such that

    C_α(a_1) = · · · = C_α(a_k)                                    (7)
    C_α(q) ≤ C_α(a_1)             ∀q ∈ P \ {a_i}_{i∈[k]}           (8)
    α_q ≥ d(a_i, q)               ∀i ∈ [k], ∀q ∈ A_i               (9)
    α_q ≤ d(a_i, q)               ∀i ∈ [k], ∀q ∈ P \ A_i.          (10)

Furthermore, if there exists α ∈ R^P such that (7), (9) hold, and (8), (10) are satisfied strictly, then (y, z) is the unique optimal solution to (LP).

Proof. In the first part of the proof we show the ‘if and only if’ in the statement. After that, we will show the ‘uniqueness’.

First, we show the implication from left to right. Assume that (y, z) is an optimal solution to (LP). Then by Strong Duality (DLP) also has an optimal solution, which we denote by (α, β, ω). For each p, q ∈ P, let β′_{pq} := (α_q − d(p, q))_+. According to Observation 1, (α, β′, ω) is also optimal to (DLP). Complementary slackness implies that (y, z) and (α, β′, ω) satisfy the complementary slackness conditions (4)–(6). Note that for every p ∈ P, we have ∑_{q∈P} β′_{pq} = ∑_{q∈P} (α_q − d(p, q))_+ = C_α(p). Constraints (7) are then implied by (6), since C_α(a_i) = ω for every i ∈ [k]. Constraints (8) are implied by (6) and the second constraint in (DLP). Constraints (9) are implied by (5) and the third constraint in (DLP). Finally, constraints (10) are implied by (4) and the first constraint in (DLP).

Next, we show the implication from right to left. Let α ∈ R^P be such that (7)–(10) are satisfied. For every p, q ∈ P, we define β_{pq} := (α_q − d(p, q))_+ and we let ω := C_α(a_1). From (8), we know that (α, β, ω) is feasible to (DLP). We can then check that (y, z) and (α, β, ω) satisfy the complementary slackness conditions (4), (5), and (6) due to (10), (9), and (7), respectively. We conclude that (y, z) is optimal to (LP).


To show the ‘uniqueness’ part of the statement, we continue the previous proof (of the implication from right to left) with the additional assumption that (8), (10) are satisfied strictly.

From complementary slackness we also obtain that (α, β, ω) is an optimal solution to (DLP). Let (y′, z′) be a feasible solution to (LP). Applying complementary slackness to (y′, z′) and (α, β, ω), we obtain that (y′, z′) is an optimal solution to (LP) if and only if these two vectors satisfy conditions (1)–(3). Thus, to prove that (y, z) is the unique optimal solution to (LP), we only need to show that if (y′, z′) and (α, β, ω) satisfy (1)–(3), then (y′, z′) = (y, z).

Since for every p ∈ P \ {a_i}_{i∈[k]}, we have C_α(p) = ∑_{q∈P} β_{pq} < ω, (3) implies that y′_p = 0 for every p ∈ P \ {a_i}_{i∈[k]}. From the primal constraints z′_{pq} ≤ y′_p ∀p, q ∈ P, we obtain z′_{pq} = 0 ∀p ∈ P \ {a_i}_{i∈[k]}, ∀q ∈ P. Since for every i ∈ [k] and for every q ∈ P \ A_i, we have α_q < d(a_i, q), we know from (2) that z′_{a_i q} = 0 for every i ∈ [k] and for every q ∈ P \ A_i. From the primal constraint ∑_{p∈P} z′_{pq} = 1 ∀q ∈ P we then obtain z′_{a_i q} = 1 for every i ∈ [k] and for every q ∈ A_i. The primal constraints z′_{pq} ≤ y′_p ∀p, q ∈ P and ∑_{p∈P} y′_p = k imply y′_{a_i} = 1 for every i ∈ [k]. We have thereby shown (y′, z′) = (y, z).

We remark that deterministic sufficient conditions which guarantee that an integer solution to (IP) is an optimal solution to (LP) have also been presented in [8, 29]. The main difference with respect to these known results is that Theorem 1 provides necessary and sufficient conditions. In this paper, we do not only use Theorem 1 to prove that (LP) can achieve exact recovery, but we also use it to construct examples where (LP) does not achieve exact recovery.
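Conditions (7)–(10) are straightforward to verify numerically for a given instance. The sketch below is our own illustration (not code from the paper): it takes the points, the indices of the candidate centers a_i, a label per point identifying its cluster A_i, and a vector α indexed by the points, and checks (7)–(10) up to a tolerance.

# A sketch (ours) of a numerical check of conditions (7)-(10) in Theorem 1.
import numpy as np

def contribution(z, points, alpha):
    """C_alpha(z) = sum_q (alpha_q - d(z, q))_+ as in Definition 1."""
    return np.maximum(alpha - np.linalg.norm(points - z, axis=1), 0.0).sum()

def check_theorem1(points, centers_idx, labels, alpha, tol=1e-9):
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    alpha = np.asarray(alpha, dtype=float)
    C = np.array([contribution(points[a], points, alpha) for a in centers_idx])
    cond7 = np.ptp(C) <= tol                  # all centers get equal contribution
    others = [q for q in range(len(points)) if q not in centers_idx]
    cond8 = all(contribution(points[q], points, alpha) <= C[0] + tol for q in others)
    cond9, cond10 = True, True
    for i, a in enumerate(centers_idx):
        d_a = np.linalg.norm(points - points[a], axis=1)
        in_i = labels == i
        cond9 = cond9 and bool(np.all(alpha[in_i] >= d_a[in_i] - tol))
        cond10 = cond10 and bool(np.all(alpha[~in_i] <= d_a[~in_i] + tol))
    return cond7, cond8, cond9, cond10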

3 Models of data and exact recovery

In Section 2, we considered the k-median problem in a deterministic setting. In the remainder of the paper we will instead consider a probabilistic setting. Furthermore, our discussion of the k-median problem so far is very general, as it applies to any given input consisting of n points in a metric space. In the remainder of the paper, we will only consider the Euclidean space. Thus we use d(·, ·) to denote the Euclidean distance and we use ‖·‖ to denote the Euclidean norm. We also denote by B^m_r(c) the closed ball of radius r and center c in R^m and by S^{m−1}_r(c) the sphere of radius r and center c in R^m. In this paper, unless otherwise stated, we always assume that the radius r of balls is positive, i.e., r ∈ R_+, where R_+ := {x ∈ R | x > 0}. On the other hand, we allow the radius of spheres to be nonnegative, i.e., r ∈ {x ∈ R | x ≥ 0}. In particular, S^{m−1}_0(c) is the set containing only the vector c.

In this paper we will consider two models of data for the k-median problem, which are called the stochastic ball model and the extended stochastic ball model. Before defining these two models of data, we first introduce our notation for basic probability theory; in particular, our notation follows [17]. Let (µ, Ω, F) be a probability space, where Ω is a set of “outcomes”, F is a set of “events”, and µ is a probability measure. The set F is a σ-algebra on Ω, and in this paper we will always let F be the σ-algebra generated by Ω. Therefore we will refer to the probability space (µ, Ω, F) by simply writing (µ, Ω). If A ∈ F is an event, we use Ā to denote its complementary event. We say X is an m-dimensional random vector if X is a measurable map from (Ω, F) to (R^m, ℛ^m), where ℛ^m is the σ-algebra generated by R^m. If m = 1, we call X a random variable. In particular, if (µ, Ω, F) is a probability space, Ω ⊆ R^m and X is the identity map, we say that X is a random vector drawn according to µ. If X is a random variable, we define its expected value to be EX = ∫_Ω X(x) dµ(x).

We are now ready to define the stochastic ball model.


Definition 2 (Stochastic ball model (SBM)). For every i ∈ [k], let (µ, B^m_1(0)) be a probability space. For each i ∈ [k], draw n i.i.d. random vectors v^(i)_ℓ, for ℓ ∈ [n], according to µ. The points from cluster i are then taken to be x^(i)_ℓ := c_i + v^(i)_ℓ, for ℓ ∈ [n].

Variants of the SBM have been considered in the literature, with different assumptions on the properties that the probability space (µ, B^m_1(0)) should satisfy. We refer the reader for example to [20].

In this paper we will also consider a more general model of data, which we call the extended stochastic ball model. The extended stochastic ball model is more general than the SBM in the following ways: (i) we do not require the balls to have the same radius, (ii) we do not require the probability measures on the balls to coincide, and (iii) we allow drawing different numbers of data points from different balls.

Definition 3 (Extended stochastic ball model (ESBM)). For every i ∈ [k], let (µ_i, B^m_{r_i}(c_i)) be a probability space. For each i ∈ [k], let β_i ≥ 1 and draw n_i := β_i n i.i.d. random vectors x^(i)_ℓ, for ℓ ∈ [n_i], according to µ_i. The points from cluster i are then taken to be x^(i)_ℓ, for ℓ ∈ [n_i].

In this paper we will consider three different assumptions on the probability spaces of the form (µ_i, B^m_{r_i}(c_i)) that we consider, namely:

(a1) The probability measure µ_i is invariant under rotations centered in c_i;

(a2) Every open subset of B^m_{r_i}(c_i) containing c_i has positive probability measure;

(a3) Every subset of B^m_{r_i}(c_i) with zero Lebesgue measure has zero probability measure.
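To make Definition 3 concrete, here is a small sampling sketch (ours). Each µ_i is taken to be the uniform measure on B^m_{r_i}(c_i), which is just one convenient choice satisfying (a1)–(a3); the model itself allows any probability measure on the ball.

# A sketch of sampling an ESBM instance (Definition 3) with uniform measures.
import numpy as np

def sample_uniform_ball(num, radius, center, rng):
    m = len(center)
    dirs = rng.normal(size=(num, m))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)          # uniform direction
    radii = radius * rng.uniform(size=(num, 1)) ** (1.0 / m)     # density ~ r^(m-1)
    return np.asarray(center) + radii * dirs

def sample_esbm(centers, radii, betas, n, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    return [sample_uniform_ball(int(b * n), r, c, rng)
            for c, r, b in zip(centers, radii, betas)]

# Example: k = 2 balls in R^3 with different radii and sample sizes.
clusters = sample_esbm(centers=[(0, 0, 0), (4, 0, 0)], radii=[1.0, 0.5],
                       betas=[1, 2], n=100)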

In this paper we will see that in the ESBM, the linear program (LP) can perform very well in solving the k-median problem. To formalize this notion, we define next the concept of exact recovery.

Definition 4 (Exact recovery). We say that (LP) achieves exact recovery if it has a unique optimal solution, such solution is also feasible (thus optimal) to (IP), and it assigns each point to the ball from which it is drawn.

The reader might wonder why in the definition of the ESBM we assume that n_i = β_i n for i ∈ [k], effectively requiring the n_i to be of the same order. In Example 1 in Appendix A we show that this assumption is needed in order to obtain exact recovery.

4 Sufficient conditions for exact recovery in the ESBM

In this section, we introduce general sufficient conditions which guarantee that (LP) achieves exact recovery with high probability in the ESBM. To state our results we first introduce the contribution function in the ESBM.

In the original definition (Definition 1), we assumed that α is a vector in R^P. When we consider the contribution function in the ESBM, we will always assume that for every i ∈ [k] there exists α′_i ∈ R such that ∀ℓ ∈ [n_i] we have α_{x^(i)_ℓ} = α′_i. For ease of notation, we then define the contribution function in the ESBM.

Definition 5 (Contribution function in the ESBM). Given α ∈ R^k, the contribution function in the ESBM C_α(z) : R^m → R is defined by

    C_α(z) := ∑_{i∈[k]} ∑_{ℓ∈[n_i]} (α_i − d(z, x^(i)_ℓ))_+.


Clearly, given α ∈ R^P and α′ ∈ R^k such that ∀i ∈ [k], ℓ ∈ [n_i] we have α_{x^(i)_ℓ} = α′_i, the two definitions are equivalent, i.e., C_α(z) = C_{α′}(z) for every z ∈ R^m. Since each x^(i)_ℓ is a random vector drawn according to µ_i, we define Ω := ∏_{ℓ∈[n_1]} B^m_{r_1}(c_1) × · · · × ∏_{ℓ∈[n_k]} B^m_{r_k}(c_k) and we let µ be the corresponding joint probability measure for x^(1)_1, . . . , x^(1)_{n_1}, . . . , x^(k)_1, . . . , x^(k)_{n_k}. Then for every z ∈ R^m and for every α ∈ R^k, C_α(z) is a random variable on the probability space (µ, Ω).

Next, we define the function G_α(z), which plays a fundamental role in our sufficient conditions.

Definition 6. Given α ∈ R^k, in the ESBM we define the function G_α(z) : R^m → R as

    G_α(z) := (1/n) E C_α(z).

Observation 2. In the ESBM, we obtain

    G_α(z) = ∑_{i∈[k]} β_i ∫_{x∈B^m_{r_i}(c_i)} (α_i − d(z, x))_+ dµ_i(x)
           = ∑_{i∈[k]} β_i ∫_{B^m_{α_i}(z) ∩ B^m_{r_i}(c_i)} (α_i − d(z, x)) dµ_i(x).

Proof. The expected value of the contribution function is

    E C_α(z) = ∑_{i∈[k]} n_i ∫_{x∈B^m_{r_i}(c_i)} (α_i − d(z, x))_+ dµ_i(x).

Using n_i = β_i n, for i ∈ [k], we obtain

    G_α(z) = (1/n) E C_α(z) = ∑_{i∈[k]} β_i ∫_{x∈B^m_{r_i}(c_i)} (α_i − d(z, x))_+ dµ_i(x)
           = ∑_{i∈[k]} β_i ∫_{B^m_{α_i}(z) ∩ B^m_{r_i}(c_i)} (α_i − d(z, x)) dµ_i(x),

where the last equality holds because α_i − d(z, x) ≥ 0 if and only if x ∈ B^m_{r_i}(c_i) ∩ B^m_{α_i}(z).

Observation 3. In the ESBM, the function from R^{k+m} to R defined by (α, z) ↦ G_α(z) is continuous.

Proof. To prove this observation, it suffices to show that for every compact set B ⊆ R^{k+m}, the function from B to R defined by (α, z) ↦ G_α(z) is continuous. Therefore, let B ⊆ R^{k+m} be an arbitrary compact set. From Observation 2, G_α(z) can be written in the form

    G_α(z) = ∑_{i∈[k]} β_i ∫_{x∈B^m_{r_i}(c_i)} (α_i − d(z, x))_+ dµ_i(x).

Hence, it suffices to show that, for every i ∈ [k], the function from B to R defined by (α, z) ↦ ∫_{x∈B^m_{r_i}(c_i)} (α_i − d(z, x))_+ dµ_i(x) is continuous.

We know that the function from B × B^m_{r_i}(c_i) to R defined by (α, z, x) ↦ (α_i − d(z, x))_+ is continuous. Since B × B^m_{r_i}(c_i) is a compact set, the Heine–Cantor theorem implies that (α_i − d(z, x))_+ is uniformly continuous over B × B^m_{r_i}(c_i). This implies that for every ε > 0, there is some δ > 0, such that for every x ∈ B^m_{r_i}(c_i) and for every (α^1, z^1), (α^2, z^2) ∈ B, when ‖(α^1, z^1) − (α^2, z^2)‖ < δ, we have |(α^1_i − d(z^1, x))_+ − (α^2_i − d(z^2, x))_+| < ε. We obtain that

    |∫_{x∈B^m_{r_i}(c_i)} (α^1_i − d(z^1, x))_+ − (α^2_i − d(z^2, x))_+ dµ_i(x)|
    ≤ ∫_{x∈B^m_{r_i}(c_i)} |(α^1_i − d(z^1, x))_+ − (α^2_i − d(z^2, x))_+| dµ_i(x)
    < ε P(x ∈ B^m_{r_i}(c_i)) ≤ ε.

This concludes the proof that, for every i ∈ [k], the function from B to R defined by (α, z) ↦ ∫_{x∈B^m_{r_i}(c_i)} (α_i − d(z, x))_+ dµ_i(x) is continuous.

For ease of notation, given k balls B^m_{r_i}(c_i) ⊆ R^m, for i ∈ [k], throughout the paper we denote by

    D_i := min{d(c_i, c_j) − r_i | j ∈ [k], j ≠ i}.

We also give the following definition in order to simplify the language in this paper.

Definition 7. Let (µ(n), Ω(n), F(n)) be a probability space, which depends on a parameter n, and let A_n ∈ F(n) be an event which depends on n. We say that A_n happens with high probability if, for every δ ∈ (0, 1), there exists N > 0 such that when n > N, P(A_n) > 1 − δ.

Note that, when we say with high probability, we always mean with respect to the parameter called n in the probability space. In this paper we use several times the well-known fact that if a finite number of events happen with high probability, then they also happen together with high probability.

We are now ready to state the main result of this section.

Theorem 2. Consider the ESBM. For every i ∈ [k], assume that the probability space (µ_i, B^m_{r_i}(c_i)) satisfies (a1), (a2), (a3). For every i ∈ [k], denote by E_i := E d(x, c_i), where x is a random vector drawn according to µ_i. Assume that there exists some γ ∈ R that satisfies max_{i∈[k]} β_i(r_i − E_i) < γ < min_{i∈[k]} β_i(D_i − E_i). For every i ∈ [k], let α_i := E_i + γ/β_i and assume that c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_{r_i}(c_i)}. Then (LP) achieves exact recovery with high probability.

Next, we present a corollary of Theorem 2 for the ESBM with some special structure.

Corollary 1. Consider the ESBM. For every i ∈ [k], assume that the probability space (µ_i, B^m_{r_i}(c_i)) satisfies (a1), (a2), (a3). For every i ∈ [k], assume n_i = n, r_i = 1, and denote by E_i := E d(x, c_i), where x is a random vector drawn according to µ_i. We further assume E_1 = · · · = E_k. Assume that there exists some α′ ∈ R that satisfies 1 < α′ < min_{i≠j} d(c_i, c_j) − 1. For every i ∈ [k], let α_i := α′ and assume that c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_{r_i}(c_i)}. Then (LP) achieves exact recovery with high probability.

In this paper we often use the concept of median. Let P be a finite set of points in R^m. We say that x∗ ∈ P is a median of P if x∗ ∈ argmin{∑_{s∈P} d(x, s) | x ∈ P}.
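For a finite P such a median can be found by brute force; the short helper below is our own sketch and returns the index of one median.

# A sketch (ours): compute a median of a finite point set P, i.e., a point of P
# minimizing the sum of distances to all points of P.
import numpy as np

def median_of(points):
    P = np.asarray(points, dtype=float)
    sums = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2).sum(axis=1)
    return int(np.argmin(sums))      # index of a median in P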

Next, we give an overview of the proof of Theorem 2. We first study points that are drawn from a single ball. The key observation is that when n is large enough, the median of the points drawn from a single ball is very close to the ball center. This allows us to characterize the solution corresponding to the ground-truth. Then, using Hoeffding’s inequality, we prove that with high probability, when we add any small perturbation to α, the medians still get the most contribution, which in turn implies (8). The condition max_{i∈[k]} β_i(r_i − E_i) < γ < min_{i∈[k]} β_i(D_i − E_i) guarantees that with high probability, when we add a very small perturbation to α, the resulting α satisfies (9) and (10). Finally, using Hoeffding’s inequality, we can guarantee that with high probability the choice of α that satisfies (7) is very close to the parameter α in the statement. As a consequence, (LP) achieves exact recovery with high probability according to Theorem 1.

A careful reader may find that, if we assume that all E_i are the same and all β_i equal one, then our proof is similar to Steps 2–4 in the proof of Theorem 7 in [8]. However, we point out here an important difference. In Step 2, the authors show that, after adding a small perturbation to α, the points that get the most contribution in expectation will be the ball centers. Then in Step 3, they show that with high probability a special choice of α can be seen as the α in Step 2 plus a small perturbation. Finally, in Step 4, they use Hoeffding’s inequality to show that with high probability the α in Step 3 can make the median in each ball obtain the most contribution. However, we notice that since Step 4 is conditioned on Step 3, the probability spaces considered in Step 3 and Step 4 are different, so in Step 4 the random variables considered in Hoeffding’s inequality are not independent, and Hoeffding’s inequality cannot be applied directly. In our proof this problem is not present.

In Sections 4.1 and 4.2 we prove some lemmas that will be used in the proof of Theorem 2, which is given in Section 4.3. Then, in Section 4.4, we prove Corollary 1.

In this paper we use the standard notation [a, b] for closed segments and (a, b) for open segments in R. Throughout the paper this notation is used only when these segments are nonempty. Therefore, each time we write [a, b] or (a, b) we are also implicitly assuming a ≤ b and a < b, respectively.

4.1 Lemmas about a single ball

In this section we present some lemmas that consider a single probability space of the form (µ, B^m_r(0)).

Lemma 4. Let (µ, B^m_r(0)) be a probability space that satisfies (a3). Let x_1, . . . , x_n be random vectors drawn i.i.d. according to µ, where n ≥ 3, and let M be the set of medians of {x_ℓ}_{ℓ∈[n]}. Then |M| = 1 with probability one.

Proof. Since M is always nonempty, in order to show that |M| = 1 with probability one, it suffices to show that we have |M| ≥ 2 with probability zero.

Let x̄_1, . . . , x̄_{n−1} ∈ B^m_r(0). Then we have

    P(|M| ≥ 2) = ∫_{B^m_r(0)} · · · ∫_{B^m_r(0)} P(|M| ≥ 2 | x_1 = x̄_1, . . . , x_{n−1} = x̄_{n−1}) dµ(x̄_1) · · · dµ(x̄_{n−1}).

Hence, to prove the lemma it suffices to show that for every x̄_1, . . . , x̄_{n−1} ∈ R^m we have

    P(|M| ≥ 2 | x_1 = x̄_1, . . . , x_{n−1} = x̄_{n−1}) = 0.                       (11)

From (a3), we know that x_1, . . . , x_{n−1} are different points with probability one. So it is sufficient to show that (11) holds when x̄_1, . . . , x̄_{n−1} are all different.

Note that |M| ≥ 2 implies that there exist u, v ∈ [n] with u ≠ v such that ∑_{ℓ∈[n]} d(x_u, x_ℓ) = ∑_{ℓ∈[n]} d(x_v, x_ℓ). So we have

    P(|M| ≥ 2 | x_1 = x̄_1, . . . , x_{n−1} = x̄_{n−1})
    ≤ ∑_{u,v∈[n], u≠v} P(∑_{ℓ∈[n]} d(x_u, x_ℓ) = ∑_{ℓ∈[n]} d(x_v, x_ℓ) | x_1 = x̄_1, . . . , x_{n−1} = x̄_{n−1}).

Thus, to prove the lemma, it suffices to show that, for every x̄_1, . . . , x̄_{n−1} ∈ R^m all different, and for every u, v ∈ [n] with u ≠ v, we have

    P(∑_{ℓ∈[n]} d(x_u, x_ℓ) = ∑_{ℓ∈[n]} d(x_v, x_ℓ) | x_1 = x̄_1, . . . , x_{n−1} = x̄_{n−1}) = 0.

Notice that the above event only depends on the choice of x_n, since x_1, . . . , x_{n−1} are fixed to x̄_1, . . . , x̄_{n−1}, respectively. Thus we define

    S := {x_n ∈ R^m | ∑_{ℓ∈[n]} d(x_u, x_ℓ) = ∑_{ℓ∈[n]} d(x_v, x_ℓ), x_1 = x̄_1, . . . , x_{n−1} = x̄_{n−1}}.

To prove the lemma, it suffices to show that the Lebesgue measure of S is zero. In fact, (a3) then implies that S has zero probability measure. Hence, in the remainder of the proof we show that the Lebesgue measure of S is zero.

We consider separately two cases. In the first case we assume u ≠ n and v ≠ n. Then

    S = {x_n ∈ R^m | d(x̄_u, x_n) − d(x̄_v, x_n) = ∑_{ℓ∈[n]\{n}} d(x̄_v, x̄_ℓ) − ∑_{ℓ∈[n]\{n}} d(x̄_u, x̄_ℓ)}.

We define the function f : R^m → R by

    f(x_n) := d(x̄_u, x_n) − d(x̄_v, x_n) − ∑_{ℓ∈[n]\{n}} d(x̄_v, x̄_ℓ) + ∑_{ℓ∈[n]\{n}} d(x̄_u, x̄_ℓ).

Note that S is the zero set of f. The function f(x_n) is a real analytic function on the connected open domain R^m \ {x̄_u, x̄_v}, since the distance function can be written as a composition of exponential functions, logarithms, and polynomials. Furthermore, f(x_n) is not identically zero, since it increases as x_n moves on the segment from x̄_v to x̄_u. From Proposition 1 in [27], we obtain that S has zero Lebesgue measure.

In the second case we assume u = n and v ≠ n. Then

    S = {x_n ∈ R^m | ∑_{ℓ∈[n]\{n,v}} d(x_n, x̄_ℓ) = ∑_{ℓ∈[n]\{n}} d(x̄_v, x̄_ℓ)}.

We define the function f : R^m → R by

    f(x_n) := ∑_{ℓ∈[n]\{n,v}} d(x_n, x̄_ℓ) − ∑_{ℓ∈[n]\{n}} d(x̄_v, x̄_ℓ).

Also in this case S is the zero set of f. As in the previous case, the function f(x_n) is a real analytic function on the connected open domain R^m \ {x̄_1, . . . , x̄_{v−1}, x̄_{v+1}, . . . , x̄_{n−1}}. Furthermore, it is not identically zero, as it increases as the norm of x_n goes to infinity. Again from Proposition 1 in [27], we obtain that S has zero Lebesgue measure. So in both cases we have shown that the Lebesgue measure of S is zero.

The next two lemmas state that, under some assumptions on the probability space (µ, B^m_r(0)), the vector z = 0 is the unique point that achieves min{E d(z, y) | z ∈ B^m_r(0)}, where y is a random vector drawn according to µ. In Lemma 5 we consider the case m = 1 and in Lemma 6 we study the case m ≥ 2.


Lemma 5. Let (µ, B^1_r(0)) be a probability space that satisfies (a1), (a2). Let y be a random vector drawn according to µ. Then z = 0 is the unique point that achieves min{E d(z, y) | z ∈ B^1_r(0)}.

Proof. We show that for every z ≠ 0, we have E d(z, y) > E d(0, y). Let z ∈ [−r, r] \ {0}. Then we have

    E d(z, y) = ∫_{−r}^{z} (z − y) dµ(y) + ∫_{z}^{r} (y − z) dµ(y).

Without loss of generality, we assume that z > 0. We then have

    E d(z, y) − E d(0, y) = ∫_{−r}^{−z} z dµ(y) + ∫_{−z}^{0} z dµ(y) + ∫_{0}^{z} (z − 2y) dµ(y) + ∫_{z}^{r} (−z) dµ(y).

Since µ satisfies (a1), we have ∫_{−r}^{−z} z dµ(y) = ∫_{z}^{r} z dµ(y) and ∫_{−z}^{0} z dµ(y) = ∫_{0}^{z} z dµ(y). So we obtain

    E d(z, y) − E d(0, y) = ∫_{−z}^{0} z dµ(y) + ∫_{0}^{z} (z − 2y) dµ(y) = 2 ∫_{0}^{z} (z − y) dµ(y) > 0,

where the inequality holds due to (a2).

Lemma 6. Let (µ, B^m_r(0)) be a probability space with m ≥ 2 that satisfies (a1). Let y be a random vector drawn according to µ. Then z = 0 is the unique point that achieves min{E d(z, y) | z ∈ B^m_r(0)}.

Proof. Note that we can write any z ∈ B^m_r(0) as z = tv, for a unit vector v and a scalar t ∈ [0, r]. Since µ is invariant under rotations centered in the origin, to prove the lemma it suffices to show that for any fixed unit vector v, t = 0 is the unique point that achieves min{E d(tv, y) | t ∈ [0, r]}. To prove the lemma it is sufficient to show that

    (∂/∂t) E d(tv, y) > 0    ∀t ∈ (0, r).                            (12)

In fact, we notice that E d(tv, y) is a continuous function in t ∈ [0, r], since for every ε > 0 and for every t, t′ ∈ [0, r] with |t − t′| < ε, we have

    |E d(tv, y) − E d(t′v, y)| = |E(d(tv, y) − d(t′v, y))| ≤ ‖tv − t′v‖ = |t − t′| < ε.

Hence, if (12) holds, then by the Newton–Leibniz formula, we have

    E d(sv, y) − E d(0, y) = ∫_{0}^{s} (∂/∂t) E d(tv, y) dt > 0    ∀s > 0.

Thus, in the remainder of the proof we show (12). We know that

    E d(tv, y) = ∫_{B^m_r(0)} d(tv, y) dµ(y),

thus we obtain

    (∂/∂t) E d(tv, y) = (∂/∂t) ∫_{B^m_r(0)} d(tv, y) dµ(y) = ∫_{B^m_r(0)} (∂/∂t) d(tv, y) dµ(y) = ∫_{B^m_r(0)} (〈tv − y, v〉 / d(tv, y)) dµ(y),    (13)


where 〈·, ·〉 denotes the scalar product. In the remainder of the proof, for s ≥ 0, we denote by µ_s the uniform probability measure with support S^{m−1}_s(0). Since µ is invariant under rotations centered in the origin, we know that a vector y with ‖y‖ = s, s ∈ [0, r], is drawn according to µ_s.

We evaluate (13) at a fixed t̄ ∈ (0, r). Let µ̄ be the probability measure of the random variable ‖y‖ and let x be a random vector drawn according to µ_s. We have

    (∂/∂t) E d(tv, y) |_{t=t̄} = ∫_{B^m_r(0)} (〈t̄v − y, v〉 / d(t̄v, y)) dµ(y) = ∫_{0}^{r} dµ̄(s) ∫_{S^{m−1}_s(0)} (〈t̄v − x, v〉 / d(t̄v, x)) dµ_s(x).    (14)

Next, we study the inner integral in (14) and consider two subcases. In the first subcase we have s ∈ [0, t̄], and obtain

    〈t̄v − x, v〉 = t̄ − 〈x, v〉 ≥ t̄ − ‖x‖ ≥ 0,

where the chain of inequalities holds at equality if and only if x = t̄v. So we obtain that the inner integral in (14) is strictly positive when s ∈ (0, t̄].

In the second subcase we have s ∈ (t̄, r]. We define the random variable θ ∈ [0, π] to be the angle between x and v and we let µ̃ be its probability measure. We also define the random variable ψ ∈ [0, π] to be the angle between v and t̄v − x, and the random variable φ ∈ [0, π) to be the angle between x and x − t̄v. See Figure 1 for a depiction of the angles θ, ψ, φ. Note that once θ is determined, since t̄ is fixed, ψ and φ are also determined. Therefore, we can consider the functions ψ, φ : [0, π] → [0, π] that associate to each angle θ the corresponding angles ψ(θ) and φ(θ). Then we know that for every θ ∈ [0, π], ψ(θ) = π − φ(θ) − θ ≤ π − θ, and, when θ ∈ (0, π), ψ(θ) < π − θ. We then have

Figure 1: The angles θ, ψ, φ in the proof of Lemma 6. The dotted vector is t̄v − x applied to t̄v.

    ∫_{S^{m−1}_s(0)} (〈t̄v − x, v〉 / d(t̄v, x)) dµ_s(x) = ∫_{0}^{π} cos ψ(θ) dµ̃(θ) = ∫_{0}^{π/2} [cos ψ(θ) + cos ψ(π − θ)] dµ̃(θ) > 0.

In the above formula, the second equality uses the fact that µ̃ is symmetric with respect to θ = π/2, and the inequality follows because, when m ≥ 2 and θ ∈ (0, π), we have ψ(θ) < π − θ and ψ(π − θ) < θ, which implies cos ψ(θ) + cos ψ(π − θ) > cos(π − θ) + cos θ = 0, when θ ∈ (0, π/2). We obtain that the inner integral in (14) is strictly positive when s ∈ (t̄, r]. Thus we conclude that (14) is positive.

In the next lemma we make use of Lemmas 5 and 6.

Lemma 7. Let (µ, B^m_r(0)) be a probability space that satisfies (a1), (a2). Let x_1, . . . , x_n be random vectors drawn i.i.d. according to µ, and let x∗ be a median of {x_ℓ}_{ℓ∈[n]}. Then ∀ε > 0, with high probability, we have ‖x∗‖ < ε.

Proof. Let y be a random vector drawn according to µ. We know from Lemmas 5 and 6 that x = 0 is the unique point that achieves min{E d(x, y) | x ∈ B^m_r(0)}. Furthermore, since the function from B^m_r(0) to R defined by x ↦ E d(x, y) is continuous, we know that for every ε ∈ (0, r), there is some τ with 0 < τ < ε < r and some ξ > 0 such that for each x ∈ B^m_r(0) \ B^m_ε(0) and for each x′ ∈ B^m_τ(0), we have E d(x, y) − E d(x′, y) > ξ. Let x_min := argmin{‖x‖ | x ∈ {x_ℓ}_{ℓ∈[n]}} and notice that x∗ = argmin{∑_{ℓ∈[n]} d(x, x_ℓ)/n | x ∈ {x_ℓ}_{ℓ∈[n]}}.

We observe that to prove the lemma it suffices to show that ‖x_min‖ ≤ τ, that ∑_{ℓ∈[n]} d(x_u, x_ℓ)/n − E d(x_u, y) ≥ −ξ/2 for every u ∈ [n] with x_u ∈ B^m_r(0) \ B^m_ε(0), and that ∑_{ℓ∈[n]} d(x_v, x_ℓ)/n − E d(x_v, y) ≤ ξ/2 for every v ∈ [n] with x_v ∈ B^m_τ(0). In fact, under these assumptions we obtain that for every u ∈ [n] with x_u ∈ B^m_r(0) \ B^m_ε(0) and for every v ∈ [n] with x_v ∈ B^m_τ(0), we have

    ∑_{ℓ∈[n]} d(x_v, x_ℓ)/n − ∑_{ℓ∈[n]} d(x_u, x_ℓ)/n
    = (∑_{ℓ∈[n]} d(x_v, x_ℓ)/n − E d(x_v, y)) − (∑_{ℓ∈[n]} d(x_u, x_ℓ)/n − E d(x_u, y)) − (E d(x_u, y) − E d(x_v, y))
    < ξ/2 + ξ/2 − ξ = 0.

Since ‖x_min‖ ≤ τ, the above expression implies that ‖x∗‖ ≤ ε.

Inspired by the above observation, we define the following events. We denote by A the event that ‖x∗‖ ≤ ε and we denote by T the event that ‖x_min‖ ≤ τ. For every w ∈ [n], we denote by M_w

that ‖x∗‖ ≤ ε and we denote by T the event that ‖xmin‖ ≤ τ . For every w ∈ [n], we denote by Mw

the event that at least one of the following events happens:

• xw ∈ Bmτ (0) and

∑`∈[n] d(xw, x`)/n− Ed(xw, y) ≤ ξ/2;

• xw ∈ Bmε (0) \Bm

τ (0);

• xw ∈ Bmr (0) \Bm

ε (0) and∑

`∈[n] d(xw, x`)/n− Ed(xw, y) ≥ −ξ/2.

In the remainder of the proof, we denote by Ē the complement of an event E. We know that if T and M_w, for all w ∈ [n], are true then A is true. So we get

    P(A) ≥ P(T ∩ ⋂_{w∈[n]} M_w) = 1 − P(T̄ ∪ ⋃_{w∈[n]} M̄_w) ≥ 1 − P(T̄) − ∑_{w∈[n]} P(M̄_w).    (15)

We next upper bound P(T̄) and P(M̄_w). We define p := P(y ∈ B^m_r(0) \ B^m_τ(0)) and obtain

    P(T̄) = p^n.                                                      (16)

Since (µ, B^m_r(0)) satisfies (a2), we know that p < 1.

For w ∈ [n], we know that M̄_w is true if and only if at least one of the following events is true:


P_w: x_w ∈ B^m_τ(0) and ∑_{ℓ∈[n]} d(x_w, x_ℓ)/n − E d(x_w, y) > ξ/2;

Q_w: x_w ∈ B^m_r(0) \ B^m_ε(0) and ∑_{ℓ∈[n]} d(x_w, x_ℓ)/n − E d(x_w, y) < −ξ/2.

Hence in the following we will upper bound separately P(P_w) and P(Q_w). We start by analyzing P_w. For every z ∈ B^m_τ(0), we have

    P(∑_{ℓ∈[n]} d(z, x_ℓ)/n − E d(z, y) > ξ/2 | x_w = z)
    = P(∑_{ℓ≠w} d(z, x_ℓ)/n − E d(z, y) > ξ/2 | x_w = z)
    = P(∑_{ℓ≠w} d(z, x_ℓ)/n − E d(z, y) > ξ/2)
    = P(∑_{ℓ≠w} d(z, x_ℓ) − (n − 1) E d(z, y) > nξ/2 + E d(z, y))
    ≤ exp(−2(nξ/2 + E d(z, y))² / ((n − 1)r²))
    ≤ exp(−nξ² / (2r²)).

Here, the second equality holds because the x_ℓ, for ℓ ∈ [n], are independent. In the first inequality, we use Hoeffding’s inequality and the fact that d(z, x_ℓ) ∈ [0, r]. The last inequality follows because E d(z, y) ≥ 0. So we get

    P(P_w) = ∫_{B^m_τ(0)} P(∑_{ℓ∈[n]} d(z, x_ℓ)/n − E d(z, y) > ξ/2 | x_w = z) dµ(z)
    ≤ sup{P(∑_{ℓ∈[n]} d(z, x_ℓ)/n − E d(z, y) > ξ/2 | x_w = z) | z ∈ B^m_τ(0)}
    ≤ exp(−nξ² / (2r²)).

Next, we analyze Q_w in a similar way. We have

    P(Q_w) = ∫_{B^m_r(0)\B^m_ε(0)} P(∑_{ℓ∈[n]} d(z, x_ℓ)/n − E d(z, y) < −ξ/2 | x_w = z) dµ(z) ≤ exp(−nξ² / (4r²)),

because for every z ∈ B^m_r(0) \ B^m_ε(0), we have

    P(∑_{ℓ∈[n]} d(z, x_ℓ)/n − E d(z, y) < −ξ/2 | x_w = z)
    = P(∑_{ℓ≠w} d(z, x_ℓ)/n − E d(z, y) < −ξ/2 | x_w = z)
    = P(∑_{ℓ≠w} d(z, x_ℓ)/n − E d(z, y) < −ξ/2)
    = P(∑_{ℓ≠w} d(z, x_ℓ) − (n − 1) E d(z, y) < −(nξ/2 − E d(z, y)))
    ≤ exp(−2(nξ/2 − E d(z, y))² / ((n − 1)r²))
    ≤ exp(−2(nξ/2 − r)² / (nr²))
    ≤ exp(−nξ² / (4r²)),

where the last inequality holds when n > 4r/((2 − √2)ξ).

In the rest of the proof, we assume that n > 4r/((2 − √2)ξ). Using the union bound, we have

    P(M̄_w) ≤ P(P_w) + P(Q_w) ≤ exp(−nξ² / (2r²)) + exp(−nξ² / (4r²)) ≤ 2 exp(−nξ² / (4r²)).

Using (15) and (16), we obtain

    P(A) ≥ 1 − P(T̄) − ∑_{w∈[n]} P(M̄_w) ≥ 1 − p^n − 2n exp(−nξ² / (4r²)).

The latter quantity goes to 1 as n goes to infinity because p < 1 and p, ξ, and r are all parameters that do not depend on n. So with high probability we have ‖x∗‖ ≤ ε.
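Lemma 7 is also easy to observe empirically. The self-contained sketch below (ours, using the uniform measure on the unit ball as a concrete example) prints the norm of a sample median, which shrinks as n grows.

# A sketch (ours) illustrating Lemma 7: the median of n points drawn uniformly
# from the unit ball in R^m concentrates around the center as n grows.
import numpy as np

rng = np.random.default_rng(0)
m = 3
for n in (10, 100, 1000):
    d = rng.normal(size=(n, m))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    pts = d * rng.uniform(size=(n, 1)) ** (1.0 / m)        # uniform in the unit ball
    sums = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2).sum(axis=1)
    x_star = pts[np.argmin(sums)]                          # a median of the sample
    print(n, np.linalg.norm(x_star))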


In the next lemma we use Lemma 7.

Lemma 8. Let (µ, B^m_r(0)) be a probability space that satisfies (a1), (a2). Let x_1, . . . , x_n be random vectors drawn i.i.d. according to µ, and let x∗ be a median of {x_ℓ}_{ℓ∈[n]}. Let E := E‖x‖, where x is a random vector drawn according to µ. Let OPT := ∑_{ℓ∈[n]} d(x∗, x_ℓ). Then for each ε > 0, with high probability we have |OPT/n − E| < ε.

Proof. Let ε > 0. We apply Lemma 7 and we know that with high probability, we have ‖x∗‖ < ε/2. This implies that, with high probability, we have |d(x_ℓ, x∗) − d(x_ℓ, 0)| < ε/2 for each ℓ ∈ [n]. Summing the latter n inequalities, we obtain that with high probability we have

    |OPT/n − ∑_{ℓ∈[n]} ‖x_ℓ‖/n| < ε/2.                              (17)

On the other hand, according to Hoeffding’s inequality,

    P(|∑_{ℓ∈[n]} ‖x_ℓ‖/n − E| < ε/2) > 1 − 2 exp(−nε² / (2r²)).

Since exp(−nε²/(2r²)) goes to zero as n goes to +∞, with high probability we have

    |∑_{ℓ∈[n]} ‖x_ℓ‖/n − E| < ε/2.                                   (18)

From (17) and (18), with high probability we have

    |OPT/n − E| ≤ |OPT/n − ∑_{ℓ∈[n]} ‖x_ℓ‖/n| + |∑_{ℓ∈[n]} ‖x_ℓ‖/n − E| < ε/2 + ε/2 = ε.
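The Hoeffding step above can also be checked numerically. The sketch below (ours) estimates, for the uniform measure on the unit ball in R^m where E‖x‖ = m/(m+1), how often the sample mean of the norms deviates from E by at least ε/2, and compares it with the bound 2 exp(−nε²/(2r²)) with r = 1; the specific m, n, ε, and number of trials are arbitrary choices for the example.

# A sketch (ours) checking the Hoeffding step of Lemma 8 for the uniform
# measure on the unit ball in R^m, where E = E||x|| = m / (m + 1) and r = 1.
import numpy as np

rng = np.random.default_rng(1)
m, n, eps, trials = 3, 200, 0.1, 2000
E = m / (m + 1)
count = 0
for _ in range(trials):
    d = rng.normal(size=(n, m))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    pts = d * rng.uniform(size=(n, 1)) ** (1.0 / m)        # uniform in the unit ball
    if abs(np.linalg.norm(pts, axis=1).mean() - E) >= eps / 2:
        count += 1
print("empirical:", count / trials, " Hoeffding bound:", 2 * np.exp(-n * eps**2 / 2))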

4.2 Lemmas about several balls

While in Section 4.1 we only considered one ball B^m_r(0), in the three lemmas presented in this section we will consider k balls B^m_{r_i}(c_i), for i ∈ [k].

Lemma 9. Let B^m_{r_i}(c_i), for i ∈ [k], be k balls in R^m. For every i ∈ [k], assume r_i < D_i and let [a_i, b_i] ⊂ (r_i, D_i). Then there exist τ_i > 0, ∀i ∈ [k], such that ∀i, j ∈ [k], ∀z ∈ int B^m_{τ_i}(c_i), and ∀α_j ∈ [a_j, b_j], we have

    B^m_{α_j}(z) ∩ B^m_{r_j}(c_j) =  B^m_{r_i}(c_i)   if j = i
                                     ∅                 otherwise.      (19)

Proof. We first show the following claim, obtained from the statement of the lemma by fixing some α_i ∈ (r_i, D_i), for i ∈ [k]. Let B^m_{r_i}(c_i), for i ∈ [k], be k balls in R^m, assume r_i < D_i, and let α_i ∈ (r_i, D_i). Then, for every i ∈ [k], there exists τ_i(α) > 0 such that ∀z ∈ int B^m_{τ_i(α)}(c_i), we have (19).

To prove the claim we choose, for every i ∈ [k], τ_i(α) := min{α_i − r_i, D_1 − α_1, . . . , D_k − α_k} > 0. Let z ∈ int B^m_{τ_i(α)}(c_i). We first show that B^m_{α_i}(z) ∩ B^m_{r_i}(c_i) = B^m_{r_i}(c_i). We only need to prove that for each x ∈ B^m_{r_i}(c_i), we have x ∈ B^m_{α_i}(z), and this holds because

    d(x, z) ≤ d(x, c_i) + d(c_i, z) < r_i + τ_i(α) ≤ r_i + (α_i − r_i) = α_i.

Next we show that we have B^m_{α_j}(z) ∩ B^m_{r_j}(c_j) = ∅ for j ≠ i. We only need to prove that for each x ∈ B^m_{r_j}(c_j), we have x ∉ B^m_{α_j}(z). We have

    d(x, c_i) ≥ d(c_i, c_j) − d(x, c_j) ≥ d(c_i, c_j) − r_j ≥ D_j,

thus

    d(x, z) ≥ d(x, c_i) − d(c_i, z) ≥ D_j − d(c_i, z) > D_j − τ_i(α) ≥ α_j,

where the last inequality follows from the definition of τ_i(α). We have shown d(x, z) > α_j, thus x ∉ B^m_{α_j}(z). This concludes the proof of the claim.

To prove the lemma, we define the set S := ∏_{i∈[k]} [a_i, b_i] and take τ_i := inf{τ_i(α) | α ∈ S} = min{τ_i(α) | α ∈ S} > 0, where the equality follows from the extreme value theorem, since S is compact and τ_i(α) is a continuous function over S, for every i ∈ [k].

Lemma 10. Consider the ESBM. Let α ∈ R^k and let s_i ∈ R^m for every i ∈ [k]. Assume that we have

    B^m_{α_j}(s_i) ∩ B^m_{r_j}(c_j) =  B^m_{r_i}(c_i)   if j = i
                                       ∅                 otherwise      ∀i, j ∈ [k].    (20)

Then, we have

    α_i ≥ d(s_i, x^(i)_ℓ)    ∀i ∈ [k], ∀ℓ ∈ [n_i]
    α_j < d(s_i, x^(j)_ℓ)    ∀i, j ∈ [k], i ≠ j, ∀ℓ ∈ [n_j]
    C_α(s_i) = n_i α_i − ∑_{ℓ∈[n_i]} d(s_i, x^(i)_ℓ)    ∀i ∈ [k].

Proof. From (20) with j = i we obtain that for every i ∈ [k] we have B^m_{α_i}(s_i) ∩ B^m_{r_i}(c_i) = B^m_{r_i}(c_i), thus B^m_{r_i}(c_i) ⊆ B^m_{α_i}(s_i). Since x^(i)_ℓ ∈ B^m_{r_i}(c_i) for every i ∈ [k], ℓ ∈ [n_i], we obtain

    α_i ≥ d(s_i, x^(i)_ℓ)    ∀i ∈ [k], ∀ℓ ∈ [n_i].

From (20) with i ≠ j, we obtain that for every i, j ∈ [k] with i ≠ j, we have B^m_{α_j}(s_i) ∩ B^m_{r_j}(c_j) = ∅. Since x^(j)_ℓ ∈ B^m_{r_j}(c_j) for every j ∈ [k], ℓ ∈ [n_j], we obtain

    α_j < d(s_i, x^(j)_ℓ)    ∀i, j ∈ [k], i ≠ j, ∀ℓ ∈ [n_j].

We obtain that for every i ∈ [k],

    C_α(s_i) = ∑_{j∈[k]} ∑_{ℓ∈[n_j]} (α_j − d(s_i, x^(j)_ℓ))_+
             = ∑_{ℓ∈[n_i]} (α_i − d(s_i, x^(i)_ℓ))_+ + ∑_{j∈[k], j≠i} ∑_{ℓ∈[n_j]} (α_j − d(s_i, x^(j)_ℓ))_+
             = ∑_{ℓ∈[n_i]} (α_i − d(s_i, x^(i)_ℓ))
             = n_i α_i − ∑_{ℓ∈[n_i]} d(s_i, x^(i)_ℓ).


Lemma 11. Consider the ESBM. For every i ∈ [k], assume that the probability space (µ_i, B^m_{r_i}(c_i)) satisfies (a2). For every i ∈ [k], assume r_i < D_i, let α_i ∈ (r_i, D_i), let τ_i > 0, and assume that c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_{r_i}(c_i)}. Then there exists ξ > 0 such that with high probability, for every α′ ∈ R^k with ‖α′ − α‖_∞ ≤ ξ and for every i ∈ [k], argmax{C_{α′}(z) | z ∈ {x^(i)_ℓ}_{ℓ∈[n_i]}} ⊆ int B^m_{τ_i}(c_i).

Proof. Since G_α(z) is continuous in z according to Observation 3, and B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i) is compact, we know that for every i ∈ [k], max{G_α(z) | z ∈ B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i)} is achieved. Since, by assumption, for every i ∈ [k], c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_{r_i}(c_i)}, we obtain that G_α(c_i) − max{G_α(z) | z ∈ B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i)} > 0, ∀i ∈ [k]. Let

    L := min_{i∈[k]} { G_α(c_i) − max{G_α(z) | z ∈ B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i)} } > 0.

Since for every i ∈ [k], G_α(z) is continuous in z = c_i, we know that for every i ∈ [k], there exists 0 < τ′_i < τ_i such that for every z ∈ B^m_{τ′_i}(c_i), we have G_α(z) > G_α(c_i) − L/2. Hence, for every z ∈ B^m_{τ′_i}(c_i), we have

    G_α(z) − max{G_α(z) | z ∈ B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i)}
    > G_α(c_i) − L/2 − max{G_α(z) | z ∈ B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i)} ≥ L − L/2 = L/2 > 0.    (21)

Let β := max_{i∈[k]} β_i and let ξ := L/(8kβ). Notice that for every α′ ∈ R^k with ‖α′ − α‖_∞ ≤ ξ and for every z ∈ R^m, we have

    |(1/n)C_{α′}(z) − (1/n)C_α(z)| = (1/n) |∑_{i∈[k]} ∑_{ℓ∈[n_i]} ((α′_i − d(z, x^(i)_ℓ))_+ − (α_i − d(z, x^(i)_ℓ))_+)|
    ≤ (1/n) ∑_{i∈[k]} ∑_{ℓ∈[n_i]} |(α′_i − d(z, x^(i)_ℓ))_+ − (α_i − d(z, x^(i)_ℓ))_+|
    ≤ (1/n) ∑_{i∈[k]} ∑_{ℓ∈[n_i]} |α′_i − α_i| ≤ (1/n) ∑_{i∈[k]} ∑_{ℓ∈[n_i]} L/(8kβ) ≤ L/8.    (22)

For every i ∈ [k], denote by A_i the event that for every α′ ∈ R^k with ‖α′ − α‖_∞ ≤ ξ, we have argmax{C_{α′}(z) | z ∈ {x^(i)_ℓ}_{ℓ∈[n_i]}} ⊆ int B^m_{τ_i}(c_i). For every i ∈ [k], denote by T_i the event that there is some w ∈ [n_i] such that x^(i)_w ∈ B^m_{τ′_i}(c_i). For every i ∈ [k] and w ∈ [n_i], denote by M_iw the event that at least one of the following events happens:

• x^(i)_w ∈ B^m_{τ′_i}(c_i) and (1/n)C_α(x^(i)_w) − G_α(x^(i)_w) ≥ −L/8;

• x^(i)_w ∈ int B^m_{τ_i}(c_i) \ B^m_{τ′_i}(c_i);

• x^(i)_w ∈ B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i) and (1/n)C_α(x^(i)_w) − G_α(x^(i)_w) ≤ L/8.

Note that for every i ∈ [k], if T_i is true and M_iw is true for every w ∈ [n_i], then A_i is true. This is because B^m_{τ′_i}(c_i) ∩ {x^(i)_w}_{w∈[n_i]} is nonempty and, for every z ∈ B^m_{τ′_i}(c_i) ∩ {x^(i)_w}_{w∈[n_i]} and for every z′ ∈ (B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i)) ∩ {x^(i)_w}_{w∈[n_i]}, we have C_{α′}(z) > C_{α′}(z′) for every α′ with ‖α′ − α‖_∞ ≤ ξ.

To see the last inequality, we use (22), the definition of the events M_iw, and (21) to obtain

    (1/n)C_{α′}(z) − (1/n)C_{α′}(z′)
    = ((1/n)C_{α′}(z) − (1/n)C_α(z)) + ((1/n)C_α(z) − G_α(z)) − ((1/n)C_α(z′) − G_α(z′))
      + ((1/n)C_α(z′) − (1/n)C_{α′}(z′)) + (G_α(z) − G_α(z′))
    > −L/8 − L/8 − L/8 − L/8 + L/2 = 0.

To prove the lemma we just need to show that the event ⋂_{i∈[k]} A_i happens with high probability. In the remainder of the proof, we denote by Ē the complement of an event E. From the above discussion, we have that for every i ∈ [k],

    P(A_i) ≥ P(T_i ∩ ⋂_{w∈[n_i]} M_iw) = 1 − P(T̄_i ∪ ⋃_{w∈[n_i]} M̄_iw) ≥ 1 − P(T̄_i) − ∑_{w∈[n_i]} P(M̄_iw).    (23)

Hence, in the remainder of the proof we will provide a lower bound for P(A_i) by providing upper bounds for P(T̄_i) and P(M̄_iw).

Let p := max_{i∈[k]} P(x^(i)_1 ∉ B^m_{τ′_i}(c_i)). Since for every i ∈ [k] the probability space (µ_i, B^m_{r_i}(c_i)) satisfies (a2), we know that p ∈ [0, 1). So we get

    P(T̄_i) = ∏_{ℓ∈[n_i]} P(x^(i)_ℓ ∉ B^m_{τ′_i}(c_i)) ≤ p^{n_i} ≤ p^n.    (24)

Next, we derive an upper bound for P(M̄_iw). We start by observing that for every i ∈ [k] and w ∈ [n_i], the event M̄_iw is true if and only if at least one of the events P_iw and Q_iw happens, where the events P_iw and Q_iw are defined below.

P_iw: x^(i)_w ∈ B^m_{τ′_i}(c_i) and (1/n)C_α(x^(i)_w) − G_α(x^(i)_w) < −L/8;

Q_iw: x^(i)_w ∈ B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i) and (1/n)C_α(x^(i)_w) − G_α(x^(i)_w) > L/8.

Next, we upper bound the probability of the event P_iw. We notice that

    P(P_iw) = ∫_{B^m_{τ′_i}(c_i)} P((1/n)C_α(z) − G_α(z) < −L/8 | x^(i)_w = z) dµ_i(z)
            ≤ sup{P((1/n)C_α(z) − G_α(z) < −L/8 | x^(i)_w = z) | z ∈ B^m_{τ′_i}(c_i)}.    (25)

For every z ∈ R^m and for every i ∈ [k], w ∈ [n_i], we define the random variable X_iw(z) := (α_i − d(z, x^(i)_w))_+. We know that, for every z, the X_iw(z) are independent random variables since the x^(i)_w are independent. We then obtain

    C_α(z) = ∑_{i∈[k]} ∑_{w∈[n_i]} (α_i − d(z, x^(i)_w))_+ = ∑_{i∈[k]} ∑_{w∈[n_i]} X_iw(z).

Note that, if we fix z = x^(i)_w, we can then rewrite C_α(z) in the form

    C_α(z) = ∑_{j∈[k]\{i}} ∑_{ℓ∈[n_j]} X_jℓ(z) + ∑_{ℓ∈[n_i]\{w}} X_iℓ(z) + α_i.    (26)


Also, we have

    E[∑_{j∈[k]\{i}} ∑_{ℓ∈[n_j]} X_jℓ(z) + ∑_{ℓ∈[n_i]\{w}} X_iℓ(z)] = E(C_α(z) − X_iw(z)) = nG_α(z) − I(z),    (27)

where I(z) := ∫_{B^m_{α_i}(z) ∩ B^m_{r_i}(c_i)} (α_i − d(z, x)) dµ_i(x) and the last equality follows from the definition of G_α(z) and using the same argument as in the proof of Observation 2. We then define M := max_{i∈[k]} α_i and observe that X_jℓ(z) ∈ [0, M] for every j ∈ [k], ℓ ∈ [n_j]. Let β := max_{i∈[k]} β_i. We obtain

    P((1/n)C_α(z) − G_α(z) < −L/8 | x^(i)_w = z)
    = P((1/n)[∑_{j∈[k]\{i}} ∑_{ℓ∈[n_j]} X_jℓ(z) + ∑_{ℓ∈[n_i]\{w}} X_iℓ(z)] − (G_α(z) − (1/n)I(z)) < −L/8 + (1/n)I(z) − α_i/n | x^(i)_w = z)
    = P((1/n)[∑_{j∈[k]\{i}} ∑_{ℓ∈[n_j]} X_jℓ(z) + ∑_{ℓ∈[n_i]\{w}} X_iℓ(z)] − (G_α(z) − (1/n)I(z)) < −L/8 + (1/n)I(z) − α_i/n)
    = P([∑_{j∈[k]\{i}} ∑_{ℓ∈[n_j]} X_jℓ(z) + ∑_{ℓ∈[n_i]\{w}} X_iℓ(z)] − (nG_α(z) − I(z)) < −(nL/8 − I(z) + α_i))
    ≤ exp(−(nL − 8I(z) + 8α_i)² / (32M²(∑_{i∈[k]} n_i − 1)))
    ≤ exp(−(nL)² / (32M²(∑_{i∈[k]} n_i − 1)))
    ≤ exp(−nL² / (32kβM²)).

The first equality follows from (26) and by adding (1/n)I(z) on both sides. In the second equality, we use the fact that for every z, the X_jℓ(z) are independent. In the first inequality, we use Hoeffding’s inequality and (27). The second inequality holds because α_i − I(z) ≥ 0 and the last inequality follows by the definition of β. Using (25), we obtain the following upper bound on the probability of the event P_iw.

  P(P_{iw}) ≤ sup{ P( (1/n)C_α(z) − G_α(z) < −L/8 | x^{(i)}_w = z ) | z ∈ B^m_{τ′_i}(c_i) } ≤ exp( −nL²/(32kβM²) ).

In a similar fashion, we now obtain an upper bound on the probability of the event Q_{iw}. We have

  P( (1/n)C_α(z) − G_α(z) > L/8 | x^{(i)}_w = z )
    = P( (1/n)( ∑_{j∈[k]\{i}} ∑_{ℓ∈[n_j]} X_{jℓ}(z) + ∑_{ℓ∈[n_i]\{w}} X_{iℓ}(z) ) − (G_α(z) − (1/n)I(z)) > L/8 + (1/n)I(z) − α_i/n | x^{(i)}_w = z )
    = P( (1/n)( ∑_{j∈[k]\{i}} ∑_{ℓ∈[n_j]} X_{jℓ}(z) + ∑_{ℓ∈[n_i]\{w}} X_{iℓ}(z) ) − (G_α(z) − (1/n)I(z)) > L/8 + (1/n)I(z) − α_i/n )
    = P( ∑_{j∈[k]\{i}} ∑_{ℓ∈[n_j]} X_{jℓ}(z) + ∑_{ℓ∈[n_i]\{w}} X_{iℓ}(z) − (nG_α(z) − I(z)) > nL/8 + I(z) − α_i )
    ≤ exp( −(nL + 8I(z) − 8α_i)² / (32M²(∑_{i∈[k]} n_i − 1)) )
    ≤ exp( −(nL − 8α_i)² / (32M²(∑_{i∈[k]} n_i − 1)) )
    ≤ exp( −(nL − 8α_i)² / (32kβnM²) ),


where the second inequality holds because I(z) ≥ 0. We obtain the following upper bound on the probability of the event Q_{iw}.

  P(Q_{iw}) ≤ sup{ P( (1/n)C_α(z) − G_α(z) > L/8 | x^{(i)}_w = z ) | z ∈ B^m_{r_i}(c_i) \ int B^m_{τ_i}(c_i) }
            ≤ exp( −(nL − 8α_i)²/(32kβnM²) ) ≤ exp( −nL²/(64kβM²) ),

where the last inequality holds when n > 16α_max/((2 − √2)L), and α_max := max_{i∈[k]} α_i.

Using the union bound, when n > 16α_max/((2 − √2)L), we have

  P(M̄_{iw}) ≤ P(P_{iw}) + P(Q_{iw}) ≤ exp( −nL²/(32kβM²) ) + exp( −nL²/(64kβM²) ) ≤ 2 exp( −nL²/(64kβM²) ).

Using (23) and (24), when n > 16α_max/((2 − √2)L), we have

  P(A_i) ≥ 1 − p^n − 2n_i exp( −nL²/(64kβM²) ) ≥ 1 − p^n − 2βn exp( −nL²/(64kβM²) ).

The latter quantity goes to 1 as n goes to infinity because p < 1 and p, k, β, L, M are all parameters that do not depend on n. Hence each event A_i, for i ∈ [k], happens with high probability. Therefore, also ⋂_{i∈[k]} A_i happens with high probability. So with high probability, for every α′ with ‖α′ − α‖_∞ ≤ ξ and for every i ∈ [k], we have that argmax{C_{α′}(z) | z ∈ {x^{(i)}_ℓ}_{ℓ∈[n_i]}} ⊆ int B^m_{τ_i}(c_i).

We are now ready to prove Theorem 2. In the proof we use Theorem 1 and Lemmas 4 and 7 to 11.

4.3 Proof of Theorem 2

In this proof, for every i ∈ [k], we denote by x^{(i)}_* a median of {x^{(i)}_ℓ}_{ℓ∈[n_i]}. Let (y, z) be the feasible solution to (IP) that assigns each point x^{(i)}_ℓ to the ball B^m_{r_i}(c_i) from which it is drawn. In particular, in this solution we have y_p = 1 if and only if p ∈ {x^{(i)}_* | i ∈ [k]}. Furthermore, we have z_{pq} = 1 if and only if y_p = 1 and p, q are drawn from the same ball.

To prove the theorem, we show that (y, z) is the unique optimal solution to (LP) with high probability. Clearly, (y, z) is a feasible solution to (IP). We know from Theorem 1 that (y, z) is the unique optimal solution to (LP) if there exists α ∈ R^P such that

  C_α(a_1) = · · · = C_α(a_k)   (28)
  C_α(q) < C_α(a_1)   ∀q ∈ P \ {a_i}_{i∈[k]}   (29)
  α_q ≥ d(a_i, q)   ∀i ∈ [k], ∀q ∈ A_i   (30)
  α_q < d(a_i, q)   ∀i ∈ [k], ∀q ∈ P \ A_i.   (31)

Let γ, α be as in the statement of Theorem 2. For all i ∈ [k], we obtain r_i < D_i, and using the definition of α_i, we obtain α_i ∈ (r_i, D_i). Hence there exists ξ_1 > 0 such that [α_i − ξ_1, α_i + ξ_1] ⊂ (r_i, D_i) for all i ∈ [k]. From Lemma 9 with a_i = α_i − ξ_1 and b_i = α_i + ξ_1, we obtain that there exist τ_i > 0, ∀i ∈ [k], such that ∀i, j ∈ [k], ∀z ∈ int B^m_{τ_i}(c_i), and ∀α′_j ∈ [α_j − ξ_1, α_j + ξ_1], we have

  B^m_{α′_j}(z) ∩ B^m_{r_j}(c_j) = B^m_{r_i}(c_i) if j = i, and B^m_{α′_j}(z) ∩ B^m_{r_j}(c_j) = ∅ otherwise.   (32)


Since the assumptions of Lemma 11 are satisfied, there exists ξ_2 > 0 such that with high probability, for every α′ ∈ R^k with ‖α′ − α‖_∞ ≤ ξ_2 and for every i ∈ [k], argmax{C_{α′}(z) | z ∈ {x^{(i)}_ℓ}_{ℓ∈[n_i]}} ⊆ int B^m_{τ_i}(c_i). Let ξ := min{ξ_1, ξ_2}. For every i ∈ [k], let OPT_i := ∑_{ℓ∈[n_i]} d(x^{(i)}_*, x^{(i)}_ℓ). We then know from Lemma 8 that with high probability, for every i ∈ [k], we have |OPT_i/n_i − E_i| < ξ. From Lemma 7, we know that with high probability, for every i ∈ [k], we have x^{(i)}_* ∈ int B^m_{τ_i}(c_i).

For every i ∈ [k], fix α′_i := α_i + ε_i, where ε_i := OPT_i/n_i − E_i. For every q ∈ P, we set α_q := α′_i, where i is the unique index in [k] with q ∈ B^m_{r_i}(c_i). We next show that, with this choice of α, (28)–(31) are satisfied with high probability. Using the definition of α, it suffices to show that

  C_{α′}(x^{(1)}_*) = · · · = C_{α′}(x^{(k)}_*)   (33)
  C_{α′}(x^{(i)}_ℓ) < C_{α′}(x^{(i)}_*)   ∀i ∈ [k], ∀ℓ ∈ [n_i] with x^{(i)}_ℓ ≠ x^{(i)}_*   (34)
  α′_i ≥ d(x^{(i)}_*, x^{(i)}_ℓ)   ∀i ∈ [k], ∀ℓ ∈ [n_i]   (35)
  α′_i < d(x^{(j)}_*, x^{(i)}_ℓ)   ∀i, j ∈ [k], i ≠ j, ∀ℓ ∈ [n_i].   (36)

Since for every i ∈ [k] we have |ε_i| < ξ, we know that α′_i ∈ (r_i, D_i). Since for every i ∈ [k] we have x^{(i)}_* ∈ int B^m_{τ_i}(c_i), we have that (32) holds with z = x^{(i)}_*. Thus from Lemma 10 (with s_i = x^{(i)}_*) we obtain (35), (36), and

  C_{α′}(x^{(i)}_*) = n_i α′_i − ∑_{ℓ∈[n_i]} d(x^{(i)}_*, x^{(i)}_ℓ) = n_i(α_i + ε_i) − OPT_i = n_i( γ/β_i + OPT_i/n_i ) − OPT_i = γn + OPT_i − OPT_i = γn   ∀i ∈ [k],

which implies (33). For every i ∈ [k], let s_i ∈ int B^m_{τ_i}(c_i) ∩ {x^{(i)}_ℓ}_{ℓ∈[n_i]}. Since s_i ∈ int B^m_{τ_i}(c_i), for every i, j ∈ [k], we have that z = s_i satisfies (32). From Lemma 10 we obtain

  C_{α′}(s_i) = n_i α′_i − ∑_{ℓ∈[n_i]} d(s_i, x^{(i)}_ℓ) ≤ n_i α′_i − ∑_{ℓ∈[n_i]} d(x^{(i)}_*, x^{(i)}_ℓ) = C_{α′}(x^{(i)}_*)   ∀i ∈ [k].

Since from Lemma 4 the vector x^{(i)}_* is the unique median of {x^{(i)}_ℓ}_{ℓ∈[n_i]} with probability one when n_i ≥ 3, the above inequality holds with equality if and only if s_i = x^{(i)}_*. Since ‖α′ − α‖_∞ = ‖ε‖_∞ < ξ ≤ ξ_2, for every i ∈ [k] we have argmax{C_{α′}(z) | z ∈ {x^{(i)}_ℓ}_{ℓ∈[n_i]}} ⊆ int B^m_{τ_i}(c_i). Thus, we know that x^{(i)}_* is the unique point that achieves max{C_{α′}(z) | z ∈ {x^{(i)}_ℓ}_{ℓ∈[n_i]}}. This implies (34).

Hence (y, z) is the unique optimal solution to (LP) with high probability.

4.4 Proof of Corollary 1

Let γ := α′ − E_1. Since for every i ∈ [k], β_i = 1, r_i = 1, and E_1 = · · · = E_k, we have max_{i∈[k]} β_i(r_i − E_i) = 1 − E_1 and min_{i∈[k]} β_i(D_i − E_i) = min_{i≠j} d(c_i, c_j) − 1 − E_1. From 1 < α′ < min_{i≠j} d(c_i, c_j) − 1 we then obtain max_{i∈[k]} β_i(r_i − E_i) < γ < min_{i∈[k]} β_i(D_i − E_i). Notice that for every i ∈ [k] we have

  α_i = α′ = E_i + γ = E_i + γ/β_i.

Since for i ∈ [k], c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_{r_i}(c_i)}, Theorem 2 implies that (LP) achieves exact recovery with high probability.


5 Exact recovery in the ESBM

In this section, we present our recovery results for the ESBM. We also show that for the ESBM with some special structure, including the SBM, (LP) can perform even better.

For completeness, we start with the simple case m = 1. The next two theorems can be seen as corollaries of Theorem 2.

Theorem 3. Consider the ESBM with m = 1. For every i ∈ [k], assume that the probability space (μ_i, B^m_{r_i}(c_i)) satisfies (a1), (a2), (a3). Let R := max_{i∈[k]} r_i and β := max_{i∈[k]} β_i. If for every i ≠ j we have d(c_i, c_j) > r_i + r_j + (1 + 2β)R, then (LP) achieves exact recovery with high probability.

Proof. It suffices to check that all assumptions of Theorem 2 are satisfied. For every i ∈ [k], denote by E_i := E d(x, c_i), where x is a random vector drawn according to μ_i. Let γ := max_{i∈[k]} β_i(2r_i − E_i). Using the fact that r_i ∈ (0, R], β_i ∈ [1, β], and E_i ∈ (0, R], we can bound γ and obtain

  max_{i∈[k]} β_i(r_i − E_i) < γ < 2βR < min_{j∈[k]} r_j + (1 + 2β)R − R < min_{i≠j} d(c_i, c_j) − r_i − R ≤ min_{i∈[k]} β_i(D_i − E_i).

Let α_i := E_i + γ/β_i for every i ∈ [k]. It remains to show that for every i ∈ [k], c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_{r_i}(c_i)}. For every i ∈ [k], from the definition of γ we have α_i ≥ 2r_i, thus for every z ∈ B^m_{r_i}(c_i), we have B^m_{α_i}(z) ∩ B^m_{r_i}(c_i) = B^m_{r_i}(c_i). For every i ∈ [k] we have α_i ≤ R + γ < (1 + 2β)R, which implies B^m_{α_j}(z) ∩ B^m_{r_j}(c_j) = ∅ for every z ∈ B^m_{r_i}(c_i) and every j ∈ [k] \ {i}. From Observation 2, we know that for every i ∈ [k] and for every z ∈ B^m_{r_i}(c_i) with z ≠ c_i, we have

  G_α(z) = ∫_{−r_i}^{r_i} (α_i − d(z, x)) dμ_i(x) = α_i − E d(z, x) < α_i − E d(c_i, x) = G_α(c_i),

where the inequality follows from Lemma 5.

The assumptions of Theorem 2 are satisfied, and so (LP) achieves exact recovery with high probability.

Theorem 4. Consider the ESBM with m = 1. For every i ∈ [k], assume that the probability space (μ_i, B^m_{r_i}(c_i)) satisfies (a1), (a2), (a3). For every i ∈ [k], assume n_i = n, r_i = 1, and denote by E_i := E d(x, c_i), where x is a random vector drawn according to μ_i. We further assume E_1 = · · · = E_k. If for every i ≠ j we have d(c_i, c_j) > 2 + 2, then (LP) achieves exact recovery with high probability.

Proof. It suffices to check that all assumptions of Corollary 1 are satisfied. Let Θ := min_{i≠j} d(c_i, c_j) − 2 > 2, let ε ∈ (0, Θ − 2), and let α′ := 2 + ε. Note that we have 2 < α′ < Θ, which in particular implies

  1 < α′ < min_{i≠j} d(c_i, c_j) − 1.

Let α_i := α′ for every i ∈ [k]. It remains to show that for every i ∈ [k], c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_1(c_i)}. For every i ∈ [k] and for every z ∈ B^m_1(c_i), α′ > 2 implies B^m_{α_i}(z) ∩ B^m_1(c_i) = B^m_1(c_i) and α′ < Θ implies B^m_{α_j}(z) ∩ B^m_1(c_j) = ∅ for every j ∈ [k] \ {i}. From Observation 2, we know that for every i ∈ [k] and for every z ∈ B^m_1(c_i) with z ≠ c_i, we have

  G_α(z) = ∫_{−1}^{1} (α_i − d(z, x)) dμ_i(x) = α_i − E d(z, x) < α_i − E d(c_i, x) = G_α(c_i),

where the inequality follows from Lemma 5.

The assumptions of Corollary 1 are satisfied, and so (LP) achieves exact recovery with high probability.


Theorem 4 implies that in the SBM with m = 1, a sufficient condition for (LP) to achieve exact recovery is that the distance between any pair of points from the same ball is always smaller than the distance between any pair of points from different balls. We remark that under this assumption a simple threshold algorithm can also achieve exact recovery. It is currently unknown if in the SBM with m = 1 a pairwise distance smaller than 4 may be sufficient to guarantee exact recovery.

Next, we present our most interesting results, which consider exact recovery for the ESBM and the SBM with m ≥ 2.

Theorem 5. Consider the ESBM with m ≥ 2. For every i ∈ [k], assume that the probability space (μ_i, B^m_{r_i}(c_i)) satisfies (a1), (a2), (a3). Let β, r, R ∈ R be such that for every i ∈ [k] we have r_i ∈ [r, R] and β_i ≤ β. Then there is a function ε(k, m) = C√(k log m / m), where C is a positive constant, such that, if for every i ≠ j we have d(c_i, c_j) > (1 + β)R + max{r_i, r_j} + ε(k, m), then (LP) achieves exact recovery with high probability.

Theorem 6. Consider the ESBM with m ≥ 2. For every i ∈ [k], assume that the probability space (μ_i, B^m_{r_i}(c_i)) satisfies (a1), (a2), (a3). For every i ∈ [k], assume n_i = n, r_i = 1, and denote by E_i := E d(x, c_i), where x is a random vector drawn according to μ_i. We further assume E_1 = · · · = E_k. If for every i ≠ j we have d(c_i, c_j) > 2 + 1.29, then (LP) achieves exact recovery with high probability.

Theorem 7. Consider the ESBM with m ≥ 2. For every i ∈ [k], assume that the probability space (μ_i, B^m_{r_i}(c_i)) satisfies (a1), (a2), (a3). For every i ∈ [k], assume n_i = n, r_i = 1, and denote by E_i := E d(x, c_i), where x is a random vector drawn according to μ_i. We further assume E_1 = · · · = E_k. Then there is a function ε(k, m) = C√(k log m / m), where C is a positive constant, such that, if for every i ≠ j we have d(c_i, c_j) > 2 + ε(k, m), then (LP) achieves exact recovery with high probability.

Theorem 8. Consider the SBM with m ≥ 2. Assume that the probability space (μ, B^m_1(0)) satisfies (a1), (a2), (a3). Assume that μ has a density function p(x) and assume that, for x_1, x_2 ∈ B^m_1(0) with ‖x_1‖ < ‖x_2‖, we have p(x_1) > p(x_2). If for every i ≠ j we have d(c_i, c_j) > 2, then (LP) achieves exact recovery with high probability.

For the SBM, Theorem 6 gives the best known sufficient condition for exact recovery, which does not depend on k or m. Moreover, if k does not grow fast, Theorem 7 gives a near optimal condition for exact recovery in high dimension, and Theorem 8 corrects the result in [8] by adding assumptions on the probability measure. Beyond the SBM, Theorem 5 shows that if we consider the much more general ESBM, exact recovery still happens, as long as the numbers of points drawn from each ball have the same order. We already discussed in Section 3 that this assumption is necessary and cannot be dropped (see Example 1 in Appendix A).

The remainder of the section is devoted to proving Theorems 5 to 8.

5.1 Analysis of Gα(z)

According to Section 4, we know that exact recovery is closely related to the function G_α(z). In this section, we present an in-depth study of the function G_α(z). These results on G_α(z) will be used to prove our exact recovery results in dimension m ≥ 2, i.e., Theorems 5 to 8. This is why in several results in this section we assume m ≥ 2.


5.1.1 The random variable θ

In the following, for r ≥ 0, we denote by μ_r the uniform probability measure with support S^{m−1}_r(0). Let v be a fixed unit vector in R^m and let x be a random vector in R^m drawn according to μ_1. We define the random variable θ(x) to be the angle between v and x. Since both v and x are unit vectors we can write

  θ(x) := arccos⟨v, x⟩ ∈ [0, π].

We can then use μ(θ) to denote the probability measure of θ. In the next observation we show that the probability measure μ(θ) also arises from probability measures more general than μ_1. We recall that two random variables A, B have the same probability measure if for every ψ ∈ R, we have P(A ≤ ψ) = P(B ≤ ψ).

Observation 12. Let (μ, B^m_r(0)) be a probability space that satisfies (a1), (a3). Let v be a fixed unit vector in R^m, and let x be a random vector in R^m drawn according to μ. We define the random variable θ′(x) to be the angle between v and x if x ≠ 0, and θ′(x) := π/2 if x = 0. Then θ′ has the same probability measure as θ.

Proof. Since v is a unit vector we have, for every x ≠ 0,

  θ′(x) = arccos⟨v, x/‖x‖⟩ ∈ [0, π].

If x = 0, we have θ′(0) = π/2.

Since (μ, B^m_r(0)) satisfies (a3), we have P(x = 0) = 0. Since (μ, B^m_r(0)) satisfies (a1), we have that, for x ≠ 0, x/‖x‖ is a random vector drawn according to μ_1. So for every ψ ∈ [0, π], we have

  P(θ′(x) ≤ ψ) = P(θ′(x) ≤ ψ, x ≠ 0) + P(θ′(x) ≤ ψ, x = 0) = P(θ′(x) ≤ ψ | x ≠ 0) P(x ≠ 0) = P(θ(x) ≤ ψ).

We then obtain that θ′ and θ have the same probability measure.

When m ≥ 2 we note that the random variable θ has a density function, which we denote by p^{(m)}(θ) := dμ/dθ. In the remainder of this section we study the density function p^{(m)}(θ), thus we always assume m ≥ 2. In the following, we denote by Γ(x) the gamma function.

Observation 13. Let m ≥ 2. We have

  p^{(m)}(θ) = (1/√π) · (Γ(m/2)/Γ((m−1)/2)) · sin^{m−2} θ.

Proof. Let ψ be a fixed angle in [0, π]. We know that those x ∈ S^{m−1}_1(0) such that θ(x) = ψ form an (m − 2)-dimensional sphere in R^m centered at v cos ψ with radius sin ψ, which we denote by S^{m−2}_{sin ψ}(v cos ψ). Formally, we define

  S^{m−2}_{sin ψ}(v cos ψ) := {x ∈ S^{m−1}_1(0) | θ(x) = ψ}.

In the following we denote by λ_{m−1}(·) the (m−1)-dimensional volume and by λ_{m−2}(·) the (m−2)-dimensional volume. Then

  λ_{m−1}({x ∈ S^{m−1}_1(0) | θ(x) ≤ ψ}) = ∫_0^ψ λ_{m−2}(S^{m−2}_{sin θ}(v cos θ)) dθ = λ_{m−2}(S^{m−2}_1(0)) ∫_0^ψ sin^{m−2} θ dθ.


In particular,

  λ_{m−1}(S^{m−1}_1(0)) = λ_{m−1}({x ∈ S^{m−1}_1(0) | θ(x) ≤ π}) = λ_{m−2}(S^{m−2}_1(0)) ∫_0^π sin^{m−2} θ dθ = λ_{m−2}(S^{m−2}_1(0)) √π Γ((m−1)/2)/Γ(m/2).

Since x is drawn uniformly from S^{m−1}_1(0), we know that

  P(θ ≤ ψ) = λ_{m−1}({x ∈ S^{m−1}_1(0) | θ(x) ≤ ψ}) / λ_{m−1}(S^{m−1}_1(0)) = (1/√π) · (Γ(m/2)/Γ((m−1)/2)) · ∫_0^ψ sin^{m−2} θ dθ.

Thus, we obtain

  p^{(m)}(θ) = (1/√π) · (Γ(m/2)/Γ((m−1)/2)) · sin^{m−2} θ.

Observation 14. Let m ≥ 2. Then there exists a threshold s_m ∈ (0, 1) such that

  p^{(m)}(θ) − p^{(m+1)}(θ) ≥ 0 if 0 ≤ sin θ ≤ s_m,   and   p^{(m)}(θ) − p^{(m+1)}(θ) < 0 if s_m < sin θ ≤ 1.

Proof. Using Observation 13 we can write

  p^{(m)}(θ) − p^{(m+1)}(θ) = (1/√π)(Γ(m/2)/Γ((m−1)/2)) sin^{m−2} θ − (1/√π)(Γ((m+1)/2)/Γ(m/2)) sin^{m−1} θ
                            = ( Γ(m/2)/Γ((m−1)/2) − (Γ((m+1)/2)/Γ(m/2)) sin θ ) (1/√π) sin^{m−2} θ.   (37)

We set

  s_m := Γ(m/2)² / ( Γ((m−1)/2) Γ((m+1)/2) ).

Since Γ(x) is a positive strictly logarithmically convex function for x ∈ (0, ∞), we have s_m ∈ (0, 1). We note that if sin θ = s_m, then p^{(m)}(θ) − p^{(m+1)}(θ) = 0. Since the gamma function is positive, when 0 ≤ sin θ < s_m, from (37) we obtain

  p^{(m)}(θ) − p^{(m+1)}(θ) ≥ 0.

On the other hand, when s_m < sin θ ≤ 1, from (37) we obtain

  p^{(m)}(θ) − p^{(m+1)}(θ) < 0.
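Since s_m has an explicit expression, it is easy to tabulate; the short sketch below (illustrative only) evaluates s_m for a few dimensions and confirms that each value lies strictly between 0 and 1, as claimed.

```python
import math

def s_m(m: int) -> float:
    """Threshold s_m = Gamma(m/2)^2 / (Gamma((m-1)/2) * Gamma((m+1)/2)), via log-gamma."""
    return math.exp(2 * math.lgamma(m / 2)
                    - math.lgamma((m - 1) / 2)
                    - math.lgamma((m + 1) / 2))

for m in (2, 3, 5, 10, 50, 200):
    value = s_m(m)
    assert 0.0 < value < 1.0
    print(m, value)
```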

Observation 15. Let m ≥ 2 and let θ̄ ≤ π/2. Let g(θ) be a nonnegative decreasing function on (0, θ̄). Then we have ∫_0^{θ̄} g(θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ ≥ 0.


Proof. Let s_m ∈ (0, 1) be the threshold for p^{(m)} − p^{(m+1)} from Observation 14. Let ψ ∈ (0, π/2) be such that sin ψ = s_m. We then have

  p^{(m)}(θ) − p^{(m+1)}(θ) ≥ 0 if 0 ≤ θ ≤ ψ,   and   p^{(m)}(θ) − p^{(m+1)}(θ) < 0 if ψ < θ ≤ π/2.

We consider separately two cases. In the first case we assume θ̄ ≤ ψ. Since p^{(m)}(θ) − p^{(m+1)}(θ) ≥ 0 when θ ∈ (0, θ̄) and since g(θ) is a nonnegative decreasing function on (0, θ̄), we obtain

  ∫_0^{θ̄} g(θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ ≥ g(θ̄) ∫_0^{θ̄} (p^{(m)}(θ) − p^{(m+1)}(θ)) dθ ≥ 0.

In the second case we assume ψ < θ̄ ≤ π/2. We have

  ∫_0^{θ̄} g(θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ = ∫_0^ψ g(θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ + ∫_ψ^{θ̄} g(θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ
    ≥ g(ψ) ∫_0^ψ (p^{(m)}(θ) − p^{(m+1)}(θ)) dθ + g(ψ) ∫_ψ^{θ̄} (p^{(m)}(θ) − p^{(m+1)}(θ)) dθ
    = g(ψ) ∫_0^{θ̄} (p^{(m)}(θ) − p^{(m+1)}(θ)) dθ = −g(ψ) ∫_{θ̄}^{π/2} (p^{(m)}(θ) − p^{(m+1)}(θ)) dθ ≥ 0.

The last equality follows from the fact that ∫_0^{π/2} p^{(m)}(θ) dθ = ∫_0^{π/2} p^{(m+1)}(θ) dθ = 1/2. The last inequality uses the fact that g(ψ) ≥ 0 and ∫_{θ̄}^{π/2} (p^{(m)}(θ) − p^{(m+1)}(θ)) dθ ≤ 0.

Lemma 16. Let m ≥ 2 and let [φ_1, φ_2] ⊆ [0, π] be such that π/2 ∉ [φ_1, φ_2]. Denote by φ an angle θ ∈ {φ_1, φ_2} for which sin θ is the largest. Then P(θ ∈ [φ_1, φ_2]) < (√π/2) √(m/2) sin^{m−2} φ.

Proof. Gautschi's inequality implies that for every x > 0 and every s ∈ (0, 1), the following inequality holds:

  Γ(x + 1)/Γ(x + s) < (x + 1)^{1−s}.

We apply Gautschi's inequality with x = m/2 − 1, for m ≥ 3, and s = 1/2. Hence, when m ≥ 3 we have

  Γ(m/2)/Γ((m−1)/2) < √(m/2).

Notice that the above inequality also holds for m = 2 since

  Γ(1)/Γ(1/2) = 1/√π < 1 = √(2/2).

Using Observation 13 we obtain

  P(θ ∈ [φ_1, φ_2]) = ∫_{φ_1}^{φ_2} p^{(m)}(θ) dθ = ∫_{φ_1}^{φ_2} (1/√π)(Γ(m/2)/Γ((m−1)/2)) sin^{m−2} θ dθ
    ≤ (1/√π) √(m/2) (φ_2 − φ_1) sin^{m−2} φ < (√π/2) √(m/2) sin^{m−2} φ,

where the last inequality holds because φ_2 − φ_1 < π/2.
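The gamma-function ratio bound used above can also be checked numerically; the following sketch (illustrative only) verifies Γ(m/2)/Γ((m−1)/2) < √(m/2) over a range of dimensions.

```python
import math

def gamma_ratio(m: int) -> float:
    """Gamma(m/2) / Gamma((m-1)/2), computed via log-gamma for numerical stability."""
    return math.exp(math.lgamma(m / 2) - math.lgamma((m - 1) / 2))

for m in range(2, 1001):
    assert gamma_ratio(m) < math.sqrt(m / 2), m
print("Gautschi-type bound verified for m = 2, ..., 1000")
```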


5.1.2 Three functions related to Gα(z)

According to Observation 2, we have

  G_α(z) = ∑_{i∈[k]} β_i ∫_{B^m_{α_i}(z) ∩ B^m_{r_i}(c_i)} (α_i − d(z, x)) dμ_i(x).

Then, for every vector z ∈ R^m, the function G_α(z) can be seen as the sum of the contributions that z gets from each single ball B^m_{r_i}(c_i). Motivated by this observation, in this section we analyze G_α(z) by defining three new functions. The first function can be seen as G_α(c_1) − G_α(z) for z ∈ B^m_{r_1}(c_1), in the case k = 1, c_1 = 0, α_1 > r_1, and β_1 = 1.

Definition 8. Let (μ, B^m_r(0)) be a probability space that satisfies (a1) and let α > r. We define the function H^{(α,μ,m)}(z) : B^m_r(0) → R as

  H^{(α,μ,m)}(z) := ∫_{B^m_r(0)} (α − ‖x‖) dμ(x) − ∫_{B^m_α(z) ∩ B^m_r(0)} (α − d(z, x)) dμ(x).

The second function is the special case of H^{(α,μ,m)} where μ = μ_r.

Definition 9. Let r, α ∈ R_+ with α > r. We define the function T^{(α,m)}(z) : B^m_r(0) → R as

  T^{(α,m)}(z) := ∫_{B^m_r(0)} (α − ‖x‖) dμ_r(x) − ∫_{B^m_α(z) ∩ B^m_r(0)} (α − d(z, x)) dμ_r(x).

The third function can be seen as the part of G_α(z) for z ∉ B^m_{r_1}(c_1) coming from the ball 1, in the case c_1 = 0, α_1 ≥ r_1, β_1 = 1.

Definition 10. Let (μ, B^m_r(0)) be a probability space that satisfies (a1) and let α > r. We define the function R^{(α,μ,m)}(z) : R^m \ B^m_r(0) → R as

  R^{(α,μ,m)}(z) := ∫_{B^m_α(z) ∩ B^m_r(0)} (α − d(z, x)) dμ(x).

Since the probability measures considered in Definitions 8 to 10 are invariant under rotations centered in the origin, we obtain that H^{(α,μ,m)}(z), T^{(α,m)}(z), and R^{(α,μ,m)}(z) are also invariant under rotations centered in the origin. Therefore, in some parts of this section, we fix a unit vector v, and we study the three above functions evaluated in points of the form z = tv, where t = ‖z‖ ≥ 0.

The rest of the section is devoted to deriving bounds for H^{(α,μ,m)}, T^{(α,m)}, and R^{(α,μ,m)}. We start with an observation that will be used several times in the analysis of H^{(α,μ,m)} and T^{(α,m)}.

Observation 17. Let r, α ∈ R_+ with α > r and let s ∈ [0, r]. Let v be a unit vector in R^m and let t ∈ [0, r]. Then H^{(α,μ_s,m)}(tv) can be written in the form

  H^{(α,μ_s,m)}(tv) = t   if s = 0,
  H^{(α,μ_s,m)}(tv) = α − s − ∫_0^π (α − √(s² + t² − 2st cos θ)) dμ(θ) = E d(tv, x) − s   if 0 < s ≤ α − t,
  H^{(α,μ_s,m)}(tv) = α − s − ∫_0^{θ̄} (α − √(s² + t² − 2st cos θ)) dμ(θ)   if s ≥ α − t.

In the second case x is a random vector drawn according to μ_s. In the third case

  θ̄ := arccos( (s² + t² − α²)/(2st) ) ≤ π.


Proof. Note that we can write H^{(α,μ_s,m)}(tv) in the form

  H^{(α,μ_s,m)}(tv) = α − s − ∫_{B^m_α(tv) ∩ S^{m−1}_s(0)} (α − d(tv, x)) dμ_s(x).   (38)

If s = 0 we have S^{m−1}_0(0) ⊆ B^m_α(tv) and we obtain

  H^{(α,μ_0,m)}(tv) = α − ∫_{S^{m−1}_0(0)} (α − d(tv, x)) dμ_0(x) = α − α + t = t.

In the rest of the proof we assume s > 0. For x ∈ S^{m−1}_s(0), we have d(tv, x) = √(s² + t² − 2st cos θ′), where θ′ is the angle between v and x. This implies that the function under the integral sign in (38) can be written as a function of θ′. Let x be a random vector in R^m drawn according to μ_s and denote by μ the probability measure of θ′. According to Observation 12, the random variable θ′ has the same probability measure as the random variable θ studied in Section 5.1.1.

We now consider separately two cases. In the first case we assume 0 < s ≤ α − t. We then have S^{m−1}_s(0) ⊆ B^m_α(tv) and from (38) we obtain

  H^{(α,μ_s,m)}(tv) = α − s − ∫_{S^{m−1}_s(0)} (α − d(tv, x)) dμ_s(x),   (39)

thus H^{(α,μ_s,m)}(tv) = E d(tv, x) − s, where x is a random vector drawn according to μ_s. From (39) we continue

  H^{(α,μ_s,m)}(tv) = α − s − ∫_0^π (α − √(s² + t² − 2st cos θ′)) dμ(θ′) = α − s − ∫_0^π (α − √(s² + t² − 2st cos θ)) dμ(θ).

In the second case we assume s ≥ α − t. We define the angle

  θ̄ := arccos( (s² + t² − α²)/(2st) ),

and observe that θ̄ ≤ π. Then we get

  H^{(α,μ_s,m)}(tv) = α − s − ∫_0^{θ̄} (α − √(s² + t² − 2st cos θ′)) dμ(θ′) = α − s − ∫_0^{θ̄} (α − √(s² + t² − 2st cos θ)) dμ(θ).

Analysis of the function T^{(α,m)}. Our goal in the next lemmas is to study the properties of T^{(α,m)}(z) in order to obtain a lower bound for it.

Lemma 18. Let r, α ∈ R_+ with α > r. Let z ∈ B^m_r(0) \ {0} with ‖z‖ ≤ α − r. Then we have T^{(α,m)}(z) > 0.

Proof. From Observation 17 with s = r and tv = z we have T^{(α,m)}(z) = E d(z, x) − r = E d(z, x) − E ‖x‖, where x is a random vector drawn according to μ_r. Since z ≠ 0, from Lemmas 5 and 6, we obtain E d(z, x) − E ‖x‖ > 0.


Lemma 19. Let m ≥ 2 and let r ∈ R_+. Let z ∈ B^m_r(0). Then T^{(α,m)}(z) is strictly increasing in α when α ∈ (r, r + ‖z‖) and is constant in α when α ≥ r + ‖z‖.

Proof. Let v be a unit vector in R^m. Since T^{(α,m)}(z) is invariant under rotations centered in the origin, it suffices to consider vectors z ∈ B^m_r(0) of the form z = tv, for t ∈ [0, r].

Consider first the case α ≥ r + t. From Observation 17 with s = r we have T^{(α,m)}(tv) = E d(tv, x) − r, where x is a random vector drawn according to μ_r. Hence in this case T^{(α,m)}(tv) is constant in α.

Next, consider the case α ∈ (r, r + t). From Observation 17 with s = r we have

  T^{(α,m)}(tv) = α − r − ∫_0^{θ̄} (α − √(r² + t² − 2rt cos θ)) dμ(θ) = α − r − ∫_0^{θ̄} (α − √(r² + t² − 2rt cos θ)) p^{(m)}(θ) dθ,

where

  θ̄ := arccos( (r² + t² − α²)/(2rt) ) < π.

We differentiate with respect to the variable α and obtain

  ∂T^{(α,m)}/∂α (tv) = 1 − ∫_0^{θ̄} dμ(θ) − (α − √(r² + t² − 2rt cos θ̄)) p^{(m)}(θ̄) ∂θ̄/∂α = 1 − ∫_0^{θ̄} dμ(θ) = 1 − P(θ ≤ θ̄) > 0.

Here, the second equality holds because α − √(r² + t² − 2rt cos θ̄) = 0, and the inequality holds because θ̄ < π. Hence in this case T^{(α,m)}(tv) is strictly increasing in α.

Lemma 20. Let m ≥ 2 and let r, α ∈ R_+ with α > r. Let z ∈ B^m_r(0) and z′ ∈ B^{m+1}_r(0) with ‖z‖ = ‖z′‖. Then we have T^{(α,m+1)}(z′) ≥ T^{(α,m)}(z).

Proof. Let z = tv and z′ = tv′, where v is a unit vector in R^m and v′ is a unit vector in R^{m+1}. Then according to Observation 17 with s = r, we have

  T^{(α,m+1)}(tv′) − T^{(α,m)}(tv) = ∫_0^{θ̄} (α − √(r² + t² − 2rt cos θ)) (p^{(m)}(θ) − p^{(m+1)}(θ)) dθ,

where

  θ̄ := π if t ≤ α − r,   and   θ̄ := arccos( (r² + t² − α²)/(2rt) ) < π if t > α − r.

We let f(t, θ) := α − √(r² + t² − 2rt cos θ) and write

  T^{(α,m+1)}(tv′) − T^{(α,m)}(tv) = ∫_0^{θ̄} f(t, θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ.   (40)

We define f̄(t, θ) := f(t, π − θ) = α − √(r² + t² + 2rt cos θ). It can be checked that f(t, θ) is a decreasing function in θ, when θ ∈ (0, π) and t is fixed in [0, r]. Furthermore, we have f(t, θ̄) ≥ 0. Thus f(t, θ) ≥ 0 when θ ≤ θ̄. Next, we will discuss several cases for θ̄.


In the first case we assume θ̄ ≤ π/2. From (40) and Observation 15, we obtain T^{(α,m+1)}(tv′) − T^{(α,m)}(tv) ≥ 0.

In the second case we assume θ̄ > π/2. Let s_m ∈ (0, 1) be the threshold for p^{(m)}(θ) − p^{(m+1)}(θ) from Observation 14. Let ψ ∈ (π/2, π) be such that sin ψ = s_m.

We first show that

  ∫_{θ̄}^π f(t, θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ ≤ 0.   (41)

If θ̄ = π, (41) obviously holds, so we assume θ̄ < π. Now assume θ̄ ∈ [ψ, π). We have f(t, θ̄) = 0, thus f(t, θ) ≤ 0 for every θ ∈ [θ̄, π]. On the other hand we have p^{(m)}(θ) − p^{(m+1)}(θ) ≥ 0 for every θ ∈ [θ̄, π]. Hence (41) holds also in this case. So we now assume π/2 < θ̄ < ψ. We notice that

  ∫_{θ̄}^π f(t, θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ = −∫_{θ̄}^π (−f(t, θ))(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ
    = −∫_{π−θ̄}^0 (−f(t, π − ξ))(p^{(m)}(π − ξ) − p^{(m+1)}(π − ξ)) d(π − ξ)
    = −∫_0^{π−θ̄} (−f̄(t, ξ))(p^{(m)}(ξ) − p^{(m+1)}(ξ)) dξ.

Here, in the second equality we perform the change of variable θ = π − ξ and in the third equality we use the fact that p^{(m)}(ξ) = p^{(m)}(π − ξ) for every m and every ξ. We observe that, for ξ ∈ [0, π − θ̄], −f̄(t, ξ) is a decreasing function and

  −f̄(t, ξ) = √(r² + t² + 2rt cos ξ) − α ≥ √(r² + t² + 2rt cos(π − θ̄)) − α = √(r² + t² − 2rt cos θ̄) − α = 0.

By Observation 15, we conclude that ∫_0^{π−θ̄} (−f̄(t, ξ))(p^{(m)}(ξ) − p^{(m+1)}(ξ)) dξ ≥ 0, thus (41) holds. This concludes the proof of (41).

Next, we show

  T^{(α,m+1)}(tv′) − T^{(α,m)}(tv) ≥ ∫_0^{π/2} (f(t, θ) + f̄(t, θ))(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ.   (42)

From (40) we have

  T^{(α,m+1)}(tv′) − T^{(α,m)}(tv) = ∫_0^{θ̄} f(t, θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ
    = ∫_0^{π/2} f(t, θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ + ∫_{π/2}^{θ̄} f(t, θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ.

Now note that

  ∫_{π/2}^{θ̄} f(t, θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ ≥ ∫_{π/2}^{π} f(t, θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ
    = ∫_{π/2}^{0} f(t, π − ξ)(p^{(m)}(π − ξ) − p^{(m+1)}(π − ξ)) d(π − ξ)
    = ∫_0^{π/2} f(t, π − ξ)(p^{(m)}(ξ) − p^{(m+1)}(ξ)) dξ.


Here, in the inequality we use (41), in the first equality we perform the change of variable θ = π − ξ, and in the last equality we use the fact that p^{(m)}(θ) = p^{(m)}(π − θ) for every m and every θ. Thus we continue

  T^{(α,m+1)}(tv′) − T^{(α,m)}(tv) ≥ ∫_0^{π/2} f(t, θ)(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ + ∫_0^{π/2} f(t, π − ξ)(p^{(m)}(ξ) − p^{(m+1)}(ξ)) dξ
    = ∫_0^{π/2} (f(t, θ) + f̄(t, θ))(p^{(m)}(θ) − p^{(m+1)}(θ)) dθ,

where in the equality we use the fact that f̄(t, θ) = f(t, π − θ). This concludes the proof of (42).

To finish the proof it suffices to show that f(t, θ) + f̄(t, θ) is a nonnegative decreasing function for θ ∈ (0, π/2). In fact, using (42) and Observation 15, we can then conclude that T^{(α,m+1)}(tv′) − T^{(α,m)}(tv) ≥ 0.

We differentiate f(t, θ) + f̄(t, θ) with respect to the variable θ and obtain

  ∂(f + f̄)/∂θ (t, θ) = ∂( 2α − √(r² + t² − 2rt cos θ) − √(r² + t² + 2rt cos θ) )/∂θ
    = rt sin θ ( 1/√(r² + t² + 2rt cos θ) − 1/√(r² + t² − 2rt cos θ) ).

Hence the derivative is nonpositive for θ ∈ (0, π/2) and so f(t, θ) + f̄(t, θ) is decreasing for θ ∈ (0, π/2). This implies that, for θ ∈ (0, π/2), we have f(t, θ) + f̄(t, θ) ≥ f(t, π/2) + f̄(t, π/2) = 2α − 2√(r² + t²). The latter quantity is nonnegative. In the case t > α − r, this is because cos θ̄ = (r² + t² − α²)/(2rt) < 0 when θ̄ > π/2. In the case t ≤ α − r, this is because we have α² ≥ (r + t)² ≥ r² + t².

In the next lemma, we use Lemma 16 to bound the function T (α,m)(z).

Lemma 21. Let m ≥ 2, let r ∈ R_+, let ε ∈ (0, 1), and let α = r(1 + ε). Let z ∈ B^m_r(0) with ‖z‖ ≥ εr. Then we have

  T^{(α,m)}(z) ≥ rε²/8 − r√(πm/2) (1 − ε²/16)^{(m−2)/2}.

Proof. Let v be a unit vector in R^m. Since T^{(α,m)}(z) is invariant under rotations centered in the origin, it suffices to consider vectors z ∈ B^m_r(0) of the form z = tv, for t ∈ [εr, r]. We define

  θ_ε := arccos(ε/4) < π/2,   θ̄ := arccos( (r² + t² − α²)/(2rt) ) < π.

We consider two cases. In the first case we assume θ̄ ≤ θ_ε. Then, according to Observation 17, we have

  T^{(α,m)}(tv) = α − r − ∫_0^{θ̄} (α − √(r² + t² − 2rt cos θ)) p^{(m)}(θ) dθ
    ≥ α(1 − P(θ ∈ (0, θ̄))) − r ≥ α(1 − P(θ ∈ (0, θ_ε))) − r
    ≥ r√(1 + ε²/2) (1 − P(θ ∈ (0, θ_ε))) − r ≥ r√(1 + ε²/2) (1 − (√π/2)√(m/2) sin^{m−2} θ_ε) − r
    = r√(1 + ε²/2) (1 − (√π/2)√(m/2) (1 − ε²/16)^{(m−2)/2}) − r.


The third inequality holds because α = r(1 + ε) > r√(1 + ε²/2), and the fourth inequality follows from Lemma 16.

In the second case we assume θ̄ > θ_ε. Then, according to Observation 17, we have

  T^{(α,m)}(tv) = α − r − ∫_0^{θ̄} (α − √(r² + t² − 2rt cos θ)) p^{(m)}(θ) dθ
    = α − r − ∫_0^{θ_ε} (α − √(r² + t² − 2rt cos θ)) p^{(m)}(θ) dθ − ∫_{θ_ε}^{θ̄} (α − √(r² + t² − 2rt cos θ)) p^{(m)}(θ) dθ
    ≥ α − r − α P(θ ∈ (0, θ_ε)) − ∫_{θ_ε}^{θ̄} (α − √(r² + t² − 2rt cos θ)) p^{(m)}(θ) dθ
    ≥ α − r − α P(θ ∈ (0, θ_ε)) − (α − r√(1 + ε²/2)) P(θ ∈ (θ_ε, θ̄))
    ≥ α − r − α P(θ ∈ (0, θ_ε)) − (α − r√(1 + ε²/2)) (1 − P(θ ∈ (0, θ_ε)))
    = r√(1 + ε²/2) (1 − P(θ ∈ (0, θ_ε))) − r ≥ r√(1 + ε²/2) (1 − (√π/2)√(m/2) sin^{m−2} θ_ε) − r
    = r√(1 + ε²/2) (1 − (√π/2)√(m/2) (1 − ε²/16)^{(m−2)/2}) − r.

Here, the second inequality holds because

  √(r² + t² − 2rt cos θ) ≥ √(r² + t² − 2rt cos θ_ε) ≥ √(r² + ε²r² − 2εr² cos θ_ε) = r√(1 + ε²/2)

when θ ≥ θ_ε and t ≥ εr, and the last inequality follows from Lemma 16.

Since 1 + ε²/8 ≤ √(1 + ε²/2) ≤ 2, we obtain

  T^{(α,m)}(z) ≥ r√(1 + ε²/2) (1 − (√π/2)√(m/2) (1 − ε²/16)^{(m−2)/2}) − r ≥ rε²/8 − r√(πm/2) (1 − ε²/16)^{(m−2)/2}.

Analysis of the function H^{(α,μ,m)}. Our next goal is to derive a lower bound on H^{(α,μ,m)} using the lower bound for T^{(α,m)} given in Lemma 21.

Lemma 22. Let (μ, B^m_r(0)) be a probability space with m ≥ 2 that satisfies (a1) and let α > r. Let z ∈ B^m_r(0). Then we have H^{(α,μ,m)}(z) ≥ T^{(α,m)}(z).

Proof. Let x be a random vector drawn according to μ. Since (μ, B^m_r(0)) satisfies (a1), we know that, conditioned on the event ‖x‖ = s, x is drawn according to μ_s.

In the following, we let D := B^m_α(z) ∩ B^m_r(0), we denote by I_D(x) the indicator function of D, and by ν the probability measure of ‖x‖. Then we have

  H^{(α,μ,m)}(z) = ∫_{B^m_r(0)} (α − ‖x‖ − (α − d(z, x)) I_D(x)) dμ(x)
    = ∫_0^r dν(s) ∫_{S^{m−1}_s(0)} (α − ‖x‖ − (α − d(z, x)) I_D(x)) dμ_s(x)
    = ∫_0^r H^{(α,μ_s,m)}(z) dν(s).


To complete the proof of the lemma it suffices to show that the scalar r achieves

  min{ H^{(α,μ_s,m)}(z) | s ∈ [0, r] }.   (43)

In fact, this implies

  H^{(α,μ,m)}(z) = ∫_0^r H^{(α,μ_s,m)}(z) dν(s) ≥ H^{(α,μ_r,m)}(z) = T^{(α,m)}(z).

Let v be a unit vector in R^m. Since H^{(α,μ_s,m)}(z) is invariant under rotations centered in the origin, it suffices to consider vectors z ∈ B^m_r(0) of the form z = tv, for t ∈ [0, r].

If s = 0, then from Observation 17 we have H^{(α,μ_s,m)}(tv) = t. If 0 < s ≤ α − t, Observation 17 implies H^{(α,μ_s,m)}(tv) = E d(tv, x) − s = E d(tv, x) − E ‖x‖ = E(d(tv, x) − ‖x‖) ≤ E t = t, where x is a random vector drawn according to μ_s. So we only need to show (43) for s ∈ (0, r] rather than s ∈ [0, r].

We now consider separately two cases. In the first case we assume s ∈ (0, α − t]. From Observation 17 we can write

  H^{(α,μ_s,m)}(tv) = α − s − ∫_0^π (α − √(s² + t² − 2st cos θ)) dμ(θ).

We differentiate with respect to the variable s and obtain

  ∂H^{(α,μ_s,m)}/∂s (tv) = ∫_0^π (s − t cos θ)/√(s² + t² − 2st cos θ) dμ(θ) − 1 ≤ 0,

because (s − t cos θ)/√(s² + t² − 2st cos θ) ≤ 1. This implies that the function H^{(α,μ_s,m)}(tv) is decreasing in s, when s ∈ (0, α − t].

In the second case we assume s ∈ (α − t, r]. From Observation 17 we can write

  H^{(α,μ_s,m)}(tv) = α − s − ∫_0^{θ̄} (α − √(s² + t² − 2st cos θ)) dμ(θ) = α − s − ∫_0^{θ̄} (α − √(s² + t² − 2st cos θ)) p^{(m)}(θ) dθ,

where

  θ̄ := arccos( (s² + t² − α²)/(2st) ) < π.

We differentiate with respect to the variable s and obtain

  ∂H^{(α,μ_s,m)}/∂s (tv) = −1 + ∫_0^{θ̄} (s − t cos θ)/√(s² + t² − 2st cos θ) dμ(θ) − (α − √(s² + t² − 2st cos θ̄)) p^{(m)}(θ̄) ∂θ̄/∂s
    = −1 + ∫_0^{θ̄} (s − t cos θ)/√(s² + t² − 2st cos θ) dμ(θ) ≤ −1 + P(θ ≤ θ̄) < 0.

Here, the second equality holds because α − √(s² + t² − 2st cos θ̄) = 0, and the first inequality holds because (s − t cos θ)/√(s² + t² − 2st cos θ) ≤ 1 and θ̄ < π. So we conclude that H^{(α,μ_s,m)}(tv) is also decreasing in s, when s ∈ (α − t, r].

The above two cases imply that H^{(α,μ_s,m)}(tv) is decreasing in s, when s ∈ (0, r]. Thus, for every z ∈ B^m_r(0), the scalar r achieves (43).


According to Lemma 22, we know that every lower bound for T^{(α,m)}(z) is also a lower bound for H^{(α,μ,m)}(z).

Lemma 23. Let (μ, B^m_r(0)) be a probability space with m ≥ 2 that satisfies (a1), let ε ∈ (0, 1), and let α = r(1 + ε). Let z ∈ B^m_r(0) with ‖z‖ ≥ εr. Then we have

  H^{(α,μ,m)}(z) ≥ rε²/8 − r√(πm/2) (1 − ε²/16)^{(m−2)/2}.

Proof. Directly from Lemmas 21 and 22.

Analysis of the function R^{(α,μ,m)}. In the next lemma, we provide an upper bound for R^{(α,μ,m)}(z).

Lemma 24. Let (μ, B^m_r(0)) be a probability space with m ≥ 2 that satisfies (a1), (a3) and let α > r. Let z ∈ R^m with ‖z‖ ∈ (α, α + r). Then we have

  R^{(α,μ,m)}(z) ≤ (α + r − ‖z‖) (√π/2) √(m/2) (α/‖z‖)^{m−2}.

Proof. For every x ∈ B^m_r(0), let θ′(x) be the angle between x and z. Let D := B^m_α(z) ∩ B^m_r(0). For every x ∈ D, we denote by Π_x(z) the orthogonal projection of z on the line containing 0 and x. Then we know that for every x ∈ D we have

  sin θ′(x) = d(z, Π_x(z))/‖z‖ ≤ d(z, x)/‖z‖ ≤ α/‖z‖ < 1.

Let θ* := arcsin(α/‖z‖) ∈ (0, π/2). Thus we obtain D ⊆ {x ∈ B^m_r(0) | θ′(x) ≤ θ*}.

According to Observation 12, the random variable θ′ has the same probability measure as the random variable θ studied in Section 5.1.1. So we obtain

  R^{(α,μ,m)}(z) = ∫_{B^m_α(z) ∩ B^m_r(0)} (α − d(z, x)) dμ(x) ≤ (α + r − ‖z‖) P(x ∈ D)
    ≤ (α + r − ‖z‖) P(θ ≤ θ*) ≤ (α + r − ‖z‖) (√π/2) √(m/2) (α/‖z‖)^{m−2},

where the first inequality holds because d(z, x) ≥ ‖z‖ − r and the last inequality holds by Lemma 16.

5.2 Proof of Theorem 5

It suffices to check that all assumptions of Theorem 2 are satisfied. For every i ≠ j, we define Θ_{ij} > 0 so that d(c_i, c_j) = (1 + β)R + max{r_i, r_j} + 2Θ_{ij}. We also define Θ := min_{i≠j} Θ_{ij} and γ := βR + Θ. For every i ∈ [k], denote by E_i := E d(x, c_i), where x is a random vector drawn according to μ_i. We can then bound γ as follows:

  max_{i∈[k]} β_i(r_i − E_i) < βR < γ < (1 + β)R + 2Θ − R ≤ min_{i≠j} d(c_i, c_j) − max{r_i, r_j} − R ≤ min_{i∈[k]} β_i(D_i − E_i).

For every i ∈ [k], we define α_i := E_i + γ/β_i. It remains to show that for every i ∈ [k], c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_{r_i}(c_i)}. Using the fact that r_i ∈ [r, R], β_i ∈ [1, β], and E_i ∈ (0, R] for i ∈ [k], we obtain

  r_i < R + Θ/β = γ/β < α_i ≤ R + γ = (1 + β)R + Θ < min_{j∈[k]\{i}} d(c_i, c_j) − r_i = D_i.   (44)


Lemma 9 implies that B^m_{α_j}(c_i) ∩ B^m_{r_j}(c_j) = ∅ for every j ∈ [k] \ {i}. From Observation 2, we know that for every i ∈ [k],

  G_α(c_i) = β_i ∫_{B^m_{r_i}(c_i)} (α_i − d(c_i, x)) dμ_i(x).

From Observation 2 we obtain that for every z ∈ B^m_{r_i}(c_i),

  G_α(c_i) − G_α(z) = β_i ( ∫_{B^m_{r_i}(c_i)} (α_i − d(c_i, x)) dμ_i(x) − ∫_{B^m_{α_i}(z) ∩ B^m_{r_i}(c_i)} (α_i − d(z, x)) dμ_i(x) )
    − ∑_{j∈[k]\{i}} β_j ∫_{B^m_{α_j}(z) ∩ B^m_{r_j}(c_j)} (α_j − d(z, x)) dμ_j(x).   (45)

It then suffices to show that, when Θ is large, the right hand side of (45) is positive for every z ∈ B^m_{r_i}(c_i) \ {c_i}. So we now fix a vector z in B^m_{r_i}(c_i) \ {c_i}.

From (44) we obtain

  α_i > R + Θ/β = (1 + Θ/(βR)) R ≥ (1 + Θ/(βR)) r_i.   (46)

We now consider separately two cases. In the first case we assume d(c_i, z) ≤ Θr_i/(βR). Notice that under this assumption, for every j ∈ [k] \ {i} and for every x ∈ B^m_{α_j}(z), y ∈ B^m_{r_j}(c_j), we have

  d(x, y) ≥ d(z, c_j) − r_j − α_j ≥ d(c_i, c_j) − d(z, c_i) − r_j − α_j ≥ d(c_i, c_j) − d(z, c_i) − r_j − (1 + β)R − Θ
    ≥ 2Θ_{ij} − Θ − d(z, c_i) ≥ Θ − d(z, c_i) > 0,

where the third inequality follows from (44). So we must have B^m_{α_j}(z) ∩ B^m_{r_j}(c_j) = ∅ for j ∈ [k] \ {i}.

Therefore, from (45) we have

  G_α(c_i) − G_α(z) = β_i ( ∫_{B^m_{r_i}(c_i)} (α_i − d(c_i, x)) dμ_i(x) − ∫_{B^m_{α_i}(z) ∩ B^m_{r_i}(c_i)} (α_i − d(z, x)) dμ_i(x) )
    = β_i H^{(α_i,μ′_i,m)}(z − c_i) ≥ β_i T^{(α_i,m)}(z − c_i) > 0,

where μ′_i is the image of μ_i under the translation x′ = x − c_i. The first inequality above follows from Lemma 22 and the last inequality follows from Lemma 18 because from (46) we have d(c_i, z) ≤ Θr_i/(βR) < α_i − r_i. Thus, in the first case Theorem 2 implies that (LP) achieves exact recovery with high probability.

In the remainder of the proof we only need to consider the second case, where we assume d(c_i, z) > Θr_i/(βR). We notice that in this case d(c_i, z) ≤ r_i implies Θ/(βR) < 1. We first show that for every j ∈ [k] \ {i}, we have

  ∫_{B^m_{α_j}(z) ∩ B^m_{r_j}(c_j)} (α_j − d(z, x)) dμ_j(x) ≤ R (√π/2) √(m/2) (1 − Θ/((1 + β)R + 2Θ))^{m−2}.   (47)

If d(z, c_j) ≥ α_j + r_j, then B^m_{r_j}(c_j) ∩ B^m_{α_j}(z) contains at most one point and (47) clearly holds because (a3) implies

  ∫_{B^m_{α_j}(z) ∩ B^m_{r_j}(c_j)} (α_j − d(z, x)) dμ_j(x) = 0.


If d(z, c_j) < α_j + r_j, we can apply Lemma 24 to z, since we also have

  d(z, c_j) ≥ d(c_i, c_j) − r_i ≥ (1 + β)R + 2Θ ≥ α_j + Θ > α_j,   (48)

where the last inequality follows by (44). If we denote by μ′_j the image of μ_j under the translation x′ = x − c_j, we then obtain

  ∫_{B^m_{α_j}(z) ∩ B^m_{r_j}(c_j)} (α_j − d(z, x)) dμ_j(x) = R^{(α_j,μ′_j,m)}(z − c_j)
    ≤ (α_j + r_j − d(z, c_j)) (√π/2) √(m/2) (α_j/d(z, c_j))^{m−2}
    ≤ (r_j − Θ) (√π/2) √(m/2) (α_j/(α_j + Θ))^{m−2}
    = (r_j − Θ) (√π/2) √(m/2) (1 − Θ/(α_j + Θ))^{m−2}
    ≤ R (√π/2) √(m/2) (1 − Θ/((1 + β)R + 2Θ))^{m−2},

where in the second inequality we use d(z, c_j) ≥ α_j + Θ from (48) and the last inequality follows because α_j ≤ (1 + β)R + Θ from (44). This concludes the proof of (47).

where in the second inequality we use d(z, cj) ≥ αj + Θ from (48) and the last inequality followsbecause αj ≤ (1 + β)R+ Θ from (44). This concludes the proof of (47).

From (47) we obtain

∑j∈[k]\i

βj

∫Bmαj (z)∩Bmrj (cj)

(αj − d(z, x))dµj(x) ≤ kβR√π

2

√m

2

(1− Θ

(1 + β)R+ 2Θ

)m−2

≤ kβR√π

2

√m

2exp

(− (m− 2)Θ

(1 + β)R+ 2Θ

),

(49)

where in the last inequality we use the fact that 1 − x ≤ e^{−x} for every x.

Now let α′_i := r_i(1 + Θ/(βR)). We know from (46) that α_i > α′_i. If we denote by μ′_i the image of μ_i under the translation x′ = x − c_i, we obtain

  ∫_{B^m_{r_i}(c_i)} (α_i − d(c_i, x)) dμ_i(x) − ∫_{B^m_{α_i}(z) ∩ B^m_{r_i}(c_i)} (α_i − d(z, x)) dμ_i(x) = H^{(α_i,μ′_i,m)}(z − c_i)
    ≥ T^{(α_i,m)}(z − c_i) ≥ T^{(α′_i,m)}(z − c_i) ≥ r_i Θ²/(8β²R²) − r_i √(πm/2) (1 − Θ²/(16β²R²))^{(m−2)/2}
    ≥ r_i Θ²/(8β²R²) − r_i √(πm/2) exp( −(m − 2)Θ²/(32β²R²) ).   (50)

(50)

The first inequality holds by Lemma 22, the second inequality holds by Lemma 19, and the thirdinequality holds by Lemma 21 with ε := Θ/(βR) which satisfies ε ∈ (0, 1). In the last inequalitywe use 1− x ≤ e−x for every x.

To show G_α(c_i) − G_α(z) > 0, it is sufficient to show (G_α(c_i) − G_α(z))/(β_i r_i) > 0. From (45), (49), and (50) we then obtain

  (G_α(c_i) − G_α(z))/(β_i r_i)
    ≥ Θ²/(8β²R²) − √(πm/2) exp( −(m − 2)Θ²/(32β²R²) ) − (kβR/(β_i r_i)) (√π/2) √(m/2) exp( −(m − 2)Θ/((1 + β)R + 2Θ) )
    ≥ Θ²/(8β²R²) − √(πm/2) exp( −(m − 2)Θ²/(32β²R²) ) − (kβR/r) (√π/2) √(m/2) exp( −(m − 2)Θ/((1 + β)R + 2Θ) )
    ≥ Θ²/(8β²R²) − k√(πm/2) exp( −(m − 2)Θ²/(32β²R²) ) − (kβR/r) (√π/2) √(m/2) exp( −(m − 2)Θ/((1 + β)R + 2Θ) ).   (51)


Note that the lower bound on (G_α(c_i) − G_α(z))/(β_i r_i) obtained in (51) does not depend on the index i and is an increasing function in Θ. Next we show that (G_α(c_i) − G_α(z))/(β_i r_i) is positive when Θ > C√(k log m / m), where C is a large constant. To do so we use the lower bound in (51) and the fact that β, r, R are fixed constants. We have

  (G_α(c_i) − G_α(z))/(β_i r_i)
    ≥ Θ²/(8β²R²) − k√(πm/2) exp( −(m − 2)Θ²/(32β²R²) ) − (kβR/r)(√π/2)√(m/2) exp( −(m − 2)Θ/((1 + β)R + 2Θ) )
    > k ( C² log m/(8mβ²R²) − √(πm/2) exp( −C²(m − 2)k log m/(32mβ²R²) ) − (βR/r)(√π/2)√(m/2) exp( −C(m − 2)√(k log m/m)/((1 + β)R + 2C√(k log m/m)) ) )
    ≥ k ( C² log m/(8mβ²R²) − √(πm/2) exp( −C²(m − 2) log m/(32mβ²R²) ) − (βR/r)(√π/2)√(m/2) exp( −C(m − 2)√(log m/m)/((1 + β)R + 2C√(log m/m)) ) )
    = (k log m / m) ( C²/(8β²R²) − F(C, m) ),

where, to simplify the notation, we let

  F(C, m) := (m/log m) √(πm/2) exp( −C²(m − 2) log m/(32mβ²R²) )
           + (m/log m) (βR/r)(√π/2)√(m/2) exp( −C(m − 2)√(log m/m)/((1 + β)R + 2C√(log m/m)) ).

It then suffices to show that, for every m ≥ 2, we have C²/(8β²R²) > F(C, m) for some constant C large enough. It can be checked that for every m ≥ 2, F(C, m) is a decreasing function in C. Also it can be checked that there is some threshold C′ > 0 such that if C ≥ C′ then lim_{m→∞} F(C, m) = 0. This implies that sup{F(C, m) | C ≥ C′, m ≥ 2} = sup{F(C′, m) | m ≥ 2} < ∞. Therefore it suffices to choose C > C′ large enough so that C²/(8β²R²) > sup{F(C, m) | C ≥ C′, m ≥ 2}.

5.3 Proof of Theorem 6

It suffices to check that all assumptions of Corollary 1 are satisfied. Let Θ := min_{i≠j} d(c_i, c_j) − 2 > 1.29, α′ := 1.29, and let α_i := α′ for every i ∈ [k]. It remains to show that for every i ∈ [k], c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_1(c_i)}.

Since min_{i≠j} d(c_i, c_j) = 2 + Θ > 3.29, we know that for every i ∈ [k] and for every z ∈ B^m_1(c_i), we have B^m_{1.29}(z) ∩ B^m_1(c_j) = ∅ for every j ∈ [k] with j ≠ i. Thus according to Observation 2, for every i ∈ [k] and for every z ∈ B^m_1(c_i) we have

  G_α(z) = ∫_{B^m_{1.29}(z) ∩ B^m_1(c_i)} (1.29 − d(z, x)) dμ_i(x).

Gα(ci)−Gα(z) = H(1.29,µ′i,m)(z − ci) ≥ T (1.29,m)(z − ci) ≥ T (1.29,2)(z − ci),

where µ′i is the image of µi under the translation x′ = x − ci. The first inequality follows byLemma 22 and the second inequality follows by Lemma 20. Since T (1.29,2)(z − ci) is invariantunder rotations centered in ci, we define a unit vector v ∈ Rm and a scalar t ∈ (0, 1] such thatz − ci = tv. If t ≤ 0.29, then Lemma 18 implies T (1.29,2) ∗ (tv) > 0. Hence, in the remainder of the

39

Page 40: arXiv:2109.02547v1 [math.OC] 6 Sep 2021

proof we assume t > 0.29. According to Observation 13, we know that p(2)(θ) = 1/π. So applyingObservation 17 with r = s = 1 and α = 1.29, we get

T (1.29,2)(tv) = H(1.29,µ1,2)(tv) = 0.29− 1

π

∫ θ

0

(1.29−

√1 + t2 − 2t cos θ

)dθ,

where

θ = arccos1 + t2 − 1.292

2t.

Using the above formula it can be checked that T (1.29,2)(z) > 0 for every t ∈ (0.29, 1]. The graphof the function T (1.29,2)(z) can be seen in Figure 2. Thus, for every i ∈ [k], ci is the unique point

Figure 2: The graph of the function T (1.29,2)(z) in the proof of Theorem 6.

that achieves maxGα(z) | z ∈ Bm1 (ci).
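The positivity of T^{(1.29,2)}(tv) on (0.29, 1], which the proof states can be checked and which Figure 2 illustrates, is a one-dimensional computation; the following sketch evaluates the explicit formula above on a grid of values of t. It is only a numerical illustration of that check, under the grid resolution chosen below.

```python
import math
from scipy.integrate import quad

def T_129_2(t: float) -> float:
    """T^(1.29,2)(t v) = 0.29 - (1/pi) * int_0^{theta_bar} (1.29 - sqrt(1 + t^2 - 2 t cos(theta))) dtheta."""
    theta_bar = math.acos((1 + t * t - 1.29**2) / (2 * t))
    integrand = lambda th: 1.29 - math.sqrt(1 + t * t - 2 * t * math.cos(th))
    return 0.29 - quad(integrand, 0.0, theta_bar)[0] / math.pi

for i in range(30, 101):                 # t = 0.30, 0.31, ..., 1.00
    t = i / 100
    assert T_129_2(t) > 0, t
print("T^(1.29,2)(t v) > 0 for all t on the grid 0.30, 0.31, ..., 1.00")
```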

5.4 Proof of Theorem 7

It suffices to check that all assumptions of Corollary 1 are satisfied. Let Θ := min_{i≠j} d(c_i, c_j) − 2, α′ := 1 + Θ/2 ∈ (1, 1 + Θ), and let α_i := α′ for every i ∈ [k]. It remains to show that for every i ∈ [k], c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_1(c_i)}.

For every i ∈ [k], from Observation 2 and Lemma 9 (with a_i = b_i = α_i) we obtain

  G_α(c_i) = ∑_{j∈[k]} ∫_{B^m_{α_j}(c_i) ∩ B^m_1(c_j)} (α_j − d(c_i, x)) dμ_j(x) = ∫_{B^m_1(c_i)} (α_i − d(c_i, x)) dμ_i(x).

So for every i ∈ [k] and for every z ∈ B^m_1(c_i) we have

  G_α(c_i) − G_α(z) = ( ∫_{B^m_1(c_i)} (α_i − d(c_i, x)) dμ_i(x) − ∫_{B^m_{α_i}(z) ∩ B^m_1(c_i)} (α_i − d(z, x)) dμ_i(x) )
    − ∑_{j∈[k]\{i}} ∫_{B^m_{α_j}(z) ∩ B^m_1(c_j)} (α_j − d(z, x)) dμ_j(x).   (52)


We will show that, under the assumptions of the theorem, the right hand side of (52) is positive for every z ∈ B^m_1(c_i) \ {c_i}.

We now fix i ∈ [k] and z ∈ B^m_1(c_i) \ {c_i}. If d(c_i, z) ≤ Θ/2, then for every j ∈ [k] \ {i}, the set B^m_{α_j}(z) ∩ B^m_1(c_j) contains at most one point. In this case, (a3) implies

  G_α(c_i) − G_α(z) = H^{(α_i,μ′_i,m)}(z − c_i) ≥ T^{(α_i,m)}(z − c_i) > 0,

where μ′_i is the image of μ_i under the translation x′ = x − c_i. The first inequality follows by Lemma 22 and the last inequality follows by Lemma 18. Hence, in the remainder of the proof we assume d(c_i, z) > Θ/2. Then, we must have Θ/2 < 1, since 1 ≥ d(c_i, z) > Θ/2.

Next, we show that for every j ∈ [k] \ {i} we have

  ∫_{B^m_{α_j}(z) ∩ B^m_1(c_j)} (α_j − d(z, x)) dμ_j(x) ≤ (√π/2) √(m/2) (1 − Θ/(2(1 + Θ)))^{m−2}.   (53)

First consider the case d(c_j, z) ≥ α_j + 1. Then B^m_{α_j}(z) ∩ B^m_1(c_j) contains at most one point and (a3) implies

  ∫_{B^m_{α_j}(z) ∩ B^m_1(c_j)} (α_j − d(z, x)) dμ_j(x) = 0 ≤ (√π/2) √(m/2) (1 − Θ/(2(1 + Θ)))^{m−2}.

Now consider the case d(c_j, z) < α_j + 1. We obtain

  ∫_{B^m_{α_j}(z) ∩ B^m_1(c_j)} (α_j − d(z, x)) dμ_j(x) = R^{(α_j,μ′_j,m)}(z − c_j) ≤ (α_j + 1 − d(c_j, z)) (√π/2) √(m/2) (α_j/d(c_j, z))^{m−2}
    ≤ (1 − Θ/2) (√π/2) √(m/2) ((1 + Θ/2)/(1 + Θ))^{m−2} ≤ (√π/2) √(m/2) (1 − Θ/(2(1 + Θ)))^{m−2},

where μ′_j is the image of μ_j under the translation x′ = x − c_j. The first inequality follows from Lemma 24, since d(c_j, z) ≥ 1 + Θ > α_j, and in the second inequality we use d(c_j, z) ≥ 1 + Θ. This concludes the proof of (53).

On the other hand, we know from Lemma 23, with ε := Θ/2 ∈ (0, 1), that

  ∫_{B^m_1(c_i)} (α_i − d(c_i, x)) dμ_i(x) − ∫_{B^m_{α_i}(z) ∩ B^m_1(c_i)} (α_i − d(z, x)) dμ_i(x) = H^{(α_i,μ′_i,m)}(z − c_i)
    ≥ Θ²/32 − √(πm/2) (1 − Θ²/64)^{(m−2)/2}.   (54)

From (52), (53), and (54), we obtain

  G_α(c_i) − G_α(z) ≥ Θ²/32 − √(πm/2) (1 − Θ²/64)^{(m−2)/2} − k (√π/2) √(m/2) (1 − Θ/(2(1 + Θ)))^{m−2}
    ≥ Θ²/32 − √(πm/2) exp( −(m − 2)Θ²/128 ) − k (√π/2) √(m/2) exp( −(m − 2)Θ/(2(1 + Θ)) ).   (55)

Notice that the lower bound on G_α(c_i) − G_α(z) obtained in (55) is an increasing function in Θ. We next show that G_α(c_i) − G_α(z) is positive when Θ > C√(k log m/m), where C is a large constant.


We have

  G_α(c_i) − G_α(z)
    ≥ Θ²/32 − √(πm/2) exp( −(m − 2)Θ²/128 ) − k (√π/2) √(m/2) exp( −(m − 2)Θ/(2(1 + Θ)) )
    > k ( C² log m/(32m) − (1/k)√(πm/2) exp( −C²(m − 2)k log m/(128m) ) − (√π/2)√(m/2) exp( −(m − 2)C√(k log m/m)/(2(1 + C√(k log m/m))) ) )
    ≥ k ( C² log m/(32m) − √(πm/2) exp( −C²(m − 2) log m/(128m) ) − (√π/2)√(m/2) exp( −(m − 2)C√(log m/m)/(2(1 + C√(log m/m))) ) )
    = (k log m / m) ( C²/32 − F(C, m) ),

where, to simplify the notation, we let

  F(C, m) := (m/log m) √(πm/2) exp( −C²(m − 2) log m/(128m) )
           + (m/log m) (√π/2) √(m/2) exp( −(m − 2)C√(log m/m)/(2(1 + C√(log m/m))) ).

It then suffices to show that, for every m ≥ 2, we have C²/32 > F(C, m) for some constant C large enough. It can be checked that for every m ≥ 2, F(C, m) is a decreasing function in C. Also it can be checked that there is some threshold C′ > 0 such that if C ≥ C′ then lim_{m→∞} F(C, m) = 0. This implies that sup{F(C, m) | C ≥ C′, m ≥ 2} = sup{F(C′, m) | m ≥ 2} < ∞. Therefore it suffices to choose C > C′ large enough so that C²/32 > sup{F(C, m) | C ≥ C′, m ≥ 2}.
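The choice of the constant C can be illustrated numerically: for a candidate value of C (an assumption made only for this illustration), the sketch below tabulates F(C, m) over a range of dimensions and checks that C²/32 exceeds the largest value found.

```python
import math

def F(C: float, m: int) -> float:
    """The function F(C, m) from the proof of Theorem 7."""
    s = C * math.sqrt(math.log(m) / m)
    e1 = math.exp(-C**2 * (m - 2) * math.log(m) / (128 * m))
    e2 = math.exp(-(m - 2) * s / (2 * (1 + s)))
    pref = m / math.log(m)
    return (pref * math.sqrt(math.pi * m / 2) * e1
            + pref * (math.sqrt(math.pi) / 2) * math.sqrt(m / 2) * e2)

C = 30.0   # candidate constant (assumption for illustration only)
worst = max(F(C, m) for m in range(2, 5001))
print(f"max_m F({C}, m) over m = 2..5000: {worst:.3f}   C^2/32 = {C * C / 32:.3f}")
assert C * C / 32 > worst
```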

5.5 Proof of Theorem 8

For every i ∈ [k], let μ_i := μ + c_i. We first show that for every i ∈ [k] and for every z ∈ ⋃_{j∈[k]} B^m_1(c_j) \ {c_j}_{j∈[k]} such that B^m_1(z) ∩ B^m_1(c_i) has positive measure, we have

  ∫_{B^m_1(z) ∩ B^m_1(c_i)} (d(z, x) − d(c_i, x)) dμ_i(x) > 0.   (56)

Let H be the unique hyperplane that contains S^{m−1}_1(z) ∩ S^{m−1}_1(c_i) (see Figure 3). We obtain that the balls B^m_1(z) and B^m_1(c_i) are the reflection of each other with respect to H. Let f(x) : B^m_1(z) ∩ B^m_1(c_i) → B^m_1(z) ∩ B^m_1(c_i) be the function that reflects x with respect to H. Let S^+ := {x ∈ B^m_1(z) ∩ B^m_1(c_i) | d(z, x) − d(c_i, x) > 0} and S^− := {x ∈ B^m_1(z) ∩ B^m_1(c_i) | d(z, x) − d(c_i, x) < 0}. Let x_+ ∈ S^+ and let x_− := f(x_+). We then have x_− ∈ S^− since d(z, x_−) − d(c_i, x_−) = d(x_+, c_i) − d(x_+, z) < 0. Let p_i(x) be the density function of μ_i(x). Since d(x_+, c_i) < d(c_i, x_−), the assumption of the theorem on p(x) implies that we have p_i(x_+) > p_i(x_−). We obtain

  ∫_{B^m_1(z) ∩ B^m_1(c_i)} (d(z, x) − d(c_i, x)) dμ_i(x) = ∫_{S^+} (d(z, x) − d(c_i, x)) p_i(x) dx + ∫_{S^−} (d(z, x) − d(c_i, x)) p_i(x) dx
    = ∫_{S^+} (d(z, x) − d(c_i, x)) (p_i(x) − p_i(f(x))) dx > 0.

This concludes the proof of (56).

Next we show that for every i ∈ [k] and for every z ∈ ⋃_{j∈[k]} B^m_1(c_j) \ {c_j}_{j∈[k]} such that B^m_1(z) ∩ B^m_1(c_i) has positive measure, we have

  ∫_{B^m_α(z) ∩ B^m_1(c_i)} (d(z, x) − d(c_i, x)) dμ_i(x) > 0   ∀α > 1.   (57)


Figure 3: The balls B^m_1(z) and B^m_1(c_i) and the hyperplane H in the proof of Theorem 8.

Let S̄^+ := {x ∈ B^m_α(z) ∩ B^m_1(c_i) | d(z, x) − d(c_i, x) > 0} and S̄^− := {x ∈ B^m_α(z) ∩ B^m_1(c_i) | d(z, x) − d(c_i, x) < 0}. Furthermore, let S^+ and S^− be defined as in the proof of (56). Clearly we have S^+ ⊆ S̄^+. We observe that S̄^− = S^−. This is because S^− ⊆ S̄^− and for every x ∈ B^m_1(c_i) with d(z, x) > 1 we must have d(z, x) − d(c_i, x) > 0, which implies x ∉ S̄^−. Then we have

  ∫_{B^m_α(z) ∩ B^m_1(c_i)} (d(z, x) − d(c_i, x)) dμ_i(x) = ∫_{S̄^+} (d(z, x) − d(c_i, x)) p_i(x) dx + ∫_{S̄^−} (d(z, x) − d(c_i, x)) p_i(x) dx
    ≥ ∫_{S^+} (d(z, x) − d(c_i, x)) p_i(x) dx + ∫_{S^−} (d(z, x) − d(c_i, x)) p_i(x) dx
    = ∫_{B^m_1(z) ∩ B^m_1(c_i)} (d(z, x) − d(c_i, x)) dμ_i(x) > 0,

where the last inequality holds by (56). This concludes the proof of (57).

Next, we claim that there exists ε ∈ (0, min_{i≠j} d(c_i, c_j) − 2) such that, for every i ∈ [k] and for every z ∈ B^m_1(c_i), there exists a set D_j, for each j ∈ [k], obtained from B^m_{1+ε}(z) ∩ B^m_1(c_j) via a rotation centered in c_j followed by the translation c_j − c_i, such that the sets D_j, for j ∈ [k], do not intersect. We now prove our claim. Let i ∈ [k] and z ∈ B^m_1(c_i). Note that, since the balls B^m_1(c_j), for j ∈ [k], do not intersect, we have that the sets B^m_1(z) ∩ B^m_1(c_j), for j ∈ [k], do not intersect. Let H be the unique hyperplane that contains S^{m−1}_1(z) ∩ S^{m−1}_1(c_i). It follows that also the reflections with respect to H of the sets B^m_1(z) ∩ B^m_1(c_j), for j ∈ [k], do not intersect. Note that the reflection with respect to H of each set B^m_1(z) ∩ B^m_1(c_j) can be seen as the set obtained from B^m_1(z) ∩ B^m_1(c_j) by first applying a rotation centered in c_j and then the translation c_j − c_i. Hence, we have shown that there exists a set D_j, for each j ∈ [k], obtained from B^m_1(z) ∩ B^m_1(c_j) via a rotation centered in c_j followed by the translation c_j − c_i, such that the sets D_j, for j ∈ [k], do not intersect. By continuity, for every i ∈ [k] and for every z ∈ B^m_1(c_i), there exists ε_{i,z} > 0 small enough such that there exists a set D_j, for each j ∈ [k], obtained from B^m_{1+ε_{i,z}}(z) ∩ B^m_1(c_j) via a rotation centered in c_j followed by the translation c_j − c_i, such that the sets D_j, for j ∈ [k], do not intersect. Since ∪_{i∈[k]} B^m_1(c_i) is a compact set, we can define ε := min{ε_{i,z} | i ∈ [k], z ∈ B^m_1(c_i)} > 0. By possibly decreasing ε, we can also assume ε < min_{i≠j} d(c_i, c_j) − 2, and this concludes the proof of our claim.


Let α′ := 1 + ε < min_{i≠j} d(c_i, c_j) − 1 and define α_i := α′ for every i ∈ [k]. In order to apply Corollary 1, it remains to show that for every i ∈ [k], c_i is the unique point that achieves max{G_α(z) | z ∈ B^m_1(c_i)}. We now fix i ∈ [k] and z ∈ B^m_1(c_i) \ {c_i}. For every j ∈ [k], let D_j be the set obtained from B^m_{α′}(z) ∩ B^m_1(c_j) as stated in the previous claim. Note that D_j ⊆ B^m_1(c_i). Since μ_i is a translation of μ_j, we know that

  ∫_{B^m_{α′}(z) ∩ B^m_1(c_j)} d(c_j, x) dμ_j(x) = ∫_{D_j} d(c_i, x) dμ_i(x).   (58)

We obtain

  G_α(c_i) = ∑_{j∈[k]} ∫_{B^m_{α′}(c_i) ∩ B^m_1(c_j)} (α′ − d(c_i, x)) dμ_j(x) = ∫_{B^m_1(c_i)} (α′ − d(c_i, x)) dμ_i(x)
    > ∑_{j∈[k]} ∫_{D_j} (α′ − d(c_i, x)) dμ_i(x) = ∑_{j∈[k]} ∫_{B^m_{α′}(z) ∩ B^m_1(c_j)} (α′ − d(c_j, x)) dμ_j(x).   (59)

In the first equality we use Observation 2, in the second equality Lemma 9 (with a_i = b_i = α′ and z = c_i), in the inequality we use the fact that the sets D_j, for j ∈ [k], are disjoint subsets of B^m_1(c_i), and in the last equality we use (58). Therefore from (59) and Observation 2 we obtain

  G_α(c_i) − G_α(z) > ∑_{j∈[k]} ∫_{B^m_{α′}(z) ∩ B^m_1(c_j)} (d(z, x) − d(c_j, x)) dμ_j(x) > 0,

where the last inequality follows from (57).

References

[1] E. Abbe, A.S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, 2016.

[2] N. Agarwal, A.S. Bandeira, K. Koiliaris, and A. Kolla. Multisection in the Stochastic Block Model Using Semidefinite Programming, pages 125–162. Springer International Publishing, Cham, 2017.

[3] B.P.W. Ames. Guaranteed clustering and biclustering via semidefinite programming. Mathematical Programming, 147(1):429–465, 2014.

[4] B.P.W. Ames and S.A. Vavasis. Convex optimization for the planted k-disjoint-clique problem. Mathematical Programming, 143(1):299–337, 2014.

[5] A.A. Amini and E. Levina. On semidefinite relaxations for the block model. The Annals of Statistics, 46(1):149–179, 2018.

[6] S. Arora, P. Raghavan, and S. Rao. Polynomial time approximation schemes for euclidean k-medians and related problems. In ACM STOC, volume 98, 1998.

[7] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM Journal on Computing, 33(3):544–562, 2004.


[8] P. Awasthi, A.S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, and R. Ward. Relax, no need to round: integrality of clustering formulations. Preprint, arXiv:1408.4045, 2015.

[9] P. Awasthi, A.S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, and R. Ward. Relax, no need to round: Integrality of clustering formulations. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 191–200, 2015.

[10] Y. Bartal. Probabilistic approximation of metric spaces and its algorithmic applications. In Proceedings of 37th Conference on Foundations of Computer Science, pages 184–193, 1996.

[11] D. Bertsimas and J.N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, Belmont, MA, 1997.

[12] M. Charikar and S. Guha. Improved combinatorial algorithms for the facility location and k-median problems. In 40th Annual Symposium on Foundations of Computer Science (Cat. No. 99CB37039), pages 378–388. IEEE, 1999.

[13] M. Charikar, S. Guha, E. Tardos, and D.B. Shmoys. A constant-factor approximation algorithm for the k-median problem. Journal of Computer and System Sciences, 65(1):129–149, 2002.

[14] Y. Chen, A. Jalali, S. Sanghavi, and H. Xu. Clustering partially observed graphs via convex optimization. Journal of Machine Learning Research, 15(1):2213–2238, 2014.

[15] A. De Rosa and A. Khajavirad. The ratio-cut polytope and k-means clustering. Preprint, arXiv:2006.15225, 2020.

[16] A. Del Pia, A. Khajavirad, and D. Kunisky. Linear programming and community detection. Preprint, arXiv:2006.03213, 2020.

[17] R. Durrett. Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2010.

[18] Y. Fei and Y. Chen. Hidden integrality of SDP relaxations for sub-gaussian mixture models. In Conference On Learning Theory, COLT 2018, volume 75 of Proceedings of Machine Learning Research, pages 1931–1965, 2018.

[19] B. Hajek, Y. Wu, and J. Xu. Achieving exact cluster recovery threshold via semidefinite programming. IEEE Transactions on Information Theory, 62(5):2788–2797, 2016.

[20] T. Iguchi, D.G. Mixon, J. Peterson, and S. Villar. Probably certifiably correct k-means clustering. Mathematical Programming, Series A, 165:605–642, 2017.

[21] O. Kariv and S.L. Hakimi. An algorithmic approach to network location problems, part II: p-medians. SIAM Journal on Applied Mathematics, 37(3):539–560, 1979.

[22] S.G. Kolliopoulos and S. Rao. A nearly linear-time approximation scheme for the euclidean k-median problem. SIAM Journal on Computing, 37(3):757–782, 2007.

[23] X. Li, Y. Li, S. Ling, T. Strohmer, and K. Wei. When do birds of a feather flock together? k-means, proximity, and conic programming. Mathematical Programming, 179(1):295–341, 2020.


[24] J. Lin and J.S. Vitter. Approximation algorithms for geometric median problems. Information Processing Letters, 44(5):245–249, 1992.

[25] S. Ling and T. Strohmer. Certifying global optimality of graph cuts via semidefinite relaxation: A performance guarantee for spectral clustering. Foundations of Computational Mathematics, 20(3):367–421, 2020.

[26] N. Megiddo and K.J. Supowit. On the complexity of some common geometric location problems. SIAM Journal on Computing, 13(1), 1984.

[27] B.S. Mityagin. The zero set of a real analytic function. Mathematical Notes, 107(3):529–530, 2020.

[28] D.G. Mixon, S. Villar, and R. Ward. Clustering subgaussian mixtures by semidefinite programming. Information and Inference: A Journal of the IMA, 6(4):389–415, 2017.

[29] A. Nellore and R. Ward. Recovery guarantees for exemplar-based clustering. Information and Computation, 245:165–180, 2015.

[30] A. Tamir. An O(pn²) algorithm for the p-median and related problems on tree graphs. Operations Research Letters, 19(2):59–64, 1996.

[31] R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018.

[32] W. Hurewicz and H. Wallman. Dimension Theory (PMS-4), Volume 4. Princeton University Press, 2015.


Online Supplemental Material

Appendix A On the assumption n_i = β_i n in the ESBM

In this section, we present an example which justifies the assumption that in the ESBM the number n_i of points drawn from each ball i ∈ [k] satisfies n_i = β_i n. In fact, Example 1 shows that if in the SBM we allow to draw different numbers n_i of data points from different balls, and the n_i are of different orders, then with high probability (LP) does not achieve exact recovery, no matter how distant the balls are. In the following example we denote by e_1, . . . , e_m the vectors of the standard basis of R^m.

Example 1. Consider the SBM with k = 2. Let c_1 := 0 and let c_2 := de_1, where d > 2. Let μ be the uniform probability measure on B^m_1(0). For each i ∈ [2], we draw n_i(n) random vectors instead of n as in the definition of the SBM, and we assume that lim_{n→∞} n_1/n_2 = ∞. Then with high probability every feasible solution to (IP) that assigns each point to the ball from which it is drawn is not optimal to (IP).

Proof. For every $i \in [k]$, we denote by $x_*^{(i)}$ the median of $\{x_\ell^{(i)}\}_{\ell \in [n_i]}$. Among all the feasible solutions $(y, z)$ to (IP) that assign each point to the ball from which it is drawn, the ones with the smallest objective function have the property that, for every $i \in [k]$, the component of the vector $y^*$ corresponding to $x_*^{(i)}$ is equal to one. Let $(y^*, z^*)$ be such a solution. It then suffices to show that with high probability $(y^*, z^*)$ is not optimal to (IP).

We first evaluate the objective value $\mathrm{obj}^*$ of $(y^*, z^*)$. We have

$$\frac{\mathrm{obj}^*}{n_1 + n_2} = \frac{\sum_{\ell\in[n_1]} d(x_\ell^{(1)}, x_*^{(1)})}{n_1} \cdot \frac{n_1}{n_1 + n_2} + \frac{\sum_{\ell\in[n_2]} d(x_\ell^{(2)}, x_*^{(2)})}{n_2} \cdot \frac{n_2}{n_1 + n_2} > \frac{\sum_{\ell\in[n_1]} d(x_\ell^{(1)}, x_*^{(1)})}{n_1} \cdot \frac{n_1}{n_1 + n_2}.$$

Let $x$ be a random vector drawn according to $\mu$. Since $\mu$ is the uniform probability measure, we know that $\mathbb{E}\,\|x\| \in (0, 1)$. Let $\varepsilon \in (0, 1)$ be a small number. From Lemma 8, we know that with high probability we have
$$\left| \frac{\sum_{\ell\in[n_1]} d(x_\ell^{(1)}, x_*^{(1)})}{n_1} - \mathbb{E}\,\|x\| \right| < \varepsilon.$$
Since $\lim_{n\to\infty} n_1/n_2 = \infty$, we know that when $n$ is large enough, with high probability, we have
$$\frac{\mathrm{obj}^*}{n_1 + n_2} > \frac{\sum_{\ell\in[n_1]} d(x_\ell^{(1)}, x_*^{(1)})}{n_1} \cdot \frac{n_1}{n_1 + n_2} > \mathbb{E}\,\|x\| - 2\varepsilon.$$

Consider now the point $s := -e_1/2 \in B_1^m(0)$, and define the sets $S_1 := \{x \in B_1^m(0) \mid x_1 \le -\tfrac{1}{2}\}$ and $S_2 := \{x \in B_1^m(0) \mid x_1 > -\tfrac{1}{2}\}$. Let $x$ be a random vector drawn according to $\mu$. We know that when $x \in S_1$, we have $d(s, x) < \|x\|$. To simplify the notation, let $\xi := \min\{\|x\| - d(s, x) \mid x \in S_1\} > 0$. Then we have
$$\mathbb{E}\min\{d(s, x), \|x\|\} = \int_{B_1^m(0)} \min\{d(s, x), \|x\|\}\, d\mu(x) \le \int_{S_1} d(s, x)\, d\mu(x) + \int_{S_2} \|x\|\, d\mu(x) \le \int_{B_1^m(0)} \|x\|\, d\mu(x) - \xi\, \mathbb{P}(x \in S_1) = \mathbb{E}\,\|x\| - \xi\, \mathbb{P}(x \in S_1). \tag{60}$$

Note that $\min\{d(x_\ell^{(1)}, s), \|x_\ell^{(1)}\|\}$, for $\ell \in [n_1]$, are independent random variables bounded by the interval $[0, 1]$. From Hoeffding's inequality, with high probability we have
$$\left| \frac{\sum_{\ell\in[n_1]} \min\{d(x_\ell^{(1)}, s), \|x_\ell^{(1)}\|\}}{n_1} - \mathbb{E}\min\{d(s, x), \|x\|\} \right| < \varepsilon.$$



So with high probability, we obtain
$$\frac{\sum_{\ell\in[n_1]} \min\{d(x_\ell^{(1)}, s), \|x_\ell^{(1)}\|\}}{n_1} < \mathbb{E}\min\{d(s, x), \|x\|\} + \varepsilon < \mathbb{E}\,\|x\| - \xi\, \mathbb{P}(x \in S_1) + \varepsilon, \tag{61}$$

where the last inequality follows from (60). Since $\mu$ is the uniform probability measure on $B_1^m(0)$, we know that with high probability there is some point $\bar{x} \in B_\varepsilon^m(s) \cap \{x_\ell^{(1)}\}_{\ell\in[n_1]}$ and there is some point $\tilde{x} \in B_\varepsilon^m(0) \cap \{x_\ell^{(1)}\}_{\ell\in[n_1]}$. Now we construct a feasible solution $(y', z')$ to (IP) with objective value $\mathrm{obj}'$ such that $\mathrm{obj}' < \mathrm{obj}^*$. We choose $\bar{x}$ and $\tilde{x}$ as the centers of the two clusters. We then assign each point $x_\ell^{(i)}$, for $i \in [2]$ and $\ell \in [n_i]$, to the cluster with the closest center. Let $(y', z')$ be the feasible solution to (IP) corresponding to this choice. Then we have

$$\begin{aligned} \frac{\mathrm{obj}'}{n_1 + n_2} &= \frac{\sum_{\ell\in[n_1]} \min\{d(x_\ell^{(1)}, \bar{x}), d(x_\ell^{(1)}, \tilde{x})\}}{n_1} \cdot \frac{n_1}{n_1 + n_2} + \frac{\sum_{\ell\in[n_2]} \min\{d(x_\ell^{(2)}, \bar{x}), d(x_\ell^{(2)}, \tilde{x})\}}{n_2} \cdot \frac{n_2}{n_1 + n_2} \\ &\le \frac{\sum_{\ell\in[n_1]} \min\{d(x_\ell^{(1)}, \bar{x}), d(x_\ell^{(1)}, \tilde{x})\}}{n_1} \cdot \frac{n_1}{n_1 + n_2} + (d + 2)\, \frac{n_2}{n_1 + n_2}. \end{aligned}$$

Here, the inequality follows because $\min\{d(x_\ell^{(2)}, \bar{x}), d(x_\ell^{(2)}, \tilde{x})\} \le d + 2$ for every $\ell \in [n_2]$. Next, we use the fact that when $n$ is large enough $(d + 2)\, n_2/(n_1 + n_2)$ can be arbitrarily small and thus can be bounded by $\varepsilon$. So with high probability we obtain

$$\begin{aligned} \frac{\mathrm{obj}'}{n_1 + n_2} &\le \frac{\sum_{\ell\in[n_1]} \min\{d(x_\ell^{(1)}, \bar{x}), d(x_\ell^{(1)}, \tilde{x})\}}{n_1} + \varepsilon \le \frac{\sum_{\ell\in[n_1]} \left( \min\{d(x_\ell^{(1)}, s), \|x_\ell^{(1)}\|\} + \varepsilon \right)}{n_1} + \varepsilon \\ &\le \mathbb{E}\,\|x\| - \xi\, \mathbb{P}(x \in S_1) + 3\varepsilon, \end{aligned}$$

where the second inequality follows by the triangle inequality, and in the last inequality we use (61).

Notice that when $\varepsilon < \xi\, \mathbb{P}(x \in S_1)/5$, we have $\mathrm{obj}' < \mathrm{obj}^*$, which implies that with high probability, $(y^*, z^*)$ is not an optimal solution to (IP).
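The following is a minimal numerical sketch of the phenomenon in Example 1, not part of the formal argument. It assumes $m = 2$, the uniform measure on the unit disk, $d = 3$, and the concrete sample sizes below, which are illustrative choices only; the medoid of each ground-truth cluster plays the role of $x_*^{(i)}$, and the two alternative centers are the data points closest to $s = -e_1/2$ and to the origin.

```python
# Minimal numerical sketch of Example 1 (illustrative assumptions only).
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)

def sample_disk(n, center):
    """Sample n points uniformly from the unit disk centered at `center`."""
    r = np.sqrt(rng.uniform(size=n))
    t = rng.uniform(0, 2 * np.pi, size=n)
    return np.column_stack([r * np.cos(t), r * np.sin(t)]) + center

n1, n2, d = 3000, 30, 3.0          # n1 >> n2, balls at distance d > 2
X1 = sample_disk(n1, np.array([0.0, 0.0]))
X2 = sample_disk(n2, np.array([d, 0.0]))
X = np.vstack([X1, X2])

# Objective of the "ground truth" solution: one medoid per ball.
def medoid_cost(P):
    D = cdist(P, P)
    return D.sum(axis=1).min()

obj_star = medoid_cost(X1) + medoid_cost(X2)

# Competing solution: both centers inside ball 1, one near s = (-1/2, 0)
# and one near the origin; every point is assigned to the closer center.
cand1 = X1[np.argmin(np.linalg.norm(X1 - np.array([-0.5, 0.0]), axis=1))]
cand2 = X1[np.argmin(np.linalg.norm(X1, axis=1))]
obj_prime = cdist(X, np.vstack([cand1, cand2])).min(axis=1).sum()

print(f"ground-truth objective:          {obj_star:.1f}")
print(f"two-medians-in-ball-1 objective: {obj_prime:.1f}")
# With n1 much larger than n2, the second objective is typically smaller,
# matching the conclusion of Example 1.
```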

Appendix B Counterexample to Theorem 7 in [8]

In this section we present an example which shows that Theorem 7 in [8] is false. In Example 2, we construct a probability measure that satisfies all assumptions in the statement of Theorem 7 in [8] and $\min_{i \ne j} d(c_i, c_j) = 2.2$. We then show that with high probability (LP) does not achieve exact recovery. The key problem in the proof of Theorem 7 in [8] is discussed in Appendix C.

Example 2. There is an instance of the SBM with $m = 2$, $k = 7$, $\min_{i \ne j} d(c_i, c_j) = 2 + 0.2$, where $\mu$ has a continuous density function and the probability space $(\mu, B_1^m(0))$ satisfies (a1), (a2), such that with high probability (LP) does not achieve exact recovery.

Proof. Let $\mu$ be a probability measure that has a continuous density function and the probability space $(\mu, B_1^m(0))$ satisfies (a1), (a2). Let $\varepsilon \in (0, 1)$ be a small number. We further assume that $\mu$ and $\varepsilon$ satisfy
$$(0.292 - 8\varepsilon)\, \mathbb{P}(\|x\| \ge 1 - \varepsilon) > 0.279 + 6\varepsilon + (3 + 2\varepsilon)\, \mathbb{P}(\|x\| < 1 - \varepsilon). \tag{62}$$
Note that assumption (62) can be fulfilled as long as $\mathbb{P}(\|x\| < 1 - \varepsilon)$ and $\varepsilon$ are small enough. We define $c_1 := 0$ and, using polar coordinates, $c_i := (2.2, -(i - 2)\pi/3)$ for every $i \in [7] \setminus \{1\}$ (see Figure 4).



Figure 4: Instance of the SBM considered in Example 2.

In particular, $d(c_1, c_i) = 2.2$ for every $i \ne 1$, consecutive centers on the circle are also at distance 2.2, and every other pair is farther apart, so $\min_{i \ne j} d(c_i, c_j) = 2.2$. This concludes the description of the instance of the SBM that we consider. In the remainder of the example we show that with high probability (LP) does not achieve exact recovery.

For every $z \in B_1^m(0)$, let $c(z)$ be a point among $c_2, \dots, c_7$ that is closest to $z$ and we define $f(z) := d(z, c(z))$. For every $z \in B_1^m(0)$, let $\theta(z)$ be the angle between the vectors $z$ and $c(z)$. Clearly, for every $z \in B_1^m(0)$, we have $\theta(z) \in [0, \pi/6]$. Let $z$ be a random vector drawn according to $\mu$. Since $\mu$ satisfies (a1), we know that the random variable $\theta(z)$ is uniform on $[0, \pi/6]$, thus its density function is constant on $[0, \pi/6]$ and equal to $6/\pi$.
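As a quick sanity check of the geometry just described (illustrative only, with $m = 2$ and Monte Carlo sampling standing in for the exact angular argument), the following snippet verifies that the minimum pairwise distance among $c_1, \dots, c_7$ equals 2.2 and that the angle $\theta(z)$ to the nearest of $c_2, \dots, c_7$ stays within $[0, \pi/6]$.

```python
# Geometry check for the instance of Example 2 (illustrative only).
import numpy as np

centers = [np.zeros(2)] + [
    2.2 * np.array([np.cos(-(i - 2) * np.pi / 3), np.sin(-(i - 2) * np.pi / 3)])
    for i in range(2, 8)
]
centers = np.array(centers)

dists = [np.linalg.norm(centers[i] - centers[j])
         for i in range(7) for j in range(i + 1, 7)]
print(f"min pairwise center distance: {min(dists):.3f}")   # 2.200

rng = np.random.default_rng(1)
r = np.sqrt(rng.uniform(size=100000))
t = rng.uniform(0, 2 * np.pi, size=100000)
Z = np.column_stack([r * np.cos(t), r * np.sin(t)])        # points in B^2_1(0)

# angle between z and its closest outer center c(z)
outer = centers[1:]
idx = np.argmin(np.linalg.norm(Z[:, None, :] - outer[None, :, :], axis=2), axis=1)
c = outer[idx]
cosang = np.einsum('ij,ij->i', Z, c) / (np.linalg.norm(Z, axis=1) * np.linalg.norm(c, axis=1))
theta = np.arccos(np.clip(cosang, -1.0, 1.0))
print(f"max theta(z) over samples: {theta.max():.4f} (pi/6 = {np.pi/6:.4f})")
```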

Next, we show the upper bound
$$\int_{B_1^m(0)} (f(x) + 2\varepsilon - \|x\|)\, d\mu(x) < 0.279 + 4\varepsilon + (3 + 2\varepsilon)\, \mathbb{P}(\|x\| < 1 - \varepsilon). \tag{63}$$

Let $z \in S_1^{m-1}(0)$; then $f(z) = \sqrt{(2.2)^2 + 1 - 4.4 \cos\theta(z)}$. Let $L := B_1^m(0) \setminus B_{1-\varepsilon}^m(0)$. The triangle inequality implies that for every $z \in L$ we have
$$f(z) < \sqrt{(2.2)^2 + 1 - 4.4 \cos\theta(z)} + \varepsilon, \qquad d(z, 0) \ge 1 - \varepsilon. \tag{64}$$

So we obtain
$$\begin{aligned} \int_{B_1^m(0)} (f(x) + 2\varepsilon - \|x\|)\, d\mu(x) &= \int_{B_{1-\varepsilon}^m(0)} (f(x) + 2\varepsilon - \|x\|)\, d\mu(x) + \int_L (f(x) + 2\varepsilon - \|x\|)\, d\mu(x) \\ &\le (3 + 2\varepsilon)\, \mathbb{P}(\|x\| < 1 - \varepsilon) + \int_L (f(x) + 2\varepsilon - \|x\|)\, d\mu(x) \\ &\le (3 + 2\varepsilon)\, \mathbb{P}(\|x\| < 1 - \varepsilon) + \int_L \left( \sqrt{(2.2)^2 + 1 - 4.4 \cos\theta(x)} + \varepsilon + 2\varepsilon - (1 - \varepsilon) \right) d\mu(x) \\ &\le (3 + 2\varepsilon)\, \mathbb{P}(\|x\| < 1 - \varepsilon) + \frac{6}{\pi} \int_0^{\pi/6} \left( \sqrt{(2.2)^2 + 1 - 4.4 \cos\theta} + \varepsilon + 2\varepsilon - (1 - \varepsilon) \right) d\theta \\ &= (3 + 2\varepsilon)\, \mathbb{P}(\|x\| < 1 - \varepsilon) + \frac{6}{\pi} \int_0^{\pi/6} \left( \sqrt{(2.2)^2 + 1 - 4.4 \cos\theta} - 1 \right) d\theta + 4\varepsilon \\ &< 0.279 + 4\varepsilon + (3 + 2\varepsilon)\, \mathbb{P}(\|x\| < 1 - \varepsilon). \end{aligned}$$



Here, the first inequality uses the fact that $f(x) \le 3$ for every $x \in B_{1-\varepsilon}^m(0)$, and the second inequality holds because of (64). The third inequality follows by the fact that $\theta(x)$ does not depend on $\|x\|$ and has constant density $6/\pi$ on $[0, \pi/6]$. In the last inequality, we use the fact that

$$\frac{6}{\pi} \int_0^{\pi/6} \left( \sqrt{(2.2)^2 + 1 - 4.4 \cos\theta} - 1 \right) d\theta < 0.279.$$

This concludes the proof of (63).

Let $s := e_1$, where $e_1$ is the first vector of the standard basis of $\mathbb{R}^m$, and let $\mu_i := \mu + c_i$ for every $i \in [k]$. Next, we prove the lower bound
$$\sum_{i=1}^{7} \int_{B_1^m(c_i)} (d(x, c_i) - 2\varepsilon - d(x, s))_+\, d\mu_i(x) > (0.292 - 8\varepsilon)\, \mathbb{P}(\|x\| \ge 1 - \varepsilon). \tag{65}$$

For ease of notation we give the following definitions. For every $x \in B_1^m(0)$, let $\psi(x)$ be the angle between the vectors $x$ and $s$. For every $x \in B_1^m(c_2)$, let $\phi(x)$ be the angle between $x - c_2$ and $s - c_2$. We also define $L_1 := \{x \in B_1^m(0) \mid \psi(x) \le \pi/3, \|x\| \ge 1 - \varepsilon\}$ and $L_2 := \{x \in B_1^m(c_2) \mid \phi(x) \le \theta', d(x, c_2) \ge 1 - \varepsilon\}$, where $\theta' := \arccos 0.6$.

Notice that for every $x \in S_1^{m-1}(0)$, we have $d(x, s) = \sqrt{2 - 2\cos\psi(x)}$ and for every $x \in S_1^{m-1}(c_2)$, we have $d(x, s) = \sqrt{(1.2)^2 + 1 - 2.4 \cos\phi(x)}$. Using the triangle inequality, we obtain
$$d(x, s) \le \sqrt{2 - 2\cos\psi(x)} + \varepsilon \qquad \forall x \in L_1, \tag{66}$$
$$d(x, s) \le \sqrt{(1.2)^2 + 1 - 2.4 \cos\phi(x)} + \varepsilon \qquad \forall x \in L_2. \tag{67}$$

We obtain
$$\begin{aligned} \sum_{i=1}^{7} \int_{B_1^m(c_i)} & (d(x, c_i) - 2\varepsilon - d(x, s))_+\, d\mu_i(x) \\ &\ge \int_{L_1} (\|x\| - 2\varepsilon - d(x, s))_+\, d\mu(x) + \int_{L_2} (d(x, c_2) - 2\varepsilon - d(x, s))_+\, d\mu_2(x) \\ &\ge \int_{L_1} (1 - 3\varepsilon - d(x, s))\, d\mu(x) + \int_{L_2} (1 - 3\varepsilon - d(x, s))\, d\mu_2(x) \\ &\ge \int_{L_1} \left( 1 - 4\varepsilon - \sqrt{2 - 2\cos\psi(x)} \right) d\mu(x) + \int_{L_2} \left( 1 - 4\varepsilon - \sqrt{(1.2)^2 + 1 - 2.4 \cos\phi(x)} \right) d\mu_2(x) \\ &= \mathbb{P}(\|x\| \ge 1 - \varepsilon)\, \frac{1}{\pi} \int_0^{\pi/3} \left( 1 - 4\varepsilon - \sqrt{2 - 2\cos\psi} \right) d\psi + \mathbb{P}(\|x\| \ge 1 - \varepsilon)\, \frac{1}{\pi} \int_0^{\theta'} \left( 1 - 4\varepsilon - \sqrt{(1.2)^2 + 1 - 2.4 \cos\phi} \right) d\phi \\ &> (0.292 - 8\varepsilon)\, \mathbb{P}(\|x\| \ge 1 - \varepsilon). \end{aligned}$$

Here, the second inequality follows from the definition of $L_1$ and $L_2$. The third inequality follows by (66) and (67). The equality holds because $\psi(x)$ does not depend on $\|x\|$ and $\phi(x)$ does not depend on $d(c_2, x)$. The last inequality holds because

$$\frac{1}{\pi} \int_0^{\pi/3} \left( 1 - \sqrt{2 - 2\cos\psi} \right) d\psi + \frac{1}{\pi} \int_0^{\theta'} \left( 1 - \sqrt{(1.2)^2 + 1 - 2.4 \cos\phi} \right) d\phi > 0.292.$$

This completes the proof of (65).
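As a numerical sanity check of the two constants used in (63) and (65) (illustrative only, not part of the proof), the integrals can be evaluated with standard quadrature:

```python
# Numerical check of the constants 0.279 in (63) and 0.292 in (65).
import numpy as np
from scipy.integrate import quad

# (63): (6/pi) * int_0^{pi/6} (sqrt(2.2^2 + 1 - 4.4 cos t) - 1) dt < 0.279
val1, _ = quad(lambda t: np.sqrt(2.2**2 + 1 - 4.4 * np.cos(t)) - 1, 0, np.pi / 6)
print(f"(6/pi) * integral = {6 / np.pi * val1:.4f}")          # about 0.278

# (65): (1/pi) * [ int_0^{pi/3} (1 - sqrt(2 - 2 cos t)) dt
#                + int_0^{arccos 0.6} (1 - sqrt(1.2^2 + 1 - 2.4 cos t)) dt ] > 0.292
val2, _ = quad(lambda t: 1 - np.sqrt(2 - 2 * np.cos(t)), 0, np.pi / 3)
val3, _ = quad(lambda t: 1 - np.sqrt(1.2**2 + 1 - 2.4 * np.cos(t)), 0, np.arccos(0.6))
print(f"(1/pi) * sum of integrals = {(val2 + val3) / np.pi:.4f}")  # about 0.294
```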



Using Hoeffding's inequality, with high probability we have
$$\frac{1}{n} \sum_{\ell\in[n]} \left( f(x_\ell^{(1)}) + 2\varepsilon - \|x_\ell^{(1)}\| \right) - \int_{B_1^m(0)} (f(x) + 2\varepsilon - \|x\|)\, d\mu(x) < \varepsilon,$$

and using (63) with high probability we have
$$\frac{1}{n} \sum_{\ell\in[n]} \left( f(x_\ell^{(1)}) + 2\varepsilon - \|x_\ell^{(1)}\| \right) < \int_{B_1^m(0)} (f(x) + 2\varepsilon - \|x\|)\, d\mu(x) + \varepsilon < 0.279 + 5\varepsilon + (3 + 2\varepsilon)\, \mathbb{P}(\|x\| < 1 - \varepsilon). \tag{68}$$

Using Hoeffding's inequality, with high probability we have
$$\sum_{i=1}^{7} \int_{B_1^m(c_i)} (d(x, c_i) - 2\varepsilon - d(x, s))_+\, d\mu_i(x) - \frac{1}{n} \sum_{i\in[7]} \sum_{\ell\in[n]} \left( d(x_\ell^{(i)}, c_i) - 2\varepsilon - d(x_\ell^{(i)}, s) \right)_+ < \varepsilon,$$

and using (65) with high probability we have
$$\frac{1}{n} \sum_{i\in[7]} \sum_{\ell\in[n]} \left( d(x_\ell^{(i)}, c_i) - 2\varepsilon - d(x_\ell^{(i)}, s) \right)_+ > \sum_{i=1}^{7} \int_{B_1^m(c_i)} (d(x, c_i) - 2\varepsilon - d(x, s))_+\, d\mu_i(x) - \varepsilon > (0.292 - 8\varepsilon)\, \mathbb{P}(\|x\| \ge 1 - \varepsilon) - \varepsilon. \tag{69}$$

For every $i \in [k]$, we denote by $x_*^{(i)}$ the median of $\{x_\ell^{(i)}\}_{\ell\in[n_i]}$. Among all the feasible solutions $(y, z)$ to (IP) that assign each point to the ball from which it is drawn, the ones with the smallest objective function have the property that, for every $i \in [k]$, the component of the vector $y^*$ corresponding to $x_*^{(i)}$ is equal to one. Let $(y^*, z^*)$ be such a solution. It then suffices to show that with high probability $(y^*, z^*)$ is not optimal to (LP).

We know from Lemma 7 that with high probability, for every $i \in [7]$, we have $d(x_*^{(i)}, c_i) < \varepsilon$.

Next, we show that we can use Theorem 1 to prove that $(y^*, z^*)$ is not optimal to (LP) with high probability. To do so, we just need to show that there is no $\alpha$ that satisfies conditions (7)–(10).

For ease of notation we denote by $\alpha_\ell^{(i)}$ the component of $\alpha$ corresponding to the point $x_\ell^{(i)}$. Suppose that $\alpha$ satisfies (9) and (10), thus $d(x_*^{(i)}, x_\ell^{(i)}) \le \alpha_\ell^{(i)} \le d(x_*^{(j)}, x_\ell^{(i)})$ for every $i, j \in [k]$ with $i \ne j$ and for every $\ell \in [n]$. Then we have

$$\frac{1}{n} C_\alpha(x_*^{(1)}) = \frac{1}{n} \sum_{\ell\in[n]} \left( \alpha_\ell^{(1)} - d(x_*^{(1)}, x_\ell^{(1)}) \right) \le \frac{1}{n} \sum_{\ell\in[n]} \left( f(x_\ell^{(1)}) + 2\varepsilon - \|x_\ell^{(1)}\| \right) < 0.279 + 5\varepsilon + (3 + 2\varepsilon)\, \mathbb{P}(\|x\| < 1 - \varepsilon), \tag{70}$$

where in the first inequality we use the fact that $\alpha_\ell^{(1)} \le d(x_*^{(j)}, x_\ell^{(1)}) \le d(c_j, x_\ell^{(1)}) + \varepsilon$ for every $j \in [7] \setminus \{1\}$ and $d(x_*^{(1)}, x_\ell^{(1)}) \ge \|x_\ell^{(1)}\| - \varepsilon$, and the second inequality follows from (68).

Let $N := B_\varepsilon^m(s) \cap L$ and note that assumption (62) on $\mu$ implies that with high probability there exists a point $x' \in N \cap \{x_\ell^{(1)}\}_{\ell\in[n]}$. We have

$$\frac{1}{n} C_\alpha(x') = \frac{1}{n} \sum_{i\in[7]} \sum_{\ell\in[n]} \left( \alpha_\ell^{(i)} - d(x', x_\ell^{(i)}) \right)_+ \ge \frac{1}{n} \sum_{i\in[7]} \sum_{\ell\in[n]} \left( d(c_i, x_\ell^{(i)}) - 2\varepsilon - d(s, x_\ell^{(i)}) \right)_+ > (0.292 - 8\varepsilon)\, \mathbb{P}(\|x\| \ge 1 - \varepsilon) - \varepsilon, \tag{71}$$



where in the first inequality we use that for every $i \in [7]$ we have $\alpha_\ell^{(i)} \ge d(x_*^{(i)}, x_\ell^{(i)}) \ge d(c_i, x_\ell^{(i)}) - \varepsilon$ and $d(x', x_\ell^{(i)}) \le d(s, x_\ell^{(i)}) + \varepsilon$, and the second inequality follows from (69). The inequalities (70) and (71) imply $C_\alpha(x') > C_\alpha(x_*^{(1)})$ due to assumption (62). This implies that conditions (7), (8) cannot hold. Thus, according to Theorem 1, with high probability (LP) does not achieve exact recovery.

Appendix C Problem in the proof of Theorem 7 in [8]

In this section we point out the key problem in the proof of Theorem 7 in [8]. To prove this theorem, the authors introduce two conditions: the separation condition and the central dominance condition. When the two conditions hold together, (LP) achieves exact recovery. We refer the reader to [8] for more details about these two conditions. In the proof of Theorem 7 the authors show that the separation condition holds with high probability according to the law of large numbers, while the central dominance condition holds in expectation and thus holds with high probability. Formally, the authors prove the following lemma about the central dominance condition.

Lemma 25 (Lemma 13 in [8]). Under the hypotheses of Theorem 7, there exists $\alpha > 1$ such that for all $j \in [k]$, $\mathbb{E} P^{(\alpha, \dots, \alpha)}(z)$ restricted to $z \in B_1^m(c_j)$ attains its maximum at $z = c_j$.

In the proof of Lemma 25, the goal of the authors is to obtain some $\alpha > 1$ such that $c_i$ achieves $\max\{\mathbb{E} P^{(\alpha, \dots, \alpha)}(z) \mid z \in B_1^m(c_i)\}$ for every $i \in [k]$, where
$$\mathbb{E} P^{(\alpha, \dots, \alpha)}(z) = \sum_{i\in[k]} \int_{x \in B_1^m(c_i)} (\alpha - d(z, x))_+\, d\mu_i(x).$$

In order to do so, they select some $\alpha > 1$ such that for every $i \in [k]$ and for every $z \in B_1^m(c_i)$, the sets $B_\alpha^m(z) \cap B_1^m(c_j)$, for $j \in [k] \setminus \{i\}$, can be copied isometrically inside $B_1^m(c_i)$ along the boundary without intersecting each other. Their goal is to use the fact that $B_1^m(c_i)$ contains all these copies to show that $\mathbb{E} P^{(\alpha, \dots, \alpha)}(c_i) > \mathbb{E} P^{(\alpha, \dots, \alpha)}(z)$. The problem is that, although the copies have the same area as the original sets, the density function may differ from a point $x \in B_\alpha^m(z) \cap \cup_{j \ne i} B_1^m(c_j)$ to the corresponding point $x' \in B_1^m(c_i)$ with $d(z, x) = d(c_i, x')$. If the probability measure is anti-concentrated, meaning that the area near the boundary of each ball has a very large probability, then the choice of $\alpha$ given by the authors may cause $\mathbb{E} P^{(\alpha, \dots, \alpha)}(c_i) < \mathbb{E} P^{(\alpha, \dots, \alpha)}(z)$ for some $z \in B_1^m(c_i) \setminus \{c_i\}$.

We also remark that a requirement is omitted in the statement of Lemma 25. In fact, in the statement the authors require $\alpha > 1$. However, in order to satisfy the central dominance condition, $\alpha$ cannot be chosen too large. In particular, the requirement $\alpha < 1 + \Theta$, where $\Theta = \min_{j \ne i} d(c_i, c_j) - 2$, should be added to the lemma.
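To make the failure mode described above concrete, the following is a minimal Monte Carlo sketch, under illustrative assumptions only ($m = 2$, $k = 2$, two just-separated balls, a hypothetical anti-concentrated radial measure, and an arbitrary choice $\alpha = 1.05$), of how one can estimate $\mathbb{E} P^{(\alpha, \dots, \alpha)}(z)$ at the center of a ball and at a boundary point facing the other ball.

```python
# Monte Carlo sketch of EP^{(alpha,...,alpha)}(z) under an anti-concentrated
# measure (illustrative assumptions; not a construction from the paper).
import numpy as np

rng = np.random.default_rng(2)

def sample_anticoncentrated_ball(n, center, power=20):
    """Radius distributed as U^(1/power): mass concentrates near radius 1."""
    r = rng.uniform(size=n) ** (1.0 / power)
    t = rng.uniform(0, 2 * np.pi, size=n)
    return np.column_stack([r * np.cos(t), r * np.sin(t)]) + center

c1, c2 = np.array([0.0, 0.0]), np.array([2.1, 0.0])   # just-separated balls
alpha = 1.05                                          # some alpha > 1
X = np.vstack([sample_anticoncentrated_ball(200000, c1),
               sample_anticoncentrated_ball(200000, c2)])

def EP(z):
    """Monte Carlo estimate of sum_i E[(alpha - d(z, x))_+], x ~ mu_i."""
    return np.maximum(alpha - np.linalg.norm(X - z, axis=1), 0.0).mean() * 2

z_center = c1
z_boundary = c1 + np.array([1.0, 0.0])   # boundary point facing the other ball
print(f"EP at center   : {EP(z_center):.4f}")
print(f"EP at boundary : {EP(z_boundary):.4f}")
# Depending on the measure and alpha, the boundary value can exceed the value
# at the center, which is the failure mode described in this appendix.
```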
