Mass Purges: Top-down Accountability in Autocracy · In 1901, when writing his revolutionary agenda...
Transcript of Mass Purges: Top-down Accountability in Autocracy · In 1901, when writing his revolutionary agenda...
Mass Purges: Top-down Accountability in Autocracy∗
B. Pablo Montagnes
Emory University
Stephane Wolton
London School of Economics
This version: July 16, 2016
Most recent version accessible here
Abstract
This paper contends that mass purges are a salient method of top-down accountability
used by totalitarian regimes to increase party performance and shape party membership. In
our theoretical framework, party members work on independent projects. Their fate, however,
is linked through the purge, and a member’s effort depends on the activism of all others via
what we call the pool size effect. In turn, the autocrat’s incentive to purge depends on the
informativeness of different performance indicators, a function of all members’ effort via what
we term the pool makeup effect. These novel pool effects emerge from the many (party
members) to one (autocrat) accountability problem faced by the principal. Our approach
also highlights how violence affects top-down accountability in autocracy. Greater intensity
of violence increases effort, but can impede selection. The autocrat thus cannot escape a
trade-off between love (less unity) and fear (more activism).
∗We thank Scott Ashworth, Dan Bernhardt, Alessandra Casella, Ali Cirone, Torun Dewan, Tiberiu Dragu, Scott
Gehlbach, Thomas Groll, Haifeng Huang, Navin Kartik, Roger Myerson, Salvatore Nunnari, Carlo Prato, Arturas
Rozenas, Milan Svolik, Scott Tyson, conference and seminar participants at the Sixth LSE-NYU Conference, Priorat
Workshop in Theoretical Political Science, the Third Formal Theory & Comparative Politics Conference at Chicago,
3rd PECO Conference in Washington, D.C., Bocconi University, Columbia University for helpful comments and
suggestions. All remaining errors are the authors’ responsibility.
1
1 Introduction
Party struggles lend a party strength and vitality; the greatest proof of a party’s
weakness is its diffuseness and the blurring of clear demarcations; a party becomes
stronger by purging itself
From a letter of Lassalle to Marx, of June 24, 1852
In 1901, when writing his revolutionary agenda What is to be done?, Vladimir Ilyich Ulyanov
(alias Lenin) chose one particular sentence as an epigraph. This sentence reproduced above calls
for the use of purges as a mean of shaping the membership of communist parties. Often repeated
(e.g., Stalin at the 13th Party Congress, the Pravda in 1949, see Brzezinski, 1956, 207, 148), this
sentence has been used to justify the many purges of rank-and-file party members experienced by
the Communist Party of the Soviet Union (CPSU). These mass purges—by the sheer number of
individuals concerned—are not just a Soviet phenomenon. They have been recurrent in Communist
China and occurred in both Fascist Italy (in 1931) and Nazi Germany (in 1938).
In a single party system, the autocrat faces the twinned issues of motivating and selecting her
agents. We model this concern, common to many hierarchical organizations (the military, bureau-
cracy, firm), as a top-down accountability problem. Mass purges, we contend, provide a means
of addressing it.1 Unlike elite purges, which increase an autocrats hold on power (De Mesquita
and Smith, 2011), mass purges both “produce necessary momentum and dynamism for great social
changes” and “cleanse the system in anticipation” (Brzezinski, 1956, 19) from the ideologically
passive that the party attracts due to the economic benefits of membership.
Because they affect all party members, mass purges illustrate that top-down accountability
can take the form of a many-to-one problem where a single principal seeks to hold accountable a
mass of agents. The autocrat evaluates a party member based on his performance relative to his
peers. We show that this particular feature generates novel strategic interdependencies that are not
accounted for in models where one or many principals interact with a single agent (e.g., electoral or
ministerial accountability). Mass purges also highlight a specific feature of autocracies, namely the
ever present possibility of violence. Even though the autocrat faces little formal constraints on her
1The autocrat is thus the principal to whom we reserve the pronoun ‘she’ to follow the tradition in agency models.
We are, however, well aware that, with the possible exception of Indira Gandhi in 1975-77, all autocrats have been
male.
2
actions, the use of violence sometimes generates an inescapable trade-off between fear (motivation)
and love (selection) as her agents strategically respond.
In our theoretical framework, party members can be either ideologues (sharing the autocrat’s
ideological objectives) or opportunists (entering the party for its associated benefits). They exert
effort on individual projects where they can either succeed (e.g., meet the quotas) or fail. The
autocrat only observes the outcomes of members’ project. She then decides the purge breadth,
defined as the proportion of party members being removed from the party. The autocrat also
decides the intensity of violence, defined as the cost of being purged for party members. All
purged members are replaced by new members from an available replacement pool (e.g., the pool
of candidates to the party in USSR and Communist China).
Even though party members work on independent tasks, their fate is linked through the purge.
A member’s probability of being purged depends on the overall performance of the party via what
we term the ‘pool size effect.’ Hence, a member’s incentive to exert effort on his project depends
on all other members’ level of activity. We refer to this strategic interdependency as the ‘pool size
effect.’ This pool size effect is positive (i.e., higher party performance encourages greater effort)
when only members who fail in their project are purged—defined as a ‘discriminate purge’—since
failure becomes riskier. It is negative in a ‘semi-indiscriminate purge’ when some successful party
members are purged because higher effort by other members decreases the probability of surviving
the purge even if successful.
The autocrat carries out a purge to increase the ideological unity of the party, but the screen-
ing is never perfect. In any purge, some ideologues are eliminated, whereas some opportunists
survive. The autocrat’s incentive to purge is a function of the informativeness of different perfor-
mance indicators which is affected by all party members’ activism. We term this second strategic
interdependency the ‘pool makeup effect.’ We show that as party members anticipate higher purge
threat, the pool makeup effect is weakly positive, leading to weakly greater purge breadth.
The autocrat also selects the intensity of violence which affects both members’ effort, what
we term the ‘fear effect,’ and the future composition of the party, what we call the ‘love effect.’
The fear effect is positive: increased intensity of violence always improves present performance by
increasing the cost of being purged. However, greater intensity of violence can impede the screening
of party members and limits gains in ideological unity induced by the purge: the love effect can
be negative. The autocrat may therefore face a trade-off between short-term (higher effort) and
long-term (party unity) benefits of the purge.
3
Finally, we benchmark our theoretical results against historical evidence on mass purges. Our
theory predicts that the nature of the purge depends critically on the intensity of violence with
violent purges more likely to be semi-indiscriminate and (relatively) mild purges discriminate. In
turn, violent purges are not necessarily broader as the purge breadth is non-monotonic in violence.
Our results are in line with the differences and similarities between Maoist purges in the fifties
and Stalinist purges of the thirties. As Teiwes (1993, 25-27) describes, Chinese campaigns were
less violent and more targeted than Stalinist purges when “flouting commands court danger, but
even enthusiastic compliance is no guarantee of safety.” The proportion of party members purged,
however, appear to have been relatively similar in both countries (see Teiwes, 1993; Getty, 1987).
Related Formal Literature
Our paper advances the idea that mass purges are a salient method of accountability in autocracy.
In contrast, the literature has put great emphasis on elite purges: the shaping of an autocrat’s
close circle and contests for power. Bueno de Mesquita and Smith (2015) consider under which
conditions a leader might purge the selectorate to increase his payoff and probability of survival.
Svolik (2009) examines how autocrats can use elite purges to acquire more power. Acemoglu et
al. (2008, 2009) show that there exists limits to elite purges: members of the autocrat’s winning
coalition would oppose too much purging as it increases the risk of their being removed in the
future. Egorov and Sonin (2015) consider how elite purges in the present breeds more elite purges
in the future as winners of a contest for power anticipate the risk of sparing losing contenders.
In contrast, our analysis highlights the importance of motivating and selecting a mass of party
members.
We thus provide a framework for analyzing many-to-one accountability problems. Dozens of
studies focus on one-to-many accountability such as electoral accountability (see Ashworth, 2012, for
a review). More recently, several papers have examined the specificity of one-to-one accountability.
In democratic politics, Dewan and Myatt (2007, 2010) study a prime minister’s choice whether
to retain or fire a minister following a scandal. Their analysis shares some similarities with ours,
including (as we will see) the effect of the quality of the replacement pool on the principal’s
decision.2 Other works on one-to-one accountability focus on the difficulty for an autocrat to hold
her officials accountable. Egorov and Sonin (2011) consider the optimal contract an autocrat can
2Many models of bureaucracy can be interpreted as one-to-one accountability. See Gailmard and Patty (2012)
for a review.
4
offer to his close circle to avoid betrayal. They show that the autocrat can find it optimal to
recruit incompetent viziers as it can be too expensive to induce loyalty from competent ones (see
also Zakharov, 2016). Egorov et al. (2009) and Lorentzen (2014) examine under which conditions
autocrats can allow a (relatively) free press to limit corruption in the bureaucracy. Gehlbach and
Simpser (2015) analyze how an autocrat can use and manipulate elections to increase bureaucrats’
effort and her chance of survival. These works study moral hazard or adverse selection problems,
whereas our theoretical framework includes both. Further, due to the nature of the accountability
problem studied, these papers are unable to identify the pool size or pool makeup effects we uncover
and their impact on agents’ effort and selection.
Our paper also innovates by letting the autocrat chooses the intensity of violence and joins
a small literature on the role of punishment and violence. In the context of industrial organiza-
tion, Bernhardt and Mongrain (2010) shows how the threat of being fired can lead to inefficient
investment in firm-specific human capital. Bernhardt and Mongrain, however, consider a set-up
where the principal is fully informed when she makes her decision whether to retain an employee so
the pool size and pool makeup effects we identify are entirely absent. In a principal-agent set-up,
Acemoglu and Wolitzky (2011) show how coercion (in the form of slavery) can minimize an agent’s
moral hazard rents by reducing his outside option. Because a principal (slave-owner) only deals
with a single agent (slave), an agent’s risk of being punished does not depend on all other agents’
behaviors unlike in our set-up. Bloch and Rao (2002); Dal Bo et al. (2006) highlight how violence
or the threat thereof increases a player’s bargaining position and thus improves his payoff in various
bargaining settings. None of these studies includes adverse selection and so cannot examine the
effect of violence on the screening of agents like in our paper.
2 Evidence on mass purges
Before presenting our theory, we provide some evidence justifying why mass purges are best un-
derstood as a salient method of top-down accountability in autocracy. Mass purges have been
extensively used in communist regimes under Lenin and Stalin (in USSR) and Mao (in China).
The Communist Party of the Soviet Union (CPSU) experienced mass purges (“chitska”—a sweep-
ing, cleansing) in 1919, 1921-23, 1924, 1928, 1929, 1931, 1933-34, 1935, 1936 (Getty, 1987), 1949
(Brzezinski, 1956),1951-53 (Rigby, 1968), and 1971 (Schapiro, 1977). In China, mass purges
(“quingchu”—to weed out) were part of the rectification campaigns (which included a reeduca-
5
tion component absent from Soviet purges) which occurred in 1947-48, 1950,1951-54, 1953, 1957,
1957-58, 1959-60, 1960-61, 1962-63, 1964-65. The list of mass purges, you will notice, does not
include the Great Terror (1936-38) or the Cultural Revolution (1966-76). We exclude these two
important events for two reasons. First, both had some elements of elite purges (Red Army purge in
USSR, “bombard the headquarter” in China), which are absent from our theory. More importantly,
they were characterized by broad terror against the general population which is not explained by
any model of purges.3 Mass purges are not just a communist phenomenon, but are less frequent in
other totalitarian systems. In Fascist Italy, the party experienced a mass purge in 1931 (Morgan,
2012). In Germany, the NSDAP was purged in 1938 (Orlow, 1969, 241-2).
According to (Brzezinski, 1956, 9), purges in totalitarian systems are a “technique of govern-
ment”. Elite purges serve to resolve power struggle, but mass purges have different purposes. Getty
(1987, 38, emphasis in text) asserts that “[i]n the majority of purges, political crimes or deviations
pertained to a minority of those expelled” from the CPSU.4 Similarly, Teiwes (1993, 5) argues that
most rectification movements and associated purges in China were responses to problems arising
outside the top party leadership.5
The goals of mass purges are twofold. First, they provide the necessary momentum to accom-
plish the grand designs of the totalitarian regimes (Brzezinski, 1956, 19), to sustain a high level
of activity (Teiwes, 1993, 37). Second, they are meant to cleanse the party ranks from the op-
portunists who join for the social and economic benefits (e.g., reserved positions, special shops)
associated with membership (Brzezinski, 1956, 21, 148; Getty, 1987, 41, 89, 169; Teiwes, 1993, 6-7).
By removing a proportion of party members, mass purges also provide for the influx of new mem-
3There obviously exist non-formal theories of totalitarian terror, in particular Arendt (1973), to which we cannot
do justice here.4Rigby (1968), on the other hand, argues that the purges of the CPSU in the 1930’s paved the way for the Great
Terror and the show trials of 1936-38. However, he still admits that the 1933-34, 1951-53, and especially 1921-23
purges were not caused by conflict between leaders (see, e.g., 282-83). Far from generating purges, the contest for
power in USSR between Stalin, Trotsky, Zinoviev, and Bukharin lead to an increase in party membership as Stalin
tried to recruit allies (Rigby, 1968, 131).5Chinese rectification movements were also meant to educate ranks-and-files, a concern generally absent from
Soviet purges. Interestingly, the few elite purges of the Chinese leadership did not trigger mass purges even when
the purged leaders, like Gao Gang and Rao Sushi, were accused of establishing independent kingdoms (Teiwes,
1993, chapter 5, especially 142 and 162). The dismissal of Peng Dehuai in 1959, however, was concomitant with the
1959-60 Rectification Campaign. Nonetheless, Teiwes argues that the two events were uncorrelated (ibid., 339 and
341).
6
bers (Brzezinski, 1956, 9, 131, 168, Teiwes, 1993, 42-43) drawn from the pool of candidates to the
party (Rigby, 1968, 52-53). Consequently, while elite purges target specific individuals, the target
of mass purges is more diffuse, it is the mass of party members composed of potentially millions
of individuals (Teiwes, 1993, 5).6 Further, while elite purges depend on circumstances, communist
leaders thought to regulate the periodicity of mass purges (Getty, 1987, 38, 41), with Mao pre-
scribing rectification campaigns twice every five year (Teiwes, 1993, 224). On this dimension, mass
purges thus share more similarities with elections than elite purges.
3 Set-up
We study a two-period (t ∈ {1, 2}) model with an autocrat (A) and a [0, 1] continuum of party
members, indexed by the superscript m. Each party member is characterized by a type τ ∈ {i, o},
where τ = i corresponds to an ideologue and τ = o to an opportunist.7 A party member’s type is
his private information, however it is common knowledge that there is a proportion λ of ideologues
among party members.
Each period, party member m exerts effort em ∈ [0, 1] at cost (em)2/2 on an individual project.
The probability member m’s project is successful is equal to em.8 While a member’s effort is not
observed by the autocrat, the outcome of his project is (e.g., whether he has fulfilled his quota).
This assumption corresponds to historical evidence that officials in charge of the purges had little
information about local circumstances and could only judge according to how successful problem
cases or certain projects were handled (Teiwes, 1993, 28, 42, Rigby, 1968, 96). At the end of period
1, the autocrat decides to purge a proportion κ of party members, which we refer to as the ‘purge
breadth.’
Mass purges entail a loss in term of human capital and organizational knowledge as well as
the cost of potentially deporting party members or delay in finding suitable replacement for the
purged party member. This cost is captured by the cost function c(κ) with c(0) = 0 and (for ease
6As described by Weinberg (1993, 23) in his case study of mass purges in Birobizhan, leaders do not have “detailed
list of individuals to be purged.”7We use the term opportunist for simplicity to encompass members attracted to the party for the benefits attached
to it (Getty, 1987, 32-33), members who lacked “a wholehearted commitment to the Party’s cause” (Teiwes, 1993,
114-115), and members not in line with the current Party’s policy (see the quote from Stalin in Gregory et al., 2011,
36).8Alternatively, we could assume that effort is translated in project success via some concave function.
7
of exposition) marginal cost c′(κ) = c0 + c1κ, c0 ≥ 0, c1 > 0. When a party member is purged, a
new member replaces him (e.g., from the pool of candidates). The proportion of ideologues among
the replacement pool is ri.
Being purged has two distinct consequences for a party member. First, the party member is
expelled from the party and cannot enjoy the benefit associated with party membership in the
second period. Second, he suffers a direct loss L which corresponds to the “intensity of violence”
of the purge. The loss L can be relatively low if a party member is only fined or very large if a
party member is killed, her or his spouse deported, and their children sent to orphanage as it was
commonplace in Stalin’s USSR (Brzezinski, 1956, 110).9 The autocrat determines the intensity of
violence at the beginning of the game (e.g., investment in the security apparatus) at a cost ζ(L)
with ζ(0) = 0 and marginal cost ζ ′(L) = ζ0 + ζ1L, ζ0 ≥ 0 and ζ1 > 0.
In period 1, a party member enjoys a benefit R ≥ 0, which captures all the special privileges
accorded to party members. In addition, If he is not purged from the party, km = 0, an ideologue
obtains b > 0 when his project is successful, whereas an opportunist gets 0 regardless of the outcome
of his project. When purged, km = 1, a party member suffers the loss L > 0.10 Party member m’s
first-period payoff thus assumes the following form:
um1 (e; τ) = R + (1− km)
I{τ=i}b if project succeeds
0 otherwise
+ km(−L)− e2
2(1)
In period 2, if m survives the purge, given that there is no subsequent purge, his payoff can be
expressed as the sum of party membership benefit (R) and the net gain from successful project:
um2 (e; τ) = R +
I{τ=i}b if project succeeds
0 otherwise
− e2
2(2)
To simplify the exposition, we assume throughout that a party member does not discount the
future.
The autocrat gets a positive payoff—normalized to 1—when a party member’s project is suc-
cessful, and 0 otherwise. The autocrat thus wants to maximize the proportion of successful project,
which is equal to party members’ average effort in each period. In the first period, the autocrat
also bears the cost of investing in the intensity of violence and the cost of purging. Denote et the
9For example, the law of June 1934 in USSR established that all family members are legally responsible for the
illegal acts of one of them (Wolton, 2015, 271).10All our results hold if an ideologue enjoys the payoff from successful project even after being purged.
8
average effort in period t ∈ {1, 2}, we can thus express the autocrat’s first-period and second-period
payoffs as, respectively:
uA1 (κ, L) =e1 − c(κ)− ζ(L) (3)
uA2 =e2 (4)
The autocrat has a discount factor of β ∈ (0, 1), which captures, among other things, the risk
(perceived or real) of losing power between the two periods.
To summarize, the timing of the game is:
Period 1:
1. Autocrat chooses the intensity of violence L ≥ 0;
2. Member m chooses effort em1 ;
3. Project outcome (success/failure) is determined and observed by autocrat. Autocrat chooses
the purge breadth κ;
4. Purged members are replaced by new party members, and first-period payoffs are realized;
Period 2:
1. (Surviving and new) member m chooses effort em2 ;
2. Project outcome is determined;
3. Game ends and second-period payoffs are realized.
Note that the assumption that the autocrat commits to an intensity of violence at the beginning of
the game is not innocuous. Without commitment, at the moment of the purge, the autocrat would
either choose no violence (if violence is costly) or the highest feasible intensity (if it is costless). This
is because once efforts choices are made, violence has no effect on selection (and clearly on effort).
This assumption, however, has some historical ground. Funds for the Great Terror were earmarked
before its launch (Wolton, 2015, 317; unfortunately, there is no similar historical evidence for mass
purges). Observe further that the autocrat prefers to commit whenever her preferred intensity of
violence is positive.
The equilibrium concept is Perfect Bayesian Equilibrium (PBE), which implies that each party
member correctly anticipates the autocrat’s purging decision and other members’ effort when choos-
ing his own effort, and, in turn, the autocrat correctly anticipates the level of effort by each type
when determining her investment in violence and purging strategy. For simplicity, we impose that
agents are anonymous, so all agents with a successful (failed) project face the same probability of
being purged. Finally, to deal with measurability issue, we assume that when the autocrat observes
9
an out-of-equilibrium event, she treats the deviation as a mistake and does not distinguish between
the party member who deviated and the rest of the party which followed the prescribed strategy.11
If after these restrictions, multiple PBE arise, we select the one which maximizes the autocrat’s
ex-ante expected welfare (henceforth autocrat welfare). In what follows, “equilibrium” refers to
this class of equilibria.
Throughout, we use the following notation. v(τ) corresponds to a party member’s flow payoff
if his project is successful; i.e., v(i) = b and v(o) = 0. V2(τ) denotes a party member m’s expected
payoff from being in the party in period 2 as a function of his type. Simple algebra yields V2(i) =
R + b2/2 and V2(o) = R. The (ex-ante) average payoffs are denoted by v = λv(i) + (1 − λ)v(c)
and V2 = λV2(i) + (1− λ)V2(o). For the autocrat, denote W2(τ) her second-period expected payoff
induced by a type τ ∈ {i, o}. It can be checked that W2(i) = b and W2(o) = 0. The gain from
replacing an opportunist by an ideologue is Di,o := W2(i)−W2(o).
We also impose the following restrictions on parameter values:
βriDi,o < c0 + c1 ≤ βriDi,o + c11 + v
2− βλ
1−v(i)2− v(i)−v
2− (V (i)− V2)
1−v2
Di,o (5)
The left-hand side states that it is never optimal for the autocrat to purge the whole party even if
there is no ideologue among current party members (e.g., due to the risk of popular rebellion if the
party work is too disrupted). The right-hand side is a technical condition meant to limit the number
of cases to be considered. All results hold substantially when this inequality is relaxed. Finally, we
assume that the highest feasible intensity of violence, denoted L, satisfies L := 1− v(i)− V2(i).12
The analysis proceeds in three steps. First, we consider the optimal purge breadth for exogenous
levels of violence and uncover the ‘pool size’ and ‘pool make-up’ effects. Then, we examine the
conflicting effects of violence on effort and selection, which we term the love-fear trade-off. Finally,
we characterize the optimal solution to the love-fear dilemma and how it is affected by underlying
fundamentals.
11Alternatively, we could define two subsets of party members ξ0 and ξ1 who always exert (respectively) effort 0
and 1, implying that there is no out-of-equilibrium event of measure 0.12This assumption guarantees that v(i) + V2(i) + L ≤ 1 so there is no corner solution in effort (which only
complicates the analysis). A party member cannot work all the time for a long period of time so his effort is
naturally bounded above by the number of hours available in a day.
10
4 Purge Breadth
Due to our equilibrium refinement, there is no equilibrium in which agents exert zero effort.13
When they exert effort, party members endogenously sort into pools of failure and success, which
constitute, with the proportion of ideologues among existing party members (λ) and potential
replacements (ri), the only information available to the autocrat at the time of her purging decision.
However, given that ideologues receive an intrinsic benefit from a successful project, they always
have more incentive to exert effort than opportunists and are more likely to belong to the success
pool (i.e., the single-crossing condition holds). Consequently, the autocrat always first targets
unsuccessful party members.
When choosing their effort, party members take into account both the intrinsic value of success
and the risk of being purged after failure or success. As success on a project provides full or partial
inoculation from a purge, all party members have incentive to exert effort in order to survive
the purge. The benefit from a successful project, however, depends on the relative probability
of being purged when in the success and failure pools, which we refer to as the (success/failure)
pool incidences. These pool incidences can take three qualitatively distinct forms which determine
the nature of the purge. When only a portion of the failure pool is purged, we say that the
purge is “partially discriminate.” When the entire failure pool is purged, we label the purge “fully
discriminate.” Finally, when even some successful members are purged, we use the qualifier “semi-
indiscriminate.” In turn, the purge incidence faced by a member m is a function of two factors:
the breadth of the purge and the efforts of other party members.
Suppose that the purge breadth and all other efforts were exogenous. How would a party
member respond to either quantity? A party member’s effort depends critically on the nature of
the purge. In a partially or fully discriminate purge, the payoff from belonging to the success
pool is large: a successful party member obtains his flow payoff and is inoculated against the
purge. As such, a party member exerts high effort when anticipating a discriminate purge. In a
semi-indiscriminate purge, the benefit from belonging to the success pool is relatively low. Even if
successful, there is a risk a party member’s effort is wasted as he may be purged anyway.
13Absent our restrictions, there would exist an equilibrium in which all party members exert 0 effort and the
autocrat would purge with probability 1 a successful party members. This equilibrium would only be sustained by
the arguably unreasonable out-of-equilibrium belief that a successful member is likely to be an opportunist even
though only ideologues are intrinsically motivated to exert effort.
11
In turn, holding the purge breadth constant, a change in other party members’ level of activity
affects a party member’s incentive to exert effort by altering the relative size of the success and
failure pools. This is the pool size effect, which we say is positive when a member’s effort increases
with other members’ level of activity and negative otherwise. In a discriminate purge, higher
activism by other members reduce the size of the failure pool, which increases the failure pool
incidence and therefore encourages effort. Thus, in a discriminate purge, efforts by party members
are strategic complements and the pool size effect is positive. In contrast, in a semi-indiscriminate
purge, efforts are strategic substitute and the pool size effect is negative. Increased level of activity
by other members depresses a member’s effort because the benefit of being in the success pool is
reduced as the failure pool becomes thinner and the success pool incidence increases. Finally, notice
that the nature of the purge depends on all party members’ effort. As the failure pool becomes
too thin relative to the breadth of the purge (i.e., 1 − e1 < κ), a discriminate purge becomes
semi-discriminate.
Lemma 1 summarizes the reasoning above noting that in a discriminate purge, a party member
considers the flow payoff from a successful project (v(τ)) as well as the expected loss from being
purged ( κ1−e1 (V2(τ) +L)), whereas in a semi-indiscriminate, he takes into account the benefit from
success (v(τ) + V2(τ) + L) weighted by the probability of surviving the purge (1− κ−(1−e1)e1
).
Lemma 1. A type τ ∈ {i, o} party member m chooses effort:
em1 (τ) =
v(τ) + κ1−e1 (V2(τ) + L) if 1− e1 ≥ κ(
1− κ−(1−e1)e1
)(v(τ) + V2(τ) + L) if 1− e1 < κ
, (6)
As explained above, in a discriminate purge, party members’ efforts are strategic complement,
whereas they are strategic substitute in a semi-indiscriminate purge. These effects, however, are
secondary compared to the direct impact of the purge breadth. Consequently, the nature of the
purge (discriminate or semi-indiscriminate) is determined solely by the purge breadth and the
intensity of violence, and is thus fully in the autocrat’s hands.
Lemma 2. A purge is semi-indiscriminate if and only if κ > 1− v − V2 − L := κ(L).
When deciding whom to purge, the autocrat observes only the outcome of a party members’
project, and forms a posterior that a party member is an ideologue based on success (denoted
µS(e1)) or failure (µF (e1)). The autocrat’s posteriors incorporate her conjectures (correct in equi-
librium) of the different levels of effort exerted by ideologues and opportunists. Due to ideo-
logues’ intrinsic motivation, a successful project is a positive signal of ideological alignment, so
12
µF (e1) < λ < µS(e1). However, this signal is never perfect: ideologues sometimes fail and op-
portunist sometimes succeed. Consequently, in any purge, some ideologues are purged and some
opportunists survive.
Consider the autocrat’s incentives to purge a member after observing his project is unsuccessful.
Recall that W2(τ) is the autocrat’s second period expected payoff induced by a type τ ∈ {i, o}
party member. If the autocrat retains the party member after failure, her expected payoff is:
µF (e1)W2(i) + (1− µF (e1))W2(o). Since the proportion of ideologues in the replacement pool is ri,
the autocrat’s payoff from purging an unsuccessful party member is: riW2(i) + (1− ri)W2(o). The
autocrat’s expected benefit from purging a member after failure is thus:
WF = [ri − µF (e1)]Di,o2 , (7)
By a similar reasoning, the expected benefit from purging a successful party member is:
WS = [ri − µS(e1)]Di,o2 (8)
The autocrat’s incentive to purge thus depends critically on the informativeness of her posteriors,
which are a function of party members’ endogenous sorting into the success and failure pools. Thus,
in addition to the pool size effect, resulting from the interdependence between party members’
efforts, there exists a pool makeup effect which captures how changes in party members’ efforts
affect the autocrat’s learning. We say that the pool makeup effect is positive when the target pool
becomes more tainted and the autocrat’s incentive to purge increases. As party members’ effort
depends on the purge breadth, the pool makeup effect is a function of κ and the nature of the purge.
To make sense of it, we first examine how an exogenous increase in the purge breadth affects the
autocrat’s relevant posteriors.
Anticipating a discriminate purge (relatively low κ), both ideologues and opportunists increase
their effort and exit the failure pool in response to higher purge threat. Ideologues, however, have
more to lose from being purged and so increase their effort relatively more. Since ideologues are
also less likely to fail to start with, they exit the failure pool at a higher rate, and the target pool
becomes more tainted. The pool makeup effect is thus positive.
In a semi-indiscriminate purge (relatively high κ), the autocrat’s benefit from purging a party
member is determined at the margin by the informativeness of success (see (8)). The sign of the pool
makeup effect is thus a function of the differential exit rate between ideologues and opportunists
from the success pool (since an increase in κ decreases effort). Ideologues are more responsive to
13
a change in purge breadth, but (relatively) more likely to belong to the success pool before it. In
our set-up, the two effects compensate each other because party members’ cost of effort exhibits
constant elasticity. Both types exit the success pool at the same rate, resulting in a null pool
makeup effect.
In equilibrium, the purge breadth, pool size and pool makeup effects are jointly determined.
However, because all players correctly anticipate each other’s strategy, we can simply compare the
marginal cost of purging an additional number given by c′(κ) = c0 + c1κ with the marginal benefit
(determined by (7) and (8)) which incorporates the pool size and makeup effects. Recall that
κ(L) = 1− v − V2 − L, we obtain the following lemma.
Lemma 3. There exists a unique equilibrium purge breadth κ∗(L). Further, if at κ = κ(L),
(i) c0 + c1κ >WF , the purge is partially discriminate;
(i) WS ≤ c0 + c1κ ≤ WF , the purge is fully discriminate;
(ii) c0 + c1κ <WS, the purge is semi-indiscriminate.
Lemma 3 indicates that, somewhat unsurprisingly, the autocrat chooses a partially discriminate
purge if the cost of carrying a fully discriminate purge is too high (point (i)). In turn, she prefers
a semi-discriminate purge whenever the cost is sufficiently low (point (iii)). In all other cases, the
purge is fully discriminate (point (ii)).14 Notice that since the success pool has a greater proportion
of ideologues (i.e., µS(e1) > µF (e1)), the benefit of purging a successful party member is strictly
lower than the benefit of purging a failed one (WS < WF ) and a fully discriminate purge is not
a knife-edge result. Further, since a party member’s success is always an (imperfect) signal of
ideological congruence with the autocrat, Lemma 3 implies that semi-indiscriminate purges can
occur only if the replacement pool is better on average than the current party members.
Corollary 1. If ri > λ, there exists a non-measure zero set of parameter values such that the
equilibrium purge is semi-indiscriminate.
14Due to the strategic complementary, in a partially discriminate purge, a member’s effort is convex in the
(anticipated) purge breadth. The pool makeup effect is thus increasing and so is the marginal benefit of purging.
The marginal benefit WF may thus intersect the marginal cost more than once in the range [0, κ(L)]. Despite this
complication, points (i) and (ii) hold because the autocrat always prefers the highest feasible purge breadth to take
advantage of the higher marginal benefit from purging.
14
5 The love-fear tradeoff
In the previous analysis, we fixed the intensity of violence. Here, we examine how changes in L
affect the autocrat’s and party members’ strategies and reserve the analysis of the optimal intensity
of violence to the next section.
We first consider the effect of greater intensity of violence on the performance of the party in
the first period, what we term the fear effect which is positive whenever an increase in L increases
average effort. The next proposition establishes that the fear effect is always positive. As the cost
of being purged increases, all party members have greater incentives to exert effort.
Fear, however, motivates differentially ideologues and opportunists since any change in efforts
triggers the pool size and makeup effects identified above and thus affects the purge breadth. While
the cost of being purged is similar for all types, the indirect pool effects on the probability a member
survives are weighted by a member’s second-period payoff, greater for ideologues. Consequently,
the strength of the fear effect depends on the nature of the purge.
In a discriminate purge, the greater level of activity caused by higher L generates a positive
pool size effect. As ideologues are more responsive to the pool size effect, they exit the failure
pool faster than opportunists producing a positive make-up effect. The autocrat then purges more
failures. Since greater purge breadth is also conducive to more effort, all effects go in the same
direction. Equilibrium effort by all party members increases at a (relatively) high rate with the
intensity of violence.
When the purge is semi-discriminate, the pool size effect is negative (i.e., party members’ efforts
are strategic substitutes) so greater activism depresses the incentives to exert effort, especially for
ideologues. Consequently, ideologues increase their effort less than opportunists, and the success
pool becomes more tainted. The pool makeup effect is thus positive, and the purge breadth in-
creases, further reducing the benefit of effort. The equilibrium effort thus increases at a (relatively)
low rate with the intensity of violence.15
The fear effect is summarized in Proposition 1 and illustrated in Figure 2a below.
15Importantly, the indirect pool effects only reduce the positive impact of greater intensity of violence on effort in
a semi-indiscriminate purge. To see that, suppose to the contrary that the fear effect is negative so average effort
decreases with L. Since the pool size effect is negative, members have greater incentive to exert effort, especially
ideologues. This would lead to a negative pool makeup effect (the autocrat has less incentives to purge the success
pool), reducing the proportion of successful members being purged and increasing the value of success. Consequently,
all members would increase their effort, contradicting the assumption of decreased average effort.
15
Proposition 1. The first-period equilibrium average level of effort increases with the intensity of
violence. Average effort increases at a faster rate in a fully discriminate purge than in a semi-
indiscriminate purge.
Our next result establishes that the nature of the purge, but not the breadth is determined by
the intensity of violence. In a partially discriminate purge, as L increases, the failure pool becomes
more tainted and this positive pool make-up effect leads to an increase in purge breadth. But
greater breadth increases effort and reduces the failure pool. As L continues to increase, the failure
pool becomes so thin that all failures are purged: the purge becomes fully discriminate for intensity
of violence above some Lfull. In a fully discriminate purge, although the failure pool becomes more
tainted, the pool makeup effect has no effect on the breadth as the entire pool is already being
purged. The only effect remaining is the fear effect which decreases the size of the failure pool and
thus the purge breadth.
As violence increases further, two effects lead a purge to move from fully discriminate to semi-
indiscriminate. First, as relatively more opportunists join the success pool, the latter becomes more
tainted and purging successful party members is more attractive for the autocrat. Second, as the
failure pool shrinks and the breadth of the purge declines, the (marginal) cost of purging successful
members decreases. Once the purge becomes semi-indiscriminate (above some intensity Lind), the
positive pool make-up effect induced by greater intensity of violence again implies an increase in
purge breadth. Overall, the purge breadth is non-monotonic in violence because of the coarseness
of the information available to the autocrat.16
Proposition 2 summarizes these findings and Figure 1 illustrates them.
Proposition 2. There exist unique Lfull < L and Lind ∈ (Lfull, L] such that:
(i) For L < Lfull, the purge is partially discriminate and breadth strictly increases with violence;
(ii) For L ∈ [Lfull, Lind], the purge is fully discriminate and breadth strictly decreases with violence;
(iii) For L > Lind, the purge is semi-indiscriminate and breadth strictly increases with violence.
Our theory thus predicts that violent purges (L > Lind) are semi-indiscriminate, whereas (rel-
atively) mild purges are discriminate. This prediction seems in line with historical evidence. As
described by Teiwes (1993, 25-27), Chinese rectification campaigns were characterized by low inten-
sity of violence and by a high level of predictability and so resemble discriminate purge. In contrast,
16This non-monotonicity result would hold in any setting in which the autocrat’s information is discrete (but not
necessarily binary).
16
Figure 1: Equilibrium purge breadth and intensity of violence
Parameter values: λ = 1/3, ri = 2/3, R = 0, b = 1/4, β = 0.9, c0 = 0, c1 = 0.17.
during the purges of the thirties in USSR, the intensity of violence was high and the target of the
purges less delimited as “flouting commands court danger, but even enthusiastic compliance is no
guarantee of safety” (ibid., 25). Stalinist purge are thus examples of semi-indiscriminate purges
(we provide some reasons why it may have been so below).
The intensity of violence, however, does not determine the purge breadth due to the non-
monotonous relationship uncovered in Proposition 2. This appears again to correspond to historical
facts. Despite the differences in intensity of violence and nature, Soviet and Chinese purges had
similar breadth. During the 1930s Stalinist purges, the proportion of purged members varied from
5% in 1930, 1931, and 1937 to 22% in 1933-34 (see Table 1 in Appendix A for more details). In
turn, the expulsion rate in rectification campaigns in China fluctuated between 9% in 1957-58 and
23% in 1947-48 (see Table 2).17
Having examined the effect of violence on effort and the nature and breadth of the purge, we now
consider its impact on selection. Observe that since a new party member is always better on average
(from the autocrat’s perspective) than a purged party member, the second-period ideological unity
(defined by the proportion of ideologues) of the party is always greater following the purge. An
increase in the intensity of violence, however, affects the selection benefit for the autocrat. This is
what we refer to as the love effect, which is positive (resp. negative) if greater violence improves
(worsens) selection. The next proposition establishes conditions under which the love effect is
negative for the survivors of the purges and for all party members, Figure 2b illustrates this result.
17Given the differences in size and population, our preferred measure is the proportion of party members purged.
The number of expelled was always much larger in China.
17
Proposition 3.
(i) The proportion of ideologues among surviving members of the purge strictly increases with L if
and only if L < Lfull, and strictly decreases otherwise.
(ii) The proportion of ideologues in the party in the second period weakly increases with L for all
L if and only if λ ≥ ri.
(iii) If ri ∈ (λ, 2λ], the proportion of ideologues in the party in the second period strictly increases
with L for L < Lfull and strictly decreases otherwise.
The first part of the proposition highlights that the autocrat’s ability to screen party members
decreases with L unless the purge is partially discriminate. For L < Lfull, as the intensity of
violence increases, more party members exit the failure pool and more failures are purged. Among
surviving members, a greater proportion of party members thus belongs to the success pool, which
is of higher quality than the failure pool. The love effect is then positive. When the purge is fully
discriminate or semi-discriminate, surviving members all belong to the success pool. Screening
then worsens because the success pool becomes more tainted as more opportunists enter the pool
relative to the stocks of both types. The love effect is then necessarily negative.
Even though purged party members are replaced by (in expectation) more ideological members,
the love effect can still be negative when it comes to (second-period) party membership. Maybe
surprisingly, the love effect is always negative for high enough intensity of violence (L ≥ Lfull) when
the replacement pool is better than the existing pool of party members (Proposition 3(ii)). This
occurs because in a fully discriminate purge, a lower proportion of party members are replaced
by better new party member (Proposition 2(ii)). A decreasing purge breadth, however, is not
necessary to produce a negative love effect for second-period party membership. Indeed, whenever
the difference between the replacement pool and existing party numbers is not too large (ri ≤ 2λ, a
sufficient condition), the deterioration of the pool of survivors discussed above dominates increased
replacement, and the love effect is again negative.
This section therefore establishes that under certain circumstances (such as a better replace-
ment pool), the autocrat faces a trade-off between fear (better first-period performance) and love
(lower second-period ideological unity), not too dissimilar from the dilemma identified long ago
by Machiavelli (2005, Chapter 17). While there is little historical evidence on the effect of purges
on selection, the trade-off we identify might explain why Stalin, who understood that “saboteurs
18
(a) Fear effect (b) Love effect
In Figure 2b, the plain line corresponds to proportion of ideologues among all party members in period 2, the dashed
line to the proportion of ideologues among survivors of the purge. Parameter values: λ = 1/3, ri = 2/3, R = 0,
b = 1/4, β = 0.9, c0 = 0, c1 = 0.17.
disguise themselves by over-fulfilling the plan” (cited in Dallin and Breslauer, 1970, 57), allegedly
asserted that it is better to induce loyalty by fear than by conviction.18
6 Intensity of violence
In this section, we consider how the autocrat’s choice of violence balances the (positive) fear and
(potentially negative) love effects. When L is low (L < Lfull), the love effect is positive. Hence,
the autocrat has strong incentive to increase the intensity of violence. A purge, therefore, will be
partially discriminate only if the cost of investing in the security apparatus is very large. When L is
large (L > Lind), the love effect is negative. Further, the fear effect is relatively small (Proposition
1). A purge is thus semi-indiscriminate only if the cost of investing in the security apparatus is
low.19
18Dallin and Breslauer (1970, 42 footnote 37) reproduces an anecdote circulating among Moscow party members
in 1931. “Yagoda was alleged to have asked Stalin: ‘Which would you prefer Comrade Stalin: that party members
should be loyal to you from conviction or from fear?’ Stalin is alleged to have replied: ‘From fear.’ Whereupon
Yagoda asked, ‘Why?’ To which Stalin replied: ‘Because convictions can change: fear remains.”’19At L = Lfull, by definition κ(L) = 1− e1. Effort thus satisfies em1 (τ) = v(τ) + V2(τ) + L (see Lemma 1). The
posterior µF = λ1−em1 (i)1−e1 thus depends only on model parameters and L. For L > Lfull, the posterior µS does not
depend on κ (see the discussion prior to Lenna 3). Hence, the quantities used in the text of Proposition 4 only
depend on the intensity of violence and model parameters.
19
Proposition 4. There exists a (almost always) unique equilibrium intensity of violence L∗. Further,
(i) If at L = Lfull, ζ0 + ζ1L ≥ 1 + βDi,o(λ− µF (e1)), then L∗ ≤ Lfull;
(ii) If at L = Lind, ζ0 + ζ1L <12
(1 + βDi,o (λ−µS(e1))
c1(v+V2+L)
)+ βDi,o(λ− µS(e1)), then L∗ > Lind.
For low L, point (i) establishes formally that the fear effect (equals to 1) and the love effect
(equals to βDi,o(λ− µF )) are both positive. In contrast, for high intensity of violence (point (ii)),
the fear effect, while still positive, is relatively low (12(1 + βDi,o (λ−µS)
c1(v+V2+L))) and the love effect is
negative (βDi,o(λ− µS) < 0).
Given the tradeoff between love and fear for high intensity of violence, it remains to determine
under which conditions a semi-discriminate purge occurs. The next corollary lists three necessary
and sufficient conditions. As explained above, the replacement pool must be sufficiently good
quality, and in particular better on average than existing party members (condition 1.). Further,
the cost of purging (c0, c1) must be sufficiently low to compensate for the relatively low marginal
benefit of purging a successful party member (condition 2.). Finally, in line with Proposition 4,
the cost of investing in the security apparatus must also be relatively small.20
Corollary 2. A purge is semi-indiscriminate if and only if:
1. The proportion of ideologues in the replacement pool ri is strictly higher than some ri ≥ λ;
2. The cost parameters c0 and c1 are respectively strictly below some c0(ri) and c1(ri, c0);
3. The cost parameters ζ0 and ζ1 are respectively strictly below some ζ0(ri) and ζ1(ri, ζ0).
As noted above, the Stalinist purges of the thirties resembled semi-indiscriminate purges. His-
torical evidence do not permit to evaluate whether the conditions of the corollary were satisfied.
Interestingly, however, they strongly suggest that the pool of candidates to the CPSU was markedly
different than existing party members. While the 1920s were marked by a divide between ideo-
logically committed and technically proficient officials, starting in 1928, Stalin took great interest
in “training a new generation of cadres that would be both Red and experts” (Fitzpatrick, 1979,
382). Around 170,000 of these new cadres graduated between 1928-32 and 370,000 between 1933
and 1938 (ibid, 398).21 These cadres were among the main beneficiaries of the purges of the 1930s.
While their relative productivity compared to old cadres is still debated (e.g., Dallin and Breslauer,
20The thresholds described in Corollary 2 depend on all (other) parameter values. We have reduced notation to
facilitate the exposition.21Brzezinski (1956, 90-91) puts the total number of engineers emerging from the Stalinist state schools between
1933 and 1938 at 1 million.
20
1970, 37), there are clear evidence that the new cadres were more loyal to Stalin (Wolton, 2015,
267-68).22
Observe that Proposition 4 does not characterize the intensity of violence when conditions (i)
and (ii) do not hold. Due to the positive fear and love effects as well as the complementarity in
party members’ effort in a partially discriminate purge, the marginal benefit of violence is strictly
convex for L ≤ Lfull. This means that the marginal benefit may intersect the marginal cost more
than once, and the autocrat must choose between the lowest intersection and L = Lfull.23 This
implies that predicting the optimal intensity of violence is difficult even though it is generically
unique for the autocrat. Further, small changes in the underlying fundamentals can be associated
with a large increase in the intensity of violence.
Proposition 5. There exists a non-measure zero set of parameter values Pd such that if (λ, ri, b, c0, ζ0) ∈
Pd, there exists cd1 and ζd1 satisfying limc1↑cd1
L∗ < limc1↓cd1
L∗ and limζ1↑ζd1
L∗ < limζ1↓ζd1
L∗.
The unpredictability in violence outlined in Proposition 5 also implies that the purge breadth
may also seem random (by Proposition 2). While there is little ways to test this prediction, it
should be noted that there was great variation in the number of party members affected by mass
purges both in USSR and Communist China (see Tables 1 and 2 in Appendix A).
7 Mass Purges and Top-Down Accountability in Autocra-
cies
In our theoretical framework, mass purges are a salient method of top-down accountability in
autocracy. In totalitarian regimes, the political and professional responsibilites of party members
are fused and “[f]ailure in the latter thus automatically becomes a case of political accountability”
(Brzezinski, 1956, 86). The autocrat’s top-down accountability problem thus bears similarities
with bottom-up accountability where voters’ evaluation of their representative based on economic
indicators (for example, Besley, 2007, chapter 4).
There exists, however, a fundamental difference between the autocrat and voters’ accountabil-
ity problems. While many voters hold a single representative accountable so one is accountable to
22Notice that all our results hold as long as the replacement pool is expected to be more active (on average) than
existing party members. It is of little consequence whether the difference is due to greater loyalty to the leader,
greater productivity, or both.23The second intersection being a local minimum.
21
many, the autocrat faces a mass of party members so many are accountable to one. Our analysis
highlights forces specific to many-to-one accountability. In our setting, there is no team production
problem and no yardstick competition as in a tournament, party members work on independent
projects. Nonetheless, their fate is linked through the purge. This generates strategic interdepen-
dence in party members’ activism—the pool size effect. In turn, the pool size effect changes the
autocrat’s inference problem as her ability to identify an agent’s type depends on the behavior
of all party members—the pool makeup effect. Consequently, top-down accountability cannot be
apprehended by studying a single or few agents in isolation; researchers must take a comprehensive
perspective. Top-down accountability is, by definition almost, a large N problem.
While the many-to-one feature we identify is not unique to top-down accountability in autoc-
racy, the critical role of violence due to the autocrat’s monopoly in the political and judicial areas
is. We show that the intensity of violence determines the nature of the purge, but not necessarily
its breadth (Proposition 2). Further, even though the autocrat faces little constraint on her ac-
tions, due to party members’ strategic response, she cannot escape a trade-off between fear (higher
performance) and love (less unity). This trade-off implies that a rational autocrat does not use
violence bluntly. We should observe greater violence when the need to mobilize the autocrat’s
agents is high,24 and more restraint when the autocrat is interested in selection.
This result may also explain why the Fascist and Nazi regimes used purges parsimoniously
(as documented above) despite opportunists entering the parties (Orlow, 1969, 45-50, Morgan,
2012, 312). In both countries, the party only played a secondary role. “[T]he Nazi party at no
point attained the effective supremacy over the administration of the State which the Soviet party
established in the early year of the Bolshevik revolution” (Unger, 1965, 459). As Orlow (1969,
176) describes, “party life [in the NSDAP] in early 1936 was, in a word, boring.” In Italy, the
Fascist Party legally became a subordinate to the State after the Gran Consiglio of 1928 (Gentile,
1984). Relatedly, right-wing totalitarian regimes resorted to top-down accountability in the form
of purges only when party members were given a more active role in politics. In Italy in 1931, the
purge was concomitant with a conflict over the rights of the Catholic Church (Morgan, 2012). In
Germany in 1938, the purge occurred when the NSDAP acquired a greater role in judicial, church,
and economic affairs after the invasion of Austria (Orlow, 1969, 241-2).
24This particular comparative statics is in line with historical evidence on Chinese rectification campaigns (Teiwes,
1993).
22
8 Conclusion
This paper provides a theoretical framework to understand mass purges as a method of top-down
accountability in autocracy. Our theory highlights that the nature of top-down accountability is
often many-to-one with a mass of agents accountable to a single principal. The fate of all party
members is linked through the purge generating new strategic interdependency which affects both
agents’ efforts (pool size effect) and principal’s learning (pool makeup effect).
The many-to-one nature of top-down accountability is not limited to autocracy. Many hierar-
chical institutions share this feature. In large firms, in the military, or the bureaucracy, a principal
is faced with a mass of agents whose individual efforts are difficult to ascertain. The principal’s
methods of accountability there takes the form of mass layoffs or performance based up or out
promotion standards, which are not too dissimilar from purges. The approach developed in this
paper has thus a large range of applications for the study of public and private institutions.
Autocracies, however, are specific in their use of violence as the autocrat faces little restraints
on her actions due to her monopoly in the political and judicial areas. Even though the autocrat
faces little constraint on her actions, due to party members’ strategic response, she cannot escape
a trade-off between fear (more effort) and love (less unity).
We, however, focus exclusively on one particular form of violence, mass purges. Researchers
would do well to consider other means available to the autocrat and explore why mass purges have
been used by only a limited number of autocrats (Lenin, Stalin, Mao, Mussolini, and Hitler). Other
important questions include: the effectiveness of mass purges at achieving other objectives of the
autocrat such as rooting out corruption; their dynamic consequences; and their use in conjunction
with competitions for power and resources.
23
References
Acemoglu, Daron and Alexander Wolitzky, “The economics of labor coercion,” Econometrica,
2011, 79 (2), 555–600.
, Georgy Egorov, and Konstantin Sonin, “Coalition formation in non-democracies,” The
Review of Economic Studies, 2008, 75 (4), 987–1009.
, , and , “Do Juntas Lead to Personal Rule?,” The American Economic Review: Papers &
Proceedings, 2009, 99, 298–303.
Arendt, Hannah, The origins of totalitarianism, Vol. 244, Houghton Mifflin Harcourt, 1973.
Ashworth, Scott, “Electoral accountability: recent theoretical and empirical work,” Annual Re-
view of Political Science, 2012, 15, 183–201.
Bernhardt, Dan and Steeve Mongrain, “The Layoff Rat Race,” The Scandinavian Journal of
Economics, 2010, 112 (1), 185–210.
Besley, Timothy, “Principled agents?: The political economy of good government,” OUP Cata-
logue, 2007.
Bloch, Francis and Vijayendra Rao, “Terror as a bargaining instrument: A case study of
dowry violence in rural India,” The American Economic Review, 2002, 92 (4), 1029–1043.
Bo, Ernesto Dal, Pedro Dal Bo, and Rafael Di Tella, “Plata o Plomo?: Bribe and Pun-
ishment in a Theory of Political Influence,” American Political Science Review, 2006, 100 (01),
41–53.
Brzezinski, Zbigniew K, The permanent purge, Harvard University Press Cambridge, MA, 1956.
Dallin, Alexander and George W Breslauer, Political terror in communist systems, Stanford
University Press, 1970.
de Mesquita, Bruce Bueno and Alastair Smith, “Political Succession: A Model of Coups,
Revolution,Purges and Everyday Politics,” 2015.
Dewan, Torun and David P Myatt, “Scandal, protection, and recovery in the cabinet,” Amer-
ican Political Science Review, 2007, 101 (01), 63–77.
24
and , “The declining talent pool of government,” American Journal of Political Science, 2010,
54 (2), 267–286.
Economist, The, “No ordinary Zhou,” 2014.
Egorov, Georgy and Konstantin Sonin, “Dictators And Their Viziers: Endogenizing The
Loyalty–Competence Trade-Off,” Journal of the European Economic Association, 2011, 9 (5),
903–930.
and , “The Killing Game: Reputation and knowledge in politics of succession,” Research in
Economics, 2015, Forthcoming.
, Sergei Guriev, and Konstantin Sonin, “Why resource-poor dictators allow freer media:
A theory and evidence from panel data,” American Political Science Review, 2009, 103 (04),
645–668.
Fitzpatrick, Sheila, “Stalin and the Making of a New Elite, 1928-1939,” Slavic Review, 1979, 38
(3), 377–402.
Gailmard, Sean and John W Patty, “Formal models of bureaucracy,” Annual Review of
Political Science, 2012, 15, 353–377.
Gehlbach, Scott and Alberto Simpser, “Electoral manipulation as bureaucratic control,”
American Journal of Political Science, 2015, 59 (1), 212–224.
Gentile, Emilio, “The problem of the party in Italian Fascism,” Journal of Contemporary History,
1984, 19 (2), 251–274.
Getty, John Arch, Origins of the great purges: the Soviet Communist Party reconsidered, 1933-
1938, Vol. 43, Cambridge University Press, 1987.
Gregory, Paul R, Philipp JH Schroder, and Konstantin Sonin, “Rational dictators and
the killing of innocents: Data from Stalins archives,” Journal of Comparative Economics, 2011,
39 (1), 34–42.
Lorentzen, Peter, “China’s Strategic Censorship,” American Journal of Political Science, 2014,
58 (2), 402–414.
25
and Xi Lu, “Factional Struggle or Efforts in Saving the Party: Evidence from AntiCorruption
Campaign in China,” MPSA Conference 2016, 2016.
Machiavelli, Niccolo, The prince, OUP Oxford, 2005.
Mesquita, Bruce Bueno De and Alastair Smith, The dictator’s handbook: why bad behavior
is almost always good politics, PublicAffairs, 2011.
Morgan, Philip, “The Trash Who are Obstacles in Our Way: the Italian Fascist Party at the
Point of Totalitarian Lift Off, 1930–31,” The English Historical Review, 2012, 127 (525), 303–344.
Orlow, Dietrich, The history of the Nazi party: 1933-1945, Vol. 2, [Pittsburgh]: University of
Pittsburgh Press, 1969.
Rigby, Thomas Henry, “Communist Party membership in the USSR,” Princeton: Princeton
University Press, 1968, 178, 369–491.
Schapiro, Leonard Bertram, The government and politics of the Soviet Union, Taylor & Francis,
1977.
Svolik, Milan W, “Power sharing and leadership dynamics in authoritarian regimes,” American
Journal of Political Science, 2009, 53 (2), 477–494.
Teiwes, Frederick C, Politics and Purges in China: Rectification and the Decline of Party Norms,
1950-1965, ME Sharpe, 1993.
Unger, Aryeh L, “Party and state in soviet Russia and Nazi Germany,” The Political Quarterly,
1965, 36 (4), 441–459.
Weinberg, Robert, “Purge and Politics in the Periphery: Birobidzhan in 1937,” Slavic review,
1993, 52 (1), 13–27.
Wolton, Thierry, Histoire mondiale du communisme, tome 1: Les bourreaux, Grasset, 2015.
Zakharov, Alexei V, “The Loyalty-Competence Trade-Off in Dictatorships and Outside Options
for Subordinates,” The Journal of Politics, 2016, 78 (2), 457–466.
26
A Historical data
Table 1: Proportion of party members purged in USSR
Year Proportion Source
1919 10-15 Getty (1987, Table 2.1)-Rigby (1968, 76)
1921-23 25 Getty (1987, Table 2.1)-Rigby (1968, 97)
1924 3 Getty (1987, Table 2.1)
1925 4 Getty (1987, Table 2.1)
1926 3 Getty (1987, Table 2.1)
1927 6 Rigby (1968, 127)
1928 13 Getty (1987, Table 2.1)
1929 11 Getty (1987, Table 2.1)
1930 5 Getty (1987, Table 2.1)
1931 5 Getty (1987, Table 2.1)
1933-34 17-22 Getty (1987, 55)-Rigby (1968, 204)
1935 9-13 Getty (1987, Table 7.1)-Rigby (1968, 209)+
1936 10 Rigby (1968, 209)+
1937 5 Getty (1987, Table 7.1)
1951-53 5 Rigby (1968, 281)+
All proportions are approximation. + denotes authors’ calculation using Rigby (1968, 52)
.
27
Table 2: Proportion of party members purged in China
Year Proportion Source
1947-48 23 Teiwes (1993, 75)+
1951-54 10 Teiwes (1993, 110)
1957-58 9 Teiwes (1993, 268)
1959-60 20 Teiwes (1993, 339)
1964-65 10 Teiwes (1993, 425)
All proportions are approximation. + denotes authors’ calculation
B Proofs
As second-period actions are subsumed in V2(τ) (for a party member) and W2(τ) for the autocrat,
we ignore the first-period time subscript in all the proofs.
Lemma 4. There is no equilibrium in which the autocrat purges a strictly positive proportion of
successful party members and no failed party members.
Proof. The proof proceeds by contradiction. Suppose there is such equilibrium. It can easily be
checked that in such equilibrium, opportunists exert no effort (they have no gain from effort and
success only increases their probability of being purged). Consider now ideologues. If they exert
strictly positive effort, then the success pool is composed only of ideologues, whereas the failure
pool has all opportunists. The autocrat would have a profitable deviation to purge first from the
failure pool and not purge the success pool, a contradiction. If ideologues exert no effort, then
all party members exert no effort. But due to our equilibrium restriction, there does not exist an
equilibrium in which no party member exerts effort (see the reasoning in the text). We have thus
reached a contradiction.
Corollary 3. There is no equilibrium in which a party member faces a higher probability of being
purged after success than after failure.
Proof. The proof proceeds by contradiction. Suppose there is such equilibrium. By Lemma 4, it
must be that the autocrat is indifferent between purging from the success and failure pools (if she
strictly prefers to purge from the success pool, the reasoning in Lemma 4 applies). But by a similar
reasoning as in the proof of Lemma 4, it can easily be checked that we reach a contradiction.
28
Proof of Lemma 1
Denote γ and ρ the probability that a party member is purged when his project is a failure and
a success, respectively. Since we have a continuum of party members, party member m takes γ
and ρ as given. When he anticipates the breads of the purge to be smaller than the number of
unsuccessful members (1− e ≤ κ), γ ≥ 0 and ρ = 0 so his maximization problem is:
maxe∈[0,1]
R + e(v(τ) + V2(τ)) + (1− e)[γ(−L) + (1− γ)(V2(τ))]− e2
2(9)
If his project is successful (probability e), he receives a flow payoff v(τ) and is not purged so he
also gets his second period expected payoff. If his project fails, he is purged with probability γ
and suffers a loss L. Taking the first order condition and replacing γ by κ/(1− e), we obtain that
em(τ) = v(τ) + κ1−e(V2(τ) + L).
When he anticipates the breads of the purge to be greater than the number of unsuccessful members
(1− e > κ), γ = 1 and ρ > 0 so his maximization problem is:
maxe∈[0,1]
R + e(1− ρ)(v(τ) + V2(τ)) + (eρ+ (1− e))(−L)− e2
2(10)
If his project is successful (probability e), party member m still faces a risk of being purged.
This occurs with probability ρ. If he is not purged, he receives a flow payoff v(τ) and his second
period expected payoff. If his project is unsuccessful, he is always purged and suffers a loss L.
Taking the first order condition and replacing ρ by (k − (1 − e))/e, we obtain that em(τ) =
1−κe
(v(τ) + V2(τ) + L).
For the next lemma, it is useful to denote Lm := 1−v2− V2 and κm(L) := (1−v)2
4(V2+L).
Lemma 5. All ideological (resp. opportunistic) party members exert the same level of effort.
If L ≤ Lm, there exists a unique feasible level of effort for all κ. Otherwise, there exists a unique
feasible level of effort for κ /∈ (1 − v − V2 − L, κm(L)), three feasible levels of effort for κ ∈
(1− v − V2 − L, κm(L)), and two feasible levels of effort at κ = 1− v − V2 − L and κ = κm(L).
Proof. Anticipating future notation, denote κ(L) := 1 − v − V2 − L. The proof proceeds in five
steps. First, we characterize the equation which solves for the feasible average level of effort. We
then show uniqueness when κ ≤ κ(L) for all L. Third, we show uniqueness for the case when
L ≤ Lm. Step 4 considers the case when L > Lm and κ > κ(L). Step 5 shows that there is no
variation in effort within types.
Step 1. Suppose 1−e ≥ κ. By Lemma 1, members’ efforts are determined by the following system
29
of equations (dropping subscripts and superscripts whenever possible):
e(i) =v(i) +κ
1− (λe(i) + (1− λ)e(o))(V2(i) + L) (11)
e(o) =v(o) +κ
1− (λe(i) + (1− λ)e(o))(V2(o) + L) (12)
Multiplying (11) by λ and (12) by 1− λ, summing the resulting equations, we obtain:
e = v +κ
1− e(V2 + L) (13)
Proceeding in similar steps for the case 1− e < κ, we obtain:
e =1− κe
(v + V2 + L) (14)
By (13) and (14), the average effort of party members is thus the solution to the equation: e1 =
H(e1), with
H(e) =
v + κ1−e(V2 + L) for e ≤ 1− κ
1−κe
(v + V2 + L) for e > 1− κ(15)
Notice that the function H(·) is increasing and convex for e1 ∈ [0, 1−κ) and decreasing and convex
for e ∈ (1− κ, 1].
Step 2. Suppose κ ≤ κ(L), then the solution e∗1(κ) is unique. Using the property of H(·), it can
be checked that there is no solution in (1 − κ, 1]. Given H(0) > 0, if the solution is not unique,
there must be at least three solutions. But by (13), e1 must be the solution to the quadratic
equation: e2 − (v + 1)e + v + κ(V2 + L) = 0, which has at most two distinct solutions. Hence, a
contradiction. The reasoning also implies that the equilibrium effort e∗ is the smallest solution to
e2−(v+1)e+v+κ(V2+L) = 0. In fact, both solutions are positive so if the highest solution—denoted
eh(κ, L)—is an equilibrium effort (i.e., eh ≤ 1−κ), then the smallest solution—denoted el(κ, L)—is
also an equilibrium effort. But this contradicts uniqueness. Hence, when κ ≤ κ(L), average effort
is unique and equals to e∗(κ, L) =(v+1)−
√∆(κ,L)
2, with ∆(κ, L) = (v + 1)2 − 4(κ(V2 + L) + v).
Step 3. Suppose now that κ > κ(L). Since H(·) is strictly decreasing on the interval (1 − κ, 1]
and H(1) = (1− κ)(v + V2 + L) < 1, there is a unique average effort satisfying e∗(κ, L) > 1− κ.
In [0, 1− κ], notice that there must be zero or two solutions since H(1− κ) = v + V2 + L > 1− κ
and H(0) > 0. Consider eh(κ, L) =(v+1)+
√∆(κ)
2the highest solution to e2 − (v + 1)e + v +
κ(V2 + L) = 0. eh(κ, L) is a fixed point of H(·) only if eh(κ, L) < 1 − κ. We now show that this
condition is never satisfied for L < Lm. Notice that eh(κ, L) is decreasing and concave in κ since
∂eh(κ, L)/∂κ ∝ −∆(κ, L)−1/2 and ∂2eh(κ, L)/∂κ2 ∝ −∆(κ, L)−3/2. In contrast, 1− κ is obviously
30
linearly decreasing in κ. Further notice that eh(0, L) = 1, eh(κ, L) is minimized at κ = κm(L) and
eh(κm(L), L) = v+12
< 1 − κm(L) if and only if 0 < 2(V2 + L) − (1 − v) ⇔ L > Lm. Given the
properties of eh(κ, L) and 1− κ, we thus obtain that for all L < Lm, eh(κ, L) > 1− κ and so there
is no solution to Equation 15 in [0, 1−κ]. Finally, it can be checked that at L = Lm, κm(L) = κ(L)
and el(κ(L), L) = eh(κ(L), L), hence the equilibrium effort is unique and equals to el(κ(L), L) if
κ = κ(L) and√
(1− κ)(v + V2 + L) if κ > κ(L).
To summarize, when L ≤ Lm, the average communication effort is unique and equals to:
e∗(κ, L) =
(v+1)−
√∆(κ,L)
2(with ∆(κ, L) = (v + 1)2 − 4(κ(V2 + L) + v)) if 1− κ ≥ v + V2 + L√
(1− κ)(v + V2 + L) if 1− κ < v + V2 + L
(16)
Step 4. Suppose now L > Lm. We show that eh(κ, L) < 1 − κ for all κ ∈ (κ(L), κm(L)). First,
it can be checked that at κ = κ(L), ehl (κ, L) = 1 − κ when L > Lm. To see this, notice that at
κ = 1− (v + V2 + L), ∆(κ) = (1− v)2 − 4(1− v)(V2 + L) + 4(V2 + L) = (1− v − 2(V2 + L))2 and
eh(κ, L) = (1+v)+|1−v−2(V2+L)|2
= v + V2 + L if L > Lm, which implies the claim by the properties
of eh(·) (see Step 3). This directly implies that el(κ, L) < 1 − κ on this interval. Therefore, there
are three feasible solutions when L > Lm and κ ∈ (κ(L), κm(L)), At κ = κ(L), eh converges to
v + V2 + L; at κ = κm(L), eh converges to el so there are two feasible levels of effort in these two
cases.
Step 5. While there might be multiple solutions to the average effort, a member’s effort is uniquely
defined for each average effort by (6). Hence, the claim holds.
In the case when L ≤ Lm, we get:
e∗(κ, L; τ) =
v(τ) + 2κ
1−v+√
∆(κ,L)(V2(τ) + L) if 1− κ ≥ v + V2 + L√
(1−κ)
(v+V2+L)(v(τ) + V2(τ) + L) if 1− κ < v + V2 + L
(17)
From the proof of Lemma 5, recall that eh(κ, L) (el(κ, L)) is the highest (smallest) solution
to e2 − (v + 1)e + v + κ(V2 + L) = 0. Further denote eind(κ, L) =√
1− κ√v + V2 + L. Denote
el(κ, L; τ), eh(κ, L; τ), eind(κ, L; τ) the associated effort a type τ ∈ {i, o} party member.
Proof of Lemma 2
Necessity follows directly from the proof of Lemma 5. When L ≤ Lm, sufficiency follows from the
proof of Lemma 5. When L > Lm, we claim:
31
Claim 1. When L > Lm and κ > κ(L), under the assumption, the unique equilibrium average
effort satisfies e∗(κ, L) = eind(κ, L) so 1− e∗(κ, L) < κ.
We prove the claim in the proof of Lemma 3.
Lemma 6. (i) If κ < κ(L), party members’ efforts are strictly increasing and convex in κ.
(ii) If κ > κ(L), party members’ efforts are strictly decreasing and convex in κ whenever e∗(κ, L) =
eind(κ, L).
Proof. By (17), a party member’s effort is continuous in 1 − κ. It is differentiable in κ for κ ∈
[0, κ(L)) and κ ∈ (κ(L), 1]. For κ < κ(L), we get using (16):
∂e∗(κ, L; τ)
∂κ=∂ κ
1−e∗(κ,L)
∂κ(V2(τ) + L)
=(1− e∗(κ, L)) + κ∂e
∗(κ,L)∂κ
(1− e∗(κ, L))2(V2(τ) + L) (18)
Notice that ∆(κ, L) = (v + 1)2 − 4(κ(V2 + L) + v) and so ∂∆(κ)/∂κ = −4(V2 + L) < 0, and by
(16), ∂e∗(κ,L)∂κ
> 0. This directly implies: ∂e∗(κ,L;τ)∂κ
> 0. To see the convexity, notice that:
∂2e∗(κ, L; τ)
∂κ2=∂2 κ
1−e∗(κ,L)
∂κ2(V2(τ) + L)
=∂2e∗(κ,L)
∂κ2 (1− e∗(κ, L)) + ∂e∗(κ,L)∂κ
(κ∂e∗(κ,L)∂κ
+ (1− e∗(κ, L))
(1− e∗(κ, L))3(V2(τ) + L) (19)
By (16), ∂2e∗(κ,L)∂κ2 =
− ∂2∆(κ,L)
∂κ2 ∆(κ,L)+ 12( ∂∆(κ,L)
∂κ )2
∆(κ,L)3/2 > 0 since ∂2∆(κ, L)/∂κ2 = 0. Therefore, ∂2e(κ,L;τ)∂κ2 > 0
as claimed.
Point (ii), under the assumption of the Lemma, the result follows from direct examination of
(17).
For the next lemma, we consider the effect of an exogenous increase in purge breadth on the
autocrat’s posterior after failure and success. While the autocrat does not observe the members’
effort, she correctly anticipates their effort in equilibrium. For ease of exposition, in the next
lemma, we characterize the property of the autocrat’s posterior assuming that the purge breadth
is exogenous, and the average efforts characterized in Lemma 5 are always feasible. We relax all
these assumptions and characterize the proper equilibrium conditions in subsequent lemmas and
propositions.
Lemma 7. (i) Suppose average effort is el(κ, L), the autocrat’s posterior after failure is strictly
decreasing and concave in purge breadth for κ < κm(L).
32
(ii) Suppose average effort is el(κ, L), the autocrat’s posterior after failure is strictly increasing in
purge breadth for κ < κm(L).
(ii) Suppose average effort is eind(κ, L), the autocrat’s posterior after success is constant in purge
breadth for κ > κ(L).
Proof. Point (i). By Bayes’ Rule, µF (e(κ, L)) = λ1−el(κ,L;i)
1−el(κ,L). The relevant comparative statics is
then (omitting superscript):
∂µF (el(κ, L))
∂κ=
λ
(1− el(κ, L))2
[− el(κ, L; i)
∂κ(1− λel(κ, L; i)− (1− λ)el(κ, L; o))
+ (1− el(κ, L; i))
(λ∂el(κ, L; i)
∂κ+ (1− λ)
∂el(κ, L; o)
∂κ
)]=λ(1− λ)(1− el(κ, L; i))(1− el(κ, L; o))
(1− el(κ, L))2
[∂el(κ,L;o)
∂κ
1− el(κ, L; o)−
∂el(κ,L;i)∂κ
1− el(κ, L; i)
](20)
By examination of (17, el(κ, L; i) > el(κ, L; o). By examination of Equation 18, ∂el(κ,L;i)∂κ
> ∂el(κ,L;o)∂κ
.
This directly implies:∂el(κ,L;o)
∂κ
1−el(κ,L;o)−
∂el(κ,L;i)∂κ
1−el(κ,L;i)< 0 and ∂µF (e(κ,L))
∂κ< 0 as claimed.
To see that the posterior is strictly concave in κ, notice that:
∂2µF (el(κ, L))
∂κ2∝[∂2el(κ, L; o)
∂κ2(1− el(κ, L; i))− ∂2el(κ, L; i)
∂κ2(1− el(κ, L; o))
](1− el(κ, L))
+ 2∂el(κ, L)
∂κ
[∂el(κ, L; o)
∂κ(1− el(κ, L; i))− ∂el(κ, L; i)
∂κ(1− el(κ, L; o))
](21)
By examination of (19), ∂2el(κ,L;i)∂κ2 > ∂2el(κ,L;o)
∂κ2 . This directly implies ∂2µF (el(κ,L))∂κ2 < 0 since both
terms in brackets in (21) are negative.
Point (ii). Using Point (i), we only need to consider the sign of∂eh(κ,L;o)
∂κ
1−eh(κ,L;o)−
∂eh(κ,L;i)∂κ
1−eh(κ,L;i). By ex-
amination of (17, we still obtain eh(κ, L; i) > eh(κ, L; o). Further, ∂eh(κ,L;τ)∂κ
has the same sign as∂ κ
1−v−√
∆(κ,L)
∂κ, with
∂ κ
1−v−√
∆(κ,L)
∂κ∝1− v −
√∆(κ, L) +
1
2∆(κ, L)−1/2(−4(V2 + L)κ)
∝(1− v)∆(κ, L)1/2 − (1− v)2 + 2κ(V2 + L),
using ∆(κ, L) = (1−v)2−4κ(V2+L). Notice that(
(1−v)∆(κ, L)1/2)2
= (1−v)4−4κ(1−v)2(V2+L)
and(
(1− v)2− 2κ(V2 +L))2
= (1− v)4− 4κ(1− v)2(V2 +L) + 4κ2(V2 +L)2. Hence,∂ κ
1−v−√
∆(κ,L)
∂κ<
0. By examination of 18 (with the proper substitution), this implies ∂eh(κ,L;i)∂κ
< ∂eh(κ,L;o)∂κ
< 0.
Consequently,∂eh(κ,L;o)
∂κ
1−eh(κ,L;o)−
∂eh(κ,L;i)∂κ
1−eh(κ,L;i)> 0 so µF (eh(κ, L)) is strictly increasing with κ.
33
Point (iii). By Bayes’ rule, the autocrat’s posterior after success is: µS(e∗(κ, L)) = λ eind(κ,L;i)
eind(κ,L). By
a similar reasoning as above, we obtain:
∂µS(eind(κ, L))
∂κ=
λ
eind(κ, L)2
[eind(κ, L; i)
∂κ(λeind(κ, L; i) + (1− λ)eind(κ, L; o))
− eind(κ, L; i)
(λ∂eind(κ, L; i)
∂κ+ (1− λ)
∂eind(κ, L; o)
∂κ
)]=λ(1− λ)eind(κ, L; i)eind(κ, L; o)
eind(κ, L)2
[∂eind(κ,L;i)
∂κ
eind(κ, L; i)−
∂eind(κ,L;o)∂κ
eind(κ, L; o)
](22)
From (17), it can be checked that∂eind(κ,L;τ)
∂κ
eind(κ,L;τ)=− 1
2(v(τ)+V2(τ)+L)√
(1−κ)
√(v+V2+L)√
(1−κ)(v(τ)+V2(τ)+L)√
(v+V2+L)
= − 12√
1−κ . Therefore, ∂µS(eind(κ,L))
∂κ=
0 as claimed.25
Recall κm(L) = (1−v)2
4(V2+L).
Lemma 8. For all κ′, κ′′ ∈ (0, κm(L))2, µF (eh(κ′, L)) < µF (el(κ′′, L)).
Proof. By definition of κm(L), eh(κm(L), L) = eh(κm(L), L) so µF (eh(κm(L), L)) = µF (el(κm(L), L)).
By Lemma 7, µF (eh(κ, L)) is strictly increasing in κ so µF (eh(κ, L)) < µF (eh(κm(L), L)) for all
κ < κm(L) and µF (el(κ, L)) is strictly decreasing in κ so µF (el(κ, L)) > µF (el(κm(L), L)) for all
κ < κm(L), which proves the claim.
Lemma 9. Fixing the purge breadth, µF (el(κ, L)) and µS(eind(κ, L)) are decreasing in L.
Proof. A similar reasoning as in Lemma 1 yields:
∂µF (el(κ, L))
∂L=λ(1− λ)(1− el(κ, L; i))(1− el(κ, L; o))
(1− el(κ, L))2
[∂el(κ,L;o)
∂L
1− el(κ, L; o)−
∂el(κ,L;i)∂L
1− el(κ, L; i)
](23)
Given the definition of el(κ, L), it can be checked that ∂el(κ,L;i)∂L
> ∂el(κ,L;o)∂L
. Hence, ∂µF (el(κ,L))∂L
< 0.
Regarding the autocrat’s posterior after success, a similar reasoning as in Lemma 1 yields:
∂µS(eind(κ, L))
∂L=λ(1− λ)eind(κ, L; i)eind(κ, L; o)
eind(κ, L)2
[∂eind(κ,L;i)
∂L
eind(κ, L; i)−
∂eind(κ,L;o)∂L
eind(κ, L; o)
](24)
Using the definition of eind(κ, L) and Lemma 5, we obtain ∂eind(κ,L;τ)∂L
=√
(1− κ)(v+V2+L)− 1
2(v(τ)+V2(τ)+L)
(v+V2+L)3/2 ,
τ ∈ {i, o}. Therefore,∂eind(κ,L;τ)
∂L
eind(κ,L;τ)= 1
v(τ)+V2(τ)+L− 1
2(v+V2+L). Since v(i) > v(o) and V2(i) > V2(o),
this directly implies:∂eind(κ,L;i)
∂L
eind(κ,L;i)−
∂eind(κ,L;o)∂L
eind(κ,L;o)< 0 and ∂µS(eind(κ,L))
∂L< 0 as claimed.
25The proof of Lemma 7 highlights the general computation. Under the condition of the lemma, it can easily be
checked that µS(eind(κ, L)) = λ
√1−κ v(i)+V2(i)+L√
v+V2+L√1−κ√v+V2+L
= λ v(i)+V2(i)+L
v+V2+L, which does not depend on κ.
34
For the proof of Lemma 3, denote P F (κ;Lκ, e(κ, L)) = κβ[ri−µF (e(κ, L))]Di,o−c(κ) (P S(κ;L, κ, e) =
κβ[ri − µS(e(κ, L))]Di,o − c(κ)) the autocrat’s gross expected payoff from purging κ unsuccessful
(successful) party members when intensity of violence is L, average effort is e(·), and anticipated
purge breadth by party members is denoted κ. Notice that while in the autocrat’s maximiza-
tion problem, we take the purge breadth anticipated by party members as given, it is correct in
equilibrium. In what follows, recall that Lm = 1−v2− V2 and κm(L) = (1−v)2
4(V2+L).
Proof of Lemma 3
The proof proceeds in two steps. First, we consider the case when L ≤ Lm so there is a unique
feasible equilibrium level of effort for all κ (Lemma 5). Second, we consider the case when L > Lm.
Step 1: L ≤ Lm. From the reasoning in the text, the autocrat’s marginal benefit from purging a suc-
cessful (unsuccessful) party member isWS = [ri−µS(e∗(κ, L))]Di,o (WF = [ri−µF (e∗(κ, L))]Di,o).
Further, minκWF > maxκWS since µS(e∗(κ, L)) > λ > µF (e∗(κ, L)) for all κ. So whenever
c0 + c1κ < WS at κ = κ(L), then the equilibrium purge breadth satisfies κ∗(L) > κ(L) and the
purge is semi-indiscriminate. Inversely, if c0 + c1κ ≥ WS, then the equilibrium purge breadth
satisfies κ∗(L) ≤ κ(L) and the purge is discriminate.
It remains to consider two cases at κ = κ(L): (a) c0 + c1κ >WF and (b) c0 + c1κ ≤ WF . In case
(a), since WF is convex in κ due to the convexity of µF (e1(κ, L)) (Lemma 9), it must be that for
allWF intersects c0 + c1κ at most once. The equilibrium purge breadth then is the unique solution
to c0 + c1κ =WF if it exists which satisfies κ∗(L) < κ(L) and 0 otherwise. The purge is partially
discriminate as claimed. In case (b), if c0 + c1κ ≤ WF for all κ ≤ κ(L), then κ∗(L) = κ(L). To
see that, observe that there exist κ1(L), κ2(L) ∈ (0, κ(L)) (ignoring corners which only complicate
the exposition) such that c0 + c1κj(L) = β[ri− µF (e(κj(L), L))]Di,o for j ∈ {1, 2}. There are three
possible purge breadths then: κ1(L), κ2(L), and κ(L) = 1 − e∗(κ, L). Notice that the autocrat’s
equilibrium welfare satisfies: P F (κ1(L);L, κ1(L), e∗) = κ1(L)β[ri−µF (e(κ1(L), L))]Di,o−(c0κ1(L)+
c12
(κ1(L))2)
= c12
(κ1(L))2 (using the FOC above). Therefore, P F (κ1(L);L, κ1(L), e∗(κ1(L), L)) <
P F (κ2(L);L, κ2(L), e∗(κ2(L), L)) < P F (κ(L);L, κ(L), e∗(κ(L), L)). Given our equilibrium selec-
tion, the equilibrium purge breadth is κ∗(L) = κ(L) and the purge is fully discriminate then.
Step 2: L > Lm. Notice that at L = Lm and κ = κ(Lm) = κm(Lm), µF (el(κ, L)) = λ1−v(i)
2− v(i)−v
2−(V (i)−V2)
1−v2
since el(κ(L), L) = v + V2 + L = 1 − κ(L) = eh(κ(L), L), el(κ(L), L; i) = v(i) + V2(i) + L =
eh(κ(L), L; i), and Lm = 1−v2− V2. Under the assumption (Equation 5), c0 + c1κ(Lm) ≤ β[ri −
µF (el(κ(Lm), Lm))]Di,o = β[ri − µF (eh(κ(Lm), Lm))]Di,o. Assuming c0 + c1κ > WS at κ = κ(L),
35
we now show that the equilibrium purge breadth and effort satisfy respectively κ∗(L) = κ(L) and
e∗(κ(L), L) = eh(κ(L), L).
Observe that by Lemma 5, eh(κ(L), L) = 1 − κ(L) = v + V2 + L for all L > Lm (whereas
el(κ(L), L) < 1− κ(L)) and µF (eh(κ(L), L)) = λ1−v(i)−V2(i)−L1−v−V2−L
strictly decreasing with L. Hence for
all L > Lm, c0 + c1κ(L) < β[ri − µF (eh(κ(L), L))]Di,o. Further, P F (κ(L);L, κ(L), eh(κ(L), L)) is
strictly increasing with L. When e = eh(κ(L), L), the autocrat chooses κ = κ(L). (κ(L), eh(κ(L), L)
is thus part of a PBE. We now consider different cases.
Suppose that at L = Lm, for all κ ≤ κ(Lm), the following inequality holds c0 + c1κ ≤ β[ri −
µF (el(κ, L))]W2(i). By Lemma 9, µF (el(κ, L)) is strictly increasing in L. Therefore, c0 + c1κ <
β[ri − µF (el(κ, L))]W2(i) for all L > Lm and κ ≤ κm(L). The autocrat’s best response is thus to
purge all failures with average effort el(·) or eh(·) (given Lemma 8). Therefore, κ∗(L) = κ(L) and
e∗(κ(L), L) are the unique equilibrium solution since 1− el(κ, L) > κ for all κ ≤ κm(L).
Suppose now that there exists L ≥ Lm such that for L ≤ L, there exist κ1(L), κ2(L) ∈ (0, κm(L))2
such that c0 + c1κj(L) = β[ri − µF (el(κj(L), L))]Di,o for j ∈ {1, 2}. By Step 1, we obtain that:
P F (κ2(L);L, κ2(L), el(κ2(L), L)) = c1κ2(L)2
2> P F (κ1(L);L, κ1(L), el(κ1(L), L)). Since µF (el(κ, L))
is strictly decreasing with L (Lemma 9) and β[ri−µF (el(κ, L))]Di,o crosses c0+c1κ from below at κ =
κ2(L), κ2(L) is strictly decreasing with L and so is P F (κ2(L);L, κ2(L), el(κ2(L), L)). Using the fol-
lowing properties (i) at L = Lm, P F (κ2(L);L, κ2(L), el(κ2(L), L)) < P F (κ(L);L, κ(L), eh(κ(L), L))
and (ii) P F (κ(L);L, κ(L), eh(κ(L), L)) strictly increasing with L, we obtain that for all L > Lm,
P F (κ2(L);L, κ2(L), el(κ2(L), L)) < P F (κ(L);L, κ(L), eh(κ(L), L)). Given our equilibrium selection
criterion, κ2(L) cannot be an equilibrium purge breadth. Further, this implies el(κ, L) cannot be
an equilibrium effort when L > Lm.
Suppose now that there exists L > Lm such that for L ≥ L, there exist κ3(L) ∈ (κ(L), κm(L)] such
that c0 +c1κ3(L) = β[ri−µF (eh(κ3(L), L))]Di,o (observe that we consider average effort eh(κ, L)).26
Using a similar reasoning as in point (ii) of Lemma 7, it can be checked that µF (eh(κ, L)) is de-
creasing with L. Further, since β[ri − µF (eh(κ, L))]Di,o crosses c0 + c1κ from above, κ3(L) is
strictly decreasing in L. Hence, P F (κ3(L);L, κ3(L), el(κ3(L), L)) = c1κ3(L)2
2is strictly decreas-
ing with L. Using this property and (i) c1κ3(L)2
2≤ c1
κm(L)2
2, (ii) c1
κm(L)2
2strictly decreasing
with L by definition of κm(L), (iii) c1κm(Lm)2
2≤ P F (κm(Lm);Lm, κm(Lm), eh(κm(Lm), Lm)) (recall
κ(Lm) = κm(Lm)), and (iv) P F (κ(L);L, κ(L), eh(κ(L), L)) strictly increasing with L, we obtain
P F (κ(L);L, κ(L), eh(κ(L), L)) > P F (κ3(L);L, κ3(L), el(κ3(L), L)) for all L > Lm. Hence, by our
26L > Lm because the interval (κ(L), κm(L)] is empty otherwise.
36
equilibrium restriction, κ3(L) cannot be an equilibrium purge breadth.
By the reasoning above, for all L > Lm, the equilibrium purge breadth satisfies either (a) κ∗ = κ(L)
with e∗(κ(L), L) = eh(κ(L), L) = eind(κ(L), L) or (b) κ∗ > κ(L) with e∗(κ∗, L) = eind(κ∗, L). This
directly provides a proof of Claim 1.
Proof of Claim 1. By the reasoning above, when κ > κ(L), el(·) or eh(·) cannot be an equilibrium
effort. The equilibrium average effort is then necessarily e∗(κ, L) = eind(κ, L).
Finally, it remains to determine conditions under which κ∗(L) > κ(L). By a similar reasoning as
in point 1, if c0 +c1κ <WS at κ = κ(L), then the equilibrium purge breadth satisfies κ∗(L) > κ(L)
and the purge is semi-indiscriminate. Inversely, if c0 + c1κ ≥ WS, then the equilibrium purge
breadth satisfies κ∗(L) = κ(L) and the purge is fully discriminate. Notice that since by assumption
(Equation 5) WF ≥ c0 + c1κ at L = Lm and κ = κ(L), the purge is never partially discriminate as
claimed.
To summarize, using Lemma 3, the equilibrium average effort satisfies
e∗(κ(L), L) =v +κ(L)
1− e∗(κ(L), L)(V2 + L) (25)
=(v + 1)−
√∆(κ, L)
2if κ(L) < κ(L) (26)
with ∆(κ, L) = (v + 1)2 − 4(κ(V2 + L) + v)
e∗(κ(L), L) =v + V2 + L if κ(L) = κ(L) (27)
e∗(κ(L), L) =√
1− κ(L)
√v + V2 + L if κ(L) > κ(L) (28)
Proof of Corollary 1
By Lemma 3, the purge is semi-indiscriminate if and only if (i) ri > µS(e∗) = λv(i)+V2(i)+L
v+V2+Land (ii)
c1, c0 satisfy c1 <β(ri−λ
v(i)+V2(i)+L
v+V2+L
)Di,o2 −c0
1−v−V2−L. Notice that as b → 0, µS(e∗) → λ. Therefore, whenever
ri > λ, there exist b, c1, and c0 sufficiently small so that both conditions are satisfied.
In the remaining of the appendix, we assume that if there exists a solution to c0 + c1κ =
β[ri − µF (el(κ, L))]W2(i) for L < Lm, this solution is unique. This restriction is meant to simplify
the exposition of the proofs and are without loss of generality. We discuss below its implication
(see Remark 1). Notice that this implies that at L = Lm, c0 + c1κ ≤ β[ri − µF (el(κ, L))]W2(i) for
37
all κ ≤ κ(Lm).
To facilitate the exposition, we use subscript x to denote the partial derivative of some vari-
able z with respect to x (i.e., ∂z/∂x = zx) and a similar notation for the second partial deriva-
tive. To alleviate notation, we then denote µF (κ∗(L), L) := µF (e∗(κ∗(L), L)) and µS(κ∗(L), L) :=
µS(e∗(κ∗(L), L)). Finally, we ignore superscript and arguments whenever possible
Lemma 10. The equilibrium purge breadth satisfies: κ∗L(L) > 0 if (i) if κ∗(L) < κ(L) and (ii) if
κ∗(L) > κ(L).
Proof. Point (i). κ∗(L) is a solution to c0 + c1κ(L) = β[ri − µF (κ(L), L)]Di,o. By the Implicit
Function Theorem,
κL(L)(c1 + µFκ ) = −βµFLDi,o. (29)
By Lemma 9, µFL < 0. Since we consider the unique interior solution, it must be that c1 + µFκ > 0
(otherwise, the autocrat chooses κ∗(L) = κ(L)). Hence, the claim holds.
Point (ii). The equilibrium purge breadth is the solution to: c0 + c1κ(L) = β[ri−µS(κ(L), L)]Di,o.
Noting that µSκ = 0 by Lemma 7, we obtain:
κL(L)c1 = −βµSLDi,o, (30)
which directly implies the claim since µSL < 0 (Lemma 9).
Proof of Proposition 1
We first prove that equilibrium effort is increasing in the intensity of violence. We obtain:
de(κ(L), L)
dL= κL(L)eκ + eL (31)
When κ∗(L) < κ(L), by Lemma 6, eκ > 0. By Lemma 10, κL(L) > 0. By the proof of Lemma 9,
eL > 0. Hence de∗/dL > 0.
When κ∗(L) = κ(L), then e∗(κ∗(L), L) = v + V2 + L so de∗/dL = 1 > 0.
When κ∗(L) > κ(L), e∗(κ∗(L), L) =√
1− κ∗(L)√v + V2 + L. Hence,
eκ =− 1
2
√v + V2 + L√1− κ(L)
< 0
eL =1
2
√1− κ(L)√v + V2 + L
> 0
38
So
κL(L)eκ + eL =1
2
1√1− κ(L)
√v + V2 + L
(1− κ(L)− κL(L)(v + V2 + L)
)=
1
2
1√1− κ(L)
√v + V2 + L
(1− κ(L) + β
µSLW2(i)
c1
(v + V2 + L))
=1
2
1√1− κ(L)
√v + V2 + L
(1− κ(L)− β(1− λ)
W2(i)
c1
v + V2
v + V2 + L
)=
1
2
1√1− κ(L)
√v + V2 + L
(1− βW2(i)
c1
(ri − µS) +c0
c1
− β(1− λ)W2(i)
c1
v + V2
v + V2 + L
)=
1
2
1√1− κ(L)
√v + V2 + L
(1 +
c0
c1
− βW2(i)
c1
ri + βW2(i)
c1
v + V2 + λL
v + V2 + L
− β(1− λ)W2(i)
c1
v + V2
v + V2 + L
)=
1
2
1√1− κ(L)
√v + V2 + L
(1 +
c0
c1
− βW2(i)
c1
(ri − λ))
(32)
The first second lines comes from Equation 30, the third from µS(κ, L) = λv(i)+V2(i)+L
v+V2+Lso µSL =
−(1 − λ) v+V2
(v+V2+L)2 , the fourth from the definition of the equilibrium purge breadth, the fifth from
the definition of µS. Denote κ(L) := βW2(i)c1
(ri − λ) − c0c1
. Under the assumption (Equation 5),
κ(L) < 1 so κL(L)eκ + eL = 12
1√1−κ(L)
√v+V2+L
(1− κ(L)) > 0.
To see the second point of the proposition, recall that in a semi-indiscriminate purge (κ∗(L) > κ(L)),
1 − e∗ < κ∗. Further, κ(L) > κ∗ since µS > λ. This implies that 12
1√1−κ(L)
√v+V2+L
(1 − κ(L)) =
12
1− ˙κ(L)e∗
< 12. Since average effort increases at rate 1 in a fully discriminate purge, the claim
holds.
Proof of Proposition 2
By Proposition 1, 1−e∗(κ∗(L), L) is strictly decreasing with L. By Lemma 10, for all κ∗(L) < κ(L),
κ∗L(L) > 0. Under the assumption (Equation 5) and by Lemma 3, 1 − e∗(κ∗(Lm), Lm) ≤ κ∗(Lm).
Hence, there exists a unique Lfull ∈ [0, Lm] such that 1 − e∗(κ∗(L), L) ≤ κ∗(L) if and only if
L ≤ Lfull. By Lemma 10, κ∗L(L) > 0 for all L ≤ Lfull.
For all L ≥ Lfull, κ∗(L) ≥ κ(L). By Lemma 9, µS(κ∗(L), L) is strictly decreasing with L then.
Suppose c0 + c1κ(L) ≥ β[ri − µS(κ(L), L)]Di,o. In this case, it is never profitable for the autocrat
to purge from the success pool. Denote Lind := L. Otherwise, given µS is strictly decreasing with
L and κ(L) = 1− (v+ V2 +L), there exists a unique Lind ∈ (Lfull, L) such that κ∗(L) = κ(L) (i.e.,
the purge is fully discriminate) if and only if L ∈ [Lfull, Lind] and κ∗(L) > κ(L) (i.e., the purge is
39
semi-indiscriminate) if and only if L > Lind. By Lemma 10, κ∗L(L) > 0 for all L ≥ Lfull.
Finally for L ∈ [Lind, Lfull), κ∗(L) = κ(L) = 1− v − V2 − L so κ∗L(L) < 0.
Proof of Proposition 3
Point (i). For L < Lfull, the proportion of ideologues in the pool of survivors is:
S(L) =((1− e)− κ(L))µF + eµS
1− κ(L)
=λ− κ(L)µF
1− κ(L)(33)
Consider the function F (x) = λ−xµF1−x , its derivative is F ′(x) = −µF (1−x)+λ−µF x
(1−x)2 = λ−µF(1−x)2 > 0.
We obtain
dS(L)
dL=κL(L)F ′(κ(L))− (µFκ κL(L) + µFL)
κ(L)
1− κ(L)(34)
Since (µFκ κL(L) + µFL) < 0, F ′(x) > 0, κL(L) > 0 (Lemmas 9 and 10), we obtain dS(L)dL
> 0.
For L ≥ Lfull, the proportion of ideologues in the pool of survivors is
S(L) = µS (35)
Since µSL < 0 (Lemma 9), dS(L)dL
< 0.
Points (ii) and (iii). For L < Lfull, the proportion of ideologues in the party in the second period
is:
P(L) =κ(L)ri + ((1− e)− κ(L))µF + eµS
=κ(L)(ri − µF ) + λ (36)
Since κL(L) > 0 (Lemma 10), µFκ < 0 (Lemma 7), and µFL < 0 (Lemma 9), dP(L)dL
> 0.
For L ∈ [Lfull, Lind), the proportion of ideologues in the party in the second period is:
P(L) =(1− e)ri + eµS (37)
Given e = v+ V2 +L and µS(κ∗(L), L) = λ e∗(κ∗(L),L;i)e∗(κ∗(L),L)
= λv(i)+V2(i)+L
v+V2+L, we obtain: P(L) = (1− (v+
V2 + L))ri + λ(v(i) + V2(i) + L) and
dP(L)
dL= λ− ri (38)
40
PL(L) ≥ 0 ⇔ λ ≥ ri. Since in this case, Lind = L, point (ii) directly holds. To prove point
(iii), suppose that Lind < L (otherwise, the claim holds directly) and L ≥ Lind, the proportion of
ideologues in the party in the second period is:
P(L) =κ(L)ri + (1− κ(L))µS (39)
Using c0 + c1κ(L) = β[ri − µS]Di,o and Equation 30, we obtain:
dP(L)
dL=κL(L)(ri − µS) + (1− κ(L))µSL
=κL(L)(ri − µS)− (1− κ(L))c1κL(L)
βDi,o
=κL(L)
βDi,o
(β(ri − µS)Di,o
c1
− (1− κ(L))
)=κL(L)
βDi,o
(2β(ri − µS)Di,o
c1
− c0 + c1
c1
)(40)
Under the assumption (Equation 5), c0 + c1 ≥ βriDi,o. Therefore,
dP(L)
dL≤ κL(L)
βDi,oβ(ri − 2µS)Di,o
c1
Since µS > λ, the assumption ri ≤ 2λ guarantees that dP(L)dL
< 0.
Before determining the equilibrium intensity of violence, the next lemmas characterize some
properties of the benefit of violence—denoted B(L)—for the autocrat. Throughout we assume
that Lind < L. The reasoning extend to the case when Lind = L. We also use the shorthand
e(τ) := e∗(κ∗(L), L; τ). Recall that the probability a party member is purged is γ(κ(L), L) =
κ(L)1−e(κ(L),L)
after failure in a partially discriminate purge and ρ(κ(L), L) = 1− 1−κ(L)e(κ(L),L)
after success
in a semi-discriminate purge. Denote further W e2 = riW2(i) + (1− ri)W2(o) the expected payoff for
the autocrat from a new party member.
Lemma 11. The benefit of violence is continuous in L. It is strictly increasing in L for L ≤ Lind.
Proof. For L ≤ Lfull, The benefit of violence is:
B(L) =λ[e(i)(1 + βW2(i)) + (1− e(i))
(0 + γβW e
2 + (1− γ)βW2(i))]
+ (1− λ)[e(o)(1 + βW2(o)) + (1− e(o))
(0 + γβW e
2 + (1− γ)βW2(o))]− c(κ(L))
B(L) =e+ βλW2(i) + βκ(L)(ri − µF )W2(i)− c(κ(L)) (41)
41
The last line uses γ = κ(L)1−e so λ(1 − e(i))γ = κ(L)µF . Further, notice that under the assumption
that there is a unique solution to β(ri − µF )W2(i) − cκ(κ(L)) = 0, we obtain: limL↑Lfull
B(L) =
e+ βλe(i)W2(i) + β(1− e)riW2(i)− c(κ(L)), with κ(L) = 1− e and e = v + V2 + L.
Taking the derivatives, we obtain:
dB(L)
dL=κL(L)eκ + eL − βκ(L)(µFκ κL(L) + µFL)W2(i) + κL(L)
(β(ri − µF )W2(i)− c′(κ(L))
)(42)
Given that the purge is partially discriminate, the purge breadth satisfies: β(ri − µF )W2(i) −
cκ(κ(L)) = 0. By Lemmas 6, 7, 9, and Proposition 1, κL(L) > 0, eκ > 0, eL > 0, µFκ < 0, and
µFL < 0 so dB(L)/dL > 0.
For L ∈ (Lfull, Lind], the benefit of violence is:
B(L) =λ[e(i)(1 + βW2(i)) + (1− e(i))βW e
2
]+ (1− λ)
[e(o)(1 + βW2(o)) + (1− e(o))βW e
2
]− c(κ(L))
=e+ βλe(i)W2(i) + β(1− e)riW2(i)− c(κ(L)), (43)
with e = v+V2 +L and κ(L) = 1− e. Hence, limL↓Lfull
B(L) = limL↑Lind
B(L) = e+βλe(i)W2(i) +β(1−
e)riW2(i)− c(κ(L)).
Taking the derivative, we obtain:
dB(L)
dL=1 + β(λ− ri)W2(i) + c′(1− e) > 0 (44)
Finally, when L ≥ Lind, the benefit of violence is:
B(L) =λ[e(i)(1 + ρβW e
2 + (1− ρ)βW2(i)) + (1− e(i))βW e2
]+ (1− λ)
[e(o)(1 + ρβW e
2 + (1− ρ)βW2(o)) + (1− e(o))βW e2
]− c(κ(L))
=e+ βκ(L)riW2(i) + β(1− κ(L))µSW2(i)− c(κ(L)) (45)
The last line uses 1 − ρ = 1−κ(L)e
so λeiρ = (1 − κ(L))µS (by definition of µS). Recall that at
L = Lind, κ(L) = 1 − e, which implies e = v + V2 + L (see Equation 16). Thus limL↓Lind
B(L) =
e + β(1 − e)riW2(i) + βλe(i)W2(i) − c(κ(L)) = limL↑Lind
B(L), which completes the proof that B(L)
is continuous in L.
Taking the derivative, we obtain:
dB(L)
dL=κL(L)eκ + eL + β(1− κ(L))µSLW2(i) + κL(L)
(β(ri − µS)W2(i)− cκ(κ(L))
)=κL(L)eκ + eL + β(1− κ(L))µSLW2(i) (46)
The second line uses the fact we consider a semi-discriminate purge. We discuss the sign of (46)
below.
42
Lemma 12. The marginal benefit of violence satisfies:
limL↑Lfull
dB(L)
dL>dB(L1)
dL>dB(L2)
dLfor all L1 ∈ (Lfull, Lind] and L2 ∈ (Lind, L]
Proof. The proof of a discontinuity in the marginal benefit at L = Lfull proceeds in three steps.
First, we show that eL > 1 as L ↑ Lfull. Second, we show that −κ(L)µFL > λ − µF as L ↑ Lfull.
Finally, we show that BL(L) = 1 + β(λ− µF )W2(i) as L ↓ Lfull.
Step 1. By Equation 13, e = v + γ(V2 + L). We thus obtain:
eL = γL(V2 + L) + γ (47)
γL = eLκ(L)(1−e)2 > 0 and γ = κ(L)
1−eL↑Lfull−−−−→ 1 so eL > 1.
Step 2. Using the definition of µF and κ(L)L↑Lfull−−−−→ (1− e) to obtain:
−κ(L)µFL =(1− e)λeL(i)(1− e)− eL(1− e(i))(1− e)2
=λeL(i)− µF eL
=γ(λ− µF ) + γL(λ(v(i) + V2(i) + L)− µF (v + V2 + L)
)The third line comes from Equation 47. As γ = 1, γL > 0, and λ > µF , we obtain−κ(L)µFL > λ−µF
as claimed.
Using Equation 42, notice that Steps 1 and 2 imply that dB(L)dL
> 1 + β(λ− µF )W2(i) at L = Lfull
(all the other terms are positive).
Step 3. As L ↓ Lfull, c′(κ(L)) = β(ri − µF )W2(i). Hence, we can rewrite Equation 44 as (slightly
abusing notation by using equalities):
dB(L)
dL=1 + β(λ− ri)W2(i) + β(ri − µF )W2(i)
=1 + β(λ− µF )W2(i)
This directly implies limL↑LfulldB(L)dL
> limL↓LfulldB(L)dL
. Using Equation 44, we also obtain:
d2B(L)
dL2= −c′′(1− e) < 0 (48)
This directly implies limL↑LfulldB(L)dL
> dB(L1)dL
for all L1 ∈ (Lfull, Lind].
We now consider the discontinuity in the marginal benefit around Lind. By Equation 44, dB(L1)dL
> 1
for all ∈ (Lfull, Lind]. By Equation 46, dB(L2)dL
< 1 for all L2 > Lind since κL(L)eκ + eL < 1
(Proposition 1) and the other term is negative.
43
Lemma 13. The benefit of violence is strictly convex in L for L ≤ Lfull.
Proof. The proof proceeds in four steps. First, we compute the second (total) derivative of the
marginal benefit of violence. Second, we look at the second (partial) derivatives of effort and
autocrat’s posterior with respect to L and κ. Third, we show that the equilibrium purge breadth
is convex in L. The last step proves the claim.
Step 1. Using Equation 42, we obtain:
d2B(L)
dL2=κLL(L)eκ + κL(L)eκκ + (κL(L) + 1)eκL + eLL − βκLL(L)(µFκ κL(L) + µFL)W2(i)
− βκL(L)(κLL(L)µFκ + κL(L)µFκκ + (κL(L) + 1)µFκL + µFLL)W2(i)
+ κLL(L)(β(ri − µF )W2(i)− cκ(κ(L))
)+ κL(L)
(β(−κL(L)µFκ − µFL)W2(i)− κL(L)cκκ(κ(L))
)(49)
Step 2. Using µF = λ1−e(i)1−e , we obtain for j ∈ {κ, L}:
µFjj =λ(1− λ)(1− e(i))(1− e(o))
(1− e)3×
[(ejj(o)
(1− e(o))− ejj(i)
(1− e(i))
)(1− e) + 2ej
(ej(o)
1− e(o)− ej(i)
1− e(i)
)](50)
µFκL =λ(1− λ)(1− e(i))(1− e(o))
(1− e)3×
[(eκL(o)
(1− e(o))− eκL(i)
(1− e(i))
)(1− e) + eL
(eκ(o)
1− e(o)− eκ(i)
1− e(i)
)]
+λ
(1− e)3
[eL(eκ(o)(1− e(i))− eκ(i)(1− e(o))) + (1− e)(eκ(i)eL(o)− eL(i)eκ(o))
]
=λ(1− λ)(1− e(i))(1− e(o))
(1− e)3×
[(eκL(o)
(1− e(o))− eκL(i)
(1− e(i))
)(1− e) + eL
(eκ(o)
1− e(o)− eκ(i)
1− e(i)
)]
+λ(1− λ)
(1− e)3
[eκ(o)
(eL(1− e(i))− eL(i)(1− e)
)+ eκ(i)
(eL(o)(1− e)− eL(1− e(o))
)]
µFκL =λ(1− λ)(1− e(i))(1− e(o))
(1− e)3×
[(eκL(o)
(1− e(o))− eκL(i)
(1− e(i))
)(1− e) + eL
(eκ(o)
1− e(o)− eκ(i)
1− ei
)]
+λ(1− λ)(1− e(i))(1− e(o))
(1− e)3
[((1− λ)eκ(o) + λeκ(i))
(eL(o)
(1− e(o))− eL(i)
(1− e(i))
)](51)
Using e(τ) = v(τ) + γ(V2(τ) + L), we obtain:
eκκ(τ) =γκκ(V2(τ) + L) (52)
eLL(τ) =γLL(V2(τ) + L) + 2γL (53)
eκL(τ) =γκL(V2(τ) + L) + γκ (54)
44
Using the comparative statics on e), we get:
γκκ =eκ
(1− e)2+
(eκκκ(L) + eκ)(1− e) + 2(eκ)2κ(L)
(1− e)3
γLL =eLLκ(L)(1− e) + 2(eL)2κ(L)
(1− e)3
γκL =(eκLκ(L) + eL)(1− e) + 2eκeLκ(L)
(1− e)3
By Lemma 6 (given L ≤ Lfull, κ∗(L) ≤ κ(L)), eκ ≥ 0 and eκκ ≥ 0. By a similar reasoning,
eL > 0, eLL > 0, and eκL > 0 (by definition of e∗(·), see Equation 26), therefore γκκ, γLL, and γκL
are all strictly positive. Since γL and γκ are also positive, it implies that the quantities defined
in Equation 52-54 are all strictly positive. Further, given V2(i) > V2(o), ejm(i) > ejm(o) for all
j,m ∈ {κ, L}2. Since e(i) > e(o), this implies that µFjj < 0, j ∈ {κ, l} and µFκL < 0.
Step 3. Using Equation 29 and using the Implicit Function Theorem, we obtain (using cκ(κ) =
c0 + c1κ:
κLL(L)(c1 + µFκ ) = β(−µFκκκL(L)− µFκL(κL(L) + 1)− µFLL)Di,o (55)
By the properties of the equilibrium (partially discriminate) purge breadth, c1 + µFκ > 0. By the
reasoning above and κL(L) > 0 (Proposition 2), the right-hand-side of (55) is strictly positive.
Hence κLL(L) > 0.
Step 4. Using Equation 29 and the definition of κ∗(L), we can rewrite Equation 49 as:
d2B(L)
dL2=κLL(L)eκ + κL(L)eκκ + (κL(L) + 1)eκL + eLL − βκLL(L)(µFκ κL(L) + µFL)W2(i)
− βκL(L)(κLL(L)µFκ + κL(L)µFκκ + (κL(L) + 1)µFκL + µFLL)W2(i)
By Steps 2 and 3, it can be checked that d2B(L)dL2 > 0 as claimed.
Proof of Proposition 4
Existence follows from the fact that B(L) is continuous (Lemma 11) and the maximization problem
is over a compact set [0, L]. We now look at various cases using the computation of the marginal
benefit of violance (Lemma 11) and the assumption that the marginal cost of violence is ζ0 + ζ1L.
Case 1. Suppose at L = Lfull, ζ0 +ζ1L ≥ 1+βDi,o(λ−µF ) (point (i)). Then by Lemma 12, it must
be that ζ0 +ζ1L > B(L) for all L ≥ Lfull. Hence, L∗ < Lfull. Since B(L) is convex over the interval
[0, Lfull], we need to consider two cases at L = Lfull: (a) ζ0 + ζ1L ≥ dB(L)dL
and (b) ζ0 + ζ1L <dB(L)dL
.
In case (a), the solution is unique and equals to either 0 or the unique solution to ζ0 + ζ1L = dB(L)dL
.
In case (b), there may be two solutions to ζ0 + ζ1L = dB(L)dL
, call them LMax ≥ 0 and Lmin with
45
LMax < Lmin and LMax (Lmin) a local maximum (minimum).27 There is also a corner solution at
L = Lfull. The autocrat then compares B(LMax) and B(Lfull) and there generically exists a unique
solution unless B(LMax) = B(Lfull).
For the next case, notice that using Equation 32 and Equation 28, we can rewrite for all L > Lind,
dB(L)
dL=
1
2
1
e
(1 +
c0
c1
− βDi,o
c1
(ri − λ))
+ (1− κ(L))µSLβW2(i)
=1
2
1
e
(1 +
c0
c1
− βDi,o
c1
(ri − µS) + βDi,o
c1
(µS − λ))
+ (1− κ(L))µSLβW2(i)
1
2
1
eβDi,o
c1
(µS − λ) + (1− κ(L))( 1
2e+ µSLβW2(i)
)(56)
Further, by Equation 56 and taking the limits L ↓ Lind, we obtain given 1−κ(L)→ e, e→ v+V2+L,
and µS = λ e(i)e
(slightly abusing notations by using equalities):
dB(L)
dL=
1
2
1
eβDi,o
c1
(µS − λ) +1
2− eλe− e(i)
e2 W2(i)
=1
2
(1 + β
Di,o
ec1
(µS − λ)
)+ βDi,o(λ− µS) (57)
The last line comes from W2(o) = 0.
Case 2. Suppose at L = Lfull, ζ0 +ζ1L < 1+βDi,o(λ−µF ) and that maxL∈(Lind,L]
ζ0 +ζ1L− dB(L)dL
< 0.28
In this case, by Lemma 12, the equilibrium intensity of violence is unique and is the solution to
ζ0 + ζ1L = 1 + βDi,o(λ− µF ) if it exists or L = Lind otherwise.
Case 3. Suppose at L = Lind, ζ0 + ζ1L < maxL∈(Lind,L]
dB(L)dL
. The equilibrium intensity is L = Lind if
the equation ζ0 + ζ1L = dB(L)dL
, with dB(L)dL
defined by Equation 46, has no solution. The equilibrium
intensity is the solution to ζ0 + ζ1L = dB(L)dL
if it is unique. It is either the smallest solution to
ζ0 + ζ1L = dB(L)dL
or L = L if there are multiple solutions.
Finally, notice that the condition of point (ii) of the proposition is is contained in Case 3 (using
Equation 57) so L∗ > Lind.
Lemma 14. If µS < ri at L = Lind (assuming Lind < L), there exists c1(ri, c0) > 1/2 such that the
marginal benefit of violence is strictly positive for L > Lind whenever c1 < c1(ri, c0).
Proof. Using Equation 28 and Equation 32, for all L ≥ Lind, we can rewrite Equation 46 as (recall
W2(o) = 0 so Di,o = W2(i)):
dB(L)
dL=
1
2
1
e
(1 +
c0
c1
− βW2(i)
c1
ri
)+ βW2(i)
( λ
2ec1
+ (1− κ(L))µSL
)(58)
27If there is a unique solution, it is necessarily Lmin, i.e., the local minimum.
28Notice that we do not compute the sign of d2B(L)dL2 for L > Lind. Simulations suggest that the sign is ambiguous.
However, it is not critical for our argument.
46
By assumption, 1 + c0c1− βW2(i)
c1ri > 0 so we focus on the second parenthesis.
λ
2ec1
+ (1− κ(L))µSL =λ
2ec1
− (1− κ(L))(1− λ)v + V2
(v + V2 + L)2
=λ
e
(1
2c1
− (1− λ)1− κ(L)
v + V2 + L
√1− κ(L)
v + V2 + L(v(i) + V2(i))
)
Denote Γ(c1) = 12c1− (1 − λ) 1−κ(L)
v+V2+L
√1−κ(L)
v+V2+L(v(i) + V2(i)). Γ(c1) is strictly decreasing with c1,
strictly negative as c1 → ∞, strictly positive as c1 → 0. Hence there exists a unique solution,
denoted c1(ri, c0;L), such that Γ(c1) > 0 for all c1 < c1(ri, c0;L) (dependence on ri and c0—
and other parameter values—follows from observation of Equation 58). The solution satisfies
c1(ri, c0;L) > 1/2. To see this, notice that√
1−κ(L)
v+V2+L(v(i) + V2(i)) < ei ≤ 1 (by (28) properly
arranged) and 1−κ(L)
v+V2+L= (1− ρ)2 < 1 (using the definition of ρ and (28)). Hence Γ(1/2) > 0.
Define c1(ri, c0) := minL∈[Lind,L]
c1(ri, c0;L). By the reasoning above, c1(ri, c0) > 1/2. It can then
be checked that there exists a non-empty set of parameter values such that c1 < c1(ri, c0) and
Equation 5 are simultaneously satisfied.
Proof of Corollary 2
We just prove necessity. Sufficiency follows from the usual argument.
Denote ri := µS(κ(L), L). Using the proof of Proposition 2 and µS(·) decreasing with L (Lemma
9), if ri ≤ ri, Lind = L for all c0, c1 and a purge is never semi-indiscriminate. So ri > ri is a
necessary condition. This is condition 1.
Supposing condition 1. holds, define c0(ri) = β[ri − µS(κ(L), L)]Di,o evaluated at L = L. If
c0 ≥ c0(ri) then for all c1 > 0, the purge cannot be semi-indiscriminate as the marginal cost is
always greater than the marginal benefit. Given c0 < c0(ri), define c1(ri, c0) such that at L = L,
c0 + c1κ(L) = β[ri − µS(κ(L), L)]Di,o. Similarly, if c1 ≥ c1(ri, c0), a purge can never be semi-
indiscriminate. Using Lemma 14, denote c1(ri, c0) = min{c1(ri, c0), c1(ri, c0)} (assuming the upper
bound in Equation 5 does not bind, otherwise the condition can be arranged appropriately). This
is condition 2.
Finally define ζ0(ri) := maxL∈[Lind,L]
dB(L)dL
(by Lemma 14 and condition 2 above dB(L)dL
> 0 over this
range). If ζ0 ≥ ζ0(ri), for all ζ1 > 0, the marginal cost of investing in violence is such that
the autocrat never chooses L > Lind. Assuming dB(L)dL
is weakly decreasing over [Lind, L], denote
ζ1(ri, ζ0) :=dB(L)dL−ζ0
Lat L = Lind. If ζ1 ≥ ζ1(ri, ζ0), then the equilibrium intensity of violence
satisfies L∗ ≤ Lind and the purge is not semi-indiscriminate. If dB(L)dL
is not weakly decreasing
47
over [Lind, L], define L0 := arg maxL∈[Lind,L]
dB(L)dL−ζ0
L(assuming uniqueness). ζ1(ri, ζ0) then must satisfy
ζ1(ri, ζ0) ≤dB(L)dL−ζ0
Lat L = L0 and B(L)− ζ(L) ≥ B(Lind)− ζ(Lind). This is condition 3.
Proof of Proposition 5
The procedure is as such. Step 1: Pick (λ′, r′i, b′, c′0, ζ
′0) ∈ [0, 1]× [0, 1]×R3
+. Step 2: Check whether
there exists cd1 satisfying Equation 5 and ζd1 ∈ R+ such that (i) there exists a local maximum of
B(L)−ζ(L) in [0, Lfull], denoted LMax as in Proposition 4 and (ii) B(LMax) = B(Lfull) (notice that
cd1 and ζd1 are unique if they exist). Step 3: If conditions (i) and (ii) hold then (λ′, r′i, b′, c′0, ζ
′0) ∈ Pd,
if not (λ′, r′i, b′, c′0, ζ
′0) /∈ Pd. Repeat the steps for all possible (λ, ri, b, c0, ζ0). Pd is non-empty as
we can always pick c1 such that a fully discriminate purge is possible and ζ0 and ζ1 such that
conditions (i) and (ii) hold by convexity of the marginal benefit (Lemma 13). Pd is not measure 0
as we can always perturbate the parameters slightly and adjust ζ0 and ζ1. Due to the convexity of
the marginal benefit of violence and conditions (i) and (ii), the claim holds directly.29
Remark 1. Supposer there exist multiple solutions to c0 + c1κ = β[ri − µF (el(κ, L))]W2(i) for
L ≥ L, L < Lm. Then B(L) is not continuous in L, but there exists a generically unique equilibrium
intensity of violence.
Proof. Denote κ1(L) the minimum solution to c0 + c1κ = β[ri−µF (el(κ, L))]W2(i). By the proof of
Lemma 3, for all L < L, κ∗(L) = κ1(L) < κ(L), but there exists L1 > L such that for all L ∈ [L, L1],
κ∗(L) = κ(L). Hence, the purge breadth and, consequently, B(L) are not continuous in L. Further,
limL↑L
B(L) < B(L). It can be checked that all the other properties of B(L) described in Lemmas
11-14, Propositions 4 and 5, Corollary 2 hold. This implies that all proofs carry through, but the
proof of existence and uniqueness of an equilibrium intensity of violence. It needs to be amended as
such. Suppose there exists L′ such that ζ0 + ζ1L′ = dB(L′)
dLand L′ < L. If there exists L′′ > L such
that ζ0 + ζ1L′′ = dB(L′′)
dL, then the equilibrium intensity of violence satisfies L∗ = arg max
L∈{L′,L′′}B(L)
and is generically unique. If there is no such L′′, then the equilibrium intensity of violence satisfies
L∗ = arg maxL∈{L′,L}
B(L) and is generically unique.
29It is important to observe that for all (λ, ri, b, c0, ζ0) ∈ Pd, the condition described in the text of the proposition
is knife-edge. However, the properties of Pd indicate that this knife edge condition can arise for a non-trivial set of
parameter values.
48