Multi Party

download Multi Party

of 18

Transcript of Multi Party

  • 8/12/2019 Multi Party

    1/18

    American Political Science Review Vol. 93, No. 1 March 1999

    Statistical Model for Multiparty Electoral ataJONATHAN N KATZ University of ChicagoGARY KING Haward University

    propose a com prehensive statistical mod el for an alyzing multiparty, district-level elections. Th ismod el, which provides a tool for comparative politics research analogous to that w hich regressionanalysis provides in the American two-party context, ca n be used to explain or predict ho wgeographic distributions of electoral results depend upon economic conditions, neighborhood ethniccompositions, campaign spending, and other features of th e election camp aign or aggregate areas. W e alsoprovide new graphical representations for data exploration, mod el eva luation, and substantive interpretation.W e illustrate the use of this m ode l by attempting to resolve a controversy over the size of and trend in theelectoral advantage of incum benc y in Britain. Contraiy to previous analyses, all based o n measures no wkno wn to be biased, we demonstrate that the advantage is small bu t meaningfkl, varies substantially acrossthe parties, and is not growing. Finally, we show h ow to estimate the party from which each party s advantageis predominantly drawn.

    propose the first internally consistent statis-tical model for analyzing multiparty, district-level aggregate election data. Our model canbe applied directly to explain or predict how thegeographic distribution of electoral results dependsupon economic conditions, neighborhood ethnic com-positions, campaign spending, or other features of theelection campaign or characteristics of the aggregateareas. We also provide several new graphical represen-tations for help in data exploration, model evaluation,and substantive interpretation.1Our general model is intended to address threeserious lacunae in the study of comparative politics.First, most literatures focusing on non-American elec-tions are dominated by survey data alone rather thanalso including studies of real election results.2 SurveyJonathan Katz is Assistant Professor of Political Science, Universityof Chicago, 5828 South University Avenue, Chicago, IL 60637([email protected]). Gary King is Professor of Government,Harvard University, Cambridge, MA 02138 ([email protected],http://GKing.Harvard.edu).

    n earlier version of this paper was presented at the 1997 meetingsof the Midwest Political Science Association, Chicago, and won thePi Sigma lpha Award for the best paper presented there. Ourthanks to Jim Alt, Larry Bartels, Neal Beck, Gary Cox, Nick Cox, MoFiorina, M. F. Fuller, Dave Grether, Mike Herron, James Honaker,Chuanhai Liu, Ken Scheve, Ken Shepsle, and Bob Sherman forhelpful suggestions; Josh Tucker for useful suggestions and researchassistance; Gary Cox for his British election data; and Selina Chenfor her help and expertise in collecting additional British data. BurtMonroe saw the virtues of the compositional data analysis literatureat essentially the same time as we did, and we appreciate hiscomments. For research support, Jonathan N. Katz thanks theHaynes Foundation, and Gary King thanks the National ScienceFoundation (SBR-9729884), the Centers for Disease Control andPrevention (Division of Diabetes Translation), the National Insti-tutes on Aging, the World Health Organization, and the GlobalForum for Health Research. All information, data, and softwarenecessary to replicate the results in this article are available in areplication data set to be deposited in the ICPSR's PublicationRelated Archive upon publication (ICPSR PRA 1190).Our model can also be used to evaluate features of electoralsystems, such as whether the districting system is fair to all thepolitical parties and electorally responsive, although we leave detailsof this task to future work.A few examples of the good survey analyses conducted in multi-party democracies include those in England (e.g., Goodhart and

    research has enormous advantages for studying indi-vidual-level preferences, but as analyses of randomselections of isolated individuals from unknown geo-graphical locations, they necessarily miss much ofelectoral politics. As such, they are often best comple-mented with studies of aggregate electoral returns.Second, with surprisingly few exceptions, electoralanalyses in comparative politics based on real electionreturns use national rather than regional, district, orprecinct-level data.3 This approach has the advantageof allowing more countries to be included in theanalysis without much data collection effort, but it alsohas serious disadvantages. Aggregate national-levelstudies prevent researchers from learning where votescome from and why, and they generally result in studiesbased on small numbers of observations and littlevariation on many relevant dimensions. Studies ofpostwar OECD countries usually contain only about adozen observations see Paldam 1991,18, Table I), andanalvses of former communist countries could includeonlithree or four free elections. This is often insuffi-cient information with which to parse out many of theinteresting effects, and it ignores the substantial infor-mation content in the often vast differences acrossdifferent regions of a country.Finally, the vast majority of electoral studies inmultiparty democracies dichotomize the electoral sys-tem into a pseudo-two-party contest. Researchers an-alyze the vote for the incumbent party versus all othersgrouped together, or the vote for a-particular group,such as left-wing parties, versus the combination of allBhansali 1970), Mexico (Dominguez and McCann 1996), Poland(Przeworski 1996), Peru (Stokes 1996), Russia (White et al. 1997),Denmark (Miller and Listhaug 1985), Italy (Bellucci 1984), and WestGermany (Frey and Schneider 1980) as well as multicountry studies(Lewis-Beck 1988), among many others.Such national-level studies have used data from France (Rosa andAmson 1976), Japan (Inoguchi 1980), England (Whiteley 1980), andItaly (Bellucci 1984); time-series of cross-sectional data from multi-ple countries (Host and Paldam 1990; Paldam 1986,1991; Powell andWhitten 1993); and single cross-sections of multiple countries(Lewis-Beck and Bellucci 1982), among many others. Subnationalanalyses include Bellucci (1984, 1991), Conford, Dorling, and Tether(1995), Rattinger (1991), Slider (1994).

  • 8/12/2019 Multi Party

    2/18

    A Statistical Model for Multiparty Electoral Data March 1999

    the others. This procedure has the advantage of en-abling the use of standard statistical methods, but sincethese methods are best applied to the study of two-party systems (largely in U.S. data), two serious prob-lems result: bias and information loss. The procedure isbiased whenever all parties do not field candidates inevery election district. For example, even when thegoverning party contests every election, different num-bers of parties composing the other category willgenerally have large effects on a variable such as thepercentage of the vote for the governing party. Becausethe vote a party expects to receive will normally berelated to whether it runs a candidate, the observedvariable will systematically overstate the true underly-ing support for the governing party when its truesupport is highest. For example, when the governingparty is expected to do so well that it scares offopposition parties from running their own candidates,the fraction of the vote actually received by the gov-erning party will exceed its true support in the elector-ate. This, and other similar problems, can combine toinduce severe bias in inferences based on such data.Moreover, even if partially contested elections hap-pen to cause no bias in a particular case, importantinformation, critical to comparative politics, is alwayslost by these methods. For example, when the eco-nomic pain caused by promarket reforms in postcom-munist countries results in the reformers being thrownout of office (Przeworski 1991), or when the increasingsalience of ethnic divisions upsets the political order(Horowitz 1985, 1993; Offe 1992; Tucker 1996), whichparties benefit? How do the electoral fortunes of eachof the parties depend on the degree of economichardship or ethnic divisions? Answering these ques-tions about multiparty systems requires statistical mod-els that permit multiparty outcomes. Shoehorning acomplex multiparty democracy into a fake two-partysystem in order to perform an analysis that looks likethose conducted in American politics takes the wronglessons from that subfield. Making methodologicaldecisions merely to accommodate the requirements offamiliar statistical methods risks missing the mostdistinctive and interesting aspects of the electoralsystem under analysis. The bottom line is that multi-party systems require the development of multipartystatistical models. It would appear that much substan-tive knowledge can be gained, and bias reduced, bydesigning models of electoral systems with the specialfeatures of these systems in mind.

    Although we intend our model to be applicable to awide variety of multiparty electoral data, we apply ithere to resolve one important scholarly controversy:the size of and trend in the electoral advantage ofincumbency in the United Kingdom. For decades, theconventional wisdom has been that U.K. incumbencyadvantage is small to nonexistent and not increasing,but this conclusion has come under strong attackrecently by researchers whose results seem to show thatincumbency advantage is moderate to large and grow-ing fast. Unfortunately, it turns out that all estimatesgiven in the literature are based on measures now

    known to be biased. In partial agreement and disagree-ment with the substantive results from both sides, wedemonstrate that the incumbency advantage is smallbut markedly different for each of Britain's three majorparties. Our methods also provide information thatothers have not attempted to estimate, such as theparty from which each party's incumbency advantage isprimarily drawn. The study of incumbency advantage inthe United States has been greatly enhanced by aquarter-century of scholarly work that has increasedthe precision and accuracy of estimates in this two-party system, and we hope a similar gain will result inelectoral studies of multiparty democracies, such as theUnited Kingdom.From a methodological perspective, analyses of ag-gregate electoral data fall into two fundamental cate-gories that should be carefully distinguished-contex-tual effects and ecological inferences. Researchquestions about the relationships; among aggregatevariables require a model of contextual effects, such asthat offered here (or, e.g., Huckfeldt and Sprague1993). In contrast, research questions about the char-acteristics of the individuals who make up aggregateelectoral data require ecological inferences and thespecial models designed for this purpose (see King1997). For example, a study of the effect on the vote formore liberal parties of having a college in town is acontextual effect, for which the model we propose isdirectly useful. In contrast, using aggregate electoraldata to study whether college students are more likelythan others to vote for liberal political parties requiresan ecological inference. Our statistical model applies toquestions at the district level, such as incumbencyadvantage estimates or predicting which candidate willwin. Formal models of individual behavior may be ofinterest for some purposes, but they are not alwaysnecessary in cases like these. For much of our discus-sion, we make the common assumption that candidatesare strategic and well informed. Our working (testable)assumption about voters is that, conditional on thecandidates, their voting behavior follows regular pat-terns of some sort.In the sections that follow, we briefly describe ourmotivating substantive problem, discuss the more gen-eral characteristics of multiparty electoral data, sum-marize the problems that need to be resolved in orderto develop a general statistical model for these data,introduce a simple version of the model for cases inwhich all parties contest, introduce assumptions to dealwith partially contested elections, and show how toestimate the model and compute quantities of interestfrom these estimates. We present substantive results asthe model is developed.

    INCUMBENCY DV NT GE INGRE T BRIT INUntil the 1980s, scholars generally agreed that Britishelections were decided by national and not local forces.The electoral advantage of incumbency was thought to

  • 8/12/2019 Multi Party

    3/18

    American Political Science Review Vol. 93. No. 1

    be essentially nonexistent in Britain.4 For example,Butler and Stokes (1969, 6) repeatedly emphasize theimportance of national political issues and events asopposed to more local influences on the choice of theindividual elector. They even go so far as to conclude(p. 8) that so important are the [national] parties ingiving meaning to contests in the individual parliamen-tary constituencies in Britain that for many voterscandidates have no identity other then their partisanone (see also Butler and Kavanagh 1980, 292).The few numerical estimates of incumbency advan-tage in Britain come from a newer liteiature thatcontradicts this conventional wisdom. For example,Curtice and Steed (1980, 1983) find the sophomoresurge for Labour-the average difference in the votefor Labour in open seats it wins and the vote in thesubsequent election for the now-incumbent Labourparty candidate-was about 1,500 votes (about 3.8%)in the 1979 and 1983 elections. Norton and Wood(1990) modify sophomore surge by correcting for re-gional swings and find a surge of 1.6% for the Conser-vatives and a remarkable 7.8% for Labour. Finally,Wood and Norton (1992), along with the prominentsurvey-based analyses of Cain, Ferejohn, and Fiorina(1987), also strongly argue for the proposition thatincumbency advantage is increasing.Although the idea of sophomore surge, on whichmost measures are based, is very intuitive, Gelman andKing (1990) prove that it gives biased estimates of thecausal effect of incumbency. In fact, in Britain, theproblem may actually be worse than in two-partysystems: Because most British elections have very fewsophomores (usually only about two dozen), the mea-sure discards more than 95% of the district observa-tions and is therefore exceptionally inefficient. More-over, the measure is usually applied without controlsand without any feature of the statistical model whichrecognizes that the system being analyzed has morethan two parties.Thus, we have .on one side the conventional wisdom,based on many years of traditional analyses, thatincumbents have no electoral advantage. On the otherside, we have a growing systematic quantitative litera-ture which argues that the incumbency advantage ismoderate to large and steadily growing. We hope toresolve this scholarly dispute.Our data for this article include constituency-levelelection results from England for 1959 to 1992. We alsohave more limited data from the 1955 elections, whichwe use whenever possible. Geographic districts inEngland are called constituencies, but we use the twowords interchangeably because our model appliesmore generally. For convenience, we usually refer to

    The concept of the personal vote is used in Britain to include anylocal, candidate-specific effects, but for empirical analyses it isnormally treated synonymously with incumbency advantage. In theUnited States the personal vote is considered to be the fraction of theincumbent's advantage attributable to the person rather than theparty. For this article, we use only the total effect and refer to it as theincumbency advantage, as is most common in the Americanpolitical science literature (see Gelman and King 1990 for a formaldefinition).

    the Liberal Party and its alliance with the SocialDemocratic Party more simply as the Alliance.5

    CH R CTERISTICS OF MULTIP RTY D TWe now identify the statistically important character-istics of multiparty electoral data. We first do so insimple algebraic terms and then translate the algebrainto a useful graphic display.Let Vij denote the proportion of the yote (theunderline denotes our mnemonic labeling convention)in district i (i = 1, . . n) for party j j 1, J .Two fundamental features of multiparty voting dataare that each proportion falls within the unit interval

    V-,. E [0, 11 for all i and j, 1)and the set of vote proportions for all the parties in adistrict sum to one:

    Vij= 1 for all i. 2)j=1

    Thus, an important criterion of a good (and logicallypossible) statistical model of multiparty voting data isthat it satisfies the constraints in equations 1 and 2.Variables that meet these constraints fall in a regiongenerally referred to as the simplexWe now illustrate this simplex sample space graph-ically for the two- and then the three-party case. Foreach case, we apply a simple trick to reduce the numberof dimensions required, making the graphical presen-tation more manageable and ultimately informative,without losing information. The graphic version ofthese relationships will also be useful for exploring thedata and understanding the model fit.For the two-party example, we use ViD for theDemocratic and ViR for the Republican shares of thevote in district i for candidates for the U.S. House ofRepresentatives. Obviously, we can easily representboth variables by just one, say, ViD, since the other ismerely ViR = 1 - ViD. Figure 1A plots ViD by ViR.Because of the constraint in equation 2, all district votefractions fall on a single line segment, and due to theconstraint in equation 1, the line ends at the axes. Thus,all the points in the two-dimensional plane in Figure1A fall on a simpler one-dimensional line segment.Presenting this line segment in Figure 1B reduces theproblem from two to one dimension without losing any

    information.Figure 2 provides analogous information for threeparties. Figure 2A plots in three dimensions the threevariables from the British electoral system, Vie, ViL,and Vd, for the Conservative, Labour, and Alliancevote proportions, respectively. The constraints in equa-We began with the data set British General Elections, 1955-1992,(version 8, August 1993), constructed by D.F.L. Dorling of theUniversity of Newcastle, extracted data from England only, updatedit, and added information on incumbency status. The number ofobservations in our data for each election year are: 1955, 460; 1959,460; 1964 , 455; 196 6, 488; 1970 , 471; 197 4 (Feb.), 463; 1974 (Oct.),491; 19 79, 467; 1983, 491; 1987, 522; and 1992, 521.

  • 8/12/2019 Multi Party

    4/18

    A Statistical Model for Multiparty Electoral Data March 1999

    FIGUR 1 The Sim~lexor Two PartiesA 2 Parties 2 Dimensions

    0 .25 5 75ViDB. 2 Parties 1 Dimension

    Mote: This figure explains graphically how to reduce two vote variablesto one dimension. Graph A portrays the familiar relationship betweenthe vote for the Republicans V,,) and Democrats V,,) in the U Stwo-party system; because of the constraints of equations 1 and 2 allsoints fall on a line segment and can be portrayed more simply asBraph B.

    tion 2 imply that valid points must fall on the planecutting through the three-dimensional space. The con-straints in equation require this plane to end at eachof the three axes. The resulting area that satisfies bothconstraints is the equilateral triangle shaded in A.Because all the points in three dimensions fall in atwo-dimensional area, we can save space by presentingthe triangle alone in two dimensions, which we do inFigure 2B. This graph is a version of a tern ry diagram(or trilinear, triaxial, or barycentric plot; see Upton1989, 1994), to which we have added several newfeatures. In this triangle, each dot fully characterizes asingle constituency result from the 1979 British elec-

    tion. Roughly speaking, the closer a dot is to a vertex(with a party's label, C, L, or A), the higher is the votetotal for that party; more precisely, a vote total for aparty equals the perpendicular distance from the sideof the triangle opposite to the labeled vertex, ascalibrated on the scales we have added. That is. thevertical positions of the dots in the figure indicate thevalues of Vic, as indicated on the scale on the left. Asa dot falls farther from the side opposite the L vertex,the larger is the 6 variable (as can be seen bycomparing the point to the scale at the top).For exam~le. n addition to the real data. we haveadded one (hypothetical) election result as a small boxin the bottom left of Figure 2B. In order to clarify howto read the voting results in this district. we addeddashed lines conn&ting this point to the three axes. Inthis district, the Conservatives received 25 of thevote, as can be seen by following the dashed line fromthe box to the left axis that calibrates V . The dashedline traces the shortest distance from 'the axis to thepoint, that is, a line perpendicular to the axis. The samedistrict also gave 25 of its vote to the Labour Party(see the dashed line that heads northward to the Vaxis) and half its votes to the Alliance (as can bedetermined from the dashed line that heads down tothe right to meet the V axis). The electoral results forall the real districts, represented by dots, also can beread by tracing out perpendicular lines to each axis.With all this precision available when needed, it is stillworth remembering the easier rough way to interpretthis ternary diagram: The closer a point is to a vertex ofthe triangle, the more votes that district gave to theparty whose label appears there.We have removed most of the sides of the triangle inorder to make visible districts with zero votes for one ofthe parties. For example, districts uncontested by theAlliance fall on the right side of the triangle, where thebottom axis reveals that V 0. Substantively, thesepartially contested districts appear to be generated by adifferent process than the mass of (fully contested)districts that fall inside the triangle. This can be seensince the distribution of points does not gradually getsmaller (or larger) as it approaches the side of thetriangle; instead, there appears to be an area withoutdots, indicating every party that merely appears on theballot receives at least 15-20 of the vote.We have also added lines that divide the triangle intothirds. We call these win lines, since they indicatewhich party wins, depending on the region in which apoint falls. For example, if a point falls in the region atthe top of the graph, the Conservatives win a pluralityof the votes and (by the electoral rules) the seat for thatconstituency. Points that fall within the left region arewins for the Alliance, and those in the right go to theLabour Party. The same logic applies to multipartyelections with > parties, even though graphicaldisplays become more unwieldy.PRO LEMS TO RESOLVEStandard regression-type models applied to multipartyelectoral data usually generate nonsensical results. For

  • 8/12/2019 Multi Party

    5/18

    American Political Science Review Vol. 93, No. 1

    FIGURE 2 The Simplex for Three PartiesA 3 Parties 3 Dimensions B. 3 Parties 2 Dimensions

    Note: In a manner analogous to Figure 1, this figure reduces three vote variables to two dimensions. GraphA portrays he relationship among the votesfor the Conservative V,/,,), abour (V,), and Alliance (V parties; because of the constraints of equations1 and 2, all points all on an equilateral rianglethat is the intersection of a plane with the three dimensional figure. Graph B portrays this more simply in two dimensions in a version of what is knownas a ternary diagram. Values of the three variables can be read by where the dots fall perpendicular to the three numbered axes. The litt le square point(with dotted lines referencing he axes) is the example discussed in the text.

    example, one common approach is to use the vote foreach party as a dependent variable (fraction for theConservatives, fraction for the Labour Party, etc.) andto regress each on a set of explanatory variables. TheseJ regressions are run separately, or via a seeminglyunrelated system of equations. Since neither con-straint from equations 1 and is satisfied, this ap-proach generally fails to give sensible results. That is,the results often imply that some parties will get fewerthan zero votes, or that the sum of votes for all partieswill be greater or less than 100%. Moreover, even whenpoint predictions happen to fall within the constraintsof the simplex, the full probabilistic implications of themodel are virtually always logically impossible, as someof the predictive density always falls outside the sim-plex. Some of the few who recognize this problemtransform j to an unbounded scale (separately foreach party j , such as with a logistic function, and thenapply separate or seeming unrelated regressions, butthis, too, is insufficient: The results will satisfy equation1 but not equation 2. Similarly, running only J 1regressions and computing the predictions for Vi romthe others satisfies equation but not equation 1.Various other ad hoc approaches can be taken tocorrect different parts of the problem, but especially

    because computing most quantities of interest requiresthe full probabilistic model, we decided to pursue amore general approach.The model we develop can be considered a general-ization of two independent lines of statistical research.The first line includes models for compositional data(Aitchison 1986), a term that describes data sets withmultiple outcome variables that sum to unity for eachobservation. A few of the many examples of composi-tional data from other fields include soil samples ingeology and pedology (with measurements of the frac-tions of sand, silt, and clay), rock samples in geochem-istry (with fractions of alkali, Fe203, and M,O), andblood measurements in biology (proportions of whiteblood cell types measured for each patient). Composi-tional data are also common in political science andeconomics (as in multiparty voting data, the allocationof ministerial portfolios among political parties, tradeflows or international conflict directed from each na-tion to several others, or proportions of budget expen-ditures in each of several categories), but researchers inthese fields have not taken advantage of the connectionto this more general statistical approach. That is un-fortunate, because compositional data seem to beclosely related to the raison d'&tre of political science

  • 8/12/2019 Multi Party

    6/18

    A Statistical Model for Multiparty Electoral Data March 1999

    research: If politics is the authoritative allocation ofresources, then fractions of resources received by eachgroup are exactly compositional data.The key contribution of this literature is statisticalmodels that allow only possible outcomes to occur.That is, predictions or simulations from such a modelsatisfy equations 1 and 2 or, in other words, havepositive density only over the simplex. The most influ-ential models of compositional data are due primarilyto Aitchison (1986), who criticized earlier modelsbased on Dirichlet distributions (see citations in Aitchi-son 1986, 61-2), since those models require the ratiosof compositions (votes for each party in our applica-tion) to be independent. Aitchison avoided this unre-alistic assumption by applying the normal distributionto the log-ratios of the individual components. Thisprocedure starts with the multivariate normal fit to theunconstrained real plane and then maps it into thesimplex via the multivariate logistic transformation.This works in the same way as, for example, thelog-normal maps the real number line onto the positivereal numbers.

    The second line of research that we generalize aremodels of votes and seats for two-party systems(Gelman and King 1991, 1994a, 1994b; King 1989a;King and Browning 1987; King and Gelman 1991) andfor multiparty systems (King 1990). Like compositionalmodels, some of these vote and seat models alsotransform votes (in different ways) using logistic trans-.formations and then stochastically model the trans-formed variables. The resulting statistical models differin a variety of ways, but they also constrain the result tothe proper sample space so that equations 1 and 2 aresatisfied.

    Unfortunately, the models from neither line of re-search will work without modification for multipartyvoting data. One problem is that our extensive evalu-ations of the assumptions of normality underlying themodels proposed for compositional data (which wepresent belowj indicate that they do not fit real electiondata. Another problem is that these models also do notcapture a fundamental feature of voting data: thepattern of missing data that occurs when at least oneparty receives zero votes in a district, as when itpresents no candidate for election in a district. Zerovote totals in electoral data constitute politically crucialinformation and therefore must be treated very differ-ently from examples in which zeros entries are consid-ered indicators of missing data, due to slight measure-ment error (as when instruments for measuring thecompositions of soil samples miss the always-presenttraces of some elements). The models discussed abovefor analyses of seats and votes fit two-party systems welland (King 1990) they can fit multiparty data on seatsgiven votes, and some of these include special featuresfor uncontested districts, but they are not directlyapplicable to explaining or predicting multiparty elec-toral data.Thus, a proper model of multiparty voting data musthave the following special features. It must have posi-

    tive density only over the simplex and must use adistribution more flexible than the multivariate normal,which does not fit real voting data. It also must allowcovariates (explanatory variables). Below, we providethis basic model for fully contested district elections.The model also must provide special features to dealwith uncontested and partially contested seats, whichwe do in the subsequent section. The complete likeli-hood function is then given.Finally, a proper model must allow for estimates ofprecisely the quantities of scholarly interest, and thesemay differ across applications. That is, we should nothave to teach readers to interpret the arcane results ofstatistical models; rather, the models should be modi-fied to produce results in the form of most naturalinterest to substantively oriented political scientists.For example, the raw coefficients estimated by modelsof compositional data are not the quantities of interestfor any political or, indeed, virtuallx any nonpoliticalapplication. We use methods of simulation, describedbelow, to compute estimates of a wide range of theo-retically interesting quantities. These methods are crit-ical to political science applications of this new model.

    We also believe the methods will enable those in otherscholarly disciplines, who use somewhat related modelsfor very different purposes, to compute numericalquantities of more interest to their research than theusual results of compositional data models.

    THE B SIC MODEL FOR FULLYCONTESTED ELECTIONSIn this section, we only consider district elections inwhich all parties contest and every party gets at leastone vote: Vij E (0, 1) for all i and j. Because, inpractice, no officially registered candidate who appearson the ballot ever gets fewer than 15-20% of the votein our data, the only real assumption here is that allparties contest all district elections. We generalize thismodel to deal with partially contested elections in thenext section.Let Vi = (Vil, . Vi(j-,)) be a (J - 1) X 1vectorfor each district (i = 1, n . This vector containsall the information in the individual vote fractions,since the votes for party J can be computed determin-istically from the others:

    The model we are about to propose is symmetric inthe sense that changing the party labeled J does notaffect anything of substantive importance.Aitchison (1986) proposes that compositional datalike Vi be modeled with his additive logistic normaldistribution. This distribution can be formed as follows.First let Yi be the vector of J - 1 log-ratios Yij =ln(VijlViJ), for party j j= 1 . J - 1 relative toparty J . Then assume that the (J 1) X 1vector Yi(Yil, Yi(,-,)) is multivariate normal with mean

  • 8/12/2019 Multi Party

    7/18

    American Political Science Review Vol. 93, No.

    vector p and variance matrix Z. To get to the observedvotes, use the multivariate logistic transformation:

    Although compositional data analysts have foundthis specification to be useful for their applications, wedemonstrate below that it is inappropriate for multi-party voting data. In our data, a majority of districtstend to be more highly clustered, and a minority muchmore widely dispersed, than the multivariate normalimplies.Political scientists modeling seats and votes haveavoided this distributional problem by combining mix-tures of independent normals and appropriately cho-sen covariates, but these solutions are insufficient for ageneral approach to multiparty voting data.We now derive a new model that solves theseproblems. We label the distribution the additive logistictudent t (LT) distribution, which we demonstrate issuperior when used to fit political data to the additivelogistic normal, which it includes as a limiting specialcase. To derive the LT distribution, first let i emultivariate Student t (Johnson and Kotz 1972) andthen apply the transformation in equation 4:

    T[ln(Vi/Vir>l i, Z]lnJ-l Viij 1

    where the extra factor in the denominator is theJacobian of the transformation (required when creat-ing a new distribution from an existing one through adeterministic transformation), the expected value andvariance of Y . are pq and 2v/(v-2), and v (v > 0 is thedegrees of freedom parameter. The (J 1) X (J 1)parameter Z is known as the scatter matrix. Thisdistribution happens to be equivalent to the predictivedistribution, under certain conditions, when using theadditive logistic normal (see Aitchison 1986, 174).This model differs from the additive logistic normalwhen v < w and it differs more the smaller is v Wefind in practice that our estimates of v are fairly small,and thus the LT distribution differs significantly fromthe additive logistic normal.6 For example, Figure 3gives two ternary diagrams with normal (for A and t(for B) confidence regions fit to real electoral data

    SO that we consider only cases in which the moments exist on thelogistic scale, we impose the technical restriction that v > 2 Thisassumption, while not necessary for our model or estimator (sincethe moments of the additive logistic are always finite), does makeestimation and simulation simpler. Given that our estimates of v stayfar from the boundary (even when permitted to do otherwise), thistechnical assumption is unambiguously supported by our data.

    FIGURE 3. The Fit of the Logistic Normaland Logistic Distribtions

    A Normal Based Confidence Regions

    B Based Confidence Regions

    Note: Both graphs give a ternary diagram for 1970 British House ofCommons electoral data, with uncontested districts deleted, and 50and 95 confidence regions based on the additive logistic normal in A)and additive logistic t in (B).The better fit to the t distribution is indicatedby the approximately 50 and 95 of the constituencies hat fall withinthe 50 and 95 confidence region, respectively, for thet distribution,but only 64 and 91 for the normal. Note also how the inner regionis much narrower, and the outer region is wider, for the t than thenormal.

  • 8/12/2019 Multi Party

    8/18

    A Statistical Model for Multiparty Electoral Data March 1999

    (from 1970). For both, the inner loop is the 50%confidence region; if the model is appropriate, half thepoints should fall within it. In fact, 66.7% fall within thenormal-based region, whereas 48.5% fall within thet-based region. A similar situation, but slightly lessextreme, holds for the 95% confidence region, which isthe outer loop in both graphs. (Because of the largenumber of constituencies, the figure is more useful forunderstanding the differences in how the two models fitthese data and the nature of confidence regions on thesimplex, rather than making it easy to count pointswithin each region.) This demonstrate~s clearly theadvantage of the distribution for British electoraldata.7When v is sufficiently arge, the normal and distri-butions are identical. This means that our generaliza-tion has great potential benefits, because it fits a muchwider range of data more common in multiparty de-mocracies, and it is also essentially costless (i.e., exceptfor the trivial efficiency loss caused by estimating theextra degrees of freedom parameter). Given this riskprofile, there seems little reason not to use this moregeneral model.

    The generalization that the additive logistic t pro-vides would be traditionally described (by using thegeneral textbook description for t-based distributions)as allowing for fatter tailsn-a small number of con-stituencies surprisingly (according to the normal) farfrom the center of the distribution. This description isaccurate, but perhaps a more informative characteriza-'tion is the other half of the story: When v is small, mostof the constituencies are surprisingly (according to thenormal) heavily clustered together (compare the innerconfidence region in the two graphs in Figure 3). Thatis, what the traditional description of t-based distribu-tions misses is that when is small, a t distribution withthe same variance as a normal has both fat tails andheavier cluster around the mode. The two featuresmust exist simultaneously to counterbalance eachother, in order that the result is a proper distribution.Our reason for also emphasizing the heavy cluster isthat this describes more of the points than focusing onthe relatively small number of outliers in the tails.As can be seen by the counts of districts within theconfidence regions in Figure 3, the additive logistic tmodel fits the data better than the additive logisticnormal. The substantive reason is that most constitu-encies in England have vote fractions that are verysimilar to one another, but a smaller set of constituen-cies are quite far from this main cluster.In Figure 4 we present summaries of the fit of thetwo distributions for all the elections in our data. Itshows for each election year the percentage of districtsthat fall within the 50%, 80%, and 95% confidenceregions. Figure 4A shows that for the additive logisticnormal model, the actual fractions of points withineach of these regions (indicated by solid lines) vary

    Th e fit of the model could b e closer to a normal after conditioningon explanatory variables, but our studies indicate that this is notusually the case with our multiparty data. W e present the simple casein Figure 3, without covariates, for ease of presentation.

    quite a distance from the theoretically correct (straightdotted) lines. (For visual clarity, the solid lines connectthe points at the elections, where the estimation wasactually conducted.) In contrast, the actual and theo-retical values are very close for the additive logistic t,portrayed in Figure 4B. The normal seems to fit betterfor more recent elections than it once did, but there isno reason to think that this trend will continue.For applications, we let the means of the log-ratiosbe linear functions of vectors of explanatory variables:where i j is a p 1 vector of explanatory variables,and p, are parameters to be estimated. For mostapplications the explanatory variables will be the samefor allj but this is not required. The parameters pj, 2,and are of little direct interest, but we show belowhow to compute quantities of interest from them.

    SSUMPTIONS FOR P RTLY CONTESTEDDISTRICTSWe now introduce methods of generalizing the basicmodel to allow for districts in which some parties donot contest the outcome.We follow King and Gelman (1991) by setting as thegoal of estimation the effective vote-values of V hatwe would observe if all J parties contested the electionin district i. In districts with all parties contesting, theeffective vote is the observed vote. In partly contesteddistricts, the effective vote for all parties is unobservedbut can be estimated. (That is, in districts in which anyparty chooses not to contest, we lose information aboutthe effective vote proportion for all parties, since thosethat contest might get different vote fractions if theywere to face more competition.)

    The effective vote concept covers all nationalpolitical parties, even if they do not contest all elec-tions. We distinguish regional parties and do not try toestimate strained counterfactuals, such as what wouldhappen if the Scottish nationalists ran in Englishconstituencies. Regional parties are easy to include inour model, but for expository purposes we omit thisissue here.In order to analyze the effective vote in districts notfully contested, some assumptions must be made. Weintroduce several designed for electoral systems inwhich the candidates and parties decide for themselveswhether to contest a district election.8 A reasonableassumption under these circumstances is that a non-contesting party would not have won if it had nomi-nated a candidate. After all, if they would have won,they probably would have nominated someone in the

    Other assumptions would be necessary when, for example, aprogovernment election commission prevents opposition partiesfrom entering a race because they might win, as occurs in somefledgling Eastern Euro pean democracies. Similarly, when a smallparty makes a d eal with a larger party not to co ntest in certain area sas in the recent New Zealand elections), these assumptions wouldnot hold. Our model could be modified accordingly. In all cases,scholars should tune th e mod el assumptions to what we know aboutthe details of party politics.

  • 8/12/2019 Multi Party

    9/18

    American Political Science Review Vol. 93, No.

    FIGURE4 Confidence Region CoverageA Coverage of Normal Confidence Regions

    earB. Coverage of t Confidence Regions

    Note: These graphs summarize the fit of the additive logistic normal A) and additive logistic t (B) distributions or all U.K. elections in our data set. Foreach election, the solid lines mark the percentage of coverage or the 50 , 80 , and 95 confidence regions (where dotted lines are drawn). The betterfit of the t distribution s indicated by the actual number of constituencies within each region (indicated by the solid line) staying much closer to the dottedline for the t than for the normal.

    first place. Even if this assumption is false, it is unlikely that did nominate candidates, and so we expand thethat any statistical analyst can devise a more realistic assumption to this more encompassing version. Weassumption than the noncontesting party is effectively recognize that this broader assumption occasionallyable to do for us. may not hold. In other words, it is conceivable that theIt also seems highly likely that the noncontesting noncontesting party, if it ran, might get more votesparty would have received fewer votes than the parties and yet still lose) than one of the parties that chose to

  • 8/12/2019 Multi Party

    10/18

    A Statistical Model for Multiparty Electoral Data March 1999

    run. Yet, even if this assumption were violated, thedegree of violation would very rarely be large enoughto make a substantive difference. Moreover, the alter-native possible assumptions are more arbitrary andwould be difficult to justify.Other assumptions could be chosen, based on mod-els of candidate entry and exit for different electoralsvstems or on different features of the British electoralsystem (such as the loss of a monetary deposit bycandidates who do not receive a certain fraction of thevote). Our methods for deriving the model belowunder our chosen assumptions can be easily modifiedto handle these alternatives.If covariates that predict which parties contest ineach constituency are available, then they can be usedin interactions with indicator variables that code for thepatterns of uncontestedness in order to avoid assump-tions about parameter equivalence between districtelections that are fully and partly contested. In mostcases, these variables will be useful but not necessary.An alternative is to develop full-blown models thatpredict which parties contest in each district as sepa-rate equations. Although future researchers may wishto consider this approach, we do not pursue it becauseit is unnecessary, would make the model less robust,and would require data that are very hard to obtain inmost applications.We make no assumption analogous to indepen-dence of irrelevant alternatives, as is sometimes nec-essary for individual-level, survey-based statisticalmodels of multiparty voter choice (see Alvarez andNagler n.d.). That is, our assumptions, and the modelbuilt from them, allow the entry or exit of a party intoa district election contest to affect the relative votetotals of the parties already in the race.

    ESTIM TIONIn this section, we propose methods of estimating theparameters of the model, {Pj}, Z and v In thenext section, we explain how to compute quantities ofinterest given these results.If all districts are contested by all parties, we canestimate the parameters by maximum likelihood. Thatis, we would maximize the log of the additive logisticdistribution in equation 5, summed over all observa-tions, with respect to the parameters. Complicationsarise in partly contested districts, however. One attrac-tive approach for missing data problems such as this isto use Markov Chain Monte Carlo (MCMC) methods(see Tanner 1997). For example, impute the missingdata given a guess for the parameters; then estimatethe parameters given these completed data via max-imum likelihood; then use these better parameterestimates to impute more realistic values for the miss-ing data; and so on, until (stochastic) convergence.We implemented a version of the MCMC approach,but in our experience and with our three-party data,this procedure is relatively slow, primarily because eachof the two steps in every iteration is itself iterative. Weoffer a direct likelihood approach that is approximatelytwenty times faster. Our studies indicate that this

    alternative is faster for smaller numbers of parties, butthe MCMC approach may be more computationallyefficient for larger numbers of parties.Our description of the direct likelihood approachbegins by denoting the set of parties contesting theelection in district i as Pi set that can take on sevenpatterns: (1, 2, 3 , (2, 31, (1, 3), (1, 21, {I), (21, and{3>.9When the effective vote is observed for all parties,the likelihood is the probability density of the observedvariables. For simplicity, we write the likelihood as afunction of Y, rather than V,, although the two giveequivalent results. For districts with fully contestedelections, the observed vote (V:l, V V:3) equals theeffective vote (Vil, Viz, Vi3). Thus, Yil 1n(VillK3)and Y,, ln(Vi,IK3) are both observed, and thelikelihood function is the bivariate probability density:

    with parameters

    (and where ul, u,, and make up the scatter matrix).This density differs from the additive logistic for Vi inequation 5 by a constant factor (the Jacobian of thetransformation), which thus establishes the equiva-lence of writing the likelihood as a function of either Vior Y,.When some of the effective votes are not observed(due to noncontesting parties), our assumptions desig-nate a region in which the vote variable falls, in whichcase the likelihood is the area (or volume) under theprobability density corresponding to this known region.The Appendix derives the likelihood function for thesecases.The complete likelihood function is the product ofthe likelihoods for the fully contested case in equation7 as well as the partially contested cases derived in theAppendix and given in equations 12,15, 16, 17,18, and19:

    where we define the product over a null set (when nodistrict election of the type exists) as equaling unity.We also substitute pi l Xi lPl and piz Xi& tointroduce (overlapping, identical, or different sets of)covariates Xi, and Xi,. For our present application, wedefine Xi, and Xi, to include a lag of Yil, a lag of Y,,,and three indicator variables to represent incumbencystatus for each party. We have conducted many otherruns with demographics and other variables included,but, as is consistent with the results from analyses in theUnited States and other democracies, these tend tohave only minor effects on our estimates of incumbencyadvantage. Since our goal is to estimate the total causaleffect of incumbency status, we exclude covariates that9 Suppose they held an election and nobody ran? We ignore thisamusing eighth possible pattern despite its occasional appearance insome very low-visibility local U.S. elections. In general the numberof patterns of missing data is 2J

  • 8/12/2019 Multi Party

    11/18

    American Political Science Review Vol. 93, No.

    are consequences of this key causal variable (for thesame reason that in estimating the effect of an individ-ual s unemployment on his or her vote, we would notwish to control for voting intentions five minutes beforewalking into the voting booth; see Cox and Katz 1996;Gelman and King 1990; and King, Keohane, and Verba1994, 173-5).To facilitate maximization, and to make the asymp-totic normal approximations we use below feasible withfewer observations, it is helpful to reparameterize sothat all parameters are unbounded (ranging betweena and w), as p1 and p, already are. Thus, the full setof transformations is:

    where the form of the equation for , is the inverse ofFisher s Z transformation (keeping between andno matter what value , takes) and that forconstrains v to be greater than two in order to guaran-tee that the moments of the distribution on the logisticscale exist.To summarize all our knowledge about and uncer-tainty in the parameter vector

    we maximize the likelihood function. This gives us anasymptotic normal posterior distribution with the max-imum likelihood point estimates as a mean vector and,as usual, the inverse of the negative of the Hessian asthe variance matrix10Because maximum likelihood is invariant to repa-rameterization (see King 1989b), the same point esti-mates are obtained whether we estimate and trans-form to get or estimate directly. We also use thestandard empirical Bayes approach to specify normalpriors (i.e., the hyperparameters are estimated fromthe means and variances of the maximum likelihoodestimates across our ten election years). Our empiricalresults below are qualitatively the same as withstraightforward maximum likelihood, but, as usual,empirical Bayes helps to reduce the random variabilityacross and within years.

    COMPUTING QU NTITIES OF INTERESTBecause the point of developing a model of voting in amultiparty democracy is to explain and predict electiono W e verified this asymptotic normal approximation by comparisonswith the exact i.e., fmite samp le) posterior distribution, thus avoid-ing the large-n assumption a ltogether. We did this with the techniqueof importance sampling, an iterative simulation method based on aprobabilistic rejection algorithm see Tann er 1997). Ou r experimentsindicate that the point estimates we report below are correct, and thestandard errors are if anything conservative is ., somewhat largerthan they should be). Because im portance sampling is very compu-tationally intensive and hence would be more difficult for others toapply, and since we found th at the two approaches did not suggestany real substantive differences in our data, the analyses below arebased on the asymptotic normal approximation.

    results, our model ought to be capable of computingquantities on the scale of reported votes. That is, theestimated parameters that result from maximizingthe likelihood are important, but they are of little directinterest. For starters, they are reparameterized forestimation via equation 10. But even transforming backto the original scale, by inverting these equations, isnot very helpful since the estimation was done on theadditive logistic scale rather than on the scale ofsubstantive interest-the votes. The quantities of directsubstantive interest are complicated functions of theseparameters.Computing some of these quantities of interest ispossible analytically (through Taylor series approxima-tions and the like), but it would be difficult. Computingmany others is impossible. We simplify these problemsby substituting computer time for human effort via anincreasingly popular technique called random simula-tion (also called stochastic simulation, Monte Carlosimulation, etc.) (see Gelman et al. 19 95; Jackman1996; Tanner 1997). Because simulation can generateresults with any desired degree of precision, the tech-nique entails no compromises (given a sufficientlypowerful computer). The methods for interpretationand presentation we use here are also applicable in thecontext of most other statistical models (see King,Tomz, and Wittenberg 1998).We describe the calculation of three quantities ofinterest in this section, a predicted vote, an expectedVote, and a causal effect. With each, we use a combi-nation of classical and Bayesian techniques. We savethe calculation of other quantities, such as bias andresponsiveness, for a future article.

    Predicted VoteThe quantity of interest here is the probability distri-bution describing the predicted allocation of votes in adistrict conditional on a fixed value for each of theexplanatory variables. The prediction is therefore aprobability distribution over the simplex.Our first requirement is a method for drawing onerandom election result from the approximate posteriordistribution given the estimated model, which we label(ppl, rp,, Vp3), here p is prediction and the tildeindicates that the values have been simulated. We drawthis simulated district election result given a set ofvalues for the explanatory variables Xpl and X eachbeing row vectors). To accomplish th~ s e fofiow thisalgorithm:

    Maximize the likelihood function in equation 9(with the empirical Bayes priors), and r ~co rdhevector of maximum likelihood estimates, , and thevariance matrix, V(C ).Take one random draw of , which we designate as6 from a mu1tivaria;e Annormal distribution withmean and variance V( ) .Reparameterize from 6 to (J by using equation 10,where we use Xpl and Xp, in computing ppl andPP2.

  • 8/12/2019 Multi Party

    12/18

    A Statistical Model for Multiparty Electoral Data March 19994. Draw ypl and Yp randomly from a bivariatedistribution with parameters .5. Compute ppl, P and T/, deterministically fromFpland Fp2 y t multivar~ateogistic transforma-tion in equation 4.

    To compute the distribution of election results givenX we repeat steps 2-5 of this algorithm M times (wefind that M = 1,000 is sufficient for most purposes).Then our approximate posterior distribution of Vpl ismerely a histogram of the simulated values. A pointestimate of the three-party vote results can be com-puted by taking the numerical average of the simula-tions for each party. A standard error can be computedby taking the standard deviation of the simulations foreach party. Similarly, a (say) 80% confidence intervalcan be computed by sorting the values in numericalorder and taking the values at the 10th and 90thpercentiles. The full approximate posterior distributionmay be calculated by a two-dimensional histogram overthe simplex.For an example of simulating predictive quantities ofinterest, we give an inference about a predicted valuefrom a typical open seat. To be specific, we firstestimated the model for 1987 with lags of Y,, and Y,,and two indicators for incumbency status. We includedall variables in both equations. We then set the explan-atory variables (Xpl and X,, such that no candidatestanding for election is a member of the House ofCommons, and the previous vote (i.e., in 1983) is equqlto the average vote across constituencies (V,, = 0.46,V = 0.28, and VpA 0.26). We then applied theahirithm above to yield 500 simulations of the threevote vectors.We use two graphical methods for portraying theresults from this prediction, both shown in Figure 5.Figure 5A plots the 500 simulations in one ternarydiagram. The simulations are all predictions for asingle district and thus vary only due to uncertainty inthe prediction; the collection of dots in this figure thenportrays the full nature of the probabilistic predictionabout where the point (given the values of the explan-atory variables) is likely to be. That is, we have higherconfidence that the actual district vote in the averageopen seat district will be where the heavy cluster of t i edots falls, and the result will fall with smaller probabil-ity where there are fewer dots. Substantively, he resultshows that this typical district is very likely to be won bythe Conservatives, since most of the points fall into theupper third of the triangle. (The actual probability thatthis district will be won by the Conservatives equals ihefraction of simulated dots that fall in this top region,defined bv the win lines.)Figure 5B gives density estimates (smooth versionsof histograms) for each of the three vote variables. Thisgraph helps emphasize the separate, but still obviouslyrelated, nature of the three variables. Each densityestimate portrayed in the graph is an approximateposterior distribution of that quantity (i.e., it can bethought of as a pile of predictions or simulations),indicating where the future value of that vote is likelyto be. Judging from the very little overlap in the

    FIGURE 5 Simulations of a Predicted ValueA Ternary Plot of Predicted Votes

    Note: This figure interprets the results of a model by computing thedistributional mplications of a single prediction (for an open seat in theconstituency with the average vote). Graph A plots 500 simulations romthis prediction on a ternary diagram; GraphB gives density estimates ofthe simulations from the same three vote variables. According to theprediction, the district s vote heavily favors the Conservatives.

    B. Density Plot of Predicted Votes4

    distributions, it is highly likely that the Conservativeswill out-poll the Labour and Alliance parties. TheLabour Party will likely do better than the Alliance inthis constituency, but because of the heavy overlap inthese two distributions, this inference is less certain.Note that all information in can also be found in A,although the images emphasize different aspects of thedata.In an application with a variety of explanatory vari-

    12P A PLPC

  • 8/12/2019 Multi Party

    13/18

    American Political Science Review Vol. 93, No. 1

    ables, we would normally do many different computa-tions such as this. That would enable the researcher tounderstand the many substantive implications of mod-els like this. To do so, we would set the explanatoryvariables at many different sets of values (low incomeand heavily minority, high income and rural, etc.). Inthis situation, we may wish more parsimonious summa-ries of the simulations, such as point estimates, stan-dard errors, and confidence intervals.

    Expected VoteOur knowledge of all real-world random processes isaffected by both fundamental variability and estimationvariability. The latter results from the limited numberof observations collected (or the limited number ofdistricts analyzed). If n were very large, then estimationvariability would vanish. In contrast, even if we had aninfinite number of observations, the fundamental vari-ability in the real world would still prevent our votepredictions from being perfect. Estimation variability isintroduced because of the investigators' "failings,"whereas fundamental variability affects results becausethe world we study is intrinsically variable.Our procedure for computing the predicted votereflects both sources of variability. In the precedingalgorithm, step 2 simulates estimation variability (bydrawing from its distribution), and step 4 simulatesfundamental variability (by drawing the logit of thevote variables from a distribution). Since we wishedthe simulations to reflect our knowledge of the distri-bution of votes, both sources of variability were essen-tial.Closely related to computing the predicted vote isestimating the expected vote in a district: [E(Vpl),E(Vp2), E(Vp3)]. Like the predicted vote, the expectedvote also is conditional on chosen values of the explan-atory variables, X,. Although fundamental variabilityaffects our estimate of the expected value, we need toaverage over it to produce the expectation. In otherwords, the expected vote is fixed, and only our estima-tion of it is imperfect: If n were sufficiently large, thenthe expected vote simulations would be constant. Inpractice, of course, our estimation procedure will pro-duce uncertain estimates of the (fixed) expected vote.11To compute one simulation of the expected vote,which we denote [E(v,,), E(v,,), E(vP3)], we followthis algorithm.1 Maximize the likelihood function in equation 9 .

    (with the empirical Bayes priors), and keep thevector of maximum iikelihood estimates, , and thevariance matrix, V(+) .2. Take one random draw of , which we designate as6 , from a multivariate nnormal distribution withmean and variance V(+).3. Reparameterize from 6 to t by using equation 10,l The difference between the expected vote and the predicted voteresides primarily in the variability around the mean. If the modelwere linear, then the average of the simulations of the predicted andexpected vote would be identical; in our case, the two are close.

    where we use X,, and Xp2 in computing ppl andpp2, respectively.4. Draw m values of Fpl and Fp2 andomly from abivariate t distribution with parameters I . (m100 is usually sufficient.)5. Compute m simulations of vel and vp2 etcrminis-tically from each of the m simulations of Ypl andFp2 y using the multivariate logistic transformationin equation 4.6. Calculate the numerical average of the m simula-tions of vpland rp2o yield one simulation of theexpected votes, E(Vpl) and E(v,,), respectively.Compute the simulation of he expected vote forparty 3 by subtraction: E(V,,) 1 E V p 1 )

    E(VP2).We repeat steps 2-6 of this algorithm M times toproduce M simulations of the expected vote, the meanof which is our point estimate, the standard deviation isthe standard error, and a histogram ofkach componentis the full probability density (M 1,000 is usuallysufficient).

    Causal Effects Including IncumbencyAdvantageA causal effect is the difference between two expectedvotes, given a change in the value of only one explan-atory variable. For example, the incumbency advantageis the difference in the expected vote in a district withan incumbent running and the expected vote in thesame district at the same time when the incumbent'sparty decides to nominate the best available nonincum-bent willing to run (Gelman and King 1990). That is,under this thought experiment, everything is held con-stant up to the start of the general election campaign,at which point the incumbent either runs for reelectionor does not.The causal effect of incumbency status in multipartydemocracies is, of course, somewhat more complicatedthan in two-party systems. The effect on the expectedvote of, for example, a Conservative incumbent seekingreelection may be of a different magnitude than for anAlliance or Labour incumbent. Such a partisan differ-ential also would seem more likely in legislatures withmore parties. In multiparty democracies, we mightestimate the incumbency advantage averaged over theparties, but we prefer to estimate each separately inorder to highlight several interesting substantive differ-ences in our data.

    Computing a causal effect thus requires two sets ofexpected votes, one with an incumbent and the otherwithout. To draw simulations of the causal effect ofincumbency, we take the difference between a simula-tion of the expected vote when the incumbency statusvariable in Xp indicates (1) a particular party's incum-bent is running for reelection and 2) an open seat,with all other variables held constant at (say) theirmeans. That is, we maximize the likelihood and thenrun the expected value algorithm twice, with a changeonly in the incumbency status variable.We estimated the advantage due to three types of

  • 8/12/2019 Multi Party

    14/18

    A Statistical Model for Multiparty Electoral Data March 1999

    FIGURE 6 Incumbency AdvantageEffect of Conservative lncumbent

    59 64 66 70 74 F) 74 0) 79 83 87 92 S.E.

    Effect of Labour lncumbent

    -1.0 I59 64 66 70 74 F) 74 0) 79 83 87 92 S.E.

    Effect of Alliance lncumbent

    59 64 66 70 74 F) 7 4 0 79 83 87 92 S.E.Note: The vertical distance of each arrow above the line indicates the advantage of running an incumbent, as compared to a nonincumbent, to the partyindicated. The direction of the arrow shows from which of the other parties support is being drawn as indicated by the ends of the standard error bars,at the right). Note that the scale, in percentage points for all three parties, is larger for the Alliance graph than for the other two.

    changes in incumbency status-open seat to a Conser-vative incumbent, an open seat to a Labour incumbent,and an open seat to an Alliance incumbent-in each ofthe ten election years from 1959 to 1992. The addi-tional complication is that for one year, and for onetype of incumbency effect, we need to record changesto all three vote variables. To display all this informa-tion succinctly, we have devised a new graphical dis-play. Figure 6 presents the raw results for each yearand type of effect. The top panel shows the effect of aConservative incumbent, and each arrow in the largeleft portion of this graphic is the change from an openseat-which we construct so that it begins at the pointon the line at the year indicated-to where the ex-pected vote would be with a Conservative incumbent,as if each were part of a ternary diagram. Hence, thehigher each arrow extends vertically i.e., not the lengthof the arrow, although the two are obviously related),the larger is the incumbency advantage to the Conser-vatives. The left axis is in percentage points of incum-bency advantage.The direction each arrow leans indicates from whichparty the Conservatives draw votes when they run anincumbent versus running another candidate in anopen contest, with the vertices of the implicit ternary

    diagram indicated around the standard errors at theright. Note also that there is a standard error in eachdirection, indicated by the length of each of the stan-dard error arrows at the right.12) For example, thearrow for 1964 Alliance incumbents leans away fromthe bottom left, which means Alliance incumbents getelectoral advantage by drawing votes disproportion-ately from Labour Party candidates L in the standarderror part of the graph on the right). Arrows at anglesbetween the extremes indicated on the standard errorgraph draw from a proportionate combination of thetwo other parties.) Thus, in the top panel, since all thearrows lean at least somewhat to the right, Conserva-tive incumbents derive their advantage by drawingmore from the Alliance than from the Labour Party.To our knowledge, this type of effect has not beenestimated in the elections literature of any country.The most striking observation about Figure 6 is theunambiguous positive effect of incumbency, for all ten2 Because the Alliance receives many fewer votes on average thanthe o ther parties, th e maximum range of votes that could be drawnfrom the Alliance to form an incumbency advantage for any otherparty s inc umbe nt is quite small. As a result, the A lliance standarderrors in figures and 7 are smaller than for the Conservatives orLabour.

  • 8/12/2019 Multi Party

    15/18

    American Political Science Review Vol. 93, No.

    FIGURE 7 Average Incumbency Advantage

    Conse wative

    Average S ELabour

    Average S E

    Average S ENote: This figure gives the incumbency advantage averaged over all theyears portrayed in Figure 6 for which the interpretation is analogous.Note the scale for the Alliance graph which is larger than the others.

    elections and all three parties over half a century ofBritish politics. This is indicated by all 30 arrowspointing above the zero line. Most of the individualeffects in Figure 6 are larger than their standard errors.When pooled, the average effects for each party'sincumbent are from two to five times their standarderrors, and hence by any relevant statistical standardthey are clearly greater than zero.Figure 7 gives these average effects for each party.(As always, the standard error of an average is smallerthan the standard error of its independent component

    parts.) The summary in Figure 7 indicates that theaverage incumbency effect is about half a percentagepoint for the Conservatives and twice that for Labour.In fact, in every one of the ten elections shown inFigure 6, the Labour incumbency effect is larger thanthe Conservative effect. Our interpretation of thisdisparity (which is necessarily more speculative thanour results) is that Labour incumbents have a working-class constituency that benefits more from governmentservices than Conservative voters. Incumbents have thediscretion to influence the position of people on vari-ous types of lists for social services, such as to get into,or renovate, council housing. Conservative incumbentshave fewer such opportunities to serve their relativelymore wealthy constituents, so their incumbency advan-tage should not be as large.Alliance incumbents receive an advantage of threepercentage points (thrice the Labour advantage). Thisis less than the incumbency advantage af 8-9 percent-age points in the U.S. House of Representatives(Gelman and King 1990), but it is only slightly smallerthan the average advantage in the House in the mid-1960s or in most American state legislatures today(King 1991). This effect is in part because the counter-factual involved in an open seat versus an Allianceincumbent is much less likely in parts of Britain thanthe analogous counterfactual involved for estimatingmajor party incumbency effects. An Alliance incum-bent implies that the major party hold on the politicalystem has been broken, and voters take this cue toreevaluate their votes. Expressed another way, thecollective action problem of moderate voters whoprefer the Alliance but do not want to waste their voteis solved with an Alliance incumbent in office. Theyhave less reason to vote strategically ( tactically, as itis called in Britain) and instead cast their vote for theirsincere preference. Much more than the major parties,the Alliance runs local, almost U.S.-style, candidate-centered campaigns (sometimes nominating well-known nonpolitical personalities), rather than national,party-oriented campaigns. The result, we believe, is thelarge Alliance incumbency advantage.Taken together, our findings support the claims ofthe new quantitative literature on the existence of theincumbency effect in Britain, but the size of the effectfor all three parties is substantially smaller than previ-ous biased methods had indicated. That is, our methodis not biased and is also sufficiently powerful to be ableto distinguish a small (but politically meaningful) effectfrom none at all. Our method is also able to discerndistinctly different incumbency effects across the threeparties.There appears to be a hint in the top panel of Figure6 that the Conservative incumbency advantage may begrowing slightly, but formal statistical tests clearlyreject this possible trend. Moreover, Labour or Alli-ance incumbency effects display no such apparentpattern. Thus, we find no support for the other claim ofthe new quantitative literature, which argues, againstthe conventional wisdom, that the incumbency advan-tage has been dramatically increasing in recent years.When the Conservatives run an incumbent (as com-

  • 8/12/2019 Multi Party

    16/18

    A Statistical Model for Multiparty Electoral Data March 1999

    pared to no incumbent running), they pull most of theirextra incumbency advantage from the Alliance. Asexplained, this can be seen because the arrow in thefirst panel of Figure 7 and all ten arrows in the firstpanel of Figure 6 lean to the right-away from the Ain the standard error diagram. Our interpretation ofthis clear result is that the Alliance is a transitionaloption for voters on the way to supporting one of themajor parties; only 20% of voters stick with the Alli-ance for more than one election (Butler and Stokes1969,315-38). Presumably because the Labour incum-bency advantage stems more from constituency servicethan does the Conservatives' advantage (and becausethere are usually as many Conservative voters whocould benefit from a Labour member's constituencyservices as there are voters for the much smallerAlliance who could benefit), Labour incumbents pullapproximately equally from the Conservatives and theAlliance (see Cain, Ferejohn, and Fiorina, 1987).l3

    CONCLUQING REMARKSOur results may help resolve an ongoing debate in theBritish elections literature. The conventional positionis that incumbency has never been an advantage. Incontrast, a newer literature holds that incumbencyadvantage is moderate to large and is growing. Withour approach, we find that both sides are right to anextent. That is, the incumbency advantage in Englandis small but meaningfully above zero. There is noevidence, however, that it is trending in any direction.In addition, our model detects important differencesamong the parties, with the incumbency advantage forLabour being about twice that for the Conservatives, andthe advantage for the Alliance triple that of Labour.We have developed and presented our statisticalmodel for three parties and applied it to the Britishelectoral system. The model and estimation proceduresare all directly applicable to electoral systems with anynumber of parties. Further research will be necessaryto implement the more general version of our model.Our analytical approach, while much faster for threeparties, does not scale up as well as a more generalMCMC approach. Our experiments with MCMC ap-plied to this problem convince us that it will scale fairlywell. Priors may be needed, given the large number ofparameters relative to a smaller number of districts,but there is substantial information in multiparty elec-toral systems that would be extraordinarily valuable toresearchers in comparative politics, so this extensionseems well worth the effort. Our graphical displaysobviously do not generalize directly to more than fourparties, but with judicious use of color, shading, andl The Alliance does not predictably draw more from either majorparty. There is one other feature of the last panel of Figure 6however, that may reflect a systematic pattern: Alliance incumbencyadvantage was drawn almost exclusively from the Labour Partybetween 959 and 1970,but this abruptly changed, and votes startedto be drawn in different ways from Labour and the Conservativesover the subsequent six elections. Since this pattern is based ononly four elections, further research is required before drawing firmsubstantive conclusions as to its cause.

    perspective, we have found it possible to display resultsfor up to eight parties, depending on how complicatedand empirically clear the relationships are in the data.Many opportunities for future research remain, threeof which we note here. First, we can easily extend theinterpretation of the model to include other quantities ofinterest, such as bias and responsiveness of the electoralsystem, which are important and controversial issues inBritain and elsewhere. This can be done by clearlydefining the quantity to be computed and then makingslight modifications of the algorithms described above.Second, some two-party models of seats and voteshave in recent incarnations included random effectsterms, which are more highly modeled versions of whatour empirical Bayes approach accomplishes for themultiparty case. These terms help keep the estimatesreasonable even if certain types of explanatory vari-ables are not observed and included in the model. Theyare especially well suited to modelsn of legislatures,since the structure of the random effects model canreflect the panel structure of the data.Finally, electoral systems vary considerably acrosscountries. All but two countries have districts of somekind, and most elect legislators to more than one seatin a district. Modifications to fit all the different typesof electoral systems will require some detailed work,but these should be achievable by straightforwardextensions of the model presented here.APPENDIX THE LIKELIHOOD FUNCTIONFOR PARTIALLY CONTESTED DISTRICTSDistricts with Two of Three PartiesContestingFirst consider elections in which the parties contesting indistrict include Pi 2 , 3). The effective votes for all threeparties, Vil, Vi2, and Vi3, are unobserved. Nevertheless, ourassumptions for less-than-fully contested elections intro-duced above imply that we have some information aboutthese quantities. In particular, we know that Vil min (Viz,Vi3) and therefore Yi, min (0, Yi2) or, equivalently, Yl0 and Yl Y2. Thus, the likelihood function for this case isas follows:

    where F, is the cumulative distribution function of the(univariate) t ,

    is the conditional mean of Yi2 given Yil, and

    is the conditional variance. These conditional distributionsare analogous and mathematically similar to the more com-

  • 8/12/2019 Multi Party

    17/18

    American Political Science Review Vol. 93, No.

    monly known conditional normals see Liu 1994). The func-tion in equation 12 is easily calculated by one-dimensionalnumerical integration.By a parallel logic, the likelihood function for districtelections in which parties and 3 contest is:

    When parties 1and 2 contest, but party 3 is missing, a slightcomputational difference occurs because Vi3 appears in thedenominator of both Yil and Yi2. As a result, we know, by ourassumption, that Vi3 < min Vi,, ViJ, which translates intoYi, > 0 and Yi2 > 0. Hence:

    where F, is the cumulative distribution function of in thiscontext) the bivariate t To compute this function, we followthe standard procedure of applying one-dimensional numer-ical integration after factoring the joint distribution into amarginal and conditional-directly analogous to equation 12.

    Districts with One of Three PartiesontestingWhen only party 1 contests and hence automatically wins),we have no information about Yi2, but we know that Yil > 0.We use this information to form the likelihood function:

    Similarly, the likelihood function for the remaining twocases is

    and

    REFEREN ESAitchison, J. 1986. The Statistical Analysis of Compositional Data.London: Chapman and Hall.Alvarez, R. Michael, and Jonathan Nagler. N.d. When Politics andModels Collide: Estimating Models of Multi-Party Elections.American Journal of Political Science. Forthcoming.Bellucci, Paolo. 1984. The Effect of Aggregate Economic Condi-tions on the Political Preferences of the Italian Electorate, 1953-1979. European Journal of Political Research l2(4):387- 401.Bellucci, Paolo. 1991. Italian Economic Voting: A Deviant Case orMaking a Better Theory? In Economics and Politics: The Calculusof Support ed. Helmut Norpoth, Michael S. Lewis-Beck, andJean-Dominique Lafay. Ann Arbor: University of Michigan Press.Pp. 63-81.

    Butler, David E., and Dennis Kavanagh. 1980. The British GeneralElection of 1979. London: Macmillan.Butler, David E., and Donald Stokes. 1969. Political Change inBritain: Forces Shaping Electoral Choice. New York: St. Martin's.Cain, Bruce, John Ferejohn, and Morris Fiorina. 1987. The PersonalVote: Constituency Service and Electoral Independence. Cambridge:Harvard University Press.Cornford, J. R., Dorling, D. F. L., and Tether, B. S. 1995. HistoricalPrecedent and British Electoral Prospects. Electoral Studies14(2):123-42.Cox, Gary, and Jonathan N. Katz. 1996. Why Did the IncumbencyAdvantage in U.S. House Elections Grow? American Journal ofPolitical Science 40(2):478-97.Curtice, John, and Michael Steed. 1980. An Analysis of Voting. InThe British General Election of 1979 ed. David E. Butler andDennis Kavanagh. London: Macmillan. Pp. 390-431.Curtice, John, and Michael Steed. 1983. The Voting Analyzed. InThe British General Election of 1983 ed. David E. Butler andDennis Kavanagh. London: Macmillan. Pp. 333-73.Dominguez, Jorge, and James McCann. 1996.Democratizing Mexico:Public Opinion and Electoral Choices. Baltimore, MD: JohnsHopkins University Press.Frey, Bruno, and Fredrick Schneider. 1980. ~ o ~ i l a r i t yunctions:The Case of the U.S. and West Germany. Models of PoliticalEconomy. In Models of Political Economy ed. Paul Whiteley.London: Sage. Pp. 47-84.Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B.Rubin. 1995. Bayesian Data Analysis. New York: Chapman andHall.Gelman, Andrew, and Gary King. 1990. Estimating IncumbencyAdvantage without Bias. American Journal of ~olitical cience34(November):1142-64.Gelman, Andrew, and Gary King. 1991. Systemic Consequences ofIncumbency Advantage in the U.S. House. American Journal ofPolitical Science 35(February):110-38.Gelman, Andrew, and Gary King. 1994a. Enhancing DemocracyThrough Legislative Redistricting. American Political SctenceReview 88(September):541-59.Gelman, Andrew, and Gary King. 1994b. A Unified Method ofEvaluating Electoral Systems and Redistricting Plans. AmericanJournal of Political Science 38(May):514-54.Goodhart, C. A. E., and R. J. Bhansali. 1970. Political Economy.Political Studies 18(1):43-106.Horowitz, Donald L. 1985. Ethnic Groups in ConJEict. Berkeley:University of California Press.Horowitz, Donald L. 1993. Democracy in Divided Societies.Journal of Democracy 4(4):18-38.Host, Viggon, and Martin Paldam. 1990. An International Elementin the Vote? European Journal of Political Research 18(2):221-39.Huckfeldt, Robert, and John Sprague. 1993. Citizens, Contexts, andPolitics. In Political Science: The State of the Discipline II ed. AdaFinifter. Washington, DC: American Political Science Association.Pp. 281-303.Inoguchi, Takashi. 1980. Economic Conditions and Mass Support inJapan, 1960-76. In Models of Political Economy ed. Paul White-ley. London: Sage. Pp. 121-51.Jaclunan, Simon. 1996. Bayesian Tools for Social Scientists. Pre-sented at the annual meetings of the American Political ScienceAssociation, San Francisco.Johnson, Norman Lloyd, and Samuel Kotz. 1972. Distributions inStatistics: Continuous Multivariate Distributions. New York: Wiley.King, Gary. 1989a. Representation Through Legislative Redistrict-ing: A Stochastic Model. American Journal of Political Science33(November):787-824.King, Gary. 1989b. Unifying Political Methodology: The LikelihoodTheory of Statistical Inference. New York: Cambridge UniversityPress.King, Gary. 1990. Electoral Responsiveness and Partisan Bias inMultiparty Democracies. Legislative Studies Quarterly 15(May):159-81.King, Gary. 1991. Constituency Service and Incumbency Advan-tage. British Journal of Political Science 21(January):119-28.

  • 8/12/2019 Multi Party

    18/18

    A Statistical Model for Multiparty Electoral Data March 1999King, Gary. 1997. A Solution to the Ecological Inference Problem:Reconstructing Individual Behavior jkom Aggregate Data. Princeton,NJ: Princeton University Press.King, Gary, and Robert X Browning. 1987. Democratic Represen-tation and Partisan Bias in Congressional Elections. AmericanPolitical Science Review 81(December):1251-73.King, Gary, and Andrew Gelman. 1991. Systemic Consequences ofIncumbency Advantage in the US. House. American Journal ofPolitical Science 35(February):110-38.King, Gary, Robert 0 Keohane, and Sidney Verba. 1994.DesigningSocial Inquiry: Scienti3c Inference in Qualitative Research. Prince-

    ton, NJ: Princeton University Press.King, Gary, Michael Tomz, and Jason Wittenberg. 1998. Makingthe Most of Statistical Analyses: Improving Interpretation andPresentation. Presented at the annual meetings of the AmericanPolitical Science Association, Washington, DC. http://ghg.Harvard.edu/preprints.shtml (February 10, 1999).Lewis-Beck, Michael S. 1988. Economics and Elections: The MajorWestern Democracies. Ann Arbor: University of Michigan Press.Lewis-Beck, Michael S., and Paolo Bellucci. 1982. Economic Influ-ences on Legislative Elections in Multi-Party Systems: France andItaly. Political Behavior 4(1):93-107.Liu, Chuanhai. 1994. Statistical Analysis Using the Multivariate tDistribution, Ph.D. diss. Department of Statistics, Harvard Uni-versity.Miller, A. H., and Ola Listhaug. 1985. Economic Effects and theVoting in Norway. In Economic Conditions and Electoral Out-comes ed. Michael S. Lewis-Beck and Heinz Eulau. New York:Agathon. Pp. 125-43.Norton, Philip, and David Wood. 1990. Constituency Service byMembers of Parliament: Does It Contribute to a Personal Vote?Parliamentary Affairs 43(April):196-208.Offe, Claus. 1992. Strong Causes, Weak Cures. East EuropeanConstitutional Review 1(1):21-3.Paldam, Martin. 1986. The Distribution of Election Results and the

    Lewis-Beck, and Jean-Dominique Lafay. Ann Arbor: University ofMichigan Press. Pp. 9-31.Powell, G. B., and G. D. Whitten. 1993. A Cross-National Analysisof Economic Voting: Taking Account of the Political Context.American Journal of Political Science 37(2):391-414.Przeworski, Adam. 1991. Democracy and the Market. New York:Cambridge University Press.Przeworski, Adam. 1996. Public Support for Economic Reforms inPoland. Comparative Political Studies 29(5):520-43.Rattinger, Hans. 1991. Unemployment and Elections in WestGermany. In The Economics of Politics: The Calculus of Supported. Helmut Norpoth, Michael S. Lewis-Beck, and Jean-DominiqueLafay. Ann Arbor: University of Michigan Press. Pp. 49-62.Rosa, J.-J., and Daniel Amson. 1976. Conditions Economiques etElections. Revue Francaise de Science Politique 26(6):1101-19.Slider, Darrell. 1994. Political Tendencies in Russia's Regions:Evidence from the 1993 Parliamentary Elections. Slavic Review53(3):711-32.Stokes, S. C. 1996. Economic Reform and Public Opinion in Peru,1990-95. Comparative Political Studies 29(5):544-65.Tanner, Martin A 1997. Tools for Statistical Inference: Methods forthe Exploration of Posterior Distributions and Likelihood Functions3d ed. New York: Springer-Verlag.Tucker, Joshua. 1996. The White Fear ~ ~ ~ o t h e s i seets VladimirZhironovsky: Ethnic Composition of Regions and Voting Patternsin the 1993 Russian Duma Elections. Presented at the annualmeetings of the Northeastern Political Science Association, Bos-ton.

    Upton, Graham J. G. 1989. The Components of Voting Change inEngland, 1983-1987. Electoral Studies 8(1):59-74.Upton, Graham J. G. 1994. Picturing the 1992 British GeneralElection. Journal of the Royal Statistical Society SeriesA General157(Part 2):231-52.White, Stephen, Richard Rose, and Ian McAllister. 1997.How RussiaVotes. Chatham. NJ: Chatham House.Two Explanations of the Cost of Ruling. Europaische Zeitschvift Whiteley, Paul. 1980. Politico-Econometric Estimation in Britain:fur Politische OkonomielEuropean Journal of Political Economy An Alternative Explanation. In Models of Political Economy ed.2(1):5-24. Paul Whiteley. London: Sage. Pp. 85-100.Paldam, Martin. 1991. How Robust Is the Vote Function?: A Study Wood, David, and Philip Norton. 1992. Do Candidates Matter?of Seventeen Nations over Four Decades. In The Economics of Constituency-Specific Vote Changes for Incumbent MPs, 1983-Politics: The Calculus of Support ed. Helmut Norpoth, Michael S. 1987. Political Studies 40(June):227-38.