The WTO Trade Effect
Pao-Li Chang∗
School of Economics
Singapore Management University
Singapore 178903
Myoung-Jae Lee†
Department of Economics
Korea University
Seoul, Korea
August 1, 2008
Abstract
This paper proposes to reexamine the trade effect of GATT/WTO based on non-
parametric econometric techniques. Our estimation framework uses the simplest gravity
model that explains bilateral trade volumes with country sizes and trade resistance, with-
out imposing parametric assumptions typically made in gravity-type econometric models.
Specifically, matching estimators coupled with permutation tests and signed-rank tests
are employed to obtain point and interval estimates. While observable covariates (country
sizes and determinants of trade resistance) are controlled for nonparametrically, unob-
servable selection bias is dealt with by a sensitivity analysis. The results suggest large
GATT/WTO effects on trade in both cross-sectional and time-series dimensions, and the
results hold regardless of the GATT/WTO indicators (defined by formal membership or
de facto participation).
Keywords: GATT/WTO; treatment effect; matching; permutation test; signed-rank test;
sensitivity analysis
JEL classification: F13; F14; C14; C21; C23
∗Tel: +65-6828-0830; fax: +65-6828-0833. E-mail address: [email protected] (P.-L. Chang).†Tel: +82-16-347-7867; E-mail address: [email protected]
1
1. INTRODUCTION
This paper joins the debate on the trade effect of GATT/WTO that was sparked off by
Rose’s (2004). In that study, Rose did an extensive statistical analysis of bilateral trade
1948-99, based on gravity-type econometric methods, to find that GATT/WTO-membership
dummies fail to reveal any statistically significant, and robust influence on the volume of
bilateral trade. Rose’s findings have been partly reversed by more recent studies: see, for
example, Tomz et al. (2007) and Subramanian and Wei (2007), and Rose (2006) for a survey.
Most studies in this literature, however, primarily rely on parametric estimation of gravity-
type equations. There are by now several theoretical papers justifying the gravity model, in
which the volume of trade between two countries is proportional to the product of their
economic size, and the factor of proportionality depends on measures of ‘trade resistance’
between them. See, for example, Anderson (1979), Bergstrand (1985), Deardorff (1998), and
Anderson and van Wincoop (2003). Although we may specify the set of factors that affect
trade resistance, it takes a great leap of faith to believe that these determinants of trade
resistance sum up log-linearly (and the error term associated with trade resistance is nicely
distributed as log-normal) as most empirical specifications of gravity equations assume. Al-
though gravity equations have gained popularity in empirical research due to its good fit
to data, this good prediction power does not necessarily imply correct model specifications.
This is partly evident in many applications of gravity-type models that result in contradic-
tory conclusions on particular determinants of trade resistance, such as the controversy in the
literature over the currency union effect (Persson, 2001; Rose, 2001), the free trade agreement
effect (Frankel, 1997; Baier and Bergstrand, 2007b), the national border effect (McCallum,
1995; Anderson and van Wincoop, 2003), and the GATT/WTO membership effect.
In addition to potential biases arising from arbitrary functional form assumptions of trade
resistance, another potential bias in gravity estimates arises from the fact that policies that
affect trade flows are potentially endogenous or self-selected. Examples include the formation
of currency unions (Persson, 2001) and that of free trade areas (Baier and Bergstrand, 2007b).
Similarly, the status of GATT/WTO membership is likely not exogenous. Whether countries
that trade more or that trade less are more likely to join the GATT/WTO (and hence
the direction of selection bias) is a priori unknown, although a researcher may reasonably
suspect a country’s accession decision to depend on trade resistance among other factors. For
2
example, Baier and Bergstrand (2004) find that the same factors that determine trade flows
tend to determine the formation of free trade areas as well. Although parametric approaches
exist to correct the selection bias, they are subject to misspecification pitfalls.
In this paper, we propose nonparametric techniques that do not assume correct specifi-
cation of ‘trade resistance’ and allow ‘selection on observables.’ Specifically, a pair-matching
estimator is used in the first step to estimate the ‘treatment’ effect of membership in the
GATT/WTO. Among many nonparametric approaches to evaluating the treatment effect of
a policy or program, the pair-matching method is relatively straightforward to implement.
Matching is widely used in the fields of labor and health economics. See, for example, Heck-
man et al. (1997), Imbens (2004), and Lee (2005), and applications in Heckman et al. (1998),
Lechner (2000), Lee et al. (2007), and Lu et al. (2001). Recently, different versions of match-
ing methods have also been applied to the study of the currency union effect (Persson, 2001)
and the free trade agreement effect (Baier and Bergstrand, 2008).
Applying the terminology in the matching literature to our current context, the state of
both countries being GATT/WTO members in a trading relationship is a ‘treatment,’ and
the dyad in the state is a ‘treated’ subject, in contrast with a ‘control’ subject where both
countries are nonmembers. In essence, a pair-matching method compares the ‘responses’
(trade flows) of a treated subject and a control subject who differ in their treatments but
are otherwise similar. The difference in their bilateral trade flows is then attributed to
the treatment effect of GATT/WTO membership. As argued earlier, we do have strong
theoretical foundations in the basic gravity equation that relates trade flows to economic size
and trade resistance, although we do not have a strong conviction in the functional form
of trade resistance. On the other hand, the empirical gravity literature has more or less a
consensus on the list of variables that are likely to have a bearing on trade resistance. This
set of variables serve as the set of covariates that we need in measuring the similarity between
any two subjects and in matching them. In particular, we use the same set of covariates as
in Rose (2004) to allow comparison with other studies in this literature that mostly use the
same set of trade resistance determinants (along with economic sizes and year dummies). By
using the matching method, however, we do not have to assume a particular functional form
for trade resistance; that avoids the bias stemming from parametric misspecifications.
A second advantage of using matching estimators as claimed above is that the match-
ing estimator is suited for non-random selection into treatment (i.e. membership in the
3
GATT/WTO). The absolute probability of selecting into membership is allowed to vary
across trading relationships and be different from 0.5 (i.e. random selection). Matching of
two subjects similar in economic sizes and in determinants of trade resistance removes the
difference in absolute selection probabilities based on observable covariates and minimizes the
bias stemming from selection on observables. By using matching, similarly, we do not have
to assume knowledge of the exact functional form for accession decision or for the probability
of selecting into GATT/WTO membership.
Although matching estimators are popular in practice, their asymptotic properties are not
fully understood (see, however, exceptions such as Abadie and Imbens (2006) for the case of
iid data). In practice, a standard t-statistic or a bootstrap procedure is often used to derive
the p-value or confidence intervals. Standard t-statistic is straightforward but theoretical
justifications in most cases are not available; bootstrap is computationally demanding and
is argued by Abadie and Imbens (2006) to be invalid. In this paper, we propose using
permutation tests and for good reasons. As noted above, asymptotic-based tests for matching
estimators have yet to be well developed. Permutation test, on the other hand, is an exact
inferential procedure conditional on the observed data. Under the null hypothesis of no
treatment effect, two subjects in a matched pair are ‘exchangeable’ in the labeling of their
treatment status (treated or untreated). Obtaining all possible permutations of the treatment
labels in all pairs, the exact p-value of the observed matching estimator can be computed by
placing it in the “empirical” distribution of the permutations. When the number of matched
pairs is large, we show that the exact p-value can be approximated by the standard t-test and
one can skip the computation time required for the permutation. Thus, implementation-wise,
the permutation test is straightforward (this also offers a theoretical justification for the use
of standard t-statistic albeit from an exact-inferential perspective). Furthermore, as will be
argued in the main text, permutation tests can accommodate a wide range of data structure
with heterogeneity and serial correlation, which are likely to be present in bilateral trade
data. To our knowledge, permutation tests have not been used in international economics
applications. See Pesarin (2001) and Good (2004) for an introduction to the concept, and
Ernst (2004) for a survey. Recently, Imbens and Rosenbaum (2005) showed that confidence
intervals constructed by permutation methods are far more reliable than other confidence
intervals in the returns-to-education literature.
In applying the matching estimator and the permutation test, one first uses the observed
4
numerical differences in trade flows across matched pairs to estimate the treatment effect and
then find its statistical significance using the permutation test. Alternatively, one may use
the signed ranks of these differences and derive the corresponding permutation distribution
of the Wilcoxon (1945) signed-rank test statistic. The derived treatment effect estimator has
the extra advantage of being robust to outliers. The signed-rank test statistic is also readily
amenable to a sensitivity analysis procedure we will introduce below.
As noted earlier, self-selection into GATT/WTO membership is not a problem in the
matching framework, if the selection is based on observable variables. However, if the se-
lection is also based on some unobservables that vary systematically across the treated and
control groups and if the unobservables also affect trade flows, then selection bias due to
unobservables arises. In other words, the treatment is not random conditional on the ob-
servables. This problem of ‘selection on unobservables’ applies to both parametric gravity
estimation and matching estimation. We propose using a sensitivity analysis following Rosen-
baum (2002) to evaluate the sensitivity of matching estimates obtained under the assumption
of no ‘selection on unobservables.’ This method examines how severe the unobserved selec-
tion problem must be to eliminate an original finding of positive (or negative) effects or to
overturn an original finding of zero effect. Similar sensitivity analyses have appeared in other
fields of economics; see, for example, Aakvik (2001), Imbens (2003), Hujer et al. (2004),
Altonji et al. (2005), and Lee et al. (2007).
Applying the nonparametric matching methods and permutation-based inference proce-
dures to Rose’s data, this paper reaches a conclusion that is in stark contrast to conclusions
reached by Rose (2004): membership in the GATT/WTO has large and significant trade-
promoting effects. We explore robustness of this result to various possible critiques. First,
the assumption of no ‘selection on unobservables,’ i.e., “random treatment conditional on the
observables” may fail if the set of observable variables that we use following Rose (2004) is
not appropriate. This is partly addressed by the Rosenbaum (2002) sensitivity analysis intro-
duced above. Alternatively, we also conduct restricted matching, where a potential match for
a subject is further limited to observations from the same dyad, the same year/period, or the
same relative development stage. This eliminates potential bias arising from unobservable
heterogeneity across dyads, years, GATT/WTO periods, or countries of different develop-
ment stages. Second, Tomz et al. (2007) emphasize the importance of de facto participation
in the GATT/WTO by colonies, newly independent nations, and provisional members, and
5
find strong GATT/WTO effects on trade when these types of nonmember participation is
taken into account. We conduct the same nonparametric analysis introduced above using the
data set of Tomz et al. (2007) and find even stronger results than those based on Rose’s data
set, validating the arguments of Tomz et al. (2007). Third, relative trade resistance rather
than absolute trade resistance is argued by some gravity theories to be more appropriate in
explaining trade flows, cf. Anderson and van Wincoop (2003), and multilateral resistance
terms resembling country-specific effects should be included in the list of explanatory vari-
ables. In a way, we have accounted for dyad-specific and hence country-specific effects when
we conduct the matching within the same dyad as introduced above. The strong effects of
GATT/WTO membership (or participation) remain.1
Recent studies of Helpman et al. (2007) and Felbermayr and Kohler (2007) have empha-
sized the importance of zero trade flows and potential bias in the estimates of GATT/WTO
membership effect using only observations with positive trade flows in estimating the gravity
equation. In particular, an analysis in Helpman et al. (2007) implies that this will result in
a downward bias in the estimates of treatment effects, as dyads who are not GATT/WTO
members but still observed trading with each other are likely to have lower unobserved trade
resistance. Thus, regression analysis of gravity equations should include an extra term, i.e.
the inverse Mills’ ratio, to account for the effect of sample truncation at zero. By using Rose’s
data set, we also use only observations with positive trade flows. With observations matched
on observable determinants of trade flows, the difference in the conditional mean levels of
trade flows due to these determinants are eliminated, and the remaining difference in the con-
ditional means between the treated and the untreated observations in a matched pair thus
includes the true treatment effect and the effect of unobservable trade resistance. The argu-
ment above suggests that unobservable trade resistance is likely to be systematically lower
for the untreated than the treated and that the GATT/WTO membership effect likely to be
underestimated. Thus, the matching estimates we will present are conservative estimates of
the GATT/WTO membership effect. Our nonparametric procedure does allow the possibil-
ity of departure from the assumption of no ‘selection on unobservables,’ and evaluates such1On the other hand, in the matching framework, we do not have a good way to control for time-varying
country-specific effects as emphasized by some parametric studies of gravity equations, cf. Subramanian andWei (2007). For example, for the observation unit ijt with trade flows between countries i and j at time t,there is no other observation and hence potential match that also contains information on both country-specificeffects (φit, φjt) at time t. See, however, recent studies of Baier and Bergstrand (2007a, 2008) for suggestionsto approximate the multilateral terms.
6
effects using the Rosenbaum (2002) sensitivity analysis. If a positive effect is found under
the assumption of no ‘selection on unobservables,’ the above argument of negative selection
will only strengthen the initial finding of a positive effect.
Helpman et al. (2007) also distinguish the direct partial effect of trade resistance on trade
flows from their indirect effect on trade flows through changes in the number of exporters.
In this paper, we do not attempt to make this distinction. In our view, the larger trade flows
due to an increase in the number of exporters should also be considered as part of the benefit
of a GATT/WTO membership. Thus, the matching estimates we will present refer to the
total effect of GATT/WTO membership, including both the direct and indirect effects.
The rest of this paper is organized as follows. Section 2 introduces the nonparametric
procedures. Section 3 explains the data used. Section 4 summarizes the available theories of
GATT/WTO and caveats of using membership to define involvement in the GATT/WTO.
Our estimation results and findings are reported in Sections 5 and 6. Section 7 concludes.
Technical details are relegated to the appendix in Section 8.
2. METHODOLOGY
Suppose there are N countries in the world. Let yijt denote the real bilateral trade volume
in logarithm between countries i and j in year t. Define xijt and εijt as the observed and
unobserved variables for dyad (i, j) in year t, respectively, that can affect both bilateral
trade volume and probability of selecting into GATT/WTO membership. The vector of
covariates xijt typically includes proxies for economic sizes and observed determinants of
trade resistance, while εijt summarizes unobserved trade resistance. We investigate two types
of GATT/WTO membership treatment and their effects: the effect when both countries are
GATT/WTO members (both-in), relative to when both are nonmembers (none-in), and the
effect when only one country is a GATT/WTO member (one-in), relative to when both
are nonmembers (none-in). Rose (2004) found that the Generalized System of Preferences
(GSP) had a larger effect than GATT/WTO membership. To verify this result, we will also
investigate the effect of GSP treatment: the effect when one country in a bilateral trading
relationship is a GSP beneficiary of the other country or vice versa, relative to when neither
is a GSP beneficiary of the other.
Note that we will use the word ‘dyad’ for an observation unit that comprises two coun-
7
tries between which trade volume is recorded and ‘pair’ for two observation units which are
considered a match by the matching method discussed below.
2.1 Mean Effects and Matching
To facilitate exposition, focus on the effect of both-in treatment. Let the observed treatment
status be dijt for dyad (i, j) in year t, where
dijt = 1 if both in and 0 if none in.
The analysis for the effect of one-in or GSP treatment is analogous. Define two ‘potential’
response variables:
y1ijt : potential treated response for dyad (i, j) in year t,
y0ijt : potential untreated response for dyad (i, j) in year t.
We will denote ‘dyad (i, j) in year t’ simply ’subject ijt’ from now on. It may be a little
awkward, but we need to imagine that subject ijt has (dijt, y1ijt, y
0ijt, xijt, εijt). Yet we observe
only either y1ijt or y0
ijt depending on dijt = 1 or 0. The group of observations with dijt = 1
that reveal only (y1ijt, xijt) is called the treatment group, and the group with dijt = 0 that
reveal only (y0ijt, xijt) is called the control group.
The subject-ijt treatment effect is y1ijt − y0
ijt, which is, however, not identified, as only
one of the two potential responses is observed for a subject. Instead, a popular choice is to
estimate the mean effect E(y1 − y0), which is identified by the group mean difference
E(y|d = 1)− E(y|d = 0) = E(y1|d = 1)−E(y0|d = 0) = E(y1 − y0) if (y0, y1)q d
where ‘q’ stands for statistical independence. In other words, the selection into treatment is
random, as far as the trade volume (y0, y1) is concerned. If the randomization of treatment
applies only to the potential baseline response, i.e. y0 q d, then
E(y|d = 1)− E(y|d = 0) = E(y1|d = 1)− E(y0|d = 0) = E(y1|d = 1)−E(y0|d = 1)
= E(y1 − y0|d = 1), which is the ‘effect on the treated ’.
8
Analogously, if y1 q d, then
E(y|d = 1)− E(y|d = 0) = E(y1 − y0|d = 0), which is the ‘effect on the untreated ’.
Since
E(y1 − y0) = E(y1 − y0|d = 0)P (d = 0) + E(y1 − y0|d = 1)P (d = 1),
if the effect on the treated is the same as the effect on the untreated, then both equal
E(y1−y0). Otherwise, the representative effect E(y1−y0) is a weighted average of the effect
on the treated and the effect on the untreated, with the probability of being treated and
untreated as their respective weight.
In observational data, treatment is self-selected. The factors that determine GATT/WTO
accession decisions may also affect potential trade volumes. Matching on the observed vari-
ables (xijt) helps removing the “overt selection bias” caused by observed difference across the
treatment and control groups. In terms of equations, matching is expressed by including x in
the conditioning set in the preceding equations. The above group mean difference equation
becomes
E(y|d = 1, x)−E(y|d = 0, x) = E(y1|d = 1, x)−E(y0|d = 0, x) = E(y1−y0|x) if (y0, y1)qd|x,
and the E(y1 − y0)-decomposition equation becomes
E(y1 − y0|x) = E(y1 − y0|d = 0, x)P (d = 0|x) + E(y1 − y0|d = 1, x)P (d = 1|x).
Once the x-conditional effect is found, x can be integrated out to yield a marginal effect;
here, the integration can be done with different integrators (i.e., weighting functions). For
the effect on the treated, the distribution of x|d = 1 is typically used to render
E(y1 − y0|d = 1) =∫
E(y1 − y0|d = 1, x)dF (x|d = 1)
where F (·|d = 1) denotes the distribution of x|d = 1.
The condition (y0, y1)q d|x requires that any unobserved variables that may affect mem-
bership decisions as well as trade volumes do not vary systematically across the treatment
and control groups. In other words, matched subjects in a pair are on average compara-
9
ble in their potential trade performance and their probability of selecting into GATT/WTO
membership. Thus, the observed untreated response of the untreated subject can be used
as a proxy for the counterfactual potential untreated response of the treated subject, and
the observed treated response of the treated subject as a proxy for the counterfactual poten-
tial treated response of the untreated subject. The condition (y0, y1) q d|x is effectively an
assumption of no “hidden selection bias” caused by systematic unobservable heterogeneity
across the treatment and control groups. While any effect of observable trade determinants
x on trade volumes and membership decisions is dealt with (and the overt bias removed) by
matching, there is no good way to deal with unobservable trade resistance ε, for it is unob-
served. Nevertheless, a sensitivity analysis following Rosenbaum (2002) can be conducted to
account for the difference in ε across the groups (and potential hidden biases); see Section 2.4.
In general, the effect on the treated is of greater interest or policy relevance than the
overall effect or the effect on the untreated. For example, GSP are preferences extended by
higher-income countries to less-developed countries. It is not relevant to estimate the effect
on the untreated, as GSP does not apply to all kinds of trading relationships (for example,
that between two poor countries). For the GATT/WTO effect, both the effect on the treated
(i.e., those who chose to join GATT/WTO) and the effect on the untreated (i.e., those who
chose not to join) seem of interest. Exactly for the unknown nature of the treatment effect
and its unknown dependence on unobserved variables, in general, it is more plausible for
the assumption y0 q d|x (required to identify the effect on the treated) to hold than y1 q d|x(required to identify the effect on the untreated). For example, while some unobservable trade
resistance may not affect the baseline trade volume, it may affect the membership decision
and the trade volume once the dyad are both in the multilateral institution. Heterogeneity
in this unobservable variable across the treatment and control groups does not invalidate
the assumption y0 q d|x as the variable is irrelevant for the baseline trade volume y0, but it
does invalidate y1 q d|x. Hence, we will place more emphasis on the estimation result of the
membership effect on the treated than the effect on the untreated or the mean effect for all.
So far, we introduced the treatment effect framework and the matching concept. In
practice, matching for the effect on the treated proceeds in the following steps (the effect
on the untreated can be handled analogously). First, a treated subject, say subject ijt, is
selected. Second, control subjects are selected who are the closest to the treated subject ijt in
terms of x. If only one control subject is selected, we get ‘pair-matching’; otherwise, ‘multiple-
10
matching.’ Since our conditioning variables x include continuous variables (cf. Section 3), the
likelihood of multiple-matching is negligible. We restrict our attention to the case of pair-
matching. Third, given pair-matching, suppose there are M pairs (where M is the number
of treated subjects that successfully find a match), and ym1 and ym2 are the trade volumes
of the two subjects in pair m ordered such that ym1 > ym2 without loss of generality. Then,
defining
sm = 1 if the first subject in pair m is treated and − 1 otherwise,
the effect on the treated can be estimated with a pair-matching estimator
D ≡ 1M
M∑
m=1
sm(ym1 − ym2) →p E(y1 − y0|d = 1) under y0 q d|x, (1)
which is simply the average of the difference between the treated trade volume and the
untreated trade volume across the M matched pairs, with the unobserved baseline trade
volume for the treated subject estimated by the observed trade volume of a comparable
control subject.
Some remarks are in order. First, the matching scheme can be reversed to result in
an estimator for the effect on the untreated: a control subject is selected first and then
a matching subject from the treatment group later. Second, a measure of the distance
‖xijt − xi′j′t′‖ between two subjects ijt and i′j′t′ in terms of x must be chosen. We use the
simple scale-normalized distance between xijt and xi′j′t′ . Third, for a treated subject, if there
is no good matching control, then the subject may be passed over; i.e., a ‘caliper’ c may be
set such that a treated subject with
mini′j′t′∈C
‖xijt − xi′j′t′‖ > c, where ‘i′j′t′ ∈ C’ means unit i′j′t′ in the control group C,
is discarded. The number of matched pairs M used in the matching estimator D decreases
accordingly. For more discussions on treatment effect and matching in general, see, for
example, Rosenbaum (2002) and Lee (2005).2
2This paper uses the multivariate distance matching (MDM), while Persson (2001) uses propensity scorematching (PSM) in studying the currency union effect. There is no set rule in the literature on choosingbetween the two. We choose the former for its simplicity. But see Rubin and Thomas (2000) for a recommen-dation of combining the two methods. In our case, the results we obtain based on MDM are very strong, sowe doubt that the findings will change with a different matching scheme.
11
2.2 Permutation Test for Matched Pairs
The test of no treatment effect may take on at least three different levels of interpretations,
from the strongest to the weakest: that there is zero effect for all subjects, y1ijt−y0
ijt = 0, ∀ijt,that there is zero effect on the distribution of trade volumes, F (y0
ijt, y1ijt) = F (y1
ijt, y0ijt), or
that there is zero mean effect, E(y1ijt− y0
ijt) = 0. In this paper, the permutation test we base
our inferences on invokes the second concept of exchangeability. Under the null hypothesis H0
of no effect, the treated and the untreated trade volume are exchangeable without affecting
the joint distribution of the two.
Suppose that matching has been done resulting in M pairs. Under the null hypothesis,
exchangeability implies that, given the data, all 2M possibilities to permute the two responses
in each pair are all equally likely with probability 2−M in the ‘permutation sample space.’
Here, permuting two responses in each pair is equivalent to assigning one response to the
treatment group and the other to the control group. As the concept does not invoke large-
sample theories but is conditional on the observed data, permutation inference is an exact
inferential procedure. Whether H0 is rejected or not depends on how extreme D is in the
permutation distribution—i.e., the p-value of D. In theory, this can be computed exactly as
12M
2M∑
k=1
1[Dk > D] if the H0-rejection region is in the upper tail
where Dk is a “D-like” effect estimator under permutation k.
When M is large (as is the case in our current applications), the number of potential
permutations is huge. In Appendix 8.1, we articulate how to approximate the exact p-value
of D when M is large. This is done by simulating a subset of permutation possibilities from
the complete permutation space. In the appendix, we also show that when M is large, the
exact p-value of D can be approximated by a normally distributed standard t-statistic. For
example, if the H0-rejection region is in the upper tail, it follows that
P (Dk > D) ' P
{N(0, 1) >
D
{∑Mm=1(ym1 − ym2)2/M2}1/2
}, (2)
Thus, permutation test is straightforward to implement, especially when M is large and the
normal approximation is appropriate. Incidentally, the result above also provides a theo-
retical justification for the common practice of using standard t-statistics in evaluating the
12
significance of matching estimators, despite the fact that it is derived from an exact inferential
perspective. We also introduce in the appendix how to obtain the confidence interval (CI) for
the matching estimator D by “inverting” the permutation test procedure (cf. Lehmann and
Romano, 2005). See Martiz (1995), Hollander and Wolfe (1999), and Lehmann and Romano
(2005) among many others, for more on permutation (or randomization) tests in general.
The use of permutation-based tests, in stead of asymptotic tests, is especially convenient
in the current context with a panel of bilateral trade data, which possibly have a complicated
data structure with serial and spatial dependence, rendering the derivation of the asymptotic
distribution of the matching estimator difficult, if not impossible. By relying on exchange-
ability as the null hypothesis of no effect, the permutation test can accommodate potentially
a wide range of data structure. For example, if the joint distribution of the treated and
untreated responses in matched pairs is normal, exchangeability holds if the x-conditional
variances of the two responses are the same. This allows for any form of correlation between
a treated response and an untreated response in any matched pair and across matched pairs,
and any form of heteroskedasticity across matched pairs.
2.3 Signed-Rank Test for Matched Pairs
In this section, we propose a robust version of the matching estimator and the permutation
test introduced above. Instead of relying on actual numerical differences in trade volumes, the
signed-rank test (cf. Wilcoxon, 1945) uses only the signed rank of these observed differences.
This renders the test more robust to outliers in x and y. Incidentally, the signed-rank test is
distribution-free; thus, findings based on the test hold unconditionally as well, which accords
external validity.3 In addition, the signed-rank test is easily amenable to the Rosenbaum
(2002) sensitivity analysis we are proposing in Section 2.4.
Applying the Wilcoxon (1945) signed-rank test to the current matching context, rank
|ym1 − ym2|, m = 1, ..., M , to denote the resulting ranks as r1, ..., rM . For instance, when
M = 3,
|y11 − y12| = 0.3, |y21 − y22| = 0.09, |y31 − y32| = 0.21 =⇒ r1 = 3, r2 = 1, r3 = 2.3The permutation test in Section 2.2 is only ‘conditionally distribution-free’—the permutation distribution
under the null hypothesis is conditional on (x′, y, d), which enters the variance V (D′k). Due to this conditioning,
any finding from the test is applicable only to the sample at hand. That is, the finding has ‘internal validity,’but not ‘external validity.’
13
The signed-rank test statistic is the sum of the ranks for the pairs where the treated subject
has the higher response:
R ≡M∑
m=1
rm1[sm(ym1 − ym2) > 0] =M∑
m=1
rm1[sm = 1].
The inference based on the signed-rank R statistic can similarly follow the permutation
inferential procedure. The permutated version R′ for R is
R′ ≡M∑
m=1
rm1[wmsm(ym1 − ym2) > 0] =M∑
m=1
rm1[wmsm > 0]
=M∑
m=1
rm(1[wm = 1, sm = 1] + 1[wm = −1, sm = −1]),
where wm, m = 1, ..., M , are iid random variables with P (wm = 1) = P (wm = −1) = 0.5.
In other words, if wm = −1, the two responses in pair m are switched (the treated response
assigned to the control group, and the untreated response to the treatment group); otherwise,
no switch. Observe
E(1[wm = 1, sm = 1] + 1[wm = −1, sm = −1]) =1[sm = 1]
2+
1[sm = −1]2
=12.
V (1[wm = 1, sm = 1] + 1[wm = −1, sm = −1]) =14.
Hence, because rm’s are fixed (the only thing “random” is permutation, i.e., artificial treat-
ment assignment wm), under the H0,
E(R′) =M∑
m=1
rm12
=12
M∑
m=1
rm =M(M + 1)
4,
V (R′) =M∑
m=1
r2m
14
=14
M∑
m=1
r2m =
M(M + 1)(2M + 1)24
.
When M is large, the null distribution of {R′ − E(R′)}/SD(R′) can be approximated by
N(0, 1). The normally approximated p-value for R is
P
{N(0, 1) >
R − M(M + 1)/4{M(M + 1)(2M + 1)/24}1/2
}. (3)
The CI’s for the treatment effect can be similarly obtained by inverting the test, following
14
similar arguments as in the permutation test, cf. Appendix 8.1. Conduct level-α tests with
different values of β using
Rβ − M(M + 1)/4{M(M + 1)(2M + 1)/24}1/2
, where Rβ ≡M∑
m=1
rmβ1[sm(ym1 − smβ − ym2) > 0] (4)
and rmβ is the rank of |ym1 − smβ − ym2|, m = 1, ..., M . The collection of β values that are
not rejected is the (1− α)100% confidence interval for β.
In contrast to the point effect estimator D, there is no point effect estimator available in
the signed-rank test. As a point estimator, one may take the Hodges and Lehmann (1963)
estimator, which is obtained by solving for β such that
Rβ =M(M + 1)
4{= E(R′)}. (5)
That is, after transforming the data with the effect estimate β, one obtains a transformed
signed-rank statistic Rβ that coincides with the mean of the statistic under the null.
2.4 Sensitivity Analysis with Signed-Rank Test
As discussed in Section 2.1, the identification of the mean treatment effect relies on the
condition (y0, y1) q d|x (or the weaker version y0 q d|x for the effect on the treated). This
condition implies that there is no systematic unobservable heterogeneity across the treatment
and control groups that affects GATT/WTO membership status as well as trade volumes,
once differences in observable trade determinants x are removed by matching. In practice,
hidden selection bias may result from an inappropriate choice of covariates x that omits
variables that should be included. Alternatively, missing or truncated data may also introduce
another source of bias. For example, not all dyads are observed in all years; this is a missing
variable problem, which is also a selection problem. As discussed in the introduction, the use
of only observations with positive trade flows could lead to a downward bias of the treatment
effect estimate, cf. Helpman et al. (2007). Regardless of the source for hidden bias, in theory,
there is no good way to correct the hidden selection bias caused by unobservables, without
making further assumptions on the nature of the bias. Rather, one may approach this issue
from a different perspective by asking instead: how serious must the hidden selection bias be
(and in a certain direction) to overturn the original conclusion. In this section, we introduce
15
such a sensitivity analysis (cf. Rosenbaum, 2002) that accounts for an unobserved confounder
ε that might affect the treatment status d.4 We introduce the concepts below.
When two subjects in a matched pair differ in ε that influences d, their absolute proba-
bilities of taking the treatment are not the same. In particular, following Rosenbaum (2002),
suppose that the odds ratio of taking the treatment is:
1Γ≤ odds of one subject being treated
odds of the other being treated≤ Γ, for some constant Γ ≥ 1 ∀ pair.
For instance, if the first subject’s probability of taking the treatment is 0.6 (0.6) and the
second subject’s probability is 0.5 (0.4), then the odds ratio is 0.6/0.40.5/0.5 = 1.5 (0.6/0.4
0.4/0.6 = 2.25).
The sensitivity analysis goes as follows. Initially, one proceeds under the assumption of no
unobserved difference (Γ = 1). Then Γ is increased from 1 to see how the initial conclusion is
affected. If it takes a large value of Γ to reverse the initial finding, i.e., if only a strong presence
of ε can overturn the initial conclusion, then the initial conclusion is deemed insensitive to ε.
Otherwise, if it takes only a small value of Γ, then the initial finding is deemed sensitive.
The question is then how “large” is large for Γ. Suppose the initial conclusion is overturned
at the critical value Γ∗ = 2.25. Imagine that ε were observed. By observing ε, one would
be able to tell who is more likely to be treated between the two subjects and ask whether
the probability difference could result in as much as an odds ratio of 2.25. If the answer is
no, Γ∗ = 2.25 is a large value. Thus, in a way, how large is large for Γ depends on what is
included in x. If most relevant variables are in x and if it is hard to think of any important
omitted variables in ε, then even a small value of Γ may be regarded as large. In using similar
sensitivity analysis, Aakvik (2001) seems to regard Γ = 1.5 ∼ 2 as large, while Hujer et al.
(2004) appear to base their discussions on lower numbers of Γ = 1.25 ∼ 1.5.
The above sensitivity analysis can be applied to the signed-rank test in a straightforward
manner. Define
p+ ≡ Γ1 + Γ
≥ 0.5 and p− ≡ 11 + Γ
≤ 0.5.
Further define R+ (R−) as the sum of M -many independent random variables where the mth
variable takes rm with probability p+ (p−) and 0 with probability 1− p+ (1− p−). Writing4For ε to cause a bias to the treatment effect estimator at hand, ε should affect both d and (y0, y1). The
sensitivity analysis of Rosenbaum (2002) looks, however, only at the ε’s influence on d, essentially presumingthat ε affects (y0, y1) as well. Thus, the sensitivity analysis is a conservative one, for the bias may not bepresent if in fact ε does not affect (y0, y1).
16
R+ as∑M
m=1 rmum, where P (um = 1) = p+ and P (um = 0) = 1− p+, we get
E(R+) =M∑
m=1
rmE(um) = p+M∑
m=1
rm =p+M(M + 1)
2
V (R+) =M∑
m=1
r2mV (um) = p+(1− p+)
M∑
m=1
r2m =
p+(1− p+)M(M + 1)(2M + 1)6
.
Doing analogously, we obtain
E(R−) =p−M(M + 1)
2and V (R−) =
p−(1− p−)M(M + 1)(2M + 1)6
.
The means and variances with p+ and p− include E(R′) and V (R′) as a special case when
p+ = p− = 1/2 and two subjects in any matched pair have equal probability of taking the
treatment. Suppose that the H0-rejection interval is in the upper tail. Rosenbaum (2002,
p.111) shows that
P (R+ ≥ a) ≥ P (R′ ≥ a) ≥ P (R− ≥ a).
Using these bounds, the p-value obtained under the assumption of no hidden bias can be
bounded by P (R+ ≥ a) in case of rejection.5
Specifically, suppose that the H0-rejection interval is in the upper tail, and the no-hidden-
bias p-value is
P{N(0, 1) ≥ R− E(R′)SD(R′)
} = 0.001,
leading to the rejection of H0 at level α > 0.001. Rewrite P (R+ ≥ a) ≥ P (R′ ≥ a) as
P{R+ −E(R+)SD(R+)
≥ a− E(R+)SD(R+)
} ≥ P{R′ −E(R′)SD(R′)
≥ a− E(R′)SD(R′)
}
' P{N(0, 1) ≥ a− E(R+)SD(R+)
} ≥ P{N(0, 1) ≥ a− E(R′)SD(R′)
}.
Replacing a in the above equation with the realized R gives the p-value of R on the right-hand
side and its bound on the left-hand side. The left-hand side can be obtained for different
values of Γ. Then find, at which value of Γ, the upper bound crosses the level α. If this5In case of acceptance, P (R− ≥ a) shows the possibility of rejection when ε is taken into account. For a
two-sided test, the p-value gets multiplied by 2. For a lower-tail test, subtract the last display from 1 to get
1− P (R+ ≥ a) ≤ 1− P (R′ ≥ a) ≤ 1− P (R− ≥ a) =⇒ P (R+ < a) ≤ P (R′ < a) ≤ P (R− < a),
which can be used for bounds.
17
happens at, say Γ∗ = 2, then check whether or not Γ = 2 is a large value. If Γ = 2 is deemed
large, then the initial finding of a positive treatment effect is insensitive to the unobserved
difference.
The relevant distribution (R+ or R−) to use for calculating the bound in the sensitivity
analysis indicates the direction of hidden selection bias that would undermine an initial finding
of treatment effect or reverse an initial finding of no effect. For example, if the finding is a
significantly positive effect, we only need to worry about ‘positive’ selection, where a subject
with a higher potential treatment effect is also more likely to be treated; thus, the relevant
distribution is R+ that embodies selection bias in this direction. On the other hand, if the
finding is a zero or significantly negative effect, then ‘negative’ selection, where a subject with
a lower potential treatment effect is also more likely to be treated, can reverse or weaken the
original finding; in this case, the sensitivity analysis in the direction of R− is applicable.
3. DATA
We apply our proposed methodologies to both the data set of Rose (2004) and that of Tomz
et al. (2007). Readers are referred to Rose (2004) for a detailed account of his data set.6
The data set of Tomz et al. (2007) differs from Rose (2004) in its definitions of GATT/WTO
dummies.7 The GATT/WTO dummies are defined in terms of a dyad’s participation status
in the system, instead of its formal membership status.
As discussed in the introduction, there are by now strong theoretical foundations in the
basic gravity model that relates bilateral trade volumes to economic size and trade resistance,
although we do not have strong conviction in the functional form of trade resistance. On the
other hand, the empirical gravity literature has more or less converged to a list of variables
that likely have a bearing on trade resistance. This set of variables serve as the set of
covariates xijt that we need in measuring the similarity between two subjects. In particular,
we use the same set of covariates as in Rose (2004) to allow comparison with other studies
in this literature that mostly use the same set of trade resistance determinants (along with
economic sizes and year dummies).
These conditioning variables or covariates xijt include: (the natural logarithm of) the
6Rose’s data set is available from his Web site (http://faculty.haas.berkeley.edu/arose/GATTdataStata.zip).7There are other minor differences, including some corrections to Rose’s data, as explained in Tomz et al.
(2007, Foonote 32).
18
distance between countries i and j, (the natural logarithm of) the product of real GDP’s of
dyad (i, j) in year t, (the natural logarithm of) the product of real per capita GDP’s of dyad
(i, j) in year t, a binary variable indicating whether dyad (i, j) share a common language,
a binary variable indicating whether dyad (i, j) share a land border, a discrete variable
indicating the number of landlocked countries in dyad (i, j), a discrete variable indicating the
number of island nations in dyad (i, j), (the natural logarithm of) the product of land areas of
dyad (i, j), a binary variable indicating whether dyad (i, j) were ever colonies after 1945 with
the same colonizer, a binary variable indicating whether country i is a colony of country j in
year t or vice versa, a binary variable indicating whether country i ever colonized country j or
vice versa, a binary variable indicating whether dyad (i, j) remained part of the same nation
during the sample, a binary variable indicating whether dyad (i, j) use the same currency
in year t, a binary variable indicating whether dyad (i, j) belong to the same regional trade
agreement, and a list of year dummies for t = 1948, . . . , 1999.
The response variable yijt in our matching framework measures (the natural logarithm of)
the average value of real bilateral trade volumes between countries i and j in year t. Three
treatment effects are investigated: both-in, one-in, and GSP. The treatment dummy variable
dijt corresponds to bothinijt, oneinijt, and GSPijt, respectively. They measure, respectively,
whether both countries i and j are GATT/WTO members in year t, whether only one of
the two countries (i, j) is a GATT/WTO member in year t, and whether country i is a GSP
beneficiary of country j or vice versa in year t. When the GSP effect is being investigated, the
other two dummy variables (bothinijt and oneinijt) become part of the conditioning variables
xijt and are added to the list of covariates. On the other hand, when the both-in or one-in
effect is being investigated, GSPijt is used also as a covariate.
The data cover 178 IMF trading entities and span from 1948 to 1999 (with gaps). This
amounts to a total of 234,597 observations.
4. WHAT TRADE EFFECTS TO EXPECT OF
GATT/WTO MEMBERSHIP
On theoretical grounds, there are several economic models that lead one to expect a positive
both-in effect on trade. Among others, the terms-of-trade argument (Johnson, 1953–1954;
Bagwell and Staiger, 1999, 2001) suggests that multilateral trade agreements help coordinat-
19
ing countries’ trade policies and preventing them from engaging in tariff retaliations driven
by terms-of-trade incentives. With the coordinating mechanism of GATT/WTO, countries
undertake reciprocal trade liberalizations that increase their bilateral trade volumes while
leaving a neutral impact on their bilateral terms of trade. This argument is supported by
a recent empirical study by Broda et al. (2006). They show that non-WTO countries set
import tariffs in a manner that exploits their market power and shifts the terms of trade
to their advantage against their trading partners. Another political-commitment argument
(Staiger and Tabellini, 1987, 1989, 1999) suggests that multilateral trade agreements help
national governments to commit themselves to liberalized trade policies, given the retalia-
tion threat from other countries if the commitment is not carried out. This enhances policy
credibility with respect to domestic private sectors and brings about efficient production and
trade structure.
However, as noted by many in the literature, cf. Rose (2006), using membership to mea-
sure the GATT/WTO trade effect is noisy for several reasons. First, tariff reductions and
policy liberalizations do not necessarily coincide with the date of accession. Some coun-
tries may liberalize beforehand while some may liberalize gradually within a phase-in period.
Second, some GATT/WTO members may extend their most-favored-nation (MFN) treat-
ment to nonmember trading partners as well. Third, some countries (particularly developing
countries) did not liberalize their trade policies in spite of their membership in the GATT
(although this is less the case under the WTO). Fourth, some sectors (e.g. oils and minerals)
face little protectionism with or without the GATT/WTO, while some (e.g. agriculture) are
highly protected with or without the GATT/WTO. The above considerations imply that we
are likely to observe uneven both-in effects across trading relationships that are affected by
these factors to various extents, as emphasized by the work of Subramanian and Wei (2007).
For example, the theoretical both-in effect is likely reduced in practice between a developing
and a developed member where the former does not reciprocate the latter’s lower tariffs.
Their bilateral trade volume may still increase as the developing member gains access to the
developed country’s market, but is less than the theoretical both-in effect if the tariff reduc-
tion is reciprocal. The above considerations also suggest that the beneficial both-in effect
in theory will be blurred by these factors in practice. One may find a zero effect, if these
factors are prevalent. On the other hand, if the theoretical both-in effect is strong enough
that dominates the above factors, we will nonetheless find a positive effect from the data.
20
5. BENCHMARK RESULTS
Table 1 reports the results of the baseline methodology applied to the data set of Rose (2004),
labeled ‘unrestricted matching,’ where the pool of potential matches for an observation are
observations with the opposite treatment status; no further restriction is imposed.
We estimate the treatment effect of both-in, one-in, and GSP, respectively. For the effect
of both-in treatment, the observations with oneinijt=1 are dropped. This leaves a remain-
ing sample with observations where countries in a dyad are either both in the GATT/WTO
(114,750 observations) or both outside the GATT/WTO (21,037 observations). Similarly,
for the treatment effect of one-in, the observations with bothinijt=1 are dropped. The re-
maining sample includes only observations where either only one country in a dyad is in the
GATT/WTO (98,810 observations) or both are outside the GATT/WTO (21,037 observa-
tions). Finally, for the treatment effect of GSP, the sample consists of the whole sample;
a small proportion (54,285 observations) of the sample have GSP arrangements and most
(180,312 observations) do not.
In any given matching exercise, we set the caliper such that only 100%, 80%, 60%, or
40% of matched pairs obtained are qualified for the estimation of the treatment effect. The
number of matched pairs obtained is indicated by M1 for the effect on the treated and M0
for that on the untreated. With the caliper choice of 60%, for example, matched pairs with a
distance (in terms of x) exceeding the upper 60 percentile of all matched pairs obtained are
discarded. The caliper choice of 100% is equivalent to using all matched pairs obtained.
The results are grouped into three general columns: permutation test, signed-rank test,
and sensitivity analysis, following our discussions of these methodologies in Section 2. In
particular, two sets of treatment effect estimates are reported: the first set presenting the
D statistic along with its p-value and CI’s (cf. equations 1, 2, and 6, respectively), and
the second set presenting the Hodges and Lehmann statistic along with its p-value and CI’s
calculated based on the signed-rank R statistic (cf. equations 5, 3, and 4, respectively).8
The third column reports the results on the Rosenbaum (2002) sensitivity analysis for the
signed-rank test based on a significance level of α = 0.05 in a one-sided or two-sided test.8In Section 2.2, we introduced both simulation and normal approximations to calculating the exact p-value
of the D statistic. Although not mentioned explicitly in Section 2.3, in addition to the normal approximation, itis also possible to approximate the exact p-value of the signed-rank R statistic based on simulated permutationsamples. We carried out both approaches in all matching exercises and found almost identical results (whichis expected as the sample size is large). Thus, we report only the normally-approximated p-values andcorresponding CI’s in the tables.
21
Refer to the both-in treatment effect. We note that the both-in effect on the treated
is large and significant. Both sets of point effect estimates report similarly large positive
effects. For example, the point estimates imply that membership in the GATT/WTO by
both countries raises bilateral trade volume by 71% (= e0.535 − 1) to 279% (= e1.332 − 1) for
dyads who both chose to be in the GATT/WTO. The point estimates are all significantly
different from zero, regardless of the caliper choice, as indicated by their corresponding p-
values or confidence intervals.
How robust are these estimates to potential hidden selection bias? Since the finding is a
positive effect, only positive selection is a concern, where the treated subject that is observed
to have higher trade volumes on average than the untreated is also more likely to be both in
the GATT/WTO. Results of the sensitivity analysis indicate that the positive both-in effect
is robust to such positive selection (cf. R+) to the extent that the odds of the treated is not
more than 2.081 times the odds of the untreated to be both in the GATT/WTO (by the 80%
caliper and the two-sided test). The robustness is stronger with a one-sided test (naturally)
and with a larger caliper choice, and ranges from 1.467 to 2.434. These figures are not small
compared with typical critical thresholds (1.25 to 1.5) used in other literatures, cf. Aakvik
(2001) and Hujer et al. (2004). In addition to the Rosenbaum (2002) sensitivity analysis, we
shall discuss further refinements of the matching procedure in Section 6.1 to reduce potential
sources of unobservable heterogeneity across the treatment and control groups.
How about the potential effect on trade for dyads who chose not to join the GATT/WTO?
The estimates of both-in effect on the untreated suggest that bilateral trade volumes would
have increased by 15% (= e0.138 − 1) to 40% (= e0.337 − 1) if both countries in these dyads
were to join the GATT/WTO. The effects are smaller compared to the effect on the treated
but they are still significantly positive. At the same time, these estimates are also less robust
to potential hidden biases. Overall, the mean both-in effect on all trading relationships
is positive and significant, with the estimates ranging from 49% (= e0.399 − 1) to 224%
(= e1.175 − 1). They are reasonably robust to hidden biases.
As argued in Section 2.1, the identification condition (y0qd|x) for applying the matching
estimator to the effect on the treated is more likely to hold than that (y1 q d|x) for the
effect on the untreated. Thus, we will place less emphasis on estimates of the effect on the
untreated (and hence the effect on all, which includes the effect on the untreated) in the
following discussions, although their results will continue to be reported in the tables for
22
readers’ information.
We turn to the one-in effect. Unlike the both-in effect, where one may expect a positive
or a zero effect at worst, a priori, the one-in effect can take either sign. On one hand, import
diversion by the new member from its nonmember trading partner to other member trading
partners may lower the dyad’s bilateral trade volumes. On the other hand, in many cases,
when a country joins the GATT/WTO, its tariff reductions (and other policy liberalizations)
offered to existing members on a MFN basis are also extended to nonmember trading partners.
In this case, imports increase from all sources, including that of existing nonmember trading
partners. Furthermore, when a country gains access to markets of existing GATT/WTO
members with the newly acquired membership, it may increase imports of inputs necessary
for the production of exports to these destinations. Some of these additional imports may
fall on third nonmember countries. For example, with the accession into WTO, China may
increase imports of oil from Iran in its expansion of production and export activities. The
figures in Table 1 suggest that the one-in effect on the treated is overall positive; the estimates
range from 38% (= e0.325−1) to 117% (= e0.773−1). These estimates of the one-in effect are
smaller than the both-in effect but are positive and significant. Thus, our estimation results
suggest that the trade-creating effect of GATT/WTO membership dominates potential trade-
diverting effects when one country in a dyad unilaterally joins the GATT/WTO.
Relative to GATT/WTO membership, a preferential GSP scheme is also found to promote
bilateral trade, by a factor of 79% (= e0.581−1) to 134% (= e0.851−1) for trading relationships
that have applied the scheme in the past. These effect estimates are smaller than the both-
in effect and larger than the one-in effect overall. The finding of this ranking of the three
treatment effects seems reasonable. Since GSP are unilateral trade preferences extended only
from a higher-income country to its poor trading partners, its likely effect on bilateral trade
volumes is a priori smaller than if both the rich and the poor countries in a dyad lower their
import restrictions against each other, which happens presumably if both of them join the
GATT/WTO. On the other hand, any trade-promoting effect of one-in membership is, as
argued in the previous paragraph, indirect and conditional on the spillover of MFN treatment
and on the dyad’s initial trade pattern, while the effect of GSP is directly derived from a
straightforward reduction of dyad-specific trade resistance. As we shall see, this ranking
(both-in effect > GSP effect > one-in effect) holds in general regardless of refinements to the
matching procedures or variations in the data used (although the one-in effect is sometimes
23
larger than the GSP effect).
Readers may notice the relative large range of both-in effect estimates obtained with
different choices of calipers, in contrast with the relative narrow range of GSP effect estimates.
This, in a way, reflects likely uneven both-in effects across subjects as discussed in Section 4.
This is so that the lower bound of the both-in effect estimates can be smaller than that of
the GSP effect estimates. As we shall see, this is not always the case.
It may be also helpful to point out that the positive and stronger trade effect of both-in
is shared by a larger number of bilateral trading relationships (114, 750), than that of GSP
(54, 285). Thus, either on the average or in the aggregate, our estimation results suggest that
the realized trade-creating effect of GATT/WTO membership is larger than GSP.
For the GSP effect, we did not report the effect on the untreated, as GSP does not
apply to all kinds of trading relationships. For example, it is not relevant to propose a GSP
arrangement between two poor countries. On the other hand, the estimates of the GSP
effect on the treated reported above in Table 1 should be taken with a grain of salt, as the
conditioning set of covariates do not control for the development stages of the two countries
in a dyad. The existence of a GSP arrangement between a dyad and their bilateral trade
volumes are both very likely dependent on their relative development stage, which poses
potential selection bias. We shall see a refinement to the matching procedure in Section 6.1
to address this concern.
6. ROBUSTNESS CHECK
6.1 Restricted Matching
In this section, we refine the baseline matching procedure to address some potential sources
of selection bias that may remain conditional on the set of covariates x. Although we did
the Rosenbaum (2002) analysis in the previous section to assess the sensitivity of the bench-
mark results to whatever selection bias may remain, the analysis itself does not remove the
bias. In the literature, four potential sources of bias seem to be of major concern. They
are systematic unobservable heterogeneity across dyads, across years, across GATT/WTO
negotiating rounds, and across developing and developed countries. We refine the baseline
matching procedure and restrict the potential match for an observation to a subset of obser-
vations that have the opposite treatment status (as in the case of unrestricted matching) and
24
further satisfy the criterion that the match must also be from the same dyad, the same year,
the same time period defined according to GATT/WTO negotiating rounds, and the same
combination of relative development stage, respectively. By restricting the potential match
to the observations of the specified criterion (say, the same dyad), we are accounting for the
possibility that systematic unobservable differences exist (say, across dyads) that influence
bilateral trade volumes as well as decisions to join the GATT/WTO or decisions to extend
GSP. Restricted matching, say within the dyad, removes the likely dyad-specific effect on
trade as well as its effect on the selection decision into treatment.
Table 2 reports the results for ‘matching within dyad’ based on the data set of Rose (2004).
In this case, the pool of potential matches for an observation are restricted to observations
with the opposite treatment status and from the same dyad. Some dyads may be both
in the GATT/WTO, be both outside the GATT/WTO, or have only one of them in the
GATT/WTO throughout the sampling years (1948 to 1999). Alternatively, some dyads may
have only one of them in the GATT/WTO for some period of the sampling years and then
be both in the GATT/WTO throughout the rest of the sampling years. In these cases, these
dyads do not have qualified control subjects or treated subjects, and therefore are dropped
from the sample. This explains the much smaller sample size shown in Table 2. While the
current estimates suggest overall smaller treatment effects on the treated, the trade-creating
both-in effect continues to be economically and statistically significant, and larger than either
the one-in effect or the GSP effect. The estimates of the both-in effect on the treated range
from 114% (= e0.760 − 1) to 171% (= e0.996 − 1), the estimates of the one-in effect on the
treated range from 37% (= e0.314 − 1) to 70% (= e0.532 − 1), and that for GSP (in this case,
smaller than the one-in effect) from 30% (= e0.259 − 1) to 64% (= e0.492 − 1).
In Table 3, ‘matching within year,’ the matching is restricted to observations from the
same year and thus controls for possible year-specific effects. In this case, the year dummies
in x are redundant and are dropped from the list of covariates. The results come across as
almost the same as those in Table 1. This indicates that in the case of ‘unrestricted matching,’
the two subjects in a matched pair are often from the same year; thus, the estimates in Table 1
pick up mostly cross-sectional variations. This is not surprising, as the set of covariates x
in ‘unrestricted matching’ include year dummies, which encourages matching observations
of the same year. A further look into the data (not shown in the table) at every five-year
interval (1950, 1955, . . . , 1995) shows that the positive both-in or one-in effect is not lumpy in
25
a few particular years and are felt throughout the years, except 1975 and 1995, when there is
a dip in the membership effects. On the other hand, the GSP effect is low in 1970 (its early
phase) and 1995. The same pattern remains as the data set of Tomz et al. (2007) is used, cf.
Section 6.2.
In contrast with the estimates in Table 3 that measure cross-sectional (or ‘between’)
variations, the estimates in Table 2 with ‘matching within dyad’ measure time-series (or
‘within’) variations. Both ‘between’ and ‘within’ variations indicate that there are significant
gains in trade volumes by joining the GATT/WTO. Drawing a comparison with the finding
in the literature, readers may notice that the estimates presented above based on ‘within’
variations are overall smaller than the estimates based on ‘between’ variations. This seems at
odd with the results in Rose (2004), where he finds stronger effects with fixed-effect estimation
(which reflects ‘within’ variations in a panel framework) than with OLS estimation (which
reflects both ‘within’ and ‘between’ variations). We shall see that this relative ranking is not
universal and changes as the data set of Tomz et al. (2007) is used. It is also helpful to note
that using the same data set of Rose (2004), Subramanian and Wei (2007) also find smaller
‘within’ effects than ‘between’ effects.
The difference in ‘within’ and ‘between’ effects may reflect different theoretical effects
along the time-series and the cross-sectional dimensions. Alternatively, theoretical effects
may be the same along these two dimensions, but empirical factors as discussed in Section 4
obscure the theoretical effect to different extents in these two dimensions. For example, if
the phenomenon of liberalizations prior to or later than the date of accession is prevalent,
‘within’ estimates of the theoretical both-in effect based on the date of accession will be
biased downward. The earlier the advance or the longer the phase-in period, the stronger
the downward bias of the ‘within’ effect estimate. On the other hand, ‘between’ estimates
of the both-in effect based on formal membership are likely affected by the other problems
highlighted in Section 4. For example, comparison of two developing member countries
that do not make significant trade liberalizations with two other comparable developing
nonmember countries implies a zero both-in effect. The stronger the presence of these dyads,
the smaller the ‘between’ estimate of the overall both-in effect. A priori, it is difficult to say
which of the two effect estimates – ‘within’ or ‘between’ – is likely to be larger or smaller.
It is also understandable that the ranking of the two effect estimates can reverse with a
different definition of involvement in the GATT/WTO, if that changes the date of associated
26
treatment and the status of treatment for a significant number of observations.
Table 4 presents the results for ‘matching within period’ based on Rose’s data. The pool
of potential matches for an observation are restricted to observations with the opposite treat-
ment status and from the same period, where the periods are defined according to the various
rounds of trade negotiations sponsored by the GATT/WTO. They are: 1948 (Before Annecy
round), 1949-1951 (Annecy to Torquay round), 1952-1956 (Torquay to Geneva round), 1957-
1961 (Geneva to Dillon round), 1962-1967 (Dillon to Kennedy round), 1968-1979 (Kennedy
to Tokyo round), 1980-1994 (Tokyo to Uruguay round), and 1995-(After Uruguay round).
Given our earlier observations that matched pairs in the case of ‘unrestricted matching’ often
occur among cross-sectional subjects from the same year, it is no surprise to see that the
current estimation results are almost the same as in the case of ‘unrestricted matching’; the
extra criterion of matching within the same period in effect does not impose extra restriction
in most cases.
Table 5 reports the final set of results, ‘matching within relative development stage,’ based
on Rose’s data. In this matching exercise, the pool of potential matches for an observation are
restricted to observations with the opposite treatment status and from the same combination
of relative development stage. The combinations include: low-income/low-income dyads, low-
income/middle-income dyads, low-income/high-income dyads, middle-income/middle-income
dyads, middle-income/high-income dyads, and high-income/high-income dyads. Observa-
tions without a qualified match are discarded. The reason for imposing the extra criterion in
matching is that some may argue that the trade structure between a dyad of developed coun-
tries is likely to be systematically different from that between a dyad of developed/developing
countries (e.g. intra-industry trade versus inter-industry trade) and are not comparable in
terms of their potential trade volumes with or without the membership, even after control-
ling for gravity types of covariates. At the same time, the probabilities of being in the
GATT/WTO may vary systematically across development stages. In this case, matching
within the same relative development stage removes this source of potential bias in mem-
bership effect estimates. As argued earlier, similar critique applies to estimating the GSP
treatment effect and possibly more, given that GSP is directly dependent on a dyad’s relative
development stage. As the set of covariates in this exercise also include year dummies, we
compare the results here with those in Table 1 or 3. With the extra restriction on matching,
the estimates of the effect on the treated are smaller overall, although the relative ranking
27
remains the same. The current estimates suggest that membership raises bilateral trade by
a factor of 44% (= e0.364 − 1) to 208% (= e1.124 − 1) for dyads that both chose to be in the
GATT/WTO, and by a smaller factor 27% (= e0.239 − 1) to 92% (= e0.650 − 1) for dyads
where only one of them chose to be in the GATT/WTO. In comparison, GSP is estimated
to raise bilateral trade by a factor of 46% (= e0.378 − 1) to 108% (= e0.732 − 1).
Are the positive membership effects shared evenly among countries of different develop-
ment stages, or are they concentrated on particular subsets of countries? A further look into
the data (not reported in the table) shows that the positive effects are indeed concentrated on
dyads of middle-income/middle-income, middle-income/high-income, and high-income/high-
income countries. The low-income countries do not benefit much from a membership in
the GATT/WTO. Similar lumpy patterns were found in the work of Subramanian and Wei
(2007), although in the current work, we still find a positive average effect while they found
no positive average effect. This asymmetry may reflect two empirical concerns that major
export sectors (e.g. agriculture) of low-income countries still face steep protectionism from
the rich world with or without the GATT/WTO and that the low-income countries them-
selves do not significantly liberalize their import sectors despite their membership in the
GATT/WTO. Interestingly, the GSP effect also shows some lumpy patterns, where the posi-
tive effect is mostly driven by dyads that involve a high-income country. Similar observations
apply to participation instead of membership when the data set of Tomz et al. (2007) is used,
cf. Section 6.2.
How robust are the above results with restricted matching? When the matching criterion
becomes more stringent such that further potential sources of selection bias are minimized,
one may accept a lower critical threshold for Γ∗ than in the case of unrestricted matching, as
the remaining possibility of selection bias is lower. Both ‘matching within dyad’ and ‘match-
ing within relative development stage’ impose effective constraints relative to the benchmark.
In the latter case, the robustness of the treatment effect estimate is lower in general than the
benchmark. The tolerance threshold (Γ∗) for positive selection now stands at 1.256, 1.197,
and 1.530 for the lower-bound estimates of the both-in effect, the one-in effect, and the GSP
effect, respectively. The GSP estimate seems rather robust in this matching exercise, even
with the extra condition on matching within the same relative development stage, which is
argued above to be a desirable criterion for GSP estimation. Thus, we regard this set of GSP
estimates as our favorite ones based on the data of Rose (2004).
28
On the other hand, with the more stringent criterion of ‘matching within dyad,’ the
robustness of the membership effect estimates strengthens (Γ∗ = 2.503 at the minimum for
the both-in effect, and Γ∗ = 1.508 at the minimum for the one-in effect, cf. Table 2). Thus,
it is relatively comfortable for us to accept the ‘within’ estimates of the both-in and one-in
effects, as the tolerance level for hidden selection bias is higher despite the fact that the
possibility of remaining selection bias is lower with the extra matching criterion.
6.2 Participation instead of Formal Membership
What if the data set of Tomz et al. (2007) is used instead? In their work, Tomz et al. (2007)
stress the importance of de facto participation in the multilateral system by nonmembers
such as colonies, newly independent colonies, and provisional members. They share to a large
extent the same set of rights and obligations under the agreement as formal members. Tomz
et al. (2007) classify these territories as nonmember participants and define participation
to include both formal membership and nonmember participation. Without changing the
estimation framework of Rose (2004), they find significant participation effect on trade.
Tables 6 to 10 report the results based on the data set of Tomz et al. (2007). In this case,
the treatments are participation in the GATT/WTO by both countries in a dyad (both-p)
and by only one country in a dyad (one-p). Because when the GSP effect is estimated,
the participation status (bothpijt and onepijt) of a dyad, instead of their membership status
(bothinijt and oneinijt), is used as part of the conditioning covariates, the results for GSP
are not identical to those based on Rose’s data.9
It is fair to say that participation effects are overall stronger than membership effects on
trade. Both the ‘between’ estimates (cf. Tables 6, 8, 9) and the ‘within’ estimates (cf. Table
7) are larger than corresponding estimates obtained based on Rose’s data. They are also more
robust to hidden selection bias, as indicated by the sensitivity analysis. In particular, the
‘within’ estimates in Table 7 are so much larger that they are now larger than the ‘between’
estimates (cf. Section 6.1). The exception to this pattern of overall stronger participation
effects than membership effects occurs when the matching is restricted within the same
relative development stage. Among other explanations, it is possible that there is a stronger
positive selection into participation than into formal membership across development stages,9Tomz et al. (2007) also corrected some coding errors in the data set of Rose (2004), in particular, the income
status and geography indicator of some territories, which is another factor that may affect the estimation resultsof GSP.
29
such that the estimates of the participation effect drop by a larger extent than the estimates of
the membership effect (compared with their respective benchmark of unrestricted matching)
when this potential bias is removed by matching within the same relative development stage.
On the other hand, the estimates of the GSP effect are overall smaller based on the data
set of Tomz et al. (2007) than on that of Rose (2004), regardless of the matching criteria.
However, the difference is not large. This is possibly due to the fact that the change in
GATT/WTO indicators from membership to participation affects the GSP estimates only
indirectly through conditioning covariates that include many other variables.
In all, the current finding of an overall larger participation effect than membership effect
is consistent with the contrasting results found by Tomz et al. (2007) and by Rose (2004).
7. CONCLUSION
Since the publication of the work by Rose (2004), his finding of no significant GATT/WTO
effect on trade volumes has been partly reversed in a few dimensions: there may be nil effects
on average, but there are significant and positive effects for some subsets of countries, cf.
Subramanian and Wei (2007); there may be nil membership effects, but there are significant
and positive participation effects, cf. Tomz et al. (2007); there may be nil cross-sectional
effects, but there are some positive time-series effects, cf. Rose (2004, 2006); there may be
nil intensive effects, but there are some positive extensive effects, cf. Felbermayr and Kohler
(2007) and Helpman et al. (2007).
In this paper, we found that there are some uneven distributions of membership effects
across countries of different development stages, but on average, there are still significant and
positive membership effects; there are stronger participation effects than membership effects,
but membership effects are positive; there are positive (membership or participation) effects
along both cross-sectional and time-series dimensions. We recognize that our estimates based
on positive trade flows still capture only the intensive margin of GATT/WTO effects, but
the effects will only be strengthened with the extensive-margin argument. Thus, despite all
the caveats in the operation of the GATT/WTO institution and the noisiness of measuring
trade liberalizations by official membership or participation, the data do reveal an overall
positive effect on trade of being part of the multilateral system.
30
8. APPENDIX
8.1 Permutation Test for Matched Pairs
One may simulate a subset of permutation possibilities as follows. Suppose that matching has
been done resulting in M pairs. Generate M -many iid random variables wm, m = 1, ..., M ,
with P (wm = 1) = P (wm = −1) = 0.5. If wm = −1, the two responses in pair m are
switched (the treated response assigned to the control group, and the untreated response to
the treatment group); otherwise, no switch. For pseudo sample k obtained this way, obtain
the pseudo effect estimator
D′k ≡
1M
M∑
m=1
wmsm(ym1 − ym2)
where only wm’s are random. The p-value of D is then
1K
K∑
k=1
1[D′k > D] if the H0-rejection region is in the upper tail,
where K is a number much smaller than 2M , say K = 1000. In principle, a pseudo sample
should not be repeated: ‘sampling from the 2M possibilities without replacement’ is required.
In practice, the chance of a pseudo sample being repeated in K (= 1000) pseudo samples is
negligible and can be ignored. The proof is available from the authors upon request.
So far we showed that the exact p-value can be approximated by simulating K pseudo
samples. A further simplification is possible by applying the central limit theorem (CLT) to
wm’s. Observe
E(D′k) = 0 and V (D′
k) = E(D′2k ) =
1M2
M∑
m=1
E{w2ms2
m(ym1−ym2)2} =∑M
m=1(ym1 − ym2)2
M2.
When M is large, the normally approximated p-value of D is
P (D′k > D) = P{ D′
k
{∑Mm=1(ym1 − ym2)2/M2}1/2
>D
{∑Mm=1(ym1 − ym2)2/M2}1/2
}
' P{N(0, 1) >D
{∑Mm=1(ym1 − ym2)2/M2}1/2
},
if the H0-rejection region is in the upper tail. Using this normal approximation, one can skip
31
the computation time required of simulating wm’s and getting D′k, k = 1, . . . ,K.
As is well known, we can obtain a confidence interval (CI) by “inverting” the test proce-
dure (see for example, Lehmann and Romano, 2005). For instance, suppose that the treat-
ment effect is the same for all pairs: the effect increases the treated response by a constant,
say β. In this case, replace ym1 with ym1−β when sm = 1 or ym2 with ym2−β when sm = −1
to obtain
Dβ ≡ 1M
M∑
m=1
sm(ym1 − smβ − ym2)
and restore the no-effect situation. Define accordingly
D′β ≡
1M
M∑
m=1
wmsm(ym1 − smβ − ym2)
to observe
E(D′β) = 0 and V (D′
β) = E(D′2β ) =
∑Mm=1(ym1 − smβ − ym2)2
M2.
Now conduct level-α tests with
Dβ
{∑Mm=1(ym1 − smβ − ym2)2/M2}1/2
(6)
as β varies. The collection of β values that are not rejected is the (1 − α)100% confidence
interval for β.
REFERENCES
Aakvik, A., 2001. Bounding a matching estimator: the case of a Norwegian training program.
Oxford Bulletin of Economics and Statistics 63, 115–143.
Abadie, A., Imbens, G. W., 2006. Large sample properties of matching estimators for average
treatment effects. Econometrica 74 (1), 235–267.
Altonji, J. G., Elder, T. E., Taber, C. R., 2005. Selection on observed and unobserved vari-
ables: assessing the effectiveness of Catholic schools. Journal of Political Economy 113,
151–184.
32
Anderson, J. E., 1979. A theoretical foundation for the gravity equation. American Economic
Review 69 (1), 106–16.
Anderson, J. E., van Wincoop, E., 2003. Gravity with gravitas: A solution to the border
puzzle. American Economic Review 93 (1), 170–92.
Bagwell, K., Staiger, R. W., 1999. An economic theory of GATT. American Economic Review
89 (1), 215–248.
Bagwell, K., Staiger, R. W., 2001. Reciprocity, non-discrimination and preferential agree-
ments in the multilateral trading system. European Journal of Political Economy 17, 281–
325.
Baier, S. L., Bergstrand, J. H., 2004. Economic determinants of free trade agreements. Journal
of International Economics 64 (1), 29–63.
Baier, S. L., Bergstrand, J. H., 2007a. Bonus vetus OLS: A simple method for approximating
international trade-cost effects using the gravity equation, working paper.
Baier, S. L., Bergstrand, J. H., 2007b. Do free trade agreements actually increase members’
international trade? Journal of International Economics 71, 72–95.
Baier, S. L., Bergstrand, J. H., 2008. Estimating the effects of free trade agreements on trade
flows using matching econometrics, working paper.
Bergstrand, J. H., 1985. The gravity equation in international trade: Some microeconomic
foundations and empirical evidence. Review of Economics and Statistics 67 (3), 474–481.
Broda, C., ao, N. L., Weinstein, D. E., 2006. Optimal tariffs: The evidence. American Eco-
nomic Review, forthcoming.
Deardorff, A. V., 1998. Determinants of bilateral trade: Does gravity work in a neoclassical
world? In: Frankel, J. A. (Ed.), The Regionalization of the World Economy. University of
Chicago Press, Chicago, pp. 7–22.
Ernst, M. D., 2004. Permutation methods: A basis for exact inference. Statistical Science
19 (4), 676–685.
33
Felbermayr, G., Kohler, W., 2007. Does WTO membership make a difference at the extensive
margin of world trade? CESifo Working Paper No. 1898.
Frankel, J. A., 1997. Regional Trading Blocs. Institute for International Economics, Wash-
ington, DC.
Good, P. I., 2004. Permutation, Parametric, and Bootstrap Tests of Hypotheses, 3rd Edition.
Springer Series in Statistics. Springer.
Heckman, J., Ichimura, H., Smith, J., Todd, P., 1998. Characterizing selection bias using
experimental data. Econometrica 66 (5), 1017–1098.
Heckman, J. J., Ichimura, H., Todd, P. E., 1997. Matching as an econometric evaluation
estimator: Evidence from evaluating a job training programme. The Review of Economic
Studies 64 (4), 605–654.
Helpman, E., Melitz, M., Rubinstein, Y., 2007. Estimating trade flows: Trading partners and
trading volumes. Quarterly Journal of Economics, forthcoming.
Hodges, J., Lehmann, E., 1963. Estimates of location based on rank tests. Annals of Mathe-
matical Statistics 34, 598–611.
Hollander, M., Wolfe, D. A., 1999. Nonparametric statistical methods, 2nd Edition. Wiley.
Hujer, R., Caliendo, M., Thomsen, S. L., 2004. New evidence on the effects of job creation
schemes in Germany – a matching approach with threefold heterogeneity. Research in
Economics 58, 257–302.
Imbens, G. W., 2003. Sensitivity to exogeneity assumptions in program evaluation. American
Economic Review (Papers and Proceedings) 93, 126–132.
Imbens, G. W., 2004. Nonparametric estimation of average treatment effects under exogene-
ity: A review. The Review of Economics and Statistics 86 (1), 4–29.
Imbens, G. W., Rosenbaum, P. R., 2005. Robust, accurate confidence intervals with a weak
instrument: quarter of birth and education. Journal of the Royal Statistical Society (Series
A) 168, 109–126.
34
Johnson, H. G., 1953–1954. Optimum tariffs and retaliation. Review of Economic Studies
21 (2), 142–153.
Lechner, M., 2000. An evaluation of public-sector-sponsored continuous vocational training
programs in east germany. The Journal of Human Resources 35 (2), 347–375.
Lee, M.-J., 2005. Micro-econometrics for policy, program, and treatment effects. Oxford Uni-
versity Press.
Lee, M.-J., Hakkinen, U., Rosenqvist, G., 2007. Finding the best treatment under heavy
censoring and hidden bias. Journal of the Royal Statistical Society (Series A) 170, 133–
147.
Lehmann, E. L., Romano, J. P., 2005. Testing statistical hypotheses. Springer.
Lu, B., Zanutto, E., Hornik, R., Rosenbaum, P. R., 2001. Matching with doses in an observa-
tional study of a media campaign against drug abuse. Journal of the American Statistical
Association 96 (456), 1245–1253.
Martiz, J., 1995. Distribution-free statistical methods, 2nd Edition. Chapman&Hall.
McCallum, J., 1995. National borders matter: Canada-U.S. regional trade patterns. American
Economic Review 85 (3), 615–623.
Persson, T., 2001. Currency unions and trade: how large is the treatment effect? Economic
Policy 33, 435–448.
Pesarin, F., 2001. Multivariate Permutation Tests: With Applications in Biostatistics. Wiley.
Rose, A. K., 2001. Currency unions and trade: the effect is large. Economic Policy 33, 449–
461.
Rose, A. K., 2004. Do we really know that the WTO increases trade? American Economic
Review 94, 98–114.
Rose, A. K., 2006. The effect of membership in the GATT/WTO on trade: Where do we
stand? In: Drabek, Z. (Ed.), The WTO and Economic Welfare. forthcoming.
Rosenbaum, P. R., 2002. Observational studies, 2nd Edition. Springer.
35
Rubin, D. B., Thomas, N., 2000. Combining propensity score matching with additional ad-
justments for prognostic covariates. Journal of the American Statistical Association 95,
573–585.
Staiger, R. W., Tabellini, G., 1987. Discretionary trade policy and excessive protection.
American Economic Review 77 (5), 823–837.
Staiger, R. W., Tabellini, G., 1989. Rules and discretion in trade policy. European Economic
Review 33 (6), 1265–1277.
Staiger, R. W., Tabellini, G., 1999. Do GATT rules help governments make domestic com-
mitments? Economics & Politics 11 (2), 109–144.
Subramanian, A., Wei, S.-J., 2007. The WTO promotes trade, strongly but unevenly. Journal
of International Economics 72 (1), 151–175.
Tomz, M., Goldstein, J., Rivers, D., 2007. Do we really know that the WTO increases trade?
comment. American Economic Review 97 (5), 2005–2018.
Wilcoxon, F., 1945. Individual comparisons by ranking methods. Biometrics 1, 80–83.
36
Table 1: Rose (2004) data set – unrestricted matchingpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both in GATT/WTO’ treatment effecton the treated (M1 = 114, 750):100% 1.328 0.000 [1.307, 1.349] 1.332 0.000 [1.312, 1.351] 2.434 R+ 2.428 R+
80% 1.075 0.000 [1.052, 1.098] 1.075 0.000 [1.053, 1.096] 2.086 R+ 2.081 R+
60% 0.836 0.000 [0.810, 0.862] 0.835 0.000 [0.810, 0.859] 1.780 R+ 1.775 R+
40% 0.553 0.000 [0.522, 0.584] 0.535 0.000 [0.507, 0.563] 1.472 R+ 1.467 R+
on the untreated (M0 = 21, 037):100% 0.337 0.000 [0.296, 0.379] 0.303 0.000 [0.266, 0.342] 1.250 R+ 1.243 R+
80% 0.239 0.000 [0.192, 0.286] 0.200 0.000 [0.157, 0.241] 1.144 R+ 1.138 R+
60% 0.185 0.000 [0.131, 0.239] 0.138 0.000 [0.090, 0.187] 1.084 R+ 1.077 R+
40% 0.304 0.000 [0.239, 0.368] 0.243 0.000 [0.184, 0.301] 1.177 R+ 1.167 R+
on all (M1 + M0 = 135, 787):100% 1.175 0.000 [1.156, 1.193] 1.161 0.000 [1.143, 1.179] 2.209 R+ 2.205 R+
80% 0.899 0.000 [0.878, 0.919] 0.883 0.000 [0.863, 0.902] 1.858 R+ 1.854 R+
60% 0.636 0.000 [0.613, 0.659] 0.619 0.000 [0.597, 0.640] 1.559 R+ 1.555 R+
40% 0.428 0.000 [0.400, 0.455] 0.399 0.000 [0.374, 0.424] 1.342 R+ 1.338 R+
‘One in GATT/WTO’ treatment effecton the treated (M1 = 98, 810):100% 0.767 0.000 [0.746, 0.789] 0.773 0.000 [0.753, 0.792] 1.759 R+ 1.755 R+
80% 0.564 0.000 [0.540, 0.588] 0.568 0.000 [0.547, 0.589] 1.525 R+ 1.521 R+
60% 0.422 0.000 [0.396, 0.449] 0.428 0.000 [0.405, 0.451] 1.397 R+ 1.393 R+
40% 0.326 0.000 [0.296, 0.357] 0.325 0.000 [0.298, 0.351] 1.294 R+ 1.289 R+
on the untreated (M0 = 21, 037):100% 0.030 0.068 [-0.009, 0.069] 0.034 0.022 [0.000, 0.068] 1.006 R+ 1.001 R+
80% 0.092 0.000 [0.048, 0.135] 0.089 0.000 [0.052, 0.126] 1.057 R+ 1.051 R+
60% 0.078 0.001 [0.028, 0.129] 0.084 0.000 [0.041, 0.127] 1.046 R+ 1.039 R+
40% 0.138 0.000 [0.076, 0.201] 0.149 0.000 [0.096, 0.203] 1.102 R+ 1.094 R+
on all (M1 + M0 = 119, 847):100% 0.638 0.000 [0.619, 0.657] 0.632 0.000 [0.615, 0.649] 1.610 R+ 1.607 R+
80% 0.443 0.000 [0.422, 0.464] 0.437 0.000 [0.418, 0.455] 1.401 R+ 1.397 R+
60% 0.324 0.000 [0.301, 0.347] 0.321 0.000 [0.301, 0.340] 1.297 R+ 1.293 R+
40% 0.225 0.000 [0.198, 0.253] 0.220 0.000 [0.197, 0.243] 1.194 R+ 1.190 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.851 0.000 [0.831, 0.871] 0.792 0.000 [0.774, 0.811] 2.277 R+ 2.269 R+
80% 0.757 0.000 [0.736, 0.778] 0.696 0.000 [0.676, 0.716] 2.125 R+ 2.117 R+
60% 0.693 0.000 [0.668, 0.717] 0.627 0.000 [0.604, 0.649] 1.998 R+ 1.990 R+
40% 0.665 0.000 [0.635, 0.696] 0.581 0.000 [0.553, 0.608] 1.879 R+ 1.869 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment status; nofurther restriction is imposed. The number of matched pairs obtained is indicated by M1 for the effect on the treated,and M0 for the effect on the untreated.2. The data set of Rose (2004) is used; the set of conditioning covariates x used in matching are as explained in Section 3.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
37
Table 2: Rose (2004) data set – matching within dyadpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both in GATT/WTO’ treatment effecton the treated (M1 = 19, 760):100% 0.941 0.000 [0.912, 0.970] 0.996 0.000 [0.970, 1.023] 3.189 R+ 3.170 R+
80% 0.760 0.000 [0.727, 0.792] 0.814 0.000 [0.785, 0.844] 2.559 R+ 2.543 R+
60% 0.833 0.000 [0.797, 0.870] 0.876 0.000 [0.843, 0.910] 2.792 R+ 2.771 R+
40% 0.796 0.000 [0.750, 0.843] 0.816 0.000 [0.773, 0.858] 2.526 R+ 2.503 R+
on the untreated (M0 = 9, 510):100% 1.300 0.000 [1.255, 1.345] 1.359 0.000 [1.317, 1.401] 4.168 R+ 4.129 R+
80% 1.117 0.000 [1.067, 1.167] 1.161 0.000 [1.115, 1.207] 3.475 R+ 3.440 R+
60% 0.989 0.000 [0.931, 1.048] 1.019 0.000 [0.967, 1.071] 3.017 R+ 2.983 R+
40% 0.847 0.000 [0.777, 0.917] 0.840 0.000 [0.781, 0.899] 2.635 R+ 2.600 R+
on all (M1 + M0 = 29, 270):100% 1.058 0.000 [1.033, 1.082] 1.110 0.000 [1.087, 1.132] 3.515 R+ 3.496 R+
80% 0.895 0.000 [0.868, 0.923] 0.943 0.000 [0.918, 0.967] 2.927 R+ 2.911 R+
60% 0.935 0.000 [0.903, 0.967] 0.969 0.000 [0.940, 0.998] 3.021 R+ 3.002 R+
40% 0.873 0.000 [0.833, 0.913] 0.888 0.000 [0.852, 0.923] 2.700 R+ 2.679 R+
‘One in GATT/WTO’ treatment effecton the treated (M1 = 23, 463):100% 0.464 0.000 [0.438, 0.489] 0.532 0.000 [0.510, 0.555] 1.940 R+ 1.931 R+
80% 0.403 0.000 [0.374, 0.432] 0.470 0.000 [0.444, 0.495] 1.782 R+ 1.772 R+
60% 0.371 0.000 [0.335, 0.407] 0.460 0.000 [0.428, 0.491] 1.666 R+ 1.656 R+
40% 0.314 0.000 [0.269, 0.359] 0.391 0.000 [0.351, 0.430] 1.520 R+ 1.508 R+
on the untreated (M0 = 15, 182):100% 0.579 0.000 [0.544, 0.613] 0.641 0.000 [0.611, 0.671] 2.110 R+ 2.097 R+
80% 0.463 0.000 [0.423, 0.503] 0.525 0.000 [0.492, 0.559] 1.818 R+ 1.805 R+
60% 0.386 0.000 [0.339, 0.432] 0.430 0.000 [0.390, 0.469] 1.610 R+ 1.597 R+
40% 0.317 0.000 [0.262, 0.372] 0.352 0.000 [0.306, 0.398] 1.479 R+ 1.465 R+
on all (M1 + M0 = 38, 645):100% 0.509 0.000 [0.488, 0.529] 0.574 0.000 [0.556, 0.592] 2.023 R+ 2.016 R+
80% 0.428 0.000 [0.404, 0.452] 0.492 0.000 [0.472, 0.513] 1.816 R+ 1.808 R+
60% 0.403 0.000 [0.374, 0.432] 0.478 0.000 [0.453, 0.503] 1.706 R+ 1.698 R+
40% 0.291 0.000 [0.256, 0.326] 0.339 0.000 [0.310, 0.368] 1.469 R+ 1.460 R+
GSP treatment effecton the treated (M1 = 52, 025):100% 0.487 0.000 [0.476, 0.499] 0.478 0.000 [0.468, 0.488] 2.579 R+ 2.570 R+
80% 0.492 0.000 [0.479, 0.506] 0.489 0.000 [0.478, 0.501] 2.504 R+ 2.494 R+
60% 0.379 0.000 [0.363, 0.395] 0.371 0.000 [0.357, 0.385] 1.945 R+ 1.937 R+
40% 0.271 0.000 [0.250, 0.292] 0.259 0.000 [0.241, 0.277] 1.536 R+ 1.528 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment status andfrom the same dyad; observations without a match are discarded. The number of matched pairs obtained is indicatedby M1 for the effect on the treated, and M0 for the effect on the untreated.2. The data set of Rose (2004) is used; the set of conditioning covariates x used in matching are as explained in Section 3.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
38
Table 3: Rose (2004) data set – matching within yearpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both in GATT/WTO’ treatment effecton the treated (M1 = 114, 750):100% 1.329 0.000 [1.308, 1.350] 1.334 0.000 [1.314, 1.354] 2.433 R+ 2.427 R+
80% 1.075 0.000 [1.052, 1.098] 1.075 0.000 [1.053, 1.096] 2.086 R+ 2.081 R+
60% 0.836 0.000 [0.810, 0.862] 0.835 0.000 [0.810, 0.859] 1.780 R+ 1.775 R+
40% 0.553 0.000 [0.522, 0.584] 0.535 0.000 [0.507, 0.563] 1.472 R+ 1.467 R+
on the untreated (M0 = 21, 037):100% 0.340 0.000 [0.298, 0.381] 0.305 0.000 [0.267, 0.344] 1.251 R+ 1.245 R+
80% 0.239 0.000 [0.192, 0.286] 0.200 0.000 [0.157, 0.241] 1.144 R+ 1.138 R+
60% 0.185 0.000 [0.131, 0.239] 0.138 0.000 [0.090, 0.187] 1.084 R+ 1.077 R+
40% 0.304 0.000 [0.239, 0.368] 0.243 0.000 [0.184, 0.301] 1.177 R+ 1.167 R+
on all (M1 + M0 = 135, 787):100% 1.176 0.000 [1.157, 1.194] 1.164 0.000 [1.146, 1.181] 2.209 R+ 2.205 R+
80% 0.899 0.000 [0.878, 0.919] 0.883 0.000 [0.863, 0.902] 1.858 R+ 1.854 R+
60% 0.636 0.000 [0.613, 0.659] 0.619 0.000 [0.597, 0.640] 1.559 R+ 1.555 R+
40% 0.428 0.000 [0.400, 0.455] 0.399 0.000 [0.374, 0.424] 1.342 R+ 1.338 R+
‘One in GATT/WTO’ treatment effecton the treated (M1 = 98, 810):100% 0.761 0.000 [0.739, 0.782] 0.767 0.000 [0.748, 0.786] 1.751 R+ 1.747 R+
80% 0.564 0.000 [0.540, 0.588] 0.568 0.000 [0.547, 0.589] 1.525 R+ 1.521 R+
60% 0.422 0.000 [0.396, 0.449] 0.428 0.000 [0.405, 0.451] 1.397 R+ 1.393 R+
40% 0.326 0.000 [0.296, 0.357] 0.325 0.000 [0.298, 0.351] 1.294 R+ 1.289 R+
on the untreated (M0 = 21, 037):100% 0.032 0.054 [-0.007, 0.072] 0.038 0.014 [0.003, 0.071] 1.009 R+ 1.004 R+
80% 0.092 0.000 [0.048, 0.135] 0.089 0.000 [0.052, 0.126] 1.057 R+ 1.051 R+
60% 0.078 0.001 [0.028, 0.129] 0.084 0.000 [0.041, 0.127] 1.046 R+ 1.039 R+
40% 0.138 0.000 [0.076, 0.201] 0.149 0.000 [0.096, 0.203] 1.102 R+ 1.094 R+
on all (M1 + M0 = 119, 847):100% 0.633 0.000 [0.614, 0.652] 0.628 0.000 [0.611, 0.645] 1.605 R+ 1.601 R+
80% 0.443 0.000 [0.422, 0.464] 0.437 0.000 [0.418, 0.455] 1.401 R+ 1.397 R+
60% 0.324 0.000 [0.301, 0.347] 0.321 0.000 [0.301, 0.340] 1.297 R+ 1.293 R+
40% 0.225 0.000 [0.198, 0.253] 0.220 0.000 [0.197, 0.243] 1.194 R+ 1.190 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.850 0.000 [0.830, 0.870] 0.791 0.000 [0.773, 0.810] 2.275 R+ 2.267 R+
80% 0.757 0.000 [0.736, 0.778] 0.696 0.000 [0.676, 0.716] 2.125 R+ 2.117 R+
60% 0.693 0.000 [0.668, 0.717] 0.627 0.000 [0.604, 0.649] 1.998 R+ 1.990 R+
40% 0.665 0.000 [0.635, 0.696] 0.581 0.000 [0.553, 0.608] 1.879 R+ 1.869 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment status andfrom the same year; observations without a match are discarded. The number of matched pairs obtained is indicatedby M1 for the effect on the treated, and M0 for the effect on the untreated.2. The data set of Rose (2004) is used; the set of conditioning covariates x used in matching are as explained in Section 3,with the year dummies dropped from the list.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
39
Table 4: Rose (2004) data set – matching within periodpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both in GATT/WTO’ treatment effecton the treated (M1 = 114, 750):100% 1.331 0.000 [1.310, 1.352] 1.336 0.000 [1.316, 1.356] 2.438 R+ 2.432 R+
80% 1.075 0.000 [1.052, 1.098] 1.075 0.000 [1.053, 1.096] 2.086 R+ 2.081 R+
60% 0.836 0.000 [0.810, 0.862] 0.835 0.000 [0.810, 0.859] 1.780 R+ 1.775 R+
40% 0.553 0.000 [0.522, 0.584] 0.535 0.000 [0.507, 0.563] 1.472 R+ 1.467 R+
on the untreated (M0 = 21, 037):100% 0.340 0.000 [0.298, 0.381] 0.305 0.000 [0.267, 0.344] 1.251 R+ 1.245 R+
80% 0.239 0.000 [0.192, 0.286] 0.200 0.000 [0.157, 0.241] 1.144 R+ 1.138 R+
60% 0.185 0.000 [0.131, 0.239] 0.138 0.000 [0.090, 0.187] 1.084 R+ 1.077 R+
40% 0.304 0.000 [0.239, 0.368] 0.243 0.000 [0.184, 0.301] 1.177 R+ 1.167 R+
on all (M1 + M0 = 135, 787):100% 1.177 0.000 [1.158, 1.196] 1.165 0.000 [1.147, 1.183] 2.213 R+ 2.208 R+
80% 0.899 0.000 [0.878, 0.919] 0.883 0.000 [0.863, 0.902] 1.858 R+ 1.854 R+
60% 0.636 0.000 [0.613, 0.659] 0.619 0.000 [0.597, 0.640] 1.559 R+ 1.555 R+
40% 0.428 0.000 [0.400, 0.455] 0.399 0.000 [0.374, 0.424] 1.342 R+ 1.338 R+
‘One in GATT/WTO’ treatment effecton the treated (M1 = 98, 810):100% 0.762 0.000 [0.741, 0.784] 0.768 0.000 [0.749, 0.787] 1.753 R+ 1.749 R+
80% 0.564 0.000 [0.540, 0.588] 0.568 0.000 [0.547, 0.589] 1.525 R+ 1.521 R+
60% 0.422 0.000 [0.396, 0.449] 0.428 0.000 [0.405, 0.451] 1.397 R+ 1.393 R+
40% 0.326 0.000 [0.296, 0.357] 0.325 0.000 [0.298, 0.351] 1.294 R+ 1.289 R+
on the untreated (M0 = 21, 037):100% 0.032 0.054 [-0.007, 0.072] 0.038 0.015 [0.003, 0.071] 1.009 R+ 1.004 R+
80% 0.092 0.000 [0.048, 0.135] 0.089 0.000 [0.052, 0.126] 1.057 R+ 1.051 R+
60% 0.078 0.001 [0.028, 0.129] 0.084 0.000 [0.041, 0.127] 1.046 R+ 1.039 R+
40% 0.138 0.000 [0.076, 0.201] 0.149 0.000 [0.096, 0.203] 1.102 R+ 1.094 R+
on all (M1 + M0 = 119, 847):100% 0.634 0.000 [0.615, 0.653] 0.629 0.000 [0.612, 0.646] 1.606 R+ 1.603 R+
80% 0.443 0.000 [0.422, 0.464] 0.437 0.000 [0.418, 0.455] 1.401 R+ 1.397 R+
60% 0.324 0.000 [0.301, 0.347] 0.321 0.000 [0.301, 0.340] 1.297 R+ 1.293 R+
40% 0.225 0.000 [0.198, 0.253] 0.220 0.000 [0.197, 0.243] 1.194 R+ 1.190 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.851 0.000 [0.831, 0.871] 0.792 0.000 [0.773, 0.811] 2.276 R+ 2.269 R+
80% 0.757 0.000 [0.736, 0.778] 0.696 0.000 [0.676, 0.716] 2.125 R+ 2.117 R+
60% 0.693 0.000 [0.668, 0.717] 0.627 0.000 [0.604, 0.649] 1.998 R+ 1.990 R+
40% 0.665 0.000 [0.635, 0.696] 0.581 0.000 [0.553, 0.608] 1.879 R+ 1.869 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment statusand from the same period, where the periods are: 1948 (Before Annecy round), 1949-1951 (Annecy to Torquay round),1952-1956 (Torquay to Geneva round), 1957-1961 (Geneva to Dillon round), 1962-1967 (Dillon to Kennedy round), 1968-1979 (Kennedy to Tokyo round), 1980-1994 (Tokyo to Uruguay round), and 1995-(After Uruguay round). Observationswithout a match are discarded. The number of matched pairs obtained is indicated by M1 for the effect on the treated,and M0 for the effect on the untreated.2. The data set of Rose (2004) is used; the set of conditioning covariates x used in matching are as explained in Section 3.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
40
Table 5: Rose (2004) data set – matching within relative development stagepermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both in GATT/WTO’ treatment effecton the treated (M1 = 112, 959):100% 1.124 0.000 [1.103, 1.146] 1.097 0.000 [1.076, 1.118] 2.024 R+ 2.019 R+
80% 0.778 0.000 [0.753, 0.802] 0.734 0.000 [0.711, 0.757] 1.605 R+ 1.601 R+
60% 0.541 0.000 [0.514, 0.569] 0.504 0.000 [0.478, 0.530] 1.389 R+ 1.385 R+
40% 0.393 0.000 [0.359, 0.427] 0.364 0.000 [0.333, 0.395] 1.260 R+ 1.256 R+
on the untreated (M0 = 21, 013):100% 0.309 0.000 [0.268, 0.351] 0.274 0.000 [0.236, 0.312] 1.222 R+ 1.216 R+
80% 0.175 0.000 [0.128, 0.222] 0.131 0.000 [0.089, 0.173] 1.083 R+ 1.077 R+
60% 0.101 0.000 [0.047, 0.155] 0.059 0.008 [0.010, 0.107] 1.015 R+ 1.009 R+
40% 0.077 0.011 [0.011, 0.142] 0.036 0.110 [-0.021, 0.095] 1.011 R− 1.019 R−on all (M1 + M0 = 133, 972):100% 0.997 0.000 [0.977, 1.016] 0.955 0.000 [0.936, 0.973] 1.883 R+ 1.880 R+
80% 0.662 0.000 [0.640, 0.684] 0.612 0.000 [0.592, 0.633] 1.507 R+ 1.504 R+
60% 0.442 0.000 [0.418, 0.467] 0.402 0.000 [0.379, 0.424] 1.313 R+ 1.309 R+
40% 0.247 0.000 [0.217, 0.276] 0.208 0.000 [0.182, 0.235] 1.147 R+ 1.143 R+
‘One in GATT/WTO’ treatment effecton the treated (M1 = 98, 363):100% 0.650 0.000 [0.627, 0.672] 0.628 0.000 [0.608, 0.648] 1.556 R+ 1.552 R+
80% 0.476 0.000 [0.452, 0.500] 0.460 0.000 [0.438, 0.481] 1.394 R+ 1.391 R+
60% 0.342 0.000 [0.315, 0.370] 0.324 0.000 [0.300, 0.347] 1.267 R+ 1.263 R+
40% 0.242 0.000 [0.211, 0.274] 0.239 0.000 [0.212, 0.266] 1.202 R+ 1.197 R+
on the untreated (M0 = 21, 013):100% 0.049 0.007 [0.010, 0.087] 0.034 0.023 [0.000, 0.067] 1.006 R+ 1.001 R+
80% 0.063 0.002 [0.020, 0.105] 0.051 0.003 [0.014, 0.087] 1.020 R+ 1.014 R+
60% 0.034 0.084 [-0.014, 0.083] 0.028 0.092 [-0.012, 0.070] 1.007 R− 1.013 R−40% 0.062 0.023 [0.001, 0.122] 0.048 0.037 [-0.003, 0.100] 1.003 R+ 1.004 R−on all (M1 + M0 = 119, 376):100% 0.544 0.000 [0.524, 0.563] 0.512 0.000 [0.495, 0.530] 1.455 R+ 1.452 R+
80% 0.391 0.000 [0.369, 0.412] 0.369 0.000 [0.350, 0.388] 1.320 R+ 1.316 R+
60% 0.215 0.000 [0.192, 0.239] 0.200 0.000 [0.179, 0.219] 1.164 R+ 1.161 R+
40% 0.175 0.000 [0.148, 0.202] 0.167 0.000 [0.144, 0.190] 1.140 R+ 1.137 R+
GSP treatment effecton the treated (M1 = 53, 811):100% 0.732 0.000 [0.712, 0.752] 0.693 0.000 [0.674, 0.712] 2.018 R+ 2.011 R+
80% 0.588 0.000 [0.567, 0.608] 0.551 0.000 [0.531, 0.570] 1.813 R+ 1.807 R+
60% 0.507 0.000 [0.485, 0.530] 0.474 0.000 [0.452, 0.496] 1.706 R+ 1.699 R+
40% 0.410 0.000 [0.382, 0.437] 0.378 0.000 [0.352, 0.404] 1.538 R+ 1.530 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment status andfrom the same relative development stage, where the combinations of relative development stages are: low-income/low-income dyads, low-income/middle-income dyads, low-income/high-income dyads, middle-income/middle-income dyads,middle-income/high-income dyads, and high-income/high-income dyads. Observations without a match are discarded.The number of matched pairs obtained is indicated by M1 for the effect on the treated, and M0 for the effect on theuntreated.2. The data set of Rose (2004) is used; the set of conditioning covariates x used in matching are as explained in Section 3.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
41
Table 6: Tomz et al. (2007) data set – unrestricted matchingpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both participating in GATT/WTO’ treatment effecton the treated (M1 = 152, 986):100% 1.418 0.000 [1.399, 1.438] 1.318 0.000 [1.300, 1.335] 2.431 R+ 2.426 R+
80% 1.260 0.000 [1.240, 1.281] 1.184 0.000 [1.165, 1.202] 2.289 R+ 2.284 R+
60% 1.089 0.000 [1.066, 1.113] 1.027 0.000 [1.006, 1.048] 2.063 R+ 2.058 R+
40% 0.762 0.000 [0.736, 0.789] 0.731 0.000 [0.707, 0.755] 1.712 R+ 1.706 R+
on the untreated (M0 = 9, 703):100% 0.396 0.000 [0.339, 0.452] 0.390 0.000 [0.338, 0.443] 1.352 R+ 1.341 R+
80% 0.343 0.000 [0.281, 0.406] 0.334 0.000 [0.276, 0.391] 1.290 R+ 1.280 R+
60% 0.361 0.000 [0.288, 0.434] 0.340 0.000 [0.274, 0.407] 1.288 R+ 1.275 R+
40% 0.404 0.000 [0.312, 0.496] 0.381 0.000 [0.299, 0.464] 1.316 R+ 1.301 R+
on all (M1 + M0 = 162, 689):100% 1.357 0.000 [1.339, 1.376] 1.256 0.000 [1.239, 1.272] 2.356 R+ 2.351 R+
80% 1.184 0.000 [1.165, 1.204] 1.107 0.000 [1.089, 1.125] 2.194 R+ 2.189 R+
60% 1.002 0.000 [0.980, 1.024] 0.938 0.000 [0.918, 0.957] 1.961 R+ 1.956 R+
40% 0.666 0.000 [0.641, 0.691] 0.640 0.000 [0.617, 0.663] 1.623 R+ 1.618 R+
‘One participating in GATT/WTO’ treatment effecton the treated (M1 = 71, 908):100% 0.818 0.000 [0.793, 0.844] 0.771 0.000 [0.749, 0.793] 1.782 R+ 1.777 R+
80% 0.631 0.000 [0.603, 0.658] 0.599 0.000 [0.575, 0.622] 1.585 R+ 1.580 R+
60% 0.444 0.000 [0.414, 0.473] 0.443 0.000 [0.418, 0.469] 1.428 R+ 1.423 R+
40% 0.304 0.000 [0.270, 0.338] 0.319 0.000 [0.289, 0.349] 1.300 R+ 1.295 R+
on the untreated (M0 = 9, 703):100% -0.040 0.078 [-0.096, 0.015] 0.014 0.278 [-0.031, 0.061] 1.025 R− 1.033 R−80% 0.006 0.419 [-0.054, 0.066] 0.026 0.157 [-0.023, 0.077] 1.017 R− 1.025 R−60% 0.028 0.215 [-0.041, 0.097] 0.055 0.037 [-0.004, 0.114] 1.004 R+ 1.005 R−40% 0.135 0.001 [0.049, 0.220] 0.171 0.000 [0.096, 0.246] 1.110 R+ 1.097 R+
on all (M1 + M0 = 81, 611):100% 0.716 0.000 [0.693, 0.740] 0.671 0.000 [0.651, 0.692] 1.672 R+ 1.668 R+
80% 0.523 0.000 [0.499, 0.548] 0.501 0.000 [0.480, 0.523] 1.488 R+ 1.484 R+
60% 0.349 0.000 [0.322, 0.375] 0.350 0.000 [0.328, 0.374] 1.339 R+ 1.335 R+
40% 0.206 0.000 [0.175, 0.236] 0.218 0.000 [0.191, 0.245] 1.197 R+ 1.192 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.824 0.000 [0.805, 0.844] 0.768 0.000 [0.750, 0.787] 2.251 R+ 2.243 R+
80% 0.726 0.000 [0.705, 0.747] 0.665 0.000 [0.646, 0.685] 2.072 R+ 2.065 R+
60% 0.667 0.000 [0.643, 0.691] 0.601 0.000 [0.579, 0.624] 1.952 R+ 1.944 R+
40% 0.621 0.000 [0.591, 0.651] 0.536 0.000 [0.508, 0.564] 1.791 R+ 1.782 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment status; nofurther restriction is imposed. The number of matched pairs obtained is indicated by M1 for the effect on the treated,and M0 for the effect on the untreated.2. The data set of Tomz et al. (2007) is used; the set of conditioning covariates x used in matching are the same asthose listed in Section 3.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
42
Table 7: Tomz et al. (2007) data set – matching within dyadpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both participating in GATT/WTO’ treatment effecton the treated (M1 = 8, 005):100% 1.554 0.000 [1.514, 1.594] 1.554 0.000 [1.514, 1.594] 7.634 R+ 7.535 R+
80% 1.513 0.000 [1.467, 1.559] 1.520 0.000 [1.475, 1.565] 6.783 R+ 6.689 R+
60% 1.285 0.000 [1.232, 1.338] 1.281 0.000 [1.230, 1.333] 5.041 R+ 4.969 R+
40% 1.361 0.000 [1.294, 1.428] 1.355 0.000 [1.290, 1.420] 5.228 R+ 5.134 R+
on the untreated (M0 = 4, 144):100% 1.743 0.000 [1.685, 1.801] 1.755 0.000 [1.696, 1.814] 8.342 R+ 8.186 R+
80% 1.561 0.000 [1.496, 1.626] 1.547 0.000 [1.482, 1.613] 6.646 R+ 6.518 R+
60% 1.334 0.000 [1.259, 1.408] 1.291 0.000 [1.217, 1.365] 5.028 R+ 4.927 R+
40% 1.060 0.000 [0.971, 1.150] 0.974 0.000 [0.890, 1.059] 3.607 R+ 3.527 R+
on all (M1 + M0 = 12, 149):100% 1.618 0.000 [1.585, 1.652] 1.622 0.000 [1.589, 1.655] 8.036 R+ 7.950 R+
80% 1.536 0.000 [1.498, 1.574] 1.542 0.000 [1.504, 1.579] 6.760 R+ 6.684 R+
60% 1.417 0.000 [1.373, 1.460] 1.404 0.000 [1.361, 1.446] 5.945 R+ 5.872 R+
40% 1.315 0.000 [1.261, 1.369] 1.291 0.000 [1.238, 1.344] 5.065 R+ 4.992 R+
‘One participating in GATT/WTO’ treatment effecton the treated (M1 = 11, 637):100% 0.852 0.000 [0.816, 0.888] 0.870 0.000 [0.837, 0.904] 2.900 R+ 2.877 R+
80% 0.716 0.000 [0.675, 0.757] 0.730 0.000 [0.692, 0.768] 2.413 R+ 2.393 R+
60% 0.738 0.000 [0.687, 0.789] 0.765 0.000 [0.718, 0.813] 2.301 R+ 2.279 R+
40% 0.546 0.000 [0.483, 0.608] 0.573 0.000 [0.516, 0.629] 1.860 R+ 1.840 R+
on the untreated (M0 = 6, 548):100% 0.754 0.000 [0.703, 0.805] 0.773 0.000 [0.726, 0.821] 2.408 R+ 2.384 R+
80% 0.633 0.000 [0.574, 0.692] 0.645 0.000 [0.591, 0.699] 2.014 R+ 1.993 R+
60% 0.473 0.000 [0.404, 0.542] 0.469 0.000 [0.407, 0.532] 1.624 R+ 1.604 R+
40% 0.396 0.000 [0.311, 0.482] 0.392 0.000 [0.317, 0.466] 1.477 R+ 1.456 R+
on all (M1 + M0 = 18, 185):100% 0.817 0.000 [0.787, 0.846] 0.836 0.000 [0.808, 0.863] 2.741 R+ 2.725 R+
80% 0.719 0.000 [0.685, 0.753] 0.736 0.000 [0.705, 0.767] 2.386 R+ 2.370 R+
60% 0.643 0.000 [0.602, 0.684] 0.660 0.000 [0.623, 0.698] 2.074 R+ 2.059 R+
40% 0.411 0.000 [0.360, 0.461] 0.415 0.000 [0.371, 0.460] 1.579 R+ 1.565 R+
GSP treatment effecton the treated (M1 = 52, 025):100% 0.485 0.000 [0.473, 0.497] 0.478 0.000 [0.468, 0.488] 2.570 R+ 2.561 R+
80% 0.480 0.000 [0.467, 0.494] 0.482 0.000 [0.470, 0.494] 2.416 R+ 2.407 R+
60% 0.375 0.000 [0.358, 0.391] 0.371 0.000 [0.357, 0.386] 1.901 R+ 1.893 R+
40% 0.265 0.000 [0.243, 0.287] 0.257 0.000 [0.238, 0.275] 1.502 R+ 1.494 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment status andfrom the same dyad; observations without a match are discarded. The number of matched pairs obtained is indicatedby M1 for the effect on the treated, and M0 for the effect on the untreated.2. The data set of Tomz et al. (2007) is used; the set of conditioning covariates x used in matching are the same asthose listed in Section 3.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
43
Table 8: Tomz et al. (2007) data set – matching within yearpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both participating in GATT/WTO’ treatment effecton the treated (M1 = 152, 986):100% 1.427 0.000 [1.408, 1.446] 1.327 0.000 [1.310, 1.345] 2.444 R+ 2.439 R+
80% 1.260 0.000 [1.240, 1.281] 1.184 0.000 [1.165, 1.202] 2.289 R+ 2.284 R+
60% 1.089 0.000 [1.066, 1.113] 1.027 0.000 [1.006, 1.048] 2.063 R+ 2.058 R+
40% 0.762 0.000 [0.736, 0.789] 0.731 0.000 [0.707, 0.755] 1.712 R+ 1.706 R+
on the untreated (M0 = 9, 703):100% 0.396 0.000 [0.339, 0.452] 0.390 0.000 [0.338, 0.443] 1.352 R+ 1.341 R+
80% 0.343 0.000 [0.281, 0.406] 0.334 0.000 [0.276, 0.391] 1.290 R+ 1.280 R+
60% 0.361 0.000 [0.288, 0.434] 0.340 0.000 [0.274, 0.407] 1.288 R+ 1.275 R+
40% 0.404 0.000 [0.312, 0.496] 0.381 0.000 [0.299, 0.464] 1.316 R+ 1.301 R+
on all (M1 + M0 = 162, 689):100% 1.365 0.000 [1.347, 1.384] 1.265 0.000 [1.249, 1.282] 2.368 R+ 2.363 R+
80% 1.184 0.000 [1.165, 1.204] 1.107 0.000 [1.089, 1.125] 2.194 R+ 2.189 R+
60% 1.002 0.000 [0.980, 1.024] 0.938 0.000 [0.918, 0.957] 1.961 R+ 1.956 R+
40% 0.666 0.000 [0.641, 0.691] 0.640 0.000 [0.617, 0.663] 1.623 R+ 1.618 R+
‘One participating in GATT/WTO’ treatment effecton the treated (M1 = 71, 908):100% 0.822 0.000 [0.797, 0.847] 0.775 0.000 [0.753, 0.797] 1.787 R+ 1.782 R+
80% 0.631 0.000 [0.603, 0.658] 0.599 0.000 [0.575, 0.622] 1.585 R+ 1.580 R+
60% 0.444 0.000 [0.414, 0.473] 0.443 0.000 [0.418, 0.469] 1.428 R+ 1.423 R+
40% 0.304 0.000 [0.270, 0.338] 0.319 0.000 [0.289, 0.349] 1.300 R+ 1.295 R+
on the untreated (M0 = 9, 703):100% -0.040 0.078 [-0.096, 0.015] 0.014 0.278 [-0.031, 0.061] 1.025 R− 1.033 R−80% 0.006 0.419 [-0.054, 0.066] 0.026 0.157 [-0.023, 0.077] 1.017 R− 1.025 R−60% 0.028 0.215 [-0.041, 0.097] 0.055 0.037 [-0.004, 0.114] 1.004 R+ 1.005 R−40% 0.135 0.001 [0.049, 0.220] 0.171 0.000 [0.096, 0.246] 1.110 R+ 1.097 R+
on all (M1 + M0 = 81, 611):100% 0.720 0.000 [0.696, 0.743] 0.675 0.000 [0.654, 0.695] 1.676 R+ 1.672 R+
80% 0.523 0.000 [0.499, 0.548] 0.501 0.000 [0.480, 0.523] 1.488 R+ 1.484 R+
60% 0.349 0.000 [0.322, 0.375] 0.350 0.000 [0.328, 0.374] 1.339 R+ 1.335 R+
40% 0.206 0.000 [0.175, 0.236] 0.218 0.000 [0.191, 0.245] 1.197 R+ 1.192 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.823 0.000 [0.804, 0.843] 0.767 0.000 [0.749, 0.786] 2.249 R+ 2.241 R+
80% 0.726 0.000 [0.705, 0.747] 0.665 0.000 [0.646, 0.685] 2.072 R+ 2.065 R+
60% 0.667 0.000 [0.643, 0.691] 0.601 0.000 [0.579, 0.624] 1.952 R+ 1.944 R+
40% 0.621 0.000 [0.591, 0.651] 0.536 0.000 [0.508, 0.564] 1.791 R+ 1.782 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment status andfrom the same year; observations without a match are discarded. The number of matched pairs obtained is indicatedby M1 for the effect on the treated, and M0 for the effect on the untreated.2. The data set of Tomz et al. (2007) is used; the set of conditioning covariates x used in matching are the same asthose listed in Section 3, with the year dummies dropped from the list.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
44
Table 9: Tomz et al. (2007) data set – matching within periodpermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both participating in GATT/WTO’ treatment effecton the treated (M1 = 152, 986):100% 1.426 0.000 [1.407, 1.445] 1.326 0.000 [1.309, 1.344] 2.443 R+ 2.438 R+
80% 1.260 0.000 [1.240, 1.281] 1.184 0.000 [1.165, 1.202] 2.289 R+ 2.284 R+
60% 1.089 0.000 [1.066, 1.113] 1.027 0.000 [1.006, 1.048] 2.063 R+ 2.058 R+
40% 0.762 0.000 [0.736, 0.789] 0.731 0.000 [0.707, 0.755] 1.712 R+ 1.706 R+
on the untreated (M0 = 9, 703):100% 0.396 0.000 [0.339, 0.452] 0.390 0.000 [0.338, 0.443] 1.352 R+ 1.341 R+
80% 0.343 0.000 [0.281, 0.406] 0.334 0.000 [0.276, 0.391] 1.290 R+ 1.280 R+
60% 0.361 0.000 [0.288, 0.434] 0.340 0.000 [0.274, 0.407] 1.288 R+ 1.275 R+
40% 0.404 0.000 [0.312, 0.496] 0.381 0.000 [0.299, 0.464] 1.316 R+ 1.301 R+
on all (M1 + M0 = 162, 689):100% 1.365 0.000 [1.346, 1.383] 1.264 0.000 [1.248, 1.281] 2.367 R+ 2.362 R+
80% 1.184 0.000 [1.165, 1.204] 1.107 0.000 [1.089, 1.125] 2.194 R+ 2.189 R+
60% 1.002 0.000 [0.980, 1.024] 0.938 0.000 [0.918, 0.957] 1.961 R+ 1.956 R+
40% 0.666 0.000 [0.641, 0.691] 0.640 0.000 [0.617, 0.663] 1.623 R+ 1.618 R+
‘One participating in GATT/WTO’ treatment effecton the treated (M1 = 71, 908):100% 0.820 0.000 [0.795, 0.845] 0.772 0.000 [0.750, 0.794] 1.783 R+ 1.778 R+
80% 0.631 0.000 [0.603, 0.658] 0.599 0.000 [0.575, 0.622] 1.585 R+ 1.580 R+
60% 0.444 0.000 [0.414, 0.473] 0.443 0.000 [0.418, 0.469] 1.428 R+ 1.423 R+
40% 0.304 0.000 [0.270, 0.338] 0.319 0.000 [0.289, 0.349] 1.300 R+ 1.295 R+
on the untreated (M0 = 9, 703):100% -0.040 0.078 [-0.096, 0.015] 0.014 0.278 [-0.031, 0.061] 1.025 R− 1.033 R−80% 0.006 0.419 [-0.054, 0.066] 0.026 0.157 [-0.023, 0.077] 1.017 R− 1.025 R−60% 0.028 0.215 [-0.041, 0.097] 0.055 0.037 [-0.004, 0.114] 1.004 R+ 1.005 R−40% 0.135 0.001 [0.049, 0.220] 0.171 0.000 [0.096, 0.246] 1.110 R+ 1.097 R+
on all (M1 + M0 = 81, 611):100% 0.718 0.000 [0.694, 0.741] 0.672 0.000 [0.652, 0.693] 1.673 R+ 1.669 R+
80% 0.523 0.000 [0.499, 0.548] 0.501 0.000 [0.480, 0.523] 1.488 R+ 1.484 R+
60% 0.349 0.000 [0.322, 0.375] 0.350 0.000 [0.328, 0.374] 1.339 R+ 1.335 R+
40% 0.206 0.000 [0.175, 0.236] 0.218 0.000 [0.191, 0.245] 1.197 R+ 1.192 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.824 0.000 [0.805, 0.844] 0.768 0.000 [0.750, 0.787] 2.250 R+ 2.243 R+
80% 0.726 0.000 [0.705, 0.747] 0.665 0.000 [0.646, 0.685] 2.072 R+ 2.065 R+
60% 0.667 0.000 [0.643, 0.691] 0.601 0.000 [0.579, 0.624] 1.952 R+ 1.944 R+
40% 0.621 0.000 [0.591, 0.651] 0.536 0.000 [0.508, 0.564] 1.791 R+ 1.782 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment statusand from the same period, where the periods are: 1948 (Before Annecy round), 1949-1951 (Annecy to Torquay round),1952-1956 (Torquay to Geneva round), 1957-1961 (Geneva to Dillon round), 1962-1967 (Dillon to Kennedy round), 1968-1979 (Kennedy to Tokyo round), 1980-1994 (Tokyo to Uruguay round), and 1995-(After Uruguay round). Observationswithout a match are discarded. The number of matched pairs obtained is indicated by M1 for the effect on the treated,and M0 for the effect on the untreated.2. The data set of Tomz et al. (2007) is used; the set of conditioning covariates x used in matching are the same asthose listed in Section 3.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
45
Table 10: Tomz et al. (2007) data set – matching within relative development stagepermutation test signed-rank test sensitivity analysis
one-sided test two-sided testcaliper effect p-value 95% CI effect p-value 95% CI Γ∗ as in Γ∗ as in
‘Both participating in GATT/WTO’ treatment effecton the treated (M1 = 152, 986):100% 1.065 0.000 [1.048, 1.083] 1.085 0.000 [1.068, 1.101] 2.103 R+ 2.099 R+
80% 0.710 0.000 [0.690, 0.729] 0.704 0.000 [0.686, 0.722] 1.629 R+ 1.626 R+
60% 0.515 0.000 [0.491, 0.539] 0.486 0.000 [0.465, 0.507] 1.385 R+ 1.382 R+
40% 0.461 0.000 [0.432, 0.491] 0.425 0.000 [0.399, 0.451] 1.328 R+ 1.324 R+
on the untreated (M0 = 9, 703):100% 0.164 0.000 [0.108, 0.221] 0.152 0.000 [0.100, 0.203] 1.101 R+ 1.092 R+
80% 0.169 0.000 [0.107, 0.231] 0.155 0.000 [0.099, 0.212] 1.102 R+ 1.093 R+
60% 0.168 0.000 [0.095, 0.241] 0.149 0.000 [0.085, 0.214] 1.090 R+ 1.079 R+
40% 0.200 0.000 [0.110, 0.289] 0.162 0.000 [0.081, 0.243] 1.088 R+ 1.075 R+
on all (M1 + M0 = 162, 689):100% 1.012 0.000 [0.995, 1.028] 1.024 0.000 [1.008, 1.040] 2.032 R+ 2.028 R+
80% 0.656 0.000 [0.637, 0.675] 0.646 0.000 [0.628, 0.663] 1.574 R+ 1.571 R+
60% 0.467 0.000 [0.444, 0.490] 0.434 0.000 [0.414, 0.454] 1.344 R+ 1.341 R+
40% 0.402 0.000 [0.374, 0.430] 0.365 0.000 [0.342, 0.390] 1.287 R+ 1.284 R+
‘One participating in GATT/WTO’ treatment effecton the treated (M1 = 71, 908):100% 0.464 0.000 [0.441, 0.488] 0.494 0.000 [0.473, 0.516] 1.461 R+ 1.457 R+
80% 0.278 0.000 [0.251, 0.306] 0.300 0.000 [0.276, 0.324] 1.248 R+ 1.244 R+
60% 0.290 0.000 [0.259, 0.320] 0.292 0.000 [0.265, 0.318] 1.246 R+ 1.241 R+
40% 0.172 0.000 [0.137, 0.207] 0.194 0.000 [0.163, 0.224] 1.159 R+ 1.154 R+
on the untreated (M0 = 9, 703):100% 0.089 0.000 [0.037, 0.142] 0.089 0.000 [0.043, 0.135] 1.052 R+ 1.044 R+
80% 0.099 0.000 [0.042, 0.157] 0.081 0.001 [0.030, 0.131] 1.040 R+ 1.032 R+
60% 0.028 0.205 [-0.039, 0.096] 0.020 0.250 [-0.037, 0.080] 1.030 R− 1.040 R−40% 0.103 0.008 [0.019, 0.187] 0.085 0.013 [0.010, 0.159] 1.022 R+ 1.010 R+
on all (M1 + M0 = 81, 611):100% 0.420 0.000 [0.398, 0.441] 0.443 0.000 [0.423, 0.462] 1.416 R+ 1.412 R+
80% 0.242 0.000 [0.217, 0.267] 0.258 0.000 [0.236, 0.280] 1.216 R+ 1.213 R+
60% 0.238 0.000 [0.211, 0.266] 0.246 0.000 [0.222, 0.269] 1.214 R+ 1.209 R+
40% 0.172 0.000 [0.141, 0.203] 0.179 0.000 [0.153, 0.206] 1.157 R+ 1.152 R+
GSP treatment effecton the treated (M1 = 54, 285):100% 0.688 0.000 [0.668, 0.707] 0.650 0.000 [0.631, 0.668] 1.966 R+ 1.959 R+
80% 0.569 0.000 [0.549, 0.590] 0.529 0.000 [0.510, 0.549] 1.793 R+ 1.786 R+
60% 0.489 0.000 [0.466, 0.511] 0.458 0.000 [0.437, 0.480] 1.686 R+ 1.679 R+
40% 0.401 0.000 [0.373, 0.428] 0.366 0.000 [0.341, 0.392] 1.517 R+ 1.510 R+
Note:1. The pool of potential matches for an observation are restricted to observations with the opposite treatment status andfrom the same relative development stage, where the combinations of relative development stages are: low-income/low-income dyads, low-income/middle-income dyads, low-income/high-income dyads, middle-income/middle-income dyads,middle-income/high-income dyads, and high-income/high-income dyads. Observations without a match are discarded.The number of matched pairs obtained is indicated by M1 for the effect on the treated, and M0 for the effect on theuntreated.2. The data set of Tomz et al. (2007) is used; the set of conditioning covariates x used in matching are the same asthose listed in Section 3.3. The caliper is set such that only 100%, 80%, 60%, or 40% of matched pairs are qualified for the estimation of thetreatment effect. For example, in the case of 60%, matched pairs with a distance (in terms of x) exceeding the upper60 percentile of all matched pairs are discarded.4. In the column ‘permutation test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Dstatistic; the p-value is obtained for the observed D statistic using the permutation test; the CI is obtained by invertingthe permutation test procedure.5. In the column ‘signed-rank test,’ the ‘effect’ sub-column reports the treatment effect estimate based on the Hodgesand Lehmann (1963) estimator; the p-value is obtained for the observed R statistic using the signed-rank test; the CIis obtained by inverting the signed-rank test procedure.6. In the column ‘sensitivity analysis,’ the sensitivity analysis is conducted for the above signed-rank test based on asignificance level of α = 0.05 in a one-sided or two-sided test. R+ or R− (as a function of the odds ratio Γ of takingthe treatment) indicates the relevant distribution in calculating the critical bound Γ∗ at which the conclusion of thesigned-rank test reverses.
46
Top Related