Second-Order Induction and the Importance of Precedents
Rossella Argenziano and Itzhak Gilboa
June 15, 2017
Argenziano & Gilboa Second-Order Induction June 15, 2017 1 / 25
Highlights
How are probabilistic beliefs formed?
Where there aren’t many “identical” cases
Revolutions, elections, economic policy
Case-based beliefs
Similarity-weighted relative frequency
As in kernel estimation of probabilities
Main point: the similarity function is also learnt from the data
“Second-order induction”
The “empirical similarity”
Features of the Model
x^1, ..., x^m predicting y, all binary
Estimating the probability that y_p = 1 by
y_p^s = (∑_{i≤n} s(x_i, x_p) y_i) / (∑_{i≤n} s(x_i, x_p))
The similarity is binary, and defines a partition
The question is thus which set of variables to use
Highlight of Main Results
A larger set of predictors need not provide a better fit
Even in-sample, due to the “curse of dimensionality”
We provide conditions under which it will
Smaller sets are also preferred due to overfitting
With many predictors, different beliefs (due to different subsets of predictors) are to be expected
Finding the best set is NP-complete (NPC)
Hence even if there is a unique best set it may not be found
In a toy model, it is easier to establish reputation than to re-establish it
Motivating Example I: President Obama
A precedent that changes the probability of a non-white president above and beyond its weight in the sample
In our conceptualization, because it changes the weight of the variable “race” in the similarity function
This could be relevant also to other races (in a non-binary model).
Motivating Example II: Change of Currency
The French Franc dropped two zeroes in 1960
10 Israeli Lira became 1 Israeli Shekel in 1980
1000 Israeli Shekels became 1 New Israeli Shekel in 1985
These changes used perceptual similarity
As such, they don’t seem to be about the empirical similarity
But, at least in the Israeli example, when the perceptual change wasn’t accompanied by a change of policy, it didn’t work
People seem to have been smart enough to “compute” the empirical similarity.
Model
M ≡ {1, ..., m} set of predictors
x ≡ (x^1, ..., x^m) ∈ X ≡ {0, 1}^m; the predicted variable, y ∈ {0, 1}
The prediction problem is a pair (B, x_p) where B = {(x_i, y_i)}_{i≤n} are observations (or “cases”), x_i = (x_i^1, ..., x_i^m) ∈ X, y_i ∈ {0, 1}, and x_p ∈ X is a new data point
Given a function s : X × X → {0, 1}, the probability that y_p = 1 is estimated by
y_p^s = (∑_{i≤n} s(x_i, x_p) y_i) / (∑_{i≤n} s(x_i, x_p))
if ∑_{i≤n} s(x_i, x_p) > 0, and y_p^s = 0.5 otherwise.
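The estimator above can be sketched in a few lines (our illustration, not the authors’ code); the only subtlety is the default of 0.5 when no past case bears positive similarity to x_p:

```python
# A minimal sketch of the similarity-weighted estimate of P(y_p = 1):
# the s-weighted relative frequency of y = 1 among past cases, defaulting
# to 0.5 when the total similarity weight is zero.

def estimate(cases, x_p, s):
    """cases: list of (x_i, y_i) pairs; s: similarity on X x X."""
    total = sum(s(x, x_p) for x, _ in cases)
    if total == 0:
        return 0.5  # empty "bin": uninformative default
    return sum(s(x, x_p) * y for x, y in cases) / total

# Binary "identical x-vector" similarity, for illustration:
same = lambda x, xp: 1 if x == xp else 0
cases = [((0, 0), 0), ((0, 0), 1), ((0, 1), 1)]
print(estimate(cases, (0, 0), same))  # 0.5: one y=0 and one y=1 in the bin
print(estimate(cases, (0, 1), same))  # 1.0
print(estimate(cases, (1, 1), same))  # 0.5: empty bin, default
```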
The Similarity Function
Given weights (w^1, ..., w^m) ∈ X (≡ {0, 1}^m), let
s_w(x_i, x_p) = ∏_{j : w^j = 1} 1{x_i^j = x_p^j}
This is limited in several ways:
Similarity is yes/no
And assumed transitive
But it suffices to convey some points.
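The weighted indicator similarity can be written as follows (our sketch; the function name is ours): two cases are similar iff they agree on every coordinate j with w^j = 1.

```python
# Sketch of the binary weighted similarity s_w: the product of indicators
# 1{x^j = x'^j} over the coordinates j with w^j = 1.

def make_sw(w):
    def s(x, xp):
        return int(all(xj == xpj for wj, xj, xpj in zip(w, x, xp) if wj == 1))
    return s

s = make_sw((1, 0))       # only the first coordinate is "used"
print(s((0, 1), (0, 0)))  # 1: they agree where it matters
print(s((0, 1), (1, 1)))  # 0: disagreement on a used coordinate
# Since s is 0/1-valued, reflexive, symmetric and transitive, it partitions
# X into "bins" indexed by the used coordinates.
```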
The Empirical Similarity
In-sample estimation
y_i^s = (∑_{k≠i} s(x_k, x_i) y_k) / (∑_{k≠i} s(x_k, x_i))
if ∑_{k≠i} s(x_k, x_i) > 0, and y_i^s = 0.5 otherwise
Define the sum of squared errors to be
SSE(s) = ∑_{i=1}^{n} (y_i^s − y_i)^2
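The in-sample (leave-one-out) estimates and the resulting sum of squared errors can be sketched directly from these formulas (our code, for an arbitrary 0/1 similarity):

```python
# Leave-one-out estimates y_i^s and their sum of squared errors SSE(s).

def loo_estimates(cases, s):
    ests = []
    for i, (xi, yi) in enumerate(cases):
        den = sum(s(xk, xi) for k, (xk, _) in enumerate(cases) if k != i)
        num = sum(s(xk, xi) * yk for k, (xk, yk) in enumerate(cases) if k != i)
        ests.append(num / den if den > 0 else 0.5)  # 0.5 for an empty bin
    return ests

def sse(cases, s):
    return sum((e - y) ** 2 for e, (_, y) in zip(loo_estimates(cases, s), cases))

identical = lambda x, xp: 1 if x == xp else 0
cases = [((0,), 0), ((0,), 1), ((1,), 0), ((1,), 1)]
print(sse(cases, identical))  # 4.0: each case's lone neighbour has the opposite y
```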
The Empirical Similarity (cont.)
It will also be convenient to consider the mean (squared) error, that is,
MSE (s) = SSE (s) /n
When s is defined by the variables in J ⊂ M (w^j = 1{j∈J}), we simply refer to the above as MSE(J)
And, in order to deal with overfitting, define also
AMSE (J, c) ≡ MSE (J) + c |J |
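The trade-off AMSE encodes is easy to see numerically (our illustration; the MSE values below are made up): a larger set may fit better in-sample yet lose once each extra variable costs c.

```python
# AMSE(J, c) = MSE(J) + c|J|: penalize fit by the number of variables used.

def amse(mse_J, size_J, c):
    return mse_J + c * size_J

print(amse(0.20, 1, 0.05))  # ~0.25 for a one-variable set
print(amse(0.17, 3, 0.05))  # ~0.32: better raw fit, worse penalized score
```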
Non-Monotonicity
Example
i  x_i^1  y_i
1  0      0
2  0      1
3  1      0
4  1      1
It can be seen that the MSEs of the subsets of variables are given by
J    MSE(J)
∅    4/9
{1}  1
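This table can be verified by direct leave-one-out computation over the bins that a subset J of coordinates defines (our check, using exact fractions):

```python
# Verify MSE(∅) = 4/9 and MSE({1}) = 1 for the four-case example.
from fractions import Fraction

def mse(cases, J):
    n, total = len(cases), Fraction(0)
    for i, (xi, yi) in enumerate(cases):
        bin_ys = [yk for k, (xk, yk) in enumerate(cases)
                  if k != i and all(xk[j] == xi[j] for j in J)]
        est = Fraction(sum(bin_ys), len(bin_ys)) if bin_ys else Fraction(1, 2)
        total += (est - yi) ** 2
    return total / n

cases = [((0,), 0), ((0,), 1), ((1,), 0), ((1,), 1)]
print(mse(cases, []))   # 4/9
print(mse(cases, [0]))  # 1
```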
Replication
The above hinges on the “bins” defined by J = {1} being very small
And suggests
Definition
Given two databases B = {(x_i, y_i)}_{i≤n} and B′ = {(x′_k, y′_k)}_{k≤tn} (for t ≥ 1), we say that B′ is a t-replica of B if, for every k ≤ tn, (x′_k, y′_k) = (x_i, y_i) where i = k (mod n).
Yet, for a database B′ which is a t-replica of the above,
MSE(∅) = (2t / (4t − 1))^2 < (t / (2t − 1))^2 = MSE({1}).
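The closed forms can be confirmed by brute force on replicated databases (our check): replication shrinks both errors, but since x^1 is uninformative here, the inequality persists for every t.

```python
# Check the replica formulas for t-replicas of the four-case database.
from fractions import Fraction

def mse(cases, J):  # leave-one-out MSE over the bins defined by J
    n, total = len(cases), Fraction(0)
    for i, (xi, yi) in enumerate(cases):
        bin_ys = [yk for k, (xk, yk) in enumerate(cases)
                  if k != i and all(xk[j] == xi[j] for j in J)]
        est = Fraction(sum(bin_ys), len(bin_ys)) if bin_ys else Fraction(1, 2)
        total += (est - yi) ** 2
    return total / n

base = [((0,), 0), ((0,), 1), ((1,), 0), ((1,), 1)]
for t in (1, 2, 5):
    B = base * t
    assert mse(B, []) == Fraction(2 * t, 4 * t - 1) ** 2
    assert mse(B, [0]) == Fraction(t, 2 * t - 1) ** 2
    assert mse(B, []) < mse(B, [0])  # the uninformative x^1 never helps
print("closed forms confirmed for t = 1, 2, 5")
```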
Informativeness
In the example, x^1 added no information regarding y
Formally,
Definition
A variable j ∈ M is informative relative to a subset J ⊂ M \ {j} in database B = {(x_i, y_i)}_{i≤n} if there exists z ∈ {0, 1}^J such that |b(J·j, z·0)|, |b(J·j, z·1)| > 0 and
ȳ(J·j, z·0) ≠ ȳ(J·j, z·1)
where the above are the average y’s in the corresponding sub-databases (“bins”)
Monotonicity
Theorem
Assume that j is informative relative to J ⊂ M \ {j} in the database B = {(x_i, y_i)}_{i≤n}. Then there exists a T ≥ 1 such that, for all t ≥ T, for a t-replica of B, MSE(J ∪ {j}) < MSE(J). Conversely, if j is not informative relative to J, then for any t-replica of B, MSE(J ∪ {j}) ≥ MSE(J), with a strict inequality unless j is a function of J.
Non-Uniqueness
Example
t replications of
i     x_i^1  x_i^2  y_i
1     1      0      0
2     1      0      1
3     0      1      0
4     0      1      1
5–8   0      0      0
9–12  1      1      1
For t = 1,
J       MSE(J)
∅       0.297
{1}     0.2
{2}     0.2
{1, 2}  0.333
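These four values can be reproduced by leave-one-out computation over the twelve cases (our check); note that the two disjoint singletons tie, which is the non-uniqueness point.

```python
# Verify the t = 1 table: MSE for J = ∅, {1}, {2}, {1,2}.
from fractions import Fraction

def mse(cases, J):
    n, total = len(cases), Fraction(0)
    for i, (xi, yi) in enumerate(cases):
        bin_ys = [yk for k, (xk, yk) in enumerate(cases)
                  if k != i and all(xk[j] == xi[j] for j in J)]
        est = Fraction(sum(bin_ys), len(bin_ys)) if bin_ys else Fraction(1, 2)
        total += (est - yi) ** 2
    return total / n

cases = ([((1, 0), 0), ((1, 0), 1), ((0, 1), 0), ((0, 1), 1)]
         + [((0, 0), 0)] * 4 + [((1, 1), 1)] * 4)
for J in ([], [0], [1], [0, 1]):
    print(J, float(mse(cases, J)))
# 0.2975..., 0.2, 0.2, 0.3333...: the two disjoint singletons tie at 0.2
```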
Differences of Opinions
Let n be fixed and let m grow, with
P(x_i^j = 1 | x_k^l, l < j or (l = j, k < i)) ∈ (ε, 1 − ε)
for a fixed ε ∈ (0, 0.5).
Proposition
For every n ≥ 4 and every ε ∈ (0, 0.5), if there are at least two cases with y_i = 1 and at least two with y_i = 0, then, as m → ∞, the probability that there exist J, J′ with J ∩ J′ = ∅ and MSE(J) = MSE(J′) = 0 tends to 1.
Fixed m, growing n (science) vs. the other way around (art?)
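A seeded simulation (ours) illustrates the proposition’s flavour: with n = 4 fixed and m large, single variables that copy y or its negation appear, and any two of them give disjoint J, J′ with MSE(J) = MSE(J′) = 0.

```python
# With n = 4 and m = 400 i.i.d. fair-coin predictors, a column matches y or
# its negation with probability 2/16 = 1/8, so ~50 "perfect" variables are
# expected; any two distinct ones yield disjoint zero-MSE singletons.
import random

random.seed(0)
n, m = 4, 400
y = [0, 0, 1, 1]
X = [[random.randint(0, 1) for _ in range(m)] for _ in range(n)]

perfect = [j for j in range(m)
           if all(X[i][j] == y[i] for i in range(n))
           or all(X[i][j] == 1 - y[i] for i in range(n))]
# Each such j puts the two y=0 cases in one bin and the two y=1 cases in
# the other, so all leave-one-out errors vanish.
print(len(perfect) >= 2)
```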
A Complexity Result
Define
Problem
EMPIRICAL-SIMILARITY: Given integers m, n ≥ 1, a database B = {(x_i, y_i)}_{i≤n}, and (rational) numbers c, R ≥ 0, is there a set J ⊂ M ≡ {1, ..., m} such that AMSE(J, c) ≤ R?
Thus, EMPIRICAL-SIMILARITY is the yes/no version of the optimization problem, “Find the empirical similarity for database B and constant c”. We can now state
Theorem
EMPIRICAL-SIMILARITY is NPC.
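NP-completeness rules out (assuming P ≠ NP) a polynomial-time algorithm, but small instances yield to brute force over all 2^m subsets; a sketch (ours, not the paper’s code):

```python
# Exhaustive search for the AMSE-minimizing subset J — exponential in m,
# consistent with the hardness result.
from fractions import Fraction
from itertools import combinations

def mse(cases, J):  # leave-one-out MSE over the bins defined by J
    n, total = len(cases), Fraction(0)
    for i, (xi, yi) in enumerate(cases):
        bin_ys = [yk for k, (xk, yk) in enumerate(cases)
                  if k != i and all(xk[j] == xi[j] for j in J)]
        est = Fraction(sum(bin_ys), len(bin_ys)) if bin_ys else Fraction(1, 2)
        total += (est - yi) ** 2
    return total / n

def best_subset(cases, c):
    m = len(cases[0][0])
    candidates = [list(J) for r in range(m + 1)
                  for J in combinations(range(m), r)]
    return min(candidates, key=lambda J: mse(cases, J) + c * len(J))

cases = [((0,), 0), ((0,), 1), ((1,), 0), ((1,), 1)]
print(best_subset(cases, Fraction(1, 100)))  # []: here the empty set wins
```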
Application: Reputation
There are 2N past cases in which x_i^j = 0 (a new player is now joining the scene)
Among these, N times y_i = 1 and N times y_i = 0
There are k + l new cases in which x_i^j = 1. In k ≥ 0 of them, y_i = 1, and in l ≥ 0 of them, y_i = 0
For N and l, let k(N, l) be the minimal k for which MSE(J ∪ {j}) < MSE(J).
Proposition
Let there be given N > 2 and l ≥ 0. Then:
(i) For every l ≥ 0 there exists k_0 such that for all k ≥ k_0, MSE(J ∪ {j}) < MSE(J); in particular, k(N, l) is finite (as k(N, l) ≤ k_0);
(ii) k(N, 0) = 2;
(iii) k(N, 1) = 5.
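A numeric check (ours; the bin structure below is our reading of the toy model): 2N old cases with x^j = 0 and balanced y, then k cases with y = 1 and l with y = 0 having x^j = 1. Since both subsets partition the same n cases, comparing MSE reduces to comparing leave-one-out SSE.

```python
# Compute k(N, l): the least k at which splitting on j beats pooling.
from fractions import Fraction

def bin_sse(ones, zeros):
    """Leave-one-out SSE of one bin with the given y-counts:
    ones*zeros*n/(n-1)^2 for n >= 2; 1/4 for a lone case (estimate 0.5)."""
    n = ones + zeros
    if n == 0:
        return Fraction(0)
    if n == 1:
        return Fraction(1, 4)
    return Fraction(ones * zeros * n, (n - 1) ** 2)

def k_of(N, l, k_max=100):
    for k in range(k_max):
        split = bin_sse(N, N) + bin_sse(k, l)   # old bin + new bin
        pooled = bin_sse(N + k, N + l)          # all cases in one bin
        if split < pooled:
            return k
    return None

print(k_of(10, 0), k_of(10, 1))  # 2 5, matching parts (ii) and (iii)
```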
A Continuous Model
(x^1, ..., x^m) ∈ X ⊆ R^m, y ∈ Y ⊆ R
y_p^s = (∑_{i≤n} s(x_i, x_p) y_i) / (∑_{i≤n} s(x_i, x_p))
Similarity
s(x, x′) = exp(−∑_{j=1}^{m} w^j (x^j − x′^j)^2)
w^j ∈ R₊ ∪ {∞}
w^j = ∞ means that s(x, x′) = 0 if x^j ≠ x′^j.
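The continuous similarity is a Gaussian kernel with per-coordinate weights; a sketch (ours), treating w^j = ∞ as forcing exact agreement on coordinate j:

```python
# Weighted Gaussian-kernel similarity for the continuous model.
import math

def make_sw(w):
    def s(x, xp):
        total = 0.0
        for wj, xj, xpj in zip(w, x, xp):
            if wj == math.inf:
                if xj != xpj:
                    return 0.0  # infinite weight: any mismatch zeroes s
            else:
                total += wj * (xj - xpj) ** 2
        return math.exp(-total)
    return s

s = make_sw((1.0, math.inf))
print(s((0.0, 1.0), (1.0, 1.0)))  # exp(-1) ≈ 0.3679
print(s((0.0, 1.0), (0.0, 0.0)))  # 0.0: mismatch on the inf-weighted coordinate
```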
A Continuous Model (cont.)
Define
AMSE(w, c) ≡ MSE(w) + c|J(w)|
where
J(w) = {j ≤ m | w^j > 0}
We assume a fixed cost for using a variable
Cost of obtaining the data
Cost of recalling it and using it in computations.
A Continuous Model (cont.)
Problem
CONTINUOUS-EMPIRICAL-SIMILARITY: Given integers m, n ≥ 1, a database of rational-valued observations B = {(x_i, y_i)}_{i≤n}, and (rational) numbers c, R ≥ 0, is there a vector of extended rational non-negative numbers w such that AMSE(w, c) ≤ R?
And we can state
Theorem
CONTINUOUS-EMPIRICAL-SIMILARITY is NPC.
Discussion: Equilibrium selection
Most of our examples involve equilibrium selection in a coordination game (revolutions, elections, inflation)
The empirical similarity can be viewed as a theory of focal points
It is compatible with agents being very naive or very sophisticated
As well as with a distribution of the agents across Level-K reasoning for various K’s.
Discussion: Bayesianism
More generally, this theory of belief formation is compatible with
A minimal view of Bayesianism (prior restricted to the possible values of y in a given period)
A maximal view of Bayesianism (where the prior is defined over the entire history).
Discussion: Associative Rules
Undoubtedly, people also think in terms of rules
In particular, many of our examples can be viewed as association rules
(“If it is a country in the Soviet Bloc, then it will not be allowed tobreak free”)
It is less obvious how one generates probabilities from association rules
Rules can contradict each other
They can all be silent.
Discussion: Regression
Much of our story can be told in the language of regression
One difference: regression can only improve, in-sample, by addingvariables
But the complexity and non-uniqueness results still hold
“Fact-Free Learning” (Aragones, Gilboa, Postlewaite, Schmeidler,2005)
We find similarity-weighted empirical frequencies somewhat moreplausible as a cognitive model.