Lossless or Quantized Boosting with Integer Arithmetic
— Supplementary Material —

Richard Nock, Data61, The Australian National University & The University of Sydney
Robert C. Williamson, The Australian National University & Data61
Abstract

This is the Supplementary Material to the paper "Lossless or Quantized Boosting with Integer Arithmetic", appearing in the proceedings of ICML 2019. The notation "main file" indicates a reference to the paper.
1 Table of contents

Supplementary material on proofs
- Proof of Theorem 5
  - Comments on properness vs the Q-loss
  - Detailed proof
- Proof of Lemma 6
- Proof of Theorem 7
- Proof of Theorem 8
- Proof of Theorem 10

Supplementary material on experiments
- Implementation
- Domain summary Table
- UCI fertility
- UCI haberman
- UCI transfusion
- UCI banknote
- UCI breastwisc
- UCI ionosphere
- UCI sonar
- UCI yeast
- UCI winered
- UCI cardiotocography
- UCI creditcardsmall
- UCI abalone
- UCI qsar
- UCI winewhite
- UCI page
- UCI mice
- UCI hill+noise
- UCI hill+nonoise
- UCI firmteacher
- UCI magic
- UCI eeg
- UCI skin
- UCI musk
- UCI hardware
- UCI twitter
- Summary of Results
2 Proof of Theorem 5
2.1 Comments on properness vs the Q-loss

We explain here why we have left the unit interval open in the definition of (2), and why the parameter ε in the definition of the partial losses of the Q-loss is important for its properness, even though the actual value of ε has absolutely no influence on RATBOOST nor on the decision-tree induction algorithm using L^Q. A large class of partial losses is defined in Buja et al. (2005, Theorem 1)¹, from which the following,

$$\ell_1(u) \doteq \int_u^{1-\varepsilon} (1-c)\, w(dc), \qquad (1)$$

$$\ell_{-1}(u) \doteq \int_\varepsilon^u c\, w(dc), \qquad (2)$$

defines the partial losses of a proper loss, where w is a positive measure required to be finite on any interval (ε, 1−ε), with² 0 < ε ≤ 1/2. The definition of proper losses in Reid & Williamson (2010, Theorem 6) implicitly assumes that the integrals are proper, so that the limits of (1), (2) exist for ε → 0.
In our case, it is not hard to reconstruct the partial losses of Definition 4 from (1), (2) provided we pick

$$w(dc) \doteq \frac{\varrho\cdot dc}{\mathrm{err}(c)^2}, \qquad (3)$$

which indeed meets the requirements of Buja et al. (2005, Theorem 1) (see (9) below). So, the Q-loss implicitly constrains the domain of the pointwise Bayes risk to be (ε, 1−ε) for it to fit (1), (2). While this brings the benefit of preventing infinite values for the pointwise Bayes risk ($\lim_0 L^Q(u) = \lim_1 L^Q(u) = -\infty$), it also does not represent a restriction for learning:
• this restricts in theory the image of H_T in RATBOOST to [ψ(ε), −ψ(ε)] using the canonical link, that is,

$$\mathrm{Im}\, H_T \subseteq \varrho\cdot\left(\frac{1}{\varepsilon} - 2\right)\cdot[-1,1], \qquad (4)$$

but all components of H_T have finite values in RATBOOST (including the images of weak hypotheses, wlog), so we can just consider that ε is implicitly fixed small enough for (4) to hold (again, learning H_T in RATBOOST does not depend on ε);
• this restricts in theory the proportion p of examples of class ±1 at each leaf of a decision tree to be in (ε, 1−ε) for the tree to be learned with L^Q, but this happens not to be restrictive, for three reasons. First, all classical top-down induction algorithms use losses whose Bayes risk vanishes at 0 and 1, so we can train those trees by discarding pure leaves in the computation of L (Section 7). Second, discarding pure leaves from the computation of the loss does not endanger the weak learning assumption. Third, in practice DTs are pruned for good generalization: classical statistical methods will in general end up with trees with pure leaves removed (Kearns & Mansour, 1998).
¹ And an even larger class is defined in Schervish (1989, Theorem 4.2).
² Buja et al. (2005, Theorem 1) is slightly more general, as the integral bounds depending on ε are replaced by variables in (ε, 1−ε).
2.2 Detailed proof

We use Shuford, Jr et al. (1966, Theorem 1) and Reid & Williamson (2010, Theorem 1) to show that the Q-loss is proper. For this to hold, we just need to show that $-u\,\ell_1^{Q\prime}(u) = (1-u)\,\ell_{-1}^{Q\prime}(u)$ for all $u\in(0,1)$, where ′ denotes the derivative. We then check that whenever u ≤ 1/2, we have $\ell_1^{Q\prime}(u) = \varrho\cdot(-1/u^2 + 1/u)$ and $\ell_{-1}^{Q\prime}(u) = \varrho\cdot(1/u)$, so that

$$-u\,\ell_1^{Q\prime}(u) = \varrho\cdot\left(\frac{1}{u} - 1\right) = \varrho\cdot\left(\frac{1-u}{u}\right); \qquad (1-u)\,\ell_{-1}^{Q\prime}(u) = \varrho\cdot\left(\frac{1-u}{u}\right), \qquad (5)$$
so the Q-loss is proper. To show that it is strictly proper is just a matter of completing two steps: (i) computing the pointwise Bayes risk L^Q, and (ii) computing its weight function w^Q(u) and showing that it is strictly positive for any u ∈ [0,1] (Reid & Williamson, 2010, Theorem 6). To achieve step (i), we remark that, because ℓ^Q is proper (Reid & Williamson, 2010),

$$\frac{1}{\varrho}\cdot L^Q(u) = \frac{1}{\varrho}\cdot L^Q(u,u) = \frac{1}{\varrho}\cdot\left(u\cdot\ell_1^Q(u) + (1-u)\cdot\ell_{-1}^Q(u)\right)$$

$$= \begin{cases} -u\log\varepsilon - 2u + 1 + u\log u - (1-u)\log\varepsilon + (1-u)\log u & \text{if } u\le 1/2\\ -u\log\varepsilon + u\log(1-u) - (1-u)\log\varepsilon - 2(1-u) + 1 + (1-u)\log(1-u) & \text{otherwise}\end{cases} \qquad (6)$$

$$= -\log\varepsilon + \begin{cases} -2u + 1 + \log u & \text{if } u\le 1/2\\ -2(1-u) + 1 + \log(1-u) & \text{otherwise}\end{cases} \qquad (7)$$

$$= -\log\varepsilon + \log\mathrm{err}(u) + 1 - 2\,\mathrm{err}(u) = \log\left(\frac{\mathrm{err}(u)}{\varepsilon}\right) + 1 - 2\,\mathrm{err}(u), \qquad (8)$$
and we retrieve (11). We then easily check that its weight function equals (Buja et al., 2005)

$$w^Q(u) \doteq -L^{Q\prime\prime}(u) = -\varrho\cdot\left(\begin{cases}\frac{1}{u} - 2 & \text{if } u\le 1/2\\ -\frac{1}{1-u} + 2 & \text{otherwise}\end{cases}\right)' = \varrho\cdot\begin{cases}\frac{1}{u^2} & \text{if } u\le 1/2\\ \frac{1}{(1-u)^2} & \text{otherwise}\end{cases} = \frac{\varrho}{\mathrm{err}(u)^2}, \qquad (9)$$
which is indeed > 0 for any u ∈ [0,1], and shows that the Q-loss is strictly proper. We also remark that L^Q is twice differentiable. The computation of the inverse link is then, from (5) (we recall that K = 0),

$$\psi_Q^{-1}(z) \doteq \left(-L^{Q\prime}\right)^{-1}(z) = \left(\varrho\cdot\begin{cases} 2 - \frac{1}{u} & \text{if } u\le 1/2\\ -2 + \frac{1}{1-u} & \text{otherwise}\end{cases}\right)^{-1}(z) \qquad (10)$$

$$= \begin{cases}\dfrac{1}{2 - \frac{z}{\varrho}} & \text{if } z\le 0\\[2mm] \dfrac{1 + \frac{z}{\varrho}}{2 + \frac{z}{\varrho}} & \text{otherwise}\end{cases} \;=\; \frac{\varrho + \mathrm{H}(-z)}{2\varrho + |z|}, \qquad (11)$$

(here H(x) = max{0, −x}, the negative part, cf. (12) below),
as claimed (the link is immediate from (10)). The convex surrogate for the Q-loss is obtained from (7), and we first search for (−L)*:

$$(-L^Q)^\star(z) \doteq \sup_{z'\in\mathrm{dom}(L^Q)}\{zz' + L^Q(z')\} = \sup_{u\in[0,1]}\left\{zu + \varrho\cdot\left(\log\left(\frac{\mathrm{err}(u)}{\varepsilon}\right) + 1 - 2\,\mathrm{err}(u)\right)\right\}$$

$$= \varrho\cdot(1-\log\varepsilon) + \max\left\{\sup_{u\in[0,1/2]}\{(z-2\varrho)u + \varrho\log u\},\; -2\varrho + \sup_{u\in(1/2,1]}\{(z+2\varrho)u + \varrho\log(1-u)\}\right\}$$

$$= \varrho\cdot(1-\log\varepsilon) + \max\begin{cases}\varrho\log\varrho + \varrho\cdot\frac{z-2\varrho}{2\varrho-z} - \varrho\log(2\varrho-z) & \text{for } u = \frac{\varrho}{2\varrho-z}\in[0,1/2]\\[1mm] \varrho\log\varrho - 2\varrho + \frac{(z+\varrho)(z+2\varrho)}{z+2\varrho} - \varrho\log(2\varrho+z) & \text{for } u = \frac{z+\varrho}{z+2\varrho}\in(1/2,1]\end{cases}$$

$$= \varrho\log\varrho - \varrho\log\varepsilon + \max\begin{cases}-\varrho\log(2\varrho-z) & \text{for } u = \frac{\varrho}{2\varrho-z}\in[0,1/2]\\[1mm] z - \varrho\log(2\varrho+z) & \text{for } u = \frac{z+\varrho}{z+2\varrho}\in(1/2,1]\end{cases}$$

$$= -\varrho\log\left(\frac{\varepsilon}{\varrho}\right) + \max\begin{cases}-\varrho\log(2\varrho-z) & \text{for } z\le 0\\ z - \varrho\log(2\varrho+z) & \text{for } z>0\end{cases} = -\varrho\log\left(\frac{\varepsilon}{\varrho}\right) + \begin{cases}-\varrho\log(2\varrho-z) & \text{for } z\le 0\\ z - \varrho\log(2\varrho+z) & \text{for } z>0\end{cases}$$

$$= -\varrho\cdot\log\left(2\varepsilon + \frac{\varepsilon|z|}{\varrho}\right) + \mathrm{H}(-z), \qquad (12)$$
and we get

$$F^Q(z) \doteq (-L^Q)^\star(-z) \qquad (13)$$

$$= -\varrho\cdot\log\left(2\varepsilon + \frac{\varepsilon|z|}{\varrho}\right) + \mathrm{H}(z), \qquad (14)$$
as claimed. This derivation also allows us to prove that the Q-loss is proper canonical, using Nock & Nielsen (2008, Lemma 1). That the Q-loss is symmetric is just a consequence of its definition (Reid & Williamson, 2010). This ends the proof of Theorem 5.
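The closed forms (11) and (14) are easy to sanity-check numerically. The sketch below is ours (not the paper's code), written in Java to match the implementation language of Section 7; it assumes H(x) = max{0, −x} and checks that the closed form of the inverse link agrees with the two branches of (10), and that ε shifts F^Q by an additive constant only, which is why its value cannot influence learning (Section 2.1).

```java
// Sketch (ours): the inverse canonical link (11) and convex surrogate (14)
// of the Q-loss, with H(x) = max(-x, 0) as used throughout the proofs.
public class QLossCheck {
    static double H(double x) { return Math.max(-x, 0.0); }

    // Closed form (11): psi^{-1}(z) = (rho + H(-z)) / (2 rho + |z|).
    static double psiInv(double z, double rho) {
        return (rho + H(-z)) / (2.0 * rho + Math.abs(z));
    }

    // Piecewise form (10): z <= 0 maps to u <= 1/2, z > 0 to u > 1/2.
    static double psiInvPiecewise(double z, double rho) {
        return (z <= 0) ? 1.0 / (2.0 - z / rho)
                        : (1.0 + z / rho) / (2.0 + z / rho);
    }

    // Convex surrogate (14): F^Q(z) = -rho*log(2*eps + eps*|z|/rho) + H(z).
    static double F(double z, double rho, double eps) {
        return -rho * Math.log(2.0 * eps + eps * Math.abs(z) / rho) + H(z);
    }

    public static void main(String[] args) {
        double rho = 2.0;
        double s0 = F(0.0, rho, 0.1) - F(0.0, rho, 0.01);
        for (double z = -5.0; z <= 5.0; z += 0.25) {
            if (Math.abs(psiInv(z, rho) - psiInvPiecewise(z, rho)) > 1e-12)
                throw new AssertionError("link mismatch at z = " + z);
            // eps only shifts F^Q by the constant -rho*(log eps1 - log eps2):
            if (Math.abs(F(z, rho, 0.1) - F(z, rho, 0.01) - s0) > 1e-9)
                throw new AssertionError("eps is not a constant shift");
        }
        System.out.println("psiInv(0) = " + psiInv(0.0, rho)); // prints 0.5
    }
}
```

An uninformative score z = 0 maps to weight 1/2, as expected of a canonical link.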
3 Proof of Lemma 6

Denote for short

$$v \doteq z + \varrho\cdot\left(\frac{1-2u}{\mathrm{err}(u)}\right). \qquad (15)$$

It is not hard to check that indeed

$$z\odot u = \frac{\varrho + \mathrm{H}(v)}{2\varrho + |v|} \doteq g(v), \qquad (16)$$
as well as g(−v) = 1 − g(v). So, we focus on the second equality. Denote for short u ≐ n_u/d_u and z ≐ ϱ·n_z/d_z. We remark that the definition of z makes ϱ cancel:

$$z\odot u = \frac{1 + \mathrm{H}\!\left(\frac{n_z}{d_z} + \frac{1 - 2\cdot\frac{n_u}{d_u}}{\frac{n_u}{d_u}\wedge\frac{d_u-n_u}{d_u}}\right)}{2 + \left|\frac{n_z}{d_z} + \frac{1 - 2\cdot\frac{n_u}{d_u}}{\frac{n_u}{d_u}\wedge\frac{d_u-n_u}{d_u}}\right|} = \frac{1 + \mathrm{H}\!\left(\frac{n_z}{d_z} + \frac{d_u-2n_u}{n_u\wedge(d_u-n_u)}\right)}{2 + \left|\frac{n_z}{d_z} + \frac{d_u-2n_u}{n_u\wedge(d_u-n_u)}\right|}. \qquad (17)$$
Case 1: v ≥ 0 and n_u ≤ d_u − n_u. We have

$$z\odot u = \frac{1}{2 + \frac{n_z}{d_z} + \frac{d_u-2n_u}{n_u}} = \frac{1}{\frac{n_z}{d_z} + \frac{d_u}{n_u}} = \frac{n_u d_z}{n_u n_z + d_u d_z}. \qquad (18)$$
Case 2: v ≥ 0 and n_u > d_u − n_u. We have

$$z\odot u = \frac{1}{2 + \frac{n_z}{d_z} + \frac{d_u-2n_u}{d_u-n_u}} = \frac{1}{3 + \frac{n_z}{d_z} - \frac{n_u}{d_u-n_u}} = \frac{(d_u-n_u)\,d_z}{(d_u-n_u)(3d_z + n_z) - n_u d_z} = \frac{(d_u-n_u)\,d_z}{(d_u-n_u)\,n_z + d_u d_z + 2(d_u-2n_u)\,d_z}. \qquad (19)$$
Folding cases 1 and 2 together, we get

$$z\odot u = \frac{(n_u\wedge(d_u-n_u))\,d_z}{(n_u\wedge(d_u-n_u))\,n_z + d_u d_z - 2\,\mathrm{H}(d_u-2n_u)\,d_z}. \qquad (20)$$

Note that this holds when v ≥ 0, which is equivalent to

$$\frac{n_z}{d_z} + \frac{d_u-2n_u}{n_u\wedge(d_u-n_u)} > 0, \qquad (21)$$
that is, assuming wlog d_z > 0,

$$(n_u\wedge(d_u-n_u))\,n_z > -(d_u-2n_u)\,d_z. \qquad (22)$$

So, let us denote a ≐ (n_u ∧ (d_u−n_u))·d_z, b ≐ (n_u ∧ (d_u−n_u))·n_z, c ≐ d_u·d_z, d ≐ 2(d_u−2n_u)·d_z. We get that if b + (d/2) ≥ 0, then

$$z\odot u = \frac{a}{b + c - \mathrm{H}(d)}, \qquad (23)$$

and if b + (d/2) < 0, then we remark that −b − (d/2) > 0, so

$$z\odot u = 1 - \frac{a}{-b + c - \mathrm{H}(-d)} = \frac{-b - a + c - \mathrm{H}(-d)}{-b + c - \mathrm{H}(-d)}, \qquad (24)$$

as claimed.
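The point of Lemma 6 is that the weight update can be carried out entirely in integer arithmetic: with u = n_u/d_u and z = ϱ·n_z/d_z given by integer numerators and denominators, (23)–(24) produce the new weight as an exact integer fraction. Below is a minimal sketch of this update (ours, not the paper's code; variable names are ours, the operator of (16) is rendered ⊙, H(x) = max{0, −x}, and there is no overflow guard, unlike RATBOOSTE in Section 7):

```java
// Sketch of the integer-only weight update of Lemma 6 (eqs. (23)-(24)).
// Inputs: u = nu/du with 0 < nu < du, z = rho * nz/dz with dz > 0 (wlog).
// Output: {num, den} such that the new weight z (.) u equals num/den exactly.
public class IntWeightUpdate {
    static long H(long x) { return Math.max(-x, 0); }   // negative part

    static long[] update(long nu, long du, long nz, long dz) {
        long min = Math.min(nu, du - nu);               // nu ^ (du - nu)
        long a = min * dz;
        long b = min * nz;
        long c = du * dz;
        long d = 2 * (du - 2 * nu) * dz;
        if (2 * b + d >= 0)                             // b + d/2 >= 0: eq. (23)
            return new long[] { a, b + c - H(d) };
        return new long[] { -b - a + c - H(-d), -b + c - H(-d) };  // eq. (24)
    }

    public static void main(String[] args) {
        // u = 1/4, z = rho * 1/2: the new weight is 2/9.
        long[] w = update(1, 4, 1, 2);
        System.out.println(w[0] + "/" + w[1]);          // prints 2/9
    }
}
```

The test `2*b + d >= 0` avoids any division, since d is even by construction; the whole update stays in the integers.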
4 Proof of Theorem 7

The proof revolves around two simple facts about F^Q: (i) since F^Q is convex and differentiable, we have F^Q(y) − F^Q(x) − (y−x)F^{Q′}(x) ≥ 0 (the left-hand side is just the Bregman divergence with generator F^Q); (ii) F^Q being twice differentiable, Taylor's theorem says that for any x, y we can expand the derivative as F^{Q′}(y) = F^{Q′}(x) + (y−x)F^{Q″}(z) for some z ∈ [x, y]. Using (i) and (ii) in this order, we get that for any i ∈ {1, 2, ..., m}, there exist α_i ∈ [0,1] and

$$\beta_i = y_iH_t(x_i) + \alpha_i\delta_t y_ih_t(x_i) \in [y_iH_t(x_i),\, y_iH_{t+1}(x_i)] \qquad (25)$$
such that:

$$\mathbb{E}_{i\sim D}\!\left[F^Q(y_iH_t(x_i))\right] - \mathbb{E}_{i\sim D}\!\left[F^Q(y_iH_{t+1}(x_i))\right] \ge \mathbb{E}_{i\sim D}\!\left[(y_iH_t(x_i) - y_iH_{t+1}(x_i))\,F^{Q\prime}(y_iH_{t+1}(x_i))\right] \qquad (26)$$

$$= \underbrace{\mathbb{E}_{i\sim D}\!\left[(y_iH_t(x_i) - y_iH_{t+1}(x_i))\,F^{Q\prime}(y_iH_t(x_i))\right]}_{\doteq X} - \underbrace{\mathbb{E}_{i\sim D}\!\left[(y_iH_t(x_i) - y_iH_{t+1}(x_i))^2\,F^{Q\prime\prime}(\beta_i)\right]}_{\doteq Y}. \qquad (27)$$
Because F^Q is convex, Y ≥ 0. We want to show not only that X ≥ 0, but in fact that the difference X − Y is sufficiently large for the bound of the Theorem to hold. We first remark that

$$X \doteq \mathbb{E}_{i\sim D}\!\left[(y_iH_t(x_i) - y_iH_{t+1}(x_i))\,F^{Q\prime}(y_iH_t(x_i))\right] = -\delta_t\,\mathbb{E}_{i\sim D}\!\left[y_ih_t(x_i)\cdot\left(-\psi_Q^{-1}(-y_iH_t(x_i))\right)\right] = \delta_t\,\mathbb{E}_{i\sim D}\!\left[w_{ti}\,y_ih_t(x_i)\right] = \delta_t\cdot\frac{\sum_i w_{ti}\,y_ih_t(x_i)}{m} = a\cdot\eta_t^2. \qquad (28)$$
We also have $F^{Q\prime\prime}(z) = \varrho/(2\varrho+|z|)^2$, so

$$Y \doteq \mathbb{E}_{i\sim D}\!\left[(y_iH_t(x_i)-y_iH_{t+1}(x_i))^2\,F^{Q\prime\prime}(\beta_i)\right] = \varrho\cdot\mathbb{E}_{i\sim D}\!\left[\frac{(y_iH_t(x_i)-y_iH_{t+1}(x_i))^2}{(2\varrho+|\beta_i|)^2}\right] = \varrho\,\delta_t^2\cdot\mathbb{E}_{i\sim D}\!\left[\frac{h_t^2(x_i)}{(2\varrho+|\beta_i|)^2}\right]. \qquad (29)$$
Now, because of assumption (M), we get:

$$\mathbb{E}_{i\sim D}\!\left[\frac{h_t^2(x_i)}{(2\varrho+|\beta_i|)^2}\right] \le \frac{1}{4\varrho^2}\cdot\mathbb{E}_{i\sim D}\!\left[h_t^2(x_i)\right] \le \frac{M^2}{4\varrho^2}. \qquad (30)$$
So,

$$Y \le \frac{\delta_t^2 M^2}{4\varrho} = \frac{a^2\eta_t^2 M^2}{4\varrho}. \qquad (31)$$
We finally get

$$\mathbb{E}_{i\sim D}\!\left[F^Q(y_iH_t(x_i))\right] - \mathbb{E}_{i\sim D}\!\left[F^Q(y_iH_{t+1}(x_i))\right] \ge X - Y \ge \underbrace{\left(1 - \frac{aM^2}{4\varrho}\right)\cdot a}_{\doteq Z(a)}\cdot\eta_t^2. \qquad (32)$$
Suppose now that we fix any π ∈ [0,1] and then choose any

$$a \in \frac{2\varrho}{M^2}\cdot[1-\pi,\, 1+\pi]. \qquad (33)$$

It is not hard to check that Z(a) then satisfies

$$Z(a) \ge (1-\pi^2)\cdot\frac{\varrho}{M^2}, \qquad (34)$$
so we get

$$\mathbb{E}_{i\sim D}\!\left[F^Q(y_iH_t(x_i))\right] - \mathbb{E}_{i\sim D}\!\left[F^Q(y_iH_{t+1}(x_i))\right] \ge \frac{(1-\pi^2)\,\varrho\,\eta_t^2}{M^2}, \quad\forall t, \qquad (35)$$
and so the final classifier H_T satisfies

$$\mathbb{E}_{i\sim D}\!\left[F^Q(y_iH_T(x_i))\right] \le F^Q(0) - \frac{(1-\pi^2)\,\varrho\cdot\sum_{t=1}^T \eta_t^2}{M^2}. \qquad (36)$$
Remark that this holds regardless of the sequence {η_t}_t. If we want to guarantee that $\mathbb{E}_{i\sim D}[F^Q(y_iH_T(x_i))] \le F^Q(z^*)$ for some z* ≥ 0, then it suffices to iterate until

$$\sum_{t=1}^T \eta_t^2 \ge \frac{F^Q(0) - F^Q(z^*)}{(1-\pi^2)\,\varrho}\cdot M^2, \qquad (37)$$

and we get the statement of the Theorem.
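The stopping rule (37) is easy to instantiate numerically. The sketch below is ours (not the paper's code) and assumes, purely for illustration, a uniform edge η_t = η at every iteration; note that ε cancels in F^Q(0) − F^Q(z*), so the iteration budget does not depend on it.

```java
// Sketch: the iteration budget implied by eq. (37) of Theorem 7, assuming
// (for illustration only) a uniform per-round edge eta_t = eta.
public class BoostBudget {
    // Convex surrogate (14), with H(x) = max(-x, 0).
    static double F(double z, double rho, double eps) {
        return -rho * Math.log(2.0 * eps + eps * Math.abs(z) / rho) + Math.max(-z, 0.0);
    }

    // Smallest T with T * eta^2 >= (F^Q(0) - F^Q(zstar)) * M^2 / ((1 - pi^2) * rho).
    static long iterations(double eta, double pi, double rho, double eps,
                           double M, double zstar) {
        double need = (F(0.0, rho, eps) - F(zstar, rho, eps)) * M * M
                      / ((1.0 - pi * pi) * rho);
        return (long) Math.ceil(need / (eta * eta));
    }

    public static void main(String[] args) {
        // eta = 0.1, pi = 0, rho = M = 1, target z* = 1: 41 iterations suffice.
        System.out.println(iterations(0.1, 0.0, 1.0, 0.1, 1.0, 1.0)); // prints 41
    }
}
```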
5 Proof of Theorem 8

The proof uses the same basic steps as the proof of Theorem 7. Denote for short

$$\tilde{w}_{ti} \doteq w_{ti} + \kappa_{ti}, \qquad (38)$$

where $w_{ti} \doteq \psi_Q^{-1}(-y_iH_t(x_i))$ are the non-quantized weights and κ_{ti} is the quantization shift in the weights. Note that we do not have access to w_{ti}. We indicate with a tilde the quantities that depend on w̃.
This time, we have for X the expression:

$$X = -\tilde{\delta}_t\,\mathbb{E}_{i\sim D}\!\left[y_ih_t(x_i)\cdot\left(-\psi_Q^{-1}(-y_iH_t(x_i))\right)\right] = \tilde{\delta}_t\,\mathbb{E}_{i\sim D}\!\left[w_{ti}\,y_ih_t(x_i)\right] = \tilde{\delta}_t\cdot\left(\frac{\sum_i \tilde{w}_{ti}\,y_ih_t(x_i)}{m} - \frac{\sum_i \kappa_{ti}\,y_ih_t(x_i)}{m}\right) = a\cdot\tilde{\eta}_t^2 - a\cdot\tilde{\eta}_t\cdot\frac{\sum_i \kappa_{ti}\,y_ih_t(x_i)}{m}, \qquad (39)$$
while the expression of Y does not change (now with the "tilde" parameters affected by the quantization of the weights). Denote for short

$$\Delta_t \doteq \frac{\sum_i \kappa_{ti}\,y_ih_t(x_i)}{m}. \qquad (40)$$
We get, in lieu of (32),

$$\mathbb{E}_{i\sim D}\!\left[F^Q(y_iH_t(x_i))\right] - \mathbb{E}_{i\sim D}\!\left[F^Q(y_iH_{t+1}(x_i))\right] \ge X - Y \ge \left(1 - \frac{\Delta_t}{\tilde{\eta}_t} - \frac{aM^2}{4\varrho}\right)\cdot a\,\tilde{\eta}_t^2 = \underbrace{\left(\frac{4\varrho}{M^2}\cdot\frac{\tilde{\eta}_t - \Delta_t}{\tilde{\eta}_t} - a\right)\cdot a}_{\doteq Z(a)}\cdot\frac{M^2\tilde{\eta}_t^2}{4\varrho}. \qquad (41)$$
Choose

$$a \in \frac{2\varrho}{M^2}\cdot\left[\frac{\tilde{\eta}_t - \Delta_t}{\tilde{\eta}_t} - \pi,\; \frac{\tilde{\eta}_t - \Delta_t}{\tilde{\eta}_t} + \pi\right], \qquad (42)$$
for any 0 ≤ π ≤ |η̃_t − Δ_t|/η̃_t. It follows that

$$Z(a)\cdot\frac{M^2\tilde{\eta}_t^2}{4\varrho} \ge \left(\left(\frac{\tilde{\eta}_t - \Delta_t}{\tilde{\eta}_t}\right)^2 - \pi^2\right)\cdot\frac{\varrho}{M^2}\cdot\tilde{\eta}_t^2. \qquad (43)$$
Suppose that the quantization shift satisfies |η̃_t − Δ_t| ≥ ζ·|η̃_t| (which holds if |Δ_t| ≤ (1−ζ)·|η̃_t|) for some ζ > 0. We obtain that for any 0 ≤ π < ζ,

$$Z(a)\cdot\frac{M^2\tilde{\eta}_t^2}{4\varrho} \ge \left(\zeta^2 - \pi^2\right)\cdot\frac{\varrho}{M^2}\cdot\tilde{\eta}_t^2 > 0, \qquad (44)$$
which leads to the statement of the Theorem after posing κ_t ≐ |Δ_t|.

Remark: assumption (Q) is in fact stronger than what would really be needed to obtain the Theorem. Under some conditions, we could indeed accept |Δ_t| > (1−ζ)·|η̃_t|; but in the derivations above, the shift in weights due to quantization would then act as a disguised way of strengthening weak learning. Clearly, such an assumption, where quantization compensates for the weakness of the weak classifiers, is unfit in a boosting setting.
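Assumption (Q) is checkable at run time: Δ_t in (40) is computable from the quantization shifts κ_ti, and progress is guaranteed whenever |Δ_t| ≤ (1−ζ)·|η̃_t|. A small sketch (ours, with toy arrays; not the paper's code):

```java
// Sketch: run-time check of assumption (Q) in Theorem 8.
// Delta_t = (1/m) * sum_i kappa_ti * y_i * h_t(x_i), eq. (40);
// progress holds whenever |Delta_t| <= (1 - zeta) * |eta_t~|.
public class QuantizationCheck {
    static double delta(double[] kappa, int[] y, double[] h) {
        double s = 0.0;
        for (int i = 0; i < kappa.length; i++) s += kappa[i] * y[i] * h[i];
        return s / kappa.length;
    }

    static boolean safe(double deltaT, double etaTilde, double zeta) {
        return Math.abs(deltaT) <= (1.0 - zeta) * Math.abs(etaTilde);
    }

    public static void main(String[] args) {
        double[] kappa = { 0.01, -0.02, 0.005, 0.0 };  // toy quantization shifts
        int[] y = { 1, -1, 1, -1 };
        double[] h = { 0.5, -0.25, 1.0, 0.75 };
        double d = delta(kappa, y, h);
        System.out.println("Delta_t = " + d + ", safe: " + safe(d, 0.2, 0.5));
    }
}
```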
6 Proof of Theorem 10

We assume basic knowledge of the proofs of Kearns & Mansour (1996). We shall briefly present the proof scheme as well as the notations, which we keep identical to Kearns & Mansour (1996) for readability.
The basis of the proof is to show that each time a leaf is replaced by a split under the weak learning assumption, there is a sufficient decrease of L(H). Denote by H⁺ the tree H in which a leaf λ has been replaced by a split indexed by some g : ℝ → {0,1} satisfying the weak learning assumption. The decrease in L(·), Δ ≐ L(H) − L(H⁺), is lowerbounded as a function of γ, and then used to lowerbound the number of iterations (each of which is the replacement of a leaf by a binary subtree) needed to reach a given value of L(·).
It turns out that Δ can be abstracted into a quantity that is easier to analyze, Δ ≐ ω(λ)·ΔL^Q(q, τ, δ), with

$$\Delta L^Q(q,\tau,\delta) \doteq L^Q(q) - (1-\tau)\,L^Q(q-\tau\delta) - \tau\,L^Q(q+(1-\tau)\delta), \qquad (45)$$

where q ≐ q(λ) and δ = γq(1−q)/(τ(1−τ)), with τ denoting the relative proportion of examples for which g = +1 in leaf λ, following Kearns & Mansour (1996). The following Lemma is the key to the proof of Theorem 10.
Lemma 1 Suppose the weak hypothesis assumption is satisfied for the current split, for some constant γ > 0. For any q, τ ∈ [0,1], using δ = γq(1−q)/(τ(1−τ)) yields:

$$\Delta L^Q(q,\tau,\delta) \ge \frac{\gamma^2}{2}. \qquad (46)$$
Proof Our proof follows the proof of Kearns & Mansour (1996).
Lemma 2 Suppose τ ≤ 1/2, q > 1/2 or τ ≥ 1/2, q < 1/2. If γ ≤ 1/25, then ΔL^Q(q, τ, δ) is minimized by some τ ∈ [0.4, 0.6].
Proof To prove the Lemma, we use the trick of Kearns & Mansour (1996, Lemma 4), which consists of studying the function

$$U(q,X) \doteq L^Q(q-X) + X\,L^{Q\prime}(q-X) = \begin{cases}\log(q-X) + \frac{X}{q-X} + 1 - 2q & \text{if } q-X \le \frac{1}{2}\\[1mm] \log(1-q+X) - \frac{X}{1-q+X} - 1 + 2q & \text{if } q-X > \frac{1}{2}\end{cases} \qquad (47)$$
and show that

$$U(q,\tau\delta) \le U(q,-(1-\tau)\delta), \quad\forall\tau\le 0.4, \qquad (48)$$
$$U(q,\tau\delta) \ge U(q,-(1-\tau)\delta), \quad\forall\tau\ge 0.6. \qquad (49)$$
Case 1: τ ≤ 0.4 (and therefore q < 1/2). We have two subcases to show (48).
Case 1.1: q + (1−τ)δ < 1/2. In this case, q − X < 1/2 for both instantiations of X in (48). We then have

$$U(q,\tau\delta) = \log\left(1 - \frac{\gamma(1-q)}{1-\tau}\right) + \frac{\frac{\gamma(1-q)}{1-\tau}}{1 - \frac{\gamma(1-q)}{1-\tau}} + 1 - 2q + \log q \qquad (50)$$

$$= \log\left(\frac{\tau - 1 + \gamma(1-q)}{\tau - 1}\right) - \frac{\gamma(1-q)}{\tau - 1 + \gamma(1-q)} + 1 - 2q + \log q, \qquad (51)$$

$$U(q,-(1-\tau)\delta) = \log\left(1 + \frac{\gamma(1-q)}{\tau}\right) - \frac{\frac{\gamma(1-q)}{\tau}}{1 + \frac{\gamma(1-q)}{\tau}} + 1 - 2q + \log q \qquad (52)$$

$$= \log\left(\frac{\tau + \gamma(1-q)}{\tau}\right) - \frac{\gamma(1-q)}{\tau + \gamma(1-q)} + 1 - 2q + \log q, \qquad (53)$$
so (48) is equivalent to showing

$$\log\left(\frac{\tau-1+\gamma(1-q)}{\tau-1}\right) - \frac{\gamma(1-q)}{\tau-1+\gamma(1-q)} \le \log\left(\frac{\tau+\gamma(1-q)}{\tau}\right) - \frac{\gamma(1-q)}{\tau+\gamma(1-q)}, \qquad (54)$$

which, after reorganising and simplification, amounts to showing

$$\log\left(1 - \frac{\gamma(1-q)}{(\tau+\gamma(1-q))(1-\tau)}\right) \le -\frac{\gamma(1-q)}{(\tau+\gamma(1-q))(1-\tau-\gamma(1-q))}. \qquad (55)$$
We remark that for the log in (51) to be defined, we must have τ < 1 − γ(1−q), which implies that the RHS of (55) is negative. To show (55), we use the fact that log(1−X) ≤ −X − X²/2 when X ≥ 0; so, fixing X ≐ γ(1−q)/((τ+γ(1−q))(1−τ)), we obtain

$$\log\left(1 - \frac{\gamma(1-q)}{(\tau+\gamma(1-q))(1-\tau)}\right) \le -\frac{\gamma(1-q)}{\tau+\gamma(1-q)}\cdot\left(\frac{1}{1-\tau} + \frac{\gamma(1-q)}{2(\tau+\gamma(1-q))(1-\tau)^2}\right). \qquad (56)$$
To show (55), it is then enough to show

$$\frac{1}{1-\tau-\gamma(1-q)} \le \frac{1}{1-\tau} + \frac{\gamma(1-q)}{2(\tau+\gamma(1-q))(1-\tau)^2}, \qquad (57)$$

which, after simplification, is equivalent to

$$\frac{1-\tau-\gamma(1-q)}{2(\tau+\gamma(1-q))(1-\tau)} \ge 1, \qquad (58)$$
or, equivalently, 3τ − 2τ² + 3γ(1−q) − 2τγ(1−q) ≤ 1. Since τ ≤ 2/5, we have 3τ − 2τ² ≤ 22/25. If we pick γ ≤ 1/25, then 3γ(1−q) − 2τγ(1−q) ≤ 3γ(1−q) ≤ 3γ = 3/25, so that 3τ − 2τ² + 3γ(1−q) − 2τγ(1−q) ≤ 1, as claimed (end of Case 1.1).
Case 1.2: q + (1−τ)δ > 1/2. In this case,

$$U(q,-(1-\tau)\delta) = \log\left(1 - \frac{\gamma q}{\tau}\right) + \frac{\frac{\gamma q}{\tau}}{1 - \frac{\gamma q}{\tau}} + 1 - 2(1-q) + \log(1-q) \qquad (59)$$

$$= \log\left(\frac{\tau-\gamma q}{\tau}\right) + \frac{\gamma q}{\tau-\gamma q} + 2q - 1 + \log(1-q). \qquad (60)$$
We also remark that 1 − 2q + log q ≤ 2q − 1 + log(1−q) for q < 1/2, so to prove (48) it is sufficient to show

$$\log\left(\frac{\tau-1+\gamma(1-q)}{\tau-1}\right) - \frac{\gamma(1-q)}{\tau-1+\gamma(1-q)} \le \log\left(\frac{\tau-\gamma q}{\tau}\right) + \frac{\gamma q}{\tau-\gamma q}, \qquad (61)$$
which reduces after simplification to showing that

$$\log\left(1 + \frac{\gamma(q-\tau)}{(\tau-\gamma q)(1-\tau)}\right) \le \frac{\gamma(q-\tau)}{(\tau-\gamma q)(1-\tau-\gamma(1-q))}. \qquad (62)$$
Because q + (1−τ)δ > 1/2, if τ ≥ 10γq(1−q), then q > 0.4 and therefore q > τ. If, on the other hand, τ ≤ 10γq(1−q), then γ ≤ 1/10 also implies τ ≤ q. To summarize, q + (1−τ)δ > 1/2 and γ ≤ 1/10 imply q ≥ τ.
Using the fact that log(1+X) ≤ X and γ(1−q) ≥ 0, we easily obtain the proof of (62) via the chain of inequalities

$$\log\left(1 + \frac{\gamma(q-\tau)}{(\tau-\gamma q)(1-\tau)}\right) \le \frac{\gamma(q-\tau)}{(\tau-\gamma q)(1-\tau)} \le \frac{\gamma(q-\tau)}{(\tau-\gamma q)(1-\tau-\gamma(1-q))}. \qquad (63)$$
This ends the proof of Case 1.
Case 2: τ ≥ 0.6 (and therefore q > 1/2). We again have two subcases, this time to show (49).
Case 2.1: q − τδ > 1/2. In this case, q − X > 1/2 for both instantiations of X in (49). We then have

$$U(q,\tau\delta) = \log\left(1 + \frac{\gamma q}{1-\tau}\right) - \frac{\gamma q}{1-\tau+\gamma q} - 1 + 2q + \log(1-q), \qquad (64)$$

$$U(q,-(1-\tau)\delta) = \log\left(1 - \frac{\gamma q}{\tau}\right) + \frac{\gamma q}{\tau-\gamma q} - 1 + 2q + \log(1-q). \qquad (65)$$
To show (49), it is thus sufficient to show that

$$\log\left(1 + \frac{\gamma q}{1-\tau}\right) - \frac{\gamma q}{1-\tau+\gamma q} \ge \log\left(1 - \frac{\gamma q}{\tau}\right) + \frac{\gamma q}{\tau-\gamma q}, \qquad (66)$$
or equivalently, after reordering and simplifying,

$$\log\left(1 - \frac{\gamma q}{\tau(1-\tau+\gamma q)}\right) \le -\frac{\gamma q}{(\tau-\gamma q)(1-\tau+\gamma q)}, \qquad (67)$$
which is (55) with the substitutions τ ↦ 1−τ and q ↦ 1−q. Since then 1−τ ≤ 0.4, we can directly apply the proof of (55), which ends the proof of Case 2.1.
Case 2.2: q − τδ < 1/2. In this case,

$$U(q,\tau\delta) = \log\left(1 - \frac{\gamma(1-q)}{1-\tau}\right) + \frac{\gamma(1-q)}{1-\tau-\gamma(1-q)} + 1 - 2q + \log q, \qquad (68)$$

while we still have

$$U(q,-(1-\tau)\delta) = \log\left(1 - \frac{\gamma q}{\tau}\right) + \frac{\gamma q}{\tau-\gamma q} - 1 + 2q + \log(1-q), \qquad (69)$$
and so we want to show

$$\log\left(1 - \frac{\gamma q}{\tau}\right) + \frac{\gamma q}{\tau-\gamma q} - 1 + 2q + \log(1-q) \le \log\left(1 - \frac{\gamma(1-q)}{1-\tau}\right) + \frac{\gamma(1-q)}{1-\tau-\gamma(1-q)} + 1 - 2q + \log q. \qquad (70)$$
We also remark that −1 + 2q + log(1−q) ≤ 1 − 2q + log q for q > 1/2, so to prove (70) it is sufficient to show

$$\log\left(1 - \frac{\gamma q}{\tau}\right) + \frac{\gamma q}{\tau-\gamma q} \le \log\left(1 - \frac{\gamma(1-q)}{1-\tau}\right) + \frac{\gamma(1-q)}{1-\tau-\gamma(1-q)}, \qquad (71)$$
which reduces after simplification to showing that

$$\log\left(1 + \frac{\gamma(\tau-q)}{(1-\tau-\gamma(1-q))\,\tau}\right) \le \frac{\gamma(\tau-q)}{(\tau-\gamma q)(1-\tau-\gamma(1-q))}, \qquad (72)$$
which turns out to be (62) with the substitutions τ ↦ 1−τ and q ↦ 1−q. Since then 1−τ ≤ 0.4, we can directly apply the proof of (62), which ends the proof of Case 2.2, and the proof of Lemma 2 as well. (End of the proof of Lemma 2.)
Following Kearns & Mansour (1996), we define

$$F_{L^Q}(q,\tau,\delta) \doteq -\frac{\tau(1-\tau)\delta^2}{2}\,L^{Q\prime\prime}(q) - \frac{\tau(1-\tau)(1-2\tau)\delta^3}{6}\,L^{Q(3)}(q). \qquad (73)$$
We now state and prove the equivalent of Kearns & Mansour (1996, Lemma 3).

Lemma 3 For any q, τ, δ ∈ [0,1],

$$\Delta L^Q(q,\tau,\delta) \ge F_{L^Q}(q,\tau,\delta). \qquad (74)$$
Proof We have

$$L^{Q(k)}(q) = \varrho\cdot\begin{cases}\frac{(-1)^{k-1}(k-1)!}{q^k} - 2\cdot\llbracket k=1\rrbracket & \text{if } q < 1/2\\[1mm] -\frac{(k-1)!}{(1-q)^k} + 2\cdot\llbracket k=1\rrbracket & \text{if } q > 1/2\end{cases}, \qquad (75)$$
and we check that only the first and second order derivatives are defined at q = 1/2. Since L^Q is symmetric around 1/2, ΔL^Q satisfies

$$\Delta L^Q(1-q, 1-\tau, \delta) = L^Q(1-q) - \tau\,L^Q(1-q-(1-\tau)\delta) - (1-\tau)\,L^Q(1-q+\tau\delta)$$

$$= L^Q(q) - \tau\,L^Q\big(1-(q+(1-\tau)\delta)\big) - (1-\tau)\,L^Q\big(1-(q-\tau\delta)\big)$$

$$= L^Q(q) - \tau\,L^Q(q+(1-\tau)\delta) - (1-\tau)\,L^Q(q-\tau\delta) = \Delta L^Q(q,\tau,\delta), \qquad (76)$$
so we can study ΔL^Q for q > 1/2 without loss of generality. In this case, all derivatives of L^Q of order k ≥ 4 are negative, which from Kearns & Mansour (1996, Lemma 3) guarantees that

$$\Delta L^Q(q,\tau,\delta) \ge F_{L^Q}(q,\tau,\delta), \qquad (77)$$

as claimed. (End of the proof of Lemma 3.)
We now lowerbound F_{L^Q}(q,τ,δ), which, from Lemma 3, will also provide a lowerbound on the decrease ΔL^Q(q,τ,δ), and in fact will prove Lemma 1. From now on, let us fix δ = γq(1−q)/(τ(1−τ)). If we denote V(τ,q) ≐ (1−2τ)(q − ⟦q < 1/2⟧), then

$$F_{L^Q}(q,\tau,\delta) = \max\{q,1-q\}^2\,\gamma^2\cdot\left(\frac{1}{2\tau(1-\tau)} + \frac{\gamma}{3\tau^2(1-\tau)^2}\cdot V(\tau,q)\right). \qquad (78)$$
We immediately obtain:

Lemma 4 Let δ = γq(1−q)/(τ(1−τ)). Then, for any τ, q such that V(τ,q) ≥ 0,

$$F_{L^Q}(q,\tau,\delta) \ge \frac{\gamma^2}{2}. \qquad (79)$$

Proof For any τ, q such that V(τ,q) ≥ 0, we have

$$F_{L^Q}(q,\tau,\delta) \ge \max\{q,1-q\}^2\,\gamma^2\cdot\frac{1}{2\tau(1-\tau)} \ge \frac{1}{4}\cdot\gamma^2\cdot 2 = \frac{\gamma^2}{2}, \qquad (80)$$

as claimed. (End of the proof of Lemma 4.)
Lemma 4 means that when τ ≤ 1/2, q < 1/2 or τ ≥ 1/2, q > 1/2, the drop ΔL^Q(q,τ,δ) is guaranteed to be "big". If this does not happen, we make use of Lemma 2. In this case, picking wlog τ ≤ 1/2, q > 1/2, we get:
$$F_{L^Q}(q,\tau,\delta) = \max\{q,1-q\}^2\,\gamma^2\cdot\left(\frac{1}{2\tau(1-\tau)} - \frac{\gamma(1-2\tau)(1-q)}{3\tau^2(1-\tau)^2}\right) \ge \frac{\gamma^2}{2}\cdot\left(2 - \frac{\gamma(1-2\cdot 0.4)}{3\cdot 0.4^2\,(1-0.4)^2}\right) = \gamma^2\cdot\left(1 - \frac{625\gamma}{216}\right) \ge \gamma^2\cdot\left(1 - \frac{25}{216}\right) \ge \frac{\gamma^2}{2},$$

which therefore implies that F_{L^Q}(q,τ,δ) ≥ γ²/2 in all cases. We just have to use Lemma 3 to finish the proof of Lemma 1. (End of the proof of Lemma 1.)
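Lemma 1 is easy to spot-check numerically: taking ϱ = 1 and dropping the additive constant −log ε (it cancels in (45), since the mixture coefficients sum to zero), the drop ΔL^Q(q, τ, δ) can be evaluated on a grid and compared with γ²/2. The sketch below is ours, not the paper's code:

```java
// Sketch (ours): numerical spot-check of Lemma 1 with rho = 1.
// L^Q(u) = log(err(u)) + 1 - 2 err(u), up to the constant -log(eps),
// which cancels in Delta L^Q (coefficients 1, -(1-tau), -tau sum to 0).
public class Lemma1Check {
    static double err(double u) { return Math.min(u, 1.0 - u); }
    static double L(double u) { return Math.log(err(u)) + 1.0 - 2.0 * err(u); }

    // The drop of eq. (45).
    static double drop(double q, double tau, double delta) {
        return L(q) - (1.0 - tau) * L(q - tau * delta) - tau * L(q + (1.0 - tau) * delta);
    }

    public static void main(String[] args) {
        double gamma = 0.02;                               // well within gamma <= 1/25
        for (double q = 0.1; q <= 0.91; q += 0.1)
            for (double tau = 0.1; tau <= 0.91; tau += 0.1) {
                double delta = gamma * q * (1.0 - q) / (tau * (1.0 - tau));
                if (drop(q, tau, delta) < gamma * gamma / 2.0)
                    throw new AssertionError("Lemma 1 violated at q=" + q + ", tau=" + tau);
            }
        System.out.println("Lemma 1 holds on the grid");
    }
}
```

The bound is nearly tight at q = τ = 1/2, where the drop is only slightly above γ²/2, consistent with the second-order term of (73) dominating there.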
We can now finish the proof of Theorem 10. Suppose the current tree H has t leaves. There must be a leaf with ω(λ) ≥ 1/t, so

$$\Delta \doteq L^Q(H) - L^Q(H^+) = \omega(\lambda)\,\Delta L^Q(q,\tau,\delta) \ge \frac{\gamma^2}{2t} \ge \frac{\gamma^2}{2t}\cdot\frac{L^Q(H)}{L^Q(H_0)}, \qquad (81)$$
where the last inequality follows from the concavity of L^Q, letting H_0 be the single-root-node tree, for which L^Q(H_0) = L^Q(q(S)), and more generally H_t a tree with t+1 leaves (thus obtained after t iterations of the boosting procedure). This yields the recurrence relation

$$L^Q(H_{t+1}) \le \left(1 - \frac{\gamma^2}{2L^Q(q(S))\cdot t}\right)\cdot L^Q(H_t), \qquad (82)$$
and we get (see Kearns & Mansour (1996), proof of Theorem 10)

$$L^Q(H_t) \le \exp\left(-\frac{\gamma^2\log t}{4L^Q(q(S))}\right)\cdot L^Q(q(S)); \qquad (83)$$
to obtain L^Q(H_t) ≤ ρ·L^Q(q(S)) for ρ ∈ (0,1], it therefore suffices that

$$t \ge \left(\frac{1}{\rho}\right)^{\frac{4L^Q(q(S))}{\gamma^2}}. \qquad (84)$$
We finally remark that L^Q(q(S)) ≤ ϱ·log(1/(2ε)) and conclude that (84) holds when

$$t \ge \left(\frac{1}{\rho}\right)^{\frac{4\varrho}{\gamma^2}\log\frac{1}{2\varepsilon}}, \qquad (85)$$

as claimed.

Remark: at this stage we can compare our guarantees to those of Kearns & Mansour (1996). Knowledge of their proofs immediately sheds light on the fact that our lowerbound on ΔL^Q(q,τ,δ) in Lemma 1 does not depend on q, whereas all of theirs do (Kearns & Mansour, 1996, Lemmata 5, 6, 7), and in fact vanish as q → 0, 1. A closer look at the weak learning assumption shows that it in fact precludes this extreme regime for q, as it enforces q ∈ [τδ, 1 − (1−τ)δ] when δ ≤ 1; as a consequence, their bounds can also be reformulated to exclude q, and the convergence rate of their best splitting criterion is within the same order as ours.
7 Experiments in extenso
7.1 Implementation

We give here a few details on the implementation. The Java implementation of the algorithms, available separately, implements the versions of Nock & Nielsen (2006) and Schapire & Singer (1999), respectively, for ADABOOSTR and AdaBoost.
The implementation of RATBOOSTE uses methods from class Math that can throw an ArithmeticException when a long overflow happens, in which case we catch the exception and redo the corresponding method after quantization. To make the code faster, we have also included the possibility to trigger quantization when the encoding length of the longs exceeds a user-fixed threshold.
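The Math methods in question are presumably the exact-arithmetic family (Math.multiplyExact, Math.addExact), which throw ArithmeticException on long overflow. A minimal sketch of the catch-and-redo pattern described above (ours; the quantize step is a hypothetical stand-in, not RATBOOSTE's actual quantization):

```java
// Sketch of the overflow-triggered quantization pattern. Math.multiplyExact
// throws ArithmeticException on long overflow; we catch it and retry after
// shortening the operands. quantize() is a stand-in for the real step.
public class OverflowGuard {
    // Stand-in quantization: drop low-order bits to shorten encodings.
    static long quantize(long x, int bits) { return x >> bits; }

    static long safeMul(long x, long y) {
        try {
            return Math.multiplyExact(x, y);
        } catch (ArithmeticException overflow) {
            // Redo the operation on quantized (shortened) operands.
            return Math.multiplyExact(quantize(x, 16), quantize(y, 16));
        }
    }

    public static void main(String[] args) {
        // 2^40 * 2^40 overflows long; the quantized product is 2^48.
        System.out.println(safeMul(1L << 40, 1L << 40));
    }
}
```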
The implementation of RATBOOSTAb uses a regular k-means with Forgy initialization. If you want to optimize this with your best hard-clustering algorithm, you just have to rewrite a few methods from class KMeans R in file Misc.java. Note that the implementation also allows using stochastic weight assignation with adaptive quantization (a combination of RATBOOSTAb and RATBOOSTQb), but it is not reported (see README).
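Adaptive quantization clusters the current weights into 2^b values, for which a plain 1-D k-means with Forgy initialization (random data points as initial centroids) suffices. A minimal sketch (ours, not the code of KMeans R in Misc.java):

```java
import java.util.Random;

// Minimal 1-D k-means with Forgy initialization, as a stand-in for the
// weight quantization of RATBOOSTAb: weights are mapped to k = 2^b centroids.
public class WeightKMeans {
    static double[] cluster(double[] w, int k, int iters, long seed) {
        Random rnd = new Random(seed);
        double[] c = new double[k];
        for (int j = 0; j < k; j++) c[j] = w[rnd.nextInt(w.length)]; // Forgy init
        for (int it = 0; it < iters; it++) {
            double[] sum = new double[k];
            int[] cnt = new int[k];
            for (double x : w) {                       // assignment step
                int best = 0;
                for (int j = 1; j < k; j++)
                    if (Math.abs(x - c[j]) < Math.abs(x - c[best])) best = j;
                sum[best] += x; cnt[best]++;
            }
            for (int j = 0; j < k; j++)                // update step
                if (cnt[j] > 0) c[j] = sum[j] / cnt[j];
        }
        return c;
    }

    public static void main(String[] args) {
        double[] w = { 0.01, 0.02, 0.03, 0.5, 0.52, 0.9, 0.91, 0.95 };
        double[] c = cluster(w, 4, 20, 7L);            // b = 2 bits -> 4 centroids
        java.util.Arrays.sort(c);
        System.out.println(java.util.Arrays.toString(c));
    }
}
```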
Domain summary Table

Table 1 details the UCI domains we have used (Blake et al., 1998). We now detail the per-domain training curves when there is no stopping criterion (other than boosting for 10 000 iterations). In the results reported in Tables 1 (main file) and 2 (this file), we keep the classifier that minimizes the empirical risk among all iterations, which amounts to a cutoff point for boosting around the minimal values of each curve (because of statistical uncertainty, we have no guarantee that this is minimal on testing). Results of ADABOOSTR are omitted so as not to clutter the plots, but they are included in the full Table 2.
| Domain | m | d |
|---|---|---|
| Fertility | 100 | 9 |
| Haberman | 306 | 3 |
| Transfusion | 748 | 4 |
| Banknote | 1 372 | 4 |
| Breast wisc | 699 | 9 |
| Ionosphere | 351 | 33 |
| Sonar | 208 | 60 |
| Yeast | 1 484 | 7 |
| Wine-red | 1 599 | 11 |
| Cardiotocography (*) | 2 126 | 9 |
| CreditCardSmall (**) | 1 000 | 23 |
| Abalone | 4 177 | 8 |
| Qsar | 1 055 | 41 |
| Wine-white | 4 898 | 11 |
| Page | 5 473 | 10 |
| Mice | 1 080 | 77 |
| Hill+noise | 1 212 | 100 |
| Hill+nonoise | 1 212 | 100 |
| Firmteacher | 10 800 | 16 |
| Magic | 19 020 | 10 |
| EEG | 14 980 | 14 |
| Skin | 245 057 | 3 |
| Musk | 6 598 | 166 |
| Hardware | 28 179 | 95 |
| Twitter (***) | 583 250 | 77 |

Table 1: UCI domains considered in our experiments (m = total number of examples, d = number of features), ordered by increasing m × d. (*) we used features 13-21 as descriptors; (**) we used the first 1 000 examples of the UCI domain; (***) due to the size of the domain, only AdaBoost and ADABOOSTR were run for T = 5000 iterations; the other algorithms were run for a smaller T′ = 1000 iterations.
UCI fertility

Figure 1: UCI domain fertility. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

[Five panels: b = 2, b = 3, b = 4, b = 5, b = 6]

Figure 2: UCI domain fertility. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI haberman

Figure 3: UCI domain haberman. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

[Five panels: b = 2, b = 3, b = 4, b = 5, b = 6]

Figure 4: UCI domain haberman. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI transfusion

Figure 5: UCI domain transfusion. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

[Five panels: b = 2, b = 3, b = 4, b = 5, b = 6]

Figure 6: UCI domain transfusion. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI banknote

Figure 7: UCI domain banknote. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

[Five panels: b = 2, b = 3, b = 4, b = 5, b = 6]

Figure 8: UCI domain banknote. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI breastwisc

Figure 9: UCI domain breastwisc. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

[Five panels: b = 2, b = 3, b = 4, b = 5, b = 6]

Figure 10: UCI domain breastwisc. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI ionosphere

Figure 11: UCI domain ionosphere. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

[Five panels: b = 2, b = 3, b = 4, b = 5, b = 6]

Figure 12: UCI domain ionosphere. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI sonar

Figure 13: UCI domain sonar. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

[Five panels: b = 2, b = 3, b = 4, b = 5, b = 6]

Figure 14: UCI domain sonar. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI yeast

Figure 15: UCI domain yeast. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 16: UCI domain yeast. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI winered

Figure 17: UCI domain winered. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 18: UCI domain winered. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI cardiotocography

Figure 19: UCI domain cardiotocography. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 20: UCI domain cardiotocography. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI creditcardsmall

Figure 21: UCI domain creditcardsmall. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 22: UCI domain creditcardsmall. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI abalone

Figure 23: UCI domain abalone. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 24: UCI domain abalone. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI qsar

Figure 25: UCI domain qsar. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 26: UCI domain qsar. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI winewhite

Figure 27: UCI domain winewhite. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 28: UCI domain winewhite. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI page

Figure 29: UCI domain page. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 30: UCI domain page. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI mice

Figure 31: UCI domain mice. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 32: UCI domain mice. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI hill+noise

Figure 33: UCI domain hill+noise. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 34: UCI domain hill+noise. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI hill+nonoise

Figure 35: UCI domain hill+nonoise. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 36: UCI domain hill+nonoise. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI firmteacher

Figure 37: UCI domain firmteacher. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 38: UCI domain firmteacher. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI magic

Figure 39: UCI domain magic. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 40: UCI domain magic. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI eeg

Figure 41: UCI domain eeg. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 42: UCI domain eeg. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI skin

Figure 43: UCI domain skin. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 44: UCI domain skin. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI musk

Figure 45: UCI domain musk. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 46: UCI domain musk. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI hardware

Figure 47: UCI domain hardware. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 10000 iterations.

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 48: UCI domain hardware. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 10000 iterations.
UCI twitter

Figure 49: UCI domain twitter. Results comparing AdaBoost (blue), RATBOOST (green) and RATBOOSTE (purple). Note: there is no other stopping criterion apart from running for T = 5000 iterations (AdaBoost) and T′ = 1000 iterations (RATBOOST, RATBOOSTE).

b = 2 b = 3 b = 4 b = 5 b = 6

Figure 50: UCI domain twitter. Results comparing AdaBoost (blue), RATBOOST (green) and the quantized versions RATBOOSTAb (black) / RATBOOSTQb (thin orange) / RATBOOSTSb (red), for various values of the quantization index bit-size b. Note: there is no other stopping criterion apart from running for T = 5000 iterations (AdaBoost) and T′ = 1000 iterations (RATBOOST, RATBOOSTAb, RATBOOSTQb, RATBOOSTSb).
Summary of Results
References

Blake, C. L., Keogh, E., and Merz, C. UCI repository of machine learning databases, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.

Buja, A., Stuetzle, W., and Shen, Y. Loss functions for binary class probability estimation and classification: structure and applications. Technical Report, University of Pennsylvania, 2005.

Kearns, M. and Mansour, Y. On the boosting ability of top-down decision tree learning algorithms. In Proc. of the 28th ACM STOC, pp. 459–468, 1996.

Kearns, M. J. and Mansour, Y. A fast, bottom-up decision tree pruning algorithm with near-optimal generalization. In Proc. of the 15th International Conference on Machine Learning, pp. 269–277, 1998.

Nock, R. and Nielsen, F. A real generalization of discrete AdaBoost. In Proc. of the 17th European Conference on Artificial Intelligence, pp. 509–515, 2006.

Nock, R. and Nielsen, F. On the efficient minimization of classification-calibrated surrogates. In NIPS*21, pp. 1201–1208, 2008.

Reid, M. D. and Williamson, R. C. Composite binary losses. JMLR, 11:2387–2422, 2010.

Schapire, R. E. and Singer, Y. Improved boosting algorithms using confidence-rated predictions. MLJ, 37:297–336, 1999.

Schervish, M. J. A general method for comparing probability assessors. Ann. of Stat., 17(4):1856–1879, 1989.

Shuford, Jr., E. H., Albert, A., and Massengill, H. E. Admissible probability measurement procedures. Psychometrika, 31:125–145, 1966.
| Domain | AdaBoost | ADABOOSTR | RATBOOST | RATBOOSTE | Q, b=2 | Q, b=3 | Q, b=4 | Q, b=5 | Q, b=6 | S, b=2 | S, b=3 | S, b=4 | S, b=5 | S, b=6 | A, b=2 | A, b=3 | A, b=4 | A, b=5 | A, b=6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 38.00±10.33 | 37.00±9.49 | 40.00±9.43 | 40.00±11.55 | 47.00±14.94 | 47.00±14.94 | 39.00±15.24 | 42.00±11.35 | 42.00±11.35 | 38.00±7.89 | 46.00±18.38 | 39.00±7.38 | 47.00±14.18 | 52.00±16.19 | 41.00±7.38 | 46.00±12.65 | 43.00±9.49 | 47.00±14.94 | 39.00±8.76 |
| H | 25.53±8.79 | 25.53±8.79 | 25.85±8.32 | 26.81±9.71 | 25.84±9.83 | 25.84±9.83 | 25.84±9.83 | 26.48±9.07 | 26.81±9.71 | 25.84±9.34 | 26.51±8.86 | 26.49±8.96 | 25.52±9.65 | 26.17±10.01 | 25.52±9.65 | 25.52±8.90 | 29.80±11.78 | 28.18±11.36 | 25.84±8.31 |
| T | 38.78±6.86 | 39.05±6.68 | 38.92±7.15 | 34.91±7.25 | 39.18±7.02 | 39.18±7.22 | 38.53±9.35 | 38.52±7.20 | 40.66±7.98 | 30.24±8.27 | 34.65±8.25 | 35.31±6.62 | 37.58±5.44 | 38.24±6.92 | 33.59±8.06 | 30.91±8.44 | 32.37±6.14 | 32.78±5.57 | 35.71±5.39 |
| B | 2.70±1.62 | 2.63±1.55 | 2.99±1.70 | 4.89±1.89 | 15.46±2.59 | 15.46±2.59 | 15.46±2.59 | 13.64±2.96 | 13.93±2.97 | 7.73±3.12 | 4.45±2.32 | 2.77±1.78 | 2.99±2.08 | 3.43±1.33 | 12.54±3.22 | 11.01±4.25 | 8.53±2.85 | 5.32±2.07 | 3.57±2.16 |
| BW | 2.86±2.78 | 2.86±2.78 | 3.29±2.52 | 3.14±2.59 | 10.02±3.83 | 4.01±3.07 | 4.72±2.94 | 4.15±2.56 | 3.58±2.72 | 2.86±2.13 | 3.14±2.68 | 3.00±2.73 | 2.86±2.43 | 3.00±2.65 | 4.29±3.09 | 4.29±2.85 | 3.29±2.34 | 3.29±1.91 | 3.00±2.37 |
| I | 11.39±4.01 | 11.11±3.91 | 11.68±3.92 | 12.54±5.26 | 25.37±5.82 | 15.10±4.28 | 13.67±4.82 | 14.82±5.85 | 13.69±4.25 | 12.83±3.64 | 13.97±4.58 | 12.82±4.89 | 13.97±4.38 | 13.40±5.41 | 14.53±4.94 | 11.40±2.72 | 11.96±2.21 | 11.41±4.49 | 13.40±3.60 |
| S | 20.67±7.12 | 20.67±7.12 | 21.64±6.47 | 25.48±9.88 | 30.69±12.30 | 25.88±15.02 | 25.50±11.50 | 27.40±11.23 | 27.38±9.72 | 24.02±8.71 | 28.40±8.74 | 28.86±9.11 | 28.38±9.25 | 26.07±9.29 | 23.10±9.62 | 22.62±9.88 | 26.90±7.79 | 28.83±9.37 | 24.05±11.76 |
| Y | 48.18±4.43 | 48.18±4.43 | 48.59±4.59 | 34.04±7.09 | 48.52±4.00 | 48.52±4.00 | 48.79±3.47 | 49.06±4.15 | 48.45±4.60 | 49.33±3.83 | 47.17±5.18 | 46.23±4.47 | 47.11±4.47 | 46.77±3.67 | 48.79±2.64 | 49.53±3.18 | 47.71±4.02 | 48.72±3.69 | 49.33±3.97 |
| WR | 26.14±3.02 | 26.14±3.15 | 25.45±3.70 | 26.27±3.18 | 30.77±3.48 | 27.52±3.13 | 27.46±3.66 | 26.83±4.30 | 27.02±4.23 | 27.08±3.36 | 26.71±4.02 | 26.20±3.94 | 26.20±4.04 | 26.89±4.08 | 27.64±3.45 | 27.14±4.02 | 27.14±2.67 | 26.21±3.51 | 25.96±3.39 |
| Ca | 41.63±4.62 | 41.58±4.55 | 39.23±4.46 | 37.91±2.88 | 45.86±2.04 | 45.86±2.04 | 42.43±3.06 | 42.80±3.09 | 42.47±2.80 | 42.06±4.33 | 36.08±2.38 | 38.10±1.93 | 40.59±3.16 | 42.00±2.27 | 38.05±3.89 | 37.77±4.40 | 38.62±4.98 | 36.60±4.20 | 37.11±3.54 |
| CCS | 40.00±4.62 | 39.90±4.70 | 40.90±3.31 | 39.90±4.56 | 57.90±4.12 | 57.90±4.12 | 57.60±4.22 | 53.40±7.82 | 42.10±5.95 | 39.90±3.87 | 35.30±5.36 | 37.00±3.40 | 36.50±2.88 | 36.40±4.40 | 42.60±4.58 | 39.60±5.27 | 43.30±3.65 | 41.40±4.90 | 40.90±4.33 |
| Ab | 21.64±1.81 | 21.62±1.86 | 21.35±1.60 | 22.10±1.41 | 24.18±1.55 | 24.18±1.55 | 24.18±1.55 | 24.37±1.47 | 24.40±1.42 | 22.86±1.54 | 21.81±1.17 | 22.07±1.48 | 21.83±1.54 | 21.52±1.46 | 24.28±1.40 | 23.01±1.98 | 22.82±1.02 | 22.60±1.61 | 21.74±1.45 |
| Q | 22.47±6.54 | 22.37±6.50 | 19.81±5.14 | 20.48±5.55 | 31.47±4.82 | 23.99±4.87 | 24.65±4.45 | 22.56±4.95 | 22.65±4.44 | 22.47±5.49 | 21.33±4.53 | 20.47±4.82 | 20.28±2.64 | 22.37±5.72 | 24.55±4.60 | 22.00±4.42 | 23.22±4.56 | 22.47±6.05 | 21.24±5.74 |
| WW | 30.36±2.18 | 30.32±2.09 | 29.77±1.95 | 29.64±2.03 | 35.87±2.05 | 31.69±1.82 | 31.73±1.76 | 31.56±2.04 | 31.69±1.68 | 31.46±2.12 | 31.93±1.95 | 31.63±1.89 | 31.28±1.89 | 31.58±2.23 | 31.30±2.30 | 29.87±2.44 | 29.44±2.13 | 30.38±1.64 | 29.75±1.23 |
| P | 19.26±1.91 | 19.24±1.84 | 6.01±1.18 | 7.80±1.45 | 35.61±1.93 | 28.69±2.38 | 22.33±1.87 | 21.98±1.47 | 21.14±1.77 | 11.04±2.20 | 23.55±2.38 | 23.61±2.31 | 19.99±1.75 | 14.80±2.04 | 22.35±1.68 | 10.12±2.47 | 8.04±1.91 | 7.23±1.19 | 7.22±1.07 |
| Mi | 4.07±2.15 | 3.89±2.04 | 4.44±2.30 | 7.41±3.55 | 26.11±4.32 | 23.15±6.19 | 13.89±4.05 | 11.02±3.74 | 10.09±3.77 | 11.94±3.61 | 11.48±2.23 | 9.54±2.73 | 8.80±2.32 | 8.33±2.90 | 13.70±2.72 | 11.02±2.67 | 8.70±3.27 | 7.87±2.66 | 7.31±2.97 |
| H+n | 41.91±5.96 | 41.91±5.96 | 35.15±5.32 | 39.93±5.56 | 49.25±4.85 | 49.25±4.85 | 49.34±4.87 | 49.34±4.87 | 44.06±6.79 | 45.72±6.26 | 42.74±6.75 | 40.26±4.07 | 39.20±6.40 | 40.18±5.64 | 42.99±5.54 | 40.84±6.88 | 36.80±7.16 | 35.23±8.04 | 28.96±9.20 |
| H+nn | 41.99±5.45 | 41.99±5.45 | 32.91±5.07 | 37.95±4.98 | 48.76±4.78 | 48.76±4.78 | 48.76±4.78 | 48.76±4.78 | 41.99±8.40 | 42.58±6.41 | 46.28±5.53 | 32.93±8.25 | 35.98±7.08 | 37.47±5.10 | 41.66±4.26 | 37.87±5.86 | 20.46±8.14 | 28.22±12.40 | 19.63±8.76 |
| Ft | 12.23±0.93 | 12.39±0.90 | 12.33±0.85 | 13.56±1.07 | 33.78±1.62 | 33.78±1.62 | 19.81±1.48 | 15.31±1.14 | 13.74±0.75 | 17.13±1.13 | 12.83±0.81 | 13.04±0.74 | 13.37±1.11 | 12.56±0.82 | 21.12±1.42 | 15.46±0.93 | 13.54±0.80 | 12.45±0.67 | 12.57±0.90 |
| Ma | 21.00±1.00 | 21.01±0.93 | 20.91±0.97 | 20.94±0.98 | 26.41±0.97 | 21.41±0.88 | 21.45±0.93 | 21.46±0.92 | 21.11±0.88 | 20.95±0.93 | 20.93±1.04 | 20.94±1.08 | 21.01±1.06 | 20.94±0.96 | 21.45±0.88 | 21.38±0.91 | 21.03±0.91 | 21.08±0.94 | 21.01±0.96 |
| E | 45.55±1.48 | 45.55±1.49 | 43.48±1.36 | 42.92±0.81 | 47.26±1.43 | 47.12±1.39 | 46.46±1.88 | 45.23±1.93 | 44.07±1.45 | 45.63±1.71 | 44.64±1.63 | 44.28±1.36 | 44.75±1.06 | 44.51±1.80 | 44.83±1.50 | 44.06±1.82 | 43.40±1.01 | 42.60±1.19 | 42.25±0.72 |
| Sk | 9.62±0.22 | 10.18±0.29 | 10.74±0.21 | 9.65±0.23 | 33.97±0.29 | 33.97±0.29 | 18.79±0.31 | 10.68±0.24 | 9.87±0.23 | 9.74±0.24 | 9.61±0.23 | 9.62±0.23 | 9.61±0.23 | 9.61±0.23 | 9.62±0.23 | 9.61±0.23 | 7.99±1.46 | 7.19±0.75 | 6.77±0.34 |
| Mu | 23.36±1.19 | 23.28±1.24 | 19.48±1.12 | 22.26±1.19 | 46.07±5.17 | 39.18±6.37 | 29.46±2.12 | 28.22±1.87 | 26.20±1.46 | 32.92±2.97 | 32.11±2.72 | 32.80±2.15 | 30.96±1.96 | 28.13±2.28 | 28.54±2.03 | 27.25±2.59 | 25.40±2.77 | 23.79±1.66 | 24.72±1.04 |
| Ha | 1.94±0.23 | 3.11±0.31 | 3.11±0.33 | 1.76±0.25 | 9.73±0.44 | 2.28±0.21 | 2.28±0.21 | 2.29±0.21 | 2.28±0.20 | 1.80±0.32 | 6.28±0.28 | 4.85±0.21 | 3.85±0.30 | 2.72±0.19 | 2.29±0.23 | 2.26±0.23 | 2.14±0.34 | 1.66±0.19 | 1.66±0.17 |
| Tw | 7.45±0.08 | 7.45±0.08 | 4.42±0.11 | 4.72±0.10 | 6.63±0.07 | 6.55±0.08 | 6.55±0.08 | 6.25±0.09 | 5.65±0.14 | 4.34±0.11 | 5.80±0.09 | 5.21±0.15 | 5.07±0.12 | 5.07±0.11 | 5.21±0.19 | 5.13±0.23 | 5.10±0.12 | 4.97±0.12 | 4.90±0.15 |

Table 2: Complete results for Table 1 (in main file). Domains ordered following Table 1 (in this SM). Each result is the average ± stddev of the classifiers retained at each CV fold. The classifier retained at each fold is the one minimizing the empirical risk among the T, T′ boosting iterations. Columns Q, S and A abbreviate RATBOOSTQb, RATBOOSTSb and RATBOOSTAb, respectively, for quantization index bit-size b.
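The per-fold selection rule in the caption of Table 2 (retain, at each CV fold, the classifier minimizing the empirical risk among the boosting iterations, then report the average ± stddev of the retained classifiers' test errors) can be sketched as follows; `risks_per_fold` and `test_errs_per_fold` are hypothetical per-fold, per-iteration arrays, not the paper's code:

```python
import statistics

def select_and_summarize(risks_per_fold, test_errs_per_fold):
    """At each CV fold, retain the iteration with minimum empirical risk,
    then report mean and stddev of the retained classifiers' test errors.

    Hypothetical data layout: one list of per-iteration values per fold.
    """
    retained = []
    for risks, errs in zip(risks_per_fold, test_errs_per_fold):
        best_t = min(range(len(risks)), key=risks.__getitem__)  # argmin risk
        retained.append(errs[best_t])
    return statistics.mean(retained), statistics.stdev(retained)
```

For example, with two folds and three iterations each, the fold-wise argmin picks one test error per fold, and the summary is their mean ± sample stddev, matching the "average ± stddev" convention of the table.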