Research Article
Instance Transfer Learning with Multisource Dynamic TrAdaBoost

Qian Zhang, Haigang Li, Yong Zhang, and Ming Li

School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China

Correspondence should be addressed to Qian Zhang; [email protected]

Received 8 April 2014; Revised 5 July 2014; Accepted 11 July 2014; Published 24 July 2014

Academic Editor: Juan R. Rabuñal

Copyright © 2014 Qian Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Since transfer learning can employ knowledge from related domains to help the learning tasks in the current target domain, it shows the advantages of reducing the learning cost and improving the learning efficiency compared with traditional learning. Focused on the situation in which sample data from the transfer source domains and the target domain have similar distributions, an instance transfer learning method based on multisource dynamic TrAdaBoost is proposed in this paper. In this method, knowledge from multiple source domains is used to avoid negative transfer; furthermore, the information that is conducive to target task learning is obtained to train the candidate classifiers. The theoretical analysis suggests that, by adding the dynamic factor, the proposed algorithm improves its ability to prevent weight entropy from drifting from source to target instances, and that its classification effectiveness is better than that of single-source transfer. Finally, experimental results show that the proposed algorithm has higher classification accuracy.

1. Introduction

In data mining, a general assumption of traditional machine learning is that training data and test data have the same distribution. In practical applications, however, this assumption often cannot be met [1]. By transferring and sharing knowledge from different fields for target task learning, transfer learning turns traditional learning from scratch into an accumulative process, which improves learning efficiency and reduces learning cost [2, 3]. In 2005, the Information Processing Techniques Office (IPTO) gave transfer learning a new mission: the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks. Under this definition, transfer learning aims to extract knowledge from one or more source tasks and apply that knowledge to a target task [2]. Since transfer learning needs to use information from similar domains and tasks, its effectiveness is related to the correlation between the source and target domains.

However, transfer learning is more complex than traditional machine learning because of the introduction of transfer. There are many kinds of knowledge representation in related domains, such as sample instances, feature mappings, model parameters, and association rules. Owing to its simplicity of implementation, this paper selects sample instances as the knowledge representation for designing an effective transfer algorithm. In detail, instance transfer learning improves classification accuracy by finding training samples in other source domains that are strongly correlated with the target domain and reusing them in learning the target task [4]. Obviously, how the weights of these training data are decided influences the effectiveness of the candidate classifiers [5].

Up to now, researchers have proposed several approaches to solve transfer learning problems. Ben-David and Schuller provided a theoretical justification for multitask learning [6]. Daumé and Marcu studied the domain-transfer problem in statistical natural language processing by using a specific Gaussian model [7]. Wu and Dietterich proposed an image classification algorithm that uses both inadequate training data and plenty of low-quality auxiliary data [8]. This algorithm demonstrates some improvement from using the auxiliary data, but it does not give a quantitative study of different auxiliary examples. Liao et al. proposed a new active learning method to select the unlabeled data in a target domain to be labeled with the help of the source domain data [9].




Rosenstein et al. proposed a hierarchical naive Bayes approach for transfer learning by using auxiliary data and discussed the question of when transfer learning should be applied [10].

Transfer AdaBoost, also called TrAdaBoost, is a classic transfer learning algorithm proposed by Dai et al. [11]. TrAdaBoost assumes that the source and target domain data use exactly the same set of features and labels, but that the distributions of the data in the two domains are different. In addition, TrAdaBoost assumes that, owing to the difference in distributions between the source and the target domains, some of the source domain data may be useful for learning in the target domain but some may not and could even be harmful. Since TrAdaBoost relies on only one source, its learning effect becomes poor when there is weak correlation between the source and target domains. Moreover, as the literature [12–14] points out, TrAdaBoost suffers from weight mismatch, introduced imbalance, and rapid convergence of the source weights. The purpose of this paper is to remove the weight drift phenomenon efficiently, improve learning efficiency, and inhibit negative transfer.

2. Multisource Dynamic TrAdaBoost Algorithm

Considering the correlation between multiple source domains and the target domain, Yao and Doretto recently proposed the multisource TrAdaBoost (MSTrA) transfer learning algorithm [15]. As an instance-based transfer learning method, MSTrA selects its training samples from different source domains. At each iteration, MSTrA always selects the most related source domain to train the weak classifier. Although this ensures that the transferred knowledge is relevant to the target task, MSTrA ignores the effects of the other source domains. Al-Stouhi and Reddy proposed an algorithm (DTrAdaBoost) with an integrated dynamic cost to resolve a major issue in the boosting-based transfer algorithm TrAdaBoost [16]. This issue causes source instances to converge before they can be used for transfer learning. However, DTrAdaBoost has low learning efficiency.

In order to overcome the above disadvantages, a multisource dynamic TrAdaBoost algorithm (MSDTrA) is proposed. With this algorithm, the rate of convergence of the source sample weights caused by weak correlation with the target domain is reduced [17]. Suppose there are $N$ source domains $D_{S_1},\ldots,D_{S_N}$; $N$ source tasks $T_{a_1},\ldots,T_{a_N}$; and $N$ source training sets $D_{a_1},\ldots,D_{a_N}$. The purpose of transfer learning is to make good use of them to improve the learning effectiveness of the target classifier function $f_b: X \to Y$. In detail, the steps of MSDTrA are described as follows.

Step 1. Initialize the weight vector $(\omega_{a_1},\ldots,\omega_{a_N},\omega_b)$, where $\omega_{a_k}=(\omega_{a_k}^1,\ldots,\omega_{a_k}^{n_{a_k}})$ is the weight vector of the training samples of the $k$th source domain and $\omega_b=(\omega_b^1,\ldots,\omega_b^{n_b})$ is the weight vector of the training samples in the target domain.
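To make the later steps concrete, here is a minimal, illustrative sketch (not the authors' code) of the Step 1 initialization in Python; the toy data, sizes, and variable names are assumptions used only for illustration.

```python
# Sketch of Step 1: uniform, normalized weight vectors over all source and
# target samples. The data here are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

N = 3  # assumed number of source domains
sources = [(rng.normal(size=(100, 5)), rng.integers(0, 2, 100)) for _ in range(N)]
X_b, y_b = rng.normal(size=(20, 5)), rng.integers(0, 2, 20)   # small target set

n_a = sum(len(y) for _, y in sources)   # total number of source samples
n_b = len(y_b)                          # number of target samples

# One weight vector per source domain plus one for the target domain,
# normalized so that all weights together sum to 1.
w_sources = [np.full(len(y), 1.0 / (n_a + n_b)) for _, y in sources]
w_target = np.full(n_b, 1.0 / (n_a + n_b))
```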

Step 2. Set the value of $\beta_a$ as follows:
$$\beta_a = \frac{1}{1 + \sqrt{2\ln(n_a)/M}}, \qquad (1)$$
where $n_a = \sum_k n_{a_k}$ is the total number of source-domain training samples and $n_{a_k}$ is the number of training samples in the $k$th source domain.
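As a quick illustration, equation (1) can be evaluated directly; the sample count and iteration number below are assumptions, not values from the paper.

```python
# Sketch of equation (1): the WMA rate beta_a depends only on the total number
# of source samples n_a and the number of boosting iterations M.
import math

def compute_beta_a(n_a: int, M: int) -> float:
    return 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_a) / M))

print(compute_beta_a(n_a=300, M=50))   # approximately 0.68 for these assumed values
```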

Step 3. Empty the set of candidate weak classifiers and normalize the weight vectors $(\omega_{a_1},\ldots,\omega_{a_N},\omega_b)$ so that they sum to 1.

Step 4. Select a base learner to obtain the candidate weak classifier $(f_b^t)_k$ from the training set $D_{a_k} \cup D_b$; calculate the error of $(f_b^t)_k$ on $D_b$ according to the following equation:
$$(\varepsilon_b^t)_k = \frac{\sum_{j=1}^{n_b} \omega_b^j\,[y_b^j \neq (f_b^t)_k(x_b^j)]}{\sum_{i=1}^{n_b} \omega_b^i}; \qquad (2)$$
update the weight of $(f_b^t)_k$ by using the vector update strategy:
$$(\omega_b^t)_k = \frac{e^{1-(\varepsilon_b^t)_k}}{e^{(\varepsilon_b^t)_k}}. \qquad (3)$$

Repeat the above procedure until all source domains are traversed, where $(\varepsilon_b^t)_k$ is the error rate on the target domain of the candidate weak classifier built with the $k$th source domain and $[y_b^j \neq (f_b^t)_k(x_b^j)]$ is the indicator of a misclassification by that candidate weak classifier. According to the vector update strategy above, the error of each weak classifier on the target training set is computed and a weight is assigned to each weak classifier according to that error. The larger the error is, the smaller the weight becomes. In other words, source domains whose classifiers attain high classification accuracy contain much valuable information for learning the target task.
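A hedged sketch of Step 4 follows. It assumes scikit-learn decision stumps as the base learner (the paper itself uses SVMs as base classifiers) and reuses the toy variables from the initialization sketch above.

```python
# Sketch of Step 4: one candidate weak classifier per source domain, trained on
# D_ak united with D_b, with its target error (eq. (2)) and vote weight (eq. (3)).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_candidates(sources, w_sources, X_b, y_b, w_target):
    candidates = []
    for (X_a, y_a), w_a in zip(sources, w_sources):
        X = np.vstack([X_a, X_b])
        y = np.concatenate([y_a, y_b])
        w = np.concatenate([w_a, w_target])
        clf = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)

        miss = (clf.predict(X_b) != y_b).astype(float)    # misclassification indicator
        eps_k = np.dot(w_target, miss) / w_target.sum()   # eq. (2)
        omega_k = np.exp(1.0 - eps_k) / np.exp(eps_k)     # eq. (3)
        candidates.append((clf, eps_k, omega_k))
    return candidates
```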

Step 5. Integrate all weighted weak classifiers to obtain a candidate classifier at the $t$th iteration:
$$f_b^t = \sum_k \frac{(\omega_b^t)_k}{\sum_k (\omega_b^t)_k}\,(f_b^t)_k, \qquad (4)$$
where the classification error of $f_b^t$ on $D_b$ at iteration $t$ is
$$\varepsilon_b^t = \frac{\sum_{j=1}^{n_b} \omega_b^j\,[y_b^j \neq f_b^t(x_b^j)]}{\sum_{i=1}^{n_b} \omega_b^i}, \qquad (5)$$
and $\varepsilon_b^t$ must be less than 0.5. Then, calculate the errors of the candidate classifier on the source and target training sets, and based on them update the weights of the training samples in the source and target domains. For correctly classified source training samples, the corresponding weights remain unchanged.



Step 6. Set
$$\beta_b^t = \frac{\varepsilon_b^t}{1-\varepsilon_b^t}, \qquad C_t = 2\,(1-\varepsilon_b^t), \qquad 0 \le \varepsilon_b^t \le \frac{1}{2}, \qquad (6)$$
where $C_t = 2(1-\varepsilon_b^t)$ is the expression of the dynamic factor $C_t$. Theorem 1 provides its derivation.

Step 7. Update the weight vector of the source samples according to the following rule:
$$\omega_{a_k}^{(t+1)\cdot i} = C_t\,\omega_{a_k}^{t\cdot i}\,(\beta_a)^{[y_{a_k}^i \neq f_b^t(x_{a_k}^i)]}, \qquad i \in D_{a_k}. \qquad (7)$$
Update the weights of the target samples according to the rule
$$\omega_b^{(t+1)\cdot i} = \omega_b^{t\cdot i}\,(\beta_b^t)^{-[y_b^i \neq f_b^t(x_b^i)]}, \qquad i \in D_b, \qquad (8)$$
where the weight update of the source instances uses the weighted majority algorithm (WMA) mechanism; this update is computed from $\beta_a$ and $C_t$. The target instance weights are updated by using $\varepsilon_b^t$, which is calculated in Step 6.
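Steps 6 and 7 can be sketched as below; `predict` stands for any function returning the candidate classifier's 0/1 predictions, and the indicator convention follows equations (7) and (8) as reconstructed above. The per-iteration error `eps_t` is assumed to lie strictly between 0 and 0.5.

```python
# Sketch of Steps 6-7: the dynamic factor C_t = 2(1 - eps_t) rescales the WMA
# update of the source weights, while the target weights are updated with
# beta_t = eps_t / (1 - eps_t).
import numpy as np

def update_weights(predict, sources, w_sources, X_b, y_b, w_target, eps_t, beta_a):
    beta_t = eps_t / (1.0 - eps_t)
    C_t = 2.0 * (1.0 - eps_t)                                # dynamic factor, eq. (6)

    new_sources = []
    for (X_a, y_a), w_a in zip(sources, w_sources):
        miss = (predict(X_a) != y_a).astype(float)
        # eq. (7): misclassified source samples are down-weighted (beta_a < 1)
        new_sources.append(C_t * w_a * beta_a ** miss)

    miss_b = (predict(X_b) != y_b).astype(float)
    # eq. (8): misclassified target samples are up-weighted (negative exponent)
    new_target = w_target * beta_t ** (-miss_b)
    return new_sources, new_target
```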

Step 8. Retrain all weak classifiers using the training samples with the updated weights. If the maximum number of iterations has not yet been reached ($t < M$), return to Step 3; otherwise, go to Step 9.

Step 9. Decide the final strong classifier
$$f_b(x) = \operatorname{sign}\left\{\sum_{t=1}^{M} (\beta_b^t)^{-f_b^t(x)} - \sum_{t=1}^{M} (\beta_b^t)^{-1/2}\right\}. \qquad (9)$$

In the MSDTrA algorithm, TrAdaBoost's ensemble learning is used to train classifiers on the combined set of source and target instances at every step. The WMA is used to adjust the weights of the source set by decreasing the weights of misclassified source instances and preserving the current weights of correctly classified source instances.

It can be seen from the above algorithm that MSDTrA allows all source training samples to participate in the learning process at each iteration, and different source training samples are assigned different weights. If a source training sample can improve the learning of the target task, it is assigned a large weight. Overall, MSDTrA takes full advantage of all useful knowledge from all source domains, which clearly enhances the learning effectiveness of the target task.

3. Theoretical Analysis

The previous section introduced the proposed instance transfer learning algorithm in detail. In this section, the related theoretical analysis is given with reference to the single-source TrAdaBoost algorithm [13]. First, Theorems 1 and 2 prove the influence of the dynamic factor on the source and target sample weight vectors, respectively.

Theorem 1. A dynamic factor $C_t = 2(1-\varepsilon_b^t)$ applied to the source weights prevents their weight drift and yields the weight-vector update mechanism for the source samples.

Proof. Let $A$ be the sum of the correctly classified target weights at boosting iteration $t+1$ and let $B$ be the sum of the misclassified target weights at boosting iteration $t+1$. Consider
$$A = n_b\,\omega_b^t\,(1-\varepsilon_b^t)\cdot\Big(\frac{\varepsilon_b^t}{1-\varepsilon_b^t}\Big)^{-[y_b^j \neq f_b^t(x_b^j)]} = n_b\,\omega_b^t\,(1-\varepsilon_b^t)\quad\text{if } [y_b^j \neq f_b^t(x_b^j)] = 0,$$
$$B = n_b\,\omega_b^t\,\varepsilon_b^t\cdot\Big(\frac{\varepsilon_b^t}{1-\varepsilon_b^t}\Big)^{-[y_b^j \neq f_b^t(x_b^j)]} = n_b\,\omega_b^t\,(1-\varepsilon_b^t)\quad\text{if } [y_b^j \neq f_b^t(x_b^j)] = 1. \qquad (10)$$

Substituting $A$ and $B$ to simplify the source update of TrAdaBoost, we have
$$\omega_{a_k}^{t+1} = \frac{\omega_{a_k}^t}{n_a\,\omega_{a_k}^t + A + B} = \frac{\omega_{a_k}^t}{n_a\,\omega_{a_k}^t + 2\,n_b\,\omega_b^t\,(1-\varepsilon_b^t)}. \qquad (11)$$

Introducing the correction factor $C_t$ into the WMA and requiring $\omega_{a_k}^{t+1} = \omega_{a_k}^t$, we have
$$\omega_{a_k}^t = \frac{C_t\,\omega_{a_k}^t}{C_t\,n_a\,\omega_{a_k}^t + 2\,n_b\,\omega_b^t\,(1-\varepsilon_b^t)},$$
$$C_t = \frac{2\,n_b\,\omega_b^t\,(1-\varepsilon_b^t)}{1 - n_a\,\omega_{a_k}^t} = \frac{2\,n_b\,\omega_b^t\,(1-\varepsilon_b^t)}{n_b\,\omega_b^t} = 2\,(1-\varepsilon_b^t), \qquad (12)$$
where the second equality uses the normalization $n_a\,\omega_{a_k}^t + n_b\,\omega_b^t = 1$ from Step 3.
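A quick numeric check of this derivation, using made-up sample sizes and a fixed target error, shows that the corrected update of (12) holds a correctly classified source weight constant while the uncorrected update of (11) lets it drift:

```python
# Sketch: n_a, n_b, eps and the fixed target weight are illustrative values only.
n_a, n_b, eps = 1000, 200, 0.2
w_b = 1.0 / (n_a + n_b)                 # target weight (held fixed for this check)
w_plain = w_corr = 1.0 / (n_a + n_b)    # same starting source weight for both updates
C_t = 2 * (1 - eps)                     # dynamic factor of eq. (12)

for t in range(5):
    w_plain = w_plain / (n_a * w_plain + 2 * n_b * w_b * (1 - eps))            # eq. (11)
    w_corr = C_t * w_corr / (C_t * n_a * w_corr + 2 * n_b * w_b * (1 - eps))   # eq. (12)
    print(f"t={t}: without C_t {w_plain:.6f}   with C_t {w_corr:.6f}")
```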

Theorem 2. The dynamic factor $C_t = 2(1-\varepsilon_b^t)$ applied to the source weights makes the target weights converge as outlined by TrAdaBoost.

Proof. In TrAdaBoost, without any source instances ($n_a = 0$), the target weights of correctly classified instances are updated as
$$\omega_b^{t+1} = \frac{\omega_b^t}{\sum_{j=1}^{n_b} \omega_b^t\,\big(\varepsilon_b^t/(1-\varepsilon_b^t)\big)^{-[y_b^j \neq f_b^t(x_b^j)]}} = \frac{\omega_b^t}{A+B} = \frac{\omega_b^t}{2\,n_b\,\omega_b^t\,(1-\varepsilon_b^t)} = \frac{\omega_b^t}{2\,(1)\,(1-\varepsilon_b^t)}. \qquad (13)$$



Applying the dynamic factor to the source-instance weight update, we obtain the update mechanism of the target-instance weights under MSDTrA. Consider
$$\omega_b^{t+1} = \frac{\omega_b^t}{C_t\,n_a\,\omega_{a_k}^t + 2\,n_b\,\omega_b^t\,(1-\varepsilon_b^t)} = \frac{\omega_b^t}{2\,(1-\varepsilon_b^t)\,n_a\,\omega_{a_k}^t + 2\,n_b\,\omega_b^t\,(1-\varepsilon_b^t)} = \frac{\omega_b^t}{2\,(1-\varepsilon_b^t)\,(n_a\,\omega_{a_k}^t + n_b\,\omega_b^t)} = \frac{\omega_b^t}{2\,(1-\varepsilon_b^t)\,(1)}. \qquad (14)$$

Next, we analyze the performance of MSDTrA on the target training set.

Theorem 3. The final error on the target training set satisfies
$$\varepsilon \le 2^{M} \prod_{t=1}^{M} \sqrt{\varepsilon_b^t\,(1-\varepsilon_b^t)}. \qquad (15)$$

Proof. Suppose the final sample set containing all misclassified samples on the target domain is $T$; then the final error is $\varepsilon = |T|/n_b$.

At each iteration, the error on the target training set is
$$\varepsilon_b^t = \sum_k \frac{(\omega_b^t)_k}{\sum_k (\omega_b^t)_k}\,(\varepsilon_b^t)_k = \frac{\sum_k e^{\,1-2(\varepsilon_b^t)_k}\,(\varepsilon_b^t)_k}{\sum_k e^{\,1-2(\varepsilon_b^t)_k}}, \qquad (16)$$

where $0 \le (\varepsilon_b^t)_k \le 1/2$. If the error on the target training set is 0 ($\varepsilon_b^t = 0$), the training sample weights are not updated, $\omega_b^{(t+1)\cdot i} = \omega_b^{t\cdot i}$, since $\beta_b^t = \varepsilon_b^t/(1-\varepsilon_b^t) = 0$. In general, the updating rule for the weights of the target training samples satisfies
$$\sum_{i=1}^{n_b} \omega_b^{(t+1)\cdot i} = \sum_{i=1}^{n_b} \omega_b^{t\cdot i}\,(\beta_b^t)^{\,1-[y_b^i \neq f_b^t(x_b^i)]} \le \sum_{i=1}^{n_b} \omega_b^{t\cdot i}\,\big(1 - (1-\beta_b^t)(1-\varepsilon_b^t)\big). \qquad (17)$$

Then
$$\sum_{i=1}^{n_b} \omega_b^{(M+1)\cdot i} \le \sum_{i=1}^{n_b} \omega_b^{i}\,\prod_{t=1}^{M}\big(1 - (1-\beta_b^t)(1-\varepsilon_b^t)\big) = \prod_{t=1}^{M}\big(1 - (1-\beta_b^t)(1-\varepsilon_b^t)\big). \qquad (18)$$

In addition, we have the following bound:
$$\sum_{i=1}^{n_b} \omega_b^{(M+1)\cdot i} \ge \sum_{i \in T} \omega_b^{i}\Big(\prod_{t=1}^{M} \beta_b^t\Big)^{1/2} = \varepsilon\Big(\prod_{t=1}^{M} \beta_b^t\Big)^{1/2}. \qquad (19)$$

Combining (18) and (19), we have
$$\varepsilon\Big(\prod_{t=1}^{M} \beta_b^t\Big)^{1/2} \le \prod_{t=1}^{M}\big(1 - (1-\beta_b^t)(1-\varepsilon_b^t)\big). \qquad (20)$$

Substituting $\beta_b^t = \varepsilon_b^t/(1-\varepsilon_b^t)$ into (20), we obtain
$$\varepsilon \le 2^{M} \prod_{t=1}^{M} \sqrt{\varepsilon_b^t\,(1-\varepsilon_b^t)}. \qquad (21)$$

According to Theorem 3, because the condition $\varepsilon_b^t < 0.5$ is satisfied by the algorithm, the error on the final target training data decreases as the number of iterations increases. The upper bound of the associated generalization error can be estimated by $\varepsilon + O(\sqrt{M d_{\mathrm{VC}}/n_b})$, where $d_{\mathrm{VC}}$ is the VC dimension of the weak classifier model.
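For illustration, the bound of Theorem 3 can be evaluated numerically for an assumed sequence of per-iteration target errors; the values below are not from the paper.

```python
# Sketch of eq. (15)/(21): the bound shrinks with M as long as every eps_t < 0.5.
import math

def error_bound(eps_list):
    M = len(eps_list)
    prod = 1.0
    for eps in eps_list:
        prod *= math.sqrt(eps * (1.0 - eps))
    return (2.0 ** M) * prod

print(error_bound([0.3] * 10))   # roughly 0.42 after 10 iterations
print(error_bound([0.3] * 50))   # roughly 0.013 after 50 iterations
```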

4. Experimental Results and Analysis

The performance of the proposed method is investigated in this section based on object category recognition. Without loss of generality, we consider the following case: a small number of training samples of a target object category and a large number of training samples of other source object categories. For any test sample, we verify whether or not it belongs to the target object category.

4.1. Experimental Setting. For object category recognition, the Caltech 256 dataset, which contains 256 object categories, is considered. In practice, among the 256 object categories, the 80 categories that contain more than 50 samples are used in our experiment. We designate the target category and randomly draw the samples that form the target data. The number of training samples $n_b$ is limited to between 1 and 50, while the number of test samples is 50. Furthermore, in order to show that the proposed method does not depend on the data set, we also use the background data set, collected via the Google image search engine, along with the remaining categories as our augmented background data set, to verify the effectiveness and robustness of the method.

The remaining categories are treated as the repository from which positive samples for the source data are drawn. The number of source categories (domains) is varied from 1 to 10 in order to investigate the performance of the classifiers with respect to the variability of the domains. The number of samples for one source of data is 100. For each target object category, the performance of the classifier is evaluated over 20 random combinations of $N$ source object categories. Given the target and source categories, the performance of the classifier is obtained by averaging over 20 trials. The overall performance of the classifier is averaged over 20 target categories. SVMs are selected as the base classifiers and the number of iterations is 50.



Figure 1: Performance comparison (error and $A_{\mathrm{ROC}}$ of AdaBoost, TrAdaBoost, MSTrA, and MSDTrA). (a) The number of source domains $N = 4$, versus the number of training instances. (b) The number of training instances $n_b = 10$, versus the number of source domains.

4.2. Error Analysis. Since transfer learning is not needed to obtain good classification results when the target data set is large, the standard cross-validation method is not used here. A small portion of the target set is used for training, and most of the remaining samples are used for testing. Figure 1 compares AdaBoost, TrAdaBoost, MSTrA, and MSDTrA based on the area under the receiver operating characteristic (ROC) curve with different numbers of target training samples ($n_b = \{1, 10, 20, 30, 40, 50\}$) and different numbers of source domains ($N = \{1, 4, 6, 8, 10\}$). Moreover, for the area bounded by the ROC curve and the $X$-axis, $A_{\mathrm{ROC}}$ is used to evaluate the performance of the different algorithms.
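As a side note, the area $A_{\mathrm{ROC}}$ can be approximated with the trapezoidal rule from sampled (false positive rate, true positive rate) points; the points below are purely illustrative, not measured values.

```python
# Sketch: trapezoidal estimate of the area bounded by the ROC curve and the x-axis.
import numpy as np

fpr = np.array([0.0, 0.1, 0.3, 0.6, 1.0])    # assumed false positive rates
tpr = np.array([0.0, 0.55, 0.8, 0.93, 1.0])  # assumed true positive rates
a_roc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
print(a_roc)   # about 0.81 for these assumed points
```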

Practically, fixing the number of source domains at $N = 4$, Figure 1(a) shows the ROC curves of the four algorithms as the number of training instances increases. Since AdaBoost does not transfer any knowledge from the sources, its performance depends mainly on $n_b$; for a very small value of $n_b$ it shows only slight improvement, as the ROC curves show. However, owing to its transfer learning mechanism, TrAdaBoost achieves a good improvement by combining the three sources. By incorporating the ability to transfer knowledge from multiple individual domains, MSTrA and MSDTrA demonstrate a significant improvement in recognition accuracy, even for a very small $n_b$. In addition, the performance of AdaBoost and TrAdaBoost strongly depends on the selection of source domains and target positive samples, as the standard deviation of $A_{\mathrm{ROC}}$ shows.

Fixing the number of training instances at $n_b = 10$, Figure 1(b) shows the ROC curves of the four algorithms as the number of source domains increases. We can see that as the number of source domains increases, the $A_{\mathrm{ROC}}$ of MSTrA and MSDTrA increases and the corresponding standard deviations decrease. This indicates improved performance in both accuracy and consistency. Since TrAdaBoost is incapable of exploring the decision boundaries separating multiple source domains, its performance remains unchanged regardless of the number of source domains.

Figure 2: The classification performance on the target domain (accuracy versus iterations for MSDTrA, MSTrA, CDASVM, DTrAdaBoost, and AdaBoost).

Figure 2 compares the classification performance of the different methods on the target domain. We can see that the AdaBoost algorithm does not transfer source domain knowledge and obtains lower classification accuracy. DTrAdaBoost gives relatively poor test results, because it uses only one source-domain training sample set and gains the least useful knowledge from the source domain.



Figure 3: Time cost comparison of the different methods (training time in seconds, log scale). (a) The number of source domains $N = 4$. (b) The number of training instances $n_b = 10$.

CDASVM, which is based on a structural risk minimization model, fully considers the source-domain sample information and thus achieves good classification accuracy. MSTrA and MSDTrA use four different source-domain training sets that contain more useful information, so they achieve higher testing accuracy than the other algorithms. In each set of experiments, MSTrA only selects the classifier with the highest accuracy at each iteration and ignores the impact of the other source domains on the target task. By adding the dynamic factor and the weighting mechanism, MSDTrA makes better use of the useful knowledge in all source domains and eliminates the influence of unrelated samples in the source-domain training sets on the target task, so it performs better than the MSTrA algorithm.

In order to obtain objective and scientific comparison results, hypothesis testing is applied to the experimental results. Let the variables $X_1, X_2, X_3, X_4, X_5$ denote the classification error rates of the MSDTrA, MSTrA, CDASVM, DTrAdaBoost, and AdaBoost algorithms, respectively. Since the values of $X_1,\ldots,X_5$ are subject to many random factors, we assume that they follow normal distributions, $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1,\ldots,5$. We then compare the means $\mu_i$ ($i = 1,\ldots,5$) of these algorithms: the smaller $\mu_i$ is, the lower the expected classification error rate and the higher the efficiency. Because the sample variance is an unbiased estimator of the population variance, the sample variance is used as an estimate of the population variance. In this experiment the significance level $\alpha$ is set to 0.01.

Table 1 shows the comparison of the $\mu_i$ and the other parameters. We can see from Table 1 that the expected classification error rate of MSDTrA is far lower than those of the other algorithms.

4.3. Time Complexity. Since several source domains are used together in learning the target task, the time complexity with multiple source domains is higher than with a single domain. Supposing that the time complexities of training a classifier and of updating the weights are $C_h$ and $C_w$, respectively, the time complexities of AdaBoost, DTrAdaBoost, MSTrA, and MSDTrA can be approximated by $C_h O(M) + C_w O(n_b M)$, $C_h O(M) + C_w O(n_a M)$, $C_h O(NM) + C_w O(n_a N M)$, and $C_h O(NM) + C_w O(n_a M)$, respectively. Furthermore, Figure 3 shows the average training time of the four algorithms with fixed $N$ and $n_b$.

4.4. Dynamic Factor. This experiment demonstrates the effect of the dynamic factor on the source and target weights. Here a single source domain is considered, $N = 1$. In Figure 4(a), the numbers of instances are held constant ($n_a = 1000$, $n_b = 200$) and the source error rate is set to zero. According to the WMA, the weights should not change, because $\varepsilon_{a_k}^t = 0$ implies $\omega_{a_k}^{t+1} = \omega_{a_k}^t$. For target error rates $\varepsilon_b^t = \{10\%, 20\%, 30\%, 40\%\}$, the ratio of the weights of MSDTrA and MSTrA is plotted at different boosting iterations.

We can see the following from Figure 4(a). (1) In MSTrA, the source weights always converge even when the classification results are correct. (2) MSDTrA matches the behavior of the WMA. (3) If the dynamic factor is not applied, the smaller the value of $\varepsilon_b^t$ is, the faster the convergence rate of the source weights. In addition, for a weak learner with $\varepsilon_b^t = 10\%$, MSTrA is still unable to obtain good performance using over 1000 source instances, even though they were never misclassified.

4.5. Rate of Convergence. The number of source instances was set to 1000, and the classification error is permitted to vary within the range $\varepsilon_b^t \in \{10\%\text{--}50\%\}$; Figure 4(b) shows the results after a single iteration with different numbers of target instances $\{10, 20, 50\}$.



Table 1: Hypothesis testing for experimental results.

Hypothesis: $H_0: \mu_1 \ge \mu_2$ vs. $H_1: \mu_1 < \mu_2$; $H_0: \mu_1 \ge \mu_3$ vs. $H_1: \mu_1 < \mu_3$; $H_0: \mu_1 \ge \mu_4$ vs. $H_1: \mu_1 < \mu_4$; $H_0: \mu_1 \ge \mu_5$ vs. $H_1: \mu_1 < \mu_5$.
Statistic: $U_i = (X_1 - X_{i+1}) / \sqrt{\sigma_1^2/n_1 + \sigma_{i+1}^2/n_{i+1}}$, $i = 1, 2, 3, 4$.
Rejection region: $U_i \le -Z_\alpha = -2.325$.
Value of the statistic: $U_1 = -58.67$, $U_2 = -114.56$, $U_3 = -136.59$, $U_4 = -158.23$.
Conclusion: $H_1: \mu_1 < \mu_2$; $H_1: \mu_1 < \mu_3$; $H_1: \mu_1 < \mu_4$; $H_1: \mu_1 < \mu_5$.

Figure 4: The ratio of correctly classified source weights. (a) Results for 20 iterations with different target error rates (10%, 20%, 30%, 40%). (b) Results after a single iteration with different numbers of target instances ($n_b = 10, 20, 50$).

It can be observed that, after a single boosting iteration, the ratio of correctly classified source weights increases as $\varepsilon_b^t$ increases.

5. Conclusions

Considering the situation in which the sample data from the transfer source domains and the target domain have similar distributions, an instance transfer learning method based on multisource dynamic TrAdaBoost is provided. By integrating the knowledge from multiple source domains, this method makes good use of the information of all source domains to guide the target task learning. Whenever candidate classifiers are trained, all the samples in all source domains are involved in learning, and the information that is beneficial to target task learning can be obtained, so that negative transfer can be avoided. The theoretical analysis and experimental results suggest that the proposed algorithm has higher classification accuracy than several existing algorithms.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This research is supported by the Fundamental Research Funds for the Central Universities (2013XK09).

References

[1] H. Wang, Y. Gao, and X. G. Chen, "Transfer of reinforcement learning: the state of the art," Acta Electronica Sinica, vol. 36, pp. 39–43, 2008.

[2] M. E. Taylor and P. Stone, "Transfer learning for reinforcement learning domains: a survey," Journal of Machine Learning Research, vol. 10, pp. 1633–1685, 2009.

[3] Q. Zhang, M. Li, and Y. H. Chen, "Instance-based transfer learning method with multi-source dynamic TrAdaBoost," Journal of China University of Mining and Technology, vol. 43, no. 4, pp. 701–708, 2014.

[4] J. N. Meng, Research on the Application of Transfer Learning on Text Classification, Dalian University of Technology, 2011.

[5] Y. Cheng, G. Cao, X. Wang, and J. Pan, "Weighted multi-source TrAdaBoost," Chinese Journal of Electronics, vol. 22, no. 3, pp. 505–510, 2013.

[6] S. Ben-David and R. Schuller, "Exploiting task relatedness for multiple task learning," in Proceedings of the 16th Annual Conference on Learning Theory, pp. 567–580, Washington, DC, USA, 2003.

[7] H. Daumé III and D. Marcu, "Domain adaptation for statistical classifiers," Journal of Artificial Intelligence Research, vol. 26, pp. 101–126, 2006.

[8] P. Wu and T. G. Dietterich, "Improving SVM accuracy by training on auxiliary data sources," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 871–878, July 2004.

[9] X. Liao, Y. Xue, and L. Carin, "Logistic regression with an auxiliary data source," in Proceedings of the 22nd International Conference on Machine Learning, pp. 505–512, ACM, August 2005.

[10] M. T. Rosenstein, Z. Marx, L. P. Kaelbling et al., "To transfer or not to transfer," in Proceedings of the Neural Information Processing Systems Workshop on Transfer Learning (NIPS '05), p. 898, 2005.

[11] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 193–200, New York, NY, USA, June 2007.

[12] D. Pardoe and P. Stone, "Boosting for regression transfer," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 863–870, Haifa, Israel, June 2010.

[13] E. Eaton and M. Desjardins, "Set-based boosting for instance-level transfer," in Proceedings of the IEEE International Conference on Data Mining Workshops (ICDMW '09), pp. 422–428, December 2009.

[14] E. Eaton, Selective Knowledge Transfer for Machine Learning [Ph.D. dissertation], University of Maryland, Baltimore, Md, USA, 2009.

[15] Y. Yao and G. Doretto, "Boosting for transfer learning with multiple sources," in Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '10), pp. 1855–1862, June 2010.

[16] S. Al-Stouhi and C. K. Reddy, "Adaptive boosting for transfer learning using dynamic updates," in Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 60–75, 2011.

[17] Q. Zhang, M. Li, X. S. Wang et al., "Instance-based transfer learning for multi-source domains," Acta Automatica Sinica, vol. 40, no. 6, pp. 1175–1182, 2014.
