Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple...
Transcript of Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple...
![Page 1: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/1.jpg)
, , , , , , , , , ,
Some Recent Insights on Transfer Learning
P → Q?
Samory KpotufeColumbia University
Based on work with Guillaume Martinet, and (ongoing) Steve Hanneke
![Page 2: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/2.jpg)
, , , , , , , , , ,
Transfer Learning:
Given data {Xi, Yi} ∼i.i.d. P , produce a classifier for (X,Y ) ∼ Q.
Case study: Apple Siri’s voice assistant
- Initially trained on data from American English speakers ...
- Could not understand 30M+ nonnative speakers in the US!
Costly Solution ≡ 5+ years acquiring more data and retraining!
A Main Practical Goal:
Cheaply transfer ML software between related populations.
![Page 3: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/3.jpg)
, , , , , , , , , ,
Transfer Learning:
Given data {Xi, Yi} ∼i.i.d. P , produce a classifier for (X,Y ) ∼ Q.
Case study: Apple Siri’s voice assistant
- Initially trained on data from American English speakers ...
- Could not understand 30M+ nonnative speakers in the US!
Costly Solution ≡ 5+ years acquiring more data and retraining!
A Main Practical Goal:
Cheaply transfer ML software between related populations.
![Page 4: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/4.jpg)
, , , , , , , , , ,
Transfer Learning:
Given data {Xi, Yi} ∼i.i.d. P , produce a classifier for (X,Y ) ∼ Q.
Case study: Apple Siri’s voice assistant
- Initially trained on data from American English speakers ...
- Could not understand 30M+ nonnative speakers in the US!
Costly Solution ≡ 5+ years acquiring more data and retraining!
A Main Practical Goal:
Cheaply transfer ML software between related populations.
![Page 5: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/5.jpg)
, , , , , , , , , ,
Transfer is of general relevance:
AI for Judicial Systems
- Source Population: prison inmates
- Target Population: everyone arrested
Over 60% inaccurate risk assessments on minorities(2016 Pro-Publica study)
Main Issue: Good Target data is hard or expensive to acquire
AI in medicine, Genomics, Insurance Industry, Smart cities,...
![Page 6: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/6.jpg)
, , , , , , , , , ,
Transfer is of general relevance:
AI for Judicial Systems
- Source Population: prison inmates
- Target Population: everyone arrested
Over 60% inaccurate risk assessments on minorities(2016 Pro-Publica study)
Main Issue: Good Target data is hard or expensive to acquire
AI in medicine, Genomics, Insurance Industry, Smart cities,...
![Page 7: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/7.jpg)
, , , , , , , , , ,
Transfer is of general relevance:
AI for Judicial Systems
- Source Population: prison inmates
- Target Population: everyone arrested
Over 60% inaccurate risk assessments on minorities(2016 Pro-Publica study)
Main Issue: Good Target data is hard or expensive to acquire
AI in medicine, Genomics, Insurance Industry, Smart cities,...
![Page 8: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/8.jpg)
, , , , , , , , , ,
Transfer is of general relevance:
AI for Judicial Systems
- Source Population: prison inmates
- Target Population: everyone arrested
Over 60% inaccurate risk assessments on minorities(2016 Pro-Publica study)
Main Issue: Good Target data is hard or expensive to acquire
AI in medicine, Genomics, Insurance Industry, Smart cities,...
![Page 9: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/9.jpg)
, , , , , , , , , ,
Many heuristics ... but no mature theory or principles
![Page 10: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/10.jpg)
, , , , , , , , , ,
Basic questions remain largely unanswered:
Suppose: h is trained on source data ∼ P , to be transferred to target Q.
• Is there enough information in source P about target Q?
• If not, how much new data should we collect, and how?
• Would unlabeled target data suffice? Or help at least?
![Page 11: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/11.jpg)
, , , , , , , , , ,
Basic questions remain largely unanswered:
Suppose: h is trained on source data ∼ P , to be transferred to target Q.
• Is there enough information in source P about target Q?
• If not, how much new data should we collect, and how?
• Would unlabeled target data suffice? Or help at least?
![Page 12: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/12.jpg)
, , , , , , , , , ,
Basic questions remain largely unanswered:
Suppose: h is trained on source data ∼ P , to be transferred to target Q.
• Is there enough information in source P about target Q?
• If not, how much new data should we collect, and how?
• Would unlabeled target data suffice? Or help at least?
![Page 13: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/13.jpg)
, , , , , , , , , ,
Basic questions remain largely unanswered:
Suppose: h is trained on source data ∼ P , to be transferred to target Q.
• Is there enough information in source P about target Q?
• If not, how much new data should we collect, and how?
• Would unlabeled target data suffice? Or help at least?
![Page 14: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/14.jpg)
, , , , , , , , , ,
Formal Setup:
Covariate-Shift: PX 6= QX but PY |X = QY |X .
Given: source data {Xi, Yi} ∼ PnP , target data {Xi, Yi} ∼ QnQ .
Goal: h with small target error errQ(h) = EQ 1(h(X) 6= Y ).
What is the best errQ(h) achievable in terms of nP , nQ?
Depends on distance(PX → QX)
![Page 15: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/15.jpg)
, , , , , , , , , ,
Formal Setup:
Covariate-Shift: PX 6= QX but PY |X = QY |X .
For PY |X 6= QY |X : [Scott 19], [Blanchard et al. 19], [Cai & Wei 19].
Given: source data {Xi, Yi} ∼ PnP , target data {Xi, Yi} ∼ QnQ .
Goal: h with small target error errQ(h) = EQ 1(h(X) 6= Y ).
What is the best errQ(h) achievable in terms of nP , nQ?
Depends on distance(PX → QX)
![Page 16: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/16.jpg)
, , , , , , , , , ,
Formal Setup:
Covariate-Shift: PX 6= QX but PY |X = QY |X .
Given: source data {Xi, Yi} ∼ PnP , target data {Xi, Yi} ∼ QnQ .
Goal: h with small target error errQ(h) = EQ 1(h(X) 6= Y ).
What is the best errQ(h) achievable in terms of nP , nQ?
Depends on distance(PX → QX)
![Page 17: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/17.jpg)
, , , , , , , , , ,
Formal Setup:
Covariate-Shift: PX 6= QX but PY |X = QY |X .
Given: source data {Xi, Yi} ∼ PnP , target data {Xi, Yi} ∼ QnQ .
Goal: h with small target error errQ(h) = EQ 1(h(X) 6= Y ).
What is the best errQ(h) achievable in terms of nP , nQ?
Depends on distance(PX → QX)
![Page 18: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/18.jpg)
, , , , , , , , , ,
Formal Setup:
Covariate-Shift: PX 6= QX but PY |X = QY |X .
Given: source data {Xi, Yi} ∼ PnP , target data {Xi, Yi} ∼ QnQ .
Goal: h with small target error errQ(h) = EQ 1(h(X) 6= Y ).
What is the best errQ(h) achievable in terms of nP , nQ?
Depends on distance(PX → QX)
![Page 19: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/19.jpg)
, , , , , , , , , ,
Formal Setup:
Covariate-Shift: PX 6= QX but PY |X = QY |X .
Given: source data {Xi, Yi} ∼ PnP , target data {Xi, Yi} ∼ QnQ .
Goal: h with small target error errQ(h) = EQ 1(h(X) 6= Y ).
What is the best errQ(h) achievable in terms of nP , nQ?
Depends on distance(PX → QX)
![Page 20: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/20.jpg)
, , , , , , , , , ,
Formal Setup:
Covariate-Shift: PX 6= QX but PY |X = QY |X .
Given: source data {Xi, Yi} ∼ PnP , target data {Xi, Yi} ∼ QnQ .
Goal: h with small target error errQ(h) = EQ 1(h(X) 6= Y ).
What is the best errQ(h) achievable in terms of nP , nQ?
Depends on distance(PX → QX)
![Page 21: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/21.jpg)
, , , , , , , , , ,
However, the right notion of distance(PX → QX) remains unclear...
![Page 22: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/22.jpg)
, , , , , , , , , ,
However, the right notion of distance(PX → QX) remains unclear...
Basic intuition: Transfer iseasiest if P has sufficient massin regions of large Q-mass.
6= proportions of � and M(e.g. 6= proportions of English accents)
![Page 23: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/23.jpg)
, , , , , , , , , ,
However, the right notion of distance(PX → QX) remains unclear...
Basic intuition: Transfer iseasiest if P has sufficient massin regions of large Q-mass.
6= proportions of � and M(e.g. 6= proportions of English accents)
Many foundational results quantify this intuition ...
![Page 24: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/24.jpg)
, , , , , , , , , ,
Common notions of distance(PX → QX):
• Extensions of TV: differences |PX(A)−QX(A)|, suitable A(e.g. dA divergence/Y-discrepancy of S. Ben David, M. Mohri, ...)
• Density Ratios: ratio dQX/dPX over data space(e.g., Sugiyama, Belkin, Jordan, Wainwright, ...)
How well do these notions measure transferability?
![Page 25: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/25.jpg)
, , , , , , , , , ,
Common notions of distance(PX → QX):
• Extensions of TV: differences |PX(A)−QX(A)|, suitable A(e.g. dA divergence/Y-discrepancy of S. Ben David, M. Mohri, ...)
• Density Ratios: ratio dQX/dPX over data space(e.g., Sugiyama, Belkin, Jordan, Wainwright, ...)
How well do these notions measure transferability?
![Page 26: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/26.jpg)
, , , , , , , , , ,
Common notions of distance(PX → QX):
• Extensions of TV: differences |PX(A)−QX(A)|, suitable A(e.g. dA divergence/Y-discrepancy of S. Ben David, M. Mohri, ...)
• Density Ratios: ratio dQX/dPX over data space(e.g., Sugiyama, Belkin, Jordan, Wainwright, ...)
How well do these notions measure transferability?
![Page 27: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/27.jpg)
, , , , , , , , , ,
Common notions of distance(PX → QX):
• Extensions of TV: differences |PX(A)−QX(A)|, suitable A(e.g. dA divergence/Y-discrepancy of S. Ben David, M. Mohri, ...)
• Density Ratios: ratio dQX/dPX over data space(e.g., Sugiyama, Belkin, Jordan, Wainwright, ...)
How well do these notions measure transferability?
![Page 28: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/28.jpg)
, , , , , , , , , ,
Seminal results: (TV, dA, Y-disc, dQ/dP , KL, Renyi, Wasserstein)
The closer PX is to QX =⇒ the easier Transfer is ...
However: PX far from QX ���=⇒ Transfer is Hard
These notions can be pessimistic in measuring transferability ...
![Page 29: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/29.jpg)
, , , , , , , , , ,
Seminal results: (TV, dA, Y-disc, dQ/dP , KL, Renyi, Wasserstein)
The closer PX is to QX =⇒ the easier Transfer is ...
However: PX far from QX ���=⇒ Transfer is Hard
These notions can be pessimistic in measuring transferability ...
![Page 30: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/30.jpg)
, , , , , , , , , ,
Seminal results: (TV, dA, Y-disc, dQ/dP , KL, Renyi, Wasserstein)
The closer PX is to QX =⇒ the easier Transfer is ...
However: PX far from QX ���=⇒ Transfer is Hard
These notions can be pessimistic in measuring transferability ...
![Page 31: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/31.jpg)
, , , , , , , , , ,
Seminal results: (TV, dA, Y-disc, dQ/dP , KL, Renyi, Wasserstein)
The closer PX is to QX =⇒ the easier Transfer is ...
However: PX far from QX ���=⇒ Transfer is Hard
Large TV, dA, Y-disc ≈ 1/2
These notions can be pessimistic in measuring transferability ...
![Page 32: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/32.jpg)
, , , , , , , , , ,
Seminal results: (TV, dA, Y-disc, dQ/dP , KL, Renyi, Wasserstein)
The closer PX is to QX =⇒ the easier Transfer is ...
However: PX far from QX ���=⇒ Transfer is Hard
Large dQ/dP , KL-div ≈ ∞
These notions can be pessimistic in measuring transferability ...
![Page 33: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/33.jpg)
, , , , , , , , , ,
Seminal results: (TV, dA, Y-disc, dQ/dP , KL, Renyi, Wasserstein)
The closer PX is to QX =⇒ the easier Transfer is ...
However: PX far from QX ���=⇒ Transfer is Hard
Large dQ/dP , KL-div ≈ ∞
These notions can be pessimistic in measuring transferability ...
![Page 34: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/34.jpg)
, , , , , , , , , ,
We propose a new distance(P → Q) shown to control transfer ...
![Page 35: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/35.jpg)
, , , , , , , , , ,
Relating source P to target Q [Kpo., Martinet, COLT 18]
Main intuition: PX needs mass in regions of significant QX mass.
Transfer exponent γ ≥ 0:
∀Q x,∀ r ∈ (0, 1], P (B(x, r)) ≥ C · rγ ·QX(B(x, r))
γ captures a continuum of easy to hard transfer ...
![Page 36: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/36.jpg)
, , , , , , , , , ,
Relating source P to target Q [Kpo., Martinet, COLT 18]
Main intuition: PX needs mass in regions of significant QX mass.
Transfer exponent γ ≥ 0:
∀Q x,∀ r ∈ (0, 1], P (B(x, r)) ≥ C · rγ ·QX(B(x, r))
γ captures a continuum of easy to hard transfer ...
![Page 37: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/37.jpg)
, , , , , , , , , ,
Relating source P to target Q [Kpo., Martinet, COLT 18]
Main intuition: PX needs mass in regions of significant QX mass.
Transfer exponent γ ≥ 0:
∀Q x,∀ r ∈ (0, 1], P (B(x, r)) ≥ C · rγ ·QX(B(x, r))
γ captures a continuum of easy to hard transfer ...
![Page 38: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/38.jpg)
, , , , , , , , , ,
Relating source P to target Q [Kpo., Martinet, COLT 18]
Main intuition: PX needs mass in regions of significant QX mass.
Transfer exponent γ ≥ 0:
∀Q x,∀ r ∈ (0, 1], P (B(x, r)) ≥ C · rγ ·QX(B(x, r))
γ captures a continuum of easy to hard transfer ...
![Page 39: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/39.jpg)
, , , , , , , , , ,
Relating source P to target Q [Kpo., Martinet, COLT 18]
Main intuition: PX needs mass in regions of significant QX mass.
Transfer exponent γ ≥ 0:
∀Q x,∀ r ∈ (0, 1], P (B(x, r)) ≥ C · rγ ·QX(B(x, r))
γ captures a continuum of easy to hard transfer ...
![Page 40: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/40.jpg)
, , , , , , , , , ,
Relating source P to target Q [Kpo., Martinet, COLT 18]
Main intuition: PX needs mass in regions of significant QX mass.
Transfer exponent γ ≥ 0:
∀Q x,∀ r ∈ (0, 1], P (B(x, r)) ≥ C · rγ ·QX(B(x, r))
γ captures a continuum of easy to hard transfer ...
![Page 41: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/41.jpg)
, , , , , , , , , ,
Relating source P to target Q [Kpo., Martinet, COLT 18]
Main intuition: PX needs mass in regions of significant QX mass.
Transfer exponent γ ≥ 0:
∀Q x,∀ r ∈ (0, 1], P (B(x, r)) ≥ C · rγ ·QX(B(x, r))
γ captures a continuum of easy to hard transfer ...
![Page 42: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/42.jpg)
, , , , , , , , , ,
First let’s look at extremes γ =∞ or 0
Notice that γ.= γ(P → Q) is asymmetric
(unlike TV, dA, Y-discrepancy, Wasserstein, ...)
![Page 43: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/43.jpg)
, , , , , , , , , ,
First let’s look at extremes γ =∞ or 0
Notice that γ.= γ(P → Q) is asymmetric
(unlike TV, dA, Y-discrepancy, Wasserstein, ...)
![Page 44: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/44.jpg)
, , , , , , , , , ,
First let’s look at extremes γ =∞ or 0
Notice that γ.= γ(P → Q) is asymmetric
(unlike TV, dA, Y-discrepancy, Wasserstein, ...)
![Page 45: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/45.jpg)
, , , , , , , , , ,
The continuum 0 < γ <∞:
![Page 46: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/46.jpg)
, , , , , , , , , ,
The continuum 0 < γ <∞:
γ ≡ How fast PX shifts mass away from QX -dense regions.
![Page 47: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/47.jpg)
, , , , , , , , , ,
The continuum 0 < γ <∞:
γ ≡ How fast fQ/fP goes to ∞.
![Page 48: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/48.jpg)
, , , , , , , , , ,
The continuum 0 < γ <∞:
γ ≡ Difference in support dimension.
![Page 49: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/49.jpg)
, , , , , , , , , ,
Optimistic:γ is often small when other measures are not
![Page 50: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/50.jpg)
, , , , , , , , , ,
Optimistic:γ is often small when other measures are not
Large TV, dA, Y-disc ≈ 1/2
but here typically γ ≈ 0
![Page 51: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/51.jpg)
, , , , , , , , , ,
Optimistic:γ is often small when other measures are not
Large dQX/dPX , KL-div ≈ ∞but γ = 1 ≡ dim(PX)− dim(QX)
![Page 52: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/52.jpg)
, , , , , , , , , ,
γ captures performance limits (minimax rates) under transfer ...
![Page 53: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/53.jpg)
, , , , , , , , , ,
Performance depends on γ + hardness of Q:
Easy to hard Target QX,Y classification
Easy QX,Y Hard QX,Y
Essential: Noise in QY |X , and QX -mass near decision boundary
![Page 54: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/54.jpg)
, , , , , , , , , ,
Performance depends on γ + hardness of Q:
Easy to hard Target QX,Y classification
Easy QX,Y Hard QX,Y
Essential: Noise in QY |X , and QX -mass near decision boundary
![Page 55: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/55.jpg)
, , , , , , , , , ,
Performance depends on γ + hardness of Q:
Easy to hard Target QX,Y classification
Easy QX,Y Hard QX,Y
Essential: Noise in QY |X , and QX -mass near decision boundary
![Page 56: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/56.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 57: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/57.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 58: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/58.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 59: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/59.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 60: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/60.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 61: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/61.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 62: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/62.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 63: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/63.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 64: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/64.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 65: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/65.jpg)
, , , , , , , , , ,
Parametrizing easy to hard QX,Y :
Setup: X ∈ compact X , Y ∈ {0, 1}
Noise conditions on η(x) = E[Y |x]:
• Smoothness: |η(x)− η(x′)| ≤ λρ(x, x′)α.
• Noise Margin: QX (x : |η(x)− 1/2| < t) ≤ Ctβ.
2 types of regularity on QX :
• Near-uniform mass: for any ball Br, QX(Br) ≥ Crd.
• Support regularity: XQ has r-cover size ≤ Cr−d.
d above acts like the intrinsic dimension of QX , for X ∈ IRD.
Classification is easiest with large α, β, small d .... so is transfer
![Page 66: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/66.jpg)
, , , , , , , , , ,
Minimax rates of Transfer:
Given: labeled source and target data {Xi, Yi} ∼ PnP ×QnQ .Excess error: EQ(h) ≡ errQ(h)− infh errQ(h).
Theorem. Define h on {Xi, Yi}, even with knowledge of PX , QX :
infh
supdist(P,Q)=γ
E EQ(h) ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0,
d0 = 2 + d/α for near-uniform QX , and d0 = 2 + β + d/α otherwise.
Immediate message:Transfer is easiest as γ → 0, hardest as γ →∞ ...
![Page 67: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/67.jpg)
, , , , , , , , , ,
Minimax rates of Transfer:
Given: labeled source and target data {Xi, Yi} ∼ PnP ×QnQ .Excess error: EQ(h) ≡ errQ(h)− infh errQ(h).
Theorem. Define h on {Xi, Yi}, even with knowledge of PX , QX :
infh
supdist(P,Q)=γ
E EQ(h) ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0,
d0 = 2 + d/α for near-uniform QX , and d0 = 2 + β + d/α otherwise.
Immediate message:Transfer is easiest as γ → 0, hardest as γ →∞ ...
![Page 68: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/68.jpg)
, , , , , , , , , ,
Minimax rates of Transfer:
Given: labeled source and target data {Xi, Yi} ∼ PnP ×QnQ .Excess error: EQ(h) ≡ errQ(h)− infh errQ(h).
Theorem. Define h on {Xi, Yi}, even with knowledge of PX , QX :
infh
supdist(P,Q)=γ
E EQ(h) ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0,
d0 = 2 + d/α for near-uniform QX , and d0 = 2 + β + d/α otherwise.
Immediate message:Transfer is easiest as γ → 0, hardest as γ →∞ ...
![Page 69: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/69.jpg)
, , , , , , , , , ,
Minimax rates of Transfer:
Given: labeled source and target data {Xi, Yi} ∼ PnP ×QnQ .Excess error: EQ(h) ≡ errQ(h)− infh errQ(h).
Theorem. Define h on {Xi, Yi}, even with knowledge of PX , QX :
infh
supdist(P,Q)=γ
E EQ(h) ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0,
d0 = 2 + d/α for near-uniform QX , and d0 = 2 + β + d/α otherwise.
Immediate message:Transfer is easiest as γ → 0, hardest as γ →∞ ...
![Page 70: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/70.jpg)
, , , , , , , , , ,
Minimax rates of Transfer:
Given: labeled source and target data {Xi, Yi} ∼ PnP ×QnQ .Excess error: EQ(h) ≡ errQ(h)− infh errQ(h).
Theorem. Define h on {Xi, Yi}, even with knowledge of PX , QX :
infh
supdist(P,Q)=γ
E EQ(h) ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0,
d0 = 2 + d/α for near-uniform QX , and d0 = 2 + β + d/α otherwise.
Immediate message:Transfer is easiest as γ → 0, hardest as γ →∞ ...
![Page 71: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/71.jpg)
, , , , , , , , , ,
Minimax rates of Transfer:
Given: labeled source and target data {Xi, Yi} ∼ PnP ×QnQ .Excess error: EQ(h) ≡ errQ(h)− infh errQ(h).
Theorem. Define h on {Xi, Yi}, even with knowledge of PX , QX :
infh
supdist(P,Q)=γ
E EQ(h) ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0,
d0 = 2 + d/α for near-uniform QX , and d0 = 2 + β + d/α otherwise.
Immediate message:Transfer is easiest as γ → 0, hardest as γ →∞ ...
![Page 72: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/72.jpg)
, , , , , , , , , ,
Simulations with increasing γ:
nS ≡ Source sample size (no target sample used)
Transfer requires more source data for larger γ values.
![Page 73: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/73.jpg)
, , , , , , , , , ,
Simulations with increasing γ:
nS ≡ Source sample size (no target sample used)
Transfer requires more source data for larger γ values.
![Page 74: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/74.jpg)
, , , , , , , , , ,
Simulations with increasing γ:
nS ≡ Source sample size (no target sample used)
Transfer requires more source data for larger γ values.
![Page 75: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/75.jpg)
, , , , , , , , , ,
Simulations with increasing γ:
nS ≡ Source sample size (no target sample used)
Transfer requires more source data for larger γ values.
![Page 76: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/76.jpg)
, , , , , , , , , ,
Lower-bound analysis:
Main Ingredients:
- Established techniques based on Fano’s inequality.
- h has access to (P,Q) samples, but has to do well on just Q ...
![Page 77: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/77.jpg)
, , , , , , , , , ,
Lower-bound analysis:
Main Ingredients:
- Established techniques based on Fano’s inequality.
- h has access to (P,Q) samples, but has to do well on just Q ...
![Page 78: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/78.jpg)
, , , , , , , , , ,
Lower-bound analysis:
Main Ingredients:
- Established techniques based on Fano’s inequality.
- h has access to (P,Q) samples, but has to do well on just Q ...
![Page 79: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/79.jpg)
, , , , , , , , , ,
Upper-bound analysis:
Rate achieved by k-NN on combined source and target data.
Main ingredient: show that NN distances depend on ρ.
One open problem we had to solve:
Rates for Vanilla k-NN without uniform density assumption
![Page 80: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/80.jpg)
, , , , , , , , , ,
Upper-bound analysis:
Rate achieved by k-NN on combined source and target data.
Main ingredient: show that NN distances depend on ρ.
One open problem we had to solve:
Rates for Vanilla k-NN without uniform density assumption
![Page 81: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/81.jpg)
, , , , , , , , , ,
Upper-bound analysis:
Rate achieved by k-NN on combined source and target data.
Main ingredient: show that NN distances depend on ρ.
One open problem we had to solve:
Rates for Vanilla k-NN without uniform density assumption
![Page 82: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/82.jpg)
, , , , , , , , , ,
Upper-bound analysis:
Rate achieved by k-NN on combined source and target data.
Main ingredient: show that NN distances depend on ρ.
One open problem we had to solve:
Rates for Vanilla k-NN without uniform density assumption
![Page 83: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/83.jpg)
, , , , , , , , , ,
Many new messages:
Recall Minimax Rates ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0
- Fast rates O(1/n) are possible even with large γ.
- Target data most beneficial when nQ � nd0/(d0+γ/α)P .
- Unlabeled data does not improve rates beyond constants ...
![Page 84: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/84.jpg)
, , , , , , , , , ,
Many new messages:
Recall Minimax Rates ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0
- Fast rates O(1/n) are possible even with large γ.
- Target data most beneficial when nQ � nd0/(d0+γ/α)P .
- Unlabeled data does not improve rates beyond constants ...
![Page 85: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/85.jpg)
, , , , , , , , , ,
Many new messages:
Recall Minimax Rates ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0
- Fast rates O(1/n) are possible even with large γ.
- Target data most beneficial when nQ � nd0/(d0+γ/α)P .
- Unlabeled data does not improve rates beyond constants ...
![Page 86: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/86.jpg)
, , , , , , , , , ,
Many new messages:
Recall Minimax Rates ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0
- Fast rates O(1/n) are possible even with large γ.
- Target data most beneficial when nQ � nd0/(d0+γ/α)P .
- Unlabeled data does not improve rates beyond constants ...
![Page 87: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/87.jpg)
, , , , , , , , , ,
Many new messages:
Recall Minimax Rates ≈(nd0/(d0+γ/α)P + nQ
)−(β+1)/d0
- Fast rates O(1/n) are possible even with large γ.
- Target data most beneficial when nQ � nd0/(d0+γ/α)P .
- Unlabeled data does not improve rates beyond constants ...
![Page 88: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/88.jpg)
, , , , , , , , , ,
All good but ...
Can we automatically decide how much target data to sample?(Ongoing work...)
![Page 89: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/89.jpg)
, , , , , , , , , ,
All good but ...
Can we automatically decide how much target data to sample?(Ongoing work...)
![Page 90: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/90.jpg)
, , , , , , , , , ,
All good but ...
Can we automatically decide how much target data to sample?(Ongoing work...)
![Page 91: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/91.jpg)
, , , , , , , , , ,
γ yields insights into adaptive sampling: (ongoing work)
Setup:nP labeled samples from P , nQ unlabeled samples from Q.
Adaptive Sampling:
Sample in low-confidence regions A ⊂ X with large γ(A).
(γ(A)← compares PX and QX in region A)
Essentially label in Q-massive regions with few samples from P ...
Above refines a procedure of [Berlind, Urner, 15]
![Page 92: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/92.jpg)
, , , , , , , , , ,
γ yields insights into adaptive sampling: (ongoing work)
Setup:nP labeled samples from P , nQ unlabeled samples from Q.
Adaptive Sampling:
Sample in low-confidence regions A ⊂ X with large γ(A).
(γ(A)← compares PX and QX in region A)
Essentially label in Q-massive regions with few samples from P ...
Above refines a procedure of [Berlind, Urner, 15]
![Page 93: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/93.jpg)
, , , , , , , , , ,
γ yields insights into adaptive sampling: (ongoing work)
Setup:nP labeled samples from P , nQ unlabeled samples from Q.
Adaptive Sampling:
Sample in low-confidence regions A ⊂ X with large γ(A).
(γ(A)← compares PX and QX in region A)
Essentially label in Q-massive regions with few samples from P ...
Above refines a procedure of [Berlind, Urner, 15]
![Page 94: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/94.jpg)
, , , , , , , , , ,
γ yields insights into adaptive sampling: (ongoing work)
Setup:nP labeled samples from P , nQ unlabeled samples from Q.
Adaptive Sampling:
Sample in low-confidence regions A ⊂ X with large γ(A).
(γ(A)← compares PX and QX in region A)
Essentially label in Q-massive regions with few samples from P ...
Above refines a procedure of [Berlind, Urner, 15]
![Page 95: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/95.jpg)
, , , , , , , , , ,
γ yields insights into adaptive sampling: (ongoing work)
Setup:nP labeled samples from P , nQ unlabeled samples from Q.
Adaptive Sampling:
Sample in low-confidence regions A ⊂ X with large γ(A).
(γ(A)← compares PX and QX in region A)
Essentially label in Q-massive regions with few samples from P ...
Above refines a procedure of [Berlind, Urner, 15]
![Page 96: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/96.jpg)
, , , , , , , , , ,
Initial experiments with CIFAR-10:
Transfer Setup: Target Q has 50% images of cats and dogs.
nP = 20K, nQ = 6K unlabeled Q data
h : Deep NN with 10 layers, and k-NN on top.
![Page 97: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/97.jpg)
, , , , , , , , , ,
Initial experiments with CIFAR-10:
Transfer Setup: Target Q has 50% images of cats and dogs.
nP = 20K, nQ = 6K unlabeled Q data
h : Deep NN with 10 layers, and k-NN on top.
![Page 98: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/98.jpg)
, , , , , , , , , ,
Initial experiments with CIFAR-10:
Transfer Setup: Target Q has 50% images of cats and dogs.
nP = 20K, nQ = 6K unlabeled Q data
h : Deep NN with 10 layers, and k-NN on top.
![Page 99: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/99.jpg)
, , , , , , , , , ,
Initial experiments with CIFAR-10:
Transfer Setup: Target Q has 50% images of cats and dogs.
nP = 20K, nQ = 6K unlabeled Q data
h : Deep NN with 10 layers, and k-NN on top.
![Page 100: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/100.jpg)
, , , , , , , , , ,
Results
Best performance achieved after relatively few label requests ...
![Page 101: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/101.jpg)
, , , , , , , , , ,
Results
Best performance achieved after relatively few label requests ...
![Page 102: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/102.jpg)
, , , , , , , , , ,
Quick Summary and some New Directions ...
![Page 103: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/103.jpg)
, , , , , , , , , ,
Quick Summary:
• γ captures a more optimistic view of transferability P → Q.
• Unlabeled data can only improve constants in the rates.
• Adaptive sampling is possible with no knowledge of γ.
![Page 104: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/104.jpg)
, , , , , , , , , ,
Quick Summary:
• γ captures a more optimistic view of transferability P → Q.
• Unlabeled data can only improve constants in the rates.
• Adaptive sampling is possible with no knowledge of γ.
![Page 105: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/105.jpg)
, , , , , , , , , ,
Quick Summary:
• γ captures a more optimistic view of transferability P → Q.
• Unlabeled data can only improve constants in the rates.
• Adaptive sampling is possible with no knowledge of γ.
![Page 106: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/106.jpg)
, , , , , , , , , ,
Quick Summary:
• γ captures a more optimistic view of transferability P → Q.
• Unlabeled data can only improve constants in the rates.
• Adaptive sampling is possible with no knowledge of γ.
![Page 107: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/107.jpg)
, , , , , , , , , ,
Quick Summary:
• γ captures a more optimistic view of transferability P → Q.
• Unlabeled data can only improve constants in the rates.
• Adaptive sampling is possible with no knowledge of γ.
![Page 108: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/108.jpg)
, , , , , , , , , ,
New direction: refining γ ...
Often in practice, a family H of predictors is fixed(NN, trees, SVMs, Neural Nets ...)
Intuition:
Consider regions of X most relevant to H (with S. Hanneke)
This yields H specific performance limits ...
![Page 109: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/109.jpg)
, , , , , , , , , ,
New direction: refining γ ...
Often in practice, a family H of predictors is fixed(NN, trees, SVMs, Neural Nets ...)
Intuition:
Consider regions of X most relevant to H (with S. Hanneke)
This yields H specific performance limits ...
![Page 110: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/110.jpg)
, , , , , , , , , ,
New direction: refining γ ...
Often in practice, a family H of predictors is fixed(NN, trees, SVMs, Neural Nets ...)
Intuition:
Consider regions of X most relevant to H (with S. Hanneke)
This yields H specific performance limits ...
![Page 111: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/111.jpg)
, , , , , , , , , ,
New direction: refining γ ...
Often in practice, a family H of predictors is fixed(NN, trees, SVMs, Neural Nets ...)
Intuition:
Consider regions of X most relevant to H (with S. Hanneke)
This yields H specific performance limits ...
![Page 112: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/112.jpg)
, , , , , , , , , ,
New direction: refining γ ...
Often in practice, a family H of predictors is fixed(NN, trees, SVMs, Neural Nets ...)
Intuition:
Consider regions of X most relevant to H (with S. Hanneke)
In particular:
Consider disagreements between classifiers
γ : PX(h 6= h′) & QX(h 6= h′)1+γ
Set A where 2 half-spaces disagree
This yields H specific performance limits ...
![Page 113: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/113.jpg)
, , , , , , , , , ,
New direction: refining γ ...
Often in practice, a family H of predictors is fixed(NN, trees, SVMs, Neural Nets ...)
Intuition:
Consider regions of X most relevant to H (with S. Hanneke)
In particular:
Consider disagreements between classifiers
γ : PX(h 6= h′) & QX(h 6= h′)1+γ
Set A where 2 half-spaces disagree
This yields H specific performance limits ...
![Page 114: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/114.jpg)
, , , , , , , , , ,
New Messages: [Kpo. & Hanneke, 19]
Near optimal heuristics for bounded VC classes:(no need to estimate γ)
No Classification noise:
ERM on combined source and target data is minimax-optimal.
Any Level of Noise:
Minimize RQ(h) subject to RP (h) ≤ minh′ RP (h′) + ∆nP (h)
§ Hard to implement in general...
![Page 115: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/115.jpg)
, , , , , , , , , ,
New Messages: [Kpo. & Hanneke, 19]
Near optimal heuristics for bounded VC classes:(no need to estimate γ)
No Classification noise:
ERM on combined source and target data is minimax-optimal.
Any Level of Noise:
Minimize RQ(h) subject to RP (h) ≤ minh′ RP (h′) + ∆nP (h)
§ Hard to implement in general...
![Page 116: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/116.jpg)
, , , , , , , , , ,
New Messages: [Kpo. & Hanneke, 19]
Near optimal heuristics for bounded VC classes:(no need to estimate γ)
No Classification noise:
ERM on combined source and target data is minimax-optimal.
Any Level of Noise:
Minimize RQ(h) subject to RP (h) ≤ minh′ RP (h′) + ∆nP (h)
§ Hard to implement in general...
![Page 117: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/117.jpg)
, , , , , , , , , ,
New Messages: [Kpo. & Hanneke, 19]
Near optimal heuristics for bounded VC classes:(no need to estimate γ)
No Classification noise:
ERM on combined source and target data is minimax-optimal.
Any Level of Noise:
Minimize RQ(h) subject to RP (h) ≤ minh′ RP (h′) + ∆nP (h)
§ Hard to implement in general...
![Page 118: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/118.jpg)
, , , , , , , , , ,
Somehow we are still just scratching the surface of what’s possible...
Results extend beyond covariate-shift to PY |X 6= QY |X
Mostly Open:
• More complex transfer regimes?
• Multitask, Curriculum, Lifelong, Fairness, Robustness ?
Thanks!
![Page 119: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/119.jpg)
, , , , , , , , , ,
Somehow we are still just scratching the surface of what’s possible...
Results extend beyond covariate-shift to PY |X 6= QY |X
Mostly Open:
• More complex transfer regimes?
• Multitask, Curriculum, Lifelong, Fairness, Robustness ?
Thanks!
![Page 120: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/120.jpg)
, , , , , , , , , ,
Somehow we are still just scratching the surface of what’s possible...
Results extend beyond covariate-shift to PY |X 6= QY |X
Mostly Open:
• More complex transfer regimes?
• Multitask, Curriculum, Lifelong, Fairness, Robustness ?
Thanks!
![Page 121: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/121.jpg)
, , , , , , , , , ,
Somehow we are still just scratching the surface of what’s possible...
Results extend beyond covariate-shift to PY |X 6= QY |X
Mostly Open:
• More complex transfer regimes?
• Multitask, Curriculum, Lifelong, Fairness, Robustness ?
Thanks!
![Page 122: Some Recent Insights on Transfer Learningskk2175/Talks/transferLearning.pdfCase study: Apple Siri’s voice assistant - Initially trained on data from American English speakers ...-Could](https://reader033.fdocuments.in/reader033/viewer/2022052010/601f99465689c11b8e66af73/html5/thumbnails/122.jpg)
, , , , , , , , , ,
Somehow we are still just scratching the surface of what’s possible...
Results extend beyond covariate-shift to PY |X 6= QY |X
Mostly Open:
• More complex transfer regimes?
• Multitask, Curriculum, Lifelong, Fairness, Robustness ?
Thanks!