Kernels and the Kernel Trick - svivek
Machine Learning
Support vector machines
• Training by maximizing margin
• The SVM objective
• Solving the SVM optimization problem
• Support vectors, duals and kernels
This lecture
1. Support vectors
2. Kernels
3. The kernel trick
4. Properties of kernels
5. Another example of the kernel trick
So far we have seen
• Support vector machines
• Hinge loss and optimizing the regularized loss
More broadly, different algorithms for learning linear classifiers
What about non-linear models?
One way to learn non-linear models
Explicitly introduce non-linearity into the feature space
If the true separator is quadratic, transform all input points with a quadratic feature map $\phi(x_1, x_2)$
Now, we can try to find a weight vector in this higher dimensional space
That is, predict using $w^T \phi(x_1, x_2) \ge b$
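A minimal sketch of this idea, assuming one particular quadratic map $\phi(x_1, x_2) = (x_1^2, x_2^2, x_1 x_2, x_1, x_2)$ and a hand-picked weight vector. Both are illustrative choices and are not taken from the slides. The unit circle is not a linear separator in $(x_1, x_2)$, but it is linear in $\phi$-space:

```python
def phi(x1, x2):
    """Hypothetical quadratic feature map (an assumption for illustration)."""
    return [x1 * x1, x2 * x2, x1 * x2, x1, x2]


def predict(w, b, x1, x2):
    """Return +1 if w^T phi(x1, x2) >= b, else -1."""
    score = sum(wi * fi for wi, fi in zip(w, phi(x1, x2)))
    return 1 if score >= b else -1


# The separator x1^2 + x2^2 >= 1 (outside the unit circle) is linear in phi-space.
w = [1.0, 1.0, 0.0, 0.0, 0.0]
b = 1.0

print(predict(w, b, 2.0, 0.0))   # point outside the circle
print(predict(w, b, 0.1, 0.2))   # point inside the circle
```

The weight vector simply picks out the two squared features, so a linear rule in five dimensions reproduces a quadratic rule in two.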
SVM: Primals and duals
The SVM objective
This is called the primal form of the objective
This can be converted to its dual form, which will let us prove a very useful property
• The dual is another optimization problem
• It has the property that max Dual = min Primal
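For reference, a common way to write the regularized hinge-loss objective is shown here. The exact formula on the slide is not recoverable from the transcript, so the trade-off constant $C$ and the absence of a separate bias term are assumptions:

$$\min_{w} \; \frac{1}{2} w^T w + C \sum_{i=1}^{m} \max\left(0,\; 1 - y_i\, w^T x_i\right)$$

The first term is the regularizer and the second is the total hinge loss over the $m$ training examples.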
Support vector machines
Let $w$ be the minimizer of the SVM problem for some dataset with $m$ examples: $\{(x_i, y_i)\}$
Then, for $i = 1 \ldots m$, there exist $\alpha_i \ge 0$ such that the optimum $w$ can be written as
$$w = \sum_{i=1}^{m} \alpha_i y_i x_i$$
Furthermore, the value of $\alpha_i$ depends on where example $i$ lies relative to the margin. The figure distinguishes three cases:
• All points outside the margin
• All points on the wrong side of the margin
• All points on the margin
[Figure: positive (+) and negative (-) examples on either side of a separating line with its margin]
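The three cases correspond to the usual complementary slackness conditions of the soft-margin SVM. Assuming the hinge-loss objective with trade-off constant $C$ (an assumption, since the slide's own formula is not in the transcript), they are typically stated as:

$$\alpha_i = \begin{cases} 0 & \text{if } y_i\, w^T x_i > 1 \quad \text{(outside the margin)} \\ C & \text{if } y_i\, w^T x_i < 1 \quad \text{(wrong side of the margin)} \end{cases} \qquad 0 \le \alpha_i \le C \;\; \text{if } y_i\, w^T x_i = 1 \quad \text{(on the margin)}$$

Only examples on the margin or on its wrong side can have $\alpha_i \ne 0$.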
Support vectors
The weight vector is completely defined by training examples whose $\alpha_i$s are not zero
These examples are called the support vectors
This lecture
✓ Support vectors
2. Kernels
3. The kernel trick
4. Properties of kernels
5. Another example of the kernel trick
Predicting with linear classifiers
• Prediction = $\operatorname{sgn}(w^T x)$ and $w = \sum_i \alpha_i y_i x_i$
• That is, we just showed that $w^T x = \sum_i \alpha_i y_i\, x_i^T x$
– We only need to compute dot products between training examples and the new example $x$
• This is true even if we map examples to a high dimensional space
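The equivalence can be checked numerically. This is a small sketch with made-up training points and made-up dual coefficients `alpha` (none of these values come from the lecture). It compares the score computed from an explicit $w$ with the score computed only from dot products:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Toy training set and dual coefficients (made-up values for illustration).
X = [[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]]
y = [1, -1, -1]
alpha = [0.5, 0.0, 0.25]   # alpha_i = 0 means example i is not a support vector

# Primal view: build w = sum_i alpha_i * y_i * x_i explicitly.
w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(2)]


def score_primal(x):
    return dot(w, x)


def score_dual(x):
    # Only dot products between training examples and x are needed.
    return sum(a * yi * dot(xi, x) for a, yi, xi in zip(alpha, y, X))


x_new = [0.3, 0.7]
print(score_primal(x_new), score_dual(x_new))
```

The second training example has $\alpha_i = 0$, so it contributes nothing to either score.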
Dot products in high dimensional spaces
Let us define a dot product in the high dimensional space
$$K(x, z) = \phi(x)^T \phi(z)$$
So prediction with this high dimensional lifting map is
$$\operatorname{sgn}\left(w^T \phi(x)\right) = \operatorname{sgn}\left(\sum_i \alpha_i y_i K(x_i, x)\right)$$
because
$$w^T \phi(x) = \sum_i \alpha_i y_i\, \phi(x_i)^T \phi(x)$$
Kernel based methods
Predict using $\operatorname{sgn}\left(\sum_i \alpha_i y_i K(x_i, x)\right)$
What does this new formulation give us? If we have to compute $\phi$ every time anyway, we gain nothing
If we can compute the value of $K$ without explicitly writing the blown up representation, then we will have a computational advantage
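A hedged sketch of such a predictor follows. The kernel is passed in as a function, and the quadratic kernel $(1 + x^T z)^2$ used here is just one illustrative choice. The helper name `kernel_predict` and the dual coefficients are hypothetical:

```python
def quadratic_kernel(x, z):
    """K(x, z) = (1 + x^T z)^2, computed in the original space."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


def kernel_predict(alpha, y, X, x, kernel):
    """sgn(sum_i alpha_i y_i K(x_i, x)); the feature map phi is never formed."""
    score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))
    return 1 if score >= 0 else -1


X = [[1.0, 0.0], [0.0, 0.0]]
y = [1, -1]
alpha = [1.0, 2.0]   # made-up dual coefficients

print(kernel_predict(alpha, y, X, [2.0, 0.0], quadratic_kernel))
print(kernel_predict(alpha, y, X, [0.0, 0.0], quadratic_kernel))
```

Swapping in a different kernel function changes the implicit feature space without touching the prediction code.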
This lecture
✓ Support vectors
✓ Kernels
3. The kernel trick
4. Properties of kernels
5. Another example of the kernel trick
Example: Polynomial Kernel
• Given two examples $x$ and $z$ we want to map them to a high dimensional space [for example, quadratic]
$$\phi(x_1, \ldots, x_n) = (\underbrace{1}_{\text{degree zero}},\; \underbrace{x_1, \ldots, x_n}_{\text{degree one}},\; \underbrace{x_1^2, \ldots, x_n^2,\; x_1 x_2, \ldots, x_{n-1} x_n}_{\text{degree two}})$$
and compute the dot product $A = \phi(x)^T \phi(z)$ [takes time]
• Instead, in the original space, compute
$$B = K(x, z) = (1 + x^T z)^2$$
Claim: $A = B$ (up to constant coefficients on the features, which do not really matter)
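The claim can be checked numerically. This sketch assumes the kernel $(1 + x^T z)^2$ and a feature map whose degree-one and cross terms carry a $\sqrt{2}$ coefficient. That scaling is an assumption made here so that the two quantities match exactly, and it is what "coefficients do not really matter" refers to:

```python
import math
from itertools import combinations


def phi(x):
    """Explicit degree <= 2 feature map with sqrt(2) coefficients (assumed)."""
    r2 = math.sqrt(2.0)
    feats = [1.0]                                    # degree zero term
    feats += [r2 * xi for xi in x]                   # degree one terms
    feats += [xi * xi for xi in x]                   # squared terms
    feats += [r2 * x[i] * x[j]                       # cross terms
              for i, j in combinations(range(len(x)), 2)]
    return feats


def kernel(x, z):
    """B = (1 + x^T z)^2, computed with a single n-dimensional dot product."""
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]
A = sum(a * b for a, b in zip(phi(x), phi(z)))
B = kernel(x, z)
print(len(phi(x)), A, B)
```

For $n = 3$ the explicit map already has 10 coordinates, while the kernel needs only one 3-dimensional dot product.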
Example: Two dimensions, quadratic kernel
$A = \phi(x)^T \phi(z)$
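The worked derivation on this slide is not in the transcript. One standard way to carry it out, assuming the kernel $(1 + x^T z)^2$ and a feature map with $\sqrt{2}$ coefficients (both assumptions), is:

$$\begin{aligned} (1 + x^T z)^2 &= (1 + x_1 z_1 + x_2 z_2)^2 \\ &= 1 + 2 x_1 z_1 + 2 x_2 z_2 + x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 \\ &= \phi(x)^T \phi(z), \qquad \phi(x) = (1,\; \sqrt{2}\,x_1,\; \sqrt{2}\,x_2,\; x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2) \end{aligned}$$

Every monomial of degree at most two appears, differing from the unscaled feature map only in constant coefficients.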
The Kernel Trick
Suppose we wish to compute $K(x, z) = \phi(x)^T \phi(z)$
Here $\phi$ maps $x$ and $z$ to a high dimensional space
The Kernel Trick: Save time/space by computing the value of $K(x, z)$ by performing operations in the original space (without a feature transformation!)
Computing dot products efficiently
Kernel Trick: You want to work with degree 2 polynomial features, $\phi(x)$. Then, your dot product will operate on vectors in a space of dimensionality $n(n+1)/2$.
The kernel trick allows you to save time/space and compute dot products in an $n$ dimensional space. (Not just for degree 2 polynomials)
This lecture
✓ Support vectors
✓ Kernels
✓ The kernel trick
4. Properties of kernels
5. Another example of the kernel trick
Which functions are kernels?
• Can we use any function $K(\cdot, \cdot)$?
– No! A function $K(x, z)$ is a valid kernel if it corresponds to an inner product in some (perhaps infinite dimensional) feature space.
• General condition: construct the Gram matrix $\{K(x_i, x_j)\}$ for any finite set of points; check that it's positive semi-definite
Reminder: Positive semi-definite matrices
A symmetric matrix $M$ is positive semi-definite if, for any non-zero vector $z$, we have $z^T M z \ge 0$
(A useful property characterizing many interesting mathematical objects)
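A small sketch of the definition in code, using two hand-picked 2×2 matrices (illustrative values, not from the lecture). Evaluating the quadratic form on a few vectors cannot prove that a matrix is positive semi-definite, but a single negative value is enough to show that it is not:

```python
def quad_form(M, z):
    """Compute z^T M z for a square matrix M given as a list of rows."""
    n = len(z)
    return sum(z[i] * M[i][j] * z[j] for i in range(n) for j in range(n))


M_psd = [[2.0, -1.0], [-1.0, 2.0]]   # eigenvalues 1 and 3: positive semi-definite
M_not = [[1.0, 2.0], [2.0, 1.0]]     # eigenvalues 3 and -1: not positive semi-definite

print(quad_form(M_psd, [1.0, 1.0]))    # non-negative, as expected
print(quad_form(M_not, [1.0, -1.0]))   # negative: z = (1, -1) is a witness
```

In practice one would check the eigenvalues instead; the quadratic form is shown here because it is the definition used on the slide.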
The Kernel Matrix
• The Gram matrix of a set of $n$ vectors $S = \{x_1, \ldots, x_n\}$ is the $n \times n$ matrix $G$ with $G_{ij} = x_i^T x_j$
– The kernel matrix is the Gram matrix of $\{\phi(x_1), \ldots, \phi(x_n)\}$
– (size depends on the # of examples, not dimensionality)
• Showing that a function $K$ is a valid kernel
– Direct approach: If you have the $\phi(x_i)$, you have the Gram matrix (and it's easy to see that it will be positive semi-definite). Why?
– Indirect: If you have the kernel, write down the kernel matrix $K_{ij}$, and show that it is a legitimate kernel, without an explicit construction of $\phi(x_i)$
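One way to answer the "Why?" in the direct approach: for any real vector $c = (c_1, \ldots, c_n)$, the quadratic form of the Gram matrix is a squared norm, so it cannot be negative.

$$c^T G c = \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j\, \phi(x_i)^T \phi(x_j) = \left\| \sum_{i=1}^{n} c_i\, \phi(x_i) \right\|^2 \ge 0$$

The matrix is also symmetric because $\phi(x_i)^T \phi(x_j) = \phi(x_j)^T \phi(x_i)$.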
Mercer's condition
Let $K(x, z)$ be a function that maps two $n$ dimensional vectors to a real number
$K$ is a valid kernel if for every finite set $\{x_1, x_2, \ldots\}$, for any choice of real valued $c_1, c_2, \ldots$, we have
$$\sum_i \sum_j c_i c_j K(x_i, x_j) \ge 0$$
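The condition can be probed empirically. This is a sketch, assuming randomly drawn points and coefficients (the function names are hypothetical). Random probing can only fail to find a violation; it cannot prove validity. A single violating choice of $c$, however, rules a function out, as the squared-distance example shows:

```python
import random


def mercer_sum(kernel, xs, cs):
    """sum_i sum_j c_i c_j K(x_i, x_j) for points xs and coefficients cs."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(cs, xs) for cj, xj in zip(cs, xs))


def linear(x, z):
    return sum(a * b for a, b in zip(x, z))


def sq_dist(x, z):
    # Squared Euclidean distance: symmetric, but NOT a valid kernel.
    return sum((a - b) ** 2 for a, b in zip(x, z))


rng = random.Random(0)
xs = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(5)]
worst = min(mercer_sum(linear, xs, [rng.uniform(-1, 1) for _ in xs])
            for _ in range(200))
print(worst >= -1e-12)   # no violation found for the linear kernel

# Two points at distance 1 with c = (1, -1) give a negative sum.
print(mercer_sum(sq_dist, [[0.0], [1.0]], [1.0, -1.0]))
```

The small negative tolerance only absorbs floating-point rounding in the linear-kernel check.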
Polynomial kernels
• Linear kernel: $k(x, z) = x^T z$
• Polynomial kernel of degree $d$: $k(x, z) = (x^T z)^d$
– only $d$th-order interactions
• Polynomial kernel up to degree $d$: $k(x, z) = (x^T z + c)^d$ ($c > 0$)
– all interactions of order $d$ or lower
Gaussian Kernel (or the radial basis function kernel)
$$k(x, z) = \exp\left(-\frac{\|x - z\|^2}{c}\right)$$
– $\|x - z\|^2$: squared Euclidean distance between $x$ and $z$
– $c = \sigma^2$: a free parameter
– very small $c$: $K \approx$ identity matrix (every item is different)
– very large $c$: $K \approx$ all-ones matrix (all items are the same)
– $k(x, z) \approx 1$ when $x, z$ close
– $k(x, z) \approx 0$ when $x, z$ dissimilar
Exercises:
1. Prove that this is a kernel.
2. What is the "blown up" feature space for this kernel?
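The effect of the free parameter can be seen on a tiny example. This sketch assumes the form $k(x, z) = \exp(-\|x - z\|^2 / c)$ and three made-up points; the helper names are hypothetical:

```python
import math


def gaussian_kernel(x, z, c):
    """k(x, z) = exp(-||x - z||^2 / c), with c = sigma^2 a free parameter."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / c)


def kernel_matrix(points, c):
    return [[gaussian_kernel(p, q, c) for q in points] for p in points]


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K_small = kernel_matrix(points, 1e-3)   # close to the identity matrix
K_large = kernel_matrix(points, 1e6)    # close to the all-ones matrix

for row in K_small:
    print(row)
for row in K_large:
    print(row)
```

With very small `c` every point looks similar only to itself, and with very large `c` all points look alike, which matches the two limiting cases in the list above.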
Constructing New Kernels
You can construct new kernels $k'(x, x')$ from existing ones:
– Multiplying $k(x, x')$ by a positive constant $c$: $c\,k(x, x')$
– Multiplying $k(x, x')$ by a function $f$ applied to $x$ and $x'$: $f(x)\,k(x, x')\,f(x')$
– Applying a polynomial (with non-negative coefficients) to $k(x, x')$: $P(k(x, x'))$ with $P(z) = \sum_i a_i z^i$ and $a_i \ge 0$
– Exponentiating $k(x, x')$: $\exp(k(x, x'))$
Constructing New Kernels (2)
• You can construct $k'(x, x')$ from $k_1(x, x')$, $k_2(x, x')$ by:
– Adding $k_1(x, x')$ and $k_2(x, x')$: $k_1(x, x') + k_2(x, x')$
– Multiplying $k_1(x, x')$ and $k_2(x, x')$: $k_1(x, x')\,k_2(x, x')$
• Also:
– If $\phi(x) \in \mathbb{R}^m$ and $k_m(z, z')$ is a valid kernel in $\mathbb{R}^m$, then $k(x, x') = k_m(\phi(x), \phi(x'))$ is also a valid kernel
– If $A$ is a symmetric positive semi-definite matrix, $k(x, x') = x^T A x'$ is also a valid kernel
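These rules compose. As a hedged illustration, the sketch below builds a kernel from the linear kernel using only scaling by a positive constant, exponentiation, and multiplication by $f(x) f(x')$, then checks numerically that the result equals a Gaussian kernel of the form $\exp(-\|x - x'\|^2 / c)$. The combinator names (`scale`, `exponentiate`, `sandwich`) are hypothetical:

```python
import math


def linear(x, z):
    return sum(a * b for a, b in zip(x, z))


def scale(k, c):
    """c * k(x, x'), for a positive constant c."""
    return lambda x, z: c * k(x, z)


def exponentiate(k):
    """exp(k(x, x'))."""
    return lambda x, z: math.exp(k(x, z))


def sandwich(k, f):
    """f(x) * k(x, x') * f(x')."""
    return lambda x, z: f(x) * k(x, z) * f(z)


c = 2.0
f = lambda x: math.exp(-linear(x, x) / c)
built = sandwich(exponentiate(scale(linear, 2.0 / c)), f)


def gaussian(x, z):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / c)


x, z = [1.0, -0.5], [0.25, 2.0]
print(built(x, z), gaussian(x, z))
```

The identity behind the check is $\|x - z\|^2 = \|x\|^2 + \|z\|^2 - 2 x^T z$, so each factor in the product is covered by one of the construction rules.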
This lecture
✓ Support vectors
✓ Kernels
✓ The kernel trick
✓ Properties of kernels
5. Another example of the kernel trick
Kernel Trick: An example
Let the blown up feature space represent the space of all $3^n$ conjunctions. Then,
$$K(x, z) = \phi(x)^T \phi(z) = \sum_i \phi_i(x)\,\phi_i(z) = 2^{\text{same}(x, z)}$$
where same(x, z) is the number of features that have the same value for both $x$ and $z$
Example: Take $n = 3$; $x = (001)$, $z = (011)$; we have conjunctions of size 0, 1, 2, 3
Proof: let $m = \text{same}(x, z)$; construct "surviving" conjunctions by, for each of these $m$ literals,
1. choosing to include it with the right polarity in the conjunction, or
2. choosing to not include it at all.
Conjunctions with literals outside this set disappear.
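The counting argument can be verified by brute force on the slide's example. This sketch assumes that each variable in a conjunction is either absent, required to be 1, or required to be 0 (which gives $3^n$ conjunctions), and that $\phi_i(x) = 1$ exactly when $x$ satisfies conjunction $i$:

```python
from itertools import product


def phi(x):
    """Indicator vector over all 3^n conjunctions of n Boolean variables.

    Each variable is absent (None), required to be 1, or required to be 0.
    """
    feats = []
    for conj in product((None, 1, 0), repeat=len(x)):
        satisfied = all(lit is None or lit == xi for lit, xi in zip(conj, x))
        feats.append(1 if satisfied else 0)
    return feats


def same(x, z):
    """Number of positions where x and z have the same value."""
    return sum(1 for a, b in zip(x, z) if a == b)


def conjunction_kernel(x, z):
    """K(x, z) = 2^same(x, z), computed without enumerating conjunctions."""
    return 2 ** same(x, z)


x, z = (0, 0, 1), (0, 1, 1)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(len(phi(x)), explicit, conjunction_kernel(x, z))   # 27 4 4
```

The explicit dot product touches all 27 coordinates, while the kernel only compares the two inputs position by position.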
Exercises
1. Show that this argument works for a specific example
– Take $X = \{x_1, x_2, x_3, x_4\}$
– $\phi(x)$ = the space of all $3^n$ conjunctions; $|\phi(x)| = 81$
– Consider $x = (1100)$, $z = (1101)$
– Write $\phi(x)$, $\phi(z)$, the representation of $x$, $z$ in the $\phi$ space
– Compute $\phi(x)^T \phi(z)$
– Show that $K(x, z) = \phi(x)^T \phi(z) = \sum_i \phi_i(z)\,\phi_i(x) = 2^{\text{same}(x, z)} = 8$
2. Try to develop another kernel, e.g., where the feature space is the space of all conjunctions of size 3 (exactly)
Summary: Kernel trick
• To make the final prediction, we are computing dot products
• The kernel trick is a computational trick to compute dot products in higher dimensional spaces
• This is applicable not just to SVMs. The same idea can be extended to the Perceptron too: the Kernel Perceptron
• Important: All the bounds we have seen (e.g., the Perceptron bound) depend on the underlying dimensionality
– By moving to a higher dimensional space, we are incurring a penalty on sample complexity
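A minimal sketch of a kernel Perceptron, to make the last extension concrete. It keeps one mistake counter per training example instead of an explicit weight vector. The update rule, the quadratic kernel, the lack of a bias term, and the XOR-like toy data are all illustrative assumptions rather than the lecture's own formulation:

```python
def quadratic_kernel(x, z):
    return (1.0 + sum(a * b for a, b in zip(x, z))) ** 2


def train_kernel_perceptron(X, y, kernel, epochs=10):
    """Mistake-driven updates; alpha[i] counts mistakes made on example i."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            score = sum(a * yj * kernel(xj, xi)
                        for a, yj, xj in zip(alpha, y, X))
            if yi * score <= 0:      # misclassified (or undecided): update
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:            # a full clean pass: stop early
            break
    return alpha


def predict(alpha, X, y, kernel, x):
    score = sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X))
    return 1 if score >= 0 else -1


# XOR-like data: not linearly separable in the original 2-d space.
X = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
y = [1, 1, -1, -1]
alpha = train_kernel_perceptron(X, y, quadratic_kernel)
print(alpha)
print([predict(alpha, X, y, quadratic_kernel, x) for x in X])
```

On this toy set the quadratic kernel separates the classes even though no linear separator exists in the original space.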