UVA CS 4501: Machine Learning Lecture 8: Review of Regression · Regularized multivariate linear...

UVA CS 4501: Machine Learning Lecture 8: Review of Regression Dr. Yanjun Qi University of Virginia Department of Computer Science


  • UVA CS 4501: Machine Learning

    Lecture 8: Review of Regression

    Dr. Yanjun Qi

    University of Virginia

    Department of Computer Science

  • Where are we? → Five major sections of this course

    - Regression (supervised)
    - Classification (supervised)
    - Unsupervised models
    - Learning theory
    - Graphical models

    2/19/18

    Dr. Yanjun Qi / UVA CS

  • Lecture 3

    - Linear regression (aka least squares)
    - Learn to derive the least-squares estimate by the normal equation
    - Evaluation with cross-validation
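The normal-equation estimate reviewed in Lecture 3, θ* = (XᵀX)⁻¹Xᵀy, can be sketched in a few lines. This is a minimal NumPy illustration, not the lecture's own code; the toy data and variable names are mine:

```python
import numpy as np

# Toy noise-free data: y = 1 + 2*x1 + 3*x2, with a bias column of ones
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
theta_true = np.array([1.0, 2.0, 3.0])
y = X @ theta_true

# Normal equation: solve (X^T X) theta = X^T y rather than forming the inverse
theta = np.linalg.solve(X.T @ X, X.T @ y)  # close to [1, 2, 3] on this data
```

Using `solve` on the system XᵀXθ = Xᵀy is the standard numerically safer form of the closed-form estimate.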

  • Lecture 4

    - More ways to train / perform optimization for linear regression models
      - Review: gradient descent
      - Gradient descent (GD) for LR
      - Stochastic GD (SGD) for LR
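The GD and SGD variants from Lecture 4 can be sketched as follows. A minimal illustration under my own toy data and step sizes, not the lecture's code:

```python
import numpy as np

# Toy noise-free data: y = 0.5 - 2*x, with a bias column of ones
rng = np.random.default_rng(1)
X = np.c_[np.ones(200), rng.normal(size=(200, 1))]
y = X @ np.array([0.5, -2.0])

# Batch gradient descent on the squared loss J = (1/2n) ||X theta - y||^2
theta_gd = np.zeros(2)
for _ in range(500):
    theta_gd -= 0.1 * X.T @ (X @ theta_gd - y) / len(y)

# Stochastic gradient descent: update on one shuffled sample at a time
theta_sgd = np.zeros(2)
for epoch in range(50):
    for i in rng.permutation(len(y)):
        theta_sgd -= 0.01 * (X[i] @ theta_sgd - y[i]) * X[i]
```

Both runs approach the same least-squares solution; SGD trades exactness per step for much cheaper updates.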

  • Lecture 5

    - Regression models beyond linear
      - LR with non-linear basis functions
      - Instance-based regression: K-nearest neighbors
      - Locally weighted linear regression
      - Regression trees and multilinear interpolation (later)

  • Lecture 6

    - Linear regression models with regularization
      - Review: (ordinary) least squares: squared loss (normal equation)
      - Ridge regression: squared loss with L2 regularization
      - Lasso regression: squared loss with L1 regularization
      - Elastic-net regression: squared loss with L1 AND L2 regularization
      - WHY, and the influence of the regularization parameter

  • Lecture 7

    - Feature selection
      - General introduction
      - Filtering
      - Wrapper
      - Embedded method

  • Machine Learning in a Nutshell

    - Task
    - Representation
    - Score function
    - Search/optimization
    - Models, parameters

  • Multivariate Linear Regression

    - Task: regression
    - Representation: Y = weighted linear sum of X's
    - Score function: least squares
    - Search/optimization: linear algebra / GD / SGD
    - Models, parameters: regression coefficients

    ŷ = f(x) = θᵀx

  • Multivariate Linear Regression with Basis Expansion

    - Task: regression
    - Representation: Y = weighted linear sum of (basis expansion of X)
    - Score function: SSE
    - Search/optimization: linear algebra
    - Models, parameters: regression coefficients

    ŷ = θ₀ + ∑_{j=1}^{m} θⱼ ϕⱼ(x) = ϕ(x)ᵀθ
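Basis expansion keeps the linear-algebra machinery while fitting a non-linear curve. A minimal sketch with a polynomial basis ϕⱼ(x) = xʲ on data of my own choosing (not from the lecture):

```python
import numpy as np

# Fit y = sin(x) with a degree-5 polynomial basis
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=80)
y = np.sin(x)

m = 5
Phi = np.vander(x, m + 1, increasing=True)   # columns: 1, x, x^2, ..., x^m
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Prediction at a new point: yhat = phi(x)^T theta
yhat = np.vander([1.0], m + 1, increasing=True)[0] @ theta
```

The model stays linear in θ; only the features ϕⱼ(x) are non-linear in x, which is why ordinary least squares still applies.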

  • K-Nearest Neighbor

    - Task: regression / classification
    - Representation: local smoothness
    - Score function: NA
    - Search/optimization: NA
    - Models, parameters: training samples
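Since K-nearest-neighbor regression has no parameters beyond the stored training samples, the whole method fits in one function. A minimal sketch (names and data are mine):

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3):
    """Predict by averaging the targets of the k nearest training samples."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
pred = knn_regress(X_train, y_train, np.array([1.9]), k=3)  # averages y at x = 2, 1, 3
```

There is no training step at all: all the work happens at query time, which is why the "score function" and "search" slots above are NA.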

  • Locally Weighted / Kernel Linear Regression

    - Task: regression
    - Representation: Y = weighted linear sum of X's
    - Score function: weighted SSE
    - Search/optimization: linear algebra
    - Models, parameters: local regression coefficients (conditioned on each test point)

    f̂(x₀) = α̂(x₀) + β̂(x₀) x₀

    min_{α(x₀), β(x₀)} ∑_{i=1}^{N} K_λ(x₀, xᵢ) [ yᵢ − α(x₀) − β(x₀) xᵢ ]²

    θ*(x₀) = (Bᵀ W(x₀) B)⁻¹ Bᵀ W(x₀) y
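The weighted normal equation above solves a fresh least-squares problem per query point. A minimal sketch with a Gaussian kernel as K_λ (the kernel choice, data, and names are my own assumptions):

```python
import numpy as np

def loc_weighted_fit(X, y, x0, lam=0.1):
    """theta*(x0) = (B^T W(x0) B)^{-1} B^T W(x0) y, then fhat(x0) = [1, x0] theta."""
    B = np.c_[np.ones(len(X)), X]                 # intercept column + inputs
    w = np.exp(-((X - x0) ** 2).sum(axis=1) / (2 * lam ** 2))
    W = np.diag(w)                                # K_lambda(x0, x_i) on the diagonal
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return np.r_[1.0, x0] @ theta

X = np.linspace(0, 1, 50).reshape(-1, 1)
y = X[:, 0] ** 2                                  # curved, but locally near-linear
pred = loc_weighted_fit(X, y, np.array([0.5]), lam=0.1)  # close to 0.25
```

The bandwidth λ plays the same role as k in KNN: smaller λ means a more local, higher-variance fit.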

  • Regularized Multivariate Linear Regression

    - Task: regression
    - Representation: Y = weighted linear sum of X's
    - Score function: least squares + regularization
    - Search/optimization: linear algebra for ridge / sub-GD for lasso & elastic net
    - Models, parameters: regression coefficients (regularized weights)

    min J(β) = ∑_{i=1}^{n} (Yᵢ − Ŷᵢ)² + λ (∑_{j=1}^{p} |βⱼ|^q)^{1/q}
  • Feature Selection: filters vs. wrappers vs. embedding

    - Main goal: rank subsets of useful features

    From Dr. Isabelle Guyon

  • Complexity versus Goodness of Fit: Model Selection

    [Figure: the same training data (y vs. x) fit by models of increasing complexity: too simple? (low variance / high bias), about right?, too complex? (low bias / high variance)]

    What ultimately matters: GENERALIZATION

  • e.g. By k = 10-fold Cross-Validation

    model  P1     P2     P3     P4     P5     P6     P7     P8     P9     P10
    1      train  train  train  train  train  train  train  train  train  test
    2      train  train  train  train  train  train  train  train  test   train
    3      train  train  train  train  train  train  train  test   train  train
    4      train  train  train  train  train  train  test   train  train  train
    5      train  train  train  train  train  test   train  train  train  train
    6      train  train  train  train  test   train  train  train  train  train
    7      train  train  train  test   train  train  train  train  train  train
    8      train  train  test   train  train  train  train  train  train  train
    9      train  test   train  train  train  train  train  train  train  train
    10     test   train  train  train  train  train  train  train  train  train

    - Divide the data into 10 equal pieces
    - Use 9 pieces as the training set and the remaining 1 as the test set
    - Collect the scores from the diagonal
    - We normally use the mean of the scores

    Make sure that the train/test/validation folds are indeed independent samples.
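The rotation in the table above can be sketched directly. A minimal k-fold illustration for least-squares regression, with my own toy data; it assumes the samples are already independent and shuffled, per the caution above:

```python
import numpy as np

def kfold_mse(X, y, k=10):
    """Mean test MSE over k folds; each fold plays the test-set role once."""
    folds = np.array_split(np.arange(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        theta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
        scores.append(np.mean((X[test] @ theta - y[test]) ** 2))
    return np.mean(scores)           # we normally report the mean of the scores

rng = np.random.default_rng(4)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=100)
score = kfold_mse(X, y, k=10)
```

Each of the k models sees 9/10 of the data, and the k held-out scores correspond to the diagonal of the table.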

  • Evaluation, e.g. Regression (1D example)

    ŷ = θ₀ + θ₁ x₁

    θ* = (XᵀX)⁻¹ Xᵀ y

    Testing MSE error to report (εᵢ denotes the residual on test sample i):

    J_test = (1/m) ∑_{i=n+1}^{n+m} (xᵢᵀθ* − yᵢ)² = (1/m) ∑_{i=n+1}^{n+m} εᵢ²
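The held-out evaluation above, with n training and m test samples, can be sketched like this (toy 1D data and names are mine):

```python
import numpy as np

# n training and m test samples for yhat = theta0 + theta1 * x1
rng = np.random.default_rng(5)
n, m = 80, 20
x = rng.uniform(0, 1, n + m)
y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=n + m)
X = np.c_[np.ones(n + m), x]

# theta* = (X^T X)^{-1} X^T y, fit on the first n samples only
theta_star = np.linalg.solve(X[:n].T @ X[:n], X[:n].T @ y[:n])

# Testing MSE: J_test = (1/m) * sum of squared residuals eps_i on the held-out samples
eps = X[n:] @ theta_star - y[n:]
J_test = np.mean(eps ** 2)
```

The key discipline is that the test samples i = n+1, ..., n+m never touch the fitting step; J_test is what gets reported.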

  • e.g. A Practical Application of a Regression Model

    Proceedings of HLT 2010: Human Language Technologies

  • The feature weights can be directly interpreted as U.S. dollars contributed to the predicted value ŷ by each occurrence of the feature.

    A REAL APPLICATION: Movie Reviews and Revenues: An Experiment in Text Regression, Proceedings of HLT '10: Human Language Technologies

  • A combination of the meta and text features achieves the best performance both in terms of MAE and Pearson's r.

  • Movie Reviews and Revenues: An Experiment in Text Regression, Proceedings of HLT '10: Human Language Technologies

    The features are from the text-only model annotated in Table 2 (total, not per screen). The feature weights can be directly interpreted as U.S. dollars contributed to the predicted value by each occurrence of the feature. Sentiment-related text features are not as prominent as might be expected, and their overall proportion in the set of features with non-zero weights is quite small (estimated in preliminary trials at less than 15%). Phrases that refer to metadata are the more highly weighted and frequent ones.

  • An Operational Model of Machine Learning

    - A Learner consumes Reference Data, which consists of input-output pairs, and produces a Model.
    - At deployment, an Execution Engine applies the Model to Production Data, producing Tagged Data.

  • Goals in General

    - 1. Generalize well: connecting to the asymptotic ERROR BOUND
    - 2. Interpretable: especially for some domains, this is about trust!
    - 3. Computationally efficient

  • Probabilistic Interpretation of Linear Regression (LATER)

    - Let us assume that the target variable and the inputs are related by the equation:

      yᵢ = θᵀxᵢ + εᵢ

      where ε is an error term of unmodeled effects or random noise.

    - Now assume that ε follows a Gaussian N(0, σ); then we have:

      p(yᵢ | xᵢ; θ) = (1 / (√(2π) σ)) exp( −(yᵢ − θᵀxᵢ)² / (2σ²) )

    - By the iid (among samples) assumption:

      L(θ) = ∏_{i=1}^{n} p(yᵢ | xᵢ; θ) = (1 / (√(2π) σ))ⁿ exp( −∑_{i=1}^{n} (yᵢ − θᵀxᵢ)² / (2σ²) )

    Many more variations of linear regression follow from this perspective, e.g. binomial/Poisson (LATER).
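The likelihood above implies that maximizing L(θ) is the same as minimizing the sum of squared errors, so the MLE coincides with the least-squares solution. A minimal numerical check (toy data and names are mine):

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.c_[np.ones(200), rng.normal(size=(200, 1))]
y = X @ np.array([1.0, -1.5]) + 0.3 * rng.normal(size=200)
sigma = 0.3

def log_likelihood(theta):
    """log L(theta) for y_i ~ N(theta^T x_i, sigma^2), iid across samples."""
    r = y - X @ theta
    return -len(y) * np.log(np.sqrt(2 * np.pi) * sigma) - np.sum(r ** 2) / (2 * sigma ** 2)

# The least-squares solution maximizes the log-likelihood, since the log
# turns the Gaussian product into a negated sum of squared errors
theta_mle = np.linalg.solve(X.T @ X, X.T @ y)
```

Perturbing θ away from the least-squares solution can only lower the log-likelihood, which is the point of the derivation.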

  • References

    - Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides
    - Prof. Alexander Gray's slides