What’slearning?...

29
1 ©2016 Sham Kakade 1 What’s learning? Point Estimation Machine Learning – CSE546 Sham Kakade University of Washington September 28, 2016 http://www.cs.washington.edu/education/courses/cse546/16au/ 2 ©2016 Sham Kakade What is Machine Learning ?

Transcript of What’slearning?...

Page 1: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

1

©2016 Sham Kakade 1

What’s learning?Point Estimation

Machine Learning – CSE546Sham KakadeUniversity of Washington

September 28, 2016

http://www.cs.washington.edu/education/courses/cse546/16au/

2©2016 Sham Kakade

What is Machine Learning ?

Page 2: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

2

3©2016 Sham Kakade

Machine Learning

Study of algorithms thatn improve their performancen at some taskn with experience

Data UnderstandingMachine Learning

4©2016 Sham Kakade

Classification

from data to discrete classes

Page 3: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

3

Spam filtering

5©2016 Sham Kakade

data prediction

6©2016 Sham Kakade

Text classification

Company home page

vs

Personal home page

vs

University home page

vs

Page 4: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

4

7©2016 Sham Kakade

Object detection

Example training images for each orientation

(Prof. H. Schneiderman)

8©2016 Sham Kakade

Reading a noun (vs verb)

[Rustandi et al., 2005]

Page 5: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

5

Weather prediction

9©2016 Sham Kakade

The classification pipeline

10©2016 Sham Kakade

Training

Testing

Page 6: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

6

11©2016 Sham Kakade

Regression

predicting a numeric value

Stock market

12©2016 Sham Kakade

Page 7: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

7

Weather prediction revisited

13©2016 Sham Kakade

Temperature

14©2016 Sham Kakade

Modeling sensor data

n Measure temperatures at some locations

n Predict temperatures throughout the environment15

20

25

30

tem

pera

ture

(C

)

Page 8: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

8

15©2016 Sham Kakade

Similarity

finding data

Given image, find similar images

16©2016 Sham Kakade http://www.tiltomo.com/

Page 9: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

9

Similar products

17©2016 Sham Kakade

18©2016 Sham Kakade

Clustering

discovering structure in data

Page 10: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

10

Clustering Data: Group similar things

©2016 Sham Kakade 19

Clustering images

20©2016 Sham Kakade [Goldberger et al.]

Set of Images

Page 11: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

11

Clustering web search results

21©2016 Sham Kakade

22©2016 Sham Kakade

Embedding

visualizing data

Page 12: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

12

Embedding images

23©2016 Sham Kakade

Images have thousands or millions of pixels.

Can we give each image a coordinate,

such that similar images are near each other?

[Saul & Roweis ‘03]

Embedding words

24©2016 Sham Kakade [Joseph Turian]

Page 13: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

13

Embedding words (zoom in)

25©2016 Sham Kakade [Joseph Turian]

26©2016 Sham Kakade

Reinforcement Learning

training by feedback

Page 14: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

14

27©2016 Sham Kakade

Learning to act

n Reinforcement learningn An agent

¨ Makes sensor observations¨ Must select action¨ Receives rewards

n positive for “good” statesn negative for “bad” states

[Ng et al. ’05]

28©2016 Sham Kakade

Impact

What are the biggest successes?

Page 15: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

15

29©2016 Sham Kakade

Successes

n Speech Recognition¨ SIRI, Alexa, etc.

n Computer vision¨ ImageNet

n Alpha-­Go¨ Game playing¨ Go was ‘solved’ with ML/AI

n And more:¨ Natural language processing¨ Robotics (self-­driving cars?)¨ Medical analysis¨ Computational biology

30©2016 Sham Kakade

Growth of Machine Learning

n Machine learning is preferred approach to¨ Speech recognition, Natural language processing¨ Computer vision¨ Medical outcomes analysis¨ Robot control¨ Computational biology¨ Sensor networks¨ …

n This trend is accelerating, especially with Big Data¨ Improved machine learning algorithms ¨ Improved data capture, networking, faster computers¨ Software too complex to write by hand¨ New sensors / IO devices¨ Demand for self-­customization to user, environment

One of the most sought for specialties in industry today.

Page 16: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

16

31©2016 Sham Kakade

Logistics

32©2016 Sham Kakade

Syllabus

n Covers a wide range of Machine Learning techniques – from basic to state-­of-­the-­art

n You will learn about the methods you heard about:¨ Point estimation, regression, logistic regression, optimization, nearest-­neighbor, decision trees, boosting, perceptron, overfitting, regularization, dimensionality reduction, PCA, error bounds, SVMs, kernels, margin bounds, K-­means, EM, mixture models, HMMs, graphical models, deep learning, reinforcement learning…

n Covers algorithms, theory and applicationsn It’s going to be fun and hard work.

Page 17: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

17

33©2016 Sham Kakade

Prerequisites

n Linear algebra:¨ SVDs, eigenvectors, matrix multiplication

n Probabilities ¨ Distributions, densities, marginalization…

n Basic statistics¨ Moments, typical distributions, regression…

n Algorithms¨ Dynamic programming, basic data structures, complexity…

n Programming¨ Python will be very useful

n We provide some background, but the class will be fast paced

n Ability to deal with “abstract mathematical concepts”

Recitations & Python

n We’ll run an optional recitations:¨ Time/Location

n We are recommending Python for homeworks!¨ There are many resources to get started with Python online

¨We’ll run an optional tutorial:n First recitation: next week

34©2016 Sham Kakade

Page 18: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

18

35©2016 Sham Kakade

Staff

n Three Great TAs: Great resource for learning, interact with them!¨ Dae Hyun LeeOffice hours: TBD

¨ Angli LiuOffice hours: TBD

¨ Alon MilchgrubOffice hours: TBD

36©2016 Sham Kakade

Communication Channels

n Announcements on Canvas.n Use the Discussion board!

¨All non-­personal questions should go here¨Answering your question will help others¨ Feel free to chime in

n For e-­mailing instructors about personal issues and grading use:¨ cse546-­[email protected]

n Office hours limited to knowledge based questions. Use email for all grading questions.

Page 19: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

19

37©2016 Sham Kakade

Text Books

n Required Textbook: ¨ Machine Learning: a Probabilistic Perspective;; Kevin Murphy

n Optional Books:¨ Understanding Machine Learning: From Theory to Algorithms;; Shai Shalev-­Shwartz and Shai Ben-­David.

¨ Pattern Recognition and Machine Learning;; Chris Bishop¨ The Elements of Statistical Learning: Data Mining, Inference, and Prediction;; Trevor Hastie, Robert Tibshirani, Jerome Friedman

¨ Machine Learning;; Tom Mitchell¨ Information Theory, Inference, and Learning Algorithms;; David MacKay

38©2016 Sham Kakade

Grading

n 4 homeworks (65%)¨ First posted today

n Start early!¨HW 1,2,4 (15%)

n Collaboration allowedn You must write (and submit) your own code, which we may run.

n You must write (and understand) your own answers.¨HW 3 midterm (20%)

n No collaboration allowed.

n Final project (35%)¨ Full details: see website¨Projects done individually, or groups of two students

Page 20: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

20

39©2016 Sham Kakade

HW Policy (SEE WEBSITE)n Homeworks are hard/long, start early

¨ Heavy programming component.¨ They will build on themselves (you will re-­use your code).

n 33% subtracted per late day.n You have 2 LATE DAYS to use for homeworks throughout the quarter

¨ Please plan accordingly.¨ No exceptions (aside from university policies).

n All homeworks must be handed in, even for zero credit.n Use Canvas to submit homeworks.n No collaboration allowed on HW 3 n Collaboration: HW 1,2,4

¨ Each student writes (and understands) their own answers.¨ You may discuss the questions.¨ Write on your homework anyone with whom you collaborate.¨ Each student must write their own code for the programming part.¨ Please don’t search for answers on the web, Google, previous years’ homeworks, etc.

n please ask us if you are not sure if you can use a particular reference

Projects (35%)n SEE WEBSITEn An opportunity/intro for research.

¨ encouraged to be related to your research, but must be something new you did this quarter

¨ It’s Not a project you worked on during the summer, last year, etc.n Grading:

¨ We seek some novel exploration.¨ If you write your own code, great. We take this into account for grading. ¨ You may use ML toolkits (e.g. TensorFlow, etc), then we expect more ambitious project (in terms of scope, data, etc).

¨ If you use simpler/smaller datasets, then we expect a more involved analysis.

n Individually or groups of twon Must involve real data

¨ Must be data that you have available to you by the time of the project proposalsn Must involve machine learning

40©2016 Sham Kakade

Page 21: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

21

(tentative) project dates (35%)n Full details in a couple of weeks

n Mon., October 24, 5p: Project Proposalsn Mon., November 14, 5p: Project Milestonen Thu., December 8, 9-­11:30am: Poster Sessionn Thu., December 15, 10am: Project Report

41©2016 Sham Kakade

42©2016 Sham Kakade

Enjoy!

n ML is becoming ubiquitous in science, engineering and beyond

n It’s one of the hottest topics in industry today n This class should give you the basic foundation for applying ML and developing new methods

n Have fun..

Page 22: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

22

43©2016 Sham Kakade

A Data Science Job

n Someone asks you a stat/data science question:¨She says: I have thumbtack, if I flip it, what’s the probability it will fall with the nail up?

¨You say: Please flip it a few times:

¨You say: The probability is:

¨She says: Why???¨You say: Because…

44©2016 Sham Kakade

Thumbtack – Binomial Distribution

n P(Heads) = q, P(Tails) = 1-­q

n Flips are i.i.d.:¨ Independent events¨ Identically distributed according to Binomial distribution

n Sequence D of aH Heads and aT Tails

Page 23: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

23

45©2016 Sham Kakade

Maximum Likelihood Estimation

n Data: Observed set D of aH Heads and aT Tails n Hypothesis: Binomial distribution n Learning q is an optimization problem

¨What’s the objective function?

n MLE: Choose q that maximizes the probability of observed data:

46©2016 Sham Kakade

Your first learning algorithm

n Set derivative to zero:

Page 24: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

24

47©2016 Sham Kakade

How many flips do I need?

n She says: I flipped 3 heads and 2 tails.n You say: q = 3/5, I can prove it!n She says: What if I flipped 30 heads and 20 tails?n You say: Same answer, I can prove it!

nShe says: What’s better?n You say: Humm… The more the merrier???n She says: Is this why I am paying you the big bucks???

48©2016 Sham Kakade

Simple bound (based on Hoeffding’s inequality)

n For N = aH+aT, and

n Let q* be the true parameter, for any e>0:

Page 25: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

25

49©2016 Sham Kakade

PAC Learning

n PAC: Probably Approximate Correctn Billionaire says: I want to know the thumbtack parameter q, within e = 0.1, with probability at least 1-­d = 0.95. How many flips?

50©2016 Sham Kakade

What about continuous variables?

n She says: If I am measuring a continuous variable, what can you do for me?

n You say: Let me tell you about Gaussians…

Page 26: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

26

51©2016 Sham Kakade

Some properties of Gaussians

n affine transformation (multiplying by scalar and adding a constant)¨X ~ N(µ,s2)¨Y = aX + b è Y ~ N(aµ+b,a2s2)

n Sum of Gaussians¨X ~ N(µX,s2X)¨Y ~ N(µY,s2Y)¨ Z = X+Y è Z ~ N(µX+µY, s2X+s2Y)

52©2016 Sham Kakade

Learning a Gaussian

n Collect a bunch of data¨Hopefully, i.i.d. samples¨ e.g., exam scores

n Learn parameters¨Mean¨Variance

Page 27: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

27

53©2016 Sham Kakade

MLE for Gaussian

n Prob. of i.i.d. samples D=x1,…,xN:

n Log-­likelihood of data:

54©2016 Sham Kakade

Your second learning algorithm:MLE for mean of a Gaussian

n What’s MLE for mean?

Page 28: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

28

55©2016 Sham Kakade

MLE for variance

n Again, set derivative to zero:

56©2016 Sham Kakade

Learning Gaussian parameters

n MLE:

n BTW. MLE for the variance of a Gaussian is biased¨Expected result of estimation is not true parameter! ¨Unbiased variance estimator:

Page 29: What’slearning? Point&Estimationcourses.cs.washington.edu/courses/cse546/16au/slides/lecture01.pdf · 2 ©2016&Sham&Kakade 3 Machine&Learning Studyof&algorithmsthat! improve&their&performance!

29

What you need to know…

n Learning is…¨ Collect some data

n E.g., thumbtack flips¨ Choose a hypothesis class or model

n E.g., binomial¨ Choose a loss function

n E.g., data likelihood¨ Choose an optimization procedure

n E.g., set derivative to zero to obtain MLE

n Like everything in life, there is a lot more to learn…¨ Many more facets… Many more nuances… ¨ More later…

57©2016 Sham Kakade