Middle Term Exam


Transcript of Middle Term Exam

Page 1: Middle Term Exam

Middle Term Exam

• 03/04, in class

Page 2: Middle Term Exam

Project

• It is a team effort: no more than 2 people per team

• Define a project of your own; otherwise, I will assign you a "tough" project

• Important dates:
  03/23: project proposal
  04/27 and 04/29: presentations
  05/02: final report

Page 3: Middle Term Exam

Project Proposal

• Introduction: describe the research problem

• Related work: describe the existing approaches and their deficiencies

• Proposed approach: describe your approach and its potential to overcome the shortcomings of existing approaches

• Plan: the plan for this project (code development, data sets, and evaluation)

• Format: it should look like a research paper; the required format (both Microsoft Word and LaTeX) can be downloaded from www.cse.msu.edu/~cse847/assignments/format.zip

• Warning: any submission that does not follow the format will be given a zero score.

Page 4: Middle Term Exam

Project Report

• The same format as the proposal

• Expand the proposal with a detailed description of your algorithm and evaluation results

• Presentation: 25-minute presentation, 5-minute discussion

Page 5: Middle Term Exam

Introduction to Information Theory

Rong Jin

Page 6: Middle Term Exam

Information

Information ≠ knowledge

Information: a reduction in uncertainty

Example:
1. flip a coin
2. roll a die

#2 is more uncertain than #1; therefore, more information is provided by the outcome of #2 than by #1.

Page 7: Middle Term Exam

Definition of Information

Let E be some event that occurs with probability P(E). If we are told that E has occurred, then we say we have received

  I(E) = log2(1/P(E))

bits of information.

Example:
• Result of a fair coin flip: log2 2 = 1 bit
• Result of a fair die roll: log2 6 ≈ 2.585 bits
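As a quick numeric check of this definition, here is a minimal sketch (my own code, not from the slides; the helper name info_bits is hypothetical):

import math

def info_bits(p_event):
    # I(E) = log2(1 / P(E)): the information received when event E occurs.
    return math.log2(1.0 / p_event)

print(info_bits(1 / 2))  # fair coin flip -> 1.0 bit
print(info_bits(1 / 6))  # fair die roll  -> ~2.585 bits

Rarer events carry more bits: as P(E) shrinks, I(E) grows without bound.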

Page 8: Middle Term Exam

Entropy

• A zero-memory information source S is a source that emits symbols from an alphabet {s1, s2,…, sk} with probability {p1, p2,…,pk}, respectively, where the symbols emitted are statistically independent.

• Entropy is the average amount of information in observing the output from S:

  H(S) = Σi pi log2(1/pi)
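A minimal sketch of this computation (my own code; entropy_bits is a hypothetical name):

import math

def entropy_bits(probs):
    # H(S) = sum_i p_i * log2(1 / p_i): average information per emitted symbol.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))   # fair coin: 1.0 bit
print(entropy_bits([1/6] * 6))    # fair die: ~2.585 bits
print(entropy_bits([0.9, 0.1]))   # biased coin: ~0.469 bits (less uncertain)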

Page 9: Middle Term Exam

Entropy

[Figure: entropy of a two-outcome source as a function of p, rising from 0 at p = 0 to its maximum at the uniform point p = 0.5 and falling back to 0 at p = 1]

1. 0 ≤ H(P) ≤ log k
2. H(P) measures the uniformness of a distribution P: the further P is from uniform, the lower the entropy.
3. For any other probability distribution {q1, …, qk}: Σi pi log2(1/pi) ≤ Σi pi log2(1/qi) (Gibbs' inequality)

Page 10: Middle Term Exam

A Distance Measure Between Distributions

• Kullback-Leibler distance between distributions P and Q:

  D(P, Q) = Σi pi log2(pi / qi)

• 0 ≤ D(P, Q)
• The smaller D(P, Q) is, the more similar Q is to P
• Non-symmetric: D(P, Q) ≠ D(Q, P) in general (see the sketch below)
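A small sketch illustrating these three properties (my own code, assuming the base-2 definition above):

import math

def kl(p, q):
    # D(P, Q) = sum_i p_i * log2(p_i / q_i); conventionally 0 * log(0/q) = 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.5]
Q = [0.9, 0.1]
print(kl(P, P))  # 0.0: a distribution has zero distance to itself
print(kl(P, Q))  # ~0.737 bits
print(kl(Q, P))  # ~0.531 bits: D(P, Q) != D(Q, P)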

Page 11: Middle Term Exam

Mutual Information

• Indicates the amount of information shared between two random variables:

  I(X; Y) = Σx,y p(x, y) log2( p(x, y) / (p(x) p(y)) ) = D( p(x, y), p(x) p(y) )

• Symmetric: I(X; Y) = I(Y; X)
• Zero iff X and Y are independent
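A sketch computing I(X; Y) from a joint distribution table (my own code; mutual_information is a hypothetical helper):

import math

def mutual_information(joint):
    # I(X; Y) = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # independent -> 0.0
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # identical -> 1.0 bit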

Page 12: Middle Term Exam

Maximum Entropy

Rong Jin

Page 13: Middle Term Exam

Motivation

Consider a translation example:
• English 'in' → French {dans, en, à, au-cours-de, pendant}
• Goal: estimate p(dans), p(en), p(à), p(au-cours-de), p(pendant)

Case 1: no prior knowledge of the translation (the intuitive choice is the uniform distribution: 1/5 each)

Case 2: 30% of the time either dans or en is used (intuitively: split 3/10 evenly between dans and en, and 7/10 evenly among the remaining three)

Page 14: Middle Term Exam

Maximum Entropy Model: Motivation

• Case 3: 30% of the time dans or en is used, and 50% of the time dans or à is used

• Now the intuitive choice is no longer obvious; we need a measure of the uniformness of a distribution

Page 15: Middle Term Exam

Maximum Entropy Principle (MaxEnt)

• MaxEnt solution (rounded): p(dans) = 0.2, p(à) = 0.3, p(en) = 0.1, p(au-cours-de) = 0.2, p(pendant) = 0.2

P* = argmax_P H(P)

where

H(P) = - [ p(dans) log p(dans) + p(en) log p(en) + p(à) log p(à)
         + p(au-cours-de) log p(au-cours-de) + p(pendant) log p(pendant) ]

subject to

p(dans) + p(en) = 3/10
p(dans) + p(à) = 1/2
p(dans) + p(en) + p(à) + p(au-cours-de) + p(pendant) = 1
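This constrained problem can also be solved numerically. Below is a sketch using scipy.optimize (my own code, not from the slides); it recovers, up to rounding, the distribution listed above:

import numpy as np
from scipy.optimize import minimize

words = ["dans", "en", "a", "au-cours-de", "pendant"]

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)            # guard against log(0)
    return float(np.sum(p * np.log(p)))   # minimizing -H(P) maximizes entropy

constraints = [
    {"type": "eq", "fun": lambda p: p[0] + p[1] - 0.3},  # p(dans) + p(en) = 3/10
    {"type": "eq", "fun": lambda p: p[0] + p[2] - 0.5},  # p(dans) + p(a)  = 1/2
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},    # probabilities sum to 1
]
result = minimize(neg_entropy, np.full(5, 0.2),
                  bounds=[(0.0, 1.0)] * 5, constraints=constraints)
for w, p in zip(words, result.x):
    print(f"p({w}) = {p:.3f}")
# Exact optimum: (0.186, 0.114, 0.314, 0.193, 0.193), which rounds to the slide's values.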

Page 16: Middle Term Exam

MaxEnt for Classification

Objective: learn the conditional distribution p(y|x)

Constraints:
• Appropriate normalization: Σy p(y|x) = 1 for every x

Page 17: Middle Term Exam

MaxEnt for Classification

Constraints:
• Consistent with data: for every feature function, the empirical mean must equal the model mean

• Feature function: f(x, y)

• Empirical mean of feature functions: Edata[f] = (1/N) Σi f(xi, yi)

• Model mean of feature functions: Ep[f] = (1/N) Σi Σy p(y|xi) f(xi, y)

A sketch of the two means follows.
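This is entirely my own illustration on toy data; the array shapes and names are assumptions, not the course's notation:

import numpy as np

# f[i, y, j] = value of feature j on example x_i paired with candidate label y.
N, K, J = 4, 2, 3
rng = np.random.default_rng(0)
f = rng.integers(0, 2, size=(N, K, J)).astype(float)  # toy binary features
y_true = np.array([0, 1, 1, 0])                       # observed labels

# Empirical mean: features evaluated at the observed labels, averaged over data.
empirical = f[np.arange(N), y_true].mean(axis=0)

# Model mean: features averaged under the model's p(y|x) (here a uniform model).
p = np.full((N, K), 1.0 / K)
model = (p[:, :, None] * f).sum(axis=1).mean(axis=0)

print("empirical:", empirical)
print("model:    ", model)

The "consistent with data" constraint requires these two vectors to be equal.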

Page 18: Middle Term Exam

MaxEnt for Classification

• No assumption about the form of p(y|x) (non-parametric)
• Only needs the empirical means of the feature functions

Page 19: Middle Term Exam

MaxEnt for Classification

• Feature function

Page 20: Middle Term Exam

Example of Feature Functions

Word                f1(x) = I(x ∈ {dans, en})   f2(x) = I(x ∈ {dans, à})
dans                          1                            1
en                            1                            0
au-cours-de                   0                            0
à                             0                            1
pendant                       0                            0
Empirical average             0.3                          0.5

Page 21: Middle Term Exam

Solution to MaxEnt

• Identical to the conditional exponential model:

  p(y|x) = exp( Σj wj fj(x, y) ) / Σy' exp( Σj wj fj(x, y') )

• Solve for the weights W by maximum likelihood estimation

Page 22: Middle Term Exam

Iterative Scaling (IS) Algorithm

Assume the feature functions sum to a constant:

  Σj fj(x, y) = C for all (x, y)

Page 23: Middle Term Exam

Iterative Scaling (IS) Algorithm

• Compute the empirical mean of every feature for every class
• Initialize W
• Repeat:
  • Compute p(y|x) for each training example (xi, yi) using W
  • Compute the model mean of every feature for every class
  • Update W: wj ← wj + (1/C) log( empirical mean / model mean )

A sketch follows.
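A hedged sketch of this procedure (my own code; it assumes the constant feature-sum condition from page 22 and nonzero empirical means):

import numpy as np

def gis(f, y_true, n_iters=200):
    # f: (N, K, J) feature values; assumes sum_j f[i, y, j] == C for all (i, y).
    N, K, J = f.shape
    C = float(f.sum(axis=2).max())
    w = np.zeros(J)
    empirical = f[np.arange(N), y_true].mean(axis=0)
    for _ in range(n_iters):
        scores = f @ w                              # (N, K) unnormalized log-probs
        scores -= scores.max(axis=1, keepdims=True)
        p = np.exp(scores)
        p /= p.sum(axis=1, keepdims=True)           # p(y | x_i) under current W
        model = (p[:, :, None] * f).sum(axis=1).mean(axis=0)
        w += np.log(empirical / model) / C          # the multiplicative IS update
    return w

# Tiny usage: one-hot class features (J == K), so sum_j f_j(x, y) = 1 for all (x, y).
N, K = 8, 2
f = np.zeros((N, K, K))
for y in range(K):
    f[:, y, y] = 1.0
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
w = gis(f, y_true)
print(np.exp(w) / np.exp(w).sum())  # converges to the class frequencies (3/8, 5/8)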

Page 24: Middle Term Exam

Iterative Scaling (IS) Algorithm

• It guarantees that the likelihood function never decreases

Page 25: Middle Term Exam

Iterative Scaling (IS) Algorithm

• What about features that can take both positive and negative values?

• What if the sum of the features is not a constant?

Page 26: Middle Term Exam

MaxEnt for Classification

Page 27: Middle Term Exam

MaxEnt for Classification