
Page 1

MAXIMUM ENTROPY MARKOV MODEL

Adapted from: Heshaam Faili, University of Tehran

100050052 – Dikkala Sai Nishanth
100050056 – Ashwin P. Paranjape
100050057 – Vipul Singh

Page 2

Introduction

• Need for MEMM in NLP

• MEMM and the feature and weight vectors

• Linear and Logistic Regression (MEMM)

• Learning in logistic regression

• Why call it Maximum Entropy?

Page 3

Need for MEMM in NLP


Page 4

• HMM – tag and observed word both depend only on previous tag

• Need to account for dependency of tag on observed word

• Need to extract “features” from the word and use them
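As an illustration, feature extraction for a word in context might look like the sketch below (the feature names are invented for this sketch, not taken from the slides):

```python
def word_features(word, prev_word):
    """Extract simple 0/1 features from a word and its left context.

    The particular features are illustrative inventions.
    """
    return {
        "ends_in_ing": int(word.endswith("ing")),
        "is_capitalized": int(word[:1].isupper()),
        "contains_digit": int(any(ch.isdigit() for ch in word)),
        "prev_word_is_the": int(prev_word.lower() == "the"),
    }

feats = word_features("racing", "the")
```

An HMM cannot condition the tag on such word-internal evidence; a MEMM can.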


Page 5

MAXIMUM ENTROPY MODELS

• Machine learning framework called Maximum Entropy modeling (MaxEnt)

• Used for classification

– The task of classification is to take a single observation, extract some useful features describing the observation, and then, based on these features, classify the observation into one of a set of discrete classes.

• Probabilistic classifier: gives the probability of the observation being in that class

• Non-sequential classification:
– In text classification we might need to decide whether a particular email should be classified as spam or not
– In sentiment analysis we have to determine whether a particular sentence or document expresses a positive or negative opinion
– We’ll need to classify a period character (‘.’) as either a sentence boundary or not


Page 6

MaxEnt

• MaxEnt belongs to the family of classifiers known as the exponential or log-linear classifiers

• MaxEnt works by extracting some set of features from the input, combining them linearly (meaning that we multiply each by a weight and then add them up), and then using this sum as an exponent

• Example: tagging
– A feature for tagging might be “this word ends in -ing” or “the previous word was ‘the’”
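The "weight, sum, exponentiate, normalize" recipe can be sketched directly; the weights below are made up for illustration:

```python
import math

def loglinear_probs(features, class_weights):
    """Combine features linearly per class, exponentiate, then normalize:
    P(c | x) = exp(sum_i w_{c,i} * f_i(x)) / Z."""
    scores = {c: math.exp(sum(w.get(f, 0.0) * v for f, v in features.items()))
              for c, w in class_weights.items()}
    z = sum(scores.values())  # normalization constant
    return {c: s / z for c, s in scores.items()}

feats = {"ends_in_ing": 1.0, "prev_word_is_the": 0.0}
weights = {"VBG": {"ends_in_ing": 2.0}, "NN": {"ends_in_ing": -1.0}}
probs = loglinear_probs(feats, weights)  # favors VBG for an -ing word
```

Exponentiating guarantees positive scores, and dividing by Z turns them into a proper probability distribution.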


Page 7

Linear Regression

• Two different names for tasks that map some input features into some output value: regression when the output is real-valued, and classification when the output is one of a discrete set of classes


Page 8

price = w0 + w1 ∗ Num Adjectives

Page 9

Multiple linear regression

• price = w0 + w1 ∗ Num Adjectives + w2 ∗ Mortgage Rate + w3 ∗ Num Unsold Houses
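In code, the multiple-regression prediction is just a weighted sum plus an intercept; the weight values below are invented purely for illustration:

```python
def predict_price(w0, w1, w2, w3, num_adjectives, mortgage_rate, num_unsold_houses):
    """price = w0 + w1*num_adjectives + w2*mortgage_rate + w3*num_unsold_houses"""
    return w0 + w1 * num_adjectives + w2 * mortgage_rate + w3 * num_unsold_houses

# hypothetical weights: a base price, lowered by each extra adjective in the ad,
# by higher mortgage rates, and by local unsold inventory
price = predict_price(200000.0, -5000.0, -10000.0, -100.0,
                      num_adjectives=3, mortgage_rate=6.5, num_unsold_houses=120)
```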


Page 10

Learning in linear regression

• Weights are chosen to minimize the sum-squared error

• X is the matrix of feature vectors
• y is the vector of observed prices (costs)
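A minimal sketch of learning by minimizing sum-squared error, written out for the single-feature case (the closed-form solution; with many features the analogous matrix form over X and y is used):

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y ~ w0 + w1*x, minimizing sum-squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    w0 = my - w1 * mx  # intercept passes through the means
    return w0, w1

w0, w1 = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])  # exact line y = 1 + 2x
```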


Page 11

Logistic regression

• Classification in which the output y we are trying to predict takes on one of a small set of discrete values

• Binary classification: y ∈ {0, 1}

• Odds: p / (1 − p)

• Logit function: logit(p) = ln(p / (1 − p))
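The odds and logit transforms link a linear score to a probability, with the sigmoid as the logit's inverse; a small sketch:

```python
import math

def sigmoid(z):
    """Inverse of the logit: maps a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """odds = p / (1 - p)"""
    return p / (1.0 - p)

def logit(p):
    """logit(p) = ln(p / (1 - p)), the log-odds."""
    return math.log(odds(p))

# a linear score of 0 corresponds to probability 0.5, i.e. even odds
p = sigmoid(0.0)
```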

Page 12

Logistic regression


Page 13

Logistic regression


Page 14

Logistic regression: Classification


• The decision boundary w · x = 0 is a hyperplane in feature space
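The resulting classifier predicts class 1 exactly when the point lies on the positive side of the hyperplane; the weights below are illustrative:

```python
def classify(w, x):
    """Predict 1 iff w . x > 0, equivalently iff sigmoid(w . x) > 0.5.
    Points with w . x = 0 lie on the separating hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else 0

label = classify([1.0, -2.0, 0.5], [3.0, 1.0, 1.0])  # score = 1.5 > 0, so class 1
```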

Page 15

Learning in logistic regression


• Weights are learned by conditional maximum likelihood estimation
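A minimal sketch of conditional maximum likelihood estimation by batch gradient ascent for the binary case (the toy data, learning rate, and epoch count are invented for illustration):

```python
import math

def train_logreg(data, lr=0.5, epochs=200):
    """Maximize sum_j log P(y_j | x_j) by gradient ascent.
    For logistic regression the gradient for weight i is
    sum_j (y_j - sigmoid(w . x_j)) * x_ji."""
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for i, xi in enumerate(x):
                grad[i] += (y - p) * xi
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w

# last component is a constant-1 "bias" feature
data = [([0.0, 1.0], 0), ([1.0, 1.0], 0), ([3.0, 1.0], 1), ([4.0, 1.0], 1)]
w = train_logreg(data)
```

Because the objective is concave in w, plain gradient ascent converges without getting stuck in local optima.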

Page 16

Learning in logistic regression

• Finding the weights is a convex optimization problem

• Generalized Iterative Scaling (GIS) is one standard algorithm for fitting MaxEnt models

Page 17

MAXIMUM ENTROPY MODELING

• Multinomial logistic regression (MaxEnt)
– Most classification problems that come up in language processing involve larger numbers of classes (e.g., part-of-speech classes)

• y takes on one of C different values, corresponding to the classes c1, …, cC
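For the multi-class case, the sigmoid is replaced by a normalization over all C classes (the softmax form); a sketch with invented tag weights:

```python
import math

def softmax_probs(x_feats, class_weights):
    """Multinomial logistic regression:
    P(c | x) = exp(w_c . f(x)) / sum_c' exp(w_c' . f(x))."""
    scores = {c: sum(wi * xi for wi, xi in zip(wc, x_feats))
              for c, wc in class_weights.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

probs = softmax_probs([1.0, 0.0],
                      {"NN": [1.0, 0.0], "VB": [0.0, 1.0], "JJ": [0.5, 0.5]})
```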


Page 18

Maximum Entropy Modeling

• Indicator function: A feature that only takes on the values 0 and 1


Page 19

Maximum Entropy Modeling

• Example
– Secretariat/NNP is/BEZ expected/VBN to/TO race/?? tomorrow/
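Indicator features for deciding the tag of "race" might be written as follows; the particular features are illustrative, in the spirit of a MaxEnt tagger:

```python
def f1(word, prev_tag, tag):
    """1 if the candidate tag is VB and the previous tag is TO, else 0."""
    return 1 if tag == "VB" and prev_tag == "TO" else 0

def f2(word, prev_tag, tag):
    """1 if the word ends in -ing and the candidate tag is VBG, else 0."""
    return 1 if word.endswith("ing") and tag == "VBG" else 0

# for "to/TO race/??", f1 fires when we consider the candidate tag VB
fired = f1("race", "TO", "VB")
```

Each such feature gets its own weight, so evidence like "follows TO" can push the model toward the verb reading of "race".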


Page 20

Maximum Entropy Modeling


Page 21

Occam's Razor

• Adopting the least complex hypothesis possible is embodied in Occam's razor ("Numquam ponenda est pluralitas sine necessitate": plurality should never be posited without necessity)


Page 22

Why do we call it Maximum Entropy?

• Of all possible distributions, the equiprobable (uniform) distribution has the maximum entropy
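This is easy to check numerically; a small sketch comparing the uniform distribution over four outcomes with a skewed one:

```python
import math

def entropy(dist):
    """Shannon entropy H(p) = -sum_i p_i * log2(p_i), taking 0*log 0 = 0."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
h_uniform = entropy(uniform)  # log2(4) = 2 bits, the maximum for 4 outcomes
h_skewed = entropy(skewed)
```

MaxEnt picks the distribution closest to uniform (highest entropy) among those consistent with the feature constraints observed in the training data.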


Page 23

Why do we call it Maximum Entropy?


Page 24

Maximum Entropy

• The maximum entropy distribution, subject to the feature constraints, is exactly the probability distribution of a multinomial logistic regression model whose weights W maximize the likelihood of the training data. Thus the exponential (log-linear) model is also the maximum entropy model.


Page 25

References

• Jurafsky, Daniel and Martin, James H. (2006). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall.

• Berger, Adam L., Della Pietra, Vincent J., and Della Pietra, Stephen A. (1996). A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22(1), 39–71.

• Ratnaparkhi, Adwait (1996). A Maximum Entropy Model for Part-of-Speech Tagging. Proceedings of EMNLP 1996, pp. 133–142.


Page 26

THANK YOU
