Text Classification (COSC 6336: Natural Language Processing, Spring 2018)

Page 1

Text Classification
COSC 6336: Natural Language Processing

Spring 2018

Some content on these slides was adapted from J&M and from Wei Xu

Page 2

Class Announcements
★ Midterm exam is now on March 21st.
★ Paper presentations are removed from the syllabus.
★ Practical 1 is posted:
  ○ System submission is due March 7th
  ○ Report is due March 9th

Page 3

Today’s lecture
★ Text Classification
  ○ Example applications
  ○ Task definition
★ Classical approaches to Text Classification
  ○ Naive Bayes
  ○ MaxEnt

Page 4

Page 5

What do these books have in common?

Page 6

Other tasks that can be solved as TC
★ Sentiment classification

★ Native language identification

★ Author Profiling

★ Depression detection

★ Cyberbullying detection

★ ….

Page 7

Formal definition of the TC task

★ Input:
  ○ a document d
  ○ a fixed set of classes C = {c1, c2, …, cJ}

★ Output: a predicted class c ∈ C

Page 8

Methods for TC tasks
★ Rule-based approaches (see the sketch below)
★ Machine Learning algorithms
  ○ Naive Bayes
  ○ Support Vector Machines
  ○ Logistic Regression
  ○ And now deep learning approaches
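A rule-based approach can be as simple as hand-written keyword matching. A minimal sketch, with invented keyword lists:

```python
# Hand-written sentiment rules: the keyword lists are invented examples.
POSITIVE_WORDS = {"great", "excellent", "wonderful", "love"}
NEGATIVE_WORDS = {"terrible", "awful", "boring", "hate"}

def rule_based_sentiment(document: str) -> str:
    tokens = document.lower().split()
    pos = sum(t in POSITIVE_WORDS for t in tokens)
    neg = sum(t in NEGATIVE_WORDS for t in tokens)
    return "positive" if pos > neg else "negative"

print(rule_based_sentiment("an excellent and wonderful film"))  # -> positive
```

Rules like these are easy to write but hard to maintain and generalize, which is why the rest of the lecture focuses on models that learn their decisions from data.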

Page 9

Naive Bayes for Text Classification
★ Simple approach
★ Based on the bag-of-words representation

Page 10

Bag of words
The first reference to Bag of Words is attributed to a 1954 paper by Zellig Harris.
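In a bag-of-words representation the document is reduced to its word counts, and word order is thrown away. A minimal sketch, assuming lowercase whitespace tokenization:

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Map a document to its word counts; word order is discarded."""
    return Counter(document.lower().split())

print(bag_of_words("great acting and a great plot"))
# Counter({'great': 2, 'acting': 1, 'and': 1, 'a': 1, 'plot': 1})
```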

Page 11

Naive Bayes
Probabilistic classifier (eq. 1):

  ĉ = argmax_{c ∈ C} P(c|d)

According to Bayes rule (eq. 2):

  P(c|d) = P(d|c) P(c) / P(d)

Replacing eq. 2 into eq. 1:

  ĉ = argmax_{c ∈ C} P(d|c) P(c) / P(d)

Dropping the denominator (P(d) is the same for every class):

  ĉ = argmax_{c ∈ C} P(d|c) P(c)

Page 12

Naive Bayes
A document d is represented as a set of features f1, f2, …, fn:

  P(d|c) = P(f1, f2, …, fn | c)

How many parameters do we need to learn in this model?

Page 13

Naive Bayes Assumptions
1. Position doesn’t matter
2. Naive Bayes assumption: the probabilities P(fi|c) are independent given the class c, and thus we can multiply them:

  P(f1, f2, …, fn | c) = P(f1|c) · P(f2|c) · … · P(fn|c)

This leads us to:

  c_NB = argmax_{c ∈ C} P(c) ∏_i P(fi|c)

Page 14

Naive Bayes in Practice
We consider word positions, i.e. the features are the words wi at each position i in the document:

  c_NB = argmax_{c ∈ C} P(c) ∏_{i ∈ positions} P(wi|c)

We also do everything in log space, so that multiplying many small probabilities does not underflow:

  c_NB = argmax_{c ∈ C} [ log P(c) + Σ_{i ∈ positions} log P(wi|c) ]

Page 15

Naive Bayes: Training
How do we compute P(c) and P(wi|c)?

From counts over the training data (maximum likelihood estimates):

  P̂(c) = N_c / N_doc                     (fraction of training documents with class c)
  P̂(wi|c) = count(wi, c) / Σ_{w ∈ V} count(w, c)
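A minimal sketch of training and prediction in log space, assuming whitespace tokenization and add-one (Laplace) smoothing of the word likelihoods:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Estimate log P(c) and log P(w|c) from (document, label) pairs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)            # word_counts[c][w] = count(w, c)
    for d, c in zip(docs, labels):
        word_counts[c].update(d.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + len(vocab)   # add-one smoothing
        log_likelihood[c] = {w: math.log((word_counts[c][w] + 1) / total)
                             for w in vocab}
    return log_prior, log_likelihood, vocab

def predict_nb(doc, log_prior, log_likelihood, vocab):
    """Return the class maximizing log P(c) + sum of log P(w|c); unknown words are skipped."""
    def score(c):
        return log_prior[c] + sum(log_likelihood[c][w]
                                  for w in doc.lower().split() if w in vocab)
    return max(log_prior, key=score)

docs = ["great acting great plot", "boring plot awful acting"]
labels = ["positive", "negative"]
model = train_nb(docs, labels)
print(predict_nb("great plot", *model))   # -> positive
```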

Page 16

Is Naive Bayes a good option for TC?

Page 17

Evaluation in TC
Confusion table:

                     Gold Standard
                     True                     False
  Predicted  True    TP = true positives      FP = false positives
             False   FN = false negatives     TN = true negatives

  Accuracy = (TP + TN) / (TP + TN + FP + FN)

Page 18

Evaluation in TC: Issues with Accuracy?
Suppose we want to learn to detect whether each message in a web forum is “extremely negative”. We have collected gold standard data:

★ 990 instances are labeled as negative
★ 10 instances are labeled as positive
★ The test data has 100 instances (99 negative and 1 positive)
★ A dumb classifier can get 99% accuracy by always predicting “negative”!

Page 19

More Sensible Metrics: Precision, Recall and F-measure

  P = TP / (TP + FP)

  R = TP / (TP + FN)

  F-measure = (β² + 1) P R / (β² P + R); with β = 1 this is the harmonic mean of P and R, F1 = 2 P R / (P + R)

                     Gold Standard
                     True                     False
  Predicted  True    TP = true positives      FP = false positives
             False   FN = false negatives     TN = true negatives
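Applied to the dumb classifier from the previous slide (always predict “negative”, with positive as the class of interest): TP = 0, FP = 0, FN = 1, TN = 99. Accuracy is 99/100 = 99%, but R = 0/1 = 0 and F1 = 0 (P is undefined, since the classifier never predicts positive). Precision, recall and F-measure expose exactly what accuracy hides.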

Page 20

What about Multi-class problems?
● Multi-class: |C| > 2
● P, R, and F-measure are defined for a single class
● We assume classes are mutually exclusive
● We use per-class evaluation metrics: for each class i we count TP_i, FP_i and FN_i, and compute

  P_i = TP_i / (TP_i + FP_i)        R_i = TP_i / (TP_i + FN_i)

Page 21

Micro vs Macro Average
★ Macro average: measure performance per class and then average the per-class scores
★ Micro average: pool the predictions for all classes, then compute TP, FP, FN, and TN once over the pooled counts
★ Weighted average: compute performance per label and then average, where each label’s score is weighted by its support (its number of gold instances)

All three are illustrated in the sketch below.
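A minimal sketch of the three averages, assuming the per-class counts have already been collected (the class names and numbers are invented for illustration):

```python
def prf(tp, fp, fn):
    """Precision, recall, F1 from raw counts (0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical per-class counts: class -> (TP, FP, FN); support = TP + FN.
counts = {"news": (80, 10, 20), "sports": (15, 5, 5), "arts": (2, 1, 8)}

f1_per_class = {c: prf(*v)[2] for c, v in counts.items()}
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)

micro_tp = sum(tp for tp, _, _ in counts.values())
micro_fp = sum(fp for _, fp, _ in counts.values())
micro_fn = sum(fn for _, _, fn in counts.values())
micro_f1 = prf(micro_tp, micro_fp, micro_fn)[2]

support = {c: tp + fn for c, (tp, _, fn) in counts.items()}
weighted_f1 = sum(f1_per_class[c] * support[c] for c in counts) / sum(support.values())

print(macro_f1, micro_f1, weighted_f1)
```

Note how the micro average is dominated by the frequent "news" class, while the macro average gives the tiny "arts" class equal say.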

Page 22

Example

Page 23

Train/Test Data Separation

Page 24

Does this model look familiar?

Page 25

Logistic Regression (MaxEnt)

Page 26

Disadvantages of NB
★ Correlated features
  ○ Their evidence is overestimated, because each one is treated as if it were independent
★ This will hurt classifier performance

Page 27

Logistic Regression
★ No independence assumption
★ Feature weights are learned simultaneously

Page 28

Logistic Regression
Computes P(y|x) directly from the training data:

Page 29

Logistic Regression
Computes P(y|x) directly from the training data, scoring each class y with a weighted sum of feature values:

  score(y, x) = exp( Σ_i w_i f_i(y, x) )

But since the result of this will not be a proper probability function, we need to normalize it over all classes:

  P(y|x) = exp( Σ_i w_i f_i(y, x) ) / Σ_{y'} exp( Σ_i w_i f_i(y', x) )
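A minimal numeric sketch of the scoring and normalization step; the weights, feature values and class names are invented for illustration:

```python
import math

# Invented weights and feature values f_i(y, x) for one document x
# and three candidate classes.
weights = [0.5, -0.3, 1.2]
features = {
    "sports": [1.0, 0.0, 2.0],
    "news":   [0.0, 1.0, 1.0],
    "arts":   [1.0, 1.0, 0.0],
}

# Unnormalized scores: exp of the weighted sum of feature values per class.
scores = {y: math.exp(sum(w * f for w, f in zip(weights, fs)))
          for y, fs in features.items()}

# Normalize so the scores over all classes sum to 1 (a proper distribution).
z = sum(scores.values())
probs = {y: s / z for y, s in scores.items()}

print(probs)
print(max(probs, key=probs.get))   # the class with the highest P(y|x)
```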

Page 30

Logistic Regression
★ We then compute the probability P(y|x) for every class
★ We choose the class with the highest probability

Page 31

Features in Text Classification
★ We can have indicator (binary) functions of the form:

  f_i(y, x) = 1 if the document x and the class y satisfy some condition, and 0 otherwise

Page 32

Features in Text Classification
★ Or other, real-valued functions computed from the document (both kinds are sketched below)
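A small sketch of both kinds of feature functions; the specific word, class label and length feature are invented for illustration:

```python
import math

# Indicator feature: fires when the document contains a given word and the
# candidate class is a given label (the word and label are invented examples).
def f_contains_great_and_positive(y: str, x: str) -> float:
    return 1.0 if "great" in x.lower().split() and y == "positive" else 0.0

# Real-valued feature: here, the log of the document length in tokens.
def f_log_length(y: str, x: str) -> float:
    return math.log(len(x.split()) + 1)

doc = "a great and moving film"
print(f_contains_great_and_positive("positive", doc))  # 1.0
print(f_log_length("positive", doc))                   # log(6) ≈ 1.79
```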

Page 33

An Example

Page 34

Learning in Logistic Regression

Page 35

Learning in Logistic Regression
What does training look like?

We need to find the right weights.

How?

By using the Conditional Maximum Likelihood principle:

Choose the parameters w that maximize the log probability of the y labels in the training data:

  ŵ = argmax_w Σ_j log P(y(j) | x(j)),  summing over the training examples j

Page 36

Learning in Logistic Regression
So the objective function L(w) we are maximizing is:

  L(w) = Σ_j log P(y(j) | x(j))

Page 37

Learning in Logistic Regression
★ We use optimization methods like Stochastic Gradient Ascent to find the weights that maximize L(w) (a small sketch follows)
★ We start with all weights = 0 and repeatedly move in the direction of the gradient (the vector of partial derivatives of L(w))
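A minimal sketch of stochastic gradient ascent for the binary case (a sigmoid over w·x instead of the full per-class feature functions), with invented toy data; the learning rate and number of passes are arbitrary choices:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy training data: the first feature is a constant bias term, the second
# could be e.g. a count of positive words (the values are invented).
data = [([1.0, 3.0], 1), ([1.0, 2.5], 1), ([1.0, 1.0], 0), ([1.0, 0.5], 0)]

w = [0.0, 0.0]        # start with all weights at zero
lr = 0.1              # learning rate (arbitrary)
for _ in range(100):  # passes over the training data (arbitrary)
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        # d/dw_i of log P(y|x) is (y - p) * x_i; step in that direction.
        w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]

print(w)
print(sigmoid(sum(wi * xi for wi, xi in zip(w, [1.0, 2.8]))))  # close to 1
```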

Page 38

Learning in Logistic Regression
★ The derivative turns out to be easy: for each weight w_i it is the observed count of feature f_i minus the count the model expects,

  ∂L(w)/∂w_i = Σ_j [ f_i(y(j), x(j)) − Σ_{y'} P(y'|x(j)) f_i(y', x(j)) ]

Page 39

Regularization
★ Overfitting: when a model “memorizes” the training data
★ To prevent overfitting we penalize large weights, maximizing instead L(w) − α R(w) (see the update sketch below)
★ Regularization forms:
  ○ L2 norm (Euclidean): R(w) = Σ_i w_i²
  ○ L1 norm (Manhattan): R(w) = Σ_i |w_i|
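Continuing the gradient-ascent sketch above, an L2 penalty simply adds −2αw_i to each component of the gradient, shrinking every weight a little at each step (α here is an arbitrary illustrative value):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

alpha = 0.01   # regularization strength (arbitrary)
lr = 0.1       # learning rate (arbitrary)

def regularized_step(w, x, y):
    """One stochastic step on log P(y|x) - alpha * sum(w_i ** 2)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    # the penalty term contributes -2 * alpha * w_i to the gradient
    return [wi + lr * ((y - p) * xi - 2 * alpha * wi) for wi, xi in zip(w, x)]

print(regularized_step([0.5, -0.2], [1.0, 3.0], 1))
```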

Page 40

In Summary
★ NB is a generative model
  ○ Highly correlated features can lead to bad results
★ Logistic Regression is a discriminative model
  ○ Weights for all features are learned simultaneously
★ NB seems to fare better on small data sets
★ Both NB and Logistic Regression are linear classifiers
★ Regularization helps alleviate overfitting