Naïve Bayes Text Classification
Transcript of Naïve Bayes Text Classification
Naïve Bayes Text Classification
9 March 2021
cmpu 366 · Computational Linguistics
Machine learning is the area of computer science focused on the development and implementation of systems that improve as they encounter more data.
Machine learning has been central to advances in NLP for approximately the last 25 years.
Text classification
From: "Fabian Starr" <[email protected]> Subject: Hey! Sofware for the funny prices!
Get the great discounts on popular software today for PC and Macintosh http://iiled.org/Cj4Lmx 70-90% Discounts from retail price!!! All sofware is instantly available to download - No Need Wait!
Is this spam?
Who wrote each of the Federalist papers?
Mad dog A. Ham.
What’s the subject of this medical article?
Antagonists and inhibitors
Blood supply
Chemistry
Drug therapy
Embryology
Epidemiology
…
…zany characters and richly applied satire, and some great plot twists
It was pathetic. The worst part about it was the boxing scenes…
…awesome caramel sauce and sweet toasty almonds. I love this place!
…awful pizza and ridiculously overpriced…
Are these reviews positive or negative?
Many problems take the form of text classification: Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language identification
Sentiment analysis
Part-of-speech tagging
Automatic essay grading
…
Text classification problems take this form: Input:
A document d
A fixed set of classes C = {c1, c2, …, cj}
Output:
A predicted class c ∈ C
We can build a classifier by writing rules that look for combinations of words or other features:
Spam: black-list-address OR (“dollars” AND “have been selected”)
Accuracy can be high if the rules are carefully refined by an expert.
But building and maintaining these rules is expensive.
Instead, just as humans learn from experience, we make computers learn from data.
A supervised machine-learning text classification problem takes this form:
Input:
A document d
A fixed set of classes C = {c1, c2, …, cj}
A training set of m hand-labeled documents (d1, c1), …, (dm, cm)
Output:
A learned classifier γ : d → c
Supervised machine learning
Source: NLTK book
Features
A classification decision must rely on some observable evidence, which we encode as features.
Typical features include: Words (n-grams) present in the text
Frequency of words
Capitalization
Presence of named entities
Syntactic relations
Semantic relations
The simplest and most common features are Boolean, e.g., is the word present or not?
However, we can also have integer features like the number of times a word occurs.
The features we select depend on the task. Is a name masculine or feminine?
Last letter = …
What part-of-speech is a word, e.g., park or carbingly?
Is the word preceded by the? to?
Does the word end with -ly? -ness?
Is an email spam?
Does it contain generic Viagra?
Is the subject in all capital letters?
See features that were used by SpamAssassin: http://spamassassin.apache.org/old/tests_3_3_x.html
Feature engineering is the problem of deciding what features are relevant.
Approaches: Hand-crafted
Use expert knowledge to determine a small set of features that are likely to be relevant.
Kitchen sink
Give lots of features to the machine-learning algorithm and see which features are given greater weight and which are ignored.
E.g., use each word in the document as a feature: has-cash: True
has-the: True
has-linguistics: False
…
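As an illustrative sketch of this kitchen-sink approach, one Boolean has-⟨word⟩ feature per vocabulary word can be computed like this (the vocabulary and document are invented examples):

```python
# A "kitchen sink" feature extractor: one Boolean has-<word> feature per
# vocabulary word. The vocabulary and the document are invented examples.
def word_features(document, vocabulary):
    """Map a document string to {"has-<word>": bool} features."""
    tokens = set(document.lower().split())
    return {f"has-{w}": (w in tokens) for w in vocabulary}

vocab = ["cash", "the", "linguistics"]
features = word_features("Send the cash today", vocab)
# {"has-cash": True, "has-the": True, "has-linguistics": False}
```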
Weighting the evidence
A classification decision involves reconciling multiple features with different levels of predictive power.
Different types of classifiers use different algorithms to:
Determine the weights of individual features to maximize correct predictions for the training data and
Compute the likelihood of a label for an input, using the feature weights.
Popular machine learning methods: Naïve Bayes
Decision tree
Maximum entropy (ME)
Hidden Markov model (HMM)
Neural networks, including deep learning
Support vector machine (SVM)
Naïve Bayes
Naïve Bayes is a simple classification method based on Bayes rule.
For text classification, we can use it with a simple representation of a document as a bag of words.
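A bag of words can be built in a line with Python’s standard library; tokenization here is a naive whitespace split, and the sentence is invented:

```python
from collections import Counter

# A bag of words keeps only word counts, discarding order. Tokenization
# here is a naive whitespace split; the sentence is invented.
text = "it was great great fun"
bag = Counter(text.split())
# bag["great"] == 2; every other word appears once
```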
The bag of words representation
Figure from J&M, 3rd ed. draft sec. 6.1
γ(document) = c, where the document is reduced to a bag of word counts: seen 2, sweet 1, whimsical 1, recommend 1, happy 1, …
Bayes rule and classification
Bayes rule relates conditional probabilities.
For a document d and a class c,
P(c ∣ d) = P(d ∣ c) ⋅ P(c) / P(d)

where P(c ∣ d) is the posterior, P(d ∣ c) the likelihood, P(c) the prior, and P(d) the evidence.
To choose the most likely class cMAP from the set of classes C for a document d:

cMAP = argmax_{c ∈ C} P(c ∣ d)
     = argmax_{c ∈ C} P(d ∣ c) P(c) / P(d)    (Bayes rule)
     = argmax_{c ∈ C} P(d ∣ c) P(c)           (dropping the denominator, which is the same for every class)

MAP is “maximum a posteriori” – the most likely class.
If document d is represented as features f1, f2, …, fn, then

cMAP = argmax_{c ∈ C} P(d ∣ c) P(c)
     = argmax_{c ∈ C} P(f1, f2, …, fn ∣ c) P(c)

where P(f1, …, fn ∣ c) is the likelihood and P(c) is the prior. But estimating the joint likelihood directly requires O(|F|^n · |C|) parameters – a number of training examples exponential in the number of features.
Fortunately, the “naïve” in “naive Bayes” isn’t (just) a value judgment; it’s a functional design choice.
The naïve Bayes assumption is that the features f1, …, fn are conditionally independent (of one another) given the class c.
This simplifies combining contributions of features; you just multiply their probabilities:
P(f1, …, fn | c) = P(f1 | c) · P(f2 | c) ⋯ P(fn | c)
cMAP = argmax_{c ∈ C} P(f1, f2, …, fn ∣ c) P(c)

With the naïve Bayes assumption, this becomes

cNB = argmax_{c ∈ C} P(c) ∏_{i=1}^{n} P(fi ∣ c)
Returning to our bag-of-words model: let positions be the set of all word positions in the document, with wi the word at position i. The class our naïve Bayes text classifier returns is

cNB = argmax_{c ∈ C} P(c) ∏_{i ∈ positions} P(wi ∣ c)
Naïve Bayes: Learning
We need to estimate the prior probability of each category, P(c) for each c ∈ C.
We can get the maximum-likelihood estimate for each c from the training corpus:
We also need the probability of each word (feature) given each category:
P(c) = doccount(C = c) / Ndoc

P(wi ∣ c) = count(wi, c) / ∑_{w ∈ V} count(w, c)

– the fraction of times word wi appears among all words in documents of topic c.
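A toy sketch of these maximum-likelihood estimates; the three hand-labeled training documents are invented for illustration:

```python
from collections import Counter, defaultdict

# Maximum-likelihood estimates from a toy hand-labeled corpus; the three
# training documents are invented for illustration.
training = [
    ("fun fun great", "positive"),
    ("great film", "positive"),
    ("awful boring film", "negative"),
]

# P(c) = doccount(C = c) / Ndoc
doc_counts = Counter(c for _, c in training)
prior = {c: n / len(training) for c, n in doc_counts.items()}

# count(w, c) for every word in every class
word_counts = defaultdict(Counter)
for doc, c in training:
    word_counts[c].update(doc.split())

def likelihood(w, c):
    """P(w | c) = count(w, c) / total words in documents of class c."""
    return word_counts[c][w] / sum(word_counts[c].values())

# prior["positive"] = 2/3; likelihood("fun", "positive") = 2/5
```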
In general, the more training data we can give the classifier, the better it will do.

[Figure: error rate falls as training set size grows]
Note that we have a big problem with zero counts! If we never saw the word fantastic in a document labeled positive, then

P(fantastic ∣ positive) = count(fantastic, positive) / ∑_{w ∈ V} count(w, positive) = 0

and when we calculate

cNB = argmax_{c ∈ C} P(c) ∏_{i=1}^{n} P(fi ∣ c)

this one 0 will turn the whole estimate to 0!
As we did with n-grams, we can use Laplace (add-1) smoothing:
P(wi ∣ c) = count(wi, c) / ∑_{w ∈ V} count(w, c)

becomes

P(wi ∣ c) = (count(wi, c) + 1) / ((∑_{w ∈ V} count(w, c)) + |V|)
What about the unknown words – those that appear in the test data but not in the training data?
Ignore them! Just remove them from the test document.
We could build an unknown word model, but it wouldn’t generally help; it’s unlikely to help us to know which class has more unknown words.
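Putting the pieces together, a minimal decision rule might look like the sketch below. The toy priors and likelihoods are invented; it sums log-probabilities rather than multiplying probabilities (mathematically equivalent, and safe from underflow), and simply skips words missing from the model, as described above:

```python
import math

# A minimal naive Bayes decision rule (toy parameters, invented for
# illustration). Log-probabilities are summed rather than probabilities
# multiplied -- mathematically equivalent, but safe from underflow.
prior = {"positive": 0.5, "negative": 0.5}
likelihood = {
    "positive": {"fun": 0.4, "film": 0.2},
    "negative": {"fun": 0.1, "film": 0.2},
}

def classify(document):
    """Return argmax over c of log P(c) + sum_i log P(w_i | c)."""
    best_class, best_score = None, float("-inf")
    for c in prior:
        score = math.log(prior[c])
        for w in document.split():
            if w in likelihood[c]:       # unknown words are simply skipped
                score += math.log(likelihood[c][w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Here `classify("fun fun film zzyzx")` returns "positive": the unseen word zzyzx is ignored rather than zeroing out the whole product.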
Naïve Bayes and language modeling
Generative model for multinomial naïve Bayes
c = spam
w1 = Dear w2 = sir w3 = SEEKING w4 = YOUR …
Naïve Bayes classifiers can use any sort of feature: URLs, email addresses, dictionaries, network features.
But if we have a feature corresponding to each word in the text, then each class in our naïve Bayes model is a unigram language model.
The probability of assigning each word: P(word | c)
The probability of assigning each sentence: P(s | c) = Π P(word | c)
Class positive
0.1 I
0.1 love
0.01 this
0.05 fun
0.1 film
I love this fun film
0.1  0.1  0.01  0.05  0.1

P(s ∣ positive) = 0.000 000 5
Adding a second class for comparison:

Class positive: 0.1 I, 0.1 love, 0.01 this, 0.05 fun, 0.1 film
Class negative: 0.2 I, 0.001 love, 0.01 this, 0.005 fun, 0.1 film

I love this fun film
positive: 0.1 · 0.1 · 0.01 · 0.05 · 0.1 → P(s ∣ positive) = 0.000 000 5
negative: 0.2 · 0.001 · 0.01 · 0.005 · 0.1 → P(s ∣ negative) = 0.000 000 001

P(s ∣ positive) > P(s ∣ negative)
![Page 46: Naïve Bayes Text Classi cation](https://reader030.fdocuments.in/reader030/viewer/2022012702/61a4b59c390800306a4cc810/html5/thumbnails/46.jpg)
Oh, to be Bayesian and naïve!
![Page 47: Naïve Bayes Text Classi cation](https://reader030.fdocuments.in/reader030/viewer/2022012702/61a4b59c390800306a4cc810/html5/thumbnails/47.jpg)
Strengths of naïve Bayes classification:
The model is easy to understand and easy to implement
(compared with other classifiers!)
Training and classification are both fast
Requires modest storage space
Relatively robust to irrelevant features
If we include features – e.g., words – that don’t help us classify, they cancel out without affecting the results
Works well for many tasks
It’s a good, dependable baseline for classification that’s widely used in practice – but it’s not the best!
![Page 48: Naïve Bayes Text Classi cation](https://reader030.fdocuments.in/reader030/viewer/2022012702/61a4b59c390800306a4cc810/html5/thumbnails/48.jpg)
Weakness of naïve Bayes classification:
The bag-of-words representation ignores the sequential ordering of words
The independence assumption is inappropriate if there are strong conditional dependencies between the variables.
The model may not be “right”, but often we’re interested in the accuracy of the classification, not of the probability estimates.
Evaluation
After choosing the parameters for the classifier – i.e., training it – we test how well it does on a test set of examples that weren’t used for training.
Precision, recall, and F-measure
Jurafsky & Martin ask us to imagine we’re the CEO of the Delicious Pie Company and we want to know what people are tweeting about our pies (which are delicious).
We build a classifier to identify which tweets are about Delicious Pie Company.
2×2 confusion matrix

|                          | Classifier says it’s about us | Classifier says it’s not about us |
|--------------------------|-------------------------------|-----------------------------------|
| It’s really about us     | true positive                 | false negative                    |
| It’s really NOT about us | false positive                | true negative                     |
What percent of the tweets about us did we identify? (recall)
What percent of the tweets that we said were about us really were? (precision)
What percent of the tweets were identified correctly either way? (accuracy)
Accuracy sounds great – it considers how the classifier does on all inputs!
Well, it depends on the base (prior) probabilities: 99.99% accuracy might be terrible.
If we see 1 million tweets and only 100 of them are about Delicious Pie Company, we could just label every tweet “not about us”!
60% accuracy might be pretty good.
If we’re labeling documents with 20 different topics and the largest category only accounts for 10% of the data, that’s a much more difficult problem.
Instead, we measure precision and recall.
Precision is the percent of items the system detected (labeled as positive for a class) that are actually positive:
true positives / (true positives + false positives)
Recall is the percent of items actually present in the input that were correctly identified by the system:
true positives / (true positives + false negatives)
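Both measures fall out directly from the confusion-matrix counts; the numbers below are invented (100 tweets really about us, of which 70 were found, plus 10 false alarms):

```python
# Precision and recall from confusion-matrix counts. The numbers are
# invented: 100 tweets really about us, 70 of them found, 10 false alarms.
tp, fp, fn = 70, 10, 30

precision = tp / (tp + fp)   # of the tweets we flagged, how many were right?
recall = tp / (tp + fn)      # of the tweets about us, how many did we find?

# precision = 0.875, recall = 0.7
```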
The classifier that says no tweets are about pie would have 99.99% accuracy – but 0% recall!
It doesn’t identify any of the 100 tweets we wanted.
There’s a trade-off between precision and recall. A highly precise classifier will ignore cases where it’s less confident, leading to more false negatives → lower recall
A high-recall classifier will flag things it’s unsure about, leading to more false positives → lower precision
In developing a real application, picking the right trade-off point between precision and recall is an important usability issue.
Think about a grammar checker: Too many false positives will irritate lots of users.
But if you’re designing a system to detect hate speech online, you might want to err on the side of high recall to avoid abuse slipping through the cracks.
Any balance of precision and recall can be encoded as a single measure called an F-score:
The most common F-score is F1, which is the harmonic mean of precision and recall:
Fβ = (β² + 1)PR / (β²P + R)

F1 = 2PR / (P + R)
Why do we use the harmonic mean rather than the mean?
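A sketch of the Fβ formula, which also suggests an answer to the question: the harmonic mean is dominated by the smaller of P and R, so a classifier cannot score well by maximizing one at the other’s expense:

```python
# The F-score formulas from above. beta > 1 favors recall; beta < 1 favors
# precision; beta = 1 gives F1, the harmonic mean of P and R.
def f_score(p, r, beta=1.0):
    return (beta**2 + 1) * p * r / (beta**2 * p + r)

# The harmonic mean punishes imbalance: with P = 1.0 and R = 0.1,
# F1 = 0.2 / 1.1, about 0.18 -- far below the arithmetic mean of 0.55.
```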
Development test sets
We train on a training set and test on a test set.
But sometimes we also want a development test set. This avoids overfitting – “tuning to the test set” – and offers a more conservative estimate of performance.
Training set | Development test set | Test set
Problem: We want as much data as possible for training and as much as possible for dev. How should we split it?
Cross-validation: multiple splits
We can pool results over splits, compute the pooled dev. performance.
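One simple way to generate such splits, sketched on ten invented data points with k = 5:

```python
# k-fold cross-validation splits: each fold serves once as the development
# set while the remaining folds are used for training. The data and the
# choice k = 5 are invented for illustration.
def k_fold_splits(data, k):
    folds = [data[i::k] for i in range(k)]          # k interleaved folds
    for i in range(k):
        dev = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, dev

data = list(range(10))
splits = list(k_fold_splits(data, 5))
# Every item appears in exactly one development set across the 5 splits.
```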
3×3 confusion matrix
How can we combine the precision or recall scores from three (or more) classes to get one metric?
Macroaveraging
Compute the performance for each class and then average over classes
Microaveraging
Collect decisions for all classes into one confusion matrix
Compute precision and recall from that table
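A small invented example showing how the two averages can disagree when class sizes differ:

```python
# Macro- vs. microaveraged precision over three classes. The per-class
# (true positive, false positive) counts are invented; class "a" is much
# more frequent than the others.
counts = {
    "a": (90, 10),   # per-class precision 0.9
    "b": (8, 2),     # per-class precision 0.8
    "c": (1, 9),     # per-class precision 0.1
}

# Macroaverage: average the per-class precisions, treating classes equally.
macro = sum(tp / (tp + fp) for tp, fp in counts.values()) / len(counts)

# Microaverage: pool all decisions into one table, then compute precision.
tp_total = sum(tp for tp, _ in counts.values())
fp_total = sum(fp for _, fp in counts.values())
micro = tp_total / (tp_total + fp_total)

# macro = 0.6 (each class counts equally); micro = 0.825 (dominated by
# the frequent class "a").
```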
Macroaveraging and microaveraging
Assignment 2: Who Said It?
Jane Austen or Herman Melville? I never met with a disposition more truly amiable.
But Queequeg, do you see, was a creature in the transition stage – neither caterpillar nor butterfly.
Oh, my sweet cardinals!
Task: build a Naïve Bayes classifier and explore it
Do three-way partition of data: test data
development-test data
training data
Acknowledgments
The lecture incorporates material from: Na-Rae Han, University of Pittsburgh
Nancy Ide, Vassar College
Daniel Jurafsky, Stanford University
Daniel Jurafsky and James Martin, Speech and Language Processing