Transcript of ML
![Page 1: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/1.jpg)
Large scale text classification
Using Semi-supervised Multinomial Naïve Bayes
Presenter: QingZhi Chen
![Page 2: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/2.jpg)
OUTLINE
1. Introduction
2. Text document representation
3. Multinomial Naïve Bayes
4. Semi-supervised Learning for MNB
5. Experiments
![Page 3: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/3.jpg)
INTRODUCTION
- Multinomial Naïve Bayes with Frequency Estimate: labeled data are difficult to collect, and large amounts of unlabeled data go unused
- Expectation Maximization: maximizes the marginal log likelihood
- Semi-supervised Frequency Estimate: achieves better conditional log likelihood
![Page 4: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/4.jpg)
Bag-of-words: ignore the ordering of words in d.
Naïve Bayesian assumption: each word is independent of the others.

Example: the document "red rind sweet rounded apple" keeps only its word frequencies:
rind, apple, red, rounded, sweet
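The bag-of-words idea on this slide can be sketched with Python's `Counter`; the tokenization (lowercased whitespace split) is an assumption, not something the slide specifies:

```python
from collections import Counter

def bag_of_words(document: str) -> Counter:
    """Represent a document as word frequencies, ignoring word order."""
    return Counter(document.lower().split())

# The slide's example document: word order is discarded,
# only per-word frequencies survive.
doc = "red rind sweet rounded apple"
print(bag_of_words(doc))
```

Any permutation of the same words yields an identical representation, which is exactly the information the naïve Bayesian assumption throws away.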
![Page 5: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/5.jpg)
Text document representation
- d = {f_1, f_2, …, f_|V|, c}: each f_i corresponds to a word w_i in document d, and its value is the frequency of w_i in d
- c is the class label of d
- V is the set of unique words w_i over all documents
- T is the training set; d_j is the j-th document in T
- θ̂ denotes the parameter estimates
![Page 6: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/6.jpg)
Multinomial Naïve Bayes
Objective function: assign d to the class c maximizing P(c|d) ∝ P(c) · ∏_i P(w_i|c)^{f_i}

Parameters:
- P(c) is the prior probability of class c in the whole training set T, estimated from the number of documents in T with the label c
- f_i is the number of occurrences of w_i in the document d
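The classification rule and parameter estimates above can be sketched as follows; the Laplace smoothing is an assumed choice, not something stated on the slide:

```python
import math
from collections import Counter

def train_mnb(docs, labels):
    """Estimate MNB parameters from labeled documents.

    Returns the class priors P(c) and word likelihoods P(w|c),
    with Laplace smoothing (an assumption of this sketch).
    """
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.split())
    vocab = {w for cnt in counts.values() for w in cnt}
    likelihood = {
        c: {w: (counts[c][w] + 1) / (sum(counts[c].values()) + len(vocab))
            for w in vocab}
        for c in classes
    }
    return prior, likelihood, vocab

def classify(doc, prior, likelihood, vocab):
    """argmax_c  log P(c) + sum_i f_i * log P(w_i|c)."""
    freqs = Counter(w for w in doc.split() if w in vocab)
    def score(c):
        return math.log(prior[c]) + sum(
            f * math.log(likelihood[c][w]) for w, f in freqs.items())
    return max(prior, key=score)
```

Working in log space avoids underflow when the product runs over many words.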
![Page 7: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/7.jpg)
FE parameter learning
- F_wc is the number of occurrences of w_i in the documents with class label c
- P(w_i|c) is estimated directly from these frequency counts
FE objective function: decomposes into CLL + MLL
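The decomposition the slide names can be written out; a sketch in the deck's notation, where c_d is the label of document d (this joint-likelihood form is a reconstruction, not copied from the slide):

```latex
% FE maximizes the joint log likelihood over the training set T,
% which splits into a conditional term (CLL) and a marginal term (MLL):
\log P(T) = \sum_{d \in T} \log P(c_d, d)
          = \underbrace{\sum_{d \in T} \log P(c_d \mid d)}_{\text{CLL}}
          + \underbrace{\sum_{d \in T} \log P(d)}_{\text{MLL}}
```

Only the CLL term matters for classification accuracy, which is why the slides compare methods by the conditional log likelihood they achieve.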
![Page 8: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/8.jpg)
Semi-supervised Learning for MNB
Basic assumption: in settings where the number of unlabeled documents far exceeds the labeled ones, the unlabeled data can provide more information about the model.
Expectation Maximization: the classical semi-supervised method for MNB; it assigns each unlabeled document to a class c.
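A minimal sketch of the EM loop described here, assuming bag-of-words documents and Laplace smoothing (the function names, smoothing, and iteration count are choices of this sketch, not from the slides):

```python
import math
from collections import Counter

def estimate(weighted_docs, classes, vocab):
    """M-step: MNB parameters from (word-counts, {class: weight}) pairs."""
    prior = {c: 0.0 for c in classes}
    counts = {c: Counter() for c in classes}
    for words, weights in weighted_docs:
        for c, w in weights.items():
            prior[c] += w
            for word, f in words.items():
                counts[c][word] += w * f
    total = sum(prior.values())
    prior = {c: p / total for c, p in prior.items()}
    likelihood = {
        c: {w: (counts[c][w] + 1) / (sum(counts[c].values()) + len(vocab))
            for w in vocab}
        for c in classes
    }
    return prior, likelihood

def posterior(words, prior, likelihood):
    """E-step: P(c|d) from the current parameters, via log-sum-exp."""
    logs = {c: math.log(prior[c]) + sum(f * math.log(likelihood[c][w])
                                        for w, f in words.items())
            for c in prior}
    m = max(logs.values())
    exp = {c: math.exp(v - m) for c, v in logs.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def em_mnb(labeled, unlabeled, iters=10):
    """Semi-supervised EM: labeled documents keep weight 1 for their own
    class; unlabeled documents get soft class weights each round."""
    classes = sorted({c for _, c in labeled})
    bows = [(Counter(d.split()), {c: 1.0}) for d, c in labeled]
    ubows = [Counter(d.split()) for d in unlabeled]
    vocab = ({w for words, _ in bows for w in words}
             | {w for ws in ubows for w in ws})
    prior, likelihood = estimate(bows, classes, vocab)
    for _ in range(iters):
        soft = [(ws, posterior(ws, prior, likelihood)) for ws in ubows]
        prior, likelihood = estimate(bows + soft, classes, vocab)
    return prior, likelihood
```

Each round re-labels the unlabeled documents with the current model and re-estimates the parameters from the combined (hard plus soft) counts, which is the document-level soft assignment the next slide contrasts with SFE's word-level one.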
![Page 9: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/9.jpg)
- The estimates will be the same as FE's and SFE's
- Update (3)(2) using (6)(7)(1) until the parameters are stable
- This implementation still uses P(c|d), and counts the labeled documents with weight 1 rather than P(c|d)
- Deficits of EM: inferior CLL, and too strong an assumption

Semi-supervised Frequency Estimate
- Softly assigns each word to class c
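The word-level soft assignment can be sketched as follows, assuming SFE combines P(w), estimated from all (labeled plus unlabeled) text, with P(c|w), estimated from the labeled documents only; the smoothing and normalization details here are assumptions of this sketch:

```python
from collections import Counter

def sfe_likelihood(labeled, unlabeled):
    """Semi-supervised Frequency Estimate (a sketch): each word is softly
    assigned to classes via P(c|w) from the labeled data, then combined
    with P(w) from all data:  P(w|c) ∝ P(w) * P(c|w)."""
    classes = sorted({c for _, c in labeled})
    # F_wc: occurrences of w in labeled documents of class c
    fwc = {c: Counter() for c in classes}
    for doc, c in labeled:
        fwc[c].update(doc.split())
    # P(w): word frequencies over labeled + unlabeled text
    all_counts = Counter()
    for doc, _ in labeled:
        all_counts.update(doc.split())
    for doc in unlabeled:
        all_counts.update(doc.split())
    total = sum(all_counts.values())
    vocab = set(all_counts)
    likelihood = {}
    for c in classes:
        raw = {}
        for w in vocab:
            fw = sum(fwc[cc][w] for cc in classes)
            # P(c|w) from labeled counts, Laplace-smoothed (assumed)
            p_c_given_w = (fwc[c][w] + 1) / (fw + len(classes))
            raw[w] = (all_counts[w] / total) * p_c_given_w
        z = sum(raw.values())
        likelihood[c] = {w: v / z for w, v in raw.items()}
    return likelihood
```

Unlike EM, no iteration is needed: the unlabeled data only reshape P(w), while the class-conditional signal P(c|w) comes entirely from the labeled counts, which is what preserves the conditional log likelihood.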
![Page 10: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/10.jpg)
![Page 11: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/11.jpg)
EXPERIMENT
- Source data set
- Performance indices: AUC & accuracy
- Influence on conditional log likelihood
- Impact of the size of unlabeled data
- Computational cost

*AUC refers to the area under the curve
![Page 12: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/12.jpg)
Based on the MNB classifier
![Page 13: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/13.jpg)
![Page 14: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/14.jpg)
![Page 15: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/15.jpg)
![Page 16: ML](https://reader036.fdocuments.in/reader036/viewer/2022062312/55647cf2d8b42a5b318b56b8/html5/thumbnails/16.jpg)