
Ranking Similarity between Political Speeches using Naive Bayes Text Classification

James Ryder and Sen Zhang

Dept. of Mathematics, Computer Science, & Statistics

State University of New York at Oneonta

RyderJ@Oneonta.edu --- ZhangS@Oneonta.edu

Text Classification Training

• Comprised of two subcomponents
- Concordancer
- Metaconcordancer

• All training is done offline, prior to any classification

• Training generates finished metaconcordance files that are used during classification

• A set of metaconcordance files is a set of categories containing N files

Training - Metaconcordancer

Combine all concordances (C1 ... CM) for a single author (Ai) into a single metaconcordance file (MTA)

[Diagram: the concordances C1 ... CM for author Ai are combined by the Metaconcordancer into a single MTA file]

Create a metaconcordance file for author Ai. This is a complete description of this author's texts and is used at classification time.
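A minimal sketch of this combining step, assuming each per-text concordance has already been reduced to a word-count mapping; the function name and in-memory representation are illustrative assumptions, not the authors' actual file formats:

    from collections import Counter

    def build_metaconcordance(concordances):
        """Merge the per-text concordances C1 ... CM (word -> count) for one
        author into a single metaconcordance (MTA) of relative frequencies."""
        merged = Counter()
        for concordance in concordances:
            merged.update(concordance)      # sum word counts across the author's texts
        total_words = sum(merged.values())
        return {word: count / total_words for word, count in merged.items()}

    # toy example: two tiny concordances for one author
    mta = build_metaconcordance([
        {"liberty": 3, "nation": 2},
        {"liberty": 1, "hope": 4},
    ])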

Inaugural Speech Experiment Design

• Training Phase

- The inaugural speeches of ten recent U.S. presidents: Barack Obama, George W. Bush, Bill Clinton, George Bush, Ronald Reagan, Jimmy Carter, Richard Nixon, Lyndon Johnson, John Kennedy, and Dwight Eisenhower.

- For those who served two terms, only the second inaugural speech was collected.

• Classification Phase

- Obama's Inaugural speech

- Bush's Second Inaugural speech

- Bush's Farewell speech

Top Frequently Used Words of George W. Bush's Farewell Speech

[Word cloud of the most frequent words in the speech]

TC Authors Training Example

• Set of N authors
- We are given a snippet of text said to be written by one of the authors in the set of authors (categories)

• This TC system should attempt to predict which author is most likely to have written the snippet of text

• In the training phase, we need to obtain samples of the writing for each author in the category (author) set

Prepare for Classification

• Collect all MTA (category) files into one folder.

• Edit the category list file by inserting the names of all category files to compare against.

• To be ready to classify some unknown snippet of text, one needs
- All category files prepared (MTA)

- The category list file

CategoriesA = {MTAA1, MTAA2, ..., MTAAN} is the set of all category files
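A minimal sketch of this preparation step, assuming the category list file holds one MTA filename per line and each MTA file is stored as a JSON mapping of words to relative frequencies; both formats and all file names here are assumptions for illustration, since the poster does not specify them:

    import json

    def load_categories(category_list_path):
        """Load every MTA (category) file named in the category list file."""
        categories = {}
        with open(category_list_path) as listing:
            for line in listing:
                mta_path = line.strip()
                if not mta_path:
                    continue
                with open(mta_path) as mta_file:
                    categories[mta_path] = json.load(mta_file)   # word -> relative frequency
        return categories

    categories = load_categories("category_list.txt")   # hypothetical file name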

Preliminary Experiment Results {1.1}: Ranking via Comparing Their Speeches with Obama's Inaugural Speech

[Chart: ranking of the ten presidents' training speeches by similarity to Obama's Inaugural Speech]

Top Frequently Used Words of Barack Obama's Inaugural Speech

Joint and Conditional Probability

Using variables X and Y, P(X, Y) means the probability that X will take on a specific value x and Y will take on a specific value y. That is, X will occur and Y will occur too.

P(X, Y) = P(Y, X). This idea is known as Joint Probability.

P(Y = y | X = x) is read "the probability that Y will take on the specific value y GIVEN THAT X has already taken on the specific value x". This is the Conditional Probability P(Y | X).

P(X, Y) = P(Y, X) and P(Y, X) = P(X, Y)
P(X, Y) = P(X) P(Y | X)
P(Y, X) = P(Y) P(X | Y)

The formula above is read "the probability that X occurs and Y occurs is the same as the probability that X occurs and, given that X has occurred, Y occurs", and vice versa on the right side.

Since P(X, Y) = P(Y, X), it follows that P(X) P(Y | X) = P(Y) P(X | Y), so

P(Y | X) = P(Y) P(X | Y) / P(X)

The formula above is read "(the chance of Y given that X occurred) is ((the chance of Y occurring) times (the chance of X occurring given that Y occurred)) divided by (the chance of X occurring)".

This is the standard Bayes Theorem.
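As a quick sanity check of the formula, here is a small numeric example; the probabilities are made up purely for illustration:

    # made-up values: P(Y) = 0.10, P(X | Y) = 0.02, P(X) = 0.005
    p_y, p_x_given_y, p_x = 0.10, 0.02, 0.005

    # Bayes Theorem: P(Y | X) = P(Y) * P(X | Y) / P(X)
    p_y_given_x = p_y * p_x_given_y / p_x
    print(p_y_given_x)   # 0.4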

TC Author Example Training

• Find many works of literature that each author has written

• For each author, create a single concordance for each of this author's M texts (T)

A = {A" A2, ... ,AN} The set of all N authors TAi = {T" T2, ... , T M} The set of all texts for author Ai

[Diagram: each of author Ai's texts is fed into the Concordancer, producing concordance files C1 through CM]

Create a concordance file for each of author Ai's texts (TAi), C1 through CM.

Mapping the Classic Bayes Theorem into a Naive Bayes Text Classification System

We will use a modified (naive) version of Bayes Theorem to create an ordered list of categories to which a given input text may belong. The ordering is based upon the relative likelihood that the text is similar to a category instance, for all categories in the category set.

P(Y | X) = P(Y) P(X | Y) / P(X)

where the four terms are labeled (a) P(Y | X), (b) P(Y), (c) P(X | Y), and (d) P(X):

a) X is the input text (source) that we attempt to classify. Y is a single instance of a category from among the set of categories being considered. (a) is read "the probability that text X belongs to category Y".

b) is the probability that this is instance Yi from the category set. If the category set contains 10 categories then P(Y) is 0.10.

c) is the probability of all words in the input text being found in category instance Yi from the set of categories.

d) is the probability that input text X is in fact the input text X. Clearly, this is P(1), and therefore it will be discarded in the final formula without affecting relative scores between the categories.
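A small sketch of point d): dividing every category's score by the same P(X) cannot change the ordering, so categories can be ranked by P(Y) P(X | Y) alone. The scores below are made up for illustration:

    # made-up unnormalized scores P(Y) * P(X | Y) for three categories
    scores = {"author_A": 3.0e-8, "author_B": 1.2e-7, "author_C": 4.5e-9}
    p_x = 2.0e-7   # the same constant P(X) for every category

    with_px = {name: s / p_x for name, s in scores.items()}

    # the ranking is identical whether or not we divide by P(X)
    assert sorted(scores, key=scores.get) == sorted(with_px, key=with_px.get)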

Political Spectrum Experiment Design

• Training Phase
- Ten prominent worldwide political figures
  • George W. Bush, Winston Churchill, Bill Clinton, Adolf Hitler, John Kennedy, Tse-tung Mao, Karl Marx, Barack Obama, Joseph Stalin, Margaret Thatcher.

- For each of them, we randomly select five speeches or written works. By random, here we mean we simply collected these speeches from the Internet without prior knowledge about them and without reading them.

• Classification Phase
- Obama's Inaugural speech
- Bush's Second Inaugural speech
- Bush's Farewell speech

References and Acknowledgments

• Beautiful Word Clouds: http://www.wordle.net

• Inaugural Speeches of Presidents of the United States: http://www.bartleby.com/124

• Thanks to Dr. William R. Wilkerson for his help in directing us to online political speech repositories.

• Thanks to the TLTC for printing out the Poster.

Naive Bayes Text Classifier

• Our text classification (TC) system is broken into two main components
- Training

- Classification

• Training must be done first

• We need to map the standard Bayes Theorem into a formula for quantifying the likelihood that a given text (X) falls into a certain category instance (Y).

Training - Concordancer

• For each text (Tj) in TAi, the concordancer
- counts the number of occurrences of each unique word in the text (frequency)
- counts the total number of words
- calculates the relative frequency of each unique word in the text (frequency / total_words)
- creates an output file concordance (C) containing the above information and the list of unique words
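A minimal sketch of what the concordancer computes for a single text; the tokenization and output structure are illustrative assumptions rather than the authors' exact file format:

    import re
    from collections import Counter

    def build_concordance(text):
        """Count each unique word, the total number of words, and each word's
        relative frequency (frequency / total_words) for one text."""
        words = re.findall(r"[a-z']+", text.lower())
        frequency = Counter(words)
        total_words = len(words)
        relative = {word: count / total_words for word, count in frequency.items()}
        return {"frequency": frequency,
                "total_words": total_words,
                "relative_frequency": relative}

    concordance = build_concordance("We the people, in order to form a more perfect union ...")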

c) For a given category Yi, what is the probability that the words in X appear in Yi?

X = {w1, w2, ..., wn} The set of all words in the snippet of text

P(X | Yi) = P((w1, w2, ..., wn) | Yi)

P(X | Yi) = ∏ (j = 1 to n) P(wj | Yi), where the probability of wj is the relative frequency of the word contained in the metaconcordance for category Yi.

If wj from X is not present in Yi then we use a very small number for the probability, because the probability of a word not found is zero and multiplying by zero destroys the product.

This product will result in an extremely small number that may be smaller than a computer can represent precisely. So, we use a trick: instead we add the logarithms.

Trick: log(A × B) = log(A) + log(B)

log(P(X | Yi)) = Σ (j = 1 to n) log(P(wj | Yi))
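A minimal sketch of this scoring step, assuming each category is a metaconcordance mapping words to relative frequencies and using an arbitrary small floor probability for words not found in the category; the floor value and function names are illustrative assumptions. With equal priors (P(Yi) = 0.10 for ten categories) this log sum alone determines the ranking:

    import math

    UNSEEN_PROB = 1e-10   # assumed small stand-in for words missing from a category

    def log_score(words, category_freqs):
        """Approximate log P(X | Yi) as a sum of log relative frequencies."""
        return sum(math.log(category_freqs.get(w, UNSEEN_PROB)) for w in words)

    def rank_categories(words, categories):
        """Return category names ordered from most to least likely."""
        scores = {name: log_score(words, freqs) for name, freqs in categories.items()}
        return sorted(scores, key=scores.get, reverse=True)

    # toy example with made-up relative frequencies
    ranking = rank_categories(
        ["freedom", "hope"],
        {"Obama": {"freedom": 0.01, "hope": 0.02},
         "Bush":  {"freedom": 0.02, "terror": 0.01}},
    )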

Preliminary Experiment Results {2.2}: Ranking via Comparing Their Speeches with G. W. Bush's Second Inaugural Speech

[Chart: ranking of the ten presidents' training speeches by similarity to G. W. Bush's Second Inaugural Speech]

Future Work

• To improve ranking accuracy, we plan to
- use variants of Naive Bayes and address the poor independence assumption;

- explore more linguistic, rhetorical, and stylistic features such as metaphors, analogies, similes, opposition, alliteration, antithesis, and parallelism;

- select more representative training datasets;

- conduct more intensive experiments.