Social Media & Text...

89
Wei Xu socialmedia-class.org Social Media & Text Analysis lecture 5 - Paraphrase Identification and Logistic Regression CSE 5539-0010 Ohio State University Instructor: Wei Xu Website: socialmedia-class.org

Transcript of Social Media & Text...

Page 1: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Social Media & Text Analysis lecture 5 - Paraphrase Identification and Logistic Regression

CSE 5539-0010 Ohio State UniversityInstructor: Wei Xu

Website: socialmedia-class.org

Page 2: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

In-class Presentation

• pick your topic and sign up

• a 10 minute presentation (20 points)

- A Social Media Platform

- Or a NLP Researcher

Page 3: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

socialmedia-class.org

Reading #6 & Quiz #3

Page 4: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Mini Research Proposal• propose/explore NLP problems in GitHub dataset

https://github.com

Page 5: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Mini Research Proposal• propose/explore NLP problems in GitHub dataset

• GitHub: - a social network for programmers (sharing,

collaboration, bug tracking, etc.) - hosting Git repositories (a version control system

that tracks changes to files, usually source code) - containing potentially interesting text fields in

natural language (comments, issues, etc.)

Page 6: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

(Recap)

what is Paraphrase?“sentences or phrases that convey approximately the same meaning using different words” — (Bhagat & Hovy, 2012)

Page 7: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

(Recap)

what is Paraphrase?

wealthy richword

“sentences or phrases that convey approximately the same meaning using different words” — (Bhagat & Hovy, 2012)

Page 8: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

(Recap)

what is Paraphrase?

wealthy richword

the king’s speech His Majesty’s addressphrase

“sentences or phrases that convey approximately the same meaning using different words” — (Bhagat & Hovy, 2012)

Page 9: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

(Recap)

what is Paraphrase?

wealthy richword

the king’s speech His Majesty’s addressphrase

… the forced resignation of the CEO of Boeing,

Harry Stonecipher, for …sentence

… after Boeing Co. Chief Executive Harry Stonecipher was ousted from …

“sentences or phrases that convey approximately the same meaning using different words” — (Bhagat & Hovy, 2012)

Page 10: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

The Ideal

Page 11: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

(Recap)

Paraphrase Research'01

Novels'05 '13Bi-Text

'04News

'13* '14* Twitter

'11Video

XuRitter

Callison-Burch Dolan

Ji

XuRitter Dolan

Grishman Cherry

'12*Style

‘01Web

XuCallison-Burch

Napoles

'15*'16*Simple

* my researchWei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis (2014)

80sWordNet

Page 12: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

DIRT (Discovery of Inference Rules from Text)

modified by adjectives objects of verbsadditional, administrative,

assigned, assumed, collective, congressional,

constitutional ...

assert, assign, assume, attend to, avoid, become,

breach ...

Decking Lin and Patrick Pantel. “DIRT - Discovery of Inference Rules from Text” In KDD (2001)

Lin and Panel (2001) operationalize the Distributional Hypothesis using dependency relationships to define

similar environments.

Duty and responsibility share a similar set of dependency contexts in large volumes of text:

Page 13: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Bilingual Pivoting

... fünf Landwirte , weil

... 5 farmers were in Ireland ...

...

oder wurden , gefoltert

or have been , tortured

festgenommen

thrown into jail

festgenommen

imprisoned

...

... ...

...

Source: Chris Callison-Burch

word alignment

Page 14: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Bilingual Pivoting

... fünf Landwirte , weil

... 5 farmers were in Ireland ...

...

oder wurden , gefoltert

or have been , tortured

festgenommen

thrown into jail

festgenommen

imprisoned

...

... ...

...

Source: Chris Callison-Burch

word alignment

Page 15: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Bilingual Pivoting

... fünf Landwirte , weil

... 5 farmers were in Ireland ...

...

oder wurden , gefoltert

or have been , tortured

festgenommen

thrown into jail

festgenommen

imprisoned

...

... ...

...

Source: Chris Callison-Burch

word alignment

Page 16: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Bilingual Pivoting

... fünf Landwirte , weil

... 5 farmers were in Ireland ...

...

oder wurden , gefoltert

or have been , tortured

festgenommen

thrown into jail

festgenommen

imprisoned

...

... ...

...

Source: Chris Callison-Burch

word alignment

Page 17: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Bilingual Pivoting

... fünf Landwirte , weil

... 5 farmers were in Ireland ...

...

oder wurden , gefoltert

or have been , tortured

festgenommen

thrown into jail

festgenommen

imprisoned

...

... ...

...

Source: Chris Callison-Burch

word alignment

Page 18: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Quiz #2

Key Limitations of PPDB?

Page 19: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Quiz #2

bug

insect, beetle,pest, mosquito,

fly

squealer, snitch, rat, mole

microphone, tracker, mic,

wire, earpiece, cookie

glitch, error, malfunction, fault, failure

bother, annoy, pester

microbe, virus, bacterium,

germ, parasite

Source: Chris Callison-Burch

word sense

Page 20: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Another Key Limitation'01

Novels'05 '13Bi-Text

'04News

'13* '14*Twitter

'11Video

'12*Style

‘01Web

'15*'16*Simple

* my researchWei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis (2014)

80sWordNet

only paraphrases, no non-paraphrases

Page 21: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Paraphrase Identificationobtain sentential paraphrases automatically

Mancini has been sacked by Manchester City

Mancini gets the boot from Man City Yes!%

WORLD OF JENKS IS ON AT 11

World of Jenks is my favorite show on tvNo!$

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

(meaningful) non-paraphrases are needed to train classifiers!

Page 22: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Also Non-Paraphrases'01

Novels'05 '13Bi-Text

'04News

'13* '14*Twitter

'11Video

'12*Style

‘01Web

'15*'16*Simple

* my researchWei Xu. “Data-driven Approaches for Paraphrasing Across Language Variations” PhD Thesis (2014)

80sWordNet

(meaningful) non-paraphrases are needed to train classifiers!

Page 23: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

News Paraphrase Corpus

Microsoft Research Paraphrase Corpus

(Dolan, Quirk and Brockett, 2004; Dolan and Brockett, 2005; Brockett and Dolan, 2005)

also contains some non-paraphrases

Page 24: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Twitter Paraphrase Corpus

Wei Xu, Alan Ritter, Chris Callison-Burch, Bill Dolan, Yangfeng Ji. “Extracting Lexically Divergent Paraphrases from Twitter” In TACL (2014)

also contains a lot of non-paraphrases

Page 25: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Paraphrase Identification:

A Binary Classification Problem• Input:

- a sentence pair x- a fixed set of binary classes Y = {0, 1}

• Output: - a predicted class y ∈ Y (y = 0 or y = 1)

Page 26: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Paraphrase Identification:

A Binary Classification Problem• Input:

- a sentence pair x- a fixed set of binary classes Y = {0, 1}

• Output: - a predicted class y ∈ Y (y = 0 or y = 1)

negative (non-paraphrases)

Page 27: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Paraphrase Identification:

A Binary Classification Problem• Input:

- a sentence pair x- a fixed set of binary classes Y = {0, 1}

• Output: - a predicted class y ∈ Y (y = 0 or y = 1)

negative (non-paraphrases)

positive (paraphrases)

Page 28: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Paraphrase Identification:

A Binary Classification Problem• Input:

- a sentence pair x- a fixed set of binary classes Y = {0, 1}

• Output: - a predicted class y ∈ Y (y = 0 or y = 1)

Page 29: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Classification Method:

Supervised Machine Learning• Input:

- a sentence pair x- a fixed set of binary classes Y = {0, 1}- a training set of m hand-labeled sentence pairs

(x(1), y(1)), … , (x(m), y(m))

• Output:

- a learned classifier 𝜸: x → y ∈ Y (y = 0 or y = 1)

Page 30: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Classification Method:

Supervised Machine Learning• Input:

- a sentence pair x (represented by features)- a fixed set of binary classes Y = {0, 1}- a training set of m hand-labeled sentence pairs

(x(1), y(1)), … , (x(m), y(m))

• Output:

- a learned classifier 𝜸: x → y ∈ Y (y = 0 or y = 1)

Page 31: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

(Recap Week #3) Classification Method: Supervised Machine Learning

• Naïve Bayes• Logistic Regression • Support Vector Machines (SVM) • …

Page 32: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

(Recap Week#3)

Naïve Bayes• Cons:features ti are assumed independent given the class y

• This will cause problems:- correlated features ➞ double-counted evidence - while parameters are estimated independently - hurt classier’s accuracy

P(t1,t2,...,tn | y) = P(t1 | y) ⋅P(t2 | y) ⋅...⋅P(tn | y)

Page 33: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Classification Method: Supervised Machine Learning

• Naïve Bayes • Logistic Regression • Support Vector Machines (SVM) • …

Page 34: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Logistic Regression• One of the most useful supervised machine

learning algorithm for classification!

• Generally high performance for a lot of problems.

• Much more robust than Naïve Bayes (better performance on various datasets).

Page 35: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Let’s start with something simpler!

Before Logistic Regression

Page 36: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Paraphrase Identification:

Simplified Features

• We use only one feature:

- number of words that two sentence shared in common

Page 37: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

A very related problem of Paraphrase Identification:

Semantic Textual Similarity• How similar (close in meaning) two sentences are?

5: completely equivalent in meaning

4: mostly equivalent, but some unimportant details differ

3: roughly equivalent, some important information differs/missing

2: not equivalent, but share some details

1: not equivalent, but are on the same topic

0: completely dissimilar

Page 38: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

A Simpler Model:

Linear RegressionSe

nten

ce S

imila

rity

(rat

ed b

y H

uman

)

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

Page 39: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

A Simpler Model:

Linear Regression

• also supervised learning (learn from annotated data)

• but for Regression: predict real-valued output (Classification: predict discrete-valued output)

Sent

ence

Sim

ilarit

y

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

Page 40: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

A Simpler Model:

Linear Regression

• also supervised learning (learn from annotated data)

• but for Regression: predict real-valued output (Classification: predict discrete-valued output)

Sent

ence

Sim

ilarit

y

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

threshold ➞ Classification

Page 41: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

A Simpler Model:

Linear Regression

• also supervised learning (learn from annotated data)

• but for Regression: predict real-valued output (Classification: predict discrete-valued output)

Sent

ence

Sim

ilarit

y

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

threshold ➞ Classification

paraphrase{

Page 42: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

A Simpler Model:

Linear Regression

• also supervised learning (learn from annotated data)

• but for Regression: predict real-valued output (Classification: predict discrete-valued output)

Sent

ence

Sim

ilarit

y

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

threshold ➞ Classification

paraphrase

non-paraphrase

{{

Page 43: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Training Set

- m hand-labeled sentence pairs (x(1), y(1)), … , (x(m), y(m)) - x’s: “input” variable / features - y’s: “output”/“target” variable

#words in common (x)

Sentence Similarity (y)

1 0

4 1

13 4

18 5

… …

Page 44: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org Source: NLTK Book

(Recap Week#3)

Supervised Machine Learningtraining set

Page 45: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Supervised Machine Learningtraining set

(also called) hypothesis

Page 46: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Supervised Machine Learningtraining set

(also called) hypothesis

#words in commonSentence Similarity

Page 47: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Supervised Machine Learningtraining set

(also called) hypothesis

#words in commonSentence Similarity

x(estimated)

y

Page 48: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hypothesis (h)

x(estimated)

y

training set

• How to represent h ?

hθ (x) = θ0 +θ1x

Linear Regression w/ one variable

Linear Regression:

Model Representation

Page 49: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

- m hand-labeled sentence pairs (x(1), y(1)), … , (x(m), y(m)) 𝞱’s: parameters

#words in common (x)

Sentence Similarity (y)

1 0

4 1

13 4

18 5

… …

hθ (x) = θ0 +θ1x

Linear Regression w/ one variable:

Model Representation

Source: many following slides are adapted from Andrew Ng

Page 50: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

- m hand-labeled sentence pairs (x(1), y(1)), … , (x(m), y(m))

- 𝞱’s: parameters

#words in common (x)

Sentence Similarity (y)

1 0

4 1

13 4

18 5

… …

hθ (x) = θ0 +θ1x

How to choose 𝞱?

Linear Regression w/ one variable:

Model Representation

Page 51: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org Andrew'Ng'

0'

1'

2'

3'

0' 1' 2' 3'0'

1'

2'

3'

0' 1' 2' 3'0'

1'

2'

3'

0' 1' 2' 3'

Linear Regression w/ one variable::

Model Representation

Page 52: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Sent

ence

Sim

ilarit

y (r

ated

by

Hum

an)

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

• Idea: choose 𝞱0, 𝞱1 so that h𝞱(x) is close to y for training examples (x, y)

Linear Regression w/ one variable:

Cost Function

Page 53: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

• Idea: choose 𝞱0, 𝞱1 so that h𝞱(x) is close to y for training examples (x, y)

Sent

ence

Sim

ilarit

y (r

ated

by

Hum

an)

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

Linear Regression w/ one variable:

Cost Function

Page 54: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

• Idea: choose 𝞱0, 𝞱1 so that h𝞱(x) is close to y for training examples (x, y)

Sent

ence

Sim

ilarit

y (r

ated

by

Hum

an)

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

J(θ0,θ1) =12m

(hθ (x(i ) )− y(i ) )2

i=1

m

minimizeθ0 , θ1

J(θ0,θ1)

squared error function:

Linear Regression w/ one variable:

Cost Function

Page 55: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Linear Regression• Hypothesis:

• Parameters:

• Cost Function:

• Goal:

hθ (x) = θ0 +θ1x

minimizeθ0 , θ1

J(θ0,θ1)

Andrew'Ng'

Hypothesis:'

Parameters:'

Cost'Func6on:'

Goal:'

Simplified'

𝞱0, 𝞱1

J(θ0,θ1) =12m

(hθ (x(i ) )− y(i ) )2

i=1

m

Page 56: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Andrew'Ng'

Hypothesis:'

Parameters:'

Cost'Func6on:'

Goal:'

Simplified'

Linear Regression• Hypothesis:

• Parameters:

• Cost Function:

• Goal:

hθ (x) = θ0 +θ1x

minimizeθ0 , θ1

J(θ0,θ1)

Andrew'Ng'

Hypothesis:'

Parameters:'

Cost'Func6on:'

Goal:'

Simplified'

𝞱0, 𝞱1

hθ (x) = θ1x

J(θ1) =12m

(hθ (x(i ) )− y(i ) )2

i=1

m

∑minimize

θ1J(θ1)

𝞱1

Simplified

J(θ0,θ1) =12m

(hθ (x(i ) )− y(i ) )2

i=1

m

Page 57: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

0

1

2

3

0 1 2 3

hθ (x)

y1

2

3

0 0.5 1 1.5 2 2.5-0.5 0

(for fixed 𝞱1, this is a function of x )

𝞱1

J(𝞱1)

x

J(θ1)(function of the parameter 𝞱1 )

J(1) = 12 × 3

[(1−1)2 + (2 − 2)2 + (3− 3)2 ]= 0

𝞱1=1

Page 58: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

0

1

2

3

0 1 2 3

hθ (x)

y1

2

3

0 0.5 1 1.5 2 2.5-0.5 0

(for fixed 𝞱1, this is a function of x )

𝞱1

J(𝞱1)

x

J(θ1)(function of the parameter 𝞱1 )

J(1) = 12 × 3

[(0.5 −1)2 + (1− 2)2 + (1.5 − 3)2 ]= 0.68

𝞱1=0.5

Page 59: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

0

1

2

3

0 1 2 3

hθ (x)

y1

2

3

0 0.5 1 1.5 2 2.5-0.5 0

(for fixed 𝞱1, this is a function of x )

𝞱1

J(𝞱1)

x

J(θ1)(function of the parameter 𝞱1 )

minimizeθ1

J(θ1)

Page 60: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x ) (function of the parameter 𝞱0, 𝞱1 )

𝞱1 𝞱0

J(𝞱1, 𝞱2)

J(θ0,θ1)

Page 61: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Andrew'Ng'

Hypothesis:'

Parameters:'

Cost'Func6on:'

Goal:'

Simplified'

Linear Regression• Hypothesis:

• Parameters:

• Cost Function:

• Goal:

hθ (x) = θ0 +θ1x

minimizeθ0 , θ1

J(θ0,θ1)

Andrew'Ng'

Hypothesis:'

Parameters:'

Cost'Func6on:'

Goal:'

Simplified'

𝞱0, 𝞱1

hθ (x) = θ1x

J(θ1) =12m

(hθ (x(i ) )− y(i ) )2

i=1

m

∑minimize

θ1J(θ1)

𝞱1

Simplified

J(θ0,θ1) =12m

(hθ (x(i ) )− y(i ) )2

i=1

m

Page 62: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'contour plot

Page 63: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

Page 64: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

Page 65: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Parameter Learning• Have some function

• Want

• Outline:

- Start with some 𝞱0, 𝞱1

- Keep changing 𝞱0, 𝞱1 to reduce J(𝞱1, 𝞱2) until we hopefully end up at a minimum

J(θ0,θ1)minθ0 , θ1

J(θ0,θ1)

Page 66: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

1

2

3

0 0.5 1 1.5 2 2.5-0.5 0𝞱1

J(𝞱1)

minimizeθ1

J(θ1)

Gradient Descent

Page 67: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

1

2

3

0 0.5 1 1.5 2 2.5-0.5 0𝞱1

J(𝞱1)

minimizeθ1

J(θ1)

θ1 := θ1 −α∂∂θ1

J(θ1)

learning rate

Gradient Descent

Page 68: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Andrew'Ng'

θ1"θ0"

J(θ0,θ1)"

Gradient Descent

minimizeθ0 ,θ1

J(θ0,θ1)

Page 69: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Andrew'Ng'

θ0"

θ1"

J(θ0,θ1)"

Gradient Descent

minimizeθ0 ,θ1

J(θ0,θ1)

Page 70: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

repeat until convergence {

}

Gradient Descent

θ j := θ j −α∂∂θ j

J(θ0,θ1) (simultaneous update for j=0 and j=1)

learning rate

Page 71: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Linear Regression w/ one variable:

Gradient Descent

∂∂θ0

J(θ0,θ1) = ?∂∂θ1

J(θ0,θ1) = ?

hθ (x) = θ0 +θ1x

J(θ0,θ1) =12m

(hθ (x(i ) )− y(i ) )2

i=1

m

∑Cost Function

Page 72: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

repeat until convergence {

}

θ0 := θ0 −α1m

(hθ (x(i ) )− y(i ) )

i=1

m

∑ simultaneous update 𝞱0, 𝞱1

Linear Regression w/ one variable:

Gradient Descent

θ1 := θ1 −α1m

(hθ (x(i ) )− y(i ) )

i=1

m

∑ ⋅ x(i )

Page 73: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

𝞱1 𝞱0

J(𝞱1, 𝞱2)

Linear Regressioncost function is convex

Page 74: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

Page 75: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Page 76: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Page 77: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

Page 78: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

Page 79: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

Page 80: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

Page 81: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

Page 82: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

hθ (x)(for fixed 𝞱0, 𝞱1, this is a function of x )

J(θ0,θ1)(function of the parameter 𝞱0, 𝞱1 )

Andrew'Ng'

(for'fixed''''''''''','this'is'a'func6on'of'x)' (func6on'of'the'parameters'''''''''''')'

Page 83: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

Batch Update

• Each step of gradient descent uses all the training examples

Page 84: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

(Recap)

Linear Regression

• also supervised learning (learn from annotated data)

• but for Regression: predict real-valued output (Classification: predict discrete-valued output)

Sent

ence

Sim

ilarit

y

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

threshold ➞ Classification

Page 85: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

(Recap)

Linear Regression

• also supervised learning (learn from annotated data)

• but for Regression: predict real-valued output (Classification: predict discrete-valued output)

Sent

ence

Sim

ilarit

y

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

threshold ➞ Classification

paraphrase{

Page 86: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

(Recap)

Linear Regression

• also supervised learning (learn from annotated data)

• but for Regression: predict real-valued output (Classification: predict discrete-valued output)

Sent

ence

Sim

ilarit

y

0

1

2

3

4

5

#words in common (feature)0 5 10 15 20

threshold ➞ Classification

paraphrase

non-paraphrase

{{

Page 87: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

𝞱1 𝞱0

J(𝞱1, 𝞱2)

(Recap)

Linear Regression• Hypothesis:

• Parameters:

• Cost Function:

• Goal:

hθ (x) = θ0 +θ1x

minimizeθ0 , θ1

J(θ0,θ1)

𝞱0, 𝞱1

J(θ0,θ1) =12m

(hθ (x(i ) )− y(i ) )2

i=1

m

Page 88: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

repeat until convergence {

}

θ j := θ j −α∂∂θ j

J(θ0,θ1) (simultaneous update for j=0 and j=1)

learning rate

(Recap)

Gradient Descent

Page 89: Social Media & Text Analysissocialmedia-class.org/slides/lecture5_paraphrase_identification_a.pdf · Social Media & Text Analysis lecture 5 ... Mini Research Proposal ... Key Limitations

Wei Xu ◦ socialmedia-class.org

socialmedia-class.org

Next Class: - Logistic Regression (cont’)