
Transcript of: Natural Language Processing lecture slides on probability (Joakim Nivre, Uppsala University)

Page 1:

Natural Language Processing

Introduction to Probability

Joakim Nivre

Uppsala University

Department of Linguistics and Philology

joakim.nivre@lingfil.uu.se

Natural Language Processing 1(11)

Page 2:

Probability and Statistics

“Once upon a time, there was a . . . ”

- Can you guess the next word?
- Hard in general, because language is not deterministic
- But some words are more likely than others
- We can model uncertainty using probability theory
- We can use statistics to ground our models in empirical data

Natural Language Processing 2(11)

Page 5:

The Mathematical Notion of Probability

- The probability of A, P(A), is a real number between 0 and 1:
  1. If P(A) = 0, then A is impossible (never happens)
  2. If P(A) = 1, then A is necessary (always happens)
  3. If 0 < P(A) < 1, then A is possible (may happen)
- A is an event in a sample space Ω
- Sample space = all possible outcomes of an "experiment"
- Event = a subset of the sample space
- Events can be described as a variable taking a certain value
  1. {w ∈ Ω | w is a noun} ↔ PoS = noun
  2. {s ∈ Ω | s consists of 8 words} ↔ #Words = 8

Natural Language Processing 3(11)

Page 7:

Logical Operations on Events

- Often we are interested in combinations of two or more events
- This can be represented using set theoretic operations
- Assume a sample space Ω and two events A and B:
  1. Complement Ā (also written A′) = all elements of Ω that are not in A
  2. Subset A ⊆ B = all elements of A are also elements of B
  3. Union A ∪ B = all elements of Ω that are in A or B
  4. Intersection A ∩ B = all elements of Ω that are in A and B

Natural Language Processing 4(11)
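These operations map directly onto set types in most programming languages. A minimal sketch in Python, with a hypothetical sample space of words and two made-up events A and B (not from the slides):

```python
# Minimal sketch of the set operations above, using a hypothetical
# sample space Ω of words and two hypothetical events A and B.
omega = {"frog", "time", "run", "a", "once", "upon"}
A = {"frog", "time", "run"}              # e.g. "is a noun" (made up)
B = {w for w in omega if len(w) == 4}    # "has four letters"

print(omega - A)   # complement A' = {'a', 'once', 'upon'}
print(A <= B)      # subset A ⊆ B: False ('run' has three letters)
print(A | B)       # union A ∪ B
print(A & B)       # intersection A ∩ B = {'frog', 'time'}
```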

Page 8:

Venn Diagrams

Natural Language Processing 5(11)

Page 9:

Axioms of Probability

- P(A) = the probability of event A
- Axioms:
  1. P(A) ≥ 0
  2. P(Ω) = 1
  3. If A and B are disjoint, then P(A ∪ B) = P(A) + P(B)

Natural Language Processing 6(11)

Page 10:

Probability of an Event

- If A is an event and {x_1, ..., x_n} its individual outcomes, then

  P(A) = Σ_{i=1}^{n} P(x_i)

- Assume all 3-letter strings are equally probable
- What is the probability of a string of three vowels?
  1. There are 26 letters, of which 6 are vowels
  2. There are N = 26^3 3-letter strings
  3. There are n = 6^3 3-vowel strings
  4. Each outcome (string) is equally likely, with P(x_i) = 1/N
  5. So, a string of three vowels has probability

     P(A) = n/N = 6^3 / 26^3 ≈ 0.012

Natural Language Processing 7(11)
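A quick numeric check of this calculation (a minimal sketch; the vowel count of 6 is taken from the slide):

```python
# Probability that a random 3-letter string consists only of vowels,
# assuming all 26^3 strings are equally likely (as on the slide).
n_letters, n_vowels, length = 26, 6, 3

N = n_letters ** length   # total number of 3-letter strings
n = n_vowels ** length    # number of all-vowel strings

p_all_vowels = n / N
print(round(p_all_vowels, 4))   # 0.0123
```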

Page 12:

Rules of Probability

- Theorems:
  1. If A and A′ are complementary events, then P(A′) = 1 − P(A)
  2. P(∅) = 0 for any sample space Ω
  3. If A ⊆ B, then P(A) ≤ P(B)
  4. For any event A, 0 ≤ P(A) ≤ 1

Natural Language Processing 8(11)

Page 13:

Addition Rule

- Axiom 3 allows us to add probabilities of disjoint events
- What about events that are not disjoint?
- Theorem: If A and B are events in Ω, then

  P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

- A = "has glasses", B = "is blond"
- P(A) + P(B) counts blondes with glasses twice

Natural Language Processing 9(11)
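A toy numerical check of the addition rule, with made-up values for the glasses/blond example (the numbers are assumptions, not from the slides):

```python
# Toy check of the addition rule with hypothetical numbers.
p_glasses = 0.30   # P(A): made up
p_blond = 0.40     # P(B): made up
p_both = 0.12      # P(A ∩ B): made up

# P(A ∪ B) = P(A) + P(B) − P(A ∩ B); subtracting p_both avoids
# counting blondes with glasses twice.
p_union = p_glasses + p_blond - p_both
print(round(p_union, 2))   # 0.58
```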

Page 15:

Quiz 1

- Assume that the probability of winning in a lottery is 0.01
- What is the probability of not winning?

1. 0.01

2. 0.99

3. Impossible to tell

Natural Language Processing 10(11)

Page 16:

Quiz 2

- Assume that A and B are events in a sample space Ω
- Which of the following could possibly hold:

1. P(A ∪ B) < P(A ∩ B)
2. P(A ∪ B) = P(A ∩ B)
3. P(A ∪ B) > P(A ∩ B)

Natural Language Processing 11(11)

Page 17:

Natural Language Processing

Joint, Conditional and Marginal Probability

Joakim Nivre

Uppsala University

Department of Linguistics and Philology

joakim.nivre@lingfil.uu.se

Natural Language Processing 1(11)

Page 18:

Conditional Probability

- Given events A and B in Ω, with P(B) > 0, the conditional probability of A given B is:

  P(A|B) =def P(A ∩ B) / P(B)

- P(A ∩ B) or P(A, B) is the joint probability of A and B
- The probability that a person is rich and famous (joint)
- The probability that a person is rich if they are famous (conditional)
- The probability that a person is famous if they are rich (conditional)

Natural Language Processing 2(11)

Page 19:

Conditional Probability

P(A) = size of A relative to Ω
P(A, B) = size of A ∩ B relative to Ω
P(A|B) = size of A ∩ B relative to B

Natural Language Processing 3(11)

Page 20:

Example

- We sample word bigrams (pairs) from a large text T
- Sample space and events:
  - Ω = {(w_1, w_2) ∈ T} = the set of bigrams in T
  - A = {(w_1, w_2) ∈ T | w_1 = run} = bigrams starting with run
  - B = {(w_1, w_2) ∈ T | w_2 = amok} = bigrams ending with amok
- Probabilities:
  - P(run_1) = P(A) = 10^-3
  - P(amok_2) = P(B) = 10^-6
  - P(run_1, amok_2) = P(A, B) = 10^-7
- Probability of amok following run? Of run preceding amok?
  - P(run before amok) = P(A|B) = 10^-7 / 10^-6 = 0.1
  - P(amok after run) = P(B|A) = 10^-7 / 10^-3 = 0.0001

Natural Language Processing 4(11)
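The slide gives the probabilities directly; in practice one would estimate them from counts. A minimal sketch with a tiny made-up token list (not the corpus behind the slide's numbers):

```python
from collections import Counter

# Minimal sketch: estimate P(amok after run) and P(run before amok)
# from bigram counts in a small made-up token list.
tokens = "they run amok while others run away and run home".split()
bigrams = list(zip(tokens, tokens[1:]))

N = len(bigrams)
joint = Counter(bigrams)

p_run1 = sum(c for (w1, _), c in joint.items() if w1 == "run") / N    # P(A)
p_amok2 = sum(c for (_, w2), c in joint.items() if w2 == "amok") / N  # P(B)
p_joint = joint[("run", "amok")] / N                                  # P(A, B)

print(p_joint / p_run1)    # P(B|A): probability of amok after run
print(p_joint / p_amok2)   # P(A|B): probability of run before amok
```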

Page 22:

Multiplication Rule for Joint Probability

- Given events A and B in Ω, with P(B) > 0:

  P(A, B) = P(B) P(A|B)

- Since A ∩ B = B ∩ A, we also have:

  P(A, B) = P(A) P(B|A)

- The multiplication rule is also known as the chain rule

Natural Language Processing 5(11)

Page 23:

Quiz 1

- The probability of winning the Nobel Prize if you have a PhD in Physics is 1 in a million [P(A|B) = 0.000001]
- Only 1 in 10,000 people have a PhD in Physics [P(B) = 0.0001]
- What is the probability of a person both having a PhD in Physics and winning the Nobel Prize? [P(A, B) = ?]

1. Smaller than 1 in a million

2. Greater than 1 in a million

3. Impossible to tell

Natural Language Processing 6(11)

Page 24:

Marginal Probability

- Marginalization, or the law of total probability
- If events B_1, ..., B_k constitute a partition of the sample space Ω (and P(B_i) > 0 for all i), then for any event A in Ω:

  P(A) = Σ_{i=1}^{k} P(A, B_i) = Σ_{i=1}^{k} P(A|B_i) P(B_i)

- Partition = pairwise disjoint and B_1 ∪ ... ∪ B_k = Ω

Natural Language Processing 7(11)

Page 25:

Joint, Marginal and Conditional

- Joint probabilities for rain and wind:

               no wind   some wind   strong wind   storm
  no rain        0.10       0.20        0.05        0.01
  light rain     0.05       0.10        0.15        0.04
  heavy rain     0.05       0.10        0.10        0.05

- Marginalize to get simple probabilities:
  - P(no wind) = 0.1 + 0.05 + 0.05 = 0.2
  - P(light rain) = 0.05 + 0.1 + 0.15 + 0.04 = 0.34
- Combine to get conditional probabilities:
  - P(no wind | light rain) = 0.05 / 0.34 ≈ 0.147
  - P(light rain | no wind) = 0.05 / 0.2 = 0.25

Natural Language Processing 8(11)
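The same marginalization and conditioning can be done mechanically from the table. A minimal sketch using the slide's joint probabilities:

```python
# Marginal and conditional probabilities from the joint rain/wind table.
joint = {
    ("no rain",    "no wind"): 0.10, ("no rain",    "some wind"): 0.20,
    ("no rain",    "strong wind"): 0.05, ("no rain", "storm"): 0.01,
    ("light rain", "no wind"): 0.05, ("light rain", "some wind"): 0.10,
    ("light rain", "strong wind"): 0.15, ("light rain", "storm"): 0.04,
    ("heavy rain", "no wind"): 0.05, ("heavy rain", "some wind"): 0.10,
    ("heavy rain", "strong wind"): 0.10, ("heavy rain", "storm"): 0.05,
}

def p_rain(r):   # marginalize over wind
    return sum(p for (rain, _), p in joint.items() if rain == r)

def p_wind(w):   # marginalize over rain
    return sum(p for (_, wind), p in joint.items() if wind == w)

print(round(p_wind("no wind"), 2))                                        # 0.2
print(round(p_rain("light rain"), 2))                                     # 0.34
print(round(joint[("light rain", "no wind")] / p_rain("light rain"), 3))  # 0.147
print(round(joint[("light rain", "no wind")] / p_wind("no wind"), 2))     # 0.25
```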

Page 28:

Bayes Law

- Given events A and B in sample space Ω:

  P(A|B) = P(A) P(B|A) / P(B)

- Follows from the definition of conditional probability using the chain rule
- Allows us to "invert" conditional probabilities
- Denominator can be computed using marginalization:

  P(B) = Σ_{i=1}^{k} P(B, A_i) = Σ_{i=1}^{k} P(B|A_i) P(A_i)

- Special case of partition: {A, A′}, giving P(B) = P(B|A) P(A) + P(B|A′) P(A′)

Natural Language Processing 9(11)
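A minimal sketch of this inversion with made-up numbers, computing the denominator by marginalizing over A and its complement:

```python
# Bayes' law with hypothetical numbers: invert P(B|A) into P(A|B),
# computing P(B) by marginalizing over A and A'.
p_a = 0.001            # P(A): hypothetical prior
p_b_given_a = 0.9      # P(B|A): hypothetical
p_b_given_not_a = 0.1  # P(B|A'): hypothetical

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)   # marginalization
p_a_given_b = p_a * p_b_given_a / p_b                    # Bayes' law
print(round(p_a_given_b, 4))   # 0.0089
```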

Page 29:

Independence

- Two events A and B are independent if and only if:

  P(A, B) = P(A) P(B)

- Equivalently: P(A) = P(A|B) and P(B) = P(B|A)
- Example:
  - P(run_1) = P(A) = 10^-3
  - P(amok_2) = P(B) = 10^-6
  - P(run_1, amok_2) = P(A, B) = 10^-7
  - A and B are not independent

Natural Language Processing 10(11)
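A quick check of this (non-)independence using the slide's bigram probabilities:

```python
# Check independence with the slide's bigram probabilities.
p_a = 1e-3    # P(run in first position)
p_b = 1e-6    # P(amok in second position)
p_ab = 1e-7   # P(run, amok)

# Under independence we would expect P(A)P(B) = 1e-9, but the observed
# joint probability is 1e-7, i.e. about 100 times larger.
print(p_ab == p_a * p_b)    # False: A and B are not independent
print(p_ab / (p_a * p_b))   # ≈ 100
```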

Page 31:

Quiz 2

- Research has shown that people with disease D exhibit symptom S with 0.9 probability
- A doctor finds that a patient has symptom S
- What can we conclude about the probability that the patient has disease D?
  1. The probability is 0.1
  2. The probability is 0.9
  3. Nothing

Natural Language Processing 11(11)

Page 32:

Natural Language Processing

Statistical Inference

Joakim Nivre

Uppsala University

Department of Linguistics and Philology

joakim.nivre@lingfil.uu.se

Natural Language Processing 1(12)

Page 33:

Statistical Inference

- Inference from a finite set of observations (a sample) to a larger set of unobserved instances (a population or model)
- Two main kinds of statistical inference:
  1. Estimation
  2. Hypothesis testing
- In natural language processing:
  - Estimation: learn model parameters (probability distributions)
  - Hypothesis tests: assess statistical significance of test results

Natural Language Processing 2(12)

Page 34:

Random Variables

- A random variable is a function X that partitions the sample space Ω by mapping outcomes to a value space Ω_X
- The probability function can be extended to variables:

  P(X = x) = P({ω ∈ Ω | X(ω) = x})

- Examples:
  1. The part-of-speech of a word, X: Ω → {noun, verb, adj, ...}
  2. The number of words in a sentence, Y: Ω → {1, 2, 3, ...}
- When we are not interested in particular values, we write P(X)

Natural Language Processing 3(12)
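A minimal sketch of a random variable as a function from outcomes to values, using the small word sample from the quiz at the end of this deck and assuming a uniform distribution over outcomes:

```python
from collections import Counter
from fractions import Fraction

# A random variable as a function: X maps a word to its length in characters.
# Hypothetical uniform distribution over a tiny sample space of words.
omega = ["once", "upon", "a", "time", "there", "was", "a", "frog"]
X = len

# P(X = x) = P({ω ∈ Ω | X(ω) = x}) under the uniform distribution over Ω
dist = Counter(X(w) for w in omega)
p = {x: Fraction(c, len(omega)) for x, c in dist.items()}
print(p[4])   # P(X = 4) = 1/2 (once, upon, time, frog)
```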

Page 35:

Expectation

- The expectation E[X] of a (discrete) numerical variable X is:

  E[X] = Σ_{x ∈ Ω_X} x · P(X = x)

- Example: the expectation of the sum Y of two dice:

  E[Y] = Σ_{y=2}^{12} y · P(Y = y) = 252/36 = 7

Natural Language Processing 4(12)
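A minimal sketch that reproduces this figure by enumerating the 36 equally likely outcomes of two fair dice:

```python
from fractions import Fraction

# Expectation of the sum of two fair dice.
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
p_y = {}
for d1, d2 in outcomes:
    y = d1 + d2
    p_y[y] = p_y.get(y, Fraction(0)) + Fraction(1, 36)

expectation = sum(y * p for y, p in p_y.items())   # Σ y · P(Y = y)
print(expectation)   # 7
```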

Page 36:

Entropy

- The entropy H[X] of a discrete random variable X is:

  H[X] = E[−log2 P(X)] = −Σ_{x ∈ Ω_X} P(X = x) log2 P(X = x)

- Entropy can be seen as the expected amount of information (in bits), or as the difficulty of predicting the variable
- Sum of two dice: −Σ_{y=2}^{12} P(Y = y) log2 P(Y = y) ≈ 3.27
- 11-sided die (2-12): −Σ_{z=2}^{12} (1/11) log2 (1/11) ≈ 3.46

Natural Language Processing 5(12)
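A minimal sketch that reproduces both entropy values, comparing the two-dice sum with a uniform distribution over the same 11 outcomes:

```python
import math

# Entropy (in bits) of the two-dice sum vs. a uniform 11-outcome distribution.
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Distribution of the sum of two fair dice
counts = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        counts[d1 + d2] = counts.get(d1 + d2, 0) + 1
two_dice = [c / 36 for c in counts.values()]

print(round(entropy(two_dice), 2))        # 3.27
print(round(entropy([1 / 11] * 11), 2))   # 3.46
```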

Page 37:

Quiz 1

- Let X be a random variable that maps (English) words to the number of characters they contain
- For example, X(run) = 3 and X(amok) = 4
- Which of the following statements do you think are true:
  1. P(X = 0) = 0
  2. P(X = 5) < P(X = 50)
  3. E[X] < 50

Natural Language Processing 6(12)

Page 38:

Statistical Samples

- A random sample of a variable X is a vector (X_1, ..., X_N) of independent variables X_i with the same distribution as X
- It is said to be i.i.d. = independent and identically distributed
- In practice, it is often hard to guarantee this
  - Observations may not be independent (not i.)
  - Distribution may be biased (not i.d.)
- What is the intended population?
  - A Harry Potter novel is a good sample of J.K. Rowling, or fantasy fiction, but not of scientific prose
  - This is relevant for domain adaptation in NLP

Natural Language Processing 7(12)

Page 39:

Estimation

- Given a random sample of X, we can define sample variables, such as the sample mean:

  X̄ = (1/N) Σ_{i=1}^{N} X_i

- Sample variables can be used to estimate model parameters (population variables)
  1. Point estimation: use a sample variable (such as X̄) to estimate a parameter θ
  2. Interval estimation: use variables X_min and X_max to construct an interval such that P(X_min < θ < X_max) = p, where p is the confidence level adopted

Natural Language Processing 8(12)
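A minimal sketch of point estimation with the sample mean, using a small made-up sample of sentence lengths (the observations are assumptions, not from the slides):

```python
# The sample mean as a point estimate of expected sentence length,
# from a small hypothetical sample of sentence lengths (in words).
sample = [8, 12, 5, 21, 9, 14]        # hypothetical observations of X
x_bar = sum(sample) / len(sample)     # X̄ = (1/N) Σ X_i
print(x_bar)                          # 11.5
```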

Page 40:

Maximum Likelihood Estimation (MLE)

- Likelihood of parameters θ given sample x_1, ..., x_N:

  L(θ | x_1, ..., x_N) = P(x_1, ..., x_N | θ) = Π_{i=1}^{N} P(x_i | θ)

- Maximum likelihood estimation: choose θ to maximize L:

  max_θ L(θ | x_1, ..., x_N)

- Basic idea:
  - A good sample should have a high probability of occurring
  - Thus, choose the estimate that maximizes sample probability

Natural Language Processing 9(12)
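A minimal sketch of this idea for a single Bernoulli parameter, maximizing the likelihood over a grid of candidate values (the coin-flip sample is made up; the result illustrates the next slide's point that the MLE equals the relative frequency):

```python
# Maximum likelihood estimation of a coin's heads probability θ
# by grid search over candidate values, for a made-up i.i.d. sample.
sample = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # 1 = heads, 0 = tails

def likelihood(theta, xs):
    # L(θ | x_1, ..., x_N) = Π P(x_i | θ) for an i.i.d. Bernoulli sample
    p = 1.0
    for x in xs:
        p *= theta if x == 1 else (1 - theta)
    return p

candidates = [i / 100 for i in range(101)]
theta_hat = max(candidates, key=lambda t: likelihood(t, sample))
print(theta_hat)   # 0.7, the relative frequency of heads in the sample
```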

Page 41:

Examples

- Sample mean is an MLE of expectation:

  Ê[X] = X̄

- For example, estimate expected sentence length in a certain type of text by mean sentence length in a representative sample
- Relative frequency is an MLE of probability:

  P̂(X = x) = f(x) / N

- For example, estimate the probability of a word being a noun by the relative frequency of nouns in a suitable corpus

Natural Language Processing 10(12)

Page 42:

MLE for Different Distributions

- Joint distribution of X and Y:

  P_MLE(X = x, Y = y) = f(x, y) / N

- Marginal distribution of X:

  P_MLE(X = x) = Σ_{y ∈ Ω_Y} P_MLE(X = x, Y = y)

- Conditional distribution of X given Y:

  P_MLE(X = x | Y = y) = P_MLE(X = x, Y = y) / P_MLE(Y = y)
                       = P_MLE(X = x, Y = y) / Σ_{x ∈ Ω_X} P_MLE(X = x, Y = y)

Natural Language Processing 11(12)
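A minimal sketch of these three MLE estimates from raw counts, using a small made-up sample of (word, tag) pairs (the data and tag names are assumptions, not from the slides):

```python
from collections import Counter

# MLE of joint, marginal and conditional distributions from raw counts,
# for a hypothetical sample of (word, tag) pairs.
pairs = [("run", "VERB"), ("run", "NOUN"), ("frog", "NOUN"),
         ("time", "NOUN"), ("run", "VERB"), ("upon", "ADP")]
N = len(pairs)
f = Counter(pairs)

def p_joint(x, y):          # P_MLE(X = x, Y = y) = f(x, y) / N
    return f[(x, y)] / N

def p_word(x):              # marginalize over tags
    return sum(c for (w, _), c in f.items() if w == x) / N

def p_tag_given_word(y, x): # conditional = joint / marginal
    return p_joint(x, y) / p_word(x)

print(p_joint("run", "VERB"))           # 2/6 ≈ 0.333
print(p_word("run"))                    # 3/6 = 0.5
print(p_tag_given_word("VERB", "run"))  # ≈ 0.667
```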

Page 43:

Quiz 2

- Consider the following sample of English words:

  {once, upon, a, time, there, was, a, frog}

- What is the MLE of word length (number of characters) based on this sample?
  1. 4
  2. 8
  3. 3.25

Natural Language Processing 12(12)