Probability Distributions: Discrete
Probability Distributions: Discrete
Introduction to Data Science Algorithms
Jordan Boyd-Graber and Michael Paul
September 27, 2016
Introduction to Data Science Algorithms | Boyd-Graber and Paul Probability Distributions: Discrete | 1 of 8
Categorical distribution
• Recall: the Bernoulli distribution is a distribution over two values (success or failure).
• The categorical distribution generalizes the Bernoulli distribution to any number of values.
  ◦ Rolling a die
  ◦ Selecting a card from a deck
• Also known as the discrete distribution.
  ◦ It is the most general type of discrete distribution: you specify all but one of the probabilities directly, rather than having them determined by a parametric probability mass function.
Categorical distribution
• If the categorical distribution is over K possible outcomes, then the distribution has K parameters.
• We will denote the parameters with a K-dimensional vector θ = (θ_1, …, θ_K).
• The probability mass function can be written as:

  f(x) = ∏_{k=1}^{K} θ_k^{[x = k]}

  where the expression [x = k] evaluates to 1 if the statement is true and 0 otherwise.
  ◦ All this really says is that the probability of outcome x is equal to θ_x.
• The number of free parameters is K − 1: if you know K − 1 of the parameters, the Kth is determined, because the parameters must sum to 1.
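The product form of the PMF can be sketched in a few lines of Python; as the slide notes, it collapses to a single parameter lookup (the function and variable names here are illustrative, not from the slides):

```python
# A minimal sketch of the categorical PMF, assuming the K parameters are
# given as a list `theta` whose entries sum to 1.

def categorical_pmf(x, theta):
    """Probability of outcome x (1-indexed, as in the slides) under theta."""
    if not 1 <= x <= len(theta):
        return 0.0
    # The product  prod_k theta_k^[x = k]  has exactly one factor with a
    # nonzero exponent, so it reduces to looking up theta_x.
    return theta[x - 1]

theta = [0.2, 0.3, 0.5]
print(categorical_pmf(2, theta))  # -> 0.3
```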
Categorical distribution
• Example: the roll of a fair (unweighted) die

  P(X = 1) = 1/6
  P(X = 2) = 1/6
  P(X = 3) = 1/6
  P(X = 4) = 1/6
  P(X = 5) = 1/6
  P(X = 6) = 1/6
• If all outcomes have equal probability, this is called the uniform distribution.
• General notation: P(X = x) = θ_x
Sampling from a categorical distribution
• How do we randomly select a value distributed according to a categorical distribution?
• The idea is similar to randomly selecting a Bernoulli-distributed value.
• Algorithm:
  1. Randomly generate a number between 0 and 1: r = random(0, 1)
  2. Return the smallest k ∈ {1, …, K} such that r < ∑_{i=1}^{k} θ_i
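The two-step algorithm above can be sketched directly in Python: draw r, then walk the cumulative sums until one exceeds r (function names are illustrative):

```python
import random

# Sketch of the sampling algorithm: return the smallest k such that
# r < theta_1 + ... + theta_k, for r drawn uniformly from (0, 1).

def sample_categorical(theta):
    r = random.random()            # step 1: r = random(0, 1)
    cumulative = 0.0
    for k, theta_k in enumerate(theta, start=1):
        cumulative += theta_k
        if r < cumulative:         # step 2: first k whose running sum exceeds r
            return k
    return len(theta)              # guard against floating-point round-off

# Example: a fair six-sided die
die = [1/6] * 6
print(sample_categorical(die))  # a value in {1, ..., 6}
```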
Sampling from a categorical distribution
• Example: simulating the roll of a die

  P(X = 1) = θ_1 = 0.166667
  P(X = 2) = θ_2 = 0.166667
  P(X = 3) = θ_3 = 0.166667
  P(X = 4) = θ_4 = 0.166667
  P(X = 5) = θ_5 = 0.166667
  P(X = 6) = θ_6 = 0.166667

Random number in (0, 1): r = 0.452383

  r < θ_1?             No (0.452383 ≥ 0.166667)
  r < θ_1 + θ_2?       No (0.452383 ≥ 0.333333)
  r < θ_1 + θ_2 + θ_3? Yes (0.452383 < 0.5)

• Return X = 3
Sampling from a categorical distribution

• Example: simulating another roll of the same die (θ_k = 0.166667 for k = 1, …, 6)

Random number in (0, 1): r = 0.117544

  r < θ_1? Yes (0.117544 < 0.166667)

• Return X = 1
Sampling from a categorical distribution

• Example 2: rolling a biased die

  P(X = 1) = θ_1 = 0.01
  P(X = 2) = θ_2 = 0.01
  P(X = 3) = θ_3 = 0.01
  P(X = 4) = θ_4 = 0.01
  P(X = 5) = θ_5 = 0.01
  P(X = 6) = θ_6 = 0.95

Random number in (0, 1): r = 0.209581

  r < θ_1?                               No (0.209581 ≥ 0.01)
  r < θ_1 + θ_2?                         No (0.209581 ≥ 0.02)
  r < θ_1 + θ_2 + θ_3?                   No (0.209581 ≥ 0.03)
  r < θ_1 + θ_2 + θ_3 + θ_4?             No (0.209581 ≥ 0.04)
  r < θ_1 + θ_2 + θ_3 + θ_4 + θ_5?       No (0.209581 ≥ 0.05)
  r < θ_1 + θ_2 + θ_3 + θ_4 + θ_5 + θ_6? Yes (0.209581 < 1.0)

• Return X = 6
• We will always return X = 6 unless the random number r < 0.05.
  ◦ 6 is the most probable outcome
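The claim above is easy to check empirically: with θ_6 = 0.95, about 95% of draws should come up 6. Below is a quick simulation sketch (the sampler is the same algorithm from the earlier slide, redefined here so the snippet is self-contained):

```python
import random
from collections import Counter

# Draw many samples from the biased die and count how often each face appears.

def sample_categorical(theta):
    r = random.random()
    cumulative = 0.0
    for k, theta_k in enumerate(theta, start=1):
        cumulative += theta_k
        if r < cumulative:
            return k
    return len(theta)  # guard against floating-point round-off

biased = [0.01] * 5 + [0.95]
counts = Counter(sample_categorical(biased) for _ in range(100_000))
print(counts[6] / 100_000)  # roughly 0.95
```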