Biomathematics 2

Probability, random variables. Continuous random variable. Normal, standard normal distribution.

Dr. Beáta Bugyi, associate professor

University of Pécs, Medical School, Department of Biophysics

2020 (v2020)


CONTINUOUS RANDOM VARIABLE

Continuous: takes an uncountably infinite number of values; arises from measurement.

Probability – discrete/continuous random variables

Consider a statistical experiment whose outcome corresponds to

A) a discrete random variable X in the range 0 – 10 with a finite number of equally likely outcomes (10).

Give the probability that the outcome is 6.

$P(X = 6) = \frac{1}{10} = 0.1$

B) a continuous random variable X in the range 0 – 10 (infinite number of outcomes).

Give the probability that the outcome is exactly 6, not 6.1, 6.01, …, 6.00000000001.

Informally, $P(X = 6) = \frac{1}{\infty} = 0$: a single exact value of a continuous random variable has probability zero, so only intervals of values carry a nonzero probability.
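The vanishing probability of a single value can be illustrated numerically. The sketch below assumes, purely for illustration, that X is uniformly distributed on 0 – 10 (the notes do not specify the distribution); the function name is hypothetical.

```python
def uniform_interval_probability(low, high, a, b):
    """P(a < X < b) for X assumed uniform on [low, high]."""
    # Clip the interval to the support of X.
    a, b = max(a, low), min(b, high)
    return max(b - a, 0.0) / (high - low)


# Shrink the interval around 6: the probability shrinks with it.
for eps in (1.0, 0.1, 0.001, 1e-9):
    p = uniform_interval_probability(0.0, 10.0, 6.0 - eps, 6.0 + eps)
    print(f"half-width {eps:g}: P = {p:.3g}")
```

As the half-width goes to zero the interval collapses to the single point 6 and the probability goes to 0.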

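NORMAL DISTRIBUTION

$N(\mu, \sigma)$, $\mu$ = mean, $\sigma$ = standard deviation

Probability density function (PDF)

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

Cumulative distribution function (CDF)

$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right) dt$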

Graphical representation of the PDF and CDF of normal distributions.

The normal distribution is defined by its mean (𝜇) and standard deviation (𝜎).

The PDF has a characteristic bell shape.

The PDF is symmetric about the mean of the distribution.


The inflection points of the PDF lie one standard deviation from the mean, at $x = \mu \pm \sigma$.

The width of the PDF (full width at half maximum, $2\sqrt{2\ln 2}\,\sigma \approx 2.355\,\sigma$) is proportional to the standard deviation: the larger the standard deviation, the wider the curve.

Probability is given by the area under the PDF (see examples below).
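A minimal sketch of this statement: the area under the PDF between two values, approximated with the trapezoidal rule, agrees with the difference of the CDF at those values. The function names are illustrative, and the CDF is written with the error function from Python's standard library rather than read from a table.

```python
import math


def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma), expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def area_under_pdf(a, b, mu, sigma, steps=10_000):
    """Trapezoidal approximation of the area under the PDF on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h


# Area between 60 and 80 for N(60, 10), computed two ways.
area = area_under_pdf(60, 80, 60, 10)
exact = normal_cdf(80, 60, 10) - normal_cdf(60, 60, 10)
print(f"trapezoid: {area:.4f}, CDF difference: {exact:.4f}")
```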

Example 1

The test results of students from Subject 1 follow a normal distribution with a mean of 60% and standard deviation of 10%: $N(\mu, \sigma) = N(60, 10)$. Represent the following probabilities graphically.

Q1.1: What is the probability that a student scores 60%? 𝑃(𝑋 = 𝑥 = 60) = ?

Q1.2: What is the probability that a student scores less than 60%? 𝑃(𝑋 < 𝑥 = 60) =?

Q1.3: What is the probability that a student scores more than 60%? 𝑃(𝑋 > 𝑥 = 60) = ?

Q1.4: What is the probability that a student scores less than 80%? 𝑃(𝑋 < 𝑥 = 80) = ?

Q1.5: What is the probability that a student scores between 60% and 80%? $P(x = 60 < X < x = 80) = ?$

Example 2

The test results of students from Subject 2 follow a normal distribution with a mean of 62% and standard deviation of 8%: $N(\mu, \sigma) = N(62, 8)$.

Question:

How can we work with different normal distributions? Do we need the PDF of each and every normal distribution?

Answer:

No. Normal distributions can be standardized: the infinitely many normal distributions all map onto one standardized distribution (the standard normal distribution).

How to standardize normal distributions?

$N(\mu, \sigma)$

z score: $z = \frac{x - \mu}{\sigma}$

z score: how many standard deviations ($\sigma$) a given value ($x$) is from the mean ($\mu$)
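The z score formula can be sketched as a one-line function. The name `z_score` and the input check are illustrative choices, not part of the notes.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Subject 1: N(60, 10); Subject 2: N(62, 8).
print(z_score(80, 60, 10))  # 2.0
print(z_score(65, 62, 8))   # 0.375
```

Scores from the two subjects become directly comparable once they are expressed as z scores.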

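STANDARD NORMAL DISTRIBUTION

$SN(0, 1)$, $\mu = 0$, $\sigma = 1$

Probability density function (PDF)

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$, where $\mu = 0$ and $\sigma = 1$: $f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$

Cumulative distribution function (CDF)

$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$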

Graphical representation of the PDF and CDF of the standard normal distribution.


Z table

The Z table summarizes the CDF of the standard normal distribution: for each z it lists $P(Z < z)$.
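As a rough illustration of where Z table entries come from, the sketch below evaluates the standard normal CDF with the error function and rounds to four decimals, the precision of the table values used in these notes. The row layout (one row per tenth of z, ten columns for the hundredths) is an assumption about a typical Z table, and the function names are hypothetical.

```python
import math


def phi(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_table_row(z_tenths):
    """Ten table entries for z_tenths + 0.00, 0.01, ..., 0.09, rounded to 4 decimals."""
    return [round(phi(z_tenths + hundredth / 100.0), 4) for hundredth in range(10)]


# Row for z = 0.30 ... 0.39; columns 7 and 8 hold z = 0.37 and z = 0.38.
print(z_table_row(0.3))
```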

Example 1

The test results of students from Subject 1 follow a normal distribution with a mean of 60% and standard deviation of 10%: $N(\mu, \sigma) = N(60, 10)$. Standardize the normal distribution. Give the probabilities by using the Z table.

Q1.1: What is the probability that a student scores 60%? 𝑃(𝑋 = 𝑥 = 60) = ?

𝑃(𝑋 = 𝑥 = 60) = 0

Q1.2: What is the probability that a student scores less than 60%? 𝑃(𝑋 < 𝑥 = 60) =?

$z = \frac{x - \mu}{\sigma} = \frac{60 - 60}{10} = 0.00$

𝑃(𝑋 < 𝑥 = 60) = 0.5 → 50 %

Q1.3: What is the probability that a student scores more than 60%? 𝑃(𝑋 > 𝑥 = 60) = ?

𝑃(𝑋 > 𝑥 = 60) + 𝑃(𝑋 < 𝑥 = 60) = 1

𝑃(𝑋 > 𝑥 = 60) = 1 − 𝑃(𝑋 < 𝑥 = 60) = 1 − 0.5 = 0.5 → 50 %

Q1.4: What is the probability that a student scores less than 80%? 𝑃(𝑋 < 𝑥 = 80) = ?

$z = \frac{x - \mu}{\sigma} = \frac{80 - 60}{10} = 2.00$

𝑃(𝑋 < 𝑥 = 80) = 0.9772 → 97.72 %

Q1.5: What is the probability that a student scores between 60% and 80%? $P(x = 60 < X < x = 80) = ?$

𝑃(𝑋 < 80) − 𝑃(𝑋 < 60) = 0.9772 − 0.5 = 0.4772 → 47.72%
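The answers to Example 1 can be cross-checked without a printed table. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# Subject 1: N(60, 10)
subject1 = NormalDist(mu=60, sigma=10)

p_less_60 = subject1.cdf(60)        # Q1.2
p_more_60 = 1 - p_less_60           # Q1.3, complement rule
p_less_80 = subject1.cdf(80)        # Q1.4
p_between = p_less_80 - p_less_60   # Q1.5, difference of two CDF values

for label, p in [("P(X < 60)", p_less_60), ("P(X > 60)", p_more_60),
                 ("P(X < 80)", p_less_80), ("P(60 < X < 80)", p_between)]:
    print(f"{label} = {p:.4f}")
```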

Example 2

The test results of students from Subject 2 follow a normal distribution with a mean of 62% and standard deviation of 8%: $N(\mu, \sigma) = N(62, 8)$. Give the probabilities by using the Z table.

Q2.1: What is the probability that a student scores less than 65%? 𝑃(𝑋 < 𝑥 = 65) =?

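$z = \frac{x - \mu}{\sigma} = \frac{65 - 62}{8} = +0.375$

If a value is not listed in the table, use the following approximation (the average of the two neighboring table entries):

$+0.375 = \frac{0.37 + 0.38}{2}$

$P(X < x = 65) = \frac{0.6443 + 0.6480}{2} = 0.6462 \rightarrow 64.62\%$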

Q2.2: What is the probability that a student scores less than 45%? 𝑃(𝑋 < 𝑥 = 45) =?

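$z = \frac{x - \mu}{\sigma} = \frac{45 - 62}{8} = -2.125$

If a value is not listed in the table, use the following approximation (the average of the two neighboring table entries):

$-2.125 = \frac{-2.12 + (-2.13)}{2}$

$P(X < x = 45) = \frac{0.0170 + 0.0166}{2} = 0.0168 \rightarrow 1.68\%$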

Q2.3: What is the probability that a student scores between 45% and 65%? $P(x = 45 < X < x = 65) = ?$

$P(x = 45 < X < x = 65) = P(X < x = 65) - P(X < x = 45) = 0.6462 - 0.0168 = 0.6294 \rightarrow 62.94\%$
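The averaging of neighboring table entries is linear interpolation at the midpoint. The sketch below generalizes it to any z between two tabulated values and compares the result with the CDF from `statistics.NormalDist`. Only the four table entries quoted in these notes are included; the dictionary and function names are hypothetical.

```python
from statistics import NormalDist

# Z table entries quoted in the text (z -> P(Z < z)).
Z_TABLE = {0.37: 0.6443, 0.38: 0.6480, -2.13: 0.0166, -2.12: 0.0170}


def interpolate(z, z_lo, z_hi, table):
    """Linear interpolation between two tabulated z values."""
    frac = (z - z_lo) / (z_hi - z_lo)
    return table[z_lo] + frac * (table[z_hi] - table[z_lo])


p65_table = interpolate(0.375, 0.37, 0.38, Z_TABLE)
p45_table = interpolate(-2.125, -2.13, -2.12, Z_TABLE)

# Same probabilities from the CDF of N(62, 8), for comparison.
subject2 = NormalDist(mu=62, sigma=8)
p65_exact = subject2.cdf(65)
p45_exact = subject2.cdf(45)

print(f"P(X < 65): table {p65_table:.4f}, CDF {p65_exact:.4f}")
print(f"P(X < 45): table {p45_table:.4f}, CDF {p45_exact:.4f}")
print(f"P(45 < X < 65): table {p65_table - p45_table:.4f}")
```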

Q2.4: What is the median of the students’ scores? 𝑃(𝑋 < 𝑥) = 0.5, 𝑥 = ?

𝑃(𝑋 < 𝑥) = 0.5 → 𝑧 = 0.00

$z = \frac{x - \mu}{\sigma} \rightarrow 0.00 = \frac{x - 62}{8} \rightarrow x = 62$

Note: The mean of a data set following a normal distribution is equal to its median.

Q2.5: What is the first quartile of the students’ scores? 𝑃(𝑋 < 𝑥) = 0.25, 𝑥 = ?

𝑃(𝑋 < 𝑥) = 0.25 → 𝑧 = −0.675

$z = \frac{x - \mu}{\sigma} \rightarrow -0.675 = \frac{x - 62}{8} \rightarrow x = 56.6$

Q2.6: What is the third quartile of the students’ scores? 𝑃(𝑋 < 𝑥) = 0.75, 𝑥 = ?

𝑃(𝑋 < 𝑥) = 0.75 → 𝑧 = 0.675

$z = \frac{x - \mu}{\sigma} \rightarrow 0.675 = \frac{x - 62}{8} \rightarrow x = 67.4$
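Questions Q2.4 to Q2.6 read the Z table in reverse: given a probability, find z, then solve the z score formula for x. A minimal sketch of the same steps, using the inverse CDF from `statistics.NormalDist` (Python 3.8 or later) in place of the table lookup; the helper name is illustrative.

```python
from statistics import NormalDist


def value_at_probability(p, mu, sigma):
    """Find x with P(X < x) = p for N(mu, sigma), via z = (x - mu) / sigma."""
    z = NormalDist().inv_cdf(p)   # reverse lookup in the standard normal CDF
    return mu + z * sigma         # z score formula solved for x


# Subject 2: N(62, 8)
median = value_at_probability(0.50, 62, 8)
q1 = value_at_probability(0.25, 62, 8)
q3 = value_at_probability(0.75, 62, 8)
print(f"Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}")
```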

Q2.7: Find what percentage of the data lies within mean ± 1×standard deviation, mean ± 2×standard deviation, and mean ± 3×standard deviation.
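One way to approach Q2.7 is to note that mean ± k×standard deviation corresponds to z = ±k for any normal distribution, so the answer is $P(Z < k) - P(Z < -k)$. The sketch below evaluates this with `statistics.NormalDist`; the function name `coverage` is hypothetical.

```python
from statistics import NormalDist


def coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"mean ± {k}×SD: {100 * coverage(k):.2f} %")
```

The result does not depend on the particular mean and standard deviation, which is why one standardized table suffices.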


IMPORTANCE OF NORMAL DISTRIBUTION

CENTRAL LIMIT THEOREM

Example 3

In a population of persons let X = life expectancy of a person (in years). The distribution of X has a mean and standard deviation of 72 and 18.2 years, respectively.

$X$ = life expectancy of a person in a population (years)

$X = x_{person1}, x_{person2}, \ldots$

We choose samples from the population, each consisting of n persons. By finding the average lifetime in each sample ($\bar{x}$, sample mean) we obtain the distribution of $\bar{X}$.

Sampling distribution of sample means: the distribution of the sample means calculated from all possible random samples of a specific size (n) taken from a population.

$\bar{X}$ = average life expectancy of persons in a sample (years)

$\bar{X} = \bar{x}_{sample1}, \bar{x}_{sample2}, \ldots$

Properties of the distribution of the sample means

$\mu_{\bar{X}} = \mu_X$

$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$ (standard error of the mean, SEM)

Characteristics of the distribution: Central limit theorem (CLT)

POPULATION: $X = x$, life expectancy of a person in a population

SAMPLE: $\bar{X} = \bar{x}$, average life expectancy of persons in a sample

Population has a normal distribution → sample mean has a normal distribution for any n

Population has a not normal/not known distribution → CLT: if n is large enough (rule of thumb: $n \geq 30$), the distribution of the sample mean is approximated by a normal distribution; the larger n, the better the approximation

http://onlinestatbook.com/stat_sim/sampling_dist/index.html
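The behavior of the sample means can also be explored with a short simulation, similar in spirit to the linked demo. The sketch below assumes a deliberately non-normal population: a uniform distribution chosen so that its mean is 72 and its standard deviation is 18.2. This population shape, the seed, and the function name are illustrative assumptions, not part of the example.

```python
import math
import random
import statistics


def simulate_sample_means(n, num_samples, mu=72.0, sigma=18.2, seed=0):
    """Draw num_samples samples of size n from a uniform population; return their means."""
    rng = random.Random(seed)
    # A uniform distribution on [mu - hw, mu + hw] has standard deviation hw / sqrt(3).
    half_width = sigma * math.sqrt(3.0)
    means = []
    for _ in range(num_samples):
        sample = [rng.uniform(mu - half_width, mu + half_width) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(n=40, num_samples=5000)
print(f"mean of sample means: {statistics.fmean(means):.2f}")
print(f"SD of sample means:   {statistics.stdev(means):.2f}")
print(f"sigma / sqrt(n):      {18.2 / math.sqrt(40):.2f}")
```

A histogram of `means` would look bell-shaped even though the population itself is flat, and the spread of the sample means should be close to $\sigma_X / \sqrt{n}$.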

Q3.1: Consider that X has a normal distribution: $N_X(72, 18.2)$. What is the distribution of $\bar{X}$ if n = 10 or n = 40?

n = 10 normal, n = 40 normal

Q3.2: Consider that the distribution of X is not known/not normal. What is the distribution of $\bar{X}$ if n = 10 or n = 40?

n = 10 not known/not normal, n = 40 approximated by normal

Q3.3: What is the mean of $\bar{X}$ and the standard deviation of $\bar{X}$ (standard error of the mean) if n = 40?

$\mu_{\bar{X}} = \mu_X = 72$

$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{18.2}{\sqrt{40}} = 2.88$

$N_{\bar{X}}(72, 2.88)$
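The standard error of the mean is a one-line calculation. The sketch below also evaluates it for n = 10 to show how the spread of the sample means shrinks as n grows; the function name is illustrative.

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


sem_10 = standard_error(18.2, 10)
sem_40 = standard_error(18.2, 40)
print(f"n = 10: SEM = {sem_10:.2f} years")
print(f"n = 40: SEM = {sem_40:.2f} years")
```

Quadrupling the sample size halves the standard error, because of the square root in the denominator.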

Q3.4: Find $P(X < x = 70)$ and $P(\bar{X} < \bar{x} = 70)$.

$P(X < x = 70)$: What is the probability that the life expectancy of a person in the population is less than 70 years?

$N_X(72, 18.2)$

$z = \frac{x - \mu}{\sigma} = \frac{70 - 72}{18.2} = -0.110$

$P(X < x = 70) = 0.4562 \rightarrow 45.62\%$

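$P(\bar{X} < \bar{x} = 70)$: What is the probability that the average life expectancy of persons in a sample (n = 40) is less than 70 years?

$N_{\bar{X}}(72, 2.88)$

$z = \frac{\bar{x} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{x} - \mu_X}{\sigma_X / \sqrt{n}} = \frac{70 - 72}{2.88} = -0.69 \approx -0.7$

$P(\bar{X} < \bar{x} = 70) = 0.2420 \rightarrow 24.20\%$ (Z table value for $z = -0.70$)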
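Both probabilities in Q3.4 can be cross-checked with `statistics.NormalDist` (Python 3.8 or later). The sketch assumes, as in Q3.1, that X is normally distributed and that n = 40. Because the CDF is evaluated at the unrounded z, the results can differ in the third decimal from table lookups that round z first.

```python
import math
from statistics import NormalDist

mu, sigma, n = 72.0, 18.2, 40

# Distribution of a single person's life expectancy.
person = NormalDist(mu, sigma)
# Distribution of the sample mean: same mean, standard error as spread.
sample_mean = NormalDist(mu, sigma / math.sqrt(n))

p_person = person.cdf(70)
p_mean = sample_mean.cdf(70)
print(f"P(X < 70)     = {p_person:.4f}")
print(f"P(Xbar < 70)  = {p_mean:.4f}")
```

The sample mean is far less likely than an individual value to fall below 70, because its distribution is much narrower around 72.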