Presenters: Nouruddin Boojhawoonah & Poonam Gopaul Notes referred from statistics tutorial:...

Presentation on Probability Distribution: Binomial, Chi-square. Presenters: Nouruddin Boojhawoonah & Poonam Gopaul. Notes referred from statistics tutorial: Probability distribution. J. CRAWSHAW and J. CHAMBERS

Transcript of Presenters: Nouruddin Boojhawoonah & Poonam Gopaul Notes referred from statistics tutorial:...

  • Slide 1

Presenters: Nouruddin Boojhawoonah & Poonam Gopaul. Notes referred from statistics tutorial: Probability distribution. J. CRAWSHAW and J. CHAMBERS

  • Slide 2

To understand probability distributions, it is important to understand variables, random variables, and some notation.

A variable is a symbol (A, B, x, y, etc.) that can take on any of a specified set of values. When the value of a variable is the outcome of a statistical experiment, that variable is a random variable.

Generally, statisticians use a capital letter to represent a random variable and a lower-case letter to represent one of its values. For example:

    • X represents the random variable X.
    • P(X) represents the probability of X.
    • P(X = x) refers to the probability that the random variable X is equal to a particular value, denoted by x. As an example, P(X = 1) refers to the probability that the random variable X is equal to 1.

  • Slide 3

Probability Distributions

An example will make clear the relationship between random variables and probability distributions. Suppose you flip a coin two times. This simple statistical experiment can have four possible outcomes: HH, HT, TH, and TT. Now, let the variable X represent the number of heads that result from this experiment. The variable X can take on the values 0, 1, or 2. In this example, X is a random variable, because its value is determined by the outcome of a statistical experiment.

A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence. Consider the coin flip experiment described above. The table below, which associates each value of X with its probability, is the probability distribution of the random variable X.

Number of heads    Probability
0                  0.25
1                  0.50
2                  0.25
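The coin-flip distribution above can be reproduced by listing every equally likely outcome and counting heads. The following is a minimal Python sketch, not part of the original slides; the function name `heads_distribution` is a hypothetical helper chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(flips: int) -> dict[int, Fraction]:
    """Probability distribution of X = number of heads in `flips` fair coin flips.

    Hypothetical helper: enumerates all equally likely outcomes
    (HH, HT, TH, TT for two flips) and counts the heads in each.
    """
    outcomes = list(product("HT", repeat=flips))
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {heads: Fraction(n, total) for heads, n in sorted(counts.items())}


distribution = heads_distribution(2)
for heads, probability in distribution.items():
    print(heads, float(probability))
```

Exact fractions are used so that the probabilities sum to exactly 1; converting to `float` is only for display.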
  • Slide 4

Cumulative Probability Distributions

A cumulative probability refers to the probability that the value of a random variable falls within a specified range.

Let us return to the coin flip experiment. If we flip a coin two times, we might ask: what is the probability that the coin flips would result in one or fewer heads? The answer would be a cumulative probability. It would be the probability that the coin flip experiment results in zero heads plus the probability that the experiment results in one head.

P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.25 + 0.50 = 0.75

Like a probability distribution, a cumulative probability distribution can be represented by a table or an equation. In the table below, the cumulative probability refers to the probability that the random variable X is less than or equal to x.

Number of heads: x    Probability: P(X = x)    Cumulative probability: P(X ≤ x)
0                     0.25                     0.25
1                     0.50                     0.75
2                     0.25                     1.00

  • Slide 5

Uniform Probability Distribution

The simplest probability distribution occurs when all of the values of a random variable occur with equal probability. This probability distribution is called the uniform distribution.

Uniform Distribution. Suppose the random variable X can assume k different values x_1, ..., x_k. Suppose also that P(X = x_i) is the same for every value. Then P(X = x_i) = 1/k.

Example 1

Suppose a die is tossed. What is the probability that the die will land on 6?

Solution: When a die is tossed, there are 6 possible outcomes, represented by S = { 1, 2, 3, 4, 5, 6 }. Each possible outcome is a value of the random variable X, and each outcome is equally likely to occur. Thus, we have a uniform distribution. Therefore, P(X = 6) = 1/6.

Example 2

Suppose we repeat the die-tossing experiment described in Example 1. This time, we ask: what is the probability that the die will land on a number that is smaller than 5?

Solution: When a die is tossed, there are 6 possible outcomes, represented by S = { 1, 2, 3, 4, 5, 6 }. Each possible outcome is equally likely to occur.
Thus, we have a uniform distribution. This problem involves a cumulative probability. The probability that the die will land on a number smaller than 5 is equal to:

P(X < 5) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 1/6 + 1/6 + 1/6 + 1/6 = 2/3

  • Slide 6

If a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable.

Some examples will clarify the difference between discrete and continuous variables.

    • Suppose the fire department mandates that all fire fighters must weigh between 150 and 250 pounds. The weight of a fire fighter would be an example of a continuous variable, since a fire fighter's weight could take on any value between 150 and 250 pounds.
    • Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. However, it could not be any number between 0 and plus infinity. We could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.

Just like variables, probability distributions can be classified as discrete or continuous.

Discrete Probability Distributions

If a random variable is a discrete variable, its probability distribution is called a discrete probability distribution.

  • Slide 7

Binomial Distribution

To understand binomial distributions and binomial probability, it helps to understand binomial experiments and some associated notation, so we cover those topics first.

Binomial Experiment

A binomial experiment (a fixed number of repeated Bernoulli trials) is a statistical experiment that has the following properties:

    • The experiment consists of n repeated trials.
    • Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
    • The probability of success, denoted by P, is the same on every trial.
    • The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.

Consider the following statistical experiment. You flip a coin 2 times and count the number of times the coin lands on heads. This is a binomial experiment because:

    • The experiment consists of repeated trials. We flip a coin 2 times.
    • Each trial can result in just two possible outcomes: heads or tails.
    • The probability of success is constant: 0.5 on every trial.
    • The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other trials.

Notation

The following notation is helpful when we talk about binomial probability.

    • x: The number of successes that result from the binomial experiment.
    • n: The number of trials in the binomial experiment.
    • P: The probability of success on an individual trial.
    • Q: The probability of failure on an individual trial. (This is equal to 1 - P.)
    • b(x; n, P): Binomial probability, the probability that an n-trial binomial experiment results in exactly x successes, when the probability of success on an individual trial is P.
    • nCr: The number of combinations of n things, taken r at a time.

  • Slide 8

Binomial Distribution

A binomial random variable is the number of successes x in n repeated trials of a binomial experiment. The probability distribution of a binomial random variable is called a binomial distribution. (With n = 1 it reduces to the Bernoulli distribution.)

Suppose we flip a coin two times and count the number of heads (successes). The binomial random variable is the number of heads, which can take on values of 0, 1, or 2. The binomial distribution is presented below.

Number of heads    Probability
0                  0.25
1                  0.50
2                  0.25

The binomial distribution has the following properties:

    • The mean of the distribution (μ_x) is equal to n * P.
    • The variance (σ²_x) is n * P * (1 - P).
    • The standard deviation (σ_x) is sqrt[ n * P * (1 - P) ].

Binomial Probability

The binomial probability refers to the probability that a binomial experiment results in exactly x successes. For example, in the above table, we see that the binomial probability of getting exactly one head in two coin flips is 0.50.

Given x, n, and P, we can compute the binomial probability based on the following formula:

Binomial Formula. Suppose a binomial experiment consists of n trials and results in x successes. If the probability of success on an individual trial is P, then the binomial probability is:

P(X = r) = nCr * q^(n-r) * p^r

where r is the number of successes (x above), p = P, and q = 1 - p = Q.

  • Slide 9

Let's work out an example.

30% of pupils in a school travel by bus. From a sample of ten pupils chosen at random, find the probability that
(a) only three travel by bus,
(b) fewer than half travel by bus.

Hint: we need to identify n = ? and p = ?

  • Slide 10

Other examples

(1) The random variable X ~ Bin(6, .042). Find (a) P(X = 6), (b) P(X = 4), (c) P(X 2).

(2) A fair coin is tossed six times. Find the probability of throwing at least four heads.

(3) X ~ Bin(n, 0.3). Find the least possible value of n such that P(X 1) = 0.8.

(4) Assuming that a couple are equally likely to produce a boy or a girl, find the probability that in a family of five children there are more boys than girls.

  • Slide 11

(5) X ~ Bin(4, p) and P(X = 4) = 0.0256. Find P(X = 2).

(6) Charlie finds that when she takes a cutting from a particular plant, the probability that it roots successfully is 1/3.
(a) She takes nine cuttings. Find the probability that (i) more than five cuttings root successfully, (ii) at least three cuttings root successfully.
(b) Find the number of cuttings that she should take in order to be 99% certain that at least one cutting roots successfully.
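The binomial formula from Slide 8 can be evaluated directly. The sketch below is not from the slides: the helper names `binomial_pmf` and `binomial_cdf` are hypothetical, and it assumes the Slide 9 bus example is read as n = 10, p = 0.3, with "fewer than half" of ten pupils taken to mean X ≤ 4.

```python
from math import comb


def binomial_pmf(r: int, n: int, p: float) -> float:
    """P(X = r) for X ~ Bin(n, p), computed as nCr * q^(n-r) * p^r."""
    q = 1.0 - p
    return comb(n, r) * q ** (n - r) * p ** r


def binomial_cdf(r: int, n: int, p: float) -> float:
    """P(X <= r), obtained by summing the individual probabilities."""
    return sum(binomial_pmf(k, n, p) for k in range(r + 1))


# Assumed reading of the Slide 9 example: ten pupils, each travelling
# by bus with probability 0.3, independently of the others.
n, p = 10, 0.3
exactly_three = binomial_pmf(3, n, p)      # part (a): P(X = 3)
fewer_than_half = binomial_cdf(4, n, p)    # part (b): P(X <= 4)

print(f"P(X = 3)  = {exactly_three:.4f}")
print(f"P(X <= 4) = {fewer_than_half:.4f}")
```

The same two helpers can be reused for the other exercises once n and p have been identified; summing the probabilities one by one is fine for small n like these.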
  • Slide 12

Example to illustrate diagrammatic representation of the Binomial Distribution

In a survey on washing powder, it is found that the probability that a shopper chooses Soapsuds is 0.35. Using a sample of seven shoppers, illustrate the information in a diagram.

Solution: X ~ Bin(7, 0.35), so P(X = r) = 7Cr * q^(7-r) * p^r with p = 0.35 and q = 0.65.

P(X = 0) = 0.0490
P(X = 1) = 0.1847
P(X = 2) = 0.2984
P(X = 3) = 0.2678
P(X = 4) = 0.1442
P(X = 5) = 0.0466
P(X = 6) = ???
P(X = 7) = ???

  • Slide 13

[Diagram: probability p plotted against X for X ~ Bin(7, 0.35)]

  • Slide 14

Expectation and Variance of the Binomial Distribution

If X ~ Bin(n, p), then
E(X) = np
Var(X) = npq, where q = 1 - p

Computation of expectation and variance from a probability distribution table:
E(X) = Σ r P(X = r)
E(X^2) = Σ r^2 P(X = r)
Var(X) = E(X^2) - [E(X)]^2

  • Slide 15

The random variable X ~ Bin(4, 0.8). Construct the probability distribution for X and find the expectation and variance. Verify that E(X) = np and Var(X) = npq.

X ~ Bin(4, 0.8), so n = 4, p = 0.8 and q = 0.2.

P(X = 0) = 0.2^4 = 0.0016
P(X = 1) = 4C1 * 0.2^3 * 0.8 = 0.0256
P(X = 2) = 4C2 * 0.2^2 * 0.8^2 = 0.1536
P(X = 3) = 4C3 * 0.2 * 0.8^3 = 0.4096
P(X = 4) = 0.8^4 = 0.4096

Probability distribution table for X:

r           0         1         2         3         4
P(X = r)    0.0016    0.0256    0.1536    0.4096    0.4096

  • Slide 16

E(X) = Σ r P(X = r)
     = 0*0.0016 + 1*0.0256 + 2*0.1536 + 3*0.4096 + 4*0.4096
     = 3.2

E(X^2) = Σ r^2 P(X = r)
       = (0^2*0.0016) + (1^2*0.0256) + (2^2*0.1536) + (3^2*0.4096) + (4^2*0.4096)
       = 10.88

Var(X) = E(X^2) - [E(X)]^2 = 10.88 - 3.2^2 = 0.64

Now, np = 4*0.8 = 3.2 and npq = 4*0.8*0.2 = 0.64.
Therefore, E(X) = np and Var(X) = npq.

  • Slide 17

The χ² test is a significance test that enables us to decide whether it is valid to use a particular distribution, such as the binomial, Poisson or normal, as a model so that we can interpret observed data. We can also use the χ² test to decide whether two variables are independent.

  • Slide 18

Example: A farmer kept a record of the number of heifer calves born to each of his cows during the first five years of breeding of each cow.
The results are summarized below. Test, at the 5% level of significance, whether or not the binomial distribution with parameters n = 5, p = 0.5 is an adequate model for these data.

Number of heifers    0     1     2     3     4     5
Number of cows       4     19    41    52    26    8

  • Slide 19

Procedure

1. Consider a set of data with observed frequencies, O.

Number of heifers         0     1     2     3     4     5
Observed frequency (O)    4     19    41    52    26    8

2. Make the null hypothesis (H0) concerning the distribution followed by the data. Let X be the r.v. "the number of heifer calves born to a cow in the first five years of breeding". H0: X ~ Bin(5, 0.5).

3. Calculate the expected frequencies, E, according to this hypothesis. The expected frequencies are given by 150 * P(X = x), where P(X = x) = 5Cx * (0.5)^(5-x) * (0.5)^x = 5Cx * (0.5)^5.

  • Slide 20

Number of heifers         0      1       2       3       4       5      Total
Observed frequency (O)    4      19      41      52      26      8      150
Expected frequency (E)    4.7    23.4    46.9    46.9    23.4    4.7    150

For example:
x = 0: 150 * 5C0 * (0.5)^5 = 4.7
x = 1: 150 * 5C1 * (0.5)^5 = 23.4
x = 2: 150 * 5C2 * (0.5)^5 = 46.9

  • Slide 21

Since the expected frequencies for the first and last cells are less than 5, we must combine each of them with the adjacent cell (for example, 4.7 + 23.4 = 28.1).

Number of heifers         0 or 1    2       3       4 or 5    Total
Observed frequency (O)    23        41      52      34        150
Expected frequency (E)    28.1      46.9    46.9    28.1      150

  • Slide 22

4. Work out the number of degrees of freedom ν, where ν = number of cells - number of restrictions. The number of restrictions depends on the null hypothesis. The number of cells = 4. There is one restriction, that the total expected frequency is 150. Therefore, ν = 4 - 1 = 3.

5. Decide on the level of the test and the rejection criterion, looking up the critical values in the χ² tables. The χ²(3) distribution is considered.

  • Slide 23

From the table (extract of the χ² table, whose columns are 99%, 95%, 90%, 70%, 50%, 30%, 10%, 5%, 1%):

Degrees of freedom    99%        5%
1                     0.00016
2                     0.020
3                     0.12       7.82
4                     0.30

  • Slide 24

We test at the 5% level and reject H0 if χ² > χ²_5%(3), i.e. if χ² > 7.82.

O        E        (O - E)^2 / E
23       28.1     0.925
41       46.9     0.742
52       46.9     0.554
34       28.1     1.2387
Total 150         3.461

  • Slide 25

χ² = Σ (O - E)^2 / E = 3.461

Since χ²