A short course on Statistics, Probability and Applications


    Statistics, Probability and Applications

    Abhijit Kar Gupta


    Statistics, Probability and Applications by - Dr. Abhijit Kar Gupta , [email protected]

    Contents:

    1. What is Statistics?
    2. Central Tendency
       Mean (A.M., G.M., H.M.)
       Useful Method of Mean Calculation
       Median
       Mode
       Variance, Standard Deviation
       Measures of Position (Percentiles, Quartiles, Deciles)
    3. Concept of Probability, the Probability Rules and Applications
    4. Probability Distributions
       From a Discrete to Continuous Prob. Distribution
       Combination Rules
       Normal Distribution
       Experiment with rolling dice
       Shape of a Distribution: Symmetry, Skewness, Kurtosis
    5. Z-Distribution
    6. Binomial and Poisson Distribution
       Binomial Probability
       Pascal's Triangle
       Poisson Distribution
       Poisson Distribution from Binomial Distribution
    7. Correlation and Regression
    8. Sampling
       Basic Concept
       Method of Sampling
    9. Sampling Distributions
       Hypothesis Testing
       Student's t-test
       Demonstrations of t-distributions
       Chi-square test
    10. Appendix
    11. Z-Score Table
    12. Books and Websites


    Statistics, Probability and Applications

    Before you begin:

    This is a short course [based on my lecture notes for the M.Sc. (Geography) students of Distance Education] on the elements of Statistics and the concept of Probability Theory, supplemented with examples and illustrations. The purpose of this general and basic course is to serve as a guideline to the practical utility of the subject for any student not specializing in Mathematics or Statistics. Rigorous mathematics is thus avoided as much as possible.

    *The cover page photo is taken by the author. The cartoons used in this book are duly acknowledged by writing the source when it is not obvious.

    Chapter I

    What is Statistics?

    Statistics is a systematic presentation of data out of which we may conclude something meaningful.

    Just a collection of raw data is meaningless unless we are able to calculate some quantities out of them. It is only interesting when some patterns emerge out of the data that are representative of some event or measurement.

    After we collect a set of data, the first things we like to know about are the trends or tendencies of this data set. We estimate the central tendency of it.

    [Cartoon Source: http://www-users.york.ac.uk/ ]

    Central Tendency

    The central tendency of a data set is obtained by calculating mean, median and mode.


    Mean:

    There are various kinds of mean: (i) arithmetic, (ii) geometric, (iii) harmonic. We usually calculate the arithmetic mean, and this is what we commonly call the mean or average.

    Suppose we have a set of $n$ data points: $x_1, x_2, \dots, x_n$.

    Arithmetic mean (A.M.):

    $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$    (1)

    The arithmetic mean or average is a measure of the middle of the data set.

    Now suppose $x_1$ appears $f_1$ times, $x_2$ appears $f_2$ times and so on in the data set. Here $f_1, f_2, \dots$ are called the frequencies. The arithmetic mean in this case is

    $\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{1}{N}\sum_{i} f_i x_i$,    (2)

    where $N = \sum_{i} f_i$.

    Formula (2) is called the weighted mean.

    Note: In formula (2), if we put $f_i = 1$ for all $i$, we get back formula (1). Formula (2) can also be written as

    $\bar{x} = \sum_{i} p_i x_i$,

    where $p_i = f_i / N$ is the relative frequency of each $x_i$ (each data point).

    Example #1
    The ages of father, mother, son and daughter in a family are 60 years, 55 years, 25 years and 20 years respectively. What is the average age of the family members?

    Ans. Average age $= \frac{60 + 55 + 25 + 20}{4} = \frac{160}{4} = 40$ years.

    Example #2
    In the game of Ludo (dice throwing), you obtain 1 two times, 2 five times, 3 two times, 4 six times, 5 four times and 6 only once from the random throwing of a die. What is the average value you get?

    Ans. Average value $= \frac{1 \times 2 + 2 \times 5 + 3 \times 2 + 4 \times 6 + 5 \times 4 + 6 \times 1}{2 + 5 + 2 + 6 + 4 + 1} = \frac{68}{20} = 3.4$.
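    The two examples above can be checked with a few lines of Python. This is only a sketch of the weighted mean of formula (2); the function name `weighted_mean` is a hypothetical helper, not something from the notes, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

def weighted_mean(values, freqs):
    """Arithmetic mean of values[i] occurring freqs[i] times (formula 2)."""
    total = sum(freqs)  # N = sum of the frequencies
    return Fraction(sum(f * x for x, f in zip(values, freqs)), total)

# Example #1: four family members, each age occurring once.
family_mean = weighted_mean([60, 55, 25, 20], [1, 1, 1, 1])

# Example #2: faces 1..6 with the frequencies quoted in the text.
ludo_mean = weighted_mean([1, 2, 3, 4, 5, 6], [2, 5, 2, 6, 4, 1])

print(float(family_mean), float(ludo_mean))
```

    Setting every frequency to 1, as in the first call, reduces the weighted mean to the plain arithmetic mean of formula (1).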


    Geometric mean (G.M.):

    G.M. $= (x_1 x_2 \cdots x_n)^{1/n}$

    Harmonic mean (H.M.):

    H.M. $= \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$

    The arithmetic mean (A.M.) is the useful one to calculate for any set of numbers unless the numbers have some special relation among them. For example, if we are to find the mean of the following set of numbers: 2, 4, 8, 16, 32, it is useful to calculate the geometric mean (G.M.).

    G.M. $= (2 \times 4 \times 8 \times 16 \times 32)^{1/5} = (2^{15})^{1/5} = 8$

    Note: The numbers 2, 4, 8, ... are in geometric progression.

    If we are asked to find the mean of numbers that are in harmonic progression, it would be interesting to find the harmonic mean (H.M.).

    Note: The reciprocals of numbers in harmonic progression are in arithmetic progression.
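    A minimal Python sketch of the two means follows. The geometric progression 2, 4, 8, 16, 32 is the one from the text; the harmonic progression 1, 1/2, 1/3, 1/4 is a hypothetical example chosen here only for illustration, and the function names are likewise assumptions.

```python
import math

def geometric_mean(xs):
    # n-th root of the product of n positive numbers
    return math.prod(xs) ** (1.0 / len(xs))

def harmonic_mean(xs):
    # n divided by the sum of the reciprocals
    return len(xs) / sum(1.0 / x for x in xs)

gm = geometric_mean([2, 4, 8, 16, 32])      # geometric progression from the text
hm = harmonic_mean([1, 1 / 2, 1 / 3, 1 / 4])  # hypothetical harmonic progression

print(round(gm, 6), round(hm, 6))
```

    For the hypothetical set the reciprocals are 1, 2, 3, 4, which are in arithmetic progression, so the harmonic mean is 4 divided by their sum.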

    Useful Method of Mean Calculation:

    In practical calculations, when we are to obtain the arithmetic mean (A.M.) of a set of big numbers, we follow a short-cut method:

    Step I: We assume a mean by just looking at the numbers. Let this be $A$. (This is our choice and we do this as per our convenience.)

    Step II: Next, we calculate the deviation of each data point from this assumed mean: $d_i = x_i - A$.


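    Now, the calculated mean of the deviations is $\bar{d} = \frac{1}{n}\sum_{i} d_i$, where $d_i = x_i - A$ is the deviation of the data point $x_i$ from the assumed mean $A$. The actual arithmetic mean is $\bar{x} = A + \bar{d}$.

    Similarly, for data with frequencies, $\bar{d} = \frac{1}{N}\sum_{i} f_i d_i$ with $N = \sum_i f_i$. Here also, we get the same formula as above, $\bar{x} = A + \bar{d}$.

    Example #1

    Consider the following table. We are to calculate the mean rainfall over seven days in the monsoon season.

    Day      Rainfall x_i (mm)    Deviation d_i = x_i - A (mm)
    1        250                  50
    2        240                  40
    3        190                  -10
    4        254                  54
    5        225                  25
    6        232                  32
    7        170                  -30
    Total    1561                 161

    Here the assumed mean is $A = 200$ mm and $n = 7$.

    $\bar{d} = 161/7 = 23$ mm. The actual mean, $\bar{x} = A + \bar{d} = 200 + 23 = 223$ mm.
    Also, verify by direct calculation: $\bar{x} = 1561/7 = 223$ mm.

    Example #2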

    Calculate the mean of the following data with the help of assumed mean method .

    Class interval    x_i    f_i    d_i = x_i - A    f_i d_i
    10-20             45     4      5                20
    20-30             35     5      -5               -25
    30-40             48     3      8                24
    40-50             43     2      3                6
    50-60             40     1      0                0
    60-70             37     1      -3               -3
    70-80             39     4      -1               -4
    Total                    20                      18

    Assumed mean, $A = 40$, and number of data, $N = \sum_i f_i = 20$.
    Mean of deviation, $\bar{d} = \frac{1}{N}\sum_i f_i d_i = 18/20 = 0.9$

    The actual mean, $\bar{x} = A + \bar{d} = 40 + 0.9 = 40.9$.
    We can also check this from direct calculation, $\bar{x} = \frac{1}{N}\sum_i f_i x_i = 818/20 = 40.9$.
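    The assumed-mean method of the two examples can be sketched in Python as below. The assumed means 200 and 40 are read off from the deviation columns of the tables (for instance 250 has deviation 50); the function name `assumed_mean` is a hypothetical helper introduced here.

```python
def assumed_mean(values, assumed, freqs=None):
    """Mean via the short-cut method: A + (mean of the deviations from A)."""
    if freqs is None:
        freqs = [1] * len(values)
    deviations = [x - assumed for x in values]  # d_i = x_i - A
    mean_dev = sum(f * d for f, d in zip(freqs, deviations)) / sum(freqs)
    return assumed + mean_dev

# Example #1: rainfall over seven days, assumed mean A = 200 mm.
rain = [250, 240, 190, 254, 225, 232, 170]
rain_mean = assumed_mean(rain, assumed=200)

# Example #2: values with frequencies, assumed mean A = 40.
x = [45, 35, 48, 43, 40, 37, 39]
f = [4, 5, 3, 2, 1, 1, 4]
table_mean = assumed_mean(x, assumed=40, freqs=f)

print(rain_mean, sum(rain) / len(rain), table_mean)
```

    Any other choice of the assumed mean gives the same final answer; a value close to the data only keeps the deviations small for hand calculation.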

    Median:

    Median is the data point in the middle when the data set is arranged in ascending or descending order.

    Example #1

    9, 12, 6, 1, 11

    After ordering, 1, 6, 9, 11, 12

    Median = 9.

    If the data set has an even number of entries, the median is the mean of the two data points at the middle after the ordering.

    Example #2

    9, 12, 6, 1, 11, 13

    After ordering, 1, 6, 9, 11, 12, 13

    Median $= \frac{9 + 11}{2} = 10$.


    Mode:

    Mode is the data value which has the maximum frequency. This means this value occurs the maximum number of times in the data set.

    Example:

    0, 2, 5, 9, 3, 2, 6, 2, 3, 5, 4, 2, 1

    In the above data set the number 2 occurs maximum times. Mode = 2.
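    The median and mode rules described above can be sketched in Python as follows. The function names are hypothetical helpers; the data sets are the ones used in the examples.

```python
from collections import Counter

def median(xs):
    s = sorted(xs)          # rank-order the data first
    mid = len(s) // 2
    if len(s) % 2 == 1:     # odd count: the single middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even count: mean of the two middle values

def mode(xs):
    # most_common(1) gives [(value, count)] for the most frequent value
    return Counter(xs).most_common(1)[0][0]

print(median([9, 12, 6, 1, 11]))
print(median([9, 12, 6, 1, 11, 13]))
print(mode([0, 2, 5, 9, 3, 2, 6, 2, 3, 5, 4, 2, 1]))
```

    If two values tie for the highest frequency, this sketch simply returns the one that appears first in the data.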

    For many data sets (in particular, moderately skewed ones with a single peak), the following approximate empirical formula holds:

    $\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$

    Variance, Standard Deviation:

    It is useful to know how the numbers or values are scattered around the mean value ($\bar{x}$). Some of the members may be bigger and some smaller than the mean. The difference $(x_i - \bar{x})$ can thus be negative or positive; its square, $(x_i - \bar{x})^2$, is never negative and so measures the distance of a number from the mean value irrespective of sign. The mean or average of all such terms is defined as the variance ($\sigma^2$):

    $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$

    [Note: the Greek symbol $\sigma$ (sigma) is traditionally used for standard deviation (s.d.); the variance is just the square of s.d.]


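    So the variance is the mean of the square of the deviation from the mean value.

    For a set of data $x_1, x_2, \dots, x_n$ occurring with frequencies $f_1, f_2, \dots, f_n$, the above formula for variance is

    $\sigma^2 = \frac{1}{N}\sum_{i} f_i (x_i - \bar{x})^2$,

    where $N = \sum_i f_i$ as before.

    It can be shown that the expression for variance can also be written as

    Variance = [mean of square - square of mean], i.e. $\sigma^2 = \overline{x^2} - \bar{x}^2$.

    Standard deviation is just the square root of variance:

    Standard deviation $= \sigma = \sqrt{\text{Variance}}$.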
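    As a quick numerical check, the sketch below computes the variance both from the definition and from the "mean of square minus square of mean" form. The data reused here are the Ludo frequencies from the mean section; the function names are hypothetical.

```python
import math

def variance(values, freqs):
    """Mean of the squared deviations from the mean, with frequencies."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n

def variance_shortcut(values, freqs):
    """Mean of square minus square of mean."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    mean_sq = sum(f * x * x for x, f in zip(values, freqs)) / n
    return mean_sq - mean ** 2

faces = [1, 2, 3, 4, 5, 6]
counts = [2, 5, 2, 6, 4, 1]          # Ludo frequencies from the earlier example

var_direct = variance(faces, counts)
var_short = variance_shortcut(faces, counts)
std_dev = math.sqrt(var_direct)      # standard deviation

print(var_direct, var_short, std_dev)
```

    Both routes give the same number up to floating-point rounding; the second is usually quicker by hand because the mean is subtracted only once.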

    Measures of Position:

    It is often important to classify data!

    In statistical data analysis, we often like to measure the position of a data point relative to other values in the set. For example, we like to know the rank or position of a student relative to others in a certain examination.

    The measures are done for rank-ordered data, where the elements in the data set are arranged in ascending order (from the smallest to the largest).

    The following are the most common measures of position for rank-ordered data: percentiles, quartiles and deciles.


    Percentiles:

    A percentile is the value of a variable below which a certain percent of observations fall. For example, the 90th percentile is the value (or score) below which 90% of the data are to be found.

    Suppose we have $n$ values. How is the percentile calculated?

    1. First the data are rank-ordered (arranged in ascending order).
    2. To calculate the $P$-th percentile we have to find the rank: $r = \frac{P}{100} \times n + \frac{1}{2}$
    3. Round off the above rank to the nearest integer and then take the value corresponding to that integer rank.

    Example:

    Given the numbers 2, 5, 4, 9, 8, 1

    Rank-ordered set: 1, 2, 4, 5, 8, 9

    The rank of the 60th percentile, $r = \frac{60}{100} \times 6 + \frac{1}{2} = 4.1 \approx 4$ (rounded off to the nearest integer)

    The 60th percentile is 5 (the 4th member in the ordered list).

    Note: The 100th percentile is defined to be the largest value in the given data set.
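    The three steps can be sketched in Python as below. This assumes the nearest-rank convention with rank $r = \frac{P}{100} n + \frac{1}{2}$, which is one common choice and reproduces the worked example; other textbooks use slightly different rank formulas. The function name is hypothetical.

```python
import math

def percentile_nearest_rank(data, p):
    """P-th percentile by the nearest-rank method (assumed convention)."""
    ordered = sorted(data)                   # step 1: rank-order the data
    n = len(ordered)
    if p >= 100:
        return ordered[-1]                   # 100th percentile: largest value
    r = p / 100 * n + 0.5                    # step 2: the rank
    rank = math.floor(r + 0.5)               # step 3: round to nearest integer
    rank = max(1, min(n, rank))              # keep the rank inside 1..n
    return ordered[rank - 1]

numbers = [2, 5, 4, 9, 8, 1]
p60 = percentile_nearest_rank(numbers, 60)
p100 = percentile_nearest_rank(numbers, 100)
print(p60, p100)
```

    Replacing 100 by 10 in the rank formula gives the corresponding decile, in line with the remark on deciles in the text.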

    Quartiles:

    A quartile is one of the three points that divide a rank-ordered data set into four equal groups.

    First quartile ($Q_1$): Cuts off the lowest 25% of data = 25th percentile

    Second quartile ($Q_2$): Divides the data set into half = 50th percentile

    Third quartile ($Q_3$): Cuts off the lowest 75% (or highest 25%) of data = 75th percentile

    Inter-quartile range = upper quartile - lower quartile $= Q_3 - Q_1$

    Note: The 50th percentile = Median

    Deciles:

    Like percentiles, deciles are calculated to find the position of data out of 10 (instead of 100). So all we have to do is to replace 100 by 10 in the above percentile formula.


    Chapter II

    Concept of Probability, the Probability Rules and Applications

    For randomly occurring events, we would like to know how many times we get a desired result out of all trials. This means we would like to know the fraction of favourable events or trials. Suppose we flip a coin a number of times. We know there is a 50-50 chance of getting a Head or a Tail. We may count how many times there is a Head or a Tail out of all the flips.

    Let,

    $m$ = No. of favourable events and $N$ = Total no. of events.

    $m/N$ = fraction of favourable events. We can also say this is the relative frequency in the usual language of Statistics.

    [Cartoon source: www1.free-clipart.net]

    Now, if we do the trials a large number of times, this fraction tends to some fixed value specific to the event. The limiting value of the fraction is what we call probability.

    Note: The total number of trials is also called the sample size when we are drawing samples out of the total population. As the number of trials is increased, the sample becomes bigger.
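    The idea that the fraction of favourable events settles down as the trials increase can be illustrated with a small simulation. This is only a sketch: the coin is simulated with Python's pseudo-random generator, the seed is fixed so that the run is repeatable, and the function name is hypothetical.

```python
import random

def relative_frequency_of_heads(num_flips, seed=0):
    rng = random.Random(seed)  # fixed seed: the same flips on every run
    heads = sum(rng.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips   # fraction of favourable events

# Relative frequency of Heads for an increasing number of flips.
freq_by_n = {n: relative_frequency_of_heads(n) for n in (10, 100, 10_000, 100_000)}

for n, freq in freq_by_n.items():
    print(n, freq)
```

    For a small number of flips the fraction can be well away from one half; for the largest run it should sit close to it.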

    50-50 proposition!

    While lecturing on probability at Warwick University one day in October 1972, Jeffrey Hamilton, demonstrating the effect of chance, took a coin from his pocket and casually tossed it in the air. The probability that the coin would land face up (heads) was exactly the same as the probability that it would land face down (tails); it was, Hamilton explained, a 50-50 proposition.

    Hamilton and the assembled students then watched as it hit the floor, bounced, rolled, spun around and came to rest on its edge. After a stunned silence, the entire room broke into wild applause!

    Source: www.anecdotage.com


    Definition of Probability:

    Probability is the ratio of the number of favourable events to the total number of events, provided the total number of events is very large (actually infinity).

    $p = \frac{m}{N}$, when $N \to \infty$ (infinity), where $m$ is the number of favourable events and $N$ is the total number of events.

    So by definition, $p$ is a fraction between 0 and 1: $0 \le p \le 1$.

    $p = 0$: No favourable outcome.

    $p = 1$: All the outcomes are in favour.

    We can also think in the following way: $p$ = probability of the event occurring, $q$ = probability of the event not occurring. Since either the event will occur or it will not, we must write:

    $p + q = 1$

    Therefore, we have $q = 1 - p$.

    Example #1:

    In a coin toss, we know from our experience, $P(H) = \frac{1}{2}$ and $P(T) = 1 - P(H) = \frac{1}{2}$. So, $P(H) + P(T) = 1$.

    Example #2:

    In a throw of a die, we know that the probabilities of the die facing 1 up, 2 up, 3 up etc. will be $P(1) = \frac{1}{6}$, $P(2) = \frac{1}{6}$, $P(3) = \frac{1}{6}$ and so on.

    Here, $P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1$.

    Probability of 1 not occurring is $1 - P(1) = \frac{5}{6}$.

    Note: The condition that the total probability of all the events has to be 1 is called normalization of probabilities: $\sum_i p_i = 1$.

    Rules of Probability:


    When more than one event takes place, we need to calculate the joint probability for all the events.

    Mutually Exclusive Events

    Two events are mutually exclusive (or disjoint) when they cannot occur at the same time. Suppose the two events are A and B and the individual probabilities for them are designated as $P(A)$ and $P(B)$. Mutually exclusive means

    $P(A \text{ and } B) = 0$.

    Addition Rule:

    $P(A \text{ or } B) = P(A) + P(B)$

    Example #1: The probability of getting either a Head or a Tail in a coin toss,

    $P(H \text{ or } T) = P(H) + P(T) = \frac{1}{2} + \frac{1}{2} = 1$.

    Example #2: The probability of getting either 1 or 6 in the throw of a die,

    $P(1 \text{ or } 6) = P(1) + P(6) = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$.

    Independent Events

    When the occurrence of one event does not influence the other, but they can occur at the same time, they are called independent. For example, rainfall today and Manchester United winning a match.

    Multiplication Rule:

    $P(A \text{ and } B) = P(A) \times P(B)$

    Example #1:
    What is the probability that two Heads will occur when we toss two coins together?

    $P(H) = \frac{1}{2}$ for the first coin and $P(H) = \frac{1}{2}$ for the second coin.

    $P(H \text{ and } H) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

    Note that if we flip a single coin two times and ask for the probability of getting Heads twice, we get the same answer.


    Example #2:
    Now we ask the question, what is the probability of getting one Head and one Tail in the flipping of two coins together?

    Consider the probability of obtaining Head in the first coin and Tail in the second coin:

    $P(H \text{ and } T) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

    And the probability of obtaining Tail in the first and Head in the second:

    $P(T \text{ and } H) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$.

    Now the total probability of the above two events (either of them occurs, mutually exclusively):

    $\frac{1}{4} + \frac{1}{4} = \frac{1}{2}$.

    Note that in the flipping of two coins together, there are 4 types of events: HH, HT, TH, TT. Out of these, the relative occurrence of one Head and one Tail is 2/4 = 1/2.
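    The two-coin results can be confirmed by listing all equally likely outcomes and counting. The sketch below does exactly that with exact fractions; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping two coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
p_each = Fraction(1, len(outcomes))

# Multiplication rule case: Head on both coins.
p_two_heads = sum(p_each for o in outcomes if o == ("H", "H"))

# Addition rule over the two mutually exclusive cases HT and TH.
p_one_each = sum(p_each for o in outcomes if sorted(o) == ["H", "T"])

print(p_two_heads, p_one_each)
```

    Counting outcomes in this way works only because the four outcomes are equally likely.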

    When Events are NOT Mutually Exclusive:

    If the events are not mutually exclusive, there is some overlap. Suppose we designate an area A corresponding to the probability of some event A and an area B to the probability of another event B. The overlap between the two areas then represents the joint probability, $P(A \text{ and } B)$. Note that for two mutually exclusive events the overlap would be zero.

    Addition Rule in this case:

    $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$

    When Events are NOT Independent:

    Multiplication rule:

    $P(A \text{ and } B) = P(A) \times P(B|A)$

    $P(B|A)$: The probability of B given A. This is a conditional probability, i.e., the probability of B occurring provided A occurs first.


    Similarly, $P(A|B)$: The probability of A, given B. Note here that

    $P(B|A) = P(B)$, when B does not depend on A, which means A and B are independent.
    $P(A|B) = P(A)$, when A does not depend on B, which means A and B are independent.

    So, we can write the formula for conditional probability:

    $P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$

    Let us consider the following table and use the probability rules.

    In a survey of 100 people, each person was asked whether he or she is a graduate or not.

              Graduate    Non-graduate    Total
    Male      40          20              60
    Female    10          30              40
    Total     50          50              100

    An interesting anecdote:

    On the occasion of his receiving a second Nobel Prize, the legendary scientist Linus Pauling* remarked that while the chance of any person in the world receiving his first Nobel Prize is one in several billions (consider the population of the world), the chance of receiving the second Nobel Prize is one in several hundreds (the total number of living people who received the prize in the past) and it was therefore less remarkable to receive one's second prize than one's first!

    *Prof. Linus Pauling received his first Nobel Prize in Chemistry and the 2nd one in Peace.


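    Q.1 What is the probability that a randomly selected person is a male?

    Ans. $P(\text{male}) = 60/100 = 0.6$

    Q.2 What is the probability that a randomly selected person is a female?

    Ans. $P(\text{female}) = 40/100 = 0.4$

    Q.3 What is the probability that a randomly selected person is a male who is a graduate?

    Ans. $P(\text{male and graduate}) = 40/100 = 0.4$

    [Also we can think, $P(\text{male and graduate}) = P(\text{male}) \times P(\text{graduate}|\text{male}) = \frac{60}{100} \times \frac{40}{60} = 0.4$]

    Q.4 What is the probability that a randomly selected person is a female who is a non-graduate?

    Ans. $P(\text{female and non-graduate}) = 30/100 = 0.3$

    [Also, $P(\text{female}) \times P(\text{non-graduate}|\text{female}) = \frac{40}{100} \times \frac{30}{40} = 0.3$]

    Q.5 What is the probability that the randomly selected person is either a male graduate or a female non-graduate?

    Ans. These two events are mutually exclusive and by the law of addition, $0.4 + 0.3 = 0.7$.

    Q.6 If we now select two persons, what is the probability that one of them is a male graduate and another is a female non-graduate?

    Ans. Two independent events are occurring together. So by the law of multiplication of probabilities, $0.4 \times 0.3 = 0.12$ (taking the first person selected to be the male graduate and the second to be the female non-graduate).

    Q.7 What is the probability that a randomly selected non-graduate is a female? [Prob. of female among non-graduates]

    Ans. This is the no. of females out of total non-graduates, $30/50 = 0.6$.

    Q.8 What is the probability that a randomly selected graduate is a male?

    Ans. This is the no. of males out of total graduates, $40/50 = 0.8$.

    Note: In Q.7 & 8, each probability is a conditional probability. However, we gave the answers by looking at the table directly. Now we answer them in terms of the law of conditional probability.

    Ans. to Q.8: Suppose A = graduate, B = male, $P(B|A)$ = probability of male given that they are graduates. We use the formula:

    $P(B|A) = \frac{P(A \text{ and } B)}{P(A)}$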


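    Here, $P(A \text{ and } B)$ = prob. of male graduates $= 40/100$, $P(A)$ = prob. of graduates $= 50/100$, so $P(B|A) = \frac{40/100}{50/100} = 0.8$.

    Exercise: Q.7 can also be answered in terms of the conditional probability formula. Do this and check yourself.

    Q.9 What is the probability that the selected person is either male or graduate?

    Ans. Here the two events are not mutually exclusive (a person can be both male and a graduate). So we use the formula:

    $P(\text{male or graduate}) = P(\text{male}) + P(\text{graduate}) - P(\text{male and graduate}) = 0.6 + 0.5 - 0.4 = 0.7$.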
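    The survey table lends itself to a short Python check of the addition rule and of conditional probability. This is a sketch only: the dictionary layout and the helper `prob` are assumptions made here, while the four counts are those of the table.

```python
from fractions import Fraction

# Counts from the survey table of 100 people.
counts = {
    ("male", "graduate"): 40,
    ("male", "non-graduate"): 20,
    ("female", "graduate"): 10,
    ("female", "non-graduate"): 30,
}
total = sum(counts.values())

def prob(sex=None, degree=None):
    """P(sex and degree); an argument left as None is summed over."""
    n = sum(c for (s, d), c in counts.items()
            if (sex is None or s == sex) and (degree is None or d == degree))
    return Fraction(n, total)

p_male = prob(sex="male")
p_grad = prob(degree="graduate")
p_male_grad = prob("male", "graduate")

# Conditional probability: P(male | graduate) = P(male and graduate) / P(graduate)
p_male_given_grad = p_male_grad / p_grad

# Addition rule for events that are not mutually exclusive.
p_male_or_grad = p_male + p_grad - p_male_grad

print(p_male, p_male_grad, p_male_given_grad, p_male_or_grad)
```

    The same helper can be used for the exercise on Q.7 by dividing the joint probability of "female and non-graduate" by the probability of the conditioning event.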

    Chapter III

    Probability Distributions

    Let us think of the probabilities $p_1, p_2, p_3, \dots$ for a number of events marked 1, 2, 3, ... and so on. For each event we have $0 \le p_i \le 1$ and also, for all the events, $\sum_i p_i = 1$ (normalization).

    So, we have a set of probabilities corresponding to a set of events. This collection of probabilities is a probability distribution for all those discrete events.

    Imagine, instead of discrete events, we have $x$ as a variable which can have continuous values. Also, there is the probability $p(x)$ for each value of $x$. Now if we plot $p(x)$ against $x$, we get a continuous curve, which is the continuous probability distribution curve (commonly referred to as the probability distribution curve).

    Fig. 3.1


    The area under the curve (above the x-axis) can be obtained by summing up the areas of approximate rectangular bars (which we may easily find by plotting the curve on graph paper). The approximate area of one such bar of width $\Delta x$ and height $p(x)$ is $p(x)\,\Delta x$. So, the approximate total area between the two end points $a$ and $b$ is $\sum p(x)\,\Delta x$.

    To calculate the area exactly, we need the help of Integral Calculus, which essentially sums up the areas of rectangles (bars) of infinitesimally small width (smaller than the smallest you can think of).

    Those not familiar with Calculus do not have to worry, as the following explanation and symbols can be understood qualitatively, which may serve the purpose for now.

    The area under the curve (between the two extreme points shown in the above figure) is the following definite integral:

    Area $= \int_a^b p(x)\,dx$.

    This is the total probability for all the values between the two limits. That is why $p(x)$ is often referred to as the probability density. So $p(x)\,dx$ is the actual probability of finding the variable between $x$ and $x + dx$, where $dx$ is an infinitesimally small range. Note that the area of the bar of height $p(x)$ and width $dx$ at some position $x$ is $p(x)\,dx$. As in the discrete case, the area is the sum of the probabilities of all the mutually exclusive events.

    [The sum $\sum$ (called sigma) in the discrete case becomes $\int$ (called integral) for the continuous case.]

    Also,

    $\int_{-\infty}^{+\infty} p(x)\,dx = 1$ (Normalization)

    Normalization means that the total area under the curve (extended from negative infinity to positive infinity, that is, over the entire stretch of the curve) is unity. This matches the discrete case, where we know that the sum of the probabilities for all the events should be 1.

    For discrete events, we calculated the relative frequencies and then drew the bar diagram from them. Here, for the continuous case, the bars merge together to form a continuous spectrum, and that is the probability distribution. The relative frequencies tend to the probabilities for the corresponding values of the variable for a large number of events.
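    The "sum of thin bars" picture can be tried out numerically. In the sketch below the standard Normal density is used only as a convenient example of $p(x)$; that choice, the end points and the function names are assumptions made for illustration.

```python
import math

def normal_density(x, mu=0.0, sigma=1.0):
    """Example probability density p(x): the Normal (Gaussian) curve."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_by_bars(p, a, b, num_bars):
    """Approximate area under p between a and b as a sum of bars p(x) * dx."""
    width = (b - a) / num_bars
    # The height of each bar is taken at the midpoint of its base.
    return sum(p(a + (k + 0.5) * width) * width for k in range(num_bars))

coarse = area_by_bars(normal_density, -1.0, 1.0, 4)          # a few wide bars
fine = area_by_bars(normal_density, -1.0, 1.0, 1000)         # many thin bars
whole = area_by_bars(normal_density, -10.0, 10.0, 10_000)    # almost the entire curve

print(coarse, fine, whole)
```

    As the bars get thinner the sum approaches the value of the integral, and the sum over (nearly) the whole curve approaches 1, which is the normalization condition.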


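    Now, given the probability distribution curve, we would like to know about the shape and size of the curve, and some specific quantities that are representative of the character of the event.

    From a discrete data set to a continuous Prob. distribution:

    For any discrete set of data, we measure the central tendency of the data set. We commonly calculate the mean, the mean of square and the variance.

    Mean:

    $\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots}{N} = \frac{1}{N}\sum_i f_i x_i$,

    where $f_i$ is the frequency of occurrence for event $i$ and we have total frequency $N = \sum_i f_i$.

    [Note: relative frequency $= f_i / N$]

    Mean of Square:

    $\overline{x^2} = \frac{1}{N}\sum_i f_i x_i^2$

    Variance:

    Var($x$) $= \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2 = \frac{1}{N}\sum_i f_i x_i^2 - 2\bar{x}\,\frac{1}{N}\sum_i f_i x_i + \bar{x}^2 = \overline{x^2} - \bar{x}^2$ = [mean of square - square of mean]

    Standard deviation is the square root of the variance.

    Now for a large number of events, each of the ratios in the above formulas becomes the corresponding probability:

    $\frac{f_i}{N} \to p_i$ as $N$ tends to very large.

    Therefore, we write the above quantities in terms of probabilities:

    $\bar{x} = \sum_i p_i x_i$, $\overline{x^2} = \sum_i p_i x_i^2$, Var($x$) $= \sum_i p_i (x_i - \bar{x})^2 = \overline{x^2} - \bar{x}^2$.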


    If the probabilities $p_1$, $p_2$, etc. are known for the values $x_1$, $x_2$, and so on, we can say that we have a discrete probability distribution. When the values are so closely spaced that we can have probabilities for all possible continuous values of the variable $x$, we can say that there is a function $p(x)$ of $x$, which is called the continuous probability distribution function.

    [Note: However, in a practical calculation, when instead of probabilities we are given the frequencies $f_i$ for the quantities $x_i$ that appear in a data set, we calculate the mean or average: $\bar{x} = \frac{\sum_i f_i x_i}{\sum_i f_i}$.]

    Expectation Values:

    When the probability distribution (no matter discrete or continuous) for some event or some population is known, we may expect what its mean value would be, either through mathematical calculations or through our experience.

    [In Statistics, population means the entire set of all possible data. Taking a few data (which we call a sample) from the population, we often try to estimate the mean; this estimate generally differs from the population mean. But we know that, with larger and larger sample size, this mean (which we call the sample mean) should tend towards the population mean. This means we expect the population mean. More on this aspect will be discussed in the chapter on Sampling.]

    So, the expectation value, $E(x)$, is the mean of $x$. Likewise, we can have the expectation value of any power of $x$.

    Mean, $E(x) = \bar{x}$
    Mean of Square, $E(x^2) = \overline{x^2}$
    Variance, $\text{Var}(x) = E(x^2) - [E(x)]^2$
    Standard deviation $= \sqrt{\text{Var}(x)}$


    Combination Rules:

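    When we scale a variable, that is, we multiply a variable by a number or add a number to it, we need to know how the scaled variable behaves. Does it have the same statistical measures? Does it follow the same kind of distribution? We also ask the same questions for two or more variables when they are scaled and added together to form a combined variable.

    Mean: $E(x) = \sum_i p_i x_i$, or $E(x) = \int x\,p(x)\,dx$ [Continuous case]

    Mean of square: $E(x^2) = \sum_i p_i x_i^2$, or $E(x^2) = \int x^2\,p(x)\,dx$ [Continuous case]

    Variance: $\text{Var}(x) = E(x^2) - [E(x)]^2$

    When $y = ax$ (a constant $a$ multiplies $x$):
    Mean: $E(y) = a\,E(x)$    Variance: $\text{Var}(y) = a^2\,\text{Var}(x)$

    When $y = ax + b$ ($a$ and $b$ constants):
    Mean: $E(y) = a\,E(x) + b$    Variance: $\text{Var}(y) = a^2\,\text{Var}(x)$
    If $x$ has a Normal distribution, $y$ also has a Normal distribution.

    When $z = x + y$ ($x$ and $y$ independent):
    Mean: $E(z) = E(x) + E(y)$    Variance: $\text{Var}(z) = \text{Var}(x) + \text{Var}(y)$
    If $x$ and $y$ separately have Normal distributions, $z$ then also has a Normal distribution.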
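    The standard rules for the mean and variance of a scaled variable and of a sum of independent variables can be verified exactly on a small discrete distribution. Everything specific in the sketch below (the distribution `dist`, the constants `a` and `b`, the function names) is a hypothetical choice made only for this check.

```python
from fractions import Fraction
from itertools import product

# A small hypothetical discrete distribution: value -> probability.
dist = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

def mean(d):
    return sum(p * x for x, p in d.items())

def var(d):
    m = mean(d)
    return sum(p * x * x for x, p in d.items()) - m * m  # E(x^2) - [E(x)]^2

# Scaling: y = a*x + b has the same probabilities attached to the new values.
a, b = 2, 3
scaled = {a * x + b: p for x, p in dist.items()}

# Sum of two independent copies: z = x1 + x2, joint probabilities multiply.
summed = {}
for (x1, p1), (x2, p2) in product(dist.items(), repeat=2):
    summed[x1 + x2] = summed.get(x1 + x2, 0) + p1 * p2

print(mean(dist), var(dist))
print(mean(scaled), var(scaled))
print(mean(summed), var(summed))
```

    Adding the constant shifts the mean but leaves the variance untouched, while multiplying by the constant scales the variance by its square.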


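    Following the combination rules in the above box, we can solve the following problem.

    Example:

    The weight of individual people follows a Normal distribution, $N(\mu, \sigma^2)$. What will be the probability distribution of the weight of 10 people taken together?

    Ans. Here, each person's weight has mean $\mu$ and variance $\sigma^2$, and the weights are taken to be independent.
    Mean weight of 10 people, $\mu + \mu + \dots = 10\mu = 40$
    Variance, $\sigma^2 + \sigma^2 + \dots = 10\sigma^2 = 500$

    The probability distribution of the weight of 10 people taken together is Normal, $N(40, 500)$.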

    Normal Distribution:

    For many naturally occurring quantities, and for random errors of measurement in an experiment, the distribution that occurs is, to a good approximation, the Normal distribution. The bell-shaped symmetric curve is called the Normal curve. If we calculate the distribution of height, age or IQ level among a population, the probability distribution often turns out to be close to Normal. The name "normal" is given as it occurs normally. In Mathematics or Physics literature, it is also called the Gaussian distribution after the great mathematician Carl Friedrich Gauss.

    Properties of Normal Distribution:

    A bell-shaped symmetric distribution with the peak at the middle. The distribution curve extends from $-\infty$ to $+\infty$ [from minus infinity to plus infinity].

    Mean, Mode and Median are at the same position (at the peak).


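    Area under the curve: Total area under the curve = 100%

    A = 68% for $\mu - \sigma \le x \le \mu + \sigma$
    [Area within one standard deviation ($\sigma$) from the mean ($\mu$) on both sides]

    A = 95% for $\mu - 2\sigma \le x \le \mu + 2\sigma$

    A = 99.7% for $\mu - 3\sigma \le x \le \mu + 3\sigma$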
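    The three areas quoted above can be reproduced from the error function, since for a Normal curve the area within $k$ standard deviations of the mean is $\text{erf}(k/\sqrt{2})$. The sketch below uses `math.erf` from the Python standard library; the function name `area_within` is hypothetical.

```python
import math

def area_within(k):
    """Area under a Normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

areas = {k: area_within(k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(k, round(100 * area, 2), "%")
```

    The printed percentages are the more precise versions of the rounded 68%, 95% and 99.7% figures.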

    The Normal distribution is the most commonly observed and the most widely used and discussed distribution. There are various other kinds of distributions, which can be identified by their shapes and mathematical expressions.

    NOTE:

    If we combine a set of Normal distributions, we get a Normal distribution as a result. Consider some $n$ numbers ($x_1, x_2, \dots, x_n$), each of which is drawn independently from a Normal distribution. Calculate the mean of the numbers: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. If we draw $n$ numbers again and again, the mean would be different each time, but the mean would follow a Normal distribution. More interestingly, the individual distributions from which the numbers are drawn do not matter (as long as they have a finite variance): provided the number $n$ is sufficiently large, the distribution of the mean always turns out to be close to a Normal distribution. This is the Central Limit Theorem.
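    A small simulation can make the Central Limit Theorem plausible. The sketch below draws numbers from a fair die (a flat, clearly non-Normal distribution), repeats the draw many times and looks at the means; the sample size, number of repeats, seed and function name are all arbitrary assumptions.

```python
import random
import statistics

def sample_means(n, repeats, seed=1):
    """Means of `repeats` samples, each of n fair-die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.randint(1, 6) for _ in range(n))
            for _ in range(repeats)]

means = sample_means(n=30, repeats=2000)

centre = statistics.fmean(means)       # should be near the die mean, 3.5
spread = statistics.pvariance(means)   # should be near (35/12) / 30

# Fraction of sample means within one standard deviation of the centre;
# for a Normal distribution this is about 68%.
sd = spread ** 0.5
within_one_sd = sum(abs(m - centre) <= sd for m in means) / len(means)

print(centre, spread, within_one_sd)
```

    The variance of the means shrinks in proportion to 1/n, which is why the distribution of the sample mean becomes narrower as the sample size grows.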


    Experiment with rolling dice:

    So, here we roll dice, calculate the probabilities of the numbers that occur and try to establish some truth!


    Example #1 Throwing of a single die:

    The chance of any side turning up is equal, which is 1 out of 6. We take these as the a priori probabilities for each case and find the mean and variance from the following table.

    x           1      2      3      4       5       6       Total
    p(x)        1/6    1/6    1/6    1/6     1/6     1/6     1
    x p(x)      1/6    2/6    3/6    4/6     5/6     6/6     21/6
    x^2 p(x)    1/6    4/6    9/6    16/6    25/6    36/6    91/6

    From the table, we can calculate the mean, $\bar{x} = \sum x\,p(x) = 21/6 = 3.5$, and the variance, $\sigma^2 = \sum x^2 p(x) - \bar{x}^2 = 91/6 - (21/6)^2 = 35/12 \approx 2.92$.

    If we plot $p(x)$ against $x$, we obtain the probability distribution for this case. This distribution is uninteresting, as we can check that the probabilities for all values of $x$ are the same! The curve obtained by joining the points will be a horizontal straight line.

    Fig.

    Did you know?

    Gambling is one of the earliest forms of entertainment in human history. Roman soldiers regularly played dice games during their campaigns and the New Testament says that Roman soldiers guarding Jesus' cross were tossing dice to see who would get his clothes.

    Around 1000 AD, King Olaf of Norway and King Olaf of Sweden resolved a difference of opinion on territory by rolling dice.

    Source: New Statesman magazine, 23 April, 2012


    Now we do a similar experiment taking two dice together.

    Example #2 (Two Dice)

    We look for the value of $x$, which is the sum of the two numbers on the top faces of the two dice as rolled.

    Here we shall have $6 \times 6 = 36$ possible combinations of events, and $x$ can have a minimum value, $x_{min} = 2$, and a maximum value, $x_{max} = 12$.

    $x$           2      3       4       5        6        7        8        9        10       11       12       Total
    $P(x)$        1/36   2/36    3/36    4/36     5/36     6/36     5/36     4/36     3/36     2/36     1/36     1
    $x\,P(x)$     2/36   6/36    12/36   20/36    30/36    42/36    40/36    36/36    30/36    22/36    12/36    252/36
    $x^2 P(x)$    4/36   18/36   48/36   100/36   180/36   294/36   320/36   324/36   300/36   242/36   144/36   1974/36

    Mean, $\mu = 252/36 = 7$, Variance, $\sigma^2 = 1974/36 - 7^2 \approx 5.83$

    Now if we plot $P(x)$ against $x$, taking the values from the above table, we get an interesting symmetric distribution around a peak! The peak is at $x = 7$ (the mean value).

    The distribution shows a peak at the middle and it is symmetric!

    We can go on doing such experiments taking 3 or more dice together and ask for the sum of values and the corresponding probabilities as above. The distribution becomes smoother and smoother, tending towards a definite shape while retaining the peak at the centre.
    [In fact, the envelope of the probability values at different $x$ (joining the tops of the height bars) of the discrete distribution will slowly assume a continuous symmetric curve!]
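The dice experiment can also be repeated on a computer. The following is a minimal sketch that enumerates all outcomes for one, two or three fair dice and computes the mean and variance of the sum; the function names are illustrative and not part of the original notes.

```python
from fractions import Fraction
from itertools import product

def dice_sum_distribution(num_dice):
    """Return {sum: probability} for the sum of `num_dice` fair six-sided dice."""
    counts = {}
    for faces in product(range(1, 7), repeat=num_dice):
        total = sum(faces)
        counts[total] = counts.get(total, 0) + 1
    outcomes = 6 ** num_dice
    return {x: Fraction(c, outcomes) for x, c in sorted(counts.items())}

def mean_and_variance(dist):
    """Mean = sum of x*P(x); variance = sum of x^2*P(x) minus mean squared."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum(x * x * p for x, p in dist.items()) - mean ** 2
    return mean, variance

for n in (1, 2, 3):
    dist = dice_sum_distribution(n)
    mean, var = mean_and_variance(dist)
    print(n, "dice: mean =", float(mean), "variance =", round(float(var), 4))
```

Exact fractions are used so that the results can be compared directly with the entries of the tables.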


    In the limit of a large number of events obtained from a large number of dice thrown together, we tend to get a continuous bell-shaped symmetric distribution.

    This is the Normal Distribution.

    Shape of a Distribution: Symmetry, Skewness, Kurtosis

    Skewness:
    A Normal distribution is symmetric around its peak. The peak corresponds to the most probable value, that is, the value for which the probability is the maximum. An interesting thing about a symmetric distribution (with a single peak) is that the mean, median and mode are at the same position.

    The skewness is any deviation from symmetry or, we can say, lack of symmetry. For a symmetric distribution, skewness is zero.

    Coefficient of skewness = (Mean $-$ Mode) / Standard deviation

    The following mathematical definition is often used to measure the skewness:

    Skew $= \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\sigma} \right)^3$,

    where $\bar{x}$ is the mean and $\sigma$ is the standard deviation of the distribution. So, we see that the skewness is a dimensionless quantity.

    Skewness can be positive or negative. A distribution with a positive value of skewness is called positively skewed, which means the tail of the distribution is more extended towards the more positive values of $x$. On the other hand, a distribution with a negative value of skewness is called negatively skewed, which means the tail is more extended towards more negative values (or lower values) of $x$.

    Below are the two figures demonstrating negative and positive skewness: the distributions are correspondingly called negatively skewed and positively skewed distributions.

    For a large number of independent random observations, the probability distribution for the mean of the observations can be shown to be a Normal distribution. This is called the Central Limit Theorem.


    (Negative Skewness: Mean < Mode) (Positive Skewness: Mean > Mode)

    Kurtosis:
    Kurtosis is another measure of the shape of the distribution. It tells us about the peakedness (what the peak looks like) or flatness of the probability distribution.

    A Normal distribution is considered as a standard (or benchmark) in this regard. So, any change of shape of the peak of a distribution (peakedness or flatness) compared to a Normal distribution is measured.

    The mathematical expression for kurtosis:

    Kurt $= \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\sigma} \right)^4 - 3$,

    where $\bar{x}$ is the mean and $\sigma$ the standard deviation. Note that the number 3 is subtracted from the expression so as to make the value of kurtosis for a Normal distribution equal to zero. It can be shown that Kurt = 0 for a Normal distribution.

    When kurtosis is positive, the peak of the distribution appears sharper relative to a Normal distribution. The distribution is then called leptokurtic. On the other hand, when the kurtosis is negative, the distribution looks flatter compared to a Normal distribution. As the distribution looks almost flat on top, it is called platykurtic. A distribution whose kurtosis is zero, like the Normal distribution, is called mesokurtic.
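As a hedged illustration, skewness and kurtosis can be computed from a list of values with the moment definitions (third and fourth powers of the standardized deviations, with 3 subtracted for kurtosis). The function name and the two small data sets below are made up for this sketch.

```python
import math

def moments_shape(data):
    """Return (skewness, excess kurtosis) using population (1/N) moments."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    skew = sum(((x - mean) / sigma) ** 3 for x in data) / n
    kurt = sum(((x - mean) / sigma) ** 4 for x in data) / n - 3
    return skew, kurt

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_tailed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(moments_shape(symmetric))
print(moments_shape(right_tailed))
```

The symmetric set gives a skewness of essentially zero, while the set with one large value gives a positive skewness.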

    Fig.

    Platykurtic(Negative kurtosis)


    If a distribution has more than one peak

    The distribution we discussed (and shall consider throughout) is a unimodal distribution, which means a distribution that has a single mode or one peak. But in many practical cases, we can have a distribution with many peaks or many modes. For example, a distribution with two peaks (in the fig. below) is called a bimodal distribution.

    Figs.

    (Bimodal distribution )

    Chapter IV

    Z-Distribution

    What is a Z-distribution?
    A Z-distribution is nothing but a Normal distribution with the peak (mean) at zero and standard deviation equal to 1.
    The peak of a Normal distribution is generally at a finite value $\mu$ with a standard deviation $\sigma$ (say). If we consider a new variable $z = (x - \mu)/\sigma$, the given Normal distribution (of variable $x$) becomes another Normal distribution (of variable $z$) with the peak value at $z = 0$, and this is then called the Z-distribution.
    [The derivation of the Z-distribution is given in the appendix for those who are interested.]

    For solving problems with the Normal distribution, it is often advantageous to obtain a Z-distribution and then to consult a Z-table.
    In the following, we demonstrate with some examples how that is done.


    Consider the following typical situations where we have to calculate areas from the Z-distribution:

    Fig. (Total area under the curve = 1)

    Fig. (Area between $z = -\infty$ and $z = 0$ is 0.5, or area between $z = 0$ and $z = +\infty$ is 0.5, because of symmetry)

    Fig. (Area between $z = 0$ and any other value of $z$)

    Fig. (Area between two positive values of $z$ or between two negative values)

    Fig. (Area between a negative value and a positive value)

    Fig. (Area less than a negative value or greater than a positive value)


    Important:
    In the z-score table we always look for the area between zero and any other value of $z$ (as the integral is actually done that way). So, zero is always the reference point.

    Finally, the area between any two values of $z$ is obtained by adding or subtracting the scores involving zero. This will be clear from the following examples.

    Examples:
    (Some typical problems are discussed; consult the z-score table given in the appendix.)

    #1. In the Geography examination, the marks distribution is known to be Normal where the mean is 52 and the standard deviation is 15. Determine the z-scores of students receiving marks: (i) 40, (ii) 95, (iii) 52.

    Solution: Here, $\mu = 52$, $\sigma = 15$ and $z = (x - \mu)/\sigma$.

    (i) $z = (40 - 52)/15 = -0.8$
    (ii) $z = (95 - 52)/15 \approx 2.87$
    (iii) $z = (52 - 52)/15 = 0$

    So, we see the z-scores can be negative, positive or zero.

    #2. Find the area under the normal curve in each of the following cases:

    (i) Between $z = 0$ and $z = 1.2$: Area = 0.3849 from table.

    (ii) Between $z = -0.68$ and $z = 0$: Area = 0.2518

    (Note: The area is equal to the area between $z = 0$ and $z = 0.68$ as the curve is symmetric.)

    (iii) Area between $z = -0.46$ and $z = 2.21$

    Area = (area between $z = 0$ and 2.21) + (area between $z = 0$ and $-0.46$) = 0.4864 + 0.1772 = 0.6636

    (Note: The areas are added as they are on both sides of $z = 0$.)

    (iv) Area between $z = 0.81$ and $z = 1.94$

    Required area = (area between $z = 0$ and 1.94) $-$ (area between $z = 0$ and 0.81) = 0.4738 $-$ 0.2910 = 0.1828

    (Note: There is the subtraction as the two areas are on the same side of $z = 0$.)

    (v) To the left of $z = -0.6$

    Required area = 0.5 $-$ (area between $z = 0$ and $z = -0.6$) = 0.5 $-$ 0.2257 = 0.2743

    (vi) To the right of $z = -1.28$

    Required area = (area between $z = -1.28$ and $z = 0$) + 0.5 = (area between $z = 0$ and $z = 1.28$) + 0.5 = 0.3997 + 0.5 = 0.8997
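The table look-ups above can be cross-checked numerically. The sketch below computes areas under the standard Normal curve from the error function in Python's standard library instead of a printed z-table; the helper names `phi` and `area_between` are illustrative choices.

```python
import math

def phi(z):
    """Cumulative area under the standard Normal curve from -infinity up to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z1, z2):
    """Area under the standard Normal curve between two z values."""
    return abs(phi(z2) - phi(z1))

print(round(area_between(0, 1.2), 4))        # case (i)
print(round(area_between(-0.68, 0), 4))      # case (ii)
print(round(area_between(-0.46, 2.21), 4))   # case (iii)
print(round(area_between(0.81, 1.94), 4))    # case (iv)
print(round(phi(-0.6), 4))                   # case (v): area to the left
print(round(1 - phi(-1.28), 4))              # case (vi): area to the right
```

Small differences in the last decimal place from a printed table can occur because table entries are rounded.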

    #3. Among 1000 students, the mean score in the final examination is 25 and the standard deviation is 4.0. Assume the distribution is Normal. Find the following.

    (a) How many students score between 22 and 27?

    $\mu = 25$, $\sigma = 4.0$,
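A minimal sketch of how part (a) can be worked out numerically, assuming the scores are treated as a continuous Normal variable (no continuity correction is applied, which is an assumption of this sketch):

```python
from statistics import NormalDist

# Normal distribution of scores with the stated mean and standard deviation
marks = NormalDist(mu=25, sigma=4.0)

# Fraction of the area between the scores 22 and 27
fraction = marks.cdf(27) - marks.cdf(22)

# Expected number of students out of 1000
students = 1000 * fraction
print(round(fraction, 4), round(students))
```

This corresponds to the area between $z = (22 - 25)/4 = -0.75$ and $z = (27 - 25)/4 = 0.5$ on the Z-distribution.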


    Chapter V

    Binomial and Poisson Distribution

    Before we discuss the Binomial distribution, we should know certain basic mathematical operations. Those who are not familiar with some mathematical notations and rules may consult the necessary introduction given in the following Box.

    Factorial: $n! = n (n-1)(n-2) \cdots 3 \cdot 2 \cdot 1$
    For example, $4! = 4 \cdot 3 \cdot 2 \cdot 1 = 24$. Note that the factorial of negative integers has no meaning and $0! = 1$.
    Note that we can write $n! = n \, (n-1)!$

    Permutation: In how many different ways can $n$ objects be arranged among themselves? The answer is the permutation of $n$ objects, $n!$
    For example, for three objects A, B, C, the different arrangements are ABC, ACB, BCA, BAC, CAB, CBA: total 6 ways $= 3!$

    Combination: ${}^nC_r$ or $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$
    This is the number of ways $r$ objects can be selected from $n$ objects.

    For example, if we want to know in how many ways 2 students can be selected from a total of 3 students, the answer is $\frac{3!}{2!\,(3-2)!} = \frac{3!}{2!\,1!} = 3$.

    Also note for quick calculations, $\frac{n!}{0!\,n!} = 1$, $\frac{n!}{1!\,(n-1)!} = n$ and $\frac{n!}{n!\,0!} = 1$.
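These counting rules are available in Python's standard library, so small cases can be checked directly. The sketch below is only an illustration; `combinations_count` is a hypothetical helper written from the factorial formula.

```python
import math
from itertools import permutations

# Factorial: 4! = 4 * 3 * 2 * 1
print(math.factorial(4))

# Permutations of three objects A, B, C
arrangements = ["".join(p) for p in permutations("ABC")]
print(arrangements, len(arrangements))

def combinations_count(n, r):
    """Number of ways to select r objects from n, from n! / (r! (n-r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Selecting 2 students out of 3; math.comb gives the same number
print(combinations_count(3, 2), math.comb(3, 2))
```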


    Binomial Probability:

    Suppose the probability of a certain event occurring is $p$ and that of the event not occurring is $q = 1 - p$. In a total of $n$ trials, the particular event occurs $r$ times, each with probability $p$, and does not occur $(n - r)$ times, each with probability $q$. Also, we have to know which $r$ events will occur out of the total $n$ events. The number of ways we can do that is the number of combinations $= {}^nC_r$. Consider a variable $x$ which is equal to the relative frequency, $x = r/n$.

    As the events are considered independent, the joint probability will be

    $P(r) = {}^nC_r \, p^r q^{n-r}$

    The above probability is called binomial probability.

    Now consider the following table based on the binomial probability:

    $r$       0       1                            2                            ...   $n$
    $P(r)$    $q^n$   ${}^nC_1 \, p \, q^{n-1}$    ${}^nC_2 \, p^2 q^{n-2}$     ...   $p^n$

    If we add all the terms of the second row above, we get the following binomial expansion:

    $(q + p)^n = q^n + {}^nC_1 \, p \, q^{n-1} + {}^nC_2 \, p^2 q^{n-2} + \cdots + p^n$   (1)

    From the expression (1) above, we can easily check the following known algebraic formulas:

    $(q + p)^2 = q^2 + 2qp + p^2$

    $(q + p)^3 = q^3 + 3q^2 p + 3q p^2 + p^3$


    The coefficients of the terms on the right of the above can be arranged in the following triangular form, which is called Pascal's triangle:

    1 1

    1 2 1

    1 3 3 1

    1 4 6 4 1

    1 5 10 10 5 1

    1 6 15 20 15 6 1

    1 7 21 35 35 21 7 1

    1 8 28 56 70 56 28 8 1

    The Rule :

    As indicated above, a number in a row (except the rightmost and leftmost ones) is the sum of the two numbers on either side of it in the preceding row.

    So, from the 8th row of Pascal's triangle we can easily write the binomial expansion:

    $(q + p)^8 = q^8 + 8 q^7 p + 28 q^6 p^2 + 56 q^5 p^3 + 70 q^4 p^4 + 56 q^3 p^5 + 28 q^2 p^6 + 8 q p^7 + p^8$

    Remember that each term represents a binomial probability. A binomial distribution is a collection of these discrete binomial probabilities. Note: $p + q = 1$, so the sum of all the terms is 1.
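The rule for building the triangle is easy to turn into a few lines of code. This is a minimal sketch; the function name `pascal_rows` is an illustrative choice, and rows are counted from 0 (the single 1 at the top).

```python
def pascal_rows(max_n):
    """Return rows 0..max_n of Pascal's triangle as lists of integers."""
    rows = [[1]]
    for _ in range(max_n):
        prev = rows[-1]
        # each inner number is the sum of the two numbers above it
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(8)
for row in rows:
    print(row)
```

The last row printed gives the coefficients of the expansion for $n = 8$.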

    Example #1:

    Five independent shots are fired at a target. The probability of a hit from each shot is 0.4.

    Q. What is the probability that two shots will hit the target?

    Ans. Here $n = 5$, $r = 2$, $p = 0.4$, $q = 1 - p = 0.6$

    $P(2) = \frac{5!}{2!\,3!} (0.4)^2 (0.6)^3 = 0.3456$

    Q. What is the probability that there will be more than two hits?

    Ans. Prob. $= P(3) + P(4) + P(5)$

    $= \frac{5!}{3!\,2!} (0.4)^3 (0.6)^2 + \frac{5!}{4!\,1!} (0.4)^4 (0.6)^1 + \frac{5!}{5!\,0!} (0.4)^5 (0.6)^0$

    $= 0.2304 + 0.0768 + 0.01024 = 0.31744$

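    Q. What is the expectation value of the hits (that is, the mean number of hits on the target out of all five shots)?

    Ans. For this we have to calculate the probabilities $P(0)$, $P(1)$, $P(2)$, ... for the corresponding number of hits 0, 1, 2, ...

    The expectation value,

    $\langle r \rangle = \sum_{r=0}^{5} r \, P(r)$

    $= 0 \times P(0) + 1 \times P(1) + 2 \times P(2) + 3 \times P(3) + 4 \times P(4) + 5 \times P(5)$

    $= 0.2592 + 0.6912 + 0.6912 + 0.3072 + 0.0512 = 2.0$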
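The three answers of this example can be reproduced with a short calculation. The sketch below assumes the standard binomial formula (number of combinations times $p^r q^{n-r}$); the function and variable names are illustrative.

```python
import math

def binomial_probability(n, r, p):
    """Probability of exactly r occurrences in n independent trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 5, 0.4   # five shots, probability of a hit 0.4
probs = [binomial_probability(n, r, p) for r in range(n + 1)]

p_two = probs[2]                       # exactly two hits
p_more_than_two = sum(probs[3:])       # three, four or five hits
expected_hits = sum(r * pr for r, pr in enumerate(probs))

print(round(p_two, 4), round(p_more_than_two, 5), round(expected_hits, 4))
```

The expectation value equals $n p = 5 \times 0.4$, as it should for a binomial distribution.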

    Example #2:

    Now, imagine a situation where we toss 8 coins together or we toss one coin 8 times consecutively. We measure the relative occurrence of Head in 8 trials. Let us attach values, Head = 1 and Tail = 0. So, we can think of a variable $x$ which can take values 0, 1/8, 2/8, 3/8, 4/8, ... and so on up to 1. Thus we can associate probabilities with the values of $x$ directly from Pascal's triangle (or by using the formula). Note that the probability of Head occurring is $p = 1/2$ and of Head not occurring is $q = 1/2$.

    $P(0) = 1/256$, $P(1/8) = 8/256$,

    $P(2/8) = 28/256$, $P(3/8) = 56/256$,

    $P(4/8) = 70/256$, $P(5/8) = 56/256$,

    $P(6/8) = 28/256$, $P(7/8) = 8/256$, $P(1) = 1/256$

    If we now plot $P(x)$ against $x$, we get the following symmetric discrete distribution with the peak value at $x = 4/8 = 0.5$.

    Fig.

    For a large number of trials, this distribution becomes a Normal distribution. Therefore, we can say the following:

    Binomial probability distribution for a random variable becomes Normal distribution for a large number of trials.

    Poisson Distribution:

    Poisson distribution is applicable for the case when we count the events occurring in some interval of space or time. [Usually, it is considered for extremely rare events. For rare events, the distribution becomes asymmetric and differs largely from a symmetric Normal distribution, as will be clear from the later demonstration.] For example, if we count the number of phone calls received in a span of 5 minutes over a day, or count the number of cars passing on the road in a time interval of 1 minute, we will have a distribution that is not quite symmetric like the Normal distribution, although it is random. This distribution is Poisson and it appears like a skewed one.

    If $r$ is a variable that takes the values $r = 0, 1, 2, 3, \ldots$, the Poisson probability is

    $P(r) = \frac{\lambda^r e^{-\lambda}}{r!}$,

    where $\lambda$ = mean of the distribution.

    Some Characteristics of Poisson distribution:

    One interesting thing is that for a Poisson distribution, the mean and the variance are the same. For a data set, if the mean and the variance are not found to be approximately equal, then the Poisson distribution will not be a suitable model.

    For a Poisson Distribution:

    Mean, $\langle r \rangle = \lambda$

    Variance, $\sigma^2 = \lambda$

    We can arrive at the Poisson distribution from the Binomial distribution. How?

    Consider $\lambda$ to be the mean value out of total $n$. We can then say that the probability of occurrence, $p = \lambda/n$ and so, $q = 1 - \lambda/n$.

    For $n$ trials, we write the Binomial probability:

    $P(r) = \frac{n!}{r!\,(n-r)!} \, p^r q^{n-r}$

    $= \frac{n!}{r!\,(n-r)!} \left( \frac{\lambda}{n} \right)^r \left( 1 - \frac{\lambda}{n} \right)^{n-r}$

    $= \frac{n(n-1)(n-2) \cdots (n-r+1)}{n^r} \, \frac{\lambda^r}{r!} \left( 1 - \frac{\lambda}{n} \right)^{n-r}$

    $\approx \frac{\lambda^r}{r!} \left( 1 - \frac{\lambda}{n} \right)^{n}$   [We can write this as $n$ becomes large.]

    $\approx \frac{\lambda^r e^{-\lambda}}{r!}$   [In the limit of large $n$, $\left( 1 - \frac{\lambda}{n} \right)^n \to e^{-\lambda}$]

    Note: In the above derivation, the approximation from the Binomial to the Poisson distribution is possible only when we assume $n$ very large and $p$ very small (and thus $q$ close to 1). The small value of $p$ means a rare event!
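The passage from Binomial to Poisson can also be seen numerically. In the sketch below the mean is held fixed while the number of trials grows, so that the probability per trial shrinks; the symbol `lam` for the mean and the chosen value 2.0 are assumptions made for illustration.

```python
import math

def binomial_probability(n, r, p):
    """Binomial probability of r occurrences in n trials."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_probability(r, lam):
    """Poisson probability of r occurrences when the mean is lam."""
    return lam ** r * math.exp(-lam) / math.factorial(r)

def max_gap(n, lam=2.0, r_max=10):
    """Largest difference between the two probabilities for r = 0..r_max."""
    p = lam / n
    return max(
        abs(binomial_probability(n, r, p) - poisson_probability(r, lam))
        for r in range(r_max + 1)
    )

for n in (10, 100, 1000):
    print(n, 2.0 / n, max_gap(n))
```

The gap shrinks as the number of trials increases, in line with the derivation.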

    In the following figs. we demonstrate how a symmetric binomial distribution (which would become a Normal distribution for a large number of events) becomes a Poisson distribution as the value of $q$ is increased (and so $p$ is decreased).

    A Binomial distribution (with large $n$ and for a high $q$ value) is again plotted below along with an actual Poisson distribution (continuous curve) of appropriately chosen $\lambda$.

    It is often difficult to differentiate a Poisson distribution from a Normal distribution with the naked eye. The mean and the variance of the Normal distribution are chosen suitably so as to match the Poisson distribution. Only a close examination can reveal the difference! Look at the adjacent graph.

    Chapter VI

    Correlation and Regression

    Measure of Correlation:

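    Let us first note that the variance of a set of data $x_1, x_2, \ldots, x_N$ is given by

    $\sigma_x^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$   (1)

    Variance ($\sigma_y^2$) of another set of data $y_1, y_2, \ldots, y_N$ is likewise,

    $\sigma_y^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2$   (2)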


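    We can write the above two expressions in the following form:

    $\sigma_x^2 = \frac{1}{N} \sum_i (x_i - \bar{x})(x_i - \bar{x})$ and $\sigma_y^2 = \frac{1}{N} \sum_i (y_i - \bar{y})(y_i - \bar{y})$

    Therefore, we can also define a similar kind of formula involving two variables,

    $\sigma_{xy} = \frac{1}{N} \sum_i (x_i - \bar{x})(y_i - \bar{y})$,   (3)

    which is called the covariance of the two sets of data.

    The linear correlation between two sets of data is defined by the following coefficient:

    Corr$(x, y) = r = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$

    The above coefficient is called Pearson's correlation coefficient. Correlation tests how strongly a pair of variables is related.

    Note: In many books, the correlation coefficient is written in the form, $r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$, where $S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$, $S_{xx} = \sum_i (x_i - \bar{x})^2$ and $S_{yy} = \sum_i (y_i - \bar{y})^2$.

    Properties of $r$:

    The coefficient measures the strength of a linear relationship.

    The range: $-1 \le r \le +1$

    $r = +1$: perfect positive linear correlation

    $r = -1$: perfect negative linear correlation

    $r = 0$: no correlation

    We can have an idea of the kind of correlation between two sets of data from the $(x, y)$ scatter plots: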


    Figs.

    Correlation Matrix:

    For the relations among more than two sets of variables, it is useful to present the correlation coefficients between every two sets of variables in the form of a table. This is called the correlation matrix.

    For example, for three sets of variables, $X$, $Y$, $Z$, we have the following table:

          X           Y           Z
    X     1           $r_{xy}$    $r_{xz}$
    Y     $r_{yx}$    1           $r_{yz}$
    Z     $r_{zx}$    $r_{zy}$    1

    Note that in the above table, we have only three different entries. The reason is that the matrix is symmetric, as the correlation between $X$ and $Y$ is the same as between $Y$ and $X$, and so on: $r_{xy} = r_{yx}$, $r_{yz} = r_{zy}$, $r_{zx} = r_{xz}$. Also, the correlation of a variable with itself is trivial; it is always the perfect correlation ($r_{xx} = r_{yy} = r_{zz} = 1$). So we have only three independent useful quantities.

    Practical Calculation of Correlation Coefficient:

    For practical calculations, we often use the following formula, obtained by multiplying the numerator and denominator of the formula for $r$ by $N^2$:

    $r = \frac{N \sum xy - \sum x \sum y}{\sqrt{\left[ N \sum x^2 - (\sum x)^2 \right] \left[ N \sum y^2 - (\sum y)^2 \right]}}$

    Example #1


    In the following table, some values of two variables $x$ and $y$ are given in two columns. We calculate the necessary quantities in the other columns, to be put in the correlation formula.

    No.     $x$    $y$    $x^2$   $y^2$   $xy$
    1       2      5      4       25      10
    2       4      9      16      81      36
    3       5      11     25      121     55
    4       6      10     36      100     60
    5       8      12     64      144     96
    Total   25     47     145     471     257

    Here we find, $N = 5$, $\sum x = 25$, $\sum y = 47$, $\sum x^2 = 145$, $\sum y^2 = 471$ and $\sum xy = 257$.

    The correlation coefficient, $r = \frac{5 \times 257 - 25 \times 47}{\sqrt{(5 \times 145 - 25^2)(5 \times 471 - 47^2)}} = \frac{110}{\sqrt{14600}} \approx 0.91$

    The above calculated value of the correlation coefficient is close to 1. Thus we may say, there is a good (positive) correlation between the two sets of data.

    Example #2

    Calculate the correlation coefficient from the following height-weight data:

    Height (cm)   170 172 181 157 150 168 166 175 177 165 163 152 161 173 175
    Weight (kg)   65  66  69  55  51  63  61  75  72  64  61  52  60  70  72

    Example #3

    Following is a table that represents the data for shoe sizes vs. height achieved by Olympic participants in a high jump event. Both the columns are measured in inches.

    Shoe size   12.0 7.0 4.5 11.0 8.5 5.0 12.0 7.5 8.5 5.5 9.5 5.5 10.5 12.0 14.0 7.0 7.0
    Height      72   64  62  70   69  65  72   65  65  65  68  61  69   77   73   65  67

    Shoe size   12.0 12.0 7.0 13.0 11.0 12.0 4.5 10.5 10.0 10.0 13.0 7.5 4.5 8.5 14.0 10.0 6.5
    Height      71   73   64  71   71   72   61  71   66   67   73   69  61  70  75   72   66


    Follow the same procedure as in Example #1 and calculate the correlation coefficient. For a visual effect, we may make a scatter plot of the pair of data. The relationship between them seems to be linear, which should be well reflected in the correlation coefficient.
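The same procedure can be carried out in a few lines of code. This is a minimal sketch of the sums-based formula for Pearson's coefficient, applied to the five data pairs of Example #1 ($x$ = 2, 4, 5, 6, 8 and $y$ = 5, 9, 11, 10, 12) and to the height-weight data of Example #2; the function name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient from the sums of x, y, x^2, y^2 and xy."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Example #1 data
x1 = [2, 4, 5, 6, 8]
y1 = [5, 9, 11, 10, 12]
print(round(pearson_r(x1, y1), 4))

# Example #2 data (height in cm, weight in kg)
height = [170, 172, 181, 157, 150, 168, 166, 175, 177, 165, 163, 152, 161, 173, 175]
weight = [65, 66, 69, 55, 51, 63, 61, 75, 72, 64, 61, 52, 60, 70, 72]
print(round(pearson_r(height, weight), 4))
```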

    Rank Correlation:

    Spearman's rank correlation coefficient between two sets of ranked variables is defined below. Suppose the original data sets for two variables $x_i$ and $y_i$ are rank-ordered to have two sets: $R(x_i)$ and $R(y_i)$.

    We calculate the differences between the ranks of the two sets, $d_i = R(x_i) - R(y_i)$.

    The rank correlation coefficient:

    $\rho = 1 - \frac{6 \sum_i d_i^2}{n (n^2 - 1)}$,

    where $n$ is the number of pairs.

    Example:

    For the following data set, we find the ranks in the table.

    X     Y     Rank of X   Rank of Y   $d$    $d^2$
    3     5     2           2           0      0
    6     1     3           1           2      4
    2     9     1           4           -3     9
    10    8     4           3           1      1

    From the table: $\sum d^2 = 14$ (last column), $n = 4$

    $\rho = 1 - \frac{6 \times 14}{4 (4^2 - 1)} = 1 - \frac{84}{60} = -0.4$

    Negative correlation! The strength of the correlation is medium, as the magnitude of the correlation coefficient lies between 0 and 1.
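A minimal sketch of the rank correlation for the four (X, Y) pairs of the table, using the usual formula with the sum of squared rank differences. The helper names are illustrative, and the simple ranking used here assumes there are no tied values.

```python
def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rank correlation: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

x = [3, 6, 2, 10]
y = [5, 1, 9, 8]
print(ranks(x), ranks(y))
print(spearman_rho(x, y))
```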


    Time Series, Auto Correlation

    What is a Time Series?

    A time series is a set of observations generated sequentially in time. Any electrical signal, stock exchange data (the daily trading curve), an ECG curve, a record of temperature or humidity over a period, etc. are all basically time series.

    A time series can tell us a lot about what is happening in the system, and this enables us to make predictions with a certain degree of accuracy.

    It is sometimes important to see if there is any correlation among the data points within a given time series, that is, the correlation between data taken at some time and data taken at another time. This we call autocorrelation. This can throw some light on the hidden pattern inside the time series data.
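The idea can be sketched numerically by correlating a series with a copy of itself shifted by a lag of some steps. The synthetic sine-wave series (with a period of 12 steps) and the function name below are made up for illustration only.

```python
import math

def autocorrelation(series, lag):
    """Pearson correlation between the series and itself shifted by `lag` steps (lag >= 1)."""
    x = series[:-lag]
    y = series[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# hypothetical periodic signal: one cycle every 12 time steps
signal = [math.sin(2 * math.pi * t / 12) for t in range(120)]

for lag in (1, 3, 6, 12):
    print(lag, round(autocorrelation(signal, lag), 3))
```

A lag equal to the period gives a coefficient close to +1, while a lag of half the period gives a value close to $-1$, which reveals the hidden periodic pattern.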

    Autocorrelation:

    Remember, in the correlation formulas before, we considered the pair of quantities $x_i$ and $y_i$ which correspond to the same parameter or serial number. Here we just have to consider a pair of values $x_i$ and $x_{i+k}$ of the same variable but at different times. If the index $i$ corresponds to a time $t$, $i + k$ will correspond to another time $t + \tau$.

    Regression

    If two variables are related, which means there is a significant correlation between them, we can make a quantitative prediction of one variable for some value of the other. This is the basis of regression analysis.

    There are two types of regression analysis:

    Linear regression: when the data approximately follow a straight line.

    Non-linear regression: when no linear relationship exists; in general, a polynomial is considered to fit the data points.

    A regression line is drawn through the scatterplot of two variables. The line is chosen so that it passes as close as possible to all the points.

    Regression analysis is widely used for prediction and forecasting.


    Linear Regression:

    Suppose we have a set of data for a pair of variables $(x, y)$ and we predict that the dependent variable $y$ can be obtained from the independent variable $x$, where they obey a linear regression equation: $y = a + bx$, where the coefficients $a$ and $b$ are given by the following,

    $b = \frac{N \sum xy - \sum x \sum y}{N \sum x^2 - (\sum x)^2}$, $a = \bar{y} - b \bar{x}$

    [Derivations of the above formulas are given in the appendix.]

    So the regression equation is the line with slope $b$ and intercept $a$ which passes through the point $(\bar{x}, \bar{y})$ [mean values].

    Example:

    We plot the data in Example #3 above and obtain a scatterplot. Next we calculate the values of the parameters $a$ and $b$ by the above formulas. Then we can draw a straight line with slope $b$ and intercept (on the y-axis) $a$. We can check that this straight line superposed on the scatter data is the best fit line for the data points. This straight line fit is also called the least squares fit.
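A minimal sketch of a least squares straight-line fit, writing the line as $y = a + bx$ (the symbol names and the function name are assumptions of this sketch). For brevity it uses the five data pairs of the correlation Example #1 rather than the larger data set.

```python
def linear_fit(x, y):
    """Least squares intercept a and slope b for the line y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a = sy / n - b * sx / n                          # intercept: line passes through the means
    return a, b

x = [2, 4, 5, 6, 8]
y = [5, 9, 11, 10, 12]
a, b = linear_fit(x, y)
print("intercept =", round(a, 4), "slope =", round(b, 4))

# predicted y for a new x value
print(round(a + b * 7, 4))
```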

    Chapter VII

    Sampling

    Basic Concept:

    What is sampling?
    Sampling is to take a subsection of the population for a particular study. The aim is to select the data sample so that it represents the total data set.
    In statistics, population means the total collection of data. When the population, or the entire collection of data, is studied, it is called a census.
    In short, the population is the total set and the sample is a subset of it.

    Why is sampling done?


    When the number of elements in a population is large, it is often not possible to investigate the population completely due to lack of time, money and resources. This is why sampling is necessary.

    Sampling is done in such a way that the subset of data represents the entire set.

    Example:
    If a TV channel wants to know the popularity of a program, it would be expensive to ask everybody's opinion. Instead, a subsection of viewers is interviewed and the data are collected.

    Methods of Sampling:

    A sample of size $n$ means there are $n$ data points in the collection. A sample of size $n$ is collected from a population of size $N$ in such a way that all the features of the population are well represented by it.

    If a sampling method over-represents or under-represents a feature of the population, it is said to be biased. The aim of any selection method is to reduce the chance of bias as far as possible.

    There are several methods of sampling; among them the most common is random sampling.

    Random sampling:
    For a sample of size $n$, we collect $n$ data from the population. We collect many such samples for our evaluation. If this is done randomly, so that each group of size $n$ taken from the population has an equal chance of getting selected, we call this random sampling. Sometimes, it is called simple random sampling.
    For random sampling, the successive drawings have to be independent.

    Let us suppose we want to select a sample of size 100 from a population of size 10000. In the case of random sampling, we select the elements (that is, which element is to be picked) with the help of random numbers (generated on a computer), or by consulting a random number table, or by some kind of dice throwing.
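On a computer, such a selection might look like the sketch below, where the population elements are simply labelled 1 to 10000 (an assumption for illustration). Note that `random.sample` draws without replacement, so no element is picked twice; `random.choices` could be used instead if drawing with replacement is wanted.

```python
import random

random.seed(1)  # fixed seed only so that the run is repeatable

population = list(range(1, 10001))        # element labels 1..10000
sample = random.sample(population, 100)   # 100 distinct labels, every group equally likely

print(sorted(sample)[:10])
```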

    Systematic Sampling:
    If simple random sampling from the population is not possible, systematic sampling may be done. First, the population is enumerated from 1 onwards. If a sample of size $n$ from a population of size $N$ is to be obtained, every $k$-th item is selected, where $k = N/n$. First a random number between 1 and $k$ is selected and the item with that number is taken as the 1st element. After this every $k$-th element is taken.

    Example:
    Follow the table given below.

    Sl no.   value
    1        20
    2        27
    3        33
    4        21
    5        15
    6        22
    7        45
    8        13
    9        32
    10       29
    11       10
    12       16

    For a sample of size $n = 4$ from these $N = 12$ data, $k = 12/4 = 3$.

    Select a random number between 1 and 3: choose 2, for example.

    Start with #2 and then take data number 5, 8, 11.

    Stratified Sampling:
    In this method, the population is first divided into groups (strata). Each element of the sample belongs to one such group.
    Divide the population into non-overlapping groups containing $N_1, N_2, \ldots, N_k$ data such that $N_1 + N_2 + \cdots + N_k = N$. Next do simple random sampling to collect one or a few elements from each group.
    Suppose a population is classified into several groups according to age or something like that. Then from each group random samples are collected.
    Note: This is also called restricted random sampling.

    Cluster Sampling:
    In this method, like before, the population is divided into groups called clusters. Then clusters are taken randomly and the elements are collected from them as the sample.
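The systematic selection in this example can be sketched as follows, using the twelve tabulated values. The function name and its `start` argument are illustrative; if no start is given, a random one between 1 and $k$ is drawn.

```python
import random

def systematic_sample(data, sample_size, start=None):
    """Pick every k-th item, k = N // n, beginning at a start position in 1..k."""
    k = len(data) // sample_size
    if start is None:
        start = random.randint(1, k)
    positions = list(range(start, len(data) + 1, k))[:sample_size]
    return positions, [data[i - 1] for i in positions]

values = [20, 27, 33, 21, 15, 22, 45, 13, 32, 29, 10, 16]

# start at serial number 2, as in the example
positions, picked = systematic_sample(values, 4, start=2)
print(positions, picked)
```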


    Probability sampling:
    Any method of sampling that uses (probabilistically) random selection is in general called probability sampling.

    Sampling variation:
    When sampling from a population is done, we take not one sample but different sets of samples having the same size. If the samples are different, we call this sampling variation.

    In practice, we often draw only one sample, or one set of data, from a population. But we may not be sure what would happen if we drew several other samples. Will we get the same result? The answer is No. If we look at the mean value, we see that the mean is not the same for all the samples that we are able to draw. We then get a distribution of the sample means.

    $N$ = population size, $n$ = sample size, $n/N$ = the sample fraction.
    Many samples of the same size yield a sampling distribution.
    The sampling distributions are usually assumed to follow a well-known probability distribution.
    We look for various properties from the distribution curves.
    It is seen how the variation of sample size can affect the properties.
    From experience and theory, we can say that the variability of sampling distributions decreases with sample size.
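The last statement can be checked with a small simulation. The sketch below builds a made-up population (Normal with mean 50 and standard deviation 10, purely an assumption for illustration), draws many samples of a given size and measures how much the sample means scatter.

```python
import random
import statistics

random.seed(0)  # fixed seed only so that the run is repeatable

# hypothetical population of 10000 values
population = [random.gauss(50, 10) for _ in range(10000)]

def spread_of_sample_means(sample_size, num_samples=500):
    """Standard deviation of the means of many random samples of the same size."""
    means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(num_samples)
    ]
    return statistics.pstdev(means)

for n in (5, 25, 100):
    print(n, round(spread_of_sample_means(n), 3))
```

The spread of the sample means becomes smaller as the sample size grows.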

    Chapter VIII

    SAMPLING DISTRIBUTIONS

    What do you do after the sample is collected?

    The first thing one can do with a set of data is to measure its central tendency. Usually, we calculate the mean and the variance.


    The calculation of the mean (or variance) is done over many samples of the same size. Let us suppose we have collected $m$ samples of the same size. The mean values $\bar{x}_1$, $\bar{x}_2$, ..., $\bar{x}_m$ of the various samples are calculated. It is assumed that the grand mean of all these mean values is the actual sample mean, $\bar{x}$.

    The mean of the sample means is the estimate of the population mean. Similarly, the variance of the mean values calculated from the set of samples (of equal size $n$) is related to the population variance: it estimates the population variance divided by $n$.
    It can be shown:

    Sample mean $\bar{x}$ is the unbiased estimate of the population mean, $\mu$.

    For the population variance, the unbiased estimate is $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

    Hypothesis Testing

    What is Hypothesis?

    On the basis of sample information, we make certain decisions about the population. In taking such decisions we make certain assumptions. These assumptions are known as statistical hypotheses.
    [Note: A collected set of data points which is a part of the population (a small number of data) is called a sample. The process of selection is called sampling. When all the data are considered for a study, this is called the population.]


    How to test Hypothesis?Assuming the hypothesis correct, we calculate the probability of getting the observed sample. Ifthis probability is less than a certain assigned value, the hypothesis is rejected .

    If there is no significant difference between the observed value and the expected value, thehypothesis is called Null Hypothesis .

    Test of significance:The tests which enable us to decide whether to accept or to reject the null hypothesis are calledthe tests of significance. If the differences between the sample values and the populationvalues are significantly large it is to be rejected (i.e ., Hypothesis is not Null).

    It is known that the mean of a sample, $\bar{x}$, is an unbiased estimate of the population mean $\mu$. It is called a point estimate. But we know that if we collect different samples, the mean ($\bar{x}$) varies from sample to sample. The means of the samples form a distribution which we call the sampling distribution. Note that the sampling distribution is Normal if the variable in the population is normally distributed.

    Now the question is, how close is a calculated mean to the population mean? We have to estimate that with some level of accuracy.

    Confidence Interval:

    A confidence interval is a range of values over which we can trap the population mean with some probability. So, we consider the probability distribution of sample means in order to find that probability of trapping.

    Suppose we have a sample mean $\bar{x}$ and we consider a symmetric interval around it: $[\bar{x} - c,\ \bar{x} + c]$, where $c$ is a value that we shall determine.

    If $|\bar{x} - \mu| \le c$, the confidence interval traps the population mean $\mu$.

    How to calculate the confidence interval?

    Suppose the variable $x$ follows a Normal distribution with mean $\mu$ and standard deviation $\sigma$.

    Symbolically, $x \sim N(\mu, \sigma^2)$. So, for a sample of size $n$, $\bar{x} \sim N(\mu, \sigma^2/n)$. This means that the distribution of the mean ($\bar{x}$) of samples of size $n$ follows a Normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

    If the confidence interval is 95%, the interval has a probability 0.95 to trap the population mean: $P(|\bar{x} - \mu| \le c) = 0.95$, where $c$ is the half-width of the interval.

    Now consider the standardized sample mean, $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$. Here $z$ follows the z-distribution, $Z \sim N(0, 1)$ [Normal distribution with mean = 0, std. dev. = 1].

    Now let us look up the z-table. The total area under the curve is 100%, which gives us the total probability = 1. The shaded area (as in the fig.) is 95% of the total area, which corresponds to probability = 0.95.

    The half of the shaded area = 0.95/2 = 0.475 as it is symmetric around zero.

    In the z-distribution [$Z \sim N(0, 1)$], we now find the value of $z$ from the z-table for which the area from $0$ to $z$ is 0.475. This gives $z = 1.96$, which we take as the critical value, $z_c = 1.96$.
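The table lookup can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed z-table; the helper names `critical_z` and `area_zero_to` are illustrative and not part of the notes.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)   # the z-distribution, N(0, 1)

def area_zero_to(z):
    """Area under the z-curve between 0 and z (what the z-table lists)."""
    return Z.cdf(z) - 0.5

def critical_z(confidence):
    """Value z such that the central area between -z and +z equals `confidence`."""
    half_area = confidence / 2.0        # area from 0 to z, by symmetry
    return Z.inv_cdf(0.5 + half_area)

print(f"area from 0 to 1.96 = {area_zero_to(1.96):.4f}")
for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

For a 95% confidence level the half area is 0.475, and the inverse lookup returns the critical value of about 1.96 quoted above.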

    Thus the 95% confidence interval is $\left[\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$.

    So, for an observed sample mean $\bar{x}$, we can say with a 95% confidence level that the population mean lies in this interval.

    Let us now calculate the width of the confidence interval for 95% confidence: $2 \times 1.96\,\frac{\sigma}{\sqrt{n}} = 3.92\,\frac{\sigma}{\sqrt{n}}$.

    So, we can see that the width of the interval decreases as the sample size increases. That is, we can narrow down the search for the population mean as we take a larger sample size. Then we can say with more accuracy that our measured mean is closer to the population mean.

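    For example, for a sample four times as large ($n \to 4n$), the width of the interval is halved, since the width is proportional to $1/\sqrt{n}$.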

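    For 98% Confidence interval:

    Shaded area = 0.98. Half the shaded area = 0.98/2 = 0.49, which lies between $z = 0$ and $z = 2.326$.

    Thus the 98% confidence interval is $\left[\bar{x} - 2.326\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.326\,\frac{\sigma}{\sqrt{n}}\right]$.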
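The interval calculation for a known population standard deviation can be sketched as a small function. The function name and the numbers used below (sample mean 20, $\sigma = 4$, sample sizes 16 and 64) are hypothetical and serve only to show how the width shrinks with the sample size.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # critical z value
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical numbers, for illustration only.
XBAR, SIGMA = 20.0, 4.0

for n in (16, 64):
    for level in (0.95, 0.98):
        low, high = z_confidence_interval(XBAR, SIGMA, n, level)
        print(f"n = {n:2d}, {level:.0%}: [{low:.2f}, {high:.2f}]  width = {high - low:.2f}")
```

Going from 16 to 64 observations halves the width at either confidence level, while moving from 95% to 98% confidence widens the interval for the same sample.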

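    NOTE #1:

    Symbolically, it is often said that the confidence level is $100(1-\alpha)\%$, where $0 < \alpha < 1$. This also means that the significance level is $100\alpha\%$.

    For example, for $\alpha = 0.05$, the confidence level is 95% and the significance level is 5%.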

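    NOTE: For a sample of size $n$, with population variance $\sigma^2$, a 95% confidence interval means $\left[\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$.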

    Confidence level $= 1 - \alpha$, Significance level $= \alpha$. The confidence level and the significance level are complementary.

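    NOTE #2:

    When we are not sure if the population is Normal and we do not know the population variance $\sigma^2$, we can still use the method of calculating the confidence interval by considering the variance $s^2$ of a large sample (usually $n \ge 30$).

    Then we consider the interval $\left[\bar{x} - z\,\frac{s}{\sqrt{n}},\ \bar{x} + z\,\frac{s}{\sqrt{n}}\right]$, where $z$ is the critical value for the chosen confidence level.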

    Student's t-test:

    This is applied to find the confidence interval for a small sample. The population is assumed to be Normal.

    Calculation of the t-parameter:

    Consider the variable $t$ defined as $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$.

    [Note here, we use $s$, calculated for the sample, instead of $\sigma$.]

    The value of the variable $t$ varies from sample to sample and thus it forms a distribution looking very similar to the Normal distribution. This is the t-distribution. As we take larger and larger samples, the t-distributions come closer and closer to a Normal distribution, $N(0, 1)$, which is nothing but the z-distribution.

    Critical values of z for common confidence levels:

    Confidence Level    z
    90%                 1.645
    95%                 1.96
    98%                 2.326
    99%                 2.576

    Now, instead of the sample size, the family of t-distributions is characterized by a parameter called the degrees of freedom (df), usually denoted by $\nu$ (nu).

    Degrees of freedom = No. of independent values used for the calculation of $s$.

    For example, if $n$ is the sample size, we use $n$ data points, but they are related by their mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, or $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$. Such a condition in the form of a relation or equation is called a constraint. Thus we have $n - 1$ independent quantities, and this is the degrees of freedom here.

    Degrees of freedom, $\nu$ = Number of values - Number of constraints.

    In this case, $\nu = n - 1$.

    Note:

    Generally, the more parameters we have, the more accurately we can fit a model distribution to the data. For each estimate we make in the calculation, we remove one degree of freedom. So if we know that the actual population distribution is Normal, taking more independent observations (higher df) will make the t-distribution tend towards the Normal distribution.

    Demonstrations of t-Distributions:

    Here we do an experiment on a computer. We used a computer code (written in the Fortran language, appended in the Appendix) to generate numbers that follow a Normal distribution having some mean and standard deviation. Now we may call the entire set of data thus generated our population. Next, we randomly select a few numbers from this population (through a random number generator), which we call a random sample. Many such random samples are drawn. If we now use the known mean of the population, $\mu$ (the expectation value), and calculate the mean ($\bar{x}$) and standard deviation ($s$) of each sample, we can calculate the t-parameters from the formula given above. The number of free variables, or the degrees of freedom (df), of a sample is one less than the sample size, since the only expected value that is considered is the population mean. For demonstration, we took samples of several different sizes, and so we have the corresponding degrees of freedom (one less than each sample size). The probability distributions are then made from the t-parameters created from samples with a given df. So we have many t-distributions with different df. It is clear from the adjoining graph that the higher the value of df, the closer the t-distribution comes to the Normal distribution of the population.

    [Note: The Normal distribution plotted in the graph for comparison is actually a Z-distribution, that is, the Normal distribution with mean = 0 and standard deviation = 1.]

    Fig: A set of t-distributions plotted with the Normal distribution (with mean = 0 and std. dev.=1) for comparison.
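A similar experiment can be sketched in a few lines of Python. This is not the author's Fortran code: it is a minimal re-creation under assumed settings (a standard Normal population, sample sizes 3, 6 and 30, and 5000 samples per size). Instead of plotting, it reports the fraction of t-values falling beyond $\pm 1.96$, which would be about 0.05 for the z-distribution.

```python
import math
import random

def t_values(n, mu=0.0, sigma=1.0, trials=5000, seed=1):
    """Draw `trials` samples of size n from N(mu, sigma^2) and return their t-parameters."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the n - 1 divisor.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        values.append((xbar - mu) / (s / math.sqrt(n)))
    return values

def tail_fraction(values, cutoff=1.96):
    """Fraction of values lying outside the range [-cutoff, +cutoff]."""
    return sum(abs(v) > cutoff for v in values) / len(values)

# Assumed sample sizes; the degrees of freedom are n - 1.
for n in (3, 6, 30):
    frac = tail_fraction(t_values(n))
    print(f"df = {n - 1:2d}: fraction with |t| > 1.96 = {frac:.3f}")
```

The tails are heaviest for the smallest df and approach the Normal value as df grows, which is the behaviour the figure illustrates.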

    The t-distributions are now designated as $t_\nu$-distributions. As $\nu$ gets higher, the $t_\nu$-distributions tend more and more towards the z-distribution.

    Like the z-table, we now have a t-table to consult, from which we get the area under the curve within some $t$-range.

    So, for a Normal distribution, for a sample of size $n$, we have the confidence interval:

    $\left[\bar{x} - t\,\frac{s}{\sqrt{n}},\ \bar{x} + t\,\frac{s}{\sqrt{n}}\right]$, where $t$ is read from the t-table for a confidence level of $100(1-\alpha)\%$ and for $\nu = n - 1$ degrees of freedom.

    What's Next?

    Once the t-parameter or $t_\nu$-distribution is obtained from the samples we collected, we need to test it against the assumed Normal distribution of the population with a level of significance.

    Now this comparison can be done considering one side of the distribution; it is then called a one tail t-test. When both sides of the distribution are considered, it is called a two tail t-test.

    EXAMPLE 1:

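    Consider the following 10 measurements of some variable. The hypothesis is that the population mean is $\mu = 0$. We have to verify that. Assume that the readings follow a Normal Distribution.

    No. of Obs.   1      2      3      4      5      6      7      8      9      10
    Values        0.13   -0.09  0.06   0.15   -0.02  0.03   0.01   -0.02  -0.07  0.05

    Sample mean, $\bar{x} = 0.023$; sample standard deviation, $s \approx 0.078$.

    Degrees of freedom, $\nu = 10 - 1 = 9$. From the t-table for $t_9$ with 95% confidence level, we have $t = 2.262$.

    Confidence interval: $\left[\bar{x} - 2.262\,\frac{s}{\sqrt{n}},\ \bar{x} + 2.262\,\frac{s}{\sqrt{n}}\right] \approx [-0.033,\ 0.079]$. The mean $\mu = 0$ is trapped inside the above interval. So the hypothesis is right. Null hypothesis.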
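The arithmetic of this example can be re-checked with a short script. It is a sketch under two stated assumptions: the hypothesized population mean is taken to be 0, and the critical value 2.262 is the standard two-tail t-table entry for 9 degrees of freedom at the 95% level, entered by hand because the Python standard library has no t-distribution.

```python
from math import sqrt

values = [0.13, -0.09, 0.06, 0.15, -0.02, 0.03, 0.01, -0.02, -0.07, 0.05]
MU0 = 0.0        # hypothesized population mean (assumed)
T_CRIT = 2.262   # t-table value for df = 9, 95% confidence (two tail)

n = len(values)
xbar = sum(values) / n
s = sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))   # unbiased estimate
half_width = T_CRIT * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"mean = {xbar:.3f}, s = {s:.4f}, df = {n - 1}")
print(f"95% confidence interval: [{lower:.4f}, {upper:.4f}]")
print("hypothesized mean trapped:", lower <= MU0 <= upper)
```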

    EXAMPLE 2:

    The mean life time (in hours) of an electric bulb is measured to be 10.4. Now a technology is introduced to increase the life time. The experimental data are collected from a random sample of size $n$ and summarized by $\sum x$ and $\sum x^2$. Test whether there is any evidence at the 10% significance level that the new technology has actually increased the life time.

    [Note that it is not asked whether there is any decrease in life time. The question is whether there is any increase or whether it remains the same.]

    Did you know?

    The t-distributions were discovered by William S. Gosset in 1908. Gosset was a statistician at the Guinness brewing company, and part of his agreement with the company was that he would not publish under his own name. He therefore wrote under the pseudonym "Student". Hence the name, Student's t-distribution.

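    Ans. Null hypothesis, $H_0: \mu = 10.4$; Alternate hypothesis, $H_1: \mu > 10.4$.

    Here we consider a one tail t-test, as we are to look for the increase only.

    Sample mean, $\bar{x} = \frac{1}{n}\sum x$.

    Unbiased estimate of the population variance (from the sample), $s^2 = \frac{n}{n-1}\left[\frac{\sum x^2}{n} - \bar{x}^2\right]$.

    For the t-test, $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, with $\mu = 10.4$.

    Here, the degrees of freedom are $\nu = n - 1$. So we look for the area under the curve for the $t_\nu$-distribution.

    For 10% significance, i.e. for a 90% confidence level, we read the critical value of $t$ from the t-table (one tail). Our observed value of $t$ exceeds this critical value and thus lies in the rejection region. That means that the mean life time is increased. Alternate hypothesis.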

    EXAMPLE 3:

    You are measuring some length which is 10 cm. Your five measurements are 9.88, 10.18, 10.23, 10.39, 10.25 cm. Assume that the measurements follow a Normal distribution. Test at the 5% significance level whether these results support the claim or whether the measurement is biased.

    Ans.

    Since the bias can be in either direction (positive or negative), we consider a two tail test.

    The Hypothesis: Null, $H_0: \mu = 10$ cm; Alternate, $H_1: \mu \neq 10$ cm.

    Sample mean, $\bar{x} = \frac{9.88 + 10.18 + 10.23 + 10.39 + 10.25}{5} = 10.186$ cm.

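    Variance, $s^2 = \frac{n}{n-1}\left[\frac{\sum x^2}{n} - \bar{x}^2\right] = \frac{5}{4}\left[\frac{518.9143}{5} - 10.186^2\right] \approx 0.0353$ cm$^2$; this is an unbiased estimate of the population variance.

    The t-value is $t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{10.186 - 10}{\sqrt{0.0353/5}} \approx 2.21$.

    For the 5% significance level, we consider the area of 0.95 (shaded area in the fig.) around the centre, and an area of 0.025 on each side (at both the tails).

    We consider the $t_4$ distribution, as the degrees of freedom are $\nu = 5 - 1 = 4$. The rejection region on either side corresponds to $|t| > 2.776$, from the table.

    Here we find that the t-value is below the rejection region, that is, in the acceptance region. Thus the hypothesis ($\mu = 10$ cm) is accepted. Null hypothesis.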
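The two tail test on the five length measurements can be verified numerically. In this sketch the critical value 2.776 is the standard t-table entry for 4 degrees of freedom at the 5% significance level (two tail), typed in by hand; the variable names are illustrative.

```python
from math import sqrt

readings = [9.88, 10.18, 10.23, 10.39, 10.25]   # measurements in cm
MU0 = 10.0       # length claimed under the null hypothesis
T_CRIT = 2.776   # t-table value for df = 4, 5% significance (two tail)

n = len(readings)
xbar = sum(readings) / n
# Unbiased variance, written as in the text: n/(n-1) * [sum(x^2)/n - xbar^2]
s2 = n / (n - 1) * (sum(x * x for x in readings) / n - xbar ** 2)
t = (xbar - MU0) / sqrt(s2 / n)
reject_null = abs(t) > T_CRIT

print(f"mean = {xbar:.3f} cm, s^2 = {s2:.4f} cm^2, t = {t:.2f}")
print("reject the null hypothesis:", reject_null)
```

The computed t-value of about 2.21 is smaller than 2.776, so the script reaches the same conclusion as the worked answer.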

    NOTE: In a one tail t-test, we consider only one side of the t-distribution, either the right side (for increase or positive values) or the left side (for decrease or negative values). For a two tail t-test, we consider both sides of the distribution (as we have done before), considering the fact that the value of the variable can increase or decrease from the mean value.

    Chi-Squared Test: ($\chi^2$-Test)

    In some measurement, we obtain the frequencies of some events. We call them observed frequencies ($f_o$). We have to test whether the observed frequencies are consistent with the expected frequencies ($f_e$) according to some given distribution or hypothesis.

    The measure of discrepancy between the observed and expected frequencies is defined by the following quantity:

    $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

    Note: $\chi^2$ (Chi-square) is a non-negative quantity; the lower its value, the better the agreement between the observed and expected frequencies. In other words, it gives a goodness of fit of the model or hypothesis. For $\chi^2 = 0$, the agreement is exact.

    Like the t-distribution, we also have a $\chi^2$-distribution. We measure the $\chi^2$ values for different samples of the same size and obtain a distribution. The distribution, here also, is characterized by the degrees of freedom $\nu$. So for $\nu$ degrees of freedom we write $\chi^2_\nu$.

    EXAMPLE 1:

    In a dice throw experiment, we obtain the following figures, where the dice was thrown 600 times.

    Score   1     2     3     4     5     6
    Freq.   90    108   110   95    100   97

    Let us check the above with the $\chi^2$-test. Our hypothesis is that for each score, the probability = 1/6 (for a fair dice). So the expected frequency = $600 \times \frac{1}{6} = 100$.

    Hypothesis, $H_0$: the dice is fair; $H_1$: the dice is not fair.

    In this example, the degrees of freedom, $\nu = 6 - 1 = 5$.

    So after we calculate $\chi^2$ from the following table, we have to look up the $\chi^2$-table.

    To calculate $\chi^2$:

    Score   $f_o$   $f_e$   $f_o - f_e$   $(f_o - f_e)^2/f_e$
    1       90      100     -10           1
    2       108     100     8             0.64
    3       110     100     10            1
    4       95      100     -5            0.25
    5       100     100     0             0
    6       97      100     -3            0.09
    Total   600     600     0             2.98

    From the table, we see $\chi^2 = 2.98$. If we consider the 90% confidence level, we have $\chi^2_5(10\%) = 9.236$. Our obtained value for $\chi^2$ is below this. So it falls within the acceptance region. The dice is fair; the hypothesis is null.

    EXAMPLE 2:

    In a genetic study, it is predicted that the children with both parents of blood group AB will fall into blood groups AB, A and B in the ratio 2:1:1. Out of a random sample of 100, we find 55 children have blood group AB, 27 have blood group A and 18 have blood group B. Test at the 10% significance level whether the observed results agree with the theoretical prediction.

    Ans.

    Hypothesis, $H_0$: The children's blood groups are in the ratio 2:1:1.

    $H_1$: The children's blood groups are NOT in the ratio 2:1:1.

    The ratio of probabilities AB, A, B is 2:1:1 = $\frac{1}{2} : \frac{1}{4} : \frac{1}{4}$, so the expected frequencies out of 100 are 50, 25 and 25.

    To calculate $\chi^2$ for the blood groups:

    Blood group   $f_o$   $f_e$   $f_o - f_e$   $(f_o - f_e)^2/f_e$
    AB            55      50      5             0.5
    A             27      25      2             0.16
    B             18      25      -7            1.96
    Total         100     100     0             2.62

    Degrees of freedom, $\nu = 3 - 1 = 2$.

    For the 10% significance level we look up the $\chi^2$-distribution table: we find $\chi^2_2(10\%) = 4.605$. The rejection region is thus above 4.605.

    Here the obtained value, $\chi^2 = 2.62$, is below the rejection region, so it falls in the acceptance region. The hypothesis is correct. Null Hypothesis!

    EXAMPLE 3:

    The rain fall ($x$) at some place is measured in cm in the following table. We assume that $x$ is a random variable and that it follows a Normal distribution with mean $\mu = 50$ and standard deviation $\sigma = 15$.

    $x$ (cm)     <35   35-45   45-55   55-65   >65
    Obs. Freq.   10    18      28      18      12

    (i) Calculate the expected frequencies of the different classes.
    (ii) Carry out a goodness of fit analysis at the 5% level of significance to test the hypothesis that the random variable actually follows the Normal distribution $N(50, 15^2)$.

    Ans.

    (i) $z = \frac{x - \mu}{\sigma} = \frac{x - 50}{15}$

    For $x$ = 35, 45, 55, 65 we have $z$ = -1, -0.333, 0.333, 1 respectively.

    Now follow the z-table.

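    For $x < 35$, we have $z < -1$, $P(z < -1) = 0.5 - 0.3413 = 0.1587$.

    Expected frequency = $0.1587 \times 86 = 13.65$

    Here, total frequency = $10 + 18 + 28 + 18 + 12 = 86$

    For $35 \le x < 45$, $-1 \le z < -0.333$, $P = 0.3413 - 0.1304 = 0.2109$.

    Expected frequency = $0.2109 \times 86 = 18.14$

    For $45 \le x < 55$, $-0.333 \le z < 0.333$, $P = 2 \times 0.1304 = 0.2608$.

    Expected frequency = $0.2608 \times 86 = 22.43$

    By symmetry, the expected frequencies for the 4th and 5th groups are 18.14 and 13.65 respectively.

    To carry out the $\chi^2$-test we prepare the following table.

    Class   $f_o$   $f_e$   $f_o - f_e$   $(f_o - f_e)^2/f_e$
    <35     10      13.65   -3.65         0.98
    35-45   18      18.14   -0.14         0.00
    45-55   28      22.43   5.57          1.38
    55-65   18      18.14   -0.14         0.00
    >65     12      13.65   -1.65         0.20
    Total   86      86.01   ≈ 0           2.56

    Here, $\chi^2 = 2.56$.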

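    From the $\chi^2$-distribution table, $\chi^2_4(5\%) = 9.488$ for the 5% significance level (degrees of freedom $\nu = 5 - 1 = 4$).

    Since 2.56 is not in the rejection region, the data are consistent with the Normal distribution, $N(50, 15^2)$. Null hypothesis.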
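The three $\chi^2$ examples above can be re-computed with one small function. This is a sketch rather than a full testing routine: the critical values are not computed here and would still be read from a $\chi^2$-table. For the rainfall example, the mean 50 and standard deviation 15 are the values implied by the quoted z-scores, and the expected frequencies come from `statistics.NormalDist` rather than a printed z-table, so the result differs slightly from the hand calculation.

```python
from statistics import NormalDist

def chi_square(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Example 1: dice thrown 600 times, each score expected 100 times.
dice_obs = [90, 108, 110, 95, 100, 97]
dice_exp = [600 / 6] * 6
chi2_dice = chi_square(dice_obs, dice_exp)

# Example 2: blood groups AB, A, B expected in the ratio 2:1:1 out of 100.
blood_obs = [55, 27, 18]
blood_exp = [100 * p for p in (0.5, 0.25, 0.25)]
chi2_blood = chi_square(blood_obs, blood_exp)

# Example 3: rainfall classes <35, 35-45, 45-55, 55-65, >65 under N(50, 15^2).
rain = NormalDist(mu=50, sigma=15)
rain_obs = [10, 18, 28, 18, 12]
total = sum(rain_obs)
cdf = [0.0] + [rain.cdf(edge) for edge in (35, 45, 55, 65)] + [1.0]
rain_exp = [total * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]
chi2_rain = chi_square(rain_obs, rain_exp)

print(f"dice     : chi^2 = {chi2_dice:.2f}")
print(f"blood    : chi^2 = {chi2_blood:.2f}")
print("rainfall : expected =", [round(e, 2) for e in rain_exp])
print(f"rainfall : chi^2 = {chi2_rain:.2f}")
```

In all three cases the statistic stays well below the tabulated critical value, in agreement with the conclusions drawn in the text.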

    Additional Information

    Type I and Type II errors:

    In case of Hypothesis testing, we call
    Type I Error -> when we incorrectly reject a true Null Hypothesis.
    Type II Error -> when we fail to reject a false Null Hypothesis.

    Probability Density Function:

    In Probability theory, the probability density function (P.D.F.) of a continuous random variable gives the probability per unit interval around a certain value. The P.D.F., when integrated over a finite interval, gives the cumulative probability over that interval.

    $P(a \le x \le b) = \int_a^b f(x)\,dx$, where $f(x)$ is the P.D.F.

    ----------------Appendix-----------------

    How Z-distribution is obtained from Normal distribution:

    Mathematical Expression for Normal distribution:

    $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, (1)

    where $\mu$ = mean, $\sigma$ = standard deviation. The above expression is symmetric around the mean, $x = \mu$.

    [The value of the exponential, $e \approx 2.718$]

    Normal distributions are often referred to by the symbol: $N(\mu, \sigma^2)$

    The total area under the curve, $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

    If we put $z = \frac{x - \mu}{\sigma}$, we get $dx = \sigma\,dz$.

    Thus we can write the rescaled probability,

    $\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$ (2)

    Now the above is a symmetric distribution around $z = 0$.

    So the Normal distribution (1) has become a Z-distribution in (2). This is nothing but a Normal distribution with mean = 0 and standard deviation = 1.

    We have to remember that the area under the curve between two values of $z$ gives us the total probability:

    Area = $\int_{z_1}^{z_2} \phi(z)\,dz$.

    Now instead of actually doing the integration over $z$, we are supplied with the z-score and we find the area under the curve (hence the total probability) between two limits from the table. (See the z-score table.)
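The integral behind the z-score table has a closed form in terms of the error function, so the table entries can be regenerated with the standard library. The following sketch shows this; the function names are illustrative.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Rescale a Normal variable x to the z-distribution."""
    return (x - mu) / sigma

def area_between(z1, z2):
    """Area under the z-curve between z1 and z2, using Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    def phi(z):
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return phi(z2) - phi(z1)

# Reproduce a few entries of the z-score table (area from 0 to z).
for z in (0.10, 0.50, 1.00):
    print(f"z = {z:.2f}: area = {area_between(0.0, z):.5f}")
```

For example, the entry for z = 1.00 comes out as 0.34134, matching the tabulated value.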

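    Least Square Fit (Regression Formulas)

    Let us think that we are about to fit the set of data by a straight line.

    The equation of a straight line is $y = mx + b$.

    Consider the data points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, etc. If we know the two parameters $m$ and $b$, we can draw a straight line with them.

    Error is defined as $E(m, b) = \sum_{i=1}^{n} (m x_i + b - y_i)^2$.

    For the best fit, this error should be minimum.

    Therefore, we must have $\frac{\partial E}{\partial m} = 0$ and $\frac{\partial E}{\partial b} = 0$.

    [We take partial derivatives of the error function with respect to the parameters.]

    Now, $\frac{\partial E}{\partial m} = \frac{\partial}{\partial m}\sum_{i=1}^{n} (m x_i + b - y_i)^2$

    $= \sum_{i=1}^{n} \frac{\partial}{\partial m}(m x_i + b - y_i)^2$

    $= \sum_{i=1}^{n} 2\,(m x_i + b - y_i) \cdot \frac{\partial}{\partial m}(m x_i + b - y_i)$

    $= 2\sum_{i=1}^{n} (m x_i + b - y_i)\, x_i$

    $= 2m\sum_{i=1}^{n} x_i^2 + 2b\sum_{i=1}^{n} x_i - 2\sum_{i=1}^{n} x_i y_i = 0$ (1)

    Similarly, $\frac{\partial E}{\partial b} = 0$ gives $m\sum_{i=1}^{n} x_i + n b - \sum_{i=1}^{n} y_i = 0$ (2)

    From (1) and (2),

    $\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix} \begin{pmatrix} b \\ m \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}$

    Slope, $m = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$

    and Intercept, $b = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$.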

    Example:

    For the data points (1,2), (2,3), (3,4), (4,5):

    $n = 4$, $\sum x_i = 1 + 2 + 3 + 4 = 10$, $\sum y_i = 2 + 3 + 4 + 5 = 14$,

    $\sum x_i y_i = 1 \cdot 2 + 2 \cdot 3 + 3 \cdot 4 + 4 \cdot 5 = 40$, $\sum x_i^2 = 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3 + 4 \cdot 4 = 30$

    $m = \frac{4 \times 40 - 10 \times 14}{4 \times 30 - 10^2} = \frac{160 - 140}{120 - 100} = \frac{20}{20} = 1$, $b = \frac{30 \times 14 - 10 \times 40}{4 \times 30 - 10^2} = \frac{420 - 400}{120 - 100} = \frac{20}{20} = 1$.
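The same slope and intercept formulas can be written as a short Python function and checked against the worked example. This is an illustrative sketch of the regression formulas, separate from the author's Fortran program; the function name is hypothetical.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least square straight line through (x, y) points."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xx = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    denom = n * sum_xx - sum_x ** 2   # zero only if all x values are equal
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_xx * sum_y - sum_x * sum_xy) / denom
    return slope, intercept

slope, intercept = least_squares_line([(1, 2), (2, 3), (3, 4), (4, 5)])
print("slope =", slope, " intercept =", intercept)
```

For the four points of the example the function returns slope 1 and intercept 1, as obtained by hand.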

    FORTRAN Program:

    C     Least Square fit
    C
          open(1,file='xy.dat')
          open(2,file='fit.dat')
          write(*,*)'Number of Points?'
          read(*,*)n
    C     Initialise the running sums
          sumx=0.0
          sumy=0.0
          sumsqx=0.0
          sumxy=0.0
          write(*,*)'Give data in the form: x,y'
          do i=1,n
             read(*,*)x,y
             write(1,*)x,y
             sumx=sumx+x
             sumy=sumy+y
             sumsqx=sumsqx+x*x
             sumxy=sumxy+x*y
          enddo
    C     Slope and intercept from the regression formulas
          deno=n*sumsqx-sumx*sumx
          slope=(n*sumxy-sumx*sumy)/deno
          b=(sumsqx*sumy-sumx*sumxy)/deno
          write(*,*)'Slope, Intercept= ',slope,b
    C
          write(*,*)'Give a lower and upper limits of X'
          read(*,*)xmin, xmax
    C     Write three points of the fitted line to fit.dat
          x=xmin
          dx=(xmax-xmin)/2.0
          do i=1,3
             y=slope*x+b
             write(2,*)x,y
             x=x+dx
          enddo
          stop
          end

    For the Least Square Fit of given data points: the straight line is drawn with the values of the slope and the intercept obtained from the program.

    z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

    0.1 0.03983 0.04380 0.04776 0.05172 0.05567 0.05962 0.06356 0.06749 0.07142 0.07535

    0.2 0.07926 0.08317 0.08706 0.09095 0.09483 0.09871 0.10257 0.10642 0.11026 0.11409

    0.3 0.11791 0.12172 0.12552 0.12930 0.13307 0.13683 0.14058 0.14431 0.14803 0.15173

    0.4 0.15542 0.15910 0.16276 0.16640 0.17003 0.17364 0.17724 0.18082 0.18439 0.18793

    0.5 0.19146 0.19497 0.19847 0.20194 0.20540 0.20884 0.21226 0.21566 0.21904 0.22240

    0.6 0.22575 0.22907 0.23237 0.23565 0.23891 0.24215 0.24537 0.24857 0.25175 0.25490

    0.7 0.25804 0.26115 0.26424 0.26730 0.27035 0.27337 0.27637 0.27935 0.28230 0.28524

    0.8 0.28814 0.29103 0.29389 0.29673 0.29955 0.30234 0.30511 0.30785 0.31057 0.31327

    0.9 0.31594 0.31859 0.32121 0.32381 0.32639 0.32894 0.33147 0.33398 0.33646 0.33891

    1.0 0.34134 0.34375 0.34614 0.34849 0.35083 0.35314 0.35543 0.35769 0.35993 0.36214

    1.1 0.36433 0.36650 0.36864 0.37076 0.37286 0.37493 0.37698 0.37900 0.38100 0.38298

    1.2 0.38493 0.38686 0.38877 0.39065 0.39251 0.39435 0.39617 0.39796 0.39973 0.