Sampling distribution

A PRESENTATION ON

Sampling Distribution

A Brief Explanation

SAMPLING DISTRIBUTION

There are three distinct types of distribution of data:

1. Population distribution: characterizes the distribution of the elements of a population.
2. Sample distribution: characterizes the distribution of the elements of a sample drawn from a population.
3. Sampling distribution: describes the expected behavior of a statistic computed from a large number of simple random samples drawn from the same population.

Sampling distributions constitute the theoretical basis of statistical inference and are of considerable importance in business decision-making. They are important in statistics because they provide a major simplification on the route to statistical inference.

DEFINITION

A sampling distribution is a theoretical probability distribution of a statistic obtained through a large number of samples drawn from a specific population (McTavish: 435).

A sampling distribution can be displayed as a graph of a statistic computed from sample data. Examples of such statistics are the mean, the mean absolute value of the deviation from the mean, the range, the standard deviation of the sample, the unbiased estimate of the variance, and the variance of the sample.

A sampling distribution (here, of the mean) is a theoretical distribution of the means of an infinite number of samples of equal size taken from a population (Walsh: 95).

CHARACTERISTICS

Usually a univariate distribution.

Often closely approximates a normal distribution, especially for the sample mean when the sample size is large.

The sample statistic (for example, the sample mean or the sample proportion) is a random variable.

A theoretical probability distribution.

The form of a sampling distribution refers to the shape of the particular curve that describes the distribution.

FUNCTIONS OF SAMPLING DISTRIBUTION

A sampling distribution can be formed, and displayed graphically, for many different statistics, including:

Mean

Mean absolute value of the deviation from the mean

Range

Standard deviation of the sample

Unbiased estimate of the variance

Variance of the sample

WHY IS THE SAMPLING DISTRIBUTION IMPORTANT?

PROPERTIES OF STATISTICS

SELECTION OF DISTRIBUTION TYPE TO MODEL SCORES

HYPOTHESIS TESTING

i) Properties of statistics: Statistics have different properties as estimators of population parameters. The sampling distribution of a statistic provides a window into some of the important ones.

For example, if the expected value of a statistic equals the value of the corresponding population parameter, the statistic is said to be unbiased.

Efficiency is another valuable property in the estimation of a population parameter. Everything else being equal, the statistic with the smallest standard error is preferred as an estimator of the corresponding population parameter. (An estimator is a statistic used to estimate a model parameter.)
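Unbiasedness can be checked exactly for a small population. The sketch below is a minimal Python illustration under stated assumptions:

- Samples are drawn with replacement.
- The population is the five-number population used in Prob. 1, and the sample size is 2. Both are illustrative choices.

Every ordered sample is equally likely, so averaging a statistic over all samples gives its expected value.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Illustrative population (the five numbers used in Prob. 1) and sample size.
population = [2, 3, 6, 8, 11]
n = 2

# Every ordered sample of size n drawn WITH replacement is equally likely,
# so averaging a statistic over all of them gives its expected value.
samples = list(product(population, repeat=n))

expected_mean = mean(mean(s) for s in samples)
expected_var_n = mean(pvariance(s) for s in samples)   # divisor n
expected_var_n1 = mean(variance(s) for s in samples)   # divisor n - 1

print("population mean     :", mean(population))
print("population variance :", pvariance(population))
print("E[sample mean]      :", expected_mean)
print("E[variance, /n]     :", expected_var_n)
print("E[variance, /(n-1)] :", expected_var_n1)
```

Averaged over all 25 samples, the results are:

- The sample mean comes out at 6, the population mean.
- The variance with divisor n − 1 comes out at 10.8, the population variance, so it is unbiased.
- The variance with divisor n averages only 5.4, so it is biased downward.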

ii) Selection of distribution type to model scores: The sampling distribution provides the theoretical foundation for selecting a distribution for many useful measures. For example, the central limit theorem explains why a measure such as intelligence would tend to follow an approximately normal (Gaussian) distribution. Such a measure may be considered the sum of a number of independent quantities.

iii) Hypothesis testing: The sampling distribution is integral to the hypothesis testing procedure. It is used to build a model of what the world would look like if the null hypothesis were true and the statistic were collected an infinite number of times.

A single sample is then taken and the sample statistic is calculated. This value is compared with the model, that is, with the sampling distribution of the statistic under the null hypothesis. If the observed statistic is unlikely under that model, the null model is rejected in favor of a model with real effects.
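The procedure above can be sketched by simulation: build the sampling distribution of the mean under the null hypothesis, then see how extreme one observed sample mean is. All numbers here are hypothetical assumptions, not from the source:

- Null population: normal with mean 100 and standard deviation 15.
- Sample size: 25.
- Observed sample mean: 106.

```python
import random

# Hypothetical null model (an assumption for illustration): scores are normal
# with mean 100 and standard deviation 15, and each sample has 25 observations.
NULL_MEAN, NULL_SD, N = 100.0, 15.0, 25
OBSERVED_MEAN = 106.0   # hypothetical mean of the single real sample
REPS = 20_000           # number of simulated samples under the null

rng = random.Random(42)  # fixed seed so the run is reproducible


def sample_mean_under_null() -> float:
    """Draw one sample of size N from the null population and return its mean."""
    return sum(rng.gauss(NULL_MEAN, NULL_SD) for _ in range(N)) / N


# Approximate the sampling distribution of the mean under the null hypothesis.
null_means = [sample_mean_under_null() for _ in range(REPS)]

# Two-sided p-value: the share of null sample means at least as far from
# the null mean as the observed one.
distance = abs(OBSERVED_MEAN - NULL_MEAN)
p_value = sum(abs(m - NULL_MEAN) >= distance for m in null_means) / REPS
print(f"simulated two-sided p-value: {p_value:.4f}")
```

With these numbers the standard error is 15/√25 = 3, so the observed mean sits 2 standard errors above the null mean. The simulated p-value should come out close to the normal-table value of about 0.046.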

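TYPES OF SAMPLING DISTRIBUTION

The types of sampling distribution are as follows:

1) Sampling Distribution of the Mean:

The sampling distribution of the mean is the theoretical probability distribution of the sample means. These means are obtained by extracting all possible samples of the same size from the given population.

Consider a population with mean $\mu$ and variance $\sigma^2$. When sampling from a normally distributed population, it can be shown that the distribution of the sample mean $\bar{x}$ has the following properties:

a) Its mean equals the population mean: $\mu_{\bar{x}} = \mu$.
b) Its variance is $\sigma_{\bar{x}}^2 = \sigma^2/n$, where $n$ is the sample size. For sampling without replacement from a finite population of size $N$, this is multiplied by $(N-n)/(N-1)$.
c) It is itself normally distributed.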

CENTRAL LIMIT THEOREM

The central limit theorem, first introduced by De Moivre during the early eighteenth century, is often regarded as the most important theorem in statistics.

Suppose we draw simple random samples of size $n$ from any population with mean $\mu$ and finite variance $\sigma^2$, and determine the mean of each sample. According to this theorem, as $n$ becomes large the distribution of these sample means tends to the normal distribution with mean $\mu$ and variance $\sigma^2/n$. In other words, the sampling distribution of the sample mean approaches a normal distribution.

Symbolically, the theorem can be stated as follows:

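Let $X_1, X_2, \ldots, X_n$ be $n$ independent random variables that have the same distribution (no matter what distribution) and finite variance. Then, for large $n$,

$$X = X_1 + X_2 + \cdots + X_n$$

is approximately a normal variate. The mean $\mu_X$ and variance $\sigma_X^2$ of $X$ are

$$\mu_X = \mu_1 + \mu_2 + \cdots + \mu_n = n\mu, \qquad \sigma_X^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2 = n\sigma^2,$$

where $\mu_i = \mu$ and $\sigma_i^2 = \sigma^2$ are the mean and variance of $X_i$.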
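The sum form of the theorem can be checked by simulation. The sketch below makes an illustrative assumption: each $X_i$ is uniform on [0, 1), with mean 1/2 and variance 1/12, and n = 12. The sum then has mean n·(1/2) = 6 and variance n·(1/12) = 1.

```python
import random
from statistics import fmean, pvariance

N_TERMS = 12     # n: number of independent uniform variables in each sum
REPS = 50_000    # number of simulated sums

rng = random.Random(1)  # fixed seed for reproducibility

# Each simulated X is the sum of 12 independent Uniform[0, 1) draws.
sums = [sum(rng.random() for _ in range(N_TERMS)) for _ in range(REPS)]

mu_x = N_TERMS * 0.5     # n * mu = 6
var_x = N_TERMS / 12     # n * sigma^2 = 1

# If X is approximately normal, about 95% of sums fall within 1.96 SDs of n*mu.
within = sum(abs(s - mu_x) <= 1.96 * var_x ** 0.5 for s in sums) / REPS

print(f"simulated mean {fmean(sums):.3f} (theory {mu_x})")
print(f"simulated variance {pvariance(sums):.3f} (theory {var_x:.3f})")
print(f"share within 1.96 SD: {within:.3f} (normal theory 0.950)")
```

A single uniform variable is flat, not bell-shaped. Even so, the sum of only 12 of them already behaves very much like a normal variate.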

UTILITY: The theorem requires very few conditions on the distributions of the individual random variables being summed; essentially, they must be independent with finite variance.

As a result, it furnishes a practical method of computing approximate probabilities associated with sums of arbitrarily distributed independent random variables. It also helps to explain why a vast number of phenomena show an approximately normal distribution.

Because of its theoretical and practical significance, the theorem is widely considered the most remarkable result among the laws of probability. Most of hypothesis testing and sampling theory is based on it. The central limit theorem is thus perhaps the most fundamental result in all of statistics.
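This practical use can be illustrated with the sum of 10 fair dice, an example chosen here as an assumption rather than taken from the source. The sketch works in three steps:

1. It computes P(sum ≤ 30) exactly by convolving the single-die distribution.
2. It approximates the same probability with a normal curve. The sum has mean 10 × 3.5 and variance 10 × 35/12, because means and variances of independent variables add.
3. It applies a continuity correction of 0.5, because the sum is discrete.

```python
from fractions import Fraction
from statistics import NormalDist

N_DICE = 10
FACE_PROB = Fraction(1, 6)

# Exact distribution of the sum, built by convolving one die at a time.
dist = {0: Fraction(1)}
for _ in range(N_DICE):
    new = {}
    for total, prob in dist.items():
        for face in range(1, 7):
            new[total + face] = new.get(total + face, Fraction(0)) + prob * FACE_PROB
    dist = new

exact = float(sum(p for total, p in dist.items() if total <= 30))

# One die has mean 3.5 and variance 35/12; for a sum of independent dice
# the means add and the variances add.
mu = N_DICE * 3.5
sigma = (N_DICE * 35 / 12) ** 0.5
approx = NormalDist(mu, sigma).cdf(30.5)   # continuity correction: 30 -> 30.5

print(f"exact P(sum <= 30)   = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```

Both numbers come out at about 0.20, even though a single die is far from normal.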

2) SAMPLING DISTRIBUTION OF THE PROPORTION :

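The sampling distribution of the proportion is the probability distribution of the sample proportion $\hat{p}$ (the proportion of successes in a sample). It is taken over all possible samples of the same size drawn from a population whose proportion of successes is $p$.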

Properties :

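Sample proportions tend to target the value of the population proportion: the mean of the sampling distribution of $\hat{p}$ equals $p$.

Under certain conditions, the distribution of the sample proportion can be approximated by a normal distribution. A common rule of thumb is $np \ge 5$ and $n(1-p) \ge 5$.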

Example: the sampling distribution of the proportion of girls in two randomly selected births. The sample space is bb, bg, gb, gg, and all four outcomes are equally likely, so the probabilities are:

P(proportion of girls = 0) = 0.25

P(proportion of girls = 0.5) = 0.50

P(proportion of girls = 1) = 0.25
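The two-birth example can be enumerated directly. This minimal sketch lists the four equally likely outcomes and tallies the sample proportion of girls using exact fractions. As in the example, it assumes boys and girls are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two births: bb, bg, gb, gg.
outcomes = list(product("bg", repeat=2))

# Sample proportion of girls in each outcome, tallied across outcomes.
counts = Counter(Fraction(o.count("g"), len(o)) for o in outcomes)

# Probability of each value of the sample proportion.
distribution = {p: Fraction(c, len(outcomes)) for p, c in sorted(counts.items())}
for p_hat, prob in distribution.items():
    print(f"P(proportion of girls = {p_hat}) = {prob}")

# Mean of the sampling distribution: it equals the population proportion 1/2.
mean_p_hat = sum(p_hat * prob for p_hat, prob in distribution.items())
print("mean of sample proportion:", mean_p_hat)
```

The output gives probabilities 1/4, 1/2 and 1/4 for proportions 0, 1/2 and 1. The mean is 1/2, which illustrates that the sample proportion targets the population proportion.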

STANDARD ERROR OF THE SAMPLING DISTRIBUTION

The sampling distribution has a standard deviation. The mean of the sampling distribution of the mean is the same as the population mean. Its standard deviation, however, is smaller than the population standard deviation whenever the sample contains more than one observation.

The standard deviation of the sampling distribution has a special name: ‘The Standard Error’, or sometimes ‘The Standard Error of the Mean’. The variation of sample means around the population mean is the sampling error, and it is measured by the standard error of the mean. This is an estimate of the amount by which a sample mean is likely to differ from the population mean.

This consideration is important because of what sampling theory tells us when the sampling distribution is normal (Bryman, Alan, 2004, p. 96):

- About 68% of all sample means lie within ± one standard error of the population mean.
- 95% of all sample means lie within ± 1.96 standard errors of the population mean.

Formula: The standard error of the sampling distribution of the mean equals the standard deviation of the population divided by the square root of the sample size:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Here:

- $\sigma_{\bar{x}}$ = standard deviation of the sample mean (the standard error)
- $\sigma$ = standard deviation of the population
- $n$ = sample size

How to reduce error: when the sample size increases, the sampling error decreases. Because the standard error is proportional to $1/\sqrt{n}$, quadrupling the sample size halves it.
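The formula and the 68% / 95% statements above can be checked with a short simulation. The population parameters are hypothetical assumptions: a normal population with mean 100 and standard deviation 15. The sketch first shows the standard error halving each time n is quadrupled. It then draws many samples of size 25 and counts how many sample means fall within 1 and 1.96 standard errors of the population mean.

```python
import math
import random


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: population SD divided by the square root of n."""
    return sigma / math.sqrt(n)


SIGMA, MU = 15.0, 100.0   # hypothetical population parameters

# Quadrupling n halves the standard error.
for n in (25, 100, 400):
    print(f"n = {n:>3}: standard error = {standard_error(SIGMA, n):.2f}")

# Empirical check for n = 25, sampling from a normal population.
rng = random.Random(7)
N, REPS = 25, 10_000
se = standard_error(SIGMA, N)
means = [sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N for _ in range(REPS)]

sd_of_means = math.sqrt(sum((m - MU) ** 2 for m in means) / REPS)
within_1se = sum(abs(m - MU) <= se for m in means) / REPS
within_196se = sum(abs(m - MU) <= 1.96 * se for m in means) / REPS

print(f"SD of simulated means: {sd_of_means:.2f} (formula {se:.2f})")
print(f"within 1 SE: {within_1se:.3f}, within 1.96 SE: {within_196se:.3f}")
```

The loop prints standard errors of 3.00, 1.50 and 0.75. In the simulation, roughly 68% and 95% of the sample means should land within 1 and 1.96 standard errors respectively.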

Purpose :

1. It allows us to quantify the extent to which a ‘test’ provides accurate scores.
2. If the standard error is smaller, the range of plausible values for the population mean will be narrower.
3. If the standard error is larger, the range of plausible values for the population mean will be wider.

Application:

95% CI = Mean ± (1.96 × SEM)

99% CI = Mean ± (2.58 × SEM)
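A minimal sketch of the confidence-interval formula, using hypothetical summary statistics (n = 100, mean 52.3, standard deviation 8.0). It uses the sample standard deviation in place of σ, which is an assumption that is reasonable only for large samples. The function name `confidence_interval` is illustrative.

```python
import math


def confidence_interval(mean: float, sd: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return (lower, upper) = mean -/+ z * SEM, with SEM = sd / sqrt(n)."""
    sem = sd / math.sqrt(n)
    return mean - z * sem, mean + z * sem


# Hypothetical summary statistics: n = 100, mean 52.3, standard deviation 8.0.
ci95 = confidence_interval(52.3, 8.0, 100, z=1.96)
ci99 = confidence_interval(52.3, 8.0, 100, z=2.58)
print(f"95% CI: {ci95[0]:.3f} to {ci95[1]:.3f}")
print(f"99% CI: {ci99[0]:.3f} to {ci99[1]:.3f}")
```

With SEM = 8.0/√100 = 0.8, the 95% interval is 50.732 to 53.868 and the 99% interval is 50.236 to 54.364. The higher confidence level buys a wider interval.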

STANDARD ERROR TABLE

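(Large-sample standard errors. The entries for means and proportions hold generally; the others assume sampling from a normally distributed population with standard deviation $\sigma$.)

SAMPLING DISTRIBUTION : STANDARD ERROR

Means : $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

Proportions : $\sigma_{p} = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{p(1-p)}{n}}$

Standard deviations : $\sigma_{s} = \dfrac{\sigma}{\sqrt{2n}}$

Variances : $\sigma_{s^2} = \sigma^2\sqrt{\dfrac{2}{n}}$

Medians : $\sigma_{\text{med}} = \sigma\sqrt{\dfrac{\pi}{2n}} \approx \dfrac{1.2533\,\sigma}{\sqrt{n}}$

First & third quartiles : $\sigma_{Q_1} = \sigma_{Q_3} \approx \dfrac{1.3626\,\sigma}{\sqrt{n}}$

Semi-interquartile ranges : $\sigma_{Q} \approx \dfrac{0.7867\,\sigma}{\sqrt{n}}$

Coefficients of variation : $\sigma_{V} = \dfrac{v}{\sqrt{2n}}\sqrt{1 + 2v^2}$, where $v = \sigma/\mu$ is the population coefficient of variation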
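Two large-sample standard errors for a normal population can be checked by simulation:

- The mean: $\sigma/\sqrt{n}$.
- The median: $\sigma\sqrt{\pi/(2n)} \approx 1.2533\,\sigma/\sqrt{n}$.

The sketch below is a rough check under assumed values: a standard normal population, n = 101 and 5,000 samples. It is not an exact verification.

```python
import math
import random
from statistics import median, pstdev

N, REPS = 101, 5_000      # hypothetical sample size and number of samples
rng = random.Random(3)    # fixed seed for reproducibility

means, medians = [], []
for _ in range(REPS):
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]   # population: N(0, 1)
    means.append(sum(sample) / N)
    medians.append(median(sample))

se_mean, se_median = pstdev(means), pstdev(medians)
ratio = se_median / se_mean

print(f"SE of mean:   simulated {se_mean:.4f}, formula {1 / math.sqrt(N):.4f}")
print(f"SE of median: simulated {se_median:.4f}, formula {math.sqrt(math.pi / (2 * N)):.4f}")
print(f"ratio median/mean: {ratio:.3f} (large-sample value 1.2533)")
```

For a normal population the median's standard error comes out roughly 25% larger than the mean's. This makes the mean the more efficient estimator of the centre in that case.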

Point & Interval Estimates

There are two kinds of estimates of population parameters from sample statistics:

A point estimate is a single value and an interval estimate is a range of values.


POINT ESTIMATION :

A point estimate of a population parameter is a single value of a statistic.

For example, the sample mean $\bar{x}$ is a point estimate of the population mean $\mu$. Similarly, the sample proportion $p$ is a point estimate of the population proportion $P$.

INTERVAL ESTIMATION :

An interval estimate is defined by two numbers, between which a population parameter is said to lie.

For example, $a < \mu < b$ is an interval estimate of the population mean $\mu$. It indicates that the population mean is greater than $a$ but less than $b$.

In any estimation problem, we need to obtain both a point estimate and an interval estimate. The point estimate is our best guess of the true value of the parameter. The interval estimate measures the accuracy of that point estimate by providing an interval that contains plausible values.

MATHEMATICAL PROBLEMS

Sampling Distribution of Means

Prob. 1: A population consists of the five numbers 2, 3, 6, 8 and 11. Consider all possible samples of size 2 that can be drawn with and without replacement from this population. Find:
a) The mean of the population.
b) The standard deviation of the population.
c) The mean of the sampling distribution of means.
d) The standard deviation of the sampling distribution of means (the standard error of means).

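Answer:

a) Mean of the population: $\mu = \dfrac{2 + 3 + 6 + 8 + 11}{5} = \dfrac{30}{5} = 6$

b) Variance and standard deviation of the population:

$$\sigma^2 = \frac{(2-6)^2 + (3-6)^2 + (6-6)^2 + (8-6)^2 + (11-6)^2}{5} = \frac{16 + 9 + 0 + 4 + 25}{5} = \frac{54}{5} = 10.8$$

so $\sigma = \sqrt{10.8} \approx 3.29$.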

With replacement:

c) There are 5 × 5 = 25 samples of size 2 that can be drawn with replacement. These are:

(2,2) (2,3) (2,6) (2,8) (2,11)
(3,2) (3,3) (3,6) (3,8) (3,11)
(6,2) (6,3) (6,6) (6,8) (6,11)
(8,2) (8,3) (8,6) (8,8) (8,11)
(11,2) (11,3) (11,6) (11,8) (11,11)

The corresponding sample means are:

2.0 2.5 4.0 5.0 6.5
2.5 3.0 4.5 5.5 7.0
4.0 4.5 6.0 7.0 8.5
5.0 5.5 7.0 8.0 9.5
6.5 7.0 8.5 9.5 11.0

and the mean of the sampling distribution of means is

$$\mu_{\bar{x}} = \frac{\text{sum of all 25 sample means}}{25} = \frac{150}{25} = 6.0$$

This illustrates the fact that $\mu_{\bar{x}} = \mu$.

d) The variance of the sampling distribution of means, $\sigma_{\bar{x}}^2$, is obtained in four steps:

1. Subtract the mean 6 from each of the 25 sample means.
2. Square the results.
3. Add all 25 numbers thus obtained.
4. Divide by 25.

$$\sigma_{\bar{x}}^2 = \frac{135}{25} = 5.40, \qquad \sigma_{\bar{x}} = \sqrt{5.40} \approx 2.32$$

This illustrates the fact that $\sigma_{\bar{x}}^2 = \sigma^2/n$ for finite populations involving sampling with replacement (or for infinite populations). The right-hand side is $10.8/2 = 5.40$, agreeing with the above value.

Without replacement:

c) There are 10 samples of size 2 that can be drawn without replacement from the population:

(2,3) (2,6) (2,8) (2,11) (3,6) (3,8) (3,11) (6,8) (6,11) (8,11)

The corresponding sample means are: 2.5, 4.0, 5.0, 6.5, 4.5, 5.5, 7.0, 7.0, 8.5, 9.5.

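The mean of the sampling distribution of means is $\mu_{\bar{x}} = \dfrac{60}{10} = 6.0 = \mu$.

d) The variance of the sampling distribution of means is

$$\sigma_{\bar{x}}^2 = \frac{(2.5-6.0)^2 + (4.0-6.0)^2 + \cdots + (9.5-6.0)^2}{10} = \frac{40.5}{10} = 4.05,$$

and $\sigma_{\bar{x}} = \sqrt{4.05} \approx 2.01$. This illustrates that, for sampling without replacement from a finite population of size $N$,

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = \frac{10.8}{2}\left(\frac{5-2}{5-1}\right) = 4.05,$$

as obtained above.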
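The whole of Prob. 1 can be reproduced by brute-force enumeration. The sketch lists every sample with and without replacement and compares the variance of the sample means with $\sigma^2/n$ and with $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$. It uses only the standard library.

```python
from itertools import combinations, product
from statistics import fmean, pvariance

population = [2, 3, 6, 8, 11]
n, N = 2, len(population)

mu = fmean(population)            # population mean
sigma2 = pvariance(population)    # population variance (divisor N)

# With replacement: all 5 * 5 = 25 ordered samples.
means_with = [fmean(s) for s in product(population, repeat=n)]
# Without replacement: all C(5, 2) = 10 unordered samples.
means_without = [fmean(s) for s in combinations(population, n)]

fpc = (N - n) / (N - 1)           # finite population correction factor

print(f"population: mean {mu:.2f}, variance {sigma2:.2f}")
print(f"with replacement:    {len(means_with)} samples, mean {fmean(means_with):.2f}, "
      f"variance {pvariance(means_with):.2f} vs sigma^2/n = {sigma2 / n:.2f}")
print(f"without replacement: {len(means_without)} samples, mean {fmean(means_without):.2f}, "
      f"variance {pvariance(means_without):.2f} vs sigma^2/n * fpc = {sigma2 / n * fpc:.2f}")
```

The printed values match the hand calculation:

- Population: mean 6.00, variance 10.80.
- With replacement: mean 6.00, variance 5.40.
- Without replacement: mean 6.00, variance 4.05.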

SAMPLING DISTRIBUTION OF PROPORTIONS

Prob. 2: Find the probability that in 120 tosses of a fair coin:
a) between 40% and 60% will be heads, and
b) 5/8 or more will be heads.

Answer: We consider the 120 tosses of the coin to be a sample from the infinite population of all possible tosses of the coin. In this population the probability of heads is $p = 1/2$ and the probability of tails is $q = 1 - p = 1/2$.

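a) The sampling distribution of proportions has mean $\mu_P = p = 0.50$ and standard error

$$\sigma_P = \sqrt{\frac{pq}{N}} = \sqrt{\frac{(1/2)(1/2)}{120}} = 0.0456.$$

40% in standard units $= \dfrac{0.40 - 0.50}{0.0456} = -2.19$; 60% in standard units $= \dfrac{0.60 - 0.50}{0.0456} = 2.19$.

Required probability = (area under the normal curve between $z = -2.19$ and $z = 2.19$) = 2(0.4857) = 0.9714.

Although this result is accurate to two significant figures, it is not exact. The proportion is actually a discrete variable, and we have not used that fact.

To account for it, we subtract $1/(2N) = 1/240$ from 0.40 and add $1/(2N) = 1/240$ to 0.60. Since $1/240 = 0.00417$, the required proportions in standard units are

$$\frac{0.40 - 0.00417 - 0.50}{0.0456} = -2.28 \quad\text{and}\quad \frac{0.60 + 0.00417 - 0.50}{0.0456} = 2.28,$$

so the required probability is 2(0.4887) = 0.9774.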

b) Using the results of (a), and since 5/8 = 0.6250, the continuity-corrected value in standard units is $\dfrac{0.6250 - 0.00417 - 0.50}{0.0456} = 2.65$.

Required probability = ( area under normal curve to right of z=2.65 ) =(area to right of z = 0) – (area between z=0 and z= 2.65 ) = 0.5 – 0.4960 =0.0040 .
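The hand calculation can be reproduced with the standard library's `statistics.NormalDist`. The sketch also compares it with exact binomial probabilities computed from `math.comb`. That comparison is an addition for illustration, not part of the original solution.

```python
from math import comb, sqrt
from statistics import NormalDist

N, p = 120, 0.5
se = sqrt(p * (1 - p) / N)      # standard error of the proportion, about 0.0456
cc = 1 / (2 * N)                # continuity correction, 1/240
z = NormalDist()                # standard normal distribution

# (a) P(0.40 <= proportion <= 0.60), normal approximation with correction.
approx_a = z.cdf((0.60 + cc - p) / se) - z.cdf((0.40 - cc - p) / se)
# (b) P(proportion >= 5/8), normal approximation with correction.
approx_b = 1 - z.cdf((5 / 8 - cc - p) / se)

# Exact binomial answers for comparison: 40% to 60% of 120 is 48 to 72 heads,
# and 5/8 of 120 is 75 heads.
exact_a = sum(comb(N, k) for k in range(48, 73)) / 2**N
exact_b = sum(comb(N, k) for k in range(75, N + 1)) / 2**N

print(f"(a) normal {approx_a:.4f}, exact binomial {exact_a:.4f}")
print(f"(b) normal {approx_b:.4f}, exact binomial {exact_b:.4f}")
```

The normal answers reproduce the hand calculation (about 0.977 and 0.004), and the exact binomial sums land close to them.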

REFERENCES :

1. Walsh, Anthony. Statistics for the Social Sciences with Computer Applications.
2. Spiegel, Murray R. Schaum's Outline of Theory and Problems of Statistics.
3. Gupta, S. P. and Gupta, M. P. Business Statistics.
4. Loether, Herman J. and McTavish, Donald G. Descriptive and Inferential Statistics: An Introduction.
