Sampling distribution concepts
Uploaded by umar-sheikh
UNIT-V
Population & Sample
Population
Population in statistics means the whole of the information that comes under the purview of a statistical investigation.
It is the totality of all the observations of a statistical inquiry.
It is also known as the “UNIVERSE”.
A population may be finite or infinite.
Sample
A part of the population selected for study is called a SAMPLE.
Hence, a sample is a group of items selected from a population in such a way that the group represents the population.
The number of individuals included in the sample is called the SIZE OF THE SAMPLE.
Parameter & Statistic
Parameter
Any statistical measure (such as mean, mode, S.D.) computed from population data is known as a PARAMETER.
Statistic
Any statistical measure computed from sample data is known as a STATISTIC.
A STATISTIC computed from a sample drawn from the parent population plays an important role in
A) The Theory of Estimation
B) Testing of Hypothesis
Notations used
Statistical Measure    Population    Sample
Mean                   µ             x̄
Standard deviation     σ             s
Size                   N             n
Sampling & Sampling Theory
Sampling
It is the process of selecting a sample from the population.
Sampling can also be defined as the process of drawing a sample from the population & computing a suitable statistic in order to estimate the parameter of the parent population & to test the significance of the statistic computed from such a sample.
Sampling theory
Sampling theory is based on sampling.
It deals with statistical inferences drawn from sampling results, which are of three types:
i. Statistical Estimation,
ii. Tests of significance, and
iii. Statistical inference
Objects of Sampling theory
To estimate the population parameter on the basis of the sample statistic.
To set the limits of accuracy & degree of confidence of the estimates of the population parameter computed on the basis of the sample statistic.
To test significance about the population characteristic on the basis of the sample statistic.
Methods of Sampling
Random (Probability) Sampling:
Simple Random sampling
Stratified Sampling
Systematic Sampling
Multi-stage Sampling
Non-random Sampling:
Judgment Sampling
Quota Sampling
Convenience Sampling
Random Sampling Methods
Simple Random sampling
This method refers to the sampling technique in which each and every item of the population is given an equal chance of being included in the sample;
The selection is free from personal bias;
This method is also known as the method of chance selection.
It is sometimes also referred to as “representative sampling” (if the sample is chosen at random and the size of the sample is sufficiently large, it will represent all groups in the population).
It is a probability sampling method because every item of the population has an equal opportunity of being selected in the sample;
Methods of obtaining a Simple Random Sample:
1. Lottery method
2. Table of random numbers (a number of random number tables are available, such as Tippett's table, Fisher and Yates numbers, and Kendall and Babington Smith numbers)
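As an illustrative sketch (not part of the original slides), the lottery method can be mimicked in Python, with a pseudo-random number generator standing in for a table of random numbers. The function name simple_random_sample and the population of 1000 numbered students are assumptions made for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every item has an equal chance of selection."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(list(population), n)

# Hypothetical population: students numbered 1 to 1000
students = range(1, 1001)
sample = simple_random_sample(students, 200, seed=42)
print(len(sample), len(set(sample)))   # sample size and number of distinct items
```

Sampling is done without replacement here, so no student can be drawn twice.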
Stratified Sampling
It is one of the restricted random methods which, by using available information concerning the data, attempts to design a more efficient sample than that obtained by the simple random procedure;
The process of stratification requires that the population be divided into homogeneous groups or classes called strata;
Then a sample is taken from each group by the simple random method;
The resulting sample is called a stratified sample.
A stratified sample may be either proportional or disproportionate.
In a proportional stratified sampling plan, the number of items drawn from each stratum is proportional to the size of the stratum.
For example, if the population is divided into 4 strata, their respective sizes being 15, 10, 20 and 55% of the population, and a sample of 1000 is to be drawn, the desired proportional sample may be obtained in the following manner:
From stratum one      1000 (0.15) = 150 items
From stratum two      1000 (0.10) = 100 items
From stratum three    1000 (0.20) = 200 items
From stratum four     1000 (0.55) = 550 items
Sample size                         1000 items
Disproportionate stratified sampling includes procedures such as taking an equal number of items from each stratum irrespective of its size.
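The allocation in the example above can be checked with a short Python sketch. The function names proportional_allocation and equal_allocation are hypothetical helpers introduced for illustration; the shares and the sample size are taken from the example.

```python
def proportional_allocation(stratum_shares, sample_size):
    """Items per stratum, proportional to each stratum's share of the population."""
    # Rounding keeps the counts whole; for awkward shares the rounded
    # counts may not add up exactly to sample_size.
    return [round(sample_size * share) for share in stratum_shares]

def equal_allocation(number_of_strata, sample_size):
    """One disproportionate plan: the same number of items from every stratum."""
    return [sample_size // number_of_strata] * number_of_strata

shares = [0.15, 0.10, 0.20, 0.55]
proportional = proportional_allocation(shares, 1000)
equal = equal_allocation(len(shares), 1000)
print(proportional)  # [150, 100, 200, 550]
print(equal)         # [250, 250, 250, 250]
```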
Systematic Sampling
This method is popularly used in cases where a complete list of the population from which the sample is to be drawn is available;
The method is to select every kth item from the list, where ‘k’ refers to the sampling interval;
k = size of population / sample size (N/n);
The starting point between the first & the kth item is selected at random.
For example, if a complete list of 1000 students is available and we want to draw a sample of 200 students, we must take every 5th item (k = 1000/200 = 5).
But the first item, between one and five, shall be selected at random.
Let it be three; we then go on adding 5 & obtain the numbers of the desired sample: 3, 8, 13, 18 and so on.
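The student example can be sketched in Python as follows. The function name systematic_sample is an assumption made for illustration, and the sketch assumes that N is an exact multiple of n.

```python
import random

def systematic_sample(population, n, start=None, seed=None):
    """Select every k-th item, where k = N / n, from a random start in 1..k."""
    N = len(population)
    k = N // n                                   # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)
    # Positions are 1-based: start, start + k, start + 2k, ...
    return [population[pos - 1] for pos in range(start, start + k * n, k)]

students = list(range(1, 1001))                  # list of 1000 students
sample = systematic_sample(students, 200, start=3)
print(sample[:4], sample[-1], len(sample))       # [3, 8, 13, 18] 998 200
```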
Cluster Sampling
It differs from stratified sampling in that each stratum consists of homogeneous items, whereas clusters are mutually exclusive groups whose items are not exactly homogeneous;
Multi-stage sampling is a type of cluster sampling;
Multi-stage Sampling
As the name suggests this method refers to a sampling procedure which is carried out in several stages;
The material is regarded as made up of a number of first stage sampling units, each made up of a number of second stage units;
At first, the first stage units are sampled by some suitable method such as random sampling; then a sample of second stage units is selected from each of the selected first stage units, again by some suitable method, which may be the same as or different from the method employed for the first stage units.
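A two-stage version of this procedure can be sketched in Python. Everything named here is hypothetical: the function two_stage_sample, the ten villages used as first stage units, and the fifty households in each village used as second stage units. Simple random sampling is used at both stages, which is only one of the choices the method allows.

```python
import random

def two_stage_sample(units, n_first, n_second, seed=None):
    """Pick n_first first-stage units, then n_second items inside each of them."""
    rng = random.Random(seed)
    chosen_first = rng.sample(sorted(units), n_first)          # stage 1
    return {u: rng.sample(units[u], n_second) for u in chosen_first}  # stage 2

# Hypothetical population: 10 villages with 50 households each
villages = {
    f"village_{i}": [f"v{i}_house_{j}" for j in range(1, 51)]
    for i in range(1, 11)
}
sample = two_stage_sample(villages, n_first=3, n_second=5, seed=7)
for village, households in sample.items():
    print(village, households)
```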
Non-Random Sampling Methods
Judgment Sampling
In this method of sampling the choice of sample items depends exclusively on the judgment of the investigator;
This method, though simple, is not scientific;
This method is used in solving many types of economic & business problems, such as
i. When the sample size is small;
ii. When an estimate must be made available quickly;
Quota Sampling
It is a type of judgment sampling;
In a quota sample, quotas are set up according to given criteria but within quotas the selection of sample items depends on personal judgment.
Convenience Sampling
It is also known as the chunk; a chunk is a fraction of a population taken for investigation because of its convenient availability;
Hence a chunk is selected neither by probability nor by judgment but by convenience;
Convenience samples are sometimes called accidental samples because those entering into the sample enter by ‘accident’;
Errors in Sampling
Errors in sampling are discrepancies between a statistical measure of the population (parameter) & the same measure of a sample drawn from that population (statistic).
Sampling Errors
These are of two types:
a. Biased errors arise due to any bias in selection, estimation, etc.
b. Unbiased errors arise due to chance factors.
They occur primarily due to the following reasons:
1. Faulty selection of the sample
2. Substitution
Non-Sampling Errors
These may arise in the following ways:
1. Due to negligence & carelessness on the part of the investigator;
2. Due to incomplete investigation & sample survey;
3. Due to negligence & non-response on the part of the respondents;
4. Errors in data processing.
Principles of Sampling
Principle of “Statistical Regularity”: This principle lays down that a moderately large number of items chosen at random from a large group are almost sure on an average to possess the characteristics of the large group.
Principle of “Inertia of Large Numbers”: this principle is a corollary of the above principle.
It states that, other things being equal, the larger the size of the sample, the more accurate the results are likely to be.
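The two principles can be illustrated with a small simulation. This is only a sketch under assumed conditions: the population is the set of rolls of a fair six-faced die (true mean 3.5), and the sample sizes and the seed are arbitrary choices.

```python
import random

def mean_of_die_rolls(n, rng):
    """Mean of n simulated rolls of a fair six-faced die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(0)      # fixed seed so that the run is repeatable
true_mean = 3.5
errors = {}
for n in (10, 100, 10_000, 100_000):
    errors[n] = abs(mean_of_die_rolls(n, rng) - true_mean)
    print(n, round(errors[n], 4))
```

The error of a single run need not shrink at every step, but for large n the sample mean is very likely to lie close to 3.5.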
Theory of Estimation
Statistical estimation is the procedure of using a sample statistic to estimate a population parameter.
A statistic used to estimate a parameter is called an estimator, and
the value taken by the estimator is called an estimate.
For example, the sample mean is an estimator of the population mean, and its observed value (say 7.65) is an estimate.
Statistical estimation is divided into two major categories:
Point Estimation
In point estimation, a single statistic is used to provide an estimate of the population parameter;
A change in the sample will cause a deviation in the estimate;
Interval Estimation
An interval estimate is a range of values within which a researcher can say with some confidence that the population parameter falls;
This range is called the confidence interval;
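The difference between the two kinds of estimate can be sketched in Python. The point estimate 7.65 comes from the example above; the population standard deviation (1.2), the sample size (36) and the function name z_confidence_interval are assumptions made for illustration, and the interval assumes a normally distributed sample mean with known σ.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval estimate of the population mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

point_estimate = 7.65                    # the single-value (point) estimate
low, high = z_confidence_interval(point_estimate, sigma=1.2, n=36)
print(point_estimate, round(low, 3), round(high, 3))
```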
Qualities of a good estimator
A good estimator is one which is as close to the true value of the parameter as possible.
A good estimator must possess the following characteristics:
i. Unbiasedness
ii. Consistency
iii. Efficiency and
iv. Sufficiency
Unbiasedness: this is a desirable property for a good estimator to have; an estimator is unbiased when the mean of its sampling distribution equals the parameter being estimated. For example, the sample mean is an unbiased estimator of the population mean because the mean of the sampling distribution of sample means taken from the same population is equal to the population mean itself;
Efficiency: it refers to the size of the standard error of the statistic; if two statistics computed from a sample of the same size are compared to decide which is the better estimator, the statistic that has the smaller standard error (the standard deviation of its sampling distribution) will be selected.
Consistency: a statistic is a consistent estimator if, as the sample size increases, it becomes almost certain that the value of the statistic comes very close to the value of the population parameter;
Sufficiency: an estimator is sufficient if it makes so much use of the information in the sample that no other estimator could extract from the sample additional information about the population parameter being estimated;
Hypothesis Testing
Hypothesis testing is based on hypothesis;
“Hypothesis” is an assumption about an unknown population parameter;
Hypothesis testing is a well defined procedure which helps in deciding objectively whether to accept or reject the hypothesis based on the information available from the sample;
Hypothesis Testing Procedure
STEP 1: SET NULL & ALTERNATIVE HYPOTHESIS
The assumption which we want to test is called the NULL hypothesis; it is symbolized as Ho;
The null hypothesis is set with no difference (i.e. status quo) & is considered true unless and until it is proved false by the collected sample data;
Example: Ho: µ = 500, “the null hypothesis is that the population mean is equal to 500”.
The alternative hypothesis, generally denoted by H1 or Ha, is the logical opposite of the null hypothesis;
H1: µ ≠ 500 (or H1: µ > 500; or H1: µ < 500)
In other words, when the null hypothesis is found to be true, the alternative hypothesis must be false, and vice versa;
Rejection of the null hypothesis indicates that the difference is statistically significant & acceptance of the null hypothesis indicates that the difference is due to chance;
STEP 2: SET UP A SUITABLE SIGNIFICANCE LEVEL
The level of significance, generally denoted by ‘α’, is the probability of rejecting the null hypothesis when it is true;
The level of significance is also known as the size of the rejection region or size of the critical region;
It is generally specified before any samples are drawn, so that the results obtained will not influence the direction to be taken;
Any level of significance can be adopted; in practice we take either the 5% or the 1% level of significance;
When we take the 5% level of significance, there are about 5 chances out of 100 that we would reject the null hypothesis when it should be accepted, i.e. we are about 95% confident that we have made the right decision;
When the null hypothesis is rejected at α = 0.05, the test result is said to be significant;
When the null hypothesis is rejected at α = 0.01, the test result is said to be highly significant;
STEP 3: DETERMINATION OF A SUITABLE TEST STATISTIC
Many of the test statistics that we shall encounter will have the following form:
$$\text{Test statistic} = \frac{\text{Sample statistic} - \text{Hypothesized population parameter}}{\text{Standard error of the sample statistic}}$$
STEP 4: SET THE DECISION RULE
The next step for the researcher is to establish a critical region;
Acceptance region: the region in which the null hypothesis is accepted;
Rejection region: the region in which the null hypothesis is rejected;
STEP 5: COLLECT THE SAMPLE DATA
Data is now collected;
Appropriate sample statistics are computed;
STEP 6: ANALYSE THE DATA
This involves selection of an appropriate probability distribution for a particular test;
For example, when the sample is small (n < 30) the normal probability distribution (Z) is not an accurate choice; the t distribution needs to be used in this case;
Some commonly used testing procedures are Z, t, F & Chi-square.
STEP 7: ARRIVE AT A STATISTICAL CONCLUSION & BUSINESS IMPLICATION
A statistical conclusion is a decision to accept or reject the null hypothesis;
This depends on whether the computed test statistic falls in the acceptance region or the rejection region;
Types of Errors in Hypothesis Testing
Decision      Condition: Ho true     Condition: Ho false
Accept Ho     Correct decision       Type II error (β)
Reject Ho     Type I error (α)       Correct decision
Z-test
Hypothesis testing for large samples, i.e. n >= 30;
Based on the assumption that the population from which the sample is drawn has a normal distribution;
As a result, the sampling distribution of the mean is also normally distributed;
Application:
1. For testing hypothesis about a single population mean;
2. Hypothesis testing for the difference between two population means;
3. Hypothesis testing for attributes.
Formula for single population mean (infinite or very large population)
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Where,
µ = population mean
x̄ = sample mean
σ = population standard deviation
n = sample size
Q: A marketing research firm conducted a survey 10 years ago & found that the average household income of a particular geographic region is Rs 10000. Mr. Gupta, who recently joined the firm as a VP, expresses doubts about this figure. To verify the data, the firm decides to take a random sample of 200 households, which yields a sample mean of Rs 11000. Assume that the population S.D. is Rs 1200. Verify Mr. Gupta's doubts using α = 0.05.
Step 1: Set null & alternative hypothesis
Ho: µ = 10000
H1: µ ≠ 10000
Step 2: Determine the appropriate statistical test
Since the sample size is >= 30, the z-test can be used for hypothesis testing.
Step 3: Set the level of significance
The level of significance is known (α = 0.05).
Step 4: Set the decision rule
The acceptance region covers 95% of the area & the rejection region 5%.
The critical values can be read from the table (±1.96).
Step 5: Collect the sample data
A sample of 200 respondents yields a sample mean of Rs 11000.
Step 6: Analyze the data
n = 200, µ = 10000, x̄ = 11000, σ = 1200
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{11000 - 10000}{1200 / \sqrt{200}} = 11.79$$
Step 7: Arrive at a statistical conclusion & business implication
The Z value is 11.79, which is greater than +1.96; hence the null hypothesis is rejected and the alternative hypothesis is accepted. Hence Mr. Gupta's doubt about household income was right.
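The arithmetic of this worked example can be reproduced with a short Python sketch. The function name one_sample_z is an assumption; the critical value is obtained from the standard normal distribution instead of being read from a table.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sigma, n):
    """Z statistic for a single population mean with known sigma."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

alpha = 0.05
z = one_sample_z(sample_mean=11000, mu0=10000, sigma=1200, n=200)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # two-tailed critical value
reject_h0 = abs(z) > z_crit
print(round(z, 2), round(z_crit, 2), reject_h0)  # 11.79 1.96 True
```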
Formula for single population mean (finite population, using the finite population correction)
$$Z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}$$
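The effect of the correction factor √((N − n)/(N − 1)) can be seen in a small sketch. The sample figures are those of the household income example; the population size N = 2000 and the function name z_with_fpc are purely hypothetical.

```python
import math

def z_with_fpc(sample_mean, mu0, sigma, n, N):
    """Z statistic with the standard error shrunk by sqrt((N - n) / (N - 1))."""
    fpc = math.sqrt((N - n) / (N - 1))
    standard_error = sigma / math.sqrt(n) * fpc
    return (sample_mean - mu0) / standard_error

z_plain = (11000 - 10000) / (1200 / math.sqrt(200))   # no correction
z_fpc = z_with_fpc(11000, 10000, 1200, n=200, N=2000)  # hypothetical N
print(round(z_plain, 2), round(z_fpc, 2))
```

Because the factor is less than 1 whenever n > 1, the corrected standard error is smaller and the Z value larger than without the correction.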
When the population standard deviation is not known:
$$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
where s = sample standard deviation
Hypothesis testing for the difference between two population means
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
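A minimal Python sketch of the two-sample statistic follows. All the numbers are made up for illustration, and the function name two_sample_z is an assumption; the hypothesized difference µ1 − µ2 is taken as zero.

```python
import math

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """Z statistic for the difference between two population means."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error

# Hypothetical summary figures for two large samples
z = two_sample_z(mean1=52, mean2=50, sigma1=4, sigma2=3, n1=64, n2=36)
print(round(z, 2), abs(z) > 1.96)   # compare with the 5% two-tailed value
```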
Hypothesis testing for attributes
$$Z = \frac{x - \mu}{\sqrt{npq}}$$
Where,
n = sample size
µ = np
p = probability of happening
q = probability of not happening
Q: In 600 throws of a 6-faced die, odd points appeared 360 times. Would you say that the die is fair at the 5% level of significance?
Ho: the die is fair, so p = q = ½; n = 600, np = 300, x = 360
$$Z = \frac{x - np}{\sqrt{npq}} = \frac{360 - 300}{\sqrt{600 \times \frac{1}{2} \times \frac{1}{2}}} = 4.9$$
Z is greater than 1.96 (at 5%), so Ho is rejected. Hence, the die is not fair.
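The die example can be verified with a few lines of Python. The function name z_for_attribute is an assumption made for illustration.

```python
import math

def z_for_attribute(x, n, p):
    """Z statistic for an observed count x of successes in n trials."""
    q = 1 - p                         # chance of not happening
    return (x - n * p) / math.sqrt(n * p * q)

z = z_for_attribute(x=360, n=600, p=0.5)
reject_h0 = abs(z) > 1.96             # 5% level, two-tailed
print(round(z, 1), reject_h0)         # 4.9 True
```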
t-test
Given by W.S. Gosset in 1908 under the pen name “Student”; hence it is known as Student's t-test;
The t-test can be applied when:
1. A researcher draws a small random sample (n < 30) to estimate the population mean (µ);
2. The population standard deviation (σ) is unknown;
3. The population is normally distributed.
Application of t-test
Hypothesis testing for single population mean;
Hypothesis testing for the difference between two independent population means;
Hypothesis testing for the difference between two dependent population means;
Hypothesis testing for single population mean
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
with degrees of freedom (n − 1)
Where,
µ = population mean
x̄ = sample mean
s = sample standard deviation
n = sample size
Q: Royal Tyre has launched a new brand of tyres for tractors & claims that under normal circumstances the average life of the tyres is 40000 km. A retailer wants to test this claim & has taken a random sample of 8 tyres. He tests the life of the tyres under normal circumstances. The results obtained are:
Tyre    1      2      3      4      5      6      7      8
Km      35000  38000  42000  41000  39000  41500  43000  38500
Use α = 0.05 for testing the hypothesis.
Step 1: Set null & alternative hypothesis
Null hypothesis: Ho: µ = 40000
Alternative hypothesis: H1: µ ≠ 40000
Step 2: Determine the appropriate statistical test
The sample size is less than 30, so the t-test will be an appropriate test.
Step 3: Set the level of significance
The level of significance is α = 0.05.
Step 4: Set the decision rule
The t distribution value for a two-tailed test is t0.025 = 2.365 for 7 degrees of freedom. So if the computed t value is outside the ±2.365 range, the null hypothesis will be rejected; otherwise it will be accepted.
Step 5: Collect the sample data
The lives (in km) of the 8 sampled tyres are those given in the question.
Step 6: Analyze the data
x̄ = 39750; µ = 40000; s = 2618.61; n = 8; df = n − 1 = 7; table value of t0.025,7 = 2.365
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{39750 - 40000}{2618.61 / \sqrt{8}} = -0.27$$
Step 7: Arrive at a statistical conclusion & business implication
The observed t value is −0.27, which falls within the acceptance region; hence the null hypothesis is accepted, i.e. Ho: µ = 40000. The sample gives no evidence against the manufacturer's claim.
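The figures of the tyre example can be reproduced with the Python standard library. The variable names are illustrative; the critical value 2.365 is the table value quoted in the example, not something computed by the code.

```python
import math
import statistics

lives_km = [35000, 38000, 42000, 41000, 39000, 41500, 43000, 38500]
mu0 = 40000                                  # claimed average life

n = len(lives_km)
x_bar = statistics.mean(lives_km)            # sample mean
s = statistics.stdev(lives_km)               # sample S.D. with divisor (n - 1)
t = (x_bar - mu0) / (s / math.sqrt(n))

t_crit = 2.365                               # table value t(0.025, 7)
reject_h0 = abs(t) > t_crit
print(round(s, 2), round(t, 2), reject_h0)   # 2618.61 -0.27 False
```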
Hypothesis testing for the difference between two independent population means
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
σ can be estimated by pooling the two sample variances & computing the pooled standard deviation:
$$\hat{\sigma} = s_{pooled} = \sqrt{\frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$$
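The pooled standard deviation and the resulting t value can be sketched as follows. The summary figures are hypothetical and the function names pooled_sd and two_sample_t are assumptions; the hypothesized difference µ1 − µ2 is taken as zero.

```python
import math

def pooled_sd(s1, s2, n1, n2):
    """Pooled standard deviation of two independent samples."""
    return math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / (n1 + n2 - 2))

def two_sample_t(mean1, mean2, s1, s2, n1, n2, hypothesized_diff=0.0):
    """t statistic and degrees of freedom for two independent means."""
    sp = pooled_sd(s1, s2, n1, n2)
    t = ((mean1 - mean2) - hypothesized_diff) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical summary figures for two small samples
sp = pooled_sd(s1=4, s2=5, n1=10, n2=12)
t, df = two_sample_t(mean1=25, mean2=22, s1=4, s2=5, n1=10, n2=12)
print(round(sp, 3), round(t, 3), df)
```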
F-test
It is named after R.A. Fisher, who studied the distribution in the 1920s; G.W. Snedecor named it “F” in his honour in 1934;
This distribution is usually defined in terms of the ratio of the variances of two normally distributed populations;
The quantity
$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$
is distributed as an F-distribution with (n1 − 1) & (n2 − 1) degrees of freedom,
where
$$s_1^2 = \frac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1}$$
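A minimal sketch of the F ratio in Python is given below. The two samples are made-up data and the function name f_ratio is an assumption; under the usual null hypothesis of equal population variances the ratio σ1²/σ2² equals 1, so F reduces to s1²/s2².

```python
import statistics

def f_ratio(sample1, sample2, variance_ratio=1.0):
    """F = (s1^2 / sigma1^2) / (s2^2 / sigma2^2), with variance_ratio = sigma1^2 / sigma2^2."""
    s1_sq = statistics.variance(sample1)     # divisor (n1 - 1)
    s2_sq = statistics.variance(sample2)     # divisor (n2 - 1)
    f = (s1_sq / s2_sq) / variance_ratio
    return f, (len(sample1) - 1, len(sample2) - 1)

# Hypothetical samples
sample_a = [20, 22, 24, 26, 28]
sample_b = [21, 22, 23, 24, 25]
f, dfs = f_ratio(sample_a, sample_b)
print(f, dfs)
```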
Chi Square test
Chi-square is used with categorical data (counts of frequencies from one or more variables);
Some researchers place chi-square in the category of non-parametric tests;
The χ² test was developed by Karl Pearson in 1900;
The symbol χ is the Greek letter “chi”;
The χ² distribution is a function of its degrees of freedom;
Being a sum of squared quantities, χ² can never take a negative value;
χ² is a continuous probability distribution with range zero to infinity;
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
with df = (r − 1)(c − 1), where O is the observed frequency, E is the expected frequency, r is the number of rows and c is the number of columns, and
$$E = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
Decision rule
If χ² calculated > χ² critical, reject the null hypothesis;
If χ² calculated < χ² critical, accept the null hypothesis;
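The calculation and the decision rule can be sketched in Python for a contingency table. The 2 x 2 table of observed counts is hypothetical and the function name chi_square_statistic is an assumption; 3.841 is the standard table value for 1 degree of freedom at the 5% level.

```python
def chi_square_statistic(observed):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # expected count
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical observed counts (100 observations in original units)
observed = [[30, 20],
            [20, 30]]
chi2, df = chi_square_statistic(observed)
chi2_critical = 3.841                        # table value, df = 1, 5% level
print(chi2, df, chi2 > chi2_critical)        # 4.0 1 True
```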
Conditions to apply the chi-square test
Data should not be in percentages or ratios; rather they should be expressed in original units;
The sample should consist of at least 50 observations & should be drawn randomly, & the individual observations in a sample should be independent of each other;