Sampling theory

2. Sampling Theory
Two ways of collecting statistical data:
1. Complete Enumeration (or Census)
2. Sample Survey
Population (or Universe): The totality of the statistical data forming the subject of an investigation.
Sample: The portion of the population which is examined with a view to estimating the characteristics of the population.

3. Methods of Sampling
1. Simple Random Sampling
   a) Simple Random Sampling without replacement
   b) Simple Random Sampling with replacement
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Quota Sampling
6. Purposive Sampling (or Judgement Sampling)

4. Some Important Terms Associated with Sampling
Parameter: A characteristic of a population, based on all the units of the population.
Statistic: A statistical measure of sample observations; as such, it is a function of the sample observations.
Statistical inferences are drawn about population values (parameters) on the basis of sample observations (statistics). The following notation is usual:

Measure               Parameter   Statistic
Mean                  μ           x̄
Proportion            P           p
Standard deviation    σ           s

5. Sampling Distribution
Starting with a population of N units, we can draw many samples of a fixed size n. In sampling with replacement, the total number of samples that can be drawn is N^n; in sampling without replacement, it is C(N, n). If we can obtain the values of a statistic t from all possible samples of a fixed size, together with the corresponding probabilities, then we can arrange these values (treating the statistic as a random variable) in the form of a probability distribution. Such a probability distribution is called a Sampling Distribution.

6. Basic Statistical Laws
1. Law of Statistical Regularity: A reasonably large number of items selected at random from a large group of items will, on the average, represent the characteristics of the group.
2.
Law of Inertia of Large Numbers: Large groups of data show a high degree of stability, because there is a greater possibility that extreme values on one side are compensated by extremes on the other side.
3. Central Limit Theorem: If x1, x2, x3, …, xn is a random sample of size n drawn from any population (having mean μ and variance σ²), then the distribution of the sample mean x̄ is approximately normal with mean μ and variance σ²/n, provided n is sufficiently large (n → ∞), where μ and σ² are the population mean and variance respectively.

7. The mean of a statistic t is called its Expectation, and the standard deviation of t is called its Standard Error.
Standard Errors (S.E.) of common statistics:

Statistic                                 S.E.
1. Single mean (x̄)                       σ/√n
2. Difference of means (x̄1 − x̄2)        √(σ1²/n1 + σ2²/n2)
3. Single proportion (p)                  √(PQ/n)
4. Difference of proportions (p1 − p2)    √(PQ(1/n1 + 1/n2))

The factor √((N − n)/(N − 1)) is known as the finite population correction (fpc) factor. It is ignored for a large population, and used when n/N is greater than 0.05.

8. Examples:
1. A simple random sample of size 36 is drawn from a finite population consisting of 101 units. If the population S.D. is 12.6, find the standard error of the sample mean when the sample is drawn (a) with replacement, (b) without replacement. [Ans: a) 2.1 b) 1.69]
2. A random sample of 500 oranges was taken from a large consignment and 65 were found to be defective. Show that the S.E. of the proportion of bad ones in a sample of this size is 0.015.

9. Theory of Estimation
Point Estimation: When a single sample value (t) is used to estimate a parameter (θ), this is called point estimation.
Interval Estimation: Instead of estimating the parameter by a single value, an interval of values is defined. It specifies two values that contain the unknown parameter, i.e. P(t1 ≤ θ ≤ t2) = 1 − α. Then [t1, t2] is called the confidence interval. α is called the level of significance, e.g. 5% or 1% l.o.s. 1 − α is called the confidence level, e.g.
95% or 99%.
Confidence Level: The confidence level is the probability value associated with a confidence interval. It is often expressed as a percentage. For example, if α = 0.05, then the confidence level is equal to 1 − 0.05 = 0.95, i.e. a 95% confidence level.

10. Determination of Sample Size for a Mean
The following factors must be known:
i) The desired confidence level.
ii) The permissible sampling error E = x̄ − μ.
iii) The standard deviation σ.
The required sample size n is given by n = (Zσ/E)².

11. Determination of Sample Size for a Proportion
The following factors must be known:
i) The desired confidence level.
ii) The permissible sampling error E = P − p.
iii) The estimated true proportion of success p.
The required sample size n is given by n = Z²pq/E², where q = 1 − p.

12. Problems:
1. It is known that the population standard deviation of the waiting time for an L.P.G. gas cylinder in Delhi is 15 days. How large a sample should be chosen to be 95% confident that the sample mean waiting time is within 7 days of the true average? [Ans: 18]
2. A manufacturing concern wants to estimate the average amount of monthly purchase of its product by customers, whose standard deviation is Rs. 10. Find the sample size if the maximum error is not to exceed Rs. 3 with a probability of 0.99. [Ans: 74]
3. The business manager of a large company wants to check the inventory records against the physical inventories by a sample survey. He wants to be almost assured (take Z = 3) that the maximum sampling error should not be more than 5% above or below the true proportion of accurate records. The proportion of accurate records is estimated as 35% from past experience. Determine the sample size. [Ans: 819]

13. Standard Deviation and Confidence Intervals

14. If t is a statistic, then
the 95% confidence interval is given by [t ± 1.96 S.E. of t],
the 99% confidence interval is given by [t ± 2.58 S.E. of t].
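The standard errors and sample sizes worked out above can be checked with a short script (the helper function names are mine, not from the slides):

```python
import math

def se_mean(sigma, n, N=None):
    """S.E. of the sample mean, sigma/sqrt(n); applies the fpc factor
    sqrt((N - n)/(N - 1)) when a finite population size N is given."""
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

def se_proportion(p, n):
    """S.E. of a sample proportion, sqrt(pq/n) with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

def sample_size_mean(z, sigma, E):
    """n = (Z*sigma/E)^2, rounded up to the next whole unit."""
    return math.ceil((z * sigma / E) ** 2)

def sample_size_proportion(z, p, E):
    """n = Z^2 * p * q / E^2, rounded up to the next whole unit."""
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

# Example 1: sigma = 12.6, n = 36, population of N = 101 units
print(round(se_mean(12.6, 36), 2))           # 2.1  (with replacement)
print(round(se_mean(12.6, 36, N=101), 2))    # 1.69 (without replacement)

# Example 2: 65 defective oranges out of 500
print(round(se_proportion(65 / 500, 500), 3))  # 0.015

# Problems 1-3: Z = 1.96 (95%), Z = 2.58 (99%), Z = 3 ("almost assured")
print(sample_size_mean(1.96, 15, 7))          # 18
print(sample_size_mean(2.58, 10, 3))          # 74
print(sample_size_proportion(3, 0.35, 0.05))  # 819
```

Rounding the raw values 17.64, 73.96 and 819 up to whole units reproduces the answers given on the slide.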
15. There are five ingredients to any statistical test:
(a) Null Hypothesis (H0)
(b) Alternative Hypothesis (H1)
(c) Test Statistic
(d) Rejection/Critical Region, or Acceptance of H0
(e) Conclusion

16. Null Hypothesis
H0: there is no significant difference between the two values (i.e. between a statistic and a parameter, or between two sample values).
Alternative Hypothesis
H1: the above difference is significant [the statement to be accepted if the null hypothesis is rejected].

17. Type I Error
In a hypothesis test, a Type I error occurs when the null hypothesis is rejected when in fact it is true; that is, H0 is wrongly rejected.
P(Type I error) = level of significance = α.
Type I error = (Reject H0 | H0 is true)
Type II Error
In a hypothesis test, a Type II error occurs when the null hypothesis H0 is not rejected when in fact it is false.
Type II error = (Accept H0 | H0 is not true)

18. Decision table:

Truth \ Decision   Reject H0        Accept H0
H0 true            Type I Error     Right decision
H1 true            Right decision   Type II Error

P(Reject H0 | H0 is true) = Type I error = level of significance (producer's risk)
P(Accept H0 | H0 is not true) = Type II error (consumer's risk)
A Type I error is often considered to be more serious, and therefore more important to avoid, than a Type II error.

19. One-tailed test: Here the alternative hypothesis H1 is one-sided, and we test whether the test statistic falls in the critical region on only one side of the distribution.
Two-tailed test: Here the alternative hypothesis H1 is formulated to test for a difference in either direction.

20. Common test statistics
1. One-sample z-test: z = (x̄ − μ)/(σ/√n)
2. Two-sample z-test: z = (x̄1 − x̄2)/√(σ1²/n1 + σ2²/n2)
3. One-proportion z-test: z = (p − P)/√(PQ/n)
4. Two-proportion z-test: z = (p1 − p2)/√(PQ(1/n1 + 1/n2))

21. Critical Value(s)
The critical value(s) for a hypothesis test is a threshold to which the value of the test statistic in a sample is compared, to determine whether or not the null hypothesis is rejected.
For normal tests the critical values (table z) are:

Level of significance   1%     5%
Two-tailed test         2.58   1.96
One-tailed test         2.33   1.645
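As a minimal sketch of how the z statistic and the critical values fit together (the numbers here are invented for illustration):

```python
import math

def one_sample_z(xbar, mu0, sigma, n):
    """One-sample z statistic: z = (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

# Hypothetical data: test H0: mu = 50 against H1: mu != 50,
# with sigma = 12 and a sample of n = 64 giving xbar = 53.
z = one_sample_z(53, 50, 12, 64)
print(round(z, 2))  # 2.0

# Two-tailed test at the 5% level of significance (table z = 1.96):
print("Reject H0" if abs(z) > 1.96 else "Accept H0")  # Reject H0
# At the 1% level the table value is 2.58, so the same sample would accept H0.
```

This is exactly the decision rule of the next slide: compare calculated |z| with the table z for the chosen level of significance.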
22. Decision:
* If the modulus of the computed value of z is less than the table value of z, then accept the null hypothesis H0, i.e. calculated |z| < table z ⇒ accept H0.
* If the modulus of the computed value of z is greater than the table value of z, then reject the null hypothesis H0, i.e. calculated |z| > table z ⇒ reject H0.

23. Steps in Hypothesis Testing
1. Identify the null hypothesis H0 and the alternative hypothesis H1.
2. Choose α, the level of significance. The value should be small, usually less than 10%. It is important to consider the consequences of both types of error.
3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic.
4. Compare the observed value of the statistic to the critical value obtained for the chosen l.o.s.
5. Make a decision:
- If the test statistic falls in the critical region: reject H0 in favour of H1.
- If the test statistic does not fall in the critical region: conclude that there is not enough evidence to reject H0.

24. Chi-Square Goodness of Fit (One-Sample Test)
This test allows us to compare a collection of categorical data with some theoretical expected distribution.
H0: There is no considerable difference between the observed and theoretical values.
H1: The difference is significant.
Chi-Square Test of Independence
For a contingency table that has r rows and c columns, the chi-square test can be thought of as a test of independence. In a test of independence the null and alternative hypotheses are:
H0: The two categorical variables are independent.
H1: The two categorical variables are related.

25. Calculate the chi-square statistic χ² by completing the following steps:
1. For each observed number in the table, subtract the corresponding expected number (O − E).
2. Square the difference [(O − E)²].
3. Divide the square obtained for each cell in the table by the expected number for that cell [(O − E)²/E].
4. Sum all the values of (O − E)²/E. This is the chi-square statistic χ².

26. Example:
Incidence of three types of malaria in three tropical regions:

              Asia   Africa   South America   Total
Malaria A      31      14          45           90
Malaria B       2       5          53           60
Malaria C      53      45           2          100
Total          86      64         100          250

Solution: We now set up the following table (expected frequency = row total × column total / grand total):

Observed   Expected   |O − E|   (O − E)²   (O − E)²/E
31          30.96       0.04      0.0016    0.00005
14          23.04       9.04     81.72      3.546
45          36.00       9.00     81.00      2.25
2           20.64      18.64    347.45     16.83
5           15.36      10.36    107.33      6.99
53          24.00      29.00    841.00     35.04
53          34.40      18.60    345.96     10.06
45          25.60      19.40    376.36     14.70
2           40.00      38.00   1444.00     36.10

27. Test Statistic:
χ² = 125.516 (calculated value)
Degrees of freedom = (c − 1)(r − 1) = 2 × 2 = 4
Reject H0, because 125.516 is greater than 9.488, the table value for α = 5% l.o.s.

28. One-way Analysis of Variance
If the variances in the groups (treatments) are similar, we can divide the variation of the observations into the variation between the groups (variation of the means) and the variation within the groups. The variation is measured with sums of squares.

29. Analysis of Variance (by the Coding Method)
Steps in the short-cut method:
1. Set up the null hypothesis H0 and the alternative hypothesis H1.
2. Steps for computing the test statistic:
i] Find the sum of all the values of all the items of all the samples (T).
ii] Compute the correction factor C = T²/N, where N is the total number of observations in all the samples.
iii] Find the sum of squares of all the items of all the samples.
iv] Find the total sum of squares SST [total in (iii) − C].
v] Find the sum of squares between the samples, SSC [square each sample total, divide by the number of elements in that sample, sum, and subtract C].
vi] Set up the ANOVA table and calculate F, which is the test statistic.
vii] If the calculated F is less than the table F, accept H0; otherwise reject H0.

30. ANOVA Table

Source of variation   Sum of squares   d.o.f.     Mean squares           F
Between samples       SSC              c − 1      MSC = SSC/(c − 1)      MSC/MSE [or MSE/MSC]
Within samples        SSE              c(r − 1)   MSE = SSE/(c(r − 1))   (so that the F ratio is greater than 1)
Total                 SST              cr − 1     —
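Both procedures can be reproduced in a few lines of Python. The chi-square part uses the malaria table; exact arithmetic gives χ² ≈ 125.52, the slide's 125.516 coming from per-cell rounding. The ANOVA data below is invented purely for illustration:

```python
# 1. Chi-square test of independence on the malaria contingency table.
observed = [
    [31, 14, 45],   # Malaria A: Asia, Africa, South America
    [2, 5, 53],     # Malaria B
    [53, 45, 2],    # Malaria C
]
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, O in enumerate(row):
        E = row_totals[i] * col_totals[j] / grand   # expected frequency
        chi_sq += (O - E) ** 2 / E
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_sq, 2), df)   # 125.52 4
print(chi_sq > 9.488)         # True -> reject H0 at the 5% level

# 2. One-way ANOVA by the short-cut (coding) method on made-up samples.
samples = [[2, 3, 4, 3], [4, 5, 6, 5], [6, 7, 8, 7]]
N = sum(len(s) for s in samples)                    # total observations
T = sum(sum(s) for s in samples)                    # grand total
C = T ** 2 / N                                      # correction factor
SST = sum(x ** 2 for s in samples for x in s) - C   # total sum of squares
SSC = sum(sum(s) ** 2 / len(s) for s in samples) - C  # between samples
SSE = SST - SSC                                     # within samples
c = len(samples)
MSC = SSC / (c - 1)
MSE = SSE / (N - c)   # for equal group sizes, N - c equals c(r - 1)
F = MSC / MSE
print(SST, SSC, SSE, round(F, 1))   # 38.0 32.0 6.0 24.0
```

With F(2, 9) at the 5% level being about 4.26, this invented data would reject H0.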