determinatiion of

Determination of Sample size a review of statistical theory A descriptive statistics [email protected]

Transcript of determinatiion of

Page 1: determinatiion of

Determination of sample size: a review of statistical theory

A descriptive statistics

[email protected]

Page 2: determinatiion of

The research process

Plan your data collection and collect data using one or more of:
• Sampling
• Secondary data
• Observation
• Semi-structured, in-depth and group interviews
• Questionnaires

Analyse your data using one or both of:
• Quantitative methods
• Qualitative methods

Write your project report and prepare your presentation

Submit your project report and give your presentation

Page 3: determinatiion of

Descriptive and Inferential statistics

• Descriptive statistics: statistics used to describe or summarize information about a population or sample.

• Inferential statistics: the use of statistics to make inferences or judgments about a population on the basis of a sample.

Page 4: determinatiion of

Sample statistics and population parameters

• Sample statistics: variables in a sample, or measures computed from sample data

• Population parameters: variables in a population, or measured characteristics of the population

Page 5: determinatiion of

Frequency distributions (table)

• A set of data organized by summarizing the number of times a particular value of a variable occurs
• Purpose: to make the data usable
• Process?
• A percentage distribution is a frequency distribution organized in a table (or graph) that summarizes the percentage value associated with particular values of a variable

Page 6: determinatiion of

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Monthly Rent for 70 houses
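A minimal sketch of how the 70 monthly rents above could be turned into a frequency distribution and a percentage distribution. Python is an assumption here (the slides name no software), and the function name `frequency_table` is hypothetical.

```python
from collections import Counter

# The 70 monthly rents from the slide, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def frequency_table(values):
    """Return (value, frequency, percentage) rows sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, 100 * c / n) for v, c in sorted(counts.items())]


table = frequency_table(rents)
for value, freq, pct in table:
    print(f"{value:>5} {freq:>3} {pct:6.1f}%")
```

Each row pairs a rent value with the number of times it occurs and its share of the 70 observations; the percentage column is the percentage distribution.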

Page 7: determinatiion of

Percentage table

                      Spending amount on buying   Percent spent
1  chokidar           4000                        ?
2  clerk              6000                        ?
3  superintendent     10000                       ?
4  officers           20000                       ?
   total              40000                       100%

Page 8: determinatiion of

Percentage table

                      Spending amount on buying   Percent spent   Prob
1  chokidar           4000                        ?               ?
2  clerk              6000                        ?               ?
3  superintendent     10000                       ?               ?
4  officers           20000                       ?               ?
   total              40000                       100%            1.00
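One way to fill in the "?" cells of the table is to divide each amount by the total. The sketch below assumes Python, and its variable names are illustrative only.

```python
# Spending amounts from the percentage table on the slide.
spending = {
    "chokidar": 4000,
    "clerk": 6000,
    "superintendent": 10000,
    "officers": 20000,
}

total = sum(spending.values())

# For each row: (amount, percent of total, probability).
rows = {
    name: (amount, 100 * amount / total, amount / total)
    for name, amount in spending.items()
}

for name, (amount, percent, prob) in rows.items():
    print(f"{name:<15} {amount:>6} {percent:5.1f}% {prob:.2f}")
print(f"{'total':<15} {total:>6} {100.0:5.1f}% {1.0:.2f}")
```

The probability column is the percentage column expressed as a decimal, which is why the two columns total 100% and 1.00 respectively.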

Page 9: determinatiion of

Probability distribution

• The organization of probability values associated with particular values of a variable into a table (or graph).
• Percentages are shown as probabilities
• Probability is the long-run relative frequency
• It expresses how likely it is that an event will occur in the future

Page 10: determinatiion of

Proportion

• The percentage of population elements that successfully meet some criterion

Page 11: determinatiion of

Descriptive Statistics: Numerical Measures

Numerical data properties

• Central tendency: Mean, Median, Mode
• Variation: Range, Interquartile Range, Variance, Standard Deviation
• Shape: Skewness, Kurtosis

Page 12: determinatiion of

The central tendency: Mean

• The measure of central tendency: the arithmetic average

Page 13: determinatiion of

Mean

• The mean of a data set is the average of all the data values.

• As we said, the sample mean x̄ is the point estimator of the population mean µ.

Page 14: determinatiion of

Sample Mean

\[ \bar{x} = \frac{\sum x_i}{n} \]

where \(\sum x_i\) is the sum of the values of the n observations and n is the number of observations in the sample.

Page 15: determinatiion of

Population Mean µ

\[ \mu = \frac{\sum x_i}{N} \]

where \(\sum x_i\) is the sum of the values of the N observations and N is the number of observations in the population.

Page 16: determinatiion of

The central tendency: Median and mode

• Median: a measure of central tendency that is the midpoint; the value below which half the values in a sample fall

• Mode: a measure of central tendency; the value that occurs most often

Page 17: determinatiion of

Median

Whenever a data set has extreme values, the median is the preferred measure of central location.

A few extremely large incomes or property values can inflate the mean.

The median is the measure of location most often reported for annual income and property value data.

The median of a data set is the value in the middle when the data items are arranged in ascending order.

Positioning point = \(\frac{n + 1}{2}\)

Page 18: determinatiion of

Median

For an odd number of observations:

26 18 27 12 14 27 19   (7 observations)

in ascending order:

12 14 18 19 26 27 27

the median is the middle value.

Median = 19

Page 19: determinatiion of

Median

For an even number of observations:

26 18 27 12 14 27 30 19   (8 observations)

in ascending order:

12 14 18 19 26 27 27 30

the median is the average of the middle two values.

Median = (19 + 26)/2 = 22.5
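The positioning-point rule can be sketched in a few lines. This is a sketch in Python (an assumption; the slides name no software), the helper name `median_by_position` is hypothetical, and the standard library's `statistics.median` is used only as a cross-check.

```python
import math
import statistics


def median_by_position(values):
    """Median via the 1-based positioning point (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2
    if position.is_integer():
        # Odd n: the positioning point lands on a single middle value.
        return ordered[int(position) - 1]
    # Even n: average the two values on either side of the point.
    lower = ordered[math.floor(position) - 1]
    upper = ordered[math.ceil(position) - 1]
    return (lower + upper) / 2


odd_data = [26, 18, 27, 12, 14, 27, 19]        # 7 observations
even_data = [26, 18, 27, 12, 14, 27, 30, 19]   # 8 observations

print(median_by_position(odd_data), statistics.median(odd_data))
print(median_by_position(even_data), statistics.median(even_data))
```

Both approaches agree with the slide values: 19 for the seven observations and 22.5 for the eight.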

Page 20: determinatiion of

Measures of Variability (Dispersion)

• Range
• Interquartile range or midspread
• Deviation scores, variance
• Standard deviation
• Coefficient of variation

Page 21: determinatiion of

Measure of dispersion

• A distribution can be skinny or fat
• Hence, the range tells the distance between the smallest and largest values of a frequency distribution (the extreme values)
• The interquartile range encompasses the middlemost observations, that is, the range between the bottom quartile (lowest 25%) and the top quartile (highest 25%)

Page 22: determinatiion of

Range

The range of a data set is the difference between the largest and smallest data values.

It is the simplest measure of variability. It is very sensitive to the smallest and largest data values.

Page 23: determinatiion of

Quartiles

• Unless the sample size is large, percentiles may not make sense, since percentiles divide the data into 100 groups.

• In smaller samples, we might divide the data into four groups (quartiles). Since almost any sample can be divided into four groups, the quartiles are important descriptive statistics to explain.

Quartiles are specific percentiles.
• First Quartile = 25th Percentile
• Second Quartile = 50th Percentile = Median
• Third Quartile = 75th Percentile

Page 24: determinatiion of

Measures of Variability (Dispersion)

It is often desirable to consider measures of variability (dispersion), as well as measures of location.

For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.

Page 25: determinatiion of

Range: Example

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Range = largest value - smallest value

Range = 615 - 425 = 190

Monthly Rent for 70 Apartments

Page 26: determinatiion of

Interquartile Range or Midspread

The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data. It overcomes the sensitivity to extreme data values, because it is not affected by them.

Interquartile Range = Q3 - Q1

Page 27: determinatiion of

Interquartile Range: Example

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

3rd Quartile (Q3) = 525

1st Quartile (Q1) = 445

Interquartile Range = Q3 - Q1 = 525 - 445 = 80

Monthly Rent for 70 Apartments
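The range and interquartile range for the rent data can be checked with a short sketch. The slides do not say which percentile rule they use; the code assumes a common textbook positional rule (compute i = (p/100)·n, round up if i is not a whole number, otherwise average the i-th and (i+1)-th values). Other conventions, such as linear interpolation, give slightly different quartiles. Python and the function name `percentile` are assumptions.

```python
import math

# The 70 monthly rents from the slide.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_values, p):
    """Assumed textbook rule: position i = (p / 100) * n, 1-based."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_values)
    i = p / 100 * n
    if i != int(i):
        # Not a whole number: round up and take that observation.
        return sorted_values[math.ceil(i) - 1]
    # Whole number: average the i-th and (i + 1)-th observations.
    i = int(i)
    return (sorted_values[i - 1] + sorted_values[i]) / 2


ordered = sorted(rents)
data_range = ordered[-1] - ordered[0]
q1 = percentile(ordered, 25)
q3 = percentile(ordered, 75)
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

Under this rule the 25th percentile is the 18th ordered value and the 75th percentile is the 53rd, which reproduces the quartiles shown on the slide.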

Page 28: determinatiion of

Measures of Variability (Dispersion)

• Deviation scores
• \(d_i = X_i - \bar{X}\), where \(d_i\) = a deviation score
• Example: ?
• Average deviation: \(\frac{\sum (X_i - \bar{X})}{n}\)

Page 29: determinatiion of

Variance

• A measure of variability or dispersion; its square root is the standard deviation

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation (x_i) and the mean (x̄ for a sample, µ for a population).

Page 30: determinatiion of

Variance

The variance is the average of the squared differences between each data value and the mean.

The variance is computed as follows:

\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \] for a sample

\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \] for a population

Page 31: determinatiion of

Standard deviation

• The square root of the variance of a distribution

The standard deviation of a data set is the positive square root of the variance.

It is measured in the same units as the data, making it more easily interpreted than the variance.

Page 32: determinatiion of

Standard Deviation

The standard deviation is computed as follows:

\[ s = \sqrt{s^2} \] for a sample

\[ \sigma = \sqrt{\sigma^2} \] for a population
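The difference between the sample and population versions is only the divisor (n - 1 versus N). The sketch below computes both by hand and compares them with Python's standard `statistics` module. The choice of Python and the reuse of the seven observations from the median example are assumptions made for illustration.

```python
import math
import statistics

# Illustrative data: the seven observations from the median example.
data = [12, 14, 18, 19, 26, 27, 27]


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


s2 = sample_variance(data)
sigma2 = population_variance(data)
s = math.sqrt(s2)            # sample standard deviation
sigma = math.sqrt(sigma2)    # population standard deviation

print(s2, statistics.variance(data))
print(sigma2, statistics.pvariance(data))
print(s, sigma)
```

Because the divisor n - 1 is smaller than N, the sample variance is always somewhat larger than the population variance computed from the same numbers.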

Page 33: determinatiion of

Normal distribution

• Normal distribution curve
• A symmetrical, bell-shaped distribution that describes the expected probability of many chance occurrences
• The expected distribution of sample means
• Almost the entire normal curve lies within ±3 SD of its mean

Page 34: determinatiion of
Page 35: determinatiion of

standardized normal distribution

• A specific normal curve that has these properties:
– It is symmetrical about its mean
– The mean identifies the normal distribution's highest point
– The total probability of occurrences (the area under the curve) = 1
– Its mean = 0 and its SD = 1
– Formula: standardized value = [(value to be transformed) - (mean)] / SD, i.e. \(Z = \frac{x - \mu}{\sigma}\)

Where µ = hypothesized or expected value of the mean

Page 36: determinatiion of
Page 37: determinatiion of

standardized normal distribution

• Example:
– A shopkeeper knows from experience that his average sales level is 9000 units (mean = 9000) and that it varies with SD = 500 units. Further, he expects that the sales level will be between 7500 and 9625 units. What is the probability of this occurrence?

Formula: \(Z = \frac{x - \mu}{\sigma}\)
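The shopkeeper example can be worked through numerically. The sketch below standardizes both limits and then takes the area under the normal curve between them. It assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the slides themselves would use a Z table.

```python
from statistics import NormalDist

mean = 9000   # average sales level (units)
sd = 500      # standard deviation (units)
low, high = 7500, 9625

# Standardized values: Z = (x - mean) / SD
z_low = (low - mean) / sd
z_high = (high - mean) / sd

# Probability that sales fall between the two limits.
sales = NormalDist(mu=mean, sigma=sd)
probability = sales.cdf(high) - sales.cdf(low)

print(z_low, z_high, round(probability, 4))
```

The limits lie 3 SD below and 1.25 SD above the mean, and the probability of sales falling between them comes out at roughly 0.893.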

Page 38: determinatiion of

Additional types of distribution

• Population distribution
• Sample distribution
• Sampling distribution

Page 39: determinatiion of

Population distribution

• A frequency distribution of the elements of a population

• It has its mean µ and SD σ

Page 40: determinatiion of

Sample distribution

• A frequency distribution of the elements of a sample

• Its mean is the sample mean x̄, and its SD is represented by S

Page 41: determinatiion of

Sampling distribution of the sample mean

• The basis for understanding statistics
• It is a theoretical probability distribution of the means of all possible samples of a certain size drawn from a particular population
• In actual practice it would never be calculated
• Imagine a large number of samples, say 50,000, each having n elements, drawn from a specified population
• Several people drawing several samples would not obtain the same mean
• The central limit theorem says that if the sample is large and drawn randomly, the sample mean is approximately normally distributed

Page 42: determinatiion of

Sampling distribution…

• It is the functional relationship between the possible values of some summary characteristic of n cases drawn at random and the probability associated with each value over all possible samples of size n from a particular population

• The mean of the sampling distribution is called the expected value of the statistic

Page 43: determinatiion of

Sampling distribution…

• The SD of the sampling distribution of x̄ is called the standard error of the mean

• Standard error of the mean is the SD of the sampling distribution of the mean

Page 44: determinatiion of

Three types of distribution

   Distribution              Mean        SD
1  Population distribution   µ           σ
2  Sample distribution       x̄           S
3  Sampling distribution     µ_x̄ = µ     S_x̄

Page 45: determinatiion of

Central-limit theorem

• The theory stating that, as the sample size n increases, the distribution of the means of randomly selected samples of size n approaches a normal distribution

Page 46: determinatiion of

Central-limit theorem

• Example
– Consider the number of rupees spent on books; further assume that the youths are 20 years old, study at the Commerce Department, and that the population size is 6. Now calculate the mean.

S No   Students   Exp on book $
1      A          1.00
2      B          2.00
3      C          3.00
4      D          4.00
5      E          5.00
6      F          6.00

Page 47: determinatiion of

Samples

Page 48: determinatiion of

Samples

1, 2
1, 3   2, 3
1, 4   2, 4   3, 4
1, 5   2, 5   3, 5   4, 5
1, 6   2, 6   3, 6   4, 6   5, 6

Page 49: determinatiion of

Means of the samples and their frequency distribution

Sample   ΣX   x̄   Probability

1,2

1,3

1,4

1,5

1,6

2,3

2,4

2,5

2,6

3,4

3,5

3,6

4,5

4,6

5,6

Page 50: determinatiion of

Means of the samples and their frequency distribution

Sample   ΣX   x̄   Probability

1,2 3.00

1,3 4.00

1,4 5.00

1,5 6.00

1,6 7.00

2,3 5.00

2,4 6.00

2,5 7.00

2,6 8.00

3,4 7.00

3,5 8.00

3,6 9.00

4,5 9.00

4,6 10.00

5,6 11.00

Page 51: determinatiion of

Means of the samples and their frequency distribution

Sample   ΣX   x̄   Probability

1,2 3.00 1.50

1,3 4.00 2.00

1,4 5.00 2.50

1,5 6.00 3.00

1,6 7.00 3.50

2,3 5.00 2.50

2,4 6.00 3.00

2,5 7.00 3.50

2,6 8.00 4.00

3,4 7.00 3.50

3,5 8.00 4.00

3,6 9.00 4.50

4,5 9.00 4.50

4,6 10.00 5.00

5,6 11.00 5.50

Page 52: determinatiion of

Means of the samples and their frequency distribution

Sample   ΣX   x̄   Probability

1,2 3.00 1.50 1/15

1,3 4.00 2.00 1/15

1,4 5.00 2.50 1/15

1,5 6.00 3.00 1/15

1,6 7.00 3.50 1/15

2,3 5.00 2.50 1/15

2,4 6.00 3.00 1/15

2,5 7.00 3.50 1/15

2,6 8.00 4.00 1/15

3,4 7.00 3.50 1/15

3,5 8.00 4.00 1/15

3,6 9.00 4.50 1/15

4,5 9.00 4.50 1/15

4,6 10.00 5.00 1/15

5,6 11.00 5.50 1/15

Page 53: determinatiion of

Means of the samples and their frequency distribution

Sample mean Frequency Probability

1.50

2.00

2.50

3.00

3.50

4.00

4.50

5.00

5.50

Page 54: determinatiion of

Means of the samples and their frequency distribution

Sample mean Frequency Probability

1.50 1

2.00 1

2.50 2

3.00 2

3.50 3

4.00 2

4.50 2

5.00 1

5.50 1

Page 55: determinatiion of

Means of the samples and their frequency distribution

Sample mean Frequency Probability

1.50 1 1/15

2.00 1 1/15

2.50 2 2/15

3.00 2 2/15

3.50 3 3/15

4.00 2 2/15

4.50 2 2/15

5.00 1 1/15

5.50 1 1/15
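The tables above can be reproduced by enumerating every sample of size 2 from the six-student population. The sketch below is one way to do that. Python is an assumption, and exact fractions are used so that the probabilities come out as fifteenths rather than rounded decimals.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Expenditure on books for students A to F.
population = [1, 2, 3, 4, 5, 6]

# All possible samples of size 2, drawn without replacement.
samples = list(combinations(population, 2))

# Mean of each sample, kept as an exact fraction.
means = [Fraction(sum(s), len(s)) for s in samples]

# Frequency of each sample mean, then its probability.
frequency = Counter(means)
distribution = {
    m: Fraction(count, len(samples))
    for m, count in sorted(frequency.items())
}

# Expected value of the sample mean (mean of the sampling distribution).
expected_mean = sum(m * p for m, p in distribution.items())
population_mean = Fraction(sum(population), len(population))

for m, p in distribution.items():
    print(f"{float(m):.2f}  {frequency[m]}  {p}")
print(float(expected_mean), float(population_mean))
```

The expected value of the sample mean equals the population mean, which is the relationship µ_x̄ = µ. Note that `Fraction` prints probabilities in lowest terms, so 3/15 appears as 1/5.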

Page 56: determinatiion of

Point estimates

• An estimate of the population mean in the form of a single value, usually the sample mean
• Example: the mean of a large population is unknown, so it is estimated from a sample of, say, 300 people
• The estimate is rarely exactly equal to the population mean
• The confidence level is important

Page 57: determinatiion of

Confidence interval & level

• It is the range of confidence
• Confidence interval: a specified range of numbers within which a population mean is expected to lie; the set of acceptable hypotheses, given the level of probability associated with an interval estimate
• Confidence level: a percentage or decimal value that tells how confident a researcher can be about being correct. It states the long-run percentage of the time that a confidence interval will include the true population mean

Page 58: determinatiion of

Confidence interval: example

• A manager thinks that age is a useful standard in placement. People have been sampled, and the mean age of a sample of 100 people was 37.5 years, with SD (S) = 12.00 years. The hope is that the point estimate from the sample is exactly the same as the population mean age. The confidence level is 95%. Please follow the steps for the calculation.

Page 59: determinatiion of

Confidence interval: steps

Steps
1 Calculate the mean from the sample
2 Assuming the population SD is unknown, estimate it by finding S, i.e. the sample SD
3 Estimate the standard error of the mean, using the formula \(S_{\bar{x}} = \frac{S}{\sqrt{n}}\)
4 Determine the Z-value associated with the desired confidence level. The confidence level should be divided by 2 to determine what percentage of the area under the curve must be included on each side of the mean
5 Calculate the confidence interval

Page 60: determinatiion of

Steps

• Step 1: Calculate the mean from the sample: X̄ = 37.5

• Step 2: Assuming the population SD is unknown, estimate it by finding S, i.e. the sample SD: S = 12.00

Page 61: determinatiion of

Step 3

• Estimate the standard error of the mean, using the following formula

\[ S_{\bar{x}} = \frac{S}{\sqrt{n}} = \frac{12}{\sqrt{100}} = 1.2 \]

Page 62: determinatiion of

Step 4

• Determine the Z-value associated with the desired confidence level. The confidence level should be divided by 2 to determine what percentage of the area under the curve must be included on each side of the mean
• The confidence level is 95%, and half of that is 47.5%
• See the Z table (2)
• The Z-value in the table that corresponds to an area of 0.475 is 1.96

Page 63: determinatiion of

Step 5

• Calculate the confidence interval using the formula µ = X̄ ± E, or µ = X̄ ± Z_cl S_X̄

• µ = 37.5 ± (1.96)(1.2) = 37.5 ± 2.352, i.e. between 35.15 and 39.85
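The five steps can be collected into one small function. This is a sketch under stated assumptions: Python 3.8 or later (for `statistics.NormalDist`), and a hypothetical function name `mean_confidence_interval`. It obtains the Z-value from the inverse normal CDF rather than from a printed Z table, so it uses 1.95996... instead of the rounded 1.96.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Return (lower, upper) limits for the population mean."""
    # Step 3: standard error of the mean, S / sqrt(n).
    standard_error = sample_sd / math.sqrt(n)
    # Step 4: Z-value leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Step 5: sample mean plus or minus Z times the standard error.
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


lower, upper = mean_confidence_interval(37.5, 12.0, 100, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```

For the manager's age example this gives limits of about 35.15 and 39.85 years, matching the hand calculation.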

Page 64: determinatiion of

Some basic formulas

Population Mean

Sample mean

Deviation

Variance

SD population

SD sample

Standardized normal distribution

Standard error of sampling distribution

Page 65: determinatiion of

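Some basic formulas

Population mean: \(\mu = \frac{\sum x_i}{N}\)

Sample mean: \(\bar{x} = \frac{\sum x_i}{n}\)

Deviation: \(d_i = x_i - \bar{x}\)

Variance: \(S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\)

SD population: \(\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}\)

SD sample: \(S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}\)

Standardized normal distribution: \(Z = \frac{x - \mu}{\sigma}\)

Standard error of sampling distribution: \(S_{\bar{x}} = \frac{S}{\sqrt{n}}\)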

Page 66: determinatiion of

Sample size: random error and sample size

• When the SD of the population is unknown, a confidence interval is calculated using the formula

Confidence interval = \(\bar{x} \pm Z \frac{S}{\sqrt{n}}\)

Page 67: determinatiion of

Factors in determining the sample size for questions involving means

• Heterogeneity (variance) of the population
• Magnitude of acceptable error
• Confidence level

Page 68: determinatiion of

Estimating sample size for questions involving means

• Steps:
– Estimate the SD of the population (e.g. from a pilot study or sequential sampling)
– Make a judgment about the allowable magnitude of error
– Determine the confidence level

Page 69: determinatiion of

Rule of thumb for SD estimation

• Expect the SD to be about 1/6 of the range
• Example: if a researcher studying DVD purchases expects the price paid to range from $100 to $700, the estimate of the SD is (700 - 100)/6 = $100
• If we plan on using a 10-point purchase intention scale, it is 10/6 = 1.67

Page 70: determinatiion of

SD is known: estimate the mean of the population

The sample size is:

\[ n = \left(\frac{ZS}{E}\right)^2 \]

Z = standardized value corresponding to the confidence level
S = sample SD or estimate of the population SD
E = acceptable magnitude of error

Page 71: determinatiion of

Example: determine sample size

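Study on the annual expenditure on soap: we want a 95% confidence level (Z = 1.96) and a range of error (E) of less than $2. If the estimate of the SD is $29, the sample size will be

\[ n = \left(\frac{ZS}{E}\right)^2 = \left(\frac{1.96 \times 29}{2}\right)^2 = (28.42)^2 \approx 808 \]

If the range of error (E) is $4, then

\[ n = \left(\frac{1.96 \times 29}{4}\right)^2 = (14.21)^2 \approx 202 \]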
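The soap example can be checked with a short sketch of the formula n = (ZS/E)². Python and the function name `sample_size_for_mean` are assumptions. The result is rounded up, since a sample cannot contain a fraction of a respondent.

```python
import math


def sample_size_for_mean(z, sd, error):
    """Sample size n = (Z * S / E) ** 2, rounded up to a whole number."""
    return math.ceil((z * sd / error) ** 2)


z = 1.96    # Z-value for a 95% confidence level
sd = 29     # estimated SD of annual soap expenditure ($)

n_error_2 = sample_size_for_mean(z, sd, error=2)
n_error_4 = sample_size_for_mean(z, sd, error=4)

print(n_error_2, n_error_4)
```

Doubling the acceptable error from $2 to $4 cuts the required sample to about a quarter (808 versus 202), because E enters the formula squared.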

Page 72: determinatiion of

Thank you