Introduction to Inferential Statistics

Introduction

Researchers most often have a population that is too large to test, so they have to draw a sample from it.

Researchers collect a random sample from the population and generalize from the known characteristics of the sample to the unknown population.

Random Sampling

To use inferential statistical techniques, samples must be randomly drawn from the population of interest.

Non-random samples can be used for exploratory research, meaning that you explore a topic to see which variables are important to it.

But the conclusions cannot be generalized to the population.

Random sampling requires a precise process of selection.

Remember that randomness is not representativeness: a random sample is not guaranteed to be an exact representation of the population.

Random Sampling

The probability is high that a randomly selected sample will be representative.

The good thing about inferential statistics is that they allow you to state very precisely the probability that the sample is not representative.

Ways in Which Random Samples are Gathered

Simple Random Sample

For a simple random sample, each case and each combination of cases in the population must have an equal probability of being chosen for the sample.

This kind of sample is used when you have a complete list of all cases in the population.

Most researchers use tables of random numbers to select cases.

This would be extremely time consuming for a large sample, so another technique is used.
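A minimal sketch of simple random selection, assuming the complete list of cases is available as a Python list. A software random number generator stands in for the table of random numbers; the case IDs, the sample size, and the function name are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n cases without replacement.

    random.Random.sample gives every case, and every combination of
    n cases, the same chance of being selected.
    """
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    return rng.sample(population, n)

# Hypothetical complete list of cases: IDs 1 through 1000.
population = list(range(1, 1001))
sample = simple_random_sample(population, 50, seed=42)
print(len(sample), "cases selected")
```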

Systematic Sampling

Needed for large populations.

Only the first case is randomly selected. After that, every kth case is selected.

For example, choose the first case from the table of random numbers, then choose every 10th case, or whatever interval gives you a sample of the size you want.

Divide the population size by the sample size to find k, the distance to the next case for your sample.

Strictly speaking, this is not a simple random sample, so you need to make sure the list of the population is in random order.
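The systematic procedure can be sketched as follows. This is an illustration under stated assumptions: the population is a Python list, the interval k is the population size divided by the sample size (rounded down), and the random start is taken from the first k cases. The function name and the data are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every kth case after it."""
    if not 0 < n <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # only this first case is chosen at random
    return population[start::k][:n]

# Hypothetical list of 1000 cases; a sample of 100 gives k = 10.
population = list(range(1000))
sample = systematic_sample(population, 100, seed=7)
```

If the list has a periodic pattern that lines up with k, the sample can be badly unrepresentative, which is why the order of the list matters.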

Stratified Sample

Proportional Stratified Sample

This is used if you want to guarantee that certain categories of cases are represented in the same percentage as in the population.

For example, if you want to compare chemistry majors to criminology majors, you first ask students their major.

Then you put them all in the same list sorted by major, and begin your random sample from the list.

All majors will be included if your sample is large enough, but each will be included in the proportion it has in the original population. The sample will have more criminology majors if there are more of them in the population.
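A rough sketch of proportional stratified sampling. Instead of sampling down a list sorted by major, it applies the same sampling fraction inside each major, which is a common variant that yields the same proportions. The enrollment counts, the 10% fraction, and the function name are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def proportional_stratified_sample(cases, stratum_of, fraction, seed=None):
    """Sample the same fraction of cases from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    sample = []
    for members in strata.values():
        n = round(len(members) * fraction)  # stratum share of the sample
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical enrollment: 300 criminology majors and 100 chemistry majors.
students = [("criminology", i) for i in range(300)]
students += [("chemistry", i) for i in range(100)]

sample = proportional_stratified_sample(students, lambda s: s[0], 0.10, seed=1)
counts = Counter(major for major, _ in sample)
print(counts)
```

With these assumed counts the sample keeps the 3-to-1 ratio of the population.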

Stratified Sample

Disproportionate Stratified Sampling

This is used if you need exactly the same number of students from each major.

You separate the students by major.

Then you have to change your sampling fraction for each major to account for differences in the number of cases.

If you need a sample of 50, and there are 100 zoology majors, you would choose every other one to get exactly 50 cases.

The problem with this is that you cannot generalize directly to the population, since the sample does not reflect the population's proportions.
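A sketch of the disproportionate case, assuming an equal number of cases is wanted from each major. It draws at random within each major rather than taking every other case, and it reports the sampling fraction each major ends up with. The count of 400 criminology majors and the function name are hypothetical.

```python
import random
from collections import defaultdict

def equal_allocation_sample(cases, stratum_of, n_per_stratum, seed=None):
    """Draw the same number of cases from each stratum.

    Returns the sample per stratum and the sampling fraction used in each.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    sample, fractions = {}, {}
    for name, members in strata.items():
        sample[name] = rng.sample(members, n_per_stratum)
        fractions[name] = n_per_stratum / len(members)
    return sample, fractions

# 100 zoology majors (as in the text) and a hypothetical 400 criminology majors.
students = [("zoology", i) for i in range(100)]
students += [("criminology", i) for i in range(400)]

sample, fractions = equal_allocation_sample(students, lambda s: s[0], 50, seed=3)
print(fractions)
```

The unequal fractions are the reason the combined sample cannot be generalized directly to the whole population.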

The biggest problem in sampling is that there is no complete list of most populations.

Cluster Sample

Used when there is no list of the members of the population.

It involves random selection of geographical units (states, neighborhoods, or blocks).

Every case within the last geographical units selected is then tested.

Not a simple random sample, so not as trustworthy, but cheaper to do.
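A single-stage sketch of cluster sampling, assuming the geographical units are available as a dictionary that maps each unit to the cases inside it. The block names, household labels, and function name are made up for illustration; a real design would usually select units in several stages.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every case inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of units
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical city blocks, each with 8 households.
blocks = {
    f"block-{b}": [f"household-{b}-{h}" for h in range(8)]
    for b in range(20)
}

selected = cluster_sample(blocks, 3, seed=5)
cases = [case for members in selected.values() for case in members]
print(len(selected), "blocks,", len(cases), "households")
```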

The Sampling Distribution

Introduction

Researchers have a great deal of information about the sample distribution, but they know little or nothing about the population.

It is the population that is of interest.

We do not want to know only what 2,000 people think out of the more than 200 million adults in the U.S.

What you would want to know about the distribution of the population:

The shape of the distribution

Some measure of central tendency

Some measure of dispersion

The Normal Curve

You need to know the properties of the normal curve, which are based on the laws of probability, to find out information about the population from the sample.

To do this, you use a device known as the sampling distribution.

It bridges the gap between the sample and the population.

The sampling distribution is the central concept in inferential statistics, so you need to understand the concept.

Three Distributions

The sample distribution

This is empirical (observed) and known.

It is collected by researchers and used to learn about the population.

The population distribution

It exists in reality, so it is empirical, but it is unknown to the researcher.

The sole purpose of inferential statistics is to make inferences (meaning draw conclusions) about the population distribution.

The Sampling Distribution

This is nonempirical (theoretical).

It is theoretical because you draw only one sample, while the sampling distribution is based on an infinite number of samples taken from that population.

Laws of probability tell us much about this distribution.

Theoretically, if you drew an infinite number of samples of the same size from a population, computed only the mean of each sample, and put the means on a graph to form a frequency polygon, that polygon would be the sampling distribution of sample means.

The Sampling Distribution

We know that each sample mean will be slightly different, since each sample is not an exact representation of the population.

We know that most of the sample means will cluster around the true population value.
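The clustering of sample means can be illustrated with a small simulation. This is only a sketch: a finite number of samples stands in for the infinite number in the definition, and the population values, the sample size of 50, and the 2,000 repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10,000 scores between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = statistics.fmean(population)

# Draw 2,000 random samples of size 50 and keep only each sample's mean.
sample_means = [
    statistics.fmean(rng.sample(population, 50)) for _ in range(2_000)
]

mean_of_means = statistics.fmean(sample_means)
share_close = sum(abs(m - mu) <= 5 for m in sample_means) / len(sample_means)

print("population mean:", round(mu, 2))
print("mean of the sample means:", round(mean_of_means, 2))
print("share of sample means within 5 points of the population mean:", share_close)
```

The individual means differ from sample to sample, but most of them land close to the population mean.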

Two Theorems About the Sampling Distribution

If repeated random samples of size N are drawn from a normal population with mean µ and standard deviation σ, then the sampling distribution of sample means will be normal with mean µ and standard deviation σ/√N.

So the mean of the sampling distribution will be the same as the mean of the population.

Theorems

Since the samples are random, the means should miss an equal number of times on either side of the population value.

This makes the distribution symmetrical: a normal curve with a bell shape.

So, we know about the shape of the sampling distribution.

Dispersion of the Sampling Distribution

We can also tell something about the dispersion (specifically the standard deviation) of the sampling distribution.

The standard deviation of the sampling distribution, also called the standard error of the mean, is σ/√N: the standard deviation of the population divided by the square root of N.
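As a quick arithmetic check of the formula, the sketch below computes σ/√N for a few sample sizes. The value σ = 15 and the sample sizes are assumptions chosen only to make the numbers easy to follow.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)

sigma = 15.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))
# prints:
# 25 3.0
# 100 1.5
# 400 0.75
```

Quadrupling the sample size halves the standard deviation of the sampling distribution.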

Dispersion of the Sampling Distribution

What this tells you is that in comparing a sampling distribution with a population distribution, there will always be more variance in the population distribution (for any sample size larger than 1).

As the sample size gets larger, the variance of the sampling distribution will get smaller (N = the number in the sample).

The above theorem applies to populations that are normally distributed on a particular variable.

Central Limit Theorem

This second theorem is needed if the population distribution is not normal.

If repeated random samples of size N are drawn from any population with mean µ and standard deviation σ, then as N becomes large, the sampling distribution of sample means will approach normality, with mean µ and standard deviation σ/√N.
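The theorem can be checked informally by simulation. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1), N = 100, and 5,000 repeated samples; all of these are illustrative choices, not part of the theorem.

```python
import math
import random
import statistics

rng = random.Random(1)

mu, sigma = 1.0, 1.0   # exponential distribution with rate 1: mean 1, sd 1
N = 100                # size of each sample
expected_se = sigma / math.sqrt(N)

# Means of 5,000 random samples drawn from the skewed population.
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(N)])
    for _ in range(5_000)
]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# For a normal curve, about 95% of values fall within 1.96 standard deviations.
share_within = sum(abs(m - mu) <= 1.96 * expected_se for m in means) / len(means)

print("mean of sample means:", round(mean_of_means, 3))
print("sd of sample means:", round(sd_of_means, 3), "vs sigma/sqrt(N):", expected_se)
print("share within 1.96 standard errors:", share_within)
```

Even though the population is far from normal, the sample means behave approximately as a normal curve with mean µ and standard deviation σ/√N would.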

Large Samples

What, exactly, is meant by large?

A good rule of thumb is that if N is 100 or more, the Central Limit Theorem applies, and you can assume that the sampling distribution is normal in shape.
