Chapter 1 Introduction

CHAPTER 1: INTRODUCTION

Upon completion of this chapter, you should be able to:

Define statistics

Differentiate between descriptive and inferential statistics

Compare the different types of variables

Explain the importance of sampling

Differentiate between the types of sampling procedures

CHAPTER OVERVIEW

What is statistics?

Two kinds of statistics

Variables

Operational definitions

Sampling

Sampling techniques

a) Simple random sampling

b) Systematic sampling

c) Stratified sampling

d) Cluster sampling

Summary

Key Terms

References

Chapter 1: Introduction

Chapter 2: Descriptive Statistics

Chapter 3: The Normal Distribution

Chapter 4: Hypothesis Testing

Chapter 5: T-test

Chapter 6: Oneway Analysis of Variance

Chapter 7: Correlation

Chapter 8: Chi-Square

This chapter introduces you to the definition of statistics, focusing on descriptive and inferential statistics and how they are applied in analysing educational data. Before you can begin to use statistics in your research, you should be clear about the variables you intend to measure, which should be defined operationally. The use of statistics requires the adoption of appropriate sampling procedures so that you are able to generalise findings from the sample to the population. Several sampling procedures are discussed in this chapter.

What is Statistics?

Let us refer to some definitions of statistics:

American Heritage Dictionary® defines statistics as:

"The mathematics of the collection, organization, and interpretation of numerical data, especially the analysis of population characteristics by inference from sampling."

Merriam-Webster's Collegiate Dictionary® defines statistics as:

"a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data".

Webster's New World Dictionary® defines statistics as:

"facts or data of a numerical kind, assembled, classified and tabulated so as to present significant information about a given subject".

Jon Kettenring, President of the American Statistical Association, defines statistics as:

"the science of learning from data. Statistics is essential for the proper running of government, central to decision making in industry, and a core component of modern educational curricula at all levels."

Note that the word "mathematics" is mentioned in two of the definitions above, while "science" appears in another. Both mathematics and science scare some students, who lament that they come from the humanities and the social sciences and hence are weak in mathematics. Being terrified of mathematics does not happen overnight. Chances are that you had bad experiences with mathematics in earlier years (Kranzler, 2007).

Fear of mathematics can lead to a defeatist attitude which may affect the way you approach statistics. In most cases, the fear of statistics is due to irrational beliefs. Just because you had difficulty in the past does not mean that you will always have difficulty with quantitative subjects. You have come this far in your education and are now taking this course in statistics, so it is unlikely that you are an incapable person.

You have to convince yourself that statistics is not a difficult subject and that you need not worry about the mathematics involved. Identify your irrational beliefs and thoughts about statistics. Are you telling yourself: "I'll never be any good in statistics", "I'm a loser when it comes to anything dealing with numbers", or "What will other students think of me if I do badly?"

For each of these irrational beliefs about your abilities, ask yourself what evidence there is to suggest that "you will never be good in statistics" or that "you are lousy in mathematics". When you do that, you will begin to replace your irrational beliefs with positive thoughts and you will feel better. You will realise that your earlier beliefs about statistics are the cause of your unpleasant emotions. Each time you feel anxious or emotionally upset, question your irrational beliefs; this may help you overcome your initial fears.

With this in mind, this course presents statistics in a form that appeals to those who fear mathematics. The emphasis is on the applied aspects of statistics, and with the aid of statistical software called the Statistical Package for the Social Sciences (better known as SPSS), you need not worry too much about the intricacies of mathematical formulas. You do, however, need to know the different formulas used, what they mean and when they are used. Computation using these formulas has been kept to a minimum.

Two Kinds of Statistics

Statistics is all around you. Television uses a lot of statistics: for example, it may be reported that a total of 134 people died in traffic accidents during the holidays, that the stock market fell by 26 points, or that the number of violent crimes in the city has increased by 12%. Imagine a football game between Manchester United and Liverpool in which no one kept score! Without statistics you could not plan your budgets, pay your taxes, enjoy games to their fullest, evaluate classroom performance and so forth. Are you beginning to get the picture? We need statistics. Generally, there are two kinds of statistics:

Descriptive Statistics

Inferential Statistics

a) Descriptive Statistics

Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data. With descriptive statistics you are simply describing what is, or what the data show. Historically, descriptive statistics began during Roman times, when the empire undertook censuses of births, deaths, marriages and taxes.

Descriptive statistics are used to present quantitative descriptions in a manageable form. In a research study we may have many measures, or we may measure a large number of people on any one measure. Descriptive statistics help us to simplify large amounts of data in a sensible way. Each descriptive statistic reduces a large amount of data to a simpler summary. For instance, the Grade Point Average (GPA) describes the general performance of a student across a wide range of subjects or courses.

Descriptive statistics includes the construction of graphs, charts and tables and the calculation of various descriptive measures such as averages (mean) and measures of variation (standard deviation). The purpose of descriptive statistics is to summarise, arrange and present a set of data in a way that facilitates interpretation. Most of the statistical presentations appearing in newspapers and magazines are descriptive in nature.
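As a small illustration of the two descriptive measures named above, the sketch below computes a mean and a standard deviation in Python rather than SPSS. The ten scores are made-up values used only for illustration, and the variable names are assumptions of this example.

```python
import statistics

# Hypothetical mathematics test scores for ten students (illustrative data only).
scores = [45, 52, 58, 61, 64, 67, 70, 73, 78, 82]

mean_score = statistics.mean(scores)   # arithmetic average of the scores
sd_score = statistics.stdev(scores)    # sample standard deviation (n - 1 denominator)

print(f"number of students = {len(scores)}")
print(f"mean = {mean_score:.2f}")
print(f"standard deviation = {sd_score:.2f}")
```

The two numbers printed summarise all ten scores: the mean describes the typical score and the standard deviation describes how widely the scores are spread around it.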

b) Inferential Statistics

Inferential statistics, or statistical induction, comprises the use of statistics to make inferences concerning some unknown aspect of a population. Inferential statistics is relatively new. Major development began with the works of Karl Pearson (1857-1936) and Ronald Fisher (1890-1962), who published their findings in the early years of the 20th century. Since the work of Pearson and Fisher, inferential statistics has evolved rapidly and is now applied in many different fields and disciplines.

Inference is the act or process of deriving a conclusion based solely on what one already knows. In other words, you are trying to reach conclusions that extend beyond the data obtained from your sample to the population from which the sample was drawn. You are using methods for drawing, and measuring the reliability of, conclusions about a population based on information obtained from a sample of that population. Among the widely used inferential statistical tools are the t-test, analysis of variance, Pearson's correlation, linear regression and multiple regression.

c) Descriptive or Inferential Statistics

Descriptive statistics and inferential statistics are interrelated. You must almost always use techniques of descriptive statistics to organise and summarise the information obtained from a sample before carrying out an inferential analysis. Furthermore, the preliminary descriptive analysis of a sample often reveals features that lead you to the choice of the appropriate inferential method.

As you proceed through this course, you will obtain a more thorough understanding of the principles of descriptive and inferential statistics. You should establish from the beginning the intent of your study. If the intent of your study is to examine and explore the data obtained for its own intrinsic interest only, the study is descriptive. However, if the information is obtained from a sample of a population and the intent of the study is to use that information to draw conclusions about the population, the study is inferential. Thus, a descriptive study may be performed on a sample as well as on a population. Only when an inference is made about the population, based on data obtained from the sample, does the study become inferential.

LEARNING ACTIVITY

a) Define statistics

b) Explain the differences between descriptive and

inferential statistics

c) When would you use the two types of statistics?

d) Explain two ways in which descriptive statistics and

inferential statistics are interrelated.

Variables

Before you can use any statistical tool to analyse your data, you obviously need to have data, which has to be collected. What is data? Data is defined as pieces of information which, when processed or analysed, enable interpretation. Quantitative data consists of numbers while qualitative data consists of words and phrases. For example, the scores obtained by 30 students in a mathematics test are data. To explain the performance of the 30 students you need to process or analyse the scores (or data) manually or using a calculator or computer. We collect and analyse data to explain phenomena. A phenomenon is explained based on the interaction between two or more variables. The following is an example of a phenomenon:

Intelligence Quotient (IQ) and Attitude Influence Performance in Mathematics

Note that there are three variables explaining this particular phenomenon, namely Intelligence Quotient, Attitude and Mathematics Performance.

What is a Variable?

A variable is a construct that is deliberately and consciously invented or adopted for a special scientific purpose. For example, the variable "Intelligence" is a construct based on observation of presumably intelligent and less intelligent behaviours. Intelligence can be specified by observing and measuring it using intelligence tests, or by interviewing teachers about intelligent and less intelligent students. Basically, a variable is something that "varies" and has a value. A variable is a symbol to which numerals or values are assigned. For example, the variable "mathematics performance" is assigned scores obtained from performance on a mathematics test and may vary or range from 0 to 100.

A variable can be either a continuous variable (referred to in this chapter as an ordinal variable), whose values fall along an ordered scale, as with the mathematics scores above, or a categorical variable (nominal variable), whose values are labels with no inherent order. In the case of the variable "gender" there are only two values, male and female, so it is called a categorical or nominal variable. Other examples of categorical variables are graduate-nongraduate, low income-high income and citizen-noncitizen. There are also categorical variables which have more than two values, such as religion, which may have several values such as Islam, Christianity, Sikhism, Buddhism and Hinduism.

When you use any statistical tool you should be very clear which variables have been identified as independent variables and which as dependent variables.

a) Independent Variable

An independent variable (IV) is the variable that is presumed to cause a change in the dependent variable (DV). The independent variable is the antecedent while the dependent variable is the consequent. See the figure below, which describes a study to determine which teaching method (independent variable) is effective in enhancing the academic performance in history (dependent variable) of students.

An independent variable (teaching method) can be manipulated. "Manipulated" means the variable can be manoeuvred, and in this case it is divided into "discovery method" and "lecture method". Other examples of independent variables are gender (male-female), race (Malay, Chinese, Indian) and socioeconomic status (high, middle, low). Other names for the independent variable are treatment, factor and predictor variable.

b) Dependent Variable

The dependent variable in this study is academic performance, which cannot be manipulated by the researcher. Academic performance is a score; other examples of dependent variables are IQ (score from IQ tests), attitude (score on an attitude scale), self-esteem (score from a self-esteem test) and so forth. Other names for the dependent variable are outcome variable, results variable and criterion variable. Put another way, the DV is the variable predicted to, whereas the IV is the variable predicted from. The DV is the presumed effect, which varies with changes or variation in the independent variable.

Operational Definition of Variables

As mentioned earlier, a variable is "deliberately" constructed for a specific purpose. Hence, a variable used in your study may be different from a variable used in another study even though they have the same name. For example, the variable "academic achievement" used in your study may be computed based on performance in the UPSR examination, while in another study it may be computed using a battery of tests developed by the researcher. Operational definition (Bridgman, 1927) means that the variables used in a study must be defined as they are used in the context of that study. This is done to facilitate measurement and to eliminate confusion.

Thus, it is essential that you stipulate clearly how you have defined the variables specific to your study. For example, in an experiment to determine the effectiveness of the discovery method in teaching science, the researcher will have to explain in great detail the variable "discovery method" used in the experiment. Even though there are general principles of the discovery method, its application in the classroom may vary. In other words, you have to define the variable operationally, that is, as it is used in the experiment.

LEARNING ACTIVITY

a) What is a variable?

b) Explain the differences between an ordinal and nominal

variable

c) Why should variables be operationally defined?

Sampling

Every day we make judgements and decisions based on samples. For example, when you pick a grape and taste it before buying the whole bunch, you are sampling. Based on the one grape you have tasted, you decide whether all the grapes on display are fresh and sweet. Similarly, a teacher who asks a student two or three questions is attempting to determine his or her grasp of an entire subject. People are not usually aware that such a pattern of thinking is sampling.

Population (Universe) is defined as an aggregate of people, objects, items, etc. possessing common characteristics. It is the complete group of people, objects, items, etc. that we want to study. Every person, object, item, etc. has certain specified attributes. In Figure 1.1, the population consists of #, $, @, & and %.

Sample is that part of the population or universe we select for the purpose of investigation. The sample is used as an "example"; in fact, the word sample derives from the Latin exemplum, which means example. A sample should exhibit the characteristics of the population or universe; it should be a "microcosm", a word which literally means "small universe". In Figure 1.1, the sample also consists of one each of #, $, @, & and %.

We use samples to make inferences about the population. Reasoning from a sample to the population is called statistical induction. Based on the characteristics of a specifically chosen sample (the small part of the population that we actually observe), we make inferences concerning the characteristics of the population. We measure the trait or characteristic in a sample and generalise the finding to the population from which the sample was taken.

Figure 1.1 Drawing a sample from the population

Population: # & @ $ % $ % # @ & @ # $ % &

Sample: @ # % &

Why is a sample used in educational research?

The study of a sample offers several advantages over a complete study of the universe. Why and when is it desirable to study a sample rather than the population or universe?

In most studies, investigation of a sample is the only way of finding out about a particular phenomenon. Because of financial, time and physical constraints, it is often practically impossible to study the whole population.

If one were to study the population, every item in the population would have to be studied. Imagine having to study 500,000 Form V students in Malaysia, and what that would cost! Even if you had the money to study the population of Form V students, it might take so much time that the findings would be of no use by the time they became available.

Studying the population may not be necessary, since we have sound sampling techniques that will yield satisfactory results. Of course, we cannot expect a sample to give exactly the same answer that would be obtained from studying the whole population.

However, by using statistics we can establish, based on the results obtained from a sample, the limits within which the true answer lies with a known probability. We are able to generalise logically and precisely about phenomena which we have never seen, simply based upon a sample of, say, 200 students.
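The idea of "limits with a known probability" can be sketched in a few lines of Python. The example below uses one common approach, a 95% confidence interval based on the normal approximation; the chapter itself does not name a method, so this choice is an assumption, as are the simulated population of 50,000 scores and all variable names.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population: simulated test scores for 50,000 students (not real data).
population = [random.gauss(60, 12) for _ in range(50_000)]

# Draw a sample of 200 students without replacement.
sample = random.sample(population, 200)

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# z-value that leaves 2.5% in each tail of the normal distribution (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"sample mean       = {sample_mean:.2f}")
print(f"95% limits        = ({lower:.2f}, {upper:.2f})")
print(f"population mean   = {statistics.mean(population):.2f}")
```

In a real study the population mean in the last line would be unknown; it is printed here only because the population is simulated, so the limits obtained from 200 students can be compared with the answer a full study would give.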

LEARNING ACTIVITY

a) What is the difference between a population and sample?

b) Why is study of the population practically impossible?

c) “The sample should be representative of the population”

Explain.

d) Explain why a sample of 30 doctors from Kuala Lumpur

taken to estimate the average income of all Kuala

Lumpur residents is not representative.

Sampling Techniques

When students are asked how they selected the sample for their study, quite a

few are unable to explain convincingly the techniques used and rationale for selection

of the sample. If you have to draw a sample, you must choose the method for

obtaining the sample from the population. In making that choice, keep in mind that

the sample will be used to draw conclusions about the entire population.

Consequently, the sample should be a representative sample, that is, it should reflect

as closely as possible the relevant characteristics of the population under

consideration.

a) Simple Random Sampling

All individuals in the defined population have an equal and independent chance of being selected as a member of the sample. "Equal" means that each individual, event or object has the same probability of being selected; "independent" means that the selection of one individual does not affect in any way the selection of any other individual. Suppose, for example, that there are 10,000 Form 1 students in a particular district and you want to select a simple random sample of 500 students. When you select the first case, each student has one chance in 10,000 of being selected. Once that student is selected, each remaining student has one chance in 9,999 of being selected next. Thus, as each case is selected, the probability of being selected next changes slightly because the population from which we are selecting has become one case smaller.
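A minimal sketch of drawing such a sample by computer follows, assuming the 10,000 students are simply numbered 1 to 10,000 (a hypothetical sampling frame). It uses Python's standard `random.sample`, which selects without replacement.

```python
import random
from fractions import Fraction

random.seed(42)  # fixed seed so the illustration is repeatable

# Hypothetical sampling frame: student numbers 1 to 10,000.
population = list(range(1, 10_001))

# Chance of any given student being picked on the first and second draws.
first_draw_chance = Fraction(1, len(population))        # 1/10000
second_draw_chance = Fraction(1, len(population) - 1)   # 1/9999

# random.sample draws without replacement, so no student is selected twice.
sample = random.sample(population, 500)

print(f"first draw: {first_draw_chance}, second draw: {second_draw_chance}")
print(f"sample size: {len(sample)}, first five selected: {sample[:5]}")
```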

Using a Table of Random Numbers to select a sample. Obtain a list of all Form 1 students in Daerah Petaling and assign a number to each student. Then get a table of random numbers, which consists of a long series of 3- or 4-digit numbers generated randomly by a computer. Using the table, you randomly select a row or column as a starting point, then select all the numbers that follow in that row or column. If more numbers are needed, proceed to the next row or column until enough numbers have been selected to make up the desired sample.

Table of Random Numbers

Line      Column Number
Number    1    2    3    4    5    6    7
1         837  603  125  716  098  988  009
2         405  435  544  351  897  815  805
3         265  313  492  805  404  550  426
4         336  498  706  702  697  065  771
5         670  796  368  476  787  021  828
6         376  985  353  419  096  911  815
7         789  731  220  500  480  302  769

Say, for example, you choose line 3 and begin your selection. You will select student #265, followed by student #313 and student #492. When you come to "805" you skip the number because, in this illustration, you only need numbers between 1 and 500. You proceed to the next number, i.e. student #404. Again you skip "550" and proceed to select student #426. You continue in this way until you have selected enough students to form your sample. To avoid repetition, you also eliminate numbers that have occurred previously. If you have not found enough numbers by the time you reach the end of the line (or the bottom of the column), you move over to the next line or column.
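The table-reading procedure described above can be sketched as a short Python function. The seven rows are the numbers printed in the table, with the layout of seven numbers per line taken as an assumption; the function name `select_from_table` and its parameters are hypothetical names introduced for this example.

```python
# Rows of the table of random numbers, kept as text so leading zeros survive.
TABLE_ROWS = [
    "837 603 125 716 098 988 009",
    "405 435 544 351 897 815 805",
    "265 313 492 805 404 550 426",
    "336 498 706 702 697 065 771",
    "670 796 368 476 787 021 828",
    "376 985 353 419 096 911 815",
    "789 731 220 500 480 302 769",
]


def select_from_table(rows, start_line, max_id, sample_size):
    """Read along the table from start_line, skipping unusable numbers."""
    selected = []
    for row in rows[start_line - 1:]:
        for entry in row.split():
            number = int(entry)
            if number < 1 or number > max_id:
                continue  # outside the range of student numbers, so skip it
            if number in selected:
                continue  # already chosen, so skip it to avoid repetition
            selected.append(number)
            if len(selected) == sample_size:
                return selected
    return selected  # table exhausted before the sample was complete


chosen = select_from_table(TABLE_ROWS, start_line=3, max_id=500, sample_size=8)
print(chosen)  # [265, 313, 492, 404, 426, 336, 498, 65]
```

Starting at line 3, the function picks 265, 313 and 492, skips 805, picks 404, skips 550 and picks 426, exactly as in the walk-through, and then carries on into the next line.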

b) Systematic Sampling

Systematic sampling is random sampling with a system. From the sampling frame, a starting point is chosen at random, and thereafter subjects are chosen at regular intervals. Systematic sampling can be used if it can be ensured that the list of students in the accessible population is in random order. First, you divide the accessible population (1000) by the sample desired (100), which gives you a sampling interval of 10. Next, select at random a starting number no larger than the interval, i.e. between 1 and 10. If the random starting point is 8, then you select the eighth name on the list and every tenth name thereafter: the subjects selected are 8, 18, 28, 38, 48, 58, 68, 78 and so on up to 998, which gives you your sample of 100 subjects. This method differs from simple random sampling because each member of the population is not chosen independently. The advantage is that it spreads the sample more evenly over the population and it is easier to conduct than a simple random sample.
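A minimal sketch of systematic sampling in Python follows. It uses the usual convention in which the sampling interval is the population size divided by the sample size and the random start falls within the first interval; the function name `systematic_sample` and the numbered list standing in for the sampling frame are assumptions of this example.

```python
import random


def systematic_sample(frame, sample_size, rng):
    """Pick a random start within the first interval, then every k-th entry."""
    interval = len(frame) // sample_size   # e.g. 1000 // 100 = 10
    start = rng.randint(1, interval)       # random position from 1 to interval
    positions = range(start, len(frame) + 1, interval)
    return [frame[pos - 1] for pos in positions][:sample_size]


rng = random.Random(8)           # seeded generator so the illustration is repeatable
frame = list(range(1, 1001))     # hypothetical list of 1000 numbered students
sample = systematic_sample(frame, 100, rng)

print(f"start = {sample[0]}, next = {sample[1]}, last = {sample[-1]}")
print(f"sample size = {len(sample)}")
```

Because the list positions and the student numbers coincide here, the output shows the random start directly, followed by students spaced exactly ten apart.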

LEARNING ACTIVITY

a) Briefly discuss how you would select a sample of 300 teachers from a population of 5000 teachers in a district using systematic sampling.

b) What are some advantages of using systematic sampling?

LEARNING ACTIVITY

a) What is the meaning of random?

b) What is the simple random sampling technique?

c) Explain the use of the Table of Random Numbers in the random selection of a sample.

c) Stratified Sampling

In certain studies, the researcher wants to ensure that certain sub-groups or strata of individuals are included in the sample, and for this stratified sampling is preferred. For example, if you intend to study differences in reasoning skills among students in your school according to socio-economic status and gender, simple random sampling may not ensure that you have a sufficient number of male and female students at each socio-economic level. The size of the sample in each stratum is taken in proportion to the size of the stratum. This is called proportional allocation. Suppose that the population of students in your school is as shown below.

Male, High Income       160
Female, High Income     140
Male, Low Income        360
Female, Low Income      340
TOTAL                  1000

The first step is to find the total number of students (1000) and calculate the percentage in each group.

% male, high income = (160 / 1000) x 100 = 16%

% female, high income = (140 / 1000) x 100 = 14%

% male, low income = (360 / 1000) x 100 = 36%

% female, low income = (340 / 1000) x 100 = 34%

If you want a sample of 100 students you should ensure that:

16% should be male, high income = 16 students

14% should be female, high income = 14 students

36% should be male, low income = 36 students

34% should be female, low income = 34 students

When you take a sample from each stratum randomly, it is referred to as stratified random sampling. The advantage of stratified sampling is that it ensures better coverage of the population than simple random sampling. Also, it is often administratively more convenient to stratify a sample so that interviewers can be specifically trained to deal with a particular age group or ethnic group.
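The proportional allocation worked out above can be checked with a short Python sketch. The stratum sizes are those given in the school example; numbering the students 1 to N within each stratum is an assumption made so that a random draw can be shown.

```python
import random

random.seed(7)  # fixed seed so the illustration is repeatable

# Stratum sizes from the school example (total 1000 students).
strata_sizes = {
    "male, high income": 160,
    "female, high income": 140,
    "male, low income": 360,
    "female, low income": 340,
}
sample_size = 100
total = sum(strata_sizes.values())

# Proportional allocation: each stratum's share of the sample equals its
# share of the population (rounded to a whole number of students).
allocation = {
    name: round(size * sample_size / total)
    for name, size in strata_sizes.items()
}

# Stratified random sampling: draw randomly within each stratum.
# Students are hypothetically numbered 1..size inside their stratum.
stratified_sample = {
    name: random.sample(range(1, strata_sizes[name] + 1), count)
    for name, count in allocation.items()
}

for name, count in allocation.items():
    print(f"{name}: {count} students")
```

With other stratum sizes the rounded allocations may not add up exactly to the desired sample size, in which case one or two strata need a small manual adjustment.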

d) Cluster Sampling

In cluster sampling, the unit of sampling is not the individual but rather a naturally occurring group of individuals. Cluster sampling is used when it is more feasible or convenient to select groups of individuals than it is to select individuals from a defined population. Clusters are chosen to be as heterogeneous as possible, that is, the subjects within each cluster are diverse and each cluster is somewhat representative of the population as a whole. Thus, only a sample of the clusters needs to be taken to capture the variability in the population.

For example, in a particular district there are 10,000 households and they are clustered into 25 sections. In cluster sampling, you draw a random sample of 5 sections or clusters from the list of 25 sections or clusters. Then you study every household in each of the 5 sections or clusters. The main advantage of cluster sampling is that it saves time and money. However, it may be less precise than simple random sampling.
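The district example can be sketched in Python as follows. The text does not say how the 10,000 households are spread over the 25 sections, so equal sections of 400 households are assumed here, and the household labels are invented for illustration.

```python
import random

random.seed(3)  # fixed seed so the illustration is repeatable

# Hypothetical district: 25 sections of 400 households each (equal sizes assumed).
sections = {
    section: [f"S{section:02d}-H{house:03d}" for house in range(1, 401)]
    for section in range(1, 26)
}

# Stage 1: draw a random sample of 5 sections (the clusters).
chosen_sections = random.sample(sorted(sections), 5)

# Stage 2: study every household in each chosen section.
cluster_sample = [
    household
    for section in chosen_sections
    for household in sections[section]
]

print(f"sections chosen: {sorted(chosen_sections)}")
print(f"households to study: {len(cluster_sample)}")
```

Note that the random draw is made on the list of sections, not on the list of households; this is what distinguishes cluster sampling from simple random sampling.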

LEARNING ACTIVITY

Male, full-time teachers = 90

Male, part-time teachers = 18

Female, full-time teachers = 63

Female, part-time teachers = 9

The data above shows the number of full-time and part-time

teachers in a school according to gender.

Select a sample of 40 teachers using stratified sampling.

SUMMARY

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data.

Fear of mathematics can lead to a defeatist attitude which may affect the way you approach statistics.

Descriptive statistics includes the construction of graphs, charts and tables and the calculation of various descriptive measures such as averages (mean) and measures of variation (standard deviation).

Inferential statistics, or statistical induction, comprises the use of statistics to make inferences concerning some unknown aspect of a population.

A variable is a construct that is deliberately and consciously invented or adopted for a special scientific purpose.

A variable can be either a continuous variable (ordinal variable) or a categorical variable (nominal variable).

An independent variable (IV) is the variable that is presumed to cause a change in the dependent variable (DV).

The dependent variable (DV) is the presumed effect, which varies with changes or variation in the independent variable.

Operational definition means that the variables used in a study must be defined as they are used in the context of that study.

Population (Universe) is defined as an aggregate of people, objects, items, etc. possessing common characteristics, while a sample is that part of the population or universe we select for the purpose of investigation.

Simple random sampling: All individuals in the defined population have an equal and independent chance of being selected as a member of the sample.

In a stratified sample, the sampling frame is divided into non-overlapping groups or strata and a sample is taken from each stratum.

Systematic sampling is random sampling with a system. From the sampling frame, a starting point is chosen at random, and thereafter subjects are chosen at regular intervals.

In cluster sampling, the unit of sampling is not the individual but rather a naturally occurring group of individuals.

KEY WORDS:

Statistics

Descriptive statistics

Inferential statistics

Variable

Nominal variable

Ordinal variable

Independent variable

Dependent variable

Sampling

Random sampling

Systematic sampling

Stratified sampling

Cluster sampling

REFERENCES

Johnson, R. & Kuby, P. (2007). Elementary Statistics. Singapore: Thomson Brooks/Cole.