TWO GROUPS DISCRIMINANT ANALYSIS: BASIC EXPLORATION AND APPLICATION IN PSYCHOLOGY DATA
LEE SAN JING
UNIVERSITI TEKNOLOGI MALAYSIA
This thesis is submitted in fulfillment of the requirement for the award of the
degree of Bachelor of Science and Education (Mathematics)
Faculty of Education
Universiti Teknologi Malaysia
MAY 2006
To my beloved family, brothers, and friends.
ACKNOWLEDGEMENT
First of all, thank you, Lord. With all HIS guidance and the assets HE gave
me, I was able to complete my thesis successfully.
I would like to take this opportunity to express my deepest gratitude to my
thesis supervisor, Pn. Haliza Abdul Rahman, for her guidance and suggestions.
I thank her for the time and effort she gave to help me complete this thesis
successfully.
I would also like to acknowledge Pn. Noraslinda Mohamed Ismail for her
guidance and help, and for her kindness in giving her time and effort to
help me.
I would also like to express my love and gratitude to my family members, who
have been with me every step of the way and have given me their support. I want
to thank my course mates, roommate, and friends for their great support and
encouragement. Thank you for always being there to share all my happiness and
sorrow.
Last but not least, I appreciate all the support and guidance provided by
everyone who has helped me, directly or indirectly. Thank you all.
ABSTRACT
Multivariate analysis is an extension of univariate analysis that studies two
or more variables simultaneously. Discriminant analysis is one of the techniques
of multivariate analysis. It is designed for use when the dependent variable is
categorical and the independent variables are metric. This report discusses
discriminant analysis for testing differences between groups in terms of the
values of the independent variables. It also covers the development of a linear
discriminant function rule for classifying individuals into the defined groups.
The focus is on two-group discriminant analysis using the simultaneous (direct)
estimation method. The error rate is also computed to evaluate the performance
of the classification procedure. Discriminant analysis is applied to educational
psychology data, the Wechsler Adult Intelligence Scale (WAIS) subtest scores,
using the Statistical Package for the Social Sciences (SPSS) Version 10.0.
ABSTRAK
Multivariate analysis is an extension of univariate analysis that involves two
or more variables simultaneously. Discriminant analysis, which is one of the
methods of multivariate analysis, is used when the dependent variable is
categorical and the independent variables are metric. This report discusses
discriminant analysis for testing differences between groups using the values
of the independent variables. It also involves forming a Linear Discriminant
Function Rule to classify individuals into known groups. The report focuses on
discriminant analysis involving only two groups and using the Simultaneous
(Direct) Estimation Method. The error rate is also computed to evaluate the
classification procedure. Finally, discriminant analysis is applied to
educational psychology data from the Wechsler Adult Intelligence Scale subtest
scores using the Statistical Package for the Social Sciences (SPSS) 10.0.
TABLE OF CONTENTS
CHAPTER ITEM
THESIS VALIDATION FORM
SUPERVISOR’S DECLARATION
TITLE
DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Introduction
1.2 Objectives of Study
1.3 Scope of Study
1.4 Thesis Organization
2 DISCRIMINANT ANALYSIS
2.1 Introduction
2.2 Goal of Discriminant Analysis
2.3 Application of Discriminant Analysis
2.4 Variables in Discriminant Analysis
2.5 Discriminant Rule
2.5.1 A Maximum Likelihood Rule
2.5.2 The Linear Discriminant Function Rule
2.5.3 A Mahalanobis Distances Rule
2.5.4 The Prior Probability Rule
2.6 Assumptions in Discriminant Analysis
2.6.1 Multivariate Normality
2.6.2 Common Covariance Matrices
3 TWO GROUPS DISCRIMINANT ANALYSIS
3.1 Introduction
3.2 Statistical Tests in Discriminant Analysis
3.2.1 Testing Differences Between Two Group Centroids Using Hotelling’s $T^2$
3.2.2 The Significance of Discriminating Variables
3.3 Discriminant Function for Classification
3.3.1 Simultaneous Direct Methods
3.3.2 Stepwise Methods
3.4 Discriminant Score
3.5 Cutoff Score Determination
3.6 Evaluating Discriminant Function
3.6.1 Classification Matrix
3.6.2 Estimating Probabilities of Misclassification
3.7 Linear Discriminant Function for Two Multivariate Normal Populations with
Known Parameters when $\Sigma_1 = \Sigma_2 = \Sigma$
3.8 Linear Discriminant Function for Samples of Two Multivariate Normal
Populations with Known Parameters when $\Sigma_1 = \Sigma_2 = \Sigma$
4 DATA ANALYSIS USING DISCRIMINANT ANALYSIS
4.1 Description of Data
4.2 Data Analysis Using SPSS
4.2.1 Introduction to SPSS for Windows Version 10.0
4.2.2 Steps of Data Analysis Using SPSS
4.3 Results of Study Using Discriminant Analysis
4.3.1 Test of Normality
4.3.1.1 Statistical Method
4.3.1.2 Graphical Method
4.3.2 Box’s M
4.3.3 Group Statistics
4.3.4 Tests of Equality of Group Means
4.3.5 Covariance Matrices
4.3.6 Eigenvalues
4.3.7 Wilks’ Lambda
4.3.8 Standardized Canonical Discriminant Function Coefficients
4.3.9 Structure Matrix
4.3.10 Unstandardized Discriminant Function Coefficients
4.3.11 Function at Group Centroids
4.3.12 Classification Result
5 CONCLUSION AND RECOMMENDATION
5.1 Conclusion
5.2 Recommendation
REFERENCES
LIST OF TABLES
TABLE NO. TITLE
1.1 Matrix
1.2 Multivariate Techniques
3.1 Computation of Discriminant Score
3.2 Classification Matrix
4.1 Wechsler Adult Intelligence Scale Subtest Scores
4.2 Descriptions of WAIS Subtests
4.3 Test of Normality
4.4 Log Determinants
4.5 Results of Box’s M
4.6 Mean and Standard Deviation of Variables in Two Groups
4.7 ANOVA Table
4.8 Covariance Matrices of Two Groups
4.9 Pooled Within-Groups Matrices
4.10 Results of Eigenvalues
4.11 Significance Test Using Wilks’ Lambda
4.12 Standardized Canonical Discriminant Function Coefficients
4.13 Structure Matrix
4.14 Unstandardized Coefficients
4.15 Group Centroids
4.16 Classification Results
LIST OF FIGURES
FIGURE NO. TITLE
2.1 Plots of Skewness and Kurtosis
2.2 Normal Q-Q Plot
2.3 Histogram Plot
3.1 Plot of Discriminant Scores
4.1 Empty Data Editor
4.2 SPSS Data Editor Box for Definition of Variables
4.3 SPSS Data Editor Box to Define the Dependent Variable and Independent Variables
4.4 Value Label Box for “group”
4.5 Entering Data in the Data View
4.6 Starting Discriminant Analysis
4.7 Discriminant Analysis Dialog Box
4.8 Discriminant Analysis: Define Range of Grouping Variable
4.9 Discriminant Analysis Box with the “Enter variables together”
4.10 Discriminant Analysis Statistics for Descriptives, Function Coefficients and Matrices
4.11 Discriminant Analysis: Classification
4.12 Discriminant Analysis: Save
4.13 Output of the Analysis
4.14 Normal Q-Q Plots of Variables
4.15 Plot of Histogram
LIST OF ABBREVIATIONS
$c(1|2)$ - Cost when an observation from $\pi_2$ is incorrectly classified as $\pi_1$
$c(2|1)$ - Cost when an observation from $\pi_1$ is incorrectly classified as $\pi_2$
$d_i$ - Mahalanobis distance
$\Sigma$ - Covariance matrix
$H_0$ - Null hypothesis
$H_a$ - Alternative hypothesis
$\hat{m}$ - Cutoff value
$n_i$ - Sample size
NG - Number of groups
$p$ - Number of predictor variables
$p_1$ - Prior probability of belonging to $\pi_1$
$p_2$ - Prior probability of belonging to $\pi_2$
$S_{pooled}$ - Pooled sample variance-covariance matrix
$S_{pooled}^{-1}$ - Inverse of the pooled sample variance-covariance matrix
$S_i$ - Sample variance-covariance matrices
$SS_w$ - Within-groups sum of squares
$SS_t$ - Total sum of squares
$\mu_i$ - True mean vector
$W_i$ - Discriminant coefficient for variable i
WAIS - Wechsler Adult Intelligence Scale
$X_i$ - Values of independent variable i
$x$ - Predictor variables or independent variables
$\bar{x}_i$ - Estimated mean vector
$x_0$ - New observation
$y$ - Dependent variable
$\bar{y}_i$ - Group centroids or group means
$Z_n$ - Discriminant score for the nth individual
$\pi_i$ - Populations
$\alpha$ - Significance level
$\Lambda$ - Wilks’ Lambda
SPSS - Statistical Package for the Social Sciences
CHAPTER 1
INTRODUCTION
1.1 Introduction
Multivariate data occur in all branches of science. An experimental unit is
an object that can be measured in some way. The objects may be items, persons,
organizations, events, and so on. Measuring and evaluating experimental units are
two principal activities of most researchers. Multivariate data result whenever a
researcher measures more than one variable on each experimental unit. The
variables, sometimes called characteristics or properties, are the aspects of the
objects that are measured. Multivariate data, whether metric (interval and ratio
scale) or nonmetric (nominal and ordinal scale), are typically arranged in a
structured array called a matrix.
Table 1.1 : Matrix
                        Variables
Objects   1          2          3          …   p
1         $x_{11}$   $x_{12}$   $x_{13}$   …   $x_{1p}$
2         $x_{21}$   $x_{22}$   $x_{23}$   …   $x_{2p}$
3         $x_{31}$   $x_{32}$   $x_{33}$   …   $x_{3p}$
.         .          .          .              .
.         .          .          .              .
.         .          .          .              .
n         $x_{n1}$   $x_{n2}$   $x_{n3}$   …   $x_{np}$
Here p denotes the number of variables and n denotes the number of objects in
the sample. Each column of the matrix is a variable; each row corresponds to an
object.
Multivariate techniques are increasingly popular. Many multivariate techniques
are extensions of univariate analysis (the analysis of a single distribution)
and bivariate analysis. Multivariate statistics are the complete or general
case, while univariate and bivariate statistics are special cases of the
multivariate model (Tabachnick, B.G., 2001). The term “univariate analysis”
refers to an analysis that involves a single dependent variable and possibly
multiple independent variables. “Bivariate analysis” is an analysis of the
relationship between two variables. With multivariate analysis, we can
simultaneously analyze multiple dependent and multiple independent variables.
Multivariate methods are a collection of procedures for analyzing associations
between two or more sets of measurements that have been made on each object in
one or more samples of objects. They are extremely useful for helping
researchers make sense of large, complicated data sets that consist of many
variables measured on large numbers of experimental units. Multivariate methods
are widely applied in industry, government and research centers. They are
becoming widely used because research questions are often too complicated for
univariate analysis and because computer software packages for performing the
analyses are available. The availability of computer packages such as SPSS,
S-Plus, and SAS has made it easier for statisticians and researchers to process
large and complex databases with multivariate analysis.
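As a small illustration of the n × p data matrix of Table 1.1, the following
Python sketch stores objects as rows and variables as columns. It is not part
of the SPSS workflow used in this thesis; the use of NumPy, the variable names,
and the numbers themselves are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical measurements: n = 4 objects (rows), p = 3 variables (columns).
X = np.array([
    [10.0, 12.0,  9.0],
    [11.0, 14.0,  8.0],
    [ 9.0, 10.0,  7.0],
    [12.0, 13.0, 10.0],
])

n, p = X.shape                        # number of objects, number of variables
x_23 = X[1, 2]                        # element x_{23}: object 2, variable 3 (0-based indexing)
mean_vector = X.mean(axis=0)          # one sample mean per variable (length p)
cov_matrix = np.cov(X, rowvar=False)  # p x p sample variance-covariance matrix

print(n, p)
print(mean_vector)
```

Keeping objects in rows and variables in columns matches the layout of
Table 1.1, so a column mean corresponds to the mean of one variable.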
Multivariate methods are sometimes classified as “variable-directed”
techniques. These techniques are primarily concerned with the relationships that
might exist among the measured response variables. Multivariate methods are used
principally for dependence analysis and independence analysis. In methods
concerned with dependence analysis, a set of dependent variables is predicted by
other, independent variables. In an independence analysis, however, no single
variable or group of variables is defined as independent or dependent. Such
methods instead examine the interrelationships among all the variables.
Table 1.2 : Multivariate Techniques
Dependence Techniques Independence Techniques
• Canonical Correlation Analysis
• Conjoint Analysis
• Multiple Regression
• Multivariate Analysis of
Variance (MANOVA)
• Structural Equation Modeling
• Cluster Analysis
• Correspondence Analysis
• Discriminant Analysis
• Factor Analysis
• Linear Probability Models
• Multidimensional Scaling
• Principal Components Analysis
In this report, discriminant analysis, which is one of the multivariate
methods, is discussed. Discriminant analysis can be regarded as a special case
of multiple regression. This technique is designed for use with metric
independent variables and a nonmetric (categorical) dependent variable.
Discriminant analysis is similar to regression analysis except that the
dependent variable is categorical rather than continuous. In regression, we
want to be able to predict the value of a variable of interest based on a set
of predictor variables. In discriminant analysis, we want to be able to predict
the class membership of an individual based on a set of predictors
(Dallas E. Johnson, 1998).
Further details of discriminant analysis are discussed in the following
chapter.
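The idea of predicting class membership from a set of predictors can be
sketched in a few lines of Python. This is a minimal illustration of a
two-group linear discriminant function with a midpoint cutoff, assuming equal
covariance matrices, equal prior probabilities and equal misclassification
costs. The function names, the use of NumPy, and the data are hypothetical and
are not taken from the thesis, which carries out its analysis in SPSS.

```python
import numpy as np

def fit_two_group_lda(X1, X2):
    """Return (a, m): discriminant coefficients and the midpoint cutoff."""
    n1, n2 = len(X1), len(X2)
    mean1, mean2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled sample variance-covariance matrix of the two groups.
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
                + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    # Coefficients a = S_pooled^{-1} (mean1 - mean2), without forming the inverse.
    a = np.linalg.solve(S_pooled, mean1 - mean2)
    # Cutoff halfway between the two group centroids on the discriminant axis.
    m = 0.5 * a @ (mean1 + mean2)
    return a, m

def classify(X, a, m):
    """Assign each row to group 1 if its discriminant score is at least m, else group 2."""
    scores = np.atleast_2d(X) @ a
    return np.where(scores >= m, 1, 2)

# Hypothetical data: two predictors measured on four individuals per group.
X1 = np.array([[6.0, 7.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]])
X2 = np.array([[2.0, 3.0], [3.0, 2.0], [3.0, 4.0], [4.0, 3.0]])

a, m = fit_two_group_lda(X1, X2)
predicted = classify(np.vstack([X1, X2]), a, m)
actual = np.array([1] * len(X1) + [2] * len(X2))
error_rate = np.mean(predicted != actual)   # apparent (resubstitution) error rate
print(a, m, error_rate)
```

Because the error rate here is computed on the same observations used to fit
the rule, it tends to be optimistic; it is shown only to make the notion of an
error rate concrete.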
1.2 Objectives of Study
In general, the objectives of this study are:
1. To learn and understand discriminant analysis between two groups:
(i) To test for significant differences between the two defined groups with
respect to a set of predictors.
(ii) To identify the predictor variables that best discriminate between the two
groups.
(iii) To evaluate how well the discriminant rule can be used to classify
individuals into one of the two existing groups.
2. To apply discriminant analysis in the field of educational psychology.
3. To learn and use the statistical computer software Statistical Package for
the Social Sciences (SPSS) in analyzing data.
1.3 Scope of Study
This study discusses only two-group discriminant analysis with a categorical
dependent variable and metric independent variables. Hotelling’s $T^2$ is used
to test for a significant difference between the two group centroids. The
simultaneous estimation method is used to derive the discriminant function. The
linear discriminant rule is discussed with an example of calculation.
Discriminant analysis is applied to data from the field of educational
psychology, the Wechsler Adult Intelligence Scale (WAIS) subtest scores, using
SPSS.
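To make the test of the two group centroids concrete, the following Python
sketch computes the standard two-sample Hotelling’s $T^2$ statistic and its
F transformation, assuming a common covariance matrix in the two groups. The
function name, the use of NumPy, and the data are hypothetical and serve only
as an illustration; they are not the thesis’s own computation.

```python
import numpy as np

def hotelling_t2(X1, X2):
    """Return (T2, F, df1, df2) for the two-sample Hotelling's T^2 test."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)      # difference of group centroids
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
                + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    # T^2 = (n1 n2 / (n1 + n2)) * diff' S_pooled^{-1} diff
    t2 = (n1 * n2) / (n1 + n2) * (diff @ np.linalg.solve(S_pooled, diff))
    # Under H0, F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) p) * T^2 ~ F(p, n1 + n2 - p - 1)
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = df2 / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat, df1, df2

# Hypothetical data: two predictors measured on four individuals per group.
X1 = np.array([[6.0, 7.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]])
X2 = np.array([[2.0, 3.0], [3.0, 2.0], [3.0, 4.0], [4.0, 3.0]])

t2, f_stat, df1, df2 = hotelling_t2(X1, X2)
print(t2, f_stat, df1, df2)
```

The F statistic would then be compared with the critical value of the F
distribution with df1 and df2 degrees of freedom at the chosen significance
level; a large value leads to rejecting the hypothesis of equal centroids.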
1.4 Thesis Organization
Chapter 1 gives a general introduction to multivariate analysis, followed by
the objectives of the study, the scope of the study and the thesis organization.
Chapter 2 introduces discriminant analysis. The objectives of discriminant
analysis and its applications are discussed in this chapter. It also covers some
limitations on the dependent and independent variables, four different
discriminant rules with an example of calculation, and the assessment of the
assumptions of discriminant analysis.
Chapter 3 discusses a hypothesis test for differences between two group
centroids, and the linear discriminant function for two multivariate normal
populations with known parameters when $\Sigma_1 = \Sigma_2 = \Sigma$. It also
discusses the determination of the cutoff score, the derivation of the
discriminant function used to compute discriminant scores, and the evaluation
of the discriminant function.
Chapter 4 discusses the application of discriminant analysis in educational
psychology, using WAIS subtest scores from the Senile and No Senile groups. It
also includes every essential step in using the statistical software SPSS
version 10.0 for discriminant analysis, and the results of the data analysis.
Chapter 5 presents the conclusion of the whole study and some recommendations
for those who are interested in pursuing the study of discriminant analysis.