Abdm4064 week 11 data analysis
Data Analysis
ABDM4064 BUSINESS RESEARCH
by Stephen Ong
Principal Lecturer (Specialist)
Visiting Professor, Shenzhen University
LEARNING OUTCOMES
After studying this chapter, you should be able to:
1. Know when a response is really an error and should be edited
2. Appreciate the coding of purely qualitative research
3. Understand the way data are represented in a data file
4. Understand the coding of structured responses, including a dummy variable approach
5. Appreciate the ways that technological advances have simplified the coding process
6. Know what descriptive statistics are and why they are used
7. Create and interpret simple tabulation tables
8. Understand how cross-tabulations can reveal relationships
9. Perform basic data transformations
10. List different computer software products designed for descriptive statistical analysis
11. Understand a researcher’s role in interpreting the data
12. Implement the hypothesis-testing procedure
13. Use p-values to assess statistical significance
14. Test a hypothesis about an observed mean compared to some standard
15. Know the difference between Type I and Type II errors
16. Know when a univariate χ² test is appropriate and how to conduct one
17. Recognize when a bivariate statistical test is appropriate
18. Calculate and interpret a χ² test for a contingency table
19. Calculate and interpret an independent samples t-test comparing two means
20. Understand the concept of analysis of variance (ANOVA)
21. Interpret an ANOVA table
22. Apply and interpret simple bivariate correlations
23. Interpret a correlation matrix
24. Understand simple (bivariate) regression
25. Understand the least-squares estimation technique
26. Interpret regression output including the tests of hypotheses tied to specific parameter coefficients
27. Understand what multivariate statistical analysis involves and know the two types of multivariate analysis
28. Interpret results from multiple regression analysis
29. Interpret results from multivariate analysis of variance (MANOVA)
30. Interpret basic exploratory factor analysis results
31. Know what multiple discriminant analysis can be used to do
32. Understand how cluster analysis can identify market segments
Remember this,
Garbage in, garbage out! If data is collected improperly, or coded incorrectly, then the research results are “garbage”.
Stages of Data Analysis
Raw Data
The unedited responses from a respondent exactly as indicated by that respondent.
Nonrespondent Error
Error that the respondent is not responsible for creating, such as when the interviewer marks a response incorrectly.
Data Integrity
The notion that the data file actually contains the information that the researcher is trying to obtain to adequately address research questions.
EXHIBIT 19.1 Overview of the Stages of Data Analysis
Editing
The process of checking the completeness, consistency, and legibility of data and making the data ready for coding and transfer to storage.
E.g., “How long have you stayed at your current address?” Answer: 45.
The researchers may need to adjust or reconstruct such responses (e.g., when the answer is inconsistent with other information the respondent gave).
Field Editing – useful in personal interviews
Preliminary editing by a field supervisor on the same day as the interview to catch technical omissions, check legibility of handwriting, and clarify responses that are logically or conceptually inconsistent.
In-House Editing
A rigorous editing job performed by a centralized office staff.
Editing: What to Do?
Checking for Consistency
Check that respondents match the defined population – e.g., SBS?
Check for consistency within the data collection framework – e.g., items listed by the respondents are within the definition.
Taking Action When a Response Is Obviously in Error
Change or correct responses only when there are multiple pieces of evidence for doing so.
Editing Technology
Computer routines can check for consistency automatically.
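The Python sketch below shows a minimal version of such a routine. It assumes hypothetical fields `age` and `years_at_address` and applies one illustrative rule: years at the address cannot exceed the respondent's age. The routine only flags records for an editor to review and does not change them. This fits the rule of correcting responses only when there is supporting evidence.

```python
# Hypothetical survey records; field names and values are illustrative only.
records = [
    {"id": 1, "age": 52, "years_at_address": 10},
    {"id": 2, "age": 27, "years_at_address": 45},  # longer than the respondent has been alive
    {"id": 3, "age": 35, "years_at_address": 35},
]

def flag_inconsistent(rows):
    """Return IDs of records whose years at the address exceed the respondent's age."""
    return [r["id"] for r in rows if r["years_at_address"] > r["age"]]

print(flag_inconsistent(records))  # [2]
```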
Editing for Completeness
Item Nonresponse
The technical term for an unanswered question on an otherwise complete questionnaire resulting in missing data.
Most of the time, researchers do nothing about it. But sometimes the question is linked to another question, so the researchers have to fill in the blank.
Plug Value
An answer that an editor “plugs in” to replace blanks or missing values so as to permit data analysis.
Choice of value is based on a predetermined decision rule, e.g., take an average value or a neutral value.
Several choices: leave it blank; plug in alternate choices; randomly select an answer; impute a missing value.
Impute
To fill in a missing data point through the use of a statistical process providing an educated guess for the missing response based on available information.
That is, the guess is based on the respondent’s answers to other questions.
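The plug-value and imputation options can be sketched in Python as follows. The ratings are hypothetical, and so are both decision rules: the item-mean rule and the "mean of the respondent's own answers" rule. A neutral-value rule would instead plug in the scale midpoint (e.g., 3 on a 5-point scale). Statistical imputation methods used in practice are usually more sophisticated than this.

```python
from statistics import mean

# Hypothetical 5-point ratings for four items; None marks an item nonresponse.
responses = {
    "r1": [4, 5, None, 4],
    "r2": [2, 2, 3, 1],
    "r3": [5, None, 4, 5],
}

def plug_with_item_mean(data, item):
    """Plug value: fill blanks on one item with that item's mean across respondents."""
    observed = [row[item] for row in data.values() if row[item] is not None]
    fill = mean(observed)
    return {rid: fill if row[item] is None else row[item] for rid, row in data.items()}

def impute_from_own_answers(row):
    """Simple imputation: fill a respondent's blanks with the mean of their other answers."""
    fill = mean(x for x in row if x is not None)
    return [fill if x is None else x for x in row]

print(plug_with_item_mean(responses, 2))  # {'r1': 3.5, 'r2': 3, 'r3': 4}
print(impute_from_own_answers(responses["r3"]))
```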
Editing for Completeness (cont’d)
What about missing data? Many statistical software programs require complete data for an analysis to take place.
List-wise deletion
The entire record for a respondent who has left a response missing is excluded from use in statistical analysis.
Pair-wise deletion
Only the variables for which a respondent provided no information are excluded from a given analysis; the respondent’s other answers are still used.
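The Python sketch below contrasts the two deletion approaches on hypothetical data, where `None` marks a missing answer. List-wise deletion keeps only complete records. Pair-wise deletion keeps a respondent for any analysis whose variables that respondent answered.

```python
# Hypothetical responses; None marks a missing answer.
data = [
    {"q1": 4, "q2": 3, "q3": 5},
    {"q1": 2, "q2": None, "q3": 4},
    {"q1": 5, "q2": 4, "q3": None},
    {"q1": 3, "q2": 2, "q3": 3},
]

def listwise(rows):
    """List-wise deletion: keep only respondents who answered every variable."""
    return [r for r in rows if all(v is not None for v in r.values())]

def pairwise(rows, a, b):
    """Pair-wise deletion: for an analysis of a and b, drop a respondent only if a or b is missing."""
    return [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]

print(len(listwise(data)))              # 2 complete records
print(len(pairwise(data, "q1", "q3")))  # 3 usable pairs for q1 vs q3
```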
Please take note:
When a questionnaire has too many missing answers, it may not be suitable for the planned data analysis. In such a situation, that questionnaire has to be dropped from the sample.
Facilitating the Coding Process
Editing and Tabulating “Don’t Know” Answers
Legitimate don’t know (no opinion)
Reluctant don’t know (refusal to answer)
Confused don’t know (does not understand)
Editing (cont’d)
Pitfalls of Editing
Allowing subjectivity to enter into the editing process. Data editors should be intelligent, experienced, and objective. A systematic procedure for assessing the questionnaire should be developed by the research analyst so that the editor has clearly defined decision rules.
Pretesting Edit
Editing during the pretest stage can prove very valuable for improving questionnaire format and identifying poor instructions or inappropriate question wording.
Coding Qualitative Responses
Coding
The process of assigning a numerical score or other character symbol to previously edited data.
Codes
Rules for interpreting, classifying, and recording data in the coding process.
The actual numerical or other character symbols assigned to raw data.
Dummy Coding
Numeric “1” or “0” coding where each number represents an alternate response such as “female” or “male.”
If k is the number of categories for a qualitative variable, k-1 dummy variables are needed.
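Here is a minimal Python sketch of the k-1 rule, using the female/male example and a hypothetical three-category variable. The first listed category is the reference group and is coded all zeros. Which category serves as the reference is the analyst's choice, not something the method fixes.

```python
def dummy_code(values, categories):
    """Create k - 1 dummy variables; categories[0] is the reference group (all zeros)."""
    others = categories[1:]
    return [[1 if v == c else 0 for c in others] for v in values]

# Two categories (k = 2) need one dummy: female = 1, male = 0.
print(dummy_code(["female", "male"], ["male", "female"]))  # [[1], [0]]

# Hypothetical three-category variable (k = 3) needs two dummies.
print(dummy_code(["North", "South", "East", "North"], ["North", "South", "East"]))
# [[0, 0], [1, 0], [0, 1], [0, 0]]
```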
Data File Terminology
Field
A collection of characters that represents a single type of data, usually a variable.
String Characters
Computer terminology to represent formatting a variable using a series of alphabetic characters (nonnumeric characters) that may form a word.
Record
A collection of related fields that represents the responses from one sampling unit.
Data File Terminology (cont’d)
Data File
The way a data set is stored electronically in spreadsheet-like form in which the rows represent sampling units and the columns represent variables.
Value Labels
Unique labels assigned to each possible numeric code for a response.
Code Construction
Two Basic Rules for Coding Categories:
1. They should be exhaustive, meaning that a coding category should exist for all possible responses.
2. They should be mutually exclusive and independent, meaning that there should be no overlap among the categories to ensure that a subject or response can be placed in only one category.
Test Tabulation – especially useful for open-ended questions
Tallying of a small sample of the total number of replies to a particular question in order to construct coding categories.
Purpose is to preliminarily identify the stability and distribution of answers that will determine a coding scheme.
Test Tabulation
E.g., 1st respondent: “I don’t like to use Facebook because it is wasting time.” 2nd respondent: “I don’t know what is Facebook.” 3rd respondent: “Facebook takes me a lot of time.”
Based on these 3 answers, you can form 2 groups of answers:
1st group: Time factor
2nd group: No knowledge of Facebook
Devising the Coding Scheme
A coding scheme should not be too elaborate. The coder’s task is only to summarize the data.
Categories should be sufficiently unambiguous that coders will not classify items in different ways.
Code Book
Identifies each variable in a study and gives the variable’s description, code name, and position in the data matrix.
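A code book can be represented as a simple data structure. The sketch below is hypothetical: the variable names, column positions, and codes are invented for illustration. It pairs each variable with its value labels. Any recorded code without a label can then be flagged as falling outside the coding categories, which signals a data-entry error or a scheme that is not exhaustive.

```python
# Hypothetical code book: code name -> description, column position, value labels.
codebook = {
    "sex": {"description": "Respondent's sex", "column": 1,
            "labels": {0: "Male", 1: "Female"}},
    "usage": {"description": "How often the respondent uses Facebook", "column": 2,
              "labels": {1: "Daily", 2: "Weekly", 3: "Rarely", 4: "Never"}},
}

def invalid_codes(record, book):
    """Return variables whose code has no value label (outside the coding categories)."""
    return [name for name, spec in book.items() if record[name] not in spec["labels"]]

print(invalid_codes({"sex": 1, "usage": 7}, codebook))  # ['usage']
```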
The Nature of Descriptive Analysis
Descriptive Analysis
The elementary transformation of raw data in a way that describes the basic characteristics such as central tendency, distribution, and variability.
Histogram
A graphical way of showing a frequency distribution in which the height of a bar corresponds to the observed frequency of the category.
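Python's standard `statistics` module can compute the basic descriptive statistics; the ratings below are hypothetical. Note that `stdev` and `variance` use the sample (n - 1) formulas. A crude text histogram follows, with one `*` per observation.

```python
import statistics as st
from collections import Counter

# Hypothetical 7-point familiarity ratings from ten respondents.
ratings = [2, 3, 3, 4, 5, 5, 5, 6, 6, 7]

summary = {
    "mean": st.mean(ratings),           # central tendency
    "median": st.median(ratings),
    "mode": st.mode(ratings),
    "std_dev": st.stdev(ratings),       # variability (sample, n - 1)
    "variance": st.variance(ratings),
    "range": max(ratings) - min(ratings),
}
print(summary)

# Text histogram: bar length equals the observed frequency of each value.
for value, count in sorted(Counter(ratings).items()):
    print(f"{value}: {'*' * count}")
```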
EXHIBIT 20.1 Levels of Scale Measurement and Suggested Descriptive Statistics
Creating and Interpreting Tabulation
Tabulation
The orderly arrangement of data in a table or other summary format showing the number of responses to each response category.
Tallying is the term used when the process is done by hand.
Frequency Table
A table showing the different ways respondents answered a question. Sometimes called a marginal tabulation.
Frequency Table Example
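The following Python sketch builds a frequency (marginal) table with counts, percentages, and cumulative percentages. The yes/no answers are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent, cumulative percent) rows for a marginal tabulation."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        pct = 100 * counts[value] / n
        cumulative += pct
        rows.append((value, counts[value], round(pct, 1), round(cumulative, 1)))
    return rows

# Hypothetical answers to "Do you shop at Target?"
answers = ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"]
for row in frequency_table(answers):
    print(row)
# ('No', 3, 37.5, 37.5)
# ('Yes', 5, 62.5, 100.0)
```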
Cross-Tabulation
Addresses research questions involving relationships among multiple less-than-interval variables.
Results in a combined frequency table displaying one variable in rows and another variable in columns.
Contingency Table
A data matrix that displays the frequency of some combination of responses to multiple variables.
Marginals
Row and column totals in a contingency table, which are shown in its margins.
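Here is a minimal Python sketch of a 2 × 2 cross-tabulation with marginals and column percentages, using hypothetical data on sex and Target patronage. Using column totals as the statistical base is one choice; row percentages would divide by row totals instead.

```python
from collections import Counter

def crosstab(row_var, col_var):
    """Joint frequency table plus row and column marginals."""
    cells = Counter(zip(row_var, col_var))
    rows, cols = sorted(set(row_var)), sorted(set(col_var))
    table = {r: {c: cells[(r, c)] for c in cols} for r in rows}
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_totals, col_totals

# Hypothetical respondents: do they shop at Target, and their sex.
shops = ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No"]
sex = ["F", "F", "M", "M", "F", "M", "F", "M"]
table, row_totals, col_totals = crosstab(shops, sex)

# Column percentages: each column total is the statistical base.
col_pct = {r: {c: 100 * table[r][c] / col_totals[c] for c in col_totals} for r in table}
print(table)           # {'No': {'F': 1, 'M': 3}, 'Yes': {'F': 3, 'M': 1}}
print(col_pct["Yes"])  # {'F': 75.0, 'M': 25.0}
```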
EXHIBIT 20.2 Cross-Tabulation Tables from a Survey Regarding AIG and Government Bailouts
EXHIBIT 20.3 Different Ways of Depicting the Cross-Tabulation of Biological Sex and Target Patronage
Cross-Tabulation (cont’d)
Percentage Cross-Tabulations
Statistical base – the number of respondents or observations (in a row or column) used as a basis for computing percentages.
Elaboration and Refinement
Elaboration analysis – an analysis of the basic cross-tabulation for each level of a variable not previously considered, such as subgroups of the sample.
Moderator variable – a third variable that changes the nature of a relationship between the original independent and dependent variables.
EXHIBIT 20.4 Cross-Tabulation of Marital Status, Sex, and Responses to the Question “Do You Shop at Target?”
Cross-Tabulation (cont’d)
How Many Cross-Tabulations?
Every possible response becomes a possible explanatory variable.
When hypotheses involve relationships between two categorical variables, cross-tabulations are the right tool for the job.
Quadrant Analysis
An extension of cross-tabulation in which responses to two rating-scale questions are plotted in four quadrants of a two-dimensional table.
E.g., importance-performance analysis
EXHIBIT 20.5 An Importance-Performance or Quadrant Analysis of Hotels
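A quadrant analysis can be sketched by classifying each attribute on two rating scales. The hotel attributes and ratings are hypothetical. The cut points are assumed to be the grand means; the scale midpoint is another common choice. The quadrant labels are the conventional importance-performance names, not ones taken from Exhibit 20.5.

```python
from statistics import mean

# Hypothetical mean ratings (1-7 scale): attribute -> (importance, performance).
attributes = {
    "Cleanliness": (6.5, 6.2),
    "Check-in speed": (6.0, 4.1),
    "Pool": (3.2, 5.8),
    "Gift shop": (2.8, 3.0),
}

# Assumed cut points: the grand means of each axis.
imp_cut = mean(i for i, _ in attributes.values())
perf_cut = mean(p for _, p in attributes.values())

def quadrant(importance, performance):
    """Place one attribute in a quadrant of the importance-performance grid."""
    high_imp, high_perf = importance >= imp_cut, performance >= perf_cut
    if high_imp and high_perf:
        return "Keep up the good work"
    if high_imp:
        return "Concentrate here"
    if high_perf:
        return "Possible overkill"
    return "Low priority"

for name, (imp, perf) in attributes.items():
    print(f"{name}: {quadrant(imp, perf)}")
```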
Data Transformation
Process of changing the data from their original form to a format suitable for performing a data analysis addressing research objectives.
Problems with Data Transformations
Median Split
Dividing a data set into two categories by placing respondents below the median in one category and respondents above the median in another.
The approach is best applied only when the data do indeed exhibit bimodal characteristics.
Inappropriate collapsing of continuous variables into categorical variables ignores the information contained within the untransformed values.
EXHIBIT 20.6 Bimodal Distributions Are Consistent with Transformations into Categorical Values
EXHIBIT 20.7 The Problem with Median Splits with Unimodal Data
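The following Python sketch applies a median split to hypothetical unimodal scores, and the output shows the problem. Scores of 4 and 5 land in different groups even though they are adjacent, while 1 and 4 are treated as the same. Sending ties at the median to the "low" group is an assumption.

```python
from statistics import median

def median_split(scores):
    """Code scores above the median as 'high' and the rest as 'low'.
    Assumption: scores equal to the median go to 'low'; conventions vary."""
    m = median(scores)
    return ["high" if s > m else "low" for s in scores]

# Hypothetical unimodal scores; the median is 4.5.
scores = [1, 3, 4, 4, 5, 5, 6, 8]
print(median_split(scores))
# ['low', 'low', 'low', 'low', 'high', 'high', 'high', 'high']
```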
Index Numbers
Scores or observations recalibrated to indicate how they relate to a base number.
Price Indexes
Represent simple data transformations that allow researchers to track a variable’s value over time and compare a variable(s) with other variables.
Recalibration allows scores or observations to be related to a certain base period or base number.
EXHIBIT 20.8 Hours of Television Usage per Week
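To compute index numbers, divide each value by the base and multiply by 100. The weekly TV hours below are hypothetical, not the values in Exhibit 20.8.

```python
def index_numbers(values, base):
    """Recalibrate each value relative to a base value (base = 100)."""
    return [round(100 * v / base, 1) for v in values]

# Hypothetical weekly TV hours for four groups; the first group serves as the base.
hours = [20.0, 25.0, 15.0, 30.0]
print(index_numbers(hours, base=hours[0]))  # [100.0, 125.0, 75.0, 150.0]
```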
Calculating Rank Order
Rank Order
Ranking data can be summarized by performing a data transformation. The transformation involves multiplying the frequency by the ranking score for each choice, resulting in a new scale.
EXHIBIT 20.9 Executive Rankings of Potential Conference Destinations
EXHIBIT 20.10 Frequencies of Conference Destination Rankings
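Here is a Python sketch of the rank-order transformation, using hypothetical counts rather than the data in Exhibits 20.9 and 20.10. It assumes rank 1 is the most preferred, so a lower weighted total indicates a more preferred destination.

```python
# Hypothetical counts: how many of 10 executives gave each destination rank 1, 2, or 3.
rank_frequencies = {
    "Destination A": {1: 5, 2: 3, 3: 2},
    "Destination B": {1: 3, 2: 4, 3: 3},
    "Destination C": {1: 2, 2: 3, 3: 5},
}

def rank_scores(freqs):
    """Sum of frequency x rank score per choice; with rank 1 = best, lower totals are preferred."""
    return {choice: sum(rank * count for rank, count in by_rank.items())
            for choice, by_rank in freqs.items()}

totals = rank_scores(rank_frequencies)
print(totals)                          # {'Destination A': 17, 'Destination B': 20, 'Destination C': 23}
print(sorted(totals, key=totals.get))  # ['Destination A', 'Destination B', 'Destination C']
```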
EXHIBIT 20.11 Pie Charts Work Well with Tabulations and Cross-Tabulations
Computer Programs for Analysis
Statistical Packages
Spreadsheets: Excel
Statistical software: SAS, SPSS (Statistical Package for the Social Sciences), MINITAB
Computer Graphics and Computer Mapping
Box and Whisker Plots
Graphic representations of central tendencies, percentiles, variabilities, and the shapes of frequency distributions.
Interquartile Range
A measure of variability: the range covered by the middle 50 percent of the data, from the first to the third quartile.
Outlier
A value that lies outside the normal range of the data.
EXHIBIT 20.15 Computer-Drawn Box and Whisker Plot
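Python's `statistics.quantiles` can compute the interquartile range and support a simple outlier screen. The 1.5 × IQR fence is Tukey's common box-plot convention; it is assumed here and is not stated in the slides. The data are hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (Q1, Q3, IQR, outliers) using the k x IQR fence convention (assumed k = 1.5)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return q1, q3, iqr, outliers

# Hypothetical data with one extreme value.
q1, q3, iqr, outliers = iqr_outliers([2, 3, 3, 4, 4, 5, 5, 6, 20])
print(outliers)  # [20]
```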
SPSS Windows
The main program in SPSS is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics.
If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used.
The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.
SPSS Windows
To select these procedures, click:
Analyze>Descriptive Statistics>Frequencies
Analyze>Descriptive Statistics>Descriptives
Analyze>Descriptive Statistics>Explore
The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed.
To select these procedures, click:
Analyze>Descriptive Statistics>Crosstabs
SPSS Windows
The major program for conducting parametric tests in SPSS is COMPARE MEANS. This program can be used to conduct t tests on one sample or on independent or paired samples. To select these procedures using SPSS for Windows, click:
Analyze>Compare Means>Means …
Analyze>Compare Means>One-Sample T Test …
Analyze>Compare Means>Independent-Samples T Test …
Analyze>Compare Means>Paired-Samples T Test …
SPSS Windows
The nonparametric tests discussed in this chapter can be conducted using NONPARAMETRIC TESTS.
To select these procedures using SPSS for Windows, click:
Analyze>Nonparametric Tests>Chi-Square …
Analyze>Nonparametric Tests>Binomial …
Analyze>Nonparametric Tests>Runs …
Analyze>Nonparametric Tests>1-Sample K-S …
Analyze>Nonparametric Tests>2 Independent Samples …
Analyze>Nonparametric Tests>2 Related Samples …
SPSS Windows: Frequencies
1. Select ANALYZE on the SPSS menu bar.
2. Click DESCRIPTIVE STATISTICS and select FREQUENCIES.
3. Move the variable “Familiarity [familiar]” to the VARIABLE(s) box.
4. Click STATISTICS.
5. Select MEAN, MEDIAN, MODE, STD. DEVIATION, VARIANCE, and RANGE.
6. Click CONTINUE.
7. Click CHARTS.
8. Click HISTOGRAMS, then click CONTINUE.
9. Click OK.
Introduction of a Third Variable in Cross-Tabulation
SPSS Windows: Cross-tabulations
1. Select ANALYZE on the SPSS menu bar.
2. Click on DESCRIPTIVE STATISTICS and select CROSSTABS.
3. Move the variable “Internet Usage Group [iusagegr]” to the ROW(S) box.
4. Move the variable “Sex [sex]” to the COLUMN(S) box.
5. Click on CELLS.
6. Select OBSERVED under COUNTS and COLUMN under PERCENTAGES.
7. Click CONTINUE.
8. Click STATISTICS.
9. Click on CHI-SQUARE, PHI AND CRAMER’S V.
10. Click CONTINUE.
11. Click OK.
Interpretation
The process of drawing inferences from the analysis results.
Inferences drawn from interpretations lead to managerial implications and decisions.
From a management perspective, the qualitative meaning of the data and their managerial implications are an important aspect of the interpretation.
Hypothesis Testing
Types of Hypotheses
Relational hypotheses
Examine how changes in one variable vary with changes in another.
Hypotheses about differences between groups
Examine how some variable varies from one group to another.
Hypotheses about differences from some standard
Examine how some variable differs from some preconceived standard. These tests typify univariate statistical tests.
Types of Statistical Analysis
Univariate Statistical Analysis
Tests of hypotheses involving only one variable.
Testing of statistical significance
Bivariate Statistical Analysis
Tests of hypotheses involving two variables.
Multivariate Statistical Analysis
Statistical analysis involving three or more variables or sets of variables.
The Hypothesis-Testing Procedure
Process
1. The specifically stated hypothesis is derived from the research objectives.
2. A sample is obtained and the relevant variable is measured.
3. The measured sample value is compared to the value either stated explicitly or implied in the hypothesis.
If the value is consistent with the hypothesis, the hypothesis is supported.
If the value is not consistent with the hypothesis, the hypothesis is not supported.
Statistical Analysis: Key Terms
Hypothesis
Unproven proposition: a supposition that tentatively explains certain facts or phenomena.
An assumption about the nature of the world.
Null Hypothesis
Statement about the status quo: no difference between the sample and the population.
Alternative Hypothesis
Statement that indicates the opposite of the null hypothesis.
Significance Levels and p-values
Significance Level
A critical probability associated with a statistical hypothesis test, set before testing, that indicates how much risk the researcher accepts of wrongly concluding that a difference exists between an observed value and some statistical expectation.
The acceptable level of Type I error.
p-value
Probability value, or the observed or computed significance level.
p-values are compared to significance levels to test hypotheses.
Higher p-values mean the data are more consistent with the null hypothesis, i.e., there is less evidence of a difference.
EXHIBIT 21.1 p-Values and Statistical Tests
EXHIBIT 21.2
As the observed mean gets further from the standard (proposed population mean), the p-value decreases. The lower the p-value, the more confidence you have that the sample mean is different.
An Example of Hypothesis Testing
The null hypothesis: the mean is equal to 3.0: $H_0\colon \mu = 3.0$
The alternative hypothesis: the mean is not equal to 3.0: $H_1\colon \mu \neq 3.0$
EXHIBIT 21.3 A Hypothesis Test Using the Sampling Distribution of X̄ under the Hypothesis µ = 3.0
Critical Values
Values that lie exactly on the boundary of the region of rejection.
Type I and Type II Errors
Type I Error
An error caused by rejecting the null hypothesis when it is true.
Has a probability of alpha (α).
Practically, a Type I error occurs when the researcher concludes that a relationship or difference exists in the population when in reality it does not exist.
“There really are no monsters under the bed.”
Type I and Type II Errors (cont’d)
Type II Error
An error caused by failing to reject the null hypothesis when the alternative hypothesis is true.
Has a probability of beta (β).
Practically, a Type II error occurs when a researcher concludes that no relationship or difference exists when in fact one does exist.
“There really are monsters under the bed.”
EXHIBIT 21.4 Type I and Type II Errors in Hypothesis Testing
Choosing the Appropriate Statistical Technique
Choosing the correct statistical technique requires considering:
Type of question to be answered
E.g., ranking question – rank order test
Number of variables involved
One variable – univariate statistical analysis
Two variables – bivariate statistical analysis
More than two variables – multivariate analysis
Level of scale measurement
E.g., for a nominal scale, the mean and median are meaningless.
Parametric versus Nonparametric Tests
Parametric Statistics
Involve numbers with known, continuous distributions. Appropriate when:
Data are interval or ratio scaled.
Sample size is large.
Nonparametric Statistics
Appropriate when the variables being analyzed do not conform to any known or continuous distribution.
EXHIBIT 21.5 Univariate Statistical Choice Made Easy
The t-Distribution
t-test
A hypothesis test that uses the t-distribution.
A univariate t-test is appropriate when the variable being analyzed is interval or ratio.
Degrees of freedom (d.f.)
The number of observations minus the number of constraints or assumptions needed to calculate a statistical term.
EXHIBIT 21.6 The t-Distribution for Various Degrees of Freedom
Calculating a Confidence Interval Estimate Using the t-Distribution
$$\bar{X} \pm t_{c}\,\frac{S}{\sqrt{n}}$$
Upper limit:
$$3.89 + 2.12\left(\frac{2.81}{\sqrt{18}}\right) \approx 5.29$$
Lower limit:
$$3.89 - 2.12\left(\frac{2.81}{\sqrt{18}}\right) \approx 2.49$$
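The interval arithmetic can be checked in Python using the values in the worked example: mean 3.89, S = 2.81, n = 18, and t = 2.12. The critical t-value is taken from the example rather than looked up from a t table.

```python
import math

def t_interval(sample_mean, sd, n, t_crit):
    """Confidence interval estimate: sample_mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Values from the worked example; t_crit is supplied, not looked up.
low, high = t_interval(sample_mean=3.89, sd=2.81, n=18, t_crit=2.12)
print(round(low, 2), round(high, 2))  # 2.49 5.29
```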
One-Tailed Univariate t-Tests
One-tailed Test
Appropriate when a research hypothesis implies that an observed mean can only be greater than or less than a hypothesized value.
E.g., “Females score higher than males in the English test.”
Only one of the “tails” of the bell-shaped normal curve is relevant.
A one-tailed p-value can be obtained from a two-tailed test result by taking half of the observed p-value, provided the observed difference is in the hypothesized direction.
When there is any doubt about whether a one- or two-tailed test is appropriate, opt for the more conservative two-tailed test.
Two-Tailed Univariate t-Tests
Two-tailed Test
Tests for differences from the population mean that are either greater or less, i.e., identifies whether there is any difference.
E.g., the English test scores of females are different from the scores of males.
Extreme values of the normal curve (or tails) on both the right and the left are considered.
When a research question does not specify whether a difference should be greater than or less than, a two-tailed test is most appropriate.
When the researcher has any doubt about whether a one- or two-tailed test is appropriate, he or she should opt for the more conservative two-tailed test.
Univariate Hypothesis Test Utilizing the t-Distribution
Example: Suppose a Pizza Inn manager believes the average number of returned pizzas each day to be 20.
The store records the number of returned pizzas for each of the 25 days it was open in a given month.
The mean was calculated to be 22, and the standard deviation to be 5.
Univariate Hypothesis Test Utilizing the t-Distribution: An Example
$H_0\colon \mu = 20$ (the population mean is equal to 20)
$H_1\colon \mu \neq 20$ (the population mean is not equal to 20)
$$S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{5}{\sqrt{25}} = 1$$
Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)
The researcher desires 95 percent confidence, so the significance level is 0.05.
The researcher must then find the upper and lower limits of the confidence interval to determine the region of rejection. Thus, the value of t is needed. For 24 degrees of freedom (n − 1 = 25 − 1), the t-value is 2.064.
Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)
Lower limit:
$$\mu - t_{c}\,S_{\bar{X}} = 20 - 2.064\left(\frac{5}{\sqrt{25}}\right) = 17.936$$
Upper limit:
$$\mu + t_{c}\,S_{\bar{X}} = 20 + 2.064\left(\frac{5}{\sqrt{25}}\right) = 22.064$$
Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)
Univariate Hypothesis Test: t-Test
$$t_{\text{obs}} = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{22 - 20}{1} = 2$$
This is less than the critical t-value of 2.064 at the 0.05 level with 24 degrees of freedom. The null hypothesis therefore cannot be rejected, and the hypothesis that the mean differs from 20 is not supported.
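The Pizza Inn test can be reproduced in Python. The function and variable names are illustrative. The critical value 2.064 is the one given in the example for 24 d.f. at the 0.05 level (two-tailed).

```python
import math

def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for H0: mu = mu0, using the standard error sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

t_obs = one_sample_t(sample_mean=22, mu0=20, sd=5, n=25)
t_crit = 2.064  # two-tailed, alpha = 0.05, d.f. = 24 (value from the example)
reject_h0 = abs(t_obs) > t_crit
print(t_obs, reject_h0)  # 2.0 False
```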
The Chi-Square Test for Goodness of Fit
Chi-square (χ²) test
Tests for statistical significance. It is particularly appropriate for testing hypotheses about frequencies arranged in a frequency or contingency table.
Goodness-of-Fit (GOF)
A general term representing how well some computed table or matrix of values matches some population or predetermined table or matrix of the same size.
The Chi-Square Test for Goodness of Fit: An Example
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
χ² = chi-square statistic
$O_i$ = observed frequency in the ith cell
$E_i$ = expected frequency in the ith cell
Chi-Square Test: Estimation for Expected Number for Each Cell
$$E_{ij} = \frac{R_i C_j}{n}$$
$R_i$ = total observed frequency in the ith row
$C_j$ = total observed frequency in the jth column
n = sample size
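Here is a minimal Python sketch of the goodness-of-fit statistic for a one-way table. The counts are hypothetical. The null hypothesis is assumed to specify equal expected frequencies, and d.f. = k − 1 for k cells. The critical value 7.815 is the standard chi-square table value for 3 d.f. at the 0.05 level.

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic: sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 100 shoppers choose among four package designs.
observed = [30, 20, 35, 15]
# Assumed H0: every design is equally likely, so each expected count is 100 / 4 = 25.
expected = [sum(observed) / len(observed)] * len(observed)
chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1
significant = chi2 > 7.815  # critical value for 3 d.f. at alpha = 0.05
print(chi2, df, significant)  # 10.0 3 True
```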
Hypothesis Test of a Proportion
Is conceptually similar to the test used when the mean is the characteristic of interest, but differs in the mathematical formulation of the standard error of the proportion.
$$Z_{\text{obs}} = \frac{p - \pi}{S_p}$$
π is the population proportion
p is the sample proportion
$S_p$ is the standard error of the proportion
π is estimated with p
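The proportion test can be sketched in Python. The data are hypothetical: 60 percent of 100 sampled customers. The standard error is assumed to be S_p = sqrt(p(1 − p)/n), which estimates π with p as the slide notes; some texts use the hypothesized π in this formula instead. The two-tailed p-value uses the standard normal distribution from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def proportion_z(p, pi0, n):
    """Z for H0: pi = pi0, with S_p = sqrt(p * (1 - p) / n) (assumed formula; pi estimated by p)."""
    s_p = math.sqrt(p * (1 - p) / n)
    return (p - pi0) / s_p

# Hypothetical sample: 60% of 100 customers are repeat buyers; H0 says pi = 0.50.
z = proportion_z(p=0.60, pi0=0.50, n=100)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value
print(round(z, 2), round(p_value, 3))
```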
What Is the Appropriate Test of Difference?
Test of Differences
An investigation of a hypothesis that two (or more) groups differ with respect to measures on a variable.
E.g., behaviour, characteristics, beliefs, opinions, emotions, or attitudes
Bivariate Tests of Differences
Involve only two variables: a variable that acts like a dependent variable and a variable that acts as a classification variable.
Examine differences in mean scores between groups, or compare how two groups’ scores are distributed across possible response categories.
EXHIBIT 22.1 Some Bivariate Hypotheses
Cross-Tabulation Tables: The χ² Test for Goodness-of-Fit
Cross-tabulation (contingency) table: a joint frequency distribution of observations on two or more variables.
χ² distribution
Provides a means for testing the statistical significance of a contingency table.
Involves comparing observed frequencies (Oi) with expected frequencies (Ei) in each cell of the table.
Captures the goodness- (or closeness-) of-fit of the observed distribution with the expected distribution.
Chi-Square Test
$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where χ² = chi-square statistic, O_i = observed frequency in the ith cell, E_i = expected frequency in the ith cell
$$E_{ij} = \frac{R_i C_j}{n}$$
where R_i = total observed frequency in the ith row, C_j = total observed frequency in the jth column, n = sample size
Degrees of Freedom (d.f.)
$$d.f. = (R - 1)(C - 1)$$
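A sketch of the contingency-table version in Python, combining E_ij = R_i C_j / n, the χ² sum, and d.f. = (R − 1)(C − 1). The 2 × 2 counts are hypothetical and the function name is illustrative.

```python
def contingency_chi_square(table):
    """Return (chi-square, degrees of freedom, expected counts) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # E_ij = R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    chi2 = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(len(table))
        for j in range(len(table[0]))
    )
    df = (len(table) - 1) * (len(table[0]) - 1)  # (R - 1)(C - 1)
    return chi2, df, expected


# Hypothetical 2 x 2 table: rows = location type, columns = profitable / not profitable.
observed = [[60, 40],
            [30, 70]]
chi2, df, expected = contingency_chi_square(observed)
print(round(chi2, 2), df, expected)
```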
Example: Papa John’s Restaurants
Univariate hypothesis: Papa John’s restaurants are more likely to be located in a stand-alone location than in a shopping center.
Bivariate hypothesis: Stand-alone locations are more likely to be profitable than are shopping center locations.
Example: Papa John’s Restaurants (cont’d)
In this example, χ² = 22.16 with 1 d.f. From Table A.4, the critical value at the 0.05 level with 1 d.f. is 3.84. Because 22.16 exceeds 3.84, we are 95 percent confident that the observed values do not equal the expected values.
But are the deviations from the expected values in the hypothesized direction?
χ² Test for Goodness-of-Fit Recap
Testing the hypothesis involves two key steps:
1. Examine the statistical significance of the observed contingency table.
2. Examine whether the differences between the observed and expected values are consistent with the hypothesized prediction.
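The two steps can be sketched in Python as follows. The counts are hypothetical (the Papa John’s figures are not reproduced in this section), the critical value 3.84 for 1 d.f. at the 0.05 level is the one quoted in the example, and the `hypothesized_cell` parameter is an illustrative way of encoding the predicted direction.

```python
def chi_square_2x2(table):
    """Chi-square statistic for a 2 x 2 table using E_ij = R_i * C_j / n."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    expected = [[r * c / n for c in cols] for r in rows]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    return chi2, expected


def supports_hypothesis(table, hypothesized_cell, critical=3.84):
    """Step 1: is the table significant? Step 2: is the predicted cell above its expected count?"""
    chi2, expected = chi_square_2x2(table)
    i, j = hypothesized_cell
    significant = chi2 > critical
    right_direction = table[i][j] > expected[i][j]
    return significant and right_direction


# Hypothetical counts: rows = stand-alone / shopping center, columns = profitable / not.
observed = [[60, 40],
            [30, 70]]
# Hypothesis: stand-alone units (row 0) are more often profitable (column 0).
print(supports_hypothesis(observed, hypothesized_cell=(0, 0)))  # True
print(supports_hypothesis(observed, hypothesized_cell=(1, 0)))  # False
```

The second call shows why step 2 matters: the same significant table does not support a hypothesis that predicts the opposite direction.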
The t-Test for Comparing Two Means
Independent Samples t-Test
A test for hypotheses stating that the mean scores for some interval- or ratio-scaled variable grouped based on some less-than-interval classificatory variable are not the same.
$$t = \frac{\text{Sample Mean 1} - \text{Sample Mean 2}}{\text{Variability of random means}} = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$
The t-Test for Comparing Two Means (cont’d)
Pooled estimate of the standard error: an estimate of the standard error for a t-test of independent means that assumes the variances of both groups are equal.
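$$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$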
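A Python sketch of the independent-samples t-test with the pooled standard error. The two groups of scores are hypothetical; `statistics.variance` returns the sample variance (n − 1 denominator), which is what the pooled formula expects.

```python
import math
from statistics import mean, variance


def pooled_t(group1, group2):
    """Independent-samples t using the pooled standard error (equal variances assumed)."""
    n1, n2 = len(group1), len(group2)
    s1_sq, s2_sq = variance(group1), variance(group2)  # sample variances
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / std_err
    return t, n1 + n2 - 2  # statistic and degrees of freedom


# Hypothetical weekly internet-usage hours for two groups.
men = [14, 12, 18, 15, 16]
women = [10, 11, 9, 12, 13]
t, df = pooled_t(men, women)
print(round(t, 3), df)
```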
© 2010 South-Western/Cengage Learning. All rights reserved. May not be scanned, copied or duplicated, or posted to a publically accessible website, in whole or in part.
EXHIBIT 22.2 Independent Samples t-Test Results
Comparing Two Means (cont’d)
Paired-Samples t-Test
Compares the scores of two interval variables drawn from related populations.
Used when the means to be compared do not come from independent samples.
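For the paired case, the test works on the difference within each pair: t is the mean difference divided by its standard error, with n − 1 degrees of freedom. A minimal Python sketch with hypothetical ratings; the function name is illustrative.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t: mean of (x - y) divided by its standard error."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    std_err = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / std_err, n - 1


# Hypothetical: each respondent rates attitude toward the Internet and toward technology.
internet = [5, 6, 4, 7, 5, 6]
technology = [4, 5, 4, 5, 4, 5]
t, df = paired_t(internet, technology)
print(round(t, 3), df)
```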
EXHIBIT 22.4 Example Results for a Paired Samples t-Test
A Classification of Hypothesis Testing Procedures for Examining Differences
SPSS Windows: One Sample t Test
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE SAMPLE T TEST.
3. Move “Familiarity [familiar]” into the TEST VARIABLE(S) box.
4. Type “4” in the TEST VALUE box.
5. Click OK.
SPSS Windows: Two Independent Samples t Test
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then INDEPENDENT SAMPLES T TEST.
3. Move “Internet Usage Hrs/Week [iusage]” into the TEST VARIABLE(S) box.
4. Move “Sex [sex]” to the GROUPING VARIABLE box.
5. Click DEFINE GROUPS.
6. Type “1” in the GROUP 1 box and “2” in the GROUP 2 box.
7. Click CONTINUE.
8. Click OK.
SPSS Windows: Paired Samples t Test
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then PAIRED SAMPLES T TEST.
3. Select “Attitude toward Internet [iattitude]” and then select “Attitude toward technology [tattitude].” Move these variables into the PAIRED VARIABLE(S) box.
4. Click OK.
Relationship Amongst t Test, Analysis of Variance, Analysis of Covariance, & Regression
Metric dependent variable
  One independent variable (binary): t test
  One or more independent variables
    Categorical (factorial): analysis of variance
      One factor: one-way analysis of variance
      More than one factor: N-way analysis of variance
    Categorical and interval: analysis of covariance
    Interval: regression
The Z-Test for Comparing Two Proportions
Z-test for differences of proportions: tests the hypothesis that proportions are significantly different for two independent samples or groups.
Requires a sample size greater than thirty.
The hypothesis is: $H_0: \pi_1 = \pi_2$
which may be restated as: $H_0: \pi_1 - \pi_2 = 0$
The Z-Test for Comparing Two Proportions
Z-test statistic for differences in large random samples:
$$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$$
where
p_1 = sample proportion of successes in Group 1
p_2 = sample proportion of successes in Group 2
(π_1 − π_2) = hypothesized population proportion 1 minus hypothesized population proportion 2
S_{p_1 − p_2} = pooled estimate of the standard error of differences of proportions
The Z-Test for Comparing Two Proportions
To calculate the standard error of the differences in proportions:
$$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $\bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ is the pooled proportion of successes and $\bar{q} = 1 - \bar{p}$.
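A Python sketch of the two-proportion Z-test under H₀: π₁ − π₂ = 0, using the pooled proportion p̄ = (x₁ + x₂)/(n₁ + n₂) in the standard error. The counts are hypothetical and the function name is illustrative.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Z for H0: pi1 - pi2 = 0, using the pooled proportion in the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled proportion of successes
    q_bar = 1 - p_bar
    std_err = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err  # hypothesized difference is 0


# Hypothetical: 60 of 100 in Group 1 and 45 of 100 in Group 2 are "successes".
z = two_proportion_z(60, 100, 45, 100)
print(round(z, 3))
```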
One-Way Analysis of Variance (ANOVA)
Analysis of Variance (ANOVA) An analysis involving the investigation of the
effects of one treatment variable on an interval-scaled dependent variable.
A hypothesis-testing technique to determine whether statistically significant differences in means occur between two or more groups.
A method of comparing variances to make inferences about the means.
The substantive hypothesis tested is: at least one group mean is not equal to another group mean.
Partitioning Variance in ANOVA
Total Variability
Grand mean: the mean of a variable over all observations.
SST = total of (observed value − grand mean)²: $SST = \sum_{j}\sum_{i}(X_{ij} - \bar{X})^2$
Partitioning Variance in ANOVA
Between-groups variance: the sum of squared differences between each group mean and the grand mean, weighted by the number of observations in the group and summed over all groups.
$$SSB = \sum_{j} n_j(\bar{X}_j - \bar{X})^2$$
Within-group error or variance: the sum of the squared differences between observed values and their group mean for a given set of observations.
Also known as total error variance.
$$SSE = \sum_{j}\sum_{i}(X_{ij} - \bar{X}_j)^2$$
The F-Test
F-test
Used to determine whether there is more variability in the scores of one sample than in the scores of another sample.
Variance components are used to compute F-ratios
SSE, SSB, SST
$$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$
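A short Python sketch of the partition SST = SSB + SSE and the F-ratio, using the usual mean squares: SSB/(c − 1) between groups and SSE/(n − c) within groups, where c is the number of groups and n the total number of observations. The three groups of sales figures are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Partition total variation into SSB and SSE and form the F-ratio."""
    all_obs = [x for g in groups for x in g]
    grand_mean = mean(all_obs)
    n, c = len(all_obs), len(groups)

    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

    msb = ssb / (c - 1)  # variance between groups
    mse = sse / (n - c)  # variance within groups
    return {"SST": sst, "SSB": ssb, "SSE": sse, "F": msb / mse,
            "df_between": c - 1, "df_within": n - c}


# Hypothetical sales under three levels of in-store promotion.
result = one_way_anova([[10, 9, 10, 8], [8, 8, 7, 9], [5, 7, 6, 4]])
print(result)
```

The resulting F is compared with the critical F for (c − 1, n − c) degrees of freedom.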
EXHIBIT 22.6 Interpreting ANOVA
SPSS Windows
One-way ANOVA can be efficiently performed using the program COMPARE MEANS and then One-way ANOVA. To select this procedure using SPSS for Windows, click:
Analyze>Compare Means>One-Way ANOVA …
N-way analysis of variance and analysis of covariance can be performed using GENERAL LINEAR MODEL. To select this procedure using SPSS for Windows, click:
Analyze>General Linear Model>Univariate …
SPSS Windows: One-Way ANOVA
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE-WAY ANOVA.
3. Move “Sales [sales]” into the DEPENDENT LIST box.
4. Move “In-Store Promotion [promotion]” to the FACTOR box.
5. Click OPTIONS.
6. Click DESCRIPTIVE.
7. Click CONTINUE.
8. Click OK.
SPSS Windows: Analysis of Covariance
1. Select ANALYZE from the SPSS menu bar.
2. Click GENERAL LINEAR MODEL and then UNIVARIATE.
3. Move “Sales [sales]” into the DEPENDENT VARIABLE box.
4. Move “In-Store Promotion [promotion]” to the FIXED FACTOR(S) box. Then move “Coupon [coupon]” also to the FIXED FACTOR(S) box.
5. Move “Clientel [clientel]” to the COVARIATE(S) box.
6. Click OK.
The Basics
Measures of Association
Refers to a number of bivariate statistical techniques used to measure the strength of a relationship between two variables.
The chi-square (χ²) test provides information about whether two or more less-than-interval variables are interrelated.
Correlation analysis is most appropriate for interval or ratio variables.
Regression can accommodate either less-than-interval or interval independent variables, but the dependent variable must be continuous.
EXHIBIT 23.1 Bivariate Analysis: Common Procedures for Testing Association
Simple Correlation Coefficient (continued)
Correlation coefficient: a statistical measure of the covariation, or association, between two at-least-interval variables.
Covariance: extent to which two variables are associated systematically with each other.
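$$r_{xy} = r_{yx} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$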
Simple Correlation Coefficient
Correlation coefficient (r)
Ranges from +1 to −1
Perfect positive linear relationship = +1
Perfect negative (inverse) linear relationship = −1
No correlation = 0
Correlation coefficient for two variables (X,Y)
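The correlation formula translates directly into Python. The data are hypothetical; squaring r gives the coefficient of determination (R²).

```python
import math
from statistics import mean


def pearson_r(x, y):
    """r = sum((X - Xbar)(Y - Ybar)) / sqrt(sum((X - Xbar)^2) * sum((Y - Ybar)^2))."""
    x_bar, y_bar = mean(x), mean(y)
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_squared = r ** 2  # coefficient of determination
print(round(r, 4), round(r_squared, 4))
```

On Python 3.10 or later, `statistics.correlation(x, y)` from the standard library computes the same Pearson coefficient.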
EXHIBIT 23.2 Scatter Diagram to Illustrate Correlation Patterns
Correlation, Covariance, and Causation
When two variables covary (i.e. vary systematically), they display concomitant variation.
This systematic covariation does not in and of itself establish causality.
For example, a rooster’s crow and the rising of the sun covary, but the rooster does not cause the sun to rise.
Coefficient of Determination
Coefficient of determination (R²): a measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable accounted for by knowing the value of another variable.
Measures that part of the total variance of Y that is accounted for by knowing the value of X.
$$R^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$
Correlation Matrix
Correlation matrix: the standard form for reporting correlation coefficients for more than two variables.
Statistical Significance
The procedure for determining statistical significance is the t-test of the significance of a correlation coefficient.
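The t-test for a correlation coefficient is commonly computed as t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom, testing H₀: ρ = 0. That formula is standard but not shown on the slide; the r and n values below are hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2


# Hypothetical: r = 0.6 between two variables measured on 27 salespeople.
t, df = correlation_t(0.6, 27)
print(round(t, 3), df)
```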
EXHIBIT 23.4 Pearson Product-Moment Correlation Matrix for Salesperson Example
Regression Analysis
Simple (Bivariate) Linear Regression
A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.
The regression equation: $Y = \alpha + \beta X$
Y = the continuous dependent variable
X = the independent variable
α = the Y intercept (where the regression line intercepts the Y axis)
β = the slope coefficient (rise over run)
Regression Line and Slope
[Figure: scatter plot of Y against X with the fitted regression line $\hat{Y} = \hat{\alpha} + \hat{\beta}X$]
The Regression Equation
Parameter Estimate Choices
β is indicative of the strength and direction of the relationship between the independent and dependent variable.
α (the Y intercept) is a fixed point that is considered a constant (the value of Y when X is zero).
Standardized regression coefficient (β): estimated coefficient of the strength of relationship between the independent and dependent variables.
Expressed on a standardized scale where higher absolute values indicate stronger relationships (in simple regression, where β equals r, the range is from −1 to 1).
The Regression Equation (cont’d)
Parameter Estimate Choices
Raw regression estimates (b1)
Raw regression weights have the advantage of retaining the scale metric, which is also their key disadvantage.
If the purpose of the regression analysis is forecasting, then raw parameter estimates must be used.
In other words, raw estimates are appropriate when the researcher is interested only in prediction.
Standardized regression estimates (β)
Standardized regression estimates have the advantage of a constant scale.
Standardized regression estimates should be used when the researcher is testing explanatory hypotheses.
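A Python sketch contrasting the two kinds of weight. It assumes the usual conversion β = b · s_X / s_Y for a simple regression; re-expressing Y in different units changes the raw weight b but leaves β unchanged, which is the "constant scale" advantage. The data are hypothetical.

```python
from statistics import mean, stdev


def regression_weights(x, y):
    """Return (raw slope b, standardized weight beta = b * s_X / s_Y) for a simple regression."""
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    beta = b * stdev(x) / stdev(y)
    return b, beta


# Hypothetical data with Y measured in dollars.
x = [1, 2, 3, 4, 5]
y = [200, 400, 500, 400, 500]
b, beta = regression_weights(x, y)

# Re-express Y in thousands of dollars: the raw weight shrinks, beta does not change.
b_thousands, beta_thousands = regression_weights(x, [v / 1000 for v in y])
print(b, b_thousands, round(beta, 4), round(beta_thousands, 4))
```

In simple regression this β equals the correlation coefficient r between X and Y.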
EXHIBIT 23.5 The Advantage of Standardized Regression Weights
EXHIBIT 23.6 Relationship of Sales Potential to Building Permits Issued
EXHIBIT 23.7 The Best Fit Line or Knocking Out the Pins
Ordinary Least-Squares (OLS) Method of Regression Analysis
OLS: guarantees that the resulting straight line will produce the least possible total squared error in using X to predict Y. It generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line.
No straight line can completely represent every dot in the scatter diagram.
There will be a discrepancy between most of the actual scores (each dot) and the predicted score (Ŷ).
Uses the criterion of attempting to make the least amount of total squared error in prediction of Y from X.
Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
The equation $Y_i = \hat{\alpha} + \hat{\beta}X_i + e_i$ means that the value of Y for any value of X (X_i) is determined as a function of the estimated intercept coefficient, plus the estimated slope coefficient multiplied by X_i, plus some error (e_i).
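The least-squares estimates have a closed form in simple regression: β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and α̂ = Ȳ − β̂X̄. A minimal Python sketch with hypothetical data; it also computes the residuals e_i = Y_i − Ŷ_i, which sum to zero for an OLS line with an intercept.

```python
from statistics import mean


def ols_fit(x, y):
    """Least-squares estimates: beta_hat = S_xy / S_xx, alpha_hat = Ybar - beta_hat * Xbar."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
alpha_hat, beta_hat = ols_fit(x, y)
predicted = [alpha_hat + beta_hat * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, predicted)]  # e_i = Y_i - Y_hat_i
print(round(alpha_hat, 3), round(beta_hat, 3))
```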
Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
Statistical Significance of Regression Model
F-test (regression): determines whether more variability is explained by the regression or unexplained by the regression.
Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
Statistical Significance of Regression Model
ANOVA Table:
Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)
R²
The proportion of variance in Y that is explained by X (or vice versa)
A measure obtained by squaring the correlation coefficient; that proportion of the total variance of a variable that is accounted for by knowing the value of another variable.
$$R^2 = \frac{3{,}398.49}{3{,}882.40} = 0.875$$
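The R² ratio and the regression F-test both come from splitting total variation into explained (SSR) and unexplained (SSE) parts. A Python sketch under these assumptions: the observations and predictions are hypothetical (from a fit of Ŷ = 2.2 + 0.6X), k is the number of predictors, and SSR = Σ(Ŷ − Ȳ)² holds for OLS fits that include an intercept. The last line reproduces the slide's building-permit ratio.

```python
from statistics import mean


def regression_anova(y, y_hat, k=1):
    """Split total variation into explained (SSR) and unexplained (SSE) parts."""
    y_bar = mean(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    n = len(y)
    f = (ssr / k) / (sse / (n - k - 1))
    return {"SST": sst, "SSR": ssr, "SSE": sse, "R2": ssr / sst, "F": f}


# Hypothetical observations and their OLS predictions.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
table = regression_anova(y, y_hat)

# The slide's building-permit figures give the R-squared ratio directly.
slide_r2 = 3398.49 / 3882.40
print(table, round(slide_r2, 3))
```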
EXHIBIT 23.8 Simple Regression Results for Building Permit Example
EXHIBIT 23.9 OLS Regression Line
Simple Regression and Hypothesis Testing
The explanatory power of regression lies in hypothesis testing. Regression is often used to test relational hypotheses. Supporting a hypothesis requires two conditions, both of which must be satisfied:
1. The regression weight must be in the hypothesized direction. Positive relationships require a positive coefficient, and negative relationships require a negative coefficient.
2. The t-test associated with the regression weight must be significant.
What Is Multivariate Data Analysis?
Research that involves three or more variables, or that is concerned with underlying dimensions among multiple variables, will involve multivariate statistical analysis. These methods analyze multiple variables, or even multiple sets of variables, simultaneously.
Many business problems involve multivariate data analysis, for example most employee motivation research, customer psychographic profiles, and research that seeks to identify viable market segments.
The “Variate” in Multivariate
Variate: a mathematical way in which a set of variables can be represented with one equation.
A linear combination of variables, each contributing to the overall meaning of the variate based upon an empirically derived weight.
A function of the measured variables involved in an analysis: $V_k = f(X_1, X_2, \dots, X_m)$
EXHIBIT 24.1 Which Multivariate Approach Is Appropriate?
Classifying Multivariate Techniques
Dependence techniques: explain or predict one or more dependent variables. Needed when hypotheses involve a distinction between independent and dependent variables.
Types: multiple regression analysis, multiple discriminant analysis, multivariate analysis of variance, and structural equations modeling.
Classifying Multivariate Techniques (cont’d)
Interdependence techniques: give meaning to a set of variables or seek to group things together. Used when researchers examine questions that do not distinguish between independent and dependent variables.
Types: factor analysis, cluster analysis, and multidimensional scaling.
Classifying Multivariate Techniques (cont’d)
Influence of Measurement Scales
The nature of the measurement scales will determine which multivariate technique is appropriate for the data.
Selection of a multivariate technique requires consideration of the types of measures used for both independent and dependent sets of variables.
Nominal and ordinal scales are nonmetric. Interval and ratio scales are metric.
EXHIBIT 24.2 Which Multivariate Dependence Technique Should I Use?
EXHIBIT 24.3 Which Multivariate Interdependence Technique Should I Use?
Analysis of Dependence
General Linear Model (GLM)
A way of explaining and predicting a dependent variable based on fluctuations (variation) from its mean due to changes in independent variables.
$$Y_i = \mu + \Delta X + \Delta F + \Delta XF + e_i$$
where μ = a constant (overall mean of the dependent variable)
ΔX and ΔF = changes due to main-effect independent variables (experimental variables) and blocking independent variables (covariates or grouping variables)
ΔXF = the change due to the combination (interaction effect) of those variables
e_i = error
Interpreting Multiple Regression
Multiple Regression Analysis
An analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.
$$Y_i = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_nX_n + e_i$$
Dummy variable: the way a dichotomous (two-group) independent variable is represented in regression analysis by assigning a 0 to one group and a 1 to the other.
Multiple Regression Analysis
A Simple Example
Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe.
Several hypotheses are offered:
H1: Competitor’s sales are related negatively to sales.
H2: Sales are higher in communities with a sales office than when no sales office is present.
H3: Grammar school enrollment in a community is related positively to sales.
Multiple Regression Analysis (cont’d)
Statistical Results of the Multiple Regression
Regression equation:
$$\hat{Y} = 102.18 + 0.387X_1 + 115.2X_2 + 6.73X_3$$
Coefficient of multiple determination (R²) = 0.845
F-value = 14.6, p < 0.05
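A self-contained Python sketch of how multiple-regression coefficients can be estimated from the normal equations (X′X)b = X′y, with the sales-office variable coded as a 0/1 dummy. The store data are hypothetical and were generated exactly from Y = 50 − 0.5X₁ + 20X₂ + 2X₃, so the solver should recover those weights; the small Gaussian-elimination helper stands in for what a statistics package does internally and is not the method behind the slide's results.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multiple_regression(rows, y):
    """OLS coefficients [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical stores: X1 = competitor's sales, X2 = sales-office dummy (1 = office, 0 = none),
# X3 = grammar school enrollment.
rows = [(10, 1, 5), (20, 0, 8), (15, 1, 3), (30, 0, 10), (25, 1, 7), (12, 0, 4)]
y = [50 - 0.5 * x1 + 20 * x2 + 2 * x3 for x1, x2, x3 in rows]
coef = multiple_regression(rows, y)
print([round(c, 4) for c in coef])
```

The dummy's coefficient (20 here) is read as the difference in expected sales between stores with and without a sales office, holding the other variables constant.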
Multiple Regression Analysis (cont’d)
Regression Coefficients in Multiple Regression
Partial correlation: the correlation between two variables after taking into account the fact that they are correlated with other variables too.
R² in Multiple Regression
The coefficient of multiple determination in multiple regression indicates the percentage of variation in Y explained by all independent variables.
Multiple Regression Analysis (cont’d)
Statistical Significance in Multiple Regression
F-test
Tests statistical significance by comparing the variation explained by the regression equation to the residual error variation.
Allows for testing of the relative magnitudes of the sum of squares due to the regression (SSR) and the error sum of squares (SSE).
$$F = \frac{SS_r / k}{SS_e / (n - k - 1)} = \frac{MSR}{MSE}$$
Multiple Regression Analysis (cont’d)
Degrees of Freedom (d.f.)
k = number of independent variables
n = number of observations or respondents
Calculating Degrees of Freedom (d.f.)
d.f. for the numerator = k
d.f. for the denominator = n − k − 1
F-test
$$F = \frac{SS_r / k}{SS_e / (n - k - 1)} = \frac{MSR}{MSE}$$
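A Python sketch of the F computation with its two degrees of freedom. The sums of squares, n and k are hypothetical values, not figures from the toy-store example.

```python
def regression_f(ss_r, ss_e, n, k):
    """F = MSR / MSE with k and n - k - 1 degrees of freedom."""
    df_num = k
    df_den = n - k - 1
    msr = ss_r / df_num
    mse = ss_e / df_den
    return msr / mse, df_num, df_den


# Hypothetical sums of squares for a model with 3 predictors and 20 observations.
f, df1, df2 = regression_f(ss_r=600.0, ss_e=320.0, n=20, k=3)
print(f, df1, df2)
```

The statistic is then compared with the critical F for (k, n − k − 1) degrees of freedom.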
EXHIBIT 24.4 Interpreting Multiple Regression Results
ANOVA (n-way) and MANOVA
Multivariate analysis of variance (MANOVA): a multivariate technique that predicts multiple continuous dependent variables with multiple categorical independent variables.
ANOVA (n-way) and MANOVA (cont’d)
Interpreting N-way (Univariate) ANOVA
1. Examine the overall model F-test result. If significant, proceed.
2. Examine individual F-tests for individual variables.
3. For each significant categorical independent variable, interpret the effect by examining the group means.
4. For each significant continuous covariate, interpret the parameter estimate (b).
5. For each significant interaction, interpret the means for each combination.
Discriminant Analysis
A statistical technique for predicting the probability that an object will belong in one of two or more mutually exclusive categories (dependent variable), based on several independent variables.
To calculate discriminant scores, the linear function used is:
$$Z_i = b_1X_{1i} + b_2X_{2i} + \dots + b_nX_{ni}$$
Discriminant Analysis Example
$$Z = b_1X_1 + b_2X_2 + b_3X_3 = 0.069X_1 + 0.013X_2 + 0.0007X_3$$
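A discriminant score is a weighted sum of an object's values on the independent variables. This Python sketch uses the weights 0.069, 0.013 and 0.0007 from the example; the applicant's X values and the cutting score used to assign a group are hypothetical.

```python
def discriminant_score(weights, values):
    """Z = b1*X1 + b2*X2 + ... + bn*Xn."""
    if len(weights) != len(values):
        raise ValueError("one weight per independent variable")
    return sum(b * x for b, x in zip(weights, values))


weights = [0.069, 0.013, 0.0007]  # b1, b2, b3 from the example
applicant = [30, 50, 1000]        # hypothetical X1, X2, X3 values
z = discriminant_score(weights, applicant)

cutoff = 3.0  # hypothetical cutting score
group = "group 1" if z > cutoff else "group 2"
print(round(z, 2), group)
```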
EXHIBIT 24.5 Multivariate Dependence Techniques Summary
Factor Analysis
Statistically identifies a reduced number of factors from a larger number of measured variables.
Types:
Exploratory factor analysis (EFA): performed when the researcher is uncertain about how many factors may exist among a set of variables.
Confirmatory factor analysis (CFA): performed when the researcher has strong theoretical expectations about the factor structure before performing the analysis.
EXHIBIT 24.6 A Simple Illustration of Factor Analysis
Factor Analysis (cont’d)
How Many Factors
Eigenvalues are a measure of how much variance is explained by each factor.
Common rule: base the number of factors on the number of eigenvalues greater than 1.0.
Factor loading: indicates how strongly a measured variable is correlated with a factor.
Factor Analysis (cont’d)
Factor Rotation
A mathematical way of simplifying factor analysis results to better identify which variables “load on” which factors.
The most common procedure is varimax rotation.
Data Reduction Technique
Approaches that summarize the information from many variables into a reduced set of variates formed as linear combinations of measured variables.
The rule of parsimony: an explanation involving fewer components is better than one involving many more.
Factor Analysis (cont’d)
Creating Composite Scales with Factor Results
When a clear pattern of loadings exists, the researcher may take a simpler approach by summing the variables with high loadings and creating a summated scale.
Very low loadings suggest a variable does not contribute much to the factor.
The reliability of each summated scale is tested by computing a coefficient alpha estimate.
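Coefficient alpha is commonly computed as α = k/(k − 1) · (1 − Σ s²_item / s²_total), where k is the number of items and s²_total is the variance of the summated scale. This standard formula is not given on the slide; the sketch below applies it to hypothetical responses.

```python
from statistics import variance


def cronbach_alpha(items):
    """items[i][r] = score of respondent r on item i of the summated scale."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summated scale score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical 3-item scale answered by 5 respondents.
items = [
    [4, 5, 3, 4, 2],
    [4, 4, 3, 5, 2],
    [5, 5, 2, 4, 3],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```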
Factor Analysis (cont’d)
Communality: a measure of the percentage of a variable’s variation that is explained by the factors.
A relatively high communality indicates that a variable has much in common with the other variables taken as a group.
Communality for any variable is equal to the sum of the squared loadings for that variable.
Factor Analysis (cont’d)
Total Variance Explained
Squaring and totaling the loadings on a factor, then dividing the total by the number of variables, provides an estimate of the proportion of variance in a set of variables explained by that factor.
This explanation of variance is much the same as R² in multiple regression.
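Communalities, per-factor sums of squared loadings, and variance explained can all be read off a loading matrix. This Python sketch assumes unrotated principal-components loadings on standardized variables, for which each factor's sum of squared loadings equals its eigenvalue (after rotation that equality no longer holds factor by factor). The loading values are hypothetical.

```python
def loading_summary(loadings):
    """loadings[v][f] = loading of variable v on factor f."""
    n_vars = len(loadings)
    n_factors = len(loadings[0])
    # Communality: sum of squared loadings across factors, per variable (row).
    communalities = [sum(l ** 2 for l in row) for row in loadings]
    # Sum of squared loadings per factor (column); equals the eigenvalue for unrotated PCA.
    ss_loadings = [sum(row[f] ** 2 for row in loadings) for f in range(n_factors)]
    # Proportion of total variance explained: divide by the number of variables.
    pct_variance = [ss / n_vars for ss in ss_loadings]
    return {"communalities": communalities, "ss_loadings": ss_loadings,
            "pct_variance": pct_variance}


# Hypothetical loadings of six variables on two factors.
loadings = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
summary = loading_summary(loadings)
retain = [f for f, ss in enumerate(summary["ss_loadings"]) if ss > 1.0]  # eigenvalue > 1 rule
print(summary, retain)
```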
SPSS Windows
To select this procedure using SPSS for Windows, click:
Analyze>Data Reduction>Factor …
SPSS Windows: Principal Components
1. Select ANALYZE from the SPSS menu bar.
2. Click DATA REDUCTION and then FACTOR.
3. Move “Prevents Cavities [v1],” “Shiny Teeth [v2],” “Strengthen Gums [v3],” “Freshens Breath [v4],” “Tooth Decay Unimportant [v5],” and “Attractive Teeth [v6]” into the VARIABLES box.
4. Click on DESCRIPTIVES. In the pop-up window, in the STATISTICS box check INITIAL SOLUTION. In the CORRELATION MATRIX box, check KMO AND BARTLETT’S TEST OF SPHERICITY and also check REPRODUCED. Click CONTINUE.
5. Click on EXTRACTION. In the pop-up window, for METHOD select PRINCIPAL COMPONENTS (default). In the ANALYZE box, check CORRELATION MATRIX. In the EXTRACT box, check EIGEN VALUE OVER 1 (default). In the DISPLAY box, check UNROTATED FACTOR SOLUTION. Click CONTINUE.
6. Click on ROTATION. In the METHOD box, check VARIMAX. In the DISPLAY box, check ROTATED SOLUTION. Click CONTINUE.
7. Click on SCORES. In the pop-up window, check DISPLAY FACTOR SCORE COEFFICIENT MATRIX. Click CONTINUE.
8. Click OK.
Cluster Analysis
Cluster analysis: a multivariate approach for grouping observations based on similarity among measured variables.
Cluster analysis is an important tool for identifying market segments.
Cluster analysis classifies individuals or objects into a small number of mutually exclusive and exhaustive groups.
Objects or individuals are assigned to groups so that there is great similarity within groups and much less similarity between groups.
The cluster should have high internal (within-cluster) homogeneity and external (between-cluster) heterogeneity.
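A minimal k-means sketch in Python showing the idea of high within-cluster homogeneity: each point is assigned to its nearest center (squared Euclidean distance), and each center then moves to the mean of its members. The respondent scores and the choice of starting centers are hypothetical, and this plain version may not reproduce the output of SPSS's K-Means procedure, which has its own initialization rules.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, initial_centers, iterations=10):
    """Assign points to the nearest center, then move centers to cluster means; repeat."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster empties out
                centers[i] = [sum(coord) / len(members) for coord in zip(*members)]
    labels = [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i])) for p in points]
    return centers, labels


# Hypothetical respondents scored on two shopping-attitude dimensions.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8)]
centers, labels = k_means(points, initial_centers=[points[0], points[3]])
print(centers, labels)
```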
EXHIBIT 24.7 Clusters of Individuals on Two Dimensions
EXHIBIT 24.8 Cluster Analysis of Test-Market Cities
SPSS Windows
To select this procedure using SPSS for Windows, click:
Analyze>Classify>Hierarchical Cluster …
Analyze>Classify>K-Means Cluster …
Analyze>Classify>Two-Step Cluster
SPSS Windows: Hierarchical Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then HIERARCHICAL CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.
4. In the CLUSTER box, check CASES (default option). In the DISPLAY box, check STATISTICS and PLOTS (default options).
5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OF SOLUTIONS. Then, for MINIMUM NUMBER OF CLUSTERS, enter 2 and for MAXIMUM NUMBER OF CLUSTERS, enter 4. Click CONTINUE.
6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE box, check ALL CLUSTERS (default). In the ORIENTATION box, check VERTICAL. Click CONTINUE.
7. Click on METHOD. For CLUSTER METHOD, select WARD’S METHOD. In the MEASURE box, check INTERVAL and select SQUARED EUCLIDEAN DISTANCE. Click CONTINUE.
8. Click OK.
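The Ward's-method, squared-Euclidean setup in steps 5 to 7 can be sketched as a naive agglomerative loop. At each stage, merge the pair of clusters whose union raises the total within-cluster sum of squares the least. For clusters a and b, that increase is n_a·n_b/(n_a + n_b) times the squared distance between their centroids. The function `ward_schedule` and the four cases are hypothetical. The loop is O(n³) and only illustrates the logic, and its coefficients are not guaranteed to match the numbers SPSS prints in its agglomeration schedule.

```python
def ward_schedule(points):
    """Naive Ward agglomeration; returns (cluster_a, cluster_b, total_within_ss) per stage."""
    # Each cluster (frozenset of 0-based case indices) maps to (size, centroid)
    clusters = {frozenset([i]): (1, list(p)) for i, p in enumerate(points)}
    schedule, total_ess = [], 0.0
    while len(clusters) > 1:
        best = None
        keys = list(clusters)
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                (na, ca), (nb, cb) = clusters[keys[i]], clusters[keys[j]]
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))  # squared Euclidean
                cost = na * nb / (na + nb) * sq                  # increase in within SS
                if best is None or cost < best[0]:
                    best = (cost, keys[i], keys[j])
        cost, a, b = best
        (na, ca), (nb, cb) = clusters.pop(a), clusters.pop(b)
        merged = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[a | b] = (na + nb, merged)
        total_ess += cost
        schedule.append((sorted(a), sorted(b), total_ess))
    return schedule

for stage in ward_schedule([(1, 1), (1, 2), (8, 8), (8, 9)]):
    print(stage)
# ([0], [1], 0.5)
# ([2], [3], 1.0)
# ([0, 1], [2, 3], 99.0)
```

The large jump in the last coefficient (from 1.0 to 99.0) is the kind of break researchers look for when deciding how many clusters to keep. Here it suggests two clusters.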
SPSS Windows: K-Means Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then K-MEANS CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.
4. For NUMBER OF CLUSTERS, enter 3.
5. Click on OPTIONS. In the pop-up window, in the STATISTICS box, check INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE. Click CONTINUE.
6. Click OK.
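K-means with a fixed number of clusters (three in step 4) is commonly computed with Lloyd's algorithm. Assign each case to its nearest centre, move each centre to the mean of its cases, and repeat until no case changes cluster. This is a minimal sketch, and the initial centres are supplied by the caller, which is an assumption. SPSS chooses its own initial cluster centres, reported under INITIAL CLUSTER CENTERS. The function names and the nine cases are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, centers, max_iter=100):
    """Lloyd's k-means from the given initial centres; returns (labels, centres)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case goes to its nearest centre
        new_labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:  # no case changed cluster: converged
            break
        labels = new_labels
        # Update step: each centre moves to the mean of its member cases
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster ends up empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 9), (2, 9), (1, 8)]
labels, centres = k_means(pts, [(1, 1), (8, 8), (1, 9)])
print(labels)   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(centres)
```

Because the result can depend on the starting centres, it is worth rerunning with different starts (or comparing with a hierarchical solution) before settling on a segmentation.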
SPSS Windows: Two-Step Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then TWO-STEP CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the CONTINUOUS VARIABLES box.
4. For DISTANCE MEASURE, select EUCLIDEAN.
5. For NUMBER OF CLUSTERS, select DETERMINE AUTOMATICALLY.
6. For CLUSTERING CRITERION, select AKAIKE’S INFORMATION CRITERION (AIC).
7. Click OK.
Multidimensional Scaling
Multidimensional scaling: measures objects in multidimensional space on the basis of respondents’ judgments of the similarity of objects.
EXHIBIT 24.9 Perceptual Map of Six Graduate Business Schools: Simple Space
SPSS Windows
The multidimensional scaling program allows individual differences as well as aggregate analysis using ALSCAL. The level of measurement can be ordinal, interval or ratio. Both the direct and the derived approaches can be accommodated.
To select multidimensional scaling procedures using SPSS for Windows, click:
Analyze>Scale>Multidimensional Scaling …
The conjoint analysis approach can be implemented using regression if the dependent variable is metric (interval or ratio).
This procedure can be run by clicking:
Analyze>Regression>Linear …
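A minimal sketch of conjoint analysis by regression: dummy-code each attribute level against a reference level, then fit ordinary least squares. The coefficients are then part-worths relative to the reference level. The attributes (brand A/B/C, low/high price), the ratings, and the helper names `solve` and `ols` are hypothetical. An OLS fit in Analyze>Regression>Linear on the same dummy columns estimates the same coefficients.

```python
def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small system a x = b."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Columns: intercept, brand B, brand C, high price (reference: brand A, low price)
X = [[1, 0, 0, 0], [1, 0, 0, 1],   # brand A, low / high price
     [1, 1, 0, 0], [1, 1, 0, 1],   # brand B
     [1, 0, 1, 0], [1, 0, 1, 1]]   # brand C
ratings = [5, 3, 6, 4, 4, 2]       # hypothetical metric preference ratings
print(ols(X, ratings))  # part-worths ≈ [5, 1, -1, -2]
```

Dummy coding is one choice among several. Effects coding is also common in conjoint work and yields part-worths centred on zero within each attribute rather than relative to a baseline level.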
SPSS Windows: MDS
First convert similarity ratings to distances by subtracting each value of Table 21.1 from 8. The data matrix must be square symmetric, with zeros on the diagonal and the same distances above and below the diagonal (see SPSS file Table 21.1 Input).
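The conversion can be sketched as follows, assuming ratings on a 1 to 7 scale where a higher score means more similar, which is what subtracting from 8 implies. The function name `similarities_to_distances` and the three-brand matrix are hypothetical. The diagonal is forced to zero and the result is checked for symmetry, matching the square-symmetric shape SPSS expects.

```python
def similarities_to_distances(sim, scale_max=7):
    """Turn a square matrix of similarity ratings into a square symmetric distance matrix.

    Assumes ratings run from 1 (very dissimilar) to scale_max (very similar),
    so distance = (scale_max + 1) - rating, with zeros on the diagonal.
    """
    n = len(sim)
    dist = [[0 if i == j else (scale_max + 1) - sim[i][j] for j in range(n)]
            for i in range(n)]
    # A square symmetric matrix needs d(i, j) == d(j, i)
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] != dist[j][i]:
                raise ValueError("similarity ratings must be symmetric")
    return dist

print(similarities_to_distances([[7, 5, 2], [5, 7, 3], [2, 3, 7]]))
# [[0, 3, 6], [3, 0, 5], [6, 5, 0]]
```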
1. Select ANALYZE from the SPSS menu bar.
2. Click SCALE and then MULTIDIMENSIONAL SCALING (ALSCAL).
3. Move “Aqua-Fresh [AquaFresh],” “Crest [Crest],” “Colgate [Colgate],” “Aim [Aim],” “Gleem [Gleem],” “Ultra Brite [UltraBrite],” “Ultra-Brite [var00007],” “Close-Up [CloseUp],” “Pepsodent [Pepsodent],” and “Sensodyne [Sensodyne]” into the VARIABLES box.
4. In the DISTANCES box, check DATA ARE DISTANCES. SHAPE should be SQUARE SYMMETRIC (default).
5. Click on MODEL. In the pop-up window, in the LEVEL OF MEASUREMENT box, check INTERVAL. In the SCALING MODEL box, check EUCLIDEAN DISTANCE. In the CONDITIONALITY box, check MATRIX. Click CONTINUE.
6. Click on OPTIONS. In the pop-up window, in the DISPLAY box, check GROUP PLOTS, DATA MATRIX and MODEL AND OPTIONS SUMMARY. Click CONTINUE.
7. Click OK.
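ALSCAL is an iterative alternating least-squares algorithm and is not reproduced here. As a simpler stand-in that shows how a distance matrix becomes a two-dimensional perceptual map, the sketch below uses a different technique, classical (Torgerson) metric MDS. It double-centres the squared distances, then takes the top two eigenvectors, found here by power iteration with deflation. The function name, iteration count and sample points are assumptions. The map is unique only up to rotation and reflection, and for non-Euclidean distances the sketch simply clamps any negative eigenvalue to zero.

```python
import math

def classical_mds(dist, dims=2, iters=1000):
    """Classical metric MDS: coordinates whose distances approximate `dist`."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]
    grand = sum(row) / n
    # B = -1/2 * J D^2 J: double-centred squared distances (a cross-product matrix)
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
        for _ in range(iters):  # power iteration towards the largest eigenvalue
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate so the next pass finds the next-largest eigenvalue
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

# Hypothetical check: distances from known 2-D points should be reproduced
pts = [(0, 0), (6, 0), (0, 2), (6, 2), (3, 1)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
print(classical_mds(dist))
```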
EXHIBIT 24.10 Summary of Multivariate Techniques for Analysis of Interdependence
Further Reading
Cooper, D.R. and Schindler, P.S. (2011) Business Research Methods, 11th edn, McGraw-Hill.
Zikmund, W.G., Babin, B.J., Carr, J.C. and Griffin, M. (2010) Business Research Methods, 8th edn, South-Western.
Saunders, M., Lewis, P. and Thornhill, A. (2012) Research Methods for Business Students, 6th edn, Prentice Hall.
Saunders, M. and Lewis, P. (2012) Doing Research in Business & Management, FT Prentice Hall.