Abdm4064 week 11 data analysis

Data Analysis

ABDM4064 BUSINESS RESEARCH

by Stephen Ong

Principal Lecturer (Specialist)
Visiting Professor, Shenzhen University

LEARNING OUTCOMES

After studying this chapter, you should be able to

1. Know when a response is really an error and should be edited

2. Appreciate coding of pure qualitative research

3. Understand the way data are represented in a data file

4. Understand the coding of structured responses including a dummy variable approach

5. Appreciate the ways that technological advances have simplified the coding process

6. Know what descriptive statistics are and why they are used

7. Create and interpret simple tabulation tables

8. Understand how cross-tabulations can reveal relationships

9. Perform basic data transformations

10. List different computer software products designed for descriptive statistical analysis

11. Understand a researcher’s role in interpreting the data

12. Implement the hypothesis-testing procedure

13. Use p-values to assess statistical significance

14. Test a hypothesis about an observed mean compared to some standard

15. Know the difference between Type I and Type II errors

16. Know when a univariate χ2 test is appropriate and how to conduct one

17. Recognize when a bivariate statistical test is appropriate

18. Calculate and interpret a χ2 test for a contingency table

19. Calculate and interpret an independent samples t-test comparing two means

20. Understand the concept of analysis of variance (ANOVA)

21. Interpret an ANOVA table

22. Apply and interpret simple bivariate correlations

23. Interpret a correlation matrix

24. Understand simple (bivariate) regression

25. Understand the least-squares estimation technique

26. Interpret regression output including the tests of hypotheses tied to specific parameter coefficients

27. Understand what multivariate statistical analysis involves and know the two types of multivariate analysis

28. Interpret results from multiple regression analysis

29. Interpret results from multivariate analysis of variance (MANOVA)

30. Interpret basic exploratory factor analysis results

31. Know what multiple discriminant analysis can be used to do

32. Understand how cluster analysis can identify market segments

Remember this,

Garbage in, garbage out! If data is collected improperly, or coded incorrectly, then the research results are “garbage”.

Stages of Data Analysis

Raw Data
The unedited responses from a respondent exactly as indicated by that respondent.

Nonrespondent Error
Error that the respondent is not responsible for creating, such as when the interviewer marks a response incorrectly.

Data Integrity
The notion that the data file actually contains the information that the researcher is trying to obtain to adequately address research questions.

EXHIBIT 19.1 Overview of the Stages of Data Analysis

Editing

The process of checking the completeness, consistency, and legibility of data and making the data ready for coding and transfer to storage.

E.g. How long have you stayed at your current address? 45

The researchers need to adjust or reconstruct such responses.

Field Editing – useful in personal interviews
Preliminary editing by a field supervisor on the same day as the interview to catch technical omissions, check legibility of handwriting, and clarify responses that are logically or conceptually inconsistent.

In-House Editing
A rigorous editing job performed by a centralized office staff.

Editing – what to do?

Checking for Consistency
Respondents match defined population – e.g. SBS?
Check for consistency within the data collection framework – e.g. items listed by the respondents are within the definition.

Taking Action When Response is Obviously in Error
Change/correct responses only when there are multiple pieces of evidence for doing so.

Editing Technology
Computer routines can check for consistency automatically.
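As a rough illustration of such a computer routine, the sketch below flags records whose answers contradict each other. The field names (age, years_at_address), the rules, and the two records are hypothetical and are not taken from any actual questionnaire; the routine only flags records for the editor and does not change them.

```python
# Hypothetical consistency rules; field names and limits are illustrative assumptions.
def find_inconsistencies(record):
    """Return a list of rule violations found in one respondent's record."""
    problems = []
    age = record.get("age")
    years = record.get("years_at_address")
    if age is not None and not 0 <= age <= 120:
        problems.append("age out of range")
    if age is not None and years is not None and years > age:
        problems.append("years at address exceeds age")
    return problems


records = [
    {"id": 1, "age": 34, "years_at_address": 5},
    {"id": 2, "age": 29, "years_at_address": 45},
]

# Flag records for the editor instead of changing them automatically.
flagged = {}
for r in records:
    problems = find_inconsistencies(r)
    if problems:
        flagged[r["id"]] = problems

print(flagged)
```

Flagging rather than correcting keeps the decision with the editor, in line with the rule that responses are changed only when several pieces of evidence support it.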

Editing for Completeness

Item Nonresponse
The technical term for an unanswered question on an otherwise complete questionnaire, resulting in missing data.
Most of the time the researchers will do nothing about it. But sometimes the question is linked to another question, so the researchers have to fill in the blank.

Plug Value
An answer that an editor “plugs in” to replace blanks or missing values so as to permit data analysis.
Choice of value is based on a predetermined decision rule, e.g. take an average value or neutral value.
Several choices:
Leave it blank.
Plug in alternate choices.
Randomly select an answer.
Impute a missing value.

Impute
To fill in a missing data point through the use of a statistical process providing an educated guess for the missing response based on available information.
I.e. based on the respondent’s answers to other questions.
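A minimal sketch of one possible decision rule, plugging in the average of the observed answers, is shown below. The ratings are made up, and None is assumed to mark an unanswered item; a real imputation procedure would normally draw on the respondent's other answers as well.

```python
from statistics import mean


def plug_mean(values):
    """Replace missing answers (None) with the mean of the observed answers."""
    observed = [v for v in values if v is not None]
    plug = mean(observed)
    return [plug if v is None else v for v in values]


ratings = [4, None, 2, 3]  # made-up answers; None marks item nonresponse
filled = plug_mean(ratings)
print(filled)
```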

Editing for Completeness (cont’d)

What about missing data? Many statistical software programs require complete data for an analysis to take place.

List-wise deletion
The entire record for a respondent who has left a response missing is excluded from use in statistical analysis.

Pair-wise deletion
Only the actual variables for a respondent that do not contain information are eliminated from use in statistical analysis.
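The difference between the two deletion rules can be sketched as follows. The variable names and answers are invented for illustration, and None is assumed to mark a missing response.

```python
from statistics import mean

# Made-up survey records; None marks an item nonresponse.
responses = [
    {"satisfaction": 4, "loyalty": 5, "income": 3},
    {"satisfaction": 2, "loyalty": None, "income": 4},
    {"satisfaction": 5, "loyalty": 4, "income": None},
]


def listwise(rows):
    """Keep only respondents with no missing answer on any variable."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_values(rows, variable):
    """Keep every observed answer on one variable, whatever else is missing."""
    return [r[variable] for r in rows if r[variable] is not None]


complete = listwise(responses)
listwise_mean = mean(r["satisfaction"] for r in complete)
pairwise_mean = mean(pairwise_values(responses, "satisfaction"))
print(len(complete), listwise_mean, round(pairwise_mean, 2))
```

List-wise deletion leaves one usable record here, while pair-wise deletion still uses all three satisfaction answers, which is why the two means differ.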

Please take note,

When a questionnaire has too many missing answers, it may not be suitable for the planned data analysis. In such a situation, that particular questionnaire has to be dropped from the sample.

Facilitating the Coding Process

Editing And Tabulating “Don’t Know” Answers
Legitimate don’t know (no opinion)
Reluctant don’t know (refusal to answer)
Confused don’t know (does not understand)

Editing (cont’d)

Pitfalls of Editing
Allowing subjectivity to enter into the editing process.
Data editors should be intelligent, experienced, and objective.
A systematic procedure for assessing the questionnaire should be developed by the research analyst so that the editor has clearly defined decision rules.

Pretesting Edit
Editing during the pretest stage can prove very valuable for improving questionnaire format and identifying poor instructions or inappropriate question wording.

Coding Qualitative Responses

Coding
The process of assigning a numerical score or other character symbol to previously edited data.

Codes
Rules for interpreting, classifying, and recording data in the coding process.
The actual numerical or other character symbols assigned to raw data.

Dummy Coding
Numeric “1” or “0” coding where each number represents an alternate response such as “female” or “male.”
If k is the number of categories for a qualitative variable, k-1 dummy variables are needed.
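The k-1 rule can be sketched as below. The region variable, its three categories, and the choice of "north" as the reference category are all hypothetical; the reference category is the one coded 0 on every dummy.

```python
def dummy_code(values, reference):
    """Code a qualitative variable into k-1 dummy variables (0/1)."""
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


regions = ["north", "south", "east", "north"]  # made-up data, k = 3 categories
coded = dummy_code(regions, reference="north")
for original, row in zip(regions, coded):
    print(original, row)
```

With three categories only two dummies (is_east, is_south) are created; a respondent scoring 0 on both must belong to the reference category.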

Data File Terminology

Field
A collection of characters that represents a single type of data, usually a variable.

String Characters
Computer terminology to represent formatting a variable using a series of alphabetic characters (nonnumeric characters) that may form a word.

Record
A collection of related fields that represents the responses from one sampling unit.

Data File Terminology (cont’d)

Data File
The way a data set is stored electronically in spreadsheet-like form in which the rows represent sampling units and the columns represent variables.

Value Labels
Unique labels assigned to each possible numeric code for a response.

Code Construction

Two Basic Rules for Coding Categories:

1. They should be exhaustive, meaning that a coding category should exist for all possible responses.

2. They should be mutually exclusive and independent, meaning that there should be no overlap among the categories to ensure that a subject or response can be placed in only one category.

Test Tabulation – especially useful for open-ended questions

Tallying of a small sample of the total number of replies to a particular question in order to construct coding categories.

Purpose is to preliminarily identify the stability and distribution of answers that will determine a coding scheme.

Test Tabulation

E.g.
1st respondent: I don’t like to use Facebook because it wastes time.
2nd respondent: I don’t know what Facebook is.
3rd respondent: Facebook takes a lot of my time.

Based on the above 3 answers, you can have 2 groups of answers:
1st group: Time factor
2nd group: No knowledge of Facebook

Devising the Coding Scheme

A coding scheme should not be too elaborate.
The coder’s task is only to summarize the data.
Categories should be sufficiently unambiguous that coders will not classify items in different ways.

Code book
Identifies each variable in a study and gives the variable’s description, code name, and position in the data matrix.

The Nature of Descriptive Analysis

Descriptive Analysis
The elementary transformation of raw data in a way that describes the basic characteristics such as central tendency, distribution, and variability.

Histogram
A graphical way of showing a frequency distribution in which the height of a bar corresponds to the observed frequency of the category.

EXHIBIT 20.1 Levels of Scale Measurement and Suggested Descriptive Statistics

Creating and Interpreting Tabulation

Tabulation
The orderly arrangement of data in a table or other summary format showing the number of responses to each response category.
Tallying is the term used when the process is done by hand.

Frequency Table
A table showing the different ways respondents answered a question.
Sometimes called a marginal tabulation.
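A frequency table of this kind can be sketched in a few lines. The coded answers below are made up; the columns are the response code, the count, the percentage, and the cumulative percentage.

```python
from collections import Counter


def frequency_table(answers):
    """Return rows of (value, count, percent, cumulative percent)."""
    counts = Counter(answers)
    total = len(answers)
    table = []
    cumulative = 0.0
    for value in sorted(counts):
        pct = 100 * counts[value] / total
        cumulative += pct
        table.append((value, counts[value], round(pct, 1), round(cumulative, 1)))
    return table


answers = [1, 2, 2, 3, 3, 3, 3, 2, 1, 3]  # made-up coded responses
table = frequency_table(answers)
for row in table:
    print(row)
```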

Frequency Table Example

Cross-Tabulation

Addresses research questions involving relationships among multiple less-than-interval variables.
Results in a combined frequency table displaying one variable in rows and another variable in columns.

Contingency Table
A data matrix that displays the frequency of some combination of responses to multiple variables.

Marginals
Row and column totals in a contingency table, which are shown in its margins.
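The sketch below builds a small contingency table with its marginals from paired answers. The sex and patronage answers are invented for illustration and do not reproduce any exhibit.

```python
from collections import Counter


def crosstab(pairs):
    """Return (cell counts, row marginals, column marginals) for paired answers."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: cells[(r, c)] for c in cols} for r in rows}
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    return table, row_totals, col_totals


# Made-up (sex, shops at store) answers.
pairs = [
    ("female", "yes"), ("female", "yes"), ("female", "no"),
    ("male", "yes"), ("male", "no"), ("male", "no"),
]
table, row_totals, col_totals = crosstab(pairs)
print(table)
print(row_totals, col_totals)
```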

EXHIBIT 20.2 Cross-Tabulation Tables from a Survey Regarding AIG and Government Bailouts

EXHIBIT 20.3 Different Ways of Depicting the Cross-Tabulation of Biological Sex and Target Patronage

Cross-Tabulation (cont’d)

Percentage Cross-Tabulations
Statistical base – the number of respondents or observations (in a row or column) used as a basis for computing percentages.

Elaboration and Refinement
Elaboration analysis – an analysis of the basic cross-tabulation for each level of a variable not previously considered, such as subgroups of the sample.
Moderator variable – a third variable that changes the nature of a relationship between the original independent and dependent variables.

EXHIBIT 20.4 Cross-Tabulation of Marital Status, Sex, and Responses to the Question “Do You Shop at Target?”

Cross-Tabulation (cont’d)

How Many Cross-Tabulations?
Every possible response becomes a possible explanatory variable.
When hypotheses involve relationships between two categorical variables, cross-tabulations are the right tool for the job.

Quadrant Analysis
An extension of cross-tabulation in which responses to two rating-scale questions are plotted in four quadrants of a two-dimensional table.
Importance-performance analysis

EXHIBIT 20.5 An Importance-Performance or Quadrant Analysis of Hotels

Data Transformation

Process of changing the data from their original form to a format suitable for performing a data analysis addressing research objectives.

Bimodal

Problems with Data Transformations

Median Split
Dividing a data set into two categories by placing respondents below the median in one category and respondents above the median in another.
The approach is best applied only when the data do indeed exhibit bimodal characteristics.
Inappropriate collapsing of continuous variables into categorical variables ignores the information contained within the untransformed values.
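The information loss can be seen in a small sketch. The scores are made up, and placing values exactly equal to the median in the "high" group is an assumption of this example, not a rule from the text.

```python
from statistics import median


def median_split(scores):
    """Label each score 'low' or 'high' relative to the sample median."""
    cut = median(scores)
    # Assumption: scores exactly equal to the median go to the 'high' group.
    return ["low" if s < cut else "high" for s in scores]


scores = [10, 49, 51, 99]  # made-up rating scores
labels = median_split(scores)
print(labels)
```

After the split, 49 and 51 land in different groups although they are nearly identical, while 51 and 99 are treated as the same.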

EXHIBIT 20.6 Bimodal Distributions Are Consistent with Transformations into Categorical Values

EXHIBIT 20.7 The Problem with Median Splits with Unimodal Data

Index Numbers

Scores or observations recalibrated to indicate how they relate to a base number.

Price indexes
Represent simple data transformations that allow researchers to track a variable’s value over time and compare a variable(s) with other variables.
Recalibration allows scores or observations to be related to a certain base period or base number.
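A minimal sketch of the recalibration, assuming the usual convention that the base period is set to 100. The yearly figures are invented and are not the values from any exhibit.

```python
def index_numbers(values, base_period):
    """Express each period's value as a percentage of the base period's value."""
    base = values[base_period]
    return {period: round(100 * v / base, 1) for period, v in values.items()}


hours = {2019: 20.0, 2020: 25.0, 2021: 22.0}  # made-up weekly viewing hours
indexed = index_numbers(hours, base_period=2019)
print(indexed)
```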

EXHIBIT 20.8 Hours of Television Usage per Week

Calculating Rank Order

Rank Order
Ranking data can be summarized by performing a data transformation.
The transformation involves multiplying the frequency by the ranking score for each choice, resulting in a new scale.
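The transformation can be sketched as below. The destinations and frequencies are placeholders, and the example assumes that rank 1 is the most preferred choice, so a lower total indicates a more preferred option.

```python
def rank_scores(frequencies):
    """Sum frequency x rank for each choice; counts[i] is the number of
    respondents who gave that choice rank i + 1."""
    return {
        choice: sum(rank * n for rank, n in enumerate(counts, start=1))
        for choice, counts in frequencies.items()
    }


# Made-up frequencies of ranks 1, 2, 3 from five respondents.
frequencies = {
    "Destination A": [3, 1, 1],
    "Destination B": [1, 3, 1],
    "Destination C": [1, 1, 3],
}
scores = rank_scores(frequencies)
print(scores)
```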

EXHIBIT 20.9 Executive Rankings of Potential Conference Destinations

EXHIBIT 20.10 Frequencies of Conference Destination Rankings

EXHIBIT 20.11 Pie Charts Work Well with Tabulations and Cross-Tabulations

Computer Programs for Analysis

Statistical Packages
Spreadsheets: Excel
Statistical software: SAS, SPSS (Statistical Package for Social Sciences), MINITAB

Computer Graphics and Computer Mapping

Box and Whisker Plots
Graphic representations of central tendencies, percentiles, variabilities, and the shapes of frequency distributions.

Interquartile Range
A measure of variability.

Outlier
A value that lies outside the normal range of the data.
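A minimal sketch of the interquartile range and an outlier check follows. The 1.5 x IQR fence is a common convention assumed here, not a rule stated in the text, and quartile conventions differ between packages, so other software may report slightly different quartiles. The observations are made up.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (q1, q3, iqr, outliers) using the k * IQR fence convention."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return q1, q3, iqr, outliers


data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up observations
q1, q3, iqr, outliers = iqr_outliers(data)
print(q1, q3, iqr, outliers)
```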

EXHIBIT 20.15 Computer Drawn Box and Whisker Plot

SPSS Windows

The main program in SPSS is FREQUENCIES. It produces a table of frequency counts, percentages, and cumulative percentages for the values of each variable. It gives all of the associated statistics.

If the data are interval scaled and only the summary statistics are desired, the DESCRIPTIVES procedure can be used.

The EXPLORE procedure produces summary statistics and graphical displays, either for all of the cases or separately for groups of cases. Mean, median, variance, standard deviation, minimum, maximum, and range are some of the statistics that can be calculated.

SPSS Windows

To select these procedures, click:

Analyze>Descriptive Statistics>Frequencies
Analyze>Descriptive Statistics>Descriptives
Analyze>Descriptive Statistics>Explore

The major cross-tabulation program is CROSSTABS. This program will display the cross-classification tables and provide cell counts, row and column percentages, the chi-square test for significance, and all the measures of the strength of the association that have been discussed.

To select this procedure, click:

Analyze>Descriptive Statistics>Crosstabs

SPSS Windows

The major program for conducting parametric tests in SPSS is COMPARE MEANS. This program can be used to conduct t tests on one sample or independent or paired samples. To select these procedures using SPSS for Windows, click:

Analyze>Compare Means>Means …
Analyze>Compare Means>One-Sample T Test …
Analyze>Compare Means>Independent-Samples T Test …
Analyze>Compare Means>Paired-Samples T Test …

SPSS Windows

The nonparametric tests discussed in this chapter can be conducted using NONPARAMETRIC TESTS.

To select these procedures using SPSS for Windows, click:

Analyze>Nonparametric Tests>Chi-Square …
Analyze>Nonparametric Tests>Binomial …
Analyze>Nonparametric Tests>Runs …
Analyze>Nonparametric Tests>1-Sample K-S …
Analyze>Nonparametric Tests>2 Independent Samples …
Analyze>Nonparametric Tests>2 Related Samples …

SPSS Windows: Frequencies

1. Select ANALYZE on the SPSS menu bar.
2. Click DESCRIPTIVE STATISTICS and select FREQUENCIES.
3. Move the variable “Familiarity [familiar]” to the VARIABLE(s) box.
4. Click STATISTICS.
5. Select MEAN, MEDIAN, MODE, STD. DEVIATION, VARIANCE, and RANGE.
6. Click CONTINUE.
7. Click CHARTS.
8. Click HISTOGRAMS, then click CONTINUE.
9. Click OK.

Introduction of a Third Variable in Cross-Tabulation

SPSS Windows: Cross-tabulations

1. Select ANALYZE on the SPSS menu bar.
2. Click on DESCRIPTIVE STATISTICS and select CROSSTABS.
3. Move the variable “Internet Usage Group [iusagegr]” to the ROW(S) box.
4. Move the variable “Sex [sex]” to the COLUMN(S) box.
5. Click on CELLS.
6. Select OBSERVED under COUNTS and COLUMN under PERCENTAGES.
8. Click STATISTICS.
9. Click on CHI-SQUARE, PHI AND CRAMER’S V.



Interpretation

The process of drawing inferences from the analysis results.

Inferences drawn from interpretations lead to managerial implications and decisions.

From a management perspective, the qualitative meaning of the data and their managerial implications are an important aspect of the interpretation.

Hypothesis Testing

Types of Hypotheses

Relational hypotheses
Examine how changes in one variable vary with changes in another.

Hypotheses about differences between groups
Examine how some variable varies from one group to another.

Hypotheses about differences from some standard
Examine how some variable differs from some preconceived standard. These tests typify univariate statistical tests.

Types of Statistical Analysis

Univariate Statistical Analysis
Tests of hypotheses involving only one variable.
Testing of statistical significance

Bivariate Statistical Analysis
Tests of hypotheses involving two variables.

Multivariate Statistical Analysis
Statistical analysis involving three or more variables or sets of variables.

The Hypothesis-Testing Procedure

Process
1. The specifically stated hypothesis is derived from the research objectives.
2. A sample is obtained and the relevant variable is measured.
3. The measured sample value is compared to the value either stated explicitly or implied in the hypothesis.
If the value is consistent with the hypothesis, the hypothesis is supported.
If the value is not consistent with the hypothesis, the hypothesis is not supported.

Statistical Analysis: Key Terms

Hypothesis
Unproven proposition: a supposition that tentatively explains certain facts or phenomena.
An assumption about the nature of the world.

Null Hypothesis
Statement about the status quo.
No difference between sample and population.

Alternative Hypothesis
Statement that indicates the opposite of the null hypothesis.

Significance Levels and p-values

Significance Level
A critical probability associated with a statistical hypothesis test that indicates how likely an inference supporting a difference between an observed value and some statistical expectation is true.
The acceptable level of Type I error.

p-value
Probability value, or the observed or computed significance level.
p-values are compared to significance levels to test hypotheses.
Higher p-values mean less evidence against the null hypothesis; a p-value below the significance level leads to rejecting it.

EXHIBIT 21.1 p-Values and Statistical Tests

EXHIBIT 21.2

As the observed mean gets further from the standard (proposed population mean), the p-value decreases. The lower the p-value, the more confidence you have that the sample mean is different.

An Example of Hypothesis Testing

The null hypothesis: the mean is equal to 3.0:
\[ H_0: \mu = 3.0 \]
The alternative hypothesis: the mean does not equal 3.0:
\[ H_1: \mu \neq 3.0 \]

EXHIBIT 21.3 A Hypothesis Test Using the Sampling Distribution of X̄ under the Hypothesis µ = 3.0

Critical Values
Values that lie exactly on the boundary of the region of rejection.

Type I and Type II Errors

Type I Error
An error caused by rejecting the null hypothesis when it is true.
Has a probability of alpha (α).
Practically, a Type I error occurs when the researcher concludes that a relationship or difference exists in the population when in reality it does not exist.
“There really are no monsters under the bed.”

Type I and Type II Errors (cont’d)

Type II Error
An error caused by failing to reject the null hypothesis when the alternative hypothesis is true.
Has a probability of beta (β).
Practically, a Type II error occurs when a researcher concludes that no relationship or difference exists when in fact one does exist.
“There really are monsters under the bed.”

EXHIBIT 21.4 Type I and Type II Errors in Hypothesis Testing

Choosing the Appropriate Statistical Technique

Choosing the correct statistical technique requires considering:

Type of question to be answered
E.g. Ranking question – rank order test

Number of variables involved
One variable – univariate statistical analysis
Two variables – bivariate statistical analysis
More than two variables – multivariate analysis

Level of scale measurement
E.g. in a nominal scale, the mean and median are meaningless.

Parametric versus Nonparametric Tests

Parametric Statistics
Involve numbers with known, continuous distributions.
Appropriate when:
Data are interval or ratio scaled.
Sample size is large.

Nonparametric Statistics
Appropriate when the variables being analyzed do not conform to any known or continuous distribution.

Univariate Statistical Choice Made Easy

The t-Distribution

t-test
A hypothesis test that uses the t-distribution.
A univariate t-test is appropriate when the variable being analyzed is interval or ratio.

Degrees of freedom (d.f.)
The number of observations minus the number of constraints or assumptions needed to calculate a statistical term.

EXHIBIT 21.6 The t-Distribution for Various Degrees of Freedom

Calculating a Confidence Interval Estimate Using the t-Distribution

Calculating a Confidence Interval Estimate Using the t-Distribution (cont’d)

\[ \text{Upper limit} = 3.89 + 2.12 \left( 2.81 / \sqrt{18} \right) \approx 5.28 \]
\[ \text{Lower limit} = 3.89 - 2.12 \left( 2.81 / \sqrt{18} \right) \approx 2.49 \]

One-Tailed Univariate t-Tests

One-tailed Test
Appropriate when a research hypothesis implies that an observed mean can only be greater than or less than a hypothesized value.
E.g. “Females score higher than males in English Test”
Only one of the “tails” of the bell-shaped normal curve is relevant.
A one-tailed test can be determined from a two-tailed test result by taking half of the observed p-value.
When there is any doubt about whether a one- or two-tailed test is appropriate, opt for the more conservative two-tailed test.

Two-Tailed Univariate t-Tests

Two-tailed Test
Tests for differences from the population mean that are either greater or less, i.e. identifies whether there is any difference.
E.g. The English test scores of females are different from the scores of males.
Extreme values of the normal curve (or tails) on both the right and the left are considered.
When a research question does not specify whether a difference should be greater than or less than, a two-tailed test is most appropriate.
When the researcher has any doubt about whether a one- or two-tailed test is appropriate, he or she should opt for the more conservative two-tailed test.

Univariate Hypothesis Test Utilizing the t-Distribution

Example:
Suppose a Pizza Inn manager believes the average number of returned pizzas each day to be 20.
The store records the number of returned pizzas for each of the 25 days it was open in a given month.
The mean was calculated to be 22, and the standard deviation to be 5.

Univariate Hypothesis Test Utilizing the t-Distribution: An Example

Null hypothesis: the mean is equal to 20.
\[ H_0: \mu = 20 \]
Alternative hypothesis: the mean is not equal to 20.
\[ H_1: \mu \neq 20 \]
Standard error of the mean:
\[ S_{\bar{X}} = S / \sqrt{n} = 5 / \sqrt{25} = 1 \]

Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)

The researcher desired 95 percent confidence, so the significance level becomes 0.05.
The researcher must then find the upper and lower limits of the confidence interval to determine the region of rejection. Thus, the value of t is needed.
For 24 degrees of freedom (n − 1 = 25 − 1), the t-value is 2.064.

Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)

\[ \text{Lower limit} = \mu - t_{c.l.} S_{\bar{X}} = 20 - 2.064 \left( 5 / \sqrt{25} \right) = 17.936 \]
\[ \text{Upper limit} = \mu + t_{c.l.} S_{\bar{X}} = 20 + 2.064 \left( 5 / \sqrt{25} \right) = 22.064 \]

Univariate Hypothesis Test Utilizing the t-Distribution: An Example (cont’d)

Univariate Hypothesis Test: t-Test

\[ t_{obs} = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{22 - 20}{1} = \frac{2}{1} = 2 \]

This is less than the critical t-value of 2.064 at the 0.05 level with 24 degrees of freedom, so the null hypothesis cannot be rejected: the data do not support a mean different from 20.
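The arithmetic of the pizza example can be checked with a short sketch. The critical value 2.064 is taken from the example itself rather than computed, and the function name is an illustrative choice.

```python
import math

T_CRITICAL = 2.064  # two-tailed critical t at 0.05 with 24 d.f., as quoted above


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return (standard error, observed t) for a one-sample t-test."""
    standard_error = sample_sd / math.sqrt(n)
    t_obs = (sample_mean - hypothesized_mean) / standard_error
    return standard_error, t_obs


standard_error, t_obs = one_sample_t(22, 20, 5, 25)
lower_limit = 20 - T_CRITICAL * standard_error
upper_limit = 20 + T_CRITICAL * standard_error
reject_null = abs(t_obs) > T_CRITICAL

print(standard_error, t_obs, round(lower_limit, 3), round(upper_limit, 3), reject_null)
```

The observed mean of 22 falls inside the limits, which is the same conclusion as comparing the observed t with the critical value.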

The Chi-Square Test for Goodness of Fit

Chi-square (χ²) test
Tests for statistical significance.
Is particularly appropriate for testing hypotheses about frequencies arranged in a frequency or contingency table.

Goodness-of-Fit (GOF)
A general term representing how well some computed table or matrix of values matches some population or predetermined table or matrix of the same size.

The Chi-Square Test for Goodness of Fit: An Example

The Chi-Square Test for Goodness of Fit: An Example (cont’d)

\[ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \]

χ² = chi-square statistic
O_i = observed frequency in the ith cell
E_i = expected frequency in the ith cell

Chi-Square Test: Estimation for Expected Number for Each Cell

\[ E_{ij} = \frac{R_i C_j}{n} \]

R_i = total observed frequency in the ith row
C_j = total observed frequency in the jth column
n = sample size

Hypothesis Test of a Proportion

Is conceptually similar to the test used when the mean is the characteristic of interest, but differs in the mathematical formulation of the standard error of the proportion.

\[ Z_{obs} = \frac{p - \pi}{S_p} \]

π is the population proportion
p is the sample proportion
π is estimated with p
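A hedged sketch of the calculation follows. It assumes the usual large-sample standard error of a proportion, the square root of p(1 − p)/n, which the text refers to but does not spell out; the sample figures are invented.

```python
import math


def z_for_proportion(p, pi, n):
    """Observed Z for a sample proportion p against a hypothesized proportion pi."""
    # Assumption: S_p is estimated from the sample proportion as sqrt(p(1 - p)/n).
    s_p = math.sqrt(p * (1 - p) / n)
    return (p - pi) / s_p


# Made-up example: 20 percent of 400 respondents against a hypothesized 15 percent.
z_obs = z_for_proportion(p=0.20, pi=0.15, n=400)
print(round(z_obs, 2))
```

An observed Z of about 2.5 exceeds 1.96, the two-tailed critical value of the standard normal distribution at the 0.05 level.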

What Is the Appropriate Test of Difference?

Test of Differences

An investigation of a hypothesis that two (or more) groups differ with respect to measures on a variable.

Behaviour, characteristics, beliefs, opinions, emotions, or attitudes

Bivariate Tests of Differences

Involve only two variables: a variable that acts like a dependent variable and a variable that acts as a classification variable.

Differences in mean scores between groups or in comparing how two groups’ scores are distributed across possible response categories.

EXHIBIT 22.1 Some Bivariate Hypotheses

Cross-Tabulation Tables: The χ2 Test for Goodness-of-Fit

Cross-Tabulation (Contingency) Table
A joint frequency distribution of observations on two or more variables.

χ2 Distribution

Provides a means for testing the statistical significance of a contingency table.

Involves comparing observed frequencies (Oi) with expected frequencies (Ei) in each cell of the table.

Captures the goodness- (or closeness-) of-fit of the observed distribution with the expected distribution.

Chi-Square Test

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$

χ² = chi-square statistic
Oi = observed frequency in the ith cell
Ei = expected frequency in the ith cell

$E_{ij} = \frac{R_i C_j}{n}$

Ri = total observed frequency in the ith row
Cj = total observed frequency in the jth column
n = sample size

Degrees of Freedom (d.f.)

d.f. = (R − 1)(C − 1)
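The three formulas above (expected cell counts, the χ² statistic, and the degrees of freedom) fit together as in the following Python sketch. The function name `chi_square_contingency` and the 2 × 2 table are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def chi_square_contingency(observed):
    """Return (chi_square, degrees_of_freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]          # R_i
    col_totals = [sum(col) for col in zip(*observed)]    # C_j
    n = sum(row_totals)                                  # sample size

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n     # E_ij = R_i * C_j / n
            chi_square += (o_ij - e_ij) ** 2 / e_ij

    df = (len(observed) - 1) * (len(observed[0]) - 1)    # (R - 1)(C - 1)
    return chi_square, df

# Hypothetical 2 x 2 table of observed frequencies.
table = [[30, 10],
         [20, 40]]
chi_square, df = chi_square_contingency(table)
print(f"chi-square = {chi_square:.2f} with {df} d.f.")
```

The computed statistic is then compared with the tabled critical value for the same degrees of freedom.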

Example: Papa John’s Restaurants

Univariate Hypothesis: Papa John’s restaurants are more likely to be located in a stand-alone location or in a shopping center.

Bivariate Hypothesis: Stand-alone locations are more likely to be profitable than are shopping center locations.

Example: Papa John’s Restaurants (cont’d)

In this example, χ2 = 22.16 with 1 d.f.

From Table A.4, the critical value at the 0.05 level with 1 d.f. is 3.84.

Thus, we are 95 percent confident that the observed values do not equal the expected values.

But are the deviations from the expected values in the hypothesized direction?

χ2 Test for Goodness-of-Fit Recap

Testing the hypothesis involves two key steps:

1. Examine the statistical significance of the observed contingency table.

2. Examine whether the differences between the observed and expected values are consistent with the hypothesized prediction.

The t-Test for Comparing Two Means

Independent Samples t-Test
A test for hypotheses stating that the mean scores for some interval- or ratio-scaled variable grouped based on some less-than-interval classificatory variable are not the same.

$t = \frac{\text{Sample Mean 1} - \text{Sample Mean 2}}{\text{Variability of random means}}$

$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$

The t-Test for Comparing Two Means (cont’d)

Pooled Estimate of the Standard Error
An estimate of the standard error for a t-test of independent means that assumes the variances of both groups are equal.

$S_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
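As a rough illustration of the pooled standard error and the resulting t statistic, the Python sketch below works from group means, standard deviations, and sample sizes. The function name `pooled_t` and all of the input figures are hypothetical. The degrees of freedom n1 + n2 − 2 follow from the denominator of the pooled variance.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent samples t-test with a pooled (equal-variance) standard error."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t_obs = (mean1 - mean2) / std_err
    return t_obs, std_err, n1 + n2 - 2

# Hypothetical summary statistics for two groups.
t_obs, std_err, df = pooled_t(16.5, 2.1, 21, 12.2, 2.6, 14)
print(f"t = {t_obs:.3f}, standard error = {std_err:.3f}, d.f. = {df}")
```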

© 2010 South-Western/Cengage Learning. All rights reserved. May not be scanned, copied or duplicated, or posted to a publically accessible website, in whole or in part.

EXHIBIT 22.2 Independent Samples t-Test Results

Comparing Two Means (cont’d)

Paired-Samples t-Test

Compares the scores of two interval variables drawn from related populations.

Used when means need to be compared that are not from independent samples.




EXHIBIT 22.4 Example Results for a Paired Samples t-Test

A Classification of Hypothesis Testing Procedures for Examining Differences

SPSS Windows: One Sample t Test

1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE SAMPLE T TEST.
3. Move “Familiarity [familiar]” into the TEST VARIABLE(S) box.
4. Type “4” in the TEST VALUE box.


SPSS Windows: Two Independent Samples t Test

1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then INDEPENDENT SAMPLES T TEST.
3. Move “Internet Usage Hrs/Week [iusage]” into the TEST VARIABLE(S) box.
4. Move “Sex [sex]” to the GROUPING VARIABLE box.
5. Click DEFINE GROUPS.
6. Type “1” in the GROUP 1 box and “2” in the GROUP 2 box.



SPSS Windows: Paired Samples t Test

1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then PAIRED SAMPLES T TEST.
3. Select “Attitude toward Internet [iattitude]” and then select “Attitude toward technology [tattitude].” Move these variables into the PAIRED VARIABLE(S) box.


Relationship Amongst t Test, Analysis of Variance, Analysis of Covariance, & Regression

[Figure: decision tree for a metric dependent variable]
One independent variable, binary: t test
One or more independent variables, categorical (factorial): analysis of variance (one factor: one-way analysis of variance; more than one factor: N-way analysis of variance)
One or more independent variables, categorical and interval: analysis of covariance
One or more independent variables, interval: regression

The Z-Test for Comparing Two Proportions

Z-Test for Differences of Proportions
Tests the hypothesis that proportions are significantly different for two independent samples or groups.

Requires a sample size greater than thirty.

The hypothesis is: Ho: π1 = π2

may be restated as: Ho: π1 - π2 = 0

The Z-Test for Comparing Two Proportions

Z-Test statistic for differences in large random samples:

$Z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{S_{p_1 - p_2}}$

p1 = sample proportion of successes in Group 1
p2 = sample proportion of successes in Group 2
(π1 − π2) = hypothesized population proportion 1 minus hypothesized population proportion 2
Sp1-p2 = pooled estimate of the standard error of differences of proportions

The Z-Test for Comparing Two Proportions

To calculate the standard error of the differences in proportions:

$S_{p_1 - p_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
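The following Python sketch puts the Z statistic and its standard error together. It assumes, since the slide does not define them, that p̄ is the pooled proportion (n1·p1 + n2·p2)/(n1 + n2) and q̄ = 1 − p̄. The function name `two_proportion_z` and the sample figures are hypothetical.

```python
import math

def two_proportion_z(p1, n1, p2, n2, hypothesized_diff=0.0):
    """Z-test for the difference between two independent sample proportions."""
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)   # assumed pooled proportion
    q_bar = 1.0 - p_bar
    std_err = math.sqrt(p_bar * q_bar * (1.0 / n1 + 1.0 / n2))
    z_obs = ((p1 - p2) - hypothesized_diff) / std_err
    return z_obs, std_err

# Hypothetical samples of 100 each; the null hypothesis is pi1 - pi2 = 0.
z_obs, std_err = two_proportion_z(0.35, 100, 0.40, 100)
print(f"Z = {z_obs:.3f}, standard error = {std_err:.4f}")
```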

One-Way Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA)
An analysis involving the investigation of the effects of one treatment variable on an interval-scaled dependent variable.
A hypothesis-testing technique to determine whether statistically significant differences in means occur between two or more groups.
A method of comparing variances to make inferences about the means.
The substantive hypothesis tested is: At least one group mean is not equal to another group mean.

Partitioning Variance in ANOVA

Total Variability
Grand Mean: The mean of a variable over all observations.
SST = Total of (Observed Value − Grand Mean)²

Partitioning Variance in ANOVA

Between-Groups Variance
The sum of differences between the group mean and the grand mean summed over all groups for a given set of observations.
SSB = Total of n_group (Group Mean − Grand Mean)²

Within-Group Error or Variance
The sum of the differences between observed values and the group mean for a given set of observations.
Also known as total error variance.
SSE = Total of (Observed Value − Group Mean)²

The F-Test

F-Test
Used to determine whether there is more variability in the scores of one sample than in the scores of another sample.
Variance components are used to compute F-ratios: SSE, SSB, SST

$F = \frac{\text{Variance between groups}}{\text{Variance within groups}}$

EXHIBIT 22.6 Interpreting ANOVA

SPSS Windows

One-way ANOVA can be efficiently performed using the program COMPARE MEANS and then One-way ANOVA. To select this procedure using SPSS for Windows, click:

Analyze>Compare Means>One-Way ANOVA …

N-way analysis of variance and analysis of covariance can be performed using GENERAL LINEAR MODEL. To select this procedure using SPSS for Windows, click:

Analyze>General Linear Model>Univariate …

SPSS Windows: One-Way ANOVA

1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE-WAY ANOVA.
3. Move “Sales [sales]” into the DEPENDENT LIST box.
4. Move “In-Store Promotion [promotion]” to the FACTOR box.
5. Click OPTIONS.
6. Click Descriptive.



SPSS Windows: Analysis of Covariance

1. Select ANALYZE from the SPSS menu bar.
2. Click GENERAL LINEAR MODEL and then UNIVARIATE.
3. Move “Sales [sales]” into the DEPENDENT VARIABLE box.
4. Move “In-Store Promotion [promotion]” to the FIXED FACTOR(S) box. Then move “Coupon [coupon]” also to the FIXED FACTOR(S) box.
5. Move “Clientel [clientel]” to the COVARIATE(S) box.


The Basics

Measures of Association
Refers to a number of bivariate statistical techniques used to measure the strength of a relationship between two variables.

The chi-square (χ2) test provides information about whether two or more less-than interval variables are interrelated.

Correlation analysis is most appropriate for interval or ratio variables.

Regression can accommodate either less-than interval or interval independent variables, but the dependent variable must be continuous.

Bivariate Analysis: Common Procedures for Testing Association

Simple Correlation Coefficient (continued)

Correlation coefficient
A statistical measure of the covariation, or association, between two at-least interval variables.

Covariance
Extent to which two variables are associated systematically with each other.

$r_{xy} = r_{yx} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$

Simple Correlation Coefficient

Correlation coefficient (r)
Ranges from +1 to -1
Perfect positive linear relationship = +1
Perfect negative (inverse) linear relationship = -1
No correlation = 0

Correlation coefficient for two variables (X,Y)
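A minimal Python sketch of the Pearson correlation coefficient, computed from deviations about the two means, is given below. The function name `pearson_r` and the five data pairs are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # covariation
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.4f}, r squared = {r * r:.2f}")

# A perfectly linear positive relationship gives r = +1.
r_perfect = pearson_r(x, [2 * value for value in x])
```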

EXHIBIT 23.2 Scatter Diagram to Illustrate Correlation Patterns

Correlation, Covariance, and Causation

When two variables covary (i.e. vary systematically), they display concomitant variation.

This systematic covariation does not in and of itself establish causality.

e.g., Rooster’s crow and the rising of the sun: the rooster does not cause the sun to rise.

Coefficient of Determination

Coefficient of Determination (R2)
A measure obtained by squaring the correlation coefficient; the proportion of the total variance of a variable accounted for by knowing the value of another variable.
Measures that part of the total variance of Y that is accounted for by knowing the value of X.

$R^2 = \frac{\text{Explained variance}}{\text{Total variance}}$

Correlation Matrix

Correlation matrix
The standard form for reporting correlation coefficients for more than two variables.

Statistical Significance
The procedure for determining statistical significance is the t-test of the significance of a correlation coefficient.

EXHIBIT 23.4 Pearson Product-Moment Correlation Matrix for Salesperson Example

Regression Analysis

Simple (Bivariate) Linear Regression
A measure of linear association that investigates straight-line relationships between a continuous dependent variable and an independent variable that is usually continuous, but can be a categorical dummy variable.

The Regression Equation (Y = α + βX)
Y = the continuous dependent variable
X = the independent variable
α = the Y intercept (where the regression line intercepts the Y axis)
β = the slope coefficient (rise over run)

Regression Line and Slope

[Figure: scatter plot of Y (80 to 130) against X (80 to 170) showing the fitted line $\hat{Y} = \hat{a} + \hat{\beta}X$, with slope $\hat{\beta} = \Delta Y / \Delta X$]

The Regression Equation

Parameter Estimate Choices
β is indicative of the strength and direction of the relationship between the independent and dependent variable.
α (Y intercept) is a fixed point that is considered a constant (how much Y can exist without X).

Standardized Regression Coefficient (β)
Estimated coefficient of the strength of relationship between the independent and dependent variables.
Expressed on a standardized scale where higher absolute values indicate stronger relationships (range is from -1 to 1).

The Regression Equation (cont’d)

Parameter Estimate Choices
Raw regression estimates (b1)
Raw regression weights have the advantage of retaining the scale metric, which is also their key disadvantage.
If the purpose of the regression analysis is forecasting, then raw parameter estimates must be used.
This is another way of saying when the researcher is interested only in prediction.

Standardized regression estimates (β)
Standardized regression estimates have the advantage of a constant scale.
Standardized regression estimates should be used when the researcher is testing explanatory hypotheses.

EXHIBIT 23.5 The Advantage of Standardized Regression Weights

EXHIBIT 23.6 Relationship of Sales Potential to Building Permits Issued

EXHIBIT 23.7 The Best Fit Line or Knocking Out the Pins

Ordinary Least-Squares (OLS) Method of Regression Analysis

OLS
Guarantees that the resulting straight line will produce the least possible total error in using X to predict Y.
Generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line.
No straight line can completely represent every dot in the scatter diagram.
There will be a discrepancy between most of the actual scores (each dot) and the predicted score (Ŷ).
Uses the criterion of attempting to make the least amount of total error in prediction of Y from X.
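The least-squares criterion can be illustrated with a small Python sketch for the bivariate case. It uses the standard closed-form estimates (slope = covariation of X and Y divided by the variation of X; intercept = mean of Y minus slope times mean of X), which the slides do not show. The function name `ols_fit` and the data are hypothetical.

```python
def ols_fit(x, y):
    """Fit Y = intercept + slope * X by ordinary least squares (one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)

    slope = ss_xy / ss_x
    intercept = mean_y - slope * mean_x

    # Sum of squared deviations of actual values from the fitted line.
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1.0 - sse / sst
    return slope, intercept, sse, r_squared

# Hypothetical paired observations.
slope, intercept, sse, r_squared = ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"Y-hat = {intercept:.2f} + {slope:.2f} X, SSE = {sse:.2f}, R squared = {r_squared:.2f}")
```

Any other straight line through the same points would give a larger sum of squared deviations than the SSE reported here.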

Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)

The equation means that the predicted value for any value of X (Xi) is determined as a function of the estimated slope coefficient, plus the estimated intercept coefficient + some error.




Ordinary Least-Squares Method of Regression Analysis (OLS) (cont’d)

Statistical Significance of Regression Model

F-test (regression)
Determines whether more variability is explained by the regression or unexplained by the regression.

Statistical Significance of Regression Model (cont’d)

ANOVA Table:

R2
The proportion of variance in Y that is explained by X (or vice versa).
A measure obtained by squaring the correlation coefficient; that proportion of the total variance of a variable that is accounted for by knowing the value of another variable.

$R^2 = \frac{3{,}398.49}{3{,}882.40} = 0.875$

EXHIBIT 23.8 Simple Regression Results for Building Permit Example

EXHIBIT 23.9 OLS Regression Line

Simple Regression and Hypothesis Testing

The explanatory power of regression lies in hypothesis testing.
Regression is often used to test relational hypotheses.
The outcome of the hypothesis test involves two conditions that must both be satisfied:
The regression weight must be in the hypothesized direction. Positive relationships require a positive coefficient and negative relationships require a negative coefficient.
The t-test associated with the regression weight must be significant.

What is Multivariate Data Analysis?

Research that involves three or more variables, or that is concerned with underlying dimensions among multiple variables, will involve multivariate statistical analysis.
Methods analyze multiple variables or even multiple sets of variables simultaneously.
Business problems involve multivariate data analysis:
most employee motivation research
customer psychographic profiles
research that seeks to identify viable market segments

The “Variate” in Multivariate

Variate
A mathematical way in which a set of variables can be represented with one equation.
A linear combination of variables, each contributing to the overall meaning of the variate based upon an empirically derived weight.
A function of the measured variables involved in an analysis: Vk = f (X1, X2, . . . , Xm )

EXHIBIT 24.1 Which Multivariate Approach Is Appropriate?

Classifying Multivariate Techniques

Dependence Techniques
Explain or predict one or more dependent variables.
Needed when hypotheses involve distinction between independent and dependent variables.
Types:
Multiple regression analysis
Multiple discriminant analysis
Multivariate analysis of variance
Structural equations modeling

Classifying Multivariate Techniques (cont’d)

Interdependence Techniques
Give meaning to a set of variables or seek to group things together.
Used when researchers examine questions that do not distinguish between independent and dependent variables.
Types:
Factor analysis
Cluster analysis
Multidimensional scaling

Classifying Multivariate Techniques (cont’d)

Influence of Measurement Scales
The nature of the measurement scales will determine which multivariate technique is appropriate for the data.
Selection of a multivariate technique requires consideration of the types of measures used for both independent and dependent sets of variables.
Nominal and ordinal scales are nonmetric.
Interval and ratio scales are metric.

EXHIBIT 24.2 Which Multivariate Dependence Technique Should I Use?

EXHIBIT 24.3 Which Multivariate Interdependence Technique Should I Use?

Analysis of Dependence

General Linear Model (GLM)
A way of explaining and predicting a dependent variable based on fluctuations (variation) from its mean due to changes in independent variables.
μ = a constant (overall mean of the dependent variable)
∆X and ∆F = changes due to main effect independent variables (experimental variables) and blocking independent variables (covariates or grouping variables)
∆XF = the change due to the combination (interaction effect) of those variables.

Interpreting Multiple Regression

Multiple Regression Analysis
An analysis of association in which the effects of two or more independent variables on a single, interval-scaled dependent variable are investigated simultaneously.

$Y_i = b_0 + b_1X_1 + b_2X_2 + b_3X_3 + \dots + b_nX_n + e_i$

Dummy variable
The way a dichotomous (two group) independent variable is represented in regression analysis by assigning a 0 to one group and a 1 to the other.

Multiple Regression Analysis

A Simple Example
Assume that a toy manufacturer wishes to explain store sales (dependent variable) using a sample of stores from Canada and Europe.
Several hypotheses are offered:
H1: Competitor’s sales are related negatively to sales.
H2: Sales are higher in communities with a sales office than when no sales office is present.
H3: Grammar school enrollment in a community is related positively to sales.

Multiple Regression Analysis (cont’d)

Statistical Results of the Multiple Regression
Regression Equation: $Y = 102.18 + 0.387X_1 + 115.2X_2 + 6.73X_3$
Coefficient of multiple determination (R2) = 0.845
F-value = 14.6, p < 0.05

Multiple Regression Analysis (cont’d)

Regression Coefficients in Multiple Regression
Partial correlation
The correlation between two variables after taking into account the fact that they are correlated with other variables too.

R2 in Multiple Regression
The coefficient of multiple determination in multiple regression indicates the percentage of variation in Y explained by all independent variables.

Statistical Significance in Multiple Regression

F-test
Tests statistical significance by comparing the variation explained by the regression equation to the residual error variation.
Allows for testing of the relative magnitudes of the sum of squares due to the regression (SSR) and the error sum of squares (SSE).

$F = \frac{SSR / k}{SSE / (n - k - 1)} = \frac{MSR}{MSE}$

Degrees of Freedom (d.f.)
k = number of independent variables
n = number of observations or respondents

Calculating Degrees of Freedom (d.f.)
d.f. for the numerator = k
d.f. for the denominator = n - k - 1
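The F-ratio and its degrees of freedom can be computed directly from the two sums of squares, as in this short Python sketch. The function name `regression_f` and the input figures are hypothetical and not taken from the toy manufacturer example.

```python
def regression_f(ssr, sse, k, n):
    """F-test for a regression with k independent variables and n observations."""
    df_numerator = k
    df_denominator = n - k - 1
    msr = ssr / df_numerator      # mean square due to regression
    mse = sse / df_denominator    # mean square error
    return msr / mse, df_numerator, df_denominator

# Hypothetical sums of squares: three predictors, 24 observations.
f_value, df_num, df_den = regression_f(ssr=80.0, sse=20.0, k=3, n=24)
print(f"F = {f_value:.2f} with ({df_num}, {df_den}) d.f.")
```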


Interpreting Multiple Regression Results

ANOVA (n-way) and MANOVA

Multivariate Analysis of Variance (MANOVA)
A multivariate technique that predicts multiple continuous dependent variables with multiple categorical independent variables.

ANOVA (n-way) and MANOVA (cont’d)

Interpreting N-way (Univariate) ANOVA
1. Examine overall model F-test result. If significant, proceed.
2. Examine individual F-tests for individual variables.
3. For each significant categorical independent variable, interpret the effect by examining the group means.
4. For each significant, continuous covariate, interpret the parameter estimate (b).
5. For each significant interaction, interpret the means for each combination.

Discriminant Analysis

A statistical technique for predicting the probability that an object will belong in one of two or more mutually exclusive categories (dependent variable), based on several independent variables.
To calculate discriminant scores, the linear function used is:

$Z_i = b_1X_{1i} + b_2X_{2i} + \dots + b_nX_{ni}$

Discriminant Analysis Example

$Z = b_1X_1 + b_2X_2 + b_3X_3 = 0.069X_1 + 0.013X_2 + 0.0007X_3$

EXHIBIT 24.5 Multivariate Dependence Techniques Summary

Factor Analysis

Statistically identifies a reduced number of factors from a larger number of measured variables.
Types:
Exploratory factor analysis (EFA): performed when the researcher is uncertain about how many factors may exist among a set of variables.
Confirmatory factor analysis (CFA): performed when the researcher has strong theoretical expectations about the factor structure before performing the analysis.

EXHIBIT 24.6 A Simple Illustration of Factor Analysis

Factor Analysis (cont’d)

How Many Factors
Eigenvalues are a measure of how much variance is explained by each factor.
Common rule: Base the number of factors on the number of eigenvalues greater than 1.0.

Factor Loading
Indicates how strongly a measured variable is correlated with a factor.

Factor Analysis (cont’d)

Factor Rotation
A mathematical way of simplifying factor analysis results to better identify which variables “load on” which factors.
Most common procedure is varimax rotation.

Data Reduction Technique
Approaches that summarize the information from many variables into a reduced set of variates formed as linear combinations of measured variables.
The rule of parsimony: an explanation involving fewer components is better than one involving many more.


Creating Composite Scales with Factor Results
When a clear pattern of loadings exists, the researcher may take a simpler approach by summing the variables with high loadings and creating a summated scale.
Very low loadings suggest a variable does not contribute much to the factor.
The reliability of each summated scale is tested by computing a coefficient alpha estimate.


Communality
A measure of the percentage of a variable’s variation that is explained by the factors.
A relatively high communality indicates that a variable has much in common with the other variables taken as a group.
Communality for any variable is equal to the sum of the squared loadings for that variable.
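The last point lends itself to a one-line calculation. The Python sketch below computes communalities from a loading matrix with one row per measured variable and one column per factor. The function name `communalities` and the loadings themselves are hypothetical.

```python
def communalities(loadings):
    """Sum of squared factor loadings for each variable (one row per variable)."""
    return [sum(loading ** 2 for loading in row) for row in loadings]

# Hypothetical loadings of two variables on two factors.
loadings = [
    [0.9, 0.1],  # variable 1 loads mainly on factor 1
    [0.2, 0.8],  # variable 2 loads mainly on factor 2
]
h2 = communalities(loadings)
print([round(value, 2) for value in h2])
```

Here the first variable has the higher communality, meaning the two factors together account for more of its variation.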


Total Variance Explained
Squaring and totaling each loading factor; dividing the total by the number of factors provides an estimate of variance in a set of variables explained by a factor.
This explanation of variance is much the same as R2 in multiple regression.

SPSS Windows

To select this procedure using SPSS for Windows, click:

Analyze>Data Reduction>Factor …

SPSS Windows: Principal Components

1. Select ANALYZE from the SPSS menu bar.
2. Click DATA REDUCTION and then FACTOR.
3. Move “Prevents Cavities [v1],” “Shiny Teeth [v2],” “Strengthen Gums [v3],” “Freshens Breath [v4],” “Tooth Decay Unimportant [v5],” and “Attractive Teeth [v6]” into the VARIABLES box.
4. Click on DESCRIPTIVES. In the pop-up window, in the STATISTICS box check INITIAL SOLUTION. In the CORRELATION MATRIX box, check KMO AND BARTLETT’S TEST OF SPHERICITY and also check REPRODUCED. Click CONTINUE.
5. Click on EXTRACTION. In the pop-up window, for METHOD select PRINCIPAL COMPONENTS (default). In the ANALYZE box, check CORRELATION MATRIX. In the EXTRACT box, check EIGEN VALUE OVER 1 (default). In the DISPLAY box, check UNROTATED FACTOR SOLUTION. Click CONTINUE.
6. Click on ROTATION. In the METHOD box, check VARIMAX. In the DISPLAY box, check ROTATED SOLUTION. Click CONTINUE.
7. Click on SCORES. In the pop-up window, check DISPLAY FACTOR SCORE COEFFICIENT MATRIX. Click CONTINUE.


Cluster Analysis

Cluster analysis
A multivariate approach for grouping observations based on similarity among measured variables.
Cluster analysis is an important tool for identifying market segments.
Cluster analysis classifies individuals or objects into a small number of mutually exclusive and exhaustive groups.
Objects or individuals are assigned to groups so that there is great similarity within groups and much less similarity between groups.
The cluster should have high internal (within-cluster) homogeneity and external (between-cluster) heterogeneity.

EXHIBIT 24.7 Clusters of Individuals on Two Dimensions

EXHIBIT 24.8 Cluster Analysis of Test-Market Cities

To select this procedure using SPSS for Windows, click:

Analyze>Classify>Hierarchical Cluster …

Analyze>Classify>K-Means Cluster …

Analyze>Classify>Two-Step Cluster

SPSS Windows: Hierarchical Clustering

1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then HIERARCHICAL CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.
4. In the CLUSTER box, check CASES (default option). In the DISPLAY box, check STATISTICS and PLOTS (default options).
5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the CLUSTER MEMBERSHIP box, check RANGE OF SOLUTIONS. Then, for MINIMUM NUMBER OF CLUSTERS, enter 2 and for MAXIMUM NUMBER OF CLUSTERS, enter 4. Click CONTINUE.
6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE box, check ALL CLUSTERS (default). In the ORIENTATION box, check VERTICAL. Click CONTINUE.
7. Click on METHOD. For CLUSTER METHOD, select WARD’S METHOD. In the MEASURE box, check INTERVAL and select SQUARED EUCLIDEAN DISTANCE. Click CONTINUE.


SPSS Windows: K-Means Clustering

1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then K-MEANS CLUSTER.

3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.

4. For NUMBER OF CLUSTERS, select 3.

5. Click on OPTIONS. In the pop-up window, in the STATISTICS box, check INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE. Click CONTINUE.
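The logic behind the k-means procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not the SPSS implementation: the ratings are invented, the helper name `k_means` is hypothetical, and the initial centers are simply randomly chosen cases, whereas SPSS selects its initial centers by its own rule.

```python
import numpy as np

# Hypothetical 7-point ratings on v1..v6 (invented for illustration only).
ratings = np.array([
    [7, 3, 6, 4, 2, 3],
    [6, 2, 7, 3, 1, 4],
    [6, 3, 6, 4, 2, 4],
    [2, 3, 1, 3, 6, 3],
    [1, 4, 2, 4, 7, 2],
    [2, 3, 2, 3, 6, 3],
    [4, 6, 3, 7, 3, 6],
    [3, 7, 4, 6, 2, 7],
    [4, 6, 4, 6, 3, 7],
], dtype=float)


def k_means(data, k, n_iter=100, seed=0):
    """Basic k-means; returns initial centers, final centers, and labels."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k randomly chosen cases.
    centers = data[rng.choice(len(data), size=k, replace=False)]
    initial = centers.copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every case to every center.
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its cases (keep it if empty).
        new_centers = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return initial, centers, labels


initial_centers, final_centers, labels = k_means(ratings, k=3)
print("Initial cluster centers:\n", initial_centers)
print("Final cluster centers:\n", final_centers)
print("Cluster membership for each case:", labels)
```

The printed initial centers and per-case membership correspond loosely to the INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE options. Because the result depends on the starting centers, a different seed can give a different solution.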


SPSS Windows: Two-Step Clustering

1. Select ANALYZE from the SPSS menu bar.

2. Click CLASSIFY and then TWO-STEP CLUSTER.

3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],” and “Compare Prices [v6]” into the CONTINUOUS VARIABLES box.

4. For DISTANCE MEASURE, select EUCLIDEAN.

5. For NUMBER OF CLUSTERS, select DETERMINE AUTOMATICALLY.

6. For CLUSTERING CRITERION, select AKAIKE’S INFORMATION CRITERION (AIC).


Multidimensional Scaling

Multidimensional scaling: measures objects in multidimensional space on the basis of respondents’ judgments of the similarity of objects.

EXHIBIT 24.9 Perceptual Map of Six Graduate Business Schools: Simple Space



The multidimensional scaling program allows individual differences as well as aggregate analysis using ALSCAL. The level of measurement can be ordinal, interval or ratio. Both the direct and the derived approaches can be accommodated.

To select multidimensional scaling procedures using SPSS for Windows, click:

Analyze>Scale>Multidimensional Scaling …

The conjoint analysis approach can be implemented using regression if the dependent variable is metric (interval or ratio).

This procedure can be run by clicking:

Analyze>Regression>Linear …
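The regression route to conjoint analysis can be sketched as follows. Everything in the example is an assumption made for illustration: two hypothetical attributes ("quality" and "price") with three levels each, a full set of nine profiles, and preference ratings constructed to be exactly additive so that the estimates are easy to check. Each attribute is dummy coded with its third level as the base, and the regression coefficients are read as part-worths relative to that base.

```python
import numpy as np
from itertools import product

# Hypothetical design: every combination of 3 quality levels x 3 price levels.
levels = [0, 1, 2]
profiles = list(product(levels, levels))  # (quality level, price level)

# Hypothetical metric preference ratings, one per profile (invented values).
ratings = np.array([9, 7, 5, 7, 5, 3, 6, 4, 2], dtype=float)


def dummies(level):
    """Two dummy variables for a three-level attribute; level 2 is the base."""
    return [1.0 if level == 0 else 0.0, 1.0 if level == 1 else 0.0]


# Design matrix: intercept, two quality dummies, two price dummies.
X = np.array([[1.0] + dummies(q) + dummies(p) for q, p in profiles])

# Ordinary least squares estimates of the part-worths.
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)
names = ["intercept", "quality_level0", "quality_level1",
         "price_level0", "price_level1"]
for name, value in zip(names, coef):
    print(f"{name}: {value:.2f}")

# Relative importance: range of part-worths per attribute (base level = 0).
quality_range = max(coef[1], coef[2], 0.0) - min(coef[1], coef[2], 0.0)
price_range = max(coef[3], coef[4], 0.0) - min(coef[3], coef[4], 0.0)
total = quality_range + price_range
print(f"importance of quality: {quality_range / total:.2f}")
print(f"importance of price:   {price_range / total:.2f}")
```

With these constructed ratings the estimated part-worths come out as 3 and 1 for the two quality dummies and 4 and 2 for the two price dummies, with an intercept of 2. Real ratings would not fit exactly, and the residuals would then indicate how well the additive model describes the respondent.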

SPSS Windows: MDS

First convert similarity ratings to distances by subtracting each value of Table 21.1 from 8. The form of the data matrix has to be square symmetric (diagonal elements zero and distances above and below the diagonal). See SPSS file Table 21.1 Input.

1. Select ANALYZE from the SPSS menu bar.

2. Click SCALE and then MULTIDIMENSIONAL SCALING (ALSCAL).

3. Move “Aqua-Fresh [AquaFresh],” “Crest [Crest],” “Colgate [Colgate],” “Aim [Aim],” “Gleem [Gleem],” “Ultra Brite [UltraBrite],” “Ultra-Brite [var00007],” “Close-Up [CloseUp],” “Pepsodent [Pepsodent],” and “Sensodyne [Sensodyne]” into the VARIABLES box.

4. In the DISTANCES box, check DATA ARE DISTANCES. SHAPE should be SQUARE SYMMETRIC (default).

5. Click on MODEL. In the pop-up window, in the LEVEL OF MEASUREMENT box, check INTERVAL. In the SCALING MODEL box, check EUCLIDEAN DISTANCE. In the CONDITIONALITY box, check MATRIX. Click CONTINUE.

6. Click on OPTIONS. In the pop-up window, in the DISPLAY box, check GROUP PLOTS, DATA MATRIX and MODEL AND OPTIONS SUMMARY. Click CONTINUE.
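The data preparation and the idea of a perceptual map can be illustrated with a short NumPy sketch. Two assumptions should be kept in mind: the similarity ratings below are invented for four hypothetical brands (they are not Table 21.1), and the scaling step uses classical (Torgerson) metric MDS, which is a simpler technique than the ALSCAL algorithm that SPSS runs, so the coordinates will generally differ from SPSS output.

```python
import numpy as np

# Hypothetical similarity ratings on a 7-point scale
# (1 = very dissimilar, 7 = very similar) for four invented brands.
brands = ["A", "B", "C", "D"]
similarity = np.array([
    [7, 6, 2, 1],
    [6, 7, 3, 2],
    [2, 3, 7, 6],
    [1, 2, 6, 7],
], dtype=float)

# Convert similarities to distances by subtracting from 8,
# then force the diagonal to zero (square symmetric distance matrix).
distance = 8.0 - similarity
np.fill_diagonal(distance, 0.0)


def classical_mds(d, n_dims=2):
    """Classical (Torgerson) MDS: coordinates from a distance matrix."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-center the squared distances to obtain inner products.
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the n_dims largest eigenvalues; clip small negatives to zero.
    order = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


coords = classical_mds(distance)
for brand, (x, y) in zip(brands, coords):
    print(f"{brand}: ({x:.2f}, {y:.2f})")
```

Plotting the two columns of `coords` against each other gives a simple perceptual map: brands rated as similar (A and B, C and D in this invented example) land close together.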



EXHIBIT 24.10 Summary of Multivariate Techniques for Analysis of Interdependence

Further Reading

Cooper, D.R. and Schindler, P.S. (2011) Business Research Methods, 11th edn, McGraw Hill.

Zikmund, W.G., Babin, B.J., Carr, J.C. and Griffin, M. (2010) Business Research Methods, 8th edn, South-Western.

Saunders, M., Lewis, P. and Thornhill, A. (2012) Research Methods for Business Students, 6th edn, Prentice Hall.

Saunders, M. and Lewis, P. (2012) Doing Research in Business & Management, FT Prentice Hall.
