CHAPTER 16 THE FURTHER DATA ANALYSIS
![Page 1: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/1.jpg)
![Page 2: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/2.jpg)
16.1 Introduction
![Page 3: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/3.jpg)
16.2 FURTHER DATA ANALYSIS (MEASURED v ATTRIBUTE)
Further Data Analysis (FDA) is a procedure that enables a decision to be made, based on the sample evidence, between two viewpoints:
There is no relationship
There is a relationship
These statistical procedures are called hypothesis tests.
![Page 4: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/4.jpg)
Hypothesis
A statement about a population, developed for the purpose of testing.
Hypothesis test
A procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement.
Four stages of a hypothesis test
Stage 1: Specifying the hypotheses.
Stage 2: Defining the test parameters and the decision rule.
Stage 3: Examining the sample evidence.
Stage 4: The conclusions.
![Page 5: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/5.jpg)
FDA for Measured v Attribute requires two different hypothesis tests, depending on the attribute explanatory variable:
Exactly two levels of the attribute explanatory variable
Three or more levels of the attribute explanatory variable
![Page 6: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/6.jpg)
16.3 HYPOTHESIS TEST 1: Measured Response v Attribute Explanatory Variable with exactly two levels
Illustrative Example
Response Variable: AMOUNT spent on clothes per month
Attribute Explanatory Variable: GENDER (Male/Female)
If Males and Females have the same 'spending on clothes' characteristics, then the average amounts spent monthly by Males and by Females should be the same.
If Males and Females have different 'spending on clothes' characteristics, then the average amounts spent monthly by Males and by Females would be different.
![Page 7: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/7.jpg)
The total population can be split into two or more sub-populations according to the level of the attribute; here, a population of Males and a population of Females.
![Page 8: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/8.jpg)
POPULATION MEANS THE SAME
![Page 9: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/9.jpg)
Stage 1: Specifying the hypotheses.
NULL HYPOTHESIS: $H_0: \mu_0 = \mu_1$
ALTERNATIVE HYPOTHESIS: $H_1: \mu_0 \neq \mu_1$
![Page 10: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/10.jpg)
Stage 2: The Decision Rule
Results of IDA for Illustrative Example
Outcome 1
Male Mean = £45 (Stand Dev = £20)
Female Mean = £55 (Stand Dev = £20)
Not enough evidence to form a clear judgement; FDA is required.
![Page 11: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/11.jpg)
Outcome 2
Male Mean = £45 (Stand Dev = £10)
Female Mean = £55 (Stand Dev = £10)
The widths of the boxes would lead to the decision from the IDA that there is definitely a link.
![Page 12: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/12.jpg)
Outcome 3
Male Mean = £45 (Stand Dev = £40)
Female Mean = £55 (Stand Dev = £40)
The Stand Dev is bigger, so FDA is required.
![Page 13: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/13.jpg)
Measure of Relative Separation of the boxplots
Considers not only the MEANS but also the STANDARD DEVIATIONS of the two samples.
Find a "Threshold value":
If Measure of Relative Separation > Threshold value, there is a connection.
If Measure of Relative Separation < Threshold value, there is no connection.
![Page 14: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/14.jpg)
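Student's t Ratio (a measure of the relative separation of the boxplots)
When the sample data are Normally distributed, Student's t-test is used.
tcalc is the calculated value of the t-ratio:
$$t_{calc} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where $\bar{X}$, $s$ and $n$ are the sample mean, sample standard deviation and sample size of each of the two samples.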
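As a rough illustration of the t-ratio, the sketch below evaluates it for the three IDA outcomes of the illustrative example (means £45 and £55, standard deviations £20, £10 and £40). The slides do not state the sample sizes, so `N = 50` per group is purely an assumption, and the function name `t_ratio` is hypothetical.

```python
import math

def t_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """t-ratio: difference in sample means divided by its standard error."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# ASSUMPTION: 50 Males and 50 Females; sample sizes are not given in the source.
N = 50
outcomes = {"Outcome 1": 20, "Outcome 2": 10, "Outcome 3": 40}  # Stand Dev in pounds

# Male mean 45, Female mean 55, same Stand Dev in both groups for each outcome.
t_values = {name: t_ratio(45, sd, N, 55, sd, N) for name, sd in outcomes.items()}
for name, t in t_values.items():
    print(f"{name}: t_calc = {t:.2f}")
```

Whatever common sample size is assumed, the smaller standard deviation gives the larger |tcalc|, so the size of the separation ranks as Outcome 2, then Outcome 1, then Outcome 3.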
![Page 15: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/15.jpg)
Bigger |tcalc| means larger separation.
Outcome 2 > Outcome 1 > Outcome 3
Set up the decision rule.
![Page 16: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/16.jpg)
Decision Rule
If the tcalc value lies within the range -tcrit to +tcrit, then the decision rule is flagging H0, supporting the viewpoint that there is no relationship.
If the tcalc value lies outside the range -tcrit to +tcrit, then the decision rule is flagging H1, supporting the viewpoint that there is a relationship.
Value of tcrit
Depends upon the sample size, through a measure called Degrees of Freedom (DF).
Can be looked up in statistical tables.
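The two-tailed decision rule can be sketched as a small function. The name `two_tailed_decision` is hypothetical, the t-ratios passed in are made-up illustrations, and `t_crit` still has to be looked up in tables for the appropriate degrees of freedom; 1.96 is used here only as the familiar large-sample 5% value. How a tcalc exactly equal to tcrit is treated is also an assumption, since the slides do not say.

```python
def two_tailed_decision(t_calc, t_crit):
    """Return 'H0' if t_calc lies within [-t_crit, +t_crit], otherwise 'H1'."""
    if t_crit <= 0:
        raise ValueError("t_crit must be positive")
    # ASSUMPTION: a value exactly on the boundary is treated as supporting H0.
    return "H0" if -t_crit <= t_calc <= t_crit else "H1"

# Hypothetical t-ratios judged against t_crit = 1.96.
inside = two_tailed_decision(-1.25, 1.96)   # within the range: no relationship
outside = two_tailed_decision(-2.50, 1.96)  # outside the range: a relationship
print(inside, outside)
```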
![Page 17: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/17.jpg)
The hypothesis test described above is called Student's t-test and is a two-tailed test using the 5% level of significance.
Formally, the level of significance may be defined as the chance the tester is prepared to take of coming to the wrong conclusion about H0, that is, of rejecting H0 when it is in fact true.
![Page 18: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/18.jpg)
Stage 3: Doing the calculations
If the tcalc value lies within the range -tTable to +tTable, then the decision rule is flagging H0: there is no relationship.
If the tcalc value lies outside the range -tTable to +tTable, then the decision rule is flagging H1: there is a relationship.
![Page 19: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/19.jpg)
Stage 4: The conclusions
Stated in terms of the original business problem specification.
For example: On the basis of the sample evidence, there is reason to suggest that there is a link between the amount spent on clothes and gender. Males spend on average about £45 per month and Females about £55.
![Page 20: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/20.jpg)
Worked Example CREDIT IDA
![Page 21: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/21.jpg)
FDA Stage 1: Define the hypotheses:
$\mu_0$: true average amount borrowed on credit for house owners
$\mu_1$: true average amount borrowed on credit for non house owners
$H_0: \mu_0 = \mu_1$
$H_1: \mu_0 \neq \mu_1$
![Page 22: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/22.jpg)
Stage 2: Defining the test parameters and the decision rule
Student's t-test
![Page 23: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/23.jpg)
Stage 3: Examining the sample evidence
MINITAB is used to do the calculations on the sample data.
tTable = 1.96
tcalc = -4.51 lies outside the range -1.96 to 1.96, so reject H0 and accept H1.
![Page 24: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/24.jpg)
Stage 4: The conclusions.
Based on the sample evidence, there is a connection between Amount Borrowed on Credit and House-ownership. On average, house owners borrow £869.50 and non house owners borrow £1009.00.
![Page 25: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/25.jpg)
16.4 HYPOTHESIS TEST 2: Measured Response v Attribute Explanatory Variable with three or more levels
For example
Response Variable: amount spent in a supermarket
Explanatory Variable: the customer's marital status, with four categories: Single, Married, Divorced, or Widowed
The common data analysis methodology applies and has the following three stages:
Initial Data Analysis
Further Data Analysis
Describing the Relationship
![Page 26: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/26.jpg)
Example 1: No evidence of a connection.
Example 2: Some degree of separation.
A measure of relative separation is needed.
![Page 27: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/27.jpg)
Hypothesis Test: Four stages
Stage 1: Specifying the hypotheses.
Stage 2: Defining the test parameters and the decision rule.
Stage 3: Examining the sample evidence.
Stage 4: The conclusions.
![Page 28: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/28.jpg)
Stage 1: Specifying the hypotheses.
By definition, if there is no connection then all the population means are equal, whilst if there is a connection at least one of the means must be different.
Null hypothesis: $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Alternative hypothesis: $H_1$: at least one mean is different
![Page 29: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/29.jpg)
Stage 2: Defining the test parameters and the decision rule.
Decision rule: based on the F-Ratio.
Test procedure: One-way Analysis of Variance (ANalysis Of VAriance: ANOVA).
Fcrit is the particular value of F that splits the area under the distribution in the proportions 95%/5%.
![Page 30: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/30.jpg)
Decision rule
If the value of Fcalc is between 0 and Fcrit, then conclude that there is no link.
If the value of Fcalc is greater than Fcrit, then conclude that, on the basis of the sample evidence, there is a link.
![Page 31: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/31.jpg)
Stage 3: Examining the sample evidence
Example 1: Fcalc would be small. The F-Ratio is defined in such a way that if the null hypothesis is true, i.e. all the means are equal, then Fcalc is expected to be close to 1.
Example 2: Fcalc measures the relative separation; the wider the separation, the larger the Fcalc value.
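To make the link between separation and Fcalc concrete, here is a minimal sketch of the one-way ANOVA F-ratio, computed as the between-levels mean square divided by the within-levels mean square. The function name `one_way_f_ratio` and both data sets are hypothetical; they are not taken from the supermarket example.

```python
def one_way_f_ratio(groups):
    """Return (F, df_between, df_within) for a list of samples, one per level."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-levels sum of squares: how far the level means are separated.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-levels sum of squares: the spread inside each level.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_calc = (ss_between / df_between) / (ss_within / df_within)
    return f_calc, df_between, df_within

# HYPOTHETICAL spending (pounds) for Single, Married, Divorced, Widowed.
separated = [[30, 40, 50], [40, 50, 60], [50, 60, 70], [60, 70, 80]]
identical = [[40, 50, 60]] * 4  # every level has exactly the same sample

f_separated, df1, df2 = one_way_f_ratio(separated)
f_identical, _, _ = one_way_f_ratio(identical)
print(f_separated, (df1, df2))
print(f_identical)
```

With identical samples at every level the between-levels sum of squares is zero, so Fcalc is as small as it can be; shifting the level means apart while keeping the same spread increases Fcalc.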
![Page 32: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/32.jpg)
To find the Threshold Value: Fcrit
The F-Ratio has two degrees of freedom (which depend on the sample size).
Look up the statistical tables: Ftable
Suppose: Fcalc = 8.91 and the degrees of freedom are (3, 80). Then Ftable = 2.72.
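Applied to these figures, the F-ratio decision rule is a one-sided comparison. The sketch below uses the hypothetical name `f_decision`; the two numbers are the Fcalc and Ftable values quoted above.

```python
def f_decision(f_calc, f_crit):
    """One-sided rule: 'no link' if 0 <= f_calc <= f_crit, otherwise 'link'."""
    if f_calc < 0:
        raise ValueError("an F-ratio cannot be negative")
    return "link" if f_calc > f_crit else "no link"

result = f_decision(8.91, 2.72)  # Fcalc and Ftable for DF (3, 80)
print(result)
```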
![Page 33: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/33.jpg)
Stage 4: The conclusions.
Since the value of Fcalc (8.91) is larger than the value of Ftable (2.72), the conclusion is that, on the basis of the sample evidence, there is enough evidence to suggest that there is a link between the amount spent by customers in a supermarket and the customer's marital status. The remaining issue is to describe the connection.
![Page 34: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/34.jpg)
Worked Example CREDIT data scenario
Question: Does the explanatory variable 'REGION' influence the response variable 'CREDIT'?
Is the amount borrowed on credit dependent upon the region of the country where the customer lives?
![Page 35: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/35.jpg)
IDA
![Page 36: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/36.jpg)
FDA Stage 1: Specifying the hypotheses.
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
$H_1$: at least one mean is different
Stage 2: Defining the test parameters and the decision rule.
![Page 37: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/37.jpg)
Stage 3: Examining the sample evidence
MINITAB: ANOVA, ONE WAY
Analysis of Variance for CREDIT
Source   DF    SS          MS       F      P
REGION   4     3445125     861281   5.10   0.0
Error    649   109631953   168924
Total    653   113077078
Ftable = 2.39
Since Fcalc = 5.10 > Ftable = 2.39, the sample evidence is indicating a link between "Amount borrowed on credit" and "The region the customer lives in".
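The entries in the Analysis of Variance table are tied together by simple arithmetic: each MS is SS divided by DF, and F is the REGION mean square divided by the Error mean square. The short check below recomputes them from the SS and DF figures in the MINITAB output; the variable names are illustrative only.

```python
# Figures copied from the MINITAB output for CREDIT by REGION.
ss_region, df_region = 3445125, 4
ss_error, df_error = 109631953, 649

ms_region = ss_region / df_region  # mean square between regions
ms_error = ss_error / df_error     # mean square within regions (Error)
f_calc = ms_region / ms_error

print(f"MS REGION = {ms_region:.0f}, MS Error = {ms_error:.0f}, F = {f_calc:.2f}")

# The Total row should be the sum of the REGION and Error rows.
totals_agree = (ss_region + ss_error == 113077078) and (df_region + df_error == 653)
print("Totals agree:", totals_agree)
```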
![Page 38: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/38.jpg)
Stage 4:The conclusions
Examination of the average values shows London to be the region with the highest amount on credit, then the South-West and South-East with similar average credits; the North having the lowest amount on credit.
REGION AMOUNT
SOUTH-WEST £977.10
SOUTH-EAST £958.40
LONDON £1061.80
MIDLANDS £898.10
NORTH £864.30
![Page 39: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/39.jpg)
Examine the diagram displaying the 95% confidence intervals for each level of the attribute variable.
Interpretation: The decision rule is that if the confidence limits don't overlap, then there is a real difference in the population means for the two levels of the attribute.
For example, Region 3, London, has an average amount on credit that is statistically significantly larger than the average amount on credit for Region 4, the Midlands, because the two sets of confidence limits don't overlap.
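The overlap rule can be sketched in a few lines. The sample means are those reported for the CREDIT example, but the confidence limits themselves appear only in the diagram, so the common half-width of £80 used here is a hypothetical stand-in chosen for illustration, and `interval` and `overlaps` are hypothetical helper names.

```python
from itertools import combinations

# Sample means from the CREDIT worked example (pounds).
means = {"SOUTH-WEST": 977.10, "SOUTH-EAST": 958.40, "LONDON": 1061.80,
         "MIDLANDS": 898.10, "NORTH": 864.30}

# ASSUMPTION: hypothetical half-width, identical for every region. The real
# intervals come from the MINITAB diagram and are not given in the text.
HALF_WIDTH = 80.0

def interval(mean, half_width=HALF_WIDTH):
    """Confidence limits as (lower, upper) around a sample mean."""
    return (mean - half_width, mean + half_width)

def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Pairs of regions whose intervals do not overlap are flagged as different.
different = [(r1, r2) for r1, r2 in combinations(means, 2)
             if not overlaps(interval(means[r1]), interval(means[r2]))]
print(different)
```

Under this assumed half-width only London v Midlands and London v North are flagged; real interval widths would differ from region to region with the sample sizes.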
![Page 40: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/40.jpg)
The final description of the link can be summarised as: the amount borrowed on credit in London is significantly higher than in the Midlands and the North.
          level 2         level 3         level 4         level 5
level 1   No Difference   No Difference   No Difference   No Difference
level 2                   No Difference   No Difference   No Difference
level 3                                   Difference      Difference
level 4                                                   No Difference
![Page 41: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/41.jpg)
![Page 42: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/42.jpg)
![Page 43: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/43.jpg)
![Page 44: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/44.jpg)
![Page 45: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/45.jpg)
![Page 46: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/46.jpg)
![Page 47: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/47.jpg)
![Page 48: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/48.jpg)
![Page 49: CHAPTER 16 THE FURTHER DATA ANALYSIS](https://reader035.fdocuments.in/reader035/viewer/2022062521/56816860550346895ddeaae3/html5/thumbnails/49.jpg)