DATA ANALYSIS Module Code CA660 Supplementary Extended examples.
DATA ANALYSIS
Module Code CA660 Supplementary
Extended examples
Extended example – Value of information
Recall: price of new computer tablet. When expected payoffs are used in the decision strategy, the action selected is the one with the largest expected payoff. Hence:
Expected Value of Perfect Information (EVPI) = (Average payoff using a perfect predictor) - (Average payoff for whatever action is actually selected)
States of nature (Si) = time to major competitor introducing a similar product. Maximum payoffs are summarised below for each Si
State of nature         Max. payoff (millions)   Suppose P{Si} to be
S1   < 6 months         250 (for A1)             0.1
S2   6-12 months        320 (for A1)             0.5
S3   12-18 months       410 (for A4)             0.3
S4   > 18 months        550 (for A4)             0.1
Exp. payoff (using perfect predictor) = (0.1)(250) + (0.5)(320) + (0.3)(410) + (0.1)(550) = 363
So, EVPI = 363 – 330 = 33 (million), where 330 is the expected payoff of the action actually selected.
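N.B. EVPI is an upper limit on what the company should pay for any information about the future. In practice the state of nature is only predicted (estimated), so the worth of such information depends on, e.g., consultant reliability, volatility of the market, etc.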
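The EVPI arithmetic above can be checked with a short sketch. The dictionary names are illustrative only; the figure of 330 is taken from the text as the expected payoff of the action actually selected, not recomputed here.

```python
# Maximum payoff (millions) under each state of nature, and P{Si}, from the table.
max_payoff = {"S1": 250, "S2": 320, "S3": 410, "S4": 550}
prob = {"S1": 0.1, "S2": 0.5, "S3": 0.3, "S4": 0.1}

# Expected payoff if the state of nature could be predicted perfectly.
exp_perfect = sum(prob[s] * max_payoff[s] for s in prob)

# Expected payoff of the action actually selected (value quoted in the text).
best_expected_payoff = 330

# EVPI = upper limit on the worth of any information about the future.
evpi = exp_perfect - best_expected_payoff
print(round(exp_perfect, 2), round(evpi, 2))
```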
Example contd.
Not using expected payoff as the decision base: e.g. gambling, or insure vs. do not insure. What about risk? Can use the variance of the respective payoffs, as have seen.
Need a way to combine a given attitude to payoff with the corresponding risk of each alternative (profit vs loss): Utility Value.
Steps:
1. Assign utility values to the smallest and largest payoffs, with U in the range 0 to 100, so have U(Min) = 0, U(Max) = 100 (relative values are what is important).
2. Utility value for any payoff F to be considered: U(F) = P x 100, where P is the probability that makes getting F with certainty equally attractive to a gamble paying the Max payoff with probability P and the Min payoff with probability 1-P.
Note: this probability relates to willingness to take a risk, not to P{Si}. Check attitude to risk on the Utility vs Profit plot: if the utility curve lies above the straight line joining (Min, 0) and (Max, 100), the decision maker is a risk avoider.
So suppose have utility values for a simple example:
                   P{S1} = 0.7   P{S2} = 0.3
A1  Gilt-edged         50            50         A1: Exp. Utility = (50)(0.7) + (50)(0.3) = 50
A2  Oil well            0           100         A2: Exp. Utility = (0)(0.7) + (100)(0.3) = 30
Larger expected utility associated with gilt-edged (as risk avoider)
Example contd.
So from the table of payoffs for the computer tablet selling-price decision, Min = 80, Max = 550, so U(80) = 0, U(550) = 100. The easiest way to determine utility values for the other 14 possible payoffs is to sketch a curve of U vs Profit (Payoff).
Typically, pick values in the range between min and max payoff and ask the question: “For a payoff of 200, e.g., what value of P would make getting that payoff with certainty equally as attractive as a payoff of 550 with prob. P and a payoff of 80 with prob. 1-P?” If the decision maker responds by saying P = 0.55 is acceptable, then U(200) = 0.55 x 100 = 55.
Curve basis:
Hypothetical payoff   100    150    200    300    400    500
Prob. P               0.2    0.4    0.55   0.75   0.90   0.97
Utility U             20     40     55     75     90     97
Thus for actual payoffs corresponding to Ai , Si , read off curve.
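Reading values off the curve can be imitated with a minimal sketch. It assumes straight-line interpolation between the elicited points, which is only a stand-in for the hand-sketched curve; the function names `utility` and `chord` are hypothetical. It also checks the risk attitude by comparing the elicited points with the straight line joining (80, 0) and (550, 100).

```python
from bisect import bisect_right

# Elicited points on the utility curve, including the two end points U(80)=0, U(550)=100.
payoffs = [80, 100, 150, 200, 300, 400, 500, 550]
utils = [0, 20, 40, 55, 75, 90, 97, 100]

def utility(payoff):
    """Piecewise-linear read-off of U(payoff); an assumed stand-in for the sketched curve."""
    if not payoffs[0] <= payoff <= payoffs[-1]:
        raise ValueError("payoff outside the range covered by the curve")
    i = bisect_right(payoffs, payoff) - 1
    if i == len(payoffs) - 1:          # exactly the maximum payoff
        return float(utils[-1])
    x0, x1 = payoffs[i], payoffs[i + 1]
    u0, u1 = utils[i], utils[i + 1]
    return u0 + (u1 - u0) * (payoff - x0) / (x1 - x0)

def chord(payoff):
    """Utility a risk-neutral decision maker would assign (straight line Min to Max)."""
    return 100 * (payoff - payoffs[0]) / (payoffs[-1] - payoffs[0])

# Curve above the straight line at every interior point means risk avoider.
risk_avoider = all(u > chord(x) for x, u in zip(payoffs[1:-1], utils[1:-1]))
print(utility(250), risk_avoider)
```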
Example contd.
                        S1      S2      S3      S4     Expected
Action                 (0.1)   (0.5)   (0.3)   (0.1)   Utility
A1: price at 1500       67      79      84      90      80.4
A2: price at 1750       40      68      75      87      69.2
A3: price at 2000       30      74      88      94      75.8
A4: price at 2500        0      72      91     100      73.3
Where, e.g., for Action A2: Expected Utility = (0.1)(40) + (0.5)(68) + (0.3)(75) + (0.1)(87) = 69.2
Choosing the action with largest expected utility, the decision is to select action A1 (selling price 1500). For this particular example, it appears that A1 maximises both expected payoff and expected utility.
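The expected utilities in the table can be reproduced with a short sketch; the variable names are illustrative, and the utility values are those read off the curve in the table above.

```python
# P{Si} for S1..S4 and the utility of each action under each state (from the table).
prob = [0.1, 0.5, 0.3, 0.1]
utility_table = {
    "A1": [67, 79, 84, 90],    # price at 1500
    "A2": [40, 68, 75, 87],    # price at 1750
    "A3": [30, 74, 88, 94],    # price at 2000
    "A4": [0, 72, 91, 100],    # price at 2500
}

# Expected utility of an action = sum over states of P{Si} x U(payoff).
expected_utility = {
    action: sum(p * u for p, u in zip(prob, us))
    for action, us in utility_table.items()
}

# Decision rule: pick the action with the largest expected utility.
best_action = max(expected_utility, key=expected_utility.get)
for action, eu in expected_utility.items():
    print(action, round(eu, 1))
print("select", best_action)
```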
Examples in General Linear Models - ANOVA
• Suppose, as part of QA, components are subjected to a strength test, where rods with pointed tips are forced into a hinge and movement is measured. There are 3 types of tip available; 10 hinges are randomly selected and all three tips are used against each hinge.
The coded data (RANDOMISED BLOCK design) are:
Hinge:    1    2    3    4    5    6    7    8    9   10
Tip #1   68   40   82   56   70   80   47   55   78   53
Tip #2   72   43   89   60   75   91   58   68   77   65
Tip #3   65   42   84   50   68   86   50   52   75   60
• If assume Normality (interested in random effects) and assume zero covariance between tip (treatment) effects and error:
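$y_{ij} = \mu + T_i + \varepsilon_{ij}$
$\sigma_m^2 = \sigma_T^2 + \sigma_e^2$
$y \sim NID(\mu, \sigma_m^2), \quad T \sim NID(0, \sigma_T^2), \quad \varepsilon \sim NID(0, \sigma_e^2)$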
Example - RB Models contd.
• What about other factors, e.g. Process and TP interactions? Extension to Simple Model.

ANOVA Table: Randomized Blocks within Process, for b = replications. Focus: on Tip strength.

Source      dof            Expected MSQ
Process     p-1            know there are differences
Blocks      (b-1)p         again, know there are differences
Tip type    T-1            $\sigma^2 + b\sigma_{TP}^2 + bp\sigma_T^2$
TP          (T-1)(p-1)     $\sigma^2 + b\sigma_{TP}^2$
Error       (b-1)(T-1)p    $\sigma^2$

Note: individuals are blocked within processes, so the process effect is intrinsic to error.
Model form is standard,
$y_{ijk} = \mu + T_i + P_j + (TP)_{ij} + \varepsilon_{ijk}$
but the only meaningful comparisons are within process; hence form of random error = population variance = $\sigma^2$, so random effects of interest are obtained from ratios of variances.
Tip strength effects are measured within blocks.
Example contd.
ANOVA Table
Source            dof              SSQ          MSQ                          F-Ratio
Factor (Tips)     k - 1            SS(Factor)   SS(Factor)/(k - 1)           MS(Factor)/MSE
                  2                304.2        152.1                        15.63 *
Blocks (Hinges)   b - 1            SS(Blocks)   SS(Blocks)/(b - 1)           MS(Blocks)/MSE
                  9                5705.0       633.9                        65.15 *
Error             (k - 1)(b - 1)   SS(Error)    SS(Error)/((k - 1)(b - 1))
                  18               175.1        9.73
______________________________________________________________________
Total             29               6184.3
Note: If treat the 30 observations as 10 replicates for each of the 3 Tips (one-way design) and ignore blocking, F is not significant.
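The randomised block table can be reproduced from the hinge data with a plain-Python sketch using the usual correction-factor formulas. Variable names are illustrative. The last lines show why ignoring blocking hides the tip effect: hinge-to-hinge variation is then pooled into the error term.

```python
# Coded strength data: one list per tip, one entry per hinge (block).
tips = {
    "Tip 1": [68, 40, 82, 56, 70, 80, 47, 55, 78, 53],
    "Tip 2": [72, 43, 89, 60, 75, 91, 58, 68, 77, 65],
    "Tip 3": [65, 42, 84, 50, 68, 86, 50, 52, 75, 60],
}
k = len(tips)                      # number of tips (treatments)
b = len(tips["Tip 1"])             # number of hinges (blocks)
n_obs = k * b

all_obs = [y for row in tips.values() for y in row]
cf = sum(all_obs) ** 2 / n_obs     # correction factor

ss_total = sum(y * y for y in all_obs) - cf
ss_tips = sum(sum(row) ** 2 for row in tips.values()) / b - cf
hinge_totals = [sum(col) for col in zip(*tips.values())]
ss_blocks = sum(t * t for t in hinge_totals) / k - cf
ss_error = ss_total - ss_tips - ss_blocks

ms_tips = ss_tips / (k - 1)
ms_blocks = ss_blocks / (b - 1)
ms_error = ss_error / ((k - 1) * (b - 1))
f_tips = ms_tips / ms_error
f_blocks = ms_blocks / ms_error

# One-way analysis ignoring blocks: block variation is absorbed into error.
ms_within = (ss_blocks + ss_error) / (n_obs - k)
f_oneway = ms_tips / ms_within

print(round(ss_tips, 1), round(ss_blocks, 1), round(ss_error, 1), round(ss_total, 1))
print(round(f_tips, 2), round(f_blocks, 2), round(f_oneway, 2))
```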
Example – factorial design
Sickness claims costs by employee category and sex : 2 factors + interaction.
p = 3 replicates per cell
                  Employee Classification
Sex     C1               C2               C3               C4               Total    Ave.
M       190, 225, 200    135, 180, 100    260, 330, 350    305, 275, 240    2790     232.50
        (615)            (415)            (940)            (820)
F       235, 190, 270    275, 305, 285    160, 205, 140    155, 110, 75     2405     200.42
        (695)            (865)            (505)            (340)
Total   1310             1280             1445             1160             5195

$y_{ijk} = \mu + S_i + C_j + (SC)_{ij} + \varepsilon_{ijk}$
Example contd.
ANOVA Table
Source                   dof              SSQ               MSQ                               F-Ratio
Factor 1 (Sex)           r - 1            SS(Factor 1)      SS(Factor 1)/(r - 1)              MS(Factor 1)/MSE
                         1                6176.04           6176.04                           5.05 *
Factor 2 (Emp. Class)    c - 1            SS(Factor 2)      SS(Factor 2)/(c - 1)              MS(Factor 2)/MSE
                         3                6853.12           2284.37                           1.87
Interaction              (r - 1)(c - 1)   SS(Interaction)   SS(Interaction)/((r - 1)(c - 1))  MS(Interaction)/MSE
                         3                98578.13          32859.38                          26.87 *
Error                    rc(p - 1)        SS(Error)         SS(Error)/(rc(p - 1))
                         16               19566.67          1222.92
______________________________________________________________________
Total                    rcp - 1          SS(Total)
                         23               131173.96
Note: To see effect of interaction, sketch amounts claimed (on average) vs employment category, for M and F
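The factorial table can be checked from the raw claims data with a plain-Python sketch. Names are illustrative; the interaction sum of squares is obtained as the between-cells sum of squares minus the two main-effect sums of squares, and the error degrees of freedom are rc(p - 1) = 16. The cell means printed at the end are the quantities to sketch against employee class for M and F.

```python
# Claims costs: p = 3 replicates per (sex, employee class) cell.
cells = {
    ("M", "C1"): [190, 225, 200], ("M", "C2"): [135, 180, 100],
    ("M", "C3"): [260, 330, 350], ("M", "C4"): [305, 275, 240],
    ("F", "C1"): [235, 190, 270], ("F", "C2"): [275, 305, 285],
    ("F", "C3"): [160, 205, 140], ("F", "C4"): [155, 110, 75],
}
sexes = ["M", "F"]
classes = ["C1", "C2", "C3", "C4"]
r, c, p = len(sexes), len(classes), 3

all_obs = [y for ys in cells.values() for y in ys]
cf = sum(all_obs) ** 2 / (r * c * p)          # correction factor

ss_total = sum(y * y for y in all_obs) - cf
ss_cells = sum(sum(ys) ** 2 for ys in cells.values()) / p - cf
ss_sex = sum(sum(sum(cells[s, k]) for k in classes) ** 2 for s in sexes) / (c * p) - cf
ss_class = sum(sum(sum(cells[s, k]) for s in sexes) ** 2 for k in classes) / (r * p) - cf
ss_inter = ss_cells - ss_sex - ss_class       # interaction
ss_error = ss_total - ss_cells                # within-cell variation

ms_error = ss_error / (r * c * (p - 1))
f_sex = (ss_sex / (r - 1)) / ms_error
f_class = (ss_class / (c - 1)) / ms_error
f_inter = (ss_inter / ((r - 1) * (c - 1))) / ms_error

# Cell means: plot these against employee class, one line per sex, to see the interaction.
cell_means = {key: sum(ys) / p for key, ys in cells.items()}
print(round(f_sex, 2), round(f_class, 2), round(f_inter, 2))
```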
Multiple Populations: Mendel - χ² and G

                 Round Seed          Wrinkled Seed
Plant            Count   Expected    Count   Expected   dof   χ²     p-value   G      p-value
1                45      42.75       12      14.25      1     0.47   0.49      0.49   0.49
2                27      26.25       8       8.75       1     0.09   0.77      0.09   0.77
3                24      23.25       7       7.75       1     0.10   0.76      0.10   0.75
4                19      21.75       10      7.25       1     1.39   0.24      1.30   0.26
5                32      32.25       11      10.75      1     0.01   0.93      0.01   0.93
6                26      24.00       6       8.00       1     0.67   0.41      0.71   0.40
7                88      84.00       24      28.00      1     0.76   0.38      0.79   0.38
8                22      24.00       10      8.00       1     0.67   0.41      0.63   0.43
9                28      25.50       6       8.50       1     0.98   0.32      1.06   0.30
10               25      24.00       7       8.00       1     0.17   0.68      0.17   0.68
Total            336                 101                10    5.30             5.34
Pooled           336     327.75      101     109.25     1     0.83   0.36      0.85   0.36
Heterogeneity                                           9     4.47   0.88      4.50   0.88
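The table entries can be reproduced with a small sketch. It assumes the expected counts follow a 3:1 round-to-wrinkled ratio, which is what the Expected columns imply (e.g. 42.75 = 0.75 x 57). The helper names are hypothetical. The p-value helper covers only the 1 dof case, using the identity P(χ²₁ > x) = erfc(sqrt(x/2)); the 9 and 10 dof p-values would need a chi-square routine from a statistics library.

```python
import math

round_counts = [45, 27, 24, 19, 32, 26, 88, 22, 28, 25]
wrinkled_counts = [12, 8, 7, 10, 11, 6, 24, 10, 6, 7]
ratio = (0.75, 0.25)   # assumed 3:1 expectation, inferred from the Expected columns

def chi2_and_g(observed, expected):
    """Pearson chi-square and likelihood-ratio G for one set of classes."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
    return chi2, g

def p_value_1dof(stat):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Per-plant statistics, then their totals (10 dof).
per_plant = []
for rnd, wr in zip(round_counts, wrinkled_counts):
    n = rnd + wr
    per_plant.append(chi2_and_g((rnd, wr), (ratio[0] * n, ratio[1] * n)))
chi2_total = sum(c for c, _ in per_plant)
g_total = sum(g for _, g in per_plant)

# Pooled statistics: all plants treated as one sample (1 dof).
n_all = sum(round_counts) + sum(wrinkled_counts)
chi2_pooled, g_pooled = chi2_and_g(
    (sum(round_counts), sum(wrinkled_counts)),
    (ratio[0] * n_all, ratio[1] * n_all),
)

# Heterogeneity = total - pooled (9 dof).
chi2_het = chi2_total - chi2_pooled
g_het = g_total - g_pooled
print(round(chi2_total, 2), round(chi2_pooled, 2), round(chi2_het, 2))
print(round(g_total, 2), round(g_pooled, 2), round(g_het, 2))
```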
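Multiple Populations - summary

• G parallels $\chi^2$
• Partitions in the same way, therefore
  $G_{heterogeneity} = G_{Total} - G_{Pooled}$ (n = no. classes, p = no. populations)
  where

  $G_{Total} = 2 \sum_{i=1}^{p} \sum_{j=1}^{n} O_{ij} \log\left(\frac{O_{ij}}{E_{ij}}\right)$

  $G_{Pooled} = 2 \sum_{j=1}^{n} \left(\sum_{i=1}^{p} O_{ij}\right) \log\left(\frac{\sum_{i=1}^{p} O_{ij}}{\sum_{i=1}^{p} E_{ij}}\right)$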
Smoothing Time series – choice of smoothing constant for exponentially smoothed forecasts
Suppose actual weekly demand, Yt, is as below. Want smoothed values St for smoothing constants (α) of 0.1 and 0.8 contrasted; S1 = Y1 by convention.

Week                         1     2     3     4     5     6       7        8         9          10
Actual demand Yt             100   100   100   100   150   100     100      100       100        100
Forecast St (with α = 0.1)   100   100   100   100   105   104.5   104.05   103.645   103.2805   102.95245
Forecast St (with α = 0.8)   100   100   100   100   140   108     101.6    100.32    100.064    100.0128
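The smoothed values in the table are consistent with the usual recursion St = αYt + (1 - α)St-1, started at S1 = Y1. A minimal sketch follows; the function name `exp_smooth` is illustrative.

```python
def exp_smooth(y, alpha):
    """Exponentially smoothed series: S1 = Y1, then St = alpha*Yt + (1 - alpha)*S(t-1)."""
    smoothed = [float(y[0])]
    for obs in y[1:]:
        smoothed.append(alpha * obs + (1 - alpha) * smoothed[-1])
    return smoothed

# Weekly demand with a single spike in week 5.
demand = [100, 100, 100, 100, 150, 100, 100, 100, 100, 100]

s_slow = exp_smooth(demand, 0.1)   # small alpha: damps the spike, slow to return
s_fast = exp_smooth(demand, 0.8)   # large alpha: follows the spike, quick to return

for week, (y, a, b) in enumerate(zip(demand, s_slow, s_fast), start=1):
    print(week, y, round(a, 5), round(b, 4))
```

With α = 0.1 the week-5 spike moves the smoothed value only to 105 but its effect lingers; with α = 0.8 the smoothed value jumps to 140 and is back near 100 within a few weeks.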