DATA ANALYSIS Module Code CA660 Supplementary Extended examples.


Extended example – Value of information

Recall: price of new computer tablet. When expected payoffs are used in the decision strategy, the action selected is the one with the largest expected payoff. Hence,

Expected Value of Perfect Information (EVPI) = (Average payoff using a perfect predictor) - (Average payoff for whatever action is actually selected)

States of nature (Si) = time to major competitor introducing a similar product. Maximum payoffs are summarised below for each Si, together with the supposed probabilities P{Si}.

State of nature      Max. payoff (millions)   P{Si}
S1  < 6 months       250 (for A1)             0.1
S2  6-12 months      320 (for A1)             0.5
S3  12-18 months     410 (for A4)             0.3
S4  > 18 months      550 (for A4)             0.1

Exp. payoff (using perfect predictor) = (0.1)(250) + (0.5)(320) + (0.3)(410) + (0.1)(550) = 363

So, EVPI = 363 – 330 = 33 (million), where 330 is the expected payoff of the action actually selected.
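The EVPI arithmetic can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the figure of 330 for the selected action is taken as given from the example rather than recomputed from the full payoff table.

```python
# State probabilities P{Si} and the best payoff (millions) attainable in each state.
probs = [0.1, 0.5, 0.3, 0.1]
max_payoffs = [250, 320, 410, 550]

# Expected payoff if the state of nature could be predicted perfectly.
exp_perfect = sum(p * m for p, m in zip(probs, max_payoffs))

# Expected payoff of the action selected without extra information
# (quoted as 330 in the example; not derived here).
exp_selected = 330

evpi = exp_perfect - exp_selected
print(round(exp_perfect, 2), round(evpi, 2))
```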

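N.B. EVPI is an upper limit on what the company should pay for any information about the future. In practice the state of nature is only predicted (estimated), so the value of such information depends, e.g., on consultant reliability, volatility of the market, etc.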

Example contd.

Not using expected payoff as the decision base: e.g. gambling, or insure vs. do not insure. What about risk? We can use the variance of the respective payoffs, as we have seen.

Need: a way to combine the decision maker's attitude to payoff with the corresponding risk of each alternative (profit vs. loss), i.e. a Utility Value.

Steps:
1. Assign utility values to the smallest and largest payoffs, with U in the range 0 to 100, so that U(Min) = 0 and U(Max) = 100 (relative values are what matter).
2. The utility value for any payoff F to be considered is U(F) = P x 100, where P is the probability of getting the Max payoff (with the Min payoff at probability 1 - P) that would make this gamble equally attractive to getting payoff F with certainty.

Note: this probability relates to willingness to take a risk, not to P{Si}. Check attitude to risk on a plot of Utility vs Profit: if the utility curve lies above the straight line joining (Min, 0) and (Max, 100), the decision maker is a risk avoider.

So suppose we have utility values for a simple example:

                   P{S1} = 0.7   P{S2} = 0.3
A1  Gilt-edged          50            50
A2  Oil well             0           100

A1: Exp. Utility = (50)(0.7) + (50)(0.3) = 50
A2: Exp. Utility = (0)(0.7) + (100)(0.3) = 30

The larger expected utility is associated with gilt-edged (as risk avoider).

Example contd.

From the table of payoffs for the computer tablet selling-price decision, Min = 80 and Max = 550, so U(80) = 0 and U(550) = 100. The easiest way to determine utility values for the other 14 possible payoffs is to sketch a curve of U vs Profit (Payoff).

Typically, pick values in the range between min and max payoff and ask the question: "For a payoff of 200, say, what value of P would make getting that payoff with certainty equally as attractive as a payoff of 550 with prob. P and a payoff of 80 with prob. 1-P?" If the decision maker responds that P = 0.55 is acceptable, then U(200) = 0.55 x 100 = 55.

Curve basis:

Hypothetical payoff   100    150    200    300    400    500
Prob. P               0.2    0.4    0.55   0.75   0.90   0.97
Utility U             20     40     55     75     90     97

Thus, for the actual payoff corresponding to each pair (Ai, Si), read the utility off the curve.
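Reading values off the curve can be imitated numerically. The sketch below assumes straight-line interpolation between the elicited "curve basis" points (plus the end points U(80) = 0 and U(550) = 100); that is a stand-in for the hand-sketched curve, so its values need not match utilities read from the sketch. The function name `utility` is illustrative.

```python
from bisect import bisect_right

# Elicited (payoff, utility) points, including the end points U(80)=0 and U(550)=100.
curve = [(80, 0), (100, 20), (150, 40), (200, 55),
         (300, 75), (400, 90), (500, 97), (550, 100)]

def utility(payoff):
    """Piecewise-linear read-off between elicited points (an assumed curve shape)."""
    xs = [x for x, _ in curve]
    if not xs[0] <= payoff <= xs[-1]:
        raise ValueError("payoff outside the elicited range")
    # Index of the right-hand end of the segment containing the payoff.
    i = min(bisect_right(xs, payoff), len(xs) - 1)
    (x0, u0), (x1, u1) = curve[i - 1], curve[i]
    return u0 + (u1 - u0) * (payoff - x0) / (x1 - x0)

print(utility(200), utility(250))
```

A smooth fitted curve would be closer to the sketch; linear interpolation is used here only because it needs no extra assumptions about the curve's shape.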

Example contd.

Action                S1      S2      S3      S4      Expected
                      (0.1)   (0.5)   (0.3)   (0.1)   Utility
A1: price at 1500     67      79      84      90      80.4
A2: price at 1750     40      68      75      87      69.2
A3: price at 2000     30      74      88      94      75.8
A4: price at 2500      0      72      91     100      73.3

where, e.g., for Action A2: Expected Utility = (0.1)(40) + (0.5)(68) + (0.3)(75) + (0.1)(87) = 69.2

Choosing the action with largest expected utility, the decision is to select action A1 (selling price 1500). For this particular example, it appears that A1 maximises both expected payoff and expected utility.
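The expected utilities and the resulting choice can be reproduced directly from the utility values in the example. This is a minimal sketch; the dictionary layout and names are illustrative.

```python
# P{Si} for S1..S4 and the utility of each action under each state.
probs = [0.1, 0.5, 0.3, 0.1]
utilities = {
    "A1": [67, 79, 84, 90],
    "A2": [40, 68, 75, 87],
    "A3": [30, 74, 88, 94],
    "A4": [0, 72, 91, 100],
}

# Expected utility = sum over states of P{Si} * U(payoff of action in Si).
expected = {a: sum(p * u for p, u in zip(probs, us)) for a, us in utilities.items()}
best = max(expected, key=expected.get)

for action, eu in expected.items():
    print(action, round(eu, 1))
print("best:", best)
```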


Examples in General Linear Models - ANOVA

• Suppose, as part of QA, components are subjected to a strength test, where rods with pointed tips are forced into a hinge and movement is measured. There are 3 types of tip available; 10 hinges are randomly selected and all three tips are used against each hinge.

The coded data (RANDOMISED BLOCK design) are:

Hinge:    1    2    3    4    5    6    7    8    9    10
Tip #1    68   40   82   56   70   80   47   55   78   53
Tip #2    72   43   89   60   75   91   58   68   77   65
Tip #3    65   42   84   50   68   86   50   52   75   60

• If we assume Normality (interested in random effects) and zero covariance between the random (here, tip) effects and error, then:

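\[ y_{ij} = \mu + T_i + \varepsilon_{ij} \]

\[ \sigma_m^2 = \sigma_T^2 + \sigma_e^2 \]

\[ y \sim NID(\mu, \sigma_m^2), \quad T \sim NID(0, \sigma_T^2), \quad \varepsilon \sim NID(0, \sigma_e^2) \]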


Example - RB Models contd.

• What about other factors, e.g. Process (P) and TP interactions? Extension to the simple model:

\[ y_{ijk} = \mu + T_i + P_j + (TP)_{ij} + \varepsilon_{ijk} \]

ANOVA Table: Randomized Blocks within Process, for b = replications. Focus is on Tip strength.

Source      dof              Expected MSQ
Process     p-1              (know there are differences)
Blocks      (b-1)p           (again, know there are differences)
Tip type    T-1              $\sigma^2 + b\sigma_{TP}^2 + bp\sigma_T^2$
TP          (T-1)(p-1)       $\sigma^2 + b\sigma_{TP}^2$
Error       (b-1)(T-1)p      $\sigma^2$

Note: individuals are blocked within processes, so the process effect is intrinsic to the error. Tip strength effects are measured within blocks.

The model form is standard, but the only meaningful comparisons are within process; hence the form of the random error = population variance = $\sigma^2$, so the random effects of interest are obtained from ratios of variances.

Example contd.

ANOVA Table

Source            dof               SSQ          MSQ                          F-Ratio
Factor (Tips)     k - 1             SS(Factor)   SS(Factor)/(k - 1)           MS(Factor)/MSE
                  2                 304.2        152.1                        15.63 *
Blocks (Hinges)   b - 1             SS(Blocks)   SS(Blocks)/(b - 1)           MS(Blocks)/MSE
                  9                 5705.0       633.9                        65.15 *
Error             (k - 1)(b - 1)    SS(Error)    SS(Error)/((k - 1)(b - 1))
                  18                175.1        9.73
______________________________________________________________________
Total             29                6184.3

Note: if the 30 observations are treated as 10 replicates for each of the 3 tips (one-way design) and blocking is ignored, F is not significant.
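The sums of squares for the hinge data can be recomputed from the coded observations. The sketch below uses only plain Python and the usual correction-factor formulas; variable names are illustrative, and no critical values or p-values are computed.

```python
# Coded strength data: one list per tip, one entry per hinge (block).
tips = {
    "Tip 1": [68, 40, 82, 56, 70, 80, 47, 55, 78, 53],
    "Tip 2": [72, 43, 89, 60, 75, 91, 58, 68, 77, 65],
    "Tip 3": [65, 42, 84, 50, 68, 86, 50, 52, 75, 60],
}
k = len(tips)                      # number of treatments (tips)
b = len(tips["Tip 1"])             # number of blocks (hinges)
n = k * b

all_obs = [y for row in tips.values() for y in row]
cf = sum(all_obs) ** 2 / n         # correction factor

ss_total = sum(y * y for y in all_obs) - cf
ss_tips = sum(sum(row) ** 2 for row in tips.values()) / b - cf
block_totals = [sum(col) for col in zip(*tips.values())]
ss_blocks = sum(t * t for t in block_totals) / k - cf
ss_error = ss_total - ss_tips - ss_blocks

mse = ss_error / ((k - 1) * (b - 1))
f_tips = (ss_tips / (k - 1)) / mse
f_blocks = (ss_blocks / (b - 1)) / mse

# One-way analysis ignoring blocks: block variation is absorbed into the error term.
f_oneway = (ss_tips / (k - 1)) / ((ss_total - ss_tips) / (n - k))

print(round(ss_tips, 1), round(ss_blocks, 1), round(ss_error, 1))
print(round(f_tips, 2), round(f_blocks, 2), round(f_oneway, 2))
```

With blocking ignored, the F ratio for tips falls below 1, which illustrates why the one-way analysis finds no tip effect.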

Example – factorial design

Sickness claims costs by employee category and sex: 2 factors + interaction.

p = 3 replicates per cell

\[ y_{ijk} = \mu + S_i + C_j + (SC)_{ij} + \varepsilon_{ijk} \]

                            Employee Classification
Sex     C1              C2              C3              C4              Total    Ave.
M       190, 225, 200   135, 180, 100   260, 330, 350   305, 275, 240   2790     232.50
        (615)           (415)           (940)           (820)
F       235, 190, 270   275, 305, 285   160, 205, 140   155, 110, 75    2405     200.42
        (695)           (865)           (505)           (340)
Total   1310            1280            1445            1160            5195

Example contd.

ANOVA Table

Source                  dof              SSQ               MSQ                                 F-Ratio
Factor 1 (Sex)          r - 1            SS(Factor 1)      SS(Factor 1)/(r - 1)                MS(Factor 1)/MSE
                        1                6176.04           6176.04                             5.05 *
Factor 2 (Emp. Class)   c - 1            SS(Factor 2)      SS(Factor 2)/(c - 1)                MS(Factor 2)/MSE
                        3                6853.12           2284.37                             1.87
Interaction             (r - 1)(c - 1)   SS(Interaction)   SS(Interaction)/((r - 1)(c - 1))    MS(Interaction)/MSE
                        3                98578.13          32859.38                            26.87 *
Error                   rc(p - 1)        SS(Error)         SS(Error)/(rc(p - 1))
                        16               19566.67          1222.92
______________________________________________________________________
Total                   rcp - 1          SS(Total)
                        23               131173.96

Here SS(Interaction) = SS(Cells) - SS(Factor 1) - SS(Factor 2), and the error dof is rc(p - 1) = (2)(4)(2) = 16.

Note: To see effect of interaction, sketch amounts claimed (on average) vs employment category, for M and F
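The factorial ANOVA quantities can be recomputed from the cell data as a check. This is a plain-Python sketch using correction-factor formulas; the helper `level_total` and the variable names are illustrative, and the interaction sum of squares is obtained as the cell sum of squares minus the two main-effect sums of squares.

```python
# Claims by (sex, employee class); p = 3 replicates per cell.
cells = {
    ("M", "C1"): [190, 225, 200], ("M", "C2"): [135, 180, 100],
    ("M", "C3"): [260, 330, 350], ("M", "C4"): [305, 275, 240],
    ("F", "C1"): [235, 190, 270], ("F", "C2"): [275, 305, 285],
    ("F", "C3"): [160, 205, 140], ("F", "C4"): [155, 110, 75],
}
sexes = ["M", "F"]
classes = ["C1", "C2", "C3", "C4"]
r, c, p = len(sexes), len(classes), 3

all_obs = [y for ys in cells.values() for y in ys]
cf = sum(all_obs) ** 2 / (r * c * p)      # correction factor

def level_total(index, level):
    """Total of all observations whose cell key has `level` at position `index`."""
    return sum(sum(ys) for key, ys in cells.items() if key[index] == level)

ss_total = sum(y * y for y in all_obs) - cf
ss_sex = sum(level_total(0, s) ** 2 for s in sexes) / (c * p) - cf
ss_class = sum(level_total(1, k) ** 2 for k in classes) / (r * p) - cf
ss_cells = sum(sum(ys) ** 2 for ys in cells.values()) / p - cf
ss_inter = ss_cells - ss_sex - ss_class
ss_error = ss_total - ss_cells

mse = ss_error / (r * c * (p - 1))        # error dof = rc(p - 1) = 16
f_sex = ss_sex / (r - 1) / mse
f_class = ss_class / (c - 1) / mse
f_inter = ss_inter / ((r - 1) * (c - 1)) / mse

# Cell means by class, for the interaction sketch of M vs F.
for s in sexes:
    print(s, [round(sum(cells[(s, k)]) / p, 1) for k in classes])
print(round(f_sex, 2), round(f_class, 2), round(f_inter, 2))
```

The printed cell means move in opposite directions for M and F across the classes, which is the pattern the interaction sketch is meant to reveal.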

Multiple Populations: Mendel - $\chi^2$ and G

                Round Seed          Wrinkled Seed
Plant           Count    Expected   Count    Expected   dof   $\chi^2$   p-value   G      p-value
1               45       42.75      12       14.25      1     0.47       0.49      0.49   0.49
2               27       26.25      8        8.75       1     0.09       0.77      0.09   0.77
3               24       23.25      7        7.75       1     0.10       0.76      0.10   0.75
4               19       21.75      10       7.25       1     1.39       0.24      1.30   0.26
5               32       32.25      11       10.75      1     0.01       0.93      0.01   0.93
6               26       24.00      6        8.00       1     0.67       0.41      0.71   0.40
7               88       84.00      24       28.00      1     0.76       0.38      0.79   0.38
8               22       24.00      10       8.00       1     0.67       0.41      0.63   0.43
9               28       25.50      6        8.50       1     0.98       0.32      1.06   0.30
10              25       24.00      7        8.00       1     0.17       0.68      0.17   0.68
Total           336                 101                 10    5.30                 5.34
Pooled          336      327.75     101      109.25     1     0.83       0.36      0.85   0.36
Heterogeneity                                           9     4.47       0.88      4.50   0.88
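The per-plant, total, pooled and heterogeneity statistics can be reproduced from the counts alone. The sketch below assumes the 3:1 round-to-wrinkled ratio implied by the expected counts in the table (e.g. 42.75 of 57) and uses natural logarithms for G; function and variable names are illustrative, and p-values are not computed.

```python
from math import log

# (round, wrinkled) seed counts for the 10 plants.
plants = [(45, 12), (27, 8), (24, 7), (19, 10), (32, 11),
          (26, 6), (88, 24), (22, 10), (28, 6), (25, 7)]
ratio = (0.75, 0.25)   # assumed 3:1 expectation, inferred from the expected counts

def chi2_and_g(observed, probs):
    """Pearson chi-squared and likelihood-ratio G for one set of class counts."""
    n = sum(observed)
    expected = [n * q for q in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    g = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    return chi2, g

per_plant = [chi2_and_g(obs, ratio) for obs in plants]
chi2_total = sum(c for c, _ in per_plant)
g_total = sum(g for _, g in per_plant)

# Pooled: add the counts over plants first, then test once.
pooled_counts = [sum(col) for col in zip(*plants)]
chi2_pooled, g_pooled = chi2_and_g(pooled_counts, ratio)

# Heterogeneity = Total - Pooled (dof 10 - 1 = 9).
chi2_het = chi2_total - chi2_pooled
g_het = g_total - g_pooled

print(round(chi2_total, 2), round(chi2_pooled, 2), round(chi2_het, 2))
print(round(g_total, 2), round(g_pooled, 2), round(g_het, 2))
```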


Multiple Populations - summary

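• Parallels $\chi^2$

• Partitions therefore

\[ G_{Total} = 2 \sum_{i=1}^{p} \sum_{j=1}^{n} O_{ij} \log\left(\frac{O_{ij}}{E_{ij}}\right) \]

\[ G_{Pooled} = 2 \sum_{j=1}^{n} \left(\sum_{i=1}^{p} O_{ij}\right) \log\left(\frac{\sum_{i=1}^{p} O_{ij}}{\sum_{i=1}^{p} E_{ij}}\right) \]

and $G_{Heterogeneity} = G_{Total} - G_{Pooled}$ (n = no. classes, p = no. populations)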

Smoothing Time series – choice of smoothing constant for exponentially smoothed forecasts

Suppose actual weekly demand, Yt, is as below. We want smoothed values St for smoothing constants ($\alpha$) of 0.1 and 0.8, contrasted; S1 = Y1 by convention.

Week                             1     2     3     4     5     6       7        8         9          10
Actual demand Yt                 100   100   100   100   150   100     100      100       100        100
Forecast St ($\alpha$ = 0.1)     100   100   100   100   105   104.5   104.05   103.645   103.2805   102.95245
Forecast St ($\alpha$ = 0.8)     100   100   100   100   140   108     101.6    100.32    100.064    100.0128
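The smoothed values in the table are consistent with the recursion S_t = alpha * Y_t + (1 - alpha) * S_{t-1}, starting from S_1 = Y_1. The recursion is inferred from the tabulated numbers rather than stated in the slides, and the function name `exp_smooth` is illustrative. A short sketch:

```python
def exp_smooth(demand, alpha):
    """Simple exponential smoothing: S_1 = Y_1, S_t = alpha*Y_t + (1 - alpha)*S_{t-1}."""
    smoothed = [demand[0]]
    for y in demand[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

# Weekly demand with a single spike in week 5.
demand = [100, 100, 100, 100, 150, 100, 100, 100, 100, 100]

s_low = exp_smooth(demand, 0.1)    # heavy smoothing: small reaction, slow decay
s_high = exp_smooth(demand, 0.8)   # light smoothing: large reaction, fast recovery

for week, (y, a, b) in enumerate(zip(demand, s_low, s_high), start=1):
    print(week, y, round(a, 5), round(b, 4))
```

The contrast is the point of the example: with alpha = 0.1 the week-5 spike barely registers but lingers for many weeks, while with alpha = 0.8 the series follows the spike closely and returns to 100 almost immediately.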
