Analysis of multivariate transformations. Transformation of the response in regression The...

Post on 28-Mar-2015


Analysis of multivariate transformations

Transformation of the response in regression

• The normalized power transformation is:

$$z(\lambda) = \begin{cases} \dfrac{y^{\lambda}-1}{\lambda\,\dot{y}^{\lambda-1}}, & \lambda \neq 0,\\ \dot{y}\,\log y, & \lambda = 0,\end{cases}$$

where ẏ is the geometric mean of the observations.

The purpose is to find an estimate of λ for which the errors in z(λ) are approximately normally distributed with constant variance.
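A minimal NumPy sketch of the normalized transformation, assuming a one-dimensional sample of strictly positive observations. The function name `normalized_boxcox` is ours, not from the slides.

```python
import numpy as np

def normalized_boxcox(y, lam):
    """Normalized power transformation z(lambda) of a positive sample y.

    Dividing by lam * gm**(lam - 1), with gm the geometric mean, makes the
    Jacobian of the transformation of the whole sample equal to one, so
    residual sums of squares are comparable across values of lambda.
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("the power transformation requires positive observations")
    gm = np.exp(np.mean(np.log(y)))  # geometric mean of the observations
    if lam == 0:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm**(lam - 1.0))
```

With λ = 1 the transformation reduces to y - 1, so λ = 1 corresponds to the untransformed data, and values of λ close to 0 give results close to the log case.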

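Score test for transformation

Expanding z(λ) in a Taylor series about λ0:

$$z(\lambda) \doteq z(\lambda_0) + (\lambda-\lambda_0)\left.\frac{\partial z(\lambda)}{\partial\lambda}\right|_{\lambda=\lambda_0} = z(\lambda_0) + (\lambda-\lambda_0)\,w(\lambda_0)$$

If the regression model z(λ) = x^Tβ + ε holds, then

$$z(\lambda_0) \doteq x^{T}\beta - (\lambda-\lambda_0)\,w(\lambda_0) + \epsilon$$

The score test T_sc(λ = λ0) is the t-statistic on the constructed variable w(λ0) in the least squares regression of z(λ0) on x and w(λ0).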

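Multivariate transformations

• In this case yi is a v × 1 vector of responses at observation i, with yij the observation on response j. The normalized transformation of yij is given by:

$$z_{ij}(\lambda_j) = \begin{cases} \dfrac{y_{ij}^{\lambda_j}-1}{\lambda_j\,\dot{y}_j^{\lambda_j-1}}, & \lambda_j \neq 0,\\ \dot{y}_j\,\log y_{ij}, & \lambda_j = 0,\end{cases}$$

where ẏj is the geometric mean of the jth response.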
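The same transformation applied column by column to an n × v data matrix, each column with its own parameter λj and its own geometric mean. This is a sketch; `normalized_boxcox_matrix` is a hypothetical name.

```python
import numpy as np

def normalized_boxcox_matrix(Y, lam):
    """Apply the normalized power transformation to each column of the n x v matrix Y.

    lam[j] is the transformation parameter for response j; each column uses
    its own geometric mean.
    """
    Y = np.asarray(Y, float)
    lam = np.asarray(lam, float)
    if Y.ndim != 2 or Y.shape[1] != lam.size:
        raise ValueError("Y must be n x v and lam must have v entries")
    if np.any(Y <= 0):
        raise ValueError("all observations must be positive")
    logY = np.log(Y)
    gm = np.exp(logY.mean(axis=0))  # geometric mean of each response
    Z = np.empty_like(Y)
    for j, lj in enumerate(lam):
        if lj == 0:
            Z[:, j] = gm[j] * logY[:, j]
        else:
            Z[:, j] = (Y[:, j] ** lj - 1) / (lj * gm[j] ** (lj - 1))
    return Z
```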

Multivariate transformations

• We assume a multivariate linear regression model of the form

$$z_{ij}(\lambda_j) = x_{ij}^{T}\beta_j + \epsilon_{ij}, \qquad i = 1,\dots,n,\; j = 1,\dots,v$$

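Mult. transformations to normality

• If the transformed observations are normally distributed with mean μi and covariance matrix Σ, the loglikelihood is given by

$$l(\lambda) = -\frac{n}{2}\log|2\pi\Sigma| - \frac{1}{2}\sum_{i=1}^{n}\{z_i(\lambda)-\mu_i\}^{T}\Sigma^{-1}\{z_i(\lambda)-\mu_i\}$$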

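Mult. transformations to normality

• If the explanatory variables are the same for all v responses, the maximum likelihood estimates of the regression parameters are the least squares estimates obtained by regressing each transformed response separately on the common explanatory variables.

• The maximum likelihood estimator of Σ is given by

$$\hat\Sigma(\lambda) = \frac{1}{n}\sum_{i=1}^{n} e_i(\lambda)\,e_i(\lambda)^{T}$$

where ei(λ) is the v × 1 vector of residuals for observation i for some value of λ.

The profile loglikelihood (i.e. maximized over μ and Σ) is

$$l_{\max}(\lambda) = \text{const} - \frac{n}{2}\log|\hat\Sigma(\lambda)|$$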
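A sketch of Σ̂(λ) and of the profile loglikelihood (additive constant dropped), assuming the same explanatory matrix X for every response so that ordinary least squares can be applied column by column. For data with no explanatory variables, X is just a column of ones. The names `sigma_hat` and `profile_loglik` are ours.

```python
import numpy as np

def _z(Y, lam):
    # normalized power transformation of each column, with per-column geometric mean
    logY = np.log(Y)
    gm = np.exp(logY.mean(axis=0))
    return np.column_stack([
        gm[j] * logY[:, j] if l == 0 else (Y[:, j] ** l - 1) / (l * gm[j] ** (l - 1))
        for j, l in enumerate(lam)
    ])

def sigma_hat(Y, X, lam):
    """ML estimate of Sigma: (1/n) sum_i e_i(lam) e_i(lam)^T, same X for every response."""
    Z = _z(np.asarray(Y, float), lam)
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)  # least squares, one column per response
    E = Z - X @ B                               # row i is e_i(lam)^T
    return E.T @ E / len(Z)

def profile_loglik(Y, X, lam):
    """Profile loglikelihood up to an additive constant: -(n/2) log|Sigma_hat(lam)|."""
    _, logdet = np.linalg.slogdet(sigma_hat(Y, X, lam))
    return -0.5 * len(Y) * logdet
```

`slogdet` is used instead of `log(det(...))` to avoid overflow or underflow of the determinant when v is large.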

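Multivariate likelihood ratio test

• The multivariate generalization of T_sc is the likelihood ratio test, given by:

$$T_{LR} = 2\{l_{\max}(\hat\lambda) - l_{\max}(\lambda_0)\} = n\log\frac{|\hat\Sigma(\lambda_0)|}{|\hat\Sigma(\hat\lambda)|}$$

This statistic must be compared with a χ² distribution on v degrees of freedom.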
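A sketch of T_LR, assuming a mean-only model (X a column of ones, our simplification) and assuming that the caller supplies λ̂ from some numerical maximizer of the profile loglikelihood; the maximization itself is not shown. `lr_statistic` is a hypothetical name.

```python
import numpy as np

def _log_det_sigma(Y, lam):
    """log|Sigma_hat(lam)| for a mean-only model (X is a column of ones)."""
    logY = np.log(Y)
    gm = np.exp(logY.mean(axis=0))  # geometric mean of each response
    Z = np.column_stack([
        gm[j] * logY[:, j] if l == 0 else (Y[:, j] ** l - 1) / (l * gm[j] ** (l - 1))
        for j, l in enumerate(lam)
    ])
    E = Z - Z.mean(axis=0)
    return np.linalg.slogdet(E.T @ E / len(Y))[1]

def lr_statistic(Y, lam_hat, lam0):
    """T_LR = n log(|Sigma_hat(lam0)| / |Sigma_hat(lam_hat)|), to be compared with chi^2_v."""
    Y = np.asarray(Y, float)
    return len(Y) * (_log_det_sigma(Y, lam0) - _log_det_sigma(Y, lam_hat))
```

For v = 5, for example, the 5% point of the χ² distribution on 5 degrees of freedom is about 11.07.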

Swiss heads: monitoring lik. ratio test for transf. H0: λ=1

The last two units to enter (104 and 111) provide all the evidence for a transformation.

Boxplot of 6 var. with univariate outliers labelled

Swiss heads

• The marginal distribution of y4 had the two outliers (units 104 and 111).

• We want to test whether all the evidence for a transformation is due to y4.

• We recalculate the likelihood ratio test, now testing whether λ4 is equal to 1.

Forward plot of the lik. ratio test H0: λ4=1

The last two units to enter provide all the evidence for a transformation

Mussels data

82 observations on horse mussels from New Zealand. Five variables:

Purpose: to see whether multivariate normality can be obtained by joint transformation of all 5 variables

Mussels data: scatterplot matrix

Forward lik. ratio for H0: λ=1

Finding a multivariate transformation with the forward search

• With just one variable to transform, the fan plot from the forward search makes it easy to find satisfactory transformations and the observations that are influential for them.

• With v variables there are 5^v combinations of the 5 standard values λ = (-1, -0.5, 0, 0.5, 1); for v = 5 that is already 3125 combinations.
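A brute-force sketch that evaluates all 5^v vectors of standard values and keeps the one with the largest profile loglikelihood. It assumes a mean-only model (our simplification); `STANDARD`, `profile_loglik` and `best_standard_combination` are hypothetical names. The cost grows as 5^v, which is why a guided procedure is preferable for larger v.

```python
import itertools
import numpy as np

STANDARD = (-1.0, -0.5, 0.0, 0.5, 1.0)

def profile_loglik(Y, lam):
    """-(n/2) log|Sigma_hat(lam)| for a mean-only model (additive constant dropped)."""
    logY = np.log(Y)
    gm = np.exp(logY.mean(axis=0))
    Z = np.column_stack([
        gm[j] * logY[:, j] if l == 0 else (Y[:, j] ** l - 1) / (l * gm[j] ** (l - 1))
        for j, l in enumerate(lam)
    ])
    E = Z - Z.mean(axis=0)
    return -0.5 * len(Y) * np.linalg.slogdet(E.T @ E / len(Y))[1]

def best_standard_combination(Y):
    """Evaluate all 5**v standard vectors; return the one maximizing the profile loglikelihood."""
    Y = np.asarray(Y, float)
    grid = itertools.product(STANDARD, repeat=Y.shape[1])
    return max(grid, key=lambda lam: profile_loglik(Y, lam))
```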

Suggested procedure for finding multivariate transformations

• Run the forward search (FS) through the untransformed data, ordering the observations at each subset size m by Mahalanobis distances (MD) calculated from the untransformed observations.

• Estimate λ at each step.

• Select a preliminary set of transformation parameters.

Monitoring of MLE of λ, H0: λ=1

[Figure: forward plot of the MLEs of λ1, …, λ5 (curves la1 to la5) against subset size m]

H0: λ=(0.5, 0, 0.5, 0, 0)

Monitoring of MLE of λ, H0: λ=(0.5, 0, 0.5, 0, 0)

[Figure: forward plot of the MLEs of λ1, …, λ5 (curves la1 to la5; y-axis "MLE of lambda") against subset size m]

Forward lik. ratio for H0: λ=(0.5,0,0.5,0,0)

Validation of the transformation

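• In univariate analysis the likelihood ratio test is

$$T_{LR} = n\log\frac{R(\lambda_0)}{R(\hat\lambda)}$$

where R(λ) is the residual sum of squares of z(λ).

• Asymptotically the null distribution of T_LR is chi-squared on one degree of freedom.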
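A sketch of the univariate test. As an assumption of ours, λ̂ is approximated by the best value on a grid of step 0.01 over [-2, 2]; λ0 is added to the candidates so that T_LR is never negative. `lr_univariate` is a hypothetical name.

```python
import numpy as np

def _rss(y, X, lam):
    # residual sum of squares R(lam) of the normalized transformation regressed on X
    g = np.exp(np.mean(np.log(y)))
    z = g * np.log(y) if lam == 0 else (y**lam - 1) / (lam * g**(lam - 1))
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    r = z - X @ coef
    return r @ r

def lr_univariate(y, X, lam0):
    """Return (T_LR, lam_hat) with T_LR = n log{R(lam0)/R(lam_hat)}; lam_hat is grid-approximated."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    candidates = np.append(np.arange(-200, 201) / 100.0, lam0)  # includes lam0, so T_LR >= 0
    rss = [_rss(y, X, lam) for lam in candidates]
    i = int(np.argmin(rss))
    return len(y) * np.log(_rss(y, X, lam0) / rss[i]), candidates[i]
```

Values of T_LR above 3.84, the 5% point of χ² on one degree of freedom, indicate that λ0 is rejected at that level.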

Signed square root of T_LR

• The statistic

$$T_N = \operatorname{sign}(\hat\lambda - \lambda_0)\sqrt{T_{LR}}$$

asymptotically has a standard normal, N(0,1), null distribution.

• Including the sign of the difference between λ̂ and λ0 indicates the direction of any departure from the hypothesised value.
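A minimal sketch of the signed square root, taking T_LR, λ̂ and λ0 as inputs; `signed_sqrt_lr` is a hypothetical name.

```python
import math

def signed_sqrt_lr(t_lr, lam_hat, lam0):
    """T_N = sign(lam_hat - lam0) * sqrt(T_LR); approximately N(0, 1) under H0."""
    if t_lr < 0:
        raise ValueError("T_LR must be non-negative")
    return math.copysign(math.sqrt(t_lr), lam_hat - lam0)
```

A value below -1.96 suggests, at the 5% level, that a smaller λ than the hypothesised one is needed; a value above 1.96 suggests a larger one.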

Multivariate version of the signed sqrt lik. ratio

• We test just one component of λ while all the others are kept at some specified value.

• We calculate a set of tests by varying each component of λ about λ0.

Example: mussels data validation of λ0=(0.5,0,0.5,0,0)

• Purpose: to validate, in a multivariate way, λ1=0.5 for the first variable.

• To form the likelihood ratio test we need an estimator λ̂ = (λ̂1, …, λ̂v) found by maximization only over λ1.

• The other parameters keep their values in λ0 (in this example 0, 0.5, 0, 0).

λ1 takes the 5 standard values (-1, -0.5, 0, 0.5, 1).

Example: validation of λ1

• We perform 5 independent forward searches with
λ0=(-1, 0, 0.5, 0, 0)
λ0=(-0.5, 0, 0.5, 0, 0)
λ0=(0, 0, 0.5, 0, 0)
λ0=(0.5, 0, 0.5, 0, 0)
λ0=(1, 0, 0.5, 0, 0)

• For each search we monitor the signed square root likelihood ratio test.
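The hypothesised vectors for the five searches can be generated mechanically: component j takes each standard value in turn while the others keep their values in λ0. A sketch with a hypothetical function name, using 0-based indexing for j:

```python
STANDARD = (-1.0, -0.5, 0.0, 0.5, 1.0)

def validation_vectors(lam0, j):
    """The 5 hypothesised vectors used to validate component j (0-based) of lam0.

    Component j takes each standard value in turn; all other components keep
    their values in lam0. One forward search is run for each vector.
    """
    vectors = []
    for s in STANDARD:
        lam = list(lam0)
        lam[j] = s
        vectors.append(tuple(lam))
    return vectors
```

For the mussels data, `validation_vectors((0.5, 0, 0.5, 0, 0), 0)` returns the five vectors from (-1, 0, 0.5, 0, 0) to (1, 0, 0.5, 0, 0).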

Version for multivariate data of the signed sqrt LR test

$$T_N(\lambda_j) = \operatorname{sign}(\hat\lambda_j - \lambda_S)\sqrt{n\log\frac{|\hat\Sigma(\lambda_{0j})|}{|\hat\Sigma(\hat\lambda)|}}$$

λj is the parameter under test.

λS is one of the 5 standard values.

λ0j is the vector of parameter values in which λj takes one of the 5 standard values λS while the other parameters keep their values in λ0.

• One plot for each j = 1, …, v
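A sketch of the component-wise statistic for a single sample; in a forward search it would be recomputed on the subset at each size m. Our assumptions: a mean-only model, and λ̂j approximated on a grid of step 0.01 over [-2, 2] (with λS added to the candidates), the other components being fixed at their values in λ0. `signed_sqrt_component` is a hypothetical name.

```python
import math
import numpy as np

def _log_det_sigma(Y, lam):
    # log|Sigma_hat(lam)| for a mean-only model, normalized transformation per column
    logY = np.log(Y)
    gm = np.exp(logY.mean(axis=0))
    Z = np.column_stack([
        gm[j] * logY[:, j] if l == 0 else (Y[:, j] ** l - 1) / (l * gm[j] ** (l - 1))
        for j, l in enumerate(lam)
    ])
    E = Z - Z.mean(axis=0)
    return np.linalg.slogdet(E.T @ E / len(Y))[1]

def signed_sqrt_component(Y, lam0, j, lam_s):
    """Signed sqrt LR test of lambda_j = lam_s, the other components fixed at lam0."""
    Y = np.asarray(Y, float)
    n = len(Y)

    def with_j(value):
        lam = list(lam0)
        lam[j] = value
        return lam

    ld_s = _log_det_sigma(Y, with_j(lam_s))              # log|Sigma_hat(lam_0j)|
    candidates = np.append(np.arange(-200, 201) / 100.0, lam_s)
    ld = [_log_det_sigma(Y, with_j(c)) for c in candidates]
    i = int(np.argmin(ld))                                # max profile loglik = min log det
    t_lr = n * (ld_s - ld[i])
    return math.copysign(math.sqrt(max(t_lr, 0.0)), candidates[i] - lam_s)
```

Calling it for each of the 5 standard values λS, and for each j, produces the values monitored in the v validation plots.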

Mussels data: validation of λ0=(0.5,0,0.5,0,0)

Forward lik. ratio for H0: λ=(1/3,1/3,1/3,0,0)

Mussels data: scatterplot matrix (transformed observations)

Monitoring MD before transforming

Monitoring MD after transforming

Minimum MD before and after transforming

The transformation has separated the outliers from the bulk of the data.

Gap before and after transforming

Conclusions

• This was an example of our approach to finding a multivariate transformation in the presence of potentially influential observations and outliers.

• Procedure: start the search with untransformed data to suggest a transformation, then repeat the analysis until you find an acceptable transformation.

• In this example only 3 searches were needed to find a transformation that is stable throughout the search, with any changes occurring only at the end.

Exercises

Exercise 1

• The next slide gives two sets of bivariate data. Which of the two has to be transformed to achieve bivariate normality?

• Consider a forward search in which you monitor the likelihood ratio test for the hypothesis of no transformation. Describe the plot you would expect to get for each of the two sets of data.

Two sets of simulated bivariate data