Investigation of Treatment of Influential Values

Page 1: Investigation of Treatment of Influential Values

Investigation of Treatment of Influential Values

Mary H. Mulry

Roxanne M. Feldpausch

Page 2: Investigation of Treatment of Influential Values

Outline

• Current practices

• Methods investigated

• Results

• Next steps

Page 3: Investigation of Treatment of Influential Values

Influential Observation

An observation is considered influential if its weighted contribution has an excessive effect on the estimate of the total (Chambers et al. 2000).

Page 4: Investigation of Treatment of Influential Values

The Data - U.S. Monthly Retail Trade Survey

• Collects sales and inventories

• Monthly survey of about 12,500 retail businesses with paid employees

• Sample selected every 5 years

– Sample is stratified based on industry and sales

– Quarterly sample of births

– Deaths are removed

Page 5: Investigation of Treatment of Influential Values

The Data

• Analysis done at published NAICS level

• Hidiroglou-Berthelot algorithm was run on the data before looking for influential values

• Horvitz-Thompson estimator
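The Hidiroglou-Berthelot edit mentioned above screens month-to-month ratios for unusual movements. The sketch below follows the commonly published form of that edit; the function name `hb_flags` and the parameter values `U`, `A`, and `C` are illustrative assumptions, since the slides do not state the settings actually used.

```python
from statistics import median, quantiles

def hb_flags(prev, curr, U=0.5, A=0.05, C=4.0):
    """Flag units with unusual current/previous ratios (Hidiroglou-Berthelot style).

    U, A and C are assumed tuning values, not taken from the slides.
    All values in prev and curr must be positive.
    """
    ratios = [y / x for x, y in zip(prev, curr)]
    r_med = median(ratios)
    # Transform ratios so that increases and decreases are treated symmetrically.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]
    # Scale each transformed ratio by unit size so large units matter more.
    e = [si * max(x, y) ** U for si, x, y in zip(s, prev, curr)]
    q1, e_med, q3 = quantiles(e, n=4, method="inclusive")
    # The A * e_med term guards against quartiles that sit too close to the median.
    d_q1 = max(e_med - q1, abs(A * e_med))
    d_q3 = max(q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    return [ei < lower or ei > upper for ei in e]

# Hypothetical data: the last unit jumps from 0.6 to 7.5.
prev = [10, 12, 9, 11, 10, 13, 0.6]
curr = [11, 12.5, 9.2, 10.8, 10.5, 13.1, 7.5]
print(hb_flags(prev, curr))
```

Units flagged by an edit like this would normally be reviewed as possible reporting errors, which is a different question from whether a correctly reported value is influential.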

Page 6: Investigation of Treatment of Influential Values

Causes of Influential Units

• One-time or rare event

• Erroneous measure of size

• Change in the make-up of the unit

• Seasonal Businesses

Page 7: Investigation of Treatment of Influential Values

Current Practices

• Analysts review an effect listing of micro-level data and investigate units that may be influential

• When the analyst determines a correctly reporting unit may be influential, the case is referred to a statistician

Page 8: Investigation of Treatment of Influential Values

Current Practices

• One-time influential value

– Imputation

• Recurring influential value

– Weight adjustment based on the principles of representativeness

– Moving the unit to a different industry when the nature of the business changes

Page 9: Investigation of Treatment of Influential Values

Goals

• To improve upon current methodology by making it more objective and rigorous

• To find methodology that uses the observation but in a manner that assures its contribution does not have an excessive effect on the total

Page 10: Investigation of Treatment of Influential Values

Assumptions

• Influential observations occur infrequently, but are problematic when they appear.

• The influential observation is true, although unusual. It is not the result of a reporting or coding error.

Page 11: Investigation of Treatment of Influential Values

Strategy

Identify candidate methodologies and test with real data from one industry (about 700 businesses) for a month that contains an influential value

Page 12: Investigation of Treatment of Influential Values

Evaluation Criteria

• Number of influential observations detected, including the number of true and false detections made

• Estimate of bias

• Impact on month-to-month change

Page 13: Investigation of Treatment of Influential Values

Notation

$$\hat{Y} = \sum_{i=1}^{n} w_i Y_i$$

where

$Y_i$ is the sales for the i-th business in a survey sample of size n

$w_i$ is the sample weight for the i-th unit

$X_i$ is the previous month's sales for the i-th business
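As a small illustration of the notation, the estimated total is the sum of weight times value, and each unit's share of that total shows how much it drives the estimate. The function names and the three-unit sample below are hypothetical.

```python
def ht_total(values, weights):
    """Estimated total: sum over the sample of weight times value."""
    return sum(w * y for y, w in zip(values, weights))

def contribution_shares(values, weights):
    """Share of the estimated total contributed by each weighted observation."""
    total = ht_total(values, weights)
    return [w * y / total for y, w in zip(values, weights)]

# Hypothetical sample; the last unit has a large value and a large weight.
values = [1.0, 2.0, 7.5]
weights = [10.0, 10.0, 55.0]
print(ht_total(values, weights))             # 442.5
print(contribution_shares(values, weights))  # last unit dominates the total
```

A unit whose share is far larger than the others is a natural candidate for the kind of review described in these slides.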

Page 14: Investigation of Treatment of Influential Values

Methods Examined

• Weight trimming

• Reverse calibration

• Winsorization

• Generalized M-estimation

Page 15: Investigation of Treatment of Influential Values

Weight Trimming

• Does not identify influential units

• Adjusts the weight of the observation

Page 16: Investigation of Treatment of Influential Values

Weight Trimming

• Truncate the weight of the influential observation

• Adjust the weights of the non-influential observations to account for the remainder of the truncated weight

• Sum of the new weights is the same as the sum of the original weights

(Potter 1990)

Page 17: Investigation of Treatment of Influential Values

Weight Trimming Notes

• Calculations were done within sample stratum.

• Choice of correction factor could be investigated. We arbitrarily chose ci=wi/3.
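A minimal sketch of weight trimming within one stratum, assuming the influential unit is already known and its weight is cut to one third. Spreading the removed weight over the other units in proportion to their weights is an assumption made here for illustration; the slides only say that the remainder is reallocated and the sum of weights is preserved.

```python
def trim_weight(weights, idx, factor=3.0):
    """Truncate weights[idx] to weights[idx] / factor and reallocate the excess.

    Assumption: the excess is spread over the other units in the stratum
    in proportion to their original weights.
    """
    trimmed = weights[idx] / factor
    excess = weights[idx] - trimmed
    others_total = sum(weights) - weights[idx]
    new = [w + excess * w / others_total for w in weights]
    new[idx] = trimmed
    return new

# Hypothetical stratum of 19 units, each with weight 55; unit 0 is influential.
weights = [55.0] * 19
new_weights = trim_weight(weights, 0)
print(round(new_weights[0], 2), round(new_weights[1], 2))
print(round(sum(new_weights), 6) == round(sum(weights), 6))
```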

Page 18: Investigation of Treatment of Influential Values

Reverse Calibration

• Does not identify influential units

• Adjusts the value of the observation

Page 19: Investigation of Treatment of Influential Values

Reverse Calibration

1. Use a robust estimation method to estimate the total

2. Modify the influential observations to achieve that total

(Chambers and Ren 2004)
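The two steps can be sketched for the simplest case of a single influential unit. This is a simplification for illustration, not the full Chambers and Ren procedure: the robust total is assumed to come from some separate robust estimator, and the function name is hypothetical.

```python
def reverse_calibrate(values, weights, idx, robust_total):
    """Replace values[idx] so that the weighted total equals robust_total.

    Assumptions: exactly one influential unit (idx), and robust_total was
    obtained beforehand from a robust estimation method.
    """
    others = sum(w * y for j, (y, w) in enumerate(zip(values, weights)) if j != idx)
    new_values = list(values)
    new_values[idx] = (robust_total - others) / weights[idx]
    return new_values

# Hypothetical data: the third unit is influential.
values = [1.0, 2.0, 7.5]
weights = [10.0, 10.0, 55.0]
adjusted = reverse_calibrate(values, weights, idx=2, robust_total=250.0)
print(adjusted)
```

The weights stay fixed; only the reported value of the influential unit changes.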

Page 20: Investigation of Treatment of Influential Values

Winsorization

• Identifies influential units

• Adjusts the value of the observation

Page 21: Investigation of Treatment of Influential Values

Winsorization

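Type I

$$Y_i^* = \begin{cases} K, & Y_i > K \\ Y_i, & \text{otherwise} \end{cases}$$

Type II

$$Y_i^* = \begin{cases} K + \frac{1}{w_i}(Y_i - K), & Y_i > K \\ Y_i, & \text{otherwise} \end{cases}$$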
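The two Winsorization rules can be written as short functions. The cutoff K = 4.0 used below is a hypothetical value chosen only to show the arithmetic.

```python
def winsorize_type1(y, k):
    """Type I: values above the cutoff K are replaced by K."""
    return k if y > k else y

def winsorize_type2(y, w, k):
    """Type II: values above K are pulled toward K, keeping 1/w of the excess."""
    return k + (y - k) / w if y > k else y

# Hypothetical cutoff with a reported value of 7.5 and a weight of 55.
k = 4.0
print(winsorize_type1(7.5, k))
print(winsorize_type2(7.5, 55.0, k))
print(winsorize_type2(3.0, 55.0, k))  # below the cutoff, so unchanged
```

Under Type II the weighted contribution becomes (w - 1)K + Y, so the unit keeps its own reported value while the other units it represents are valued at the cutoff.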

Page 22: Investigation of Treatment of Influential Values

Winsorization – Defining K

• Define a separate Kh for each stratum in a manner that minimizes the MSE (Kokic and Bell 1994)

• Define a separate Ki for each observation in a manner that minimizes the MSE (Clarke 1995)

Page 23: Investigation of Treatment of Influential Values

Winsorization – Defining K

• Use unweighted data to define $K_h$ for each stratum, where $K_h = \bar{Y}_h + 2s_h$

• Use weighted data to define $K_h$ for each stratum, where $K_h = \bar{Y}_h + 2s_h$ and $\bar{Y}_h$ and $s_h$ are based on the weighted data
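A sketch of the two "+2s" cutoffs for a single stratum, assuming the cutoff is a stratum mean plus two standard deviations. The weighted standard deviation formula used here (weights normalized to sum to one) is an assumption; the slides do not specify how the weighted quantities were computed.

```python
from math import sqrt
from statistics import mean, stdev

def cutoff_unweighted(values):
    """Cutoff from the unweighted sample mean and sample standard deviation."""
    return mean(values) + 2 * stdev(values)

def cutoff_weighted(values, weights):
    """Cutoff from a weighted mean and weighted standard deviation.

    Assumption: variance is the weight-normalized average squared deviation.
    """
    total_w = sum(weights)
    m = sum(w * y for y, w in zip(values, weights)) / total_w
    var = sum(w * (y - m) ** 2 for y, w in zip(values, weights)) / total_w
    return m + 2 * sqrt(var)

# Hypothetical stratum values (millions) and weights.
values = [0.5, 0.7, 0.6, 0.9, 7.5]
weights = [55.0, 55.0, 40.0, 40.0, 55.0]
print(cutoff_unweighted(values))
print(cutoff_weighted(values, weights))
```

Observations above the resulting cutoff would then be Winsorized using one of the two rules on the previous slide.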

Page 24: Investigation of Treatment of Influential Values

Winsorization – Our Implementation

Used a robust regression in SAS to estimate the parameters needed in the calculations

Page 25: Investigation of Treatment of Influential Values

M-estimation

M-estimators are robust estimators that come from a generalization of maximum likelihood estimation

Page 26: Investigation of Treatment of Influential Values

M-estimation

• Identifies influential units

• Adjusts either the weight or the value of the influential observation

Page 27: Investigation of Treatment of Influential Values

M-estimation

Used a weighted M-estimation technique that is able to modify the weights or the values of the influential observations (Beaumont and Alavi 2004)
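To give a feel for how M-estimation down-weights unusual units, the sketch below fits a ratio model of current on previous month's sales with Huber weights by iteratively reweighted least squares. It is an illustration only and not the Beaumont and Alavi procedure: the ratio model, the tuning constant `c`, the MAD-based scale, and the value adjustment in `adjust_values` are all assumptions.

```python
from statistics import median

def huber_ratio_fit(x, y, w, c=1.345, iters=50):
    """Weighted Huber M-estimate of beta in y = beta * x (variance assumed proportional to x).

    Returns beta and the Huber weights u (1.0 means the unit is not down-weighted).
    """
    beta = sum(wi * yi for wi, yi in zip(w, y)) / sum(wi * xi for wi, xi in zip(w, x))
    u = [1.0] * len(y)
    for _ in range(iters):
        # Residuals standardized for the assumed variance structure.
        res = [(yi - beta * xi) / xi ** 0.5 for xi, yi in zip(x, y)]
        # Robust scale from the median absolute residual (fallback avoids division by zero).
        scale = median(abs(r) for r in res) / 0.6745 or 1.0
        u = [min(1.0, c * scale / abs(r)) if r != 0 else 1.0 for r in res]
        num = sum(wi * ui * yi for wi, ui, yi in zip(w, u, y))
        den = sum(wi * ui * xi for wi, ui, xi in zip(w, u, x))
        beta = num / den
    return beta, u

def adjust_values(x, y, beta, u):
    """Illustrative value adjustment: shrink each residual by its Huber weight."""
    return [beta * xi + ui * (yi - beta * xi) for xi, yi, ui in zip(x, y, u)]

# Hypothetical stratum: the last unit jumps from 0.6 to 7.5.
x = [10, 12, 9, 11, 10, 13, 0.6]
y = [11, 12.5, 9.2, 10.8, 10.5, 13.1, 7.5]
w = [55.0] * 7
beta, u = huber_ratio_fit(x, y, w)
print(beta, u[-1])
print(adjust_values(x, y, beta, u)[-1])
```

A unit with a Huber weight well below 1 is the one the method identifies as influential; the adjustment can then be applied to its value, as here, or carried into its survey weight.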

Page 28: Investigation of Treatment of Influential Values

Results

Page 29: Investigation of Treatment of Influential Values
Page 30: Investigation of Treatment of Influential Values
Page 31: Investigation of Treatment of Influential Values
Page 32: Investigation of Treatment of Influential Values

Number of Outliers Detected

Method                 Outliers detected
Weight trimming        1*
Winsor by stratum      51
Winsor by obs          1
Winsor +2s             0
Winsor wgt +2s         4
Reverse Calibration    1*
M-estimation obs       1
M-estimation wgt       1

*Method does not detect outliers; one outlier was specified

Page 33: Investigation of Treatment of Influential Values

Replacement Values (in Millions)

                      Value   Weight   Weighted Value
previous month          0.6       55               31
current month           7.5       55              413
Weight trimming*        7.5       18              135
Winsor by obs           4.0       55              220
Winsor wgt +2s**        1.6       55               87
M-estimation obs        4.3       55              234
M-estimation wgt        7.5       30              225

*Weight trimming adjusts the other 18 weights in the stratum

**Winsor wgt +2s identified 3 other values

Page 34: Investigation of Treatment of Influential Values

Total Sales for the Industry

                     Total (billions)   Month-to-month percent change
previous month                   42.4
current month                    38.6                            -9.1
weight trimming                  38.3                            -9.7
Winsor by obs                    38.5                            -9.5
Winsor wgt +2s                   38.2                            -9.9
M-estimation obs                 38.4                            -9.5
M-estimation wgt                 38.4                            -9.5

Page 35: Investigation of Treatment of Influential Values
Page 36: Investigation of Treatment of Influential Values
Page 37: Investigation of Treatment of Influential Values

Chosen for Further Study

• Winsorization by each observation

• M-estimation by observation

• M-estimation by weight

Page 38: Investigation of Treatment of Influential Values
