Investigation of Treatment of Influential Values Mary H. Mulry Roxanne M. Feldpausch.

Investigation of Treatment of Influential Values
Mary H. Mulry
Roxanne M. Feldpausch
Outline
• Current practices
• Methods investigated
• Results
• Next steps
Influential Observation
An observation is considered influential if its weighted contribution has an excessive effect on the estimate of the total (Chambers et al. 2000).
The Data – U.S. Monthly Retail Trade Survey
• Collects sales and inventories
• Monthly survey of about 12,500 retail businesses with paid employees
• Sample selected every 5 years
– Sample is stratified based on industry and sales
– Quarterly sample of births
– Deaths are removed
The Data
• Analysis done at published NAICS level
• Hidiroglou-Berthelot algorithm was run on the data before looking for influential values
• Horvitz-Thompson estimator
Causes of Influential Units
• One time or rare event
• Erroneous measure of size
• Change in the makeup of the unit
• Seasonal Businesses
Current Practices
• Analysts review an effect listing of micro-level data and investigate units that may be influential
• When the analyst determines a correctly reporting unit may be influential, the case is referred to a statistician
Current Practices
• One-time influential value
– Imputation
• Recurring influential value
– Weight adjustment based on the principles of representativeness
– Moving the unit to a different industry when the nature of the business changes
Goals
• To improve upon current methodology by making it more objective and rigorous
• To find methodology that uses the observation but in a manner that assures its contribution does not have an excessive effect on the total
Assumptions
• Influential observations occur infrequently, but are problematic when they appear.
• The influential observation is true, although unusual. It is not the result of a reporting or coding error.
Strategy
Identify candidate methodologies and test with real data from one industry (about 700 businesses) for a month that contains an influential value
Evaluation Criteria
• Number of influential observations detected, including the number of true and false detections made
• Estimate of bias
• Impact on month-to-month change
Notation
\[ \hat{Y} = \sum_{i=1}^{n} w_i Y_i \]
where
$Y_i$ is the sales for the ith business in a survey sample of size n
$w_i$ is the sample weight for the ith unit
$X_i$ is the previous month's sales for the ith business
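As a minimal sketch of the estimator of the total defined above, the weighted sum can be computed directly. The function names and the helper that reports each unit's share of the estimate are illustrative assumptions, not part of the survey's production system.

```python
def weighted_total(values, weights):
    """Estimate of the total: the sum of w_i * Y_i over the sample."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for w, y in zip(weights, values))


def contribution_shares(values, weights):
    """Share of the estimated total contributed by each weighted observation.

    Hypothetical helper: a large share is one informal signal that a unit
    may be influential. Assumes the estimated total is nonzero.
    """
    total = weighted_total(values, weights)
    return [w * y / total for w, y in zip(weights, values)]
```

A unit with a large weight and an unusually large value shows up as a large share, which matches the definition of an influential observation given earlier.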
Methods Examined
• Weight trimming
• Reverse calibration
• Winsorization
• Generalized M-estimation
Weight Trimming
• Does not identify influential units
• Adjusts the weight of the observation
Weight Trimming
• Truncate the weight of the influential observation
• Adjust the weights of the noninfluential observations to account for the remainder of the truncated weight
• Sum of the new weights is the same as the sum of the original weights
(Potter 1990)
Weight Trimming Notes
• Calculations were done within sample stratum.
• Choice of correction factor could be investigated. We arbitrarily chose ci=wi/3.
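The trimming steps can be sketched as follows for a single stratum with one influential unit specified in advance. The trimmed weight is the original weight divided by 3, as in the notes above. Spreading the removed weight over the other units in proportion to their original weights is an assumption made for illustration; the slides do not say how the remainder was allocated.

```python
def trim_weight(weights, index, divisor=3.0):
    """Trim the weight at `index` and redistribute the remainder.

    weights : sample weights for one stratum
    index   : position of the influential unit (chosen by the analyst)
    divisor : trimmed weight is w_i / divisor (3 in the slides)

    The removed weight is shared among the other units in proportion to
    their original weights (an assumed rule), so the sum of the weights
    is unchanged.
    """
    if len(weights) < 2:
        raise ValueError("need at least one other unit to receive the weight")
    trimmed = weights[index] / divisor
    excess = weights[index] - trimmed
    others_total = sum(weights) - weights[index]
    new_weights = []
    for j, w in enumerate(weights):
        if j == index:
            new_weights.append(trimmed)
        else:
            new_weights.append(w + excess * w / others_total)
    return new_weights
```

Because the excess is only moved between units, the sum of the new weights equals the sum of the original weights, as the method requires.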
Reverse Calibration
• Does not identify influential units
• Adjusts the value of the observation
Reverse Calibration
1. Use a robust estimation method to estimate the total
2. Modify the influential observations to achieve that total
(Chambers and Ren 2004)
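The two steps can be illustrated with a deliberately simplified sketch. It assumes a single influential observation and takes the robust estimate of the total as an input supplied by the caller. It is not the full procedure of Chambers and Ren (2004); the function name and arguments are hypothetical.

```python
def reverse_calibrate(values, weights, index, robust_total):
    """Replace the value at `index` so the weighted total equals `robust_total`.

    Simplifying assumptions: exactly one influential unit, and the robust
    total has already been estimated elsewhere.
    """
    # Weighted contribution of all the other (unchanged) observations.
    rest = sum(
        w * y
        for j, (w, y) in enumerate(zip(weights, values))
        if j != index
    )
    new_values = list(values)
    # Solve w_i * Y_i* + rest = robust_total for Y_i*.
    new_values[index] = (robust_total - rest) / weights[index]
    return new_values
```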
Winsorization
• Identifies influential units
• Adjusts the value of the observation
Winsorization
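Type I
\[ Y_i^* = \begin{cases} K, & Y_i > K \\ Y_i, & \text{otherwise} \end{cases} \]
Type II
\[ Y_i^* = \begin{cases} K + \dfrac{1}{w_i}\,(Y_i - K), & Y_i > K \\ Y_i, & \text{otherwise} \end{cases} \]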
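The Type I and Type II rules can be written as two small functions. This is a sketch for a single observation with a given cutoff K; the function names are illustrative.

```python
def winsorize_type1(y, k):
    """Type I: replace a value above the cutoff K with K itself."""
    return k if y > k else y


def winsorize_type2(y, k, w):
    """Type II: for a value above K, keep 1/w of the excess over K.

    Y* = K + (Y - K) / w  when Y > K, and Y* = Y otherwise.
    """
    return k + (y - k) / w if y > k else y
```

Under Type II the weighted value w * Y* equals (w - 1) * K + Y, so the unit represents itself with its reported value and the other w - 1 units it stands for with the cutoff.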
Winsorization – Defining K
• Define a separate $K_h$ for each stratum in a manner that minimizes the MSE (Kokic and Bell 1994)
• Define a separate $K_i$ for each observation in a manner that minimizes the MSE (Clarke 1995)
Winsorization – Defining K
• Use unweighted data to define $K_h$ for each stratum, where $K_h = \hat{\mu}_h + 2 s_h$
• Use weighted data to define $K_h$ for each stratum, where $K_h = \hat{\mu}_h + 2 s_h$ and $\hat{\mu}_h$ and $s_h$ are based on the weighted data
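A sketch of the "+2s" cutoff for one stratum follows. It assumes the center term is the (weighted) mean and that s_h is the (weighted) standard deviation computed with the total weight as the divisor. The slides do not spell out either choice, so both are assumptions for illustration.

```python
import math


def cutoff_plus_2s(values, weights=None, multiplier=2.0):
    """Cutoff K_h = center + multiplier * s_h for one stratum.

    Assumptions (not stated in the slides): the center is the mean and
    s_h is the standard deviation, both weighted when `weights` is given,
    with the total weight used as the divisor.
    """
    if weights is None:
        weights = [1.0] * len(values)  # unweighted version
    total_w = sum(weights)
    center = sum(w * y for w, y in zip(weights, values)) / total_w
    variance = sum(w * (y - center) ** 2 for w, y in zip(weights, values)) / total_w
    return center + multiplier * math.sqrt(variance)
```

Values above the returned cutoff would then be passed to a winsorization rule.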
Winsorization – Our Implementation
Used a robust regression in SAS to estimate the parameters needed in the calculations
M-estimation
M-estimators are robust estimators that come from a generalization of maximum likelihood estimation
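To illustrate the general idea, here is a minimal sketch of a Huber M-estimate of location computed by iteratively reweighted averaging. It is a textbook, unweighted example, not the weighted generalized M-estimation technique used in the study. The tuning constant 1.345 and the MAD-based scale are conventional choices assumed here.

```python
import statistics


def huber_location(values, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location with a fixed MAD-based scale."""
    mu = statistics.median(values)
    mad = statistics.median([abs(y - mu) for y in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        return mu  # no spread around the median; nothing to reweight
    for _ in range(max_iter):
        # Observations within c * scale get full weight; others are downweighted.
        robust_w = [
            1.0 if abs(y - mu) <= c * scale else c * scale / abs(y - mu)
            for y in values
        ]
        new_mu = sum(w * y for w, y in zip(robust_w, values)) / sum(robust_w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

Observations far from the current estimate receive a weight below 1, so a single extreme value pulls the estimate much less than it pulls the ordinary mean.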
M-estimation
• Identifies influential units
• Adjusts either the weight or the value of the influential observation
M-estimation
Used a weighted M-estimation technique that is able to modify the weights or the values of the influential observations (Beaumont and Alavi 2004)
Results
Number of Outliers Detected
Method                 Outliers detected
Weight trimming        1*
Winsor by stratum      51
Winsor by obs          1
Winsor +2s             0
Winsor wgt +2s         4
Reverse calibration    1*
M-estimation obs       1
M-estimation wgt       1
*Method does not detect outliers; one outlier was specified
Replacement Values (in Millions)
                     Value    Weight    Weighted value
previous month       0.6      55        31
current month        7.5      55        413
Weight trimming*     7.5      18        135
Winsor by obs        4.0      55        220
Winsor wgt +2s**     1.6      55        87
M-estimation obs     4.3      55        234
M-estimation wgt     7.5      30        225
*Weight trimming adjusts the other 18 weights in the stratum
**Winsor wgt +2s identified 3 other values
Total Sales for the Industry
                     Total (billions)    Month-to-month percent change
previous month       42.4
current month        38.6                -9.1
Weight trimming      38.3                -9.7
Winsor by obs        38.5                -9.5
Winsor wgt +2s       38.2                -9.9
M-estimation obs     38.4                -9.5
M-estimation wgt     38.4                -9.5
Chosen for Further Study
• Winsorization by each observation
• M-estimation by observation
• M-estimation by weight