Post on 31-Mar-2015
Plans to improve estimators to better utilize panel data
John Coulston
Southern Research Station
Forest Inventory and Analysis
Background and Motivation
• Symposium session on combining panel data. Recommendation: “…any serious attempt at defining an estimation system for analysis of changes and trends over time must explicitly account for time in the assumed underlying model…adopt and encourage an inferential model for FIA that places time on an equal footing with area…”
• Putting the “A” back in FIA – Clutter 2006.
Examples and approach
• Forest area change in Georgia from 1998-2007
• Spatial realization of forest age structure in Alabama in 2007
• Use an appropriate technique for the question posed
Why reinvent the wheel?
• Some analytical alternatives to Bechtold and Patterson 2005 for the annual forest inventory
  – Mixed estimator (Van Deusen 1999, 2002)
    • Current estimates: flexible underlying trend
  – Mixed model (Smith & Conkling 2005)
    • Current estimates and significance of annual change: linear trend
  – Random Forest (Breiman 2001, Crookston & Finley 2008)
    • Machine learning approach to classification and regression. Implemented in temporal map based estimation.
Is there a trend in Georgia forest area from 1998-2007?
[Map: Georgia survey units: Southwest, Southeast, Central, North Central, North]
Mixed Estimator
$\bar{y}_t = \beta_t + e_t$
where
$\bar{y}_t$ = mean value of plots at time t
$\beta_t$ = random coefficient over time (described by a transition equation)
$e_t$ = independent random error
Mixed Model
Stratified Estimate
$\bar{y} = \sum_{h=1}^{H} w_h \bar{y}_h$
where
$\bar{y}$ = mean plot-level value
$w_h$ = weight of stratum h (typically defined with remotely sensed information, NLCD for example)
$\bar{y}_h$ = mean plot-level value in stratum h
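The stratified estimate is a weighted sum of stratum means. A minimal Python sketch follows, assuming two hypothetical strata with made-up weights and plot-level forest indicators (1 = forest, 0 = non-forest); the stratum names, weights, and plot values are illustrative only and are not FIA data.

```python
# Hypothetical strata: weight w_h (for example, share of area in an
# NLCD-derived stratum) and plot-level forest indicators per stratum.
strata = {
    "forest":     {"weight": 0.60, "plots": [1, 1, 1, 0, 1]},
    "non-forest": {"weight": 0.40, "plots": [0, 0, 1, 0]},
}


def stratified_mean(strata):
    """Return sum over h of w_h * ybar_h; weights must sum to 1."""
    total_weight = sum(s["weight"] for s in strata.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    return sum(
        s["weight"] * sum(s["plots"]) / len(s["plots"])  # w_h * ybar_h
        for s in strata.values()
    )


estimate = stratified_mean(strata)
print(f"stratified proportion forest: {estimate:.3f}")
```

Here the stratum means are 0.80 and 0.25, so the estimate is 0.60 × 0.80 + 0.40 × 0.25.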
Is there a trend in Georgia forest area from 1998-2007?
[Figure: proportion forest (approximately 0.50 to 0.75) by year, 1998-2007, with panels for the Central, North, North Central, South East, and South West units. Series: mixed estimation, mixed model, simple random sample, stratified estimation.]
Example: forest area trends in GA 1998-2007
[Figure: S.E.(proportion forest) (approximately 0.010 to 0.025) by year, 1998-2007, with panels for the Central, North, North Central, South East, and South West units. Series: mixed estimation, mixed model, simple random sample, stratified estimation.]
Typical “sampling error” approach
[Figure: proportion forest (approximately 0.51 to 0.55) by year, 2000-2005.]
Hypothesis: H0: Δpf = 0; H1: Δpf ≠ 0
Approach: Sampling errors overlap, so no significant change.
Issues: Type II errors; failure to leverage repeated measures.
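The Type II error issue can be illustrated with a small Python sketch. It assumes hypothetical estimates and standard errors (not values read from the figure), a normal approximation, and 95% intervals. The `cov` argument is an assumed covariance between the two estimates, which is positive when the same plots are remeasured.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def intervals_overlap(p1, se1, p2, se2, z=1.96):
    """True if the two z-level confidence intervals overlap."""
    lo1, hi1 = p1 - z * se1, p1 + z * se1
    lo2, hi2 = p2 - z * se2, p2 + z * se2
    return lo1 <= hi2 and lo2 <= hi1


def difference_p_value(p1, se1, p2, se2, cov=0.0):
    """Two-sided p-value for H0: p2 - p1 = 0 (normal approximation)."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * cov)
    z = (p2 - p1) / se_diff
    return 2.0 * (1.0 - normal_cdf(abs(z)))


# Hypothetical proportion-forest estimates at two points in time.
p_start, se_start = 0.515, 0.010
p_end, se_end = 0.545, 0.010

overlap = intervals_overlap(p_start, se_start, p_end, se_end)
p_independent = difference_p_value(p_start, se_start, p_end, se_end)
# Assumed positive covariance from remeasuring the same plots.
p_repeated = difference_p_value(p_start, se_start, p_end, se_end, cov=0.00005)

print(f"intervals overlap: {overlap}")
print(f"p-value, independent estimates: {p_independent:.4f}")
print(f"p-value, repeated measures:     {p_repeated:.4f}")
```

With these made-up numbers the intervals overlap, yet the test of the difference rejects H0 at the 5% level, and the positive covariance sharpens the test further.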
Explicitly testing for change
• If trend is “sufficiently linear” then the mixed model can be used to test:
• H0: b1 = 0
• H1: b1 ≠ 0
Unit             b1       t-value   Prob > t
Southeast       -0.05%    -0.730    0.466
Southwest        0.13%     1.094    0.274
Central          0.00%    -0.004    0.997
North Central   -0.31%    -2.489    0.013
North           -0.21%    -2.401    0.017
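As a rough check on the table, the reported t-values can be converted to two-sided p-values. The sketch below assumes a standard normal approximation (large degrees of freedom), which is not necessarily the reference distribution used for the slides, so small differences from the reported probabilities are expected.

```python
import math


def two_sided_p(t):
    """Two-sided p-value for a test statistic under a standard normal."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# (t-value, reported probability) per survey unit, copied from the table.
slope_tests = {
    "Southeast": (-0.730, 0.466),
    "Southwest": (1.094, 0.274),
    "Central": (-0.004, 0.997),
    "North Central": (-2.489, 0.013),
    "North": (-2.401, 0.017),
}

approx_p = {unit: two_sided_p(t) for unit, (t, _) in slope_tests.items()}
significant = [unit for unit, p in approx_p.items() if p < 0.05]

for unit, (t, reported) in slope_tests.items():
    print(f"{unit:14s} t={t:7.3f}  approx p={approx_p[unit]:.3f}  reported={reported:.3f}")
print("reject H0 at the 5% level:", significant)
```

Under this approximation only the North Central and North units show a slope that differs significantly from zero, in line with the table.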
[Figure: proportion forest (approximately 0.50 to 0.75) by year, 1998-2007, for the North and North Central units. Series: mixed estimation, mixed model, simple random sample, stratified estimation.]
• Recall the mixed model: b1 is the slope (change in y over time).
Example 2: Spatial realization of forest age structure in Alabama in 2007
• Using a time series of Landsat images, identify the disturbance year and magnitude for each pixel.
• Calibrate the disturbance year and magnitude information to FIA age class information based on:
$c_{jz} = f(X_z, Y_z, M_{z(j-d)}, (j-d)_z, F_{jz})$
where $c_{jz}$ is the age class for location z at time j, and
$X_z$ = longitude of location z
$Y_z$ = latitude of location z
$M_{z(j-d)}$ = magnitude of last disturbance in year j-d at location z
$(j-d)_z$ = the number of years since the last disturbance at location z
$F_{jz}$ = land cover in year j at location z
Random Forest Algorithm
Learning algorithm
Each tree is constructed using the following algorithm:
1. Let the number of training cases be N, and the number of variables in the classifier be M.
2. We are told the number m of input variables to be used to determine the decision at a node of the tree; m should be much less than M.
3. Choose a training set for this tree by choosing N times with replacement from all N available training cases (i.e. take a bootstrap sample). Use the rest of the cases to estimate the error of the tree, by predicting their classes.
4. For each node of the tree, randomly choose m variables on which to base the decision at that node. Calculate the best split based on these m variables in the training set.
5. Each tree is fully grown and not pruned (as may be done in constructing a normal tree classifier).
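The five steps above can be sketched in plain Python. This is a minimal illustration, not the implementation behind the slides: the Gini split criterion, the majority vote across trees, and the toy data (years since disturbance, magnitude, longitude) are all assumptions made for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Best (score, feature, threshold) among the given features."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def grow_tree(rows, labels, m, rng):
    """Steps 4 and 5: m random variables per node, grown fully, no pruning."""
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), m))
    if split is None:  # chosen variables are constant here: majority leaf
        return Counter(labels).most_common(1)[0][0]
    _, f, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    return (f, threshold,
            grow_tree([r for r, _ in left], [y for _, y in left], m, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], m, rng))


def predict(tree, row):
    """Follow the splits down to a leaf label."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


def train_forest(rows, labels, n_trees, m, seed=0):
    """Step 3 per tree: bootstrap sample, then out-of-bag error count."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # N draws with replacement
        out_of_bag = sorted(set(range(n)) - set(in_bag))
        tree = grow_tree([rows[i] for i in in_bag], [labels[i] for i in in_bag], m, rng)
        oob_errors = sum(predict(tree, rows[i]) != labels[i] for i in out_of_bag)
        forest.append((tree, out_of_bag, oob_errors))
    return forest


def forest_predict(forest, row):
    """Majority vote across trees (an assumption; not in the steps above)."""
    return Counter(predict(tree, row) for tree, _, _ in forest).most_common(1)[0][0]


# Toy data: (years since disturbance, magnitude, longitude) -> age class.
rows = [(2, 0.90, -86.1), (5, 0.80, -87.3), (7, 0.70, -85.9),
        (10, 0.60, -86.8), (13, 0.55, -87.9), (15, 0.50, -85.2),
        (18, 0.40, -86.4), (21, 0.35, -88.0), (24, 0.30, -85.6),
        (30, 0.20, -87.0), (40, 0.10, -86.6), (55, 0.05, -85.4)]
labels = ["0-8"] * 3 + ["9-16"] * 3 + ["17-24"] * 3 + ["25+"] * 3

forest = train_forest(rows, labels, n_trees=25, m=2)
print("predicted age class:", forest_predict(forest, (12, 0.58, -86.0)))
```

With only M = 3 variables the toy example uses m = 2; in practice m would be much smaller than M, as step 2 notes.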
Accuracy of Age Class Map
                          Random Forest model
FIA data           0-8   9-16   17-24    25+   non-forest
0-8                233     46      10     35           37
9-16                51    323      42    101           47
17-24               22     34     192    117           21
25+                 38     30      45   1265           16
non-forest          29     29      17    103         1710
User's accuracy    62%    70%     63%    78%          93%
Overall accuracy: 81%
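The accuracy figures can be recomputed from the counts. The Python sketch below assumes that rows are the FIA data classes and columns are the Random Forest model classes, so user's accuracy is the diagonal count divided by its column total; this orientation is inferred from the fact that it reproduces the reported percentages.

```python
classes = ["0-8", "9-16", "17-24", "25+", "non-forest"]

# Rows: FIA data (reference). Columns: Random Forest model (map).
matrix = [
    [233, 46, 10, 35, 37],
    [51, 323, 42, 101, 47],
    [22, 34, 192, 117, 21],
    [38, 30, 45, 1265, 16],
    [29, 29, 17, 103, 1710],
]

k = len(classes)
column_totals = [sum(row[j] for row in matrix) for j in range(k)]

# User's accuracy: share of plots mapped to a class that the FIA data confirm.
users_accuracy = [matrix[j][j] / column_totals[j] for j in range(k)]

# Overall accuracy: correctly mapped plots over all plots.
overall_accuracy = sum(matrix[j][j] for j in range(k)) / sum(column_totals)

for name, acc in zip(classes, users_accuracy):
    print(f"{name:10s} user's accuracy: {acc:.0%}")
print(f"overall accuracy: {overall_accuracy:.0%}")
```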
Conclusions
• No one technique could answer the two questions posed.
• Use the appropriate methodology or combination of methodologies to address your question.
• From the examples, time should be explicitly accounted for when doing trend analysis or making “current” estimates.
• Leverage the longitudinal (repeated measure) data when possible.
• The temporally indifferent method currently used by FIA does generally provide estimates with smaller standard error. However, it is not a current estimate, and the estimate should be tied to the approximate mid-point of the cycle, not the end year.
• All demonstrated techniques run using the R statistical package, which can be directly linked to either internal Oracle tables or FIADB.