Properties of a kNN tree-list imputation strategy for prediction of diameter densities from lidar, Jacob L Strunk (Jacob.Strunk@oregonstate.edu), Nov 15, 2013

Slide Number 1 of 31

Properties of a kNN tree-list imputation strategy for prediction of diameter densities from lidar

Jacob L Strunk, Jacob.Strunk@oregonstate.edu

Nov 15, 2013

Slide Number 2 of 31

Note

• “Diameter Density” in this context refers to the probability density function
– Proportion of trees in a diameter class (dcl)

[Figure: diameter density p(d) plotted against diameter class, dcl (cm)]
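As a minimal sketch of this definition, assuming a tree list is simply an array of diameters in cm, the proportion of trees per diameter class can be computed with a histogram. The function name, bin width, and example diameters are illustrative and do not come from the slides.

```python
import numpy as np

def diameter_density(dbh_cm, bin_width=5.0, max_dbh=100.0):
    """Return bin edges and the proportion of trees in each diameter class."""
    edges = np.arange(0.0, max_dbh + bin_width, bin_width)
    counts, _ = np.histogram(dbh_cm, bins=edges)
    total = counts.sum()
    # Proportions sum to 1 when at least one tree falls inside the bins.
    density = counts / total if total > 0 else np.zeros(len(counts))
    return edges, density

# Hypothetical tree list: six diameters (cm), 10 cm classes up to 40 cm.
dbh = [12.0, 14.5, 18.0, 22.3, 22.9, 31.0]
edges, p = diameter_density(dbh, bin_width=10.0, max_dbh=40.0)
```

Here `p` holds one proportion per diameter class, which is the p(d) referred to in the note.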

Slide Number 3 of 31

Please!

• Share your critiques
• It will help the manuscript

Slide Number 4 of 31

Overview

• Conclusion
• Context
• kNN Tree List – some background
• Study objectives
• Indices of diameter density prediction performance
• Results
• Conclusion Revisited

Slide Number 5 of 31

Conclusion

• kNN diameter density estimation with LiDAR was comparable with or superior (in precision) to a post-stratification approach with 1600 variable radius plots
– Equivalent: Stratum, Tract
– Superior: Plot, Stand

• Mahalanobis with k=3, lidar P30 and P90 metrics worked well

• Stratification did not help – may be due to sample size (~200)

Slide Number 6 of 31

Aside: Brief Survey

1. Who uses diameter distributions in day-to-day work?

2. For distribution users: Inventory type? – Stand, Stratum, 2-stage, lidar …

3. Approach? – parametric, non-parametric

4. Sensitivity to noise in distribution? – Very, not very, what noise

5. What measure of reliability do you use for diameter information?
• Index of fit
• P-value
• None
• CIs for bins
• Other

[Figure: diameter density p(d) plotted against diameter class, dcl (cm)]

Slide Number 7 of 31

Study Context

• Lidar approaches can support many applications in forest inventory and monitoring

But

– Diameter densities are required for forestry applications
– Lidar literature (on diameters) is unclear on performance

• Problems:
– Performance measures: p-values & indices*
– No comparisons with traditional approaches
– No asymptotic properties

*I am OK with indices, but the suggested indices may not be enough

[Figure: field-derived y plotted against lidar x]

Slide Number 8 of 31

kNN – a flexible solution

• Multivariate
• Conceptually simple
• Works well with some response variables
• Realistic answers (can’t over-extrapolate)

• Can impute a tree list directly (kNN TL)
– No need for theoretical distribution

Slide Number 9 of 31

kNN weaknesses

• Error statistics often not provided
• Sampling inference not well described in literature
• People don’t understand limitations in results
• Can’t extrapolate
• Imputed values may be noisier than using mean…
• Poorer performance than OLS (NLS) usually

Slide Number 10 of 31

kNN TL Imputation

Impute: Substitute for a missing value

1. Measure X everywhere (U)

2. Measure Y on a sample (s)

3. Find distance from s to U
• In X space – height, cover, etc.

4. Donate y from sample to nearest (X space) neighbors
– Bring distance-weighted tree list
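The four steps above can be sketched as follows. This is a simplified illustration, not the yaImpute implementation: it assumes plain Euclidean distance in X space, inverse-distance weights, and at least one tree on every sample plot. The function name and the example data are hypothetical.

```python
import numpy as np

def impute_tree_list(x_target, x_ref, ref_tree_lists, k=3):
    """Impute a weighted tree list for one target unit from its k nearest sample plots.

    x_target: (p,) auxiliary values (X) at the target unit.
    x_ref: (n, p) auxiliary values on the n sample plots.
    ref_tree_lists: list of n arrays of tree diameters (cm), each non-empty.
    Returns (diameters, weights); the weights sum to 1.
    """
    x_ref = np.asarray(x_ref, dtype=float)
    x_target = np.asarray(x_target, dtype=float)
    # Step 3: distance from each sample plot to the target in X space.
    dist = np.sqrt(((x_ref - x_target) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    # Assumed weighting: inverse distance (the constant avoids division by zero).
    w = 1.0 / (dist[nearest] + 1e-9)
    w /= w.sum()
    # Step 4: donate each neighbor's tree list, splitting its weight among its trees.
    diameters, weights = [], []
    for plot_weight, idx in zip(w, nearest):
        trees = np.asarray(ref_tree_lists[idx], dtype=float)
        diameters.append(trees)
        weights.append(np.full(trees.size, plot_weight / trees.size))
    return np.concatenate(diameters), np.concatenate(weights)

# Hypothetical data: one lidar metric, three sample plots, two neighbors.
x_ref = [[1.0], [5.0], [20.0]]
tree_lists = [[10.0, 20.0], [30.0], [40.0, 45.0]]
diam, wts = impute_tree_list([2.0], x_ref, tree_lists, k=2)
```

In this example the two nearest plots sit at distances 1 and 3, so they receive weights of about 0.75 and 0.25, and a weighted histogram of `diam` with weights `wts` gives the imputed diameter density.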

[Figure: auxiliary data over a forest (e.g.), plot color = x values; neighbor weights of 0.75 and 0.25 shown]

Slide Number 11 of 31

kNN Components

• k (number of neighbors imputed)
• Distance metric (Euc., Mah., MSN, RF)
• Explanatory variables
– Age, lidar height, lidar cover, FWOF (modeled)

• Response variables (only for MSN and RF)
– Vol, BA, Ht, Dens., subgroups (> 5 in., > …)

• Stratification – dominant species group (5)
– Hardwood, Lobl. Pine, Longl. Pine, Slash P.,

Slide Number 12 of 31

Distance Metrics

yaImpute documentation:

“Euclidean distance is computed in a normalized X space.”

“Mahalanobis distance is computed in its namesakes space.”

“MSN distance is computed in a projected canonical space.”

“randomForest distance is one minus the proportion of randomForest trees where a target observation is in the same terminal node as a reference observation”

Note on “normalized”: I assume this means shifted and rescaled.
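To make the first two metrics concrete, here is a sketch based on their textbook definitions. It is not taken from the yaImpute source, and it assumes that "normalized" means centering each predictor and scaling it by its standard deviation. Function names and data are illustrative.

```python
import numpy as np

def normalized_euclidean(x_ref, x_target):
    """Euclidean distance after shifting and rescaling each column of X."""
    x_ref = np.asarray(x_ref, dtype=float)
    mu = x_ref.mean(axis=0)
    sd = x_ref.std(axis=0, ddof=1)
    z_ref = (x_ref - mu) / sd
    z_target = (np.asarray(x_target, dtype=float) - mu) / sd
    return np.sqrt(((z_ref - z_target) ** 2).sum(axis=1))

def mahalanobis(x_ref, x_target):
    """Mahalanobis distance using the sample covariance of the reference X."""
    x_ref = np.asarray(x_ref, dtype=float)
    cov = np.atleast_2d(np.cov(x_ref, rowvar=False))
    cov_inv = np.linalg.inv(cov)
    diff = x_ref - np.asarray(x_target, dtype=float)
    # Quadratic form diff' * cov_inv * diff, evaluated row by row.
    sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(sq, 0.0))

# Hypothetical predictors (e.g. two lidar height metrics) with zero sample covariance.
x = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, -1.0]])
target = [1.0, 0.5]
d_euc = normalized_euclidean(x, target)
d_mah = mahalanobis(x, target)
```

When the predictors are uncorrelated, as in this example, the two distances coincide; they differ once the predictors are correlated, because Mahalanobis distance also accounts for the covariance between them.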

Slide Number 13 of 31

Study Objectives

• Enable relative, absolute, comparative inference for diameter density prediction

• Contrast kNN and TIS performances

• Evaluate kNN strategies for diameter density prediction

TIS = “Traditional” inventory system

Slide Number 14 of 31

“Enable relative, absolute, comparative inference”

• I will argue that we have already settled on some excellent measures of performance:
– Coefficient of determination (R2)
– Root mean square error (RMSE)
– Standard error (sample-based estimator of sd of estimator)
• Very convenient for inference
• Straightforward to translate to diameter densities…

Slide Number 15 of 31

Indices – Residual Computation

• Computed with Leave One Out (LOO) cross-validation

• LOO cross-validation
1. Omit one plot
2. Fit model
3. Predict omitted plot
4. Compute error metric (observed vs predicted)
5. Repeat n-1 times

After LOO cross-validation

6. Compute indices from vector of residuals
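The leave-one-out procedure can be sketched as a loop over plots. This is an illustration under simple assumptions: the predictor passed in is a hypothetical single-nearest-neighbor rule rather than the kNN strategies evaluated in the study, and all names and data are made up for the example.

```python
import numpy as np

def loo_predictions(x, d, predict):
    """Leave-one-out predictions of diameter densities.

    x: (n, p) predictors; d: (n, J) observed densities per diameter class.
    predict(x_train, d_train, x_new) must return a (J,) predicted density.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    n = x.shape[0]
    d_hat = np.empty_like(d)
    for i in range(n):
        keep = np.arange(n) != i                      # 1. omit one plot
        d_hat[i] = predict(x[keep], d[keep], x[i])    # 2-3. fit and predict it
    return d_hat

def nearest_neighbor(x_train, d_train, x_new):
    """Hypothetical predictor: copy the density of the closest training plot."""
    dist = ((x_train - x_new) ** 2).sum(axis=1)
    return d_train[np.argmin(dist)]

# Hypothetical data: three plots, one predictor, two diameter classes.
x = [[0.0], [1.0], [10.0]]
d = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
d_hat = loo_predictions(x, d, nearest_neighbor)
residuals = np.asarray(d) - d_hat   # 4. error per plot and diameter class
```

The `residuals` array is the input from which the indices on the following slides would be computed.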

Slide Number 16 of 31

Proposed Indices – index I

• Similar to coefficient of determination
– Relative inference

\[
\text{index } I = 1 - \frac{\sum_i \sum_j \left(d_{ij} - \hat{d}_{ij}\right)^2}{\sum_i \sum_j \left(d_{ij} - \bar{d}_{j}\right)^2}
\]

where
$i$ = a given plot
$j$ = a given diameter bin
$d_{ij}$ = observed density in diameter class $j$ on plot $i$
$\hat{d}_{ij}$ = predicted density in diameter class $j$ on plot $i$
$\bar{d}_{j}$ = mean density in diameter class $j$ for all plots

Numerator: variability of predictions around observed densities

Denominator: variability around population density
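As a sketch of how index I could be computed, assume it is one minus the ratio of the squared prediction errors to the squared deviations from the mean density in each diameter class, summed over plots and classes. The function name and example arrays are illustrative.

```python
import numpy as np

def index_i(d_obs, d_pred):
    """Index I for (n plots, J diameter classes) arrays of observed and predicted densities."""
    d_obs = np.asarray(d_obs, dtype=float)
    d_pred = np.asarray(d_pred, dtype=float)
    d_bar = d_obs.mean(axis=0)                 # mean density per diameter class
    ss_res = ((d_obs - d_pred) ** 2).sum()     # predictions around observations
    ss_tot = ((d_obs - d_bar) ** 2).sum()      # observations around the mean density
    return 1.0 - ss_res / ss_tot

# Hypothetical densities for two plots and two diameter classes.
obs = np.array([[1.0, 0.0], [0.0, 1.0]])
perfect = index_i(obs, obs)
mean_only = index_i(obs, np.full_like(obs, 0.5))
```

Under this definition a perfect prediction gives 1, and predicting the overall mean density for every plot gives 0, which mirrors the behavior of R2.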

Slide Number 17 of 31

Proposed Indices – index K

• Similar to model RMSE
– Absolute (and comparative) inference

\[
\text{index } K = \sqrt{\frac{\sum_i \sum_j \left(d_{ij} - \hat{d}_{ij}\right)^2}{n - 1}}
\]

where
$i$ = a given plot
$n$ = number of sample plots
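A sketch of index K as an RMSE-like quantity follows. It assumes the squared errors are summed over plots and diameter classes and divided by n - 1, which is a reading of the slide's garbled formula rather than a confirmed definition. The function name and data are illustrative.

```python
import numpy as np

def index_k(d_obs, d_pred):
    """Quasi RMSE of predicted densities; assumes an n - 1 denominator."""
    d_obs = np.asarray(d_obs, dtype=float)
    d_pred = np.asarray(d_pred, dtype=float)
    n = d_obs.shape[0]                          # number of sample plots
    ss_res = ((d_obs - d_pred) ** 2).sum()      # summed over plots and classes
    return np.sqrt(ss_res / (n - 1))

# Hypothetical densities: three plots, two diameter classes, flat predictions.
obs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
pred = np.full_like(obs, 0.5)
K = index_k(obs, pred)
```

Because K is not scaled by the variability of the observed densities, it can be compared across data sets collected under different sampling designs, which is the use the slides describe for it.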

Slide Number 18 of 31

Proposed Indices – index kn

• Similar to standard error (estimated sd of estimator)
– Comparative inference

\[
k_n = \frac{K}{\sqrt{n}}
\]

where
$n$ = sample size
$k_n$ = E[K] for a density estimator from a sample of size $n$

$k_n$ decreases as $n$ increases
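The behavior of index kn can be illustrated numerically, assuming kn = K / sqrt(n) by analogy with the standard error of a mean. The function name and the K value below are hypothetical and are not results from the study.

```python
import math

def index_kn(K, n):
    """Quasi standard error: K scaled by the square root of the sample size (assumed form)."""
    return K / math.sqrt(n)

# Hypothetical K, evaluated for increasing sample sizes.
K = 0.2
kn_values = [index_kn(K, n) for n in (4, 16, 100)]
```

Quadrupling the sample size halves kn under this assumption, which is why a design with a larger K can still reach a similar kn when it has many more plots.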

Slide Number 19 of 31

Why these indices

• Index I
– Intuitive inference: how much variation did we explain
– Doesn’t work well when comparing 2 designs…

• Index K
– An absolute measure of prediction performance that can be used to compare models from different sampling designs

• Index kn
– Look at asymptotic estimation properties with different designs and modeling strategies

Slide Number 20 of 31

Study Area

• Savannah River Site – South Carolina
– 200 k acres & wall-to-wall lidar
– ~200 FR plots (40 trees / plot on average)
– 1600 VR plots (10 trees / plot on average)

Slide Number 21 of 31

FR Design

• 200 fixed radius 1/10th or 1/5th acre plots
• Distributed across size and species groups
• Survey-grade GPS positioning

Slide Number 22 of 31

Traditional Inventory System (TIS)

“Traditional”, i.e. a fairly common approach

Design:
• ~200K acres of forest on Savannah River Site
• 1607 variable radius plots, ~gridded
• Post-stratification on field measurements <Best-case scenario for reference method>
– Height
– Cover
– Dominant species group
-> 63 strata

• 7000+ stands (~30 acres each)
• Serves as baseline or reference approach
– Lots of people familiar with its performance

Slide Number 23 of 31

Results

1. Compare kNN with TIS
• Plot
• Stratum
• Stand
• Tract

2. kNN components
• k & distance metric
• Predictors
• Responses
• Stratification

Slide Number 24 of 31

Results: Point /Plot

• kNN performance >> TIS performance
– Reasonable result
– kNN can vary with lidar height & cover metrics
– Single density within a stratum for TIS

K (kNN) = 0.14
K (TIS) = 0.48

K = Quasi RMSE (smaller is better)

Slide Number 25 of 31

Results Stratum: Setup

• 63 strata
• 200 FR plots
• ~3 FR plots / stratum
• Stratum-level kNN performance:

Single stratum: $k_3 = K_{kNN} / \sqrt{3} = 0.14 / \sqrt{3}$

Slide Number 26 of 31

Results Stand: Setup

• 7000+ stands
• 200 FR plots
• ~0 FR plots / stand
• No asymptotic properties
• Stand-level kNN performance:

Stands within a single stratum: $K_{kNN} = 0.14$

Slide Number 27 of 31

TIS vs kNN

• Tract performances (kn) were equivalent for kNN and TIS

[Figure: kn plotted against n for kNN and TIS, with K (kNN) = 0.14 and K (TIS) = 0.48 marked; annotated "Stratum level performance (63 TIS strata)" and "*Stand* level performance (7000+ stands)"]

kn = Quasi standard error (smaller is better)

K = Quasi RMSE (smaller is better)

Slide Number 28 of 31

Tract

• Equivalent performance for kNN and TIS
– kn TIS: 0.12
– kn kNN: 0.10

Slide Number 29 of 31

kNN strategy Components

Slide Number 30 of 31

New Index

• Index I
– Similar to coefficient of determination (R2)
– Closer to 1.0 is better

Slide Number 31 of 31

kNN: k & distance metric

[Figure: Index I (y-axis, 0.45 to 0.80) versus k (1, 3, 5, 10, 15, 20) for the Euc., Mah., MSN, and RF distance metrics]

Slide Number 32 of 31

kNN: Predictors

[Figure: Index I (y-axis, 0.45 to 0.85) by predictor set for the Euc., Mah., MSN, and RF distance metrics, ordered from best performing to worst performing]

Predictor sets, in the order shown on the x-axis:
• P30, P90
• P30, P90, age
• P30, P50, P90, FWOF, age
• P30, P90, FWOF
• P30, P50, P90, age
• P30, P90, cover(1.50)
• P30, P90, cover(1.50), FWOF
• P30, P50, P90, cover(1.50), FWOF, age
• P30, P50, P90, cover(1.50), age
• P90, age
• P90, FWOF
• P30, age
• P30, FWOF
• P90, cover(1.50)
• P30, cover(1.50)

Slide Number 33 of 31

kNN: Responses

[Figure: Index I (y-axis, 0.55 to 0.85) by response-variable set for the MSN and RF distance metrics, ordered from best performing to worst performing]

Slide Number 34 of 31

kNN: Stratification

[Figure: Index I (y-axis, 0.3 to 0.8), un-stratified versus stratified, by group, ordered from large n to small n]

Groups, in the order shown on the x-axis:
• all (n=190)
• hardwood (n=176)
• conifer (n=176)
• Loblolly pine (n=151)
• Water oak (n=102)
• Sweetgum (n=85)
• Longleaf pine (n=79)
• Black cherry (n=71)
• Snag (n=66)
• Laurel oak (n=62)
• Mockernut hickory (n=54)
• Blackgum (n=54)
• Post oak (n=51)

Slide Number 35 of 31

Conclusion - Revisited

• kNN diameter density estimation with LiDAR is comparable with or superior (in precision) to a post-stratified approach with variable radius plots
– Equivalent: Stratum, Tract
– Superior: Plot, Stand

• Mahalanobis with k=3, lidar P30 and P90 metrics worked well

• Stratification did not help – may be due to sample size (~200)

Slide Number 36 of 31

Thank you!

• Any questions? Comments? Suggestions?

• I am planning to submit a manuscript in December