Locating the Most Relevant Identification Information in ... · 21 It can be shown that the...

43
Locating the Most Relevant Identification Locating the Most Relevant Identification Information in Spectral Cubes and Information in Spectral Cubes and Similar Large Data Sets Similar Large Data Sets Robert K. McConnell WAY-2C Arlington, Massachusetts [email protected] http://www.way2c.com WAY WAY- 2C 2C

Transcript of Locating the Most Relevant Identification Information in ... · 21 It can be shown that the...

Page 1: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

Locating the Most Relevant Identification Locating the Most Relevant Identification Information in Spectral Cubes and Information in Spectral Cubes and

Similar Large Data SetsSimilar Large Data Sets

Robert K. McConnellWAY-2C Arlington, [email protected]://www.way2c.com

WAYWAY--2C2C

Page 2: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

2

Typical Object Detection StepsTypical Object Detection Steps

1. Choose the target class(es). 2. Choose attributes to be sensed.3. Acquire training data set (e.g. image).4. Train the system.5. Acquire a test data set.6. Classify the test data set.7. Locate blobs representing the

class(es) of interest.WAYWAY--2C2C

Page 3: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

3

Typical Object Detection StepsTypical Object Detection Steps

1. Choose the target class(es). 2. Choose attributes to be sensed.3. Acquire training data set (e.g. image).4. Train the system5. Acquire a test data set.6. Classify the test data set.7. Locate blobs representing the

class(es) of interest.WAYWAY--2C2C

Page 4: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

4

Example: hyperspectral “cube” Example: hyperspectral “cube”

WAYWAY--2C2C

Data Band n

Data Band 2

Data Band 1

...

Lake Training Region

Forest Training Region

Field Training Region

Find optimum data subset for classification

Page 5: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

5

Classification ObjectivesClassification Objectives

WAYWAY--2C2C<5 bits/pixel

??

Training regions

ASAP

>1K bits/pixel

Page 6: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

6

acquired bands and moreacquired bands and more

WAYWAY--2C2C

HSI bandsother

“bands”derived “bands”

Page 7: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

7

Example: “Green Chips” Example: “Green Chips” Hyperspectral CubeHyperspectral Cube

WAYWAY--2C2C

Courtesy Resonon, Inc.

• 80 bands• 12 bits A/D/pixel/band • 16 total bits/pixel/band• ~1K bits/pixel

Page 8: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

8

“True Color” Image From Cube“True Color” Image From Cube

WAYWAY--2C2C

64240R55025G45810B

nmBand

8 Classes8 ClassesRequired:Required:Band combination to differentiate reliably.Band combination to differentiate reliably.

Image Courtesy Resonon, Inc.

“Green Chips”

Page 9: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

9

“High Contrast” Image From Cube“High Contrast” Image From Cube

WAYWAY--2C2C

76460*R74557*G50117*B

nmBand

8 classes can be differentiated reliably.8 classes can be differentiated reliably.

*some high order bits discarded

Page 10: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

10

Pixel by pixel interpretation of high Pixel by pixel interpretation of high contrast “intermediate” image.contrast “intermediate” image.

WAYWAY--2C2C

Page 11: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

11

Maximum likelihood classification Maximum likelihood classification based on ~30K pixel regions.based on ~30K pixel regions.

WAYWAY--2C2C

Page 12: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

12

Not all bands are equally Not all bands are equally relevant.relevant.

WAYWAY--2C2C

Page 13: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

13

Band 60Band 60 PSPA=67.88% PSPA=67.88%

WAYWAY--2C2C Base Image Courtesy Resonon, Inc.

Page 14: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

14

Band 00Band 00 PSPA=14.01% PSPA=14.01%

WAYWAY--2C2C Base Image Courtesy Resonon, Inc.

Page 15: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

15

Band 79Band 79 PSPA=27.19% PSPA=27.19%

WAYWAY--2C2C Base Image Courtesy Resonon, Inc.

Page 16: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

16

Entropy as a measure of Entropy as a measure of uncertaintyuncertainty

WAYWAY--2C2C

Page 17: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

ClassificationClassification→→

WAYWAY--2C2C

Before

A DCB

Pro

babi

lity

Class

“In mind’s eye”

*uncertainty

2bits=

After

A D

B

C

Pro

babi

lity

Class

uncertainty0 bits→

*2log ( )

J

j jj

u p p= −∑

Page 18: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

18

Not all positions within Not all positions within “band” are equally relevant.“band” are equally relevant.

WAYWAY--2C2C

Page 19: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

19

Most Relevant Information: Most Relevant Information: Data Window UncertaintyData Window Uncertainty

WAYWAY--2C2C

“Noise”Padding Relevant Information

10100 00110 110 000

UN

CER

TAIN

TY(E

NTR

OP

Y -B

ITS

)

0.0

3.0

0 10

1 2 9876543 13

12

11WINDOW SHIFT (BITS)

Data Window

Page 20: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

20

Objective: Find optimum Objective: Find optimum combination of “bands” and combination of “bands” and data “windows” within those data “windows” within those

bands to differentiate the bands to differentiate the classes of interest.classes of interest.

WAYWAY--2C2C

Page 21: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

21

It can be shown that the relevant information in the data set D for differentiating the class set C(known as the mutual information between C and D) is given by

relevance = H(C)+ H(D) - H(C&D) (3)

where:H(C) is the entropy of the class probability distribution,H(D) is the entropy of the data distribution andH(C&D) is the entropy of the joint distribution of C&D.

relevancerelevance

WAYWAY--2C2C

Page 22: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

22

It can also be shown that information required to completely specify the class once the data is known, must be given by

uncertainty = H(C&D)- H(D) (6)

= 0 when C and D are perfectlycorrelated and

= H(C) for no correlation between C and D.

uncertainty uncertainty

WAYWAY--2C2C

Maximizing relevance minimizes uncertainty.

Page 23: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

23

Finding Most Relevant Data Finding Most Relevant Data to Construct to Construct

Intermediate ImageIntermediate Imagefor Classificationfor Classification

WAYWAY--2C2C

Page 24: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

24

Training RegionsTraining Regions

WAYWAY--2C2C Base Image Courtesy Resonon, Inc.

Page 25: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

25

0

0.5

1

1.5

2

2.5

3

3.5

0 10 20 30 40 50 60 70 80 90

Band Number

Info

rmat

ion

(bits

)

ClassDataRelevant

WAYWAY--2C2C

Most relevant window: Most relevant window: measured single band measured single band

information contribution.information contribution.

S=5

Page 26: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

26

Image constructed from Image constructed from single most relevant band single most relevant band

and window only.and window only.

WAYWAY--2C2C

76460*R

nmBandPlane

relrel = 2.318 bits= 2.318 bitsPSPA = 67.88%PSPA = 67.88%

S=5

Page 27: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

27

0

1

2

3

4

5

6

0 10 20 30 40 50 60 70 80 90

Band Number

Info

rmat

ion

(bits

)

ClassData

Relevent

WAYWAY--2C2C

Relevance: (band & window) Relevance: (band & window) pair information contribution pair information contribution

(band 60 window fixed)(band 60 window fixed)

S=7

Page 28: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

28

Image constructed from Image constructed from most relevant band pair.most relevant band pair.

WAYWAY--2C2C

76460*R50117*B

nmBandPlane

relrel = 2.742 bits= 2.742 bitsPSPA = 91.07%PSPA = 91.07%

Page 29: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

29

2

3

4

5

6

7

0 10 20 30 40 50 60 70 80 90

Band Number

Info

rmat

ion

(bits

)

ClassDataRelevant

WAYWAY--2C2C

Relevance: (band & window) Relevance: (band & window) triplet information contribution triplet information contribution (bands 60,17 windows fixed)(bands 60,17 windows fixed)

S=7

Page 30: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

30

Intermediate image from most Intermediate image from most relevant (band & window) triplet.relevant (band & window) triplet.

WAYWAY--2C2C

76460*R74557*G50117*B

nmBandPlane

relrel = 2.844 bits= 2.844 bitsPSPA = 97.74%PSPA = 97.74%

Page 31: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

31

0

0 .1

0 .2

0 .3

0 .4

0 .5

0 .6

0 .7

0 .8

0 .9

1

0 10 20 30 40 50 60 70 80

B and N umb e r

Color 1Color 2Color3Color 4Color 5Color 6Color 7Background

1

3

2

Automatically Selected BandsAutomatically Selected Bands

WAYWAY--2C2C

Page 32: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

32

Now, using only optimum data subset:Now, using only optimum data subset:

1. Construct an “intermediate” image.2. Train using the same class training

regions as for locating that subset.3. Perform maximum likelihood

interpretations on this and similar images based only on the optimum data subset.

WAYWAY--2C2C

Page 33: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

33

Pixel by pixel interpretation of Pixel by pixel interpretation of most relevant band triplet image.most relevant band triplet image.

WAYWAY--2C2C

Page 34: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

34

Pixel by pixel class # 1 extractionPixel by pixel class # 1 extraction

WAYWAY--2C2C

Page 35: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

35

Blob location and processing.Blob location and processing.

WAYWAY--2C2C

Page 36: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

36

“True Color” TERRAIN Image“True Color” TERRAIN Image

WAYWAY--2C2C

thick grass

road

thin grass

trees

bare soilHYDICE

bands 49,35,15

Page 37: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

37

Terrain: Mean Spectra and Terrain: Mean Spectra and Selected BandsSelected Bands

WAYWAY--2C2C0

0.05

0.1

0.15

0.2

0.25

0 50 100 150 200

'road'

'bare soil'

'thin grass'

'thick grass'

'trees' 1

2

3

Page 38: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

38

“Most Relevant” TERRAIN Image“Most Relevant” TERRAIN Image

WAYWAY--2C2C

thick grass

road

thin grass

trees

bare soilHYDICE

bands 199,52,17

> 98% data > 98% data reduction from reduction from

original original hyperspectral hyperspectral

cube.cube.

Page 39: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

39

Interpreted TERRAIN ImageInterpreted TERRAIN Image

WAYWAY--2C2C

thick grass

road

thin grass

trees

bare soilBased on HYDICE

bands 199,52,17

uncertainty = 0.48 bitsPSPA = 71.8%

Page 40: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

40

We define the PSPA, or predicted single pixel agreement (with training) by:

log(PSPA) = H(D) - H(C&D) (7)

Predicted Agreement Predicted Agreement

WAYWAY--2C2C

Page 41: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

41

“Thin Grass”“Thin Grass”Training Region DetailTraining Region Detail

thick grass

road

thin grasstree

bare soil

Assigned training regions may not be “pure”, leading to system accuracy underestimation.WAYWAY--2C2C

Page 42: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

42

“Thin Grass”“Thin Grass”Training Region ClassificationTraining Region Classification

WAYWAY--2C2C

thick grass

road

thin grasstree

bare soil

System may correctly classify but assumes it’s wrong, thus underestimating its own accuracy.

Page 43: Locating the Most Relevant Identification Information in ... · 21 It can be shown that the relevant information in the data set D for differentiating the class set C (known as the

Locating the Most Relevant Identification Locating the Most Relevant Identification Information in Spectral Cubes and Information in Spectral Cubes and

Similar Large Data SetsSimilar Large Data Sets

WAYWAY--2C2C

EndEnd

Robert K. McConnellWAY-2C [email protected]

Technology AvailableUS Patent 8,918,347