Groundwater 3D Geological Modeling: Solving as Classification Problem with Support Vector Machine A....

Groundwater 3D Geological Modeling: Solving as Classification Problem with Support Vector Machine

A. Smirnoff, E. Boisvert, S. J. Paradis

Earth Sciences Sector

Groundwater

Objectives

• Find an algorithm for automating the 3D modeling procedure from sparse data

• Test the algorithm on available data

• Draw conclusions about its applicability

Possible Input Data

• Well data

• Surface geology maps

• Cross-section data

• Can be used alone or in combination

Algorithms Currently in Use and Their Limitations

• Voronoi diagrams

• Potential fields

• Normally require too much information and/or additional procedures

• What if we only have a few sections to start with?

3D Reconstruction as a Classification Problem

[Figure: reconstruction space with known points belonging to Unit 1 and Unit 2]

• Given a set of points in 3D with known geological information

• For the rest of the points in the reconstruction space, information is not available

• Based on the known points, classify the rest into a known number of units (classes)

Available Classification Methods

• Bayesian classification
  – requires a priori knowledge of probabilities

• Nearest-Neighbor classifiers
  – extremely sensitive to parameter choice and scaling

• Decision trees
  – not flexible with many samples

• Neural networks
  – slow and difficult to use

• Support Vector Machine (SVM)
  – relatively new method
  – becoming more and more popular

SVM Algorithm

• Input: Take a set of training samples with known features and classes

• Model: Build a model (boundary) separating the training samples

• Output: Classify any new (unclassified) or test samples using the model

Binary Reconstruction

[Figure: three panels in X, Y, Z coordinates: 1. Original, 2. Training set, 3. Output]

Input Data and Results

Input Data:

• Total points: 389235

• Training Set: 17452 (4.48%) - 2 units on 11 sections

• Points to be classified: 371783

Results:

• Total classified: 371783

• Success: 361909 (97.34%)

• Failure: 9874 (2.66%)
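The percentages on this slide follow directly from the reported point counts. A quick arithmetic check, with variable names chosen here for illustration only:

```python
# Point counts as reported on the slide.
total_points = 389235
training_points = 17452
success = 361909
failure = 9874

# Everything outside the training set has to be classified.
to_classify = total_points - training_points

training_share = 100.0 * training_points / total_points
success_rate = 100.0 * success / to_classify
failure_rate = 100.0 * failure / to_classify

print(to_classify)               # 371783
print(round(training_share, 2))  # 4.48
print(round(success_rate, 2))    # 97.34
print(round(failure_rate, 2))    # 2.66
```

Success and failure are both measured against the 371783 points that were not part of the training set.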

Detailed Analysis (Class 1)

[Figure: Success Rate (%), 0 to 100, plotted across All Model Sections, 0 to 240, with the positions of Training Sections 1 to 11 marked]

Peeking into the SVM Black Box

• A simple case: two classes and two features (e.g., length of petal and sepal in flowers)

• Training Set: known data vectors $x_i$, where $i = 1, \dots, l$

Training Record (i)   Class Label (y_i)   Data Vector (x_i)
                                          Feature 1   Feature 2
1                      1                  2           4
2                     -1                  5           3
3                      1                  6           8
…                      …                  …           …
l                     -1                  7           3

Maximum Margin Separating Hyperplane (MMSH)

[Figure: Feature 1 versus Feature 2 plots (axes 1 to 10) of Class +1 and Class -1 points, showing candidate separators 1, 2 and 3 (labeled 1 < 3 < 2) and the Maximum Margin, 1/2 on each side of the hyperplane, with its Support Vectors]

• Linearly separable data

• Which linear separator is the best?

• V. Vapnik (1995) suggested maximum margin

Hard Margin Classification - HMSH

[Figure: Feature 1 versus Feature 2 plot (axes 1 to 10) of training vectors $x_1, x_2, x_3, \dots, x_l$; the HMSH $w^T x + b = 0$ separates Class +1 ($w^T x_i + b > 0$) from Class -1 ($w^T x_i + b < 0$)]

• If $w^T x + b = 0$ is the separating hyperplane:

  $w^T x_i + b > 0$ if $y_i = +1$
  $w^T x_i + b < 0$ if $y_i = -1$

• Decision function: $f(x) = \mathrm{sign}(w^T x + b)$, where $x$ is a test sample
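The decision function can be sketched in a few lines. The hyperplane below ($w = (-1, 1)$, $b = 0$) is a hypothetical choice made for this example, picked only because it separates the four training records listed in the earlier table; it is not taken from the presentation. Treating a score of exactly 0 as Class +1 is also an assumption.

```python
def decision(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w^T x + b = 0."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1  # ties at 0 go to +1 (assumed convention)

# Training records (Feature 1, Feature 2) and class labels from the table.
records = [([2.0, 4.0], 1), ([5.0, 3.0], -1), ([6.0, 8.0], 1), ([7.0, 3.0], -1)]

# Hypothetical separating hyperplane: -x1 + x2 = 0.
w = [-1.0, 1.0]
b = 0.0

for x, y in records:
    print(x, "label:", y, "predicted:", decision(w, b, x))
```

Each record lands on the side of the hyperplane that matches its class label.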

How to Maximize the Margin?

[Figure: Feature 1 versus Feature 2 plot (axes 1 to 10) of training vectors $x_1, x_2, x_3, \dots, x_l$ with the hyperplanes $w^T x + b = +1$, $w^T x + b = 0$ and $w^T x + b = -1$; Class +1 satisfies $w^T x_i + b > +1$, Class -1 satisfies $w^T x_i + b < -1$; the distance between the two outer hyperplanes is to be maximized]

• For $w^T x + b = 0$ consider a pipe defined by:

  $w^T x + b = +1$
  $w^T x + b = -1$

• Then:

  $w^T x_i + b \ge +1$ if $y_i = +1$
  $w^T x_i + b \le -1$ if $y_i = -1$

  or $y_i (w^T x_i + b) \ge 1$

• Maximize distance between: $w^T x + b = \pm 1$

Problem Formulation

[Figure: Feature 1 versus Feature 2 plot (axes 1 to 10) of training vectors $x_1, x_2, x_3, \dots, x_l$ with the hyperplanes $w^T x + b = +1$, $w^T x + b = 0$ and $w^T x + b = -1$]

• Distance between $w^T x + b = \pm 1$ is given as: $\frac{2}{\|w\|}$

• Then: maximize $\frac{2}{\|w\|}$ or minimize $\frac{\|w\|}{2}$ with $y_i (w^T x_i + b) \ge 1$

• Or: minimize $\frac{1}{2}\|w\|^2$ with $y_i (w^T x_i + b) \ge 1$

• Quadratic optimization problem

• Solution exists
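A small numeric illustration of why minimizing $\|w\|$ widens the margin. Both weight vectors below are hypothetical choices for this example. Each one satisfies $y_i (w^T x_i + b) \ge 1$ on the four training records from the earlier table, and the one with the smaller norm gives the wider pipe.

```python
import math

def margin_width(w):
    """Distance 2 / ||w|| between the hyperplanes w^T x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

def satisfies_constraints(w, b, samples):
    """Check y_i (w^T x_i + b) >= 1 for every training sample."""
    return all(
        y * (sum(wj * xj for wj, xj in zip(w, x)) + b) >= 1.0
        for x, y in samples
    )

# Training records (Feature 1, Feature 2) and class labels from the table.
samples = [([2.0, 4.0], 1), ([5.0, 3.0], -1), ([6.0, 8.0], 1), ([7.0, 3.0], -1)]

# Two hypothetical candidates with the same direction but different norms.
for w in ([-1.0, 1.0], [-0.5, 0.5]):
    print(w, satisfies_constraints(w, 0.0, samples), round(margin_width(w), 3))
# [-1.0, 1.0] True 1.414
# [-0.5, 0.5] True 2.828
```

Scaling $w$ down any further would break the constraint for the record (2, 4), which is what makes the constrained minimization well posed.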

Soft Margin Classification - SMSH

[Figure: Feature 1 versus Feature 2 plot (axes 1 to 10) of training vectors $x_1, x_2, x_3, \dots, x_l$ comparing the SMSH with the HMSH; Support Vectors and the slack $\xi_i$ of individual points $x_i$ are marked]

• Data are noisy, not easily separable

• Allow classification errors by introducing slack variable: $\xi_i \ge 0$

• Support vectors: ones with distance ½ from SMSH + misclassified ones

• Thus:

  minimize $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$ $(i = 1, 2, \dots, l)$
  with $y_i (w^T x_i + b) \ge 1 - \xi_i$ $(i = 1, 2, \dots, l)$

• Where C is the cost or penalty parameter
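To make the soft-margin objective concrete, here is a minimal sketch that minimizes $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$ with $\xi_i = \max(0, 1 - y_i (w^T x_i + b))$ by plain batch subgradient descent. This is a stand-in chosen for brevity, not the solver used in the work (LIBSVM solves the dual problem with an SMO-type method). The toy samples, learning rate and epoch count are all assumptions.

```python
def score(w, b, x):
    """Signed value of w^T x + b for one sample."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def slack(w, b, x, y):
    """Slack xi_i = max(0, 1 - y_i (w^T x_i + b))."""
    return max(0.0, 1.0 - y * score(w, b, x))

def objective(w, b, samples, C):
    """Soft-margin objective 0.5 * ||w||^2 + C * sum of slacks."""
    penalty = sum(slack(w, b, x, y) for x, y in samples)
    return 0.5 * sum(wj * wj for wj in w) + C * penalty

def train(samples, C=1.0, lr=0.01, epochs=200):
    """Batch subgradient descent on the soft-margin objective."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of 0.5 * ||w||^2
        grad_b = 0.0
        for x, y in samples:
            if y * score(w, b, x) < 1.0:  # sample has non-zero slack
                grad_w = [g - C * y * xj for g, xj in zip(grad_w, x)]
                grad_b -= C * y
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: two well-separated groups in the feature plane.
samples = [
    ([2.0, 2.0], 1), ([3.0, 3.0], 1), ([2.0, 3.0], 1),
    ([-2.0, -2.0], -1), ([-3.0, -3.0], -1), ([-2.0, -3.0], -1),
]

w, b = train(samples)
predictions = [1 if score(w, b, x) >= 0 else -1 for x, _ in samples]
print(predictions)  # [1, 1, 1, -1, -1, -1]

# The objective drops below its value at the all-zero starting point.
print(objective(w, b, samples, 1.0) < objective([0.0, 0.0], 0.0, samples, 1.0))  # True

# A noisy point labeled +1 on the Class -1 side is misclassified: slack > 1.
noisy_x, noisy_y = [-1.0, -1.0], 1
print(slack(w, b, noisy_x, noisy_y) > 1.0)  # True
```

Raising C makes each unit of slack more expensive, so the optimizer tolerates fewer margin violations.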

Non-Separable Data

• Data are separable or separable with some noise: no problem (HMSH or SMSH)

• What if data is not linearly separable in data space?

• Find a function to re-map data into a higher-dimensional space (feature space) where it is separable, e.g., $x \in R^1 \to R^2$

[Figure: Class +1 and Class -1 points along the x axis in data space, and the same points re-mapped by f(x)]

Non-Linear SVM

[Figure, three panels:
1. Problem: Data (Input) Space $R^1$, with Class -1 around 0 on the x axis and Class +1 on both sides
2. Solution: Feature Space $R^2$: $\varphi(x) = (x, x^2)$, where Class +1 and Class -1 can be separated
3. Solution: f(x) plotted against x]
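The re-mapping $\varphi(x) = (x, x^2)$ can be tried on a handful of made-up 1D samples. The sample values and the separating line $x^2 = 4$ (that is, $w = (0, 1)$, $b = -4$ in feature space) are assumptions for this sketch.

```python
def phi(x):
    """Map a 1D sample into the 2D feature space (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(samples):
    """True if a single threshold on x classifies every sample correctly."""
    xs = sorted(x for x, _ in samples)
    cuts = [xs[0] - 1.0]
    cuts += [(lo + hi) / 2.0 for lo, hi in zip(xs, xs[1:])]
    cuts += [xs[-1] + 1.0]
    for t in cuts:
        for orientation in (1, -1):
            if all((1 if orientation * (x - t) >= 0 else -1) == y
                   for x, y in samples):
                return True
    return False

# Hypothetical samples: Class -1 around 0, Class +1 on both sides.
samples = [(-3.0, 1), (-2.5, 1), (-1.0, -1), (0.0, -1),
           (1.0, -1), (2.5, 1), (3.0, 1)]

print(separable_by_threshold(samples))  # False: no single cut on x works

# In feature space the line x^2 = 4 separates the classes.
w, b = (0.0, 1.0), -4.0
predictions = []
for x, y in samples:
    fx = phi(x)
    value = w[0] * fx[0] + w[1] * fx[1] + b
    predictions.append(1 if value >= 0 else -1)

print(predictions == [y for _, y in samples])  # True
```

A linear separator in the feature space corresponds to a non-linear one (here, two cut points) back in the data space.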

Kernel Trick

• How to find the function in a more complicated situation?

• We do not need to explicitly know the function!

• Formulation and solution of the optimization problem use only inner products of vectors:

  $w = \sum_i \alpha_i y_i x_i$
  $f(x) = \sum_i \alpha_i y_i x_i^T x + b$ ($\alpha_i$ are weighting factors, $\alpha_i > 0$ only for support vectors)

• Kernel function is the inner product of some function in its feature space: $K(x_i, x) = \varphi(x_i)^T \varphi(x)$

• Thus the final decision function is: $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$
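The identity $K(x_i, x) = \varphi(x_i)^T \varphi(x)$ can be checked numerically for a kernel whose feature map is known in closed form. The degree-2 polynomial kernel $K(x, z) = (x^T z)^2$ is used here purely as an illustration; it is not the kernel used in the reconstruction work. In two dimensions its feature map is $\varphi(x) = (x_1^2, \sqrt{2} x_1 x_2, x_2^2)$.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * c for a, c in zip(u, v))

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x^T z)^2, computed in the data space."""
    return dot(x, z) ** 2

def poly2_feature_map(x):
    """Explicit feature map whose inner product equals poly2_kernel."""
    return (x[0] * x[0], math.sqrt(2.0) * x[0] * x[1], x[1] * x[1])

# Two of the training vectors from the earlier table.
x = (2.0, 4.0)
z = (5.0, 3.0)

via_kernel = poly2_kernel(x, z)
via_feature_space = dot(poly2_feature_map(x), poly2_feature_map(z))

print(via_kernel)                                     # 484.0
print(math.isclose(via_kernel, via_feature_space))    # True
```

The kernel needs one inner product in the 2D data space, while the explicit route builds two 3D vectors first; for kernels with very high-dimensional feature spaces only the kernel route is practical.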

Kernel Functions

• Known kernel functions: linear, polynomial, radial-basis function (RBF), etc.

• The RBF is the most general form of kernel: $K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}$

• The decision function then: $f(x) = \sum_i \alpha_i y_i e^{-\gamma \|x_i - x\|^2} + b$

• The only adjustable kernel parameter is $\gamma$
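The RBF kernel and the resulting decision function translate almost literally into code. The support vectors, weights $\alpha_i$, offset $b$ and $\gamma$ below are made-up values for illustration; in practice they come out of training.

```python
import math

def rbf_kernel(xi, xj, gamma):
    """K(xi, xj) = exp(-gamma * ||xi - xj||^2)."""
    squared_distance = sum((a - c) ** 2 for a, c in zip(xi, xj))
    return math.exp(-gamma * squared_distance)

def decision_value(x, support_vectors, alphas, labels, b, gamma):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    total = 0.0
    for sv, alpha, y in zip(support_vectors, alphas, labels):
        total += alpha * y * rbf_kernel(sv, x, gamma)
    return total + b

# Hypothetical model: one support vector per class.
support_vectors = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
b = 0.0
gamma = 1.0

near_first = decision_value((0.1, 0.0, 0.0), support_vectors, alphas, labels, b, gamma)
near_second = decision_value((1.9, 0.0, 0.0), support_vectors, alphas, labels, b, gamma)

print(near_first > 0)   # True: closer to the Class +1 support vector
print(near_second < 0)  # True: closer to the Class -1 support vector
```

The predicted class is the sign of the decision value. A larger $\gamma$ makes each support vector's influence fall off faster with distance.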

How Did We Use SVM?

• Using geological units as classes

• Using X, Y, Z coordinates as features

• Using non-linear soft-margin (SM) SVM with RBF kernel

• Using LIBSVM from National Taiwan University

• Only two parameters to control: C and γ

• Selecting parameters is a black art, done on a try-and-see basis

• Simple grid search with validation is recommended, e.g., $C = 2^{-8}, 2^{-7}, \dots, 2^{15}$; $\gamma = 2^{-15}, 2^{-14}, \dots, 2^{12}$
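The grid search itself is a simple double loop over powers of two. In the sketch below, `evaluate` is a hypothetical callback: in a real run it would train an SVM with the given C and γ on part of the training set and return the success rate on the held-out part. The `toy_score` function is an invented stand-in with an arbitrary peak so that the example runs on its own.

```python
import math

def power_of_two_grid(lowest_exponent, highest_exponent):
    """Return [2^lowest, ..., 2^highest] with integer exponents."""
    return [2.0 ** e for e in range(lowest_exponent, highest_exponent + 1)]

def grid_search(c_values, gamma_values, evaluate):
    """Try every (C, gamma) pair and keep the best-scoring one."""
    best = None
    for C in c_values:
        for gamma in gamma_values:
            result = evaluate(C, gamma)
            if best is None or result > best[0]:
                best = (result, C, gamma)
    return best

# Exponent ranges quoted on the slide.
c_values = power_of_two_grid(-8, 15)
gamma_values = power_of_two_grid(-15, 12)
print(len(c_values), len(gamma_values), len(c_values) * len(gamma_values))
# 24 28 672

def toy_score(C, gamma):
    """Invented validation score peaking at C = 2^3, gamma = 2^-2."""
    return -(math.log2(C) - 3.0) ** 2 - (math.log2(gamma) + 2.0) ** 2

best_score, best_C, best_gamma = grid_search(c_values, gamma_values, toy_score)
print(best_C, best_gamma)  # 8.0 0.25
```

With 672 combinations and one training run per combination, a coarse grid followed by a finer one around the best cell is a common way to reduce cost.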

C and γ Grid Search

[Figure: grid of lg(C) versus lg(γ). All Experiments: lg(C) from -8 to 15, lg(γ) from -15 to 12. Proposed Range: $C = 2^{-3}$ to $2^{15}$; $\gamma = 2^{4}$ to $2^{9}$. Marked points: Best Binary Result (97.79% at $C = 2^{1}$, $\gamma = 2^{6}$) and Previous Example (97.34%)]

Influence of C and γ

[Figure: reconstructions for five parameter settings: Low C, High γ; Low C, Low γ; High C, Low γ; High C, High γ; Avg C, Avg γ]

Multi-Class Classification

[Figure: three panels in X, Y, Z coordinates: 1. Original, 2. Training set, 3. Output. Legend: 1 - Organic, 2 - Littoral, 3 - Clay, 4 - Esker, 5 - Till, 6 - Bedrock]

Data Statistics and Results

Class         To Classify   Training Set   Training Set (% of Total Points)   Success (%)
1. Organic           1162             48                               0.01         18.76
2. Littoral          3626            193                               0.05         37.20
3. Clay             12667            628                               0.16         57.10
4. Esker            19305            995                               0.26         67.65
5. Till             15118            747                               0.19         45.72
6. Bedrock         319905          14841                               3.81         95.45
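The table can be cross-checked against the binary example, which used the same 389235 points. The sketch below sums the columns and also derives a point-weighted overall success rate; that overall figure is not stated in the presentation and is only approximate, because the per-class percentages are rounded.

```python
# (class name, points to classify, training points, success %) from the table.
rows = [
    ("Organic", 1162, 48, 18.76),
    ("Littoral", 3626, 193, 37.20),
    ("Clay", 12667, 628, 57.10),
    ("Esker", 19305, 995, 67.65),
    ("Till", 15118, 747, 45.72),
    ("Bedrock", 319905, 14841, 95.45),
]
total_points = 389235  # total from the binary example

total_to_classify = sum(n for _, n, _, _ in rows)
total_training = sum(t for _, _, t, _ in rows)
print(total_to_classify, total_training)  # 371783 17452

# Training share of each class relative to all points.
shares = [round(100.0 * t / total_points, 2) for _, _, t, _ in rows]
print(shares)  # [0.01, 0.05, 0.16, 0.26, 0.19, 3.81]

# Point-weighted overall success (derived here, roughly 90%).
overall = sum(n * s for _, n, _, s in rows) / total_to_classify
print(round(overall, 1))
```

Bedrock supplies most of the points to classify, so it dominates the weighted figure even though the thin surficial units are reconstructed far less reliably.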

Success per Class

[Figure: Success (%), 0 to 100, versus Training Points per Class (%), logarithmic axis from 0.01 to 10, with one point each for Organic, Littoral, Till, Clay, Esker and Bedrock]

Area and Volume Comparison

[Figure, two log-log panels of Reconstructed versus Original values for Organic, Littoral, Clay, Till, Esker and Bedrock:
Area: both axes from 1.00E+07 to 1.00E+09
Volume: both axes from 1.00E+07 to 1.00E+11]

Conclusions

• The SVM can successfully be used in single and multi-unit 3D geological reconstructions:
  – Reasonable results are obtained with just a few training sections
  – Parameters must be picked from the range: $C = 2^{-3}$ to $2^{15}$; $\gamma = 2^{4}$ to $2^{9}$
  – Low C values: less detail, more generalized model
  – High C values: more detail, less generalized model
  – Additional experiments demonstrated:
    • Number of units can vary (all units must be represented in training set)
    • Sections can be arbitrarily located
    • Other types of information (well data, surface geology maps) can be used

References

• Abe, S., 2005. Support Vector Machines for Pattern Classification. Springer-Verlag, London, 343 pp.

• Cristianini, N., Shawe-Taylor, J., 2000. Support Vector Machines. Cambridge University Press, 189 pp.

• Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 311 pp.
