Outlier Selection and One Class Classification by Jeroen Janssens


I present a novel algorithm called Stochastic Outlier Selection (SOS). The SOS algorithm computes for each data point an outlier probability. These probabilities are more intuitive than the unbounded outlier scores computed by existing outlier-selection algorithms. I have evaluated SOS on a variety of real-world and synthetic datasets, and compared it to four state-of-the-art outlier-selection algorithms. The results show that SOS has superior performance while being more robust to data perturbations and parameter settings.

Transcript of Outlier Selection and One Class Classification by Jeroen Janssens


Outlier Selection and One-Class Classification

Jeroen Janssens @jeroenhjanssens


Overview

• Anomalies and outliers

• Stochastic Outlier Selection

• Experiments and results


Anomalies and outliers


Definition (Anomaly)

An anomaly is an observation or event that deviates qualitatively from what is considered to be normal, according to a domain expert.


Detecting anomalies is important

• Expensive

• Dangerous

• Mess up your model


Human anomaly detection may suffer from

• Fatigue

• Information overload

• Emotional bias


Computers work with numbers

[Figure: the observations visualised as a scatter plot, with width on the horizontal axis and height on the vertical axis; apples and oranges are shown with different markers.]

Observations are recorded as data points:

width   height   label
8.4     7.3      apple
6.7     7.1      orange
8.0     6.8      apple
7.4     7.2      apple
9.6     9.2      orange


From anomaly to outlier

[Figure: vessels plotted by speed over ground versus rate of turn; the anomalous vessel stands apart from the normal vessels as an outlying data point.]


Definition (Outlier)

An outlier is a data point that deviates quantitatively from the majority of the data points, according to an outlier-selection algorithm.


Three standard deviations

[Figure: a distribution of x with markers at −3σ, −2σ, −1σ, 1σ, 2σ, and 3σ; data points more than three standard deviations from the mean are marked as outliers, the rest as inliers.]
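This three-standard-deviations rule is the classical baseline for one-dimensional outlier selection (it is not part of SOS); a minimal Python sketch with made-up data shows it in action:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0.0, 1.0, 100), [6.0]])  # 100 inliers plus one far-away point

mu, sigma = x.mean(), x.std()
is_outlier = np.abs(x - mu) > 3 * sigma   # the three-standard-deviations rule

print("outliers:", x[is_outlier])
```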


Euler diagrams, set legend, and set notation

(a) X: the real-world observations.

(b) A: the observations labeled by the expert as anomalous; X ∖ A: labeled by the expert as normal.

(c) D: the observations recorded as data points; X ∖ D: unrecorded.

(d) CA = D ∩ A: anomalies represented as data points; CN = D ∖ A: normalities represented as data points.

(e) CO: the data points classified by the algorithm as outliers; CI = D ∖ CO: classified by the algorithm as inliers.

(f) Outcomes: hits H = CA ∩ CO, misses M = CA ∩ CI, false alarms FA = CN ∩ CO, and correct rejects CR = CN ∩ CI.


Confusion matrix

                                        Expert labels the observation as a(n)
                                        Anomaly (CA)        Normality (CN)
Algorithm classifies    Outlier (CO)    hit (H)             false alarm (FA)
the data point as a(n)  Inlier (CI)     miss (M)            correct reject (CR)

[Figure: an example data set of data points; the expert labels each data point as an anomaly or a normality, the algorithm classifies each data point as an inlier or an outlier, and combining the two yields the outcome: hit, false alarm, miss, or correct reject.]


Stochastic Outlier Selection


Stochastic Outlier Selection

• Unsupervised outlier selection algorithm

• Employs concept of affinity

• Computes outlier probabilities

• One parameter: perplexity

• Inspired by t-SNE


A data point is selected as an outlier when all the other data points have insufficient affinity with it.


From input to output

[Figure: the matrices computed by SOS, shown as heat maps: the input matrix X, the dissimilarity matrix D, the affinity matrix A, the binding probability matrix B, and the outlier probability vector Φ (thesis sections 4.2.1, 4.2.2, 4.2.3, and 4.2.4–6).]

SOS transforms the input step by step: X is n × m, the matrices D, A, and B are n × n, and Φ is n × 1.
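A minimal NumPy sketch of this X → D → A → B → Φ data flow. To keep it short it uses one fixed variance σ² for every data point instead of the adaptive, perplexity-based variances described later, so it illustrates the pipeline rather than the full algorithm; the example matrix X is made up.

```python
import numpy as np

def sos_fixed_variance(X, sigma2=1.0):
    """Data flow of SOS: X (n x m) -> D -> A -> B -> Phi (length n).

    Uses a single fixed variance for every point; the actual algorithm
    sets each sigma_i^2 adaptively from the perplexity parameter.
    """
    # D: pairwise Euclidean dissimilarities (Equation 2.1)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    # A: Gaussian affinities with zero self-affinity (Equation 4.2)
    A = np.exp(-D ** 2 / (2.0 * sigma2))
    np.fill_diagonal(A, 0.0)
    # B: binding probabilities, each row normalised to sum to one (Equation 4.6)
    B = A / A.sum(axis=1, keepdims=True)
    # Phi: outlier probabilities (Equation 4.14): product over j of (1 - b_ji)
    phi = np.prod(1.0 - B, axis=0)
    return phi

X = np.array([[1.0, 1.0], [1.2, 1.1], [0.9, 1.3],
              [1.1, 0.8], [1.0, 1.2], [4.0, 4.5]])   # made-up data; last point is isolated
print(sos_fixed_variance(X).round(2))                 # the isolated point gets the highest probability
```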


Demo: SOS on the command line (see http://sos.jeroenjanssens.com)


From feature matrix to dissimilarity matrix

[Figure: six data points x1, …, x6 plotted by their first feature x1 and second feature x2, next to the input matrix X and the resulting dissimilarity matrix D; for example, d2,6 ≡ d6,2 = 5.056.]

Equation 2.1

dij = √( ∑ k=1..m (xjk − xik)² )
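Equation 2.1 computes plain pairwise Euclidean distances; a short NumPy sketch with a made-up feature matrix (it will not reproduce the d2,6 = 5.056 from the figure):

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 2.5],
              [2.5, 2.0], [2.0, 3.0], [7.0, 5.5]])   # made-up feature matrix (n = 6, m = 2)

# Equation 2.1: d_ij = sqrt(sum_k (x_jk - x_ik)^2), i.e. pairwise Euclidean distances
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

print(D.round(3))   # symmetric n x n matrix with zeros on the diagonal
```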


Affinity between data points

[Figure: the affinity aij as a function of the dissimilarity dij for σi² = 0.1, σi² = 1, and σi² = 10, next to the dissimilarity matrix D and the resulting affinity matrix A.]

Equation 4.2

aij = exp(−dij² / 2σi²)  if i ≠ j,   and   aij = 0  if i = j
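A sketch of Equation 4.2 in NumPy, applied to a made-up dissimilarity matrix; the per-point variances are fixed here, whereas SOS sets them adaptively via the perplexity (see the later slides):

```python
import numpy as np

D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 3.0],
              [4.0, 3.0, 0.0]])          # made-up dissimilarity matrix
sigma2 = np.array([1.0, 1.0, 1.0])       # one variance per data point (fixed for this sketch)

# Equation 4.2: a_ij = exp(-d_ij^2 / (2 sigma_i^2)) for i != j, and a_ii = 0
A = np.exp(-D ** 2 / (2.0 * sigma2[:, None]))
np.fill_diagonal(A, 0.0)

print(A.round(4))
```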


Smooth neighborhoods

[Figure: the six data points x1, …, x6 plotted by their first feature x1 and second feature x2; around each data point xi, its variance σi² determines the size of its smooth neighborhood (σ6² is shown explicitly).]


From affinity to binding probability

[Figure: the affinity matrix A and the binding probability matrix B, in which each row has been normalised.]

Equations 4.5 and 4.6

bij = p(i → j ∈ EG) ∝ aij

bij = aij / ∑ k=1..n aik
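Equation 4.6 is a row-wise normalisation of the affinity matrix, so that the binding probabilities of each data point sum to one; a sketch with a made-up affinity matrix:

```python
import numpy as np

A = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.2],
              [0.1, 0.2, 0.0]])          # made-up affinity matrix (zero diagonal)

# Equation 4.6: b_ij = a_ij / sum_k a_ik  (row-wise normalisation)
B = A / A.sum(axis=1, keepdims=True)

print(B.round(3))
print(B.sum(axis=1))                     # each row sums to 1
```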


Binding probabilities

[Figure: four graph views (a)–(d) of the binding probability matrix B for the six example data points. Each panel highlights one vertex of the graph over v1, …, v6 together with its binding probabilities and the corresponding entries of B; the probabilities shown are .04, .21, .24, .25, .25 in (a), .10, .10, .21, .29, .30 in (b), .10, .18, .22, .23, .24 in (c), and .04, .04, .04, .05, .05 in (d).]


Stochastic Neighbor Graph

G = (V, EG)

p(G) = ∏ (i→j ∈ EG) bij

CO|G = {xi ∈ X | deg⁻G(vi) = 0}
     = {xi ∈ X | ∄ vj ∈ V : j → i ∈ EG}
     = {xi ∈ X | ∀ vj ∈ V : j → i ∉ EG}


[Figure: three sampled stochastic neighbour graphs over the vertices v1, …, v6.]

(Ga)  p(Ga) = 3.931 ⋅ 10⁻⁴,   CO|Ga = {x1, x4, x6}

(Gb)  p(Gb) = 4.562 ⋅ 10⁻⁵,   CO|Gb = {x5, x6}

(Gc)  p(Gc) = 5.950 ⋅ 10⁻⁷,   CO|Gc = {x1, x3}


Set of all SNGs

[Figure: the probability mass p(G) of every possible stochastic neighbour graph G (with Ga, Gb, and Gc marked), and the corresponding cumulative probability mass, for perplexities h = 4.0, h = 4.5, and h = 5.0.]


Approximating outlier probabilities by sampling SNGs

[Figure: the outlier probability of x1, …, x6 as a function of the number of sampling iterations (10⁰ to 10⁶); the estimates converge as more graphs are sampled.]

p(xi ∈ CO) = lim S→∞ (1/S) ∑ s=1..S I{xi ∈ CO | G(s)},   G(s) ∼ P(G)
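A Monte Carlo sketch of this approximation, assuming the generative process implied by the closed form derived below: in each sampled stochastic neighbour graph every vertex draws exactly one outgoing edge according to its binding probabilities, and a data point is an outlier in that graph when it receives no incoming edge. The matrix B and the number of samples are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

B = np.array([[0.00, 0.60, 0.35, 0.05],
              [0.55, 0.00, 0.40, 0.05],
              [0.45, 0.45, 0.00, 0.10],
              [0.30, 0.35, 0.35, 0.00]])   # made-up binding probabilities (rows sum to 1)
n = B.shape[0]
S = 20_000                                 # number of sampled graphs

outlier_counts = np.zeros(n)
for _ in range(S):
    # every vertex i picks one neighbour j with probability b_ij
    targets = np.array([rng.choice(n, p=B[i] / B[i].sum()) for i in range(n)])
    in_degree = np.bincount(targets, minlength=n)
    outlier_counts += (in_degree == 0)     # outlier in this graph: no incoming edge

print((outlier_counts / S).round(3))       # approximates p(x_i in C_O)
print(np.prod(1.0 - B, axis=0).round(3))   # closed-form check (Equation 4.14, derived below)
```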


Demo: Sampling SNGs in CoffeeScript and D3 (see http://sos.jeroenjanssens.com)


Computing outlier probabilities through marginalisation

p(xi ∈ CO) = ∑ G∈𝒢 I{xi ∈ CO | G} ⋅ p(G)

           = ∑ G∈𝒢 I{xi ∈ CO | G} ⋅ ∏ (q→r ∈ EG) bqr

|𝒢| = (n − 1)ⁿ

Because there are (n − 1)ⁿ possible graphs, evaluating this sum directly is intractable for all but the smallest n, which motivates the closed-form expression that follows.


Computing outlier probabilities in closed form

[Figure: the six example data points, the binding probability matrix B, and the resulting outlier probability vector Φ.]

Equation 4.14

p(xi ∈ CO) = ∏ j≠i (1 − bji)


xi ∈ CO | G  ⇐⇒  deg⁻G(vi) = 0                              (1)

p(xi ∈ CO) = EG[ I{deg⁻G(vi) = 0} ]                          (2)

p(xi ∈ CO) = EG[ ∏ j≠i I{ j → i ∉ EG} ]                      (3)

p(xi ∈ CO) = EG[ ∏ j≠i (1 − I{ j → i ∈ EG}) ]                (4)

p(xi ∈ CO) = ∏ j≠i (1 − EG[ I{ j → i ∈ EG} ])                (5)

p(xi ∈ CO) = ∏ j≠i (1 − p( j → i ∈ EG))                      (6)

p(xi ∈ CO) = ∏ j≠i (1 − bji)                                 (7)

Step (5) holds because each vertex samples its outgoing edge independently, so the indicators are independent across j and the expectation of the product factorises.
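With the binding probability matrix B in hand, Equation 4.14 is a one-line computation; the matrix below is made up:

```python
import numpy as np

B = np.array([[0.00, 0.60, 0.35, 0.05],
              [0.55, 0.00, 0.40, 0.05],
              [0.45, 0.45, 0.00, 0.10],
              [0.30, 0.35, 0.35, 0.00]])   # made-up binding probabilities

# Equation 4.14: p(x_i in C_O) = prod over j != i of (1 - b_ji).
# The diagonal contributes a factor (1 - 0) = 1, so it can be left in the product.
phi = np.prod(1.0 - B, axis=0)

print(phi.round(3))
```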


Selecting outliers

[Figure: the six example data points plotted by their first feature x1 and second feature x2; those whose outlier probability exceeds the threshold are marked as outliers, the rest as inliers.]

f(x) = outlier   if p(x ∈ CO) > θ,
       inlier    if p(x ∈ CO) ≤ θ.
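Selecting the outliers is then a single comparison against the threshold θ; the probabilities and threshold below are illustrative:

```python
import numpy as np

phi = np.array([0.12, 0.08, 0.15, 0.83])   # outlier probabilities from SOS (made up)
theta = 0.5                                # illustrative threshold

labels = np.where(phi > theta, "outlier", "inlier")
print(labels)
```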


Adaptive variances via the perplexity parameter

[Figure: the perplexity h(bi) of each data point x1, …, x6 as a function of its variance σi² (on a logarithmic scale from 10⁻¹ to 10²); the target perplexity h = 4.5 is marked.]

h(bi) = 2^H(bi),   H(bi) = − ∑ j=1..n, j≠i  bij log₂(bij)


Continuous binary search

[Figure: for each data point x1, …, x6, the variance σi² and the perplexity h(bi) over ten binary-search iterations; each perplexity converges towards the target value (the axis shows 4.2 to 4.8).]
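A sketch of the adaptive variances: for each data point, a binary search adjusts σi² until the perplexity of its binding distribution matches the desired value h. The search bounds, tolerance, and example data are illustrative choices, not the exact ones used in the thesis.

```python
import numpy as np

def row_binding(d_row, i, sigma2):
    """Binding probabilities b_i. of point i for a given variance (Equations 4.2 and 4.6)."""
    a = np.exp(-d_row ** 2 / (2.0 * sigma2))
    a[i] = 0.0
    return a / a.sum()

def perplexity(b_row):
    """h(b_i) = 2^H(b_i), with H the Shannon entropy in bits."""
    p = b_row[b_row > 0]
    return 2.0 ** (-(p * np.log2(p)).sum())

def adaptive_variances(D, h=4.5, tol=1e-4, max_iter=50):
    """Binary search on sigma_i^2 so that each row's perplexity matches h."""
    n = D.shape[0]
    sigma2 = np.ones(n)
    for i in range(n):
        lo, hi = 1e-10, 1e10                  # illustrative search bounds
        for _ in range(max_iter):
            sigma2[i] = (lo + hi) / 2.0
            h_i = perplexity(row_binding(D[i], i, sigma2[i]))
            if abs(h_i - h) < tol:
                break
            if h_i > h:                       # too smooth: shrink the variance
                hi = sigma2[i]
            else:                             # too peaked: enlarge the variance
                lo = sigma2[i]
    return sigma2

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))                   # made-up data points
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
print(adaptive_variances(D, h=4.5).round(3))
```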


Perplexity influences outlier probabilities

[Figure: the outlier probability of each data point x1, …, x6 as a function of the perplexity h (from 0 to 6).]


Experiments and results


[Figure: qualitative comparison of the five outlier-selection algorithms — KNNDD (k = 10), LOF (k = 10), LOCI, SOS (h = 10), and LSOD — on three example datasets: Banana, Densities, and Ring.]


One-class datasets

[Figure: the Iris flower data set DM, plotted by sepal length and petal width, with its three classes C1 = Setosa, C2 = Versicolor, and C3 = Virginica.]

Three one-class datasets are constructed from it:

D1: CA = Versicolor ∪ Virginica,   CN = Setosa
D2: CA = Setosa ∪ Virginica,       CN = Versicolor
D3: CA = Setosa ∪ Versicolor,      CN = Virginica


Weighted AUC

[Figure: outlier scores of anomalies (CA) and normalities (CN) on a scale from 0 to 1; a threshold θ′ (here θ′ = 0.57) splits them into outliers (CO) and inliers (CI), producing hits, false alarms, misses, and correct rejects. Sweeping θ′ from 0 to 1 traces an ROC curve of hit rate versus false-alarm rate, and the area under this curve gives the AUC (the thesis uses a weighted variant).]
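For intuition, a sketch of the ordinary (unweighted) ROC AUC computed from outlier scores and expert labels; the thesis uses a weighted variant, and the scores and labels below are made up.

```python
import numpy as np

scores = np.array([0.9, 0.8, 0.75, 0.6, 0.4, 0.35, 0.2, 0.1])  # outlier scores (made up)
labels = np.array([1,   1,   0,    1,   0,   0,    0,   0  ])  # 1 = anomaly, 0 = normality

# Sweep the threshold over every score and trace hit rate versus false-alarm rate.
thresholds = np.sort(np.unique(scores))[::-1]
hits = [(scores >= t)[labels == 1].mean() for t in thresholds]   # hit rate
fas  = [(scores >= t)[labels == 0].mean() for t in thresholds]   # false-alarm rate

# Area under the ROC curve via the trapezoidal rule (prepend the (0, 0) point).
x = np.array([0.0] + fas)
y = np.array([0.0] + hits)
auc = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
print(round(float(auc), 3))
```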


Real-world datasets

[Figure: weighted AUC of SOS, KNNDD, LOF, LOCI, and LSOD on eighteen real-world one-class datasets: Iris, Ecoli, Breast W. Original, Delft Pump, Waveform Generator, Wine Recognition, Colon Gene, BioMed, Vehicle Silhouettes, Glass Identification, Boston Housing, Heart Disease, Arrhythmia, Haberman's Survival, Breast W. New, Liver Disorders, SPECTF Heart, and Hepatitis.]


Synthetic datasets

Data set   Parameter λ determines                λstart   step size   λend
(a)        Radius of ring                        5        0.1         2
(b)        Cardinality of cluster and ring       100      5           5
(c)        Distance between clusters             4        0.1         0
(d)        Cardinality of one cluster and ring   0        5           45
(e)        Density of one cluster and ring       1        0.05        0
(f)        Radius of square ring                 2        0.05        0.8
(g)        Radius of square ring                 2        0.05        0.8


[Figure: the seven synthetic datasets (a)–(g), each shown at λstart, an intermediate λ, and λend; anomalies and normalities are marked.]


Results on synthetic datasets

[Figure: AUC of SOS, KNNDD, LOF, LOCI, and LSOD as a function of the parameter λ for the seven synthetic datasets (a)–(g); the AUC axis runs from 0.6 to 1.]


SOS performs significantly better

[Figure: critical-difference diagrams of the average ranks (1–5) of SOS, KNNDD, LOF, LOCI, and LSOD at significance levels p < .05 and p < .01; SOS obtains the best average rank in both diagrams.]


Conclusion

• Outlier selection can support the detection of anomalies

• SOS is an intuitive and probabilistic algorithm to select outliers

• SOS performs very well

• No free lunch


Outlier Selection and One-Class Classification

Jeroen Janssens @jeroenhjanssens