Towards a Detection Theory For Intelligence Applications
Stephen Ahearn, Jim Ferry, Darren Lo, Aaron Phillips
http://www.metsci.com
Motivation
Classical detection theory has been developed and applied for decades to detect and track stealthy targets
– Submarines, aircraft, missiles, …
– Using a variety of sensors: sonar, radar, IR, …
– Gives the U.S. a distinct advantage over adversaries
We seek to develop an analogous theory of detection on networks
– To detect and track threat networks and activities that cannot be observed directly
– Exploiting diverse data: SIGINT, HUMINT, IMINT, …
– To maintain our advantage over today's adversary and win the GWOT
Classical Detection Theory Applications
[Slide graphic: ocean-surveillance scene annotated with wind and current, overlaid with a montage of the LRDT update equations and random-graph likelihood formulas that appear legibly on later slides]

Detection Theory
Detection theory enables detection and tracking of stealthy targets
– Such targets cannot be detected by analyzing sensor reports separately
– Only through principled data fusion does the signal stand out from noise
– The probabilistic framework correctly manages uncertainty and risk
Data of Interest / Vast Sea of Data

[Figure: network of the 9/11 hijackers grouped by flight — AA #77 (Pentagon), AA #11 (WTC North), UA #93 (Pennsylvania), UA #175 (WTC South) — with nodes including Mohamed Atta, Hani Hanjour, Salem Alhamzi, Nawaf Alhmazi, Khalid Almihdhar, Majed Moqed, Ahmed Alnami, Saeed Alghamdi, Hamza Alghamdi, Ahmed Alghamdi, and Abdul Aziz Alomari. Adapted from "Connecting the Dots -- Tracking Two Identified Terrorists," Valdis Krebs, 2001.]
Detection Theory for Intelligence Applications
COMINT, HUMINT, ELINT, IMINT, MASINT
(signal processing, data collection)
Detection Theory
Data transformation
Detect and track networks
Detect and track activities
Assess threat levels and intent
Approach: Two well-established theories
Likelihood Ratio Detection and Tracking
– Explicitly models noise and signal
– Principled Bayesian framework for managing uncertainty
– Used for decades for tracking stealthy kinematic targets
– Traditional domain has metric structure
Random graph theory
– Models transactional domain
– Discrete structure
– Rich mathematics
Likelihood Ratio Detection and Tracking (LRDT)
Requirements
– State space 𝒳⁺ = 𝒳 ∪ {∅}
– Measurement model L(z | x), or in likelihood-ratio form 𝓛(z | x) = L(z | x) / L(z | ∅)
– Motion model P^T(x_t | x_{t−Δt})
Result
– Update equations: probability form — P(x_t)
– Update equations: likelihood-ratio form — Λ(x_t) = P(x_t) / P(∅)
𝒳⁺ = 𝒳 ∪ {∅}

Probability form:
P⁻(x_t) = ∫ P^T(x_t | x_{t−Δt}) P(x_{t−Δt}) dx_{t−Δt}
P(x_t) = (1/C) L(z_t | x_t) P⁻(x_t)

Likelihood-ratio form:
Λ⁻(x_t) = ∫ P^T(x_t | x_{t−Δt}) Λ(x_{t−Δt}) dx_{t−Δt}
Λ(x_t) = 𝓛(z_t | x_t) Λ⁻(x_t)
Classical Likelihood Ratio Detection and Tracking (LRDT)

[Figure: radar return data — measurement likelihood-ratio surfaces over distance from radar and time — and the cumulative likelihood-ratio surface obtained by fusing the data with the motion model. Peak value over the target track ≈ 10^7; second-highest peak ≈ 10.]
Motion model describes movement of the periscope
Fusion over time smooths out random fluctuations from noise and clutter
LR peaks accumulate on movement that fits the motion model (i.e., the periscope)
LR peaks dissipate for structures that move in other ways (e.g., waves)
LRDT on Networks: a Simple Example

[Figure: snapshots of the evidence graph at t = 1, 5, 10, 15, 20, 25, 30, 35, with the posterior shown on a color scale from P(x) = 0 to P(x) = 1; the ground-truth position of the cell is highlighted in purple.]
State space: all 1140 possible triangles (i.e., "terrorist cells") on the 20 nodes
Measurement model:
– Signal: the cell appears with probability 0.8
– Noise: each possible edge appears independently with probability 0.3
Motion model: the cell swaps out a member with probability 0.1
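This toy scenario can be simulated directly. The sketch below tracks a static cell (the 0.1-per-step member swap is omitted for brevity, and all parameter names are mine) by accumulating measurement likelihood ratios over the 1140 candidate triangles; after a handful of observations the true cell's ratio typically dominates.

```python
import itertools
import random

N, P_NOISE, P_CELL = 20, 0.3, 0.8
STATES = list(itertools.combinations(range(N), 3))  # C(20, 3) = 1140 triangles

def triangle_edges(tri):
    a, b, c = tri
    return (frozenset((a, b)), frozenset((a, c)), frozenset((b, c)))

def observe(cell, rng):
    """One evidence graph: G(N, P_NOISE) noise plus, with probability
    P_CELL, the three edges of the hidden cell."""
    edges = {frozenset(e) for e in itertools.combinations(range(N), 2)
             if rng.random() < P_NOISE}
    if rng.random() < P_CELL:
        edges.update(triangle_edges(cell))
    return edges

def meas_lr(tri, edges):
    """L(J | cell = tri) / L(J | noise only); edges outside tri cancel."""
    present = [e in edges for e in triangle_edges(tri)]
    noise = 1.0
    for b in present:
        noise *= P_NOISE if b else 1 - P_NOISE
    signal = P_CELL * float(all(present)) + (1 - P_CELL) * noise
    return signal / noise

# Accumulate likelihood ratios over 15 independent observations.
rng = random.Random(7)
cell = (2, 5, 11)
lam = dict.fromkeys(STATES, 1.0)
for _ in range(15):
    J = observe(cell, rng)
    for tri in STATES:
        lam[tri] *= meas_lr(tri, J)
best = max(lam, key=lam.get)
```

Note how only the three edges of each hypothesized triangle enter its likelihood ratio: the contribution of every other edge is the same under signal and noise, so it cancels.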
Overview
1. Graph-theoretic Underpinnings
– Graph-theoretic analogues of noise and signal
2. Tracking Plans in Networks
– Extension of LRDT to transactional domain
3. Hierarchical Hypothesis Management
– Novel methodologies required to mitigate combinatorial explosion of the state space
4. References
1. Graph-theoretic Underpinnings
Erdős–Rényi Random Graph Model G(n, p)
– Provides the noise model for the detection problem
– Simplest, most tractable random graph model
Inserted subgraph problem
– Provides the signal model for the detection problem
Likelihood ratio
– Optimal decision statistic for detection of an inserted subgraph
Distribution of the subgraph count
Other random graph models?
Erdős-Rényi Random Graph Model G(n,p)
The notation G(n, p) denotes a random graph…
– on n vertices
– with each edge appearing independently with probability p
Very simple noise process model
– No correlation structure
– Well-studied, but still yields difficult problems
– First case to explore before moving on to more realistic network models (Random Collaboration, Geometric, Gaussian, etc.)

[Figure: an instance of G(n, p) for n = 6, p = 0.5]
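Sampling an instance like the one pictured takes only a few lines; this is a minimal sketch of the definition above (each of the C(n, 2) possible edges is tossed independently).

```python
import itertools
import random

def gnp(n, p, rng=random):
    """Sample an Erdos-Renyi G(n, p): each of the C(n, 2) possible
    edges is included independently with probability p."""
    return [e for e in itertools.combinations(range(n), 2)
            if rng.random() < p]

# An instance like the slide's: n = 6, p = 0.5.
g = gnp(6, 0.5, random.Random(0))
```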
Inserted Subgraph Problem
Evidence graph J
– Background noise: Erdős–Rényi random graph G(n, p)
– Target graph H may or may not be inserted somewhere
Binary decision problem: Is H present or not?
Neyman–Pearson lemma: the likelihood ratio

Λ_H(J) = P(J | H present) / P(J | H not present)

is the optimal decision statistic, i.e., it yields the highest probability of detection for a given false-alarm rate.
Theorem [Mifflin, Boner]:

Λ_H(J) = X_H(J) / E[X_H] = (# copies of H in J) / (# copies of H expected just from noise)
Inserted Subgraph Problem

[Diagram: a noise process and a signal + noise process over people, companies, ports, groups, etc., with target graph H inserted. H could represent four shipments of precursor items to four distinct, but linked, entities.]

Likelihood ratio (optimal decision statistic):

Λ_H(J) = P(J | signal + noise) / P(J | noise)
Likelihood Ratio Calculations
Example 1: n = 100, p = 0.07, X_H(J) = 200
– Are the 200 copies of H likely to have arisen by chance?
– Answer: Λ_H(J) = X_H(J) / E[X_H] = 200 / 883.1 = 0.226 < 1 : probably just noise
Example 2: n = 1000, p = 0.007, X_H(J) = 2000
– Are the 2000 copies of H likely to have arisen by chance?
– Answer: Λ_H(J) = X_H(J) / E[X_H] = 2000 / 11.4 = 174.8 ≫ 1 : probably contains target(s)
Need information about the distribution of X_H to set thresholds and establish performance boundaries!
– The expected value of X_H is easy.
– But that's all that's easy about it!

[Figure: the target graph H]
Expected Value of Subgraph Count XH
E[X_H]: average # of copies of H in an instance of G(n, p)
– Some preliminary notation:
• v(H) = number of vertices of H
• e(H) = number of edges of H
• |Aut(H)| = number of automorphisms of H
– Simple formula for E[X_H] (Erdős):

E[X_H] = C(n, v(H)) · (v(H)! / |Aut(H)|) · p^{e(H)}

where C(n, v(H)) counts the choices of vertex set for H, v(H)!/|Aut(H)| counts the arrangements of H on that vertex set, and p^{e(H)} is the probability that all e(H) edges of H appear.

Example: for the H pictured, v(H) = 4, e(H) = 4, |Aut(H)| = 2, so

E[X_H] = C(n, 4) · (4!/2) · p^4 = 1806 for the slide's choice of n and p.
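Erdős' formula is easy to sanity-check by exhaustive counting. The sketch below assumes H is the triangle-plus-pendant "paw" graph (consistent with v(H) = 4, e(H) = 4, |Aut(H)| = 2 above); in the complete graph K_6 the count should be C(6, 4) · 4!/2 = 180.

```python
import itertools
from math import comb, factorial

# Assumed H: the "paw" graph -- a triangle {a, b, c} plus a pendant edge c-d.
V_H, E_H, AUT_H = 4, 4, 2

def expected_copies(n, p):
    """Erdos' formula: E[X_H] = C(n, v(H)) * v(H)!/|Aut(H)| * p^e(H)."""
    return comb(n, V_H) * factorial(V_H) / AUT_H * p ** E_H

def count_paws(edges, n):
    """Exhaustively count copies of the paw in an n-vertex graph."""
    es = {frozenset(e) for e in edges}
    total = 0
    for a, b, c in itertools.combinations(range(n), 3):
        # Is {a, b, c} a triangle?
        if all(frozenset(x) in es for x in [(a, b), (a, c), (b, c)]):
            for d in range(n):
                if d in (a, b, c):
                    continue
                # Count pendant edges from each triangle vertex to d.
                total += sum(frozenset((v, d)) in es for v in (a, b, c))
    return total
```

With p = 1 (the complete graph) the formula and the brute-force count agree exactly, and with n = 6, p = 0.5 the formula gives the 11.25 quoted on the next slide.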
Distribution of Subgraph Count XH
Example: distribution of X_H for n = 6, p = 0.5
– Minimum possible value = 0 (probability = 18.1%)
– Maximum possible value = 180 (probability = 0.003%)
– Mean given by the Erdős formula: E[X_H] = 180 p^4 = 11.25
– Variance: var[X_H] = 14.2 for p = 0.5 (the independent part 180 p^4 (1 − p^4) ≈ 10.5 plus covariances of overlapping copies)

[Figure: an instance of G(n, p) for n = 6, p = 0.5, and the histogram of Pr(X_H = k) for k = 0, …, 27; e.g., Pr(exactly 8 copies of H) = 4.67% for p = 0.5; average = 11.25]
Asymptotic Estimate of Variance
Theorem [Ferry]:
– Decompose H into its core cr(H) and rooted trees T_i hanging off the core
– Color each node i of the core by the isomorphism class of T_i
– Then

var[X_H] / E[X_H] = Σ_{π ∈ 𝒢} Π_{i ∈ V(cr(H))} B(T_i, T_i^π; d) + O(n^{−1}),

where 𝒢 = Aut(cr(H))/χ, the color-preserving automorphisms of the core, and B is computed by an exact recursive formula.

New result for the study of the distribution of X_H:
– Bollobás (1981) only applies to strictly balanced graphs.
– Ruciński (1988) does not estimate the variance.
– Janson, Łuczak, and Ruciński give a much worse estimate.

[Figure: a graph H, its core cr(H), and the pairing of rooted trees under π, each pair contributing a factor B(·, ·; d)]
Asymptotic Estimate of Variance
With the example H (31 vertices, 34 edges, |Aut(H)| = 768), the formula yields, for n = 10^6 and mean degree d = 10 (i.e., p = d/n):

E[X_H] = C(n, v(H)) · (v(H)! / |Aut(H)|) · p^{e(H)} = C(10^6, 31) · (31!/768) · (10/10^6)^{34} ≈ 1.3 × 10^{13}

var[X_H] = ν[X_H] · E[X_H] = 5.4 × 10^{18} · E[X_H]

where the variance-to-mean ratio ν[X_H] = var[X_H]/E[X_H] is the polynomial

ν[X_H] = (1/192)(192 + 2304d + 12960d² + 47008d³ + 127120d⁴ + 276544d⁵ + 503280d⁶ + 784840d⁷ + 1066956d⁸ + 1278644d⁹ + 1361424d¹⁰ + 1296700d¹¹ + 1110904d¹² + 860672d¹³ + 607208d¹⁴ + 394332d¹⁵ + 239356d¹⁶ + 138208d¹⁷ + 76436d¹⁸ + 39914d¹⁹ + 19041d²⁰ + 7875d²¹ + 2698d²² + 729d²³ + 147d²⁴ + 18d²⁵)

[Figure: log-scale plot of ν[X_H] against d = 0, …, 10, rising from roughly 10^4 to 10^24; the example graph H]
2. Tracking Plans in Networks
Extension of LRDT to the transactional domain
– Noise model: sequence of independent instances of G(n, p)
– Signal model: pattern of inserted subgraphs
Susceptible to combinatorial explosion
Noise Model
Classic Erdős–Rényi random graph model G(n, p)
– n = 30
– p = 0.3
Observe L instances J_1, J_2, …, J_L

[Figure: an instance of G(30, 0.3)]
Signal Model
Insert a sequence of graphs H_1, H_2, …, H_m into some fixed, but unknown, location.
Each H_k appears for τ_k time steps.
Each edge has a probability p_V of being observed.
Total plan length: T = 40

[Figure: the five plan stages, each a small graph around a leader node — H_1 (τ_1 = 10), H_2 (τ_2 = 5), H_3 (τ_3 = 10), H_4 (τ_4 = 5), H_5 (τ_5 = 10)]
State Space
Must track both the location and the internal plan time τ.
Possible time states: τ = α, 1 ≤ τ ≤ T, τ = ω
– α indicates the plan has not yet started.
– ω indicates the plan has finished.
State space: 𝒳 = {(H, τ)} ∪ {∅}
– ∅ indicates no plan is present.
Use a diffuse prior on the state space.
Size of the State Space
There are C(30, 3) · (3!/2) = 12,180 ≈ 10^4 possible locations.
Possible time states: τ = α, 1 ≤ τ ≤ 40, τ = ω — 42 possible time states in all.
The state space consists of (12,180)(42) + 1 = 511,561 ≈ 5 × 10^5 states.
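The arithmetic above can be reproduced in a few lines (the interpretation of the slide's 3!/2 = 3 factor as the number of distinguishable placements per vertex triple, e.g. the choice of leader, is an assumption):

```python
from math import comb

# C(30, 3) vertex triples, times the slide's 3!/2 = 3 placements per triple
locations = comb(30, 3) * 3
time_states = 40 + 2                  # tau in {alpha, 1, ..., 40, omega}
states = locations * time_states + 1  # +1 for the no-plan state (empty set)
```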
Motion Model
Advance the probability of each state as follows:

P⁻((H, τ), t) =
– (1 − S(t)) · P((H, α), t − 1)  if τ = α,
– S(t) · P((H, α), t − 1)  if τ = 1,
– P((H, τ − 1), t − 1)  if τ = 2, …, T,
– P((H, T), t − 1) + P((H, ω), t − 1)  if τ = ω,

where

S(t) = 1 / (L − t + 1).
Measurement Model
Let J be an evidence graph. The likelihood function is defined by

L(J | (H_k, τ)) = p_ER^{e(J \ H_k)} · q_ER^{N − e(J ∪ H_k)} · p*^{e(J ∩ H_k)} · q*^{e(H_k \ J)}

and

L(J | ∅) = p_ER^{e(J)} · q_ER^{N − e(J)},

where N is the number of possible edges, q = 1 − p for each subscript, and q* = q_ER · q_V (so p* = 1 − q*).
Update Equation
Update the probability distribution by

P(x, t) = (1/C) · L(J | x) · P⁻(x, t),

where

C = Σ_{x ∈ 𝒳} L(J | x) P⁻(x, t).

Note that x can be (H′, τ′) or ∅.
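The piecewise motion update two slides back translates directly into code. The hypothetical helper `advance` below operates on the time-state marginal for a single candidate location H, with τ encoded as 'alpha', 1..T, 'omega'; it is a sketch of the recursion, not the authors' implementation.

```python
def advance(P, t, T, L):
    """Motion update for one location H: map P[tau] at time t-1 to the
    predicted P_minus[tau] at time t.  S(t) = 1/(L - t + 1) is the hazard
    that a not-yet-started plan starts at this step."""
    S = 1.0 / (L - t + 1)
    P_minus = {}
    P_minus['alpha'] = (1 - S) * P['alpha']        # plan still unstarted
    P_minus[1] = S * P['alpha']                    # plan starts this step
    for tau in range(2, T + 1):
        P_minus[tau] = P[tau - 1]                  # plan advances one step
    P_minus['omega'] = P[T] + P['omega']           # plan has finished (absorbing)
    return P_minus
```

Because the four cases partition the probability mass, each call conserves total probability; the measurement update then reweights the predicted distribution by L(J | x) and renormalizes by C.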
Example: Plan starts at time t = 25
Embedded movie removed
Summary: Tracking Plans in Networks
Proof of concept: LRDT can be extended to the transactional domain
Can be generalized, e.g.:
– May allow "tips" about possible subterfuge
– May allow attributed nodes and links
– Use more complicated network and plan models
– May include a clutter model
Our best attempt to create a rule-based method to "score" nodes based on the links and tips observed faltered
– E.g., 6 malefactors identified, 1 of which is correct
– Misled by tips
Difficulty: the number of states grows quickly
3. Hierarchical Hypothesis Management
Number of hypotheses in the previous example: 5 × 10^5
– Numerically feasible to compute exactly
For larger problems:
– Number of hypotheses rapidly increases
– Impossible to maintain all hypotheses
Solution:
– Group detailed hypotheses into successively coarser ones
– Maintain probabilities on the coarser hypotheses
• High-probability coarse hypotheses get resolved to finer levels
• Low-probability hypotheses perish
Example:
– Coarse hypothesis: "This edge is a member of the sought pattern"
– Finer hypothesis: "This set of edges is a subset of the pattern"
– Finest hypothesis: "This set of edges is the sought pattern"
Feature Selection
As features become larger…
– they serve better to distinguish target from noise;
– computational intensity increases.

[Figure: hierarchy of feature levels k = 1, 2, 3, …, 11, ranging from "easy calculation, poor discrimination" at small k to "good discrimination, hard calculation" at large k]
Example of HHM

A larger problem: find a given pattern H, inserted 20% of the time in a fixed, unknown location (out of 5.2 × 10^17 possible locations), in 125 instances of random noise on 100 nodes.
Too hard for direct solution: in each instance, 160 trillion random copies of the pattern H obscure the real one.
The HHM algorithm recursively prunes the hypothesis space until it is feasible to detect the target. The final output is the configuration of the target pattern in the data.

[Figure: the pattern H]
HHM in Action
[Figure: histograms of piece counts over nodes 0–120, one per level — counts of all 4950 edges, the top 719 edges, the top 220 2-edge pieces, and the 3- and 4-edge pieces]

1-edge pieces: pick the top 719 edges out of 4950.
2-edge pieces: within these 719 edges there are 10,222 2-edge pieces; pick the top 220 of these.
3-edge pieces: among the edges in the top 220 2-edge pieces there are 2553 3-edge pieces; pick the top 28 of these.
4-edge pieces: among the edges in the top 28 3-edge pieces there are 145 4-edge pieces. The edges in the top 17 of these form the sought pattern. Pattern found!
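A toy version of this coarse-to-fine pruning loop is sketched below. Scoring a piece by the number of instances in which all of its edges co-occur is a stand-in for the likelihood-based scores used here, and the `widths` argument plays the role of the 719/220/28/17 cutoffs; none of this is the authors' algorithm.

```python
import itertools

def score(piece, instances):
    """Number of instances in which every edge of the piece appears."""
    return sum(all(e in inst for e in piece) for inst in instances)

def hhm_prune(instances, edges, widths):
    """Coarse-to-fine HHM sketch: keep the top-scoring k-edge pieces,
    then grow (k+1)-edge pieces only from the surviving edges."""
    level = [frozenset([e]) for e in edges]
    for i, w in enumerate(widths):
        # Keep the w pieces whose edges co-occur most often.
        level = sorted(level, key=lambda p: score(p, instances),
                       reverse=True)[:w]
        if i + 1 < len(widths):
            pool = set().union(*level)   # edges surviving this level
            level = list({p | {e} for p in level for e in pool
                          if e not in p})
    return level
```

On a synthetic problem where a fixed 3-edge pattern appears in every instance and noise edges appear only sporadically, two pruning levels suffice to recover exactly the pattern's edges.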
HHM in Action
Embedded movie removed
Summary: HHM
LRDT approaches can have enormous state spaces
In the classical domain, multigrid and particle methods have been developed to tame this problem
– Both rely on the existence of an underlying metric space
In the intelligence domain, state spaces often lack a metric structure, so new technology is needed
States can often be grouped into natural, user-defined hierarchies, which are then amenable to HHM
Key research areas:
– Optimal threshold setting, e.g., better than Monte Carlo
– Feature selection, e.g., connected k-sets are more discriminating than arbitrary k-sets
4. References
J. Ferry and D. Lo, "Fusing Transactional Data to Detect Threat Patterns," Proc. 9th International Conference on Information Fusion, Florence, Italy, July 2006.
G. Godfrey, J. Cunningham, and T. Tran, "A Bayesian, nonlinear particle filtering approach for tracking the state of terrorist operations," Proc. Military Applications Society Conference on Homeland Security in the 21st Century, Mystic, CT, July 2006.
T. Mifflin, C. Boner, and G. Godfrey, "Detecting Terrorist Activities in the 21st Century: A theory of detection for transactional networks," in Emergent Information Technologies and Enabling Policies for Counter-Terrorism, R. Popp and J. Yen, eds., Wiley-IEEE, June 2006.
C. Boner, "Novel, Complementary Technologies for Detecting Threat Activities within Massive Amounts of Transactional Data," Proc. International Conference on Intelligence Analysis, Tysons Corner, VA, May 2005.
C. Boner, "Automated Detection of Terrorist Activities through Link Discovery within Massive Databases," Proc. AAAI Spring Symposium on AI Technologies for Homeland Security, Palo Alto, CA, March 2005.
T. Mifflin, C. Boner, G. Godfrey, and J. Skokan, "A random graph model of terrorist transactions," Proc. IEEE Aerospace Conference, Big Sky, MT, March 2004.