Lab for System Informatics and Data Analytics (SIDA)

Dynamic Data-Driven Adaptive Sampling and Monitoring of Big Spatial-Temporal Data Streams for Real-Time Solar Flare Detection

Department of Industrial and Systems Engineering

University of Wisconsin-Madison

Date: 8/8/2017

Dr. Kaibo Liu

Outline

• Motivation

• State of the art

• Proposed DDDAS framework

– Data-Driven Dynamic Sampling Strategy

• Case study

• Conclusion

Motivation

• With advances in sensing techniques and data collection capability, Big Data Streams have become widely available in many DoD applications.

• This provides an unprecedented opportunity to gain system-wide situational awareness through real-time anomaly detection and fault localization.

• The emerging NASA Solar Dynamics Observatory (SDO) monitors dynamic solar activity continuously, 24 hours a day, 7 days a week. It

– generates a high-resolution image every 0.75 seconds

– produces 1.5 TB of data per day

[Figure: solar flare detection. Source: NASA]

Applications and Importance

• Solar flare activity is closely tied to Air Force equipment and applications. Solar flares can

– significantly affect Earth's ionosphere, causing hours-long disruptions in radio communications

– affect GPS receivers and satellites, making search and rescue in a war zone very difficult

– lead to failures in large-scale power grids, with cascading effects

• A real-time solar flare detection system that exploits the Big Data Streams of solar images is highly desired.

Challenges

• Big Data Streams impose critical requirements and resource constraints on real-time data communication and processing

– Only 6 images can be sent back every minute for real-time analysis, given a transmission rate of 130 million bits/second

• The occurrence of solar flares is naturally

– complicated (depends on the solar cycle and on the inherent dynamics and randomness of solar activity)

– sparse (with a small signal-to-noise ratio (SNR))

– transient (lasts only minutes and is hard to predict)

• Currently, there is no efficient online monitoring scheme tailored to these unique characteristics.
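
As a rough back-of-the-envelope check, the imaging rate quoted in the Motivation slide (one image every 0.75 seconds) can be set against the 6 images per minute available for real-time analysis. The variable names are illustrative, and the sketch assumes that both figures refer to the same image stream.

```python
# Illustrative arithmetic only; assumes both figures describe the same image stream.
SECONDS_PER_IMAGE = 0.75       # one image every 0.75 seconds (from the slides)
IMAGES_SENT_PER_MINUTE = 6     # images available for real-time analysis (from the slides)

images_generated_per_minute = 60 / SECONDS_PER_IMAGE
fraction_sent = IMAGES_SENT_PER_MINUTE / images_generated_per_minute

print(f"images generated per minute: {images_generated_per_minute:.0f}")
print(f"fraction available in real time: {fraction_sent:.1%}")
```

Under that assumption, only a small share of the generated images reaches the ground in time, which is the resource constraint that the sampling strategy has to work within.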

Proposed DDDAS Framework

[Figure: DDDAS framework linking (a) applications (original solar image), (b) applications modeling (updated solar image), (c) application measurement systems and methods (dynamic sampling), and (d) mathematical and statistical algorithms (SPC chart). Arrows labeled: sample data, update model, update SPC, update sampling.]

• A dynamically updated spatial-temporal statistical model that fully characterizes the changing background

• A dynamic sampling algorithm that actively decides which data streams to observe given the resource constraints

• A scalable and robust statistical process control (SPC) scheme that effectively combines the information from significant data streams to produce an overall global monitoring system

State of the art and innovative idea

• Existing approaches to process monitoring

– focus on a fixed sub-region (rigid spatial domain)

• assume that the locations of anomalous events are known

• fail to capture the dynamic features of solar flare events

– sample the whole image at a fixed frequency (rigid temporal domain)

• result in a large detection delay or miss the event

• Adaptive sampling strategies: require a large amount of historical information

– Top-r based Adaptive Sampling (TRAS) at DDDAS 2016

• Innovative idea:

[Figure: random sampling versus fixed sampling. Blue: sampled data streams; red: anomaly regions; black: overlap.]

Problem formulation and objective

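[Figure: an $m \times n$ grid of $p = m \times n$ data streams; the $k$th data stream has location $\boldsymbol{x}_k$ and observation $Y_{k,t}$; the shifted region $C$ is marked.]

At time $t$, $\boldsymbol{Y}_t = (Y_{1,t}, \ldots, Y_{p,t})'$.

When $t < \tau$ (the change point), the process is in control:

$Y_{k,t} \overset{\text{i.i.d.}}{\sim} N(0, 1)$

When $t > \tau$, the process is out of control; the mean of the data streams in an unknown region $C$ shifts to $\delta$:

$Y_{k,t} \overset{\text{i.i.d.}}{\sim} N(\delta \chi_C(\boldsymbol{x}_k), 1)$

where $\chi_C(\boldsymbol{x}_k)$ equals 1 if $\boldsymbol{x}_k \in C$ and 0 otherwise.

• Clustered shift: the shifted data streams lie together in the region $C$.

• Limited resources: at each time $t$, only $q$ out of $p$ variables can be observed.

Goal: Based on dynamic observations in real time, actively decide which data streams to observe at the next time point, so that anomalous events are detected quickly while a system-wide false alarm rate is maintained.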
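
The data model above can be simulated in a few lines. The following is a minimal sketch, not the authors' code: the function name, the boolean mask used to describe the region $C$, and the choice to treat $t \le \tau$ as in control (the slides leave $t = \tau$ unspecified) are all assumptions made for illustration.

```python
import numpy as np

def simulate_streams(m, n, tau, delta, region, n_steps, seed=0):
    """Simulate p = m * n data streams on an m x n grid.

    region: boolean (m, n) mask marking the shifted region C (hypothetical input).
    Returns an array of shape (n_steps, m * n); row t - 1 holds Y_t.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n_steps, m * n))   # in control: N(0, 1)
    shifted = region.reshape(-1)                # chi_C(x_k) for each stream k
    t = np.arange(1, n_steps + 1)               # time index starts at 1
    # Assumption: rows with t > tau are out of control; t <= tau is in control.
    y[np.ix_(t > tau, shifted)] += delta
    return y

# Example: a 3 x 3 cluster shifts by delta = 2 after tau = 50 on a 10 x 10 grid.
region = np.zeros((10, 10), dtype=bool)
region[2:5, 3:6] = True
y = simulate_streams(m=10, n=10, tau=50, delta=2.0, region=region, n_steps=100)
```

In the monitoring problem only $q$ of the $p$ columns of each row would be visible at a given time; the full array is generated here purely for convenience.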

General strategy 1: Use local statistics

• To propose a scalable monitoring scheme, we define two local statistics $W_{k,t}^{(1)}$ and $W_{k,t}^{(2)}$ for each stream to describe the likelihood of an upward or downward shift

– $W_{k,t}^{(1)}$: the likelihood of an upward shift

– $W_{k,t}^{(2)}$: the likelihood of a downward shift

[Figure: $m \times n$ grid of data streams $Y_{1,t}, Y_{2,t}, \ldots, Y_{k,t}, \ldots, Y_{p,t}$, each with its local statistics.]

General strategy 2: Allocate observed data streams adaptively

• Wide search strategy (in control): observed data streams should be placed evenly so that a shifted data stream is observed rapidly.

• Deep search strategy (out of control): observed data streams should accumulate in the suspected region.

To combine these two strategies, we separate the observable variables into two groups: $q_{W,t}$ W-observations and $q_{D,t}$ D-observations. The proportion of $q_{W,t}$ to $q_{D,t}$ is adjusted adaptively.

Combination of the two strategies, for example:

– $q_{W,t} = 6$ W-observations

– $q_{D,t} = 4$ D-observations

The proposed Spatial Adaptive Sampling and Monitoring (SASAM) algorithm

Three questions:

[i] How to allocate W-observations?

[ii] How to update local statistics?

[iii] How to determine the number and location of D-observations?

Flowchart: begin with $q_{D,1} = 0$. At each time $t$:

– set $q_{W,t} = q - q_{D,t}$ and allocate W-observations [i]

– update local statistics and charting statistics [ii]

– determine $q_{D,t+1}$ and allocate D-observations for the next period [iii]

– if the charting statistic exceeds $H$ (Y), raise an alarm; otherwise (N), set $t \leftarrow t + 1$ and repeat

[i] Allocate W-observations

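• Objective: select $|\mathcal{O}_{W,t}| = q_{W,t}$ observations

– with a good space-filling property

– with diversity from the previous selection

• Method

– Generate $I$ candidate Latin hypercube designs.

– For each $i = 1, \ldots, I$, calculate

$m_{1,i} = \min_{\boldsymbol{x}_k, \boldsymbol{x}_j \in \mathcal{O}_{W,t},\ k \neq j} d(\boldsymbol{x}_k, \boldsymbol{x}_j)$ [measuring the space-filling property]

$m_{2,i} = \min_{\boldsymbol{x}_k \in \mathcal{O}_{W,t},\ \boldsymbol{x}_j \in \mathcal{O}_{W,t-1}} d(\boldsymbol{x}_k, \boldsymbol{x}_j)$ [measuring diversity from the previous design]

– Pick the candidate selection with the maximum index $m_i = m_{1,i} + m_{2,i}$.

[Figure: an example with $|\mathcal{O}_{W,t}| = 10$.]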
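
The candidate-and-score procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the Euclidean distance for $d$, the mapping of a unit-square Latin hypercube design onto pixel coordinates, the number of candidates, and all function names are hypothetical. The sketch also assumes $q_{W,t} \le \min(m, n)$ so that no pixel is drawn twice.

```python
import numpy as np

def latin_hypercube_pixels(q, m, n, rng):
    """Draw q pixel locations on an m x n grid from a random Latin hypercube design."""
    # One point per stratum in each dimension; the strata are paired at random.
    u_row = (rng.permutation(q) + rng.random(q)) / q
    u_col = (rng.permutation(q) + rng.random(q)) / q
    rows = np.minimum(np.floor(u_row * m), m - 1)
    cols = np.minimum(np.floor(u_col * n), n - 1)
    return np.column_stack((rows, cols)).astype(int)

def min_pairwise_distance(a, b, exclude_self=False):
    """Smallest Euclidean distance between a point of a and a point of b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if exclude_self:                 # a and b are the same design: ignore k == j
        np.fill_diagonal(d, np.inf)
    return d.min()

def allocate_w_observations(q_w, m, n, previous, n_candidates=50, seed=0):
    """Return the candidate design with the largest index m_1 + m_2."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        cand = latin_hypercube_pixels(q_w, m, n, rng)
        m1 = min_pairwise_distance(cand, cand, exclude_self=True)  # space filling
        m2 = min_pairwise_distance(cand, previous)                 # diversity
        if m1 + m2 > best_score:
            best, best_score = cand, m1 + m2
    return best

# Example: 10 W-observations on a 20 x 20 grid, given a previous selection.
previous = latin_hypercube_pixels(10, 20, 20, np.random.default_rng(1))
w_locations = allocate_w_observations(10, 20, 20, previous)
```

Scoring a fixed number of random candidates keeps the cost per time step bounded, at the price of not finding the best possible design.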

[ii] Update Local statistics

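Observations at time $t$: $Y_{j,t}$, $j \in \mathcal{O}_{W,t} \cup \mathcal{O}_{D,t}$

Conventional CUSUM chart:

$W_{k,t}^{(1)} = \left( W_{k,t-1}^{(1)} + u_{\min} Y_{k,t} - \frac{u_{\min}^2}{2} \right)^+$

$W_{k,t}^{(2)} = \left( W_{k,t-1}^{(2)} - u_{\min} Y_{k,t} - \frac{u_{\min}^2}{2} \right)^+$

With spatial consideration (clustered shift: nearby data streams have a similar likelihood of shift):

$W_{k,t}^{(1)} = \left( W_{k,t-1}^{(1)} + \sum_{j \in \mathcal{O}_{W,t} \cup \mathcal{O}_{D,t}} K_h(\boldsymbol{x}_k - \boldsymbol{x}_j) \left( u_{\min} Y_{j,t} - \frac{u_{\min}^2}{2} \right) \right)^+$

$W_{k,t}^{(2)} = \left( W_{k,t-1}^{(2)} + \sum_{j \in \mathcal{O}_{W,t} \cup \mathcal{O}_{D,t}} K_h(\boldsymbol{x}_k - \boldsymbol{x}_j) \left( -u_{\min} Y_{j,t} - \frac{u_{\min}^2}{2} \right) \right)^+$

$K_h(\cdot)$: influence function

$W_{k,t}^{(1)}$, $W_{k,t}^{(2)}$: the $k$th local statistics, indicating a +/- shift

$W_{k,t} = \max\left( W_{k,t}^{(1)}, W_{k,t}^{(2)} \right)$

Charting statistic: $S_t = \max_{1 \le k \le p} W_{k,t}$, which describes the likelihood that the most suspected data stream has a shift.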
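
One way to read the spatially weighted update is as a matrix-vector product between kernel weights and the per-observation CUSUM increments. The sketch below assumes a Gaussian form for the influence function $K_h$, which the slides do not specify, and takes the kernel to weight the whole increment $\pm u_{\min} Y_{j,t} - u_{\min}^2/2$ of each observed stream $j$; the function and argument names are hypothetical.

```python
import numpy as np

def gaussian_kernel(dist, h):
    """Assumed influence function K_h; the slides do not give its form."""
    return np.exp(-0.5 * (dist / h) ** 2)

def update_local_statistics(w_up, w_down, locations, observed_idx, y_observed, u_min, h):
    """One update of the spatially weighted two-sided CUSUM statistics.

    w_up, w_down: length-p arrays holding W^(1) and W^(2) from time t - 1.
    locations: (p, 2) array of stream locations x_k.
    observed_idx: indices j of the streams observed at time t.
    y_observed: the observations Y_{j,t} for those streams.
    Returns the updated W^(1), W^(2), W and the charting statistic S_t.
    """
    diff = locations[:, None, :] - locations[observed_idx][None, :, :]
    weights = gaussian_kernel(np.linalg.norm(diff, axis=-1), h)   # shape (p, n_observed)
    w_up = np.maximum(w_up + weights @ (u_min * y_observed - u_min**2 / 2), 0.0)
    w_down = np.maximum(w_down + weights @ (-u_min * y_observed - u_min**2 / 2), 0.0)
    w = np.maximum(w_up, w_down)
    return w_up, w_down, w, w.max()

# Example: a 5 x 5 grid where only the center stream is observed, with Y = 3.
locations = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
w_up, w_down, w, s_t = update_local_statistics(
    np.zeros(25), np.zeros(25), locations,
    observed_idx=np.array([12]), y_observed=np.array([3.0]), u_min=1.0, h=1.0,
)
```

The dense weight matrix has $p$ rows, so for an image-sized $p$ a practical version would probably restrict each observation's influence to a local window; that refinement is not shown here.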

[iii] Allocate D-observations at the next time point

• Where to allocate?

– Nearest neighbors of the most suspected stream

• How many?

– More D-observations for a higher chance of shift

– Similar to variable sample size charts

– $q_{D,t+1} = f_\theta(S_t)$, where $q_{D,t+1}$ is the number of D-observations for time $t + 1$, $S_t$ is the charting statistic, and $f_\theta$ is an increasing function described by some parameter $\theta$. We apply a linear function.

– Li and Qiu (2014) used this technique on a variable sampling interval chart

[Figure: local statistics with the most suspected data stream marked; example with $q_{D,t+1} = 6$.]
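
The two decisions above (how many D-observations, and where) can be sketched as two small functions. The slides state only that $f_\theta$ is increasing and linear, so the particular form used here (slope $\theta$, rounded down, capped at a hypothetical `q_max`) is an assumption, as is counting the most suspected stream itself among its nearest neighbors.

```python
import numpy as np

def number_of_d_observations(s_t, theta, q_max):
    """Assumed linear f_theta: floor(theta * S_t), clipped to [0, q_max]."""
    return int(np.clip(np.floor(theta * s_t), 0, q_max))

def allocate_d_observations(w, locations, q_d):
    """Indices of the q_d streams nearest to the most suspected stream."""
    if q_d == 0:
        return np.empty(0, dtype=int)
    center = locations[np.argmax(w)]          # most suspected stream
    dist = np.linalg.norm(locations - center, axis=1)
    return np.argsort(dist, kind="stable")[:q_d]

# Example: 5 x 5 grid, center stream most suspected with W = 4.
locations = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
w = np.zeros(25)
w[12] = 4.0
q_d = number_of_d_observations(w.max(), theta=1.5, q_max=8)
d_idx = allocate_d_observations(w, locations, q_d)
```

Capping the count keeps some budget for W-observations, so the wide search never stops entirely while the deep search concentrates on one region.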

Summary

[i] Allocate W-observations, with $q_{W,t} = q - q_{D,t}$: generate a number of candidate samples and select the best one in terms of the space-filling property.

[ii] Update local statistics and charting statistics: update local statistics with all observations of nearby variables using a kernel function.

[iii] Determine $q_{D,t+1}$ and allocate D-observations for the next period: new D-observations are selected around the most suspected stream; their number is determined by the charting statistic.

The procedure begins with $q_{D,1} = 0$ and repeats with $t \leftarrow t + 1$ until the charting statistic exceeds $H$, at which point an alarm is raised.
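
To show how the three steps fit together, here is a compact, simplified sketch of the loop. It is not the authors' implementation: it scores random candidate subsets rather than Latin hypercube designs in step [i], assumes a Gaussian kernel in step [ii], uses an assumed linear rule capped at $q - 1$ in step [iii], and checks the alarm before allocating D-observations. All names and default values are hypothetical.

```python
import numpy as np

def pairwise_min(a, b, same=False):
    """Smallest Euclidean distance between points of a and points of b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if same:
        np.fill_diagonal(d, np.inf)
    return d.min()

def sasam_sketch(stream, locations, q, H, u_min=1.0, h=1.0, theta=1.0,
                 n_candidates=20, seed=0):
    """Run the loop on stream (shape (T, p)); return the 1-based alarm time or None."""
    rng = np.random.default_rng(seed)
    n_steps, p = stream.shape
    w_up, w_down = np.zeros(p), np.zeros(p)
    d_idx = np.empty(0, dtype=int)               # q_{D,1} = 0
    prev_w_idx = None
    for t in range(n_steps):
        # [i] W-observations: best of several random candidate subsets (simplification)
        q_w = q - d_idx.size
        pool = np.setdiff1d(np.arange(p), d_idx)
        best_idx, best_score = None, -np.inf
        for _ in range(n_candidates):
            cand = rng.choice(pool, size=q_w, replace=False)
            score = pairwise_min(locations[cand], locations[cand], same=True)
            if prev_w_idx is not None:
                score += pairwise_min(locations[cand], locations[prev_w_idx])
            if score > best_score:
                best_idx, best_score = cand, score
        prev_w_idx = best_idx
        observed = np.concatenate((best_idx, d_idx))
        # [ii] kernel-weighted two-sided CUSUM update (Gaussian kernel assumed)
        y = stream[t, observed]
        dist = np.linalg.norm(locations[:, None, :] - locations[observed][None, :, :], axis=-1)
        k = np.exp(-0.5 * (dist / h) ** 2)
        w_up = np.maximum(w_up + k @ (u_min * y - u_min**2 / 2), 0.0)
        w_down = np.maximum(w_down + k @ (-u_min * y - u_min**2 / 2), 0.0)
        w = np.maximum(w_up, w_down)
        s_t = w.max()
        if s_t > H:
            return t + 1                         # alarm
        # [iii] D-observations for the next period, around the most suspected stream
        q_d = int(np.clip(np.floor(theta * s_t), 0, q - 1))
        center = locations[np.argmax(w)]
        d_idx = np.argsort(np.linalg.norm(locations - center, axis=1), kind="stable")[:q_d]
    return None

# Example: 8 x 8 grid, q = 10, a 3 x 3 region shifting by 3 after time 20.
locations = np.array([(i, j) for i in range(8) for j in range(8)], dtype=float)
stream = np.random.default_rng(0).standard_normal((60, 64))
in_region = (locations[:, 0] < 3) & (locations[:, 1] < 3)
stream[20:, in_region] += 3.0
alarm_time = sasam_sketch(stream, locations, q=10, H=25.0)
```

The threshold `H = 25.0` in the example is arbitrary; in practice $H$ would be calibrated, for instance by simulation, to reach a target in-control average run length.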

A real case study - solar flare detection

Dataset:

• $p = 232 \times 292 = 67744$ dimensional variables

• $q = 500$ observable pixels (~0.7% available)

• One solar flare starts from frame 87

• Both charts are adjusted to have the same in-control average run length (IC-ARL)

Results:

• TRAS triggers an alarm at t = 95

• SASAM triggers an alarm at t = 91

[Figure: charting statistics over time, one panel each for SASAM and TRAS.]
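
The figures quoted for the case study can be checked with simple arithmetic. The sketch below measures detection delay as the alarm frame minus the frame at which the flare starts; that definition, like the variable names, is an assumption made for illustration.

```python
# Figures taken from the case-study slide; the delay definition is an assumption.
rows, cols = 232, 292
p = rows * cols                    # number of data streams (pixels)
q = 500                            # observable pixels per frame
observable_fraction = q / p

flare_start = 87
alarm_time = {"TRAS": 95, "SASAM": 91}
delays = {method: t - flare_start for method, t in alarm_time.items()}

print(p, f"{observable_fraction:.2%}")
for method, delay in delays.items():
    print(f"{method}: detection delay of {delay} frames")
```

Under this definition, SASAM raises its alarm with half the delay of TRAS on this dataset, while both observe the same fraction of pixels.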

Video clip

Result

[Figure: observed streams under TRAS and SASAM. SASAM triggers an alarm at time 91; TRAS triggers an alarm at time 95. Legend: observable stream; △ top-10 observable stream in the TRAS method.]

Summary of the proposed sampling strategy

• A systematic adaptive sampling strategy is proposed for real-time monitoring of Big Data streams with dynamically selected partial information.

• Adaptability:

– Integrate two sampling strategies (wide or deep)

– Quickly detect a wide range of possible changes with no prior knowledge of the potential anomaly events by adaptively adjusting to the event locations;

– Actively select the data streams to observe from the whole streaming data to maximize the sensitivity for anomaly detection with consideration of resource constraints.

Proposed DDDAS new ideas

• Applications modeling

– Objective: establish a spatial-temporal statistical model for capturing the changing baseline

– Existing approaches: capture either spatial or temporal characteristics only; extended models are rigid and limited to specific applications; require full observations

– Proposed new methodology: capture both domains by integrating graphical models with matrix factorization; generic transfer learning framework for adaptive learning; allow dynamic partial observations

• Application measurement systems and methods

– Objective: establish an effective sampling strategy to decide which data streams to observe

– Existing approaches: sampling over fixed sub-regions (rigid spatial domain) or whole data frames (rigid temporal domain)

– Proposed new methodology: dynamic sampling of partial data streams over the spatial domain at each acquisition time, based on resource constraints

• Mathematical and statistical algorithms

– Objective: establish a scalable and robust SPC scheme to maximize change detection capability

– Existing approaches: search for all possible fault scenarios; require all historical data; assume potential fault scenarios are known; high computational costs; require full observations

– Proposed new methodology: automatically identify and localize fault scenarios; require current observations and a summary statistic; assume potential fault scenarios are unknown; only linear complexity in the number of data streams; allow dynamic partial observations

Conclusion and Impact to Air Force

• It is critically important for the Air Force to make rapid battlefield decisions based on Big Data collected continuously, in real time, from massive numbers of sensors.

[Figure panels: (a) intrusion detection, (b) unmanned vehicle surveillance, (c) cybersecurity.]

• Early detection and localization of these anomaly events will enhance system-wide situational awareness to support warfighters and military operations, prevent damage, reduce cost, improve efficiency, and save billions of lives.

[Figure: snapshots of the temperature profile, from the Climate Institute at ORNL.]

Thank you for coming!

Questions?
