The Math behind Elastic Machine Learning

Elastic 1st March 2018 @elastic The Math behind Elastic Machine Learning Tom Veasey, Software Engineer, Machine Learning Hendrik Muhs, Software Engineer, Machine Learning


Page 1: The Math behind Elastic Machine Learning

Elastic

1st March 2018

@elastic

Tom Veasey, Software Engineer, Machine Learning

Hendrik Muhs, Software Engineer, Machine Learning

Page 2: Elastic Stack

[Diagram labels: Beats, Logstash, Kibana, Elasticsearch, X-Pack (Security, Alerting, Monitoring, Reporting, Graph, Machine Learning)]

● Single install - deployed with X-Pack

● Data gravity - analyzes data from the same cluster

● Contextual - anomalies and data stored together

● Scalable - jobs distributed across nodes

● Resilient - handles node failure

Page 3: Machine Learning in the Elastic Stack

Page 4: Machine Learning in the Elastic Stack

Page 5: Machine Learning in the Elastic Stack

In a nutshell

• analysis of time series (e.g. log data)

• help users understand their data

• modeling of the data in order to detect anomalies and predict future values

• be operationally useful: real-time

Page 6: Machine Learning News

Big ML upgrade in 6.1 / 6.2

• smarter job placement

• automatic job creation

• data visualizer

• population analysis job wizards

• on-demand forecasting

• scheduled events

Page 7: ES Machine Learning Guided Tour

What happens inside the black box

• What basic concepts do I need to know about?

• What is a model? How many are out there?

• From Detection to Projection: Forecasting

Page 8: Creating an ML Job from a backstage perspective

Data Buckets

Data is aggregated into buckets

[Chart panels: 1 hour, 10 minutes, raw]

Page 9: Creating an ML Job from a backstage perspective

Data Transformation

Functions define how data is transformed (mean, count, sum, ...)

[Chart panels: mean, max, raw]
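The two steps above (bucketing, then applying a function per bucket) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the helper name `bucketize`, the `(epoch_seconds, value)` input shape and the sample data are hypothetical and are not part of the Elastic ML API.

```python
from collections import defaultdict
from statistics import mean


def bucketize(points, bucket_span_s, func):
    """Group (epoch_seconds, value) points into fixed-width buckets and
    reduce every bucket with func (e.g. mean, max, sum, or len for count)."""
    buckets = defaultdict(list)
    for ts, value in points:
        start = ts - ts % bucket_span_s  # align timestamp to its bucket start
        buckets[start].append(value)
    return {start: func(values) for start, values in sorted(buckets.items())}


# Hypothetical raw data: one reading every 5 minutes for one hour.
raw = [(i * 300, float(i)) for i in range(12)]

ten_min_mean = bucketize(raw, 600, mean)   # 10-minute buckets, mean function
one_hour_max = bucketize(raw, 3600, max)   # 1-hour bucket, max function
```

A larger bucket span smooths the series and reduces the number of values the model has to process, at the cost of time resolution.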

Page 10: ML Model

What is stored inside of a model

self-contained artifact

• online approach (does not require (re-)access to the raw data)

• stores features (e.g. seasonality) as well as condensed historic information

• evolving: up-to-date to the last received bucket

• adapts to new data (up to complete re-learning)
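To make the "online approach" concrete, here is a minimal sketch of a model that is updated one bucket at a time and never revisits raw data. It uses Welford's running mean and variance as a stand-in; the class name `OnlineStats` is hypothetical, and the actual Elastic models store far richer state (trend, seasonality, residual distributions).

```python
from dataclasses import dataclass


@dataclass
class OnlineStats:
    """Running mean and variance (Welford's algorithm).

    Only three numbers are stored, so the state is a small self-contained
    artifact that can be persisted and restored without the raw data."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined for fewer than two values, so return 0.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


model = OnlineStats()
for bucket_value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    model.update(bucket_value)  # model is up to date after every bucket
```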

Page 11: Creating an ML Job from a backstage perspective

Models

The number of models built depends on detectors and data splits

[Diagram axes: detectors, data splits]

• a detector defines the fields and the function

• a data split allows an individual model per split
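As a rough illustration of how the model count grows, the sketch below assumes one model per detector, multiplied by the number of distinct values of that detector's split field. The dictionary keys (`function`, `field`, `split_field`) and the example field names are hypothetical, chosen for readability rather than taken from the job configuration format.

```python
def count_models(detectors, split_values):
    """Count models, assuming one model per detector per split value.

    detectors:    list of dicts with 'function', 'field', 'split_field'
    split_values: dict mapping a split field to its distinct values
    """
    total = 0
    for detector in detectors:
        split = detector.get("split_field")
        # A detector without a split yields exactly one model.
        total += len(split_values[split]) if split else 1
    return total


detectors = [
    {"function": "mean", "field": "response_time", "split_field": "host"},
    {"function": "count", "field": None, "split_field": None},
]
split_values = {"host": {"web-1", "web-2", "web-3"}}

n_models = count_models(detectors, split_values)  # 3 (per host) + 1
```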

Page 16: The Anatomy of a Model

• What we model and why

• trend model

• residual model

• modelling anomalous periods

• dealing with change, dealing with outliers

Page 17: ML Model for forecast

From the past to the future (> 6.1)

An ML model describes what is ‘usual’

• use existing models to project into the future (on-demand)

• provide a visualization of the projection

• can be run at different points in time

Page 18: ML Forecast

From the past to the future

Design goals

• should not interfere with real-time analysis, runs in parallel

• low resource usage

• multi-user, repeatable

Page 19: ML Forecast

Request for forecast

Take a copy of corresponding models

Continue real time processing

Concurrently run forecast

Forecast results get written back into the ML result index

[Diagram labels: Elasticsearch + X-Pack, Machine Learning Job, Realtime Data Feed, Forecast Request, Detection Results, Forecast Results]
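The copy-then-forecast flow can be sketched as follows. `TrendModel` is a hypothetical toy model (last value plus last slope), not the Elastic implementation, and the example runs sequentially for simplicity; the point is only that the forecast works on a snapshot, so the live model keeps processing real-time data unaffected.

```python
import copy


class TrendModel:
    """Toy linear-trend model; a hypothetical stand-in for a real ML model."""

    def __init__(self):
        self.last_time = 0
        self.level = 0.0
        self.slope = 0.0

    def update(self, t, value):
        if t > self.last_time:
            self.slope = (value - self.level) / (t - self.last_time)
        self.level, self.last_time = value, t

    def predict(self, t):
        return self.level + self.slope * (t - self.last_time)


def forecast(model, horizon):
    """Forecast from a deep copy so the live model is never touched."""
    snapshot = copy.deepcopy(model)
    start = snapshot.last_time
    return [snapshot.predict(start + h) for h in range(1, horizon + 1)]


live = TrendModel()
live.update(1, 10.0)
live.update(2, 12.0)

forecast_results = forecast(live, 3)  # computed from the snapshot
live.update(3, 20.0)                  # real-time processing continues
```

Because each request takes its own snapshot, the same forecast can be repeated or run by several users without interfering with detection.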

Page 20: Forecast challenges

Page 21: Forecasting goal

“Accurate time series forecasting for a range of realistic data characteristics with minimum human intervention”

Page 22: Forecasting challenge: multiscale effects

[Figure: two candidate forecasts, each marked "Forecast ?", over a time axis]

Page 23: Forecasting challenge: change points

[Figure: time series with a marked change point]

Page 24: Time series modelling: handling change

[Diagram labels: Model, Controller]

Page 25: Time series modelling: handling change

Page 26: Multiscale effects

• Reversion to behaviour on a given time frame is typical

• Using one model, even with control of the “time window”, can’t capture this

• Use an ensemble and adjust the weights based on how far ahead to predict
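One way to picture the ensemble idea is a weighted average in which each trend's weight depends on how close its time scale is to the look-ahead time. The weighting function below, `exp(-|log(look_ahead / scale)|)`, is purely an assumption made for illustration; the slides do not state the weights actually used.

```python
import math


def ensemble_forecast(predictions, time_scales, look_ahead):
    """Weighted average of per-time-scale trend predictions.

    The weight of each trend is largest when its time scale matches the
    look-ahead time and decays as the two diverge (assumed weighting)."""
    weights = [math.exp(-abs(math.log(look_ahead / s))) for s in time_scales]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total


# Hypothetical trends fitted on 1-day, 7-day and 30-day windows.
time_scales = [1, 7, 30]
predictions = [100.0, 120.0, 150.0]

short_term = ensemble_forecast(predictions, time_scales, look_ahead=1)
long_term = ensemble_forecast(predictions, time_scales, look_ahead=30)
```

With these inputs the short look-ahead stays close to the 1-day trend, while the long look-ahead is pulled towards the 30-day trend.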

Page 27: Change points

Change Detection (BIC)

H0: no change

H1: time shift

H2: level shift

Change model: m × 1{change}
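A minimal sketch of BIC-based change detection, restricted to H0 (no change) versus H2 (level shift) at a known candidate position. It assumes Gaussian residuals, for which BIC reduces (up to a constant) to n·ln(RSS/n) + k·ln(n); the time-shift hypothesis H1 and the search over candidate positions are omitted, and the function names are hypothetical.

```python
import math


def bic(residuals, n_params):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    sigma2 = max(rss / n, 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(sigma2) + n_params * math.log(n)


def detect_level_shift(values, split):
    """Compare H0 (one mean) with H2 (separate means before/after split).

    Returns the selected hypothesis and the shift magnitude m."""
    mean_all = sum(values) / len(values)
    h0 = bic([v - mean_all for v in values], n_params=1)

    before, after = values[:split], values[split:]
    mean_b = sum(before) / len(before)
    mean_a = sum(after) / len(after)
    residuals = [v - mean_b for v in before] + [v - mean_a for v in after]
    h2 = bic(residuals, n_params=2)

    # The lower BIC wins; the extra parameter must pay for itself.
    return ("level shift", mean_a - mean_b) if h2 < h0 else ("no change", 0.0)


shifted = [10.1, 9.9, 10.0, 10.2, 9.8, 15.1, 14.9, 15.0, 15.2, 14.8]
flat = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.8]

shifted_result = detect_level_shift(shifted, split=5)
flat_result = detect_level_shift(flat, split=5)
```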

Page 28: Change points

Page 29: Change points

Page 30: Change points

• Need explicit handling of change points

• Detect and model level shifts, time shifts, etc.

• Roll out multiple possible realisations of the change model to forecast and use these to get expectation and confidence intervals
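Rolling out realisations can be read as a Monte Carlo simulation: sample many possible futures from the change model, then summarise them per time step. The sketch below assumes a deliberately simple change model (at each step a level shift occurs with probability `p_change` and has a Gaussian magnitude); all parameter names and values are hypothetical.

```python
import random
import statistics


def rollout_forecast(level, horizon, p_change, shift_sd, n_paths=2000, seed=0):
    """Simulate n_paths futures and return (mean, lower, upper) per step,
    where lower/upper bound an approximate 95% interval."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        value, path = level, []
        for _ in range(horizon):
            if rng.random() < p_change:            # a change point occurs
                value += rng.gauss(0.0, shift_sd)  # random level shift
            path.append(value)
        paths.append(path)

    summary = []
    for step in range(horizon):
        column = sorted(path[step] for path in paths)
        lower = column[int(0.025 * (n_paths - 1))]
        upper = column[int(0.975 * (n_paths - 1))]
        summary.append((statistics.fmean(column), lower, upper))
    return summary


summary = rollout_forecast(level=100.0, horizon=10, p_change=0.1, shift_sd=5.0)
```

The interval widens with the look-ahead time, because more change points can accumulate along each simulated path.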

Page 31: Summary

• Deterministic components of the model: to forecast at a given time, simply evaluate them at that time

• Maintain an ensemble of trends for multiple time scales

• Forecast using a weighted average, with the weight a function of look-ahead time

• Detect change points and model them probabilistically

• Roll out multiple possible realisations of the change model to forecast

Page 32: Forecasting: the future

Tell us your use case

• Alerting: When do I run out of supplies?

• Further scalability: large jobs with lots of data splits

• Quality assessment: How good was my forecast?

• Multivariate: forecast a group of metrics using correlations

Page 33: More Questions?

Visit us at the AMA

Page 34: www.elastic.co

Page 35:

Except where otherwise noted, this work is licensed under http://creativecommons.org/licenses/by-nd/4.0/

Creative Commons and the double C in a circle are registered trademarks of Creative Commons in the United States and other countries.

Third party marks and brands are the property of their respective holders.

Please attribute Elastic with a link to elastic.co