Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ......

31
Survey of Streaming Applica4ons using Streams Sco6 Schneider [email protected] IBM Research

Transcript of Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ......

Page 1: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Survey  of  Streaming  Applica4ons  using  Streams  

Sco6  Schneider  [email protected]  

IBM  Research  

Page 2: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

The  Old  Way  

2  

?   !  

Page 3: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Big  Data  

3  

!  ?  !  ?  !  ?  !  ?  !  ?  

?   !  ?   !  ?   !  ?   !   ?   !   ?   !   ?   !  

?   !  ?   !  ?   !  ?   !   ?   !   ?   !   ?   !  

Page 4: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

4  

Streams  Programming  Model  

•  Streams  applica4ons  are  data  flow  graphs  that  consist  of:    –  Tuples:  structured  data  item  – Operators:  reusable  stream  analy4cs  –  Streams:  series  of  tuples  with  a  fixed  type  –  Processing  Elements:  operator  groups  in  execu4on  

Page 5: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Streams  Programming  Language  composite Main { !type! Entry = int32 uid, rstring server, ! rstring msg; ! Sum = uint32 uid, int32 total; !graph ! stream<Entry> Msgs = ParSource() { ! param servers: "logs.*.com"; ! partitionBy: server; ! } !! stream<Sum> Sums = Aggregate(Msgs) { ! window Msgs: tumbling, time(5), ! partitioned; ! param partitionBy: uid; ! } !! stream<Sum> Suspects = Filter(Sums) { ! param filter: total > 100; ! } !! () as Sink = FileSink(Suspects) { ! param file: "suspects.csv"; ! }!} !

5  

ParSrc

Aggr

Filter

Sink

ParSrc

Aggr

Filter

ParSrc

Aggr

Filter

Sink

ParSrc

Aggr

Filter

Page 6: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

6

SPL  source    

x86 host x86 host x86 host

x86 host

x86 host

PE  

PE  

PE  

PE  

PE  

PE  

PE  

PE  Co

nnec4o

ns  

Source  

Sink  

PE  

SPL  compiler  

Streams  Run4me  

 

Source  è  Compila4on  è  Execu4on  

6  

Page 7: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

7

SPL  source    

x86 host x86 host x86 host x86 host x86 host

PE  

PE  

PE  

PE  

PE  

PE  

PE  

PE  

Conn

ec4o

ns  

Source  

Sink  

PE  

SPL  compiler  

Streams  Run4me  

Source  è  Compila4on  è  Execu4on  

7  

Page 8: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

8

x86 host x86 host x86 host x86 host x86 host

PE   PE  

Sink  

Source  

Source   PE  

PE  

PE   PE  

Sink  

Sink  

PE  

PE  

PE  

PE  

PE  

PE  

PE  

PE  Co

nnec4o

ns  

Source  

Sink  

PE  

SPL  compiler  

Streams  Run4me  

(Job  management,  Security,  

Con4nuous  Resource    Management)  

Source  è  Compila4on  è  Execu4on  

8  

Page 9: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Applica4ons  

•  Smart  Grid  – Connec4ng  generators,  distributors  and  transmi6ers  

•  Social  Media  Analy4cs  – Sen4ment  analysis  of  movie  marke4ng  – Disease  tracking  

•  Medical  – SickKids:  neonatal  monitoring  – ANS  modeling  

9  

Page 10: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Smart  Grid  

Page 11: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Pacific  Northwest  Smart  Grid  

•  Goals  –  Manage  peak  demand  

–  Integrate  renewable  energy  sources  

–  Address  constrained  resources  

–  Reliablity,  effeciency  –  Dynamically  choose  most  econmical  resource  

Page 12: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Power  and  Demand  Flow  generators   transmi7ers   distributors   consumers  

Page 13: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Transac4ve  Control  Flow  generators   transmi7ers   distributors   consumers  

EIOC  

incen%ve  

feedback  

Page 14: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Transac4ve  Architecture  

Field  Communica4ons  Network  (Public  Wireless  Access,    WiMax,    BPL)    

Line  mounted  sensors  Substa4on   Smart  

metering  PHEVs    Distributed  

Genera4on  

Transmission  &  Distribu4on  and  Genera4on  Assets  Remote  IT  Assets  

Servers   Applica4ons  

Netezza  Data  Warehouse  

Dashboard  

InfoSphere  Streams  

3Tier  Renewable  Forecasts  

 Off-­‐Line,  Deep  Analy4cs  (Data-­‐mining  /  model  building)  

Message  Broker  Micro  broker  &  MQ6  

iCS  (ISO/IEC  18012-­‐2  applica4on  interoperability  framework)  

iCS  (ISO/IEC  18012-­‐2  applica4on  interoperability  framework)  

Alstom  T&D  EMS  &  Market  SW  

Data  media4on  and  real-­‐4me  analysis  

Energy  Informa%on  Opera%ons  Center  (EIOC)  

U%lity  Subproject  Sites  

Page 15: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

High  Speed  Data  Media4on  

ac4veMQ    JMS  Client  Source  Adapter  

Data  Transform  

Ac#veMQ  JMS  Provider  

Netezza  ODBC  

Provider  

ODBCAppend  table  instance  

table  filter   data  prep  

ODBCAppend  table  instance  

table  filter   data  prep  

ODBCAppend  table  instance  

table  filter   data  prep  

message  

table  access  specific  

one  job  per  Netezza  table  

alerts   Alerts  File  Daily  

Message  FIle  

alert  

Page 16: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Social  Media  

Page 17: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Super  Bowl  Movie  Trailers  

•  Effec4veness  – How  many  people  are  talking  about  the  movies?  – Who  are  they?  – Do  they  intend  to  see  it?  – How  does  this  compare  to  other  movies?  

•  Big  Data  – Over  1  billion  messages  – Over  30  million  profiles  extracted  

Page 18: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Buzz  During  Super  Bowl  

21 Jump Street

Act of Valor

Battleship

Dr. Seuss\' The Lorax

G.I. Joe: Retaliation

Ghost Rider: Spirit of Vengeance

John Carter

The Dictator

The Avengers

0

5000

10000

15000

20000

25000

30000

5pm 6pm 7pm 8pm 9pm 10pm 11pm 12am

Page 19: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Avid  Movie  Goer  Buzz  During  Super  Bowl  

Project X

John Carter

Battleship

Ghost Rider

21 Jump Street

The Dark Knight Rises

G.I. Joe

Spider-man

The Avengers

Act of Valor

The Lorax The Dictator

Page 20: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

10% 20% 30% 40% 50% 60% 70% 80% 90%

Battleship

The Dictator

CA TX NY VA FL GA NC OH MD NJ

Gender

Top 10 Markets

Gender

Top 10 Markets

Female Male

Female

CA TX NY GA FL NC MD VA OH PA

Male

CA TX NY OH FL NC VA GA MD PA

Gender

Top 10 Markets

Female Male Act of Valor

Buzz  by  Demographics  

Page 21: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Disease  Tracking  

Page 22: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Social  Media  Analy4cs  Architecture  

Social Media Consumer Profiles

Customer Models

InfoSphere Streams

InfoSphere BigInsights

Entity Integration

Predictive Analytics

Data Ingest & prep.

Text Analytics: Timely Insights

Entity Integration:

Profile Resolution

Predictive Analytics:

Action Determination

Social Media Data

Online  Flow:  Data-­‐in-­‐mo%on  analysis  

Text Analytics

Offline  Flow:  Data-­‐at-­‐rest  analysis  

Timely Decisions

Social Media Data

Customer Database

Consumer Lists

Customer & Prospect

profiles

Entity Integration

Page 23: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Medical  

Page 24: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

24  

"Data  Baby"  

Page 25: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Real  Neonatal  ICU  

Page 26: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Current  ICU  Monitoring  

Pa@ent  

Device  

Device  

Device  

Pa4ent  Record  

Fixed,  threshold  based  alerts    (e.g.:  heart  rate  below  45  bpm)  

Page 27: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

What's  Missing  

•  Automa4on  •  Data  reten4on  •  Complex  temporal  correla4ons  

–  e.g.,  Alert  when  both  SpO2  and  blood  pressure  drop  below  a  threshold  for  45  seconds  

–  e.g.,  Alert  when  variability  of  the  HR  is  below  a  threshold  for  a  30  minute  window  

–  e.g.,  Alert  when  probability  of  observing  bad  event  in  next  6  hours  is  high  

•  Further  explora4on  of  physiological  data  

Page 28: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

SickKids  Deployment  

Page 29: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Simple  Example  

•  "Baby  crashing"  – Correlate  SpO2  with  mean  arterial  pressure  

•  SpO2  <  85  •  Mean  BP  <  gesta4onal  age  

– For  20  seconds  

Page 30: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Not  So  Simple  Example  •  Detec4ng  onset  of  sepsis  in  premature  infants  •  Data  sources  

–  Systolic,  diastolic  and  mean  arterial  blood  pressure  –  SpO2  –  Respira4on  rates  –  Electrocardiograms  –  Sta4c  clinical  data  

•  Fusion  and  scoring  –  Based  models  specified  by  physicians  –  Data  driven  approach  uses  machine  learning  to  discover  new  models  

•  Deployed  at  SickKids  hospital  in  Ontaria,  Canada  monitoring  NICU  pa4ents  

Page 31: Survey’of’Streaming’Applicaons’ using’Streams’ · Streams’Programming’Language ... Social Media Consumer Profiles Customer Models InfoSphere Streams InfoSphere BigInsights

Modeling  the  ANS  with  Heart  Rate  Variability  

QRS    Detec4on  

RR    Genera4on  

Ectopic/Ab-­‐  normal    Beat    

Detec4on  

Interpola4on  

Windowing  

Windowing  

Windowing  

Windowing  

Windowing  

HRV1  

HRV2  

HRV3  

HRV5  

HRVk  

Synch  

Source  Receive Decode Raw ECG

Transform QRS Mask

Clean and Interpolate FMP Anomaly detection Cubic spline interpolation

Select Aggregation

Transform HRV Mask

Synchronize

Standard

Our research LEGEND