Data to Insight in a Flash:
Introduction to Real-Time Analytics with WSO2 Complex Event Processor
S. Suhothayan (Suho)
Technical Lead, WSO2 Inc.
CEP Is & Is NOT!
• Is NOT!
– Simple filters
• Simple Event Processing
• E.g. Is this a gold or platinum customer?
– Joining multiple event streams
• Event Stream Processing
• Is !
– Processing multiple event streams
– Identify meaningful patterns among streams
– Using temporal windows
• E.g. Notify if there is a 10% increase in overall trading
activity AND the average price of commodities has fallen 2%
in the last 4 hours
Event Streams
• Event stream is a sequence of events
• Event streams are defined by Stream Definitions
• Event streams have in-flows and out-flows
– Inflows can be from
• Event builders
Converts incoming XML, JSON, etc. events to an event stream
• Execution plans
– Outflows are to
• Event formatters
Converts an event stream to XML, JSON, etc. events
• Execution plans
Stream Definition
{
'name':'soft.drink.coop.sales', 'version':'1.0.0',
'nickName': 'Soft_Drink_Sales', 'description': 'Soft drink sales',
'metaData':[
{'name':'region','type':'STRING'}
],
'correlationData':[
{'name':'transactionID','type':'STRING'}
],
'payloadData':[
{'name':'brand','type':'STRING'}, {'name':'quantity','type':'INT'},
{'name':'total','type':'INT'}, {'name':'user','type':'STRING'}
]
}
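A stream definition works as a typed schema for events. The sketch below illustrates that idea by checking an event against the definition shown above. It is not part of WSO2 CEP: `validate_event`, `PY_TYPES`, and the dictionary layout of the event are assumptions made for illustration.

```python
# Illustrative only: validate_event and PY_TYPES are hypothetical helpers,
# not a WSO2 CEP API. The definition mirrors the one on the slide.
STREAM_DEF = {
    "name": "soft.drink.coop.sales",
    "version": "1.0.0",
    "metaData": [{"name": "region", "type": "STRING"}],
    "correlationData": [{"name": "transactionID", "type": "STRING"}],
    "payloadData": [
        {"name": "brand", "type": "STRING"},
        {"name": "quantity", "type": "INT"},
        {"name": "total", "type": "INT"},
        {"name": "user", "type": "STRING"},
    ],
}

# Assumed mapping from stream attribute types to Python types.
PY_TYPES = {"STRING": str, "INT": int}


def validate_event(event, definition=STREAM_DEF):
    """Return a list of problems; an empty list means the event fits."""
    problems = []
    for section in ("metaData", "correlationData", "payloadData"):
        values = event.get(section, {})
        for attr in definition.get(section, []):
            name = attr["name"]
            if name not in values:
                problems.append(f"{section}.{name} is missing")
            elif type(values[name]) is not PY_TYPES[attr["type"]]:
                problems.append(f"{section}.{name} should be {attr['type']}")
    return problems
```

An event whose attributes all match returns an empty list; a quantity sent as the string "5" is reported as `payloadData.quantity should be INT`.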
Event Adaptors
● For receiving and publishing events
● Has the configurations to connect to external endpoints
● Has many-to-one relationship with Event Streams
Event Adaptors
Support for several transports (network access)
● SOAP
● HTTP
● JMS
● SMTP
● SMS
● Thrift
● Kafka
● File
● Websocket
Supports publishing data to databases
● Cassandra
● MySQL
● H2
● MSSQL
● Oracle
Supports custom event adaptors via its pluggable architecture!
Event Format
• Standard event formats are available for receiving and publishing
events
– XML
– JSON
– Text
– Map
– WSO2 Event
• If events adhere to the standard format, they do not need data mapping.
• If events do not adhere, custom event mapping should be configured in the
Event Builder and Event Formatter appropriately.
Event Format
Standard XML event format
<events>
<event>
<metaData>
<tenant_id>2</tenant_id>
</metaData>
<correlationData>
<activity_id>ID5</activity_id>
</correlationData>
<payloadData>
<clientPhoneNo>0771117673</clientPhoneNo>
<clientName>Mohanadarshan</clientName>
<clientResidenceAddress>15, Alexendra road, California</clientResidenceAddress>
<clientAccountNo>ACT5673</clientAccountNo>
</payloadData>
</event>
</events>
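To show how the three sections of the standard XML format map onto an event, here is a minimal parsing sketch using Python's standard library. It is an illustration only, not the CEP event builder; `parse_events` is a hypothetical helper, and the sample is a shortened, well-formed version of the event above.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed sample in the standard XML event format.
XML_EVENTS = """<events>
  <event>
    <metaData><tenant_id>2</tenant_id></metaData>
    <correlationData><activity_id>ID5</activity_id></correlationData>
    <payloadData>
      <clientPhoneNo>0771117673</clientPhoneNo>
      <clientName>Mohanadarshan</clientName>
      <clientAccountNo>ACT5673</clientAccountNo>
    </payloadData>
  </event>
</events>"""


def parse_events(xml_text):
    """Hypothetical helper: turn each <event> into a dict of its sections."""
    root = ET.fromstring(xml_text)
    events = []
    for event in root.findall("event"):
        parsed = {}
        for section in event:  # metaData, correlationData, payloadData
            parsed[section.tag] = {child.tag: child.text for child in section}
        events.append(parsed)
    return events
```

All values come back as strings; converting them to the types in the stream definition is left out of this sketch.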
Execution Plan
● Is an isolated logical execution unit
● Each execution plan imports some of the event streams available
in CEP and defines the execution logic using queries and exports
the results as output event streams.
● Has one-to-one relationship with CEP Backend Runtime (Siddhi).
https://github.com/wso2/siddhi
● Has many-to-many relationship with Event Streams.
● Each execution plan spawns a Siddhi Engine Instance.
CEP Solution patterns
1. Transformation - project, translate, enrich, split
2. Filter
3. Composition / Aggregation / Analytics
● basic stats, group by, moving averages
4. Join multiple streams
5. Detect patterns
● Coordinating events over time
● Trends - increasing, decreasing, stable, non-increasing, non-decreasing, mixed
6. Blacklisting
7. Building a profile
Siddhi Query Structure
define stream <event stream>
(<attribute> <type>,<attribute> <type>, ...);
from <event stream>
select <attribute>,<attribute>, ...
insert into <event stream> ;
Siddhi Query
define stream SoftDrinkSales
(region string, brand string, quantity int,
price double);
from SoftDrinkSales
select brand, quantity * price as totalCost
insert into TotalCostStream ;
from TotalCostStream
select brand, toUSD(totalCost) as totalCostInUSD,
'USD' as currency
insert into OutputStream ;
Siddhi Query: Filter and window
define stream SoftDrinkSales
(region string, brand string, quantity int,
price double);
from SoftDrinkSales
[quantity > 99]#window.time(1 hour)
select region, brand, avg(quantity) as avgQuantity
group by region, brand
insert into AvgWholeSales ;
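To make the filter-plus-window semantics concrete, the following sketch emulates the query in plain Python. It is an approximation rather than Siddhi's implementation: the class name, the explicit millisecond timestamps, and the assumption that events arrive in timestamp order are all illustrative choices.

```python
from collections import deque

WINDOW_MS = 60 * 60 * 1000  # 1 hour, as in #window.time(1 hour)


class AvgWholeSales:
    """Rough emulation of: filter [quantity > 99], 1-hour sliding window,
    avg(quantity) grouped by (region, brand)."""

    def __init__(self, window_ms=WINDOW_MS, min_quantity=99):
        self.window_ms = window_ms
        self.min_quantity = min_quantity
        self.window = deque()  # (timestamp_ms, region, brand, quantity)

    def on_event(self, ts, region, brand, quantity):
        # Expire events that have been in the window for the full duration.
        while self.window and self.window[0][0] <= ts - self.window_ms:
            self.window.popleft()
        # The filter runs before the window, so small sales never enter it.
        if quantity <= self.min_quantity:
            return None
        self.window.append((ts, region, brand, quantity))
        same_group = [q for (_, r, b, q) in self.window
                      if (r, b) == (region, brand)]
        return (region, brand, sum(same_group) / len(same_group))
```

Each accepted event produces one output row carrying the current average for its own (region, brand) group; filtered-out events produce nothing.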
Siddhi Query: Partition
define stream SoftDrinkSales
(region string, brand string, quantity int,
price double);
partition with (region of SoftDrinkSales)
begin
from SoftDrinkSales
[quantity > 99]#window.length(100)
select region, brand,
avg(quantity) as avgQuantity
insert into AvgWholeSales ;
end;
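A partition gives every key its own independent copy of the query state. The sketch below emulates that for the query above, keeping one length window per region. It is an illustration under stated assumptions, not Siddhi code: `PartitionedAvg` and its parameters are hypothetical names.

```python
from collections import defaultdict, deque


class PartitionedAvg:
    """Rough emulation of: partition by region, filter [quantity > 99],
    #window.length(n), avg(quantity) within each partition."""

    def __init__(self, length=100, min_quantity=99):
        self.min_quantity = min_quantity
        # One bounded window per region; deque(maxlen=...) drops the oldest
        # entry automatically once the window is full.
        self.windows = defaultdict(lambda: deque(maxlen=length))

    def on_event(self, region, brand, quantity):
        if quantity <= self.min_quantity:
            return None
        window = self.windows[region]
        window.append(quantity)
        return (region, brand, sum(window) / len(window))
```

Because the query has no `group by`, the average spans all brands inside a region's window; events from other regions never affect it.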
Siddhi Query: Pattern
define stream Purchase(price double,cardNo long,place string);
from every (a1 = Purchase[price < 10] ) ->
a2 = Purchase[price >10000 and a1.cardNo == a2.cardNo]
within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud;
● Pattern matches events arriving in order, not necessarily consecutively.
● Sequence is used to match immediately consecutive events arriving in order.
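The fraud pattern can be read as a small state machine: every cheap purchase opens a pending match, and a later expensive purchase on the same card within a day completes it. The sketch below emulates that reading in plain Python. It approximates the query's behaviour and is not Siddhi's implementation; the class name and explicit millisecond timestamps are assumptions.

```python
DAY_MS = 24 * 60 * 60 * 1000  # "within 1 day"


class FraudPattern:
    """Rough emulation of:
    every (a1 = Purchase[price < 10]) ->
        a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo] within 1 day
    """

    def __init__(self, within_ms=DAY_MS):
        self.within_ms = within_ms
        self.pending = []  # (timestamp_ms, cardNo) of cheap purchases seen

    def on_purchase(self, ts, price, card_no, place):
        # Discard pending matches that can no longer complete in time.
        self.pending = [(t, c) for (t, c) in self.pending
                        if ts - t <= self.within_ms]
        matches = []
        if price > 10000:
            remaining = []
            for t, c in self.pending:
                if c == card_no:
                    # a1 and a2 share a card: emit and consume the match.
                    matches.append(
                        {"cardNo": c, "price": price, "place": place})
                else:
                    remaining.append((t, c))
            self.pending = remaining
        elif price < 10:
            # "every" means each cheap purchase starts a new pending match.
            self.pending.append((ts, card_no))
        return matches
```

Unrelated purchases between the two matching events are ignored, which is what distinguishes a pattern from a sequence.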
Siddhi Query: Event Tables
define stream Purchase (price double, cardNo long, place string);
define stream NewUser (userName string, cardNo long, time long) ;
define table CardUserTable (name string, cardNum long) ;
from NewUser
select userName as name, cardNo as cardNum
insert into CardUserTable ;
from Purchase#window.length(1) join CardUserTable
on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo,
CardUserTable.name as name,
Purchase.price as price
insert into PurchaseUserStream ;
● Similarly update and delete can be done
● Event tables can be backed by an RDBMS
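The two queries above amount to "store rows from one stream, look them up from another". A minimal in-memory sketch of that idea follows; it is illustrative only, with `CardUserJoin` and its method names being hypothetical, and it ignores the RDBMS-backed case.

```python
class CardUserJoin:
    """Rough emulation of inserting NewUser events into CardUserTable and
    joining each Purchase event against it on the card number."""

    def __init__(self):
        self.card_user_table = []  # rows of (name, cardNum)

    def on_new_user(self, user_name, card_no, time):
        # from NewUser select userName as name, cardNo as cardNum
        # insert into CardUserTable
        self.card_user_table.append((user_name, card_no))

    def on_purchase(self, price, card_no, place):
        # Join the single current Purchase event (window.length(1))
        # with every table row that has the same card number.
        return [
            {"cardNo": card_no, "name": name, "price": price}
            for (name, card_num) in self.card_user_table
            if card_num == card_no
        ]
```

A purchase on a card with no row in the table yields no output, as with an inner join.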
Siddhi Query Extensions
● Function extension
● Aggregator extension
● Window extension
● Transform extension
from SoftDrinkSales#window.time(30 min)
select brand,
custom:stdev(quantity) as stdevQuantity
insert into OutputStream ;
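An aggregator used over a window has to handle both events entering and events expiring. The sketch below shows that add/remove shape for a standard deviation in plain Python. It is not the Siddhi extension API (real extensions are written in Java), and it assumes a population standard deviation, which the slide does not specify; `StdevAggregator` is a hypothetical name.

```python
import math


class StdevAggregator:
    """Illustrative incremental standard deviation (population form assumed).
    Keeps running sums so values can be added and removed in O(1)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, value):
        # Called when an event enters the window.
        self.count += 1
        self.total += value
        self.total_sq += value * value
        return self.current()

    def remove(self, value):
        # Called when an event expires from the window.
        self.count -= 1
        self.total -= value
        self.total_sq -= value * value
        return self.current()

    def current(self):
        if self.count == 0:
            return 0.0
        mean = self.total / self.count
        variance = self.total_sq / self.count - mean * mean
        # Guard against tiny negative values caused by rounding.
        return math.sqrt(max(variance, 0.0))
```

Running sums are simple but can lose precision on large values; a production aggregator might prefer a numerically stabler update.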
Event Flow
● Visualization of the Event Stream flow in CEP
● Helps to get the big picture
● Good for debugging
Event Tracer
• Dump message traces in a textual format
• Before and after processing each stage of event flow
Event Statistics
• Real-time statistics
• via visual illustrations & JMX
• Time based request & response counts
• Stats on all components of CEP server
Performance Results
• Same-JVM performance (Siddhi vs. Esper; M means million),
4-core machine
– Filters 8M Events/Sec vs Esper 2M
– Window 2.5M Events/Sec vs. Esper 1M
– Patterns 1.4M Events/Sec about 10X faster than Esper
• Over the Network Performance (Using thrift based WSO2 event
format) - 8 core machine
– Filter 0.25M (or 250K) Events/Sec
High Availability
• Option 1: Side by side
– Recommended
– Takes 2X hardware
– Gives zero down time
• Option 2: Snapshot and restore
– Uses less HW
– Will lose events between snapshots
– Downtime during recovery
– In some scenarios, event tables can be used to keep intermediate state
WSO2 CEP 4.0
• Apache Storm integration (to make WSO2 CEP highly scalable)
• Rewrite of Siddhi
– Single language for scalable and single node deployment
– Achieve maximum parallelism
• Geofencing support
– With management dashboard
• Time series and regression support
• Natural language & sentiment analysis support
• Integration to machine learning model (PMML models)
WSO2 CEP 4.0 - Milestone 1 released
Pack: http://svn.wso2.org/repos/wso2/people/mohan/CEP4.0.0-M1/wso2cep-4.0.0-M1.zip
Docs : https://docs.wso2.com/display/CEP400
Scalable WSO2 CEP Deployment
from CEP 4.0…
https://docs.wso2.com/display/CEP400/Clustered+Deployment
Geo Dashboard
With configurable alerting &
Monitoring capabilities.
http://wso2.com/library/articles/2015/01/article-geo-spatial-data-analysis-using-wso2-complex-event-processor-0/
Natural Language Processing
Understanding the sentences &
Analyzing sentiments.
● Uses Stanford NLP.
● Adaptors for UIMA are also available.
https://github.com/wso2-gpl/siddhi/tree/master/siddhi-extensions/nlp
NLP Extensions
● findNameEntityType(entityType:string, groupSuccessiveEntities:boolean, text:string)
Extract nouns in the text that match any predefined entity type such as PERSON, LOCATION, DATE, etc.
● findNameEntityTypeViaDictionary(entityType:string, dictionaryFilePath:string, text:string)
Extract all matches in the text for entries defined in the dictionary XML file under the given entity type
● findRelationshipByRegex(regex:string, text:string)
Extract (subject, object, verb) relationships from the text that match the given regular expression.
● findRelationshipByVerb(verb:string, text:string)
Extract (subject, object, verb) relationships from the text that match any form of the verb.
● findTokensRegexPattern(regex, text)
Extract phrases that match the given NLP regular expression pattern
● findSemgrexPattern(regex, text)
Extract words that match the given grammatical relationship regular expression pattern
Machine Learning
Using R, PMML Models for real-time predictive analysis
http://wso2.com/library/tutorials/2014/08/tutorial-implementing-a-wso2-cep-extension-to-run-machine-learning-models-written-in-pmml-format/
http://wso2.com/library/articles/2014/11/article-real-time-intruder-detection-with-r-pmml-and-wso2-cep/
Case study: Smart Energy
• DEBS (Distributed Event Based Systems) academic conference 2014,
yearly event processing challenge
• Smart Home electricity data: 2000 sensors, 40 houses, 4 billion events
• The WSO2 CEP based solution was one of the four finalists (others:
Dresden University of Technology and Fraunhofer Institute (Germany),
and Imperial College London)
• We posted the fastest single-node solution measured (400K events/sec)
and close to one million events/sec distributed throughput.
Case study: Realtime Soccer Analytics
From DEBS 2013 …
http://www.slideshare.net/hemapani/analyzing-a-soccer-game-with-wso2-cep
Siddhi Query: Pattern
● Filters or transformations (process a single event)
from Ball[v>10] select .. insert into ..
● Windows + aggregation (track window of events: time, length)
from Ball#window.time(30s) select avg(v) ..
● Joins (join two event streams to one)
from Ball#window.time(30s) as b join Players as p
on p.v < b.v
● Patterns (state machine implementation)
from Ball[v>10], Ball[v<10]*, Ball[v>10] select ..
● Event tables (map a database as an event stream)
define table HitV (v double) using .. db info ..
Running Stats
partition with (id of Players)
begin
from s = Players [v <= 1 or v > 11] ,
t = Players [v > 1 and v <= 11]+ ,
e = Players [v <= 1 or v > 11]
select s.ts as tsStart, e.ts as tsStop,
s.id as playerId , 'trot' as intensity,
t[0].v as instantSpeed ,
(e.ts - s.ts )/1000000000 as unitPeriod
insert into RunningStats ;
end;
Detect Kicks & Shots on Goal
Detect kicks on the ball, calculate direction after 1m,
and keep giving updates as long as it is in the right
direction