Event Hub & Azure Stream Analytics
Davide Mauri
Join the conversation on Twitter: @DevWeek // #DW2016 // #DevWeek
About Me
• Microsoft SQL Server MVP
• Working with SQL Server since version 6.5, and on BI since 2003
• Specialized in Data Solution Architecture, Database Design, Performance Tuning, High-Performance Data Warehousing, BI, Big Data
• President of UGISS (Italian SQL Server UG)
• Regular speaker at SQL Server events
• Consulting & Training, Mentor at SolidQ
• E-mail: [email protected]
• Twitter: @mauridb
• Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx
Agenda
• Complex Event Processing
• The Lambda Architecture
• Azure Stream Analytics
• Data Ingestion
• Azure Stream Analytics Query Language
• Advanced Features
• Additional Resources
• Conclusions
Complex Event Processing
• Event processing is a method of tracking and analyzing (processing) streams of information (data) about things that happen (events)
• Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances
• Started to appear in the 1990s
• Goal: identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible
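To make the CEP idea concrete, here is a minimal Python sketch (not taken from the talk) that derives a higher-level "complex" event from a stream of simple ones. The rule ("three transactions on the same card within 60 seconds suggests possible fraud"), the function name `detect_bursts` and the `(timestamp, key)` event shape are all hypothetical, and events are assumed to arrive in time order.

```python
from collections import defaultdict, deque


def detect_bursts(events, threshold, window_seconds):
    """Emit a derived (complex) event whenever `threshold` simple events
    for the same key occur within `window_seconds`.

    Hypothetical rule for illustration. `events` is an iterable of
    (timestamp_seconds, key) tuples, assumed ordered by time.
    Returns a list of (timestamp, key) alerts.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    alerts = []
    for ts, key in events:
        timestamps = recent[key]
        timestamps.append(ts)
        # Keep only timestamps in the interval (ts - window_seconds, ts].
        while timestamps and timestamps[0] <= ts - window_seconds:
            timestamps.popleft()
        if len(timestamps) >= threshold:
            alerts.append((ts, key))
            timestamps.clear()  # start counting again after raising an alert
    return alerts


# Hypothetical card transactions: (seconds since start, card id).
transactions = [(0, "card-1"), (10, "card-2"), (20, "card-1"), (30, "card-1"), (200, "card-1")]
print(detect_bursts(transactions, threshold=3, window_seconds=60))  # [(30, 'card-1')]
```

The derived event only exists because several simple events were correlated over time, which is the step that separates CEP from plain event processing.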
Complex Event Processing Use Cases
• Network monitoring
• Intelligence and surveillance
• Risk management
• E-commerce
• Fraud detection
• Smart order routing
• Transaction cost analysis
• Pricing and analytics
• Market data management
• Algorithmic trading
• Data warehouse augmentation
Ref: http://www.infoq.com/articles/stream-processing-hadoop
The Lambda Architecture
“Generic, scalable and fault-tolerant data processing architecture […] in which low-latency reads and updates are required.”
Ref: http://lambda-architecture.net/
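The architecture described at the reference above is usually split into three layers: a batch layer that periodically recomputes views over the full master dataset, a speed layer that covers only the events the last batch run has not seen yet, and a serving layer that merges the two at query time. The Python sketch below is a toy illustration of that split; the function names (`batch_view`, `speed_view`, `query`) are hypothetical and an in-memory list stands in for the master dataset, so it says nothing about how a specific product implements it.

```python
from collections import Counter


def batch_view(events, cutoff):
    """Batch layer (toy): recompute counts from scratch over every event before `cutoff`."""
    return Counter(key for ts, key in events if ts < cutoff)


def speed_view(events, cutoff):
    """Speed layer (toy): incrementally count only the events the batch run has not covered."""
    view = Counter()
    for ts, key in events:
        if ts >= cutoff:
            view[key] += 1
    return view


def query(key, batch, speed):
    """Serving layer (toy): merge the complete-but-stale batch view with the real-time view."""
    return batch[key] + speed[key]


# Hypothetical (timestamp, page) events; the last batch run covered everything before t=5.
events = [(1, "home"), (2, "cart"), (3, "home"), (10, "home"), (11, "cart")]
batch, speed = batch_view(events, 5), speed_view(events, 5)
print(query("home", batch, speed))  # 3
```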
Hadoop, but not only that!
• The Apache Hadoop ecosystem is the typical solution nowadays
• “Mature” option: Flume (optional collector and streaming data movement system), Kafka (distributed messaging system), Storm (distributed real-time computation system)
• “Innovative” option: Spark + Spark Streaming
• Very powerful, but very complex
Why the Cloud? And why Azure?
• Because of the high scalability and computing power that a streaming solution may require, the cloud is a perfect environment for it
• Very cheap and very simple to start a project
• Very well integrated with all the other Azure offerings, from monitoring to Power BI
Stream analytics
• (Near) real-time complex event processing engine
• Enables real-time event processing in a very simple and cheap way
• SQL-like language
• Support for temporal semantics
• Different from SQL Server 2016: specific to streaming data
• Azure-only at present
Stream analytics
• Platform-as-a-Service
• Can handle millions of events per second
• Based on the REEF project (now an Apache incubator project)
• Main objects: Job, Query, Functions, Inputs & Outputs
• Fully manageable through a REST interface
• “Streaming Units” are the base concept for managing performance, scalability and costs
• Roughly, 1 Streaming Unit ≈ 1 MB/s of throughput
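Taking the rule of thumb above literally (1 Streaming Unit ≈ 1 MB/s), a first rough sizing is simple arithmetic. The helper below is hypothetical: it assumes 1 MB = 1,000,000 bytes and ignores query complexity, partitioning and any constraints the service places on the allowed number of units, so treat its result only as a starting point to validate against the job's actual metrics.

```python
import math

# Rule of thumb from the slide: roughly 1 MB/s of throughput per Streaming Unit.
MB_PER_STREAMING_UNIT = 1.0


def estimate_streaming_units(events_per_second, avg_event_bytes):
    """Hypothetical helper: rough first guess of Streaming Units for an ingress rate.

    Assumes 1 MB = 1,000,000 bytes; always returns at least 1.
    """
    mb_per_second = events_per_second * avg_event_bytes / 1_000_000
    return max(1, math.ceil(mb_per_second / MB_PER_STREAMING_UNIT))


# 5,000 events/s of ~500 bytes each = 2.5 MB/s, i.e. roughly 3 Streaming Units.
print(estimate_streaming_units(5_000, 500))  # 3
```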
Stream analytics - Data ingestion
• Inputs for Stream Analytics:
• Streaming Sources (“Data in motion”): JSON, CSV or Avro
• Reference Data (“Data at rest”): JSON or CSV, from Blob Storage (max 50 MB)
• Streaming Sources: Event Hubs, IoT Hub
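Reference data is typically used to enrich the streaming events, for example by looking up static device metadata. The Python sketch below only mimics that lookup in memory. The device/location schema, the sample payloads and the function names are hypothetical; streaming events are JSON lines and the reference data is CSV (two of the formats listed above). Events without a match keep a `None` location, similar to a LEFT OUTER JOIN.

```python
import csv
import io
import json

# Hypothetical reference data ("data at rest"), e.g. a CSV file kept in Blob Storage.
REFERENCE_CSV = """device_id,location
d1,Milan
d2,London
"""


def load_reference(csv_text):
    """Build an in-memory lookup table keyed by device_id."""
    return {row["device_id"]: row["location"] for row in csv.DictReader(io.StringIO(csv_text))}


def enrich(json_lines, reference):
    """Attach the reference location to each streaming event (None when there is no match)."""
    for line in json_lines:
        event = json.loads(line)
        event["location"] = reference.get(event["device_id"])
        yield event


# Hypothetical streaming events ("data in motion") as JSON lines.
stream = ['{"device_id": "d1", "temp": 21.5}', '{"device_id": "d3", "temp": 19.0}']
for event in enrich(stream, load_reference(REFERENCE_CSV)):
    print(event)
```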
Stream analytics – High-Level Architecture
• INPUT: Source of Events and Reference Data (Azure Event Hubs, Azure Blob Storage)
• Stream Analytics Query: runs continuously against the incoming stream of events; events have a defined schema and are temporal (sequenced in time)
• OUTPUT [Result of Query]: Azure SQL DB, Azure Event Hubs, Azure Blob Storage
Other Azure Stuff
Data ingestion
• A nice tool for monitoring Event Hubs is the “Service Bus Explorer”: https://github.com/paolosalvatori/ServiceBusExplorer
DEMO: Simple Setup of Event Hubs, Source and Destination
Stream Analytics Query Engine
• Takes data from one or more inputs
• Sends the resulting data to one or more outputs
• Supports the most common data types: bigint, float, Unicode strings, datetime, key-value pairs (records), arrays
Stream Analytics Query Language
• Stream Analytics Query Language Reference: https://msdn.microsoft.com/library/azure/dn834998.aspx
• Subset of T-SQL
• With specific temporal extensions: the time value used for each event can be set with the TIMESTAMP BY clause
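What TIMESTAMP BY changes is which clock orders the events: without it the service uses the time each event arrived, with it a timestamp carried in the payload (application time). The Python sketch below mimics only that choice. The field names `arrival_time` and `reading_time` and the sample events are hypothetical, and the sketch simply reorders events rather than reproducing the engine's handling of late or out-of-order data.

```python
from datetime import datetime


def event_time(event, timestamp_field=None):
    """Return the timestamp that drives temporal logic for this event.

    With `timestamp_field` set (like TIMESTAMP BY <field>), use the application
    time stored in the payload; otherwise fall back to the hypothetical
    `arrival_time` field.
    """
    field = timestamp_field or "arrival_time"
    return datetime.fromisoformat(event[field])


def order_events(events, timestamp_field=None):
    """Sort events by the chosen notion of time."""
    return sorted(events, key=lambda e: event_time(e, timestamp_field))


events = [
    # Sensor "a" took its reading first, but its message arrived later.
    {"id": "a", "reading_time": "2016-04-20T10:00:01", "arrival_time": "2016-04-20T10:00:05"},
    {"id": "b", "reading_time": "2016-04-20T10:00:02", "arrival_time": "2016-04-20T10:00:03"},
]
print([e["id"] for e in order_events(events)])                  # ['b', 'a']
print([e["id"] for e in order_events(events, "reading_time")])  # ['a', 'b']
```

Any window or join computed on top of these events inherits this ordering, which is why choosing the right timestamp matters for the temporal features later in the deck.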
Stream Analytics Query Language
• DML Statements: SELECT, FROM, WHERE, GROUP BY, HAVING, CASE, JOIN, UNION
• Windowing Extensions: Tumbling Window, Hopping Window, Sliding Window, Duration
• Aggregate Functions: SUM, COUNT, AVG, MIN, MAX
• Scaling Functions: WITH, PARTITION BY
• Date and Time Functions: DATENAME, DATEPART, DAY, MONTH, YEAR, DATETIMEFROMPARTS, DATEDIFF, DATEADD
• String Functions: LEN, CONCAT, CHARINDEX, SUBSTRING, PATINDEX
• Statistical Functions: VAR/VARP, STDEV/STDEVP
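Assuming the paired statistical functions follow T-SQL semantics (the language is described above as a subset of T-SQL), they differ only in the denominator: VAR and STDEV treat the values as a sample (divide by n - 1), while VARP and STDEVP treat them as the whole population (divide by n). Python's standard `statistics` module makes the same distinction, so the sketch below shows what these aggregates would return for a single GROUP BY group. The `aggregate` helper and the `readings` values are made up for illustration.

```python
import statistics


def aggregate(values):
    """Hypothetical helper: compute, for one group, the aggregates listed on the slide."""
    return {
        "COUNT": len(values),
        "SUM": sum(values),
        "AVG": statistics.mean(values),
        "MIN": min(values),
        "MAX": max(values),
        "VAR": statistics.variance(values),    # sample variance, divides by n - 1
        "VARP": statistics.pvariance(values),  # population variance, divides by n
        "STDEV": statistics.stdev(values),     # sample standard deviation
        "STDEVP": statistics.pstdev(values),   # population standard deviation
    }


readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
result = aggregate(readings)
print(result["VARP"], result["STDEVP"])  # 4.0 2.0
```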
DEMO: Stream Analytics Query in action
Advanced features
• Partitioning support
• Especially useful for high scalability
• CTE-like constructs (WITH) that also help with scaling out
• Temporal aggregations: Tumbling, Hopping and Sliding Windows
• (Temporal) joins between input streams
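Joins between two streams are temporal: each match has to fall within a bounded time distance, which in the Stream Analytics query language is expressed with a DATEDIFF condition in the ON clause. The Python sketch below reproduces only that matching rule, not the engine. The toll-booth-style entry/exit events, the `plate` key and the `ts` field (seconds) are hypothetical.

```python
def temporal_join(left, right, key, max_seconds):
    """Pair each left event with right events that share `key` and occur
    between 0 and `max_seconds` after it (both events carry a `ts` in seconds).

    A real engine only buffers events inside the time bound; this nested loop
    just illustrates the matching condition.
    """
    matches = []
    for l_event in left:
        for r_event in right:
            delta = r_event["ts"] - l_event["ts"]
            if l_event[key] == r_event[key] and 0 <= delta <= max_seconds:
                matches.append((l_event, r_event))
    return matches


# Hypothetical entry and exit events keyed by licence plate.
entries = [{"plate": "AB123", "ts": 0}, {"plate": "CD456", "ts": 5}]
exits = [{"plate": "AB123", "ts": 40}, {"plate": "CD456", "ts": 400}]
for entry, exit_ in temporal_join(entries, exits, "plate", max_seconds=300):
    print(entry["plate"], exit_["ts"] - entry["ts"])  # AB123 40
```

The time bound is what makes a stream-to-stream join feasible at all: without it, every event would have to be kept forever in case a matching partner arrives later.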
Tumbling window
• Adjacent, non-overlapping windows
• Answers the question: “What happened in the last X seconds? And in the next X? And in the next X?” And so on…
[Figure: A 20-second Tumbling Window; time axis from 0 to 60 seconds]
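Tumbling-window assignment can be sketched as integer division: each event falls into exactly one window of length X, identified here by its end time. The Python below is an illustration, not the engine's implementation. It assumes numeric timestamps in seconds, windows aligned to time 0, and half-open [start, end) intervals (the service's own boundary convention may differ); the function name and sample values are made up.

```python
from collections import defaultdict


def tumbling_window_sum(events, size):
    """Sum event values per tumbling window.

    `events` is an iterable of (timestamp_seconds, value). Each event belongs to
    exactly one window [start, start + size); results are keyed by window end.
    """
    sums = defaultdict(int)
    for ts, value in events:
        start = (ts // size) * size
        sums[start + size] += value
    return dict(sorted(sums.items()))


events = [(1, 1), (5, 5), (12, 4), (18, 2), (25, 6), (33, 8)]
print(tumbling_window_sum(events, 20))  # {20: 12, 40: 14}
```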
Hopping window
[Figure: A 20-second Hopping Window with a 10-second “Hop”; time axis from 0 to 60 seconds]
• Overlapping windows
• Answers the question: “Every X seconds, tell me what happened in the previous Y seconds”
• The same event can belong to more than one window
• Think of a “moving average”
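A hopping window of length Y that advances every X seconds can be sketched by walking back from the most recent window start at or before each event. With a length of 20 and a hop of 10, every event lands in two windows. As before, this is only an illustration under stated assumptions: hypothetical function name, numeric timestamps in seconds, windows aligned to time 0, half-open [start, end) intervals, and made-up sample values.

```python
from collections import defaultdict


def hopping_window_sum(events, size, hop):
    """Sum event values per hopping window of length `size` starting every `hop` seconds.

    Windows are [start, start + size) with start a multiple of `hop`; an event
    is added to every window that contains it. Results are keyed by window end.
    """
    sums = defaultdict(int)
    for ts, value in events:
        start = (ts // hop) * hop  # latest window start at or before ts
        while start > ts - size:   # this window still contains ts
            sums[start + size] += value
            start -= hop
    return dict(sorted(sums.items()))


events = [(1, 1), (5, 5), (12, 4), (18, 2), (25, 6)]
print(hopping_window_sum(events, size=20, hop=10))  # {10: 6, 20: 12, 30: 12, 40: 6}
```

When `hop` equals `size`, the loop runs once per event and the result degenerates into a tumbling window.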
Sliding window
[Figure: A 20-second Sliding Window; time axis from 0 to 50 seconds]
• A forward-moving window: every time something happens, you get data about what happened in the last “X” seconds.
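The behavior described above (at each event, report the aggregate of the last X seconds) can be sketched with a double-ended queue that drops events as they age out. This Python sketch follows the slide's description only: it emits a result when an event arrives (the real service may also produce output at other points, such as when events leave the window), assumes (timestamp_seconds, value) pairs and a (t - X, t] interval, and uses a hypothetical function name and made-up values.

```python
from collections import deque


def sliding_window_sum(events, size):
    """For each event, return (timestamp, sum of values in the last `size` seconds).

    The window for an event at time t covers the interval (t - size, t].
    """
    window = deque()
    total = 0
    results = []
    for ts, value in sorted(events):
        window.append((ts, value))
        total += value
        # Evict events that are now `size` seconds old or older.
        while window and window[0][0] <= ts - size:
            _, old_value = window.popleft()
            total -= old_value
        results.append((ts, total))
    return results


events = [(1, 1), (5, 5), (22, 8), (30, 1), (38, 9)]
print(sliding_window_sum(events, 20))  # [(1, 1), (5, 6), (22, 13), (30, 9), (38, 18)]
```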
DEMO: Stream Analytics Full Power!
Stream analytics and machine learning
• Apply an AzureML model to streaming data
• Sample use cases: Fraud Detection, Product Recommendation, Customer Sentiment Analysis, Maintenance Prediction
• Right now in preview and available only through the “old” portal: https://manage.windowsazure.com/
DEMO: Stream Analytics & Machine Learning
Stream analytics alternative (on Azure)
• Apache Storm
• IaaS or PaaS (with HDInsight)
• Much more complex to manage and develop, but much more powerful
• https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-comparison-storm/
Stream analytics on-premises?
• Apache Hadoop Ecosystem: Flume / Kafka / Storm
• StreamInsight: CEP solution, part of the SQL Server platform
• EventStore: JavaScript open-source CEP
• None of them (except EventStore) has native temporal extensions
Additional resources
• Online Documentation
• Stream Analytics Reference Architecture
• Lambda Architecture
• GitHub Repository
Thanks! Questions?
Demos available on GitHub: https://github.com/yorek/devweek2016