Azure Stream Analytics : Analyse Data in Motion


Transcript of Azure Stream Analytics : Analyse Data in Motion

Page 1: Azure Stream Analytics  : Analyse Data in Motion
Page 2: Azure Stream Analytics  : Analyse Data in Motion

Stream Analytics: Analyze your data in motion

Deepthi Anantharam

Technology Evangelist

@deananth

Ruhani Arora

Technology Evangelist

@infinitydlimit

Page 3: Azure Stream Analytics  : Analyse Data in Motion

The need for evolution – Identified 2 years ago

… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing.

– Gartner, “The State of Data Warehousing in 2012”

Data sources

ETL

Data warehouse

BI and analytics

Page 4: Azure Stream Analytics  : Analyse Data in Motion

The “Traditional” Data Warehouse


Data sources

OLTP ERP CRM LOB

ETL

Data warehouse

BI and analytics

1. Increasing data volumes
2. New data sources & types
   Non-relational data: Devices, Web, Sensors, Social
3. Cloud-born data
4. Real-time data

Page 5: Azure Stream Analytics  : Analyse Data in Motion

Evolving Approaches to Analytics

[Diagram: OLTP, ERP and LOB sources → ETL Tool (SSIS, etc.): Extract original data, Transform, Load transformed data → EDW (SQL Server, Teradata, etc.) → Data Marts, Data Lake(s) → BI Tools, Dashboards, Apps]

Page 6: Azure Stream Analytics  : Analyse Data in Motion

Evolving Approaches to Analytics

[Diagram: OLTP, ERP and LOB sources → ETL Tool (SSIS, etc.): Extract, Transform, Load → EDW (SQL Server, Teradata, etc.) → Data Marts, Data Lake(s) → BI Tools, Dashboards, Apps. Added on this slide: Devices, Web, Sensors and Social sources, brought in through Ingest (EL) of original data]

Page 7: Azure Stream Analytics  : Analyse Data in Motion

Evolving Approaches to Analytics

[Diagram: OLTP, ERP and LOB sources → ETL Tool (SSIS, etc.): Extract, Transform, Load → EDW (SQL Server, Teradata, etc.) → Data Marts, Data Lake(s) → BI Tools, Dashboards, Apps. Devices, Web, Sensors and Social sources are brought in through Ingest (EL) of original data. Added on this slide: Scale-out Storage & Compute (HDFS, Blob Storage, etc.) with Transform & Load, and Streaming data]

Page 8: Azure Stream Analytics  : Analyse Data in Motion

Evolving Approaches to Analytics

[Diagram: OLTP, ERP and LOB sources → ETL Tool (SSIS, etc.): Extract, Transform, Load → EDW (SQL Server, Teradata, etc.) → Data Marts, Data Lake(s) → BI Tools, Dashboards, Apps. Devices, Web, Sensors and Social sources are brought in through Ingest (EL) of original data into Scale-out Storage & Compute (HDFS, Blob Storage, etc.) with Transform & Load. Added on this slide: Real Time data analytics on Streaming data]

Page 9: Azure Stream Analytics  : Analyse Data in Motion

Agenda

• ETL with new sources of data
  • Azure Data Factory

• Analytics with new sources of data
  • Azure Stream Analytics

Page 10: Azure Stream Analytics  : Analyse Data in Motion

Azure Data Factory Overview

• New Azure service for data developers & IT

• Compose data processing, storage and movement services to create & manage analytics pipelines

• Initially focused on Azure & hybrid movement to/from on-premises SQL Server. Over time it will expand to more storage & processing systems throughout

• Rich, simple end-to-end pipeline monitoring and management

Page 11: Azure Stream Analytics  : Analyse Data in Motion

Operationalizing Information Production With Data Factory

Page 12: Azure Stream Analytics  : Analyse Data in Motion

Example Scenario: Customer Profiling (game usage analytics)

Page 13: Azure Stream Analytics  : Analyse Data in Motion

Customer Profiling – Game Usage Analytics

2277,2013-06-01 02:26:54.3943450,111,164.234.187.32,24.84.225.233,true,8,1,2058
2277,2013-06-01 03:26:23.2240000,111,164.234.187.32,24.84.225.233,true,8,1,2058-2123-2009-2068-2166
2277,2013-06-01 04:22:39.4940000,111,164.234.187.32,24.84.225.233,true,8,1,
2277,2013-06-01 05:43:54.1240000,111,164.234.187.32,24.84.225.233,true,8,1,2058-225545-2309-2068-2166
2277,2013-06-01 06:11:23.9274300,111,164.234.187.32,24.84.225.233,true,8,1,223-2123-2009-4229-9936623
2277,2013-06-01 07:37:01.3962500,111,164.234.187.32,24.84.225.233,true,8,1,
2277,2013-06-01 08:12:03.1109790,111,164.234.187.32,24.84.225.233,true,8,1,234322-2123-2234234-12432-344323
…

Log Files Snippet (10s of TBs per day in cloud storage)

User Table

UserID  FirstName  LastName   State       …
2277    Pratik     Patel      Oregon
664432  Dave       Nettleton  Washington
8853    Mike       Flasko     California

New User Activity Per Week By Region

profileid  day       state       duration  rank  weaponsused  interactedwith
1148       6/2/2013  Oregon      216       33    1            5
1004       6/2/2013  Missouri    22        40    6            2
292        6/1/2013  Georgia     201       137   1            5
1059       6/2/2013  Oregon      27        104   5            2
675        6/2/2013  California  65        164   3            2
1348       6/3/2013  Nebraska    21        95    5            2

Page 14: Azure Stream Analytics  : Analyse Data in Motion

Terminologies
• Linked Services
• Data Sets
• Pipeline
• Diagram View

Steps
• Create a Data factory
• Add Data Sources
• Define Tables and Pipelines
• Deploy & Start
• Monitor and Manage

Page 15: Azure Stream Analytics  : Analyse Data in Motion

Example: Game Logs, Customer Profiling

[Diagram: Azure Data Factory spanning On Premises SQL Server (New User View) and Azure Blob Storage (1000’s Log Files)]

Page 16: Azure Stream Analytics  : Analyse Data in Motion

Example: Game Logs, Customer Profiling

[Diagram: Azure Data Factory spanning On Premises SQL Server (New User View) and Azure Blob Storage (1000’s Log Files). Tables added on this slide: Game Usage (view of the log files), New Users (view of New User View), New User Activity]

Page 17: Azure Stream Analytics  : Analyse Data in Motion

Example: Game Logs, Customer Profiling

[Diagram: Azure Data Factory spanning On Premises SQL Server (New User View) and Azure Blob Storage (1000’s Log Files). Tables: Game Usage (view of the log files), New Users (view of New User View), New User Activity. Added on this slide: Pipeline: Copy “NewUsers” to Blob Storage, producing Cloud New Users]

Page 18: Azure Stream Analytics  : Analyse Data in Motion

Example: Game Logs, Customer Profiling

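[Diagram: Azure Data Factory spanning On Premises SQL Server (New User View) and Azure Blob Storage (1000’s Log Files). Tables: Game Usage (view of the log files), New Users (view of New User View), New User Activity. Pipeline: Copy NewUsers to Blob Storage, producing Cloud New Users. Added on this slide: Pipeline: Mask & Geo-Code (HDInsight), with Game Usage and Geo Dictionary, producing Geo Coded Game Usage]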

Page 19: Azure Stream Analytics  : Analyse Data in Motion

Example: Game Logs, Customer Profiling

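[Diagram: Azure Data Factory spanning On Premises SQL Server (New User View) and Azure Blob Storage (1000’s Log Files). Tables: Game Usage (view of the log files), New Users (view of New User View), New User Activity. Pipeline: Copy NewUsers to Blob Storage, producing Cloud New Users. Pipeline: Mask & Geo-Code (runs on HDInsight), with Game Usage and Geo Dictionary, producing Geo Coded Game Usage. Added on this slide: Pipeline: Join & Aggregate, producing New User Activity]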

Page 20: Azure Stream Analytics  : Analyse Data in Motion

Step 3: Define Tables & Pipelines

“GeoCoded Game Usage” Table:

Page 21: Azure Stream Analytics  : Analyse Data in Motion

Step 3: Define Tables & Pipelines

Pipeline Definition:

[Diagram labels: Activity, Activity]

Page 22: Azure Stream Analytics  : Analyse Data in Motion

PowerShell

# Deploy table
New-AzureDataFactoryTable -DataFactory "GameTelemetry" -File NewUserActivityPerRegion.json

# Deploy pipeline
New-AzureDataFactoryPipeline -DataFactory "GameTelemetry" -File NewUserTelemetryPipeline.json

# Start pipeline (the start time is quoted so that it is passed as a single argument)
Set-AzureDataFactoryPipelineActivePeriod -Name "NewUserTelemetryPipeline" -DataFactory "GameTelemetry" -StartTime "10/29/2014 12:00:00"

Page 23: Azure Stream Analytics  : Analyse Data in Motion

Incremental Data Production

[Diagram: a Hive Activity over the datasets GameUsage, GeoCodeDictionary (Dataset2) and Geo-CodedGameUsage (Dataset3). One dataset is sliced Hourly (12-1, 1-2, 2-3) and two are sliced Daily (Monday, Tuesday, Wednesday)]

Page 24: Azure Stream Analytics  : Analyse Data in Motion

Custom Actions

• Allows running any .NET code wrapped within an ADF activity
• Can be used to connect to new sources/destinations
• Can be used to create custom transformation activities
• Example: Invoke Azure ML model
• SDK for custom activity creation:

Page 25: Azure Stream Analytics  : Analyse Data in Motion

Data Factory – Available Today

Coordination:
• Rich scheduling
• Complex dependencies
• Incremental rerun

Authoring:
• JSON & PowerShell/C#

Management:
• Lineage
• Data production policies (late data, rerun, latency, etc.)

Hub: Azure Hub (HDInsight + Blob storage)
• Activities: Hive, Pig, C#
• Data Connectors: Blobs, Tables, Azure DB, On Prem SQL Server, MDS [internal]

Page 26: Azure Stream Analytics  : Analyse Data in Motion

Analyze your data in motion

Page 27: Azure Stream Analytics  : Analyse Data in Motion

What is Streaming Data?

Data in Motion

Data at Rest

Page 28: Azure Stream Analytics  : Analyse Data in Motion

Azure Stream Analytics

Real-time stream processing Near infinite cloud scale

Managed real-time analytics

Mission-critical reliability and scale

Rapid development

[Diagram: device types feeding Stream Analytics: Point of Service Devices, Self Checkout Stations, Kiosks, Smart Phones, Slates/Tablets, PCs/Laptops, Servers, Digital Signs, Diagnostic Equipment, Remote Medical Monitors, Logic Controllers, Specialized Devices, Thin Clients, Handhelds, Security, POS Terminals, Automation Devices, Vending Machines, Kinect, ATM]

Page 29: Azure Stream Analytics  : Analyse Data in Motion

How do customers create a real-time streaming solution?

Page 30: Azure Stream Analytics  : Analyse Data in Motion

Customers using ASA?

Page 31: Azure Stream Analytics  : Analyse Data in Motion

Using Azure Analytic Service

Stages: Data Source, Collect, Process, Deliver, Consume

Event Inputs
- Event Hub
- Azure Blob

Reference Data
- Azure Blob

Azure Stream Analytics: Transform, Enrich, Correlate
- Temporal joins
- Filter
- Aggregates
- Projections
- Windows
- Etc.

Outputs
- SQL Azure
- Azure Blobs
- Event Hub
- Table Storage

Consumers: BI Dashboards, Predictive Analytics, Azure Storage

Page 32: Azure Stream Analytics  : Analyse Data in Motion

Sample Scenario : Toll Station

EntryStream - Data about vehicles entering toll stations

TollId  EntryTime                     LicensePlate  State  Make    Model    Type  Weight
1       2014-10-25T19:33:30.0000000Z  JNB 7001      NY     Honda   CRV      1     3010
1       2014-10-25T19:33:31.0000000Z  YXZ 1001      NY     Toyota  Camry    2     3020
3       2014-10-25T19:33:32.0000000Z  ABC 1004      CT     Ford    Taurus   2     3800
2       2014-10-25T19:33:33.0000000Z  XYZ 1003      CT     Toyota  Corolla  2     2900
1       2014-10-25T19:33:34.0000000Z  BNJ 1007      NY     Honda   CRV      1     3400
2       2014-10-25T19:33:35.0000000Z  CDE 1007      NJ     Toyota  4x4      1     3800
…       …                             …             …      …       …        …     …

ExitStream - Data about cars leaving toll stations

TollId  ExitTime                      LicensePlate
1       2014-10-25T19:33:40.0000000Z  JNB 7001
1       2014-10-25T19:33:41.0000000Z  YXZ 1001
3       2014-10-25T19:33:42.0000000Z  ABC 1004
2       2014-10-25T19:33:43.0000000Z  XYZ 1003
…       …                             …

ReferenceData - Commercial vehicle registration data

LicensePlate  RegistrationId  Expired
SVT 6023      285429838       1
XLZ 3463      362715656       0
QMZ 1273      876133137       1
RIV 8632      992711956       0
…             …               …
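The two streams in this scenario lend themselves to a temporal join. As a rough illustration only (plain Python, not Stream Analytics query syntax), the sketch below pairs EntryStream and ExitStream rows from the sample data on TollId and LicensePlate and computes how many seconds each vehicle spent at the station. The helper name `seconds_in_station` and the 15-minute bound are assumptions made for this example; they do not come from the slides.

```python
from datetime import datetime, timedelta

# Sample rows from the EntryStream and ExitStream data
# (fractional seconds and the trailing Z are trimmed for parsing).
ENTRIES = [
    (1, "2014-10-25T19:33:30", "JNB 7001"),
    (1, "2014-10-25T19:33:31", "YXZ 1001"),
    (3, "2014-10-25T19:33:32", "ABC 1004"),
    (2, "2014-10-25T19:33:33", "XYZ 1003"),
]
EXITS = [
    (1, "2014-10-25T19:33:40", "JNB 7001"),
    (1, "2014-10-25T19:33:41", "YXZ 1001"),
    (3, "2014-10-25T19:33:42", "ABC 1004"),
    (2, "2014-10-25T19:33:43", "XYZ 1003"),
]


def seconds_in_station(entries, exits, max_gap=timedelta(minutes=15)):
    """Hypothetical helper: pair each exit with an entry for the same toll
    and plate that happened no more than max_gap earlier."""
    results = []
    for toll_id, exit_time, plate in exits:
        t_exit = datetime.fromisoformat(exit_time)
        for e_toll, entry_time, e_plate in entries:
            gap = t_exit - datetime.fromisoformat(entry_time)
            # The time bound is what makes the join "temporal".
            if (e_toll, e_plate) == (toll_id, plate) and timedelta(0) <= gap <= max_gap:
                results.append((toll_id, plate, int(gap.total_seconds())))
    return results


durations = seconds_in_station(ENTRIES, EXITS)
for row in durations:
    print(row)
```

The nested loop is quadratic and only suitable for a handful of rows; a streaming engine would keep a bounded buffer per key instead.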

Page 33: Azure Stream Analytics  : Analyse Data in Motion

Query Language - Overview

DML Statements: SELECT, FROM, WHERE, GROUP BY, HAVING, CASE, JOINS, UNION

Scaling Functions: WITH, PARTITION BY

Date and Time Functions: DATENAME, DATEPART, DAY, MONTH, YEAR, DATETIMEFROMPARTS, DATEDIFF, DATEADD

Windowing Extensions: Tumbling Window, Hopping Window, Sliding Window

Aggregate Functions: SUM, COUNT, AVG, MIN, MAX

String Functions: LEN, CONCAT, SUBSTRING, CHARINDEX, PATINDEX

Page 34: Azure Stream Analytics  : Analyse Data in Motion

Tumbling Windows

SELECT TollId, COUNT(*)
FROM EntryStream TIMESTAMP BY EntryTime
GROUP BY TollId, TumblingWindow(second, 10)

Count the total number of vehicles entering each toll booth every interval of 10 seconds.

[Figure: A 10-second Tumbling Window; events on a time axis from 0 to 30 secs, grouped into consecutive, non-overlapping 10-second windows]
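To make the tumbling-window grouping concrete, here is a minimal Python sketch of the same idea: each event falls into exactly one fixed-size window, and counts are kept per toll booth and window. The event list is made up for illustration, and the sketch assumes a window includes its end time and excludes its start time; it is not how the service itself is implemented.

```python
from collections import Counter


def tumbling_counts(events, size):
    """Count events per (toll_id, window_end) for back-to-back windows.

    events: iterable of (toll_id, time_in_seconds) pairs (hypothetical data).
    Assumption: a window covers (end - size, end].
    """
    counts = Counter()
    for toll_id, t in events:
        # Round t up to the next multiple of `size` (integer ceiling division).
        window_end = -(-t // size) * size
        counts[(toll_id, window_end)] += 1
    return dict(counts)


events = [(1, 1), (1, 4), (2, 6), (1, 9), (1, 12), (2, 15), (2, 18), (1, 20)]
result = tumbling_counts(events, 10)
for key in sorted(result):
    print(key, result[key])
```

Because the windows do not overlap, every event is counted once, which is what distinguishes a tumbling window from the hopping and sliding variants.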

Page 35: Azure Stream Analytics  : Analyse Data in Motion

Hopping Windows

SELECT COUNT(*), TollId
FROM EntryStream TIMESTAMP BY EntryTime
GROUP BY TollId, HoppingWindow(second, 10, 5)

Count the number of vehicles entering each toll booth over every 10-second interval; update results every 5 seconds.

[Figure: A 10-second Hopping Window with a 5-second “Hop”; events on a time axis from 0 to 30 secs, grouped into overlapping 10-second windows that start 5 seconds apart]
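A hopping window differs from a tumbling window in that windows overlap, so one event can be counted more than once. The Python sketch below illustrates this with a 10-second window and a 5-second hop over made-up events. It assumes window ends are aligned to multiples of the hop and that a window includes its end time but not its start time; these are assumptions for the example, not a description of the service's internals.

```python
from collections import Counter


def hopping_counts(events, size, hop):
    """Count events per (toll_id, window_end) for overlapping windows.

    events: iterable of (toll_id, time_in_seconds) pairs (hypothetical data).
    Assumption: windows end at multiples of `hop` and cover (end - size, end].
    """
    counts = Counter()
    for toll_id, t in events:
        # First aligned window end at or after t.
        end = -(-t // hop) * hop
        # Every aligned end in [t, t + size) closes a window that contains t.
        while end - size < t:
            counts[(toll_id, end)] += 1
            end += hop
    return dict(counts)


events = [(1, 1), (1, 4), (1, 7), (1, 12)]
result = hopping_counts(events, size=10, hop=5)
for key in sorted(result):
    print(key, result[key])
```

With a 10-second window and a 5-second hop, each event lands in two windows, so the counts add up to twice the number of events.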

Page 36: Azure Stream Analytics  : Analyse Data in Motion

Sliding Windows

Find the toll booths that have served more than 10 vehicles in the last 10 seconds, along with their vehicle counts.

[Figure: A 10-second Sliding Window; events on a time axis from 0 to 25 secs, with the window moving along with the events]

SELECT TollId, COUNT(*)
FROM EntryStream ES
GROUP BY TollId, SlidingWindow(second, 10)
HAVING COUNT(*) > 10
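The sliding-window check can be sketched in Python as a per-booth queue of recent event times. This is a simplified illustration: it re-evaluates the window only when a new event arrives (not when an old event ages out), uses made-up events, and uses a threshold of 2 rather than 10 to keep the sample data short. The function name `sliding_alerts` is hypothetical.

```python
from collections import defaultdict, deque


def sliding_alerts(events, size, threshold):
    """Report (toll_id, time, count) whenever a booth has seen more than
    `threshold` events in the window (time - size, time].

    events: (toll_id, time_in_seconds) pairs sorted by time (hypothetical data).
    """
    recent = defaultdict(deque)  # toll_id -> event times still inside the window
    alerts = []
    for toll_id, t in events:
        window = recent[toll_id]
        window.append(t)
        # Drop events that have aged out of the last `size` seconds.
        while window[0] <= t - size:
            window.popleft()
        if len(window) > threshold:
            alerts.append((toll_id, t, len(window)))
    return alerts


events = [(1, 1), (1, 3), (1, 5), (2, 6), (1, 14), (1, 15)]
alerts = sliding_alerts(events, size=10, threshold=2)
print(alerts)
```

Unlike the tumbling and hopping sketches, there is no fixed grid of window boundaries here; the window is always anchored to the most recent event.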

Page 37: Azure Stream Analytics  : Analyse Data in Motion

Real-time analytics

Intake millions of events per second
• Process data from connected devices/apps
• Integrated with highly-scalable publish-subscriber ingestor

Easy processing on continuous streams of data
• Transform, augment, correlate, temporal operations
• Detect patterns and anomalies in streaming data
• Correlate streaming with reference data

Page 38: Azure Stream Analytics  : Analyse Data in Motion

Programmatic Access with REST APIs

The full functionality of Azure Stream Analytics is available through REST APIs.
• Enables programmatic access
• Useful for automation through scripting
• Embed in other applications/tools

Jobs Management: Create Job, Delete Job, Start Job, Stop Job, List Jobs, Update Job

Input and Output Management: Create Input / Output, Delete Input / Output, List Input / Output, Update Input / Output

Transformations Management: Create Transformation, Delete Transformation, Get Transformation, Update Transformation

Page 39: Azure Stream Analytics  : Analyse Data in Motion

Demo: Scaling , Monitoring & Logging

Page 40: Azure Stream Analytics  : Analyse Data in Motion

Scaling Concepts – Partitions

[Diagram: Event Hub partitions (PartitionId = 1, 2, 3) feed Stream Analytics, where each partition is processed separately to produce Step Result 1, Step Result 2 and Step Result 3]

SELECT COUNT(*) AS Count, TollBoothId
FROM EntryStream PARTITION BY PartitionId
GROUP BY TumblingWindow(minute, 3), TollBoothId
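The idea behind partitioned processing can be sketched in Python: events are first split by partition, and each partition is then aggregated on its own, so the per-partition work could run in parallel without any shared state. The events, booth names and the function `partitioned_counts` are made up for this illustration, and the window convention (end included, start excluded) is an assumption.

```python
from collections import Counter, defaultdict


def partitioned_counts(events, window_minutes):
    """Return {partition_id: {(toll_booth_id, window_end): count}}.

    events: (partition_id, toll_booth_id, minute) tuples (hypothetical data).
    """
    # Step 1: route each event to its partition.
    by_partition = defaultdict(list)
    for partition_id, booth, minute in events:
        by_partition[partition_id].append((booth, minute))

    # Step 2: aggregate every partition independently of the others.
    results = {}
    for partition_id, rows in by_partition.items():
        counts = Counter()
        for booth, minute in rows:
            window_end = -(-minute // window_minutes) * window_minutes
            counts[(booth, window_end)] += 1
        results[partition_id] = dict(counts)
    return results


events = [(1, "A", 1), (1, "A", 2), (2, "B", 2), (3, "C", 3), (1, "A", 4), (2, "B", 5)]
step_results = partitioned_counts(events, window_minutes=3)
for partition_id in sorted(step_results):
    print(partition_id, step_results[partition_id])
```

Each entry in `step_results` corresponds to one "Step Result" in the slide's diagram: one output per input partition.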

Page 41: Azure Stream Analytics  : Analyse Data in Motion


• Preview services

• Offers ability to deal with new age problem in processing and analyzing data

• Scale, Speed, Economy

ADF & ASA

Page 42: Azure Stream Analytics  : Analyse Data in Motion

Recommended/related sessions

1. Inside Azure Storage – Options, Abstractions and Best Practices
   Data, Sabha2, 11.00 AM – 11.55 AM tomorrow

2. Choosing Right Platform for BigData
   Data, Sabha2, 3.00 PM to 3.55 PM tomorrow

3. Practical Machine Learning
   Data, Sabha2, 4.15 to 5.10 today

Page 43: Azure Stream Analytics  : Analyse Data in Motion

References
Related references for you to expand your knowledge on the subject

Azure Stream Analytics Documentation: http://azure.microsoft.com/en-in/documentation/services/stream-analytics/

Stream Analytics Query Language Reference: https://msdn.microsoft.com/en-us/library/azure/dn834998.aspx

Azure Portal: http://azure.microsoft.com

Azure Updates: http://azure.microsoft.com/blog/

Microsoft Virtual Academy: aka.ms/mva

Developer Network: msdn.microsoft.com/

Page 44: Azure Stream Analytics  : Analyse Data in Motion

Azure Support
Must-know resources to get online help for Azure.

Azure Support Options: http://azure.microsoft.com/en-us/support/options/

Azure Support Plans: http://azure.microsoft.com/en-us/support/plans/

Ask questions & get answers
• Post questions in the Azure forums
• Tag questions with the keyword Azure.

Page 45: Azure Stream Analytics  : Analyse Data in Motion

Azure Vidyapeeth
A platform for learning – Choose your topic, choose your time

• Register to attend Azure Vidyapeeth Live webinars @ www.aka.ms/azure-vidyapeeth

• Collect free $100 Azure gift pass by registering for our Azure Vidyapeeth series at the Expo zone!

• Point your mobile phone here to download the Azure Vidyapeeth Mobile App : www.aka.ms/av-app

Page 46: Azure Stream Analytics  : Analyse Data in Motion

Tell us what you think

Help us shape future events by sharing your valuable feedback.

Scan the QR code to evaluate this session.


Page 47: Azure Stream Analytics  : Analyse Data in Motion

Thank you

Twitter: @deananth @infinitydlimit

Follow us online

Page 48: Azure Stream Analytics  : Analyse Data in Motion

Pricing (Today)

Page 49: Azure Stream Analytics  : Analyse Data in Motion

Query Language

You write declarative queries in SQL
• No code compilation, easy to author and deploy

Unified programming model
• Brings together event streams, reference data and machine learning extensions

Temporal Semantics
• All operators respect, and some use, the temporal properties of events

Built-in operators and functions
• These should (mostly) look familiar if you know relational databases
• Filters, projections, joins, windowed (temporal) aggregates, text and date manipulation

Page 50: Azure Stream Analytics  : Analyse Data in Motion


Why Event Processing in the Cloud?

• Event data is already in the Cloud
• Event data is globally distributed
• Reduced TCO
• Scale
• Managed service, not infrastructure

Bring the processing to the data, not the data to the processing!

Streamed Data is naturally non-local!

Page 51: Azure Stream Analytics  : Analyse Data in Motion

Application Components
Components of an Azure Stream Analytics Application

INPUT (Source of Events)
• Azure Event Hubs
• Azure Blob Storage

Reference Data

Events
• Have a defined schema and are temporal (sequenced in time)

Stream Analytics Query
• Query runs continuously against incoming stream of events

OUTPUT [Result of Query]
• Azure SQL DB
• Azure Event Hubs
• Azure Blob Storage