Adf dw walkthrough
msdevmtl
… data warehousing has reached the most
significant tipping point since its inception.
The biggest, possibly most elaborate data
management system in IT is changing.
– Gartner, “The State of Data Warehousing in 2012”
Data sources
1. Increasing data volumes
2. Real-time data
3. New data sources & types (non-relational data)
4. Cloud-born data
[Diagram, built up over four slides. Traditional flow: data sources → Extract (original data) → ETL tool (SSIS, etc.) → Transform → Load (transformed data) → EDW (SQL Server, Teradata, etc.), consumed by BI tools, data marts, data lake(s), dashboards and apps. Later builds add an Ingest (EL) path that lands the original data in scale-out storage & compute (HDFS, Blob Storage, etc.), followed by a Transform & Load step, and then streaming data as a further source.]
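To make the ETL versus ingest-first (EL, then transform) contrast in the diagram concrete, here is a minimal Python sketch. The record shape, the `transform` cleaning rule, and the in-memory lists standing in for the EDW and the scale-out store are all hypothetical; the point is only where the transform happens and that the ingest-first path keeps the original data.

```python
# Minimal sketch: ETL vs. ingest-first (EL, then T).
# Lists stand in for the EDW and for scale-out storage; all names are hypothetical.

def transform(rows):
    """Hypothetical cleaning rule: drop rows with no quantity, tidy category names."""
    return [
        {**row, "category": row["category"].strip().title()}
        for row in rows
        if row["qty"] is not None
    ]

def etl(source_rows, warehouse):
    # ETL: transform in flight; only transformed data is ever stored.
    warehouse.extend(transform(source_rows))

def elt(source_rows, lake, warehouse):
    # EL: land the original data unchanged in scale-out storage first...
    start = len(lake)
    lake.extend(source_rows)
    # ...then T: transform the newly landed rows and load the result.
    warehouse.extend(transform(lake[start:]))

source = [{"category": " bikes", "qty": 2}, {"category": "clothing ", "qty": None}]

warehouse_etl = []
etl(source, warehouse_etl)

lake, warehouse_elt = [], []
elt(source, lake, warehouse_elt)

print(warehouse_etl)  # [{'category': 'Bikes', 'qty': 2}]
print(lake)           # both original rows, including the one ETL discarded
```

Both paths load the same cleaned row, but only the ingest-first path still holds the row with no quantity, so a later, different transform can still use it.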
[Diagram, built up over several slides: data sources are imported into a data hub (storage & compute); data is moved among hubs and then to data marts, data lake(s), BI tools, dashboards and apps. Data connectors import from a source to a hub, import/export among hubs, and export from a hub to a data store.]
Information production: Connect & Collect → Transform & Enrich → Publish
• Coordination & Scheduling
• Monitoring & Mgmt
• Data Lineage
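A minimal Python sketch of the three information-production stages, with lineage recorded as each step runs. `Dataset`, `run_step`, the step names and the sample rows are hypothetical; this is not the Azure Data Factory API, only an illustration of how lineage can be carried from inputs to outputs.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    rows: list
    # Lineage entries: (step name, names of the input datasets it read).
    lineage: list = field(default_factory=list)

def run_step(output_name, step, fn, *inputs):
    """Run one information-production step and carry lineage forward."""
    rows = fn(*(d.rows for d in inputs))
    lineage = [entry for d in inputs for entry in d.lineage]
    lineage.append((step, [d.name for d in inputs]))
    return Dataset(output_name, rows, lineage)

def totals_by_category(rows):
    """Transform & Enrich: total quantity per category."""
    out = {}
    for category, qty in rows:
        out[category] = out.get(category, 0) + qty
    return sorted(out.items())

# Hypothetical inputs already imported into the hub.
dw_sales = Dataset("SalesFromDW", [("Bikes", 2288), ("Clothing", 2340)])
other = Dataset("OtherSource", [("Bikes", 10)])

collected = run_step("Collected", "connect_collect", lambda a, b: a + b, dw_sales, other)
enriched = run_step("CategoryTotals", "transform_enrich", totals_by_category, collected)
published = run_step("MartCategoryTotals", "publish", list, enriched)

print(published.rows)  # [('Bikes', 2298), ('Clothing', 2340)]
for step, inputs in published.lineage:
    print(step, "<-", inputs)
```

Each output carries the full chain of steps behind it, which is the kind of record a lineage view is built from.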
Raw sales (custom view on top of DW tables) → Hive processing → Sales by category by day

OrderDate  Company                     Category     QtyOrdered  Unit Price  Sales Order
6/1/2004   Action Bicycle Specialists  Accessories  1716        22.0393     SO71784
6/1/2004   Action Bicycle Specialists  Bikes        2288        864.0452    SO71784
6/1/2004   Action Bicycle Specialists  Clothing     2340        26.8155     SO71784
6/1/2004   Action Bicycle Specialists  Components   598         329.8538    SO71784
6/1/2004   Aerobic Exercise Company    Components   338         133.8744    SO71915
6/1/2004   Action Bicycle Specialists  Accessories  910         25.1057     SO71938
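The Hive query itself is not shown on the slide. Assuming "sales" means total quantity ordered, the following Python sketch performs the same group-by-and-sum over the six sample rows; a real job would express this in HiveQL and run it on HDInsight. Other measures, such as revenue (QtyOrdered × Unit Price), would be summed the same way.

```python
from collections import defaultdict

# Rows from the raw-sales view above:
# (OrderDate, Company, Category, QtyOrdered, UnitPrice, SalesOrder)
RAW_SALES = [
    ("6/1/2004", "Action Bicycle Specialists", "Accessories", 1716, 22.0393, "SO71784"),
    ("6/1/2004", "Action Bicycle Specialists", "Bikes", 2288, 864.0452, "SO71784"),
    ("6/1/2004", "Action Bicycle Specialists", "Clothing", 2340, 26.8155, "SO71784"),
    ("6/1/2004", "Action Bicycle Specialists", "Components", 598, 329.8538, "SO71784"),
    ("6/1/2004", "Aerobic Exercise Company", "Components", 338, 133.8744, "SO71915"),
    ("6/1/2004", "Action Bicycle Specialists", "Accessories", 910, 25.1057, "SO71938"),
]

def sales_by_category_by_day(rows):
    """Sum QtyOrdered per (OrderDate, Category).

    Roughly what a HiveQL query such as
      SELECT OrderDate, Category, SUM(QtyOrdered) FROM raw_sales GROUP BY OrderDate, Category
    would compute (the table name raw_sales is hypothetical).
    """
    totals = defaultdict(int)
    for order_date, _company, category, qty, _unit_price, _order in rows:
        totals[(order_date, category)] += qty
    return dict(sorted(totals.items()))

for (day, category), qty in sales_by_category_by_day(RAW_SALES).items():
    print(day, category, qty)
```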
New-AzureDataFactory -Name "HaloTelemetry" -Location "West US"
New-AzureDataFactory -Name "DW-Demo2" -Location "West US"
[Diagram, built up over three slides: Azure Data Factory spans an on-premises SQL Server (AdventureWorksLTDW2014) and Azure Blob Storage. A pipeline copies the "View of New Sales" dataset to Blob Storage ("Copy NewSales to Blob Storage"), producing "Cloud New Sales". A second pipeline runs an Aggregate activity on HDInsight over the view of Cloud New Sales, producing "Aggregated Sales". The diagram also shows a "New User View" and a "New User Activity".]
Pipeline OnPrem SSIS package
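In the diagram the two pipelines are linked only through a dataset: the copy's output, Cloud New Sales, is the aggregate's input. A minimal Python sketch of deriving the run order from those dataset links with the standard-library `graphlib` module; the activity and dataset names are hypothetical renderings of the diagram labels, and a real scheduler would also track time slices, as the following slides describe.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical activities mirroring the diagram: each names its input and output datasets.
ACTIVITIES = {
    "AggregateSales": {"inputs": ["CloudNewSales"], "outputs": ["AggregatedSales"]},
    "CopyNewSalesToBlob": {"inputs": ["ViewOfNewSales"], "outputs": ["CloudNewSales"]},
}

def run_order(activities):
    """Order activities so each runs after whichever activity produces its inputs."""
    producer = {ds: name for name, spec in activities.items() for ds in spec["outputs"]}
    graph = {
        name: {producer[ds] for ds in spec["inputs"] if ds in producer}
        for name, spec in activities.items()
    }
    return list(TopologicalSorter(graph).static_order())

print(run_order(ACTIVITIES))  # ['CopyNewSalesToBlob', 'AggregateSales']
```

The activities are listed out of order on purpose; the order comes from the dataset dependencies, not from how the activities were declared.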
"availability": { "frequency": "Day", interval": 6 }
[Timeline (hourly frequency): slices 12-6, 6-12, 12-6]
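The timeline shows slices of six hours (12-6, 6-12, 12-6). A minimal Python sketch of how an availability setting of frequency × interval can be turned into such windows, assuming "frequency": "Hour" with "interval": 6 and slices aligned to midnight; `UNITS` and `slices` are hypothetical helpers, not part of Data Factory.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from an availability "frequency" value to a time unit.
UNITS = {
    "Minute": timedelta(minutes=1),
    "Hour": timedelta(hours=1),
    "Day": timedelta(days=1),
}

def slices(availability, start, end):
    """Split [start, end) into consecutive windows of frequency * interval."""
    step = UNITS[availability["frequency"]] * availability["interval"]
    windows, t = [], start
    while t < end:
        windows.append((t, min(t + step, end)))
        t += step
    return windows

availability = {"frequency": "Hour", "interval": 6}
day = datetime(2014, 6, 1)
windows = slices(availability, day, day + timedelta(days=1))
for s, e in windows:
    print(s.strftime("%H:%M"), "-", e.strftime("%H:%M"))
```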
[Diagram: AggregatesSales activity (e.g. Hive) with Dataset2 and Dataset3; hourly slices (12-1, 1-2, 2-3, ...) roll up into daily slices (Monday, Tuesday, Wednesday, ...).]
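When hourly slices feed a daily slice, the daily run for, say, Monday has to wait for all 24 hourly slices of Monday. A minimal Python sketch of that dependency check, assuming hourly input and daily output slices aligned to midnight; the function names are hypothetical.

```python
from datetime import datetime, timedelta

def hourly_inputs(day_start):
    """The 24 hourly input slices covered by one daily output slice."""
    return [
        (day_start + timedelta(hours=h), day_start + timedelta(hours=h + 1))
        for h in range(24)
    ]

def daily_slice_ready(day_start, ready_hourly):
    """A daily slice can run only when every hourly slice in its window is ready."""
    return all(s in ready_hourly for s in hourly_inputs(day_start))

monday = datetime(2014, 6, 2)  # a Monday
hours = hourly_inputs(monday)
print(daily_slice_ready(monday, set(hours[:23])))  # False: 23:00-24:00 still missing
print(daily_slice_ready(monday, set(hours)))       # True
```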
[Diagram: a Hive activity reads "Sales From DW" and another source and produces "Daily Sales".]
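The Hive activity has two inputs. A minimal Python sketch, assuming each input exposes per-day amounts and that a day's Daily Sales slice should be produced only once both inputs have that day; the dataset names follow the diagram, and the amounts are illustrative placeholders.

```python
def daily_sales(inputs):
    """inputs: dataset name -> {day: amount}.

    Produce a total only for days that every input dataset already has,
    mimicking an output slice that waits for all of its input slices.
    """
    ready_days = set.intersection(*(set(days) for days in inputs.values()))
    return {day: sum(d[day] for d in inputs.values()) for day in sorted(ready_days)}

# Hypothetical per-day amounts.
inputs = {
    "SalesFromDW": {"2014-06-02": 100, "2014-06-03": 80, "2014-06-04": 90},
    "OtherSource": {"2014-06-02": 20, "2014-06-04": 5},
}
result = daily_sales(inputs)
print(result)  # {'2014-06-02': 120, '2014-06-04': 95}
```

June 3 is held back because the other source has not delivered that day yet.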
• Is my data successfully getting produced?
• Is it produced on time?
• Am I alerted quickly of failures?
• What about troubleshooting information?
• Are there any policy warnings or errors?
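The questions above amount to checking each slice's status and completion time. A minimal Python sketch, assuming a per-slice run record and a one-hour latency allowance; `SliceRun`, its status strings, and `health_report` are hypothetical and not the Data Factory monitoring API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class SliceRun:
    dataset: str
    slice_end: datetime
    status: str                          # hypothetical values: "Ready", "Failed", "InProgress"
    completed_at: Optional[datetime] = None
    error: str = ""

def health_report(runs, now, allowed_latency=timedelta(hours=1)):
    """Flag failed slices and slices finished (or still running) past their deadline."""
    failed = [r for r in runs if r.status == "Failed"]
    late = [
        r for r in runs
        if r.status != "Failed"
        and (r.completed_at or now) - r.slice_end > allowed_latency
    ]
    return {"failed": failed, "late": late}

day = datetime(2014, 6, 2)
runs = [
    SliceRun("AggregatedSales", day + timedelta(hours=6), "Ready", day + timedelta(hours=6, minutes=30)),
    SliceRun("AggregatedSales", day + timedelta(hours=12), "Ready", day + timedelta(hours=14)),
    SliceRun("AggregatedSales", day + timedelta(hours=18), "Failed", error="Hive script error"),
    SliceRun("AggregatedSales", day + timedelta(hours=24), "InProgress"),
]
report = health_report(runs, now=day + timedelta(hours=26))
for kind, items in report.items():
    for r in items:
        print(kind, r.slice_end, r.status, r.error)
```

The error text on a failed run is where troubleshooting information would be surfaced alongside the alert.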
• Easily move data to my existing data marts for consumption by my existing BI tools:
• Azure DB
• SQL Server on premises
• Oracle
• Files
• Azure Blob content
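Moving results to an existing data mart is essentially a load step that should be safe to repeat when a slice is rerun. A minimal Python sketch using the standard-library `sqlite3` module as a stand-in for the Azure DB, SQL Server or Oracle targets listed above; the `daily_sales` table and its key are hypothetical.

```python
import sqlite3

def publish(conn, rows):
    """Load (day, category, qty) rows into a data-mart table.

    Keyed on (day, category), so re-publishing a rerun slice overwrites
    the old values instead of duplicating them.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "day TEXT, category TEXT, qty INTEGER, PRIMARY KEY (day, category))"
    )
    conn.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the real data mart
publish(conn, [("2004-06-01", "Bikes", 2288), ("2004-06-01", "Components", 936)])
publish(conn, [("2004-06-01", "Bikes", 2300)])  # hypothetical rerun with corrected data
rows = conn.execute("SELECT * FROM daily_sales ORDER BY category").fetchall()
print(rows)  # [('2004-06-01', 'Bikes', 2300), ('2004-06-01', 'Components', 936)]
```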
Coordination:
• Rich scheduling
• Complex dependencies
• Incremental rerun
Authoring:
• JSON & PowerShell/C#
Management:
• Lineage
• Data production policies (late data, rerun, latency, etc)
Hub: Azure Hub (HDInsight + Blob storage)
• Activities: Hive, Pig, C#
• Data Connectors: Blobs, Tables, Azure DB, on-premises SQL Server, Oracle
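Incremental rerun means recomputing only what a changed slice affects. A minimal Python sketch, assuming a dataset-level dependency graph and a rerun of one time slice: it finds the downstream datasets and orders them with the standard-library `graphlib` module. The graph and dataset names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency graph: dataset -> datasets computed from it.
DOWNSTREAM = {
    "ViewOfNewSales": ["CloudNewSales"],
    "CloudNewSales": ["AggregatedSales"],
    "AggregatedSales": ["DataMartSales"],
    "NewUserView": ["CloudNewUsers"],
}

def rerun_plan(start, downstream):
    """Datasets to recompute, in dependency order, after `start` is rerun for one slice."""
    # 1. Collect everything reachable from the rerun dataset.
    affected, stack = {start}, [start]
    while stack:
        for child in downstream.get(stack.pop(), []):
            if child not in affected:
                affected.add(child)
                stack.append(child)
    # 2. Order the affected datasets so each follows all of its affected inputs.
    preds = {ds: set() for ds in affected}
    for parent in affected:
        for child in downstream.get(parent, []):
            preds[child].add(parent)
    return list(TopologicalSorter(preds).static_order())

print(rerun_plan("CloudNewSales", DOWNSTREAM))
# ['CloudNewSales', 'AggregatedSales', 'DataMartSales']; the NewUserView branch is untouched
```

A topological sort, rather than plain breadth-first order, keeps the plan correct when two upstream paths of different lengths meet at the same dataset.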
• Contact me: [email protected]
www.microsoft.com/learning
http://microsoft.com/technet
http://channel9.msdn.com/Events/TechEd
http://developer.microsoft.com