ETL for the masses with Power Query and M
-
Upload
regis-baccaro -
Category
Technology
-
view
1.490 -
download
3
description
Transcript of ETL for the masses with Power Query and M
![Page 1: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/1.jpg)
ETL for the MassesRégis Baccaro – IBM
@regbac
![Page 2: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/2.jpg)
Our Sponsors
![Page 3: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/3.jpg)
Introduction
Régis Baccaro @regbachttp://Theblobfarm.wordpress.com
http://Thelovefarm.wordpress.com
• Founder and lead organizer of SQL Saturday Denmark
• PASS Regional Mentor
• Works for IBM
• Passionate about the community
• .Net developer, BI dude, SharePoint fellow and accidental DBA
![Page 4: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/4.jpg)
Agenda
• Power Query and the M language
• E and T and L with Power Query
• Data refresh techniques with PQ
• Next step
![Page 5: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/5.jpg)
Introduction
• Power Query• Get data experience
• Filter and combine
• Embedded M for repeatable mashup
• Power Query Formula Language (aka M)• Mostly pure
• Higher-order
• Dynamically typed
• Partially lazy
• Functional programming language
![Page 6: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/6.jpg)
Elements of language
• Expressions – central construct• Evaluated to a single vlaue
• Values• Primitives
• List – ordered seq.
• Record – set of fields
• Table
• Function
![Page 7: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/7.jpg)
Evaluation
• Excel-like (surprise !)
• Nested records• In Records
• In Lists
• Lazy evaluation• Lists and Records (and let)
• Eager evaluation• Everything else
![Page 8: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/8.jpg)
Functions and Standard Library
• Mapping from a set of values to a single value• (named parameters) => function body
• Common set of definitions
![Page 9: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/9.jpg)
Operators
• Meaning varies depending on kind of value
• & = text or list concatenation and records merge
![Page 10: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/10.jpg)
Metadata
• Information about a value that is associated with a value
• A record
• Exists for every value
• Unobtrusive way to add information
• Accessed with Value.Metadata
![Page 11: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/11.jpg)
Let .....in expression
• So far only literal values
• Let allows a set of value to be:• Computed
• Named
• Used in subsequent expressions that follows the in
let
Source = Web.Page(Web.Contents("http://www.cvr.dk/Site/Forms/CompanySearch/CompanySearch.aspx?......),
RowCount = Table.RowCount(Source)
in
RowCount
![Page 12: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/12.jpg)
IF expression
• Select between 2 expression based on logical condition
![Page 13: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/13.jpg)
Error expression
• When an expression evaluation cannot yield a value
• Raised with error
• Handled with try
• Produces an Error record
• try...otherwise Used with default values
![Page 14: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/14.jpg)
Keywords and Operators
• and as each else error false if in is let meta not otherwise or section shared then true try type #binary #date #datetime #datetimezone #duration #infinity #nan #sections #shared #table #time
• , ; = < <= > >= <> + - * / & ( ) [ ] { } @ ! ? => .. ...
![Page 15: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/15.jpg)
The ”E” - Why is Power Query great for Extracting data
• Multiple data sources
Hey wait ! Where is PDW ?
![Page 16: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/16.jpg)
Query folding - A step toward declarative ETL approach
• Declarative vs Imperative
• Query folding similar to predicate pushdown
• Does Power Query have a Query Optimizer ?
• Demo
Query folding - the unofficial list:• SQL Databases• OData and OData based sources, such
as the Windows Azure Marketplace and SharePoint Lists
• Active Directory• HDFS.Files, Folder.Files, and
Folder.Contents (for basic operations on paths)
• Column removal• Renaming• Joins• Type conversions
![Page 17: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/17.jpg)
Real life scenario – ETL for the masses
• Seen a lot of demos
• Build a lot of demos
• They are always so clean !
![Page 18: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/18.jpg)
Real life scenario
![Page 19: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/19.jpg)
Transform
• M is how the magic happens!
• Data manipulation• Records
• Lists
• Tables
• Merging
• Function calls
![Page 20: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/20.jpg)
What about our scenario?
• Where should I get my data from?• Pure Excel
• Excel and MDS/DQS/SSIS/SQL
• Web, SQL, XML, ?
• Let me show you ! Input
• (cvr web)
![Page 21: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/21.jpg)
Let’s go to homegrown data?
• Bad web service
• Bad HTML structure
• Let’s go with local data that we can control• SQL Server
• Excel
• Let’s Query!
Isolated DB
Local storage
![Page 22: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/22.jpg)
Clean up before you merge!
• DQSKnowledge base with CVR
+ Cleansing project with LinkedIn input
________________________________________
= Demo2.1_AndreasStrandbyClean
• Hit ratio increased...
+
=
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
0
50
100
150
200
250
Cleanjoin
Nestedjoin
Merge
Hit Total
![Page 23: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/23.jpg)
Smarter Power Query
• Expression.Evaluate()
• Examples• Load query text from file
• Load function from file
• Passing parameters (as constants)
• Demo
![Page 24: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/24.jpg)
Refreshing Power Query data
• Different solutions
• All with flaws !
![Page 25: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/25.jpg)
Refreshing Power Query data – with VB6 !
• Back from 2006
• From Kim GreenLee
Plus Minus
Can be scheduled VB6 – are you kidding ?
More robust than the non-technical solution
![Page 26: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/26.jpg)
Refreshing Power Query data – with PowerShell
Plus Minus
Robust Hard to troubleshoot
Can not run in a task in windows task scheduler unless the user has checked that the user has to be logged on to run
![Page 27: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/27.jpg)
Refreshing Power Query data – The non-technical way
• Let me show you !
Plus Minus
Very easy Not very corporate !
The spreadsheet needs to be open
Excel file not saved
Locked out when it refreshes
![Page 28: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/28.jpg)
Refreshing Power Query data – The non-technical way part 2
• Let me show you !
Plus Minus
Very easy Not very corporate !
Uses technique from previous The spreadsheet needs to be open
![Page 29: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/29.jpg)
Refreshing Power Query data – with SSIS
Plus Minus
Robust Requires a SQL Server (wait, it’s a plus!)
Needs a SSIS / C# developer
![Page 30: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/30.jpg)
Refreshing Power Query data – with SSIS
• Using DQS for cleansing input
• Let me show you !
![Page 31: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/31.jpg)
How is Power query going to be used?
• Data store accumulating interesting data points
• Hook into read only data for reporting purposes or data marts
• One file to accumulate (Produce)
• Multiple files or programs to report (Consume)
• I don’t believe in “Data Steward”
• I believe someone will be in charge of procuring and monitoring data stores of disparate data (such as IT or DBA’s).
![Page 32: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/32.jpg)
Conclusion
• A step toward declarative ETL approach
• Still much work to do !
We have• A declarative data integration language• Only surfaced in Power Query• Can push data to an Excel spreadsheetImagine.....• Connection to heterogenous data sources
![Page 33: ETL for the masses with Power Query and M](https://reader033.fdocuments.in/reader033/viewer/2022052410/5561eca0d8b42aa5068b545c/html5/thumbnails/33.jpg)
THANK YOU!@REGBACHTTP://[email protected]