AI Big Data Conference: Eugene Polonichko, Azure Data Lake
Azure Data Lake: What is it? Why is it? Where is it?
EUGENE POLONICHKO
DATA PLATFORM MVP
BI\DWH ARCHITECT
About me
Eugene Polonichko has over 7 years of experience with SQL Server. He has mainly focused on BI projects (SSAS, SSIS, Power BI, Cognos, Informatica PowerCenter, Pentaho, Tableau). Eugene is a passionate speaker and SQL community volunteer who presents regularly at PASS SQL Saturday events and local user groups around Ukraine and Europe. He is a PASS Chapter Leader and holds Data Platform MVP status.
https://www.linkedin.com/in/eugenepolonichko/
https://twitter.com/EvgenPolonichko
Agenda
What is Data Lake?
Architecture of Azure Data Lake
Azure Data Lake Store
Overview of Azure Data Lake Store
Compare
For big data processing
Azure Data Lake Analytics
U-SQL
Concepts
U-SQL Script Structure
Extractors
U-SQL Jobs
U-SQL catalog
Monitoring and performance of U-SQL jobs
Data Lake Analytics pricing
Data Lake
Architecture of Azure Data Lake
Azure Data Lake Stores
Azure Data Lake Store is a hyper-scale repository for big data analytic workloads. Azure Data Lake enables you to capture data of any size, type, and ingestion speed in one single place for operational and exploratory analytics.
The Azure Data Lake Store is an Apache Hadoop file system compatible with the Hadoop Distributed File System (HDFS).
It can be accessed from Hadoop (available with an HDInsight cluster) using WebHDFS-compatible REST APIs.
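To make the WebHDFS-compatible access more concrete, here is a minimal sketch that only builds a request URL and sends nothing over the network. The host pattern `<account>.azuredatalakestore.net`, the account name `mydatalake`, and the helper name `webhdfs_url` are assumptions for illustration. A real call would also need an OAuth bearer token, which is not shown.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str) -> str:
    """Build a WebHDFS-style REST URL for a Data Lake Store account.

    Hypothetical helper: the host pattern below is an assumption about
    how the store exposes its WebHDFS-compatible endpoint.
    """
    host = f"{account}.azuredatalakestore.net"
    # WebHDFS places the file-system path under the /webhdfs/v1 prefix.
    encoded_path = quote(path.strip("/"))
    # The operation (for example LISTSTATUS or OPEN) goes in the query string.
    return f"https://{host}/webhdfs/v1/{encoded_path}?{urlencode({'op': op})}"


if __name__ == "__main__":
    # List the contents of a folder (account and folder names are made up).
    print(webhdfs_url("mydatalake", "/logs/2018", "LISTSTATUS"))
```

Because the URL follows the WebHDFS layout, the same path and `op` values that work against HDFS should carry over, which is the point of the compatibility claim.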
Azure Data Lake Stores
Use Cases
Store social media posts, log files, sensor data
Store corporate data such as relational databases (as flat files)
Data Lake Storage vs Azure Storage
Purpose
Data Lake Storage: Optimized storage for big data analytics workloads
Azure Storage: General purpose object store for a wide variety of storage scenarios
Use cases
Data Lake Storage: Batch, interactive, and streaming analytics, log files, etc.
Azure Storage: Any type of text or binary data, such as application back end
Structure
Data Lake Storage: Account contains folders, which in turn contain data stored as files
Azure Storage: Storage account has containers
Analytics performance
Data Lake Storage: Optimized performance for parallel analytics workloads. High throughput and IOPS.
Azure Storage: Not optimized for analytics workloads
Big Data requirements
Pricing
Transaction prices
Storage prices
DEMO
Azure Data Lake Analytics
Azure Data Lake Analytics is an on-demand analytics job service to simplify big data analytics. You can focus on writing, running, and managing jobs rather than on operating distributed infrastructure.
Dynamic scaling
Develop faster, debug, and optimize smarter using familiar tools
Affordable and cost effective
Works with all your Azure Data
U-SQL: simple and familiar, powerful, and extensible
U-SQL
T-SQL + C#
U-SQL
Concepts
Retrieve data from stored locations in rowset format
Transform the rowset(s)
Store the transformed rowset(s)
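The rowset flow can be sketched outside U-SQL as well. The Python analogue below is not U-SQL and does not come from the talk. It only mirrors the pattern of retrieving a rowset, transforming it, and writing the result out. The column names `region` and `duration` and the sample data are invented for illustration.

```python
import csv
import io


def retrieve(text):
    """Retrieve: parse delimited text into a rowset (a list of dicts)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: total the duration per region, sorted by region."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + int(row["duration"])
    return [{"region": r, "total_duration": t} for r, t in sorted(totals.items())]


def store(rows):
    """Store: serialise the transformed rowset back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["region", "total_duration"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample input standing in for a file in the store.
SAMPLE = "region,duration\nen-us,10\nen-gb,5\nen-us,7\n"

if __name__ == "__main__":
    print(store(transform(retrieve(SAMPLE))))
```

In a U-SQL script these three steps would correspond roughly to an extract statement, one or more query statements, and an output statement.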
U-SQL Script Structure
Script := Statement_List.
Statement_List := { [Statement] ';' }.
Statement := Use_Statement
| If_Else_Statement
| Declare_Variable_Statement
| Reference_Assembly_Statement
| Deploy_Resource_Statement
| DDL_Statement
| Query_Statement
| Procedure_Call
| Import_Package_Statement
| DML_Statement
| Output_Statement.
Extractors
U-SQL built-in extractors:
Extractors.Text()
Extractors.Csv()
Extractors.Tsv()
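The three built-in extractors differ mainly in the column delimiter they expect. The Python analogue below illustrates that idea only; it is not the U-SQL API. It assumes that Csv means comma-separated, Tsv means tab-separated, and Text takes a configurable delimiter that defaults to a comma. The function name `extract` is hypothetical.

```python
import csv
import io

# Assumed fixed delimiters for the two specialised extractors.
DELIMITERS = {"Csv": ",", "Tsv": "\t"}


def extract(text, kind="Text", delimiter=","):
    """Split delimited text into rows of string fields.

    kind="Csv" or kind="Tsv" uses a fixed delimiter; kind="Text"
    uses whatever delimiter the caller passes (comma by default).
    """
    sep = DELIMITERS.get(kind, delimiter)
    return [row for row in csv.reader(io.StringIO(text), delimiter=sep)]


if __name__ == "__main__":
    print(extract("a,b\n1,2\n", "Csv"))
    print(extract("a\tb\n1\t2\n", "Tsv"))
    print(extract("a|b\n1|2\n", "Text", delimiter="|"))
```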
U-SQL Jobs
ADLAU = Azure Data Lake Analytics Unit
Parallelism N = N ADLAUs
1 ADLAU ~= a VM with 2 cores and 6 GB of memory
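The ADLAU sizing rule lends itself to a quick back-of-the-envelope calculation. The sketch below takes the slide's approximation (1 ADLAU is roughly 2 cores and 6 GB of memory) at face value. The function names are hypothetical, and the AU-hours helper assumes usage is measured as ADLAUs multiplied by job duration; no prices are included because the source gives none.

```python
# Approximate per-unit resources, taken from the slide.
CORES_PER_ADLAU = 2
MEMORY_GB_PER_ADLAU = 6


def job_resources(parallelism: int) -> dict:
    """Approximate compute behind a job run with the given parallelism."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return {
        "adlaus": parallelism,  # parallelism N means N ADLAUs
        "cores": parallelism * CORES_PER_ADLAU,
        "memory_gb": parallelism * MEMORY_GB_PER_ADLAU,
    }


def au_hours(parallelism: int, minutes: float) -> float:
    """Assumed usage measure: ADLAUs multiplied by job duration in hours."""
    return parallelism * minutes / 60


if __name__ == "__main__":
    print(job_resources(10))
    print(au_hours(10, 30))
```

For example, a job submitted with parallelism 10 corresponds to roughly 20 cores and 60 GB of memory, whether or not the job can actually keep all ten units busy.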
U-SQL Catalog
Database
Table
Views
Procedures
DEMO
Monitoring
Azure Portal
Visual Studio
DEMO
Pricing
Links
http://www.sqlservercentral.com/stairway/142480/
https://azure.microsoft.com/en-us/solutions/data-lake/
Questions?
Thank you