Azure Data Lake Customer Deck (azurebootcampdk.com/presentations/DataLake-Organize-v2.pdf)
Azure Data Lake: How to organize
Jan Cordtz, Microsoft Denmark
Cloud Solution Architect
Azure Search
Hybrid Cloud
Backup
StorSimple
Azure Site Recovery
Import/Export
Azure AD Health Monitoring
AD Privileged Identity Management
Operational Analytics
Domain Services
SQL Database
DocumentDB
Redis Cache
Storage Tables
SQL Data Warehouse
SQL Server Stretch Database
Visual Studio
Application Insights
VS Team Services
Xamarin
HockeyApp
Mobile Engagement
Cognitive Services
Bot Framework
Cortana
Security & Management
Azure Active Directory
Multi-Factor Authentication
Automation
Portal
Key Vault
Store/Marketplace
VM Image Gallery & VM Depot
Azure AD B2C
Scheduler
Security Center
Web Apps
Mobile Apps
API Apps
Notification Hubs
Cloud Services
Service Fabric
Functions
Batch
RemoteApp
Container Service
VM Scale Sets
BizTalk Services
Service Bus
Logic Apps
API Management
Content Delivery Network
Media Services
Media Analytics
HDInsight/Databricks
Machine Learning
Stream Analytics
Data Factory
Event Hubs
Data Lake Analytics Service
IoT Hub
Data Catalog
Power BI Embedded
Data Lake Store
Data Center
Infrastructure as a Service
Platform as a Service
Trusted
Global: ISO 27001, ISO 27018, ISO 27017, ISO 22301, SOC 1 Type 2, SOC 2 Type 2, SOC 3, CSA STAR Self-Assessment, CSA STAR Certification, CSA STAR Attestation
Industry: HIPAA / HITECH Act, FISC Japan, CDSA, Shared Assessments, FACT UK, PCI DSS Level 1, MPAA, FERPA, GxP 21 CFR Part 11, GLBA, MARS-E, FFIEC, HITRUST, IG Toolkit UK
Regional: ENISA IAF, Japan CS Mark Gold, Japan My Number Act, Spain ENS, Canada Privacy Laws, Privacy Shield, India MeitY, Germany IT Grundschutz workbook, Spain DPA, Singapore MTCS, UK G-Cloud, Australia IRAP/CCSL, New Zealand GCIO, China GB 18030, EU Model Clauses, Argentina PDPA, China TRUCS, China DJCP
More certifications than any cloud provider
Data: Has always been there, but is growing "rapidly"
Analytics: Has been there for a long time (BI), but is getting much more advanced: Machine Learning/AI
Cloud: The "new" kid on the block
• Unlimited compute/storage
• Fast deployment
• Pay-as-you-go
• Many services
Open and hybrid
Business needs
Mode 1
- Data warehouse
- Reporting
Self-service
- Dashboard
- Business Intelligence
Mode 2
- IoT
- Machine Learning
- Analytics
- Governance
- Organize
- Common understanding of data
- Trial: Error/Proceed
- Hot/Cold path
- No specific technology
- Flexible economy
- Hybrid
Central platform
”Data Lake” / ”Data Bank”
Built on Open Standards
Analytics Service: built on YARN
Store: lets all HDFS-compliant analytic applications, such as Hortonworks, Cloudera, and MapR, connect to it
HDInsight: Microsoft HDInsight is 100% Apache Hadoop
Microsoft continues to contribute tens of thousands of lines of code and engineering hours to open source
[Diagram labels: U-SQL, YARN, HDFS; Analytics Service, HDInsight, Store]
A databank
SQL
Cube
Archive storage
Data Ingestion
Operational System A
Operational System B
Operational System X
DW
DataMart
Machine Learning
Data Ingestion
External Data
Cosmos DB
Databricks
Dynamic Sizeable: PaaS Economics
Fixed Sizeable: IaaS Economics
Whatever Apps
Data Lake
Data storage
Data Lake
Blob
DB
REST
Files
Hortonworks*
Cloudera*
MapR*
HDInsight
SQL NoSQL
Microsoft SQL (General)
MySQL (LAMP/PHP)
PostgreSQL (GIS)
Graph (TinkerPop)
Documents (MongoDB)
Column-Value (Cassandra)
Key-Value (Table)
Databricks
Machine Learning Studio
Data Science Virtual Machine
Azure Machine Learning services
SQL DB
SQL DW
Cube
Storage – from a functionality point of view
[Diagram: File storage, Database, Cube, Data Lake; axis: Functionality and cost; ETL between stages with Data Factory]
Principles for the organization
• Is very simple to use for an end user/application (= flat file/CSV file)
• Is as cost-effective as sensible/possible
• Does not compromise security
• Fits well into a DevOps scenario
• Provides "automatic" meta-tagging
• Has a well-defined path for the information needed to support an effective auditing and logging process
Copy
Organizing the Azure Data Lake
Azure Data Lake
Data Ingestion
Landing Zone
- Landing Zone System Account(s): read/write
- Work System Account(s): read
Work
- Work System Account(s): read/write
Publish A, Publish B, … Publish X (folder per "area")
- Users in Groups: read only
- Work System Account(s): read/write
Analytics ("All data")
- Users in Groups: read only
- Work System Account(s): read/write
Steps between zones: Transform, Transform & Anonymize
Archive
Data Catalog
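The access model on this slide can be sketched as a small lookup table. This is only an illustration, not how Azure Data Lake stores or enforces ACLs: the zone keys, the principal names (`landing_system`, `work_system`, `user_group`) and the helper `is_allowed` are all hypothetical names introduced here.

```python
# Hypothetical permission table transcribed from the slide; not an Azure API.
READ, WRITE = "read", "write"

ZONE_ACL = {
    # Landing Zone: its own system account writes, the Work account only reads.
    "landing": {"landing_system": {READ, WRITE}, "work_system": {READ}},
    # Work: only the Work system account has access.
    "work": {"work_system": {READ, WRITE}},
    # Publish areas and Analytics: users read, the Work account reads and writes.
    "publish": {"work_system": {READ, WRITE}, "user_group": {READ}},
    "analytics": {"work_system": {READ, WRITE}, "user_group": {READ}},
}


def is_allowed(principal: str, zone: str, operation: str) -> bool:
    """Return True if the principal kind may perform the operation in the zone."""
    return operation in ZONE_ACL.get(zone, {}).get(principal, set())
```

With this table, `is_allowed("user_group", "publish", "write")` is false, which matches the "read only, except Work System Account(s)" rule on the slide.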
Data Ingestion
"Gatekeeper"
Sources: Database, FTP, File Storage…….
Tools: SSIS, Event Hub, Data Factory…….
Access control: Firewall, AD control…….
- "Are you allowed to enter?"
Validation
- "Is the content you are bringing in accordance with what we have agreed?"
Standardization
- Items like: date formats (yyyymmdd), number formats (",." or ".,")
Push/Pull
Hot/Cold Path
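The standardization step of the "gatekeeper" could look roughly like the sketch below, which rewrites dates to yyyymmdd and parses numbers written with either ",." or ".," separators. The list of accepted input date layouts and the function names are assumptions made for this example; a real gatekeeper would use whatever formats have been agreed with each source system.

```python
from datetime import datetime

# Assumed set of incoming date layouts; adjust to what the sources actually send.
INPUT_DATE_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y")


def standardize_date(text: str) -> str:
    """Return the date as yyyymmdd, or raise ValueError if no layout matches."""
    for fmt in INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y%m%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def standardize_number(text: str, decimal_sep: str) -> float:
    """Parse a number whose decimal separator is ',' or '.'.

    The other character is treated as the thousands separator and dropped.
    """
    if decimal_sep not in (",", "."):
        raise ValueError("decimal_sep must be ',' or '.'")
    thousands_sep = "." if decimal_sep == "," else ","
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)
```

The decimal separator is passed in explicitly rather than guessed, because a value such as "1,234" is ambiguous without knowing the convention of the source system.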
Examples
Copy
Azure Data Lake and DevOps
Azure Data Lake
Data Ingestion
Landing Zone
Work
Publish A, Publish B, … Publish X (folder per "area")
- Users in Groups: read only
- Work System Account(s): read/write
Analytics ("All data")
- Users in Groups: read only
- Work System Account(s): read/write
Steps between zones: Transform, Transform, Anonymize
Data Catalog
Thank you