XebiCon'16: Data Lake Done Right! By Matthieu Blanc, Data Architect at Xebia
![Page 1: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/1.jpg)
@xebiconfr #xebiconfr
Data Lake done right!
Matthieu Blanc
![Page 2: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/2.jpg)
Data Lake?
![Page 3: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/3.jpg)
WHY?
![Page 4: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/4.jpg)
Centralisation
![Page 5: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/5.jpg)
Self Service
![Page 6: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/6.jpg)
Data lakes will only succeed if they become shared resources.
![Page 7: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/7.jpg)
Challenge Ahead
![Page 8: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/8.jpg)
Complex Ecosystem
![Page 9: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/9.jpg)
Skill Gap
![Page 10: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/10.jpg)
A Data Lake needs to be managed
![Page 11: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/11.jpg)
HISTORY
![Page 12: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/12.jpg)
Siloed Data
![Page 13: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/13.jpg)
Distributed File System
![Page 14: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/14.jpg)
Technologies
Hadoop
AWS S3
Google Cloud Storage
![Page 15: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/15.jpg)
![Page 16: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/16.jpg)
Data Warehouse
![Page 17: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/17.jpg)
Data Lake
![Page 18: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/18.jpg)
Data Mart
![Page 19: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/19.jpg)
Data Lake
![Page 20: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/20.jpg)
Data Swamp?
![Page 21: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/21.jpg)
HOW?
![Page 22: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/22.jpg)
Data Landfill
Data Scientists in front of raw data in a “Data Lake”
![Page 23: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/23.jpg)
Data Catalog
![Page 24: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/24.jpg)
Metadata Repository
Datasets
Search
V1 V2
API
Web UI
...
Metadata
Catalog
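The slide only names the building blocks (datasets, versions, search, an API and a Web UI on top of a metadata catalog). As a rough illustration of how those pieces could fit together, here is a minimal in-memory sketch. The class names `DatasetEntry` and `MetadataCatalog`, the fields, and the sample paths are all assumptions made for this example, not something shown in the talk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetEntry:
    """Metadata describing one version of a dataset (all fields are illustrative)."""
    name: str
    version: int
    location: str
    tags: frozenset = field(default_factory=frozenset)


class MetadataCatalog:
    """Hypothetical in-memory catalog: register dataset versions, then search them."""

    def __init__(self):
        self._entries = {}  # (name, version) -> DatasetEntry

    def register(self, name, location, tags=()):
        """Store a new version of `name` and return its entry."""
        version = 1 + max((v for (n, v) in self._entries if n == name), default=0)
        entry = DatasetEntry(name, version, location, frozenset(tags))
        self._entries[(name, version)] = entry
        return entry

    def latest(self, name):
        """Return the highest registered version of `name`."""
        versions = [e for (n, _), e in self._entries.items() if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions, key=lambda e: e.version)

    def search(self, keyword):
        """Return entries whose name or tags contain `keyword` (case-insensitive)."""
        kw = keyword.lower()
        hits = [
            e for e in self._entries.values()
            if kw in e.name.lower() or any(kw in t.lower() for t in e.tags)
        ]
        return sorted(hits, key=lambda e: (e.name, e.version))


# Sample registrations (paths and tags are made up for the example).
catalog = MetadataCatalog()
catalog.register("customers", "/landing/crm/customers.csv", tags=["crm", "raw"])
catalog.register("customers", "/master/crm/customers.parquet", tags=["crm", "clean"])
catalog.register("weblogs", "/landing/web/access.log", tags=["web", "raw"])
```

An API or Web UI would sit on top of `search` and `latest`; a real catalog would of course persist its entries rather than keep them in a dictionary.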
![Page 25: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/25.jpg)
SQL
Clean, trusted, prepared Data
Raw data
![Page 26: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/26.jpg)
![Page 27: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/27.jpg)
Automate Data Provisioning
Raw data: CSV, JSON, XML, Logs ...
Master Data: Parquet, Avro, ORC
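One step that an automated provisioning job typically needs before it can write a typed format such as Parquet, Avro or ORC is parsing loosely typed raw records against a declared schema. The sketch below shows only that step, using the standard library. The schema, the column names and the `provision` function are hypothetical, and writing the actual columnar file is deliberately left out.

```python
import csv
import io
from datetime import date

# Hypothetical schema: column name -> parser. Not taken from the talk.
SCHEMA = {
    "customer_id": int,
    "signup_date": date.fromisoformat,
    "country": str,
}


def provision(raw_csv, schema):
    """Parse raw CSV text into typed rows, separating rows that fail to parse."""
    accepted, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            accepted.append({col: parse(row[col]) for col, parse in schema.items()})
        except (KeyError, ValueError, TypeError):
            # Keep the untouched raw row so it can be inspected or replayed later.
            rejected.append(row)
    return accepted, rejected


RAW = "customer_id,signup_date,country\n1,2016-11-10,FR\nabc,2016-11-11,BE\n"
rows, errors = provision(RAW, SCHEMA)
```

Here `rows` holds the records that matched the schema and `errors` the raw rows that did not, so a bad input line does not block the rest of the batch.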
![Page 28: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/28.jpg)
Data Organization
![Page 29: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/29.jpg)
Data Zones
Landing Zone
Master Data Zone
Work Area
Consumption Zone
Data Sources
Data Sinks
Change Data Capture Zone
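The zones listed on this slide can be modeled as an explicit set of values plus rules for how data moves between them. The sketch below is one possible reading: the zone names come from the slide, but the allowed transitions in `ALLOWED_MOVES` and the `zone/source/dataset` path layout are assumptions for illustration, since the transcript does not preserve the arrows of the diagram.

```python
from enum import Enum


class Zone(Enum):
    LANDING = "landing"
    CDC = "change_data_capture"
    MASTER = "master_data"
    WORK = "work_area"
    CONSUMPTION = "consumption"


# Assumed flow between zones; the slide names the zones but not these rules.
ALLOWED_MOVES = {
    Zone.LANDING: {Zone.CDC, Zone.MASTER},
    Zone.CDC: {Zone.MASTER},
    Zone.MASTER: {Zone.WORK, Zone.CONSUMPTION},
    Zone.WORK: {Zone.CONSUMPTION},
    Zone.CONSUMPTION: set(),
}


def zone_path(zone, source, dataset):
    """Build a storage path for a dataset (hypothetical layout: zone/source/dataset)."""
    return f"/{zone.value}/{source}/{dataset}"


def promote(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED_MOVES[current]:
        raise ValueError(f"cannot move data from {current.name} to {target.name}")
    return target
```

Encoding the transitions as data makes it easy to reject, for example, a job that tries to write consumption output back into the landing zone.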
![Page 30: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/30.jpg)
Quality Control
![Page 31: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/31.jpg)
Data Governance
Raw data: CSV, JSON, XML, Logs ...
Master Data: Parquet, Avro, ORC
Validation criteria
Web UI
Operational/statistical metadata
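To make "validation criteria" and "statistical metadata" concrete, here is a small sketch of both ideas: named predicates applied to each record, and a per-column profile (row count, null count, min, max). The functions `validate` and `profile`, the criteria names and the sample `orders` data are all hypothetical; the talk does not specify which checks or statistics it uses.

```python
def profile(rows, column):
    """Compute simple statistical metadata for one column."""
    values = [r.get(column) for r in rows]
    present = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


def validate(rows, criteria):
    """Split rows into (valid, invalid) using named predicate criteria."""
    valid, invalid = [], []
    for row in rows:
        failed = [name for name, check in criteria.items() if not check(row)]
        if failed:
            invalid.append((row, failed))  # keep the names of the failed checks
        else:
            valid.append(row)
    return valid, invalid


# Hypothetical validation criteria for an illustrative "orders" dataset.
CRITERIA = {
    "amount_present": lambda r: r.get("amount") is not None,
    "amount_non_negative": lambda r: r.get("amount") is None or r["amount"] >= 0,
}

orders = [
    {"order_id": 1, "amount": 25.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": -4.0},
]
valid, invalid = validate(orders, CRITERIA)
stats = profile(orders, "amount")
```

The `stats` dictionary is the kind of record that could be stored alongside the dataset in the catalog and surfaced in a Web UI.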
![Page 32: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/32.jpg)
Security?
![Page 33: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/33.jpg)
Data Zones
Landing Zone
Clean Data Zone
Work Area
Consumption Zone
Data Sources
Data Sinks
![Page 34: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/34.jpg)
Enforce security rules during data transformation
Landing Zone
Clean Data Zone
Sensitive Data
Work Area
Consumption Zone
Data Sources
Data Sinks
![Page 35: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/35.jpg)
Enforce security rules during data transformation
Raw data: CSV, JSON, XML, Logs ...
Master Data: Parquet, Avro, ORC
Data privacy metadata
Web UI
sensitive data encrypted
some data marked as sensitive
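A transformation step driven by privacy metadata might look like the sketch below: columns marked as sensitive are protected while the record moves from raw to master data, and other columns pass through unchanged. Note the substitution: the slide speaks of encryption, whereas this example uses a keyed hash (HMAC-SHA256), which is one-way pseudonymization rather than reversible encryption, because the Python standard library has no symmetric cipher. The metadata layout, the `protect` function and the key are all assumptions.

```python
import hashlib
import hmac

# Hypothetical data privacy metadata: which columns are marked as sensitive.
PRIVACY_METADATA = {"email": "sensitive", "country": "public"}


def protect(row, privacy_metadata, key):
    """Return a copy of `row` with sensitive columns replaced by a keyed hash."""
    out = {}
    for column, value in row.items():
        if privacy_metadata.get(column) == "sensitive":
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            out[column] = digest.hexdigest()
        else:
            out[column] = value
    return out


# Placeholder key for the example only; a real key would come from a secret store.
SECRET_KEY = b"demo-key-do-not-use"
raw = {"email": "alice@example.com", "country": "FR"}
clean = protect(raw, PRIVACY_METADATA, SECRET_KEY)
```

Because the hash is keyed and deterministic, the same input always maps to the same token, so joins on the protected column still work without exposing the original value.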
![Page 36: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/36.jpg)
Avoid Data Swamp
Catalog your data
Automate Data Provisioning
Create Governance Zones
Provide Data Discovery Tools
![Page 37: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/37.jpg)
Data Democratization
![Page 38: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/38.jpg)
Most Data Lake initiatives will fail
![Page 39: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/39.jpg)
XData
![Page 40: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/40.jpg)
Thank you!
Matthieu Blanc
![Page 41: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/41.jpg)
Mainframe RDBMS NoSQL DBs Logs DWH Queues
HR Financial CRM Web BI Social Media
![Page 42: XebiCon'16 : Data Lake Done Right ! Par Matthieu Blanc, Data Architect chez Xebia](https://reader031.fdocuments.in/reader031/viewer/2022030305/587287931a28abc7068b7709/html5/thumbnails/42.jpg)