Post on 02-Jun-2018
8/11/2019 Data Lake for Hadoop
2010 Cisco and/or its affiliates. All rights reserved.
Cisco Data Lake
March 3, 2014
Data Lake Defin
Current Hadoop
Why to Build Da
Benefits
Data Lake Desi
Data Lake - a place to store practically unlimited amounts of data of any form
type that is relatively inexpensive and massively scalable. Data processing so
Hadoop can transform the data from its raw state to a finished product.
--Revelytix
If you think of a datamart as a store of bottled water, cleansed and package
for easy consumption, the data lake is a large body of water in a more natur
contents of the data lake stream in from a source to fill the lake, and various u
can come to examine, dive in, or take samples.
--Pentaho
The difference between a data lake and a data warehouse is that in a data wa
data is pre-categorized at the point of entry, which can dictate how it's going t
--Forbes
Current Hadoop Landscape
[Diagram: Data Sources feeding the Hadoop Platform]
Data Sources: Databases (ERP, SFDC, Database N); Unstructured Data (Docs, Cases, Content); IoE, Machine Data, Clickstream
Hadoop Platform project areas: CPAI, Collab, CSTG, Marketing, Data Science Program
Data sets shown across the project areas: IB, Contracts, Cases, Customer, Bookings, Hierarchies, Network Logs, Cisco.com logs, etc
Every project team spends resources in bringing its data
Difficult to track the availability of data elements in the platform
Redundant platform resource utilization for data acquisition & mai
Data quality and reliability issues
Project teams develop their data acquisition flows manually
Data Lake
[Diagram: Data Sources feeding a Data Lake on the Hadoop Platform]
Data Sources: Databases (ERP, SFDC, Database N); Unstructured Data (Docs, Cases, Content); IoE, Machine Data, Clickstream
Data Lake (EDS): IB, Contracts, Cases, Hierarchies, Bookings, Customers, Supply Chain, etc; Network Logs, Cisco.com logs, Documents, etc
Project areas shown alongside the Data Lake: CPAI, Marketing, Data Science, CSTG
Data reuse: data is brought in once and consumed by multiple projects
Data stored in raw format: can be used by a variety of apps and to
Automated framework: can be quickly configured to get data from
Better resource utilization: frees resources in source systems an
platform
Quick project deliveries
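
The "bring data once, consume many times" and "automated framework" points above can be illustrated with a minimal Python sketch. It is not taken from the slides: `SourceConfig`, `DataLake`, the feed name `erp.bookings`, and the sample records are all hypothetical, and the lake is an in-memory stand-in rather than Hadoop storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class SourceConfig:
    """Hypothetical description of one source feed (not from the slides)."""
    name: str                              # illustrative feed name
    extract: Callable[[], Iterable[str]]   # returns raw records, untouched


@dataclass
class DataLake:
    """In-memory stand-in for the raw layer plus a simple metadata catalog."""
    raw: Dict[str, List[str]] = field(default_factory=dict)
    catalog: Dict[str, int] = field(default_factory=dict)  # feed name -> record count

    def load(self, cfg: SourceConfig) -> bool:
        """Land a feed once; return False if it is already in the lake."""
        if cfg.name in self.raw:
            return False
        records = list(cfg.extract())      # stored as-is, no transformation
        self.raw[cfg.name] = records
        self.catalog[cfg.name] = len(records)
        return True

    def read(self, name: str) -> List[str]:
        """Projects read the shared raw copy instead of re-extracting."""
        return self.raw[name]


calls = {"n": 0}  # counts how often the source system is actually queried


def extract_bookings() -> List[str]:
    calls["n"] += 1
    return ["b1|100", "b2|250"]            # made-up sample records


lake = DataLake()
bookings = SourceConfig(name="erp.bookings", extract=extract_bookings)

first = lake.load(bookings)    # first project triggers the load
second = lake.load(bookings)   # second project finds the feed already landed
```

In this sketch the source system is queried only on the first load; later projects call `lake.read("erp.bookings")`, and the catalog records which feeds are available.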
High Level Data Lake Architecture
[Diagram]
Data Sources: Databases (ERP, SFDC, Database N); Unstructured Data (Docs, Cases, Content); IoE, Machine Data, Clickstream
Hadoop Platform, Data Lake (EDS): IB, Contracts, Cases, Hierarchies, Bookings, Customers, Supply Chain, etc; Network Logs, Cisco.com logs, Documents, etc
Other components shown: Tidal, Data Lake Load Process, Hadoop Edge Node, Data lake Metadata (TD)
Data Lake Population and Consumption
[Diagram]
Sources: Structured Sources; Unstructured Sources (Docs, Cases, Content; IoE, Machine data, Clickstream); Any Source
Layers in HADOOP: Data Lake (Source Like Structure); Transformed Layer
Other labels: ETL Offload (3NF Model); F1, F2, F3, F4, F5, F6; CG1; TD; vertical labels "T L" and "S O R"
Callout text, cut off in the source: "What data", "Dat", "Str", "ProMo", "What are", "Lake?", "Any", "uns", "Do we bu", "layer?", "Yes", "SSOT to be", "consumed", "Functional A", "Data Lake,", "other Funct", "EDS Gover", "Transforme"
Thank you.