Strata San Jose 2017 - Ben Sharma Presentation

23
Creating a modern data architecture March 15th, 2017 Ben Sharma | CEO [email protected]

Transcript of Strata San Jose 2017 - Ben Sharma Presentation

Page 1: Strata San Jose 2017 - Ben Sharma Presentation

Creating a modern data architecture March 15th, 2017

Ben Sharma | CEO

[email protected]

Page 2: Strata San Jose 2017 - Ben Sharma Presentation

•  Award-winning provider of enterprise data lake management solutions:

Data lake management and governance platform

Self-service data platform

•  Data Lake Design and Implementation: POC, Pilot, Production, Operations, Training

•  Data Science Professional Services

Page 3: Strata San Jose 2017 - Ben Sharma Presentation

3 Zaloni Proprietary

Increased Agility

New Insights

Improved Scalability

Data lakes are central to the modern data architecture

Page 4: Strata San Jose 2017 - Ben Sharma Presentation

4 Zaloni Proprietary

•  Store all types of data in its raw format

•  Create Refined, Standardized, Trusted datasets for various use cases

•  Store data for longer periods of time to enable historical analysis

•  Query and Access the data using a variety of methods

•  Manage streaming and batch data in a converged platform

•  Provide shorter time-to-insight with proper data management and governance

The data lake promise

Page 5: Strata San Jose 2017 - Ben Sharma Presentation

5 Zaloni Proprietary

Data architecture modernization Tr

aditi

onal

M

oder

n

Data Lake

Sources ETL EDW

Derived (Transformed)

Discovery Sandbox EDW

Streaming

Unstructured Data

Various Sources

Data Discovery Analytics BI

Data Science Data Discovery

Analytics BI

Page 6: Strata San Jose 2017 - Ben Sharma Presentation

Zaloni Confidential and Proprietary - Provided under NDA

6 Zaloni Proprietary

0% of market

Optimize

Self-Organizing Data Lake

•  Self-improving data lake via machine learning algorithms

•  True democratization of big data and analytics

•  Intelligent data remediation and curation

•  Recommended Data Security, and Governance policies

•  Lights out business operations optimized for business success

2% of market

Automate

Responsive Data Lake

•  Self-Service Ingestion & Provisioning

•  360 View of Customer, Product, etc

•  Enterprise Data Discovery

•  Operationalize analytical models into business fabric

•  Enables immediate data impact on business operations

Manage

10% of market

Managed Data Lake

•  Acquire useful data from across the enterprise

•  Improved visibility and understanding via managed Ingestion of data and metadata

•  Ensure security and privacy of sensitive data

•  Operationalize data at scale

•  Leverage enterprise governance & security policies

•  Scalable production data lake for new and improved business insights

22% of market

Store

Data Swamp

•  Hadoop on premises or in the Cloud

•  Limited visibility and usability of data

•  Limited corporate oversight & governance

•  Sandbox or Dev Environments

•  Ad hoc and incremental growth of big data applications

•  Ad-hoc and exploratory insights for individual use cases

Zaloni Big Data Maturity Model

Stage:

Characteristics:

Descriptor:

Stage Today:

Business Impact:

Ignore

66% of market

•  Emphasis on structured data

•  Limited ability to leverage data at scale

•  Business emphasis on retrospective reporting and analysis

•  Strong governance and security policies

•  Slow to accommodate business changes

Data Warehouse

Value Realized

Page 7: Strata San Jose 2017 - Ben Sharma Presentation

7 Zaloni Proprietary

Data Lake Reference Architecture

•  Data required for LOB specific views - transformed from existing certified data

•  Consumers are anyone with appropriate role-based access

•  Standardized on corporate governance/ quality policies•  Consumers are anyone with appropriate role-based access•  Single version of truth

Transient Landing Zone Raw Zone

Refined Zone

Trusted Zone

Sandbox

Data Lake

•  Temporary store of source data

•  Consumers are IT, Data Stewards

•  Implemented in highly regulation industries

•  Original source data ready for consumption

•  Consumers are ETL developers, data stewards, some data scientists

•  Single source of truth with history

•  Data required for LOB specific views - transformed from existing certified data

•  Consumers are anyone with appropriate role-based access

Sensors (or other time series data)

Relational Data Stores (OLTP/ODS/

DW)

Logs(or other unstructured

data)

Social and shared data

Page 8: Strata San Jose 2017 - Ben Sharma Presentation

8 Zaloni Proprietary

•  Leverage the full power of a scale-out architecture with an actionable, scalable data lake

Data Lake 360 ° : Zaloni’s integrated platform for data lakes

1. Enable the lake

2. Govern the data

•  Improve data visibility, reliability and quality to reduce time-to-insight

•  Safeguard sensitive data and enable regulatory compliance

•  Foster a data-driven business through self-service data discovery and preparation

3. Engage the business

Page 9: Strata San Jose 2017 - Ben Sharma Presentation

9 Zaloni Proprietary

1.  Based on a foundation of metadata management

2.  Lightweight and distributed

3.  Hybrid – top down and bottom up approach

Data Governance

Centrally governed, critical data elements

Regionally governed, departmental data sets

Locally governed, data used in specific applications

Gartner Data and Analytics 2017

Page 10: Strata San Jose 2017 - Ben Sharma Presentation

10 Zaloni Proprietary

1.  Central to a well-managed data lake – provides visibility, reliability and enables data governance

2.  Capture and manage operational, technical and business metadata

3.  Reduced time to insight for analytics

Types of metadata: §  Where it resides, and how was it ingested §  What it means, and how it should be interpreted §  What governance policies apply to it §  What it's worth, and how its value can be expressed §  Who it's accessed and consumed by §  Which business processes downstream consume it

Metadata Management

Page 11: Strata San Jose 2017 - Ben Sharma Presentation

11 Zaloni Proprietary

Metadata Exchange Framework

1.  Metadata sharing is critical for an integrated approach

2.  Federated approach for metadata collection

3.  Two way metadata exchange between the Data Lake and other Enterprise repositories

Metadata Exchange Framework

Data Lake

Enterprise Metadata repository

Two way exchange

Page 12: Strata San Jose 2017 - Ben Sharma Presentation

12 Zaloni Proprietary

1.  Ability to ingest vast amounts of data 2.  Ability to handle a wide variety of formats (streaming, files, custom)

and sources 3.  Build in repeatability

via automation to pick up incoming data and apply pre-defined processing

Managed Ingestion

Page 13: Strata San Jose 2017 - Ben Sharma Presentation

13 Zaloni Proprietary

1.  See how data moves and how it is consumed in the data lake 2.  Safeguard data and reduce risk, always knowing where data has come from,

where it is, and how it is being used

Data Lineage

Page 14: Strata San Jose 2017 - Ben Sharma Presentation

14 Zaloni Proprietary

1.  Rules based data validation

2.  Integration with the managed data pipeline

3.  Stats and metrics for reporting and actions

4.  Automation, Remediation, Notifications

Future:

•  ML based classification

Data Quality

Page 15: Strata San Jose 2017 - Ben Sharma Presentation

15 Zaloni Proprietary

1.  Secure infrastructure for data in motion and data at rest 2.  Role based access control for Metadata and the data

3.  Mask or tokenize data before published in the lake for consumption 4.  Audit, access logs, alerts and notifications

Data Security and Privacy

Page 16: Strata San Jose 2017 - Ben Sharma Presentation

16 Zaloni Proprietary

1.  Hot -> Warm -> Cold on an entity level based on policies/SLAs

2.  Provide data management features to automate scheduling and orchestration of data movement between heterogeneous storage environments

3.  Across on-premise and cloud environments

Data Lifecycle Management

Page 17: Strata San Jose 2017 - Ben Sharma Presentation

17 Zaloni Proprietary

Data Catalog

1.  See what data is available across your enterprise 2.  Contribute valuable business information to improve search and usage

3.  Use a shopping cart experience to create sandbox for ad-hoc and exploratory analytics

Page 18: Strata San Jose 2017 - Ben Sharma Presentation

18 Zaloni Proprietary

Self-service Data Preparation

1.  Blend data in the lake without a costly IT project 2.  Perform interactive data-driven transformations

Page 19: Strata San Jose 2017 - Ben Sharma Presentation

19 Zaloni Proprietary

•  How do you create a cloud agnostic data lake platform?

•  How deploy a cost-effective compute layer? §  Elastic compute layer §  Batch and near real-time

•  How do you optimize storage? §  Support polyglot persistence §  DLM

•  How do you optimize network connectivity between Ground to Cloud?

•  How do you meet enterprise security requirements?

Considerations for data lake in the cloud

CLOUD and HYBRID ENVIRONMENTS

Page 20: Strata San Jose 2017 - Ben Sharma Presentation

20 Zaloni Proprietary

Building your blueprint

1. Questions 2. Inputs 3. Outcomes

Business Drivers AND Business Questions: e.g. Where is fraud occurring? How do I optimize inventory?

Data Use Cases Platform

Subject Areas Source System

Capabilities, Process

Ingest, Organize, Enrich, Explore

Roadmap

Managed Data Lake

Analytics Strategy

= + +

Page 21: Strata San Jose 2017 - Ben Sharma Presentation

New Buyer’s Guide on Data Lake Management and Governance

•  Zaloni and Industry analyst firm, Enterprise Strategy Group, collaborated on a guide to help you:

1.  Define evaluation criteria and compare common options

2.  Set up a successful proof of concept (PoC)

3.  Develop an implementation that is future-proofed

Page 22: Strata San Jose 2017 - Ben Sharma Presentation

Free Starbucks card with demo

or survey response

Complimentary copy of new book

“Understanding Metadata”

Speaking Sessions: Data Monetization Use Case

Dirk Jungnickel, SVP of BA and Big Data, Emirates Integrated Telecommunications

Company (du), Tuesday, 11:00 a.m. – LL20 A

Visit Booth #923 for these giveaways!

Building a Modern Data ArchitectureBen Sharma, CEO and Founder, Zaloni

Wednesday, 11:50 a.m. – LL21 A

Page 23: Strata San Jose 2017 - Ben Sharma Presentation

DATA LAKE MANAGEMENT AND GOVERNANCE PLATFORM

SELF-SERVICE DATA PLATFORM