Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL...

37
Tim Garrod Emerging DataOps Principles and Technologies for Analytics MIDWEST ARCHITECTURE COMMUNITY COLLABORATION

Transcript of Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL...

Page 1: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

Tim Garrod

Emerging DataOps Principles and

Technologies for Analytics

MIDWEST ARCHITECTURE COMMUNITY COLLABORATION

Page 2: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

2

• What is DataOps?

• What is driving DataOps?

• Enabling DataOps…

• DataOps technologies and functional requirements

Agenda

Page 3: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

3

What is DataOps?

Page 4: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

4© 2019 Attunity 4© 2019 Attunity

DevOps: Accelerating time to value

DESIGN CODE TEST DEPLOY

WATERFALL

AGILE

DEVOPS

DESIGN CODE TEST DEPLOY

DESIGN DEPLOY

Page 5: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

5© 2017 Attunity 5© 2017 Attunity

DevOps – a grandfather of DataOps

Built upon the Agile development movement

Collaboration between development and operations

Continuous integration & delivery frameworks

Agile, Collaborative & Continuous

Page 6: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

6© 2019 Attunity 6© 2019 Attunity

WHAT IS DATAOPS?WE ARE IN THE EARLY DAYS

Source: Gartner - Hype Cycle for Data Management, July 2018

<1%ADOPTION

Page 7: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

7© 2019 Attunity 7© 2019 Attunity

LINKEDIN RESULTS

LinkedIn mentions

39,830Unicorns

488,972DevOps

1,082DataOps

Page 8: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

8© 2019 Attunity 8

PEOPLE TECHNOLOGYPROCESS

WHAT IS DATAOPSEmerging discipline to build and manage efficient, effective data pipelines

Applies DevOps principles of agile development and continuous integration

Seeks to improve collaboration between data managers and consumers

Continuous data delivery in the data pipeline

Source: Gartner Innovation Insight for DataOps, December 2018

DATACODE TOOLS INFRASTRUCTURE

Page 9: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

10

What is driving DataOps?

Page 10: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

11© 2019 Attunity 11

DATA ARCHITECTURE EVOLUTION

Bulk data movement

Brittle hand-coded ETL

Monolithic / appliance driven architectures (one size fits all)

Slow time to market / react to change

Legacy Data Warehouse Architecture

Real-time data movement

Automated design and code generation

Use-case driven scalable technologies

Faster Time to Market / Value

Modern Analytics Data Management

Architecture

Platform

Processes

Page 11: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

12© 2019 Attunity 12

ANALYTICS MATURITY CURVE

Descriptive

•What

happened

Diagnostic

•Why did it

happen

Predictive

•What will happen,

when and why

Prescriptive

•What should

happen

Cognitive

•AL/ML/Human-like

decision making

HISTORICAL REAL-TIME

YEARS & MONTHS WEEKS & DAYS

DATA LATENCY

TIME TO VALUE

COLLABORATION

DataOps helps enable analytics maturity

Non-functional Enablers

SILOED SPECIALISTS EMBEDDED TEAMS

*Intels Analytics Maturity Curve

Page 12: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

13© 2019 Attunity

BUILD

CRM

ERP

FINANCE

LEGACY

That’s not exactly what I

wanted

Why is this data old?

We don’t need that anymore

HOW WE USED TO DO THINGS!

We need data!! {code:”etl”, type: “manual”, speed:”batch”}

Page 13: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

14© 2019 Attunity 14© 2019 Attunity

“In every pipeline, data must be identified,

captured, formatted, tagged, validated, profiled,

cleaned, transformed, combined, aggregated,

secured, cataloged, governed, moved, queried,

visualized, analyzed, and acted upon. Phew!”

Wayne Eckerson PRESIDENT, ECKERSON GROUP

WHY DATAOPS?

Increasing analytics requirements create

complexity and data flow bottlenecks

Data consumers drive demands that IT cannot

meet with existing processes and technologies

Projects are failing due to this friction

RISING CHALLENGES

DATA VOLUME,

VARIETY,

VELOCITY

NEW

PLATFORMS

SPEED OF

CHANGE

CODING

COMPLEXITY

Page 14: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

15

Enabling DataOps

Page 15: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

16© 2019 Attunity 16© 2019 Attunity

YOU CAN’T BUY DATAOPS

You should invest in modern technology which

supports key DataOps principles and significantly

accelerates your time to insight

PEOPLE TECHNOLOGYPROCESS

DATACODE TOOLS INFRASTRUCTURE

Page 16: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

17© 2019 Attunity 17

DATA PROCESSES

Deployment

Continuous Deployment

Orchestration

Workflow & Scheduling

Development

Continuous Integration

Continuous Testing

Metrics, Monitoring, Alerting, Reporting

Agile Techniques

Code Repository

Configuration Repository

Container Management Software

DATAOPS ADOPTION FRAMEWORK

Source: Diving into DataOps, Eckerson Group, December 2018

EARLY DAYS

BUSINESS

USERS

Data Consumers

Data Explorers

Data Analysts

Data Scientists

Customers/Suppliers

Data Capture

Batch Jobs, SQL, File Transfer, Change Data Capture, Replication,

Streaming

Data Analytics

Reports, Dashboards, Models, Business Intelligence Tools,

Data Science Tools, Auto ML Platforms,

Embedded BI

Data Integration

ETL/ELT, MDM, Data Unification, Profiling, Validation, Security,

Transformation

Data Preparation

Data Catalogs, Data Lineage,

Data Governance, Data Collaboration

DATA TECHNOLOGIES

Data Storage

Data Warehouses, Lakes

Data Sandboxes

Computing Infrastructure

DATA

PIPELINES

Data Analytics

Reports, Models, Data Analytics / Scientists

(Line of Business)

Data Engineering

Data Sets, Data Engineers

(IT / BI Department)

Data Ingestion

Data Sources, Data Architects, DBAs (IT Department)

Source Data

ERP / CRM Data, Systems Data, Social Data, External

Data, Master Data

Page 17: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

18© 2017 Attunity

DATAOPS team structure

Business

Analyst

Data

Engineer

BI Dev /

Data

Scientist

Business

Analyst

Data

Engineer

Data

ScientistBusiness

Consumer

From siloed to centralized – DataOps teams need to be enabled to collaborate

Page 18: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

19© 2017 Attunity

DATAOPS team structure

Business

Analyst

Data

Engineer

Data

ScientistBusiness

Consumer

• Data Engineer

• Responsible for building Data Lakes, Data Warehouses

• DBA, ETL Engineer

• ETL, Spark programming

• Data Analyst

• Responsible for visualizations, charts, dashboards

• BI Engineer, Report analyst, Data visualization engineer

• Data storytelling

• Data Scientist

• Responsible for algorythms, models

• ML Engineer, AI programmers w/ domain expertise

• Spark, R, Python, Notebooks

• Business Consumer*

• Virtual team member

• Responsible for business feedback loop

Source: Building a DataOps Team - DataKitchen

Page 19: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

20© 2019 Attunity 20© 2019 Attunity

MODERN

INTEGRATIONMODERN

PLATFORMS

Big Data

Cloud

Data Lakes

Streaming

DATAOPS BENEFITS FROM A MODERN DATA INFRASTRUCTURE

MODERN

ANALYTICS

AI/ML

IoT

Predictive

Real-Time

Page 20: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

21© 2019 Attunity 21© 2019 Attunity

MODERN

INTEGRATION

DATAOPS BENEFITS FROM A MODERN DATA INFRASTRUCTURE

Page 21: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

22© 2019 Attunity 22

A MODERN DATA INFRASTRUCTURE

Cloud platforms – AWS, Azure,

Google, Snowflake

Data democratization

Real-time data movement

Automated design and code generation

Use-case driven scalable technologies

Faster Time to Market / Value

Modern Analytics Data Management

Page 22: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

23© 2019 Attunity 23© 2019 Attunity

5 KEY DATAOPS TECHNOLOGY REQUIREMENTS

MODERN

INTEGRATION

Continuous

Universal

Automation

Agility

Trust

1

2

3

4

5

Page 23: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

24

DataOpstechnologies

Page 24: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

25© 2019 Attunity 25© 2019 Attunity

DATAOPS REQUIREMENT #1

CONTINUOUS INTEGRATION

AGENTLESS CDC REAL-TIME STREAMS

STREAM DATA AND METADATA

CHANGES

OPTIMIZED FOR EVERY SOURCE AND TARGET

MINIMAL IMPACT USING

TRANSACTION LOGS WITH NO AGENTS

MODERN

INTEGRATION

Continuous

Universal

Automation

Agility

Trust

1

2

3

4

5

Page 25: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

26© 2019 Attunity 26© 2019 Attunity

DATAOPS REQUIREMENT #2

UNIVERSAL

MODERN

INTEGRATION

Continuous

Universal

Automation

Agility

Trust

1

2

3

4

5

DATABASE EDW HADOOP

CLOUD

MAINFRAME

SAP

FLAT FILES

OTHER LEGACY

Oracle

SQL Server

DB2 iSeries

DB2 z/OS

DB2 LUW

MySQL

PostgeSQL

Sybase ASE

Informix

ODBC

Exadata

Teradata

Netezza

Vertica

Pivotal

Hortonworks

Cloudera

MapR

DB2 for z/OS

IMS/DB

VSAM

ECC on Oracle

ECC on SQL

ECC on DB2

ECC on HANA

S4 HANA

AWS RDS

Amazon Aurora

Amazon Redshift

Salesforce

SQL/MP

Enscribe

RMS

Delimited(e.g., CSV, TSV)

30 SOURCES

DATABASE EDW

STREAMING

CLOUD

HADOOP

Oracle

SQL Server

DB2 LUW

MySQL

PostgreSQL

Sybase ASE

Informix

MemSQL

Microsoft PDW

Exadata

Teradata

Netezza

Vertica

Sybase IQ

Amazon Redshift

Actian Vector

SAP HANA

Amazon RDS

Amazon Redshift

Amazon S3

Amazon Aurora

Google Cloud SQL

Azure SQL DW

Azure SQL DB

Azure MySQL

Azure PostgreSQL

Snowflake

Hortonworks

Cloudera

MapR

Amazon EMR

HDInsight

Kafka

Azure Event Hubs

MapR-ES

AWS Kinesis

FLAT FILESSAP

HANA Delimited(e.g., CSV, TSV)

40 TARGETS

Page 26: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

27© 2019 Attunity

DATAOPS REQUIREMENT #2UNIVERSAL

MAINFRAME

SAP

OTHER…

DATABASES

PaaS DB

DATABASE

REPLICATION

DATA

WAREHOUSE

AUTOMATION

SAAS

APPS

FILES

DATA WAREHOUSE

RDBMS

STREAMINGDATA PIPELINEAUTOMATION

DESIGN & MANAGE

GENERATE DELIVER REFINE

change stream

To cloud, lakes

for analyticuse

DATA LAKE

AUTOMATION

OTHER…

CLOUD & DATA LAKES

OTHER…

DATA WAREHOUSES

AzureSQL DW

Redshift

Page 27: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

28© 2019 Attunity 28© 2019 Attunity

DATAOPS REQUIREMENT #3

AUTOMATION

MODERN

INTEGRATION

Continuous

Universal

Automation

Agility

Trust

1

2

3

4

5

AUTOMATE THE DATA PIPELINE

AUTOMATE MULTI-STAGE PROCESSING WITHOUT CODING

ADAPTS TO SOURCE AND TARGET

CHANGES

HETEROGENEOUS & DISTRIBUTED

WORKLOADS

GENERATE DELIVER REFINE MANAGE

Page 28: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

29© 2019 Attunity 29© 2019 Attunity

DATAOPS REQUIREMENT #3

AUTOMATION

STANDARDIZE MERGE

FORMAT Full Change History

ENRICH

SUBSET

HDS

ODS

Snapshot

CAPTURE

PARTITION

Raw

Deltas

SAP

RDBMS

DATAWAREHOUSE

FILES

MAINFRAME

Consume

ANALYZE

MONITOR

GENERATE DELIVER REFINE

DATA INGESTED VIA CDC INTO CLOUD AND

DATA LAKE

DATA CONTINUOUSLY UPDATED AND MERGED INTO

CHANGE HISTORY

PURPOSE-BUILT SUBSETS PROVISIONED

FOR ANALYTICS

Page 29: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

30© 2019 Attunity 30© 2019 Attunity

DATAOPS REQUIREMENT #4

AGILITY

MODERN

INTEGRATION

Continuous

Universal

Automation

Agility

Trust

1

2

3

4

5

RUNS IN THE CLOUD(S), ON-

PREM, OR BOTH

DATA LAKE AND WAREHOUSE

AUTOMATION

FUTURE-PROOF FOR

‘ARCHITECTURES IN MOTION’

Page 30: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

31© 2019 Attunity 31© 2019 Attunity

DATAOPS REQUIREMENT #3AUTOMATION

DATA LAKE

DataSources

Queries / ML / Analytics

DataWarehouse

DataSources

DATA LAKE

Queries / ML / Analytics

DataWarehouse

DataSources

Queries / ML / Analytics

DataWarehouse

DATA LAKEData

Sources

Queries / ML /Analytics

DataWarehouse

DataWarehouse

Page 31: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

32© 2019 Attunity 32© 2019 Attunity

DATAOPS REQUIREMENT #5

TRUST

MODERN

INTEGRATION

Continuous

Universal

Automation

Agility

Trust

1

2

3

4

5

Metadata creates greater trust and

confidence from data consumers

DATA CATALOG TO ACCESS IT-

MANAGED AND USER-CURATED

DATA SETS

DATA LINEAGETO UNDERSTAND WHERE THE DATA CAME FROM AND

HOW IT WAS TRANSFORMED

DATA VALIDATION TO ENSURE THAT

ALL OF THE SOURCE DATA WAS

REPLICATED

Page 32: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

33© 2017 Attunity 33© 2017 Attunity

Data Catalog &

Automated TestingCDC

Technologies to support DataOps Data Pipelines

MAINFRAME

SAP

FILE

RDBMS

APP / OTHER

ENTERPRISE DATA SOURCE

Real-Time

IngestionAutomated Design & Code

Generation

Data Warehouse & Data Lake

Automation

Trust

Continuous Agile Collaborative

Page 33: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

34

Demonstration

DataOps pipeline imagined

Page 34: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

35© 2019 Attunity 35© 2019 Attunity

DATAOPS: ACCELERATING TIME TO INSIGHT

DESIGN CODE TEST DEPLOY

TRADITIONAL

NEEDS

▪ Fast cycles

▪ Real-time data

▪ Resilience

Data ConsumerData Manager▪ Long cycles

▪ Batch processing

▪ Brittle ETL scripts

Page 35: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

36© 2019 Attunity 36© 2019 Attunity

DATAOPS: ACCELERATING TIME TO INSIGHT

▪ Long cycles

▪ Batch processing

▪ Brittle ETL scripts

Data Consumer

NEEDS

▪ Fast cycles

▪ Real-time data

▪ Resilience

Data Manager

DESIGN DEPLOY

DATAOPS

✓Code Automation

✓Real-time delivery

✓Run-time evolution

✓Collaboration

Page 36: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

37

Questions?

Page 37: Emerging DataOps Principles and Technologies for Analytics...SAP FLAT FILES OTHER LEGACY Oracle SQL Server DB2 iSeries DB2 z/OS DB2 LUW MySQL PostgeSQL Sybase ASE Informix ODBC Exadata

38

Thank you…