© 2016 Autodesk | Enterprise Information Services
Designing an Agile Fast Data Architecture for Big Data Ecosystem
using Logical Data Warehouse and Data Virtualization
Kurt Jackson
Autodesk Enterprise Information Services
Some Definitions
Agile
“The division of tasks into short phases of work and frequent reassessment and adaptation of plans.”
Data Architecture
“The models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated.”
Logical Data Warehouse
“A logical abstraction layer which sits on top of a variety of enterprise data sources. The logical layer provides durable data views without needing to move or transform data from the sources.”
Data Virtualization
“Data management that allows an application to retrieve and manipulate data without knowing specific details about the data, such as how it is formatted or where it is physically located.”
Agile Data Architecture Lifecycle
[Diagram labels: Agile, Data Architecture, Logical Data Warehouse, Data Virtualization]
Autodesk’s Business Challenge
Multi-year transition from Perpetual to Subscription and Rental
Philosophy
Access and refine data near the source
Published logical data interfaces
Truly agile data environment
Why Build the Logical Data Warehouse
Data virtualization can be used throughout your data pipeline!
One More Definition
Data Governance
“The management of the availability, usability, integrity, and security of the data employed in an enterprise.”
Logical Data Warehouses are an essential part of your Data Governance Strategy for your Big Data Ecosystem
Availability
Channeling end user access through a single governance point simplifies administration
Usability
The LDW provides a single repository for schema definitions
Simplifies end-user access for visualization and interpretation
Integrity
Only published views in the LDW are publicly available
Coupled with ownership, this guarantees the quality of the data set
Security
The LDW can provide a single point for authentication, authorization and audit trail for end user access
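The Security point above can be illustrated with a small sketch. This is not Denodo's API and not Autodesk's implementation; `GovernedAccessPoint`, its fields, and the sample user and view names are all hypothetical, and the password check stands in for a real LDAP lookup. It only shows the idea of one entry point that authenticates, authorizes, and audits every request.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class GovernedAccessPoint:
    """Hypothetical single entry point for published views (not a vendor API)."""
    credentials: dict[str, str]        # user -> password, illustration only
    grants: dict[str, set[str]]        # user -> published views the user may read
    views: dict[str, list[dict]]       # published view name -> rows
    audit_log: list[dict] = field(default_factory=list)

    def query(self, user: str, password: str, view: str) -> list[dict]:
        authenticated = self.credentials.get(user) == password
        authorized = (
            authenticated
            and view in self.views
            and view in self.grants.get(user, set())
        )
        # Every attempt is recorded, whether or not it succeeds.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "view": view,
            "allowed": authorized,
        })
        if not authenticated:
            raise PermissionError("authentication failed")
        if not authorized:
            raise PermissionError(f"{user} may not read {view}")
        return self.views[view]


# Hypothetical sample data.
ldw = GovernedAccessPoint(
    credentials={"ana": "s3cret"},
    grants={"ana": {"sales.orders_published"}},
    views={"sales.orders_published": [{"order_id": 1, "amount": 120.0}]},
)
rows = ldw.query("ana", "s3cret", "sales.orders_published")
```

Because all consumers go through `query`, access rules and the audit trail live in one place rather than in each underlying source.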
The Logical Data Warehouse implements the philosophy
Access and refine data near the source
No painful ETL pipelines for data derivation
Leverage power of Spark for fast access
Published logical data interfaces
Single access point for all external data sets
Enterprise-class governance across the big data ecosystem
Truly agile data environment
Facilitates rapid change/evolution in your big data ecosystem
Rip and replace becomes almost transparent: replace the system that delivers those views and you’re done
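The "rip and replace" point can be sketched as a published view whose name and columns stay fixed while the system behind it is swapped. Everything here is hypothetical: `PublishedView`, `rebind`, the two source functions, and the sample rows are illustrative names, not part of Denodo or Spark.

```python
from typing import Callable, Iterable

Row = dict[str, object]
Source = Callable[[], Iterable[Row]]


class PublishedView:
    """Stable, consumer-facing view bound to a replaceable source (hypothetical)."""

    def __init__(self, name: str, columns: tuple[str, ...], source: Source):
        self.name = name
        self.columns = columns
        self._source = source

    def rebind(self, source: Source) -> None:
        # "Rip and replace": only the delivering system changes.
        self._source = source

    def rows(self) -> list[Row]:
        # Project each row onto the published columns so the contract holds
        # regardless of which backend produced the row.
        return [{c: r[c] for c in self.columns} for r in self._source()]


def legacy_warehouse() -> list[Row]:
    # Stand-in for a legacy data warehouse query.
    return [{"customer_id": 7, "plan": "perpetual", "etl_batch": 42}]


def spark_fast_access() -> list[Row]:
    # Stand-in for the replacement fast-access layer.
    return [{"customer_id": 7, "plan": "perpetual", "partition": "2016-05"}]


view = PublishedView("customers_published", ("customer_id", "plan"), legacy_warehouse)
before = view.rows()
view.rebind(spark_fast_access)
after = view.rows()
```

Consumers that read `customers_published` see the same columns before and after the swap, which is what makes the replacement close to transparent.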
Implementation Approach
Identify enterprise data sources
Harder than you think
All-new custom streaming, highly available ingestion mechanism
Self-service or nearly so
Kafka/Flume
Leverage best-of-breed for individual components
Spark for ETL and fast access
HCatalog/Oozie for metadata and job orchestration
Denodo for LDW
Leverage highly redundant cloud storage for the data lake
S3
Develop canonical representations for your data sets
Freakin’ hard!
Virtualize Spark fast access, data warehouses and marts with a next-generation Logical DW
New implementations leverage the LDW
Legacy migrates opportunistically to Spark fast access
Architecting the Data Virtualization Layer
[Diagram components: Data Consumers, Corporate LDAP, Data Virt Instance 1 … n, Logging Infrastructure, CI/CD, Source Repository, Legacy Data Sources. Flow labels: Data, Code, Audit]
Build an Information Architecture
Base views to abstract data sources
Layered derived views to reflect successively refined derivations
Create the notion of publication for curated, externally visible views
Expose services on top of views to make views more accessible
Separate namespaces (schemas) by project or subject area
Build the notion of commonality for views shared across schemas
Naming conventions for all objects
Data portal for one-stop shopping for data consumers
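The layering described above (base views, derived views, publication, schemas, naming conventions) can be modeled in a few lines. This is a minimal sketch under assumptions: the `bv_`/`dv_`/`pv_` prefixes, the `View` class, the `common` schema, and the sample view names are all invented for illustration and are not taken from the talk or from any product.

```python
from dataclasses import dataclass, field
import re

# Assumed naming convention: bv_ = base view, dv_ = derived view, pv_ = published view.
NAME_PATTERN = re.compile(r"^(bv|dv|pv)_[a-z0-9_]+$")


@dataclass
class View:
    schema: str                  # namespace: a project or subject area
    name: str
    parents: list["View"] = field(default_factory=list)  # empty for base views

    def __post_init__(self) -> None:
        if not NAME_PATTERN.match(self.name):
            raise ValueError(f"{self.name!r} violates the naming convention")
        if self.name.startswith("bv_") and self.parents:
            raise ValueError("base views abstract a source and have no parents")
        if not self.name.startswith("bv_") and not self.parents:
            raise ValueError("derived and published views need at least one parent")

    @property
    def published(self) -> bool:
        return self.name.startswith("pv_")

    @property
    def qualified_name(self) -> str:
        return f"{self.schema}.{self.name}"

    def depth(self) -> int:
        # Number of derivation layers above the base views.
        return 0 if not self.parents else 1 + max(p.depth() for p in self.parents)


def catalog(views: list[View]) -> list[str]:
    """What a data portal might list: published views only."""
    return sorted(v.qualified_name for v in views if v.published)


orders = View("sales", "bv_orders")
customers = View("common", "bv_customers")    # shared across schemas
enriched = View("sales", "dv_orders_enriched", [orders, customers])
revenue = View("sales", "pv_revenue_by_customer", [enriched])
portal = catalog([orders, customers, enriched, revenue])
```

Here `portal` contains only `sales.pv_revenue_by_customer`; the base and derived views stay internal, and a badly named object is rejected when it is created.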
Building an LDW makes your Big Data Ecosystem Enterprise-Ready
Autodesk is a registered trademark of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk
reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document.
© 2016 Autodesk | Enterprise Information Services. All rights reserved