METADATA BASED DYNAMIC ETLS - DeveloperMarch (developermarch.com/developersummit/2015/report/...)
AGENDA
Background
Reports Generator – ETL Architecture
Implementing using SSIS
Problem of the Unknowns
How to solve the problem?
Capturing the meta-data
Using the meta-data
Generating ETLs dynamically
Challenges
Mitigating using BIML
Final Architecture
Lessons Learnt
Background
Reports Generator for Clearing Connectivity Standards
• Is a product
• To simplify integration with data systems
• That accepts Clearing Data and generates Clearing Connectivity Standard Reports
© COPYRIGHT 2015 SAPIENT CORPORATION | CONFIDENTIAL
[Diagram: Raw Data A, Raw Data B, ..., Raw Data X, Raw Data Y → Clearing Reports Generator → Std. Report 1, ..., Std. Report n]
Reports Generator – High Level Architecture
[Diagram: Raw Data Sources (Schema A, Schema B, Schema C; schema known during design) → Reports Generator (Extract, Transform, Load) → Std. Reports (Schema X, Schema Y, Schema Z; schema known during design)]
Reports Generator ETL – SSIS implementation for 1 Report
• Raw Data Source - A simple source schema from which an ETL package extracts the data
• Transform - Transformation on the extracted data
• Standard Report - A destination schema to load the data
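The three parts above can be pictured with a minimal sketch of a point-to-point ETL for one report, written in Python rather than SSIS purely for illustration. Every column name (`trade_id`, `qty`, `price`, `TradeRef`, `Notional`) is a hypothetical assumption, not taken from the product. What the sketch is meant to show is that both schemas are hard-coded, which only works when they are known during design.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]

def extract(source_rows: Iterable[Row]) -> List[Row]:
    """Extract: keep only the source columns this one report needs."""
    return [
        {"trade_id": r["trade_id"], "qty": r["qty"], "price": r["price"]}
        for r in source_rows
    ]

def transform(rows: List[Row]) -> List[Row]:
    """Transform: rename one column and derive another (hypothetical rule)."""
    return [
        {"TradeRef": r["trade_id"], "Notional": r["qty"] * r["price"]}
        for r in rows
    ]

def load(rows: List[Row], destination: List[Row]) -> None:
    """Load: append the transformed rows to the destination (a list here)."""
    destination.extend(rows)

# Hypothetical raw data; the extra "desk" column is simply ignored.
raw_data_a = [{"trade_id": "T1", "qty": 10, "price": 2.5, "desk": "X"}]
std_report_1: List[Row] = []
load(transform(extract(raw_data_a)), std_report_1)
```

A second customer with different column names would need a second copy of all three functions, which is the situation the next slide describes.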
Building a Product - The problem of the unknowns
• ABC corporation: Raw Data Source (Schema A) → Report Generator ETLs (Point to Point) → Standard Report (Schema X)
• XYZ corporation: Raw Data Source (Schema B) → Report Generator ETLs (Point to Point) → Standard Report (Schema Y)
• A new prospect…: Source (Schema ?) → Report Generator ETLs (????) → Destination (Schema ?)
Takes time to build
How to solve the problem?
1. Capture the required meta-data
• Data about data
• Source & Destination Schema Info
• Tables, columns, Data type, expressions, etc.
2. Use the meta-data at run-time …
• A Bespoke solution
• A Do-All ETL package
• Dynamic ETLs
• Other options…
1. Capturing the meta-data
Meta-data store
• Source/Destination connection information
• Schema Information: Source Schema, Destination Schema
• Transformation Information: Expressions, Aggregations, …
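One possible shape for the captured meta-data is sketched below as Python dataclasses. The slides do not describe the actual store, so the class names, field names and sample values are all hypothetical assumptions. The sketch only mirrors the three groups listed above: connection information, schema information and transformation information.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Connection:
    """Source/Destination connection information (hypothetical fields)."""
    name: str
    connection_string: str

@dataclass
class Column:
    name: str
    data_type: str

@dataclass
class TableSchema:
    """Schema information for one side of the ETL."""
    connection: Connection
    table: str
    columns: List[Column]

@dataclass
class Mapping:
    """Transformation information for one destination column."""
    target_column: str
    expression: str          # e.g. a column name or an arithmetic expression
    aggregation: str = ""    # e.g. "SUM"; empty when no aggregation applies

@dataclass
class ReportMetadata:
    report_name: str
    source: TableSchema
    destination: TableSchema
    mappings: List[Mapping] = field(default_factory=list)

# Hypothetical example entry for one customer and one report.
example_metadata = ReportMetadata(
    report_name="StdReport1",
    source=TableSchema(
        Connection("SourceDb", "Server=example;Database=raw"),
        "raw_data_a",
        [Column("trade_id", "varchar"), Column("qty", "int"), Column("price", "decimal")],
    ),
    destination=TableSchema(
        Connection("ReportDb", "Server=example;Database=reports"),
        "std_report_1",
        [Column("TradeRef", "varchar"), Column("Notional", "decimal")],
    ),
    mappings=[
        Mapping("TradeRef", "trade_id"),
        Mapping("Notional", "qty * price", aggregation="SUM"),
    ],
)
```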
2. Use the meta-data at run-time – Dynamic ETLs
[Diagram: Meta-data Store → ETL Generator (Read Metadata, Validate Metadata, Generate ETL dynamically) → ETL (SSIS) Packages]
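The read, validate and generate steps can be sketched as follows. This is an illustration under stated assumptions, not the product's generator: the JSON layout and all names in it are hypothetical, and the generated artifact is a plain SQL statement standing in for an SSIS package, simply because it is short enough to read.

```python
import json

# Hypothetical metadata for one report, as it might be read from the store.
METADATA_JSON = """
{
  "source": {"table": "raw_data_a", "columns": ["trade_id", "qty", "price"]},
  "destination": {"table": "std_report_1", "columns": ["TradeRef", "Notional"]},
  "mappings": [
    {"target": "TradeRef", "expression": "trade_id", "uses": ["trade_id"]},
    {"target": "Notional", "expression": "qty * price", "uses": ["qty", "price"]}
  ]
}
"""

def read_metadata(text):
    """Read: parse the stored metadata into a dictionary."""
    return json.loads(text)

def validate_metadata(meta):
    """Validate: return a list of problems; an empty list means usable."""
    errors = []
    source_cols = set(meta["source"]["columns"])
    dest_cols = set(meta["destination"]["columns"])
    mapped = set()
    for m in meta["mappings"]:
        if m["target"] not in dest_cols:
            errors.append(f"unknown destination column: {m['target']}")
        for col in m["uses"]:
            if col not in source_cols:
                errors.append(f"unknown source column: {col}")
        mapped.add(m["target"])
    for col in sorted(dest_cols - mapped):
        errors.append(f"destination column has no mapping: {col}")
    return errors

def generate_etl(meta):
    """Generate: build the ETL artifact (here a SQL string) from metadata."""
    errors = validate_metadata(meta)
    if errors:
        raise ValueError("; ".join(errors))
    targets = ", ".join(m["target"] for m in meta["mappings"])
    exprs = ", ".join(m["expression"] for m in meta["mappings"])
    return (f"INSERT INTO {meta['destination']['table']} ({targets}) "
            f"SELECT {exprs} FROM {meta['source']['table']}")

etl_sql = generate_etl(read_metadata(METADATA_JSON))
```

Validating before generating means a bad mapping is reported against the metadata, where it can be fixed, rather than surfacing later as a failure inside a generated package.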
Generating Dynamic ETLs in SSIS – Challenges/Risks
Each SSIS ETL Package …
• is a highly complex XML document
• may vary for every version of SSIS
• is not easy to maintain
Mitigating the risks - Simpler language for ETL
[Comparison of two listings: "About 660 lines of XML" <- OR -> ? "This is the whole package!"]
• The Business Intelligence Markup Language (BIML)
• An XML dialect for ETL
• Simple and easy to maintain
• Can target multiple SSIS versions
• Powerful, i.e. has inline C# support
• Reusable using templates
More About BIML...
• Created by Varigence
• Varigence focuses mainly on BI development
• Supports multiple SSIS versions
• Well supported by multiple IDEs/tools, e.g. Mist and Vivid
• More about BIML at http://bimlscript.com/
• More about Varigence at https://varigence.com/
Tying it back - Dynamic ETLs using BIML
[Diagram: Metadata Store → ETL Generator → ETL Packages]
ETL Generator steps shown in the diagram:
• Read Metadata
• Validate Metadata
• Load common templates
• Load BIML templates
• Transform template to BIML
• Generate ETL dynamically
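The template steps above can be sketched roughly as below. The sketch uses Python's `string.Template` to fill a small BIML-like document from metadata; the slides do not say which templating mechanism the generator actually uses, so that choice, the metadata keys and all names are assumptions. The output is only checked for XML well-formedness here. It has not been run through a BIML compiler, and the connection definitions a real package needs are omitted.

```python
from string import Template
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

BIML_NS = "http://schemas.varigence.com/biml.xsd"

# "Common" template: a data flow fragment reused by every package.
DATAFLOW_TEMPLATE = Template("""\
<Dataflow Name=$dataflow_name>
  <Transformations>
    <OleDbSource Name="Source" ConnectionName=$source_conn>
      <DirectInput>$select_sql</DirectInput>
    </OleDbSource>
    <OleDbDestination Name="Destination" ConnectionName=$dest_conn>
      <ExternalTableOutput Table=$dest_table />
    </OleDbDestination>
  </Transformations>
</Dataflow>""")

# Package template: wraps the rendered data flow.
PACKAGE_TEMPLATE = Template("""\
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package Name=$package_name ConstraintMode="Linear">
      <Tasks>
$dataflow
      </Tasks>
    </Package>
  </Packages>
</Biml>
""")

def render_package(meta):
    """Transform the templates into one BIML-like document for a report."""
    select_sql = "SELECT {} FROM {}".format(
        ", ".join(meta["columns"]), meta["source_table"])
    dataflow = DATAFLOW_TEMPLATE.substitute(
        dataflow_name=quoteattr("DF_" + meta["report"]),
        source_conn=quoteattr(meta["source_conn"]),
        dest_conn=quoteattr(meta["dest_conn"]),
        select_sql=escape(select_sql),       # escape <, >, & in element text
        dest_table=quoteattr(meta["dest_table"]),
    )
    return PACKAGE_TEMPLATE.substitute(
        package_name=quoteattr("Load_" + meta["report"]),
        dataflow=dataflow,
    )

# Hypothetical metadata for one report.
example_meta = {
    "report": "StdReport1",
    "source_conn": "SourceDb",
    "dest_conn": "ReportDb",
    "source_table": "raw_data_a",
    "dest_table": "std_report_1",
    "columns": ["trade_id", "qty"],
}
biml_text = render_package(example_meta)
biml_root = ET.fromstring(biml_text)  # raises ParseError if not well-formed
```

Keeping the data flow in its own template is one way to get the reuse the slides mention: a change to the shared fragment reaches every generated package.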
Reports Generator – Final Architecture
[Diagram: Raw Data A, Raw Data B, ..., Raw Data X, Raw Data Y → Clearing Reports Generator → Std. Report 1, ..., Std. Report n. Within it: Metadata Store → ETL Generator (Read Metadata, Validate Metadata, Load common templates, Load BIML templates, Transform template to BIML, Generate ETL dynamically) → ETL Packages]
Lessons learnt
• A star schema on the RHS introduces maximum complexity. Use a flat schema
• Tracing the input from the output is difficult
• ETL as a technology is not for row by row processing
• Use the abstraction layer wisely
• Improperly used BIML templates lead to maintainability issues
• Keep things simple - don’t write business logic in BIML
• Maintain version changes across deployments
Will it work?
RECAP
Background
Reports Generator – ETL Architecture
Implementing using SSIS
Problem of the Unknowns
How to solve the problem?
Capturing the meta-data
Using the meta-data
Generating ETLs dynamically
Challenges
Mitigating using BIML
Final Architecture
Lessons Learnt