INFORMATION ARCHITECTURE FOR DWH PROJECTS
RUAIRI PRENDIVILLE SENIOR CONSULTANT , SYBASE (UK) JUNE I4TH, 2012, ISTANBUL
2 – Company Confidential – June 18, 2012
AGENDA
• Introduction to DWH
– Are DWH project complex, challenges, requirements
• Information Architecture (EA) for DWH
–Models, Workflows, Artifacts
• Mapping over Sybase PowerDesigner
–Mapping to IA models and artifacts
– Features of interest for DWH
• Demonstration
–One example of IA architecture on the project
• DWH Referent IA
– Layers and recommendations
3 – Company Confidential – June 18, 2012
COMPLEXITY OF DWH PROJECTS
ARE DWH PROJECTS COMPLEX?
4 – Company Confidential – June 18, 2012
5 – Company Confidential – June 18, 2012
BI/DWH COMPLEXITY
Causes, sources
– Data Sources Different sources, technologies, business functions, legacy,
overlapping, concepts, elements
– Scope and Performances, • Never enough, never on time, content variations(!),
– Participants • Different backgrounds, knowledge, skills, motivation, visions,
– Requirements • Continuous changes and extensions,
– Growth and Development • Volume of data, people, reports and analysis,
–Quality • Clean, right, correct on time, cleansing (division of resp.)
6 – Company Confidential – June 18, 2012
BI/DWH COMPLEXITY
Change, Heterogeneity
– Never enough Users/BI Analyst can not give definite, detailed, complete and
precise specification of all reports/views in advance (!),
Changes are coming on the end, they are inevitable and continual,
– Never on time Every change should be implemented and used in usable time
frame, before user forgets about it,
– Never one and exactly one data source, Different sources results in: – Different DMS technology, different refresh rate, different volume,
different performances (management).....
– Data overlapping – consolidation,
7 – Company Confidential – June 18, 2012
BI/DWH COMPLEXITY
Volume and growth, data quality
– Large Volume and intensive Growth is inevitable, Operational Data Sources, – Keep only data set needed for operational work (year?),
– Keep it in the shape suitable for operational work (Relational),
– Analytical Extension, Keep data needed for sound analysis (many years?),
Keep it in the shape suitable for analysis (MDM),
– Growth is inevitable per time and volume,
– Compromises: Time: keep last (x) months or representative sample,
Nobody is actually happy, neither IT or Business
– Data Quality Clean, consolidated data source does not exists,
Every data source needs “housekeeping” constantly, Data Entry/ETL
8 – Company Confidential – June 18, 2012
BI/DWH COMPLEXITY
Performance
– Never fast enough Use of technology not designed/suited for analytics (RDBMS),
Intensive use of „ad hoc“ request – indexing problem (RDBMS), – Free exploration over arbitrary data set is heavily limited,
Very intensive and heavy administration – never ending story,
–One and only one complete “Version of the truth” Similar or overlapping analysis are presenting different data!! – Which one is correct and right? What about the rest of it?
You are publisher – hold the responsibilities, – Hold the reader's trust,
– Publish on regular basis,
– Use variety of sources and edit them with quality and consistency,
Data consistency must be established and protected,
9 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
WORKFLOWS AND MODELS
10 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
Position of EA
11 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
Position of IA
Motivation, goals, business principles organizational structure, Business Functions, Services and Processes
IT support for Business Architecture System Services, Applications, Databases, Components, Forms, Reports, Data Flows....
Technology for IS Architecture, network, servers, installed instances, Access points,
From current to planned
12 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
Method (ADM)
– To Define Architecture Development Method Define at any point of the project Who is doing What, How and When
Define Phases, Workflows, Artifacts, Models, and Deliverable
– Essential for DWH/BI with Backward Requirement process
– Presented ADM and IA is: Agile – simplification of RUP, TOGAF
Iterative and Incremental – cyclic repetition of workflows,
Data Driven – based on Data Assets
Comprehensive – includes all activities including maintenance and RFC
Model Driven – all artifacts are represented with modeling artifacts
Requirement Driven – placed in the center of methodology
Sustainable – at any point knowledge is collected, formally specified and properly presented
13 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
Main ADM Cycle
Requirements&Constraints in the center
Not all are mandatory
Presented main cycle, others possible
Many cycles are expected
All that is needed to obtain sustainable system
• Development
• Deployment
• Maintenance
Active, in IME
14 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
B.Data Analysis
– Objective: Discover, identify, collect, elaborate, specify, define and present Data Assets
Different abstraction levels: from conceptual to implementation
– Viewpoints: Architectural viewpoint, Data Providers and Consumers, data flow process,
engaged systems, applications, components, usage, access rights
Structural viewpoint, structure, attributes, relationships, dependencies, rules applied on conceptual and physical level
– Inputs: IA (others not in the scope),
–Outputs DA, Sources Conceptual and Physical Data
15 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
C.DWH Design and Implementation
– Objective: D&I of integrated and unified data collection, organized in dimension of
time, which is subject oriented used for analysis, planning and evaluation of business performances
Establish common view (unified/integrated/complete) over the enterprise data, stable source of historical information, accommodate data growth
– Activities: Full and detailed schema specification for DWH, Staging and ODS
– Inputs: IA & DA, Sources Conceptual/Physical Data,
–Outputs IA & DA, Conceptual and Physical DWH
16 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
D.ETL Design and Implementation –Objective:
To analyze, elaborate, define, specify, present and implement full and incremental ETL flows between source2staging, staging2DWH, DWH2MDM
To discover, identify, collect, elaborate, specify, define and present all characteristics of ETL flows,
– Activities: Extraction Method, Schema, Condition and Frequency for increments,
– Source 2 Target Mapping,
Transformation processes on the appropriate level of details,
Data Flow Architecture, Trash management
– Inputs: IA&DA, Sources Conceptual/Physical Data and DWH,
–Outputs DA, Data Flow, Sources Conceptual/Physical Data,
17 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
E.BI Design and Implementation –Objective:
Establish Multidimensional space (Business Universe) with Facts, Measures, Dimensions and Hierarchies,
Build visualization including Reports, Dashboards, OLAP views,
Check if requested KPI set is supported and presented,
– Activities: MDM Space (above)
Detailed specification of requested KPI with mapping
Detailed specification of Reports, Dashboards and OLAP Views
Access rights and delivery mechanisms
– Inputs: DWH, IA and DA
–Outputs IA, DA, DWH (MDM) models
18 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
Paths – Simplified Analysis, Design and Implementation
Generally many paths are possible
Gap Analysis may discover missing info
Analysis, Design and Implementation of DWH, ETL and BI are tightly interconnected and dependent on each other
19 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
Models – Business&Information Architecture
– Specifies Application systems and applications,
Data Assets, Databases, Data Source/Destination, Data Providers/Consumers,
Usage of Data Assets and Applications, cooperation and collaboration of Applications and/or services,
ownership over the Data Assets, Applications and Services, elements of SLA
ETL procedures on high abstraction level.
– Viewpoints Architecture of the system
– Represents a “hat” for the rest of the system
20 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
Models - Source Model
– Specifies details to understand structural relationships and meaning (conceptual)
internal structure of data source with implementation details (physical)
– tables, columns, views, keys, procedures, indexes, rights, constraints, triggers
consolidated by bidirectional synchronization and associated transformation
Specifies extraction scheme for every data source,
Source for Source2Target mapping
– Viewpoint One or more diagrams per source to represent subject area
– Used to synchronize changes from source into DWH IA Starting point for change management
21 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
Models – DWH model
– Specifies details to understand structural relationships and meaning (conceptual)
internal structure of DWH with implementation details (physical),
Internal MDM structure of Data Marts (physical),
For Staging, Trash, ODS, DWH and MDM
Target for Source2Target mapping
Relational2MDM mapping,
– Viewpoints One or more diagrams to represent subject area
One or more MDM diagrams to represent Data Marts
– Used to synchronize changes from DWH IA to actual RDBMS Ending point for change management
22 – Company Confidential – June 18, 2012
INFORMATION ARCHITECTURE FOR DWH
Models – Data Flow model
– Specifies Connects all important data sources to destination
Data flows from source to destination with all attributes and constraints
– characteristics of flow processes, source and destination tables, kind of the flow (ETL, replication or federation), integration preconditions, used integration service, possible outcomes etc.
Mapping source2target within every step of the flow
– Viewpoints One or more diagrams to represent actual flow task
– Aggregation, sort, filter, projection, split, join, merge, lookup
One or more diagrams to represent transformation control flow
– Used to present integration, consolidation and migration
23 – Company Confidential – June 18, 2012
MAPPING ON SYBASE POWERDESIGNER
DWH RELATED MODELS AND FEATURES
24 – Company Confidential – June 18, 2012
MAPPING ON SYBASE POWERDESIGNER
Data Models
25 – Company Confidential – June 18, 2012
MAPPING ON SYBASE POWERDESIGNER
Architecture and Requirements
26 – Company Confidential – June 18, 2012
MAPPING ON SYBASE POWERDESIGNER
DHW related features – Dependency Matrix
–What Two dimensional hierarchical matrix
Present, review and create/delete links of particular kind between two artifacts
– Any model, any diagram, any two artifacts
– Indirect (two or more links) dependency, drilling
Hierarchy of objects on row/column, Copy to CVS
– Reasoning Full, rich, useful dependency analysis (network)
EAM to understand and present dependency between Data Assets and Data Providers and Consumers,
PDM to create mapping overview
DMM to present actual source to target dependencies
27 – Company Confidential – June 18, 2012
MAPPING ON SYBASE POWERDESIGNER
DHW related features – Mappings
–What Modeling Connection between objects
– Mapping with transformation (O/R, R/R, O/O)
– Generation (Generate Mappings)
Wizard to convert mappings into Transformation Task (ILM)
Mapping Editor
– Reasoning Data Flow specification
Relational to Multidimensional,
– DWH2MDM
Relational to Relational,
– Source2Target
Any descriptive dependency
– Federation concept (not Replication or ETL)
28 – Company Confidential – June 18, 2012
MAPPING ON SYBASE POWERDESIGNER
DHW related features – Impact/Lineage Analysis
–What Impact Analysis – consequences of the change
Lineage Analysis – objects forming the basis for object
Temporary View for Review
IAM for permanent view, snapshot (Drilling, Exploring)
Analysis Rules changeable (Impact/Lineage),
– Reasoning Change Management evaluation, estimation and planning,
– To asses change impact before it happens (costs, time, resources)
Snapshots
– Development points, different version of system,
Meta Data BI
– To explore meta-data set, discover implicit dependencies,
29 – Company Confidential – June 18, 2012
DEMONSTRATION
EXAMPLE
30 – Company Confidential – June 18, 2012
REFERENT INFORMATION ARCHITECTURE
MAIN LAYERS OF INFORMATION ARCHITECTURE
31 – Company Confidential – June 18, 2012
REFERENT INFORMATION ARCHITECTURE
Recommended architecture - example
32 – Company Confidential – June 18, 2012
REFERENT INFORMATION ARCHITECTURE
Data Access Layer - recommendations
– Set of processes, tools, activities and models: Data extraction (E) from operational system to DWH,
Transformation to suitable shape (T),
– Data cleansing, consistency check, integrity,
– Translation from operational to enterprise format,
– Enterprise DWH data structure is inevitable different then operational,
Data loading into DWH (L),
– Recommendations : Understand OS structure, rules and dynamics,
Dynamics of data refresh rate should be realistic,
Changed Data Capture Algorithm: Intrusive/Non Intrusive,
Apply effective ETL/ELT, use staging area,
Document everything - very intensive and complex changes,
– Mappings between Data Sources and DWH Destination,
33 – Company Confidential – June 18, 2012
REFERENT INFORMATION ARCHITECTURE
DWH Layer and Recommendations – DWH, Staging, Trash and ODS
Common view on enterprise data, regardless of how/who will use it,
Unification offers flexibility in how the data is later interpreted
A stable source of historical information,
Efficient accommodation of a data explosion (growth),
Supply data for Analytical layer on required granularity,
Trash and Alerts&Matching to jump over initial cleansing (blocker),
– Recommendations: Use pre-packaged solution and existing experience,
Use relational and multidimensional modeling (document all),
Relational to address performance issues, follow OS paradigm,
Multidimensional to present later Business Universe for Analytics,
Use views to transform and map Relational to Multidimensional and back,
34 – Company Confidential – June 18, 2012
REFERENT INFORMATION ARCHITECTURE
Delivery Layer and Recommendations – Set of processes, tools, activities and models:
Selection of data subset to be delivered,
Reorganization (format) of the data to be delivered,
– Aggregation, Summing, Counting, additional classification,
– Data transformation (Date, Time), slowly changing dimensions
Transform DWH structures to “Business Universe”,
– Facts, Measurements, Dimensions, Hierarchies, Business lang. abstraction
Granularity accordingly to the End User needs,
– Recommendations: Use multidimensional modeling,
Model transformation/mappings between DWH and Data Mart(s),
Extend multidimensional model with required BI meta-data,
Define refresh rate on the basis on user needs (constraints),
Aggregations are difficult for incremental update
35 – Company Confidential – June 18, 2012
REFERENT INFORMATION ARCHITECTURE
MDM Layer and Recommendations – Data marts
DWH derivatives, provide the business community answers to asked questions and strategic analysis,
Tailored for a particular capability or function of enterprise,
Vertically organized and bounded to one business function,
Organized in multidimensional structure,
OLAP – On line Analytical Processing – ROLAP/MOLAP
– Recommendations: Choose proper storage technology (ROLAP/MOLAP),
– Special storage may not be standard and may narrow your choices,
– Common storage may not be performative enough,
Choose Virtual Marts as basement of Analytics,
Use separate V. Mart for separate business concerns,
Adjust to BI meta-data requirements, use automated access,
INFORMATION ARCHITECTURE FOR DWH PROJECTS
QUESTIONS?
RUAIRI PRENDIVILLE SENIOR CONSULTANT , SYBASE (UK) JUNE I4TH, 2012, ISTANBUL
Top Related