IBM Information Management Integration and STG Smart Analytics
Igor G. Gonchar, IBM STG High End Systems Solutions Architect for RCIS
© 2012 IBM Corporation 2
Clients Need to Establish a Repeatable Delivery System for Information:
The data supply chain
High-Level Architecture Components:
[Figure: data supply chain architecture. Labels: Manage, Integrate, Analyze, Consume; Transactional & Collaborative Applications; Business Analytics Applications; External Information Sources; 3rd Parties; Social Data; Streaming Data; Structured & Unstructured; Content; Data; Master Data; Data Warehouses; Content Repository; Complex Queries; Advanced Analytics; Real-time analytics; Data Mining; Predictive Modelling; Data Models; Process & Collaboration; Governance; Data Quality Management; Security & Privacy; Lifecycle Management; Archiving & Retention]
Delivering trusted information across your entire information supply chain
[Figure: information supply chain. Labels: Transactional & Collaborative Applications; Business Analytics Applications; External Information Sources; Manage (Data, Master Data, Content, Streaming Information); Integrate; Analyze (Cubes, Streams, Big Data, Data Warehouses); Govern (Quality, Security & Privacy, Lifecycle); Standards]
IBM Solutions for Information Governance
GOVERN (Quality, Security & Privacy, Lifecycle): InfoSphere Information Server; InfoSphere Optim & FileNet; InfoSphere Guardium
MANAGE: DB2, Informix, FileNet, solidDB, InfoSphere MDM
INTEGRATE: InfoSphere Information Server; InfoSphere Foundation Tools & Industry Models
ANALYZE: Netezza & InfoSphere Warehouse; Cognos, InfoSphere Warehouse; InfoSphere Streams; InfoSphere BigInsights
Standards
Set of extensive Industry Data Models
Business Terms
● Business Terms define industry concepts in plain business language, with no modeling or abstraction involved. Business Terms have a set of properties and are organized by Business Categories. Clearly defined business terms help standardization within a company. The mapping to the data models makes it possible to create a common enterprise-wide picture of the data requirements and to transform these requirements into IT data structures.
Analytical Requirements
● Analytical Requirements are high-level groupings of the business information that the enterprise needs and uses to express business Measures along axes of analysis, which are named Dimensions. They allow business users to fully articulate the requirements for a piece of analysis using their own business terminology. The Analytical Requirements are the basis for building the enterprise data models used to develop the IT assets that deliver the analytical requirements to the business users.
Supportive Glossary
● A Supportive Glossary is a grouping of terms incorporating any terminology originating from an internal or external source. It is used to support data structures such as regulatory reports (e.g. Basel II, IAS/IFRS, Solvency II), industry standards (ACORD, HIPAA, SEPA, SEC US GAAP, FpML, MISMO), business architecture standards (e.g. EPP), vendor interfaces (e.g. SAS, Fair Isaac, Sendero, Oracle Financials), or legacy source systems (e.g. Loans systems, Underwriting systems).
[Figure: IBM Data Models. Components: Business Terms, Supportive Glossary, Analytical Requirements, Atomic Warehouse Model, Dimensional Warehouse Model]
Atomic Warehouse Model
● The Atomic Warehouse Model is a design level data model that represents the enterprise-wide repository of atomic data used for informational processing. This includes the historization of the value changes of business information that may vary over time, and of which the business wants to keep track for analytical purposes.
Dimensional Warehouse Model
● The Dimensional Warehouse Model is the enterprise-wide repository for analytical data. It contains star schema style dimensional data structures organized around fact entities that support the Analytical Requirements. The Dimensional Warehouse Model can be accessed directly by analytical tools or queries, or its content may be easily distributed to specific downstream data marts, if any.
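The star schema idea described above can be illustrated with a minimal sketch. It is not taken from the IBM models: the table names (`dim_customer`, `dim_time`, `fact_transaction`), the columns and the sample rows are all hypothetical, and SQLite stands in for a real warehouse purely so the example runs anywhere. It shows one fact entity surrounded by dimensions, and a Measure (amount) aggregated along a Dimension (customer segment).

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema: one fact table, two dimensions."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT);
        CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_transaction (
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            time_key INTEGER REFERENCES dim_time(time_key),
            amount REAL
        );
    """)
    # Illustrative sample data only.
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "Retail"), (2, "Corporate")])
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)",
                     [(1, 2012, 1), (2, 2012, 2)])
    conn.executemany("INSERT INTO fact_transaction VALUES (?, ?, ?)",
                     [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 400.0)])


def amount_by_segment(conn):
    """Aggregate the 'amount' measure along the customer-segment dimension."""
    return conn.execute("""
        SELECT c.segment, SUM(f.amount)
        FROM fact_transaction f
        JOIN dim_customer c ON c.customer_key = f.customer_key
        GROUP BY c.segment
        ORDER BY c.segment
    """).fetchall()
```

Calling `build_star_schema` on an in-memory connection and then `amount_by_segment` returns one total per segment; an analytical tool or a downstream data mart would issue queries of the same shape.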
Set of extensive Industry Data Models
● The Models represent the integrated Design time basis for Data Warehouse Deployment
● The model content is provided to the IBM InfoSphere Runtime infrastructure in an ordered manner
● The Analytical Requirements are used to drive the specification of the Business Analytics requirements
● The Supportive Glossary records the upstream external sources as well as any other external regulatory obligations
● The Central Data Warehouse is generated from either the Atomic Warehouse Model or the Dimensional Warehouse Model
● The Data Mart structure development is driven from the Dimensional Warehouse Model
An integrated tooling platform addressing Business Term management and associated Data Model development
● ER data models are managed natively in InfoSphere Data Architect (IDA). This lets data model development leverage the standard benefits of IDA (team support; integration with Cognos, DB2, Netezza, RSA, etc.)
● All of the business-related content (Business Terms, Analytical Requirements and Supportive Glossaries) is managed by the business in InfoSphere Business Glossary (IBG)
● This enables the deployment of the terms to a larger business audience and leverages the management, stewardship, etc. of Business Glossary
● Integration between IBG and IDA is done via the standard IDA plugin provided by IBG
● The plugin enables modellers to view, map to and develop using a synchronized copy of the components in IBG
[Figure: InfoSphere Data Architect (IDA) linked through the IBG Plugin for IDA to InfoSphere Business Glossary (IBG), which holds the Business Terms, Supportive Glossary and Analytical Requirements. IDA users get a read-only view of IBG terms.]
Dimension traceability to Atomic Warehouse Model
Traceability from Dimensions in Dimensional Model to Atomic Warehouse Model
• IDA Dependencies used
• Traceability from the entities and attributes
Dimensional Model in Cognos
The Dimensional Models export directly to the Cognos Framework via the IDA/Cognos bridge
All Facts, Measures and Dimensions defined in IDA are maintained during the export
The Star Schemas defined in the Dimensional Model form the basis of Packages in Cognos Framework Manager
These in turn can be exported to the Cognos server for report generation
IDA to Cognos
Foundation Tools & Beyond
Simplification & Content: reduces project time, risk and cost!
[Figure: tools arranged around a shared Metadata Server and numbered 1 to 7 in the original diagram; only the number for Data Architect is recoverable from the extracted text]
Data Architect (1): Import & Enhance Industry Model
Business Glossary: Define Business Requirement & Glossary
Discovery: Find Data Relationships & Transformation Rules
Information Analyzer: Assess, Monitor, Manage Data Quality Rules
FastTrack: Map Sources to Target Model
DataStage & QualityStage: Generate Logic to Load Warehouse
Cognos: Deliver Reports
Other labels: Establish Platform; Create Business Objects; Links; Populates
12/2/2013
IBM Smart Analytics System P&X Standard Configuration
InfoSphere Warehouse
Cubing Services
Cognos 10.2 BI
ELT
Operational Source Systems Structured/ Unstructured Data
Data Warehouse
System P or X
Implementation Services and AVP
DB2
DB2 Utilities Suite: Image Copy, LOAD, UNLOAD, REORG, etc.
SPSS Modeler
Incremental Update
ELT or ETL
Table or Partition Update
Change Data Capture
Incremental Update
OLTP Application
Data Warehouse
Analytics Accelerator
Synchronizing data to lower data latency from days to minutes/seconds
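The incremental update idea above can be sketched in a few lines. This is an illustration only, not how any IBM Change Data Capture product is implemented: the change-record format `(operation, key, row)` and the function name `apply_changes` are assumptions made for the example, and a plain dictionary stands in for a warehouse table.

```python
def apply_changes(target, changes):
    """Apply an ordered stream of change records to a target table.

    target  -- dict mapping primary key -> row (stand-in for a warehouse table)
    changes -- iterable of (operation, key, row) tuples captured from the
               OLTP source, in commit order (hypothetical record format)
    """
    for operation, key, row in changes:
        if operation in ("insert", "update"):
            # Upsert: only the changed row is touched, not the whole table.
            target[key] = row
        elif operation == "delete":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return target
```

Because only the changed rows travel from source to target, the update can run every few minutes or seconds instead of as a nightly full reload, which is where the latency reduction comes from.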
IBM Warehousing & Analytics
– Offering Positioning…”It Depends”
CUSTOMER Preferences
1. High performance analytic queries and real-time transactions are both required
2. Power Systems, Linux/x series, or System z platform
3. Consistent use of DB2 across IT environment
IBM Netezza
(Appliances)
IBM Smart Analytic System (Optimized systems)
IBM InfoSphere Warehouse (Custom configurations)
CUSTOMER Preferences
1. High performance analytic queries without DBA tuning
2. No storage administration
3. Fastest possible deployment
Platforms: System z, Power Systems, System x, Systems Storage
System z
Strategic Objectives
Drive growth through new workloads: consolidation, analytics, and hybrid computing (zBX)
Expand client base through competitive takeouts and focus on new clients in Growth Markets
Power Systems
Strategic Objectives
Aggressively continue to gain share from HP and Oracle/Sun with Power migration programs
Establish Power as premier platform to execute Cloud, Analytics, and Smarter Planet hypergrowth
System x
Strategic Objectives
Drive IBM Stack growth – integrated with Cloud, Analytics, Smarter Planet
Drive improved x86 value capture model
Systems Storage
Strategic Objectives
Drive differentiation through storage efficiency and data protection
Execute brand transformation plans
Simplicity, Flexibility, Choice: IBM Data Warehouse & Analytics Solutions
IBM Netezza: True Appliance (Simplicity)
IBM Smart Analytics System: Flexible Integrated System (the right mix of simplicity and flexibility)
IBM InfoSphere Warehouse: Custom Solution (Flexibility)
Information Management Portfolio (Information Server, MDM, Streams, etc.)
Warehouse Accelerators
IBM Smart Analytics System: Transparent modular architecture
Choose the way that your data warehouse solution develops. Simply start with any foundation and just add modules as you require.
Core Warehouse Modules
Foundation: Start with a single Foundation Module, the common starting foundation
Scalability and Failover: For additional data handling capacity, number of users or failover functionality, add nodes
Modules: Foundation Module (1 module), Data Module (1 to x modules), User Module (0 to y modules), Failover Module (0 or x/5 modules)
Application Modules
BI and Analytics: InfoSphere Warehouse and Cognos BI modules
Modules: Warehouse Applications Module, Business Intelligence Module
Organized for simplicity and functionality
[Figure labels: Operational Source Systems; ETL/ELT; DB2; Data Warehouse; Data Mart; ODS; Analytics Accelerator; PureData; DS8870; Or AIX; Or z/OS]
Centralized Control of Decision Information
Fast, Consistent, Easily Managed Information
[Figure labels: ETL/ELT; DB2; Data Warehouse; Data Mart 2; Data Mart 3; Analytics Accelerator; Data Studio; Web Applications; Analytic Applications; Business Performance Applications]
Centrally managed
Consistent information
Easy to access
Easy to update
Fast business recovery
Simplified administration
Maximize business value from resources
IBM PureData System
System for Transactions
Pattern-based database deployment in minutes, not hours (1)
Handles more than 100 databases on 1 system (2)
System for Analytics (powered by Netezza technology)
10-100x faster than traditional custom systems (4)
20x greater concurrency and throughput for tactical queries than previous Netezza technology (5)
System for Operational Analytics
Continuous ingest of operational data
Handles 1000+ concurrent operational queries (3)
Up to 10x storage savings with adaptive compression (6)
1. Based on IBM internal tests and system design for normal operation under expected typical workload. Individual results may vary.
2. Based on one large configuration.
3. Based on IBM internal tests of prior generation system, and on system design for normal operations under expected typical workload. Individual results may vary.
4. Based on IBM customers' reported results. "Traditional custom systems" refers to systems that are not professionally pre-built, pre-tested and optimized. Individual results may vary.
5. Based on IBM internal performance benchmarking.
6. Based on client testing in the DB2 10 Early Access Program.
IBM Smart Analytics Advantages
SI by You: Models + Cleansing + ETL + MDM + Data Warehouse + BI
SI by IBM: Models (BDW) + Cleansing (InfoSphere) + ETL (InfoSphere) + MDM (IBM MDM Server) + Data Warehouse (Smart Analytics Server) + BI (Cognos)
Unified Infrastructure Benefits:
Decreased risk by 53%
Improved business alignment 83%
Improve time to value by 75%
Reduce project staffing by 90%
IBM RESEARCH/ANALYST REPORTS
The IBM Big Data Platform
InfoSphere BigInsights: Hadoop-based low latency analytics for variety and volume
Netezza High Capacity Appliance: Queryable Archive for Structured Data
Netezza 1000: BI + Ad Hoc Analytics on Structured Data
Smart Analytics System: Operational Analytics on Structured Data
Informix Timeseries: Time-structured analytics
InfoSphere Warehouse: Large volume structured data analytics
InfoSphere Streams: Low Latency Analytics for streaming data
InfoSphere Information Server: High volume data integration and transformation
Categories shown: Data-At-Rest; Data-In-Motion; Hadoop; MPP Data Warehouse; Stream Computing; Information Integration; Velocity, Variety & Volume
Big Data Concepts and Hardware Considerations
Apache Hadoop: open source framework for the distributed processing of large data sets across clusters of computers using a simple programming model
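Assuming the "simple programming model" refers to MapReduce, the classic word-count example below sketches its three phases in plain Python. It runs in a single process and does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative only. On a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce sequentially over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

The developer writes only the map and reduce functions; distribution, grouping and fault handling are left to the framework, which is what makes the model simple to program against.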
The IBM Big Data Platform
Integrate and manage the full variety, velocity and volume of data
Apply advanced analytics to information in its native form
Visualize all available data for ad-hoc analysis
Development environment for building new analytic applications
Workload optimization and scheduling
Security and Governance
IBM’s Value: Complementary Analytics
Traditional Approach: Structured, analytical, logical
Structured, Repeatable, Linear
Monthly sales reports, Profitability analysis, Customer surveys
Data Warehouse
Traditional Sources: Internal App Data, Transaction Data, ERP data, Mainframe Data, OLTP System Data
New Approach: Creative, holistic thought, intuition
Unstructured, Exploratory, Iterative
Brand sentiment, Product strategy, Maximum asset utilization
Hadoop, Streams
New Sources: Web Logs, Social Data, Text Data: emails, Sensor data: images, RFID
Enterprise Integration
The Big Data Ecosystem: Interoperability is Key
[Figure labels: Traditional/Relational Data Sources; Non-Traditional/Non-Relational Data Sources; Streaming Data; Internet-Scale Data Sets; InfoSphere Streams (Analytics on Data In-Motion); InfoSphere BigInsights (Analytics on Data at Rest); Data Warehouse / Traditional Warehouse (Analytics on Structured Data)]
On a Smarter Planet, technology innovation redefines industries
Trading
Traffic Control
Fraud Prevention
Law Enforcement
Netezza and Industry Models
● The industry strength of the DW models plays to the typical Netezza vertical approach
● Use Netezza as the basis for any dimensional structures generated from traditional Data Warehouse models
● Enables models to be deployed to leverage the traditional Netezza strengths
● Aligns with typical usage/topology for Netezza
● Generate DDL from IDA and customize the distribution clause to run in Netezza
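The last bullet can be sketched as a small post-processing step. This is an assumption about how one might do it, not a feature of IDA: the helper name `add_distribution_clause` is hypothetical, and it expects a single generated `CREATE TABLE` statement that does not already carry a distribution clause. It appends Netezza's `DISTRIBUTE ON` clause, using hash distribution on the given columns or `DISTRIBUTE ON RANDOM` when none are given.

```python
def add_distribution_clause(ddl, columns=None):
    """Append a Netezza DISTRIBUTE ON clause to one CREATE TABLE statement.

    ddl     -- a single CREATE TABLE statement, e.g. generated by a modelling
               tool (assumed to have no distribution clause yet)
    columns -- distribution key columns; None or empty means random distribution
    """
    # Drop the trailing semicolon so the clause can be appended before it.
    statement = ddl.strip().rstrip(";").rstrip()
    if columns:
        clause = "DISTRIBUTE ON (" + ", ".join(columns) + ")"
    else:
        clause = "DISTRIBUTE ON RANDOM"
    return f"{statement}\n{clause};"
```

For a fact table, the distribution key would typically be a column that is frequently joined on and has many distinct values, so that rows spread evenly across the appliance; that choice is a design decision the generated DDL cannot make on its own.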
Where Does a Data Warehouse Fit in the IT Environment?
[Figure: the information supply chain diagram with Data Warehouses highlighted. Labels: Transactional & Collaborative Applications; Business Analytic Applications; External Information Sources (www); Manage (Structured Data, Master Data, Content, Streaming Information); Integrate; Analyze (Streams, Big Data, Data Warehouses); Govern (Quality, Lifecycle Management, Security & Privacy)]
The Business Solution templates can be deployed directly onto a Netezza DW environment
[Figure: Data Sources feeding Business Solution Template areas: Profitability, Risk Management, Regulatory Compliance]
● The corporation chooses the required structures to address its specific business needs from the 145 pre-defined Business Solution Templates
● Parallel projects can select from different areas to ensure consistency of reporting across the enterprise
The BDW Business Solution templates can be deployed in a “Conformed Dimension” configuration, all on a Netezza DW environment
[Figure: Data Sources feeding Profitability, Risk Management and Regulatory Compliance areas that share a Conformed Dimension Layer]
● Different Reporting areas can share a “Conformed Dimension Layer”
● Ensures consistency of Dimensional structures such as “Customer”, “Product”, “Time” across the enterprise
● This means that a financial institution can build up a cross-enterprise dimensional data warehouse over time in small, business-focused, bite-sized chunks ... all on Netezza!
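A conformed dimension layer can be sketched as two reporting areas whose fact tables point at the same dimension table. The schema below is hypothetical (`dim_customer`, `fact_profitability`, `fact_risk` and the sample rows are invented for illustration, and SQLite stands in for Netezza); it only shows why a shared "Customer" dimension lets results from separate areas line up.

```python
import sqlite3


def build_conformed_layer(conn):
    """Two hypothetical reporting areas sharing one conformed Customer dimension."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_profitability (customer_key INTEGER, profit REAL);
        CREATE TABLE fact_risk (customer_key INTEGER, exposure REAL);
        INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO fact_profitability VALUES (1, 120.0), (2, 80.0), (1, 30.0);
        INSERT INTO fact_risk VALUES (1, 500.0), (2, 900.0);
    """)


def profit_and_exposure(conn):
    """Aggregate each fact table separately, then combine via the shared dimension."""
    return conn.execute("""
        SELECT c.name, p.total_profit, r.total_exposure
        FROM dim_customer c
        JOIN (SELECT customer_key, SUM(profit) AS total_profit
              FROM fact_profitability GROUP BY customer_key) p
          ON p.customer_key = c.customer_key
        JOIN (SELECT customer_key, SUM(exposure) AS total_exposure
              FROM fact_risk GROUP BY customer_key) r
          ON r.customer_key = c.customer_key
        ORDER BY c.name
    """).fetchall()
```

Each fact table is aggregated on its own before the join, so adding a new reporting area later means adding one more fact table against the existing dimensions rather than reworking what is already deployed.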