Luncheon Webinar Series
June 3rd, 2010
Deep Dive – MetaData Workbench
Sponsored By:
Deep Dive – MetaData Workbench
• Questions and suggestions regarding presentation topics? - send to [email protected]
• Downloading the presentation
– http://www.dsxchange.net/MetaDataWorkbench.html
– Replay will be available within one day; an email with details will follow
• Pricing and configuration - send to [email protected]
• Bonus Offer – Free premium membership for your DataStage management! Submit your manager’s email address and we will offer them access on your behalf.
– Email [email protected] with subject line “Managers special”.
– Join us all at LinkedIn http://tinyurl.com/DSXmembers
Tips and Tricks for Managing, Administering Metadata Successfully
TSB-3403
Marc Haber
Functional Architect, Infosphere Metadata Tools
© Copyright IBM Corporation 2010. All rights reserved. U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY. WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE. IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION. NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, NOR SHALL HAVE THE EFFECT OF, CREATING ANY WARRANTIES OR REPRESENTATIONS FROM IBM (OR ITS SUPPLIERS OR LICENSORS), OR ALTERING THE TERMS AND CONDITIONS OF ANY AGREEMENT OR LICENSE GOVERNING THE USE OF IBM PRODUCTS AND/OR SOFTWARE.
IBM, the IBM logo, ibm.com, and Infosphere are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml
Disclaimer
Agenda
• Introduction
– InfoSphere Information Server
– InfoSphere Foundation Tools
– Metadata Primer
• Getting Started
– Goals
– Architecture
– Administration Tasks
• Product Demonstration
– Import, Manage and Deliver
• Summary and Conclusion
Introduction
• Simplify delivery of Trusted Information
• Accelerate Client Value
• Promote Collaboration
• Mitigate Risk
• Modular, yet Integrated
• Scalable – Project to Enterprise
An Industry Unique Information Platform
InfoSphere Vision
IBM InfoSphere Information Server
Discover, model, define, and govern information structure and content
Standardize, merge, and correct information
Combine and restructure information for new uses
Synchronize, virtualize and move information for in-line delivery
Unified Deployment
Unified Metadata Management
InfoSphere Information Server
Business Glossary
Manage Business Terms
Data Architect
Design Enterprise Models
Assess Data Quality
Information Analyzer
Capture Design Specifications
FastTrack
Monitor Data Flows
Metadata Workbench
IBM Industry Models
Leverage Industry Best Practices
Discovery
Understand Data Relationships
Metadata
InfoSphere Foundation Tools
Enterprise Projects
Test Data Generation
Application Retirement & Consolidation
Data Archival
Data De-identification
Data Quality
Data Integration
Master Data Management
Data Warehousing
Manage Business Terms
Business Glossary
Design Enterprise Models
Data Architect
Capture Design Specifications
FastTrack
Monitor Data Flows
Metadata Workbench
InfoSphere Foundation Tools
Assess, Monitor, Manage Data Quality
Information Analyzer
Discover Data Relationships
New – Discovery
Discover, Design, Govern
• Discover and understand the data across heterogeneous systems
• Design trusted information structures for business optimization
• Govern that information over time
InfoSphere Foundation Tools Portfolio
Compliance
Analysis Reporting for compliance measures in ensuring data quality and trust of Data Sources
Standards
Data Flow reports are requirements of Sarbanes-Oxley, Basel II and other regulatory standards
Change
Understanding and reacting to the impact of change of Data Sources and structures
Governance
Asset catalog and metadata reporting for Data Governance initiatives and requirements
Infosphere Metadata Workbench
Literally, “data about data” helps to describe a company’s information from business, technical, and operational perspectives
Practically, “information that is important and critical”, “information that is difficult to grasp or fully understand”, “information that is continually emerging and processed”
Metadata Primer
Business Metadata
Audience: Business users
Purpose: Business rules, definitions, terminology, glossaries, algorithms and lineage using business language
Technical Metadata
Audience: Specific Tool Users – BI, Data Integration, Profiling, Modeling
Purpose: Defines source and target systems, table and field/attribute structures, derivations and dependencies
Operational Metadata
Audience: Operations, Management
Purpose: Information about application runs: frequency, record counts, component by component analysis, other statistics
Metadata Primer – standard definition
Meaning
Understand the true meaning of a concept: what business process or entity it represents, what business rules govern it, what specifications define it, what concepts are related
Size and Construct
Understand the length, type and structure of a concept
Metrics
Understand the cardinality, range, valid values and frequency of a concept
Usage
Trace the data flow through systems and applications; understand what processes and logic are involved in moving, transforming or otherwise aggregating data
Metadata Primer – user definition
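The "Metrics" view of a concept (cardinality, range, valid values, frequency) can be illustrated with a short Python sketch. It is a hand-rolled example under stated assumptions, not how Information Analyzer or Metadata Workbench computes profiles: the `profile` function, the sample `ages` column and the valid range 0 to 120 are all invented for illustration.

```python
from collections import Counter

def profile(values, valid=None):
    """Compute simple column metrics; None is treated as a missing value."""
    present = [v for v in values if v is not None]
    freq = Counter(present)
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "cardinality": len(freq),  # number of distinct values
        "range": (min(present), max(present)) if present else None,
        "frequency": freq.most_common(),  # (value, occurrences), most frequent first
        # Values outside the allowed set; empty when no set is supplied.
        "invalid": sorted(set(present) - set(valid)) if valid is not None else [],
    }

# Hypothetical sample column: one missing value and one implausible age.
ages = [34, 41, 34, None, 29, 150]
report = profile(ages, valid=range(0, 121))
print(report["cardinality"], report["range"], report["invalid"])
```

The same four measures map directly onto the slide: distinct count for cardinality, minimum and maximum for range, the `invalid` list for valid values, and the counter for frequency.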
Governance and Compliance Regulations are increasing
How do organizations comply and meet documentation requirements?
How can organizations ensure accountability and responsibility?
Business Competition continues to grow
How do organizations individualize their customer experience?
How can organizations get access to information to make correct decisions?
Costs and system complexities are expanding
How can organizations drive optimization with integration?
How do organizations manage complex software environments?
Metadata Business Drivers
Job Design Analysis:
Analysis is defined as the projected flow of information across different DataStage Jobs where the target and source Stages share a common data source. Such information is necessary to determine the Impact of Change or Data Flow Analysis Reports delivered by the InfoSphere Metadata Workbench.
Linkage of Jobs via their common Stage Types and properties.
Requires Automated Linkage service to be invoked
Does not require user to load or use Physical Schemas or Files
Metadata Primer – Design Metadata
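The idea of design-time linkage, where one Job's target Stage and another Job's source Stage point at the same data, can be pictured with a small Python sketch. This is an illustration only and not the Automated Linkage service: the `Stage` record, the job names, and the matching rule (case-insensitive equality of server, schema and table) are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Stage:
    job: str
    role: str  # "source" or "target"
    server: str
    schema: str
    table: str

    def data_set(self):
        # Assumption: names are compared case-insensitively.
        return (self.server.lower(), self.schema.lower(), self.table.lower())

def infer_job_links(stages):
    """Return sorted (upstream_job, downstream_job, data_set) tuples."""
    targets = [s for s in stages if s.role == "target"]
    sources = [s for s in stages if s.role == "source"]
    links = set()
    for t, s in product(targets, sources):
        # A link exists when one job writes what a different job reads.
        if t.job != s.job and t.data_set() == s.data_set():
            links.add((t.job, s.job, t.data_set()))
    return sorted(links)

# Hypothetical job designs; no physical schema is consulted.
stages = [
    Stage("LoadStaging", "source", "SRV1", "CRM", "CUSTOMER"),
    Stage("LoadStaging", "target", "DWH", "STG", "CUSTOMER"),
    Stage("BuildMart", "source", "dwh", "stg", "customer"),
    Stage("BuildMart", "target", "DWH", "MART", "DIM_CUSTOMER"),
]
for up, down, ds in infer_job_links(stages):
    print(f"{up} -> {down} via {'.'.join(ds)}")
```

Note that the match uses only values found in the job designs, which mirrors the point that design analysis does not require Physical Schemas or Files to be loaded.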
Job Operational Analysis:
Analysis is defined as the actual flow of information, from a Source data item through a set of actions defined within a DataStage Job and written to a Target data item, based upon the Operational Job Run logs of the Job. It forms a complete ETL Data Flow diagram, analyzing the sources of information, Job Run statistics and Transformation logic.
Linkage of Jobs via their Job Run Operational Logs
Requires import of Operational Metadata
Requires Automated Linkage service to be invoked
Metadata Primer – Operational Metadata
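Where design analysis projects a flow, operational analysis records what actually ran. A minimal Python sketch of that distinction follows. The run records are hypothetical and heavily simplified; real operational metadata has its own format, which this example does not try to reproduce.

```python
from collections import defaultdict

# Hypothetical, simplified job-run records (one entry per completed run).
runs = [
    {"job": "LoadStaging", "read": "CRM.CUSTOMER", "wrote": "STG.CUSTOMER", "rows": 1200},
    {"job": "LoadStaging", "read": "CRM.CUSTOMER", "wrote": "STG.CUSTOMER", "rows": 1250},
    {"job": "BuildMart", "read": "STG.CUSTOMER", "wrote": "MART.DIM_CUSTOMER", "rows": 1250},
]

def summarize_runs(run_records):
    """Aggregate observed flows: (source, job, target) -> run count and rows."""
    flows = defaultdict(lambda: {"runs": 0, "rows": 0})
    for r in run_records:
        key = (r["read"], r["job"], r["wrote"])
        flows[key]["runs"] += 1
        flows[key]["rows"] += r["rows"]
    return dict(flows)

flows = summarize_runs(runs)
for (src, job, tgt), stats in sorted(flows.items()):
    print(f"{src} -> [{job}] -> {tgt}: {stats['runs']} run(s), {stats['rows']} rows")
```

Only flows that appear in a run record show up in the result, so a Job that was designed but never executed contributes nothing here.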
Features
• Explore, analyze and manage assets
• Data Lineage and Impact Analysis
• Extended visibility to enterprise integration flows outside of Information Server
• Full searching and querying across Information Assets
Benefits
• Mitigate risk for change management
• Support compliance and governance initiatives
• Comprehensive understanding of data lineage for trusted information
IT Developers
Administrators
Project Managers & DBAs
Exploration and Analysis of Information Assets
InfoSphere Metadata Workbench
• View end-to-end lineage including design metadata, operational metadata, user-defined metadata
• View context-specific details including stewards, term, description, Job image, Job operational metadata details, etc.
Data Lineage
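End-to-end lineage and impact analysis can both be thought of as walking a graph of assets in one direction or the other. The Python sketch below shows that idea with a breadth-first traversal. The edge list and asset names are hypothetical, and nothing here reflects how Metadata Workbench stores or queries lineage internally.

```python
from collections import deque

# Hypothetical lineage edges: each pair means "data flows from left to right".
EDGES = [
    ("CRM.CUSTOMER", "job:LoadStaging"),
    ("job:LoadStaging", "STG.CUSTOMER"),
    ("STG.CUSTOMER", "job:BuildMart"),
    ("job:BuildMart", "MART.DIM_CUSTOMER"),
    ("MART.DIM_CUSTOMER", "report:CustomerSummary"),
]

def trace(start, edges, downstream=True):
    """Breadth-first walk; returns reachable assets in visit order."""
    graph = {}
    for src, dst in edges:
        # Reverse every edge when tracing back towards the sources.
        a, b = (src, dst) if downstream else (dst, src)
        graph.setdefault(a, []).append(b)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

print("Impact of changing STG.CUSTOMER:", trace("STG.CUSTOMER", EDGES))
print("Where STG.CUSTOMER comes from:", trace("STG.CUSTOMER", EDGES, downstream=False))
```

Walking downstream answers "what is affected if this table changes", while walking upstream answers "where did this data come from".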
• Business oriented view of Data Lineage Analysis report
• Business Lineage is configured within the Metadata Workbench, explicitly including only key Data Assets
Business Lineage
• Data Catalog
browse data structures, including Database, Data File, BI Report and Job assets
• Asset Details
display asset information, including relationships and usage details
Catalog and Display
• Asset Usage
understand ETL Jobs or Mapping consumption, Business Glossary defined meaning, Data Steward, Mapping Specification requirement from FastTrack or Analysis Profiling data from Information Analyzer
• Asset Information
display base information, including description, container and relationships
Asset Display Information
• Homepage
quickly search, display or query Information Assets
• Query Results
formatted as a spreadsheet, for easier understanding and readability
Search and Query
• Results
Formatted as a spreadsheet, for easier understanding and readability
Grouped according to Type
Ability to save as Spreadsheet or Text File
Query Result Information
• Create specific ad-hoc Reports
Select Information Asset properties and Relationships or their properties
Add specified conditioning filters
Publish Queries for all users
Query Construction
Getting Started
Design Document
Abstract definition and specification which govern the flow of information from Source System for Reporting, OLAP and Mining deliverables.
Governance and Auditing requirements dictate the need for Data Lineage reporting analysis.
Design Specification
1. System Application
2. Data File
3. Database Warehouse & Mart
4. BI Reports
5. DataStage Jobs
6. Data Scripts
7. Data Flow Analysis
Identify and Plan the Tasks
Data Lineage
Ability to view Data Flow, validate Systems of Record, validate Business Logic
Data Reporting
Ensure compliance and data re-use, understand data consumption
Data Terminology
Ensure standardized language, descriptions and methodology
Data Consistency
Ensure proper Data Formatting, Data Type and Value Range
Goals
Import metadata about Database Tables and Files that are used in Job Design and Production
Import metadata about BI Reports used to publish information
Define and import Extended Data Sources and external Data Mappings for a complete end-to-end lineage flow
Publish shared metadata as necessary
Generate and import operational metadata from job runs
Invoke Metadata Workbench administrative services
Metadata Preparation
Did you know? Design metadata for DataStage and QualityStage jobs is automatically stored in the metadata repository as well as metadata from all other suite tools.
[Architecture diagram. Labels:]
Author and link to IT assets: Business Glossary, Metadata Workbench, Information Analyzer, FastTrack, InfoSphere Data Architect
IT assets: BI reports, physical schemas, DS/QS jobs (Import/Export Manager or DataStage connectors)
Metadata Server, manage content: Business, Operational, Technical, ETL Design, ETL Operational, BI Structure, Data Structure
Metadata Workbench: Lineage, Querying, Understanding; Data Lineage and Impact Analysis; Data Reporting and Querying
Metadata Workbench Architecture
Features
• Import capabilities for 3rd party BI tools (Cognos, Business Objects, MicroStrategy), data modeling tools (ERwin, RDA) and databases (ODBC connections to all major RDBMS)
• Metadata Bridges interchange metadata with each specific application and consist of a model, a decoder, and an encoder, which require no coding.
• Support a variety of import formats including XMI, XML, UML, CWM and CSV metadata exchange formats
Benefits
• Visibility of data modeling to ETL to report layer minimizes risks of overlooking critical dependencies
• Leverage common metadata exchange environment for application development consistency
IT Developers
IT Administrators
Infosphere Import Export Manager
IT Developers
IT Administrators
• Data Source
import and maintain application, procedure or file definitions from spreadsheets
• Data Flow import and maintain source to target mappings, their business logic and function from spreadsheets
Infosphere Import Extended Data Source
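Before importing source-to-target mappings kept in a spreadsheet, it can help to check the rows for obvious gaps. The Python sketch below reads a CSV export and separates usable rows from incomplete ones. The column names (`source`, `target`, `rule`, `function`) and the sample rows are invented for this example; they are not the import format expected by Metadata Workbench, which should be taken from the product documentation.

```python
import csv
import io

# Hypothetical spreadsheet export; column names are invented for this sketch.
SHEET = """source,target,rule,function
App.Orders.amount,DWH.FACT_ORDER.AMOUNT,Convert to USD,currency_convert
App.Orders.cust_id,DWH.FACT_ORDER.CUSTOMER_KEY,Lookup surrogate key,lookup
,DWH.FACT_ORDER.LOAD_DATE,Set to load time,current_timestamp
"""

def read_mappings(text):
    """Split rows into usable mappings and rows that fail basic checks."""
    valid, rejected = [], []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        row = {k: (v or "").strip() for k, v in row.items()}
        if row["source"] and row["target"]:
            valid.append(row)
        else:
            rejected.append((line_no, row))
    return valid, rejected

valid, rejected = read_mappings(SHEET)
print(f"{len(valid)} usable mapping(s)")
for line_no, row in rejected:
    print(f"line {line_no}: missing source or target for {row['target'] or row['source']}")
```

A check like this catches rows that would otherwise produce a mapping with no source or no target, which would leave a gap in the lineage flow.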
IT Developers
IT Administrators
• Data Flow Mapping
document and express the transformation or business logic between source and target
• Create or Import
create Extended Data Flow Mapping documents within the Metadata Workbench or import from a file
• Custom Attributes
extend the properties of a mapping to record specific and proprietary information, including runtime data, specification or organizational data
Infosphere Import Extended Data Mapping
Metadata Administrators
• Ability to include or exclude Projects
• Intelligent metadata linking
• Ability to schedule Analysis Services
• Ability to map Database Aliases
• Enhanced and extended support for Stages
Allows administrators to minimize time spent maintaining and managing metadata assets and to reduce the number of errors introduced by manual reconciliation processes.
Infosphere Data Lineage Administration
As a developer creates the Job canvas, they are building a flow of data from the Source to the Target of the Job. That flow, connected with other Job flows, will translate into Data Lineage.
The Metadata Workbench Linkage Services will infer a relationship between two DataStage Jobs, based upon a common Data Set.
DataStage and QualityStage Development
A proper Job Design that maintains standards for naming and data connectivity ensures greater linkage between the Job Design and the imported Data Source.
• Database Connectors
• Job Parameters and Environment Variables
• Load Column information from Shared Table
• Supported DataStage Stage Types
• DataStage Common Connector Stages
• Build SQL vs. User Defined SQL
DataStage and QualityStage Job Design
The following DataStage and QualityStage stages are supported by the IBM Metadata Workbench analysis service in determining cross-Job relationships based upon the values of the Stage properties.
(S) = Server Canvas, (P) = Parallel Canvas, (M) = Mainframe Canvas
DB2 Native
Stages: DB2 UDB API (S, P), DB2/UDB Enterprise (P), DB2 UDB Load (S, P)
Properties: Server Name, Schema Name, Table Name
RDBMS Native
Stages: Dynamic RDBMS (S, P)
Properties: Server Name, Schema Name, Table Name
MSOLE Native
Stages: MS OLEDB (S)
Properties: Server Name, Schema Name, Table Name
MSSQL Native
Stages: MS SQL Server Load (S), SQL Server Enterprise (P)
Properties: Server Name, Schema Name, Table Name
Oracle Native
Stages: Oracle 7 Load (S), Oracle Enterprise (P), Oracle OCI (S), Oracle OCI Load (S)
Properties: Server Name, Schema Name, Table Name
Sybase Native
Stages: Sybase BCP Load (S), Sybase Enterprise (P), Sybase IQ 12 Load (S), Sybase OC (S)
Properties: Server Name, Schema Name, Table Name
ODBC
Stages: ODBC (S), ODBC Connector (P), ODBC Enterprise (P)
Properties: Server Name, Schema Name, Table Name
Teradata
Stages: Teradata API (S, P), Teradata Connector (P), Teradata Enterprise (P), Teradata Export (M), Teradata Load (S, M), Teradata Multiload (S, P), Teradata Relational (M)
Properties: Server Name, Table Name
Complex Flat File
Stages: Complex Flat File (S, P, M)
Properties: File Name
Other Flat File
Stages: Delimited Flat File (M), Fixed-width Flat File (M), Multi-format Flat File (M)
Properties: File Name
Hash File
Stages: Hashed File (S)
Properties: File Name
Sequential File
Stages: Sequential File (S, P)
Properties: File Name or Pattern
Other types of DataStage Stages may be manually associated to Database Tables or Data File Elements.
Infosphere Data Lineage Support
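The stage list above says which property values are compared for each family of Stage. A rough Python sketch of turning those properties into a comparable key follows. It is an assumption-laden illustration, not the analysis service: the `KEY_PROPERTIES` grouping, the `ALIASES` map (standing in for the "map Database Aliases" administration feature), the lower-casing of values, and the choice to skip values that still contain a `#...#` job parameter reference are all simplifications made for the example.

```python
# Properties compared per stage family, following the stage list above.
KEY_PROPERTIES = {
    "database": ("Server Name", "Schema Name", "Table Name"),
    "teradata": ("Server Name", "Table Name"),
    "file": ("File Name",),
}

# Hypothetical alias map: connection name used in a job -> imported host name.
ALIASES = {"DWH_PROD": "dwhhost01"}

def linkage_key(family, props, aliases=ALIASES):
    """Build a comparable key, or None when a required property is missing
    or still holds an unresolved job parameter such as #pTable#."""
    values = []
    for name in KEY_PROPERTIES[family]:
        value = props.get(name, "").strip()
        if not value or "#" in value:
            return None
        if name == "Server Name":
            value = aliases.get(value, value)
        # Simplification: compare case-insensitively, including file names.
        values.append(value.lower())
    return tuple(values)

job_stage = {"Server Name": "DWH_PROD", "Schema Name": "STG", "Table Name": "CUSTOMER"}
imported = {"Server Name": "dwhhost01", "Schema Name": "stg", "Table Name": "customer"}
print(linkage_key("database", job_stage) == linkage_key("database", imported))
print(linkage_key("teradata", {"Server Name": "td1", "Table Name": "#pTable#"}))
```

The sketch also hints at why naming standards and resolvable parameters matter in Job Design: a key can only match an imported table when every property it relies on has a concrete, consistent value.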
Product Demonstration
Summary and Conclusion
Step 1: Understanding the objectives
Step 2: Defining the Tasks
Step 3: IBM Infosphere – Delivering Lineage and Understanding
Summary
Thank You!
Your Feedback is Important to Us
Don’t Miss these Foundation Tools Sessions!!
• Future Directions in Integrated Data Quality
USL-3873 02:00 PM - 04:00 PM
• Introduction and Overview - InfoSphere Foundation Tools
Featuring Business Partner: Accantec Information Solutions
TSB-3392 03:00 PM - 03:50 PM
• The Evolution of a Complex Data Warehouse with InfoSphere Foundation Tools
Customer: Consip S.p.a
TSB-3333 05:15 PM - 06:05 PM
• Building Business-led Informational Solutions with Industry Models, InfoSphere Warehouse, Business Glossary and Cognos
TSB-3593 05:15 PM - 06:05 PM
Wed - May 19
Customer Sessions, Presentations, Usability Sessions, Live demos, Hands-On Labs
• Data Discovery & Mapping to Accelerate Information Centric Projects
TSB-3405 07:45 AM - 08:45 AM
• Using Information Analyzer for Data Quality Health Monitoring
TSB-3410 07:45 AM - 08:45 AM
• Reduce costs, speed collaboration, and access critical data w/ low impact using Foundation Tools
HOL-3845 10:30 AM - 01:30 PM
• Get the Most Out of Your Data Modeling & Metadata
Customer: Danske Bank
TSB-3496 11:45 AM - 12:35 PM
• A Metadata Based Approach to Data Governance
Customer: Deutsche Bank
BLD-3615 02:00 PM - 02:50 PM
• InfoSphere Foundation Tools Deep Dive & Roadmap
TSB-3393 02:00 PM - 02:50 PM
• Governing Your Information Supply Chain
TSB-3379 02:00 PM - 02:50 PM
• Do You Really Trust Your Information? See How You Can - Live Demos Included
TSB-2902 02:00 PM - 02:50 PM
Thu – May 20
• Tips & Tricks for Managing & Administering Successful Metadata
TSB-3403 07:45 AM-08:45 AM
• Succeed In Getting All Stakeholders Involved Using Business Glossary
TSB-3414 07:45 AM-08:45 AM
• Delivering Smart Analytics, ROI & Business Benefits through the InfoSphere Portfolio
Customer: 3UK
BLD-3493 9:00 AM – 9:50 AM
• Accelerate Master Data Design and Definition using InfoSphere Discovery
TSB-3545 9:00 AM – 9:50 AM
• Industry Models for Basel II Compliance and Risk Management
Customer: CitiGroup
BLD-3022 12:30 PM – 01:30 PM
Fri - May 21
** Visit Our Live Demos Every Day @ The Demo Room! **
• Understand and Map Your Distributed Data
• Integrated Metadata for Enterprise Collaboration and Trust
• Assess Information Quality and Health
• Proven Models that Accelerate Your Information Agenda
Contact us for more information about IBM InfoSphere Metadata Workbench
Marc Haber [email protected]
Functional Architect, Metadata Tools
Infosphere Metadata Workbench product specialist
Farnaz Erfan [email protected]
Product Marketing Manager