© Copyright 2012 Your organization 1
Strategy for Data Governance
Replace with your name & organization
• Las Vegas • February 18, 2008

Outline
Benefits of a data governance strategy
Components of a data governance strategy
Organization, roles and responsibilities
Impact of a data governance strategy on BI and IT
How to implement a data governance strategy program

Why you need a data governance strategy
CEO: I would like an accounting of the company’s financial assets.
CFO: Uhh … let me see. I think we still have enough money in our bank accounts to cover payroll this month, and uhh … I’m not sure if there are any outstanding accounts receivable … Uhh and – hmm … let me think …

Why you need a data governance strategy
CEO: I would like an accounting of the company’s information assets.
CIO: Uhh … let me see. I don’t really have an inventory of all the data, and I’m not sure what data is in which database, or how much of that data is redundant and inconsistent. I also can’t vouch for the quality of the data … Uhh and – hmm … let me think …

Do these problems exist in your organization?
Replace with your problems

Do these problems exist in your organization?
Room for more problems and issues

Motivations for Data Governance
SEC audits and risk of losing investors
Risk of fines and incarceration due to inaccurate regulatory reporting
Risk of losing customers due to poor data quality
Loss of productivity due to excessive and uncontrolled redundancy
Suboptimal business performance

Technology Solutions
Enterprise Resource Planning (ERP)
Data Warehousing (DW & BI)
Customer Relationship Management (CRM)
Supply Chain Management (SCM)
…

Data Warehousing
DW Promises
Data integration
No more uncontrolled data redundancy
Consistency of data content
Improved data quality
Historical enterprise data
Unlimited ad-hoc reporting
Reliable trend analysis reporting
Business intelligence capabilities
DW Reality
Stove-pipe data marts and departmental data warehouses
Continued redundancy, sometimes even increased data redundancy
Data is still inconsistent among data marts and data warehouses (no central staging area, no reconciliation totals)
Little improvement to data quality
Historical data is limited to departmental views
Limited ad-hoc reporting (too complicated, missing relationships, poor performance)
Inconsistent trend analysis reports among data marts
BI capabilities compromised by inconsistent and unreliable key performance indicators (KPI)

Customer Relationship Management
CRM Promises
Data integration
Non-redundant customer data
Data quality
Increased customer satisfaction
Product pricing customization
Knowledge of customer wallet share
CRM Reality
More stove-pipe systems
Continued redundancy, more departmental views, purchased packages not integrated
Dirty customer data continues
Decreased customer satisfaction because of poor-quality customer data
Wrong pricing because of departmental views, still not cross-organizational
Privacy issues and dirty data led to government regulations

The Lesson?
You cannot keep doing what you have always done and expect the results to be different.
“That wouldn’t be logical” (Spock, Star Trek)
Not even with new technology.

Data Governance Defined …
Consultants
“The execution and enforcement of authority over the management of data assets and the performance of data functions” (Robert Seiner)
“The process by which you manage the quality, consistency, usability, security, and availability of your organization’s data” (Jane Griffin)
“A process and structure for formally managing information as a resource. Ensures the appropriate people representing business processes, data, and technology are involved in the decisions that affect them; includes an escalation and decision path for identifying and resolving issues, implementing changes, and communicating resulting actions” (Danette McGilvray)

Data Governance Defined …
Clients
“Unites people, process, and technology to change the way data assets are acquired, managed, maintained, transformed into information, shared across the company as common knowledge, and consistently leveraged by the business to improve profitability.” (Wachovia)
“Resolving data issues using a horizontal perspective of the organization and focusing on the major “pain points” for our business areas.” (Sallie Mae)
“A framework of accountabilities and processes for making decisions and monitoring the execution of data management.” (BMO)

Data Governance Defined …
Vendors
“The orchestration of people, process, and technology to enable the leveraging of data as an enterprise asset. It includes policies, procedures, organization, roles, and responsibilities, with associated communication and training required to design, develop, and provide ongoing support for the effort.” (SAP)
“An organization-wide commitment to data quality, with data stewardship recognized as an essential business role.” (DataFlux)

Data Governance Defined …
Other
The execution of authority over the management of data
Data quality – including conformance to valid values, uniqueness, non-redundant, complete, accurate, understood, timely, referential integrity
Metadata creation and maintenance – information about data, both technical and business
Master data management (MDM)
Data integration
Data categorization for performance, availability, and security

Outline
Benefits of a data governance strategy
Components of a data governance strategy
Organization, roles and responsibilities
Impact of a data governance strategy on BI and IT
How to implement a data governance strategy program

Components of a DG strategy
Data standardization
Data integration
Data modeling
Data quality
Metadata management
Security and privacy
Performance and measurement
DBMS and product selection
Business intelligence

Data standardization
Formal data definitions
Business data naming standards
Class words lexicon
Technical data naming standards
Common words lexicon
Data domain standards

Our Situation with Standardization
Insert your standardization status

Formal Data Definitions
A data definition must reflect the real-world meaning
A data definition explains the content and meaning of the unique data element
A data definition must be complete enough to ensure a thorough understanding of the data element
Data definitions are short and precise (one paragraph) and (optionally) may contain examples
Data definitions should never contain information about the source or use of the data elements
Example: Well Depth Feet
Bad definition: “The depth of the well in feet”
Good definition: “The total depth of the well in feet from the surface of the surrounding ground to the deepest point dug or drilled regardless of the depth of the well casing.”
Source: The DW Challenge by Michael Brackett

Data Naming Standards - Business
The name of an attribute should be derived from its definition
Attribute names are always fully spelled out
Attribute names should have 3 components:
– Prime word
– Qualifiers (modifiers)
– Class word
Attribute names should be fully qualified
Attribute names should always end with an approved class word
Use only class words from an approved class words lexicon
Attribute name components should be business terms, not technical terms
Example: “Checking Account Monthly Average Balance”

Class Words Lexicon
Business Data Domains (Approved and Published)
Amount . . . Dec 9,2
Balance . . . Dec 13,2
Code . . . Char 1-5
Count . . . Small Int
Date . . . Date
Description . . . Varchar
Identifier . . . Integer
Indicator . . . Char 1
Name . . . Char 15-40
Number . . . Integer
Percent . . . Dec 5,2
Quantity . . . Small Int
Rate . . . Dec 6,4
Text . . . Varchar 250
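
The class word rule can be checked mechanically. The following is a minimal sketch, assuming the lexicon is exactly the fourteen class words listed above; the names APPROVED_CLASS_WORDS, class_word and check_attribute_name are hypothetical and not part of the original material.

```python
from typing import List, Optional

# Assumption: the approved lexicon is the set of class words from the slide.
APPROVED_CLASS_WORDS = {
    "Amount", "Balance", "Code", "Count", "Date", "Description",
    "Identifier", "Indicator", "Name", "Number", "Percent",
    "Quantity", "Rate", "Text",
}


def class_word(attribute_name: str) -> Optional[str]:
    """Return the trailing class word of a business attribute name, or None."""
    words = attribute_name.split()
    if words and words[-1] in APPROVED_CLASS_WORDS:
        return words[-1]
    return None


def check_attribute_name(attribute_name: str) -> List[str]:
    """List the naming-standard problems found in a business attribute name."""
    problems = []
    if class_word(attribute_name) is None:
        problems.append("does not end with an approved class word")
    if len(attribute_name.split()) < 2:
        problems.append("has no prime word or qualifier before the class word")
    return problems


print(check_attribute_name("Checking Account Monthly Average Balance"))
print(check_attribute_name("Customer Status"))
```

A check like this only covers the form of the name. Whether the prime word and qualifiers are meaningful business terms still needs review by a person.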

Data Naming Standards - Technical
The name of a column is composed of abbreviated attribute name components
Use only abbreviations from an approved common words lexicon (abbreviations list)
Column name components should always be abbreviated if an approved abbreviation exists, whether the column name is too long or not
When column names are too long, qualifiers should be eliminated one at a time, starting with the least significant qualifier, then the second least significant, and so on
Example: “CHKG_ACCT_MTHLY_AVG_BAL”

Common Words Lexicon
Abbreviations List (Approved and Published)
Account . . . ACCT
Amount . . . AMT
Average . . . AVG
Balance . . . BAL
Checking . . . CHKG
Certificate of Deposit . . . CD
Code . . . CDE
Count . . . CNT
Date . . . DTE
Description . . . DESC
Identifier . . . ID
Indicator . . . IND
Monthly . . . MTHLY
Name . . . NM
Number . . . NBR
Percent . . . PCT
Quantity . . . QTY
Rate . . . RTE
Savings . . . SVG
Text . . . TXT
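
Deriving a column name from a business attribute name can be sketched as a lookup against the abbreviations list. This is an illustration only: the function name to_column_name is hypothetical, joining with underscores follows the slide's example, and upper-casing words that have no approved abbreviation is an assumption. Dropping qualifiers from names that are too long is left out, because ranking qualifiers by significance needs human judgment.

```python
# Abbreviations copied from the common words lexicon above.
ABBREVIATIONS = {
    "Account": "ACCT", "Amount": "AMT", "Average": "AVG", "Balance": "BAL",
    "Checking": "CHKG", "Certificate of Deposit": "CD", "Code": "CDE",
    "Count": "CNT", "Date": "DTE", "Description": "DESC", "Identifier": "ID",
    "Indicator": "IND", "Monthly": "MTHLY", "Name": "NM", "Number": "NBR",
    "Percent": "PCT", "Quantity": "QTY", "Rate": "RTE", "Savings": "SVG",
    "Text": "TXT",
}


def to_column_name(attribute_name: str) -> str:
    """Abbreviate each component of a business attribute name."""
    remaining = attribute_name
    # Replace multi-word terms first so "Certificate of Deposit" becomes one token.
    for term, abbreviation in ABBREVIATIONS.items():
        if " " in term:
            remaining = remaining.replace(term, abbreviation)
    # Assumption: words without an approved abbreviation are kept, upper-cased.
    parts = [ABBREVIATIONS.get(word, word.upper()) for word in remaining.split()]
    return "_".join(parts)


print(to_column_name("Checking Account Monthly Average Balance"))
```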

Data Domain Standards
Every attribute (data element) must be atomic
Every attribute must be unique (no synonyms, no homonyms)
Every attribute identifies or describes only one business object (entity) in the real world
Every attribute must have business metadata (name, definition, business rules, owner, source, etc.)
Every attribute must have a predefined data domain
Data domains must be based on EDM data quality rules
Business metadata and data domains are defined and maintained by business people

Data Standardization – Best Practices
Provide training in data administration principles
Create formal data definitions
Create fully qualified business data names
Apply the data domain standards
Create and use class words and common words lexicons
Publish the data standards

Standardization – What we need to do
Enter your proposed actions

Data Integration
Look for potential duplicate entities by examining:
– Entity definitions
– Semantic intent
– Entity content
Ensure that each entity has one unique business identifier
Put one fact (attribute) in one place (entity) using the normalization rules
Look for potential duplicate attributes by examining:
– Attribute definitions
– Semantic intent
– Domains
Capture real world business actions between entities as data relationships (not reporting patterns)

Single Version of The Truth
Based on normalization rules
[Diagram: entity-relationship model. Entities shown: Salesperson, Commissioned Salesperson, Salaried Salesperson, Org Structure, Org Unit, Product, Product Category, Product Part, Part, Customer, Potential Customer, Existing Customer, Customer Product Order, Account, Account Payment, Payment Method, Supplier, Shipment, Warehouse]

Unstructured data
Storage and administration
– Enterprise content management systems (ECMS)
– Check-in and check-out functionality
– Retention and archiving
– Backup and recovery
– Secure objects
Content reusability
Search and delivery
Combining structured and unstructured data

Data Integration – Best Practices
Determine data integration benefits and costs
Create an inventory of all your data
Use logical data modeling and normalization rules to find and remove synonyms and homonyms
Use a metadata repository to document the names and definitions of your business data
Don’t forget to integrate unstructured data with structured data

Data Integration – Our Status
Focus on the important data such as customer, supplier, agents, inventory, parts, loans, or whatever it is that runs your business. Include examples of where you are integrated and where not.

Data Integration – This is what we need to do
Enter your integration actions

Data modeling
Business model
Enterprise Data Model (enterprise information architecture)
– Business view of data
– Process independent
– Enterprise-wide model
Logical Data Model
– Business view of data
– Process independent
– Project-specific model
Database model
Physical Data Model
– Database view of data
– Process dependent
– Database-specific model

Data Modeling – Our Situation

Logical Data Model
Captures what an organization is and what it does in terms of:
– Business objects (entities)
– Business data (attributes)
– Business activities (relationships)
– Business rules (metadata)
– Business policies (metadata)
Not tailored for:
– Query or reporting pattern or tool
– Access or storage requirements
– Performance

Process Independence
Access path independent
Program independent
Query / report independent
Database independent
Tool independent (OLAP)
Language independent
Platform independent

Purpose of Logical Data Modeling
Facilitate data integration
Facilitate business analysis
Facilitate communication among business people
Improve productivity through reusability
Focus on data ownership as opposed to system ownership
Bring data quality problems to the surface
Separate process logic from data
Serve as the baseline data architecture for database design

Enterprise Data Model
“Single Version of the Truth”
[Diagram: entity-relationship model. Entities shown: Salesperson, Commissioned Salesperson, Salaried Salesperson, Org Structure, Org Unit, Product, Product Category, Product Part, Part, Customer, Potential Customer, Existing Customer, Customer Product Order, Account, Account Payment, Payment Method, Supplier, Shipment, Warehouse]
Supported by common data definitions, domains, and business rules.
Integrated 360° business view!

Physical Data Model
Database design based on physical attributes:
– Access patterns
– Size of tables
– Number of business users
– Location of business users
– Platform (Processor, DBMS)
– OLAP tools
Tailored for:
– Query or reporting pattern or tool
– Access and storage requirements
– Performance

Process Dependent
Access path dependent
Program dependent
Query / report dependent
Database dependent
Tool dependent (OLAP)
Language dependent
Platform dependent

Purpose of Physical Data Modeling
Facilitate database design
Focus on performance
Architect database structures:
– Tables
– Columns
– Primary keys
– Foreign keys
– Referential integrity rules

Data Modeling – Best Practices
Always create a logical business data model – do not just focus on database modeling
Sell the importance of creating an enterprise information architecture (enterprise data model) to management
Assign data modeling responsibilities (the enterprise data model should not be created by database designers)
Create a process to link the physical data models to the enterprise data model

Data Modeling – This is what we need to do
Enter your proposed data modeling actions

Data quality
At what level of DQ maturity is your organization? (based on CMM)
1 Uncertainty: Discovery by accident (program “abends”)
2 Awakening: Limited data analysis (data profiling, data cleansing)
3 Enlightenment: Addressing root causes (correcting source data and programs)
4 Wisdom: Proactive prevention (enterprise-wide DQ methods & techniques)
5 Certainty: Optimization (continuous process improvements)
The levels run from short term to long term.

Data quality costs
Direct Costs of Non-Quality Information

Marketing Campaign                      Per Instance   Number of Instances   Total Number Per Year   Total Cost Per Year
Time ($60/hour loaded rate):
  Creating redundant occurrence         2.4 min        167,141               1                       $  401,138
  Researching correct address           10 min         5,000/mo              12                      $  600,000
  Correcting address errors             0.3 min        6,000/mo              12                      $   21,600
  Handling complaints from customers    5.5 min        974/yr                1                       $    5,357
  Mail preparation                      0.1 min        393,273               4                       $  157,309
Materials, Facilities, Equipment:
  Marketing brochure                    $1.96          393,273               4                       $3,083,260
  Postage                               $0.52          393,273               4                       $  818,008
  Warehouse storage                     $0.01          393,273               4                       $   15,731
  Shipping equipment and maintenance    $5,000/yr      36%                   1                       $    1,800
Computing resources:
  CPU transactions                      $0.02/trans    393,273               4                       $   31,462
  Data storage                          $0.001/mo      393,273               12                      $    4,719
  Data backup                           $0.005/mo      393,273               12                      $   23,596
Total Annual Costs                                                                                   $5,163,980

© Larry English, Improving DW and BI Quality
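
The arithmetic behind the table can be reproduced as a short sketch. It assumes each line's annual cost is the per-instance cost times the number of instances times the yearly frequency, rounded to whole dollars, and that time is priced at the $60/hour loaded rate (one dollar per minute). The variable and function names are illustrative.

```python
RATE_PER_MINUTE = 60 / 60  # $60/hour loaded rate, expressed per minute

# (line item, cost per instance in dollars, number of instances, times per year)
COST_LINES = [
    ("Creating redundant occurrence", 2.4 * RATE_PER_MINUTE, 167_141, 1),
    ("Researching correct address", 10 * RATE_PER_MINUTE, 5_000, 12),
    ("Correcting address errors", 0.3 * RATE_PER_MINUTE, 6_000, 12),
    ("Handling complaints from customers", 5.5 * RATE_PER_MINUTE, 974, 1),
    ("Mail preparation", 0.1 * RATE_PER_MINUTE, 393_273, 4),
    ("Marketing brochure", 1.96, 393_273, 4),
    ("Postage", 0.52, 393_273, 4),
    ("Warehouse storage", 0.01, 393_273, 4),
    ("Shipping equipment and maintenance", 5_000, 0.36, 1),  # 36% of $5,000/yr
    ("CPU transactions", 0.02, 393_273, 4),
    ("Data storage", 0.001, 393_273, 12),
    ("Data backup", 0.005, 393_273, 12),
]


def annual_cost(per_instance: float, instances: float, per_year: int) -> int:
    """Annual cost of one line item, rounded to whole dollars."""
    return round(per_instance * instances * per_year)


line_totals = {name: annual_cost(cost, count, freq)
               for name, cost, count, freq in COST_LINES}
total_annual_cost = sum(line_totals.values())

for name, amount in line_totals.items():
    print(f"{name:<38} ${amount:>10,}")
print(f"{'Total Annual Costs':<38} ${total_annual_cost:>10,}")
```

Under these assumptions the line totals add up to the $5,163,980 shown in the table.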

Data quality costs
Information Development Cost Analysis

                                                Portfolio   Relative   Average Unit   Total          Total Infrastructure /  % of
                                                Total       Weight     Dev/Maint      Dev/Maint      Value-adding /          Budget
Category                                        Number      Factor*    Costs          Expenses**     Cost-adding Expenses    Expenses
Infrastructure Basis:
  Enterprise architected DBs                    200         0.75       $15,000        $ 3,000,000
  Enterprise reusable create/update programs +  300         1.50       $30,000        $ 9,000,000
  Total Infrastructure expenses                                                                      $12,000,000             24%
Value Basis:
  Total retrieve equivalent pgms +              300         1.00       $20,000        $ 6,000,000
  Total value-adding expenses                                                                        $ 6,000,000             12%
Cost-adding Basis:
  Redundant create/update pgms                  500         1.50       $30,000        $15,000,000
  Interface/extract programs                    400         1.00       $20,000        $ 8,000,000
  Redundant database files                      600         0.75       $15,000        $ 9,000,000
  Total cost-adding expenses                    1,500                                                $32,000,000             64%
Lifetime Total **                               3,800                                                $50,000,000             100%

* Determine relative effort to develop average unit of each category using effort to develop a retrieve program as “1.00”
+ For programs that retrieve some data and create/update other data, determine the percent of retrieve only attributes and percent of create/update attributes (e.g., to retrieve customer data to create an order)
** Based on 3,800 application programs and database files in portfolio and $50 Million in development

© Larry English, Improving DW and BI Quality

Dummy (default) values
Defaults for mandatory fields
SSN 999-99-9999
Age 999
Zip 99999
Income 9,999,999.99
Inability to determine customer profiles
Inability to determine customer demographics
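
Dummy values like these can be flagged during data profiling. The sketch below assumes records arrive as dictionaries, that the field names (ssn, age, zip, income) are hypothetical, and that the dummy values are exactly the ones shown above. A real profile would derive the candidate list from value-frequency counts instead of hard-coding it.

```python
from typing import Dict, List

# Assumption: known dummy values per field, taken from the slide.
DUMMY_VALUES = {
    "ssn": {"999-99-9999"},
    "age": {999},
    "zip": {"99999"},
    "income": {9_999_999.99},
}


def find_dummy_fields(record: Dict[str, object]) -> List[str]:
    """Return the names of fields whose value is a known dummy default."""
    return [field for field, dummies in DUMMY_VALUES.items()
            if record.get(field) in dummies]


customer = {"ssn": "999-99-9999", "age": 42, "zip": "99999", "income": 58_000.0}
print(find_dummy_fields(customer))
```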

“Intelligent” dummy values
Defaults with meaning
SSN 888-88-8888: Non-resident alien
Income 999,999.99: Employee
Age 000: Corporate customer
Source Code ‘FF’: Account closed prior to 1991
Inability to write straightforward queries without knowing how to filter data

Missing Values
Operational systems do not always require informational or demographic data
Gender
Ethnicity
Age
Income
Referring Source
Inability to analyze marketing channels

Multi-purpose fields
ONE field explicitly has MANY meanings, depending on:
» Which business unit enters the data
» At what time in history it was entered
» A value in one or more other fields
Appraisal Amount redefined as Advertised Amount redefined as Sold Date
Loan Type Code redefined as ...
25 redefines = 25 attributes!
Not mutually exclusive!
Only the value of one is known for each record!
Inability to judge product profitability

Cryptic values (1)
Often found in “Kitchen Sink” fields
» Usually one byte (if not one bit)
» Highly cryptic (A, B, C, 1, 2, 3, ...)
» Non-intelligent, non-intuitive codes
» Often not mutually exclusive
Inability to empower end users to write their own queries

Cryptic values (2)
ONE field implicitly has MANY meanings
Master_Cd {A, B, C, D, E, F, G, H, I}
{A, B, C}: Type of customer
{D, E, F}: Type of supplier
{G, H, I}: Regional constraints

Free-form address lines
Unstructured text
» no discernible pattern
» cannot be parsed
address-line-1: ROSENTHAL, LEVITZ, A
address-line-2: TTORNEYS
address-line-3: 10 MARKET, SAN FRANC
address-line-4: ISCO, CA 95111
Inability to perform market analysis

Contradicting values
Values in one field are inconsistent with values in another related field
1488 Flatbush Avenue New York, NY 75261 (Texas Zip)
Type of real property: Single Family Residence; Number of rental units: four (Income property)
Inability to make reliable business decisions

Violation of business rules
Business Rule: Adjustable Rate Mortgages must have
» Maximum Interest Rate (Ceiling)
» Minimum Interest Rate (Floor)
Business Rule: A Ceiling is always higher than a Floor
ceiling-interest-rate: 8.25
floor-interest-rate: 14.75
switched?
Inability to calculate product profitability
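
Both business rules are simple enough to express as an automated check. The sketch below assumes a loan record is a dictionary; the field names ceiling_interest_rate and floor_interest_rate and the function check_arm_rates are hypothetical.

```python
from typing import Dict, List, Optional


def check_arm_rates(loan: Dict[str, Optional[float]]) -> List[str]:
    """Check an adjustable rate mortgage against the two business rules."""
    problems = []
    ceiling = loan.get("ceiling_interest_rate")
    floor = loan.get("floor_interest_rate")
    # Rule 1: an ARM must have both a ceiling and a floor.
    if ceiling is None:
        problems.append("missing ceiling interest rate")
    if floor is None:
        problems.append("missing floor interest rate")
    # Rule 2: the ceiling is always higher than the floor.
    if ceiling is not None and floor is not None and ceiling <= floor:
        problems.append("ceiling interest rate is not higher than floor interest rate")
    return problems


loan = {"ceiling_interest_rate": 8.25, "floor_interest_rate": 14.75}
print(check_arm_rates(loan))
```

The check can only report the violation. Whether the two values were switched, as the slide asks, has to be confirmed against the source documents.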

Reused primary key
Little history, if any, stored in operational files
» primary keys are customarily re-used
» may have a different rollup structure
January ‘94: branch 501 = San Francisco Main, region 1, area SW
August ‘97: branch 501 = San Luis Obispo, region 2, area SW
Inability to evaluate organizational performance

Non-unique primary key
Duplicate identification numbers
» Multiple customer numbers
Customer Name        Phone Number    Cust. Number
Philip K. Sherman    818.357.5166    960601
Philip K. Sherman    818.357.7711    960105
Philip K. Sherman    818.357.8911    960003
» Multiple employee numbers
                 Employee Name    Department    Empl. Number
July 1995:       Bob Smith        213 (HR)      21304762
January 1996:    Bob Smith        432 (SRV)     43218221
August 1999:     Bob Smith        206 (MKT)     20684762
Inability to determine customer relationships
Inability to analyze employee benefits trends
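
Candidates for this problem can be surfaced by grouping identification numbers by name. The sketch below uses the customer rows from the slide and a hypothetical function numbers_by_name. Matching on the normalized name alone is an assumption, so the output is a list of candidates to investigate, not proof that the rows describe the same person.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (customer name, phone number, customer number), as shown on the slide
CUSTOMERS = [
    ("Philip K. Sherman", "818.357.5166", "960601"),
    ("Philip K. Sherman", "818.357.7711", "960105"),
    ("Philip K. Sherman", "818.357.8911", "960003"),
]


def numbers_by_name(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Return names that appear under more than one identification number."""
    grouped = defaultdict(set)
    for name, _phone, number in rows:
        grouped[name.strip().lower()].add(number)
    return {name: sorted(numbers)
            for name, numbers in grouped.items() if len(numbers) > 1}


print(numbers_by_name(CUSTOMERS))
```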

Missing data relationships
Data that should be related to other data in a dependent (parent-child) relationship, but is not
» Branch number 0765 does not exist in the BRANCH table
[Diagram: Branch, Employee, Benefit]
Inability to produce accurate rollups
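
A missing parent row like branch 0765 can be found with an orphan check. This is a minimal sketch, assuming rows are dictionaries. Branch number 0765 comes from the slide, while the other branch numbers, the employee names and the function find_orphans are made up for illustration.

```python
from typing import Dict, Iterable, List


def find_orphans(child_rows: Iterable[Dict[str, str]],
                 parent_keys: Iterable[str],
                 foreign_key: str) -> List[Dict[str, str]]:
    """Return child rows whose foreign key has no matching parent key."""
    known = set(parent_keys)
    return [row for row in child_rows if row[foreign_key] not in known]


# Hypothetical sample data: the BRANCH table has no row for branch 0765.
branch_numbers = ["0501", "0502"]
employees = [
    {"employee_name": "A. Example", "branch_number": "0501"},
    {"employee_name": "B. Example", "branch_number": "0765"},
]

print(find_orphans(employees, branch_numbers, "branch_number"))
```

In a relational database the same check is usually written as an outer join from the child table to the parent table, keeping the rows where the parent key is null.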

Inappropriate data relationships
Data that is inadvertently related, but should not be
» two entity types with the same key values
Purchaser: Jackie Schmidt 837221
Seller: Robert Black 837221
Inability to determine customer or vendor relationships

Management Support
Management awareness of importance of data quality
Cost justification of data quality initiative
Ongoing commitment
Finding a business management sponsor

Triage - Prioritization
Which data to cleanse
Justification for cleansing
Ease of cleansing
Possibility of cleansing
Political support for cleansing

Cost of Cleansing
Automatic versus manual
– Tools to perform automatic cleansing
– Effort to support use of tools
Use of defaults
Knowledge/experience of those performing manual cleansing

Responsibility for Data Quality
“It’s not enough to say that data quality is everyone’s responsibility.”
Data Quality Administrator
Ongoing commitment
Data ownership responsibility
Operational versus data warehouse responsibility

Data Quality – Best Practices
Inventory the quality of your data
Sell the importance of data quality to management
Assign data quality responsibility
Triage the cleansing process

Data Quality – Our Status
Enter all the major problems you have or anticipate with data quality and don’t limit yourself to one slide.

Data Quality – What Steps We Should Take to Improve
Enter all the practical steps you should take and prioritize them. Don’t limit yourself to one slide.

Metadata Management
Master Metadata
Business Metadata (User’s View)
– Business Names
– Data Definitions
– Data Domains
– Data Relationships
– Business Rules
– DQ Rules
– Data Integrity Rules
Technical Metadata (Developer’s View)
– Tables
– Columns
– Keys (primary/foreign)
– Ref. Integrity Rules
– Indexes
– ETL rules
– Process logic
Usage Metadata (Administrator’s View)
– Data Lineage
– Data Location
– Data Usage
– Data Volumes
– Load Statistics
– Error Statistics
Uses: Administration, Documentation, Navigation

Metadata is everywhere
[Diagram: a metadata migration process moves metadata from its sources into a Metadata Repository, which supports documentation and navigation]
Sources: Word Processing Files, DBMS Dictionaries, Spreadsheets, ETL Tools, CASE Tools, OLAP Tools, Data Mining Tools
Roles: Technicians and Business People, Business Analysts, Data Administrator, Database Administrator, ETL Developer, Application Developer, Data Mining Expert
Technician’s View: Technical Metadata
Business Person’s View: Business Metadata

Metadata as the Keystone
Single version of the truth
It’s the inventory of information
Tears down dysfunctional information fiefdoms
Opportunities for data standardization

Management Support for Metadata
IT and the Business
Management understanding of the importance of metadata
Impact on project schedules
Long term benefit of metadata
Importance for operational and data warehouse

Which Metadata to Capture
Don’t boil the ocean
What metadata is valuable
Ease and cost of capture
Political issues relating to capture

Responsibility for Capturing Metadata
Incentive for capturing
Management direction
Automatic and manual

Responsibility for Maintaining Metadata
Where does Metadata Repository Administration report?
Why is administration and maintenance important?
Long-term commitment

How Metadata Is Used
Business
– Understanding the data
– Understanding the meaning of results
– Avoiding incorrect conclusions
IT
– Research
– Impact analysis
– Tool interchange

Metadata – Best Practices
Determine which metadata to capture and use
Determine how the tools will capture and use metadata
Sell management on the importance of metadata
Assign metadata responsibility

Metadata – Where are we?
Include anything you have done including a glossary or business and IT definitions.

Metadata – What Should We be Doing
As you enter these actions, consider including responsibility but make sure you have talked to those people or departments before presenting to management.

Security and privacy
[Diagram: connection paths, labeled A through H, between Remote Access, Workstation Terminals, LAN File Server, Mainframe, Communication Server, Database Server, and Internet Access. The legend distinguishes “Security exists” from “No security” and lists the security mechanisms: Mainframe Security Package, LAN Security Package, PC Security Package, Password Security, Encryption Function, DBMS Security, Generic Security Package]

Categorization for Security/Privacy
Does all data have the same security/privacy requirements?
Who determines security/privacy requirements of data?
What are the regulatory requirements for security and privacy?
Does your organization have a Security Office? What authority do they have?

Responsibility For Data Security
Security Office
Internal auditors
Data Owners
Responsibility for administering
Testing security and privacy

Mechanism For Establishing Security Procedures
Security requirements
– Internal
– Regulatory
Tools that implement security
Communicating security requirements to those who implement

Security Audit
Validating procedures
Validating training
Testing and probing
Recommending mitigation
Frequency of audits

Regulatory Issues
Health Care – HIPAA
Finance
Brokerage – SEC
Insurance
Media – FCC

Security & Privacy – Best Practices
Raise the consciousness of security and privacy requirements
Connect with your Security Office
Determine security capabilities of tools
Assign responsibilities
Test and validate

Security & Privacy – What exposures do we have?
Hopefully you have talked to your Security Officer and anyone else who is responsible for the security of data.

Security & Privacy – What Steps do we Need to Take
Be sure to clear these actions with those responsible for security and privacy.

Performance
Benchmarking
Capacity planning
Designing (optimal schemas)
Coding (efficient SQL calls)
Monitoring and measuring
Tuning
– Database structures
– DBMS parameters and OS
– Communication links
– Hardware

Categorization for Performance
How good does response time need to be?
How does it differ from application to application?
What is the cost-benefit of excellent response time?
Were performance considerations included in the architecture?
Categorization for Availability
Scheduled hours (24 X 7, 18 X 6,…)
Availability during scheduled hours
How does it differ from system to system?
Is excellent availability cost justified?
Was availability included in the architecture?
Capacity Planning
Database size
Number of users
Number of transactions
Number of queries/reports
Time and day of usage
Complexity of transactions/queries/reports
Proactive response to capacity increase
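The capacity factors above usually feed some kind of projection. A minimal sketch, assuming compound monthly growth in database size; the function name, the growth rate and the sizes are hypothetical and do not come from the deck:

```python
def months_until_full(current_gb, monthly_growth_rate, capacity_gb):
    """Return the number of whole months until projected size reaches capacity.

    Assumes compound growth at a constant monthly rate, which is an
    illustrative simplification; real growth is rarely this regular.
    """
    if current_gb <= 0 or capacity_gb <= 0:
        raise ValueError("sizes must be positive")
    if monthly_growth_rate <= 0:
        raise ValueError("growth rate must be positive for a finite answer")
    months = 0
    size = current_gb
    while size < capacity_gb:
        size *= 1 + monthly_growth_rate  # apply one month of growth
        months += 1
    return months


# Hypothetical figures: 100 GB today, 10% growth per month, 200 GB of disk.
print(months_until_full(100.0, 0.10, 200.0))
```

Running a projection like this on a schedule is one way to make the response to capacity increases proactive rather than reactive.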
Monitoring/Measuring
Response time
Resource utilization (CPU, disk access, network)
Who is using the system
When is the system being used
Chargebacks
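Response time is typically reported as a summary rather than as raw samples. The sketch below is one possible summary, assuming response times have already been collected in milliseconds; the function names and the choice of the 95th percentile (nearest-rank method) are illustrative assumptions, not something the deck prescribes:

```python
import statistics


def percentile_nearest_rank(samples, percent):
    """Nearest-rank percentile; percent is an integer from 1 to 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(samples)
    # Ceiling of percent * n / 100, computed with integer arithmetic.
    rank = -(-percent * len(ordered) // 100)
    return ordered[rank - 1]


def summarize_response_times(samples_ms):
    """Summarize collected response times (in milliseconds)."""
    return {
        "count": len(samples_ms),
        "mean_ms": statistics.mean(samples_ms),
        "p95_ms": percentile_nearest_rank(samples_ms, 95),
        "max_ms": max(samples_ms),
    }


# Hypothetical measurements: 20 queries taking 1 ms through 20 ms.
print(summarize_response_times(list(range(1, 21))))
```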
Service Level Agreements
Response time
Availability
– Scheduled hours (hours/day, days/week)
– Availability during scheduled hours
Timeliness of data
Response to problems
Response to new requests
Who establishes agreements?
What’s realistic?
Incentives to meet SLAs
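Availability during scheduled hours can be expressed as a simple percentage. A minimal sketch, assuming availability is measured only against scheduled hours and that downtime is logged in hours; the function names and the sample figures are hypothetical:

```python
def scheduled_hours_per_week(hours_per_day, days_per_week):
    """Scheduled hours for a schedule such as 24 x 7 or 18 x 6."""
    return hours_per_day * days_per_week


def availability_pct(scheduled_hours, downtime_hours):
    """Percentage of scheduled hours during which the system was up."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    if not 0 <= downtime_hours <= scheduled_hours:
        raise ValueError("downtime must be between 0 and scheduled_hours")
    return 100.0 * (scheduled_hours - downtime_hours) / scheduled_hours


# Hypothetical week: an 18 x 6 schedule with 2.7 hours of unplanned downtime.
hours = scheduled_hours_per_week(18, 6)
print(round(availability_pct(hours, 2.7), 2))
```

Measuring against scheduled hours rather than calendar hours keeps planned maintenance windows from counting against the SLA, which is one common convention but not the only one.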
Reporting performance
IT
– Who needs to take action
– Who needs to see reports/alerts
Business
– Matching project agreements
– Expectations
Tuning
Awareness of problems – measurement tools and responsibilities
Tuning capability of platform, RDBMS, tools
Responsibility for tuning
Measurement Tools
Performance
Usage
Resource utilization
Network
Performance & Measurement – Best Practices
Determine what is advantageous to measure
Assign responsibilities
Designate tools for measurement
Report metrics to management
DBMS/Product Selection
[Figure: platform tiers: Desktop Remote Client; Mid-range Workgroup Server; Industrial-strength Enterprise Server]
Relational DBMS
Which RDBMS is the standard
Relation to platform
What applications is it being used for
Why standardize the RDBMS?
Minimize the number of RDBMSs
Less training required
More leverage on RDBMS vendor
Flexible assignments
Fewer interface problems
Fewer interface programs
Relation to platform
RDBMS performance impacted by platform
Platform may dictate (or strongly recommend) RDBMS choice
Which decision comes first?
[Figure: platform tiers: Desktop Remote Client; Mid-range Workgroup Server; Industrial-strength Enterprise Server]
How DBMS is being used
Operational/OLTP
Data Warehouse/Business Intelligence
[Figure: Operational Systems feeding DW Databases (ODS, EDW, DM, OM)]
Tools/Utilities
Platform dependent
DBMS dependent
Expensive
33% on the shelf
Lots of product duplication
Necessary?
Standards for Products
Who sets standards?
Are the standards known?
Are they standards or guidelines?
Who can give dispensation?
Criteria for Selection
Need
Cost
Vendor
– Support
– Reputation
– Financial stability
Responsibility for Selection
Technical evaluators
Strategic architect
Management
Single Vendor vs Best of Breed
Single vendor
– Possibly a better relationship
– Leverage
– Not always the best products
– Products should all work together
Best-of-breed
– Need to integrate yourself
– Finger pointing when problems
– Potential incompatibilities
Deals/Negotiations
Have someone else negotiate
Don’t let vendor know you have chosen them before you negotiate
www.dobetterdeals.com (Joe Auer – ComputerWorld)
Relationship with Vendors
Partnerships
Money Issues
Support
Conferences
Being a reference
Databases Required by the Application Packages
Packages do not support all DBMSs
Packages do not support all DBMSs equally well
Does preferred DBMS violate database standard
Are support personnel (DBAs) available?
Impact of Package
Machine Requirements
Performance
Availability
DBMS/Product Selection – Best Practices
Determine real requirements
Establish software standards
Make use of existing software whenever possible
Talk to organizations who are using the products
Business intelligence (BI)
… provides decision makers a 360° view of their business
[Figure: BI dashboard with Meters, Alerts, Trends and Forecasts. Panels: Financial Performance, Daily Sales, Market Growth. Alerts: regulatory warning, market opportunity, compliance violation. Scorecard columns: trend, metric, actual, target, variance. Metrics: same store sales, customer retention, new customers, charge cards issued, 30 day past-due accounts, 60 day past-due accounts, 90 day past-due accounts, merchandise return rate, inventory turnover rate. Sample values (actual, target, variance): $108.0m, $120.0m, -10%; 96%, 95%, +0.9%; 3.8k, 5.0k, -24.0%; 8.5k, 12.0k, -33.3%; 500, 400, +2.0%. Source: TDWI]
Goals and Objectives
Why have a data warehouse?
Have goals and objectives been identified?
Have they been communicated?
Are they measured post-implementation?
Architecture
Platform
Tools/products
How the data flows
DW and BI Tools
RDBMS
Data Modeling
ETL
Access and Analysis
Data quality (Cleansing)
Measurement
Data Mining
Data farming
– Verification of assumptions
– Results based on known data relationships
– Yields information that can be proven to be factual
– Deductive method
Data mining
– Discovery of the unknown
– Inferred results from data found in database
– Yields information that is assumed to be true for some probability
– Inductive method
Data Sources for Data Mining
[Figure: Operational databases (Orders, Shipments, Account Master, Billing) pass through ETL into DW databases (Enterprise Data Warehouse, Sales DM, Customer DM); Data Mining Applications draw on these and on Data Mining Databases]
Spiral BI/DW Methodologies
[Figure: spiral of BI/DW Applications. Labels: Business Goals, Business Opportunity, Assessment & Strategy, Project Plan, Data Requirement, Data Inventory, Business Analysis, Application Design, Development, Testing, Implementation, Post-Impl. Review]
Software Release Concept
Project = Application //
“Refactoring” - Kent Beck
“Extreme scoping” - Larissa Moss
“feels like prototyping”
[Figure: Projects deliver a reusable and expanding BI Application through a First, Second, Third, Fourth, Fifth and Final Release]
Using the Software Release Approach
Mistakes are less expensive to fix early in the development process!
Unstable requirements can be tested and enhanced in small increments
Scope is very small and manageable
Technology infrastructure can be tested and proven
Data volumes (per release) are relatively small
Project schedules are easier to estimate because the scope is very small
Development activities can be iteratively refined, honed, and adapted
And the quality of the release deliverables (and ultimately the quality of the BI applications) will be higher!
And the development process will get faster and faster!
Software Release Guidelines
Deliver every three to six months (first release will take longer)
Strictly control the scope and keep it very small
Keep expectations realistic
The enterprise infrastructure must be robust (technical and non-technical)
Metadata must be an integral part of each release; otherwise, the releases will not be manageable
Designs, programs, and tools must be flexible
[Figure: First, Second, Third, Fourth, Fifth and Final Release building up the BI Application]
Iterative BI Application Development
[Figure: six iterations (Release 1 through Release 6) building one BI Application. Each release repeats the same activities: Business Case Assessment, Planning, Requirements & Data Analysis, Requirements & Application Prototyping, Data Analysis, Application Prototyping, Meta Data Repository Analysis, ETL Design, Meta Data Repository Design, ETL Development, Application Development, Data Mining, Meta Data Repository Development, ETL Testing, Application Testing, Meta Data Repository Testing, Release Implementation, Post-Implementation Review]
Business Intelligence – Best Practices
Set goals and objectives
Set expectations early and often
Establish cost justification
Find a terrific sponsor
Use a spiral methodology
Deliver often with software releases
BI & DW – How well are we doing?
Include applications, departments, number of users, usage, user satisfaction, ROI, management perception,…
DW & BI – What are we going to do to make our DW and BI Sing?
This might include training, selling to management and end users, new BI tools, new organizational responsibilities,…
Outline
Benefits of a data governance strategy
Components of a data governance strategy
Organization, roles and responsibilities
Impact of a data governance strategy on BI and IT
How to implement a data governance strategy program
Organization, roles and responsibilities
Data owner
Data steward
Data strategist
Strategic architect
Database administrator/designer
Data administrator (EIM)
Metadata administrator (EIM)
Data quality analyst (EIM)
Security officer
Data owner
Assigned to business people (often data originators)
Typically hold a senior position (directors or managers)
Have authority to set policies and dictate business rules and security for the data
Are accountable to the information consumers in the organization
Data steward
Should be assigned to business people, but could be performed by senior business analysts from IT
Must know the industry and the organization very well (often people with seniority)
Requires an enterprise-wide understanding of the data and the business rules
Have authority to communicate and enforce policies, business rules, and security for the data
Mediate data disputes among business people and facilitate resolutions
Data strategist
Understands the strategic business goals
Knows the government regulations and governmental reporting requirements
Understands the DBMS platforms and operating systems
Knows the internal application databases (operational and BI)
Is aware of future data demands and data volumes
Creates and maintains the data governance strategy
Strategic architect
Develops the overall architecture for both operational and BI environments to include:
– Software
– Utilities
– Tools
– Interfaces
Determines if the BI/DW environment will be one-tier or multi-tier and what the platform components should be
Participates in architecting databases and data flows
Database administrator/designer
Understands user requirements and how databases are accessed and updated
Knows different database design techniques (relational, multi-dimensional) and when to apply them
Is responsible for the physical aspects of application databases:
– Logical and physical database design
– Partitioning and indexing
– Dataset placement
– Performance and tuning (databases and SQL)
– Backup and recovery
Maintains the application databases
Data administrator
Knows the industry and the business processes
Understands the data and the business rules that are used by those processes
Has expertise in E/R modeling and knows the normalization rules
Standardizes and integrates the data (logically) through the enterprise information architecture
Creates and enforces data naming standards
Collects and maintains business metadata:
– Data names (fully spelled out business names)
– Data definitions and metrics definitions
– Business rules (data rules and process rules)
Metadata administrator
Knows industry metadata standards
Understands DW databases and ETL architectures
Builds and maintains a metadata repository or administers a purchased MDR product
Selects and installs metadata integration and access tools
Integrates and loads metadata from various BI and developer tools (Data Modeling, Data Profiling, DBMS, ETL, OLAP)
Data quality analyst
Knows the internal application databases and how to extract data from them
Is familiar with data profiling and data cleansing tools
Understands the user requirements, the business processes, and the business rules
Audits operational source data to find and report violations of business rules and other DQ problems
Participates in writing data cleansing specs
Identifies root causes for dirty data
Facilitates negotiations between data originators and information consumers about DQ improvements
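Auditing source data against business rules can be pictured as running each record through a set of named checks and counting the failures. The sketch below is only an illustration: the rule names, the record layout and the sample orders are all hypothetical, and dates are assumed to be ISO-formatted strings so that plain string comparison orders them correctly.

```python
from collections import Counter

# Hypothetical business rules: each maps a rule name to a check that
# returns True when the record complies.
RULES = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "order_total is non-negative": lambda r: r.get("order_total", 0) >= 0,
    "ship_date is not before order_date": lambda r: r["ship_date"] >= r["order_date"],
}

# Hypothetical extract from an operational order database.
SAMPLE_ORDERS = [
    {"customer_id": "C001", "order_total": 120.0,
     "order_date": "2012-03-01", "ship_date": "2012-03-03"},
    {"customer_id": "", "order_total": 45.5,
     "order_date": "2012-03-02", "ship_date": "2012-03-04"},
    {"customer_id": "C003", "order_total": -10.0,
     "order_date": "2012-03-05", "ship_date": "2012-03-01"},
]


def audit(records, rules):
    """Count how many records violate each business rule."""
    violations = Counter()
    for record in records:
        for name, check in rules.items():
            if not check(record):
                violations[name] += 1
    return dict(violations)


for rule, count in audit(SAMPLE_ORDERS, RULES).items():
    print(f"{rule}: {count} violation(s)")
```

Keeping the rules in a named table makes the violation report readable by business people, which matters when the results are used in negotiations between data originators and information consumers.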
Security officer
Knows the governmental security and privacy regulations (HIPAA)
Understands the business requirements for securing the data
Understands security features and capabilities of the application components (DBMS, BI tools, Web portals)
Ensures that appropriate security settings are placed on:
– Databases
– BI tools
– Developer tools
– Web portals
Organization – Do we have the right roles and responsibilities?
Include any responsibilities that overlap and identify any gaps where roles are not being filled.
Organization – What should we be considering?
Be careful here. You are likely to step on toes. Be sure to vet any proposed changes with the appropriate management.
Outline
Benefits of a data governance strategy
Components of a data governance strategy
Organization, roles and responsibilities
Impact of a data governance strategy on BI and IT
How to implement a data governance strategy program
Impact of a data governance strategy on BI and IT
Better and faster decisions
Increased analyst productivity
Employee empowerment
Cost containment
Cash flow acceleration
Revenue enhancement
Fraud reduction
Demand chain management
Better customer service
Lower customer attrition
Better relationships with suppliers and customers
Public relations and reputation
[Figure: RELIABLE INFORMATION]
Gain Control
Consistent security implementation
Understand, define and assign ownership
Understand, define and assign stewardship
Minimize redundancy
Inventory data
Develop consistent terminology
Support the IT Strategy
Provide departments, projects and personnel with guidelines for storing and accessing data
Minimize the number of RDBMSs
Establish, disseminate and maintain standards for shared data resources
Deliver a high level of service
– Performance
– Availability
– Response time
– Responsiveness to user requests
Outline
Benefits of a data governance strategy
Components of a data governance strategy
Organization, roles and responsibilities
Impact of a data governance strategy on BI and IT
How to implement a data governance strategy
Incremental Data Governance Strategy Implementation
Don’t get into the details too soon
Don’t be seen as a theorist -- your actions must be pragmatic
Don’t lead with long-term deliverables
Don’t commit more than you can deliver
Avoid unproven technology
Steps to Implement a Data Governance Strategy
Conduct a data environment assessment
Establish a target data environment
Develop an implementation plan
Sell data governance strategy within the organization
Evaluate progress and justify your existence
Revisit the plan
Summary
Pitch the importance of a data governance strategy to your CIO or CTO
Ask to either lead the effort or to be a permanent member of the team
Thank you
ISBN 0-201-61635-1
ISBN 0-201-78420-3
ISBN 0-201-76033-9
ISBN 0-321-24099-5
Larissa Moss, Method Focus, Inc.
Sid Adelman, Sid Adelman & Associates