1
Managing Information Quality in Organisations
Based on a presentation by Dr Mikhaila Burgess
School of Computer Science & Informatics
Cardiff University
2
Session overview
What is quality?
What is Data Quality (DQ)? And why is it important anyway?
  Potential impact of poor DQ
Defining Data Quality
Designing for Quality Data
  Ensuring DQ in databases
So what goes wrong?
  Potential causes of poor DQ
Managing DQ
… and some exercises
3
Data vs Information
Data
  Items about things, events, activities, transactions, …
  Numeric, alphanumeric, figures, sounds, images, …
  Recorded, stored, but not organised to convey any specific meaning
Information
  “data that have been organised in a manner that gives them meaning for the recipient” (Turban et al, 2005)
  known; ‘surprise’ value
One person’s data is another’s information
What is ‘quality’? What does the word actually mean?
Why is DQ important? Impact of Poor Data Quality … some examples
Defining Data Quality: How do we know what we all mean when we talk about DQ?
Designing for Quality Data: Ensuring a level of quality in your databases
So what goes wrong? Some causes of poor quality data & information
9
Data Entry: Human Aspect
Unintentional errors in data entry
Lack of understanding
Poor training
Intentional incorrect data entry
  Malicious / Non-malicious
Poorly defined or out-of-date collection process
Multiple levels of data entry
Garbage in, Garbage out
10
Data Entry: Technical Aspect
Inaccurate measuring or counting device
Errors in the data storage process
Missing data fields
Data scanner
  Poor quality data scanner
  Inappropriate scanner
    Microfiche, microfilm, aperture cards
  Incorrect set-up
11
Herbarium Catalogue
Approx 7 million specimens
  Pressed & dried
  Preserved in spirit
30,000 per year
HerbCat: www.kew.org/herbcat/
ePIC – electronic Plant Information Centre: www.kew.org/epic/
12
Type Specimen
Over 350,000
Original specimen
Fixed species name & description
18th century
Reference point for botanists – applying names correctly (taxonomy & systematics)
http://www.kew.org/collections/herb_types.html
13
Random Data
“The snafu started when police used the address as part of what Browne called ‘random material’ to test an automated computer system that tracks crime complaints and records of other internal police information”
Thursday 18th March 2010 – NYPD’s Identity Theft Squad deliver cheesecake to Walter (83) and Rose (82) Martin, Brooklyn, NY
50 raids over 8 years
50 errant visits blamed on computer glitch
Apologise & explain … and to check people “weren’t using that address for identity theft”
Cops Sorry For Coming To Wrong Home 50 Times
(Associated Press & Boston Globe)
14
Organisational Issues
Scattering of databases throughout different departments or organisations
Lack of awareness of data quality issues
Obsession with technology
Old (Legacy) databases
  Poorly documented data
  Missing/poor documentation about purpose
  Obsolete data
Mergers & Acquisitions
  Non-merging of databases – autonomy
  Merging of databases
Data stored in multiple locations and not correctly linked
15
Merging Databases
Homonyms & synonyms
  Surname, Name, Customer, CustName, …
  OrderID
    ID for order processed for a customer
    ID for order placed with a supplier
Representational inconsistency
  Data: eg address
  Database: eg
    Oracle & SQLServer
    Access & Objectivity
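The homonym and synonym problem above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems ("sales" and "purchasing"), their column lists, and the canonical name customer_name are hypothetical, not taken from the presentation.

```python
# Hypothetical column lists from two databases that are about to be merged.
SALES_COLUMNS = ["CustName", "OrderID"]       # OrderID: order processed for a customer
PURCHASING_COLUMNS = ["Name", "OrderID"]      # OrderID: order placed with a supplier

# Synonyms: different names, same meaning. Map them to one canonical name.
SYNONYMS = {"CustName": "customer_name", "Name": "customer_name"}

# Homonyms: same name, different meaning. Keep them apart by qualifying with the source.
HOMONYMS = {"OrderID"}


def canonical(column, source):
    """Return the column name to use in the merged schema."""
    if column in HOMONYMS:
        return f"{source}_{column}"
    return SYNONYMS.get(column, column)


merged_sales = [canonical(c, "sales") for c in SALES_COLUMNS]
merged_purchasing = [canonical(c, "purchasing") for c in PURCHASING_COLUMNS]
print(merged_sales)
print(merged_purchasing)
```

The mapping tables have to come from people who know what each column means; the code only applies decisions that were already made.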
16
Merging Databases
Designed for different purpose
  Database design
  Data collection
Student database
  storing module marks, working out number of resits, allowing to proceed, degree classification
  storing financial details, whether fees have been paid, ensuring no awards presented until account is clear
RAF, Navy, Army
  Codes for individual stock items
  Merged db’s … Iraq – 3 days out of action!
17
Merging Databases
Duplicate data
eg customer name: Mikhaila Burgess
Misspellings:
  First name: Michaela, Mikalia, Mikkalia, Michael, Michelle
  Surname: Burges, Burge, Burgers, Burgese, Barron
Variations: Dr Mikhaila Burgess
Dr M S E Burgess
Ms M Burgess
M Burgess
Mr M Burgess
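One common way to flag likely duplicates such as these is approximate string matching. The sketch below uses Python's standard difflib module; the list of titles, the 0.8 threshold and the candidate names chosen are assumptions for illustration, and a real matching rule would need tuning against actual data.

```python
from difflib import SequenceMatcher

TITLES = {"dr", "ms", "mr", "mrs", "miss"}   # assumed list of titles to ignore
THRESHOLD = 0.8                               # assumed cut-off, not a standard value


def normalise(name):
    """Lower-case a name and drop a leading title such as 'Dr' or 'Ms'."""
    words = name.lower().replace(".", "").split()
    if words and words[0] in TITLES:
        words = words[1:]
    return " ".join(words)


def similarity(a, b):
    """Return a score between 0 and 1; 1 means identical after normalising."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


reference = "Mikhaila Burgess"
for candidate in ["Michaela Burges", "Dr Mikhaila Burgess", "Michelle Barron"]:
    score = similarity(reference, candidate)
    verdict = "possible duplicate" if score >= THRESHOLD else "probably different"
    print(f"{candidate}: {score:.2f} ({verdict})")
```

Character similarity alone will not link "M Burgess" to "Mikhaila Burgess", and it cannot tell whether "Mr M Burgess" is the same person; initials and titles need separate rules or a human check.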
18
Introducing DQ problems
Creator Custodian Consumer
Data production
Same data collected in different data sets
  Customer data: Sales, Support, Finance, …
  Hospital: clinical, diagnosis, specialist treatment, finance, …
  Different purpose, different data stored
  Not necessarily the same values
  Different entry procedures & constraints
  Different relevant information
  Cascading updates?
(Strong et al 1997)
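When the same customer is held in several data sets, a simple cross-check shows where the copies disagree. The sketch below is a minimal illustration: the department names come from the slide, but the records, field names and values are invented.

```python
from collections import defaultdict

# Hypothetical copies of one customer's details held by three departments.
records = {
    "Sales":   {"customer_id": 42, "postcode": "AB1 2CD", "email": "m.b@example.org"},
    "Support": {"customer_id": 42, "postcode": "AB1 2CD", "email": "mb@example.org"},
    "Finance": {"customer_id": 42, "postcode": "AB1 2CE"},
}


def conflicting_fields(records):
    """Return {field: {value: [sources]}} for every field whose sources disagree."""
    seen = defaultdict(lambda: defaultdict(list))
    for source, record in records.items():
        for field, value in record.items():
            seen[field][value].append(source)
    return {field: dict(values) for field, values in seen.items() if len(values) > 1}


conflicts = conflicting_fields(records)
for field, values in conflicts.items():
    print(field, values)
```

The check reports that a conflict exists; deciding which value is correct still needs a rule about which source is authoritative for each field.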
19
Introducing DQ problems
Creator Custodian Consumer
Data storage
Potentially large volumes of data
  Accessibility challenges
  Access codes (eg country: 1-UK, 2-USA, …)
Distributed data
  Heterogeneous storage systems
  Potentially inconsistent data formats & values
(Strong et al 1997)
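Coded values are a typical source of inconsistency between heterogeneous systems. In the sketch below, system A uses the country codes from the slide (1-UK, 2-USA); system B and its code table are hypothetical, invented to show what happens when two stores assign the same numbers differently.

```python
SYSTEM_A_COUNTRY = {1: "UK", 2: "USA"}   # coding from the slide's example
SYSTEM_B_COUNTRY = {1: "USA", 2: "UK"}   # hypothetical second system, different coding


def decode(code, table):
    """Translate a stored code into its meaning, rejecting unknown codes."""
    if code not in table:
        raise ValueError(f"unknown country code: {code}")
    return table[code]


# The same customer as stored in each system (invented records).
record_a = {"customer": "C001", "country": 1}
record_b = {"customer": "C001", "country": 1}

# Comparing raw codes suggests the two records agree ...
same_raw = record_a["country"] == record_b["country"]

# ... but decoding each code with its own table shows that they do not.
same_decoded = (decode(record_a["country"], SYSTEM_A_COUNTRY)
                == decode(record_b["country"], SYSTEM_B_COUNTRY))

print(same_raw, same_decoded)
```

Without the code table, a stored value such as 1 is data with no recoverable meaning, which is why losing the documentation for a coding scheme is itself a data quality problem.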
20
Introducing DQ problems
Creator Custodian Consumer
Data usage
Information needs change
  Personal requirements
  Organisational environment
  Data no longer relevant
Conflicts between accessibility and security, privacy & confidentiality
Access limitation due to lacking IT resources
(Strong et al 1997)
But who are these people?
21
An Issue of Change
Organisations change
The environment changes
  government, competition, market needs, customers, customer requirements …
Requirements & specifications change
Different projects have different requirements
  Require data for different purposes
Ideal world: stop data entry, clean, ensure fit for purpose, restart with perfect database
  Tomorrow it will no longer be perfect!
22
10 Potholes to IQ
#1 Multiple sources of the same information produce different values.
#2 Information is produced using subjective judgments, leading to bias.
#3 Systemic errors in information production lead to lost information.
#4 Large volumes of stored information make it difficult to access information in a reasonable time.
#5 Distributed heterogeneous systems lead to inconsistent definitions, formats, and values.
#6 Nonnumeric information is difficult to index.
#7 Automated content analysis across information collections is not yet available.
#8 As information consumers’ tasks and the organisational environment change, the information that is relevant and useful changes.
#9 Easy access to information may conflict with requirements for security, privacy, and confidentiality.
#10 Lack of sufficient computing resources limits access.
(Strong et al 1997)
Managing Information
Manage data/information as a product, not a by-product … TQM for Data!
24
The Deloitte CIO club, October 2005
50% of CIOs report that data quality issues have had a negative impact on their business in the last year, and 6% say it affects them on a daily basis. A further 19% are occasionally affected.
50% of CIOs consider data quality to be an IT issue, even though 88% also believe that their non-IT colleagues are aware of the benefits of better quality data.
Data cleansing is reactive, not proactive. Many CIOs stated it only happens “when it’s needed” – for example, when new systems are introduced – with none carrying out regular, programmed data cleansing sweeps.
Panel admits to lack of strategic approach to managing data quality
http://www.deloitte.com/uk/cio/
25
Managing data as a product
Data & Information – typically treated as a by-product
  Focus on system, not data
Treat data/information as a product
  An end deliverable that will satisfy customer needs
  Focus on data & fitness for purpose
Fundamental change in the organisation’s understanding of data
Follow four principles …
  Understand consumer’s information needs
  Manage the data production process
  Manage data as a product with a product life-cycle
  Data product manager – responsible for managing the data product
(Lee et al 2006) (Wang et al 1998)
Creator
Consumer
Custodian
26
Product & Information Manufacturing
ISSUE | DIFFERENCE (examples)
Intangibility | Manufactured products (MP) are tangible; Information Products (IP) are intangible
Inputs | Product process requires raw materials, experience, technology; IP process needs 4 inputs – data, experience, technology, time
Consumption | IPs can be repeatedly consumed; raw materials/MPs need to be replaced
Handling | MPs – limited/single user; IPs potentially used by many simultaneously
… | …
27
TQM to TDQM
TQM – typical foundation for DQ/IQ programmes
[Figure: TDQM cycle (Define, Measure, Analyse, Improve) and TQM cycle (Plan, Do, Check, Act)]
Define the IP
  Identify characteristics of the IP, determine IQ dimensions
  Identify IP requirements
  Identification of IP manufacturing process, and those involved
Analysis
  Pinpoints causes of poor IQ; effects on organisation; consider users; Pareto charts, SPC
Measurement
  Determining extent of IQ problems
  Looks at results of previous attempts to resolve issues – learning from experience
Improvement
  Delivering methods of continuous improvement
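The Pareto chart mentioned under Analysis ranks causes of poor IQ by frequency and tracks the cumulative share, so that effort goes to the few causes behind most defects. A minimal sketch follows; the defect log, cause labels and the 80% cut-off are invented for illustration and are not figures from the presentation.

```python
from collections import Counter

# Hypothetical log of IQ problems: one cause label per recorded defect.
defect_log = (["typo"] * 50 + ["missing field"] * 30 + ["duplicate record"] * 12
              + ["wrong format"] * 5 + ["obsolete value"] * 3)


def pareto(causes):
    """Return (cause, count, cumulative %) rows, most frequent cause first."""
    counts = Counter(causes).most_common()
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for cause, count in counts:
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


rows = pareto(defect_log)
for cause, count, cumulative in rows:
    print(f"{cause:<18}{count:>4}{cumulative:>8}%")

# The 'vital few': causes needed to account for at least 80% of defects.
vital_few = []
for cause, _, cumulative in rows:
    vital_few.append(cause)
    if cumulative >= 80:
        break
print(vital_few)
```

In this invented log, two of the five causes account for 80% of the defects, which is the kind of result that tells an improvement team where to start.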
28
Data Quality Policy
For an organisation to remain engaged & succeed in maintaining a viable, sustained DQ effort
  Proactively support business activities
A DQ policy must reflect the vision of the organisation.
Start DQ management programme … effort not sustained
Single DQ Champion or department … others fail to come on board … not disseminated across business
Organisational policy must involve all functions and activities relating to the maintenance of data products.
29
10 Policy Guidelines
The organisation …
1. … adopts the basic principle of treating information as product, not by-product.
2. … establishes and keeps data quality as a part of the business agenda.
3. … ensures that the data quality policy and procedures are aligned with its business strategy, business policy, and business processes.
4. … establishes clearly defined data quality roles and responsibilities as part of its organisation structure.
5. … ensures that the data architecture is aligned with its enterprise architecture.
(Lee et al 2006)
30
10 Policy Guidelines
6. … takes a proactive approach in managing changing data needs.
7. … has practical data standards in place.
8. … plans for and implements pragmatic methods to identify and solve data quality problems, and has in place a means to periodically review its data quality and data quality environment.
9. … fosters an environment conducive to learning and innovating with respect to data quality activities.
10. … establishes a mechanism to resolve disputes and conflicts among different stakeholders.
(Lee et al 2006)
31
Examples …
http://www.lancashirecare.nhs.uk/documents/FOI_12DataQualityPolicy.pdf
http://www.suffolk.gov.uk/CouncilAndDemocracy/OurPerformance/DataQualityPolicy.htm
32
Review
What is quality?
  Defining Quality & DQ
  Importance of quality data
DQ in databases
  Database design
  Database Integrity
Some examples of poor DQ and its impact
  http://www.iqtrainwrecks.com/
Measuring DQ
Managing data as product
33
References
CROSBY, P.B. (1978) Quality is Free: The Art of Making Quality Certain, McGraw-Hill.
DROMEY, R.G. (1996) Cornering the Chimera. IEEE Software, 13(1), pp 33-43.
JURAN, J.M. & GODFREY, A.B. (1999) Juran's Quality Handbook (Fifth Edition), McGraw-Hill, USA.
LEE, Y.W., PIPINO, L.L., FUNK, J.D. & WANG, R.Y. (2006) Journey to Data Quality, MIT Press, MA, USA.
PIRSIG, R.M. (1974) Zen and the Art of Motorcycle Maintenance, Random House.
REDMAN, T.C. (1995) “Improve Data Quality for Competitive Advantage,” Sloan Management Review, 36(2), Winter 1995, pp 99-107.
REDMAN, T.C. (1997) Data Quality for the Information Age, Artech House.
STRONG, D.M., LEE, Y.W. & WANG, R.Y. (1997) 10 Potholes in the Road to Information Quality, IEEE Computer, August 1997, pp 38-46.
TURBAN, E., ARONSON, J.E. & LIANG, T.P. (2005) Decision Support Systems and Intelligent Systems (7th ed), Prentice-Hall.
WANG, R., LEE, Y.W., PIPINO, L.L. & STRONG, D.M. (1998) “Managing Your Information as a Product,” Sloan Management Review, 39(4), Summer 1998, pp 95-105.
WANG, R. & STRONG, D. (1996) Beyond Accuracy: What data quality means to data consumers. Journal of Management Information Systems, Spring 1996, 12(4), pp 5-33.
WATSON, R.T. (2003) Data Management: Databases and Organizations, Wiley & Sons.