GIS Data Quality
![Page 1: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/1.jpg)
GIS Data Quality
Producing better data quality through robust business processes
Kim Ollivier
BrightStar TRAINING
![Page 2: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/2.jpg)
Schedule Day One
Suggested breaks for the following times:
• Start: 9:00
• Session 1 (90 min)
• Morning tea: 10:30 to 10:45
• Session 2 (105 min)
• Lunch: 12:30 to 1:30
• Session 3 (90 min)
• Afternoon tea: 3:00 to 3:15
• Session 4 (105 min)
• Finish: 5:00
Each session will have an exercise or interactive discussion
![Page 3: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/3.jpg)
Today
• Introduction
• What causes poor quality
Lunch
• Assessing Quality processes
• GIS upgrade project examples
![Page 4: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/4.jpg)
Tomorrow
• Metadata
• Designing rules
Lunch
• Data warehouse and ETL
• Feature maintenance
![Page 5: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/5.jpg)
Overview
• Introduce yourself
• Your goals for this course?
• Build a data quality system
• Avoid the worst traps
• Be able to describe a project scope
  • Budget, timeline, priorities
![Page 6: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/6.jpg)
Sections of course based on
With permission from the author
ISBN 978-0-09771400-2
![Page 7: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/7.jpg)
What is Data Quality?
“If they are fit for their intended uses in operations, decision making and planning.”
“If they correctly represent the real-world construct to which they refer.”
![Page 8: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/8.jpg)
Spatial Accuracy
![Page 9: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/9.jpg)
![Page 10: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/10.jpg)
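Statistical Accuracy

$$\text{Completeness Score} = \frac{\text{Relevant}}{\text{Relevant} + \text{Missing}}$$

$$\text{Accuracy Score} = \frac{\text{Relevant} - \text{Errors}}{\text{Relevant}}$$

$$\text{Overall Score} = \frac{\text{Relevant} - \text{Errors}}{\text{Relevant} + \text{Missing}}$$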
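As a worked illustration of the three scores, here is a minimal Python sketch. The function name `quality_scores` and the sample counts are assumptions made up for this example, not figures from the course.

```python
def quality_scores(relevant: int, missing: int, errors: int) -> dict[str, float]:
    """Return the three slide scores as fractions between 0 and 1."""
    if relevant <= 0:
        raise ValueError("relevant must be positive")
    return {
        "completeness": relevant / (relevant + missing),
        "accuracy": (relevant - errors) / relevant,
        "overall": (relevant - errors) / (relevant + missing),
    }


# Hypothetical dataset: 900 relevant records held, 100 known to be missing,
# 45 of the held records contain an error.
scores = quality_scores(relevant=900, missing=100, errors=45)
```

With these counts completeness is 0.9, accuracy is 0.95 and the overall score is 0.855. The overall score is always the product of the other two, which is why a dataset can look accurate and still score poorly overall.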
![Page 11: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/11.jpg)
Completeness
• LINZ Bulk Data Extract
• metadata\meta.html
![Page 12: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/12.jpg)
Data Profiling
• Find out what is there
• Assess the risks
• Understand data challenges early
• Have an enterprise view of all data
![Page 13: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/13.jpg)
Profile Metrics
• Integrity
• Consistency
• Completeness, Density
• Validity
• Timeliness
• Accessibility
• Uniqueness
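A few of these metrics can be computed with very little code. The sketch below is one possible way to profile a single attribute for density, uniqueness and validity. The function `profile_column`, the field names and the sample road records are all hypothetical.

```python
def profile_column(records, field, valid_values=None):
    """Profile one field: density (share filled), distinct count, uniqueness, invalid values."""
    values = [r.get(field) for r in records]
    filled = [v for v in values if v not in (None, "")]
    profile = {
        "count": len(values),
        "density": len(filled) / len(values) if values else 0.0,
        "distinct": len(set(filled)),
        "unique": len(set(filled)) == len(filled),
    }
    if valid_values is not None:
        # Validity: filled values that are not in the agreed domain.
        profile["invalid"] = sorted({v for v in filled if v not in valid_values})
    return profile


# Made-up sample data for illustration only.
roads = [
    {"road_id": 1, "surface": "sealed"},
    {"road_id": 2, "surface": "gravel"},
    {"road_id": 2, "surface": ""},
    {"road_id": 4, "surface": "SEALED"},
]
id_profile = profile_column(roads, "road_id")
surface_profile = profile_column(roads, "surface", valid_values={"sealed", "gravel"})
```

Here the key field is fully populated but not unique, and the surface field is 75% dense with one value outside the assumed domain.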
![Page 14: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/14.jpg)
Security
• Confidentiality
• Possession
• Integrity
• Authenticity
• Availability
• Utility
![Page 15: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/15.jpg)
Consistency
• Discrepancies between attributes
• Exceptions in a cluster
• Spatial discrepancies
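Discrepancies between attributes can be tested by writing each expectation as a small rule and listing the records that break it. This is a sketch under assumptions: the rule names, field names and road records are invented for the example.

```python
def attribute_discrepancies(records, rules):
    """Return (record id, rule name) for every record that fails a consistency rule."""
    found = []
    for record in records:
        for name, is_consistent in rules.items():
            if not is_consistent(record):
                found.append((record["id"], name))
    return found


# Hypothetical rules relating two or more attributes of the same feature.
rules = {
    "sealed_roads_have_seal_year": lambda r: r["surface"] != "sealed" or r["seal_year"] is not None,
    "seal_year_not_before_built": lambda r: r["seal_year"] is None or r["seal_year"] >= r["built_year"],
}
roads = [
    {"id": "R1", "surface": "sealed", "built_year": 1980, "seal_year": 1995},
    {"id": "R2", "surface": "sealed", "built_year": 1990, "seal_year": None},
    {"id": "R3", "surface": "gravel", "built_year": 2001, "seal_year": 1999},
]
issues = attribute_discrepancies(roads, rules)
```

Each attribute in R2 and R3 could pass a single-field check on its own. The problems only show when attributes are compared with each other.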
![Page 16: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/16.jpg)
![Page 17: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/17.jpg)
![Page 18: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/18.jpg)
A GIS Data Quality System
[Diagram: stages of a data quality system]
• Assess: Data Quality Assessment; Data Profiling
• Improve: Data Cleaning
• Prevent: Monitoring Data Integration Interfaces; Ensuring Quality of Data Conversion and Consolidation
• Recognise: Building Data Quality Metadata Warehouse
• Monitor: Recurrent Data Quality Assessment
![Page 19: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/19.jpg)
Course examples
• LINZ coordinate upgrade 1998-2003
• NSCC services upgrade 2008
• Valuation roll structure and matching
• ETL of utilities from SDE to AutoCAD
• Address location issues NAR, DRA
Documents and examples on memory stick
![Page 20: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/20.jpg)
Exercise 1: Nominate your database
Select a representative example dataset for later discussion
• You may be responsible for it
• Or, you have to integrate it
• Or, you have to load it
• Or, you supply it to others
Morning Tea
![Page 21: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/21.jpg)
Assessing Quality
1. Project steps
2. Required roles
3. Defining the objectives
4. Designing rules
5. Scorecard and Metadata
6. Frequency of assessment
7. Common mistakes
![Page 22: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/22.jpg)
Processes Affecting Data Quality
[Diagram: three groups of processes acting on the Database]
• Processes bringing data from outside: Initial Data Conversion; System Consolidations; Manual Data Entry; Batch Feeds; Real-Time Interfaces
• Processes causing data decay: Changes not captured; System Upgrades; New Data Uses; Loss of Expertise; Process Automation
• Processes changing data from within: Data processing; Data cleaning; Data purging
![Page 23: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/23.jpg)
Outside: Initial Data Conversion
• Define data mapping
• Extract, Transform, Load (ETL)
• Drown in Data Problems
• Find Scapegoat
![Page 24: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/24.jpg)
Outside: System Consolidation
• Often from mergers (Auckland?)
  • Unplanned, unreasonable timeframes
• Head-on two car wreck
• Square pegs into round holes
• Winner-loser merging (50% wrong)
![Page 25: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/25.jpg)
Outside: Manual Data Entry
• High error rate
• Complex and poor entry forms
• Users find ways around checks
• Forcing non-blanks does not work
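One way to see why forcing non-blanks does not work is to count the filler values users type to get past a mandatory field. The sketch below assumes a hand-picked set of placeholders. The `PLACEHOLDERS` set and the sample owner names are illustrative guesses and would need tuning for a real dataset.

```python
from collections import Counter

# Assumed filler values; extend this after looking at real frequency reports.
PLACEHOLDERS = {"", "n/a", "na", "none", "unknown", "x", "xx", "tbc", "-", ".", "0"}


def placeholder_counts(values):
    """Count values that are technically non-blank but carry no information."""
    normalised = (str(v).strip().lower() for v in values)
    return Counter(v for v in normalised if v in PLACEHOLDERS)


# Made-up sample of a mandatory "owner name" field.
owner_names = ["J Smith", "x", "N/A", "unknown", "X", "A Jones", "."]
found = placeholder_counts(owner_names)
```

In this sample five of the seven entries pass a non-blank check while telling us nothing about the owner.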
![Page 26: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/26.jpg)
Outside: Batch Feeds
• Large volumes mean lots of errors
• Source system subject to changes
• Errors accumulate
• Especially dangerous if triggers activated
![Page 27: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/27.jpg)
Outside: Real-Time Interfaces
• Data between db’s in synchronisation
• Data in small packets out of context
• Too fast to validate
• Rejection loses record, so accepted
Faster or better but not both!
![Page 28: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/28.jpg)
Decay: Changes Not Captured
• Object changes are unnoticed by computers
• Retroactive changes may not be propagated
![Page 29: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/29.jpg)
Decay: System Upgrades
• The data is assumed to comply with the new requirements
• Upgrades are tested against what the data is supposed to be, not what is actually there
• Once upgrades are implemented everything goes haywire
![Page 30: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/30.jpg)
Decay: New Data Uses
• “Fitness to the purpose of use” may not apply
• Acceptable error rates may now be an issue
• Value granularity, map scale
• Data retention policy
![Page 31: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/31.jpg)
Decay: Loss of Expertise
• Meaning of codes may change over time in ways that only “experts” know
• Experts know when data looks wrong
• Retirees rehired to work systems
• Auckland address points were entered on corners and the rest guessed, later used as exact.
![Page 32: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/32.jpg)
Decay: Process Automation
• Web 2.0 bots automate form filling
• Transactions are generated without ever being checked by people
• Customers given automated access are more sensitive to errors in their own data
![Page 33: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/33.jpg)
Within: Data Processing
• Changes in the programs
• Programs may not keep up with changes in data collection
• Processing may be done at the wrong time
![Page 34: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/34.jpg)
Special GIS Data Issues
• Coordinate data not usually readable
• Data models CAD v GIS
• Fuzzy matching is not Boolean (near)
• Atomic objects harder to define
• Features have 2, 3, 4, 5 dimensions
• Projection systems are not exact
• Topology requires special operators
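The point that spatial matching is "near" rather than exact can be shown with a tolerance search. This is a minimal sketch that assumes planar coordinates in a projected system with metre units. The function `nearest_within`, the manhole ids and the coordinates are hypothetical.

```python
import math


def nearest_within(point, candidates, tolerance):
    """Return (candidate id, distance) for the closest candidate within tolerance, else None."""
    best = None
    for candidate_id, xy in candidates.items():
        distance = math.dist(point, xy)
        if distance <= tolerance and (best is None or distance < best[1]):
            best = (candidate_id, distance)
    return best


# Made-up manhole positions in projected metres.
manholes = {"MH1": (1000.0, 2000.0), "MH2": (1010.0, 2000.0)}
match = nearest_within((1000.3, 2000.4), manholes, tolerance=1.0)
no_match = nearest_within((1005.0, 2000.0), manholes, tolerance=1.0)
```

An equality test on the coordinates would reject the first point even though it is only half a metre from MH1. The tolerance is a business decision and belongs in the metadata alongside the rule.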
![Page 35: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/35.jpg)
Within: Data Purging
• Highly risky for data quality
• Relevant data may be purged
• Erroneous data may fit criteria
• It may not work the next year
![Page 36: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/36.jpg)
Within: Data Cleaning
• En masse processes may add errors
• Cleaning processes may have bugs
• Incomplete information about data
![Page 37: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/37.jpg)
Assessing Data Quality
• Data profiling
• Interview users
• Examine data model
• Data Gazing
![Page 38: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/38.jpg)
Data Gazing
• Count the records
• Just open the sources and scroll
• Sort and look at the ends
• Run some simple frequency reports
• See if the field names make sense
• What is missing that should be there
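Several of the data gazing steps above (count, sort and look at the ends, frequency report) can be sketched in a few lines of Python. The function `gaze` and the sample parcel records are assumptions for illustration, and the sketch assumes the field holds text values.

```python
from collections import Counter


def gaze(records, field, ends=2):
    """Count records, show both ends of the sorted values, and list value frequencies."""
    values = [r.get(field) for r in records]
    present = sorted(v for v in values if v not in (None, ""))
    return {
        "count": len(values),
        "blank": len(values) - len(present),
        "lowest": present[:ends],
        "highest": present[-ends:],
        "frequencies": Counter(present).most_common(),
    }


# Made-up sample records.
parcels = [
    {"suburb": "Albany"}, {"suburb": "Takapuna"}, {"suburb": "albany"},
    {"suburb": ""}, {"suburb": "Albany"}, {"suburb": "ZZZ TEST"},
]
report = gaze(parcels, "suburb")
```

Looking at the ends of the sorted list is what exposes the test record and the lower-case variant, because oddities tend to sort to the top or bottom.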
Lunch
![Page 39: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/39.jpg)
Data Cleaning
• There are always lots of errors
• It is too much to inspect all by hand
• Data experts are rare and too busy
• It does not fix process errors
• You may make it worse
![Page 40: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/40.jpg)
Automated Cleaning
• The only practical method
• Needs sophisticated pattern analysis
• Allow for backtracking
• Data quality rules are interdependent
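"Allow for backtracking" can be as simple as logging every value a cleaning rule overwrites so the change can be reversed. The sketch below is one possible approach, not a prescribed method. The functions `apply_rule` and `undo` and the pipe records are hypothetical.

```python
def apply_rule(records, field, rule, log):
    """Apply rule(value) to one field, logging each overwritten value so it can be undone."""
    for index, record in enumerate(records):
        old = record[field]
        new = rule(old)
        if new != old:
            log.append((index, field, old))
            record[field] = new


def undo(records, log):
    """Backtrack by restoring logged values in reverse order."""
    while log:
        index, field, old = log.pop()
        records[index][field] = old


# Made-up sample records with inconsistent material codes.
pipes = [{"material": "pvc "}, {"material": "PVC"}, {"material": "Conc"}]
changes = []
apply_rule(pipes, "material", lambda v: v.strip().upper(), changes)
cleaned = [p["material"] for p in pipes]
undo(pipes, changes)
restored = [p["material"] for p in pipes]
```

Restoring in reverse order matters once several interdependent rules have touched the same field, because the last change has to be unwound first.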
![Page 41: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/41.jpg)
Common Mistakes
1. Inadequate Staffing of Data Quality Teams
2. Hoping That Data Will Get Better by Itself
3. Lack of Data Quality Assessment
4. Narrow Focus
5. Bad Metadata
6. Ignoring Data Quality During Data Conversions
7. Winner-Loser Approach in Data Consolidation
8. Inadequate Monitoring of Data Interfaces
9. Forgetting About Data Decay
10. Poor Organization of Data Quality Metadata
![Page 42: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/42.jpg)
Metadata
• Data model
• Business rules, relations, state
• Subclasses (lookup tables)
• GIS Metadata (NZGLS or ISO) XML
• Readme.txt
Includes everything known about the data
![Page 43: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/43.jpg)
Data Exchange
• Batch or interactive
• ETL (Extract Transform Load)
• Replication
• Time differences in data
![Page 44: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/44.jpg)
GIS in Business Processes
• Integrates many different sources
• Spatial patterns are revealed
• Display thousands of records simultaneously with direct access
• Location now seen as important
![Page 45: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/45.jpg)
Scorecard
• DQ Score
• Score Summary
• Score Decompositions
• Intermediate Error Reports
• Atomic Level Data Quality Information
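The scorecard layers can be read as successive roll-ups of the atomic error list. The sketch below assumes one particular scoring choice, the share of records with no errors, which is an assumption for this example and not a definition from the course. The function `scorecard`, the rule names and the error list are hypothetical.

```python
from collections import Counter


def scorecard(record_count, errors):
    """Roll atomic (record id, rule name) errors up into per-rule scores and one DQ score."""
    by_rule = Counter(rule for _, rule in errors)
    bad_records = {record_id for record_id, _ in errors}
    return {
        # Assumed definition: fraction of records with no errors at all.
        "dq_score": (record_count - len(bad_records)) / record_count,
        # Decomposition: fraction of records passing each individual rule.
        "by_rule": {rule: (record_count - n) / record_count for rule, n in by_rule.items()},
    }


# Made-up atomic level errors for a 10 record dataset.
errors = [(3, "missing_address"), (7, "missing_address"), (7, "outside_boundary")]
card = scorecard(record_count=10, errors=errors)
```

Record 7 fails two rules but counts once in the top-level score, so the overall figure is not simply derived from the per-rule scores. Keeping the atomic list is what lets any summary be traced back to individual records.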
![Page 46: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/46.jpg)
Case Study
• Outline a GIS data quality system
• Measles Chart
• Prioritise
• Interview
• Build up a scorecard
Afternoon Tea
![Page 47: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/47.jpg)
Assessment Exercise
• Split into pairs
• Interview one person about their dataset
• Collect basic information
• Devise a strategy for a profile
• Rotate pair with another
• Interview other person
• Verbal reports to class
![Page 48: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/48.jpg)
Major Upgrade Projects
• LINZ Coordinate upgrade
• NSCC Coordinate upgrade
![Page 49: GIS Data Quality](https://reader035.fdocuments.in/reader035/viewer/2022081506/568148ab550346895db5bf44/html5/thumbnails/49.jpg)
References
• Data Quality Assessment, Arkady Maydanchik