Completeness

Completeness February 27, 2006 Geog 458: Map Sources and Errors

Page 1: Completeness

Completeness

February 27, 2006

Geog 458: Map Sources and Errors

Page 2: Completeness

Outline

• Completeness

• Testing completeness

• Documenting completeness in the metadata

• Data quality

Page 3: Completeness

Completeness

• A data set is called “complete” if everything that is defined as needed is encoded in the database (DB)

• Spatial completeness: degree to which all features are captured according to the data capture specifications

• Attribute completeness: degree to which the relevant attributes of a feature are available according to a given capture specification

• The data quality component that describes whether the entity objects represent all entity instances of the corresponding abstract universe

• The relationship between the objects represented in the data set and the abstract universe of all such objects

Page 4: Completeness

Abstract universe

• Can be thought of as a reference frame

• Data set = digital representation of a subset of (perceived) reality

• Abstract universe = terrain nominal (nominal ground); abstract view of the universe; universe of discourse; miniworld; subset of perceived reality (it involves a selection and abstraction process)

• The data set is intended to represent the abstract universe

• Since completeness describes the relationship between the data set and the abstract universe, a useful characterization of completeness relies on a comprehensive definition of the abstract universe

Page 5: Completeness

Data completeness vs. Model completeness

• It is possible to classify completeness into two categories depending on how the abstract universe is defined or specified

• Data completeness: the abstract universe is defined by generic uses of the data; application-independent

• Model completeness: the abstract universe is defined by specific uses of the data; application-dependent

• So which would be more flexible? Which would have multiple versions of completeness on the same data?

Page 6: Completeness

Spatial completeness

• Let’s say the abstract universe “lake” is defined as any water body with an area of more than 1 square mile

• Count the entities in the abstract universe; call this number A

• Count the entities encoded in the DB (lake data set); call this number B

• Completeness would then be B/A

• The definition of “lake” varies across applications, so A varies as well
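The B/A procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: a reference source (for example a field survey) enumerates the abstract universe, encoded features are matched to it by a shared identifier, and `WaterBody`, `in_abstract_universe`, `spatial_completeness`, and the sample data are all hypothetical. B counts only encoded features that also belong to the abstract universe; the slide does not say how extra features in the DB should be handled, so this is a labelled choice that keeps the ratio at or below 1.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class WaterBody:
    """Hypothetical feature record: an identifier plus its area in square miles."""
    feature_id: str
    area_sq_mi: float


def in_abstract_universe(wb: WaterBody, min_area_sq_mi: float = 1.0) -> bool:
    """Membership test for the abstract universe "lake": area above the threshold."""
    return wb.area_sq_mi > min_area_sq_mi


def spatial_completeness(
    reference: Iterable[WaterBody],
    encoded: Iterable[WaterBody],
    min_area_sq_mi: float = 1.0,
) -> float:
    """Return B/A for the given definition of "lake"."""
    # A: entities of the abstract universe, enumerated from the reference source.
    universe_ids = {
        wb.feature_id for wb in reference if in_abstract_universe(wb, min_area_sq_mi)
    }
    a = len(universe_ids)
    if a == 0:
        raise ValueError("the abstract universe is empty; completeness is undefined")
    # B: encoded entities that belong to the abstract universe (matched by ID),
    # so extra features in the DB cannot push the ratio above 1.
    b = len({wb.feature_id for wb in encoded} & universe_ids)
    return b / a


reference = [
    WaterBody("W1", 3.2),
    WaterBody("W2", 1.5),
    WaterBody("W3", 0.4),
    WaterBody("W4", 12.0),
]
encoded = [WaterBody("W1", 3.2), WaterBody("W3", 0.4), WaterBody("W4", 12.0)]

print(spatial_completeness(reference, encoded))                      # lakes > 1 sq mi
print(spatial_completeness(reference, encoded, min_area_sq_mi=0.0))  # any water body
```

With the 1 square mile definition, A = 3 and B = 2, giving 2/3; counting every water body gives A = 4 and B = 3, giving 0.75. The same data set therefore has a different completeness for each definition of the abstract universe.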

Page 7: Completeness

Attribute completeness

• Subordinate to spatial completeness

• Define what the relevant attributes will be

– A lake will have area, depth, type (freshwater), and so on

• Check whether attribute values are missing for each entity in hand

– The geometric description might be incomplete (area)

• Report on the number of missing values out of the total number of features for each attribute
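The per-attribute report in the last bullet can be sketched as follows. The sketch assumes features are plain Python dicts and treats both an absent key and an explicit `None` as a missing value. The attribute list, the function name `attribute_completeness`, and the sample lakes are hypothetical.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple

# Hypothetical choice of relevant attributes, following the lake example.
RELEVANT_ATTRIBUTES = ("area", "depth", "type")


def attribute_completeness(
    features: Sequence[Mapping[str, Any]],
    attributes: Sequence[str] = RELEVANT_ATTRIBUTES,
) -> Dict[str, Tuple[int, int]]:
    """For each attribute, return (number of missing values, total number of features)."""
    total = len(features)
    report = {}
    for attr in attributes:
        # An absent key and an explicit None are both treated as missing.
        missing = sum(1 for f in features if f.get(attr) is None)
        report[attr] = (missing, total)
    return report


lakes = [
    {"id": "W1", "area": 3.2, "depth": 12.0, "type": "freshwater"},
    {"id": "W4", "area": 12.0, "depth": None, "type": "freshwater"},
    {"id": "W5", "area": None, "type": "freshwater"},
]

for attr, (missing, total) in attribute_completeness(lakes).items():
    print(f"{attr}: {missing} of {total} values missing")
```

Here area is missing for one of the three lakes and depth for two, while type is always present. Each count is reported against the same total, which matches the slide's "out of the total number of features".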

Page 8: Completeness

Relation to other data quality components

• Completeness may affect the logical consistency of a data set

– Missing arc: node connectivity, closed polygon

– Missing attribute (left and right node): connectivity

– Missing attribute in a primary key (PK) constraint

– Missing attribute in a foreign key (FK) referential constraint

• So where should this be documented: under completeness or under logical consistency?

– If incompleteness causes logical inconsistency, describe it in the logical consistency section

– Otherwise, include it in the completeness section
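The documentation rule above can be sketched as a small routing function. It assumes, hypothetically, that a missing value causes a logical inconsistency exactly when the attribute is the arc's key or one of its node references. A real data set would derive that set from its own PK/FK and topology rules. `CONSTRAINED_ATTRIBUTES`, `metadata_section`, `route_missing_values`, and the sample arcs are all illustrative names.

```python
from typing import Any, List, Mapping, Sequence, Tuple

# Hypothetical set of attributes whose absence breaks an integrity rule:
# the arc's primary key and its node references (used for connectivity).
CONSTRAINED_ATTRIBUTES = frozenset({"arc_id", "from_node", "to_node"})


def metadata_section(attribute: str) -> str:
    """Gaps that cause logical inconsistency go there; all others go to completeness."""
    if attribute in CONSTRAINED_ATTRIBUTES:
        return "logical consistency"
    return "completeness"


def route_missing_values(
    records: Sequence[Mapping[str, Any]], attributes: Sequence[str]
) -> List[Tuple[int, str, str]]:
    """List (record index, attribute, metadata section) for every missing value."""
    routed = []
    for i, record in enumerate(records):
        for attr in attributes:
            if record.get(attr) is None:
                routed.append((i, attr, metadata_section(attr)))
    return routed


arcs = [
    {"arc_id": "A1", "from_node": "N1", "to_node": "N2", "name": None},
    {"arc_id": "A2", "from_node": "N2", "to_node": None, "name": "Main St"},
]
for entry in route_missing_values(arcs, ("arc_id", "from_node", "to_node", "name")):
    print(entry)
```

The missing street name on the first arc is a plain completeness issue. The missing node reference on the second arc breaks connectivity, so it is reported under logical consistency.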

Page 9: Completeness

Data quality vs. fitness of use

• Data quality

– The totality of features and characteristics of a data set that bear on its ability to satisfy a stated set of requirements; application-independent

• Fitness of use

– The totality of features and characteristics of a data set that bear on its ability to satisfy a set of requirements given by the application; application-dependent

Page 10: Completeness

Data quality vs. fitness of use

• Data quality information is usually provided by the producer of a data set

• Fitness of use is assessed by users when they evaluate a data set for their own application; this principle is referred to as truth in labelling (users are, in effect, responsible for quality control)

• See different approaches to quality control in the lecture note on spatial data quality
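The split between producer-reported quality and user-assessed fitness can be sketched as a comparison of one report against several sets of requirements. This is a minimal sketch. It assumes every measure is a score where higher is better (so it would not suit, say, a positional error in metres), and it treats a measure absent from the report as failing. The function name `assess_fitness`, the measure names, and both applications are hypothetical.

```python
from typing import Dict, Mapping


def assess_fitness(
    reported: Mapping[str, float], required: Mapping[str, float]
) -> Dict[str, bool]:
    """Check each measure the application requires against the producer's report."""
    # A measure missing from the report cannot be shown to meet the requirement.
    return {
        measure: reported.get(measure, float("-inf")) >= minimum
        for measure, minimum in required.items()
    }


# Producer's report (application-independent); values reuse the lake example.
reported = {"spatial_completeness": 2 / 3, "depth_completeness": 1 / 3}

# Two hypothetical applications with different requirements on the same data.
regional_inventory = {"spatial_completeness": 0.9}
recreation_map = {"spatial_completeness": 0.5}

print(assess_fitness(reported, regional_inventory))
print(assess_fitness(reported, recreation_map))
```

The report is the same in both calls, but the data set fails the first application's requirement and meets the second's. Under truth in labelling, the producer supplies `reported` and each user supplies their own `required`.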

Page 11: Completeness

Data quality report

• What you report in the data quality section of the metadata should be application-independent, so that it can be reused for any potential use of the data

• Reporting data quality can be thought of as the process of evaluating the ability of the data set to meet the stated requirements

– That is: how close the values are to ground truth (attribute/positional accuracy), whether the data set is free of contradictions (logical consistency), and whether what’s relevant is encoded in the DB (completeness)
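For the completeness part of such a report, the spatial and attribute results can be turned into plain text. An example destination is the free-text Completeness Report element of the FGDC metadata standard. This is a minimal sketch: the function name `completeness_report`, its parameters, and the wording of each line are hypothetical choices. It states the definition of the abstract universe alongside the numbers, because completeness is only meaningful relative to that definition.

```python
from typing import Mapping, Tuple


def completeness_report(
    universe_definition: str,
    a: int,
    b: int,
    missing_by_attribute: Mapping[str, Tuple[int, int]],
) -> str:
    """Format spatial and attribute completeness results as report text."""
    if a <= 0:
        raise ValueError("A must be positive for B/A to be defined")
    lines = [
        f"Abstract universe: {universe_definition}.",
        f"Spatial completeness: {b} of {a} features encoded ({b / a:.1%}).",
    ]
    # One line per attribute, in a stable (alphabetical) order.
    for attr, (missing, total) in sorted(missing_by_attribute.items()):
        lines.append(f"Attribute '{attr}': {missing} of {total} values missing.")
    return "\n".join(lines)


print(
    completeness_report(
        "water bodies with an area of more than 1 square mile",
        a=3,
        b=2,
        missing_by_attribute={"depth": (2, 3), "area": (1, 3)},
    )
)
```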