Data Quality and Error

Data Quality and Error. Presented by S Bensinghdhas, M.E (Design), Asst. Lecturer, SJCET, Dar es Salaam



Error
Whenever you work with spatial data (or any data, for that matter), you will encounter some error, because creating spatial data involves many steps. Spatial data is only an abstraction of what is really there. Because of this abstraction, we can expect error due to:
- How we conceptualize the data in the first place
- How we collect the data
- How we present the data
Additionally, there are other sources of error, such as:
- Obvious errors
- Errors in natural variation
- Errors in data processing

Data Quality
- GIS is a garbage magnifier
- Garbage in, garbage out
- Most failed GIS projects are due to poor planning and poor data quality

Obvious Error
The errors we just discussed illustrate the general types of obvious errors you will encounter when using geospatial information. As a geospatial analyst, you will have to think about how to correct those errors before proceeding with a project.
You should also approach every project with these obvious sources of error firmly in mind. When you are given a task and its associated data, the following makes a good checklist:
- Is the data current?
- Were the data mapped at the correct scale? Do they have the same accuracies?
- What is the resolution of the data? Will it support the kinds of analysis we want to perform?
- Do we have all the data for the project areas, or is some data missing?
- If we need other data sets, are they available, or will we have trouble getting them?

Components of Data Quality
- Positional Accuracy
- Attribute Accuracy
- Logical Consistency
- Resolution
- Completeness

Spatial Accuracy
As we previously stated, positional accuracy relates to the coordinate values of geographic objects. Positional accuracy is itself divided into two categories:
- Absolute accuracy: refers to the actual X,Y coordinates of a geographic object. If the correct position of the object is known, it can be compared with the position recorded in the geographic database. Typically, absolute accuracy is reported either as the total difference between the two positions, or as the difference in the X coordinate and the difference in the Y coordinate.
- Relative accuracy: refers to the displacement of two or more points on a map (in both distance and angle), compared to the displacement of those same points in the real world.
The figures on the right show two different maps of the Cornell campus and the City of Ithaca. The top map, a USGS quadrangle, has an absolute accuracy of around 40 feet. That is, the coordinates for a building on the quad sheet are probably within 40 feet of their real-world coordinates. The bottom map, a photogrammetrically derived map of the same area, has an absolute accuracy of about 2.5 feet.
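The absolute accuracy measure described above can be sketched in a few lines of Python. This is only an illustration: the function name `absolute_error` and the coordinates are hypothetical and do not come from the Cornell maps.

```python
import math

def absolute_error(mapped, true):
    """Return (dx, dy, total) offsets between a mapped point and its true position."""
    dx = mapped[0] - true[0]
    dy = mapped[1] - true[1]
    # Total difference is the straight-line distance between the two positions.
    return dx, dy, math.hypot(dx, dy)

# Hypothetical coordinates, in feet.
true_xy = (1000.0, 2000.0)
mapped_xy = (1030.0, 2040.0)

dx, dy, total = absolute_error(mapped_xy, true_xy)
print(dx, dy, total)  # 30.0 40.0 50.0
```

Reporting dx and dy separately shows whether a dataset is shifted in one direction, while the total gives a single figure to compare against an accuracy standard.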

Relative Accuracy
Even though the USGS quadrangle has much less absolute accuracy than the photogrammetrically derived map, if we were to zoom into an area and measure the distance between two points, the relative distance and the angle would be fairly similar. In this case, the distance along Tower Road differs by only about 15 feet, and the azimuth of the road is virtually identical.
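A rough sketch of a relative accuracy check follows, assuming planar coordinates and an azimuth measured clockwise from north. The function names and the sample points are hypothetical. The example uses a map that is shifted 40 units east everywhere, so its absolute error is 40 but its relative error is zero.

```python
import math

def distance_and_azimuth(p, q):
    """Distance and azimuth (degrees clockwise from north) from point p to point q."""
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    distance = math.hypot(dx, dy)
    azimuth = math.degrees(math.atan2(dx, dy)) % 360.0
    return distance, azimuth

def relative_error(map_p, map_q, ref_p, ref_q):
    """Differences in distance and azimuth between a map pair and a reference pair."""
    d_map, az_map = distance_and_azimuth(map_p, map_q)
    d_ref, az_ref = distance_and_azimuth(ref_p, ref_q)
    # Wrap the azimuth difference into the range [-180, 180).
    d_az = (az_map - az_ref + 180.0) % 360.0 - 180.0
    return d_map - d_ref, d_az

# Hypothetical points: the map is offset 40 units east of the reference.
ref_p, ref_q = (0.0, 0.0), (0.0, 1000.0)
map_p, map_q = (40.0, 0.0), (40.0, 1000.0)

print(relative_error(map_p, map_q, ref_p, ref_q))  # (0.0, 0.0)
```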

Positional Accuracy

Attribute Accuracy

Logical Consistency
Representation of data that does not make sense:
- Road in the water
- Contours that cross or end
- Features on steep slopes

Resolution
- Generalization may improperly represent size and shape
- Cartographic aesthetics
- Entire regions may be eliminated (islands, peninsulas, etc.)

Completeness
- Fragmented coverage of many developing countries
- Soils
- Vegetation
- Must determine methods for uniformity

Obvious Errors
The statement "to err is human" is very applicable to creating spatial data. Humans make a lot of errors, and typing the wrong value into a computer is a common one. However, there are other sources of obvious error besides human error:
- Age: A map is a representation of real-world objects at a given point in time. The reliability of a dataset typically goes down as it gets older. This is especially true of data that change frequently, such as housing within a city. Many GIS projects take years to complete, and it is entirely possible that much of the data collected at the beginning of a project will be out of date by the end.
- Map Scale: In general, larger scale maps show more detail than smaller scale maps. Larger scale maps also tend to have greater accuracy than smaller scale maps, especially within the same family of maps, such as the 1:250,000, 1:100,000 and 1:24,000 USGS maps. Computers and GIS software don't care what data you give them: a GIS will process any data, whether the processing is appropriate or not. You can therefore combine data from different scales rather easily, but doing so may not be a good idea because of the different accuracies of the products.
- Data Format: The way we represent data also presents an obvious source of error. For example, a raster map of land use represented by 10 meter grid cells will differ significantly from a raster map of land use represented by 100 meter grid cells. The following is a grid of land use values around Ithaca, New York. You can see the differences in representation between a map with 10 meter grid cells, 30 meter grid cells, and 100 meter grid cells.
- Areal Coverage: Many data sets may not have uniform coverage. That is, there may be pieces missing in one section.
- Accessibility: Not all data sets are equally accessible. For example, land resource data may be available in one country but considered a state secret in another. Also, following the events of September 11, 2001, some data are unavailable for security reasons.

Problems with Age
The following maps show the different land cover types between 1968 and 1995. You can see how the data has changed over roughly 30 years, and why using older data might present a problem.

Obvious Sources of Error
Areal Coverage
Many data sets do not have uniform coverage of information.

Figure panels: Suffolk County parcels; Nassau County basemap

Problems with Format
You can see the different ways in which data is represented when using different formats. In this case, 10, 30, and 100 meter grid cells are used.
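The effect of cell size can be imitated on a toy grid. The sketch below assumes a majority rule for merging fine cells into coarser ones, which is one common choice and not necessarily how the maps in the slides were produced. The function name `aggregate_majority` and the land use codes (F, U, W) are made up for illustration.

```python
from collections import Counter

def aggregate_majority(grid, factor):
    """Merge factor x factor blocks of cells into one cell holding the most common value."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [
                grid[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            ]
            # Ties go to the value that appears first in the block.
            coarse_row.append(Counter(block).most_common(1)[0][0])
        coarse.append(coarse_row)
    return coarse

# Hypothetical land use grid: F = forest, U = urban, W = water.
fine = [
    ["F", "F", "U", "U"],
    ["F", "W", "U", "U"],
    ["F", "F", "F", "U"],
    ["F", "F", "W", "W"],
]

print(aggregate_majority(fine, 2))  # [['F', 'U'], ['F', 'W']]
```

The single water cell in the upper left block disappears in the coarser grid, which is the kind of loss the 10, 30, and 100 meter comparison illustrates.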

Figure panels: 10 meter, 30 meter, 100 meter

Errors Due to Natural Variation
You can see why each of the previous error types is called an obvious error. But there are other types of error that are not so obvious and are often overlooked. You will nonetheless have to be aware of them. These are termed errors in natural variation, and they take the following forms:
- Positional Errors Due to Natural Variation: Natural variations in materials can make them less accurate. For example, a paper map will stretch or shrink as the humidity of the room it is stored in changes. The change in the material is virtually unnoticeable to a user but, depending on the scale of the map, the real-world errors could be quite large.
- Variations Due to Equipment: Some equipment may not measure information correctly, or may vary slightly from measurement to measurement. For example, a temperature gauge or pH meter may give slightly different readings when measuring the same location. If you've ever measured your blood pressure on one of the automatic machines in a drug store, you have probably noticed that two readings taken one after the other can differ. While some of this is due to fluctuations in your own blood pressure, the machines themselves have some variability.
The variations in measurements are often related to two important concepts called precision and accuracy.

Errors Resulting from Natural Variations from Original Measurements
Positional Accuracy
- Result of poor field work, media shrinkage and expansion, poor vectorization (line digitizing)
- Correction through rubbersheeting
Accuracy of Content
- Attribute errors caused by miscoding or faulty equipment (thermometer, pH meter)
Sources of Variation in Data:
- Data entry or output faults

Errors Resulting from Natural Variations from Original Measurements
Measurement Error
Accuracy vs. Precision
- Accuracy: extent to which an estimated value approaches the true value
- Precision: measure of dispersion of observations about a mean

Accuracy vs. Precision example
Laboratory Errors
- Results of World-wide Laboratory Exchange Program
- Results for the same soil samples analysed in different laboratories differed by more than 11% for clay content

Accuracy and Precision
Accuracy is defined as the displacement of a plotted point from its true position in relation to an established standard, while precision is the degree of perfection, or repeatability, of a measurement. For mapping, accuracy describes how close the mapped position of an object is to its true position. Precision is the ability to repeat a measurement, or how likely you are to return to the same location time and time again. The figures to the right illustrate the differences between accuracy and precision. Therefore, if there are natural variations in either the instruments used for measurement or the object you are measuring, the accuracy or precision may be affected.
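One simple way to put numbers on the two ideas is to treat accuracy as the offset of the mean reading from the true value and precision as the spread of the readings about their mean. The sketch below assumes that interpretation. The pH readings and the function name are hypothetical.

```python
import statistics

def accuracy_and_precision(readings, true_value):
    """Return (bias, spread): mean offset from the true value and sample standard deviation."""
    bias = statistics.mean(readings) - true_value
    spread = statistics.stdev(readings)
    return bias, spread

true_ph = 7.0

# Hypothetical meter A: readings agree closely with each other but sit far from the truth.
meter_a = [7.9, 8.1, 8.0, 8.0]
# Hypothetical meter B: readings centre on the truth but scatter widely.
meter_b = [6.0, 8.0, 7.5, 6.5]

for name, readings in (("A", meter_a), ("B", meter_b)):
    bias, spread = accuracy_and_precision(readings, true_ph)
    print(f"meter {name}: bias = {bias:.2f}, spread = {spread:.2f}")
```

Meter A is precise but not accurate (small spread, bias of about one pH unit), while meter B is accurate on average but imprecise.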

Errors Arising Through Processing
Numerical Errors in the Computer
- Numerical precision
- PC ARC/INFO is single precision
- Some GIS use integer values to store coordinates, so large areas may not be stored precisely
- Scaling a triangle

Faults Arising Through Topological Analysis
Assumes:
- Source data is uniform
- Digitizing procedures are infallible
- Map overlay is only concerned with line intersection
- Boundaries can be sharply defined and drawn

Raster to Vector
GIS allows you to convert features between raster and vector formats. For example, we can take a raster feature and convert it to vector format, or take a vector feature and convert it to raster. But, as the examples show, depending on the resolution of the features, the representation of the geographic objects may be quite different. In some cases, you can see how the raster version of the map actually caused some buildings to merge together.
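The numerical precision point above can be demonstrated by forcing a coordinate through a 32-bit (single precision) float. This is a minimal sketch using only the standard library. The northing value is hypothetical, and the helper `to_float32` is an illustration, not part of any GIS package.

```python
import struct

def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit single precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

# Hypothetical northing in meters, with millimeter detail.
northing = 4_700_000.123

stored = to_float32(northing)
print("original:", northing)
print("stored  :", stored)
print("loss (m):", abs(stored - northing))
```

At this magnitude single precision values are spaced about half a meter apart, so the fractional part of the coordinate is lost when it is stored.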

Vector data of buildings
Vector data converted to raster with 10 grid cells

Raster data converted back to vector, using 10 grid cells

Errors in Data Processing
- Digitizing Data: Once again, scale presents a problem with digitized data. On a soil map drawn at a scale of 1:100,000, a 1 mm wide line (the thickness of a sharp pencil) would actually represent 100 meters on the ground. Or, as shown in the example below, the road edge on the USGS quadrangle is actually 4 meters wide in some spots.
- Spatial Analysis: Some GIS functions, such as overlay, present problems such as ambiguous locations and sliver polygons. Converting data from raster to vector format will also introduce errors. Each of these is shown in the illustrations below.
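The line width arithmetic above is a single multiplication: the width on paper times the scale denominator. A small sketch, with a hypothetical function name:

```python
def ground_width_m(line_width_mm, scale_denominator):
    """Ground width in meters covered by a drawn line on a 1:scale_denominator map."""
    # Millimeters on paper times the scale gives millimeters on the ground.
    return line_width_mm * scale_denominator / 1000.0

print(ground_width_m(1.0, 100_000))  # 100.0
print(ground_width_m(1.0, 24_000))   # 24.0
```

The same 1 mm pencil line covers 100 meters on a 1:100,000 map but 24 meters on a 1:24,000 map, which is why boundaries digitized from small scale maps carry a wide band of positional uncertainty.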

Width of edge of pavement is greater than 4 meters

Errors Associated with Spatial Analysis
Errors in Digitizing a Map
- Source errors
- Distortion
- Boundaries drawn on a map have a thickness. A 1 mm line is:
  - 1.25 m wide on a 1:1,250 map
  - 100 m wide on a 1:100,000 map
- Estimates show that 10% of a 1:24,000 soil map may represent the boundary lines alone
Digital Representation
- Curves are approximated by many vertices
- Boundaries are not absolute, but should have a confidence interval

Sliver Polygons
In the following example, there are two polygons. When we overlay them, the result contains not only the logical intersection of the two polygons, but also many small polygons that probably arise because the two representations of the polygon boundaries differ slightly. These smaller polygons, or sliver polygons, represent spatial errors in the data.

Errors Associated with Spatial Analysis
Boundary Problems
- Definitely in
- Definitely out
- Possibly in
- Possibly out
- Ambiguous (on the digitized border line)
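Sliver polygons are usually long and thin, so one rough way to flag them after an overlay is to test each output polygon for small area and low compactness. The sketch below is a heuristic and not a method taken from the slides. The function names and both thresholds are assumptions that would need tuning for real data.

```python
import math

def area_and_perimeter(ring):
    """Area (shoelace formula) and perimeter of a polygon given as a list of (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter

def is_sliver(ring, max_area=50.0, max_compactness=0.1):
    """Flag polygons that are both small and thin (thresholds are hypothetical)."""
    area, perimeter = area_and_perimeter(ring)
    if perimeter == 0.0:
        return True  # degenerate polygon with no extent
    # Compactness is 1.0 for a circle and approaches 0 for a thin strip.
    compactness = 4.0 * math.pi * area / perimeter ** 2
    return area < max_area and compactness < max_compactness

# Hypothetical overlay output, in meters.
thin_strip = [(0.0, 0.0), (100.0, 0.0), (100.0, 0.2), (0.0, 0.2)]
small_square = [(0.0, 0.0), (5.0, 0.0), (5.0, 5.0), (0.0, 5.0)]

print(is_sliver(thin_strip))    # True
print(is_sliver(small_square))  # False
```

Using both tests together avoids discarding small but genuine features, such as the 5 by 5 meter square, which has a small area but a compact shape.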