Data Storage and Editing (Entity and attribute) DeMers Chapter 6 jeswilso/g438/lecture5

  • date post

    21-Dec-2015

Data Storage and Editing

(Entity and attribute) DeMers Chapter 6

http://www.iupui.edu/~jeswilso/g438/lecture5/


Introduction

• Any analysis performed must be based on good data, correctly organized and in the proper format.

• In raster systems, we may need to display each coverage to isolate illogical or out-of-place grid cells by comparing them to the input document

• In vector systems, we may have to build topology after the initial data input to pinpoint any digitization errors

• In the case of entity-attribute agreement, we may need to output sample portions of our map for comparison against the original input material


Storage of GIS Databases

• Raster: Attribute values for grid cells are the primary data stored in the computer. The values make up the actual grid, and the positions of grid cells are catalogued relative to the order in which they appear; e.g., if you store the origin of the grid, the cell size, and the number of rows and columns, all you need is the cell values

• Vector: It is common for GISs to store vector entities and their associated attributes in separate files (the reason for an RDBMS). For example, in the ArcView shapefile format, entities are stored in one file, attributes in another, and projection info in a third file; an Arc/Info coverage uses a workspace, an entity directory, and an info directory

• Tiling: storage of individual sections (tiles) in predefined subsections. The purpose is to reduce the volume of data needed for analysis of any particular section, e.g., quad boundaries, T&R grid, etc.
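
The raster point above (origin, cell size, and row and column counts are enough to locate every stored value) can be illustrated with a minimal Python sketch. It assumes row-by-row storage starting at the upper-left cell; the names `GridHeader` and `cell_center` are hypothetical and not taken from any GIS package.

```python
from dataclasses import dataclass


@dataclass
class GridHeader:
    """Hypothetical raster header: everything needed besides the cell values."""
    x_origin: float   # x coordinate of the upper-left corner (assumed convention)
    y_origin: float   # y coordinate of the upper-left corner
    cell_size: float  # width and height of one square cell
    n_rows: int
    n_cols: int


def cell_center(header: GridHeader, index: int) -> tuple[float, float]:
    """Return the map coordinates of the center of the cell stored at `index`.

    Assumes values are stored row by row, starting at the upper-left cell.
    """
    if not 0 <= index < header.n_rows * header.n_cols:
        raise IndexError("cell index outside the grid")
    row, col = divmod(index, header.n_cols)
    x = header.x_origin + (col + 0.5) * header.cell_size
    y = header.y_origin - (row + 0.5) * header.cell_size
    return x, y


header = GridHeader(x_origin=100.0, y_origin=200.0, cell_size=10.0, n_rows=3, n_cols=4)
values = [1, 1, 2, 2, 1, 3, 3, 2, 1, 1, 3, 3]  # 3 rows x 4 columns of attribute codes
print(values[5], cell_center(header, 5))  # value and position of the sixth stored cell
```

No coordinates are stored per cell: the position is derived from the header and the storage order alone.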


The Importance of Editing the GIS Database

• Most errors result from improper input

• Generally, at least some errors will always occur and require editing, e.g., pushing the wrong digitizer button (a vertex instead of a node), pushing the wrong keyboard key when entering attribute information, and position errors in digitizing (shaky hand)

• 3 general types of error

• Entity error (position error): primarily associated with the vector model (missing entities, incorrectly placed entities, disordered entities)

• Attribute error: occurs in both vector and raster models (typing errors, misspellings, etc.)

• Entity-attribute agreement error (a.k.a. logical consistency): correctly typed codes attached to the wrong entities


Accuracy

• The degree to which information on a map or in a digital database matches true or accepted values

• An issue pertaining to the quality of data and the number of errors contained in a data set or map

• It is possible to consider horizontal and vertical accuracy with respect to geographic position

• Attribute accuracy - conceptual and logical accuracy

• Level of accuracy required for particular applications varies greatly. Highly accurate data can be very difficult and costly to produce and compile

• e.g., mapping standards employed by the United States Geological Survey (USGS): "requirements for meeting horizontal accuracy as 90 per cent of all measurable points must be within 1/30th of an inch for maps at a scale of 1:20,000 or larger, and 1/50th of an inch for maps at scales smaller than 1:20,000."


Accuracy Standards for Various Scale Maps

• 1:1,200 ± 3.33 feet

• 1:2,400 ± 6.67 feet

• 1:4,800 ± 13.33 feet

• 1:10,000 ± 27.78 feet

• 1:12,000 ± 33.33 feet

• 1:24,000 ± 40.00 feet

• 1:63,360 ± 105.60 feet

• 1:100,000 ± 166.67 feet
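
The tolerances listed here follow from the USGS rule quoted earlier: a map distance of 1/30 or 1/50 inch, multiplied by the scale denominator and converted to feet. The sketch below reproduces the arithmetic; the function name is hypothetical, and the treatment of a scale of exactly 1:20,000 simply follows the wording of the quotation in these notes.

```python
def horizontal_tolerance_feet(scale_denominator: float) -> float:
    """Ground distance in feet allowed by the map-distance tolerance quoted in the notes."""
    # 1/30 inch at 1:20,000 or larger scales, 1/50 inch at smaller scales
    # (boundary handling is an assumption taken from the quoted wording).
    map_inches = 1 / 30 if scale_denominator <= 20_000 else 1 / 50
    ground_inches = map_inches * scale_denominator
    return ground_inches / 12  # 12 inches per foot


for denominator in (1_200, 2_400, 4_800, 10_000, 12_000, 24_000, 63_360, 100_000):
    print(f"1:{denominator:,}  ±{horizontal_tolerance_feet(denominator):.2f} feet")
```

For example, at 1:24,000 the tolerance is 24,000 / 50 = 480 inches on the ground, which is 40 feet.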



Precision

• Refers to the level of measurement and exactness of description in a GIS database (e.g., number of decimal places)

• Precise locational data may measure position to a fraction of a unit, e.g. to the millimeter

• Precise attribute information may specify the characteristics of features in great detail

• Important to realize, however, that precise data--no matter how carefully measured--may be inaccurate

• Level of precision required for particular applications varies greatly. Engineering projects such as road construction require very precise information measured to the millimeter. Demographic analyses of marketing or electoral trends can often make do with less, say to the closest zip code or precinct boundary


Why be concerned about error? - The Problems of Propagation and Cascading

• The discussion to this point has focused on errors that may be present in single sets of data

• "Doing" GIS usually involves comparisons of many sets of data. If errors exist in one or all of the data layers, the solution to the GIS problem generated from them may itself be erroneous

• Inaccuracy, imprecision, and error may be compounded in GIS that employ many data sources


DIGITIZATION (continued)

[Figure: tics numbered 1 to 4 and the geographic features]


Error Propagation and Cascading

• Occurs when one error leads to another

• Means that erroneous, imprecise, and inaccurate information will skew a GIS solution when information is combined

• DeMers - "error prone data will lead to error prone analysis"

• e.g., if a map registration point has been mis-digitized in one coverage and is then used to register a second coverage

• Result = the second coverage will propagate the first mistake

• In this way, a single error may lead to others and spread until it corrupts data throughout the entire GIS project


Entity Errors: Vector

• Six categories identified by DeMers/ESRI

– All entities that should have been entered are present

– No extra entities have been entered

– Entities are in the right place and are of the correct shape and size

– Entities that are supposed to be connected to each other are connected

– All polygons have a single label point which identifies them

– All entities are within the outside boundary identified


Nodes and Vertices

• Specific types of entity errors in vector GIS

– can involve points, lines, polygons, nodes, vertices, label points

– nodes - denote ends of lines or point where polygon closes on itself

– vertices - denote change of direction within a line

– points -> lines -> polys

• Nodes are used to show specific topological relationships, e.g.:

– intersection of roads or streams

– intersection between stream and lake

– node errors include pseudo nodes and dangle nodes


Pseudo nodes

• Occur where a line connects with itself or with another line

• A line connects with itself to form a polygon, a.k.a. island pseudo node (fig. 6.1a, p. 161)

• Also occur where two lines meet (rather than cross) (fig. 6.1b)

• Pseudo nodes are not necessarily errors, but indicate the potential location of errors

• e.g., a pseudo node in the middle of a line representing a road can be used to separate the road into two different speed limit zones

• Others may indicate an error (pushed the wrong button when digitizing, placed the cursor at the wrong location)


Digitization errors: Pseudo node (diamond)

• A pseudo node connects two and only two arcs

[Figure labels: pseudo node not representing a serious error; pseudo node that is an error]


Dangle nodes

• A single node connected to a single line

• Again, not necessarily an error, but may be

• Can result from three possible mistakes (fig. 6.2, p. 162):

– Failure to close a polygon

– Undershoot

– Overshoot

– Sometimes result from incorrect placement, sometimes from fuzzy tolerance and snapping distance

• One method of general error detection is comparing the digitized coverage to the original document at equivalent scales; this is good for broad-scale, obvious errors, not for finer-scale errors
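
Because a dangle node touches a single line and a pseudo node connects exactly two arcs (or one arc closing on itself), both can be flagged by counting how many arc ends meet at each node. The following is a minimal sketch, not the Arc/Info implementation: `classify_nodes` is a hypothetical name, and arcs are assumed to be given as (start node, end node) pairs whose shared nodes already have identical coordinates.

```python
from collections import Counter


def classify_nodes(arcs):
    """Return (dangle_nodes, pseudo_nodes) for arcs given as (start, end) node pairs."""
    degree = Counter()
    for start, end in arcs:
        degree[start] += 1  # one arc end arrives at the start node
        degree[end] += 1    # and one at the end node (a closed arc counts twice)
    dangles = sorted(node for node, d in degree.items() if d == 1)
    pseudos = sorted(node for node, d in degree.items() if d == 2)
    return dangles, pseudos


arcs = [
    ((0, 0), (1, 0)),
    ((1, 0), (2, 0)),
    ((1, 0), (1, 1)),
    ((2, 0), (3, 0)),
    ((5, 5), (5, 5)),  # an arc that closes on itself (island)
]
dangles, pseudos = classify_nodes(arcs)
print("dangle nodes:", dangles)
print("pseudo nodes:", pseudos)
```

The flagged nodes are only candidates for review: as noted above, a dangle at the end of a road or a pseudo node at a speed limit change is legitimate.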


DIGITIZATION

• For linear features such as rivers, roads, and railways, it is important to digitize each section separately (start node and end node at a specified section) or use Route later

[Figure: Road1 and Road2 joined at a node]


Digitization errors: Dangle error (square)

[Figure labels: dangle error, closed polygon, natural feature, road, overshoot, undershoot, acceptable dangle node (e.g. end of roads)]


Label point and sliver errors

• Polygon label point errors (points -> lines -> polys)

• Label point is used to associate a polygon with attributes

• If the label point is missing, or there is more than one, this indicates an error, e.g., fig. 6.4, p. 163

• Sliver polygon errors

• Commonly result from the incorrect practice of double digitizing

• Can also result from overlay or merging operations which join coverages from different sources

• Can be removed manually, or by dissolving polygons smaller than a certain area, and/or by comparing the intended number of polys with the actual number (Fig. 6.5, p. 164)
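
The area test for slivers mentioned above can be sketched as follows. This is an illustration only: the function names and the threshold are hypothetical, polygons are assumed to be simple rings of (x, y) vertices, and a real workflow would dissolve the flagged polygons into a neighbor rather than just list them.

```python
def polygon_area(vertices):
    """Shoelace formula; vertices are (x, y) tuples in ring order, first point not repeated."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def find_slivers(polygons, min_area):
    """Return the indices of polygons whose area is below `min_area`."""
    return [i for i, ring in enumerate(polygons) if polygon_area(ring) < min_area]


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
sliver = [(0, 0), (10, 0), (10, 0.01), (0, 0.01)]  # long, very thin polygon
polygons = [square, sliver]
print(find_slivers(polygons, min_area=1.0))
```

Comparing `len(polygons)` with the number of polygons expected from the source map gives the second check listed above.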


Digitization errors: Labels

[Figure: polygons with missing labels and polygons with too many labels]


Sliver polygon errors


How to correct digitization errors?

• List digitization errors using the commands Nodeerrors and Labelerrors

• Use ARCEDIT to edit the coverage, then select the feature class to edit with the edit feature command (ef), e.g. ef label, ef node, ef arc

• Use a series of commands such as nodesnap, arcsnap, reshape, split, add, delete, move, copy, rotate, extend, and unsplit

• For labels, use Createlabels


Topology

• Topology is the process of projecting complex surfaces to simple ones

• Topology is a procedure for explicitly defining spatial relationships connecting adjacent features (e.g., arcs, nodes, polygons, and points).

• Different types of spatial relationships are expressed as lists of features, e.g.

• An area is defined by the arcs comprising its border

• An arc is defined by a set of points (X,Y)


Topology-Main Concepts

• The three major topological concepts are:

• Connectivity: Arcs connected to each other at nodes

• Contiguity/Adjacency: Arcs have direction and left and right sides

• Area Definition: Arcs connected to surround an area define a polygon (area)


Spatial Relationships (Topology)

[Figure: area definition, adjacency, and connectivity]


Polygon Topology

polygon-arc topology

user ID   arc list
1         1,7,3,2
2         2,5,10,4
3         6,9,11,10

polygon attribute table

user ID   area   parcel   zone
1         1200   11-123   R1
2         2300   11-150   R2
3         4321   11-231   R3

arc coordinate data

identifier   x,y pairs
6            X1,Y1,X2,Y2
9            X4,Y4,........

[Figure: polygons 1 to 3 bounded by arcs numbered 1 to 11, with a node, a polygon, and an arc labelled]
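
The polygon-arc lists above already encode adjacency: two polygons are neighbors when their arc lists share an arc. A small sketch using the three arc lists from this slide (the function name is hypothetical):

```python
from itertools import combinations

# Polygon-arc topology from the slide: polygon user ID -> list of bounding arcs.
polygon_arcs = {
    1: [1, 7, 3, 2],
    2: [2, 5, 10, 4],
    3: [6, 9, 11, 10],
}


def adjacent_polygons(polygon_arcs):
    """Map each pair of polygons to the arcs they share (pairs with none are omitted)."""
    shared = {}
    for (a, arcs_a), (b, arcs_b) in combinations(sorted(polygon_arcs.items()), 2):
        common = sorted(set(arcs_a) & set(arcs_b))
        if common:
            shared[(a, b)] = common
    return shared


print(adjacent_polygons(polygon_arcs))
```

Polygons 1 and 2 share arc 2, and polygons 2 and 3 share arc 10; no coordinates are needed to find this out, which is one reason topology speeds up spatial analysis.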


Advantages of Topology

• Check for digitization errors (overshoot, undershoot, unclosed polygon, missing labels, too many labels)

• Store data more efficiently (eliminate data redundancy: normalization)

• Make spatial analysis faster


Topology

• Topological data structures dominate GIS software.

• Topology allows automated error detection and elimination.

• Rarely are maps topologically clean when digitized or imported.

• A GIS has to be able to build topology from unconnected arcs.

• Nodes that are close together are snapped.

• Slivers due to double digitizing and overlay are eliminated.
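
Snapping nodes that are close together can be pictured with the greedy sketch below. It is an illustration under simple assumptions, not the algorithm any particular GIS uses: each node is moved onto the first earlier node found within a hypothetical `tolerance` distance, and `snap_nodes` is an invented name.

```python
import math


def snap_nodes(nodes, tolerance):
    """Map each node to the first earlier node lying within `tolerance` of it."""
    kept = []     # representative nodes found so far
    snapped = []  # output, same order and length as `nodes`
    for node in nodes:
        for rep in kept:
            if math.dist(node, rep) <= tolerance:
                snapped.append(rep)  # close enough: reuse the existing node
                break
        else:
            kept.append(node)        # no neighbor in range: keep as a new node
            snapped.append(node)
    return snapped


nodes = [(0.0, 0.0), (0.05, 0.0), (5.0, 5.0), (5.0, 5.04)]
print(snap_nodes(nodes, tolerance=0.1))
```

A tolerance that is too large merges nodes that should stay distinct, which is why the snapping distance has to be chosen with the map scale in mind.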


Creating topology in Arc/Info

• After digitization and correction of digitization errors, topology can be built

• The command BUILD is used for point, line, or polygon coverages

• The command CLEAN is used for line and polygon coverages

• CLEAN never creates topology for point coverages

• BUILD never detects intersections of arcs and polygons


Topology commands

• C:\[ARC] CLEAN [in-cov] {out-cov} {dangle-length} {fuzzy-tol}

• C:\[ARC] CLEAN road1 road2 # 3.4

• C:\[ARC] BUILD [in-cov] {POLY/ LINE/ POINT}

• C:\[ARC] BUILD cities POINT

• For features that have no intersections, such as contours, BUILD with the LINE option can be used

• For features that have intersections, such as roads and lots, it is better to first use CLEAN and then use BUILD


Tables created by topology

• Arc Attribute Table (AAT)

• Polygon Attribute Table (PAT)

• Point Attribute Table (PAT): area and perimeter = 0

• Route Attribute Table (RAT)

• Feature Attribute Table (FAT)

• Node Attribute Table (NAT)


Hints for topology

• Make a copy of the original data before starting to build topology

• Adopt a consistent strategy for naming coverages

• For example, names of raw coverages start with R, e.g. Rroads and Rlanduse

• Keep coverage names to 8 characters or fewer and without an extension (8.3)


Coordinate Transformation

The tablet coordinates must be converted to real world (map) coordinates

The commands used for coordinate transformation are:

CREATE or GENERATE - used to create a master coverage

The (X,Y) of the tic file (Tic.dbf) must be set to map coordinates.

TRANSFORM - used to transform the coverage


Coordinate Transformation (continued)

• Latitude (φ) and longitude (λ) must be converted to decimal degrees (DD), e.g.

Latitude = 13 deg + 45 min/60 + 55 sec/3600 = 13.7653 DD

• Project the decimal degrees to plane coordinates, e.g. UTM

[Figure: digitizer coordinates running from (0,0) to (5,8), and map coordinates running from (0,0) to (50,80)]
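
Both steps on this slide are simple arithmetic and can be sketched in Python. Minutes are divided by 60 and seconds by 3600 to obtain decimal degrees. The second function is a deliberately simplified stand-in for a coordinate transformation: it assumes the digitizer and map systems share the origin (0,0) and differ only by a scale factor per axis, as in the figure, whereas a real transformation is fitted to the tics. Both function names are hypothetical.

```python
def dms_to_decimal_degrees(degrees, minutes, seconds):
    """Convert degrees, minutes, seconds to decimal degrees (non-negative input assumed)."""
    return degrees + minutes / 60 + seconds / 3600


def scale_to_map(point, tablet_max, map_max):
    """Linearly rescale a digitizer point, assuming both systems share origin (0, 0)."""
    x, y = point
    return (x * map_max[0] / tablet_max[0], y * map_max[1] / tablet_max[1])


print(round(dms_to_decimal_degrees(13, 45, 55), 4))
print(scale_to_map((2.5, 4.0), tablet_max=(5, 8), map_max=(50, 80)))
```

With the figure's corner values, the digitizer point (2.5, 4.0) lands at the middle of the map extent, (25.0, 40.0).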


Generate

• Generate can create a coverage from raw coordinates (Id, X,Y) e.g. from GPS

• Create a file of tic coordinates, e.g. Tic1, which is an ASCII file with (TICID, X, Y)

• Create a file of polygon coordinates e.g. poly1

• GENERATE: INPUT Tic1 :TICS

• GENERATE : INPUT Poly1: POLYS :Quit


Attribute Errors: Raster and Vector

• Attribute errors are generally more difficult to detect

• Types include:

• Missing attributes: perhaps the only kind of attribute error traceable without comparison to source material, e.g., plot all polygons and color them according to a certain attribute; if a color is missing, the attribute is missing

• Incorrect attribute values or text: more difficult to detect. One method is to plot all polygons and color them according to a certain attribute; if only one polygon has a certain attribute and there should be others, it may stick out. In general, detection involves direct comparison with source material
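
The same two checks can be run on the attribute table itself rather than on a plot. The sketch below is an assumption-laden illustration: records are plain dictionaries with a hypothetical `id` key, an empty string or `None` counts as missing, and a value that occurs only once is merely flagged as suspicious, since it may still be correct.

```python
from collections import Counter


def attribute_report(records, field):
    """Return (ids with a missing value, values that occur only once) for `field`."""
    missing = [rec["id"] for rec in records if rec.get(field) in (None, "")]
    counts = Counter(rec[field] for rec in records if rec.get(field) not in (None, ""))
    singletons = sorted(value for value, n in counts.items() if n == 1)
    return missing, singletons


records = [
    {"id": 1, "zone": "R1"},
    {"id": 2, "zone": "R1"},
    {"id": 3, "zone": ""},     # missing attribute
    {"id": 4, "zone": "R11"},  # possible typing error for R1
    {"id": 5, "zone": "R2"},
    {"id": 6, "zone": "R2"},
]
missing, singletons = attribute_report(records, "zone")
print("missing:", missing)
print("occur once:", singletons)
```

As with the color plot, confirming that a flagged value is actually wrong still requires comparison with the source material.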


Dealing With Projection Changes

• Often, regardless of input method, the separate GIS data inputs for a project will be based on different projection systems

• It is necessary to transform all data to a common system before use in integrated modeling (examples in ArcView)

• Joining Adjacent Coverages: Edge Matching (Union)

• Joining two adjacent coverages (usually of the same theme) together to produce a single data set that covers a broader region. Edge matching is also done in raster systems


Conflation and Rubber Sheeting

• Conflation and Rubber Sheeting: Refers to the registration (georeferencing) of two maps (vector or raster) in a non-linear way (overlay two maps)

• Used to make maps from different sources spatially correspond with one another. Most often used with raster data, using ground control points (GCPs). Conflation and rubber sheeting are synonymous terms according to DeMers (Figure 6.1, p. 174)

• The need is to georeference the internal objects themselves, not just the map corners (Rubber Sheeting)

• Templating: "cookie cutting"

• If you have multiple coverages of different extents, the template is used to "cookie cut" them all to the same extent


Exercise

• Characteristics of data storage in raster and vector

• 3 general types of error in spatial databases

• Accuracy vs. precision

• Error propagation and cascading of error in GIS

• Types of errors in vector GIS

• Types of errors in attribute data

• The concept of topology - what is it, what types of relationships are stored for point, line, and poly features, and why do we need it in GIS?