Introduction to Geographic Information Systems Fall 2013 (INF 385T-28620) Dr. David Arctur Research Fellow, Adjunct Faculty University of Texas at Austin Lecture 8 October 17, 2013 Geocoding


Introduction to Geographic Information Systems
Fall 2013 (INF 385T-28620)

Dr. David Arctur
Research Fellow, Adjunct Faculty
University of Texas at Austin

Lecture 8
October 17, 2013
Geocoding


Outline
  Geocoding overview
  Polygon geocoding
  Linear (street) geocoding
  Problems and solutions
  Geocoding layer sources
  Geocoding in ArcGIS

INF385T(28620) – Fall 2013 – Lecture 8


Overview
  Process of creating geometric representations for locations (such as points) from descriptions of locations (such as street addresses)
  Uses a computer program called a geocoding engine that employs code tables and rules to standardize address components


Examples
  City's economic development department
    Maps technology businesses by street address to determine technology-rich areas in a city
  Hospital
    Maps patients to determine where to open a satellite clinic
  Emergency dispatch
    Maps callers' addresses to determine who should respond to an emergency
  Retail store chain
    Maps store and customer locations, and compares to mapped competitor locations
  Others?


Tabular data
  Text file or database
    Street addresses
    ZIP Codes


Geocoding reference layers
  Street centerlines
  ZIP Code polygons


POLYGON GEOCODING


ZIP Code geocoding
  Method to map data whose geocode is for a polygon
    Assign each record to its polygon
    Count the records for each polygon
    Join the table to the corresponding polygon layer
    Symbolize using a choropleth map or graduated point symbols
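
The assign, count, and join steps can be sketched in plain Python. This is only an illustration, not the ArcGIS workflow: the records, the `zip` field name, and the `polygon_id` values are made up for the example.

```python
from collections import Counter

# Hypothetical input table: one record per attendee, located only by ZIP Code.
records = [
    {"name": "A", "zip": "15213"},
    {"name": "B", "zip": "15213"},
    {"name": "C", "zip": "15217"},
]

# Hypothetical attribute table of the ZIP Code polygon layer.
zip_polygons = [
    {"polygon_id": 1, "zip": "15213"},
    {"polygon_id": 2, "zip": "15217"},
    {"polygon_id": 3, "zip": "15232"},
]

def count_by_zip(rows, zip_field="zip"):
    """Assign each record to its polygon key and count the records per polygon."""
    return Counter(row[zip_field] for row in rows)

def join_counts(polygons, counts, zip_field="zip"):
    """Join the counts to the polygon table; polygons with no records get 0."""
    return [dict(poly, count=counts.get(poly[zip_field], 0)) for poly in polygons]

joined = join_counts(zip_polygons, count_by_zip(records))
for row in joined:
    print(row["zip"], row["count"])
```

The `count` column is what a choropleth map or graduated point symbols would then be symbolized on.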


ZIP Code geocoding


ZIP Code geocoding
  Points created at ZIP Code centroids


ZIP Code geocoding

Points (attendees) spatially joined to ZIP Code polygons


ZIP Code geocoding
  Choropleth map created


LINEAR (STREET) GEOCODING


Linear geocoding (streets)
  TIGER (Census Bureau) street maps
    Four street address numbers, low to high for each side of a street segment

  [Figure: Oak Street segment labeled with address range 100 to 198 on one side and 101 to 199 on the other]


Address components
  Example address: 125 Oak St E, Apt. 2, Pittsburgh, PA 15213

  Component           Value in example
  Number              125
  Street name         Oak
  Street type         St
  Direction, suffix   E
  Direction, prefix   E (when written as 125 E Oak St)
  Unit number         Apt. 2
  Zone, city          Pittsburgh
  Zone, ZIP Code      15213

  Items for single-number street address:

  Address        Unit     City         ZIP Code
  125 Oak St E   Apt. 2   Pittsburgh   15213


Street intersections
  Put intersections in address field
    Forbes AV & Craig ST
    Grant ST & 5th AVE
    North Star RD & Duncan AV
  Do not include street numbers
    Incorrect: 3999 Forbes Ave & 100 Craig ST
  Connectors
    Any unusual character (e.g., &, @, |)
    Just be consistent
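
A minimal sketch of how an intersection string could be recognized and split, assuming the three connector characters listed above. The function name and the rule for rejecting house numbers are illustrative assumptions, not the behavior of any particular geocoding engine.

```python
CONNECTORS = "&@|"  # connector characters named on the slide

def parse_intersection(text, connectors=CONNECTORS):
    """Return (street_a, street_b) for an intersection string, else None."""
    # Normalize every connector to a single separator, then split on it.
    for ch in connectors:
        text = text.replace(ch, "|")
    parts = [part.strip() for part in text.split("|")]
    if len(parts) != 2 or not all(parts):
        return None
    # Intersections should not carry street numbers: reject a leading
    # all-digit token such as "3999" (but allow names like "5th AVE").
    for part in parts:
        if part.split()[0].isdigit():
            return None
    return parts[0], parts[1]

print(parse_intersection("Forbes AV & Craig ST"))
print(parse_intersection("3999 Forbes Ave & 100 Craig ST"))
```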


Geocoding flowchart

[Figure: flowchart with the boxes Input Address, Parse Address, Generate Soundex Key, Find Candidates (No Range & Soundex Key), Score Matches, Output Address, and Output No match, and the decision points "Matches?" and "Best match >= 90?", each with Yes and No branches]


Geocoding steps
  Original address: 125 East Oak Street 15213
  Address parsed: |125|East|Oak|Street|15213
  Abbreviations standardized: |125|E|Oak|St|15213
  Elements assigned to match keys: [HN]:125 [SN]:Oak [ST]:St [SD]:E [ZP]:15213
  Index values calculated: [HN]:125 [SN]:Oak (Soundex #) [ST]:St [SD]:E [ZP]:15213 (Index #)
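
The first three steps (parse, standardize, assign match keys) can be sketched as follows. The abbreviation table and the token rules are simplified assumptions for this one example address; a real geocoding engine uses much larger code tables and rule bases.

```python
# Small illustrative code tables (assumed, not taken from any real engine).
ABBREVIATIONS = {
    "east": "E", "west": "W", "north": "N", "south": "S",
    "street": "St", "avenue": "Ave", "road": "Rd", "boulevard": "Blvd",
}
DIRECTIONS = {"E", "W", "N", "S"}
STREET_TYPES = {"St", "Ave", "Rd", "Blvd"}

def parse(address):
    """Split the address text into tokens."""
    return address.replace(",", " ").split()

def standardize(tokens):
    """Replace spelled-out words with their standard abbreviations."""
    return [ABBREVIATIONS.get(tok.lower().rstrip("."), tok) for tok in tokens]

def assign_match_keys(tokens):
    """Assign tokens to the match keys HN, SD, SN, ST, and ZP."""
    keys = {}
    name_parts = []
    last = len(tokens) - 1
    for i, tok in enumerate(tokens):
        if i == 0 and tok.isdigit():
            keys["HN"] = tok                      # house number
        elif i == last and tok.isdigit() and len(tok) == 5:
            keys["ZP"] = tok                      # ZIP Code
        elif tok in DIRECTIONS and "SD" not in keys:
            keys["SD"] = tok                      # street direction
        elif tok in STREET_TYPES:
            keys["ST"] = tok                      # street type
        else:
            name_parts.append(tok)                # everything else is the name
    keys["SN"] = " ".join(name_parts)
    return keys

tokens = parse("125 East Oak Street 15213")
standard = standardize(tokens)
print(assign_match_keys(standard))
```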


Soundex index
  Matches names based on how they sound (if indices match)
  Translates names to a 4-character index of 1 letter and 3 numbers
    First character of name remains unchanged
    Adjacent letters in the name which have the same Soundex key are assigned a single digit
    If the end of the name is reached before filling 3 digits, use zeros to complete the code

  Key         Letters
  1           b f p v
  2           c g j k q s x z
  3           d t
  4           l
  5           m n
  6           r
  disregard   a e h i o u y w

  Oake = O-200, Oak = O-200
  Smith = S-530, Smythe = S-530
  Paine = P-500, Payne = P-500
  Callahan = C-450, Calahan = C-450
  Beadles = B-342, Beattles = B-342
  Schultz = S-432, Shults = S-432

  http://www.sconsig.com/sastips/soundex-01.htm
  http://www.archives.gov/research/census/soundex.html
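
A minimal sketch of the Soundex rules listed above. One detail is an assumption beyond the slide: following the common American Soundex convention, h and w are skipped without separating two letters that share a key, while vowels and y do separate them.

```python
# Build the letter-to-key table from the slide.
KEYS = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for letter in letters:
        KEYS[letter] = digit

def soundex(name):
    """Return the Soundex index as 1 letter, a hyphen, and 3 digits."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    digits = []
    previous = KEYS.get(letters[0], "")  # key of the first letter also counts
    for ch in letters[1:]:
        if ch in "hw":
            continue  # assumed convention: h and w do not separate equal keys
        key = KEYS.get(ch, "")
        if key and key != previous:
            digits.append(key)
        previous = key  # a vowel resets this to "", so repeats are coded again
    # Keep 3 digits, padding with zeros when the name runs out.
    return letters[0].upper() + "-" + "".join(digits)[:3].ljust(3, "0")

for name in ["Oak", "Oake", "Smith", "Smythe", "Callahan", "Beattles"]:
    print(name, soundex(name))
```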


Scoring candidates
  Use a rule base to score source and reference matches
  Start with score of 100
  Subtract points for each mismatch
  Examples from rule base
    Soundex indices match but street names do not (-2)
    Street type missing in source (-1)
    Street types do not match (-2)
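
The three example rules can be sketched as a small scoring function. The dictionary layout and the `soundex` field are assumptions for illustration; the cutoff of 90 is the "Best match >= 90?" test from the flowchart slide. A real rule base contains many more rules.

```python
MIN_SCORE = 90  # threshold from the geocoding flowchart

def score_candidate(source, reference):
    """Start at 100 and subtract points for each mismatch."""
    score = 100
    same_index = source["soundex"] == reference["soundex"]
    same_name = source["SN"].lower() == reference["SN"].lower()
    if same_index and not same_name:
        score -= 2   # Soundex indices match but street names do not
    if not source.get("ST"):
        score -= 1   # street type missing in source
    elif source["ST"].lower() != reference["ST"].lower():
        score -= 2   # street types do not match
    return score

def is_match(score, minimum=MIN_SCORE):
    return score >= minimum

reference = {"SN": "Oak", "soundex": "O-200", "ST": "St"}
source = {"SN": "Oake", "soundex": "O-200", "ST": ""}
print(score_candidate(source, reference), is_match(score_candidate(source, reference)))
```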


Candidate streets

Candidates identified: 125 East Oak Street 15213

  From   To    Street   Type   Side   Parity   Direction   Street_
  2      98    Oak      St     R      E        W           4344
  1      99    Oak      St     L      O        W           4345
  100    198   Oak      St     R      E        E           4346
  101    199   Oak      St     L      O        E           4357

Candidates scored and filtered:

  From   To    Street   Type   Side   Parity   Direction   Street_
  100    198   Oak      St     R      E        E           4346
  101    199   Oak      St     L      O        E           4357


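Address matched as point

Best candidate matched

  From   To    Street   Type   Side   Parity   Direction   Street_
  101    199   Oak      St     L      O        E           4357

[Figure: map of Oak St and Pine Ave with address range labels 2, 98, 1, 99, 100, 198, 101, 199 and the matched address 125 shown as a point]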
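
A sketch of how house number 125 could be placed on the matched segment. The slides do not state the placement formula; linear interpolation along the address range is the usual approach and is assumed here, and the segment endpoint coordinates are invented for the example.

```python
def parity(number):
    """Return 'E' for even house numbers and 'O' for odd ones."""
    return "E" if number % 2 == 0 else "O"

def best_segment(house_number, candidates):
    """Pick the candidate whose range and parity contain the house number."""
    for seg in candidates:
        in_range = seg["from"] <= house_number <= seg["to"]
        if in_range and seg["parity"] == parity(house_number):
            return seg
    return None

def interpolate(house_number, seg, start_xy, end_xy):
    """Place the address proportionally between the segment endpoints."""
    span = seg["to"] - seg["from"]
    t = (house_number - seg["from"]) / span if span else 0.5
    x = start_xy[0] + t * (end_xy[0] - start_xy[0])
    y = start_xy[1] + t * (end_xy[1] - start_xy[1])
    return x, y

# The two filtered candidates from the slide.
candidates = [
    {"from": 100, "to": 198, "side": "R", "parity": "E", "id": 4346},
    {"from": 101, "to": 199, "side": "L", "parity": "O", "id": 4357},
]
seg = best_segment(125, candidates)
# Hypothetical endpoints: a straight segment 98 units long.
point = interpolate(125, seg, (0.0, 0.0), (98.0, 0.0))
print(seg["id"], point)
```

Because the point is interpolated rather than surveyed, its placement is approximate, which is one of the TIGER limitations noted later in the lecture.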


PROBLEMS AND SOLUTIONS


Possible problems
  Variations in street names
    Fifth Avenue, Fifth Ave., 5th AV
    Saw Mill Run Blvd, Route 51
  Data entry errors
    Fidth Avenue
    Sawmill Run
  Place names
    White House, Heinz Field, Empire State Building
  Intersections
    Fifth Avenue and Craig Street


Possible problems
  Zones
    100 Main ST 15101, 100 Main ST 16202
  P.O. boxes
    P.O. Box 125
  Missing street data


Solutions
  Clean data before geocoding
  Purchase or build high-quality maps (field verification)
  Use postal address standards
  Assign house numbers in rural areas
  Use alias tables

  Alias                   Address
  White House             1600 Pennsylvania Avenue
  Heinz Field             100 Art Rooney Avenue
  Empire State Building   350 5th Ave


Alias table

  Alias                        Address
  CMU                          5000 Forbes Av
  Carnegie Mellon              5000 Forbes Av
  Carnegie Mellon U            5000 Forbes Av
  Carnegie Mellon Univ         5000 Forbes Av
  Carnegie Mellon University   5000 Forbes Av

Etc.
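
An alias table works as a lookup applied before address matching. The sketch below uses the entries from the two alias slides; the function name and the case and whitespace normalization are assumptions added for illustration.

```python
# Alias entries taken from the slides, keyed in lowercase.
ALIASES = {
    "white house": "1600 Pennsylvania Avenue",
    "heinz field": "100 Art Rooney Avenue",
    "empire state building": "350 5th Ave",
    "cmu": "5000 Forbes Av",
    "carnegie mellon": "5000 Forbes Av",
    "carnegie mellon u": "5000 Forbes Av",
    "carnegie mellon univ": "5000 Forbes Av",
    "carnegie mellon university": "5000 Forbes Av",
}

def resolve_alias(text, aliases=ALIASES):
    """Return the street address for a known place name, else the input."""
    key = " ".join(text.lower().split())  # ignore case and extra spaces
    return aliases.get(key, text)

print(resolve_alias("CMU"))
print(resolve_alias("125 Oak St"))
```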


GEOCODING LAYER SOURCES


US Census TIGER files

  Digitized from 1:100,000 scale maps
  Pros:
    Free and easy to download
    Uniform across jurisdictional lines (nationally)
    Street address formatting works well with standard GIS geocoding capacities
  Cons:
    Incomplete data
    Placement of address point is approximate


TIGER line attribute table

  Census street centerlines extracted from lines that make up census boundaries
    tl_2009_04013_edges.shp
    "FEATCAT" = 'S'


MAF/TIGER
  Master Address File / Topologically Integrated Geographic Encoding and Referencing
  MAF is a complete inventory of housing units and businesses in the United States and its territories
  TIGER is a collection of lines as we know it
  MAF produces mail-out census forms and ACS random samples
  MAF/TIGER produces maps for on-the-ground census takers
  MAF is confidential
  TIGER 2009 and newer have much improved positional accuracy


US Census ZIP Codes

  ZIP Code Tabulation Areas (ZCTAs)
    Approximations for census purposes
    Do not reflect actual ZIP Code areas and are not kept up to date


Local jurisdictions
  Parcel address points
    Pros:
      Accurate placement of residential location (parcel positional data is often very good; e.g., +/- 5 meters or less)
    Cons:
      May need to contact individuals within agencies to get most up-to-date data
      May not be available, or may cost a substantial amount of money
      Data ends at jurisdictional boundaries
      Data files tend to be very large


Local jurisdictions
  Street centerlines
    Pros:
      Potential to be more up to date (often yearly updates, sometimes quarterly)
      Often accuracy adequate to meet city infrastructure needs (typically +/- 10 meters or less)
    Cons:
      May need to contact individuals within agencies to get most up-to-date data
      Data ends at jurisdictional boundaries


Private vendors
  StreetMap USA
    National dataset (US and Canada)
    Address locators prebuilt, can geocode across the United States
  GDT Dynamap/2000 US street data
    Small fee for individual ZIP Code layers.
    Map layers are the highest quality street map layers in terms of appearance, completeness, and accuracy.
    More than one million changes every quarter
    Maps include more than 14 million US street segments and include postal boundaries, landmarks, water features, and other features


Online geocoding
  ArcGIS.com, Google, GeoCommons, Maptive, etc.
  Pros:
    Fast and easy to access
    Free or inexpensive
  Cons:
    Loss of privacy/confidentiality
    Accuracy
    Usability in desktop GIS


GEOCODING IN ARCGIS


Create address locator
  ArcCatalog


Choose address locator style
  Skeleton of the address locator
  Based on data tables and reference layer


Address locator styles

  Style: US Address—Dual Ranges
    Reference dataset geometry: Lines
    Reference dataset representation: Address range for both sides of street segment
    Address search parameters: All address elements in a single field
    Examples: 320 Madison St.; N2W1700 County Rd.; 105-30 Union St.
    Applications: Finding a house on a specific side of the street

  Style: US Address—Single House
    Reference dataset geometry: Points or polygons
    Reference dataset representation: Each feature represents an address
    Address search parameters: All address elements in a single field
    Examples: 71 Cherry Ln.; W1700 Rock Rd.; 38-76 Carson Rd.
    Applications: Finding parcels, buildings, or address points


Note: there are other styles…


Other styles… (build custom locators)
  Queens, NY
  Salt Lake City, UT
  Regions of Illinois & Wisconsin
  Germany
  … and many others!


Choose reference layer
  Streets, ZIP Codes


ArcGIS locator parameters


Geocode in ArcMap
  Add tabular data and streets layer
  Add address locator
  Geocode addresses
  View geocoding results
  Interactively rematch addresses


Address rematching
  Investigate unmatched addresses
  Generally requires expertise and knowledge of local streets
  Compare a street name in the attributes of the streets table and the address table.


Prepare log file
  Log file includes reasons why addresses did not get geocoded.
  Useful for future work on cleaning addresses or repairing street maps

  Incorrect address     Possible reason/solution
  490 Penn Avenue       Missing ZIP Code
  111 Hawksworth        Spelled incorrectly
  900 Smallman Street   TIGER street missing
  900 Lib Ave           Spelled incorrectly


Summary
  Geocoding overview
  Polygon geocoding
  Linear (street) geocoding
  Problems and solutions
  Geocoding layer sources
  Geocoding in ArcGIS

Next week: Tutorial chapter 9, and discussion of term projects; see iSchool syllabus links:
http://courses.ischool.utexas.edu/Arctur_David/2013/fall/385T/schedule.php