Geocoding – the Columbus way! Bakshi. About the Research
Transcript of Geocoding – the Columbus way! Bakshi. About the Research
![Page 1: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/1.jpg)
Geocoding – the Columbus way!
Rahul Bakshi
![Page 2: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/2.jpg)
About the Research
Part of Masters’ ThesisAdvisor: Craig KnoblockOther Committee members:Cyrus Shahabi and John WilsonBuild a Geocoder with maximum accuracy
![Page 3: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/3.jpg)
Thesis statement
The accuracy of the geocoded coordinates of a location can be significantly improved by exploiting online property-related data
![Page 4: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/4.jpg)
Motivating Problem
Inaccuracies in the existing applicationsThe error margins become critical in some applications:
Aligning Vector Data and Satellite ImageryEnvironmental Health StudiesUrban Rescue and Recovery Operations
![Page 5: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/5.jpg)
Positional Error Comparison
Reference: Cayo, M. R. and T. O. Talbot (2003). "Positional error in automated geocoding of residential addresses." International Journal of Health Geographics 2(10).
![Page 6: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/6.jpg)
Street Data
For the US, there are three main providers for street data
Geographic Data Technology (GDT)Navigation Technologies (NavTech)TIGER/Lines (Bureau of the Census)
![Page 7: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/7.jpg)
Limitations of these sources
Provide the address ranges and latitude/longitude information for the end pointsNo data about number of addresses in a segmentNo data about the size of address/lots
![Page 8: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/8.jpg)
Information in Street Sources
![Page 9: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/9.jpg)
Existing Approach
Address range methodGet the street data from sources like NavTech, GDT, TigerLinesApproximate the location based on information in the street dataExample
Address to locate: 645 Sierra St, El Segundo, CA -90245
![Page 10: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/10.jpg)
ExampleSierra StFrom: A ( 33.923413, -118.408709 )To: B ( 33.924813, -118.408809 )
Addresses on the Left: 601-699Addresses on the Right: 600-698
645: Left Side22nd out of the 50 addresses on the left side
Interpolate the address on the street
A
B
![Page 11: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/11.jpg)
Limitations of the existing approach
Assumes all addresses are present in the given range – which is seldom the caseDoes not take into account the lot sizesGeocodes non-existent addresses as wellE.g.: The following address does not exist -2622 Ellendale Pl, Los Angeles, CA – 90007Lets see what do the existing services have to say…
![Page 12: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/12.jpg)
All of them geocode it !
![Page 13: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/13.jpg)
The Columbus approach
Make use of the data already on the InternetProperty tax sites – repository of information that one requires to make the interpolations more accurateTake the number of houses in accountTake the lot sizes in account
![Page 14: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/14.jpg)
Uniform lot-size method
Works when data source having information on the property parcels/addresses existsExploits these sources to get the number of lots on the street segmentAssumes all lots are equal in dimension
![Page 15: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/15.jpg)
Outline of the method
Get the information of the street segment from the street data sourceQuery the property tax source to get the number of parcels before and after the current addressApproximate the location of the address based on the new values
![Page 16: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/16.jpg)
Corner lot problem
Number of dimensions on the street =number of lots on the street +
corner lot
![Page 17: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/17.jpg)
AlgorithmGet the street data from the street-data-sourceGet number of lots before and after the current address from the property data sourceAdd a corner lotCalculate the street length in terms of earth coordinatesCalculate the lot size based on the street length and the number of lots on the streetInterpolate the location of the address based on the average lot size
![Page 18: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/18.jpg)
Address-range (traditional) method
![Page 19: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/19.jpg)
Uniform lot-size method
![Page 20: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/20.jpg)
Actual lot-size method
The corner lot problem motivates us to optimize furtherPalm St, I do worse than traditional approachPossible only if the lot sizes available in the Property Tax sitesCompute the sizes of each of the lots/streets and then run a matching algorithmWorks on rectangular blocks
![Page 21: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/21.jpg)
136
240
482575
256
240
420575
204
240
482533
324
240
420533
136
120
542575
256
120
482575
204
120
542533
324
120
482533
136
256
542482
256
256
482482
204
256
542440
324
256
440
136
375
482482
256
375
420482
204
375
482440
324
375
440
482
420
![Page 22: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/22.jpg)
Finding the optimal layout
Calculate the actual length and breadth (width) of the block using the information in the street data source[length, width]
Truedim
257
257
480480
![Page 23: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/23.jpg)
Finding the optimal layout
Get the coordinates of the block from the street data sourceQuery the property source and get the dimension of every lot on the blockCompute the dimensions of the 16 possible orientationsCompare these with the true dimensionThe layout that most closely matches / least error is chosen as the layout
![Page 24: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/24.jpg)
Integrating data sources
Unified Query InterfaceLarge number of property sitesQuery a single relations
Different property sources for different placesNew York: State, Los Angeles: CountyDisparate representations : structure and attribute namesStreet Data: organized by county or states
![Page 25: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/25.jpg)
Source Descriptions
Describe the Source as view over Domain description
A single property relation
Three types of SourcesProperty TaxProperty Tax with details of dimensionsStreet Data Sources
![Page 26: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/26.jpg)
PropertyTax
USPDR
PropertyTaxCA PropertyTaxNY
State = ‘CA’ State = ‘NY’
PropertyTaxLA PropertyTaxSF
LA Property SF Property
County = ‘LA’ City = ‘SF’
LAProperty(sa, ci, st, zi, fraddr, fraddl, toaddr, toaddl, before, after) :-
PropertyTax(sa, ci, co, st, zi, fraddr, fraddl, toaddr, toaddl, before, after, lotwidth, lotdepth)^
(co = ‘Los Angeles’)^
(st = ‘CA’)
![Page 27: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/27.jpg)
UniformLotSizeGeocoder
PropertyTaxStreet
JoinUniformLotSizeApproximation
Join
UniformLotSizeGeocoder(sa, ci, co, st, zi, lat, lon):-
Street(sa, ci, co, st, zi, frlat, frlon,tolat, tolon, fename, fetype, zipl, zipr, fraddr, fraddl, toaddr, toaddl)^
PropertyTax(sa, ci, co, st, zi, fraddr, fraddl, toaddr, toaddl, before, after,lotwidth, lotdepth)^
UniformLotApproximation(frlat, frlon, tolat, tolon, before, after, lat, lon)
![Page 28: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/28.jpg)
Query
•Inverse the source descriptions
•Generate datalog program to solve the query
![Page 29: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/29.jpg)
Datalog program generated
![Page 30: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/30.jpg)
Advantage of this model
GLAV (Global-Local as View)Easy to add new sources
![Page 31: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/31.jpg)
ResultsChosing a region
El Segundo
Data SourceConflated TIGER/Lines
Fetch Agent Platform to convert website data into XMLPrometheus 2.0 information mediatorGeocoded 267 addresses spanning 13 blocksActual lot-size method could not be applied to 58 addressesNone of the methods could be applied to one addressResults based on the remaining 208 addresses
![Page 32: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/32.jpg)
N
Chosen area for goecoding
![Page 33: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/33.jpg)
Driving distance
![Page 34: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/34.jpg)
Address-range (traditional) method
![Page 35: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/35.jpg)
Uniform lot-size method
![Page 36: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/36.jpg)
Actual lot-size method
![Page 37: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/37.jpg)
506 Oak A
ve
504 Oak A
ve
508 Oak A
ve
512 Oak A
ve
510 Oak A
ve
514 Oak A
ve
518 Oak A
ve
501 E Palm A
ve
505 E Palm A
ve
509 E Palm A
ve
513 E Palm A
ve
519 E Palm A
ve
521 E Palm A
ve
591 E Palm A
ve
![Page 38: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/38.jpg)
501 Mariposa Ave
511
Mar
ipos
a A
ve
517
Mar
ipos
a A
ve
523
Mar
ipos
a
525
Mar
ipos
a52
7 M
arip
osa
535 Mariposa Ave
615 Penn St
609 Penn St
627 Penn St
621 Penn St
633 Penn St
639 Penn St
645 Penn St
524
Palm
Ave
520
Palm
Ave
610 Sheldon St
622 Sheldon St
628 Sheldon St
634 Sheldon St
640 Sheldon St
646 Sheldon St
616 Sheldon St
![Page 39: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/39.jpg)
Comparison of Results
7.8024256.6407273.80526Maximum Error
0.034870.070860.86578Minimum Error
1.469589.9236120.49335Standard Deviation
1.629937.8714936.85359Average Error
Actual lot-sizeUniform lot-sizeAddress-range(all errors are in meters)
Average percentage of improvement over traditional approach
Uniform lot-size method: 78.65%Actual lot-size method: 95.59%
![Page 40: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/40.jpg)
Address Range Methodµ = 36.85 σ =20.49
Uniform lot-size Methodµ = 7.87 σ = 9.92
Actual lot-size Methodµ = 1.63 σ = 1.47
Error in meter
Pro
bab
ilit
y
Normal Distribution of the error
![Page 41: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/41.jpg)
Related WorkCayo, M. R. and T. O. Talbot (2003) Positional error in automated geocoding of residential addressesRatcliffe (2001) On the accuracy of TIGER-type geocoded address data in relation to cadastral and census areal unitsKrieger et al. (2001) Evaluating the accuracy of geocoding in public health research Gupta, Marciano et al.(1999) Integrating GIS and Imagery through XML-Based Information Mediation
![Page 42: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/42.jpg)
Conclusion & Future Work
More accurate geocoding achievedIntegrating other sources to get property dataSolved the address-validating problemExtend the actual lot size method to non-rectangular blocksIntegrate more property tax data sources
![Page 43: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/43.jpg)
Acknowledgements
Thanks to Craig for his valuable guidance, Snehal for help with the algorithms and implementation, Shou-de for the calculations in the actual lot size methodThanks to Cyrus Shahabi and John Wilson
![Page 44: Geocoding – the Columbus way! Bakshi. About the Research](https://reader031.fdocuments.in/reader031/viewer/2022030511/5abbd2497f8b9ad1768d2204/html5/thumbnails/44.jpg)
Questions / Comments