Geographic Text Search
Corporate Proprietary, Copyright 1999-2003, MetaCarta, Inc.
On building a high performance gazetteer database
Amittai Axelrod
MetaCarta Inc
Thanks to
Keith Baker
Kenneth Baker
Michael Bukatin
András Kornai
Plan of the talk
• Database background
• Relating geographic names and features
• Handling ambiguities and inconsistencies in geographic names
• Classification and storage system for geographic features
Databases
• No DB (faking it with flat files) -- clumsy
• Record-oriented -- still runs the world
• Relational -- making headway
• Object-oriented -- still very academic
• For MetaCarta GazDB, relational approach made most sense:
  • Overlapping records (McKinley/Denali)
  • Need for frequent updates of subparts of records
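To make the "overlapping records" point concrete, here is a minimal sketch of a relational layout in which one feature carries several names. It uses Python's built-in sqlite3 module for illustration only; the table and column names (feature, name, feature_name) are assumptions, not the actual GazDB schema, and the coordinates are approximate.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one feature that has two names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical schema, not the real GazDB layout.
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            lat REAL NOT NULL,
            lon REAL NOT NULL
        );
        CREATE TABLE name (
            name_id INTEGER PRIMARY KEY,
            text TEXT NOT NULL UNIQUE
        );
        -- Link table: a feature may have many names, a name many features.
        CREATE TABLE feature_name (
            feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
            name_id INTEGER NOT NULL REFERENCES name(name_id),
            PRIMARY KEY (feature_id, name_id)
        );
    """)
    # Approximate summit coordinates, for illustration.
    conn.execute("INSERT INTO feature VALUES (1, 63.0695, -151.0074)")
    conn.executemany(
        "INSERT INTO name VALUES (?, ?)",
        [(1, "Mount McKinley"), (2, "Denali")],
    )
    conn.executemany("INSERT INTO feature_name VALUES (?, ?)", [(1, 1), (1, 2)])
    conn.commit()
    return conn


def names_for_feature(conn, feature_id):
    """Return all names attached to a feature, sorted alphabetically."""
    rows = conn.execute(
        "SELECT n.text FROM name n "
        "JOIN feature_name fn ON fn.name_id = n.name_id "
        "WHERE fn.feature_id = ? ORDER BY n.text",
        (feature_id,),
    )
    return [row[0] for row in rows]
```

With this layout, updating a subpart of a record (say, the latitude) touches a single row in one table and leaves both names untouched.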
Gazetteer production process
Conversion scripts
• Enforce uniform structure on the data
• Normalize across sources (e.g. lat/lon to decimal degrees, spelling, …)
• Configuration required once per source
• Load data in GazDB
• Combination perl/SQL
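As an illustration of the lat/lon normalization step, the sketch below converts a degrees-minutes-seconds string to signed decimal degrees. The original scripts were Perl/SQL; this Python version, the function name dms_to_decimal, and the accepted input syntax are all assumptions made for the example.

```python
import re

# Accepts forms such as "63 04 10 N" or "63°04′10″N" (an assumed input syntax).
_DMS = re.compile(
    r"""^\s*(\d{1,3})[°\s]+(\d{1,2})['′\s]+(\d{1,2}(?:\.\d+)?)["″]?\s*([NSEW])\s*$""",
    re.IGNORECASE,
)


def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS.match(text)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and west are negative by the usual convention.
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return value
```

A per-source configuration would then only need to say which columns hold coordinates and which notation they use.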
Relating features and names
Other tables used in GazDB
• Population
• Elevation
• Language
• Feature type
• Source/versioning info
• Temporal extent
• Hierarchical information
• Confidence
• Comments
• Change logs (full auditing)
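One way a change log with full auditing could be kept is with a database trigger that records every edit. The sketch below uses Python's sqlite3 module; the table names (name, name_changelog), the trigger, and the example rows are hypothetical and are not taken from GazDB.

```python
import sqlite3


def make_audited_db():
    """Create an in-memory database whose name table is audited by a trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical tables, for illustration only.
        CREATE TABLE name (
            name_id INTEGER PRIMARY KEY,
            text TEXT NOT NULL
        );
        CREATE TABLE name_changelog (
            change_id INTEGER PRIMARY KEY AUTOINCREMENT,
            name_id INTEGER NOT NULL,
            old_text TEXT NOT NULL,
            new_text TEXT NOT NULL,
            changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        -- Every update of name.text appends one row to the change log.
        CREATE TRIGGER name_audit AFTER UPDATE OF text ON name
        BEGIN
            INSERT INTO name_changelog (name_id, old_text, new_text)
            VALUES (OLD.name_id, OLD.text, NEW.text);
        END;
    """)
    return conn


def change_history(conn, name_id):
    """Return (old_text, new_text) pairs for a name, oldest first."""
    rows = conn.execute(
        "SELECT old_text, new_text FROM name_changelog "
        "WHERE name_id = ? ORDER BY change_id",
        (name_id,),
    )
    return [tuple(row) for row in rows]
```

Because the trigger runs inside the database, the log is written no matter which script performs the update.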
Geographic names
• Internationalization
  • Full Unicode (UTF-8) support
  • Maintain detailed language information (SIL)
• Name resolution
  • Canonical form (16 bit)
  • Display form (8 bit)
  • Search form (6 bit)
• Authoritativeness
• Explicitness
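The three name forms can be pictured as successively smaller character repertoires derived from one stored string. The sketch below is only one possible reading: it assumes the display form is a Latin-1 rendering and the search form is uppercase ASCII letters, digits, and spaces. The slides do not specify the actual mappings, and the function names are hypothetical.

```python
import unicodedata


def display_form(name):
    """Render a name in an 8-bit repertoire (Latin-1 is an assumption here).

    Characters outside Latin-1 are replaced with '?'.
    """
    composed = unicodedata.normalize("NFC", name)
    return composed.encode("latin-1", errors="replace").decode("latin-1")


def search_form(name):
    """Fold a name to uppercase ASCII letters, digits, and single spaces."""
    # Decompose so that accents become separate combining marks, then drop them.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything that is not an ASCII letter or digit becomes a separator.
    kept = [ch if ch.isascii() and ch.isalnum() else " " for ch in stripped.upper()]
    return " ".join("".join(kept).split())
```

Under these assumptions a query and a stored name that differ only in case, accents, or punctuation fold to the same search form, while the canonical form keeps the full Unicode spelling.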
Updating a name in the GazDB
Geographic features
• Spatial representations
  • Point, line, area, …
• Functional classes
  • Building, field, campus, city, …
• Administrative types
  • Nation, province, county, international org, …
Export scripts
• Read GazDB
• Select which fields to include in custom output
• Create .gbdm (MetaCarta format) binaries
• Combination perl/SQL
• Not yet general across binary output formats
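The export step can be sketched as packing the selected fields of each row into a compact binary record. The layout below (little-endian id, latitude, longitude, name length, then UTF-8 name bytes) is purely hypothetical and is not the .gbdm format, which the slides do not describe; Python's struct module stands in for the original Perl.

```python
import struct

# Hypothetical record header: uint32 id, float64 lat, float64 lon, uint16 name length.
HEADER = struct.Struct("<IddH")


def export_records(rows):
    """Pack (feature_id, lat, lon, name) tuples into one bytes object."""
    out = bytearray()
    for feature_id, lat, lon, name in rows:
        encoded = name.encode("utf-8")
        out += HEADER.pack(feature_id, lat, lon, len(encoded))
        out += encoded
    return bytes(out)


def read_records(blob):
    """Yield the tuples back from a blob written by export_records."""
    offset = 0
    while offset < len(blob):
        feature_id, lat, lon, length = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        name = blob[offset:offset + length].decode("utf-8")
        offset += length
        yield feature_id, lat, lon, name
```

Choosing which fields to include then amounts to changing the header format string and the tuple that feeds it.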
Conclusions
• Accept multiple sources (only configure once per source)
• Fast loading of large datasets (1 million entries per hour on a Linux desktop)
• Simple update procedure
• Output of large custom binary gazetteers for different purposes at extreme speeds (1 million entries per minute)