Understanding the Semantics of Places in Gazetteers via Spatial...

1 AAG 2016 Rui Zhu Spatial Data Mining & Big Data Analytics Understanding the Semantics of Places in Gazetteers via Spatial Analysis Rui Zhu STKO Lab Department of Geography University of California, Santa Barbara

Page 1: Understanding the Semantics of Places in Gazetteers via Spatial …geog.ucsb.edu/~zhu/presentation_slides/AAG2016_RuiZhu.pdf · 2019. 9. 18. · 11 AAG 2016 Rui Zhu Spatial Data Mining

AAG 2016 Rui Zhu Spatial Data Mining & Big Data Analytics

Understanding the Semantics of Places in Gazetteers via Spatial Analysis

Rui Zhu

STKO Lab, Department of Geography

University of California, Santa Barbara

Outline

● Motivations

● Data sets

● Methods

● Results & Discussions

● Future work

Motivations

● A wide variety of gazetteers

Data Integration/Federated queries

Motivations (cont.)

● Traditional techniques for integration/alignment:

– Expert guess

– String similarity measures, e.g. Levenshtein distance

– Network similarity measures, e.g. structural equivalence
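
As an illustration of the string-similarity approach listed above, here is a minimal sketch of Levenshtein distance applied to feature-type labels. The helper `name_similarity` and its length normalization are assumptions for illustration, not part of the talk. Such a score only compares labels; it says nothing about which places each type actually covers.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical helper: case-insensitive similarity scaled to [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest
```

For example, `levenshtein("Mountain", "Mountains")` is 1, so the two labels look almost identical even if the underlying place types differ.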

Motivations (cont.)

● However, their geo-ontologies/typing schemas are different!

Mountains in DBpedia Places

Mountains in GeoNames

Both mountain peaks and ranges

Only mountain peaks

Motivations (cont.)

● How to understand such semantic heterogeneity among gazetteers?

Semantic Signatures

Janowicz, K., 2012. Observation-driven geo-ontology engineering. Transactions in GIS 16 (3), 351–374.

Spatial

Temporal

Thematic

Spatial Analysis

Spatial Semantic Signature

Data Sets

● DBpedia Places: extracted from DBpedia articles; 72 feature types

● GeoNames: formal geographical database that contains over eight million place names; 234 feature types

● Getty Thesaurus of Geographic Names (TGN): focus on places that are culturally or historically significant; 285 feature types

Data Sets (cont.)

● Common information

– Toponyms/Place names

– Geographic feature type

– Spatial footprints

Dams in GeoNames
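
The three shared fields can be held in one small record, so that every gazetteer yields a point set per feature type. This is only a sketch: the class `GazetteerEntry`, the helper `points_by_type`, and the assumption that each footprint is a single lon/lat point are illustrative choices, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    """Hypothetical record holding the fields common to all three gazetteers."""
    toponym: str       # place name
    feature_type: str  # type label in the source gazetteer's own schema
    lon: float         # spatial footprint, assumed here to be a single point
    lat: float
    source: str        # e.g. "GeoNames", "DBpedia Places", "TGN"


def points_by_type(entries):
    """Group footprints by (source, feature type), one point set per type."""
    groups = {}
    for entry in entries:
        key = (entry.source, entry.feature_type)
        groups.setdefault(key, []).append((entry.lon, entry.lat))
    return groups
```

Each resulting point set is the input that the spatial analyses in the Methods slides would operate on.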

Methods

● Overview:

Spatial Point Patterns Analysis

Spatial Autocorrelation Analysis

Spatial Interaction Analysis

Spatial Semantic Signatures

Comparison of Geographic Feature Types

Feature Extraction

MDS

Align Geo-ontologies (i.e. match geographic feature types)

Learning Models

Methods (cont.)

● Spatial Point Pattern Analysis

Sampling for local analysis

Generate random points (Complete Spatial Randomness)

Points for one place type

Select nearest 100 neighbors for each random point

Conduct spatial point pattern analysis on these 100 neighbors

Average the statistics over all random points
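
The sampling steps above can be sketched as follows. This is a minimal illustration under stated assumptions: coordinates are planar, random centers are drawn uniformly inside a bounding box, neighbors are found by brute force, and mean nearest-neighbor distance stands in for whichever statistic is being averaged. The function names and the default of 50 random centers are hypothetical.

```python
import math
import random


def nearest_k(points, center, k):
    """Return the k points closest to center (brute force, planar distance)."""
    return sorted(points, key=lambda p: math.dist(p, center))[:k]


def mean_nn_distance(points):
    """Mean distance from each point to its nearest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def local_statistic(points, bbox, statistic, n_samples=50, k=100, seed=0):
    """Average a point-pattern statistic over neighborhoods of random centers.

    bbox is (xmin, ymin, xmax, ymax); centers follow a uniform distribution,
    which is the assumption used here for complete spatial randomness.
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bbox
    values = []
    for _ in range(n_samples):
        center = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        values.append(statistic(nearest_k(points, center, k)))
    return sum(values) / len(values)
```

Any function that maps a list of points to a number can be passed as `statistic`, so the same loop serves every local signature.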

Methods (cont.)

● Spatial Point Pattern Analysis (cont.)

Spatial Semantic Signatures (Local)

– Intensity of point patterns

– Distance to nearest neighbor

– Ripley's K (i.e. range and mean deviation from the theoretical values)

– Kernel density estimation (i.e. bandwidth and range)

– Standard deviation ellipse (i.e. rotation, std. along x-axis and y-axis)

Ripley's K, kernel density estimation and standard deviation ellipse for Dams in GeoNames
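
As one reading of the Ripley's K features listed above, the sketch below uses the naive estimator without edge correction, K(r) = A / (n(n-1)) times the number of ordered pairs within distance r, and compares it with the value πr² expected under complete spatial randomness. The slides do not say which estimator or edge correction was used, and interpreting "range" as the spread of the K values is an assumption.

```python
import math


def ripley_k(points, area, radii):
    """Naive Ripley's K (no edge correction) at each radius in radii."""
    n = len(points)
    dists = [math.dist(points[i], points[j])
             for i in range(n) for j in range(i + 1, n)]
    scale = area / (n * (n - 1))
    # Each unordered pair is counted twice (i to j and j to i).
    return [scale * 2 * sum(1 for d in dists if d <= r) for r in radii]


def k_signature(points, area, radii):
    """Range of K and mean deviation from the CSR expectation pi * r**2."""
    observed = ripley_k(points, area, radii)
    deviations = [k - math.pi * r * r for k, r in zip(observed, radii)]
    return {
        "k_range": max(observed) - min(observed),
        "k_mean_deviation": sum(deviations) / len(deviations),
    }
```

A positive mean deviation suggests clustering relative to a random pattern, and a negative one suggests dispersion.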

Methods (cont.)

● Spatial Point Pattern Analysis (cont.)

Spatial Semantic Signatures (Global)

– Overall intensity of point patterns

– Kernel density estimation (i.e. bandwidth and range)

Dams in GeoNames
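
The two global features can be sketched as below. The assumptions are loud ones: intensity is taken as points per unit area, the kernel is an isotropic Gaussian, the bandwidth comes from a Scott-style rule of thumb averaged over both axes, and "range" is read as the spread of density values on an evaluation grid. The slides do not state which kernel or bandwidth selector was used.

```python
import math
import statistics


def intensity(points, area):
    """Overall intensity: number of points per unit area."""
    return len(points) / area


def scott_bandwidth(points):
    """Rule-of-thumb bandwidth for 2-D data (assumption: isotropic kernel)."""
    sx = statistics.stdev([p[0] for p in points])
    sy = statistics.stdev([p[1] for p in points])
    return ((sx + sy) / 2) * len(points) ** (-1 / 6)


def kde_on_grid(points, bbox, h, nx=20, ny=20):
    """Gaussian kernel density evaluated at the cell centers of an nx by ny grid."""
    xmin, ymin, xmax, ymax = bbox
    norm = 1.0 / (2 * math.pi * h * h * len(points))
    grid = []
    for iy in range(ny):
        y = ymin + (iy + 0.5) * (ymax - ymin) / ny
        row = []
        for ix in range(nx):
            x = xmin + (ix + 0.5) * (xmax - xmin) / nx
            total = sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * h * h))
                        for px, py in points)
            row.append(norm * total)
        grid.append(row)
    return grid


def kde_range(grid):
    """Spread between the highest and lowest estimated density."""
    values = [v for row in grid for v in row]
    return max(values) - min(values)
```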

Methods (cont.)

● Spatial Autocorrelation Analysis

Conversion: Point data → Raster Map

Dams in GeoNames

Cell value: number of instances falling in the cell

Cell size: 36 km × 22.2 km
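
The point-to-raster conversion amounts to counting instances per cell. A minimal sketch, assuming the points are already in planar coordinates expressed in the same units as the cell size (kilometers in the example); the function name and the bounding-box handling are illustrative.

```python
import math


def rasterize_counts(points, bbox, cell_w, cell_h):
    """Count points per cell; grid[row][col], with row 0 at ymin."""
    xmin, ymin, xmax, ymax = bbox
    ncols = math.ceil((xmax - xmin) / cell_w)
    nrows = math.ceil((ymax - ymin) / cell_h)
    grid = [[0] * ncols for _ in range(nrows)]
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # ignore points outside the study area
        # Points exactly on the upper edges go into the last cell.
        col = min(int((x - xmin) / cell_w), ncols - 1)
        row = min(int((y - ymin) / cell_h), nrows - 1)
        grid[row][col] += 1
    return grid


# Hypothetical usage with the cell size from the slide (36 km by 22.2 km).
example = rasterize_counts([(10.0, 10.0), (40.0, 10.0)],
                           (0.0, 0.0, 72.0, 44.4), 36.0, 22.2)
```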

Methods (cont.)

● Spatial Autocorrelation Analysis (cont.)

Spatial Semantic Signatures

– Global Moran's I

– Sample Semivariogram (i.e. semivariances at first, median and last lag distances).
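
Global Moran's I on the count raster can be sketched as below. The neighborhood definition is an assumption: rook contiguity (cells sharing an edge) with binary weights, since the slides do not state the spatial weights used. The sample semivariogram is not covered by this sketch.

```python
def morans_i(grid):
    """Global Moran's I for a 2-D list of cell values, rook contiguity."""
    nrows, ncols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant raster")
    cross = 0.0
    weight_sum = 0
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1  # binary weight for each neighbor pair
    return (n / weight_sum) * (cross / denom)
```

Values near 1 indicate that similar counts cluster together, values near -1 indicate a checkerboard-like alternation, and values near 0 indicate no spatial structure.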

Methods (cont.)

● Spatial Interaction with other geographic features

Spatial Semantic Signatures

Population (LandScan 2014): population for each feature point

Road Segment (Digital Chart of the World): distance to nearest segment for each feature point

● Minimum

● Maximum

● Mean

● Standard deviation
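
For the road interaction, the per-point value is the distance to the nearest segment, which is then summarized by the four statistics above. A minimal sketch, assuming planar coordinates, straight segments given as endpoint pairs, and population (not sample) standard deviation; none of these details are stated in the slides. The same `summarize` helper could be applied to the population value looked up for each feature point.

```python
import math
import statistics


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment with endpoints a and b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.dist(p, a)  # degenerate segment
    # Project p onto the segment's line and clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def summarize(values):
    """Minimum, maximum, mean and (population) standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }


def road_distance_signature(points, segments):
    """Summarize each point's distance to its nearest road segment."""
    nearest = [min(point_segment_distance(p, a, b) for a, b in segments)
               for p in points]
    return summarize(nearest)
```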

Methods (cont.)

Transform the high-dimensional feature space to 2 dimensions (multidimensional scaling)

24 features for the spatial semantic signature

Visualize it on a 2D map
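
The projection step can be sketched with classical (metric) MDS: double-center the squared Euclidean distance matrix and scale the top two eigenvectors. Which MDS variant the talk used is not stated, so this choice is an assumption, as is the use of plain power iteration with deflation to keep the sketch dependency-free. Features on very different scales would normally be standardized first, which is omitted here.

```python
import math


def squared_distances(vectors):
    """Matrix of squared Euclidean distances between feature vectors."""
    return [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in vectors]
            for u in vectors]


def double_center(d2):
    """B = -1/2 * J * D2 * J, the Gram matrix of the centered configuration."""
    n = len(d2)
    row_means = [sum(row) / n for row in d2]
    grand = sum(row_means) / n
    return [[-0.5 * (d2[i][j] - row_means[i] - row_means[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigenpair(matrix, iterations=1000):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0 + i for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(n))
                for i in range(n))
    return value, vec


def classical_mds(vectors, dims=2):
    """Embed the rows of vectors in dims dimensions, one coordinate list per row."""
    b = double_center(squared_distances(vectors))
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vec = top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))
        for i in range(n):
            coords[i].append(vec[i] * scale)
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords
```

Each row of the input would be the 24-feature signature of one feature type, and the two output coordinates give its position on the 2D map.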

Results and Discussions

● Same names and similar spatial patterns

Results and Discussions (cont.)

● Same names but different spatial patterns

Results and Discussions (cont.)

● Different names but similar spatial patterns

Results and Discussions (cont.)

● Different names and different spatial patterns

Future work

● Derive additional statistical features to represent the spatial semantic signature (e.g. statistics for co-occurrence, topological relations);

● Quantify the dissimilarity/similarity of place types using such spatial semantic signatures (e.g. supervised/unsupervised learning algorithms);

● Combine spatial signatures with previously studied temporal and thematic signatures;

● Integrate this study (bottom-up) with classical top-down knowledge engineering.

Special thanks to:

Yingjie Hu, Krzysztof Janowicz and Grant McKenzie

Questions and/or comments?