Improving volunteered geographic data quality using semantic similarity measurements

Arnaud Vandecasteele and Rodolphe Devillers, Memorial University of Newfoundland, Canada
8th International Symposium on Spatial Data Quality, 30 May - 1 June 2013


Transcript of Improving volunteered geographic data quality using semantic similarity measurements

Page 1: Improving volunteered geographic data quality using semantic similarity measurements


Improving volunteered geographic data quality using semantic similarity measurements

Arnaud Vandecasteele - Rodolphe Devillers
Memorial University of Newfoundland, Canada

8th International Symposium on Spatial Data Quality, 30 May - 1 June 2013

Page 2

Outline

Introduction

Semantic Similarity
● P-Rank algorithm
● Tobler's Law

OSM Semantic Plugin
● Description
● Examples

Conclusion

Page 3

Introduction: National Mapping Agencies

What makes National Mapping Agencies authoritative?

● Positional accuracy
● Completeness
● Attribute accuracy

ISO 19113, ISO 19115, ..., ISO 19157

Page 4

Introduction: Geographic Information Quality viewed as a Project Management Triangle

Page 5

Introduction: Geographic Information Quality viewed as a Project Management Triangle

Really?

Page 6

Introduction

Could another map be authoritative*?

* and cheap, fast, accurate and, in the best of worlds, free

Page 7

Introduction: Volunteered Geographic Information (VGI)

Page 8

Introduction: Volunteered Geographic Information (VGI)

the widespread engagement of large numbers of private citizens, often with little in the way of formal qualifications, in the creation of geographic information

Goodchild - 2007

Page 9

Introduction: The OpenStreetMap project

OpenStreetMap (OSM) is a collaborative project to create a free editable map of the world.

Started in 2004

+ 1 million

● + 1.8 billion nodes
● + 180 million ways
● + 1.9 million relations

Source: http://wiki.openstreetmap.org/wiki/Stats

Page 10

Introduction: Data Quality & Volunteered Geographic Information

What about data quality?

● Good geometric accuracy (Haklay - 2010; Girres and Touya - 2010; Ludwig et al. - 2011)
● But geographic coverage is a patchwork (Goodchild - 2007)
● Semantics can be inconsistent (Ballatore et al. - 2012; Mooney and Corcoran - 2012)

Page 11

Introduction

VGI has changed the way we produce, publish and share geographic information

BUT

semantic quality is still an important issue

Research problem: how can semantic quality be improved using a VGI approach?

Page 12

Semantic Similarity: What is semantic similarity?

How to describe a forest in OpenStreetMap: landuse=forest or natural=wood?

One concept, different representations!

Q: "When should we use landuse=forest rather than natural=wood?"*

11 different answers and no real general agreement

* https://help.openstreetmap.org/questions/324/when-should-we-use-landuseforest-rather-than-naturalwood

Page 13

Semantic Similarity: How to measure semantic similarity?

Different models exist:
● Geometric model
● Feature model
● Alignment model
● Network models
● Transformation model

Semantic similarity applied to VGI:

[Diagram: natural=wood compared with landuse=forest. Measure?]

● Mooney and Corcoran - 2012: point pattern analysis and semantic patterns
● Ballatore et al. - 2012: semantic network created from the OpenStreetMap Wiki
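As an illustration of the feature model in the list above, here is a minimal Python sketch of Tversky's ratio model, a classic feature-based similarity measure. It is not the measure used in this work, which relies on P-Rank over a semantic network. The feature sets for landuse=forest and natural=wood are hypothetical and were invented only for this example, and the weights alpha and beta are assumptions.

```python
def tversky_ratio(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky's ratio model: shared features over shared plus weighted distinctive features."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    # Two empty feature sets share nothing, so report zero similarity.
    return common / denominator if denominator else 0.0


# Hypothetical feature sets, NOT taken from the OSM wiki.
forest = {"trees", "vegetation", "managed", "timber"}
wood = {"trees", "vegetation", "unmanaged"}

print(round(tversky_ratio(forest, wood), 3))  # 2 / (2 + 0.5*2 + 0.5*1) = 0.571
```

With alpha equal to beta the measure is symmetric. Unequal weights make it directional, which is one way a feature model can express that one concept is "more like" another than the reverse.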

Page 14

Semantic Similarity: Semantic Network from the OSM Wiki, how does it work?

Page 15

Semantic Similarity: Semantic Network from the OSM Wiki

Source: OSM Wiki
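The slide shows the network only as a figure. One simple way to obtain such a network is to treat each wiki page as a node and each link to a Tag: or Key: page as a directed edge. The Python sketch below follows that approach under stated assumptions: links are plain MediaWiki links such as [[Tag:natural=wood]] or [[Tag:natural=wood|wood]], while template-based links, which real OSM wiki pages also use, are ignored. The function name and page texts are hypothetical.

```python
import re

# Plain MediaWiki links to OSM tag or key pages, e.g. [[Tag:natural=wood]]
# or [[Tag:natural=wood|wood]]. Template links such as {{Tag|...}} are ignored.
LINK_PATTERN = re.compile(r"\[\[((?:Tag|Key):[^\]|]+)(?:\|[^\]]*)?\]\]")


def build_semantic_network(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each wiki page title to the set of Tag/Key pages its text links to (out-links)."""
    network: dict[str, set[str]] = {}
    for title, wikitext in pages.items():
        targets = {link.strip() for link in LINK_PATTERN.findall(wikitext)}
        targets.discard(title)  # ignore self-links
        network[title] = targets
    return network


# Hypothetical page texts, written only for this example.
pages = {
    "Tag:landuse=forest": "Managed woodland. See also [[Tag:natural=wood|wood]] and [[Key:landuse]].",
    "Tag:natural=wood": "Woodland with no forestry. Compare with [[Tag:landuse=forest]].",
}

for page, links in build_semantic_network(pages).items():
    print(page, "->", sorted(links))
```

The resulting out-links give "references"; reversing the edges gives "referenced by". These are the two relations a structural similarity measure can compare.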

Page 16

Semantic Similarity: P-Rank Algorithm

Measuring semantic similarity

Two entities are similar if:
1. they are referenced by similar entities
2. they reference similar entities

[Diagram: two entities A and B related through a common entity C, illustrating cases 1 and 2]
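The two conditions above are the in-link and out-link halves of P-Rank. In its general form, P-Rank defines s(a, a) = 1. For a ≠ b it defines s(a, b) = λ · C · [sum of s(i, j) over in-neighbours i of a and j of b] / (|I(a)| · |I(b)|) + (1 - λ) · C · [sum of s(o, p) over out-neighbours o of a and p of b] / (|O(a)| · |O(b)|). C is a damping constant and λ balances the two conditions. The Python sketch below iterates that recurrence. The values C = 0.8, λ = 0.5 and 10 iterations are assumptions, not the settings used in this work, and the small wiki graph is hypothetical.

```python
from itertools import product


def p_rank(out_links: dict[str, set[str]], c: float = 0.8, lam: float = 0.5,
           iterations: int = 10) -> dict[tuple[str, str], float]:
    """Iterative P-Rank over a directed graph given as node -> set of nodes it links to."""
    nodes = sorted(set(out_links) | {t for targets in out_links.values() for t in targets})
    out_nb = {n: set(out_links.get(n, ())) for n in nodes}
    in_nb: dict[str, set[str]] = {n: set() for n in nodes}
    for src, targets in out_nb.items():
        for dst in targets:
            in_nb[dst].add(src)

    def neighbour_term(na: set[str], nb: set[str], sim) -> float:
        # Damped average similarity over all neighbour pairs; zero if either side has none.
        if not na or not nb:
            return 0.0
        return c * sum(sim[(x, y)] for x in na for y in nb) / (len(na) * len(nb))

    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0  # every entity is fully similar to itself
            else:
                new_sim[(a, b)] = (lam * neighbour_term(in_nb[a], in_nb[b], sim)            # referenced by
                                   + (1 - lam) * neighbour_term(out_nb[a], out_nb[b], sim))  # references
        sim = new_sim
    return sim


# Hypothetical wiki link graph (page -> pages it links to), invented for illustration.
wiki = {
    "Map Features": {"landuse=forest", "natural=wood", "amenity=school"},
    "landuse=forest": {"Trees"},
    "natural=wood": {"Trees"},
    "amenity=school": {"Education"},
}
scores = p_rank(wiki)
print(round(scores[("landuse=forest", "natural=wood")], 3))  # 0.8
print(round(scores[("landuse=forest", "amenity=school")], 3))
```

In this toy graph, landuse=forest and natural=wood are both referenced by the same page and both reference the same page, so they score higher than landuse=forest and amenity=school, which share only the referencing page.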

Page 17

Semantic similarity and Geography: Tobler's first law of geography

“Everything is related to everything else, but near things are more related than distant things.”
Tobler - 1970

Page 18

Semantic similarity: Applying Tobler's first law to semantic similarity

[Figure: a new object A in a city, linked to surrounding objects by P-Rank scores]
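The slide does not spell out how distance and P-Rank scores are combined. The Python sketch below shows one hypothetical way to apply Tobler's law: each candidate tag for a new object is scored by its similarity to the tags of nearby objects, with closer objects weighted more through inverse-distance weighting. The weighting function, the power of 2, the function names and the example scores and distances are all assumptions made for illustration. This is not the plugin's documented behaviour.

```python
def inverse_distance_weight(distance_m: float, power: float = 2.0) -> float:
    """Hypothetical Tobler-style weight: the closer the neighbour, the larger the weight."""
    return 1.0 / max(distance_m, 1e-9) ** power


def rank_candidate_tags(candidates, neighbours, similarity, power=2.0):
    """Rank candidate tags for a new object by distance-weighted similarity to nearby tags.

    candidates: tags that could describe the new object
    neighbours: (tag, distance in metres) pairs for nearby existing objects
    similarity: dict mapping (tag, tag) -> similarity score, e.g. P-Rank results
    """
    weights = [(tag, inverse_distance_weight(d, power)) for tag, d in neighbours]
    total = sum(w for _, w in weights)
    scores = {}
    for cand in candidates:
        # Look the pair up in either order, defaulting to 0 when no score is known.
        weighted = sum(
            w * similarity.get((cand, tag), similarity.get((tag, cand), 0.0))
            for tag, w in weights
        )
        scores[cand] = weighted / total if total else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical similarity scores and neighbourhood, invented for illustration.
similarity = {
    ("landuse=forest", "natural=wood"): 0.8,
    ("amenity=school", "amenity=kindergarten"): 0.6,
}
neighbours = [("natural=wood", 50.0), ("amenity=kindergarten", 800.0)]
ranking = rank_candidate_tags(["amenity=school", "landuse=forest"], neighbours, similarity)
print(ranking[0][0])  # landuse=forest
```

Normalising by the total weight keeps every score in the same range as the underlying similarity scores, whatever the number of neighbours.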

Page 19

OpenStreetMap Semantic Plugin

Java OpenStreetMap Editor (JOSM)

[Figure: OSM editor usage stats (source: OSM Wiki)]

Page 20

OpenStreetMap Semantic Plugin: Description

Page 21

OpenStreetMap Semantic Plugin (aka OSMantic): Description

How similar are they?

P-Rank scores:
● A - B: 0.18
● A - C: 0.35
● A - D: 0.05
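Read as a ranking, these scores put C closest to A, followed by B, then D. The minimal Python sketch below orders the three scores from the slide. The function name and the optional min_score cut-off are hypothetical; how the plugin itself thresholds or displays scores is not specified here.

```python
def rank_by_similarity(scores: dict[str, float], min_score: float = 0.0) -> list[str]:
    """Return objects ordered from most to least similar, keeping only scores >= min_score."""
    kept = {obj: s for obj, s in scores.items() if s >= min_score}
    return sorted(kept, key=kept.get, reverse=True)


# P-Rank scores between object A and objects B, C and D, as shown on the slide.
scores_from_a = {"B": 0.18, "C": 0.35, "D": 0.05}
print(rank_by_similarity(scores_from_a))  # ['C', 'B', 'D']
```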

Page 22

Examples - Creation of a new object

New object

Page 23

OpenStreetMap Semantic Plugin: Examples - Editing an existing object

Page 24

OpenStreetMap Semantic Plugin: Examples - Semantic Similarity Evaluation

Page 25

Conclusion

The next big question: when will VGI become the next authoritative dataset?

● Semantic similarity can be used to enhance the quality of VGI datasets

● The OSM Semantic Plugin uses a collaborative approach to reduce potential semantic inconsistencies

How to improve the results:
● use the Taginfo database to identify the most used tags
● mix the geographic and the semantic approaches (Ballatore + Mooney)

Page 26

Questions?

Rodolphe Devillers
Marine Geomatics Lab, Memorial University of Newfoundland
http://www.marinegis.com/

Acknowledgements

● Natural Sciences and Engineering Research Council of Canada (NSERC)
● Andrea Ballatore, for sharing his results