The OpenStreetMap folksonomy and its evolution - Franz-Benjamin... · the OSM folksonomy relates to...

Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=tgsi20

Download by: [Universitaetsbibliothek Heidelberg] Date: 18 September 2017, At: 00:49

Geo-spatial Information Science

ISSN: 1009-5020 (Print) 1993-5153 (Online) Journal homepage: http://www.tandfonline.com/loi/tgsi20

The OpenStreetMap folksonomy and its evolution

Franz-Benjamin Mocnik, Alexander Zipf & Martin Raifer

To cite this article: Franz-Benjamin Mocnik, Alexander Zipf & Martin Raifer (2017) The OpenStreetMap folksonomy and its evolution, Geo-spatial Information Science, 20:3, 219-230

To link to this article: http://dx.doi.org/10.1080/10095020.2017.1368193

© 2017 Wuhan University. Published by Informa UK Limited, trading as Taylor & Francis Group

Published online: 18 Sep 2017.


GEO-SPATIAL INFORMATION SCIENCE, 2017, VOL. 20, NO. 3, 219–230 https://doi.org/10.1080/10095020.2017.1368193

OPEN ACCESS

The OpenStreetMap folksonomy and its evolution

Franz-Benjamin Mocnik, Alexander Zipf and Martin Raifer

Institute of Geography, Heidelberg University, Heidelberg, Germany

ABSTRACT
Understanding folksonomies is essential for making sense of Volunteered Geographic Information (VGI), in particular in the case of OpenStreetMap (OSM). So far, little research has been conducted on the role and the evolution of folksonomies in VGI and OSM, even though the thematic dimension of the data can hardly be used without an understanding of the folksonomies. This article examines the history of the OSM folksonomy with the aim of predicting its future evolution. In particular, we explore how the documentation of the OSM folksonomy relates to its actual use in the data, and we investigate the historical and future scope and granularity of the folksonomy. Finally, a visualization technique is proposed to examine the folksonomy in more detail.

ARTICLE HISTORY
Received 2 June 2017
Accepted 5 August 2017

KEYWORDS
Volunteered Geographic Information (VGI); OpenStreetMap (OSM); folksonomy; taxonomy; evolution; granularity; visualization

1. Introduction

Geographic information is often regarded as having spatial, temporal, and thematic aspects. Goodchild (2007), for example, coined the term geo-atom for data that explicitly expose spatial, temporal, and thematic dimensions. This view of geographic information also applies to many examples of Volunteered Geographic Information (VGI), which expose these dimensions as well. Specifications for spatial and temporal aspects exist: a location, for example, can be represented by a pair of coordinates in a given coordinate system, and a point in time can be represented in Coordinated Universal Time (UTC) and formatted according to ISO 8601 (ISO 2004). Thematic aspects, however, are harder to formalize in general because of their more manifold and often more complex nature, and taxonomies or ontologies have to be established for each data-set in order to translate between the formal symbols of the data and their meanings. As VGI is often created and improved in a community-driven process, both the data and its taxonomy are heterogeneous and reflect the needs and views of the community members. In many cases, the taxonomy is thus never entirely written down in a formal way, and some classes of the taxonomy are used by many contributors while others are adopted by only a single contributor. When such a community-driven creation process is neither centrally steered nor coordinated, the resulting taxonomies are often called folksonomies, reflecting that the community decides on them, their heterogeneity, and their resulting rather weak formalization.

CONTACT Franz-Benjamin Mocnik [email protected]

OpenStreetMap (OSM) can be regarded as one of the most characteristic examples of VGI, if not the most characteristic one. To produce maps and to offer environmental data for other purposes, the OSM project sets out to represent the environment. Each feature is represented by an element: a point feature, called a node, which has a location; a polyline, called a way, composed of several nodes; or a relation between other elements. These OSM elements are thematically characterized by tags. Each tag consists of a key and a value, often written as "key"="value". In principle, contributors can use tags freely, without any specification that would restrict the possible keys or values. Accordingly, many different tags are used (more than 89 million as of June 2017; Taginfo 2017), and their meanings are not necessarily communicated to other contributors or users. The most important tags are documented in a wiki.¹ The documentation is, however, incomplete, because a folksonomy is, by definition, open to changes by every contributor, and conflicting versions exist due to translations into different languages.
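To make the element and tag structure described above concrete, here is a minimal Python sketch. The class and function names (Node, Way, Relation, tag_pairs, format_tag, distinct_tags) are hypothetical illustrations, not part of any OSM library, and real OSM elements carry further attributes such as ids, versions, and timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple, Union

# Hypothetical, simplified model of the three OSM element types; real OSM
# elements carry further attributes (ids, versions, timestamps, ...).


@dataclass
class Node:
    """A point feature with a location."""
    lat: float
    lon: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    """A polyline composed of several nodes."""
    nodes: List[Node]
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Relation:
    """A relation between other elements."""
    members: List[Union[Node, Way, "Relation"]]
    tags: Dict[str, str] = field(default_factory=dict)


Element = Union[Node, Way, Relation]


def tag_pairs(element: Element) -> List[Tuple[str, str]]:
    """The element's tags as (key, value) pairs, sorted by key."""
    return sorted(element.tags.items())


def format_tag(key: str, value: str) -> str:
    """Render a tag in the common "key"="value" notation."""
    return f'"{key}"="{value}"'


def distinct_tags(elements: Iterable[Element]) -> Set[Tuple[str, str]]:
    """All distinct (key, value) pairs used by the given elements."""
    return {pair for element in elements for pair in tag_pairs(element)}


if __name__ == "__main__":
    cafe = Node(49.41, 8.69, {"amenity": "cafe"})
    road = Way([Node(49.41, 8.70), Node(49.42, 8.70)],
               {"highway": "residential", "tunnel": "yes"})
    print([format_tag(k, v) for k, v in tag_pairs(road)])
    print(len(distinct_tags([cafe, road])))
```

Counting distinct (key, value) pairs over all elements, as distinct_tags does for a toy collection, is the kind of statistic behind the figure of more than 89 million distinct tags.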

© 2017 Wuhan University. Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In the case of OSM, the thematic information represented in the data is reflected by the folksonomy, a fact that can be used to predict the future development of the data by analysing the folksonomy. What scope of the data can be expected in the future? How fine-grained will the representation be? Can different phases of the evolution be identified? Despite the obvious relevance of these questions, little research has been conducted on the OSM folksonomy, or even on folksonomies in VGI in general. This article approaches a general understanding of the evolution of the OSM folksonomy as a whole by statistically examining its properties. A more detailed understanding of single tags remains for further examination; we do, however, provide a visualization technique to tackle this issue. In particular, we address the following research questions (RQ) in this article:

RQ1: Acknowledging that there is no formal requirement to document the folksonomy, how does the folksonomy used in the OSM data-set relate to its documentation in the OSM wiki? This question is of particular interest because the documentation of the folksonomy is easy to analyse, while analysing the folksonomy used inside the OSM data-set would require extensive computational power and a more sophisticated statistical examination. We address this question by comparing when key-value pairs were first used in the data and when they were first documented. (Section 3)

RQ2: More and more key-value pairs are introduced over time. How did the OSM folksonomy change in the past, and how will it evolve in the future? In particular, we aim to show that only a limited number of keys and values will be introduced if current trends continue, and we estimate this number of keys and values. This approach makes it possible to understand how the scope of the OSM folksonomy may evolve and how fine-grained the representation will become. (Section 4)

RQ3: The scope of the folksonomy can be expected to increase over time, and the folksonomy can be expected to become more fine-grained and increasingly documented. Can we identify several phases in the evolution of the OSM folksonomy? RQ1 and RQ2 aim at understanding the changing scope, granularity, and documentation in more detail. We address the third research question by comparing the results of the preceding research questions. (Section 5)

RQ4: The OSM folksonomy is complex and subject to regular modifications. Many decisions to modify the folksonomy, or its documentation, result from the need for new values, or even from planning processes. These factors can be understood by manually retracing when new values were introduced or when values were deprecated. How can we visualize the OSM folksonomy in order to understand its evolution at the level of individual keys and values? The authors are not aware of any existing visualization of the history of the OSM folksonomy. We propose a new visualization technique that is able to address this research question. (Section 6)

2. Related work

VGI, and OSM data in particular, have been examined in many studies, and a number of tools exist to browse the data, including its folksonomy. A commonly used method is to display parts of the OSM data-set in an interactive map, which provides additional information on request. There exist, for example, a number of software tools to view OSM data (www.openstreetmap.org, mobile viewers, etc.), to use the data for further investigations (geographic information systems, in particular Quantum GIS, ArcGIS, etc.), and to edit OSM data (iD, Potlatch 2, JOSM, Maps.me, Vespucci, etc.). These tools concentrate on the examination of current OSM data, often including thematic information, but historical data are mostly excluded. It is, however, the temporal dimension that enables the examination of the evolution of OSM data.

Several software tools examine and visualize the creation process of only a small part of the OSM data-set. The application show-me-the-way (www.github.com/osmlab/show-me-the-way), for example, visualizes recent changes of OSM data with only a short delay. While this application provides an understanding of how the boundaries of elements are mapped, it does not provide holistic insights into the entire creation process. The history of an OSM element can be examined with the application osm-deep-history (www.github.com/osmlab/osm-deep-history); a collection of changes submitted as a “changeset” can be examined using the Augmented OSM Change Viewer (overpass-api.de/achavi); and the tool Who did it? (zverik.osm.rambler.ru/whodidit/) provides information about local changes. Similar tools exist or have existed.

Information about the folksonomy, in particular about the tags used to thematically describe OSM elements, has been collected and aggregated by several websites such as Taginfo (taginfo.openstreetmap.org) and Tagfinder (tagfinder.herokuapp.com). These websites summarize information provided by the OSM wiki and enhance it with statistics about the usage of tags in the OSM database and with information about how other projects use these tags. The tool OSM Tag History (taghistory.raifer.tech) visualizes the usage of a tag in the OSM database as a line chart. The website OSMstats (osmstats.neis-one.org) examines further statistical data about the OSM data-set and its users, and visualizes them as line charts. The geospatial distribution of elements tagged as buildings or roads can be examined with OpenStreetMap Analytics (osm-analytics.org). The website OSMatrix provides tools to, among other things, statistically analyse the use of tags (Roick et al. 2012, 2011). A detailed statistical analysis of OSM users has been provided by Mooney and Corcoran (2012b).

The evolution of OSM has been studied widely by tracing how the metric properties and the topology of the represented street network evolve (Neis et al. 2012; Corcoran and Mooney 2013). Arsanjani et al. (2015) even simulated the potential evolution of OSM with a cellular automata model. In a study of the road network in Beijing, Zhao et al. (2015) demonstrated how mapping behaviour advances OSM data, in particular how the evolution of the road network is shaped by exploration and densification activities. These papers, however, examine the folksonomy only to a minor extent and focus rather on spatial and temporal features of the data. Other studies relate OSM data, at least to some degree, to the folksonomy but do not examine its history. Zielstra et al. (2013), for example, assessed the effect of data imports, differentiating between the tags that stand for different categories of roads.

The quality of OSM data has been discussed with respect to the folksonomy. Barron et al. (2014), for example, discussed the quality of OSM data in terms of different factors, one of which is the number of tags of an element. A more thorough discussion of the conceptual quality of OSM has been provided by Ballatore and Zipf (2015). They discuss different dimensions of conceptual quality, including the accuracy, granularity, completeness, consistency, compliance, and richness of the data, by considering the folksonomy documented in the OSM wiki and the taxonomies provided in different editors. The discussion does not, however, consider the evolution of the folksonomy to a greater extent. The tagging practices related to OSM have been examined in greater detail by Davidovic et al. (2016); the study examines, in particular, how well features have been tagged by different users. Mooney and Corcoran (2012a) examined how the tags associated with an element change over time, and how the lack of control mechanisms can affect data quality. Finally, Aliakbarian and Weibel (2016) showed how to make use of the OSM folksonomy when generalizing information for maps.

Folksonomies have been examined in many studies (Trant 2009). Shen and Wu (2005) describe folksonomies as complex networks, whereby the latter might reflect the evolution of the former: the discussed laws of complex networks have often been shown to be the result of a temporal process. The dynamic aspects resulting from the collaboration of many contributors have been discussed by Golder and Huberman (2005). The general evolution of folksonomies has been discussed by Gendarmi and Lanubile (2006), with the aim of providing methods to apply community-driven evolution to ontologies.

3. Documentation of the folksonomy

The OSM folksonomy is created by the use of tags in the data, but documentation in the OSM wiki is available to foster a common view of which tags are meaningful. As a folksonomy, the collection of tags is neither planned nor controlled by a central authority. It is rather the result of, at least in part, independent decisions by individual contributors. These contributors, however, need to agree on common keys and values if their data are to be usable on a larger scale: how could the data otherwise be interpreted when, for example, creating a map? Such agreements are discussed in the community, using mailing lists or personal discussions, and they are subsequently often documented in the OSM wiki. As a result, the folksonomy can, at least in part, be examined by analysing its documentation. This section examines how complete the documentation of the folksonomy is, and what can accordingly be inferred about the taxonomy from an analysis of its documentation.

The first use of a tag in the data and the first documentation of the tag are compared in Figure 1. While the date of the first documentation in the wiki is unambiguous, it is not clear when a tag should be considered as being used in the data. More than 89 million distinct tags were in use as of June 2017 (Taginfo 2017), and a single use of a tag may thus not be considered relevant. As can be seen in Figure 1(a), tags are, with only minor exceptions, used in the data before being documented. This behaviour reflects that the folksonomy is created by its use, rather than by a centrally coordinated process with a strong formalization. Before 2011, some tags were documented upon their first use. The corresponding contributors were thus most likely aware that they were the first to use certain tags in the data, and hence recognized the necessity of documenting these tags. The vast majority of tags documented after 2013 have, however, been used before their documentation. The documentation can thus be regarded as a representation of the folksonomy that was defined in the data, and not vice versa.

The majority of the relevant and frequently used tags are documented in the wiki, although their documentation was, with minor exceptions, created after the first use of the tags in the data. Most tags have, for example, been documented before reaching 10% of their current use in the data (Figure 1(d)). The same effect can be seen for the 100th use (Figure 1(b)) and for 1% of the current use (Figure 1(c)). These figures do not depict the tags that are used in the data but have never been documented. In fact, documenting tags such as "name"="New York City" would make little sense, because they are used for only one or a few specific features, in this example for the City of New York. While the documentation of tags mostly appears after their first use, the first documentation and the first use in the data are only weakly correlated (tags are distributed in the lower triangle in Figure 1(a)). There is, however, a linear correlation between the first documentation and the time at which tags become relevant (tags are distributed around the diagonal in Figure 1(d)).
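The four reference dates compared in Figure 1 (first use, 100th use, 1% and 10% of the current use) can be derived from the usage history of a single tag. The following is a minimal sketch under stated assumptions: `use_dates` is a hypothetical list with one date per use of the tag, the current use is simply taken to be the length of that list (deletions are ignored), and all function names are illustrative rather than taken from the authors' tooling.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

# Hypothetical helpers. `use_dates` holds one date per use of a tag; as a
# simplification, the current use is len(use_dates), i.e. deletions are ignored.


def nth_use_date(use_dates: List[date], n: int) -> Optional[date]:
    """Date of the n-th use (1-based), or None if the tag was used fewer times."""
    ordered = sorted(use_dates)
    return ordered[n - 1] if len(ordered) >= n else None


def percent_of_current_use_date(use_dates: List[date], percent: int) -> date:
    """Date by which `percent` % of the current number of uses was reached."""
    ordered = sorted(use_dates)
    if not ordered:
        raise ValueError("tag has no uses")
    needed = max(1, -(-percent * len(ordered) // 100))  # integer ceiling
    return ordered[needed - 1]


def reference_dates(use_dates: List[date]) -> Dict[str, Optional[date]]:
    """The four reference dates compared in Figure 1(a)-(d)."""
    return {
        "first use": nth_use_date(use_dates, 1),
        "100th use": nth_use_date(use_dates, 100),
        "1% of current use": percent_of_current_use_date(use_dates, 1),
        "10% of current use": percent_of_current_use_date(use_dates, 10),
    }


def documented_after(reference: Optional[date], documented: date) -> bool:
    """True if the first documentation came strictly after the reference date."""
    return reference is not None and documented > reference


if __name__ == "__main__":
    start = date(2010, 1, 1)
    uses = [start + timedelta(days=i) for i in range(1000)]  # synthetic uses
    documented = date(2010, 3, 1)  # hypothetical documentation date
    for label, ref in reference_dates(uses).items():
        print(label, ref, documented_after(ref, documented))
```

Integer arithmetic is used for the percentage threshold to avoid floating-point rounding pushing the ceiling one use too far.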


Figure 1. Comparison of the use of a tag in the OSM database and its first documentation in the OSM wiki. (a) First use of the tag in the data. (b) 100th use of the tag in the data. (c) 1% of the current use of the tag in the data. (d) 10% of the current use of the tag in the data. Each blue disk represents a tag, and the size of the disk reflects how frequently the tag is used in the OSM database. Only tags that are used at least 1000 times in the data and that are documented in the OSM wiki are included; tags with value "*" are excluded. Data from the OSM database/wiki © OpenStreetMap contributors (cf. http://openstreetmap.org/copyright and http://wiki.openstreetmap.org/wiki/Wiki_content_license).

Major documentation efforts were made in the early years, with a focus on frequently used tags. Tags depicted by larger disks in Figure 1(a), that is, tags that have been used extensively in the OSM data-set, appear significantly more often before 2009, with regard to both their use and their documentation. While this could be due to the greater chance for these concepts to be adopted, since they were introduced some time ago, one can assume that the most essential concepts were introduced early, and many of these essential concepts can also be expected to be used very frequently. This indicates that the most frequently used tags were used and documented very early. Not only frequently used tags but also less frequently used ones were extensively documented before 2010. Where disks are horizontally aligned in Figure 1, several tags were documented at the same time, most likely in a coordinated way, even though the tags had been used in the data from different points in time. Such coordinated efforts can be observed between 2008 and 2015.

How can we determine how the completeness of the documentation of the tags has changed over time? As discussed earlier, it makes no sense to consider all tags, because many values are used only once or a few times in the data, as in the above example of the City of New York. Instead, only relevant tags should be considered, and according to our previous findings, most of them seem to be documented in the OSM wiki. This is why we consider as the statistical population only the currently documented tags τ that have been used more than 1000 times in the data. At a given point in time t, only a subset τ_t ⊂ τ of these tags has been used in the data. In the scope of this paper, the completeness of the documentation at a point in time t is defined as the percentage of tags in τ_t that were documented at time t. While this definition necessarily implies that the documentation is complete at the present time, it can reveal how the completeness evolved over time (Figure 2).
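The completeness definition above translates directly into code. The sketch below is illustrative only: the input dictionaries are hypothetical and would have to be built from the OSM history and the wiki. Because the date from which a tag counts as "used" is passed in, the same function also covers alternative definitions such as the 100th use or reaching 10% of the current use.

```python
from datetime import date
from typing import Dict


def completeness(relevant_since: Dict[str, date],
                 first_documented: Dict[str, date],
                 t: date) -> float:
    """Completeness of the documentation at time t, in percent.

    relevant_since maps each tag of the population tau (currently documented
    tags used more than 1000 times) to the date from which it counts as used,
    e.g. its first use or the date it reached 10 % of its current use.
    first_documented maps each tag to the date of its first wiki documentation.
    """
    # tau_t: the tags of the population that are already in use at time t
    tau_t = [tag for tag, since in relevant_since.items() if since <= t]
    if not tau_t:
        raise ValueError("no tag of the population is in use at time t")
    documented = sum(1 for tag in tau_t if first_documented[tag] <= t)
    return 100.0 * documented / len(tau_t)
```

Evaluating this function on a grid of dates t yields a curve of the kind shown in Figure 2; at the present date it returns 100 by construction, since every tag in τ is documented.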

After a period of ongoing documentation, the documentation of the tags reached a high level of completeness. Figure 2(a) shows that the completeness of the documentation increases over time, which could also be an artefact of considering only the currently documented tags τ. The larger increase in the early years compared to later ones indicates, however, that the increase is not only such an artefact. This impression is reinforced when considering, instead of τ_t, the set of tags τ′_t that had reached 10% of their current use in the data at time t: a rapid increase in completeness happened between 2008 and 2010, and the completeness in later years was always around 90% or above (Figure 2(d)). Before 2008, the documentation was very incomplete (Figure 2(b) and (c)).
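Both completeness measures can be written compactly. The notation below is introduced here for illustration only and is not the authors': u(x) denotes the date of the first use of tag x, u_{10%}(x) the date at which x reached 10% of its current use, and d(x) the date of its first documentation in the wiki.

```latex
\tau_t = \{\, x \in \tau : u(x) \le t \,\}, \qquad
\tau'_t = \{\, x \in \tau : u_{10\%}(x) \le t \,\}

C(t) = 100\% \cdot \frac{\lvert \{\, x \in \tau_t : d(x) \le t \,\} \rvert}{\lvert \tau_t \rvert}, \qquad
C'(t) = 100\% \cdot \frac{\lvert \{\, x \in \tau'_t : d(x) \le t \,\} \rvert}{\lvert \tau'_t \rvert}
```

Since every tag in τ is documented today, both C and C′ equal 100% at the present date; only their course over earlier dates is informative.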

The results of this section have demonstrated that there is a close relationship between the folksonomy and its documentation, which answers RQ1. Tags are usually documented only after being introduced in the data, which justifies calling the collection of tags a folksonomy, given that their use in the data is, in large part, uncoordinated. Most tags are, however, documented as soon as they have become relevant through frequent use in the data, which makes the documentation suitable for studying the folksonomy. It can even be hypothesized that the documentation and the adoption of tags in OSM editors have an impact on the use of the tags in the data.

4. Evolution of the folksonomy

The OSM folksonomy evolves over time: it is extended and modified by the use of new tags during the contribution of data. As we have seen in the previous section, relevant parts of the folksonomy can be analysed through its documentation. This section tackles RQ2 by analysing how the documentation of the folksonomy has changed over time.

Figure 2. Completeness of the documentation of the tags in the OSM wiki. (a) First use of the tag in the data. (b) 100th use of the tag in the data. (c) 1% of the current use of the tag in the data. (d) 10% of the current use of the tag in the data. The depicted completeness refers to how many of the currently documented tags had or had not been documented at a given point in time, despite having already been introduced in the data (first/100th/etc. use). The documentation is necessarily 100% complete at the current date, because only tags that are documented in the OSM wiki and that are used at least 1000 times in the data are included in the statistical population. Tags with value "*" are excluded. Data from the OSM wiki © OpenStreetMap contributors (cf. http://wiki.openstreetmap.org/wiki/Wiki_content_license).

The number of keys and tags grows over time. Figure 3 depicts the number of keys and tags, that is, key-value pairs, that have been documented in the OSM wiki. After a period of slower documentation (Figure 3(a) and (b)), the number of documented keys and tags has grown more rapidly since 2008. In both cases, this growth follows an exponential law with a negative exponent, approaching a constant value in the future. Assuming that the current behaviour continues, about 194 keys will be documented in the limit, and 98% of this number will statistically be reached in the third quarter of 2017. After late 2017, the number of documented keys will, by and large, stagnate, which means that the number of new keys will counterbalance the number of removed keys if current trends continue. As can be seen in Figure 3(a), deviations from this statistical trend may occur. The number of tags will approach about 1213, and according to the current prognosis, 98% of this number will be reached in the fourth quarter of 2031, provided that the current trend is not subject to future changes.
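The extrapolation above rests on a fit of f(x) = a + b · exp(−c · (x − d)), as stated in the caption of Figure 3. The sketch below shows one possible way to obtain such a fit and the date by which a given fraction of the limit a is reached; it is not necessarily the fitting procedure used by the authors. It relies only on the standard library and exploits the fact that b and d are not separately identifiable (only B = b · exp(c · d) matters): for each candidate decay rate c on a grid, the model is linear in a and B, which are then solved by ordinary least squares. The grid, the synthetic data, and all names are assumptions for illustration.

```python
import math
from typing import Iterable, Sequence, Tuple


def _linear_least_squares(z: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y ~ a + B*z; return (a, B, sum of squared residuals)."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    B = szy / szz
    a = my - B * mz
    sse = sum((yi - a - B * zi) ** 2 for zi, yi in zip(z, y))
    return a, B, sse


def fit_saturating_exponential(x: Sequence[float], y: Sequence[float],
                               c_grid: Iterable[float]) -> Tuple[float, float, float]:
    """Fit y ~ a + B*exp(-c*x), i.e. a + b*exp(-c*(x - d)) with B = b*exp(c*d).

    For each candidate c > 0 the model is linear in (a, B), so a and B are
    solved exactly by least squares; the c with the smallest error is kept.
    """
    best = None
    for c in c_grid:
        a, B, sse = _linear_least_squares([math.exp(-c * xi) for xi in x], y)
        if best is None or sse < best[3]:
            best = (a, B, c, sse)
    if best is None:
        raise ValueError("c_grid must not be empty")
    return best[0], best[1], best[2]


def time_to_fraction(a: float, B: float, c: float, fraction: float = 0.98) -> float:
    """Solve a + B*exp(-c*x) = fraction * a for x (growth case: B < 0 < a)."""
    if not (B < 0 < a and c > 0 and 0 < fraction < 1):
        raise ValueError("expects a growing curve approaching a from below")
    return math.log(-B / ((1.0 - fraction) * a)) / c


if __name__ == "__main__":
    # Synthetic, noise-free data (NOT the OSM wiki counts); x in years since 2008.
    xs = [i / 4 for i in range(40)]
    ys = [194 - 150 * math.exp(-0.5 * x) for x in xs]
    a, B, c = fit_saturating_exponential(xs, ys, [k / 100 for k in range(1, 301)])
    print(round(a, 3), round(B, 3), c)       # should recover 194, -150, 0.5
    print(2008 + time_to_fraction(a, B, c))  # year at which 98 % of a is reached
```

The limit a corresponds to the roughly 194 keys or 1213 tags mentioned in the text, and time_to_fraction yields the date by which 98% of that limit is reached; a finer grid or a general nonlinear optimizer could replace the grid search.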

The evolution of the keys and tags can be interpreted in terms of the scope and granularity of the folksonomy. Keys are used to represent different themes. As values only occur in certain combinations with these keys, the values can be regarded as subordinate to the keys. The scope of the folksonomy is accordingly determined only by the keys, while the values determine the granularity. Each value represents a sub-concept of a key; the more values are used for a certain key, the more fine-grained the sub-concepts are. As the documentation contains the relevant keys and values, we are able to measure the relevant scope and the relevant granularity, respectively.
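Following this interpretation, scope and granularity can be read off a list of documented key-value pairs: the number of keys gives the scope, and the number of values per key gives the granularity. The sketch below is a hypothetical helper; how exactly the wildcard value "*" is handled in the figures is an assumption here (such tags are simply skipped, so a key documented only with "*" drops out).

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def scope_and_granularity(documented_tags: Iterable[Tuple[str, str]]):
    """Summarize documented (key, value) pairs.

    Returns (number of keys = scope, number of tags, mean values per key =
    granularity, histogram {values per key: number of keys}).
    Tags with the wildcard value "*" are skipped.
    """
    values_by_key: Dict[str, Set[str]] = {}
    for key, value in documented_tags:
        if value == "*":
            continue
        values_by_key.setdefault(key, set()).add(value)
    n_keys = len(values_by_key)
    n_tags = sum(len(values) for values in values_by_key.values())
    mean_values = n_tags / n_keys if n_keys else 0.0
    histogram = Counter(len(values) for values in values_by_key.values())
    return n_keys, n_tags, mean_values, histogram


if __name__ == "__main__":
    tags = [("shop", "bakery"), ("shop", "butcher"), ("tunnel", "yes"), ("width", "*")]
    print(scope_and_granularity(tags))
```

Applied to successive snapshots of the wiki, the first value traces the scope over time and the third value the average number of values per key.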

According to the above findings, the OSM folksonomy will have reached its maximal scope by late 2017, while its granularity will continue to become finer after this date. The granularity can be examined in more detail by analysing the average number of values per key (Figure 3(c)). This average varies in the early years but follows a linear trend after 2010, which shows that the folksonomy becomes increasingly fine-grained. When the numbers of documented keys and values stagnate, the linear growth of the granularity will have to stop: the number of values per key will then stagnate as well.
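Because the average number of values per key is the number of documented tags divided by the number of documented keys, the two limits reported earlier also bound the granularity: if both trends continue, the average tends to about 1213/194 ≈ 6.25 values per key. The sketch below fits the linear trend by ordinary least squares and computes when it would reach a given level. The yearly averages it expects would come from the wiki history; none are given in the text, so none are assumed here, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def linear_trend(years: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line: values ~ slope * year + intercept."""
    n = len(years)
    mx, my = sum(years) / n, sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, my - slope * mx


def year_reaching(slope: float, intercept: float, level: float) -> float:
    """Year at which the linear trend would reach `level` values per key."""
    return (level - intercept) / slope


# Values per key = tags / keys, so with the limits reported in the text
# (about 1213 tags and 194 keys) the average saturates near this value:
LIMIT_VALUES_PER_KEY = 1213 / 194  # about 6.25
```

Feeding the post-2010 yearly averages into linear_trend and then calling year_reaching(slope, intercept, LIMIT_VALUES_PER_KEY) gives a rough indication of when the linear trend would run into this limit and therefore has to level off.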

Table 1. Phases in the evolution of the OSM folksonomy.

Phase       Years (prognosis after 2016)   Documentation     Scope     Granularity
Phase I     –2007                          Very little       Growing   Refining
Phase II    2008–2009                      Growing           Growing   Refining
Phase III   2010–2017                      Almost complete   Growing   Refining
Phase IV    2018–2031                      Almost complete   Stable    Refining
Phase V     2032–                          Almost complete   Stable    Stable

Most keys have only one value, such as "tunnel"="yes" or "width"="number". In the first case, the concept of a tunnel is not very fine-grained, and the value "yes" just indicates that the feature is, in fact, a tunnel. In the second case, the width is provided and represented as a number. Both examples are very typical, as can be seen in Figure 4(a): most keys have only one value, even when the value "*" is excluded. Keys with many values were created in the first years (Figure 4(b)). The fact that these keys currently have many documented values is not merely an effect of the long time since their creation; instead, the number of values of these keys has also grown much more quickly than for other keys. The keys "shop" and "amenity" have 140 and 106 values, respectively. It is not unexpected that these keys have the most documented values, because thematic information in the geographic domain is often about places, and shops and amenities are very important types of places.

In recent years, the history of the folksonomy has followed simple laws, which enables us to extrapolate its future development, as discussed in this section. This answers RQ2. In particular, we have argued that both the scope and the granularity will become stable over time, and we have derived the numbers of keys and values to be expected in the limit if current trends continue.

5. Phases in the evolution of the folksonomy

The two preceding sections have shown how the folksonomy and its documentation change over time. These changes differ considerably between earlier and later years, and different trends can be identified. In this section, we aim to identify different phases in the evolution of the OSM folksonomy by considering the factors discussed above in combination. These considerations answer RQ3. An overview can be found in Table 1. Note that the year specifications of phases IV and V are predictions: they presume that current trends continue without change and can only be seen as a prognosis.

Phase I: Foundation Phase (–2007): In the early years of OSM, the folksonomy emerges. There is very little documentation, and very little can be inferred by examining it.

Phase II: Documentation Phase (2008–2009): The second phase is characterized by growing documentation, until most relevant keys and values are documented. During this phase, the documentation reflects the folksonomy only in part, and it can only be conjectured that the number of keys and values is growing.

Phase III: Phase of Growing Scope and Refining Granularity (2010–2017): The documentation is close to completion in this phase, and relevant parts of the folksonomy can accordingly be examined through its documentation in this and subsequent phases. The number of keys and values increases, indicating that the scope of the folksonomy is growing and that the folksonomy is becoming more fine-grained. The growth of the number of keys follows an exponential law with a negative exponent; that of the average number of values per key, a linear law.

Figure 3. Evolution of the keys and values over time. (a) Keys. (b) Tags (key-value pairs). (c) Values per key. The actual data are depicted by a solid blue line, and the fits by a dashed red line. Figures (a) and (b) are fitted by the function f(x) = a + b · exp(−c · (x − d)), and (c) by a linear function. Keys and tags with value "*" are excluded. Data from the OSM wiki © OpenStreetMap contributors (cf. http://wiki.openstreetmap.org/wiki/Wiki_content_license).

Figure 4. Values per key. (a) Histogram of the values per key. (b) Values per key. Keys with value "*" are excluded. Data from the OSM wiki © OpenStreetMap contributors (cf. http://wiki.openstreetmap.org/wiki/Wiki_content_license).


Figure 5. Visualization technique for the documentation of the folksonomy in the OSM wiki in 2007. The nodes of the inner circle refer to the documented keys, while the nodes around them refer to the corresponding values. The longer a value exists, the further it moves away from the origin. Data from the OSM wiki © OpenStreetMap contributors (cf. http://wiki.openstreetmap.org/wiki/Wiki_content_license).

Phase IV: Phase of Refining Granularity (2018–2031): The number of keys has become stable, and the scope of the folksonomy is, accordingly, no longer growing. The number of tags, that is, of key-value pairs, still grows. The folksonomy accordingly becomes more fine-grained, but it is not clear how exactly the granularity is changing. Assuming that the number of tags follows an exponential law with negative exponent also in this phase, the phase will last until the end of 2031.

Phase V: Phase of Stability (2032–): This phase is characterized by a constant number of documented keys and values. For each new key or value, statistically, an old key or value, respectively, will be removed from the documentation. The relevant scope and granularity of the folksonomy are expected to stop growing, although they may still evolve.
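The boundaries between phases III, IV, and V come from extrapolating curves of the form f(x) = a + b · exp(−c · (x − d)) (Figure 3). The following Python sketch illustrates one way to fit and extrapolate such a curve; it is not the authors' implementation. Assumptions: yearly counts as input; the reference point d is held fixed, because b and d are not separately identifiable (b · exp(c · d) acts as a single factor); c is found by a grid search, while a and b follow from linear least squares; and a phase is taken to end once the fitted growth drops below a hypothetical threshold (here one new entry per year). The counts in the usage part are synthetic, not the paper's data.

```python
import math


def fit_saturating_exp(xs, ys, d, c_grid):
    """Fit f(x) = a + b * exp(-c * (x - d)) with the reference point d fixed.

    For each candidate c the model is linear in (a, b), so a and b are
    obtained by ordinary least squares; the c with the smallest squared
    error is kept. Returns (a, b, c, sse).
    """
    n = len(xs)
    best = None
    for c in c_grid:
        e = [math.exp(-c * (x - d)) for x in xs]
        se, sy = sum(e), sum(ys)
        see = sum(v * v for v in e)
        sey = sum(v * y for v, y in zip(e, ys))
        det = n * see - se * se
        if abs(det) < 1e-12:
            continue  # basis is (nearly) constant for this c
        b = (n * sey - se * sy) / det
        a = (sy - b * se) / n
        sse = sum((a + b * v - y) ** 2 for v, y in zip(e, ys))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best


def year_growth_below(a, b, c, d, threshold):
    """First x at which |f'(x)| = |b| * c * exp(-c * (x - d)) falls below threshold."""
    return d + math.log(abs(b) * c / threshold) / c


# Synthetic yearly counts generated from known parameters (illustration only).
years = list(range(2008, 2018))
counts = [200 - 150 * math.exp(-0.4 * (x - 2008)) for x in years]

a, b, c, sse = fit_saturating_exp(years, counts, d=2008, c_grid=[i / 100 for i in range(1, 201)])
print(round(a, 3), round(b, 3), c)  # recovers a = 200, b = -150, c = 0.4
print(year_growth_below(a, b, c, 2008, threshold=1.0))  # about 2018.24
```

The grid search keeps the sketch free of dependencies; a general nonlinear least-squares routine such as SciPy's `curve_fit` would be the more common choice in practice.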

At the time of publication, the evolution of the OSM folksonomy is at the end of phase III. Phases I to III are, accordingly, the result of analysing the previous evolution of the folksonomy and its documentation. Subsequent phases, however, are extrapolations of current trends, and they are thus subject to unexpected influences. Will the folksonomy unexpectedly become even more fine-grained when the mapping of the environment, according to the existing folksonomy, reaches global completeness while, at the same time, adhering to high quality standards? Will the aims of OSM change and the scope accordingly broaden in the future? While the expected temporal boundaries of phases IV and V may not prove to be true, the predicted evolution is not unexpected. The increase in the number of keys has already slowed down significantly, whereas the number of values is still increasing. Nor is it unexpected that the number of values will stagnate, because the granularity cannot, in practice, be refined forever.

Figure 6. Visualization technique for the documentation of the folksonomy in the OSM wiki in 2012. Compare Figure 5. Data from the OSM wiki © OpenStreetMap contributors (cf. http://wiki.openstreetmap.org/wiki/Wiki_content_license).

6. Visualizing the evolution of the folksonomy

The OSM folksonomy is subject to constant change. The preceding sections only examine changes in the number of keys and values, but not all changes are reflected by these numbers. In particular, these numbers do not change when old keys and values are replaced by new ones in phase V. Such changes can, accordingly, not be analysed with the methods discussed in the preceding sections. This section aims to find visualization techniques to explore these changes of the folksonomy and thus provides answers to RQ4.
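A small Python sketch of why the counts alone miss such changes: comparing two snapshots of documented key-value pairs as sets exposes a replacement that leaves both the total number of tags and the number of values per key unchanged. The snapshots below are invented for illustration and are not taken from the wiki history; following the captions of Figures 3 and 4, the value "*" is excluded when counting values per key.

```python
from collections import Counter


def values_per_key(tags):
    """Number of documented values for each key; the value "*" is excluded."""
    return Counter(k for k, v in tags if v != "*")


def diff_snapshots(old, new):
    """Tags added and removed between two documentation snapshots (sets of pairs)."""
    return sorted(new - old), sorted(old - new)


# Hypothetical snapshots of documented (key, value) pairs at two points in time.
old = {("highway", "primary"), ("highway", "*"), ("landuse", "farm"), ("amenity", "pub")}
new = {("highway", "primary"), ("highway", "*"), ("landuse", "farmland"), ("amenity", "pub")}

added, removed = diff_snapshots(old, new)
print(len(old) == len(new))                        # True: the total does not change
print(values_per_key(old) == values_per_key(new))  # True: not even per key
print(added, removed)  # [('landuse', 'farmland')] [('landuse', 'farm')]
```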

In Figures 5 and 6, the history of the documentation of the OSM folksonomy is visualized as a network. The nodes in the inner circle refer to the keys, and the nodes outside this circle refer to the values related to these keys. Keys and values are linked by lines if they can occur in combination. The documented keys and values, and the combinations in which they occur, change over time; accordingly, their depiction also varies over time. In the interactive visualization, which can be found online as part of the OSMvis project,2 the point in time can be chosen with a time slider. The nodes referring to the values move away from the origin as time passes, which provides an intuitive understanding of how long a value has already existed at the depicted point in time. In addition, nodes are enlarged when the corresponding descriptions in the OSM wiki were updated at the depicted point in time. As the visualization only depicts the keys, values, and corresponding links that are documented in the OSM wiki, it is appropriate for comprehending the advancement of the relevant parts of the folksonomy.
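The layout principle just described can be sketched in Python as follows. This is not the OSMvis code: the radii, the outward speed, the angular spread of sibling values, and the one-month window for marking recent wiki edits are all hypothetical parameters, and time is measured in decimal years.

```python
import math
from dataclasses import dataclass


@dataclass
class Value:
    key: str
    name: str
    since: float        # decimal year in which the value was first documented
    edited: tuple = ()  # decimal years of edits to the value's wiki description


def layout(keys, values, t, r_keys=1.0, gap=0.5, speed=0.3, spread=0.15, window=1 / 12):
    """Return {node: (x, y, size)} for the documentation state at time t.

    Keys sit evenly on an inner circle of radius r_keys. Each value documented
    by time t sits near its key's angle at radius r_keys + gap + speed * age,
    so older values lie further out. A value is drawn at double size if its
    description was edited within `window` years up to t.
    """
    angle = {k: 2 * math.pi * i / len(keys) for i, k in enumerate(keys)}
    pos = {k: (r_keys * math.cos(a), r_keys * math.sin(a), 1.0) for k, a in angle.items()}
    for key in keys:
        siblings = [v for v in values if v.key == key and v.since <= t]
        for j, v in enumerate(siblings):
            # fan sibling values symmetrically around the key's angle
            a = angle[key] + spread * (j - (len(siblings) - 1) / 2)
            r = r_keys + gap + speed * (t - v.since)
            recently_edited = any(t - window < e <= t for e in v.edited)
            pos[(key, v.name)] = (r * math.cos(a), r * math.sin(a), 2.0 if recently_edited else 1.0)
    return pos


keys = ["highway", "amenity"]
values = [Value("amenity", "pub", 2007.0), Value("amenity", "cafe", 2009.0, (2012.5,))]
pos = layout(keys, values, t=2012.55)
r_pub = math.hypot(*pos[("amenity", "pub")][:2])
r_cafe = math.hypot(*pos[("amenity", "cafe")][:2])
print(r_pub > r_cafe, pos[("amenity", "cafe")][2])  # True 2.0
```

Moving the time slider then amounts to calling `layout` again with a different `t`.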

The visualization reflects how fine- or coarse-grained the folksonomy was during its history. Going beyond the discussion in the preceding sections, it shows not only the number of keys and values but also which key has which values. The important concept of a "highway" was, for example, very coarse-grained in 2007, that is, no value was documented for the key "highway", while the concept of an "amenity" was already much more fine-grained at that time (Figure 5). The concept related to the key "sports" and its related values were introduced in early 2010, but the concept of buildings only became more fine-grained in late 2012 (Figure 6). These examples demonstrate that the folksonomy, or at least its documentation, has in parts developed in a possibly unexpected way, even though the number of keys and values developed in a predictable way.
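Observations such as the missing values for "highway" in 2007 amount to querying the documentation at a point in time. A minimal Python sketch of that query, assuming first-documentation dates per key and per key-value pair are available; the dates below are invented for illustration and are not taken from the OSM wiki. Keys with an empty value list are coarse-grained in the sense used above.

```python
def granularity_at(key_since, value_since, t):
    """Documented values per key at time t.

    key_since:   {key: year the key was first documented}
    value_since: {(key, value): year the value was first documented}
    Keys documented by t without any documented value map to an empty list;
    the value "*" is excluded.
    """
    result = {k: [] for k, since in key_since.items() if since <= t}
    for (k, v), since in value_since.items():
        if since <= t and k in result and v != "*":
            result[k].append(v)
    return {k: sorted(vs) for k, vs in result.items()}


# Hypothetical documentation dates (decimal years), for illustration only.
key_since = {"highway": 2006.0, "amenity": 2006.0, "building": 2006.5}
value_since = {("amenity", "pub"): 2006.0, ("amenity", "school"): 2006.5,
               ("highway", "residential"): 2008.0, ("building", "house"): 2012.8}

g2007 = granularity_at(key_since, value_since, 2007.0)
print(g2007)  # {'highway': [], 'amenity': ['pub', 'school'], 'building': []}
print([k for k, vs in g2007.items() if not vs])  # ['highway', 'building']
```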

This section does not aim to discuss the keys and values in detail, but rather to demonstrate that a detailed comprehension can be gained with the proposed visualization technique. Such knowledge is useful for understanding how the concepts for OSM data are evolving and how they affect data quality. Indeed, several metrics of how good an ontology might be with respect to different aspects and applications have been developed (Burton-Jones et al. 2005; Fernández et al. 2009). As long as the environment, that is, the subject to describe, and the purposes for which OSM data are used do not change drastically over time, the ontology (in our case the folksonomy) can be expected to be stable over time or to improve uniformly. The folksonomy can also be expected to be of uniform granularity if there are no particular reasons to model certain aspects of the environment with different granularities. If unexpected changes of the scope or the granularity occur over time, a detailed understanding of which keys and values were removed or introduced may provide insights about data quality. The history of the folksonomy is, in fact, an integral part of understanding the quality of the data and the folksonomy. The visualization provides answers to such questions and thus to RQ4, because it renders a detailed understanding of the folksonomy at the level of individual keys and values and of their evolution over time.

7. Conclusions and future work

This article treats the OSM folksonomy and how it evolves over time. We have found evidence that tags are usually first used in the OSM data and only then documented, which aligns well with regarding the collection of tags as a folksonomy that is created and evolves in a community-driven process. Although the documentation is created at a later point in time, it contains most of the relevant tags, and the documentation of the relevant tags seems to have been close to completion for some time. We have shown that the evolution of the folksonomy has followed, at least in recent years, simple laws, which provide insights into its future evolution. In particular, we have identified five phases in this evolution, including an increasing scope of the folksonomy (almost 200 keys expected in late 2017; end of phase III) and a refining granularity (more than 1200 values expected in late 2031; end of phase IV), assuming that the evolution of the folksonomy can be extrapolated from its history. In phase V, the scope and the granularity will both be stable if current trends continue. Finally, we have introduced a visualization technique to explore the folksonomy at the level of single keys and values.

In this article, we have examined the folksonomy as a whole. While some aspects of the evolution of the folksonomy can be comprehended by such an examination, the motivation behind single changes can only be comprehended by a more detailed analysis. Which keys become more important or more fine-grained over time? Are values systematically renamed? Which new topics are reflected by the folksonomy? The visualization presented in Section 6 provides a possible approach to examining the folksonomy systematically and in detail, but further research is needed to obtain a better understanding of the predominant patterns. Supplementary or alternative visualizations might also be developed to stress different aspects of the folksonomy.

This article examined only the English documentation of the OSM folksonomy, although the folksonomy is documented in different languages in the OSM wiki. These language versions differ in their content and length, and comparing them might reveal more about the creation process of the documentation, as well as about its completeness. Future research might address the evolution of the folksonomy and data quality issues by examining the differences between the language versions in detail.

The OSM folksonomy is subject to change because the environment and, even more importantly, the purpose of the data are changing. The folksonomy not only adapts to these changes but also improves and reflects the zeitgeist. In consequence, data may refer to an outdated tag, that is, a tag which has been renamed or replaced in the documentation in the meantime, or to a tag that has acquired another meaning. Such inconsistencies can hardly be avoided and can even be seen as a characteristic of VGI. How do changes of the documentation and changes in the data relate? What influence does the vocabulary adopted by OSM editing software have? Future research may shed light on such interactions, in particular by viewing them as a community-driven process.

We have discussed several phases of the evolution in Section 5. These phases incorporate the evolution of the folksonomy as well as of its documentation. Twitter hashtags, tags in Flickr, and folksonomies in similar data collections share many properties with the OSM folksonomy. To what extent do the phases of the evolution of the OSM folksonomy also apply to other folksonomies and to examples of social tagging? Which of the observations are specific to the evolution of the OSM folksonomy, and why?

Notes

1. In the scope of this paper, OSM wiki refers to the English language version of the wiki run by OSM, and the documentation of the folksonomy refers to the keys and values documented on http://wiki.openstreetmap.org/wiki/Map_Features as well as on linked pages.

2. http://osm-vis.geog.uni-heidelberg.de.

Funding

This work has been partially supported by the Deutsche Forschungsgemeinschaft (DFG) project A framework for measuring the fitness for purpose of OpenStreetMap data based on intrinsic quality indicators [grant number FA 1189/3-1].

Notes on contributors

Franz-Benjamin Mocnik is a postdoctoral researcher at Heidelberg University. His main interests are structures and laws in geographical information science, often with a focus on data quality and Volunteered Geographic Information.

Alexander Zipf is a professor at Heidelberg University. He is mainly engaged in the analysis of Volunteered Geographic Information with a strong focus on data quality, as well as in crowdsourcing and citizens as sensors.

Martin Raifer is a researcher at Heidelberg University. He is working on innovative technology related to OpenStreetMap and open geodata in general, as well as on spatial data analysis and visualization.

ORCID

Franz-Benjamin Mocnik http://orcid.org/0000-0002-1759-6336

Alexander Zipf http://orcid.org/0000-0003-4916-9838

References

Aliakbarian, M., and R. Weibel. 2016. "Integration of Folksonomies into the Process of Map Generalization." In Proceedings of the 19th ICA Workshop on Generalisation and Multiple Representation, Helsinki, Finland.

Arsanjani, J., M. Helbich, M. Bakillah, and L. Loos. 2015. "The Emergence and Evolution of OpenStreetMap: A Cellular Automata Approach." International Journal of Digital Earth 8 (1): 74–88. doi:10.1080/17538947.2013.847125.

Ballatore, A., and A. Zipf. 2015. "A Conceptual Quality Framework for Volunteered Geographic Information." In Proceedings of the 12th Conference on Spatial Information Theory (COSIT), 89–107. Santa Fe, NM. doi:10.1007/978-3-319-23374-1_5.

Barron, C., P. Neis, and A. Zipf. 2014. "A Comprehensive Framework for Intrinsic OpenStreetMap Quality Analysis." Transactions in GIS 18 (6): 877–895. doi:10.1111/tgis.12073.

Burton-Jones, A., V. Storey, V. Sugumaran, and P. Ahluwalia. 2005. "A Semiotic Metrics Suite for Assessing the Quality of Ontologies." Data and Knowledge Engineering 55 (1): 84–102. doi:10.1016/j.datak.2004.11.010.

Corcoran, P., and P. Mooney. 2013. "Characterising the Metric and Topological Evolution of OpenStreetMap Network Representations." The European Physical Journal Special Topics 215 (1): 109–122. doi:10.1140/epjst/e2013-01718-2.

Davidovic, N., P. Mooney, L. Stoimenov, and M. Minghini. 2016. "Tagging in Volunteered Geographic Information: An Analysis of Tagging Practices for Cities and Urban Regions in OpenStreetMap." ISPRS International Journal of Geo-Information 5 (12). doi:10.3390/Ijgi5120232.

Fernández, M., C. Overbeeke, M. Sabou, and E. Motta. 2009. "What Makes a Good Ontology? A Case-Study in Fine-Grained Knowledge Reuse." In Proceedings of the 4th Asian Conference on The Semantic Web (ASWC), 61–75. Shanghai, China.

Gendarmi, D., and F. Lanubile. 2006. "Community-Driven Ontology Evolution Based on Folksonomies." In Proceedings of the Workshop On the Move to Meaningful Systems (OTM), 181–188. Montpellier, France.

Golder, S., and B. Huberman. 2005. "The Structure of Collaborative Tagging Systems." arxiv:cs/0508082v1 [cs.DL].

Goodchild, M. 2007. "Towards a General Theory of Geographic Representation in GIS." International Journal of Geographical Information Science 21 (3): 239–260. doi:10.1080/13658810600965271.

ISO (International Organization for Standardization). 2004. ISO 8601:2004. Data Elements and Interchange Formats. Information Interchange. Representation of Dates and Times.

Mooney, P., and P. Corcoran. 2012a. "The Annotation Process in OpenStreetMap." Transactions in GIS 16 (4): 561–557. doi:10.1111/j.1467-9671.2012.01306.x.

Mooney, P., and P. Corcoran. 2012b. "Who Are the Contributors to OpenStreetMap and What Do They Do?" In Proceedings of the 20th Annual GIS Research UK (GISRUK), Lancaster, UK.

Neis, P., D. Zielstra, and A. Zipf. 2012. "The Street Network Evolution of Crowdsourced Maps: OpenStreetMap in Germany 2007–2011." Future Internet 4: 1–21. doi:10.3390/fi4010001.

Roick, O., J. Hagenauer, and A. Zipf. 2011. "OSMatrix – Grid-based Analysis and Visualization of OpenStreetMap." In Proceedings of the 1st European State of the Map Conference (SOTM-EU), Vienna, Austria.

Roick, O., L. Loos, and A. Zipf. 2012. "A Technical Framework for Visualizing Spatio-Temporal Quality Metrics of Volunteered Geographic Information." In Proceedings of the Conference Geoinformatik, Braunschweig, Germany.


Shen, K., and L. Wu. 2005. "Folksonomy as a Complex Network." arxiv:cs/0509072v1 [cs.IR].

Taginfo. 2017. "Database Statistics." Accessed May 23. https://taginfo.openstreetmap.org/reports/database_statistics

Trant, J. 2009. "Studying Social Tagging and Folksonomy: A Review and Framework." Journal of Digital Information 10 (1).

Zhao, P., T. Jia, K. Qin, J. Shan, and C. Jiao. 2015. "Statistical Analysis on the Evolution of OpenStreetMap Road Networks in Beijing." Physica A 420: 59–72. doi:10.1016/j.physa.2014.10.076.

Zielstra, D., H. Hochmair, and P. Neis. 2013. "Assessing the Effect of Data Imports on the Completeness of OpenStreetMap. A United States Case Study." Transactions in GIS 17 (3): 315–334. doi:10.1111/tgis.12037.
