A full lifecycle for the semantic enrichment of folksonomies

Un cycle de vie complet pour l'enrichissement sémantique des folksonomies / A full lifecycle for the semantic enrichment of folksonomies. EGC 2011, Brest, 25-28.01.2011. Freddy Limpens, Fabien Gandon (Edelweiss, INRIA Sophia Antipolis), Michel Buffa (Kewi-I3S, UNSA-CNRS)

Page 1: A full lifecycle for the semantic enrichment of folksonomies

Un cycle de vie complet pour l'enrichissement sémantique des folksonomies

A full lifecycle for the semantic enrichment of folksonomies

EGC 2011 – Brest

25-28.01.2011

Freddy Limpens, Fabien Gandon

(Edelweiss, INRIA Sophia Antipolis)

Michel Buffa

(Kewi-I3S, UNSA-CNRS)

Page 2: A full lifecycle for the semantic enrichment of folksonomies

From social tagging to folksonomies

Tags freely associated with resources …

… collected and shared on the web

Page 3: A full lifecycle for the semantic enrichment of folksonomies

… resulting in

FOLKSONOMIES

A mass of users for a mass of resources

Page 4: A full lifecycle for the semantic enrichment of folksonomies

Limitations of folksonomies

Spelling variations of tags:

newyork = new_york = nyc

Page 5: A full lifecycle for the semantic enrichment of folksonomies

Limitations of folksonomies

Ambiguity of tags

paris

… in France?

… or in Texas, USA?

Page 6: A full lifecycle for the semantic enrichment of folksonomies

How to turn folksonomies ...

... into topic structures (thesaurus)?

pollution has narrower: soil pollutions
pollution related: pollutant
pollution related: energy

Page 7: A full lifecycle for the semantic enrichment of folksonomies

… without overloading users

… and by collecting all users' expertise into the process

Page 8: A full lifecycle for the semantic enrichment of folksonomies

1. State of the art

Page 9: A full lifecycle for the semantic enrichment of folksonomies

State of the art

Involving users in tag structuring:

• Simple syntax to structure tags (Huynh-Kim Bang et al. 2008)

• Crowdsourcing strategy to validate tag-concept mappings (Lin & Davis 2010)

[Figure: thesaurus example: "pollution" has narrower "soil pollutions"; "pollution" related to "pollutant" and "energy"]

[Figure: disambiguation prompt for the tag "RDF": Resource Description Framework or Rwanda Defense Force?]

Page 10: A full lifecycle for the semantic enrichment of folksonomies

State of the art

Automatic extraction of tag semantics:

• Similarity based on co-occurrence patterns (Specia & Motta 2007; Cattuto et al. 2008)

• Association rule mining (Mika 2005; Hotho et al. 2006)

[Figure: thesaurus example: "pollution" has narrower "soil pollutions"; "pollution" related to "pollutant" and "energy"]

Page 11: A full lifecycle for the semantic enrichment of folksonomies

State of the art

Tags and Semantic Web models

TAGS + SCOT + SIOC + FOAF for tags and tagging:

tags:Tagging #11111
  tags:taggedResource  sioc:Item http://www.windenergy.com
  tags:associatedTag   scot:Tag #wind-energy
  tags:taggedBy        foaf:Agent #freddy.limpens

Page 12: A full lifecycle for the semantic enrichment of folksonomies

2. Tagging & folksonomy enrichment models

Page 13: A full lifecycle for the semantic enrichment of folksonomies

Tagging model

What is a tagging?

Tagging = linking a resource with a sign

"nature"

(1) (2) (3)
picture shows "nature"
place located l:england
editing makes me :)

Page 14: A full lifecycle for the semantic enrichment of folksonomies

Tagging model

NiceTag (Monnin et al., 2010): tagging as named graphs*

*Carroll et al. (2005)

nt:ManualTagAction (named graph):
  nt:TaggedResource http://www.windenergy.com  nt:isAbout  scot:Tag #wind-energy
  sioc:has_creator    sioc:UserAccount freddy
  sioc:has_container  sioc:Container delicious.com

Page 15: A full lifecycle for the semantic enrichment of folksonomies

Folksonomy enrichment

2 complementary semantic enrichments:

http://www.windenergy.com  nt:ManualTagAction / nt:isAbout  wind-energy

Structuring tags as in a thesaurus (SKOS):

wind-energy has broader: renewable energy
wind-energy close match: windenergy
wind-energy has narrower: wind turbine
wind-energy related: environment

Page 16: A full lifecycle for the semantic enrichment of folksonomies

Tagging model

Supporting diverging points of view

car skos:related pollution

john agrees

paul disagrees

Page 17: A full lifecycle for the semantic enrichment of folksonomies

Supporting diverging points of view

Reification of relations with named graphs

srtag:TagSemanticStatement (named graph): car skos:related pollution

srtag:TagStructureComputer "r2d2"  srtag:hasProposed
srtag:SingleUser "john"  srtag:hasApproved
srtag:SingleUser "paul"  srtag:hasRejected
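
The reified statement above can be sketched in plain Python, as a rough illustration rather than the actual SRTag implementation. The class name mirrors srtag:TagSemanticStatement; the `holds_for` method and its acceptance rule are assumptions made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class TagSemanticStatement:
    """One reified relation between two tags (loosely modelled on srtag:TagSemanticStatement)."""
    subject: str
    relation: str  # e.g. "skos:related"
    obj: str
    proposed_by: set = field(default_factory=set)  # srtag:hasProposed
    approved_by: set = field(default_factory=set)  # srtag:hasApproved
    rejected_by: set = field(default_factory=set)  # srtag:hasRejected

    def holds_for(self, agent: str) -> bool:
        """Assumed rule: the statement holds for an agent who proposed or
        approved it and did not reject it."""
        if agent in self.rejected_by:
            return False
        return agent in self.proposed_by or agent in self.approved_by


# The example from the slide: r2d2 proposes, john approves, paul rejects.
stmt = TagSemanticStatement("car", "skos:related", "pollution")
stmt.proposed_by.add("r2d2")
stmt.approved_by.add("john")
stmt.rejected_by.add("paul")
```

Keeping approvals and rejections attached to the statement, rather than to the tags, is what lets both points of view coexist on the same relation.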

Page 18: A full lifecycle for the semantic enrichment of folksonomies

Ademe scenario

Experts: produce docs + tag

Archivists: centralize + tag

Public audience: read + tag

Life-cycle grounded on usage analysis

Page 19: A full lifecycle for the semantic enrichment of folksonomies

Ademe's dataset

                         Delicious                           TheseNet                            Cadic
What                     Bookmarks of users of tag "ademe"   Keywords for Ademe's PhD projects   Archivists' indexing lexicon
# tags                   1015                                6583                                1439
# resources              196                                 1425                                4675
# tagging (1R - 1T - 1U) 3015                                10160                               25515
# users                  812                                 1425                                1

Page 20: A full lifecycle for the semantic enrichment of folksonomies

3. Going through the folksonomy enrichment life-cycle

Page 21: A full lifecycle for the semantic enrichment of folksonomies

Folksonomy enrichment life-cycle

ADDING TAGS

Automatic processing

User-centric structuring

Detect conflicts

Global structuring

Flat folksonomy

Structured folksonomy

Page 22: A full lifecycle for the semantic enrichment of folksonomies

Folksonomy enrichment life-cycle

[Figure: life-cycle diagram repeated: adding tags, flat folksonomy, automatic processing, user-centric structuring, detect conflicts, global structuring, structured folksonomy]

Page 23: A full lifecycle for the semantic enrichment of folksonomies

Automatic processing

3 methods to automatically extract tag semantics from the flat folksonomy:

1. String-based

2. Co-occurrence patterns

3. User-based associations

Page 24: A full lifecycle for the semantic enrichment of folksonomies

1. String-based metrics

pollution vs. pollutant

pollution vs. soil pollutions

=> « pollution » related to « pollutant »

=> « pollution » broader than « soil pollutions »
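
A minimal sketch of how such string-based heuristics could look. The slides do not name the metrics actually used, so the similarity measure (Python's `difflib.SequenceMatcher`), the thresholds `close` and `related`, and the function name `string_relation` are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize(tag: str) -> str:
    """Lower-case a tag and treat '_' and '-' as spaces."""
    return tag.lower().replace("_", " ").replace("-", " ").strip()


def string_relation(a: str, b: str, close: float = 0.9, related: float = 0.7):
    """Guess a relation from tag a to tag b using string evidence only.

    Returns "closeMatch", "broader" (a is broader than b), "related" or None.
    The thresholds are illustrative assumptions, not values from the slides.
    """
    na, nb = normalize(a), normalize(b)
    ratio = SequenceMatcher(None, na.replace(" ", ""), nb.replace(" ", "")).ratio()
    if ratio >= close:
        return "closeMatch"  # near-identical strings: spelling variants
    tokens_b = nb.split()
    # a is taken as broader when b is a multi-word tag containing a (or an inflection of a)
    if len(tokens_b) > 1 and any(tok.startswith(na) for tok in tokens_b):
        return "broader"
    if ratio >= related:
        return "related"
    return None
```

With these settings, "newyork" and "new_york" come out as a close match, while an abbreviation such as "nyc" is not caught by string evidence alone and would need one of the other methods.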

Page 25: A full lifecycle for the semantic enrichment of folksonomies

1. String-based metrics: results on full dataset

[Figure: graph of relations computed between tags from experts and tags from archivists; edge types: close match, related, broader]

Page 26: A full lifecycle for the semantic enrichment of folksonomies

2. Co-occurrence patterns

Example of folksonomy

Page 27: A full lifecycle for the semantic enrichment of folksonomies

2. Co-occurrence patterns

                 ecology  energy  wind turbine  sustainability  housing
ecology             0       1          1              3            1
energy              1       0          2              4            3
wind turbine        1       2          0              1            1
sustainability      3       4          1              0            4
housing             1       3          1              4            0

IF σ > 0.85 => "energy" related "sustainability"
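
The thresholding rule can be sketched as follows. The slide does not say how σ is computed; cosine similarity between the rows of the co-occurrence matrix is assumed here, in the spirit of the distributional measures cited earlier (Cattuto et al. 2008). Under that assumption the pairs flagged on this toy matrix need not coincide with the slide's example, so the sketch only illustrates the mechanics.

```python
from itertools import combinations
from math import sqrt

TAGS = ["ecology", "energy", "wind turbine", "sustainability", "housing"]

# Co-occurrence counts copied from the slide (symmetric, zero diagonal).
COOC = [
    [0, 1, 1, 3, 1],
    [1, 0, 2, 4, 3],
    [1, 2, 0, 1, 1],
    [3, 4, 1, 0, 4],
    [1, 3, 1, 4, 0],
]


def cosine(u, v):
    """Cosine similarity between two co-occurrence vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def related_pairs(tags, matrix, threshold=0.85):
    """Return {(tag_a, tag_b): similarity} for pairs strictly above the threshold."""
    pairs = {}
    for i, j in combinations(range(len(tags)), 2):
        sim = cosine(matrix[i], matrix[j])
        if sim > threshold:
            pairs[(tags[i], tags[j])] = sim
    return pairs


pairs = related_pairs(TAGS, COOC)
```

Comparing whole rows captures tags that occur in similar contexts, which is a different signal from a high direct co-occurrence count between two tags.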

Page 28: A full lifecycle for the semantic enrichment of folksonomies

2. Co-occurrence patterns

Cadic dataset

Page 29: A full lifecycle for the semantic enrichment of folksonomies

3. User-based association

Tags: renewableenergy, wind-energy

Users: Alex, Delphine, Claire, Monique, Anne

Hyponym relations (broader/narrower):

« renewableenergy » broader than « wind-energy »
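
One way to derive such hyponym relations is set inclusion between the communities of users of each tag, in the spirit of Mika (2005). The sketch below assumes that rule; the slide does not show which users use which tag, so the assignment in `tag_users` is hypothetical.

```python
def broader_by_users(tag_users):
    """Return (broader, narrower) pairs where the users of the narrower tag
    form a non-empty strict subset of the users of the broader tag."""
    relations = []
    for a, users_a in tag_users.items():
        for b, users_b in tag_users.items():
            if a != b and users_b and users_b < users_a:
                relations.append((a, b))
    return relations


# Hypothetical assignment of the five users named on the slide.
tag_users = {
    "renewableenergy": {"Alex", "Delphine", "Claire", "Monique", "Anne"},
    "wind-energy": {"Claire", "Monique"},
}
```

Strict inclusion is a deliberately simple criterion; a real dataset would probably need a tolerance for a few users who only use the narrower tag.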

Page 30: A full lifecycle for the semantic enrichment of folksonomies

3. User-based association

THESENET dataset

Page 31: A full lifecycle for the semantic enrichment of folksonomies

Global results of automatic processing

Total with 3 automatic methods: 83027 relations for 9037 tags

– 68633 related

– 11254 hyponym

– 3193 spelling variants

Page 32: A full lifecycle for the semantic enrichment of folksonomies

?

Computed relations are not always accurate

Page 33: A full lifecycle for the semantic enrichment of folksonomies

Folksonomy enrichment life-cycle

[Figure: life-cycle diagram repeated: adding tags, flat folksonomy, automatic processing, user-centric structuring, detect conflicts, global structuring, structured folksonomy]

Firefox extension SRTagEditor

Page 34: A full lifecycle for the semantic enrichment of folksonomies

Capturing users' contributions

Embedding structuring tasks within everyday activity (e.g. searching)

Page 35: A full lifecycle for the semantic enrichment of folksonomies

Capturing users' contributions

Page 36: A full lifecycle for the semantic enrichment of folksonomies

Capturing user's point of view

Example: rejecting a relation

srtag:TagSemanticStatement: energie skos:broader france
John srtag:hasRejected

Page 37: A full lifecycle for the semantic enrichment of folksonomies

Capturing user's point of view

Example: proposing another relation

srtag:TagSemanticStatement: energie skos:related energy
John srtag:hasRejected

srtag:TagSemanticStatement: energie skos:closeMatch energy
John srtag:hasProposed

Page 38: A full lifecycle for the semantic enrichment of folksonomies

Folksonomy enrichment life-cycle

[Figure: life-cycle diagram repeated: adding tags, flat folksonomy, automatic processing, user-centric structuring, detect conflicts, global structuring, structured folksonomy]

Page 39: A full lifecycle for the semantic enrichment of folksonomies

Conflict detection

Using rules:

IF num(narrower)/num(broader) ≥ c THEN narrower wins ELSE related wins

environment narrower pollution: John srtag:hasApproved, Anne srtag:hasApproved

environment broader pollution: Monique srtag:hasApproved, Delphine srtag:hasApproved
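
The rule above can be sketched as a small function. The value of the threshold c is not given on the slide, so the default below is an assumption, as is the handling of the case where nobody approved "broader".

```python
def resolve_conflict(num_narrower: int, num_broader: int, c: float = 2.0) -> str:
    """Apply the rule: if num(narrower) / num(broader) >= c then "narrower"
    wins, otherwise fall back to the less constrained "related".

    c = 2.0 is an illustrative default, not a value from the slides.
    """
    if num_narrower < 0 or num_broader < 0:
        raise ValueError("approval counts must be non-negative")
    if num_narrower == 0 and num_broader == 0:
        raise ValueError("no approvals, so there is no conflict to resolve")
    if num_broader == 0:
        # Assumption: with no opposing approval, narrower wins outright.
        return "narrower"
    return "narrower" if num_narrower / num_broader >= c else "related"


# Slide example: John and Anne approve "narrower", Monique and Delphine "broader".
decision = resolve_conflict(num_narrower=2, num_broader=2)
```

With two approvals on each side the ratio is 1, so under the assumed c the solver falls back to "related".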

Page 40: A full lifecycle for the semantic enrichment of folksonomies

Conflict detection

[Figure: hierarchy of relation types (related, broader, narrower, close match) linked by "less constrained" arrows; example: environment and pollution connected by conflicting related, narrower and broader statements]

Page 41: A full lifecycle for the semantic enrichment of folksonomies

Conflict detection

[Figure legend: conflicting, ConflictSolver choice, debatable, rejected]

Page 42: A full lifecycle for the semantic enrichment of folksonomies

Experimentation at ADEME

Page 43: A full lifecycle for the semantic enrichment of folksonomies

Folksonomy enrichment life-cycle

[Figure: life-cycle diagram repeated: adding tags, flat folksonomy, automatic processing, user-centric structuring, detect conflicts, global structuring, structured folksonomy]

Page 44: A full lifecycle for the semantic enrichment of folksonomies

Global map

Includes all points of view, highlights conflicts + consensuses

Page 45: A full lifecycle for the semantic enrichment of folksonomies

Referent choices

Choices of the referent user (e.g. archivists at Ademe)

Page 46: A full lifecycle for the semantic enrichment of folksonomies

Folksonomy enrichment life-cycle

[Figure: life-cycle diagram repeated: adding tags, flat folksonomy, automatic processing, user-centric structuring, detect conflicts, global structuring, structured folksonomy]

Page 47: A full lifecycle for the semantic enrichment of folksonomies

Each point of view corresponds to a layer

Page 48: A full lifecycle for the semantic enrichment of folksonomies

Enriching individual points of view

Anne is looking for resources tagged "environnement"

Integrating others' contributions:
1. Current user -> "Anne"
2. ReferentUser (e.g. archivists)
3. ConflictSolver (software agent)
4. Other individual users
5. Automatons (metrics)

Search: environnement

Suggestions grouped under BROADER, NARROWER, RELATED and CLOSE MATCH: preoccupation environnementales, grenelle de lenvironnement, competences environnementales, environment, environmental, domaines environnementaux
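
The priority order above can be sketched as a simple lookup: for a given pair of tags, keep the relation stated by the highest-priority source. The source keys and the function name are hypothetical labels for this example.

```python
# Priority order from the slide, highest first. The key names are hypothetical.
PRIORITY = [
    "current_user",     # 1. Current user, e.g. "Anne"
    "referent_user",    # 2. ReferentUser (e.g. archivists)
    "conflict_solver",  # 3. ConflictSolver (software agent)
    "other_users",      # 4. Other individual users
    "automatons",       # 5. Automatons (metrics)
]


def pick_relation(candidates):
    """Return the relation stated by the highest-priority source.

    candidates maps a source key to the relation that source states for one
    pair of tags; returns None when no known source has stated anything.
    """
    for source in PRIORITY:
        if source in candidates:
            return candidates[source]
    return None


choice = pick_relation({"automatons": "skos:related", "referent_user": "skos:broader"})
```

Here the archivists' statement overrides the automatically computed one, and anything Anne states herself would override both.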

Page 49: A full lifecycle for the semantic enrichment of folksonomies

Application to community detection (Érétéo, 2011)

Algorithm based on random label propagation (Raghavan et al., 2007):

Why not use tags and their semantic relations instead?

Page 50: A full lifecycle for the semantic enrichment of folksonomies

Application to community detection (Érétéo, 2011)

Application to Ademe's social network:

• linking 1 tag to each user (the most often used one)

• when two users are linked AND their tags are related => merge them

• with 9200 tags => group users in 30 communities
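
The merging steps above can be sketched with a union-find structure. This is a simplification swapped in for illustration, not the label propagation algorithm of Raghavan et al. nor the method of Érétéo; the function name, the input shapes and the sample data are assumptions.

```python
from collections import Counter


def detect_communities(user_tags, links, related):
    """Group users by merging linked users whose main tags are equal or related.

    user_tags: {user: [tag, ...]}, every tagging by that user (non-empty lists)
    links:     iterable of (user_a, user_b) social links
    related:   set of frozenset({tag_a, tag_b}) pairs known to be related
    """
    # Step 1: label each user with their most often used tag.
    label = {u: Counter(tags).most_common(1)[0][0] for u, tags in user_tags.items()}
    parent = {u: u for u in user_tags}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Step 2: merge linked users whose labels are identical or related.
    for a, b in links:
        if label[a] == label[b] or frozenset((label[a], label[b])) in related:
            parent[find(a)] = find(b)

    groups = {}
    for u in user_tags:
        groups.setdefault(find(u), set()).add(u)
    return sorted(groups.values(), key=sorted)


# Hypothetical sample data.
user_tags = {
    "alex": ["pollution", "pollution", "énergie"],
    "claire": ["pollution de l'air"],
    "anne": ["biomasse"],
}
links = [("alex", "claire"), ("claire", "anne")]
related = {frozenset(("pollution", "pollution de l'air"))}
communities = detect_communities(user_tags, links, related)
```

In this sample alex and claire end up together because they are linked and their main tags are related, while anne stays alone.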

Page 51: A full lifecycle for the semantic enrichment of folksonomies

Application to communitydetection (Érétéo, 2011)

Result for "biggest" tags :

1. pollution

2. développement durable

3. énergie

4. chimie

5. pollution de l'air

6. métaux

7. biomasse

8. déchets

Page 52: A full lifecycle for the semantic enrichment of folksonomies

5. Conclusion

Page 53: A full lifecycle for the semantic enrichment of folksonomies

What we do:

Help online communities structure their tags

wind-energy has broader: renewable energy
wind-energy related: sustainability
wind-energy has narrower: wind turbine
wind-energy related: environment

Page 54: A full lifecycle for the semantic enrichment of folksonomies

Our contributions:

An approach to bridge tagging with Semantic Web:

Automatic processing of tags

User interface to capture tag structuring embedded in everyday tasks

Implementation within ISICIL solution (tagging server)

Page 55: A full lifecycle for the semantic enrichment of folksonomies

Future work

• More user interfaces

• ISICIL: test with final users at Ademe and Orange Labs

• Testing on other types of communities

• Temporal dimension

• Multilingualism

Page 56: A full lifecycle for the semantic enrichment of folksonomies

Thank you for your attention!

me: [email protected], http://www-sop.inria.fr/members/Freddy.Limpens

my advisors: Fabien Gandon: [email protected]; Michel Buffa: [email protected]

"community detection guy": Guillaume Erétéo: [email protected]

ISICIL team: http://isicil.inria.fr

Page 57: A full lifecycle for the semantic enrichment of folksonomies

References

ANGELETOU S., SABOU M. & MOTTA E. (2008). Semantically Enriching Folksonomies with FLOR. In CISWeb Workshop at European Semantic Web Conference ESWC.

BRAUN S., SCHMIDT A., WALTER A., NAGYPÁL G. & ZACHARIAS V. (2007). Ontology maturing: a collaborative web 2.0 approach to ontology engineering. In CKC, volume 273 of CEUR Workshop Proceedings: CEUR-WS.org.

CATTUTO C., BENZ D., HOTHO A. & STUMME G. (2008). Semantic grounding of tag relatedness in social bookmarking systems. In Proceedings of the 7th International Conference on The Semantic Web, Berlin, Heidelberg: Springer-Verlag.

GANDON F., BOTTOLIER V., CORBY O. & DURVILLE P. (2007). RDF/XML source declaration, W3C member submission. http://www.w3.org/Submission/rdfsource/.

HALPIN H. & PRESUTTI V. (2009). An ontology of resources: Solving the identity crisis. In ESWC, volume 5554 of Lecture Notes in Computer Science, p. 521–534: Springer.

HOTHO A., JÄSCHKE R., SCHMITZ C. & STUMME G. (2006). Information retrieval in folksonomies: Search and ranking. In The Semantic Web: Research and Applications, LNCS (4011), Heidelberg: Springer.

HUYNH-KIM BANG B., DANÉ E. & GRANDBASTIEN M. (2008). Merging semantic and participative approaches for organizing teachers' documents. In Proceedings of World Conference on Educational Multimedia, Hypermedia & Telecommunications, p. 4959–4966, Vienna, Austria.

KIM H.-L., YANG S.-K., SONG S.-J., BRESLIN J. G. & KIM H.-G. (2007). Tag Mediated Society with SCOT Ontology. In Semantic Web Challenge, ISWC.

LIN H. & DAVIS J. (2010). Computational and crowdsourcing methods for extracting ontological structure from folksonomy. In ESWC (2), volume 6089 of Lecture Notes in Computer Science, p. 472–477: Springer.

MIKA P. (2005). Ontologies are Us: a Unified Model of Social Networks and Semantics. In ISWC, volume 3729 of LNCS, p. 522–536: Springer.

MONNIN A., LIMPENS F., GANDON F. & LANIADO D. (2010). Speech acts meet tagging: NiceTag ontology. In I-SEMANTICS '10: Proceedings of the 6th International Conference on Semantic Systems, p. 1–10, New York, NY, USA: ACM.

PASSANT A. & LAUBLET P. (2008). Meaning of a tag: A collaborative approach to bridge the gap between tagging and linked data. In Proceedings of the WWW 2008 Workshop Linked Data on the Web (LDOW2008), Beijing, China.

SPECIA L. & MOTTA E. (2007). Integrating folksonomies with the semantic web. In Proc. of the European Semantic Web Conference (ESWC2007), volume 4519 of LNCS, p. 624–639, Berlin Heidelberg, Germany: Springer-Verlag.