PhD defense : Multi-points of view semantic enrichment of folksonomies

Multi-points of view semantic enrichment of folksonomies 1 Ph.D Thesis defense – October 25th 2010 Freddy Limpens Edelweiss, INRIA Sophia Antipolis Edelweiss Picasso 129ieth birthday Supervisors Fabien Gandon, Edelweiss, INRIA Sophia Antipolis Michel Buffa, Kewi/I3S, UNSA/CNRS

description

This thesis, set at the crossroads of the Social Web and the Semantic Web, attempts to bridge social tagging-based systems with structured representations such as thesauri or ontologies (in the computer science sense). Folksonomies resulting from the use of social tagging systems suffer from a lack of precision that hinders their potential to retrieve or exchange information. This thesis proposes supporting the use of folksonomies with formal languages and ontologies from the Semantic Web. Automatic processing of tags bootstraps the process by combining a custom method that analyzes tag labels with adapted methods that analyze the structure of folksonomies. User contributions are described with our SRTag model, which supports diverging points of view, and are captured through a user-friendly interface that lets users structure tags while searching the folksonomy. Conflicts between individual points of view are detected, solved, and then exploited to help a referent user maintain a global and coherent structuring of the folksonomy, which is in turn used to guarantee coherence while enriching individual contributions with those of others. The result of our method enhances navigation within tag-based knowledge systems, and can also serve as a basis for building thesauri fed by a truly bottom-up process.

Transcript of PhD defense : Multi-points of view semantic enrichment of folksonomies

Page 1: PhD defense : Multi-points of view semantic enrichment of folksonomies

Multi-points of view semantic enrichment of folksonomies

Ph.D. Thesis defense – October 25th 2010

Freddy Limpens Edelweiss, INRIA Sophia Antipolis

Edelweiss  

Picasso  129ieth  birthday  

Supervisors Fabien Gandon, Edelweiss, INRIA Sophia Antipolis

Michel Buffa, Kewi/I3S, UNSA/CNRS

Page 2: PhD defense : Multi-points of view semantic enrichment of folksonomies

1. Context and motivations

2

Page 3: PhD defense : Multi-points of view semantic enrichment of folksonomies

• Online communities of interest

• "Enterprise 2.0" & organizations

⇒ Cross-fertilizing Web 2.0 and Semantic Web

Context of the thesis

3

Page 4: PhD defense : Multi-points of view semantic enrichment of folksonomies

•  Tools  for  techno/science  monitoring  

•  Experts  seeking  

•  Industrial  partners:  

•  Academic  partners:    

Context  of  the  thesis  

4

Page 5: PhD defense : Multi-points of view semantic enrichment of folksonomies

5

From  social  tagging  to  folksonomies  

Tags  freely  associated  to  resources  …    

…  collected  and  shared  on  the  web  

Page 6: PhD defense : Multi-points of view semantic enrichment of folksonomies

6

… resulting in

FOLKSONOMIES  

A  mass  of  users  for  a  mass  of  resources  

Page 7: PhD defense : Multi-points of view semantic enrichment of folksonomies

Limitations of folksonomies

7

Spelling variations of tags:

newyork = new_york = nyc

Page 8: PhD defense : Multi-points of view semantic enrichment of folksonomies

Limitations of folksonomies

8

Ambiguity of tags

paris

… in France?

… or in Texas, USA?

Page 9: PhD defense : Multi-points of view semantic enrichment of folksonomies

Lack of semantic links between tags

Limitations of folksonomies

9

Page 10: PhD defense : Multi-points of view semantic enrichment of folksonomies

10

How to turn folksonomies ...

... into topic structures (thesaurus)?

[Diagram: "pollution" has narrower "soil pollutions"; "pollution" is related to "pollutant" and to "energy"]

Page 11: PhD defense : Multi-points of view semantic enrichment of folksonomies

11

…  without  overloading  users  

… and by bringing all users' expertise into the process

Page 12: PhD defense : Multi-points of view semantic enrichment of folksonomies

Outline of the presentation

12

1. Context and motivations

2. State of the art and positioning

3. Tagging & folksonomy enrichment models

4. Folksonomy enrichment life-cycle

Page 13: PhD defense : Multi-points of view semantic enrichment of folksonomies

2. State of the art and positioning

13

Page 14: PhD defense : Multi-points of view semantic enrichment of folksonomies

14

State  of  the  art  

Automatic extraction of tag semantics:

• Similarity based on co-occurrence patterns (Specia & Motta 2007; Cattuto 2008)

• Association rule mining (Mika 2005; Hotho et al. 2006)

[Diagram: "pollution" has narrower "soil pollutions"; "pollution" is related to "pollutant" and to "energy"]

Page 15: PhD defense : Multi-points of view semantic enrichment of folksonomies

15

State  of  the  art  

Involving users in tag structuring:

• Simple syntax to structure tags (Huynh-Kim Bang et al. 2008)

• Crowdsourcing strategy to validate tag-concept mappings (Lin et al. 2010)

• Integrating ontology maturing into a social bookmarking tool (Braun et al. 2007)

[Diagram: "pollution" has narrower "soil pollutions"; "pollution" is related to "pollutant" and to "energy"]

4.2 Ontology Development Methodology Based on Imagenotions

Based on the ontology maturing process, we can develop a solution for these problems. In terms of the previously introduced ontology maturing process we concentrate on the first three steps from the emergence of ideas up to formalization. What distinguishes our methodology from the usual ontology development methodologies is the strong emphasis on collaborative ontology development in the consolidation-in-communities phase. This is motivated by the success of collaborative tagging in Web 2.0 projects.

The basis of our ontology formalism is a concept we call imagenotions. An imagenotion graphically represents a semantic notion through an image, or a set of images. In addition, similarly to many existing ontology formalism, it is also possible to associate tags with an imagenotion in different languages (such as English or German). For each language, one of these tags is selected as the main label of the imagenotion. The other tags are termed synonyms.

Instead of tags, images are annotated with imagenotions. It is easy to see the advantage: all the shortcomings of tagging approaches are solved using imagenotions, and it is easy to find images using search terms in any language. In other words, we provide semantic search instead of full-text search.

In the terminology of classical ontologies, imagenotions are usually instances, but they may also correspond to concepts or relations. There are two major advantages of using imagenotions over the classical ontology constructs:

1. The distinctions between concepts, instances and relations are hard to understand for most users. In our mind, notions play the role of an instance, a concept or a relation, depending on the actual context. This fact is acknowledged by many ontology formalisms that allow metamodeling. Using imagenotions, users do not need to understand this somewhat artificial separation of notions.

2. Because imagenotions are associated with images, they are meaningful internationally as an image has the same meaning in different languages.

The goal of our methodology is to guide the process of creating an ontology of imagenotions. The main steps of this methodology is based on the ontology maturing process model:

1. Emergence of Ideas. In this step, new imagenotions are created. Already this step can become collaborative, as users can jointly collect the tags describing imagenotions, and select the most representative images for an imagenotion. Collaborative editing is especially useful in a multi-lingual environment where it cannot be expected that any individual user speaks all required languages.

2. Consolidation in Communities. Because it is so easy to create new imagenotions, it cannot be avoided that for the same semantic notion initially many imagenotions are created (synonyms, also in different languages) or that an imagenotion represents more than one semantic notion (homonyms). In this step, these problems should be solved by merging synonymous imagenotions, and by splitting imagenotions representing more than one notion.

3. Formalization. In this step, taxonomical ("is-a") and ad-hoc relations are specified among imagenotions.

After step 2, the quality of image search already increases significantly, as the problems with synonyms and homonyms do not appear anymore. Moreover, it is easy to see that all annotated images automatically benefit from the maturing imagenotions. E.g. adding a new tag to an imagenotions automatically allows users to find all images annotated with that imagenotion using the new tag. In addition, the outcome of step 3 also allows requests for related images based on the current context.

Imagenotions are useful to collaboratively build an ontology supporting manual annotation and semantic search. However, to fulfill the requirement of IMAGINATION for automatic annotation, a classical formal ontology is needed that can be exploited by text-mining and object identification algorithms. This last axiomatization step is not yet directly supported by our methodology, it is subject of future research. Nevertheless, it is easy to see that a conversion of an imagenotion ontology to a standard ontology formalism (e.g. OWL) is possible. The only missing information is whether an imagenotion should be modelled as a concept, instance or a relation in the target ontology formalism.

4.3 Tool Support

Currently we implement a web-based tool that allows the creation of new imagenotions and the editing of available imagenotions. This tool supports all three steps of our methodology. It can be easily invoked during semantic search or when uploading new images into an image repository: e.g. it is fully integrated into the user's workflow.

We now demonstrate some functionality of the tool in terms of the steps of our development methodology.

4.3.1 Step 1: Emergence of Ideas

Figure 2 shows an example for the emergence of ideas. Let us assume that a content owner has new images about elephants. The imagenotion "elephant" was so far not available. Therefore, she creates a new imagenotion, adds an image or part of an image that shows elephants and starts describing the new imagenotion with more details. She uses English as spoken language. As synonyms, she enters "elephantidae" and "tusker". Instead of tagging the new images that show elephants with these words, she can use the new imagenotion: she just pulls this imagenotion over the new images via drag and drop.

Figure 2: Editing an imagenotion with the NotionEditor tool

Page 16: PhD defense : Multi-points of view semantic enrichment of folksonomies

16

State  of  the  art  

Tags and Semantic Web models

• SCOT for tags and tagging (Kim et al. 2007)

Page 17: PhD defense : Multi-points of view semantic enrichment of folksonomies

17

State  of  the  art  

Tags and Semantic Web models

• SCOT for tags and tagging (Kim et al. 2007)

• MOAT (Passant & Laublet, 2008): resolving ambiguity by linking tags to concepts from Linked Data

Page 18: PhD defense : Multi-points of view semantic enrichment of folksonomies

18

Positioning

Computed tag similarity

Tag-concept mapping

Users' contrib.

Sem-Web formalism

Multi-points of view

Angeletou et al. (2008)

✓ ✓ ✓

Huynh-Kim Bang et al. (2008)

✓ ✓

Passant & Laublet (2008)

✓ ✓ ✓

Lin & Davis (2010)

✓ ✓ ✓ ✓

Braun et al. (2007)

✓ ✓

Our approach ✓ ✓ ✓ ✓

Page 19: PhD defense : Multi-points of view semantic enrichment of folksonomies

3.   Tagging  &  folksonomy  enrichment  models  

19

Page 20: PhD defense : Multi-points of view semantic enrichment of folksonomies

20

Tagging  model  

Tagging  =  linking  a  resource  with  a  sign  

What is a tagging?

picture (1) shows (2) "nature" (3)

place located l:england

editing makes me :)

Page 21: PhD defense : Multi-points of view semantic enrichment of folksonomies

21

Tagging  model  

NiceTag (Monnin et al., 2010):

Tagging as named graphs*

nt:TaggedResource nt:isRelatedTo rdfs:Resource

nt:TagAction (named graph)

nt:TagAction sioc:has_creator sioc:UserAccount

nt:TagAction sioc:has_container sioc:Container

nt:TagAction dc:date xsd:Date

*Carroll et al. (2005)

Page 22: PhD defense : Multi-points of view semantic enrichment of folksonomies

22

Tagging  model  

No constraints on the model of the sign used to tag

nt:TaggedResource nt:isRelatedTo rdfs:Resource

nt:TagAction (named graph)

nt:TaggedResource

http://geonames.org/2990440 nt:isRelatedTo

scot:Tag  

:)  

skos:Concept  

nt:isRelatedTo  

nt:isRelatedTo  

nt:isRelatedTo  

nt:isRelatedTo  

moat:Tag   moat:hasMeaning  

Page 23: PhD defense : Multi-points of view semantic enrichment of folksonomies

23

Tagging  model  

Typing the relation to reflect the pragmatics of tag use

nt:TaggedResource nt:isRelatedTo rdfs:Resource

nt:TagAction (named graph)

Page 24: PhD defense : Multi-points of view semantic enrichment of folksonomies

24

Tagging  model  

Typing the named graphs for additional dimensions of tagging

nt:TaggedResource nt:isRelatedTo rdfs:Resource

nt:TagAction (named graph)

Page 25: PhD defense : Multi-points of view semantic enrichment of folksonomies

25

Tagging  model  

Example of a tagging in delicious

http://www.windenergy.com

nt:ManualTagAction

nt:isAbout scot:Tag

#wind-energy

<nt:TaggedResource rdf:about="http://www.windenergy.com" cos:graph="http://mysocialsi.te/tagaction#7182904">
  <nt:isAbout rdf:resource="http://mysocialsi.te/tag#wind-energy" />
</nt:TaggedResource>

freddy

sioc:has_creator

using RDF source declaration

delicious.com

sioc:has_container

<nt:ManualTagAction rdf:about="http://mysocialsi.te/tagaction#7182904">
  <sioc:has_creator rdf:resource="http://mysocialsi.te/user#freddy" />
</nt:ManualTagAction>

Page 26: PhD defense : Multi-points of view semantic enrichment of folksonomies

26

Folksonomy  enrichment  

2 complementary semantic enrichments:

http://www.windenergy.com

nt:ManualTagAction

nt:isAbout wind-energy

renewable    energy  

windenergy  

wind  turbine  

has  broader  

close  match  

has  narrower  

environment  

related  

Structuring tags as in a thesaurus (SKOS)

Page 27: PhD defense : Multi-points of view semantic enrichment of folksonomies

27

Folksonomy  enrichment  

2 complementary semantic enrichments:

wind-energy

renewable    energy  

windenergy  

wind  turbine  

has  broader  

close  match  

has  narrower  

environment  

related  

Structuring tags as in a thesaurus (SKOS)


Page 29: PhD defense : Multi-points of view semantic enrichment of folksonomies

29

Tagging  model  

Supporting diverging points of view

car skos:related pollution

john agrees

paul disagrees

Page 30: PhD defense : Multi-points of view semantic enrichment of folksonomies

Supporting diverging points of view

Reification of relations with named graphs

30

Page 31: PhD defense : Multi-points of view semantic enrichment of folksonomies

Supporting diverging points of view

Extending SIOC to model different types of agents

31

Page 32: PhD defense : Multi-points of view semantic enrichment of folksonomies

Supporting diverging points of view

Reification of relations with named graphs

srtag:TagSemanticStatement (named graph): car skos:related pollution

srtag:SingleUser "john" srtag:hasApproved

srtag:SingleUser "paul" srtag:hasRejected

srtag:TagStructureComputer "r2d2" srtag:hasProposed
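
The reified statement on this slide can be pictured with a small data structure. The following is a minimal sketch in plain Python, not the thesis implementation (which relies on RDF named graphs); the class name reuses srtag:TagSemanticStatement for readability, but the fields and the `is_debated` helper are illustrative assumptions that merely mirror srtag:hasProposed, srtag:hasApproved and srtag:hasRejected.

```python
from dataclasses import dataclass, field


@dataclass
class TagSemanticStatement:
    """One reified relation between two tags, plus who said what about it.

    Hypothetical stand-in for srtag:TagSemanticStatement; the actual model
    stores the triple in an RDF named graph.
    """
    tag1: str
    relation: str
    tag2: str
    proposed_by: set = field(default_factory=set)
    approved_by: set = field(default_factory=set)
    rejected_by: set = field(default_factory=set)

    def is_debated(self) -> bool:
        # True when at least one agent approves and at least one rejects.
        return bool(self.approved_by) and bool(self.rejected_by)


# The example from the slide: "r2d2" proposes, john approves, paul rejects.
statement = TagSemanticStatement("car", "skos:related", "pollution")
statement.proposed_by.add("r2d2")
statement.approved_by.add("john")
statement.rejected_by.add("paul")
print(statement.is_debated())
```

Keeping the agents outside the triple itself is what lets several points of view coexist on the same relation.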

32

Page 33: PhD defense : Multi-points of view semantic enrichment of folksonomies

33

Ademe  scenario    

Experts produce docs + tag

Archivists centralize + tag

Public audience read + tag

Life-cycle grounded on usage analysis

Page 34: PhD defense : Multi-points of view semantic enrichment of folksonomies

34

Ademe’s  dataset  

What:
  Delicious: bookmarks of users of tag "ademe"
  TheseNet: keywords for Ademe's PhD projects
  Cadic: archivists' indexing lexicon

                            Delicious   TheseNet   Cadic
# tags                      1015        6583       1439
# resources                 196         1425       4675
# tagging (1R - 1T - 1U)    3015        10160      25515
# users                     812         1425       1

Page 35: PhD defense : Multi-points of view semantic enrichment of folksonomies

4.   Going  through  the  folksonomy  enrichment  life-­‐cycle  

35

Page 36: PhD defense : Multi-points of view semantic enrichment of folksonomies

ADDING TAGS

Automatic processing

User-centric structuring

Detect conflicts

Global structuring

Flat folksonomy

Structured folksonomy

Folksonomy  enrichment  life-­‐cycle  

36


Page 38: PhD defense : Multi-points of view semantic enrichment of folksonomies

Automatic processing

1.  String-based

2.  Co-occurrence patterns

3.  User-based associations

Flat folksonomy

38

3 methods to automatically extract tag semantics

Page 39: PhD defense : Multi-points of view semantic enrichment of folksonomies

39

1.  String-­‐based  metrics  

pollution Soil pollutions

pollutant pollution

=> « pollution » related to « pollutant »

=> « pollution » broader than « soil pollutions »

Page 40: PhD defense : Multi-points of view semantic enrichment of folksonomies

• Benchmark of 30 different string-based similarity metrics from SimMetrics*: σ(t1,t2) ∊ [0, 1]

• Reference data set built with Ademe experts

• Which metric is best for which relation, at what threshold?

• Information-retrieval measures: precision, recall, and F1-measure
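
The evaluation measures named in the last bullet can be sketched as follows. This is a generic illustration of precision, recall, and F1 over sets of tag pairs, not the benchmark code used in the thesis; the function name and the sample pairs are assumptions made for the example.

```python
def precision_recall_f1(predicted: set, reference: set) -> tuple:
    """Compare predicted tag pairs against a reference set of tag pairs."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: 4 reference pairs, 3 predicted pairs, 2 of them correct.
reference_pairs = {
    ("pollution", "pollutant"),
    ("energie", "energy"),
    ("ecologie", "ecology"),
    ("finance", "financement"),
}
predicted_pairs = {
    ("pollution", "pollutant"),
    ("energie", "energy"),
    ("compost", "composant"),
}
precision, recall, f1 = precision_recall_f1(predicted_pairs, reference_pairs)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running this for each metric and each candidate threshold is one way to answer "which metric is best for which relation, at what threshold".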

40

1.  String-­‐based  metrics  

* http://staffwww.dcs.shef.ac.uk/people/S.Chapman/simmetrics.html

Page 41: PhD defense : Multi-points of view semantic enrichment of folksonomies

1. String-based metrics

41

• MongeElkan_Soundex to detect semantically linked tags (close match + hyponym + related): threshold = 0.8

• JaroWinkler to distinguish closeMatch: threshold = 0.9

• Asymmetry of MongeElkan_QGram to distinguish hyponyms:
  • σ(t1,t2) ≠ σ(t2,t1)
  • δ = σ(t1,t2) - σ(t2,t1) > 0.4

[Three charts comparing the string-based similarity metrics; titles, labels, and values not recoverable from the extraction]

Page 42: PhD defense : Multi-points of view semantic enrichment of folksonomies

1. String-based metrics

Heuristic in 3 steps

1. Semantically linked: MongeElkan-Soundex σ1
   IF σ1(t1,t2) > 0.8

2. closeMatch: JaroWinkler σ2
   IF σ2(t1,t2) > 0.9 => t1 closeMatch t2

3. hyponym: MongeElkan-QGram σ3
   ELSE IF σ3(t1,t2) - σ3(t2,t1) > 0.4 => t1 has narrower t2

   related otherwise
   ELSE => t1 related t2
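
The three-step heuristic can be sketched as a small decision function. This is an illustration under stated assumptions, not the thesis code: the three similarity functions are passed in as parameters because MongeElkan-Soundex, JaroWinkler and MongeElkan-QGram come from the SimMetrics library, and the `table_similarity` helper with its hand-written scores is a toy stand-in for them.

```python
from typing import Callable, Optional

Similarity = Callable[[str, str], float]


def classify_pair(t1: str, t2: str,
                  sigma1: Similarity, sigma2: Similarity, sigma3: Similarity,
                  linked_threshold: float = 0.8,
                  close_threshold: float = 0.9,
                  asymmetry_threshold: float = 0.4) -> Optional[str]:
    """Return the relation from t1 to t2, or None if the tags are not linked."""
    # Step 1: keep only semantically linked tags.
    if sigma1(t1, t2) <= linked_threshold:
        return None
    # Step 2: very similar labels are spelling variants.
    if sigma2(t1, t2) > close_threshold:
        return "closeMatch"
    # Step 3: a strong asymmetry means t1 has narrower t2.
    if sigma3(t1, t2) - sigma3(t2, t1) > asymmetry_threshold:
        return "narrower"
    return "related"


def table_similarity(scores: dict) -> Similarity:
    """Toy stand-in: look scores up in a hand-written table (default 0.0)."""
    return lambda a, b: scores.get((a, b), 0.0)


# Made-up scores, chosen only to exercise each branch.
s1 = table_similarity({("pollution", "soil pollutions"): 0.9,
                       ("energie", "energy"): 0.95,
                       ("pollution", "pollutant"): 0.85})
s2 = table_similarity({("pollution", "soil pollutions"): 0.7,
                       ("energie", "energy"): 0.93,
                       ("pollution", "pollutant"): 0.8})
s3 = table_similarity({("pollution", "soil pollutions"): 0.95,
                       ("soil pollutions", "pollution"): 0.45})

print(classify_pair("pollution", "soil pollutions", s1, s2, s3))
print(classify_pair("energie", "energy", s1, s2, s3))
print(classify_pair("pollution", "pollutant", s1, s2, s3))
```

The sketch only covers the direction stated on the slide (t1 has narrower t2); how the reverse asymmetry is handled is not specified here.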

42

Page 43: PhD defense : Multi-points of view semantic enrichment of folksonomies

1. String-based metrics

Performances

[Bar chart "Precision & Recall": precision and recall of the heuristic for the spelling variant, hyponym, and related relations; y-axis from 0 to 0.7; values not recoverable from the extraction]

43

Page 44: PhD defense : Multi-points of view semantic enrichment of folksonomies

1. String-based metrics: results on full dataset

44

[Graph visualization, powered by gephi.org. Edge legend: has broader, has narrower, has spelling variant, has related. Node legend: tags from experts, tags from archivists. Highlighted relations: close match, related, broader]

Page 45: PhD defense : Multi-points of view semantic enrichment of folksonomies

45

2. Co-occurrence patterns

Example of folksonomy

Page 46: PhD defense : Multi-points of view semantic enrichment of folksonomies

2. Co-occurrence patterns

46

                  ecology   energy   wind turbine   sustainability   housing
ecology           0         1        1              3                1
energy            1         0        2              4                3
wind turbine      1         2        0              1                1
sustainability    3         4        1              0                4
housing           1         3        1              4                0

Each tag is represented by its row vector: v_ecology, v_energy, v_wind turbine, v_sustainability, v_housing

σ(energy, sustainability) = cos(v_energy, v_sustainability)

IF σ > 0.85 => "energy" related "sustainability"
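
The cosine measure can be computed directly from the co-occurrence counts shown on this slide. The following is a minimal sketch in plain Python; the variable and function names are illustrative, and how the thesis actually builds and weights the vectors on the full dataset is not specified here.

```python
import math

# Rows of the co-occurrence matrix from the slide, in the column order
# ecology, energy, wind turbine, sustainability, housing.
COOCCURRENCE = {
    "ecology":        [0, 1, 1, 3, 1],
    "energy":         [1, 0, 2, 4, 3],
    "wind turbine":   [1, 2, 0, 1, 1],
    "sustainability": [3, 4, 1, 0, 4],
    "housing":        [1, 3, 1, 4, 0],
}


def cosine(u: list, v: list) -> float:
    """Cosine of the angle between two count vectors (0.0 if one is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


sigma = cosine(COOCCURRENCE["energy"], COOCCURRENCE["sustainability"])
print(round(sigma, 3))
```

With these small illustrative counts the raw cosine is roughly 0.48, so the matrix is best read as an illustration of the data layout rather than as a pair that passes the 0.85 threshold.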

Page 47: PhD defense : Multi-points of view semantic enrichment of folksonomies

47

2. Co-occurrence patterns

Cadic dataset

Page 48: PhD defense : Multi-points of view semantic enrichment of folksonomies

renewable energy, wind-energy

Users: Alex, Delphine, Claire, Monique, Anne

⇒ Hyponym relations (broader/narrower):

« renewable energy » broader than « wind-energy »
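
One common reading of user-based association is set inclusion: a tag is taken as broader than another when the users of the second form a subset of the users of the first. The sketch below illustrates that idea only; it is not the thesis implementation, the `min_inclusion` parameter is an assumption, and the assignment of the five users to the two tags is hypothetical since the slide does not give it.

```python
def user_based_hyponyms(tag_users: dict, min_inclusion: float = 1.0) -> list:
    """Return (broader, narrower) pairs inferred from the users of each tag.

    A tag is considered broader when it has strictly more users and at least
    `min_inclusion` of the narrower tag's users also use it.
    """
    relations = []
    for broad, broad_users in tag_users.items():
        for narrow, narrow_users in tag_users.items():
            if broad == narrow or not narrow_users:
                continue
            inclusion = len(narrow_users & broad_users) / len(narrow_users)
            if inclusion >= min_inclusion and len(broad_users) > len(narrow_users):
                relations.append((broad, narrow))
    return sorted(relations)


# Hypothetical assignment of the users named on the slide.
tag_users = {
    "renewable energy": {"Alex", "Delphine", "Claire", "Monique", "Anne"},
    "wind-energy": {"Claire", "Monique"},
}
hyponyms = user_based_hyponyms(tag_users)
print(hyponyms)
```

Lowering `min_inclusion` below 1.0 would tolerate a few users of the narrower tag who never used the broader one.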

3.  User-­‐based  associa-on  

48

Page 49: PhD defense : Multi-points of view semantic enrichment of folksonomies

3.  User-­‐based    associa-on  

THESENET dataset

49

Page 50: PhD defense : Multi-points of view semantic enrichment of folksonomies

Global results of automatic processing

Total with 3 automatic methods: 83027 relations for 9037 tags

– 68633 related

– 11254 hyponym

– 3193 spelling variants

50

Page 51: PhD defense : Multi-points of view semantic enrichment of folksonomies

ADDING TAGS

Automatic processing

User-centric structuring

Detect conflicts

Global structuring

Flat folksonomy

Structured folksonomy

Folksonomy  enrichment  life-­‐cycle  

51

computing server

Page 52: PhD defense : Multi-points of view semantic enrichment of folksonomies

[Graph visualization, powered by gephi.org. Edge legend: has broader, has narrower, has spelling variant, has related]

52

?

Computed relations are not always accurate

Page 53: PhD defense : Multi-points of view semantic enrichment of folksonomies

ADDING TAGS

Automatic processing

User-centric structuring

Detect conflicts

Global structuring

Flat folksonomy

Structured folksonomy

Folksonomy  enrichment  life-­‐cycle  

53

Firefox extension SRTagEditor

Page 54: PhD defense : Multi-points of view semantic enrichment of folksonomies

54

Capturing users' contributions

Embedding structuring tasks within everyday activity (e.g. searching)

Page 55: PhD defense : Multi-points of view semantic enrichment of folksonomies

55

Capturing users' contributions

Page 56: PhD defense : Multi-points of view semantic enrichment of folksonomies

56

Capturing user's point of view

Example: Rejecting a relation

srtag:TagSemanticStatement: energie skos:broader france

John srtag:hasRejected

Page 57: PhD defense : Multi-points of view semantic enrichment of folksonomies

57

Capturing user's point of view

Example: Proposing another relation

srtag:TagSemanticStatement: energie skos:related energy

John srtag:hasRejected

srtag:TagSemanticStatement: energie skos:closeMatch energy

John srtag:hasProposed


Page 60: PhD defense : Multi-points of view semantic enrichment of folksonomies

ADDING TAGS

Automatic processing

User-centric structuring

Detect conflicts

Global structuring

Flat folksonomy

Structured folksonomy

Folksonomy  enrichment  life-­‐cycle  

60

Page 61: PhD defense : Multi-points of view semantic enrichment of folksonomies

61

Conflict detection

environment, pollution

Using rules:

IF num(narrower)/num(broader) ≥ c THEN narrower wins ELSE related wins

narrower

John  

srtag:hasApproved  

Anne  

srtag:hasApproved  

broader

Monique  

srtag:hasApproved  

Delphine  

srtag:hasApproved  
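
The rule on this slide can be sketched as a small function. This is an illustration, not the thesis code: the slide does not give a value for the constant c, so the default below is a hypothetical choice, and only the direction stated on the slide is covered.

```python
def resolve_hierarchy_conflict(n_narrower: int, n_broader: int,
                               c: float = 2.0) -> str:
    """Apply: IF num(narrower)/num(broader) >= c THEN narrower ELSE related.

    The comparison is written as n_narrower >= c * n_broader so that
    n_broader == 0 does not cause a division by zero.
    The default c = 2.0 is an assumption made for this example.
    """
    if n_narrower == 0 and n_broader == 0:
        raise ValueError("no approvals to compare")
    if n_narrower >= c * n_broader:
        return "narrower"
    return "related"


# Slide example: John and Anne approve narrower, Monique and Delphine broader.
print(resolve_hierarchy_conflict(n_narrower=2, n_broader=2))
```

With two approvals on each side the ratio is 1, so for any c above 1 the less constrained relation "related" is retained.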

Page 62: PhD defense : Multi-points of view semantic enrichment of folksonomies

62

Conflict detection

[Diagram: relations ordered from close match, to broader / narrower, to related, each step being "less constrained"; example pair: environment, pollution, with candidate relations narrower, broader, and related]

Page 63: PhD defense : Multi-points of view semantic enrichment of folksonomies

63

Experimentation at ADEME

Participation of 3 members of Ademe + 2 professionals in environment

Questionnaire:

Last name, first name: Position:

Profile in a few keywords:

Mark with an "X" the relation you judge most accurate between the two tags. Choose only one relation for each tag. The first two rows are fictitious examples.

Columns:

• Tag1

• Tag2

• Tag1 and Tag2 are equivalent: if I search for information, I must be able to use Tag1 or Tag2 interchangeably

• Tag1 is more general than Tag2: if I search for information related to Tag1, information related to Tag2 is relevant, but not the other way round

• Tag2 is more general than Tag1: if I search for information related to Tag2, information related to Tag1 is relevant, but not the other way round

• Tag1 and Tag2 are related: if I search for information on one of the tags, it is relevant to suggest information on the other tag

• These 2 tags are not particularly related

Tag pairs (Tag1 / Tag2):

agriculture durable / agriculture raisonnee
biologie / agriculture biologique
changements sociaux / changement social
chimie verte / chanvre
Climat/changement / changement climatique
collectivite / action collective
collectivite / collecte de donnees
commande / communication entre acteurs
comportements pro-environnementaux / comportements pro-environnemental
compost / composant
conception / ecoconception
conception / travail collaboratif vis a vis de la conception
cycle de rankine / cycle organique de rankine
developpement durable / developpement local
accumulateurs li-ion / tours d'habitation
acteurs du territoire / territorialite
agglomeration / cooperation
agriculture durable / agriculture biologique
diversite culturelle / diversite microbienne
ecologie / ecology
elements finis / methode des elements finis
energie / politique energetique
energie / production energie
energie / energie renouvelable
energie / autonomie energetique
energy / energies
environmental / environment
environnement / domaines environnementaux
environnement / grenelle de l'environnement
environnement / competences environnementales
environnement / socialisation aux preoccupations environnementales
ester / gasteropodes
experimentation / electromediation
extraction / phytoextraction
finance / financement
gestion / gestion stock
gestion stock / gestion des ressources naturelles
gouvernance / gouvernance forestiere
hybride / vehicules electriques et hybrides

Page 64: PhD defense : Multi-points of view semantic enrichment of folksonomies

Several cases of conflicting situations

Conflicting: >1 relation per pair of tags

Approved: 1 relation, only approved

Debatable: 1 relation, BOTH approved and rejected

Rejected: 1 relation, only rejected
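
The four cases can be expressed as a simple classification over the votes collected for one pair of tags. This is a sketch under assumptions: the input format (a mapping from relation name to counts of approvals and rejections) and the function name are invented for the example, and a relation is counted as present as soon as it has at least one approval or rejection.

```python
def classify_pair_status(votes: dict) -> str:
    """Classify a tag pair from its votes.

    `votes` maps a relation name to a tuple
    (number_of_approvals, number_of_rejections).
    """
    relations = [name for name, (approved, rejected) in votes.items()
                 if approved or rejected]
    if not relations:
        raise ValueError("no statement for this pair of tags")
    if len(relations) > 1:
        return "conflicting"      # more than one relation for the pair
    approvals, rejections = votes[relations[0]]
    if approvals and rejections:
        return "debatable"        # one relation, both approved and rejected
    return "approved" if approvals else "rejected"


print(classify_pair_status({"skos:narrower": (2, 0), "skos:broader": (2, 0)}))
print(classify_pair_status({"skos:related": (1, 1)}))
```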

[Pie chart: share of tag pairs per case (Conflicting, Approved, Debatable, Rejected); percentages not recoverable from the extraction]

Page 65: PhD defense : Multi-points of view semantic enrichment of folksonomies

Several cases of conflicting situations

Distribution over relation types:

• "closeMatch" tends to draw a consensus more easily than others

• "broader/narrower" and "related" cause more debates/conflicts

[Stacked bar chart, 0% to 100%: distribution of cases (conflict solver proposal, approved, debated, rejected) for each relation type: closeMatch, broader, narrower, related; values not recoverable from the extraction]

65

Page 66: PhD defense : Multi-points of view semantic enrichment of folksonomies

Several cases of conflicting situations

Influence of compound words

?

[Stacked bar chart, 0% to 100%: share of Conflicting, Approved, Debatable, and Rejected cases for compound-word pairs, non-compound-word pairs, and all pairs]

energy  

renewable  energy  

80%  

46%  

66

Page 67: PhD defense : Multi-points of view semantic enrichment of folksonomies

Example conflict resolution

Legend: conflicting, conflict solver choice, debatable, rejected

67

Page 68: PhD defense : Multi-points of view semantic enrichment of folksonomies

ADDING TAGS

Automatic processing

User-centric structuring

Detect conflicts

Global structuring

Flat folksonomy

Structured folksonomy

Folksonomy  enrichment  life-­‐cycle  

68

Page 69: PhD defense : Multi-points of view semantic enrichment of folksonomies

Reporting

Helping Referent User (Ademe archivists) choose solutions to conflicts

69

Page 70: PhD defense : Multi-points of view semantic enrichment of folksonomies

70

Global  map  

Includes  all  points  of  view,  highlights  conflicts  +  consensuses  

Page 71: PhD defense : Multi-points of view semantic enrichment of folksonomies

Referent  choices  

71

Choices of the referent user (e.g. archivists at Ademe)

Page 72: PhD defense : Multi-points of view semantic enrichment of folksonomies

Referent  choices  

72

Page 73: PhD defense : Multi-points of view semantic enrichment of folksonomies

ADDING TAGS

Automatic processing

User-centric structuring

Detect conflicts

Global structuring

Flat folksonomy

Structured folksonomy

Folksonomy  enrichment  life-­‐cycle  

73

Page 74: PhD defense : Multi-points of view semantic enrichment of folksonomies

Enriching  individual  points  of  view  

Integrating others' contributions:
1. Current user -> "Anne"
2. ReferentUser (e.g. archivists)
3. ConflictSolver (software agent)
4. Other individual users
5. Automatons (metrics)

BROADER  

NARROWER  

RELATED  

CLOSE  MATCH  

Search: environnement

preoccupation environnementales

grenelle  de  l  environnement  

competences  environnementales  

environment  

environmental  

domaines  environnementaux  

Anne  is  looking  for  tag  "environnement"  
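The priority order listed on this slide can be read as a simple ranking rule: when several sources say something about the same pair of tags, the statement from the highest-ranked source is the one shown to the current user. The sketch below illustrates that reading only. The tuple format, the `kind` labels, the author names, and the sample relations between the tags are hypothetical and are not taken from the thesis.

```python
# Ranks follow the order on the slide (1 = highest priority).
# Users get rank 1 or 4 depending on whether they are the current user.
RANK = {"referent": 2, "conflict_solver": 3, "automaton": 5}


def rank(author, kind, current_user):
    """Return the priority of a statement's source for `current_user`."""
    if kind == "user":
        return 1 if author == current_user else 4
    return RANK[kind]


def enrich_view(current_user, statements):
    """Keep, for each tag pair, the relation from the best-ranked source.

    statements: iterable of (author, kind, tag_a, relation, tag_b).
    Returns {(tag_a, tag_b): (relation, author)}.
    On equal rank, the first statement encountered is kept.
    """
    best = {}
    for author, kind, tag_a, relation, tag_b in statements:
        r = rank(author, kind, current_user)
        key = (tag_a, tag_b)
        if key not in best or r < best[key][0]:
            best[key] = (r, relation, author)
    return {key: (rel, who) for key, (_, rel, who) in best.items()}


# Hypothetical sample data around the tag "environnement".
sample = [
    ("anne", "user", "environnement", "closeMatch", "environment"),
    ("archivist", "referent", "environnement", "related", "environment"),
    ("archivist", "referent", "environnement", "narrower", "grenelle de l environnement"),
    ("string_metric", "automaton", "environnement", "closeMatch", "environmental"),
    ("bob", "user", "environnement", "related", "environmental"),
]

anne_view = enrich_view("anne", sample)
```

In this sample, Anne's own statement wins over the referent user's for "environment", while Bob's statement wins over the automatic metric for "environmental".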

Page 75: PhD defense : Multi-points of view semantic enrichment of folksonomies

Each    point  of  view  corresponds  to  a  layer  

Page 76: PhD defense : Multi-points of view semantic enrichment of folksonomies

5.   Conclusion  

Page 77: PhD defense : Multi-points of view semantic enrichment of folksonomies

What we do:

Help online communities structure their tags

wind-energy

renewable    energy  

sustainability  

wind  turbine  

has  broader  

related  

has  narrower  

environment  

related  

Page 78: PhD defense : Multi-points of view semantic enrichment of folksonomies

Our contributions:

• An approach to bridge tagging with Semantic Web:
  • NiceTag for tagging
  • SRTag for multi-points of view structuring of tags
• Complete life-cycle of folksonomy enrichment
• Automatic processing of tags:
  • String-based heuristic
  • State of the art methods integrated in Semantic Web computing environment (Corese SPARQL engine)
• User interface to capture tag structuring embedded in every-day tasks
• Implementation within ISICIL solution (tagging server)
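To give a feel for what a string-based heuristic on tag labels can look like, here is a small sketch. It is an illustrative stand-in, not the heuristic developed in the thesis: the word-containment rule, the use of Python's `difflib.SequenceMatcher`, and the 0.85 threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def suggest_relation(tag_a, tag_b, threshold=0.85):
    """Suggest a relation read as "tag_a has <relation> tag_b", or None.

    Assumed rules (illustrative only):
    - if every word of tag_a appears in tag_b, tag_b is more specific,
      so tag_a has narrower tag_b (e.g. "energy" / "renewable energy");
    - the symmetric case yields "broader";
    - otherwise, very similar spellings yield "closeMatch".
    """
    a, b = tag_a.lower().strip(), tag_b.lower().strip()
    if not a or not b or a == b:
        return None

    words_a, words_b = set(a.split()), set(b.split())
    if words_a < words_b:
        return "narrower"
    if words_b < words_a:
        return "broader"

    if SequenceMatcher(None, a, b).ratio() >= threshold:
        return "closeMatch"
    return None


suggestions = {
    ("energy", "renewable energy"): suggest_relation("energy", "renewable energy"),
    ("environment", "environmental"): suggest_relation("environment", "environmental"),
    ("energy", "wind turbine"): suggest_relation("energy", "wind turbine"),
}
```

Suggestions produced this way would only bootstrap the structure. In the life-cycle presented here, users and the referent user can then approve, debate, or reject them.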

Page 79: PhD defense : Multi-points of view semantic enrichment of folksonomies

Future work

• More user interfaces:
  • Collaborative aspects
  • Visualisation of large structured folksonomy
  • Tag searching
• Other computational methods + optimization
• ISICIL: test with end users at Ademe and Orange Labs
• Testing on other types of communities (Life2Times)
• Temporal dimension
• Multilingualism
• Integrating collaborative ergonomics in design processes

Page 80: PhD defense : Multi-points of view semantic enrichment of folksonomies

Thank you!

[email protected]

http://www-sop.inria.fr/members/Freddy.Limpens/

Page 81: PhD defense : Multi-points of view semantic enrichment of folksonomies

Personal publications

2010

• Monnin, A.; Limpens, F.; Gandon, F. & Laniado, D. Speech acts meet tagging: NiceTag ontology. AIS SigPrag International Pragmatic Web Conference, 2010

• Monnin, A.; Limpens, F.; Gandon, F. & Laniado, D. L'ontologie NiceTag : les tags en tant que graphes nommés. EGC 2010, Atelier Web Social

• Limpens, F.; Gandon, F. & Buffa, M. Helping online communities to semantically enrich folksonomies. Proceedings of the WebSci10: Extending the Frontiers of Society On-Line, http://webscience.org, 2010

2009

• Limpens, F.; Monnin, A.; Laniado, D. & Gandon, F. NiceTag Ontology: tags as named graphs. International Workshop in Social Networks Interoperability, ASWC09, 2009

• Limpens, F.; Gandon, F. & Buffa, M. Sémantique des folksonomies : structuration collaborative et assistée. Ingénierie des Connaissances, 2009

• Limpens, F.; Gandon, F. & Buffa, M. Collaborative semantic structuring of folksonomies (short article). IEEE/WIC/ACM Int. Conf. on Web Intelligence, 2009

• Erétéo, G.; Buffa, M.; Gandon, F.; Leitzelman, M. & Limpens, F. Leveraging Social data with Semantics. W3C Workshop on the Future of Social Networking, Barcelona, 2009

• Henri, F.; Charlier, B. & Limpens, F. Understanding and Supporting the Creation of More Effective PLE. Int. Conf. on Information Resources Management, Dubai, 2009

2008

• Henri, F.; Charlier, B. & Limpens, F. Understanding PLE as an Essential Component of the Learning Process. World Conf. on Educational Multimedia, Hypermedia & Telecommunications, ED-Media, Vienna, Austria, 2008

• Limpens, F.; Gandon, F. & Buffa, M. Rapprocher les ontologies et les folksonomies pour la gestion des connaissances partagées : un Etat de l'art. Proc. 19èmes journées francophones d'Ingénierie des Connaissances, Nancy, 2008

• Limpens, F.; Gandon, F. & Buffa, M. Bridging Ontologies and Folksonomies to Leverage Knowledge Sharing on the Social Web: a Brief Survey. Proc. 1st International Workshop on Social Software Engineering and Applications (SoSEA)

http://www-sop.inria.fr/members/Freddy.Limpens/?q=biblio

Page 82: PhD defense : Multi-points of view semantic enrichment of folksonomies

References

ANGELETOU S., SABOU M. & MOTTA E. (2008). Semantically Enriching Folksonomies with FLOR. In CISWeb Workshop at European Semantic Web Conference ESWC.

BRAUN S., SCHMIDT A., WALTER A., NAGYPÁL G. & ZACHARIAS V. (2007). Ontology maturing: a collaborative web 2.0 approach to ontology engineering. In CKC, volume 273 of CEUR Workshop Proceedings: CEUR-WS.org.

CATTUTO C., BENZ D., HOTHO A. & STUMME G. (2008). Semantic grounding of tag relatedness in social bookmarking systems. In Proceedings of the 7th International Conference on The Semantic Web, Berlin, Heidelberg: Springer-Verlag.

GANDON F., BOTTOLIER V., CORBY O. & DURVILLE P. (2007). RDF/XML source declaration, W3C member submission. http://www.w3.org/Submission/rdfsource/.

HALPIN H. & PRESUTTI V. (2009). An ontology of resources: Solving the identity crisis. In ESWC, volume 5554 of Lecture Notes in Computer Science, p. 521–534: Springer.

HOTHO A., JÄSCHKE R., SCHMITZ C. & STUMME G. (2006). Information retrieval in folksonomies: Search and ranking. In The Semantic Web: Research and Applications, LNCS(4011), Heidelberg: Springer.

HUYNH-KIM BANG B., DANÉ E. & GRANDBASTIEN M. (2008). Merging semantic and participative approaches for organizing teachers' documents. In Proceedings of World Conference on Educational Multimedia, Hypermedia & Telecommunications, p. 4959–4966, Vienna, Austria.

KIM H.-L., YANG S.-K., SONG S.-J., BRESLIN J. G. & KIM H.-G. (2007). Tag Mediated Society with SCOT Ontology. In Semantic Web Challenge, ISWC.

LIN H. & DAVIS J. (2010). Computational and crowdsourcing methods for extracting ontological structure from folksonomy. In ESWC (2), volume 6089 of Lecture Notes in Computer Science, p. 472–477: Springer.

MIKA P. (2005). Ontologies are Us: a Unified Model of Social Networks and Semantics. In ISWC, volume 3729 of LNCS, p. 522–536: Springer.

MONNIN A., LIMPENS F., GANDON F. & LANIADO D. (2010). Speech acts meet tagging: NiceTag ontology. In I-SEMANTICS '10: Proceedings of the 6th International Conference on Semantic Systems, p. 1–10, New York, NY, USA: ACM.

PASSANT A. & LAUBLET P. (2008). Meaning of a tag: A collaborative approach to bridge the gap between tagging and linked data. In Proceedings of the WWW 2008 Workshop Linked Data on the Web (LDOW2008), Beijing, China.

SPECIA L. & MOTTA E. (2007). Integrating folksonomies with the semantic web. In Proc. of the European Semantic Web Conference (ESWC2007), volume 4519 of LNCS, p. 624–639, Berlin Heidelberg, Germany: Springer-Verlag.