The Semantic Web in use: Analyzing FOAF Documents


Page 1: The Semantic Web in use: Analyzing FOAF Documents


Li Ding, Lina Zhou, Tim Finin and Anupam Joshi

University of Maryland, Baltimore County

DARPA contract F30602-00-0591 and NSF awards ITR-IIS-0326460 and ITR-IIS-0325464 provided partial research support for this work.

Page 2: Outline

- Motivation
- Introduction
  - The six popular ontologies
  - FOAF vocabulary
  - Why FOAF
- Building a FOAF document collection
  - FOAF document identification
  - FOAF document discovery
  - Popular properties of foaf:Person
- Applications
  - Personal information fusion
  - Social network analysis

Page 3: The Semantic Web

The Semantic Web vision is that information and services are described using shared ontologies in KR-like markup languages, making them accessible to machines (programs).

How do we get there?
- What kind of ontologies? IEEE SUO? Cyc?
- What kind of languages? RDF? OWL? RuleML?

It’s reasonable to start with the simple and move toward the complex:
- from Dublin Core to Cyc
- from RDF to OWL and beyond

Significant Semantic Web content exists today, using simple vocabularies (e.g., FOAF) and RDF/RDFS.

Page 4: The Semantic Web

The more important word in “Semantic Web” is the latter. The KR aspects of the Semantic Web were taken off the shelf, the result of 25 years of research done in the AI community.

Remember hypertext? It was a nice research backwater going back decades (recall Memex and Xanadu). Hypertext was forever changed by the Web, so maybe the Web will forever change KR.

TBL: “The Semantic Web will globalize KR, just as the WWW globalized hypertext.”

Page 5: Web of what?

What features does the Web bring to the table?
- “Anyone can say anything about anything.”
- The meaning of RDF terms will be (partly) determined socially.
- It’s a web of documents, services, agents and people.

Page 6: What kind of ontologies?

[Figure: a spectrum of ontologies, after Deborah L. McGuinness (Stanford), running from vocabularies and simple ontologies through taxonomies to expressive ontologies. Points along it: catalog/ID, terms/glossary, thesauri (“narrower term” relation), informal is-a, formal is-a, formal instance, frames (properties), value restrictions, disjointness/inverse/part-of, and general logical constraints. Examples placed on it: WordNet, UMLS, DB schemas, OO, RDF, RDFS, DAML, OWL, Cyc and IEEE SUO.]

Page 7: The Semantic Web today

There are several simple RDF vocabularies that are widely used today: Dublin Core, RSS and FOAF. It’s instructive to study how these are being used today, and to track how their usage changes.

Page 8: The six most popular ontologies

RDF, DC, RSS, FOAF, RDFS, MCVB

These statistics were generated by Swoogle (http://swoogle.umbc.edu).

Page 9: A use case: FOAF

FOAF (Friend of a Friend) is a simple ontology to describe people and their social networks. See the FOAF project page: http://www.foaf-project.org/

We recently crawled the Web and discovered over 1,500,000 valid RDF FOAF files. Most of these are from several blogging systems that encode basic user information in FOAF. See http://apple.cs.umbc.edu/semdis/wob/foaf/

<foaf:Person>
  <foaf:name>Tim Finin</foaf:name>
  <foaf:mbox_sha1sum>2410…37262c252e</foaf:mbox_sha1sum>
  <foaf:homepage rdf:resource="http://umbc.edu/~finin/" />
  <foaf:img rdf:resource="http://umbc.edu/~finin/images/passport.gif" />
</foaf:Person>
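The foaf:mbox_sha1sum value in the example above lets a person publish a stable mailbox identifier without exposing the address itself. As a minimal sketch, assuming the FOAF specification's definition (the hex-encoded SHA1 of the full mailto: URI), it can be computed with Python's standard hashlib. The function name and the address below are hypothetical placeholders, not Tim Finin's real mailbox.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """Return a foaf:mbox_sha1sum value for an e-mail address.

    FOAF hashes the whole mailto: URI, so the scheme is prepended if missing.
    No case folding or other normalization is applied in this sketch.
    """
    uri = email if email.startswith("mailto:") else "mailto:" + email
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical address, used only for illustration.
    print(mbox_sha1sum("someone@example.org"))
```

Any two documents that hash the same mailbox produce the same 40-character string, which is what makes the value usable as a shared identifier for a person.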

Page 10: FOAF vocabulary

http://xmlns.com/foaf/0.1/

Page 11: FOAF: why RDF? Extensibility!

- The FOAF vocabulary provides 50+ basic terms for making simple claims about people.
- FOAF files can use other RDF terms too: RSS, MusicBrainz, Dublin Core, WordNet, Creative Commons, blood types, star signs, …
- RDF guarantees freedom of independent extension; OWL provides fancier data-merging facilities.

Result: freedom to say what you like, using any RDF markup you want, and have RDF crawlers merge your FOAF documents with others’ and know when you’re talking about the same entities.

After Dan Brickley

Page 12: No free lunch!

Consequences:
- We must plan for lies, mischief, mistakes, stale data and slander.
- The dataset is out of control, distributed and dynamic.
- It is important to know who said what:
  - Anyone can describe anyone.
  - We must record data provenance.
  - Modeling and reasoning about trust is critical.
- Legal, privacy and etiquette issues emerge.

Welcome to the real world.

After Dan Brickley
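Recording who said what can be as simple as keeping, for each triple, the set of documents that asserted it. The sketch below is a hypothetical in-memory store (the class name ProvenanceStore and the example URLs are made up for illustration); it is not the authors' system, and a production setup would more likely use named graphs or quads in an RDF store.

```python
from collections import defaultdict
from typing import Iterable

Triple = tuple[str, str, str]


class ProvenanceStore:
    """Keep every asserted triple together with the documents that asserted it."""

    def __init__(self) -> None:
        # (subject, predicate, object) -> set of source document URLs
        self._sources = defaultdict(set)

    def add(self, source: str, triples: Iterable[Triple]) -> None:
        """Record that the document at `source` asserted each of `triples`."""
        for triple in triples:
            self._sources[triple].add(source)

    def who_said(self, s: str, p: str, o: str) -> list[str]:
        """Return the sorted list of documents asserting (s, p, o)."""
        return sorted(self._sources.get((s, p, o), set()))


if __name__ == "__main__":
    store = ProvenanceStore()
    claim = ("ex:tim", "foaf:knows", "ex:anupam")
    store.add("http://example.org/a.rdf", [claim])
    store.add("http://example.org/b.rdf", [claim])
    print(store.who_said(*claim))
```

Keeping the source next to each claim means a later trust decision can discount or drop everything a particular document said.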

Page 13: FOAF example using XML

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Tim Finin</foaf:name>
    <foaf:mbox rdf:resource="mailto:[email protected]"/>
  </foaf:Person>
</rdf:RDF>
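To see what a consumer has to pull out of a document like this one, here is a deliberately narrow sketch using Python's standard xml.etree.ElementTree. It assumes the simple layout shown on this slide (foaf:Person elements whose children carry either a literal or an rdf:resource attribute) and is not a general RDF/XML parser; a real crawler would use a full RDF library such as rdflib. The function name is hypothetical and the mailbox is a placeholder.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"

SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Tim Finin</foaf:name>
    <foaf:mbox rdf:resource="mailto:someone@example.org"/>
  </foaf:Person>
</rdf:RDF>"""


def extract_persons(xml_text: str) -> list[dict[str, list[str]]]:
    """Return one {property local name: [values]} dict per foaf:Person element."""
    root = ET.fromstring(xml_text)
    persons = []
    for node in root.iter(f"{{{FOAF_NS}}}Person"):
        props: dict[str, list[str]] = {}
        for child in node:
            if len(child):
                # Skip container properties such as foaf:knows; root.iter()
                # visits any nested foaf:Person as its own entry.
                continue
            name = child.tag.rsplit("}", 1)[-1]
            # Prefer rdf:resource (a URI value); otherwise use the literal text.
            value = child.get(f"{{{RDF_NS}}}resource", (child.text or "").strip())
            props.setdefault(name, []).append(value)
        persons.append(props)
    return persons


if __name__ == "__main__":
    print(extract_persons(SAMPLE))
```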

Page 14: FOAF example using XML

<foaf:Person>
  <foaf:name>Tim Finin</foaf:name>
  <foaf:mbox rdf:resource="mailto:[email protected]"/>
  <foaf:nick>Tim</foaf:nick>
  <foaf:homepage rdf:resource="http://umbc.edu/~finin/"/>
  <foaf:img rdf:resource="http://umbc.edu/~finin/passport.gif"/>
</foaf:Person>

Page 15: FOAF example using XML

<foaf:Person>
  <foaf:name>Tim Finin</foaf:name>
  <foaf:knows>
    <foaf:Person>
      <foaf:name>Anupam Joshi</foaf:name>
      <rdfs:seeAlso rdf:resource="http://umbc.edu/~joshi/joshi.foaf"/>
    </foaf:Person>
  </foaf:knows>
</foaf:Person>

Page 16: FOAF isn’t the only one

Other ontologies are also used to publish social information: Swoogle finds more than 360 RDFS or OWL classes with the local name “person.”

Page 17: Lots of FOAF tools

Page 18: Why FOAF?

Information creators:
- Community membership management
- Unique person identification (privacy preserved)
- Indicating authorship

Information consumers:
- Provenance tracking
- Social networking
- Exposing community information to newcomers
- Matching interests
- A building block for trust

Page 19: Studying how FOAF is being used

- What counts as a FOAF document?
- How can we find FOAF documents?

Page 20: Identifying a FOAF document

1. D is an RDF document.
2. D uses the FOAF namespace.
3. The RDF graph serialized by D contains the sub-graph
   X rdf:type foaf:Person
   X foaf:Y Z
   i.e., some resource X is typed as a foaf:Person and has at least one FOAF property foaf:Y with value Z.
4. D defines one and only one foaf:Person instance.

D is a generic FOAF document when conditions 1, 2 and 3 are met; D is a strict FOAF document when conditions 1, 2, 3 and 4 are met.
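The identification rules above can be sketched as a check over the triples a document serializes. This is a minimal illustration under stated assumptions: the document has already been parsed into (subject, predicate, object) strings with full URIs, so conditions 1 and 2 are taken as given (condition 2 is also implied by condition 3), and only resources typed directly as foaf:Person are counted. The function name is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
FOAF_PERSON = FOAF_NS + "Person"


def classify_foaf_document(triples):
    """Classify a parsed document as 'strict', 'generic' or 'not FOAF'.

    A strict FOAF document is also a generic one; the more specific label is returned.
    """
    triples = list(triples)
    persons = {s for s, p, o in triples if p == RDF_TYPE and o == FOAF_PERSON}
    # Condition 3: some X has rdf:type foaf:Person and at least one foaf:Y property.
    has_subgraph = any(s in persons and p.startswith(FOAF_NS) for s, p, _ in triples)
    if not has_subgraph:
        return "not FOAF"
    # Condition 4: exactly one foaf:Person instance in the document.
    return "strict" if len(persons) == 1 else "generic"


if __name__ == "__main__":
    me = "http://example.org/me"
    doc = [(me, RDF_TYPE, FOAF_PERSON), (me, FOAF_NS + "name", "Tim")]
    print(classify_foaf_document(doc))
```

Typical blog-hosted documents, which describe the owner plus every friend as separate foaf:Person instances, would come out as generic rather than strict under this test.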

Page 21: Different FOAF collections

- DS-Swoogle: FOAF documents selected from Swoogle’s database of ~340K Semantic Web documents. Swoogle selects at most 1,000 documents from any site.
- DS-FOAF: a custom crawler found 1.5M FOAF documents, most from a few large blog sites (e.g., LiveJournal).
- DS-FOAF-Small: a subset of ~7K non-blog FOAF documents from ~1K sites, defining ~37K people.
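The per-site cap used for DS-Swoogle can be illustrated with a small filter. This is a sketch under assumptions of our own: a "site" is taken to be the URL's host name, and documents are kept in the order they were discovered; Swoogle's actual selection rule may differ. The function name is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def cap_per_site(urls, max_per_site=1000):
    """Keep at most `max_per_site` URLs per host, preserving input order."""
    taken = Counter()
    kept = []
    for url in urls:
        host = urlsplit(url).hostname  # assumption: one site == one host name
        if taken[host] < max_per_site:
            taken[host] += 1
            kept.append(url)
    return kept


if __name__ == "__main__":
    urls = ["http://a.org/1.rdf", "http://a.org/2.rdf", "http://a.org/3.rdf", "http://b.org/x.rdf"]
    print(cap_per_site(urls, max_per_site=2))
```

A cap like this keeps one very large site from dominating a sample, which is the same concern that motivates separating blog and non-blog documents in DS-FOAF-Small.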

Page 22: FOAF document discovery

- Bootstrap: using a web search engine (got 10,000 documents)
- Discovery: using rdfs:seeAlso semantics, i.e., following rdfs:seeAlso links (got 1.5M documents)

[Figure: top 7 FOAF websites]
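The two-phase discovery described here can be sketched as a breadth-first crawl. In this sketch the seeds stand in for the search-engine results, and `see_also` is a hypothetical callback that fetches a document and returns its rdfs:seeAlso targets; fetching, parsing, politeness rules and error handling are all left out, and the `limit` parameter is an illustrative addition.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def discover(seeds: Iterable[str],
             see_also: Callable[[str], Iterable[str]],
             limit: Optional[int] = None) -> list[str]:
    """Breadth-first discovery of documents by following rdfs:seeAlso links."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(seeds)
    queue = deque(seeds)
    visited = []
    while queue and (limit is None or len(visited) < limit):
        url = queue.popleft()
        visited.append(url)
        for target in see_also(url):
            if target not in seen:  # each document is queued at most once
                seen.add(target)
                queue.append(target)
    return visited


if __name__ == "__main__":
    # Hypothetical link structure standing in for fetched documents.
    links = {"a.rdf": ["b.rdf", "c.rdf"], "b.rdf": ["c.rdf", "d.rdf"], "d.rdf": ["a.rdf"]}
    print(discover(["a.rdf"], lambda u: links.get(u, [])))
```

Because blog hosts link each user's FOAF file to the files of their friends, a small seed set can expand into a very large collection this way.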

Page 23: From DS-Swoogle

- 17 Semantic Web documents (SWDs) add to the definition of foaf:Person, e.g., by defining superclasses, disjointness, etc.
- 162 properties are defined for foaf:Person, i.e., properties whose domain is foaf:Person.
- 74 properties are defined as relations between people, i.e., properties with both domain and range of foaf:Person.
- 582 properties are used, i.e., used to assert something of a foaf:Person instance.

Page 24: Popular properties of foaf:Person

Top 10 popular properties (per document)

Rank | non-blog (26,936) | liveJournal.com (20,298,073) | DS-FOAF-SMALL* (33,790)
1 | foaf:mbox_sha1sum (0.84) | foaf:mbox_sha1sum (1.0) | foaf:name (0.80)
2 | foaf:homepage (0.66) | dc:description (1.0) | foaf:mbox_sha1sum (0.71)
3 | foaf:name (0.64) | dc:title (1.0) | foaf:nick (0.51)
4 | foaf:nick (0.61) | foaf:nick (1.0) | foaf:homepage (0.40)
5 | foaf:weblog (0.60) | foaf:page (1.0) | foaf:depiction (0.35)
6 | foaf:knows (0.44) | foaf:weblog (0.99) | foaf:weblog (0.30)
7 | foaf:mbox (0.38) | rdfs:seeAlso (0.85) | foaf:knows (0.28)
8 | foaf:img (0.38) | foaf:knows (0.85) | foaf:surname (0.27)
9 | bio:olb (0.35) | foaf:dateOfBirth (0.71) | foaf:firstName (0.26)
10 | rdfs:seeAlso (0.34) | foaf:interest (0.67) | rdfs:seeAlso (0.26)
11 | | | foaf:mbox (0.26)

*DS-FOAF-SMALL is a new dataset (October 2004), based on 7,276 evenly sampled documents.

Page 25: Popular properties of foaf:Person

Top 10 popular properties (per instance)

Rank | non-blog (26,936) | liveJournal.com (20,298,073) | DS-FOAF-SMALL* (33,790)
1 | foaf:name (0.84) | dc:title (1.74) | foaf:name (0.69)
2 | foaf:knows (0.79) | foaf:interest (1.68) | foaf:mbox_sha1sum (0.65)
3 | foaf:homepage (0.63) | foaf:nick (1.04) | rdfs:seeAlso (0.39)
4 | foaf:mbox_sha1sum (0.51) | foaf:weblog (1.00) | foaf:nick (0.26)
5 | rdfs:seeAlso (0.40) | rdfs:seeAlso (0.99) | foaf:homepage (0.18)
6 | dc:title (0.31) | foaf:knows (0.95) | foaf:mbox (0.15)
7 | foaf:nick (0.22) | foaf:page (0.95) | foaf:weblog (0.15)
8 | foaf:weblog (0.18) | dc:description (0.046) | foaf:firstName (0.11)
9 | foaf:mbox (0.15) | foaf:mbox_sha1sum (0.046) | foaf:surname (0.11)
10 | daml:equivalentTo (0.13) | foaf:dateOfBirth (0.046) | foaf:depiction (0.10)
11 | | | foaf:knows (0.07)

*DS-FOAF-SMALL is a new dataset (October 2004), based on 7,276 evenly sampled documents.
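The slides do not spell out how the per-document and per-instance scores are computed. One reading consistent with the numbers (per-document values never exceed 1.0, per-instance values sometimes do) is: per-document = fraction of documents in which the property is used on at least one foaf:Person; per-instance = total uses of the property divided by the number of foaf:Person instances. The sketch below implements that assumed reading; the function name and input shape are hypothetical.

```python
from collections import Counter


def property_popularity(docs):
    """Return (per_document, per_instance) popularity dicts keyed by property.

    `docs` is a non-empty list of documents; each maps a foaf:Person instance id
    to the list of properties asserted on it (repeats allowed).
    """
    doc_hits = Counter()     # number of documents using each property
    occurrences = Counter()  # total number of uses of each property
    n_instances = 0
    for doc in docs:
        n_instances += len(doc)
        used = set()
        for props in doc.values():
            occurrences.update(props)
            used.update(props)
        doc_hits.update(used)
    per_document = {p: n / len(docs) for p, n in doc_hits.items()}
    per_instance = {p: n / n_instances for p, n in occurrences.items()}
    return per_document, per_instance


if __name__ == "__main__":
    docs = [
        {"me": ["foaf:name", "foaf:knows", "foaf:knows"], "f1": ["foaf:name"], "f2": ["foaf:name"]},
        {"me": ["foaf:name", "foaf:mbox_sha1sum"]},
    ]
    print(property_popularity(docs))
```

Under this reading, LiveJournal's foaf:mbox_sha1sum scoring 1.0 per document but 0.046 per instance would mean that each document's owner carries the property while the many friends listed alongside do not.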

Page 26: Extracting social networks

Three steps:
1. Discovering FOAF instances
2. Merging instances representing the same person
3. Linking people via foaf:knows and other FOAF-based relations, e.g., quaffing:drankBeerWith

Integrating other SNA data, e.g., co-author relationships mined from CiteSeer
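Step 3 can be sketched as collapsing instance-level links into edges between fused persons. This assumes step 2 has already produced a mapping from each instance id to its fused person (the `group_of` dict here is hypothetical), and it treats prefixed names such as foaf:knows and quaffing:drankBeerWith as plain strings rather than resolving them to URIs.

```python
def person_graph(triples, group_of, relations=("foaf:knows",)):
    """Build directed edges between fused persons from person-to-person triples."""
    edges = set()
    for s, p, o in triples:
        if p in relations and s in group_of and o in group_of:
            a, b = group_of[s], group_of[o]
            if a != b:  # merging can turn a link into a self-loop; drop it
                edges.add((a, b))
    return edges


if __name__ == "__main__":
    triples = [
        ("d1#me", "foaf:knows", "d1#f1"),
        ("d2#me", "quaffing:drankBeerWith", "d2#f"),
        ("d1#me", "foaf:name", "Li"),
    ]
    group_of = {"d1#me": "G1", "d1#f1": "G2", "d2#me": "G2", "d2#f": "G1"}
    print(person_graph(triples, group_of, relations=("foaf:knows", "quaffing:drankBeerWith")))
```

Using a set of edges means repeated statements of the same relationship, from different documents, count once in the resulting network.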

Page 27: Merging instances

- Named instances
- Inverse functional properties
- Sets of nearly inverse functional properties
- OWL constraints
- rdfs:seeAlso
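Merging by inverse functional properties can be sketched with a union-find structure: any two instances that share a value for such a property are treated as the same person, and the grouping is transitive. The sketch uses foaf:mbox and foaf:mbox_sha1sum, which the FOAF vocabulary declares inverse functional; the function name, the input shape and the placeholder hash "h1" are hypothetical, and the other merging cues listed above are not covered.

```python
def merge_instances(instances, ifps=("foaf:mbox", "foaf:mbox_sha1sum")):
    """Group instances sharing a value for any inverse functional property.

    `instances` maps an instance id to {property: [values]}. Returns a sorted
    list of groups (sorted lists of instance ids); each group is one fused person.
    """
    parent = {i: i for i in instances}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # (property, value) -> first instance carrying it
    for inst, props in instances.items():
        for prop in ifps:
            for value in props.get(prop, []):
                key = (prop, value)
                if key in first_seen:
                    parent[find(inst)] = find(first_seen[key])
                else:
                    first_seen[key] = inst

    groups = {}
    for inst in instances:
        groups.setdefault(find(inst), []).append(inst)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    instances = {
        "doc1#me": {"foaf:name": ["Li Ding"], "foaf:mbox_sha1sum": ["h1"]},
        "doc2#p3": {"foaf:nick": ["ding"], "foaf:mbox_sha1sum": ["h1"]},
        "doc3#x": {"foaf:name": ["Someone"]},
    }
    print(merge_instances(instances))
```

The same transitivity is what makes a single wrong value dangerous: one incorrect mailbox in one document is enough to fuse two different people, which is the failure the collision slide warns about.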

Page 28: Collecting personal information

http://www.cs.umbc.edu/~dingli1/foaf.rdf

http://www-2.cs.cmu.edu/People/fgandon/foaf.rdf

Page 29: Caution: collision? Mistake!

http://www.mindswap.org/~katz/2002/11/jordan.foaf

http://www.ilrt.bris.ac.uk/people/cmdjb/webwho.xrdf

Page 30: SNA1: Instances of foaf:Person per document

Zipf’s distribution. Sloppy tail: a few FOAF documents contain thousands of instances.

[Figure: cumulative distribution (log-log); x-axis: # of persons; y-axis: # of FOAF documents]

Page 31: SNA2: Instances of foaf:Person per group

Zipf’s distribution. Sloppy tail: some instances are wrongly fused due to incorrect FOAF documents. A group refers to a fused person.

[Figure: cumulative distribution (log-log); x-axis: group size (# of persons); y-axis: # of groups]

Page 32: Degree analysis

For social networks, the in-degree and out-degree of a person are of interest. They can be used to identify hubs and authorities, or to compute other interesting properties or rankings.

Analyses of most large social networks reveal that in-degree and out-degree follow a power-law or Zipf distribution. We found this to be the case for the social networks induced by FOAF documents.
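Degree analysis starts from simple counts over the person graph. The sketch below computes in- and out-degrees for a directed edge set and a complementary cumulative count (how many nodes have degree of at least k), a common way to plot heavy-tailed data on log-log axes. Whether the slides' "cumulative distribution" plots use exactly this form is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def degrees(nodes, edges):
    """Return (in_degree, out_degree) dicts for a directed graph."""
    in_deg = {n: 0 for n in nodes}
    out_deg = {n: 0 for n in nodes}
    for a, b in edges:
        out_deg[a] += 1
        in_deg[b] += 1
    return in_deg, out_deg


def ccdf(values):
    """Return sorted (k, number of values >= k) pairs."""
    counts = Counter(values)
    running = 0
    points = []
    for k in sorted(counts, reverse=True):
        running += counts[k]
        points.append((k, running))
    return sorted(points)


if __name__ == "__main__":
    in_deg, out_deg = degrees(["A", "B", "C", "D"], {("A", "B"), ("A", "C"), ("D", "B")})
    print(ccdf(out_deg.values()))
```

On a log-log plot a power law shows up as a roughly straight line; points with k = 0 have to be dropped or shifted before taking logarithms.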

Page 33: SNA3: In-degree of groups

Zipf’s distribution. Sharp tail: few FOAF documents have large in-degrees.

[Figure: cumulative distribution (log-log); x-axis: in-degree of group; y-axis: # of groups]

Page 34: SNA4: Out-degree of groups

Zipf’s distribution. Sloppy tail: a few person-directory documents.

[Figure: cumulative distribution (log-log); x-axis: out-degree per group; y-axis: # of groups]

Page 35: SNA5: Patterns of the FOAF network

Four types of group:
- Isolated
- Only in-links (only one in-link: 97%)
- Only out-links
- Both (intermediate)

Basic patterns:
- Singleton: an isolated group
- Star: (only out-links) an active person publishes his or her friends
- Clique: a small group
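The four types of group can be assigned directly from which nodes appear as edge sources and targets. A minimal sketch, assuming the network is given as a node list plus a set of directed (source, target) edges between groups; the function name is hypothetical.

```python
from collections import Counter


def link_patterns(nodes, edges):
    """Label each node 'isolated', 'only in', 'only out' or 'both' and count labels."""
    has_in = {b for _, b in edges}
    has_out = {a for a, _ in edges}
    labels = {}
    for n in nodes:
        if n in has_in and n in has_out:
            labels[n] = "both"
        elif n in has_in:
            labels[n] = "only in"
        elif n in has_out:
            labels[n] = "only out"
        else:
            labels[n] = "isolated"
    return labels, Counter(labels.values())


if __name__ == "__main__":
    edges = {("A", "B"), ("A", "C"), ("D", "B"), ("B", "F")}
    labels, counts = link_patterns(["A", "B", "C", "D", "E", "F"], edges)
    print(counts)
```

In this toy graph A and D are small stars (only out-links), E is a singleton, and B is an intermediate node that both receives and publishes links.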

Page 36: SNA6: Size of components

Zipf’s distribution. Sloppy head: singletons. Sloppy tail: blog websites (e.g., www.livejournal.com).

[Figure: cumulative distribution (log-log); x-axis: # of groups per connected component; y-axis: # of connected components]
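Component sizes can be computed by ignoring edge direction and running a traversal from every unvisited node. This sketch assumes weakly connected components (the slides do not say whether direction is ignored) and uses the same hypothetical node-list and edge-set representation as a plain adjacency input; singletons appear as components of size 1.

```python
def component_sizes(nodes, edges):
    """Sizes of the weakly connected components, largest first."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:  # treat every directed edge as undirected
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    sizes = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # iterative depth-first search over one component
            n = stack.pop()
            size += 1
            for m in adjacency[n]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    print(component_sizes(["A", "B", "C", "D", "E"], {("A", "B"), ("A", "C"), ("D", "B")}))
```

Feeding the resulting list of sizes into a cumulative count gives the kind of distribution plotted on this slide, with singletons at the head and the large blog-site components in the tail.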

Page 37: SNA7: Growth of the FOAF network

The data suggests that there is a natural evolution for a social network:
(1) disjoint, star-like connected components
(2) link together to form trees and forests,
(3) eventually forming a scale-free network.

Page 38: SNA7: Growth of the FOAF network

[Figure: panels labeled 1, 2 and 3]

Page 39: The map of the FOAF network

[Figure: map of the FOAF network, June 2004, with regions labeled www.livejournal.com, www.ecademy.com, blog.livedoor.jp and non-blog]

Page 40: Conclusions

- The Semantic Web is evolving: there is a growing volume of RDF content.
- FOAF is one of the early successes.
- FOAF data is being used.
- FOAF data is relatively easy to collect and analyze.
- FOAF data is a good source of social network information.

Page 41: Questions?

Demo: http://apple.cs.umbc.edu/semdis

Swoogle: http://swoogle.umbc.edu/

ebiquity group: http://ebiquity.umbc.edu