Lecture 3: Social Web Data Formats (2012)

Lecture 3 of the Social Web course at the VU University Amsterdam http://semanticweb.cs.vu.nl/socialweb2012/

Transcript of Lecture 3: Social Web Data Formats (2012)

Page 1: Lecture 3: Social Web Data Formats (2012)

Social Web, Lecture III

What does DATA look like on the Social Web?

Lora Aroyo, The Network Institute

VU University Amsterdam

Monday, February 27, 12

Page 2: Lecture 3: Social Web Data Formats (2012)

What do people contribute on the Social Web?

Page 3: Lecture 3: Social Web Data Formats (2012)

History & Nature of Blogs

• Blog = weB LOG = we blog

• evolved from online diary (in the 1980’s)

• the term blog coined in late 1990’s

• one of the first ways people could contribute content on the Web themselves

• Nature: political, technical, art, journalistic, cultural, personal

• Software: WordPress, Blogger, LiveJournal

Page 4: Lecture 3: Social Web Data Formats (2012)

Types of Blogs

• Single- or multi-authored

• Photo-blog, video-blog, audio-blog

• Life (b)log; now micro-lifeblog (Twitter)

• Lifecasting: started in 2007 by Justin Kan, who wore a webcam on a cap

• Gordon Bell's MyLifeBits, using the Microsoft SenseCam

http://www.justin.tv/

http://research.microsoft.com/en-us/projects/mylifebits/

Page 5: Lecture 3: Social Web Data Formats (2012)

Question?

Why has microblogging (e.g. Twitter) overtaken more traditional blogs in popularity?

Page 6: Lecture 3: Social Web Data Formats (2012)

Wikis

• "Wiki" is Hawaiian for fast/quick

• "the simplest online database that could possibly work" (Ward Cunningham), 1995

• first wiki software: WikiWikiWeb (the QuickWeb)

http://en.wikipedia.org/wiki/Ward_Cunningham

http://en.wikipedia.org/wiki/WikiWikiWeb

Page 7: Lecture 3: Social Web Data Formats (2012)

Wiki Features

• a website powered by wiki software

• created and maintained collaboratively by multiple users = an ongoing process that constantly changes the site

• not a carefully crafted site for casual visitors

• users can add, modify or delete content

• creating links between pages is easy, which encourages meaningful topic associations between different pages

• Examples: community websites, corporate intranets, knowledge management systems, and note taking

Page 8: Lecture 3: Social Web Data Formats (2012)

Wiki Implementation

• implemented as an application server that runs on one or more web servers

• content is stored in a file system, and changes to the content are stored in a relational database management system

• a commonly used software package is MediaWiki (known from Wikipedia)

• page structure & formatting: simplified markup language (wikitext)

• style & syntax of wikitext vary among wiki implementations (some also allow HTML tags or use WYSIWYG editing)

• Issues: control of editing & changes, trust & security

Page 9: Lecture 3: Social Web Data Formats (2012)

http://en.wikipedia.org/wiki/List_of_wikis

http://www.wikimedia.org/

Page 10: Lecture 3: Social Web Data Formats (2012)

Question?

Blogging and wikis are examples of '(lay) users publishing content'.

What are the requirements to make this publishing effective?

Page 11: Lecture 3: Social Web Data Formats (2012)

User-generated data

Page 12: Lecture 3: Social Web Data Formats (2012)

Exploiting the crowd

• in wiki applications the crowd contributes collective intelligence (textual)

• later other media & resources emerged, e.g. photo, video, music

• crowdsourcing

Page 13: Lecture 3: Social Web Data Formats (2012)

Why crowdsourcing?

• many tasks are tedious and time-consuming

• professional results are not always complete

• professionals (experts) are few & expensive

• professionals do not always know the needs, the language and the perspectives of the users

• people have a wide range of hobbies and detailed knowledge

• people have time

Page 14: Lecture 3: Social Web Data Formats (2012)

Example

• in 1770 Wolfgang von Kempelen built The Turk, a chess-playing "automaton" that was secretly operated by a human

• in 2005 Amazon introduced the Amazon Mechanical Turk

• marketplace for work; people perform tasks computers are lousy at, e.g. identifying items in a photo/video, writing product descriptions, transcribing podcasts

• organized work

• HITs = human intelligence tasks

• require very little time & offer very little compensation

• workers & requesters

Page 15: Lecture 3: Social Web Data Formats (2012)

5 Rules of the New Labor Pool

• The crowd is dispersed and can perform a range of tasks – from the most rote to the highly specialized

• The crowd has a short attention span, so jobs need to be broken into “micro-chunks”

• The crowd is full of specialists

• The crowd produces mostly crap: the amount of talent does not increase, so the challenge is to find and leverage that talent

• The crowd finds the best stuff: it surfaces the best material and corrects errors

By Jeff Howe

Page 16: Lecture 3: Social Web Data Formats (2012)

Question?

Was the $1 million Netflix prize a victory for crowdsourcing?

Page 17: Lecture 3: Social Web Data Formats (2012)

Question?

Crowdsourcing is about exploiting collective effort or collective intelligence.

Which aspects make it much more applicable now than before?

Page 19: Lecture 3: Social Web Data Formats (2012)

Folksonomies

Page 20: Lecture 3: Social Web Data Formats (2012)

Structure on the Web

• In the evolution of the Web, the Semantic Web refers to an approach that adds ‘semantics’ to the Web by naming the terms in a domain

• A specification of such terms is called an ‘ontology’

• For software: ontologies help to effectively use content on the Web (like DB schemas)

Page 21: Lecture 3: Social Web Data Formats (2012)

Folksonomy

• On the social web the user-generated content is organized in light-weight ontologies, i.e. folksonomies

• Community-based semantics = a relationship between Users, Tags & Resources

• user-created, bottom-up classification/categorization of (domain) terms / user-labels, e.g. tags

• tagging = the social process where lay users attach labels to resources (as opposed to annotation by professional experts)
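The user-tag-resource relationship above can be sketched as a list of triples. The following is a minimal illustration, not a reference implementation: the users, tags, and URLs are made up, and the aggregation shown (counting distinct users per tag on a resource) is just one simple choice.

```python
from collections import defaultdict

# Hypothetical tagging events: (user, tag, resource). All names are made up.
taggings = [
    ("alice", "semweb", "http://example.org/foaf-intro"),
    ("bob", "semweb", "http://example.org/foaf-intro"),
    ("bob", "rdf", "http://example.org/foaf-intro"),
    ("carol", "photo", "http://example.org/sunset.jpg"),
    ("alice", "photo", "http://example.org/sunset.jpg"),
    ("alice", "semweb", "http://example.org/foaf-intro"),  # repeated event
]


def tag_support(events):
    """Map (resource, tag) to the set of distinct users who applied that tag."""
    support = defaultdict(set)
    for user, tag, resource in events:
        support[(resource, tag)].add(user)
    return support


def top_tags(events, resource):
    """Return the tags of one resource, most widely used first (ties by name)."""
    counts = {
        tag: len(users)
        for (res, tag), users in tag_support(events).items()
        if res == resource
    }
    return sorted(counts, key=lambda t: (-counts[t], t))


print(top_tags(taggings, "http://example.org/foaf-intro"))
```

Counting distinct users rather than raw events keeps one enthusiastic tagger from dominating the result.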

Page 30: Lecture 3: Social Web Data Formats (2012)

• cleaning messy data

• transforming data from one format to another

• fetching missing data

Page 31: Lecture 3: Social Web Data Formats (2012)

Question?

Folksonomies typically show the relationships between users, tags and resources.

Can you think of ways to aggregate user-tag-resource combinations to get more concise and therefore more meaningful folksonomies?

Page 32: Lecture 3: Social Web Data Formats (2012)

What DATA formats do we have?

Page 33: Lecture 3: Social Web Data Formats (2012)

Vocabularies on the (Social) Web

• to create interfaces or exchange data between applications the software needs to know the terms in the data

• vocabularies define a set of terms in a certain domain, e.g. describing people, relationships, different types of content

Page 34: Lecture 3: Social Web Data Formats (2012)

FOAF

• FOAF = Friend of a Friend

• a machine-readable ontology describing persons, their activities & their relations to other people and objects

• an open, decentralized technology for connecting social Web sites, & the people they describe

• http://www.foaf-project.org/

• Create your own FOAF file:

http://www.ldodds.com/foaf/foaf-a-matic

Page 35: Lecture 3: Social Web Data Formats (2012)

FOAF Vocabulary

• Gradual evolution since mid-2000

• Stable core of classes and properties that will not be changed

• New terms may be added at any time

• FOAF RDF namespace URI is fixed

• http://xmlns.com/foaf/spec/

Page 36: Lecture 3: Social Web Data Formats (2012)

FOAF Files

• Text documents that adopt the conventions of RDF and may be written in XML, RDFa or N3

• Contain FOAF vocabulary and other RDF vocabularies

• FOAF defines classes, e.g. foaf:Person, foaf:Document, foaf:Image

• FOAF defines properties of those things, e.g. foaf:name, foaf:mbox (i.e. an internet mailbox), foaf:homepage

• FOAF defines relationships that hold between members of these categories, e.g. foaf:depiction relates something (e.g. a foaf:Person) to a foaf:Image

Page 37: Lecture 3: Social Web Data Formats (2012)

Linked Data & FOAF

• a model for publishing simple factual data via a network of linked RDF documents

• FOAF is an attempt to use the Web to:

  • integrate factual information with information in human-oriented documents (e.g. videos, books, spreadsheets, 3D models)

  • and with information that is still in people's heads

• linking networks of information with networks of people

Page 38: Lecture 3: Social Web Data Formats (2012)

FOAF Example

• there is a foaf:Person

• with a foaf:name property of 'Dan Brickley'

• in foaf:homepage and foaf:openid relationships to a thing called http://danbri.org/

• in foaf:img relationship to a thing referenced by a relative URI of /images/me.jpg

Page 39: Lecture 3: Social Web Data Formats (2012)

FOAF Auto-Discovery

• If you publish a FOAF self-description (e.g. using foaf-a-matic) you can make it easier for tools to find your FOAF by putting markup in the head of your HTML homepage

• foaf.rdf is a common choice of filename
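A rough sketch of how a tool might perform this auto-discovery, using only the Python standard library. It assumes the widely used convention of a `<link>` element with `type="application/rdf+xml"` in the page head; the homepage markup and the example.org URL are placeholders.

```python
from html.parser import HTMLParser


class FoafLinkFinder(HTMLParser):
    """Collect href values of <link> elements that advertise an RDF/XML file."""

    def __init__(self):
        super().__init__()
        self.foaf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        if attr.get("type") == "application/rdf+xml" and attr.get("href"):
            self.foaf_links.append(attr["href"])


# Hypothetical homepage; example.org is a placeholder domain.
page = """<html><head>
<title>My homepage</title>
<link rel="stylesheet" type="text/css" href="style.css" />
<link rel="meta" type="application/rdf+xml" title="FOAF"
      href="http://example.org/foaf.rdf" />
</head><body>Hello</body></html>"""

finder = FoafLinkFinder()
finder.feed(page)
print(finder.foaf_links)
```

A real crawler would then fetch the discovered URL and hand it to an RDF parser; that step is left out here.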

Page 40: Lecture 3: Social Web Data Formats (2012)

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:admin="http://webns.net/mvcb/">
  <foaf:PersonalProfileDocument rdf:about="">
    <foaf:maker rdf:resource="#me"/>
    <foaf:primaryTopic rdf:resource="#me"/>
    <admin:generatorAgent rdf:resource="http://www.ldodds.com/foaf/foaf-a-matic"/>
    <admin:errorReportsTo rdf:resource="mailto:[email protected]"/>
  </foaf:PersonalProfileDocument>
  <foaf:Person rdf:ID="me">
    <foaf:name>Lora Aroyo</foaf:name>
    <foaf:title>Ms</foaf:title>
    <foaf:givenname>Lora</foaf:givenname>
    <foaf:family_name>Aroyo</foaf:family_name>
    <foaf:nick>laroyo</foaf:nick>
    <foaf:mbox_sha1sum>d21e8b414a0533e5b4b23411fd76aabbf63ad232</foaf:mbox_sha1sum>
    <foaf:homepage rdf:resource="http://lora-aroyo.org"/>
    <foaf:depiction rdf:resource="lora.jpg"/>
    <foaf:phone rdf:resource="tel:123456789"/>
    <foaf:workplaceHomepage rdf:resource="http://www.cs.vu.nl/~laroyo"/>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Marieke van Erp</foaf:name>
        <foaf:mbox_sha1sum>f4e16d18528b83fd8b91b603583cbfd8d15f30f2</foaf:mbox_sha1sum>
      </foaf:Person>
    </foaf:knows>
    <foaf:knows>
      <foaf:Person>
        <foaf:name>Dan Brickley</foaf:name>
        <foaf:mbox_sha1sum>748934f32135cfcf6f8c06e253c53442721e15e7</foaf:mbox_sha1sum>
        <rdfs:seeAlso rdf:resource="http://danbri.org/foaf.rdf"/>
      </foaf:Person>
    </foaf:knows>
  </foaf:Person>
</rdf:RDF>
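To get a feel for what a program can do with such a file, here is a minimal sketch that reads a trimmed copy of the profile with Python's standard XML parser. It treats the file as plain XML rather than as RDF, which only works for this particular nesting; a real application would use an RDF library. The helper names `known_people` and `mbox_sha1sum` are made up for this sketch. In the FOAF specification, foaf:mbox_sha1sum is the SHA1 hash of the mailbox URI including the `mailto:` prefix, which is what the second helper computes.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

# Trimmed copy of the profile above, reduced to the parts used here.
FOAF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:ID="me">
    <foaf:name>Lora Aroyo</foaf:name>
    <foaf:knows><foaf:Person>
      <foaf:name>Marieke van Erp</foaf:name>
    </foaf:Person></foaf:knows>
    <foaf:knows><foaf:Person>
      <foaf:name>Dan Brickley</foaf:name>
    </foaf:Person></foaf:knows>
  </foaf:Person>
</rdf:RDF>"""


def known_people(foaf_xml):
    """Return (owner name, names of the people the owner foaf:knows)."""
    root = ET.fromstring(foaf_xml)
    owner = root.find("foaf:Person", NS)
    name = owner.findtext("foaf:name", namespaces=NS)
    friends = [
        person.findtext("foaf:name", namespaces=NS)
        for person in owner.findall("foaf:knows/foaf:Person", NS)
    ]
    return name, friends


def mbox_sha1sum(email):
    """Hash a mailbox the FOAF way: SHA1 over the full mailto: URI."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()


print(known_people(FOAF_XML))
```

Publishing the hash instead of the address lets two profiles be matched on the same mailbox without exposing the email address itself.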

Page 41: Lecture 3: Social Web Data Formats (2012)

foaf:depiction

Page 42: Lecture 3: Social Web Data Formats (2012)

SIOC

• Semantically-Interlinked Online Communities

• a standard way of expressing user-generated content, enabling the integration of online community information

• methods for interconnecting discussions, e.g. blogs, forums & mailing lists

• Semantic Web ontology for representing rich data from the Social Web in RDF

• commonly used in conjunction with the FOAF vocabulary for expressing personal profile and social networking information

• http://sioc-project.org/

Page 43: Lecture 3: Social Web Data Formats (2012)

<sioc:Post rdf:about="http://jbreslin.com/blog/2006/09/07/creating-connections">
  <dc:title>Creating connections between discussion clouds with SIOC</dc:title>
  <dcterms:created>2006-09-07T09:33:30Z</dcterms:created>
  <sioc:has_container rdf:resource="http://jbreslin.com/blog/index.php?sioc_type=site#weblog"/>
  <sioc:has_creator>
    <sioc:UserAccount rdf:about="http://jbreslin.com/blog/author/cloud/" rdfs:label="Cloud">
      <rdfs:seeAlso rdf:resource="http://jbreslin.com/blog/index.php?sioc_type=user&amp;sioc_id=1"/>
    </sioc:UserAccount>
  </sioc:has_creator>
  <foaf:maker rdf:resource="http://jbreslin.com/blog/author/cloud/#foaf"/>
  <sioc:content>SIOC provides a unified vocabulary for content and interaction description: a semantic layer that can co-exist with existing discussion platforms.</sioc:content>
  <sioc:topic rdfs:label="Semantic Web" rdf:resource="http://jbreslin.com/blog/category/semantic-web/"/>
  <sioc:topic rdfs:label="Blogs" rdf:resource="http://jbreslin.com/blog/category/blogs/"/>
  <sioc:has_reply>
    <sioc:Post rdf:about="http://jbreslin.com/blog/2006/09/07/creating-connections/#comment-123928">
      <rdfs:seeAlso rdf:resource="http://johnbreslin.com/blog/index.php?sioc_type=comment&amp;sioc_id=123928"/>
    </sioc:Post>
  </sioc:has_reply>
</sioc:Post>

• A post (1) titled "Creating connections between discussion clouds with SIOC" (2) created at 09:33:30 on 2006-09-07 (3) written by user "Cloud" (4) on topics "Blogs" and "Semantic Web" (5) with contents described in sioc:content.

• (6) More information about its author at http://johnbreslin.com/blog/index.php?sioc_type=user&sioc_id=1

• The post has a (7) reply and (8) detailed SIOC information about this reply can be found at http://johnbreslin.com/blog/index.php?sioc_type=comment&sioc_id=123928
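The reading of the post given above can be reproduced in a few lines of code. This is a sketch under assumptions: the fragment is wrapped in an rdf:RDF element with namespace declarations that the slide does not show (the dc and dcterms URIs below are the usual Dublin Core namespaces, assumed here), only a subset of the post is included, and the file is parsed as plain XML rather than as RDF. The function name `summarize_post` is made up.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions; the slide shows only the prefixed fragment.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "sioc": "http://rdfs.org/sioc/ns#",
}

SIOC_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:sioc="http://rdfs.org/sioc/ns#">
  <sioc:Post rdf:about="http://jbreslin.com/blog/2006/09/07/creating-connections">
    <dc:title>Creating connections between discussion clouds with SIOC</dc:title>
    <dcterms:created>2006-09-07T09:33:30Z</dcterms:created>
    <sioc:has_creator>
      <sioc:UserAccount rdf:about="http://jbreslin.com/blog/author/cloud/"
                        rdfs:label="Cloud"/>
    </sioc:has_creator>
    <sioc:topic rdfs:label="Semantic Web"
                rdf:resource="http://jbreslin.com/blog/category/semantic-web/"/>
    <sioc:topic rdfs:label="Blogs"
                rdf:resource="http://jbreslin.com/blog/category/blogs/"/>
  </sioc:Post>
</rdf:RDF>"""

# ElementTree exposes namespaced attributes as "{namespace-uri}localname".
RDFS_LABEL = "{%s}label" % NS["rdfs"]


def summarize_post(sioc_xml):
    """Extract title, creation time, creator label and topic labels."""
    post = ET.fromstring(sioc_xml).find("sioc:Post", NS)
    creator = post.find("sioc:has_creator/sioc:UserAccount", NS)
    return {
        "title": post.findtext("dc:title", namespaces=NS),
        "created": post.findtext("dcterms:created", namespaces=NS),
        "creator": creator.get(RDFS_LABEL),
        "topics": sorted(t.get(RDFS_LABEL) for t in post.findall("sioc:topic", NS)),
    }


print(summarize_post(SIOC_XML))
```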

Page 44: Lecture 3: Social Web Data Formats (2012)

SIOC

• http://rdfs.org/sioc/ns# - SIOC Core Ontology Namespace

• http://rdfs.org/sioc/access# - SIOC Access Ontology Module Namespace

• http://rdfs.org/sioc/types# - SIOC Types Ontology Module Namespace

• http://rdfs.org/sioc/services# - SIOC Services Ontology Module Namespace

Page 46: Lecture 3: Social Web Data Formats (2012)

Activity Streams

• A list of recent activities performed by someone on a website

• Example: Facebook News Feed

• The Activity Streams project aims to develop an activity stream protocol to syndicate activities across social Web applications

• Major websites with activity stream implementations have already opened up their activity streams to developers to use, e.g. Facebook and MySpace

• http://activitystrea.ms/

Page 47: Lecture 3: Social Web Data Formats (2012)

Activity Streams Specification

• an actor, a verb, an object and a target

• person performing an action on/with an object

• Geraldine posted a photo to her album

• John shared a video

• activity metadata to present to a user in a rich human-friendly format, e.g. constructing readable sentences about the activity that occurred, visual representations of the activity, or combining similar activities for display

• Activities are serialized using the JSON format

• There is also an ATOM-oriented specification
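The actor/verb/object/target structure can be illustrated with a small JSON activity. This is a sketch, not an excerpt from the specification: the four top-level field names follow the list above, while the identifiers, timestamp, object types, and display names are placeholders chosen so that the activity reads as the "Geraldine" example. The helper `to_sentence` and its verb table are made up.

```python
import json

# Hypothetical activity; ids, URLs and the timestamp are placeholders.
activity = {
    "published": "2012-02-27T10:00:00Z",
    "actor": {
        "objectType": "person",
        "id": "tag:example.org,2012:geraldine",
        "displayName": "Geraldine",
    },
    "verb": "post",
    "object": {
        "objectType": "image",
        "url": "http://example.org/album/photo-1.jpg",
        "displayName": "a photo",
    },
    "target": {
        "objectType": "collection",
        "url": "http://example.org/album/",
        "displayName": "her album",
    },
}

# Tiny verb table for rendering; a real application would need a fuller mapping.
PAST_TENSE = {"post": "posted", "share": "shared"}


def to_sentence(act):
    """Render an activity as a human-readable sentence."""
    actor = act["actor"]["displayName"]
    verb = PAST_TENSE.get(act["verb"], act["verb"])
    obj = act["object"].get("displayName", "something")
    sentence = f"{actor} {verb} {obj}"
    target = act.get("target")
    if target:  # the target is optional, e.g. "John shared a video"
        sentence += f" to {target['displayName']}"
    return sentence


serialized = json.dumps(activity, indent=2)   # what would be sent over the wire
print(to_sentence(json.loads(serialized)))
```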

Page 48: Lecture 3: Social Web Data Formats (2012)

Activity Streams Example

http://activitystrea.ms/specs/json/1.0/

Page 51: Lecture 3: Social Web Data Formats (2012)

Verbs, Objects, Mapping

http://wiki.activitystrea.ms/w/page/1359319/Verb%20Mapping

Page 52: Lecture 3: Social Web Data Formats (2012)

XFN

• XHTML Friends Network

• relationships between individuals: by defining a small set of values that describe personal relationships

• In HTML and XHTML documents, these are given as values for the rel attribute on a hyperlink. XFN allows authors to indicate which of the weblogs they read belong to friends, whom they've physically met, and other personal relationships. Using XFN values, which can be listed in any order, people can humanize their blogrolls and links pages, both of which have become a common feature of weblogs.

• with XFN, all links of a particular type can easily be styled; thus, friends could be boldfaced, co-workers italicized, etc.

• http://gmpg.org/xfn/

Page 53: Lecture 3: Social Web Data Formats (2012)

XFN Example

• Joe has a set of five links in his blogroll: his girlfriend Jane; his friends Dave and Darryl; industry expert James, who Joe briefly met once at a conference; and MetaFilter.

• MetaFilter gets no value since it is not an actual person

http://gmpg.org/xfn/intro
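Joe's blogroll could be marked up and read back roughly as follows. The markup is a sketch: the rel values (sweetheart, date, friend, met) are standard XFN values, but which ones Joe picks for each person, as well as the example.org URLs, are assumptions. The parser uses only the Python standard library.

```python
from html.parser import HTMLParser


class XfnParser(HTMLParser):
    """Collect the rel values of every hyperlink that carries a rel attribute."""

    def __init__(self):
        super().__init__()
        self.relations = {}  # href -> set of rel values

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href, rel = attr.get("href"), attr.get("rel")
        if href and rel:
            # rel is a space-separated list; the order carries no meaning.
            self.relations[href] = set(rel.split())


# Hypothetical blogroll; all URLs except MetaFilter's are placeholders.
blogroll = """<ul>
<li><a href="http://jane-blog.example.org/" rel="sweetheart date met">Jane</a></li>
<li><a href="http://dave-blog.example.org/" rel="friend met">Dave</a></li>
<li><a href="http://darryl-blog.example.org/" rel="friend met">Darryl</a></li>
<li><a href="http://www.metafilter.com/">MetaFilter</a></li>
<li><a href="http://james-blog.example.org/" rel="met">James</a></li>
</ul>"""

parser = XfnParser()
parser.feed(blogroll)
for href, rels in parser.relations.items():
    print(href, sorted(rels))
```

Each (page, rel value, linked page) combination is an edge in a social graph, which is what makes XFN data easy to visualise.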

Page 54: Lecture 3: Social Web Data Formats (2012)

5 people who’ve met

http://gmpg.org/xfn/intro

friends vs. acquaintances

love vs. family

colleagues vs. co-workers

Page 55: Lecture 3: Social Web Data Formats (2012)

Open Graph

• protocol originally developed at Facebook

• enables any web page to become a rich object in a social graph, i.e. to have the same functionality as any other object on Facebook

• Basic Metadata: to turn your web pages into graph objects

• og:title = title of your object, e.g. "The Rock"

• og:type = type of your object, e.g. "video.movie"

• og:image = image URL to represent your object within the graph

• og:url = canonical URL of your object that will be used as its permanent ID in the graph, e.g. "http://www.imdb.com/title/tt0117500/"
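The four basic properties are placed in `<meta>` elements in the page head. The sketch below shows such a page and a small standard-library parser that collects the og: properties. The title, type, and URL values come from the list above; the image URL is a placeholder.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            self.properties[prop] = attr["content"]


# Example page; the image URL is a placeholder.
page = """<html prefix="og: http://ogp.me/ns#"><head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="video.movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://example.org/rock.jpg" />
<meta name="description" content="not an og: property, so it is skipped" />
</head><body></body></html>"""

og = OpenGraphParser()
og.feed(page)
print(og.properties)
```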

Page 56: Lecture 3: Social Web Data Formats (2012)

OGP Explained

• “Like” button on each of your posts

• Open Graph Protocol (OGP) to mark up content

• prefix="og: http://ogp.me/ns#" specifies the OGP vocabulary

Page 57: Lecture 3: Social Web Data Formats (2012)

OGP Explained

1. import the Dublin Core & Open Graph Protocol vocabularies using the prefix attribute

2. associate the prefixes dc and og with the URL of each vocabulary

3. use dc:creator and og:title, which are short-hand for the full vocabulary term URLs http://purl.org/dc/terms/creator and http://ogp.me/ns#title, respectively
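The short-hand expansion in step 3 is plain string concatenation, as this small sketch shows. It assumes the Dublin Core terms namespace http://purl.org/dc/terms/ for the dc prefix; the function names `parse_prefixes` and `expand` are made up for illustration.

```python
def parse_prefixes(prefix_attr):
    """Turn an RDFa prefix attribute value into a {prefix: namespace URL} dict."""
    tokens = prefix_attr.split()
    # The value alternates "name:" and URL tokens.
    return {
        tokens[i].rstrip(":"): tokens[i + 1]
        for i in range(0, len(tokens) - 1, 2)
    }


def expand(term, prefixes):
    """Expand a short-hand term such as "og:title" to its full URL."""
    prefix, _, local_name = term.partition(":")
    return prefixes[prefix] + local_name


# Assumed prefix declaration, as it would appear in prefix="...".
prefixes = parse_prefixes("dc: http://purl.org/dc/terms/ og: http://ogp.me/ns#")

print(expand("dc:creator", prefixes))
print(expand("og:title", prefixes))
```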

Page 59: Lecture 3: Social Web Data Formats (2012)

RDFa

• another syntax for RDF

• embedded in HTML, e.g. specify that a text is the name of a product = “adding semantic markup”.

• initially specified only for XHTML

• RDFa 1.1 = specified for XHTML and HTML5 (and for any XML-based language, e.g. SVG)

• RDFa Lite = “a small subset of RDFa consisting of a few attributes that may be applied to most simple to moderate structured data markup tasks.”

• Publish your data as Linked Data through RDFa --> link to other URIs (others can link to your HTML+RDFa)

Page 60: Lecture 3: Social Web Data Formats (2012)

Why RDFa?

• data can be easily shared & reused (no need to maintain the raw structured data in a separate file in a separate format)

• RDFa processors can easily extract all the structured data from a webpage

• search engines

• Yahoo was a pioneer in this area, starting with SearchMonkey

• Google started with Rich Snippets

• Recently, Google, Yahoo, Bing --> Schema.org

• recommendation for publishers on how to semantically mark up their webpages

• Google Recipe = what can be done with structured data on the web

Page 61: Lecture 3: Social Web Data Formats (2012)

Microformats

• a set of simple, open data formats built upon existing and widely adopted standards

• Designed for humans first and machines second

• Design principles for formats

• Highly correlated with semantic XHTML (aka the real world semantics, lowercase semantic web, lossless XHTML)

• “An evolutionary revolution”

Page 62: Lecture 3: Social Web Data Formats (2012)

Microformats

Page 63: Lecture 3: Social Web Data Formats (2012)

Your first microformat

• You can put a microformat on your website in less than 5 mins

• Example: putting an hCard (online business card) on your site

http://microformats.org/get-started

1. Find your name somewhere on your website

2. Wrap your name in an fn (formatted name):

<span class="fn">Jamie Jones</span>

3. Wrap it all in a vcard (declares that everything inside is the hCard microformat):

<span class="vcard"><span class="fn">Jamie Jones</span></span>

<address class="vcard"><span class="fn">Jamie Jones</span></address>

The address element indicates that the person in the hCard is the contact for the page

<p class="vcard">My name is <span class="fn">Jamie Jones</span> I dig microformats!</p>
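Because an hCard is just class names on ordinary HTML, a consumer only has to look for an fn element inside a vcard element. The following is a minimal sketch with the Python standard library. It assumes well-formed markup without void elements such as `<br>` inside the card, and it extracts only the fn property. The second name in the sample is made up to show that an fn outside a vcard is ignored.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Extract the formatted name (fn) of every hCard (vcard) on a page."""

    def __init__(self):
        super().__init__()
        self.stack = []        # class sets of the currently open elements
        self.names = []
        self._fn_depth = None  # stack depth of the fn element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.stack.append(classes)
        in_vcard = any("vcard" in c for c in self.stack)
        if "fn" in classes and in_vcard and self._fn_depth is None:
            self._fn_depth = len(self.stack)
            self._text = []

    def handle_data(self, data):
        if self._fn_depth is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._fn_depth is not None and len(self.stack) == self._fn_depth:
            self.names.append("".join(self._text).strip())
            self._fn_depth = None
        if self.stack:
            self.stack.pop()


page = (
    '<p class="vcard">My name is <span class="fn">Jamie Jones</span> '
    "I dig microformats!</p>"
    '<span class="fn">Not inside a vcard</span>'
)

cards = HCardParser()
cards.feed(page)
print(cards.names)
```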

Page 64: Lecture 3: Social Web Data Formats (2012)

Further microformats

• Add more information to your hCard

• Link to your friends and contacts with XFN

• Add events to your site with hCalendar

• Review movies, books, and more with hReview

http://microformats.org/get-started

Page 65: Lecture 3: Social Web Data Formats (2012)

HTML Microdata

• HTML Microdata allows machine-readable data to be embedded in HTML documents in an easy-to-write manner, with an unambiguous parsing model

• It is compatible with numerous other data formats including RDF and JSON

• Microdata DOM API

• http://www.w3.org/TR/microdata/

Page 66: Lecture 3: Social Web Data Formats (2012)

Microdata Syntax

• Microdata consists of groups of name-value pairs. The groups are called items, and each name-value pair is a property

• itemscope is used to create an item

• itemprop is used to add a property to an item
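A short sketch of how itemscope and itemprop turn markup into name-value pairs. The parser below handles only a small subset of microdata: flat items that are not nested, and property elements containing plain text. It takes a property's value from the href, src, datetime, or content attribute for a, img, time, and meta elements when that attribute is present, and from the text content otherwise. The sample markup and the example.org URL are made up.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Very small subset of microdata: flat items, plain-text properties."""

    # Elements whose property value comes from an attribute, not the text.
    VALUE_ATTRS = {"a": "href", "img": "src", "time": "datetime", "meta": "content"}

    def __init__(self):
        super().__init__()
        self.items = []    # one dict of properties per itemscope
        self._prop = None  # name of the property whose text is being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "itemscope" in attr:
            self.items.append({})
        prop = attr.get("itemprop")
        if prop and self.items:
            value_attr = self.VALUE_ATTRS.get(tag)
            if value_attr and attr.get(value_attr):
                self.items[-1][prop] = attr[value_attr]
            else:
                self._prop, self._text = prop, []

    def handle_data(self, data):
        if self._prop:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._prop:
            self.items[-1][self._prop] = "".join(self._text).strip()
            self._prop = None


# Hypothetical item with three properties: a name, a URL and a time.
page = """<div itemscope>
  <p>My name is <span itemprop="name">Elizabeth</span>.</p>
  <p>Home page: <a itemprop="url" href="http://example.org/liz">link</a></p>
  <p>Born <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time></p>
</div>"""

md = MicrodataParser()
md.feed(page)
print(md.items)
```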

Page 67: Lecture 3: Social Web Data Formats (2012)

Microdata Example

[Figure: annotated markup of a top-level item with 3 properties, including a URL and a time value]

Page 68: Lecture 3: Social Web Data Formats (2012)

Question?

We have seen many approaches to 'organizing' embedded semantics, e.g. RDFa, Microformats, schema.org.

All these are driven by different parties and motives. How do you think this is best organized?

Page 69: Lecture 3: Social Web Data Formats (2012)

Question?

For which things on the social web would more vocabularies for embedded semantics be needed (besides what we have already seen)?

Page 70: Lecture 3: Social Web Data Formats (2012)

Hands-on Teaser

• mining data in various social web formats

• see the differences in what each of the formats can contain & what purpose they serve

• start: simple search where we pull in some XFN data and visualise a graph of people that we find on a website

• check: software you will be working with on the website

image source: http://www.flickr.com/photos/bionicteaching/1375254387/
