@twitter Mining #Microblogs Using #Semantic Technologies

Posted on 11-May-2015


Presentation by Selver Softic at the 6th Workshop on Semantic Web Applications and Perspectives (SWAP 2010)

Transcript of @twitter Mining #Microblogs Using #Semantic Technologies

@twitter Mining #Microblogs Using #SemanticTechnologies

Selver Softic, Martin Ebner, Herbert Mühlburger, Thomas Altmann, Behnam Taraghi

Web 2.0: a well-known story

Web 2.0 technologies brought users closer to the Web …
– Wikis, blogs, forums …
– Podcasts, RSS, XML …

… then users started to generate content …

Source: http://mediabistro.com

From Web to Social Web

• Result: a vast amount of information
– Text, pictures, audio, videos …
• Communication, networking, exchange of data
• The Web became more personal
• Cultural, geographical and social borders disappeared

Source: http://www.ignitesocialmedia.com

Social Media Boom!

Social sites are data silos

source: www.pidgintech.com

But still disconnected?

source: www.pidgintech.com

Data is still captured in walled gardens!

Statements

• The Social Web relies on users and the communication among them

• While communicating, users produce or consume content

• Social sites are data silos rich in a variety of information

• This information could be interesting for:
– monitoring of trends, advertising, statistics, reputation, news broadcasting, tagging …

• This data is captured in walled gardens!!!

Questions

• How can this data be used to gain more useful insights?

• What are the advantages of online (and offline) search on such data, and how can it be reached in a uniform way?

• Is it possible to structure, connect and expose the data so that humans and machines can use it more efficiently?

• What would an architecture for this look like?

Social Web Trends

• Microblogging
• Social bookmarking
• Social networking
• Social marketing
• Sharing photos, videos …

Source: http://socialwebresearch.com

Microblogs

• Microblogs
– Used for communication, publishing and information exchange
– Simple to process
– Information generated by many different users
– Social user relations
– Tripartite communication structure
– Variety of information
– No boundaries of culture, location or technology (mobile users)

• Twitter
– Most popular
– Large amount of data
– But limited

According to http://an.kaist.ac.kr/traces/WWW2010.html: 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics and 106 million tweets

Semantic aspects and Twitter

• Twitter
– User relations
– Tweets as short information artefacts
– Communication with a tripartite pattern
– Time-related information

• Vocabularies
– SIOC, FOAF, Dublin Core

Linked Data and Twitter

• Twitter contains information on:
– People, organisations, locations, trends …

• The LOD Cloud contains billions of triples about:
– Geolocations, science, government, common knowledge, persons, news …

• Vocabularies
– MOAT, CommonTag

Architecture model

Acquisition - Grabeeter

Grabeeter

• Search in your tweets
• Filter your tweets by date
• Search in your tweets offline using the Grabeeter Client
• Filter your tweets offline using the Grabeeter Client
• Grabeeter provides an API
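The offline search and date filter listed above can be sketched in a few lines. This is a minimal illustration, not the Grabeeter Client itself. It assumes tweets are stored locally in the attribute-based `<tweet>` XML format that appears on the Triplification Module slide, wrapped in a hypothetical `<tweets>` root element. The function names `load_tweets`, `search` and `filter_by_date` are illustrative only and are not part of Grabeeter's API.

```python
import xml.etree.ElementTree as ET
from datetime import date


def load_tweets(xml_text):
    """Parse every <tweet .../> element into a dict of its attributes."""
    root = ET.fromstring(xml_text)
    return [dict(t.attrib) for t in root.iter("tweet")]


def search(tweets, term):
    """Case-insensitive substring search over the tweet text."""
    term = term.lower()
    return [t for t in tweets if term in t["text"].lower()]


def filter_by_date(tweets, start, end):
    """Keep tweets whose ISO 'created' date lies in [start, end]."""
    return [t for t in tweets
            if start <= date.fromisoformat(t["created"]) <= end]


# Hypothetical local archive: an assumed <tweets> wrapper around the
# single tweet shown on the Triplification Module slide.
ARCHIVE = """<tweets>
  <tweet url="http://grabeeter.tugraz.at/tweet/199272"
         text="Sitting in Prater #vienna, launch party. Nice"
         screen_name="selvers" created="2010-08-19"
         twitterUrl="http://twitter.com/selvers/status/21606926237"/>
</tweets>"""

tweets = load_tweets(ARCHIVE)
hits = filter_by_date(search(tweets, "#vienna"), date(2010, 8, 1), date(2010, 8, 31))
print([t["twitterUrl"] for t in hits])
# ['http://twitter.com/selvers/status/21606926237']
```

Because search and filtering work on plain dicts, the same functions apply whether the tweets came from a live API call or from a locally cached copy.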

Triplification Module

• Author
• Date
• Content
• Receiver

<tweet url="http://grabeeter.tugraz.at/tweet/199272" text="Sitting in Prater #vienna, launch party. Nice" screen_name="selvers" created="2010-08-19" twitterUrl="http://twitter.com/selvers/status/21606926237"/>

Triplifier → RDF Store

Triplification Module

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix sioct: <http://rdfs.org/sioc/types#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://twitter.com/selvers/status/21606926237> rdf:type sioct:MicroblogPost ;
    sioc:content "Sitting in Prater #vienna, launch party. Nice" ;
    sioc:has_creator <http://twitter.com/selvers/> ;
    foaf:maker <http://grabeeter.tugraz.at/foaf/selvers/> ;
    dcterms:created "2010-08-19" ;
    owl:sameAs <http://grabeeter.tugraz.at/tweet/199272> .

<http://twitter.com/selvers/> rdf:type foaf:Person ;
    foaf:name "Selver Softic" ;
    foaf:depiction <http://a0.twimg.com/profile_images/905118560/f9e4b6eba.13070201_3_normal.jpg> ;
    foaf:knows <http://twitter.com/hmuehlburger/> ;
    foaf:knows <http://twitter.com/mhausenblas/> ;
    foaf:knows <http://twitter.com/mebner/> .
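The mapping from the `<tweet>` XML element to SIOC/FOAF/Dublin Core triples can be sketched as a small string-based triplifier. This is a minimal sketch under stated assumptions, not the authors' Triplifier. The `foaf:maker` URI pattern is inferred from the single example on this slide. Mapping the "Receiver" field to `sioc:addressed_to` for each `@mention` is also an assumption. The identity link uses `owl:sameAs`, the standard OWL property for this purpose. A real implementation would more likely use an RDF library than build Turtle by hand.

```python
import re
import xml.etree.ElementTree as ET

PREFIXES = """@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sioc: <http://rdfs.org/sioc/ns#> .
@prefix sioct: <http://rdfs.org/sioc/types#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
"""

# An @mention not preceded by a word character (so "a@b" is not a mention).
MENTION = re.compile(r"(?<!\w)@(\w+)")


def literal(text):
    """Quote a string as a Turtle literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triplify(tweet_xml):
    """Map one <tweet .../> element to Turtle (author, date, content, receivers)."""
    t = ET.fromstring(tweet_xml).attrib
    user = t["screen_name"]
    lines = [
        f"<{t['twitterUrl']}> rdf:type sioct:MicroblogPost ;",
        f"    sioc:content {literal(t['text'])} ;",
        f"    sioc:has_creator <http://twitter.com/{user}/> ;",
        # Assumed URI pattern, inferred from the one example on the slide.
        f"    foaf:maker <http://grabeeter.tugraz.at/foaf/{user}/> ;",
        f"    dcterms:created {literal(t['created'])} ;",
    ]
    # Assumption: @mentions in the text stand in for the "Receiver" field.
    for name in MENTION.findall(t["text"]):
        lines.append(f"    sioc:addressed_to <http://twitter.com/{name}/> ;")
    lines.append(f"    owl:sameAs <{t['url']}> .")
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


TWEET = ('<tweet url="http://grabeeter.tugraz.at/tweet/199272" '
         'text="Sitting in Prater #vienna, launch party. Nice" '
         'screen_name="selvers" created="2010-08-19" '
         'twitterUrl="http://twitter.com/selvers/status/21606926237"/>')
print(triplify(TWEET))
```

The sketch covers only the post itself. The `foaf:Person` description with `foaf:knows` links would come from the user's profile and follower data, which the tweet element does not carry.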

Interlinking Module

• Hashtags (people, organisations, locations)
• MOAT, CommonTag
• Later: NLP-processed content, SILK Framework

SELECT ?post ?content ?maker ?name WHERE {
  ?post rdf:type sioct:MicroblogPost ; foaf:maker ?maker ; sioc:content ?content .
  ?maker foaf:name ?name .
  FILTER(regex(?content, "#vienna"))
}

tag:tagName "vienna" ;
moat:tagMeaning <http://dbpedia.org/resource/Vienna> ;
tag:taggedResource <http://twitter.com/selvers/status/2160692623>
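The hashtag-to-LOD interlinking step can be sketched as: extract hashtags from the post content, look up a meaning URI for each one, and emit a tag description in the same shape as the snippet above. This is a minimal sketch, not the authors' Interlinking Module. The lookup table `TAG_MEANINGS` is hypothetical and stands in for a real LOD lookup (for example a DBpedia search or a SILK linkage rule). Only its `#vienna` entry comes from the slide. The output reuses the slide's `tag:` and `moat:` prefixed names as they appear, without declaring their namespaces, and uses a blank node as the subject.

```python
import re

# A hashtag not preceded by a word character; \w+ stops at punctuation.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")

# Hypothetical lookup table standing in for a real LOD lookup service;
# only the "vienna" entry is taken from the slide.
TAG_MEANINGS = {"vienna": "http://dbpedia.org/resource/Vienna"}


def interlink(post_uri, content, meanings=TAG_MEANINGS):
    """Return one tag description per hashtag that has a known meaning."""
    blocks = []
    for tag in HASHTAG.findall(content):
        name = tag.lower()
        meaning = meanings.get(name)
        if meaning is None:
            continue  # unknown hashtag: left for later NLP processing
        blocks.append(
            f'[] tag:tagName "{name}" ;\n'
            f"   moat:tagMeaning <{meaning}> ;\n"
            f"   tag:taggedResource <{post_uri}> ."
        )
    return blocks


for block in interlink("http://twitter.com/selvers/status/21606926237",
                       "Sitting in Prater #vienna, launch party. Nice"):
    print(block)
```

Lower-casing the hashtag before the lookup lets `#Vienna` and `#vienna` resolve to the same resource. Hashtags without a known meaning are skipped rather than guessed.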

Classifier

Analysis

Conclusions & Outlook

• Current state-of-the-art technologies suffice to realise the proposed architecture

• Interlinking with the LOD Cloud (Tweet-O-Sphere)
• Involving NLP methods
• Sentiment classification
• (Re)tagging of tweets
• Providing a SPARQL endpoint + lookup service as a research interface
• Social Semantic Web apps

Questions?