Semantic Technologies to Support the User-Centric Analysis of Activity Data


Mathieu d’Aquin, Salman Elahi, Enrico Motta

Knowledge Media Institute, The Open University

Consumer/user centric data

Activity Data

[Diagram: an Event (Trace) is an Action realized by an Actor on a Resource]

Usual Web Analytics

[Diagram: a set of Events (Traces), made of Actions realized by groups of Actors on one Resource]

User-centric Activity Data Analysis

[Diagram: a set of Events (Traces), made of Actions realized by one Actor on a set of Resources]

Challenges in user centric activity data

• Activity data that sit in logs are

– Heterogeneous: different models for different sites/systems

– Raw: uninterpreted

– Horribly big: thousands of pieces of information generated every minute

– Hard to exploit, understand, analyze

User Centric Activity Data

[Diagram: Users and an Organisation. Websites 1 to 4 each produce their own logs (Logs 1 to 4). The logs go through consolidation, integration and interpretation, supported by ontologies, to enable activity analysis for and by individual users.]

User support

[Flowchart]

User logs in or registers

Detect setting (agent + IP)

– Known setting for user: display activity data related to all known settings of the user

– Unknown setting: check whether the setting is non-ambiguous

    – Non-ambiguous: "It is the first time you log into UCIAD with this setting (detail). Do you want to attach it to your account?" (yes/no). If yes: add setting to known settings

    – Ambiguous: register setting as ambiguous

User name: mathieu

Password: ******

Your current setting is:

Computer IP: 137.108.2x.1xx

User Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13

This setting is not currently attached to a user, so it will be added to your known settings as you log into the system

PREFIX tr: <http://uciad.info/ontology/trace/>
PREFIX actor: <http://uciad.info/ontology/actor/>

CONSTRUCT {
  ?trace ?p ?x .
  ?x ?p2 ?x2 .
  ?x2 ?p3 ?x3 .
  ?x3 ?p4 ?x4
} WHERE {
  <http://uciad.info/actor/mathieu> actor:knownSetting ?set .
  ?trace tr:hasSetting ?set .
  ?trace ?p ?x .
  ?x ?p2 ?x2 .
  ?x2 ?p3 ?x3 .
  ?x3 ?p4 ?x4
}


for graph http://uciad.info/users/mathieu

Export my data
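The export query above is written for one user, mathieu. A minimal sketch of how the same query could be generated for any user is given below. The function name `build_export_query`, the `depth` parameter and the numbered variable names are illustrative assumptions, not part of the UCIAD code base.

```python
# Hypothetical helper: rebuilds the CONSTRUCT query shown above for any user.
PREFIXES = (
    "PREFIX tr: <http://uciad.info/ontology/trace/>\n"
    "PREFIX actor: <http://uciad.info/ontology/actor/>\n"
)


def build_export_query(user_name: str, depth: int = 4) -> str:
    """Return a CONSTRUCT query following `depth` links out from each trace."""
    if not user_name.isalnum():
        # Refuse anything that could break out of the actor URI.
        raise ValueError("user name must be alphanumeric")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # ?trace ?p1 ?x1 . ?x1 ?p2 ?x2 . ... up to the requested depth
    nodes = ["?trace"] + [f"?x{i}" for i in range(1, depth + 1)]
    chain = " .\n  ".join(
        f"{nodes[i]} ?p{i + 1} {nodes[i + 1]}" for i in range(depth)
    )
    return (
        f"{PREFIXES}"
        "CONSTRUCT {\n"
        f"  {chain}\n"
        "} WHERE {\n"
        f"  <http://uciad.info/actor/{user_name}> actor:knownSetting ?set .\n"
        "  ?trace tr:hasSetting ?set .\n"
        f"  {chain}\n"
        "}"
    )


if __name__ == "__main__":
    print(build_export_query("mathieu"))
```

With the default depth of 4 this produces the same graph pattern as the slide. As in the original query, none of the patterns is optional, so a trace is only exported if a chain of that length exists from it.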

<rdf:RDF>
  <rdf:Description rdf:about="http://uciad.info/trace/kmi-web13/ede2ab38da27695eec1e0b375f9b20da">
    <rdf:type rdf:resource="http://uciad.info/ontology/trace/Trace"/>
    <hasAction rdf:resource="http://uciad.info/action/GET"/>
    <hasPageInvolved rdf:resource="http://uciad.info/page/0b9abc62fcf90afc53797b938af435dd"/>
    <hasResponse rdf:resource="http://uciad.info/response/ea95add1414aba134ff9e0482b921a33"/>
    <hasSetting rdf:resource="http://uciad.info/actorsetting/119696ec92c5acec29397dc7ef98817f"/>
    <hasTime rdf:datatype="http://www.w3.org/2001/XMLSchema#string">13/Jun/2011:01:37:23+0100</hasTime>
  </rdf:Description>
</rdf:RDF>

<rdf:Description rdf:about="http://uciad.info/page/0b9abc62fcf90afc53797b938af435dd">
  <rdf:type rdf:resource="http://uciad.info/ontology/sitemap/WebPage"/>
  <isPartOf rdf:resource="http://uciad.info/ontology/test1/dataopenacuk"/>
  <onServer rdf:resource="http://kmi-web13.open.ac.uk"/>
  <url rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/resource/person/ext-718a372e10788bb58d562a8bf6fb864e</url>
</rdf:Description>

<rdf:Description rdf:about="http://uciad.info/ontology/test1/dataopenacuk">
  <rdf:type rdf:resource="http://uciad.info/ontology/sitemap/Website"/>
  <rdf:type rdf:resource="http://uciad.info/ontology/test1/LinkedDataPlatform"/>
  <onServer rdf:resource="http://kmi-web13.open.ac.uk"/>
  <urlPattern rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/*</urlPattern>
</rdf:Description>

<rdf:Description rdf:about="http://uciad.info/response/ea95add1414aba134ff9e0482b921a33">
  <rdf:type rdf:resource="http://uciad.info/ontology/trace/HTTPResponse"/>
  <hasResponseCode rdf:resource="http://uciad.info/ontology/trace/200"/>
  <hasSizeInBytes rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1085</hasSizeInBytes>
</rdf:Description>

Technical infrastructure

[Diagram: Server1, Server2 and Server3 host applications that write logs. Each log is processed by a Parser/RDF renderer that produces daily RDF traces. A Scheduler/Manager and a Semantic Triple Store sit behind the renderers.]
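A minimal sketch of what a Parser/RDF renderer step might look like is given below. It rests on several assumptions that the slides do not confirm: that the logs are in the Apache "combined" format, that identifiers are MD5 hashes of log fields, and that the trace properties live in the http://uciad.info/ontology/trace/ namespace. The function `line_to_triples` is hypothetical.

```python
import hashlib
import re

# Apache "combined" log format (assumed; the slides do not name the format).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

BASE = "http://uciad.info"
TRACE_NS = BASE + "/ontology/trace/"  # assumed namespace for trace properties
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def md5(text: str) -> str:
    """Hash used to mint identifiers (an assumption, not the project's scheme)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def line_to_triples(line: str, server: str) -> list[tuple[str, str, str]]:
    """Turn one log line into (subject, predicate, object) triples."""
    m = LOG_LINE.match(line)
    if m is None:
        raise ValueError("not a combined-format log line")
    trace = f"{BASE}/trace/{server}/{md5(line)}"
    # A setting is the combination of user agent and IP address.
    setting = f"{BASE}/actorsetting/{md5(m['ip'] + m['agent'])}"
    page = f"{BASE}/page/{md5(server + m['path'])}"
    return [
        (trace, RDF_TYPE, TRACE_NS + "Trace"),
        (trace, TRACE_NS + "hasAction", f"{BASE}/action/{m['method']}"),
        (trace, TRACE_NS + "hasPageInvolved", page),
        (trace, TRACE_NS + "hasSetting", setting),
        (trace, TRACE_NS + "hasTime", m["time"]),
    ]
```

Hashing the agent and IP together means that two requests from the same setting map to the same actorsetting resource, which is what lets traces be grouped by setting later on.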

Ontologies

Formal conceptual models of a domain: online user activity

Key Concepts:

– Actor: the things accessing resources (through agents)

– Resources: Webpages, Websites

– Actions: realized by actors on resources, e.g., requests

– Events: an actor realizing an action on a resource


Authenticated SPARQL

Protected SPARQL endpoint: SPARQL endpoint interface with authentication

1. The client sends, over HTTP + basic auth:
   Query: Select ?x where {?x a uciad:Website}
   Credentials: User=mathieu Pass=mypass

2. Access right info (User -> graphs): mathieu? -> matgraph, onto

3. Query passed to the protected endpoint: Select ?x From matgraph, onto where {?x a uciad:Website}

4. Standard SPARQL results are returned
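The query rewriting step can be sketched as follows. This is only an illustration under stated assumptions: the `ACCESS` table, the function `restrict_query` and the plain-text password check are hypothetical, and the graph names `matgraph` and `onto` are the placeholders used on the slide.

```python
import re

# Hypothetical access-right table: user -> (password, graphs the user may read).
ACCESS = {"mathieu": ("mypass", ["matgraph", "onto"])}


def restrict_query(query: str, user: str, password: str) -> str:
    """Insert a FROM clause for each graph the authenticated user may read."""
    entry = ACCESS.get(user)
    if entry is None or entry[0] != password:
        raise PermissionError("unknown user or wrong password")
    # Do not let the client choose its own graphs.
    if re.search(r"\bfrom\b", query, flags=re.IGNORECASE):
        raise ValueError("queries may not name their own graphs")
    from_clauses = " ".join(f"FROM <{graph}>" for graph in entry[1])
    # Put the FROM clauses just before the first WHERE keyword.
    rewritten, count = re.subn(
        r"\bwhere\b", from_clauses + " WHERE", query, count=1, flags=re.IGNORECASE
    )
    if count == 0:
        raise ValueError("expected an explicit WHERE keyword")
    return rewritten


if __name__ == "__main__":
    print(restrict_query("Select ?x where {?x a uciad:Website}", "mathieu", "mypass"))
```

A regular expression is a crude way to rewrite SPARQL; a real endpoint wrapper would more likely parse the query or pass the allowed graphs through the protocol's default-graph parameters.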

Customizing the Ontologies = Customizing the Analysis

The User Activity Ontologies form the basis to describe generic activity data in a sharable way

Customized extensions:

– Specific categories of resources, actions and events

– Formally defined to allow inference

Create customized aggregations, classifications and distributions in the data that allow for specific analyses

[Diagram: Base Activity Ontologies and User Activity Data feed Inference, which yields Specific Classifications, Distributions, Aggregations…]

Examples

In the ontology:

1. vhs-wiki is a Wiki

2. Data.open.ac.uk is a DataPlatform

3. Actions on a Page which is part-of a Wiki are called usingWiki

4. Similarly for usingDataPlatform

And…

1. Activities usingAWiki with a user-agent which is an RSS-Reader are checkingWikiUpdatesWithRSS

2. Otherwise, they are usingWikiThroughBrowser

[Screenshot captions: Pages involved in usingWikiThroughBrowser; Sub-classes of us…]
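The rules in this example can be mimicked in a few lines of plain code. In UCIAD the classification comes from ontology definitions and inference, so the sketch below is only a stand-in to show the logic. The `Activity` class, the `WEBSITE_TYPE` table and the way an RSS reader is recognised from the user agent are assumptions.

```python
from dataclasses import dataclass

# Hypothetical facts standing in for the ontology assertions.
WEBSITE_TYPE = {"vhs-wiki": "Wiki", "data.open.ac.uk": "DataPlatform"}
RSS_READER_MARKERS = ("feed", "rss")  # assumed way of spotting an RSS reader


@dataclass(frozen=True)
class Activity:
    website: str      # the website the requested page is part of
    user_agent: str   # the agent that issued the request


def classify(activity: Activity) -> list[str]:
    """Return the activity classes, from most general to most specific."""
    site_type = WEBSITE_TYPE.get(activity.website)
    if site_type == "DataPlatform":
        return ["usingDataPlatform"]
    if site_type != "Wiki":
        return []
    agent = activity.user_agent.lower()
    if any(marker in agent for marker in RSS_READER_MARKERS):
        return ["usingWiki", "checkingWikiUpdatesWithRSS"]
    return ["usingWiki", "usingWikiThroughBrowser"]
```

The advantage of stating such rules in the ontology rather than in code is that adding a new category only requires a new class definition, with no change to the analysis pipeline.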

Examples

In the ontology:

1. The page http://data.open.ac.uk/query is a SPARQLEndpoint

2. An action on a SPARQLEndpoint with a query parameter is ExecutingASparqlQuery

3. Pages of the form http://data.open.ac.uk/page/* are DataPages

4. An action on a DataPage with a BrowserAgent is ConsultingADataPage

[Screenshot captions: Sub-classes of usingDataPlatform; Settings used in executingASparqlQuery (the most used is curl on the user’s laptop); Sub-classes of DataPages consulted by the user]
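Once activities are classified, a view such as "settings used in executingASparqlQuery" is a distribution of property values over the members of a class. A minimal sketch of that aggregation is shown below. The sample traces are made up for illustration and are not data from the project, and `setting_distribution` is a hypothetical name.

```python
from collections import Counter

# Made-up traces: (activity class, setting used). Not data from the project.
TRACES = [
    ("ExecutingASparqlQuery", "curl on laptop"),
    ("ExecutingASparqlQuery", "curl on laptop"),
    ("ExecutingASparqlQuery", "Chrome on desktop"),
    ("ConsultingADataPage", "Chrome on desktop"),
]


def setting_distribution(traces, activity_class):
    """Count how often each setting appears among traces of one class."""
    return Counter(
        setting for cls, setting in traces if cls == activity_class
    )


if __name__ == "__main__":
    # Most used setting for SPARQL queries in the sample data.
    print(setting_distribution(TRACES, "ExecutingASparqlQuery").most_common(1))
```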

Browsing Interface: LDI

[Screenshot of the browsing interface, showing:]

– Class

– Sub-classes with distribution of instances

– Properties with distribution of values

– List of members (instances)

– Details of a member (instance)

Conclusion

• The idea of the UCIAD project was to investigate and experiment with the use of semantic technologies for the user centric integration of activity data

• Demonstrated the value of the approach, as well as current technical limitations:

– Scalability

– Flexible access control

– Usability

Future Work/Next Steps

• User studies: what can people do with their activity data? In which form?

• Scenarios for user centric activity data

– Project Danube, Higgins, Mydex, personal.com, … with semantics?

• Licensing user data?

Analyzing Search History

Web search history is known to provide interesting indications of the user’s interests. Using Open-Calais SemanticProxy (http://semanticproxy.opencalais.com/), we detect general themes from the analysis of search keywords, directly pointing to additional resources. We also see patterns emerging from the use of search engines, in terms of navigational and informational searches.

Trust in Domains and Criticality of Data

A simple iterative model is defined to compute the perceived trust in websites (top) and the perceived criticality of personal data (bottom), based on observing the exchange of this data. The simple intuition on which we rely is that a trusted website receives critical data, and that critical data is shared only with a few trusted websites. Exposing this model to the user in an interactive way can help align the perceived behavior with the intended one, and detect possible conflicts between data exchange and personal privacy rules.
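The poster does not give the formulas of this model. The sketch below is one possible reading of the stated intuition, not the authors' definition: criticality of a data item is the mean trust of the sites receiving it divided by the number of such sites, trust in a site is the mean criticality of the data it receives, and both are rescaled to a maximum of 1 at each round. The function name and the sample exchanges are made up.

```python
def trust_and_criticality(exchanges, iterations=20):
    """exchanges maps each website to the set of data items it received."""
    # Invert the mapping: which sites received each data item?
    sites_for = {}
    for site, items in exchanges.items():
        for item in items:
            sites_for.setdefault(item, set()).add(site)

    trust = {site: 1.0 for site in exchanges}
    criticality = {}
    for _ in range(iterations):
        # Critical data goes to few sites, and those sites are trusted.
        for item, sites in sites_for.items():
            mean_trust = sum(trust[s] for s in sites) / len(sites)
            criticality[item] = mean_trust / len(sites)
        top = max(criticality.values(), default=0.0) or 1.0
        criticality = {i: c / top for i, c in criticality.items()}

        # A trusted site is one that receives critical data.
        for site, items in exchanges.items():
            if items:
                trust[site] = sum(criticality[i] for i in items) / len(items)
            else:
                trust[site] = 0.0
        top = max(trust.values(), default=0.0) or 1.0
        trust = {s: t / top for s, t in trust.items()}
    return trust, criticality


if __name__ == "__main__":
    sample = {
        "shop.example": {"email", "card"},
        "news.example": {"email"},
        "ads.example": {"email"},
    }
    print(trust_and_criticality(sample))
```

In the sample, the card number is shared with a single site and ends up more critical than the e-mail address, and the site that receives it ends up more trusted than the other two.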

Basic Analytics

Number of requests per hour of the day (sum). Makes it possible to identify events appearing on a typical day.

Map of the locations of the servers where requests have been sent. Makes it possible to identify the physical space of Web interactions.

Cloud of the most commonly accessed websites. Shows the impact of ‘implicit’ requests.

49 different tools accessing the Web (User-Agent) can be identified, including Web browsers, Twitter clients, e-mail clients, update utilities, social applications, etc.

With more and more services relying on the Web to communicate with their users, the amount of information exchanged daily by an individual through various Web channels has become difficult to control. While in principle this gives better possibilities to share and exchange information with various people and organizations, it also makes it more difficult for Web users to fully comprehend, explore and exploit exchanges of their own data.

We developed a Web lifelogger, dedicated to tracking every exchange realized over the Web by an individual Web user, and to storing these logs using semantic technologies. We ran an experiment using this tool for a period of 2.5 months for a particular user. The collected data (100M triples) can be used by the user to monitor and study his own online behavior, based in particular on basic analytics, on models of the perceived trust relationship this user has with different websites, and on what can be learnt from analyzing the use of Web search engines.

Mathieu d’Aquin, Salman Elahi and Enrico Motta

Personal Monitoring of Web Information Exchange: Towards Web Lifelogging

Future Work/Next Steps

Our previous work on using a local proxy to collect information on user-generated Web traffic…

… and linking this information to web resources…

… to create online personal information/personal analytics interfaces.

[Screenshot: personal analytics interface mock-up. Navigation: Resources, Profile, Entities, Sites, Languages, Time, Locations, Filters, Graph View. Time view: number of requests on Friday 14th October 2011, by month, by week or by hour. Languages chart: English, French, German, Italian (24%, 68%, 5%, 2% as extracted). Entity tag clouds: People (e.g. Peter Scott, Steve Jobs, Enrico Motta, Tim Berners Lee), Organizations (e.g. British Broadcasting Corporation, The Guardian, The Open University, Knowledge Media Institute), Places (e.g. United Kingdom, Walton Hall, Paris, Milton Keynes) and Other Keywords (e.g. Education, Semantic Web, Linked Data, RDF).]

[Screenshot: personal analytics interface mock-up with the same navigation and entity tag clouds, a Languages chart listing English, Italian, German and French, and an entity detail panel. Detail panel: Enrico Motta, Professor at the Knowledge Media Institute. Relation to you: Colleague, Friend, Line Manager.]

More info

UCIAD Blog: http://uciad.info

Code base: http://github.com/uciad

Twitter: #uciad

@mdaquin