Semantic Technologies to Support the User-Centric Analysis of Activity Data
Mathieu d’Aquin, Salman Elahi, Enrico Motta
Knowledge Media Institute, The Open University
User-centric Activity Data Analysis
[Diagram: an Actor realizes Actions on a Set of Resources; the Set of Events involving the actor forms a Trace]
Challenges in user-centric activity data
• Activity data that sit in logs are
– Heterogeneous – different models for different sites/systems
– Raw – uninterpreted
– Horribly big – thousands of pieces of information generated every minute
– Hard to exploit, understand, analyze
User Centric Activity Data
[Diagram: Users and Organisation; Websites 1 to 4, each with its own logs (Logs 1 to 4); the logs go through Consolidation, Integration and Interpretation, supported by Ontologies, to enable activity analysis for and by individual users]
User support
[Flowchart]
Steps:
– User logs in or registers
– Detect setting (agent + IP)
– Check that the setting is non-ambiguous
– Prompt: "It is the first time you log into UCIAD with this setting (detail). Do you want to attach it to your account?"
– Add setting to known settings
– Register setting as ambiguous
– Display activity data related to all known settings of the user
Branch labels: known setting for user, unknown setting, ambiguous, non-ambiguous, yes, no
User name: mathieu
Password: ******
Your current setting is:
Computer IP: 137.108.2x.1xx
User Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13
This setting is not currently attached to a user, so it will be added to your known settings as you log into the system.
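The login flow can be sketched in Python. This is only a minimal sketch under assumptions that the slides do not state: a setting is identified by an MD5 hash of IP plus user agent, a setting counts as ambiguous when it is already attached to a different user, and declining the prompt leaves the setting unattached. `SettingRegistry`, `setting_id` and `on_login` are hypothetical names, not part of UCIAD.

```python
import hashlib
from dataclasses import dataclass, field


def setting_id(ip: str, user_agent: str) -> str:
    """Identify a setting (agent + IP) by a hash; the hashing scheme is an assumption."""
    return hashlib.md5(f"{ip}|{user_agent}".encode("utf-8")).hexdigest()


@dataclass
class SettingRegistry:
    known: dict = field(default_factory=dict)    # user name -> set of setting ids
    ambiguous: set = field(default_factory=set)  # setting ids seen with several users

    def owners(self, sid: str) -> set:
        """Users that already have this setting among their known settings."""
        return {user for user, sids in self.known.items() if sid in sids}

    def on_login(self, user: str, ip: str, user_agent: str, attach: bool = True) -> str:
        sid = setting_id(ip, user_agent)
        if sid in self.known.get(user, set()):
            return "known"              # known setting for user
        # Assumption: "ambiguous" means the setting is attached to another user.
        if sid in self.ambiguous or self.owners(sid) - {user}:
            self.ambiguous.add(sid)     # register setting as ambiguous
            return "ambiguous"
        if attach:                      # the user answered "yes" to the prompt
            self.known.setdefault(user, set()).add(sid)
            return "attached"
        return "declined"               # the user answered "no"
```

In this sketch, activity data would then be displayed only for the settings stored under `known[user]`.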
PREFIX tr: <http://uciad.info/ontology/trace/>
PREFIX actor: <http://uciad.info/ontology/actor/>
CONSTRUCT {
  ?trace ?p ?x .
  ?x ?p2 ?x2 .
  ?x2 ?p3 ?x3 .
  ?x3 ?p4 ?x4
} WHERE {
  <http://uciad.info/actor/mathieu> actor:knownSetting ?set .
  ?trace tr:hasSetting ?set .
  ?trace ?p ?x .
  ?x ?p2 ?x2 .
  ?x2 ?p3 ?x3 .
  ?x3 ?p4 ?x4
}
Export my data (for graph http://uciad.info/users/mathieu)
<rdf:RDF>
  <rdf:Description rdf:about="http://uciad.info/trace/kmi-web13/ede2ab38da27695eec1e0b375f9b20da">
    <rdf:type rdf:resource="http://uciad.info/ontology/trace/Trace"/>
    <hasAction rdf:resource="http://uciad.info/action/GET"/>
    <hasPageInvolved rdf:resource="http://uciad.info/page/0b9abc62fcf90afc53797b938af435dd"/>
    <hasResponse rdf:resource="http://uciad.info/response/ea95add1414aba134ff9e0482b921a33"/>
    <hasSetting rdf:resource="http://uciad.info/actorsetting/119696ec92c5acec29397dc7ef98817f"/>
    <hasTime rdf:datatype="http://www.w3.org/2001/XMLSchema#string">13/Jun/2011:01:37:23+0100</hasTime>
  </rdf:Description>
</rdf:RDF>
<rdf:Description rdf:about="http://uciad.info/page/0b9abc62fcf90afc53797b938af435dd">
  <rdf:type rdf:resource="http://uciad.info/ontology/sitemap/WebPage"/>
  <isPartOf rdf:resource="http://uciad.info/ontology/test1/dataopenacuk"/>
  <onServer rdf:resource="http://kmi-web13.open.ac.uk"/>
  <url rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/resource/person/ext-718a372e10788bb58d562a8bf6fb864e</url>
</rdf:Description>
<rdf:Description rdf:about="http://uciad.info/ontology/test1/dataopenacuk">
  <rdf:type rdf:resource="http://uciad.info/ontology/sitemap/Website"/>
  <rdf:type rdf:resource="http://uciad.info/ontology/test1/LinkedDataPlatform"/>
  <onServer rdf:resource="http://kmi-web13.open.ac.uk"/>
  <urlPattern rdf:datatype="http://www.w3.org/2001/XMLSchema#string">/*</urlPattern>
</rdf:Description>
<rdf:Description rdf:about="http://uciad.info/response/ea95add1414aba134ff9e0482b921a33">
  <rdf:type rdf:resource="http://uciad.info/ontology/trace/HTTPResponse"/>
  <hasResponseCode rdf:resource="http://uciad.info/ontology/trace/200"/>
  <hasSizeInBytes rdf:datatype="http://www.w3.org/2001/XMLSchema#int">1085</hasSizeInBytes>
</rdf:Description>
Technical infrastructure
[Architecture diagram: Server1, Server2 and Server3 host applications that write logs; each log is handled by a Parser/RDF renderer that produces daily RDF traces; the remaining components are a Scheduler/Manager and a Semantic Triple Store]
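A Parser/RDF renderer of this kind can be sketched as follows. This is an illustration under assumptions, not the project's code: the input is taken to be in Apache "combined" log format, identifiers are MD5 hashes of fields chosen here for the example (the slides do not say what UCIAD hashes), and every property is placed in the `tr:` namespace although the slides only confirm that for `hasSetting`. The names `LOG_RE`, `md5` and `line_to_triples` are hypothetical.

```python
import hashlib
import re

TR = "http://uciad.info/ontology/trace/"  # namespace of the tr: prefix in the query
BASE = "http://uciad.info/"

# Apache "combined" log format (assumed input format).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def md5(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def line_to_triples(line: str, server: str) -> list:
    """Turn one log line into (subject, predicate, object) triples.

    Returns an empty list when the line does not match the expected format.
    The HTTP response (status, size) is left out to keep the sketch short.
    """
    m = LOG_RE.match(line)
    if m is None:
        return []
    trace = f"{BASE}trace/{server}/{md5(line)}"
    page = f"{BASE}page/{md5(server + m['path'])}"
    setting = f"{BASE}actorsetting/{md5(m['ip'] + m['agent'])}"
    return [
        (trace, "rdf:type", TR + "Trace"),
        (trace, TR + "hasAction", f"{BASE}action/{m['method']}"),
        (trace, TR + "hasPageInvolved", page),
        (trace, TR + "hasSetting", setting),
        (trace, TR + "hasTime", m["time"]),
    ]
```

Running such a function over each day's log file would give one batch of triples per log per day, which matches the "daily RDF traces" boxes in the diagram.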
Ontologies
Formal conceptual models of a domain: online user activity
Key Concepts:
– Actor: the things accessing resources (through agents)
– Resources: Webpages, Websites
– Actions: realized by actors on resources, e.g., requests
– Events: an actor realizing an action on a resource
Authenticated SPARQL
Protected SPARQL endpoint
SPARQL endpoint interface with authentication
Access right info: User -> graphs
1. Client request (HTTP + basic auth):
   Query: select ?x where {?x a uciad:Website}
   Credentials: User=mathieu, Pass=mypass
2. Access right lookup: mathieu? -> matgraph, onto
3. Query sent to the protected endpoint:
   select ?x from <matgraph> from <onto> where {?x a uciad:Website}
4. Standard SPARQL results
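The rewriting step of the authenticated interface can be sketched like this. It is a minimal sketch with assumptions labeled: the `ACCESS` table, the `restrict_query` function and the ontology graph URI `http://example.org/graphs/onto` are hypothetical, and passwords are kept in plain text only to keep the example short. The graph `http://uciad.info/users/mathieu` is the one named on the export slide.

```python
import re

# Hypothetical access-right table: user -> (password, graphs the user may read).
ACCESS = {
    "mathieu": (
        "mypass",
        ["http://uciad.info/users/mathieu", "http://example.org/graphs/onto"],
    ),
}


def restrict_query(query: str, user: str, password: str) -> str:
    """Add one FROM clause per authorized graph before the first WHERE keyword."""
    expected, graphs = ACCESS.get(user, (None, []))
    if expected is None or password != expected:
        raise PermissionError("invalid credentials")
    from_clauses = " ".join(f"FROM <{graph}>" for graph in graphs)
    rewritten, count = re.subn(
        r"\bwhere\b",
        lambda _match: f"{from_clauses} WHERE",
        query,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("query has no WHERE keyword")
    return rewritten
```

A real interface would also need to reject or strip any FROM or GRAPH clause supplied by the client, which this sketch does not attempt.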
Customizing the Ontologies = Customizing the Analysis
The User Activity Ontologies form the basis to describe generic activity data in a sharable way
Customized extensions:
– Specific categories of resources, actions and events
– Formally defined to allow inference
Create customized aggregations, classifications and distributions in the data that allow for specific analyses
[Diagram: Base Activity Ontologies and User Activity Data feed Inference, which yields Specific Classifications, Distributions, Aggregations…]
Examples
In the ontology:
1. vhs-wiki is a Wiki
2. Data.open.ac.uk is a DataPlatform
3. Actions on a Page which is part-of a Wiki are called usingWiki
4. Similarly for usingDataPlatform
And…
1. Activities usingAWiki with a user-agent which is an RSS-Reader are checkingWikiUpdatesWithRSS
2. Otherwise, they are usingWikiThroughBrowser
[Charts: Pages involved in usingWikiThroughBrowser; Sub-classes of us…]
Examples
In the ontology:
1. The page http://data.open.ac.uk/query is a SPARQLEndpoint
2. An action on a SPARQLEndpoint with a query parameter is ExecutingASparqlQuery
3. Pages of the form http://data.open.ac.uk/page/* are DataPages
4. An action on a DataPage with a BrowserAgent is ConsultingADataPage
[Charts: Sub-classes of usingDataPlatform; Settings used in executingASparqlQuery (the most used is curl on the user’s laptop); Sub-classes of DataPages consulted by the user]
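The four data-platform rules can be mimicked procedurally, as below. In the project these classifications come from ontological inference; this Python version is a plain stand-in, and it assumes that "a query parameter" means a URL parameter named `query`. The function names and the sample setting labels are hypothetical.

```python
from collections import Counter
from fnmatch import fnmatchcase
from urllib.parse import parse_qs, urlsplit


def page_classes(url: str) -> set:
    """Classes of a page, following rules 1 and 3."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    classes = set()
    if base == "http://data.open.ac.uk/query":
        classes.add("SPARQLEndpoint")
    if fnmatchcase(base, "http://data.open.ac.uk/page/*"):
        classes.add("DataPage")
    return classes


def action_classes(url: str, agent_is_browser: bool) -> set:
    """Classes of an action on a page, following rules 2 and 4."""
    pages = page_classes(url)
    # Assumption: the rule refers to a URL parameter called "query".
    has_query_param = "query" in parse_qs(urlsplit(url).query)
    classes = set()
    if "SPARQLEndpoint" in pages and has_query_param:
        classes.add("ExecutingASparqlQuery")
    if "DataPage" in pages and agent_is_browser:
        classes.add("ConsultingADataPage")
    return classes


def setting_distribution(actions, wanted: str) -> Counter:
    """Count the settings used in actions that fall into the class `wanted`.

    `actions` is an iterable of (url, agent_is_browser, setting_label) tuples.
    """
    return Counter(
        setting
        for url, is_browser, setting in actions
        if wanted in action_classes(url, is_browser)
    )
```

Calling `setting_distribution(actions, "ExecutingASparqlQuery").most_common(1)` gives the kind of answer shown on the chart, where the most used setting is curl on the user's laptop.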
Browsing Interface: LDI
[Annotated screenshot: Class; Sub-classes with distribution of instances; Properties with distribution of values; List of members (instances); Details of a member (instance)]
Conclusion
• The idea of the UCIAD project was to investigate
and experiment with the use of semantic
technologies for the user centric integration of
activity data
• Demonstrated the value of the approach, as well
as current technical limitations:
– Scalability
– Flexible Access-control
– Usability
Future Work/Next Steps
• User studies: what can people do with their
activity data? In which form?
• Scenarios for user centric activity data
– Project Danube, Higgins, Mydex, personal.com, …
with semantics?
• Licensing User Data?
Analyzing Search History
Web search history is known to provide interesting indications of the user’s interests. Using OpenCalais SemanticProxy (http://semanticproxy.opencalais.com/), we detect general themes from the analysis of search keywords, directly pointing to additional resources. Also, we see patterns emerging from the use of search engines, in terms of navigational and informational searches.
Trust in Domains and Criticality of Data
A simple iterative model is defined to compute the perceived trust in websites (top), and the perceived criticality of personal data (bottom), based on observing the exchange of this data. The simple intuition on which we rely is that a trusted website receives critical data, and that critical data is shared only with a few trusted websites. Exposing this model to the user in an interactive way can help align the perceived behavior with the intended one, and detect possible conflicts between data exchange and personal privacy rules.
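The poster does not give the update rules of this model, so the following is only one possible reading of the stated intuition, not the authors' formulas. It assumes that the criticality of a data item is the summed trust of the websites receiving it divided by the square of their number (so widely shared data scores low), that the trust of a website is the average criticality of the data it receives, and that both scores are rescaled to a maximum of 1 at each round. The function name and the sample data are hypothetical.

```python
def trust_and_criticality(exchanges, rounds: int = 20):
    """Iteratively score websites (trust) and data items (criticality).

    `exchanges` is an iterable of (data_item, website) pairs observed in the logs.
    The update rules are illustrative assumptions.
    """
    sites_of, data_of = {}, {}
    for item, site in exchanges:
        sites_of.setdefault(item, set()).add(site)
        data_of.setdefault(site, set()).add(item)
    if not sites_of:
        return {}, {}

    trust = {site: 1.0 for site in data_of}
    crit = {item: 1.0 for item in sites_of}
    for _ in range(rounds):
        # Critical data is shared with few websites, and those are trusted.
        crit = {
            item: sum(trust[s] for s in sites) / len(sites) ** 2
            for item, sites in sites_of.items()
        }
        top = max(crit.values())
        crit = {item: value / top for item, value in crit.items()}
        # A trusted website is one that receives critical data.
        trust = {
            site: sum(crit[d] for d in items) / len(items)
            for site, items in data_of.items()
        }
        top = max(trust.values())
        trust = {site: value / top for site, value in trust.items()}
    return trust, crit
```

With exchanges such as a password sent only to a bank and an e-mail address sent to the bank and three other sites, this sketch ranks the bank as most trusted and the password as most critical.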
Basic Analytics
Number of requests per hour of the day (sum). Allows identifying events that appear on a typical day.
Map of the locations of the servers to which requests have been sent. Allows identifying the physical space of Web interactions.
Cloud of the most commonly accessed websites. Shows the impact of ‘implicit’ requests.
49 different tools accessing the Web (User-Agent) can be identified, including Web browsers, Twitter clients, e-mail clients, update utilities, social applications, etc.
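The first of these analytics, requests per hour of the day, reduces to a simple count. The sketch below assumes timestamps in the Apache-style form used in the trace example (`13/Jun/2011:01:37:23+0100`); the format of the lifelogger's own logs is not given, and `requests_per_hour` is a hypothetical name.

```python
from collections import Counter


def requests_per_hour(timestamps) -> Counter:
    """Count requests per hour of the day.

    Each timestamp is assumed to look like '13/Jun/2011:01:37:23+0100';
    the hour is the field after the first colon, read as local log time.
    """
    hours = Counter()
    for stamp in timestamps:
        hours[int(stamp.split(":")[1])] += 1
    return hours
```

Summing these counts over all logged days gives the per-hour profile in which events of a typical day become visible.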
With more and more services relying on the Web to communicate with their users, the amount of information exchanged daily by an individual through various Web channels has become difficult to control. While in principle this gives better possibilities to share and exchange information with various people and organizations, it also makes it more difficult for Web users to fully comprehend, explore and exploit exchanges of their own data.
We developed a Web lifelogger, dedicated to tracking every exchange made over the Web by an individual Web user, and to storing these logs using semantic technologies. We ran an experiment using this tool for a period of 2.5 months with one particular user. The collected data (100M triples) can be used by the user to monitor and study his own online behavior, based in particular on basic analytics, on models of the perceived trust relationship this user has with different websites, and on what can be learnt from analyzing the use of Web search engines.
Mathieu d’Aquin, Salman Elahi and Enrico Motta
Personal Monitoring of Web Information Exchange:
Towards Web Lifelogging
Future Work/Next Steps
Our previous work on using a local proxy to collect information on user-generated Web traffic…
… and linking this information to web resources…
… to create online personal information/personal analytics interfaces…
[Screenshot: personal analytics interface]
Tabs: Resources, Profile
Panels: Entities, Sites, Languages, Time, Locations, Filters, Graph View (By Month, By Week, By Hour)
Languages chart: labels English, French, German, Italian; values 24%, 68%, 5%, 2%
Time chart: Friday 14th October 2011 (number of requests)
People: Peter Scott, Kurt Cobain, Adele, Ashley MacIsaac, Steve Jobs, Bach, Vincent Cassel, Enrico Motta, Virginia Woolf, Terry Pratchett, Jane Austen, William Gibson, Neil Gaiman, Martin Bean, Nicolas Sarkozy, Fouad Zablith, David Cameron, Marta Sabou, Michael Jackson, Jimi Hendrix, Tim Berners-Lee, Stuart Brown, Carlo Allocca, Scott Adams
Organizations: British Broadcasting Corporation, The Guardian, The Open University, Joint Information Systems Committee, Engineering and Physical Sciences Research Council, Google, Amazon, La compagnie des branques, Facebook, Arts and Humanities Research Council, Knowledge Media Institute, Wikimedia Foundation, Agence Nationale de la Recherche, Apple, European Commission
Places: United Kingdom, Euston, Walton Hall, France, Paris, Luxembourg, Heathrow, Metz, Nancy, Birmingham, Coulsdon, New York, London, Washington, Manchester, Dublin, Bonn, Dusseldorf, Rome, Thionville, Chamonix, Milton Keynes, Mont Blanc, England, Alderaan, Nice, Gare de l'Est, Croydon, Saint Pancras, Bletchley, Luton
Other Keywords: Education, Semantics, iPad, Summer School, Semantic Web, Cajon, Case-Based Reasoning, Artificial Intelligence, Dataset, PHP, Data Mining, School, University, Educational Resources, OpenLearn, SocialLearn, Ontologies, OWL, Editor, Journal, Conference, Linked Data, Teaching, Music, Workshop, iPhone, Java, Javascript, Discovery, RDF, Guitar, Pirates
[Screenshot: second view of the same interface. Languages chart: labels English, Italian, German, French; values 14%, 86%, 5%. Time chart: Friday 14th October 2011 (number of requests). The People, Organizations, Places and Other Keywords clouds are the same as in the first view.]
Enrico Motta: Professor at the Knowledge Media Institute. Relation to you: Colleague, Friend, Line Manager