Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems

Post on 11-Aug-2014


Steffen Staab, staab@uni-koblenz.de

WeST: Web Science & Technologies, University of Koblenz ▪ Landau, Germany


What do people want from the Web?

• Web as storage: library, memory
• Web as tool: search, transaction
• Web as social medium: communication, cooperation
• Web as mirror of self: identification, outreach

What are some of the footprints people leave?


My Agenda in the Large

• Web Content: discovering patterns, building tools, understanding
• Web Interaction: monitoring, exploiting, guiding, understanding
• Web Evolution: monitoring, predicting, guiding, understanding

My Agenda for Today

Web Content, Web Interaction, Web Evolution

1. Modelling Text
2. Modelling Network Evolution
3. Modelling Physical-Social Data

1. Modelling Text

Autocompletion of queries

„UK is“?


Language Models

What follows „UK is“?

Conditional probability:

where

Issue: Long word sequences can rarely be observed
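The formula itself did not survive in this transcript. As a minimal sketch, assuming the usual maximum-likelihood estimate (count of the context followed by the word, divided by the count of the context), the conditional probability can be computed as below. The function name and the toy corpus are illustrative only.

```python
from collections import Counter

def conditional_probability(tokens, context, word):
    """Maximum-likelihood estimate of P(word | context) from raw counts."""
    context = tuple(context)
    n = len(context)
    # Collect every word that directly follows an occurrence of the context.
    followers = Counter(
        tokens[i + n]
        for i in range(len(tokens) - n)
        if tuple(tokens[i:i + n]) == context
    )
    total = sum(followers.values())
    if total == 0:
        # The context was never observed: this is the sparsity issue.
        return 0.0
    return followers[word] / total

# Toy corpus (made up for illustration).
corpus = "uk is great . uk is an island . uk is great".split()
p_great = conditional_probability(corpus, ["uk", "is"], "great")
p_island = conditional_probability(corpus, ["uk", "is"], "island")
p_unseen = conditional_probability(corpus, ["france", "is"], "great")
```

Here "island" never directly follows "UK is" and the context "france is" never occurs at all, so both estimates are zero. The longer the context, the more often this happens.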


Modified Kneser-Ney Smoothing of n-grams

If a sequence is hard to observe, then approximate it recursively by observing the marginal frequencies of

First recursion step:

Problem: If the last word in the sequence is rare, the overall sequence will be rare, and the approximation will be of low quality.
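The recursion can be illustrated for the bigram case. This is a minimal sketch under stated assumptions: it uses plain interpolated Kneser-Ney with a single fixed discount (0.75 is an arbitrary choice), not the modified variant with several count-dependent discounts, and all names are illustrative.

```python
from collections import Counter, defaultdict

def train_kneser_ney_bigram(tokens, discount=0.75):
    """Return prob(w, v), an estimate of P(w | v) for bigrams."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_total = Counter()        # how often v occurs as a context
    followers = defaultdict(set)     # distinct words seen after v
    predecessors = defaultdict(set)  # distinct words seen before w
    for (v, w), count in bigrams.items():
        context_total[v] += count
        followers[v].add(w)
        predecessors[w].add(v)
    n_bigram_types = len(bigrams)

    def prob(w, v):
        # Lower-order estimate: in how many distinct contexts does w appear?
        continuation = len(predecessors.get(w, ())) / n_bigram_types
        if context_total[v] == 0:
            return continuation  # unseen context: use the lower order only
        discounted = max(bigrams[(v, w)] - discount, 0.0) / context_total[v]
        # Mass removed by discounting is redistributed via the lower order.
        backoff_weight = discount * len(followers.get(v, ())) / context_total[v]
        return discounted + backoff_weight * continuation

    return prob

# Toy corpus (made up for illustration).
tokens = "the cat sat on the mat the cat ate".split()
prob = train_kneser_ney_bigram(tokens)
vocabulary = set(tokens)
total_mass = sum(prob(w, "the") for w in vocabulary)
```

The probabilities for a fixed context sum to one over the vocabulary, because exactly the mass taken away by the discount is handed to the lower-order estimate.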


Generalized Language Models [ACL14]

If a sequence is too hard to observe, then approximate it based on the marginal probabilities of

...

recursively.

Core idea of formal solution: Recursively applicable, commutative skip operators
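One way to picture the skip operators is to enumerate every variant of an n-gram in which some context words are replaced by a wildcard. The sketch below assumes that only context words are skipped and that the word to be predicted is always kept. The wildcard symbol and function name are hypothetical, not taken from [ACL14].

```python
from itertools import combinations

SKIP = "*"  # hypothetical wildcard marking a skipped position

def skipped_variants(ngram):
    """All variants of an n-gram with a subset of context words skipped."""
    context, target = list(ngram[:-1]), ngram[-1]
    variants = []
    for k in range(len(context) + 1):
        # Choosing a *set* of positions makes the order of skips irrelevant,
        # which mirrors the commutativity of the skip operators.
        for positions in combinations(range(len(context)), k):
            skipped = [SKIP if i in positions else w
                       for i, w in enumerate(context)]
            variants.append(tuple(skipped) + (target,))
    return variants

variants = skipped_variants(("the", "wild", "black", "cat"))
```

For a four-word sequence this yields eight variants, from the unchanged n-gram down to the one where every context word is skipped.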


Improvement of GLMs [ACL14]

Evaluation measure: Perplexity

Data set: English Wikipedia, different sample sizes

Relative improvement: 2.6% (most training data, smallest model) to 13.9% (least training data, largest model)

[Figure: Perplexity (normalized)]
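For reference, perplexity is the exponential of the average negative log-probability a model assigns to the test words, so lower is better. A minimal sketch, with illustrative function names and made-up numbers:

```python
import math

def perplexity(probabilities):
    """Perplexity of a model given the probability it assigned to each test word."""
    if any(p <= 0.0 for p in probabilities):
        return math.inf  # a single zero-probability word makes perplexity infinite
    avg_neg_log = -sum(math.log(p) for p in probabilities) / len(probabilities)
    return math.exp(avg_neg_log)

def relative_improvement(baseline_ppl, new_ppl):
    """Fraction by which perplexity dropped relative to the baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl

# A model that is always undecided between four words has perplexity 4.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
gain = relative_improvement(100.0, 90.0)
```

The infinite value for zero probabilities is the reason smoothing is needed before perplexity can be measured at all.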


Outlook for Generalized Language Models

Correcting mistakes that are made in all tools

Lack of appropriate models

Other operators („the wild black cat“)
• Delete: „the black cat“
• Part-of-speech: „the adj adj cat“

Application: e.g. next word prediction

Other data structures
• Tree-like data
• Graph data

proposal for Google

current focus

Semantic Web

2. Modelling Network Evolution

Evolution of Networks [ICWSM 2013]

[Figure labels: Additions, Removals, Training]

Link Prediction Problem

Unlink Prediction Problem

Markov assumption: history irrelevant


Related Work in Brief

Prediction feature f assigns a score to node pair (i, j).

f(i, j) > f(i, k) implies that (i, j) is to be ranked above (i, k):
• Link Prediction: edge likelier to be added
• Unlink Prediction: edge likelier to be removed

Static features: degree, common-neighbours, path3, local-clustering-coefficient/embeddedness, ...
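A minimal sketch of how such a static feature ranks candidate pairs for link prediction. The adjacency-set representation, the function names and the toy graph are assumptions for illustration. Unlink prediction would rank existing edges in the same way, with a feature whose high scores indicate likely removal.

```python
from itertools import combinations

def common_neighbours(adj, i, j):
    """Static feature: number of neighbours shared by i and j."""
    return len(adj[i] & adj[j])

def degree_product(adj, i, j):
    """Static feature: product of the degrees of i and j."""
    return len(adj[i]) * len(adj[j])

def rank_missing_links(adj, feature):
    """Rank all unconnected node pairs, highest score first."""
    candidates = [(i, j) for i, j in combinations(sorted(adj), 2)
                  if j not in adj[i]]
    return sorted(candidates, key=lambda pair: feature(adj, *pair), reverse=True)

# Toy undirected graph as adjacency sets (made up for illustration).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranking = rank_missing_links(adj, common_neighbours)
```

In this graph the pair (a, d) shares two neighbours and is ranked first, while (a, e) shares none and is ranked last.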


The Snapshot View

Link and unlink prediction (ICWSM 2013)

Unlink prediction is much more difficult than link prediction

Related Work in Brief

Markov assumption: history irrelevant

Advantage: General Model

Disadvantage: General Model

Idea: Keep generality, improve prediction

Our Approach - 1

Markov assumption: history irrelevant

Hypothesis: Temporal information generally improves prediction

Idea:
1. Nodes concerned
2. Neighbourhood

Our Approach - 2

Dynamic features: recency, longevity

Extrapolation for temporal preferential attachment:


Evaluation & Discussion (excerpt)

• Temporal link prediction significantly better, but only slightly
• Temporal unlink prediction always significantly improved
• Temporal preferential attachment best

[Figure: AUC for baseline, qualitative, quantitative, extrapolation]
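AUC measures how well a score separates the two classes: it is the probability that a randomly chosen positive pair (an edge that was actually added or removed) receives a higher score than a randomly chosen negative pair. A minimal sketch with made-up scores and an illustrative function name:

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Scores of pairs whose edge did change vs. pairs whose edge did not.
example_auc = auc([0.9, 0.6, 0.4], [0.5, 0.3])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking. The pairwise loop is quadratic, which is fine for a sketch but not for large graphs.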


Outlook for Evolution of Networks

Temporal dynamics still underexplored
• lack of datasets!
• next experiments: Twitter followers, Xing.de

Unlinks lead to link recommendation
• new Wikipedia link (reorganization of Wikipedia pages!)
• new job
• new friend

3. Modelling Physical-Social Data

Tagged photos with geo-coordinates from Flickr

[Figure: geo-located photos with tags such as fish, rice, seafood, shrimp, lobster, salmon, wine, coffee, pizza and pasta]

Tasks: Discovering topics, finding clusters

[Figure: the tagged photos grouped into two topics]

• Topic: seafood, fish, lobster, shrimp, crab, wine, salmon
• Topic: wine, pizza, coffee, italian, pasta

Challenge

Cultural areas, country borders, geographical features and other geographical observations exhibit complex spatial distributions

(Image source: wikipedia.org)

Existing approaches: Gaussian regions

A. Ahmed, L. Hong and A. Smola, 2013 (following Yin et al. 2011; Sizov 2010)

MGTM 1: Global Topic Clustering

MGTM 2: Determining Neighbourhoods

MGTM 3: Derived Topic Model

• Cluster adjacency
• Dependencies of document-specific topic distributions
• Exchange of topic information between clusters

MGTM 4: Exchange of Topic Information

Exchange of topic information between clusters

Evaluation: Anecdotal, Perplexity, Gaming

Gaming study: intrusion detection

Precision, 8 topics (avg / median):
• LGTA: 0.60 / 0.58
• Basic model: 0.64 / 0.58
• MGTM: 0.78 / 0.75
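How such precision figures could be aggregated is sketched below, assuming the study follows the common word-intrusion design: participants see the top words of a topic plus one unrelated intruder and must pick the intruder. The slide does not spell out the study design, and the data, names and structure here are hypothetical.

```python
from statistics import mean, median

def intrusion_precision(picked, intruders):
    """Fraction of topics for which a participant picked the true intruder."""
    correct = sum(1 for topic, word in picked.items()
                  if intruders[topic] == word)
    return correct / len(picked)

# Hypothetical ground truth: the intruder word planted in each topic.
intruders = {"t1": "coffee", "t2": "lobster", "t3": "pizza", "t4": "salmon"}

# Hypothetical answers of three participants.
participants = [
    {"t1": "coffee", "t2": "lobster", "t3": "pizza", "t4": "salmon"},
    {"t1": "coffee", "t2": "wine", "t3": "pizza", "t4": "fish"},
    {"t1": "wine", "t2": "lobster", "t3": "pizza", "t4": "salmon"},
]

scores = [intrusion_precision(p, intruders) for p in participants]
avg_precision, median_precision = mean(scores), median(scores)
```

The more coherent a topic is, the easier the intruder is to spot, so higher precision suggests more interpretable topics.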


Outlook for LDA with structure

Texts + social network structures
• scientometry
• xing.de

Web pages + user visits
• chefkoch.de

Future: Knowledge about social aspects needed

Future: CS style models for social sciences


References

[ACL14] R. Pickhardt, T. Gottron, M. Körner, P. G. Wagner, T. Speicher, S. Staab. A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing. In: Proc. of ACL-2014, the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, June 22-27, 2014.

[WSDM14] C. Kling, J. Kunegis, S. Sizov, S. Staab. Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections. In: Proc. of the 7th ACM Conference on Web Search and Data Mining (WSDM 2014), New York, US, February 24-28, 2014.

[ICWSM13] J. Preusse, J. Kunegis, M. Thimm, T. Gottron, S. Staab. Structural Changes in Collaborative Knowledge Networks. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM 2013), Boston, July 8-10, 2013.

Web Science & Technologies: Team & Research

• Semantic Web
• Social Web & Web Retrieval
• Interactive Web & Human Computing
• Web & Economy
• Software & Services
• Computational Social Science

Thank You!


Maslow's pyramid of needs