Knowledge Acquisition from Social Awareness Streams

The Wisdom of the Tweets Knowledge Acquisition from Social Awareness Streams Claudia Wagner [email protected] a TRADITION of INNOVATION (Supervisor: Markus Strohmaier)

description

Slides from the PhD symposium at ESWC 2010 (http://www.eswc2010.org/program-menu/phd-symposium)

Transcript of Knowledge Acquisition from Social Awareness Streams

Page 1: Knowledge Acquisition from Social Awareness Streams

The Wisdom of the Tweets

Knowledge Acquisition from Social Awareness Streams

Claudia Wagner [email protected]

a TRADITION of INNOVATION

(Supervisor: Markus Strohmaier)

Page 2: Knowledge Acquisition from Social Awareness Streams

ISO 9001 certified

a TRADITION of INNOVATION

10 April 2023

Social Awareness Streams (SAS)

Short, natural-language messages created by users

Broadcasted

Information consumption is driven by social networks

Applications such as Twitter or Facebook

[Naaman, 2010]

Page 3: Knowledge Acquisition from Social Awareness Streams

What's the knowledge of SAS?

Users

Web Resources

Real-World Happenings

Natural Language Constructs

http://www.flickr.com/photos/matthewfield/2306001896/

http://www.flickr.com/photos/waldoj/722508166/

clauwa
Users: their expertise, their interests and their influence. We may use SAS to create ontological user profiles, which leads to an ontology of competencies.
Web resources: the content or the popularity of web resources.
Natural language constructs: the meaning of NL constructs or community-specific vocabulary.
Real-world happenings: things that happen in the real world, e.g. specific events such as this conference, or more general happenings such as flu epidemics, births or marriages.
Page 4: Knowledge Acquisition from Social Awareness Streams

Proposed Approach

Aim
What kind of knowledge is contained in SAS?
How can we acquire knowledge from SAS?
Which factors influence knowledge acquisition results?

Method
Develop a SAS Analyzer system
Controlled experiments

clauwa
The aim of my PhD research is to explore what kind of knowledge we can acquire from SAS, how we can acquire it, and which factors influence what we can acquire.
To achieve this aim I started developing a SAS analyzer system which will produce ontological constructs for given input labels and help me conduct controlled experiments that explore how different variables influence what we can acquire.
The expected contribution of my PhD research is knowledge about the nature of SAS, about the extent to which we can use these streams for KD, and about which factors influence the KD results.
Page 5: Knowledge Acquisition from Social Awareness Streams

Research Questions

Do ontological structures emerge from SAS?

Which factors influence their emergence?
Stream aggregation/sampling strategies?
Semantic enrichment strategies?
Knowledge acquisition methods and algorithms?

clauwa
Stream aggregation or sampling strategies: should we use only RT messages, only messages that contain links, or only messages produced by certain users (who show certain tweeting pragmatics)?
Do semantic enrichment strategies influence the acquired semantic models? What is the impact of incorporating external knowledge?
How do different KD methods influence the acquired models? How does our choice of method and algorithm influence what we can observe?
Page 6: Knowledge Acquisition from Social Awareness Streams

SAS Analyzer

wac
We represent SAS as multiple 3rd-order tensors (= generalization of a scalar, vector and matrix) with the following dimensions: users, messages and resources (= all items contained in messages except users).
--------------------
In addition, each dimension can be qualified according to the exact type of node or relation, e.g.:
a user can be qualified according to the different ways he can be related to a message or to resources (such as is-author-of, is-target-of or is-origin-of);
a resource can be qualified according to its type (e.g. hashtag, URL or keyword) or the different ways it is used in messages or by users (always at the end, only used in RT messages, only used by experts);
a message can be qualified according to its type (public broadcast messages, private conversational messages, informational messages, retweeted messages);
different types of user relations (such as is-friend-of, is-conversation-partner, is-);
different types of resource relations ().
Page 7: Knowledge Acquisition from Social Awareness Streams

Controlled Experiments (1)

Do ontological structures emerge from SAS?

ground truth ontology

randomly sample

Input labels

Compare (e.g., via Precision, Recall, RLA [Maedche and Staab, 2000])
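
The comparison step can be illustrated with a minimal sketch, assuming that both the ground truth ontology and the acquired structure are reduced to sets of (concept, concept) relation pairs. The function name `precision_recall` and the sample pairs are hypothetical and not taken from the slides; RLA is not covered here.

```python
def precision_recall(learned, reference):
    """Compare a learned set of relations against a ground-truth set."""
    learned, reference = set(learned), set(reference)
    true_positives = len(learned & reference)
    # Precision: share of learned relations that also occur in the ground truth.
    precision = true_positives / len(learned) if learned else 0.0
    # Recall: share of ground-truth relations that were recovered.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical relation pairs, for illustration only.
reference_relations = {("rdf", "semanticweb"), ("owl", "semanticweb"), ("sparql", "rdf")}
learned_relations = {("rdf", "semanticweb"), ("sparql", "rdf"), ("twitter", "rdf")}

precision, recall = precision_recall(learned_relations, reference_relations)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Treating relations as exact-match pairs is the simplest choice; a measure such as RLA would instead give partial credit for relations that are close in the concept hierarchy.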

Page 8: Knowledge Acquisition from Social Awareness Streams

Controlled Experiments (2)

Which factors influence emerging semantics?

ground truth ontology

randomly sample

Input labels

vary variables

Compare (e.g., via Precision, Recall, RLA [Maedche and Staab, 2000])

Page 9: Knowledge Acquisition from Social Awareness Streams

Preliminary Results

Network-theoretic Model of SAS

Structural Stream Measures

First Experiment on acquiring latent conceptual structures from SAS

Page 10: Knowledge Acquisition from Social Awareness Streams

A network-theoretic model of SAS

A Social Awareness Stream is a tuple $S = (U_{q_1}, M_{q_2}, R_{q_3}, Y, ft, fl)$

U, M and R are finite sets whose elements are called users, messages and resources

q1, q2, q3 are qualifiers

Y is a ternary relation

ft is a function

fl is a function, $fl: Y \to (lat, long)$
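
The tuple can be sketched as a small data structure. This is only an illustration under assumptions: Y is taken to relate (user, message, resource) triples, ft is assumed to assign a timestamp to each triple (the slide only says that it is a function), and the qualifiers q1 to q3 are left out. The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # assumed shape of an element of Y: (user, message, resource)


@dataclass
class SocialAwarenessStream:
    users: Set[str] = field(default_factory=set)          # U
    messages: Set[str] = field(default_factory=set)       # M
    resources: Set[str] = field(default_factory=set)      # R
    relation: Set[Triple] = field(default_factory=set)    # Y
    ft: Dict[Triple, int] = field(default_factory=dict)   # assumed: triple -> timestamp
    fl: Dict[Triple, Tuple[float, float]] = field(default_factory=dict)  # triple -> (lat, long)

    def add(self, user: str, message: str, resource: str, time: int,
            lat_long: Optional[Tuple[float, float]] = None) -> None:
        """Record that `user` is related to `resource` through `message`."""
        triple = (user, message, resource)
        self.users.add(user)
        self.messages.add(message)
        self.resources.add(resource)
        self.relation.add(triple)
        self.ft[triple] = time
        if lat_long is not None:  # location is optional in this sketch
            self.fl[triple] = lat_long


# Hypothetical data, for illustration only.
stream = SocialAwarenessStream()
stream.add("alice", "m1", "#semanticweb", time=1261000000, lat_long=(47.07, 15.44))
stream.add("alice", "m1", "rdf", time=1261000000)
print(len(stream.relation), sorted(stream.resources))
```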

Page 11: Knowledge Acquisition from Social Awareness Streams

Example

Page 12: Knowledge Acquisition from Social Awareness Streams

Structural Stream Measures (1)

Page 13: Knowledge Acquisition from Social Awareness Streams

Structural Stream Measures (2)

Social Diversity

Social variety: How many different users participate in a stream?

Social balance: How balanced are their participations?
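
The formulas for these two measures did not survive in this transcript. The sketch below is therefore an assumption and not the author's definition: variety is taken to be the number of distinct authors, and balance is taken to be the Shannon entropy of the authors' message shares, normalized to the range 0 to 1.

```python
import math
from collections import Counter


def social_variety(authors):
    """Number of distinct users that authored messages in the stream (assumed definition)."""
    return len(set(authors))


def social_balance(authors):
    """Normalized Shannon entropy of participation shares (assumed definition).

    1.0 means every user contributes equally; values near 0 mean a few users dominate.
    """
    counts = Counter(authors)
    if len(counts) <= 1:
        return 0.0  # balance is not meaningful for zero or one participant
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


# One author entry per message; hypothetical data.
even_stream = ["alice", "bob", "alice", "bob"]
skewed_stream = ["alice", "alice", "alice", "bob"]

print(social_variety(even_stream), social_balance(even_stream))
print(social_variety(skewed_stream), social_balance(skewed_stream))
```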

Page 14: Knowledge Acquisition from Social Awareness Streams

Experiment

Aim
Can we observe emerging semantics from SAS?

Method
Input: topic of interest, in our case "semanticweb"
4 different stream aggregations
3-mode networks (users, resources and messages)
Network transformations (projections) to obtain lower-order networks of resources
Output: weighted resource networks
Manual evaluation
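
The projection step can be sketched as follows, assuming the 3-mode network is given as (user, message, resource) triples and that two resources are linked whenever they occur in the same message. The function name and the sample triples are hypothetical; the slides do not specify the exact weighting.

```python
from collections import defaultdict
from itertools import combinations


def resource_cooccurrence(triples):
    """Project (user, message, resource) triples onto a weighted resource network.

    The weight of an edge is the number of messages in which both resources occur.
    """
    resources_by_message = defaultdict(set)
    for _user, message, resource in triples:
        resources_by_message[message].add(resource)

    weights = defaultdict(int)
    for resources in resources_by_message.values():
        # Sort so that each undirected edge has one canonical key.
        for a, b in combinations(sorted(resources), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical data, for illustration only.
triples = [
    ("alice", "m1", "#semanticweb"),
    ("alice", "m1", "rdf"),
    ("bob", "m2", "#semanticweb"),
    ("bob", "m2", "rdf"),
    ("bob", "m2", "owl"),
]
network = resource_cooccurrence(triples)
print(network)
```

Grouping by user instead of by message would give a different lower-order network from the same triples, which is one way the choice of transformation can change the result.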

Page 15: Knowledge Acquisition from Social Awareness Streams

Dataset

4 different stream aggregations from Twitter

Same topic
Hashtag stream: #semanticweb
Keyword stream: semanticweb and semweb
User list stream: semweb user list from Twitter user sclopit
User directory stream: wefollow semanticweb directory

Same time interval
2 time intervals: 16 Dec 2009 to 20 Dec 2009 and 29 Dec 2009 to 1 Jan 2010
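
The difference between the aggregation types can be sketched as three filters over the same pool of messages. This is a simplified illustration: messages are assumed to be dictionaries with hypothetical `user` and `text` keys, matching is done on whitespace-separated tokens or substrings, and no real Twitter API is involved.

```python
def hashtag_stream(messages, hashtag):
    """Messages that contain the given hashtag as a whitespace-separated token."""
    tag = hashtag.lower()
    return [m for m in messages if tag in m["text"].lower().split()]


def keyword_stream(messages, keywords):
    """Messages whose text contains at least one keyword (simple substring match)."""
    lowered = [k.lower() for k in keywords]
    return [m for m in messages if any(k in m["text"].lower() for k in lowered)]


def user_stream(messages, users):
    """Messages authored by users on a given list or directory."""
    members = set(users)
    return [m for m in messages if m["user"] in members]


# Hypothetical messages, for illustration only.
messages = [
    {"id": 1, "user": "alice", "text": "Reading about #semanticweb and RDF"},
    {"id": 2, "user": "bob", "text": "semweb meetup tonight"},
    {"id": 3, "user": "carol", "text": "Lunch time"},
]

by_hashtag = [m["id"] for m in hashtag_stream(messages, "#semanticweb")]
by_keyword = [m["id"] for m in keyword_stream(messages, ["semanticweb", "semweb"])]
by_user = [m["id"] for m in user_stream(messages, ["bob", "carol"])]
print(by_hashtag, by_keyword, by_user)
```

A user-based stream keeps everything its members post, including off-topic messages such as the third one here, whereas hashtag and keyword streams select by content.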

Page 16: Knowledge Acquisition from Social Awareness Streams

Network Transformations

co-occurrence

context

[Harris, 1954]

[Mika, 2007]

communities
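
One possible reading of the context-based transformation, following the distributional idea that items appearing in similar contexts are related, is sketched below. It is an assumption rather than the author's method: each non-hashtag resource is described by the hashtags it appears with, and two resources are compared by the cosine similarity of those hashtag counts. All names and data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def hashtag_context_vectors(messages):
    """Map each non-hashtag resource to counts of the hashtags it co-occurs with."""
    vectors = defaultdict(Counter)
    for resources in messages:
        hashtags = [r for r in resources if r.startswith("#")]
        others = [r for r in resources if not r.startswith("#")]
        for resource in others:
            vectors[resource].update(hashtags)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Each message is given as the list of resources it contains; hypothetical data.
messages = [
    ["rdf", "#semanticweb"],
    ["owl", "#semanticweb"],
    ["pizza", "#lunch"],
]
vectors = hashtag_context_vectors(messages)
print(cosine(vectors["rdf"], vectors["owl"]))
print(cosine(vectors["rdf"], vectors["pizza"]))
```

Here "rdf" and "owl" never occur in the same message, so a pure co-occurrence network would not link them, while the shared hashtag context does.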

Page 17: Knowledge Acquisition from Social Awareness Streams

First Results (1)

Hashtag Stream: OR(RUa) S(Rh)

User List Stream: OR(RUa) S(RUL)

The type of stream aggregation influences emerging semantics

Hashtag stream aggregations are more robust against external disturbances than user list streams

Page 18: Knowledge Acquisition from Social Awareness Streams

First Results (2)

The type of network transformation influences emerging semantics

Hashtags seem to be good context indicators

Resource-hashtag networks reveal good latent conceptual structures

Page 19: Knowledge Acquisition from Social Awareness Streams

Limitations

Small Dataset

Only one topic/domain

Manual Evaluation

Page 20: Knowledge Acquisition from Social Awareness Streams

References

Z. Harris. Distributional structure. The Structure of Language: Readings in the Philosophy of Language, 10:146-162, 1954.

A. Maedche and S. Staab. Discovering Conceptual Relations from Text. In W. Horn (ed.): ECAI 2000, Proceedings of the 14th European Conference on Artificial Intelligence, Berlin, Amsterdam, 2000.

P. Mika. Ontologies are us: A unified model of social networks and semantics. Web Semantics, 5(1):5-15, 2007.

M. Naaman, J. Boase, and C.-H. Lai. Is it all about me? User content in social awareness streams. In Proceedings of the ACM 2010 Conference on Computer Supported Cooperative Work, 2010.

Page 21: Knowledge Acquisition from Social Awareness Streams

Thank you!

http://clauwa.info/me

[email protected]

http://twitter.com/clauwa