
Computing & Information Sciences, Kansas State University

Boulder, Colorado: First International Conference on Weblogs and Social Media (ICWSM-2007)

Structural Link Analysis from User Profiles and Friends Networks:

A Feature Construction Approach

William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi, Tim Weninger

Monday, 26 March 2007

Laboratory for Knowledge Discovery in Databases

Kansas State University

http://www.kddresearch.org/KSU/CIS/ICWSM-20070326.ppt


Link Analysis in Social Networks: The K-State Corpus


Outline

Background, Related Work and Rationale

Technical Objective: Link Mining in Social Networks

Methodology: Graph Feature Extraction

Experimental Results: K-State LJMiner Corpus

Continuing Work: Statistical Relational Models


Problem Statement: Link Mining in Social Networks

Problem Definition

Given: records of users of a weblog or social network service

Discover

Features of entities: users, communities

Relationships: friendship, membership, moderatorship

Explanations and predictions for relationships

Goals

Boost precision and recall of link existence prediction

Find relevant features

Significance: recommendations (friendship, membership)
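Link existence prediction, as framed above, can be cast as binary classification over candidate pairs of users. A minimal sketch of that setup, assuming the friends network is available as a Python dict mapping each user to the set of users they list as friends (a hypothetical format, not necessarily how LJMiner stores it); the function name labeled_candidate_pairs is likewise illustrative:

```python
from itertools import permutations


def labeled_candidate_pairs(friends):
    """Return ((u, v), label) for every ordered pair of distinct users.

    `friends` maps each user to the set of users they list as friends
    (hypothetical input format). label is 1 if the directed link u -> v
    exists and 0 otherwise.
    """
    users = sorted(friends)  # only users that appear as keys become candidates
    return [
        ((u, v), 1 if v in friends[u] else 0)
        for u, v in permutations(users, 2)
    ]


if __name__ == "__main__":
    friends = {"alice": {"bob"}, "bob": {"alice", "carol"}, "carol": set()}
    for pair, label in labeled_candidate_pairs(friends):
        print(pair, label)
```

Because the number of candidates grows as n(n - 1) for n users, a practical pipeline would likely sample the negative (label 0) pairs; the sketch enumerates every pair only for clarity.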


Related Work: Link Mining

Getoor & Diehl (2005) - Graphical model representations of link structure

Ketkar et al. (2005) - Data mining techniques vs. graph-based representations

Sarkar & Moore (2005) - Changes in link structure across discrete time steps

Popescul & Ungar (2003) - Entity-relationship (ER) model to predict links

Hill (2003), Bhattacharya & Getoor (2004) - Statistical relational learning to resolve identity uncertainty

Resig et al. (2004) - Predicting instant messaging (IM) online times using friends-graph degree

McCallum et al. (2005) - Inferring roles and topic categories from link analysis


Rationale

Limitations of Current State of the Art

Existing methods do not take graph features into account

Limited ability to select and extract features

Novel Contribution: Link Mining System

Extracts and computes features of the network model

Towards dependent types for relational link mining

Rationale

Desired functionality: infer new links from old ones

Evaluation: precision and recall for link existence prediction
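The evaluation criterion above can be made concrete as set overlap between predicted and actual links. A small sketch, assuming links are (source, target) tuples; the function name and the choice to return 0.0 for an empty denominator are assumptions, not taken from the slides:

```python
def link_precision_recall(predicted, actual):
    """Precision and recall of predicted directed links (u, v) against the true links.

    Returns 0.0 for a metric whose denominator is empty (a convention chosen here).
    """
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)  # true positives: predicted links that exist
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    predicted = {("a", "b"), ("a", "c"), ("b", "c")}
    actual = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "b")}
    print(link_precision_recall(predicted, actual))  # 2 hits: precision 2/3, recall 2/4
```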



K-State Test Bed: LJMiner Corpus

User: contact info

User: interests, schools, friends

Community: membership info
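The corpus contents listed above suggest a simple record layout. The dataclasses below are a hypothetical schema (field and class names are illustrative, not LJMiner's actual format), with helpers that turn the records into the directed friends graph and the user-community membership links:

```python
from dataclasses import dataclass, field


@dataclass
class UserRecord:
    """One crawled user profile (hypothetical schema mirroring the slide)."""
    username: str
    contact_info: dict = field(default_factory=dict)
    interests: set = field(default_factory=set)
    schools: set = field(default_factory=set)
    friends: set = field(default_factory=set)  # usernames this user lists


@dataclass
class CommunityRecord:
    """One crawled community page with its member list (hypothetical schema)."""
    name: str
    members: set = field(default_factory=set)


def friends_graph(users):
    """Directed adjacency map: u -> set of v that u lists as friends."""
    return {user.username: set(user.friends) for user in users}


def membership_links(communities):
    """User-to-community membership links as (username, community) pairs."""
    return {(member, c.name) for c in communities for member in c.members}
```

Keeping friends as a set per user preserves direction: v appearing in u's set says nothing about u appearing in v's.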


LiveJournal Topology [1]: Tools and Security Model

LJMindMap.com © 2004 mcfnord

© 2007 Denga, Inc.


LiveJournal Topology [2]: Definitions



Graph Features [1]: Node, Pair, Link-Dependent


Node-Dependent Features: specific to one node (vertex) within candidate pair

Indegree (u): “source popularity”

Outdegree (u): “source fertility”

Outdegree (v): “target fertility”

Indegree (v): “target popularity”
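The four node-dependent features above are plain degree counts. A sketch, assuming the friends network is given as directed (source, target) edges; the function and dictionary key names are illustrative:

```python
from collections import Counter


def degree_tables(edges):
    """Indegree and outdegree counts from directed (source, target) edges."""
    edges = set(edges)  # ignore duplicate listings of the same link
    outdeg = Counter(source for source, _ in edges)
    indeg = Counter(target for _, target in edges)
    return indeg, outdeg


def node_features(u, v, indeg, outdeg):
    """The four node-dependent features of candidate pair (u, v).

    Counter returns 0 for nodes with no edges, so isolated nodes are handled.
    """
    return {
        "source_popularity": indeg[u],   # Indegree(u)
        "source_fertility": outdeg[u],   # Outdegree(u)
        "target_fertility": outdeg[v],   # Outdegree(v)
        "target_popularity": indeg[v],   # Indegree(v)
    }
```

When the goal is to predict held-out links, the degrees would presumably be computed on the graph with those links removed, so that a candidate link does not contribute to its own features.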

Pair-Dependent Features: specific to one candidate pair of nodes (vertices)

Common entities of u and v: interests, friends, schools, etc.; attributes of common entities

Computed from a relational query on entities u, v

Link-Dependent Features: specific to one link (edge) in the directed graph

Past and predicted duration; diagnosed cause

Computed and stored with the relationship set
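Pair-dependent features are described as the result of a relational query on u and v. One way to express such a query is a self-join that counts entities both users share. The sketch below uses Python's built-in sqlite3 module with hypothetical tables interests, schools, and friends, each holding (username, entity) rows; the table layout and feature names are assumptions:

```python
import sqlite3

RELATIONS = ("interests", "schools", "friends")  # hypothetical table names


def make_demo_db():
    """In-memory database with one (username, entity) table per relation."""
    conn = sqlite3.connect(":memory:")
    for table in RELATIONS:
        conn.execute(
            f"CREATE TABLE {table} (username TEXT, entity TEXT, "
            "PRIMARY KEY (username, entity))"
        )
    conn.executemany(
        "INSERT INTO interests VALUES (?, ?)",
        [("alice", "hiking"), ("alice", "jazz"), ("bob", "jazz"), ("bob", "chess")],
    )
    conn.executemany(
        "INSERT INTO friends VALUES (?, ?)",
        [("alice", "carol"), ("alice", "dave"), ("bob", "carol")],
    )
    return conn


def pair_features(conn, u, v):
    """Count entities shared by u and v, using one self-join query per relation."""
    features = {}
    for table in RELATIONS:
        # Table names come from the fixed RELATIONS tuple; user names are bound
        # as query parameters rather than formatted into the SQL string.
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} AS a JOIN {table} AS b "
            "ON a.entity = b.entity WHERE a.username = ? AND b.username = ?",
            (u, v),
        ).fetchone()
        features[f"common_{table}"] = count
    return features


if __name__ == "__main__":
    print(pair_features(make_demo_db(), "alice", "bob"))
    # {'common_interests': 1, 'common_schools': 0, 'common_friends': 1}
```

Here common_friends counts users whom both u and v list as friends (shared out-neighbors); other definitions, such as mutual friendship, would need a different query.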


Graph Features [2]: Node and Pair Features in LJMiner

Feature groups: graph features and interest-related features


LJCrawler

System Design

Data acquisition: client, injector, parser

Ancillary issues: multi-threading, distribution, storage

Analytical postprocessing: LJClipper, LJStats

Distinguishing Features of LJCrawler

Results

200 users/second maximum, 5 users/second allowed

Approximately 2 million pages crawled
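The slide reports an allowed rate of 5 users/second against a 200 users/second maximum, but not how the limit was enforced across crawler threads. A minimal sketch, assuming a fixed-interval throttle shared by all threads (the Throttle class and its parameters are hypothetical, not LJCrawler's code):

```python
import threading
import time


class Throttle:
    """Fixed-interval rate limiter shared by crawler threads (hypothetical design).

    With rate=5.0, successive wait() calls return at least 0.2 s apart, which
    caps the sustained rate at 5 fetches per second. `clock` and `sleep` are
    injectable so the spacing can be tested without real delays.
    """

    def __init__(self, rate=5.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None
        self.lock = threading.Lock()  # threads queue here and take turns

    def wait(self):
        """Block until this caller's slot arrives, then reserve the next slot."""
        with self.lock:
            now = self.clock()
            if self.next_allowed is not None and now < self.next_allowed:
                self.sleep(self.next_allowed - now)
                now = self.next_allowed
            self.next_allowed = now + self.interval
```

Each crawler thread would call wait() on a shared Throttle before requesting a user page. Because the lock is held while sleeping, threads take turns, which keeps the spacing exact at the cost of some idle threads.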



Network Statistics: Graph Distance

[Figure panels: 1000 nodes; 4000 nodes]
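The plotted quantity itself is not recoverable from the transcript. Assuming the figure shows the distribution of shortest-path (hop) distances between users in the crawled friends graph, such a distribution can be computed with one breadth-first search per node. A sketch over an unweighted, directed adjacency dict in which every node appears as a key (a hypothetical format; the function names are illustrative); unreachable pairs are simply not counted:

```python
from collections import Counter, deque


def bfs_distances(graph, source):
    """Shortest-path hop counts from source along directed friend links."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_histogram(graph):
    """Count reachable ordered pairs (u, v), u != v, by graph distance."""
    hist = Counter()
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                hist[d] += 1
    return hist


if __name__ == "__main__":
    graph = {"a": {"b"}, "b": {"c"}, "c": set()}
    print(dict(distance_histogram(graph)))  # {1: 2, 2: 1}
```

One BFS per node costs O(n(n + m)) overall, which is manageable at the 1000 to 4000 node scale shown.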


Interpretation of Results

941-node graph (Hsu et al., 2006): LJCrawler v1 output

1000- to 4000-node graphs: LJCrawler v2 output



Results

Establishing an Interdisciplinary Research Initiative

K-State / KU / UNL collaboration

Resources: Linguistic Data Consortium

NIST evaluations

Involving End Users of Machine Translation

Document users

Machine learning, data mining, and information extraction researchers

Novel Applications

Social networks and collaborative recommendation

Gisting and beyond


Continuing Work

Information Extraction and Intelligent IR

Learning models for IE: ontologies

Latent semantic analysis

Machine Learning

Natural language learning

Time series learning and understanding

Relational and first-order models

Automated Reasoning

Probabilistic

Case-based and analogical

Data Mining and Warehousing

Grid Computing



Acknowledgements

K-State Lab for Knowledge Discovery in Databases

Vikas Bahirwani

Tejaswi Pydimarri

Andrew King

Social Networks, Graph Theory, Graph Algorithms

Kirsten Hildrum (IBM T. J. Watson Labs)

Todd Easton (K-State, Industrial and Manufacturing Systems Engineering)

Machine Learning

Dan Roth, Cinda Heeren, Jiawei Han (University of Illinois at Urbana-Champaign)

AnHai Doan (University of Wisconsin – Madison)


Questions and Discussion