Computing & Information Sciences, Kansas State University
Boulder, Colorado: First International Conference on Weblogs and Social Media (ICWSM-2007)
Structural Link Analysis from User Profiles and Friends Networks:
A Feature Construction Approach
William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi, Tim Weninger
Monday, 26 March 2007
Laboratory for Knowledge Discovery in Databases
Kansas State University
http://www.kddresearch.org/KSU/CIS/ICWSM-20070326.ppt
Link Analysis in Social Networks: The K-State Corpus
Outline
Background, Related Work and Rationale
Technical Objective: Link Mining in Social Networks
Methodology: Graph Feature Extraction
Experimental Results: K-State LJMiner Corpus
Continuing Work: Statistical Relational Models
Problem Definition
Given: records of users of weblog or social network service
Discover
Features of entities: users, communities
Relationships: friendship, membership, moderatorship
Explanations and predictions for relationships
Goals
Boost precision and recall of link existence prediction
Find relevant features
Significance: Recommendations (Friendship, Membership)
Problem Statement: Link Mining in Social Networks
Related Work: Link Mining
Getoor and Diehl (2005) - Graphical model representations of link structure
Ketkar et al. (2005) - Data mining techniques vs graph-based representation
Sarkar & Moore (2005) - Change in link structure across discrete time steps
Popescul & Ungar (2003) - ER model to predict links
Hill (2003), Bhattacharya & Getoor (2004) - Statistical Relational Learning to resolve identity uncertainty
Resig et al. (2004) - Predicting IM online times using friends graph degree
McCallum et al. (2005) - Inferring roles and topic categories based on link analysis
Rationale
Limitations of Current State of the Art
Do not take graph features into account
Limited ability to select, extract features
Novel Contribution: Link Mining System
Extracts, computes features of network model
Towards dependent types for relational link mining
Rationale
Desired functionality: infer new links from old
Evaluation: precision, recall for link existence
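The evaluation criterion named above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `precision_recall` and the toy user names are assumptions made for the example. Links are treated as directed (source, target) pairs.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted directed links against actual links."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy data: four predicted links, three links that really exist.
predicted_links = {("alice", "bob"), ("bob", "carol"),
                   ("carol", "alice"), ("dave", "alice")}
actual_links = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}

precision, recall = precision_recall(predicted_links, actual_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Two of the four predicted links exist, and two of the three existing links were predicted, so the two measures differ on this toy data.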
K-State Test Bed: LJMiner Corpus
UserContact Info
UserInterest, Schools, Friends
CommunityMembership Info
LiveJournal Topology [1]: Tools and Security Model
LJMindMap.com © 2004 mcfnord
© 2007 Denga, Inc.
LiveJournal Topology [2]: Definitions
Graph Features [1]: Node, Pair, Link-Dependent
Node-Dependent Features: specific to one node (vertex) within candidate pair
Indegree (u): “Source popularity”
Outdegree (u): “Source fertility”
Indegree (v): “Target popularity”
Outdegree (v): “Target fertility”
Pair-Dependent Features: specific to one candidate pair of nodes (vertices)
Common entities: interests, friends, schools, etc.
Attributes of common entities
Computed from relational query on entities u, v
Link-Dependent Features: specific to one link (edge) in directed graph
Past, predicted duration
Diagnosed cause
Computed and stored with relationship set
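The node-dependent and pair-dependent features above can be illustrated with a short sketch. It assumes the friends network is given as a list of directed (source, target) edges and that interests are a mapping from user to a set of strings; the function names and the toy data are hypothetical, not taken from LJMiner.

```python
from collections import defaultdict


def build_tables(edges):
    """Return indegree, outdegree, and out-neighbor sets for a directed edge list."""
    indegree, outdegree = defaultdict(int), defaultdict(int)
    friends = defaultdict(set)
    for source, target in edges:
        outdegree[source] += 1
        indegree[target] += 1
        friends[source].add(target)
    return indegree, outdegree, friends


def candidate_pair_features(u, v, edges, interests):
    """Node- and pair-dependent features for the candidate link u -> v."""
    indegree, outdegree, friends = build_tables(edges)
    return {
        # Node-dependent features
        "indegree_u": indegree[u],    # source popularity
        "outdegree_u": outdegree[u],  # source fertility
        "indegree_v": indegree[v],    # target popularity
        "outdegree_v": outdegree[v],  # target fertility
        # Pair-dependent features: counts of common entities
        "common_friends": len(friends[u] & friends[v]),
        "common_interests": len(interests.get(u, set()) & interests.get(v, set())),
    }


# Hypothetical toy network and interest lists.
edges = [("a", "u"), ("b", "u"), ("u", "c"),
         ("v", "a"), ("v", "b"), ("v", "c"), ("c", "v")]
interests = {"u": {"python", "hiking", "jazz"},
             "v": {"jazz", "hiking", "chess"}}

features = candidate_pair_features("u", "v", edges, interests)
print(features)
```

Link-dependent features such as duration are omitted because, per the slide, they are stored with the relationship itself rather than computed from the graph.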
Graph Features [2]: Node and Pair Features in LJMiner
Table columns: Graph Features; Interest-Related Features
LJCrawler
System Design
Data acquisition: client, injector, parser
Ancillary issues: multi-threading, distribution, storage
Analytical postprocessing: LJClipper, LJStats
Distinguishing features of LJCrawler
Results
200 users/second maximum, 5 users/second allowed
Approximately 2 million pages crawled
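The gap between the maximum and the allowed crawl rate implies some form of throttling. The sketch below shows one common way to cap a request rate; it is not LJCrawler's actual mechanism, which the slides do not describe. The `Throttle` and `FakeClock` classes are hypothetical, and the fake clock exists only so the demonstration runs without real delays.

```python
import time


class Throttle:
    """Allow at most `max_per_second` calls to wait() per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is permitted; return its start time."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.clock()
        self.next_allowed = now + self.min_interval
        return now


class FakeClock:
    """Simulated clock: sleeping just advances the stored time."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


# Simulate three user fetches at the allowed rate of 5 users/second.
fake = FakeClock()
throttle = Throttle(5, clock=fake.now, sleep=fake.sleep)
start_times = [throttle.wait() for _ in range(3)]
print(start_times)
```

In a real crawler the default `time.monotonic` and `time.sleep` arguments would be used, with `wait()` called before each page fetch.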
Network Statistics: Graph Distance
Figure panels: 1000 nodes; 4000 nodes
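Graph distance between two users is usually taken to be the length of the shortest directed path between them. The slide does not say how the distances were computed, so the breadth-first search below is only an illustrative sketch; the adjacency-dictionary representation and the function name are assumptions.

```python
from collections import deque


def graph_distance(adjacency, source, target):
    """Shortest directed path length from source to target, or None if unreachable."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical friends lists: each user maps to the users they list as friends.
friends = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}

forward = graph_distance(friends, "a", "e")
backward = graph_distance(friends, "e", "a")
print(forward, backward)
```

Because friendship links are directed, the distance from one user to another need not equal the distance in the opposite direction, as the two calls show.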
Interpretation of Results
941-node graph (Hsu et al., 2006): LJCrawler v1 output
1000-4000 node graphs: LJCrawler v2 output
Results
Establishing an Interdisciplinary Research Initiative
K-State / KU / UNL collaboration
Resources: Linguistic Data Consortium
NIST evaluations
Involving End Users of Machine Translation
Document users
Machine learning, data mining, info extraction researchers
Novel Applications
Social networks and collaborative recommendation
Gisting and beyond
Information Extraction and Intelligent IR
Learning models for IE: ontologies
Latent semantic analysis
Machine Learning
Natural language learning
Time series learning and understanding
Relational and first-order models
Automated Reasoning
Probabilistic
Case-based and analogical
Data Mining and Warehousing
Grid Computing
Continuing Work
Acknowledgements
K-State Lab for Knowledge Discovery in Databases
Vikas Bahirwani
Tejaswi Pydimarri
Andrew King
Social Networks, Graph Theory, Graph Algorithms
Kirsten Hildrum (IBM T. J. Watson Labs)
Todd Easton (K-State, Industrial and Manufacturing Systems Engineering)
Machine Learning
Dan Roth, Cinda Heeren, Jiawei Han (University of Illinois at Urbana-Champaign)
AnHai Doan (University of Wisconsin – Madison)