Wang-Chien Lee, Pervasive Data Access (iPDA) Group, Pennsylvania State University, wlee@cse.psu.edu

Post on 25-Feb-2016


Wang-Chien Lee

Pervasive Data Access (iPDA) Group

Pennsylvania State University

wlee@cse.psu.edu

Mining Social Network Big Data

Research Dimensions

Industry Day

[Figure: Intelligent Pervasive Data Access, with three research dimensions: Networks, Mobility, and Data]

4/3/14


Research Agenda

Location-Based Services

Road/Transportation Networks

Sensor Data Management

Peer-to-Peer Data Management

Wireless Data Broadcast and Mobile Access

Social Networks

Developing data management techniques for supporting complex services in networking and mobile environments

Big Data Landscape


Social Media


Location-based Social Networks

Important aspects:

Users (social network)

Places (locations)

Who visits where, in the form of check-in and trajectory logs


LBSN Applications and Research Opportunities

LBSN users can track and share their locations and relevant information.

Collective social intelligence can be leveraged from user-generated location data to enable novel applications.

LBSN applications:

Suggesting the best restaurants, finding popular hiking routes, or forming a biking community.

Recommendation services for locations, activities, trip planning, friends, etc.

Research opportunities:

Techniques for LBSN applications, social network analysis, user profiling, data management and mining, pervasive computing, etc. are urgently needed.


Point-of-Interest Recommendation

POI recommendation:

Helps a user explore new POIs

Helps local businesses gain customers

Example: Where to have dinner tonight?

Requirements:

Interests, e.g., seafood

Geo-proximity, e.g., not too far away

Real-time response, i.e., time is money

Collaborative Filtering

Treat POIs as items.

The idea is that a user’s preferences can be deduced from other users who exhibited similar visiting behavior at POIs in previous check-in activities.

The key issue is to find similar users and similar places/POIs effectively and efficiently.
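The idea of treating POIs as items can be illustrated with a minimal user-based CF sketch. The check-in data, the function names, and the choice of cosine similarity over binary check-in vectors are assumptions made for illustration; the slides do not say which similarity measure is used.

```python
from math import sqrt

# Hypothetical check-in data: user -> set of visited POIs (not from the slides).
checkins = {
    "u1": {"p1", "p2", "p3"},
    "u2": {"p1", "p2"},
    "u3": {"p4", "p5"},
    "u4": {"p2", "p3", "p4"},
}

def cosine(a, b):
    """Cosine similarity between two binary check-in vectors stored as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def cf_scores(target, checkins):
    """Score each POI the target has not visited by similarity-weighted votes."""
    visited = checkins[target]
    sims = {u: cosine(visited, pois)
            for u, pois in checkins.items() if u != target}
    total = sum(sims.values())
    candidates = set().union(*checkins.values()) - visited
    scores = {}
    for poi in candidates:
        votes = sum(s for u, s in sims.items() if poi in checkins[u])
        scores[poi] = votes / total if total else 0.0
    return scores

scores = cf_scores("u2", checkins)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Here the POIs visited by the users most similar to u2 rank first; a POI visited only by dissimilar users gets a score of zero.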


Social & Geo Influences

POI recommendation in LBSNs is more than a problem of item recommendation.

Social network: People may turn to friends for suggestions.

Geographical proximity:

Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.”

People may go to places near their home, office, or favored places.


Our Approach

Incorporate the following three factors:

User preference

Social influence from friends, who play a role in user activities

Geographical influence existing in user activities

[Figure: POI Recommendation System, with components User Preference, Social Influence, and Geo Influence, a DB, and check-ins as input]

User Preference

Recommendation based on user preference, i.e., a pure collaborative filtering (CF) approach

[Figure: User-POI check-in matrix for User1 to User5 over POI1 to POI5, with check-ins marked by X and annotated “Users with similar preference”]


Social Influence

Recommendation based on social influence: a social-influenced CF approach

The similarity function considers both the strength of the social tie and the check-in similarity.

[Figure: social graph of user1 to user5, the User-POI matrix for User1 to User5, and a Friend-POI matrix with rows User1, User2, and User4, all over POI1 to POI5]
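A similarity function that considers both the strength of the social tie and the check-in similarity can be sketched as a weighted sum of the two. This is only one possible form: measuring both terms with Jaccard overlap, the weight `eta`, and all data and function names below are assumptions, not details given in the slides.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def social_similarity(user, friend, friends, checkins, eta=0.5):
    """Blend social-tie strength and check-in similarity (eta is a tuning knob)."""
    tie = jaccard(friends[user], friends[friend])        # shared friends
    visits = jaccard(checkins[user], checkins[friend])   # shared POIs
    return (1 - eta) * tie + eta * visits

def social_scores(user, friends, checkins, eta=0.5):
    """Score unvisited POIs using only the user's friends as neighbors."""
    sims = {f: social_similarity(user, f, friends, checkins, eta)
            for f in friends[user]}
    total = sum(sims.values())
    candidates = set().union(*(checkins[f] for f in friends[user])) - checkins[user]
    return {
        p: (sum(s for f, s in sims.items() if p in checkins[f]) / total
            if total else 0.0)
        for p in candidates
    }

# Hypothetical friendship lists and check-ins.
friends = {"u1": {"u2", "u4"}, "u2": {"u1", "u4"}, "u4": {"u1", "u2"}}
checkins = {"u1": {"p1", "p2"}, "u2": {"p1", "p3"}, "u4": {"p4"}}
s = social_scores("u1", friends, checkins)
print(s)
```

Restricting the neighbors to friends corresponds to working on the Friend-POI matrix rather than the full User-POI matrix.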


Social Influence Selection Model

User u picks a friend f, possibly herself (i.e., f = u). This step models social influence.

User f generates a latent topic z. This step models user preference.

Latent topic z generates an item i and a descriptive word w.
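The three generative steps above can be sketched as a sampling routine. All probability tables, names, and values below are hypothetical placeholders; in the actual model they would be learned from data, and the slides do not describe how.

```python
import random

def draw(dist, rng):
    """Sample one key from a dict mapping outcome -> probability."""
    keys = sorted(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]

def sample_checkin(u, influence, topic_given_user,
                   item_given_topic, word_given_topic, rng):
    f = draw(influence[u], rng)          # step 1: pick a friend f (may be u)
    z = draw(topic_given_user[f], rng)   # step 2: f generates a latent topic z
    i = draw(item_given_topic[z], rng)   # step 3: z generates an item i ...
    w = draw(word_given_topic[z], rng)   # ... and a descriptive word w
    return f, z, i, w

# Hypothetical distributions for one user "u" with friends "a" and "b".
influence = {"u": {"u": 0.5, "a": 0.3, "b": 0.2}}
topic_given_user = {
    "u": {"food": 0.7, "outdoor": 0.3},
    "a": {"food": 0.2, "outdoor": 0.8},
    "b": {"food": 0.5, "outdoor": 0.5},
}
item_given_topic = {
    "food": {"seafood_place": 0.6, "noodle_bar": 0.4},
    "outdoor": {"trailhead": 1.0},
}
word_given_topic = {
    "food": {"dinner": 0.5, "tasty": 0.5},
    "outdoor": {"hiking": 1.0},
}

rng = random.Random(0)
f, z, i, w = sample_checkin("u", influence, topic_given_user,
                            item_given_topic, word_given_topic, rng)
print(f, z, i, w)
```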


Geographical Influence

Phenomenon of spatial clustering in users’ check-ins

How likely are two of a user’s check-in POIs to be within a given distance of each other?

Let p1 and p2 denote two POIs and d(p1, p2) their distance; the probability is denoted by Pr[d(p1, p2)].

The distribution follows a power law.
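A power law has the form Pr[d] = a · d^b, which becomes a straight line in log-log space: log Pr = log a + b · log d. A minimal sketch of estimating a and b by least squares on the logs follows; the fitting method and the synthetic data are assumptions, since the slides only name the power law.

```python
import math

def fit_power_law(distances, probabilities):
    """Fit Pr[d] = a * d**b by linear regression on (log d, log Pr)."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                         # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)     # intercept = log a
    return a, b

# Synthetic data generated from a known power law (not measured values).
distances = [1.0, 2.0, 4.0, 8.0, 16.0]
probabilities = [0.2 * d ** -1.5 for d in distances]
a, b = fit_power_law(distances, probabilities)
print(a, b)
```

On real check-in data the inputs would be binned pairwise distances and their empirical frequencies, and a negative exponent b would reflect the spatial clustering described above.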


Exploiting Geographical Influence for Recommendation

User i’s check-in history: Pi = {p1, p2, …}

Which POI is the best candidate to explore?

[Figure: user i’s visited POIs p1 to p5 and candidate POIs q1, q2, q3]

Pr[q1|Pi] = ?   Pr[q2|Pi] = ?   Pr[q3|Pi] = ?
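One way to answer the question Pr[q|Pi] = ? is to assume that the distances from a candidate q to each visited POI are independent, so that the score of q is the product of Pr[d(q, p)] over all p in Pi. That independence assumption, the power-law parameters, the planar coordinates, and the Euclidean distance below are all illustrative choices, not details taken from the slides.

```python
import math

def power_law(d, a=0.2, b=-1.5, min_d=0.1):
    """Unnormalized power-law weight; distances are clamped to avoid d = 0."""
    return a * max(d, min_d) ** b

def log_geo_score(q, history):
    """Sum of log weights = log of the product over visited POIs."""
    return sum(
        math.log(power_law(math.hypot(q[0] - p[0], q[1] - p[1])))
        for p in history
    )

# Hypothetical planar coordinates (e.g., km) for visited and candidate POIs.
history = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
candidates = {"q1": (0.5, 0.5), "q2": (5.0, 5.0), "q3": (20.0, 0.0)}

geo = {name: log_geo_score(q, history) for name, q in candidates.items()}
best = max(geo, key=geo.get)
print(best, geo)
```

Working with log scores avoids numerical underflow when the check-in history is long; only the ranking of candidates matters here, so the weights need not be normalized.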


Fusion Framework

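[Figure: each candidate POI receives three scores: Su from the user’s own preference, Ss from social influence, and Sg from geographical influence. The per-factor rankings of candidates q1, q2, and q3 are fused into a single score S and a final ranking.]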
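A simple way to fuse the three scores is a weighted linear combination, S = (1 − α − β)·Su + α·Ss + β·Sg. The slides do not state the fusion rule, so the linear form, the weights `alpha` and `beta`, and the candidate scores below are assumptions.

```python
def fuse(su, ss, sg, alpha=0.3, beta=0.2):
    """Linear fusion of preference (su), social (ss), and geo (sg) scores."""
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("alpha and beta must be non-negative and sum to <= 1")
    return (1 - alpha - beta) * su + alpha * ss + beta * sg

# Hypothetical per-factor scores (Su, Ss, Sg), each already scaled to [0, 1].
candidates = {
    "q1": (0.9, 0.2, 0.1),
    "q2": (0.4, 0.8, 0.3),
    "q3": (0.1, 0.3, 0.9),
}

fused = {q: fuse(*s) for q, s in candidates.items()}
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

The three inputs should be on a comparable scale before fusion, otherwise one factor dominates regardless of the weights.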


Semantic Annotation of Places

Tags are very useful! Tags can support:

1) Location search

2) Recommendation service

3) Data cleaning

4) …

Tags are missing.

[Pie chart: places missing tags vs. places with tags; slices of 32.00% and 68.00%]

The statistics above are summarized from our dataset collected from Whrrl. The statistics in our Foursquare dataset are similar.


Problem Description

Given a database of user check-in logs <who, where, when> in which some places are tagged, infer tags for the remaining places (i.e., the places marked with a question mark in the figure).

How to automatically assign appropriate tags to places is a very challenging issue!

Our approach is to reduce the place semantic annotation problem to a classification problem.
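A common way to make such a reduction concrete is to create one binary problem per tag: for tag t, every tagged place is a positive example if it carries t and a negative example otherwise. The sketch below builds those per-tag label sets; the place names, tags, and function name are hypothetical.

```python
def binary_problems(place_tags):
    """Turn place -> set of tags into tag -> {place: 1 or 0} label maps."""
    all_tags = sorted(set().union(*place_tags.values()))
    return {
        tag: {place: int(tag in tags) for place, tags in place_tags.items()}
        for tag in all_tags
    }

# Hypothetical tagged places (the training portion of the data).
place_tags = {
    "place_a": {"restaurant"},
    "place_b": {"restaurant", "bar"},
    "place_c": {"shopping"},
}

problems = binary_problems(place_tags)
for tag, labels in problems.items():
    print(tag, labels)
```

Each label map can then be paired with per-place features to train one binary classifier per tag, and an untagged place receives every tag whose classifier accepts it.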


The SAP Framework

How to learn the classifier for a tag (or tag type)?

Feature extraction is very important:

Features explicitly describing places

Features implicitly correlating similar places (i.e., places with the same or similar tags)

Feature source? Check-in logs

Classification process:

[Figure: the check-in logs of a place pass through the Feature Extraction Component; the extracted features feed binary classifiers for tags t1, t2, …, tm, which output decisions for t1, t2, …, tm]


Explicit Patterns (EP) Extraction

What are the explicit patterns associated with individual places?

EP feature list:

Total number of check-ins

Total number of unique visitors

Maximum number of check-ins by a single user

Daily probability of check-in

Hourly probability of check-in
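The EP features listed above can be computed directly from the check-in log of one place. In this sketch the daily probability is read as a distribution over the seven days of the week and the hourly probability as a distribution over the 24 hours of the day; that reading, the feature names, and the sample log are assumptions.

```python
from collections import Counter
from datetime import datetime

def ep_features(checkins):
    """checkins: list of (user, datetime) pairs recorded at a single place."""
    total = len(checkins)
    per_user = Counter(user for user, _ in checkins)
    by_day = Counter(ts.weekday() for _, ts in checkins)   # 0 = Monday
    by_hour = Counter(ts.hour for _, ts in checkins)
    return {
        "total_checkins": total,
        "unique_visitors": len(per_user),
        "max_checkins_single_user": max(per_user.values(), default=0),
        "daily_prob": [by_day[d] / total if total else 0.0 for d in range(7)],
        "hourly_prob": [by_hour[h] / total if total else 0.0 for h in range(24)],
    }

# Hypothetical check-in log for one place.
log = [
    ("alice", datetime(2014, 3, 31, 19, 5)),
    ("bob", datetime(2014, 3, 31, 20, 30)),
    ("alice", datetime(2014, 4, 1, 19, 45)),
    ("alice", datetime(2014, 4, 5, 12, 0)),
]

features = ep_features(log)
print(features["total_checkins"], features["unique_visitors"])
```

A place visited mostly on weekday evenings and one visited mostly on weekend mornings produce clearly different feature vectors even before any tags are known.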


Implicit Relatedness (IR) Extraction

Are places really correlated? If yes, how do we extract the IR between places?

Places checked in by the same user at around the same time are probably in the same category.

[Figure: check-in log of a user over Day 1 to Day 8, plotted by time of day from 00:00 to 23:59. Check-ins are labeled by category (Bars, Restaurant, Shopping, Gym, Health, Beauty, Spa), and one untagged place is marked “?”.]


Network of Related Places (NRP)

Build an NRP by exploring the regularities in user-place and time-place interactions.

[Figure: the Users × Places and Times × Places interactions are processed with Random Walk with Restart to obtain the relatedness between places, which forms the Network of Related Places (NRP)]
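Random Walk with Restart (RWR) scores how closely every node is related to a start node: at each step the walker follows a weighted edge with probability 1 − c or jumps back to the start node with probability c. The sketch below runs RWR on a single small weighted graph; how the user-place and time-place interactions are combined into one graph is not described in the slides, so the graph, the weights, and the restart probability are assumptions.

```python
def random_walk_with_restart(weights, start, restart=0.15, iters=200, tol=1e-10):
    """weights: node -> {neighbor: edge weight}; every neighbor must be a key."""
    nodes = sorted(weights)
    out_total = {u: sum(weights[u].values()) for u in nodes}
    scores = {u: 1.0 if u == start else 0.0 for u in nodes}
    for _ in range(iters):
        nxt = {u: (restart if u == start else 0.0) for u in nodes}
        for u in nodes:
            if out_total[u] == 0:
                # Dangling node: send its walking mass back to the start node.
                nxt[start] += (1 - restart) * scores[u]
                continue
            for v, w in weights[u].items():
                nxt[v] += (1 - restart) * scores[u] * w / out_total[u]
        delta = sum(abs(nxt[u] - scores[u]) for u in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores

# Hypothetical place graph; edge weights could count shared visitors.
weights = {
    "cafe": {"diner": 2.0, "mall": 1.0},
    "diner": {"cafe": 2.0},
    "mall": {"cafe": 1.0, "gym": 1.0},
    "gym": {"mall": 1.0},
}

rwr = random_walk_with_restart(weights, "cafe")
print(sorted(rwr.items(), key=lambda kv: kv[1], reverse=True))
```

The resulting scores can serve as relatedness values between the start place and every other place, with strongly connected neighbors scoring higher than places that are several hops away.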


Label Propagation on NRP

IR features: Tag 1: score1, Tag 2: score2, …, Tag k: scorek

[Figure: an untagged place “?” in the NRP is connected to places tagged restaurant and shopping. After label propagation it receives the scores Restaurant 0.66 and Shopping 0.34.]
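Label propagation can be sketched as follows: tagged places keep their tags, and each untagged place repeatedly takes the edge-weighted average of its neighbors’ tag distributions. The exact graph and edge weights in the figure are not recoverable, so the tiny graph below, with equal weights and an untagged place next to two restaurants and one shopping place, is an assumption.

```python
def propagate_labels(edges, labels, iters=50):
    """edges: node -> {neighbor: weight}; labels: node -> tag for tagged nodes."""
    tags = sorted(set(labels.values()))
    nodes = sorted(edges)
    dist = {n: {t: (1.0 if labels.get(n) == t else 0.0) for t in tags}
            for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            if n in labels:
                new[n] = dist[n]          # tagged places stay fixed
                continue
            total = sum(edges[n].values())
            new[n] = {
                t: (sum(w * dist[m][t] for m, w in edges[n].items()) / total
                    if total else 0.0)
                for t in tags
            }
        dist = new
    return dist

# Hypothetical NRP fragment: "x" is the untagged place.
edges = {
    "x": {"r1": 1.0, "r2": 1.0, "s1": 1.0},
    "r1": {"x": 1.0},
    "r2": {"x": 1.0},
    "s1": {"x": 1.0},
}
labels = {"r1": "restaurant", "r2": "restaurant", "s1": "shopping"}

result = propagate_labels(edges, labels)
print(result["x"])
```

With these equal weights the untagged place ends up with about two thirds restaurant and one third shopping; the per-tag scores are what the slide calls IR features.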


Conclusion

LBSNs have received a lot of attention from the research community.

LBSN data have rich social and location information.

Novel applications can be developed from the rich user-generated data in LBSNs.

We have incorporated social and geo influences into the collaborative filtering technique for POI recommendation.

To address the semantic annotation problem in LBSNs, we extract explicit patterns (EP) of individual places and implicit relatedness (IR) among places to infer the missing tags through classification.

New applications and more research are forthcoming.
