A Graph-Based Approach to Link
Prediction in Social Networks Using a Pareto-Optimal
Genetic Algorithm
Jeff Naruchitparames
University of Nevada, Reno - CSE
CS 790: Complex Networks, Fall 2010
biological
social
2
3
4
‣ Social networks =
‣ Dynamic, judgmental environment
‣ Affect friendships over time
5
very dynamic
heterogeneous
6
7
‣ 1-2 hop distance only
‣ Friend-of-friend
‣ Multiple hops; >1
‣ Structural; purely graph-based
‣ No explicit correlation between potential friends...
8
‣ Silva et al., A Graph-based Recommendation System Using Genetic Algorithms, 2010
9
10
11
Friends-of-Friends
2 hops
Filter Order
12
Filtering
“It’s more probable that you know a friend of your friend than any other random person”
Mitchell M., Complex Systems: Network Thinking, 2006.
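The friends-of-friends filter can be sketched in a few lines. This is only an illustration, assuming the network is stored as a dictionary of adjacency sets; the function name `friends_of_friends` and the toy graph are hypothetical, not taken from the talk.

```python
def friends_of_friends(graph, user):
    """Return {candidate: shared_friend_count} for nodes exactly 2 hops from `user`."""
    friends = graph[user]
    counts = {}
    for friend in friends:
        for candidate in graph[friend]:
            # Skip the user and anyone who is already a direct friend.
            if candidate != user and candidate not in friends:
                counts[candidate] = counts.get(candidate, 0) + 1
    return counts


# Hypothetical toy network (undirected, stored as adjacency sets).
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}
candidates = friends_of_friends(graph, "ann")
```

The shared-friend count falls out of the same pass, which is convenient because "# of shared friends" is also one of the features used later.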
13
14
15
Indexes
16
‣ Heterogeneity
‣ Human behavior and preferences
‣ Multiple hops
17
What’s missing?
Pretty much a filtering problem...
18
My approach
‣ Components (for filtering)
‣ Betweenness centrality
‣ Community detection
‣ Clique Percolation Method (CPM)
‣ Friends of friends
‣ 10-dimensional Pareto-optimal genetic algorithm
19
My approach
Betweenness Centrality
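For reference, betweenness centrality on an unweighted, undirected graph can be computed with Brandes' algorithm. The sketch below is a plain-Python version under the assumption that the graph is a dictionary of adjacency sets; the slides do not say how the scores feed into the filter, so only the computation is shown.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness (Brandes' algorithm) for an unweighted, undirected graph."""
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Phase 1: BFS from `source`, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


# Hypothetical path graph a - b - c: only b lies between the other two.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = betweenness_centrality(path)
```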
20
Community Detection
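The Clique Percolation Method treats two k-cliques as adjacent when they share k-1 nodes and takes each community to be the union of a connected chain of adjacent cliques. A minimal sketch follows, assuming adjacency sets and brute-force clique enumeration (only reasonable for small graphs); the function name and the example graph are hypothetical.

```python
from itertools import combinations


def k_clique_communities(graph, k=3):
    """Clique percolation: merge k-cliques that share k-1 nodes."""
    # Brute-force enumeration of all k-cliques.
    cliques = [
        frozenset(nodes)
        for nodes in combinations(sorted(graph), k)
        if all(b in graph[a] for a, b in combinations(nodes, 2))
    ]
    # Union-find over clique indices.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return list(communities.values())


# Hypothetical graph: triangles 1-2-3 and 2-3-4 share an edge;
# triangle 5-6-7 is attached only through the single edge 4-5.
graph = {
    1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5},
    5: {4, 6, 7}, 6: {5, 7}, 7: {5, 6},
}
communities = k_clique_communities(graph, k=3)
```

With k=3 the two overlapping triangles percolate into one community, while the third triangle stays separate because a single connecting edge is not enough.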
21
‣ Remove duplicates
‣ Remove our test cases
‣ (More on this later...)
22
The Genetic Algorithm Part
23
Pareto Fronts
24
The Features
1. # of shared friends
2. location
3. age range
4. general interest
5. music
6. attended same events
7. groups
8. movies
9. education
10. religion/politics
25
Pareto Optimality
‣ Localized to implementation of selection
‣ Feature subset selection
‣ We want to find the combination of feature subsets that gives the best solutions for determining friendships
26
Pareto Optimality and Feature Subset Selection
27
     F1  F2  F3  F4  F5  F6  F7  F8  F9  F10
C1    0   1   0   1   0   0   0   1   1    0
C2    1   1   0   0   0   1   0   1   0    1
...
Cn    0   0   0   0   1   0   0   1   0    0
A Point System
28
     F1  F2  F3  F4  F5  F6  F7  F8  F9  F10
U1    -   3   -  11   -   -   -  20  44    -
U2    -   1   -  13   -   -   -  31   9    -
...
Un    -  10   -  14   -   -   -  49  61    -
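Reading the two tables together, the first chromosome row appears to act as a mask over the first user's point row: points survive only where the bit is 1. The sketch below illustrates that reading. The helper name `apply_chromosome` is hypothetical, and the raw point values at masked-out positions are made up, since the slides only show the selected ones.

```python
def apply_chromosome(chromosome, points):
    """Keep a feature's points where the chromosome bit is 1; mask the rest with None."""
    return [p if bit == 1 else None for bit, p in zip(chromosome, points)]


# Chromosome C1 from the table: features F2, F4, F8 and F9 are switched on.
c1 = [0, 1, 0, 1, 0, 0, 0, 1, 1, 0]

# Raw points for user U1. Only the F2, F4, F8 and F9 values (3, 11, 20, 44)
# come from the slide; the other entries are invented for illustration.
u1_raw = [5, 3, 7, 11, 2, 9, 4, 20, 44, 6]

u1_masked = apply_chromosome(c1, u1_raw)
```

How the points themselves are assigned per feature is not stated in the slides, so that part is left out here.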
Pareto Optimality
‣ Compare with the test cases we removed earlier...
‣ For all chromosomes in population, do:
‣ If ALL test cases ≥ optimal Pareto front
‣ Calculate fitness
‣ Good to go
‣ Else
‣ Calculate fitness
‣ Continue onto next chromosome
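The comparison against a Pareto front rests on the notion of dominance. The sketch below shows one common formulation, assuming every objective is maximized and each candidate is a tuple of scores; the function names and sample points are hypothetical. How the removed test cases are scored against the front is not spelled out in the slides and is not modeled here.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective scores for four candidates.
points = [(3, 11), (1, 13), (10, 4), (2, 2)]
front = pareto_front(points)
```

Here `(2, 2)` drops out because `(3, 11)` beats it on both objectives, while the other three trade one objective against the other and stay on the front.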
29
Fitness Function
\[ \sum_{i=1}^{n} \sum_{j=1}^{10} p_i \ln(f_j)^{p_i - 1} \]
30
Continuing on with the Evolutionary Process
‣ Apply fitness proportional selection
‣ Randomly select 2 parents to mate
‣ Apply 1-point crossover (82% chance)
‣ Bit mutation (0.05% chance)
‣ Repeat until ALL test cases are better than the Pareto front OR fitness does not improve for 5 consecutive generations
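The loop described by these bullets can be sketched as follows. This is a rough illustration, not the talk's code: the rates are read literally from the slide (0.05% taken as a per-bit probability of 0.0005), the fitness function is a placeholder that counts 1-bits, `max_generations` is an added safety cap, and the Pareto-front stopping test is omitted, leaving only the "no improvement for 5 generations" rule.

```python
import random

CROSSOVER_RATE = 0.82    # "82% chance" from the slide
MUTATION_RATE = 0.0005   # "0.05% chance", assumed to apply per bit
PATIENCE = 5             # generations without improvement before stopping


def select(population, fitnesses, rng):
    """Fitness-proportional (roulette-wheel) selection of one parent."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def crossover(a, b, rng):
    """1-point crossover, applied with probability CROSSOVER_RATE."""
    if rng.random() < CROSSOVER_RATE:
        point = rng.randrange(1, len(a))
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(chromosome, rng):
    """Flip each bit independently with probability MUTATION_RATE."""
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in chromosome]


def evolve(population, fitness, rng, max_generations=1000):
    best, stale = max(population, key=fitness), 0
    for _ in range(max_generations):
        fitnesses = [fitness(c) for c in population]
        children = []
        while len(children) < len(population):
            a = select(population, fitnesses, rng)
            b = select(population, fitnesses, rng)
            children.extend(mutate(child, rng) for child in crossover(a, b, rng))
        population = children[:len(population)]
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best, stale = champion, 0
        else:
            stale += 1
            if stale >= PATIENCE:
                break
    return best


def placeholder_fitness(chromosome):
    """Hypothetical stand-in: count of 1-bits, plus 1 so weights stay positive."""
    return sum(chromosome) + 1


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(10)] for _ in range(20)]
best = evolve(population, placeholder_fitness, rng)
```

Roulette-wheel selection needs strictly positive total weight, which is why the placeholder fitness adds 1; a real fitness function would need a similar guarantee or a different selection scheme.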
31
1-Point Crossover
32
Conclusion
‣ Complex network theory + genetic algorithm + social theory
‣ Betweenness centrality
‣ Community detection
‣ Clique Percolation Method
‣ Binary 10-dimensional Pareto-optimal genetic algorithm
‣ Dominant, fitness proportional selection
‣ Several levels of filtering and selection (aka filtering ☺)
33
Future Work
‣ Better fitness function (need to ask sociologists)
‣ Weighted chromosome for Pareto optimization (as opposed to binary)
‣ Prove all this stuff actually works (sociology standpoint??)
‣ Parallelize or GPU-ize the code (it’s in Python)
34
35