Graph-Based Recommendation
Transcript of the lecture "Recomendación Basada en Grafos"
Denis Parra
IIC3633 – Sistemas Recomendadores
The Recommendation Problem
• Once again we revisit the recommendation problem.
• A valid alternative to the methods seen so far is to exploit the relations between items in the form of graphs.
Today
• Applying associative retrieval techniques to alleviate the sparsity problem in CF (Huang et al., 2004)
• The Link-Prediction Problem for Social Networks (Liben-Nowell & Kleinberg, 2007)
• SimRank: A Measure of Structural-Context Similarity (Jeh & Widom, 2002)
Paper 1
• Zan Huang, Hsinchun Chen, and Daniel Zeng. 2004. Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Trans. Inf. Syst. 22, 1 (January 2004), 116-142.
Summary
• Deals with the sparsity of user evaluations (ratings)
• Collaborative filtering is studied as a bipartite graph.
• Associative retrieval techniques (Spreading Activation) are applied over the graph.
• RESULT: when ratings are sparse, these graph-based techniques improve the results of collaborative filtering.
Reminder: bipartite graph
- A graph whose nodes can be separated into 2 disjoint, independent sets U and V
- Every edge connects a node in U with a node in V
The Sparsity Problem
• By 2004, the cold-start and new-item problems had been attacked using:
– Item-based CF (Sarwar 2001)
– Dimensionality reduction (Goldberg 2001)
– Hybrids (Balabanovic 2002, Basu 1998, Condliff 1999, etc.)
• None of the above methods had achieved absolute consensus on its success
CF as Associative Retrieval
• Basic idea: build a graph between users and items and explore transitive associations between them.
(Figure: a user–item graph illustrating transitive associations of 3 and 5 hops.)
Matrix Notation
• Consider the consumer/product matrix A
• Parameters: M = number of hops, α = decay (the weight associated with each link)
Example
• Given A
• Then, for M = 3, α = 0.5
• Then, for M = 5, α = 0.5
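As a rough sketch of this idea (not Huang et al.'s exact formulation, whose weighting scheme differs in detail), transitive user–item associations up to M hops can be accumulated from powers of the bipartite adjacency matrix, decayed by α per hop; only odd path lengths connect users to items:

```python
import numpy as np

def transitive_association(A, M=3, alpha=0.5):
    """Accumulate user-item associations over odd-length paths up to M hops.

    A: binary user x item interaction matrix.
    alpha: decay weight applied per hop (an assumed form; the exact
    weighting in Huang et al. 2004 may differ).
    """
    n_users, n_items = A.shape
    # Adjacency of the bipartite graph, with users and items as one node set.
    B = np.block([[np.zeros((n_users, n_users)), A],
                  [A.T, np.zeros((n_items, n_items))]])
    W = alpha * B
    power = W.copy()
    assoc = power.copy()
    for _ in range((M - 1) // 2):
        power = power @ W @ W   # advance two hops at a time (odd lengths only)
        assoc += power
    # The user->item block of the accumulated matrix holds the associations.
    return assoc[:n_users, n_users:]

A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1]], dtype=float)
print(transitive_association(A, M=3, alpha=0.5))
```

Note that the pair (user 0, item 1) has no direct interaction but receives a nonzero 3-hop association, which is exactly what alleviates sparsity.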
Difficulties
• Computing the power of a matrix can be very costly for large "c" and "n", which motivates the 3 methods tested by Huang et al. in the paper.
Research Assumption
• Spreading Activation methods will work best when the network has very low density; otherwise, over-activation may occur.
Models
• Constrained Leaky Capacitor Model (LCM)
• Branch-and-Bound
• Hopfield Net
LCM
• Proposed by Anderson (1983)
• Steps:
– Identify the initial node vector V, set D(0)
– Compute the activation levels
where (1-γ) is the speed of decay (0.8) and α is the efficiency (0.8)
– Stopping condition: in the paper, 10 iterations, top 50 nodes
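The iterate-spread-decay pattern can be sketched as follows. This is a schematic, not Anderson's exact LCM update (whose equation is not in the transcript); the degree normalization of W is an added assumption to keep activations bounded, while the parameter values match those quoted on the slide:

```python
import numpy as np

def lcm_activation(W, source, n_iter=10, alpha=0.8, decay=0.8, top_k=50):
    """Schematic leaky-capacitor-style spreading activation.

    NOT the exact LCM update of Anderson (1983); this only illustrates
    the inject-spread-decay pattern with the slide's parameter values
    (speed of decay 1-gamma = 0.8, efficiency alpha = 0.8, 10 iterations,
    top-50 nodes).
    """
    # Degree-normalize rows so repeated spreading stays bounded (assumption).
    Wn = W / W.sum(axis=1, keepdims=True)
    a = source.astype(float)
    for _ in range(n_iter):
        # Re-inject the source, spread to neighbors, decay the contribution.
        a = source + decay * alpha * (Wn @ a)
    top = np.argsort(-a)[:top_k]   # stopping: keep the top-k activated nodes
    return top, a

# Toy graph: a chain 0-1-2-3; activate node 0.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
src = np.array([1.0, 0.0, 0.0, 0.0])
top, act = lcm_activation(W, src, top_k=4)
print(top, act)
```

Activation decreases with distance from the source, which is the ranking signal used for recommendation.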
Branch-and-Bound
• Implementation based on (Chen & Ng 1995)
• Step 1, initialization: the node corresponding to the user is activated (1), all others = 0. The priority queue Q_priority is initialized with the active user node.
• Step 2, activation computation: pop nodes from Q_priority; for each neighboring node, compute its activation and add/update the activated node in Q_output.
• Step 3, stopping: determined empirically (70 iterations)
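The three steps above can be sketched with a priority queue. The activation formula below (alpha times edge weight times the popped node's activation) is an illustrative assumption, not the exact Chen & Ng update:

```python
import heapq

def branch_and_bound_activation(adj, start, alpha=0.5, max_pops=70):
    """Best-first spreading activation with a priority queue (sketch).

    adj: dict node -> list of (neighbor, weight).
    A neighbor's activation is assumed to be alpha * weight *
    activation(current node); the exact formula in Chen & Ng (1995) /
    Huang et al. (2004) may differ.
    """
    activation = {start: 1.0}
    # heapq is a min-heap, so push negated activations for best-first order.
    q_priority = [(-1.0, start)]
    q_output = {}
    pops = 0
    while q_priority and pops < max_pops:   # empirical stopping condition
        neg_a, node = heapq.heappop(q_priority)
        pops += 1
        for nbr, w in adj.get(node, []):
            a = alpha * w * (-neg_a)
            if a > activation.get(nbr, 0.0):  # keep the best activation seen
                activation[nbr] = a
                q_output[nbr] = a
                heapq.heappush(q_priority, (-a, nbr))
    return q_output

# Toy bipartite graph: users u1, u2 and items i1, i2 (unit weights).
adj = {"u1": [("i1", 1.0)],
       "i1": [("u1", 1.0), ("u2", 1.0)],
       "u2": [("i1", 1.0), ("i2", 1.0)],
       "i2": [("u2", 1.0)]}
print(branch_and_bound_activation(adj, "u1"))
```

Item i2, never bought by u1, ends up activated through the transitive path u1 → i1 → u2 → i2.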
Hopfield Net
• Analogy with a neural network: users and items are neurons, and the synapses carry the activations.
• Initialization: same as the previous methods
• Activation computation: each neuron's activation is a sigmoid transfer function of the weighted sum of its neighbors' activations
• Stopping condition: stop when activations change less than a small ε between consecutive iterations
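A minimal sketch of this style of update, assuming a standard sigmoid transfer function and a small-change convergence test (the exact transfer function and threshold in Huang et al. 2004 may differ):

```python
import numpy as np

def hopfield_activation(W, source, threshold=1e-4, max_iter=100):
    """Hopfield-net-style spreading activation (sketch).

    Users and items are treated as neurons and edge weights as synapses.
    The sigmoid transfer and the stopping threshold are assumptions made
    for illustration.
    """
    a = source.astype(float)
    for _ in range(max_iter):
        net = W @ a                          # net input to each neuron
        new_a = 1.0 / (1.0 + np.exp(-net))   # sigmoid transfer function
        if np.abs(new_a - a).sum() < threshold:  # stopping condition
            break
        a = new_a
    return a

# Toy graph: a chain 0-1-2; activate node 0.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
src = np.array([1.0, 0.0, 0.0])
act = hopfield_activation(W, src)
print(act)
```

At convergence the initial asymmetry washes out and activation reflects connectivity (the degree-2 middle neuron ends up most active), which is why over-activation becomes a risk on dense graphs.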
Experimental Study
• Online bookstore in China
9,695 books / 2,000 users / 18,771 transactions
• Evaluation metrics:
Precision, Recall, F-1
• And utility rank, where h is the viewing halflife (the rank of the book on the list such that there is a 50% chance the user will view that book)
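The utility-rank metric paraphrased above is the half-life utility (R-score) of Breese et al.; with binary relevance it can be computed as:

```python
def half_life_utility(ranked_items, relevant, h=5):
    """Half-life utility (R-score) for one user, binary relevance.

    ranked_items: recommendation list, best first.
    relevant: set of items the user actually viewed/bought.
    h: viewing half-life -- the rank at which the probability that the
    user views the item has dropped to 50%.
    """
    score = 0.0
    for j, item in enumerate(ranked_items, start=1):
        if item in relevant:
            # The utility of a hit decays exponentially with its rank.
            score += 1.0 / (2 ** ((j - 1) / (h - 1)))
    return score

print(half_life_utility(["a", "b", "c", "d"], {"a", "c"}, h=5))
```

A hit at rank 1 contributes 1.0; a hit at rank h contributes 0.5; hits far down the list contribute almost nothing.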
Recall the hypotheses
• H1. Spreading activation-based CF can achieve higher recommendation quality than the 3-hop, User-based (Correlation), User-based (Vector Similarity), and Item-based approaches.
• H2. Spreading activation-based CF can achieve higher recommendation quality than the 3-hop, User-based (Correlation), User-based (Vector Similarity), and Item-based approaches for new users (the cold-start problem).
• H3. The recommendation quality of spreading activation-based CF decreases when the density of user–item interactions is beyond a certain level (the over-activation effect).
Results
H1: Algorithm comparison under normal conditions
H2: Algorithm comparison with sparse users
Results 2
H2: Algorithm comparison based on Recall, with cold-start users
Results 3
Results 3.2
Computational Efficiency
Lessons
• H1, H2 and H3 are confirmed
• Parameter sensitivity:
– LCM: not very sensitive (alpha, gamma, and number of iterations)
– BNB: the difference between 70 and 100 iterations is small; above 100 iterations, performance drops drastically
– Hopfield Net: little difference across parameter settings
Paper 2
• Liben-Nowell, D., & Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019-1031.
The Problem
Definitions
Image from http://be.amazd.com/link-prediction/
Notation for the physics arXiv
Metric 1: Graph Distance
Image from http://be.amazd.com/link-prediction/
Common Neighbors
Image from http://be.amazd.com/link-prediction/
Jaccard
Image from http://be.amazd.com/link-prediction/
Adamic-Adar
Preferential Attachment
Image from http://be.amazd.com/link-prediction/
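The four neighborhood-based scores from the slides above (the formulas did not survive the transcript) can be computed directly from neighbor sets; a minimal sketch:

```python
import math

def neighborhood_scores(neighbors, x, y):
    """Classic neighborhood-based link-prediction scores for a pair (x, y),
    as studied by Liben-Nowell & Kleinberg.

    neighbors: dict node -> set of neighbors.
    """
    gx, gy = neighbors[x], neighbors[y]
    common = gx & gy
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(gx | gy) if gx | gy else 0.0,
        # Adamic-Adar: rare (low-degree) common neighbors count more.
        # Degree-1 nodes are skipped to avoid dividing by log(1) = 0.
        "adamic_adar": sum(1.0 / math.log(len(neighbors[z]))
                           for z in common if len(neighbors[z]) > 1),
        "preferential_attachment": len(gx) * len(gy),
    }

neighbors = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"b"},
}
print(neighborhood_scores(neighbors, "a", "b"))
```

Each score ranks the non-edge (x, y) by how strongly the current neighborhoods suggest a future link.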
Katz
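The Katz score sums over paths of every length l, weighted by beta**l; for beta below 1/lambda_max(A) the series has the closed form (I - beta*A)^-1 - I. A sketch (beta = 0.1 is an illustrative choice):

```python
import numpy as np

def katz_scores(A, beta=0.05):
    """Katz link-prediction scores: sum over all path lengths l of
    beta**l * (number of paths of length l), computed in closed form
    as (I - beta*A)^-1 - I.

    beta must be smaller than 1/lambda_max(A) for the series to converge.
    """
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

# Path graph 0-1-2: the pair (0, 2) has no edge but one 2-hop path.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
S = katz_scores(A, beta=0.1)
print(S)
```

The non-adjacent pair (0, 2) still gets a positive score from its length-2 path, but less than the direct neighbors (0, 1), since the leading terms are beta**2 versus beta.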
Spectral / Random Walk Methods
• Hitting Time
• Rooted Page Rank
• SimRank
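Of the random-walk scores listed above, Rooted PageRank ranks candidate neighbors of a root node by the stationary distribution of a walk that periodically restarts at the root. A minimal power-iteration sketch (the restart probability alpha = 0.15 is an illustrative choice):

```python
import numpy as np

def rooted_pagerank(A, root, alpha=0.15, n_iter=100):
    """Rooted (personalized) PageRank as a link-prediction score.

    With probability alpha the surfer jumps back to `root`; otherwise it
    follows a random out-edge. score(root, y) = stationary prob. of y.
    """
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    e = np.zeros(n)
    e[root] = 1.0
    pi = e.copy()
    for _ in range(n_iter):
        pi = alpha * e + (1 - alpha) * pi @ P
    return pi

# Path graph 0-1-2-3, rooted at node 0.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
pi = rooted_pagerank(A, root=0)
print(pi)
```

Nodes near the root accumulate more stationary mass than distant ones, which is the link-prediction signal.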
Results
Results 2
Paper 3
• Glen Jeh and Jennifer Widom (2002). SimRank: A Measure of Structural-Context Similarity. ACM SIGKDD 2002.
Motivation
• Many applications require a measure of "similarity" between objects.
▪ Web search
▪ Shopping recommendations
▪ Search for "related work" among scientific papers
• But "similarity" may be domain-dependent.
• Can we define a generic model for similarity?
Common Ground
• What do all these applications have in common?
→ a data set of objects linked by a set of relations.
• Then, a generic concept of similarity is structural-context similarity:
▪ "Two objects are similar if they relate to similar objects."
• Recall automorphic equivalence:
▪ "Two objects are equivalent if they relate to equivalent objects."
Problem Statement
• Given a graph G = (V, E), for each pair of vertices a,b ∈ V, compute a similarity (ranking) score s(a,b) based on the concept of structural-context similarity.
Basic Graph Model
• Directed Graph G = (V,E)
▪ V = set of objects
▪ E = set of unweighted edges
▪ Edge (u,v) exists if there is a relation u → v
▪ I(v) = set of in-neighbors of vertex v
▪ O(v) = set of out-neighbors of vertex v
SimRank Similarity
▶ Recursive model
– "Two objects are similar if they are referenced by similar objects"
– That is, a ~ b if
⚫ c → a and d → b, and
⚫ c ~ d
– An object is equivalent to itself (score = 1)
▶ Example
1. ProfA ~ ProfB because both are referenced by Univ.
2. StudentA ~ StudentB because they are referenced by similar nodes {ProfA, ProfB}
Basic SimRank Equation
• s(a,b) = similarity between a and b = average similarity between the in-neighbors of a and the in-neighbors of b
▪ s(a,b) is in the range [0, 1]
▪ If a = b, then s(a,b) = 1
▪ If a ≠ b: s(a,b) = C / (|I(a)||I(b)|) · Σ_i Σ_j s(I_i(a), I_j(b))
▪ C is a constant, 0 < C < 1
▪ If I(a) or I(b) = ∅, then s(a,b) = 0
Decay Factor C
• x is identical to itself: s(x,x) = 1
• Since we have x → a and x → b, should s(a,b) = 1 also?
▪ If the graph represented all the information about x, a, and b, then s(a,b) would ideally be 1.
▪ But in reality the graph does not describe everything about them, so we expect s(a,b) < 1.
• Therefore, the constant C expresses our limited confidence, or decay with distance:
s(a,b) = C ∙ average similarity of (I(a), I(b))
G2 Paired-Vertex Perspective
• Given graph G, define G² = (V², E²) where
▪ V² = V × V. Each vertex in V² is a pair of vertices in V.
▪ E²: (a,b) → (c,d) in G² iff a → c and b → d in G
• Since similarity scores are symmetric, (a,b) and (b,a) are merged into a single vertex.
Source and Flow of Similarity
• The SimRank score for a vertex (a,b) in G² = the similarity between a and b in G.
• The source of similarity is the self-vertices, like (Univ, Univ).
• Then similarity propagates along pair-paths in G², away from the sources.
• Note that values decrease away from (Univ, Univ).
SimRank in Bipartite Domains
• Bipartite: 2 types of objects
▪ Example: Buyers and Items
• Two types of similarity:
▪ Two buyers are similar if they buy similar items; the out-neighbors of the buyers are relevant
▪ Two items are similar if they are bought by similar buyers; the in-neighbors of the items are relevant
• In general, we can use I(.) and/or O(.) for any graph
Bipartite SimRank Equations
• s(A,B) = C1 / (|O(A)||O(B)|) · Σ_i Σ_j s(O_i(A), O_j(B)) for buyers A ≠ B
• s(c,d) = C2 / (|I(c)||I(d)|) · Σ_i Σ_j s(I_i(c), I_j(d)) for items c ≠ d
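The two coupled bipartite equations can be solved by fixed-point iteration. A small sketch on a toy purchase graph (the item names and fixed iteration count are illustrative; the empty-neighborhood case is not handled):

```python
from collections import defaultdict

def bipartite_simrank(purchases, C1=0.8, C2=0.8, k=10):
    """Iterative bipartite SimRank (sketch of the Jeh & Widom equations).

    purchases: dict buyer -> set of items bought.
    Buyer similarity averages item similarities over out-neighbors O(.);
    item similarity averages buyer similarities over in-neighbors I(.).
    """
    items_of = {b: set(i) for b, i in purchases.items()}
    buyers_of = defaultdict(set)
    for b, its in items_of.items():
        for i in its:
            buyers_of[i].add(b)
    buyers, items = list(items_of), list(buyers_of)
    sb = {(a, b): 1.0 if a == b else 0.0 for a in buyers for b in buyers}
    si = {(c, d): 1.0 if c == d else 0.0 for c in items for d in items}
    for _ in range(k):
        new_sb = {}
        for a in buyers:
            for b in buyers:
                if a == b:
                    new_sb[(a, b)] = 1.0
                    continue
                # C1-damped average over pairs of purchased items.
                tot = sum(si[(c, d)] for c in items_of[a] for d in items_of[b])
                new_sb[(a, b)] = C1 * tot / (len(items_of[a]) * len(items_of[b]))
        new_si = {}
        for c in items:
            for d in items:
                if c == d:
                    new_si[(c, d)] = 1.0
                    continue
                # C2-damped average over pairs of buyers.
                tot = sum(sb[(a, b)] for a in buyers_of[c] for b in buyers_of[d])
                new_si[(c, d)] = C2 * tot / (len(buyers_of[c]) * len(buyers_of[d]))
        sb, si = new_sb, new_si
    return sb, si

purchases = {"u1": {"sugar", "frosting"},
             "u2": {"sugar", "eggs"},
             "u3": {"flour"}}
sb, si = bipartite_simrank(purchases)
print(sb[("u1", "u2")], si[("frosting", "eggs")])
```

Buyers u1 and u2 become similar through their shared item, and that similarity then makes "frosting" and "eggs" similar even though no buyer bought both.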
MiniMax Variant
• Motivation: Two students A and B take the same courses: {Eng1, Math1, Chem1, Hist1}
▪ SimRank compares each course of A with each course of B
▪ But intuitively we just want the best matching pairs: s(Eng1_A, Eng1_B), s(Math1_A, Math1_B), etc.
• Solution: Two steps
▪ Max: Pair each neighbor of A with only its most similar neighbor of B, giving s_A(A,B). Do the same in the other direction to get s_B(A,B).
▪ Min: The final s(A,B) is the smaller of s_A(A,B) and s_B(A,B) [weakest link]
Computing SimRank
• R_k(a,b) = estimate of SimRank after k iterations.
▪ Initialization: R_0(a,b) = 1 if a = b, else 0
▪ Iteration: R_{k+1}(a,b) = C / (|I(a)||I(b)|) · Σ_i Σ_j R_k(I_i(a), I_j(b)) for a ≠ b, and R_{k+1}(a,a) = 1
▪ R_k(a,b) is the similarity that has flowed a distance k away from the sources.
▪ R_k values are non-decreasing as k increases.
• We can prove that R_k(a,b) converges to s(a,b)
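The R_k iteration for basic (in-neighbor) SimRank, in a naive form on a simplified version of the Univ/Prof/Student example from the earlier slide:

```python
def simrank(in_neighbors, C=0.8, k=10):
    """Naive fixed-point iteration for basic SimRank (R_k converges to s).

    in_neighbors: dict node -> list of in-neighbors I(v).
    """
    nodes = list(in_neighbors)
    # Initialization: R_0(a,b) = 1 if a == b, else 0.
    R = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(k):
        new_R = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_R[(a, b)] = 1.0
                    continue
                Ia, Ib = in_neighbors[a], in_neighbors[b]
                if not Ia or not Ib:          # s(a,b) = 0 if I(a) or I(b) is empty
                    new_R[(a, b)] = 0.0
                    continue
                total = sum(R[(x, y)] for x in Ia for y in Ib)
                new_R[(a, b)] = C * total / (len(Ia) * len(Ib))
        R = new_R
    return R

# Simplified slide example: Univ -> ProfA, ProfB; ProfA -> StudentA;
# ProfB -> StudentB.
in_neighbors = {"Univ": [],
                "ProfA": ["Univ"], "ProfB": ["Univ"],
                "StudentA": ["ProfA"], "StudentB": ["ProfB"]}
R = simrank(in_neighbors)
print(R[("ProfA", "ProfB")], R[("StudentA", "StudentB")])
```

Similarity flows out from the self-pair (Univ, Univ): the professors score C = 0.8 and the students, one step further away, score C² = 0.64, illustrating the decay with distance from the source.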
Time and Space Complexity
• Space complexity: O(n²) to store R_k(a,b)
• Time complexity: O(K n² d²), where d² is the average of |I(a)||I(b)| over all vertex pairs (a,b)
• To improve performance, we can prune G²:
▪ Idea: vertices that are far apart should have very low similarity. We can approximate it as 0.
▪ Select a radius r. If a vertex pair (a,b) cannot meet in fewer than r steps, remove it from the graph G².
▪ Space complexity: O(n d_r)
▪ Time complexity: O(K n d_r d²), where d_r = avg. number of neighbors within radius r
Random Surfer-Pairs Model
• SimRank s(a,b) measures how soon two random surfers are expected to meet at the same node if they start at nodes a and b and randomly walk the graph backwards.
• Background: basic forward random walk
▪ Motion is in discrete steps, using edges of the graph.
▪ At each time step, there is an equal probability of moving from your current vertex to one of your out-neighbors.
▪ Given adjacency matrix A, the probability of walking from x to y is p_xy = a_xy / |O(x)|.
• Random walk as a Markov process
▪ The initial location is described by the probability distribution vector π(0)
▪ Probability of being at y at time 1: π_y(1) = Σ_x π_x(0) · p_xy
Random Walk Transition Matrices
• Given adjacency matrix A, the forward transition matrix normalizes each row of A by the out-degree |O(x)|; the backward transition matrix instead normalizes by the in-degree, so each backward step moves to a uniformly chosen in-neighbor.
Paired Backwards Random Walk
▶ Probability of walking backwards from a to x in one step: p(x ← a) = a_xa / |I(a)|
▶ Two walkers meet at x if they start at a and b, and one goes x ← a while the other goes x ← b.
s_x(a,b) = P(meeting at x) = p(x ← a) · p(x ← b)
s(a,b) = P(meeting) = Σ_x p(x ← a) · p(x ← b)
▶ If they start together, they have already met, so s⁽⁰⁾(x,y) = 1 if x = y; 0 otherwise [identity matrix]
▶ Then s(a,b) can be interpreted as the expected value of C^l, where l is the time at which the two backward surfers first meet.
Experiments: Data Sets
• Two data sets
▪ ResearchIndex (www.researchindex.com)
• a corpus of scientific research papers
• 688,898 cross-references among 278,628 papers
▪ Students' transcripts
• 1,030 undergraduate students in the School of Engineering at Stanford University
• Each transcript lists all courses that the student has taken so far (average: 40 courses/student)
Performance Validation Metric
• Problem: It is difficult to know the "correct" similarity between items.
• Solution: Define a rough domain-specific metric σ(p,q):
▪ For scientific papers, we have two versions:
σ_C(p,q) = fraction of q's citations also cited by p
σ_T(p,q) = fraction of words in q's title also in p's title
▪ For university courses:
σ_D(p,q) = 1 if p, q are in the same department, else 0
Computing the Performance Score
• Run the similarity algorithms:
▪ SimRank (naïve, pruned, minimax)
▪ Co-citation
• For each object p and algorithm A, form a set top_{A,N}(p) of the N objects most similar to p.
• For each q ∈ top_{A,N}(p), compute σ(p,q).
• Return the average σ_{A,N}(p) over all q.
Experiment: Scientific Papers
• Setup
▪ Used bipartite SimRank, only considering in-neighbors (validation uses out-neighbors)
▪ N ∈ {5, 10, …, 45, 50}
• Results
▪ Not very sensitive to the decay factors C1 and C2
▪ Pruning the search radius had little effect on the rank order of scores.
Results: Scientific Papers
Experiment: Students and Courses
• Setup
▪ Bipartite domain
▪ N ∈ {5, 10}
• Results
▪ The Min-Max version of SimRank performed the best
▪ Not very sensitive to the decay factors C1 and C2
Results: Students and Courses
Co-citation scores are very poor (0.161 for N=5 and 0.147 for N=10), so they are not shown in the graph.
Conclusions
• Defined a recursive model of structural similarity between objects in a network
• Mathematically formulated SimRank based on the recursive concept
• Presented a convergent algorithm to compute SimRank
• Described a random-walk interpretation of SimRank equations and scores
• Experimentally validated SimRank over two real data sets
Open Issues and Critique
• O(n²) space is large; scalability needs to be improved.
• s(a,b) only includes contributions from paths where a and b are the same distance from some x. What if the distances are offset (the total path length is odd)?
• As |I(a)| and |I(b)| increase, SimRank decreases, even if I(a) = I(b)!
▪ Addressed partially by the Minimax method
References
• Zan Huang, Hsinchun Chen, and Daniel Zeng. 2004. Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering. ACM Transactions on Information Systems 22(1), 116-142.
• Liben-Nowell, D., & Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7), 1019-1031.
• Jeh, G., & Widom, J. (2002). SimRank: A measure of structural-context similarity. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, July 2002.
• Nguyen, P., Tomeo, P., Di Noia, T., & Di Sciascio, E. (2015). An evaluation of SimRank and Personalized PageRank to build a recommender system for the Web of Data. In Proceedings of the 24th International Conference on World Wide Web Companion (pp. 1477-1482). International World Wide Web Conferences Steering Committee.