22 May 2006 Wu, Goel and Davison Models of Trust for the Web (MTW) Workshop Lehigh University.
22 May 2006, Wu, Goel and Davison
Models of Trust for the Web (MTW), WWW2006 Workshop
Introduction: Web Search
- Web search: the access to the Web for hundreds of millions of people
- Hundreds of millions of queries per day
- Queries + people = TRAFFIC
- A HUGE incentive for web site owners to rank highly in search engine results
  - Communicate some message (advertising, political statement)
  - Install viruses, adware, etc.
Yahoo!, MSN Search, Ask, A9, Exalead, Gigablast + metasearch + many more!
Introduction: Web Spam
- a.k.a. search engine spam, spamdexing
- Any technique to manipulate search engine results
  - Target page gets an undeservedly higher ranking
- Many methods
  - Link farms, keyword stuffing, cloaking, link bombs, and more
- The target of much of our work!
Propagating Trust and Distrust to Demote Web Spam
Baoning Wu, Vinay Goel, and Brian D. Davison
Computer Science & Engineering, Lehigh University, Bethlehem, PA USA
Outline
- Background and motivation
- Proposed methods
- Experimental results
Background: PageRank (Page and Brin, 1998)
- Uses number and status of "parents" to determine status of child
    r(i+1) = (1-α) * T * r(i) + α * s
  - r: PageRank score vector (with N nodes)
  - T: transition matrix (NxN)
  - (1-α): decay factor; α: jump probability
  - s: uniform distribution of 1/N
- PageRank scores generate a ranking of the importance of nodes
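The iteration above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `pagerank`, the toy graph, the fixed iteration count, and the handling of nodes without out-links are all assumptions made for the example.

```python
def pagerank(out_links, alpha=0.15, iterations=50):
    """Iterate r(i+1) = (1-alpha) * T * r(i) + alpha * s with uniform s."""
    nodes = sorted(out_links)
    n = len(nodes)
    s = {v: 1.0 / n for v in nodes}   # uniform jump distribution, 1/N each
    r = dict(s)                       # start from the uniform vector
    for _ in range(iterations):
        nxt = {v: alpha * s[v] for v in nodes}
        for parent in nodes:
            children = out_links[parent]
            if not children:
                # Assumption: a node without out-links spreads its score uniformly.
                for v in nodes:
                    nxt[v] += (1 - alpha) * r[parent] / n
            else:
                # Each parent divides its decayed score equally among its children.
                share = (1 - alpha) * r[parent] / len(children)
                for child in children:
                    nxt[child] += share
        r = nxt
    return r


# Hypothetical four-node graph: node -> list of nodes it links to.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, node `c` has the most parents and ends up ranked first, while `d` has no parents and keeps only its jump share.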
Background: TrustRank (Gyongyi and Garcia-Molina, VLDB 2004)
- Uses number and trust of "parents" to determine trust status of child
    t(i+1) = (1-α) * T * t(i) + α * s
  - t: TrustRank score vector (with N nodes)
  - T: transition matrix (NxN)
  - (1-α): decay factor
  - s: seed set trust score distribution
    - Vector of size N, but only seed nodes are non-zero
- Demotes web spam by propagating trust from a known good seed set.
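As a rough illustration of how the seed distribution s changes the outcome, the sketch below runs the same update with s concentrated on a seed set. The function name `trustrank`, the toy graph, and the choice to let trust leak at nodes without out-links are assumptions for the example, not details taken from the slides.

```python
def trustrank(out_links, seeds, alpha=0.15, iterations=50):
    """Iterate t(i+1) = (1-alpha) * T * t(i) + alpha * s, with s non-zero only on seeds."""
    nodes = list(out_links)
    s = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    t = dict(s)
    for _ in range(iterations):
        nxt = {v: alpha * s[v] for v in nodes}
        for parent, children in out_links.items():
            # Assumption: trust held by a node with no out-links is simply dropped.
            for child in children:
                nxt[child] += (1 - alpha) * t[parent] / len(children)
        t = nxt
    return t


# Hypothetical graph: a trusted chain and a separate spam component.
graph = {"good": ["a"], "a": ["b"], "b": [], "spam": ["target"], "target": []}
trust = trustrank(graph, seeds={"good"})
```

Nodes that are unreachable from the seed set (`spam` and `target` here) receive no trust at all, which is the mechanism that demotes them.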
Specific Motivation
- In TrustRank
  - Parent divides its trust among its children.
  - This may not be optimal: real-world trust relationships are independent of the number of trusted entities.
- Distrust can also be propagated.
[Figure: nodes A and B joined by a hyperlink, with arrows labeled "Trust Propagation" and "Distrust Propagation"]
Key steps in propagation
- Decay of trust (d)
  - Trust is not perfectly transitive.
- Splitting of trust
  - For each parent, how to divide its score among its children.
- Accumulation of trust
  - For each child, how to accumulate the overall score given the portions from all of its parents.
Choices for Trust Splitting
Given a node i with trust score TR(i) and O(i) outgoing links:
- Equal splitting
  - Gives d*TR(i)/O(i) to each child (used by TrustRank)
- Constant splitting
  - Gives d*TR(i) to each child
- Logarithmic splitting
  - Gives d*TR(i)/log(1+O(i)) to each child
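The three splitting rules translate directly into small functions. The sketch below is illustrative only: the function names are made up for the example, and the natural logarithm is an assumption, since the slides do not state the base.

```python
import math


def equal_split(tr, out_degree, d):
    """Each child receives d*TR(i)/O(i) (the TrustRank choice)."""
    return d * tr / out_degree


def constant_split(tr, out_degree, d):
    """Each child receives d*TR(i), regardless of how many children there are."""
    return d * tr


def log_split(tr, out_degree, d):
    """Each child receives d*TR(i)/log(1+O(i))."""
    # Assumption: natural logarithm; the base is not given in the slides.
    return d * tr / math.log(1 + out_degree)


# A parent with trust 1.0 and four outgoing links, decay d = 0.9.
shares = {
    "equal": equal_split(1.0, 4, 0.9),
    "constant": constant_split(1.0, 4, 0.9),
    "log": log_split(1.0, 4, 0.9),
}
```

For a parent with several children, the logarithmic share falls between the equal and constant shares, so it penalizes high out-degree less sharply than equal splitting does.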
Choices for Trust Accumulation
- Simple summation
  - Sum the trust values from each parent
- Maximum share
  - Use the maximum of the trust values sent by the parents
- Maximum parent
  - Sum the trust values but never exceed the trust score of most-trusted parent
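The accumulation choices can be sketched the same way. The function names and the sample numbers are hypothetical; each function takes the list of shares a child received and the trust scores of the parents that sent them, and assumes the child has at least one parent.

```python
def simple_summation(shares, parent_scores):
    """Sum the trust values from each parent."""
    return sum(shares)


def maximum_share(shares, parent_scores):
    """Use the largest single share sent by any parent."""
    return max(shares)


def maximum_parent(shares, parent_scores):
    """Sum the shares, capped at the trust score of the most-trusted parent."""
    return min(sum(shares), max(parent_scores))


# Hypothetical child with three parents.
shares = [0.3, 0.2, 0.4]
parent_scores = [0.5, 0.25, 0.6]
results = {
    "summation": simple_summation(shares, parent_scores),
    "max_share": maximum_share(shares, parent_scores),
    "max_parent": maximum_parent(shares, parent_scores),
}
```

Here the plain sum (about 0.9) exceeds the most-trusted parent's score, so the maximum-parent rule caps the child at 0.6, while maximum share gives 0.4.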
Propagating Distrust
- Distrust can be propagated from a seed set of bad nodes.
- Similar to trust propagation, but in reverse: follow incoming links, not outgoing links
- Same key choices for decay, splitting and accumulation
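One way to picture the reverse direction is to build the incoming-link map and push distrust from each node to the pages that link to it. The sketch below picks constant splitting with maximum-share accumulation purely as an example; the function names, the fixed number of steps, and the choice to keep seed scores at 1.0 are assumptions, not the authors' settings.

```python
def reverse_graph(out_links):
    """Map each node to the list of nodes that link to it."""
    in_links = {v: [] for v in out_links}
    for parent, children in out_links.items():
        for child in children:
            in_links.setdefault(child, []).append(parent)
    return in_links


def propagate_distrust(out_links, bad_seeds, d=0.5, steps=2):
    """Push distrust backwards along links (constant splitting, maximum share)."""
    in_links = reverse_graph(out_links)
    distrust = {v: (1.0 if v in bad_seeds else 0.0) for v in in_links}
    for _ in range(steps):
        nxt = dict(distrust)
        for node, score in distrust.items():
            for pointer in in_links[node]:   # pages that link TO node
                nxt[pointer] = max(nxt[pointer], d * score)
        distrust = nxt
    return distrust


# Hypothetical chain: x -> y -> spam -> z
graph = {"x": ["y"], "y": ["spam"], "spam": ["z"], "z": []}
dis = propagate_distrust(graph, bad_seeds={"spam"})
```

Pages that lead to the spam seed (`y`, then `x`) pick up decayed distrust, while `z`, which the spam page merely links to, stays at zero.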
Combining Trust and Distrust
For each node i with trust score TR(i) and distrust score DIS_TR(i), the combined score Total(i) can be
    Total(i) = η * TR(i) - β * DIS_TR(i)
where 0 ≤ η ≤ 1, 0 ≤ β ≤ 1
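A small sketch of the combination step follows, with hypothetical scores for three nodes. The function name `total_score`, the equal weights, and the sample values are assumptions for illustration.

```python
def total_score(tr, dis_tr, eta=0.5, beta=0.5):
    """Total(i) = eta * TR(i) - beta * DIS_TR(i), with both weights in [0, 1]."""
    if not (0 <= eta <= 1 and 0 <= beta <= 1):
        raise ValueError("eta and beta must lie in [0, 1]")
    return eta * tr - beta * dis_tr


# Hypothetical per-node trust and distrust scores.
trust = {"a": 0.4, "b": 0.3, "c": 0.1}
distrust = {"a": 0.0, "b": 0.5, "c": 0.0}

totals = {v: total_score(trust[v], distrust[v]) for v in trust}
ranking = sorted(totals, key=totals.get, reverse=True)
```

Node `b` has more trust than `c`, but its distrust pulls its total below zero, so it drops to the bottom of the ranking.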
Data set
- 20M pages from the Swiss search engine [search.ch] in 2004
  - 350K sites with ".ch" domain
  - We used only this site graph
- Seed sets
  - 3,589 sites labeled as using web spam with various techniques (provided)
  - 20,005 sites with pages in dir.search.ch topics as trusted set
Experimental Design
- Explore various combinations of trust and distrust propagation
- Evaluation
  - Performance of TrustRank is measured by the number of spam sites found among the highest-ranked ~1% of sites.
  - We use the same metric in this work.
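The evaluation metric amounts to counting labeled spam sites in the top slice of a ranking. The sketch below shows one way to compute it; the function name `spam_in_top`, the rounding of the cutoff, and the synthetic site names and scores are assumptions for the example.

```python
def spam_in_top(scores, spam_sites, fraction=0.01):
    """Count labeled spam sites among the highest-scoring `fraction` of sites."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: round the cutoff down, but always inspect at least one site.
    cutoff = max(1, int(len(ranked) * fraction))
    return sum(1 for site in ranked[:cutoff] if site in spam_sites)


# Synthetic example: 200 sites, s0 scored highest and s199 lowest.
scores = {f"s{i}": 200 - i for i in range(200)}
spam_sites = {"s1", "s50"}

top_1_percent = spam_in_top(scores, spam_sites)           # inspects s0 and s1
top_half = spam_in_top(scores, spam_sites, fraction=0.5)  # inspects s0..s99
```

A lower count is better: it means fewer known spam sites survive near the top of the ranking.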
Baseline result
Algorithm                                 Num. spam sites
PageRank                                  90
TrustRank                                 58
Topical TrustRank (Wu et al., WWW2006)    33-42
Simple TrustRank Improvement: Increase jump probability (α)
[Figure: number of spam sites in top 10 buckets (y-axis, 0 to 65) versus jump probability α (x-axis ticks from 0.9 to 0.1); the default α = 0.15 is marked]
Other trust propagation methods
Algorithm           Constant Splitting        Logarithmic Splitting
Decay =             0.1   0.3   0.7   0.9     0.1   0.3   0.7   0.9
Simple Summation    364   364   364   364     364   364   364   364
Maximum Share        34    34    34    34      13    12    20    18
Maximum Parent       27    32    33    33     372    27    29    32
Results of propagating distrust
Combined equally with TrustRank, 200 seeds
Algorithm           Constant Splitting        Logarithmic Splitting
d_Distrust =        0.1   0.3   0.7   0.9     0.1   0.3   0.7   0.9
Simple Summation     53    53    55    55      57    53    53    53
Maximum Share        53    53    53    53      59    53    52    52
Maximum Parent       53    53    53    53      57    53    53    53
Combining trust and distrust
Using best scoring trust and distrust formulations, beta=(1-eta)
[Figure: number of spam sites in top 1.1% (y-axis, 0 to 12) versus value of eta (x-axis, 0 = distrust only to 1 = trust only); series Trial 1, Trial 2, Trial 3; one annotation marks an off-scale value of >2200]
Coverage of trust propagation
Algorithm         Constant Splitting              Logarithmic Splitting
Decay             0.1    0.3    0.7    0.9        0.1    0.3    0.7    0.9
Maximum Share     77.71  77.73  77.74  77.74      77.19  77.72  77.73  77.73
Maximum Parent    77.52  77.71  77.73  77.74      76.93  77.60  77.71  77.72
Percentage of sites affected by approach. TrustRank reached 76.05%.
Conclusions
- Propagating trust based on outdegree does not appear to be optimal.
- Alternative splitting and accumulation methods can help to demote top ranked spam sites.
- Propagating distrust can also help to demote top ranked spam sites.
- Additional tests needed!
  - E.g., to examine impact on retrieval
Thank You!
Questions?
Contact Info: Dr. Brian D. Davison, davison(at)cse.lehigh.edu
WUME Laboratory, Computer Science and Engineering, Lehigh University, Bethlehem, PA 18015 USA
The WUME Lab http://wume.cse.lehigh.edu/