A Hybrid Search Engine -- Combining Google and P2P
Xuanhui Wang
What's wrong with Google?
• unlikely to index everything that's of interest (deep web)
• infeasible to run expensive algorithms on 8 billion documents
• difficult to input human knowledge
Peer-to-peer search: Approach 0
• Each peer has a local crawler and index
• Nobody posts any information about local indices
• Search can only be done by (limited) flooding
• No way to know where to find information in advance
• Very low recall for unpopular queries
[Figure labels: "Matrix factorization", "Relevant nerd"]
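Approach 0 can be sketched as a hop-limited flood over the peer graph. This is a minimal illustration only: the chain topology, the peer names, the document names, and the `ttl` parameter are hypothetical and not taken from the slides. It shows why an unpopular query has low recall: the one peer holding a match may sit just outside the flooding radius.

```python
from collections import deque


def flood_search(neighbors, local_index, start, term, ttl):
    """Breadth-first flooding: forward the query hop by hop until ttl hops are used."""
    seen = {start}
    queue = deque([(start, 0)])
    hits = set()
    while queue:
        peer, hops = queue.popleft()
        # Each peer answers only from its own local index.
        hits |= local_index.get(peer, {}).get(term, set())
        if hops == ttl:
            continue  # hop budget exhausted, do not forward further
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return hits


# Hypothetical toy network: a chain A - B - C - D.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
# Only peer D holds a document for the unpopular query.
local_index = {
    "B": {"google": {"b1.html"}},
    "D": {"matrix factorization": {"d1.pdf"}},
}

near = flood_search(neighbors, local_index, "A", "matrix factorization", ttl=2)
far = flood_search(neighbors, local_index, "A", "matrix factorization", ttl=3)
```

With `ttl=2` the flood stops at C and finds nothing. With `ttl=3` it reaches D, but every peer in the network has been contacted to answer a single query.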
P2P Search
• Other methods have been proposed (see I. Weber 2004)
• What's wrong?
– The protocol for coordinating the peers is too complicated
– Too much data traffic and communication
– Low speed
Hybrid—possible solution
• Combine Google and P2P together
– Google indexes all the peer machines, but how??
– Each peer machine has a local index
– When querying, Google selects the "appropriate" peers and sends the query to them.
– Finally, Google merges all the results together.
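The select-query-merge flow above could look roughly like the following sketch. The slides leave open how the central engine learns about the peers, so everything here is an assumption for illustration: the `Peer` and `Coordinator` classes, the term-set summary each peer publishes, and the term-count score are all hypothetical.

```python
import heapq


class Peer:
    """A peer machine with its own local index (here just doc id -> text)."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs

    def summary(self):
        # Assumed: the peer publishes only the set of terms it holds.
        terms = set()
        for text in self.docs.values():
            terms.update(text.lower().split())
        return terms

    def search(self, query):
        # Local scoring: number of query-term occurrences in each document.
        words = query.lower().split()
        results = []
        for doc_id, text in self.docs.items():
            tokens = text.lower().split()
            score = sum(tokens.count(w) for w in words)
            if score > 0:
                results.append((score, self.name, doc_id))
        return results


class Coordinator:
    """Stands in for the central engine: selects peers, forwards, merges."""

    def __init__(self, peers):
        self.peers = peers
        self.summaries = {p.name: p.summary() for p in peers}

    def select_peers(self, query):
        # A peer is "appropriate" if its summary shares a term with the query.
        words = set(query.lower().split())
        return [p for p in self.peers if words & self.summaries[p.name]]

    def search(self, query, k=3):
        merged = []
        for peer in self.select_peers(query):
            merged.extend(peer.search(query))
        # Merge step: keep the k highest-scoring results across peers.
        return heapq.nlargest(k, merged)


peers = [
    Peer("p1", {"a": "matrix factorization notes", "b": "cooking recipes"}),
    Peer("p2", {"c": "matrix algebra"}),
    Peer("p3", {"d": "travel photos"}),
]
coord = Coordinator(peers)
selected = [p.name for p in coord.select_peers("matrix factorization")]
top = coord.search("matrix factorization")
```

Only p1 and p2 receive the query, and p3 is never contacted, which is the efficiency gain over flooding. The merge step simply trusts each peer's local score; making scores comparable across peers is an open design question in this sketch.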
Hybrid—possible solution
• Benefits:
– Efficient compared to P2P
– May overcome Google's drawbacks
• Challenges:
– Google's PageRank benefits from the large scale of its indexed documents; how can it be adapted to the hybrid system?
– How does Google collaborate with the peer machines? How can the peer machines benefit from Google's PageRank?
• Funding this with $10M, do you agree?
References
• I. Weber et al. (2004). Concept-based P2P Search. http://www.mpi-sb.mpg.de/~iweber/peer-to-peer/Concept-based%20P2P%20Search.ppt
• Inspired by the discussion with Shui-Lung Chuang