A Hybrid Search Engine -- Combining Google and P2P

Xuanhui Wang

What's wrong with Google?

• unlikely to index everything that's of interest (deep web)

• infeasible to run expensive algorithms on 8 billion documents

• difficult to incorporate human knowledge

Peer-to-peer search: Approach 0

• Each peer has a local crawler and index

• Nobody posts any information about local indices

• Search can only be done by (limited) flooding

• No way to know where to find information in advance

• Very low recall for unpopular queries

[Figure labels: "Matrix factorization", "Relevant nerd"]
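
The limited flooding described above can be sketched as a hop-bounded breadth-first search. This is a minimal illustration, not a protocol from the slides: the function `flood_search`, the `ttl` parameter, the four-peer topology, and the index contents are all hypothetical. It shows why recall is low for unpopular queries: a peer holding the only matching document is never reached if it lies beyond the hop limit.

```python
from collections import deque

def flood_search(neighbors, local_index, start, query, ttl):
    """Flood `query` from `start`, visiting peers at most `ttl` hops away."""
    visited = {start}
    frontier = deque([(start, 0)])
    hits = []
    while frontier:
        peer, hops = frontier.popleft()
        # Each peer can only answer from its own local index.
        for doc in local_index.get(peer, {}).get(query, []):
            hits.append((peer, doc))
        if hops == ttl:
            continue  # hop limit reached: do not forward any further
        for nxt in neighbors.get(peer, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, hops + 1))
    return hits

# Hypothetical topology: a chain A - B - C - D.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
# Hypothetical local indices: only D knows about the unpopular topic.
local_index = {
    "B": {"music": ["b1"]},
    "D": {"matrix factorization": ["d1"]},
}

near = flood_search(neighbors, local_index, "A", "matrix factorization", ttl=2)
far = flood_search(neighbors, local_index, "A", "matrix factorization", ttl=3)
```

With a hop limit of 2 the query stops at C and finds nothing; raising the limit to 3 reaches D, at the cost of contacting every peer along the way.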

P2P Search

• Other methods have been proposed (see I. Weber 2004)

• What's wrong?

– The protocol for coordinating the peers is too complicated

– Too much data traffic and communication

– Low speed

Hybrid—possible solution

• Combine Google and P2P together

– Google indexes all the peer machines, but how??

– Each peer machine has a local index

– When querying, Google selects the "appropriate" peers and sends the query.

– Finally, Google merges all the results together.
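
The query flow above (select peers, forward the query, merge the results) can be sketched as follows. The slides leave open how the central engine would index the peers, so this sketch assumes each peer publishes a small set of summary terms; that assumption, along with the names `select_peers` and `hybrid_search`, the scoring rule, and the sample data, is hypothetical. Merging by raw score also assumes that scores from different peers are comparable, which a real system would have to ensure.

```python
def select_peers(peer_summaries, query, k):
    """Pick up to k peers whose summary terms overlap the query the most."""
    terms = set(query.lower().split())
    scored = [(len(terms & summary), peer)
              for peer, summary in peer_summaries.items()]
    scored = [(s, p) for s, p in scored if s > 0]  # drop peers with no overlap
    scored.sort(key=lambda sp: (-sp[0], sp[1]))
    return [peer for _, peer in scored[:k]]

def hybrid_search(peer_summaries, peer_indexes, query, k=2, top_n=5):
    """Send the query to the selected peers and merge their ranked results."""
    merged = {}
    for peer in select_peers(peer_summaries, query, k):
        # Each peer answers from its own local index as (doc, score) pairs.
        for doc, score in peer_indexes[peer](query):
            merged[doc] = max(score, merged.get(doc, 0.0))  # dedupe, keep best
    ranked = sorted(merged.items(), key=lambda ds: (-ds[1], ds[0]))
    return ranked[:top_n]

# Hypothetical peers: summary terms held centrally, indices held locally.
peer_summaries = {
    "p1": {"matrix", "factorization", "svd"},
    "p2": {"music", "jazz"},
    "p3": {"matrix", "movie"},
}
peer_indexes = {
    "p1": lambda q: [("p1/svd-notes", 0.9), ("shared/intro", 0.4)],
    "p2": lambda q: [("p2/jazz", 0.8)],
    "p3": lambda q: [("p3/matrix-film", 0.5), ("shared/intro", 0.6)],
}

results = hybrid_search(peer_summaries, peer_indexes, "matrix factorization")
```

Only two peers are contacted here instead of all three, which is the efficiency argument for the hybrid design: the central index narrows the search before any peer traffic is generated.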

Hybrid—possible solution

• Benefits:

– Efficient compared to P2P

– May overcome Google's drawbacks

• Challenges:

– Google's PageRank benefits from the large scale of its indexed documents; how can it be adapted to the hybrid system?

– How does Google collaborate with the peer machines? How can the peer machines benefit from Google's PageRank?

• Would you agree to fund this with $10M?

References

• I. Weber et al. (2004). Concept-based P2P Search. http://www.mpi-sb.mpg.de/~iweber/peer-to-peer/Concept-based%20P2P%20Search.ppt

• Inspired by the discussion with Shui-Lung Chuang
