A Hybrid Search Engine -- Combining Google and P2P Xuanhui Wang.

Page 1:

A Hybrid Search Engine -- Combining Google and P2P

Xuanhui Wang

Page 2:

What's wrong with Google?

• unlikely to index everything that's of interest (deep web)

• infeasible to run expensive algorithms on 8 billion documents

• difficult to input human knowledge

Page 3:

Peer-to-peer search: Approach 0

• Each peer has a local crawler and index

• Nobody posts any information about local indices

• Search can only be done by (limited) flooding

• No way to know where to find information in advance

• Very low recall for unpopular queries

[Figure labels: "Matrix factorization", "Relevant nerd"]
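The flooding behaviour described on this slide can be sketched in a few lines. This is a minimal illustration only, not a protocol from the talk: the overlay `network`, the per-peer `local_index`, the `ttl` parameter, and the function name `flood_search` are all hypothetical names chosen for the example.

```python
from collections import deque


def flood_search(network, local_index, start, query, ttl):
    """Limited flooding: forward the query at most `ttl` hops from `start`.

    network     -- dict mapping each peer to its list of neighbours (assumed)
    local_index -- dict mapping each peer to {query: [documents]} (assumed)
    Returns a dict of peer -> matching documents for the peers reached.
    """
    hits = {}
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        # Each peer can only answer from its own local index.
        docs = local_index.get(peer, {}).get(query, [])
        if docs:
            hits[peer] = docs
        if hops == ttl:
            continue  # hop budget exhausted, do not forward further
        for neighbour in network.get(peer, []):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return hits


# Hypothetical toy overlay: a chain a - b - c - d.
network = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
# Only peer d holds anything for the unpopular query.
local_index = {
    "b": {"music": ["song.mp3"]},
    "d": {"matrix factorization": ["notes.pdf"]},
}

near = flood_search(network, local_index, "a", "matrix factorization", ttl=2)
far = flood_search(network, local_index, "a", "matrix factorization", ttl=3)
```

With a hop limit of 2 the query never reaches peer d, so `near` is empty even though a relevant document exists; raising the limit to 3 finds it at the cost of contacting every peer. That is the low-recall problem for unpopular queries in miniature.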

Page 4:

P2P Search

• Other methods have been proposed (see I. Weber 2004)

• What's wrong?

– The protocol for coordinating the peers is too complicated

– Too much data traffic and communication

– Low speed

Page 5:

Hybrid—possible solution

• Combine Google and P2P

– Google indexes all the peer machines, but how??

– Each peer machine has a local index

– When querying, Google selects the "appropriate" peers and sends them the query.

– Finally, Google merges all the results together.
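The query flow listed above (select peers, forward the query, merge the results) can be sketched as follows. The slide leaves open how the central engine would index the peers, so this sketch simply assumes each peer publishes a set of the terms it holds (`peer_summaries`) and ranks by term overlap. All names, the scoring rule, and the cutoff `k` are hypothetical, chosen only to make the three steps concrete.

```python
def select_peers(peer_summaries, query, k):
    """Central step: pick up to k peers whose term summary overlaps the query."""
    terms = set(query.lower().split())
    scored = []
    for peer, summary in peer_summaries.items():
        overlap = len(terms & summary)
        if overlap > 0:
            scored.append((overlap, peer))
    # Highest overlap first; ties broken by peer name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [peer for _, peer in scored[:k]]


def query_peer(peer_index, query):
    """Peer step: score local documents by the number of matching query terms."""
    terms = set(query.lower().split())
    results = []
    for doc, text in peer_index.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            results.append((doc, score))
    return results


def hybrid_search(peer_summaries, peer_indices, query, k=2):
    """Select peers centrally, query each one, then merge into a single ranking."""
    merged = []
    for peer in select_peers(peer_summaries, query, k):
        for doc, score in query_peer(peer_indices[peer], query):
            merged.append((score, peer, doc))
    merged.sort(key=lambda r: (-r[0], r[1], r[2]))
    return [(peer, doc, score) for score, peer, doc in merged]


# Hypothetical local indices held by three peers.
peer_indices = {
    "p1": {"mf.txt": "matrix factorization tutorial", "svd.txt": "singular value matrix"},
    "p2": {"cook.txt": "pasta recipes"},
    "p3": {"nmf.txt": "nonnegative matrix factorization"},
}
# Assumed summary format: the set of all terms each peer holds.
peer_summaries = {
    peer: {word for text in docs.values() for word in text.lower().split()}
    for peer, docs in peer_indices.items()
}

ranking = hybrid_search(peer_summaries, peer_indices, "matrix factorization")
```

Unlike flooding, only the selected peers are contacted, which is where the efficiency gain would come from. The merge step here compares raw per-peer scores directly, which only works because every peer uses the same scoring function; a real system would need scores that are comparable across peers.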

Page 6:

Hybrid—possible solution

• Benefits:

– More efficient than pure P2P

– May overcome Google's drawbacks

• Challenges:

– Google's PageRank benefits from the large scale of its indexed documents; how can it be adapted to the hybrid system?

– How does Google collaborate with the peer machines? How can the peer machines benefit from Google's PageRank?

• Would you agree to fund this with $10M?