Building a Mini-Google: High-Performance Computing in Ruby

Building Mini-Google in Ruby @igrigorik #railsconf http://bit.ly/railsconf-pagerank Building Mini-Google in Ruby Ilya Grigorik @igrigorik

description

Let's build a mini-Google and compute the PageRank score for a 1-million page web – that's a non-trivial challenge! High performance computing may not be Ruby's strength, but we will investigate the available gems, tools, and algorithms which make this a tractable problem (spoiler: it's possible).

Transcript of Building a Mini-Google: High-Performance Computing in Ruby


Building Mini-Google in Ruby

Ilya Grigorik

@igrigorik


postrank.com/topic/ruby

The slides… Twitter My blog

Ruby + Math
Optimization
PageRank
Indexing
Examples
Misc Fun

PageRank
PageRank + Ruby
Indexing
Examples
Tools
+ Optimization

Consume with care… everything that follows is based on released / public-domain info

Search-engine graveyard: Google did pretty well…

Search pipeline: 50,000-foot view

Query: Ruby

Results

1. Crawl 2. Index 3. Rank

Bah Fun Interesting

circa 1997-1998

CPU speed: 333 MHz
RAM: 32-64 MB

Index: 27,000,000 documents
Index refresh: once a month~ish
PageRank computation: several days

Laptop CPU: 2.1 GHz
VM RAM: 1 GB
1-million page web: ~10 minutes

Creating & Maintaining an Inverted Index: DIY and the gotchas within

Building an Inverted Index

require 'set'

pages = {
  "1" => "it is what it is",
  "2" => "what is it",
  "3" => "it is a banana"
}

index = {}

# Map each word to the set of pages that contain it.
pages.each do |page, content|
  content.split(/\s/).each do |word|
    if index[word]
      index[word] << page
    else
      # Wrap the id in an array: Set.new needs an Enumerable,
      # and a String is not one on Ruby 1.9 and later.
      index[word] = Set.new([page])
    end
  end
end

# Resulting index (key order may vary):
# {"it"=>#<Set: {"1", "2", "3"}>,
#  "a"=>#<Set: {"3"}>,
#  "banana"=>#<Set: {"3"}>,
#  "what"=>#<Set: {"1", "2"}>,
#  "is"=>#<Set: {"1", "2", "3"}>}


Word => [Document]

Querying the index

# query: "what is banana"
p index["what"] & index["is"] & index["banana"]
# > #<Set: {}>

# query: "a banana"
p index["a"] & index["banana"]
# > #<Set: {"3"}>

# query: "what is"
p index["what"] & index["is"]
# > #<Set: {"1", "2"}>


What order?

[1, 2] or [2,1]

Building an Inverted Index: Hmmm?

PDF, HTML, RSS?
Lowercase / Upcase?
Compact Index?
Stop words?
Persistence?
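Two of the gotchas above (case and stop words) can be handled before a word ever reaches the index. The following is only a sketch: the `STOP_WORDS` list, `tokenize`, and `build_index` are hypothetical names made up for illustration, and the stop list is far too short for real use.

```ruby
require 'set'

# Hypothetical, deliberately tiny stop list.
STOP_WORDS = Set.new(%w[a is it the])

# Lowercase the text, keep alphanumeric runs, drop stop words.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |word| STOP_WORDS.include?(word) }
end

# Word => Set of page ids, built from normalized tokens.
def build_index(pages)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  pages.each do |id, content|
    tokenize(content).each { |word| index[word] << id }
  end
  index
end

pages = {
  "1" => "It is what it is",
  "2" => "What is it?",
  "3" => "It is a BANANA"
}

index = build_index(pages)
p index
```

Only "what" and "banana" survive normalization here, which is also a reminder that an aggressive stop list makes queries such as "what is it" nearly unanswerable.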


Ferret is a high-performance, full-featured text search engine library written for Ruby

require 'ferret'
include Ferret

index = Index::Index.new()

index << {:title => "1", :content => "it is what it is"}
index << {:title => "2", :content => "what is it"}
index << {:title => "3", :content => "it is a banana"}

index.search_each('content:"banana"') do |id, score|
  puts "Score: #{score}, #{index[id][:title]} "
end

> Score: 1.0, 3


Hmmm?

class Ferret::Analysis::Analyzer
class Ferret::Analysis::AsciiLetterAnalyzer
class Ferret::Analysis::AsciiLetterTokenizer
class Ferret::Analysis::AsciiLowerCaseFilter
class Ferret::Analysis::AsciiStandardAnalyzer
class Ferret::Analysis::AsciiStandardTokenizer
class Ferret::Analysis::AsciiWhiteSpaceAnalyzer
class Ferret::Analysis::AsciiWhiteSpaceTokenizer
class Ferret::Analysis::HyphenFilter
class Ferret::Analysis::LetterAnalyzer
class Ferret::Analysis::LetterTokenizer
class Ferret::Analysis::LowerCaseFilter
class Ferret::Analysis::MappingFilter
class Ferret::Analysis::PerFieldAnalyzer
class Ferret::Analysis::RegExpAnalyzer
class Ferret::Analysis::RegExpTokenizer
class Ferret::Analysis::StandardAnalyzer
class Ferret::Analysis::StandardTokenizer
class Ferret::Analysis::StemFilter
class Ferret::Analysis::StopFilter
class Ferret::Analysis::Token
class Ferret::Analysis::TokenStream
class Ferret::Analysis::WhiteSpaceAnalyzer
class Ferret::Analysis::WhiteSpaceTokenizer

class Ferret::Search::BooleanQuery
class Ferret::Search::ConstantScoreQuery
class Ferret::Search::Explanation
class Ferret::Search::Filter
class Ferret::Search::FilteredQuery
class Ferret::Search::FuzzyQuery
class Ferret::Search::Hit
class Ferret::Search::MatchAllQuery
class Ferret::Search::MultiSearcher
class Ferret::Search::MultiTermQuery
class Ferret::Search::PhraseQuery
class Ferret::Search::PrefixQuery
class Ferret::Search::Query
class Ferret::Search::QueryFilter
class Ferret::Search::RangeFilter
class Ferret::Search::RangeQuery
class Ferret::Search::Searcher
class Ferret::Search::Sort
class Ferret::Search::SortField
class Ferret::Search::TermQuery
class Ferret::Search::TopDocs
class Ferret::Search::TypedRangeFilter
class Ferret::Search::TypedRangeQuery
class Ferret::Search::WildcardQuery


ferret.davebalmain.com/trac

Ranking Results: 0-60 with PageRank…

Naïve: Term Frequency

index.search_each('content:"the brown cow"') do |id, score|
  puts "Score: #{score}, #{index[id][:title]} "
end

> Score: 0.827, 3
> Score: 0.523, 5
> Score: 0.125, 4

Relevance?

Term counts per document:

        Doc 3  Doc 5  Doc 4
the       4      3      5
brown     1      3      1
cow       1      4      1
Score     6     10      7


TF-IDF: Term Frequency * Inverse Document Frequency

Skew

Term counts per document:

        Doc 3  Doc 5  Doc 4
the       4      3      5
brown     1      3      1
cow       1      4      1

Total # of documents: 10

# of docs containing each term:

the     6
brown   3
cow     4

Score = TF * IDF

TF = # occurrences / # words
IDF = ln(# docs / # docs with W)

TF-IDF: Score = 0.204 + 0.120 + 0.092 = 0.416

Total # of documents: 10
# words in document: 10

Doc # 3 score for ‘the’: 4/10 * ln(10/6) = 0.204
Doc # 3 score for ‘brown’: 1/10 * ln(10/3) = 0.120
Doc # 3 score for ‘cow’: 1/10 * ln(10/4) = 0.092
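The arithmetic for document 3 can be reproduced in a few lines of plain Ruby. This is a sketch using the natural-log IDF from the worked example; the `tf_idf` helper and the variable names are made up here, and Ferret's own scoring is more involved.

```ruby
# Numbers copied from the slide: term counts in document 3 and
# the number of documents (out of 10) that contain each term.
term_counts  = { "the" => 4, "brown" => 1, "cow" => 1 }
doc_freq     = { "the" => 6, "brown" => 3, "cow" => 4 }
total_docs   = 10
words_in_doc = 10

# TF = occurrences / words in the document, IDF = ln(docs / docs with term).
def tf_idf(count, words_in_doc, total_docs, docs_with_term)
  tf  = count.to_f / words_in_doc
  idf = Math.log(total_docs.to_f / docs_with_term)
  tf * idf
end

scores = term_counts.to_h do |term, count|
  [term, tf_idf(count, words_in_doc, total_docs, doc_freq[term])]
end

scores.each { |term, score| puts format("%-6s %.3f", term, score) }
puts format("%-6s %.3f", "total", scores.values.sum)
```

Note how "the" still contributes the most even after weighting, because it appears four times in a ten-word document; IDF dampens common words but does not remove them.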

Frequency Matrix

        W1   W2   …   WN
Doc 1   15   23   …
Doc 2   24   12   …
…       …    …    …
Doc K

Size = N * K * size of Ruby object

Ouch.

Pages = N = 10,000
Words = K = 2,000
Ruby Object = 20+ bytes

Footprint = 384 MB

NArray: http://narray.rubyforge.org/

NArray is a numerical N-dimensional array class (implemented in C)

NArray.new(typecode, size, ...)   # create new NArray. initialize with 0.
NArray.byte(size,...)             # 1 byte unsigned integer
NArray.sint(size,...)             # 2 byte signed integer
NArray.int(size,...)              # 4 byte signed integer
NArray.sfloat(size,...)           # single precision float
NArray.float(size,...)            # double precision float
NArray.scomplex(size,...)         # single precision complex
NArray.complex(size,...)          # double precision complex
NArray.object(size,...)           # Ruby object
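To see why a typed array helps, here is a back-of-the-envelope comparison for the 10,000 x 2,000 frequency matrix. It is plain arithmetic rather than a measurement: the 20-byte figure is the lower bound quoted for a Ruby object, 4 bytes is the documented width of `NArray.int`, and the `mib` helper is hypothetical.

```ruby
pages = 10_000
words = 2_000
cells = pages * words

ruby_object_bytes = 20 # lower bound quoted above ("20+ bytes")
narray_int_bytes  = 4  # NArray.int stores 4-byte signed integers

# Convert a byte count to mebibytes, rounded to one decimal place.
def mib(bytes)
  (bytes / (1024.0 * 1024.0)).round(1)
end

ruby_footprint   = mib(cells * ruby_object_bytes)
narray_footprint = mib(cells * narray_int_bytes)

puts "Ruby objects: #{ruby_footprint} MiB"
puts "NArray.int:   #{narray_footprint} MiB"
```

At exactly 20 bytes per cell the estimate lands slightly under the 384 MB quoted for the frequency matrix, which fits the "20+" caveat; the typed array needs about a fifth of that.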


PageRank: the Google juice

Links as votes

Problem: link gaming

Random Surfer: powerful abstraction

Follow a link from the page he/she is currently on. (P = 0.85)
Teleport to a random location on the web. (P = 0.15)

Surfin’: rinse & repeat, ad nauseam

Follow a link from the page he/she is currently on.
Teleport to a random location on the web.

[Diagram: pages K, N, and M]

Surfin’: rinse & repeat, ad nauseam

On page P, clicks on link to K (P = 0.85)
On page K, clicks on link to M (P = 0.85)
On page M, teleports to X (P = 0.15)

Analyzing the Web Graph: extracting PageRank

[Diagram: web graph with pages N, M, K, and X; node probabilities P = 0.6, P = 0.15, P = 0.20, P = 0.05]

What is PageRank? It’s a scalar!

What is PageRank? It’s a probability!

[Diagram: web graph with pages N, M, K, and X; node probabilities P = 0.6, P = 0.15, P = 0.20, P = 0.05]


Higher Pr, Higher Importance?

Teleportation? Sci-fi fans, … ?

Reasons for teleportation: enumerating edge cases

1. No in-links!
2. No out-links!
3. Isolated Web

[Diagram: pages N, M, K, and X illustrating the three cases]

Exploring Graphs: gratr.rubyforge.com

•Breadth First Search
•Depth First Search
•A* Search
•Lexicographic Search
•Dijkstra’s Algorithm
•Floyd-Warshall
•Triangulation and Comparability detection

require 'gratr/import'

dg = Digraph[1,2, 2,3, 2,4, 4,5, 6,4, 1,6]

dg.directed?    # true
dg.vertex?(4)   # true
dg.edge?(2,4)   # true
dg.vertices     # [5, 6, 1, 2, 3, 4]

Graph[1,2,1,3,1,4,2,5].bfs # [1, 2, 3, 4, 5]
Graph[1,2,1,3,1,4,2,5].dfs # [1, 2, 5, 3, 4]

Teleportation: probabilities

[Diagram: five pages, each with P(T) = 0.03]

P(T) = 0.15 / # of pages
P(T) = 0.03

PageRank: Simplified Mathematical Def’n (cause that’s how we roll)

$L = \begin{pmatrix} 1/N \\ \vdots \\ 1/N \end{pmatrix}, \quad T = \begin{pmatrix} 0.15/N \\ \vdots \\ 0.15/N \end{pmatrix}$

Assume the web is N pages big.
Assume that the probability of teleportation (t) is 0.15, and of following a link (s) is 0.85.
Assume that the teleportation probability (E) is uniform.
Assume that you start on any random page (uniform distribution L).

Then after one step, the probability you're on page X is:

$L \cdot (sG + tE) = L \cdot (0.85\,G + 0.15\,E)$
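Applying that one-step update over and over is the usual power-iteration way to reach the steady state. The sketch below is one possible reading, not the talk's code: it assumes a column-stochastic `g` (entry `g[i][j]` is the probability of moving from page j to page i), and `pagerank_iterative` and its parameters are hypothetical names.

```ruby
# Repeat the one-step update q <- s * G * q + t / N until it settles.
# Assumption: g[i][j] is the probability of moving from page j to page i.
def pagerank_iterative(g, s = 0.85, iterations = 100)
  n = g.size
  t = 1.0 - s
  q = Array.new(n, 1.0 / n) # start on any page, uniformly

  iterations.times do
    q = (0...n).map do |i|
      followed = (0...n).sum { |j| g[i][j] * q[j] }
      s * followed + t / n
    end
  end
  q
end

# Three pages: 1 links to 2 and 3, 2 links to 3, 3 links to itself.
g = [[0, 0, 0], [0.5, 0, 0], [0.5, 1, 1]]
ranks = pagerank_iterative(g)
puts ranks.map { |rank| rank.round(3) }.inspect
```

Each pass shrinks the remaining error by roughly a factor of s, so around a hundred iterations is plenty for a toy graph; a real implementation would stop once successive vectors differ by less than a tolerance.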

G = The Link Graph: ginormous and sparse

      1   2   …   …   N
1     1   0   …   …   0
2     0   1   …   …   1
…     …   …   …   …   …
…     …   …   …   …   …
N     0   1   …   …   1

No link from 1 to N

Huge!

G as a dictionary: more compact…

{
  "1" => [25, 26],
  "2" => [1],
  "5" => [123, 2],
  "6" => [67, 1]
}

Page => Links to…
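When the matrix form is needed after all, the dictionary can be expanded on demand. The sketch below is an illustration under stated assumptions: `link_matrix` is a hypothetical helper, every link target must also appear as a key, and each page splits its vote evenly across its out-links, giving a column-stochastic matrix.

```ruby
# Expand "page => pages it links to" into a dense matrix where
# g[i][j] is the probability of following a link from page j to page i.
# Assumption: every link target also appears as a key of the hash.
def link_matrix(links)
  pages    = links.keys.sort
  position = pages.each_with_index.to_h
  n        = pages.size
  g        = Array.new(n) { Array.new(n, 0.0) }

  links.each do |from, targets|
    next if targets.empty? # a page with no out-links leaves its column at zero
    weight = 1.0 / targets.size
    targets.each { |to| g[position[to]][position[from]] += weight }
  end
  g
end

links = { 1 => [2, 3], 2 => [3], 3 => [3] }
g = link_matrix(links)
g.each { |row| p row }
```

For a large web this dense expansion defeats the purpose of the compact form, so it is only reasonable for small graphs or for individual blocks of the matrix.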

Computing PageRank: the tedious way

Follow a link from the page he/she is currently on.
Teleport to a random location on the web.

[Diagram: Page K]
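The tedious way can be taken literally: let a simulated surfer wander and count where it spends its time. This is a Monte Carlo sketch with made-up names (`LINKS`, `simulate_surfer`), a fixed seed for repeatability, and one extra assumption: a page with no out-links always teleports.

```ruby
# Hypothetical three-page web in "page => pages it links to" form.
LINKS = { 1 => [2, 3], 2 => [3], 3 => [3] }

# Walk the graph for a number of steps and return the fraction of
# steps spent on each page, an estimate of its PageRank.
def simulate_surfer(links, steps: 200_000, follow: 0.85, rng: Random.new(42))
  pages  = links.keys
  visits = Hash.new(0)
  page   = pages.sample(random: rng)

  steps.times do
    out = links[page]
    page = if rng.rand < follow && !out.empty?
             out.sample(random: rng)   # follow a link from the current page
           else
             pages.sample(random: rng) # teleport to a random page
           end
    visits[page] += 1
  end

  pages.to_h { |pg| [pg, visits[pg].to_f / steps] }
end

estimate = simulate_surfer(LINKS)
estimate.each { |pg, share| puts format("page %d: %.3f", pg, share) }
```

The estimates only approach the exact scores as the number of steps grows, which is what makes this approach tedious compared with solving for the distribution directly.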

Computing PageRank: in one swoop

$q = t\,(I - sG)^{-1} E = \begin{pmatrix} P_1 \\ \vdots \\ P_n \end{pmatrix}$

I: identity matrix

Don’t trust me! Verify it yourself!

Enough hand-waving, dammit! Show me the code

Birth of EM-Proxy: flash of the obvious

Hot, Fast, Awesome


Hot, Fast, Awesome

http://rb-gsl.rubyforge.org/

Click there! … Give yourself a weekend.


Click there! … Give yourself a weekend. http://ruby-gsl.sourceforge.net/

PageRank in Ruby: 6 lines, or less

require "gsl"
include GSL

# INPUT: link structure matrix (NxN)
# OUTPUT: pagerank scores
def pagerank(g)
  raise if g.size1 != g.size2 # verify NxN

  i = Matrix.I(g.size1)                      # identity matrix
  p = (1.0/g.size1) * Matrix.ones(g.size1,1) # teleportation vector

  s = 0.85 # probability of following a link
  t = 1-s  # probability of teleportation

  t*((i-s*g).invert)*p
end


Ex: Circular Web (testing intuition…)

pagerank(Matrix[[0,0,1], [0,0,1], [1,0,0]])
> [0.33, 0.33, 0.33]

[Diagram: pages N, K, and X, each with P = 0.33]

Ex: All roads lead to K (testing intuition…)

pagerank(Matrix[[0,0,0], [0.5,0,0], [0.5,1,1]])
> [0.05, 0.07, 0.87]

[Diagram: pages N, K, and X with P = 0.05, P = 0.07, and P = 0.87]
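For readers without GSL installed, the same closed form can be checked with Ruby's bundled `matrix` library. This is a stand-in sketch rather than the talk's implementation: it swaps `GSL::Matrix` for `::Matrix`, and the `cycle` matrix is an assumed encoding of a strict 1 -> 2 -> 3 -> 1 loop, not the matrix shown in the circular-web example.

```ruby
require 'matrix'

# Closed form q = t * (I - s*G)^-1 * E with a uniform teleportation vector.
# Assumption: g is column-stochastic (column j holds page j's out-link weights).
def pagerank(g, s = 0.85)
  raise ArgumentError, "link matrix must be square" unless g.square?

  n        = g.row_count
  t        = 1.0 - s
  identity = Matrix.identity(n)
  teleport = Matrix.column_vector(Array.new(n, 1.0 / n))

  ((identity - g * s).inverse * teleport * t).column(0).to_a
end

all_roads = Matrix[[0, 0, 0], [0.5, 0, 0], [0.5, 1, 1]]
cycle     = Matrix[[0, 0, 1], [1, 0, 0], [0, 1, 0]]

p pagerank(all_roads).map { |score| score.round(3) }
p pagerank(cycle).map { |score| score.round(3) }
```

Inverting an N x N matrix is fine for a handful of pages but scales poorly, so this form is best kept for checking intuition on small graphs.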

PageRank + Ferret: awesome search, ftw!

require 'ferret'
include Ferret

index = Index::Index.new()

# Store PageRank alongside each document
index << {:title => "1", :content => "it is what it is", :pr => 0.05 }
index << {:title => "2", :content => "what is it", :pr => 0.07 }
index << {:title => "3", :content => "it is a banana", :pr => 0.87 }

[Diagram: pages 1, 2, and 3 with P = 0.05, P = 0.07, and P = 0.87]

index.search_each('content:"world"') do |id, score|
  puts "Score: #{score}, #{index[id][:title]} (PR: #{index[id][:pr]})"
end

puts "*" * 50

sf_pr = Search::SortField.new(:pr, :type => :float, :reverse => true)

index.search_each('content:"world"', :sort => sf_pr) do |id, score|
  puts "Score: #{score}, #{index[id][:title]}, (PR: #{index[id][:pr]})"
end

# Score: 0.267119228839874, 3 (PR: 0.87)
# Score: 0.17807948589325, 1 (PR: 0.05)
# Score: 0.17807948589325, 2 (PR: 0.07)
# ***********************************
# Score: 0.267119228839874, 3, (PR: 0.87)
# Score: 0.17807948589325, 2, (PR: 0.07)
# Score: 0.17807948589325, 1, (PR: 0.05)

TF-IDF Search
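The effect of the sort field can be mimicked without Ferret by re-ordering a list of hits in plain Ruby. The `hits` array below is a hypothetical stand-in for search results, with scores rounded from the output above; the second ordering (relevance first, PageRank as tie-breaker) is an extra variation, not something the slides show.

```ruby
# Hypothetical hits: a text-relevance score plus the stored PageRank.
hits = [
  { title: "3", score: 0.267, pr: 0.87 },
  { title: "1", score: 0.178, pr: 0.05 },
  { title: "2", score: 0.178, pr: 0.07 }
]

# Order purely by PageRank, highest first.
by_pagerank = hits.sort_by { |hit| -hit[:pr] }

# Order by relevance, using PageRank only to break ties.
by_relevance_then_pr = hits.sort_by { |hit| [-hit[:score], -hit[:pr]] }

puts by_pagerank.map { |hit| hit[:title] }.join(", ")
puts by_relevance_then_pr.map { |hit| hit[:title] }.join(", ")
```

On this tiny example both orderings agree; they diverge as soon as a highly relevant page has a low PageRank, which is where the choice of blend starts to matter.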


PageRank FTW!


Google

Others

Search*: Graphs are ubiquitous! PageRank is a general-purpose hammer

PageRank + Social Graph: GitHub

Username GitCred

==============================

37signals 10.00

imbriaco 9.76

why 8.74

rails 8.56

defunkt 8.17

technoweenie 7.83

jeresig 7.60

mojombo 7.51

yui 7.34

drnic 7.34

pjhyett 6.91

wycats 6.85

dhh 6.84

http://bit.ly/3YQPU

PageRank + Social Graph: Twitter

Hmm…

Analyze the social graph:
- Filter messages by ‘TwitterRank’
- Suggest users by ‘TwitterRank’
- …

PageRank + Product Graph: E-commerce

Link items purchased in same cart… Run PR on it.

PageRank = Powerful Hammer: use it!

Personalization: how would you do it?

PageRank + Personalization: customize the teleportation vector

$T = \begin{pmatrix} 0.15/N \\ \vdots \\ 0.15/N \end{pmatrix}$

Teleportation distribution doesn’t have to be uniform!

yahoo.com is my homepage!
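One way to picture this: keep the link graph fixed and change only where the surfer lands when teleporting. The sketch below is illustrative only. `personalized_pagerank` is a hypothetical helper built on simple power iteration, `g` is assumed column-stochastic, and the three-page cycle stands in for a web whose first page is the user's homepage.

```ruby
# Power iteration with an arbitrary teleportation distribution.
# Assumptions: g[i][j] is the probability of moving from page j to page i,
# and teleport is a probability distribution over pages (sums to 1).
def personalized_pagerank(g, teleport, s = 0.85, iterations = 200)
  n = g.size
  t = 1.0 - s
  q = teleport.dup

  iterations.times do
    q = (0...n).map do |i|
      s * (0...n).sum { |j| g[i][j] * q[j] } + t * teleport[i]
    end
  end
  q
end

cycle = [[0, 0, 1], [1, 0, 0], [0, 1, 0]] # 1 -> 2 -> 3 -> 1

uniform  = personalized_pagerank(cycle, [1.0 / 3] * 3)
homepage = personalized_pagerank(cycle, [1.0, 0.0, 0.0]) # always teleport to page 1

puts uniform.map { |score| score.round(3) }.inspect
puts homepage.map { |score| score.round(3) }.inspect
```

With uniform teleportation the cycle is perfectly balanced; concentrating teleportation on page 1 lifts it and the pages immediately downstream of it, which is the personalization effect.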

Gaming PageRank: for fun and profit (I don’t endorse it)

Make pages with links!

http://bit.ly/pagerank-spam


Questions?

The slides… Twitter My blog

Slides: http://bit.ly/railsconf-pagerank

Ferret: http://bit.ly/ferret
RB-GSL: http://bit.ly/rb-gsl

PageRank on Wikipedia: http://bit.ly/wp-pagerank
Gaming PageRank: http://bit.ly/pagerank-spam

Michael Nielsen’s lectures on PageRank: http://michaelnielsen.org/blog