Riak Search - Erlang Factory London 2010
![Page 1: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/1.jpg)
Erlang Factory · London · June 2010
Basho Technologies
Rusty Klophaus - @rklophaus
Riak Search
A Full-Text Search
and Indexing Engine
based on Riak
![Page 2: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/2.jpg)
Why did we build it?
What are the major goals?
How does it work?
2
![Page 3: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/3.jpg)
Part One
Why did we build
Riak Search?
3
![Page 4: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/4.jpg)
Riak is
a scalable, highly-available, networked,
open-source key/value store.
4
![Page 5: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/5.jpg)
Key/Value
CLIENT RIAK
5
Writing to a Key/Value Store
![Page 6: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/6.jpg)
Object
CLIENT RIAK
6
Writing to a Key/Value Store
![Page 7: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/7.jpg)
Key
Object
CLIENT RIAK
Querying a Key/Value Store
7
![Page 8: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/8.jpg)
Key + Instructions
Object(s)
CLIENT RIAK
Walk to Related
Keys
Querying Riak via LinkWalking
8
![Page 9: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/9.jpg)
Key(s) + JS Functions
Computed Value(s)
CLIENT RIAK
Map
Reduce
Map
Querying Riak via Map/Reduce
9
![Page 10: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/10.jpg)
Key/Value Stores
like
Key-Based Queries
10
![Page 11: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/11.jpg)
where Category == "Shoes"
CLIENT RIAK
WTF!? I'm a KV store!
Query by Secondary Index
11
![Page 12: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/12.jpg)
"Converse AND Shoes"
CLIENT RIAK
This is getting old.
Full-Text Query
12
![Page 13: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/13.jpg)
These kinds of queries
need an Index.
*Market Opportunity!*
13
![Page 14: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/14.jpg)
Part Two
What are the major
goals of Riak Search?
14
![Page 15: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/15.jpg)
Your Application
Riak
An application built on Riak.
15
![Page 16: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/16.jpg)
Your Application
Riak Index
Object
Hrm... I need an index.
16
![Page 17: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/17.jpg)
Your Application
Riak ???
Hrm... I need an index with more features.
17
![Page 18: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/18.jpg)
Your Application
Riak Lucene
Lucene should do the trick...
18
![Page 19: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/19.jpg)
Your Application
Lucene Lucene Lucene Riak
...shard to add more storage capacity...
19
![Page 20: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/20.jpg)
Your Application
Lucene Lucene Lucene
Lucene Lucene Lucene
Lucene Lucene Lucene
Riak
...replicate to add more throughput.
20
![Page 21: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/21.jpg)
Your Application
Lucene Lucene Lucene
Lucene Lucene Lucene
Lucene Lucene Lucene
Riak
...replicate to add more throughput.
21
Operations nightmare!
![Page 22: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/22.jpg)
Your Application
Riak-ified Lucene
Riak
What do we really want?
22
![Page 23: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/23.jpg)
Your Application
Riak Search
Riak
What do we really want?
23
![Page 24: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/24.jpg)
Functionality? Be like Lucene (and more).
• Lucene Syntax
• Leverages Java Lucene Analyzers
• Solr Endpoints
• Integration via Riak Post-Commit Hook (Index)
• Integration via Riak Map/Reduce (Query)
• Near-Realtime
• Schema-less
24
![Page 25: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/25.jpg)
Operations? Be like Riak.
• No special nodes
• Add nodes, get more compute and storage
• Automatically load balance
• Replicas for durability and performance
• Index and query in parallel
• Swappable storage backends
25
![Page 26: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/26.jpg)
Part Three
How do we do it?
26
![Page 27: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/27.jpg)
A Gentle Introduction to
Document Indexing
27
![Page 28: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/28.jpg)
Document
#1: Every dog has his day.
Inverted Index
day, 1
dog, 1
every, 1
has, 1
his, 1
The Inverted Index
28
![Page 29: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/29.jpg)
Documents
#1: Every dog has his day.
#2: The dog's bark is worse than his bite.
#3: Let the cat out of the bag.
#4: It's raining cats and dogs.
Combined Inverted Index
and, 4
bag, 3
bark, 2
bite, 2
cat, 3
cat, 4
day, 1
dog, 1
dog, 2
dog, 4
every, 1
has, 1
...
The Inverted Index
29
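A minimal sketch of how the combined inverted index on this slide could be built. The `NORMALIZE` table is a hypothetical stand-in for a real analyzer (the slide folds "dogs", "dog's" and "cats" into "dog" and "cat" but does not say how), and the function names are illustrative only.

```python
import re
from collections import defaultdict

# The four example documents from the slide, keyed by document ID.
DOCS = {
    1: "Every dog has his day.",
    2: "The dog's bark is worse than his bite.",
    3: "Let the cat out of the bag.",
    4: "It's raining cats and dogs.",
}

# Hypothetical normalization table standing in for a real analyzer.
NORMALIZE = {"dog's": "dog", "dogs": "dog", "cats": "cat"}


def analyze(text):
    """Lowercase, split into words, and normalize a few inflected forms."""
    words = re.findall(r"[a-z']+", text.lower())
    return [NORMALIZE.get(word, word) for word in words]


def build_index(docs):
    """Return {term: sorted list of IDs of the documents containing it}."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


index = build_index(DOCS)
print(index["dog"])  # [1, 2, 4]
print(index["cat"])  # [3, 4]
```

Each entry is a sorted posting list, which is what makes the merge-based query operators cheap.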
![Page 30: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/30.jpg)
"dog AND cat"
AND
dog cat
At Query Time...
30
![Page 31: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/31.jpg)
AND
dog cat
dog, 1
dog, 2
dog, 4
cat, 3
cat, 4
At Query Time...
31
![Page 32: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/32.jpg)
AND (Merge Intersection)
dog: 1, 2, 4
cat: 3, 4
Result: 4
At Query Time...
32
![Page 33: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/33.jpg)
OR (Merge Union)
dog: 1, 2, 4
cat: 3, 4
Result: 1, 2, 3, 4
At Query Time...
33
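The two merge operators can be sketched as linear passes over sorted posting lists. This is an illustration of the general technique, not Riak Search's own code; the function names are assumptions.

```python
def merge_intersection(a, b):
    """AND: keep the doc IDs present in both sorted lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def merge_union(a, b):
    """OR: keep the doc IDs present in either sorted list, without duplicates."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


dog = [1, 2, 4]
cat = [3, 4]
print(merge_intersection(dog, cat))  # [4]
print(merge_union(dog, cat))         # [1, 2, 3, 4]
```

Both passes touch each posting once, so cost grows with the combined length of the two lists.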
![Page 34: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/34.jpg)
Complex Behavior from Simple Structures
34
![Page 35: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/35.jpg)
Storage Approaches...
35
![Page 36: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/36.jpg)
Riak Search uses
Consistent Hashing
to store data on
Partitions
36
![Page 37: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/37.jpg)
Partitions = 10
Number of Nodes = 5
Partitions per Node = 2
Replicas (NVal) = 2
Introduction to Consistent Hashing and Partitions
37
![Page 38: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/38.jpg)
Object
Introduction to Consistent Hashing and Partitions
38
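A rough sketch of the setup on the previous slide: 10 partitions, 5 nodes, 2 partitions per node, and 2 replicas. The node names, the round-robin ownership, the SHA-1 hash, and the modulo mapping are all assumptions made to keep the example short; the slides do not give these details.

```python
import hashlib

NUM_PARTITIONS = 10
NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical names
N_VAL = 2  # number of replicas

# Assumed round-robin ownership: partition p belongs to node p mod 5,
# which gives every node exactly two partitions.
OWNERS = [NODES[p % len(NODES)] for p in range(NUM_PARTITIONS)]


def partition_for(key):
    """Hash a key onto one of the partitions (modulo used for brevity)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def preference_list(key):
    """Return the N_VAL consecutive (partition, node) pairs that store the key."""
    first = partition_for(key)
    partitions = [(first + i) % NUM_PARTITIONS for i in range(N_VAL)]
    return [(p, OWNERS[p]) for p in partitions]


print(preference_list("some-object-key"))
```

Because consecutive partitions belong to different nodes, the two replicas of an object always land on two distinct machines.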
![Page 39: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/39.jpg)
Document Partitioning
vs.
Term Partitioning
39
![Page 40: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/40.jpg)
...and the
Resulting Tradeoffs
40
![Page 41: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/41.jpg)
#1: Every dog has his day.
Document Partitioning @ Index Time
41
![Page 42: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/42.jpg)
"dog OR cat"
Document Partitioning @ Query Time
42
![Page 43: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/43.jpg)
#1: Every dog has his day.
day, 1
dog, 1
every, 1
has, 1
his, 1
Term Partitioning @ Index Time
43
![Page 44: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/44.jpg)
day, 1 has, 1
every, 1 his, 1
dog, 1
Term Partitioning @ Index Time
44
![Page 45: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/45.jpg)
"dog OR cat"
Term Partitioning @ Query Time
45
![Page 46: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/46.jpg)
Document Partitioning
+ Lower Latency Queries
- Lower Throughput
- Lots of Disk Seeks
Term Partitioning
- Higher Latency Queries
+ Higher Throughput
- Hotspots in Ring (the "Obama" problem)
Tradeoffs...
46
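To make the tradeoff concrete, here is a toy comparison of the two layouts. The partition count, the hash function, and every name below are assumptions for illustration. The point is only how many partitions a "dog OR cat" query must contact under each scheme.

```python
import hashlib

NUM_PARTITIONS = 8  # hypothetical ring size

# Posting lists taken from the example documents earlier in the talk.
POSTINGS = {"dog": [1, 2, 4], "cat": [3, 4], "day": [1], "bag": [3]}


def bucket(value):
    """Deterministically map a doc ID or a term to a partition."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


def index_by_document(postings):
    """Each partition holds a small complete index for the documents it owns."""
    parts = [dict() for _ in range(NUM_PARTITIONS)]
    for term, doc_ids in postings.items():
        for doc_id in doc_ids:
            parts[bucket(doc_id)].setdefault(term, []).append(doc_id)
    return parts


def index_by_term(postings):
    """Each partition holds the full posting list for the terms it owns."""
    parts = [dict() for _ in range(NUM_PARTITIONS)]
    for term, doc_ids in postings.items():
        parts[bucket(term)][term] = list(doc_ids)
    return parts


def query_or_by_document(parts, terms):
    """Any partition may hold a match, so every partition is contacted."""
    hits = set()
    for part in parts:
        for term in terms:
            hits.update(part.get(term, []))
    return sorted(hits), len(parts)


def query_or_by_term(parts, terms):
    """Only the partitions that own the query terms are contacted."""
    contacted = {bucket(term) for term in terms}
    hits = set()
    for p in contacted:
        for term in terms:
            hits.update(parts[p].get(term, []))
    return sorted(hits), len(contacted)


doc_hits, doc_contacted = query_or_by_document(index_by_document(POSTINGS), ["dog", "cat"])
term_hits, term_contacted = query_or_by_term(index_by_term(POSTINGS), ["dog", "cat"])
print(doc_hits, doc_contacted)
print(term_hits, term_contacted)
```

Both layouts return the same documents. The document-partitioned query touches every partition, while the term-partitioned query touches at most one partition per term, which is also why a very popular term can turn its partition into a hotspot.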
![Page 47: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/47.jpg)
Riak Search: Term Partitioning
47
Term-partitioning is the most viable approach for our beta clients’ needs: high throughput on Really Big Datasets.
Optimizations:
• Term splitting to reduce hot spots
• Bloom filters & caching to save query-time bandwidth
• Batching to save query-time & index-time bandwidth
Support for either approach eventually.
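One way a Bloom filter could save query-time bandwidth, sketched under assumptions: before one node ships a posting list to another for an intersection, the receiving side sends a compact filter of its own doc IDs so that only likely matches cross the network. The slides do not describe the actual mechanism; the class, its parameters, and this usage are illustrative.

```python
import hashlib


class BloomFilter:
    """Compact set summary: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# The node holding "cat" summarizes its posting list...
cat_filter = BloomFilter()
for doc_id in [3, 4]:
    cat_filter.add(doc_id)

# ...and the node holding "dog" ships only the candidates that may match.
dog_postings = [1, 2, 4]
candidates = [d for d in dog_postings if cat_filter.might_contain(d)]
print(candidates)
```

The candidate list always contains the true intersection and may contain a few extra IDs, which the final merge discards.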
![Page 48: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/48.jpg)
Diving Deeper:
The Lifecycle of a Query
48
![Page 49: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/49.jpg)
Parse the Query
49
![Page 50: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/50.jpg)
meeting AND (face OR phone)
The Query
50
![Page 51: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/51.jpg)
[{land, [
{term,"meeting",[]},
{lor,[
{term,"face",[]},
{term,"phone",[]}
]}
]}]
The Query as an Erlang Term (Parse w/ Leex and Yecc)
51
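The talk parses queries with Leex and Yecc. As a stand-in, here is a small hand-written recursive-descent parser in Python that yields the same nested shape as tuples. The grammar (AND binding tighter than OR, parentheses for grouping) and all function names are assumptions, not the project's grammar.

```python
import re

# Parentheses, the AND/OR keywords, or a bare word.
TOKEN = re.compile(r"\(|\)|AND\b|OR\b|\w+")


def parse(query):
    """Parse a boolean query into nested ('land' | 'lor' | 'term') tuples."""
    tokens = TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        operands = [parse_and()]
        while peek() == "OR":
            take()
            operands.append(parse_and())
        return operands[0] if len(operands) == 1 else ("lor", operands)

    def parse_and():
        operands = [parse_primary()]
        while peek() == "AND":
            take()
            operands.append(parse_primary())
        return operands[0] if len(operands) == 1 else ("land", operands)

    def parse_primary():
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        return ("term", tok, [])

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return [tree]


print(parse("meeting AND (face OR phone)"))
```

The empty list in each `term` tuple mirrors the empty property list in the Erlang term, where later stages attach data such as node weights.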
![Page 52: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/52.jpg)
#land
  #term "meeting"
  #lor
    #term "face"
    #term "phone"
The Query as a Graph
52
![Page 53: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/53.jpg)
Plan the Query
53
![Page 54: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/54.jpg)
Catalog Catalog Catalog Catalog
Catalog Catalog Catalog Catalog
System Catalog
54
![Page 55: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/55.jpg)
Term → TermID
Term → Weight & File Offset
System Catalog
55
![Page 56: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/56.jpg)
#land
  #term "meeting"
  #lor
    #term "face"
    #term "phone"
23 @ node B
17 @ node A
13 @ node C
Consult the System Catalog for Term/Node Weights
56
![Page 57: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/57.jpg)
#land
  #term "meeting"
  #lor
    #term "face"
    #term "phone"
23 @ node B
17 @ node A
13 @ node C
Use Term Weights to Plan the Query
57
![Page 59: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/59.jpg)
#land
  #term "meeting"
  #lor
    #term "face"
    #term "phone"
#node@A
#node@B
23 @ node B
17 @ node A
13 @ node C
Use Term Weights to Plan the Query
59
![Page 60: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/60.jpg)
[{node,
  {land, [
    {node,
      {lor, [
        {term, {"email","body","face"}, [
          {node_weight, '[email protected]', 13}
        ]},
        {term, {"email","body","phone"}, [
          {node_weight, '[email protected]', 17}
        ]}
      ]},
      '[email protected]'},
    {term, {"email","body","meeting"}, [
      {node_weight, '[email protected]', 23}
    ]}
  ]},
  '[email protected]'}]
The Node-Assigned Query as an Erlang Term
60
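The slides do not spell out the planning rule, so the following is only a guess at its spirit: run each operator on the node that holds its heaviest input, so the largest posting list never crosses the network. The catalog contents combine the weights in the Erlang term with the node labels in the diagram; the heuristic and all names are assumptions.

```python
# Assumed catalog lookup: term -> (weight, node).
CATALOG = {
    "meeting": (23, "B"),
    "phone": (17, "A"),
    "face": (13, "C"),
}


def plan(op):
    """Return (planned_op, weight, node) for a parsed query operator."""
    kind = op[0]
    if kind == "term":
        _, term, props = op
        weight, node = CATALOG[term]
        planned = ("term", term, props + [("node_weight", node, weight)])
        return planned, weight, node
    # 'land' / 'lor': plan the children, then follow the heaviest one.
    results = [plan(child) for child in op[1]]
    children = [planned for planned, _, _ in results]
    _, weight, node = max(results, key=lambda r: r[1])
    return ("node", (kind, children), node), weight, node


query = ("land", [
    ("term", "meeting", []),
    ("lor", [("term", "face", []), ("term", "phone", [])]),
])
planned, _, _ = plan(query)
print(planned)
```

Under this heuristic the `lor` is wrapped in a node operator for A and the `land` in one for B, giving the same nesting of node operators as the Erlang term.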
![Page 61: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/61.jpg)
Execute the Query
61
![Page 62: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/62.jpg)
#land
  #term "meeting"
  #lor
    #term "face"
    #term "phone"
#node@A
#node@B
Spawn the Query Processes
62
![Page 67: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/67.jpg)
#land
  #term "meeting"
  #lor
    #term "face"
    #term "phone"
#node@A
#node@B
Spawn the Query Processes & Stream the Results
67
![Page 70: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/70.jpg)
#land
  #term 'disconnect'
  #lor
    #term 'disconnect'
    #term 'disconnect'
#node@A
#node@B
Terminate When Finished
70
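In the talk each operator is an Erlang process that streams results to its parent. As a loose single-machine analogy, Python generators can play the same role: each operator pulls from its children and yields doc IDs upward as soon as they are known. The posting lists and function names below are made up for illustration.

```python
import heapq


def term_stream(postings):
    """Leaf operator: stream doc IDs from a sorted posting list."""
    yield from postings


def lor(*streams):
    """OR operator: merge sorted streams, dropping duplicate doc IDs."""
    previous = None
    for doc_id in heapq.merge(*streams):
        if doc_id != previous:
            yield doc_id
            previous = doc_id


def land(left, right):
    """AND operator: yield the doc IDs that appear in both sorted streams."""
    left, right = iter(left), iter(right)
    a, b = next(left, None), next(right, None)
    while a is not None and b is not None:
        if a == b:
            yield a
            a, b = next(left, None), next(right, None)
        elif a < b:
            a = next(left, None)
        else:
            b = next(right, None)


# Hypothetical posting lists for: meeting AND (face OR phone)
meeting = term_stream([2, 5, 7, 9])
face = term_stream([1, 5, 8])
phone = term_stream([2, 8, 9])

results = list(land(meeting, lor(face, phone)))
print(results)  # [2, 5, 9]
```

When a generator is exhausted its consumers stop too, which loosely corresponds to the 'disconnect' messages flowing up the tree on this slide.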
![Page 71: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/71.jpg)
Message Format
71
![Page 72: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/72.jpg)
Message ::
{results, [Result]} |
{results, disconnect}
Result ::
{DocID, Properties}
DocID ::
term()
Properties ::
proplist()
The Message Format
72
![Page 73: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/73.jpg)
{results, [
{375, []},
{961, [{color, "red"}]},
{155, [{pos, [1,2,5]}]}
]}
The Message Format
73
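A small sketch of a consumer for this message format, using Python tuples in place of Erlang terms and a `queue.Queue` in place of a process mailbox. Both substitutions, and the `collect` helper itself, are assumptions for illustration.

```python
import queue

DISCONNECT = ("results", "disconnect")


def collect(inbox):
    """Gather (DocID, Properties) pairs until the sender disconnects."""
    results = []
    while True:
        tag, payload = inbox.get()
        if tag != "results":
            raise ValueError("unexpected message tag: %r" % (tag,))
        if payload == "disconnect":
            return results
        results.extend(payload)


inbox = queue.Queue()
inbox.put(("results", [(375, []), (961, [("color", "red")])]))
inbox.put(("results", [(155, [("pos", [1, 2, 5])])]))
inbox.put(DISCONNECT)

collected = collect(inbox)
print([doc_id for doc_id, _props in collected])  # [375, 961, 155]
```

Results arrive in batches, and the property list on each result lets an operator pass along extras such as term positions without changing the message shape.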
![Page 74: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/74.jpg)
Yay for Erlang!
74
• Clean lines between load balancing and logic, single- and multi-node look the same
• Easy to create new operators, rapid development of experimental features
• Linked processes make cleanup a breeze
• Significant code reduction over early Java prototypes
![Page 75: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/75.jpg)
Part Four
Review
75
![Page 76: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/76.jpg)
"Converse AND Shoes"
CLIENT RIAK
WTF!? I'm a
KV store!
Riak Search turns this...
76
![Page 77: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/77.jpg)
"Converse AND Shoes"
CLIENT RIAK
Gladly!
...into this...
77
![Page 78: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/78.jpg)
"Converse AND Shoes"
CLIENT RIAK
Keys or Objects
...into this...
78
![Page 79: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/79.jpg)
Your Application
Riak Search
Riak
...while keeping operations easy.
79
![Page 80: Riak Search - Erlang Factory London 2010](https://reader034.fdocuments.in/reader034/viewer/2022051514/5491a128ac79595e288b4596/html5/thumbnails/80.jpg)
Thanks! Questions?
Search Team:
John Muellerleile - @jrecursive
Rusty Klophaus - @rklophaus
Kevin Smith - @kevsmith
Currently working with a small set of Beta users.
Open-source release planned for Q3.
www.basho.com