Advanced Data Modeling and Bitmap Indexes
Transcript of Advanced Data Modeling and Bitmap Indexes
![Page 2: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/2.jpg)
WHO ARE YOUR
Customers?
Monday, May 6, 13
![Page 3: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/3.jpg)
WHERE DO THEY
Hang out?
![Page 4: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/4.jpg)
HOW SHOULD YOU
Engage?
![Page 5: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/5.jpg)
What is User Experience?
![Page 6: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/6.jpg)
What is my Data?
![Page 7: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/7.jpg)
Form Follows Function
![Page 8: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/8.jpg)
Data Follows Queries
![Page 9: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/9.jpg)
Primary Key
CREATE TABLE users (username text PRIMARY KEY, first_name text, last_name text, postal_code text, last_login timestamp);
INSERT INTO users (username, first_name, last_name, postal_code, last_login) VALUES ('cstar', 'Cassandra', 'Database', '11111', '2013-04-04');
SELECT first_name, last_name FROM users WHERE username = 'cstar';
![Page 10: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/10.jpg)
Primary Key
| RowKey | username | first_name | last_name | postal_code |
|---|---|---|---|---|
| cstar | cstar | Cassandra | Database | 11111 |
| user2 | user2 | Some | Guy | 22222 |
![Page 11: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/11.jpg)
Secondary Index
CREATE INDEX user_zipcode ON users(postal_code);
| postal_code | usernames |
|---|---|
| 11111 | cstar |
| 22222 | user2 user3 user456 ... |
![Page 12: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/12.jpg)
Where Secondary Indexes Break
1. High Cardinality Data
2. Only one index per query
3. Indexes are distributed
4. Only some datatypes; no counters
5. Range queries are expensive
![Page 13: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/13.jpg)
Roll Your Own Using Wide Rows
| RowKey | 05/02/2012 | 02/01/2013 | 05/02/2013 | ... |
|---|---|---|---|---|
| user2 | JSON | JSON | JSON | JSON |

All events for “user2” indexed by time
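The wide-row layout can be pictured as one sorted map per row key, where the column names are timestamps and the values are JSON payloads. The following is a minimal in-memory sketch of that idea, not Cassandra's API: the `WideRow` class, its method names, and the event payloads are hypothetical, and the slide's dates are assumed to be month/day/year.

```python
import bisect
from datetime import date


class WideRow:
    """In-memory stand-in for one wide row: columns kept sorted by timestamp."""

    def __init__(self):
        self._times = []   # sorted column names (event dates)
        self._values = []  # JSON payloads, parallel to _times

    def add_event(self, when, payload):
        # Insert while keeping the column names sorted, as a wide row would.
        i = bisect.bisect_right(self._times, when)
        self._times.insert(i, when)
        self._values.insert(i, payload)

    def slice(self, start, end):
        # Return payloads with start <= timestamp <= end (a column slice).
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return self._values[lo:hi]


# One row per user; payloads below are made up for illustration.
rows = {"user2": WideRow()}
rows["user2"].add_event(date(2013, 2, 1), '{"event": "login"}')
rows["user2"].add_event(date(2012, 5, 2), '{"event": "signup"}')
rows["user2"].add_event(date(2013, 5, 2), '{"event": "purchase"}')

events_2013 = rows["user2"].slice(date(2013, 1, 1), date(2013, 12, 31))
```

A time slice within one row is cheap because the columns are already sorted; a query that spans many row keys has no equivalent shortcut in this layout.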
![Page 14: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/14.jpg)
Limitations to Rolling Your Own
1. Can’t query across rows
2. Only some datatypes; no counters
3. Requires lots of work in the application
4. No complex queries
![Page 15: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/15.jpg)
What do I need?
![Page 16: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/16.jpg)
A Query Engine Wishlist
1. High cardinality data; counters
2. Complex queries, multiple clauses
3. Results in < 500ms for billions of rows
4. Sub-field searching; regex
5. Range queries
![Page 17: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/17.jpg)
First Iteration: Ginormous String Sets
11111 cstar
22222 user2 user3 user456 ...
11111 22222
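A string-set index of this kind can be sketched with plain Python sets: each indexed value maps to the set of usernames that carry it, and queries become set unions and intersections. The helper names `lookup_any` and `lookup_all` are hypothetical; the data comes from the users table shown earlier.

```python
# Each indexed value maps to the set of usernames that have it.
by_postal_code = {
    "11111": {"cstar"},
    "22222": {"user2", "user3", "user456"},
}
by_last_name = {
    "Database": {"cstar"},
    "Guy": {"user2"},
}


def lookup_any(index, values):
    """Usernames matching any of the given values (set union)."""
    result = set()
    for value in values:
        result |= index.get(value, set())
    return result


def lookup_all(clauses):
    """Usernames matching every (index, value) clause (set intersection)."""
    sets = [index.get(value, set()) for index, value in clauses]
    return set.intersection(*sets) if sets else set()


either_zip = lookup_any(by_postal_code, ["11111", "22222"])
guy_in_22222 = lookup_all([(by_postal_code, "22222"), (by_last_name, "Guy")])
```

Every set stores full username strings, so memory grows with both the number of rows and the key length, which is presumably what makes the sets "ginormous" at scale.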
![Page 18: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/18.jpg)
Bitmaps
![Page 19: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/19.jpg)
Bitmaps
![Page 20: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/20.jpg)
Bitmaps: How do they Work?
| | 0-7 | 8-15 | 16-23 | 24-31 |
|---|---|---|---|---|
| 11111 | 11010011 | 1011011 | 1010000 | 00000000 |
| 22222 | 00000000 | 0011011 | 00000000 | 00000000 |
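One way to read the table: every row (user) gets a numeric position, and each distinct value owns a bit array in which bit N is set when row N has that value. A minimal sketch follows, using a Python integer as the bit array. The `BitmapIndex` class and the row numbering are assumptions for illustration, not the engine's actual layout.

```python
from collections import defaultdict


class BitmapIndex:
    """Maps each distinct value to a bitmap; bit N set means row N has the value."""

    def __init__(self):
        self.bitmaps = defaultdict(int)  # value -> int used as a bit array

    def set(self, value, row_id):
        self.bitmaps[value] |= 1 << row_id

    @staticmethod
    def rows(bitmap):
        """Expand a bitmap back into the list of row ids whose bit is set."""
        ids, row_id = [], 0
        while bitmap:
            if bitmap & 1:
                ids.append(row_id)
            bitmap >>= 1
            row_id += 1
        return ids


# Hypothetical row numbering for the users seen on earlier slides.
row_ids = {"cstar": 0, "user2": 1, "user3": 2, "user456": 3}

postal_code = BitmapIndex()
postal_code.set("11111", row_ids["cstar"])
for name in ("user2", "user3", "user456"):
    postal_code.set("22222", row_ids[name])

rows_in_22222 = BitmapIndex.rows(postal_code.bitmaps["22222"])
```

Compared with string sets, each matching row costs one bit rather than one copy of the username.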
![Page 21: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/21.jpg)
Bitmaps: Equality
| | 0-7 | 8-15 | 16-23 | 24-31 |
|---|---|---|---|---|
| 11111 | 11010011 | 1011011 | 1010000 | 00000000 |
| 22222 | 00000000 | 0011011 | 00000000 | 00000000 |

SELECT * FROM users WHERE postal_code IN ('11111','22222');

| | 0-7 | 8-15 | 16-23 | 24-31 |
|---|---|---|---|---|
| 11111 & 22222 | 00000000 | 0011011 | 00000000 | 00000000 |
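Combining bitmaps comes down to bitwise operators. The `&` row on the slide is the bitwise AND of the two example bitmaps, which is the operation that joins separate clauses. An `IN` list asks for rows matching any listed value, so it would presumably map to a bitwise OR of the value bitmaps instead. The sketch below shows both, with small made-up bitmaps; the function names are hypothetical.

```python
from functools import reduce
import operator


def bitmap_or(bitmaps):
    """Rows that match any of the bitmaps (e.g. an IN list on one field)."""
    return reduce(operator.or_, bitmaps, 0)


def bitmap_and(bitmaps):
    """Rows that match all of the bitmaps (e.g. clauses joined by AND)."""
    bitmaps = list(bitmaps)
    if not bitmaps:
        raise ValueError("AND needs at least one bitmap")
    return reduce(operator.and_, bitmaps)


# Made-up example: bit N set means row N has the value.
postal_11111 = 0b0001    # row 0
postal_22222 = 0b1110    # rows 1, 2, 3
last_name_guy = 0b0010   # row 1

# postal_code IN ('11111', '22222')
in_result = bitmap_or([postal_11111, postal_22222])

# postal_code = '22222' AND last_name = 'Guy'
and_result = bitmap_and([postal_22222, last_name_guy])
```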
![Page 22: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/22.jpg)
Bitmaps: Range, or How Do I Query Counters?
| Field | Value | 0-7 | 8-15 | 16-23 | 24-31 |
|---|---|---|---|---|---|
| Event2 | 1 | 11010011 | 1011011 | 1010000 | 00000000 |
| Event2 | 4 | 00000000 | 0011011 | 00000000 | 00000000 |

| | 0-7 | 8-15 | 16-23 | 24-31 |
|---|---|---|---|---|
| 1 & 4 | 00000000 | 0011011 | 00000000 | 00000000 |

SELECT * FROM users WHERE Event2 > 0 AND Event2 < 5;
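For a counter field, one plausible scheme keeps one bitmap per observed counter value and answers a range query by combining the bitmaps of every value inside the range. Because a row holds exactly one value of the counter at a time, the sketch below combines them with OR; that is an assumption about the intended semantics, and `range_query` and the sample data are hypothetical.

```python
def range_query(value_bitmaps, low, high):
    """OR together the bitmaps of all values v with low < v < high."""
    result = 0
    for value, bitmap in value_bitmaps.items():
        if low < value < high:
            result |= bitmap
    return result


# Made-up data for field Event2: counter value -> bitmap of rows at that value.
event2 = {
    0: 0b00001,  # row 0 has Event2 = 0
    1: 0b00110,  # rows 1 and 2 have Event2 = 1
    4: 0b01000,  # row 3 has Event2 = 4
    7: 0b10000,  # row 4 has Event2 = 7
}

# WHERE Event2 > 0 AND Event2 < 5
matching = range_query(event2, 0, 5)
```

The cost of the query scales with the number of distinct values in the range rather than with the number of rows.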
![Page 23: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/23.jpg)
Trigrams; AKA You Promised REGEX
| Field | Value | 0-7 | 8-15 | 16-23 | 24-31 |
|---|---|---|---|---|---|
| last_name | “foo” | 11010011 | 1011011 | 1010000 | 00000000 |
| last_name | “bar” | 00000000 | 0011011 | 00000000 | 00000000 |

| | 0-7 | 8-15 | 16-23 | 24-31 |
|---|---|---|---|---|
| “foo” & “bar” | 00000000 | 0011011 | 00000000 | 00000000 |

SELECT * FROM users WHERE last_name ~= 'f.*bar';
INSERT INTO users (username, first_name, last_name, postal_code, last_login) VALUES ('foobar82', 'johnny', 'foobar', '94110', '2013-04-04');
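The trigram approach can be sketched as two steps: AND the bitmaps of the trigrams a match must contain to get candidate rows, then run the real regex only on those candidates. In this sketch the required trigrams are passed in by the caller; deriving them automatically from the pattern is outside its scope. `TrigramIndex` and the sample rows are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text):
    """All overlapping three-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.bitmaps = defaultdict(int)  # trigram -> bitmap of row ids
        self.values = {}                 # row id -> original string

    def add(self, row_id, text):
        self.values[row_id] = text
        for gram in trigrams(text):
            self.bitmaps[gram] |= 1 << row_id

    def search(self, required, pattern):
        # Step 1: AND the bitmaps of the required trigrams (-1 has every bit set).
        candidates = -1
        for gram in required:
            candidates &= self.bitmaps.get(gram, 0)
        # Step 2: verify each candidate with the actual regular expression.
        regex = re.compile(pattern)
        return [row_id for row_id, text in sorted(self.values.items())
                if (candidates >> row_id) & 1 and regex.search(text)]


last_name = TrigramIndex()
last_name.add(0, "foobar")
last_name.add(1, "database")
last_name.add(2, "barfoo")   # has both trigrams but does not match f.*bar
last_name.add(3, "guy")

matches = last_name.search(["foo", "bar"], r"f.*bar")
```

The bitmap step only narrows the search; row 2 shows why the regex still has to run on the candidates.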
![Page 24: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/24.jpg)
![Page 25: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/25.jpg)
Not Everything is Roses and Honey
1. Indexes can be huge
2. Requires a read before write
3. Requires synchronization
![Page 26: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/26.jpg)
Compression
![Page 27: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/27.jpg)
RLE Compression: How it Works
| Header | Fill, 11 blocks of 1s | Literal 15 bits | Fill, 18 blocks of 0s | Literal 15 bits |
|---|---|---|---|---|
| 1010 | 10000000001011 | 111010000100101 | 000000000010010 | 000000010000011 |
Example taken from PWAH: http://www.sjvs.nl/?p=72
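The fill/literal idea can be illustrated with a deliberately simplified run-length scheme: split the bitmap into 15-bit blocks, collapse consecutive all-0 or all-1 blocks into a single fill entry with a count, and keep mixed blocks as literals. This is a sketch of the general technique only. It does not reproduce PWAH's packed word layout or header bits, and the function names are hypothetical.

```python
BLOCK = 15  # literal width shown on the slide


def compress(bits):
    """bits: string of '0'/'1' whose length is a multiple of BLOCK."""
    words = []
    for i in range(0, len(bits), BLOCK):
        block = bits[i:i + BLOCK]
        if block in ("0" * BLOCK, "1" * BLOCK):
            fill_bit = block[0]
            # Extend the previous fill if it repeats the same bit.
            if words and words[-1][0] == "fill" and words[-1][1] == fill_bit:
                words[-1] = ("fill", fill_bit, words[-1][2] + 1)
            else:
                words.append(("fill", fill_bit, 1))
        else:
            words.append(("literal", block))
    return words


def decompress(words):
    out = []
    for word in words:
        if word[0] == "fill":
            out.append(word[1] * BLOCK * word[2])
        else:
            out.append(word[1])
    return "".join(out)


# Same shape as the slide: 11 blocks of 1s, a literal, 18 blocks of 0s, a literal.
bits = ("1" * BLOCK * 11 + "111010000100101"
        + "0" * BLOCK * 18 + "000000010000011")
words = compress(bits)
```

Here 465 input bits collapse into four entries, which is why long runs of identical bits (common in sparse indexes) compress so well.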
![Page 28: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/28.jpg)
Dealing with Read Before Write
Partition Index Using a Ring
{ "product": 124, "user": 22, "event": "event2", "value": "Name=Jonathan+Doe&Age=23" }
Apply Hash to User Configured Field
hash(:product) = c62fb32eadd5a0fcceb1ddf2697e2345c604f451
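A hash ring of this kind can be sketched as follows: hash the configured field of each event, then route the event to the node that owns that point on the ring. Every event for the same product then lands on the same node, which can presumably keep that product's bitmaps locally. The slide does not name the hash function; SHA-1 is assumed here only because the digest shown is 40 hex characters long. The `Ring` class, the node names, and the `vnodes` parameter are hypothetical.

```python
import bisect
import hashlib


class Ring:
    """Consistent-hash ring mapping a field value to an owning node."""

    def __init__(self, nodes, vnodes=8):
        # Each node gets several points on the ring to spread load.
        self._tokens = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        # Assumption: SHA-1; the talk does not say which hash is used.
        return int(hashlib.sha1(str(key).encode("utf-8")).hexdigest(), 16)

    def node_for(self, event, field="product"):
        token = self._hash(event[field])
        # First ring point at or after the token, wrapping around at the end.
        i = bisect.bisect_left(self._tokens, (token,))
        return self._tokens[i % len(self._tokens)][1]


ring = Ring(["node-a", "node-b", "node-c"])
event = {"product": 124, "user": 22, "event": "event2",
         "value": "Name=Jonathan+Doe&Age=23"}
other_user = {"product": 124, "user": 99, "event": "event2", "value": ""}

owner = ring.node_for(event)
```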
![Page 29: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/29.jpg)
Ring Partitioning
1. Solves read before write
2. Solves synchronization issues
3. Ensures index locality
4. Easy to isolate big customers
5. Index size is limited to the largest customer
![Page 30: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/30.jpg)
Sparse Indexes
| | Offset 0x00 | Offset 0x01 | Offset 0xA0 | Offset 0xF0 |
|---|---|---|---|---|
| Field1 | 0111010101101111 | 1001010100100101 | 0111010000100101 | 0111011100100101 |
Only Store the Set Bits
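One reading of the sparse layout: the bitmap is cut into fixed-size chunks, and only chunks containing at least one set bit are stored, keyed by their offset. The sketch below follows that reading with 16-bit chunks to match the slide's column width; `SparseBitmap` and its methods are hypothetical.

```python
CHUNK_BITS = 16  # chunk width matching the 16-bit columns on the slide


class SparseBitmap:
    """Stores only the chunks that contain at least one set bit."""

    def __init__(self):
        self.chunks = {}  # chunk offset -> 16-bit int; all-zero chunks are absent

    def set(self, bit):
        offset, pos = divmod(bit, CHUNK_BITS)
        self.chunks[offset] = self.chunks.get(offset, 0) | (1 << pos)

    def clear(self, bit):
        offset, pos = divmod(bit, CHUNK_BITS)
        if offset in self.chunks:
            chunk = self.chunks[offset] & ~(1 << pos)
            if chunk:
                self.chunks[offset] = chunk
            else:
                del self.chunks[offset]  # drop chunks that became empty

    def test(self, bit):
        offset, pos = divmod(bit, CHUNK_BITS)
        return bool((self.chunks.get(offset, 0) >> pos) & 1)

    def intersect(self, other):
        """Bitwise AND; only offsets present in both bitmaps can survive."""
        result = SparseBitmap()
        for offset in self.chunks.keys() & other.chunks.keys():
            chunk = self.chunks[offset] & other.chunks[offset]
            if chunk:
                result.chunks[offset] = chunk
        return result


field1 = SparseBitmap()
field1.set(3)                      # lands in chunk 0x00
field1.set(0xA0 * CHUNK_BITS + 1)  # lands in chunk 0xA0
stored_offsets = sorted(field1.chunks)
```

The 159 empty chunks between offsets 0x00 and 0xA0 take no space at all in this representation.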
![Page 31: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/31.jpg)
Query & Indexing Engine
The Whole Enchilada
Queries and Events
![Page 32: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/32.jpg)
Goals
1. Core query and index engine, wrapped
2. Extensible events and queries via Lua
3. Equality, range and REGEX queries
4. Distributed, <500ms for billions of rows
5. No single point of failure
![Page 33: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/33.jpg)
Resources
Lots of Papers on Bitmap Compression
http://www-users.cs.umn.edu/~kewu/annotated.html
How Google Code Search Worked
http://swtch.com/~rsc/regexp/regexp4.html
![Page 34: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/34.jpg)
GOT ANY
Questions?
![Page 35: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/35.jpg)
Thanks
Eric Tschetter of the Druid Project and
Cassandra Devs for answering my questions
![Page 36: Advanced Data Modeling and Bitmap Indexes](https://reader034.fdocuments.in/reader034/viewer/2022052310/55509a1ab4c9058b208b4866/html5/thumbnails/36.jpg)
THANK YOU!
Matt Stump
www.matthewstump.com
@mattstump