Learning Data Systems Components


Transcript of Learning Data Systems Components

Page 1: Learning Data Systems Components

Tim Kraska <[email protected]> [Disclaimer: I am NOT talking on behalf of Google]

Learning Data Systems Components

Work partially done at Google

Page 2: Learning Data Systems Components

Hash Maps

Sorting

Joins

Bloom Filter

Tree

“Machine Learning Just Ate Algorithms In One Large Bite….” [Christopher Manning, Professor at Stanford]

Comments on Social Media

Page 3: Learning Data Systems Components

Disclaimer

Hash Maps

Sorting

Joins

Bloom Filter

Tree

Page 4: Learning Data Systems Components

Fundamental Building Blocks

Sorting

B-Tree

Hash-Map

Scheduling

Join

Priority Queue

Bloom Filter

Caching

Range Filter

Page 5: Learning Data Systems Components

Databases as an Example:

B-Trees

Page 6: Learning Data Systems Components
Page 7: Learning Data Systems Components
Page 8: Learning Data Systems Components
Page 9: Learning Data Systems Components

(B-Tree diagram: inner nodes with key ranges B-C, C-G, G-J, K-N, N-R, Q-S, S-U, U-V, V-X, X-@)

Page 10: Learning Data Systems Components

(B-Tree diagram: inner nodes B-C, C-G, G-J, K-N, N-R; next level A-B, B-C, C-G, …; lookup by Key)

Page 11: Learning Data Systems Components

(B-Tree diagram, one level deeper: inner nodes B-C, C-G, G-J, K-N, N-R; next level A-B, B-C, C-G, …; leaf level AA-AL, AL-AK, AK-AP, …, BA-BE, BI-BL, BL-BR, …; lookup by Key)

Page 12: Learning Data Systems Components

(B-Tree diagram, fully expanded: A-B, B-C, C-G, …; AA-AL, AL-AK, AK-AP, …, BA-BE, BI-BL, BL-BR, …; further leaf levels elided; lookup by Key)

Page 13: Learning Data Systems Components

(B-Tree diagram: inner nodes B-C, C-G, G-J, K-N, N-R)

Page 14: Learning Data Systems Components

The Librarian

Page 15: Learning Data Systems Components

Harry Potter

Children's Books

Curious George

O'Reilly Books

Travel Books

The Da Vinci Code

The Girl on the Train

Bill Bryson

The Source

The Gruffalo

A Day in the Life of Marlon Bundo

ML With Python

Page 16: Learning Data Systems Components

Harry Potter

Children's Books

Curious George

O'Reilly Books

Travel Books

The Da Vinci Code

The Girl on the Train

Bill Bryson

The Source

The Gruffalo

Make Way for Ducklings

A Day in the Life of Marlon Bundo

ML With Python

Page 17: Learning Data Systems Components

(B-Tree diagram: inner nodes B-C, C-G, G-J, K-N, N-R)

Page 18: Learning Data Systems Components

(B-Tree diagram: inner nodes B-C, C-G, G-J, K-N, N-R; next level A-B, B-C, C-G, …; leaf level AA-AL, AL-AK, AK-AP, …, BA-BE, BI-BL, BL-BR, …; further levels elided; lookup by Key)

Page 19: Learning Data Systems Components

(Diagram: the B-Tree's inner nodes B-C, C-G, G-J, K-N, N-R are replaced by a Model; lookup by Key)

Page 20: Learning Data Systems Components

Fundamental Algorithms & Data Structures

Hash-Map

Tree

Sorting

Join

Range-Filter

Priority Queue

Scheduling

Cache Policy

Bloom-Filter

…

Page 21: Learning Data Systems Components

Not convinced yet?

Page 22: Learning Data Systems Components

Another Example: Index All Integers from 900 to 800M

900 901 902 903 904 905 906 907 908 909 … 800M

(B-Tree diagram over these keys)

B-Tree?

Page 23: Learning Data Systems Components

A More Concrete Example: Index All Integers from 900 to 800M

900 901 902 903 904 905 906 907 908 909 … 800M

data_array[lookup_key - 900]

Page 24: Learning Data Systems Components

Goal: Index All Integers from 900 to 800M

900 901 902 903 904 905 906 907 908 909 … 800M

900 902 904 906 908 910 912 914 916 918 … 800M

Index All Even Integers from 900 to 800M

data_array[(lookup_key - 900) / 2]
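Because the key set here is perfectly regular, the slide's formula is already an exact, zero-error "model". A minimal runnable sketch in Python (using `range`, which indexes in O(1) without materializing 400M integers):

```python
# All even integers from 900 to 800M; a Python range indexes in O(1).
data_array = range(900, 800_000_001, 2)

def lookup(lookup_key: int) -> int:
    # The distribution is known exactly, so the "model" has zero error:
    # an O(1) lookup with no search step at all.
    return data_array[(lookup_key - 900) // 2]

assert lookup(908) == 908
```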

Page 25: Learning Data Systems Components

Still holds for other data distributions

Page 26: Learning Data Systems Components

Key Insight: Traditional data structures (typically) make no assumptions about the data.

But knowing the data distribution might allow for significant performance gains and might even change the complexity of data structures (e.g., O(log n) → O(1) for lookups, or O(n) → O(1) for storage).

Page 27: Learning Data Systems Components

Building A System From Scratch For Every Use Case Is Not Economical

Page 28: Learning Data Systems Components

Conceptually a B-Tree maps a key to a page

B-Tree

key

page

For simplicity, assume all pages are stored contiguously in main memory

Page 29: Learning Data Systems Components

Alternative View: A B-Tree maps a key to a position with a fixed min/max error

For simplicity, assume all pages are stored contiguously in main memory

B-Tree

Sorted Array

key

position

pos … pos + page-size

1. B-Tree: key → pos
2. Binary search within min/max error

Page 30: Learning Data Systems Components

Sorted Array

key

position

pos … pos + page-size

Model

A B-Tree Is A Model

Page 31: Learning Data Systems Components

Finding an item:
1. Any model: key → pos estimate
2. Binary search in [pos - err_min, pos + err_max]

err_min and err_max are known from the training process

Sorted Array

key

position

pos … pos + page-size

Model

A B-Tree Is A Model
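A minimal sketch of this two-step lookup in Python (`model`, `err_min`, and `err_max` are assumed to come out of the training process, as the slide states):

```python
import bisect

def model_lookup(data, key, model, err_min, err_max):
    # Step 1: any model predicts an approximate position.
    pos = model(key)
    # Step 2: binary search only inside the error window known from training.
    lo = max(0, pos - err_min)
    hi = min(len(data), pos + err_max + 1)
    i = bisect.bisect_left(data, key, lo, hi)
    return i if i < len(data) and data[i] == key else None
```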

Page 32: Learning Data Systems Components

A B-Tree Is A Model: a form of regression model. Modeling key → pos is equivalent to modeling the CDF of the (observed) key distribution: pos_estimate = P(X ≤ key) * #keys

Sorted Array

key

position

pos … pos + page-size

Model

Page 33: Learning Data Systems Components

A B-Tree Is A Model

pos_estimate = F(key) * #keys
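To make pos_estimate = F(key) * #keys concrete, here is a sketch that fits one linear model to the empirical CDF of synthetic keys (the uniform data and all names are illustrative assumptions, not from the talk):

```python
import numpy as np

keys = np.sort(np.random.uniform(0, 1000, size=100_000))  # synthetic keys
cdf = np.arange(len(keys)) / len(keys)       # empirical P(X <= key)
slope, intercept = np.polyfit(keys, cdf, 1)  # crude one-segment CDF model

def pos_estimate(key):
    return int((intercept + slope * key) * len(keys))  # F(key) * #keys
```

For uniform keys one line is already accurate; skewed distributions are what motivates the multi-model RMI later in the talk.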

Page 34: Learning Data Systems Components

B-Trees Are Regression Trees

B-Tree

Sorted Array

key

position

Page 35: Learning Data Systems Components

What Does This Mean?

Page 36: Learning Data Systems Components

What Does This Mean?

Database people were the first to do large scale machine learning :)

Page 37: Learning Data Systems Components

Potential Advantages of Learned B-Tree Models

• Smaller indexes → less (main-memory) storage

• Faster Lookups?

• More parallelism → Sequential if-statements are exchanged for multiplications

• Hardware accelerators → Lower power, better $/compute….

• Cheaper inserts? → more on that later. For the moment, assume read-only

Page 38: Learning Data Systems Components

A First Attempt

• 200M web-server log records, sorted by timestamp
• 2-layer NN, width 32, ReLU activations
• Prediction task: timestamp → position within the sorted array

Page 39: Learning Data Systems Components

A First Attempt

Cache-Optimized B-Tree: ≈250ns. Learned model: ???

Page 40: Learning Data Systems Components

A First Attempt

Cache-Optimized B-Tree: ≈250ns. Learned model: ≈80,000ns

Page 41: Learning Data Systems Components

Reasons

Problem I: TensorFlow is designed for large models

Problem II: B-Trees are great for overfitting

Problem III: B-Trees are cache-efficient

Problem IV: Search does not take advantage of the prediction

Page 42: Learning Data Systems Components

Problem I: The Learning Index Framework (LIF)

• An index synthesis system

• Given an index configuration, generate the best possible code

• Uses ideas from Tupleware [VLDB15]

• Simple models are trained "on-the-fly", whereas for complex models we use TensorFlow and extract the weights afterwards (i.e., no TensorFlow at inference time)

• The best index configuration is found using auto-tuning (e.g., see TuPAQ [SOCC15])

Page 43: Learning Data Systems Components

Problem II + III: Precision Gain per Node


…….

Index over 100M records. Page-size: 100

Precision Gain: 100M → 1M (Min/Max Error: 1M)

Precision Gain: 1M → 10k

Precision Gain: 10k → 100

100M records (i.e., 1M pages)

Page 44: Learning Data Systems Components

The Last Mile Problem

(Figure: predicted Pos vs. Key, showing the remaining last-mile error)

Page 45: Learning Data Systems Components

Solution: Recursive Model Index (RMI)

Page 46: Learning Data Systems Components

How Does The Lookup Code Look Like?

Model on stage 1: f0(key_type key). Models on stage 2: f1[] (e.g., the first model in the second stage is f1[0](key_type key)).

Lookup Code:

pos_estimate ← f1[f0(key)](key)
pos ← exp_search(key, pos_estimate, data)

Number of operations with linear regression models:

offset ← a + b * key
weights2 ← weights_stage2[offset]
pos_estimate ← weights2.a + weights2.b * key
pos ← exp_search(key, pos_estimate, data)

2x multiplies, 2x additions, 1x array lookup

Page 47: Learning Data Systems Components

How Does The Lookup Code Look Like?

Model on stage 1: f0(key_type key). Models on stage 2: f1[] (e.g., the first model in the second stage is f1[0](key_type key)).

Lookup Code for a 2-stage RMI:

pos_estimate ← f1[f0(key)](key)
pos ← exp_search(key, pos_estimate, data)

Operations for a 2-stage RMI with linear regression models:

offset ← a + b * key
weights2 ← weights_stage2[offset]
pos_estimate ← weights2.a + weights2.b * key
pos ← exp_search(key, pos_estimate, data)

2x multiplies, 2x additions, 1x array lookup
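A runnable sketch of a 2-stage RMI with linear models throughout (Python/NumPy; the per-bucket least-squares training shown here is an illustrative assumption, not the exact LIF code):

```python
import numpy as np

class TwoStageRMI:
    def __init__(self, keys, num_models):
        self.keys = np.asarray(keys, dtype=float)     # sorted keys
        n, m = len(keys), num_models
        pos = np.arange(n, dtype=float)
        # Stage 1: one linear model mapping key -> second-stage model id.
        self.b0, self.a0 = np.polyfit(self.keys, pos * m / n, 1)
        bucket = np.clip((self.a0 + self.b0 * self.keys).astype(int), 0, m - 1)
        # Stage 2: a tiny linear model per bucket mapping key -> position.
        self.stage2 = []
        for i in range(m):
            sel = bucket == i
            if sel.sum() >= 2:
                b, a = np.polyfit(self.keys[sel], pos[sel], 1)
            elif sel.any():
                a, b = float(pos[sel][0]), 0.0
            else:
                a, b = 0.0, 0.0
            self.stage2.append((a, b))

    def predict(self, key):
        i = int(self.a0 + self.b0 * key)              # f0(key)
        a, b = self.stage2[min(max(i, 0), len(self.stage2) - 1)]
        return int(a + b * key)                       # f1[f0(key)](key)
```

`predict` is exactly the "2x multiplies, 2x additions, 1x array lookup" path above; the exp_search correction step is sketched with the exponential-search slide below.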

Page 48: Learning Data Systems Components

Hybrid RMI

Models whose error is too large are replaced by classic B-Trees, so the worst-case performance is that of a B-Tree

Page 49: Learning Data Systems Components

Problem IV: Min-/Max-Error vs Average Error

(Figure: predicted vs. actual position over positions 0…N, annotated with the Min- and Max-Model Error)

Page 50: Learning Data Systems Components

Binary Search

(Figure: binary search over positions 0…N with Left/Middle/Right pointers)

Page 51: Learning Data Systems Components

Binary Search

(Figure: the binary search range narrows; Left/Middle/Right)

Page 52: Learning Data Systems Components

Binary Search

(Figure: the binary search range narrows again; Left/Middle/Right)

Page 53: Learning Data Systems Components

Quaternary Search

(Figure: quaternary search over positions 0…N with pointers Left, Q2, Right around the predicted position)

Page 54: Learning Data Systems Components

Quaternary Search

(Figure: quaternary search probes placed around the predicted position)

Q1: Prediction - 2x std err
Q3: Prediction + 2x std err

Page 55: Learning Data Systems Components

Quaternary Search

(Figure: quaternary search with pointers Left, Q1, Q2, Q3, Right)
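A sketch of a model-seeded quaternary search in Python (the probe placement at prediction ± 2x std err follows the slides; the fallback to plain binary search afterwards is an illustrative simplification, and keys are assumed distinct):

```python
import bisect

def quaternary_search(data, key, pred, std_err):
    n = len(data)
    # First probes at Q1/Q3 = prediction -/+ 2x std err rather than the
    # usual quarter points; Q2 is the prediction itself.
    q1 = max(0, min(n - 1, pred - 2 * std_err))
    q3 = max(0, min(n - 1, pred + 2 * std_err))
    if data[q1] <= key <= data[q3]:
        lo, hi = q1, q3 + 1     # the common case when the model is good
    elif key < data[q1]:
        lo, hi = 0, q1
    else:
        lo, hi = q3 + 1, n
    # Finish with an ordinary binary search on the narrowed range.
    return bisect.bisect_left(data, key, lo, hi)
```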

Page 56: Learning Data Systems Components

Exponential Search

(Figure: exponential search doubles its step outward from the predicted position over 0…N)
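This is the search used by the RMI lookup code above. A sketch of exponential (galloping) search outward from the model's estimate (Python; its cost grows with the logarithm of the actual prediction error, so better models directly buy faster lookups):

```python
import bisect

def exp_search(key, pos_estimate, data):
    n = len(data)
    pos = max(0, min(n - 1, pos_estimate))
    step = 1
    if data[pos] < key:                       # gallop to the right
        while pos + step < n and data[pos + step] < key:
            step *= 2
        return bisect.bisect_left(data, key, pos, min(n, pos + step + 1))
    else:                                     # gallop to the left
        while pos - step >= 0 and data[pos - step] >= key:
            step *= 2
        return bisect.bisect_left(data, key, max(0, pos - step), pos + 1)
```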

Page 57: Learning Data Systems Components

Does it have to be …

Page 58: Learning Data Systems Components

Does It Work?

Type | Config | Lookup time | Speedup vs. B-Tree | Size (MB) | Size vs. B-Tree
---|---|---|---|---|---
B-Tree | page size: 128 | 260 ns | 1.0X | 12.98 MB | 1.0X
Learned index | 2nd stage size: 10,000 | 222 ns | 1.17X | 0.15 MB | 0.01X
Learned index | 2nd stage size: 50,000 | 162 ns | 1.60X | 0.76 MB | 0.05X
Learned index | 2nd stage size: 100,000 | 144 ns | 1.67X | 1.53 MB | 0.12X
Learned index | 2nd stage size: 200,000 | 126 ns | 2.06X | 3.05 MB | 0.23X

60% faster at 1/20th the space, or 17% faster at 1/100th the space.

200M records of map data (e.g., restaurant locations), indexed on longitude. Intel E5 CPU with 32GB RAM, without GPU/TPUs. No special SIMD optimization (there is a lot of potential).

Page 59: Learning Data Systems Components


Page 60: Learning Data Systems Components

You Might Have Seen Certain Blog Posts

Page 61: Learning Data Systems Components

(Chart: Lookup-Time (ns), 0 to 350, vs. Size (MB), 0.5 to 256, comparing FAST, Lookup Table, Fixed-Size Read-Optimized B-Tree w/ interpolation search, and the Learned Index; lower is better on both axes.)

Page 62: Learning Data Systems Components

My Own Comparison

Page 63: Learning Data Systems Components

A Comparison To ARTful Indexes (Radix-Tree)

Experimental setup:

• Dense: continuous keys from 0 to 256M
• Sparse: 256M keys where each bit is equally likely 0 or 1

Viktor Leis, Alfons Kemper, Thomas Neumann: The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. ICDE 2013

Page 64: Learning Data Systems Components

A Comparison To ARTful Indexes (Radix-Tree)

Experimental setup: continuous keys from 0 to 256M

Reported lookup throughput: 10M/s ≈ 100ns per lookup(1)

Size: not measured, but the paper reports an overhead of ≈8 bytes per key (dense, best case): 256M * 8 bytes ≈ 1953MB

(1) Numbers from the paper

Viktor Leis, Alfons Kemper, Thomas Neumann: The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. ICDE 2013

Page 65: Learning Data Systems Components

Learned Index

Generated code:

Record lookup(key) {
  return data[0 + 1 * key];
}

Page 66: Learning Data Systems Components

Learned Index

Generated code:

Record lookup(key) {
  return data[key];
}

Page 67: Learning Data Systems Components

Learned Index

Generated code:

Record lookup(key) {
  return data[key];
}

Lookup Latency: 10ns (learned index) vs. 100ns(1) (ARTful): one order of magnitude better. Space: 0MB vs. 1953MB: infinitely better :)

Page 68: Learning Data Systems Components

?

Page 69: Learning Data Systems Components

What about Updates and Inserts?

Page 70: Learning Data Systems Components

What about Updates and Inserts?

Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska: A-Tree: A Bounded, Approximate Index Structure. https://arxiv.org/abs/1801.10207

Page 71: Learning Data Systems Components

The Simple Approach: Delta Indexing

(Figure: updates go to a delta buffer that is periodically merged into the learned index)

Training a simple multi-variate regression model can be done in one pass over the data.
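A sketch of such a one-pass fit for the single-feature case (Python; running sums yield the least-squares line key → position without a second pass, and a multi-variate version would keep matrix-valued sums instead):

```python
def fit_linear_one_pass(sorted_keys):
    n = s_x = s_y = s_xx = s_xy = 0.0
    for pos, key in enumerate(sorted_keys):   # one pass, running sums only
        n += 1.0
        s_x += key
        s_y += pos
        s_xx += key * key
        s_xy += key * pos
    b = (n * s_xy - s_x * s_y) / (n * s_xx - s_x * s_x)  # slope
    a = (s_y - b * s_x) / n                              # intercept
    return a, b    # pos_estimate = a + b * key
```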

Page 72: Learning Data Systems Components

Leverage the Distribution

Page 73: Learning Data Systems Components

Leverage the Distribution for Appends

Leverage the Distribution for Appends

(Figure: Inserts (e.g., Timestamps) plotted over Time; New Inserts continue the learned trend)

If the Learned Model Can Generalize to Inserts, Insert Complexity Is O(1), Not O(log N)
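A sketch of what that O(1) claim means operationally (Python; the generalization check against the trained error bound is an illustrative assumption):

```python
def append(data, key, model, err_max):
    # Keys arrive in increasing order (e.g., timestamps). If the learned
    # CDF generalizes to new keys, the model already places `key` within
    # err_max of the end of the array: no node splits, no rebalancing,
    # no retraining needed yet, just an O(1) append.
    assert key >= data[-1]
    assert abs(model(key) - len(data)) <= err_max
    data.append(key)
```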

Page 74: Learning Data Systems Components

Updates/Inserts

• Less beneficial, as the data still has to be stored sorted

• Idea: leave space in the array where more updates/inserts are expected

• Can also be done with traditional trees

• But the error of learned indexes should grow with √N per node in the RMI, whereas for traditional indexes it grows with N

Page 75: Learning Data Systems Components

Still at the Beginning!

• Can we provide bounds for inserts?
• When to retrain?
• How to retrain models on the fly?
• …

Page 76: Learning Data Systems Components

Fundamental Algorithms & Data Structures

Hash-Map

Tree

Sorting

Join

Range-Filter

Priority Queue

Scheduling

Cache Policy

Bloom-Filter

…

Page 77: Learning Data Systems Components

Fundamental Algorithms & Data Structures

Tree

Sorting

Join

Range-Filter

Priority Queue

Scheduling

Cache Policy

Bloom-Filter

Hash-Map

…

Page 78: Learning Data Systems Components

Hash Map

Key → Hash Function vs. Key → Model

Goal: Reduce Conflicts
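A sketch of the idea in Python: scale the model's CDF estimate to the table size, so a good model spreads keys nearly uniformly over the slots (`model_cdf`, a trained model returning P(X ≤ key), is an assumption):

```python
def learned_hash(model_cdf, key, num_slots):
    # A perfect CDF model sends the i-th smallest of n keys to slot
    # i * num_slots / n, a conflict-free spread; model error is what
    # reintroduces conflicts.
    return min(num_slots - 1, int(model_cdf(key) * num_slots))
```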

Page 79: Learning Data Systems Components

25% - 70% Reduction in Hash-Map Conflicts

Hash Map - Results


Page 80: Learning Data Systems Components

You Might Have Seen Certain Blog Posts

Page 81: Learning Data Systems Components

Independent of Hash-Map Architecture

Page 82: Learning Data Systems Components

Hash Map – Example Results

Type | Time (ns) | Utilization
---|---|---
Stanford AVX Cuckoo, 4 byte value | 31 ns | 99%
Stanford AVX Cuckoo, 20 byte record, standard hash | 43 ns | 99%
Commercial Cuckoo, 20 byte record, standard hash | 90 ns | 95%
In-place chained hash map, 20 byte record, learned hash functions | 35 ns | 100%

Page 83: Learning Data Systems Components

Fundamental Algorithms & Data Structures

Hash-Map

Tree

Sorting

Join

Range-Filter

Priority Queue

Scheduling

Cache Policy

Bloom-Filter

…

Page 84: Learning Data Systems Components

Fundamental Algorithms & Data Structures

Hash-Map

Tree

Sorting

Join

Range-Filter

Priority Queue

Scheduling

Cache Policy

Bloom-Filter

…

Page 85: Learning Data Systems Components

Bloom Filter - Approach 1

Is This Key In My Set? A classic Bloom filter answers "Maybe Yes" or "No".

With a Model in front: the Model answers "Maybe Yes" directly; keys it rejects ("Maybe No") are checked against a small backup filter so that no false negatives remain.

36% Space Improvement over a Bloom Filter at the Same False-Positive Rate
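A sketch of Approach 1 in Python (`model`, a learned classifier scoring membership, and `backup_bloom`, an ordinary Bloom filter built over exactly the inserted keys the model scores below threshold, are assumptions; the backup is what rules out false negatives):

```python
class LearnedBloomFilter:
    def __init__(self, model, threshold, backup_bloom):
        self.model = model           # key -> score in [0, 1]
        self.threshold = threshold   # tuned for the target false-positive rate
        self.backup = backup_bloom   # holds the model's false negatives

    def might_contain(self, key):
        if self.model(key) >= self.threshold:
            return True              # "maybe yes" straight from the model
        return self.backup.might_contain(key)  # catches the model's misses
```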

Page 86: Learning Data Systems Components

Bloom Filter - Approach 2 (Future Work)

Classic: Key → Hash Function 1, Hash Function 2, Hash Function 3

Learned: Key → Model, replacing one of the hash functions

Page 87: Learning Data Systems Components

Fundamental Algorithms & Data Structures

Hash-Map

Tree

Sorting

Join

Range-Filter

Priority Queue

Scheduling

Cache Policy

Bloom-Filter

…

Page 88: Learning Data Systems Components

Future Work

CDF

How Would You Design Your Algorithms/Data Structure If You Have a Model for the Empirical Data Distribution?

Page 89: Learning Data Systems Components

Future Work

Hash-Map

Tree

Sorting

Join

Range-Filter

Priority Queue

Scheduling

Cache Policy

Bloom-Filter

…

Page 90: Learning Data Systems Components

Future work: Multi-Dim Indexes

Page 91: Learning Data Systems Components

Future work: Data Cubes

Page 92: Learning Data Systems Components

Other Database Components

• Cardinality Estimation
• Cost Model
• Query Scheduling
• Storage Layout
• Query Optimizer
• …

How Would You Design Your Algorithms/Data Structure If You Have a Model for the Empirical Data Distribution?

Page 93: Learning Data Systems Components
Page 94: Learning Data Systems Components

Related Work

• Succinct Data Structures → most related, but succinct data structures are usually carefully, manually tuned for each use case

• B-Trees with Interpolation Search → arbitrary worst-case performance

• Perfect Hashing → connected to our Hash-Map approach, but perfect hash functions usually grow in size with N

• Mixture-of-Experts Models → used as part of our solution

• Adaptive Data Structures / Cracking → orthogonal problem

• Locality-Sensitive Hashing (LSH) (e.g., learned by a NN) → has nothing to do with learned structures

Page 95: Learning Data Systems Components

Locality-Sensitive Hashing (LSH)

Thanks to Alkis for the analogy

Page 96: Learning Data Systems Components

Summary

CDF

How Would You Design Your Algorithms/Data Structure If You Have a Model for the Empirical Data Distribution?

Page 97: Learning Data Systems Components

Adapts To Your Data

Page 98: Learning Data Systems Components

Big Potential For TPUs/GPUs

Page 99: Learning Data Systems Components

Can Lower the Complexity Class

(Chart: Time or Space as a function of N, with curves for O(N²), O(N), O(log N), and O(1))

data_array[(lookup_key - 900)]

Page 100: Learning Data Systems Components

Warning: Not An Almighty Solution

Page 101: Learning Data Systems Components

Data System for AI Lab DSAIL@CSAIL

(Slide: Research Areas, Systems Faculty, ML Faculty, Founding Sponsors)

Page 102: Learning Data Systems Components

Tim Kraska <[email protected]>

Special thanks to:

• A new approach to indexing

• A framework to rethink many existing data structures/algorithms

• Under certain conditions, it might allow changing the complexity class of data structures

• The idea might have implications within and outside of database systems

Technical Report: Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis: The Case for Learned Index Structures

Work partially done at Google