The Data Cyclotron Query Processing Scheme

Can data movement sometimes be OK?

Carlos & Mali

Distributed Query Processing

Why do we want it?

Sticky Data

Relations A, B, C, D, E, F

Select A, B, C

Optimizer load

Impact of an unpredictable load on the optimizer?

Does this scale well?

What's our goal?

A self-organizing architecture

Network hardware is getting much better

Source: http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/1_Mellanox.pdf

RDMA: much less CPU overhead

InfiniBand EDR

A Modern Large Resource Pool

30GB Hot set

3TB Hot set

Turbulent data

Load balancing

A ?

Optimizers

Storage in the Data Cyclotron

OID val

Data assignment

random
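A minimal sketch of random data assignment, assuming each column fragment (BAT) gets exactly one owner node chosen uniformly at random. The function name `assign_randomly`, the node numbering, and the seed parameter are illustrative assumptions, not taken from the Data Cyclotron implementation.

```python
import random


def assign_randomly(bat_names, num_nodes, seed=None):
    """Map each BAT name to a randomly chosen owner node in 0..num_nodes-1.

    Hypothetical helper: the slides only say that assignment is random.
    """
    rng = random.Random(seed)  # local generator so runs can be reproduced
    return {name: rng.randrange(num_nodes) for name in bat_names}


# Column names borrowed from the example query on the "Query Plans" slide.
bats = ["t.id", "c.t_id"]
owners = assign_randomly(bats, num_nodes=4, seed=0)
```

With no placement logic to tune, there is nothing for an optimizer to get wrong when the workload shifts; the ring is what brings the data to the queries.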

Query Plans

SELECT c.t_id, t.id
FROM t, c
WHERE c.t_id = t.id;
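To make the join concrete, here is a rough sketch of how it could be evaluated over columns stored as (OID, val) pairs. The `hash_join` helper and the sample values are hypothetical; this is a plain hash join for illustration, not the plan the Data Cyclotron actually generates.

```python
def hash_join(left, right):
    """Join two (oid, val) columns on equal values.

    Returns a list of (left_oid, right_oid) pairs.
    """
    index = {}
    for oid, val in right:
        index.setdefault(val, []).append(oid)  # build side: val -> oids
    return [
        (l_oid, r_oid)
        for l_oid, val in left              # probe side
        for r_oid in index.get(val, [])
    ]


# Made-up sample data for the two columns in the query.
c_t_id = [(0, 10), (1, 20), (2, 10)]
t_id = [(0, 10), (1, 30)]

matches = hash_join(c_t_id, t_id)

# Projection: turn matching OIDs back into the selected values.
c_vals, t_vals = dict(c_t_id), dict(t_id)
result = [(c_vals[c], t_vals[t]) for c, t in matches]
```

In a ring setting, both columns would need to be locally available at the node running the join, which is why the plan has to wait for them to pass by.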

The Data Cyclotron Architecture

DBMS Layer

Cyclotron Layer

Network Layer

Source: http://homepages.cwi.nl/~mk/datacyclotron.pdf

Other Distributed Systems

Query protocols?

Storage protocols?

Experiments: Limited Ring Capacity

In a constrained ring, a high LOIT (level-of-interest threshold) gives better throughput. Why?

Experiments: Limited Ring Capacity

What's happening with the 2GB line? What is bad about the red line?

Experiments: Skewed Workloads

How quickly and how well does DaCy adapt to a rapidly changing hot set? Where do we see evidence of this?

Experiments: Non-uniform workloads

Highly used BATs are kept longer in the ring as a result of a high LOI, reducing their load rate.
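A toy model of this effect, assuming the owner re-evaluates a BAT's LOI once per full ring cycle and drops the BAT when the LOI falls below the threshold (LOIT). The decay-plus-uses update rule, the `decay` value, and all names here are assumptions made for illustration; they are not the formula from the Data Cyclotron paper.

```python
from dataclasses import dataclass


@dataclass
class RingBat:
    name: str
    loi: float = 0.0      # level of interest
    in_ring: bool = True


def end_of_cycle(bat, uses, loit, decay=0.5):
    """Owner-side check when the BAT completes a cycle (hypothetical rule).

    Old interest decays, recent uses add to it, and the BAT leaves the
    ring once its LOI drops below the threshold.
    """
    bat.loi = decay * bat.loi + uses
    if bat.loi < loit:
        bat.in_ring = False
    return bat.in_ring


def cycles_survived(uses_per_cycle, loit):
    """Count how many cycles a BAT stays in the ring for a usage trace."""
    bat = RingBat("example")
    survived = 0
    for uses in uses_per_cycle:
        if not end_of_cycle(bat, uses, loit):
            break
        survived += 1
    return survived


hot = cycles_survived([4, 4, 4, 0, 0, 0], loit=1.0)   # heavily used at first
cold = cycles_survived([1, 0, 0, 0, 0, 0], loit=1.0)  # touched once
```

Under these assumptions the heavily used BAT keeps circulating for several cycles after its last use, so it does not have to be reloaded from its owner as often as the rarely used one.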

TPC-H Workload

What is the effect of additional nodes?

Thoughts on latency beyond 8 nodes?

Future Outlook