The Data Cyclotron Query Processing Scheme
Can data movement sometimes be OK?
Carlos & Mali
Distributed Query Processing
Why do we want it?
Sticky Data
Relations A, B, C, D, E, F
(Diagram labels: A, B, D, C, F, E)
Select A, B, C
Optimizer load
Impact of an unpredictable load on the optimizer?
Does this scale well?
What's our goal?
A self-organizing architecture
Network hardware is getting much better
Source: http://www.hpcadvisorycouncil.com/events/2014/swiss-workshop/presos/Day_1/1_Mellanox.pdf
RDMA: much less CPU overhead
A Modern Large Resource Pool
30GB Hot set
3TB Hot set
Load balancing
A ?
Storage in the Data Cyclotron
OID val
Data assignment
random
Query Plans
SELECT c.t_id, t.id
FROM t, c
WHERE c.t_id = t.id;
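A minimal sketch of how the join above could be evaluated over (OID, val) columns like those on the storage slide. It is an illustration only, not the plan the Data Cyclotron actually generates: the column contents, the list-of-pairs representation, and the name hash_join are all assumptions made for this example.

```python
from collections import defaultdict

def hash_join(left, right):
    """Join two (oid, val) columns on val.

    Returns (left_oid, right_oid) pairs for rows with equal values.
    Hypothetical helper, not part of any real system's API.
    """
    index = defaultdict(list)
    for oid, val in right:
        index[val].append(oid)          # build side: value -> matching OIDs
    return [(l_oid, r_oid)
            for l_oid, val in left      # probe side
            for r_oid in index.get(val, [])]

# Made-up column contents: each column is a list of (OID, val) pairs.
c_t_id = [(0, 10), (1, 20), (2, 10)]    # column c.t_id
t_id = [(0, 10), (1, 30)]               # column t.id

pairs = hash_join(c_t_id, t_id)

# Project the two requested columns by looking values up through the OIDs.
c_vals, t_vals = dict(c_t_id), dict(t_id)
result = [(c_vals[c], t_vals[t]) for c, t in pairs]
```

The point of the sketch is that the join only needs the two columns it touches, so only those columns would have to be available at the node running the query.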
The Data Cyclotron Architecture
DBMS Layer
Cyclotron Layer
Network Layer
Source: http://homepages.cwi.nl/~mk/datacyclotron.pdf
Other Distributed Systems
Query protocols?
Storage protocols?
Experiments: Limited Ring Capacity
In a constrained ring, a high LOIT gives better throughput. Why?
Experiments: Limited Ring Capacity
What's happening with the 2GB line? What is bad about the red line?
Experiments: Skewed Workloads
How quickly and how well does DaCy adapt to a rapidly changing hot set? Where do we see evidence of this?
Experiments: Non-uniform workloads
Highly used BATs are kept longer in the ring as a result of a high LOI, reducing their load rate.
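A rough sketch of why a high LOI keeps a BAT in the ring longer. The aging rule used here (halve the old LOI, then add the number of recent uses) is an assumption chosen for illustration and is not the formula from the paper. The names RingBat, update_loi, and passes_survived are likewise hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RingBat:
    name: str
    loi: float = 0.0    # level of interest
    touches: int = 0    # uses since the BAT last passed its owner

def update_loi(bat, decay=0.5):
    """Assumed aging rule: old interest decays, recent uses add to it."""
    bat.loi = decay * bat.loi + bat.touches
    bat.touches = 0
    return bat.loi

def passes_survived(bat, loit, decay=0.5, limit=1000):
    """Count how many owner passes the BAT stays in the ring.

    The BAT is dropped on the first pass where its LOI falls below
    the threshold (loit). No new uses arrive during the simulation.
    """
    n = 0
    while n < limit and update_loi(bat, decay) >= loit:
        n += 1
    return n

hot = RingBat("hot", touches=4)     # heavily used BAT
cold = RingBat("cold", touches=0)   # unused BAT

hot_passes = passes_survived(hot, loit=1.0)
cold_passes = passes_survived(cold, loit=1.0)
```

Under these assumptions the heavily used BAT stays in the ring for several passes after its last use, while the unused one is dropped at once. A BAT that stays in the ring does not have to be reloaded, which is the reduced load rate the slide refers to.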
TPC-H Workload
What is the effect of additional nodes?
Thoughts on latency beyond 8 nodes?