![Page 1: Streaming PostgreSQL with PipelineDBinfo.citusdata.com/rs/235-CNE-301/images/PipelineDB_the_Streamin… · Continuous SQL on streams (continuous views) High-throughput, incremental](https://reader036.fdocuments.in/reader036/viewer/2022062921/5f03200e7e708231d407a967/html5/thumbnails/1.jpg)
Streaming PostgreSQL with PipelineDB
Derek Nelson
What is PipelineDB?
● Continuous SQL on streams (continuous views)
● High-throughput, incremental materialized views
● Based on PostgreSQL 9.4.4
● No special client libraries
● Eventually factored out as extension
100,000 feet
Produce → Process (SQL) → Consume
Process: aggregation, filtering, sliding windows = reduction
Why did we build it?
Produce → Process → Consume
Produce → PipelineDB (continuous view) → Consume
Simplicity is nice, but what else?
Benefits of continuous SQL on streams
● Aggregate before writing to disk
[Chart: total data ingested vs. database size]
CREATE CONTINUOUS VIEW v AS SELECT COUNT(*) FROM stream
(winning)
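To make the point concrete, here is a minimal Python sketch of what a COUNT(*) continuous view has to store. It is an illustrative model, not PipelineDB code: the class name `CountView` and the byte accounting are hypothetical. Every event is folded into one running total and then discarded, so the stored state stays a single row however much data is ingested.

```python
class CountView:
    """Toy model of CREATE CONTINUOUS VIEW v AS SELECT COUNT(*) FROM stream."""

    def __init__(self):
        self.count = 0            # the only state that is ever stored
        self.bytes_ingested = 0   # tracked only to contrast with the stored state

    def insert(self, event: bytes) -> None:
        # The raw event is folded into the aggregate and then thrown away.
        self.count += 1
        self.bytes_ingested += len(event)

    def stored_rows(self) -> int:
        # One aggregate row, independent of input volume.
        return 1


view = CountView()
for i in range(100_000):
    view.insert(f"event-{i}".encode())
```

After the loop, `view.bytes_ingested` keeps growing with the input while `view.stored_rows()` does not, which is the gap the chart illustrates.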
Benefits of continuous SQL on streams
● Sliding window queries
● Any information outside of the window is excluded from results and deleted from disk
● Essentially automatic TTL
CREATE CONTINUOUS VIEW v WITH (max_age = '1 hour') AS SELECT COUNT(*) FROM stream
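The "automatic TTL" behaviour can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how PipelineDB stores windows: the class `SlidingCount` is hypothetical, it keeps one timestamp per event for simplicity, and it assumes events arrive in time order.

```python
from collections import deque


class SlidingCount:
    """Toy model of COUNT(*) over a sliding window of max_age seconds."""

    def __init__(self, max_age: float):
        self.max_age = max_age
        self._timestamps = deque()   # arrival times still inside the window

    def insert(self, now: float) -> None:
        self._timestamps.append(now)

    def _expire(self, now: float) -> None:
        # Anything older than the window is dropped for good.
        while self._timestamps and self._timestamps[0] <= now - self.max_age:
            self._timestamps.popleft()

    def count(self, now: float) -> int:
        self._expire(now)
        return len(self._timestamps)


window = SlidingCount(max_age=3600)          # 1 hour
for t in (0, 1800, 3000):
    window.insert(t)
within_hour = window.count(now=3500)         # all three events are still in the window
after_expiry = window.count(now=4000)        # the event at t=0 has aged out
```

Reading the view is what triggers expiry here; a real system would more likely clean up in the background.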
Benefits of continuous SQL on streams
CREATE CONTINUOUS VIEW v AS SELECT COUNT(DISTINCT x) FROM never_ending_stream
● Probabilistic computations on infinite inputs
● Streaming Top-K, percentiles, distincts, large set cardinalities
● Constant space
● No sorting
● Small margin of error
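As an illustration of counting distincts in constant space, here is a small k-minimum-values (KMV) sketch in Python. KMV is a swapped-in technique chosen for brevity; the slides do not say which data structure PipelineDB uses, and the class name `KMVSketch` is hypothetical. The idea: hash every value to a number in [0, 1), remember only the k smallest hashes, and estimate the cardinality from how small the k-th one is.

```python
import hashlib
import heapq


class KMVSketch:
    """Approximate COUNT(DISTINCT x) using at most k stored hash values."""

    def __init__(self, k: int = 1024):
        self.k = k
        self._heap = []        # max-heap (negated values) of the k smallest hashes
        self._members = set()  # same hashes, for duplicate detection

    @staticmethod
    def _hash(value) -> float:
        digest = hashlib.sha1(repr(value).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)

    def add(self, value) -> None:
        h = self._hash(value)
        if h in self._members:
            return                                   # already counted
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> int:
        if len(self._heap) < self.k:
            return len(self._heap)                   # exact while under k distinct values
        return int((self.k - 1) / -self._heap[0])


sketch = KMVSketch(k=1024)
for i in range(20_000):
    sketch.add(i)
    sketch.add(i)          # duplicates do not change the estimate
approx_distinct = sketch.estimate()
```

Memory is bounded by k regardless of stream length, and the relative error shrinks roughly as 1/sqrt(k), which is the trade-off behind "small margin of error".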
(Demo #1)
Internals (part 1/2)
Streams and Workers
Streams
CREATE STREAM stream (x int, y int, z int);
INSERT INTO stream (x, y, z) VALUES (0, 1, 2);
● Internally, a stream is a foreign table
● System-wide Foreign Server called pipeline_streams
● stream_fdw reads from/writes to the Stream Buffer
● No permanent storage
● Stream rows only exist until they’ve been fully read
stream buffer → query on microbatch → incremental table update
● INSERT INTO ...
● Concurrent circular buffer
● Preallocated block of shared memory
[Figure: the stream buffer as a row of HeapTuples, each tagged with a bitmap: HeapTuple {0,1,0,1,1}; HeapTuple {1,1,1,1,1} ✗]
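The buffer behaviour above can be modelled with a short Python sketch. It is a single-threaded toy, not the shared-memory implementation: the class `StreamBuffer` and its methods are hypothetical, and it assumes the bitmaps in the figure record which readers have already consumed a tuple, so that a slot is released once every flag is set.

```python
class StreamBuffer:
    """Toy fixed-size circular buffer; each slot tracks which readers have seen it."""

    def __init__(self, capacity: int, num_readers: int):
        self.slots = [None] * capacity         # preallocated storage
        self.read_flags = [None] * capacity    # per-slot list of 0/1 flags, one per reader
        self.num_readers = num_readers
        self.head = 0                          # next slot to write

    def insert(self, tup) -> bool:
        if self.slots[self.head] is not None:
            return False                       # full: this slot has not been fully read yet
        self.slots[self.head] = tup
        self.read_flags[self.head] = [0] * self.num_readers
        self.head = (self.head + 1) % len(self.slots)
        return True

    def read(self, slot: int, reader: int):
        tup = self.slots[slot]
        if tup is None or self.read_flags[slot][reader]:
            return None                        # empty, or this reader already saw it
        self.read_flags[slot][reader] = 1
        if all(self.read_flags[slot]):
            self.slots[slot] = None            # fully read: free the slot for reuse
            self.read_flags[slot] = None
        return tup


buf = StreamBuffer(capacity=2, num_readers=2)
buf.insert((0, 1, 2))
buf.insert((3, 4, 5))
```

A third insert is refused until both readers have consumed slot 0, which matches the "rows only exist until they have been fully read" rule.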
/* At Postmaster startup time ... */
worker.bgw_main = any_function;
worker.bgw_main_arg = (Datum) arg;
RegisterDynamicBackgroundWorker(&worker, &handle);
SELECT count(*), avg(x) FROM stream
stream_fdw#GetStreamScanPlan
while (!BatchDone(node))
{
    tup = PinNext(buf)
    yield MarkAsRead(tup)
}
[Figure: HeapTuples from the stream buffer feed an AGG node, which emits microbatch_result: count = 1000, avg = {1000, 4000}]
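A worker's query over one microbatch can be sketched as follows. This is a Python model, not the executor code: `run_microbatch` is a hypothetical name, and the sketch assumes the avg state in the figure is the pair {number of inputs, running sum}. The key point is that the worker emits transition states rather than finished values, so they can be merged later.

```python
def run_microbatch(batch):
    """Aggregate one microbatch of x values into transition states, not final values."""
    count_state = 0
    avg_state = [0, 0]          # assumed layout: {number of inputs, running sum}
    for x in batch:
        count_state += 1
        avg_state[0] += 1
        avg_state[1] += x
    return {"count": count_state, "avg": tuple(avg_state)}


# 1000 tuples with x = 4 give the same numbers as the figure.
microbatch_result = run_microbatch([4] * 1000)
```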
Worker process parallelism
[Figure: the stream buffer feeding worker procs 0, 1, ..., n]
Tuples are round-robined across n worker procs
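Round-robin distribution is simple enough to show directly. The helper below is a hypothetical Python illustration, not PipelineDB's scheduler: each incoming tuple goes to the next worker in turn, without regard to which group it belongs to.

```python
from itertools import cycle


def round_robin(tuples, num_workers):
    """Assign each incoming tuple to the next worker in turn."""
    queues = [[] for _ in range(num_workers)]
    for tup, worker in zip(tuples, cycle(range(num_workers))):
        queues[worker].append(tup)
    return queues


queues = round_robin(list(range(10)), num_workers=3)
```

Because any worker may see tuples for any group, each worker's output is only a partial result for that group and still has to be merged with the others.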
Internals (part 2/2)
Incremental Updates
● transition_state = combine(microbatch_tstate, existing_tstate)
● pipeline_combine catalog table maps combine functions to aggregates
● No changes to pg_aggregate catalog table or existing aggregate functions
● User-defined aggregates just need a combinefunc to be combinable
CREATE AGGREGATE combinable_agg(x)
(
    sfunc = sfunc,
    finalfunc = finalfunc,
    combinefunc = combinefunc
);
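What makes an aggregate "combinable" can be shown with a Python analogue of the three functions. The aggregate chosen here (population variance) and all function bodies are illustrative assumptions; only the roles of sfunc, combinefunc and finalfunc come from the slide. The property to check is that combining the states of two partial runs gives the same state as one run over all the input.

```python
def sfunc(state, x):
    """Transition function: fold one input value into (n, sum, sum of squares)."""
    n, s, ss = state
    return (n + 1, s + x, ss + x * x)


def combinefunc(a, b):
    """Merge two transition states that were built over disjoint inputs."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def finalfunc(state):
    """Turn the transition state into the user-visible result (population variance)."""
    n, s, ss = state
    return ss / n - (s / n) ** 2


def aggregate(values):
    """Run sfunc over a batch, starting from the empty state."""
    state = (0, 0, 0)
    for x in values:
        state = sfunc(state, x)
    return state


whole = aggregate([1, 2, 3, 4])
merged = combinefunc(aggregate([1, 2]), aggregate([3, 4]))
```

An aggregate whose state cannot be merged this way (for example, one that only keeps a final rounded value) could not be updated incrementally from microbatches.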
[Figure: combine() merges transition states]
microbatch_result: count = 1000, avg = {1000, 4000}
existing on-disk row: count = 5000, avg = {5000, 10000}
updated_result = combine(microbatch_result, existing on-disk row): count = 6000, avg = {6000, 14000}
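The numbers in the figure can be reproduced with a few lines of Python. The function names are hypothetical, and the sketch assumes avg's transition state is the pair (count, sum), which is consistent with the values shown.

```python
def combine_count(a, b):
    """Counts merge by addition."""
    return a + b


def combine_avg(a, b):
    """avg's state is modelled as (count, sum); merging adds both parts."""
    return (a[0] + b[0], a[1] + b[1])


microbatch = {"count": 1000, "avg": (1000, 4000)}
existing = {"count": 5000, "avg": (5000, 10000)}

updated = {
    "count": combine_count(microbatch["count"], existing["count"]),
    "avg": combine_avg(microbatch["avg"], existing["avg"]),
}
# The value a reader of the view would see is derived from the state on demand.
final_avg = updated["avg"][1] / updated["avg"][0]
```

Storing the state rather than the finished average is what keeps the update exact: two averages alone cannot be merged without knowing how many inputs each one covers.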
lookup_plan = get_plan(SELECT * FROM matrel WHERE hash_group(x, y, z) IN (...))
/* dynamically generate a VALUES node */
foreach(row, microbatch)
    values = lappend(values, hash_group(row));
set_values(lookup_plan, values)
existing = PortalRun(lookup_plan, ...)
/* now we're ready to combine these on-disk tuples with the incoming batch result */
● This query needs to be as fast as possible
● Continuous views indexed on a 32-bit hash of grouping
● Pro: maximize cardinality of the index keyspace, great for random perf
● Con: must deal with collisions programmatically
SELECT * FROM matrel WHERE hash_group(x, y, z) IN (hash(microbatch group), ...)
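The lookup and its collision caveat can be sketched in Python. Everything here is a stand-in: `hash_group` uses CRC32 only because it yields 32 bits, the index is a plain dict from hash to rows, and `lookup_existing` is a hypothetical name. What the sketch shows is the extra comparison on the real grouping columns that a 32-bit hash index makes necessary.

```python
import zlib


def hash_group(group) -> int:
    """Hypothetical stand-in for hash_group(): a 32-bit hash of the grouping columns."""
    return zlib.crc32(repr(group).encode())


def lookup_existing(index, microbatch_groups, hash_fn=hash_group):
    """Return on-disk rows for the microbatch groups via the hash index.

    index maps a 32-bit hash to a list of (group, row) pairs, so two groups
    that collide on the hash are told apart by comparing the real columns.
    """
    wanted = set(microbatch_groups)
    hashes = {hash_fn(g) for g in wanted}       # plays the role of the IN (...) list
    found = {}
    for h in hashes:
        for group, row in index.get(h, []):
            if group in wanted:                 # discard rows that only share a hash
                found[group] = row
    return found


# Force a collision with a constant hash to exercise the filtering step.
colliding_index = {7: [(("a",), {"count": 1}), (("b",), {"count": 2})]}
hits = lookup_existing(colliding_index, [("a",)], hash_fn=lambda g: 7)
```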
● If the grouping contains a time-based column, we can do better
CREATE ... AS SELECT day(timestamp), count(*) FROM stream GROUP BY day
● These continuous views are indexed with 64 bits: timestamp from group (32 bits) + regular 32-bit grouping hash
● Pro: most incoming groups will have a similar timestamp, so better index caching
● Con: larger index footprint
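A 64-bit key of this shape can be built with a shift and an OR. The Python sketch below is an assumption-laden illustration: placing the timestamp in the high 32 bits is a guess based on the order in the figure, CRC32 stands in for the grouping hash, and `index_key` is a hypothetical name.

```python
import zlib


def index_key(group_timestamp: int, group) -> int:
    """Pack a 32-bit timestamp and a 32-bit grouping hash into one 64-bit key.

    Putting the timestamp in the high bits is an assumption of this sketch.
    """
    ts32 = group_timestamp & 0xFFFFFFFF
    hash32 = zlib.crc32(repr(group).encode())
    return (ts32 << 32) | hash32


day = 86_400
keys = sorted(
    index_key(ts, project)
    for ts in (0 * day, 1 * day)
    for project in ("en", "de", "fr")
)
```

With this layout, keys for the same day sort next to each other, so a batch of mostly current groups touches a small, contiguous part of the index.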
✔ microbatch result generated from stream by worker
✔ existing result retrieved from disk
combine_plan = get_plan(SELECT group, combine(count), combine(avg) FROM microbatch_result UNION existing GROUP BY group);
combined = PortalRun(combine_plan, ...)
foreach(row, combined)
{
    if (new_tuple(row))
        heap_insert(row, …);
    else
        heap_update(row, …);
}
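The insert-or-update loop can be modelled in Python with a dict standing in for the materialization table. `apply_microbatch` and the dict layout are hypothetical; the two branches correspond to the heap_insert and heap_update calls in the pseudocode.

```python
def apply_microbatch(matrel, microbatch_result, combine):
    """Merge per-group microbatch states into matrel, inserting or updating rows."""
    inserted, updated = 0, 0
    for group, state in microbatch_result.items():
        if group not in matrel:
            matrel[group] = state                            # new group: like heap_insert
            inserted += 1
        else:
            matrel[group] = combine(matrel[group], state)    # known group: like heap_update
            updated += 1
    return inserted, updated


matrel = {"en": 5000}                        # existing on-disk counts per group
result = apply_microbatch(matrel, {"en": 1000, "de": 7}, combine=lambda a, b: a + b)
```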
Combiner process parallelism
[Figure: a continuous view with groupings (a, b, c), (d, e, f), (g, h, i), (j, k, l)]
On-disk groupings are sharded over combiners by group
Each row is guaranteed to only ever be updated by one combiner process
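One way to get the "one combiner per row" guarantee is to derive the owner from the group itself. The Python sketch below assumes hash-modulo sharding, which the slides do not confirm; `combiner_for` is a hypothetical name and CRC32 is an arbitrary deterministic hash.

```python
import zlib


def combiner_for(group, num_combiners: int) -> int:
    """Pick the combiner that owns a group; the same group always maps to the same one."""
    return zlib.crc32(repr(group).encode()) % num_combiners


groups = [("a", "b", "c"), ("d", "e", "f"), ("g", "h", "i"), ("j", "k", "l")]
owners = {g: combiner_for(g, num_combiners=2) for g in groups}
```

Since ownership depends only on the group, two combiners never race to update the same row and no row-level coordination between them is needed.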
Demo #2
● Wikipedia data summarized by hour and URL
● <hour | project | page title | view count | bytes served>
CREATE STREAM wiki_stream (
    hour timestamp,
    project text,
    title text,
    view_count bigint,
    size bigint
);
CREATE CONTINUOUS VIEW wiki_stats AS
SELECT project::text,
    sum(view_count::bigint) AS total_views,
    fss_agg(title, 3) AS titles
FROM wiki_stream
GROUP BY hour, project;
Tuning Tips
● continuous_query_num_workers
  ○ n workers per server
  ○ n workers reading microbatches for the same query
● continuous_query_num_combiners
  ○ n combiners per server
  ○ n combiners per continuous query
● continuous_query_batch_size
  ○ Larger batch size will give better performance for aggregations
● step_size (string)
  ○ CREATE CONTINUOUS VIEW v WITH (step_size = '1 second', max_age = '1 minute') ...
  ○ Larger step size will yield less granular sliding but better performance
● step_factor (integer in (0, 100])
  ○ CREATE CONTINUOUS VIEW v WITH (step_factor = 10, max_age = '1 minute') ...
  ○ Larger step factor will yield less granular sliding but better performance
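The granularity versus performance trade-off of the step settings can be illustrated with a bucketed sliding count in Python. Two things are assumptions here: that step_factor expresses the step as a percentage of max_age, and that a window is stored as one partial aggregate per step. The names `step_seconds` and `SteppedCount` are hypothetical.

```python
from collections import defaultdict


def step_seconds(max_age: float, step_factor: int) -> float:
    """Assumed relation: the step is step_factor percent of the window length."""
    if not 0 < step_factor <= 100:
        raise ValueError("step_factor must be in (0, 100]")
    return max_age * step_factor / 100


class SteppedCount:
    """Sliding COUNT(*) that stores one partial count per step, not one row per event."""

    def __init__(self, max_age: float, step: float):
        self.max_age = max_age
        self.step = step
        self.buckets = defaultdict(int)     # step index -> events in that step

    def insert(self, ts: float) -> None:
        self.buckets[int(ts // self.step)] += 1

    def count(self, now: float) -> int:
        oldest = int((now - self.max_age) // self.step)
        # Expired steps are dropped whole; the window edge moves one step at a time.
        for key in [k for k in self.buckets if k <= oldest]:
            del self.buckets[key]
        return sum(self.buckets.values())


step = step_seconds(max_age=60, step_factor=10)
window = SteppedCount(max_age=60, step=step)
for ts in (0, 1, 30):
    window.insert(ts)
```

A larger step means fewer buckets to store and sum on every read, at the cost of the window edge jumping in bigger increments.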