Streaming PostgreSQL with
PipelineDB
Derek Nelson
What is PipelineDB?
● Continuous SQL on streams (continuous views)
● High-throughput, incremental materialized views
● Based on PostgreSQL 9.4.4
● No special client libraries
● Eventually factored out as an extension
100,000 feet

[Diagram: Produce → Process → Consume, with SQL as the Process stage]

Aggregation, Filtering, Sliding windows = Reduction
Why did we build it?

[Diagram: the Produce → Process → Consume pipeline collapses into Produce → PipelineDB (continuous view) → Consume]

Simplicity is nice, but what else?
Benefits of continuous SQL on streams

● Aggregate before writing to disk
● total data ingested ≫ database size (winning)

CREATE CONTINUOUS VIEW v AS SELECT COUNT(*) FROM stream
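A minimal sketch of what this looks like in practice (the stream, view, and column names here are illustrative, not from the talk): no matter how many events are ingested, the view's backing table holds at most one row per group.

CREATE STREAM clicks (user_id int);
CREATE CONTINUOUS VIEW click_counts AS
    SELECT user_id, COUNT(*) FROM clicks GROUP BY user_id;
-- millions of INSERTs into clicks still materialize as
-- at most one row per distinct user_id in click_counts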
Benefits of continuous SQL on streams
● Sliding window queries
● Any information outside of the window is excluded from results and deleted from disk
● Essentially automatic TTL
CREATE CONTINUOUS VIEW v WITH (max_age = '1 hour') AS SELECT COUNT(*) FROM stream
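Reads against such a view only ever see the trailing window; a minimal usage sketch against the view defined above:

SELECT * FROM v;
-- returns the count of only those events that arrived within the last hour;
-- anything older has already been excluded from results and removed from disk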
Benefits of continuous SQL on streams
CREATE CONTINUOUS VIEW v AS SELECT COUNT(DISTINCT x) FROM never_ending_stream
● Probabilistic computations on infinite inputs
● Streaming Top-K, percentiles, distincts, large set cardinalities
● Constant space
● No sorting
● Small margin of error
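A hedged sketch of the kind of queries this enables (fss_agg is the top-K aggregate that also appears in Demo #2 later in the deck; the column names here are assumptions):

CREATE CONTINUOUS VIEW uniques AS
    SELECT COUNT(DISTINCT user_id) FROM never_ending_stream;

CREATE CONTINUOUS VIEW top_titles AS
    SELECT fss_agg(title, 10) AS top10 FROM never_ending_stream;
-- both views use constant space no matter how much data flows through the stream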
(Demo #1)
Internals (part 1/2)
Streams and Workers
Streams

● Internally, a stream is a Foreign Table
● System-wide Foreign Server called pipeline_streams
● stream_fdw reads from/writes to the Stream Buffer
● No permanent storage
● Stream rows only exist until they’ve been fully read

CREATE STREAM stream (x int, y int, z int);
INSERT INTO stream (x, y, z) VALUES (0, 1, 2);
Stream buffer → query on microbatch → incremental table update

● INSERT INTO ...
● Concurrent circular buffer
● Preallocated block of shared memory

[Diagram: the stream buffer holds HeapTuples, each tagged with read bits, e.g. {0,1,0,1,1}; once every bit is set ({1,1,1,1,1}) the tuple has been fully read and is evicted ✗]
stream buffer query on microbatch incremental table update
/* At Postmaster startup time ... */worker.bgw_main = any_function;worker.bgw_main_arg = (Datum) arg;
RegisterDynamicBackgroundWorker(&worker, &handle);
Stream buffer → query on microbatch → incremental table update

SELECT count(*), avg(x) FROM stream

● The AGG node scans the microbatch of HeapTuples via stream_fdw#GetStreamScanPlan:

while (!BatchDone(node))
{
    tup = PinNext(buf);
    yield MarkAsRead(tup);
}

[Diagram: the aggregate emits a microbatch_result, e.g. count = 1000 and avg transition state = {1000, 4000}]
Worker process parallelism

[Diagram: tuples from the Stream buffer are round-robin’d across n worker procs (Worker proc 0, Worker proc 1, ..., Worker proc n)]
Internals (part 2/2)
Incremental Updates
Stream buffer → query on microbatch → incremental table update

● transition_state = combine(microbatch_tstate, existing_tstate)
● pipeline_combine catalog table maps combine functions to aggregates
● No changes to pg_aggregate catalog table or existing aggregate functions
● User-defined aggregates just need a combinefunc to be combinable (see the sketch below)

CREATE AGGREGATE combinable_agg(x) (
    sfunc = sfunc,
    finalfunc = finalfunc,
    combinefunc = combinefunc
);
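As a concrete, hypothetical illustration of the syntax above, a combinable sum over integers could reuse PostgreSQL's built-in transition and addition functions; only the combinefunc line follows the PipelineDB-specific option shown on the slide:

-- hypothetical example: a user-defined, combinable sum
CREATE AGGREGATE combinable_sum(int4) (
    sfunc       = int4_sum,   -- transition: fold the next value into the running sum
    stype       = int8,       -- transition state: a 64-bit running sum
    combinefunc = int8pl      -- combine: add two partial sums together
);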
Stream buffer → query on microbatch → incremental table update

● transition_state = combine(microbatch_tstate, existing_tstate)
● pipeline_combine catalog table maps combine functions to aggregates

[Diagram: combine() merges the microbatch_result (count = 1000, avg transition state = {1000, 4000}) with the existing on-disk row (count = 5000, avg transition state = {5000, 10000}) to produce the updated_result (count = 6000, avg transition state = {6000, 14000})]

avg’s transition state here is {count, sum}, so combining is just element-wise addition and the final average is sum / count.
Stream buffer → query on microbatch → incremental table update

lookup_plan = get_plan(SELECT * FROM matrel WHERE hash_group(x, y, z) IN (...));

/* dynamically generate a VALUES node */
foreach(row, microbatch)
    values = lappend(values, hash_group(row));

set_values(lookup_plan, values);

existing = PortalRun(lookup_plan, ...);

/* now we’re ready to combine these on-disk tuples with the incoming batch result */
Stream buffer → query on microbatch → incremental table update
● This query needs to be as fast as possible
● Continuous views indexed on a 32-bit hash of grouping
● Pro: maximize cardinality of the index keyspace, great for random perf
● Con: must deal with collisions programmatically
SELECT * FROM matrel WHERE hash_group(x, y, z) IN (hash(microbatch group), ...)
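Conceptually (a sketch, not PipelineDB's exact DDL, since it creates and names this index itself), the materialization table behind a continuous view carries an expression index on the 32-bit grouping hash:

-- illustrative only: an expression index on the grouping hash
CREATE INDEX matrel_group_idx ON matrel (hash_group(x, y, z));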
Stream buffer → query on microbatch → incremental table update

SELECT * FROM matrel WHERE hash_group(x, y, z) IN (hash(microbatch group), ...)

● If the grouping contains a time-based column, we can do better
● These continuous views are indexed with a 64-bit hash of the grouping:
  [ Timestamp from group (32 bits) | Regular 32-bit grouping hash ]
● Pro: most incoming groups will have a similar timestamp, so better index caching
● Con: larger index footprint

CREATE ... AS SELECT day(timestamp), count(*) FROM stream GROUP BY day
Stream buffer → query on microbatch → incremental table update

✔ microbatch result generated from stream by worker
✔ existing result retrieved from disk

combine_plan = get_plan(SELECT group, combine(count), combine(avg)
                        FROM microbatch_result UNION existing
                        GROUP BY group);

combined = PortalRun(combine_plan, ...);

foreach(row, combined)
{
    if (new_tuple(row))
        heap_insert(row, …);
    else
        heap_update(row, …);
}
Combiner process parallelism

[Diagram: continuous view groupings (a, b, c), (d, e, f), (g, h, i), (j, k, l) distributed across combiner processes]

● On-disk groupings are sharded over combiners by group
● Each row is guaranteed to only ever be updated by one combiner process
(Demo #2)
Demo #2
● Wikipedia data summarized by hour and URL
● <hour | project | page title | view count | bytes served>

CREATE STREAM wiki_stream (
    hour timestamp,
    project text,
    title text,
    view_count bigint,
    size bigint
);

CREATE CONTINUOUS VIEW wiki_stats AS
SELECT project::text,
       sum(view_count::bigint) AS total_views,
       fss_agg(title, 3) AS titles
FROM wiki_stream
GROUP BY hour, project;
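A minimal way to exercise the demo (the inserted values below are placeholders): write events into the stream, then read the continuously maintained aggregates back with plain SQL.

INSERT INTO wiki_stream (hour, project, title, view_count, size)
    VALUES ('2015-06-01 01:00:00', 'en', 'Main_Page', 1000, 1000000);

SELECT project, total_views, titles
FROM wiki_stats
ORDER BY total_views DESC;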
Tuning Tips

● continuous_query_num_workers
  ○ n workers per server
  ○ n workers reading microbatches for the same query
● continuous_query_num_combiners
  ○ n combiners per server
  ○ n combiners per continuous query
● continuous_query_batch_size
  ○ Larger batch size will give better performance for aggregations
● step_size (string)
  ○ CREATE CONTINUOUS VIEW v WITH (step_size = '1 second', max_age = '1 minute') ...
  ○ Larger step size will yield less granular sliding but better performance
● step_factor (integer in (0, 100])
  ○ CREATE CONTINUOUS VIEW v WITH (step_factor = 10, max_age = '1 minute') ...
  ○ Larger step factor will yield less granular sliding but better performance
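The first three are server-level settings, so they would typically be set in postgresql.conf; the values below are placeholders, not recommendations.

# postgresql.conf excerpt (illustrative values only)
continuous_query_num_workers = 2
continuous_query_num_combiners = 2
continuous_query_batch_size = 50000   # placeholder; see the docs for units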
?

github.com/pipelinedb
derek@pipelinedb.com
Thanks!