Grouping sets sfpug_20141118

GROUPING SETS, CUBE, ROLLUP, and Friends. SFPUG 2014/11/18. Copyright © 2014 David Fetter. Tuesday, November 18, 14

description

Upcoming features for PostgreSQL 9.5: GROUPING SETS, including CUBE and ROLLUP

Transcript of Grouping sets sfpug_20141118

Page 1: Grouping sets sfpug_20141118

GROUPING SETS, CUBE, ROLLUP, and Friends

SFPUG 2014/11/18
Copyright © 2014 David Fetter

Tuesday, November 18, 14

Page 2: Grouping sets sfpug_20141118

Thanks,


Page 3: Grouping sets sfpug_20141118

Why?!?

Page 4: Grouping sets sfpug_20141118

Analyzing


Page 5: Grouping sets sfpug_20141118

Reporting


Page 6: Grouping sets sfpug_20141118


Page 7: Grouping sets sfpug_20141118

• CUBE (Power set/Ring the changes)

• ROLLUP (Hierarchy)

• GROUPING SETS (Precision)
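The three keywords above can be read as ways of spelling a list of grouping sets. A small Python illustration follows, assuming two grouping columns named as in the later slides; the function names `cube` and `rollup` are hypothetical helpers for this sketch, not PostgreSQL APIs.

```python
from itertools import combinations

def cube(columns):
    """All subsets of the columns (the power set), largest subsets first."""
    cols = list(columns)
    return [subset
            for size in range(len(cols), -1, -1)
            for subset in combinations(cols, size)]

def rollup(columns):
    """Every prefix of the column list, from the full list down to ()."""
    cols = tuple(columns)
    return [cols[:n] for n in range(len(cols), -1, -1)]

# Hypothetical column names, chosen to match the sales example.
cube_sets = cube(["employee_id", "quarter"])
rollup_sets = rollup(["employee_id", "quarter"])
print(cube_sets)
print(rollup_sets)
```

CUBE over two columns yields four grouping sets and ROLLUP yields three; GROUPING SETS lets you write exactly the list you want.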


Page 8: Grouping sets sfpug_20141118

Shhh. A little code.


Page 9: Grouping sets sfpug_20141118

Tables

CREATE TABLE employee (
    id SERIAL PRIMARY KEY,
    first_name TEXT,
    last_name TEXT
);

CREATE TABLE sales (
    employee_id INTEGER NOT NULL,
    sale_closed TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    sale_amount MONEY, /* We need to fix this */
    FOREIGN KEY(employee_id) REFERENCES employee(id)
);


Page 10: Grouping sets sfpug_20141118

Data

INSERT INTO employee (first_name, last_name)
VALUES
    ('Larry', 'Ellison'),
    ('Bill', 'Gates'),
    ('Vladimir', 'Yulianov');


Page 11: Grouping sets sfpug_20141118

Moar Data

INSERT INTO sales
SELECT
    floor(random()*3)+1, /* Who */
    '2014-01-01 00:00:00+00'::timestamptz + random() * interval '1 year', /* When */
    (random() * 1000)::numeric(8,2)::MONEY /* How much */
FROM generate_series(1,1000);


Page 12: Grouping sets sfpug_20141118

How much did each employee sell each quarter?


Page 13: Grouping sets sfpug_20141118

SIMPLE!


Page 14: Grouping sets sfpug_20141118

SELECT
    employee_id,
    date_trunc('Quarter', sale_closed) AS "Quarter",
    SUM(sale_amount)
FROM sales
GROUP BY employee_id, date_trunc('Quarter', sale_closed)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);

* I left out some formatting.


Page 15: Grouping sets sfpug_20141118

Results:

┌─────────────┬─────────┬────────────┐
│ employee_id │ Quarter │    sum     │
├─────────────┼─────────┼────────────┤
│           1 │ 2014-Q1 │ $42,227.43 │
│           1 │ 2014-Q2 │ $42,974.71 │
│           1 │ 2014-Q3 │ $41,364.66 │
│           1 │ 2014-Q4 │ $33,910.34 │
│           2 │ 2014-Q1 │ $38,733.24 │
│           2 │ 2014-Q2 │ $40,480.96 │
│           2 │ 2014-Q3 │ $43,875.72 │
│           2 │ 2014-Q4 │ $42,041.28 │
│           3 │ 2014-Q1 │ $45,351.14 │
│           3 │ 2014-Q2 │ $36,672.08 │
│           3 │ 2014-Q3 │ $33,468.46 │
│           3 │ 2014-Q4 │ $42,793.36 │
└─────────────┴─────────┴────────────┘
(12 rows)


Page 16: Grouping sets sfpug_20141118

That's nice, BUT

(We all grimace when we hear that)


Page 17: Grouping sets sfpug_20141118

How about annual totals?


Page 18: Grouping sets sfpug_20141118

Old way: UNION ALL


Page 19: Grouping sets sfpug_20141118

(
    SELECT
        employee_id,
        to_char(date_trunc('Quarter', sale_closed), 'YYYY-"Q"Q') AS "Quarter",
        sum(sale_amount)
    FROM sales
    GROUP BY employee_id, date_trunc('Quarter', sale_closed)
    ORDER BY employee_id, date_trunc('Quarter', sale_closed)
)
UNION ALL
(
    SELECT
        employee_id,
        to_char(date_trunc('Year', sale_closed), 'YYYY') AS "Year",
        sum(sale_amount)
    FROM sales
    GROUP BY employee_id, date_trunc('Year', sale_closed)
    ORDER BY employee_id, date_trunc('Year', sale_closed)
);

Still Doable...Kinda


Page 20: Grouping sets sfpug_20141118

Results

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           1 │ 2014    │ $160,477.14 │
│           2 │ 2014    │ $165,131.20 │
│           3 │ 2014    │ $158,285.04 │
└─────────────┴─────────┴─────────────┘
(15 rows)


Page 21: Grouping sets sfpug_20141118

That's nice, BUT


Page 22: Grouping sets sfpug_20141118

Can't we look at each sales rep with each of their quarterly totals?


Page 23: Grouping sets sfpug_20141118

ARGHH!!!!!!


Page 24: Grouping sets sfpug_20141118


Page 25: Grouping sets sfpug_20141118

These requests are reasonable!


Page 26: Grouping sets sfpug_20141118

But the code...not so much.


Page 27: Grouping sets sfpug_20141118

Take it from the top!


Page 28: Grouping sets sfpug_20141118

CUBE...ring the changes...


Page 29: Grouping sets sfpug_20141118

Quick stare

SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY CUBE (
    employee_id,
    date_trunc('Quarter', sale_closed)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);


Page 30: Grouping sets sfpug_20141118

Results:

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
│             │ 2014-Q1 │ $126,311.81 │
│             │ 2014-Q2 │ $120,127.75 │
│             │ 2014-Q3 │ $118,708.84 │
│             │ 2014-Q4 │ $118,744.98 │
│             │         │ $483,893.38 │
└─────────────┴─────────┴─────────────┘
(20 rows)


Page 31: Grouping sets sfpug_20141118

That's nice, BUT


Page 32: Grouping sets sfpug_20141118

We don't care about undifferentiated quarterly totals.


Page 33: Grouping sets sfpug_20141118

Results:

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
│             │ 2014-Q1 │ $126,311.81 │
│             │ 2014-Q2 │ $120,127.75 │
│             │ 2014-Q3 │ $118,708.84 │
│             │ 2014-Q4 │ $118,744.98 │
│             │         │ $483,893.38 │
└─────────────┴─────────┴─────────────┘
(20 rows)


Page 34: Grouping sets sfpug_20141118

ROLLUP...hierarchy...


Page 35: Grouping sets sfpug_20141118

Let's try that!


Page 36: Grouping sets sfpug_20141118

SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY ROLLUP (
    employee_id,
    date_trunc('Quarter', sale_closed)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);


Page 37: Grouping sets sfpug_20141118

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
│             │         │ $483,893.38 │
└─────────────┴─────────┴─────────────┘
(16 rows)

Hmmm...


Page 38: Grouping sets sfpug_20141118

That's nice, BUT


Page 39: Grouping sets sfpug_20141118

There was an extra line.


Page 40: Grouping sets sfpug_20141118

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
│             │         │ $483,893.38 │
└─────────────┴─────────┴─────────────┘
(16 rows)


Page 41: Grouping sets sfpug_20141118

Hierarchies: Top to Bottom


Page 42: Grouping sets sfpug_20141118

We didn't want the top.


Page 43: Grouping sets sfpug_20141118

GROUPING SETS...Precision


Page 44: Grouping sets sfpug_20141118

SELECT
    employee_id,
    to_char(
        date_trunc('Quarter', sale_closed),
        'YYYY-"Q"Q'
    ) AS "Quarter",
    sum(sale_amount)
FROM sales
GROUP BY GROUPING SETS (
    (employee_id, date_trunc('Quarter', sale_closed)),
    (employee_id)
)
ORDER BY employee_id, date_trunc('Quarter', sale_closed);


Page 45: Grouping sets sfpug_20141118

Results:

┌─────────────┬─────────┬─────────────┐
│ employee_id │ Quarter │     sum     │
├─────────────┼─────────┼─────────────┤
│           1 │ 2014-Q1 │  $42,227.43 │
│           1 │ 2014-Q2 │  $42,974.71 │
│           1 │ 2014-Q3 │  $41,364.66 │
│           1 │ 2014-Q4 │  $33,910.34 │
│           1 │         │ $160,477.14 │
│           2 │ 2014-Q1 │  $38,733.24 │
│           2 │ 2014-Q2 │  $40,480.96 │
│           2 │ 2014-Q3 │  $43,875.72 │
│           2 │ 2014-Q4 │  $42,041.28 │
│           2 │         │ $165,131.20 │
│           3 │ 2014-Q1 │  $45,351.14 │
│           3 │ 2014-Q2 │  $36,672.08 │
│           3 │ 2014-Q3 │  $33,468.46 │
│           3 │ 2014-Q4 │  $42,793.36 │
│           3 │         │ $158,285.04 │
└─────────────┴─────────┴─────────────┘
(15 rows)


Page 46: Grouping sets sfpug_20141118

There we go!


Page 47: Grouping sets sfpug_20141118

HOW?!?


Page 48: Grouping sets sfpug_20141118

Extant Planner/Executor


Page 49: Grouping sets sfpug_20141118

Extant Planner/Executor

• HashAgg


Page 50: Grouping sets sfpug_20141118

Extant Planner/Executor

• HashAgg

• GroupAgg

Page 51: Grouping sets sfpug_20141118

HashAgg

Result Group Intermediate State


Page 52: Grouping sets sfpug_20141118

HashAgg

• One pass:

• Update hash value for each row

• Output final value at the end
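The one-pass idea above can be illustrated in a few lines of Python. This is a sketch of the technique only, not PostgreSQL's executor code; the function name `hash_agg`, the row layout, and the sample amounts are all made up for the illustration.

```python
from collections import defaultdict
from decimal import Decimal

def hash_agg(rows, key_of, value_of):
    """Sum value_of(row) per key_of(row) in a single pass over unsorted rows."""
    state = defaultdict(Decimal)          # group key -> running sum, starts at 0
    for row in rows:
        state[key_of(row)] += value_of(row)   # update the hash entry for this row
    return dict(state)                    # results exist only once the scan ends

# Hypothetical rows: (employee_id, quarter, amount); input order does not matter.
rows = [
    (1, "2014-Q1", Decimal("10.00")),
    (2, "2014-Q1", Decimal("5.00")),
    (1, "2014-Q2", Decimal("2.50")),
    (1, "2014-Q1", Decimal("1.25")),
]
totals = hash_agg(rows, key_of=lambda r: (r[0], r[1]), value_of=lambda r: r[2])
print(totals)
```

Every group's state lives in memory until the scan finishes, which is the source of the drawbacks listed a couple of slides later.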


Page 53: Grouping sets sfpug_20141118

HashAgg

• Not yet in GROUPING SETS

• Algorithmic speedup opportunity:

• O(n) vs. O(n log n)


Page 54: Grouping sets sfpug_20141118

HashAgg-- :-(

• Non-hashable data types

• Aggregate functions with LOTS of state

• Ordered aggs

• Distinct aggs

• No spill-to-disk


Page 55: Grouping sets sfpug_20141118

GroupAgg

• Sorts all input to the agg node to

• Detect group boundary

• Output that group

• Results before end-of-scan
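For contrast with HashAgg, here is a comparable Python sketch of aggregating over sorted input. Again this only illustrates the technique; `group_agg` and the integer sample amounts are hypothetical, and the real node is C code inside the executor.

```python
def group_agg(sorted_rows, key_of, value_of):
    """Yield (key, sum) for each run of equal keys in already-sorted input."""
    current_key, total, seen_any = None, 0, False
    for row in sorted_rows:
        key = key_of(row)
        if seen_any and key != current_key:
            # Group boundary detected: the previous group is complete,
            # so it can be emitted before the scan has finished.
            yield current_key, total
            total = 0
        current_key, seen_any = key, True
        total += value_of(row)
    if seen_any:
        yield current_key, total          # flush the final group

# Hypothetical rows: (employee_id, quarter, amount in cents).
rows = [(2, "Q1", 500), (1, "Q2", 250), (1, "Q1", 1000), (1, "Q1", 125)]
key_of = lambda r: (r[0], r[1])
result = list(group_agg(sorted(rows, key=key_of), key_of, lambda r: r[2]))
print(result)
```

Only one group's state is held at a time, at the cost of the sort.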


Page 56: Grouping sets sfpug_20141118

Phase I


Page 57: Grouping sets sfpug_20141118

GroupAgg for ROLLUP


Page 58: Grouping sets sfpug_20141118

GroupAgg for ROLLUP

• Sort for the hierarchy


Page 59: Grouping sets sfpug_20141118

GroupAgg for ROLLUP

• Sort for the hierarchy

• Output results at each boundary


Page 60: Grouping sets sfpug_20141118

GroupAgg for ROLLUP

• Sort for the hierarchy

• Output results at each boundary

• k for the price of one!
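Why one sort serves every level of a ROLLUP can be seen in a short Python sketch: with rows sorted on the full key, a change in column i closes every group that includes column i. The function `rollup_agg` and the sample data are hypothetical, and this is an illustration of the idea rather than the patch's code.

```python
def rollup_agg(sorted_rows, key_of, value_of, n_cols):
    """One pass over rows sorted on all n_cols key columns; emits every ROLLUP level."""
    totals = [0] * (n_cols + 1)   # totals[k]: running sum for the first k key columns
    prev = None
    for row in sorted_rows:
        key = key_of(row)
        if prev is not None and key != prev:
            # Leftmost column whose value changed since the previous row.
            changed = next(i for i in range(n_cols) if key[i] != prev[i])
            # Close every level that includes that column, finest level first.
            for level in range(n_cols, changed, -1):
                yield prev[:level] + (None,) * (n_cols - level), totals[level]
                totals[level] = 0
        for level in range(n_cols + 1):
            totals[level] += value_of(row)
        prev = key
    if prev is not None:
        # End of data closes all levels, including the grand total (level 0).
        for level in range(n_cols, -1, -1):
            yield prev[:level] + (None,) * (n_cols - level), totals[level]

# Hypothetical rows: (employee_id, quarter, amount in cents).
rows = [(2, "Q1", 500), (1, "Q2", 250), (1, "Q1", 1000), (1, "Q1", 125)]
key_of = lambda r: (r[0], r[1])
result = list(rollup_agg(sorted(rows, key=key_of), key_of, lambda r: r[2], n_cols=2))
print(result)
```

The output has the same shape as the ROLLUP result shown earlier: detail rows, then a per-employee subtotal with an empty quarter, then one grand total.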


Page 61: Grouping sets sfpug_20141118

Phase II


Page 62: Grouping sets sfpug_20141118

GroupAgg !ROLLUP


Page 63: Grouping sets sfpug_20141118

GroupAgg !ROLLUP


Page 64: Grouping sets sfpug_20141118

GroupAgg !ROLLUP

• Re-plan input to sort with >1 order


Page 65: Grouping sets sfpug_20141118

GroupAgg !ROLLUP

• Re-plan input to sort with >1 order

• Plan keeps tons of global state


Page 66: Grouping sets sfpug_20141118

GroupAgg !ROLLUP

• Re-plan input to sort with >1 order

• Plan keeps tons of global state

• Does NOT like to be called >1x/plan


Page 67: Grouping sets sfpug_20141118


Page 68: Grouping sets sfpug_20141118

GROUPING SETS ~ WINDOW


Page 69: Grouping sets sfpug_20141118

WINDOW implementation


Page 70: Grouping sets sfpug_20141118

Shuffle a deck of WindowAgg and Sort nodes.


Page 71: Grouping sets sfpug_20141118

WindowAgg → Sort → WindowAgg → Sort ...


Page 72: Grouping sets sfpug_20141118

Similar pattern


Page 73: Grouping sets sfpug_20141118


Page 74: Grouping sets sfpug_20141118

• Expand all GROUPING SETS


Page 75: Grouping sets sfpug_20141118

• Expand all GROUPING SETS

• Arrange into fewest ROLLUPs


Page 76: Grouping sets sfpug_20141118

• Expand all GROUPING SETS

• Arrange into fewest ROLLUPs

• Shuffle Sort and ChainAgg
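One way to picture the "arrange into ROLLUPs" step: each ROLLUP is a chain of nested grouping sets that a single sort order can serve. The Python sketch below uses a simple greedy pass. That greedy choice is an assumption made for illustration; it is not claimed to be the patch's method, and it is not guaranteed to find the minimum number of chains for every input. The name `arrange_into_rollups` is hypothetical.

```python
def arrange_into_rollups(grouping_sets):
    """Greedily pack grouping sets into chains of nested sets (one sort per chain)."""
    chains = []
    # Visit larger sets first so each chain runs from finest to coarsest grouping.
    for gset in sorted((frozenset(s) for s in grouping_sets), key=len, reverse=True):
        for chain in chains:
            if gset < chain[-1]:          # proper subset of the chain's smallest set
                chain.append(gset)
                break
        else:
            chains.append([gset])         # fits nowhere: start a new chain
    return chains

# The four grouping sets that CUBE(a, b) expands to.
chains = arrange_into_rollups([("a", "b"), ("a",), ("b",), ()])
print(chains)
```

For CUBE(a, b) this yields two chains, (a, b) → (a) → () and (b), so two sort orders cover all four grouping sets.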


Page 77: Grouping sets sfpug_20141118

GroupAgg → Sort → ChainAgg → Sort → (input data)


Page 78: Grouping sets sfpug_20141118

ChainAgg?!?


Page 79: Grouping sets sfpug_20141118

ChainAgg Nodes

• Pass input state through unchanged

• Update aggregate state

• Put rows into a chain-wide shared tuplestore when they hit a group boundary


Page 80: Grouping sets sfpug_20141118

The Last GroupAgg

• Produces its normal output until end-of-data

• Outputs the shared tuplestore
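The ChainAgg and final GroupAgg behavior described on the last two slides can be mimicked with Python generators. This is a loose sketch under stated assumptions: a plain list stands in for the shared tuplestore, `sorted()` stands in for the Sort nodes, and the names `chain_agg` and `last_group_agg` are hypothetical.

```python
def chain_agg(sorted_rows, key_of, value_of, tuplestore):
    """Pass rows through unchanged; append each finished group to the shared store."""
    current, total, seen = None, 0, False
    for row in sorted_rows:
        key = key_of(row)
        if seen and key != current:
            tuplestore.append((current, total))   # group boundary: stash the result
            total = 0
        current, seen = key, True
        total += value_of(row)
        yield row                                 # input row goes upstream untouched
    if seen:
        tuplestore.append((current, total))

def last_group_agg(sorted_rows, key_of, value_of, tuplestore):
    """Emit this node's own groups, then everything the chain stored."""
    current, total, seen = None, 0, False
    for row in sorted_rows:
        key = key_of(row)
        if seen and key != current:
            yield current, total
            total = 0
        current, seen = key, True
        total += value_of(row)
    if seen:
        yield current, total
    yield from tuplestore                         # end of data: drain the shared store

# Hypothetical rows: (employee_id, quarter, amount in cents).
rows = [(1, "Q1", 1000), (2, "Q1", 500), (1, "Q2", 250)]
store = []
by_quarter = chain_agg(sorted(rows, key=lambda r: r[1]),
                       lambda r: (None, r[1]), lambda r: r[2], store)
result = list(last_group_agg(sorted(by_quarter, key=lambda r: r[0]),
                             lambda r: (r[0], None), lambda r: r[2], store))
print(result)
```

Reading the pipeline right to left matches the plan shape on the earlier slide: input data, Sort, ChainAgg, Sort, GroupAgg.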


Page 81: Grouping sets sfpug_20141118

Phase III


Page 82: Grouping sets sfpug_20141118

Future


Page 83: Grouping sets sfpug_20141118

• HashAgg

• Alone?

• With ChainAggs?

• Agg Associativity (A + B) + C = A + (B + C)

• Make CUBE a reserved word?


Page 84: Grouping sets sfpug_20141118

Questions? Comments?
