2016 may-countdown-to-postgres-v96-parallel-query
ashnikbiz -
Welcome Parallelism to PostgreSQL
Thursday, 19 May 2016
Agenda
• Current State of Parallelism in PostgreSQL
• What was needed to bring server side parallelism – Work done in v9.4 and v9.5
• Parallel Query in v9.6
• Review some parallel plans
• Parallelism may not always be used
• Parallelism may not always be useful
• Parameters
• Benefits
• Questions
Current State (v9.5) of Parallelism in PostgreSQL
• Client side parallelism: an application can open multiple sessions
• One can run a batch with multiple application threads
• Server side languages can potentially do parallel operations
• Some I/O activity is taken off the main query execution process by walwriter and bgwriter
• effective_io_concurrency allows page prefetch requests to the kernel for bitmap heap scans
• But there is no server side parallelism for dividing the same task among multiple workers
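To make the client side approach concrete, here is a minimal sketch in which the application splits a key range into slices and runs one session per slice from its own threads. SQLite stands in for PostgreSQL only so the sketch runs without a server; the `accounts` table, its columns, and all function names are hypothetical and not from the talk.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_demo_db(path, n_rows):
    """Create a small hypothetical 'accounts' table to query."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE accounts (aid INTEGER PRIMARY KEY, abalance INTEGER)"
        )
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            ((aid, aid % 10) for aid in range(1, n_rows + 1)),
        )
    conn.close()


def sum_range(path, lo, hi):
    """One 'session': its own connection, working on one slice of the keys."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT COALESCE(SUM(abalance), 0) FROM accounts "
            "WHERE aid BETWEEN ? AND ?",
            (lo, hi),
        ).fetchone()
        return row[0]
    finally:
        conn.close()


def client_side_parallel_sum(path, n_rows, sessions):
    """Split the key range into slices and run one session per slice."""
    step = -(-n_rows // sessions)  # ceiling division
    ranges = [
        (lo, min(lo + step - 1, n_rows)) for lo in range(1, n_rows + 1, step)
    ]
    with ThreadPoolExecutor(max_workers=sessions) as pool:
        partials = pool.map(lambda r: sum_range(path, *r), ranges)
        return sum(partials)  # the application combines the partial results
```

Note that the splitting and the final combination both live in the application here, which is exactly the work that server side parallelism would take over.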
A lot of work was needed and was done!
v9.4
• Dynamic background workers
• Dynamic shared memory
• Implementation of shared memory message queues
v9.5
• Message propagation, i.e. error messages from a background worker can be sent to and received by the master
• Synchronization of state (GUC values, XID, CID mapping, current user, current database, etc.)
• Parallel contexts can be used by backend code to launch worker processes
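The pieces listed for v9.4 and v9.5 fit together roughly as follows: a leader launches workers, the workers send tuples back over a message queue, and an error raised in a worker is forwarded to the leader and re-raised there. The sketch below is only an analogy under stated assumptions: Python threads and `queue.Queue` stand in for background worker processes and shared memory message queues, and the names `worker` and `leader` are hypothetical.

```python
import queue
import threading


def worker(worker_id, rows, out_queue):
    """Process a slice of rows and report tuples or errors to the leader."""
    try:
        for row in rows:
            if row < 0:
                raise ValueError(f"worker {worker_id}: negative value {row}")
            out_queue.put(("tuple", row * 2))
    except Exception as exc:
        # Error propagation: forward the error instead of losing it.
        out_queue.put(("error", exc))
    finally:
        out_queue.put(("done", worker_id))


def leader(slices):
    """Launch one worker per slice, gather tuples, re-raise a worker error."""
    out_queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, rows, out_queue))
        for i, rows in enumerate(slices)
    ]
    for t in threads:
        t.start()

    results, first_error, finished = [], None, 0
    while finished < len(threads):
        kind, payload = out_queue.get()
        if kind == "tuple":
            results.append(payload)
        elif kind == "error":
            if first_error is None:
                first_error = payload
        elif kind == "done":
            finished += 1
    for t in threads:
        t.join()

    if first_error is not None:
        raise first_error  # surfaces in the leader, as a client would see it
    return sorted(results)
```

For example, `leader([[1, 2], [3]])` gathers the doubled values from both workers, while `leader([[1], [-5]])` raises the worker's `ValueError` in the leader.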
v9.6: We have something that users can use!
• Parallel Sequential Scan
• Parallel Joins
• Parallel Aggregates
• These are not yet in their best form and have certain exceptions and limitations, but they work and are quite useful!
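As a rough illustration of how a parallel sequential scan and a parallel aggregate combine, the sketch below lets workers claim blocks of a table from a shared counter, compute a partial count each, and leaves the final sum to the leader. It is a simplified model, not PostgreSQL's implementation: threads stand in for worker processes, the "table" is a list of blocks held in memory, and `SharedScanState`, `partial_count`, and `parallel_count` are hypothetical names.

```python
import threading


class SharedScanState:
    """Hands out block numbers so each block is scanned by exactly one worker."""

    def __init__(self, n_blocks):
        self._next = 0
        self._n_blocks = n_blocks
        self._lock = threading.Lock()

    def next_block(self):
        with self._lock:
            if self._next >= self._n_blocks:
                return None  # scan is finished
            block_no = self._next
            self._next += 1
            return block_no


def partial_count(table, scan, predicate, partials, slot):
    """Worker: scan the blocks it claims and store a partial count."""
    count = 0
    while (block_no := scan.next_block()) is not None:
        count += sum(1 for row in table[block_no] if predicate(row))
    partials[slot] = count


def parallel_count(table, predicate, n_workers):
    """Leader: start workers, wait for them, then combine the partial counts."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1 in this sketch")
    scan = SharedScanState(len(table))
    partials = [0] * n_workers
    threads = [
        threading.Thread(
            target=partial_count, args=(table, scan, predicate, partials, slot)
        )
        for slot in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # the "finalize" step of the aggregate
```

The design point is that only one small partial result per worker travels back to the leader, which is why aggregates tend to be good candidates for parallelism.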
Basically how parallelism is supposed to work
Let’s look at some plans
Sequential Scan without Parallelism
Parallel Sequential Scans
You may not get as many workers as you desire
Parallel Aggregate
Parallel Joins
Wow! So using ‘Parallel Workers’ should be preferred! No, not really!
Parallel Query may not be used all the time
• The cost of running and coordinating multiple worker processes outweighs the advantage of parallelism
• The cost of setting up the parallelism infrastructure is too high
• No worker process is available
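The first two reasons above come down to a cost comparison. The sketch below is a deliberately simplified model of that comparison, not the planner's actual formulas: the scan's CPU cost is divided among workers, and the parallel plan additionally pays `parallel_setup_cost` once plus `parallel_tuple_cost` for every row shipped to the leader. The default values mirror PostgreSQL's documented defaults for these settings; the function names and the model itself are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostParams:
    seq_page_cost: float = 1.0
    cpu_tuple_cost: float = 0.01
    parallel_setup_cost: float = 1000.0
    parallel_tuple_cost: float = 0.1


def serial_scan_cost(pages, rows, p):
    """Cost of reading every page and processing every row in one process."""
    return pages * p.seq_page_cost + rows * p.cpu_tuple_cost


def parallel_scan_cost(pages, rows, rows_to_leader, workers, p):
    """Simplified: CPU work is split, but setup and row transfer are extra."""
    scan = pages * p.seq_page_cost + rows * p.cpu_tuple_cost / workers
    overhead = p.parallel_setup_cost + rows_to_leader * p.parallel_tuple_cost
    return scan + overhead


def choose_plan(pages, rows, rows_to_leader, workers, p=None):
    """Return 'parallel' only when the modeled parallel cost is lower."""
    p = p or CostParams()
    if workers < 1:
        return "serial"  # no worker process is available
    serial = serial_scan_cost(pages, rows, p)
    parallel = parallel_scan_cost(pages, rows, rows_to_leader, workers, p)
    return "parallel" if parallel < serial else "serial"
```

Under this model a 10-page table stays serial because the setup cost dominates, a large table whose workers return only a partial aggregate each goes parallel, and the same large table goes back to serial if every row has to be shipped to the leader.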
Example
Parallel Query may not be good all the time
• It depends a lot on your hardware resources and process scheduling by your OS
• I tried various degrees of parallelism on a test machine: 3 CPUs, 3 GB RAM, a VM running CentOS, a single I/O disk
• A simple ‘count’ on a table with 100 million rows and 8-byte width
• explain analyze select count(*) from pgbench_accounts;
• It performs faster with the parallel degree set to 0, as an index scan is performed
• Make sure you have tuned your parameters well to help the optimizer decide
Parameters Involved
Parameters which govern parallel query execution
• parallel_setup_cost
• parallel_tuple_cost
• max_worker_processes
• max_parallel_degree
• force_parallel_mode
• ALTER TABLE … SET (parallel_degree=n)
• ALTER FUNCTION … PARALLEL SAFE
• ALTER FUNCTION … COST
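How some of these settings interact can be sketched as two caps: the plan asks for a number of workers, and execution launches only as many as there are free worker slots. The model below is an approximation with hypothetical function names. It uses the parameter names listed above; the released 9.6 renamed max_parallel_degree to max_parallel_workers_per_gather and the parallel_degree storage parameter to parallel_workers.

```python
def planned_workers(planner_estimate, table_parallel_degree, max_parallel_degree):
    """Workers the plan asks for.

    A per-table parallel_degree (None if unset) overrides the planner's own
    estimate; either way the result is capped by max_parallel_degree.
    """
    if table_parallel_degree is not None:
        wanted = table_parallel_degree
    else:
        wanted = planner_estimate
    return max(0, min(wanted, max_parallel_degree))


def launched_workers(planned, max_worker_processes, workers_in_use):
    """Workers actually started: limited by free background worker slots."""
    free_slots = max(0, max_worker_processes - workers_in_use)
    return min(planned, free_slots)
```

For example, a plan that wants 4 workers on a server with max_worker_processes = 8 and 6 slots already taken would run with 2, and with all 8 taken it would run with none, which is why you may not get as many workers as you desire.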
Benefits to the users
• Sequential scans on large tables would be faster
• Analytics workloads involving aggregates would be faster
• Faster JOINs between large tables
• PostgreSQL v9.6 can be a good candidate for the backend database of a data warehouse
• More parallel operations to come in future releases
What can you do?
• PostgreSQL Beta 1 is out
• Try it out…
• Test it…
• Break it…
• Report it
• Help PostgreSQL community make it better
Further Reading
• PGCon 2014: Implementing Parallelism in PostgreSQL, Robert Haas
• PGConf.US, 2016: PostgreSQL 9.6, Magnus Hagander
• PGCon, Ottawa 2015: Parallel Sequential Scan, Robert Haas and Amit Kapila
• EnterpriseDB Blog: Parallelism Progress, Robert Haas
• Parallel Sequential Scan is Committed, Robert Haas
• EnterpriseDB Blog: Parallelism Becomes a Reality in Postgres, Amit Kapila