Postgres Plus Advanced Server v8.3 R2

Copyright 2010 EnterpriseDB Corporation. All rights Reserved. Breaking the PostgreSQL Scalability Barrier Slide: 1

Breaking the Scalability Barrier of PostgreSQL with Infinite Cache and GridSQL

January 14, 2010

First, a MySQL Comparison

• MySQL performance drops considerably under high concurrency

PostgreSQL Scalability

• PostgreSQL performs better, but performance levels off

PostgreSQL Limits

• Maximum size for a database: unlimited
• Maximum size for a table: 32 TB
• Maximum size for a row: 400 GB
• Maximum size for a field: 1 GB
• Maximum number of rows in a table: unlimited
• Maximum number of columns in a table: 250-1600, depending on column types
• Maximum number of indexes on a table: unlimited
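
The 32 TB table limit can be reproduced with simple arithmetic, assuming PostgreSQL's default 8 kB block size and 32-bit block numbers. The constant and function names below are illustrative only:

```python
# Assumption: default build with 8 kB blocks and 32-bit block numbers.
BLOCK_SIZE_BYTES = 8 * 1024
MAX_BLOCKS = 2 ** 32


def max_table_size_tib(block_size=BLOCK_SIZE_BYTES, max_blocks=MAX_BLOCKS):
    """Largest table, in TiB, addressable with the given block size."""
    return block_size * max_blocks / 2 ** 40


print(max_table_size_tib())
```

A server compiled with a larger block size would raise the limit proportionally.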

PostgreSQL Scalability

• SMP scalability good but limited to one system (can scale up, but not out)

• A large query will use only one core (no native multi-threading)

• Can leverage some outside components

• May be forced to use techniques such as sharding, but this requires architectural changes
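
To illustrate why sharding requires architectural changes, here is a minimal sketch of application-level shard routing. The shard names and the `shard_for` helper are hypothetical. The point is that the application, not the database, has to decide where each row lives:

```python
import zlib

# Hypothetical shard identifiers, one per independent PostgreSQL server.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(customer_id: int, shards=SHARDS) -> str:
    """Pick a shard using a stable hash of the sharding key."""
    key = str(customer_id).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]


print(shard_for(42))
```

Every query then has to carry the sharding key, and joins or aggregates that span shards must be assembled in application code.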

Other Scaling Strategies

• Use pgbouncer
  – Good for connection pooling, but only connection pooling
• Use replication (Slony, pgpool-II) and load balance reads
  – Updates still need to be propagated and processed on the slaves, consuming CPU
• For query parallelism
  – PL/Proxy: requires manually coding each query and using stored procedures
  – pgpool-II: limited parallelism, not optimal
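
The "replicate and load balance reads" strategy can be sketched as a small router that sends writes to the master and rotates reads across the replicas. The `ReadWriteRouter` class and the server names are hypothetical, and the `SELECT` prefix check is a deliberate simplification:

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and rotate reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str) -> str:
        # Simplification: treat anything starting with SELECT as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.master


router = ReadWriteRouter("master", ["replica-1", "replica-2"])
print(router.route("SELECT * FROM orders"))
print(router.route("UPDATE orders SET status = 'shipped'"))
```

This spreads read load only. Each replica still has to apply every update, which is the CPU cost the slide points out.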

Postgres Plus Advanced Server

• Postgres Plus Advanced Server is an improved version of PostgreSQL with performance and management enhancements

• Created by EnterpriseDB, which employs key PostgreSQL community members and is a major contributor to open source PostgreSQL

• Includes important community components like pgbouncer

• Includes Infinite Cache, exclusive to Advanced Server

• Includes GridSQL for Business Intelligence and Data Warehousing

Infinite Cache

• High performance horizontal scaling architecture for cache memory

• Cache expands with inexpensive commodity hardware

Infinite Cache Scalability

• Infinite Cache - Performance

Advanced Server is up to 5X faster on a two-cache, 27 GB setup than on a single machine with 8 GB of shared buffers.

Infinite Cache Overview

• Designed for Read-Mostly applications (e.g. Content Management, Query Intensive)

• Cache is transparent to client applications, including updates (no cache coding needed)

• Cache scales using inexpensive commodity hardware
• Infinite Cache can be run to boost single machine performance!
• Created for:
  – DBAs and data managers overseeing large amounts of data requiring fast response times for queries and reporting loads
  – Developers who don’t want to write specialized caching code and re-architect their applications

• Easily deployable in the cloud

Infinite Cache Features

• Compression feature enables caching entire databases (e.g. put a 250 GB database into 32 GB RAM Cache)

• Populating Cache
  – Pre-warming at startup based on a list of tables
  – Multi-process parallel warming utility for multi-core systems
  – Any table at any time
  – During normal usage, via LRU algorithm
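
The combination of compression and LRU population can be sketched as follows. This is not Infinite Cache's implementation. The `CompressedLRUCache` class is hypothetical, and `zlib` stands in for whatever compression the product actually uses:

```python
import zlib
from collections import OrderedDict


class CompressedLRUCache:
    """Toy page cache: stores compressed pages, evicts least recently used."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._pages = OrderedDict()  # page id -> compressed bytes

    def get(self, page_id):
        blob = self._pages.get(page_id)
        if blob is None:
            return None  # miss: the caller would read the page from disk
        self._pages.move_to_end(page_id)  # mark as most recently used
        return zlib.decompress(blob)

    def put(self, page_id, data: bytes):
        if page_id in self._pages:
            self.used_bytes -= len(self._pages.pop(page_id))
        blob = zlib.compress(data)
        self._pages[page_id] = blob
        self.used_bytes += len(blob)
        # Evict the oldest pages until the compressed total fits the budget.
        # The newest page is always kept, even if it alone exceeds the budget.
        while self.used_bytes > self.max_bytes and len(self._pages) > 1:
            _, evicted = self._pages.popitem(last=False)
            self.used_bytes -= len(evicted)
```

Because the budget is counted in compressed bytes, well-compressible data lets the cache hold far more pages than its raw memory size suggests, which is the effect the 250 GB into 32 GB example describes.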

Features – Statistics and Monitoring

• Infinite Cache Node Statistics
  – hostname, port, status, bytes_written, cmd_get, cmd_set, connection_structures, curr_connections, curr_items, evictions, get_hits, get_misses, limit_maxbytes, pid, pointer_size, rusage_user, rusage_system, threads, total_time, total_connections, total_items, uptime, version
• pg_catalog changes: the columns heap_blks_hit_icache and idx_blks_hit_icache added to
  – pg_statio_all_tables, pg_statio_sys_tables, pg_statio_user_tables, pg_statio_all_indexes, pg_statio_sys_indexes, pg_statio_user_indexes
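
One way to use the new column is to compute, per table, what share of block requests each tier served. The sketch below takes one row from `pg_statio_user_tables` as a dictionary. It assumes that `heap_blks_hit`, `heap_blks_hit_icache`, and `heap_blks_read` are disjoint counters, which the slides do not confirm, and the `cache_hit_ratios` helper is hypothetical:

```python
def cache_hit_ratios(row: dict) -> dict:
    """Split heap block requests for one table by where they were served.

    Assumption: the three counters do not overlap.
    """
    shared = row["heap_blks_hit"]
    icache = row["heap_blks_hit_icache"]
    disk = row["heap_blks_read"]
    total = shared + icache + disk
    if total == 0:
        return {"shared_buffers": 0.0, "infinite_cache": 0.0, "disk": 0.0}
    return {
        "shared_buffers": shared / total,
        "infinite_cache": icache / total,
        "disk": disk / total,
    }


sample = {"heap_blks_hit": 50, "heap_blks_hit_icache": 30, "heap_blks_read": 20}
print(cache_hit_ratios(sample))
```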

Features – Node Failure Handling

• Automatically detect and handle failed Infinite Cache Nodes
  – Requests re-routed to disk
  – Uptime of Database is not impacted
• Detect when node is back alive
  – If only a small number of buffers have been impacted, just invalidate those
  – Otherwise, invalidate entire node cache
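
The failure-handling behavior above can be sketched roughly as follows. All names here (`CacheNode`, `FailoverCache`, `max_selective`) are hypothetical, and a plain dictionary stands in for disk storage. The real detection and invalidation logic is not described in the slides:

```python
class CacheNode:
    """Stand-in for a remote cache node."""

    def __init__(self):
        self.alive = True
        self.pages = {}


class FailoverCache:
    def __init__(self, node, disk, max_selective=100):
        self.node = node
        self.disk = disk  # dict acting as durable storage
        self.max_selective = max_selective
        self.changed_while_down = set()

    def read(self, page_id):
        if self.node.alive and page_id in self.node.pages:
            return self.node.pages[page_id]
        data = self.disk[page_id]  # node down or cache miss: go to disk
        if self.node.alive:
            self.node.pages[page_id] = data
        return data

    def write(self, page_id, data):
        self.disk[page_id] = data
        if self.node.alive:
            self.node.pages[page_id] = data
        else:
            self.changed_while_down.add(page_id)  # cached copy is now stale

    def on_node_recovered(self):
        if len(self.changed_while_down) <= self.max_selective:
            for page_id in self.changed_while_down:
                self.node.pages.pop(page_id, None)  # invalidate only stale pages
        else:
            self.node.pages.clear()  # too many changes: drop the node's cache
        self.changed_while_down.clear()
        self.node.alive = True
```

Reads keep working while the node is down because disk remains the source of truth. The cache only ever holds copies.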

Infinite Cache Example

• Infinite Cache – Single Machine Performance

Advanced Server is 16X faster on a single machine with large amounts of memory (e.g. greater than 2 GB)

Infinite Cache can be used on a single machine!

Infinite Cache Example

• Infinite Cache – PPAS vs PostgreSQL Performance

Advanced Server is faster than PostgreSQL on a single machine with large amounts of memory (e.g. 32GB)

PostgreSQL shops with heavy query loads should consider Advanced Server with Infinite Cache!

GridSQL Introduction

• GridSQL enables organizations to meet complex Data Warehousing and Business Intelligence challenges at a fraction of the cost of traditional solutions using a “shared-nothing”, distributed data architecture.

• GridSQL enables OLAP applications to leverage the power of multiple commodity servers while appearing as a single database to the application.

• Using GridSQL, database performance improves nearly linearly as additional servers are added to the grid.

GridSQL Details

• Designed for Parallel Querying
• Loosely-coupled shared-nothing architecture
• Utilizes EnterpriseDB Advanced Server or PostgreSQL
• Data Loader for parallel loading
• Not just “Read-Only”, can execute UPDATE, DELETE (~100+ transactions per second)
• Transaction Support
• Standard connectivity via PostgreSQL compatible connectors: JDBC, ODBC, ADO.NET, libpq (psql)

GridSQL Architecture

GridSQL Configuration

• Can be configured for multiple logical “nodes” per physical server, to take advantage of multi-core processors

• Tables may be either replicated or partitioned
• Replicated tables for static lookup data or dimensions
• Partitioned tables for large fact tables
• Tables may also simultaneously use Constraint Exclusion Partitioning, creating subtables that fulfill constraints.
• Combining native Constraint Exclusion Partitioning with GridSQL Partitioning:
  – Large queries scan a much smaller subset of data by using subtables
  – Since each subtable is also partitioned across nodes, they are scanned in parallel
  – Queries execute much faster
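
A toy model may help show how the two kinds of partitioning combine. In this sketch, rows are spread across nodes by a modulo on a key (standing in for GridSQL partitioning) and grouped into one subtable per month on each node (standing in for Constraint Exclusion Partitioning). The column names, `NUM_NODES`, and all functions are hypothetical:

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical grid size


def place(row):
    """Return (node, subtable) for a fact row."""
    node = row["order_id"] % NUM_NODES  # partitioning across nodes
    subtable = row["order_date"][:7]    # one subtable per month, e.g. "2010-01"
    return node, subtable


def load(rows):
    grid = defaultdict(lambda: defaultdict(list))
    for row in rows:
        node, subtable = place(row)
        grid[node][subtable].append(row)
    return grid


def scan_month(grid, month):
    """Each node scans only the matching subtable; other months are skipped."""
    return [row for node in grid.values() for row in node.get(month, [])]
```

A query restricted to one month touches only that month's subtable on every node, and since the per-node scans are independent they could run in parallel.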

The Metadata Database

● Contains schema information including table partitioning and replication

● DDL issued to GridSQL is recorded in the metadata database

● SQL requests made to GridSQL interrogate the metadata database for partitioning and replication information to parallelize the query plan

Metadata tables shown: xsystables, xsyscolumns, xsysindexes, xsystabspaces, xsysconstraints, xsysindexkeys, xsysviews

Central Coordinator

• Multi-threaded process running on designated node that manages and coordinates work between the nodes

• Makes use of metadata information

• Performs traditional DBMS functions and manages interactions with the node agents
  – Parsing and optimizing
  – Scheduling and execution
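
The coordinator's role can be pictured as a scatter-gather step: hand each node a partial aggregate over its own partition, then combine the partial results. The sketch below computes an average this way. Threads stand in for node agents, and the function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor


def node_partial_sum(rows):
    """Work done on one node: aggregate its own partition only."""
    return sum(rows), len(rows)


def coordinator_average(partitions):
    """Fan the partial aggregate out to every node, then combine the results."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(node_partial_sum, partitions))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else None


print(coordinator_average([[1, 2], [3], [4, 5, 6]]))
```

Only the small (sum, count) pairs travel back to the coordinator, not the rows themselves, so the final combining step stays cheap however large the partitions are.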

Data Distribution

• Inserted Data Distributed for Partitioned Tables

[Diagram: inserted rows 1 through 6 distributed across Node 1, Node 2, Node 3, and Node 4]

The GridSQL Database

• GridSQL views its otherwise independent nodes together as one large single “virtual” database

• All separate instances of the underlying EnterpriseDB Advanced Server databases running on the individual nodes of the GridSQL are catalogued in the metadata database

• All interaction with the database is done through GridSQL, and not directly with the nodes

• Tables are designated as being either partitioned, replicated, or on a single node

Scalability Summary

• Native community PostgreSQL is limited to one server

• Some components may help scalability, like pgbouncer

• Postgres Plus Advanced Server's Infinite Cache feature can provide a big performance improvement for read-only and read-mostly applications, even on a single server, thanks to compression

• GridSQL allows for scale-out across multiple nodes for Data Warehousing and reporting type of applications, leveraging parallelism to significantly reduce response times for large queries

Thank You

Mason Sharp