PostgreSQL as an Alternative to MSSQL

Alexei Krasner Nov 2015 PostgreSQL as MSSQL Alternative


Page 1: PostgreSQL as an Alternative to MSSQL

Alexei Krasner, Nov 2015

PostgreSQL as MSSQL Alternative

Page 2: PostgreSQL as an Alternative to MSSQL

What is PostgreSQL
▪ Powerful, open source object-relational database system.
▪ More than 15 years of active development and a strong reputation.
▪ Runs on all major operating systems (Linux, Unix, Mac OS, Windows…).
▪ Enterprise-class database.
▪ Large and responsive community.
▪ Winner of the 2015 Database Trends and Applications Readers' Choice:
  – The most advanced open source database.
  – Best relational database.

Page 3: PostgreSQL as an Alternative to MSSQL

Let's Start With Standards
▪ Fully ACID compliant.
▪ Includes most of the SQL:2008 data types, along with storage of binary objects.
▪ Conforms to the ANSI-SQL:2008 standard:
  – Full support for subqueries (including sub-selects).
  – Read Committed and Serializable transaction isolation levels.
  – Full support for primary keys, foreign keys, joins, views, triggers, stored procedures, constraints (check, unique and not null) and cascading updates and deletes.
  – Fully relational system catalog – multiple schemas per database.
▪ Native programming interfaces: Java, .NET, C/C++, Perl, Python, ODBC

Page 4: PostgreSQL as an Alternative to MSSQL

Continue With a Little Splurging
▪ Multi-Version Concurrency Control (MVCC).
▪ Asynchronous Replication, Load Balancing and Online/Hot Backups with Point in Time Recovery.
▪ Write Ahead Logging – fault tolerance.
▪ Performance:
  – Sophisticated Query Planner/Optimizer.
  – Compound, Unique, Partial and Functional indexes.
▪ Supports:
  – International character sets, multi-byte encodings, Unicode, locale awareness.
  – Built-in Types – Geospatial, XML, JSON/JSONB, Ranges and Arrays!
  – NoSQL – Key-Value store with incredible performance and Full Text Search.
▪ Highly customizable and extensible.

Page 5: PostgreSQL as an Alternative to MSSQL

Before We Dive – Generalized Search Tree (GiST)
▪ Advanced indexing system that brings together different sorting and searching algorithms:
  – B-tree, B+-tree, R-tree, partial sum trees, ranked B+-trees etc.
  – API for creating custom data types and extensible query methods for searching them.
▪ Decide WHAT to persist, HOW to persist it and how to SEARCH for it.
▪ Enables search methods beyond those offered by standard B-tree and R-tree algorithms.
▪ Foundation for many public projects – OpenFTS and PostGIS

Page 6: PostgreSQL as an Alternative to MSSQL

Features Deep Dive

▪ MVCC
▪ Partitioning
▪ Useful Data Types
  – Date and Time
  – Interval
  – Array
  – Ranges
  – JSON
  – HSTORE
  – XML
▪ PostGIS – Geographic
▪ Full Text Search
▪ Server Side Programming
▪ Backup and Restore
▪ High Availability, Load Balancing and Replication
  – Sharding
▪ Big Data Readiness

Page 7: PostgreSQL as an Alternative to MSSQL

Multi Version Concurrency Control - MVCC
▪ Reads should never block writes and vice versa.
▪ Each transaction sees a snapshot (version) of the data.
  – Protects against viewing inconsistent data – transaction isolation.
▪ Avoids explicit locking solutions – minimizes lock contention.
▪ Table/row level locking is still available – although proper MVCC usage generally performs better.
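The snapshot idea on this slide can be sketched with a toy Python model. This is only an illustration, not PostgreSQL's actual implementation: the field names xmin and xmax are borrowed from PostgreSQL's row system columns, but the visibility rule is heavily simplified, and a "snapshot" is assumed here to be nothing more than the set of transaction ids that had committed when the reader started.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                   # id of the transaction that created this version
    xmax: Optional[int] = None  # id of the transaction that replaced it, if any


def visible(version, committed):
    """Simplified rule: the creator is committed in the snapshot, the replacer is not."""
    created = version.xmin in committed
    replaced = version.xmax is not None and version.xmax in committed
    return created and not replaced


def read(versions, committed):
    """Return the value of the one version this snapshot can see, or None."""
    for version in versions:
        if visible(version, committed):
            return version.value
    return None


# Transaction 1 inserted the row and committed.
versions = [RowVersion("balance=100", xmin=1)]

# A reader starts now; its snapshot contains only transaction 1.
reader_snapshot = frozenset({1})

# Transaction 2 updates the row: the old version is marked, a new one is added.
# Nothing is overwritten, so the reader is never blocked.
versions[0].xmax = 2
versions.append(RowVersion("balance=80", xmin=2))

old_view = read(versions, reader_snapshot)    # reader still sees its snapshot
new_view = read(versions, frozenset({1, 2}))  # a later snapshot sees the update
```

The reader that started before the update keeps seeing "balance=100", while a snapshot taken after transaction 2 commits sees "balance=80". Keeping both versions side by side is what lets reads and writes proceed without blocking each other.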

Page 8: PostgreSQL as an Alternative to MSSQL

Partitioning – Table Inheritance
▪ Basic table partitioning is supported via the table inheritance concept.
  – Includes the known partitioning benefits:
    ▪ Improved query performance under heavy load (when queries hit a single partition).
    ▪ Sequential scan of a partition instead of index usage.
    ▪ Bulk loads and deletes accomplished by adding or removing partitions.
    ▪ Infrequently used data can be migrated to cheaper/slower storage.
  – Range Partitioning:
    ▪ Table partitioned into "ranges" defined by a key column or set of columns (e.g. dates).
  – List Partitioning:
    ▪ Table partitioned by explicitly listing which key values belong in each partition.
  – Up to about a hundred partitions is acceptable; thousands of partitions will severely harm performance.
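The range-partitioning benefit above (a query touches only the partitions that can hold matching rows) can be sketched in Python. This is a toy model under stated assumptions, not how PostgreSQL implements inheritance-based partitioning: the class RangePartitionedTable and its methods are hypothetical names, each partition is a plain list, and the last partition is assumed to be unbounded above.

```python
from bisect import bisect_left, bisect_right
from datetime import date


class RangePartitionedTable:
    def __init__(self, lower_bounds):
        # Partition i holds keys in [lower_bounds[i], lower_bounds[i + 1]).
        self.lower_bounds = sorted(lower_bounds)
        self.partitions = [[] for _ in self.lower_bounds]

    def insert(self, key, row):
        index = bisect_right(self.lower_bounds, key) - 1
        if index < 0:
            raise ValueError(f"no partition accepts key {key}")
        self.partitions[index].append((key, row))

    def scan(self, lo, hi):
        """Return rows with lo <= key < hi and the partitions that were visited."""
        first = max(bisect_right(self.lower_bounds, lo) - 1, 0)
        last = bisect_left(self.lower_bounds, hi) - 1
        visited = list(range(first, last + 1))
        rows = [row
                for index in visited
                for key, row in self.partitions[index]
                if lo <= key < hi]
        return rows, visited


# One partition per quarter of 2015.
orders = RangePartitionedTable(
    [date(2015, 1, 1), date(2015, 4, 1), date(2015, 7, 1), date(2015, 10, 1)]
)
orders.insert(date(2015, 2, 14), "order-1")
orders.insert(date(2015, 5, 3), "order-2")
orders.insert(date(2015, 11, 20), "order-3")

# A query for the second quarter only visits the second partition.
rows, visited = orders.scan(date(2015, 4, 1), date(2015, 7, 1))
```

Dropping a whole quarter would amount to discarding one list, which mirrors the "bulk deletes by removing partitions" point on the slide.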

Page 9: PostgreSQL as an Alternative to MSSQL

Useful Data Types
▪ Date and Time – Date, Time, Timestamp and Timestamp with time zone.
  – Converted to and from Unix time.
  – Supports the INTERVAL type.
  – Very convenient casting and conversion to text.
  – Efficient searching and sorting (including time zone/offset).
▪ INTERVAL – representation of a period of time.
  – Negative interval values are possible (e.g. a year ago).
  – Intuitive arithmetic and persistence of time durations.
  – Easy casting and conversion to relevant types.
  – Efficient searching and sorting on intervals.

Page 10: PostgreSQL as an Alternative to MSSQL

Useful Data Types Cont.
▪ Array – supported as a first-class datatype (an actual field in a table).
  – Can contain any datatype (including sub-arrays).
  – Can be passed to functions as parameters.
  – Uses: function results, aggregations, passing arrays of data to and from the application.
▪ Range – supported as a first-class datatype.
  – Store a range of timestamps, dates, integers or numerics as a single data value.
  – Dedicated indexes can support queries on ranges.
  – Custom range types can be defined.
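As a rough illustration of what "a range as a single data value" buys you, here is a small Python sketch. It only mimics the semantics of a half-open range with containment and overlap checks (the operations PostgreSQL exposes for range types through its @> and && operators); the Range class itself is a hypothetical stand-in and does not model empty or unbounded ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lower: int  # inclusive
    upper: int  # exclusive

    def __post_init__(self):
        # This toy does not model empty ranges, so reject them outright.
        if self.lower >= self.upper:
            raise ValueError("lower must be strictly below upper")

    def contains(self, point):
        return self.lower <= point < self.upper

    def overlaps(self, other):
        return self.lower < other.upper and other.lower < self.upper


# Meeting-room bookings expressed as hour ranges.
morning = Range(10, 12)
noon = Range(12, 14)
late_morning = Range(11, 13)

back_to_back = morning.overlaps(noon)       # adjacent, so no conflict
conflict = morning.overlaps(late_morning)   # 11:00 to 12:00 is double-booked
```

Because the upper bound is exclusive, back-to-back bookings do not count as overlapping, which is usually the behavior wanted for scheduling data.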

Page 11: PostgreSQL as an Alternative to MSSQL

Useful Data Types Cont.
▪ JSON – full support along with a large dedicated set of utility functions.
  – Known JSON/JSONB benefits – data transfer and integration standard.
  – Transformation from/to types and tables.
  – Retrieval and construction of JSON data.
  – Parsing, casting and conversion.
▪ HSTORE – fast key-value store as a datatype.
  – NoSQL capabilities – flexibility of a schema-less data store.
  – Still ACID compliant.
  – Interchange data between JSON and HSTORE.

Page 12: PostgreSQL as an Alternative to MSSQL

Useful Data Types Cont.
▪ XML – supported as a first-class datatype.
  – Checks for well-formedness; type-safe operations.
  – Querying using XPath.
  – Producing XML content, predicates, processing, mapping tables to XML etc.

Page 13: PostgreSQL as an Alternative to MSSQL

PostGIS
▪ Fully featured, reliable geospatial database project based on GiST (follows ISO and OGC standards).
▪ SQL types and functions to manage vector geometries (spatial data).
▪ Capabilities:
  – Support for three-dimensional data.
  – Support for geospatial formats (KML, GeoJSON).
  – Processing and analytics functions for vector and raster data.
  – Map "rastering" and geo queries.
  – Geo searches and reverse geo searches.
▪ Hugely popular and well-respected extension module – often compared to ArcGIS.

Page 14: PostgreSQL as an Alternative to MSSQL

Full Text Search
▪ Online indexing of data and relevance ranking for database searches.
▪ Good Enough:
  – Stemming
  – Ranking
  – Multilingual
  – Fuzzy searches (misspellings) and accent handling.
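To show how stemming and ranking fit together, here is a deliberately tiny Python sketch of an inverted index. It is not PostgreSQL's full text search: the stem function is a naive suffix stripper chosen for this example (real stemmers are far more careful), TextIndex is a hypothetical name, and the rank is simply the number of matching term occurrences.

```python
import re
from collections import Counter


def stem(word):
    # Naive suffix stripping, for illustration only.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word


def terms_of(text):
    return [stem(word) for word in re.findall(r"[a-z]+", text.lower())]


class TextIndex:
    def __init__(self):
        self.postings = {}  # term -> Counter mapping doc_id to occurrences

    def add(self, doc_id, text):
        for term in terms_of(text):
            self.postings.setdefault(term, Counter())[doc_id] += 1

    def search(self, query):
        """Return (doc_id, score) pairs for documents containing every query term."""
        terms = terms_of(query)
        if not terms:
            return []
        matching = set(self.postings.get(terms[0], ()))
        for term in terms[1:]:
            matching &= set(self.postings.get(term, ()))
        scored = [(doc_id, sum(self.postings[term][doc_id] for term in terms))
                  for doc_id in matching]
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


index = TextIndex()
index.add(1, "Searching indexes speeds up searches")
index.add(2, "An index on text")
index.add(3, "Backup and restore")

both_terms = index.search("search index")
one_term = index.search("indexing")
```

The query "indexing" finds documents that only contain "index" or "indexes", because both the documents and the query are reduced to the same stem before matching.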

Page 15: PostgreSQL as an Alternative to MSSQL

Server Side Programming
▪ Super extensible – functions, data types, procedural languages, operators, aggregates etc.
  – Functions and stored procedures written in procedural languages:
  – PL/pgSQL, PL/Tcl, PL/Perl, PL/Python
▪ Triggers – on tables, views and foreign tables.
▪ Event Triggers – database-wide triggers.
▪ Rule System – query rewriting based on defined rules.

Page 16: PostgreSQL as an Alternative to MSSQL

Backup and Restore
▪ Extremely flexible dump utility – migration, replication and backups become more reliable, controllable and configurable.
  – Compressed format or plain SQL (human readable).
  – Single table or whole database cluster.
▪ Approaches:
  – SQL Dump – a file of generated SQL commands. On restore, the backed-up commands are replayed.
  – File system level backup – a direct copy of the PostgreSQL data files. Restoring means putting the data files back in place.
  – Continuous archiving – backing up the Write Ahead Log (WAL) files. On restore, the logged changes are replayed.

Page 17: PostgreSQL as an Alternative to MSSQL

High Availability, Load Balancing and Replication

Feature | Shared Disk Failover | File System Replication | Transaction Log Shipping | Trigger-Based Master-Standby Replication | Statement-Based Replication Middleware | Asynchronous Multimaster Replication | Synchronous Multimaster Replication
Most Common Implementation | NAS | DRBD | Streaming Repl. | Slony | pgpool-II | Bucardo | -
Communication Method | shared disk | disk blocks | WAL | table rows | SQL | table rows | table rows and row locks
No special hardware required | - | X | X | X | X | X | X
Allows multiple master servers | - | - | - | - | X | X | X
No master server overhead | X | - | X | - | X | - | -
No waiting for multiple servers | X | - | with sync off | X | - | X | -
Master failure will never lose data | X | X | with sync on | - | X | - | X
Standby accept read-only queries | - | - | with hot standby | X | X | X | X
Per-table granularity | - | - | - | X | - | X | X
No conflict resolution necessary | X | X | X | X | - | - | X

Page 18: PostgreSQL as an Alternative to MSSQL

Sharding and Replication
▪ Pure Sharding:
  – pg_shard – popular sharding extension for PostgreSQL.
    ▪ Runs on Linux!
  – BDR/UDR Project – Bi-Directional Replication, which adds multi-master replication to PostgreSQL.
    ▪ Runs on Linux! A Windows port is not expected in the near future.
    ▪ Forked from the main PostgreSQL source.
  – Postgres-XL – all-purpose, fully ACID, open source scale-out database solution.
    ▪ Runs on Linux!
    ▪ Forked from the main PostgreSQL source.
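The core idea behind sharding, independent of any of the projects listed above, is that a row's key decides which server stores it. A minimal Python sketch of hash-based placement follows. It is a generic illustration and does not reproduce the internals of pg_shard, BDR or Postgres-XL; shard_for is a hypothetical helper and the choice of SHA-1 is only an assumption made to get a stable hash.

```python
import hashlib
from collections import Counter


def shard_for(key, shard_count):
    """Map a key to a shard number in [0, shard_count) using a stable hash."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Spread 1000 hypothetical customer ids over 4 shards and count the placement.
placement = Counter(shard_for(f"customer-{n}", 4) for n in range(1000))
```

The same key always lands on the same shard, so a lookup by key needs to contact only one server. The trade-off is that changing shard_count moves most keys, which is why real systems layer more bookkeeping on top of this idea.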

Page 19: PostgreSQL as an Alternative to MSSQL

Sharding and Replication Cont.
▪ Via Replication:
  – Hot Standby – offloads read queries from the master to the slaves (horizontal scale).
  – Streaming replication (or Bucardo, or another option) to the slaves.
  – Load balancing: "write" queries go to the master, "read" queries go to the slaves.
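The read/write split described above can be sketched as a small router in Python. This is a simplified illustration, not the logic of pgpool-II or any other middleware: QueryRouter is a hypothetical class, the server names are placeholders, and a statement is treated as a read only if it starts with SELECT.

```python
from itertools import cycle


class QueryRouter:
    def __init__(self, master, standbys):
        self.master = master
        # Round-robin over the standbys; fall back to the master if there are none.
        self._standbys = cycle(standbys) if standbys else None

    def route(self, sql):
        """Return the server that should run this statement."""
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._standbys is not None:
            return next(self._standbys)
        # Anything that is not clearly a read goes to the master.
        return self.master


router = QueryRouter("master", ["standby1", "standby2"])
targets = [
    router.route("SELECT * FROM orders"),
    router.route("select count(*) from orders"),
    router.route("UPDATE orders SET paid = true"),
    router.route("SELECT 1"),
]
```

A production router has to handle more cases than this sketch does, for example SELECT ... FOR UPDATE, functions with side effects, and reads that must not see stale data because of replication lag.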

Page 20: PostgreSQL as an Alternative to MSSQL

PostgreSQL and Big Data
▪ PostgreSQL was used for large data volumes and complex analytics a decade before Hadoop launched (as the only purely open source option).
▪ Today it is heavily used in mid-sized warehouses and data marts (1-10 TB).
▪ Code base for many big data systems:
  – Netezza (IBM).
  – Greenplum (Pivotal) – open source massively parallel data warehouse.
  – PipelineDB – open source, runs SQL queries continuously on streaming data.
  – EnterpriseDB and CitusDB (commercial license) – fully scaled-out Postgres.
  – Redshift (Amazon).
▪ The PostgreSQL project continuously provides new features and better performance to support big data usage.

Page 21: PostgreSQL as an Alternative to MSSQL

PostgreSQL and Big Data – Features
▪ Serious NoSQL database competitor.
  – Advanced JSON/JSONB features and an extensive ongoing development plan.
  – Extensions that provide NoSQL-like APIs.
▪ Faster Sorts – text and long numeric sorting improvements.
▪ TABLESAMPLE – returns a pseudo-random sample of rows, giving a quick glimpse of the data for further analysis.
▪ Cubes, Rollups and Grouping Sets – summarizing and exploring huge data sets in the OLAP way.
▪ BRIN indexes – much faster; suited to terabyte-sized tables on fields with incrementally increasing values (like timestamps or integers).
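Why BRIN suits incrementally increasing fields can be shown with a toy block range index in Python. This is a sketch of the general min/max summary idea only, not PostgreSQL's on-disk format: BlockRangeIndex is a hypothetical class, the "table" is a plain list, and a block is just a fixed-size slice of it.

```python
class BlockRangeIndex:
    def __init__(self, values, block_size):
        self.values = values
        self.block_size = block_size
        # One small (min, max) summary per block instead of one entry per row.
        self.summaries = []
        for start in range(0, len(values), block_size):
            block = values[start:start + block_size]
            self.summaries.append((min(block), max(block)))

    def search(self, lo, hi):
        """Return values with lo <= v <= hi and how many blocks had to be scanned."""
        matches, scanned = [], 0
        for number, (block_min, block_max) in enumerate(self.summaries):
            if block_max < lo or block_min > hi:
                continue  # the summary proves this block cannot match
            scanned += 1
            start = number * self.block_size
            block = self.values[start:start + self.block_size]
            matches.extend(v for v in block if lo <= v <= hi)
        return matches, scanned


# Values that grow with insertion order, like a timestamp or a serial id.
index = BlockRangeIndex(list(range(1000)), block_size=100)
matches, scanned = index.search(250, 260)
```

With ordered data only one of the ten blocks needs to be read for this query. If the same values were stored in random order, nearly every block's summary would span the whole value range and the index would skip almost nothing.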

Page 22: PostgreSQL as an Alternative to MSSQL

PostgreSQL and Big Data – Features Cont.
▪ Foreign Data Wrappers – link external data so it can be queried like local tables, enabling hybrid solutions.
  – Foreign schema import.
  – JOIN pushdowns.
▪ Vacuum (garbage collection of deleted rows) – can now run in parallel in multi-process mode (maintaining several large tables at once).
▪ Scaling UP – multicore scalability improvements.

Page 23: PostgreSQL as an Alternative to MSSQL

Enterprise Wise

▪ Open Source
▪ Reliability
▪ Authentication
▪ Logging
▪ Documentation
▪ Support
▪ Maintenance

Page 24: PostgreSQL as an Alternative to MSSQL

Open Source
▪ Available under an open source license – the PostgreSQL License.
▪ Free to use, modify and distribute in any form, open or closed source.
▪ Can be extended and patched per project, client etc.
▪ Variety of modules, extensions and tools made possible by its open source license.

Page 25: PostgreSQL as an Alternative to MSSQL

Reliability
▪ PostgreSQL is relatively bug-free (compared to MSSQL).
▪ Very large community reporting bugs and providing fixes and workarounds.
▪ Constantly growing community.

Page 26: PostgreSQL as an Alternative to MSSQL

Authentication
▪ Trust Authentication.
▪ Password Authentication.
▪ GSSAPI/SSPI Authentication – using Kerberos.
▪ Ident Authentication.
▪ Peer Authentication.
▪ LDAP Authentication.
▪ RADIUS Authentication.
▪ Certificate Authentication.
▪ Pluggable Authentication Modules (PAM).

Page 27: PostgreSQL as an Alternative to MSSQL

Logging
▪ Logs in one place.
  – Unlike MSSQL – error logs, event log, profiler log, agent log…
▪ Easily configurable logging level.
▪ Easily redirected to CSV files and loaded into tables.
▪ Easily redirected to the system log or the Windows Event Log.
▪ Logs are human readable and of great value to sysadmins.

Page 28: PostgreSQL as an Alternative to MSSQL

Documentation
▪ There is nothing more to add than a link: http://www.postgresql.org/docs/

Page 29: PostgreSQL as an Alternative to MSSQL

Support
▪ Community-based support – and it appears to be fast, too.
▪ Numerous companies specialize in enterprise support: http://www.postgresql.org/support/professional_support/
▪ Enterprise database management companies such as EnterpriseDB.
▪ Total Cost of Ownership is significantly lower even with enterprise support (based on reports, e.g. Gartner 2015).

Page 30: PostgreSQL as an Alternative to MSSQL

vs. MySQL

▪ Fully ACID compliant!
▪ Subqueries and Joins.
▪ Better locking mechanism.
▪ JSON/JSONB support.
▪ NoSQL and Key-Value store.
▪ Advanced GIS abilities.
▪ Full Text Search abilities.
▪ Advanced and attractive data types.
▪ Far better and more useful extensibility patterns.
▪ Licensing issues.

Page 31: PostgreSQL as an Alternative to MSSQL

vs. PostgreSQL

▪ Partitioning based on table inheritance (pros and cons).

▪ Can be overkill for simple read-heavy workloads (improved in newer versions).

▪ Replication and clustering (especially multi-master): not "there" yet, but on the right track.

▪ Popularity – not as popular as MySQL (for example), but constantly gaining popularity, unlike MySQL.

▪ Expertise issues – different syntax and administration (compared to MSSQL).

Page 32: PostgreSQL as an Alternative to MSSQL

THANK YOU