EDB WHITE PAPER
Multitenancy Options in Postgres
EnterpriseDB | www.enterprisedb.com
Functionality, Isolation and Limitations
By: Matthew Lewandowski, Senior Sales Engineer
CONTENTS
Multitenancy in Databases
Postgres Architecture Overview
Postgres Multitenancy Options
Multiple Databases in a Single Cluster
Multiple Schemas in a Single Database
Single Schema in a Single Database
Other Options
Summary
EnterpriseDB, EDB and EDB Postgres are trademarks of EnterpriseDB Corporation.
Other names may be trademarks of their respective owners. Copyright© 2019. All rights reserved. 20190903
Multitenancy in Databases
Typically, multitenancy refers to a
software application that serves multiple
distinct groups of users sharing a single
instance of running software. From a
database perspective, this means that a
single instance of a database is used by
multiple applications. If you have a use for
such a multitenant database deployment,
this document will show you the different
ways that Postgres can help you achieve
a multitenant database architecture
and what the differences are in terms of
functionality and the level of separation
between different tenants. To be able to
explain this topic fully, we’ll start with a
description of the Postgres architecture
and relevant components.
Postgres Architecture Overview
There are two main components that
make up a Postgres installation: the
actual database software and one or
more database instances. Figure 1 below
shows a Postgres installation.
Figure 1: Components of a Postgres Installation
The Postgres database software
installation contains the binaries, or
executables, and related software
libraries for running Postgres database
instances and interacting with them. In
addition to the executable programs and
related libraries, a Postgres installation
includes header files, extensions,
additional server plug-in modules, and
sample scripts and files.
Postgres refers to database instances as
database clusters because each cluster/
instance can contain multiple, logically
separate databases. At the physical
layer, a Postgres database cluster
consists of configuration files, data
files, and log files. A running Postgres
database cluster includes a single
top-level “postmaster” process with several
child processes that perform various
functions related to running, maintaining,
connecting to, and interacting with the
instance. A running Postgres instance
is configured to listen for and accept
connections on a specific network port
on one or more of the IP addresses
configured on its host server.
At the top level of its functional
hierarchy, a Postgres cluster contains
a set of roles (users and groups), one
or more databases, and one or more
tablespaces. Roles are authorized
(assigned functional rights) to the
tablespaces, databases and the objects
within those databases. Tablespaces
are used as an organizing construct
for specifying where objects within the
databases are stored on the file system.
Clients connect to specific databases
within a Postgres cluster by specifying
the host and port that the cluster is
running on, the name of the database
within the cluster, the user for the
connection (along with any required user
credentials), and optional parameters to
be used for the connection as desired.
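As an illustration of the connection details listed above, the following minimal Python sketch assembles them into a libpq-style connection URI. The function name `build_connection_uri` and the host, database, and role names are hypothetical and chosen only for this example; the sketch builds the string and does not open a connection.

```python
from urllib.parse import quote, urlencode


def build_connection_uri(host, port, dbname, user, password=None, **params):
    """Build a libpq-style connection URI from the values a client supplies."""
    credentials = quote(user, safe="")
    if password is not None:
        # Percent-encode the password so characters such as "@" stay unambiguous.
        credentials += ":" + quote(password, safe="")
    uri = f"postgresql://{credentials}@{host}:{port}/{quote(dbname, safe='')}"
    if params:
        # Optional connection parameters travel in the query string.
        uri += "?" + urlencode(params)
    return uri


# Hypothetical tenant application connecting to its own database in the cluster.
uri = build_connection_uri(
    "db.example.com", 5432, "tenant_a_db", "tenant_a_app",
    sslmode="require", application_name="tenant-a-app",
)
print(uri)
```

Because the database name is part of the connection target, a client that needs data from a second database in the same cluster would build a second URI and open a second connection.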
Each database within a Postgres cluster
contains one or more schemas, as well
as catalogs and any installed extensions.
Schemas serve as namespaces under
which the various database objects
such as tables, views, sequences,
and functions are contained. At a
minimum, each database will have a
public schema, but other schemas can
be created as needed. Unlike some
other relational database management
systems (RDBMS), in Postgres schemas
are independent of users. Although each
schema is owned by a particular user,
not every user will necessarily have
their own schema. Users and groups are
granted usage and other permissions
for schemas and the objects within the
schemas.
Users are typically granted access to
specific schemas; however, that may not
always be the case, depending on the
privileges on the schemas and the objects
within them that have been granted to the
connected user. In some cases, any of the
objects within a particular database can be
accessed via a single database connection.
Access to information in other Postgres
databases within the cluster requires a
separate connection for each database.
However, through the use of database links
or foreign tables that establish a connection
to another database “under the covers”,
it is possible to access the information
in multiple databases via a single user
session.
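One way to set up the foreign-table route described above is the `postgres_fdw` extension that ships with Postgres. The sketch below generates the SQL statements involved; it is a hedged illustration rather than a complete setup. The helper names (`ident`, `foreign_schema_statements`) and all server, role, and schema names are hypothetical, the password option of the user mapping is deliberately left out, and the quoted option values are assumed to contain no single quotes.

```python
import re

_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def ident(name):
    """Accept only simple lowercase identifiers so they can be embedded unquoted."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def foreign_schema_statements(server, host, port, dbname,
                              local_role, remote_user,
                              remote_schema, local_schema):
    """Return SQL that exposes a schema of another database as foreign tables."""
    server, local_role = ident(server), ident(local_role)
    remote_schema, local_schema = ident(remote_schema), ident(local_schema)
    return [
        "CREATE EXTENSION IF NOT EXISTS postgres_fdw;",
        f"CREATE SERVER {server} FOREIGN DATA WRAPPER postgres_fdw "
        f"OPTIONS (host '{host}', port '{port}', dbname '{dbname}');",
        # A password option is usually required as well; omitted in this sketch.
        f"CREATE USER MAPPING FOR {local_role} SERVER {server} "
        f"OPTIONS (user '{remote_user}');",
        f"CREATE SCHEMA IF NOT EXISTS {local_schema};",
        f"IMPORT FOREIGN SCHEMA {remote_schema} "
        f"FROM SERVER {server} INTO {local_schema};",
    ]


stmts = foreign_schema_statements(
    "db2_link", "localhost", 5432, "db2",
    "reporting", "reporting_ro", "public", "db2_public",
)
print("\n".join(stmts))
```

Run in the first database, statements like these let a session there query the second database's tables through the local `db2_public` schema, with the extra connection made on the session's behalf.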
Figure 3 depicts some example
user access scenarios within a Postgres
database cluster. The intent is not to show
examples of all possible access privileges,
but rather to reinforce the concept that
users (and groups) are global objects within
a Postgres database cluster that can be
assigned access to information in one or
more databases in the cluster. The figure
also shows that users (and groups) can
be assigned access to different levels of
information within the object hierarchy of a
given Postgres database.
Figure 2 shows the object hierarchy within a Postgres database.
Figure 2
Figure 3: Example User Access Scenarios for Postgres Database Cluster
In Figure 3 above, User 1, User 2, the
Schema N Owner User, and the DB1
Owner User have only been given access
to objects in Database 1 whereas User 5,
User 6, and the DB2 Owner User only have
access to objects in Database 2. Superuser
1 and User 3 have access to objects in both
databases. Superuser 1, the DB1 Owner
User, and the DB2 Owner User are users
who have access to all objects within a
database. User 1, User 2, User 3, User 6,
and the Schema N Owner User are users
that have access to all objects only within a
specific schema. Finally, User 3 and User 5
are users who only have access to specific
tables within a schema.
In addition to being able to control access to
a particular schema and a particular set of
tables and views within a schema, Postgres
also provides the ability to restrict a user’s
access to a specific set of information within
the underlying tables using views, row-level
security policies, or a combination of both.
Figure 4 provides a conceptual example of restricting information within a table for
different users and applications using these features.
Figure 4
Postgres Multitenancy Options
This overview of the Postgres architecture
should make it easier to explain and
understand the multitenancy options
available in Postgres. From a database
perspective, multitenancy means that a
single instance of a database running on a
server is used by multiple client applications.
With Postgres there are three main
approaches that support multitenancy using
this definition: (1) multiple databases in a
single Postgres cluster (i.e., instance), (2)
multiple schemas in a single database within
a cluster, and (3) different or shared tables in
a single schema in a single database within
a cluster.
Figure 5 shows these options.
Figure 5
Each option has its own benefits and
limitations. Organizations should
choose an option based on the specific
multitenancy requirements for their
application or system. When deciding
which approach is most appropriate for
a given environment, note that it is also
possible to combine features from the
different options as part of a hybrid
solution.
Let’s take a closer look at each of the
three main multitenancy options in the
following sections.
1. Multiple Databases in a Single Cluster
The first option in Postgres is to create a
separate database for each application
within a single cluster. This option
provides a high level of data isolation
between applications. Data stored
in each of the databases cannot
be accessed without establishing
separate connections to each specific
database. Postgres provides additional
connection control features that allow
administrators to further limit who can
access a specific database, which
network locations can access a specific
database, and which authentication
methods are allowed for a specific
database. Finally, most Postgres system
tables are private within a database; only
a few, such as the lists of roles and
databases, are shared across the cluster.
For multitenancy, this means that one
application would not be able to view
information in another application’s
system tables.
Since Postgres extensions and
plug-in modules are installed at the
database level, each application
can use a different set of extensions
to suit its needs without impacting
the other tenant applications. Also,
since each application has its own
database, the database schema
(i.e., metadata) used by each tenant
application can be easily customized
or even be completely different. Using
independent databases for each tenant
also means that application developers
do not need to worry about building
features into the application to only
retrieve data for a specific tenant
and to protect one tenant’s data from
leakage to another tenant, simplifying
development.
Since the databases are contained in
a single Postgres cluster, this option
supports shared administration and
resource usage across the different
tenants, which is often a key motivator
for deciding to use a multitenant
architecture. At the same time, this
option provides some flexibility to
separately configure the different
databases, as many standard
Postgres configuration parameters
can be set on a per-database basis.
In addition, the EnterpriseDB Postgres
Advanced Server (EPAS) edition
includes a resource manager feature
that can be used to define resource
groups that can be assigned at the
database level. Finally, by including
the database name in all log entries,
administrators and auditors can easily
differentiate one tenant application’s
specific database activity and errors
from another. The EnterpriseDB audit
feature available in EPAS provides
the ability to configure different audit
settings per database.
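The database-per-tenant controls described in this section can be sketched with a few generated statements. The following Python example is a minimal illustration under stated assumptions: the helper names and the tenant database, role, and network names are hypothetical. It emits the SQL for creating a tenant database, limiting who may connect to it, and applying a per-database setting, plus one `pg_hba.conf` rule restricting the network location and authentication method.

```python
import re

_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def ident(name):
    """Accept only simple lowercase identifiers so they can be embedded unquoted."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def tenant_database_statements(dbname, owner, app_role, settings=None):
    """SQL for one tenant database: creation, connect privileges, settings."""
    db = ident(dbname)
    stmts = [
        f"CREATE DATABASE {db} OWNER {ident(owner)};",
        # By default every role may connect; remove that, then allow one role.
        f"REVOKE CONNECT ON DATABASE {db} FROM PUBLIC;",
        f"GRANT CONNECT ON DATABASE {db} TO {ident(app_role)};",
    ]
    for name, value in (settings or {}).items():
        # Per-database configuration parameters apply to new sessions in that database.
        stmts.append(f"ALTER DATABASE {db} SET {ident(name)} = '{value}';")
    return stmts


def hba_rule(dbname, role, cidr, method="scram-sha-256"):
    """One pg_hba.conf line: connection type, database, user, address, method."""
    return f"host {ident(dbname)} {ident(role)} {cidr} {method}"


stmts = tenant_database_statements(
    "tenant_a_db", "tenant_a_owner", "tenant_a_app",
    {"statement_timeout": "30s"},
)
rule = hba_rule("tenant_a_db", "tenant_a_app", "10.0.1.0/24")
print("\n".join(stmts))
print(rule)
```

For the logging point, a `log_line_prefix` value that contains the `%d` escape puts the database name into each log line; that parameter is set for the cluster as a whole rather than per tenant.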
Although using a separate database
for each application can be a good
option for some multitenant use cases,
it does present some challenges.
One of the major drawbacks of this
option is that it is more difficult to
aggregate information from multiple
tenants, should that need exist. To
query the contents of each database
a separate connection to each of
those databases is required. While it is
possible to query the contents of one
Postgres database from within another
Postgres database by using database
links or foreign tables, these are
additional components that would need
to be set up and managed. Another
drawback is that from an operations
and maintenance perspective it may
not scale well for a large number of
tenants. However, with the proper
monitoring and management tools,
like EnterpriseDB Postgres Enterprise
Manager (PEM), this concern can be
addressed.
Despite the drawbacks, using separate
databases for each application is a
good multitenant database approach
to take for certain use cases. If security
and data isolation are the most
important factors in deciding which
multitenancy approach to take, this is
an ideal option to choose.
2. Multiple Schemas in a Single Database
Another multitenancy option available
in Postgres is to create separate
schemas for each application within a
single database in a cluster. By using
schema-level privileges to restrict the
access of a tenant application to only
the application’s specific schema, this
option provides some level of data
isolation between each application,
though not to the same degree as using
separate databases.
A major benefit of using separate
schemas in a single database is that,
since the schemas are in the same
database, cross-schema queries are
possible without the use of database
links or foreign tables. This is useful if
application schemas need to share a
common set of non-application-specific
information. To secure application-
specific information, shared information
is often placed in its own separate
schema. One of the advantages
of using a multi-schema approach
for multitenancy with one or more
common schemas is that it reduces
duplicate information. Cross-schema
queries make aggregating information
from multiple applications easier for
business intelligence and decision
support purposes, effectively breaking
down the walls between information
silos.
Like the multi-database multitenancy
approach, a multi-schema multitenancy
approach makes it easier to customize
the tables and other database objects
to suit a particular application’s needs.
Also, like the multi-database approach,
using independent schemas for each
tenant also means that application
developers do not need to worry about
building special filtering and data
leakage prevention features into their
applications. In addition, if the database
components of a tenant application
need to be copied or moved, the
Postgres dump and restore facilities
provide the ability to do so on a per-
schema basis. EDB Postgres provides
a clone schema feature to support
this use case as well. A multi-schema
approach also provides the benefit of
shared administration of and shared
resource usage by tenants.
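The schema-per-tenant arrangement discussed in this section can be illustrated with a short Python sketch that generates the statements for one tenant. It assumes that the tenant role and a common schema named `shared` already exist; these names, like the helper functions themselves, are hypothetical. The second helper builds a `pg_dump` command line that copies out a single tenant schema.

```python
import re

_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def ident(name):
    """Accept only simple lowercase identifiers so they can be embedded unquoted."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def tenant_schema_statements(schema, tenant_role, shared_schema="shared"):
    """SQL giving a tenant its own schema plus read access to a common schema."""
    s, r, shared = ident(schema), ident(tenant_role), ident(shared_schema)
    return [
        # The tenant role owns its schema; other roles get no access by default.
        f"CREATE SCHEMA {s} AUTHORIZATION {r};",
        f"GRANT USAGE ON SCHEMA {shared} TO {r};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {shared} TO {r};",
        # Unqualified table names resolve to the tenant schema first.
        f"ALTER ROLE {r} SET search_path = {s}, {shared};",
    ]


def dump_schema_command(dbname, schema, outfile):
    """Argument list for dumping one tenant schema with pg_dump."""
    return ["pg_dump", f"--schema={ident(schema)}", f"--file={outfile}", dbname]


stmts = tenant_schema_statements("tenant_a", "tenant_a_app")
cmd = dump_schema_command("appdb", "tenant_a", "tenant_a.sql")
print("\n".join(stmts))
print(" ".join(cmd))
```

With a layout like this, a query that joins a tenant table to a table in the common schema runs over a single connection, which is the cross-schema capability described above.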
The multi-schema approach to
multitenancy does have some
weaknesses under certain use cases.
For instance, it does not offer as high a
level of data isolation and configuration
control as using separate databases.
Since all application schemas are in
the same database, information in the
system tables is not private to any
one application. Also, in a schema-
per-tenant model, if one tenant runs
some resource-intensive process it could
impact the other tenants. If there is a large
number of tenant schemas, each with
a large number of tables, there may be
additional challenges in managing system
performance, as Postgres vacuuming
operations would need to search across
and be performed against a higher
number of relations (i.e., tables). Finally,
from an application upgrade perspective,
as the number of tenant schemas
increases, additional table definitions and
other metadata may need to be updated,
which could affect the time required to roll
out application changes.
Although there are situations where
a schema-per-tenant approach is not
ideal, it is a good option for many use
cases as it provides a good mix of data
isolation, cross-container query capability,
application schema customizability,
shared administration, and shared
resource usage. This option is well-suited
for a moderate number of tenants with a
moderate set of tables in each application
schema. Serving the middle ground, this
option is often chosen as a starting point
when requirements and long-term needs
do not clearly point to using one of the
other options.
3. Single Schema in a Single Database
The third available multitenancy option
is to have all tenants share a single
schema in a single database within
a cluster. Using this approach, each
tenant typically shares a common set
of tables with a column in the tables
for identifying each tenant. Postgres
features such as row-level security and
security barrier views are then used to
restrict an application’s access to only a
specific application tenant at any given
time. Note that it would also be possible
for different tenant applications to use
different sets of tables within a single
schema. However, in most cases there
would be little value in doing so over
having each tenant use separate
schemas, as this would present
unnecessary object naming challenges.
A main benefit of a single-schema,
shared-tables approach is that only
a single set of tables needs to be
maintained, thus enabling a high
degree of shared administration and
shared resource usage. This simplifies
and often reduces the time required
to roll out table and other database
object definition updates that are
part of an application upgrade. Also,
since the data for all tenants is stored
in a single set of tables, querying
and aggregating data from multiple
tenants is much simpler. Finally, this
multitenancy approach also scales
very well for applications that may end
up having many thousands of tenants,
especially if the core application uses
a relatively small number of tables
and each tenant stores a relatively
small amount of data.
Using a shared set of tables presents
some challenges that need to be
understood and potentially overcome
before deciding if it is the right option.
First of all, since the data for all tenants
is stored in a shared set of tables, if
data isolation is required then extra
steps need to be taken to ensure that
one tenant’s data is not accessible
to or leaked to another tenant,
potentially leveraging features such
as row-level security. Similarly, if the
application needs to be able to copy
or move a tenant to another server,
features would need to be built into
the application to support this. Next,
per-tenant customizations may be
difficult to implement and often result
in unconventional or non-standard
constructs, which may be difficult to
maintain in the long run. Also, since
each of the tenants shares a common
set of table definitions, tenant-specific
application updates may be impossible.
Finally, storing a large amount of data
from a large number of tenants in a
single table may impact performance
and require additional monitoring and
tuning. Similarly, if a single tenant
stores and updates a large amount of
data in a shared table it could impact
the performance of all the tenants: the
“noisy neighbor” effect.
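As a hedged sketch of the extra isolation steps mentioned in this section, the Python example below generates the SQL for a row-level security policy on a shared table. It assumes an integer tenant column named `tenant_id` and a custom session setting named `app.current_tenant` that the application sets after connecting; both names, and the helper functions, are hypothetical choices for this illustration.

```python
import re

_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def ident(name):
    """Accept only simple lowercase identifiers so they can be embedded unquoted."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def tenant_rls_statements(table, tenant_column="tenant_id",
                          setting="app.current_tenant"):
    """SQL that limits each session to the rows of its current tenant."""
    t, col = ident(table), ident(tenant_column)
    return [
        f"ALTER TABLE {t} ENABLE ROW LEVEL SECURITY;",
        # FORCE applies the policy to the table owner as well; superusers and
        # roles with BYPASSRLS remain exempt.
        f"ALTER TABLE {t} FORCE ROW LEVEL SECURITY;",
        f"CREATE POLICY {t}_tenant_isolation ON {t} "
        f"USING ({col} = current_setting('{setting}')::integer);",
    ]


def set_tenant_statement(tenant_id, setting="app.current_tenant"):
    """Statement the application runs to select the tenant for a session."""
    return f"SET {setting} = '{int(tenant_id)}';"


statements = tenant_rls_statements("orders")
print("\n".join(statements))
print(set_tenant_statement(42))
```

Once the policy is in place, a plain `SELECT * FROM orders` returns only the current tenant's rows, so the filter does not depend on every query carrying its own `WHERE tenant_id = ...` clause.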
Under the right use cases, the single-
schema, shared-tables multitenancy
option can be a good choice, despite
some challenges. This option is
especially well-suited for many SaaS
applications that may scale to many
thousands of tenants. For these
applications, shared administration
and resource usage are often the most
important factors. This option doesn’t
allow the underlying data model to be
customized without impacting other
tenant applications; however, the built-
in NoSQL capabilities of Postgres can
be used to work around this limitation if
needed.
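One reading of the NoSQL workaround mentioned above is to keep tenant-specific attributes in a `jsonb` column next to the shared relational columns. The sketch below assumes that interpretation. The table name `orders`, its `id` column, the column name `custom_fields`, and the helper functions are hypothetical and serve only to show the shape of the statements.

```python
import json
import re

_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def ident(name):
    """Accept only simple lowercase identifiers so they can be embedded unquoted."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def custom_fields_statements(table, column="custom_fields"):
    """SQL adding a jsonb column (and a GIN index) for per-tenant attributes."""
    t, c = ident(table), ident(column)
    return [
        f"ALTER TABLE {t} ADD COLUMN {c} jsonb NOT NULL DEFAULT '{{}}'::jsonb;",
        f"CREATE INDEX {t}_{c}_gin ON {t} USING GIN ({c});",
    ]


def custom_field_query(table, field, column="custom_fields"):
    """Query that reads one tenant-specific attribute as text."""
    t, c, f = ident(table), ident(column), ident(field)
    return f"SELECT id, {c}->>'{f}' AS {f} FROM {t};"


ddl = custom_fields_statements("orders")
query = custom_field_query("orders", "po_number")
# A tenant-specific document as it might be stored in the new column.
document = json.dumps({"po_number": "A-17", "rush": True})
print("\n".join(ddl))
print(query)
print(document)
```

The shared table definition stays the same for every tenant, while each tenant can store attributes that other tenants do not use.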
Other Options
The three options discussed so far
are the Postgres multitenancy options
that most closely match the common
understanding of multitenancy—that
the application serves multiple tenants
using a single instance of the software
running on a single server. However,
there are other options that do not
exactly fit within this definition, but are
nevertheless worth considering as they
may meet other needs. For example,
for some environments it might be
appropriate to run different Postgres
clusters on the same server for different
applications. Deploying and running
Postgres via Docker containers is
another option that may be suitable
for many modern applications. A brief
description of both of these options
follows.
Multiple Clusters on the Same Server
As discussed previously, it is possible
to run multiple Postgres clusters on
a single server. Each instance could
then be used to support one or more
applications. Since multiple instances
of Postgres would be running, this
option does not conform to the
conventional definition of multitenancy.
However, it does provide many of the
same benefits as the single-instance,
multiple-database approach as well as
some additional ones. Figure 6 shows
multitenancy using multiple Postgres
clusters.
Figure 6
Using multiple clusters provides a
greater level of data isolation and
security than the other previously
discussed options. One of the major
benefits of having a dedicated
Postgres cluster for an application
is that point-in-time recovery (PITR)
operations can be performed without
impacting other applications. Also,
like the multi-database and multi-
schema approaches, a multi-cluster
approach provides the ability for
each application to easily tailor table
definitions and other metadata to suit
its specific needs.
Although one of the goals of
multitenancy is to promote shared
administration and shared resource
usage, for some use cases this may not
be as important, or even desired. Each
cluster runs its own set of processes
and has its own set of configuration
parameters and database users. As
such, in a multi-cluster environment
most of the administrative activities will
be cluster-specific. This can benefit
organizations who have a need for
isolated administration and isolated
resource usage.
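To make the multi-cluster layout more concrete, the following Python sketch builds the standard `initdb` and `pg_ctl` command lines for clusters that share one Postgres installation but use separate data directories and ports. The directory paths, port numbers, and the helper name `cluster_commands` are assumptions made for this example; the sketch only constructs the commands and does not run them.

```python
import shlex


def cluster_commands(data_dir, port):
    """Commands to initialize and start one cluster on its own port."""
    return [
        # Each cluster gets its own data directory and configuration files.
        ["initdb", "-D", data_dir],
        # -o passes server options; here it sets the cluster's listening port.
        ["pg_ctl", "-D", data_dir, "-o", f"-p {port}",
         "-l", f"{data_dir}/server.log", "start"],
    ]


# Two hypothetical clusters on the same server.
clusters = {"/srv/pg/tenant_a": 5433, "/srv/pg/tenant_b": 5434}
commands = [cmd for d, p in clusters.items() for cmd in cluster_commands(d, p)]
for cmd in commands:
    print(shlex.join(cmd))
```

Because every cluster has its own port, roles, and configuration files, an application reaches its cluster by port, and settings changed for one cluster leave the others untouched.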
With a proper set of tools, the
challenges of managing multiple
Postgres clusters can be reduced.
For example, EnterpriseDB has tools
like Postgres Enterprise Manager
(PEM), which provides a single
“pane of glass” interface that can
be used to monitor and manage
multiple Postgres clusters across an
organization’s enterprise. In addition,
EnterpriseDB’s tools for managing
backup and recovery, EDB Postgres
Backup and Recovery Tool (BART), and
high-availability configurations, EDB
Postgres Enterprise Failover Manager
(EFM), support multiple clusters.
Although each cluster would mostly be
administered separately, the clusters
would all be using the same Postgres
installation. Therefore, activities such
as applying patches or minor version
updates would apply to every cluster.
Depending on requirements, running
separate clusters on the same server
for different applications might be the
best choice. It is a good candidate for
consideration if data, administration,
and resource isolation is required
or acceptable. Prior to choosing
this option, be sure to understand
the server resources required for
each cluster to ensure that they are
sufficient.
Deployment via Docker Containers
More and more organizations are
beginning to use Docker containers
for deploying their applications and
databases. Containers offer increased
portability, simple and fast deployment,
enhanced productivity, and improved
security. They are a key technology in
today’s modern microservices-based
applications. Since containers have
a lightweight footprint and minimal
overhead, they make it possible to
deploy multiple containers running
Postgres on a single machine. It is
worth considering their use in support of
some multitenancy use cases. As part
of a multitenancy solution, a separate
Postgres container is often used for
each application.
Each running Postgres container
contains an installation of the Postgres
software, at least one Postgres cluster,
and the processes corresponding to
the cluster running in the container. In
production deployments, the cluster
data directory is normally mapped
to a storage volume attached to the
host. In addition, an orchestration
framework is usually used to run and
manage containers in production
environments. Kubernetes-based
orchestration frameworks are the most
common. Figure 7 shows a conceptual
example of using containers as part of
a multitenancy solution.
Figure 7
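The container-per-application layout described above can be sketched as a `docker run` command for each tenant. This example assumes the community `postgres` image from Docker Hub (tag `16`), not the EnterpriseDB containers discussed in this paper. The container names, volume names, and host ports are likewise hypothetical, and the database password is expected to come from the caller's environment.

```python
def docker_run_command(tenant, host_port, image="postgres:16"):
    """Argument list that starts one Postgres container for one tenant."""
    return [
        "docker", "run", "--detach",
        "--name", f"pg-{tenant}",
        # Passing only the variable name forwards its value from the host environment.
        "--env", "POSTGRES_PASSWORD",
        # Each tenant's cluster is published on its own host port.
        "--publish", f"{host_port}:5432",
        # A named volume keeps the cluster data directory outside the container.
        "--volume", f"pgdata-{tenant}:/var/lib/postgresql/data",
        image,
    ]


command = docker_run_command("tenant-a", 5433)
print(" ".join(command))
```

In production, an orchestration framework such as Kubernetes would normally take the place of hand-run commands like this one.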
As part of a multitenancy strategy,
the use of Postgres containers offers
the same benefits as a traditional
multi-cluster deployment, plus some
additional ones. For one, they allow
for an even higher degree of security,
connection control, and data and
process isolation, which is more along
the lines of running Postgres clusters
on separate machines. Running
Postgres in containers also makes
it easier to use different versions of
Postgres for different applications. Due
to the inherent nature of containers,
it is also much easier and faster to
spin up new database instances.
Not only does this make adding
new applications easier and faster,
but it also makes scaling up (and
down) database instances to support
changing usage requirements easier
and faster.
Organizations with a multitenancy
need that have embraced the use
of containers for their application
deployments may want to consider
the use of containers for their
databases as well. Since there are
some additional considerations when
running databases in containers,
most organizations would benefit
from working with vendors such as
EnterpriseDB, who have expertise
in deploying Postgres in containers
and related technologies such as
Kubernetes. EnterpriseDB not only
makes preconfigured containers
with different versions of Postgres
available, but also provides containers
that support Postgres high-availability
deployments, monitoring and
management, backup and recovery,
and load balancing. EnterpriseDB
containers have been designed to run
standalone or in Kubernetes-based
environments such as Google’s GKE
and Red Hat’s OpenShift. In addition,
the EnterpriseDB Professional Services
team can help an organization with
their container-based deployments of
Postgres.
Summary
A high-level overview of the Postgres
architecture helps provide some contextual
understanding of the multitenancy options
available with Postgres. There are three main
Postgres multitenancy options:
• Using multiple databases in a single
Postgres cluster (i.e., instance)
• Using multiple schemas in a single
Postgres database
• Using shared tables in a single
schema in a single Postgres database
Each option has its own strengths
and weaknesses. Other options worth
considering are using multiple Postgres
clusters and deploying Postgres via Docker
containers.
When deciding among Postgres
multitenancy options, you should consider
system requirements and expected
usage. Depending on application needs,
you can also create a hybrid solution
combining elements from the different
options. EnterpriseDB, a company
providing Postgres-related support,
products, and services, has expertise in
deploying and maintaining Postgres and
can review an organization’s database
needs and help with deciding the right
database multitenancy strategy for an
environment.
DESIGN YOUR DATABASE ARCHITECTURE WITH A TRUSTED PARTNER
About EnterpriseDB
EnterpriseDB (EDB), the Enterprise Postgres company, delivers an open source-based data management
platform based on PostgreSQL, optimized for greater scalability, security, and reliability. EDB Postgres
makes organizations smarter while reducing risk and complexity with enterprise-proven management tools,
security enhancements and Oracle compatibility. Over 4,000 customers worldwide deploy diverse workloads
including transaction processing, data warehousing, customer analytics and web-based applications, both
on-premises and in the cloud.
EDB is an innovator and major contributor to the Postgres community, serving 20% of the Fortune 500
and 15% of Forbes Global 2000 companies worldwide.
EDB is based in Bedford, Massachusetts, with offices around the globe.