Datenbanksysteme II (Database Systems II)
Web-scale Data Management: Multi-Tenancy & SaaS
28.1.2010, Felix Naumann
Web-scale Data Management (WDM)
Big Data: PBs of data, 10^2 to 10^5 nodes
■ Operational: high qps, few rows/op (e.g., BigTable, Dynamo, PNUTS)
■ Analytic: low qps, billions of rows/op (e.g., MapReduce, Hadoop, Dryad)
■ 20 PB processed every day at Google (2008)
■ Trillions of rows, hundreds of columns/table
■ Structured data, text, images, video
□ 15h of video uploaded to YouTube every minute
■ Data is partitioned, computation is distributed
□ Reading 20PB would take 12 years at 50MB/s
[http://btw2009.uni-muenster.de/oe20/cms/media/melnik_BTW09_keynote.swf]
Felix Naumann | VL Datenbanksysteme II | Winter 2009/2010
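The 12-year figure above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a single sequential reader; the 10,000-node case is an illustrative assumption picked from the slide's 10^2 to 10^5 node range, not a number from the lecture.

```python
# Back-of-the-envelope check; decimal units are an assumption.
PB = 10**15
MB = 10**6
SECONDS_PER_YEAR = 365 * 24 * 3600


def sequential_read_years(total_bytes, bytes_per_second):
    """Years needed for one reader to scan total_bytes at the given rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


years_single = sequential_read_years(20 * PB, 50 * MB)

# If the data is partitioned over n nodes that read in parallel,
# the scan time divides by n (ignoring coordination overhead).
nodes = 10_000  # hypothetical cluster size
hours_partitioned = years_single * SECONDS_PER_YEAR / nodes / 3600

print(f"{years_single:.1f} years on one reader, "
      f"{hours_partitioned:.1f} hours on {nodes} nodes")
```

The point of the calculation is the one the slide makes: at this scale, partitioning the data and distributing the computation is a necessity, not an optimization.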
Key enabler: Virtualization
Big Data: PBs of data, 10^2 to 10^5 nodes
■ Operational: high qps, few rows/op (e.g., BigTable, Dynamo, PNUTS)
■ Analytic: low qps, billions of rows/op (e.g., MapReduce, Hadoop, Dryad)
Virtualization (Scalability)
■ Multi-Tenancy: Map N logical systems into 1 physical system
■ Load Balancing: Map 1 logical system into N physical systems
Platform development
Sources for slides (heavily used)
■ Dean Jacobs (SAP): Implementing Software as a Service, BTW 2009 Tutorial
■ Dean Jacobs (SAP) & Stefan Aulbach (TUM): Ruminations on Multi-Tenant Databases, BTW 2007
■ Alfons Kemper (TUM): Database Technology for SaaS, Memorial Symposium for Klaus Dittrich 2008
■ Burt Kaliski (EMC): Multi-Tenant Cloud Computing: From Cruise Liners to Container Ships, 3rd Asia-Pacific Trusted Infrastructure Technologies Conference (APTC 2008)
Overview
■ Everything as a Service: The Cloud
■ Software as a Service
■ Scale up, scale out, scale in: Multi-Tenancy
■ Multi-Tenancy Database Enhancements
□ Schema mappings
□ Chunk Tables
■ Summary & Outlook
Cloudy weather
What is the Cloud?
■ Cloud computing is Internet-based development and use of computer technology. In concept, it is a paradigm shift whereby details are abstracted from the users who no longer need[says who?] knowledge of, expertise in, or control over[dubious – discuss] the technology infrastructure "in the cloud" that supports them[clarification needed]. Cloud computing describes a new supplement, consumption, and delivery model for IT services based on Internet, and it typically involves the provision of dynamically scalable and often virtualized resources as a service over the Internet.
http://en.wikipedia.org/wiki/Cloud_computing
■ The interesting thing about cloud computing is that we've redefined cloud computing to include everything that we already do. I can't think of anything that isn't cloud computing with all of these announcements. The computer industry is the only industry that is more fashion-driven than women's fashion. Maybe I'm an idiot, but I have no idea what anyone is talking about. What is it? It's complete gibberish. It's insane. When is this idiocy going to stop?
Larry Ellison
Jokes with Clouds
W3QS paper (Konopnicki & Shmueli 1995)
EaaS – Everything as a Service
[Figure: stack of three service layers, ranging from niche to breadth]
■ SaaS: Software as a Service (Application)
■ PaaS: Platform as a Service (Platform)
■ IaaS: Infrastructure as a Service (Infrastructure)
IaaS, PaaS, SaaS – Cloudification
Service Value to Users
■ SaaS: Salesforce.com, Google Docs, email services
■ PaaS: Microsoft Azure, Google App Engine, force.com
■ IaaS: Amazon Elastic Compute Cloud (EC2)
Why will the cloud always win (over local computing)?
■ Photon over electron
■ … fiberglass cable
What is SaaS? by Dean Jacobs
SaaS: Software as a Service
■ Service provider hosts an application that multiple customers access over the Internet
□ Sales, marketing, support, HR, payroll, planning, manufacturing, inventory, financials, purchasing
■ Leverage economy of scale to reduce the total cost of ownership (TCO) of the application
□ Capital expenditures – hardware, software
□ Operational expenditures – bandwidth, personnel
■ Particularly appealing for small- to medium-sized businesses that do not have a complex data center
by Dean Jacobs
Modern Distributed IT Architecture
[Figure: many services running on shared layers]
■ Services
■ Virtualization – Dynamic acquisition and release of computing resources
■ Application Infrastructure – Application servers and databases
■ Hardware – Massive, geographically-distributed farms of commodity components
■ The system is shared by many services and many customers
■ Services may be used to implement other services
■ Solution vendors utilize various upper and lower interfaces
□ Salesforce: Rents data centers, provides CRM software
□ Google: Owns data centers, many basic services, AppEngine
□ Amazon: Owns data centers, storage and compute services
by Dean Jacobs
Two Types of Costs
■ Capital Expenditures (CapEx)
□ Cost of acquiring or upgrading physical assets, such as equipment, property, software, or buildings
■ Operational Expenditures (OpEx)
□ Costs for the day-to-day running of a business, including salaries, rent, and utilities
[Figure: CapEx and OpEx together make up the Total Cost of Ownership]
by Dean Jacobs
Two Cost-Reduction Techniques
1. Operational Automation (see lecture on Load Balancing)
□ Automatically acquire and release computing resources
□ Automatically provision, configure, and tune systems
□ Automatically detect and recover from failures
□ Requires that operational decision-making be simple
□ Requires a small number of subsystems with simple interactions
2. Multi-Tenancy
□ Consolidate multiple tenants into the same process
□ Worth the effort only if enough tenants fit on the given hardware
□ Reduces CapEx because resource utilization is increased
□ Reduces OpEx because there are fewer processes to manage
□ Question: Where is Multi-Tenancy feasible & viable?
by Dean Jacobs
Features versus Costs
■ Fundamental trade-off in software design: more features result in higher costs
□ Document search requires additional servers for indexing and queries
□ Reporting increases the load on the database
□ End-user extension of the base application complicates upgrades
□ Disaster recovery requires a remote data center
□ …
[Figure: development priorities (Add Features, Decrease CapEx, Decrease OpEx) ranked differently for On-Premises Software and for Software as a Service]
by Dean Jacobs
Application Complexity – what works for SaaS?
■ Applications vary in complexity from Basic to Advanced
(Column groups on the slide: Capabilities, Costs)
Application Type | Business Processes | Configuration and Extension | Transactional Guarantees | Resource Usage | Effort to Operate | Scalability
(Too) Basic: Point Solution | Simple | Minimal, Self-Service | Weaker | Lower, Uniform | Lower | Higher
(Too) Advanced: Integrated Suite | Complex | Comprehensive, Consultants | Stronger | Higher, Diverse | Higher | Lower
■ The sweet spot for SaaS appears to be applications of moderate complexity: Economy of scale works best here.
□ German terms for economy of scale: Degressionsgewinn (degression gain), Großbetriebsvorteil (advantage of large-scale operation), Rationalisierungseffekt (rationalization effect), Wirtschaftlichkeit durch Massenproduktion (efficiency through mass production)
by Dean Jacobs
Customer Preferences
2007 Software Revenues
Application | Overall Market Size | SaaS Market Size | SaaS Share
Supplier Relationship M. | $ 3,632 | $ 455 | 13 %
Customer Relationship M. | $ 10,516 | $ 1,207 | 11 %
Human Capital M. | $ 6,569 | $ 710 | 11 %
Business Intelligence | $ 6,501 | $ 455 | 7 %
Governance Risk Compliance | $ 1,709 | $ 97 | 6 %
Supply Chain M. | $ 3,973 | $ 215 | 5 %
Product Lifecycle M. | $ 6,082 | $ 166 | 3 %
Enterprise Resource Planning | $ 11,757 | $ 212 | 2 %
[Annotations: rows range from Basic Services (top) to Advanced Services (bottom); "The Sweet Spot" is marked in the table]
■ One reason for this distribution is that SaaS has a higher percentage of small businesses, which generally automate only more basic processes
by Dean Jacobs
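The SaaS Share column is simply SaaS market size divided by overall market size. The following sketch recomputes it from the two revenue columns as printed on the slide; the dictionary layout and function name are illustrative choices, and the revenue unit is left unspecified because the slide does not state it.

```python
# (overall market size, SaaS market size) per application, as on the slide.
revenues_2007 = {
    "Supplier Relationship M.": (3632, 455),
    "Customer Relationship M.": (10516, 1207),
    "Human Capital M.": (6569, 710),
    "Business Intelligence": (6501, 455),
    "Governance Risk Compliance": (1709, 97),
    "Supply Chain M.": (3973, 215),
    "Product Lifecycle M.": (6082, 166),
    "Enterprise Resource Planning": (11757, 212),
}


def saas_share_percent(overall, saas):
    """SaaS revenue as a percentage of the overall market."""
    return 100 * saas / overall


shares = {app: round(saas_share_percent(*sizes))
          for app, sizes in revenues_2007.items()}

for app, share in shares.items():
    print(f"{app:30s} {share:3d} %")
```

The recomputed percentages decline from the basic applications at the top of the list to ERP at the bottom, which is the pattern the slide attributes to the small-business customer base of SaaS.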
Service Provider Preferences
[Figure: Cost Per User (Low to High) plotted against Application Complexity (Low to High) for On-premise and SaaS; applications along the complexity axis: Email, Collaboration, CRM/SRM, HCM, ERP/FI; annotations: Economy of Scale, The Sweet Spot]
by Dean Jacobs
Multi-Tenancy in Practice
[Figure: # tenants per database, by Size of Machine (Blade to Big iron) and Complexity of Application (Low to High: Email, Proj Mgmt, CRM, ERP, Banking); values shown range over 10000, 1000, 100, 10, 1]
by Alfons Kemper
Disruptive Innovation?
■ Existing solutions target high end of the market
□ Feature-rich, hard to use, expensive
□ They over-serve the needs of many customers
□ Innovation, if any, consists of adding new features
■ Disruptive solutions target non-users and the low end
□ Feature-poor, easy to use, inexpensive
□ Technology that enables the disruption may be sophisticated
■ Once the disruptors gain a foothold, they gradually march up market to obtain higher margins
■ The high-end solution vendors are eventually squeezed out of the market because they are constitutionally unable to adapt to the new cost structure
by Dean Jacobs
Scale up, scale out, scale in
■ Scale up: Big iron
■ Scale out: Commodity hardware
■ "Scale in": Multiple apps / tenants / VMs on single machine
http://infolab.stanford.edu/pub/voy/museum/pictures/display/0-4-Google.htm
Scalability
■ Handle large data sets as well as many data sets
■ Two basic techniques
□ Scale up – Use a small number of large servers
□ Scale out – Use a large number of small servers
CRM Case Studies (October 2007)
Scale Up: Salesforce.com | Scale Out: RightNow
1.75 billion rows | 50 TB of data in DB+NFS
139,000 tenants (35,000 customers) | 3000 tenants (1800 customers)
8 Oracle RAC databases (17,000 tenants/instance) | 200 MySQL servers (1-100s tenants/instance)
170 million transactions per day | 17 million transactions per day
by Dean Jacobs
Advantages of Scale Out
■ More bang for the buck
□ Power consumption scales cubically with clock frequency
■ Facilitates all aspects of automation
■ Simplifies failure handling
□ If a server looks suspicious, just swap it out
□ Ensures individual failures affect fewer users
■ Enables incremental adjustments of capacity
■ Enables incremental rolling upgrades
□ Gain production experience with a small set of users
□ Eliminate down-time by using side-by-side systems
□ Immediately roll back if problems arise
by Dean Jacobs
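The "more bang for the buck" argument follows from the cubic power law stated above. A minimal sketch, assuming that throughput is proportional to clock frequency times the number of servers and that the workload parallelizes perfectly (both are simplifying assumptions, not claims from the lecture):

```python
def relative_power(freq_ratio, servers=1):
    """Power relative to one server at full clock speed, assuming P ~ f**3."""
    return servers * freq_ratio ** 3


def relative_throughput(freq_ratio, servers=1):
    """Throughput relative to one full-speed server (assumed linear in f)."""
    return servers * freq_ratio


# Two servers at half the clock frequency deliver the same throughput ...
same_throughput = relative_throughput(0.5, servers=2)
# ... at a quarter of the power.
power_needed = relative_power(0.5, servers=2)

print(same_throughput, power_needed)
```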
Disadvantages of Scale Out
■ Frequent load balancing and capacity adjustments are required to achieve good utilization
□ Otherwise over-provisioning is required to handle temporary load peaks
□ Must not shuffle around large amounts of data
□ Must be operationally simple
■ Large data sets get distributed across multiple servers
□ Read-queries can be efficiently processed only if the data is distributed so as to minimize inter-server communication
□ Write-queries require distributed transactions (not necessarily 2PC)
■ The CAP Theorem
□ Consistency of data, Availability of data, Partition-tolerance (ability to scale out)
by Dean Jacobs
"Scale in": Multi-Tenancy
■ Reminder
□ Capital Expenditures – cost of acquiring or upgrading physical assets such as equipment, property, or buildings
□ Operational Expenditures – costs for the day-to-day running of a business, including salaries, rent, and utilities
■ Reduce expenditures: Multi-Tenancy
□ Consolidate multiple tenants into the same process
□ Worth the effort only if enough tenants fit on the given hardware
□ Reduces CapEx because resource utilization is increased
□ Reduces OpEx because there are fewer processes to manage
Multi-tenancy – literally
■ Multiple clients hosted by one service provider
□ = multiple tenants hosted in one building complex
■ Code (to be executed)
□ = utilities (gas, electric, water, waste)
■ Data (with services)
□ = storage space (furniture, basement, garage)
■ Four models…
Single homes
■ Different spaces, same time
Private apartments
■ Different spaces, same time, shared services
Hotel rooms
■ Same space, different times
Youth hostel
■ Same space, same time
http://www.graubuenden.ch/sommerurlaub/unterkunft-angebote/unterkuenfte/jugendherbergen/
Cost & performance
(Columns presumably in the order of the four models: single home, private apartment, hotel room, youth hostel)
Cost per person per night? | 273 USD | 54 USD | 27 USD | 10 USD
Beds per m²? | 6/500 = 0.012 | 4/100 = 0.04 | 2/25 = 0.08 | 6/25 = 0.24
■ 273 USD results from 1,000,000 / 10 years / 365 days
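The numbers on this slide are simple ratios. The sketch below reproduces them; assigning the four columns to the four housing models in slide order is an assumption, as is reading the 1,000,000 as a purchase price in USD spread over 10 years of use.

```python
def cost_per_night(price_usd, years, persons=1):
    """Purchase price spread evenly over every night of the usage period."""
    return price_usd / years / 365 / persons


single_home = cost_per_night(1_000_000, years=10)

# (beds, floor area in m^2) per model; the model names are assumed labels.
beds_and_area = {
    "single home": (6, 500),
    "private apartment": (4, 100),
    "hotel room": (2, 25),
    "youth hostel": (6, 25),
}
beds_per_m2 = {model: beds / area
               for model, (beds, area) in beds_and_area.items()}

print(f"single home: {single_home:.2f} USD per night")
for model, density in beds_per_m2.items():
    print(f"{model:18s} {density:.3f} beds per m^2")
```

Read as a database analogy, the ratio rises by a factor of 20 from the first to the last column while the cost per person falls, which mirrors the consolidation argument for multi-tenancy.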
Isolation: Make other tenants invisible
■ High Fences
■ Strong Walls
■ Good Housekeeping
Make other tenants invisible?
■ But here?
Trust & Security
[Figure: the four housing models, labeled very high, high, medium, low]
Trust in Storage and Services
■ Data does not do anything
□ Creep under walls
□ Steal other data
□ Infect other data
■ Isolated services
■ Careful schema design
■ Careful query translation
Multi-Tenancy
■ Consolidate multiple businesses (tenants) onto the same operational system
■ Pool resources to improve their utilization
□ Avoid provisioning each tenant for their maximum load
□ Breaks down isolation: weakens security, increases resource contention, interferes with optimizations
■ Provide a tenant-aware administrative framework to improve management efficiency
□ Manage farms of individual multi-tenant servers
□ Support bulk operations such as rolling upgrade
□ Support tenant migration within and across farms
■ Here: Focus on schemas and queries
by Dean Jacobs & Stefan Aulbach
Multi-Tenant Databases
■ Assume the application has a base schema that may be extended by each tenant
□ New columns for existing tables and new tables
□ Common for enterprise applications like CRM and ERP
■ Pool database resources
□ Processes, memory, connections, prepared statements
□ Trade-offs against isolation
■ Provide a tenant-aware administrative framework
□ Manage farms of individual multi-tenant databases
□ Support DML and DDL operations across tenants
□ Support tenant migration between databases
by Dean Jacobs & Stefan Aulbach
Implementation Options
■ Shared Machine
■ Shared Process
■ Shared Table
[Figure: machine and database process; the options trade Isolation against Resource Pooling]
by Dean Jacobs & Stefan Aulbach
Shared Machine
■ Memory requirements for a database with one empty CRM schema instance
PostgreSQL | MaxDB | COTS 1 | COTS 2 | COTS 3
55 MB | 80 MB | 171 MB | 74 MB | 273 MB
■ Cannot scale beyond tens of tenants per server
■ Appropriate for applications with a smaller number of larger tenants, e.g., for banking
by Dean Jacobs & Stefan Aulbach
Shared Process
■ Memory requirements for a database with 10,000 empty CRM schema instances
Schema instances | PostgreSQL | MaxDB | COTS 1 | COTS 2 | COTS 3
10,000 | 79 MB | 80 MB | 616 MB | 2061 MB | 359 MB
1 (empty) | 55 MB | 80 MB | 171 MB | 74 MB | 273 MB
■ Should scale up to thousands of tenants
■ If each tenant gets their own table space, migration entails simply moving files
■ Connection pooling is possible, but then tenant identity must be managed by the application
by Dean Jacobs & Stefan Aulbach
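The two memory tables can be combined to estimate what one additional tenant costs under each option. This is a rough sketch, assuming 1 MB = 1024 KB, that the Shared Machine option runs one database per tenant with at least the single-instance footprint, and that memory grows linearly with the number of schema instances; none of these assumptions is stated on the slides.

```python
TENANTS = 10_000

# Memory in MB as given on the slides.
one_instance_mb = {"PostgreSQL": 55, "MaxDB": 80, "COTS 1": 171,
                   "COTS 2": 74, "COTS 3": 273}
ten_thousand_instances_mb = {"PostgreSQL": 79, "MaxDB": 80, "COTS 1": 616,
                             "COTS 2": 2061, "COTS 3": 359}


def shared_process_kb_per_tenant(system):
    """Extra memory per schema instance inside one shared database process."""
    extra_mb = ten_thousand_instances_mb[system] - one_instance_mb[system]
    return extra_mb * 1024 / TENANTS


def shared_machine_total_gb(system):
    """Memory if each of the tenants ran its own database on the machine."""
    return one_instance_mb[system] * TENANTS / 1024


for system in one_instance_mb:
    print(f"{system:10s} shared process: "
          f"{shared_process_kb_per_tenant(system):7.2f} KB/tenant, "
          f"shared machine: {shared_machine_total_gb(system):7.1f} GB total")
```

Under these assumptions the per-tenant overhead in a shared process is a few kilobytes to a few hundred kilobytes, while one database per tenant would need hundreds of gigabytes, which is consistent with "tens of tenants" versus "thousands of tenants" per server.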
Shared Table
■ Data from many tenants in the same tables
□ Add a tenant_id column
□ Tenant queries must fix the value for this column
◊ By connection or by application
■ Extend base schema using generic columns
□ May be varchar or a mix of types
□ The database must compactly represent sparse tables
■ Advantage: everything is pooled
□ Processes, memory, connections, prepared statements
□ Easy DML and DDL operations across tenants
□ Add, remove, and extend tenants with DML (not DDL)
■ Disadvantage: isolation is very weak
□ Irrelevant data infects query processing
◊ Optimization statistics
◊ Table scans
◊ Data locality
□ No indexes or integrity constraints on generic columns
□ Migration requires querying the operational system
by Dean Jacobs & Stefan Aulbach
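A shared table with a tenant_id column and generic extension columns can be sketched with SQLite from the Python standard library. The table layout, the column names ext1 and ext2, and the helper tenant_query are illustrative assumptions; the sample rows reuse the accounts from the later schema-mapping slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        tenant_id INTEGER NOT NULL,
        account   INTEGER,
        name      TEXT,
        ext1      TEXT,   -- generic extension columns (varchar-like)
        ext2      TEXT
    );
    INSERT INTO account VALUES
        (17, 1, 'Acme', 'St Mary', '135'),
        (17, 2, 'Gump', 'State',   '1042'),
        (35, 1, 'Ball', NULL,      NULL),
        (42, 1, 'Big',  '65',      NULL);
""")


def tenant_query(tenant_id, where_sql, params=()):
    """Run a query for one tenant; the tenant_id predicate is always added.

    where_sql is trusted application code in this sketch, never user input.
    """
    sql = f"SELECT name FROM account WHERE tenant_id = ? AND ({where_sql})"
    return [row[0] for row in conn.execute(sql, (tenant_id, *params))]


names_17 = tenant_query(17, "ext1 = ?", ("State",))

# Adding a tenant is plain DML: no new tables, no DDL.
conn.execute("INSERT INTO account VALUES (?, ?, ?, ?, ?)",
             (99, 1, "New", None, None))
names_99 = tenant_query(99, "1 = 1")
print(names_17, names_99)
```

The sketch also shows the weakness named on the slide: the value '65' in ext1 means "dealers" for tenant 42 and the string 'St Mary' means "hospital" for tenant 17, so the database cannot index or constrain the column meaningfully.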
Schema Flexibility Requirements for SaaS
■ Each tenant has a logical schema consisting of the base schema and a set of extensions
□ Extensions may be private or shared
■ The logical schemas from multiple tenants are mapped into one physical schema (multi-tenancy)
■ The logical schemas evolve while the database is on-line
□ Must not require intervention of a DBA
□ Must have minimal impact on performance
Logical Schemas
■ Tenant 1: B, E1
■ Tenant 2: B, E2, E3
■ Tenant 3: B, E1, E3
Physical Schema
■ B, E1, E2, E3
by Dean Jacobs
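In the example, the physical schema is the union of all logical schemas. A minimal sketch with Python sets; the extension name E4 and the function add_extension are hypothetical and only illustrate on-line evolution.

```python
# Logical schemas: base schema B plus per-tenant extensions, as in the figure.
logical_schemas = {
    "Tenant 1": {"B", "E1"},
    "Tenant 2": {"B", "E2", "E3"},
    "Tenant 3": {"B", "E1", "E3"},
}


def physical_schema(schemas):
    """Everything the shared physical schema must be able to represent."""
    return set().union(*schemas.values())


before = physical_schema(logical_schemas)


def add_extension(schemas, tenant, extension):
    """A tenant adds an extension while the system keeps running."""
    schemas[tenant].add(extension)


add_extension(logical_schemas, "Tenant 1", "E4")  # E4 is a made-up extension
after = physical_schema(logical_schemas)
print(sorted(before), sorted(after))
```

How the physical side absorbs such a change without a DBA is exactly what distinguishes the schema mapping techniques that follow.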
Schema Mapping Techniques
■ Database Owns the Schema
□ Evolution of logical schemas requires on-line DDL
1. Private Tables
2. Extension Tables
3. Sparse Columns
■ Application Owns the Schema
□ The application controls evolution of logical schemas
4. XML
5. Universal Tables
6. Pivot Tables
by Dean Jacobs
#1 Private Tables
■ Give each tenant their own private tables
□ SQL transformation: Renaming only
■ Great performance until the number of tables gets too high
□ Schema overhead: 4 KB/table * 100,000 tables = 400 MB
□ Index pages are only partly full and are hard to keep in memory
■ Used if the schema is small and there are few tenants
Account_17 (Healthcare Extension)
Account | Name | Hospital | Beds
1 | Acme | St Mary | 135
2 | Gump | State | 1042
Account_35
Account | Name
1 | Ball
Account_42 (Automotive Extension)
Account | Name | Dealers
1 | Big | 65
by Dean Jacobs
Private Tables – Query Transformation
■ Tenant 17
□ Logical query:
    SELECT Beds
    FROM Account
    WHERE Hospital = 'State'
□ Transformed query:
    SELECT Beds
    FROM Account_17
    WHERE Hospital = 'State'
□ How do we know this?
■ Tenant 42
□ Logical query:
    SELECT Name
    FROM Account
    WHERE Dealers > 50
□ Transformed query:
    SELECT Name
    FROM Account_42
    WHERE Dealers > 50
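The renaming step can be tried out with SQLite. This is a sketch, not a real query rewriter: the function to_private uses a regular expression on the table name, which would also rename a column called Account, so a production mapper would work on the parsed query instead. The set LOGICAL_TABLES and the function name are assumptions for illustration.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account_17 (Account INTEGER, Name TEXT,
                             Hospital TEXT, Beds INTEGER);
    INSERT INTO Account_17 VALUES (1, 'Acme', 'St Mary', 135),
                                  (2, 'Gump', 'State', 1042);
    CREATE TABLE Account_42 (Account INTEGER, Name TEXT, Dealers INTEGER);
    INSERT INTO Account_42 VALUES (1, 'Big', 65);
""")

LOGICAL_TABLES = {"Account"}  # tables of the base schema (assumed metadata)


def to_private(sql, tenant_id):
    """Rename every logical table to the tenant's private table."""
    pattern = r"\b(" + "|".join(sorted(LOGICAL_TABLES)) + r")\b"
    return re.sub(pattern, lambda m: f"{m.group(0)}_{tenant_id}", sql)


q17 = to_private("SELECT Beds FROM Account WHERE Hospital = 'State'", 17)
q42 = to_private("SELECT Name FROM Account WHERE Dealers > 50", 42)

beds_17 = conn.execute(q17).fetchall()
names_42 = conn.execute(q42).fetchall()
print(q17, beds_17)
print(q42, names_42)
```

The tenant identifier has to come from outside the query, for example from the session or connection, which is one possible answer to the question raised on the slide.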
#2 Extension Tables
■ Give each extension its own table
□ Add a Row_Id column and join to reassemble rows
□ Vertical partitioning (the columns of a logical table are split across tables)
■ Give all tables a Tenant_Id column and share tables
■ Additional join at runtime
■ Better consolidation than Private Table layout
□ But: Number of tables still grows in proportion to number of tenants
Account
Tenant | Row | Account | Name
17 | 0 | 1 | Acme
17 | 1 | 2 | Gump
35 | 0 | 1 | Ball
42 | 0 | 1 | Big
Account_HealthCare
Tenant | Row | Hospital | Beds
17 | 0 | St Mary | 135
17 | 1 | State | 1042
Account_Automotive
Tenant | Row | Dealers
42 | 0 | 65
by Dean Jacobs
Extension Tables – Query Transformation
■ Tenant 17
□ Logical query:
    SELECT Name, Beds
    FROM Account
    WHERE Hospital = 'State'
□ Transformed query:
    SELECT A.Name, H.Beds
    FROM Account A, Account_HealthCare H
    WHERE A.Tenant = 17
    AND H.Tenant = 17
    AND A.Row = H.Row
    AND H.Hospital = 'State'
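The transformed join can be run against the example data with SQLite. In this sketch the column name Row is written as a quoted identifier because ROW is a keyword in SQLite; the quoting is an adaptation for this engine and not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (Tenant INTEGER, "Row" INTEGER,
                          Account INTEGER, Name TEXT);
    INSERT INTO Account VALUES (17, 0, 1, 'Acme'), (17, 1, 2, 'Gump'),
                               (35, 0, 1, 'Ball'), (42, 0, 1, 'Big');
    CREATE TABLE Account_HealthCare (Tenant INTEGER, "Row" INTEGER,
                                     Hospital TEXT, Beds INTEGER);
    INSERT INTO Account_HealthCare VALUES (17, 0, 'St Mary', 135),
                                          (17, 1, 'State', 1042);
    CREATE TABLE Account_Automotive (Tenant INTEGER, "Row" INTEGER,
                                     Dealers INTEGER);
    INSERT INTO Account_Automotive VALUES (42, 0, 65);
""")

# Base table and extension table are joined on (Tenant, Row)
# to reassemble the tenant's logical rows.
TRANSFORMED = """
    SELECT A.Name, H.Beds
    FROM Account A, Account_HealthCare H
    WHERE A.Tenant = ?
      AND H.Tenant = ?
      AND A."Row" = H."Row"
      AND H.Hospital = ?
"""

result = conn.execute(TRANSFORMED, (17, 17, "State")).fetchall()
print(result)
```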
#3 Sparse Columns
■ Designed to handle data such as parts catalogs where each item has only a few out of thousands of possible attributes
■ Interpreted storage format to handle null values
□ Fields in a row are stored along with their column identifiers
□ Only available in Microsoft SQL Server (others?)
□ Limited number of sparse columns per table are permitted
■ Extension fields added as sparse columns to each table
■ Database owns the schema: evolution requires on-line DDL
Account
Tenant | Account | Name | SPARSE
17 | 1 | Acme | 0:St Mary, 1:135
17 | 2 | Gump | 0:State, 1:1042
35 | 1 | Ball |
42 | 1 | Big | 0:65
by Dean Jacobs
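The idea of an interpreted storage format, where only non-null fields are stored together with their column identifiers, can be illustrated in a few lines. This is a conceptual sketch and not how any particular database lays out its pages; the identifier assignment in COLUMN_IDS and both function names are assumptions.

```python
# Hypothetical column identifiers for the sparse extension columns.
COLUMN_IDS = {"Hospital": 0, "Beds": 1, "Dealers": 2}
COLUMN_NAMES = {cid: name for name, cid in COLUMN_IDS.items()}


def encode_sparse(fields):
    """Store only non-null fields, keyed by column identifier."""
    return {COLUMN_IDS[name]: value
            for name, value in fields.items() if value is not None}


def decode_sparse(stored):
    """Rebuild the full logical row; absent identifiers read as NULL."""
    return {name: stored.get(cid) for cid, name in COLUMN_NAMES.items()}


acme = encode_sparse({"Hospital": "St Mary", "Beds": 135, "Dealers": None})
ball = encode_sparse({"Hospital": None, "Beds": None, "Dealers": None})

print(acme)                 # only two of the three columns take up space
print(ball)                 # a row without extension values stores nothing
print(decode_sparse(acme))
```

Storage then grows with the number of values actually present rather than with the number of declared columns, which is what makes thousands of mostly empty extension columns affordable.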
Sparse Tables – Query Transformation
■ Table definition:
    CREATE TABLE Account (
        Tenant INT,
        Account INT,
        Name VARCHAR(100),
        Hospital VARCHAR(100) SPARSE,
        Beds INT SPARSE,
        Dealer INT SPARSE
    )
Account
Tenant | Account | Name | Sparse
17 | 1 | Acme | 0:St Mary, 1:135
17 | 2 | Gump | 0:State, 1:1042
35 | 1 | Ball |
42 | 1 | Big | 2:65
■ Tenant 17
□ Logical query:
    SELECT Name, Beds
    FROM Account
    WHERE Hospital = 'State'
□ Transformed query:
    SELECT Name, Beds
    FROM Account
    WHERE Tenant = 17
    AND Hospital = 'State'
■ Database owns the schema
#4 XML
■ Each base table is given an additional column that stores all extension fields in one (flat) XML document
■ Application owns the schema for extension fields so they can be evolved without on-line DDL
■ IBM's pureXML
Tenant | Account | Name | XMLData
17 | 1 | Acme | <data><hospital>St Mary</hospital><bed>135</bed></data>
17 | 2 | Gump | <data><hospital>State</hospital><bed>1024</bed></data>
35 | 1 | Ball | ---
42 | 1 | Big | <data><dealers>65</dealers></data>
by Dean Jacobs
XML Tables – Query Transformation
■ Tenant 17
□ Logical query:
    SELECT Name, Beds
    FROM Account
    WHERE Hospital = 'State'
□ Transformed query:
    SELECT Name, xml…([data/bed])
    FROM Account
    WHERE Tenant = 17
    AND xmlexists('$x[data/hospital="State"]'
        PASSING XMLData AS "x");
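The effect of the transformed query can be imitated outside the database with Python's standard XML parser. This sketch does not use pureXML or any SQL/XML function; it only shows the same filter-and-extract logic on the rows printed above, including the bed count exactly as it appears on the XML slides. The function name name_and_beds is illustrative.

```python
import xml.etree.ElementTree as ET

# (Tenant, Account, Name, XMLData) rows as printed on the slide.
rows = [
    (17, 1, "Acme",
     "<data><hospital>St Mary</hospital><bed>135</bed></data>"),
    (17, 2, "Gump",
     "<data><hospital>State</hospital><bed>1024</bed></data>"),
    (35, 1, "Ball", None),
    (42, 1, "Big", "<data><dealers>65</dealers></data>"),
]


def name_and_beds(rows, tenant, hospital):
    """Name and bed count of the tenant's accounts at the given hospital."""
    result = []
    for row_tenant, _account, name, xml_data in rows:
        if row_tenant != tenant or xml_data is None:
            continue
        doc = ET.fromstring(xml_data)          # root element is <data>
        if doc.findtext("hospital") == hospital:
            # The application owns the typing: the value is cast here.
            result.append((name, int(doc.findtext("bed"))))
    return result


print(name_and_beds(rows, 17, "State"))
```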
#5 Universal Tables
■ Pack data into wide tables with generic VARCHAR columns
□ Not type-safe: casting necessary
□ Very wide rows: many NULL values
□ No index support
■ Used if the schema is large or there are many tenants
■ salesforce.com does this and makes it work (fast) by rebuilding indexing and query optimization
Universe
Tenant | Table | Col1 | Col2 | Col3 | Col4 | … | Col500
17 | 0 | 1 | Acme | St Mary | 135 | | ---
17 | 0 | 2 | Gump | State | 1042 | | ---
35 | 1 | 1 | Ball | --- | --- | | ---
42 | 2 | 1 | Big | 65 | --- | | ---
by Dean Jacobs
Universal Tables – Query Transformation
■ Tenant 17
□ Logical query:
    SELECT Name, Beds
    FROM Account
    WHERE Hospital = 'State'
□ Transformed query:
    SELECT Col2, Col4
    FROM Universe
    WHERE Tenant = 17 AND Table = 0
    AND CAST(Col3 AS VARCHAR) = 'State'
□ How do we know this?
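The column mapping has to come from metadata that the application keeps per tenant. The sketch below stores that metadata in a dictionary named MAPPER (a hypothetical structure) and uses it to build the transformed query for SQLite. The column name Table is quoted because TABLE is a reserved word, and the cast is applied to the Beds column to recover an integer, which is an adaptation of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Universe (Tenant INTEGER, "Table" INTEGER, '
             'Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 TEXT)')
conn.executemany("INSERT INTO Universe VALUES (?, ?, ?, ?, ?, ?)", [
    (17, 0, "1", "Acme", "St Mary", "135"),
    (17, 0, "2", "Gump", "State", "1042"),
    (35, 1, "1", "Ball", None, None),
    (42, 2, "1", "Big", "65", None),
])

# Application-owned metadata: logical table and columns -> generic columns.
MAPPER = {
    (17, "Account"): {
        "table_id": 0,
        "columns": {"Account": "Col1", "Name": "Col2",
                    "Hospital": "Col3", "Beds": "Col4"},
    },
}


def name_and_beds(tenant, hospital):
    """Logical: SELECT Name, Beds FROM Account WHERE Hospital = ?"""
    meta = MAPPER[(tenant, "Account")]
    cols = meta["columns"]
    sql = (f'SELECT {cols["Name"]}, CAST({cols["Beds"]} AS INTEGER) '
           f'FROM Universe WHERE Tenant = ? AND "Table" = ? '
           f'AND {cols["Hospital"]} = ?')
    return conn.execute(sql, (tenant, meta["table_id"], hospital)).fetchall()


result = name_and_beds(17, "State")
print(result)
```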
#6 Pivot Tables
■ Pack data into 3-ary tables with column_ids and values
□ Each field of a row in a logical table is given its own row
□ One pivot table per type (e.g., int, string)
□ Eliminates handling many NULL values
□ Can solve the typing and indexing problem
■ Google BigTable does something like this
Pivot_Int
Tenant | Table | Row | Col | Int
17 | 0 | 0 | 0 | 1
17 | 0 | 0 | 3 | 135
17 | 0 | 1 | 0 | 2
17 | 0 | 1 | 3 | 1042
35 | 1 | 0 | 0 | 1
42 | 2 | 0 | 0 | 1
42 | 2 | 0 | 2 | 65
Pivot_String
Tenant | Table | Row | Col | String
17 | 0 | 0 | 1 | Acme
17 | 0 | 0 | 2 | St Mary
17 | 0 | 1 | 1 | Gump
17 | 0 | 1 | 2 | State
35 | 1 | 0 | 1 | Ball
42 | 2 | 0 | 1 | Big
by Dean Jacobs
#6 Pivot Tables – Example
Account_17
Account | Name | Hospital | Beds
1 | Acme | St Mary | 135
2 | Gump | State | 1042
Account_35
Account | Name
1 | Ball
Account_42
Account | Name | Dealers
1 | Big | 65
Pivot_Int
Tenant | Table | Row | Col | Int
17 | 0 | 0 | 0 | 1
17 | 0 | 0 | 3 | 135
17 | 0 | 1 | 0 | 2
17 | 0 | 1 | 3 | 1042
35 | 1 | 0 | 0 | 1
42 | 2 | 0 | 0 | 1
42 | 2 | 0 | 2 | 65
Pivot_String
Tenant | Table | Row | Col | String
17 | 0 | 0 | 1 | Acme
17 | 0 | 0 | 2 | St Mary
17 | 0 | 1 | 1 | Gump
17 | 0 | 1 | 2 | State
35 | 1 | 0 | 1 | Ball
42 | 2 | 0 | 1 | Big
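The mapping from a tenant's logical rows to pivot rows can be written as a short function. This is a sketch under the assumption that column identifiers are the positions of the columns in the logical table and that values are routed by their Python type; the function name to_pivot is illustrative.

```python
def to_pivot(tenant, table_id, rows):
    """Shred logical rows into (Tenant, Table, Row, Col, value) tuples."""
    pivot_int, pivot_string = [], []
    for row_id, row in enumerate(rows):
        for col_id, value in enumerate(row):
            if value is None:
                continue  # NULLs are simply not stored
            target = pivot_int if isinstance(value, int) else pivot_string
            target.append((tenant, table_id, row_id, col_id, value))
    return pivot_int, pivot_string


# Account_17: Account, Name, Hospital, Beds
ints_17, strings_17 = to_pivot(17, 0, [(1, "Acme", "St Mary", 135),
                                       (2, "Gump", "State", 1042)])
# Account_42: Account, Name, Dealers
ints_42, strings_42 = to_pivot(42, 2, [(1, "Big", 65)])

for pivot_row in ints_17 + ints_42:
    print("Pivot_Int   ", pivot_row)
for pivot_row in strings_17 + strings_42:
    print("Pivot_String", pivot_row)
```

The output rows match the Pivot_Int and Pivot_String entries for tenants 17 and 42 in the example.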
Pivot Tables – Query Transformation
■ Reminder: Mapper knows Tenant_Id, Table_Id, and Column_Id
■ Tenant 17
□ Logical query:
    SELECT Beds
    FROM Account
    WHERE Hospital = 'State'
□ Transformed query:
    SELECT I.Int
    FROM Pivot_Int I, Pivot_String S
    WHERE I.Tenant = 17
    AND S.Tenant = 17
    AND S.Table = 0 AND S.Col = 2
    AND I.Table = 0 AND I.Col = 3
    AND S.String = 'State'
    AND I.Row = S.Row
Pivot Tables – Query Transformation
■ Tenant 17 (Name is Col 1 and Hospital is Col 2 in Pivot_String, Beds is Col 3 in Pivot_Int)
□ Logical query:
    SELECT Name, Beds
    FROM Account
    WHERE Hospital = 'State'
□ Transformed query on the pivot tables:
    SELECT S1.String, I.Int
    FROM Pivot_Int I, Pivot_String S1, Pivot_String S2
    WHERE I.Tenant = 17
    AND S1.Tenant = 17
    AND S2.Tenant = 17
    AND S1.Table = 0 AND S1.Col = 1
    AND S2.Table = 0 AND S2.Col = 2
    AND I.Table = 0 AND I.Col = 3
    AND I.Row = S1.Row
    AND I.Row = S2.Row
    AND S2.String = 'State'
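The pattern generalizes: every logical column that the query touches becomes one alias of a pivot table, and all aliases are joined on the row identifier. The following sketch generates such a query for a single equality predicate. The `META` dictionary and the `pivot_sql` function are hypothetical, the column names are renamed (Tbl, RowNo, Val) to avoid reserved words, and a real mapper would also have to handle arbitrary predicates, joins between logical tables, and so on.

```python
import sqlite3

# Hypothetical mapper meta-data for tenant 17's Account table:
# logical column -> (Col id, pivot table that stores its type).
META = {"Account": (0, "Pivot_Int"), "Name": (1, "Pivot_String"),
        "Hospital": (2, "Pivot_String"), "Beds": (3, "Pivot_Int")}


def pivot_sql(select_cols, filter_col):
    """Build a pivot query with one table alias per referenced logical column."""
    cols = list(dict.fromkeys(select_cols + [filter_col]))  # dedupe, keep order
    alias = {c: f"P{i}" for i, c in enumerate(cols)}
    tables = ", ".join(f"{META[c][1]} {alias[c]}" for c in cols)
    conds = []
    for c in cols:
        conds += [f"{alias[c]}.Tenant = :tenant", f"{alias[c]}.Tbl = :tbl",
                  f"{alias[c]}.Col = {META[c][0]}"]
    # Reconstructing joins: align every alias with the first one on the row id.
    conds += [f"{alias[c]}.RowNo = {alias[cols[0]]}.RowNo" for c in cols[1:]]
    conds.append(f"{alias[filter_col]}.Val = :value")
    select = ", ".join(f"{alias[c]}.Val" for c in select_cols)
    return f"SELECT {select} FROM {tables} WHERE " + " AND ".join(conds)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pivot_Int (Tenant INT, Tbl INT, RowNo INT, Col INT, Val INT)")
conn.execute("CREATE TABLE Pivot_String (Tenant INT, Tbl INT, RowNo INT, Col INT, Val TEXT)")
conn.executemany("INSERT INTO Pivot_Int VALUES (?, ?, ?, ?, ?)", [
    (17, 0, 0, 0, 1), (17, 0, 0, 3, 135), (17, 0, 1, 0, 2), (17, 0, 1, 3, 1042),
    (35, 1, 0, 0, 1), (42, 2, 0, 0, 1), (42, 2, 0, 2, 65),
])
conn.executemany("INSERT INTO Pivot_String VALUES (?, ?, ?, ?, ?)", [
    (17, 0, 0, 1, "Acme"), (17, 0, 0, 2, "St Mary"), (17, 0, 1, 1, "Gump"),
    (17, 0, 1, 2, "State"), (35, 1, 0, 1, "Ball"), (42, 2, 0, 1, "Big"),
])

# SELECT Name, Beds FROM Account WHERE Hospital = 'State' for tenant 17, table 0.
sql = pivot_sql(["Name", "Beds"], "Hospital")
result = conn.execute(sql, {"tenant": 17, "tbl": 0, "value": "State"}).fetchall()
print(result)  # [('Gump', 1042)]
```

With three referenced columns the generated query contains two join conditions on RowNo, which matches the hand-written transformation above.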
Google BigTable
■ Same basic idea as Pivot Tables
■ Columns are grouped into column families
■ Column families
□ Must be explicitly defined (owned by the database)
□ There should not be more than a few hundred in a table and they should rarely change during operation
□ Each has an expected type (although all values are stored as Strings)
■ Columns
□ May be created on-the-fly (owned by the application)
□ May be an unbounded number of them
■ Data in a column family is compressed and stored together
□ Column family = Pivot Table
by Dean Jacobs
Google BigTable Continued
■ Data can be clustered together only if it is in the same BigTable instance
■ All logical tables for a tenant must be packed into the same BigTable instance
■ Mapping here: One column family per logical table
□ Here: Just one logical table

BigTable (the Account column is the column family)
Tenant  Table  Row  Account
17      Acct   0    Id, 1
17      Acct   0    Name, Acme
17      Acct   0    Hospital, St Mary
17      Acct   0    Beds, 135
17      Acct   1    Id, 2
17      Acct   1    Name, Gump
17      Acct   1    Hospital, State
17      Acct   1    Beds, 1042
35      Acct   0    Id, 1
35      Acct   0    Name, Ball
42      Acct   0    Id, 1
42      Acct   0    Name, Big
42      Acct   0    Dealers, 65
by Dean Jacobs
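The layout above can be mimicked with plain Python dictionaries to make the roles of row key, column family and column explicit. This is a toy model only and does not use any BigTable API: `bigtable`, `put` and `rows_of` are hypothetical names, and the composite row key (Tenant, Table, Row) is an assumption read off the table header.

```python
from collections import defaultdict

# Column families are fixed up front (owned by the database).
COLUMN_FAMILIES = ("Account",)


def new_row():
    """Every row starts with one empty dictionary per column family."""
    return {family: {} for family in COLUMN_FAMILIES}


# Assumption: the row key is the tuple (Tenant, Table, Row).
bigtable = defaultdict(new_row)


def put(tenant, table, row, family, column, value):
    """Store one cell; columns inside a family may be created on the fly."""
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    bigtable[(tenant, table, row)][family][column] = str(value)  # all values are strings


for tenant, row, fields in [
    (17, 0, {"Id": 1, "Name": "Acme", "Hospital": "St Mary", "Beds": 135}),
    (17, 1, {"Id": 2, "Name": "Gump", "Hospital": "State", "Beds": 1042}),
    (35, 0, {"Id": 1, "Name": "Ball"}),
    (42, 0, {"Id": 1, "Name": "Big", "Dealers": 65}),
]:
    for column, value in fields.items():
        put(tenant, "Acct", row, "Account", column, value)


def rows_of(tenant):
    """All rows of one tenant; sorted keys keep a tenant's data adjacent."""
    return {key: cells for key, cells in sorted(bigtable.items()) if key[0] == tenant}


print(rows_of(17))
```

Tenant 42 adds the column Dealers without any schema change, while an unknown column family is rejected.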
Problems in Implementing Multi-Tenancy
■ Resource contention among tenants
□ Resource governing to ensure fairness is difficult to implement
□ But malicious and inadvertently bad requests must be shut down
□ Common practice is simply to forego operations whose resource usage can't be bounded in advance
□ SLAs
■ Access control among tenants
□ May have to rely on the application or mapping algorithms rather than the database
■ Moving data for an individual tenant
□ For archiving and load balancing
□ Tuple-at-a-time operations are slow and resource intensive
by Dean Jacobs
Overview
■ Everything as a Service: The Cloud
■ Software as a Service
■ Scale up, scale out, scale in: Multi-Tenancy
■ Multi-Tenancy Database Enhancements
□ Schema mappings
□ Chunk Tables
■ Summary & Outlook
Reminder: Pivot Table
■ Generic type-safe structure
□ Each field of a row in a logical table is given its own row.
□ One pivot table per type (e.g., int, string)
□ Eliminates handling many NULL values
■ Performance
□ Depends on the column selectivity of the query (number of reconstructing joins)
by Alfons Kemper
Row Fragmentation
■ Possible solution for addressing table utilization issues
□ Various storage techniques for individual fragments
□ Hunt for densely populated tables
■ Idea: Split rows according to their "popularity"
by Alfons Kemper
Chunk Table
Generic structure
■ Suitable if dataset can be partitioned into dense subsets
■ Derived from Pivot table
■ Middle ground between Universal table and Pivot table
Performance
■ Fewer joins for reconstruction if densely populated subsets can be extracted
■ Indexable
■ Reduced meta-data/data ratio, dependent on chunk size
[Figure: logical row 0 split into Chunk 0 and Chunk 1]
by Alfons Kemper
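A small sketch can make the "middle ground" and the meta-data/data ratio concrete. It assumes a chunk table with the columns (Tenant, Table, Chunk, Row, Int1, Str1), so each chunk holds one integer and one string, and it assigns the i-th integer and the i-th string of a logical row to chunk i. Both the layout and this assignment rule are assumptions for illustration; `to_chunks` and `meta_ratio` are hypothetical helpers.

```python
from itertools import zip_longest


def to_chunks(tenant, table_id, row_id, ints, strs):
    """Split one logical row into chunk rows (Tenant, Table, Chunk, Row, Int1, Str1).

    Assumption: the i-th int and the i-th string share chunk i; missing
    partners are padded with None (NULL).
    """
    return [(tenant, table_id, chunk, row_id, i, s)
            for chunk, (i, s) in enumerate(zip_longest(ints, strs))]


def meta_ratio(meta_cols, data_cols):
    """Meta-data columns stored per data column in one physical row."""
    return meta_cols / data_cols


# Tenant 17, logical row 1: Account=2, Name=Gump, Hospital=State, Beds=1042.
chunks = to_chunks(17, 0, 1, ints=[2, 1042], strs=["Gump", "State"])
for chunk_row in chunks:
    print(chunk_row)

# Pivot table: Tenant, Table, Row, Col describe a single value.
print(meta_ratio(4, 1))  # 4.0
# Chunk table as above: Tenant, Table, Chunk, Row describe two values.
print(meta_ratio(4, 2))  # 2.0
```

The logical row needs two chunk rows instead of four pivot rows, so reconstructing it takes one join instead of three, and wider chunks lower the ratio further as long as they stay densely populated.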
Further Enhancement: Chunk Folding
■ Combine different schema mappings for best fit
□ Mixes Extension and Chunk Tables
□ Each fragment can be stored in an optimal schema layout
■ Optimal chunk folding depends on
□ Workload
□ Data distribution
□ Data popularity
by Alfons Kemper
Querying Chunk Tables
Query Transformation
■ Row reconstruction needs many self- and equi-joins
■ Can be automatically translated
Compilation Scheme:
1. Collect all table names and their corresponding columns from the logical source query
2. For each table, obtain the Chunk Tables and the meta-data identifiers representing the used columns
3. For each table, generate a query that filters the correct columns (based on the meta-data identifiers) and aligns the different chunk relations on their ROW column.
4. Each table reference in the logical source query is extended by its generated table definition query
by Alfons Kemper
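The compilation scheme can be sketched as follows. Step 1 (parsing the logical query) is skipped: the used columns are passed in by hand. The `META` dictionary, the functions `table_definition` and `compile_query`, the physical table name Chunk_int_str and the renamed columns Tbl and RowNo are all assumptions for illustration, and the textual replacement in step 4 is deliberately naive.

```python
import sqlite3

# Hypothetical meta-data for tenant 17, table 0:
# logical column -> (chunk id, physical column in the chunk table).
META = {"Account": (0, "Int1"), "Name": (0, "Str1"),
        "Hospital": (1, "Str1"), "Beds": (1, "Int1")}


def table_definition(tenant, table_id, used_cols):
    """Steps 2-3: pick the needed chunks, filter them, align them on RowNo."""
    by_chunk = {}
    for col in used_cols:
        chunk, phys = META[col]
        by_chunk.setdefault(chunk, []).append(f"C{chunk}.{phys} AS {col}")
    chunks = sorted(by_chunk)
    select = ", ".join(item for c in chunks for item in by_chunk[c])
    tables = ", ".join(f"Chunk_int_str C{c}" for c in chunks)
    conds = []
    for c in chunks:
        conds += [f"C{c}.Tenant = {tenant}", f"C{c}.Tbl = {table_id}", f"C{c}.Chunk = {c}"]
    conds += [f"C{c}.RowNo = C{chunks[0]}.RowNo" for c in chunks[1:]]
    return f"(SELECT {select} FROM {tables} WHERE " + " AND ".join(conds) + ")"


def compile_query(logical_sql, table, tenant, table_id, used_cols):
    """Step 4: extend the table reference by its definition query (naive rewrite)."""
    sub = table_definition(tenant, table_id, used_cols)
    return logical_sql.replace(f"FROM {table}", f"FROM {sub} AS {table}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Chunk_int_str "
             "(Tenant INT, Tbl INT, Chunk INT, RowNo INT, Int1 INT, Str1 TEXT)")
conn.executemany("INSERT INTO Chunk_int_str VALUES (?, ?, ?, ?, ?, ?)", [
    (17, 0, 0, 0, 1, "Acme"), (17, 0, 1, 0, 135, "St Mary"),
    (17, 0, 0, 1, 2, "Gump"), (17, 0, 1, 1, 1042, "State"),
    (35, 1, 0, 0, 1, "Ball"), (42, 2, 0, 0, 1, "Big"), (42, 2, 1, 0, 65, None),
])

logical = "SELECT Name, Beds FROM Account WHERE Hospital = 'State'"
physical = compile_query(logical, "Account", 17, 0, ["Name", "Hospital", "Beds"])
result = conn.execute(physical).fetchall()
print(result)  # [('Gump', 1042)]
```

Name lives in chunk 0 while Hospital and Beds live in chunk 1, so the generated sub-select joins two chunk relations on RowNo; the outer query stays exactly as the application wrote it.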
Example queries
■ Tenant 17
□ Logical query:
    SELECT Beds
    FROM Account
    WHERE Hospital = 'State'
■ Replace Account table with a sub-select on the Chunk table
■ Luck: Both relevant attributes are in the same chunk
□ Transformed query:
    SELECT Beds
    FROM (SELECT Str1 AS Hospital, Int1 AS Beds
          FROM Chunk_int|str
          WHERE Tenant = 17
          AND Table = 0
          AND Chunk = 1) AS Account
    WHERE Hospital = 'State'
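The transformed query can be executed as is once the chunk table exists. The sketch below loads the example data into SQLite under the assumption that chunk 0 holds (Account, Name) and chunk 1 holds (Beds, Hospital); the physical table is called Chunk_int_str here and the columns Table and Row are renamed to Tbl and RowNo, since the original names are not valid unquoted identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: chunk table layout (Tenant, Tbl, Chunk, RowNo, Int1, Str1).
conn.execute("CREATE TABLE Chunk_int_str "
             "(Tenant INT, Tbl INT, Chunk INT, RowNo INT, Int1 INT, Str1 TEXT)")
conn.executemany("INSERT INTO Chunk_int_str VALUES (?, ?, ?, ?, ?, ?)", [
    (17, 0, 0, 0, 1, "Acme"), (17, 0, 1, 0, 135, "St Mary"),
    (17, 0, 0, 1, 2, "Gump"), (17, 0, 1, 1, 1042, "State"),
    (35, 1, 0, 0, 1, "Ball"), (42, 2, 0, 0, 1, "Big"), (42, 2, 1, 0, 65, None),
])

# Hospital and Beds share chunk 1, so the sub-select needs no join at all.
physical_query = """
SELECT Beds
FROM (SELECT Str1 AS Hospital, Int1 AS Beds
      FROM Chunk_int_str
      WHERE Tenant = 17 AND Tbl = 0 AND Chunk = 1) AS Account
WHERE Hospital = 'State'
"""
beds = [row[0] for row in conn.execute(physical_query)]
print(beds)  # [1042]
```

The same logical query needed one join on the pivot tables; here the chunk already contains both attributes of each logical row.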
Overview
■ Everything as a Service: The Cloud
■ Software as a Service
■ Scale up, scale out, scale in: Multi-Tenancy
■ Multi-Tenancy Database Enhancements
□ Schema mappings
□ Chunk Tables
■ Summary & Outlook
Summary
■ The Cloud as a new paradigm to provide services
■ Applications are one type of such services: SaaS
■ Different forms of scaling: up, out, in
■ Economy of scale by re-using processes: Multiple tenants on a single database instance
□ Problem: Isolation
■ Separate tenants by mapping many logical schemata to a single physical schema
■ Many schema mapping techniques
□ Problem: Query transformation
Outlook
■ Native multi-tenancy-aware DB-kernel
■ Workload-aware reorganization
■ Service level agreements for different user categories
■ Build trust despite process sharing
□ Security & Privacy