Datenbanksysteme II: Web-scale Data Management

Multi-Tenancy & SaaS

28.1.2010, Felix Naumann

Web-scale Data Management (WDM)

Big Data: PBs of data, 10^2 to 10^5 nodes

Operational: high qps, few rows/op (e.g., BigTable, Dynamo, PNUTS)

Analytic: low qps, billions of rows/op (e.g., MapReduce, Hadoop, Dryad)

■ 20 PB processed every day at Google (2008)

■ Trillions of rows, hundreds of columns/table

■ Structured data, text, images, video

□ 15h of video uploaded to YouTube every minute

■ Data is partitioned, computation is distributed

□ Reading 20 PB would take 12 years at 50 MB/s

[http://btw2009.uni-muenster.de/oe20/cms/media/melnik_BTW09_keynote.swf]
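The 12-year figure can be checked with a quick calculation. This is a minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a single sequential reader; the function name is illustrative and not from the slides.

```python
PB = 10**15  # assumption: decimal petabyte
MB = 10**6   # assumption: decimal megabyte
SECONDS_PER_YEAR = 365 * 24 * 3600


def sequential_read_years(total_bytes, bytes_per_second):
    """Years needed to read total_bytes on one machine at a fixed rate."""
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


years = sequential_read_years(20 * PB, 50 * MB)
print(f"{years:.1f} years")
```

Under these assumptions the result lies between 12 and 13 years, which is why the data has to be partitioned and the computation distributed.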

Felix Naumann | VL Datenbanksysteme II | Winter 2009/2010

Key enabler: Virtualization

Big Data: PBs of data, 10^2 to 10^5 nodes

Operational: high qps, few rows/op (e.g., BigTable, Dynamo, PNUTS)

Analytic: low qps, billions of rows/op (e.g., MapReduce, Hadoop, Dryad)

Virtualization (Scalability)

Multi-Tenancy: Map N logical systems into 1 physical system

Load Balancing: Map 1 logical system into N physical systems

Platform development

Sources for slides (heavily used)

■ Dean Jacobs (SAP): Implementing Software as a Service. BTW 2009 Tutorial

■ Dean Jacobs (SAP) & Stefan Aulbach (TUM): Ruminations on Multi-Tenant Databases. BTW 2007

■ Alfons Kemper (TUM): Database Technology for SaaS. Memorial Symposium for Klaus Dittrich 2008

■ Burt Kaliski (EMC): Multi-Tenant Cloud Computing: From Cruise Liners to Container Ships. 3rd Asia-Pacific Trusted Infrastructure Technologies Conference (APTC 2008)

Overview

■ Everything as a Service: The Cloud

■ Software as a Service

■ Scale up, scale out, scale in: Multi-Tenancy

■ Multi-Tenancy Database Enhancements

□ Schema mappings

□ Chunk Tables

■ Summary & Outlook

Cloudy weather

What is the Cloud?

■ Cloud computing is Internet-based development and use of computer technology. In concept, it is a paradigm shift whereby details are abstracted from the users who no longer need[says who?] knowledge of, expertise in, or control over[dubious – discuss] the technology infrastructure “in the cloud” that supports them[clarification needed]. Cloud computing describes a new supplement, consumption, and delivery model for IT services based on Internet, and it typically involves the provision of dynamically scalable and often virtualized resources as a service over the Internet.

http://en.wikipedia.org/wiki/Cloud_computing

The interesting thing about cloud computing is that we've redefined cloud computing to include everything that we already do. I can't think of anything that isn't cloud computing with all of these announcements. The computer industry is the only industry that is more fashion-driven than women's fashion. Maybe I'm an idiot, but I have no idea what anyone is talking about. What is it? It's complete gibberish. It's insane. When is this idiocy going to stop?

Larry Ellison

Jokes with Clouds

W3QS paper (Konopnicki & Shmueli 1995)

EaaS – Everything as a Service

SaaS: Software as a Service (Application)

PaaS: Platform as a Service (Platform)

IaaS: Infrastructure as a Service (Infrastructure)

[Figure: Application, Platform, and Infrastructure layers; axis label: Breadth]

IaaS, PaaS, SaaS – Cloudification

[Figure: service layers ordered by “Service Value to Users”]

SaaS: Salesforce.com, Google docs, Email services

PaaS: Microsoft Azure, Google App Engine, force.com

IaaS: Amazon Elastic Compute Cloud (EC2)

Why will the cloud always win (over local computing)?

Photon over electron

… fiberglass cable

What is SaaS? by Dean Jacobs

SaaS: Software as a Service

■ Service provider hosts an application that multiple customers access over the Internet

□ Sales, marketing, support, HR, payroll, planning, manufacturing, inventory, financials, purchasing

■ Leverage economy of scale to reduce the total cost of ownership (TCO) of the application

□ Capital expenditures – hardware, software

□ Operational expenditures – bandwidth, personnel

■ Particularly appealing for small- to medium-sized businesses that do not have a complex data center

Modern Distributed IT Architecture

[Figure: architecture layers]

Services

Virtualization – Dynamic acquisition and release of computing resources

Application Infrastructure – Application servers and databases

Hardware – Massive, geographically-distributed farms of commodity components

■ The system is shared by many services and many customers

■ Services may be used to implement other services

■ Solution vendors utilize various upper and lower interfaces

□ Salesforce: Rents data centers, provides CRM software

□ Google: Owns data centers, many basic services, AppEngine

□ Amazon: Owns data centers, storage and compute services

Two Types of Costs

■ Capital Expenditures (CapEx)

□ Cost of acquiring or upgrading physical assets, such as equipment, property, software, or buildings

■ Operational Expenditures (OpEx)

□ Costs for the day-to-day running of a business, including salaries, rent, and utilities

Total Cost of Ownership

Two Cost-Reduction Techniques

1. Operational Automation (see lecture on Load Balancing)

□ Automatically acquire and release computing resources

□ Automatically provision, configure, and tune systems

□ Automatically detect and recover from failures

□ Requires that operational decision-making be simple

□ Requires a small number of subsystems with simple interactions

2. Multi-Tenancy

□ Consolidate multiple tenants into the same process

□ Worth the effort only if enough tenants fit on the given hardware

□ Reduces CapEx because resource utilization is increased

□ Reduces OpEx because there are fewer processes to manage

□ Question: Where is Multi-Tenancy feasible & viable?

Features versus Costs

■ Fundamental trade-off in software design: more features result in higher costs

□ Document search requires additional servers for indexing and queries

□ Reporting increases the load on the database

□ End-user extension of the base application complicates upgrades

□ Disaster recovery requires a remote data center

□ …

[Figure: Development Priorities (Add Features, Decrease CapEx, Decrease OpEx), ranked differently for On-Premises Software and for Software as a Service]

Application Complexity – what works for SaaS?

■ Applications vary in complexity from Basic to Advanced

Column groups: Capabilities, Costs

Application Type | Business Processes | Configuration and Extension | Transactional Guarantees | Resource Usage | Effort to Operate | Scalability

(Too) Basic: Point Solution | Simple | Minimal, Self-Service | Weaker | Lower, Uniform | Lower | Higher

(Too) Advanced: Integrated Suite | Complex | Comprehensive, Consultants | Stronger | Higher, Diverse | Higher | Lower

■ The sweet spot for SaaS appears to be applications of moderate complexity: Economy of scale works best here.

German terms for economy of scale:

□ Degressionsgewinn (gain from cost degression)

□ Großbetriebsvorteil (advantage of large-scale operation)

□ Rationalisierungseffekt (rationalization effect)

□ Wirtschaftlichkeit durch Massenproduktion (efficiency through mass production)

Customer Preferences

2007 Software Revenues

Application | Overall Market Size | SaaS Market Size | SaaS Share

Supplier Relationship M. | $ 3,632 | $ 455 | 13 %

Customer Relationship M. | $ 10,516 | $ 1,207 | 11 %

Human Capital M. | $ 6,569 | $ 710 | 11 %

Business Intelligence | $ 6,501 | $ 455 | 7 %

Governance Risk Compliance | $ 1,709 | $ 97 | 6 %

Supply Chain M. | $ 3,973 | $ 215 | 5 %

Product Lifecycle M. | $ 6,082 | $ 166 | 3 %

Enterprise Resource Planning | $ 11,757 | $ 212 | 2 %

[Side labels: Basic Services (top), The Sweet Spot, Advanced Services (bottom)]

One reason for this distribution is that SaaS has a higher percentage of small businesses, which generally automate only more basic processes

Service Provider Preferences

[Figure: Economy of Scale against Application Complexity (Low to High); applications shown: Email, Collaboration, CRM/SRM, HCM, ERP/FI; marked regions: The Sweet Spot, On-premise]

Multi-Tenancy in Practice

[Figure: number of tenants per database (10000, 1000, 100, 10, 1) against Complexity of Application (Low to High); applications shown: Email, CRM, ERP, Proj Mgmt, Banking; server sizes: Blade, Big iron]

by Alfons Kemper

Disruptive Innovation?

■ Existing solutions target high end of the market

□ Feature-rich, hard to use, expensive

□ They over-serve the needs of many customers

□ Innovation, if any, consists of adding new features

■ Disruptive solutions target non-users and the low end

□ Feature-poor, easy to use, inexpensive

□ Technology that enables the disruption may be sophisticated

■ Once the disruptors gain a foothold, they gradually march up market to obtain higher margins

■ The high-end solution vendors are eventually squeezed out of the market because they are constitutionally unable to adapt to the new cost structure

Scale up, scale out, scale in

■ Scale up: Big iron

■ Scale out: Commodity hardware

■ “Scale in”: Multiple apps / tenants / VMs on single machine

http://infolab.stanford.edu/pub/voy/museum/pictures/display/0-4-Google.htm

Scalability

■ Handle large data sets as well as many data sets

■ Two basic techniques

□ Scale up – Use a small number of large servers

□ Scale out – Use a large number of small servers

CRM Case Studies (October 2007)

Scale Up: Salesforce.com | Scale Out: RightNow

1.75 billion rows | 50 TB of data in DB+NFS

139,000 tenants (35,000 customers) | 3000 tenants (1800 customers)

8 Oracle RAC databases (17,000 tenants/instance) | 200 MySQL servers (1-100s tenants/instance)

170 million transactions per day | 17 million transactions per day

Advantages of Scale Out

■ More bang for the buck

□ Power consumption scales cubically with clock frequency

■ Facilitates all aspects of automation

■ Simplifies failure handling

□ If a server looks suspicious, just swap it out

□ Ensures individual failures affect fewer users

■ Enables incremental adjustments of capacity

■ Enables incremental rolling upgrades

□ Gain production experience with a small set of users

□ Eliminate down-time by using side-by-side systems

□ Immediately roll back if problems arise

Disadvantages of Scale Out

■ Frequent load balancing and capacity adjustments are required to achieve good utilization

□ Otherwise over-provisioning is required to handle temporary load peaks

□ Must not shuffle around large amounts of data

□ Must be operationally simple

■ Large data sets get distributed across multiple servers

□ Read-queries can be efficiently processed only if the data is distributed so as to minimize inter-server communication

□ Write-queries require distributed transactions (not necessarily 2PC)

■ The CAP Theorem

□ Consistency of data, Availability of data, Partition-tolerance (ability to scale out)

“Scale in”: Multi-Tenancy

■ Reminder

□ Capital Expenditures – cost of acquiring or upgrading physical assets such as equipment, property, or buildings

□ Operational Expenditures – costs for the day-to-day running of a business, including salaries, rent, and utilities

■ Reduce expenditures: Multi-Tenancy

□ Consolidate multiple tenants into the same process

□ Worth the effort only if enough tenants fit on the given hardware

□ Reduces CapEx because resource utilization is increased

□ Reduces OpEx because there are fewer processes to manage

Multi-tenancy – literally

■ Multiple clients hosted by one service provider

□ = multiple tenants hosted in one building complex

■ Code (to be executed)

□ = utilities (gas, electric, water, waste)

■ Data (with services)

□ = storage space (furniture, basement, garage)

■ Four models…

Single homes

Different spaces, same time

Private apartments

Different spaces, same time, shared services

Hotel rooms

Same space, different times

Youth hostel

http://www.graubuenden.ch/sommerurlaub/unterkunft-angebote/unterkuenfte/jugendherbergen/

Same space, same time

Cost & performance

(Columns presumably follow the four models in order: single home, private apartment, hotel room, youth hostel)

Cost per person per night? | 273 USD | 54 USD | 27 USD | 10 USD

273 USD: 1,000,000 / 10 years / 365 days

Beds per m²? | 6/500 = 0.012 | 4/100 = 0.04 | 2/25 = 0.08 | 6/25 = 0.24
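The slide's arithmetic can be reproduced in a few lines. This is a sketch only: the bed counts, areas, and the 1,000,000 purchase price come from the slide, while the assignment of each figure to a housing model is an assumption based on the order in which the four models are introduced.

```python
# Beds and floor area per unit, copied from the slide.
# Assumption: the columns follow the order single home, apartment, hotel, hostel.
MODELS = {
    "single home": (6, 500),
    "private apartment": (4, 100),
    "hotel room": (2, 25),
    "youth hostel": (6, 25),
}


def beds_per_m2(beds, area_m2):
    """Density of tenants: the analogue of tenants per server."""
    return beds / area_m2


def nightly_cost(purchase_price, years):
    """Purchase price amortized evenly over every night of the period."""
    return purchase_price / years / 365


density = {name: beds_per_m2(*spec) for name, spec in MODELS.items()}
home_cost = nightly_cost(1_000_000, 10)

for name, value in density.items():
    print(f"{name}: {value:.3f} beds per m²")
print(f"single home: {home_cost:.2f} USD per night")
```

The density grows twentyfold from the single home to the hostel, which mirrors the consolidation argument for multi-tenancy: the more tenants share one unit, the lower the cost per tenant and the weaker the isolation.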

Isolation: Make other tenants invisible

High Fences, Strong Walls, Good Housekeeping

Make other tenants invisible?

But here?

Trust & Security

very high, high, medium, low

Trust in Storage and Services

■ Data does not do anything

□ Creep under walls

□ Steal other data

□ Infect other data

■ Isolated services

■ Careful schema design

■ Careful query translation

Multi-Tenancy

■ Consolidate multiple businesses (tenants) onto the same operational system

■ Pool resources to improve their utilization

□ Avoid provisioning each tenant for their maximum load

□ Breaks down isolation: weakens security, increases resource contention, interferes with optimizations

■ Provide a tenant-aware administrative framework to improve management efficiency

□ Manage farms of individual multi-tenant servers

□ Support bulk operations such as rolling upgrade

□ Support tenant migration within and across farms

■ Here: Focus on schemas and queries

by Dean Jacobs & Stefan Aulbach

Multi-Tenant Databases

■ Assume the application has a base schema that may be extended by each tenant

□ New columns for existing tables and new tables

□ Common for enterprise applications like CRM and ERP

■ Pool database resources

□ Processes, memory, connections, prepared statements

□ Trade-offs against isolation

■ Provide a tenant-aware administrative framework

□ Manage farms of individual multi-tenant databases

□ Support DML and DDL operations across tenants

□ Support tenant migration between databases

Implementation Options

■ Shared Machine

■ Shared Process

■ Shared Table

[Figure: a Machine containing a Database process; from Shared Machine to Shared Table, Isolation decreases and Resource Pooling increases]

Shared Machine

■ Memory requirements for a database with one empty CRM schema instance

PostgreSQL | MaxDB | COTS 1 | COTS 2 | COTS 3

55 MB | 80 MB | 171 MB | 74 MB | 273 MB

■ Cannot scale beyond tens of tenants per server

■ Appropriate for applications with a smaller number of larger tenants, e.g., for banking

Shared Process

■ Memory requirements for a database with 10,000 empty CRM schema instances

Schema instances | PostgreSQL | MaxDB | COTS 1 | COTS 2 | COTS 3

10,000 | 79 MB | 80 MB | 616 MB | 2061 MB | 359 MB

empty | 55 MB | 80 MB | 171 MB | 74 MB | 273 MB

■ Should scale up to thousands of tenants

■ If each tenant gets their own table space, migration entails simply moving files

■ Connection pooling is possible, but then tenant identity must be managed by the application

Shared table

■ Data from many tenants in the same tables

□ Add a tenant_id column

□ Tenant queries must fix the value for this column

◊ By connection or by application

■ Extend base schema using generic columns

□ May be varchar or a mix of types

□ The database must compactly represent sparse tables

■ Advantage - everything is pooled

□ Processes, memory, connections, prepared statements

□ Easy DML and DDL operations across tenants

□ Add, remove, and extend tenants with DML (not DDL)

■ Disadvantage - Isolation is very weak

□ Irrelevant data infects query processing

◊ Optimization statistics

◊ Table scans

◊ Data locality

□ No indexes or integrity constraints on generic columns

□ Migration requires querying the operational system
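The shared-table idea can be tried out with SQLite from Python's standard library. This is a minimal sketch under assumptions: the table layout, the generic column names ext1 and ext2, and the helper accounts_for are illustrative and not taken from the slides. It shows the two points made above: every tenant query fixes tenant_id, and a new tenant is added with plain DML.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One physical table for all tenants; ext1/ext2 are generic extension columns.
conn.execute(
    "CREATE TABLE account ("
    "tenant_id INTEGER NOT NULL, account_id INTEGER, name TEXT, "
    "ext1 TEXT, ext2 TEXT)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?, ?, ?)",
    [
        (17, 1, "Acme", "St Mary", "135"),
        (17, 2, "Gump", "State", "1042"),
        (35, 1, "Ball", None, None),
        (42, 1, "Big", "65", None),
    ],
)


def accounts_for(tenant_id):
    """Every tenant query fixes the tenant_id value (here: by the application)."""
    cursor = conn.execute(
        "SELECT account_id, name FROM account "
        "WHERE tenant_id = ? ORDER BY account_id",
        (tenant_id,),
    )
    return cursor.fetchall()


# Adding a tenant needs only DML, no DDL.
conn.execute("INSERT INTO account VALUES (99, 1, 'NewCo', NULL, NULL)")

print(accounts_for(17))
print(accounts_for(99))
```

Because the predicate on tenant_id is the only thing separating tenants, forgetting it in a single query exposes foreign data, which is one concrete form of the weak isolation listed as a disadvantage.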

Schema Flexibility Requirements for SaaS

■ Each tenant has a logical schema consisting of the base schema and a set of extensions

□ Extensions may be private or shared

■ The logical schemas from multiple tenants are mapped into one physical schema (multi-tenancy)

■ The logical schemas evolve while the database is on-line

□ Must not require intervention of a DBA

□ Must have minimal impact on performance

[Figure: Logical Schemas (Tenant 1: B, E1; Tenant 2: B, E2, E3; Tenant 3: B, E1, E3) mapped to one Physical Schema (B, E1, E2, E3)]

by Dean Jacobs

Schema Mapping Techniques

■ Database Owns the Schema

□ Evolution of logical schemas requires on-line DDL

1. Private Tables

2. Extension Tables

3. Sparse Columns

■ Application Owns the Schema

□ The application controls evolution of logical schemas

4. XML

5. Universal Tables

6. Pivot Tables

#1 Private Tables

■ Give each tenant their own private tables

□ SQL transformation: Renaming only

■ Great performance until the number of tables gets too high

□ Schema overhead: 4 KB/table * 100,000 tables = 400 MB

□ Index pages are only partly full and are hard to keep in memory

■ Used if the schema is small and there are few tenants

Account_17 (Healthcare Extension)

Account | Name | Hospital | Beds

1 | Acme | St Mary | 135

2 | Gump | State | 1042

Account_35

Account | Name

1 | Ball

Account_42 (Automotive Extension)

Account | Name | Dealers

1 | Big | 65

Private Tables – Query Transformation

■ Tenant 17

□ SELECT Beds
  FROM Account
  WHERE Hospital = 'State'

□ SELECT Beds
  FROM Account_17
  WHERE Hospital = 'State'

How do we know this?

■ Tenant 42

□ SELECT Name
  FROM Account
  WHERE Dealers > 50

□ SELECT Name
  FROM Account_42
  WHERE Dealers > 50
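The renaming step can be sketched with SQLite from Python's standard library. The helper to_private_tables, the LOGICAL_TABLES list, and the column name Account_Id are assumptions made for this illustration; a production mapper would parse the SQL instead of using a regular expression, which would also rewrite the word inside string literals.

```python
import re
import sqlite3

LOGICAL_TABLES = ("Account",)  # assumption: the mapper knows the base schema


def to_private_tables(logical_sql, tenant_id):
    """Rename every logical table to the tenant's private table."""
    physical_sql = logical_sql
    for table in LOGICAL_TABLES:
        physical_sql = re.sub(
            rf"\b{table}\b", f"{table}_{int(tenant_id)}", physical_sql
        )
    return physical_sql


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account_17 (Account_Id INTEGER, Name TEXT, Hospital TEXT, Beds INTEGER);
    INSERT INTO Account_17 VALUES (1, 'Acme', 'St Mary', 135), (2, 'Gump', 'State', 1042);
    CREATE TABLE Account_42 (Account_Id INTEGER, Name TEXT, Dealers INTEGER);
    INSERT INTO Account_42 VALUES (1, 'Big', 65);
""")

sql_17 = to_private_tables("SELECT Beds FROM Account WHERE Hospital = 'State'", 17)
sql_42 = to_private_tables("SELECT Name FROM Account WHERE Dealers > 50", 42)
beds = conn.execute(sql_17).fetchall()
names = conn.execute(sql_42).fetchall()
print(sql_17, beds)
print(sql_42, names)
```

The tenant id has to come from outside the query, for example from the authenticated session, which is one possible answer to the slide's question of how the mapper knows which private table to use.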

#2 Extension Tables

■ Give each extension its own table

□ Add a Row_Id column and join to reassemble rows

□ Vertical partitioning

■ Give all tables a Tenant_Id column and share tables

■ Additional join at runtime

■ Better consolidation than Private Table layout

□ But: Number of tables still grows in proportion to number of tenants

Account

Tenant | Row | Account | Name

17 | 0 | 1 | Acme

17 | 1 | 2 | Gump

35 | 0 | 1 | Ball

42 | 0 | 1 | Big

Account_HealthCare

Tenant | Row | Hospital | Beds

17 | 0 | St Mary | 135

17 | 1 | State | 1042

Account_Automotive

Tenant | Row | Dealers

42 | 0 | 65

by Dean Jacobs

Extension Tables – Query Transformation

■ Tenant 17

□ SELECT Name, Beds
  FROM Account
  WHERE Hospital = 'State'

□ SELECT A.Name, H.Beds
  FROM Account A, Account_HealthCare H
  WHERE A.Tenant = 17
  AND H.Tenant = 17
  AND A.Row = H.Row
  AND H.Hospital = 'State'
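The join that reassembles a logical row from its base and extension parts can be run against SQLite. This is a sketch under assumptions: the columns are named Tenant_Id and Row_Id as in the bullet list of the previous slide, Account_Id replaces the slide's Account column to avoid a name clash with the table, and the helper beds_by_hospital is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Account (Tenant_Id INTEGER, Row_Id INTEGER, Account_Id INTEGER, Name TEXT);
    INSERT INTO Account VALUES
        (17, 0, 1, 'Acme'), (17, 1, 2, 'Gump'), (35, 0, 1, 'Ball'), (42, 0, 1, 'Big');
    CREATE TABLE Account_HealthCare (Tenant_Id INTEGER, Row_Id INTEGER, Hospital TEXT, Beds INTEGER);
    INSERT INTO Account_HealthCare VALUES (17, 0, 'St Mary', 135), (17, 1, 'State', 1042);
    CREATE TABLE Account_Automotive (Tenant_Id INTEGER, Row_Id INTEGER, Dealers INTEGER);
    INSERT INTO Account_Automotive VALUES (42, 0, 65);
""")

# Physical form of: SELECT Name, Beds FROM Account WHERE Hospital = ...
PHYSICAL_QUERY = """
    SELECT A.Name, H.Beds
    FROM Account A, Account_HealthCare H
    WHERE A.Tenant_Id = :tenant
      AND H.Tenant_Id = :tenant
      AND A.Row_Id = H.Row_Id
      AND H.Hospital = :hospital
"""


def beds_by_hospital(tenant, hospital):
    """Join the base table with the healthcare extension for one tenant."""
    params = {"tenant": tenant, "hospital": hospital}
    return conn.execute(PHYSICAL_QUERY, params).fetchall()


print(beds_by_hospital(17, "State"))
```

Each extension referenced by a query adds one more join on (Tenant_Id, Row_Id), which is the runtime cost mentioned on the previous slide.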

#3 Sparse Columns

■ Designed to handle data such as parts catalogs where each item has only a few out of thousands of possible attributes

■ Interpreted storage format to handle null values

□ Fields in a row are stored along with their column identifiers

□ Only available in Microsoft SQL Server (others?)

□ Limited number of sparse columns per table are permitted

■ Extension fields added as sparse columns to each table

■ Database owns the schema: evolution requires on-line DDL

Account

Tenant | Account | Name | SPARSE

17 | 1 | Acme | 0:St Mary, 1:135

17 | 2 | Gump | 0:State, 1:1042

35 | 1 | Ball |

42 | 1 | Big | 0:65

by Dean Jacobs

Sparse Tables – Query Transformation

■ CREATE TABLE Account (
    Tenant INT,
    Account INT,
    Name VARCHAR(100),
    Hospital VARCHAR(100) SPARSE,
    Beds INT SPARSE,
    Dealer INT SPARSE
  )

Account

Tenant | Account | Name | Sparse

17 | 1 | Acme | 0:St Mary, 1:135

17 | 2 | Gump | 0:State, 1:1042

35 | 1 | Ball |

42 | 1 | Big | 2:65

■ Tenant 17

□ SELECT Name, Beds
  FROM Account
  WHERE Hospital = 'State'

□ SELECT Name, Beds
  FROM Account
  WHERE Tenant = 17
  AND Hospital = 'State'

Database owns the schema
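The interpreted storage format, in which each field is stored together with its column identifier, can be modelled in a few lines. This is a simplified illustration and not the actual on-disk format of Microsoft SQL Server; the function names are hypothetical, and the column order follows the CREATE TABLE statement of this slide.

```python
# Sparse columns in declaration order: their position serves as column identifier.
SPARSE_COLUMNS = ("Hospital", "Beds", "Dealer")


def encode_sparse(fields):
    """Keep only non-NULL fields, each stored with its column identifier."""
    return {
        column_id: fields[name]
        for column_id, name in enumerate(SPARSE_COLUMNS)
        if fields.get(name) is not None
    }


def decode_sparse(stored):
    """Rebuild the full logical row; absent identifiers read as NULL (None)."""
    return {
        name: stored.get(column_id)
        for column_id, name in enumerate(SPARSE_COLUMNS)
    }


gump = encode_sparse({"Hospital": "State", "Beds": 1042})
big = encode_sparse({"Dealer": 65})
ball = encode_sparse({})
print(gump, big, ball)
print(decode_sparse(big))
```

NULL values cost no space in this representation, but every access has to interpret the identifiers, which is the trade-off behind the name "interpreted storage format".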

#4 XML

■ Each base table is given an additional column that stores all extension fields in one (flat) XML document

■ Application owns the schema for extension fields so they can be evolved without on-line DDL

■ IBM’s pureXML

Tenant | Account | Name | XMLData

17 | 1 | Acme | <data><hospital>St Mary</hospital><bed>135</bed></data>

17 | 2 | Gump | <data><hospital>State</hospital><bed>1042</bed></data>

35 | 1 | Ball | ---

42 | 1 | Big | <data><dealers>65</dealers></data>

XML Tables – Query Transformation

■ Tenant 17

□ SELECT Name, Beds
  FROM Account
  WHERE Hospital = 'State'

□ SELECT Name, xml…([data/bed])
  FROM Account
  WHERE Tenant = 17
  AND xmlexists('$x[data/hospital="State"]' PASSING XMLData AS "x");

#5 Universal Tables

■ Pack data into wide tables with generic VARCHAR columns

□ Not type-safe: casting necessary

□ Very wide rows: many NULL values

□ No index support

■ Used if the schema is large or there are many tenants

■ salesforce.com does this and makes it work (fast) by rebuilding indexing and query optimization

Universe

Tenant | Table | Col1 | Col2 | Col3 | Col4 | … Col500

17 | 0 | 1 | Acme | St Mary | 135 | ---

17 | 0 | 2 | Gump | State | 1042 | ---

35 | 1 | 1 | Ball | --- | --- | ---

42 | 2 | 1 | Big | 65 | --- | ---

Universal Tables – Query Transformation

■ Tenant 17

□ SELECT Name, Beds
  FROM Account
  WHERE Hospital = 'State'

□ SELECT Col2, Col4
  FROM Universe
  WHERE Tenant = 17 AND Table = 0
  AND CAST(Col3 AS VARCHAR) = 'State'

How do we know this?
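A mapper for the universal table needs meta-data that says which generic column holds which logical column and what its type is. The sketch below uses SQLite and a hypothetical MAPPING dictionary as that meta-data; the physical column is called Tbl instead of Table because TABLE is a reserved word, and only four generic columns are created to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Universe (Tenant INTEGER, Tbl INTEGER,
                           Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 TEXT);
    INSERT INTO Universe VALUES
        (17, 0, '1', 'Acme', 'St Mary', '135'),
        (17, 0, '2', 'Gump', 'State', '1042'),
        (35, 1, '1', 'Ball', NULL, NULL),
        (42, 2, '1', 'Big', '65', NULL);
""")

# Hypothetical mapper meta-data: logical column -> (generic column, logical type).
MAPPING = {
    (17, "Account"): {
        "table_id": 0,
        "columns": {
            "Account": ("Col1", "INTEGER"),
            "Name": ("Col2", "TEXT"),
            "Hospital": ("Col3", "TEXT"),
            "Beds": ("Col4", "INTEGER"),
        },
    },
}


def select_where_equal(tenant, table, select_columns, where_column, value):
    """Translate SELECT <cols> FROM <table> WHERE <col> = <value>."""
    meta = MAPPING[(tenant, table)]
    columns = meta["columns"]
    # Casting restores the logical types lost in the generic TEXT columns.
    select_list = ", ".join(
        f"CAST({columns[c][0]} AS {columns[c][1]}) AS {c}" for c in select_columns
    )
    physical, sql_type = columns[where_column]
    sql = (
        f"SELECT {select_list} FROM Universe "
        f"WHERE Tenant = ? AND Tbl = ? AND CAST({physical} AS {sql_type}) = ?"
    )
    return conn.execute(sql, (tenant, meta["table_id"], value)).fetchall()


print(select_where_equal(17, "Account", ["Name", "Beds"], "Hospital", "State"))
```

The meta-data lookup is one answer to "How do we know this?": the application, not the database, owns the knowledge that Hospital lives in Col3 of table 0 for tenant 17.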

#6 Pivot Tables

■ Pack data into 3-ary tables with column_ids and values

□ Each field of a row in logical table is given its own row.

□ Multiple pivot tables for each type (int, string, e.g.)

□ Eliminates handling many NULL values

□ Can solve the typing and indexing problem

■ Google BigTable does something like this

Pivot_Int

Tenant | Table | Row | Col | Int

17 | 0 | 0 | 0 | 1

17 | 0 | 0 | 3 | 135

17 | 0 | 1 | 0 | 2

17 | 0 | 1 | 3 | 1042

35 | 1 | 0 | 0 | 1

42 | 2 | 0 | 0 | 1

42 | 2 | 0 | 2 | 65

Pivot_String

Tenant | Table | Row | Col | String

17 | 0 | 0 | 1 | Acme

17 | 0 | 0 | 2 | St Mary

17 | 0 | 1 | 1 | Gump

17 | 0 | 1 | 2 | State

35 | 1 | 0 | 1 | Ball

42 | 2 | 0 | 1 | Big

#6 Pivot Tables – Example

Account_17

Account | Name | Hospital | Beds

1 | Acme | St Mary | 135

2 | Gump | State | 1042

Account_35

Account | Name

1 | Ball

Account_42

Account | Name | Dealers

1 | Big | 65

Pivot_Int

Tenant | Table | Row | Col | Int

17 | 0 | 0 | 0 | 1

17 | 0 | 0 | 3 | 135

17 | 0 | 1 | 0 | 2

17 | 0 | 1 | 3 | 1042

35 | 1 | 0 | 0 | 1

42 | 2 | 0 | 0 | 1

42 | 2 | 0 | 2 | 65

Pivot_String

Tenant | Table | Row | Col | String

17 | 0 | 0 | 1 | Acme

17 | 0 | 0 | 2 | St Mary

17 | 0 | 1 | 1 | Gump

17 | 0 | 1 | 2 | State

35 | 1 | 0 | 1 | Ball

42 | 2 | 0 | 1 | Big
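How a logical row ends up in the pivot tables can be sketched as a small "shredding" function. The function name shred and its calling convention (values ordered by column id) are assumptions for illustration; the produced tuples follow the layout Tenant, Table, Row, Col, value used in the example.

```python
def shred(tenant, table_id, row_id, values):
    """Split one logical row into Pivot_Int and Pivot_String rows.

    values is ordered by column id; None (NULL) fields produce no pivot row.
    """
    int_rows, string_rows = [], []
    for column_id, value in enumerate(values):
        if value is None:
            continue
        target = int_rows if isinstance(value, int) else string_rows
        target.append((tenant, table_id, row_id, column_id, value))
    return int_rows, string_rows


# Second row of Account_17: Account = 2, Name = Gump, Hospital = State, Beds = 1042
ints_17, strings_17 = shred(17, 0, 1, [2, "Gump", "State", 1042])
# Only row of Account_42: Account = 1, Name = Big, Dealers = 65
ints_42, strings_42 = shred(42, 2, 0, [1, "Big", 65])

print(ints_17, strings_17)
print(ints_42, strings_42)
```

Since NULL fields are simply skipped, a sparse logical row costs only as many pivot rows as it has values, at the price of repeating Tenant, Table, Row, and Col for each of them.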

Pivot Tables – Query Transformation

■ Reminder: Mapper knows Tenant_Id, Table_Id, and Column_Id

■ Tenant 17

□ SELECT Beds
  FROM Account
  WHERE Hospital = 'State'

□ SELECT I.Int
  FROM Pivot_Int I, Pivot_String S
  WHERE I.Tenant = 17
  AND S.Tenant = 17
  AND S.Table = 0 AND S.Col = 2
  AND I.Table = 0 AND I.Col = 3
  AND S.String = 'State'
  AND I.Row = S.Row

Pivot Tables – Query Transformation

■ Tenant 17

□ SELECT Name, Beds
  FROM Account
  WHERE Hospital = 'State'

□ SELECT S1.String, I.Int
  FROM Pivot_Int I, Pivot_String S1, Pivot_String S2
  WHERE I.Tenant = 17
  AND S1.Tenant = 17
  AND S2.Tenant = 17
  AND S1.Table = 0 AND S1.Col = 1
  AND S2.Table = 0 AND S2.Col = 2
  AND I.Table = 0 AND I.Col = 3
  AND I.Row = S1.Row
  AND I.Row = S2.Row
  AND S2.String = 'State'
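Both pivot queries can be executed against SQLite to see the reconstructing joins at work. This is a sketch under assumptions: the physical columns are renamed to Tbl, Row_Id, and Val because Table is a reserved word and a neutral value name keeps both pivot tables alike; the data is the example data of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Pivot_Int (Tenant INTEGER, Tbl INTEGER, Row_Id INTEGER, Col INTEGER, Val INTEGER);
    INSERT INTO Pivot_Int VALUES
        (17, 0, 0, 0, 1), (17, 0, 0, 3, 135), (17, 0, 1, 0, 2), (17, 0, 1, 3, 1042),
        (35, 1, 0, 0, 1), (42, 2, 0, 0, 1), (42, 2, 0, 2, 65);
    CREATE TABLE Pivot_String (Tenant INTEGER, Tbl INTEGER, Row_Id INTEGER, Col INTEGER, Val TEXT);
    INSERT INTO Pivot_String VALUES
        (17, 0, 0, 1, 'Acme'), (17, 0, 0, 2, 'St Mary'), (17, 0, 1, 1, 'Gump'),
        (17, 0, 1, 2, 'State'), (35, 1, 0, 1, 'Ball'), (42, 2, 0, 1, 'Big');
""")

# SELECT Beds FROM Account WHERE Hospital = 'State'  (one aligning join)
BEDS_QUERY = """
    SELECT I.Val
    FROM Pivot_Int I, Pivot_String S
    WHERE I.Tenant = 17 AND S.Tenant = 17
      AND S.Tbl = 0 AND S.Col = 2
      AND I.Tbl = 0 AND I.Col = 3
      AND S.Val = 'State'
      AND I.Row_Id = S.Row_Id
"""

# SELECT Name, Beds FROM Account WHERE Hospital = 'State'  (two aligning joins)
NAME_BEDS_QUERY = """
    SELECT S1.Val, I.Val
    FROM Pivot_Int I, Pivot_String S1, Pivot_String S2
    WHERE I.Tenant = 17 AND S1.Tenant = 17 AND S2.Tenant = 17
      AND S1.Tbl = 0 AND S1.Col = 1
      AND S2.Tbl = 0 AND S2.Col = 2
      AND I.Tbl = 0 AND I.Col = 3
      AND I.Row_Id = S1.Row_Id
      AND I.Row_Id = S2.Row_Id
      AND S2.Val = 'State'
"""

beds = conn.execute(BEDS_QUERY).fetchall()
name_beds = conn.execute(NAME_BEDS_QUERY).fetchall()
print(beds, name_beds)
```

Every additional logical column touched by the query adds another pivot-table reference and another join on Row_Id, which is why the performance of this layout depends on how many columns a query uses.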

Google BigTable

■ Same basic idea as Pivot Tables

■ Columns are grouped into column families

■ Column families

□ Must be explicitly defined (owned by the database)

□ There should not be more than a few hundred in a table and they should rarely change during operation

□ Each has an expected type (although all values are stored as Strings)

■ Columns

□ May be created on-the-fly (owned by the application)

□ May be an unbounded number of them

■ Data in a column family is compressed and stored together

□ Column family = Pivot Table

Google BigTable Continued

■ Data can be clustered together only if it is in the same BigTable instance

■ All logical tables for a tenant must be packed into the same BigTable instance

■ Mapping here: One column family per logical table

□ Here: Just one logical table

BigTable

Tenant | Table | Row | Account (Column Family)

17 | Acct | 0 | Id, 1

17 | Acct | 0 | Name, Acme

17 | Acct | 0 | Hospital, St Mary

17 | Acct | 0 | Beds, 135

17 | Acct | 1 | Id, 2

17 | Acct | 1 | Name, Gump

17 | Acct | 1 | Hospital, State

17 | Acct | 1 | Beds, 1042

35 | Acct | 0 | Id, 1

35 | Acct | 0 | Name, Ball

42 | Acct | 0 | Id, 1

42 | Acct | 0 | Name, Big

42 | Acct | 0 | Dealers, 65

by Dean Jacobs

Problems in Implementing Multi-Tenancy

■ Resource contention among tenants

□ Resource governing to ensure fairness is difficult to implement

□ But malicious and inadvertently bad requests must be shut down

□ Common practice is simply to forego operations whose resource usage can’t be bounded in advance

□ SLAs

■ Access control among tenants

□ May have to rely on the application or mapping algorithms rather than the database

■ Moving data for an individual tenant

□ For archiving and load balancing

□ Tuple-at-a-time operations are slow and resource intensive

Reminder: Pivot Table

■ Generic type-safe structure

□ Each field of a row in logical table is given its own row.

□ Multiple pivot tables for each type (int, string, e.g.)

□ Eliminates handling many NULL values

■ Performance

□ Depends on the column selectivity of the query (number of reconstructing joins)

[Figure: fields of Row 0 spread over pivot-table rows]

by Alfons Kemper

Row Fragmentation

■ Possible solution for addressing table utilization issues

□ Various storage techniques for individual fragments

□ Hunt for densely populated tables

■ Idea: Split rows according to their “popularity”

Chunk Table

Generic structure

■ Suitable if dataset can be partitioned into dense subsets

■ Derived from Pivot table

■ Middle ground between Universal table and Pivot table

Performance

■ Fewer joins for reconstruction if densely populated subsets can be extracted

■ Indexable

■ Reduced meta-data/data ratio dependent on chunk size

[Figure: Row 0 split into Chunk 0 and Chunk 1]

Further Enhancement: Chunk Folding

■ Combine different schema mappings for best fit

□ Mixes Extension and Chunk Tables

□ Each fragment can be stored in an optimal schema layout

■ Optimal chunk folding depends on

□ Workload

□ Data distribution

□ Data popularity

Querying Chunk TablesQuerying Chunk Tables

Query Transformation

■ Row reconstruction needs many self- and equi-joins

■ Can be automatically translated

Compilation Scheme:

1. Collect all table names and their corresponding columns from the logical source query

2. For each table, obtain the Chunk Tables and the meta-data identifiers representing the used columns

3. For each table, generate a query that filters the correct columns (based on the meta-data identifiers) and aligns the different chunk relations on their ROW column.

4. Each table reference in the logical source query is extended by its generated table definition query
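The compilation scheme above can be sketched roughly as follows. This is a simplified illustration, not the actual translator. The `META` dictionary, the physical table name `chunk_int_str`, and the lowercase column names are assumptions. Step 1 is passed in by hand instead of being extracted by a SQL parser. Step 4 uses a naive string replacement, and values are interpolated into the SQL text only for readability.

```python
import sqlite3
from collections import defaultdict

# Hypothetical meta-data (assumption): per (tenant, logical table), the table
# id and, per logical column, the (chunk id, physical column) that stores it.
META = {
    (17, "Account"): {
        "table_id": 0,
        "columns": {
            "Aid": (0, "int1"),
            "Name": (0, "str1"),
            "Hospital": (1, "str1"),
            "Beds": (1, "int1"),
        },
    },
}


def table_definition_query(tenant, table, used_columns):
    """Steps 2 and 3: build the sub-select that reconstructs the used columns."""
    meta = META[(tenant, table)]
    by_chunk = defaultdict(list)  # chunk id -> [(logical col, physical col)]
    for col in used_columns:
        chunk, phys = meta["columns"][col]
        by_chunk[chunk].append((col, phys))

    chunks = sorted(by_chunk)
    first = f"c{chunks[0]}"
    select, sources, conds = [], [], []
    for chunk in chunks:
        alias = f"c{chunk}"
        select += [f"{alias}.{phys} AS {col}" for col, phys in by_chunk[chunk]]
        sources.append(f"chunk_int_str AS {alias}")
        conds += [
            f"{alias}.tenant = {tenant}",
            f"{alias}.tbl = {meta['table_id']}",
            f"{alias}.chunk = {chunk}",
        ]
        if alias != first:
            # Align the chunk relations on their row column.
            conds.append(f"{alias}.row_id = {first}.row_id")
    return (f"SELECT {', '.join(select)} FROM {', '.join(sources)} "
            f"WHERE {' AND '.join(conds)}")


def rewrite(logical_query, tenant, used):
    """Step 4: extend each table reference by its table definition query.

    `used` maps table name -> used columns (the result of step 1).
    """
    for table, columns in used.items():
        sub = table_definition_query(tenant, table, columns)
        logical_query = logical_query.replace(
            f"FROM {table}", f"FROM ({sub}) AS {table}")
    return logical_query


physical = rewrite("SELECT Beds FROM Account WHERE Hospital = 'State'",
                   17, {"Account": ["Hospital", "Beds"]})
print(physical)

# Run the generated query against an illustrative chunk table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunk_int_str (tenant INTEGER, tbl INTEGER, "
             "chunk INTEGER, row_id INTEGER, int1 INTEGER, str1 TEXT)")
conn.execute("INSERT INTO chunk_int_str VALUES (17, 0, 1, 0, 1042, 'State')")
result = conn.execute(physical).fetchall()
print(result)
```

Because the tenant predicate is injected by the mapping layer, every generated sub-select is restricted to one tenant's rows. When the used columns span several chunks, the generator adds one aligned chunk relation per chunk.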

Example queries

■ Tenant 17

□ SELECT Beds
  FROM Account
  WHERE Hospital = 'State'

■ Replace the Account table with a sub-select on the Chunk table

■ Luck: Both relevant attributes are in the same chunk

□ SELECT Beds
  FROM (SELECT Str1 AS Hospital, Int1 AS Beds
        FROM Chunk_int|str
        WHERE Tenant = 17 AND Table = 0 AND Chunk = 1) AS Account
  WHERE Hospital = 'State'

Overview

□ Schema mappings

□ Chunk Tables

Summary

■ The Cloud as a new paradigm to provide services

■ Applications are one type of such services: SaaS

■ Different forms of scaling: up, out, in

■ Economy of scale by re-using processes: Multiple tenants on a single database instance

□ Problem: Isolation

■ Separate tenants by mapping many logical schemata to single physical schema

■ Many schema mapping techniques

□ Problem: Query transformation

Outlook

■ Native multi-tenancy-aware DB-kernel

■ Workload-aware reorganization

■ Service level agreements for different user categories

■ Build trust despite process sharing

□ Security & Privacy
