Freeing Data Storage from Cages

Alekh Jindal Jorge Quiané Jens Dittrich

Saarland University, GermanyQCRI, Qatar

WWHow! Freeing Data Storage from Cages

Alekh Jindal

FJorge-Arnulfo Quiané-Ruiz

_,⌥ Jens Dittrich

FInformation Systems Group, Saarland University

http://infosys.cs.uni-saarland.de

_QCRI, Qatar Foundation

http://www.qcri.qa

Abstract. E�cient data storage is a key component of data manag-ing systems to achieve good performance. However, currently datastorage is either heavily constrained by static decisions (e.g. fixeddata stores in DBMSs) or left to be tuned and configured by users(e.g. manual data backup in File Systems). In this paper, we takea holistic view of data storage and envision a virtual storage layer.Our virtual storage layer provides a unified storage framework forseveral use-cases including personal, enterprise, and cloud storage.

1. INTRODUCTIONTo better understand the problem, let us first see some of the typ-

ical data storage scenarios and analyze what they have in common.

1.1 Current Approaches to Data StorageWe start from the simplest use case of a personal user using a File

System Storage, right up to an enterprise user using Cloud Storage.Use Case 1: File System. Consider a researcher, Alice, having twolaptops (one personal, one from her university). Alice stores di↵er-ent types of files at di↵erent locations. For example, she storesmovies on her personal laptop and grant proposals on her univer-sity laptop. Alice also maintains regular copies of her data on ex-ternal devices, in case one of her laptop crashes. For instance, shemaintains copies of grant proposals on hard-drives and recent con-ference talks on USB sticks. Finally, Alice also changes the dataformat (e.g. compressed) of her files in order to save storage space.Use Case 2: RAID. Consider a University IT department usingRAID storage servers to store its data. The RAID system automat-ically stores parity information on di↵erent disks and stripes themto allow for one or more disk failures. The server admin does notneed to worry about the reliability of data. She just needs to choosethe RAID level for the first time. However, changing the RAIDlevel is often a complex task in practice.Use Case 3: Relational DBMS. Consider a car manufacturer us-ing an RDBMS for managing its inventory and sales. The manufac-turer provides the schema definition as well as the data to store. TheRDBMS takes care of physical data storage, recovery, and data lay-outs (e.g. row store in IBM DB2). For an expert DBA, the RDBMS

⌥Work done at Saarland University.

This article is published under a Creative Commons Attribution License(http://creativecommons.org/licenses/by/3.0/), which permits distributionand reproduction in any medium as well allowing derivative works, pro-vided that you attribute the original work to the author(s) and CIDR 2013.6th Biennial Conference on Innovative Data Systems Research (CIDR ’13)January 6-9, 2013, Asilomar, California, USA.

provides several additional data tunings knobs such as replication,indexing, partitioning, and storage locations.Use Case 4: Cloud. Consider a web analytics start-up companyusing Cloud storage to store large volumes of web logs. The start-up company needs to pick a cloud provider for its data. However,it does not need to worry about the data placement. The CloudStorage automatically replicates and distributes data over severalstorage locations in order to guarantee the availability of their data.Furthermore, the start-up company does not care about adding newstorage locations: the Cloud Storage does so automatically. Addi-tionally, the company may chose to store the data in multiple datalayouts [10] or indexes [5].

All of these use-cases are very di↵erent, right? No! We arguethat they are all facets of the same single data storage problem.We observe that in each of the four Use Cases, the users are im-plicitly or explicitly answering the following three key questions:(1) What to store? (i.e. which parts to store from a dataset or fileand how many times), (2) Where to store? (i.e. on which devices orlocations), and (3) How to store? (i.e. in which layout or format).Table 1 categorizes these storage decisions in the four Use Casesand labels them according to whether they use fixed ( ), flexible &manual ( ), or flexible & automatic ( ) storage decisions.What to store? The second column of Table 1 shows the what de-cisions in the four Use Cases. In Use Case 1 (UC1 for short), thefile system requires Alice to make manual choices of which mas-ter copies as well as the backup copies of data to store. In UC2,RAID automatically creates data replicas or parity, depending onthe RAID level. In UC3, an RDBMS typically stores the data aswell as the recovery log. A DBA has further control to create in-dexes and materialized views to speed up query execution. Simi-larly, Fractured Mirrors makes a fixed decision of storing exactlytwo copies of the data. Finally, in UC4, Cloud Storage is flexibleto create one or more data replicas for availability.Where to store? The third column of Table 1 shows the wheredecisions in the four Use Cases. In UC1, Alice has to manuallydecide where to store each type of data, e.g. movies on her per-sonal laptop and grant proposals on her university laptop. Goingfurther, RAID (UC2) makes fixed decisions (based on the RAIDlevel) to store data on a set of storage devices, which can be addedor removed manually. Similarly, an RDBMS (UC3) allows usersto define table spaces, typically residing on di↵erent storage loca-tions, for their application. However, Fractured Mirrors stores thedata on two (fixed) di↵erent machines. Cloud Storage (UC4), onthe other hand, is fully flexible on where to store the data. A cloudprovider can change data storage locations and even provision ad-ditional physical machines automatically.How to store? The fourth column of Table 1 shows the how deci-sions in the four Use Cases. Again, users in UC1 must manually

Alekh Jindal

_,⌥ Jens Dittrich

http://www.qcri.qa

Alekh Jindal

_,⌥ Jens Dittrich

http://www.qcri.qa

Alekh Jindal

_,⌥ Jens Dittrich

http://www.qcri.qa

Alekh Jindal

_,⌥ Jens Dittrich

http://www.qcri.qa

Data Storage is Easy!

Raw data

Data Management

System

Raw data

Data Management

System

Queries

Raw data

Data Management

System

Is performance as expected?

Is your Data Storage Efficient?

What to store?

TPC-H Lineitem

copy 1 copy 2 copy 3

TPC-H Lineitem

Where to store?

TPC-H Lineitem

How to store?

What, Where, and Howin our everyday life

pdfppt

However

WhatWhereHow

TPC-HLineitem

Data Management

System

Store Engine

TPC-HLineitem

Data Management

System

Store Engine

DBMS-X

TPC-HLineitem

Store Engine

DBMS-X

TPC-HLineitem

WhatWhere

pdfppt

WWHow!

WWHow!hat

WWHow! VisionFlexible on What, Where, and How1

WWHow! VisionFlexible on What, Where, and How1Declarative Data Storage Language2

WWHow! VisionFlexible on What, Where, and How1Declarative Data Storage Language2Data Storage Optimizer3

DSL DSL

WWHow! Language

Physical Storage Interface

Data Management

System

WWHow! Layer

WWHow!hat

can we do?

I want to store 3 replicas of the weblogs on the cloud with a different index each replica

STORE ‘/webApp/logs/UserVisits.log’

STORE ‘/webApp/logs/UserVisits.log’WHAT * AS rep-1, * AS rep-2, * AS rep-3

STORE ‘/webApp/logs/UserVisits.log’WHAT * AS rep-1, * AS rep-2, * AS rep-3WHERE ec2-007-23-compute-1-amazonaws.com

STORE ‘/webApp/logs/UserVisits.log’WHAT * AS rep-1, * AS rep-2, * AS rep-3WHERE ec2-007-23-compute-1-amazonaws.comHOW idx(url) FOR rep-1, idx(sourceIP) FOR rep-2, idx(visitDate) FOR rep-3;

I want to store 3 replicas of the weblogs with a different index each replica and

with high privacy

STORE ‘/webApp/logs/UserVisits.log’WHAT * AS rep-1, * AS rep-2, * AS rep-3HOW idx(url) FOR rep-1,

idx(sourceIP) FOR rep-2,idx(visitDate) FOR rep-3;

with high privacy

CONSTRAINT Privacy = ‘high’

with high privacy

CONSTRAINT Privacy = ‘high’

with high privacy

job for the WWhow! data storage optimizer

I want to store the weblogs with high privacy and if

possible with high availability

STORE ‘/webApp/logs/UserVisits.log’

STORE ‘/webApp/logs/UserVisits.log’PREFERENCE Availability = ‘high’

STORE ‘/webApp/logs/UserVisits.log’PREFERENCE Availability = ‘high’ CONSTRAINT Privacy = ‘high’

job for the WWhow! data storage optimizer

What about existing

technology?

WWHow!here

do we go?

Data Management System

TPC-HLineitem

Store Enginero

WWHow! Layer

TPC-HLineitem

WWHow! Layer

TPC-HLineitem

WWHow! Layer

Data Management Systems

TPC-HLineitem

DMS-X DMS-Y

WWHow!WWHowto proceed?

WWHow! Language

WWHow!Logi

Data Management

System

WWHow! Language

WWHow!Logi

Data Management

System

API (Logical View)WWHow! Language

WWHow!Logi

Data Management

System

WWHow! Language

WWHow!Logi

Data Management

System

WWHow! Language

WWHow!Logi

Data Management

System

WWHow! LanguageInterpreter

WWHow! Language

WWHow!Logi

Data Management

System

WWHow! ControllerStorage themes

WWHow! Language

WWHow!Logi

Data Management

System

WWHow! Language

WWHow!Logi

Data Management

System

WWHow! OptimizerWhat

Where How

Data Storage Optimizer

WWHow! Language

WWHow!Logi

Data Management

System

Where How

WWHow! Language

WWHow!Logi

Data Management

System

Where How

Data Access Optimizer

WWHow! Language

WWHow!Logi

Data Management

System

Where How

Data Access Optimizer

Freeing Data Storage from Cages -...

Transcript of Freeing Data Storage from Cages -...

Freeing Data Storage from Cages -...

Documents

Transcript of Freeing Data Storage from Cages -...

Runtime Measurements in the Cloud: Observing, Analyzing ...da.qcri.org/jquiane/jorgequianeFiles/papers/pvldb10a.pdf · Runtime [sec] Measurements EC2 Cluster Local Cluster Figure

Oracle Storage Connect Plug-in for Oracle ZFS Storage ...download.oracle.com/otn-pub/otn_software/storage-appliance/Storage... · Oracle Storage Connect Plug-in for Oracle ZFS Storage

Storage Area Network (SAN) 1. Outline Shared Storage Architecture Direct Access Storage (DAS) – SCSI – RAID Network Attached Storage (NAS) Storage Area.

Types Of Storage Device 10 LECTURE. Outline Categorizing Storage Devices Magnetic Storage Devices Optical Storage Devices.

Products_Battery Storage - Storage Battery System

Farm Storage Facility Farm Storage Facility LoansLoansmda.maryland.gov › ... › Farm-Storage-Facility-Loans.pdfdesigned for storage il t t • Walk-in prefabricated storage coolers

Instrument Storage Stringed Instrument Storage …sweets.construction.com/swts_content_files/1458/294164.pdf · Instrument Storage Stringed Instrument Storage Garment Storage Music

88 CHAPTER SECONDARY STORAGE. 2 Storage Primary storage Volatile Temporary Secondary storage Nonvolatile Permanent Secondary storage characteristics Media.

RDFind: Scalable Conditional Inclusion Dependency ...da.qcri.org/jquiane/jorgequianeFiles/papers/sigmod16a.pdf · The Resource Description Framework (rdf) [18] is a exi-ble data model

Cloud Storage Security & Privacy - Troopers IT-Security ...€¦ · Cloud Storage. Cloud Storage. Cloud Storage. Cloud Storage. Why cloud storage? Price. Why cloud storage? Scalable.

Storage(Management · Typical(Storage(Hierarchy registers main(memory((RAM) localsecondary(storage (local(disks,(SSDs) Larger slower storage devices remote(secondary(storage

STORAGE Enterprise Storage Modular Storage Storage Area Networks Tired Storage Architecture Virtual Infrastructure Design.

Object storage APIsObject storage APIs This section provides the APIs to manage your object storage, object storage users, and object storage group. The object storage workflow includes

Tivoli Storage Manager Storage Agent

Energy Storage Technologies Battery Storage for Grid ... · PDF fileEnergy Storage Technologies Battery Storage for Grid Stabilization ... Battery Storage for Grid Stabilization ...

Storage and Memory - St. Mary's Catholic School...Storage and Memory Storage Also called as backing storage/Secondary storage Examples of storage media are Hard disks Optical disks(

Highly Scalable Cognitive Storage Management Platform ... · Storage Management Evolution Storage Storage Storage Manager Storage Manager Storage Manager üEach storage device type

HYDROGEN GAS STORAGE - Hydrogen Storage

On-Premises Storage Hybrid Cloud Storage Public Cloud Storage.

Storage Pro/Self Storage Project