2014.11.25- SLIDE 1IS 257 – Fall 2014 NewSQL and VoltDB University of California, Berkeley School...

2014.11.25- SLIDE 1IS 257 – Fall 2014

NewSQL and VoltDB

University of California, Berkeley

School of Information

IS 257: Database Management

2014.11.25- SLIDE 2

Project Presentation

• Sign up at • http://doodle.com/vtds6huqx45i95x9

IS 257 – Fall 2014

2014.11.25- SLIDE 3

History of the World, Part 1

• Relational Databases – mainstay of business

• Web-based applications caused spikes– Especially true for public-facing e-Commerce

sites• Developers begin to front RDBMS with

memcache or integrate other caching mechanisms within the application (ie. Ehcache)

2014.11.25- SLIDE 4

Scaling Up

• Issues with scaling up when the dataset is just too big

• RDBMS were not designed to be distributed• Began to look at multi-node database solutions• Known as ‘scaling out’ or ‘horizontal scaling’• Different approaches include:

– Master-slave– Sharding

2014.11.25- SLIDE 5

Scaling RDBMS – Master/Slave• Master-Slave

–All writes are written to the master. All reads performed against the replicated slave databases

–Critical reads may be incorrect as writes may not have been propagated down

–Large data sets can pose problems as master needs to duplicate data to slaves

2014.11.25- SLIDE 6

Scaling RDBMS - Sharding

• Partition or sharding–Scales well for both reads and writes–Not transparent, application needs to be

partition-aware–Can no longer have relationships/joins

across partitions–Loss of referential integrity across

shards

2014.11.25- SLIDE 7

Other ways to scale RDBMS

• Multi-Master replication• INSERT only, not UPDATES/DELETES• No JOINs, thereby reducing query time

– This involves de-normalizing data• In-memory databases

2014.11.25- SLIDE 8

• NoSQL databases adopted these approaches to scaling, but lacked ACID transaction and SQL

• At the same time, many Web-based services needed to deal with Big Data (the Three V’s we looked at last time) and created custom approaches to do this

• In particular, MapReduce…

IS 257 – Fall 2014

2014.11.25- SLIDE 9

MapReduce and Hadoop

• MapReduce developed at Google• MapReduce implemented in Nutch

– Doug Cutting at Yahoo! – Became Hadoop (named for Doug’s child’s

stuffed elephant toy)

IS 257 – Fall 2014

2014.11.25- SLIDE 10

Example

• Page 1: the weather is good• Page 2: today is good• Page 3: good weather is good.

From “MapReduce: Simplified data Processing… ”, Jeffrey Dean and Sanjay Ghemawat

IS 257 – Fall 2014

2014.11.25- SLIDE 11

Map output

• Worker 1: – (the 1), (weather 1), (is 1), (good 1).

• Worker 2: – (today 1), (is 1), (good 1).

• Worker 3: – (good 1), (weather 1), (is 1), (good 1).

IS 257 – Fall 2014

2014.11.25- SLIDE 12

Reduce Input

• Worker 1:– (the 1)

• Worker 2:– (is 1), (is 1), (is 1)

• Worker 3:– (weather 1), (weather 1)

• Worker 4:– (today 1)

• Worker 5:– (good 1), (good 1), (good 1), (good 1)

IS 257 – Fall 2014

2014.11.25- SLIDE 13

Reduce Output

• Worker 1:– (the 1)

• Worker 2:– (is 3)

• Worker 3:– (weather 2)

• Worker 4:– (today 1)

• Worker 5:– (good 4)

IS 257 – Fall 2014

2014.11.25- SLIDE 14

But – Raw Hadoop means code• Most people don’t want to write code if

they don’t have to• Various tools layered on top of Hadoop

give different, and more familiar, interfaces

• Hbase – intended to be a NoSQL database abstraction for Hadoop

• Hive and it’s SQL-like language

IS 257 – Fall 2014

2014.11.25- SLIDE 15

Introduction to PigPIG – A data-flow language for MapReduce

2014.11.25- SLIDE 16

Pig Latin

• Data flow language– User specifies a sequence of operations to

process data– More control on the processing, compared

with declarative language• Various data types are supported• ”Schema”s are supported• User-defined functions are supported

2014.11.25- SLIDE 17

Motivation by Example

• Suppose we have user data in one file, website data in another file.

• We need to find the top 5 most visited pages by users aged 18-25

2014.11.25- SLIDE 18

Hive - SQL on top of Hadoop

IS 257 – Fall 2014

2014.11.25- SLIDE 19

• A database/data warehouse on top of Hadoop– Rich data types (structs, lists and maps)– Efficient implementations of SQL filters, joins and

group-by’s on top of map reduce

• Allow users to access Hive data without using Hive

• Link:– http://svn.apache.org/repos/asf/hadoop/

hive/trunk/

IS 257 – Fall 2014

2014.11.25- SLIDE 20

Hive Architecture

Hive CLIDDLQueriesBrowsing

Map Reduce

Thrift Jute JSONThrift API

MetaStore

Web UIMgmt, etc

Hive QL

Planner ExecutionParser Planner

IS 257 – Fall 2014

2014.11.25- SLIDE 21

Overview

• Review– Big Data Technologies– Hadoop, Pig, Hive…

• NewSQL– The concept– VoltDB

IS 257 – Fall 2014

2014.11.25- SLIDE 22

• One problem with Hadoop/MapReduce is that it is fundamental batch oriented, and everything goes through a read/write on HDFS for every step in a dataflow

• Spark was developed to leverage the main memory of distributed clusters and to, whenever possible, use only memory-to-memory data movement (with other optimizations

• Can give up to 100fold speedup over MRIS 257 – Fall 2014

2014.11.25- SLIDE 23

Meanwhile…

• The database community continues to develop new approaches, some of which try to provide the benefits of SQL and ACID relations to big data

• As we have seen before there is a broad spectrum of database systems and capabilities…

IS 257 – Fall 2014

2014.11.25- SLIDE 24IS 257 – Fall 2014

2014.11.25- SLIDE 25

NewSQL

• NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional RDBMS

IS 257 – Fall 2014

2014.11.25- SLIDE 26

NewSQL

• NewSQL systems focus on workloads that have large numbers of transactions that are:– Short duration– Touch a small fraction of data in the DB using

index lookups (i.e., no full table scans or large distributed joins)

– Repetitive (i.e. executing the same queries with different inputs)

IS 257 – Fall 2014

2014.11.25- SLIDE 27

NewSQL

• There are many different architectures for NewSQL DBs– E.g., Google Spanner, SAP HANA, Clustrix,

VoltDB, etc.• We are going to look at just one example

(VoltDB) and see how it works for very high-velocity workloads..

IS 257 – Fall 2014

2014.11.25- SLIDE 1IS 257 – Fall 2014 NewSQL and VoltDB University of California, Berkeley School...

Documents

Transcript of 2014.11.25- SLIDE 1IS 257 – Fall 2014 NewSQL and VoltDB University of California, Berkeley School...

Revenue Assurance in the Age of 5G - VoltDB

VoltDB and the 5G Revolution · VoltDB and the 5G Revolution Michael Stonebraker VoltDB is the commercialization of H-Store, a prototype built at M.I.T. in the mid 2000’s. One can

Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB

VoltDB - Stonebraker Live! - Santa Clara, CA - 1.29.13

Administrator's Guidedownloads.voltdb.com/documentation/v7docs/AdminGuide.pdfManaging VoltDB Databases When using the VoltDB Enterprise Edition, you will also need a license file,

Voltdb vs Mysql

Using VoltDBdownloads.voltdb.com/documentation/UsingVoltDB.pdfEncrypting VoltDB Communication Using TLS/SSL ..... 118 12.7.1. Configuring TLS/SSL on the VoltDB Server ..... 119 12.7.2.

VoltDB on SolftLayer Cloud

ST7282 SPEC V1.6 20170398 EXTC -4012 -257 132 DR4 -2006 -257 99 HDIR -3953 -257 133 DR4 -1947 -257 100 HDIR -3894 -257 134 DR3 -1888 -257 ST7282 Version 1.6 Page 12 of PAD No. PIN

VoltDB talk at QCON-Brasil

Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB

Model 257 and 257-L (Watermark 200) Soil Moisture Sensors.campbellsci.com/documents/au/manuals/257.pdf · Soil Moisture Sensor 1. General Description The 257 (Watermark 200) soil

A memcached-compatible Key/Value store built on the VoltDB ...€¦ · A memcached-compatible Key/Value store built on the VoltDB RDBMS VoltDB, Inc. Twitter: @voltdb the NewSQL database

VoltDB: an SQL Developer’s Perspective Tim Callaghan, VoltDB Field Engineer tcallaghan@voltdb.com.

...6 1037: (VAS) 16X’ 15X Burundi Select Infos 153 257 154 257 Media SMS 155 257 &!=!1! 156 257 160 257 ONKODTELECOM 161 257 161 XX XXX 257: M. Hakizimana Constaque Agence de Régulation

Basic Machine Learning: Linear Models Alexander Fraser CIS, LMU München 2014.11.25 WSD and MT.

The NewSQL database for high velocity applications Introduction to VoltDB Big Data & Analytics – Unites States AFPOA Fred Holahan, CMO, VoltDB, Inc. e:

Using VoltDBdownloads.voltdb.com/documentation/v4docs/UsingVoltDB.pdfUsing VoltDB Abstract This book explains how to use VoltDB to design, build, and run high performance applica-tions.

How to Build Cloud-based Microservice Environments with Docker and VoltDB

Planning Guide - VoltDB