History & Motivations –RDBMS History & Motivations (cont’d) … … Concurrent Access Handling...

Post on 06-Jan-2018


2015.11

Jae Hyung Kim, Ph.D. Candidate, Department of Computer Science, Yonsei University

Big Data Platforms: History and Motivations

Index

1. Introduction

2. RDBMS vs Big Data Platforms

3. Growing Big Data Platforms

1. Introduction

Introduction

• History & Motivations – RDBMS

• History & Motivations (cont’d)

[Figure labels: Concurrent Access, Handling Failures, Shared Data, User]

• Transaction
– Powerful abstraction concept that forms the “interface contract” between an application program and a transactional server

[Figure: the application lifecycle runs from Program Start to Program End; inside it, the transaction boundary runs from Begin Transaction ... to Commit Transaction]
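
The transaction boundary inside an application lifecycle can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `account` table, the `transfer` function, and the amounts are hypothetical and do not come from the slides.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two hypothetical accounts inside one transaction."""
    try:
        conn.execute("BEGIN")  # Begin Transaction
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute("COMMIT")  # Commit Transaction
        return True
    except ValueError:
        conn.execute("ROLLBACK")  # undo every change made since BEGIN
        return False


# Program Start
# isolation_level=None puts sqlite3 in autocommit mode, so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])

ok = transfer(conn, 1, 2, 30)       # fits the balance, so it commits
failed = transfer(conn, 1, 2, 500)  # would overdraw, so it rolls back
balances = dict(conn.execute("SELECT id, balance FROM account"))
conn.close()
# Program End
```

The second call leaves both balances untouched: either all operations between begin and commit take effect, or none do.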

• Transaction (cont’d)

The core requirement on a DBMS is ACID guarantees for the set of operations in the same transaction

– A concurrency control component guarantees the isolation properties of transactions, for both committed and aborted transactions

– A recovery component guarantees the atomicity and durability of transactions

• RDBMS Architecture – Heavy!!!

Layers of the database server
– Language and Interface Layer
– Query Decomposition and Optimization Layer
– Query Execution Layer
– Access Layer
– Storage Layer

[Figure: Clients send Requests to the Database Server; Request execution threads perform Data Access on the Database]

Request execution threads: to facilitate disk I/O parallelism between different requests

• RDBMS Architecture – How data is stored

Page
1) The minimum unit of data transfer between disk and main memory
2) The unit of caching in memory

Slot
– Addressed by a page number + a slot number

A database usually has a certain amount of preallocated disk space that consists of one or more extents. Each extent is a range of pages that are contiguous on disk.

A page number is translated into a disk number + a physical address on disk by looking up an entry in an extent table and adding a relative offset
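
The page-number translation described above can be sketched as follows. This is a minimal illustration, not any particular DBMS's layout: the `Extent` fields, the 8 KB `PAGE_SIZE`, and the table contents are all assumed values.

```python
from bisect import bisect_right
from dataclasses import dataclass

PAGE_SIZE = 8192  # bytes per page (assumed value)


@dataclass(frozen=True)
class Extent:
    first_page: int  # first page number covered by this extent
    num_pages: int   # number of pages, contiguous on disk
    disk: int        # disk number holding the extent
    start_addr: int  # physical byte address of the extent's first page


# Hypothetical extent table, sorted by first_page.
EXTENT_TABLE = [
    Extent(first_page=0, num_pages=100, disk=0, start_addr=0),
    Extent(first_page=100, num_pages=50, disk=1, start_addr=1_048_576),
]


def locate_page(page_no, table=EXTENT_TABLE):
    """Translate a page number into (disk number, physical address)."""
    starts = [ext.first_page for ext in table]
    i = bisect_right(starts, page_no) - 1  # last extent starting at or before page_no
    if i < 0 or page_no >= table[i].first_page + table[i].num_pages:
        raise KeyError(f"page {page_no} is not allocated")
    ext = table[i]
    relative_offset = (page_no - ext.first_page) * PAGE_SIZE
    return ext.disk, ext.start_addr + relative_offset
```

For example, `locate_page(120)` finds the second extent and adds the offset of 20 pages to that extent's start address.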

• RDBMS Computational Model – Page model

Parallelized transaction execution
– Requests → Processing of pages (read or write)
– ACID Properties of Transaction → Page based
– Concurrency Control and Recovery should be based on the page model

Total order: t = r(x) r(y) r(z) w(u) w(x)

[Figure: the same operations r(x), r(y), r(z), w(u), w(x) drawn as a partial order]

※ The details of how data is manipulated within the local variables of the executing programs are mostly irrelevant
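
As a rough sketch of the page model, a transaction can be represented as nothing more than a sequence of read and write operations on pages. The `Op`, `parse`, and `conflicts` names are hypothetical, and the conflict rule used here (same page, at least one write) is the usual textbook definition rather than something stated on the slide.

```python
from typing import NamedTuple


class Op(NamedTuple):
    kind: str  # "r" for read, "w" for write
    page: str  # identifier of the page being accessed


def parse(schedule: str) -> list[Op]:
    """Parse a string such as 'r(x) w(x)' into a list of Op tuples."""
    return [Op(token[0], token[2:-1]) for token in schedule.split()]


def conflicts(a: Op, b: Op) -> bool:
    """Two operations conflict if they touch the same page and at least one writes."""
    return a.page == b.page and "w" in (a.kind, b.kind)


# The example transaction from the slide, as a total order.
t = parse("r(x) r(y) r(z) w(u) w(x)")
read_set = {op.page for op in t if op.kind == "r"}
write_set = {op.page for op in t if op.kind == "w"}
```

Nothing about local variables or computation appears in this representation, which is the point of the remark above: only the page accesses matter to concurrency control and recovery.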

• Needs for huge data from Google
– More than 15,000 commodity-class PCs
– Multiple clusters distributed worldwide
– Thousands of queries served per second
– One query reads 100s of MB of data
– One query consumes 10s of billions of CPU cycles
– Google stores dozens of copies of the entire Web!

Conclusion: Need a large, distributed, highly fault-tolerant file system; a traditional DBMS cannot tolerate this workload

2. RDBMS vs Big Data Platforms

• Problems of RDBMS
– RDBMS’s clustering

[Figure labels: Data Copy Cost, Transaction, Maintain cost]

Performance does not increase as expected

• Problems of RDBMS
– Scale-up vs Scale-out (Cost perspective)

Intel Xeon E5-2697 v3 (Haswell-EP): ₩3,400,000
Intel (Socket 2011-V3) / tetradeca (14) cores / 28 threads / 64(32)-bit / 2.6GHz / DDR4 / 40 PCI-Express lanes

Intel Core i5 6th-generation 6600 (Skylake): ₩250,000
Intel (Socket 1151) / DDR4 / DDR3L / 64-bit / quad core / 4 threads / 3.3GHz / Intel HD 530 / 16 PCI-Express lanes
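
The cost argument can be made concrete with a little arithmetic. The sketch below assumes that the 3,400,000 price belongs to the Xeon and the 250,000 price to the Core i5, and it uses core count as a deliberately crude proxy for capacity, ignoring clock speed, memory, and everything else in a server.

```python
# Prices in Korean won and core counts as listed on the slide
# (the pairing of price to CPU is an assumption).
cpus = {
    "Xeon E5-2697 v3": {"price_krw": 3_400_000, "cores": 14},
    "Core i5-6600": {"price_krw": 250_000, "cores": 4},
}


def price_per_core(spec):
    """Price of one core in won."""
    return spec["price_krw"] / spec["cores"]


xeon = price_per_core(cpus["Xeon E5-2697 v3"])
i5 = price_per_core(cpus["Core i5-6600"])
ratio = xeon / i5  # how many times more one Xeon core costs

for name, spec in cpus.items():
    print(f"{name}: {price_per_core(spec):,.0f} won per core")
```

Under these assumptions a core on the scale-up part costs several times as much as a core on the commodity part, which is the motivation for scaling out.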

• Google File System
– Beginning of the big data platforms
– Influenced Hadoop
– Chunk: analogous to a block, except larger (typically 64MB)
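
Because chunks have a fixed size, a byte offset within a file maps to a chunk by integer division. The helper names below are hypothetical, and the sketch covers only this arithmetic, not any lookup of chunk locations.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the typical chunk size named above


def chunk_index(offset):
    """Index of the chunk that contains the given byte offset."""
    return offset // CHUNK_SIZE


def chunks_for_range(offset, length):
    """Indices of all chunks touched by reading `length` bytes from `offset`."""
    if length <= 0:
        return []
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))


def num_chunks(file_size):
    """Number of chunks needed to hold a file of `file_size` bytes."""
    return -(-file_size // CHUNK_SIZE)  # ceiling division
```

A read that straddles a chunk boundary, such as `chunks_for_range(CHUNK_SIZE - 10, 20)`, touches two chunks.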

• Google File System
– Read Algorithm (1/2)

• Google File System
– Read Algorithm (2/2)

• Google File System
– Write Algorithm (1/4)

• Google File System
– Write Algorithm (2/4)

• Google File System
– Write Algorithm (3/4)

• Google File System
– Write Algorithm (4/4)

• Hadoop
– HDFS + MapReduce

[Figure label: 128MB file (e.g. /data/hdfs/block1) on Local Filesystem]

• Hadoop
– HDFS + MapReduce (Computational Model)

[Figure label: On Local Filesystem]
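
The MapReduce computational model can be sketched in a single Python process with the classic word-count job. This is not the Hadoop API: `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical names, and each inner list of lines stands in for one input split.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def run_job(splits):
    """Run map over every line of every split, then shuffle, then reduce."""
    mapped = (pair for split in splits for line in split for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))


word_counts = run_job([["big data", "data platforms"], ["Big platforms"]])
```

In a real cluster the map calls run in parallel on the nodes that hold each split, and the shuffle moves data across the network; the sketch keeps only the data flow.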

3. Growing Big Data Platforms

• Gartner’s hype cycle 2012

• Gartner’s hype cycle 2013

• Gartner’s hype cycle 2014

• Gartner’s hype cycle 2015
– Big data dropped from the cycle; big data is now in practice

Q&A

• Thank you