History & Motivations –RDBMS History & Motivations (cont’d) … … Concurrent Access Handling...

2015.11 Jae Hyung Kim, Ph.D. Candidate, Department of Computer Science, Yonsei University. Big Data Platforms - History and Motivations


Page 1:

2015.11

Jae Hyung Kim, Ph.D. Candidate, Department of Computer Science, Yonsei University

Big Data Platforms - History and Motivations

Page 2:

Index

1. Introduction

2. RDBMS vs Big Data Platforms

3. Growing Big Data Platforms

Page 3:

1. Introduction

Page 4:

Introduction

• History & Motivations
  – RDBMS

Page 5:

• History & Motivations (cont’d)

Introduction

[Figure labels: Concurrent Access, Handling Failures, Shared Data, User]

Page 6:

• Transaction
  – Powerful abstraction concept which forms the “interface contract” between an application program and a transactional server

Introduction

[Figure: Program Start → Begin Transaction → ... → Commit Transaction → Program End. Labels: Application Lifecycle, Transaction Boundary]

Page 7:

• Transaction (cont’d)

Introduction

The core requirement on a DBMS is ACID guarantees for the set of operations in the same transaction

A concurrency control component guarantees the isolation properties of transactions, for both committed and aborted transactions

A recovery component guarantees the atomicity and durability of transactions
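As a rough illustration of the transaction boundary and the atomicity guarantee, the sketch below uses Python's built-in `sqlite3` module. The `account` table, the `transfer` helper, and the balances are hypothetical and chosen only for this example; it is a minimal sketch, not a description of how any particular DBMS implements recovery.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` inside a single transaction."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        # The debit above was undone by the rollback (atomicity).
        return False


def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


# Hypothetical in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
```

Calling `transfer(conn, "a", "b", 30)` commits both updates together, while `transfer(conn, "a", "b", 500)` fails partway and leaves both balances exactly as they were before the call.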

Page 8:

• RDBMS Architecture – Heavy!!!

Introduction

[Figure: Clients → Requests → Database Server (Request execution threads) → Data Access → Database. Server layers, top to bottom: Language and Interface Layer, Query Decomposition and Optimization Layer, Query Execution Layer, Access Layer, Storage Layer]

To facilitate disk I/O parallelism between different requests

Page 9:

• RDBMS Architecture – How data is stored

Introduction

Page
1) The minimum unit of data transfer between disk and main memory
2) The unit of caching in memory

Slot = A page number + A slot number

A database usually has a certain amount of preallocated disk space that consists of one or more extents. Each extent is a range of pages that are contiguous on disk.

A page number → A disk number + A physical address on disk, by looking up an entry in an extent table and adding a relative offset
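The page-number translation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Extent` fields, the 8 KB page size, and the `locate_page` helper are hypothetical, not the layout of any specific database product.

```python
from bisect import bisect_right
from dataclasses import dataclass

PAGE_SIZE = 8192  # bytes per page; an assumed value for this sketch


@dataclass(frozen=True)
class Extent:
    first_page: int     # page number of the first page in the extent
    num_pages: int      # pages in the extent, contiguous on disk
    disk: int           # disk number holding the extent
    start_address: int  # byte address on that disk of the first page


def locate_page(extent_table, page_no):
    """Map a page number to (disk number, physical byte address).

    `extent_table` must be sorted by `first_page`.
    """
    starts = [e.first_page for e in extent_table]
    i = bisect_right(starts, page_no) - 1  # last extent starting at or before page_no
    if i < 0:
        raise KeyError(f"page {page_no} is not allocated")
    extent = extent_table[i]
    relative = page_no - extent.first_page
    if relative >= extent.num_pages:
        raise KeyError(f"page {page_no} is not allocated")
    # Extent lookup plus a relative offset gives the physical address.
    return extent.disk, extent.start_address + relative * PAGE_SIZE


# Hypothetical extent table: pages 0-99 on disk 0, pages 100-149 on disk 1.
EXTENT_TABLE = [
    Extent(first_page=0, num_pages=100, disk=0, start_address=0),
    Extent(first_page=100, num_pages=50, disk=1, start_address=1_048_576),
]
```

With this table, `locate_page(EXTENT_TABLE, 120)` resolves to disk 1 at the extent's start address plus 20 pages.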

Page 10:

• RDBMS Computational Model – Page model

Introduction

Parallelized transaction execution

Requests → Processing of pages (read or write)

ACID Properties of Transaction → Page based

Concurrency Control and Recovery should be based on page model

t = r(x) r(y) r(z) w(u) w(x)

[Figure: Partial Order over the steps r(x), r(y), r(z), w(u), w(x)]

※ The details of how data is manipulated within the local variables of the executing programs are mostly irrelevant
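To make the page model concrete, the sketch below treats a transaction as nothing more than an ordered list of page reads and writes, using the example t from this slide. The `Step` type and the helper names are hypothetical and introduced only for illustration.

```python
import re
from typing import NamedTuple


class Step(NamedTuple):
    action: str  # "r" for read, "w" for write
    page: str    # name of the page being accessed


def parse(transaction: str) -> list:
    """Turn a string such as 'r(x)w(y)' into a list of Step tuples."""
    return [Step(a, p) for a, p in re.findall(r"([rw])\((\w+)\)", transaction)]


def read_set(steps):
    """Pages the transaction reads."""
    return {s.page for s in steps if s.action == "r"}


def write_set(steps):
    """Pages the transaction writes."""
    return {s.page for s in steps if s.action == "w"}


# The example transaction from the slide.
t = parse("r(x)r(y)r(z)w(u)w(x)")
```

Only the sequence of page accesses is recorded; whatever the program computes between a read and a write is not part of the model, which matches the note about local variables.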

Page 11:

• Needs for huge data from Google
  – More than 15,000 commodity-class PCs
  – Multiple clusters distributed worldwide
  – Thousands of queries served per second
  – One query reads 100's of MB of data
  – One query consumes 10's of billions of CPU cycles
  – Google stores dozens of copies of the entire Web!

Introduction

Conclusion: Need a large, distributed, highly fault-tolerant file system. A traditional DBMS cannot tolerate such a workload.

Page 12:

RDBMS vs Big Data Platforms

Page 13:

• Problems of RDBMS
  – RDBMS’s clustering

RDBMS vs Big Data Platforms

[Figure labels: Data Copy Cost, Transaction, Maintenance Cost]

Performance does not increase as expected

Page 14:

• Problems of RDBMS
  – Scale-up vs Scale-out (Cost perspective)

RDBMS vs Big Data Platforms

Intel Xeon E5-2697 V3 (Haswell-EP)
Intel (Socket 2011-V3) / Tetradeca (14) cores / 28 threads / 64(32)-bit / 2.6GHz / DDR4 / 40 PCI-Express lanes

Intel Core i5 6th Gen 6600 (Skylake)
Intel (Socket 1151) / DDR4 / DDR3L / 64-bit / Quad core / 4 threads / 3.3GHz / Intel HD 530 / 16 PCI-Express lanes

₩250,000

₩3,400,000
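The cost argument can be worked through with a little arithmetic. The sketch below assumes that the lower price belongs to the Core i5-6600 and the higher price to the Xeon E5-2697 V3; the slide transcript does not state the pairing explicitly. It also ignores motherboards, memory, networking, and power, so it is only a rough comparison of CPU price per core.

```python
# Prices in Korean won; the pairing of price to CPU is an assumption.
CPUS = {
    "Xeon E5-2697 V3": {"price_krw": 3_400_000, "cores": 14, "threads": 28},
    "Core i5-6600": {"price_krw": 250_000, "cores": 4, "threads": 4},
}


def price_per_core(cpu):
    """CPU price divided by the number of physical cores."""
    return cpu["price_krw"] / cpu["cores"]


def cores_for_budget(budget_krw, cpu):
    """Total cores obtainable by buying as many whole CPUs as the budget allows."""
    return (budget_krw // cpu["price_krw"]) * cpu["cores"]


xeon = CPUS["Xeon E5-2697 V3"]
i5 = CPUS["Core i5-6600"]

# Scale-up: one Xeon. Scale-out: as many i5 CPUs as the same money buys.
scale_up_cores = xeon["cores"]
scale_out_cores = cores_for_budget(xeon["price_krw"], i5)
```

Under these assumptions, the budget for one 14-core Xeon buys thirteen quad-core i5 CPUs, or 52 cores in total, which is the cost intuition behind scaling out on commodity hardware.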

Page 15:

• Google File System
  – Beginning of the big data platforms
  – Influenced Hadoop
  – Chunk: Analogous to a block, except larger (typically 64MB)
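As a rough illustration of what a fixed chunk size implies, the sketch below maps a byte offset in a file to a chunk index and an offset inside that chunk, assuming the typical 64MB chunk size mentioned above. The function names are hypothetical and are not part of any GFS API.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the typical chunk size


def locate_in_chunk(byte_offset, chunk_size=CHUNK_SIZE):
    """Return (chunk index, offset within that chunk) for a byte offset."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    return divmod(byte_offset, chunk_size)


def chunks_needed(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunks required to hold a file of `file_size` bytes."""
    return -(-file_size // chunk_size)  # ceiling division


MB = 1024 * 1024
```

For example, byte offset 200MB falls in the fourth chunk (index 3), 8MB past its start, and a 200MB file occupies four chunks.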

RDBMS vs Big Data Platforms

Page 16:

• Google File System
  – Read Algorithm (1/2)

RDBMS vs Big Data Platforms

Page 17:

• Google File System
  – Read Algorithm (2/2)

RDBMS vs Big Data Platforms

Page 18:

• Google File System
  – Write Algorithm (1/4)

RDBMS vs Big Data Platforms

Page 19:

• Google File System
  – Write Algorithm (2/4)

RDBMS vs Big Data Platforms

Page 20:

• Google File System
  – Write Algorithm (3/4)

RDBMS vs Big Data Platforms

Page 21:

• Google File System
  – Write Algorithm (4/4)

RDBMS vs Big Data Platforms

Page 22:

• Hadoop
  – HDFS + MapReduce

RDBMS vs Big Data Platforms

128MB file (e.g. /data/hdfs/block1) on Local Filesystem

Page 23:

• Hadoop
  – HDFS + MapReduce (Computational Model)

RDBMS vs Big Data Platforms

On Local Filesystem
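The slide's diagram of the MapReduce computational model is not preserved in this transcript. As a stand-in, the sketch below runs the three usual phases (map, shuffle, reduce) in a single process, using word count as the example. It is plain Python, not the Hadoop API, and every function name is hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce sequentially over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In a real cluster the map calls would run in parallel near the blocks they read and the shuffle would move data between machines; here everything runs in one loop purely to show the data flow.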

Page 24:

Growing Big Data Platforms

Page 25:

Growing Big Data Platforms

Page 26:

• Gartner’s hype cycle 2012

Growing Big Data Platforms

Page 27:

• Gartner’s hype cycle 2013

Growing Big Data Platforms

Page 28:

• Gartner’s hype cycle 2014

Growing Big Data Platforms

Page 29:

• Gartner’s hype cycle 2015
  – Big data dropped from the cycle; big data is now in practice

Growing Big Data Platforms

Page 30:

Q&A

• Thank you