CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1...

35
CSC 2017 Data Technologies Exercises Lecturer: Alberto Pace Tutorial/Exercises: Andreas-Joachim Peters mail-to [email protected]

Transcript of CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1...

Page 1: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

CSC 2017 Data Technologies Exercises

Lecturer: Alberto Pace

Tutorial/Exercises: Andreas-Joachim Peters

mail-to [email protected]

Page 2: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

TopicsTutorial 1: basic tools for IO measurements and caching effects interactive tutorial

Exercise 1: implementation of Cloud Storage

Exercise 2: Basics of Encryption & RAID technology

Wednesday 1 hour

Wednesday 2 hours Thursday 0-1 hour

Thursday 1-2 hours

Schedule

Page 3: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

What do you know already?

http://etc.ch/84Yd

Please open this anonymous online poll with your phone or laptop …

Progress

Page 4: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

TutorialInteractive

Page 5: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

IO Measurements

Skip caching using

direct IO

Measurement Tools

in non-virtual machines

Page 6: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node
Page 7: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

IO Performance Throughput & Latency & IOPS

Throughput = IO volume / time IOPS = IO operations / time

Page 8: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

IO Performance Latency with multiple clients

Single IO client

Multiple IO clients

Page 9: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

IO Performance IOPS improvements using queuing

IO Queuing allows to trade an average increase in latency for more IOPS! Requests can be reordered to match their physical location on disk and reduce the seek time.

Page 10: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Latency Sequential Operations

Page 11: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Latency Optimization Compound operations - vector read

Page 12: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Latency Optimization Asynchronous Requests - read-ahead

Page 13: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Caching

Page 14: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Tiering

Page 15: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Useful Commands Linux

Realtime, CPU, System Time Measurement time <program>

Copy/Block IO tool dd

Page 16: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Useful Commands Linux

Buffer Cache/Block IO Activityvmstat -n 1

Device Activityiostat -x 1 dstat

Task Overview (proc interface)top, htop, iftop ...

Page 17: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Useful Commands Linux

Trace system commands of a program strace <program> [program args] -f follow forks-c count sys calls -ttt show high time resolution → man strace !

Page 18: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Useful Commands Linux

Flush dirty pages as user or rootsync

Clean page cache → as user:write a new file bigger than the memory size → as root echo 3 > /proc/sys/vm/drop_caches

Page 19: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Exercise 1http://cern.ch/go/9JGC

Page 20: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Cloud Storage Setup BashCloud

REDIS Server

redis_cli

REDIS protocol

Storage Logic: → implemented in client

by you !scalable &

fault tolerant

Page 21: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Cloud Storage Object Storage

basic principles for the exercise

sharding: files are placed and located using a distributed hash table (DHT)

the DHT can be changed to change the storage configuration

files are located computing the SHA1 checksum of their filename in hex representation

files get replicated to each neighbouring node e.g. every file has 3 copies

files can be listed using a 'bucket’

Page 22: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

File Location using Consistent Hashing

Page 23: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Bucket Implementationbuckets are represented by a set of file names e.g. ls / 1.jpg 2.jpg 3.jpg

a set is more suitable than a list because it does not allow duplicated file names

one can also shard buckets for scalability purposes

to list a directory one combines the listing of all participating servers

Page 24: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Basic Ingredients of Cloud Storage

Objects: K-V Store API GET SET DELETE KEYS

Collections:SET API ADD DELETE LISTMEMBERS

Scalability: Sharding of Objects and Collections

Redundancy: Replication & Erasure Encoding

Page 25: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

KV Stores Simple and Complex Data Types available in

REDIS

Files

Bucket

Page 26: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

File

Page 27: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

File

Page 28: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

File

DHT

Page 29: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

DHT

Node 1 Node 2 Node 3 Node 4

File.0 File.1 File.2 File.3 File.4 File.5

hash by name

Page 30: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

File

DHT

Node 1 Node 2 Node 3 Node 4

hash by content

count=2count=1

count=1

count=1 count=1

count=1

Page 31: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Node 1 Node 2 Node 3

Replication Schemes

client side

write pathclient replicates to three nodes

According to the CAP theorem, is that an CA, CP or AP system?

C = Consistency A = Availability P = Partition Tolerance

read pathchoose random node

Page 32: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Node 1 Node 2 Node 3

Replication Schemes

server side

write pathclient replicates trough one node

According to the CAP theorem, is that an CA, CP or AP system?

C = Consistency A = Availability P = Partition Tolerance

read pathfrom write entry node

Page 33: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Exercise 2

Page 34: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

ParityParity defined by XOR Operation

P = X ^ Y

Parity algebra for reconstruction:

P = X^Y Y = P^X X = P^Y

Page 35: CSC-2017-Introduction · 2018. 11. 21. · REDIS Files Bucket. File. File. File DHT. DHT Node 1 Node 2 Node 3 Node 4 File.0 File.1File.2File.3File.4File.5 hash by name. File DHT Node

Data Striping

Example RAID-0 doubles avg. throughput for rw

double avg. IOPS

Disk 1 Disk 2