Pastis: a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

17
1 JTE HPC/FS Pastis: a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca Fabio Picconi Pierre Sens LIP6, Université Paris 6 – CNRS, Paris, France INRIA, Rocquencourt, France

description

Pastis: a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca Fabio Picconi Pierre Sens LIP6, Université Paris 6 – CNRS, Paris, France INRIA, Rocquencourt, France. Outline. DHT-based File Systems Pastis Performance evaluation. Distributed file systems. - PowerPoint PPT Presentation

Transcript of Pastis: a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

Page 1: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

1JTE HPC/FS

Pastis: a peer-to-peer file system for persistant large-scale storage

Jean-Michel BuscaFabio Picconi

Pierre Sens

LIP6, Université Paris 6 – CNRS, Paris, FranceINRIA, Rocquencourt, France

Page 2: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

2JTE HPC/FS

1. DHT-based File Systems

2. Pastis

3. Performance evaluation

Outline

Page 3: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

3JTE HPC/FS

Distributed file systems

Client-server P2P

LAN

(100)

NFS -

Organization(10.000)

AFS FARSITE

Pangaea

Internet(1.000.000)

- Ivy *Oceanstore *

Pastis *

scalability(number of nodes)

architecture

* uses a Distributed Hash Table (DHT) to store data

Page 4: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

4JTE HPC/FS

Distributed Hash Tables

52

24

7540

18

91

32

66

83

Page 5: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

5JTE HPC/FS

DHTs

logical address space

52

24

7540

18

91

32

66

83

South America

North America

North America

Australia

Asia

Asia

Europe

Europe

Asia

high latency,low bandwidthbetween logical neighbors

Overlay network

Page 6: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

6JTE HPC/FS

Insertion of blocks in DHT

04F2

5230

834BC52A

8909

8BB2

3A79

8954

8957AC78

895D

E25A

04F2

3A79

5230

834B8909

8954

8957

8BB2

AC78

C52A

E25A

k = 8958

k = 8959

put(8959,block)

root ofkey 8959 block

Address space

replica

895D

Page 7: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

7JTE HPC/FS

PAST: Storage System

PAST: Cooperative, archival file storage and distribution Layered on top of Pastry

Goals: Strong persistence of the data High availability Scalability of the System Reduced cost (no backup) Efficient use of pooled resources

Page 8: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

8JTE HPC/FS

Insertion of blocks in DHT

04F2

5230

834BC52A

8909

8BB2

3A79

8954

8957AC78

895D

E25A

04F2

3A79

5230

834B8909

8954

8957

8BB2

AC78

C52A

E25A

k = 8958

k = 8959

put(8959,block)

root ofkey 8959 block

Address space

replica

895D

replica

Page 9: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

9JTE HPC/FS

Insertion of blocks in DHT

04F2

5230

834BC52A

8909

8BB2

3A79

8954

8957AC78

895D

E25A

04F2

3A79

5230

834B8909

8954

8957

8BB2

AC78

C52A

E25A

block

Address space

replica

895D

replica

k = 8958

k = 8959

get(8959,block)

Page 10: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

10JTE HPC/FS

P2P File systems architecture

put(key, block)

block = get(key)

files and directoriesread-write access

semanticssecurity and access control

DHash / Past

Ivy / Pastis

DHT

FS

- scalability

- fault-tolerance

- self-organization

block store (DHT)

message routing

open(), read(), write(), close(), etc.

Page 11: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

11JTE HPC/FS

DHT-based file systems

Ivy [OSDI’02]log-based, one log per user

fast writes, slow reads

limited to small number of users

Oceanstore [FAST’03]updates serialized by primary replicas

partially centralized system

BFT agreement protocol requires well-connected primary replicas

primary replicas

secondary replicas

User A’s log

User B’s log

User C’s log

DHTobject

DHTobject

DHTobject

Page 12: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

12JTE HPC/FS

Pastis

Page 13: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

13JTE HPC/FS

Pastis design

Design goals simple completely decentralized scalable (network size and number of users)

put(key, block) block = get(key)

Pastry

Past

Pastis

DHT

FS

storage

routing

Page 14: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

14JTE HPC/FS

Pastis data structures

Data structures similar to the Unix file system inodes are stored in modifiable DHT blocks (UCBs) file contents are stored in immutable DHT blocks (CHBs)

metadata

block addresses

UCB

file inode

CHB1

CHB2

file contents

file contentsUCB

CHB1

CHB2replica sets

DHT address space

Inode key

Page 15: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

15JTE HPC/FS

Pastis data structures (cont.) directories contain <file name, inode key> entries use indirect blocks for large files

metadata

block addresses

UCB

directory inode

CHB

file1, key1file2, key2

…metadata

block addresses

UCB

file1 inode

CHB

oldcontents

CHB

indirectblock

CHB

filecontents

CHB

oldcontents

CHB

filecontents

Page 16: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

16JTE HPC/FS

Content Hash Block (CHB)

Content Hash Block block has to be immutable

Solution to check and prevent modification

block contents determine block key can detect if block is modified

data block

block key = Hash( block contents )

block contents

Page 17: Pastis:  a peer-to-peer file system for persistant large-scale storage Jean-Michel Busca

17JTE HPC/FS

User Certificate Blocks (UCBs)

UCBs are modifiable by the block owner.

Question: How to check that the file is modified only by the owner?

Protocol

(KBpub

, KBpriv

) associated to each block

The owner builds a signature of the block using KB

priv.

Authentication Verify signature of UCB

using the KBpub

sign(KBpriv)

timestamp

UCB

block key = Hash( KBpub

)

inode contents