JTE HPC/FS
Pastis: a peer-to-peer file system for persistent large-scale storage
Jean-Michel Busca, Fabio Picconi, Pierre Sens
LIP6, Université Paris 6 – CNRS, Paris, France
INRIA, Rocquencourt, France

Distributed file systems
Architecture versus scalability (number of nodes):

Scale (number of nodes)    Client-server    P2P
LAN (100)                  NFS              -
Organization (10,000)      AFS              FARSITE, Pangaea
Internet (1,000,000)       -                Ivy *, OceanStore *, Pastis *

* uses a Distributed Hash Table (DHT) to store data

DHTs
[Figure: overlay network. Nodes with identifiers 52, 24, 75, 40, 18, 91, 32, 66 and 83 in the logical address space are located in South America, North America, Australia, Asia and Europe.]
High latency, low bandwidth between logical neighbors

Insertion of blocks in DHT
[Figure: address space with nodes 04F2, 3A79, 5230, 834B, 8909, 8954, 8957, 895D, 8BB2, AC78, C52A and E25A, and keys k = 8958 and k = 8959. put(8959, block) is routed to the root of key 8959, which stores the block; a replica of the block is also shown.]
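The routing step in the figure can be sketched in a few lines. This is a minimal sketch, not the Pastry implementation: it assumes the usual Pastry/PAST convention that the root of a key is the node whose identifier is numerically closest to it, and that replicas go to the next closest nodes. The 16-bit circular identifier space, the replica count of 3 and the function names are assumptions made for illustration; the transcript does not show which nodes the figure actually marks.

```python
# Node IDs taken from the figure, read as 16-bit hexadecimal identifiers.
NODE_IDS = [0x04F2, 0x3A79, 0x5230, 0x834B, 0x8909, 0x8954,
            0x8957, 0x895D, 0x8BB2, 0xAC78, 0xC52A, 0xE25A]
ID_SPACE = 1 << 16  # assumption: 4 hex digits, circular address space


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two identifiers on the circular space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def closest_nodes(key: int, count: int) -> list:
    """Return the `count` node IDs numerically closest to `key`.

    The first entry plays the role of the key's root; in this sketch the
    remaining entries are where the replicas would be placed.
    """
    ranked = sorted(NODE_IDS, key=lambda n: (ring_distance(n, key), n))
    return ranked[:count]


holders = closest_nodes(0x8959, 3)
print([f"{n:04X}" for n in holders])  # ['8957', '895D', '8954']
```

Under these assumptions, put(8959, block) would store the block on node 8957 and replicate it on 895D and 8954, and a later get for the same key would be routed to the same nodes.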

PAST: Storage System
PAST: cooperative, archival file storage and distribution, layered on top of Pastry
Goals:
- strong persistence of the data
- high availability
- scalability of the system
- reduced cost (no backup)
- efficient use of pooled resources

Insertion of blocks in DHT
[Figure: same address space and nodes as on the earlier insertion slide. After put(8959, block), the block is held by the root of key 8959 and by two replicas.]

Insertion of blocks in DHT
[Figure: same address space and nodes. get(8959, block) retrieves the block stored under key 8959, which is held by the root and two replicas.]

P2P File systems architecture
FS layer (Ivy / Pastis)
- files and directories
- read-write access semantics
- security and access control
- interface: open(), read(), write(), close(), etc.
DHT layer (DHash / Past)
- block store (DHT)
- message routing
- interface: put(key, block), block = get(key)
- scalability
- fault-tolerance
- self-organization

DHT-based file systems
Ivy [OSDI’02]
- log-based, one log per user
- fast writes, slow reads
- limited to small number of users
Oceanstore [FAST’03]
- updates serialized by primary replicas
- partially centralized system
- BFT agreement protocol requires well-connected primary replicas
[Figure labels: User A’s log, User B’s log, User C’s log; DHT object; primary replicas, secondary replicas]
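The "fast writes, slow reads" trade-off of a log-per-user design can be illustrated with a toy model. This is only a sketch of the general idea, not Ivy's actual protocol: the in-memory `logs` dictionary, the global `clock` counter and the function names are all hypothetical, and the real system's ordering and caching mechanisms are left out.

```python
from collections import defaultdict
from typing import Optional

# One append-only log per user; each record is (timestamp, path, data).
logs = defaultdict(list)
clock = 0  # hypothetical global counter standing in for real update ordering


def write(user: str, path: str, data: bytes) -> None:
    """A write only appends to the writer's own log, so it is cheap."""
    global clock
    clock += 1
    logs[user].append((clock, path, data))


def read(path: str) -> Optional[bytes]:
    """A read has to scan every user's log to find the latest record."""
    latest = None
    for records in logs.values():
        for ts, p, data in records:
            if p == path and (latest is None or ts > latest[0]):
                latest = (ts, data)
    return latest[1] if latest else None


write("A", "/notes.txt", b"first version")
write("B", "/notes.txt", b"second version")
print(read("/notes.txt"))  # b'second version'
```

In this model the cost of a read grows with the number of logs, which is one way to see why such a design is limited to a small number of users.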

Pastis design
Design goals:
- simple
- completely decentralized
- scalable (network size and number of users)
Layers:
- Pastis: FS
- Past: DHT storage, accessed through put(key, block) and block = get(key)
- Pastry: DHT routing

Pastis data structures
Data structures similar to the Unix file system:
- inodes are stored in modifiable DHT blocks (UCBs)
- file contents are stored in immutable DHT blocks (CHBs)
[Figure: the file inode is a UCB, stored under the inode key, that holds metadata and block addresses. The addresses point to CHB1 and CHB2, which hold the file contents. Each block is stored on a replica set in the DHT address space.]

Pastis data structures (cont.)
- directories contain <file name, inode key> entries
- use indirect blocks for large files
[Figure: the directory inode (UCB: metadata, block addresses) points to a CHB holding the entries <file1, key1>, <file2, key2>, … The file1 inode (UCB: metadata, block addresses) points to CHBs holding file contents and to a CHB holding an indirect block, which in turn points to further file-contents CHBs. CHBs holding old contents are also shown.]
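The relationship between inodes, content blocks and directory entries can be sketched as follows. This is a simplified illustration under several assumptions: an in-memory dictionary stands in for the DHT, SHA-256 stands in for the unspecified hash, the inode is serialized as JSON, the inode key is an arbitrary string, and indirect blocks and signatures are omitted. None of these names or choices come from the Pastis implementation.

```python
import hashlib
import json

dht = {}        # in-memory stand-in for the DHT: key -> block
BLOCK_SIZE = 4  # tiny block size, chosen only to keep the example readable


def put_chb(contents: bytes) -> str:
    """Store an immutable block under the hash of its contents."""
    key = hashlib.sha256(contents).hexdigest()
    dht[key] = contents
    return key


def write_file(inode_key: str, data: bytes) -> None:
    """Store the contents as CHBs, then overwrite the inode in place."""
    addresses = [put_chb(data[i:i + BLOCK_SIZE])
                 for i in range(0, len(data), BLOCK_SIZE)]
    inode = {"metadata": {"size": len(data)}, "block_addresses": addresses}
    dht[inode_key] = json.dumps(inode).encode()  # modifiable block (UCB role)


def read_file(inode_key: str) -> bytes:
    """Fetch the inode, then follow its block addresses."""
    inode = json.loads(dht[inode_key])
    return b"".join(dht[k] for k in inode["block_addresses"])


write_file("key1", b"hello world")
directory = {"file1": "key1"}       # <file name, inode key> entry
write_file("key1", b"hello there")  # new CHBs; the old ones stay in the DHT
print(read_file(directory["file1"]))  # b'hello there'
```

Because the inode is rewritten under the same key while contents go into new hash-addressed blocks, the directory entry never changes when the file is updated, and blocks with old contents simply remain in the store.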

Content Hash Block (CHB)
- the block has to be immutable
- solution to check and prevent modification: the block contents determine the block key, so a modified block can be detected
[Figure: data block holding the block contents]
block key = Hash( block contents )
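The check that a reader can perform on a fetched CHB is short enough to show directly. A minimal sketch, assuming SHA-256 as the hash function (the slide only says Hash) and using hypothetical function names:

```python
import hashlib


def chb_key(contents: bytes) -> str:
    """block key = Hash(block contents)."""
    return hashlib.sha256(contents).hexdigest()


def verify_chb(key: str, contents: bytes) -> bool:
    """Recompute the hash of a fetched block and compare it with its key."""
    return chb_key(contents) == key


block = b"file contents"
key = chb_key(block)
print(verify_chb(key, block))             # True
print(verify_chb(key, b"file c0ntents"))  # False
```

Any change to the contents yields a different hash, so a block returned under the wrong key is rejected without trusting the node that stored it.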

User Certificate Blocks (UCBs)
UCBs are modifiable by the block owner.
Question: how to check that the file is modified only by the owner?
Protocol:
- a key pair (KBpub, KBpriv) is associated with each block
- the owner builds a signature of the block using KBpriv
Authentication: verify the signature of the UCB using KBpub.
[Figure: a UCB holds the inode contents, a timestamp and sign(KBpriv)]
block key = Hash( KBpub )
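The sign-then-verify flow can be sketched with a toy key pair. To stay within the Python standard library, the sketch swaps in textbook RSA over two small Mersenne primes; this is insecure and purely illustrative, and a real system would use a vetted signature library. The slides do not say which signature scheme Pastis uses, and storing KBpub inside the block, the 8-byte timestamp encoding and all function names are assumptions.

```python
import hashlib

# Toy textbook-RSA key pair built from two Mersenne primes. INSECURE:
# for illustration only, not the scheme used by Pastis.
P, Q = 2**61 - 1, 2**89 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)
KB_PUB, KB_PRIV = (N, E), (N, D)


def digest(data: bytes, n: int) -> int:
    """Hash of the data, reduced modulo n so the toy key can sign it."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def block_key(pub: tuple) -> str:
    """block key = Hash(KBpub)."""
    return hashlib.sha256(repr(pub).encode()).hexdigest()


def make_ucb(inode: bytes, timestamp: int, pub: tuple, priv: tuple) -> dict:
    """Owner side: sign the inode contents and timestamp with KBpriv."""
    n, d = priv
    payload = inode + timestamp.to_bytes(8, "big")
    return {"inode": inode, "timestamp": timestamp, "pub": pub,
            "signature": pow(digest(payload, n), d, n)}


def verify_ucb(key: str, ucb: dict) -> bool:
    """Reader side: check the key against KBpub, then the signature."""
    n, e = ucb["pub"]
    payload = ucb["inode"] + ucb["timestamp"].to_bytes(8, "big")
    return (block_key(ucb["pub"]) == key
            and pow(ucb["signature"], e, n) == digest(payload, n))


key = block_key(KB_PUB)
ucb = make_ucb(b"inode v1", 1, KB_PUB, KB_PRIV)
print(verify_ucb(key, ucb))                      # True
forged = dict(ucb, inode=b"inode tampered")
print(verify_ucb(key, forged))                   # False
```

Tying the block key to the public key means a reader who knows the inode key can check both that the block carries the right public key and that its contents were signed by the holder of the matching private key.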