Naming and Integrity: Self-Verifying Data in Peer-to-Peer Systems
Hakim Weatherspoon, Chris Wells, John Kubiatowicz
University of California, Berkeley
Future Directions in Distributed Computing. Thursday, June 6, 2002
FuDiCo 2002 ©2002 Hakim Weatherspoon/UC Berkeley Naming and Integrity:2
OceanStore Context: Ubiquitous Computing
• Computing everywhere:
  – Desktop, laptop, palmtop.
  – Cars, cellphones.
  – Shoes? Clothing? Walls?
• Connectivity everywhere:
  – Rapid growth of bandwidth in the interior of the net.
  – Broadband to the home and office.
  – Wireless technologies such as CDMA, satellite, laser.
Archival Storage
• Where is persistent information stored?
  – Want: geographic independence for availability, durability, and freedom to adapt to circumstances.
• How is it protected?
  – Want: encryption for privacy, secure naming and signatures for authenticity, and Byzantine commitment for integrity.
• Is it available/durable?
  – Want: redundancy with continuous repair and redistribution for long-term durability.
Path of an Update
Questions about Data?
• How to use redundancy to protect against data being lost?
• How to verify data?
Archival Dissemination Built into Update
• Erasure codes
  – Redundancy without the overhead of strict replication.
  – Produce n fragments, any m of which suffice to reconstruct the data (m < n).
  – Rate r = m/n; storage overhead is 1/r.
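To make the m-of-n property concrete, here is a minimal, hypothetical erasure code: simple XOR parity with m = 2 and n = 3, so r = 2/3 and overhead 1/r = 1.5x. This is only an illustration of the property, not the codec OceanStore uses.

```python
# Hypothetical XOR-parity erasure code: any 2 of the 3 fragments
# reconstruct the data (m = 2, n = 3, rate r = m/n = 2/3).

def encode(data: bytes) -> list[bytes]:
    half = (len(data) + 1) // 2
    a, b = data[:half], data[half:].ljust(half, b"\0")
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]

def decode(frags: dict[int, bytes], length: int) -> bytes:
    # frags maps fragment index (0 = a, 1 = b, 2 = parity) to bytes;
    # any two of the three indices must be present.
    if 0 in frags and 1 in frags:
        a, b = frags[0], frags[1]
    elif 0 in frags:                       # recover b = a XOR parity
        a = frags[0]
        b = bytes(x ^ y for x, y in zip(a, frags[2]))
    else:                                  # recover a = b XOR parity
        b = frags[1]
        a = bytes(x ^ y for x, y in zip(b, frags[2]))
    return (a + b)[:length]

data = b"self-verifying"
frags = encode(data)
# Lose fragment 1: the remaining two fragments still reconstruct the data.
assert decode({0: frags[0], 2: frags[2]}, len(data)) == data
```

Real deployments use codes like Reed-Solomon to get arbitrary m and n; the reconstruction guarantee is the same.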
Durability
[Figure: Probability of Block Failure per Year (log scale, 1e-70 to 1) vs. Repair Time (months, 0 to 24), for n = 4, 8, 16, 32, and 64 fragments.]
• Fraction of Blocks Lost Per Year (FBLPY)*
  – r = ¼ erasure-encoded block (e.g. m = 16, n = 64).
  – Increasing the number of fragments increases the durability of a block, at the same storage cost and repair time.
  – The n = 4 fragment case is equivalent to replication on four servers.

* Erasure Coding vs. Replication, H. Weatherspoon and J. Kubiatowicz, in Proc. of IPTPS 2002.
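The durability trend follows from simple combinatorics: a block is lost only when fewer than m of its n fragments survive a repair epoch. A hedged sketch, where the per-fragment survival probability (0.9) is illustrative and not a figure from the paper:

```python
# Block-loss estimate behind FBLPY: with n fragments of which any m
# reconstruct the block, the block is lost when fewer than m survive.
from math import comb

def p_block_loss(n: int, m: int, p_survive: float) -> float:
    # Probability that 0..m-1 fragments survive (binomial tail).
    return sum(comb(n, k) * p_survive**k * (1 - p_survive)**(n - k)
               for k in range(m))

# Same rate r = 1/4 and same per-fragment survival probability:
# spreading the block over more fragments makes loss far less likely.
for n in (4, 8, 16, 32, 64):
    print(n, p_block_loss(n, n // 4, 0.9))
```

At fixed rate r = m/n, the binomial tail shrinks rapidly as n grows, which is why the n = 64 curve in the figure sits tens of orders of magnitude below the n = 4 (replication) curve.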
Naming and Verification Algorithm
• Use a cryptographically secure hash algorithm to detect corrupted fragments.
• Verification tree:
  – n is the number of fragments.
  – Store log(n) + 1 hashes with each fragment.
  – Total of n·(log(n) + 1) hashes.
• The top hash is the block GUID (B-GUID).
  – Fragments and blocks are self-verifying.
[Figure: verification tree over four encoded fragments F1-F4. Leaf hashes H1-H4 hash the fragments; H12 and H34 hash their children; H14 is the top hash, combined with the data hash Hd to yield the B-GUID. Each fragment is stored with the hashes needed to verify it, e.g. Fragment 1 carries H2, H34, Hd, and the fragment data F1.]
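The verification path can be sketched as follows. The exact hash composition and serialization are assumptions here, with SHA-1 standing in for the unspecified secure hash:

```python
# Sketch of the fragment verification tree (assumed hash composition).
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha1(b"".join(parts)).digest()

data = b"block contents"
frags = [b"F1", b"F2", b"F3", b"F4"]       # stand-ins for encoded fragments

hd = h(data)                                # hash of the data
h1, h2, h3, h4 = (h(f) for f in frags)      # leaf hashes
h12, h34 = h(h1, h2), h(h3, h4)             # interior hashes
h14 = h(h12, h34)                           # top hash
bguid = h(h14, hd)                          # block GUID names the block

# Fragment 1 travels with the sibling hashes on its path: (H2, H34, Hd).
def verify_fragment1(frag: bytes, sibs, expected_bguid: bytes) -> bool:
    h2_, h34_, hd_ = sibs
    root = h(h(h(frag), h2_), h34_)         # recompute H1 -> H12 -> H14
    return h(root, hd_) == expected_bguid

assert verify_fragment1(frags[0], (h2, h34, hd), bguid)
assert not verify_fragment1(b"corrupt", (h2, h34, hd), bguid)
```

A storage server holding only Fragment 1 and its log(n) + 1 hashes can thus prove the fragment belongs to the block named by the B-GUID, without seeing the other fragments.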
Enabling Technology
[Figure: the routing layer maps a GUID to its fragments.]
Complex Objects I
[Figure: a data block d is the unit of coding; a verification tree over its encoded fragments yields the GUID of d; the encoded fragments are the unit of archival storage.]
Complex Objects II
[Figure: data blocks d1-d9 form the leaves of a data B-tree of indirect blocks rooted at M. Each block (e.g. d1, named by the GUID of d1) is independently the unit of coding, with its encoded fragments the unit of archival storage; the root is named by a version GUID (V-GUID).]
Complex Objects III
[Figure: copy-on-write versioning. Version i+1 (V-GUIDi+1) shares unchanged data blocks d1-d7 with version i (V-GUIDi) and replaces only d8 and d9 with d'8 and d'9; the new root M keeps a backpointer to the previous version. AGUID = hash{name + keys} names the object across all versions.]
Mutable Data
• Need mutable data for a real system.
  – An entity in the network.
  – A-GUID to V-GUID mapping.
  – Byzantine commitment for integrity:
    • Verifies client privileges.
    • Creates a serial order.
    • Atomically applies updates.
• Versioning system
  – Each version is inherently read-only.
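A toy sketch of the A-GUID to V-GUID mapping with read-only versions; every name here is hypothetical, and the Byzantine commitment machinery that would authorize and order updates in the real system is omitted:

```python
# Hypothetical sketch: a mutable object is a stable A-GUID whose binding
# advances over a growing set of immutable versions (V-GUIDs).
import hashlib

def guid(payload: bytes) -> str:
    return hashlib.sha1(payload).hexdigest()

class MutableObject:
    def __init__(self, name: str):
        self.aguid = guid(name.encode())  # stable name, like hash{name+keys}
        self.versions = {}                # V-GUID -> read-only contents
        self.head = None                  # current A-GUID -> V-GUID binding

    def update(self, contents: bytes) -> str:
        vguid = guid(contents)
        self.versions[vguid] = contents   # old versions are never mutated
        self.head = vguid                 # updates applied in serial order
        return vguid

obj = MutableObject("/users/alice/report")
v1 = obj.update(b"draft")
v2 = obj.update(b"final")
assert obj.head == v2                     # the binding moved forward
assert obj.versions[v1] == b"draft"       # the old version is still readable
```

Because each version is content-named and never overwritten, readers holding an old V-GUID keep a self-verifying snapshot even as the A-GUID advances.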
Archiver I
• Archiver Server Architecture
  – Requests to archive objects are received through the network layers.
  – The consistency mechanism decides to archive an object.
[Figure: archiver server architecture. Stages (Consistency, Location & Routing, Archiver, Introspection Modules) communicate through a dispatch layer over a thread scheduler in a Java Virtual Machine, with asynchronous network and disk I/O provided by the operating system.]
Archiver Control Flow
[Figure: archiver control flow among the ConsistencyStage(s), GenerateChkptStage, GenerateFragsStage, and DisseminatorStage, via GenerateFragsChkptReq/Resp, GenerateFragsReq/Resp, and DisseminateFragsReq/Resp messages; the DisseminatorStage sends the fragments to the storage servers.]
Performance
• Performance of the Archival Layer
  – Performance of an OceanStore server in archiving objects.
  – Analyze the operations of archiving data (this includes signing updates in a BFT protocol):
    • No archiving
    • Inlined archiving (synchronous) (m = 16, n = 32)
    • Delayed archiving (asynchronous) (m = 16, n = 32)
• Experiment Environment
  – OceanStore servers were analyzed on a 42-node cluster.
  – Each machine in the cluster is an IBM xSeries 330 1U rackmount PC with:
    • two 1.0 GHz Pentium III CPUs
    • 1.5 GB ECC PC133 SDRAM
    • two 36 GB IBM UltraStar 36LZX hard drives
    • a single Intel PRO/1000 XF gigabit Ethernet adaptor connected to a Packet Engines switch
    • Linux 2.4.17 SMP kernel
Performance: Throughput
• Data Throughput
  – No archive: 5 MB/s.
  – Delayed: 3 MB/s.
  – Inlined: 2.5 MB/s.
[Figure: Update Throughput (MB/s, 0 to 5) vs. Update Size (kB, 1 to 10000) for Delayed Archive, Inlined Archive, and No Archive.]
Performance: Latency
• Latency
  – Archive only:
    • Y-intercept 3 ms, slope 0.3 s/MB.
  – Inlined Archive = No Archive + Only Archive.
[Figure: Update Latency (ms, 0 to 80) vs. Update Size (kB, 2 to 32) for Inlined Archive, No Archive, and Only Archive, with linear fits y = 1.2x + 36.4, y = 0.6x + 29.6, and y = 0.3x + 3.0 respectively.]
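The fitted lines can be applied directly to estimate per-path latency. In this sketch the 0.3x + 3.0 fit is the Only Archive line stated in the text, while attributing 0.6x + 29.6 to the No Archive curve is an inference from the figure:

```python
# Evaluating the reported linear fits (latency in ms, update size x in kB).
def latency_ms(x_kb: float, slope: float, intercept: float) -> float:
    return slope * x_kb + intercept

only_archive = latency_ms(32, 0.3, 3.0)   # archive-only cost at 32 kB
no_archive = latency_ms(32, 0.6, 29.6)    # assumed No Archive fit at 32 kB
print(only_archive, no_archive)
```

At 32 kB the archive-only path contributes roughly 12.6 ms, small next to the tens of milliseconds the signing/consistency path already costs; this is consistent with Inlined Archive approximating No Archive plus Only Archive.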
Future Directions
• Caching for performance– Automatic Replica Placement
• Automatic Repair
Caching
• Automatic Replica Placement
  – Replicas are soft state.
  – Can be constructed and destroyed as necessary.
• Prefetching
  – Reconstruct replicas from fragments in advance of use.
Efficient Repair
• Global
  – Global sweep-and-repair is not efficient.
  – Want detection/notification of node removal in the system.
  – Not as effective as distributed mechanisms.
• Distributed
  – Exploit the DOLR's distributed information and locality properties.
  – Efficient detection and then reconstruction of fragments.
[Figure: a ring of L1 heartbeats among neighboring nodes (IDs 3274, 4577, 5544, AE87, 3213, 9098, 1167, 6003, 0128) connected at routing levels L1-L3, used for distributed failure detection.]
Conclusion
• Storage-efficient, self-verifying mechanism.
  – Erasure codes are good.
• Self-verifying data assist in:
  – Secure read-only data
  – Secure caching infrastructures
  – Continuous adaptation and repair
For more information: http://oceanstore.cs.berkeley.edu/