Efficient Data Dissemination and Survivable Data Storage

Lihao Xu, http://www.cs.wayne.edu/~lihao/


Page 1: Efficient Data Dissemination and Survivable Data Storage

Efficient Data Dissemination and Survivable Data Storage

Lihao Xu
http://www.cs.wayne.edu/~lihao/

Page 2: Efficient Data Dissemination and Survivable Data Storage

Ubiquitous Information Access

Page 3: Efficient Data Dissemination and Survivable Data Storage

Key Building Blocks

• Storage

• Retrieval

• Dissemination

• Consumption

Page 4: Efficient Data Dissemination and Survivable Data Storage

Key Building Blocks

• Storage

• Retrieval

• Dissemination

• Consumption

Page 5: Efficient Data Dissemination and Survivable Data Storage

Error Correcting Codes

Page 6: Efficient Data Dissemination and Survivable Data Storage

Error Correcting Codes

Message: 1 2 3 … k

Page 7: Efficient Data Dissemination and Survivable Data Storage

Error Correcting Codes

Message: 1 2 3 … k

Codeword: 1 2 3 … n-1 n

Page 8: Efficient Data Dissemination and Survivable Data Storage

Error Correcting Codes

Message: 1 2 3 … k

Codeword: 1 2 3 … n-1 n

m received codeword symbols

Message: 1 2 3 … k

Page 9: Efficient Data Dissemination and Survivable Data Storage

MDS (Maximum Distance Separable) Codes

m = k
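For reference, the condition m = k can be tied to the Singleton bound. The symbols d (minimum distance) and e (number of erased symbols) do not appear on the slides and are introduced here only for illustration.

```latex
% Any (n, k) code with minimum distance d satisfies the Singleton bound;
% MDS codes meet it with equality.
d \le n - k + 1, \qquad d_{\mathrm{MDS}} = n - k + 1
% Consequently an MDS code recovers from any e \le n - k erasures,
% i.e. any m = k of the n codeword symbols determine the message.
```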

Page 10: Efficient Data Dissemination and Survivable Data Storage

(n,k) MDS Codes

Reed-Solomon (RS) Code

Page 11: Efficient Data Dissemination and Survivable Data Storage

(n,k) MDS Codes

(4,2) B-Code

Column 1: a, d+c

Column 2: b, d+a

Column 3: c, a+b

Column 4: d, b+c
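A minimal sketch of how such a code tolerates the loss of any two of its four columns. It assumes the slide's layout pairs each data symbol with the parity listed right after it (a with d+c, b with d+a, c with a+b, d with b+c) and that "+" means bitwise XOR; the function names are illustrative, not taken from the talk.

```python
from itertools import combinations

# Assumed layout: column i holds data symbol DATA[i] and the XOR of the
# two symbols named in PARITY[i].
DATA = {0: "a", 1: "b", 2: "c", 3: "d"}
PARITY = {0: ("d", "c"), 1: ("d", "a"), 2: ("a", "b"), 3: ("b", "c")}

def encode(symbols):
    """Turn {'a','b','c','d'} -> int into four (data, parity) columns."""
    return [(symbols[DATA[i]], symbols[PARITY[i][0]] ^ symbols[PARITY[i][1]])
            for i in range(4)]

def decode(columns):
    """Recover all four symbols from surviving columns {index: (data, parity)}."""
    known = {DATA[i]: data for i, (data, _) in columns.items()}
    progress = True
    while len(known) < 4 and progress:
        progress = False
        for i, (_, parity) in columns.items():
            x, y = PARITY[i]
            # A parity with exactly one unknown term yields that term.
            if x in known and y not in known:
                known[y] = parity ^ known[x]
                progress = True
            elif y in known and x not in known:
                known[x] = parity ^ known[y]
                progress = True
    if len(known) < 4:
        raise ValueError("not enough columns to decode")
    return known

message = {"a": 0x11, "b": 0x22, "c": 0x33, "d": 0x44}
stored = encode(message)
for pair in combinations(range(4), 2):
    survivors = {i: stored[i] for i in pair}
    assert decode(survivors) == message
```

Decoding here needs only XORs, which is the kind of operation that lets array codes avoid finite-field multiplication.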

Page 12: Efficient Data Dissemination and Survivable Data Storage

Data Dissemination: Broadcast Scheduling

Page 13: Efficient Data Dissemination and Survivable Data Storage

Data Dissemination

[Figure: a wireless server and wireless clients; the clients want items 1, 2, 1, and 3]

Page 14: Efficient Data Dissemination and Survivable Data Storage

Broadcast in a Cell

[Figure: a wireless server broadcasting to wireless clients that want items 1, 2, 1, and 3]

Page 15: Efficient Data Dissemination and Survivable Data Storage

Broadcast Model

• Model clients as random processes

• Desired item is random, with probability $p_i$ for item $i$ of length $l_i$

[Figure: a wireless server broadcasting to wireless clients that want items 1, 2, 1, and 3]

Page 16: Efficient Data Dissemination and Survivable Data Storage

Scheduling Problem

• 2 items, $l_1 = l_2$

• Each item consists of k packets, k large

• Challenge: choose packet broadcast schedule to minimize wait for clients

S = 1 2 1 2

Page 17: Efficient Data Dissemination and Survivable Data Storage

Prior Work

• Complexity of optimal schedules: Bar-Noy, Bhatia, Naor, Schieber, Foltz

• Complexity of computing optimal schedules: Kenyon, Schabanel

• Error correction/detection: Bestavros

Page 18: Efficient Data Dissemination and Survivable Data Storage

Metric: Delivery Time

Delivery time for item 1: $T^{deliv}_{S,1}$

[Figure: schedule S = 1 2 1 2 with the start time $t_{init}$ marked]

Page 19: Efficient Data Dissemination and Survivable Data Storage

Delivery Time

$T^{deliv}_{S,i}(t_{init})$: total amount of time spent waiting for item $i$ when starting at time $t_{init}$ in schedule S.

$t_{init}$: instant in time when the client starts waiting for the item.

[Figure: schedule S = 1 2 1 2 with $t_{init}$ and $T^{deliv}_{S,i}(t_{init})$ marked]

Page 20: Efficient Data Dissemination and Survivable Data Storage

Expected Delivery Time (EDT)

$EDT(S, p_1, p_2, \ldots, p_n) = \sum_{i=1}^{n} p_i \, EDT_{S,i}$

$EDT_{S,i} = E_{t_{init}}\left[ T^{deliv}_{S,i}(t_{init}) \right]$

$t_{init}$ uniformly distributed over schedule S.

Page 21: Efficient Data Dissemination and Survivable Data Storage

EDT Calculation

S = 1 2 1 2

$p_1 = p_2 = 1/2$

Page 22: Efficient Data Dissemination and Survivable Data Storage

EDT Calculation

S = 1 2 1 2

$p_1 = p_2 = 1/2$

DT: 2

Page 23: Efficient Data Dissemination and Survivable Data Storage

EDT Calculation

S = 1 2 1 2

$p_1 = p_2 = 1/2$

DT: 2, 3/2

Page 24: Efficient Data Dissemination and Survivable Data Storage

EDT Calculation

S = 1 2 1 2

$p_1 = p_2 = 1/2$

DT: 2, 3/2

$DT_1 = 7/4$

Page 25: Efficient Data Dissemination and Survivable Data Storage

EDT Calculation

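S = 1 2 1 2

$p_1 = p_2 = 1/2$

DT for item 1: 2 if the client starts during item 1, 3/2 on average if it starts during item 2

$DT_1 = (2 + 3/2)/2 = 7/4$

EDT = 7/4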
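The 7/4 figure can be checked numerically. The sketch below assumes a packet-level model that the slides only imply: each slot of the schedule sends the k packets of one item in order, the client starts at a uniformly random packet boundary, and it keeps every packet of its item that it hears, so it may begin in the middle of an item. The function names are illustrative.

```python
from fractions import Fraction

def delivery_time(schedule, item, k, start):
    """Packet slots from `start` until all k packets of `item` are heard.

    Assumes `item` occurs somewhere in `schedule`.
    """
    period = len(schedule) * k
    needed = set(range(k))
    t = start
    while needed:
        slot, offset = divmod(t % period, k)
        if schedule[slot] == item:
            needed.discard(offset)
        t += 1
    return t - start

def edt(schedule, probs, k):
    """Expected delivery time in item lengths, start uniform over one period."""
    period = len(schedule) * k
    total = Fraction(0)
    for item, p in probs.items():
        waits = sum(delivery_time(schedule, item, k, s) for s in range(period))
        total += p * Fraction(waits, period * k)
    return total

half = Fraction(1, 2)
for k in (2, 10, 50):
    print(k, float(edt([1, 2], {1: half, 2: half}, k)))
```

Under this model the printed values approach 7/4 from below as k grows, which matches the slides' assumption that k is large.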

Page 26: Efficient Data Dissemination and Survivable Data Storage

Performance with Errors

• Data items consist of k packets

• What happens if a packet is lost?

[Figure: packet sequences 1 2 3 4 5 … k labeled Original, Transmitted, and Received]

Page 27: Efficient Data Dissemination and Survivable Data Storage

Performance with Errors

• What happens if a packet is lost?

[Figure: packet sequences 1 2 3 4 5 … k labeled Original, Transmitted, and Received]

Page 28: Efficient Data Dissemination and Survivable Data Storage

Performance with Errors

• What happens if a packet is lost?

[Figure: packet sequences 1 2 3 4 5 … k labeled Original, Transmitted, and Received]

EDT = 3 !
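The jump to EDT = 3 can be reproduced with a small exact enumeration. It assumes a loss model the slides do not spell out: exactly one packet of the wanted item, chosen uniformly, is dropped the first time the client would receive it, and without coding the client has to wait for that same packet to be broadcast again. All names are illustrative.

```python
from fractions import Fraction

def delivery_time_one_loss(schedule, item, k, start, lost):
    """Packet slots until all k packets arrive when packet `lost` is
    dropped on its first reception (uncoded: only a rebroadcast of the
    same packet helps)."""
    period = len(schedule) * k
    needed = set(range(k))
    dropped = False
    t = start
    while needed:
        slot, offset = divmod(t % period, k)
        if schedule[slot] == item and offset in needed:
            if offset == lost and not dropped:
                dropped = True  # the first copy is lost
            else:
                needed.discard(offset)
        t += 1
    return t - start

def edt_one_loss(schedule, item, k):
    """Average over every start position and every choice of lost packet."""
    period = len(schedule) * k
    total = sum(delivery_time_one_loss(schedule, item, k, s, x)
                for s in range(period) for x in range(k))
    return Fraction(total, period * k * k)  # measured in item lengths

for k in (1, 10, 40):
    print(k, float(edt_one_loss([1, 2], 1, k)))
```

Under these assumptions the average tends to 3 item lengths as k grows, and by symmetry item 2 behaves the same way.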

Page 29: Efficient Data Dissemination and Survivable Data Storage

Solution – Coding

• Use k of n MDS code, n = 2k

• Now only need to wait for 1 additional packet

[Figure: Original, Transmitted, and Received packet sequences 1 2 3 4 5 … k, with additional coded packets k+1, …]

Page 30: Efficient Data Dissemination and Survivable Data Storage

Solution – Coding

[Figure: Original, Transmitted, and Received packet sequences 1 2 3 4 5 … k, with additional coded packets k+1, …]

EDT = 9/4

Page 31: Efficient Data Dissemination and Survivable Data Storage

Solution – Coding

• Use k of n MDS code, m = 2(k+1)

• Now only need to wait for 1 additional packet

[Figure: Original, Transmitted, and Received packet sequences; packets are numbered 1 2 3 4 5 … k, k+1, …, n]

Page 32: Efficient Data Dissemination and Survivable Data Storage

Solution – Coding

[Figure: Original, Transmitted, and Received packet sequences; packets are numbered 1 2 3 4 5 … k, k+1, …, n]

EDT = 7/4 + ε

Page 33: Efficient Data Dissemination and Survivable Data Storage

General Solution

[Figure: Original, Transmitted, and Received packet sequences; packets are numbered 1 2 3 4 5 … k, k+1, …, n]

Given loss probability p, what is the optimal n?
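One simple way to explore this question is to ask how many coded packets must be sent so that a client hears at least k of them with high probability. The sketch assumes independent packet losses and uses a success-probability target as a stand-in criterion; it is not the EDT-optimal n the talk is after. The `target` parameter and the function names are assumptions.

```python
from math import comb

def prob_at_least_k(n, k, p):
    """P[at least k of n packets arrive] under independent loss probability p."""
    q = 1.0 - p
    return sum(comb(n, j) * q**j * p**(n - j) for j in range(k, n + 1))

def smallest_n(k, p, target=0.99):
    """Smallest n >= k whose k-of-n reception probability reaches `target`."""
    if not 0.0 <= p < 1.0:
        raise ValueError("loss probability must satisfy 0 <= p < 1")
    n = k
    while prob_at_least_k(n, k, p) < target:
        n += 1
    return n

print(smallest_n(100, 0.1))
```

Because an MDS code decodes from any k received packets, only the count of arrivals matters here, not which packets were lost.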

Page 34: Efficient Data Dissemination and Survivable Data Storage

General Solution

Page 35: Efficient Data Dissemination and Survivable Data Storage

General Solution

Page 36: Efficient Data Dissemination and Survivable Data Storage

General Solution

Page 37: Efficient Data Dissemination and Survivable Data Storage

General Solution

k = 100 and p = 0.1

Page 38: Efficient Data Dissemination and Survivable Data Storage

General Solution

k = 100

Page 39: Efficient Data Dissemination and Survivable Data Storage

Two-Channel Broadcasting

[Figure: two wireless servers broadcasting to wireless clients that want items 1, 2, 1, and 3]

Page 40: Efficient Data Dissemination and Survivable Data Storage

Coordinating Schedule Data

• Use (2k, k) MDS code to eliminate data overlap
  - Channel 1 sends packets 1 through k (raw data)
  - Channel 2 sends packets k+1 through 2k

• Features
  - Each channel is self-sufficient
  - No overlap between channels

S1 = 1 2 1 2

S2 = 1 2 1 2 (same schedule, different data)
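A two-channel version of the delivery-time calculation illustrates why disjoint code symbols help. The sketch assumes synchronized channels, one packet per channel per time step, equal demand for every item, and that channel c sends code symbols c·k through c·k + k − 1 of an item in order, with any k distinct symbols sufficient to decode. These modelling details and all names are assumptions.

```python
from fractions import Fraction

def delivery_time(channels, item, k, start):
    """Time steps from `start` until k distinct code symbols of `item` are heard.

    `channels` is a list of per-channel schedules of equal length.
    """
    period = len(channels[0]) * k
    got = set()
    t = start
    while len(got) < k:
        slot, offset = divmod(t % period, k)
        for c, schedule in enumerate(channels):
            if schedule[slot] == item:
                got.add(c * k + offset)  # channel c carries its own symbols
        t += 1
    return t - start

def edt(channels, items, k):
    """Expected delivery time in item lengths, equal demand, uniform start."""
    period = len(channels[0]) * k
    total = sum(delivery_time(channels, i, k, s)
                for i in items for s in range(period))
    return Fraction(total, len(items) * period * k)

print(edt([[1, 2], [1, 2]], [1, 2], 50))  # same schedule, different data
print(edt([[1, 2], [2, 1]], [1, 2], 50))  # second channel with items swapped
```

In this model the swapped second channel always delivers in exactly one item length, while the repeated schedule averages close to one item length for large k.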

Page 41: Efficient Data Dissemination and Survivable Data Storage

Two Broadcast Channels

• Scheduling for two channels
  - Two items with equal length and demand
  - Two synchronized channels of equal bandwidth
  - First channel's schedule fixed at 1 2

• What is the optimal schedule for channel 2?

S1 = 1 2

S2 = ?

Page 42: Efficient Data Dissemination and Survivable Data Storage

Some Schedules

[Figure: candidate schedules for channel 2, each drawn against S1 = 1 2: Repeat, Swap, Shift, Reshuffle, Unequal Portions, Arbitrary]

Page 43: Efficient Data Dissemination and Survivable Data Storage

Some Schedules

[Figure: candidate schedules for channel 2, each drawn against S1 = 1 2: Repeat, Swap, Shift, Reshuffle, Unequal Portions, Arbitrary. Four of the schedules are annotated EDT = 1.]

Page 44: Efficient Data Dissemination and Survivable Data Storage

Some Schedules

[Figure: candidate schedules for channel 2, each drawn against S1 = 1 2: Repeat, Swap, Shift, Reshuffle, Unequal Portions, Arbitrary. Four of the schedules are annotated EDT = 1, one is annotated EDT = 63/64, and one is annotated "EDT < 63/64?".]

Page 45: Efficient Data Dissemination and Survivable Data Storage

Schedule Performance

• Symmetric Problem
  - Equal lengths
  - Equal demands
  - Equal bandwidth channels
  - Symmetric "fixed" schedule for 1st channel

• Asymmetric Solution
  - Asymmetric schedules can beat any symmetric schedule for the 2nd channel
  - How is this possible?

Page 46: Efficient Data Dissemination and Survivable Data Storage

More to Explore …

• More servers/channels

• Differing levels of synchronization

• Transmission errors

• Streaming data

• Bounds

[Figure: several wireless servers and wireless clients that want items 1, 2, 1, and 3]

Page 47: Efficient Data Dissemination and Survivable Data Storage

Hydra: A Platform for SSS

Page 48: Efficient Data Dissemination and Survivable Data Storage

Secure and Survivable Storage

• Availability

• Recoverability

• Persistence

• Confidentiality

• Integrity

• Scalability

• Efficiency

Page 49: Efficient Data Dissemination and Survivable Data Storage

Secure and Survivable Storage

• Yahoo

• Ebay

• Amazon

• Google

• Banks

• Your Labs

• More …

Page 50: Efficient Data Dissemination and Survivable Data Storage

Hydra

Page 51: Efficient Data Dissemination and Survivable Data Storage

Hydra Design Goals

• Portable to various OS/FS

• Hardware independent

• Unix FS semantics maintained

• Low overhead in performance and storage

• Transport independent

• Easy to install, configure, scale, maintain, and automate

Page 52: Efficient Data Dissemination and Survivable Data Storage

Hydra and System

App.

Hydra

FS

I/O

Page 53: Efficient Data Dissemination and Survivable Data Storage

Hydra and System

App.

Hydra

FS

I/O

App.

Hydra

FS

I/O

Page 54: Efficient Data Dissemination and Survivable Data Storage

Hydra and System

App.

Hydra

FS

I/O

App.

Hydra

FS

I/O

App.

FS/Hydra

I/O

Page 55: Efficient Data Dissemination and Survivable Data Storage

Basics of Hydra

(4,2) B-Code

Column 1: a, d+c

Column 2: b, d+a

Column 3: c, a+b

Column 4: d, b+c

Page 56: Efficient Data Dissemination and Survivable Data Storage

Performance Test

2.4 GHz P4, 512 MB, 80 GB ATA/100 7200 rpm, Redhat 9.0 (kernel 2.4.2.0)

Operation                  Throughput (Mbps)
File Read                  384
File Write                 200
Memory Copy                17572
(4,2) B-Code Encoding      5522
(4,2) B-Code Decoding      22866
(4,2) RS Encoding          286
(4,2) RS Decoding          216

Page 57: Efficient Data Dissemination and Survivable Data Storage

Hydra Components

• Meta Data (hnode)

• Operations

• Monitor

Page 58: Efficient Data Dissemination and Survivable Data Storage

Hydra Meta Data

• Code

• Symbol Location

• Data Layout

• Security Flag

• Access Rights

• Extensions

Page 59: Efficient Data Dissemination and Survivable Data Storage

Hydra Operations

• Distribute (Write)

• Recover (Read)

• Detect

• Repair

• Restore

• Others

Page 60: Efficient Data Dissemination and Survivable Data Storage

Hydra Monitor

• Connectivity

• Security

Page 61: Efficient Data Dissemination and Survivable Data Storage

Hydra Applications

• Web Server

• CDN/P2P/Data Server

• Archiving

• Data Security: system activity logger, forensics, file integrity checker …

• Others

Page 62: Efficient Data Dissemination and Survivable Data Storage

Acknowledgement