Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf ·...

22
Correcting Localized Deletions Using Guess & Check Codes Salim El Rouayheb Joint work with Serge Kas Hanna and Hieu Nguyen Rutgers University 55th Annual Allerton Conference on Communication, Control, and Computing

Transcript of Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf ·...

Page 1: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

CorrectingLocalized DeletionsUsingGuess&CheckCodes

SalimElRouayheb

JointworkwithSergeKas HannaandHieu Nguyen

RutgersUniversity

55thAnnualAllertonConferenceonCommunication, Control,andComputing

Page 2: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

Motivation

2

• Ourmotivation:filesynchronization,E.g.Dropbox

Alice Bob

10101010…

• Recentapplication:DNA-basedstorage

• DeletionswerefirststudiedbyVarshamov-Tenengolts (‘65)andLevenshtein (‘66)

110010…

• Deletions:10101010 110010Transmitted Received

Page 3: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

LocalizedDeletions

3

• Motivation:filesynchronization,E.g.Dropbox

Localizededits

Page 4: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

PreviousWorkonDeletions

4

Ø Unrestricteddeletions

Ø Bursty deletions

• Filesynchronization:[Maetal.‘11]

• Codeconstructions: [Levenshtein ‘67],[Chengetal.14],[Schoeny etal‘17]

Existenceofcodesforlocalizedmodelw=3,4

• Informationtheoreticapproach:[Gallager ’61],[Dobrushin ‘67];lowerandupperboundsonthecapacity:[Mitzenmacher andDrinea ‘06],[Diggavi etal.‘07],[Kanoria andMontanari ‘13],[Venkataramanan etal.’13]…

• Codeconstructionsandfundamentallimits:[Varshamov andTenenglots‘65],[Levenshtein ‘66],[SchulmanandZuckerman ‘99],[Helberg andFerreira ‘02],[Cullina andKiyavash ‘14],[Gabrys etal.‘16],[Brankensieketal.’16],[Thomas etal.’17]…

• Recentfilesynchronizationalgorithms:[Yazdi andDolecek ‘14],[Venkataramanan etal.‘15],[Salaetal.’17] …

Page 5: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

ModelandContribution

5

! = 101010111000100100010110101110000 01010001100111111000010

Ø # ≤ % deletions localizedinawindowofsize%windowofsize%

# deletions(inred)

Ø Ourassumptions:1)positionsofthedeletions areindependent ofthecodeword;2)informationmessage isuniform iid

Ø Contribution:Explicitcodeswithdeterministicpolynomialtimeencodinganddecodingthatcancorrectlocalizeddeletionswhp

• Logarithmicredundancy:& − ( = ) log ( +% + 1

• Polynomialtimeencodinganddecoding• Asymptoticallyvanishingprobabilityofdecoding failure• Canbegeneralized tomultiplewindows

Guess&Check(GC)Codes

Ø Hardproblemfor% = &

Ø [Schoeny etal.’17]:existenceofcodesfor% = 3,4

Page 6: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

GCCodesExample

6

• Encodingthemessageoflengthk=16: 1110000000110001

1110000000110001GF(17)

14 0 3 1 (6,4)MDS 14 0 3 1 1 0

Ø Guess1:

• Decoding

111000000000112 0 1?

5 12 0 1

2MDSparities

Ø Guess2: 14 3 0 1

Ø Guess3: 14 0 3 1

Ø Guess4: 14 0 0 4

1110 000000001

1110000000001

1110000000001

0

Decodedusing1st parity

Checkwith2nd parity

1110000000001

• Assumethatthedeletions (inred)affectonlyonesystematicblock

16bits 13bits

111000000011 0001

GF(17)

[Kas HannaandElRouayhebISIT17’]

Deletionsoccurinoneofthesewindows

Page 7: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

GeneralizingtoAnyWindowPosition

7

Ø Guess1:

• Decoding,windowofsizelogkcanaffectatmost2consecutive blocks

11000001100013 1?

14 4 3 1

Ø Guess2: 12 7 2 1

Ø Guess3: 12 1 11 15

1100000110001

1100000110001

12

Decodedusing1st &2nd parity

Checkwith3rd parity

• Sameencodingwithoneextraparity

1110010000110001GF(17)

(7,4)MDS 14 4 3 1 5 83MDSparities

12logkbits

1110010000110001 1100000110001

• Assumethat3deletions (inred)affect systematicbits,w=logk=4bits

16bits 13bits

window

Page 8: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

RecoveringtheMDSparities

8

• Buffer:wzeros +asingleone

Ø Howtorecover theMDSparitysymbolsatthedecoder?

Ø Trivialsolution:repeat theparitybits

systematicbits

(# + 1) repetitioncode

Ø Better solution:insertabuffer betweensystematicandMDSparitybits

• Ifparitybitsgetaffected ->simplyoutputthefirst( bits.ElseapplyGuess&Checkdecoding

• Deletionscannotaffect bothsystematicandparitybitssimultaneously

111000000011000100010000MDSparities

systematicbits

11100000001100010000000000001111 0… 0

systematicbits

111000000011000100001 00010000MDSparitiesbuffer

w+1bits

Page 9: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

Whendoesdecodingfail?

9

• Encodingthemessageoflengthk=16: 1100010001110010

1100010001110010GF(17)

14 4 7 2 (6,4)MDS 14 4 7 2 8 13

Ø Guess1:

• Decoding

00100011100104 7 2?

14 4 7 2

2MDSparities

Ø Guess2: 2 14 7 2

Ø Guess3: 2 3 1 2

Ø Guess4: 2 3 9 11

0010001110010

001000111 0010

0010001110010

13

Decodedusing1st parity

Checkwith2nd parity

logkbits

0010001110010

• Assumethatthedeletions (inred)affectonlyonesystematicblock

16bits 13bits

1100 010001110010

GF(17)

DecodingFailure

Page 10: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

Simulations– DecodingFailure

10

( (messagelengthinbits)

Prob

abilityofFailure

4 = 106 iterations

• Simulationresults for:% = log ( , # = log ( − 1 , c = 3,4MDSparities

Page 11: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

MainResults

11

Sketch of Theorem 2 (8 > : windows):

Ø Encoding complexity is ;(( log (), Decoding complexity is ;((<=>)

Ø Probability of decoding failure: Pr A ≤ (<(B=C)DE

Ø Redundancy: )(F% + 1) log (

Theorem 1 (One window): Guess & Check (GC) codes can correct inpolynomial time up to # ≤ % = G(log () localized deletions, whereH log ( < % < H + 1 log ( for some integer H ≥ 0. Let ) > H+ 2 be aconstant integer.

Ø Encoding complexity is ;(( log (), Decoding complexity is ;((L/ log ()Ø Probability of decoding failure: Pr A ≤ (B=CDE/ log (

Ø Redundancy: ) log ( + %+ 1

Sketch of Theorem 3 [ISIT ‘17] (Unrestricted deletions):Ø Redundancy: )(# + 1) log (

Ø Encoding complexity is ;(( log (), Decoding complexity is ;((N=>/ logN ()Ø Probability of decoding failure: Pr A = ;((>NDE/ logN ()

Page 12: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

TestGCCodesOnline

12

Ø C++&PythoncodesareavailableonGitHub

GitHub repository:https://github.com/serge-k-hanna/GC

Ø TestthecodesonlineusingtheJupyter notebook

Goto:https://try.jupyter.org/

Upload thenotebook filesfromhttp://eceweb1.rutgers.edu/csi/software.html

Ø Formoredetails:http://eceweb1.rutgers.edu/csi/software.html

Page 13: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

Simulations– RunningTime

13

C++:Earlyterminationwithprecomputing

Python:Allcaseswithoutprecomputing

Python:Allcaseswithprecomputing

Python:Earlyterminationwithprecomputing

Page 14: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

DecodingFailures:WhatHappened.

13

• Exampleforonedeletion:16-bitmessage0000100011110110

Ø (6,4)MDSencodingoverGF(17): 0 8 15 6 12 52parities

Ø Suppose14th bitgetsdeleted,decoding:v Guess1: 8 4 7 10

v Guess4: 0 8 15 6

Guesses1&4satisfythe2parities

• Probabilityofdecoding failureforagivenstring:combinatorialproblemthatdependsonthestringanddeletionposition

v Guess2:

v Guess3:8 4 7 10

8 4 7 10

• Decodingfailure:morethanonepossibleguess,different decodedstrings

• Proofapproach:assumemessage isuniformiid,averageoverallpossiblemessages

Page 15: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

DecodingFailure– 1Deletion15

Setofalltransmittedk-bitstrings

B B

A

SetofallGCdecoderoutputs

Setsatisfyingall) parities

AssumeWLOGthatGuess 1iscorrect,observetheoutputofdecoderatwrong GuessO ≠ 1

DecodingfailsifdecodedstringisinA

Lemma:atmost2differenttransmitted sequences canleadtothesamedecodedstring inanyGuessO ≠ 1

Setsatisfyingfirstparity

QR decodingfailureinguessO ≠ 1 = QR(decodedstringisin])

···

·

··

14

Fixed:- Guess- Deletion- Firstparity

Page 16: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

ProofofPr(F)forOneDeletion

15

B B

A

k : length of message

Yi : string decoded in Guess i

c : number of parities

A : set satisfying all c parities

B : set satisfying first parity

Unionbound

Lemma

Subspacecardinality

q : field size

Page 17: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

Claim

17

Ø Claim1(onedeletion):atmost2different transmittedstringscanleadtothesamedecodedstringinanywrongguess

Setofalltransmittedk-bitstrings

B B

SetofallGCdecoderoutputs

Setsatisfyingfirstparity

·

···

Fixed:- Guess- Deletionposition- Firstparity

Ø Claim2(# deletions): aconstant numberofdifferent transmittedstringscanleadtothesamedecodedstringinanywrongguess

Page 18: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

Claim- Example

17

^: = 0000000000000000

^_ = 0010000000000010

000000000000000

000000000000010

Ø Claim1(onedeletion):atmost2different transmittedstringscanleadtothesamedecodedstringinanywrongguess

0000000000000000

Ø Example

Ø 3rd bitdeleted;Guess:deletionoccurred in4th block

0 0 0 0 0parity

` 0 0 ` 0

EncodinginGF(16)

Ø ^L = 1111111111111111 and^a = 0010000000000000

Ø Twoconditions:(1)Symmetryconstraint;(2)Algebraic linearconstraint

Decodeb:

b_Received

Page 19: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

Claim

18

Ø Suppose3rd bitisdeleted, guess:deletionoccurred in4th block

8b1 + 4b2 + 2b4 + b5

b1b2b4b5 b6b7b8b9 b10b11b12b13 b14b15b16

8b6 + 4b7 + 2b8 + b9 8b10 + 4b11 + 2b12 + b13 ?

Symbol1 Symbol2 Symbol3 Erasure

Ø Howmanydifferent messagescanleadtosamedecodedstring?

GF(17)

Ø Symmetryconstraint: samedecodedstring⇒ samebitvaluesatpositionsofsymbols1,2and3

Ø Bitswhichcanbedifferent: and(deleted bit)b14, b15, b16 b3

Ø Algebraicconstraint:erasure isdecodedusingfirstparity

4b14 + 2(b3 + b15) + b16 = p1

b14, b15, b16, b3 2 GF (2)

p1 2 GF (17)

Equationhasatmost2solutions

Page 20: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

ApplicationtoFileSynchronization

20

• Interactive synchronizationalgorithmby[Venkataramanan etal.’15]Ø Isolatesingledeletions,useVTcodesØ Modification:isolate# orfewer deletions,useGCcodes

• Gain:(1)lesscommunicationrounds,(2)lowercommunicationcost

Upto75%improvement

Upto15%improvement

19

N=1000iterations

Page 21: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

Summary

16

• Guess&CheckCodes forlocalizeddeletions

Ø Explicitcodeconstruction withlogarithmicredundancy

Ø Deterministicpolynomialtimeencodinganddecoding

Ø Asymptoticallyvanishingprobabilityofdecoding failure

Ø Forsingleormultiplewindows

• Openproblems• Capacityofdeletionchannelwithlocalizeddeletions?• Codes foradversarial localizeddeletions• Andofcourse for“unrestricted” deletioncapacityandcodesarestill

openproblems

Page 22: Correcting Localized Deletions Using Guess & Check Codeseceweb1.rutgers.edu/~csi/Allerton_GC.pdf · v Guess 2: v Guess 3: 8 4 7 10 8 4 7 10 • Decoding failure: more than one possible

20