Dan Ernst: Improving Reliability

Page 1: Improving Reliability

We used parity to determine when a memory bit failed. We can protect buses from transmission failures using parity/ECC.

Detected with parity: 1-bit errors

Corrected with ECC (error-correcting codes)

Page 2: Parity

If any two different valid data values, in memory or on a bus, differ by at least 2 bits:

It is easy to detect when one bit fails, since a one-bit failure will result in an invalid data value.

How can we make sure any two good data values differ by at least two bits?

Page 3: Why parity works to detect one-bit errors…

• All valid (and unique) data encodings must differ by at least one bit.
  – Argument: if they don’t, they aren’t unique.
• Pick any two values:
  – If they differ by more than one bit, a single-bit error will not turn one into the other.
  – If they differ by one bit, then if we count the number of 0 bits in their encodings, one of the encodings will have an odd number of 0s and the other will have an even number. Therefore their parity bits MUST also differ, so the two encodings will differ by two bits. So we can’t change one valid encoding into another by changing only 1 bit!
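The argument above can be checked by brute force. The following is a minimal sketch, assuming odd parity computed over the 1 bits and a 6-bit data width; the helper names (`odd_parity_bit`, `hamming_distance`, `min_distance`) are illustrative and not from the slides.

```python
from itertools import combinations

def odd_parity_bit(bits):
    """Return the bit that makes the total number of 1s (data + parity) odd."""
    return 0 if sum(bits) % 2 == 1 else 1

def hamming_distance(a, b):
    """Count the positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def min_distance(width):
    """Smallest distance between any two parity-protected words of `width` data bits."""
    words = []
    for value in range(2 ** width):
        # Most significant bit first
        data = tuple((value >> i) & 1 for i in reversed(range(width)))
        words.append(data + (odd_parity_bit(data),))
    return min(hamming_distance(a, b) for a, b in combinations(words, 2))

print(min_distance(6))  # 2
```

Without the parity bit the minimum distance would be 1, so the extra bit is what lifts it to 2.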

Page 4: Example

• Two numbers: 100110 and 101110
• Add odd parity bits: 1001100 and 1011101
• Now these numbers differ by 2 bits
• What if they already differ by more than one bit?
  – No problem, a 1-bit error can’t turn one into the other
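The example can be reproduced in a few lines. This is only a sketch: it assumes the parity bit is appended on the right, as in the numbers above, and the function names are made up for illustration.

```python
def add_odd_parity(word: str) -> str:
    """Append a parity bit so the total count of '1' characters is odd."""
    parity = "0" if word.count("1") % 2 == 1 else "1"
    return word + parity

def is_valid(codeword: str) -> bool:
    """A received word is valid only if it still holds an odd number of 1s."""
    return codeword.count("1") % 2 == 1

a = add_odd_parity("100110")
b = add_odd_parity("101110")
differing = sum(x != y for x, y in zip(a, b))

# Flip the third bit of `a` to imitate a one-bit transmission error
corrupted = a[:2] + ("1" if a[2] == "0" else "0") + a[3:]

print(a, b, differing)                      # 1001100 1011101 2
print(is_valid(a), is_valid(corrupted))     # True False
```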

Page 5: Error Correcting Codes

If any two different valid data values, in memory or on a bus, differ by at least 3 bits:

It is easy to detect and correct a one-bit failure, since a one-bit failure will result in an invalid data value and we know which valid data value is only one bit away.

How can we make sure any two good data values differ by at least three bits?

Page 6: Error Correcting Codes

• Use multiple parity bits, each computing parity over a different set of data bits.
• Each data bit is used to calculate parity by a different combination (or permutation) of 2 or more parity bits.
  – data bit 0 may be used in the calculation of parity bits 1 and 2,
  – while data bit 1 is used by parity bits 1 and 3.
• When a parity bit is flipped, only its parity calculation will be wrong.

Page 7: ECC on 4 bits of data

• Data bit 0 is used by parity 0 and 1
• Data bit 1 is used by parity 0 and 1 and 2
• Data bit 2 is used by parity 0 and 2
• Data bit 3 is used by parity 1 and 2
  – P0 = odd_parity (D0, D1, D2)
  – P1 = odd_parity (D0, D1, D3)
  – P2 = odd_parity (D1, D2, D3)
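The three parity equations translate directly into code. The sketch below assumes `odd_parity` returns the bit that makes the total number of 1s odd, and it assumes an output order of D3 D2 D1 D0 P2 P1 P0; `ecc_encode` is an illustrative name.

```python
def odd_parity(*bits):
    """Return the parity bit that makes the count of 1s (inputs + parity) odd."""
    return 0 if sum(bits) % 2 == 1 else 1

def ecc_encode(d3, d2, d1, d0):
    """Encode 4 data bits into a 7-bit word: D3 D2 D1 D0 P2 P1 P0 (assumed order)."""
    p0 = odd_parity(d0, d1, d2)
    p1 = odd_parity(d0, d1, d3)
    p2 = odd_parity(d1, d2, d3)
    return (d3, d2, d1, d0, p2, p1, p0)

print(ecc_encode(1, 0, 0, 1))  # (1, 0, 0, 1, 0, 1, 0)
```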

Page 8: Calculating ECC (4 bits)

D3 D2 D1 D0 | P2 P1 P0      D3 D2 D1 D0 | P2 P1 P0
0  0  0  0  |               1  0  0  0  |
0  0  0  1  |               1  0  0  1  |
0  0  1  0  |               1  0  1  0  |
0  0  1  1  |               1  0  1  1  |
0  1  0  0  |               1  1  0  0  |
0  1  0  1  |               1  1  0  1  |
0  1  1  0  |               1  1  1  0  |
0  1  1  1  |               1  1  1  1  |

Page 9: Calculating ECC (4 bits)

D3 D2 D1 D0 | P0      D3 D2 D1 D0 | P0
0  0  0  0  | 1       1  0  0  0  | 1
0  0  0  1  | 0       1  0  0  1  | 0
0  0  1  0  | 0       1  0  1  0  | 0
0  0  1  1  | 1       1  0  1  1  | 1
0  1  0  0  | 0       1  1  0  0  | 0
0  1  0  1  | 1       1  1  0  1  | 1
0  1  1  0  | 1       1  1  1  0  | 1
0  1  1  1  | 0       1  1  1  1  | 0

Page 10: Calculating ECC (4 bits)

D3 D2 D1 D0 | P1 P0      D3 D2 D1 D0 | P1 P0
0  0  0  0  | 1  1       1  0  0  0  | 0  1
0  0  0  1  | 0  0       1  0  0  1  | 1  0
0  0  1  0  | 0  0       1  0  1  0  | 1  0
0  0  1  1  | 1  1       1  0  1  1  | 0  1
0  1  0  0  | 1  0       1  1  0  0  | 0  0
0  1  0  1  | 0  1       1  1  0  1  | 1  1
0  1  1  0  | 0  1       1  1  1  0  | 1  1
0  1  1  1  | 1  0       1  1  1  1  | 0  0

Page 11: Calculating ECC (4 bits)

D3 D2 D1 D0 | P2 P1 P0      D3 D2 D1 D0 | P2 P1 P0
0  0  0  0  | 1  1  1       1  0  0  0  | 0  0  1
0  0  0  1  | 1  0  0       1  0  0  1  | 0  1  0
0  0  1  0  | 0  0  0       1  0  1  0  | 1  1  0
0  0  1  1  | 0  1  1       1  0  1  1  | 1  0  1
0  1  0  0  | 0  1  0       1  1  0  0  | 1  0  0
0  1  0  1  | 0  0  1       1  1  0  1  | 1  1  1
0  1  1  0  | 1  0  1       1  1  1  0  | 0  1  1
0  1  1  1  | 1  1  0       1  1  1  1  | 0  0  0
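The P0, P1 and P2 columns on this slide can be recomputed from the three parity equations. The sketch below assumes rows run from 0000 to 1111 in D3 D2 D1 D0 order and prints each column as two groups of eight values (D3 = 0, then D3 = 1); `parity_columns` is an illustrative helper, not part of the lecture.

```python
def odd_parity(*bits):
    """Return the parity bit that makes the count of 1s (inputs + parity) odd."""
    return 0 if sum(bits) % 2 == 1 else 1

def parity_columns():
    """Return the P0, P1, P2 columns as 16-character strings, rows 0000..1111."""
    cols = {"P0": "", "P1": "", "P2": ""}
    for value in range(16):
        d3, d2, d1, d0 = ((value >> i) & 1 for i in (3, 2, 1, 0))
        cols["P0"] += str(odd_parity(d0, d1, d2))
        cols["P1"] += str(odd_parity(d0, d1, d3))
        cols["P2"] += str(odd_parity(d1, d2, d3))
    return cols

for name, column in parity_columns().items():
    print(name, column[:8], column[8:])
# P0 10010110 10010110
# P1 10011001 01100110
# P2 11000011 00111100
```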

Page 12: Test question

• The following 4-bit data value is encoded with ECC (as shown in class). Unfortunately it has a 1-bit error. Fix it!

D3 D2 D1 D0 P2 P1 P0
0  0  0  1  0  1  0

D3 is used by P2 and P1

Page 13: Test question Solution

• The flipped bit is D3, which should be a 1.

D3 D2 D1 D0 P2 P1 P0
1  0  0  1  0  1  0

D3 is used by P2 and P1
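One way to confirm this answer is to try every single-bit flip and keep the ones that produce a valid codeword. This is a brute-force sketch, not the method taught on the slides; it assumes the bits are laid out as D3 D2 D1 D0 P2 P1 P0 (the only order under which the stated solution satisfies all three parity equations), and the function names are hypothetical.

```python
def odd_parity(*bits):
    """Return the parity bit that makes the count of 1s (inputs + parity) odd."""
    return 0 if sum(bits) % 2 == 1 else 1

def is_codeword(word):
    """Check all three parity equations for a word in D3 D2 D1 D0 P2 P1 P0 order."""
    d3, d2, d1, d0, p2, p1, p0 = word
    return (p0 == odd_parity(d0, d1, d2)
            and p1 == odd_parity(d0, d1, d3)
            and p2 == odd_parity(d1, d2, d3))

def correct_single_error(word):
    """Return (position, fixed_word) for every one-bit flip that yields a codeword."""
    fixes = []
    for i in range(len(word)):
        candidate = list(word)
        candidate[i] ^= 1          # flip bit i
        if is_codeword(candidate):
            fixes.append((i, tuple(candidate)))
    return fixes

received = (0, 0, 0, 1, 0, 1, 0)
print(correct_single_error(received))  # [(0, (1, 0, 0, 1, 0, 1, 0))]
```

Position 0 is D3 in the assumed layout, and exactly one flip produces a valid word, which is what makes the error correctable.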

Page 14: How many ECC parity bits (P) do you need for N bits of data?

• You use 1 ECC parity bit pattern for each data bit error.
  – N bit patterns for fixing 1-bit errors
• Plus 1 more pattern for each parity bit
  – P bit patterns
• Plus 1 pattern for “correct value”

N + P + 1 ≤ 2^P

Page 15: How many ECC bits do you need for a 78-bit bus?

78 + P + 1 ≤ 2^P

• Left side: patterns I need to decide which bit is wrong if 1 bit is flipped
• Right side: unique patterns I can represent with P ECC parity bits

2^6 = 64 (too small)
2^7 = 128 (which is > 78+7+1)
So, 7 ECC parity bits needed
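The same search can be written as a short loop that increases P until the P parity bits can represent enough patterns. This is a sketch; `ecc_bits_needed` is an illustrative name.

```python
def ecc_bits_needed(n: int) -> int:
    """Smallest P such that n + P + 1 <= 2**P (single-error correction)."""
    p = 1
    while n + p + 1 > 2 ** p:
        p += 1
    return p

print(ecc_bits_needed(78))  # 7
print(ecc_bits_needed(4))   # 3
```

The second call matches the 4-bit example, which uses three parity bits (P0, P1, P2).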

Page 16: Question

• Prove that ECC has at least 3 bits different in any two representations.
• Answer:
  – If they are different, then they must differ by at least 1 data bit.
  – Each data bit is covered by at least 2 parity bits.
• If exactly 1 data bit differs, those parity bits must also differ, since one must have an odd number of data 1’s and the other even. That gives at least 3 differing bits.
• If exactly 2 data bits differ, they are covered by different combinations of parity bits, so at least one parity bit covers only one of them and must differ. If 3 or more data bits differ, the data bits alone give 3.
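For the 4-bit code from the earlier slides, the claim can also be checked exhaustively. The sketch below assumes the parity equations P0 = odd_parity(D0, D1, D2), P1 = odd_parity(D0, D1, D3), P2 = odd_parity(D1, D2, D3); `min_codeword_distance` is a made-up helper name.

```python
from itertools import combinations, product

def odd_parity(*bits):
    """Return the parity bit that makes the count of 1s (inputs + parity) odd."""
    return 0 if sum(bits) % 2 == 1 else 1

def ecc_encode(d3, d2, d1, d0):
    """Encode 4 data bits as D3 D2 D1 D0 P2 P1 P0 (assumed order)."""
    return (d3, d2, d1, d0,
            odd_parity(d1, d2, d3),
            odd_parity(d0, d1, d3),
            odd_parity(d0, d1, d2))

def min_codeword_distance():
    """Smallest number of differing bits between any two of the 16 codewords."""
    codewords = [ecc_encode(*data) for data in product((0, 1), repeat=4)]
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(codewords, 2))

print(min_codeword_distance())  # 3
```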

Page 17: Question

• How do you decide which bit is wrong?
• If only one parity bit has the wrong parity, then that parity bit has been corrupted.
  – Because any data bit is checked by at least 2 parity bits, a data bit failure will cause 2 or more parity bit errors.
• If two or more parity bits are wrong, then the pattern of the parity bits that fail uniquely identifies the corrupted data bit.
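The decision rule above amounts to a lookup from the set of failing parity checks to the corrupted bit. The following sketch applies it to the 4-bit code from the earlier slides; the table contents follow from which data bits each parity bit covers, while the names `FAILING_CHECKS_TO_BIT` and `locate_error` and the argument order are assumptions for illustration.

```python
def odd_parity(*bits):
    """Return the parity bit that makes the count of 1s (inputs + parity) odd."""
    return 0 if sum(bits) % 2 == 1 else 1

# Which bit each set of failing parity checks points to
FAILING_CHECKS_TO_BIT = {
    frozenset(): None,                       # nothing failed: word is correct
    frozenset({"P0"}): "P0",                 # a lone failure is the parity bit itself
    frozenset({"P1"}): "P1",
    frozenset({"P2"}): "P2",
    frozenset({"P0", "P1"}): "D0",
    frozenset({"P0", "P1", "P2"}): "D1",
    frozenset({"P0", "P2"}): "D2",
    frozenset({"P1", "P2"}): "D3",
}

def locate_error(d3, d2, d1, d0, p2, p1, p0):
    """Return the name of the single corrupted bit, or None if all checks pass."""
    failing = set()
    if p0 != odd_parity(d0, d1, d2):
        failing.add("P0")
    if p1 != odd_parity(d0, d1, d3):
        failing.add("P1")
    if p2 != odd_parity(d1, d2, d3):
        failing.add("P2")
    return FAILING_CHECKS_TO_BIT[frozenset(failing)]

print(locate_error(0, 0, 0, 1, 0, 1, 0))  # D3
print(locate_error(1, 0, 0, 1, 0, 1, 1))  # P0
print(locate_error(1, 0, 0, 1, 0, 1, 0))  # None
```

Every one of the eight possible failure patterns is used, which is why three parity bits are exactly enough for four data bits.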