Copyright © Curt Hill 2005-2012 Error detection and correction Techniques to Increase the...

Copyright © Curt Hill 2005-2012

Error detection and correction

Techniques to Increase the Reliability


The Story Starts with Parity
• The original error detection scheme
• Originally used on Teletypes in the 1930s
• Transmitting data over telephone/telegraph lines
• Add one bit to check if the rest of the data was correct
• Next used for tape drives in the 1950s
• Nine track tapes had eight data and one parity
  – Before that were seven and eight track tapes


How Does Parity Work?
• Parity was set to even or odd
• The data bits were summed
• Using even parity, the parity bit was set to make the sum even
• If data is: 0 1 1 0 1 0 1 0 set the parity bit to 0
  – It is already even
• If data is: 1 1 1 0 0 1 0 1 set the parity bit to 1
  – To make it even


Example
• Consider the following data:
• 01101100
  – Even parity: 011011000
  – Odd parity: 011011001
• 01110110
  – Even parity: 011101101
  – Odd parity: 011101100
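The rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the data arrives as a string of '0' and '1' characters and the parity bit is appended at the end, as in the examples; the function names `parity_bit` and `add_parity` are made up for this sketch.

```python
def parity_bit(data: str, even: bool = True) -> str:
    """Return the parity bit ('0' or '1') for a string of '0'/'1' data bits."""
    ones = data.count("1")   # "summing" the data bits
    bit = ones % 2           # 1 when the number of ones is odd
    if not even:
        bit ^= 1             # odd parity is the complement of even parity
    return str(bit)


def add_parity(data: str, even: bool = True) -> str:
    """Append the parity bit to the data bits."""
    return data + parity_bit(data, even)


print(add_parity("01101100"))              # 011011000
print(add_parity("01101100", even=False))  # 011011001
print(add_parity("01110110"))              # 011101101
print(add_parity("01110110", even=False))  # 011101100
```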


Options
• How many bits were being transmitted in a character?
• How fast were they being transmitted?
• What parity was being used?
• How many start and stop bits?
• The answers to all these questions constitute a protocol
• Which protocol was chosen did not matter, as long as both the sender and receiver agreed on the same one


Parity checking
• It is easy to check and generate
• If there is a one-bit error it will detect it but give you no clue where it is
• An error in an even number of bits is undetectable, but an error in an odd number of bits is detectable
• The number of data bits is irrelevant
  – A single parity bit can be put on 8 bits or 50
  – However, as the number of bits gets larger the protection gets smaller
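The detection limits listed above are easy to see in a small experiment. The sketch below assumes even parity and a word stored as a string of '0'/'1' characters with the parity bit last; `parity_ok` and `flip` are hypothetical helper names, not part of any library.

```python
def parity_ok(word: str) -> bool:
    """True if the word (data bits plus even-parity bit) holds an even number of ones."""
    return word.count("1") % 2 == 0


def flip(word: str, *positions: int) -> str:
    """Return a copy of word with the bits at the given indexes inverted."""
    bits = list(word)
    for p in positions:
        bits[p] = "1" if bits[p] == "0" else "0"
    return "".join(bits)


sent = "011011000"                     # 01101100 plus its even parity bit
print(parity_ok(sent))                 # True: no error
print(parity_ok(flip(sent, 2)))        # False: a one-bit error is detected
print(parity_ok(flip(sent, 2, 5)))     # True: a two-bit error slips through
print(parity_ok(flip(sent, 0, 2, 5)))  # False: a three-bit error is detected
```

Note that a failed check only says that something is wrong; nothing in the result points at the damaged bit.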


Semiconductor Memory
• Many machines also used parity to check their memory
• The original IBM PC and many successors used 9 bits to store an 8-bit byte
• If a parity error was detected the machine halted with a parity error
• Even that generation was not as reliable as hoped


Is this the best we can do?
• If we employ multiple error detection bits we can detect multiple-bit errors and correct single-bit errors
• How do we correct an error?
• Since each bit may only have two states, 0 or 1, all we have to know is which bit is bad
  – Correct it by reversing it
• How do we find the location of the error?


Basic Scheme
• Four data bits (0-3)
• Three syndrome bits (a-c)
• Each parity bit protects just three of the four
  – A protects 0-2 with parity
  – B protects 1-3 with parity
  – C protects 0, 1, 3 with parity
• Each data bit is protected by two or three of the parity bits
• The combination of parity bits that fail indicates which bit was in error


Four data and three parity
[Figure: diagram relating parity bits a, b, c to data bits 1-4]

Error   Parity
1       a, b
2       a, b, c
3       a, c
4       b, c
a       a
b       b
c       c
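The four-plus-three scheme can be sketched as follows. This is an illustration only: it assumes even parity and uses the 0-3 data-bit numbering and coverage from the Basic Scheme bullets (A covers 0-2, B covers 1-3, C covers 0, 1, 3), which differs from the 1-4 numbering in the picture. The names `COVERAGE`, `LOOKUP`, `parity_bits` and `correct` are invented for the sketch.

```python
# Which data bits each parity bit protects (from the Basic Scheme bullets).
COVERAGE = {"a": (0, 1, 2), "b": (1, 2, 3), "c": (0, 1, 3)}

# Map each set of failing parity bits to the data bit it identifies.
LOOKUP = {
    frozenset(name for name, bits in COVERAGE.items() if i in bits): i
    for i in range(4)
}


def parity_bits(data):
    """Even parity over the data bits covered by each of a, b, c."""
    return {name: sum(data[i] for i in bits) % 2 for name, bits in COVERAGE.items()}


def correct(data, stored):
    """Return data with a single-bit error repaired, given the stored parity bits."""
    recomputed = parity_bits(data)
    failing = frozenset(n for n in COVERAGE if recomputed[n] != stored[n])
    fixed = list(data)
    if failing in LOOKUP:          # two or three failures point at a data bit
        fixed[LOOKUP[failing]] ^= 1
    return fixed                   # zero or one failure: data bits are fine


data = [1, 0, 1, 1]
stored = parity_bits(data)
damaged = [1, 1, 1, 1]             # data bit 1 flipped in transit
print(correct(damaged, stored))    # [1, 0, 1, 1]
```

Because every data bit is covered by a different combination of parity bits, the set of failing checks is enough to name the damaged bit.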


More
• Single-bit error correction, by flipping the bit that is indicated
• It takes one more bit to detect two-bit errors
  – Otherwise a two-bit error will be mistaken for a one-bit error and wrongly corrected
• The three bits are collectively called the syndrome
• This scheme uses three error bits for four data bits and is somewhat wasteful

Who is responsible?
• Richard Hamming developed the first error correction codes
  – Known as Hamming codes
• Worked at Bell Labs
• Was concerned about transmitting digital data at high speeds
  – Telephone calls were often digitized by this time
• Parity is sufficient for telegraph, which is relatively low speed, but not for telephone



Error Correction
• In order to get error correction we need: $2^K - 1 \ge M + K$, where M is the number of data bits and K is the number of check (error correction) bits
• For four data bits we need three syndrome bits to get error correction, as in the basic scheme: $2^3 - 1 = 7 \ge 4 + 3$


Observations
• Notice the formula $2^K - 1 \ge M + K$
• Exponentials grow much faster than sums
• Thus adding one to the number of syndrome bits roughly doubles the number of protected bits
• Large word sizes are proportionally easier to protect than small ones


Summary
• The number of bits needed is summarized by this table:

Data bits   SEC   % overhead   SEC/DED   % overhead
8           4     50%          5         62.5%
16          5     31.3%        6         37.5%
32          6     18.8%        7         21.9%
64          7     10.9%        8         12.5%
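The SEC column can be recomputed by searching for the smallest K that satisfies the condition in its usual form, $2^K - 1 \ge M + K$. The sketch below assumes that form and assumes SEC/DED costs exactly one bit more than SEC; `sec_check_bits` is a name chosen for this example.

```python
def sec_check_bits(m: int) -> int:
    """Smallest K with 2**K - 1 >= M + K (single error correction)."""
    k = 1
    while 2**k - 1 < m + k:
        k += 1
    return k


for m in (8, 16, 32, 64):
    sec = sec_check_bits(m)
    ded = sec + 1   # one extra bit to also detect double errors
    print(f"{m:>2} data bits: SEC {sec} ({100 * sec / m:.1f}%), "
          f"SEC/DED {ded} ({100 * ded / m:.1f}%)")
```

Each doubling of the word size costs only one more check bit, which is why the percentage falls so quickly.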


Layout
• Let's try eight data bits and four bits of error correction
• The four bits can generate 16 values, 0-15
• We want value zero to represent no error
• If the four-bit value contains a single 1 bit, that indicates that the error is in the check bits and thus no correction is needed
• If the four-bit value contains more than one 1 bit, then we want this number to tell us the position of the bit that is wrong


Layout Picture

Position   Binary    Bit
12         1 1 0 0   M8
11         1 0 1 1   M7
10         1 0 1 0   M6
09         1 0 0 1   M5
08         1 0 0 0   C8
07         0 1 1 1   M4
06         0 1 1 0   M3
05         0 1 0 1   M2
04         0 1 0 0   C4
03         0 0 1 1   M1
02         0 0 1 0   C2
01         0 0 0 1   C1

• Each check bit guards the positions whose binary position number has a 1 in that check bit's place
• For example, C8 checks every bit whose position has a 1 in the 8s bit


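Computing Check Bits
• Each check bit is the even parity of four or five bits
• They are calculated as follows, reading the covered positions off the layout (⊕ is exclusive-or)
  – C1 = M1 ⊕ M2 ⊕ M4 ⊕ M5 ⊕ M7
  – C2 = M1 ⊕ M3 ⊕ M4 ⊕ M6 ⊕ M7
  – C4 = M2 ⊕ M3 ⊕ M4 ⊕ M8
  – C8 = M5 ⊕ M6 ⊕ M7 ⊕ M8
• The check bits recomputed from the received data are compared with the received check bits; the pattern of disagreements (the syndrome) should give the number of the bit in error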


An Example
• Consider the data bits: 0 1 0 0 1 1 1 0
• Place them in the layout: 0 1 0 0 c 1 1 1 c 0 c c
• Even parity for C8 bit: 1
  – 0 1 0 0 1 1 1 1 c 0 c c
• Even parity for C4 bit: 1
  – 0 1 0 0 1 1 1 1 1 0 c c
• Even parity for C2 bit: 1
  – 0 1 0 0 1 1 1 1 1 0 1 c
• Even parity for C1 bit: 1
  – 0 1 0 0 1 1 1 1 1 0 1 1


Suppose An Error
• The original word now comes back as:
  – 0 1 0 1 1 1 1 1 1 0 1 1
• Compute the new check bits
  – C8 = 0
  – C4 = 1
  – C2 = 1
  – C1 = 0
• C8 and C1 disagree with the received check bits
• Sum their weights, 8 + 1 = 9: the error is in position 9, which is data bit M5
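The worked example can be replayed in code. This is a minimal sketch under the layout shown earlier (positions 12 down to 1, check bits at positions 1, 2, 4 and 8, even parity); the dictionary representation and the names `encode`, `syndrome`, `correct` and `show` are choices made for this illustration.

```python
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]   # M1..M8
CHECK_POSITIONS = [1, 2, 4, 8]                 # C1, C2, C4, C8


def check_bits(word):
    """Recompute each check bit as even parity over the data positions it guards."""
    return {c: sum(word[p] for p in DATA_POSITIONS if p & c) % 2
            for c in CHECK_POSITIONS}


def encode(data_m8_to_m1: str) -> dict:
    """Build a 12-bit word (position -> bit) from eight data bits, M8 first."""
    bits = [int(b) for b in data_m8_to_m1]
    word = dict(zip(reversed(DATA_POSITIONS), bits))
    word.update(check_bits(word))
    return word


def syndrome(word) -> int:
    """Sum the weights of the check bits that disagree with the stored ones."""
    recomputed = check_bits(word)
    return sum(c for c in CHECK_POSITIONS if recomputed[c] != word[c])


def correct(word) -> dict:
    """Flip the bit named by the syndrome; a syndrome of 0 means no error."""
    s = syndrome(word)
    fixed = dict(word)
    if s in fixed:
        fixed[s] ^= 1
    return fixed


def show(word) -> str:
    """Print positions 12 down to 1, matching the slides."""
    return " ".join(str(word[p]) for p in range(12, 0, -1))


sent = encode("01001110")
print(show(sent))              # 0 1 0 0 1 1 1 1 1 0 1 1
received = dict(sent)
received[9] ^= 1               # damage data bit M5
print(show(received))          # 0 1 0 1 1 1 1 1 1 0 1 1
print(syndrome(received))      # 9
print(show(correct(received))) # 0 1 0 0 1 1 1 1 1 0 1 1
```

When the syndrome names a check-bit position (1, 2, 4 or 8), this sketch flips that check bit back; the data bits are untouched either way.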


Some examples of use
• IBM 30xx machines use an 8-bit SEC-DED code for each 64 bits, hence they have 12.5% overhead
• DEC VAX uses a 7-bit SEC-DED code for each 32 bits, hence it has about 22% overhead
• Some versions of RAID also use this to guarantee recoverability in the case of disk errors

Addendum
• Recently these concepts have been applied to internet/wireless data packets
• When a mobile phone drops a packet it must ask for it again
• This greatly reduces perceived bandwidth
• A study in Boston showed that 3% of packets for mobile phones were lost
• Packet loss on a fast-moving train is typically 5%


Packets
• The solution is to put error correction codes for nearby packets in adjacent packets
  – Known as coded TCP
• In one study with 2% packet loss, doing this boosted apparent bandwidth from 1 to 16 Mbit/s
• If loss rates are low this does not help, but losses in wireless networks are always present
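To give a feel for how redundancy spread across packets avoids a retransmission, here is a deliberately simple stand-in: one XOR parity packet sent alongside a group of equal-length packets. This is not the coded TCP algorithm itself, only an assumed toy model of the idea; the function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise exclusive-or of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Parity packet: the XOR of every packet in the group."""
    return reduce(xor_bytes, packets)


def recover(received, parity):
    """Rebuild the single missing packet (marked None) from the others plus parity."""
    present = [p for p in received if p is not None]
    return reduce(xor_bytes, present, parity)


group = [b"abcd", b"efgh", b"ijkl"]
parity = make_parity(group)
arrived = [group[0], None, group[2]]   # the middle packet was lost
print(recover(arrived, parity))        # b'efgh'
```

The receiver rebuilds the lost packet locally instead of asking for it again, at the cost of one extra packet per group; this toy version can repair only one loss per group.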

