Compliance, Collisions and Common Misconceptions by Rich Bocchinfuso
Over the past couple of years, compliance has become a major buzzword in the storage
industry. Regulatory bodies such as the SEC and the Federal Government now require
organizations to adhere to regulations, of which there are more than 16,000 worldwide.
The storage industry has responded with numerous technologies: many of them legacy
technologies, and many of them newer, more advanced technologies that have changed the
face of compliance and long-term archiving. While traditional technologies such as WORM
optical and WORM tape continue to play an active role in compliance and long-term
archiving, CAS (Content Addressable Storage) has emerged as the technology of choice.
Organizations can now cost-effectively store petabytes of data on online rotational disk
media, with reliability, availability, manageability, and serviceability that surpass
those of traditional WORM devices.
While Content Addressable Storage has revolutionized the compliance and long-term
archiving marketplace, some concerns have been raised in the past 6 to 9 months.
This article examines, at a high level, how Content Addressable Storage works and
some of the associated concerns.
The basic premise of Content Addressable Storage is that data sent to a compliant
device is hashed and stored on the device. The hash is the digital fingerprint of the data;
in theory, the only way to generate a duplicate hash is to hash exactly the same data.
Digitally fingerprinting data has provided benefits beyond compliance and guaranteed
authenticity. Hashing facilitates single-instance storage: data with an identical
fingerprint can be de-duplicated, and this functionality has a positive cascading effect
throughout an organization. CAS vendors offer a choice of hashing algorithms with their
products; the most common are MD5, SHA-1, and SHA-256.
Criteria                  MD5         SHA-1           SHA-256
Digest (hash) length      128 bits    160 bits        256 bits
Maximum size of data      no limit    2^64 - 1 bits   2^64 - 1 bits
Main advantage            Speed       Security        Potentially more secure
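To make the content-addressing idea concrete, here is a minimal Python sketch of a toy in-memory store. The class ContentAddressedStore and its methods are hypothetical illustrations, not any CAS vendor's API: each object is addressed by its hash, identical content is stored only once (single-instance storage), and the hash is re-checked on retrieval. Running it with each algorithm from the table also confirms the digest lengths in the first row.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory CAS (hypothetical; not any vendor's API)."""

    def __init__(self, algorithm="sha256"):
        self.algorithm = algorithm  # any name hashlib.new() accepts, e.g. "md5", "sha1"
        self._objects = {}          # content address (hex digest) -> stored bytes

    def address_of(self, data):
        return hashlib.new(self.algorithm, data).hexdigest()

    def put(self, data):
        """Store data once and return its content address."""
        address = self.address_of(data)
        # Single-instance storage: identical content yields the same address,
        # so a duplicate is recognized and not stored a second time.
        self._objects.setdefault(address, data)
        return address

    def get(self, address):
        data = self._objects[address]
        # Re-hash on retrieval to confirm the object still matches its fingerprint.
        if self.address_of(data) != address:
            raise ValueError("stored content no longer matches its address")
        return data

    def __len__(self):
        return len(self._objects)


for algo in ("md5", "sha1", "sha256"):
    store = ContentAddressedStore(algo)
    a1 = store.put(b"Q3 financial report")
    a2 = store.put(b"Q3 financial report")  # duplicate copy
    a3 = store.put(b"Q4 financial report")
    # Each hex digit encodes 4 bits, so len(a1) * 4 is the digest length in bits.
    print(algo, len(a1) * 4, "bits; objects stored:", len(store), "; duplicate detected:", a1 == a2)
```

Real CAS products layer retention policies, replication, and metadata on top of this basic lookup-by-fingerprint idea.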
Recently, collisions have been reported for both MD5 and SHA-1. These collisions were
produced in the lab by generating data crafted specifically to produce the same MD5 or
SHA-1 hash.
MD5 Collision Example:
# Perl: h2b() converts a hex string (whitespace ignored) to raw bytes
sub h2b { (my $h = shift) =~ s/\s+//g; return pack("H*", $h); }
$M1 = h2b("
d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
2f ca b5 87 12 46 7e ab 40 04 58 3e b8 fb 7f 89
55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 71 41 5a
08 51 25 e8 f7 cd c9 9f d9 1d bd f2 80 37 3c 5b
d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
dd 53 e2 b4 87 da 03 fd 02 39 63 06 d2 48 cd a0
e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 a8 0d 1e
c6 98 21 bc b6 a8 83 93 96 f9 65 2b 6f f7 2a 70");
$M2 = h2b("
d1 31 dd 02 c5 e6 ee c4 69 3d 9a 06 98 af f9 5c
2f ca b5 07 12 46 7e ab 40 04 58 3e b8 fb 7f 89
55 ad 34 06 09 f4 b3 02 83 e4 88 83 25 f1 41 5a
08 51 25 e8 f7 cd c9 9f d9 1d bd 72 80 37 3c 5b
d8 82 3e 31 56 34 8f 5b ae 6d ac d4 36 c9 19 c6
dd 53 e2 34 87 da 03 fd 02 39 63 06 d2 48 cd a0
e9 9f 33 42 0f 57 7e e8 ce 54 b6 70 80 28 0d 1e
c6 98 21 bc b6 a8 83 93 96 f9 65 ab 6f f7 2a 70");
# write $M1 to the file vec1 and $M2 to vec2 as raw bytes
for my $f (["vec1", $M1], ["vec2", $M2]) {
    open(my $fh, ">:raw", $f->[0]) or die "cannot write $f->[0]: $!";
    print $fh $f->[1];
    close($fh);
}
Hashing the two files:
$ md5sum.exe vec1 vec2; sha1sum.exe vec1 vec2
79054025255fb1a26e4bc422aef54eb4 *vec1
79054025255fb1a26e4bc422aef54eb4 *vec2
a34473cf767c6108a5751a20971f1fdfba97690a *vec1
4283dd2d70af1ad3c2d5fdc917330bf502035658 *vec2
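The above example represents a hash collision: M1 and M2 are different messages, yet
their MD5 hashes are identical (their SHA-1 hashes still differ). In the notation used
below, a collision is any pair (M, M1) with M ≠ M1 and H(M) = H(M1). This hash collision
was generated in the lab from artificial data produced specifically to cause a collision.
A brute force or birthday attack searches for collisions by generating candidate messages
M and M1 until their hashes match.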
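The collision can be checked independently with Python's standard hashlib module. This sketch rebuilds both messages from the hex listing in the MD5 collision example and compares their MD5 and SHA-1 digests; it only verifies the published pair and says nothing about how the pair was found.

```python
import hashlib

# The two 128-byte messages from the MD5 collision example, one 16-byte row per line.
M1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c"
    "2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a"
    "085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6"
    "dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1e"
    "c69821bcb6a8839396f9652b6ff72a70"
)
M2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c"
    "2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a"
    "085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6"
    "dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1e"
    "c69821bcb6a8839396f965ab6ff72a70"
)

# Positions (0-based) of the bytes that differ between the two messages.
differing = [i for i, (a, b) in enumerate(zip(M1, M2)) if a != b]
print("bytes that differ:", differing)
print("MD5  :", hashlib.md5(M1).hexdigest(), hashlib.md5(M2).hexdigest())
print("SHA-1:", hashlib.sha1(M1).hexdigest(), hashlib.sha1(M2).hexdigest())
```

The two messages differ in only six bytes, yet MD5 maps them to the same value while SHA-1 does not.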
The MD5 hash function produces 128-bit values, whereas SHA-1 produces 160-bit
values and SHA-256 produces 256-bit values. The question becomes: how many bits do
we need for security? Practically, 2^128, 2^160 and 2^256 are all more than large enough to
thwart a brute force attack that simply searches randomly for colliding pairs (M, M1).
However, a birthday attack reduces the size of the search space to roughly the square
root of the original size. Thus, MD5 has roughly the same resistance to the birthday
attack as a cryptosystem with 64-bit keys would have to a brute force attack. Similarly,
SHA-1's effective size in terms of birthday attack resistance is only 80 bits, and
SHA-256's is 128 bits.
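The square-root rule follows from the standard birthday approximation, assuming the hash behaves like a random function. If k messages are hashed into N = 2^n equally likely values, the probability that all k hashes are distinct is roughly the product below; setting the collision probability to one half and solving for k gives about 2^(n/2) messages:

```latex
\Pr[\text{no collision}] = \prod_{i=0}^{k-1}\Bigl(1 - \frac{i}{N}\Bigr) \approx e^{-k^{2}/(2N)},
\qquad N = 2^{n}

P(k) = 1 - e^{-k^{2}/(2N)} = \tfrac{1}{2}
\;\Longrightarrow\;
k \approx \sqrt{2 \ln 2 \cdot N} \approx 1.18 \cdot 2^{n/2}

n = 128\ (\text{MD5}): k \approx 2^{64}, \qquad
n = 160\ (\text{SHA-1}): k \approx 2^{80}, \qquad
n = 256\ (\text{SHA-256}): k \approx 2^{128}
```

The constant factor of about 1.18 is dropped in the rounded figures, which is why the text speaks of "roughly" the square root.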
The birthday attack is named for the birthday paradox: the fact that there is
approximately a 50-50 chance that at least two people in a room of 23 strangers share a
birthday. For a complete description of the birthday paradox, see
http://en.wikipedia.org/wiki/Birthday_paradox.
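The 23-person figure can be reproduced exactly. This is a minimal sketch assuming 365 equally likely birthdays (leap years ignored); the function name is illustrative. Passing days = 2**32 applies the same formula to a hypothetical 32-bit hash.

```python
from math import prod


def prob_shared_birthday(people: int, days: int = 365) -> float:
    """Exact probability that at least two of `people` share a 'birthday',
    assuming each of the `days` values is equally likely."""
    # Probability that all values are distinct: (days/days) * ((days-1)/days) * ...
    p_all_distinct = prod((days - i) / days for i in range(people))
    return 1.0 - p_all_distinct


print(f"{prob_shared_birthday(23):.4f}")  # 0.5073
# Same formula for a 32-bit hash: about 77,000 messages give roughly even odds.
print(f"{prob_shared_birthday(77_163, days=2**32):.3f}")
```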
A birthday attack essentially creates random messages, hashes them, and checks whether
each hash value has been encountered before. For MD5, as an example, an attacker could
expect to find a collision after trying about 2^64 messages. Given today's computing power,
this is a difficult but not impossible task.
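The loop described above can be sketched in a few lines of Python. Running it against full 128-bit MD5 would take about 2^64 steps, so this sketch assumes a deliberately weakened toy hash: the top 32 bits of MD5 (the helper truncated_md5 is hypothetical). It also uses counter-based messages rather than random ones, which is a simplifying assumption; MD5's output is spread evenly enough that the effect is the same. A collision typically appears after on the order of 2^16 messages.

```python
import hashlib
from itertools import count
from typing import Tuple


def truncated_md5(data: bytes, bits: int) -> int:
    """Top `bits` bits of the MD5 digest: a deliberately weakened toy hash."""
    return int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - bits)


def birthday_attack(bits: int) -> Tuple[bytes, bytes, int]:
    """Hash distinct messages, remembering every hash seen, until one repeats.

    Returns the colliding pair (M, M1) and the number of messages hashed.
    """
    seen = {}  # truncated hash value -> first message that produced it
    for i in count(1):
        msg = f"message #{i}".encode()
        h = truncated_md5(msg, bits)
        if h in seen:
            return seen[h], msg, i
        seen[h] = msg


M, M1, tries = birthday_attack(32)
print(M, M1, tries)
```

The dictionary of previously seen hashes is what makes this a birthday attack: every new message is compared against all earlier ones at once, instead of against a single fixed target.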
While collisions have been found for both MD5 and SHA-1, these are not real-world
examples. Artificially generating data until a collision is found in no way threatens the
authenticity or integrity of existing data.
The two hash attacks that can cause authenticity and data integrity problems are the 1st
preimage attack and the 2nd preimage attack.
A 1st preimage attack is best described as: given X (an existing hash value), find a
message M such that H(M) = X. No 1st preimage attack has ever been successful
against any of the hashing algorithms mentioned here.
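For contrast with collision searching, here is a brute-force 1st preimage search, again assuming a hypothetical toy hash made from the top bits of MD5. With no earlier hashes to compare against, each guess has only a 1-in-2^bits chance of matching the fixed target X, so the expected work is about 2^bits rather than 2^(bits/2). The sketch uses 16 bits so it finishes quickly; against full MD5 the same loop would need about 2^128 guesses.

```python
import hashlib
from itertools import count


def truncated_md5(data: bytes, bits: int) -> int:
    """Top `bits` bits of the MD5 digest: a deliberately weakened toy hash."""
    return int.from_bytes(hashlib.md5(data).digest(), "big") >> (128 - bits)


def first_preimage_search(target: int, bits: int):
    """Brute-force 1st preimage attack: given X = target, find some M with H(M) = X."""
    for i in count(1):
        guess = f"guess #{i}".encode()
        if truncated_md5(guess, bits) == target:
            return guess, i


BITS = 16
X = truncated_md5(b"an existing archived record", BITS)  # the hash the attacker is given
M, tries = first_preimage_search(X, BITS)
print(M, tries)
```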
A 2nd preimage attack can be described as: given a message M, find a different message
M1 with the same hash. This can be represented by the equation H(M) = H(M1), with
M ≠ M1. A 2nd preimage attack has also never been successful against any of the
mentioned hashing algorithms.
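The three attacks differ in what the attacker controls, and that difference shows up in their generic cost. For an ideal n-bit hash, the standard brute-force work factors are:

```latex
\begin{array}{lll}
\text{Attack} & \text{Goal} & \text{Generic work} \\
\text{Collision (birthday)} & \text{find } M \neq M_1 \text{ with } H(M) = H(M_1) & \approx 2^{n/2} \\
\text{1st preimage} & \text{given } X, \text{ find } M \text{ with } H(M) = X & \approx 2^{n} \\
\text{2nd preimage} & \text{given } M, \text{ find } M_1 \neq M \text{ with } H(M_1) = H(M) & \approx 2^{n}
\end{array}
```

For MD5, SHA-1 and SHA-256 (n = 128, 160, 256), the generic preimage costs are 2^128, 2^160 and 2^256, versus 2^64, 2^80 and 2^128 for collisions. These are costs for generic attacks on an idealized hash; cryptanalytic shortcuts against a specific algorithm can lower them.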
While brute force and birthday attacks do give some reason for concern about both MD5
and SHA-1, the key question to consider is: what damage can generating two bogus
messages with the same hash actually do? Why is this important?
Imagine for a moment that an adversary constructs two messages with the same hash,
where one message appears legitimate or innocuous. For example, suppose the attacker,
Charlie, discovers that the message "I, Bob, agree to pay Charlie $5000.00 on 4/12/2005."
has the same hash as "I, Bob, agree to pay Charlie $18542841.54 on 9/27/2012." Charlie
could then try to get Bob to digitally sign the first message (e.g., when Bob purchases
$5000 of goods). Charlie would then claim that Bob actually signed the second message,
and "prove" this assertion by showing that Bob's signature matches the second message.
While this is possible in theory, it would require a 2nd preimage attack, which has never
been successful against any of the aforementioned algorithms.
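The mechanism Charlie relies on is that a signature is computed over the hash of a message, not the message itself, so any two messages with the same hash carry the same signature. The sketch below illustrates this with the colliding MD5 pair listed earlier in this article. The sign and verify functions and the key are hypothetical; an HMAC stands in for a real public-key signature (such as RSA or DSA) purely to keep the example self-contained, but real schemes likewise sign H(M).

```python
import hashlib
import hmac

# Hypothetical stand-in for Bob's private signing key.
SECRET_KEY = b"bob-signing-key (hypothetical)"


def sign(message: bytes) -> bytes:
    """Toy hash-then-sign: MAC the MD5 digest of the message, not the message."""
    return hmac.new(SECRET_KEY, hashlib.md5(message).digest(), hashlib.sha256).digest()


def verify(message: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(message), signature)


# The colliding MD5 pair from the collision example.
M1 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
M2 = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)

sig = sign(M1)  # Bob signs the message he was shown
print(verify(M1, sig), verify(M2, sig))  # True True: the signature also "proves" M2
```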
In conclusion, there is no reason to believe that we are even close to a successful 1st
preimage or 2nd preimage attack, and no reason to panic over the use of the MD5 and
SHA-1 algorithms. I believe that, realistically, there is no problem with using either MD5
or SHA-1 for the foreseeable future. Customers who are concerned about the MD5 and/or
SHA-1 algorithms should ask their CAS vendor about alternative hashing algorithms;
most vendors support several.