A lecture on hashes

By Charles Morris

What is a hash table?

• A hash table is an array-like data structure that associates each input (the key) with an output (the record, or value).

• They use a ‘Hashing Function’ to create the association; more on this later.

• Hashes were first known as “Associative Arrays” (and you may think of them as such), but saying seven syllables is tiring.
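To make the key/value association concrete, here is a minimal sketch using the C++ standard library's hash table, std::unordered_map. The helper names make_ages and age_of, and the sample data, are assumptions made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper: builds a small table mapping names (keys) to ages (values).
std::unordered_map<std::string, int> make_ages()
{
    std::unordered_map<std::string, int> ages;
    ages["alice"] = 30; // the key is hashed to decide where the value is stored
    ages["bob"] = 25;
    assert(ages.size() == 2); // two distinct keys, two records
    return ages;
}

// Hypothetical helper: looks up a key and returns `fallback` when it is absent.
int age_of(const std::unordered_map<std::string, int>& ages,
           const std::string& name, int fallback)
{
    auto it = ages.find(name); // no scan over all records; the hash locates the bucket
    return it == ages.end() ? fallback : it->second;
}
```

Calling age_of(make_ages(), "alice", -1) yields 30, while a key that was never stored, such as "carol", yields the fallback.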

Why a hash?

• Hashes have many distinct benefits.

– Hashes are designed so that you may find a value without knowing its location.

– Computational complexity for lookup varies: it is almost always O(1), but when the key's slot holds ‘c’ colliding entries (where c <= n) it is O(c), like searching an array of c items.
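The O(c) case can be provoked on purpose. The sketch below plugs a deliberately bad hash into std::unordered_map so that every key collides, then reports how many entries share one bucket. ConstantHash and chain_length_after are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Deliberately terrible hash (an assumption for illustration): every key collides.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

// Inserts the keys 0..n-1 and reports how many entries share the bucket of key 0.
template <typename Hash>
std::size_t chain_length_after(int n)
{
    assert(n > 0);
    std::unordered_map<int, int, Hash> table;
    for (int k = 0; k < n; ++k) {
        table[k] = k;
    }
    return table.bucket_size(table.bucket(0));
}
```

With ConstantHash, all n entries sit in one bucket, so a lookup has to walk up to n of them (c = n). With a reasonable hash such as std::hash<int>, the bucket is expected to hold far fewer entries.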

What is a ‘hash function’?

• A hash function is simply a function that generates a fingerprint based on its input.

• Hashing functions are used in many fields. In cryptography, one-way hash functions are used to create a small pseudo-random checksum out of (normally) much larger data. This checksum is used for authentication and secure network transmissions, amongst other applications.

Hash functions and Collisions

• In a one-way cryptographic hash function, the number of collisions is not as important as the randomness of the collisions. These functions try to make it computationally infeasible to find a key ‘j’ such that f(j) = f(k), where k is the original key.

Hash functions and Collisions

• In a hash function used to store data in a hash table, collisions should be minimized as much as possible: every time a collision happens for a key ‘j’ where f(j) == f(k), it increases the cost of looking up k.

• This assumes that your hash table watches for collisions. If it doesn’t, the old value will be trampled by the new value.
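The trampling case is easy to reproduce. The sketch below is a deliberately naive table that stores one value per slot and never checks for collisions. NaiveTable and toy_hash are hypothetical names, and the multiply-by-33 hash is just an assumed stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy hash (assumed for illustration): multiply by 33, then add each character code.
unsigned long toy_hash(const std::string& key)
{
    unsigned long h = 0;
    for (unsigned char c : key) {
        h = h * 33 + c;
    }
    return h;
}

// Hypothetical naive table: one value per slot, no collision detection at all.
class NaiveTable {
public:
    explicit NaiveTable(std::size_t slots) : slots_(slots) { assert(slots > 0); }

    void put(const std::string& key, const std::string& value)
    {
        slots_[toy_hash(key) % slots_.size()] = value; // overwrites whatever was there
    }

    const std::string& get(const std::string& key) const
    {
        return slots_[toy_hash(key) % slots_.size()];
    }

private:
    std::vector<std::string> slots_;
};
```

In a 4-slot table, the keys "a" (code 97) and "e" (code 101) both land in slot 1, so storing "apple" under "a" and then "egg" under "e" makes get("a") return "egg".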

Collision mitigation

• As was stated, if two keys ‘k’ and ‘j’ both hash to the same index ( f(j) == f(k) ), this causes a collision.

• Collisions are reduced by using a good hashing algorithm; however, they always happen to some degree once a hash function is given enough data to run on.

Collision resolution

• Chaining

– Instead of one value being at the location f(k), there is a chain (maybe a linked list) of values.

• Linear or quadratic probing

– A similar solution that keeps every value in the table itself: when slot f(k) is taken, further slots at fixed (linear) or growing (quadratic) offsets are tried until a free one is found.

• Double hashing

– Double hashing requires a second hash computation per operation (a constant extra cost); in return, the records are spread out so well that repeated collisions become very rare.
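The strategies above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production table: toy_hash, ChainedTable, and the three probe helpers are hypothetical names, and the formula used for the double-hashing step is one common choice among several.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Toy hash (assumed for illustration): multiply by 33, then add each character code.
unsigned long toy_hash(const std::string& key)
{
    unsigned long h = 0;
    for (unsigned char c : key) {
        h = h * 33 + c;
    }
    return h;
}

// Chaining: each slot holds a linked list of (key, value) pairs.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t slots) : buckets_(slots) { assert(slots > 0); }

    void put(const std::string& key, int value)
    {
        auto& chain = buckets_[toy_hash(key) % buckets_.size()];
        for (auto& entry : chain) {
            if (entry.first == key) { // same key seen before: update in place
                entry.second = value;
                return;
            }
        }
        chain.emplace_back(key, value); // new key (possibly a collision): extend the chain
    }

    // Returns a pointer to the stored value, or nullptr when the key is absent.
    const int* find(const std::string& key) const
    {
        const auto& chain = buckets_[toy_hash(key) % buckets_.size()];
        for (const auto& entry : chain) {
            if (entry.first == key) {
                return &entry.second;
            }
        }
        return nullptr;
    }

private:
    std::vector<std::list<std::pair<std::string, int>>> buckets_;
};

// Open addressing: slot tried on the i-th attempt for hash h in a table of m slots.
std::size_t linear_probe(unsigned long h, std::size_t i, std::size_t m)
{
    assert(m > 0);
    return (h + i) % m; // step one slot at a time
}

std::size_t quadratic_probe(unsigned long h, std::size_t i, std::size_t m)
{
    assert(m > 0);
    return (h + i * i) % m; // offsets grow as 0, 1, 4, 9, ...
}

// Double hashing: a second hash h2 picks the step, kept in 1..m-1 so it is never zero.
std::size_t double_hash_probe(unsigned long h, unsigned long h2, std::size_t i, std::size_t m)
{
    assert(m > 1);
    return (h + i * (1 + h2 % (m - 1))) % m;
}
```

In a 4-slot ChainedTable the keys "a" and "e" collide, yet both values stay retrievable because they share a chain instead of overwriting each other. The probe helpers only compute where to look next; a full open-addressing table would also need to store keys and handle deletion.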

Hash function example

• The hashing function used in Perl 5.005 (a close relative of the popular ‘djb2’ algorithm):

// (Defined by the PERL_HASH macro in hv.h)
// ported by Charles Morris (me) for C++ programmers
#include <string>

unsigned long hashingfunction( const std::string& key )
{
    unsigned long fingerprint = 0;

    for( std::string::size_type j = 0; j < key.length(); j++ ) // for each letter in the string
    {
        // multiply the running fingerprint by 33, then add the character code
        fingerprint = fingerprint * 33 + (int)key.at(j);
    }

    return fingerprint;
}
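A fingerprint is usually far larger than the table, so it still has to be reduced to a slot number. A minimal sketch, assuming the common modulo reduction: slot_for is a hypothetical helper, and the hash is repeated only so the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The lecture's hash, repeated here so this sketch compiles on its own.
unsigned long hashingfunction(const std::string& key)
{
    unsigned long fingerprint = 0;
    for (std::string::size_type j = 0; j < key.length(); j++) {
        fingerprint = fingerprint * 33 + (int)key.at(j);
    }
    return fingerprint;
}

// Hypothetical helper: reduce the fingerprint to a slot in a table of table_size slots.
std::size_t slot_for(const std::string& key, std::size_t table_size)
{
    assert(table_size > 0);
    return hashingfunction(key) % table_size;
}
```

For example, in a table of 101 slots the key "abc" (fingerprint 108966) lands in slot 88. A prime table size is a common choice because it tends to spread fingerprints more evenly, though that is a design preference rather than a requirement.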

Hash function example

• hf(‘abc’) using the previous function:

fingerprint = 0

//fingerprint = fingerprint * 33 + (int)'a';
fingerprint = 0 * 33 + 97; //fingerprint = 97

//fingerprint = fingerprint * 33 + (int)'b';
fingerprint = (97 * 33) + 98; //fingerprint = 3299

//fingerprint = fingerprint * 33 + (int)'c';
fingerprint = (3299 * 33) + 99; //fingerprint = 108966
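The hand trace above can be checked mechanically. The sketch below records the fingerprint after each character is folded in; fingerprint_steps is a hypothetical helper written for this check, and it assumes ASCII input.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the running fingerprint after each character.
std::vector<unsigned long> fingerprint_steps(const std::string& key)
{
    std::vector<unsigned long> steps;
    unsigned long fingerprint = 0;
    for (char c : key) {
        fingerprint = fingerprint * 33 + (int)c; // same update rule as the lecture's function
        steps.push_back(fingerprint);
    }
    assert(steps.size() == key.size()); // one recorded step per character
    return steps;
}
```

For the key "abc" the recorded steps are 97, 3299, and 108966, matching the trace.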
