[IEEE 14th International Conference on Sciences and Techniques of Automatic Control and Computer...

An FPGA Implementation and Comparison of the SHA-256 and Blake-256

Fatma Kahri, Belgacem Bouallegue, Mohsen Machhout and Rached Tourki Electronics and Micro-Electronics Laboratory (E. µ. E. L)

Faculty of Sciences of Monastir, Tunisia [email protected]

Abstract— Since the introduction of the Secure Hash Algorithm (SHA), it has been thoroughly studied by designers with the goal of reducing the area and power consumption and improving the frequency and throughput of hardware implementations of this cryptosystem. The Secure Hash Algorithm has become the default choice for security services in numerous applications. Following the attacks considered against the SHA-2 standard, a new hash version known as SHA-3 was developed. In this paper, we discuss the SHA-3 hash and present the candidate chosen for our application, Blake-256. We study the hash functions SHA-256 and Blake-256 and conduct a comparative study between the two. SHA-256 and Blake-256 have been implemented on Xilinx Virtex-5, Virtex-6 and Virtex-7 FPGAs. Their area, frequency, throughput and efficiency have been compared, and it is shown that Blake-256 achieves good performance in terms of area, throughput and efficiency.

Keywords— Cryptography, Hash functions, SHA-2 (256), SHA-3, BLAKE, FPGA.

I. INTRODUCTION

In today's modern world of e-mail, internet banking, on-line shopping, and other sensitive digital communications, cryptography has become a vital tool for ensuring the privacy of data transfers.

A hash function is a type of cryptographic primitive. Hash algorithms take as input a message of arbitrary length, and produce a hash or message digest as output. This process can be denoted as:

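$$h = H(M) \qquad (1)$$

where M is the input message and h is the hash generated by the hash algorithm H. Normally, the size of the hash h is fixed by the algorithm. A cryptographically strong hash function has the following properties:

• One-way property: given h, it is computationally infeasible to find x such that $H(x) = h$. (2)
• Weak collision resistance: given x, it is computationally infeasible to find $y \neq x$ such that $H(x) = H(y)$. (3)
• Strong collision resistance: it is computationally infeasible to find any pair $x \neq y$ such that $H(x) = H(y)$. (4)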
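The fixed output size in (1) can be observed with a few lines of Python. This is only an illustrative sketch that uses the standard `hashlib` module rather than the hardware design of this paper; the helper name `digest_hex` is our own.

```python
import hashlib


def digest_hex(message: bytes) -> str:
    """Return h = H(M) for H = SHA-256, as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()


short = digest_hex(b"abc")             # 3-byte message
long_ = digest_hex(b"abc" * 10_000)    # 30,000-byte message
changed = digest_hex(b"abd")           # one character differs from b"abc"

# Each hex character carries 4 bits, so both digests are 256 bits long.
print(len(short) * 4, len(long_) * 4)
# A small change in the message gives a different digest.
print(short != changed)
```

The sketch illustrates only the fixed digest size and the sensitivity to message changes; it does not demonstrate the infeasibility statements in (2) to (4).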

Hash functions operate at the root of many popular cryptographic methods in current use, such as the Digital Signature Standard (DSS), Transport Layer Security (TLS) and Internet Protocol Security (IPSec) protocols, numerous random number generation algorithms, encryption algorithms, all-or-nothing transforms, and password storage mechanisms [1,2,3].

II. BACKGROUND

Descriptions of the SHA-1, SHA-2 and SHA-3 algorithms can be found in the official NIST standard [4]. Table 1 compares the characteristics of the three hash functions. The security of these hash functions is determined by the size of their outputs, referred to as hash values. All functions have a similar internal structure and process each message block using multiple rounds. These hash functions make it possible to check a message's integrity: any change to the message will, with a very high probability, result in a different message digest.

TABLE 1: Functional characteristics of three hash functions

Hash function                   SHA-1    SHA-2    SHA-3
Constants Kt number             4        64       16
Size of hash value (n)          160      256      256
Complexity of the best attack   2^80     2^…
Message size                    <2^64    <2^…
Message block size (m)          512      512      512
Word size                       32       32       32
Number of words                 5        8
Digest rounds number            80       64       10

III. SHA-2 DESCRIPTION

A. General

SHA-256 accepts messages of arbitrary length up to 2^64 bits. The SHA-256 hash function produces a final message digest of 256 bits that depends on the input message, which is composed of multiple blocks of 512 bits each. Each input block is expanded and fed to the 64 cycles of the SHA-256 function in words of 32 bits each (denoted by Wt). Intermediate hash values are routed back into the compression loop [5,6].

14th international conference on Sciences and Techniques of Automatic control & computer engineering - STA'2013, Sousse, Tunisia, December 20-22, 2013

STA'2013-PID3195-CEM

978-1-4799-2954-2/13/$31.00 ©2013 IEEE

B. Message Padding

The binary message to be processed is appended with a '1' and padded with zeros until its length ≡ 448 mod 512. The original message length is then appended as a 64-bit binary number. The resultant padded message is parsed into N 512-bit blocks, denoted M(1), M(2), ..., M(N). These M(i) message blocks are passed individually to the message expander [7].

C. Preprocessing

As with other popular hash functions, with SHA-256 the message to be hashed is first padded so that its final length is a multiple of 512 bits. A single 1-bit is appended to the end of the l-bit message. Then, k 0-bits are added until the length of the message is congruent to 448 modulo 512. A 64-bit representation of l is appended to the result of the padding. Thus, the length of the resulting message is a multiple of 512 bits. The padded message is parsed into blocks M(i), which are passed individually to the message expander. Padding can be represented as:

$$l + 1 + k \equiv 448 \pmod{512}$$

Fig. 1. Message preprocessing
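The padding rule can be sketched in software as follows. This is a minimal illustration for byte-aligned messages only, not the padding unit of our processor; the function names `sha256_pad` and `split_blocks` are hypothetical.

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message so that l + 1 + k ≡ 448 (mod 512)."""
    l = 8 * len(message)              # message length in bits
    k = (448 - (l + 1)) % 512         # number of zero bits to append
    # For byte-aligned input, the '1' bit and the first 7 zero bits form
    # the byte 0x80; the remaining k - 7 zero bits are whole zero bytes.
    zero_bytes = (k - 7) // 8
    return message + b"\x80" + b"\x00" * zero_bytes + l.to_bytes(8, "big")


def split_blocks(padded: bytes) -> list:
    """Parse the padded message into N blocks of 512 bits (64 bytes)."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


blocks = split_blocks(sha256_pad(b"abc"))
print(len(blocks), len(blocks[0]) * 8)
```

A 55-byte message still fits in one block, whereas a 56-byte message needs a second block because the 64-bit length field no longer fits in the first one.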

D. Algorithm

The message block M(i) is expanded by a message scheduler according to the following function:

For j = 0 to 15: Wj ← Mj(i)

For j = 16 to 63 {
    Wj ← σ1(Wj-2) + Wj-7 + σ0(Wj-15) + Wj-16
}

For i = 1 to N
{
    Initialize registers a, b, c, d, e, f, g, h with the (i-1)st intermediate hash value.

    Apply the following compression function to registers a-h:

    For j = 0 to 63
    {
        T1 ← h + ∑1(e) + Ch(e, f, g) + Kj + Wj
        T2 ← ∑0(a) + Maj(a, b, c)
        h ← g, g ← f, f ← e, e ← d + T1
        d ← c, c ← b, b ← a, a ← T1 + T2
    }

    ith intermediate hash:
    H1(i) ← a + H1(i-1)
    ….
    H8(i) ← h + H8(i-1)
}

The hash of M:
H(N) = (H1(N), H2(N), …, H8(N))
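The message schedule and compression loop above can be cross-checked with a short software model. The following is a sketch for illustration only, not the VHDL design of this paper. It assumes byte-aligned input, derives the constants Kj and the initial hash value from the cube and square roots of the first primes as defined in the NIST standard, and compares the result with Python's `hashlib`. All helper names are our own.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF


def _primes(n):
    """Return the first n prime numbers."""
    found, c = [], 2
    while len(found) < n:
        if all(c % p for p in found):
            found.append(c)
        c += 1
    return found


def _icbrt(x):
    """Integer cube root (floor), found by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo


# First 32 fractional bits of the square (cube) roots of the first primes.
H0 = [isqrt(p << 64) & MASK for p in _primes(8)]
K = [_icbrt(p << 96) & MASK for p in _primes(64)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    """Process one 512-bit block: message schedule, then 64 rounds."""
    w = list(struct.unpack(">16I", block))
    for j in range(16, 64):
        s0 = rotr(w[j - 15], 7) ^ rotr(w[j - 15], 18) ^ (w[j - 15] >> 3)
        s1 = rotr(w[j - 2], 17) ^ rotr(w[j - 2], 19) ^ (w[j - 2] >> 10)
        w.append((s1 + w[j - 7] + s0 + w[j - 16]) & MASK)
    a, b, c, d, e, f, g, h = state
    for j in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[j] + w[j]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e = g, f, e, (d + t1) & MASK
        d, c, b, a = c, b, a, (t1 + t2) & MASK
    # Add the working registers to the previous intermediate hash value.
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg: bytes) -> bytes:
    padded = (msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
              + struct.pack(">Q", 8 * len(msg)))
    state = H0
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">8I", *state)


# Cross-check against the reference implementation in the standard library.
print(sha256(b"abc") == hashlib.sha256(b"abc").digest())
```

Computing the constants instead of typing in 64 literals keeps the sketch short and avoids transcription errors; a hardware design would store them in ROM.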

IV. SHA-3 DESCRIPTION

A. Algorithm Description

In the following part, we briefly describe the main concepts used in the hash function Blake-256 and the general design choices we have made for our hardware implementations. BLAKE is the SHA-3 candidate we consider. Its design does not reinvent the wheel: BLAKE is built on previously studied components, chosen for their complementarity [8].

This section defines BLAKE-256, going from its constant parameters to its compression function. The BLAKE-256 compression function works on an internal state of 512 bits, represented as a (4 × 4) matrix of 32-bit words. Figure 2 shows the matrix:

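$$
\begin{pmatrix}
v_0 & v_1 & v_2 & v_3 \\
v_4 & v_5 & v_6 & v_7 \\
v_8 & v_9 & v_{10} & v_{11} \\
v_{12} & v_{13} & v_{14} & v_{15}
\end{pmatrix}
\leftarrow
\begin{pmatrix}
h_0 & h_1 & h_2 & h_3 \\
h_4 & h_5 & h_6 & h_7 \\
s_0 \oplus c_0 & s_1 \oplus c_1 & s_2 \oplus c_2 & s_3 \oplus c_3 \\
t_0 \oplus c_4 & t_0 \oplus c_5 & t_1 \oplus c_6 & t_1 \oplus c_7
\end{pmatrix}
$$

Fig. 2. Matrix of 32-bit words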

With
• h = h0…h7: chain value (the initial value is the same as for SHA-256).
• c = c0…c15: constants.
• t = t0, t1: counter.
• s = s0…s3: salt.
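The initialization of the state matrix can be sketched in a few lines of Python. This is only an illustration, not our hardware description. The initial value is that of SHA-256, and the constants c0…c7 are transcribed from the BLAKE specification and should be checked against it; the function name `init_state` is hypothetical.

```python
# Initial chain value (same as SHA-256).
IV = [0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
      0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19]

# Constants c0..c7 (only the first eight are needed for initialization).
C = [0x243F6A88, 0x85A308D3, 0x13198A2E, 0x03707344,
     0xA4093822, 0x299F31D0, 0x082EFA98, 0xEC4E6C89]


def init_state(h, s, t):
    """Build the 16-word state v from chain value h, salt s and counter t."""
    t0, t1 = t
    return (list(h)                                   # rows 1-2: h0..h7
            + [s[i] ^ C[i] for i in range(4)]         # row 3: s_i xor c_i
            + [t0 ^ C[4], t0 ^ C[5],                  # row 4: counter words
               t1 ^ C[6], t1 ^ C[7]])


# State for a first block with zero salt and a counter of 512 bits.
v = init_state(IV, [0, 0, 0, 0], (512, 0))
print([f"{word:08x}" for word in v])
```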

Once the state v is initialized, the compression function iterates a series of 14 rounds. A round is a transformation of the state v that computes G0(v0, v4, v8, v12), G1(v1, v5, v9, v13), G2(v2, v6, v10, v14), G3(v3, v7, v11, v15), G4(v0, v5, v10, v15), G5(v1, v6, v11, v12), G6(v2, v7, v8, v13), G7(v3, v4, v9, v14), where, at round r, Gi(a, b, c, d) sets:

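$$
\begin{aligned}
a &\leftarrow a + b + \left(m_{\sigma_r(2i)} \oplus c_{\sigma_r(2i+1)}\right) \\
d &\leftarrow (d \oplus a) \ggg 16 \\
c &\leftarrow c + d \\
b &\leftarrow (b \oplus c) \ggg 12 \\
a &\leftarrow a + b + \left(m_{\sigma_r(2i+1)} \oplus c_{\sigma_r(2i)}\right) \\
d &\leftarrow (d \oplus a) \ggg 8 \\
c &\leftarrow c + d \\
b &\leftarrow (b \oplus c) \ggg 7
\end{aligned}
$$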

[Fig. 1 layout: message M (l bits), followed by a single 1 bit, k zero bits (000…0) and the 64-bit value of l; total length N × 512 bits.]

The first four calls G0, …, G3 can be computed in parallel, because each of them updates a different column of the matrix. We call the computation of G0, …, G3 a column step. Likewise, the last four calls G4, …, G7 update distinct diagonals and can thus be parallelized as well; we call this a diagonal step. A single G makes six additions modulo 2^32, six XORs and four word rotations by a fixed distance. Figure 3 shows the G function for index i. A single round consists of eight invocations of the G function: four on the columns of the state and four on the diagonals of the state. A total of 14 rounds is executed.

Fig. 3. The Gi function.
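The Gi function and one round (column step followed by diagonal step) can be modelled in software as below. This is a sketch for illustration, not the data path of our processor. The constants `C` and the permutations `SIGMA` are transcribed from the BLAKE specification and should be checked against it; the names `g`, `blake_round` and `CALLS` are our own.

```python
MASK = 0xFFFFFFFF

C = [0x243F6A88, 0x85A308D3, 0x13198A2E, 0x03707344,
     0xA4093822, 0x299F31D0, 0x082EFA98, 0xEC4E6C89,
     0x452821E6, 0x38D01377, 0xBE5466CF, 0x34E90C6C,
     0xC0AC29B7, 0xC97C50DD, 0x3F84D5B5, 0xB5470917]

SIGMA = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3],
    [11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4],
    [7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8],
    [9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13],
    [2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9],
    [12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11],
    [13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10],
    [6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5],
    [10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0],
]

# State indices used by G0..G3 (columns) and G4..G7 (diagonals).
CALLS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
         (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def g(a, b, c, d, x, y):
    """One G call; x and y are the two (message xor constant) words."""
    a = (a + b + x) & MASK
    d = rotr(d ^ a, 16)
    c = (c + d) & MASK
    b = rotr(b ^ c, 12)
    a = (a + b + y) & MASK
    d = rotr(d ^ a, 8)
    c = (c + d) & MASK
    b = rotr(b ^ c, 7)
    return a, b, c, d


def blake_round(v, m, r):
    """Apply round r (column step, then diagonal step) to a copy of v."""
    v = list(v)
    s = SIGMA[r % 10]
    for i, (ia, ib, ic, id_) in enumerate(CALLS):
        x = m[s[2 * i]] ^ C[s[2 * i + 1]]
        y = m[s[2 * i + 1]] ^ C[s[2 * i]]
        v[ia], v[ib], v[ic], v[id_] = g(v[ia], v[ib], v[ic], v[id_], x, y)
    return v


state = blake_round([0] * 16, [0] * 16, 0)
print([f"{word:08x}" for word in state])
```

In this software model the eight G calls run sequentially; in hardware, the four calls of a column step (or of a diagonal step) touch disjoint state words and can therefore be evaluated in parallel.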

After the sequence of rounds, the new chain value h'0, …, h'7 is extracted from the state, with input of the initial chain value h0, …, h7 and the salt s0, …, s3: the finalization takes the output of the 14 rounds and combines it with the input chaining value and the salt.

B. BLAKE-256 hash function

The whole hash function operation is divided into two stages: (1) padding and (2) hash computation. Pre-processing involves padding the input message, parsing the padded data into a number of m-bit blocks (m = 512) and setting the appropriate initial values, which are used in the hash computation. The hash computation applies functions, constants, and word-level logical and algebraic operations to the padded data to iteratively generate a series of hash values. After the specified number of transformation rounds, the produced hash value becomes the message digest. The BLAKE-256 compression function is used iteratively as follows:

h0 ← IV
for i = 0, …, N-1
    hi+1 ← compress(hi, mi, s, li)
return hN
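The iteration can be expressed as a small driver loop. In the sketch below the compression function is passed in as a parameter, and `toy_compress` is a deliberately trivial stand-in that is NOT the BLAKE-256 compression function; it only makes the control flow runnable. All names are hypothetical.

```python
def iterate_hash(iv, blocks, salt, counters, compress):
    """h0 <- IV; h(i+1) <- compress(h(i), m(i), s, l(i)); return h(N)."""
    h = list(iv)
    for m_i, l_i in zip(blocks, counters):
        h = compress(h, m_i, salt, l_i)
    return h


def toy_compress(h, m, s, l):
    """Placeholder mixing step (not BLAKE): xor chain, block, salt, counter."""
    return [(x ^ y ^ s[i % 4] ^ l) & 0xFFFFFFFF
            for i, (x, y) in enumerate(zip(h, m))]


blocks = [[1] * 8, [2] * 8]        # two dummy blocks of eight words each
counters = [512, 1024]             # number of message bits hashed so far
digest = iterate_hash([0] * 8, blocks, [0] * 4, counters, toy_compress)
print(digest)
```

Passing the counter li with every block is what distinguishes this iteration from the plain chaining used by SHA-256, where the compression function sees only the chain value and the block.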

V. BLAKE-256/SHA-256 PROCESSOR DESIGN IMPLEMENTATION

A. Design processor

This section presents the architectural design of our programmable BLAKE-256/SHA-256 processor and our implementations.

A high-level block diagram of our proposed processor is shown in Fig. 4. The given architecture supports four operation modes for the reconfigurable BLAKE-256/SHA-256 processor.

A Bus Interface Unit has been integrated in order for the proposed processor to communicate efficiently with the external environment.

The Control Unit is designed to control the flow of data in the design, as well as the data exchange between the Padded Process Unit and the Hash Computation Unit. A Finite State Machine (FSM) is used for this function.

The Padded Process Unit pads the input data messages and converts them to 512-bit blocks.

The Hash Computation Unit is the principal data path component of the system architecture. BLAKE-256 requires 14 cycles to produce the 256-bit message digest. Each cycle requires the previous round's result as well as the constant value Ci. The core uses eight 32-bit words, which are initialized to predefined values IV0(0)…IV7(0) [9] at the start of each call to the hash function [8].

For SHA-256, the Hash Computation Unit requires 64 cycles to produce the 256-bit message digest. Each cycle requires the previous round's result as well as the constant value Ki. The core uses eight 32-bit words, a-h, which are initialized to predefined values H1(0)…H8(0) at the start of each call to the hash function.

B. FPGA Implementation

In this section, we present the implementation of Blake-256 and SHA-256. VHDL is used as the hardware description language thanks to its portability across environments. The code is pure VHDL that can easily be implemented on other devices without changing the design. The software used for this work is the Xilinx ISE 14.1 suite (Project Navigator). It is used for writing, debugging and optimizing the design, and also for fitting, simulating and checking the performance results using the simulation tools available in ModelSim 6.1.

In the design of the hash function architectures described in this paper, the goal was to give a baseline comparison between the hash functions using area and throughput. We calculate the throughput as follows:

$$\text{Throughput} = \frac{\text{block\_size}}{T \cdot \left(HTime(N+1) - HTime(N)\right)}$$

where block_size is the message block size, characteristic of each hash function, HTime(N) is the total number of clock cycles necessary to hash an N-block message, and T is the clock period. Table 2 shows that the number of occupied slices decreases depending on the platform used, and that the frequency changes with the FPGA circuit used. The increase in frequency leads to an increase in dynamic power together with a higher throughput.
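The throughput formula can be evaluated numerically as sketched below. The linear cycle model `htime_linear` and its value of 16 cycles per block are our assumptions for illustration; they are not stated in the text. With the 79 MHz frequency and 691 slices reported for BLAKE-256 on Virtex 5, the result comes out close to the corresponding entries of Table 2, with the efficiency expressed in Mbps per slice.

```python
def throughput_bps(block_size_bits, clock_period_s, htime, n=1):
    """Throughput = block_size / (T * (HTime(N+1) - HTime(N)))."""
    return block_size_bits / (clock_period_s * (htime(n + 1) - htime(n)))


def htime_linear(n, cycles_per_block=16, setup_cycles=0):
    """Assumed cycle model: a fixed setup cost plus a constant cost per block."""
    return setup_cycles + cycles_per_block * n


frequency_hz = 79e6                      # BLAKE-256 on Virtex 5 (Table 2)
area_slices = 691                        # BLAKE-256 on Virtex 5 (Table 2)

tp = throughput_bps(512, 1.0 / frequency_hz, htime_linear)
efficiency = tp / 1e6 / area_slices      # Mbps per slice

print(f"throughput = {tp / 1e9:.2f} Gbps")
print(f"efficiency = {efficiency:.2f} Mbps/slice")
```

Because the formula uses the difference HTime(N+1) − HTime(N), any fixed setup cost cancels out and only the per-block cycle count affects the result.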

Table 3 and Fig. 5 show the same trend for SHA-256: the number of occupied slices decreases depending on the platform used, and the frequency changes with the FPGA circuit used. The increase in frequency leads to an increase in dynamic power together with a higher throughput.

TABLE 2: Results for BLAKE-256

                          VIRTEX 5 (65 nm)   VIRTEX 6 (40 nm)   VIRTEX 7 (28 nm)
Area (slices)             691 (6%)           594 (1%)           522 (<1%)
Frequency (MHz)           79                 81                 82
Throughput (Gbps)         2.53               2.59               2.62
Efficiency (Mbps/slice)   3.66               4.42               4.97

TABLE 3: Results for SHA-256

Platform   Throughput (Gbps)   Efficiency (Mbps/slice)
VIRTEX 2   0.5                 0.73
VIRTEX 4   0.72                1.06
VIRTEX 5   0.89                2.30
VIRTEX 6   1.39                3.81
VIRTEX 7   1.47                4.2

Fig. 5. Results for SHA-256 (panels: Area (slices), Frequency (MHz) and Power average (mW) on Virtex 2, Virtex 4, Virtex 5, Virtex 6 and Virtex 7)

VI. COMPARISON AND DISCUSSION

In this section we present a comparison between SHA-256 and Blake-256. Fig. 6 shows the synthesis results of standard SHA-256 and Blake-256 on all Virtex platforms.

FIG. 4. Proposed Architecture of the Hash Function (blocks: bus interface unit, padded unit, control unit, constant unit with ROM blocks, Wt unit, and hash computation unit (8×32 bit); 32-bit data paths; signals: Clock, Start, Reset, Data-in, Data-out)

Fig. 6.a

Fig. 6.b

Fig. 6.c

Fig. 6.d

Fig. 6. Comparison between SHA-256 and Blake-256

The synthesis results show that the new hash function Blake-256 uses fewer area resources than previous SHA-256 implementations and achieves a frequency comparable with that of standard SHA-256. The standard hash family requires 64 cycles, whereas Blake-256 requires 14 cycles. The lower number of cycles increases the speed of the SHA-3 candidate relative to SHA-2.

The new hash function performs much better than the implementations of the standard hash family SHA-256.

Finally, SHA-3 provides higher security than its predecessors of the SHA-2 family.

VII. CONCLUSION

In this paper we have presented an architecture and an efficient hardware implementation of the SHA-256 and Blake-256 secure hash algorithms. We reported the implementation results of SHA-256 and the new hash function on Xilinx Virtex 2, Virtex 5, Virtex 6 and Virtex 7 FPGAs. We reported the performance of our implementations in terms of area, throughput, frequency and efficiency and compared the standard with the newest hash function.

REFERENCES

[1] J.-P. Aumasson, L. Henzen, W. Meier, and R. C.-W. Phan. SHA-3 proposal BLAKE, version 1.3. Available online at http://131002.net/blake/blake.pdf, 2008.

[2] Christophe De Cannière and Christian Rechberger. Finding SHA-1 Characteristics: General Results and Applications. In Xuejia Lai and Kefei Chen, editors, Advances in Cryptology - ASIACRYPT 2006, 12th International Conference on the Theory and Application of Cryptology and Information Security, Shanghai, China, December 3-7, 2006, Proceedings, volume 4284 of Lecture Notes in Computer Science, pages 1–20. Springer, 2006.

[3] National Institute of Standards and Technology, "Secure Hash Standard", Federal Information Processing Standards 180-2, August 2002.

[4] National Institute of Standards and Technology, "Secure Hash Standard", Federal Information Processing Standards 180-2, August 2002.

[5] L. Dadda, M. Macchetti, and J. Owen. The Design of a High Speed ASIC Unit for the Hash Function SHA-256 (384, 512). In: DATE, IEEE Computer Society, 2004, pages 70–75.

[Fig. 6 panels: Throughput (Gbps), Area (slices), Efficiency and Frequency (MHz) versus platform (Virtex 4, Virtex 5, Virtex 6, Virtex 7), each for SHA-2 (256) and SHA-3 (256)]

[6] National Institute of Standards and Technology, "Secure Hash Standard", Federal Information Processing Standards 180-1, April 1995.

[7] Robert P. McEvoy, Francis M. Crowe, Colin C. Murphy and William P. Marnane. Optimisation of the SHA-2 Family of Hash Functions on FPGAs.

[8] Fatma Kahri, Belgacem Bouallegue, Mohsen Machhout and Rached Tourki. An FPGA implementation of the SHA-3: The Blake Hash Function. 10th International Multi-Conference on Systems, Signals & Devices, March 18-21, 2013, Hammamet, Tunisia.

[9] J.-P. Aumasson, L. Henzen, and W. Meier. SHA-3 proposal BLAKE, version 1.3, December 16, 2010.

