[IEEE 14th International Conference on Sciences and Techniques of Automatic Control and Computer...
An FPGA Implementation and Comparison of the SHA-256 and Blake-256
Fatma Kahri, Belgacem Bouallegue, Mohsen Machhout and Rached Tourki Electronics and Micro-Electronics Laboratory (E. µ. E. L)
Faculty of Sciences of Monastir, Tunisia [email protected]
Abstract— Since the introduction of the Secure Hash Algorithm (SHA), it has been thoroughly studied by designers with the goal of reducing the area and power consumption, and increasing the frequency and throughput, of hardware implementations of this cryptosystem. The Secure Hash Algorithm has become the default choice for security services in numerous applications. Following considerable attacks on earlier SHA standards, a new hash standard, known as SHA-3, was developed. In this paper, we study the SHA-3 candidate Blake-256 and present the design chosen for our Blake-256 application. We study the SHA-256 and Blake-256 hash functions and conduct a comparative study of the two. SHA-256 and Blake-256 have been implemented on Xilinx Virtex-5, Virtex-6 and Virtex-7 FPGAs. Their area, frequency, throughput and efficiency are compared, and it is shown that Blake-256 achieves good performance in terms of area, throughput and efficiency.
Keywords— Cryptography, Hash functions, SHA-2 (256), SHA-3, BLAKE, FPGA.
I. INTRODUCTION
In today's world of e-mail, internet banking, online shopping, and other sensitive digital communications, cryptography has become a vital tool for ensuring the privacy of data transfers.
A hash function is a type of cryptographic primitive. Hash algorithms take as input a message of arbitrary length, and produce a hash or message digest as output. This process can be denoted as:
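$h = H(M)$ (1)
where M is the input message and h is the hash generated by the hash algorithm H. Normally, the size of the hash h is fixed by the algorithm. A cryptographically strong hash function has the following properties:
• One-way property: given a hash h, it is computationally infeasible to find x such that $H(x) = h$. (2)
• Weak collision resistance: given x, it is computationally infeasible to find y ≠ x such that $H(x) = H(y)$. (3)
• Strong collision resistance: it is computationally infeasible to find any pair (x, y), with x ≠ y, such that $H(x) = H(y)$. (4)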
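As a small illustration of relation (1), the sketch below uses Python's standard `hashlib` module (not the hardware design of this paper) to show that the digest size is fixed regardless of the message length, and that a one-character change in the message produces a different digest. The helper name `digest_hex` is hypothetical.

```python
import hashlib


def digest_hex(message: bytes) -> str:
    """Return h = H(M) for H = SHA-256, as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()


short = digest_hex(b"abc")           # 3-byte message
long_ = digest_hex(b"abc" * 1000)    # 3000-byte message
changed = digest_hex(b"abd")         # last character of the message altered

# 64 hex characters = 256 bits, whatever the input length.
print(len(short) * 4, len(long_) * 4)
# A small change in M gives a different digest.
print(short != changed)
```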
Hash functions operate at the root of many popular cryptographic methods in current use, such as the Digital Signature Standard (DSS), Transport Layer Security (TLS) and Internet Protocol Security (IPSec) protocols, numerous random number generation algorithms, encryption algorithms, all-or-nothing transforms, and password storage mechanisms [1,2,3].
II. BACKGROUND
Descriptions of the SHA-1, SHA-2 and SHA-3 algorithms can be found in the official NIST standard [4]. Table 1 compares the characteristics of three hash functions. The security of these hash functions is determined by the size of their outputs, referred to as hash values. All functions have a similar internal structure and process each message block using multiple rounds. These hash functions enable the determination of a message's integrity: any change to the message will, with very high probability, result in a different message digest.
TABLE 1: Functional characteristics of three hash functions

| Hash function | SHA-1 | SHA-2 | SHA-3 |
|---|---|---|---|
| Number of constants Kt | 4 | 64 | 16 |
| Size of hash value (n) | 160 | 256 | 256 |
| Complexity of the best attack | 2^80 | 2^? | |
| Message size | <2^64 | <2^? | |
| Message block size (m) | 512 | 512 | 512 |
| Word size | 32 | 32 | 32 |
| Number of words | 5 | 8 | |
| Number of digest rounds | 80 | 64 | 10 |
III. SHA-2 DESCRIPTION
A. General
SHA-256 accepts messages of arbitrary length up to 2^64 bits. The SHA-256 hash function produces a final message digest of 256 bits that depends on the input message, which is composed of multiple blocks of 512 bits each. Each input block is expanded and fed to the 64 cycles of the SHA-256 function in words of 32 bits each (denoted by Wt). Intermediate hash values are fed back into the compression loop [5,6].
14th international conference on Sciences and Techniques of Automatic control & computer engineering - STA'2013, Sousse, Tunisia, December 20-22, 2013
STA'2013-PID3195-CEM
978-1-4799-2954-2/13/$31.00 ©2013 IEEE 152
B. Message Padding
The binary message to be processed is appended with a '1' and padded with zeros until its length ≡ 448 mod 512. The original message length is then appended as a 64-bit binary number. The resultant padded message is parsed into N 512-bit blocks, denoted M(1), M(2), ..., M(N). These M(i) message blocks are passed individually to the message expander [7].
C. Preprocessing
As with other popular hash functions, the message to be hashed with SHA-256 is first padded so that its final length is a multiple of 512 bits. A single 1-bit is appended to the end of the l-bit message. Then, k 0-bits are added until the length of the message is congruent to 448 modulo 512. A 64-bit representation of l is appended to the result of the padding. Thus, the length of the resulting message is a multiple of 512 bits. The padded message is split into blocks M(i), which are passed individually to the message expander. Padding can be represented as:
$l + 1 + k \equiv 448 \pmod{512}$
Fig. 1. Message preprocessing
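The padding rule above can be sketched in software as follows. This is a minimal byte-oriented sketch, assuming the message length is a whole number of bytes; the function name `sha256_pad` is hypothetical and is not part of the hardware design.

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a byte string so that its length is a multiple of 512 bits."""
    bit_len = len(message) * 8                 # l, the original length in bits
    padded = message + b"\x80"                 # the single '1' bit plus seven '0' bits
    # Remaining zero bytes so that the length is congruent to 448 bits (56 bytes) mod 512 bits.
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += bit_len.to_bytes(8, "big")       # 64-bit big-endian representation of l
    return padded


blocks = sha256_pad(b"abc")
print(len(blocks) * 8)   # total padded length in bits
```

A message whose length already leaves fewer than 65 free bits in its last block (for example 56 bytes) spills into an additional 512-bit block.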
D. Algorithm
Each message block M(i) is expanded by a message scheduler according to the following function:
For j = 0 to 15: Wj ← Mj(i) and
For j = 16 to 63
{
Wj ← σ1(Wj-2) + Wj-7 + σ0(Wj-15) + Wj-16
}
For i = 1 to N
{
Initialize registers a, b, c, d, e, f, g, h with the (i-1)st intermediate hash value.
Apply the following compression function to registers a-h:
For j = 0 to 63
{
T1 ← h + ∑1(e) + Ch(e, f, g) + Kj + Wj
T2 ← ∑0(a) + Maj(a, b, c)
h ← g, g ← f, f ← e, e ← d + T1
d ← c, c ← b, b ← a, a ← T1 + T2
}
ith intermediate hash:
H1(i) ← a + H1(i-1)
….
H8(i) ← h + H8(i-1)
}
The hash of M:
H(N) = (H1(N), H2(N), …, H8(N))
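The message schedule and compression loop above can be checked against a software reference. The following is a minimal Python sketch, not the VHDL design of this paper. It assumes the standard FIPS 180 definitions of σ0, σ1, ∑0, ∑1, Ch and Maj, and derives the constants Kj and the initial hash value from the cube and square roots of the first primes using exact integer arithmetic. All function names are hypothetical.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF  # all additions are modulo 2^32


def first_primes(n):
    found, cand = [], 2
    while len(found) < n:
        if all(cand % p for p in found):
            found.append(cand)
        cand += 1
    return found


def icbrt(x):
    """Exact integer cube root by binary search."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        lo, hi = (mid, hi) if mid ** 3 <= x else (lo, mid - 1)
    return lo


# First 32 fractional bits of the cube roots (Kj) and square roots (H(0)) of primes.
K = [icbrt(p << 96) & MASK for p in first_primes(64)]
H0 = [isqrt(p << 64) & MASK for p in first_primes(8)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def compress(state, block):
    # Message schedule: W0..W15 come from the block, W16..W63 are expanded.
    w = [int.from_bytes(block[4 * j:4 * j + 4], "big") for j in range(16)]
    for j in range(16, 64):
        s0 = rotr(w[j - 15], 7) ^ rotr(w[j - 15], 18) ^ (w[j - 15] >> 3)
        s1 = rotr(w[j - 2], 17) ^ rotr(w[j - 2], 19) ^ (w[j - 2] >> 10)
        w.append((s1 + w[j - 7] + s0 + w[j - 16]) & MASK)
    a, b, c, d, e, f, g, h = state
    for j in range(64):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
              + ((e & f) ^ (~e & g)) + K[j] + w[j]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
              + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        h, g, f, e = g, f, e, (d + t1) & MASK
        d, c, b, a = c, b, a, (t1 + t2) & MASK
    # New intermediate hash: registers added to the previous hash value.
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message):
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return b"".join(x.to_bytes(4, "big") for x in state)


# Cross-check against the library implementation.
assert sha256(b"abc") == hashlib.sha256(b"abc").digest()
```

The tuple assignments in the inner loop mirror the register shift h ← g, …, a ← T1 + T2: the right-hand sides are evaluated before any register is overwritten, which is what a clocked register bank does in hardware.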
IV. SHA-3 DESCRIPTION
A. Algorithm Description
In the following part, we briefly describe the main concepts used in the Blake-256 hash function and the general design choices we have made for our hardware implementations. BLAKE is a candidate for SHA-3. It does not reinvent the wheel: BLAKE is built on previously studied components, chosen for their complementarity [8].
This section defines BLAKE-256, going from its constant parameters to its compression function. The BLAKE-256 compression function works on an internal state of 512 bits, represented as a (4 × 4) matrix of 32-bit words. Figure 2 shows the matrix:

$$
\begin{bmatrix}
v_0 & v_1 & v_2 & v_3 \\
v_4 & v_5 & v_6 & v_7 \\
v_8 & v_9 & v_{10} & v_{11} \\
v_{12} & v_{13} & v_{14} & v_{15}
\end{bmatrix}
\leftarrow
\begin{bmatrix}
h_0 & h_1 & h_2 & h_3 \\
h_4 & h_5 & h_6 & h_7 \\
s_0 \oplus c_0 & s_1 \oplus c_1 & s_2 \oplus c_2 & s_3 \oplus c_3 \\
t_0 \oplus c_4 & t_0 \oplus c_5 & t_1 \oplus c_6 & t_1 \oplus c_7
\end{bmatrix}
$$

Fig. 2. Matrix of 32 bits
With
• h = h0…h7: chaining value, whose initial value is the same as in SHA-256.
• c = c0…c15: constants.
• t = t0, t1: counter.
• s = s0…s3: salt.
Once the state v is initialized, the compression function iterates a series of 14 rounds. A round is a transformation of the state v that computes G0 (v0, v4, v8, v12), G1 (v1, v5, v9, v13), G2 (v2, v6, v10, v14), G3 (v3, v7, v11, v15), G4 (v0, v5, v10, v15), G5 (v1, v6, v11, v12), G6 (v2, v7, v8, v13), G7 (v3, v4, v9, v14), where, at round r, Gi (a, b, c, d) sets:
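$$
\begin{aligned}
a &\leftarrow a + b + (m_{\sigma_r(2i)} \oplus c_{\sigma_r(2i+1)}) \\
d &\leftarrow (d \oplus a) \ggg 16 \\
c &\leftarrow c + d \\
b &\leftarrow (b \oplus c) \ggg 12 \\
a &\leftarrow a + b + (m_{\sigma_r(2i+1)} \oplus c_{\sigma_r(2i)}) \\
d &\leftarrow (d \oplus a) \ggg 8 \\
c &\leftarrow c + d \\
b &\leftarrow (b \oplus c) \ggg 7
\end{aligned}
$$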
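The eight assignments of Gi translate directly into software. The sketch below is illustrative only: the function names are hypothetical, the state is a plain list of sixteen 32-bit integers, and the usage example passes all-zero constants and the identity permutation as placeholders rather than the real BLAKE-256 constant table and σ permutations.

```python
MASK = 0xFFFFFFFF  # additions are modulo 2^32


def rotr32(x, n):
    """Rotate a 32-bit word right by n positions."""
    return ((x >> n) | (x << (32 - n))) & MASK


def g(v, a, b, c, d, m, consts, sigma_r, i):
    """Apply Gi to the state words v[a], v[b], v[c], v[d] in place.

    m       : the 16 message words of the current block
    consts  : the 16 constants c0..c15
    sigma_r : the permutation used at round r (a list of 16 indices)
    """
    x = m[sigma_r[2 * i]] ^ consts[sigma_r[2 * i + 1]]
    y = m[sigma_r[2 * i + 1]] ^ consts[sigma_r[2 * i]]
    v[a] = (v[a] + v[b] + x) & MASK
    v[d] = rotr32(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr32(v[b] ^ v[c], 12)
    v[a] = (v[a] + v[b] + y) & MASK
    v[d] = rotr32(v[d] ^ v[a], 8)
    v[c] = (v[c] + v[d]) & MASK
    v[b] = rotr32(v[b] ^ v[c], 7)


# Usage with placeholder inputs (NOT the real BLAKE-256 constants).
state = [0] * 16
message = [1] + [0] * 15
placeholder_consts = [0] * 16
identity = list(range(16))
g(state, 0, 4, 8, 12, message, placeholder_consts, identity, 0)   # G0 on column 0
print([hex(state[k]) for k in (0, 4, 8, 12)])
```

Counting the operations in `g` gives six additions, six XORs and four rotations, which is the cost quoted for a single G.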
[Fig. 1 layout: Message M (l bits), followed by a single 1 bit, k zero bits (000000…0) and the 64-bit value of l, for a total of N × 512 bits.]
The first four calls G0, …, G3 can be computed in parallel, because each of them updates a different column of the matrix. We call the computation of G0, …, G3 a column step. Similarly, the last four calls G4, …, G7 update distinct diagonals and can thus be parallelized as well; we call this a diagonal step. A single G performs six additions modulo 2^32, six XORs and four word rotations by fixed distances. Figure 3 shows the G function for index i. A single round consists of eight invocations of the G function: four on the columns of the state and four on the diagonals of the state. A total of ten rounds is executed.
Fig. 3. The Gi function.
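The column step and diagonal step can be made concrete by generating the index sets that G0, …, G7 operate on. The short sketch below (the function name is hypothetical) reproduces the index tuples listed for a round and checks that the four calls within one step touch disjoint state words, which is why they can run in parallel.

```python
def g_index_sets():
    """Return the state indices used by G0..G3 (columns) and G4..G7 (diagonals)."""
    columns = [(i, i + 4, i + 8, i + 12) for i in range(4)]
    diagonals = [(i, 4 + (i + 1) % 4, 8 + (i + 2) % 4, 12 + (i + 3) % 4)
                 for i in range(4)]
    return columns, diagonals


columns, diagonals = g_index_sets()
for step in (columns, diagonals):
    touched = [index for call in step for index in call]
    # Each step touches every one of the 16 state words exactly once.
    assert sorted(touched) == list(range(16))
print(columns)
print(diagonals)
```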
After the sequence of rounds, the new chain value h'0, …, h'7 is computed from the state, with input of the initial chain value h0, …, h7 and the salt s0, …, s3. The finalization takes the output of the ten rounds and combines it with the input chaining value and the salt.
B. BLAKE-256 hash function
The whole hash function operation is divided into two stages: (1) padding and (2) hash computation. Pre-processing involves padding the input message, parsing the padded data into a number of m-bit blocks (m = 512) and setting the appropriate initial values, which are used in the hash computation. The hash computation applies functions, constants and word-level logical and algebraic operations to the padded data in order to iteratively generate a series of hash values. After the specified number of transformation rounds, the produced hash value becomes the message digest. The BLAKE-256 compression function is used iteratively as follows:
h0 ← IV
for i = 0, …, N-1
hi+1 ← compress (hi, mi, s, li)
return hN
V. BLAKE-256/SHA-256 PROCESSOR DESIGN IMPLEMENTATION
A. Design processor
This section presents the architectural design of our programmable BLAKE-256/SHA-256 processor and its implementation.
A high-level block diagram of our proposed processor is shown in Figure 4. The given architecture supports four operation modes for the reconfigurable BLAKE-256/SHA-256 processor.
A Bus Interface Unit has been integrated in order for the proposed processor to communicate efficiently with the external environment.
The Control Unit is designed to control the flow of data in the design, as well as data exchange between the Padded Process Unit and the Hash Computation Unit. A Finite State Machine (FSM) is used for this function.
The Padded Process Unit pads the input messages and converts them to 512-bit blocks.
The Hash Computation Unit is the principal data path component of the system architecture. BLAKE-256 requires 14 cycles to produce the 256-bit message digest. Each cycle requires the previous round's output as well as the constant value Ci. The core uses eight 32-bit words, which are initialized to the predefined values IV0(0)…IV7(0) [9] at the start of each call to the hash function [8].
SHA-256 requires 64 cycles to produce the 256-bit message digest. Each cycle requires the previous round's output as well as the constant value Ki. The core uses eight 32-bit words, a-h, which are initialized to the predefined values H1(0)…H8(0) at the start of each call to the hash function.
B. FPGA Implementation
In this section, we present the implementation of Blake-256 and SHA-256. VHDL is used as the hardware description language because of its portability across environments. The code is pure VHDL that could easily be implemented on other devices without changing the design. The software used for this work is the Xilinx ISE 14.1 suite (Project Navigator). It is used for writing, debugging and optimizing the design, and also for fitting, simulating and checking the performance results using the simulation tools available in ModelSim 6.1.
In the design of the hash function architectures described in this paper, the goal was to give a baseline comparison between the hash functions in terms of area and throughput. We calculate the throughput as follows:

$$
\text{Throughput} = \frac{\text{block\_size}}{T \cdot \left(HTime(N+1) - HTime(N)\right)}
$$

where block_size is the message block size, characteristic of each hash function, HTime(N) is the total number of clock cycles necessary to hash an N-block message, and T is the clock period. Table 2 shows that, for BLAKE-256, the number of occupied slices decreases depending on the platform used, and that the frequency changes with the target FPGA. An increase in frequency leads to a higher throughput, at the cost of higher dynamic power.
Table 3 and Figure 5 show the same trend for SHA-256: the number of occupied slices decreases depending on the platform used, and the frequency changes with the target FPGA. An increase in frequency leads to a higher throughput, at the cost of higher dynamic power.
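The throughput formula can be evaluated numerically as in the sketch below. The function names are hypothetical. The figure of 16 clock cycles per additional block, HTime(N+1) − HTime(N), is an assumption: it is not stated in the text, and is used here only because, with the 79 MHz Virtex 5 frequency reported for BLAKE-256, it gives a throughput close to the reported value. Efficiency is taken as throughput divided by the number of occupied slices.

```python
def throughput_bps(block_size_bits, clock_hz, cycles_per_block):
    """Throughput = block_size / (T * (HTime(N+1) - HTime(N)))."""
    clock_period = 1.0 / clock_hz              # T, in seconds
    return block_size_bits / (clock_period * cycles_per_block)


def efficiency_mbps_per_slice(throughput, slices):
    """Throughput per occupied slice, in Mbps/slice."""
    return throughput / 1e6 / slices


# BLAKE-256 on Virtex 5: 512-bit blocks, 79 MHz, 691 slices.
# ASSUMPTION: 16 cycles per additional block (not given in the paper).
tp = throughput_bps(512, 79e6, 16)
print(f"{tp / 1e9:.2f} Gbps")                                       # 2.53 Gbps
print(f"{efficiency_mbps_per_slice(tp, 691):.2f} Mbps/slice")       # 3.66 Mbps/slice
```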
TABLE 2: Results for BLAKE-256

| Platform | Virtex 5 (65 nm) | Virtex 6 (40 nm) | Virtex 7 (28 nm) |
|---|---|---|---|
| Area (slices) | 691 (6%) | 594 (1%) | 522 (<1%) |
| Frequency (MHz) | 79 | 81 | 82 |
| Throughput (Gbps) | 2.53 | 2.59 | 2.62 |
| Efficiency (Mbps/slice) | 3.66 | 4.42 | 4.97 |
TABLE 3: Results for SHA-256

| Platform | Throughput (Gbps) | Efficiency (Mbps/slice) |
|---|---|---|
| Virtex 2 | 0.5 | 0.73 |
| Virtex 4 | 0.72 | 1.06 |
| Virtex 5 | 0.89 | 2.30 |
| Virtex 6 | 1.39 | 3.81 |
| Virtex 7 | 1.47 | 4.2 |
[Figure 5: three charts for SHA-256 on Virtex 2, Virtex 4, Virtex 5, Virtex 6 and Virtex 7: Area (slices), Frequency (MHz) and Power average (mW).]
Fig. 5. Results for SHA-256
VI. COMPARISON AND DISCUSSION
In this section we present a comparison between SHA-256 and Blake-256. Fig. 6 shows the synthesis results of standard SHA-256 and Blake-256 on all the Virtex platforms.
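[Figure 4: block diagram. Data-in enters through the Bus interface unit and passes to the Padded unit, which sends padded data over a 32-bit bus to the Hash Computation Unit (8×32 bit). The Hash Computation Unit is fed by the Unit Wt and by the Constant unit (ROM blocks) over 32-bit buses. The Control unit, driven by the Clock, Start and Reset signals, issues the control signals. The 32-bit hashed message leaves through Data-out.]
Fig. 4. Proposed Architecture of the Hash Function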
Fig. 6. Comparison between SHA-256 and Blake-256 (panels 6.a, 6.b, 6.c and 6.d)
The synthesis results show that the proposed Blake-256 hash function uses fewer area resources than previous SHA-256 implementations and achieves a frequency comparable to that of standard SHA-256. The standard hash family requires 64 cycles, whereas Blake-256 requires 14 cycles. The lower number of cycles increases the speed of the SHA-3 candidate relative to SHA-2.
The proposed hash function performs much better than the implementations of the standard SHA-256 hash family.
Finally, SHA-3 provides higher security than its predecessors of the SHA-2 family.
VII. CONCLUSION
In this paper we have presented an architecture and an efficient hardware implementation of the SHA-256 and Blake-256 secure hash algorithms. We reported the implementation results of SHA-256 and the new hash function on Xilinx Virtex 2, Virtex 5, Virtex 6 and Virtex 7 FPGAs. We reported the performance of our implementations in terms of area, throughput, frequency and efficiency, and compared the standard with the newest hash function.
REFERENCES
[1] J.-P. Aumasson, L. Henzen, W. Meier, and R. C.-W. Phan. SHA-3 proposal BLAKE, version 1.3. Available online at http://131002.net/blake/blake.pdf, 2008.
[2] C. De Cannière and C. Rechberger. Finding SHA-1 Characteristics: General Results and Applications. In Xuejia Lai and Kefei Chen, editors, Advances in Cryptology - ASIACRYPT 2006, 12th International Conference on the Theory and Application of Cryptology and Information Security, Shanghai, China, December 3-7, 2006, Proceedings, volume 4284 of Lecture Notes in Computer Science, pages 1–20. Springer, 2006.
[3] National Institute of Standards and Technology, “Secure Hash Standard”, Federal Information Processing Standards 180-2, August 2002.
[4] National Institute of Standards and Technology, “Secure Hash Standard”, Federal Information Processing Standards 180-2, August 2002.
[5] L. Dadda, M. Macchetti, and J. Owen. The Design of a High Speed ASIC Unit for the Hash Function SHA-256 (384, 512). In: DATE, IEEE Computer Society, 2004, pages 70–75.
[Fig. 6 panel data: bar charts of SHA-2 (256) versus SHA-3 (256), x-axis: Platforms (Virtex 4, Virtex 5, Virtex 6, Virtex 7), bar labels in extraction order.
Throughput (Gbps): SHA-2 (256): 0.72, 0.89, 1.39, 1.47; SHA-3 (256): 2.21, 2.53, 2.59, 2.62.
Area (slices): SHA-2 (256): 677, 391, 365, 350; SHA-3 (256): 1265, 691, 522, 594.
Efficiency (Mbps/slice): SHA-2 (256): 1.06, 2.30, 3.81, 4.20; SHA-3 (256): 1.75, 3.66, 4.97, 4.42.
Frequency (MHz): SHA-2 (256): 180, 222, 348, 367; SHA-3 (256): 69, 79, 81, 82.]
[6] National Institute of Standards and Technology, "Secure Hash Standard", Federal Information Processing Standards 180-1, April 1995.
[7] R. P. McEvoy, F. M. Crowe, C. C. Murphy, and W. P. Marnane. Optimisation of the SHA-2 Family of Hash Functions on FPGAs.
[8] F. Kahri, B. Bouallegue, M. Machhout, and R. Tourki. An FPGA implementation of the SHA-3: The Blake Hash Function. 10th International Multi-Conference on Systems, Signals & Devices, March 18-21, 2013, Hammamet, Tunisia.
[9] J.-P. Aumasson, L. Henzen, and W. Meier. SHA-3 proposal BLAKE, version 1.3, December 16, 2010.