
Chapter 8 Simulation of Protocols

8.1. Introduction

In this chapter, we analyze the security and performance of the proposed protocols, HDVP, RSA-DPAP, ECC-DPAP, PVDSSP and EDVP, using simulation results. The simulations were carried out with Network Simulator (NS-2), MATLAB 9.0 and its Statistical Toolbox, and the proposed verification protocols were applied to outsourced data storage applications in the cloud to demonstrate their security and performance.

For the sake of completeness, we implemented the proposed protocols on Windows. Our experiments were conducted on a system with an Intel Core 2 processor running at 2.4 GHz, 4 GB of RAM, and a 7200 RPM Western Digital 320 GB Serial ATA drive with an 8 MB buffer. All programs were written with the help of the Pairing-Based Cryptography (PBC) library version 0.4.18, the crypto library of OpenSSL version 0.9.8h, and the Sobol data set library. Our implementation uses Amazon Simple Storage Service (S3) as the storage service.

Storage service: Amazon Simple Storage Service (S3) is a scalable, pay-per-use online storage service. Clients can store an unlimited amount of data, paying only for the storage space and bandwidth they use, without an initial startup fee. The basic data unit in S3 is an object, and the basic container for objects is called a bucket; an object contains both data and metadata. A single object has a size limit of 5 GB, but there is no limit on the number of objects per bucket. In addition, a small script on Amazon Elastic Compute Cloud (EC2) is used to support the verification protocols and dynamic data operations.
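For illustration, a minimal sketch of how a data block might be stored in and fetched back from S3 with the boto3 Python SDK is given below; the bucket name, object key and credentials are placeholders, and the actual measurements used the PBC/OpenSSL-based programs described above.

```python
# Minimal sketch (assumed setup): store one data block in an S3 bucket and
# read it back for verification. Bucket name and key are hypothetical, and
# AWS credentials are taken from the environment.
import os
import boto3

s3 = boto3.client("s3")
bucket, key = "verification-data", "blocks/block-0001.bin"   # placeholder names

block = os.urandom(1024 * 1024)                               # 1 MB dummy block
s3.put_object(Bucket=bucket, Key=key, Body=block)             # upload the object

obj = s3.get_object(Bucket=bucket, Key=key)                   # download it again
assert obj["Body"].read() == block                            # round-trip check
```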

8.2. Experimental Results

In this section, we present and discuss the experimental results for security and performance of all our proposed protocols and compare the results.

8.2.1. Security

Here, we present experimental results testing the Integrity, Confidentiality and Availability of data for data storage applications.

a) Integrity

To test the Integrity of data, we consider two parameters: probability of detection and verification time.

1) Probability of detection: corruption of data should be detected with high probability as soon as possible. The probabilistic assurance of data Integrity increases with repeated iterations of the verification protocols; the main problem is detecting such corruption in less time.

2) Verification time: the time taken to detect corrupted blocks with the target probability; less time is always preferable.

We simulated the proposed verification protocols and the existing protocol in 100-, 500- and 1000-node cloud networks using NS2 to test Integrity in terms of the verification time needed to detect data corruptions with high probability (99%), and we compare the verification time of all the Integrity verification protocols. We assume that data stored on some nodes is deleted or modified by the storage provider. Each simulation step identifies the corrupted data and records the verification time. We randomly corrupt different percentages, ranging from 1% to 20%, of a 1 GB data file.
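The 99% detection target can be related to the number of challenged blocks through the standard random-sampling argument used in storage auditing; the sketch below (an illustration, not the challenge generator of the proposed protocols) estimates how many randomly challenged blocks are needed for a given corruption fraction.

```python
# Hedged sketch of the sampling argument: if a fraction rho of a file's blocks
# is corrupted and the verifier challenges c blocks chosen uniformly at random,
# the detection probability is roughly 1 - (1 - rho)**c.
import math

def detection_probability(rho: float, c: int) -> float:
    """Approximate probability of catching corruption of fraction rho with c challenges."""
    return 1.0 - (1.0 - rho) ** c

def challenges_needed(rho: float, target: float = 0.99) -> int:
    """Smallest number of challenged blocks reaching the target detection probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - rho))

for rho in (0.01, 0.05, 0.10, 0.20):
    c = challenges_needed(rho)
    print(f"corruption {rho:4.0%}: {c:4d} challenges -> P(detect) = {detection_probability(rho, c):.3f}")
```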

Fig. 8.1-8.3 present the verification time (in seconds) for detecting data corruptions ranging from 1% to 20% of a 1 GB file with 99% probability, using the proposed protocols and the protocol of Wang et al. [165], in 100-, 500- and 1000-node cloud networks.

[Fig. 8.1 plot: x-axis Data Corruption (1%-20%), y-axis Time (s); series: Wang et al. [165], HDVP, RSA-DPAP, ECC-DPAP, PVDSSP, EDVP, each at 99% detection probability.]

Fig. 8.1 Comparison of verification time between the proposed protocols and Wang's protocol for the detection of different data corruptions with detection probability maintained at 99 percent in 100 nodes.

[Fig. 8.2 plot: x-axis Data Corruption (%) from 1 to 20, y-axis Time (s); series: Wang et al. [165], HDVP, RSA-DPAP, ECC-DPAP, PVDSSP, EDVP, each at 99% detection probability.]

Fig. 8.2 Comparison of verification time between the proposed protocols and Wang's protocol for the detection of different data corruptions with detection probability maintained at 99 percent in 500 nodes.

[Fig. 8.3 plot: x-axis Data Corruption (%) from 1 to 20, y-axis Time (s); series: Wang et al. [165], HDVP, RSA-DPAP, ECC-DPAP, PVDSSP, EDVP, each at 99% detection probability.]

Fig. 8.3 Comparison of verification time between the proposed protocols and Wang's protocol for the detection of different data corruptions with detection probability maintained at 99 percent in 1000 nodes.

As observed from Fig. 8.1-8.3, the proposed protocols detect data corruptions in the cloud much faster than Wang et al. [165]. Among the proposed protocols, HDVP is useful for small applications but is not suitable for large data storage applications, especially when Clients have constrained resources. RSA-DPAP is suitable for large data storage applications, but it creates heavy overhead on the processor due to its large key size. ECC-DPAP is suitable for small, medium and large applications, even when Clients have constrained resources (PDAs, smart phones), and it detects corruptions faster than RSA-DPAP because of its smaller key size. PVDSSP is also useful for all types of applications and takes very little verification time to detect data corruptions. Finally, the EDVP protocol detects corruptions more efficiently than all of the above protocols.


Statistical Inference on Integrity of proposed protocols using one-way ANOVA

Consider a one-way ANOVA on the experimental results of the proposed verification protocols for the verification time to detect data corruptions.

The hypotheses are as follows:

Null hypothesis H0: there is no significant difference in the verification times of the proposed protocols tested.

Alternative hypothesis H1: there is a significant difference between the verification times of the proposed protocols tested.

Table 8.1: ANOVA Table for Comparison of the Verification Time of Proposed Protocols

No. of Nodes | Source  | SS      | df | MS       | F     | Prob>F
100          | Columns | 12948.4 | 4  | 3237.1   | 11.59 | 0.0004932
             | Error   | 5585    | 24 | 279.25   |       |
             | Total   | 18533.4 | 28 |          |       |
500          | Columns | 35713.6 | 4  | 8928.39  | 9.86  | 0.0001
             | Error   | 18118.7 | 24 | 905.93   |       |
             | Total   | 53832.3 | 28 |          |       |
1000         | Columns | 47000   | 4  | 11750.01 | 7.26  | 0.0009
             | Error   | 32355.9 | 28 | 1671.79  |       |
             | Total   | 79355.9 | 28 |          |       |

SS: Sum of Squares, df: degrees of freedom, MS: mean square, F: F-statistic, Prob: probability

The test statistics are the F values of 11.59, 9.86 and 7.26 from Table 8.1 for 100, 500 and 1000 nodes respectively. Using an α of 0.05, we have F(0.05; 4, 24) = 2.87 from the F distribution table. Since the test statistics are much larger than the critical value, we reject the null hypothesis of equal mean verification times and conclude that there is a (statistically) significant difference among the verification times of the proposed protocols. The p-values for 11.59, 9.86 and 7.26 are 0.000493, 0.0001 and 0.0009 respectively from Table 8.1, so the test statistics are significant at that level.

The p-value returned by anova1 depends on assumptions about the random disturbances εij in the model equation. For the p-value to be correct, these disturbances need to be independent, normally distributed, and of constant variance. The ANOVA test was conducted using the MATLAB 9.0 Statistical Toolbox.
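For readers without MATLAB, an equivalent one-way ANOVA can be sketched in Python with SciPy; the verification-time vectors below are placeholders, not the measured data behind Table 8.1.

```python
# Illustrative one-way ANOVA (the thesis used MATLAB's anova1). Group values
# are hypothetical verification times in seconds, five observations per protocol.
from scipy import stats

groups = {
    "HDVP":   [70.1, 72.5, 74.8, 76.0, 77.8],
    "RSA":    [50.2, 52.1, 53.9, 55.0, 55.1],
    "ECC":    [36.9, 38.5, 39.4, 40.0, 40.8],
    "PVDSSP": [22.3, 23.6, 24.2, 24.8, 25.5],
    "EDVP":   [ 7.9,  8.3,  8.6,  8.8,  9.3],
}

f_stat, p_value = stats.f_oneway(*groups.values())

# Critical value F(alpha; k-1, N-k) for alpha = 0.05
dfn = len(groups) - 1
dfd = sum(len(g) for g in groups.values()) - len(groups)
f_crit = stats.f.ppf(0.95, dfn=dfn, dfd=dfd)

print(f"F = {f_stat:.2f}, p = {p_value:.4g}, F_crit({dfn},{dfd}) = {f_crit:.2f}")
print("reject H0" if f_stat > f_crit else "fail to reject H0")
```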

Duncan Multiple Range Test

When the null hypothesis is rejected, a post hoc test can be conducted to identify which groups have different means. In this study, the Duncan multiple range test was chosen. The Duncan multiple range test maintains a low overall Type I error and can also be applied to groups that are not significantly different. The Duncan test uses a studentized range statistic within a multiple-stage procedure, referred to as a multiple range test.

To find which proposed protocol performs poorly, we compute the means of the proposed protocols using the Duncan test at different numbers of nodes; the results are presented in Table 8.2.

Table 8.2: Means of Proposed Protocols

No. of Nodes | HDVP   | RSA-DPAP | ECC-DPAP | PVDSSP | EDVP
100          | 74.24  | 53.27    | 39.12    | 24.09  | 8.58
500          | 126.28 | 91.51    | 68.56    | 48.38  | 14.83
1000         | 150.52 | 121.78   | 93.12    | 65.32  | 25.83

From Table 8.2, the HDVP scheme performs poorly compared to the other proposed methods. The HDVP method (mean values 74.24, 126.28 and 150.52 for 100, 500 and 1000 nodes respectively) yields larger differences when compared with the other methods; the mean difference of HDVP from the other groups was greater than the least significant range. This indicates that the HDVP scheme is not an appropriate technique for large datasets.
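A sketch of this least-significant-range comparison is given below, using the common approximation of Duncan's significant ranges by studentized-range quantiles at protection level 1 - (1 - α)^(p-1); the MSE and error degrees of freedom are taken from the 100-node row of Table 8.1, while the per-group sample size is an assumed value, so the output is illustrative only.

```python
# Hedged sketch of Duncan's multiple range test (requires SciPy >= 1.7 for
# studentized_range). Means come from Table 8.2 (100-node case); MSE and
# error df come from Table 8.1; the per-group sample size n is assumed.
import math
from itertools import combinations
from scipy.stats import studentized_range

means = {"HDVP": 74.24, "RSA-DPAP": 53.27, "ECC-DPAP": 39.12,
         "PVDSSP": 24.09, "EDVP": 8.58}
mse, df_err, n, alpha = 279.25, 24, 5, 0.05

ordered = sorted(means.items(), key=lambda kv: kv[1])          # ascending means
for (i, (lo_name, lo_m)), (j, (hi_name, hi_m)) in combinations(enumerate(ordered), 2):
    p = j - i + 1                                              # number of means spanned
    alpha_p = 1 - (1 - alpha) ** (p - 1)                       # Duncan protection level
    lsr = studentized_range.ppf(1 - alpha_p, p, df_err) * math.sqrt(mse / n)
    verdict = "different" if hi_m - lo_m > lsr else "not different"
    print(f"{hi_name:>9} vs {lo_name:>9}: diff {hi_m - lo_m:6.2f}, R_p {lsr:6.2f} -> {verdict}")
```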

Next, we compare the verification time of the HDVP scheme with the existing scheme of Wang et al. [165] using statistical inference.

Statistical Inference on Integrity of HDVP and Existing Protocol using one-way ANOVA

Consider a one-way ANOVA on the experimental results of the two verification protocols for the verification time to detect data corruptions.

The hypotheses are as follows:

Null hypothesis H0: there is no significant difference in the verification times of HDVP and Wang et al. [165].

Alternative hypothesis H1: there is a significant difference between the verification times of HDVP and Wang et al. [165].

Table 8.3: ANOVA Table for Comparison of the Verification Time of HDVP and Existing Scheme

No. of Nodes | Source  | SS      | df | MS      | F    | Prob>F
100          | Columns | 9261.1  | 1  | 9261.07 | 8.8  | 0.018
             | Error   | 8419.8  | 8  | 1052.48 |      |
             | Total   | 17680.9 | 9  |         |      |
500          | Columns | 20691.6 | 1  | 20691.6 | 5.42 | 0.048
             | Error   | 30549   | 8  | 3818.6  |      |
             | Total   | 51240.6 | 9  |         |      |
1000         | Columns | 30759.2 | 1  | 30759.2 | 5.71 | 0.043
             | Error   | 43093.6 | 8  | 5386.7  |      |
             | Total   | 73852.8 | 9  |         |      |

The test statistics are the F values of 8.8, 5.42 and 5.71 from Table 8.3 for 100, 500 and 1000 nodes respectively. Using an α of 0.05, we have F(0.05; 1, 8) = 5.32 from the F distribution table. Since the test statistics are larger than the critical value, we reject the null hypothesis of equal mean verification times and conclude that there is a (statistically) significant difference between the verification times of HDVP and Wang's [165] scheme. The p-values for 8.8, 5.42 and 5.71 are 0.018, 0.048 and 0.043 respectively from Table 8.3, so the test statistics are significant at that level.

b) Availability

To test the Availability of data, we consider the following two parameters: the Availability percentage and the encoding time. The Availability percentage of data is the percentage of data that remains available with different numbers of redundancy blocks. Encoding time is the time taken to generate the redundancy blocks. It is by no means clear that the proposed approach to encoding the data will work well in practice: too many redundancy blocks could be created due to partial or incorrect information and due to multiple servers acting simultaneously, and state-monitoring overhead is required to guide the generation of redundancy data blocks.

We simulated 100-, 500- and 1000-node cloud networks with unlimited storage space using NS2 to test the Availability of data. Nodes join or leave with a specific probability; we assume that nodes that fail or leave the network lose the data blocks they store. At each simulation step:

- A certain number of nodes go down (depending on the probability of a node being up).

- The nodes that are up check how many blocks of the file are still available; if needed, they create more redundancy blocks.

We built simulations of the proposed protocols with two goals: 1) evaluate the Availability percentage of data under the proposed protocols, and 2) compare the encoding time of the different erasure codes used in the existing and proposed schemes. In Fig. 8.4-8.6, each plot corresponds to the data Availability percentage in 100-, 500- and 1000-node networks using erasure codes.
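One step of this availability estimate can also be written analytically, as sketched below: if each of the m + n coded blocks resides on a distinct node that is up with probability q (an assumed value here), the file is recoverable whenever at least m blocks survive, which is a binomial tail probability.

```python
# Hedged sketch of the analytical availability of an (m, n)-erasure-coded file;
# the node-up probability q is an assumed parameter, not a measured one.
from scipy.stats import binom

def availability(m: int, n: int, q: float) -> float:
    """P(file recoverable) = P(at least m of the m + n blocks are on live nodes)."""
    return binom.sf(m - 1, m + n, q)

m, q = 20, 0.80                          # 20 data blocks, 80% node-up probability (assumed)
for n in (0, 4, 8, 12, 16, 20):          # redundancy levels matching Fig. 8.4-8.6
    print(f"(m, n) = ({m}, {n:2d}): availability = {availability(m, n, q):7.2%}")
```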

[Fig. 8.4 plot: x-axis Data Blocks (m, n) from (20, 0) to (20, 20), y-axis Availability (%).]

Fig. 8.4 The Availability of data in a 100-node network using different redundancy blocks

[Fig. 8.5 plot: x-axis Data Blocks (m, n) from (20, 0) to (20, 20), y-axis Availability (%).]

Fig. 8.5 The Availability of data in a 500-node network using different redundancy blocks

[Fig. 8.6 plot: x-axis Data Blocks (m, n) from (20, 0) to (20, 20), y-axis Availability (%).]

Fig. 8.6 The Availability of data in a 1000-node network using different redundancy blocks

As Fig. 8.4-8.6 show, adding more redundancy blocks increases the guaranteed Availability of data, and the guarantee also improves as the number of nodes in the network increases.

In Fig. 8.7, we compare the encoding performance of different erasure codes for achieving Availability of the data in less time. From Fig. 8.7, we can see that Tornado codes give the best performance of all the erasure codes, and Cauchy Reed-Solomon codes perform better than the remaining erasure codes. Hence, we use Tornado codes and Cauchy Reed-Solomon codes for large and small storage applications respectively.

[Fig. 8.7 plot: x-axis File Size (GB) from 1 to 5, y-axis Time (s); series: Tornado Code, Cauchy Reed-Solomon Code, Vandermonde Reed-Solomon Code, Server Code, Dispersal ECC.]

Fig. 8.7 Encoding Performance of Different Erasure Codes
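The timing harness behind such a comparison can be sketched as follows; the Tornado and Cauchy Reed-Solomon encoders themselves come from external libraries, so a trivial XOR-parity encoder stands in here purely so the harness runs.

```python
# Sketch of an encoding-time measurement. encode_xor_parity is only a stand-in
# for the real Tornado / Cauchy Reed-Solomon encoders; the buffer is far smaller
# than the 1-5 GB files used in the experiments.
import os
import time

def encode_xor_parity(data: bytes, m: int, n: int):
    """Stand-in encoder: split data into m blocks and append n XOR-parity blocks."""
    size = -(-len(data) // m)                                   # ceiling division
    blocks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(m)]
    parity = bytearray(size)
    for blk in blocks:
        for i, b in enumerate(blk):
            parity[i] ^= b
    return blocks + [bytes(parity)] * n

def time_encoder(encode, data: bytes, m: int = 20, n: int = 8) -> float:
    """Seconds taken to produce m data blocks plus n redundancy blocks."""
    start = time.perf_counter()
    encode(data, m, n)
    return time.perf_counter() - start

data = os.urandom(4 * 1024 * 1024)       # 4 MB test buffer
print(f"xor-parity stand-in encoder: {time_encoder(encode_xor_parity, data):.2f} s")
```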

Statistical Inference on encoding cost of different erasure codes using one-way ANOVA

Consider a one-way ANOVA on the experimental results of the different erasure codes for encoding the data on the EC2 database.

The hypotheses are as follows:

Null hypothesis H0: there is no significant difference in the encoding costs of the erasure codes tested.

Alternative hypothesis H1: there is a significant difference between the encoding costs of the erasure codes tested.

Table 8.4: Comparison of the Encoding Time of Different Erasure Codes

SS: Sum of Squares, df: degrees of freedom, MS: mean square, F: F-statistic, Prob: probability

The test statistic is the F value of 3.43 from Table 8.4. Using an α of 0.05, we have F(0.05; 4, 20) = 2.866 from the F distribution table. Since the test statistic is larger than the critical value, we reject the null hypothesis of equal mean encoding times and conclude that there is a (statistically) significant difference among the encoding times of the erasure codes. The p-value for 3.43 is 0.0273 from Table 8.4, so the test statistic is significant at that level.

The HDVP, RSA-DPAP, ECC-DPAP and PVDSSP schemes have the lowest available bandwidth where all files are initially stored, so at the beginning of the simulation the Availability of data is at its worst. With encoding of the data, the Availability percentage improves drastically, since the most popular files become available on local storage. Similar behavior can be seen across the proposed protocols, which provide much better available bandwidth to cloud applications.

c) Confidentiality

To test the Confidentiality of data, we consider two parameters: whether data is disclosed to attackers, and the time taken to encrypt the data. We simulate cloud networks with 100, 500 and 1000 randomly placed nodes to test the Confidentiality of the data using NS2. We verify the claim that the encryption techniques in ECC-DPAP and PVDSSP guarantee the ∆T-Confidentiality property with probability almost one when the observation period T (equivalently, the number of measurements k taken at intervals of t0) is sufficiently long. To this end, we randomly pick two nodes in the cloud and consider one as the source and the other as the data collector. By applying ECC-DPAP and PVDSSP between the two nodes, we can evaluate the probability of breaking the Confidentiality of k measurements, relative to the threshold 1/N, for different values of k. As shown in Fig. 8.8-8.10 for the different network sizes, this probability converges very quickly to zero as k increases, according to both simulation and analytical results.

[Fig. 8.8 plot: x-axis k (1-9), y-axis Probabilities; series: Simulation, Analysis.]

Fig. 8.8 The probability of breaking the Confidentiality of k measurements from a given node, relative to 1/N, with 100 nodes

[Fig. 8.9 plot: x-axis k (1-9), y-axis Probabilities; series: Simulation, Analysis.]

Fig. 8.9 The probability of breaking the Confidentiality of k measurements from a given node, relative to 1/N, with 500 nodes

[Fig. 8.10 plot: x-axis k (1-9), y-axis Probabilities; series: Simulation, Analysis.]

Fig. 8.10 The probability of breaking the Confidentiality of k measurements from a given node, relative to 1/N, with 1000 nodes

Fig. 8.8-8.10 show the simulation results, as well as the analytical results, for the Confidentiality of data with 100, 500 and 1000 nodes respectively. The approximate results obtained from the theoretical analysis match the simulation results perfectly. The second observation is that the false-guess probability P is zero in most cases. Does this mean that the almost-zero false-guess probability of the ECC-DPAP and PVDSSP schemes leaks more useful information to attackers? The answer is definitely no. Although the almost-zero false-guess probability reflects the fact that the number of false guesses relative to 1/N is zero, it ignores the fact that the total number of matched guesses is almost zero as well.

In this experiment, we have also shown that, for any application-specific objective ∆ ≥ 1/N, the Confidentiality of the stored data can be safeguarded with probability almost equal to one. Of course, this is the probability that an attacker might overhear the data; it does not mean that the data can be compromised, because the data blocks are encrypted as well. Again, this verifies that the proposed idea makes it harder for an attacker to collect enough data to break the secret. Although this seems to require a sufficiently large number of measurements (or, equivalently, a long period T), the experimental values show that even very short sequences of measurements (e.g., T = 5t0) originating from a single source node can be protected with probability quickly approaching one. This is achieved thanks to the encryption, resulting in particularly robust operation even when approximately 60% of the nodes are compromised by the attacker, as shown in Fig. 8.11.

[Fig. 8.11 plot: x-axis no. of nodes (10-100), y-axis Probability; series: empirical pdf 1, empirical pdf 2, empirical pdf 3, analytical pdf.]

Fig. 8.11 Stationary distributions of the number of correct nodes

Next, we measure the time to encrypt data with different file sizes, from 1 GB to 5 GB, using the proposed protocols ECC-DPAP and PVDSSP and the existing protocols of Hao et al. [65] and Barsoum et al. [19], as shown in Fig. 8.12.

The focus of this simulation (Fig. 8.12) is to encrypt the data at low cost, balancing Confidentiality and performance in cloud systems. An obvious question is why not broadcast encrypted data directly, which would eliminate most, if not all, Confidentiality concerns. The main reason is that encryption performed at thin Clients (e.g., a PDA or mobile phone) is very expensive. While encryption/decryption can address some security concerns, it degrades performance significantly. To demonstrate the inefficiency of encryption/decryption operations and the significant improvement brought by the proposed encryption schemes, we implement the algorithms on a real mobile device, a PDA.

[Fig. 8.12 plot: x-axis File Size (GB) from 1 to 5, y-axis Time (s); series: PVDSSP, ECC-DPAP, Hao et al. [65], Barsoum et al. [19].]

Fig. 8.12 Time for Encrypting the Data Using Different Protocols

In all four schemes, the Client is supposed to encrypt the data, which can be very large. Fig. 8.12 shows that the proposed protocols ECC-DPAP and PVDSSP are suitable for thin Clients (with constrained resources) even when working with large applications, whereas the existing protocols of Hao et al. [65] and Barsoum et al. [19] are not suitable for thin Clients when dealing with large applications.
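How such client-side encryption times can be measured is sketched below using AES-256-GCM from the Python `cryptography` package; this is an illustrative cipher and timing loop, not the ECC-DPAP or PVDSSP encryption itself, and the buffer sizes are scaled down from the 1-5 GB files used in the experiments.

```python
# Hedged sketch of a client-side encryption timing measurement with AES-256-GCM.
import os
import time
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encryption_time(size_bytes: int, chunk: int = 1 << 20) -> float:
    """Seconds to AES-GCM-encrypt size_bytes of random data in 1 MB chunks."""
    aead = AESGCM(AESGCM.generate_key(bit_length=256))
    start = time.perf_counter()
    remaining = size_bytes
    while remaining > 0:
        block = os.urandom(min(chunk, remaining))
        aead.encrypt(os.urandom(12), block, None)    # fresh 96-bit nonce per chunk
        remaining -= len(block)
    return time.perf_counter() - start

for mb in (64, 128, 256):                            # scaled-down file sizes
    print(f"{mb:4d} MB: {encryption_time(mb * 1024 * 1024):6.2f} s")
```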

Statistical Inference on Encryption cost of different protocols using one-way ANOVA

Consider a one-way ANOVA on the experimental results of the different protocols for encrypting the data on the EC2 database.

The hypotheses are as follows:

Null hypothesis H0: there is no significant difference in the encryption costs of the protocols tested.

Alternative hypothesis H1: there is a significant difference between the encryption costs of the protocols tested.

Table 8.5: Comparison of Encryption Time of Different Protocols

SS: Sum of Squares, df: degrees of freedom, MS: mean square, F: F-statistic, Prob: probability

The test statistic is the F value of 5.34 from Table 8.5. Using an α of 0.05, we have F(0.05; 3, 16) = 3.23 from the F distribution table. Since the test statistic is larger than the critical value, we reject the null hypothesis of equal mean encryption times and conclude that there is a (statistically) significant difference among the encryption times. The p-value for 5.34 is 0.0097 from Table 8.5, so the test statistic is significant at that level.

We can get some graphical assurance that the means are different by looking at the box plots in Fig. 8.13, as displayed by anova1.

[Fig. 8.13 box plot: x-axis Protocols (1-4), y-axis Time (s).]

Fig. 8.13 Encryption Cost of Different Protocols by ANOVA
1: PVDSSP, 2: ECC-DPAP, 3: Hao et al. [65], 4: Barsoum et al. [19]
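A comparable grouped box plot can be produced outside MATLAB, for example with matplotlib as sketched below; the encryption-time vectors are placeholders, not the measured data behind Fig. 8.13.

```python
# Hedged sketch of the grouped box plot of Fig. 8.13 (placeholder data, seconds).
import matplotlib.pyplot as plt

times = {
    "PVDSSP":         [620, 900, 1280, 1650, 2010],
    "ECC-DPAP":       [700, 1050, 1480, 1900, 2350],
    "Hao et al.":     [1100, 1700, 2400, 3100, 3800],
    "Barsoum et al.": [1250, 1950, 2750, 3550, 4350],
}

plt.boxplot(list(times.values()))
plt.xticks(range(1, len(times) + 1), list(times.keys()))
plt.xlabel("Protocols")
plt.ylabel("Time (s)")
plt.title("Encryption cost of different protocols")
plt.savefig("encryption_boxplot.png", dpi=150)
```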

To summarize the security of the proposed protocols: HDVP provides better security than the other protocols when the application is small and demands private verifiability without requiring Confidentiality of data. RSA-DPAP is suitable for large applications that demand public verifiability without requiring Confidentiality of data. ECC-DPAP is suitable for all types of applications when an application demands all three basic security requirements, Confidentiality, Integrity and Availability, and it is especially suitable for thin Clients (e.g., a PDA or a cell phone). PVDSSP is also suitable for all types of applications, even when Clients have limited computing resources, where applications demand all three security requirements with public verifiability and without any storage overhead for the Clients.


8.2.2. Performance

In this section, we present and discuss the experimental results for the performance of the proposed schemes (HDVP, RSA-DPAP, ECC-DPAP, and PVDSSP) and compare the results from different perspectives: computation cost, communication cost and storage cost.

a) Computation Cost

To test the computation cost of the proposed verification protocols, we consider three parameters: the Client, Verifier and CSP computation costs.

Client Computation Cost

Here, we measure the computation cost of the Client for generating the metadata. Table 8.6 presents the metadata generation cost of the proposed schemes for different file sizes. The metadata generation time of the PVDSSP scheme is the lowest, because it uses a linear code. Moreover, the metadata generation time is unlikely to have a significant impact on overall system performance, because metadata generation is done only once during the file's lifetime, which may span tens of years.

Table 8.6: Metadata Generation Cost (s) of the Client in Proposed Protocols

File Size | PVDSSP | ECC-DPAP | RSA-DPAP | HDVP
1 GB      | 165.11 | 191.91   | 335.15   | 491.05
2 GB      | 186.06 | 230.66   | 372.22   | 538.35
3 GB      | 208.71 | 276.02   | 405.02   | 581.36
4 GB      | 230.52 | 310.86   | 442.45   | 635.85
5 GB      | 251.06 | 353.19   | 485.10   | 697.92

We can get some graphical assurance that the means are different by looking at the box plots in Fig. 8.14.

[Fig. 8.14 box plot: x-axis Protocols (1-4), y-axis Computation Time (ms).]

Fig. 8.14 Computation cost of the Client in Proposed Protocols
1: PVDSSP, 2: ECC-DPAP, 3: RSA-DPAP, 4: HDVP

Verifier Computation cost

Table 8.7 presents the Verifier computation times to check the responses received from the CSP. The secret-sharing-based scheme (PVDSSP) has the shortest verification time among the four proposed schemes. As illustrated in Table 8.7, the proposed schemes show only a very small increase in Verifier computation time as file sizes increase.

Table 8.7: Computation Cost (s) of the Verifier in Proposed Protocols

Data Size | PVDSSP | ECC-DPAP | RSA-DPAP | HDVP
20 KB     | 110.21 | 165.11   | 294.04   | 396.21
40 KB     | 132.64 | 182.05   | 321.22   | 445.11
60 KB     | 153.46 | 203.09   | 352.91   | 491.22
80 KB     | 177.32 | 222.45   | 385.12   | 540.34
100 KB    | 201.42 | 251.71   | 420.25   | 595.52

We can get some graphical assurance that the mean verification times are different by looking at the box plots in Fig. 8.15.

[Fig. 8.15 box plot: x-axis Proposed Protocols (1-4), y-axis Computation Time (ms).]

Fig. 8.15 Computation cost of the Verifier in Proposed Protocols
1: PVDSSP, 2: ECC-DPAP, 3: RSA-DPAP, 4: HDVP

CSP Computation Cost

Table 8.8 presents the CSP computation times (s) to compute the Integrity proof for the challenged blocks. The computation cost of HDVP is the largest; it computes the proof using universal hash functions and thus provides the strongest guarantee.

Table 8.8: Computation Cost (s) of the CSP in Proposed Protocols

Data Size | PVDSSP | ECC-DPAP | RSA-DPAP | HDVP
20 KB     | 80.76  | 120.90   | 204.40   | 294.40
40 KB     | 96.80  | 144.31   | 235.05   | 341.76
60 KB     | 113.04 | 162.07   | 263.23   | 373.43
80 KB     | 128.07 | 186.33   | 298.44   | 405.90
100 KB    | 146.02 | 209.74   | 323.19   | 441.08

We can get some graphical assurance that the mean CSP computation times are different by looking at the box plots in Fig. 8.16.

[Fig. 8.16 box plot: x-axis Proposed Protocols (1-4), y-axis Computation Time (ms).]

Fig. 8.16 Computation cost of the CSP in Proposed Protocols
1: PVDSSP, 2: ECC-DPAP, 3: RSA-DPAP, 4: HDVP

Statistical Inference on computation Cost of proposed protocols using one-way ANOVA

Consider a one-way ANOVA on the experimental results of the proposed verification protocols for the computation times of the Client, TPA and CSP, tested on the EC2 database.

The hypotheses are as follows:

Null hypothesis H0: there is no significant difference in the computation costs of the proposed protocols tested.

Alternative hypothesis H1: there is a significant difference between the computation costs of the proposed protocols tested.

Table 8.9: ANOVA Table for Comparison of the Computation Cost of Proposed Protocols

Computation | Source  | SS       | df | MS       | F     | Prob>F
Client      | Columns | 424102.3 | 5  | 141367.4 | 37.28 | 0.009299
            | Error   | 60833.8  | 13 | 3802.1   |       |
            | Total   | 484936.1 | 18 |          |       |
TPA         | Columns | 352786.5 | 5  | 117595.5 | 42.56 | 0.007494
            | Error   | 44209.3  | 16 | 2763.1   |       |
            | Total   | 396995.8 | 21 |          |       |
CSP         | Columns | 195734.3 | 3  | 65224.8  | 35.52 | 0.0026487
            | Error   | 29390.7  | 16 | 1836.9   |       |
            | Total   | 225125   | 19 |          |       |

SS: Sum of Squares, df: degrees of freedom, MS: mean square, F: F-statistic, Prob: probability

The test statistics are the F values of 37.28, 42.56 and 35.52 from Table 8.9 for the Client, TPA and CSP computations respectively. Using an α of 0.05, we have F(0.05; 5, 13) = 3.025, F(0.05; 5, 16) = 2.85 and F(0.05; 3, 16) = 3.23 from the F distribution table. Since the test statistics are much larger than the critical values, we reject the null hypothesis of equal mean computation costs and conclude that there is a (statistically) significant difference among the computation costs of the Client, TPA and CSP in the proposed protocols. The p-values for 37.28, 42.56 and 35.52 are 0.009299, 0.007494 and 0.002648 respectively from Table 8.9, so the test statistics are significant at that level.

b) Storage Cost

Table 8.10 shows the storage overhead of the Verifier for the proposed schemes with different file sizes. The way we aggregate the metadata gives the PVDSSP scheme the lowest storage overhead on the Verifier side.

Table 8.10: Storage Cost (B) of the Verifier in Proposed Protocols

Data Size | PVDSSP | ECC-DPAP | RSA-DPAP | HDVP
1 GB      | 145.76 | 180.90   | 235.40   | 295.40
2 GB      | 162.80 | 221.31   | 285.05   | 361.76
3 GB      | 185.04 | 262.07   | 341.23   | 433.43
4 GB      | 214.07 | 306.33   | 408.44   | 515.90
5 GB      | 245.02 | 359.74   | 473.19   | 609.08

We can get some graphical assurance that the means are different by looking at the bar charts in Fig. 8.17.

[Fig. 8.17 bar chart: x-axis File Size (MB) from 2 to 10, y-axis Storage Cost (B); series: PVDSSP, ECC-DPAP, RSA-DPAP, HDVP.]

Fig. 8.17 Storage cost of the Verifier in Proposed Protocols

Statistical Inference on Verifier Storage Cost of the proposed protocols using one-way ANOVA

Consider a one-way ANOVA on the experimental results of the proposed verification protocols for the storage overhead of the Verifier, tested on the EC2 database.

The hypotheses are as follows:

Null hypothesis H0: there is no significant difference in the Verifier storage costs of the proposed protocols tested.

Alternative hypothesis H1: there is a significant difference between the Verifier storage costs of the proposed protocols tested.

Table 8.11: Comparison of the Storage Cost of the Verifier in Proposed Protocols

SS: Sum of Squares, df: degrees of freedom, MS: mean square, F: F-statistic, Prob: probability

The test statistic is the F value of 15.47 from Table 8.11. Using an α of 0.05, we have F(0.05; 3, 16) = 7.65 from the F distribution table. Since the test statistic is larger than the critical value, we reject the null hypothesis of equal mean storage costs and conclude that there is a (statistically) significant difference among the Verifier storage costs of the proposed protocols. The p-value for 15.47 is 0.0022 from Table 8.11, so the test statistic is significant at that level.

c) Communication Cost

The communication costs of the proposed protocols are illustrated in Fig. 8.18. From Fig. 8.18, we can see that the HDVP scheme has the highest communication overhead, while the PVDSSP scheme has the lowest communication cost among the four schemes. Hence, the proposed schemes are much more practical, especially when the available bandwidth is limited and there are millions of verifiers who need to audit their data files over the CSPs.

[Fig. 8.18 plot: x-axis File Size (GB) from 1 to 10, y-axis Communication Cost (Bytes); series: HDVP, RSA-DPAP, ECC-DPAP, PVDSSP.]

Fig. 8.18 Communication Cost of Proposed Protocols

Statistical Inference on Communication Cost of the proposed protocols using one-way ANOVA

Consider a one-way ANOVA on the experimental results for the communication cost of the proposed verification protocols, tested on the EC2 database.

The hypotheses are as follows:

Null hypothesis H0: there is no significant difference in the communication costs of the proposed protocols tested.

Alternative hypothesis H1: there is a significant difference between the communication costs of the proposed protocols tested.

Table 8.12: Comparison of the Communication Cost of Proposed Protocols

SS: Sum of Squares, df: degrees of freedom, MS: mean square, F: F-statistic, Prob: probability

The test statistic is the F value of 18.52 from Table 8.12. Using an α of 0.05, we have F(0.05; 3, 36) = 2.866 from the F distribution table. Since the test statistic is larger than the critical value, we reject the null hypothesis of equal mean communication costs and conclude that there is a (statistically) significant difference among the communication costs of the proposed protocols. The p-value for 18.52 is 1.95669e-007 from Table 8.12, so the test statistic is significant at that level.

To summarize the performance of the proposed protocols: the PVDSSP protocol gives the best performance among the protocols considered here; it excels from a number of perspectives, namely Verifier storage overhead, metadata generation cost, communication cost, and computation cost. We conclude that HDVP is suitable for small applications, while RSA-DPAP is useful for large applications, even for Clients with limited resources, where the application demands public verifiability and efficient support for dynamic data operations. Moreover, if the computation cost on the server side is less important (the CSP has ample computational resources), the Client needs the strongest guarantee that the data is intact, and the verification process is done in a constrained environment with limited bandwidth and limited verifier computational power (e.g., a PDA or a cell phone), then ECC-DPAP and PVDSSP are the best choices in such circumstances.

Table 8.13: Salient Features of the Proposed Protocols

Parameters / Protocols               | HDVP      | RSA-DPAP                      | ECC-DPAP                      | PVDSSP
Integrity                            | Yes       | Yes                           | Yes                           | Yes
Availability                         | Yes       | Yes                           | Yes                           | Yes
Confidentiality                      | No        | No                            | Yes                           | Yes
Public Verifiability                 | No        | Yes                           | Yes                           | Yes
Data Dynamics                        | Partially | Yes                           | Yes                           | Yes
Probability Detection                | O(N-1)    | O(N-1)                        | O(N-1)                        | O(N-1)
Storage Overhead for the Clients     | Yes       | Yes                           | Yes                           | No
Security of Dynamic Data Operations  | No        | Possibility of replay attacks | Possibility of replay attacks | Yes
Overall Security                     | N/A       | IF                            | ECDL                          | CDH, DDH
Server Computation                   | O(1)      | O(1)                          | O(1)                          | O(1)
Verifier Computation                 | O(1)      | O(1)                          | O(1)                          | O(1)
Verifier Storage Overhead            | O(1)      | O(1)                          | O(1)                          | O(1)
Size of Metadata                     | O(n)      | O(n)                          | O(n)                          | O(1)

N/A: No Assumption, IF: Integer Factorization, DH: Diffie-Hellman, CDH: Computational DH, ECDL: Elliptic Curve Discrete Logarithm, DDH: Decisional DH.

The EDVP runs RSA-DPAP, ECC-DPAP and PVDSSP in a distributed manner, so all the characteristics of these protocols also apply to EDVP.


8.3. Summary

In this chapter, we have presented and discussed the simulation results of the proposed protocols, covering both security and performance, and compared them with existing schemes. For security, we tested the Integrity, Availability and Confidentiality of the proposed protocols. For performance, we tested the computation cost, communication cost and storage cost. We also evaluated both sets of results with statistical tests using one-way ANOVA and showed that there is a security and efficiency difference between the proposed protocols and the existing protocols. Based on the experimental results, we conclude that the Homomorphic Distribution Verification Protocol (HDVP) is useful when an application needs Availability and Integrity of data through private verifiability. The RSA-based Dynamic Audit Protocol (RSA-DPAP) is useful where an application demands Integrity and Availability of data with efficient dynamic data operations through public verifiability. The ECC-based Dynamic Public Audit Protocol (ECC-DPAP) is useful where an application needs Confidentiality, Availability and Integrity of data efficiently, and it is mainly suitable for resource-constrained mobile devices in cloud computing, such as PDAs, smart cards and notebooks. The Publicly Verifiable Dynamic Secret Sharing Protocol (PVDSSP) is useful where an application needs all three security properties, Availability, Integrity and Confidentiality of data, with lightweight communication and without any storage cost for the Clients to maintain the encryption key locally. The Efficient Distributed Verification Protocol (EDVP) is useful when an application needs faster execution time to validate the Integrity of data through multiple verifiers.

The main salient features of the proposed protocols are given in Table 8.13.