
International Journal of Cloud Computing (ISSN 2326-7550) Vol. 2, No. 3, July - September, 2014

MOVING TARGET WITH LOAD BALANCING IN A HIERARCHICAL CLOUD

Hong Liu, Johnson Thomas and Praveen Khethavath Department of Computer Science

Oklahoma State University Stillwater, USA

[email protected]

Abstract: In this paper we propose a ‘moving target’ security mechanism for a P2P cloud in which files are partitioned and sensitive sections are moved at different times, without modifying the routing or finger tables, to reduce the risk of the file being compromised. Two drawbacks of this approach are the problem of determining the locality of the data and load imbalance. We present a hierarchical P2P cloud system that improves scalability and efficiency, and propose a 3-step load balancing scheme that globally balances the hierarchical P2P cloud network. Our simulation results show that our algorithm is effective in achieving load balancing in hierarchical peer-to-peer cloud systems.

Keywords: moving target; load balancing; cloud

__________________________________________________________________________________________________________________

1. INTRODUCTION

The cloud serves as a large storage repository for user

files and data. One of the big problems is security of the

files. If data or files can be accessed by attackers, the service

provider will lose trust from its users, and the leakage of

sensitive data or files could cause great damage. Attacks can

be directed at the routing, searching and storing

mechanisms. Techniques such as encryption are typically

used for securing the storage [10]. In this paper, we propose

a ‘moving target’ approach as a complement to existing

approaches. The idea is to move critical files to a different

location so that even if an attacker breaks into the system,

the target will be stored in a different location. This gives

the attacker no option, but to guess and attack at a different

location. There is an overhead in moving files, but only files

that require high security will be moved. A question not

addressed in this work is the timing of the transfer: should

files or data be moved at regular intervals, randomly,

or only when some suspicious event triggers? This is not

covered in this paper and will be the topic of future work.

P2P cloud systems are increasing in popularity, making

it possible to harness the computing power and resources of

large populations of networked nodes in a cost-effective manner.

Currently, most P2P cloud systems are flat with all nodes

having the same functionalities. These flat P2P cloud

systems are limited when it comes to scalability [8].

Searching for nodes or files will take time. Moreover, since

they lack a centralized administrative entity that controls the

node resources, ensuring high levels of availability,

performance and security becomes difficult. We propose a

hierarchical peer-to-peer cloud (HP2PC) network model

which is scalable, efficient and secure. Data security, which

is achieved by the moving target approach proposed in this

paper, requires fair load distribution among all nodes for

efficiency and performance reasons. Not only is there an

internal transfer of files for security reasons, but cloud users

will be adding and deleting files.

Our contribution is two-fold:

A moving target defense approach for storage in a hierarchical P2P cloud. This is achieved without modifying the routing tables.

A load balancing mechanism that corrects the load imbalance caused by file transfers and user file updates in a hierarchical P2P cloud.

Figure 1. Hierarchical P2P Cloud Network

A file that needs to be securely stored is divided into

multiple portions. The goal of the division is to

compartmentalize parts of the file that need to be securely

stored so that the sensitive sections are moved more often.

Parts which contain little or no sensitive data or code can be

left at their original locations with little or no transfers to

other locations. Our proposed approach is divided into the

following steps:

[Figure 1 residue: a level 0 ring of super-nodes and four level 1 rings of leaf nodes, with node ids 0000, 0010, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1101, 1110 and 1111. Example leaf-node record: id 0111, prefix 2 bits, suffix 2 bits, predecessor 1111, successor 1011, supernode 1111, load x, capacity y, interval (3,1]. Example super-node record: id 0010, prefix 2 bits, suffix 2 bits, predecessor 1001, successor 1111, load x, capacity y, interval (1,2]; its information table lists id, load and capacity for nodes 0010, 0110, 1010 and 1110.]


1. The partitioned files are randomly distributed across the cloud. A node will store only one part of a file (section 3).

2. Load balance the hierarchical P2P cloud (section 4).

3. Move the security-sensitive files at regular intervals, randomly, or only when some suspicious event triggers. This is not discussed in this paper.

Our hierarchical model is shown in Fig. 1. Red nodes represent leaf nodes and yellow nodes represent super-nodes. There are four groups in the level 1 network, and the super-nodes are 1, 2, 3, and 4. These four super-nodes constitute the level 0 network.

A literature review of previous work is presented in the next section. Our proposed approach is outlined in sections 3 and 4. Section 5 describes the routing scheme, which is followed by the moving target defense mechanism in section 6. The load balancing scheme is described in section 7 before the paper concludes.

2. LITERATURE REVIEW

2.1 SECURE FILE STORAGE IN P2P CLOUD SYSTEMS

Much of the previous work on cloud security has focused on cryptographic schemes and data integrity. Many cryptographic schemes have been proposed for hiding the data from the storage provider and hence preserving data privacy [9] [10]. Wang et al. [9] presented a scheme in which the user’s identity is also detached from the data and which provides public auditing of data. In [10], Dijk et al. proved that in cloud computing cryptographic measures alone are insufficient for guaranteeing data privacy.

The problem of ensuring the integrity of data storage in

cloud computing is studied in [11] and [12]. In [11],

Lamport et al. presented a provable data integrity (PDI)

solution to support public data integrity verification. Wang

et al. in [12] proposed a scheme to prove the integrity of the

data dynamically stored in cloud systems.

A concern with cloud storage services is that,

given a sufficient amount of time, data can be

decrypted, meaningful information can be located and

retrieved, and user privacy can easily be breached. To solve

this problem, Condie et al. [14] periodically reset the

routing tables by using induced churn where different nodes

enter and leave the address space. This reduces the chances

of hitting on a specific target. However, if an attacker is able

to access the router, he will notice the change in the routing

table and be able to deduce that files have changed

locations. In contrast, in our method, critical files or data are

moved, but the routing or finger table does not change. An

attacker is thus not able to detect, even if he breaks into the

network router, that the target has been moved. In [14], the

routing is constrained and an inefficient path may be

chosen. In our approach we aim for efficient routing, but

without modifying the routing table. Hence an attacker can

attack the routing table, but not be able to detect that the

target has moved.

2.2 LOAD BALANCING STRATEGIES

Several load balancing approaches have been proposed

for P2P cloud systems. In [1], Rieche et al. presented an

algorithm to balance load in distributed hash table (DHT)

based on a thermal dispersion scheme. All intervals in the

identifier space are managed by a minimum number f and a

maximum number 2f. Each node belonging to the interval

stores all documents assigned to the interval. Load

balancing can be done by splitting, merging, and shifting the

interval. However, this approach has a limitation since it

requires that each file have copies on all nodes belonging to the

interval. Furthermore, in this scheme there are still some

nodes having a load up to twice above the optimum. The

load is defined as the number of documents it stores, and

they focus on the distribution of documents among the nodes.

However, we define the load as the ratio of the current

workload to the capability of the node, since each node in

the system cannot have the same capacity. In addition, the

framework in this paper is hierarchical. Results show that in

our approach for each node under a certain amount of load,

load fluctuations are relatively small. Consequently, load

balancing improves significantly using our approach.

Stoica et al. in [4], proposed the concept of virtual

servers to address the load balancing issue by having each

node simulate a logarithmic number of virtual servers. As a

result, the overloaded node needs to transfer some of its

virtual nodes to the under loaded node to achieve load

balancing. The limitation of this approach is that as more

nodes join in the system, these virtual servers consume more

resources. Aberer et al. in [5] tried to balance the load in a

DHT by checking its load with its neighbor nodes. In the

system, each node repeatedly checks the load information of

its neighbor nodes to achieve load balancing. Although this

method is able to achieve load balancing when the system is

in a steady state, there is no guarantee of load balance when

the system is in a dynamic state because load balancing is

only done locally between neighbor nodes. Zoels et al. in [6]

proposed an algorithm to balance a hierarchical system.

First, peers contact a predefined superpeer when they join

the system. Second, an algorithm is used to determine a

super-node for the new peer. As a result, all super-peers

have an equal load. This method has a limitation since it

considers super-nodes only.

A load balancing scheme for a flat decentralized

architecture is proposed in [15]. In our work, we use a

hierarchical architecture. When compared to the flat

network, the hierarchical architecture offers exploitation of

heterogeneous peers, transparency, faster lookup time, and

fewer messages in the wide area [3]. Moreover, the work in

[15] does not consider defending against malicious

participants, but we use a moving target security mechanism

to reduce the risk.

3. PARTITIONED FILE DISTRIBUTION


In this paper we assume that partitioned files are

randomly distributed across the P2P cloud system. The

file/document is broken into multiple pieces or fragments.

Some sections may be more critical. We are particularly

interested in the critical pieces of code. These are the

sections that will be moved.

If there are few fragments to a file, randomly distribute

one file per ring. If there are many fragments, then there will

be at least one file per node, hence many files per P2P ring.

Each file is broken into k parts, where k may be different for

each file. Depending on the number of fragments of a file,

the files will be distributed across the nodes in an individual

ring, a number of rings that form a sub-part of the cloud and

are physically located next to each other, or distributed

across the whole P2P cloud system.

File fragments are distributed randomly across the

cloud. In the moving target approach, the files are moved to

different storage locations or nodes in the cloud. In this

paper, we assume that as a file is fragmented, and the

critical parts are moved, the critical fragments of the file

have to be accessed for the attack to be completely

successful. However, breaking into some of the files may

provide some information, so our security condition is not

strict. Security in our moving target model is therefore

measured as follows: the lower the probability of

successfully accessing all the critical partitions of a file that

has been moved, the more secure the entire file is. The goal

of the moving target approach is to ensure that the target

will have changed from the attacker’s view.

The probability that all the critical fragmented files can

be accessed depends on the number of possible

combinations for moving the files. We assume that

the cloud contains many resources and there will be only

one fragment per node.

Assume there are p nodes across which the fragments

are to be distributed. Of these p nodes, n nodes do not have

any fragments of the file and are available to accommodate

one fragment at the most. Let k be the total number of

fragments of the file of which r are to be moved. In this

paper, fragments are moved to a node which has no other

fragment of the file. Given n available nodes, and r

fragments, the number of possible combinations for storing

r fragments on n nodes is:

$$\binom{n}{r}$$

The number of possible combinations given that there

are k fragments, of which r are to be moved to n available

nodes is:

$$\binom{k}{r}\binom{n}{r} \qquad (1)$$

If an attacker is able to access all k fragments with a

probability of 1, then the probability after the files have

been moved is shown below (see Fig. 2).

Figure 2 shows that moving all the files or moving very

few files reduces the effectiveness of the moving target

approach. If few fragments are moved, then the attacker

does not have to modify his strategy much. On the other

hand, moving most of the fragments not only introduces a lot

of overhead, but it also suggests that the fragments can be

moved to limited places only. The most secure (or lowest

probability) is therefore to move an intermediate number of

fragments. In this paper files refer to file fragments.
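The trade-off described above can be explored numerically. The sketch below assumes that the number of placements in equation (1) is the product of binomial coefficients C(k, r) and C(n, r), and that an attacker who could previously reach all k fragments now has to guess one placement uniformly at random. The function names and the value n = 10 are illustrative assumptions, not taken from the paper.

```python
from math import comb


def access_probability(k: int, r: int, n: int) -> float:
    """Chance of guessing the new placement, assuming C(k, r) * C(n, r)
    equally likely placements (the reading of equation (1) assumed here)."""
    if not (0 <= r <= k and r <= n):
        raise ValueError("need 0 <= r <= k and r <= n")
    return 1.0 / (comb(k, r) * comb(n, r))


def safest_move_count(k: int, n: int) -> int:
    """Number of moved fragments r in 1..k that minimises the probability."""
    return min(range(1, k + 1), key=lambda r: access_probability(k, r, n))


if __name__ == "__main__":
    n = 10  # hypothetical number of free nodes
    for k in (4, 8):
        probs = [round(access_probability(k, r, n), 5) for r in range(1, k + 1)]
        print(f"k={k}: {probs}, lowest at r={safest_move_count(k, n)}")
```

Under these assumptions the probability is lowest for an intermediate number of moved fragments, which matches the qualitative behaviour described for Fig. 2.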

Figure 2. Moving Target Probability

4. LOAD BALANCING

The moving target model moves fragments to improve

security. This means that load balancing becomes critical,

not only because the user keeps changing the load in the

cloud, but also because of the moving target model. In this

section, we propose a new approach to load balancing in

HP2PC systems. This approach focuses on a 2-level P2P

cloud network; however, it can be easily applied to an n-level

P2P cloud network. Our proposed approach to load

balancing consists of five steps:

1. accumulate load information in the whole system;

2. node classification: according to their utilization, nodes are classified into overloaded nodes, underloaded nodes, or neutral nodes;

3. network balance;

4. load balancing within the level 1 network;

5. load balancing within the level 0 network.

Our main contribution is a novel load balancing strategy

for HP2PC networks, which can effectively control the

amount of load imbalance across the network to globally

balance the load. First, we define the load for a node, a

supernode, and a group. Next, three strategies are presented

to balance the load among nodes, supernodes and groups

along with the algorithms for each strategy. Simulation

results show that our algorithm is effective in achieving load

balancing in HP2PC systems.

[Figure 2 plot residue: y-axis Probability (up to 0.03), x-axis No. of fragments moved (0 to 10), series for 4 fragments and 8 fragments.]

4.1 HASHING SCHEME

We use hashing to locate data or files as in traditional P2P systems. A hashing function takes a search key as an argument and computes from it an integer in the range 0 to B – 1, where B is the size of the identifier space. If a node has search key K, then we place the node in the identifier space at position h(K), where h is the hash function. A common choice of hash function when keys are integers is to compute the remainder of K/B, where K is the key value. For character-string search keys, we treat each character as an integer, sum these integers, and take the remainder when the sum is divided by B. For example, if the key is an n-byte character string (key = ‘x1x2x3…xn’), we sum these characters as integers (sum = x1 + x2 + x3 + … + xn) and compute sum modulo B. Common hash functions include [7].
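The two simple hash functions described above can be sketched as follows. This is a minimal illustration: the function name `hash_key` is hypothetical, and treating each UTF-8 byte as one "character" is an assumption.

```python
def hash_key(key, B: int) -> int:
    """Map an integer or character-string key to a position in [0, B)."""
    if B <= 0:
        raise ValueError("identifier space size B must be positive")
    if isinstance(key, int):
        # Integer keys: remainder of K / B.
        return key % B
    # String keys: treat each byte as an integer, sum, then take the remainder.
    return sum(key.encode("utf-8")) % B


if __name__ == "__main__":
    print(hash_key(23, 16))     # integer key
    print(hash_key("abc", 16))  # string key: (97 + 98 + 99) mod 16
```

Because the string variant only sums the characters, keys that are permutations of each other (for example "ab" and "ba") land on the same position, which is one reason practical systems prefer stronger hash functions.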

4.2 MEASUREMENT OF LOAD BALANCING

In hierarchical P2P cloud networks (see Fig. 1):

nodes are organized in groups; each group has a supernode and consists of leaf nodes belonging to the same supernode;

all requests are first sent to a supernode, and then the supernode assigns a destination supernode or a leafnode to respond to the request;

each supernode maintains an information table, and each entry keeps information about the nodes in the group;

in addition to its own information, each supernode has statistical information of the group;

every node knows its supernode, which is a node in both levels of the network; and

each node has the information of its successor node and predecessor node.

The utilization, Ri, of a node, i, is calculated as the ratio

of the current workload, Li, to the capability, Ci, of the node.

Load balancing strives to minimize the load imbalance,

which means every node has the same utilization. A node or

a supernode is overloaded if its utilization is greater than the

target utilization, whereas it is underloaded if its utilization

is smaller than the target utilization. A group is overloaded

if it maintains more than 2n nodes, whereas it is

underloaded if it maintains less than n nodes. The system is

load balanced if none of the nodes, supernodes, and groups

at each level is overloaded.

In our proposed load balancing strategy, we use the

following approach to balance the whole network. The first

step is to balance the load among nodes with the same

supernode. The second step is to balance the load among

supernodes. Finally, the group size is balanced.

Table 1. Notations used in this paper

Symbol    Definition
M         number of groups
N         number of nodes in group M
LMn       load of the nth node in group M
CMn       capacity of the nth node in group M
RMn       utilization of the nth node in group M
SM        sum of load in group M
CM        sum of capacity in group M
RM        utilization of group M
S         load of the whole system
C         capacity of the whole system
T         target utilization
LBFMn     load imbalance factor
p-bits    the length of the prefix
s-bits    the length of the suffix

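Utilization RMn of the nth node in the mth group: $R_{Mn} = \dfrac{L_{Mn}}{C_{Mn}}$

Utilization RM of the mth group: $R_M = \dfrac{S_M}{C_M} = \dfrac{\sum_{n} L_{Mn}}{\sum_{n} C_{Mn}}$

Target utilization (utilization of the whole system): $T = \dfrac{S}{C} = \dfrac{\sum_{M}\left(\sum_{n} L_{Mn}\right)}{\sum_{M}\left(\sum_{n} C_{Mn}\right)}$

Load imbalance factor: $LBF_{Mn} = |L_{Mn} - T \cdot C_{Mn}|$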
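The measures defined in this section can be computed directly from per-node load and capacity. The sketch below uses the (load/capacity) tags that appear in the paper's example network figures; grouping each node by the second digit of its label, the data layout and the function names are assumptions made for illustration.

```python
# (load, capacity) per node, keyed by group; values as printed in the
# load/capacity tags of the example network figures.
GROUPS = {
    "1": {"11": (30, 60), "21": (60, 80), "31": (20, 80)},
    "2": {"12": (10, 40), "22": (20, 40), "32": (20, 120),
          "42": (20, 40), "52": (30, 60)},
    "3": {"13": (10, 20), "23": (130, 160)},
    "4": {"24": (20, 40), "44": (20, 40)},
}


def node_utilization(load, capacity):
    """R_Mn = L_Mn / C_Mn."""
    return load / capacity


def group_utilization(nodes):
    """R_M = sum of loads / sum of capacities in one group."""
    return sum(l for l, _ in nodes.values()) / sum(c for _, c in nodes.values())


def target_utilization(groups):
    """T = total load / total capacity of the whole system."""
    total_load = sum(l for g in groups.values() for l, _ in g.values())
    total_capacity = sum(c for g in groups.values() for _, c in g.values())
    return total_load / total_capacity


def load_imbalance_factor(load, capacity, target):
    """LBF_Mn = |L_Mn - T * C_Mn|."""
    return abs(load - target * capacity)


if __name__ == "__main__":
    T = target_utilization(GROUPS)
    print("T =", T)
    for gid, nodes in GROUPS.items():
        print("group", gid, "R_M =", round(group_utilization(nodes), 3))
```

With these values the whole system carries 390 units of load on 780 units of capacity, so a node is overloaded when its utilization exceeds 0.5.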

5. ROUTING SCHEME

5.1 HASHING SCHEME FOR MULTI-LEVEL NETWORK

We outline the hashing scheme used in our proposed hierarchical network. We use hash function h to assign each node an m-bit key in binary form. The m-bit key consists of two parts, a prefix and a suffix. The suffix determines the level 1 network, and the prefix determines the level 0 network. For example, in a P2P cloud network of up to 16 nodes, 4 binary bits are sufficient to address all the nodes in the network. There are 12 nodes in the [0, 2^4) identifier space in Fig. 1. The first two bits are the prefix and the last two bits are the suffix. The 2-bit suffix determines the level 1 network. Hence, there can be a maximum of 2^2 nodes in the level 0 network and 2^2 nodes in each level 1 network.

Finger tables at level 0. These point to other supernodes.

node 0 (0000)    node 9 (1001)    node 2 (0010)    node 15 (1111)
1  1001          1  0010          1  1111          1  0000
2  0010          2  1111          2  0000          2  1001
4  0000          4  1001          4  0010          4  1111

Finger tables at level 00. Each node will be at least a distance of 2^n away. These point to nodes at the same level.

node 0 (0000)    node 8 (1000)
1  1000          1  0000
2  1000          2  0000
4  0000          4  1000

Similarly there are finger tables at levels 01, 10 and 11.
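The finger tables above are consistent with a Chord-style rule: the entry for offset d at ring position x points to the first node at or after position (x + d) mod 4. The sketch below reproduces the printed tables under that assumption. Taking the ring position from the suffix at level 0 and from the prefix inside a level 1 ring is inferred from the printed entries, and the function name is hypothetical.

```python
def finger_table(ring, position, bits=2):
    """Return {offset: node id} for the node at `position`.

    `ring` maps ring position -> node id; offsets are 1, 2, ..., 2**bits.
    """
    size = 2 ** bits

    def successor(pos):
        # First occupied position at or after `pos`, wrapping around.
        for step in range(size):
            candidate = (pos + step) % size
            if candidate in ring:
                return ring[candidate]
        raise ValueError("empty ring")

    return {2 ** i: successor((position + 2 ** i) % size)
            for i in range(bits + 1)}


# Level 0 ring: supernodes placed by their 2-bit suffix (assumption).
LEVEL0 = {int(n[2:], 2): n for n in ("0000", "1001", "0010", "1111")}
# Level 00 ring: nodes with suffix 00 placed by their 2-bit prefix (assumption).
LEVEL00 = {int(n[:2], 2): n for n in ("0000", "1000")}

if __name__ == "__main__":
    print(finger_table(LEVEL0, 0))   # node 0 (0000)
    print(finger_table(LEVEL00, 2))  # node 8 (1000)
```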

5.2 ROUTING SCHEME


Check the suffix bits (the last two bits). If the suffix of

the source and destination match, then the routing is within

the same level 1 network. If they do not match, the routing

is to another network in the P2P cloud system. Every node

knows its supernode – a node that is in both level networks.

At level 00, the supernode is 0000; at level 01, the

supernode is 1001, and so on.

A. Routing within a level 1 network

If node 1010 wants to send a message to node 0110 in Fig. 1,

1. Check if the suffixes match;

2. The table shows that the next node is node 0010;

3. The table at node 0010 shows that the next node is destination node 0110.

B. Routing within Level 0 Network or between a level 0 and 1 network

If node 1000 wants to send a message to node 0111 in

Fig. 1,

1. Check if the suffixes match;

2. They do not match. Send the message to supernode 0000;

3. Supernode 0000 sends the message to node 0010 in the level 0 network. The suffixes do not match; therefore, node 0010 sends the message to supernode 1111;

4. The suffixes match, and the message is sent to 0111.
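Both routing examples can be traced with a small simulation. The sketch below is one possible reading of the scheme, not the authors' implementation: it assumes greedy Chord-style forwarding on each ring (take the largest finger offset that does not pass the destination), ring positions taken from the prefix inside a level 1 ring and from the suffix on the level 0 ring, and a direct hop from a leaf node to its own supernode. All names are hypothetical.

```python
NODES = ["0000", "1000", "1001", "0101", "1101", "0010",
         "0110", "1010", "1110", "1111", "0111", "1011"]
SUPERNODES = {"00": "0000", "01": "1001", "10": "0010", "11": "1111"}
SIZE = 4  # 2-bit prefix and 2-bit suffix


def prefix(node):
    return int(node[:2], 2)


def suffix(node):
    return int(node[2:], 2)


def ring_hops(ring, src, dst):
    """Greedy finger routing on one ring; `ring` maps position -> node id."""
    if dst not in ring:
        raise KeyError(dst)
    hops, cur = [], src
    while cur != dst:
        distance = (dst - cur) % SIZE
        offset = 1
        while offset * 2 <= distance:
            offset *= 2            # largest finger offset not passing dst
        pos = (cur + offset) % SIZE
        while pos not in ring:     # successor of the finger target
            pos = (pos + 1) % SIZE
        hops.append(ring[pos])
        cur = pos
    return hops


def level1_ring(group_suffix):
    return {prefix(n): n for n in NODES if n[2:] == group_suffix}


LEVEL0_RING = {suffix(n): n for n in SUPERNODES.values()}


def route(src, dst):
    """Return the list of nodes a message visits, including src and dst."""
    path = [src]
    if src[2:] == dst[2:]:         # suffixes match: stay in the level 1 ring
        return path + ring_hops(level1_ring(src[2:]), prefix(src), prefix(dst))
    src_super, dst_super = SUPERNODES[src[2:]], SUPERNODES[dst[2:]]
    if src != src_super:
        path.append(src_super)     # every node knows its supernode
    path += ring_hops(LEVEL0_RING, suffix(src_super), suffix(dst_super))
    if dst != dst_super:
        path += ring_hops(level1_ring(dst[2:]), prefix(dst_super), prefix(dst))
    return path


if __name__ == "__main__":
    print(route("1010", "0110"))   # example A
    print(route("1000", "0111"))   # example B
```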

6. ROUTING ATTACKS IN P2P NETWORKS

Different types of attacks are possible with P2P systems.

The main focus of our security model is to make sure the

file is available only to the legitimate user. There are

different ways an attacker can get hold of the information

regarding where files are stored. The attacker can use

sniffing techniques to learn about the file’s location by

inserting himself between the user and a legitimate node.

This kind of man-in-the-middle attack is a form of active

eavesdropping. The attacker can sniff network traffic,

gain information about the critical files, such as their location, and

get access to them. The other way to obtain location

information is for the attacker to join the P2P network and

become a node in the network, that is, an insider attack. The node

will receive from and inform other nodes routing or location

information about files. This kind of attacker who is part of

the P2P network is very difficult to detect.

Our goal is therefore to hide location information about

files from routing tables. Only the node requesting the file

and the owner of the file will be aware of the location of the

files. Using our moving target defense mechanism we can

mitigate both attacks.

6.1 SECURE FINGER OR ROUTING TABLES

It is important that an attacker is not able to read the

finger tables and thereby locate files or fragments thereof. If

an attacker is able to locate files through the finger tables or

by being a man-in-the-middle, he will be able to locate the

files, thus rendering the moving target security scheme

ineffective. This is particularly important for sensitive files.

To achieve this we use one-way hash chains [13]. Every

node when it enters the system is given its id or address, and

the hashed values generate the prefix p and the suffix s. h is

the hash function. The user or owner of a file to be stored in

the cloud, who is assumed to be trustworthy is also given

the same information.

A. Hashing Scheme

The finger table will contain the hashed values for

routing. The hashed value will contain the suffix and prefix

as described above. This points to the node that is the owner

of the file. Although the finger table remains constant, that

is, it always points to the same location for a file, in reality

the file is moved around in the moving target scheme, that

is, the location or address keeps changing.

Hash function h generates a one-way hash chain with security parameter k such that $h:\{0,1\}^* \to \{0,1\}^k$. A string of 1s and 0s is hashed to a string of length k. Let c be the seed, which is picked randomly. Applying h recursively N times to seed c generates a hash chain of length N, which can be represented as $h^N(c)$. Let the hash chain of length N be represented by ω.

$$\omega = h^N(c) = h(h^{N-1}(c)) = h(h(h(\cdots h(c))))$$

Let us consider a Boolean predicate function $B:\{0,1\}^* \to \{0,1\}$. B takes an n-bit binary number as input and generates a random bit, 0 or 1, as its result.
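A one-way hash chain of this kind can be sketched in a few lines. The paper does not name a concrete hash function or predicate, so SHA-256 and the low-bit predicate below are stand-ins chosen purely for illustration, and the function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Stand-in for the hash function h (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def hash_chain(seed: bytes, length: int):
    """Return [h^1(c), h^2(c), ..., h^N(c)] for seed c and N = length."""
    chain, value = [], seed
    for _ in range(length):
        value = h(value)
        chain.append(value)
    return chain


def predicate_bit(value: bytes) -> int:
    """Hypothetical Boolean predicate B: the low bit of the hash of the input."""
    return h(value)[-1] & 1


if __name__ == "__main__":
    chain = hash_chain(b"random seed c", 4)
    print([link.hex()[:8] for link in chain])
    print(predicate_bit(b"some secret key"))
```

Computing the chain forward is cheap, while recovering an earlier link from a later one would require inverting the hash function.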

A private key SKi,j and a public key PKi,j are generated, where the subscript i,j denotes the ith key in the jth round. These keys are used to verify whether the user requesting the file is an attacker or a valid node.

Every node in the P2P architecture has its own ID.

The hashed values of the IDs are divided into two parts,

a suffix and a prefix. The suffix represents the level 1

network and prefix represents level 0 network. Let l1 and l2

be the lengths of the ids or addresses for the suffix and

prefix since we are implementing a 2-level scheme. This

approach can be applied to an n-level P2P network. The entry

in the finger table for a file f will be ps where p is the prefix

and s is the suffix. Let us consider for suffix the hash chain

generated is of length n where l2 = n and similarly for the

prefix let the hash chain generated be of length m where l1=

m. We assume that a user has no encryption keys whereas a

file owner has public and private keys. The steps are

outlined below:


B. Initial Seed Generation

Notation used:

A: op – operation at entity A

A → B: data – A sends B data

Ki,j – ith key K in the jth round

FL – file has changed location

FO – file owner

USER – user requesting file

The user requests the file. He sends his credentials encrypted with the public key of the file owner. If the credentials are accepted, the seed generation procedure takes place: the file owner sends an initial public key PK1,1 and then, in each of n steps, reveals the previous secret key together with the next public key, and each revealed secret key contributes one bit of the seed through the Boolean predicate B.

Hence, after the first round, an n-bit binary seed for the hash chain has been generated by the user. Although the seed has not been transmitted by the file owner to the user, the user generates the same seed as the file owner. This n-bit value is the new seed used to generate a new hash chain of length n for the suffix address. A seed is generated in a similar manner to generate a new hash chain of length n for the prefix address.
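The seed agreement can be simulated in a single process to see why both parties end up with the same bits. This is a sketch under stated assumptions, not the authors' implementation: SHA-256 stands in for h, the predicate B is a hypothetical low-bit function, keys are 16 random bytes, and the user verifies each revealed secret key against the public key received in the previous message.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    """Stand-in for the hash function h (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def predicate_bit(secret_key: bytes) -> int:
    """Hypothetical Boolean predicate B applied to a revealed secret key."""
    return h(b"B" + secret_key)[-1] & 1


def generate_seed(n: int):
    """Simulate one round; return (owner_bits, user_bits), n bits each."""
    owner_bits, user_bits = [], []
    sk = secrets.token_bytes(16)           # FO: secret key SK_1
    trusted_pk = h(sk)                     # FO -> USER: PK_1 = h(SK_1)
    for _ in range(n):
        next_sk = secrets.token_bytes(16)  # FO: secret key SK_{i+1}
        next_pk = h(next_sk)               # FO: PK_{i+1} = h(SK_{i+1})
        revealed_sk, announced_pk = sk, next_pk  # FO -> USER: (SK_i, PK_{i+1})
        if h(revealed_sk) != trusted_pk:   # USER: check against known PK_i
            raise ValueError("message failed verification")
        owner_bits.append(predicate_bit(sk))           # FO: next seed bit
        user_bits.append(predicate_bit(revealed_sk))   # USER: same bit
        sk, trusted_pk = next_sk, announced_pk
    return owner_bits, user_bits


if __name__ == "__main__":
    owner, user = generate_seed(8)
    print(owner, user, owner == user)
```

Only secret and public keys cross the channel; the seed itself is derived independently at both ends.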

C. File Access

The first file access is by concatenating the hashes of the seeds. That is, location of file = h(prefix seed) h(suffix seed).

D. Change in location of file

Each time a file is moved, the user is informed that the file has moved and a bit for the next seed is generated as before.

repeat {
    i = 1; j > 1
    FO: randomly generate a secret key SK1,j
    FO: generate public key PK1,j where PK1,j = h(SK1,j)
    FO → USER: PK1,j
    for each transfer of the file to a different location {
        FO: randomly generate a secret key SKi+1,j
        FO: generate public key PKi+1,j = h(SKi+1,j)
        FO → USER: FL, (SKi,j, PKi+1,j)
        USER: verify whether the message is correct or a malicious attack by verifying PKi,j = h(SKi,j)
        FO: compute Boolean predicate B:{0,1}* → {0,1} by B(SKi,j) → {0,1}, which generates a single-bit binary value 1 or 0. This will be a bit of the seed for the next hash chain; one seed generates the hash of the suffix and another that of the prefix.
        for each file access request by the user {
            location of file = h^i(prefix seed) h^i(suffix seed)
        }
        i = i + 1
    }
    j = j + 1
}
until user finishes accessing the file

When a file is moved for the ith time in the jth round, the file owner informs the user or requestor of the file that the file has been moved and also sends (SKi,j, PKi+1,j). The user is therefore able to generate the next bit of the seed. Each time the file is moved, the new address of the file is h^i(prefix seed) h^i(suffix seed). At the end of the chain the address of the file will be h^n(prefix seed) h^n(suffix seed).

After n moves of the file, the current seeds are exhausted, but new seeds have been generated by the user. The process of seed generation is repeated for each round.
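The address computation after each move can be illustrated as follows. The sketch assumes that the address is the prefix-chain value followed by the suffix-chain value, that each chain value is reduced to the 2-bit width used in Fig. 1 by keeping its low-order bits, and that SHA-256 stands in for h. None of these choices is specified in the paper, and the function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Stand-in for the hash function h (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def chain_value(seed: bytes, i: int) -> bytes:
    """Return h^i(seed), i.e. h applied i times."""
    value = seed
    for _ in range(i):
        value = h(value)
    return value


def truncate(digest: bytes, bits: int) -> str:
    """Keep the low-order `bits` bits as a binary string (an assumption)."""
    value = int.from_bytes(digest, "big") % (1 << bits)
    return format(value, f"0{bits}b")


def file_address(prefix_seed: bytes, suffix_seed: bytes, moves: int, bits: int = 2) -> str:
    """Address after `moves` relocations: h^i(prefix seed) then h^i(suffix seed)."""
    return (truncate(chain_value(prefix_seed, moves), bits)
            + truncate(chain_value(suffix_seed, moves), bits))


if __name__ == "__main__":
    for i in range(1, 5):
        print(i, file_address(b"prefix seed", b"suffix seed", i))
```

Both the file owner and the user can evaluate `file_address` locally, so the new location never has to be transmitted.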

The user himself generates the new address of the

transferred file without the file owner sending him the

address (or hashed values) of the new location where the file

owner has moved the file. The seeds for hashing are also not

transmitted by the file owner to the user. This makes a

man-in-the-middle attack very difficult. The attacker has to intercept

and read each and every message transmitted between the

file owner and the user, as well as know the hash function

and the Boolean predicate function to generate the hash

chain. The proposed approach protects from insider attacks

as values in the hash chain (which are addresses to the file)

are not known to anyone or transmitted over the network.

The routing tables do not change and an insider is not aware

that the file has been transferred to another location. The

proposed approach could be made more secure by using

different communication paths and encrypted

communications. This analysis is left for future work.

Initial seed generation procedure (section 6.1 B):

i = j = 1
FO: randomly generate a secret key SK1,1
FO: generate public key PK1,1 where PK1,1 = h(SK1,1)
FO → USER: PK1,1
repeat {
    FO: randomly generate a secret key SKi+1,1
    FO: generate public key PKi+1,1 = h(SKi+1,1)
    FO → USER: (SKi,1, PKi+1,1)
    USER: verify whether the message is correct or a malicious attack by verifying PKi,1 = h(SKi,1)
    FO: compute Boolean predicate B:{0,1}* → {0,1} by B(SKi,1) → {0,1}, which generates a single-bit binary value 1 or 0. This will be a bit of the seed for the next hash chain; one seed generates the hash of the suffix and another that of the prefix.
    i = i + 1
} until i = n

7. LOAD BALANCING SCHEME

The moving target model moves fragments to improve security. This and user addition/deletion of files can cause load imbalance. In this section, we propose a new approach to balance the whole system.

The hierarchical P2P cloud network is represented as a

bipartite graph for a 2-level network. Fig. 3 shows the

bipartite graph for the hierarchical P2P cloud network in

Fig. 1. Each node at level 0 is a supernode and each node at

level 1 is a leaf node. There is a solid blue arc from the

supernode at level 0 to the leaf nodes which are the nodes in

the same group. The red dotted lines represent the

connections for the finger table of nodes. Each supernode

has an information table, and each entry keeps load

information about the nodes in the group. Therefore, each

supernode gets the load utilization of the group ($R_M = \sum_{n} L_{Mn} / \sum_{n} C_{Mn}$).

Figure 3. Bipartite Graph for P2P Cloud Network of Fig. 1

The measurement of load balancing is described in

section 4.2. To balance the whole system, we also consider

latency in this work, since latency is an important

component that contributes to system speed. The term

latency refers to a measure of the time delay experienced by

a system. By considering the latency between any two nodes

in the HP2PC or between the user node and another node

that stores a file, we balance the level 1 network first, since

moving files between nodes within the same supernode

takes less time than moving to other rings with other super

nodes. Moreover balancing the level 1 network first makes

sure that files remain closer to the user thereby decreasing

the latency for retrieving the file. Only uncritical files are

moved in our load balancing approach, and critical files are

only moved by the hash and routing scheme outlined in

section 6.1. In our proposed load balancing strategy, we use

the following approach to balance the whole network. The

first step is to balance the load among nodes with the same

supernode. The second step is to balance the load among

supernodes. Finally, the group size is balanced. This is in

contrast to our previous approach [16] where the load

balancing approach was different. The new approach

presented in this paper yields better results as shown in

section 8.

7.1 LOAD BALANCING - LEVEL 1 NETWORK

When a node in the bottom level network becomes

overloaded, the load has to be sent to the other nodes within

the same supernode to balance locally. For example (see

Fig. 4), if node 21 is overloaded, some load is transferred

from node 21 to node 31, which has the lower load.

Figure 4. P2P Cloud Network - Imbalanced Case 1.

In the tag (x/y), x is the load and y is the capacity. In group

1, node 21 is overloaded, and node 31 is underloaded. Load

is transferred from node 21 to node 31. Thus, all nodes in

group 1 are balanced.

Algorithm 1: Local Balancing Algorithm

Ln: load of node n; Cn: capacity of node n
Rn: utilization of node n; T: target utilization

Sort all nodes into a list L in decreasing order of utilization Rn;
Calculate T;
For each group {
    Partition L into two sublists: L1 (overloaded list) and L2 (underloaded list),
    such that ∀li ∈ L1, ri ≥ T and ∀lj ∈ L2, rj < T;
    For each node in L1 {
        Transfer some load to the nodes which belong to L2;
        Set redirection point for the transferred data;
        Delete the current node from L1 and update L2;
    }
}
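As an illustration, Algorithm 1 can be sketched in Python for a single group. The algorithm does not state how T is calculated or how much load "some load" is, so the sketch makes two assumptions: T is the group's total load divided by its total capacity, and each overloaded node sheds just enough load to reach T, filling the least utilized nodes first. The names `Node` and `balance_group` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    load: float
    capacity: float

    @property
    def utilization(self) -> float:
        return self.load / self.capacity


def balance_group(nodes):
    """Balance one group in place; return (source, destination, amount) moves."""
    # Assumption: target utilization T = total load / total capacity of the group.
    target = sum(n.load for n in nodes) / sum(n.capacity for n in nodes)
    # L1: overloaded nodes (utilization >= T), most utilized first.
    overloaded = sorted((n for n in nodes if n.utilization >= target),
                        key=lambda n: n.utilization, reverse=True)
    # L2: underloaded nodes (utilization < T), least utilized first.
    underloaded = sorted((n for n in nodes if n.utilization < target),
                         key=lambda n: n.utilization)
    moves = []
    for src in overloaded:
        excess = src.load - target * src.capacity
        for dst in underloaded:
            if excess <= 1e-9:
                break
            room = target * dst.capacity - dst.load
            amount = min(excess, room)
            if amount <= 1e-9:
                continue
            src.load -= amount
            dst.load += amount
            excess -= amount
            # Each recorded move is where a redirection pointer would be set.
            moves.append((src.node_id, dst.node_id, amount))
    return moves


# Group 1 of Fig. 4: nodes 11, 21 and 31 with their (load/capacity) tags.
group1 = [Node(11, 30, 60), Node(21, 60, 80), Node(31, 20, 80)]
moves = balance_group(group1)
```

Under these assumptions the only move is from node 21 to node 31, which matches the example in the text.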

7.2 LOAD BALANCING - LEVEL 0 NETWORK

When a group becomes overloaded, it checks with its

related supernodes, including predecessor, successor and

nodes related through the finger table. Some load will then

be transferred to the group with the lowest load utilization.

For example (Fig. 5), if load balancing cannot be

achieved within group 3, supernode 1 (related through the

finger table), supernode 2 (predecessor), and supernode 4

(successor) respectively are searched in parallel for their

load information. Since group 2 is underloaded, some of the

load is transferred from group 3 to group 2 depending on

which node in group 2 has the lower load utilization.

Redirection is used to find data that has been moved. For example,
suppose the hash of the data di, that is h(di), gives 23. Because the
data has been moved from node 23 to node 13, node 23 keeps a pointer
for di to node 13.
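The redirection step can be sketched as a lookup that first resolves the home node from the hash and then follows a pointer if the data has been moved. The class `RedirectingStore` and its methods are hypothetical, and the hash function h is replaced by a caller-supplied mapping so that the example reproduces h(di) = 23.

```python
class RedirectingStore:
    """Toy model: each node holds data items and pointers to moved items."""

    def __init__(self, home_of):
        self.home_of = home_of   # stand-in for the hash h: key -> home node id
        self.data = {}           # node id -> {key: value}
        self.pointers = {}       # home node id -> {key: node id now holding it}

    def _holder(self, key):
        home = self.home_of(key)
        return self.pointers.get(home, {}).get(key, home)

    def put(self, key, value):
        self.data.setdefault(self.home_of(key), {})[key] = value

    def move(self, key, dst):
        """Move key from its current holder to dst, leaving a pointer at home."""
        value = self.data[self._holder(key)].pop(key)
        self.data.setdefault(dst, {})[key] = value
        self.pointers.setdefault(self.home_of(key), {})[key] = dst

    def get(self, key):
        holder = self._holder(key)
        return holder, self.data[holder][key]


store = RedirectingStore(home_of=lambda key: 23)  # h(di) = 23, as in the text
store.put("di", "payload")
store.move("di", 13)
holder, value = store.get("di")
```

Because the pointer lives at the home node, the hash and the routing tables stay unchanged when data moves; only one extra hop is needed to reach it.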

Figure 5. P2P Cloud Network - Imbalanced Case 2.
Group 3 is overloaded, and group 2 is underloaded. Node 23 (highest
load utilization in group 3) transfers some load to node 32 (lowest
load utilization in group 2). Thus, both group 2 and group 3 are
balanced.
Node tags (load/capacity) shown in the figure:
Group 1: 11 (30/60), 21 (60/80), 31 (20/80); total 110/220
Group 2: 12 (10/40), 22 (20/40), 32 (20/120), 42 (20/40), 52 (30/60); total 100/300
Group 3: 13 (10/20), 23 (130/160); total 140/180
Group 4: 24 (20/40), 44 (20/40); total 40/80
Level 0 total: 390/780

Algorithm 2: Local Balancing Algorithm

Sm: sum of load in group m; Cm: sum of capacity in group m;
Rm: utilization of group m; T: target utilization

Sort all groups gm into a list L in decreasing order of utilization Rm;
Calculate T;
Partition L into two sublists: L1 (overloaded list) and L2 (underloaded list),
such that ∀gi ∈ L1, ri ≥ T and ∀gj ∈ L2, rj < T;
For each gm in L1 {
    Get the information of nodes nk which belong to the supernode of nk
    such that nk is a supernode in another network;
    Transfer some load to node nk which belongs to group gk, gk ∈ L2;
    Set redirection point for the transferred data;
    Update finger table;
    Delete gm from L1 and update L2;
}
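A single inter-group transfer of Algorithm 2 can be sketched in Python using the node tags of Fig. 5. The sketch rests on assumptions that the pseudocode leaves open: T is the network-wide load divided by the network-wide capacity, the most utilized group gives load to the least utilized group, and the amount moved is limited both by the donor group's excess over T and by the room the receiving node has below T. The restriction to groups reachable through the predecessor, successor and finger table is omitted, and `balance_groups_once` is an illustrative name.

```python
def utilization(load, capacity):
    return load / capacity


def balance_groups_once(groups):
    """One inter-group transfer. groups: {group_id: {node_id: [load, capacity]}}.

    Returns (source_node, destination_node, amount), or None if nothing applies.
    """
    totals = {g: (sum(ld for ld, _ in ns.values()), sum(c for _, c in ns.values()))
              for g, ns in groups.items()}
    # Assumption: T is the network-wide load divided by network-wide capacity.
    target = sum(ld for ld, _ in totals.values()) / sum(c for _, c in totals.values())
    ratio = {g: utilization(*totals[g]) for g in groups}
    donor = max(ratio, key=ratio.get)
    receiver = min(ratio, key=ratio.get)
    if ratio[donor] <= target or ratio[receiver] >= target:
        return None
    # Most utilized node of the donor group, least utilized node of the receiver.
    src = max(groups[donor], key=lambda n: utilization(*groups[donor][n]))
    dst = min(groups[receiver], key=lambda n: utilization(*groups[receiver][n]))
    group_excess = totals[donor][0] - target * totals[donor][1]
    dst_room = target * groups[receiver][dst][1] - groups[receiver][dst][0]
    amount = min(group_excess, dst_room, groups[donor][src][0])
    groups[donor][src][0] -= amount
    groups[receiver][dst][0] += amount
    return src, dst, amount


# Node tags (load/capacity) as labelled in Fig. 5.
fig5 = {
    1: {11: [30, 60], 21: [60, 80], 31: [20, 80]},
    2: {12: [10, 40], 22: [20, 40], 32: [20, 120], 42: [20, 40], 52: [30, 60]},
    3: {13: [10, 20], 23: [130, 160]},
    4: {24: [20, 40], 44: [20, 40]},
}
transfer = balance_groups_once(fig5)
```

With these tags the sketch selects node 23 in group 3 as the source and node 32 in group 2 as the destination, in line with the caption of Fig. 5. The amount moved (40 units here) follows only from the stated assumptions.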

7.3 NETWORK BALANCING

The hash function h computes for each node an m-bit key that consists
of two parts, a prefix (p bits) and a suffix (s bits). The prefix
determines the level 0 network, and the suffix determines the level 1
network. Hence, the maximum number of groups is $2^s$, where s is the
number of bits used for the suffix, and the maximum number of nodes
which belong to a supernode is $2^p$, where p is the number of bits
used for the prefix.

The number of nodes managed by a supernode needs to

be controlled, since the supernodes are used by the level 0

network to route messages among groups. Thus, it is

necessary to keep the number of nodes neither too large nor

too small. The number of bits, i, used to represent the membership is
the minimal number of bits from the tail end of the suffix that is
needed to include the members. For

example, Fig. 6(a) shows a small network, such that m = 4

(the hash function produces a sequence of four bits), the

prefix p is 2 bits and the suffix s is 2 bits. Even though the

suffix is 2 bits, in this case only one of these bits is used, as

indicated by i = 1 in the middle of the two groups. The first

group holds all the nodes ending with 1, and the second

group holds all the nodes ending with 0.
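Grouping by the last i bits can be sketched in a few lines of Python. The keys are the four-bit labels shown in Fig. 6(a); the function names are illustrative, and keys are handled as bit strings for readability.

```python
from collections import defaultdict


def group_label(key: str, i: int) -> str:
    """Return the last i bits (i >= 1) of a binary key string; they name the group."""
    return key[-i:]


def partition(keys, i):
    """Map each group label to the list of keys that share those last i bits."""
    groups = defaultdict(list)
    for key in keys:
        groups[group_label(key, i)].append(key)
    return dict(groups)


# Four-bit keys as labelled in Fig. 6(a): m = 4, p = 2, s = 2, i = 1.
keys = ["1001", "0010", "0101", "0110", "1010", "1011", "1000"]
groups = partition(keys, i=1)
```

Here `groups["1"]` is the first group and `groups["0"]` is the second group.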

Figure 6. (a) One Bit Used to Determine the Membership;

(b) Two Bits Used to Determine the Membership

However, more bits are considered for nodes as the

network grows. That is, the group size is determined by the

maximum number of bits used, but some groups may use

fewer bits.

Algorithm 3: Supernodes Balancing Algorithm

Sm: current supernode; Sm+1: successor; Sm-1: predecessor;
|Sm|: number of nodes which belong to supernode Sm

When a node joins the network or a node leaves the network:
    if |Sm| > 2n
        if (|Sm+1| < n or |Sm-1| < n)
            some of the nodes can be transferred to its neighbor’s
            supernode based on their last i bits;
        else if (|Sm+1| > n and |Sm-1| > n)
            the group is divided into two groups;
    else if |Sm| < n
        if (|Sm+1| < n or |Sm-1| < n)
            combine the groups into one group, based on their last i bits;
        else if (|Sm+1| > n and |Sm-1| > n)
            some of the nodes can be transferred and thus, the
            borders between the two groups will be shifted;
    update the routing table for Sm and its neighbors
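The branching of Algorithm 3 can be written as a small decision function. This is a sketch only: n is read here as the minimum group size and 2n as the maximum, which is an interpretation, and cases the pseudocode does not cover (for example a neighbor holding exactly n nodes) are assumed to require no change. The function name and the returned strings are illustrative.

```python
def rebalance_action(size, succ_size, pred_size, n):
    """Return the action Algorithm 3 would take for a group of `size` nodes.

    succ_size and pred_size are the sizes of the successor and predecessor
    groups; n is the assumed minimum group size (2n the maximum).
    """
    neighbour_small = succ_size < n or pred_size < n
    neighbours_large = succ_size > n and pred_size > n
    if size > 2 * n:
        if neighbour_small:
            return "transfer nodes to neighbour"
        if neighbours_large:
            return "split group"
    elif size < n:
        if neighbour_small:
            return "merge with neighbour"
        if neighbours_large:
            return "shift border with neighbour"
    # Assumption: anything not covered by the pseudocode leaves the groups as is.
    return "no change"


# Example with n = 128, i.e. a maximum group size of 256.
actions = [
    rebalance_action(300, 100, 150, 128),
    rebalance_action(300, 200, 150, 128),
    rebalance_action(100, 90, 200, 128),
    rebalance_action(100, 200, 200, 128),
    rebalance_action(200, 200, 200, 128),
]
```

After any action other than "no change", the routing tables of the affected supernode and its neighbors would be updated, as in the last line of the algorithm.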

To insert a new node, take the last i bits and find its

supernode as represented by these i bits and check the

number of nodes that belong to the supernode. If there are

fewer than 2i nodes in one group, put the new node in the

group. If there are more than 2i nodes in one group, split the

group into two groups, based on the value of their last (i-1)th

bit. Put nodes whose key has 0 in that bit in one group and

nodes whose key has 1 in that bit in another group. For

example, suppose we insert a new node whose key hashes to

the sequence 1100 into the network in Fig. 6. Since the last

bit is 0, this node belongs to the second group. However, the

group is already full, so it needs to be split. As shown in Fig.
6(b), we first set i = 2 in the second group. The second
group, whose nodes end with 0, needs to be split, so we
partition its nodes into a group of those ending with ‘00’ and a
group of those ending with ‘10’.
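The split in this example can be reproduced with a short Python sketch that regroups the members by one additional trailing bit. The function name is illustrative, and the keys are the Fig. 6 labels of the second group together with the new node 1100.

```python
def split_group(keys, i):
    """Split a group identified by its last i bits into groups using i + 1 bits."""
    halves = {}
    for key in keys:
        halves.setdefault(key[-(i + 1):], []).append(key)
    return halves


# Second group of Fig. 6(a) (keys ending in 0) plus the new node 1100.
second_group = ["0010", "0110", "1010", "1000", "1100"]
new_groups = split_group(second_group, i=1)
```

The keys ending in ‘10’ stay together, while 1000 and the new node 1100 form the group ending in ‘00’.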

To delete a node, check the number of nodes belonging to the
supernode. If there are fewer than $2^{p-1}$ nodes in the group, merge
the group with another group. Combine the groups into one group, based
on the value of their last (i-1)th bit; these are groups whose (i-1)th
bit has the same value. For example, suppose we delete node 1100 from
the network shown in Fig. 6(b). Since there is only one node left in
the third group, it needs to be combined with one of the other groups.
First, check with the other group whose (i-1)th bit has the same value
as the third group. The second group whose nodes end with 0 will
combine with the first group. Next, set i = i - 1 in the new group.
After combining the two groups, we get the network as shown in
Fig. 6(a).

8. SIMULATIONS

To verify the validity of our load balancing algorithm, we built a
simulation framework on which we implemented a HP2PC system and our
load balancing algorithm. We used the load balancing measurements
outlined in section 4.2 in our simulations. Our simulated system has
$10^3$ nodes within a [0, $2^{12}$) identifier space, which form a
two-level hierarchical network. Each node is assigned capacity and
load information. From the load and capacity information we obtain the
utilization of each node, which is the primary metric of our load
balancing algorithm. Table 2 lists the parameters of our simulated
environment and our load balancing algorithm.

Table 2: Simulated Parameters

Parameter                         Value
ID space                          [0, $2^{15}$)
Number of nodes                   $2^{13}$
Number of layers                  2
Max number of nodes in a group    256
Min number of nodes in a group    128
Target utilization                0.5
Offset                            ±0.1

To analyze our algorithm, we applied three different

strategies to this experimental system:

1. a HP2PC system without load balancing;

2. a HP2PC system with load balancing among leaf

nodes;

3. a HP2PC system with load balancing among

supernodes and leaf nodes.

In the simulation, overloaded nodes initiate the transfer of load to
underloaded nodes, since the goal is load balancing. Therefore, even
if a fraction of the nodes are still underloaded after load balancing,
we assume that the load distribution on the system is fair, that is,
load balancing has been achieved. We treat a utilization as balanced
if it lies within a reasonable range, namely the target utilization
value with an offset of ±p.

In Fig. 7 (a-c), the x-axis represents the utilization of nodes and
the y-axis represents the capacity of nodes. Fig. 7(a) represents the
utilization distribution among heterogeneous nodes before load
balancing; the distribution of dots in the figure is random. Fig. 7(b)
represents the utilization distribution among heterogeneous nodes
after load balancing among leaf nodes (as described in section 7.1).
Fig. 7(c) represents the utilization and capacity distribution among
heterogeneous nodes after load balancing among supernodes and leaf
nodes (as described in section 7.2). As can be seen, Fig. 7(c) shows
that the loads on the nodes are very similar. Since the target
utilization is 0.5 and the offset is ±0.1, some nodes are concentrated
around the 40, 50 and 60 percent marks.

Figure 7. Distribution of utilization for every node (a) before load
balancing; (b) after level 1 load balancing; (c) after both level 0
and level 1 load balancing.

Figure 8. Distribution of utilization for every group (a) before load
balancing; (b) after level 1 load balancing; (c) after both level 0
and level 1 load balancing.

In Fig. 8 (a-c), the x-axis represents the group and the y-

axis represents the utilization of the group in each range.

Fig. 8(a) shows the minimum, maximum and average load

utilization of each group without load balancing. Fig. 8(b)

shows the minimum, maximum and average load utilization

of each group after load balancing within groups (as

described in section 7.1). Fig. 8(c) shows the minimum,

maximum and average load utilization of each group after

load balancing among groups (as described in section 7.2).

Figure 9. Numbers of Nodes in each Utilization Range

In Fig. 9, the x-axis represents the utilization and the y-axis
represents the number of nodes in each utilization range. The lines
show the distribution before load balancing, the distribution after
load balancing at the level 1 network, and the distribution after load
balancing at both the level 1 and level 0 networks. These results show
that load balancing improves significantly using our approach.

In our previous study [16] we proposed a load balancing algorithm for
HP2PC. The new approach developed in this paper is based on the
previous approach but also takes latency into consideration. For the
previous approach, we observed that the mean of the load balanced data
was 49.75 and the standard deviation was 8.30. For the new approach,
using the same original data (but without latency), we found that the
mean of the load balanced data was the same and the standard deviation
was 8.19. The standard deviation of the new approach is therefore
smaller than that of the previous one. This shows that considering
latency does improve the utilization of the network as a whole.
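The comparison relies on the mean and standard deviation of node utilization after balancing: an equal mean with a smaller standard deviation indicates a more even distribution. A minimal Python sketch of this summary follows. The sample values are hypothetical and are not data from our simulation, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def balance_summary(utilizations):
    """Return (mean, population standard deviation) of utilizations in percent."""
    return mean(utilizations), pstdev(utilizations)


# Hypothetical utilization samples (percent), not results from the paper.
before = [10, 90, 35, 65, 50]
after = [45, 55, 48, 52, 50]
mean_before, sd_before = balance_summary(before)
mean_after, sd_after = balance_summary(after)
```

In this toy example both samples have the same mean, and the lower standard deviation of `after` reflects the tighter spread around the target.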

The overhead of load balancing was also measured. We ran experiments
with 1000 nodes and calculated the time spent on moving files to
achieve load balancing under the two approaches. The new approach took
760 time units, compared to 851 time units for the previous approach,
an improvement of around 11% in the overall time taken by the load
balancing process. Hence our improved approach achieves better load
balancing with less overhead.
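The stated improvement follows directly from the two measured times, taking the previous approach as the baseline:

```latex
\[
\frac{851 - 760}{851} = \frac{91}{851} \approx 0.107 \approx 11\%
\]
```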

9. CONCLUSIONS

In this paper, we propose an effective secure load balancing algorithm
to enable global load balance for HP2PC systems. Security is achieved
by a simple moving target approach that uses a hash of information to
hide the location of

files, yet without modifying the routing tables. To ensure

load balancing in the moving target scheme, the first step is

to ensure fair load distribution among nodes within the same

supernodes, followed by fair load distribution among

supernodes. We also propose two schemes to balance the

network. Our simulation results show that our algorithm is

effective in achieving load balancing in HP2PC systems.

Compared to previous work, we achieve better load

balancing with less overhead. We focus on a 2-level P2P

cloud network in this paper; however, our approach can be

easily applied to a multi-level P2P cloud network.

A number of potential improvements to our algorithm

deserve further study. First, we use storage as a load factor

in this paper. However, a distributed computing system may

be constrained with other parameters besides storage, such

as CPU and bandwidth. Another problem is how to

determine the optimum level of hierarchy under a given set

of assumptions for the HP2PC network. It would be interesting to
determine the number of hierarchical levels needed to balance the
load. A simulation on a larger, more realistic HP2PC is needed.
Finally, the moving target scheme is primitive as it stands and is
worthy of further analysis and study.

10. REFERENCES

[1] S Rieche, L Petrak and K Wehrle, “A Thermal-Dissipation-based Approach for Balancing Data Load in Distributed Hash Tables”, Proceedings 29th IEEE International Conference on Local Computer Networks, 2004.

[2] J Byers, J Considine and M Mitzenmacher, “Simple Load Balancing for Distributed Hash Tables”, Peer-to-Peer Systems II, Volume 2735, Pages 80-88, 2002.

[3] L. Garcés-Erice, E.W. Biersack, P.A. Felber, K.W. Ross, and G. Urvoy-Keller, “Hierarchical Peer-to-Peer Systems”, Parallel Processing Letters, Volume 13, Issue 4, Pages 643-657, 2003.

[4] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications”, IEEE/ACM Transactions on Networking, Volume 11, Issue 1, Pages 17-32, 2003.

[5] K. Aberer, A. Datta, and M. Hauswirth, “Multifaceted Simultaneous Load Balancing in DHT-Based P2P Systems: A New Game with Old Balls and Bins”, Science, Issue 5005, Pages 373-391, 2005.

[6] Stefan Zoels, Zoran Despotovic, and Wolfgang Kellerer, “Load balancing in a hierarchical DHT-based P2P system”, Proceedings of the 2007 International Conference, 2007.

[7] David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel Lewin, and Rina Panigrahy, “Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web”, Proc. 29th ACM Symposium on Theory of Computing, 1997.

[8] Marc Sanchez Artigas, Pedro Garcia Lopez, Jordi Pujol Ahullo, Antonio Gomez Skameta, “Cyclone: a Novel Design Schema for Hierarchical DHTs”, Fifth IEEE International Conference on Peer-to-Peer Computing, 2005.

[9] C. Wang, Sherman S.-M. Chow, Q. Wang, K. Ren, W. Lou, “Privacy preserving public auditing for secure cloud storage”, Proceedings of the 29th conference on Information communications, March 2010.

[10] M. Dijk and A. Juels, “On the Impossibility of Cryptography Alone for Privacy-Preserving Cloud Computing”, Proceedings of the 5th USENIX conference on Hot topics in security, 2010.

[11] K Zeng, “Publicly verifiable remote data integrity”, Proceedings of the 10th International Conference on Information and Communications Security, 2008.

[12] Cong Wang, Qian Wang, Kui Ren, and Wenjing Lou, “Ensuring Data Storage Security in Cloud Computing”, Proceedings of the 17th International Workshop on Quality of Service, 2009.

[13] L. Lamport, “Password Authentication with Insecure Communication”, Communications of the ACM, Vol 24, No. 11, pp 770-772, November 1981.

[14] T Condie, V Kacholia, S Sankararaman, J M Hellerstein and P Maniatis, “Induced Churn as Shelter from Routing-Table Poisoning”, Proc. 13th Annual Network and Distributed System Security Symposium (NDSS), 2006.

[15] F Dabek, M. F Kaashoek, D Karger, R Morris, and I Stoica, “Wide-area cooperative storage with CFS”, Proc. 18th ACM Symposium on Operating Systems Principles (SOSP), 2001.

[16] H Liu, J Thomas, and P Khethavath, “Load balancing with moving target in P2P Cloud”, IEEE 6th International Conference on Cloud Computing, 2013.

Authors

Hong Liu obtained her B.S in Computer

Science and Technology from the

Northeastern University in China, M.S in

Computer Science from Oklahoma State

University. She is currently a PhD student

in Computer Science at Oklahoma State

University. Her research interests include Cloud Computing,

Big data and Peer-to-Peer networks.

Johnson P Thomas obtained his B.Sc in

Electrical Engineering from the University

of Wales, M.Sc in Electrical Engineering

and Computer Science from the University

of Edinburgh, Scotland and PhD in

Computer Science from the University of

Reading, England. He is currently an Associate Professor of

Computer Science at Oklahoma State University. His

research interests include Cloud Computing, Computer

Security and Sensor Networks. He serves as an Associate

Editor for the Wiley Security and Communications

Networks Journal.

PraveenKumar Khethavath obtained his

B.E. from Chaitanya Bharathi Institute of

Technology, Osmania University in India.

He is currently a PhD student at Oklahoma

State University. His research interests

include Cloud Computing, Security and

privacy in mobile networks and health care, Big data,

wireless sensor networks and Peer-to-peer networks.
