International Journal of Cloud Computing (ISSN 2326-7550) Vol. 2, No. 3, July - September, 2014
MOVING TARGET WITH LOAD BALANCING IN A HIERARCHICAL CLOUD
Hong Liu, Johnson Thomas and Praveen Khethavath
Department of Computer Science
Oklahoma State University Stillwater, USA
Abstract
In this paper we propose a ‘moving target’ security mechanism for a P2P cloud in which files are partitioned and their sensitive sections are moved at different times, without modifying the routing or finger tables, to reduce the risk of the file being compromised. Two drawbacks of this approach are the difficulty of determining the locality of the data and load imbalance. We present a hierarchical P2P cloud system that improves scalability and efficiency, and we propose a 3-step load balancing scheme for this hierarchical P2P cloud system that balances the network globally. Our simulation results show that our algorithm is effective in achieving load balancing in hierarchical peer-to-peer cloud systems.
Keywords: moving target; load balancing; cloud
1. INTRODUCTION
The cloud serves as a large storage repository for user files and data, and one of its biggest problems is the security of these files. If attackers can access data or files, the service provider will lose the trust of its users, and the leakage of sensitive data or files could cause great damage. Attacks can be directed at the routing, searching and storing mechanisms. Techniques such as encryption are typically used to secure the storage [10]. In this paper, we propose a ‘moving target’ approach as a complement to existing approaches. The idea is to move critical files to a different location so that, even if an attacker breaks into the system, the target will already be stored elsewhere. This leaves the attacker no option but to guess and attack at a different location. Moving files incurs overhead, but only files that require high security will be moved. A question not addressed in this work is the timing of the transfer: should files or data be moved at regular intervals, randomly, or only when some suspicious event occurs? This is not covered in this paper and will be the topic of future work.
P2P cloud systems are increasing in popularity, making it possible to harness the computing power and resources of large networked populations of machines in a cost-effective manner. Currently, most P2P cloud systems are flat, with all nodes having the same functionality. These flat P2P cloud systems are limited when it comes to scalability [8], and searching for nodes or files takes time. Moreover, since they lack a centralized administrative entity that controls node resources, ensuring high levels of availability, performance and security becomes difficult. We propose a hierarchical peer-to-peer cloud (HP2PC) network model which is scalable, efficient and secure. The data security achieved by the moving target approach proposed in this paper requires a fair load distribution among all nodes for efficiency and performance reasons: not only are files transferred internally for security reasons, but cloud users will also be adding and deleting files.
Our contribution is two-fold:
A moving target defense approach for storage in a hierarchical P2P cloud. This is achieved without modifying the routing tables.
A load balancing mechanism for the imbalance caused by file transfers and user file updates in a hierarchical P2P cloud.
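Figure 1. Hierarchical P2P Cloud Network (12 nodes with 4-bit IDs, organized as a level 0 network of super-nodes and four level 1 networks of leaf nodes; each node keeps its id, 2-bit prefix and suffix, predecessor, successor, supernode, load, capacity and interval, and each super-node keeps an information table with the id, load and capacity of every node in its group)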
A file that needs to be securely stored is divided into
multiple portions. The goal of the division is to
compartmentalize parts of the file that need to be securely
stored so that the sensitive sections are moved more often.
Parts which contain little or no sensitive data or code can be
left at their original locations with little or no transfers to
other locations. Our proposed approach is divided into the
following steps:
The partitioned files are randomly distributed across the cloud, and a node stores only one part of a file (section 3).
Load balance the hierarchical P2P cloud (section 4).
Move the security-sensitive files at regular intervals, randomly, or only when some suspicious event occurs. The choice of timing is not discussed in this paper.
Our hierarchical model is shown in Fig. 1. Red nodes represent leaf nodes and yellow nodes represent super-nodes. There are four groups in the level 1 network, and the super-nodes are 1, 2, 3, and 4. These four super-nodes constitute the level 0 network.
A review of previous work is presented in the next section. Our proposed approach is outlined in sections 3 and 4. Section 5 describes the routing scheme, which is followed by the moving target defense mechanism in section 6. The load balancing scheme is described in section 7 before the paper concludes.
2. LITERATURE REVIEW
2.1 SECURE FILE STORAGE IN P2P CLOUD SYSTEMS
Much of the previous work on cloud security has focused on cryptographic schemes and data integrity. Many cryptographic schemes have been proposed for hiding data from the storage provider and hence preserving data privacy [9] [10]. Wang et al. [9] presented a scheme in which the user’s identity is also detached from the data and which provides public auditing of data. In [10], Dijk et al. proved that, in cloud computing, cryptographic measures alone are insufficient for guaranteeing data privacy. The problem of ensuring the integrity of data storage in cloud computing is studied in [11] and [12]. In [11], Lamport et al. presented a provable data integrity (PDI) solution to support public data integrity verification. Wang et al. [12] proposed a scheme to prove the integrity of data stored dynamically in cloud systems.
A concern with cloud storage services is that, given a sufficient amount of time, data can be decrypted, meaningful information can be located and retrieved, and user privacy can easily be breached. To address this problem, Condie et al. [14] periodically reset the routing tables by using induced churn, in which different nodes enter and leave the address space. This reduces the chances of hitting a specific target. However, if an attacker is able to access the router, he will notice the change in the routing table and be able to deduce that files have changed locations. In contrast, in our method critical files or data are moved, but the routing or finger table does not change. An attacker who breaks into the network router is thus unable to detect that the target has been moved. In [14], the routing is constrained and an inefficient path may be chosen. In our approach we aim for efficient routing without modifying the routing table. Hence an attacker can attack the routing table but will not be able to detect that the target has moved.
2.2 LOAD BALANCING STRATEGIES
Several load balancing approaches have been proposed for P2P cloud systems. In [1], Rieche et al. presented an algorithm to balance load in a distributed hash table (DHT) based on a thermal dispersion scheme. Each interval in the identifier space is managed by at least f and at most 2f nodes, and each node belonging to an interval stores all documents assigned to that interval. Load balancing is done by splitting, merging and shifting intervals. However, this approach is limited because it requires each file to have copies on all nodes belonging to the interval. Furthermore, in this scheme some nodes still have a load up to twice the optimum. The load of a node is defined as the number of documents it stores, and the focus is on the distribution of documents among the nodes. In contrast, we define the load as the ratio of the current workload to the capability of the node, since nodes in the system do not necessarily have the same capacity. In addition, the framework in this paper is hierarchical. Our results show that, under a given amount of load, the load fluctuations of each node are relatively small in our approach; consequently, load balancing improves significantly.
Stoica et al. [4] proposed the concept of virtual servers to address load balancing, with each node simulating a logarithmic number of virtual servers. An overloaded node transfers some of its virtual servers to an underloaded node to achieve load balancing. The limitation of this approach is that, as more nodes join the system, these virtual servers consume more resources. Aberer et al. [5] balanced the load in a DHT by having each node repeatedly compare its load with the load information of its neighbor nodes. Although this method achieves load balancing when the system is in a steady state, there is no guarantee of load balance when the system is in a dynamic state, because load balancing is done only locally between neighbor nodes. Zoels et al. [6] proposed an algorithm to balance a hierarchical system. First, peers contact a predefined super-peer when they join the system. Second, an algorithm is used to determine a super-node for the new peer. As a result, all super-peers have an equal load. This method is limited because it considers super-nodes only.
A load balancing scheme for a flat decentralized architecture is proposed in [15]. In our work, we use a hierarchical architecture. Compared with a flat network, a hierarchical architecture offers the ability to exploit heterogeneous peers, transparency, faster lookup times and fewer messages in the wide area [3]. Moreover, the work in [15] does not consider defending against malicious participants, whereas we use a moving target security mechanism to reduce this risk.
3. PARTITIONED FILE DISTRIBUTION
In this paper we assume that partitioned files are
randomly distributed across the P2P cloud system. The
file/document is broken into multiple pieces or fragments.
Some sections may be more critical. We are particularly
interested in the critical pieces of code. These are the
sections that will be moved.
If a file has few fragments, its fragments are randomly distributed, one per ring. If there are many fragments, then there will be at least one fragment per node, and hence many fragments per P2P ring. Each file is broken into k parts, where k may differ from file to file. Depending on the number of fragments of a file, the fragments are distributed across the nodes of an individual ring, across a number of rings that form a sub-part of the cloud and are physically located next to each other, or across the whole P2P cloud system.
File fragments are distributed randomly across the cloud. In the moving target approach, the fragments are moved to different storage locations or nodes in the cloud. In this paper, we assume that, because a file is fragmented and its critical parts are moved, an attack is completely successful only if all the critical fragments of the file are accessed. However, breaking into some of the fragments may still provide some information, so our security condition is not strict. Security in our moving target model is therefore measured as follows: the lower the probability of successfully accessing all the critical fragments of a file that have been moved, the more secure the entire file is. The goal of the moving target approach is to ensure that the target will have changed from the attacker’s point of view.
The probability that all the critical fragments can be accessed depends on the number of possible combinations for moving them. We assume that the cloud contains many resources and that there will be only one fragment per node.
Assume there are p nodes across which the fragments are to be distributed. Of these p nodes, n nodes do not hold any fragment of the file and are available to accommodate at most one fragment each. Let k be the total number of fragments of the file, of which r are to be moved. In this paper, a fragment is moved only to a node which holds no other fragment of the file. Given n available nodes and r fragments, the number of possible combinations for storing the r fragments on the n nodes is:
$$\binom{n}{r}$$
The number of possible combinations, given that there are k fragments of which r are to be moved to n available nodes, is:
$$\binom{k}{r}\binom{n}{r} \qquad (1)$$
If an attacker is able to access all k fragments with a probability of 1 before the move, the probability of accessing them after the fragments have been moved is plotted in Fig. 2.
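Equation (1) can be evaluated directly. The sketch below is a minimal illustration: it counts placements with binomial coefficients exactly as in Eq. (1), and the function names are hypothetical. The uniform-guess probability is our own simplifying assumption about the attacker (one guess among all counted placements); it is not claimed to be the formula behind Fig. 2.

```python
from math import comb


def placement_count(k: int, r: int, n: int) -> int:
    """Eq. (1): ways to choose which r of the k fragments move,
    times ways to choose the r of the n free nodes that receive them."""
    if not (0 <= r <= k and r <= n):
        raise ValueError("need 0 <= r <= k and r <= n")
    return comb(k, r) * comb(n, r)


def uniform_guess_probability(k: int, r: int, n: int) -> float:
    """Hypothetical attacker model: a single uniformly random guess
    among all placements counted by Eq. (1)."""
    return 1 / placement_count(k, r, n)


if __name__ == "__main__":
    # A 4-fragment file with 10 free nodes, for r = 0..4 moved fragments.
    for r in range(5):
        print(r, placement_count(4, r, 10), uniform_guess_probability(4, r, 10))
```

For example, moving r = 2 of k = 4 fragments onto n = 10 free nodes gives 6 × 45 = 270 combinations.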
Figure 2 shows that moving all the fragments, or moving very few of them, reduces the effectiveness of the moving target approach. If few fragments are moved, the attacker does not have to modify his strategy much. On the other hand, moving most of the fragments not only introduces a lot of overhead, but also suggests that the fragments can be moved to only a limited number of places. The most secure choice (the lowest probability) is therefore to move an intermediate number of fragments. In this paper, the term files refers to file fragments.
Figure 2. Moving Target Probability (probability versus number of fragments moved, 0 to 10, for files of 4 and 8 fragments)
4. LOAD BALANCING
The moving target model moves fragments to improve security. This makes load balancing critical, not only because users keep changing the load in the cloud, but also because of the moving target model itself. In this section, we propose a new approach to load balancing in HP2PC systems. This approach focuses on a 2-level P2P cloud network; however, it can easily be applied to an n-level P2P cloud network. Our proposed approach to load balancing consists of five steps:
accumulate load information in the whole system;
node classification. According to their utilization,
nodes are classified into overloaded nodes,
underloaded nodes, or neutral nodes;
network balance;
load balancing within the level 1 network;
load balancing within the level 0 network.
Our main contribution is a novel load balancing strategy
for HP2PC networks, which can effectively control the
amount of load imbalance across the network to globally
balance the load. First, we define the load for a node, a
supernode, and a group. Next, three strategies are presented
to balance the load among nodes, supernodes and groups
along with the algorithms for each strategy. Simulation
results show that our algorithm is effective in achieving load
balancing in HP2PC systems.
4.1 HASHING SCHEME
We use hashing to locate data or files, as in traditional P2P systems. A hash function takes a search key as an argument and computes from it an integer in the range 0 to B - 1, where B is the size of the identifier space. If a node has search key K, we place the node in the identifier space at position h(K), where h is the hash function. A common choice of hash function when keys are integers is the remainder of K/B, where K is the key value. For character-string search keys, we treat each character as an integer, sum these integers, and take the remainder when the sum is divided by B. For example, if the key is an n-byte character string, key = ‘x1x2x3…xn’, we sum the characters as integers, sum = x1 + x2 + x3 + … + xn, and compute sum modulo B. Other common hash functions are described in [7].
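The two hash functions described above are small enough to state directly. This is a minimal sketch; the function names are hypothetical, and summing the UTF-8 byte values is our assumption for how characters are "treated as integers".

```python
def hash_int_key(key: int, B: int) -> int:
    """Integer keys: the remainder of K / B."""
    return key % B


def hash_string_key(key: str, B: int) -> int:
    """String keys: sum the byte values x1 + ... + xn, then reduce modulo B."""
    return sum(key.encode("utf-8")) % B


if __name__ == "__main__":
    B = 16  # identifier space [0, 16), matching the 4-bit IDs of Fig. 1
    print(hash_int_key(23, B))        # 7
    print(hash_string_key("ab", B))   # 3, since (97 + 98) mod 16 = 3
```

Because addition is commutative, this string hash maps anagrams such as "ab" and "ba" to the same position, which is one reason stronger hash functions are usually preferred.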
4.2 MEASUREMENT OF LOAD BALANCING
In hierarchical P2P cloud networks (see Fig. 1):
nodes are organized in groups; each group has a supernode and consists of leaf nodes belonging to the same supernode;
all requests are first sent to a supernode, and then the supernode assigns a destination supernode or a leaf node to respond to the request;
each supernode maintains an information table, and each entry keeps information about the nodes in the group;
in addition to its own information, each supernode has statistical information of the group;
every node knows its supernode, which is a node in both levels of the network; and
each node has the information of its successor node and predecessor node.
The utilization Ri of a node i is calculated as the ratio of its current workload Li to its capability Ci. Load balancing strives to minimize the load imbalance; ideally, every node has the same utilization. A node or a supernode is overloaded if its utilization is greater than the target utilization, whereas it is underloaded if its utilization is smaller than the target utilization. A group is overloaded if it maintains more than 2n nodes, whereas it is underloaded if it maintains fewer than n nodes. The system is load balanced if none of the nodes, supernodes and groups at any level is overloaded.
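The classification rules above can be sketched as two small functions. This is an illustration under stated assumptions: the names are hypothetical, utilization exactly equal to the target is reported as "neutral" (the third class in the step list of Section 4), and the group-size parameter n is left as an input because its value is not given here.

```python
def classify_node(load: float, capacity: float, target: float) -> str:
    """Compare the utilization R = L / C of a node (or supernode) with T."""
    utilization = load / capacity
    if utilization > target:
        return "overloaded"
    if utilization < target:
        return "underloaded"
    return "neutral"


def classify_group(num_nodes: int, n: int) -> str:
    """Group-size rule: more than 2n nodes is overloaded, fewer than n is underloaded."""
    if num_nodes > 2 * n:
        return "overloaded"
    if num_nodes < n:
        return "underloaded"
    return "neutral"


if __name__ == "__main__":
    # Group 1 tags read from the figure data (load/capacity), with T = 390/780 = 0.5.
    for name, (load, cap) in {"11": (30, 60), "21": (60, 80), "31": (20, 80)}.items():
        print(name, classify_node(load, cap, 0.5))
```

With these values node 21 is overloaded, node 31 is underloaded and node 11 sits exactly at the target, which matches the imbalanced case discussed in Section 7.1.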
In our proposed load balancing strategy, we use the
following approach to balance the whole network. The first
step is to balance the load among nodes with the same
supernode. The second step is to balance the load among
supernodes. Finally, the group size is balanced.
Table 1. Notations used in this paper
Notation   Definition
M          number of groups
N          number of nodes in group M
L_Mn       load of the nth node in group M
C_Mn       capacity of the nth node in group M
R_Mn       utilization of the nth node in group M
S_M        sum of load in group M
C_M        sum of capacity in group M
R_M        utilization of group M
S          load of the whole system
C          capacity of the whole system
T          target utilization
LBF_Mn     load imbalance factor
p-bits     length of the prefix
s-bits     length of the suffix

Utilization R_Mn of the nth node in the Mth group:
$$R_{Mn} = \frac{L_{Mn}}{C_{Mn}}$$
Utilization R_M of the Mth group:
$$R_M = \frac{S_M}{C_M} = \frac{\sum_{n} L_{Mn}}{\sum_{n} C_{Mn}}$$
Target utilization (utilization of the whole system):
$$T = \frac{S}{C} = \frac{\sum_{M}\left(\sum_{n} L_{Mn}\right)}{\sum_{M}\left(\sum_{n} C_{Mn}\right)}$$
Load imbalance factor:
$$LBF_{Mn} = \left| L_{Mn} - T \cdot C_{Mn} \right|$$
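The quantities in Table 1 can be computed as in the sketch below. The node tags (load, capacity) are the ones read from the figure data for Imbalanced Case 2, with each node assigned to the group named by its second digit; the dictionary layout and function names are our own, hypothetical choices.

```python
# (load, capacity) per node, grouped by group number, as read from the figure data.
FIG5 = {
    1: {"11": (30, 60), "21": (60, 80), "31": (20, 80)},
    2: {"12": (10, 40), "22": (20, 40), "32": (20, 120), "42": (20, 40), "52": (30, 60)},
    3: {"13": (10, 20), "23": (130, 160)},
    4: {"24": (20, 40), "44": (20, 40)},
}


def node_utilization(load, capacity):
    return load / capacity                            # R_Mn = L_Mn / C_Mn


def group_utilization(group):
    s_m = sum(load for load, _ in group.values())     # S_M
    c_m = sum(cap for _, cap in group.values())       # C_M
    return s_m / c_m                                  # R_M = S_M / C_M


def target_utilization(system):
    s = sum(load for g in system.values() for load, _ in g.values())  # S
    c = sum(cap for g in system.values() for _, cap in g.values())    # C
    return s / c                                                      # T = S / C


def imbalance_factor(load, capacity, target):
    return abs(load - target * capacity)              # LBF_Mn = |L_Mn - T * C_Mn|


if __name__ == "__main__":
    T = target_utilization(FIG5)
    for m, group in FIG5.items():
        print(f"group {m}: R_M = {group_utilization(group):.3f}")
    print("T =", T, "LBF of node 23 =", imbalance_factor(130, 160, T))
```

Here T = 390/780 = 0.5, group 3 (R_M ≈ 0.78) is above the target and group 2 (R_M ≈ 0.33) is below it.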
5. ROUTING SCHEME
5.1 HASHING SCHEME FOR MULTI-LEVEL NETWORK
We outline the hashing scheme used in our proposed hierarchical network. We use a hash function h to assign each node an m-bit key in binary form. The m-bit key consists of two parts, a prefix and a suffix. The suffix determines the level 1 network, and the prefix determines the level 0 network. For example, in a P2P cloud network of up to 16 nodes, 4 binary bits are sufficient to address all the nodes. There are 12 nodes in the [0, 2^4) identifier space in Fig. 1. The first two bits are the prefix and the last two bits are the suffix, and the 2-bit suffix determines the level 1 network. Hence, there can be a maximum of 2^2 nodes in the level 0 network and 2^2 nodes in each level 1 network.
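The prefix/suffix split can be illustrated with the 12 node IDs read from Fig. 1. This sketch assumes fixed 2-bit prefixes and suffixes, as in the example above; `split_id` and `group_by_suffix` are hypothetical helper names.

```python
FIG1_NODES = ["1110", "1001", "0010", "1111", "0000", "0101",
              "0110", "1010", "0111", "1011", "1000", "1101"]

P_BITS, S_BITS = 2, 2


def split_id(node_id: str) -> tuple:
    """Split an m-bit ID (m = P_BITS + S_BITS) into (prefix, suffix)."""
    if len(node_id) != P_BITS + S_BITS:
        raise ValueError("unexpected ID length")
    return node_id[:P_BITS], node_id[P_BITS:]


def group_by_suffix(nodes):
    """The suffix selects the level 1 network a node belongs to."""
    groups = {}
    for node in nodes:
        _, suffix = split_id(node)
        groups.setdefault(suffix, []).append(node)
    return {s: sorted(members) for s, members in sorted(groups.items())}


if __name__ == "__main__":
    print("level 1 networks:", 2 ** S_BITS, "max nodes per network:", 2 ** P_BITS)
    for suffix, members in group_by_suffix(FIG1_NODES).items():
        print(suffix, members)
```

Grouping by suffix yields four level 1 networks (00, 01, 10, 11) holding 2, 3, 4 and 3 of the 12 nodes.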
Finger tables at level 0. These point to other supernodes.
Finger   node 0 (0000)   node 9 (1001)   node 2 (0010)   node 15 (1111)
1        1001            0010            1111            0000
2        0010            1111            0000            1001
4        0000            1001            0010            1111

Finger tables at level 00. Each finger entry points to the first node at least 2^i positions away, for i = 0, 1, 2. These point to nodes at the same level.
Finger   node 0 (0000)   node 8 (1000)
1        1000            0000
2        1000            0000
4        0000            1000

Similarly, there are finger tables at levels 01, 10 and 11.
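The finger tables above can be regenerated with a Chord-style rule. This is a sketch under an assumption the text does not state explicitly: supernodes sit on a 4-position level 0 ring ordered by suffix, group members sit on a 4-position level 1 ring ordered by prefix, and finger d points to the first occupied position at or after (position + d) mod 4. The names `successor` and `finger_table` are hypothetical.

```python
RING = 4  # positions 0..3 on a 2-bit ring


def successor(members: dict, target: int) -> str:
    """First occupied ring position at or after target, wrapping around."""
    for step in range(RING):
        pos = (target + step) % RING
        if pos in members:
            return members[pos]
    raise ValueError("empty ring")


def finger_table(members: dict, pos: int, distances=(1, 2, 4)):
    """Chord-style fingers: the successor of pos + d for each distance d."""
    return [(d, successor(members, (pos + d) % RING)) for d in distances]


# Level 0 ring: supernodes placed by suffix. Level 00 ring: nodes placed by prefix.
LEVEL0 = {0: "0000", 1: "1001", 2: "0010", 3: "1111"}
LEVEL00 = {0: "0000", 2: "1000"}

if __name__ == "__main__":
    for pos, node in LEVEL0.items():
        print("level 0 ", node, finger_table(LEVEL0, pos))
    for pos, node in LEVEL00.items():
        print("level 00", node, finger_table(LEVEL00, pos))
```

Under this rule every entry of both tables is reproduced, including the self-references at distance 4.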
5.2 ROUTING SCHEME
Check the suffix bits (the last two bits). If the suffixes of the source and destination match, the routing is within the same level 1 network. If they do not match, the routing is to another network in the P2P cloud system. Every node knows its supernode, a node that is in both levels of the network. At level 00 the supernode is 0000, at level 01 the supernode is 1001, and so on.
A. Routing within a level 1 network
If node 1010 wants to send a message to node 0110 in Fig. 1:
1. Check whether the suffixes match; they do (both are 10);
2. The finger table at node 1010 shows that the next node is node 0010;
3. The table at node 0010 shows that the next node is the destination node 0110.
B. Routing within Level 0 Network or between a level 0 and 1 network
If node 1000 wants to send a message to node 0111 in Fig. 1:
1. Check whether the suffixes match;
2. They do not match, so the message is sent to supernode 0000;
3. Supernode 0000 sends the message to node 0010 in the level 0 network. The suffixes still do not match; therefore, node 0010 sends the message to supernode 1111;
4. The suffixes now match, and the message is sent to 0111.
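Both routing examples can be traced with a short simulation. This is a sketch under assumptions the paper does not spell out: rings are ordered as in the finger tables (supernodes by suffix at level 0, group members by prefix at level 1), each hop greedily takes the farthest finger (distance 1, 2 or 4) that does not pass the destination, as in Chord, and the destination node exists. All names are hypothetical.

```python
RING = 4  # each 2-bit prefix or suffix space has 4 positions

# Node IDs and supernodes as read from Fig. 1 (suffix -> supernode).
NODES = ["1110", "1001", "0010", "1111", "0000", "0101",
         "0110", "1010", "0111", "1011", "1000", "1101"]
SUPERNODES = {"00": "0000", "01": "1001", "10": "0010", "11": "1111"}


def prefix(node):
    return node[:2]


def suffix(node):
    return node[2:]


def successor_pos(members, target):
    """First occupied ring position at or after target, wrapping around."""
    for step in range(RING):
        pos = (target + step) % RING
        if pos in members:
            return pos
    raise ValueError("empty ring")


def ring_route(members, src, dst):
    """Greedy Chord-style routing on one ring: from position src, jump to the
    farthest finger (distance 1, 2 or 4) that does not pass position dst."""
    path, cur = [], src
    while cur != dst:
        remaining = (dst - cur) % RING
        best, best_hop = None, 0
        for d in (1, 2, 4):
            nxt = successor_pos(members, (cur + d) % RING)
            hop = (nxt - cur) % RING
            if best_hop < hop <= remaining:
                best, best_hop = nxt, hop
        cur = best
        path.append(members[cur])
    return path


def group_ring(sfx):
    """Level 1 ring of one group: its nodes placed by prefix."""
    return {int(prefix(n), 2): n for n in NODES if suffix(n) == sfx}


def route(src, dst):
    path = [src]
    if suffix(src) != suffix(dst):
        home = SUPERNODES[suffix(src)]
        if home != src:
            path.append(home)  # suffixes differ: go to own supernode first
        level0 = {int(s, 2): n for s, n in SUPERNODES.items()}
        path += ring_route(level0, int(suffix(home), 2), int(suffix(dst), 2))
    # Now inside the destination's level 1 network.
    path += ring_route(group_ring(suffix(dst)), int(prefix(path[-1]), 2), int(prefix(dst), 2))
    return path


if __name__ == "__main__":
    print(route("1010", "0110"))
    print(route("1000", "0111"))
```

With these assumptions the simulated paths are 1010 → 0010 → 0110 and 1000 → 0000 → 0010 → 1111 → 0111, the same hops as in examples A and B.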
6. ROUTING ATTACKS IN P2P NETWORKS
Different types of attacks are possible in P2P systems. The main focus of our security model is to make sure that a file is available only to the legitimate user. An attacker can obtain information about where files are stored in different ways. The attacker can use sniffing techniques to learn a file’s location by inserting himself between the user and a legitimate node. This kind of man-in-the-middle attack is a form of active eavesdropping: the attacker sniffs network traffic, gains information about a critical file such as its location, and then accesses it. Another way to obtain location information is for the attacker to join the P2P network and become a node, that is, an insider attack. As a node, it receives routing or location information about files from other nodes and passes such information on to them. This kind of attacker, who is part of the P2P network, is very difficult to detect.
Our goal is therefore to hide location information about
files from routing tables. Only the node requesting the file
and the owner of the file will be aware of the location of the
files. Using our moving target defense mechanism we can
mitigate both attacks.
6.1 SECURE FINGER OR ROUTING TABLES
It is important that an attacker is not able to read the finger tables and thereby locate files or fragments of them. If an attacker is able to locate files through the finger tables or by being a man-in-the-middle, the moving target security scheme is rendered ineffective. This is particularly important for sensitive files. To achieve this, we use one-way hash chains [13]. Every node, when it enters the system, is given its id or address, and the hashed values generate the prefix p and the suffix s, where h is the hash function. The user or owner of a file to be stored in the cloud, who is assumed to be trustworthy, is also given the same information.
A. Hashing Scheme
The finger table will contain the hashed values for
routing. The hashed value will contain the suffix and prefix
as described above. This points to the node that is the owner
of the file. Although the finger table remains constant, that
is, it always points to the same location for a file, in reality
the file is moved around in the moving target scheme, that
is, the location or address keeps changing.
Hash function h generates a one-way hash chain with security parameter k such that $h: \{0,1\}^* \rightarrow \{0,1\}^k$; that is, a string of 1s and 0s is hashed to a string of length k. Let c be a randomly picked seed. Applying h recursively N times to the seed c generates a hash chain of length N, whose last value is $h^N(c)$. Let the N-length hash chain be represented by $\omega$:
$$\omega = h^N(c) = h\left(h^{N-1}(c)\right) = h(h(h(\dots h(c))))$$
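A one-way hash chain is straightforward to build. In this sketch SHA-256 stands in for h, so the security parameter is k = 256; that choice, the seed value and the function names are assumptions for illustration only.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Stand-in for h: {0,1}* -> {0,1}^k, here SHA-256 so k = 256 (an assumption)."""
    return hashlib.sha256(data).digest()


def hash_chain(seed: bytes, N: int) -> list:
    """Return [h^1(c), h^2(c), ..., h^N(c)] by applying h repeatedly to the seed c."""
    chain, value = [], seed
    for _ in range(N):
        value = h(value)
        chain.append(value)
    return chain


if __name__ == "__main__":
    c = b"randomly picked seed"   # in practice c should come from a secure random source
    omega = hash_chain(c, 5)[-1]  # omega = h^5(c)
    print(omega.hex())
```

Each element is the hash of the previous one, so anyone holding h^{i-1}(c) can compute h^i(c), but not the reverse.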
Consider a Boolean predicate $B: \{0,1\}^* \rightarrow \{0,1\}$. B takes an n-bit binary number as input and generates a single random-looking bit, 0 or 1, as its result.
A private key SK_{i,j} and a public key PK_{i,j} are generated, where i,j denotes the ith key in the jth round. These keys are used to verify whether a message comes from a valid node or from an attacker.
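The key exchange used in the steps that follow can be sketched as a commit-then-disclose scheme: the file owner first sends PK = h(SK) and later discloses SK, which the user checks against the previously received public key before deriving one seed bit with B. This is a minimal sketch; SHA-256 as h, the lowest bit of h(SK) as the predicate B, the class names and the 32-byte key size are all hypothetical choices.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # assumed choice of h


def predicate(sk: bytes) -> int:
    """Hypothetical Boolean predicate B: the lowest bit of h(sk)."""
    return h(sk)[-1] & 1


class FileOwner:
    def __init__(self):
        self._next_sk = secrets.token_bytes(32)   # SK_{1,j}
        self.seed_bits = []

    def first_public_key(self) -> bytes:
        return h(self._next_sk)                   # PK_{1,j} = h(SK_{1,j})

    def disclose(self):
        """Reveal SK_{i,j} together with the next public key PK_{i+1,j}."""
        sk = self._next_sk
        self._next_sk = secrets.token_bytes(32)   # SK_{i+1,j}
        self.seed_bits.append(predicate(sk))
        return sk, h(self._next_sk)


class User:
    def __init__(self, first_pk: bytes):
        self._expected_pk = first_pk
        self.seed_bits = []

    def receive(self, sk: bytes, next_pk: bytes):
        if h(sk) != self._expected_pk:            # check h(SK_{i,j}) = PK_{i,j}
            raise ValueError("disclosed key does not match the committed public key")
        self._expected_pk = next_pk
        self.seed_bits.append(predicate(sk))


if __name__ == "__main__":
    owner = FileOwner()
    user = User(owner.first_public_key())
    for _ in range(8):                            # 8 disclosures -> an 8-bit seed
        user.receive(*owner.disclose())
    print(owner.seed_bits == user.seed_bits, user.seed_bits)
```

Both sides end up with the same seed bits although the seed itself is never sent, and a forged key that does not hash to the committed public key is rejected.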
Every node in the P2P architecture has its own ID. The hashed values of the IDs are divided into two parts, a suffix and a prefix. The suffix represents the level 1 network and the prefix represents the level 0 network. Since we are implementing a 2-level scheme, let l1 and l2 be the lengths of the ids or addresses for the prefix and the suffix, respectively; the approach can be applied to an n-level P2P network. The entry in the finger table for a file f will be ps, where p is the prefix and s is the suffix. For the suffix, let the generated hash chain be of length n, where l2 = n; similarly, for the prefix, let the generated hash chain be of length m, where l1 = m. We assume that a user has no encryption keys, whereas a file owner has public and private keys. The steps are outlined below:
B. Initial Seed Generation
Notation used:
A: op – operation at entity A
A → B: data – A sends data to B
K_{i,j} – ith key K in the jth round
FL – file has changed location
FO – file owner
USER – user requesting the file
The user requests the file by sending his credentials encrypted with the public key of the file owner. If the credentials are accepted, the initial seed generation algorithm is executed.
Hence, after the first round, an n-bit binary seed for the hash chain has been generated by the user. Although the file owner has not transmitted the seed to the user, the user generates the same seed as the file owner. This is the new seed used to generate a new hash chain of length n for the suffix address. A seed for the prefix address is generated in the same manner and is used to generate a new hash chain of length n for the prefix address.
C. File Access
The file is first accessed by concatenating the hashes of the seeds; that is, the location of the file is h(prefix seed) concatenated with h(suffix seed), matching the finger table entry ps.
D. Change in location of file
Each time a file is moved, the user is informed that the file has moved, and a bit for the next seed is generated as before.
repeat {
    i = 1 (rounds j > 1)
    FO: randomly generate a secret key SK_{1,j}
    FO: generate public key PK_{1,j}, where PK_{1,j} = h(SK_{1,j})
    FO → USER: PK_{1,j}
    for each transfer of the file to a different location {
        FO: randomly generate a secret key SK_{i+1,j}
        FO: generate public key PK_{i+1,j} = h(SK_{i+1,j})
        FO → USER: FL, (SK_{i,j}, PK_{i+1,j})
        USER: verify whether the message is genuine or a malicious attack by checking h(SK_{i,j}) = PK_{i,j}
        FO and USER: compute the Boolean predicate B(SK_{i,j}) ∈ {0,1}, a single bit that becomes the next bit of the seeds for the next suffix and prefix hash chains
        for each file access request by the user {
            location of file = h^i(prefix seed) h^i(suffix seed)
        }
        i = i + 1
    }
    j = j + 1
} until the user finishes accessing the file
When a file is moved for the ith time in the jth round, the file owner informs the user or requestor of the file that the file has been moved and also sends (SK_{i,j}, PK_{i+1,j}). The user is therefore able to generate the next bit of the seed. Each time the file is moved, the new address of the file is h^i(prefix seed) h^i(suffix seed), and at the end of the chain the address of the file will be h^n(prefix seed) h^n(suffix seed). After n moves of the file, the current prefix and suffix seeds are exhausted, but new seeds have already been generated by the user. The process of seed generation is repeated for each round.
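The address computation after the ith move can be sketched as follows. The assumptions are ours and hypothetical: SHA-256 stands in for h, the seed bits are encoded one bit per byte, each hash value is reduced to the prefix or suffix width by taking it modulo 2^bits, and the prefix part is written first to match the finger table entry ps.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # assumed choice of h


def h_iter(seed: bytes, i: int) -> bytes:
    """h^i(seed): apply h i times (i = 0 returns the seed itself)."""
    value = seed
    for _ in range(i):
        value = h(value)
    return value


def seed_from_bits(bits) -> bytes:
    """Hypothetical encoding of the predicate bits as a byte string."""
    return bytes(bits)


def file_address(prefix_seed: bytes, suffix_seed: bytes, i: int,
                 p_bits: int = 2, s_bits: int = 2) -> str:
    """Address after the ith move: h^i(prefix seed) followed by h^i(suffix seed),
    each reduced to the prefix or suffix width (an assumed truncation)."""
    p = int.from_bytes(h_iter(prefix_seed, i), "big") % (1 << p_bits)
    s = int.from_bytes(h_iter(suffix_seed, i), "big") % (1 << s_bits)
    return f"{p:0{p_bits}b}{s:0{s_bits}b}"


if __name__ == "__main__":
    prefix_seed = seed_from_bits([1, 0, 1, 1])
    suffix_seed = seed_from_bits([0, 1, 1, 0])
    n = 4
    print([file_address(prefix_seed, suffix_seed, i) for i in range(1, n + 1)])
```

The owner and the user run the same deterministic computation, so both obtain the same sequence of addresses without exchanging them. With IDs as short as 4 bits, consecutive addresses may repeat; wider IDs would make that less likely.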
The user himself generates the new address of the transferred file without the file owner sending him the address (or hashed values) of the new location to which the file has been moved. The seeds for hashing are also not transmitted by the file owner to the user. This makes a man-in-the-middle attack very difficult: the attacker has to intercept and read each and every message transmitted between the file owner and the user, as well as know the hash function and the Boolean predicate needed to generate the hash chain. The proposed approach also protects against insider attacks, as the values in the hash chain (which are the addresses of the file) are known only to the file owner and the user and are never transmitted over the network. The routing tables do not change, so an insider is not aware that the file has been transferred to another location. The proposed approach could be made more secure by using different communication paths and encrypted communications; this analysis is left for future work.
Initial seed generation algorithm (section 6.1 B, first round):
i = j = 1
FO: randomly generate a secret key SK_{1,1}
FO: generate public key PK_{1,1}, where PK_{1,1} = h(SK_{1,1})
FO → USER: PK_{1,1}
repeat {
    FO: randomly generate a secret key SK_{i+1,1}
    FO: generate public key PK_{i+1,1} = h(SK_{i+1,1})
    FO → USER: (SK_{i,1}, PK_{i+1,1})
    USER: verify whether the message is genuine or a malicious attack by checking h(SK_{i,1}) = PK_{i,1}
    FO and USER: compute the Boolean predicate B(SK_{i,1}) ∈ {0,1}, a single bit that becomes the next bit of the new seed for the suffix hash chain (the prefix seed is generated in the same manner)
    i = i + 1
} until i = n

7. LOAD BALANCING SCHEME
The moving target model moves fragments to improve security. This, together with users adding and deleting files, can cause load imbalance. In this section, we propose a new approach to balance the whole system.
The hierarchical P2P cloud network is represented as a bipartite graph for a 2-level network. Fig. 3 shows the bipartite graph for the hierarchical P2P cloud network in Fig. 1. Each node at level 0 is a supernode and each node at level 1 is a leaf node. A solid blue arc connects each level 0 supernode to the leaf nodes in its group, and the red dotted lines represent the finger table connections of the nodes. Each supernode has an information table in which each entry keeps load information about a node in the group. Therefore, each supernode obtains the load utilization of its group:
$$R_M = \frac{\sum_{n} L_{Mn}}{\sum_{n} C_{Mn}}$$
Figure 3. Bipartite Graph for P2P Cloud Network of Fig. 1
The measurement of load balancing is described in section 4.2. To balance the whole system, we also consider latency, since latency is an important factor in system speed; the term latency refers to the time delay experienced by a system. Considering the latency between any two nodes in the HP2PC, or between the user node and another node that stores a file, we balance the level 1 network first, since moving files between nodes under the same supernode takes less time than moving them to other rings with other supernodes. Moreover, balancing the level 1 network first keeps files closer to the user, thereby decreasing the latency for retrieving them. Only non-critical files are moved by our load balancing approach; critical files are moved only by the hash and routing scheme outlined in section 6.1. In our proposed load balancing strategy, we use the following approach to balance the whole network. The first step is to balance the load among nodes with the same supernode. The second step is to balance the load among supernodes. Finally, the group size is balanced. This is in contrast to our previous approach [16], in which load balancing was performed differently. The new approach presented in this paper yields better results, as shown in section 8.
7.1 LOAD BALANCING - LEVEL 1 NETWORK
When a node in the bottom level network becomes overloaded, some of its load is transferred to other nodes under the same supernode to balance the load locally. For example (see Fig. 4), if node 21 is overloaded, some load is transferred from node 21 to node 31, which has a lower utilization.
Figure 4. P2P Cloud Network - Imbalanced Case 1.
Each node is tagged x/y, where x is its load and y is its capacity. In group 1, node 21 is overloaded and node 31 is underloaded, so load is transferred from node 21 to node 31. Thus, all nodes in group 1 are balanced.
Algorithm 1: Local Balancing Algorithm
Ln: load of node n; Cn: capacity of node n
Rn: utilization of node n; T: target utilization
Sort all nodes into a list L in decreasing order based on utilization Rn;
Calculate T;
For each group {
    Partition the nodes of the group in L into two sub-lists: L1 (overloaded list) and L2 (underloaded list), where ∀li ∈ L1, ri ≥ T and ∀li ∈ L2, ri < T;
    For each node in L1 {
        Transfer some load to the nodes which belong to L2;
        Set a redirection pointer for the transferred data;
        Delete the current node from L1 and update L2;
    }
}
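Algorithm 1 leaves "some load" unspecified. The sketch below assumes one possible rule: each transfer moves the smaller of the source's excess over the target (its LBF) and the destination's remaining room below the target. Nodes exactly at T have no excess, so treating them as neutral (strict ">") does not change the result. The function name and data layout are hypothetical; the returned transfer list is the information a node would need to set its redirection pointers.

```python
def balance_group(nodes: dict, target: float) -> list:
    """Local (level 1) balancing within one group.

    nodes maps a node label to [load, capacity]; loads are updated in place.
    Returns the transfers as (source, destination, amount) tuples.
    """
    def util(name):
        load, cap = nodes[name]
        return load / cap

    overloaded = sorted((n for n in nodes if util(n) > target), key=util, reverse=True)
    underloaded = sorted((n for n in nodes if util(n) < target), key=util)
    transfers = []
    for src in overloaded:
        for dst in underloaded:
            excess = nodes[src][0] - target * nodes[src][1]  # LBF of the source
            if excess <= 0:
                break
            room = target * nodes[dst][1] - nodes[dst][0]    # LBF of the destination
            if room <= 0:
                continue
            amount = min(excess, room)
            nodes[src][0] -= amount
            nodes[dst][0] += amount
            transfers.append((src, dst, amount))
    return transfers


if __name__ == "__main__":
    group1 = {"11": [30, 60], "21": [60, 80], "31": [20, 80]}  # tags from the figure
    print(balance_group(group1, target=0.5), group1)
```

With the group 1 tags and T = 0.5, node 21 sends 20 units to node 31 and all three nodes end at utilization 0.5, as in the Fig. 4 example.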
7.2 LOAD BALANCING - LEVEL 0 NETWORK
When a group becomes overloaded, it checks with its related supernodes, including its predecessor, its successor and the nodes related through the finger table. Some load is then transferred to the group with the lowest utilization. For example (Fig. 5), if load balancing cannot be achieved within group 3, supernode 1 (related through the finger table), supernode 2 (predecessor) and supernode 4 (successor) are queried in parallel for their load information. Since group 2 is underloaded, some load is transferred from group 3 to the node in group 2 with the lowest utilization.
Redirection is used to locate data that has been moved. For example, suppose the hash of data item di, h(di), gives node 23. If di has been moved from node 23 to node 13, node 23 keeps a pointer for di to node 13.
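The redirection mechanism can be sketched as a per-key pointer table on each node. This is an illustration under stated assumptions: StorageNode, move_item and lookup are hypothetical names, hashing is not modeled (the home node h(di) is passed in directly), and max_hops is an arbitrary guard against pointer cycles.

```python
class StorageNode:
    """Hypothetical node store holding data items and redirection pointers."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.items = {}      # key -> data stored on this node
        self.redirects = {}  # key -> id of the node the data was moved to


def move_item(nodes, key, src_id, dst_id):
    """Move one data item and leave a redirection pointer at the source."""
    src, dst = nodes[src_id], nodes[dst_id]
    dst.items[key] = src.items.pop(key)
    dst.redirects.pop(key, None)  # the destination now holds the data itself
    src.redirects[key] = dst_id


def lookup(nodes, key, home_id, max_hops=8):
    """Start at the node given by h(key) and follow pointers to the data."""
    node_id = home_id
    for _ in range(max_hops):
        node = nodes[node_id]
        if key in node.items:
            return node_id, node.items[key]
        if key not in node.redirects:
            raise KeyError(key)
        node_id = node.redirects[key]
    raise RuntimeError("redirection chain longer than max_hops")


# h(d_i) gives node 23, but d_i has been moved from node 23 to node 13.
nodes = {23: StorageNode(23), 13: StorageNode(13)}
nodes[23].items["d_i"] = b"payload"
move_item(nodes, "d_i", 23, 13)
print(lookup(nodes, "d_i", home_id=23))   # (13, b'payload')
```

Because each move leaves a pointer behind, repeated moves form a chain starting at the hashed location, so lookups still work without touching the routing or finger tables.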
[Figure: the two-level P2P cloud network of Figs. 4 and 5; level 0 connects supernodes 1 to 4, each heading a level 1 group. Node (load/capacity): group 1: 11 (30/60), 21 (60/80), 31 (20/80), total 110/220; group 2: 12 (10/40), 22 (20/40), 32 (20/120), 42 (20/40), 52 (30/60), total 100/300; group 3: 13 (10/20), 23 (130/160), total 140/180; group 4: 24 (20/40), 44 (20/40), total 40/80; whole network 390/780.]
Figure 5. P2P Cloud Network - Imbalanced Case 2.
Group 3 is overloaded and group 2 is underloaded. Node 23 (the highest utilization in group 3) transfers some load to node 32 (the lowest utilization in group 2). Thus, both group 2 and group 3 are balanced.
Algorithm 2: Group Balancing Algorithm
Sm: sum of load in group m; Cm: sum of capacity in group m;
Rm = Sm / Cm: utilization of group m; T: target utilization
Sort all groups gm into a list L in decreasing order of utilization Rm;
Calculate T;
Partition L into two sub-lists, L1 (overloaded list) and L2 (underloaded list):
∀gi ∈ L1, Ri ≥ T; ∀gj ∈ L2, Rj < T;
For each gm in L1 {
    Get the load information of the supernodes related to the supernode of gm
    (its predecessor, successor and finger-table entries);
    Transfer some load to the node nk with the lowest utilization in a group gk ∈ L2;
    Set a redirection pointer for the transferred data;
    Update the finger table;
    Delete gm from L1 and update L2;
}
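Algorithm 2 can be sketched in the same spirit. The sketch assumes that group k is headed by supernode k, that the related groups (predecessor, successor and finger-table entries) are passed in as a plain mapping, and that an overloaded group sends exactly enough load to reach T, from its most-utilized node to the least-utilized node of the least-utilized related group. Redirection pointers and finger-table updates are omitted, and Group and balance_groups are hypothetical names. The usage lines use the node values shown in Fig. 5 with T = 0.5 (the network total, 390/780).

```python
from dataclasses import dataclass


@dataclass
class Group:
    gid: int
    nodes: dict  # node_id -> [load, capacity]

    @property
    def load(self):
        return sum(load for load, _ in self.nodes.values())

    @property
    def capacity(self):
        return sum(cap for _, cap in self.nodes.values())

    @property
    def utilization(self):
        # R_m = S_m / C_m
        return self.load / self.capacity

    def node_by_utilization(self, highest):
        pick = max if highest else min
        return pick(self.nodes, key=lambda n: self.nodes[n][0] / self.nodes[n][1])


def balance_groups(groups, target, related):
    """Sketch of Algorithm 2; related[gid] lists groups reachable from gid."""
    by_id = {g.gid: g for g in groups}
    overloaded = sorted((g for g in groups if g.utilization >= target),
                        key=lambda g: g.utilization, reverse=True)     # L1
    transfers = []
    for g in overloaded:
        excess = g.load - target * g.capacity
        if excess <= 0:
            continue
        candidates = [by_id[k] for k in related.get(g.gid, ())
                      if by_id[k].utilization < target]               # in L2
        if not candidates:
            continue
        receiver = min(candidates, key=lambda h: h.utilization)
        amount = min(excess, target * receiver.capacity - receiver.load)
        src = g.node_by_utilization(highest=True)
        dst = receiver.node_by_utilization(highest=False)
        g.nodes[src][0] -= amount
        receiver.nodes[dst][0] += amount
        transfers.append((src, dst, amount))
    return transfers


# Fig. 5: group 3 (140/180) is overloaded, group 2 (100/300) underloaded.
groups = [
    Group(1, {11: [30, 60], 21: [60, 80], 31: [20, 80]}),
    Group(2, {12: [10, 40], 22: [20, 40], 32: [20, 120], 42: [20, 40], 52: [30, 60]}),
    Group(3, {13: [10, 20], 23: [130, 160]}),
    Group(4, {24: [20, 40], 44: [20, 40]}),
]
print(balance_groups(groups, 0.5, related={3: [1, 2, 4]}))   # [(23, 32, 50.0)]
```

After the transfer, group 3 holds 90/180 and group 2 holds 150/300, so both sit at the target utilization.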
7.3 NETWORK BALANCING
The hash function h computes for each node an m-bit key that consists of two parts, a p-bit prefix and an s-bit suffix. The prefix determines the level 0 network, and the suffix determines the level 1 network. Hence, the maximum number of groups is 2^s, where s is the number of bits used for the suffix, and the maximum number of nodes that belong to a supernode is 2^p, where p is the number of bits used for the prefix.
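The key layout can be illustrated with a few bit operations. This sketch assumes keys are plain integers, with the prefix in the high-order p bits and the suffix in the low-order s bits; split_key is a hypothetical name.

```python
def split_key(key, p, s):
    """Split an m-bit key (m = p + s) into its p-bit prefix and s-bit suffix."""
    if key >> (p + s):
        raise ValueError("key has more than p + s bits")
    prefix = key >> s              # high-order p bits: determine the level 0 network
    suffix = key & ((1 << s) - 1)  # low-order s bits: determine the level 1 network
    return prefix, suffix


p, s = 2, 2                        # m = 4, as in the example of Fig. 6
print(split_key(0b1101, p, s))     # (3, 1): prefix 11, suffix 01
print("max groups:", 2 ** s, "max nodes per supernode:", 2 ** p)
```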
The number of nodes managed by a supernode needs to be controlled, since the supernodes are used by the level 0 network to route messages among groups; it must be kept neither too large nor too small. Group membership is represented by i bits, where i is the minimal number of bits, taken from the tail end of the suffix, needed to include the members. For example, Fig. 6(a) shows a small network with m = 4 (the hash function produces a sequence of four bits), a 2-bit prefix (p = 2) and a 2-bit suffix (s = 2). Even though the suffix has 2 bits, only one of them is used here, as indicated by i = 1 between the two groups. The first group holds all the nodes whose keys end with 1, and the second group holds all the nodes whose keys end with 0.
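Membership by tail bits can be sketched as grouping keys on their last i bits. The keys below are read from the residue of Fig. 6(a) and may not be the figure's complete node set; keys are bit strings and group_by_tail is a hypothetical name.

```python
def group_by_tail(keys, i):
    """Group keys (bit strings) by their last i bits, i >= 1."""
    if i < 1:
        raise ValueError("i must be at least 1")
    groups = {}
    for key in keys:
        groups.setdefault(key[-i:], []).append(key)
    return groups


# m = 4 and only i = 1 tail bit is used, as in Fig. 6(a).
keys = ["1001", "0010", "0101", "0110", "1010", "1011", "1000"]
print(group_by_tail(keys, 1))
# {'1': ['1001', '0101', '1011'], '0': ['0010', '0110', '1010', '1000']}
```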
Figure 6. (a) One Bit Used to Determine the Membership;
(b) Two Bits Used to Determine the Membership
As the network grows, however, more bits are used to determine membership. The group size is thus determined by the maximum number of bits used, although some groups may use fewer bits.
Algorithm 3: Supernodes Balancing Algorithm
Sm: current supernode; Sm+1: successor; Sm-1: predecessor;
|Sm|: number of nodes that belong to supernode Sm; n: minimum number of nodes in a group
When a node joins or leaves the network:
if |Sm| > 2n
    if (|Sm+1| < n or |Sm-1| < n)
        transfer some of the nodes to the neighbor's supernode, based on their last i bits;
    else if (|Sm+1| > n and |Sm-1| > n)
        divide the group into two groups;
else if |Sm| < n
    if (|Sm+1| < n or |Sm-1| < n)
        combine the groups into one group, based on their last i bits;
    else if (|Sm+1| > n and |Sm-1| > n)
        transfer some of the nodes, thus shifting the border between the two groups;
update the routing tables of Sm and its neighbors;
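The decision rules of Algorithm 3 can be mirrored in a small function that returns a label for the action instead of moving nodes. It assumes n is the minimum group size (with Table 2's values that would be n = 128 and 2n = 256); supernode_action is a hypothetical name. Size combinations the algorithm leaves unspecified, such as a neighbor with exactly n nodes, return "no change" here.

```python
def supernode_action(size, succ_size, pred_size, n):
    """Mirror of Algorithm 3's rules, evaluated when a node joins or leaves.

    size, succ_size, pred_size stand for |S_m|, |S_m+1| and |S_m-1|.
    The caller would then update the routing tables of S_m and its neighbors.
    """
    neighbor_small = succ_size < n or pred_size < n
    neighbors_large = succ_size > n and pred_size > n
    if size > 2 * n:
        if neighbor_small:
            return "transfer nodes to a neighbor's supernode"
        if neighbors_large:
            return "split the group in two"
    elif size < n:
        if neighbor_small:
            return "merge with a neighbor's group"
        if neighbors_large:
            return "shift the border between the groups"
    return "no change"


for sizes in [(300, 100, 200), (300, 200, 200), (100, 100, 200), (100, 200, 200)]:
    print(sizes, supernode_action(*sizes, n=128))
```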
To insert a new node, take the last i bits of its key, find the supernode represented by these i bits and check the number of nodes that belong to that supernode. If there are fewer than 2^i nodes in the group, put the new node in the group. Otherwise, split the group into two groups: increase i by one and divide the nodes according to the value of the new i-th bit from the tail end of their keys, putting nodes whose key has 0 in that bit in one group and nodes whose key has 1 in that bit in the other. For example, suppose we insert a new node whose key hashes to the sequence 1100 into the network in Fig. 6(a). Since the last bit is 0, this node belongs to the second group. However, the group is already full, so it needs to be split. As shown in Fig.
6(b), we first set i = 2 in the second group. The second group, whose nodes end with 0, is then split: its nodes are partitioned into a group of those ending with ‘00’ and a group of those ending with ‘10’.
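Insertion and splitting can be sketched on bit-string keys. The capacity threshold is a hypothetical max_size parameter rather than a formula; the value 4 in the usage lines is chosen only because the second group of Fig. 6(a), read from the figure residue as four nodes, is described as full. find_group and insert_node are hypothetical names.

```python
def find_group(groups, key):
    """Return the label (tail bits) of the group whose members end with it."""
    for label in groups:
        if key.endswith(label):
            return label
    raise KeyError(key)


def insert_node(groups, key, max_size):
    """Add key to its group; split a full group using one more tail bit.

    groups maps a tail-bit label (e.g. '0', '10') to a list of member keys.
    A split that leaves one side still over max_size is not handled here.
    """
    label = find_group(groups, key)
    members = groups[label]
    if len(members) < max_size:
        members.append(key)
        return
    del groups[label]
    i = len(label) + 1               # i -> i + 1 for the two new groups
    groups["0" + label] = []
    groups["1" + label] = []
    for k in members + [key]:
        groups[k[-i:]].append(k)     # decided by the new i-th bit from the tail


# Fig. 6(a) with i = 1; inserting 1100 splits the second group.
groups = {"1": ["1001", "0101", "1011"], "0": ["0010", "0110", "1010", "1000"]}
insert_node(groups, "1100", max_size=4)
print(groups)
# {'1': ['1001', '0101', '1011'], '00': ['1000', '1100'], '10': ['0010', '0110', '1010']}
```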
To delete a node, check the number of nodes belonging to its supernode. If there are fewer than 2^(p-1) nodes in the group, merge the group with its sibling group, that is, the group whose last (i-1) bits have the same value, and set i = i - 1 in the merged group.
For example, suppose we delete node 1100 from the network shown in Fig. 6(b). Since only one node is left in the third group, it needs to be combined with another group. The group whose last (i-1) bits match those of the third group is the second group (both end with 0), so the second and third groups are combined into one group whose nodes end with 0, with i = 1. After combining the two groups, we get the network shown in Fig. 6(a).
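Deletion and merging can be sketched on the same bit-string representation. The usage lines take p = 2 from the example, so the minimum group size 2^(p-1) is 2; find_group and delete_node are hypothetical names, and a sibling that has itself been split further is not handled.

```python
def find_group(groups, key):
    """Return the label (tail bits) of the group whose members end with it."""
    for label in groups:
        if key.endswith(label):
            return label
    raise KeyError(key)


def delete_node(groups, key, min_size):
    """Remove key; merge its group with the sibling group if it gets too small.

    The sibling shares the last (i - 1) bits and differs in the i-th bit from
    the tail; the merged group uses i - 1 bits.
    """
    label = find_group(groups, key)
    groups[label].remove(key)
    if len(groups[label]) >= min_size or len(label) < 2:
        return
    sibling = ("1" if label[0] == "0" else "0") + label[1:]
    if sibling not in groups:
        return                       # sibling was split further: not handled
    groups[label[1:]] = groups.pop(label) + groups.pop(sibling)


# Fig. 6(b): deleting 1100 leaves one node in the '00' group.
groups = {"1": ["1001", "0101", "1011"], "10": ["0010", "0110", "1010"],
          "00": ["1000", "1100"]}
delete_node(groups, "1100", min_size=2)
print(groups)
# {'1': ['1001', '0101', '1011'], '0': ['1000', '0010', '0110', '1010']}
```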
8. SIMULATIONS
To verify the validity of our load balancing algorithm, we built a simulation framework on which we implemented an HP2PC system and our load balancing algorithm. We used the load balancing measurements outlined in section 4.2 in our simulations. Our simulated system has 10^3 nodes within a [0, 2^12) identifier space, which form a two-level hierarchical network. Each node is assigned capacity and load information, from which we obtain the utilization of each node, the primary metric of our load balancing algorithm. Table 2 lists the parameters of our simulated environment and our load balancing algorithm.
Table 2: Simulated Parameters
Parameter                          Value
ID space                           [0, 2^15)
Number of nodes                    2^13
Number of layers                   2
Max number of nodes in a group     256
Min number of nodes in a group     128
Target utilization                 0.5
Offset                             ±0.1
To analyze our algorithm, we applied three different strategies to this experimental system:
1. an HP2PC system without load balancing;
2. an HP2PC system with load balancing among leaf nodes;
3. an HP2PC system with load balancing among supernodes and leaf nodes.
In the simulation, overloaded nodes transfer load to underloaded nodes, since the goal is load balancing. Therefore, even if some nodes are still underloaded after load balancing, we consider the load distribution fair, that is, load balancing has been achieved. We aim to keep each node's utilization within a reasonable range, namely the target utilization with an offset of ±0.1 (Table 2).
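The acceptance criterion can be written as two small predicates. This sketch assumes utilization is load divided by capacity and uses Table 2's target 0.5 and offset 0.1 as defaults; in_band and system_balanced are hypothetical names, and treating the system as balanced when no node exceeds the upper bound is an interpretation of the paragraph above rather than a rule stated explicitly.

```python
def in_band(load, capacity, target=0.5, offset=0.1):
    """True if utilization lies within target +/- offset."""
    return target - offset <= load / capacity <= target + offset


def system_balanced(nodes, target=0.5, offset=0.1):
    """Balanced if no node exceeds target + offset; underloaded nodes are tolerated."""
    return all(load / capacity <= target + offset for load, capacity in nodes)


nodes = [(80, 160), (70, 120), (10, 40)]   # illustrative (load, capacity) pairs
print([in_band(l, c) for l, c in nodes])   # [True, True, False]
print(system_balanced(nodes))              # True: the third node is only underloaded
```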
In Fig. 7(a-c), the x-axis represents the utilization of nodes and the y-axis represents the capacity of nodes. Fig. 7(a) shows the utilization distribution among heterogeneous nodes before load balancing; the dots are distributed randomly. Fig. 7(b) shows the utilization distribution after load balancing among leaf nodes (as described in section 7.1). Fig. 7(c) shows the utilization and capacity distribution after load balancing among supernodes and leaf nodes (as described in section 7.2). As Fig. 7(c) shows, the loads on the nodes become very similar: since the target utilization is 0.5 and the offset is ±0.1, some nodes are concentrated around the 40%, 50% and 60% utilization levels.
[Figure 7 (a)-(c): node capacity (y-axis, 0 to 35000) versus utilization (x-axis, 0% to 100%).]
Figure 7. Distribution of utilization for every node (a)
before load balancing; (b) after level 1 load balancing; (c)
after both level 0 and level 1 load balancing.
Figure 8. Distribution of utilization for every group (a)
before load balancing; (b) after level 1 load balancing; (c)
after both level 0 and level 1 load balancing.
In Fig. 8(a-c), the x-axis represents the group and the y-axis represents the minimum, average and maximum utilization of each group. Fig. 8(a) shows these values without load balancing, Fig. 8(b) after load balancing within groups (as described in section 7.1), and Fig. 8(c) after load balancing among groups (as described in section 7.2).
Figure 9. Number of Nodes in Each Utilization Range
In Fig. 9, the x-axis represents utilization and the y-axis represents the number of nodes in each utilization range. The three lines show the distribution before load balancing, after load balancing at the level 1 network, and after load balancing at both the level 1 and level 0 networks. These results show that our approach significantly improves load balancing.
In our previous study [16], we proposed a load balancing algorithm for HP2PC. The new approach developed in this paper builds on the previous one but also takes latency into consideration. For the previous approach, the load-balanced data had a mean of 49.75 and a standard deviation of 8.30. For the new approach, using the same original data (but without latency), the mean was the same and the standard deviation was 8.19. The standard deviation of the new approach is therefore slightly smaller than that of the previous one, which shows that considering latency improves the utilization of the network as a whole.
We also measured the overhead of load balancing. We ran experiments with 1000 nodes and calculated the time spent on moving files to achieve load balancing with each of the two approaches. The new approach took 760 time units, compared to 851 time units for the previous approach, an improvement of around 11% in the overall time taken by the load balancing process. Hence, our improved approach achieves better load balancing with less overhead.
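The relative improvements quoted in the two preceding paragraphs follow directly from the reported numbers (time units for the overhead, standard deviation of utilization for the balance):

```latex
\frac{851 - 760}{851} = \frac{91}{851} \approx 0.107 \;(\approx 10.7\%), \qquad
\frac{8.30 - 8.19}{8.30} = \frac{0.11}{8.30} \approx 0.013 \;(\approx 1.3\%)
```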
9. CONCLUSIONS
In this paper, we propose an effective and secure load balancing algorithm that achieves global load balance in HP2PC systems. Security is achieved by a simple moving
target approach that uses hashing to hide the location of files without modifying the routing tables. To ensure load balancing in the moving target scheme, the first step is to ensure a fair load distribution among nodes under the same supernode, followed by a fair load distribution among supernodes. We also propose two schemes to balance the network. Our simulation results show that our algorithm is effective in achieving load balancing in HP2PC systems. Compared to our previous work, we achieve better load balancing with less overhead. We focus on a two-level P2P cloud network in this paper; however, our approach can easily be applied to a multi-level P2P cloud network.
A number of potential improvements to our algorithm deserve further study. First, we use storage as the load factor in this paper; however, a distributed computing system may be constrained by other parameters besides storage, such as CPU and bandwidth. Another open problem is how to determine the optimum number of hierarchy levels for an HP2PC network under a given set of assumptions, in particular how many hierarchical levels are needed to balance the load. A simulation of a larger, more realistic HP2PC is also needed. Finally, the moving target scheme is primitive as it stands and is worthy of further analysis and study.
10. REFERENCES
[1] S. Rieche, L. Petrak, and K. Wehrle, “A Thermal-Dissipation-based Approach for Balancing Data Load in Distributed Hash Tables”, Proceedings of the 29th IEEE International Conference on Local Computer Networks, 2004.
[2] J. Byers, J. Considine, and M. Mitzenmacher, “Simple Load Balancing for Distributed Hash Tables”, Peer-to-Peer Systems II, Volume 2735, Pages 80-88, 2002.
[3] L. Garcés-Erice, E. W. Biersack, P. A. Felber, K. W. Ross, and G. Urvoy-Keller, “Hierarchical Peer-to-Peer Systems”, Parallel Processing Letters, Volume 13, Issue 4, Pages 643-657, 2003.
[4] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications”, IEEE/ACM Transactions on Networking, Volume 11, Issue 1, Pages 17-32, 2003.
[5] K. Aberer, A. Datta, and M. Hauswirth, “Multifaceted Simultaneous Load Balancing in DHT-Based P2P Systems: A New Game with Old Balls and Bins”, Science, Issue 5005, Pages 373-391, 2005.
[6] S. Zoels, Z. Despotovic, and W. Kellerer, “Load balancing in a hierarchical DHT-based P2P system”, Proceedings of the 2007 International Conference, 2007.
[7] D. Karger, E. Lehman, T. Leighton, M. Levine, D. Lewin, and R. Panigrahy, “Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web”, Proceedings of the 29th ACM Symposium on Theory of Computing, 1997.
[8] M. Sanchez Artigas, P. Garcia Lopez, J. Pujol Ahullo, and A. Gomez Skarmeta, “Cyclone: a Novel Design Schema for Hierarchical DHTs”, Fifth IEEE International Conference on Peer-to-Peer Computing, 2005.
[9] C. Wang, S. S.-M. Chow, Q. Wang, K. Ren, and W. Lou, “Privacy preserving public auditing for secure cloud storage”, Proceedings of the 29th conference on Information communications, March 2010.
[10] M. Dijk and A. Juels, “On the Impossibility of Cryptography Alone for Privacy-Preserving Cloud Computing”, Proceedings of the 5th USENIX conference on Hot topics in security, 2010.
[11] K. Zeng, “Publicly verifiable remote data integrity”, Proceedings of the 10th International Conference on Information and Communications Security, 2008.
[12] C. Wang, Q. Wang, K. Ren, and W. Lou, “Ensuring Data Storage Security in Cloud Computing”, Proceedings of the 17th International Workshop on Quality of Service, 2009.
[13] L. Lamport, “Password Authentication with Insecure Communication”, Communications of the ACM, Volume 24, Issue 11, Pages 770-772, November 1981.
[14] T. Condie, V. Kacholia, S. Sankararaman, J. M. Hellerstein, and P. Maniatis, “Induced Churn as Shelter from Routing-Table Poisoning”, Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS), 2006.
[15] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica, “Wide-area cooperative storage with CFS”, Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), 2001.
[16] H. Liu, J. Thomas, and P. Khethavath, “Load balancing with moving target in P2P Cloud”, IEEE 6th International Conference on Cloud Computing, 2013.
Authors
Hong Liu obtained her B.S. in Computer
Science and Technology from
Northeastern University in China and her M.S. in
Computer Science from Oklahoma State
University. She is currently a PhD student
in Computer Science at Oklahoma State
University. Her research interests include Cloud Computing,
Big data and Peer-to-Peer networks.
Johnson P Thomas obtained his B.Sc in
Electrical Engineering from the University
of Wales, M.Sc in Electrical Engineering
and Computer Science from the University
of Edinburgh, Scotland and PhD in
Computer Science from the University of
Reading, England. He is currently an Associate Professor of
Computer Science at Oklahoma State University. His
research interests include Cloud Computing, Computer
Security and Sensor Networks. He serves as an Associate
Editor for the Wiley Security and Communications
Networks Journal.
PraveenKumar Khethavath obtained his
B.E. from Chaitanya Bharathi Institute of
Technology, Osmania University in India.
He is currently a PhD student at Oklahoma
State University. His research interests
include Cloud Computing, Security and
privacy in mobile networks and health care, Big data,
wireless sensor networks and Peer-to-peer networks.