
Efficient Optimization to a Distributed Cloud Storage System

PROJECT PROPOSAL

COEN 241 – Cloud Computing

Summer 2015

Ganesh Kamath, Rahul Kilambi, Ronald Bayross


Table of Contents

1. Introduction
   1.1 Objective
   1.2 Problem statement
   1.3 Why this is a project related to this class
   1.4 Why the other approach is no good
   1.5 Why our approach is better
   1.6 Statement of the problem
   1.7 Area or scope of investigation
2. Theoretical bases and literature review
   2.1 Definition of the problem
   2.2 Theoretical background of the problem
   2.3 Related research to solve the problem
   2.4 Advantage/disadvantage of previous research
   2.5 Our solution to solve this problem
   2.6 Why our solution is better
3. Hypotheses
4. Methodology
   4.1 How to generate/collect input data
   4.2 How to solve the problem
       4.2.1 Algorithm design
             4.2.1.1 Shadow Master
             4.2.1.2 Load Balancing
             4.2.1.3 Daisy Chaining
             4.2.1.4 Merkle trees
       4.2.2 Language used
       4.2.3 Tools used
5. Implementation
   5.1 Simulation
   5.2 Code Snippets
6. Data analysis and discussion
   6.1 Output generation
   6.2 Output analysis and comparison
7. Conclusion and recommendations
   7.1 Summary
   7.2 Conclusion
8. Bibliography

List of Tables:

1. Comparison of results


1. Introduction

1.1 Objective

To implement a highly fault tolerant distributed file system with efficient utilization of resources.

1.2 What is the problem

A distributed file system supports sharing of files and resources in the form of persistent storage over a network. Large data-intensive applications generate files that are usually huge (>64 MB). Maintaining and managing such huge files, while keeping up with data-processing demands, is a challenge for existing file systems.

1.3 Why this is a project related to this class

This project deals with the following:

Distributed storage across multiple nodes
Maintaining redundant copies of each file for higher availability and scalability
Load balancing to avoid hot spots and to balance disk-space utilization
A fault-tolerant system able to withstand node crashes

1.4 Why the other approach is no good

The other approach does not handle data transfer efficiently: it simply forwards the data from the client to the proxy server and from there to the data nodes, without attempting to minimize latency for the client. Moreover, corrupt files or missing partitions are handled only when a file is requested by the client, not periodically.

1.5 Why our approach is better

In our approach we implement daisy chaining, i.e., making one node the master of the other replicas. This ensures that all replicas stay synchronized and up to date. To deal with corrupted or missing files, a daemon process running on each node uses Merkle trees to maintain consistency. The load balancer ensures that all disks are utilized equally.


1.6 Statement of the problem

A distributed file system supports sharing of files and resources in the form of persistent storage over a network. Large data-intensive applications generate files that are usually huge (>64 MB). Maintaining and managing such huge files, while keeping up with data-processing demands, is a challenge for existing file systems.

1.7 Area or scope of investigation

In this project we are implementing a cloud-based storage system that is scalable, available, and fault tolerant. It covers:

Cloud-based storage
Merkle trees for maintaining consistency
Load balancing
Recovery strategy


2. Theoretical bases and literature review

2.1 Definition of the problem

Maintaining and managing huge files, while keeping up with data-processing demands, is a challenge for existing file systems.

2.2 Theoretical background of the problem

Cloud storage is a model of data storage in which digital data is stored in logical pools: the physical storage spans multiple servers (and often locations), and in our system the physical environment is owned and managed by a master node. The storage system is built from commodity hardware, which is expected to fail often, so the system must be capable of dealing with faults, such as files that are corrupted or wrongly deleted. When a disk fails, a new disk has to be added and the system must be able to reconstruct the old data on the new disk. To make the system available and fault tolerant, we replicate the data across multiple servers.

In the storage system, efficient utilization of the disk and the network is important to achieve low latency. To fully utilize each machine's network bandwidth, the data is pushed linearly along a chain of store machines rather than distributed in some other topology (e.g., a tree). Thus each machine's full outbound bandwidth is used to transfer the data as fast as possible, rather than being divided among multiple recipients. We minimize latency by pipelining the data transfer over TCP connections: once a store machine receives some data, it starts forwarding immediately.

A daisy chain is an interconnection of computer devices, peripherals, or network nodes in series, one after another. The data is sent linearly along the chain of store machines to utilize the full bandwidth of each machine: the proxy server sends the data to the primary server, and the primary takes care of sending it to the remaining replicas. The main advantage of the daisy chain is its simplicity.
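The pipelined chain transfer described above can be sketched as follows. This is a simplified, single-threaded illustration (the class and method names are ours, not from the project code): each store machine writes an incoming chunk to local storage and forwards it down the chain immediately, instead of waiting for the whole file to arrive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChainForwarder {
    // Receive data from upstream and forward each chunk downstream as soon as
    // it arrives, while also writing it to local storage (pipelining).
    // Pass downstream = null on the last machine in the chain.
    public static void receiveAndForward(InputStream upstream,
                                         OutputStream localStore,
                                         OutputStream downstream) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = upstream.read(buffer)) != -1) {
            localStore.write(buffer, 0, n);      // persist locally
            if (downstream != null) {
                downstream.write(buffer, 0, n);  // forward immediately, not at EOF
            }
        }
        localStore.flush();
        if (downstream != null) downstream.flush();
    }
}
```

In a real deployment `upstream` and `downstream` would be TCP socket streams; with in-memory streams the same logic can be exercised end to end.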

In computing, load balancing distributes workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy.

2.3 Related research to solve the problem

The Google File System (GFS) implemented a storage system on commodity hardware and made it scalable, available, and fault tolerant. GFS is based on a master-slave model: there is one master and multiple chunk servers that act as the storage machines. A client can request file operations such as create, delete, open, write, or append. Apart from these common operations, GFS also implements an atomic append operation and snapshots. GFS modified the basic distributed file system design to favor large files over small files, and supports both random and large streaming reads. GFS implements a relaxed consistency model.

Amazon Dynamo is a highly available key-value store built to be an always-writable system. To achieve high availability, Dynamo sacrifices immediate consistency and practices an eventual-consistency model: unlike other distributed file systems, it keeps the store writable at all times and defers consistency. It uses consistent hashing for incremental scalability. In normal hashing models, when a new disk is added and a hash space is allotted to it, all partitions must be redistributed according to the newly allocated hash boundaries, which causes unnecessary movement of data from one disk to another. In the consistent-hashing model, a ring is maintained instead, and only the partitions that fall to the new disk are moved from the old disks. Dynamo handles permanent failures by using Merkle (hash) trees: a checksum is calculated with a hash algorithm, and a hash tree is built between replicas containing common files. The trees are evaluated level by level until the source of an inconsistency is found and the faulty data replaced.
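The ring-based model described above can be illustrated with a small sketch (a generic illustration, not Dynamo's actual implementation; the `HashRing` class and the one-token-per-disk layout are our simplifications). Each disk owns the arc of hash space preceding its position, so adding a disk only claims one arc of the ring:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hashing ring: keys and disks are hashed onto the same
// ring, and a key belongs to the first disk at or after its hash position.
public class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xffL);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public void addDisk(String disk) { ring.put(hash(disk), disk); }

    public String ownerOf(String key) {
        // First disk clockwise from the key's position; wrap around if none.
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

The key property: after `addDisk("disk-D")`, every key either keeps its old owner or moves to `disk-D`; no data shuffles between pre-existing disks.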

2.4 Advantage/disadvantage of previous research

With GFS, the researchers were able to utilize the disk and network to the full extent and minimized latency while storing or retrieving files. GFS also changed the conventional block size from 4 KB to 64 MB, which made the system very efficient for large reads and writes. But with the increase in block size they gave up compatibility with small files: padding is added when needed, which wastes a lot of space.

Using consistent hashing for placing and exchanging file partitions lowers the number of unnecessary data exchanges, resulting in efficient use of response time and network bandwidth. Merkle trees are highly efficient because no actual data transfer takes place while the integrity of the nodes is being verified; only Merkle root hash values are exchanged and compared. As a result, only the partitions that are corrupted or missing are sent to the faulty machine.


2.5 Our solution to solve this problem

Our solution consists of the following:

Shadow Master:

The master logs each and every operation in the system to persistent storage. A daemon periodically polls the master to check whether it is operational. If the master fails, the shadow master regenerates the server state as it was before the crash and resumes operation as the new master. In this way the operation of the system is guaranteed even under failure conditions.
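The idea reduces to a toy sketch (the class name and log format here are illustrative only; the actual recovery code in Section 5.2.2 parses several on-disk log files): the master appends every mutation to a persistent log, and the shadow replays the log to regenerate the master's in-memory state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified shadow-master recovery: replay an operation log to rebuild
// the file-location table. Entries look like "PUT:<file>:<disk>" or
// "DEL:<file>"; the real system logs richer records to files under /tmp/.
public class LogReplay {
    public static Map<String, String> replay(List<String> log) {
        Map<String, String> fileTable = new HashMap<>(); // filename -> location
        for (String entry : log) {
            String[] p = entry.split(":");
            if (p[0].equals("PUT")) {
                fileTable.put(p[1], p[2]);   // re-apply the upload
            } else if (p[0].equals("DEL")) {
                fileTable.remove(p[1]);      // re-apply the delete
            }
        }
        return fileTable;
    }
}
```

Because every mutation is replayed in order, the rebuilt table is identical to the master's table at the moment of the crash.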

Storage Load Balancing: maintain the current storage level of each disk, and select pertinent disks to balance the load uniformly when an upload is performed.

Consistency using Merkle trees: Merkle trees let us check the integrity of data in a manner that involves minimum bandwidth utilization and is highly efficient. Hash values are calculated at the base of a binary tree whose leaves represent the file partitions. These hash values are then repeatedly hashed two at a time until only one value remains at the apex of the tree. This tree is generated for both copies and the roots are compared. If the values are the same, there is no need to exchange any more data; if they mismatch, the children in the tree are recursively checked to ultimately find the defective leaves (partitions).

Daisy chaining, pipelining, distance metrics: find the closest disk to the proxy server among the two disks chosen and transfer replica 1 to it; the data is then transferred linearly to the other replica. We calculate distance metrics based on turnaround time, using the average ping duration.

2.6 Why our solution is better

We have combined the most viable ideas from multiple papers into the best solution for the problem at hand. For example, although GFS is robust, it does not use concepts such as Merkle trees.


3. Hypotheses (or goals)

We will:
use a dynamic load-balancing scheme to decide where replicas are stored, balancing the load uniformly across all data nodes;
use Merkle trees to detect missing or corrupted file partitions efficiently;
use daisy chaining and pipelining to transfer file contents between the client and the data nodes directly.


4. Methodology

4.1 How to generate/collect input data

The input data comprises multiple binary files in various formats and sizes, varying from a few KB to a couple of GB.
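For instance, reproducible binary test files of assorted sizes could be generated with a helper like the following (an illustrative sketch, not the authors' tooling; class and method names are ours):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Generate binary test files of a requested size, with a fixed random seed
// so the same inputs can be regenerated for repeated upload experiments.
public class TestDataGenerator {
    public static Path writeRandomFile(Path dir, String name, int sizeBytes) throws IOException {
        byte[] data = new byte[sizeBytes];
        new Random(42).nextBytes(data); // fixed seed for reproducibility
        Path file = dir.resolve(name);
        Files.write(file, data);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("coen241");
        for (int kb : new int[] {4, 512, 4096}) {
            Path f = writeRandomFile(tmp, "sample_" + kb + "KB.bin", kb * 1024);
            System.out.println(f.getFileName() + " " + Files.size(f) + " bytes");
        }
    }
}
```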

4.2 How to solve the problem

4.2.1 Algorithm design

4.2.1.1 Shadow Master

The master logs each and every operation in the system to persistent storage. A daemon periodically polls the master to check whether it is operational. If the master fails, the shadow master regenerates the server state as it was before the crash and resumes operation as the new master. In this way the operation of the system is guaranteed even under failure conditions.
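The daemon's retry logic, condensed from `CheckMainServerAlive` in Section 5.2.2, looks roughly like this (the compact form and the `Heartbeat` name are ours; the probe is abstracted behind a `BooleanSupplier` so the real socket connect can be plugged in):

```java
import java.util.function.BooleanSupplier;

// Probe the master up to attemptCount times, waiting between attempts,
// and report failure only after every attempt has failed.
public class Heartbeat {
    public static boolean isMasterAlive(BooleanSupplier probe, int attemptCount, long waitMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= attemptCount; attempt++) {
            if (probe.getAsBoolean()) {
                return true; // master answered
            }
            System.out.println("RETRYING CONNECTION " + attempt + " OF " + attemptCount + " TIMES");
            Thread.sleep(waitMillis);
        }
        return false; // trigger failover: replay the log and become master
    }
}
```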

4.2.1.2 Load Balancing

For this, we plan to do two things:

Maintain the current storage level of each disk
Select pertinent disks to effectively balance the load uniformly

4.2.1.3 Daisy Chaining

Find the closest disk to the proxy server among the two disks chosen and transfer replica 1 to it.

The data is then transferred linearly to the other replica.

We calculate distance metrics based on turnaround time, using the average ping duration.
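Once the average ping durations have been measured, the selection itself is a simple minimum (a sketch with hypothetical names; actually measuring the round-trip times, e.g. by timing pings or TCP connects, is left out here):

```java
import java.util.Map;

// Pick the disk with the lowest measured average round-trip time; that disk
// receives replica 1 directly from the proxy server.
public class DistanceMetric {
    public static String closestDisk(Map<String, Double> avgPingMillis) {
        String best = null;
        double bestRtt = Double.MAX_VALUE;
        for (Map.Entry<String, Double> e : avgPingMillis.entrySet()) {
            if (e.getValue() < bestRtt) {
                bestRtt = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```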

4.2.1.4 Merkle trees

Merkle trees help us check the integrity of data in a manner that involves minimum bandwidth utilization and is highly efficient. Hash values are calculated at the base of a binary tree whose leaves represent the file partitions. These hash values are then repeatedly hashed two at a time until only one value remains at the apex of the tree. This tree is generated for both copies and the roots are compared. If the values are the same, there is no need to exchange any more data. If they mismatch, the children in the tree are recursively checked to ultimately find the defective leaves (partitions). These partitions are then replaced with fault-free copies from the replicas.
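The procedure above can be sketched in Java as follows (illustrative names; for brevity this sketch scans the leaf level once the roots differ, whereas a full implementation would recurse child by child, exchanging one hash per level):

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MerkleCheck {
    static byte[] sha(byte[]... parts) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (byte[] p : parts) md.update(p);
        return md.digest();
    }

    // Build the tree bottom-up: levels.get(0) holds the leaf (partition)
    // hashes, the last level holds the single root hash.
    static List<List<byte[]>> buildTree(List<byte[]> partitions) throws Exception {
        List<List<byte[]>> levels = new ArrayList<>();
        List<byte[]> level = new ArrayList<>();
        for (byte[] p : partitions) level.add(sha(p));
        levels.add(level);
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                // Odd node out is paired with itself.
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : level.get(i);
                next.add(sha(level.get(i), right));
            }
            levels.add(next);
            level = next;
        }
        return levels;
    }

    // Compare two trees: if the roots match, nothing needs to be transferred;
    // otherwise return the indices of the mismatching partitions.
    static List<Integer> corruptLeaves(List<List<byte[]>> a, List<List<byte[]>> b) {
        List<Integer> bad = new ArrayList<>();
        byte[] rootA = a.get(a.size() - 1).get(0);
        byte[] rootB = b.get(b.size() - 1).get(0);
        if (Arrays.equals(rootA, rootB)) return bad;
        List<byte[]> la = a.get(0), lb = b.get(0);
        for (int i = 0; i < la.size(); i++) {
            if (!Arrays.equals(la.get(i), lb.get(i))) bad.add(i);
        }
        return bad;
    }
}
```

Only the hashes cross the network during verification; the partitions reported by `corruptLeaves` are the only ones that must be re-sent to the faulty machine.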

4.2.2 Language used

Java

4.2.3 Tools used

Java SDK 7, NetBeans and Eclipse IDEs.


5. Implementation

5.1 Simulation

The following tests were performed:

Upload a file to the disk stores
Corrupt a file partition
Fail the master server

5.2 Code Snippets

Our program is a multithreaded Java application.

5.2.1 Code to find least used store machine

//Get current disk situation from memory and owner maps.
//Create a list for size lookup and store the values in it.
List<Long> machine_mem_consumed = new ArrayList<Long>();
for (int i = 0; i < machine_addresses.size(); i++) {
    long memcount = 0;
    for (int j = 0; j < memory_tracker.memory_map.length; j++) {
        if ((memory_tracker.owner_map[j] == machine_addresses.get(i).machine_id)
                && (memory_tracker.memory_map[j] == 1)) {
            memcount++;
        }
    }
    machine_mem_consumed.add(memcount);
}

//Loop through every machine and compare its usage against the current selection.
for (int i = 0; i < machine_addresses.size(); i++) {
    long selected_mem = machine_mem_consumed.get((int) storage_machine) == 0
            ? 1 : machine_mem_consumed.get((int) storage_machine);
    long candidate_mem = machine_mem_consumed.get(i) == 0
            ? 1 : machine_mem_consumed.get(i);

    //If this disk has more than 20% free space relative to the currently selected disk.
    //Note the cast to double: with plain long arithmetic the division would
    //truncate to 0 and the threshold test could never pass.
    if (((double) (selected_mem - candidate_mem) / selected_mem) > 0.2) {
        //Find the first free partition on this disk.
        for (int j = 0; j < memory_tracker.memory_map.length; j++) {
            if ((memory_tracker.owner_map[j] == machine_addresses.get(i).machine_id)
                    && (memory_tracker.memory_map[j] == 0)) {
                storage_partition = j;
                break;
            }
        }
        storage_machine = i;
        break;
    }
}

5.2.2 Code to implement secondary proxy server

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.math.BigInteger;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Scanner;

public class ShadowServer {
    private String remote_serveripAddress = "";
    private int remote_serverport = 0;
    int init_start_flg = 0;
    ShadowServerSocketOperations socketoperations = new ShadowServerSocketOperations();

    //Function to check the connect command using regular expressions
    public int checkConnectInputPattern(String inputstr) {
        //Store all legal patterns
        String[] pattern_array = new String[10];
        pattern_array[0] = "shadowserver (((25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9]).){3,3}(25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])) [0-9]+";


        pattern_array[1] = "shadowserver .+ [0-9]+";
        //If the input pattern format is correct
        if ((inputstr.matches(pattern_array[0])) || (inputstr.matches(pattern_array[1]))) {
            try {
                //Extract parameters
                String connection_parameters[];
                connection_parameters = inputstr.split(" ");
                //Establish the connection
                socketoperations.connect(connection_parameters[1], Integer.parseInt(connection_parameters[2]));
                //If no exceptions, store server details
                remote_serveripAddress = connection_parameters[1];
                remote_serverport = Integer.parseInt(connection_parameters[2]);
                //Write test message
                socketoperations.writeMessage("TEST");
                //Disconnect
                socketoperations.disconnect();
                //Approved
                System.out.println("Server Operational. Connection successful.");
                return 1;
            } catch (UnknownHostException e) {
                //Rejected
                System.out.println("Unknown Host. Please enter valid host.");
                return 0;
            } catch (IOException e) {
                System.out.println("Cannot establish connection. Server connection failed or does not exist.");
                return 0;
            }
        } else {
            //Rejected
            System.out.println("Please enter valid command to initiate connection");
            return 0;
        }
    }

    //Function to check whether the main server is alive, retrying on failure
    public int CheckMainServerAlive(int wait_time, int attempt_count)
            throws InterruptedException, UnknownHostException, IOException {
        int conn_try_counter = 0;


        //Connect to remote server
        while (true) {
            try {
                //Establish the connection
                socketoperations.connect(remote_serveripAddress, remote_serverport);
                break;
            } catch (UnknownHostException e) {
                //Connection failed
                conn_try_counter++;
                Thread.sleep(wait_time);
                System.out.println("RETRYING CONNECTION " + conn_try_counter + " OF " + attempt_count + " TIMES");
                if (conn_try_counter >= attempt_count) {
                    return 0;
                }
            } catch (IOException e) {
                //Connection failed
                conn_try_counter++;
                Thread.sleep(wait_time);
                System.out.println("RETRYING CONNECTION " + conn_try_counter + " OF " + attempt_count + " TIMES");
                if (conn_try_counter >= attempt_count) {
                    return 0;
                }
            }
        }
        //Write test message
        socketoperations.writeMessage("TEST");
        //Disconnect
        socketoperations.disconnect();
        return 1;
    }

    public static void main(String arg[]) {
        try {
            //Set file path where files reside
            String filepath = "/tmp/";
            String command_parts[];
            //Create client object
            ShadowServer shadowserver = new ShadowServer();


            String in_conn_str = "";
            while (true) {
                //Check if a connection command has already been issued
                //(compare string contents, not references)
                if (in_conn_str.isEmpty()) {
                    //Get connection command. Loop till correct.
                    Scanner in_conn = new Scanner(System.in);
                    System.out.println("Please Enter Connection Command");
                    in_conn_str = in_conn.nextLine();
                }
                if (shadowserver.checkConnectInputPattern(in_conn_str) == 0) {
                    in_conn_str = "";
                    continue;
                }
                //Clear the connection command
                in_conn_str = "";
                //Continuously loop and check server
                while (true) {
                    //CONNECT TO MAIN SERVER PERIODICALLY AND CHECK IF ALIVE
                    int server_response = shadowserver.CheckMainServerAlive(2000, 3);
                    if (server_response == 1) {
                        Thread.sleep(3000);
                        continue;
                    }
                    //IF SERVER FAILED...
                    if (server_response == 0) {
                        //Create server object
                        Server server = new Server();
                        //RECONSTRUCT ALL THE MEMORY STRUCTURES. Read and parse the physical log files.
                        String connectionString = "";
                        //GENERAL DETAILS
                        File logFile_General = new File(filepath + "log_GeneralDetails.txt");
                        try (BufferedReader br = new BufferedReader(new FileReader(logFile_General))) {
                            String line;
                            while ((line = br.readLine()) != null) {
                                //Process the line.


                                command_parts = line.split(":");
                                server.server_partition_power = Integer.parseInt(command_parts[1]);
                            }
                        }
                        connectionString = "server";
                        connectionString = connectionString + " " + server.server_partition_power;
                        //MEMORY ADDRESSES
                        File logFile_MAddresses = new File(filepath + "log_MAddresses.txt");
                        try (BufferedReader br = new BufferedReader(new FileReader(logFile_MAddresses))) {
                            String line;
                            while ((line = br.readLine()) != null) {
                                //Process the line.
                                command_parts = line.split(";");
                                //ID
                                String[] command_parts_ID = command_parts[0].split(":");
                                //IP
                                String[] command_parts_IP = command_parts[1].split(":");
                                connectionString = connectionString + " " + command_parts_IP[1];
                                //Port
                                String[] command_parts_Port = command_parts[2].split(":");
                                connectionString = connectionString + " " + command_parts_Port[1];
                                Data_Stores ds = new Data_Stores(command_parts_IP[1],
                                        Integer.parseInt(command_parts_Port[1]),
                                        Integer.parseInt(command_parts_ID[1]));
                                server.machine_addresses.add(ds);
                            }
                        }
                        //MEMORY TRACKER
                        File logFile_MTracker_1 = new File(filepath + "log_MTracker_1.txt");
                        try (BufferedReader br = new BufferedReader(new FileReader(logFile_MTracker_1))) {
                            String line;
                            while ((line = br.readLine()) != null) {
                                //Process the line.
                                command_parts = line.split(":");
                                server.memory_tracker.memory_counter = Integer.parseInt(command_parts[1]);
                            }
                        }
                        server.memory_tracker.memory_map = new int[(int) Math.pow(2, server.server_partition_power) + 1];
                        FileInputStream fileInput1 = new FileInputStream(filepath + "log_MTracker_2.txt");


                        int r1;
                        int mcounter1 = 0;
                        while ((r1 = fileInput1.read()) != -1) {
                            char c = (char) r1;
                            server.memory_tracker.memory_map[mcounter1] = Character.getNumericValue(c);
                            mcounter1++;
                        }
                        fileInput1.close();
                        server.memory_tracker.owner_map = new int[(int) Math.pow(2, server.server_partition_power) + 1];
                        FileInputStream fileInput2 = new FileInputStream(filepath + "log_MTracker_3.txt");
                        int r2;
                        int mcounter2 = 0;
                        while ((r2 = fileInput2.read()) != -1) {
                            char c = (char) r2;
                            server.memory_tracker.owner_map[mcounter2] = Character.getNumericValue(c);
                            mcounter2++;
                        }
                        fileInput2.close();
                        //MEMORY TABLE
                        String username = "";
                        String filename = "";
                        List<List<temporary_shard_object>> temporary_replica_list = new ArrayList<List<temporary_shard_object>>();
                        List<temporary_shard_object> temporary_shard_list = new ArrayList<temporary_shard_object>();
                        List<List<BigInteger>> ordered_replica_Checksum_list = new ArrayList<List<BigInteger>>();
                        List<BigInteger> ordered_shard_Checksum_list = new ArrayList<BigInteger>();
                        File logFile_MTable = new File(filepath + "log_MTable.txt");
                        try (BufferedReader br = new BufferedReader(new FileReader(logFile_MTable))) {
                            String line;
                            while ((line = br.readLine()) != null) {
                                if (!line.equals("::::") && !line.equals("::;::")) {
                                    //Process the line.
                                    command_parts = line.split(";");
                                    //USERNAME
                                    String[] command_parts_USERNAME = command_parts[0].split(":");
                                    username = command_parts_USERNAME[1];
                                    //FILENAME
                                    String[] command_parts_FILENAME = command_parts[1].split(":");
                                    filename = command_parts_FILENAME[1];


                                    //REPLICANO
                                    String[] command_parts_REPLICANO = command_parts[2].split(":");
                                    int replicaindex = Integer.parseInt(command_parts_REPLICANO[1]);
                                    //SHARDNO
                                    String[] command_parts_SHARDNO = command_parts[3].split(":");
                                    int shardindex = Integer.parseInt(command_parts_SHARDNO[1]);
                                    //IP
                                    String[] command_parts_IP = command_parts[4].split(":");
                                    String IPAddress = command_parts_IP[1];
                                    //PARTNO
                                    String[] command_parts_PARTNO = command_parts[5].split(":");
                                    long PartitionNo = Long.parseLong(command_parts_PARTNO[1]);
                                    //PARTCOUNTREQ
                                    String[] command_parts_PARTCOUNTREQ = command_parts[6].split(":");
                                    long PartitionCountReq = Long.parseLong(command_parts_PARTCOUNTREQ[1]);
                                    //SHARDSERIALNO
                                    String[] command_parts_SHARDSERIALNO = command_parts[7].split(":");
                                    long ShardSerialNo = Long.parseLong(command_parts_SHARDSERIALNO[1]);
                                    //CHECKSUM
                                    String[] command_parts_CHECKSUM = command_parts[8].split(":");
                                    BigInteger Checksum = new BigInteger(command_parts_CHECKSUM[1]);
                                    temporary_shard_object trdo;
                                    trdo = new temporary_shard_object(GetMachineIDByIPAddress(server, IPAddress),
                                            PartitionNo, PartitionCountReq, ShardSerialNo);
                                    temporary_shard_list.add(trdo);
                                    //Add checksum to the shard checksum list
                                    ordered_shard_Checksum_list.add(Checksum);
                                }
                                //If replica marker reached, then replica over
                                if (line.equals("::::")) {
                                    //Take the shard list created till now and put it in the replica list. Same for checksums.
                                    temporary_replica_list.add(temporary_shard_list);
                                    ordered_replica_Checksum_list.add(ordered_shard_Checksum_list);
                                    //Create new lists
                                    temporary_shard_list = new ArrayList<temporary_shard_object>();
                                    ordered_shard_Checksum_list = new ArrayList<BigInteger>();


                                    continue;
                                }
                                //If file marker reached, then file over. Dump file into Memory Table
                                if (line.equals("::;::")) {
                                    BigInteger storage_partition_shard_checksum;
                                    if (true) {
                                        //NOW THAT BOTH PHYSICAL AND THEORETICAL ARE IN PLACE, COMMIT ALL MAPPINGS TO MEMORY TABLES
                                        /*//First convert VIRTUAL EXECUTION results to actual.
                                        memory_tracker.memory_counter = temp_memory_tracker.memory_counter;
                                        memory_tracker.memory_map = temp_memory_tracker.memory_map;
                                        memory_tracker.owner_map = temp_memory_tracker.owner_map;*/
                                        //For each replica
                                        for (int replica_index = 0; replica_index < temporary_replica_list.size(); replica_index++) {
                                            //Store each replica's details in all our main tables and controls
                                            //Add file to list of shards
                                            User_File_shards ufshards = new User_File_shards();
                                            ufshards.file_shards = new ArrayList<User_File>();
                                            User_File_replicas ufreplicas = new User_File_replicas();
                                            ufreplicas.file_replicas = new ArrayList<User_File_shards>();
                                            //First check if user exists in our map. If not, create the user in the map.
                                            if (server.Mapping_table.containsKey(username)
                                                    && ((User_AllFiles) server.Mapping_table.get(username)).Files_Mapping_table.containsKey(filename)) {
                                                //User exists. File exists. Add replicas to user file
                                                //For each replica shard
                                                for (int shard_index = 0; shard_index < temporary_replica_list.get(replica_index).size(); shard_index++) {
                                                    long storage_machine = temporary_replica_list.get(replica_index).get(shard_index).replica_machine_indexno;
                                                    long storage_partition = temporary_replica_list.get(replica_index).get(shard_index).replica_partition_no;
                                                    long storage_partition_count_required = temporary_replica_list.get(replica_index).get(shard_index).replica_partition_count_required;
                                                    long storage_partition_shard_serial_no = temporary_replica_list.get(replica_index).get(shard_index).replica_shard_serial_no;


                                                    storage_partition_shard_checksum = ordered_replica_Checksum_list.get(replica_index).get(shard_index);
                                                    //Construct file
                                                    User_File uf = new User_File(
                                                            GetMachineIPAddressStrByMachineID(server, (int) storage_machine),
                                                            storage_partition, storage_partition_count_required,
                                                            storage_partition_shard_serial_no, storage_partition_shard_checksum);
                                                    ufshards.file_shards.add(uf);
                                                }
                                                server.Mapping_table.get(username).Files_Mapping_table.get(filename).file_replicas.add(ufshards);
                                            } else if (server.Mapping_table.containsKey(username)
                                                    && !((User_AllFiles) server.Mapping_table.get(username)).Files_Mapping_table.containsKey(filename)) {
                                                //User exists. File does not exist. Add file to user files
                                                //For each replica shard
                                                for (int shard_index = 0; shard_index < temporary_replica_list.get(replica_index).size(); shard_index++) {
                                                    long storage_machine = temporary_replica_list.get(replica_index).get(shard_index).replica_machine_indexno;
                                                    long storage_partition = temporary_replica_list.get(replica_index).get(shard_index).replica_partition_no;
                                                    long storage_partition_count_required = temporary_replica_list.get(replica_index).get(shard_index).replica_partition_count_required;
                                                    long storage_partition_shard_serial_no = temporary_replica_list.get(replica_index).get(shard_index).replica_shard_serial_no;
                                                    storage_partition_shard_checksum = ordered_replica_Checksum_list.get(replica_index).get(shard_index);
                                                    //Construct file
                                                    User_File uf = new User_File(
                                                            GetMachineIPAddressStrByMachineID(server, (int) storage_machine),
                                                            storage_partition, storage_partition_count_required,
                                                            storage_partition_shard_serial_no, storage_partition_shard_checksum);
                                                    ufshards.file_shards.add(uf);
                                                }
                                                //Add shards list to list of replicas
                                                ufreplicas.file_replicas.add(ufshards);
                                                //Now add the user file to the file holder


            server.Mapping_table.get(username).Files_Mapping_table.put(filename, ufreplicas);
        } else {
            // User does not exist. File does not exist. Add user, files, replicas.
            // For each replica shard
            for (int shard_index = 0; shard_index < temporary_replica_list.get(replica_index).size(); shard_index++) {
                long storage_machine = temporary_replica_list.get(replica_index).get(shard_index).replica_machine_indexno;
                long storage_partition = temporary_replica_list.get(replica_index).get(shard_index).replica_partition_no;
                long storage_partition_count_required = temporary_replica_list.get(replica_index).get(shard_index).replica_partition_count_required;
                long storage_partition_shard_serial_no = temporary_replica_list.get(replica_index).get(shard_index).replica_shard_serial_no;
                storage_partition_shard_checksum = ordered_replica_Checksum_list.get(replica_index).get(shard_index);
                // Construct File
                User_File uf = new User_File(GetMachineIPAddressStrByMachineID(server, (int) storage_machine),
                        storage_partition, storage_partition_count_required,
                        storage_partition_shard_serial_no, storage_partition_shard_checksum);
                ufshards.file_shards.add(uf);
            }
            // Add shards list to the list of replicas
            ufreplicas.file_replicas.add(ufshards);
            // Construct the files holder and add the file list to the user
            User_AllFiles uaf = new User_AllFiles();
            uaf.Files_Mapping_table = new HashMap<String, User_File_replicas>();
            uaf.Files_Mapping_table.put(filename, ufreplicas);
            // Add the user and his files to the main table
            server.Mapping_table.put(username, uaf);
        }
    }

    //-------------------------------------------- PROJECT 2 --------------------------------------------
    // First prepare metadata for the single file and store it on the central server


                    // (Run this up front and store it with every shard. We do not add a single
                    // variable for the whole file, since that would mean converting one of the
                    // lists to objects and changing it everywhere.)
                }
                temporary_replica_list = new ArrayList<List<temporary_shard_object>>();
                ordered_replica_Checksum_list = new ArrayList<List<BigInteger>>();
                continue;
            }
        }
    }
    server.main_function(server, connectionString);
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

public static String GetMachineIPAddressStrByMachineID(Server server, int Machine_Index) {
    // Get machine IP address
    String Machine_IP_Address = "";
    for (Data_Stores element : server.machine_addresses) {
        if (element.machine_id == Machine_Index) {
            Machine_IP_Address = element.machine_ipaddress;
        }
    }
    return Machine_IP_Address;
}

public static int GetMachineIDByIPAddress(Server server, String IP_Address) {
    // Get machine ID
    int Machine_ID = 0;
    for (Data_Stores element : server.machine_addresses) {
        if (element.machine_ipaddress.equals(IP_Address)) {
            Machine_ID = element.machine_id;
        }
    }
    return Machine_ID;
}
}

5.2.3 Code to use Daisy Chaining and pipelining to transfer the data
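Before the full listing: the proxy encodes the downstream hops of the daisy chain into a single ":::"-delimited command string (one "file:::ip:::port" entry per hop) and sends it to the first store machine, which forwards the shard to the next hop. A minimal, self-contained sketch of building that string; the class, method names, and port number here are illustrative, not taken from our implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class ChainStringSketch {
    // One downstream hop in the daisy chain: shard file name, target IP, target port.
    static String hop(String shardFile, String ip, int port) {
        return shardFile + ":::" + ip + ":::" + port + ":::";
    }

    // Build the full command the proxy sends to the first store machine.
    static String buildUploadCommand(String shardFile, long size, List<String> hops) {
        StringBuilder cmd = new StringBuilder("upload:::" + shardFile + ":::" + size + ":::");
        for (String h : hops) {
            cmd.append(h);
        }
        return cmd.toString();
    }

    public static void main(String[] args) {
        List<String> hops = new ArrayList<>();
        hops.add(hop("f_replica1.part0", "129.210.16.84", 5001));
        hops.add(hop("f_replica2.part0", "129.210.16.85", 5001));
        // Each receiving store machine strips its own entry and forwards the
        // shard to the next hop, so the transfer pipelines down the chain.
        System.out.println(buildUploadCommand("f_replica0.part0", 1024, hops));
    }
}
```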

// TRANSFER TO ALL REMOTE MACHINES BEFORE COMMITTING TO ANY OF OUR MEMORY TABLES.
// Cycle through all our shards and transfer using the virtual information.
// Hold file checksums for each shard.


List<List<BigInteger>> ordered_replica_Checksum_list = new ArrayList<List<BigInteger>>();
// For each replica
for (int replica_index = 0; replica_index < temporary_replica_list.size(); replica_index++) {
    // Create new lists for the transfer utility for each replica's shard list
    List<Integer> ordered_size_list = new ArrayList<Integer>();
    List<String> ordered_IPAddress_list = new ArrayList<String>();
    List<BigInteger> ordered_shard_Checksum_list = new ArrayList<BigInteger>();
    // First order the shards before breaking the actual physical file: stitch order
    // matters, so break order also matters. Since we break and name shards by their
    // shard_serial_no, downloads can use this stored shard number directly to rebuild the file.
    // For each shard
    for (int shard_index = 0; shard_index < temporary_replica_list.get(replica_index).size(); shard_index++) {
        // Search for the shard with this serial number
        for (int shard_searchindex = 0; shard_searchindex < temporary_replica_list.get(replica_index).size(); shard_searchindex++) {
            if (temporary_replica_list.get(replica_index).get(shard_searchindex).replica_shard_serial_no == shard_index) {
                // Get only the required particulars (from the matched index)
                storage_machine = temporary_replica_list.get(replica_index).get(shard_searchindex).replica_machine_indexno;
                storage_partition_count_required = temporary_replica_list.get(replica_index).get(shard_searchindex).replica_partition_count_required;
                // Store ordered sizes and addresses in the lists
                ordered_size_list.add((int) (storage_partition_count_required * no_of_hashvals_per_partition));
                ordered_IPAddress_list.add(GetMachineIPAddressStrByMachineID((int) storage_machine));
                break;
            }
        }
    }
    // DELETE SPLINTERED FILES LEFT BEHIND FROM THE LAST OPERATION
    File directory_old = new File(filepath + username + "/");
    if (directory_old.exists() && directory_old.isDirectory()) {
        for (File f_upload : directory_old.listFiles())
            if (f_upload.getName().startsWith(filename + "_replica"))
                f_upload.delete();
    }
    // BREAK FILE. Split the file to prepare for transfer...


    File_Splitter_Fn(filepath + username + "/" + filename, replica_index, ordered_size_list, filesize);
    // TRANSFER FILES
    // For each shard
    for (int shard_index = 0; shard_index < temporary_replica_list.get(replica_index).size(); shard_index++) {
        String temp_filename = filename + "_replica" + replica_index + ".part" + shard_index;
        // Transfer only the first replica. The rest will chain...
        if (replica_index == 0) {
            ServerSocketOperations remotedisksocketoperations = null;
            int conn_try_counter = 0;
            // Connect to the remote server
            while (true) {
                try {
                    // Create a new connection object (keep the old one to the client alive)
                    remotedisksocketoperations = new ServerSocketOperations();
                    remotedisksocketoperations.connect(ordered_IPAddress_list.get(shard_index),
                            GetServerPortNoByIPAddress(ordered_IPAddress_list.get(shard_index)));
                    break;
                } catch (UnknownHostException e) {
                    // Connection failed
                    conn_try_counter++;
                    Thread.sleep(5000);
                    System.out.println("RETRYING CONNECTION " + conn_try_counter + " OF 5 TIMES");
                    if (conn_try_counter >= 5) {
                        socketoperations.writeMessage("Could not connect to store server: "
                                + ordered_IPAddress_list.get(shard_index) + ". Cannot store the file. (Unknown Host)");
                        total_break_2 = 1;
                        break;
                    }
                } catch (IOException e) {
                    // Connection failed
                    conn_try_counter++;
                    Thread.sleep(5000);
                    System.out.println("RETRYING CONNECTION " + conn_try_counter + " OF 5 TIMES");
                    if (conn_try_counter >= 5) {
                        socketoperations.writeMessage("Could not connect to store server: "
                                + ordered_IPAddress_list.get(shard_index) + ". Cannot store the file. (IO Exception)");


                        total_break_2 = 1;
                        break;
                    }
                }
            }
            // Construct the transmission string: one "file:::ip:::port:::" entry per downstream hop
            String temp_chainstring = "";
            for (int rep_index = 1; rep_index < temporary_replica_list.size(); rep_index++) {
                String temp_file = filename + "_replica" + rep_index + ".part" + shard_index;
                String temp_IPAddress = GetMachineIPAddressStrByMachineID((int) temporary_replica_list.get(rep_index).get(shard_index).replica_machine_indexno);
                int temp_PortNo = GetServerPortNoByIPAddress(temp_IPAddress);
                temp_chainstring = temp_chainstring + temp_file + ":::" + temp_IPAddress + ":::" + temp_PortNo + ":::";
            }
            // Send the command
            File temp_f = new File(filepath + username + "/" + temp_filename);
            remotedisksocketoperations.writeMessage("upload:::" + temp_filename + ":::" + temp_f.length() + ":::" + temp_chainstring);
            // Upload the file
            remotedisksocketoperations.writeFile(filepath + username + "/" + temp_filename);
            // Read the result
            String response = remotedisksocketoperations.readMessage();
            // Disconnect
            remotedisksocketoperations.disconnect();
            // Kill the entire operation on failure
            if (response.equals("0")) {
                total_break_2 = 1;
            }
        }
        // Calculate the checksum of the shard and store it, in order, in the list
        BigInteger calculated_checksum = GenerateChecksum(filepath + username + "/" + temp_filename);
        ordered_shard_Checksum_list.add(calculated_checksum);
    }
    if (total_break_2 == 1) { break; }
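The GenerateChecksum helper invoked in the listing above is not shown here. A plausible sketch, assuming an MD5 digest over the shard's bytes returned as a BigInteger (matching the BigInteger checksum type in the listings; the actual implementation may use a different digest):

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumSketch {
    // Digest the whole shard file and return the hash as a positive BigInteger.
    static BigInteger generateChecksum(String filePath)
            throws IOException, NoSuchAlgorithmException {
        byte[] data = Files.readAllBytes(Paths.get(filePath)); // shards are small enough to buffer
        byte[] digest = MessageDigest.getInstance("MD5").digest(data);
        return new BigInteger(1, digest); // signum 1 keeps the value positive
    }
}
```

Returning the digest as a BigInteger makes the later string comparison against the store machines' Merkle logs straightforward.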


    // Add the checksum shard list to the checksum replica list
    ordered_replica_Checksum_list.add(ordered_shard_Checksum_list);
}
// DELETE SPLINTERED FILES AND THE ACTUAL FILE CREATED IN THIS OPERATION
File directory_new = new File(filepath + username + "/");
if (directory_new.exists() && directory_new.isDirectory()) {
    for (File f_upload : directory_new.listFiles())
        if (f_upload.getName().startsWith(filename))
            f_upload.delete();
}

5.2.4 Code to check for consistency using modified Merkle trees

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ServerConsistencySlave extends Thread {
    int consistency_time_interval;
    String filepath;
    String consistency_filepath;
    String merkel_filename = "";
    HashMap<String, String> Store_Machine_Entry_table = new HashMap<String, String>();
    List<Data_Stores> machine_addresses;
    HashMap<String, User_AllFiles> Mapping_table;

    ServerConsistencySlave(int p_consistency_time_interval, String p_filepath,
            HashMap<String, User_AllFiles> p_Mapping_table, List<Data_Stores> p_machine_addresses) {
        // Set time interval
        consistency_time_interval = p_consistency_time_interval;
        // Set filepath
        filepath = p_filepath;
        consistency_filepath = p_filepath + "consistency/";
        // Set server machine addresses


        machine_addresses = p_machine_addresses;
        // Export mapping table
        Mapping_table = p_Mapping_table;
    }

    public void run() {
        System.out.println("CONSISTENCY THREAD STARTED. RUNNING EVERY " + consistency_time_interval + " MILLISECONDS");
        int busy_flg = 0;
        try {
            // Keep looping while the program runs...
            while (true) {
                // Wait
                Thread.sleep(consistency_time_interval);
                // Check whether a cleanup operation is still going on...
                if (busy_flg == 0) {
                    // Set busy flag
                    busy_flg = 1;
                    // Call all store servers for data...
                    for (Data_Stores element : machine_addresses) {
                        merkel_filename = "log_MerkelDetails_" + element.machine_ipaddress + ".txt";
                        ServerSocketOperations remotedisksocketoperations = null;
                        int total_break_2 = 0;
                        int conn_try_counter = 0;
                        // Connect to the remote server
                        while (true) {
                            try {
                                // Create a new connection object (keep the old one to the client alive)
                                remotedisksocketoperations = new ServerSocketOperations();
                                remotedisksocketoperations.connect(element.machine_ipaddress, element.machine_portno);
                                break;
                            } catch (UnknownHostException e) {
                                // Connection failed
                                conn_try_counter++;
                                Thread.sleep(5000);


                                System.out.println("RETRYING CONNECTION " + conn_try_counter + " OF 5 TIMES");
                                if (conn_try_counter >= 5) {
                                    total_break_2 = 1;
                                    break;
                                }
                            } catch (IOException e) {
                                // Connection failed
                                conn_try_counter++;
                                Thread.sleep(5000);
                                System.out.println("RETRYING CONNECTION " + conn_try_counter + " OF 5 TIMES");
                                if (conn_try_counter >= 5) {
                                    total_break_2 = 1;
                                    break;
                                }
                            }
                        }
                        // If the connection succeeded
                        if (total_break_2 == 0) {
                            // Ask the store server to send its Merkle data file
                            remotedisksocketoperations.writeMessage("merkel_data" + ":::" + merkel_filename);
                            // Download the file
                            remotedisksocketoperations.readFile(filepath + merkel_filename, 0);
                        }
                    }
                    // Feed the data from ALL machines into a hashmap for quick lookup...
                    for (Data_Stores element : machine_addresses) {
                        merkel_filename = "log_MerkelDetails_" + element.machine_ipaddress + ".txt";
                        int tempPortNo = GetServerPortNoByIPAddress(element.machine_ipaddress);
                        String command_parts[];
                        File logFile_MerkelFile = new File(filepath + merkel_filename);
                        try (BufferedReader br = new BufferedReader(new FileReader(logFile_MerkelFile))) {
                            String line;
                            while ((line = br.readLine()) != null) {
                                // Process the line
                                command_parts = line.split(";");
                                // Filename


                                String[] command_parts_Filename = command_parts[0].split(":");
                                // Checksum
                                String[] command_parts_Checksum = command_parts[1].split(":");
                                Store_Machine_Entry_table.put(command_parts_Filename[1],
                                        command_parts_Checksum[1] + ";" + element.machine_ipaddress + ";" + tempPortNo);
                            }
                        }
                    }
                    // Now that all data files from all machines are received, do Merkle checks for every file.
                    // Loop through users
                    for (Map.Entry<String, User_AllFiles> entryMappingTable : Mapping_table.entrySet()) {
                        String entry_username = entryMappingTable.getKey();
                        // Loop through the files of this user
                        for (Map.Entry<String, User_File_replicas> entryFile : entryMappingTable.getValue().Files_Mapping_table.entrySet()) {
                            String entry_filename = entryFile.getKey();
                            // Loop through all replicas of a file
                            for (int replica_index = 0; replica_index < entryFile.getValue().file_replicas.size(); replica_index++) {
                                // Loop through all shards of a replica
                                for (int shard_index = 0; shard_index < entryFile.getValue().file_replicas.get(replica_index).file_shards.size(); shard_index++) {
                                    String original_tempfilename = entry_filename + "_replica" + replica_index + ".part"
                                            + entryFile.getValue().file_replicas.get(replica_index).file_shards.get(shard_index).shard_serial_no;
                                    // Look up the checksum in the hashtable
                                    String command_parts[] = Store_Machine_Entry_table.get(original_tempfilename).split(";");
                                    String original_tempchecksum = command_parts[0];
                                    String original_IP = command_parts[1];
                                    String original_Port = command_parts[2];
                                    if (original_tempchecksum.equals(entryFile.getValue().file_replicas.get(replica_index).file_shards.get(shard_index).calculated_checksum.toString())) {
                                        // Chunk is consistent; nothing to do.
                                    } else {
                                        int found_flg = 0;
                                        String alternate_tempfilename = "";
                                        String alternate_tempchecksum = "";
                                        String alternate_IP = "";
                                        String alternate_Port = "";
                                        // Find a coherent partition
                                        for (int temp_replica_index = 0; temp_replica_index < entryFile.getValue().file_replicas.size(); temp_replica_index++) {
                                            alternate_tempfilename = entry_filename + "_replica" + temp_replica_index + ".part"
                                                    + entryFile.getValue().file_replicas.get(replica_index).file_shards.get(shard_index).shard_serial_no;
                                            // Look up the checksum in the hashtable
                                            command_parts = Store_Machine_Entry_table.get(alternate_tempfilename).split(";");
                                            alternate_tempchecksum = command_parts[0];
                                            alternate_IP = command_parts[1];
                                            alternate_Port = command_parts[2];
                                            if (alternate_tempchecksum.equals(entryFile.getValue().file_replicas.get(replica_index).file_shards.get(shard_index).calculated_checksum.toString())) {
                                                // Found the first good replica. Transfer the file from here.
                                                found_flg = 1;
                                                break;
                                            }
                                        }
                                        // Create the consistency directory if it does not exist
                                        File newDir = new File(filepath + "consistency");


                                        if (!newDir.exists()) {
                                            newDir.mkdir();
                                        }
                                        if (found_flg == 1) {
                                            // Rectify the file: drop the corrupt shard, fetch a good copy
                                            TransferShardToStoreMachine("delete", original_IP, Integer.parseInt(original_Port),
                                                    filepath + original_tempfilename, original_tempfilename, 0, 1);
                                            TransferShardToStoreMachine("download", alternate_IP, Integer.parseInt(alternate_Port),
                                                    consistency_filepath + alternate_tempfilename, alternate_tempfilename, 0, 1);
                                            // Rename the file for the original machine and then upload it
                                            File file1 = new File(consistency_filepath + alternate_tempfilename);
                                            File file2 = new File(consistency_filepath + original_tempfilename);
                                            boolean success = file1.renameTo(file2);
                                            TransferShardToStoreMachine("upload", original_IP, Integer.parseInt(original_Port),
                                                    consistency_filepath + original_tempfilename, original_tempfilename, 0, 1);
                                            file1 = new File(consistency_filepath + original_tempfilename);
                                            file1.delete();
                                        } else {
                                            System.out.println("MESSAGE FROM CONSISTENCY THREAD: ALL COPIES HAVE FAILED.");
                                        }
                                    }
                                }
                            }
                        }
                    }
                    // Process complete
                    busy_flg = 0;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public void TransferShardToStoreMachine(String operation, String IPAddress, int PortNo,
            String total_filepath, String total_filename, int filesize, int called_by)
            throws IOException, InterruptedException {
        // TRANSFER FILES


        System.out.println("CALLED BY :" + called_by + ":::" + operation + ":::" + IPAddress + "/" + PortNo
                + ":::" + total_filepath + ":::" + total_filename + ":::" + filesize);
        ServerSocketOperations remotedisksocketoperations = null;
        int conn_try_counter = 0;
        // Connect to the remote server
        while (true) {
            try {
                // Create a new connection object (keep the old one to the client alive)
                remotedisksocketoperations = new ServerSocketOperations();
                remotedisksocketoperations.connect(IPAddress, PortNo);
                break;
            } catch (UnknownHostException e) {
                // Connection failed
                conn_try_counter++;
                Thread.sleep(5000);
                System.out.println("RETRYING CONNECTION " + conn_try_counter + " OF 5 TIMES");
                if (conn_try_counter >= 5) {
                    break;
                }
            } catch (IOException e) {
                // Connection failed
                conn_try_counter++;
                Thread.sleep(5000);
                System.out.println("RETRYING CONNECTION " + conn_try_counter + " OF 5 TIMES");
                if (conn_try_counter >= 5) {
                    break;
                }
            }
        }
        if (operation.equals("download")) {
            // Send command
            remotedisksocketoperations.writeMessage("download:::" + total_filename + ":::" + "NA");
            // Download the incoming file
            System.out.println("Downloading File From Old Remote Machine:" + IPAddress + "/" + PortNo + "... and file:" + total_filepath);
            remotedisksocketoperations.readFile(total_filepath, 0);
        }


        if (operation.equals("upload")) {
            // Send command
            remotedisksocketoperations.writeMessage("upload:::" + total_filename + ":::" + "NA");
            // Upload the file
            System.out.println("Uploading File To New Remote Machine:" + IPAddress + "/" + PortNo + "... and file:" + total_filepath);
            remotedisksocketoperations.writeFile(total_filepath);
        }
        if (operation.equals("delete")) {
            // Send command
            remotedisksocketoperations.writeMessage("delete:::" + total_filename);
            System.out.println("Deleting File From Old Remote Machine:" + IPAddress + "/" + PortNo + "... and file:" + total_filepath);
        }
        // Disconnect
        remotedisksocketoperations.disconnect();
    }

    public int GetServerPortNoByIPAddress(String IP_Address) {
        // Get machine port number
        int Machine_PortNo = -1;
        for (Data_Stores element : machine_addresses) {
            if (element.machine_ipaddress.equals(IP_Address)) {
                Machine_PortNo = element.machine_portno;
            }
        }
        return Machine_PortNo;
    }
}


6. Data analysis and discussion

6.1 Output generation

Starting a store server

Starting proxy server

Starting a shadow master

Starting a client

Shadow server checks every 20 seconds for proxy server


Uploading a file. Three replicas stored using daisy chaining and pipelining

Client requesting upload

Proxy server getting upload request and processing

Store machine 129.210.16.86 receiving the replica and forwarding it to 129.210.16.84 through pipelining

Store machine 129.210.16.84 receiving the replica and forwarding it to 129.210.16.85 through pipelining


Corrupting the file on 129.210.16.85. Observe the size differences in the ls -ltr output

After corruption

Consistency check by the server

After correction by daemon process in proxy server running consistency check

Server interrupted. The shadow server tries to establish a connection every 20 seconds; if it is not successful in three attempts, it begins to function as the proxy server.
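The takeover behavior described above can be sketched as follows. The 20-second interval and three attempts come from the text; the class name, host/port handling, and socket timeout are assumptions, not our actual implementation:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ShadowMasterSketch {
    // Heartbeat: try to open a TCP connection to the proxy within a timeout.
    static boolean proxyAlive(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Returns true once the shadow should promote itself to proxy.
    static boolean shouldTakeOver(String proxyHost, int proxyPort) throws InterruptedException {
        int failures = 0;
        while (failures < 3) {                       // three attempts, per the text
            if (proxyAlive(proxyHost, proxyPort, 2000)) {
                return false;                        // proxy is up; keep shadowing
            }
            failures++;
            Thread.sleep(20_000);                    // check every 20 seconds
        }
        return true;                                 // promote shadow to proxy
    }

    public static void main(String[] args) {
        // With no proxy listening locally, a single heartbeat fails fast.
        System.out.println(proxyAlive("127.0.0.1", 1, 500));
    }
}
```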


The client now makes a download request to the shadow server, which is functioning as the proxy

At the new proxy (the former shadow server)


6.2 Output analysis and comparison

Feature                                                        Old implementation              Our implementation
Time to upload a file (20 MB)                                  25 seconds                      15 seconds
Number of socket connections established to check consistency  O(number of file partitions)    O(number of disks)

Table 1: Comparison of results
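The second row of Table 1 follows directly from the consistency design in section 5.2.4: the old implementation opened one socket per stored shard, whereas our daemon pulls a single Merkle log per store machine. A toy illustration of the counts (the numbers are purely illustrative):

```java
public class ConnectionCountSketch {
    // Old design: one socket per shard (partitions = machines * shards per machine).
    static int perShardConnections(int machines, int shardsPerMachine) {
        return machines * shardsPerMachine;
    }

    // Our design: one socket per store machine, regardless of shard count.
    static int perMachineConnections(int machines) {
        return machines;
    }

    public static void main(String[] args) {
        // With 3 store machines holding 100 shards each:
        System.out.println(perShardConnections(3, 100)); // 300 sockets in the old design
        System.out.println(perMachineConnections(3));    // 3 sockets in ours
    }
}
```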


7. Conclusion and recommendations

7.1 Summary

We built a simplified cloud storage model with optimizations that save time. Upload speed improved markedly with the implementation of pipelining and daisy chaining, and the number of comparisons and computations needed to maintain consistency decreased through our modified version of the hash tree.

7.2 Future Work

Our present implementation of the cloud storage model assumes exactly three replicas; future work could extend this to n replicas, or assign different replica counts to individual users or files. The current disk-level hash tree is recomputed from scratch over all the files in a directory on every check. A variation of the hash tree could be implemented in which new files are simply folded into the previous computation.
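The incremental variant proposed above can be sketched as follows: keep the leaf hashes from the previous run, append a hash for each new file, and recombine pairwise up to the root instead of rehashing every file. MD5 is assumed, and the class and method names are illustrative, not from our implementation:

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class IncrementalHashTreeSketch {
    // Cached leaf hashes from previous runs; only new files get hashed.
    private final List<byte[]> leaves = new ArrayList<>();

    static byte[] md5(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(data);
    }

    // Add one new file's hash without touching the existing leaves.
    void addLeaf(byte[] fileContents) throws NoSuchAlgorithmException {
        leaves.add(md5(fileContents));
    }

    // Recombine pairwise up to the root; an odd node is paired with itself.
    BigInteger root() throws NoSuchAlgorithmException {
        List<byte[]> level = new ArrayList<>(leaves);
        while (level.size() > 1) {
            List<byte[]> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                byte[] left = level.get(i);
                byte[] right = (i + 1 < level.size()) ? level.get(i + 1) : left;
                byte[] both = new byte[left.length + right.length];
                System.arraycopy(left, 0, both, 0, left.length);
                System.arraycopy(right, 0, both, left.length, right.length);
                next.add(md5(both));
            }
            level = next;
        }
        return level.isEmpty() ? BigInteger.ZERO : new BigInteger(1, level.get(0));
    }
}
```

This keeps the per-check cost proportional to the number of interior nodes recombined rather than the total bytes stored on the disk.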


8. Bibliography

[1] S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google File System" (Google).

[2] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's Highly Available Key-value Store" (Amazon.com), SOSP '07, October 14-17, 2007, Stevenson, Washington, USA.

[3] D. Klan, K.-U. Sattler, K. Hose, and M. Karnstedt, "Decentralized Managing of Replication Objects in Massively Distributed Systems," Proceedings of the 2008 International Workshop on Data Management in Peer-to-Peer Systems (ACM).

[4] A. Boukerche and R. E. De Grande, "Dynamic Load Balancing Using Grid Services for HLA-Based Simulations on Large-Scale Distributed Systems," Proceedings of the 2009 13th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications.

[5] S. Penmatsa and A. T. Chronopoulos, "Dynamic Multi-User Load Balancing in Distributed Systems," IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2007.