
MASTER'S THESIS

Testing as a Service for Machine to Machine Communications

Jorge Vizcaíno
2014

Master of Science (120 credits)
Computer Science and Engineering

Luleå University of Technology
Department of Computer Science, Electrical and Space Engineering

Testing as a Service for Machine to Machine Communications

Jorge Vizcaíno

January 2014

CONTENTS

Chapter 1 – Introduction
1.1 Background
1.2 Problem statement
1.3 Method
1.4 Delimitations
1.5 Outline

Chapter 2 – Related work
2.1 Communication protocols
2.1.1 HTTP protocol
2.1.2 IP
2.1.3 Ethernet
2.1.4 UDP protocol
2.1.5 TCP protocol
2.2 Network Performance Metrics
2.2.1 RTT
2.2.2 Jitter
2.2.3 Latency
2.2.4 Bandwidth
2.2.5 Throughput
2.3 Tools strongly associated with this thesis
2.3.1 Network Tools
2.3.2 Programming language
2.3.3 Operating Systems

Chapter 3 – Traffic Test
3.1 Client Server Application
3.2 Loading test with proxy
3.3 Loading test with several clients
3.4 Performance results

Chapter 4 – Traffic Pattern Extraction
4.1 Collecting packet data
4.2 Replaying traffic pattern

Chapter 5 – Multiplying Traffic Pattern
5.1 Design of TaaS for M2M communications
5.2 Reproduce testing
5.3 Results of traffic recreation

Chapter 6 – Summary and Conclusions
6.1 Summary and Results
6.2 Future Work

Chapter 7 – Appendices

Acknowledgements

I would like to offer a word of thanks to my supervisor, Laurynas Riliskis, for helping me to carry out this project. Thanks to his deep knowledge of this subject, he could give me much useful advice and was able to resolve several questions I had during this thesis.


Abstract

In recent years, cloud computing and Software-as-a-Service (SaaS) have become increasingly important due to the many advantages that they provide. Therefore, the demand for cloud testing infrastructures is increasing as well. Analysis and testing of cloud infrastructures are important and required for their effective functioning. This is where Testing-as-a-Service (TaaS) comes in, providing an infrastructure along with tools for testing in the cloud and evaluating performance and scalability. TaaS can offer several kinds of cloud testing, such as regression testing, performance testing, security testing, scalability testing, and so on. In this thesis, TaaS concerns network testing with the main goal of determining the performance of a server. To achieve this goal, the thesis involves mostly performance and scalability testing. We created a TaaS system that uses a different method to test the network: it is based on recreating a traffic pattern extracted from simulations and multiplying this pattern to stress a server. All of this is carried out in the Amazon Cloud. In this way we can find the server limits, build a theoretical foundation, and prove the feasibility of the approach. The recreated traffic must be as similar as possible to the traffic extracted from the simulations. To determine this similarity, we compared graphs of the number of bytes over time in a simulation and in a session where the traffic was recreated; the more similar they are, the more accurate and better the results. With the results obtained from this method we can compare the network traffic created by different numbers of data sources and carried out on different types of instances. Several measurements, such as packet loss, round trip time, and bytes per second, are analyzed to determine the performance of the server. The work done in this thesis can be used to determine server limitations and to estimate the number of clients that could use the same server at once.


CHAPTER 1

Introduction

1.1 Background

Cloud computing [1] provides access over the network to different resources, such as software, servers, storage, and so on, in an efficient way. Clients can access these services on their own, without human interaction, since everything is done automatically. All applications are offered over the Internet; therefore users can access them from any location and with different electronic devices. The capacity of cloud computing can be easily modified in order to supply its services properly to the clients, regardless of their number. Moreover, applications can be monitored and analyzed to give information about their condition to both user and provider.

The cloud structure can be divided into two parts: the front end, which is the part the user can see, and the back end, which involves the computers, servers, and networks that are part of the cloud [2]. Moreover, a main server takes charge of the cloud structure, ensuring a good service depending on the number of clients.

Nowadays TaaS [3] is very significant, as it implies cost sharing of computing resources, cost reduction, scalable test structures, and testing service availability at any time. Moreover, TaaS provides a pay-as-you-test model for customers. All these characteristics make TaaS an efficient way of testing in the cloud. The main reason why clients would be interested in TaaS is the fact that such a system can report on several significant software and network features, such as functionality, reliability, performance, safety, and so on. In order to measure these characteristics, there are several types of tests for services in the cloud. This thesis is mainly focused on performance testing [4]. These tests are usually carried out to provide information about speed, scalability, and stability. It is very common to use performance testing to find out the performance of software before it comes out to the market, to ensure it will meet all the requirements to run efficiently. Performance testing, more specifically, can be divided into several kinds of tests. The kinds most related to this thesis are load testing, to find out the behaviour of the server under traffic loads, and scalability testing, to also determine the performance and reliability of the server when the load increases.


The process of developing a performance test involves the following steps [4]:

1. Identify your testing environment: it is necessary to know the physical environment where the test will be developed, as well as the testing tools required.

2. Identify the performance criteria: this includes limits on response times and other values that the simulations must meet for the performance to be considered good enough to offer a reliable service.

3. Design the performance tests: cover all the different cases that could arise for the service or application.

4. Configure the environment: prepare the environment and tools before starting the simulations.

5. Implement the test design: develop performance tests suitable for the test design.

6. Run the tests: start the simulations and record the values of the test.

7. Analyze the tests: look into the results to check the performance of the service.

Performance testing will ensure cloud services so that applications run properly. These are the most recommended steps for developing TaaS [3], and we have taken them into account in this work. This service provides good features such as elasticity, safety, easy handling, a reliable environment, and flexibility when choosing options regarding instance storage.

1.2 Problem statement

Nowadays TaaS [3] is very common due to the wide use of internet clouds and the large number of applications provided on them. Therefore, we found it interesting to use this concept to create a new approach for testing a particular scenario. In this thesis we focused on developing a different method to apply TaaS in an M2M framework. In addition, this project could be modified to test different scenarios in further research; for example, it would be possible to add more servers by increasing the number of instances in the scripts used in this project.

The acronym M2M [5] can have different meanings, such as Machine-to-Machine, Machine-to-Man, Machine-to-Mobile, and so on. However, M2M does have a clear goal, which is to allow the exchange of information over a communication network between two end points.

When it comes to testing networks, it is necessary to do so with the traffic that is expected to go through that network whenever it is used. There are two different ways to do this [6]. The first one is to simulate the type of traffic that is supposed to go over the network. TaaS systems use this option, allowing TaaS users to configure the test simulations according to their needs. To configure these simulations, tools such as Selenium and JMeter are used in the cloud [3].

The second way to test networks is to replay recorded network traffic. The purpose of this thesis is to apply this second way to create a TaaS system for testing networks in the cloud. In order to replay recorded traffic, we followed a method based on a replay attack [7], which is explained in the next section.

In this way we created a TaaS system that can estimate network performance using a different method from the systems already made. First, we must also configure the simulations to test the server. However, the main difference in our method is that we then extract the traffic pattern from those simulations in order to multiply it from a black box, so that we can stress the server. This is an interesting method because, since we precisely recreate a real exchange of traffic, the results are very realistic and accurate. Finally, we had to prove the feasibility of the method applied.

The TaaS system was developed for testing in the Amazon Cloud [8], which allowed us to set up the whole scenario easily and use different types of instances. These instances differ in features such as memory, storage, network performance, and so on [9]. Therefore, it was interesting to compare results when we picked one sort of instance or another.

1.3 Method

The method followed during this thesis is divided into three steps. First, we set up a proxy between client and server to extract the traffic. Then we had to figure out a way to recreate this traffic, and finally to replay it M2M to test the server.

In the following paragraphs the method is described in detail. The first step consisted of setting up a client-proxy-server scenario in the cloud. Then we could run simulations to look into the behaviour of the packets going over the network. Afterwards, we could check how the network performance changed when we varied some factors (number of clients, type of instance, etc.) in this network. The packets were sniffed in the proxy with the tool tshark [10]. Once we had some knowledge about the simulated network, we could start developing a method to extract a traffic pattern from those simulations. We had to take into account that the script must obtain the traffic pattern properly, so that when it comes to recreating the same traffic M2M, the behaviour of the packets is as similar as possible to the original simulation. To achieve this goal, we had to extract the data sent and the timestamps of the packets with high precision.

Once the pattern from the simulations was extracted, we moved on to the third and last step, where we multiplied the traffic pattern, scaling up the number of clients. In this way, large traffic load recreations were carried out to test the server limits and find out how the server could handle heavy traffic loads. These data sources sent the extracted pattern directly to the server in an M2M framework. Finally, when we obtained the final results, we could determine the server performance and the feasibility of the approach developed.

The method carried out is a kind of replay attack [7], where there is a "man-in-the-middle" (Wireshark sniffing in the proxy) which intercepts the traffic. This traffic is then replayed, pretending to be the original sender, in order to create problems for the host server. In our thesis, this traffic is scaled up from a multiplier to stress the software and find its limits.

Regarding the traffic pattern extraction, several tools were studied and two methods to extract and replay traffic were considered. The first method consisted of modifying the pcap files previously recorded in the proxy so that the packets could be sent straight to the server from the data sources. Tcprewrite [11] was the tool used to modify the pcap file. The next step was to use another tool to recreate the traffic contained in the new pcap. One of them was Scapy [12], which sent the packets but not with the appropriate timing. Another tool used was Tcpreplay [13], but it was not possible to simulate the server, since this tool does not work at the transport level; therefore we could not establish a valid TCP connection [14]. Finally, we used the second method, which is based on programming our own script to extract the traffic pattern and recreate it later on. This approach was much trickier but also much more suitable, as well as completely automatic. With this method it was not necessary to do anything by hand (such as modifying the pcap file from the console); we just needed to type a few options, such as the name of the file to replay or the server instance type. After programming the script to extract the pattern, we needed to replay it somehow. We tried again with Scapy [12], a very good tool when it comes to creating and configuring packets. However, there were problems receiving segments coming from the server, since it was necessary to work with sequence and acknowledgement numbers. This was therefore extremely hard, and in the end sockets were used again to replay the pattern.
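To make the idea concrete, the following is a minimal sketch of how such a socket-based replay could look. It is illustrative only: the pattern format, function name, and example payloads are assumptions, not the actual thesis script; the server hostname and port reuse the values appearing in the captures later in the thesis.

import socket
import time

def replay_pattern(pattern, server_addr):
    # Replay a list of (time_offset_seconds, payload_bytes) tuples over one TCP connection.
    sock = socket.create_connection(server_addr)
    start = time.time()
    try:
        for offset, payload in pattern:
            # Sleep until the packet's original offset from the start of the capture
            delay = offset - (time.time() - start)
            if delay > 0:
                time.sleep(delay)
            sock.sendall(payload)
            sock.recv(4096)          # read the server's answer, as in the original exchange
    finally:
        sock.close()

if __name__ == "__main__":
    # Hypothetical pattern: send "hello" at t=0 s and "world" at t=0.5 s
    example = [(0.0, b"hello"), (0.5, b"world")]
    replay_pattern(example, ("ec2-54-228-99-43.eu-west-1.compute.amazonaws.com", 50007))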

A diagram of the whole system developed in this thesis is shown in Figure 1.1. The top of the figure (network traffic simulation) refers to the first part of the thesis, where we had to set up a client-proxy-server communication in the cloud. With this scenario we could run the simulations to exchange packets. Below we can see the next steps: the traffic was recorded in the proxy for further analysis and pattern extraction. Finally, with the pattern obtained, we came to the last part of the thesis, shown at the bottom of Figure 1.1 (traffic pattern recreation). In this part we set up a multiplier composed of many data sources, which recreated the traffic pattern towards the same server. In this way we could determine the server performance when handling heavy traffic loads.

1.4 Delimitations

During this thesis we have made some delimitations. The project could be improved to cover most of these aspects in further research.

The TaaS system developed can only function in the Amazon Cloud, because the main library used to program this system works only for this particular cloud. The goal of this project is testing M2M communications; we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding server performance; therefore the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system uses TCP sockets to test servers; we cannot, for instance, use HTTP requests for testing.

1.5 Outline

The thesis is organized as follows.

The introduction is in Chapter 1. Chapter 2 describes related work. Chapter 3 describes the simulations and analysis of the data source, proxy, and server scenario. Traffic pattern extraction is described in Chapter 4. The design of TaaS [3] for M2M communications and the results achieved with this system are in Chapter 5. A summary of the whole thesis, the main results obtained, and future work are in Chapter 6. Finally, Chapter 7 includes the appendices.


Figure 1.1: Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this section the main protocols are explained, since they are essential for analyzing a client-server communication. Moreover, some significant metrics needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

2.1 Communication protocols

In order to develop this thesis, it was crucial to describe the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects used by the different OSI model layers in order to establish a communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface, offered to the other objects on the same machine that want to use the service provided by this protocol. The other interface is called the peer interface and is used to communicate with its counterpart on another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole picture in which they all take place. To avoid a system becoming too complex, it is necessary to add levels of abstraction. In network systems this is also applied, creating layers, each with distinct functions. In this way, the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it is not necessary to modify all the parts but only the one where the service will be introduced. In networks, the architecture chosen is named the OSI model [15], and networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in Figure 2.1.

Figure 2.1: OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star, and so on). This layer also determines the voltage and frequency that signals will use. The data link layer transmits the data from upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data in the network. To ensure packets get to the destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (the segmentation process). The most significant transport protocols are TCP and UDP, which we will talk about later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

2.1.1 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative, hypermedia information systems. This network protocol is used for communication among users, proxies, or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is linked to the transmission of resources. A resource is a network data object that can be identified by a URI; normally these resources are either files or outputs of a script. Each HTTP message has the general form shown in Listing 2.1 [15].

Listing 2.1: HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line shows whether this is a request or a response message. The next lines provide parameters and options for the message. There are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol, and the port used by default is 80, although it is possible to use other ports. Important HTTP request methods are GET and POST [16], which are used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis, named CONNECT [16]. This method is used to send data through a proxy that can act as a tunnel. Therefore, we needed this method to establish a connection between client and server through the proxy.
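As an illustration of how CONNECT can be used from code, the sketch below opens a tunnel through a proxy with a plain TCP socket. It is an assumption-laden example: the proxy address and port (3128, Squid's default) are placeholders, while the target host and port are taken from Listing 2.2.

import socket

PROXY = ("10.235.11.67", 3128)   # hypothetical proxy address; 3128 is Squid's default port
TARGET = "ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007"  # host:port from Listing 2.2

sock = socket.create_connection(PROXY)
# Ask the proxy to open a tunnel to the target, as in Listing 2.2
sock.sendall(("CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (TARGET, TARGET)).encode())
reply = sock.recv(4096)
print(reply.splitlines()[0])      # expect something like b'HTTP/1.1 200 Connection established'
# From here on, bytes written to sock go straight to the target through the tunnel
sock.close()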

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request is sent, the client disconnects from the server, and the server has to enable the connection again. As a result, client and server know that there is a connection between them only during a request; therefore they cannot keep information about the requests. Any kind of data can be transmitted by HTTP, as long as client and server know how to manage the data. A typical example of an HTTP request is shown in Figure 2.2.

Figure 2.2: HTTP request

To set up a communication with this protocol, a client must open a connection, sending a request message to the server, which returns a response message. Afterwards the server closes the connection. First of all, we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource, called the URI. The third is the version of HTTP that is being used. This idea can be clearly seen in Listing 2.2. This example was extracted from the simulations made during the thesis.

Listing 2.2: HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided into three parts. The first part contains the version of HTTP used for the communication. Afterwards there is a code [15] for the computer to understand the result of the request; the first digit indicates the class of response. The codes are shown in Listing 2.3.

Listing 2.3: HTTP request result


1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or sentence in English describing the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header carries information about either the request, the response, or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4: HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files to be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about the body. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which gives the number of bytes used in the body.

2.1.2 IP

The Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information goes to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides an unreliable datagram delivery; it is therefore not guaranteed that the datagrams reach their destinations. In addition, this service model may cause further problems, since packets can be delivered out of order as well as reach the destination more than once.


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3: Fields of the IP header

The first field is the Version field, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The Total Length field indicates the length in bytes (unlike Header Length, where the length is counted in words) of the whole datagram. When it comes to the Identification field, the sender always marks each IP datagram with an ID number before transmission. The goal is to have unique datagrams, so that if several fragments arrive at the destination, all with the same ID value, the destination host can put the received fragments together; if some fragment does not arrive, all the fragments with the same number are discarded. In the next field there are up to three flags. The first flag has no use for now and is set to 0. The flag D (Don't Fragment), when set to 1, indicates that the datagram must not be fragmented into smaller pieces. The flag M indicates whether the datagram received is the last one of the stream (set to 0) or whether there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of each fragment within the original datagram, so that the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may be on the network before being discarded. The main goal of this function is to discard datagrams that are within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header and the data during the transmission. To send the packet it is required to fill the Source Address field with the IP address of the sender, as well as to fill the Destination Address field with the IP address of the receiver. There is also a field to set up options, if they are required, and a Padding field filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides host-to-host service across many different networks with diverse technology, it is necessary to manage datagrams so they can travel over all the networks. There are two choices available to figure this problem out [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet in any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. This second option is also the one used in the Amazon networks where we ran the tests. It is important to know how the segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to make.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If the packets happen to go over some network with a smaller MTU, it is necessary to use fragmentation. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram is fragmented into three packets. The first two packets contain 512 bytes of data plus another 20 bytes for the header each. Therefore there are 376 bytes left (1400 − 2·512), so the last datagram carries those 376 bytes of data plus 20 bytes for the header. The result is shown in Figure 2.4.

It should be noted that the amount of data bytes in each packet must always be a multiple of 8. During this process, the router sets the M bit in the flags of the first and second datagrams to indicate that there are more packets coming. As regards the offset field, in the first packet it is set to 0, because this datagram carries the first part of the original packet, whereas the second datagram has the offset set to 64, since its first byte of data is the 513th (512/8 = 64).
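The fragmentation arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch of the offset and flag bookkeeping under the stated assumptions (20-byte header, payload rounded down to a multiple of 8), not code from the thesis.

def fragment_sizes(total_len, mtu, ip_header=20):
    # Split an IP datagram of total_len bytes (header included) into fragments that
    # fit the MTU, mirroring the 1420-byte / 532-byte-MTU example above.
    payload = total_len - ip_header
    max_data = (mtu - ip_header) // 8 * 8          # 512 bytes for an MTU of 532
    fragments, offset = [], 0
    while payload > 0:
        data = min(max_data, payload)
        more = payload > data                      # M flag: more fragments follow
        fragments.append({"offset": offset // 8, "data": data, "M": int(more)})
        payload -= data
        offset += data
    return fragments

print(fragment_sizes(1420, 532))
# [{'offset': 0, 'data': 512, 'M': 1}, {'offset': 64, 'data': 512, 'M': 1}, {'offset': 128, 'data': 376, 'M': 0}]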


Figure 2.4: Datagram fragmentation

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link-layer network. To develop a mapping between link-layer addresses and IP addresses, it is necessary to use the Address Resolution Protocol (ARP), so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out whether it already has the link-layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, carrying the sender's own IP and link-layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the requested IP address with their own IP address. The servers with different IP addresses drop the packet, but the receiver we are looking for sends an ARP reply message to the client. This server also updates its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two different types of sublayer. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called the bridge entity, which provides an interface between LANs that may be using the same protocol (for instance Ethernet to Ethernet) or different protocols.

Figure 2.5: ARP request

Figure 2.6: Ethernet layers in the OSI model

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as of analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering from communication errors.

The physical layer enables the communication between the data link layer and the corresponding physical layer of other systems. In addition, this layer provides significant physical features of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer can encode and decode bits between binary and phase-encoded form. Regarding access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

2.1.4 UDP protocol

The User Datagram Protocol (UDP) is an IP standard defined in RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter, and simpler than TCP. However, it is less reliable. UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore, this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum number of bytes is 65527 for IPv6 [21]. When a UDP datagram is sent, the data and the header go together to the IP network layer, and the computer has to fill the fields of the UDP header in the proper way. The scheme of the UDP protocol is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that translates domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server's IP address.

Figure 2.7: UDP protocol header


The UDP header is composed of four fields [15], each containing 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port to which the reply should be addressed if nothing is changed. The Destination Port is the destination address to which the packet will be sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and setting an order for packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of these ports is used for one particular application.
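As a small illustration of how UDP is used in practice, the sketch below sends and receives a single datagram with Python sockets. The port and payload are arbitrary choices for the example, not values from the thesis.

import socket

# Minimal UDP receiver bound to an arbitrary, non-well-known port
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 50008))        # hypothetical port, not from the thesis

# Sender: UDP needs no connection establishment, a single datagram is enough
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"measurement sample", ("127.0.0.1", 50008))

data, addr = receiver.recvfrom(65527)      # maximum UDP payload mentioned above
print(data, addr)
sender.close()
receiver.close()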

2.1.5 TCP protocol

The Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer and used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol, a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams in order when they come from the IP protocol. In addition, this protocol allows the data to be split into fragments of different lengths, which are forwarded to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in Figure 2.8.

Figure 2.8: TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port identifies the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called Reserved in the figure) is unused for now and is set to zero. The flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PUSH flag is set by the sender to ask the receiver to pass the received data to the application immediately. Finally, the RESET flag is set to restart the connection.

Another important issue is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field, the packet transmission becomes more reliable, since this field is used to check the integrity of the header. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field varies depending on which options are present. Finally, there is a space between the options and the data called Padding. It is set with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way to set up a TCP connection is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1, and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement, keeps on sending the packets in progress, and then sends its own segment with the FIN flag set to 1. Afterwards, the client informs its application that a FIN segment was received and sends a final acknowledgement to the server to end the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequences that do not need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives the ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without any ACK received, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula to estimate the RTT within a network is shown in Equation 2.1:

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT        (2.1)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation to the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
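Equation 2.1 can be implemented directly as an exponentially weighted moving average. The sketch below is illustrative only; the sample values are hypothetical.

def estimate_rtt(samples, alpha=0.85, initial=None):
    # Exponentially weighted moving average of RTT samples, as in Equation 2.1.
    # alpha=0.85 sits in the 0.8-0.9 range recommended for TCP; samples are in seconds.
    estimated = initial if initial is not None else samples[0]
    for sample in samples:
        estimated = alpha * estimated + (1 - alpha) * sample
    return estimated

# Hypothetical RTT samples (seconds) measured between data source and server
print(estimate_rtt([0.052, 0.047, 0.061, 0.049]))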

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets in a row, one after the other, with a certain spacing between them. However, problems with network congestion, queues, or configuration errors cause this spacing between packets to vary. The effect of jitter on the packet stream can be seen in Figure 2.12.

Jitter is a serious problem, since these fluctuations happen randomly and change very quickly in time; therefore it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short time in order to reorder them if necessary and leave the same spacing between each packet. The main problem of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, newly arriving packets are dropped and never reach their destination.
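One simple way to quantify jitter from capture timestamps is the average deviation of the inter-arrival times, as sketched below. This particular metric is an illustrative assumption; the thesis does not prescribe a specific jitter formula, and the arrival times in the example are hypothetical.

def interarrival_jitter(timestamps):
    # Mean absolute deviation of inter-arrival times, an illustrative jitter measure.
    # timestamps: packet arrival times in seconds, as recorded e.g. by tshark.
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return sum(abs(g - mean_gap) for g in gaps) / len(gaps)

# Hypothetical arrival times: packets sent every 10 ms but delayed irregularly
print(interarrival_jitter([0.000, 0.010, 0.023, 0.031, 0.045]))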


Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are defined in the following three formulas:

Latency = Propagation + Transmit + Queue        (2.2)

Propagation = Distance / SpeedOfLight        (2.3)

Transmit = Size / Bandwidth        (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that can be transmitted over the network during one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between latency and bandwidth

If we multiply both terms, we obtain the number of bits that the pipe can contain at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain:

50 × 10⁻³ s × 45 × 10⁶ bits/s = 2.25 × 10⁶ bits        (2.5)

If more bandwidth is required, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM, and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime        (2.6)

TransferTime = RTT + (1 / Bandwidth) · TransferSize        (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in theory. However, due to implementation inefficiencies or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.
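The formulas in Equations 2.2-2.7 can be checked with a short numeric sketch. The transfer size and RTT used in the example are hypothetical; only the 50 ms / 45 Mbps channel comes from Equation 2.5.

SPEED_OF_LIGHT = 3e8           # m/s, rough propagation speed used for illustration

def latency(distance_m, size_bits, bandwidth_bps, queue_s=0.0):
    # Equation 2.2: propagation + transmit + queue delay (seconds)
    propagation = distance_m / SPEED_OF_LIGHT        # Equation 2.3
    transmit = size_bits / bandwidth_bps             # Equation 2.4
    return propagation + transmit + queue_s

def throughput(transfer_size_bits, rtt_s, bandwidth_bps):
    # Equations 2.6 and 2.7: effective throughput for a transfer of a given size
    transfer_time = rtt_s + transfer_size_bits / bandwidth_bps
    return transfer_size_bits / transfer_time

# The 50 ms / 45 Mbps channel from Equation 2.5: bits "in flight" and an example throughput
print(0.050 * 45e6)                         # 2.25e6 bits in the pipe
print(throughput(8e6, 0.100, 45e6))         # throughput for a hypothetical 8 Mbit transfer, 100 ms RTT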

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.


2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle attacks against SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection to the original receiver address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP, and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems, and remove errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets running in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool for analyzing packets that are going over the network. Some reasons to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file for future analysis; these tcpdump files can also be opened with software like Wireshark. In addition, tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities for managing the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network on another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives this request stores a temporary copy of the URL. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to find the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way, proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent hackers from getting internal addresses, since proxies can block the access between two networks. Proxies can also take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, database, deployment, application services, monitoring, storage, and so on.

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux, and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all over the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security, and so on. This operating system very seldom freezes or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is also very secure, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionalities. Linux offers many network applications, so it can be very useful for this thesis.

We have described in this chapter many issues about networks which are crucial in the next sections. It is important to have deep knowledge of this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example, and we ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and a server. This is an easy way to start setting up a connection and to design a methodology, with tools, for developing larger-scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client-server

The tool chosen for programming is Python. This is a high-level programming language, highly recommendable for network programming due to its ease of use in this field.


When it came to programming the client for this application, it was necessary to set the server Inet address and a random port for the exchange of data. It was also necessary to create a socket and connect it to the address and port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to the socket. Finally, we made the socket wait for incoming packets from the client and accept the connection.
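A minimal sketch of such a client and server, written with Python's socket module, could look as follows. The port number 50007 and hostname match values appearing in the later captures; everything else is illustrative and not the exact thesis scripts.

import socket

SERVER_HOST = "ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"   # example hostname from the later captures
PORT = 50007                                                        # port used in the captures

def run_server():
    # Create a socket, bind it to the hostname and port, and wait for one client
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, addr = srv.accept()
    data = conn.recv(1024)          # receive the client's data
    conn.sendall(data)              # answer, creating a normal exchange
    conn.close()
    srv.close()

def run_client():
    # Create a socket and connect it to the server address through the chosen port
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((SERVER_HOST, PORT))
    cli.sendall(b"test message")
    print(cli.recv(1024))
    cli.close()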

The packets required to establish a client-server connection are shown in Listing 3.1.

Listing 3.1: Establishing the connection

1 "0.665317" "192.168.1.24" "192.168.1.33" "TCP" "74" 49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64
2 "0.669736" "192.168.1.33" "192.168.1.24" "TCP" "66" EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1
3 "0.669766" "192.168.1.24" "192.168.1.33" "TCP" "54" 49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0

This exchange of packets is called the three-way handshake. Analyzing these segments, we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, and with a random sequence number x. Wireshark shows by default a relative sequence number starting at zero. The answer of the server has the SYN and ACK flags activated, with its own sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection from both points; otherwise only one endpoint would be closed and the other could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminating the connection

1 "0.671945" "192.168.1.33" "192.168.1.24" "TCP" "60" EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0
2 "0.672251" "192.168.1.24" "192.168.1.33" "TCP" "54" 49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behaviour of the segments between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance, and extract a traffic pattern.

Figure 3.2: Structure client-proxy-server

A proxy has been set up between the communication client-server to capture and

analyze the traffic so that we can recreate the pattern of communications and make

realistic loads towards the server In the beginning was needed to access to the instances

This is done through the port 22 and there are two main reasons to do so Firstly we

had to configure the proxy to accept the incoming packets and forward them properly

And secondly to run scripts in the instance it was necessary to access there and install

the required libraries to use that script Moreover some programs such as Tcpdump or

Wireshark were installed in the proxy instance to sniff the traffic When the proxy was

ready we could move on to write the script to create the scenario and make simulations

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters with regard to memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources.


In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests are run automatically. This script creates a scenario comprised, in the simplest case, of three instances: data source, proxy and server. The script also makes it possible to pick out the type of instance used for the simulation. Moreover, after starting the instances the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python because of how easy it makes network programming.
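As an illustration of the kind of automation Simulation.py performs, the minimal sketch below launches one EC2 instance with boto and waits for it to come up. The AMI id, key pair and security group names are hypothetical placeholders, and error handling is omitted.

import time
import boto.ec2

def launch_instance(instance_type='t1.micro'):
    """Start one EC2 instance and return it once it is running (simplified sketch)."""
    conn = boto.ec2.connect_to_region('eu-west-1')
    reservation = conn.run_instances('ami-00000000',            # hypothetical AMI id
                                     instance_type=instance_type,
                                     key_name='thesis-key',      # hypothetical key pair
                                     security_groups=['default'])
    instance = reservation.instances[0]
    while instance.state != 'running':
        time.sleep(5)
        instance.update()
    return instance

# e.g. one server, one proxy and one data source of the chosen types
server = launch_instance('m1.large')
proxy = launch_instance('m1.large')
source = launch_instance('t1.micro')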

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1. This is done only once in the whole communication.
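The server used in the simulations essentially behaves as a TCP echo service on port 50007, the port visible in the captures: it accepts a connection and answers every burst with the same data. A minimal sketch of such a server, under the assumption that this is all it has to do, could look as follows.

import socket

HOST, PORT = '', 50007                       # port seen in the captured traffic

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind((HOST, PORT))
srv.listen(5)

while True:
    conn, addr = srv.accept()                # one data source at a time (simplification)
    while True:
        data = conn.recv(4096)
        if not data:                         # connection closed by the client
            break
        conn.sendall(data)                   # reply with the same data that was received
    conn.close()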

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source: the response from the proxy, with the SYN and ACK flags set to 1. This indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers with another packet that acknowledges the previous one. This is shown in Listing 3.3 and is the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1" "0.000000" "10.34.252.34" "10.235.11.67" "TCP" "74" "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"

"2" "0.000054" "10.235.11.67" "10.34.252.34" "TCP" "74" "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"

"3" "0.000833" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

Once this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the DNS name of the server. The proxy then looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching for the server IP address

"4" "0.000859" "10.34.252.34" "10.235.11.67" "HTTP" "197" "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"

"6" "0.001390" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"

"7" "0.002600" "172.16.0.23" "10.235.11.67" "DNS" "166" "Standard query response 0xb33a"

"8" "0.002769" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"

"9" "0.003708" "172.16.0.23" "10.235.11.67" "DNS" "124" "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag set to establish the connection between proxy and server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10" "0.003785" "10.235.11.67" "10.224.83.21" "TCP" "74" "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"

"11" "0.438963" "10.224.83.21" "10.235.11.67" "TCP" "74" "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"

"12" "0.439029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 Connection established response reaches the data source, so the connection is now ready for sending data. In these simulations we decided to send data from time to time, with random waiting periods in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
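A minimal sketch of a data source following this behaviour is shown below: it opens the tunnel through the proxy with an HTTP CONNECT request and then sends data bursts separated by random pauses. The proxy address, burst size and repetition count are illustrative values taken from the captures and the test description, not necessarily the exact values hard-coded in the scripts.

import random
import socket
import time

PROXY = ('10.235.11.67', 3128)   # proxy instance, Squid port ("ndl-aas" in the captures)
SERVER = 'ec2-54-228-99-43.eu-west-1.compute.amazonaws.com'   # server name from Listing 3.4

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(PROXY)
s.sendall(('CONNECT %s:50007 HTTP/1.1\r\nHost: %s:50007\r\n\r\n' % (SERVER, SERVER)).encode())
reply = s.recv(4096)             # expect "HTTP/1.0 200 Connection established"
assert b'200' in reply

burst = b'x' * 1980              # one data burst (1980 or 5940 bytes in the tests)
for _ in range(200):             # up to 200 repetitions
    s.sendall(burst)
    s.recv(4096)                 # the server echoes the burst back
    time.sleep(random.randint(1, 2))   # random waiting time of 1 or 2 seconds
s.close()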

The eight packets that compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15" "0.466800" "10.34.252.34" "10.235.11.67" "TCP" "71" "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"

"16" "0.466813" "10.235.11.67" "10.34.252.34" "TCP" "66" "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"

"17" "0.466975" "10.235.11.67" "10.224.83.21" "TCP" "71" "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"

"18" "0.467901" "10.224.83.21" "10.235.11.67" "TCP" "66" "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"

"19" "0.468018" "10.224.83.21" "10.235.11.67" "TCP" "71" "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"

"20" "0.468029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"

"21" "0.468083" "10.235.11.67" "10.34.252.34" "TCP" "71" "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"

"22" "0.508799" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 indicate that data is being carried in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag set is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was placed in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy for the simulation with data bursts of 1980 bytes, while Figure 3.4 represents the simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times larger than in Figure 3.3. This makes sense, since the amount of data sent is three times larger as well, so roughly three times as many packets are needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only an effect of the scale of the graph, but also because the frequency and the amount of segments being sent in the second case are higher.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to place one client in each instance, so the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for the simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going through the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same for data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much larger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the frequency at which packets are sent is high with ten data sources.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features, explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

There is no great difference between these graphs, since the data sent did not pose a real problem for the network performance. However, we can see that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. The other graph shows many more high peaks, so the RTT in this case is slightly higher. As expected, the more clients, the greater the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.
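The average RTT values were obtained from the recorded captures. Purely as a simplified illustration of how such a value can be approximated with dpkt, the sketch below pairs every SYN with its SYN/ACK and averages the differences; this only uses handshake RTTs, which is a rougher estimate than measuring every data segment, and the capture file name is hypothetical.

import socket
import dpkt

def handshake_rtts(pcap_path):
    """Approximate RTT samples by pairing each SYN with the matching SYN/ACK."""
    syn_times, rtts = {}, []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            src, dst = socket.inet_ntoa(ip.src), socket.inet_ntoa(ip.dst)
            if tcp.flags & dpkt.tcp.TH_SYN and not tcp.flags & dpkt.tcp.TH_ACK:
                syn_times[(src, tcp.sport, dst, tcp.dport)] = ts      # plain SYN
            elif tcp.flags & dpkt.tcp.TH_SYN and tcp.flags & dpkt.tcp.TH_ACK:
                key = (dst, tcp.dport, src, tcp.sport)                # reverse direction
                if key in syn_times:
                    rtts.append(ts - syn_times.pop(key))              # SYN -> SYN/ACK delay
    return rtts

samples = handshake_rtts('proxy_capture.pcap')    # hypothetical capture file
if samples:
    print(sum(samples) / len(samples))            # average RTT in seconds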


Figure 3.9: Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. In principle, the further down the table we go, the shorter the RTT should be; however, this does not hold in every case, so the type of instance is not very significant here. As far as RTT is concerned, the simplest instance seems to be enough for this exchange of data. Concerning the number of clients there is a slight difference, especially when comparing the RTT with 5 or 10 data sources against only one. But in general the results are quite similar, because this amount of packets does not pose a serious problem for the network.

Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

The next analysis concerned some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets that were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources show more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulty in the communication.


Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

With this instance the number of retransmissions is larger. Either way, no simulation stands out particularly as far as retransmitted packets are concerned.

Table 3.4 shows the packet losses. Here the differences among the tests are considerably wider. As expected, the worst simulation, with the most communication problems, was the one with 10 data sources, the heaviest data burst and the weakest instance; here there is an average of up to 67 lost packets. Moreover, we can see how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also shows that it is harder to deliver traffic properly with many data sources. It must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   2          6           1.5         67
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
                       5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
                       5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
                       5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
                       5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
                       5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACK

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies just a little across the different tests, most probably because the data sent does not stress the server enough. However, the last analysis, of the values related to network performance, gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet-loss problem compared with the t1.micro one.

Having achieved these results, we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed a method for performing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics for rebuilding the traffic as faithfully as possible. The first one is the packet length [33][34]: packets have to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, i.e. the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to recreate realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option for this was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
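A minimal sketch of this kind of extraction with dpkt is shown below; it simply walks the capture and keeps the timestamp, the frame length and the TCP payload of every segment. The grouping into bursts described next is omitted here.

import dpkt

def extract_packets(pcap_path):
    """Return (timestamp, frame length, TCP payload) for every TCP segment in a pcap file."""
    records = []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue                                   # skip non-IP frames and non-TCP packets
            tcp = ip.data
            records.append((ts, len(buf), tcp.data))       # tcp.data is the application data
    return records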

At first the script extracted the data of each packet one at a time, in order to resend the packets individually when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet, which is exactly how the data source sent its data in the simulations; this method was therefore much better for recreating the traffic pattern. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent its first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. This made it easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was used, a few extra protocols were needed to establish the communication over this proxy: a couple of HTTP segments and a few DNS segments. These are meaningless when recreating M2M traffic, since there is no proxy in between. The best solution was to filter them out in the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation because of their very low weight.
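A simple way to perform this filtering, sketched below under the assumption that the proxy-related segments are the DNS lookups and the HTTP CONNECT exchange seen in Listings 3.4 and 3.5, is to drop those packets while walking the capture.

import dpkt

def is_proxy_overhead(ip):
    """True for segments that only exist because of the proxy and must not be replayed M2M."""
    if isinstance(ip.data, dpkt.udp.UDP) and 53 in (ip.data.sport, ip.data.dport):
        return True                                        # DNS query or response
    if isinstance(ip.data, dpkt.tcp.TCP):
        payload = ip.data.data
        if payload.startswith(b'CONNECT') or payload.startswith(b'HTTP/'):
            return True                                    # proxy handshake, not application data
    return False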

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing it, the program is ready to replay the data in an accurate and timely manner, and with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script are deployed in an instance from which the packets are sent to the server. The script used to resend the traffic must be simple, so that it runs as fast as possible and sends the packets accurately. It should be pointed out that, in order to test the server, it is only necessary to run the second script, Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
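The core of such a replay loop can be sketched as follows, assuming the pattern has been reduced to a list of (time offset, burst payload) pairs; the exact file format produced by Extractpattern.py is not reproduced here.

import socket
import time

def replay(bursts, server_addr):
    """Resend the extracted bursts to the server, preserving the original timing."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(server_addr)
    start = time.time()
    for offset, payload in bursts:
        delay = offset - (time.time() - start)
        if delay > 0:
            time.sleep(delay)        # wait until the burst's original send time
        s.sendall(payload)
        s.recv(4096)                 # the server answers with the same data
    s.close()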

When replaying traffic M2M it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw conclusions and check the accuracy of the method. To achieve this we had to filter the data sent from data source to proxy. The sniffed data were then replayed twice M2M by the second script, so that across the whole network we send the same amount of data, but in this case directly from client to server. This strategy also means we receive the same data back from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the diagram on the left shows the traffic in the simulations with data source, proxy and server, while the diagram on the right is the result of applying the M2M strategy just described. As we can see, the amount of data and the number of packets are the same.

The results of following this strategy are shown in Figure 4.2.


Figure 4.1: Structure of the traffic replayed M2M

In these graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there on the graphs are not exactly identical. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server response is something we could not control. Another important issue is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2, by comparing the graph obtained in the simulation with another one obtained by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then the most remarkable results obtained from recreating the pattern at large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we assess the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As with the previous simulations, the scenario used to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could easily be modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It was therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when recreating traffic to stress a server [37][38]. First of all, the traffic to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, the breaking point is examined to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps to follow in order to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
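In practice a complete test run therefore comes down to a few invocations of these scripts. The sequence below is only an illustration of that workflow; the script names come from the thesis, but any command-line options are omitted because their exact names are not documented here.

python Simulation.py        # run the client-proxy-server simulation and download the pcap
python Servertoreplay.py    # start the server again on the chosen instance type
python Replaytraffic.py     # extract the pattern and multiply it towards the server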

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that varies from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. We performed several recreations, scaling up the number of data sources: we started with one data source and increased the number up to 80, which was considered enough clients to create heavy traffic loads. We can then compare the different results with the original simulation and extract interesting conclusions. The number of data sources was increased one at a time, every five seconds.
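In the experiments each data source ran in its own EC2 instance. Purely as an illustration of the ramp-up logic, the sketch below starts one replaying client every five seconds in separate threads, reusing the replay() helper and the bursts list sketched in the previous chapter; this is a simplification, not the actual multiplier code, and the server address is illustrative.

import threading
import time

# replay() and bursts as defined in the sketch in Section 4.2
NUM_CLIENTS = 80                   # scaled up from 1 to 80 in the tests
RAMP_INTERVAL = 5                  # seconds between starting two consecutive data sources
SERVER = ('10.224.83.21', 50007)   # illustrative server address

threads = []
for _ in range(NUM_CLIENTS):
    t = threading.Thread(target=replay, args=(bursts, SERVER))
    t.start()                      # one more client starts sending the pattern
    threads.append(t)
    time.sleep(RAMP_INTERVAL)

for t in threads:
    t.join()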

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can see that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is because at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of test but with higher numbers of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph shows about twice the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is also twice as large. However, the red graph is not four times higher than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after 150 seconds the graph grows very little and slowly. In this test a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. The communication therefore seems to reach a limit in the exchange of packets where bytes cannot be sent any faster, in this case when the number of clients reaches about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher-quality instance for the server; we used the type c1.xlarge. If we look at Figure 5.3 and compare it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients reaches much higher values using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, even though no peaks stand out, the average is considerably higher than in the other tests. This graph also appears smoother, because packets come and go more often than in the rest, so there cannot be large variations. The main characteristics when recreating many clients are therefore the higher average RTT and the smoothness of the graph obtained. The rest of the tests show a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instance tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients, not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the packet-loss gap between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results in terms of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients the server can handle varies depending on the kind of instance used. The Round Trip Time did not really differ until the number of clients increased considerably; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the delivery of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we therefore developed, in the Amazon cloud, the scenario just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and certain packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were presented and explained, in order to estimate the reliability of the TaaS system developed.

The results of the simulations show that the number of bytes increased correctly as the number of clients grew. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results showed a performance improvement in the network when using high-quality instances, and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the original simulation. In this way we could assess the quality of the traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity between both graphs was constant. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been shown that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instance.

Overall, the results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not so satisfactory; after testing many different TCP servers, the RTT behaved differently depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. The good results, together with the flexibility and the many options this environment offers, therefore demonstrate the usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of a test before starting it.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client-server 31
3.2 Structure client-proxy-server 33
3.3 Bytes through the proxy with data bursts of 1980 bytes 37
3.4 Bytes through the proxy with data bursts of 5940 bytes 37
3.5 Structure for the simulation 38
3.6 Bytes through the proxy with data bursts of 1980 bytes 39
3.7 Bytes through the proxy with data bursts of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of the traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. Gao et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi, and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. Combs et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", https://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe, and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump & libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar, and G. Szabo, "How to validate traffic generators?", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 2: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

Testing as a Service for Machine toMachine Communications

Jorge Vizcaino

January 2014

CONTENTS

Chapter 1 ndash Introduction 5

11 Background 5

12 Problem statement 6

13 Method 7

14 Delimitations 8

15 Outline 9

Chapter 2 ndash Related work 11

21 Communication protocols 11

211 HTTP protocol 13

212 IP 15

213 Ethernet 18

214 UDP protocol 20

215 TCP protocol 21

22 Network Performance Metrics 24

221 RTT 24

222 Jitter 25

223 Latency 26

224 Bandwidth 26

225 Throughput 27

23 Tools strongly associated with this thesis 27

231 Network Tools 28

232 Programming language 29

233 Operating Systems 30

Chapter 3 ndash Traffic Test 31

31 Client Server Application 31

32 Loading test with proxy 33

33 Loading test with several clients 38

34 Performance results 40

Chapter 4 ndash Traffic Pattern Extraction 45

41 Collecting packet data 45

42 Replaying traffic pattern 46

Chapter 5 ndash Multiplying Traffic Pattern 49

51 Design of TaaS for M2M communications 49

52 Reproduce testing 50

53 Results of traffic recreation 50

Chapter 6 ndash Summary and Conclusions 55

61 Summary and Results 55

62 Future Work 56

Chapter 7 ndash Appendices 57

iv

Acknowledgements

I would like to offer a word of thanks to my supervisor Laurynas Riliskis for helping me

to carry out this project Thanks to his deep knowledge about this matter he could

give me many useful advices and was able to figure out some questions I had during this

thesis

1

Abstract

During the last years cloud computing and Software-as-a-Service (SaaS) are becoming

increasingly important due to the many advantages that they provide Therefore the

demand for cloud testing infrastructures is increasing as well Analysis and testing of

cloud infrastructures are important and required for its effective functioning Here is

where Test-as-a-Service (TaaS) comes in providing an infrastructure along with tools for

testing in the cloud and evaluating performance and scalability TaaS can offer several

kinds of cloud testing such as regression testing performance testing security testing

scalability testing and so on In this thesis TaaS concerns network testing with the main

goal of finding out the performance of a server To achieve this goal this thesis involves

mostly performance and scalability testing In this thesis we created a TaaS system

that uses a different method to test network This method is based on recreating traffic

pattern extracted from simulations and multiply this pattern to stress a server All this

is carried out in the Amazon Cloud In this way we can find out the server limits build

a theoretical foundation and prove its feasibility The traffic recreated must be as similar

as possible to the traffic extracted from the simulations To determine this similarity we

compared graphs with the number of bytes over time in a simulation and in a session

where the traffic was recreated The more similar the more accurate and better results

we achieved With the results obtained from this method we can compare the traffic

network created by different number of data sources and carried out in different type of

instances Several data such as packet loss round trip time or bytessecond are analyzed

to determine the performance of the server The work done in this thesis can be used

to know server limitation Estimating the possible number of clients that there could be

using the same server at once

3

CHAPTER 1

Introduction

11 Background

Cloud computing [1] provides access in the network to different resources such as soft-

ware servers storage and so on in an efficient way Clients can access to these services

on their own without human interaction since everything is done automatically All ap-

plications are offered over Internet therefore users can access from any location and

with different electronic devices Capability of cloud computing can be easily modified in

order to supply properly its services to the clients regardless of their number Moreover

applications can be monitored and analyzed to give information of their conditions to

both user and provider

Cloud structure can be divided in two parts the front end which is the part the user

can see and the back end that involves the computers servers and networks which are

part of the cloud computer [2] Moreover a main server takes over of the cloud structure

ensuring a good service depending on the number of clients

Nowadays TaaS [3] is very significant as it implies cost sharing of computing resources

cost reduction scalable test structures and testing service availability at any time More-

over TaaS provides the model pay-as-you-test for customers All this characteristics make

TaaS be an efficient way for testing in the cloud The main reasons why there would be

clients interested in TaaS is the fact that this system can inform about several signifi-

cant software and network features such as functionality reliability performance safety

and so on In order to measure these characteristics there are several types of tests for

services in the cloud This thesis was mainly focused on performance testing [4] These

tests are usually carried out to provide information about speed scalability and stability

It is very common the use of performance testing to find out the performance of software

before coming out to the market to ensure it will meet all the requirements to run effi-

ciently Performance testing more specifically can be divided in several kinds of tests

The most related to this thesis are load testing to find out the behaviour of the server

under traffic loads and scalability testing to determine also performance and reliability

5

6 Introduction

of the server when increasing the load The process to develop a performance testing

involves the next steps [4]

1 Identify your testing environment it is necessary to know the physical environment

where the test will be developed as well as the testing tools required

2 Identify the performance criteria this includes limit of response times and other

values the simulations must meet to consider that the performance is good enough

to offer a reliable service

3 Design performance tests test all the different cases which could be taken for the

service or application

4 Configuring the environment prepare the environment and tools before starting

the simulations

5 Implement test design develop suitable performance tests for the test design

6 Run the tests start simulations and display values of the test

7 Analyze test look into the results to check the performance of the service

Performance testing will ensure cloud services so that applications will run properly

These are the most recommended steps to develop TaaS [3] and we have taken them in

account in this work This service provides good features such as elasticity safety easy

handling reliable environment and flexibility when choosing options regarding instance

storage

12 Problem statementNowadays TaaS [3] is something very common due to the wide use of internet clouds and

the large number of applications provided on them Therefore we found interesting to

use this concept to create a new approach for testing a particular scenario In this thesis

we focused on developing a different method to apply TaaS in a M2M framework In

addition this project could be modified to test different scenarios for further research

For example it would be possible to add more servers increasing the number of instances

in the scripts used in this project

The acronym M2M [5] can have different meanings such as Machine-to-Machine Machine-

to-Man Machine-to-Mobile and so on However M2M does have a clear goal which

is to allow the exchange of information over a communication network between two end

points

When it comes to test networks it is necessary to do it with the expected traffic that

will go through that network whenever is used To do so there are two different ways [6]

The first one is simulating the type of traffic that is suppose to go over the network TaaS

13 Method 7

systems use this option allowing TaaS users to configure the test simulations according

to their needs To configure these simulations some tools are used in the cloud such as

Selenium and Jmeter [3]

The second way to test networks is replay recorded network traffic The purpose of

this thesis is to apply this second way to create a TaaS system to test networks in the

cloud In order to replay recorded traffic we followed a method based on a replay attack

[7] which is explained in the next section

In this way we created a TaaS system that can estimate network performance using

a different method than the other systems already made First we must also configure

the simulations to test the server However the main difference in our method is that

we then extract the traffic pattern from those simulations in order to multiply it from a

black box so that we can stress the server This is an interesting method because since

we recreate precisely real exchange of traffic the results are very actual and accurate

Finally we had to prove the feasibility of the method applied

The TaaS system developed was done for testing in the Amazon Cloud [8] which allowed

us to set up the whole scenario easily and use different type of instances These instances

differ in features such as memory storage network performance and so on [9] Therefore

it was interesting to compare results when we picked out one sort of instance or another

13 Method

The method followed during this thesis is divided into three steps First we set up a

proxy between client and server to extract the traffic Then we had to figure out a way

to recreate this traffic to finally replay it M2M to test the server

In the following paragraphs the method is described in detail The first step consisted

of setting up a scenario client-proxy-server in the cloud Then we could run simulations

to look into the behaviour of the packets going over the network Afterwards we could

check how the network performance was different when we changed some factors (number

of clients type of instance etc) in this network The packets were sniffed in the proxy

with the tool tshark [10] Once we have some knowledge about the network simulated

we could start developing a method to extract a traffic pattern from those simulations

We must take in account that the script programmed must obtain the traffic pattern

properly So that when it comes to recreate the same traffic M2M the behaviour of the

packets was as similar as possible to the original simulation To achieve this goal we had

to extract the data sent and the timestamp of the packets with high precision

Once the pattern from the simulations is extracted we moved on to the third and last

step where we multiplied the traffic pattern scaling up the number of clients In this

way large traffic load recreations were carried out to test the server limits and find out

how this server could handle heavy traffic loads These data sources sent the pattern

extracted directly to the server in a M2M framework Finally when we obtained the

final results we could find out the server performance and the feasibility of the approach


developed

The method carried out is a kind of replay attack [7], where a "man-in-the-middle" (Wireshark sniffing in the proxy) intercepts the traffic. This traffic is then replayed, pretending to come from the original sender, in order to create problems for the host server. In our thesis this traffic is scaled up by a multiplier to stress the software and find out its limits.

Regarding the traffic pattern extraction, several tools were studied and two methods to extract and replay traffic were considered. The first method consisted of modifying the pcap files previously recorded in the proxy so that the packets could be sent straight to the server from the data sources. Tcprewrite [11] was the tool used to modify the pcap file. The next step was to use another tool to recreate the traffic contained in the new pcap. One of them was Scapy [12], which sent the packets but not with the appropriate timing. Another tool used was Tcpreplay [13], but it was not possible to simulate the server since this tool does not work at the transport level; therefore we could not establish a valid TCP connection [14]. Finally we used the second method, which is based on programming our own script to extract the traffic pattern and recreate it later on. The chosen way was much trickier but also much more suitable, as well as completely automatic. With this method it was not necessary to do anything by hand (like modifying the pcap file from the console); we just needed to type a few options such as the name of the file to replay or the server instance type. After programming the script to extract the pattern, we needed to replay it somehow. We tried again with Scapy [12], a very good tool when it comes to creating and configuring packets. However, there were problems receiving segments coming from the server, since it was necessary to work with sequence and acknowledgement numbers. This was extremely hard, so finally sockets were used again to replay the pattern.
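A minimal sketch of what socket-based replay of such a pattern can look like, assuming a list of (relative time, payload) pairs like the one produced by the extraction sketch above; the server address is a placeholder and the real replay script differs.

import socket
import time

SERVER = ("ec2-example.eu-west-1.compute.amazonaws.com", 50007)   # placeholder address

def replay(pattern):
    """Send each recorded payload at its original relative time over one TCP connection."""
    sock = socket.create_connection(SERVER)
    start = time.time()
    for rel_time, payload in pattern:
        # Sleep until the packet's original offset from the start of the capture.
        delay = rel_time - (time.time() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(payload)
        sock.recv(4096)   # the test server echoes the data back
    sock.close()

# Example usage with a tiny hand-made pattern:
# replay([(0.0, b"hello"), (0.5, b"world")])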

A diagram of the whole system developed for this thesis is shown in the Figure 11. The top of the Figure 11 (network traffic simulation) refers to the first part of the thesis, where we had to set up a client-proxy-server communication in the cloud. With this scenario we could run the simulations to exchange packets. Below it we can see the next steps: the traffic was recorded in the proxy for further analysis and pattern extraction. Finally, with the pattern obtained, we came to the last part of the thesis, shown at the bottom of the Figure 11 (traffic pattern recreation). In this part we set up a multiplier composed of many data sources which recreated the traffic pattern towards the same server. In this way we could find out the server performance when it comes to handling heavy traffic loads.

14 Delimitations

During this thesis we have made some delimitations. The project could be improved to cover most of these aspects in further research.

The TaaS system developed can only function in the Amazon Cloud because the main


library used to program this system works only for this particular cloud. The goal of this project is testing M2M communications, so we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding out server performance; therefore the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system programmed uses TCP sockets to test servers; for instance, we cannot use HTTP requests for testing.

15 Outline

The thesis is organized as follows.

The introduction is in Chapter 1, and Chapter 2 describes related work. Chapter 3 describes the simulations and analysis of the scenario with data source, proxy and server. Traffic pattern extraction is described in Chapter 4. The design of TaaS [3] for M2M communications and the results achieved with this system are in Chapter 5. A summary of the whole thesis, the main results obtained and future work are in Chapter 6. Finally, Chapter 7 includes the appendices.


Figure 11 Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this section the main protocols are explained, since they are essential to analyze a client-server communication. Moreover, some significant metrics needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

21 Communication protocols

In order to develop this thesis it was crucial to describe the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects that occupy the different OSI model layers in order to establish a communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface offered to the other objects on the same machine that want to use the service provided by this protocol. The other interface is called the peer interface and is used to communicate with its counterpart on another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole in which they all take place. To prevent a system from becoming too complex, levels of abstraction are added. In network systems this is also applied, creating layers each with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it is not necessary to modify every part but only the one where the service is introduced. In networks the architecture chosen is named the OSI model [15]. Networks follow this structure


when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in the Figure 21.

Figure 21 OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star and so on). This layer also determines the voltage and frequency that signals will use. The data link layer transmits the data from upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data in the network. To ensure packets get to the destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (segmentation). The most significant protocols are TCP and UDP, about which we will talk later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send


them to the lower layers. The most common application protocol is HTTP.

211 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative hypermedia information systems. This network protocol is used for communication among users, proxies or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is the transmission of resources. A resource is a network data object that can be identified by a URI; normally these resources are either files or outputs of a script. Each HTTP message has the general form shown in the List 21 [15]

Listing 21 HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line shows whether this is a request or a response message. The next lines provide parameters and options for the message. There are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol and the port used by default is 80, but it is possible to use other ports. Important HTTP request methods are GET and POST [16], which are used to request and retrieve data from a specified resource. However, there is another request method that is very significant for this thesis: CONNECT [16]. This method is used to send data through a proxy that can act like a tunnel. Therefore, we needed this method to establish a connection between client and server through the proxy.

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request has been served, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request; therefore they cannot keep information about previous requests. Any kind of data can be transmitted by HTTP as long as client and server know how to manage the data. A typical example of an HTTP request is shown in the Figure 22.

Figure 22 HTTP request

To set up a communication with this protocol, a client must open a connection by sending a request message to the server, which returns a response message. Afterwards the server will close the connection. First of all we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource, called the URI. And finally comes the version of HTTP that is being used. This can be clearly seen in the List 22; this example was extracted from the simulations made during the thesis.

Listing 22 HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided into three parts. The first part is the version of HTTP used for the communication. Afterwards there is a status code [15] for the computer to understand the result of the request; the first digit indicates the class of response. We have the codes shown in the List 23

Listing 23 HTTP request result


1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or sentence in English to describe the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header involves information about either the request, the response, or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in a response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in the List 24

Listing 24 HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files to be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about it. One of these header lines is called Content-Type and indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which states how many bytes were used in the body.

212 IP

Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information is going to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides an unreliable datagram delivery; therefore it is not ensured that datagrams reach their destinations. In addition, this service model may cause more problems, since packets can be delivered out of order as well as reach the destination more than once.


IP Header

The Figure 23 shows all the fields carried in the IP header.

Figure 23 Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The field Total Length indicates the length in bytes (unlike Header Length, where the length is counted in words) of the whole datagram. When it comes to the Identification field, the sender always marks each IP datagram with an ID number before the transmission. The goal is to identify datagrams uniquely, so that if several fragments arrive at the destination, all carrying the same ID value, the destination host can put the received fragments together. If some fragment does not arrive, all the fragments with the same number will be discarded. In the next field there are up to three flags. The first flag has no use for now and is set to 0. The flag D (Don't Fragment), when set to 1, indicates that the datagram must not be fragmented into smaller pieces. The flag M indicates whether the datagram received is the last one of the stream (set to 0) or there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the fragments within the original datagram, so the receiver can put them in order. The first byte of the third word of the header is the field TTL, which sets the maximum time that a datagram may stay on the network before being discarded. The main goal of this field is to discard datagrams that are within the network but never


reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram payload. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet it is required to fill the field Source Address with the IP address of the sender, as well as to fill the Destination Address with the IP address of the receiver. There is also a field to set up some options, if they are required, and a Padding set with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides host-to-host service across many different networks with diverse technology, it is required to manage datagrams so they can go over all these networks. There are two choices available to figure this problem out [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits in every network. This second option is the one used in the Amazon networks where we ran the tests. It is significant to know how the segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to make.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If the packets have to go over some network with a smaller MTU, it will be required to use fragmentation. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus another 20 bytes for the header. Therefore there will be 376 bytes left (1400 − 512 × 2), so that the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look like the Figure 24.

It should be noted that the amount of data bytes in each fragment (except the last) must always be a multiple of 8. During this process the router will set the M bit in the Flags field of the first and second datagrams to indicate that there are more packets coming. As regards the Fragment Offset field, in the first packet it is set to 0, because this datagram carries the first part of the original packet. However, the second datagram will have the Offset set to 64, since its first byte of data is the 513th (512 / 8 = 64).


Figure 24 Datagram fragmentation
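The fragmentation arithmetic of this example can be reproduced with a short sketch; this is only an illustration of the rule, using the MTU and sizes from the text as inputs.

def fragment(payload_len, header_len=20, mtu=532):
    """Return (data_bytes, offset_in_8_byte_units) for each IP fragment."""
    max_data = (mtu - header_len) // 8 * 8   # data per fragment, rounded to a multiple of 8
    fragments, offset = [], 0
    while payload_len > 0:
        data = min(max_data, payload_len)
        fragments.append((data, offset // 8))
        offset += data
        payload_len -= data
    return fragments

# Example from the text: a 1420-byte packet = 20-byte header + 1400 bytes of data.
print(fragment(1400))   # -> [(512, 0), (512, 64), (376, 128)]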

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link layer network. To develop a mapping between link layer addresses and IP addresses, it is required to use the Address Resolution Protocol (ARP), so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out if it already has the link layer address (MAC) of the receiver. If it is not there, a new ARP request message is sent, which carries the sender's own IP and link layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the requested IP address with their own IP address. The servers with different IP addresses drop the packet, but the receiver we are looking for sends an ARP reply message to the client. This server also updates its ARP cache with the link layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in the Figure 25.

213 Ethernet

Ethernet occupies both the data link and the physical layers in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in the Figure 26.

The MAC client must be one of the following two types of sublayers. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called the bridge entity, which provides an interface between LANs that can be using the same (for instance Ethernet to Ethernet) or different protocols.

Figure 25 ARP request

Figure 26 Ethernet layers in OSI model

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering from communication errors.

The physical layer enables communication between the data link layer and the respective physical layer of other systems. In addition, this layer provides significant physical features of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer can encode and decode bits between binary and phase-encoded form. Concerning access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

214 UDP protocol

User Datagram Protocol (UDP) is defined in the Internet standard RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of the TCP protocol, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable: UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of datagrams. Therefore this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum payload is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together to the IP network layer, and the computer has to fill the fields of the UDP header in the proper way. The scheme of the UDP header is represented in the Figure 27.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 27 UDP protocol header


The UDP header is composed of four fields [15], each containing 2 bytes. The Source Port indicates the port from which the packet was sent, and by default it is the port where the reply should be addressed if there is no change. The Destination Port is the destination port to which the packet will be sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission. Each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the Checksum; both Checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of these ports is used for one particular application.

215 TCP protocol

Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer and used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol, a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place datagrams in order when they come from the IP protocol. In addition, this protocol allows data to be split into fragments of different lengths before forwarding them to the IP protocol. With TCP it is also possible to transfer data coming from different sources on the same line, multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in the Figure 28.

Figure 28 TCP protocol header

The field Source Port identifies the sender port, just as the Destination Port does with the receiver port. The fields Sequence Number and Acknowledgement Number will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length will always be a multiple of 32 bits. The next field (called reserved in the picture) is unused for now and is set to zero. The flags field is used for additional information in the packet transmission. The SYN

flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet carries an acknowledgement. The URG flag informs that the segment contains urgent data. The PUSH flag is activated by the sender so that the receiver delivers the data to the application immediately instead of buffering it. Finally, the RESET flag is set to restart the connection.

Another important field is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the segment. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable depending on what kind of options are available. Finally, there is a space between the options and the data called Padding. It is set with zeros, and the goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way to set up a TCP connection is shown in the Figure 29.

Figure 29 Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, and with its own starting sequence number. Both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives the packet, it sends an acknowledgement and keeps on sending the packets in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client to end the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of segments that can be sent without waiting for acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 210 Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in the Figure 210, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

22 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

221 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured with several samples in order to achieve a reliable result. It depends on several factors such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network and so on. The RTT has an established minimum time, since it cannot be less than the time the signals take to go through the network. The


formula to get the value of the RTT within a network is shown in the equation 21

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT (21)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in the Figure 211.

Figure 211 Example RTT interval
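As an illustrative sketch (not code from the thesis), equation 21 can be applied to a list of RTT samples; the sample values are taken from the t1.micro row of Table 31 purely as an example, and α = 0.85 is an assumed value inside the advised 0.8 to 0.9 range.

def estimate_rtt(samples, alpha=0.85):
    """Apply EstimatedRTT = alpha * EstimatedRTT + (1 - alpha) * SampleRTT."""
    estimated = samples[0]            # initialise with the first measurement
    for sample in samples[1:]:
        estimated = alpha * estimated + (1 - alpha) * sample
    return estimated

# RTT samples in seconds, borrowed from Table 31 as an illustration:
print(estimate_rtt([0.0031, 0.0046, 0.0033, 0.0039]))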

222 Jitter

Jitter is the variation in the delay of the packets sent within a network [15]. A sender transmits many packets one right after the other, with a certain spacing between them. However, problems such as network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter can be seen in the Figure 212.

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly over time; therefore it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short time in order to reorder them if necessary and restore the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, newly arriving packets will be dropped and will never reach their destination.


Figure 212 Jitter effect

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are defined in the next three formulas

Latency = Propagation + Transmit + Queue (22)

Propagation = Distance / SpeedOfLight (23)

Transmit = Size / Bandwidth (24)

224 Bandwidth

This concept describes the number of bits that are transmitted within the network during a second [15]. There is an important relationship between bandwidth and latency worth discussing. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is in the Figure 213.

Figure 213 Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that the pipe can contain at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10⁻³ s × 45 × 10⁶ bits/s = 2.25 × 10⁶ bits (25)

If more bandwidth is required, the problem is solved simply by adding more pipes.

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas

Throughput = TransferSize / TransferTime (26)

TransferTime = RTT + (1 / Bandwidth) × TransferSize (27)

where RTT is the round trip time.
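As a hedged worked example with assumed numbers (not measurements from this thesis): transferring 1 MB (8 × 10⁶ bits) over a link with RTT = 50 ms and a bandwidth of 45 Mbps gives

TransferTime = 0.05 s + 8 × 10⁶ bits / (45 × 10⁶ bits/s) ≈ 0.23 s

Throughput = 8 × 10⁶ bits / 0.23 s ≈ 35 Mbps

so the effective throughput stays below the nominal bandwidth because of the RTT overhead.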

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in theory. However, due to inefficiencies of implementation or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project.


231 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle attacks against SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit. This tool can terminate SSL/TLS and launch a new SSL/TLS connection to the original receiver address. The goal of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems and remove errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets updated in real time, the details of a selected packet and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file to be used in a future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump also provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver; it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may cache web sites.


This happens each time a user from the local network asks for some URL: the proxy that receives this request stores a temporary copy of the resource. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to find the URL again. We can see this process in the figure below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in the Figure 214.

Figure 214 Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block access between two networks. Proxies can also take part as a component of a firewall.

232 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which


Boto is involved are compute, database, deployment, application services, monitoring, storage and so on.
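A minimal sketch of how such a connection object can be created and an instance launched with boto, assuming credentials are already in the boto configuration file; the region, AMI id and key pair name are placeholders, not the ones used in the thesis.

import time
import boto.ec2

# Credentials are read from the boto configuration file (~/.boto).
conn = boto.ec2.connect_to_region("eu-west-1")

# Launch one instance from a placeholder AMI; instance_type matches the types compared later.
reservation = conn.run_instances(
    "ami-00000000",             # placeholder AMI id
    instance_type="t1.micro",
    key_name="my-key-pair",     # placeholder key pair
)
instance = reservation.instances[0]

# Wait until the instance is running, then print its public DNS name.
while instance.state != "running":
    time.sleep(5)
    instance.update()
print(instance.public_dns_name)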

233 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

We have described in this chapter many issues about networks which are crucial in the next sections. It is important to have deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire deep knowledge about this framework in order to extract the pattern properly later on.

31 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger-scale testing later. The structure of this connection is shown in the Figure 31.

Figure 31 Structure client server

The programming language chosen is Python, a high-level language very suitable for network programming due to its ease of use in this field.


When it comes to programming the client for this application, it was necessary to set the server Internet address and a port for the exchange of data. It was also necessary to create a socket and connect it to the address and port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we made the socket wait for incoming connections from the client and accept the connection.
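A minimal sketch of such a client-server pair, assuming the echo behaviour described later in this chapter; port 50007 appears in the captures, while the host names and the message are placeholders.

import socket

HOST, PORT = "", 50007   # server listens on all interfaces on port 50007

def run_server():
    # Create a TCP socket, bind it and wait for one client connection.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, addr = srv.accept()
    data = conn.recv(4096)
    conn.sendall(data)       # echo the data back to the client
    conn.close()
    srv.close()

def run_client(server_address):
    # Connect to the server, send a payload and read the echoed reply.
    cli = socket.create_connection((server_address, PORT))
    cli.sendall(b"hello")
    print(cli.recv(4096))
    cli.close()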

In the List 31 the packets required to establish a client-server connection are shown.

Listing 31 Establish connection

1 "0.665317" "192.168.1.24" "192.168.1.33" "TCP" "74" "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2 "0.669736" "192.168.1.33" "192.168.1.24" "TCP" "66" "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3 "0.669766" "192.168.1.24" "192.168.1.33" "TCP" "54" "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, with a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection, and an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in the List 32.

Listing 32 Terminate connection

1 "0.671945" "192.168.1.33" "192.168.1.24" "TCP" "60" "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"


2 "0.672251" "192.168.1.24" "192.168.1.33" "TCP" "54" "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments between client and server.

32 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in the Figure 32. After setting up this connection we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 32 Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of communications and create realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in the instance it was necessary to access it and install the required libraries used by the scripts. Moreover, programs such as Tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data


sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be done automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking the type of instance used for the simulation. Moreover, after starting the instances the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed with Python due to its ease of developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1; this is done just once in the whole communication.
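A hedged sketch of how a data source can tunnel its TCP connection to the server through the proxy with an HTTP CONNECT request, as in Listing 34; the proxy and server names, the proxy port and the payload are placeholders, and the actual data source script differs.

import socket

PROXY = ("proxy.example.internal", 3128)   # placeholder proxy address and port
SERVER = "ec2-example.eu-west-1.compute.amazonaws.com"
PORT = 50007

def open_tunnel():
    # Connect to the proxy and ask it to open a tunnel to the server.
    sock = socket.create_connection(PROXY)
    request = "CONNECT %s:%d HTTP/1.1\r\nHost: %s:%d\r\n\r\n" % (SERVER, PORT, SERVER, PORT)
    sock.sendall(request.encode())
    reply = sock.recv(4096)
    if b"200" not in reply.split(b"\r\n", 1)[0]:
        raise RuntimeError("proxy refused the tunnel: %r" % reply)
    return sock   # from here on, the socket behaves as a direct connection to the server

# Example: send one data burst through the tunnel and read the echo.
# sock = open_tunnel()
# sock.sendall(b"x" * 1980)
# print(len(sock.recv(4096)))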

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy with the SYN and ACK flags set to 1, which indicates the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in the list 33 and is the 3-way handshake [31].

Listing 33 Establishing data source-proxy connection

"1" "0.000000" "10.34.252.34" "10.235.11.67" "TCP" "74" "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2" "0.000054" "10.235.11.67" "10.34.252.34" "TCP" "74" "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3" "0.000833" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the server host name. Then the proxy looks for the IP address of that server by sending DNS packets. We can see this in the list 34.

Listing 34 Searching server IP address

"4" "0.000859" "10.34.252.34" "10.235.11.67" "HTTP" "197" "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"


"6" "0.001390" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7" "0.002600" "172.16.0.23" "10.235.11.67" "DNS" "166" "Standard query response 0xb33a"
"8" "0.002769" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9" "0.003708" "172.16.0.23" "10.235.11.67" "DNS" "124" "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in the list 35.

Listing 35 Establishing proxy-server connection

"10" "0.003785" "10.235.11.67" "10.224.83.21" "TCP" "74" "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11" "0.438963" "10.224.83.21" "10.235.11.67" "TCP" "74" "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12" "0.439029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 Connection established response gets to the data source, and the connection is ready to start sending data. In these simulations it was decided to send data from time to time, with random time intervals. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose the exchange of data between data source and server are shown in the list 36.

Listing 36 Exchange of data source-proxy-server

"15" "0.466800" "10.34.252.34" "10.235.11.67" "TCP" "71" "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16" "0.466813" "10.235.11.67" "10.34.252.34" "TCP" "66" "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17" "0.466975" "10.235.11.67" "10.224.83.21" "TCP" "71" "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18" "0.467901" "10.224.83.21" "10.235.11.67" "TCP" "66" "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19" "0.468018" "10.224.83.21" "10.235.11.67" "TCP" "71" "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20" "0.468029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21" "0.468083" "10.235.11.67" "10.34.252.34" "TCP" "71" "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22" "0.508799" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In this list 36, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in the list 36 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set in a different instance, and the number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to


200 times, with a random waiting time between them of either 1 or 2 seconds.
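A hedged sketch of such a load loop, reusing a connected socket like the tunnel above; the burst size, the repetition count and the 1 or 2 second waits follow the description, while the payload itself is a placeholder.

import random
import time

def send_bursts(sock, burst_size=1980, repetitions=200):
    """Send a fixed-size burst repeatedly, waiting 1 or 2 seconds between bursts."""
    for _ in range(repetitions):
        sock.sendall(b"x" * burst_size)         # placeholder payload of the configured size
        received = 0
        while received < burst_size:            # drain the echo from the server
            chunk = sock.recv(4096)
            if not chunk:
                return
            received += len(chunk)
        time.sleep(random.choice([1, 2]))       # random waiting time as in the simulations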

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. The Figure 33 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data. Furthermore, the Figure 34 represents the other simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in the Figure 34 is approximately three times bigger than in Figure 33. This makes sense since the data sent is three times bigger as well;


therefore around triple the number of packets is needed. Another issue to point out is that the Figure 34 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case are bigger.

33 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is simulating the connection of several clients. To do this we created a similar environment, but in this case with a variable number of data sources. This whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in the Figure 35. Using the Amazon cloud it is possible to use several instances, setting one client in each instance. Therefore the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 35 Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. The Figure 36 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and the Figure 37 does the same with data bursts of 5940 bytes.

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes


The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure 36, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the frequency of packets being sent is high with ten data sources.

34 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is greatly scaled up.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For the Figure 38, packets were being sent to the server from three different instances; however, in the Figure 39 there were up to ten data sources working.

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in the Figure 38. In the other graph there are many higher peaks; therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.


Figure 39 Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over the Tables 31 and 32 we do not see large differences. Moreover, the lower in the table, the shorter the time should be; however, this does not apply in every case, therefore the type of instance is not very significant in these cases. The simplest instance seems to be enough for these exchanges of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type    1 source    3 sources    5 sources    10 sources
t1.micro                0.0031      0.0046       0.0033       0.0039
m1.large                0.0037      0.0035       0.0038       0.0032
c1.medium               0.0031      0.0035       0.0051       0.0048
c1.xlarge               0.0039      0.0043       0.0037       0.0042

Table 3.1: RTT with data bursts of 1980 bytes (average values in seconds)

The next analysis concerned some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.
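One way these counters can be obtained from the recorded captures, sketched below under the assumption that the pcap files are available locally, is to let tshark [10] apply Wireshark's TCP analysis display filters and count the matching frames. This is only an illustration of the kind of post-processing involved, not the thesis script.

    import subprocess

    # Wireshark display filters for the three indicators discussed in this section.
    FILTERS = {
        'retransmissions': 'tcp.analysis.retransmission',
        'duplicate ACKs':  'tcp.analysis.duplicate_ack',
        'lost segments':   'tcp.analysis.lost_segment',
    }

    def count_events(pcap_path):
        counts = {}
        for name, display_filter in FILTERS.items():
            # -r reads the capture, -Y applies the display filter; tshark prints one line per matching frame
            output = subprocess.check_output(['tshark', '-r', pcap_path, '-Y', display_filter])
            counts[name] = len(output.splitlines())
        return counts

    if __name__ == '__main__':
        print(count_events('proxy_capture.pcap'))   # hypothetical capture file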

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources show more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                0.0026      0.0022       0.0021       0.0029
m1.large                0.0026      0.0024       0.0028       0.0024
c1.medium               0.0028      0.0031       0.0025       0.0030
c1.xlarge               0.0026      0.0029       0.0029       0.0024

Table 3.2: RTT with data bursts of 5940 bytes (average values in seconds)

With this instance the number of retransmissions is larger. Either way, no single simulation stands out in terms of retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests are considerably wider. As expected, the simulation with the most communication difficulties was the one with 10 data sources, the heaviest data burst and the weakest instance; there, an average of up to 67 packets was lost. Moreover, we can see how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.
Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also shows that it is harder to send traffic properly with many data sources. It must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    1.5         0            2            2
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           2.5          0            0
c1.medium               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.xlarge               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            2.5

Table 3.3: Number of TCP retransmissions

Overall, in this chapter the traffic pattern was analyzed and the network was tested


Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    2           6            15           67
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           5.5          1            36
c1.medium               1980 bytes    0           0            0            2
                        5940 bytes    0           7            13.5         50.5
c1.xlarge               1980 bytes    0           0            0            5
                        5940 bytes    0.5         5            9            54.5

Table 3.4: Number of lost packets

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           1            0            6.5
                        5940 bytes    3           1            7.5          2.5
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.medium               1980 bytes    0           0            0            4.5
                        5940 bytes    0           0            2.5          2.5
c1.xlarge               1980 bytes    0.5         0            0            0.5
                        5940 bytes    0.5         0            0            0

Table 3.5: Number of duplicate ACKs

to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little across the different tests, most probably because the data sent does not stress the server enough. However, the last analysis, of the measured values related to network performance, gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.
Having achieved these results, we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern is obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed a method for a proper extraction so that the traffic could be generated again. To that end, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, i.e., the time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option for this was the Python library dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as the timestamp, length and payload of each packet.

To recreate the traffic, the script initially extracted the data of each packet one at a time, in order to resend the packets individually when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data from all the packets belonging to one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet. This is exactly how the data source sent its data in the simulations, so this method was much better suited to recreating the extracted traffic pattern. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent its first data packet. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. This made it easier to compare graphs and made the traffic recreation highly precise.
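A minimal sketch of this extraction step is shown below. It assumes an Ethernet capture and a known data-source address, and it groups packets into bursts with a simple time-gap heuristic; the gap threshold, file name and address are assumptions made for illustration, and the real Extractpattern.py additionally filters out proxy-related segments and stores the result in a file, as described next.

    import dpkt
    import socket

    def extract_bursts(pcap_path, source_ip, gap=0.5):
        # Group consecutive data packets from the data source into bursts: a new burst
        # starts whenever more than `gap` seconds pass without a data packet.
        bursts = []                # list of [timestamp of first packet in burst, payload bytes]
        last_ts = None
        with open(pcap_path, 'rb') as f:
            for ts, buf in dpkt.pcap.Reader(f):
                eth = dpkt.ethernet.Ethernet(buf)
                if not isinstance(eth.data, dpkt.ip.IP):
                    continue
                ip = eth.data
                if not isinstance(ip.data, dpkt.tcp.TCP):
                    continue
                tcp = ip.data
                if socket.inet_ntoa(ip.src) != source_ip or len(tcp.data) == 0:
                    continue
                if last_ts is None or ts - last_ts > gap:
                    bursts.append([ts, tcp.data])      # start a new burst
                else:
                    bursts[-1][1] += tcp.data          # append payload to the current burst
                last_ts = ts
        return [(ts, data) for ts, data in bursts]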

In the original simulations, where the proxy was in place, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out in the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered for each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing it, the program is ready to replay the data with accurate timing and with the same number of packets. This script resends the traffic using socket programming in Python, just like in the simulations. Both the file and the script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, so that it runs as fast as possible and sends the packets accurately. It should be pointed out that, in order to test the server, it is only necessary to run the second script, Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
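A stripped-down sketch of this replay step is shown below. It assumes the pattern has already been reduced to a list of (time offset from the start of the capture, burst payload) pairs, for example by the extraction sketch above; the server address is a placeholder, and the real Replaytraffic.py additionally reads the pattern from the file written by Extractpattern.py.

    import socket
    import time

    def replay(bursts, server, port):
        # bursts: list of (offset_seconds_from_capture_start, payload_bytes), in order
        s = socket.create_connection((server, port))
        start = time.time()
        try:
            for offset, payload in bursts:
                # wait until this burst is due, so the pacing matches the original capture
                delay = offset - (time.time() - start)
                if delay > 0:
                    time.sleep(delay)
                s.sendall(payload)
                s.recv(4096)   # read (part of) the server's reply, as the original client did
        finally:
            s.close()

    if __name__ == '__main__':
        sample = [(0.0, b'x' * 1980), (2.0, b'x' * 1980)]   # toy pattern, not a real capture
        replay(sample, 'ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com', 50007)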

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare the two, draw conclusions and check the accuracy of the method. To achieve this, we had to filter out the data sent from the data source to the proxy. This sniffed data was then replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also makes the server return the same data, so the behaviour of the packets is very similar to the real case. An example of this approach is represented in Figure 4.1: the diagram on the left shows the traffic in the simulations, with data source, proxy and server, while the diagram on the right is the result of applying the M2M strategy just described. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M
The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server response is something we could not control. Another important aspect is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2 by comparing the graph obtained in the simulation with the one achieved by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then the most remarkable results obtained from recreating the pattern at large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As with the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulations the data source characteristics could easily be modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It was therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when recreating traffic to stress a server [37][38]. First of all, the traffic to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps to follow in order to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
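Assuming the scripts above are invoked from the command line, the whole sequence could be driven by something like the snippet below. The arguments shown are purely illustrative assumptions, since the thesis does not document the exact command-line interface of each script.

    import subprocess

    # Step 1: record a simulation session (Client.py must have been configured beforehand).
    subprocess.check_call(['python', 'Simulation.py'])

    # Step 2: bring up the replay server, picking the instance type for it.
    subprocess.check_call(['python', 'Servertoreplay.py', 'c1.xlarge'])

    # Step 3: extract the pattern from the downloaded pcap and multiply it towards the server.
    subprocess.check_call(['python', 'Replaytraffic.py', 'session.pcap', '80'])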

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending bursts of 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.
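For reference, the behaviour of such a data source can be sketched as follows: it connects to the server (through the proxy in the original simulations, directly in this sketch), sends a fixed-size burst, waits a random one to three seconds, and repeats. This is an illustration of the configuration just described, not the actual Client.py; the server address is a placeholder.

    import random
    import socket
    import time

    def data_source(server, port, burst_size=3960, repetitions=400):
        # Send `repetitions` bursts of `burst_size` bytes, pausing 1-3 s between bursts.
        s = socket.create_connection((server, port))
        try:
            for _ in range(repetitions):
                s.sendall(b'a' * burst_size)
                s.recv(4096)                      # wait for the server's reply
                time.sleep(random.uniform(1, 3))  # random gap makes the traffic more realistic
        finally:
            s.close()

    if __name__ == '__main__':
        data_source('ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com', 50007)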

Once we had a capture, we were ready to multiply the traffic. We performed several recreations, scaling up the number of data sources: we started with one data source and increased the number up to 80, which was considered enough clients to create heavy traffic loads. We can then compare the different results with each other and with the original simulation to extract interesting conclusions. The number of replayers was increased one at a time, every five seconds.
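The staggered start can be pictured with the following sketch, which launches one replayer after another with five seconds of spacing. In the thesis each data source actually runs in its own EC2 instance; plain threads and a dummy workload are used here only to keep the illustration self-contained, and in practice each replayer would run the replay routine sketched in Section 4.2.

    import threading
    import time

    def start_replayers(n_clients, target, spacing=5.0):
        # `target` is a callable that performs one complete traffic recreation towards the server;
        # a new replayer is started every `spacing` seconds.
        threads = []
        for i in range(n_clients):
            t = threading.Thread(target=target, name='replayer-%d' % i)
            t.start()
            threads.append(t)
            time.sleep(spacing)    # one additional data source every five seconds
        for t in threads:
            t.join()

    if __name__ == '__main__':
        # dummy workload so the sketch runs on its own
        start_replayers(5, target=lambda: time.sleep(1))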

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can see that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph for ten clients is rougher, with higher peaks; this is because at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the target server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph shows about twice as many bytes as the blue one in Figure 5.1, which is the expected result since the number of clients is also twice as large. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, the recreation with 80 clients appears to have problems sending data very soon: after 150 seconds the graph grows very little and slowly. In this test a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. The communication therefore seems to reach a limit in the exchange of packets, beyond which bytes cannot be sent any faster; in this case, this happens when the number of clients reaches about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests with a higher quality instance for the server; we used the type c1.xlarge. If we look at Figure 5.3 and compare it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients, and it has peaks and a number of bytes similar to those in Figure 5.2. However, the recreation with 80 clients reaches much higher values using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Even so, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we saw in the previous figures. The most noticeable difference is in the 80-clients graph, where, although no peaks stand out, the average is quite a bit higher than in the other tests. This graph also appears smoother, because packets come and go more often than in the rest, so there cannot be large variations. The main characteristics when recreating many clients are therefore the higher average RTT and the smoothness of the graph obtained. The rest of the tests show a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most cases, with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets reaches, in some tests, a percentage that definitely affects the quality of service. Normally there is a clear change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap in packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. Similar quantities of data were sent throughout the whole simulation. We then obtained results in terms of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time did not really differ until the number of clients increased considerably; with many clients the RTT graph becomes smoother and the average RTT gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we set out to develop a TaaS system for M2M communication. The work was divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from clients to the server.

In the first part, we built the scenario just mentioned in the Amazon Cloud. We then tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and several packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were presented and explained in order to estimate the reliability of the TaaS system developed.
The results of the simulations show that the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results show a performance improvement in the network when using high quality instances, and a deterioration as the number of clients rises.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation, in order to find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we can recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes when we greatly increased the number of data sources. It has also been shown how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variation in these results when comparing different types of server instances.

Overall, the results concerning the amount of bytes the server could handle and the growth of packet loss with the number of clients are good. However, the RTT results were not so satisfactory; after testing many different TCP servers, the RTT behaved differently depending on the server.

These results give an estimate of the functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, demonstrate the usefulness of this system.

6.2 Future Work
In this thesis, the TaaS system created is based on the TCP protocol, and it focuses mainly on performance and scalability testing. As future work we propose to:
1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.
2. Test different types of servers, for instance an HTTP server.
3. Work out the cost of a test before starting it.
4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM SACK Permitted

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables
3.1 RTT with data bursts of 1980 bytes
3.2 RTT with data bursts of 5940 bytes
3.3 Number of TCP retransmissions
3.4 Number of lost packets
3.5 Number of duplicate ACKs
5.1 Percentage of lost packets


List of Figures
1.1 Flow diagram of the developed system
2.1 OSI model
2.2 HTTP request
2.3 Fields of the IP Header
2.4 Datagram fragmentation
2.5 ARP request
2.6 Ethernet layers in OSI model
2.7 UDP protocol header
2.8 TCP protocol header
2.9 Establishing a connection in TCP
2.10 Sliding window method
2.11 Example RTT interval
2.12 Jitter effect
2.13 Relation between Latency and Bandwidth
2.14 Proxy operation
3.1 Structure client server
3.2 Structure client proxy server
3.3 Bytes through the proxy with data burst of 1980 bytes
3.4 Bytes through the proxy with data burst of 5940 bytes
3.5 Structure for simulation
3.6 Bytes through the proxy with data burst of 1980 bytes
3.7 Bytes through the proxy with data burst of 5940 bytes
3.8 Average RTT with 3 data sources
3.9 Average RTT with 10 data sources
4.1 Structure of traffic replayed M2M
4.2 Comparison between simulation and replayed traffic
5.1 Number of bytes over time in different tests
5.2 Bytes using an m1.large instance for the server
5.3 Bytes using a c1.xlarge instance for the server
5.4 Average RTT extracted from the traffic recreations


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.
[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.
[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.
[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.
[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol", 1980.
[21] S. Kollar, "Introduction to IPv6", 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.
[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.
[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


CHAPTER 1

Introduction

11 Background

Cloud computing [1] provides access in the network to different resources such as soft-

ware servers storage and so on in an efficient way Clients can access to these services

on their own without human interaction since everything is done automatically All ap-

plications are offered over Internet therefore users can access from any location and

with different electronic devices Capability of cloud computing can be easily modified in

order to supply properly its services to the clients regardless of their number Moreover

applications can be monitored and analyzed to give information of their conditions to

both user and provider

Cloud structure can be divided in two parts the front end which is the part the user

can see and the back end that involves the computers servers and networks which are

part of the cloud computer [2] Moreover a main server takes over of the cloud structure

ensuring a good service depending on the number of clients

Nowadays TaaS [3] is very significant as it implies cost sharing of computing resources

cost reduction scalable test structures and testing service availability at any time More-

over TaaS provides the model pay-as-you-test for customers All this characteristics make

TaaS be an efficient way for testing in the cloud The main reasons why there would be

clients interested in TaaS is the fact that this system can inform about several signifi-

cant software and network features such as functionality reliability performance safety

and so on In order to measure these characteristics there are several types of tests for

services in the cloud This thesis was mainly focused on performance testing [4] These

tests are usually carried out to provide information about speed scalability and stability

It is very common the use of performance testing to find out the performance of software

before coming out to the market to ensure it will meet all the requirements to run effi-

ciently Performance testing more specifically can be divided in several kinds of tests

The most related to this thesis are load testing to find out the behaviour of the server

under traffic loads and scalability testing to determine also performance and reliability

5

6 Introduction

of the server when increasing the load The process to develop a performance testing

involves the next steps [4]

1 Identify your testing environment it is necessary to know the physical environment

where the test will be developed as well as the testing tools required

2 Identify the performance criteria this includes limit of response times and other

values the simulations must meet to consider that the performance is good enough

to offer a reliable service

3 Design performance tests test all the different cases which could be taken for the

service or application

4 Configuring the environment prepare the environment and tools before starting

the simulations

5 Implement test design develop suitable performance tests for the test design

6 Run the tests start simulations and display values of the test

7 Analyze test look into the results to check the performance of the service

Performance testing will ensure cloud services so that applications will run properly

These are the most recommended steps to develop TaaS [3] and we have taken them in

account in this work This service provides good features such as elasticity safety easy

handling reliable environment and flexibility when choosing options regarding instance

storage

12 Problem statementNowadays TaaS [3] is something very common due to the wide use of internet clouds and

the large number of applications provided on them Therefore we found interesting to

use this concept to create a new approach for testing a particular scenario In this thesis

we focused on developing a different method to apply TaaS in a M2M framework In

addition this project could be modified to test different scenarios for further research

For example it would be possible to add more servers increasing the number of instances

in the scripts used in this project

The acronym M2M [5] can have different meanings such as Machine-to-Machine Machine-

to-Man Machine-to-Mobile and so on However M2M does have a clear goal which

is to allow the exchange of information over a communication network between two end

points

When it comes to test networks it is necessary to do it with the expected traffic that

will go through that network whenever is used To do so there are two different ways [6]

The first one is simulating the type of traffic that is suppose to go over the network TaaS

13 Method 7

systems use this option allowing TaaS users to configure the test simulations according

to their needs To configure these simulations some tools are used in the cloud such as

Selenium and Jmeter [3]

The second way to test networks is replay recorded network traffic The purpose of

this thesis is to apply this second way to create a TaaS system to test networks in the

cloud In order to replay recorded traffic we followed a method based on a replay attack

[7] which is explained in the next section

In this way we created a TaaS system that can estimate network performance using

a different method than the other systems already made First we must also configure

the simulations to test the server However the main difference in our method is that

we then extract the traffic pattern from those simulations in order to multiply it from a

black box so that we can stress the server This is an interesting method because since

we recreate precisely real exchange of traffic the results are very actual and accurate

Finally we had to prove the feasibility of the method applied

The TaaS system developed was done for testing in the Amazon Cloud [8] which allowed

us to set up the whole scenario easily and use different type of instances These instances

differ in features such as memory storage network performance and so on [9] Therefore

it was interesting to compare results when we picked out one sort of instance or another

13 Method

The method followed during this thesis is divided into three steps First we set up a

proxy between client and server to extract the traffic Then we had to figure out a way

to recreate this traffic to finally replay it M2M to test the server

In the following paragraphs the method is described in detail The first step consisted

of setting up a scenario client-proxy-server in the cloud Then we could run simulations

to look into the behaviour of the packets going over the network Afterwards we could

check how the network performance was different when we changed some factors (number

of clients type of instance etc) in this network The packets were sniffed in the proxy

with the tool tshark [10] Once we have some knowledge about the network simulated

we could start developing a method to extract a traffic pattern from those simulations

We must take in account that the script programmed must obtain the traffic pattern

properly So that when it comes to recreate the same traffic M2M the behaviour of the

packets was as similar as possible to the original simulation To achieve this goal we had

to extract the data sent and the timestamp of the packets with high precision

Once the pattern from the simulations is extracted we moved on to the third and last

step where we multiplied the traffic pattern scaling up the number of clients In this

way large traffic load recreations were carried out to test the server limits and find out

how this server could handle heavy traffic loads These data sources sent the pattern

extracted directly to the server in a M2M framework Finally when we obtained the

final results we could find out the server performance and the feasibility of the approach

8 Introduction

developed

This method carried out is a kind of replay attack [7] where there is a rdquoman-in-the-

middlerdquo (Wireshark sniffing in proxy) which intercepts the traffic Then this traffic is

replayed pretending to be the original sender in order to create problems to the host

server In our thesis this traffic is scaled up from a multiplier to stress and find out the

software limits

Regarding the traffic pattern extraction several tools were studied and two methods

to extract and replay traffic were considered The first method consisted of modifying

the pcap files previously recorded in the proxy so that the packets could be sent straight

to the server from the data sources Tcprewrite [11] was the tool used to modify the pcap

file The next step was to use another tool to recreate the traffic contained in the new

pcap One of them was Scapy [12] which sent the packets but not in the appropriate

time Another tool used was Tcpreplay [13] but it was not possible to simulate the server

since this tool does not work with the transport level Therefore we could not establish

a valid TCP connection [14] Finally we used the second method which is based on

programming our own script to extract the traffic pattern and recreate it later on This

way chosen was much trickier but much more suitable as well as completely automatic

With this method was not necessary to do anything handwriting (like modifying the

pcap file from the console) We just needed to type a few options such as name of file

to replay or server instance type After programming the script to extract the pattern

we needed to replay it somehow We tried again with Scapy [12] a very good tool

when it comes to create and configure packets However there were some problems to

receive segments coming from the server since it was needed to work with sequence and

acknowledge numbers Therefore this was extremely hard and finally sockets were used

again to replay the pattern

A diagram with the whole system carried out to develop the thesis is shown in the

Figure 11 The top of this Figure 11 (network traffic simulation) refers to the first part

of the thesis where we had to set up a client-proxy-server communication in the cloud

With this scenario we could run the simulations to exchange packets Below we can see

the next steps The traffic was recorded in the proxy for a further analysis and pattern

extraction Finally with the pattern obtained we came down to the last part of the

thesis shown on the bottom of the Figure 11 (traffic pattern recreation) In this part

we set up a multiplier composed of many data sources which recreated the traffic pattern

towards the same server In this way we could find out the server performance when it

comes to handle heavy traffic loads

14 Delimitations

During this thesis we have made some delimitations The project could be improved to

cover most of these aspects in further research

The TaaS system developed can only function in the Amazon Cloud because the main

15 Outline 9

library used to program this system works only for this particular cloud The goal of this

project is testing M2M communications we cannot test different scenarios apart from a

client-server connection Moreover we have focused on finding out server performance

therefore the scope of this TaaS system is to carry out performance and scalability tests

In addition the TaaS system programmed uses TCP sockets to test servers for instance

we cannot use http requests for testing

15 OutlineThe thesis is organized as follows

Introduction is in Chapter 1 Chapter 2 describes related work In Chapter 3 is

described simulations and analysis of the scenario data source proxy and server Traffic

pattern extraction is described in Chapter 4 Design of TaaS [3] for M2M communications

and results achieved with this system are in Chapter 5 Summary of the whole thesis

main results obtained and future work are in Chapter 6 Finally Chapter 7 includes the

appendices

10 Introduction

Figure 11 Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols and they are required to establish

connections within the network and transfer data among hosts All of them have their

place in the OSI model [15] where they are classified depending on their function In this

section the main protocols are explained properly since they are essential to analyze a

client-server communication Moreover some significant data needed to measure network

performance in testing are described as well as the tools to carry out simulations and

analyze segments

21 Communication protocols

In order to develop this thesis it was crucial to give a description of the most significant

protocols and explain their functions In this way it was easier to look into all the sniffed

packets check that everything is working properly It is also useful to have a detailed

knowledge of protocols when it comes to recreate the traffic pattern

Protocols are the objects that use the different OSI model layers in order to establish

a communication within a network [15] Each protocol provides two different interfaces

The first one is a service interface to deliver to the other objects in the same machine

that want to use the service offers for this protocol The other interface is called peer

interface and is sent and used for its equivalent in another machine

However before explaining the different kind of protocols it is important to describe

how they are organized depending on their function and see the whole where they all take

place To avoid a system becomes too complex it is needed to add levels of abstraction

In networks systems this is also applied creating layers with distinct functions each

In this way the problem of building a network is divided into more manageable parts

Another advantage is the ease to add new services since it will not be necessary to

modify all the part but only the one where the service will be introduce In networks

the architecture chosen is named the OSI model [15] Networks follow this structure

11

12 Related work

when connecting computers This architecture is composed by seven levels with different

functions These levels are represented from top to bottom in the Figure 21

Figure 21 OSI model

First the physical layer identifies the physical features of the network These charac-

teristics can be related with the hardware such as type of cables and connectors or with

the network topology (bus ring star and so on) This layer also determines voltage

and frequency that signals will use About data link layer it transmits the data from

upper levels to the physical layer but is also in charge of error detection and correction

and hardware addressing The main function of the network layer is to provide a mech-

anism to select routes within the network in order to exchange packets among different

systems This layer uses mainly the IP protocol The transport layer takes charge of

transporting data in the network To ensure packets get to the destination properly this

layer can check errors in the sending make sure the data goes to the right service in

the upper levels and divide packets in others more manageable (segmentation process)

The most significant protocols are TCP and UDP about which we will talk later The

session layer sets up connections between two endpoints (normally applications) making

sure the application on the other system has the proper settings to communicate with

the source application The next level contain the presentation layer which transform

the data linked to the application into another format in order to send it through the

network Finally the application layer gets requests and data from users in order to send

21 Communication protocols 13

them to the lower layers The most common application protocol is HTTP

211 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is a protocol in the application level used

for distributed collaborative hypermedia information systems This network protocol

is used for communication among users proxies or gateways to other Internet systems

HTTP is used to deliver files but another important function of this protocol is linked to

the transmission of resources A resource is a network data object that can be identified

by a URI Normally these resources are either files or outputs of a script Each HTTP

message has the general form shown in the List 21 [15]

Listing 21 HTTP message

START LINE ltCLRFgt

MESSAGE HEADER ltCLRFgt

ltCLRFgt

MESSAGE BODY ltCLRFgt

The first line shows whether this is a response or request message The next lines

provide parameters and options for the message There are different kinds of header lines

in HTTP and there is not limit on the number of lines that can be sent The last part is

a body of data sent after the header lines

Overall operation

HTTP is a requestresponse protocol and the port used by default is 80 but it is possible

to use other ports An important HTTP request methods are called GET and POST

[16] This method is used to request and retrieve data from a specified resource However

there is another request method which is very significant for this thesis and its name is

CONNECT [16] This method is used to send data through a proxy that can act like a

tunnel Therefore we needed this method to establish a connection between client and

server through the proxy

A few characteristics of HTTP communication must be pointed out There is not a

permanent connection when the request is sent the client disconnects from the server

The server will have to enable the connection again As a result client and server know

that there is a connection between them only during a request Therefore they cannot

keep information about the requests Any kind of data can be transmitted by HTTP as

long as client and server know how to manage the data A typical example of HTTP

request is shown in the Figure 22

To set up a communication with this protocol a client must open a connection sending

a request message to the server which returns a response message Afterwards the

14 Related work

Figure 22 HTTP request

server will close the connection First of all we will describe the initial line of the request

and response message Concerning the request message this first line consists of three

parts The first one is the HTTP request method The second part is the path of the

requested resource This part is called URI And finally the version of HTTP that is

being used This idea can be clearly seen in the List 22 This example was extracted

from the simulations made during the thesis

Listing 22 HTTP request with CONNECT

CONNECT

ec2minus54minus217minus136minus250euminuswest minus1compute amazonaws com50007

HTTP1 1

The initial line of the response from the server is also divided in three parts The initial

part involves the version of HTTP used for the communication Afterwards there will

be a code [15] for the computer to understand the result of the request The first digit

indicates the class of response We have the codes shown in the List 23

Listing 23 HTTP request result

21 Communication protocols 15

1xx i n f o rmat i ona l message

2xx s u c c e s s in the connect ion

3xx r e d i r e c t s the c l i e n t to another URL

4xx e r r o r l i nked to the c l i e n t

5xx e r r o r l i nked to the s e r v e r

Finally there is a word or sentence in English to describe the status of the connection

Header lines offer information about the request or response or about any object sent

in the message body which will be explained later There are many different headers lines

but they can be classified in four main groups [17] The entity header involves information

about either the request response or the information contained in the message body A

general header is used in both the request and the responseThe request header is sent

by a browser or a client to a server Finally the last kind of header is called response

and is sent by a server in a response to a requestThe format of the header lines is

aHeader-Name valuea Two examples of header lines are shown in the List 24

Listing 24 HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is sent in the body; there may also be text giving information or warning of errors. In a request, the body is where user-entered data or uploaded files are carried to the server. When an HTTP message contains a body, there are usually header lines that describe it. One of these header lines is Content-Type, which indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which indicates how many bytes are used in the body.

212 IP

Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to understand this layer to make sure the information reaches the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP offers a best-effort service model, which provides an unreliable datagram delivery: it is not guaranteed that datagrams reach their destinations. In addition, packets can be delivered out of order or reach the destination more than once.


IP Header

The Figure 23 shows all the fields carried in the IP header.

Figure 23 Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of service. The Total Length field indicates the length in bytes (unlike the Header Length, which is counted in words) of the whole datagram.

When it comes to the Identification field, the sender marks each IP datagram with an ID number before the transmission. The goal is to identify the datagram uniquely, so that when several fragments carrying the same ID value arrive at the destination, the destination host can put the received fragments back together. If some fragment does not arrive, all the fragments with that ID will be discarded. The next field holds three flags. The first flag has no use for now and is set to 0. The D (Don't Fragment) flag, when set to 1, forbids the fragmentation of the datagram into smaller pieces. The M (More Fragments) flag indicates whether the datagram received is the last one of the stream (set to 0) or there are more fragments left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of each fragment within the original datagram, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may stay on the network before being discarded. The main goal of this field is to discard datagrams that wander within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the payload of the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet, the Source Address field must be filled with the IP address of the sender, and the Destination Address with the IP address of the receiver. There is also a field to set some options, if they are required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides a host-to-host service across many different networks with diverse technologies, datagrams have to be managed so that they can travel over all of them. There are two possible approaches [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to fragment and reassemble packets when they are too big to go through some network. The second option is the most suitable one, since networks change continuously and it is especially difficult to choose a single packet size that fits every network. It is also the option used in the Amazon networks where we ran the tests. Knowing how segments are fragmented was important in order to examine each segment sent and its respective answer; in this way the exchange of packets was better organized and the recreation of the traffic pattern was easier.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If the packets happen to cross some network with a smaller MTU, fragmentation is required. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus 20 bytes of header each. Therefore there are 376 bytes left (1400 - 512*2), so the last datagram carries those 376 bytes of data plus 20 bytes of header. The result is shown in the Figure 24.

It should be noted that the amount of data bytes in every fragment except the last must be a multiple of 8. During this process the router sets the M bit in the Flags field of the first and second datagram to indicate that there are more packets coming. As regards the Offset field, in the first packet it is set to 0 because this datagram carries the first part of the original packet, whereas the second datagram has the Offset set to 64, since its first byte of data is the 513th (512/8 = 64).
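The arithmetic behind this example can be sketched in a few lines of Python; the function below is only illustrative and assumes a fixed 20-byte header and the multiple-of-8 rule described above.

def fragment(total_len, mtu, header=20):
    # Return (data_bytes, offset_in_8_byte_units) for each IP fragment.
    payload = total_len - header                  # data carried by the original packet
    max_data = ((mtu - header) // 8) * 8          # largest multiple of 8 that fits in one frame
    fragments, offset = [], 0
    while payload > 0:
        data = min(max_data, payload)
        fragments.append((data, offset // 8))
        offset += data
        payload -= data
    return fragments

# The example from the text: a 1420-byte packet over a 532-byte MTU network.
print(fragment(1420, 532))   # -> [(512, 0), (512, 64), (376, 128)]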


Figure 24 Datagram fragmentation

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link-layer network. To develop a mapping between link-layer addresses and IP addresses, the Address Resolution Protocol (ARP) is used, so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out whether it already has the link-layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, which carries the sender's own IP and link-layer addresses and the IP address of the desired server. Since this message is a broadcast, it is received by every device within the local network. The receivers compare the requested IP address with their own; the hosts with a different IP address drop the packet, but the one we are looking for sends an ARP reply message to the client. This host also updates its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in the Figure 25.

213 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in the Figure 26.

The MAC client must be one of the next two different types of sublayers The first one

is the Logical Link Control (LLC) which supplies the interface from the MAC sublayer

to the upper layers The other option is called bridge entity which provides an interface

between LANs that can be using the same (for instance Ethernet to Ethernet) or different

protocols

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing received frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering from transmission errors.

Figure 25 ARP request

Figure 26 Ethernet layers in OSI model

The physical layer enables the communication between the data link layer and the corresponding physical layer of other systems. In addition, this layer defines significant physical features of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer encodes and decodes bits between binary and phase-encoded form. Concerning channel access, this level sends and receives the encoded data mentioned above and detects collisions in the packet exchange.

214 UDP protocol

User Datagram Protocol (UDP) is defined in the Internet standard RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable: UDP provides a best-effort service to the end system, which means that it does not guarantee the delivery of the datagrams. Therefore this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet and the maximum payload is 65527 bytes over IPv6 [21]. When a UDP datagram is sent, the data and the header go together to the IP network layer, and the computer has to fill the fields of the UDP header properly. The layout of the UDP header is represented in the Figure 27.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis since the proxy between client and server needs to resolve the server's IP address.

Figure 27 UDP protocol header


The UDP header is composed of four fields [15], each one two bytes long. The Source Port indicates the port from which the packet was sent and is, by default, the port to which a reply should be addressed. The Destination Port is the port at the destination to which the packet is sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm to calculate its own checksum. Both checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to separate different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers; the destination port is usually one of these well-known ports, and normally each of them is used for one particular application.
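Since the thesis later relies on Python sockets, a minimal sketch (not taken from the thesis scripts) of a UDP exchange may help illustrate the use of ports described above; the port number is arbitrary and the example runs entirely on the local host.

import socket

PORT = 50008   # arbitrary port chosen for the example

# Receiver: bind a UDP socket to a local address and port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", PORT))

# Sender: no handshake is needed, the datagram is sent immediately.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"status report", ("127.0.0.1", PORT))

data, addr = receiver.recvfrom(2048)
print(data, addr)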

215 TCP protocol

Transmission Control Protocol (TCP) is a transport-layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up, where each endpoint is defined by two parameters: the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to put the datagrams coming from the IP protocol back in order. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. TCP can also multiplex data coming from different sources onto the same line; this task is carried out by the ports.

The TCP header is more complex than the UDP header. Its layout is shown in the Figure 28.

Figure 28 TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port identifies the receiver port. The Sequence Number and Acknowledgement Number fields are explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called reserved in the picture) is unused for now and is set to zero. The flags field carries additional information about the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is activated by the sender to ask the receiver to deliver the data to the application immediately instead of buffering it. Finally, the RST flag is set to reset the connection.

Another important field is the window size, which tells the number of bytes that the receiver can accept without acknowledgement. The Checksum field makes the packet transmission more reliable, since it is used to check the integrity of the segment. The next field in the TCP header is the Urgent Pointer, whose function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of how a TCP connection is set up is shown in the Figure 29.

First of all the client sends a packet to start the communication the SYN flag is set

to 1 and there will be a number carried in the sequence number field When the server


Figure 29 Establishing a connection in TCP

responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number. Both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server; the ACK flag is of course set to 1 again.

Either endpoint can also request to finish the connection. The process starts with a packet with the FIN flag activated, sent for instance by the client. Once the server receives this packet, it sends an acknowledgement and keeps on sending any packets still in progress. Afterwards the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client, which acknowledges it, ending the communication.

Reliable delivery

TCP provides ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequence numbers that can be sent without waiting for acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the receiver by changing the value of the window size field.

Figure 210 Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in the Figure 210, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

22 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and perfor-

mance

221 RTT

Round trip time (RTT) is the time interval from the moment a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to travel through the network. The formula to estimate the RTT within a network is shown in the equation 21.

EstimatedRTT = α * EstimatedRTT + (1 - α) * SampleRTT (21)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in the Figure 211.

Figure 211 Example RTT interval
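As an illustration of equation 21, the short Python sketch below updates the running estimate iteratively with α = 0.9; the RTT samples are made-up values, not measurements from the thesis.

def update_estimate(estimated_rtt, sample_rtt, alpha=0.9):
    # Exponentially weighted moving average of equation 21.
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

# Hypothetical samples in seconds; the first sample initialises the estimate.
samples = [0.0031, 0.0046, 0.0029, 0.0050]
estimate = samples[0]
for s in samples[1:]:
    estimate = update_estimate(estimate, s)
print(round(estimate, 4))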

222 Jitter

Jitter is the variation in the delay of the packets sent within a network [15]. A sender transmits many packets one right after the other, with a certain spacing between them. However, network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter on the packet spacing can be seen in the Figure 212.

Jitter is a serious problem since these fluctuations happen randomly and change very quickly over time, so it is important to correct them as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals, holds them for a short period of time in order to reorder them if necessary, and releases them with the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full the new packets that arrive are dropped and never reach their destination.


Figure 212 Jitter effect

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are combined in the following three formulas:

Latency = Propagation + Transmit + Queue (22)

Propagation = Distance / SpeedOfLight (23)

Transmit = Size / Bandwidth (24)

224 Bandwidth

This concept describes the number of bits that can be transmitted over the network in one second [15]. There is an important relationship between bandwidth and latency. To visualize it, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency its length. A simple drawing of the relation between network latency and bandwidth is given in the Figure 213.

Figure 213 Relation between Latency and Bandwidth

If we multiply both terms, we obtain the number of bits that can be in transit in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 * 10^-3 s * 45 * 10^6 bits/s = 2.25 * 10^6 bits (25)

If more bandwidth is required, adding more pipes solves the problem.

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime (26)

TransferTime = RTT + (1 / Bandwidth) * TransferSize (27)

where RTT is the round trip time.
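Putting equations 26 and 27 together, the following Python sketch estimates the transfer time and the effective throughput; the transfer size, RTT and bandwidth are assumed example values, not figures from the thesis.

def throughput(transfer_size_bits, rtt_s, bandwidth_bps):
    # TransferTime = RTT + TransferSize / Bandwidth   (equation 27)
    transfer_time = rtt_s + transfer_size_bits / bandwidth_bps
    # Throughput = TransferSize / TransferTime        (equation 26)
    return transfer_size_bits / transfer_time, transfer_time

# Example values: 1 MB transfer, 50 ms RTT, 45 Mbps link.
size = 8 * 1024 * 1024            # bits
rate, t = throughput(size, 0.050, 45e6)
print("transfer time: %.3f s, throughput: %.2f Mbps" % (t, rate / 1e6))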

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that the link can in principle carry. However, due to implementation inefficiencies or errors, a pair of nodes connected with a bandwidth of 10 Mbps will usually achieve a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project


231 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections. Connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection towards the original destination address. The purpose of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that each packet carries. Overall, Wireshark is used to troubleshoot and manage network problems, examine security issues and debug protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. Users can easily see a list of captured packets updated in real time, the details of a selected packet and the packet content in hexadecimal and ASCII. In addition, it is possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very convenient.

Tcpdump

Tcpdump [25] is a tool to analyze the packets that are going over the network. Some reasons to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file for future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump provides many options to capture packets in different ways, which gives a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives the request stores a temporary copy of the resource. The next time a user asks for the same web site, the proxy can send the cached copy instead of forwarding the request to the network to fetch the URL again. An example of how a proxy works and handles the incoming requests, asking for each web site only once, is shown in the Figure 214.

Figure 214 Proxy operation

In this way proxies can speed up considerably the delivery of content within the network, but this is not their only function. They may also be used to prevent attackers from learning internal addresses, since proxies can block direct access between two networks; a proxy can thus act as a component of a firewall.

232 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important ones and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main areas in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.

233 Operating Systems

There are several operating systems, such as Microsoft Windows, Linux and Mac OS. However, the availability and ease of use of network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and by employees of many companies and organizations from all over the world as free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good network support: client and server systems can be set up easily and quickly on a Linux computer. It is also secure, since Linux asks the user for permissions. Nowadays this operating system is used more and more both at home and in companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many networking topics which are crucial for the next sections. A deep knowledge of these matters is needed later on, when it comes to analyzing and recreating network traffic.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework in order to extract the traffic pattern properly later on.

31 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger-scale testing later. The structure of this connection is shown in the Figure 31.

Figure 31 Structure client server

The programming language chosen is Python, a high-level language that is highly recommendable for network programming due to how easy it makes working in this field.


To program the client of this application, it was necessary to set the server Internet address and an arbitrary port for the exchange of data, and then to create a socket and connect it to that address and port. To program the server, it is required to set the hostname and the same port used by the client, create a socket, and bind both hostname and port to it. Finally, we made the socket wait for incoming connections from the client and accept them.
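A minimal sketch of this kind of client and server, using Python's socket module, is shown below. It is not the exact script used in the thesis: the port number and the echoed burst size are chosen only for illustration.

import socket

HOST, PORT = "0.0.0.0", 50007   # the server listens on an arbitrary port

def run_server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))          # bind hostname and port to the socket
    srv.listen(1)                   # wait for an incoming connection
    conn, addr = srv.accept()       # accept the client connection
    data = conn.recv(4096)
    conn.sendall(data)              # echo the data back to the client
    conn.close()

def run_client(server_address):
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((server_address, PORT))
    cli.sendall(b"x" * 1980)        # one data burst, as in the later tests
    reply = cli.recv(4096)
    cli.close()
    return reply

The two functions would run on the server and client instances respectively; run_server() must be started before run_client() is called with the server's address.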

The Listing 31 shows the packets required to establish a client-server connection.

Listing 31 Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark shows by default relative sequence numbers starting at zero). The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

To terminate the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection, and an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one side would be closed and the other one could still send data. These two packets are set out in the Listing 32.

Listing 32 Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments exchanged between client and server.

32 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of a client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in the Figure 32. After setting up this connection, we sent traffic in order to analyze the segments, measure the performance and extract a traffic pattern.

Figure 32 Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the communication pattern and generate realistic loads towards the server. At the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in and install the libraries required by the scripts. Moreover, programs such as tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests are run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python due to its ease for developing anything related to networks.
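A rough sketch of how such a scenario can be launched with boto is given below. It is not the actual Simulation.py: it assumes boto version 2 with credentials configured in the boto file, and the region, AMI id, key pair and instance types are placeholders.

import time
import boto.ec2

# Placeholders: region, AMI and key pair must match the account being used.
conn = boto.ec2.connect_to_region("eu-west-1")

def launch(instance_type, count=1):
    # Start 'count' EC2 instances of the requested type and wait until they run.
    reservation = conn.run_instances("ami-00000000", min_count=count,
                                     max_count=count, key_name="taas-key",
                                     instance_type=instance_type)
    instances = reservation.instances
    while any(i.update() != "running" for i in instances):
        time.sleep(5)
    return instances

server = launch("c1.xlarge")[0]
proxy = launch("m1.large")[0]
sources = launch("t1.micro", count=3)
print(server.public_dns_name, proxy.public_dns_name)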

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1; this is done only once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source: the response from the proxy with the SYN and ACK flags set to 1, indicating that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in the Listing 33 and is the three-way handshake [31] again.

Listing 33 Establishing data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the DNS name of the server. The proxy then looks for the IP address of that server by sending DNS queries. We can see this in the Listing 34.

Listing 34 Searching server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"  "10.235.11.67"  "DNS"  "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"  "10.235.11.67"  "DNS"  "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between proxy and server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in the Listing 35.

Listing 35 Establishing proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 Connection established response gets back to the data source, so the connection is now ready to start sending data. In these simulations it was decided to send data from time to time, with random periods in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose one exchange of data between data source and server are shown in the Listing 36.

Listing 36 Exchange of data source-proxy-server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In the Listing 36, the packets with the PSH flag set to 1 denote that data is being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in the Listing 36 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was placed in a different instance and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data and later with a heavier load of 5940 bytes. These bursts were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds, as sketched below.
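The behaviour of a single data source in these tests can be summarised with a short Python sketch: connect once, then send a fixed-size burst repeatedly with a random pause of one or two seconds between bursts. This is only an illustration of the described behaviour, not the thesis script, and the server address is a placeholder.

import random
import socket
import time

SERVER = ("server.example.com", 50007)   # placeholder address and port
BURST = b"x" * 1980                      # light load; 5940 bytes for the heavy one

sock = socket.create_connection(SERVER)  # the connection is established only once
for _ in range(200):
    sock.sendall(BURST)                  # one data burst
    sock.recv(4096)                      # the server echoes the data back
    time.sleep(random.choice((1, 2)))    # random waiting time between bursts
sock.close()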

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. The Figure 33 represents the number of bytes over time in the proxy for the simulation with data bursts of 1980 bytes, while the Figure 34 represents the simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in the Figure 34 is approximately three times bigger than in the Figure 33. This makes sense, since the amount of data sent is three times bigger as well and therefore around three times as many packets are needed. Another issue to point out is that the Figure 34 is smoother than the other one. This is not only due to the scale of the graph, but also because the frequency and the amount of segments being sent in the second case are higher.

33 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in the Figure 35. Using the Amazon cloud it is possible to place one client in each instance, so that the proxy receives packets from different IP addresses, as it would in a real case.

Figure 35 Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. The Figure 36 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and the Figure 37 does the same for data bursts of 5940 bytes.

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes


The Figure 37 shows a much larger amount of bytes exchanged compared to the Figure 36, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are sent is high.

34 Performance results

In this section the performance of the network is analyzed in several ways. First we look into the RTT values with different numbers of clients, and then we analyze other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For the Figure 38 packets were being sent to the server from three different instances, whereas in the Figure 39 there were up to ten data sources working.

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not pose a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in the Figure 38. In the other graph there are many more high peaks, so the RTT in this case is slightly higher. As expected, the more clients there are, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.


Figure 39 Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. If we look over the Tables 31 and 32 we do not see large differences. Moreover, the lower in the table, the shorter the times should be; however, this does not apply in every case, so the type of instance is not very significant here. Speaking only about RTT values, the simplest instance seems to be enough for these exchanges of data. Concerning the number of clients, there is a slight difference, especially comparing the RTT of 5 or 10 data sources with only one, but in general the results are quite similar because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 31 RTT (in seconds) with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance, such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show averages, since several simulations were carried out for each case.

In the Table 33 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulties in the communication.

Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 32 RTT (in seconds) with data bursts of 5940 bytes

With this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

The Table 34 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the weakest instance; there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every other instance gives better results than the t1.micro one; nevertheless, there is no very significant gap among these three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in the Table 35 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates that it is harder to send traffic properly with many data sources. It must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
c1.medium              5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
c1.xlarge              5940 bytes   0          0           0           2.5

Table 33 Number of TCP retransmissions


Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
c1.medium              5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
c1.xlarge              5940 bytes   0.5        5           9           54.5

Table 34 Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
t1.micro               5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
c1.medium              5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
c1.xlarge              5940 bytes   0.5        0           0           0

Table 35 Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies just a little across the different tests, most probably because the data sent does not stress the server enough. However, the last analysis, of the values related to packet loss and retransmissions, gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked. Nevertheless, the kind of instance was also important in improving the performance; this was more noticeable when the number of data sources was high. The larger instances solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed to find a method for a proper extraction so that the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets have to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

41 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the needed features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
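A sketch of the kind of extraction the script performs with dpkt is shown below. It is not the actual Extractpattern.py; it assumes Ethernet framing in the capture, keeps only TCP packets, and the pcap file name is a placeholder.

import dpkt

def extract(pcap_path):
    records = []
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue                     # keep only TCP traffic
            tcp = ip.data
            # Timestamp, total frame length and application payload of the packet.
            records.append((ts, len(buf), bytes(tcp.data)))
    return records

packets = extract("simulation.pcap")         # placeholder file name
first_ts = packets[0][0] if packets else None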

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of packet by packet. This is exactly the same way the data source sent packets in the simulations, so this method was much better for recreating the obtained traffic pattern. Moreover,

the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the capture, since the replay could start its data at the same relative time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was present, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out in the script written to extract the pattern. These segments would be very difficult to recreate and they are not noticeable in the simulation due to their very small weight.

42 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing it, the program is ready to replay the data in an accurate and timely manner and with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, so that it runs as fast as possible and sends the packets accurately. It should be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
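A simplified sketch of the replay step is given below: given a list of (timestamp, payload) bursts extracted from the capture, the data is sent over a plain TCP socket while preserving the original spacing. The server address is a placeholder and the real Replaytraffic.py does more bookkeeping.

import socket
import time

def replay(bursts, server=("server.example.com", 50007)):
    # 'bursts' is a list of (timestamp, payload_bytes) taken from the capture.
    sock = socket.create_connection(server)
    start, first_ts = time.time(), bursts[0][0]
    for ts, payload in bursts:
        # Wait until the same relative instant as in the original capture.
        delay = (ts - first_ts) - (time.time() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(payload)
        sock.recv(4096)        # the server answers with the same data
    sock.close()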

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw conclusions and check the accuracy of the method. To achieve this, we had to filter the data sent from data source to proxy. This sniffed data was replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also makes the server return the same data, so the behaviour of the packets is very similar to the real case. An example of this approach is represented in the Figure 41: the figure on the left shows the traffic in the simulations with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 41 Structure of traffic replayed M2M

The results of following this strategy are shown in the Figure 42, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly, which is due to the fact that the server response is something we could not control. Another important issue is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in the Figure 42 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved, we demonstrate the reliability of the TaaS [3] system created.

51 Design of TaaS for M2M communications

As in the previous simulations, the scenario used to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
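For illustration, a server instance of the chosen type could be launched with the boto library [28] roughly as in the sketch below. The AMI identifier, key pair and security group are placeholders and not values taken from the TaaS scripts; the region and instance type match the environment described in this thesis.

import time
import boto.ec2

def launch_server(instance_type="m1.large"):
    """Start one EC2 instance of the requested type and return its public DNS name."""
    conn = boto.ec2.connect_to_region("eu-west-1")
    reservation = conn.run_instances(
        "ami-00000000",              # hypothetical image with the server script installed
        instance_type=instance_type, # e.g. m1.large or c1.xlarge, as in the tests below
        key_name="taas-key",         # placeholder key pair
        security_groups=["taas-sg"]) # placeholder security group
    instance = reservation.instances[0]
    while instance.state != "running":   # wait until the instance is reachable
        time.sleep(5)
        instance.update()
    return instance.public_dns_name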

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important points when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic that is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from several points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First, we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
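The sketch below illustrates the kind of data-source loop that Client.py configures: a burst of a fixed amount of data is sent over a TCP socket, the script waits a random time between one and three seconds, and the process is repeated a chosen number of times (the default values correspond to the simulation described in Section 5.3). The host name, port number and function name are illustrative, the sketch assumes the server answers every burst, and for simplicity it connects directly to the server, whereas in the simulations the connection goes through the proxy.

import random
import socket
import time

def run_data_source(server_host, server_port=50007,
                    burst_size=3960, repetitions=400):
    """Send bursts of data to the server with a random 1-3 s pause between them."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((server_host, server_port))
    payload = b"x" * burst_size              # amount of data per burst
    for _ in range(repetitions):             # number of repetitions
        sock.sendall(payload)
        sock.recv(4096)                      # read the server reply (assumed to answer each burst)
        time.sleep(random.uniform(1, 3))     # random waiting time between bursts
    sock.close()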

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to draw interesting conclusions. The number of data sources was increased one at a time, every five seconds.
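A minimal sketch of this ramp-up, under the same assumptions as the previous examples, is shown below: every five seconds one more thread starts replaying the extracted (delay, payload) pattern towards the server. This is only an outline of the idea behind the multiplier, not the actual Replaytraffic.py.

import socket
import threading
import time

def replay_pattern(server_host, server_port, pattern):
    """Replay a list of (delay, payload) tuples over one TCP connection."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((server_host, server_port))
    for delay, payload in pattern:
        time.sleep(delay)          # keep the original inter-packet timing
        sock.sendall(payload)
    sock.close()

def multiply(server_host, server_port, pattern, clients=80, ramp_interval=5):
    """Start one replaying data source every ramp_interval seconds."""
    threads = []
    for _ in range(clients):
        t = threading.Thread(target=replay_pattern,
                             args=(server_host, server_port, pattern))
        t.start()
        threads.append(t)
        time.sleep(ramp_interval)  # add one data source every five seconds
    for t in threads:
        t.join()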

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests
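Curves like those in Figures 5.1 to 5.3 can be derived from the downloaded captures by summing TCP payload bytes into one-second bins, for example with a sketch along the following lines (the capture file name is an assumption, and the exact plotting procedure used for the figures is not shown here).

import collections
import dpkt

def bytes_per_second(pcap_path="recreation.pcap"):
    """Return a sorted list of (second_since_start, payload_bytes) pairs."""
    bins = collections.Counter()
    start = None
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            if start is None:
                start = ts
            bins[int(ts - start)] += len(ip.data.data)  # TCP payload bytes in this second
    return sorted(bins.items())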

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server, and the red graph from up to 80 sources sending data. The black graph has about twice the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients problems sending data appear very soon: after 150 seconds the graph rises only a little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients reaches about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests with a higher-quality instance for the server; we used the type c1.xlarge. If we look at Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is considerably higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120       0.125

Table 5.1: Percentage of lost packets
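As a rough illustration of how a loss figure like those in Table 5.1 can be obtained from a capture, the sketch below counts as retransmissions the data segments whose flow and sequence number have already been seen. Wireshark and tshark expose the same information through their TCP analysis flags, so this is only a simplified approximation, not the exact procedure used for the table.

import socket
import dpkt

def retransmission_percentage(pcap_path):
    """Approximate the percentage of retransmitted data segments in a capture."""
    seen = set()
    data_segments = 0
    retransmissions = 0
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP) or len(ip.data.data) == 0:
                continue
            tcp = ip.data
            key = (socket.inet_ntoa(ip.src), socket.inet_ntoa(ip.dst),
                   tcp.sport, tcp.dport, tcp.seq)
            data_segments += 1
            if key in seen:               # same flow and sequence number seen before
                retransmissions += 1
            else:
                seen.add(key)
    return 100.0 * retransmissions / data_segments if data_segments else 0.0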

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients the server can handle varies depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk, in terms of packet delivery, of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came down to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicate ACKs, showed more varied values. These results showed a performance improvement in the network when using high-quality instances and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using a m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.
[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.
[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.
[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.
[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol", 1980.
[21] S. Kollar, "Introduction to IPv6", 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.
[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.
[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.



CHAPTER 1

Introduction

11 Background

Cloud computing [1] provides access in the network to different resources such as soft-

ware servers storage and so on in an efficient way Clients can access to these services

on their own without human interaction since everything is done automatically All ap-

plications are offered over Internet therefore users can access from any location and

with different electronic devices Capability of cloud computing can be easily modified in

order to supply properly its services to the clients regardless of their number Moreover

applications can be monitored and analyzed to give information of their conditions to

both user and provider

Cloud structure can be divided in two parts the front end which is the part the user

can see and the back end that involves the computers servers and networks which are

part of the cloud computer [2] Moreover a main server takes over of the cloud structure

ensuring a good service depending on the number of clients

Nowadays TaaS [3] is very significant as it implies cost sharing of computing resources

cost reduction scalable test structures and testing service availability at any time More-

over TaaS provides the model pay-as-you-test for customers All this characteristics make

TaaS be an efficient way for testing in the cloud The main reasons why there would be

clients interested in TaaS is the fact that this system can inform about several signifi-

cant software and network features such as functionality reliability performance safety

and so on In order to measure these characteristics there are several types of tests for

services in the cloud This thesis was mainly focused on performance testing [4] These

tests are usually carried out to provide information about speed scalability and stability

It is very common the use of performance testing to find out the performance of software

before coming out to the market to ensure it will meet all the requirements to run effi-

ciently Performance testing more specifically can be divided in several kinds of tests

The most related to this thesis are load testing to find out the behaviour of the server

under traffic loads and scalability testing to determine also performance and reliability

5

6 Introduction

of the server when increasing the load The process to develop a performance testing

involves the next steps [4]

1 Identify your testing environment it is necessary to know the physical environment

where the test will be developed as well as the testing tools required

2 Identify the performance criteria this includes limit of response times and other

values the simulations must meet to consider that the performance is good enough

to offer a reliable service

3 Design performance tests test all the different cases which could be taken for the

service or application

4 Configuring the environment prepare the environment and tools before starting

the simulations

5 Implement test design develop suitable performance tests for the test design

6 Run the tests start simulations and display values of the test

7 Analyze test look into the results to check the performance of the service

Performance testing will ensure cloud services so that applications will run properly

These are the most recommended steps to develop TaaS [3] and we have taken them in

account in this work This service provides good features such as elasticity safety easy

handling reliable environment and flexibility when choosing options regarding instance

storage

12 Problem statementNowadays TaaS [3] is something very common due to the wide use of internet clouds and

the large number of applications provided on them Therefore we found interesting to

use this concept to create a new approach for testing a particular scenario In this thesis

we focused on developing a different method to apply TaaS in a M2M framework In

addition this project could be modified to test different scenarios for further research

For example it would be possible to add more servers increasing the number of instances

in the scripts used in this project

The acronym M2M [5] can have different meanings such as Machine-to-Machine Machine-

to-Man Machine-to-Mobile and so on However M2M does have a clear goal which

is to allow the exchange of information over a communication network between two end

points

When it comes to test networks it is necessary to do it with the expected traffic that

will go through that network whenever is used To do so there are two different ways [6]

The first one is simulating the type of traffic that is suppose to go over the network TaaS

13 Method 7

systems use this option allowing TaaS users to configure the test simulations according

to their needs To configure these simulations some tools are used in the cloud such as

Selenium and Jmeter [3]

The second way to test networks is replay recorded network traffic The purpose of

this thesis is to apply this second way to create a TaaS system to test networks in the

cloud In order to replay recorded traffic we followed a method based on a replay attack

[7] which is explained in the next section

In this way we created a TaaS system that can estimate network performance using

a different method than the other systems already made First we must also configure

the simulations to test the server However the main difference in our method is that

we then extract the traffic pattern from those simulations in order to multiply it from a

black box so that we can stress the server This is an interesting method because since

we recreate precisely real exchange of traffic the results are very actual and accurate

Finally we had to prove the feasibility of the method applied

The TaaS system developed was done for testing in the Amazon Cloud [8] which allowed

us to set up the whole scenario easily and use different type of instances These instances

differ in features such as memory storage network performance and so on [9] Therefore

it was interesting to compare results when we picked out one sort of instance or another

13 Method

The method followed during this thesis is divided into three steps First we set up a

proxy between client and server to extract the traffic Then we had to figure out a way

to recreate this traffic to finally replay it M2M to test the server

In the following paragraphs the method is described in detail The first step consisted

of setting up a scenario client-proxy-server in the cloud Then we could run simulations

to look into the behaviour of the packets going over the network Afterwards we could

check how the network performance was different when we changed some factors (number

of clients type of instance etc) in this network The packets were sniffed in the proxy

with the tool tshark [10] Once we have some knowledge about the network simulated

we could start developing a method to extract a traffic pattern from those simulations

We must take in account that the script programmed must obtain the traffic pattern

properly So that when it comes to recreate the same traffic M2M the behaviour of the

packets was as similar as possible to the original simulation To achieve this goal we had

to extract the data sent and the timestamp of the packets with high precision

Once the pattern from the simulations is extracted we moved on to the third and last

step where we multiplied the traffic pattern scaling up the number of clients In this

way large traffic load recreations were carried out to test the server limits and find out

how this server could handle heavy traffic loads These data sources sent the pattern

extracted directly to the server in a M2M framework Finally when we obtained the

final results we could find out the server performance and the feasibility of the approach

8 Introduction

developed

This method carried out is a kind of replay attack [7] where there is a rdquoman-in-the-

middlerdquo (Wireshark sniffing in proxy) which intercepts the traffic Then this traffic is

replayed pretending to be the original sender in order to create problems to the host

server In our thesis this traffic is scaled up from a multiplier to stress and find out the

software limits

Regarding the traffic pattern extraction several tools were studied and two methods

to extract and replay traffic were considered The first method consisted of modifying

the pcap files previously recorded in the proxy so that the packets could be sent straight

to the server from the data sources Tcprewrite [11] was the tool used to modify the pcap

file The next step was to use another tool to recreate the traffic contained in the new

pcap One of them was Scapy [12] which sent the packets but not in the appropriate

time Another tool used was Tcpreplay [13] but it was not possible to simulate the server

since this tool does not work with the transport level Therefore we could not establish

a valid TCP connection [14] Finally we used the second method which is based on

programming our own script to extract the traffic pattern and recreate it later on This

way chosen was much trickier but much more suitable as well as completely automatic

With this method was not necessary to do anything handwriting (like modifying the

pcap file from the console) We just needed to type a few options such as name of file

to replay or server instance type After programming the script to extract the pattern

we needed to replay it somehow We tried again with Scapy [12] a very good tool

when it comes to create and configure packets However there were some problems to

receive segments coming from the server since it was needed to work with sequence and

acknowledge numbers Therefore this was extremely hard and finally sockets were used

again to replay the pattern

A diagram with the whole system carried out to develop the thesis is shown in the

Figure 11 The top of this Figure 11 (network traffic simulation) refers to the first part

of the thesis where we had to set up a client-proxy-server communication in the cloud

With this scenario we could run the simulations to exchange packets Below we can see

the next steps The traffic was recorded in the proxy for a further analysis and pattern

extraction Finally with the pattern obtained we came down to the last part of the

thesis shown on the bottom of the Figure 11 (traffic pattern recreation) In this part

we set up a multiplier composed of many data sources which recreated the traffic pattern

towards the same server In this way we could find out the server performance when it

comes to handle heavy traffic loads

14 Delimitations

During this thesis we have made some delimitations The project could be improved to

cover most of these aspects in further research

The TaaS system developed can only function in the Amazon Cloud because the main

15 Outline 9

library used to program this system works only for this particular cloud The goal of this

project is testing M2M communications we cannot test different scenarios apart from a

client-server connection Moreover we have focused on finding out server performance

therefore the scope of this TaaS system is to carry out performance and scalability tests

In addition the TaaS system programmed uses TCP sockets to test servers for instance

we cannot use http requests for testing

15 OutlineThe thesis is organized as follows

Introduction is in Chapter 1 Chapter 2 describes related work In Chapter 3 is

described simulations and analysis of the scenario data source proxy and server Traffic

pattern extraction is described in Chapter 4 Design of TaaS [3] for M2M communications

and results achieved with this system are in Chapter 5 Summary of the whole thesis

main results obtained and future work are in Chapter 6 Finally Chapter 7 includes the

appendices

10 Introduction

Figure 11 Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols and they are required to establish

connections within the network and transfer data among hosts All of them have their

place in the OSI model [15] where they are classified depending on their function In this

section the main protocols are explained properly since they are essential to analyze a

client-server communication Moreover some significant data needed to measure network

performance in testing are described as well as the tools to carry out simulations and

analyze segments

21 Communication protocols

In order to develop this thesis it was crucial to give a description of the most significant

protocols and explain their functions In this way it was easier to look into all the sniffed

packets check that everything is working properly It is also useful to have a detailed

knowledge of protocols when it comes to recreate the traffic pattern

Protocols are the objects that use the different OSI model layers in order to establish

a communication within a network [15] Each protocol provides two different interfaces

The first one is a service interface to deliver to the other objects in the same machine

that want to use the service offers for this protocol The other interface is called peer

interface and is sent and used for its equivalent in another machine

However before explaining the different kind of protocols it is important to describe

how they are organized depending on their function and see the whole where they all take

place To avoid a system becomes too complex it is needed to add levels of abstraction

In networks systems this is also applied creating layers with distinct functions each

In this way the problem of building a network is divided into more manageable parts

Another advantage is the ease to add new services since it will not be necessary to

modify all the part but only the one where the service will be introduce In networks

the architecture chosen is named the OSI model [15] Networks follow this structure

11

12 Related work

when connecting computers This architecture is composed by seven levels with different

functions These levels are represented from top to bottom in the Figure 21

Figure 21 OSI model

First the physical layer identifies the physical features of the network These charac-

teristics can be related with the hardware such as type of cables and connectors or with

the network topology (bus ring star and so on) This layer also determines voltage

and frequency that signals will use About data link layer it transmits the data from

upper levels to the physical layer but is also in charge of error detection and correction

and hardware addressing The main function of the network layer is to provide a mech-

anism to select routes within the network in order to exchange packets among different

systems This layer uses mainly the IP protocol The transport layer takes charge of

transporting data in the network To ensure packets get to the destination properly this

layer can check errors in the sending make sure the data goes to the right service in

the upper levels and divide packets in others more manageable (segmentation process)

The most significant protocols are TCP and UDP about which we will talk later The

session layer sets up connections between two endpoints (normally applications) making

sure the application on the other system has the proper settings to communicate with

the source application The next level contain the presentation layer which transform

the data linked to the application into another format in order to send it through the

network Finally the application layer gets requests and data from users in order to send

21 Communication protocols 13

them to the lower layers The most common application protocol is HTTP

211 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is a protocol in the application level used

for distributed collaborative hypermedia information systems This network protocol

is used for communication among users proxies or gateways to other Internet systems

HTTP is used to deliver files but another important function of this protocol is linked to

the transmission of resources A resource is a network data object that can be identified

by a URI Normally these resources are either files or outputs of a script Each HTTP

message has the general form shown in the List 21 [15]

Listing 21 HTTP message

START LINE ltCLRFgt

MESSAGE HEADER ltCLRFgt

ltCLRFgt

MESSAGE BODY ltCLRFgt

The first line shows whether this is a response or request message The next lines

provide parameters and options for the message There are different kinds of header lines

in HTTP and there is not limit on the number of lines that can be sent The last part is

a body of data sent after the header lines

Overall operation

HTTP is a requestresponse protocol and the port used by default is 80 but it is possible

to use other ports An important HTTP request methods are called GET and POST

[16] This method is used to request and retrieve data from a specified resource However

there is another request method which is very significant for this thesis and its name is

CONNECT [16] This method is used to send data through a proxy that can act like a

tunnel Therefore we needed this method to establish a connection between client and

server through the proxy

A few characteristics of HTTP communication must be pointed out There is not a

permanent connection when the request is sent the client disconnects from the server

The server will have to enable the connection again As a result client and server know

that there is a connection between them only during a request Therefore they cannot

keep information about the requests Any kind of data can be transmitted by HTTP as

long as client and server know how to manage the data A typical example of HTTP

request is shown in the Figure 22

To set up a communication with this protocol a client must open a connection sending

a request message to the server which returns a response message Afterwards the

14 Related work

Figure 22 HTTP request

server will close the connection First of all we will describe the initial line of the request

and response message Concerning the request message this first line consists of three

parts The first one is the HTTP request method The second part is the path of the

requested resource This part is called URI And finally the version of HTTP that is

being used This idea can be clearly seen in the List 22 This example was extracted

from the simulations made during the thesis

Listing 22 HTTP request with CONNECT

CONNECT

ec2minus54minus217minus136minus250euminuswest minus1compute amazonaws com50007

HTTP1 1

The initial line of the response from the server is also divided in three parts The initial

part involves the version of HTTP used for the communication Afterwards there will

be a code [15] for the computer to understand the result of the request The first digit

indicates the class of response We have the codes shown in the List 23

Listing 23 HTTP request result

21 Communication protocols 15

1xx i n f o rmat i ona l message

2xx s u c c e s s in the connect ion

3xx r e d i r e c t s the c l i e n t to another URL

4xx e r r o r l i nked to the c l i e n t

5xx e r r o r l i nked to the s e r v e r

Finally there is a word or sentence in English to describe the status of the connection

Header lines offer information about the request or response or about any object sent

in the message body which will be explained later There are many different headers lines

but they can be classified in four main groups [17] The entity header involves information

about either the request response or the information contained in the message body A

general header is used in both the request and the responseThe request header is sent

by a browser or a client to a server Finally the last kind of header is called response

and is sent by a server in a response to a requestThe format of the header lines is

aHeader-Name valuea Two examples of header lines are shown in the List 24

Listing 24 HTTP header lines

Userminusagent Moz i l l a 3 0

Host www amazon com

Finally an HTTP may have a body with data after the header lines In a response the

request resource is always sent in its body There may be also texts giving information

or warning of errors In a request it is in the body where the user enters data or uploads

files which will be sent to the server When the HTTP message contains a body there are

usually header lines that provide information about the body One of these header lines

is called Content-Type and it indicates the MIME and type of the data in the body For

instance texthtml or imagegif Another very common header line is Content-Length

which provides how many bytes were used in the body

212 IP

Internet Protocol (IP) is used to build and interconnect networks for the exchange of

packets [15] It is important to be clear about this layer to make sure the information is

going to expected points within the network created in the cloud

IP occupies the network layer in the OSI model IP runs both hosts and routers

defining an infrastructure that allows these nodes and networks operate as a single in-

ternetwork Concerning the delivery IP has the service model called best effort which

provides an unreliable datagram delivery therefore it is not ensured that the datagrams

reaches their destinations In addition this service model may cause more problems since

the packets can be delivered out of order as well as get the destination more than once

16 Related work

IP Header

The Figure 23 shows all the files carried in the IP header

Figure 23 Fields of the IP Header

The first field is the Version Type which indicates the IP version used in the transmis-

sion The Header Length identifies the length of the header in 32-bit words If there are

no options the header has 5 words (20 bytes) The next field Type of Service is used to

indicate the quality of the service The field Total Length indicates the length in bytes

(unlike in Header Length where the length was count in words) of the whole datagram

When it comes to the Identification Field the sender always marks each IP datagram

with an ID number before the transmission The goal is to have unique datagrams so

if several fragments arrive to the destination since all of them had the same ID value

the destination host can put together the fragments received If some fragment does not

arrive all the fragments with the same number will be discarded In the next field there

are up to three flags The first flag does not have any use for now it is set to 0 The

flag D allows the fragmentation of data into smaller pieces when this flag is set to 1 The

flag M indicates whether the datagram received is the last one of the stream (set to 0)

or there are more datagrams left (set to 1)

The Fragment Offset is a value used by the sender to indicate the position of the

datagrams within the stream in which they have been sent so the receiver can put them

in order The first byte of the third word of the header is the field TTL which set the

maximum time that a datagram may be on the network before being discarded The

main goal of this function is to discard datagrams that are within the network but never

21 Communication protocols 17

reach the receiver The next field is called ProtocolProtocol and indicates the kind of

protocol that is expected in the datagram The IP Header also uses a simple Checksum

to verify the integrity of the header and the data during the transmission To send the

packet is required to fill the field Source Address with the IP address of the sender as

well as to fill the Destination Address with the IP address of the receiver There is also a

field to set up some options if they were required and a Padding set with zeros to ensure

that the length of the header is multiple of 32

Fragmentation and Reassembly

Since IP provides host-to-host service throughout so many different networks with diverse

technology it is required to manage datagrams so they can go over all the networks

There are two choices available to figure this problem out [15] The first one is to ensure

that every IP datagrams are small enough in order to fit inside a packet in any type

of network The second option is to use some technique to fragment and reassemble

packets when they are too big to go through some network This second option is the

most suitable since networks are continuously changing and can be especially difficult to

choose a specific size for the packet that fits in every network This second option is the

one used in the Amazon networks where we ran the tests It is significant to know how

the segments are fragmented to examine each segment sent and its respective answer In

this way the exchange of packets was more organized and the recreation of traffic pattern

was easier to make

This second option is based on the Maximum Transmission Unit (MTU) which is the

biggest IP datagram that can be carried in a frame Normally the host chooses the MTU

size to send IP datagrams If by chance the packets go over some network with smaller

MTU it will be required to use fragmentation For instance if a packet of 1420 bytes

(including 20 bytes of IP header) has to go through a network with 532 bytes of MTU

the datagram will be fragmented in three packets The first two packets will contain 512

bytes of data and another 20 bytes for the header Therefore there will be 376 bytes

left (1400 ndash 5122) so that the last datagram will carry those 376 bytes of data plus 20

bytes for the header The result would look like in the Figure 24

It should be noted that the amount of data bytes in each packet must be always multiple

of 8 During this process the router will set the M bit in the Flag of the first and second

datagram to indicate that there are more packets coming As regards the offset field in

the first packet it is set to 0 because this datagram carries the first part of the original

packet However the second datagram will have the Offset set to 64 since the first byte

of data is the 513th (5128 bytes)

18 Related work

Figure 24 Datagram fragmentation

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link layer network To develop a mapping

between the link layer addresses and IP addresses is required to use the technic Address

Resolution Protocol (ARP) so that the physical interface hardware on the node can

understand the addressing scheme

The method to get the link layer of a particular server through this technique involves

the next steps [15] First of all the sender will check its ARP cache to find out if it has

already the link layer address (MAC) of the receiver If it is not there a new ARP request

message will be sent which carries its own IP and link layer addresses and the IP address

of the server desired This message is received by every device within the local network

since this message is a broadcast The receivers compare the searched IP address with

their own IP address The servers with different IP addresses will drop the packet but

the receiver which we are looking for will send an ARP reply message to the client This

server also will update its ARP cache with the link layer address of the client When the

sender receives the ARP reply the MAC address of the receiver is saved The required

steps can be seen in the picture 25

213 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]

The data link layer is divided in two different sublayers Media Access Control known as

MAC (defined by IEEE 8023) and MAC client (defined by IEEE 8022) The structure

is shown in the Figure 26

The MAC client must be one of the next two different types of sublayers The first one

is the Logical Link Control (LLC) which supplies the interface from the MAC sublayer

to the upper layers The other option is called bridge entity which provides an interface

between LANs that can be using the same (for instance Ethernet to Ethernet) or different

protocols

Concerning the MAC sublayer [18] this level takes charge of data encapsulation as-

sembling also the frames before sending them as well as of analyzing these frames and

detecting errors during the communication Moreover this sublayer is in charge of starting

21 Communication protocols 19

Figure 25 ARP request

Figure 26 Ethernet layers in OSI model

20 Related work

frame transmissions and recovering them from communication errors

The physical layer enables the communication between the data link layer and the

respective physical layer of other systems In addition this layer provides significant

physical features of the Ethernet such as voltage levels timing but the most important

functions are related with data encoding and channel access This layer can code and

decode bits between binary and phase-encoded form About access to the channel this

level sends and receives the encoded data we spoke about before and detects collisions

in the packets exchange

214 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in the internet standard RFC

768 [20] It is used as transport protocol therefore its function is similar to the TCP

protocol but UDP is sometimes preferred since it is faster lighter and simpler than TCP

However it is less reliable UDP provides a best-effort service to an end system which

means that UDP does not guarantee the proper delivery of the datagrams Therefore

these protocols must not be used when a reliable communication is necessary

UDP header

UDP messages are sent within a single IP packet and the maximum number of bytes is

65527 for IPv6 [21] When a UDP datagram is sent the data and the header go together

in the IP network layer and the computer has to fill the fields of the UDP header in the

proper way The scheme of the UDP protocol is represented in the Figure 27

Among other things UDP is normally used to serve Domain Name System (DNS)

requests on port number 53 DNS is a protocol that transforms domain names into IP

addresses This is important in this thesis since the proxy between client and server

needs to work out the server IP address

Figure 27 UDP protocol header

21 Communication protocols 21

The UDP header is composed by four fields [15] each one contains 2 bytes The

Source Port indicates the port from which the packet was sent and it is by default the

port where the reply should be addressed if there is no any change The Destination

Port is the internet destination address where the packet will be sent The field for the

Length indicates the total number of bytes used in the header and in the payload data

Finally the Checksum is a scheme to avoid possible errors during the transmission Each

message is accompanied by a number calculated by the transmitter and the receiving

station applies the same algorithm as the transmitter to calculate the Checksum Both

Checksums must match to ensure that any error happened during the transmission

UDP ports

UDP ports give a location to send and receive UDP messages These ports are used to

send different kinds of traffic facilitating and setting an order for the packet transmission

Since the UDP port field is only 16 bits long there are 65536 available ports From 0

to 1023 are well-known port numbers The destination port is usually one of these

well-known ports and normally each one of these ports is used for one application in

particular

215 TCP protocol

Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer

and used when a reliable delivered is required [15] TCP is by far the most important

protocol in this thesis since our TaaS system is based on TCP sockets With this protocol

a communication between two endpoints can be set up Each endpoint is defined by two

parameters the IP address and the TCP port number The following are some of the

main characteristics of this protocol In TCP the window size will decide the amount of

bytes that can be transferred before the acknowledgement from the receiver is required

Whit TCP is possible to place the datagrams in order when they are coming from the

IP protocol In addition this protocol allows the data management to create different

length fragment to forward them to the IP protocol In TCP is also possible to transfer

data coming from different sources on the same line multiplexing this data This task is

carried out by the ports

The TCP header is more complex than the UDP header. The scheme of this header is shown in Figure 28.

Figure 28 TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port does with the receiver port. The Sequence Number and Acknowledgement Number fields are explained in more depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, gives the size of the TCP header, keeping in mind that this length is always a multiple of 32 bits. The next field (called reserved in the picture) is currently unused and is set to zero. The flags field carries additional control information for the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet acknowledges received data. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender to ask the receiver to deliver the data to the application immediately instead of buffering it. Finally, the RST flag is set to reset the connection.

Another important field is the window size. Through this field we know the number of bytes that the receiver can accept without acknowledgement. The Checksum field makes the packet transmission more reliable, since it is used to check the integrity of the segment. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.
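To make the flag layout more tangible, the following Python sketch (not part of the thesis scripts) reads bytes 12 and 13 of a raw TCP header and lists which of the six flags discussed above are set; the sample header bytes are invented for the example.

import struct

def tcp_flags(header):
    # Bytes 12-13 hold the data offset, the reserved bits and the flag bits
    value = struct.unpack('!H', header[12:14])[0]
    names = ['FIN', 'SYN', 'RST', 'PSH', 'ACK', 'URG']   # low six flag bits
    return [name for i, name in enumerate(names) if value & (1 << i)]

# Illustrative example: bytes 12-13 = 0x50 0x02 (header length of 5 words, only SYN set)
print(tcp_flags(bytes(12) + b'\x50\x02' + bytes(6)))     # -> ['SYN']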

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In a TCP connection, sender and receiver must agree on a number of parameters; when a connection is established, the main parameters to agree on are the starting sequence numbers. An example of the way to set up a TCP connection is shown in Figure 29.

Figure 29 Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number; both the SYN and the ACK flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server, and the ACK flag is set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending any packets still in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client, which acknowledges it to end the communication in both directions.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need to be acknowledged. This window moves depending on the acknowledgements received, and its size can be modified by the receiver by changing the value in the window size field.

Figure 210 Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 210, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.
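The following toy Python sketch (an illustration only, not real TCP) mimics the example of Figure 210 with a window of three packets: the sender may have at most three unacknowledged packets outstanding, and each acknowledgement slides the window one position to the right.

WINDOW = 3                     # window size used in the example of Figure 210
packets = list(range(1, 11))   # ten packets to deliver
base = 0                       # index of the oldest unacknowledged packet
next_to_send = 0

while base < len(packets):
    # Send everything the current window allows (at most three outstanding packets)
    while next_to_send < len(packets) and next_to_send < base + WINDOW:
        print('send packet', packets[next_to_send])
        next_to_send += 1
    # An ACK for the oldest outstanding packet slides the window one position right
    print('ACK for packet', packets[base])
    base += 1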

22 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

221 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to travel through the network. The formula to estimate the RTT within a network is shown in equation 21:

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT    (21)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 211.

Figure 211 Example RTT interval
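A minimal Python sketch of this estimator is shown below; the α value of 0.85 is an illustrative choice within the 0.8-0.9 range mentioned above, and the sample RTTs are made-up values in seconds.

ALPHA = 0.85  # smoothing factor chosen inside the advisable 0.8-0.9 range

def update_rtt(estimated_rtt, sample_rtt, alpha=ALPHA):
    # Equation 21: blend the previous estimate with the newest sample
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

# Illustrative samples; note how slowly the estimate reacts to the 0.0120 s outlier
samples = [0.0031, 0.0046, 0.0029, 0.0120, 0.0033]
estimate = samples[0]
for sample in samples[1:]:
    estimate = update_rtt(estimate, sample)
    print(round(estimate, 4))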

222 Jitter

Jitter is the variation in the delay of the packets sent within a network [15]. A sender transmits many packets one right after the other, with a certain spacing between them. However, network congestion, queues or configuration errors cause this spacing between packets to vary. The effect of jitter is illustrated in Figure 212.

Jitter is a serious problem, since these fluctuations happen randomly and change very quickly over time. Therefore it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short time in order to reorder them if necessary and restore the original spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, newly arriving packets will be dropped and never reach their destination.


Figure 212 Jitter effect

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are captured in the next three formulas:

Latency = Propagation + Transmit + Queue    (22)

Propagation = Distance / SpeedOfLight    (23)

Transmit = Size / Bandwidth    (24)

224 Bandwidth

This concept describes the number of bits that are transmitted within the network during one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is given in Figure 213.

Figure 213 Relation between Latency and Bandwidth

If we multiply both terms, we obtain the number of bits that can be in transit in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (25)

If more bandwidth is needed, the problem is solved simply by adding more pipes.

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (26)

TransferTime = RTT + (1 / Bandwidth) · TransferSize    (27)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in theory, whereas throughput is what is achieved in practice. Due to inefficiencies of implementation or errors, a pair of nodes connected in a network with a bandwidth of 10 Mbps will usually reach a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.
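As a small worked example with made-up numbers, the sketch below applies equations 26 and 27 to a 100 KB transfer over a 10 Mbps link with a 50 ms RTT, showing how the effective throughput stays well below the nominal bandwidth.

def transfer_time(rtt, bandwidth_bps, size_bits):
    # Equation 27: one RTT of latency plus the time to push the bits onto the link
    return rtt + size_bits / bandwidth_bps

def throughput(size_bits, time_s):
    # Equation 26: bits actually delivered per second
    return size_bits / time_s

size_bits = 100 * 8 * 1000          # 100 KB transfer
t = transfer_time(0.050, 10 * 10**6, size_bits)
print(round(throughput(size_bits, t) / 10**6, 2), 'Mbps effective')  # about 6.15 Mbps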

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.


231 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections. Connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection to the original destination address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture packets and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to troubleshoot and manage network problems, examine security issues and debug protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. Users can easily see a list of captured packets updated in real time, the details of a selected packet and the packet content in hexadecimal and ASCII. In addition, it is possible to filter the packets in order to make searching easier, which makes Wireshark very convenient to work with.

Tcpdump

Tcpdump [25] is a tool to analyze packets that travel over the network. Some reasons to use tcpdump are to verify connectivity between hosts and to look into the network traffic. The tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file to be used in a later analysis; these capture files can also be opened with software like Wireshark. Tcpdump provides many options to capture packets in different ways, which gives a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. The proxy sits in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network on another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives the request stores a temporary copy of the response. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. An example of how a proxy works and handles incoming requests, asking for each web site only once, is shown in Figure 214.

Figure 214 Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not their only function. They may also be used to prevent attackers from learning internal addresses, since a proxy can block direct access between two networks. Proxies can also take part as a component of a firewall.

232 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important ones, and it provides a library called Boto which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually for every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.
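A minimal sketch of how an instance could be launched with boto is given below, assuming credentials are already configured in the boto file; the AMI identifier and key pair name are placeholders, and the region matches the eu-west-1 addresses that appear in the captures of Chapter 3.

import time
import boto.ec2

# Credentials are read from the boto configuration file (or passed as keyword arguments)
conn = boto.ec2.connect_to_region('eu-west-1')

# Placeholder AMI and key pair: replace with real values before running
reservation = conn.run_instances('ami-xxxxxxxx',
                                 instance_type='t1.micro',
                                 key_name='my-key')
instance = reservation.instances[0]

# Wait until the machine is running, then print its public address
while instance.state != 'running':
    time.sleep(5)
    instance.update()
print(instance.public_dns_name)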

233 Operating Systems

There are several kinds of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created as free software by volunteers and employees of many companies and organizations from all parts of the world. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many networking issues which are crucial for the next sections. A solid knowledge of this matter is needed later on, when it comes to analyzing and recreating network traffic.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework in order to extract the traffic pattern properly later on.

31 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger scale testing later. The structure of this connection is shown in Figure 31.

Figure 31 Structure client server

The tool chosen for programming is Python, a high-level language highly recommendable for network programming due to its ease of use in this field.


When it comes to programming the client of this application, it was necessary to set the server Inet address and a port for the exchange of data, and then to create a socket and connect it to that address through the chosen port. To program the server, it is required to set the hostname and the same port used by the client, to create a socket and bind both hostname and port to it, and finally to make the socket listen for incoming connections from the client and accept them.
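A minimal sketch of this client-server pair is shown below; the port number follows the 50007 that appears in the later captures, while the server address and the payload are illustrative. The two parts run on different machines.

# server side (sketch): bind, listen, accept and echo the data back
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('', 50007))            # hostname and port agreed with the client
server.listen(1)
conn, addr = server.accept()        # wait for the client and accept the connection
data = conn.recv(1024)
conn.sendall(data)                  # reply with the same data
conn.close()
server.close()

# client side (sketch): connect to the server address and exchange data
import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('server-address', 50007))   # placeholder server Inet address
client.sendall(b'hello')
print(client.recv(1024))
client.close()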

In Listing 31 the packets required to establish a client-server connection are shown.

Listing 31 Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer from the server has the SYN and ACK flags activated, with its own sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

To terminate the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection, and an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one side would be closed and the other one could still send data. These two packets are set out in Listing 32.

Listing 32 Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments exchanged between client and server.

32 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 32. After setting up this connection, we sent traffic in order to analyze the segments, measure the performance and extract a traffic pattern.

Figure 32 Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the communication pattern and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts on an instance it was necessary to log in there and install the libraries required by the script. Moreover, programs such as tcpdump and Wireshark were installed on the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests are run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1. This is done only once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source: the response from the proxy with the SYN and ACK flags set to 1, indicating that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 33 and is the three-way handshake described earlier [31].

Listing 33 Establishing data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the server domain name. The proxy then resolves the IP address of that server by sending DNS queries. We can see this in Listing 34.

Listing 34 Searching server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"  "10.235.11.67"  "DNS"  "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"  "10.235.11.67"  "DNS"  "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to the server, setting up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 35.

Listing 35 Establishing proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 OK (connection established) response reaches the data source, so the connection is now ready to start sending data. In these simulations it was decided to send data from time to time with random waiting periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
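A sketch of such a data source is given below; it follows the burst size, repetition count and waiting times described in this chapter, while the proxy address and port are placeholders and the HTTP CONNECT step used with Squid is omitted for brevity.

import random
import socket
import time

BURST = b'x' * 1980      # one data burst of 1980 bytes
REPETITIONS = 200        # number of bursts sent during the simulation

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('proxy-address', 3128))     # placeholder proxy endpoint
for _ in range(REPETITIONS):
    sock.sendall(BURST)                   # send the burst towards the server
    sock.recv(4096)                       # the server echoes the data back
    time.sleep(random.randint(1, 2))      # random waiting time of 1 or 2 seconds
sock.close()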

The eight packets which compose the exchange of data between data source and server are shown in Listing 36.

Listing 36 Exchange of data source-proxy-server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 36, the packets with the PSH flag set to 1 indicate that data is being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 36 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the created scenario, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was placed in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending bursts of 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 33 represents the number of bytes over time in the proxy for the simulation with data bursts of 1980 bytes, while Figure 34 represents the other simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in Figure 34 is approximately three times bigger than in Figure 33. This makes sense, since the data sent is three times bigger as well, and therefore around three times as many packets are needed. Another issue to point out is that Figure 34 is smoother than the other one. This is not only due to the effect of the scale of the graph, but also because the frequency and amount of segments being sent in the second case is higher.

33 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 35. Using the Amazon cloud it is possible to place one client in each instance, so the proxy receives packets from different IP addresses, as it would in a real case.

Figure 35 Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 36 shows the number of bytes going through the proxy with data bursts of 1980 bytes, and Figure 37 does the same for data bursts of 5940 bytes.

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes


Figure 37 shows a much larger amount of bytes exchanged compared to Figure 36, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the packet sending frequency is high with ten data sources.

34 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 38, packets were being sent to the server from three different instances, whereas in Figure 39 there were up to ten data sources working.

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not pose a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 38. In the other graph there are many higher peaks, and therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.


Figure 39 Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. If we look over Tables 31 and 32, we do not see large differences. Moreover, the lower in the table, the shorter the times should be; however, this does not apply in every case, so the type of instance is not very significant here. As far as RTT values are concerned, the simplest instance seems to be enough for this exchange of data. Concerning the number of clients there is a slight difference, especially when comparing the RTT with 5 or 10 data sources against only one. But in general the results are quite similar, because this amount of packets does not pose a serious problem for the network.

Server instance type    1 source    3 sources    5 sources    10 sources
t1.micro                0.0031      0.0046       0.0033       0.0039
m1.large                0.0037      0.0035       0.0038       0.0032
c1.medium               0.0031      0.0035       0.0051       0.0048
c1.xlarge               0.0039      0.0043       0.0037       0.0042

Table 31 RTT with data bursts of 1980 bytes (values in seconds)

Server instance type    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                0.0026      0.0022       0.0021       0.0029
m1.large                0.0026      0.0024       0.0028       0.0024
c1.medium               0.0028      0.0031       0.0025       0.0030
c1.xlarge               0.0026      0.0029       0.0029       0.0024

Table 32 RTT with data bursts of 5940 bytes (values in seconds)

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show average numbers of packets, since several simulations were carried out for each case.

In Table 33 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources show more retransmissions. Moreover, examining the table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is larger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 34 shows the packet losses. Here the differences among the tests carried out are considerably wider. As expected, the simulation with the most difficulties in the communication was the one with 10 data sources, the heaviest data burst and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. All the other instances give better results than the t1.micro one; nevertheless, there is no very significant gap among these three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 35 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. It must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    1.5         0            2            2
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           2.5          0            0
c1.medium               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.xlarge               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            2.5

Table 33 Number of TCP retransmissions


Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    2           6            15           67
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           5.5          1            36
c1.medium               1980 bytes    0           0            0            2
                        5940 bytes    0           7            13.5         50.5
c1.xlarge               1980 bytes    0           0            0            5
                        5940 bytes    0.5         5            9            54.5

Table 34 Number of lost packets

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           1            0            6.5
                        5940 bytes    3           1            7.5          2.5
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.medium               1980 bytes    0           0            0            4.5
                        5940 bytes    0           0            2.5          2.5
c1.xlarge               1980 bytes    0.5         0            0            0.5
                        5940 bytes    0.5         0            0            0

Table 35 Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies just a little in the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important to improve the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results, we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed a method for a proper extraction so that the traffic could be generated again. To find one, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to recreate realistic network traffic, it is important to send the same number of packets [33].

41 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the same way the data source sent packets in the simulations, and therefore this method was much better for recreating the obtained traffic pattern. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same point in time. The script also extracted the timestamp at which every data burst was sent; this made it easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
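A simplified sketch of this extraction step with dpkt is shown below; the file names are illustrative, the data source address is taken from Listing 33, the payload-carrying TCP segments from the data source are kept, and the HTTP CONNECT and DNS segments are dropped as described above. The thesis scripts additionally merge the packets of each data burst, which is omitted here for brevity.

import pickle
import socket
import dpkt

DATA_SOURCE = '10.34.252.34'             # data source address seen in the capture

bursts = []
with open('simulation.pcap', 'rb') as f:
    first_ts = None
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP):
            continue
        ip = eth.data
        if not isinstance(ip.data, dpkt.tcp.TCP):
            continue                     # drops the UDP-based DNS lookups
        tcp = ip.data
        if socket.inet_ntoa(ip.src) != DATA_SOURCE or len(tcp.data) == 0:
            continue                     # keep only payload-carrying segments from the source
        if tcp.data.startswith(b'CONNECT'):
            continue                     # drop the HTTP CONNECT sent to the proxy
        if first_ts is None:
            first_ts = ts                # time reference: first data packet captured
        bursts.append((ts - first_ts, tcp.data))

with open('pattern.pkl', 'wb') as out:
    pickle.dump(bursts, out)             # consumed later by the replay script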

42 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing this information, it is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets accurately. It should be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
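A minimal sketch of the replay side is shown below, consuming the file written by the extraction sketch above; the server address, the port and the file name are illustrative.

import pickle
import socket
import time

with open('pattern.pkl', 'rb') as f:
    bursts = pickle.load(f)              # (relative timestamp, data) pairs

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('server-address', 50007))  # placeholder server endpoint

start = time.time()
for offset, data in bursts:
    delay = offset - (time.time() - start)
    if delay > 0:
        time.sleep(delay)                # release the burst at the recorded offset
    sock.sendall(data)                   # same amount of data as in the capture
    sock.recv(4096)                      # read the server reply, as in the simulations
sock.close()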

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method. To achieve this, we had to filter the data sent from data source to proxy. This sniffed data was replayed twice M2M with the second script, so that in the whole network we send the same amount of data, but in this case directly from client to server. This strategy also means receiving the same data back from the server, so the behaviour of the packets is very similar to the real case. An example of this approach is represented in Figure 41: the diagram on the left shows the traffic in the simulations with data source, proxy and server, while the diagram on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 41 Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 42. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in Figure 42, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly, due to the fact that the server response is something we could not control. Another important aspect is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 42 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the developed TaaS system are explained. Then remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

51 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had this capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased it up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We can then compare the different results against the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.

Figure 51 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is about twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 51 Number of bytes over time in different tests

Figure 52 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same goes for the red graph, but in this case with up to 80 sources sending data. The black graph shows about double the amount of bytes of the blue one in Figure 51, which is an expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test with 80 clients, a new client was added every 5 seconds, therefore the graph should keep going up until approximately 400 seconds; nevertheless, it stops going up after about 225 seconds. In this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent faster, namely when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 52 Bytes using an m1.large instance for the server

Now we can compare Figure 52 with the results achieved running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 53 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent as in Figure 52. However, the recreation with 80 clients reaches much higher values using the best instance: the gap between the graphs is about three times larger in Figure 53. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 53 Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 54, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks that stand out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 51. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 51 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

61 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying it directly from client to server.

In the first part we therefore developed, in the Amazon Cloud, the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results showed a performance improvement in the network when using high quality instances, and a deterioration as the number of clients was raised.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse: the similarity between both graphs was permanent. Therefore, with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes; this happens when we highly increase the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instances.

Overall, there are good results about the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory; it must be said that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, demonstrate the usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Selective Acknowledgment (SACK) Permitted

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes

3.2 RTT with data bursts of 5940 bytes

3.3 Number of TCP retransmissions

3.4 Number of lost packets

3.5 Number of duplicate ACKs

5.1 Percentage of lost packets


List of Figures

1.1 Flow diagram of the developed system

2.1 OSI model

2.2 HTTP request

2.3 Fields of the IP Header

2.4 Datagram fragmentation

2.5 ARP request

2.6 Ethernet layers in OSI model

2.7 UDP protocol header

2.8 TCP protocol header

2.9 Establishing a connection in TCP

2.10 Sliding window method

2.11 Example RTT interval

2.12 Jitter effect

2.13 Relation between Latency and Bandwidth

2.14 Proxy operation

3.1 Structure client server

3.2 Structure client proxy server

3.3 Bytes through the proxy with data burst of 1980 bytes

3.4 Bytes through the proxy with data burst of 5940 bytes

3.5 Structure for simulation

3.6 Bytes through the proxy with data burst of 1980 bytes

3.7 Bytes through the proxy with data burst of 5940 bytes

3.8 Average RTT with 3 data sources

3.9 Average RTT with 10 data sources

4.1 Structure of traffic replayed M2M

4.2 Comparison between simulation and replayed traffic

5.1 Number of bytes over time in different tests

5.2 Bytes using a m1.large instance for the server

5.3 Bytes using a c1.xlarge instance for the server

5.4 Average RTT extracted from the traffic recreations


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing," 2011.

[2] J. Strickland, "How cloud computing works," http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing," http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi, and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay," https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances," http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. Combs et al., "tshark - dump and analyze network traffic," http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite," http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol -- HTTP/1.1," 1999.

[17] Tutorialspoint, "HTTP - quick guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol," 1980.

[21] S. Kollar, "Introduction to IPv6," 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe, and E. Warnicke, "Wireshark user's guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website," http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnár, and G. Szabó, "How to validate traffic generators?," in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.



CHAPTER 1

Introduction

1.1 Background

Cloud computing [1] provides access over the network to different resources, such as software, servers, storage and so on, in an efficient way. Clients can access these services on their own, without human interaction, since everything is done automatically. All applications are offered over the Internet, so users can access them from any location and with different electronic devices. The capacity of cloud computing can be easily adjusted in order to supply its services properly to the clients, regardless of their number. Moreover, applications can be monitored and analyzed to give information about their condition to both user and provider.

The cloud structure can be divided into two parts: the front end, which is the part the user can see, and the back end, which involves the computers, servers and networks that are part of the cloud [2]. Moreover, a main server manages the cloud structure, ensuring a good service regardless of the number of clients.

Nowadays TaaS [3] is very significant, as it implies cost sharing of computing resources, cost reduction, scalable test structures and testing service availability at any time. Moreover, TaaS provides a pay-as-you-test model for customers. All these characteristics make TaaS an efficient way of testing in the cloud. The main reason why clients would be interested in TaaS is that such a system can report on several significant software and network features, such as functionality, reliability, performance, safety and so on. In order to measure these characteristics there are several types of tests for services in the cloud. This thesis was mainly focused on performance testing [4]. These tests are usually carried out to provide information about speed, scalability and stability. It is very common to use performance testing to find out the performance of software before it comes to market, to ensure it will meet all the requirements to run efficiently. Performance testing, more specifically, can be divided into several kinds of tests. The most relevant to this thesis are load testing, to find out the behaviour of the server under traffic loads, and scalability testing, to determine the performance and reliability of the server when increasing the load. The process of developing a performance test involves the following steps [4]:

1. Identify your testing environment: it is necessary to know the physical environment where the test will be run, as well as the testing tools required.

2. Identify the performance criteria: this includes limits on response times and other values the simulations must meet to consider the performance good enough to offer a reliable service.

3. Design the performance tests: cover all the different cases that could arise for the service or application.

4. Configure the environment: prepare the environment and tools before starting the simulations.

5. Implement the test design: develop performance tests that match the test design.

6. Run the tests: start the simulations and record the test values.

7. Analyze the tests: look into the results to check the performance of the service.

Performance testing ensures that cloud services and applications will run properly. These are the most recommended steps for developing TaaS [3], and we have taken them into account in this work. This service provides good features such as elasticity, safety, easy handling, a reliable environment and flexibility when choosing options regarding instance storage.

1.2 Problem statement

Nowadays TaaS [3] is very common due to the wide use of Internet clouds and the large number of applications provided on them. Therefore, we found it interesting to use this concept to create a new approach for testing a particular scenario. In this thesis we focused on developing a different method to apply TaaS in an M2M framework. In addition, this project could be modified to test different scenarios in further research. For example, it would be possible to add more servers by increasing the number of instances in the scripts used in this project.

The acronym M2M [5] can have different meanings, such as Machine-to-Machine, Machine-to-Man, Machine-to-Mobile and so on. However, M2M does have a clear goal, which is to allow the exchange of information over a communication network between two endpoints.

When it comes to testing networks, it is necessary to do it with the traffic the network is expected to carry whenever it is used. There are two different ways to do so [6]. The first one is simulating the type of traffic that is supposed to go over the network. TaaS systems use this option, allowing TaaS users to configure the test simulations according to their needs. To configure these simulations, tools such as Selenium and JMeter are used in the cloud [3].

The second way to test networks is to replay recorded network traffic. The purpose of this thesis is to apply this second way to create a TaaS system to test networks in the cloud. In order to replay recorded traffic we followed a method based on a replay attack [7], which is explained in the next section.

In this way we created a TaaS system that can estimate network performance using a different method from the systems already available. First, we must also configure the simulations to test the server. However, the main difference in our method is that we then extract the traffic pattern from those simulations in order to multiply it from a black box and stress the server. This is an interesting method because, since we precisely recreate a real exchange of traffic, the results are very realistic and accurate. Finally, we had to prove the feasibility of the method applied.

The TaaS system was developed for testing in the Amazon Cloud [8], which allowed us to set up the whole scenario easily and use different types of instances. These instances differ in features such as memory, storage, network performance and so on [9]. Therefore, it was interesting to compare results when picking one sort of instance or another.

1.3 Method

The method followed during this thesis is divided into three steps. First, we set up a proxy between client and server to extract the traffic. Then we had to figure out a way to recreate this traffic, and finally we replayed it M2M to test the server.

In the following paragraphs the method is described in detail. The first step consisted of setting up a client-proxy-server scenario in the cloud. Then we could run simulations to look into the behaviour of the packets going over the network. Afterwards, we could check how the network performance differed when we changed certain factors (number of clients, type of instance, etc.) in this network. The packets were sniffed in the proxy with the tool tshark [10]. Once we had some knowledge about the simulated network, we could start developing a method to extract a traffic pattern from those simulations. We had to take into account that the script must obtain the traffic pattern properly, so that when the same traffic is recreated M2M, the behaviour of the packets is as similar as possible to the original simulation. To achieve this goal we had to extract the data sent and the timestamps of the packets with high precision.

Once the pattern was extracted from the simulations, we moved on to the third and last step, where we multiplied the traffic pattern by scaling up the number of clients. In this way, large traffic load recreations were carried out to test the server limits and find out how the server could handle heavy traffic loads. These data sources sent the extracted pattern directly to the server in an M2M framework. Finally, when we obtained the final results, we could determine the server performance and the feasibility of the approach developed.
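As an illustration of the capture step, the sketch below shows how such a capture could be launched on the proxy from Python. It is only a sketch, not the thesis' actual procedure: the interface name, output path and capture duration are assumptions.

import subprocess

# Capture the client-server exchange passing through the proxy and store it in a
# pcap file for later pattern extraction (interface, file name and duration are
# hypothetical).
subprocess.call(["tshark", "-i", "eth0",
                 "-w", "/tmp/proxy_capture.pcap",
                 "-a", "duration:60"])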

The method carried out is a kind of replay attack [7], in which a "man-in-the-middle" (Wireshark sniffing in the proxy) intercepts the traffic. This traffic is then replayed, pretending to be the original sender, in order to create problems for the host server. In this thesis the traffic is scaled up by a multiplier to stress the software and find out its limits.

Regarding the traffic pattern extraction, several tools were studied and two methods to extract and replay traffic were considered. The first method consisted of modifying the pcap files previously recorded in the proxy so that the packets could be sent straight to the server from the data sources. Tcprewrite [11] was the tool used to modify the pcap file. The next step was to use another tool to recreate the traffic contained in the new pcap. One of them was Scapy [12], which sent the packets, but not at the appropriate times. Another tool used was Tcpreplay [13], but with it it was not possible to simulate the server, since this tool does not work at the transport level; therefore, we could not establish a valid TCP connection [14]. Finally, we used the second method, which is based on programming our own script to extract the traffic pattern and recreate it later on. The way chosen was much trickier, but also much more suitable, as well as completely automatic. With this method it was not necessary to do anything by hand (like modifying the pcap file from the console); we just needed to type a few options, such as the name of the file to replay or the server instance type. After programming the script to extract the pattern, we needed to replay it somehow. We tried again with Scapy [12], a very good tool when it comes to creating and configuring packets. However, there were problems receiving segments coming from the server, since it was necessary to work with sequence and acknowledgement numbers. This was extremely hard, and finally sockets were used again to replay the pattern.
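To make the extraction idea concrete, the following minimal sketch reads a capture with Scapy [12] and keeps, for every TCP segment carrying payload, its timestamp and payload size. The file name is an assumption, and the thesis' actual script is not reproduced here.

from scapy.all import rdpcap, TCP, Raw

# Read the capture taken at the proxy and build a simple traffic pattern:
# a list of (timestamp, payload length) pairs for TCP segments with data.
pattern = []
for pkt in rdpcap("proxy_capture.pcap"):      # hypothetical file name
    if pkt.haslayer(TCP) and pkt.haslayer(Raw):
        pattern.append((float(pkt.time), len(pkt[Raw].load)))

# The pattern can later be replayed by sending the recorded number of bytes
# over a TCP socket at the recorded time offsets.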

A diagram of the whole system developed in this thesis is shown in Figure 1.1. The top of the figure (network traffic simulation) refers to the first part of the thesis, where we had to set up a client-proxy-server communication in the cloud. With this scenario we could run the simulations to exchange packets. Below we can see the next steps: the traffic was recorded in the proxy for further analysis and pattern extraction. Finally, with the pattern obtained, we came to the last part of the thesis, shown at the bottom of Figure 1.1 (traffic pattern recreation). In this part we set up a multiplier composed of many data sources, which recreated the traffic pattern towards the same server. In this way we could find out the server performance when handling heavy traffic loads.

1.4 Delimitations

During this thesis we have made some delimitations. The project could be improved to cover most of these aspects in further research.

The TaaS system developed can only function in the Amazon Cloud, because the main library used to program it works only for this particular cloud. The goal of this project is testing M2M communications, so we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding out server performance; therefore, the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system programmed uses TCP sockets to test servers; for instance, we cannot use HTTP requests for testing.

1.5 Outline

The thesis is organized as follows. The introduction is in Chapter 1. Chapter 2 describes related work. Chapter 3 describes the simulations and analysis of the data source, proxy and server scenario. Traffic pattern extraction is described in Chapter 4. The design of TaaS [3] for M2M communications and the results achieved with this system are in Chapter 5. A summary of the whole thesis, the main results obtained and future work are in Chapter 6. Finally, Chapter 7 contains the appendices.


Figure 1.1 Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this section the main protocols are explained, since they are essential for analyzing a client-server communication. Moreover, some significant metrics needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

2.1 Communication protocols

In order to develop this thesis it was crucial to describe the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects used by the different OSI model layers in order to establish a communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface offered to the other objects in the same machine that want to use the service provided by this protocol. The other interface is called the peer interface and is used to talk to its equivalent on another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole in which they all take place. To prevent a system from becoming too complex, levels of abstraction are needed. In network systems this is also applied, creating layers, each with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it is not necessary to modify every part, but only the one where the service will be introduced. In networks the architecture chosen is named the OSI model [15], and networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in Figure 2.1.

Figure 2.1 OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star and so on). This layer also determines the voltage and frequency that signals will use. The data link layer transmits the data from upper levels to the physical layer, but it is also in charge of error detection and correction and hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data across the network. To ensure packets get to the destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (the segmentation process). The most significant protocols here are TCP and UDP, which we will discuss later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

2.1.1 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative, hypermedia information systems. This network protocol is used for communication among users, proxies or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is linked to the transmission of resources. A resource is a network data object that can be identified by a URI; normally these resources are either files or the outputs of a script. Each HTTP message has the general form shown in Listing 2.1 [15].

Listing 2.1 HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line shows whether this is a request or a response message. The next lines provide parameters and options for the message. There are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol, and the port used by default is 80, although it is possible to use other ports. Two important HTTP request methods are GET and POST [16], which are used to request and retrieve data from a specified resource. However, there is another request method that is very significant for this thesis, and its name is CONNECT [16]. This method is used to send data through a proxy that can act like a tunnel. Therefore, we needed this method to establish a connection between client and server through the proxy.
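As a small illustration of how a client can open such a tunnel, the sketch below sends a CONNECT request to a proxy over a plain TCP socket and checks for a successful reply before exchanging data. The proxy address and port are assumptions, and the target host follows the pattern of Listing 2.2 rather than a real running instance.

import socket

PROXY_HOST = "proxy.example.com"   # hypothetical proxy address
PROXY_PORT = 3128                  # hypothetical proxy port
TARGET = "ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007"

sock = socket.create_connection((PROXY_HOST, PROXY_PORT))
request = "CONNECT {0} HTTP/1.1\r\nHost: {0}\r\n\r\n".format(TARGET)
sock.sendall(request.encode("ascii"))

reply = sock.recv(4096).decode("ascii", "replace")
if reply.startswith("HTTP/1.1 200") or reply.startswith("HTTP/1.0 200"):
    # The proxy now relays raw bytes between client and server (a tunnel),
    # so any application data can be sent through the same socket.
    sock.sendall(b"application data")
sock.close()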

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: once the request has been served, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request; therefore, they cannot keep information about previous requests. Any kind of data can be transmitted by HTTP, as long as client and server know how to manage the data. A typical example of an HTTP request is shown in Figure 2.2.

Figure 2.2 HTTP request

To set up a communication with this protocol, a client must open a connection by sending a request message to the server, which returns a response message; afterwards, the server will close the connection. First of all, we describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource, called the URI. The final part is the version of HTTP being used. This idea can be clearly seen in Listing 2.2, an example extracted from the simulations made during this thesis.

Listing 2.2 HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided into three parts. The initial part gives the version of HTTP used for the communication. After it there is a status code [15] that allows the computer to understand the result of the request; the first digit indicates the class of response. The code classes are shown in Listing 2.3.

Listing 2.3 HTTP request result

1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or sentence in English describing the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header carries information about either the request or response, or about the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4 HTTP header lines

User-Agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files to be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about the body. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which gives the number of bytes used in the body.

2.1.2 IP

The Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information goes to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides an unreliable datagram delivery; therefore it is not ensured that the datagrams reach their destinations. In addition, this service model may cause further problems, since packets can be delivered out of order, or reach the destination more than once.


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3 Fields of the IP Header

The first field, Version, indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The Total Length field indicates the length in bytes (unlike Header Length, where the length is counted in words) of the whole datagram. When it comes to the Identification field, the sender marks each IP datagram with an ID number before the transmission. The goal is to make datagrams identifiable, so that if several fragments arrive at the destination carrying the same ID value, the destination host can put the received fragments back together. If some fragment does not arrive, all the fragments with the same number will be discarded. The next field holds up to three flags. The first flag has no use for now and is set to 0. The D (Don't Fragment) flag, when set to 1, forbids the fragmentation of the data into smaller pieces. The M flag indicates whether the datagram received is the last one of the stream (set to 0) or there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the fragments within the stream in which they have been sent, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may stay in the network before being discarded. The main goal of this field is to discard datagrams that are in the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet, the Source Address field must be filled with the IP address of the sender, and the Destination Address with the IP address of the receiver. There is also a field to set some options, if they are required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides a host-to-host service across many different networks with diverse technologies, datagrams must be managed so that they can traverse all of those networks. There are two choices available to solve this problem [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through a given network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. This second option is the one used in the Amazon networks where we ran the tests. It is important to know how segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to carry out.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If the packets happen to go over some network with a smaller MTU, fragmentation will be required. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus another 20 bytes for the header. Therefore there will be 376 bytes left (1400 - 512 x 2), so the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look like Figure 2.4.

It should be noted that the amount of data bytes in each fragment (except the last) must always be a multiple of 8. During this process the router sets the M bit in the flags of the first and second datagrams to indicate that there are more packets coming. As regards the Offset field, in the first packet it is set to 0, because this datagram carries the first part of the original packet. However, the second datagram will have the Offset set to 64, since its first byte of data is the 513th (512/8 = 64).
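The arithmetic of this example can be expressed as a small helper function. This is only an illustrative sketch (the function name and output format are not from the thesis); it reproduces the 1420-byte / 532-byte MTU case above.

def fragment(total_len, mtu, ip_header=20):
    """Return (offset in 8-byte units, data bytes, more-fragments flag) for each fragment."""
    data_left = total_len - ip_header
    max_data = (mtu - ip_header) // 8 * 8   # data per fragment, rounded down to a multiple of 8
    fragments = []
    offset = 0
    while data_left > 0:
        data = min(max_data, data_left)
        fragments.append((offset // 8, data, data_left > data))
        offset += data
        data_left -= data
    return fragments

print(fragment(1420, 532))
# [(0, 512, True), (64, 512, True), (128, 376, False)], matching the example above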


Figure 2.4 Datagram fragmentation

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link-layer network. To develop a mapping between link-layer addresses and IP addresses, the Address Resolution Protocol (ARP) is used, so that the physical interface hardware on the node can understand the addressing scheme.

The method for obtaining the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out whether it already has the link-layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, carrying the sender's own IP and link-layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own IP address. The servers with different IP addresses drop the packet, but the receiver we are looking for sends an ARP reply message to the client; this server also updates its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two different types of sublayer. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called a bridge entity, which provides an interface between LANs that can be using the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering from communication errors.

Figure 2.5 ARP request

Figure 2.6 Ethernet layers in OSI model

The physical layer enables the communication between the data link layer and the corresponding physical layer of other systems. In addition, this layer defines significant physical features of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer encodes and decodes bits between binary and phase-encoded form. Regarding access to the channel, this level sends and receives the encoded data mentioned above and detects collisions in the packet exchange.

2.1.4 UDP protocol

The User Datagram Protocol (UDP) is an IP standard defined in RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable: UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore, this protocol must not be used when a reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum amount of data is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together in the IP network layer, and the computer has to fill in the fields of the UDP header in the proper way. The layout of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 2.7 UDP protocol header


The UDP header is composed of four fields [15], each one containing 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port where the reply should be addressed if nothing is changed. The Destination Port is the destination port to which the packet will be sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

2.1.5 TCP protocol

The Transmission Control Protocol (TCP) is a protocol belonging to the transport layer and used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place in order the datagrams coming from the IP protocol. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing the data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. Its layout is shown in Figure 2.8.

Figure 2.8 TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port does for the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called Reserved in the figure) is unused for now and is set to zero. The flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender to ask the receiver to deliver the data to its application immediately instead of buffering it. Finally, the RST flag is set to reset the connection.

Another important issue is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the segment. The next field in the TCP header is called the Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field varies depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way a TCP connection is set up is shown in Figure 2.9.

Figure 2.9 Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1, and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server, and the ACK flag is of course set to 1 again.

Furthermore, the client can also request to finish a connection. The process of ending the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives the packet, it sends an acknowledgement with the FIN flag set to 1 and keeps on sending the packets in progress. Afterwards, the client informs its application that a FIN segment was received and sends another packet with the FIN flag to the server to end the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called the 'sliding window' [15], in which it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 2.10 Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without any ACK received, since the size of the window is three.
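The bookkeeping behind this mechanism can be illustrated with a toy sender-side window. This is only a sketch of the idea behind Figure 2.10 (the class name and window size are assumptions), not TCP's actual implementation.

class SlidingWindow(object):
    """Toy sender-side window accounting with a fixed window size."""

    def __init__(self, size=3):
        self.size = size
        self.base = 0        # oldest unacknowledged sequence number
        self.next_seq = 0    # next sequence number to send

    def can_send(self):
        # the sender may only have 'size' unacknowledged packets outstanding
        return self.next_seq < self.base + self.size

    def send_one(self):
        assert self.can_send()
        self.next_seq += 1

    def receive_ack(self, ack_no):
        # cumulative ACK: everything below ack_no is acknowledged, the window slides right
        self.base = max(self.base, ack_no)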

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

The round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula used to estimate the RTT within a network is shown in Equation 2.1.

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT        (2.1)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is shown in Figure 2.11.

Figure 2.11 Example RTT interval
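Equation 2.1 is an exponentially weighted moving average and can be written as a one-line update. The sketch below is illustrative only; the initial estimate and the sample values are made up.

def update_rtt(estimated_rtt, sample_rtt, alpha=0.85):
    # EWMA estimator from Equation (2.1); 0.8 <= alpha <= 0.9 is the range suggested for TCP
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

rtt = 0.200                                # initial estimate in seconds (hypothetical)
for sample in (0.180, 0.250, 0.210):       # measured SampleRTT values (hypothetical)
    rtt = update_rtt(rtt, sample)
print(rtt)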

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets one after the other, with a certain spacing between them. However, problems such as network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter on a packet stream can be seen in Figure 2.12.
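A simple way to quantify this effect is to look at how much consecutive inter-arrival gaps deviate from each other. The sketch below computes such a mean delay variation from a list of arrival timestamps; it is an illustrative measure (not the RFC 3550 jitter estimator), and the function name and sample values are assumptions.

def mean_delay_variation(arrival_times):
    """Average absolute difference between consecutive inter-arrival gaps (seconds)."""
    gaps = [t2 - t1 for t1, t2 in zip(arrival_times, arrival_times[1:])]
    deviations = [abs(g2 - g1) for g1, g2 in zip(gaps, gaps[1:])]
    return sum(deviations) / len(deviations) if deviations else 0.0

# Packets sent with a regular 10 ms spacing but received with irregular gaps:
print(mean_delay_variation([0.000, 0.010, 0.025, 0.031, 0.045]))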

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly over time. Therefore, it is crucial to correct this problem as much as possible. One solution is to set up a buffer that receives the packets at irregular intervals. This buffer holds the packets for a short time in order to reorder them if necessary and to restore the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, newly arriving packets will be dropped and will never reach their destination.


Figure 2.12 Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are captured in the following three formulas:

Latency = Propagation + Transmit + Queue        (2.2)

Propagation = Distance / SpeedOfLight        (2.3)

Transmit = Size / Bandwidth        (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that can be transmitted over the network in one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe, and the latency the length of this pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13 Relation between Latency and Bandwidth

If we multiply both terms, we obtain the number of bits that can be in transit in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain:

50 × 10⁻³ s × 45 × 10⁶ bits/s = 2.25 × 10⁶ bits        (2.5)

If more bandwidth is required, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime        (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize        (2.7)

where RTT is the round trip time.
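Equations 2.2 to 2.7 translate directly into a few helper functions. The sketch below is illustrative only; the propagation speed is an assumed constant and the function names are not from the thesis.

SPEED_OF_LIGHT = 2.0e8  # approximate propagation speed in cable/fibre, m/s (assumption)

def latency(distance_m, size_bits, bandwidth_bps, queue_s=0.0):
    propagation = distance_m / SPEED_OF_LIGHT            # Equation (2.3)
    transmit = size_bits / float(bandwidth_bps)          # Equation (2.4)
    return propagation + transmit + queue_s              # Equation (2.2)

def throughput(transfer_size_bits, rtt_s, bandwidth_bps):
    transfer_time = rtt_s + transfer_size_bits / float(bandwidth_bps)   # Equation (2.7)
    return transfer_size_bits / transfer_time                           # Equation (2.6)

# Delay-bandwidth product from Equation (2.5): 50 ms x 45 Mbps
print(0.050 * 45e6)                      # 2.25e6 bits in flight
print(throughput(8e6, 0.050, 45e6))      # throughput for an 8 Mbit transfer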

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that the link can carry in theory. However, due to implementation inefficiencies or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually achieve a much lower throughput (for instance, 2 Mbps), so that data can be sent at 2 Mbps at most.

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.


2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections: these connections are intercepted and redirected to SSLsplit, which may terminate the SSL/TLS session and launch a new SSL/TLS connection to the original destination address. The goal of this tool is to support network testing and analysis. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that each packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems and remove errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With it, users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network. Some reasons to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file to be used in a future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network on another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives this request stores a temporary copy of the page. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to find the URL again. We can see this process in Figure 2.14, where the proxy asks for each web site only once and handles the incoming requests.

Figure 2.14 Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block the access between two networks. Proxies can thus take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, database, deployment, application services, monitoring, storage and so on.
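For illustration, the sketch below shows how an EC2 instance could be launched with boto 2.x. The region follows the instance names seen elsewhere in this thesis, while the AMI ID and key pair name are placeholders, not values from the thesis.

import time
import boto.ec2

# Credentials are read from the boto configuration file (or passed explicitly).
conn = boto.ec2.connect_to_region("eu-west-1")
reservation = conn.run_instances("ami-00000000",          # hypothetical AMI id
                                 instance_type="m1.large",
                                 key_name="my-key")        # hypothetical key pair
instance = reservation.instances[0]
while instance.update() != "running":                      # poll until the machine is up
    time.sleep(5)
print(instance.public_dns_name)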

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all over the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It also provides high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

We have described in this chapter many issues about networks which are crucial in

the next sections It is important to have a deep knowledge about this matter because

it is needed when it comes to analyze and recreate traffic network later on

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework, in order to extract the traffic pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and a server. This is an easy way to start setting up a connection and to design a methodology, with tools, for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1 Structure client server

The tool chosen for programming is Python, a high-level language that is very suitable for network programming due to its ease of use in this field.

When it comes to programming the client of this application, it was necessary to set the server Inet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the chosen port. To program the server, it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we made the socket listen for incoming connections from the client and accept them.
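A minimal sketch of this client and server logic with Python sockets is shown below; the hostname and the port number are illustrative placeholders, not the ones used in the real tests:

    import socket

    HOST, PORT = "server.example.com", 50007      # placeholders

    def run_client():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((HOST, PORT))        # triggers the three way handshake
        s.sendall(b"hello")
        reply = s.recv(1024)
        s.close()
        return reply

    def run_server():
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("", PORT))             # bind hostname (all interfaces) and port
        s.listen(1)                    # wait for incoming connections
        conn, addr = s.accept()        # accept the connection from the client
        data = conn.recv(1024)
        conn.sendall(data)             # echo the data back to the client
        conn.close()
        s.close()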

In Listing 3.1 the packets required to establish the client-server connection are shown.

Listing 3.1 Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, with a random sequence number x (Wireshark shows by default relative sequence numbers starting at zero). The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then an ACK segment is sent as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2 Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a basic example to start looking into the behavior of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2 Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of the communication and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22 (SSH), and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in and install the libraries required by those scripts. Moreover, programs such as tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests where there are several data sources.

In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests are run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its convenience for developing anything related to networks.
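A simplified sketch of what such a script can do with Boto is shown below; the AMI ID, key pair and security group are placeholders, and the instance types are only an example choice, not the configuration of the real Simulation.py:

    import boto.ec2

    ROLES = {"source": "t1.micro", "proxy": "c1.medium", "server": "c1.medium"}

    def launch_scenario(region="eu-west-1"):
        conn = boto.ec2.connect_to_region(region)
        instances = {}
        for role, instance_type in ROLES.items():
            reservation = conn.run_instances(
                "ami-12345678",                  # placeholder AMI
                instance_type=instance_type,
                key_name="thesis-key",           # placeholder key pair
                security_groups=["default"])
            instances[role] = reservation.instances[0]
        return instances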

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data begins, the data source establishes the connection by sending a packet with the SYN flag set to 1. This is done only once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1. This indicates the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is, again, the three-way handshake [31].

Listing 3.3 Establishing the data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the DNS name of the server. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4 Searching the server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"   "DNS"   "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"   "10.235.11.67"  "DNS"   "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"   "DNS"   "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"   "10.235.11.67"  "DNS"   "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between proxy and server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5 Establishing the proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" response gets back to the data source, so the connection is now ready to start carrying data. In these simulations it was decided to send data from time to time, with random waiting periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6 Exchange of data between data source, proxy and server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that data is being carried in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was placed in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending data bursts of 1980 bytes, and later with a heavier load of 5940 bytes. These bursts were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.
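A minimal sketch of this data source behaviour is given below, with the burst size, number of repetitions and waiting times as configurable values (the connection logic is the one shown in Section 3.1):

    import random
    import socket
    import time

    def send_bursts(host, port, burst_size=1980, repetitions=200):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))                 # one connection for the whole run
        payload = b"x" * burst_size             # 1980 or 5940 bytes of data
        for _ in range(repetitions):
            s.sendall(payload)                  # one data burst towards the server
            s.recv(burst_size)                  # the server echoes the data back
            time.sleep(random.randint(1, 2))    # random waiting time of 1 or 2 seconds
        s.close()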

The first simulations were carried out with only one data source. Here we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy for the simulation with bursts of 1980 bytes, while Figure 3.4 represents the simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3 Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4 Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense since the data sent is three times bigger as well, so around three times as many packets are needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the scale of the graph, but also because the frequency and amount of segments being sent in the second case is higher.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do this we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to place one client in each instance, so the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5 Structure for the simulation


The next two graphs represent the same two kinds of simulation developed previously, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6 Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7 Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much larger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are sent is high.

3.4 Performance results

In this section the performance of the network is analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8 Average RTT with 3 data sources

Between these graphs there is no great difference, since the data sent did not pose a big problem for the network performance. However, we can see that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many more high peaks, so the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.


Figure 3.9 Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. In principle, the lower the row in the table (the higher-end the instance), the shorter the RTT should be. However, this does not apply in every case, so the type of instance is not very significant here: the simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients there is a slight difference, especially comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro                 0.0031      0.0046      0.0033       0.0039
m1.large                 0.0037      0.0035      0.0038       0.0032
c1.medium                0.0031      0.0035      0.0051       0.0048
c1.xlarge                0.0039      0.0043      0.0037       0.0042

Table 3.1 RTT (in seconds) with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance, such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro                 0.0026      0.0022      0.0021       0.0029
m1.large                 0.0026      0.0024      0.0028       0.0024
c1.medium                0.0028      0.0031      0.0025       0.0030
c1.xlarge                0.0026      0.0029      0.0029       0.0024

Table 3.2 RTT (in seconds) with data bursts of 5940 bytes

With this instance the number of retransmissions is larger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the weakest instance; there the average reaches up to 67 lost packets. Moreover, we can see how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes       0           0           0            0
                       5940 bytes       1.5         0           2            2
m1.large               1980 bytes       0           0           0            0
                       5940 bytes       0           2.5         0            0
c1.medium              1980 bytes       0           0           0            0
                       5940 bytes       0           0           0            0
c1.xlarge              1980 bytes       0           0           0            0
                       5940 bytes       0           0           0            2.5

Table 3.3 Number of TCP retransmissions


Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes       0           0           0            0
                       5940 bytes       2           6          15           67
m1.large               1980 bytes       0           0           0            0
                       5940 bytes       0           5.5         1           36
c1.medium              1980 bytes       0           0           0            2
                       5940 bytes       0           7          13.5         50.5
c1.xlarge              1980 bytes       0           0           0            5
                       5940 bytes       0.5         5           9           54.5

Table 3.4 Number of lost packets

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes       0           1           0            6.5
                       5940 bytes       3           1           7.5          2.5
m1.large               1980 bytes       0           0           0            0
                       5940 bytes       0           0           0            0
c1.medium              1980 bytes       0           0           0            4.5
                       5940 bytes       0           0           2.5          2.5
c1.xlarge              1980 bytes       0.5         0           0            0.5
                       5940 bytes       0.5         0           0            0

Table 3.5 Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies only a little across the different tests, most probably because the data sent does not stress the server enough. However, the last analysis, of the values measured in relation to network performance, gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern is obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed a method for performing a proper extraction so that the traffic could be generated again. To find one, we looked into several publications describing how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to recreate realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as the packet timestamp, length and data sent.
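A minimal sketch of this kind of extraction with dpkt is shown below; the file name is a placeholder, and only TCP segments that actually carry data are kept:

    import dpkt

    def extract_packets(pcap_path="capture.pcap"):
        records = []
        with open(pcap_path, "rb") as f:
            for ts, buf in dpkt.pcap.Reader(f):
                eth = dpkt.ethernet.Ethernet(buf)
                if not isinstance(eth.data, dpkt.ip.IP):
                    continue
                ip = eth.data
                if not isinstance(ip.data, dpkt.tcp.TCP):
                    continue
                tcp = ip.data
                if len(tcp.data) > 0:           # keep only segments carrying data
                    records.append((ts, len(buf), tcp.data))
        return records                          # (timestamp, frame length, payload)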

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet. This is exactly the way the data source sent its data in the simulations, so this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the capture, since the replaying data source then starts to send data at the same relative time. The script also extracted the timestamp at which every data burst was sent. This made it easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file. Knowing this information, the program is ready to replay the data in an accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It should be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first one (Extractpattern.py) to obtain the traffic pattern.
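A minimal sketch of how such a replayer can pace the recorded bursts is given below; the burst list has the (relative timestamp, payload) form produced by the extraction step, and the host and port are placeholders:

    import socket
    import time

    def replay_bursts(bursts, host="server.example.com", port=50007):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((host, port))
        start = time.time()
        for offset, payload in bursts:
            delay = offset - (time.time() - start)
            if delay > 0:
                time.sleep(delay)           # wait until the burst's original send time
            s.sendall(payload)              # send the whole burst at once
            s.recv(len(payload))            # read the echo coming back from the server
        s.close()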

When replaying traffic M2M it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method. To achieve this we had to filter out the data sent from the data source to the proxy. This sniffed data was then replayed twice M2M with the second script, so that across the whole network we send the same amount of data, but in this case directly from client to server. This strategy means we also receive the same data back from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the diagram on the left shows the network traffic in the simulations, with data source, proxy and server, while the diagram on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1 Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2 Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server response is something we could not control. Another important issue is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2, comparing the graph obtained in the simulation with the one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then the most remarkable results obtained from recreating the pattern on a large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to choose easily among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulations, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is recorded and downloaded automatically, as a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
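Assuming the scripts are run from a local machine with Python and Boto installed, a complete test session follows roughly this sequence (script names as above; any command line arguments are omitted here):

    # 1. Record a reference session through the client-proxy-server scenario
    python Simulation.py        # launches the instances and downloads the pcap file

    # 2. Replay the extracted pattern M2M against the server
    python Servertoreplay.py    # starts the server on the chosen instance type
    python Replaytraffic.py     # extracts the pattern (via Extractpattern.py) and multiplies it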

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending bursts of 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had this capture we were ready to multiply the traffic. First we performed several recreations, scaling up the number of data sources: we started with one data source and increased the number up to 80, which was considered enough clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can see that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1 Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph has about double the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after 150 seconds the graph rises only a little, and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops rising after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients reaches about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2 Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between instance types. The black graph represents ten clients, and it has peaks and numbers of bytes similar to those in Figure 5.2. However, the recreation with 80 clients reaches much higher values using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3 Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks that stand out, the average is quite a bit higher than in the other tests. This graph also appears smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained; the rest of the tests show a similar RTT.

Figure 5.4 Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 client   2 clients   10 clients   20 clients   80 clients
t1.micro                  0          0.011       0.044        0.091        0.128
m1.large                  0          0.027       0.053        0.128        0.154
c1.medium                 0.007      0           0.039        0.076        0.085
c1.xlarge                 0.007      0.004       0.067        0.120        0.125

Table 5.1 Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we developed in the Amazon Cloud the scenario just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and certain packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes increased correctly as the number of clients grew. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results show a performance improvement in the network when using high quality instances, and a deterioration as the number of clients rises.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the original simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity between both graphs was permanent, so with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been proven that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instances.

Overall, there are good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not as satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes  41
3.2 RTT with data bursts of 5940 bytes  42
3.3 Number of TCP retransmissions  42
3.4 Number of lost packets  43
3.5 Number of duplicate ACKs  43
5.1 Percentage of lost packets  53


List of Figures

1.1 Flow diagram of the developed system  10
2.1 OSI model  12
2.2 HTTP request  14
2.3 Fields of the IP header  16
2.4 Datagram fragmentation  18
2.5 ARP request  19
2.6 Ethernet layers in OSI model  19
2.7 UDP protocol header  20
2.8 TCP protocol header  22
2.9 Establishing a connection in TCP  23
2.10 Sliding window method  24
2.11 Example RTT interval  25
2.12 Jitter effect  26
2.13 Relation between Latency and Bandwidth  27
2.14 Proxy operation  29
3.1 Structure client server  31
3.2 Structure client proxy server  33
3.3 Bytes through the proxy with data bursts of 1980 bytes  37
3.4 Bytes through the proxy with data bursts of 5940 bytes  37
3.5 Structure for the simulation  38
3.6 Bytes through the proxy with data bursts of 1980 bytes  39
3.7 Bytes through the proxy with data bursts of 5940 bytes  39
3.8 Average RTT with 3 data sources  40
3.9 Average RTT with 10 data sources  41
4.1 Structure of traffic replayed M2M  47
4.2 Comparison between simulation and replayed traffic  47
5.1 Number of bytes over time in different tests  51
5.2 Bytes using an m1.large instance for the server  51
5.3 Bytes using a c1.xlarge instance for the server  52
5.4 Average RTT extracted from the traffic recreations  52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 Instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet Technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 6: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

Abstract

During the last years cloud computing and Software-as-a-Service (SaaS) are becoming

increasingly important due to the many advantages that they provide Therefore the

demand for cloud testing infrastructures is increasing as well Analysis and testing of

cloud infrastructures are important and required for its effective functioning Here is

where Test-as-a-Service (TaaS) comes in providing an infrastructure along with tools for

testing in the cloud and evaluating performance and scalability TaaS can offer several

kinds of cloud testing such as regression testing performance testing security testing

scalability testing and so on In this thesis TaaS concerns network testing with the main

goal of finding out the performance of a server To achieve this goal this thesis involves

mostly performance and scalability testing In this thesis we created a TaaS system

that uses a different method to test network This method is based on recreating traffic

pattern extracted from simulations and multiply this pattern to stress a server All this

is carried out in the Amazon Cloud In this way we can find out the server limits build

a theoretical foundation and prove its feasibility The traffic recreated must be as similar

as possible to the traffic extracted from the simulations To determine this similarity we

compared graphs with the number of bytes over time in a simulation and in a session

where the traffic was recreated The more similar the more accurate and better results

we achieved With the results obtained from this method we can compare the traffic

network created by different number of data sources and carried out in different type of

instances Several data such as packet loss round trip time or bytessecond are analyzed

to determine the performance of the server The work done in this thesis can be used

to know server limitation Estimating the possible number of clients that there could be

using the same server at once

3

CHAPTER 1

Introduction

11 Background

Cloud computing [1] provides access in the network to different resources such as soft-

ware servers storage and so on in an efficient way Clients can access to these services

on their own without human interaction since everything is done automatically All ap-

plications are offered over Internet therefore users can access from any location and

with different electronic devices Capability of cloud computing can be easily modified in

order to supply properly its services to the clients regardless of their number Moreover

applications can be monitored and analyzed to give information of their conditions to

both user and provider

Cloud structure can be divided in two parts the front end which is the part the user

can see and the back end that involves the computers servers and networks which are

part of the cloud computer [2] Moreover a main server takes over of the cloud structure

ensuring a good service depending on the number of clients

Nowadays TaaS [3] is very significant as it implies cost sharing of computing resources

cost reduction scalable test structures and testing service availability at any time More-

over TaaS provides the model pay-as-you-test for customers All this characteristics make

TaaS be an efficient way for testing in the cloud The main reasons why there would be

clients interested in TaaS is the fact that this system can inform about several signifi-

cant software and network features such as functionality reliability performance safety

and so on In order to measure these characteristics there are several types of tests for

services in the cloud This thesis was mainly focused on performance testing [4] These

tests are usually carried out to provide information about speed scalability and stability

It is very common the use of performance testing to find out the performance of software

before coming out to the market to ensure it will meet all the requirements to run effi-

ciently Performance testing more specifically can be divided in several kinds of tests

The most related to this thesis are load testing to find out the behaviour of the server

under traffic loads and scalability testing to determine also performance and reliability

5

6 Introduction

of the server when increasing the load The process to develop a performance testing

involves the next steps [4]

1 Identify your testing environment it is necessary to know the physical environment

where the test will be developed as well as the testing tools required

2 Identify the performance criteria this includes limit of response times and other

values the simulations must meet to consider that the performance is good enough

to offer a reliable service

3 Design performance tests test all the different cases which could be taken for the

service or application

4 Configuring the environment prepare the environment and tools before starting

the simulations

5 Implement test design develop suitable performance tests for the test design

6 Run the tests start simulations and display values of the test

7 Analyze test look into the results to check the performance of the service

Performance testing helps ensure that cloud services and applications will run properly. These are the most recommended steps to develop TaaS [3], and we have taken them into account in this work. This service provides good features such as elasticity, safety, easy handling, a reliable environment and flexibility when choosing options regarding instances and storage.

1.2 Problem statement

Nowadays TaaS [3] is very common due to the wide use of internet clouds and the large number of applications provided on them. Therefore we found it interesting to use this concept to create a new approach for testing a particular scenario. In this thesis we focused on developing a different method to apply TaaS in an M2M framework. In addition, this project could be modified to test different scenarios in further research; for example, it would be possible to add more servers by increasing the number of instances in the scripts used in this project.

The acronym M2M [5] can have different meanings, such as Machine-to-Machine, Machine-to-Man, Machine-to-Mobile and so on. However, M2M always has a clear goal, which is to allow the exchange of information over a communication network between two end points.

When it comes to testing networks, it is necessary to do it with the traffic that is expected to go through that network in real use. There are two different ways to do so [6]. The first one is simulating the type of traffic that is supposed to go over the network. TaaS systems use this option, allowing TaaS users to configure the test simulations according to their needs. To configure these simulations, tools such as Selenium and JMeter are used in the cloud [3].

The second way to test networks is to replay recorded network traffic. The purpose of this thesis is to apply this second way to create a TaaS system for testing networks in the cloud. In order to replay recorded traffic we followed a method based on a replay attack [7], which is explained in the next section.

In this way we created a TaaS system that can estimate network performance using a different method than existing systems. First we must also configure the simulations to test the server. However, the main difference in our method is that we then extract the traffic pattern from those simulations in order to multiply it from a black box, so that we can stress the server. This is an interesting method because, since we precisely recreate a real exchange of traffic, the results are very realistic and accurate. Finally, we had to prove the feasibility of the method applied.

The TaaS system was developed for testing in the Amazon Cloud [8], which allowed us to set up the whole scenario easily and to use different types of instances. These instances differ in features such as memory, storage, network performance and so on [9]. Therefore it was interesting to compare results when we picked out one sort of instance or another.

1.3 Method

The method followed in this thesis is divided into three steps. First, we set up a proxy between client and server to extract the traffic. Then we had to figure out a way to recreate this traffic, and finally replay it M2M to test the server.

In the following paragraphs the method is described in detail. The first step consisted of setting up a client-proxy-server scenario in the cloud. Then we could run simulations to look into the behaviour of the packets going over the network. Afterwards we could check how the network performance changed when we varied some factors (number of clients, type of instance, etc.) in this network. The packets were sniffed in the proxy with the tool tshark [10]. Once we had some knowledge about the simulated network, we could start developing a method to extract a traffic pattern from those simulations. The script must obtain the traffic pattern accurately, so that when the same traffic is recreated M2M the behaviour of the packets is as similar as possible to the original simulation. To achieve this goal we had to extract the data sent and the timestamps of the packets with high precision.
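As an illustration of this extraction step, the following minimal sketch (an assumption of how it could be done, not the thesis script itself) reads a pcap file recorded on the proxy and collects (timestamp, payload) pairs for the data segments addressed to the server; tshark is assumed to be installed, the server port 50007 matches the captures shown later, and the field names are standard tshark fields.

import subprocess

def extract_pattern(pcap_file, server_port=50007):
    # Ask tshark for the relative timestamp and TCP payload of every data
    # segment addressed to the server port (hypothetical filter for this setup).
    cmd = ["tshark", "-r", pcap_file,
           "-Y", "tcp.dstport == %d && tcp.len > 0" % server_port,
           "-T", "fields", "-e", "frame.time_relative", "-e", "tcp.payload"]
    output = subprocess.check_output(cmd).decode()

    pattern = []
    for line in output.splitlines():
        timestamp, payload_hex = line.split("\t")
        # Some tshark versions print the payload as colon-separated hex bytes.
        payload = bytes.fromhex(payload_hex.replace(":", ""))
        pattern.append((float(timestamp), payload))
    return pattern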

Once the pattern from the simulations was extracted, we moved on to the third and last step, where we multiplied the traffic pattern, scaling up the number of clients. In this way, large traffic load recreations were carried out to test the server limits and find out how the server could handle heavy traffic loads. These data sources sent the extracted pattern directly to the server in an M2M framework. Finally, when we obtained the final results, we could determine the server performance and the feasibility of the approach developed.
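A minimal sketch of how such a multiplier could be organised is shown below; replay_pattern stands for the per-client replay routine sketched later in this chapter, and the number of clients is simply a parameter of the test.

from multiprocessing import Process

def run_multiplier(pattern, server_addr, n_clients):
    # Start one independent process per simulated client; every process
    # replays the same extracted pattern against the server.
    clients = [Process(target=replay_pattern, args=(pattern, server_addr))
               for _ in range(n_clients)]
    for p in clients:
        p.start()
    for p in clients:
        p.join()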

The method carried out is a kind of replay attack [7], where a "man-in-the-middle" (Wireshark sniffing in the proxy) intercepts the traffic. This traffic is then replayed, pretending to be the original sender, in order to cause problems for the host server. In our thesis this traffic is scaled up by a multiplier to stress the server and find out the software limits.

Regarding the traffic pattern extraction, several tools were studied and two methods to extract and replay traffic were considered. The first method consisted of modifying the pcap files previously recorded in the proxy so that the packets could be sent straight to the server from the data sources. Tcprewrite [11] was the tool used to modify the pcap file. The next step was to use another tool to recreate the traffic contained in the new pcap. One of them was Scapy [12], which sent the packets but not at the appropriate times. Another tool used was Tcpreplay [13], but it was not possible to simulate the server since this tool does not work at the transport level; therefore we could not establish a valid TCP connection [14]. Finally we used the second method, which is based on programming our own script to extract the traffic pattern and recreate it later on. This approach was much trickier but also much more suitable, as well as completely automatic. With this method nothing had to be done by hand (like modifying the pcap file from the console); we just needed to type a few options such as the name of the file to replay or the server instance type. After programming the script to extract the pattern, we needed to replay it somehow. We tried again with Scapy [12], a very good tool when it comes to creating and configuring packets. However, there were problems receiving segments coming from the server, since it was necessary to work with sequence and acknowledgement numbers. This was extremely hard, and finally sockets were used again to replay the pattern.
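The following is a minimal sketch of this socket-based replay, assuming the pattern is a list of (relative timestamp, payload) pairs as produced by the extraction step, and that the server echoes the data back as in our simulations.

import socket
import time

def replay_pattern(pattern, server_addr):
    # pattern is a list of (relative timestamp, payload) pairs extracted earlier.
    sock = socket.create_connection(server_addr)
    start = time.time()
    for timestamp, payload in pattern:
        # Sleep until the original relative send time of this segment is reached.
        delay = timestamp - (time.time() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(payload)
        sock.recv(4096)        # the test server echoes the data back
    sock.close()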

A diagram of the whole system developed in this thesis is shown in Figure 1.1. The top of Figure 1.1 (network traffic simulation) refers to the first part of the thesis, where we had to set up a client-proxy-server communication in the cloud. With this scenario we could run the simulations to exchange packets. Below we can see the next steps: the traffic was recorded in the proxy for further analysis and pattern extraction. Finally, with the pattern obtained, we came down to the last part of the thesis, shown at the bottom of Figure 1.1 (traffic pattern recreation). In this part we set up a multiplier composed of many data sources which recreated the traffic pattern towards the same server. In this way we could find out the server performance when it comes to handling heavy traffic loads.

1.4 Delimitations

During this thesis we have made some delimitations. The project could be improved to cover most of these aspects in further research.

The TaaS system developed can only function in the Amazon Cloud, because the main library used to program this system works only for this particular cloud. The goal of this project is testing M2M communications, so we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding out server performance, therefore the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system uses TCP sockets to test servers; for instance, we cannot use HTTP requests for testing.

1.5 Outline

The thesis is organized as follows. The introduction is in Chapter 1, and Chapter 2 describes related work. Chapter 3 describes the simulations and analysis of the scenario with data source, proxy and server. Traffic pattern extraction is described in Chapter 4. The design of TaaS [3] for M2M communications and the results achieved with this system are in Chapter 5. A summary of the whole thesis, the main results obtained and future work are in Chapter 6. Finally, Chapter 7 includes the appendices.


Figure 1.1: Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this section the main protocols are explained, since they are essential to analyze a client-server communication. Moreover, some significant metrics needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

2.1 Communication protocols

In order to develop this thesis it was crucial to describe the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects used by the different OSI model layers in order to establish communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface, offered to the other objects in the same machine that want to use the service provided by this protocol. The other interface is called the peer interface and is used to talk to its equivalent in another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole in which they all take place. To avoid a system becoming too complex, levels of abstraction are needed. In network systems this is also applied, creating layers with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it is not necessary to modify every part but only the one where the service will be introduced. In networks the architecture chosen is named the OSI model [15], and networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in Figure 2.1.

Figure 2.1: OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star and so on). This layer also determines the voltage and frequency that signals will use. The data link layer transmits the data from upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data in the network. To ensure packets reach their destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (segmentation). The most significant protocols here are TCP and UDP, which we discuss later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

2.1.1 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol for distributed, collaborative, hypermedia information systems. This network protocol is used for communication among users, proxies or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is the transmission of resources. A resource is a network data object that can be identified by a URI; normally these resources are either files or outputs of a script. Each HTTP message has the general form shown in Listing 2.1 [15].

Listing 2.1: HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line shows whether this is a request or response message. The next lines provide parameters and options for the message; there are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol and the port used by default is 80, although it is possible to use other ports. Important HTTP request methods are GET and POST [16], which are used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis: CONNECT [16]. This method is used to send data through a proxy that acts like a tunnel. Therefore we needed this method to establish a connection between client and server through the proxy.
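A minimal sketch of how a client can open such a tunnel from Python is shown below; the proxy address and port are placeholders, the server name follows the form seen in Listing 2.2, and the status check assumes the usual "200 Connection established" reply from the proxy.

import socket

PROXY = ("proxy.example.com", 3128)                # placeholder Squid proxy address
SERVER = "ec2-54-217-136-250.eu-west-1.compute.amazonaws.com"
PORT = 50007

sock = socket.create_connection(PROXY)
request = "CONNECT %s:%d HTTP/1.1\r\nHost: %s:%d\r\n\r\n" % (SERVER, PORT, SERVER, PORT)
sock.sendall(request.encode())

# The proxy answers with a status line such as "HTTP/1.0 200 Connection established".
status = sock.recv(4096).decode()
if "200" in status.splitlines()[0]:
    # From here on the socket behaves like a direct TCP connection to the server.
    sock.sendall(b"hello")
    print(sock.recv(4096))
sock.close()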

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request has been served, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request, so they cannot keep state about previous requests. Any kind of data can be transmitted by HTTP as long as client and server know how to manage the data. A typical example of an HTTP request is shown in Figure 2.2.

To set up a communication with this protocol, a client must open a connection by sending a request message to the server, which returns a response message. Afterwards the server will close the connection.

Figure 2.2: HTTP request

First of all we describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource, called the URI. Finally comes the version of HTTP that is being used. This can be clearly seen in Listing 2.2; this example was extracted from the simulations made during the thesis.

Listing 2.2: HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided into three parts. The first part states the version of HTTP used for the communication. Afterwards there is a status code [15] so that the computer can understand the result of the request; the first digit indicates the class of the response. The codes are shown in Listing 2.3.

Listing 2.3: HTTP request result

1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or sentence in English describing the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body (explained later). There are many different header lines, but they can be classified into four main groups [17]. The entity header involves information about either the request, the response or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4: HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is sent in the body; there may also be text giving information or warning of errors. In a request, the body is where the user enters data or uploads files to be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about it. One of these header lines is Content-Type, which indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which states how many bytes are used in the body.

2.1.2 IP

The Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information goes to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides unreliable datagram delivery; therefore it is not guaranteed that the datagrams reach their destinations. In addition, packets can be delivered out of order or reach the destination more than once.


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3: Fields of the IP header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length gives the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The Total Length field indicates the length in bytes (unlike Header Length, where the length is counted in words) of the whole datagram. Regarding the Identification field, the sender marks each IP datagram with an ID number before transmission. The goal is to make datagrams identifiable, so that when several fragments arrive at the destination, all carrying the same ID value, the destination host can reassemble them. If some fragment does not arrive, all the fragments with the same number are discarded. The next field contains up to three flags. The first flag has no use for now and is set to 0. The D flag, when set to 1, indicates that the datagram must not be fragmented into smaller pieces. The M flag indicates whether the datagram received is the last one of the stream (set to 0) or there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the fragments within the stream in which they were sent, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may stay on the network before being discarded; the main goal of this function is to discard datagrams that wander within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol carried in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during transmission. To send a packet, the Source Address field must be filled with the IP address of the sender, and the Destination Address with the IP address of the receiver. There is also a field to set up options if they are required, and a Padding of zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides host-to-host service across many different networks with diverse technologies, it is required to manage datagrams so they can travel over all of them. There are two choices available to solve this problem [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use a technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. It is also the option used in the Amazon networks where we ran the tests. It is important to know how segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If the packets happen to go over some network with a smaller MTU, fragmentation is required. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will each contain 512 bytes of data plus 20 bytes of header. There will then be 376 bytes left (1400 - 2*512), so the last datagram will carry those 376 bytes of data plus 20 bytes of header. The result is shown in Figure 2.4.

It should be noted that the number of data bytes in each fragment (except the last) must always be a multiple of 8. During this process the router sets the M bit in the flags of the first and second datagrams to indicate that there are more packets coming. The offset field of the first packet is set to 0, because this datagram carries the first part of the original packet, while the second datagram has the offset set to 64, since its first byte of data is the 513th (512/8 = 64).
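The arithmetic of this example can be reproduced with the small helper below (an illustration only, not part of the thesis code); it returns the data size, offset in 8-byte units and more-fragments flag for each fragment.

def fragments(total_len, mtu, ip_header=20):
    # Return (data bytes, offset in 8-byte units, more-fragments flag) per fragment.
    data_left = total_len - ip_header          # 1400 bytes of data in the example
    per_frag = ((mtu - ip_header) // 8) * 8    # data per fragment, multiple of 8 -> 512
    result, offset = [], 0
    while data_left > 0:
        data = min(per_frag, data_left)
        data_left -= data
        result.append((data, offset // 8, data_left > 0))
        offset += data
    return result

print(fragments(1420, 532))   # [(512, 0, True), (512, 64, True), (376, 128, False)]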


Figure 2.4: Datagram fragmentation

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link-layer network. To develop a mapping between link-layer addresses and IP addresses, the Address Resolution Protocol (ARP) is required, so that the physical interface hardware on the node can understand the addressing scheme.

The method to obtain the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out whether it already has the link-layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, which carries the sender's own IP and link-layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own IP address. The hosts with different IP addresses drop the packet, but the receiver we are looking for sends an ARP reply message to the client; this host also updates its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two different types of sublayer. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called the bridge entity, which provides an interface between LANs that may be using the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering them from communication errors.

Figure 2.5: ARP request

Figure 2.6: Ethernet layers in the OSI model

The physical layer enables the communication between the data link layer and the respective physical layer of other systems. In addition, this layer defines significant physical features of Ethernet, such as voltage levels and timing, but the most important functions are related to data encoding and channel access. This layer can encode and decode bits between binary and phase-encoded form. Regarding access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

2.1.4 UDP protocol

The User Datagram Protocol (UDP) is defined in the Internet standard RFC 768 [20]. It is used as a transport protocol, so its function is similar to the TCP protocol, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable: UDP provides a best-effort service to an end system, which means that it does not guarantee the proper delivery of datagrams. Therefore this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum payload is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together down to the IP network layer, and the computer has to fill in the fields of the UDP header properly. The layout of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 2.7: UDP protocol header


The UDP header is composed of four fields [15], each of 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port to which the reply should be addressed unless changed. The Destination Port is the port on the destination host to which the packet will be sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to separate different kinds of traffic, facilitating and ordering packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers; the destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

2.1.5 TCP protocol

The Transmission Control Protocol (TCP) is a transport-layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams coming from the IP protocol in order. In addition, this protocol can split the data into fragments of different lengths and forward them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The layout of this header is shown in Figure 2.8.

The Source Port field identifies the sender port, just as the Destination Port does for the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called Reserved in the picture) is unused for now and is set to zero. The flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender so that the receiver delivers the data to the application immediately. Finally, RST is set to reset the connection.

Figure 2.8: TCP protocol header

Another important field is the window size; through this field we know the number of bytes that the receiver can accept without acknowledgement. The Checksum field makes packet transmission more reliable, since it is used to check the integrity of the segment. The next field in the TCP header is called the Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way to set up a TCP connection is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a random initial sequence number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number; both the ACK and SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server, and the ACK flag is set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending any packets in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client to end the communication, which the client acknowledges.

Reliable delivery

TCP provides ordered and reliable delivery, which is achieved through a method called the 'sliding window' [15], where it is possible to define a range of sequence numbers that can be sent without waiting for individual acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of the packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to travel through the network. The formula used to estimate the RTT within a network is shown in Equation 2.1:

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and their direct relation with the RTT is set out in Figure 2.11.
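As a small illustration, the following sketch applies Equation 2.1 to a series of made-up RTT samples, using α = 0.85, a value inside the recommended 0.8-0.9 range.

def update_rtt(estimated_rtt, sample_rtt, alpha=0.85):
    # Equation 2.1: exponentially weighted moving average of the RTT samples.
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

rtt = 100.0                                   # initial estimate in milliseconds
for sample in [90.0, 110.0, 250.0, 95.0]:     # illustrative samples only
    rtt = update_rtt(rtt, sample)
    print(round(rtt, 1))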

Figure 2.11: Example RTT interval

2.2.2 Jitter

Jitter is the variation in the delay of the packets sent within a network [15]. A sender transmits many packets one after the other with a certain spacing between them. However, network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter is illustrated in Figure 2.12.

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly over time; therefore it is crucial to correct it as much as possible. One solution is to add a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short time in order to reorder them if necessary and to restore the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full, newly arriving packets are dropped and never reach their destination.


Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to queueing delays in switches and bridges, where packets are usually stored for some time. These factors are captured in the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

Bandwidth describes the number of bits that can be transmitted within the network per second [15]. There is an important relationship between bandwidth and latency. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is given in Figure 2.13.

Figure 2.13: Relation between latency and bandwidth

If we multiply both terms, we obtain the number of bits that can be in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is required, the problem can be solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that the link can carry in the best case. However, due to implementation inefficiencies or errors, a pair of nodes connected with a bandwidth of 10 Mbps will usually see a much lower throughput (for instance 2 Mbps), so that data can be sent at 2 Mbps at most.
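To make these formulas concrete, the following sketch evaluates Equations 2.2-2.7 for some example values; the propagation speed, distance and transfer size are assumptions chosen only for illustration, while the 50 ms / 45 Mbps pair comes from the bandwidth example above.

SPEED = 2.0e8                   # assumed propagation speed in the medium, m/s

def latency(distance_m, size_bits, bandwidth_bps, queue_s=0.0):
    propagation = distance_m / SPEED                 # Equation 2.3
    transmit = size_bits / bandwidth_bps             # Equation 2.4
    return propagation + transmit + queue_s          # Equation 2.2

def throughput(transfer_bits, rtt_s, bandwidth_bps):
    transfer_time = rtt_s + (1.0 / bandwidth_bps) * transfer_bits   # Equation 2.7
    return transfer_bits / transfer_time                            # Equation 2.6

# 1 MB transferred over a 45 Mbps link with 50 ms of RTT:
print(throughput(8e6, 0.050, 45e6) / 1e6, "Mbps of effective throughput")
# A 1000-byte packet sent 1000 km over the same link:
print(latency(1000e3, 8000, 45e6), "seconds of latency")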

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.


2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle attacks against SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit, which can terminate the SSL/TLS session and launch a new SSL/TLS connection to the original receiver address. The goal of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems and remove errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. Users can easily see a list of captured packets updated in real time, the details of a selected packet and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file to be used in future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump also provides many options to capture packets in different ways, which gives a broad range of possibilities for managing the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards the information to the rest of the network on another port. Proxies may cache web sites. Each time a user from the local network asks for some URL, the proxy that receives this request stores a temporary copy of the resource. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to find the URL again. An example of how a proxy works and handles incoming requests, asking for each web site only once, is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not their only function. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block access between two networks; a proxy can thus act as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which is very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services, mainly those offered by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which can either be given manually in every connection or added to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main areas in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.
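As an illustration, the following minimal sketch (assuming Boto with credentials already stored in the boto configuration file) starts one EC2 instance in eu-west-1 and waits until it is running; the AMI id, key pair and security group names are placeholders, not values from the thesis.

import time
import boto.ec2

conn = boto.ec2.connect_to_region("eu-west-1")          # keys taken from the boto file
reservation = conn.run_instances("ami-xxxxxxxx",        # placeholder AMI id
                                 instance_type="m1.small",
                                 key_name="thesis-key",             # placeholder key pair
                                 security_groups=["default"])
instance = reservation.instances[0]
while instance.state != "running":
    time.sleep(5)
    instance.update()                                    # refresh the instance state
print(instance.public_dns_name)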

2.3.3 Operating Systems

There are several operating systems, such as Microsoft Windows, Linux and Mac OS. However, the availability and ease of use of network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and by employees of many companies and organizations from all over the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good support for networks, where client and server systems can be set up easily and quickly. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many networking topics which are crucial for the next sections. It is important to have deep knowledge of this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example, and we ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire deep knowledge about this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work, we are ready to develop a client-server application in Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger-scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The language chosen for programming is Python, a high-level programming language highly recommended for network programming due to its ease of use in this field.


To program the client for this application, it was necessary to set the server Inet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the mentioned port. To program the server, it is required to set the hostname and the same port opened by the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we make the socket wait for incoming connections from the client and accept the connection.
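A minimal sketch of such a pair of scripts is shown below; the port 50007 matches the one seen in the captures, while the host names are placeholders for the actual instances, and the echo behaviour mirrors the simulations described later.

import socket

def server(host="", port=50007):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, port))            # bind hostname and port to the socket
    srv.listen(1)
    conn, addr = srv.accept()         # wait for the client and accept the connection
    data = conn.recv(1024)
    conn.sendall(data)                # echo the received data back to the client
    conn.close()
    srv.close()

def client(server_host, port=50007):
    sock = socket.create_connection((server_host, port))
    sock.sendall(b"test data")
    print(sock.recv(1024))
    sock.close()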

The packets required to establish a client-server connection are shown in Listing 3.1.

Listing 3.1: Establish connection

1  "0.665317","192.168.1.24","192.168.1.33","TCP","74","49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736","192.168.1.33","192.168.1.24","TCP","66","EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is the three-way handshake. Analyzing these segments we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, with a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer from the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection, and an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one side would be closed and the other could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1  "0.671945","192.168.1.33","192.168.1.24","TCP","60","EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example with which to start looking into the behaviour of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]; the structure is shown in Figure 3.2. After setting up this connection we sent traffic in order to analyze the segments, measure the performance and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of communications and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts on an instance it was necessary to log in and install the libraries required by that script. Moreover, programs such as Tcpdump and Wireshark were installed on the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests; it is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests would be run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease of use for anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1; this is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1, indicating that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1","0.000000","10.34.252.34","10.235.11.67","TCP","74","45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2","0.000054","10.235.11.67","10.34.252.34","TCP","74","ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3","0.000833","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the DNS name of the server. The proxy then looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching for the server IP address

"4","0.000859","10.34.252.34","10.235.11.67","HTTP","197","CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6","0.001390","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7","0.002600","172.16.0.23","10.235.11.67","DNS","166","Standard query response 0xb33a"
"8","0.002769","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9","0.003708","172.16.0.23","10.235.11.67","DNS","124","Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication with the server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10","0.003785","10.235.11.67","10.224.83.21","TCP","74","33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11","0.438963","10.224.83.21","10.235.11.67","TCP","74","50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12","0.439029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" response gets back to the data source, and the connection is ready to start sending data. In these simulations it was decided to send data from time to time with random time periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose the exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15","0.466800","10.34.252.34","10.235.11.67","TCP","71","45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16","0.466813","10.235.11.67","10.34.252.34","TCP","66","ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17","0.466975","10.235.11.67","10.224.83.21","TCP","71","33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18","0.467901","10.224.83.21","10.235.11.67","TCP","66","50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19","0.468018","10.224.83.21","10.235.11.67","TCP","71","50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20","0.468029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21","0.468083","10.235.11.67","10.34.252.34","TCP","71","ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22","0.508799","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up on a different instance, and the number was scaled up from one to ten. The network was first tested with a traffic load based on sending bursts of 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds, as sketched below.
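A minimal sketch of the data-source sending loop described above could look as follows; the socket is assumed to be already connected through the proxy, and burst size, number of repetitions and waiting times are the parameters mentioned in the text.

import random
import time

def send_bursts(sock, burst_size=1980, repetitions=200):
    payload = b"x" * burst_size
    for _ in range(repetitions):
        sock.sendall(payload)
        sock.recv(burst_size)                  # read (part of) the echoed data
        time.sleep(random.choice([1, 2]))      # random waiting time between bursts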

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy for the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well, and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent is bigger in the second case.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. For this we created a similar environment, but in this case with a variable number of data sources. This whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to launch several instances and set one client in each instance. Therefore the proxy receives packets from different IP addresses, as it would in a real case.
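The Amazon side of this setup can be scripted with the boto library [28]. The sketch below shows one possible way to launch one EC2 instance per data source; the AMI id, key pair and security group are placeholders (not the ones used in the thesis), and the AMI is assumed to already contain the client script.

import boto.ec2

NUM_CLIENTS = 10                     # one data source per instance
AMI_ID = "ami-00000000"              # placeholder AMI with the client script preinstalled
REGION = "eu-west-1"                 # region that appears in the thesis captures

def launch_data_sources():
    conn = boto.ec2.connect_to_region(REGION)
    reservation = conn.run_instances(
        AMI_ID,
        min_count=NUM_CLIENTS,
        max_count=NUM_CLIENTS,
        instance_type="t1.micro",    # instance type used for the clients
        key_name="taas-key",         # placeholder key pair name
        security_groups=["taas"],    # placeholder security group
    )
    # Each instance gets its own IP address, so the proxy sees packets
    # coming from different sources, as in a real deployment.
    return reservation.instances

if __name__ == "__main__":
    for instance in launch_data_sources():
        print(instance.id)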

Figure 3.5: Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much larger amount of bytes exchanged than Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are sent is high.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features, explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8 packets were being sent to the server from three different instances, whereas for Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not pose a big problem for network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, and therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect network performance.


Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower we go in the table, the shorter the times should be; however, this does not hold in every case, so the type of instance is not very remarkable in these cases, and the simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients there is a slight difference, especially comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not pose a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT with data bursts of 5940 bytes

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning resent packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one; nevertheless, there is not a very significant gap among the three other instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
c1.medium              5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
c1.xlarge              5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
c1.medium              5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
c1.xlarge              5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
t1.micro               5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
c1.medium              5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
c1.xlarge              5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little in the different tests, most probably because the data sent does not stress the server enough. However, the values measured in the last analysis, related to network performance, gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and, later on, send this traffic multiplied M2M (Machine-to-Machine) towards the same server. It was necessary to look up a method describing how to develop a proper extraction so that the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as similarly as possible. The first one is the packet length [33][34]: this required packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we wanted to replay. Finally, to create realistic network traffic it was significant to send the same number of packets [33].

4.1 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the features needed from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
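A minimal sketch of this kind of extraction with dpkt is shown below. The function name and the decision to keep only TCP segments are choices made for the example; it is not a copy of the Extractpattern.py script itself.

import dpkt

def collect_packet_features(pcap_path):
    """Return a list of (timestamp, frame length, TCP payload) tuples."""
    features = []
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP):
                continue                      # skip ARP and other non-IP frames
            tcp = ip.data
            if not isinstance(tcp, dpkt.tcp.TCP):
                continue                      # keep only TCP segments
            features.append((ts, len(buf), tcp.data))
    return features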

To recreate the traffic, the script initially had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of being sent packet by packet. This is exactly the way the data source sent packets in the simulations, therefore this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then started to replay the data at the same point in time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.
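Assuming the packets have already been reduced to (timestamp, payload) pairs, for example with dpkt as sketched above, grouping them into bursts can be done by joining payloads whose inter-arrival gap stays below some threshold. The 0.5 second gap used here is an illustrative value, not the one from the thesis scripts.

def group_into_bursts(packets, max_gap=0.5):
    """Group (timestamp, payload) pairs into data bursts.

    Payloads whose inter-arrival time is below max_gap seconds are joined
    into one burst; each burst keeps the timestamp of its first packet so
    that the replayer can respect the original timing.
    """
    bursts = []          # list of [burst_timestamp, joined_payload]
    last_ts = None
    for ts, payload in packets:
        if not payload:
            continue                                 # skip segments without data
        if last_ts is not None and ts - last_ts < max_gap:
            bursts[-1][1] += payload                 # same burst: append the data
        else:
            bursts.append([ts, payload])             # a new burst starts here
        last_ts = ts
    return bursts

The replay step can then work from this list of (timestamp, data) pairs instead of individual packets, which matches the burst-based behaviour described above.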

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic, since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saved in a file the information gathered from each data burst, as well as its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, this program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from where the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets in an accurate manner. I have to point out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
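A simplified sketch of this replaying step is given below. It assumes the pattern is available as a list of (timestamp, data) bursts such as the one produced by the grouping sketch in the previous section, and the server address is a placeholder.

import socket
import time

SERVER = ("server-hostname", 50007)        # placeholder address of the tested server

def replay_pattern(bursts):
    """Resend every data burst with the same relative timing as in the capture."""
    sock = socket.create_connection(SERVER)
    start = time.time()
    first_ts = bursts[0][0]
    try:
        for ts, data in bursts:
            # Wait until the same amount of time has elapsed as in the original capture.
            delay = (ts - first_ts) - (time.time() - start)
            if delay > 0:
                time.sleep(delay)
            sock.sendall(data)
            sock.recv(65536)               # drain the echo coming back from the server
    finally:
        sock.close()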

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter out the data sent from the data source to the proxy. This very data, as sniffed, was replayed twice M2M with the second script, so that in the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy also allows receiving the same data back from the server, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The diagram on the left shows the traffic in the simulations, with data source, proxy and server, while the diagram on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep getting worse. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only in a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in amount of bytes sent and in duration. Therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important characteristics when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. And finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded, as a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased it up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results with each other and with the original simulation to extract interesting results. The number of players was increased one at a time, every five seconds.
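A sketch of how such a ramp-up can be driven is shown below. In the thesis each player runs in its own instance; threads are used here only to keep the sketch self-contained, and replay_one_client stands for any callable that replays the extracted pattern towards the server once, for example the replay sketch from Chapter 4.

import threading
import time

def multiply_pattern(replay_one_client, num_players=80, ramp_interval=5):
    """Start num_players replayers, adding a new one every ramp_interval seconds."""
    threads = []
    for _ in range(num_players):
        t = threading.Thread(target=replay_one_client)
        t.start()                       # one more client starts sending traffic
        threads.append(t)
        time.sleep(ramp_interval)       # a new player every five seconds
    for t in threads:
        t.join()                        # wait until every replayer has finished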

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line seems to be twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests, but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same goes for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should keep going up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster; this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table informs about the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came down to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes was going up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs had more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse: the similarity between both graphs was permanent. Therefore, with this TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore the good results, along with the flexibility and many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using a m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - quick guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark user's guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump / libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language: official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


CHAPTER 1

Introduction

1.1 Background

Cloud computing [1] provides access over the network to different resources, such as software, servers, storage and so on, in an efficient way. Clients can access these services on their own, without human interaction, since everything is done automatically. All applications are offered over the Internet, therefore users can access them from any location and with different electronic devices. The capacity of cloud computing can be easily modified in order to supply its services properly to the clients regardless of their number. Moreover, applications can be monitored and analyzed to give information about their condition to both user and provider.

The cloud structure can be divided into two parts: the front end, which is the part the user can see, and the back end, which involves the computers, servers and networks that are part of the cloud computer [2]. Moreover, a main server takes care of the cloud structure, ensuring a good service depending on the number of clients.

Nowadays TaaS [3] is very significant, as it implies cost sharing of computing resources, cost reduction, scalable test structures and testing service availability at any time. Moreover, TaaS provides the pay-as-you-test model for customers. All these characteristics make TaaS an efficient way of testing in the cloud. The main reason why there would be clients interested in TaaS is the fact that this system can inform about several significant software and network features, such as functionality, reliability, performance, safety and so on. In order to measure these characteristics there are several types of tests for services in the cloud. This thesis was mainly focused on performance testing [4]. These tests are usually carried out to provide information about speed, scalability and stability. It is very common to use performance testing to find out the performance of software before it comes out to the market, to ensure it will meet all the requirements to run efficiently. Performance testing, more specifically, can be divided into several kinds of tests. The most related to this thesis are load testing, to find out the behaviour of the server under traffic loads, and scalability testing, to determine also the performance and reliability of the server when increasing the load. The process of developing a performance test involves the following steps [4]:

1. Identify your testing environment: it is necessary to know the physical environment where the test will be developed, as well as the testing tools required.

2. Identify the performance criteria: this includes limits on response times and other values the simulations must meet to consider that the performance is good enough to offer a reliable service.

3. Design performance tests: test all the different cases which could apply to the service or application.

4. Configure the environment: prepare the environment and tools before starting the simulations.

5. Implement the test design: develop suitable performance tests for the test design.

6. Run the tests: start the simulations and display the values of the test.

7. Analyze the tests: look into the results to check the performance of the service.

Performance testing will ensure cloud services, so that applications will run properly. These are the most recommended steps for developing TaaS [3], and we have taken them into account in this work. This service provides good features such as elasticity, safety, easy handling, a reliable environment and flexibility when choosing options regarding instance storage.

1.2 Problem statement

Nowadays TaaS [3] is very common due to the wide use of internet clouds and the large number of applications provided on them. Therefore we found it interesting to use this concept to create a new approach for testing a particular scenario. In this thesis we focused on developing a different method to apply TaaS in an M2M framework. In addition, this project could be modified to test different scenarios for further research; for example, it would be possible to add more servers by increasing the number of instances in the scripts used in this project.

The acronym M2M [5] can have different meanings, such as Machine-to-Machine, Machine-to-Man, Machine-to-Mobile and so on. However, M2M does have a clear goal, which is to allow the exchange of information over a communication network between two end points.

When it comes to testing networks, it is necessary to do it with the traffic expected to go through that network whenever it is used. To do so there are two different ways [6]. The first one is simulating the type of traffic that is supposed to go over the network. TaaS systems use this option, allowing TaaS users to configure the test simulations according to their needs. To configure these simulations some tools are used in the cloud, such as Selenium and JMeter [3].

The second way to test networks is to replay recorded network traffic. The purpose of this thesis is to apply this second way to create a TaaS system to test networks in the cloud. In order to replay recorded traffic we followed a method based on a replay attack [7], which is explained in the next section.

In this way we created a TaaS system that can estimate network performance using a different method than the systems already made. First, we must also configure the simulations to test the server. However, the main difference in our method is that we then extract the traffic pattern from those simulations in order to multiply it from a black box, so that we can stress the server. This is an interesting method because, since we precisely recreate a real exchange of traffic, the results are very realistic and accurate. Finally, we had to prove the feasibility of the method applied.

The TaaS system was developed for testing in the Amazon Cloud [8], which allowed us to set up the whole scenario easily and use different types of instances. These instances differ in features such as memory, storage, network performance and so on [9]. Therefore it was interesting to compare the results when we picked out one sort of instance or another.

1.3 Method

The method followed during this thesis is divided into three steps. First we set up a proxy between client and server to extract the traffic. Then we had to figure out a way to recreate this traffic, to finally replay it M2M to test the server.

In the following paragraphs the method is described in detail. The first step consisted of setting up a client-proxy-server scenario in the cloud. Then we could run simulations to look into the behaviour of the packets going over the network. Afterwards we could check how the network performance differed when we changed some factors (number of clients, type of instance, etc.) in this network. The packets were sniffed in the proxy with the tool tshark [10]. Once we had some knowledge about the network simulated, we could start developing a method to extract a traffic pattern from those simulations. We must take into account that the script programmed must obtain the traffic pattern properly, so that when it comes to recreating the same traffic M2M, the behaviour of the packets is as similar as possible to the original simulation. To achieve this goal we had to extract the data sent and the timestamps of the packets with high precision.

Once the pattern from the simulations was extracted, we moved on to the third and last step, where we multiplied the traffic pattern, scaling up the number of clients. In this way, large traffic load recreations were carried out to test the server limits and find out how this server could handle heavy traffic loads. These data sources sent the extracted pattern directly to the server in an M2M framework. Finally, when we obtained the final results, we could find out the server performance and the feasibility of the approach developed.

The method carried out is a kind of replay attack [7], where there is a "man-in-the-middle" (Wireshark sniffing in the proxy) which intercepts the traffic. Then this traffic is replayed, pretending to be the original sender, in order to create problems for the host server. In our thesis this traffic is scaled up from a multiplier to stress the software and find out its limits.

Regarding the traffic pattern extraction, several tools were studied and two methods to extract and replay traffic were considered. The first method consisted of modifying the pcap files previously recorded in the proxy so that the packets could be sent straight to the server from the data sources. Tcprewrite [11] was the tool used to modify the pcap file. The next step was to use another tool to recreate the traffic contained in the new pcap. One of them was Scapy [12], which sent the packets but not at the appropriate times. Another tool used was Tcpreplay [13], but with it it was not possible to simulate the client against the server, since this tool does not work at the transport level; therefore we could not establish a valid TCP connection [14]. Finally we used the second method, which is based on programming our own script to extract the traffic pattern and recreate it later on. The way chosen was much trickier, but also much more suitable, as well as completely automatic. With this method it was not necessary to do anything by hand (like modifying the pcap file from the console); we just needed to type a few options, such as the name of the file to replay or the server instance type. After programming the script to extract the pattern, we needed to replay it somehow. We tried again with Scapy [12], a very good tool when it comes to creating and configuring packets. However, there were some problems receiving the segments coming from the server, since it was necessary to work with sequence and acknowledgement numbers. Therefore this was extremely hard, and finally sockets were used again to replay the pattern.

A diagram of the whole system developed in this thesis is shown in Figure 1.1. The top of Figure 1.1 (network traffic simulation) refers to the first part of the thesis, where we had to set up a client-proxy-server communication in the cloud. With this scenario we could run the simulations to exchange packets. Below, we can see the next steps: the traffic was recorded in the proxy for further analysis and pattern extraction. Finally, with the pattern obtained, we came down to the last part of the thesis, shown at the bottom of Figure 1.1 (traffic pattern recreation). In this part we set up a multiplier composed of many data sources, which recreated the traffic pattern towards the same server. In this way we could find out the server performance when it comes to handling heavy traffic loads.

1.4 Delimitations

During this thesis we have made some delimitations. The project could be improved to cover most of these aspects in further research.

The TaaS system developed can only function in the Amazon Cloud, because the main library used to program this system works only for this particular cloud. The goal of this project is testing M2M communications; we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding out server performance, therefore the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system programmed uses TCP sockets to test servers; for instance, we cannot use HTTP requests for testing.

1.5 Outline

The thesis is organized as follows. The introduction is in Chapter 1. Chapter 2 describes related work. Chapter 3 describes the simulations and analysis of the data source, proxy and server scenario. Traffic pattern extraction is described in Chapter 4. The design of TaaS [3] for M2M communications and the results achieved with this system are in Chapter 5. A summary of the whole thesis, the main results obtained and future work are in Chapter 6. Finally, Chapter 7 includes the appendices.


Figure 1.1: Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this section the main protocols are explained, since they are essential to analyze a client-server communication. Moreover, some significant values needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

2.1 Communication protocols

In order to develop this thesis it was crucial to give a description of the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects that use the different OSI model layers in order to establish a communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface, offered to the other objects on the same machine that want to use the service provided by this protocol. The other interface is called the peer interface, and is used to communicate with its equivalent on another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function, and to see the whole in which they all take place. To avoid a system becoming too complex, it is necessary to add levels of abstraction. In network systems this is also applied, creating layers, each with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it will not be necessary to modify every part, but only the one where the service will be introduced. In networks the architecture chosen is named the OSI model [15]. Networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in Figure 2.1.

Figure 2.1: OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star and so on). This layer also determines the voltage and frequency that signals will use. As for the data link layer, it transmits the data from the upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data in the network. To ensure packets get to the destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (segmentation process). The most significant protocols are TCP and UDP, about which we will talk later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

2.1.1 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative, hypermedia information systems. This network protocol is used for communication among users, proxies or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is linked to the transmission of resources. A resource is a network data object that can be identified by a URI; normally these resources are either files or outputs of a script. Each HTTP message has the general form shown in Listing 2.1 [15].

Listing 2.1: HTTP message

START_LINE <CRLF>
MESSAGE_HEADER <CRLF>
<CRLF>
MESSAGE_BODY <CRLF>

The first line indicates whether this is a request or a response message. The next lines provide parameters and options for the message; there are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol, and the port used by default is 80, but it is possible to use other ports. Important HTTP request methods are GET and POST [16], which are used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis, and its name is CONNECT [16]. This method is used to send data through a proxy that can act like a tunnel. Therefore we needed this method to establish a connection between client and server through the proxy.
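As an illustration of how a client can open such a tunnel, the sketch below sends a CONNECT request to a proxy with plain Python sockets and checks the status line of the reply before using the connection. The proxy address, proxy port and destination are placeholders, not the exact values from the simulations.

import socket

PROXY = ("proxy-hostname", 3128)        # placeholder proxy address and port
TARGET = "server-hostname:50007"        # placeholder destination host:port

def open_tunnel():
    """Open a TCP tunnel to TARGET through an HTTP proxy using CONNECT."""
    sock = socket.create_connection(PROXY)
    request = "CONNECT {0} HTTP/1.1\r\nHost: {0}\r\n\r\n".format(TARGET)
    sock.sendall(request.encode("ascii"))
    reply = b""
    while b"\r\n\r\n" not in reply:     # read the whole response header
        chunk = sock.recv(4096)
        if not chunk:
            break
        reply += chunk
    status_line = reply.split(b"\r\n", 1)[0]
    if b" 200 " not in status_line:     # e.g. "HTTP/1.0 200 Connection established"
        raise RuntimeError("proxy refused the tunnel: %r" % status_line)
    return sock                         # from here on, data goes straight to TARGET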

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request has been served, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request; therefore they cannot keep information about the requests. Any kind of data can be transmitted by HTTP, as long as client and server know how to manage the data. A typical example of an HTTP request is shown in Figure 2.2.

Figure 2.2: HTTP request

To set up a communication with this protocol, a client must open a connection, sending a request message to the server, which returns a response message. Afterwards the server will close the connection. First of all we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource; this part is called the URI. And finally, there is the version of HTTP that is being used. This idea can be clearly seen in Listing 2.2. This example was extracted from the simulations made during the thesis.

Listing 2.2: HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided into three parts. The initial part contains the version of HTTP used for the communication. Afterwards there is a code [15] for the computer to understand the result of the request; the first digit indicates the class of response. The codes are shown in Listing 2.3.

Listing 2.3: HTTP request result

1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or sentence in English to describe the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header carries information about either the request, the response, or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in a response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4: HTTP header lines

User-Agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is usually sent in the body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files which will be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about the body. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which gives how many bytes were used in the body.

2.1.2 IP

The Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information goes to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides unreliable datagram delivery; therefore, it is not ensured that the datagrams reach their destinations. In addition, this service model may cause further problems, since packets can be delivered out of order as well as reach the destination more than once.


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3: Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of service. The field Total Length indicates the length in bytes (unlike the Header Length, where the length is counted in words) of the whole datagram.

When it comes to the Identification field, the sender marks each IP datagram with an ID number before the transmission. The goal is to have unique datagrams, so that if several fragments arrive at the destination, all carrying the same ID value, the destination host can put the received fragments back together. If some fragment does not arrive, all the fragments with the same number will be discarded. In the next field there are up to three flags. The first flag has no use for now and is set to 0. The flag D (Don't Fragment), when set to 1, forbids the fragmentation of the datagram into smaller pieces. The flag M (More Fragments) indicates whether the datagram received is the last one of the stream (set to 0) or there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the datagrams within the stream in which they have been sent, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may stay on the network before being discarded. The main goal of this function is to discard datagrams that are within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet, it is required to fill the Source Address field with the IP address of the sender, as well as to fill the Destination Address with the IP address of the receiver. There is also a field to set up some options, if they are required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides a host-to-host service across many different networks with diverse technologies, it is required to manage datagrams so they can travel over all of these networks. There are two choices available to solve this problem [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. This second option is also the one used in the Amazon networks where we ran the tests. It is important to know how the segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to make.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If by chance the packets go over some network with a smaller MTU, it will be required to use fragmentation. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus another 20 bytes for the header. Therefore there will be 376 bytes left (1400 − 512·2), so the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result is shown in Figure 2.4.

It should be noted that the amount of data bytes in each fragment (except the last one) must always be a multiple of 8. During this process, the router will set the M bit in the flags of the first and second datagrams to indicate that there are more packets coming. As regards the offset field, in the first packet it is set to 0, because this datagram carries the first part of the original packet; however, the second datagram will have the offset set to 64, since its first byte of data is the 513th (512/8 = 64).

Figure 2.4: Datagram fragmentation
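To make the arithmetic of this example concrete, the short Python sketch below (our own illustration, not part of the thesis tooling) splits a payload into IP-style fragments and computes the offset of each one in 8-byte units; the values 1400 and 532 reproduce the example above.

def fragment(payload_len, mtu, header_len=20):
    """Split payload_len bytes into fragments that fit an IP MTU.

    Each fragment carries a multiple of 8 data bytes (except the last),
    and the offset field counts 8-byte units, as in the IP header.
    """
    max_data = ((mtu - header_len) // 8) * 8   # largest multiple of 8 that fits
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = (offset + size) < payload_len    # M flag: more fragments follow
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

# Example from the text: 1400 data bytes, MTU of 532 bytes
for off, size, more in fragment(1400, 532):
    print("offset=%d units, %d data bytes, M=%d" % (off, size, int(more)))
# prints: offset=0 / 512 bytes / M=1, offset=64 / 512 bytes / M=1, offset=128 / 376 bytes / M=0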

Ethernet address resolution protocol

Nowadays, Ethernet is the most widely used link-layer network. To develop a mapping between link-layer addresses and IP addresses, it is required to use the Address Resolution Protocol (ARP), so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out if it already has the link-layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, which carries the sender's own IP and link-layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own IP address. The servers with different IP addresses drop the packet, but the receiver we are looking for sends an ARP reply message to the client. This server also updates its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two different types of sublayers. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called a bridge entity, which provides an interface between LANs that can be using the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering them from communication errors.

Figure 2.5: ARP request

Figure 2.6: Ethernet layers in the OSI model

The physical layer enables the communication between the data link layer and the respective physical layer of other systems. In addition, this layer provides significant physical features of Ethernet, such as voltage levels and timing, but the most important functions are related to data encoding and channel access. This layer can code and decode bits between binary and phase-encoded form. Concerning access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

2.1.4 UDP protocol

The User Datagram Protocol (UDP) is defined in the Internet standard RFC 768 [20]. It is used as a transport protocol, so its function is similar to the TCP protocol, but UDP is sometimes preferred since it is faster, lighter, and simpler than TCP. However, it is less reliable: UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore, this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum payload is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together down to the IP network layer, and the computer has to fill the fields of the UDP header in the proper way. The scheme of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server's IP address.

Figure 2.7: UDP protocol header


The UDP header is composed of four fields [15], each of which contains 2 bytes. The Source Port indicates the port from which the packet was sent, and by default it is the port to which the reply should be addressed if there is no change. The Destination Port is the port at the destination to which the packet will be sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission. Each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and setting an order for the packet transmission. Since the UDP port field is 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of these ports is used for one particular application.

2.1.5 TCP protocol

The Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer and used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol, a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams in order when they come from the IP protocol. In addition, this protocol allows the data to be split into segments of different lengths before forwarding them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line, multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in Figure 2.8.

The Source Port field identifies the sender port, just as the Destination Port does the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length will always be a multiple of 32 bits. The next field (called Reserved in the picture) has no use for now and is set to zero. The flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender so that the receiver delivers the data to the application immediately instead of buffering it. Finally, the RST (reset) flag is set to restart the connection.

Figure 2.8: TCP protocol header

Another important field is the window size; through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the header and the data. The next field in the TCP header is called the Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on what kind of options are available. Finally, there is a space between the options and the data called Padding. It is set with zeros, and the goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way to set up a TCP connection is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, and with its own initial sequence number. Both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending the packets still in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client, which acknowledges it, ending the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called the 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the receiver by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of the packet is received (ignoring retransmissions) [22]. This time is measured with several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula to estimate the RTT within a network is shown in Equation 2.1.

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and their direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
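As an illustration of Equation 2.1 (our own sketch, not part of the thesis code), the following Python fragment updates the estimated RTT with an exponentially weighted moving average; the value α = 0.85 and the sample values are assumptions within the range recommended above.

def update_rtt(estimated_rtt, sample_rtt, alpha=0.85):
    """EWMA RTT estimator from Equation 2.1 (alpha between 0.8 and 0.9 for TCP)."""
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

estimated = 0.004                        # initial estimate in seconds (assumed)
for sample in (0.0031, 0.0046, 0.0039):  # sample RTTs measured on the network
    estimated = update_rtt(estimated, sample)
    print("EstimatedRTT = %.4f s" % estimated)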

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets one right after the other, with a certain spacing between them. However, problems with network congestion, queues, or configuration errors cause this spacing between packets to vary. The effect of jitter is illustrated in Figure 2.12.

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly over time; therefore, it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short time in order to reorder them if necessary and to restore an even spacing between packets. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, new packets that arrive will be dropped and will never reach their destination.

Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be expressed in the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that are transmitted within the network during one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe, and the latency the length of this pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms, we obtain the number of bits that can be in transit in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is needed, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM, and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + TransferSize / Bandwidth    (2.7)

where RTT is the round trip time.
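As a small worked example of Equations 2.6 and 2.7 (our own illustration with assumed numbers, reusing the 50 ms / 45 Mbps channel from the bandwidth example), the effective throughput of a 1 MB transfer is noticeably lower than the raw bandwidth, because the RTT is paid in addition to the transmission time:

rtt = 0.050                 # 50 ms round trip time (assumed example values)
bandwidth = 45e6            # 45 Mbps
transfer_size = 8e6         # 1 MB expressed in bits

transfer_time = rtt + transfer_size / bandwidth          # Equation 2.7
throughput = transfer_size / transfer_time               # Equation 2.6
print("TransferTime = %.3f s" % transfer_time)           # about 0.228 s
print("Throughput   = %.1f Mbps" % (throughput / 1e6))   # about 35.1 Mbps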

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in theory. However, due to inefficiencies of implementation or to errors, a pair of nodes connected in a network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.

2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for carrying out man-in-the-middle interception of SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit. The tool terminates the SSL/TLS connection and launches a new SSL/TLS connection to the original receiver address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP, and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems, and remove errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program, users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are travelling over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file to be used in a future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump also provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.
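For illustration, a capture of the kind used on the proxy instance could be started with a command like the following (the interface name, port number, and output file name are assumptions for this example); it records only the TCP traffic on port 50007 and writes the raw packets to a pcap file that can later be opened in Wireshark:

tcpdump -i eth0 -w capture.pcap tcp port 50007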

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives this request stores a temporary copy of the resource. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to find the URL again. We can see this process in the figure below, where the proxy asks for each web site only once. An example of how a proxy works and handles incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way, proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent attackers from learning internal addresses, since proxies can block the access between two networks; proxies can thus take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services, offered mainly by Amazon Web Services (AWS). To use Boto, it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage, and so on.
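A minimal sketch of how Boto is typically used for this purpose is shown below (our own example, not the thesis scripts; the credentials, AMI ID, key pair name, and instance type are placeholders). It connects to the EC2 service in the eu-west-1 region and launches one instance:

import time
import boto.ec2

# Connect to the EC2 service (credentials can also come from the boto config file)
conn = boto.ec2.connect_to_region('eu-west-1',
                                  aws_access_key_id='ACCESS_KEY',
                                  aws_secret_access_key='SECRET_KEY')

# Launch one instance; the AMI ID, key pair and instance type are placeholders
reservation = conn.run_instances('ami-xxxxxxxx',
                                 instance_type='t1.micro',
                                 key_name='my-keypair')
instance = reservation.instances[0]

# Wait until the machine is running, then print its public DNS name
while instance.state != 'running':
    time.sleep(5)
    instance.update()
print(instance.public_dns_name)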

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux, and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis, Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and by employees of many companies and organizations from all parts of the world, in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security, and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good support for networks: client and server systems can be set up easily and quickly on a computer with Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionalities. Linux offers many network applications, so it could be very useful for this thesis.

We have described in this chapter many networking issues which are crucial in the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and we ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and server. This is an easy way to start setting up a connection and to design a methodology with tools for developing larger-scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The language chosen to program is Python, a high-level programming language very suitable for network programming due to its ease of handling in this field.

When it comes to programming the client for this application, it was necessary to set the server Inet address and a port for the exchange of data. It was also necessary to create a socket and connect it to the address and port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to the socket. Finally, we made the socket wait for incoming connections from the client and accept the connection.
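A minimal sketch of such a pair of programs is shown below (our own illustration of the approach described, not the exact scripts of the thesis; the hostname is a placeholder, and the port 50007 is the one seen in the captures). The server binds a TCP socket, accepts a connection, and echoes back whatever it receives, while the client connects and sends a short message:

import socket

HOST, PORT = 'ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com', 50007  # placeholders

# --- server side ---
def run_server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(('', PORT))          # bind to all interfaces on the chosen port
    srv.listen(1)                 # wait for an incoming connection
    conn, addr = srv.accept()
    data = conn.recv(1024)
    conn.sendall(data)            # echo the data back to the client
    conn.close()
    srv.close()

# --- client side ---
def run_client():
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((HOST, PORT))     # the three-way handshake happens here
    cli.sendall(b'hello server')
    reply = cli.recv(1024)
    cli.close()
    return reply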

The packets required to establish a client-server connection are shown in Listing 3.1.

Listing 3.1: Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance, and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of communications and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. And secondly, to run scripts in the instances it was necessary to access them and install the required libraries used by those scripts. Moreover, some programs such as tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script to create the scenario and run simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests would be done automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy, and server. This script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources, and proxy. Both server and data source were also programmed with Python, due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending packets with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1. This indicates the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the hostname of the server. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching for the server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"   "DNS"   "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"   "10.235.11.67"  "DNS"   "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"   "DNS"   "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"   "10.235.11.67"  "DNS"   "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the proxy and the server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" response gets to the data source; therefore the connection is now ready to start sending data. In these simulations it was decided to send data from time to time, with random time periods in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
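As an illustration of what the data source does in this scenario (a sketch under our own assumptions, not the actual script; the proxy port, the payload content, and the receive buffer size are placeholders, while the server name, burst size, repetitions, and waiting times follow the values described in this chapter), it first opens a TCP connection to the Squid proxy, issues a CONNECT request for the server, and then sends data bursts separated by random waiting times:

import random
import socket
import time

PROXY = ('10.235.11.67', 3128)   # proxy address; 3128 is an assumed Squid port
SERVER = 'ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007'

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(PROXY)

# Ask the proxy to open a tunnel to the server (CONNECT method)
sock.sendall(('CONNECT %s HTTP/1.1\r\n\r\n' % SERVER).encode())
reply = sock.recv(4096)          # expect "HTTP/1.0 200 Connection established"

burst = b'x' * 1980              # same burst size as in the light-load tests
for _ in range(200):             # up to 200 repetitions, as in the simulations
    sock.sendall(burst)
    echo = sock.recv(4096)       # server echoes the data back through the proxy
    time.sleep(random.choice((1, 2)))   # random waiting time of 1 or 2 seconds
sock.close()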

The eight packets which compose the exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data between data source, proxy, and server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way around, sending the data from the server back to the data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources, and amounts of data. Each data source was set in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending bursts of 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980-byte data bursts, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well, and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is higher.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do so, we created a similar environment, but in this case with a variable number of data sources. This whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance. Therefore the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5: Structure for the simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much larger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the frequency at which packets are being sent is high with ten data sources.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances; however, in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation, the average RTT was calculated to verify to what extent different amounts of data, numbers of clients, and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2, we do not see large differences. Moreover, the lower in the table, the shorter the time should be; however, this does not apply in every case, therefore the type of instance is not very significant in these cases. The simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions, and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data burst, and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
                       5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
                       5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          1           0           6.5
                       5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
                       5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
                       5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little in the different tests, most probably due to the fact that the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important to improve the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results, we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (machine-to-machine) towards the same server. We needed to find a method describing how to perform a proper extraction so as to generate the traffic again. To do so, we looked into several publications that explained how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]; this requires packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is significant to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a script in Python made especially to obtain the needed features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length, and data sent.

To recreate the traffic, the script had to extract the data of each packet one at a time in order to resend them when replaying traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the same way the data source sent packets in the simulations; therefore, this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent; therefore, it was easier to compare graphs and the traffic recreation was highly precise.

In the original simulations where the proxy was set up, a few protocols were needed to establish the communication over this proxy. These were a couple of HTTP and a few DNS segments, which were meaningless for recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
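A minimal sketch of this extraction step is shown below (our own illustration of the use of dpkt, not the exact Extractpattern.py; the file name, port number, and grouping criterion are assumptions). It walks through a pcap file, keeps only the TCP segments that carry payload towards the server port, and records their timestamps and data:

import dpkt

def extract_bursts(pcap_path, server_port=50007):
    """Collect (timestamp, payload) pairs of data segments sent towards the server."""
    bursts = []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP):
                continue                       # skip ARP and other non-IP frames
            tcp = ip.data
            if not isinstance(tcp, dpkt.tcp.TCP):
                continue                       # skip DNS/UDP and other protocols
            if tcp.dport == server_port and len(tcp.data) > 0:
                bursts.append((ts, tcp.data))  # timestamp and payload of the segment
    return bursts

bursts = extract_bursts('simulation.pcap')
print('%d data segments collected' % len(bursts))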

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saved in a file the information gathered from each data burst, as well as its timestamp; Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, this program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resent the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from where the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets in an accurate manner. It should be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
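A simplified version of the replay logic could look as follows (our own sketch, not the actual Replaytraffic.py; the server address, the receive buffer size, and the burst format are assumptions). It reads the saved bursts and their timestamps and resends each burst over a TCP socket, sleeping between bursts so that the original spacing is preserved:

import socket
import time

def replay(bursts, server_addr):
    """Resend (timestamp, payload) bursts to the server, keeping the original timing."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server_addr)
    start = time.time()
    first_ts = bursts[0][0]
    for ts, payload in bursts:
        # Wait until the same relative instant as in the recorded capture
        delay = (ts - first_ts) - (time.time() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(payload)
        sock.recv(4096)            # read the echo coming back from the server
    sock.close()

# Example usage with the bursts collected by the extraction script:
# replay(bursts, ('ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com', 50007))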

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them to draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from data source to proxy. These sniffed data were replayed twice M2M with the second script, so that in the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy allows receiving the same data from the server as well; therefore, the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The diagram on the left shows the network traffic in the simulations, with data source, proxy, and server, while the diagram on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore, this method to recreate the traffic is very accurate regardless of the amount of time the simulation lasts. Only at a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration; therefore, this approach to extract the traffic pattern is very accurate.

In this chapter we explained the method to obtain a pattern from the traffic generated

in the simulations Moreover the high precision of this approach was proven in the Figure

42 comparing the graph obtained in the simulation with another achieve recreating the

pattern obtained Therefore we can move on to the last part of the thesis where we

multiply this pattern to extract important results related with the server

CHAPTER 5

Multiplying Traffic Pattern

In this chapter, some features of the TaaS system developed are explained. Then, notable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved, we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
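As an illustration of how such an infrastructure can be driven programmatically, the following is a minimal sketch of launching the server and a group of data-source instances with the boto library [28]; the AMI identifier, key pair and security group names are hypothetical placeholders rather than the ones used in this work.

import time
import boto.ec2

# Connect to the region where the TaaS scenario is deployed
conn = boto.ec2.connect_to_region('eu-west-1')

# Launch the server on the chosen instance type (e.g. m1.large or c1.xlarge)
server = conn.run_instances('ami-12345678', instance_type='m1.large',
                            key_name='taas-key',
                            security_groups=['taas-group']).instances[0]

# Launch N data sources that will later multiply the traffic pattern
clients = conn.run_instances('ami-12345678', instance_type='t1.micro',
                             min_count=10, max_count=10,
                             key_name='taas-key',
                             security_groups=['taas-group']).instances

# Wait until the server is running and print its public name
while server.update() != 'running':
    time.sleep(5)
print(server.public_dns_name)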

This TaaS infrastructure was very flexible and highly customizable. In the simulations, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of data bursts, and the number of repetitions. It was therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically, as a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First, we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests. A sketch of this workflow is shown below.
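The following is a minimal sketch of how this sequence of steps could be driven from a small Python wrapper. The script names are those mentioned above, while the command-line options shown are hypothetical examples and not the exact interface of the real scripts.

import subprocess

def run(cmd):
    """Run one step of the TaaS workflow and stop if it fails."""
    print('>>>', ' '.join(cmd))
    subprocess.check_call(cmd)

# Part 1: simulate client-proxy-server traffic and record it to a pcap file
run(['python', 'Simulation.py'])   # uses the parameters configured in Client.py

# Part 2: replay the extracted pattern M2M against a freshly started server
run(['python', 'Servertoreplay.py', '--instance-type', 'c1.xlarge'])
run(['python', 'Replaytraffic.py', '--pcap', 'simulation.pcap', '--clients', '20'])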

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server. A sketch of such a data source is shown below.
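The following is a minimal sketch of a data source with these characteristics, using a plain TCP socket as in the simulations (the proxy step is omitted for brevity). The server address is a placeholder; the real Client.py exposes these values as configurable parameters.

import random
import socket
import time

SERVER = ('ec2-server-hostname', 50007)   # placeholder address; port as in the simulations
BURST = b'x' * 3960                        # 3960 bytes per data burst
REPETITIONS = 400                          # up to 400 bursts per session

sock = socket.create_connection(SERVER)
for _ in range(REPETITIONS):
    sock.sendall(BURST)                    # send one burst
    sock.recv(4096)                        # wait for the server reply
    time.sleep(random.uniform(1, 3))       # random 1-3 s gap between bursts
sock.close()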

Once we had a capture, we were ready to multiply the traffic. First, we performed several tests, scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We can then compare the different results with the original simulation to extract interesting conclusions. The number of data sources was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources, respectively. We can see that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at certain moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph has about double the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is also twice as large. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, the 80-client recreation shows problems sending data very early: after 150 seconds the graph rises very little and slowly. In this test with 80 clients, a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. The communication therefore seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. In this case the limit is reached when the number of clients gets to about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher-quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients, and it has similar peaks and number of bytes sent to those in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is quite a bit higher than in the other tests. This graph also appears to be smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the larger RTT average and the smoothness of the graph obtained. The rest of the tests show a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases, the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients, not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is a notable change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably; with many clients, the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was divided into several main steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we therefore developed, in the Amazon Cloud, the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained in order to estimate the reliability of the TaaS system developed.

The results of the simulations show that the number of bytes increased correctly as the number of clients grew. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicate ACKs, showed more varied values. These results showed a performance improvement in the network when using high-quality instances, and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could assess the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time; the similarity between both graphs was constant. Therefore, with the TaaS system developed we can also recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been shown how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, the results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not as satisfactory; after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, demonstrate the usefulness of this system.

6.2 Future Work

In this thesis, the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using an m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing," 2011.

[2] J. Strickland, "How cloud computing works," http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing," http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi, and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay (OWASP-WS-007)," https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances," http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic," http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite," http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ (Does tcpreplay support sending traffic to a server). Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol -- HTTP/1.1," 1999.

[17] Tutorialspoint, "HTTP - Quick Guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol," 1980.

[21] S. Kollar, "Introduction to IPv6," 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe, and E. Warnicke, "Wireshark User's Guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website," http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi, and G. Szabo, "How to validate traffic generators," in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


of the server when increasing the load. The process to develop performance testing involves the following steps [4]:

1. Identify your testing environment: it is necessary to know the physical environment where the test will be developed, as well as the testing tools required.

2. Identify the performance criteria: this includes limits on response times and other values the simulations must meet to consider the performance good enough to offer a reliable service.

3. Design the performance tests: cover all the different cases that could arise for the service or application.

4. Configure the environment: prepare the environment and tools before starting the simulations.

5. Implement the test design: develop suitable performance tests for the test design.

6. Run the tests: start the simulations and display the test values.

7. Analyze the tests: look into the results to check the performance of the service.

Performance testing helps ensure that cloud services and applications run properly. These are the most recommended steps to develop TaaS [3], and we have taken them into account in this work. This service provides good features such as elasticity, safety, easy handling, a reliable environment, and flexibility when choosing options regarding instances and storage.

1.2 Problem statement

Nowadays, TaaS [3] is very common due to the wide use of internet clouds and the large number of applications provided on them. Therefore, we found it interesting to use this concept to create a new approach for testing a particular scenario. In this thesis we focused on developing a different method to apply TaaS in an M2M framework. In addition, this project could be modified to test different scenarios in further research; for example, it would be possible to add more servers by increasing the number of instances in the scripts used in this project.

The acronym M2M [5] can have different meanings, such as Machine-to-Machine, Machine-to-Man, Machine-to-Mobile, and so on. However, M2M does have a clear goal, which is to allow the exchange of information over a communication network between two end points.

When it comes to testing networks, it is necessary to do it with the traffic expected to go through that network whenever it is used. There are two different ways to do so [6]. The first one is simulating the type of traffic that is supposed to go over the network. TaaS systems use this option, allowing TaaS users to configure the test simulations according to their needs. To configure these simulations, some tools are used in the cloud, such as Selenium and JMeter [3].

The second way to test networks is to replay recorded network traffic. The purpose of this thesis is to apply this second way to create a TaaS system to test networks in the cloud. In order to replay recorded traffic, we followed a method based on a replay attack [7], which is explained in the next section.

In this way, we created a TaaS system that can estimate network performance using a different method than existing systems. First, we must also configure the simulations to test the server. However, the main difference in our method is that we then extract the traffic pattern from those simulations in order to multiply it from a black box, so that we can stress the server. This is an interesting method because, since we precisely recreate a real exchange of traffic, the results are very realistic and accurate. Finally, we had to prove the feasibility of the method applied.

The TaaS system was developed for testing in the Amazon Cloud [8], which allowed us to set up the whole scenario easily and use different types of instances. These instances differ in features such as memory, storage, network performance, and so on [9]. Therefore, it was interesting to compare the results when we picked out one sort of instance or another.

1.3 Method

The method followed during this thesis is divided into three steps. First, we set up a proxy between client and server to extract the traffic. Then we had to figure out a way to recreate this traffic, and finally replay it M2M to test the server.

In the following paragraphs the method is described in detail. The first step consisted of setting up a client-proxy-server scenario in the cloud. Then we could run simulations to look into the behaviour of the packets going over the network. Afterwards, we could check how the network performance differed when we changed some factors in this network (number of clients, type of instance, etc.). The packets were sniffed at the proxy with the tool tshark [10]; a capture sketch is shown below. Once we had some knowledge about the simulated network, we could start developing a method to extract a traffic pattern from those simulations. We must take into account that the script programmed must obtain the traffic pattern properly, so that when it comes to recreating the same traffic M2M, the behaviour of the packets is as similar as possible to the original simulation. To achieve this goal, we had to extract the data sent and the timestamps of the packets with high precision.
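As an illustration, a capture at the proxy could be started from Python roughly as follows. This is a minimal sketch using common tshark options [10]; the interface name, capture duration and output file name are hypothetical examples.

import subprocess

# Record traffic at the proxy for later pattern extraction.
# -i: capture interface, -a duration: stop after N seconds, -w: output pcap file
subprocess.check_call([
    'tshark',
    '-i', 'eth0',
    '-a', 'duration:900',      # roughly the fifteen-minute session used later
    '-w', 'simulation.pcap',
])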

Once the pattern from the simulations is extracted, we move on to the third and last step, where we multiply the traffic pattern by scaling up the number of clients. In this way, large traffic load recreations were carried out to test the server limits and find out how the server could handle heavy traffic loads. These data sources sent the extracted pattern directly to the server in an M2M framework. Finally, when we obtained the final results, we could determine the server performance and the feasibility of the approach developed.

The method carried out is a kind of replay attack [7], where there is a "man-in-the-middle" (Wireshark sniffing at the proxy) which intercepts the traffic. This traffic is then replayed, pretending to be the original sender, in order to create problems for the host server. In our thesis, this traffic is scaled up by a multiplier to stress the software and find out its limits.

Regarding the traffic pattern extraction, several tools were studied and two methods to extract and replay traffic were considered. The first method consisted of modifying the pcap files previously recorded at the proxy, so that the packets could be sent straight to the server from the data sources. Tcprewrite [11] was the tool used to modify the pcap file. The next step was to use another tool to recreate the traffic contained in the new pcap. One of them was Scapy [12], which sent the packets but not with the appropriate timing. Another tool used was Tcpreplay [13], but it was not possible to simulate the server, since this tool does not work at the transport level; therefore, we could not establish a valid TCP connection [14]. Finally, we used the second method, which is based on programming our own script to extract the traffic pattern and recreate it later. This approach was much trickier but much more suitable, as well as completely automatic. With this method it was not necessary to do anything by hand (like modifying the pcap file from the console); we just needed to type a few options, such as the name of the file to replay or the server instance type. After programming the script to extract the pattern, we needed to replay it somehow. We tried again with Scapy [12], a very good tool when it comes to creating and configuring packets. However, there were problems receiving segments coming from the server, since it was necessary to work with sequence and acknowledgement numbers. This was extremely hard, and finally sockets were used again to replay the pattern.

A diagram of the whole system developed in this thesis is shown in Figure 1.1. The top of Figure 1.1 (network traffic simulation) refers to the first part of the thesis, where we had to set up a client-proxy-server communication in the cloud. With this scenario we could run the simulations to exchange packets. Below it we can see the next steps: the traffic was recorded at the proxy for further analysis and pattern extraction. Finally, with the pattern obtained, we came to the last part of the thesis, shown at the bottom of Figure 1.1 (traffic pattern recreation). In this part we set up a multiplier composed of many data sources, which recreated the traffic pattern towards the same server. In this way, we could find out the server performance when handling heavy traffic loads.

1.4 Delimitations

During this thesis we have made some delimitations. The project could be improved to cover most of these aspects in further research.

The TaaS system developed can only function in the Amazon Cloud, because the main library used to program this system works only for this particular cloud. The goal of this project is testing M2M communications; we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding out server performance, therefore the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system programmed uses TCP sockets to test servers; for instance, we cannot use HTTP requests for testing.

1.5 Outline

The thesis is organized as follows. The introduction is in Chapter 1. Chapter 2 describes related work. Chapter 3 describes the simulations and the analysis of the data source, proxy and server scenario. The traffic pattern extraction is described in Chapter 4. The design of TaaS [3] for M2M communications and the results achieved with this system are presented in Chapter 5. A summary of the whole thesis, the main results obtained and future work are in Chapter 6. Finally, Chapter 7 includes the appendices.

Figure 1.1: Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this section the main protocols are explained, since they are essential to analyzing a client-server communication. Moreover, some significant metrics needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

2.1 Communication protocols

In order to develop this thesis, it was crucial to describe the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects that occupy the different OSI model layers in order to establish communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface offered to the other objects on the same machine that want to use the services of this protocol. The other interface is called the peer interface and is used to communicate with its counterpart on another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole in which they all take place. To prevent a system from becoming too complex, levels of abstraction are added. In network systems this is also applied, creating layers, each with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it is not necessary to modify every part, but only the one where the service is introduced. In networks, the architecture chosen is named the OSI model [15]. Networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented, from top to bottom, in Figure 2.1.

Figure 2.1: OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star, and so on). This layer also determines the voltage and frequency that signals will use. The data link layer transmits the data from the upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data in the network. To ensure packets reach their destination properly, this layer can check errors in the transmission, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (the segmentation process). The most significant protocols are TCP and UDP, about which we will talk later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

2.1.1 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative, hypermedia information systems. This network protocol is used for communication among users, proxies or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is the transmission of resources. A resource is a network data object that can be identified by a URI; normally these resources are either files or the outputs of a script. Each HTTP message has the general form shown in Listing 2.1 [15].

Listing 2.1: HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line shows whether this is a request or a response message. The next lines provide parameters and options for the message; there are different kinds of header lines in HTTP and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol, and the port used by default is 80, but it is possible to use other ports. Two important HTTP request methods are GET and POST [16], which are used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis, named CONNECT [16]. This method is used to send data through a proxy that can act as a tunnel. Therefore, we needed this method to establish a connection between client and server through the proxy.

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request has been served, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request; therefore, they cannot keep information about previous requests. Any kind of data can be transmitted by HTTP, as long as client and server know how to manage the data. A typical example of an HTTP request is shown in Figure 2.2.

Figure 2.2: HTTP request

To set up a communication with this protocol, a client must open a connection by sending a request message to the server, which returns a response message. Afterwards, the server will close the connection. First of all, we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource; this part is called the URI. The last part is the version of HTTP that is being used. This can be clearly seen in Listing 2.2, an example extracted from the simulations made during the thesis.

Listing 2.2: HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1
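To illustrate how such a CONNECT request can be issued programmatically, the following is a minimal sketch in Python. The proxy address is a hypothetical placeholder, while the target host and port follow the example in Listing 2.2.

import socket

PROXY = ('proxy.example.com', 3128)   # placeholder proxy address (e.g. a Squid proxy)
TARGET = 'ec2-54-217-136-250.eu-west-1.compute.amazonaws.com'
PORT = 50007

sock = socket.create_connection(PROXY)
# Ask the proxy to open a TCP tunnel to the target server
request = 'CONNECT {0}:{1} HTTP/1.1\r\nHost: {0}:{1}\r\n\r\n'.format(TARGET, PORT)
sock.sendall(request.encode('ascii'))

reply = sock.recv(4096).decode('ascii', 'replace')
if ' 200 ' in reply.split('\r\n')[0]:
    # The tunnel is established: from here on, raw application data
    # can be exchanged with the server through the proxy.
    sock.sendall(b'hello server')
sock.close()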

The initial line of the response from the server is also divided into three parts. The first part contains the version of HTTP used for the communication. Afterwards there is a code [15] for the computer to understand the result of the request; the first digit indicates the class of the response. We have the codes shown in Listing 2.3.

Listing 2.3: HTTP request result

1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or phrase in English to describe the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header involves information about either the request, the response, or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4: HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files to be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about the body. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which says how many bytes are used in the body.

2.1.2 IP

The Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to understand this layer in order to make sure the information goes to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides unreliable datagram delivery; it is therefore not guaranteed that datagrams reach their destinations. In addition, this service model may cause further problems, since packets can be delivered out of order, or reach the destination more than once.

IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3: Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The Total Length field indicates the length in bytes (unlike the Header Length, where the length is counted in words) of the whole datagram. When it comes to the Identification field, the sender marks each IP datagram with an ID number before transmission. The goal is to have unique datagrams, so that if several fragments arrive at the destination, all carrying the same ID value, the destination host can put the received fragments together. If some fragment does not arrive, all the fragments with the same number will be discarded. In the next field there are up to three flags. The first flag has no use for now and is set to 0. The flag D (Don't Fragment), when set to 1, indicates that the datagram may not be fragmented into smaller pieces. The flag M indicates whether the datagram received is the last one of the stream (set to 0) or whether there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the datagrams within the stream in which they have been sent, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may be on the network before being discarded. The main goal of this field is to discard datagrams that are within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet, it is required to fill the Source Address field with the IP address of the sender, as well as to fill the Destination Address with the IP address of the receiver. There is also a field to set some options if they are required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides a host-to-host service across many different networks with diverse technologies, it is required to manage datagrams so they can travel over all of them. There are two choices available to solve this problem [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use a technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. This second option is the one used in the Amazon networks where we ran the tests. It is important to know how the segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to carry out.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally, the host chooses the MTU size when sending IP datagrams. If the packets happen to traverse a network with a smaller MTU, fragmentation will be required. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will each contain 512 bytes of data plus another 20 bytes for the header. Therefore there will be 376 bytes left (1400 − 2 × 512), so the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look like Figure 2.4, and a small sketch of this arithmetic is given after the figure caption.

Figure 2.4: Datagram fragmentation
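As a sanity check of the arithmetic above, the following small sketch computes the fragment data sizes for the example. It is only an illustration of the calculation, not of how a real IP stack fragments packets; as noted, the data size of every fragment except the last must be a multiple of 8 bytes.

IP_HEADER = 20

def fragments(total_len, mtu):
    """Split a datagram of total_len bytes (header included) for a given MTU.
    Each fragment's data size, except possibly the last, is the largest
    multiple of 8 bytes that fits in the MTU after the 20-byte header."""
    data_left = total_len - IP_HEADER
    max_data = ((mtu - IP_HEADER) // 8) * 8      # 512 for an MTU of 532
    sizes = []
    while data_left > 0:
        chunk = min(max_data, data_left)
        sizes.append(chunk)
        data_left -= chunk
    return sizes

print(fragments(1420, 532))    # -> [512, 512, 376], as in the example above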

Ethernet address resolution protocol

Nowadays, Ethernet is the most widely used link-layer network. To develop a mapping between link-layer addresses and IP addresses, it is necessary to use the Address Resolution Protocol (ARP), so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out whether it already has the link-layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, which carries the sender's own IP and link-layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own IP address. The hosts with different IP addresses drop the packet, but the receiver we are looking for sends an ARP reply message to the client; this server also updates its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layers in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two different types of sublayer. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called the bridge entity, which provides an interface between LANs that can be using the same protocol (for instance Ethernet to Ethernet) or different ones.

Figure 2.5: ARP request

Figure 2.6: Ethernet layers in OSI model

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering from communication errors.

The physical layer enables communication between the data link layer and the corresponding physical layer of other systems. In addition, this layer provides significant physical features of Ethernet, such as voltage levels and timing, but the most important functions are related to data encoding and channel access. This layer can encode and decode bits between binary and phase-encoded form. Regarding access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

2.1.4 UDP protocol

The User Datagram Protocol (UDP) is defined in the internet standard RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable. UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of datagrams. Therefore, this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum number of data bytes is 65527 for IPv6 [21]. When a UDP datagram is sent, the data and the header go together to the IP network layer, and the computer has to fill the fields of the UDP header in the proper way. The layout of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 2.7: UDP protocol header

The UDP header is composed of four fields [15], each containing 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port to which the reply should be addressed if there is no change. The Destination Port is the destination port to which the packet will be sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm to calculate its own Checksum. Both Checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of these ports is used for one particular application.

2.1.5 TCP protocol

The Transmission Control Protocol (TCP) is a transport-layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol, a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to put in order the datagrams coming from the IP protocol. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. With TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The layout of this header is shown in Figure 2.8.

Figure 2.8: TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port does for the receiver port. The Sequence Number and Acknowledgement Number fields are explained in more depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length will always be a multiple of 32 bits. The next field (called reserved in the picture) has no use for now and is set to zero. The flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PUSH flag is set by the sender to ask the receiver to pass the data to the application immediately. Finally, the RESET flag is set to restart the connection.

Another important field is the window size; through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the segment. The next field in the TCP header is called the Urgent Pointer, and its function is to inform where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on which options are present. Finally, there is a space between the options and the data called Padding; it is filled with zeros and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way to set up a TCP connection is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, and with its own initial sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement with the FIN flag set to 1 and keeps on sending the packets in progress. Afterwards, the server informs its application that a FIN segment was received and, when that application closes, sends its own packet with the FIN flag to end the communication.

Reliable delivery

TCP provides ordered and reliable delivery, which is achieved through a method called the 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

221 RTT

Round trip time (RTT) is the time interval from when a packet is sent until an acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula to estimate the RTT within a network is shown in equation 2.1:

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation to the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
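As a small illustration of equation 2.1 (not taken from the thesis scripts), the Python snippet below updates the estimated RTT with α = 0.85, a value inside the 0.8-0.9 range recommended above; the RTT samples are invented.

# Exponentially weighted moving average of the RTT (equation 2.1).
alpha = 0.85                             # smoothing factor, 0 < alpha < 1
samples_ms = [3.1, 2.9, 4.2, 3.0, 3.5]   # example SampleRTT values in milliseconds

estimated_rtt = samples_ms[0]            # initialise with the first sample
for sample in samples_ms[1:]:
    estimated_rtt = alpha * estimated_rtt + (1 - alpha) * sample
    print("SampleRTT %.1f ms -> EstimatedRTT %.2f ms" % (sample, estimated_rtt))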

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender will transmit many packets one after the other with a certain spacing between them. However, problems such as network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter on the packet stream can be seen in Figure 2.12.

Jitter is a serious problem, since these fluctuations happen randomly and change very quickly over time. Therefore it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short period of time in order to reorder them if necessary and to restore the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full the new packets that arrive will be dropped and will never reach their destination.

Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are combined in the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)
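A minimal Python sketch of equations 2.2-2.4 follows; the link parameters are example numbers, not measurements from this thesis.

# Latency as the sum of propagation, transmit and queueing delay (eq. 2.2-2.4).
SPEED_OF_LIGHT = 3.0e8     # m/s; propagation in copper or fibre is somewhat slower

def latency(distance_m, size_bits, bandwidth_bps, queue_s=0.0):
    propagation = distance_m / SPEED_OF_LIGHT      # eq. 2.3
    transmit = size_bits / bandwidth_bps           # eq. 2.4
    return propagation + transmit + queue_s        # eq. 2.2

# 1000 km link, 12000-bit packet, 45 Mbps, no queueing delay assumed
print("%.6f s" % latency(1e6, 12e3, 45e6))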

2.2.4 Bandwidth

This concept describes the number of bits that are transmitted within the network during one second [15]. There is an important relationship between bandwidth and latency worth discussing. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that the pipe can contain at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is required, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in theory. However, due to implementation inefficiencies or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.
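To make equations 2.6 and 2.7 concrete, here is a small sketch with invented numbers (a 1 MB transfer, 50 ms RTT, 10 Mbps link); it is an illustration, not code from the thesis.

# Throughput from transfer size, RTT and bandwidth (equations 2.6 and 2.7).
def throughput(transfer_bits, rtt_s, bandwidth_bps):
    transfer_time = rtt_s + transfer_bits / bandwidth_bps   # eq. 2.7
    return transfer_bits / transfer_time                    # eq. 2.6

bits = 8.0 * 1024 * 1024                                    # a 1 MB transfer
print("%.2f Mbps" % (throughput(bits, 0.050, 10e6) / 1e6))  # roughly 9.4 Mbps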

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.

2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections. Connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection to the original receiver address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems and remove errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With it users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze the packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file for later analysis; these tcpdump files can also be opened with software like Wireshark. Finally, tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives this request stores a temporary copy of the page. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again; in this way the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent hackers from obtaining internal addresses, since proxies can block the access between two networks. Proxies can also take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, database, deployment, application services, monitoring, storage and so on.
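As an illustration of the Boto interface described above, a minimal sketch using the version 2 EC2 API follows; the region, AMI id and key name are placeholders, not the values used in the thesis scripts.

# Minimal boto (version 2) example: connect to EC2 and launch one instance.
import time
import boto.ec2

conn = boto.ec2.connect_to_region(
    "eu-west-1",
    aws_access_key_id="ACCESS_KEY",        # or read from the boto config file
    aws_secret_access_key="SECRET_KEY")

reservation = conn.run_instances(
    "ami-00000000",                        # placeholder AMI id
    key_name="my-key",
    instance_type="t1.micro")

instance = reservation.instances[0]
while instance.state != "running":         # wait until the machine is up
    time.sleep(5)
    instance.update()
print(instance.public_dns_name)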

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all over the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionalities. Linux offers many network applications, so it could be very useful for this thesis.

We have described in this chapter many issues about networks which are crucial for the next sections. It is important to have a deep knowledge of this matter because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example, and we ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and a server. This is an easy way to start setting up a connection and to design a methodology with tools for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The tool chosen for programming is Python, a high level programming language that is highly recommendable for network programming due to its ease of use in this field.

When it came to programming the client for this application, it was necessary to set the server Inet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. To program the server it is required to set the hostname and the same port opened by the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we made the socket wait for incoming packets from the client and accept the connection.
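A minimal sketch of such a client-server pair with Python sockets is shown below; the port number and the echo behaviour are illustrative, not the exact code of the thesis application.

# Minimal TCP echo server and client using Python sockets.
import socket

PORT = 50007

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", PORT))      # bind hostname and port to the socket
    srv.listen(1)                    # wait for an incoming connection
    conn, addr = srv.accept()        # accept the connection from the client
    data = conn.recv(1024)
    conn.sendall(data)               # send the received data back
    conn.close()
    srv.close()

def client(server_address):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((server_address, PORT))   # triggers the three-way handshake
    sock.sendall(b"hello")
    print(sock.recv(1024))
    sock.close()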

The packets required to establish a client-server connection are shown in Listing 3.1.

Listing 3.1: Establishing a connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0

This exchange of packets is called the three-way handshake. Analyzing these segments we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, and with a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer of the server has the SYN and ACK flags activated, with its own sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminating the connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behaviour of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy [30] in between; the structure is shown in Figure 3.2. After setting up this connection we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of communications and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in and install the libraries required by those scripts. Moreover, programs such as tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests would be done automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1. This is done only once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source. This segment is the response from the proxy, with the SYN and ACK flags set to 1, indicating that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is called the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735

When this connection is established, the data source sends an HTTP packet to the proxy indicating the DNS name of the destination server. Then the proxy looks up the IP address of that server by sending DNS packets. We can see this in Listing 3.4.

Listing 3.4: Searching for the server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com
"7"  "0.002600"  "172.16.0.23"  "10.235.11.67"  "DNS"  "166"  Standard query response 0xb33a
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com
"9"  "0.003708"  "172.16.0.23"  "10.235.11.67"  "DNS"  "124"  Standard query response 0xa3f9 A 10.224.83.21

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381

Then an HTTP/1.0 200 OK (connection established) response gets to the data source, and the connection is ready to start sending data. In these simulations it was decided to send data from time to time, with random time periods in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
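The behaviour of the data source described here can be sketched as follows; the burst size, number of repetitions and addresses are parameters of the illustration, not the literal contents of the thesis scripts, and the HTTP CONNECT step through the proxy is left out for brevity.

# Sketch of a data source: send a burst, wait a random 1-2 seconds, repeat.
import random
import socket
import time

def run_data_source(dest_addr, dest_port, burst=b"x" * 1980, repetitions=200):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((dest_addr, dest_port))      # connection established only once
    for _ in range(repetitions):
        sock.sendall(burst)                   # one data burst towards the server
        sock.recv(len(burst))                 # the server echoes the same data back
        time.sleep(random.randint(1, 2))      # random waiting time between bursts
    sock.close()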

The eight packets which compose the exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852

In Listing 3.6 the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from the data source to the proxy, which forwards everything to the server, and then all the way around, sending the data from the server back to the data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on data bursts of 1980 bytes, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well and therefore around three times as many packets are needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do this we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance. Therefore the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for simulation

The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes

Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency of packets being sent is high.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features, explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, so the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower in the table, the shorter the times should be; however, this does not apply in every case, so the type of instance is not very significant here. The simplest instance seems to be enough for these exchanges of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT for 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

The next analysis concerned some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning resent packets.

Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. All the instances give better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    1.5        0           2           2
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          2.5         0           0
c1.medium              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.xlarge              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance.

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    2          6           15          67
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          5.5         1           36
c1.medium              1980 bytes    0          0           0           2
                       5940 bytes    0          7           13.5        50.5
c1.xlarge              1980 bytes    0          0           0           5
                       5940 bytes    0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          1           0           6.5
                       5940 bytes    3          1           7.5         2.5
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.medium              1980 bytes    0          0           0           4.5
                       5940 bytes    0          0           2.5         2.5
c1.xlarge              1980 bytes    0.5        0           0           0.5
                       5940 bytes    0.5        0           0           0

Table 3.5: Number of duplicate ACKs

We have seen how the RTT varies just a little between the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured relating to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, machine-to-machine (M2M) towards the same server. It was necessary to find a method for performing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of being sent packet by packet. This is exactly the same way the data source sent packets in the simulations, so this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few additional protocol exchanges were needed to establish the communication over the proxy. These were a couple of HTTP and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
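A simplified sketch of this extraction step with dpkt is given below; the grouping of packets into bursts and the handling of the proxy-related HTTP and DNS segments in the real Extractpattern.py are more elaborate than what is shown here.

# Collect (timestamp, payload) pairs for TCP packets that carry data.
import dpkt

def extract_data_packets(pcap_path):
    records = []
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue                         # skip ARP and other non-IP frames
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue                         # skip UDP, ICMP, ...
            tcp = ip.data
            if tcp.data and (tcp.flags & dpkt.tcp.TH_PUSH):
                records.append((ts, tcp.data))   # packet timestamp and data sent
    return records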

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing this information, it is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets in an accurate manner. It should be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
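The core of the replay step can be sketched as follows, under the assumption that the pattern is available as a list of (timestamp, payload) pairs; the real Replaytraffic.py reads this information from the file written by Extractpattern.py.

# Resend each recorded burst to the server with the original time spacing.
import socket
import time

def replay(bursts, server_addr, server_port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((server_addr, server_port))
    start = time.time()
    first_ts = bursts[0][0]
    for ts, payload in bursts:
        wait = (ts - first_ts) - (time.time() - start)
        if wait > 0:
            time.sleep(wait)         # keep the recorded inter-burst spacing
        sock.sendall(payload)        # same amount of data as in the capture
        sock.recv(len(payload))      # the server echoes the data back
    sock.close()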

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method. To achieve this we had to filter out the data sent from data source to proxy. These sniffed data were then replayed twice M2M with the second script, so that in the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy also allows us to receive the same data back from the server, so the behaviour of the packets is very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the network traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the strategy mentioned before, M2M. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results, obtained by recreating the pattern on a large scale, are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulations the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
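Purely as an illustration of the kind of configuration this workflow involves, a hypothetical parameter block for Client.py could look like the following; the actual variable names in the thesis scripts may differ.

# Hypothetical configuration values for a simulation run (illustrative only).
DATA_BURST_BYTES = 3960              # amount of data sent per burst
REPETITIONS = 400                    # how many bursts each data source sends
MIN_WAIT_S, MAX_WAIT_S = 1, 3        # random waiting time between bursts (seconds)
SERVER_INSTANCE_TYPE = "c1.xlarge"   # instance type chosen in Servertoreplay.py
NUM_DATA_SOURCES = 80                # clients launched when multiplying the pattern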

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at certain moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of test but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the red graph is the same but with up to 80 sources sending data. The black graph shows about double the amount of bytes of the blue one in Figure 5.1, an expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after about 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, so the graph should go up until approximately 400 seconds; nevertheless, it stops going up after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, beyond which bytes cannot be sent faster. This happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients, and it has similar peaks and number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks that stand out, the average is considerably higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instance tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance used, but especially with the number of clients.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and certain packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicated ACKs, showed more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration as the number of clients was raised.

When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instance.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have confirmed the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes

3.2 RTT with data bursts of 5940 bytes

3.3 Number of TCP retransmissions

3.4 Number of lost packets

3.5 Number of duplicate ACKs

5.1 Percentage of lost packets

List of Figures

1.1 Flow diagram of the developed system

2.1 OSI model

2.2 HTTP request

2.3 Fields of the IP Header

2.4 Datagram fragmentation

2.5 ARP request

2.6 Ethernet layers in OSI model

2.7 UDP protocol header

2.8 TCP protocol header

2.9 Establishing a connection in TCP

2.10 Sliding window method

2.11 Example RTT interval

2.12 Jitter effect

2.13 Relation between Latency and Bandwidth

2.14 Proxy operation

3.1 Structure client server

3.2 Structure client proxy server

3.3 Bytes through the proxy with data bursts of 1980 bytes

3.4 Bytes through the proxy with data bursts of 5940 bytes

3.5 Structure for simulation

3.6 Bytes through the proxy with data bursts of 1980 bytes

3.7 Bytes through the proxy with data bursts of 5940 bytes

3.8 Average RTT with 3 data sources

3.9 Average RTT with 10 data sources

4.1 Structure of traffic replayed M2M

4.2 Comparison between simulation and replayed traffic

5.1 Number of bytes over time in different tests

5.2 Bytes using an m1.large instance for the server

5.3 Bytes using a c1.xlarge instance for the server

5.4 Average RTT extracted from the traffic recreations

REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. Combs et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.



systems use this option, allowing TaaS users to configure the test simulations according to their needs. To configure these simulations, some tools are used in the cloud, such as Selenium and JMeter [3].

The second way to test networks is to replay recorded network traffic. The purpose of this thesis is to apply this second way to create a TaaS system to test networks in the cloud. In order to replay recorded traffic we followed a method based on a replay attack [7], which is explained in the next section.

In this way we created a TaaS system that can estimate network performance using a different method than the systems already built. First, we must also configure the simulations to test the server. However, the main difference in our method is that we then extract the traffic pattern from those simulations in order to multiply it from a black box, so that we can stress the server. This is an interesting method because, since we recreate a real exchange of traffic precisely, the results are very realistic and accurate. Finally, we had to prove the feasibility of the method applied.

The TaaS system developed was made for testing in the Amazon Cloud [8], which allowed us to set up the whole scenario easily and use different types of instances. These instances differ in features such as memory, storage, network performance and so on [9]. Therefore it was interesting to compare results when we picked one sort of instance or another.

13 Method

The method followed during this thesis is divided into three steps. First, we set up a proxy between client and server to extract the traffic. Then we had to figure out a way to recreate this traffic, to finally replay it M2M to test the server.

In the following paragraphs the method is described in detail. The first step consisted of setting up a client-proxy-server scenario in the cloud. Then we could run simulations to look into the behaviour of the packets going over the network. Afterwards we could check how the network performance changed when we varied some factors (number of clients, type of instance, etc.) in this network. The packets were sniffed in the proxy with the tool tshark [10]. Once we had some knowledge about the simulated network, we could start developing a method to extract a traffic pattern from those simulations. We must take into account that the script programmed has to obtain the traffic pattern properly, so that when it comes to recreating the same traffic M2M, the behaviour of the packets is as similar as possible to the original simulation. To achieve this goal we had to extract the data sent and the timestamp of each packet with high precision.
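As an illustration of this step only, the sketch below (not the exact script used in the thesis) shows how a traffic pattern of timestamps and payloads could be pulled out of a recorded pcap file with the dpkt library [36]; the file path and the assumption that the capture uses Ethernet/IP/TCP framing are hypothetical.

    import dpkt

    def extract_pattern(pcap_path):
        # Return a list of (timestamp, payload) pairs for TCP segments that carry data.
        pattern = []
        with open(pcap_path, "rb") as f:
            for ts, buf in dpkt.pcap.Reader(f):
                eth = dpkt.ethernet.Ethernet(buf)
                if not isinstance(eth.data, dpkt.ip.IP):
                    continue              # skip non-IP frames such as ARP
                ip = eth.data
                if not isinstance(ip.data, dpkt.tcp.TCP):
                    continue              # only TCP segments are of interest here
                tcp = ip.data
                if tcp.data:              # keep only segments with a payload
                    pattern.append((ts, tcp.data))
        return pattern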

Once the pattern from the simulations was extracted, we moved on to the third and last step, where we multiplied the traffic pattern, scaling up the number of clients. In this way, large traffic load recreations were carried out to test the server limits and find out how the server could handle heavy traffic loads. These data sources sent the extracted pattern directly to the server in an M2M framework. Finally, when we obtained the final results, we could determine the server performance and the feasibility of the approach developed.

The method carried out is a kind of replay attack [7], where there is a "man-in-the-middle" (Wireshark sniffing in the proxy) which intercepts the traffic. Then this traffic is replayed, pretending to be the original sender, in order to create problems for the host server. In our thesis this traffic is scaled up by a multiplier to stress the software and find out its limits.

Regarding the traffic pattern extraction, several tools were studied and two methods to extract and replay traffic were considered. The first method consisted of modifying the pcap files previously recorded in the proxy so that the packets could be sent straight to the server from the data sources. Tcprewrite [11] was the tool used to modify the pcap file. The next step was to use another tool to recreate the traffic contained in the new pcap. One of them was Scapy [12], which sent the packets, but not at the appropriate times. Another tool used was Tcpreplay [13], but with it it was not possible to simulate the server, since this tool does not work at the transport level; therefore we could not establish a valid TCP connection [14]. Finally we used the second method, which is based on programming our own script to extract the traffic pattern and recreate it later on. The way chosen was much trickier but also much more suitable, as well as completely automatic. With this method it was not necessary to do anything by hand (like modifying the pcap file from the console); we just needed to type a few options, such as the name of the file to replay or the server instance type. After programming the script to extract the pattern, we needed to replay it somehow. We tried again with Scapy [12], a very good tool when it comes to creating and configuring packets. However, there were problems receiving the segments coming from the server, since it was necessary to work with sequence and acknowledgement numbers. Therefore this was extremely hard, and finally sockets were used again to replay the pattern.

A diagram with the whole system carried out to develop the thesis is shown in the

Figure 11 The top of this Figure 11 (network traffic simulation) refers to the first part

of the thesis where we had to set up a client-proxy-server communication in the cloud

With this scenario we could run the simulations to exchange packets Below we can see

the next steps The traffic was recorded in the proxy for a further analysis and pattern

extraction Finally with the pattern obtained we came down to the last part of the

thesis shown on the bottom of the Figure 11 (traffic pattern recreation) In this part

we set up a multiplier composed of many data sources which recreated the traffic pattern

towards the same server In this way we could find out the server performance when it

comes to handle heavy traffic loads

14 Delimitations

During this thesis we have made some delimitations. The project could be improved to cover most of these aspects in further research.

The TaaS system developed can only function in the Amazon Cloud, because the main library used to program this system works only for this particular cloud. Since the goal of this project is testing M2M communications, we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding out server performance; therefore the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system programmed uses TCP sockets to test servers; for instance, we cannot use HTTP requests for testing.

15 Outline

The thesis is organized as follows. The introduction is in Chapter 1, and Chapter 2 describes related work. Chapter 3 describes the simulations and the analysis of the data source-proxy-server scenario. The traffic pattern extraction is described in Chapter 4. The design of TaaS [3] for M2M communications and the results achieved with this system are in Chapter 5. A summary of the whole thesis, the main results obtained and future work are in Chapter 6. Finally, Chapter 7 contains the appendices.


Figure 11 Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols and they are required to establish

connections within the network and transfer data among hosts All of them have their

place in the OSI model [15] where they are classified depending on their function In this

section the main protocols are explained properly since they are essential to analyze a

client-server communication Moreover some significant data needed to measure network

performance in testing are described as well as the tools to carry out simulations and

analyze segments

21 Communication protocols

In order to develop this thesis it was crucial to give a description of the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects used by the different layers of the OSI model in order to establish a communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface, offered to the other objects on the same machine that want to use the service provided by this protocol. The other interface is called the peer interface and is used to communicate with its equivalent on another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole picture in which they all take place. To avoid a system becoming too complex, it is necessary to add levels of abstraction. In network systems this is also done by creating layers, each with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it is not necessary to modify all the parts but only the one where the service is introduced. In networks, the chosen architecture is named the OSI model [15], and networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in the Figure 21.

Figure 21 OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star and so on). This layer also determines the voltage and frequency that signals will use. As for the data link layer, it transmits the data from the upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data in the network. To ensure packets get to the destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (segmentation process). The most significant protocols here are TCP and UDP, about which we will talk later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

211 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative, hypermedia information systems. This network protocol is used for communication among users, proxies or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is linked to the transmission of resources. A resource is a network data object that can be identified by a URI; normally these resources are either files or outputs of a script. Each HTTP message has the general form shown in the List 21 [15].

Listing 21 HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line shows whether this is a response or a request message. The next lines provide parameters and options for the message; there are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol and the port used by default is 80, but it is possible to use other ports. Two important HTTP request methods are called GET and POST [16]; they are used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis, and its name is CONNECT [16]. This method is used to send data through a proxy that can act like a tunnel. Therefore we needed this method to establish a connection between client and server through the proxy.
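As a small illustration (not taken from the thesis scripts), a client can open such a tunnel through a proxy with plain Python sockets; the proxy and server addresses below are placeholders.

    import socket

    PROXY = ("proxy.example.com", 3128)       # hypothetical proxy address and port
    TARGET = "server.example.com:50007"       # hypothetical server endpoint

    sock = socket.create_connection(PROXY)
    # Ask the proxy to open a tunnel to the server with the CONNECT method.
    request = "CONNECT {0} HTTP/1.1\r\nHost: {0}\r\n\r\n".format(TARGET)
    sock.sendall(request.encode())
    reply = sock.recv(4096).decode()
    if "200" in reply.splitlines()[0]:
        # The tunnel is established; raw TCP data can now be exchanged with the server.
        sock.sendall(b"hello")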

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: once the request has been served, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request, and therefore they cannot keep information about previous requests. Any kind of data can be transmitted over HTTP as long as client and server know how to handle it. A typical example of an HTTP request is shown in the Figure 22.

To set up a communication with this protocol, a client must open a connection by sending a request message to the server, which returns a response message. Afterwards the server will close the connection.

Figure 22 HTTP request

First of all, we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource, called the URI. Finally comes the version of HTTP that is being used. This can be clearly seen in the List 22, an example extracted from the simulations made during this thesis.

Listing 22 HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided into three parts. The first part indicates the version of HTTP used for the communication. Afterwards there is a code [15] that allows the computer to understand the result of the request; the first digit indicates the class of the response. The possible codes are shown in the List 23.

Listing 23 HTTP request result

1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or sentence in English describing the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. An entity header carries information about either the request, the response or the information contained in the message body. A general header is used in both the request and the response. A request header is sent by a browser or a client to a server. Finally, the last kind of header is called a response header and is sent by a server in a response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in the List 24.

Listing 24 HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files to be sent to the server. When an HTTP message contains a body, there are usually header lines that provide information about it. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which gives the number of bytes used in the body.

212 IP

Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information goes to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides an unreliable datagram delivery; therefore it is not ensured that the datagrams reach their destinations. In addition, this service model may cause further problems, since packets can be delivered out of order as well as reach the destination more than once.


IP Header

The Figure 23 shows all the fields carried in the IP header.

Figure 23 Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The field Total Length indicates the length in bytes (unlike the Header Length, where the length is counted in words) of the whole datagram. When it comes to the Identification field, the sender always marks each IP datagram with an ID number before the transmission. The goal is to have unique datagrams, so that if several fragments arrive at the destination, since all of them carry the same ID value, the destination host can put the received fragments together. If some fragment does not arrive, all the fragments with the same number will be discarded. In the next field there are up to three flags. The first flag has no use for now and is set to 0. The flag D (do not fragment) prevents the fragmentation of the data into smaller pieces when it is set to 1. The flag M indicates whether the datagram received is the last one of the stream (set to 0) or whether there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the datagrams within the stream in which they have been sent, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may be on the network before being discarded. The main goal of this function is to discard datagrams that are within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header and the data during the transmission. To send the packet it is required to fill in the field Source Address with the IP address of the sender, as well as the Destination Address with the IP address of the receiver. There is also a field to set some options, if they are required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides a host-to-host service across many different networks with diverse technologies, it is required to manage datagrams so that they can traverse all the networks. There are two choices available to solve this problem [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. This second option is also the one used in the Amazon networks where we ran the tests. It is important to know how the segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to make.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If by chance the packets go over some network with a smaller MTU, it will be required to use fragmentation. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus another 20 bytes for the header. Therefore there will be 376 bytes left (1400 − 512 × 2), so the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look like the Figure 24.

It should be noted that the amount of data bytes in each packet must always be a multiple of 8. During this process the router will set the M bit in the flags of the first and second datagrams to indicate that there are more packets coming. As regards the Offset field, in the first packet it is set to 0 because this datagram carries the first part of the original packet, whereas the second datagram will have the Offset set to 64, since the first byte of data it carries is the 513th (512/8 = 64).


Figure 24 Datagram fragmentation
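The arithmetic above can be condensed into a few lines of Python; this is only an illustrative sketch (the function and its parameters are hypothetical), not part of the thesis code.

    def fragment(total_len, mtu, ip_header=20):
        # Split an IP datagram of total_len bytes into fragments that fit the given MTU.
        data_left = total_len - ip_header
        per_fragment = ((mtu - ip_header) // 8) * 8   # data per fragment, multiple of 8
        fragments, offset = [], 0
        while data_left > 0:
            size = min(per_fragment, data_left)
            more = data_left > size                   # M flag: more fragments follow
            fragments.append({"data": size, "offset": offset // 8, "M": int(more)})
            offset += size
            data_left -= size
        return fragments

    print(fragment(1420, 532))   # -> 512 + 512 + 376 bytes of data, offsets 0, 64 and 128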

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link layer network. To develop a mapping between link layer addresses and IP addresses, the Address Resolution Protocol (ARP) technique is used, so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link layer of a particular server through this technique involves

the next steps [15] First of all the sender will check its ARP cache to find out if it has

already the link layer address (MAC) of the receiver If it is not there a new ARP request

message will be sent which carries its own IP and link layer addresses and the IP address

of the server desired This message is received by every device within the local network

since this message is a broadcast The receivers compare the searched IP address with

their own IP address The servers with different IP addresses will drop the packet but

the receiver which we are looking for will send an ARP reply message to the client This

server also will update its ARP cache with the link layer address of the client When the

sender receives the ARP reply the MAC address of the receiver is saved The required

steps can be seen in the picture 25

213 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]

The data link layer is divided in two different sublayers Media Access Control known as

MAC (defined by IEEE 8023) and MAC client (defined by IEEE 8022) The structure

is shown in the Figure 26

The MAC client must be one of the next two different types of sublayers The first one

is the Logical Link Control (LLC) which supplies the interface from the MAC sublayer

to the upper layers The other option is called bridge entity which provides an interface

between LANs that can be using the same (for instance Ethernet to Ethernet) or different

protocols

Concerning the MAC sublayer [18] this level takes charge of data encapsulation as-

sembling also the frames before sending them as well as of analyzing these frames and

detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and of recovering from communication errors.

Figure 25 ARP request

Figure 26 Ethernet layers in OSI model

The physical layer enables the communication between the data link layer and the

respective physical layer of other systems In addition this layer provides significant

physical features of the Ethernet such as voltage levels timing but the most important

functions are related with data encoding and channel access This layer can code and

decode bits between binary and phase-encoded form About access to the channel this

level sends and receives the encoded data we spoke about before and detects collisions

in the packets exchange

214 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable: UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore this protocol should not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet and the maximum number of bytes is

65527 for IPv6 [21] When a UDP datagram is sent the data and the header go together

in the IP network layer and the computer has to fill the fields of the UDP header in the

proper way The scheme of the UDP protocol is represented in the Figure 27

Among other things UDP is normally used to serve Domain Name System (DNS)

requests on port number 53 DNS is a protocol that transforms domain names into IP

addresses This is important in this thesis since the proxy between client and server

needs to work out the server IP address

Figure 27 UDP protocol header

The UDP header is composed of four fields [15], each one two bytes long. The Source Port indicates the port from which the packet was sent, and it is by default the port to which the reply should be addressed if nothing is changed. The Destination Port is the destination address to which the packet will be delivered. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no errors happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages These ports are used to

send different kinds of traffic facilitating and setting an order for the packet transmission

Since the UDP port field is only 16 bits long there are 65536 available ports From 0

to 1023 are well-known port numbers The destination port is usually one of these

well-known ports and normally each one of these ports is used for one application in

particular

215 TCP protocol

Transmission Control Protocol (TCP) is a protocol belonging to the transport layer and used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams coming from the IP protocol in order. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in the Figure 28.

Figure 28 TCP protocol header

The field Source Port identifies the sender port, just as the Destination Port does for the receiver port. The fields Sequence Number and Acknowledgement Number will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called Reserved in the figure) is currently unused and is set to zero. The flags field carries additional information about the segment. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender to ask the receiver to deliver the data to the application immediately, without waiting for further segments. Finally, the RST flag is used to reset the connection.

Another important field is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the header. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field varies depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way a TCP connection is set up is shown in the Figure 29.

Figure 29 Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number. Both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement with the FIN flag set to 1 and keeps on sending the packets in progress. Afterwards, the client informs its application that a FIN segment was received and sends another packet with the FIN flag to the server to end the communication.
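The first two steps of this handshake can be observed, for instance, with Scapy [12], a tool discussed later in this thesis. The sketch below is only an illustration: the address and ports are placeholders and the final ACK of the handshake is omitted, so no complete connection is established.

    from scapy.all import IP, TCP, sr1

    # Send a SYN towards a hypothetical server and wait for its SYN/ACK answer.
    syn = IP(dst="10.0.0.1") / TCP(sport=45125, dport=50007, flags="S", seq=0)
    synack = sr1(syn, timeout=2)
    if synack is not None and synack.haslayer(TCP):
        # 0x12 means that both the SYN and ACK bits are set in the reply.
        print("Flags=0x%02x Seq=%d Ack=%d" % (int(synack[TCP].flags),
                                              synack[TCP].seq, synack[TCP].ack))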

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called the "sliding window" [15], where it is possible to define a range of sequence numbers that do not need to be acknowledged yet. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 210 Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in the Figure 210, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

22 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and perfor-

mance

221 RTT

Round trip time (RTT) is the time interval from the moment a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula to estimate the RTT within a network is shown in the equation 21.

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT (21)

Where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in the Figure 211.

Figure 211 Example RTT interval
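A minimal sketch of this estimator in Python, with purely illustrative values, is:

    def update_rtt(estimated_rtt, sample_rtt, alpha=0.85):
        # Exponentially weighted moving average from equation 21.
        return alpha * estimated_rtt + (1 - alpha) * sample_rtt

    estimated = 0.100                        # initial estimate in seconds (hypothetical)
    for sample in (0.110, 0.095, 0.130):     # hypothetical measured samples
        estimated = update_rtt(estimated, sample)
    print("EstimatedRTT = %.3f s" % estimated)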

222 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets one after the other with a certain spacing between them. However, problems such as network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter on a stream of packets can be seen in the Figure 212.

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly over time. Therefore it is crucial to correct this problem as much as possible. One solution is to use a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short period of time in order to reorder them if necessary and restore the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, new incoming packets will be dropped and will never arrive at their destination.


Figure 212 Jitter effect

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are captured in the next three formulas:

Latency = Propagation + Transmit + Queue (22)

Propagation = Distance / SpeedOfLight (23)

Transmit = Size / Bandwidth (24)

224 Bandwidth

This concept describes the number of bits that are transmitted within the network during a second [15]. There is an important relationship between bandwidth and latency. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of this pipe. A simple drawing of the relation between network latency and bandwidth is shown in the Figure 213.

Figure 213 Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that can be present in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits (25)

If more bandwidth is required, the problem is solved simply by adding more pipes.

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the next formulas:

Throughput = TransferSize / TransferTime (26)

TransferTime = RTT + (1/Bandwidth) × TransferSize (27)

Where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that a link can transmit in theory. However, due to inefficiencies of implementation or errors, a couple of nodes connected in the network with a bandwidth of 10 Mbps will usually obtain a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at the most.
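As an illustration only, plugging example numbers into equations 26 and 27 (a 1 MB transfer over a 45 Mbps link with 50 ms of RTT, values chosen purely for the example) gives:

    rtt = 0.050                        # round trip time in seconds
    bandwidth = 45e6                   # link bandwidth in bits per second
    transfer_size = 1e6 * 8            # 1 MB transfer expressed in bits

    transfer_time = rtt + transfer_size / bandwidth       # equation 27
    throughput = transfer_size / transfer_time            # equation 26
    print("Throughput = %.1f Mbps" % (throughput / 1e6))  # about 35 Mbps, below the 45 Mbps bandwidth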

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project


231 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections. Connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection to the original receiver address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems and remove errors in protocol implementations. This program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets running in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze the packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file to be used in a future analysis; these tcpdump files can also be opened with software like Wireshark. Furthermore, tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives this request stores a temporary copy of the page. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. An example of how a proxy works and handles incoming requests is shown in the Figure 214, where the proxy asks for each web site only once.

Figure 214 Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block the access between two networks. Proxies can also take part as a component of a firewall.
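Conceptually, a forwarding proxy of the kind used later in this thesis boils down to a loop that copies bytes between two sockets. The sketch below is only an illustration of that idea (single client, no caching, hypothetical addresses and ports), not the Squid proxy actually deployed.

    import select
    import socket

    def relay(listen_port, server_addr):
        # Accept one client and forward bytes between it and the server in both directions.
        lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        lsock.bind(("", listen_port))
        lsock.listen(1)
        client, _ = lsock.accept()
        upstream = socket.create_connection(server_addr)
        while True:
            readable, _, _ = select.select([client, upstream], [], [])
            for s in readable:
                data = s.recv(4096)
                if not data:                      # one side closed the connection
                    client.close()
                    upstream.close()
                    return
                (upstream if s is client else client).sendall(data)

    # relay(3128, ("10.0.0.1", 50007))            # hypothetical usage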

232 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, database, deployment, application services, monitoring, storage and so on.
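As a hedged illustration of the kind of call the thesis scripts rely on, the snippet below starts one EC2 instance with boto; the AMI id, key pair and credentials are placeholders (boto can also read the keys from its configuration file).

    import time
    import boto.ec2

    conn = boto.ec2.connect_to_region("eu-west-1",
                                      aws_access_key_id="ACCESS_KEY",      # placeholder
                                      aws_secret_access_key="SECRET_KEY")  # placeholder
    reservation = conn.run_instances("ami-12345678",          # placeholder image id
                                     instance_type="m1.large",
                                     key_name="my-keypair")   # placeholder key pair
    instance = reservation.instances[0]
    while instance.state != "running":
        time.sleep(5)
        instance.update()              # refresh the instance state from AWS
    print(instance.public_dns_name)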

233 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and by employees of many companies and organizations from all over the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more, both at home and in companies, due to all its functionalities. Linux offers many network applications, so it could be very useful for this thesis.

We have described in this chapter many issues about networks which are crucial for the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations We

started with a simple example M2M and we ended up adding a proxy in between and

simulating several clients These scenarios were analyzed to acquire a deep knowledge

about this framework in order to extract the pattern properly later on

31 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and a server. This is an easy way to start setting up a connection and to design a methodology, with tools, for developing larger scale testing later. The structure of this connection is shown in the Figure 31.

Figure 31 Structure client server

The tool chosen for programming is Python, a high-level programming language that is highly recommendable for network programming due to its ease of use in this field.


When it comes to programming the client for this application, it was necessary to set the server Inet address and a random port for the exchange of data. It was also necessary to create a socket and connect it to the address and through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we made the socket wait for incoming packets from the client and accept the connection.
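A minimal sketch of such a pair of programs is shown below; it is not the exact code used in the thesis, and the hostname and port are placeholders.

    import socket

    HOST = "server.example.com"      # placeholder server address
    PORT = 50007                     # placeholder port, opened on both sides

    def run_server():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("", PORT))         # bind hostname and port to the socket
        srv.listen(1)
        conn, addr = srv.accept()    # wait for the client and accept the connection
        data = conn.recv(1024)
        conn.sendall(data)           # echo the received data back to the client
        conn.close()

    def run_client():
        sock = socket.create_connection((HOST, PORT))
        sock.sendall(b"test data")
        print(sock.recv(1024))
        sock.close()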

In the List 31 the required packets to establish a client-server connection are shown

Listing 31 Establish connection

1 "0.665317" "192.168.1.24" "192.168.1.33" "TCP" "74" "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2 "0.669736" "192.168.1.33" "192.168.1.24" "TCP" "66" "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3 "0.669766" "192.168.1.24" "192.168.1.33" "TCP" "54" "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, with a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, only with the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in the List 32.

Listing 32 Terminate connection

1 "0.671945" "192.168.1.33" "192.168.1.24" "TCP" "60" "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2 "0.672251" "192.168.1.24" "192.168.1.33" "TCP" "54" "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

32 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]; the structure is shown in the figure 32. After setting up this connection, we sent traffic in order to analyze the segments, measure the performance and extract a traffic pattern.

Figure 32 Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of the communication and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in and install the libraries required by those scripts. Moreover, programs such as Tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests where there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests can be run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking the type of instance used for the simulation. Moreover, after starting the instances, the script configures and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease of use for network development.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data begins, the data source establishes the connection by sending a packet with the SYN flag set to 1. This is done only once in the whole communication.
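A simplified sketch of such a data source is shown below. It is only an illustration, not the thesis code: it connects directly to the server for brevity (in the real setup the connection goes through the proxy), the address is a placeholder, and the burst size, repetitions and 1-2 second pauses mirror the values used in these simulations.

    import random
    import socket
    import time

    def data_source(server, port, burst=1980, repetitions=200):
        sock = socket.create_connection((server, port))
        payload = b"x" * burst                     # filler payload of the chosen burst size
        for _ in range(repetitions):
            sock.sendall(payload)
            received = 0
            while received < burst:                # the server echoes the data back
                received += len(sock.recv(4096))
            time.sleep(random.choice([1, 2]))      # random waiting time between bursts
        sock.close()

    # data_source("server.example.com", 50007)     # placeholder usage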

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source: the response from the proxy, with the SYN and ACK flags set to 1, which indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in the List 33 and is called the three-way handshake [31].

Listing 33 Establishing data source-proxy connection

"1" "0.000000" "10.34.252.34" "10.235.11.67" "TCP" "74" "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2" "0.000054" "10.235.11.67" "10.34.252.34" "TCP" "74" "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3" "0.000833" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy containing the DNS name of the server. The proxy then looks up the IP address of that server by sending DNS queries. We can see this in the List 34.

Listing 34 Searching server IP address

"4" "0.000859" "10.34.252.34" "10.235.11.67" "HTTP" "197" "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6" "0.001390" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7" "0.002600" "172.16.0.23" "10.235.11.67" "DNS" "166" "Standard query response 0xb33a"
"8" "0.002769" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9" "0.003708" "172.16.0.23" "10.235.11.67" "DNS" "124" "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between proxy and server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in the List 35.

Listing 35 Establishing proxy-server connection

"10" "0.003785" "10.235.11.67" "10.224.83.21" "TCP" "74" "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11" "0.438963" "10.224.83.21" "10.235.11.67" "TCP" "74" "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12" "0.439029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 OK (connection established) response gets back to the data source, and the connection is ready to start sending data. In these simulations it was decided to send data from time to time, with random time periods in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose the exchange of data between data source and server

are shown in the list 36

Listing 3.6: Exchange of data source-proxy-server

"15","0.466800","10.34.252.34","10.235.11.67","TCP","71","45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16","0.466813","10.235.11.67","10.34.252.34","TCP","66","ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17","0.466975","10.235.11.67","10.224.83.21","TCP","71","33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18","0.467901","10.224.83.21","10.235.11.67","TCP","66","50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19","0.468018","10.224.83.21","10.235.11.67","TCP","71","50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20","0.468029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21","0.468083","10.235.11.67","10.34.252.34","TCP","71","ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22","0.508799","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that data is being carried in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag set is carrying data: first from the data source to the proxy, which forwards everything to the server, and then all the way back, from the server to the data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source ran in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending bursts of 1980 bytes of data, and later with a heavier load of 5940 bytes. These bursts were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.
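As an illustration, the send loop of a data source can be sketched in a few lines of Python. This is not the thesis script itself: the endpoint, the burst contents and the single recv() call are simplifying assumptions, and in the Chapter 3 simulations the connection actually went through the proxy rather than directly to the server.

import random
import socket
import time

SERVER = ('ec2-54-228-99-43.eu-west-1.compute.amazonaws.com', 50007)  # placeholder endpoint
BURST = b'x' * 1980                      # one data burst (5940 bytes for the heavier load)
REPETITIONS = 200

sock = socket.create_connection(SERVER)
for _ in range(REPETITIONS):
    sock.sendall(BURST)                  # send one burst of data
    sock.recv(len(BURST))                # the test server echoes the same data back
    time.sleep(random.randint(1, 2))     # random waiting time of 1 or 2 seconds
sock.close()

A single recv() is enough for this sketch; a real client would loop until the whole echo has been read back.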

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with bursts of 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is higher.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to place one client in each instance, so the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going through the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same for data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are sent is high.

3.4 Performance results

In this section the performance of the network is analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not pose a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, so the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.
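One way to estimate such an average RTT from a capture is to pair each data segment with the first ACK that covers it, for instance with the dpkt library used later in the thesis. The sketch below is only illustrative (the file name and client address are placeholders), not the procedure actually used in the thesis.

import socket
import dpkt

def average_rtt(pcap_path, client_ip):
    pending = {}                          # expected ack number -> timestamp of the data segment
    samples = []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            ip = dpkt.ethernet.Ethernet(buf).data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if socket.inet_ntoa(ip.src) == client_ip and tcp.data:
                # a segment with payload is acknowledged with ack = seq + len(payload)
                pending.setdefault(tcp.seq + len(tcp.data), ts)
            elif socket.inet_ntoa(ip.src) != client_ip and tcp.ack in pending:
                samples.append(ts - pending.pop(tcp.ack))
    return sum(samples) / len(samples) if samples else None

# print(average_rtt('capture.pcap', '10.34.252.34'))   # hypothetical usage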


Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower in the table, the shorter the RTT should be; however, this does not hold in every case, so the type of instance is not very significant here. The simplest instance seems to be enough for these exchanges of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT with 5 or 10 data sources against only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT with data bursts of 1980 bytes

The next analysis concerned some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets that were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT with data bursts of 5940 bytes

With this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one; nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also shows that it is harder to send traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    1.5        0           2           2
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          2.5         0           0
c1.medium              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.xlarge              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    2          6           15          67
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          5.5         1           36
c1.medium              1980 bytes    0          0           0           2
                       5940 bytes    0          7           13.5        50.5
c1.xlarge              1980 bytes    0          0           0           5
                       5940 bytes    0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          1           0           6.5
                       5940 bytes    3          1           7.5         2.5
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.medium              1980 bytes    0          0           0           4.5
                       5940 bytes    0          0           2.5         2.5
c1.xlarge              1980 bytes    0.5        0           0           0.5
                       5940 bytes    0.5        0           0           0

Table 3.5: Number of duplicate ACK

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little in the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important to improve the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, machine to machine (M2M) towards the same server. A method was needed to perform a proper extraction so that the traffic could be generated again. To do so, we looked into several publications that explain how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets have to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, that is, the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially had to extract the data of each packet one at a time, in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of packet by packet. This is exactly the same way the data source sent packets in the simulations, and therefore this method was much better suited to recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then started to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra segments were needed to establish the communication over this proxy. These were a couple of HTTP and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
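The following sketch illustrates the idea behind Extractpattern.py rather than reproducing it: payload sent by the data source is grouped into bursts, timestamps are stored relative to the first captured packet, and the CONNECT and DNS segments are left out. The burst-gap threshold and the parameter names are assumptions made only for this example.

import socket
import dpkt

BURST_GAP = 0.5   # assumed: a silence longer than this separates two data bursts

def extract_pattern(pcap_path, source_ip, proxy_port):
    """Return a list of (time offset, payload) bursts sent by the data source."""
    bursts, first_ts, last_ts = [], None, None
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            if first_ts is None:
                first_ts = ts                                  # reference point of the capture
            ip = dpkt.ethernet.Ethernet(buf).data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue                                       # skips DNS and non-TCP traffic
            tcp = ip.data
            if socket.inet_ntoa(ip.src) != source_ip or tcp.dport != proxy_port or not tcp.data:
                continue                                       # keep only data source payload
            if tcp.data.startswith(b'CONNECT'):
                continue                                       # drop the HTTP CONNECT request
            if last_ts is None or ts - last_ts > BURST_GAP:
                bursts.append([ts - first_ts, b''])            # open a new burst
            bursts[-1][1] += tcp.data                          # append payload to this burst
            last_ts = ts
    return bursts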

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves the information gathered from each data burst, together with its timestamp, in a file. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. The second script then accesses the information contained in this file. Knowing this information, this program is ready to replay the data in an accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets accurately. It should be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
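A minimal sketch of the replaying side is shown below. It is not the actual Replaytraffic.py; it assumes the bursts are available as (offset, payload) pairs like those produced by the extraction sketch above, and that the server echoes the data back.

import socket
import time

def replay_pattern(bursts, server_host, server_port):
    sock = socket.create_connection((server_host, server_port))
    start = time.time()
    for offset, payload in bursts:
        delay = offset - (time.time() - start)
        if delay > 0:
            time.sleep(delay)              # wait until this burst is due
        sock.sendall(payload)              # send the whole burst at once, as in the capture
        sock.recv(65536)                   # read the echo coming back from the server
    sock.close()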

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from the data source to the proxy. This sniffed data was replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also allows us to receive the same data from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing.


Therefore this method of recreating the traffic is very accurate, regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern on a large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
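The instances themselves are started through the Amazon EC2 API. A hedged sketch of this step using the boto library [28] is shown below; the AMI id, region and key pair name are placeholders, not values taken from the thesis.

import time
import boto.ec2

def launch_data_sources(count, instance_type='t1.micro', ami='ami-00000000'):
    conn = boto.ec2.connect_to_region('eu-west-1')
    reservation = conn.run_instances(ami, min_count=count, max_count=count,
                                     instance_type=instance_type,
                                     key_name='taas-key')          # placeholder key pair
    instances = reservation.instances
    while any(i.state != 'running' for i in instances):
        time.sleep(5)
        for i in instances:
            i.update()                      # refresh the instance state from EC2
    return [i.public_dns_name for i in instances]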

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic that is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations, scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of data sources replaying was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is because at certain moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests, but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph shows roughly double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops rising after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients reaches about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients and it has similar peaks and number of bytes sent to those in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap in packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained some results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying it directly from client to server.

In the first part we therefore developed in the Amazon cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came down to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be greatly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results showed a performance improvement in the network when using high quality instances, and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been proven that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, there seem to be good results about the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, this RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore the good results, along with the flexibility and many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on TCP, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK_PERM SACK Permitted

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using a m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump & Libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators?", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


developed

The method carried out is a kind of replay attack [7], where there is a "man-in-the-middle" (Wireshark sniffing in the proxy) which intercepts the traffic. This traffic is then replayed, pretending to be the original sender, in order to create problems for the host server. In our thesis this traffic is scaled up by a multiplier to stress the software and find out its limits.

Regarding the traffic pattern extraction, several tools were studied and two methods to extract and replay traffic were considered. The first method consisted of modifying the pcap files previously recorded in the proxy so that the packets could be sent straight to the server from the data sources. Tcprewrite [11] was the tool used to modify the pcap file. The next step was to use another tool to recreate the traffic contained in the new pcap. One of them was Scapy [12], which sent the packets, but not with the appropriate timing. Another tool tried was Tcpreplay [13], but it was not possible to simulate the server since this tool does not work at the transport level; therefore we could not establish a valid TCP connection [14]. Finally we used the second method, which is based on programming our own script to extract the traffic pattern and recreate it later on. The way chosen was much trickier, but also much more suitable and completely automatic. With this method it was not necessary to do anything by hand (like modifying the pcap file from the console); we just needed to type a few options, such as the name of the file to replay or the server instance type. After programming the script to extract the pattern, we needed to replay it somehow. We tried again with Scapy [12], a very good tool when it comes to creating and configuring packets. However, there were some problems receiving segments coming from the server, since it was necessary to work with sequence and acknowledgement numbers. This was extremely hard, and finally sockets were used again to replay the pattern.

A diagram of the whole system developed for the thesis is shown in Figure 1.1. The top of Figure 1.1 (network traffic simulation) refers to the first part of the thesis, where we had to set up a client-proxy-server communication in the cloud. With this scenario we could run the simulations to exchange packets. Below it we can see the next steps: the traffic was recorded in the proxy for further analysis and pattern extraction. Finally, with the pattern obtained, we came down to the last part of the thesis, shown at the bottom of Figure 1.1 (traffic pattern recreation). In this part we set up a multiplier composed of many data sources which recreated the traffic pattern towards the same server. In this way we could find out the server performance when it comes to handling heavy traffic loads.

1.4 Delimitations

During this thesis we have made some delimitations. The project could be improved to cover most of these aspects in further research.

The TaaS system developed can only function in the Amazon cloud, because the main library used to program this system works only for this particular cloud. The goal of this project is testing M2M communications, so we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding out server performance; therefore the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system programmed uses TCP sockets to test servers; for instance, we cannot use HTTP requests for testing.

1.5 Outline

The thesis is organized as follows.

The introduction is in Chapter 1. Chapter 2 describes related work. Chapter 3 describes the simulations and the analysis of the data source-proxy-server scenario. Traffic pattern extraction is described in Chapter 4. The design of the TaaS [3] for M2M communications and the results achieved with this system are in Chapter 5. A summary of the whole thesis, the main results obtained and future work are in Chapter 6. Finally, Chapter 7 contains the appendices.


Figure 1.1: Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this chapter the main protocols are explained, since they are essential to analyze a client-server communication. Moreover, some significant metrics needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

2.1 Communication protocols

In order to develop this thesis it was crucial to describe the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have a detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects that occupy the different OSI model layers in order to establish a communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface offered to the other objects in the same machine that want to use the services provided by this protocol. The other interface is called the peer interface and is used to talk to the protocol's equivalent in another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole picture in which they all take place. To keep a system from becoming too complex, levels of abstraction have to be added. In network systems this is also applied, creating layers, each with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it is not necessary to modify everything, but only the part where the service is introduced. In networks the architecture chosen is named the OSI model [15], and networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in Figure 2.1.

Figure 2.1: OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star and so on). This layer also determines the voltage and frequency that signals will use. The data link layer transmits the data from the upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data across the network. To ensure packets get to the destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (segmentation). The most significant protocols here are TCP and UDP, about which we will talk later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

2.1.1 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative hypermedia information systems. This network protocol is used for communication among users, proxies or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is linked to the transmission of resources. A resource is a network data object that can be identified by a URI; normally these resources are either files or outputs of a script. Each HTTP message has the general form shown in Listing 2.1 [15].

Listing 2.1: HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line indicates whether this is a request or a response message. The next lines provide parameters and options for the message; there are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol, and the port used by default is 80, although it is possible to use other ports. Two important HTTP request methods are GET and POST [16], which are used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis: CONNECT [16]. This method is used to send data through a proxy that acts as a tunnel. We therefore needed this method to establish a connection between client and server through the proxy.

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request has been served, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request, and therefore they cannot keep state about previous requests. Any kind of data can be transmitted over HTTP as long as client and server know how to handle it. A typical example of an HTTP request is shown in Figure 2.2.

Figure 2.2: HTTP request

To set up a communication with this protocol, a client must open a connection by sending a request message to the server, which returns a response message. Afterwards the server will close the connection. First of all we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource, called the URI. Finally comes the version of HTTP that is being used. This can be clearly seen in Listing 2.2; this example was extracted from the simulations made during the thesis.

Listing 2.2: HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1
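For illustration, a client can issue such a CONNECT request with a plain TCP socket. The sketch below is a simplification (a real client would read until the end of the response headers), and the host and port parameters are only examples, not the thesis code.

import socket

def open_tunnel(proxy_host, proxy_port, target_host, target_port):
    sock = socket.create_connection((proxy_host, proxy_port))
    request = ('CONNECT {0}:{1} HTTP/1.1\r\n'
               'Host: {0}:{1}\r\n\r\n').format(target_host, target_port)
    sock.sendall(request.encode('ascii'))
    status = sock.recv(4096).decode('ascii', 'replace').splitlines()[0]
    # a "HTTP/1.0 200 Connection established" status line means the tunnel is ready
    if ' 200 ' not in status:
        raise IOError('proxy refused the tunnel: ' + status)
    return sock        # from here on, the socket carries raw data to the target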

The initial line of the response from the server is also divided into three parts. The first part contains the version of HTTP used for the communication. After that there is a status code [15] for the computer to understand the result of the request; the first digit indicates the class of the response. We have the codes shown in Listing 2.3.

Listing 2.3: HTTP request result

1xx  informational message
2xx  success in the connection
3xx  redirects the client to another URL
4xx  error linked to the client
5xx  error linked to the server

Finally, there is a word or sentence in English to describe the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header carries information about either the request, the response or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4: HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in the body; there may also be text giving information or warning of errors. In a request, the body is where the user enters data or uploads files to be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about it. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which states how many bytes are used in the body.

2.1.2 IP

The Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information goes to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides unreliable datagram delivery; it is therefore not ensured that the datagrams reach their destinations. In addition, this service model may cause further problems, since packets can be delivered out of order as well as reach the destination more than once.


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3: Fields of the IP Header

The first field is the Version field, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of service. The Total Length field indicates the length in bytes (unlike the Header Length, where the length is counted in words) of the whole datagram. When it comes to the Identification field, the sender always marks each IP datagram with an ID number before the transmission. The goal is to have unique datagrams, so that if several fragments arrive at the destination, since all of them carry the same ID value, the destination host can put the received fragments back together. If some fragment does not arrive, all the fragments with the same number will be discarded. In the next field there are up to three flags. The first flag has no use for now and is set to 0. The D (Don't Fragment) flag, when set to 1, indicates that the datagram must not be fragmented into smaller pieces. The M (More Fragments) flag indicates whether the datagram received is the last one of the stream (set to 0) or there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the datagrams within the stream in which they have been sent, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may remain on the network before being discarded. The main goal of this function is to discard datagrams that wander within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram payload. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet it is required to fill in the Source Address field with the IP address of the sender, as well as the Destination Address with the IP address of the receiver. There is also a field to set up some options, if they are required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides a host-to-host service across many different networks with diverse technologies, it is required to manage datagrams so that they can traverse all of these networks. There are two choices available to solve this problem [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. It is also the option used in the Amazon networks where we ran the tests. It is important to know how the segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was better organized and the recreation of the traffic pattern was easier to carry out.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size used to send IP datagrams. If the packets happen to go over some network with a smaller MTU, fragmentation will be required. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus another 20 bytes for the header. Therefore there will be 376 bytes left (1400 − 512 · 2), so the last datagram will carry those 376 bytes of data plus 20 bytes of header. The result would look like Figure 2.4.

It should be noted that the amount of data bytes in every fragment except the last one must be a multiple of 8. During this process the router will set the M bit in the flags of the first and second datagram to indicate that there are more fragments coming. As regards the offset field, in the first packet it is set to 0 because this datagram carries the first part of the original packet. However, the second datagram will have the offset set to 64, since its first byte of data is the 513th (512 / 8 = 64).


Figure 2.4: Datagram fragmentation
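To make the arithmetic above concrete, the short Python sketch below (an illustrative helper, not one of the thesis scripts) computes the data size and offset of each fragment for a given datagram length and MTU:

def fragment(total_length, mtu, header_len=20):
    """Return (data_bytes, offset_in_8_byte_units) for every IP fragment."""
    payload = total_length - header_len           # data carried by the original datagram
    max_data = (mtu - header_len) // 8 * 8        # per-fragment data, kept a multiple of 8
    fragments, offset = [], 0
    while payload > 0:
        size = min(max_data, payload)             # the last fragment may be smaller
        fragments.append((size, offset // 8))     # offsets are expressed in 8-byte units
        offset += size
        payload -= size
    return fragments

# The example from the text: a 1420-byte datagram over a 532-byte MTU
print(fragment(1420, 532))    # [(512, 0), (512, 64), (376, 128)]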

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link-layer network. To build a mapping between link-layer addresses and IP addresses, the Address Resolution Protocol (ARP) is used, so that the physical interface hardware on the node can understand the addressing scheme.

The method to obtain the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out whether it already has the link-layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, carrying the sender's own IP and link-layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the requested IP address with their own; the hosts with a different IP address drop the packet, but the receiver we are looking for sends an ARP reply message back to the client. This server also updates its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: the Media Access Control, known as MAC (defined by IEEE 802.3), and the MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two different types of sublayer. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called a bridge entity, which provides an interface between LANs that may be using the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as of analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering from transmission errors.

Figure 2.5: ARP request

Figure 2.6: Ethernet layers in the OSI model

The physical layer enables the communication between the data link layer and the respective physical layer of other systems. In addition, this layer defines significant physical features of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer can encode and decode bits between binary and phase-encoded form. Regarding access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

2.1.4 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in the Internet standard RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable. UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum payload is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together to the IP network layer, and the computer has to fill the fields of the UDP header in the proper way. The scheme of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 2.7: UDP protocol header


The UDP header is composed of four fields [15], each one containing 2 bytes. The Source Port indicates the port from which the packet was sent, and by default it is the port to which the reply should be addressed if nothing is changed. The Destination Port is the port at the destination host to which the packet will be delivered. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission. Each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a protocol belonging to the transport layer and used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up, where each endpoint is defined by two parameters: the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place in order the segments coming from the IP protocol. In addition, this protocol allows data to be split into fragments of different lengths before forwarding them to the IP protocol. With TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in Figure 2.8.

The Source Port field identifies the sender port, just as the Destination Port does with the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length will always be a multiple of 32 bits. The next field (called Reserved in the picture) is unused for now and is set to zero. The Flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is activated by the sender so that the receiver passes the data to its application without waiting for more segments. Finally, the RST flag is set to reset the connection.

Figure 2.8: TCP protocol header

Another important field is the Window Size. Through this field we know the number of bytes that the receiver can accept without acknowledgement. The Checksum field makes the packet transmission more reliable, since it is used to check the integrity of the header and the data. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way a TCP connection is set up is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a random number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server, and the ACK flag is again set to 1.

Furthermore, either endpoint can request to finish the connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending the packets in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client, which acknowledges it to end the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the receiver by changing the value of the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and perfor-

mance

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to traverse the network. The formula used to estimate the RTT within a network is shown in equation 2.1.

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation to the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
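To make the estimator concrete, the short Python snippet below (illustrative only, not code from the thesis) applies equation 2.1 to a list of RTT samples with α = 0.9:

def estimate_rtt(samples, alpha=0.9):
    """Exponentially weighted moving average of RTT samples (equation 2.1)."""
    estimated = samples[0]                 # initialise with the first sample
    for sample in samples[1:]:
        estimated = alpha * estimated + (1 - alpha) * sample
    return estimated

# RTT samples in seconds, of the same order of magnitude as Tables 3.1 and 3.2
print(estimate_rtt([0.0031, 0.0046, 0.0029, 0.0035]))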

2.2.2 Jitter

Jitter is the variation in the delay of the packets sent within a network [15]. A sender transmits many packets one after the other, with a certain spacing between them. However, problems such as network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter on the packets can be seen in Figure 2.12.

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly over time. Therefore it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short period of time in order to reorder them if necessary and restore the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, newly arriving packets will be dropped and will never reach their destination.

Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be combined in the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that are transmitted within the network during one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing with the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between latency and bandwidth

If we multiply both terms, we obtain the number of bits that the pipe can contain at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10⁻³ s × 45 × 10⁶ bits/s = 2.25 × 10⁶ bits    (2.5)

If more bandwidth is needed, the problem is solved just by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

where RTT is the round trip time.
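As a small worked example (with illustrative numbers only), the following Python lines apply equations 2.6 and 2.7 to a 1 MB transfer over a link with 50 ms of RTT and 45 Mbps of bandwidth:

rtt = 0.050                      # round trip time in seconds
bandwidth = 45e6                 # link bandwidth in bits per second
transfer_size = 8e6              # a 1 MB transfer expressed in bits

transfer_time = rtt + transfer_size / bandwidth    # equation 2.7
throughput = transfer_size / transfer_time         # equation 2.6
print(round(transfer_time, 3), "s,", round(throughput / 1e6, 1), "Mbps")

The result, roughly 0.228 s and 35 Mbps, already illustrates why the throughput stays below the bandwidth.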

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that the link can carry in theory. However, due to implementation inefficiencies or errors, a pair of nodes connected in a network with a bandwidth of 10 Mbps will usually reach a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.

2.3.1 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle attacks against SSL/TLS network connections. Connections are intercepted and redirected to SSLsplit. This tool terminates the SSL/TLS connection and launches a new SSL/TLS connection to the original receiver address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security issues and remove errors in protocol implementations. This program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file to be used in a future analysis; these tcpdump files can also be opened with software like Wireshark. Furthermore, tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives this request stores a temporary copy of the page. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block direct access between two networks. Proxies can also take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, database, deployment, application services, monitoring, storage and so on.
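As a minimal sketch of how boto can be used, assuming the boto 2 EC2 interface, the snippet below launches a single instance; the region, AMI id, key pair and security group are placeholders, not values taken from the thesis:

import time
import boto.ec2

# The credentials can also be stored in the boto configuration file instead
conn = boto.ec2.connect_to_region('eu-west-1',
                                  aws_access_key_id='ACCESS_KEY',
                                  aws_secret_access_key='SECRET_KEY')

# Launch one machine, for example to act as the server
reservation = conn.run_instances('ami-xxxxxxxx',      # placeholder AMI id
                                 instance_type='t1.micro',
                                 key_name='my-keypair',
                                 security_groups=['default'])
instance = reservation.instances[0]

while instance.update() != 'running':                 # wait until the instance boots
    time.sleep(5)
print('Instance running at', instance.public_dns_name)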

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many issues about networks which are crucial in the next sections. It is important to have a deep knowledge about this matter, because it is needed later on when it comes to analyzing and recreating network traffic.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework in order to extract the traffic pattern properly later on.

3.1 Client-Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Client-server structure

The tool chosen for programming is Python, a high-level language highly recommended for network programming due to its ease of use in this field.

When it comes to programming the client for this application, it was necessary to set the server Internet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we made the socket wait for incoming packets from the client and accept the connection.
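A minimal sketch of such a pair of programs is shown below (illustrative code with a made-up address and port, not the exact scripts used in the thesis):

import socket

SERVER_ADDR = ('192.168.1.33', 50007)    # placeholder server address and port

def run_server():
    """Bind a TCP socket, accept one client and echo back what it sends."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(SERVER_ADDR)
    srv.listen(1)
    conn, addr = srv.accept()            # the three-way handshake completes here
    data = conn.recv(1024)
    conn.sendall(data)                   # echo the payload back to the client
    conn.close()
    srv.close()

def run_client():
    """Connect to the server, send a small payload and read the echo."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect(SERVER_ADDR)             # SYN, SYN/ACK, ACK as in Listing 3.1
    cli.sendall(b'hello')
    print(cli.recv(1024))
    cli.close()                          # FIN/ACK exchange as in Listing 3.2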

Listing 3.1 shows the packets required to establish a client-server connection.

Listing 3.1: Establishing the connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer of the server has the SYN and ACK flags activated, with its own sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminating the connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]; the structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Client-proxy-server structure

A proxy was set up in the client-server communication to capture and analyze the traffic, so that we can recreate the communication pattern and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in and install the libraries required by those scripts. Moreover, some programs such as Tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests where there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both the server and the data source were also programmed in Python, due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data begins, the data source establishes the connection by sending a packet with the SYN flag set to 1; this is done just once in the whole communication.
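A simplified sketch of such a data source is shown below; it tunnels through the proxy with an HTTP CONNECT request, as seen later in Listing 3.4, and then sends data bursts separated by random pauses. The server name is the one appearing in Listing 3.4; the proxy address, proxy port and burst parameters are placeholders:

import random
import socket
import time

PROXY_ADDR = ('10.235.11.67', 3128)      # placeholder Squid proxy address and port
SERVER = 'ec2-54-228-99-43.eu-west-1.compute.amazonaws.com'
SERVER_PORT = 50007

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(PROXY_ADDR)                 # three-way handshake with the proxy

# Ask the proxy to open a tunnel towards the server (compare Listing 3.4)
sock.sendall('CONNECT {0}:{1} HTTP/1.1\r\n\r\n'.format(SERVER, SERVER_PORT).encode())
sock.recv(1024)                          # expect "HTTP/1.0 200 Connection established"

payload = b'x' * 1980                    # one data burst
for _ in range(200):                     # number of repetitions
    sock.sendall(payload)
    sock.recv(4096)                      # the server sends the data back
    time.sleep(random.randint(1, 2))     # random waiting time between bursts
sock.close()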

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1, and indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is again the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the server's domain name. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching the server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"   "DNS"   "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"   "10.235.11.67"  "DNS"   "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"   "DNS"   "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"   "10.235.11.67"  "DNS"   "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 Connection established response gets back to the data source, so the connection is now ready to start sending data. In these simulations it was decided to send data from time to time, with random time periods in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data between data source, proxy and server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well, and therefore around three times as many packets are needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is simulating the connection of several clients. To this end we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance. Therefore the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5: Structure for the simulation

The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, as Figure 3.7 does with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes

Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are being sent is high.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower the row in the table, the shorter the times should be; however, this does not apply in every case, therefore the type of instance is not very significant in these cases. The simplest instance seems to be enough for these exchanges of data as far as RTT values are concerned. Concerning the number of clients there is a slight difference, especially when comparing the RTT with 5 or 10 data sources against only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: Average RTT in seconds with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources show more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: Average RTT in seconds with data bursts of 5940 bytes

With this instance the number of retransmissions is larger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. It must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    1.5        0           2           2
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          2.5         0           0
c1.medium              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.xlarge              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    2          6           15          67
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          5.5         1           36
c1.medium              1980 bytes    0          0           0           2
                       5940 bytes    0          7           13.5        50.5
c1.xlarge              1980 bytes    0          0           0           5
                       5940 bytes    0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          1           0           6.5
                       5940 bytes    3          1           7.5         2.5
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.medium              1980 bytes    0          0           0           4.5
                       5940 bytes    0          0           2.5         2.5
c1.xlarge              1980 bytes    0.5        0           0           0.5
                       5940 bytes    0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little across the different tests, most probably because the data sent does not stress the server enough. However, the last analysis, where values related to network performance were measured, gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, machine to machine (M2M) towards the same server. We needed a method describing how to perform a proper extraction so that the traffic could be generated again. To this end we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets have to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, that is, the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
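A minimal sketch of this kind of extraction with dpkt is shown below (illustrative code: the file name and the port used to keep only the data source traffic are assumptions, not the exact ones in Extractpattern.py):

import dpkt

bursts = []                                        # (timestamp, payload) of data-carrying packets
with open('capture.pcap', 'rb') as f:              # placeholder pcap recorded on the proxy
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP):
            continue                               # skip non-IP frames such as ARP
        ip = eth.data
        if not isinstance(ip.data, dpkt.tcp.TCP):
            continue                               # keep only TCP segments
        tcp = ip.data
        if tcp.dport == 3128 and len(tcp.data) > 0:    # placeholder filter: data towards the proxy
            bursts.append((ts, tcp.data))          # collect timestamp and payload (length is implicit)

print(len(bursts), 'data segments collected')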

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of packet by packet. This is exactly the way the data source sent packets in the simulations, therefore this method was much better suited to recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic, since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves in a file the information gathered from each data burst, as well as its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file; knowing this information, the program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets in an accurate manner. It should be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
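A simplified sketch of the replay logic is shown below (illustrative code; the file format, server address and port are assumptions, not the exact ones used by Replaytraffic.py):

import json
import socket
import time

SERVER_ADDR = ('10.224.83.21', 50007)        # placeholder server address and port

# Assumed file format: a JSON list of [relative_time, payload_text] entries, one per burst
with open('pattern.json') as f:
    bursts = json.load(f)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(SERVER_ADDR)                    # a single connection, as in the simulations

start = time.time()
for offset, payload in bursts:
    delay = offset - (time.time() - start)   # wait until the original send time is reached
    if delay > 0:
        time.sleep(delay)
    sock.sendall(payload.encode())           # resend the whole burst at once
    sock.recv(4096)                          # read the echo coming back from the server
sock.close()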

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this we had to filter the data sent from data source to proxy. These sniffed data were then replayed twice M2M with the second script, so that in the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy also allows receiving the same data back from the server, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the network traffic in the simulations with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of the traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between the simulation and the replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is due to the fact that the server response is something that we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was shown in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results, obtained by recreating the pattern at large scale, are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances where to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure is very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded, as a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
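As an illustration, the kind of values configured in Client.py before a run could look like the following (the variable names are hypothetical; the numbers match the simulation described in the next section):

# Hypothetical configuration values for one data source (Client.py)
DATA_BURST_BYTES = 3960     # amount of data sent in every burst
REPETITIONS = 400           # number of bursts sent by each data source
MIN_WAIT_SECONDS = 1        # lower bound of the random pause between bursts
MAX_WAIT_SECONDS = 3        # upper bound of the random pause between bursts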

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had this capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased the number up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replayers was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 51 Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients problems to send data appear very soon. After 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, so the graph should go up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent faster. This happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2 Bytes using a m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved running the same tests but with a higher quality instance for the server. We used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance. The gap between graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3 Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80 clients graph where, despite there being no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4 Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we achieved results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot. With many clients the RTT graph becomes smoother and its RTT average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table informs about the risk, in terms of packet delivery, of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came down to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicate ACKs, had more diverse values. These results showed a performance improvement in the network when using high quality instances and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity of both graphs was permanent. Therefore, with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we highly increased the number of data sources. It has been shown as well how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results. The percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared among different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of a test before starting it.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK_PERM SACK Permitted (Selective Acknowledgement permitted)

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACKs 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using a m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol – HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


library used to program this system works only for this particular cloud. Since the goal of this project is testing M2M communications, we cannot test scenarios other than a client-server connection. Moreover, we have focused on finding out the server performance; therefore the scope of this TaaS system is to carry out performance and scalability tests. In addition, the TaaS system programmed uses TCP sockets to test servers; for instance, we cannot use HTTP requests for testing.

1.5 Outline

The thesis is organized as follows. The introduction is in Chapter 1. Chapter 2 describes related work. Chapter 3 describes the simulations and the analysis of the scenario with data source, proxy and server. Traffic pattern extraction is described in Chapter 4. The design of TaaS [3] for M2M communications and the results achieved with this system are in Chapter 5. A summary of the whole thesis, the main results obtained and future work are in Chapter 6. Finally, Chapter 7 includes the appendices.


Figure 1.1 Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this section the main protocols are explained, since they are essential to analyze a client-server communication. Moreover, some significant metrics needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

2.1 Communication protocols

In order to develop this thesis it was crucial to give a description of the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have a detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects that use the different OSI model layers in order to establish a communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface, offered to the other objects in the same machine that want to use the service provided by this protocol. The other interface is called the peer interface and is used with its equivalent protocol in another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole in which they all take place. To avoid a system becoming too complex, it is necessary to add levels of abstraction. In network systems this is also applied, creating layers, each with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it will not be necessary to modify all the parts but only the one where the service will be introduced. In networks the architecture chosen is named the OSI model [15]. Networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in Figure 2.1.

Figure 2.1 OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star and so on). This layer also determines the voltage and frequency that the signals will use. As for the data link layer, it transmits the data from the upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems. This layer uses mainly the IP protocol. The transport layer takes charge of transporting data in the network. To ensure packets get to the destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels and divide packets into more manageable ones (segmentation process). The most significant protocols are TCP and UDP, about which we will talk later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

2.1.1 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative, hypermedia information systems. This network protocol is used for communication among users, proxies or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is linked to the transmission of resources. A resource is a network data object that can be identified by a URI. Normally these resources are either files or outputs of a script. Each HTTP message has the general form shown in Listing 2.1 [15].

Listing 2.1 HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line shows whether this is a request or a response message. The next lines provide parameters and options for the message. There are different kinds of header lines in HTTP and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol and the port used by default is 80, but it is possible to use other ports. Two important HTTP request methods are GET and POST [16]; these methods are used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis, named CONNECT [16]. This method is used to send data through a proxy that can act like a tunnel. Therefore, we needed this method to establish a connection between client and server through the proxy.

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request is sent, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request. Therefore, they cannot keep information about the requests. Any kind of data can be transmitted by HTTP as long as client and server know how to manage the data. A typical example of an HTTP request is shown in Figure 2.2.

Figure 2.2 HTTP request

To set up a communication with this protocol, a client must open a connection by sending a request message to the server, which returns a response message. Afterwards the server will close the connection. First of all we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource; this part is called the URI. And finally, the version of HTTP that is being used. This idea can be clearly seen in Listing 2.2. This example was extracted from the simulations made during the thesis.

Listing 2.2 HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided into three parts. The initial part contains the version of HTTP used for the communication. After it there is a code [15] for the computer to understand the result of the request, where the first digit indicates the class of the response. The possible codes are shown in Listing 2.3.

Listing 2.3 HTTP request result

1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or sentence in English to describe the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. An entity header carries information about either the request, the response or the information contained in the message body. A general header is used in both the request and the response. A request header is sent by a browser or a client to a server. Finally, the last kind of header is called a response header and is sent by a server in a response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4 HTTP header lines

User-Agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files which will be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about the body. One of these header lines is called Content-Type and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which provides the number of bytes used in the body.

2.1.2 IP

The Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information is going to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning the delivery, IP has a service model called best effort, which provides an unreliable datagram delivery; therefore it is not ensured that the datagrams reach their destinations. In addition, this service model may cause more problems, since the packets can be delivered out of order as well as reach the destination more than once.


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3 Fields of the IP Header

The first field is the Version field, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The field Total Length indicates the length in bytes (unlike the Header Length, where the length is counted in words) of the whole datagram. When it comes to the Identification field, the sender always marks each IP datagram with an ID number before the transmission. The goal is to have unique datagrams, so that if several fragments arrive at the destination, since all of them have the same ID value, the destination host can put the received fragments back together. If some fragment does not arrive, all the fragments with the same number will be discarded. In the next field there are up to three flags. The first flag does not have any use for now; it is set to 0. The flag D (Don't Fragment) forbids the fragmentation of the data into smaller pieces when it is set to 1. The flag M indicates whether the datagram received is the last one of the stream (set to 0) or there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the datagrams within the stream in which they have been sent, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may be on the network before being discarded. The main goal of this function is to discard datagrams that are within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet it is required to fill the field Source Address with the IP address of the sender, as well as to fill the Destination Address with the IP address of the receiver. There is also a field to set up some options if they are required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides host-to-host service throughout so many different networks with diverse technologies, it is required to manage datagrams so they can go over all the networks. There are two choices available to figure this problem out [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet of any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits in every network. This second option is the one used in the Amazon networks where we ran the tests. It is significant to know how the segments are fragmented in order to examine each segment sent and its respective answer. In this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to make.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If by chance the packets go over some network with a smaller MTU, it will be required to use fragmentation. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with 532 bytes of MTU, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data and another 20 bytes for the header. Therefore, there will be 376 bytes left (1400 − 2 × 512), so that the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look like Figure 2.4.

It should be noted that the amount of data bytes in each packet must always be a multiple of 8. During this process the router will set the M bit in the flags of the first and second datagram to indicate that there are more packets coming. As regards the Offset field, in the first packet it is set to 0 because this datagram carries the first part of the original packet. However, the second datagram will have the Offset set to 64, since its first byte of data is the 513th (512/8 = 64).

Figure 2.4 Datagram fragmentation
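The arithmetic of this example can be reproduced with a short Python sketch (not part of the thesis tooling). The function below simply splits a datagram of a given total length over a given MTU, keeping every fragment's data size a multiple of 8 bytes except for the last one.

def fragment_sizes(total_length, mtu, header=20):
    # Split an IP datagram into fragments that fit the given MTU.
    data_left = total_length - header
    per_fragment = (mtu - header) // 8 * 8    # largest multiple of 8 that fits
    fragments, offset = [], 0
    while data_left > 0:
        data = min(per_fragment, data_left)
        fragments.append({"data": data,
                          "total": data + header,
                          "offset": offset // 8,            # offset is counted in 8-byte units
                          "more_fragments": data_left > data})
        offset += data
        data_left -= data
    return fragments

# Example from the text: a 1420-byte datagram over a 532-byte MTU yields
# fragments of 512, 512 and 376 data bytes with offsets 0, 64 and 128.
print(fragment_sizes(1420, 532))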

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link layer network. To develop a mapping between link layer addresses and IP addresses, the Address Resolution Protocol (ARP) is required, so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out if it already has the link layer address (MAC) of the receiver. If it is not there, a new ARP request message is sent, which carries its own IP and link layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own IP address. The servers with different IP addresses drop the packet, but the receiver we are looking for sends an ARP reply message to the client. This server also updates its ARP cache with the link layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of the following two types of sublayers. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called a bridge entity, which provides an interface between LANs that can be using the same (for instance Ethernet to Ethernet) or different protocols.

Figure 2.5 ARP request

Figure 2.6 Ethernet layers in OSI model

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as of analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering them from communication errors.

The physical layer enables the communication between the data link layer and the respective physical layer of other systems. In addition, this layer provides significant physical features of Ethernet, such as voltage levels and timing, but the most important functions are related to data encoding and channel access. This layer can code and decode bits between binary and phase-encoded form. Regarding access to the channel, this level sends and receives the encoded data we spoke about before and detects collisions in the packet exchange.

2.1.4 UDP protocol

The User Datagram Protocol (UDP) is an IP standard defined in RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of the TCP protocol, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable. UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore, this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet and the maximum number of bytes is 65527 for IPv6 [21]. When a UDP datagram is sent, the data and the header go together in the IP network layer and the computer has to fill the fields of the UDP header in the proper way. The scheme of the UDP protocol is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 2.7 UDP protocol header


The UDP header is composed of four fields [15], each one containing 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port where the reply should be addressed if nothing changes. The Destination Port is the destination address where the packet will be sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission. Each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and setting an order for the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports from 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each one of these ports is used for one particular application.

2.1.5 TCP protocol

The Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer and used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up. Each endpoint is defined by two parameters: the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams in order when they are coming from the IP protocol. In addition, this protocol allows the data management to create fragments of different lengths and forward them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in Figure 2.8.

Figure 2.8 TCP protocol header

The field Source Port identifies the sender port, as the Destination Port does with the receiver port. The fields Sequence Number and Acknowledgement Number will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length will always be a multiple of 32 bits. The next field (called reserved in the picture) is unused for now and is set to zero. The flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet carries an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender to ask the receiver to deliver the data to the application immediately. Finally, the RST flag is set to reset the connection.

Another important issue is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission is more reliable, since this field is used to check the integrity of the segment. The next field in the TCP header is called Urgent Pointer and its function is to inform where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable depending on what kind of options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way to set up a TCP connection is shown in Figure 2.9.

Figure 2.9 Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and there is a number carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, and with its own beginning sequence number. Both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.
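These flags and numbers can also be observed by crafting segments by hand with scapy [12]. The sketch below is illustrative only; the addresses and ports are taken from the captures shown later in Chapter 3, and this is not how the thesis scripts open connections.

from scapy.all import IP, TCP, sr1

# Craft a SYN towards the server and wait for its answer (illustration only).
syn = IP(dst="192.168.1.33") / TCP(sport=49588, dport=50007, flags="S", seq=0)
synack = sr1(syn, timeout=2)

# A SYN+ACK must have both the SYN (0x02) and ACK (0x10) bits set.
if synack is not None and (synack[TCP].flags & 0x12) == 0x12:
    print("server ISN:", synack[TCP].seq, "ack:", synack[TCP].ack)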

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives the packet, it sends an acknowledgement with the FIN flag set to 1 and keeps on sending the packets in progress. Afterwards, the client informs its application that a FIN segment was received and sends another packet with the FIN flag to the server to end the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called "sliding window" [15], where it is possible to define a number of sequences that do not need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 2.10 Sliding window method

The window moves to the right when the client receives the ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

The round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of the packet is received (ignoring retransmissions) [22]. This time is measured with several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network and so on. The RTT has an established minimum time, since it cannot be less than the time the signals take to go through the network. The formula to estimate the RTT within a network is shown in equation (2.1):

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT     (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11 Example RTT interval
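Equation (2.1) is a simple exponentially weighted moving average. A minimal Python sketch is shown below, assuming α = 0.85 (inside the 0.8-0.9 range recommended above); the initial estimate and the samples are example values only.

def update_rtt(estimated_rtt, sample_rtt, alpha=0.85):
    # Equation (2.1): weight the old estimate by alpha and the new sample by (1 - alpha).
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

estimate = 100.0                        # initial estimate in milliseconds (example value)
for sample in (140.0, 90.0, 110.0):     # measured samples (example values)
    estimate = update_rtt(estimate, sample)
    print(round(estimate, 1))           # 106.0, 103.6, 104.6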

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets in a row, one after the other, with a certain distance between them. However, problems with network congestion, queues or configuration errors cause this distance between packets to vary. The effect of jitter can be seen in Figure 2.12.

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly in time. Therefore, it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short space of time in order to reorder them if necessary and leave the same distance between each packet. The main problem of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, the new packets that arrive will be dropped and will never reach their destination.


Figure 2.12 Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, and this depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be combined in the following three formulas:

Latency = Propagation + Transmit + Queue     (2.2)

Propagation = Distance / SpeedOfLight        (2.3)

Transmit = Size / Bandwidth                  (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that are transmitted within the network during a second [15]. There is an important relationship between bandwidth and latency. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of this pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13 Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that the pipe can contain at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits     (2.5)

If more bandwidth is required, the problem is solved by simply adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime            (2.6)

TransferTime = RTT + TransferSize / Bandwidth       (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in theory. However, due to inefficiencies of implementation or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.
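The following small sketch puts equations (2.5), (2.6) and (2.7) together for the 45 Mbps channel of the example above. The 1 MB transfer size and the round trip time are assumptions added only for illustration.

LATENCY = 0.050            # one-way delay of the example channel, in seconds
BANDWIDTH = 45e6           # bits per second

# Equation (2.5): bits that fit in the "pipe" at a given instant
print("bits in the pipe:", LATENCY * BANDWIDTH)          # 2250000.0

RTT = 0.100                # assumed round trip time (twice the one-way latency)
TRANSFER_SIZE = 8e6        # an assumed 1 MB transfer, expressed in bits

transfer_time = RTT + TRANSFER_SIZE / BANDWIDTH          # equation (2.7)
throughput = TRANSFER_SIZE / transfer_time               # equation (2.6)
print("throughput: %.1f Mbps" % (throughput / 1e6))      # about 28.8 Mbps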

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.


2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle attacks against SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit. This tool can terminate the SSL/TLS connection and launch a new SSL/TLS connection to the same receiver address. The goal of this tool is to be helpful in testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems and remove errors in protocol implementations. This program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets running in real time, the details of a selected packet and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file in order to use it in a future analysis. These tcpdump files can also be opened with software like Wireshark. Moreover, tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider, network. A proxy is located in the middle of the communication between sender and receiver. The proxy receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may cache web sites. This happens each time a user from a local network asks for some URL: the proxy that receives this request stores a temporary copy of the URL. The next time that a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to find the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 2.14.

Figure 2.14 Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent hackers from getting internal addresses, since these proxies can block the access between two networks. Proxies can take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.
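A minimal sketch of the kind of boto (2.x) calls this involves is given below. The region, AMI identifier, key name and credentials are placeholders, not the values used by the thesis scripts.

import time
import boto.ec2

# Credentials may also be stored in the ~/.boto configuration file instead.
conn = boto.ec2.connect_to_region(
    "eu-west-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY")

# Launch one machine; the AMI id, instance type and key name are placeholders.
reservation = conn.run_instances(
    "ami-00000000", instance_type="m1.large", key_name="my-key")
instance = reservation.instances[0]

while instance.state != "running":   # poll until the instance is up
    time.sleep(5)
    instance.update()
print(instance.public_dns_name)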

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all over the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more both in homes and in companies due to all its functionalities. Linux offers many network applications, so it could be very useful for this thesis.

We have described in this chapter many issues about networks which are crucial in the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and server. This is an easy way to start setting up a connection and design a methodology with tools for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1 Structure client server

The tool chosen to program with is Python. This is a high-level programming language, very advisable for network programming due to its ease of handling in this field.


When it comes to programming the client for this application, it was needed to set the server Inet address and a random port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to the socket. Finally, we made the socket wait for incoming packets from the client and accept the connection.
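A minimal sketch of the kind of code just described is shown below, assuming the port 50007 seen in the simulations. The hostname handling is simplified, and this is not the actual code of the thesis scripts.

import socket

PORT = 50007                      # port used in the simulations

def run_server():
    # Create a socket, bind it to every local interface and wait for one client.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, addr = srv.accept()     # blocks until the client connects
    data = conn.recv(4096)
    conn.sendall(data)            # answer the client with the same data
    conn.close()
    srv.close()

def run_client(server_address):
    # Create a socket, connect it to the server address and exchange one message.
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((server_address, PORT))
    cli.sendall(b"hello")
    print(cli.recv(4096))
    cli.close()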

The packets required to establish a client-server connection are shown in Listing 3.1.

Listing 3.1 Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0

This exchange of packets is called the three-way handshake. Analyzing these segments we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, and with a random sequence number x. Wireshark shows by default a relative sequence number starting at zero. The answer from the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, only with the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection at both points; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2 Terminate connection

1  "0.671945","192.168.1.33","192.168.1.24","TCP","60","EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

32 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in the figure 32. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 32 Structure client proxy server

A proxy has been set up in the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of communications and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. And secondly, to run scripts in an instance it was necessary to log in there and install the libraries required by that script. Moreover, some programs such as Tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready we could move on to writing the script to create the scenario and run simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be done automatically. This script creates a scenario comprised, in the simplest case, of three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed with python, due to its ease of developing anything related with networks.
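A minimal sketch of what such a boto-based launcher could look like is shown below; the AMI identifier, key pair and security group are placeholders, and the real Simulation.py may be structured differently.

# Illustrative sketch of launching the scenario instances with boto 2.
import time
import boto.ec2

def launch(conn, count, instance_type):
    res = conn.run_instances(
        "ami-xxxxxxxx",                 # placeholder AMI
        min_count=count, max_count=count,
        instance_type=instance_type,    # e.g. "t1.micro" or "c1.xlarge"
        key_name="my-key",              # placeholder key pair
        security_groups=["taas-test"])  # placeholder security group
    instances = res.instances
    while any(i.state != "running" for i in instances):
        time.sleep(5)                   # wait until the instances are running
        for i in instances:
            i.update()
    return instances

conn = boto.ec2.connect_to_region("eu-west-1")
server = launch(conn, 1, "m1.large")[0]
proxy = launch(conn, 1, "m1.large")[0]
sources = launch(conn, 3, "t1.micro")   # one instance per data source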

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending packets with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source. This segment is the response from the proxy, with the SYN and ACK flags set to 1. This indicates the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in the list 33 and is called the 3-way handshake [31].

Listing 33 Establishing data source-proxy connection

"1","0.000000","10.34.252.34","10.235.11.67","TCP","74","45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2","0.000054","10.235.11.67","10.34.252.34","TCP","74","ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3","0.000833","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the DNS name of the server. Then the proxy looks up the IP address of that server by sending DNS packets. We can see this in the list 34.

Listing 34 Searching server IP address

"4","0.000859","10.34.252.34","10.235.11.67","HTTP","197","CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6","0.001390","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7","0.002600","172.16.0.23","10.235.11.67","DNS","166","Standard query response 0xb33a"
"8","0.002769","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9","0.003708","172.16.0.23","10.235.11.67","DNS","124","Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between proxy and server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in the list 35.

Listing 35 Establishing proxy-server connection

"10","0.003785","10.235.11.67","10.224.83.21","TCP","74","33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11","0.438963","10.224.83.21","10.235.11.67","TCP","74","50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12","0.439029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 OK (connection established) response gets to the data source, so the connection is now ready to start sending data. In these simulations it was decided to send data from time to time, with random time periods in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
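A minimal sketch of such a data source in python is shown below. The proxy port (3128 is Squid's default), the server name and the burst parameters are assumptions and not the exact values of the thesis scripts.

# Illustrative data source: tunnel through Squid with CONNECT, then send bursts.
import random
import socket
import time

PROXY = ("10.235.11.67", 3128)   # proxy address from the captures; port assumed (Squid default)
SERVER = "ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"   # server name from List 34
PORT = 50007

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(PROXY)
# ask the proxy to open a tunnel towards the server (the CONNECT request of List 34)
request = "CONNECT %s:%d HTTP/1.1\r\nHost: %s:%d\r\n\r\n" % (SERVER, PORT, SERVER, PORT)
sock.sendall(request.encode())
reply = sock.recv(4096)          # expect an "HTTP/1.0 200 Connection established" answer
assert b"200" in reply

burst = b"x" * 1980              # data burst of 1980 bytes
for _ in range(200):             # up to 200 repetitions
    sock.sendall(burst)          # send the burst through the tunnel
    sock.recv(4096)              # the server answers with data
    time.sleep(random.choice([1, 2]))   # random waiting time of 1 or 2 seconds
sock.close()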

The eight packets which compose the exchange of data between data source and server

are shown in the list 36

Listing 36 Exchange of data source-proxy-server

"15","0.466800","10.34.252.34","10.235.11.67","TCP","71","45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16","0.466813","10.235.11.67","10.34.252.34","TCP","66","ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17","0.466975","10.235.11.67","10.224.83.21","TCP","71","33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18","0.467901","10.224.83.21","10.235.11.67","TCP","66","50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19","0.468018","10.224.83.21","10.235.11.67","TCP","71","50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20","0.468029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21","0.468083","10.235.11.67","10.34.252.34","TCP","71","ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22","0.508799","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In this list 36, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in the list 36 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set in a different instance, and the number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source In this section we

show the graphs with the number of bytes going through the proxy just with one client

connected The Figure 33 represents the number of bytes over time in the proxy in the

simulation with 1980 bytes of data Furthermore the Figure 34 represents the other

simulation with heavier traffic load The type of instance used is the same in both

examples

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in the Figure 34 is approximately three times bigger than in Figure 33. This makes sense since the data sent is three times bigger as well, therefore around triple the number of packets is needed. Another issue to point out is that the Figure 34 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.
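The byte counts shown in these figures come from the captures recorded in the proxy. A sketch of how such a bytes-over-time series could be computed from a pcap file, using the dpkt library introduced later in this thesis, is shown below (the capture file name and the one-second bins are assumptions).

# Illustrative computation of bytes per second from a proxy capture.
from collections import Counter
import dpkt

bytes_per_second = Counter()
with open("proxy_capture.pcap", "rb") as f:        # placeholder file name
    first_ts = None
    for ts, buf in dpkt.pcap.Reader(f):
        if first_ts is None:
            first_ts = ts
        bytes_per_second[int(ts - first_ts)] += len(buf)   # add the frame size to its 1 s bin

for second in sorted(bytes_per_second):
    print(second, bytes_per_second[second])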

33 Loading test with several clients

After the simulations with one client it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do so we created a similar environment, but in this case with a variable number of data sources. This whole scenario is created with a python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in the Figure 35. Using the Amazon cloud it is possible to set one client in each instance, therefore the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 35 Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the

previous section but in this case with ten data sources the maximum number tested

The Figure 36 shows the number of bytes going over the proxy with data burst of 1980

bytes as well as Figure 37 does with data burst of 5940 bytes

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes


The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all we compare two graphs which represent the average RTT of two simulations

differing only in the number of data sources For the Figure 38 packets were being sent

to the server from three different instances however in the Figure 39 there were up to

ten data sources working

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in the Figure 38. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.
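One simplified way to estimate such averages directly from the captures is to match every data segment to the acknowledgement covering it; a dpkt-based sketch is shown below. It ignores retransmissions, and the capture file name is a placeholder, so it is only an approximation of how these values can be obtained.

# Illustrative average RTT estimation from a pcap file.
import dpkt

def average_rtt(path):
    pending = {}    # (src, dst, sport, dport, expected ack) -> send timestamp
    samples = []
    with open(path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if len(tcp.data) > 0:
                key = (ip.src, ip.dst, tcp.sport, tcp.dport, tcp.seq + len(tcp.data))
                pending.setdefault(key, ts)        # remember when the data left
            if tcp.flags & dpkt.tcp.TH_ACK:
                # the ACK travels in the opposite direction of the data it confirms
                key = (ip.dst, ip.src, tcp.dport, tcp.sport, tcp.ack)
                if key in pending:
                    samples.append(ts - pending.pop(key))
    return sum(samples) / len(samples) if samples else None

print(average_rtt("proxy_capture.pcap"))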


Figure 39 Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over the Tables 31 and 32 we do not see large differences. Moreover, the lower we go in the table, the shorter the RTT should be; however, this does not apply in every case, therefore the type of instance is not very significant here. The simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients there is a slight difference, especially when comparing the RTT with 5 or 10 data sources against only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type    1 source    3 sources    5 sources    10 sources
t1.micro                0.0031      0.0046       0.0033       0.0039
m1.large                0.0037      0.0035       0.0038       0.0032
c1.medium               0.0031      0.0035       0.0051       0.0048
c1.xlarge               0.0039      0.0043       0.0037       0.0042

Table 31 RTT (in seconds) with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results shown are averages, since several simulations were carried out for each case.

In the Table 33 we have the average number of TCP packets which have been retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                0.0026      0.0022       0.0021       0.0029
m1.large                0.0026      0.0024       0.0028       0.0024
c1.medium               0.0028      0.0031       0.0025       0.0030
c1.xlarge               0.0026      0.0029       0.0029       0.0024

Table 32 RTT (in seconds) with data bursts of 5940 bytes

With this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

The Table 34 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with more difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance. Here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in the Table 35 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates that it is more difficult to send traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
t1.micro                5940 bytes    1.5         0            2            2
m1.large                1980 bytes    0           0            0            0
m1.large                5940 bytes    0           2.5          0            0
c1.medium               1980 bytes    0           0            0            0
c1.medium               5940 bytes    0           0            0            0
c1.xlarge               1980 bytes    0           0            0            0
c1.xlarge               5940 bytes    0           0            0            2.5

Table 33 Number of TCP retransmissions

Overall in this chapter the traffic pattern was analyzed and the network was tested


Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
t1.micro                5940 bytes    2           6            15           67
m1.large                1980 bytes    0           0            0            0
m1.large                5940 bytes    0           5.5          1            36
c1.medium               1980 bytes    0           0            0            2
c1.medium               5940 bytes    0           7            13.5         50.5
c1.xlarge               1980 bytes    0           0            0            5
c1.xlarge               5940 bytes    0.5         5            9            54.5

Table 34 Number of lost packets

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           1            0            6.5
t1.micro                5940 bytes    3           1            7.5          2.5
m1.large                1980 bytes    0           0            0            0
m1.large                5940 bytes    0           0            0            0
c1.medium               1980 bytes    0           0            0            4.5
c1.medium               5940 bytes    0           0            2.5          2.5
c1.xlarge               1980 bytes    0.5         0            0            0.5
c1.xlarge               5940 bytes    0.5         0            0            0

Table 35 Number of duplicate ACK

to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little in the different tests, most probably due to the fact that the data sent does not stress the server enough. However, in the last analysis the values measured related with network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important to improve the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step where we extracted

the traffic pattern from these simulations All the process is explained in the following

chapter

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server later on. It was necessary to find a method that defines how to develop a proper extraction so that the traffic can be generated again. To do so we looked into several documents and projects that explain how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as similarly as possible. The first one is the packet length [33][34]: this required packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets need a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it was significant to send the same number of packets [33].

41 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics

of the packets We used the files recorded during the previous simulations in the proxy

instance The next step was to program a script in python made especially to obtain the

features needed from every packet The best option to make this possible was a python

library called dpkt [36] Using this library a script was written to collect the required

data from a pcap file such as packet time stamp length and data sent
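A minimal sketch of what an Extractpattern.py-style extraction with dpkt could look like is given below. The data source address is the one seen in the earlier captures, while the burst-gap threshold and the filtering of the CONNECT request are assumptions made for illustration.

# Illustrative extraction of (timestamp, data) bursts from a pcap file with dpkt.
import socket
import dpkt

SOURCE_IP = "10.34.252.34"    # data source address seen in the captures

def extract_bursts(path, gap=0.5):
    # Packets closer together than `gap` seconds are grouped into one burst.
    bursts = []
    first_ts = None
    last_ts = None
    with open(path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            if first_ts is None:
                first_ts = ts                 # time of the first captured packet
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if socket.inet_ntoa(ip.src) != SOURCE_IP or len(tcp.data) == 0:
                continue                      # keep only payload packets from the data source
            if tcp.data.startswith(b"CONNECT"):
                continue                      # drop proxy-related segments (filtered out, see below)
            if last_ts is None or ts - last_ts > gap:
                bursts.append([ts - first_ts, b""])   # start a new data burst
            bursts[-1][1] += tcp.data         # append the payload to the current burst
            last_ts = ts
    return [(t, data) for t, data in bursts]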

To recreate the traffic, the script had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found out that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the same way the data source sent packets in the simulations, therefore this method was much better to recreate the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was used, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP and a few DNS segments, which were meaningless for recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

42 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saved in a file the information gathered from each data burst as well as its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, this program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resent the traffic using socket programming in python, like in the simulations. Both the file and this script were deployed in an instance from where the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets in an accurate manner. I have to point out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
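A minimal sketch of a Replaytraffic.py-style sender is shown below. It reuses the bursts produced by the extraction sketch above; the server address and port are placeholders.

# Illustrative replay of the extracted bursts towards the server with the original timing.
import socket
import time

def replay(bursts, server=("server.example.internal", 50007)):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server)              # direct M2M connection, no proxy in between
    start = time.time()
    for offset, data in bursts:       # bursts = [(relative_time, payload), ...]
        delay = offset - (time.time() - start)
        if delay > 0:
            time.sleep(delay)         # wait until the original send time of the burst
        sock.sendall(data)            # send the whole burst at once
        sock.recv(4096)               # read the server's answer
    sock.close()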

When replaying traffic M2M it was important to recreate the same traffic load as in the original capture. With this approach we could compare them to draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from data source to proxy. This sniffed data was replayed twice M2M with the second script, so that over the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy allows receiving the same data from the server as well, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in the Figure 41. The figure on the left shows the network traffic in the simulations with data source, proxy and server. Furthermore, the figure on the right is the result of implementing the strategy mentioned before, M2M. As we can see, the amount of data and number of packets is the same.

The results of following this strategy are shown in the Figure 42 In the graphs we


Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph where the traffic is exactly the same. Then there is a slightly different point, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not get worse and worse. Therefore this method to recreate the traffic is very accurate regardless of how long the simulation lasts. Only in a few points do the graphs not match perfectly; this is due to the fact that the server response is something that we could not control. Another important issue is related with the duration: both graphs seem to finish approximately at the same time. Overall, the graphs are very similar in amount of bytes sent and duration. Therefore this approach to extract the traffic pattern is very accurate.

In this chapter we explained the method to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in the Figure 42, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related with the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern at large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

51 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances where to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances in order to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable In the simulation

the data source characteristics could be easily modified For instance we could choose

amount of data to send frequency of data bursts and number of repetitions Therefore it

was possible to generate light normal or heavy traffic loads depending on the frequency

and the data sent In addition we could create short or long simulations choosing number

of repetitions The recordings obtained from these simulations are then multiplied M2M

so that the configuration is the same both in the simulations and the traffic recreations

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important characteristics when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when it comes to replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. And finally, we look into the breaking point to figure out the server capacity.

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically, in a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had got a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased it up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of players was increased one at a time, every five seconds.
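In the thesis every client runs in its own EC2 instance; as a local approximation, a sketch of this ramp-up logic using one process per client and the replay function sketched in Chapter 4 could look like this (the module name in the import is a placeholder):

# Illustrative ramp-up: start one more replaying client every five seconds.
import multiprocessing
import time

from replaytraffic_sketch import replay   # placeholder module: the replay() sketched earlier

def ramp_up(bursts, total_clients=80, step_seconds=5):
    workers = []
    for _ in range(total_clients):
        p = multiprocessing.Process(target=replay, args=(bursts,))
        p.start()                     # one more data source starts replaying
        workers.append(p)
        time.sleep(step_seconds)      # add the next client five seconds later
    for p in workers:
        p.join()                      # wait for every replay to finish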

The Figure 51 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line seems to be twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in the Figure 51. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems to send data appear very soon. After 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should go up approximately until 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent faster; this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 52 Bytes using an m1.large instance for the server

Now we can compare the Figure 52 with the results achieved by running the same tests but with a higher quality instance for the server. We used the type c1.xlarge. If we look over the Figure 53 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has similar peaks and number of bytes sent as in the Figure 52. However, the recreation with 80 clients is much higher using the best instance: the gap between graphs is about three times larger in Figure 53. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 53 Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in the Figure 54, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 54 Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in the Table 51. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related with the number of clients sending traffic. In most of the cases, for one or two clients there is not even one segment lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we achieved some results relating to the number of bytes per second, where the exchange of packets always reached a limit before being able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot: with many clients the RTT graph becomes smoother and its RTT average gets higher. Finally, the Table 51 gave us significant information about the sending of packets. This table informs about the risk in the packet delivery when connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc). In the next step we came down to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes was going up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs had more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse: the similarity between both graphs was permanent. Therefore, with this TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared among different types of server instances.

Overall, there seem to be good results about the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have shown the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

62 Future Work

In this thesis the TaaS system created is based on TCP protocols, and this system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes

32 RTT with data bursts of 5940 bytes

33 Number of TCP retransmissions

34 Number of lost packets

35 Number of duplicate ACK

51 Percentage of lost packets

List of Figures

11 Flow diagram of the developed system

21 OSI model

22 HTTP request

23 Fields of the IP Header

24 Datagram fragmentation

25 ARP request

26 Ethernet layers in OSI model

27 UDP protocol header

28 TCP protocol header

29 Establishing a connection in TCP

210 Sliding window method

211 Example RTT interval

212 Jitter effect

213 Relation between Latency and Bandwidth

214 Proxy operation

31 Structure client server

32 Structure client proxy server

33 Bytes through the proxy with data burst of 1980 bytes

34 Bytes through the proxy with data burst of 5940 bytes

35 Structure for simulation

36 Bytes through the proxy with data burst of 1980 bytes

37 Bytes through the proxy with data burst of 5940 bytes

38 Average RTT with 3 data sources

39 Average RTT with 10 data sources

41 Structure of traffic replayed M2M

42 Comparison between simulation and replayed traffic

51 Number of bytes over time in different tests

52 Bytes using an m1.large instance for the server

53 Bytes using a c1.xlarge instance for the server

54 Average RTT extracted from the traffic recreations

REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ (Does tcpreplay support sending traffic to a server?). Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump & libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


Figure 11 Flow diagram of the developed system

CHAPTER 2

Related work

There are many different communication protocols and they are required to establish

connections within the network and transfer data among hosts All of them have their

place in the OSI model [15] where they are classified depending on their function In this

section the main protocols are explained properly since they are essential to analyze a

client-server communication Moreover some significant data needed to measure network

performance in testing are described as well as the tools to carry out simulations and

analyze segments

21 Communication protocols

In order to develop this thesis it was crucial to give a description of the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have a detailed knowledge of the protocols when it comes to recreating the traffic pattern.

Protocols are the objects that make up the different layers of the OSI model in order to establish a communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface, offered to the other objects on the same machine that want to use the services of this protocol. The other interface is called the peer interface, and it defines what is exchanged with the protocol's equivalent on another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole picture where they all take place. To avoid a system becoming too complex, it is necessary to add levels of abstraction. In network systems this is also applied, creating layers, each with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it will not be necessary to modify every part, but only the one where the service will be introduced. In networks the architecture chosen is named the OSI model [15]. Networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in the Figure 21.

Figure 21 OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related with the hardware, such as the type of cables and connectors, or with the network topology (bus, ring, star and so on). This layer also determines the voltage and frequency that signals will use. As for the data link layer, it transmits the data from upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data in the network. To ensure packets get to the destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable ones (segmentation process). The most significant protocols here are TCP and UDP, about which we will talk later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

211 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is a protocol in the application level used

for distributed collaborative hypermedia information systems This network protocol

is used for communication among users proxies or gateways to other Internet systems

HTTP is used to deliver files but another important function of this protocol is linked to

the transmission of resources A resource is a network data object that can be identified

by a URI Normally these resources are either files or outputs of a script Each HTTP

message has the general form shown in the List 21 [15]

Listing 21 HTTP message

START LINE <CRLF>

MESSAGE HEADER <CRLF>

<CRLF>

MESSAGE BODY <CRLF>

The first line shows whether this is a request or a response message. The next lines provide parameters and options for the message; there are different kinds of header lines in HTTP and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol, and the port used by default is 80, but it is possible to use other ports. Two important HTTP request methods are GET and POST [16]; GET is used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis, and its name is CONNECT [16]. This method is used to send data through a proxy that can act like a tunnel. Therefore, we needed this method to establish a connection between client and server through the proxy.

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request is sent, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request, therefore they cannot keep information about previous requests. Any kind of data can be transmitted by HTTP as long as client and server know how to manage it. A typical example of an HTTP request is shown in the Figure 22.

To set up a communication with this protocol a client must open a connection sending

a request message to the server which returns a response message Afterwards the


Figure 22 HTTP request

server will close the connection. First of all we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource; this part is called the URI. And finally, the version of HTTP that is being used. This idea can be clearly seen in the List 22. This example was extracted from the simulations made during the thesis.

Listing 22 HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided in three parts The initial

part involves the version of HTTP used for the communication Afterwards there will

be a code [15] for the computer to understand the result of the request The first digit

indicates the class of response We have the codes shown in the List 23

Listing 23 HTTP request result


1xx: informational message

2xx: success in the connection

3xx: redirects the client to another URL

4xx: error linked to the client

5xx: error linked to the server

Finally there is a word or sentence in English to describe the status of the connection

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header involves information about either the request, the response, or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in a response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in the List 24.

Listing 24 HTTP header lines

User-agent: Mozilla/3.0

Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files which will be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about that body. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which provides the number of bytes used in the body.
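For instance, an illustrative response carrying a small HTML page (not taken from the thesis captures) could look like this:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 56

<html><body><h1>Example response body</h1></body></html>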

212 IP

Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information is going to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning the delivery, IP has a service model called best effort, which provides an unreliable datagram delivery; therefore it is not ensured that the datagrams reach their destinations. In addition, this service model may cause more problems, since the packets can be delivered out of order as well as reach the destination more than once.


IP Header

Figure 23 shows all the fields carried in the IP header.

Figure 23 Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of service. The Total Length field indicates the length in bytes (unlike the Header Length, which is counted in words) of the whole datagram. Concerning the Identification field, the sender marks each IP datagram with an ID number before transmission. The goal is to identify the datagram uniquely, so that if several fragments arrive at the destination, all carrying the same ID value, the destination host can reassemble the fragments received. If some fragment does not arrive, all the fragments with the same ID will be discarded. In the next field there are up to three flags. The first flag has no use for now and is set to 0. The D flag (Don't Fragment), when set to 1, indicates that the datagram must not be fragmented into smaller pieces. The M flag (More Fragments) indicates whether the datagram received is the last one of the sequence (set to 0) or whether there are more fragments left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of each fragment's data within the original datagram, so that the receiver can put the fragments back in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may remain on the network before being discarded. The main goal of this function is to discard datagrams that wander the network but never reach the receiver.


The next field is called Protocol and indicates the kind of protocol carried in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during transmission. To send a packet, the Source Address field must be filled with the IP address of the sender and the Destination Address field with the IP address of the receiver. There is also a field to set some options, if required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.
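As a small, hedged sketch of how these fields can be read in practice (this is illustrative Python using the standard struct module, not code from the thesis), the fixed 20-byte part of an IPv4 header can be decoded as follows; the field layout is the one described above.

import struct

def parse_ipv4_header(raw):
    """Minimal sketch: decode the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, tos, total_length, identification, flags_offset,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    version = ver_ihl >> 4                 # Version (high nibble)
    header_len = (ver_ihl & 0x0F) * 4      # Header Length in 32-bit words -> bytes
    flags = flags_offset >> 13             # reserved bit, D bit, M bit
    frag_offset = flags_offset & 0x1FFF    # Fragment Offset, in 8-byte units
    return {
        "version": version, "header_len": header_len, "tos": tos,
        "total_length": total_length, "id": identification,
        "flags": flags, "frag_offset": frag_offset, "ttl": ttl,
        "protocol": protocol, "checksum": checksum,
        "src": ".".join(str(b) for b in bytearray(src)),
        "dst": ".".join(str(b) for b in bytearray(dst)),
    }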

Fragmentation and Reassembly

Since IP provides host-to-host service across many different networks with diverse technologies, datagrams must be managed so that they can traverse all of them. There are two ways to solve this problem [15]. The first is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second is to fragment and reassemble packets when they are too big to go through some network. The second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. It is also the option used in the Amazon networks where we ran the tests. Knowing how the segments are fragmented is important in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size when sending IP datagrams. If the packets happen to cross a network with a smaller MTU, fragmentation is required. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will each contain 512 bytes of data plus 20 bytes of header. Therefore there are 376 bytes left (1400 − 2 × 512), so the last datagram carries those 376 bytes of data plus 20 bytes of header. The result is shown in Figure 24.

It should be noted that the amount of data in each fragment (other than the last) must always be a multiple of 8 bytes. During this process the router sets the M bit in the Flags field of the first and second datagram to indicate that there are more fragments coming. As regards the offset field, in the first packet it is set to 0 because this datagram carries the first part of the original packet, whereas the second datagram has the Offset set to 64, since its first byte of data is the 513th and the offset is expressed in 8-byte units (512 / 8 = 64).


Figure 24 Datagram fragmentation
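The arithmetic of this example can be reproduced with a few lines of Python. The sketch below is only an illustration of the fragmentation rule described above (data chunks that are multiples of 8 bytes, offsets in 8-byte units); it is not part of the thesis scripts.

def fragment_sizes(total_len, mtu, ip_header=20):
    """Return (data_bytes, offset_in_8_byte_units) for each fragment of a
    datagram of total_len bytes crossing a network with the given MTU."""
    data_len = total_len - ip_header
    max_data = ((mtu - ip_header) // 8) * 8   # largest multiple of 8 that fits
    fragments, offset = [], 0
    while data_len > 0:
        chunk = min(max_data, data_len)
        fragments.append((chunk, offset // 8))
        offset += chunk
        data_len -= chunk
    return fragments

print(fragment_sizes(1420, 532))   # -> [(512, 0), (512, 64), (376, 128)]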

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link-layer network. To build a mapping between link-layer addresses and IP addresses, the Address Resolution Protocol (ARP) is used, so that the physical interface hardware on the node can understand the addressing scheme.

The method to obtain the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out whether it already has the link-layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, carrying the sender's own IP and link-layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own; the hosts with a different IP address drop the packet, but the receiver we are looking for sends an ARP reply message to the client. This host also updates its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 25.

213 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and the MAC client (defined by IEEE 802.2). The structure is shown in Figure 26.

The MAC client must be one of two types of sublayer. The first is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called bridge entity and provides an interface between LANs that may use the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing received frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering from transmission errors.

Figure 25 ARP request

Figure 26 Ethernet layers in OSI model

The physical layer enables communication between the data link layer and the corresponding physical layer of other systems. In addition, this layer defines significant physical features of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer can code and decode bits between binary and phase-encoded form. Concerning access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

214 UDP protocol

User Datagram Protocol (UDP) is defined in the Internet standard RFC 768 [20]. It is used as a transport protocol, so its role is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter, and simpler than TCP; however, it is less reliable. UDP provides a best-effort service to an end system, which means that it does not guarantee the proper delivery of datagrams. Therefore this protocol should not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum size is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together to the IP network layer, and the computer has to fill in the fields of the UDP header in the proper way. The layout of the UDP header is represented in Figure 27.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses, which is important in this thesis since the proxy between client and server needs to work out the server's IP address.

Figure 27 UDP protocol header


The UDP header is composed of four fields [15], each of 2 bytes. The Source Port indicates the port from which the packet was sent and is, by default, the port to which the reply should be addressed unless changed. The Destination Port is the port at the destination to which the packet will be delivered. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.
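As a minimal sketch of the four 2-byte fields just described (illustrative Python, not thesis code; the port numbers are arbitrary), a UDP header can be packed with the struct module as follows. The checksum is left at zero here, which IPv4 allows to mean "not computed".

import struct

def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Pack Source Port, Destination Port, Length and Checksum (2 bytes each)."""
    length = 8 + len(payload)          # Length covers header plus payload
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload

datagram = build_udp_header(45125, 53, b"example payload")
print(len(datagram))                   # 8-byte header + 15 payload bytes = 23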

UDP ports

UDP ports give a location to send and receive UDP messages. Ports are used to separate different kinds of traffic, facilitating and ordering packet delivery. Since the UDP port field is 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers; the destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

215 TCP protocol

Transmission Control Protocol (TCP) is a transport-layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up, where each endpoint is defined by two parameters: the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the segments in order when they come from the IP protocol. In addition, this protocol can split the data into fragments of different lengths before forwarding them to the IP protocol. With TCP it is also possible to transfer data coming from different sources on the same line, multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The layout of this header is shown in Figure 28.

Figure 28 TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port identifies the receiver port. The Sequence Number and Acknowledgement Number fields are explained in more depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called reserved in the picture) is unused for now and is set to zero. The flags field carries additional information about the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet carries an acknowledgement. The URG flag informs that the segment contains urgent data. The PUSH (PSH) flag is set by the sender to ask the receiver to deliver the data to its application immediately instead of buffering it. Finally, the RESET (RST) flag is used to reset the connection.

Another important field is the Window Size, which indicates the number of bytes that the receiver can accept without acknowledgement. The Checksum field makes the transmission more reliable, since it is used to check the integrity of the segment. The next field in the TCP header is the Urgent Pointer, whose function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field varies depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are exchanged. In a TCP connection, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of how a TCP connection is set up is shown in Figure 29.

Figure 29 Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a random number x is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one (x+1) and its own starting sequence number y; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server (y+1). Obviously, the ACK flag must be set to 1 again.

Furthermore, either side, for instance the client, can request to finish the connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending any packets still in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client, which acknowledges it to end the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], in which it is possible to define a range of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and


its size can be modified by the server changing the value in the window size field

Figure 210 Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 210, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

22 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

221 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to travel through the network. The


formula to get the value of the RTT within a network is shown in the equation 21

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT (21)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 211.

Figure 211 Example RTT interval
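The smoothing in equation 21 is easy to reproduce. The following Python sketch (illustrative only; the sample values are hypothetical RTT measurements in seconds) blends each new SampleRTT into the running EstimatedRTT with weight (1 − α).

def estimate_rtt(samples, alpha=0.85):
    """Sketch of equation 21: exponentially weighted moving average of RTT samples."""
    estimated = samples[0]
    for sample in samples[1:]:
        estimated = alpha * estimated + (1 - alpha) * sample
    return estimated

# Hypothetical RTT samples in seconds, e.g. taken from a packet capture
print(estimate_rtt([0.0031, 0.0046, 0.0029, 0.0120, 0.0033]))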

222 Jitter

Jitter is the variation in the delay of the packets sent within a network [15]. A sender transmits many packets one right after the other, with a certain spacing between them. However, network congestion, queueing, or configuration errors cause this spacing between packets to vary. The effect of jitter on the packet spacing can be seen in Figure 212.

Jitter is a significant problem since these fluctuations happen randomly and change very quickly over time; therefore it is crucial to correct it as much as possible. One solution is to use a buffer that receives the packets at irregular intervals, holds them for a short time in order to reorder them if necessary, and plays them out with equal spacing between packets. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full the new packets that arrive will be dropped and never reach their destination.


Figure 212 Jitter effect
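One simple way to quantify this effect (a rough sketch, not the estimator standardized in RFC 3550 and not taken from the thesis) is to measure how much the inter-arrival times deviate from their average spacing:

def mean_jitter(arrival_times):
    """Mean absolute deviation of the inter-arrival gaps (timestamps in seconds)."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return sum(abs(g - mean_gap) for g in gaps) / len(gaps)

# Hypothetical arrival timestamps: nominally 20 ms apart, but disturbed by the network
print(mean_jitter([0.000, 0.021, 0.039, 0.065, 0.080]))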

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a signal takes to get from one point to another at the speed of light. The second factor is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are captured in the following three formulas:

Latency = Propagation + Transmit + Queue (22)

Propagation = Distance / SpeedOfLight (23)

Transmit = Size / Bandwidth (24)

224 Bandwidth

This concept describes the number of bits that can be transmitted through the network in one second [15]. There is an important relationship between bandwidth and latency. To visualize it, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 213.

Figure 213 Relation between Latency and Bandwidth

If we multiply both terms, we obtain the number of bits that the pipe can contain at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits (25)

If more bandwidth is required, the problem can be solved simply by adding more pipes.

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM, and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime (26)

TransferTime = RTT + (1 / Bandwidth) × TransferSize (27)

Where RTT is the round trip time

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in theory. However, due to implementation inefficiencies or errors, a pair of nodes connected with a bandwidth of 10 Mbps will usually achieve a much lower throughput (for instance 2 Mbps), so that data can be sent at 2 Mbps at most.
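The formulas above are straightforward to evaluate. The short Python sketch below (an illustration under our own example values, not thesis code) computes the latency contributions of equations 22-24, checks the pipe example of equation 25, and applies the throughput formulas 26-27 to a hypothetical 1 MB transfer with a 100 ms RTT.

SPEED_OF_LIGHT = 3.0e8   # m/s, rough figure for the propagation estimate

def latency(distance_m, size_bits, bandwidth_bps, queue_s=0.0):
    """Equations 22-24: propagation + transmit + queueing delay, in seconds."""
    return distance_m / SPEED_OF_LIGHT + size_bits / bandwidth_bps + queue_s

def throughput(transfer_size_bits, rtt_s, bandwidth_bps):
    """Equations 26-27: effective throughput in bits per second."""
    transfer_time = rtt_s + transfer_size_bits / bandwidth_bps
    return transfer_size_bits / transfer_time

print(50e-3 * 45e6)                        # bits in flight in the pipe example: 2.25e6
print(latency(1.0e6, 1500 * 8, 45e6))      # hypothetical 1000 km link, 1500-byte packet
print(throughput(8e6, 100e-3, 45e6))       # hypothetical 1 MB transfer, RTT = 100 ms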

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project


231 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool for performing man-in-the-middle interception of SSL/TLS network connections. Connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection to the original receiver address. The purpose of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP, and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture packets and show in detail everything that each packet carries. Overall, the aim of using Wireshark is to troubleshoot and manage network problems, examine security issues, and debug protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. Users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is possible to filter the packets to make searching easier, which makes Wireshark very convenient.

Tcpdump

Tcpdump [25] is a tool to analyze packets travelling over the network. Some reasons to use tcpdump are to verify connectivity between hosts and to look into the network traffic. The tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file for later analysis; these capture files can also be opened with software like Wireshark. Tcpdump provides many options to capture packets in different ways, which gives a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. It is located in the middle of the communication between sender and receiver: the proxy receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites.


This happens each time a user from the local network asks for some URL: the proxy that receives the request stores a temporary copy of the resource. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles incoming requests is shown in Figure 214.

Figure 214 Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not their only function. They may also be used to prevent attackers from learning internal addresses, since proxies can block direct access between two networks; a proxy can thus act as a component of a firewall.

232 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which is very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either supply manually for every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage, and so on.
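As a hedged illustration of this workflow (not one of the thesis scripts; the region, AMI id and key pair name are placeholders, and credentials are assumed to be in the boto configuration file), launching an EC2 instance with Boto looks roughly like this:

import time
import boto.ec2

conn = boto.ec2.connect_to_region("eu-west-1")            # connection object
reservation = conn.run_instances("ami-00000000",          # hypothetical AMI id
                                 instance_type="t1.micro",
                                 key_name="my-key")        # hypothetical key pair
instance = reservation.instances[0]
while instance.state != "running":                         # wait until it boots
    time.sleep(5)
    instance.update()
print(instance.public_dns_name)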

233 Operating Systems

There are several kinds of operating systems, such as Microsoft Windows, Linux, and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created as free software by volunteers and by employees of many companies and organizations from all over the world. The main advantages of Linux are low cost, stability, performance, network functionality, security, and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good support for networking, where client and server systems can be set up easily and quickly. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more both at home and in companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many networking topics that are crucial for the next sections. A good knowledge of these matters is needed later on, when the network traffic is analyzed and recreated.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework in order to extract the traffic pattern properly later on.

31 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for larger-scale testing later. The structure of this connection is shown in Figure 31.

Figure 31 Structure client server

The language chosen for programming is Python, a high-level programming language highly recommendable for network programming due to its ease of use in this field.



When programming the client for this application, it was necessary to set the server's Internet address and a port for the exchange of data, and to create a socket and connect it to that address and port. To program the server, it is required to set the hostname and the same port opened by the client, create a socket, and bind both hostname and port to it. Finally, we made the socket listen for incoming connections from the client and accept them.
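A minimal sketch of this arrangement is shown below. It is only an illustration of the steps just described, not the actual thesis application; the listening address is a placeholder, while port 50007 is the one that appears in the captures.

import socket

SERVER_HOST = "0.0.0.0"   # placeholder listening address
SERVER_PORT = 50007

def run_server():
    """Server side: create a socket, bind, listen, accept and echo the data back."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((SERVER_HOST, SERVER_PORT))
    srv.listen(1)
    conn, addr = srv.accept()
    data = conn.recv(4096)
    conn.send(data)            # reply with the same data, as in the simulations
    conn.close()
    srv.close()

def run_client(server_address):
    """Client side: connect to the server and exchange one small data burst."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((server_address, SERVER_PORT))
    cli.send(b"x" * 1980)      # one data burst of 1980 bytes
    reply = cli.recv(4096)
    cli.close()
    return reply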

In the List 31 the required packets to establish a client-server connection are shown

Listing 31 Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark shows relative sequence numbers starting at zero by default). The answer from the server has the SYN and ACK flags activated, with its own sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

To terminate the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection, and an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both ends; otherwise only one end would be closed and the other could still send data. These two packets are shown in Listing 32.

Listing 32 Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

32 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]; the structure is shown in Figure 32. After setting up this connection, we sent traffic in order to analyze the segments exchanged, measure the performance, and extract a traffic pattern.

Figure 32 Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the communication pattern and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so: firstly, we had to configure the proxy to accept the incoming packets and forward them properly; secondly, to run scripts in the instances it was necessary to log in and install the libraries those scripts require. Moreover, programs such as Tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data


sources. In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests would run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy, and server. The script also gives the possibility of picking the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources, and proxy. Both server and data source were also programmed in Python due to its ease of developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1; this is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source: the response from the proxy with the SYN and ACK flags set to 1, indicating that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 33 and is the three-way handshake [31].

Listing 33 Establishing data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the server's host name. The proxy then looks up the IP address of that server by sending DNS queries. We can see this in Listing 34.

Listing 34 Searching server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"  "10.235.11.67"  "DNS"  "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"  "10.235.11.67"  "DNS"  "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to the server to set up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 35.

Listing 35 Establishing proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" response reaches the data source, so the connection is now ready to start sending data. In these simulations it was decided to send data from time to time with random intervals in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets that compose one exchange of data between data source and server are shown in Listing 36.

Listing 36 Exchange of data source-proxy-server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 36, the packets with the PSH flag set to 1 denote that data is being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 36 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources, and amounts of data. Each data source was placed in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source In this section we

show the graphs with the number of bytes going through the proxy just with one client

connected The Figure 33 represents the number of bytes over time in the proxy in the

simulation with 1980 bytes of data Furthermore the Figure 34 represents the other

simulation with heavier traffic load The type of instance used is the same in both

examples

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in Figure 34 is approximately three times larger than in Figure 33. This makes sense, since the amount of data sent is three times larger as well, so around three times as many packets are needed. Another issue to point out is that Figure 34 is smoother than the other one. This is not only an effect of the scale of the graph, but also because the frequency and the number of segments being sent is higher in the second case.

33 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources; the whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 35. Using the Amazon cloud it is possible to place one client in each instance, so the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 35 Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 36 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 37 does the same with data bursts of 5940 bytes.

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes


Figure 37 shows a much larger number of bytes exchanged compared to Figure 36, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are sent is high.

34 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all we compare two graphs representing the average RTT of two simulations that differ only in the number of data sources. For Figure 38, packets were being sent to the server from three different instances, whereas in Figure 39 there were up to ten data sources working.

Figure 38 Average RTT with 3 data sources

There is no great difference between these graphs, since the data sent did not pose a big problem for the network performance. However, we can see that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 38, while the other graph has many higher peaks, so the RTT in that case is slightly higher. As expected, the more clients there are, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients, and types of instance affect the network performance.


Figure 39 Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. If we look over Tables 31 and 32 we do not see large differences. Moreover, the lower in the table, the shorter the times should be; however, this does not apply in every case, so the type of instance is not very significant here. As far as RTT values are concerned, the simplest instance seems to be enough for these exchanges of data. Concerning the number of clients, there is a slight difference, especially when comparing the RTT for 5 or 10 data sources with that for only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 31 RTT with data bursts of 1980 bytes

The next analysis looked at some characteristics of network performance such as packet loss, TCP retransmissions, and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 33 we have the average number of TCP packets that were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 32 RTT with data bursts of 5940 bytes

With this instance the number of retransmissions is larger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 34 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the simulation with most difficulties in the communication was the one with 10 data sources, the heaviest data burst, and the weakest instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one; nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 35 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           2.5

Table 33 Number of TCP retransmissions

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance.


Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
                       5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
                       5940 bytes   0.5        5           9           54.5

Table 34 Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
                       5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
                       5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
                       5940 bytes   0.5        0           0           0

Table 35 Number of duplicate ACK

We have seen how the RTT varies only a little across the different tests, most probably because the data sent does not stress the server enough. However, the last analysis, of the values related to network performance, gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet-loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (machine to machine) towards the same server. A method was needed to perform a proper extraction so that the traffic could be generated again. To that end we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as close as possible to that of the capture we want to replay. Finally, to recreate realistic network traffic it is important to send the same number of packets [33].

41 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length, and the data sent.
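The following sketch illustrates the kind of extraction described here. It is not the thesis script itself, only a hedged example of how dpkt can walk a capture and collect, per TCP packet, the timestamp, the frame length, and the payload sent towards a given port (50007 is the data port seen in the captures).

import dpkt

def read_packets(pcap_path, data_port=50007):
    """Collect (timestamp, frame_length, payload) for TCP packets to data_port."""
    records = []
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue                      # skip ARP, UDP/DNS, etc.
            tcp = ip.data
            if tcp.dport == data_port and len(tcp.data) > 0:
                records.append((ts, len(buf), tcp.data))
    return records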

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet, which is exactly the way the data source sent packets in the simulations; therefore this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent, so it was easier to compare graphs and the traffic recreation was highly precise.

In the original simulations, where the proxy was in place, a few extra protocols were needed to establish the communication over the proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

42 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing this information, it is ready to replay the data in an accurate and timely manner, with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It should be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
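A rough sketch of the replay idea follows; it is an illustration under our own assumptions, not Replaytraffic.py itself. Here 'bursts' is assumed to be a list of (time offset in seconds, payload) pairs extracted from the capture, and each payload is sent at its recorded offset over a single TCP socket.

import socket
import time

def replay_bursts(bursts, server_address, port=50007):
    """Send each recorded burst at its original time offset and read the echo."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((server_address, port))
    start = time.time()
    for offset, payload in bursts:
        delay = offset - (time.time() - start)
        if delay > 0:
            time.sleep(delay)          # wait until the recorded send time
        sock.send(payload)
        sock.recv(4096)                # the server echoes the data back
    sock.close()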

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw conclusions, and check the accuracy of the method. To achieve this we had to filter the data sent from data source to proxy. This sniffed data was replayed twice M2M with the second script, so that across the whole network we send the same amount of data, but in this case directly from client to server. This strategy also allows the same data to be received back from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 41: the figure on the left shows the traffic in the simulations with data source, proxy, and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 41 Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 42. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in Figure 42, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly, which is due to the fact that the server response is something we could not control. Another important issue is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the number of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 42 by comparing the graph obtained in the simulation with another one obtained by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern on a large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS system created [3].

51 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulations the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts, and the number of repetitions. Therefore it was possible to generate light, normal, or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same in both the simulations and the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

52 Reproduce testing

The following is a description of the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
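As an illustration, the workflow above could be driven by a small script like the following sketch; the way each thesis script is invoked (plain python calls with no arguments) is an assumption, since their exact interfaces are not reproduced here.

# Hypothetical driver chaining the steps described above; the invocation of
# each script is an assumption, since the thesis scripts read their settings
# from constants inside the files.
import subprocess

def run(script):
    # Run one of the TaaS scripts and stop the workflow if it fails.
    print("running", script)
    subprocess.run(["python", script], check=True)

if __name__ == "__main__":
    # Part 1: simulate client-proxy-server traffic and record it to a pcap file.
    run("Simulation.py")        # assumes Client.py has already been configured
    # Part 2: set up the replay server and multiply the recorded pattern.
    run("Servertoreplay.py")
    run("Replaytraffic.py")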

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.
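The behaviour of such a data source can be summarised in the following sketch (an illustration only, not the actual Client.py used in the thesis); the server address is a placeholder, while the burst size, repetitions and waiting time follow the values quoted above.

# Illustrative data source: send a 3960-byte burst over a TCP socket, wait a
# random 1-3 s, and repeat up to 400 times.
import random
import socket
import time

SERVER = ("ec2-host-placeholder", 50007)   # hypothetical server address; 50007 is the port seen in the captures
BURST = b"x" * 3960                        # 3960 bytes per burst
REPETITIONS = 400

def run_data_source():
    with socket.create_connection(SERVER) as sock:
        for _ in range(REPETITIONS):
            sock.sendall(BURST)                    # one data burst
            time.sleep(random.uniform(1.0, 3.0))   # random waiting time

if __name__ == "__main__":
    run_data_source()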

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations, scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We could then compare the different results with the original simulation to extract interesting conclusions. The number of replay players was increased one at a time, every five seconds.

Figure 51 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 51 Number of bytes over time in different tests

Figure 52 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph appears to carry about twice as many bytes as the blue one in Figure 51, which is an expected result since the number of clients is also twice as large. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, the recreation with 80 clients appears to have problems sending data very soon: after 150 seconds the graph rises very little and slowly. In this test a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients reaches about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 52 Bytes using an m1.large instance for the server

Now we can compare Figure 52 with the results achieved by running the same tests but with a higher-quality instance for the server; we used the type c1.xlarge. If we look at Figure 53 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 52. However, the recreation with 80 clients is much higher when using the better instance: the gap between the graphs is about three times larger in Figure 53. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 53 Bytes using a c1.xlarge instance for the server

When it comes to analyzing RTT, as we can see in Figure 54, the graphs obtained from the tests simulating 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph, where, although no peaks stand out, the average is considerably higher than in the other tests. This graph also appears to be smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 54 Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 51. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120       0.125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we obtained some results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 51 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

61 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed greater variation. These results showed a performance improvement in the network when using high-quality instances and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

62 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53


List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using an m1.large instance for the server 51

53 Bytes using a c1.xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


CHAPTER 2

Related work

There are many different communication protocols, and they are required to establish connections within the network and transfer data among hosts. All of them have their place in the OSI model [15], where they are classified depending on their function. In this section the main protocols are explained, since they are essential to analyze a client-server communication. Moreover, some significant metrics needed to measure network performance in testing are described, as well as the tools used to carry out simulations and analyze segments.

21 Communication protocols

In order to develop this thesis it was crucial to give a description of the most significant protocols and explain their functions. In this way it was easier to look into all the sniffed packets and check that everything was working properly. It is also useful to have a detailed knowledge of protocols when it comes to recreating the traffic pattern.

Protocols are the objects that occupy the different OSI model layers in order to establish communication within a network [15]. Each protocol provides two different interfaces. The first one is a service interface offered to the other objects in the same machine that want to use the service provided by this protocol. The other interface is called the peer interface, and it is used to talk to the protocol's equivalent in another machine.

However, before explaining the different kinds of protocols, it is important to describe how they are organized depending on their function and to see the whole in which they all take place. To avoid a system becoming too complex, it is necessary to add levels of abstraction. In network systems this is also applied by creating layers, each with distinct functions. In this way the problem of building a network is divided into more manageable parts. Another advantage is the ease of adding new services, since it will not be necessary to modify every part, only the one where the service will be introduced. In networks the chosen architecture is named the OSI model [15]. Networks follow this structure when connecting computers. This architecture is composed of seven levels with different functions. These levels are represented from top to bottom in Figure 21.

Figure 21 OSI model

First, the physical layer identifies the physical features of the network. These characteristics can be related to the hardware, such as the type of cables and connectors, or to the network topology (bus, ring, star and so on). This layer also determines the voltage and frequency that signals will use. As for the data link layer, it transmits the data from the upper levels to the physical layer, but it is also in charge of error detection and correction and of hardware addressing. The main function of the network layer is to provide a mechanism to select routes within the network in order to exchange packets among different systems; this layer mainly uses the IP protocol. The transport layer takes charge of transporting data across the network. To ensure packets get to the destination properly, this layer can check errors in the sending, make sure the data goes to the right service in the upper levels, and divide packets into more manageable pieces (the segmentation process). The most significant protocols here are TCP and UDP, about which we will talk later. The session layer sets up connections between two endpoints (normally applications), making sure the application on the other system has the proper settings to communicate with the source application. The next level contains the presentation layer, which transforms the data linked to the application into another format in order to send it through the network. Finally, the application layer gets requests and data from users in order to send them to the lower layers. The most common application protocol is HTTP.

211 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is an application-level protocol used for distributed, collaborative hypermedia information systems. This network protocol is used for communication among users, proxies or gateways to other Internet systems. HTTP is used to deliver files, but another important function of this protocol is the transmission of resources. A resource is a network data object that can be identified by a URI; normally these resources are either files or outputs of a script. Each HTTP message has the general form shown in Listing 21 [15].

Listing 21 HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line shows whether this is a request or a response message. The next lines provide parameters and options for the message; there are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol, and the port used by default is 80, although it is possible to use other ports. Two important HTTP request methods are GET and POST [16], which are used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis: CONNECT [16]. This method is used to send data through a proxy that can act like a tunnel. Therefore, we needed this method to establish a connection between client and server through the proxy.
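For illustration, a CONNECT tunnel can be requested with a few lines of Python over a raw socket; the proxy and target addresses below are placeholders rather than the hosts used in the thesis.

# Minimal sketch of issuing an HTTP CONNECT request through a proxy with raw
# sockets; the proxy and server addresses are placeholders.
import socket

PROXY = ("proxy.example.com", 3128)          # hypothetical Squid proxy
TARGET = "server.example.com:50007"          # host:port the tunnel should reach

with socket.create_connection(PROXY) as sock:
    request = (
        "CONNECT {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "\r\n"
    ).format(TARGET, TARGET)
    sock.sendall(request.encode("ascii"))
    reply = sock.recv(4096).decode("ascii", errors="replace")
    # A "200 Connection established" status means the tunnel is open and the
    # socket can now carry arbitrary TCP data to the target server.
    print(reply.splitlines()[0])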

A few characteristics of HTTP communication must be pointed out. There is no permanent connection: when the request has been served, the client disconnects from the server, and the server will have to enable the connection again. As a result, client and server know that there is a connection between them only during a request, so they cannot keep information about previous requests. Any kind of data can be transmitted by HTTP as long as client and server know how to handle it. A typical example of an HTTP request is shown in Figure 22.

Figure 22 HTTP request

To set up a communication with this protocol, a client must open a connection by sending a request message to the server, which returns a response message. Afterwards, the server will close the connection. First of all we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts: the first one is the HTTP request method; the second part is the path of the requested resource, called the URI; and finally the version of HTTP that is being used. This idea can be clearly seen in Listing 22, an example extracted from the simulations made during the thesis.

Listing 22 HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided into three parts. The first part contains the version of HTTP used for the communication. Then there is a status code [15] that lets the computer understand the result of the request; the first digit indicates the class of the response. The classes are shown in Listing 23.

Listing 23 HTTP request result

1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or short sentence in English describing the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header carries information about either the request, the response, or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 24.

Listing 24 HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, the body is where the user enters data or uploads files to be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about it. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which indicates how many bytes are used in the body.

212 IP

The Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information goes to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides unreliable datagram delivery; therefore it is not ensured that the datagrams reach their destinations. In addition, this service model may cause further problems, since packets can be delivered out of order as well as reach the destination more than once.

IP Header

Figure 23 shows all the fields carried in the IP header.

Figure 23 Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The field Total Length indicates the length in bytes (unlike the Header Length, where the length is counted in words) of the whole datagram. When it comes to the Identification field, the sender always marks each IP datagram with an ID number before the transmission. The goal is to have unique datagrams, so that if several fragments arrive at the destination, since all of them carry the same ID value, the destination host can put the received fragments back together. If some fragment does not arrive, all the fragments with the same number will be discarded. In the next field there are up to three flags. The first flag has no use for now and is set to 0. The flag D (Don't Fragment) forbids fragmentation of the datagram into smaller pieces when it is set to 1. The flag M indicates whether the received datagram is the last one of the stream (set to 0) or there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the datagrams within the stream in which they have been sent, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may stay on the network before being discarded. The main goal of this function is to discard datagrams that are within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet, it is required to fill the Source Address field with the IP address of the sender, as well as to fill the Destination Address with the IP address of the receiver. There is also a field to set some options if they are required, and a Padding set with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides a host-to-host service across many different networks with diverse technology, it is required to manage datagrams so they can travel over all of them. There are two choices available to figure this problem out [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet of any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. This second option is also the one used in the Amazon networks where we ran the tests. It is important to know how the segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to carry out.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If by chance the packets go over some network with a smaller MTU, fragmentation will be required. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus another 20 bytes for the header. Therefore there will be 376 bytes left (1400 − 512 × 2), so the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look like Figure 24.

It should be noted that the amount of data bytes in each fragment must always be a multiple of 8. During this process the router will set the M bit in the Flags field of the first and second datagram to indicate that there are more packets coming. As regards the Offset field, in the first packet it is set to 0 because this datagram carries the first part of the original packet. However, the second datagram will have the Offset set to 64, since its first byte of data is the 513th (512/8 = 64).


Figure 24 Datagram fragmentation
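The fragmentation arithmetic of this example can be reproduced with a short sketch (an illustration, not part of the thesis tooling):

# Split an IP payload into fragments whose data length is a multiple of 8 and
# whose total size (data + 20-byte header) fits the MTU.
IP_HEADER = 20

def fragment_sizes(total_len, mtu):
    """Return (data_bytes, offset_in_8_byte_units) for each fragment."""
    payload = total_len - IP_HEADER            # 1400 bytes in the example
    max_data = (mtu - IP_HEADER) // 8 * 8      # 512 bytes for an MTU of 532
    fragments, offset = [], 0
    while payload > 0:
        data = min(max_data, payload)
        fragments.append((data, offset // 8))
        offset += data
        payload -= data
    return fragments

print(fragment_sizes(1420, 532))   # [(512, 0), (512, 64), (376, 128)]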

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link-layer network. To develop a mapping between link-layer addresses and IP addresses, it is required to use the Address Resolution Protocol (ARP), so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender will check its ARP cache to find out whether it already has the link-layer address (MAC) of the receiver. If it is not there, a new ARP request message will be sent, which carries the sender's own IP and link-layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own IP address. The servers with different IP addresses will drop the packet, but the receiver we are looking for will send an ARP reply message to the client. This server will also update its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 25.

Figure 25 ARP request

213 Ethernet

Ethernet occupies both the data link and the physical layers in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 26.

Figure 26 Ethernet layers in OSI model

The MAC client must be one of two different types of sublayer. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called the bridge entity, which provides an interface between LANs that can be using the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as of analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering them from communication errors.

The physical layer enables communication between the data link layer and the corresponding physical layer of other systems. In addition, this layer provides significant physical features of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer can encode and decode bits between binary and phase-encoded form. Regarding channel access, this level sends and receives the encoded data we spoke about before and detects collisions in the packet exchange.

214 UDP protocol

The User Datagram Protocol (UDP) is defined in the Internet standard RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable: UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum number of bytes is 65527 for IPv6 [21]. When a UDP datagram is sent, the data and the header go together down to the IP network layer, and the computer has to fill in the fields of the UDP header in the proper way. The layout of the UDP header is represented in Figure 27.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server's IP address.

Figure 27 UDP protocol header

The UDP header is composed of four fields [15], each of which contains 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port to which the reply should be addressed if nothing is changed. The Destination Port is the port at the destination to which the packet will be sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission. Each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

215 TCP protocol

The Transmission Control Protocol (TCP) is a transport-layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams coming from the IP protocol in order. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The layout of this header is shown in Figure 28.

Figure 28 TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port does with the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length will always be a multiple of 32 bits. The next field (called Reserved in the picture) is unused for now and is set to zero. The Flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PUSH flag is set by the sender to ask the receiver to deliver the data to the application immediately instead of buffering it. Finally, RESET is set to restart the connection.

Another important issue is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the header. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable depending on what kind of options are available. Finally, there is a space between the options and the data called Padding. It is set with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way to set up a TCP connection is shown in Figure 29.

Figure 29 Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own starting sequence number; both the ACK and SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server, and the ACK flag is set to 1 again.

Furthermore, either side can request to finish the connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives the packet, it sends an acknowledgement and keeps on sending any packets in progress. Afterwards, the server informs its application that a FIN segment was received and, when it has finished, sends its own packet with the FIN flag to the client, which acknowledges it to end the communication.

Reliable delivery

TCP provides ordered and reliable delivery, which is achieved through a method called the 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 210 Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 210, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without any ACK received, since the size of the window is three.
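The mechanism can be illustrated with a very small simulation (an illustrative sketch, not TCP code): with a window of three, at most three unacknowledged packets may ever be outstanding.

# Sliding-window illustration: packets are sent while the window allows, and
# each acknowledgement slides the window one position to the right.
WINDOW = 3
TOTAL = 10
next_to_send = 0      # next sequence number to transmit
oldest_unacked = 0    # left edge of the window

acks = iter(range(TOTAL))  # pretend every packet is eventually acked in order

while oldest_unacked < TOTAL:
    # Send as long as the number of unacknowledged packets is below the window.
    while next_to_send < TOTAL and next_to_send - oldest_unacked < WINDOW:
        print("send seq", next_to_send)
        next_to_send += 1
    # Receiving an ACK slides the window one position to the right.
    acked = next(acks)
    print("ack  seq", acked)
    oldest_unacked = acked + 1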

22 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

221 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of the packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula used to estimate the RTT within a network is shown in equation 21.

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT (21)

Where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 211.

Figure 211 Example RTT interval
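Equation 21 can be applied iteratively as new samples arrive, as in the following sketch (the sample values are invented for illustration):

# Exponentially weighted moving average of the RTT, following equation 21.
ALPHA = 0.85  # within the 0.8-0.9 range recommended for TCP

def update_rtt(estimated_rtt, sample_rtt, alpha=ALPHA):
    # New estimate = alpha * old estimate + (1 - alpha) * new sample.
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

estimate = 0.100  # initial estimate in seconds
for sample in [0.110, 0.095, 0.130, 0.105]:   # hypothetical SampleRTT values
    estimate = update_rtt(estimate, sample)
    print("SampleRTT=%.3f s -> EstimatedRTT=%.3f s" % (sample, estimate))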

222 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets one straight after the other, with a certain spacing between them. However, problems with network congestion, queues or configuration errors cause this spacing between packets to vary. The effect of jitter can be seen in Figure 212.

Jitter is a serious problem, since these fluctuations happen randomly and change very quickly in time. Therefore it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short space of time in order to reorder them if necessary and restore the same spacing between each packet. The main problem with this method is that the buffer adds delay to the transmission. Buffers also always have a limited size, so if the buffer is full of packets, new incoming packets will be dropped and will never arrive at their destination.


Figure 212 Jitter effect

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be defined with the following three formulas:

Latency = Propagation + Transmit + Queue (22)

Propagation = Distance / SpeedOfLight (23)

Transmit = Size / Bandwidth (24)

224 Bandwidth

This concept describes the number of bits that are transmitted within the network during a second [15]. There is an important relationship between bandwidth and latency. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 213.

Figure 213 Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that can be contained in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits (25)

If more bandwidth is required, the problem is solved just by adding more pipes.

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime (26)

TransferTime = RTT + (1 / Bandwidth) × TransferSize (27)

Where RTT is the round trip time.
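The following sketch checks these formulas numerically with the example values used above; the 1 MB transfer size and the assumption that the RTT equals twice the one-way latency are illustrative choices, not values from the thesis.

# Numerical check of equations 22-27 with the bandwidth example above.
latency = 50e-3          # s
bandwidth = 45e6         # bits/s

# Bits "in flight" in the pipe (equation 25): latency x bandwidth.
print(latency * bandwidth)            # 2250000.0 bits = 2.25 x 10^6

# Throughput for a hypothetical 1 MB transfer (equations 26 and 27),
# taking the RTT as twice the one-way latency.
rtt = 2 * latency
transfer_size = 8 * 1000000           # 1 MB expressed in bits
transfer_time = rtt + (1.0 / bandwidth) * transfer_size
print(transfer_size / transfer_time)  # effective throughput in bits/s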

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can in theory be transmitted over the link. However, due to inefficiencies of implementation or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project


231 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle attacks against SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit. The tool may terminate the SSL/TLS connection and launch a new SSL/TLS connection to the same receiver address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems and remove errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are travelling over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file in order to use it in a future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump also provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network on another port. Proxies may cache web sites. This happens each time a user from the local network asks for some URL: the proxy that receives the request stores a temporary copy of the resource. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to find the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 214.

Figure 214 Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent hackers from obtaining internal addresses, since proxies can block the access between two networks. A proxy can also act as a component of a firewall.

232 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.
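As an illustration, launching an EC2 instance with Boto 2 takes only a few calls; the AMI ID, region and key pair below are placeholders, not the ones used in this work.

# Minimal sketch of launching an EC2 instance with Boto 2.
import boto.ec2

# Credentials can also come from the boto configuration file mentioned above.
conn = boto.ec2.connect_to_region(
    "eu-west-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

reservation = conn.run_instances(
    "ami-00000000",            # hypothetical AMI ID
    instance_type="m1.large",  # one of the instance types compared in Chapter 5
    key_name="my-keypair",     # hypothetical SSH key pair
)
print(reservation.instances[0].id)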

233 Operating Systems

There are several kinds of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it could be very useful for this thesis.

We have described in this chapter many issues about networks which are crucial in the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example, and we ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework in order to extract the pattern properly later on.

31 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger-scale testing later. The structure of this connection is shown in Figure 31.

Figure 31 Structure client server

The tool chosen for programming is Python, a high-level programming language highly recommendable for network programming due to its ease of use in this field.

When it comes to programming the client for this application, it was necessary to set the server Internet address and a random port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we made the socket wait for incoming connections from the client and accept the connection.
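A minimal sketch of this client and server logic is shown below; the hostname and port are illustrative placeholders rather than the exact values used in the tests.

import socket

HOST, PORT = 'localhost', 50007   # placeholder address and port

def run_server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))        # bind hostname and port to the socket
    srv.listen(1)                 # wait for an incoming connection
    conn, addr = srv.accept()     # accept the client connection
    data = conn.recv(1024)
    conn.sendall(data)            # echo the data back to the client
    conn.close()
    srv.close()

def run_client():
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((HOST, PORT))     # connect to the server address and port
    cli.sendall(b'hello')
    reply = cli.recv(1024)
    cli.close()
    return reply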

In Listing 3.1 the packets required to establish a client-server connection are shown.

Listing 3.1: Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, and with a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one side would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behaviour of the segments between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Structure client-proxy-server

A proxy has been set up between client and server to capture and analyze the traffic, so that we can recreate the pattern of communications and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there were two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in there and install the libraries required by those scripts. Moreover, some programs such as Tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory, but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests, when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests would be run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were programmed in Python as well, due to its ease of developing anything related with networks.
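A sketch of the instance creation step of Simulation.py is given below. The AMI identifier, key pair and security group names are placeholders, and the real script performs additional configuration of every node.

import boto.ec2

def launch_scenario(conn, instance_type='t1.micro', n_sources=1):
    # Launch one server, one proxy and the requested number of data sources
    # (AMI, key pair and security group below are placeholders).
    roles = ['server', 'proxy'] + ['source-%d' % i for i in range(n_sources)]
    instances = {}
    for role in roles:
        reservation = conn.run_instances(
            'ami-xxxxxxxx',               # placeholder image with the test scripts
            instance_type=instance_type,  # type chosen for this simulation
            key_name='taas-key',          # placeholder key pair
            security_groups=['taas-sg'])  # placeholder security group
        instances[role] = reservation.instances[0]
    return instances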

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending packets with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source. This segment is the response from the proxy, with the SYN and ACK flags set to 1. This indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is called the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the DNS name of the server. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching the server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"  "10.235.11.67"  "DNS"  "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"  "10.235.11.67"  "DNS"  "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to the server, to set up the communication between these two. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" reply gets back to the data source. Therefore the connection is now ready to start sending data. In these simulations it was decided to send data from time to time, with random waiting periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
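A sketch of this sending behaviour of the data source is shown below; the burst size, the number of repetitions and the 1-2 second waiting interval follow the values described in this chapter, while the connection setup through the proxy is omitted.

import random
import socket
import time

def send_bursts(sock, burst=b'x' * 1980, repetitions=200):
    # Send a data burst, wait a random 1-2 seconds and repeat; the server
    # echoes every burst back (a real client would loop on recv until the
    # whole echo has arrived).
    for _ in range(repetitions):
        sock.sendall(burst)
        sock.recv(len(burst))
        time.sleep(random.uniform(1, 2))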

The eight packets which compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy for the simulation with 1980 bytes of data. Furthermore, Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well, and therefore around three times as many packets are needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do so, we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance. Therefore the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for the simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the frequency with which packets are being sent is high with ten data sources.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances; however, in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.
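One possible way to obtain such an average from a recorded capture is sketched below, using the dpkt library introduced in Chapter 4. It simply measures the time between a data segment and the first ACK that covers it, which is an approximation and not necessarily the exact procedure behind the tables.

import dpkt

def average_rtt(pcap_path):
    # Rough RTT estimate: time between a data segment and the ACK covering it.
    pending = {}   # (src, dst, expected ack number) -> send timestamp
    samples = []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if len(tcp.data) > 0:
                pending[(ip.src, ip.dst, tcp.seq + len(tcp.data))] = ts
            if tcp.flags & dpkt.tcp.TH_ACK:
                sent = pending.pop((ip.dst, ip.src, tcp.ack), None)
                if sent is not None:
                    samples.append(ts - sent)
    return sum(samples) / len(samples) if samples else None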

Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower in the table, the shorter the times should be. However, this does not apply in every case, therefore the type of instance is not very significant in these cases. The simplest instance seems to be enough for these exchanges of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT for 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT in seconds with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT in seconds with data bursts of 5940 bytes

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance. Here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    1.5        0           2           2
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          2.5         0           0
c1.medium              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.xlarge              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    2          6           15          67
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          5.5         1           36
c1.medium              1980 bytes    0          0           0           2
                       5940 bytes    0          7           13.5        50.5
c1.xlarge              1980 bytes    0          0           0           5
                       5940 bytes    0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          1           0           6.5
                       5940 bytes    3          1           7.5         2.5
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.medium              1980 bytes    0          0           0           4.5
                       5940 bytes    0          0           2.5         2.5
c1.xlarge              1980 bytes    0.5        0           0           0.5
                       5940 bytes    0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little in the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance. This was more noticeable when the number of data sources was high: the c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, machine to machine (M2M) towards the same server. It was necessary to find a method for performing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as similarly as possible. The first one is the packet length [33][34]: this requires packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is significant to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once, instead of sending the data packet by packet. This is exactly the way the data source sent packets in the simulations, therefore this method was much better for recreating the obtained traffic pattern. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP and a few DNS segments, which are meaningless when recreating M2M traffic, since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
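A condensed sketch of this extraction step is shown below. The source IP address and the gap used to separate two bursts are parameters that would have to match the capture; the real Extractpattern.py also handles the timing information and the proxy-related filtering in more detail.

import socket
import dpkt

def extract_bursts(pcap_path, source_ip, gap=0.5):
    # Group the payload sent by the data source into bursts, keeping the
    # timestamp of the first packet of each burst (gap is an assumed threshold).
    bursts, current, start, last_ts = [], b'', None, None
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP) or not ip.data.data:
                continue                          # skip non-TCP and empty segments
            if socket.inet_ntoa(ip.src) != source_ip:
                continue                          # keep only data from the source
            if last_ts is not None and ts - last_ts > gap and current:
                bursts.append((start, current))   # close the previous burst
                current, start = b'', None
            if start is None:
                start = ts
            current += ip.data.data
            last_ts = ts
    if current:
        bursts.append((start, current))
    return bursts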

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves in a file the information gathered from each data burst, as well as its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, the program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets in an accurate manner. It has to be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
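A simplified sketch of this replay logic is shown below: it takes the (timestamp, data) pairs produced by the extraction step and resends each burst directly to the server while preserving the recorded time offsets. The server address is a placeholder.

import socket
import time

def replay(bursts, server=('10.0.0.1', 50007)):
    # Resend every recorded burst, keeping the original time offsets
    # (server address is a placeholder; error handling is omitted).
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server)
    t0_capture = bursts[0][0]
    t0_replay = time.time()
    for ts, data in bursts:
        delay = (ts - t0_capture) - (time.time() - t0_replay)
        if delay > 0:
            time.sleep(delay)        # wait until the recorded offset is reached
        sock.sendall(data)           # send the whole burst at once
        sock.recv(65536)             # read the server's echo (simplified)
    sock.close()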

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from data source to proxy. This very data was then replayed twice M2M with the second script, so that over the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy allows receiving the same data from the server as well, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the network traffic in the simulations with data source, proxy and server. The figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of the traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not get worse and worse. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only in a few points do the graphs not match perfectly; this is due to the fact that the server response is something that we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration. Therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern on a large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances in order to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server, and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified. For instance, we could choose the amount of data to send, the frequency of data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.
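Following these guidelines, the multiplier can be pictured as a set of replaying clients started one after another with a fixed delay. The thread-based sketch below is a simplification of the real TaaS scripts and reuses the replay function sketched in the previous chapter; the five second stagger matches the value used in the tests of this chapter.

import threading
import time

def multiply_pattern(bursts, n_clients, server, stagger=5):
    # Start n_clients replaying clients, adding a new one every 'stagger'
    # seconds; 'replay' is the function sketched in Chapter 4.
    threads = []
    for _ in range(n_clients):
        t = threading.Thread(target=replay, args=(bursts, server))
        t.start()
        threads.append(t)
        time.sleep(stagger)
    for t in threads:
        t.join()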

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had got a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon. After 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should go up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients gets to about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server. We used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has similar peaks and numbers of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes of higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot: with many clients, the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table informs about the risk, in terms of packet delivery, of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration as the number of clients was rising.

When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the original simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing among different types of server instance.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not as satisfactory. I must say that, after testing many different TCP servers, this RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client-proxy-server 33
3.3 Bytes through the proxy with data bursts of 1980 bytes 37
3.4 Bytes through the proxy with data bursts of 5940 bytes 37
3.5 Structure for the simulation 38
3.6 Bytes through the proxy with data bursts of 1980 bytes 39
3.7 Bytes through the proxy with data bursts of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of the traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing," 2011.

[2] J. Strickland, "How cloud computing works," http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing," http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay," https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances," http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic," http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite," http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol - HTTP/1.1," 1999.

[17] Tutorialspoint, "HTTP - Quick Guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol," 1980.

[21] S. Kollar, "Introduction to IPv6," 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website," http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators," in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 14: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

12 Related work

when connecting computers This architecture is composed by seven levels with different

functions These levels are represented from top to bottom in the Figure 21

Figure 21 OSI model

First the physical layer identifies the physical features of the network These charac-

teristics can be related with the hardware such as type of cables and connectors or with

the network topology (bus ring star and so on) This layer also determines voltage

and frequency that signals will use About data link layer it transmits the data from

upper levels to the physical layer but is also in charge of error detection and correction

and hardware addressing The main function of the network layer is to provide a mech-

anism to select routes within the network in order to exchange packets among different

systems This layer uses mainly the IP protocol The transport layer takes charge of

transporting data in the network To ensure packets get to the destination properly this

layer can check errors in the sending make sure the data goes to the right service in

the upper levels and divide packets in others more manageable (segmentation process)

The most significant protocols are TCP and UDP about which we will talk later The

session layer sets up connections between two endpoints (normally applications) making

sure the application on the other system has the proper settings to communicate with

the source application The next level contain the presentation layer which transform

the data linked to the application into another format in order to send it through the

network Finally the application layer gets requests and data from users in order to send

21 Communication protocols 13

them to the lower layers The most common application protocol is HTTP

211 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is a protocol in the application level used

for distributed collaborative hypermedia information systems This network protocol

is used for communication among users proxies or gateways to other Internet systems

HTTP is used to deliver files but another important function of this protocol is linked to

the transmission of resources A resource is a network data object that can be identified

by a URI Normally these resources are either files or outputs of a script Each HTTP

message has the general form shown in the List 21 [15]

Listing 21 HTTP message

START LINE ltCLRFgt

MESSAGE HEADER ltCLRFgt

ltCLRFgt

MESSAGE BODY ltCLRFgt

The first line shows whether this is a response or request message The next lines

provide parameters and options for the message There are different kinds of header lines

in HTTP and there is not limit on the number of lines that can be sent The last part is

a body of data sent after the header lines

Overall operation

HTTP is a requestresponse protocol and the port used by default is 80 but it is possible

to use other ports An important HTTP request methods are called GET and POST

[16] This method is used to request and retrieve data from a specified resource However

there is another request method which is very significant for this thesis and its name is

CONNECT [16] This method is used to send data through a proxy that can act like a

tunnel Therefore we needed this method to establish a connection between client and

server through the proxy

A few characteristics of HTTP communication must be pointed out There is not a

permanent connection when the request is sent the client disconnects from the server

The server will have to enable the connection again As a result client and server know

that there is a connection between them only during a request Therefore they cannot

keep information about the requests Any kind of data can be transmitted by HTTP as

long as client and server know how to manage the data A typical example of HTTP

request is shown in the Figure 22

To set up a communication with this protocol a client must open a connection sending

a request message to the server which returns a response message Afterwards the

14 Related work

Figure 22 HTTP request

server will close the connection First of all we will describe the initial line of the request

and response message Concerning the request message this first line consists of three

parts The first one is the HTTP request method The second part is the path of the

requested resource This part is called URI And finally the version of HTTP that is

being used This idea can be clearly seen in the List 22 This example was extracted

from the simulations made during the thesis

Listing 22 HTTP request with CONNECT

CONNECT

ec2minus54minus217minus136minus250euminuswest minus1compute amazonaws com50007

HTTP1 1

The initial line of the response from the server is also divided in three parts The initial

part involves the version of HTTP used for the communication Afterwards there will

be a code [15] for the computer to understand the result of the request The first digit

indicates the class of response We have the codes shown in the List 23

Listing 23 HTTP request result

21 Communication protocols 15

1xx i n f o rmat i ona l message

2xx s u c c e s s in the connect ion

3xx r e d i r e c t s the c l i e n t to another URL

4xx e r r o r l i nked to the c l i e n t

5xx e r r o r l i nked to the s e r v e r

Finally there is a word or sentence in English to describe the status of the connection

Header lines offer information about the request or response or about any object sent

in the message body which will be explained later There are many different headers lines

but they can be classified in four main groups [17] The entity header involves information

about either the request response or the information contained in the message body A

general header is used in both the request and the responseThe request header is sent

by a browser or a client to a server Finally the last kind of header is called response

and is sent by a server in a response to a requestThe format of the header lines is

aHeader-Name valuea Two examples of header lines are shown in the List 24

Listing 24 HTTP header lines

Userminusagent Moz i l l a 3 0

Host www amazon com

Finally an HTTP may have a body with data after the header lines In a response the

request resource is always sent in its body There may be also texts giving information

or warning of errors In a request it is in the body where the user enters data or uploads

files which will be sent to the server When the HTTP message contains a body there are

usually header lines that provide information about the body One of these header lines

is called Content-Type and it indicates the MIME and type of the data in the body For

instance texthtml or imagegif Another very common header line is Content-Length

which provides how many bytes were used in the body

212 IP

Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to understand this layer to make sure the information reaches the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides unreliable datagram delivery; therefore it is not ensured that the datagrams reach their destinations. In addition, this service model may cause further problems, since packets can be delivered out of order as well as reach the destination more than once.


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3: Fields of the IP header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of service. The field Total Length indicates the length in bytes (unlike the Header Length, where the length is counted in words) of the whole datagram.

When it comes to the Identification field, the sender marks each IP datagram with an ID number before transmission. The goal is to identify the datagram uniquely, so that if several fragments arrive at the destination, all carrying the same ID value, the destination host can put the received fragments back together. If some fragment does not arrive, all the fragments with the same number will be discarded. The next field contains up to three flags. The first flag has no use for now and is set to 0. The D (Don't Fragment) flag, when set to 1, indicates that the datagram must not be split into smaller pieces. The M (More Fragments) flag indicates whether the received datagram is the last one of the stream (set to 0) or whether there are more fragments left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of each fragment's data within the original datagram, so that the receiver can put the fragments back in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may remain on the network before being discarded. The main goal of this field is to discard datagrams that wander the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram payload. The IP header also uses a simple Checksum to verify the integrity of the header during transmission. To send the packet, the Source Address field must be filled with the IP address of the sender, and the Destination Address with the IP address of the receiver. There is also a field to set up options, if they are required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides a host-to-host service across many different networks with diverse technologies, datagrams must be managed so they can traverse all of them. There are two ways to solve this problem [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use a technique that fragments and reassembles packets when they are too big to go through some network. The second option is the most suitable, since networks change continuously and it can be especially difficult to choose a specific packet size that fits every network. This second option is also the one used in the Amazon networks where we ran the tests. It is important to know how the segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to perform.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If the packets happen to go over some network with a smaller MTU, fragmentation is required. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus 20 bytes of header each. Therefore, there are 376 bytes left (1400 − 2 × 512), so the last datagram carries those 376 bytes of data plus 20 bytes of header. The result is shown in Figure 2.4.

It should be noted that the amount of data bytes in each fragment except the last must always be a multiple of 8. During this process the router sets the M bit in the flags field of the first and second datagram to indicate that there are more fragments coming. As regards the offset field, in the first packet it is set to 0, because this datagram carries the first part of the original packet. However, the second datagram will have the offset set to 64, since its first byte of data is the 513th (512/8 = 64).


Figure 2.4: Datagram fragmentation
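To make the arithmetic of this example explicit, the small helper below (an illustrative sketch, not part of the thesis scripts) splits a datagram into fragments for a given MTU, assuming a fixed 20-byte header and fragment data aligned to 8 bytes:

def fragment(total_len, mtu, header_len=20):
    """Return (data_bytes, offset_in_8_byte_units) for each fragment."""
    payload = total_len - header_len            # 1420 - 20 = 1400 bytes of data
    # Data per fragment must fit in the MTU and be a multiple of 8.
    per_frag = (mtu - header_len) // 8 * 8      # (532 - 20) // 8 * 8 = 512
    fragments, offset = [], 0
    while payload > 0:
        size = min(per_frag, payload)
        fragments.append((size, offset // 8))   # offset is counted in 8-byte units
        offset += size
        payload -= size
    return fragments

# Reproduces the example above: [(512, 0), (512, 64), (376, 128)]
print(fragment(1420, 532))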

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link-layer network. To build a mapping between link-layer addresses and IP addresses, the Address Resolution Protocol (ARP) is required, so that the physical interface hardware on the node can understand the addressing scheme.

The method to obtain the link-layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out whether it already has the link-layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, carrying the sender's own IP and link-layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own; the hosts with different IP addresses drop the packet, but the receiver we are looking for sends an ARP reply message to the client. This server also updates its ARP cache with the link-layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two different types of sublayer. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called the bridge entity, which provides an interface between LANs that may be using the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering from communication errors.

Figure 2.5: ARP request

Figure 2.6: Ethernet layers in the OSI model

The physical layer enables communication between the data link layer and the corresponding physical layer of other systems. In addition, this layer provides significant physical features of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer can encode and decode bits between binary and phase-encoded form. Regarding channel access, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

2.1.4 UDP protocol

User Datagram Protocol (UDP) is defined in the Internet standard RFC 768 [20]. It is used as a transport protocol; therefore its function is similar to the TCP protocol, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable. UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore, this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum payload is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together down to the IP network layer, and the computer has to fill in the fields of the UDP header properly. The layout of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that translates domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server's IP address.

Figure 2.7: UDP protocol header


The UDP header is composed of four fields [15], each of which contains 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port to which the reply should be addressed if nothing is changed. The Destination Port is the port on the destination host to which the packet is sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission. Each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm to calculate the checksum again. Both checksums must match to ensure that no error happened during the transmission.
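As a small illustration of these four 2-byte fields (a sketch only; the port numbers and payload are arbitrary placeholders), the 8-byte UDP header can be packed with Python's struct module:

import struct

def udp_header(src_port, dst_port, payload, checksum=0):
    # Four 2-byte fields in network byte order: source port, destination
    # port, length (header + payload) and checksum (0 = not computed here).
    length = 8 + len(payload)
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

header = udp_header(45125, 53, b"example payload")
print(len(header), struct.unpack("!HHHH", header))   # 8 (45125, 53, 23, 0)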

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to separate different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a transport-layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the segments in order when they come from the IP protocol. In addition, this protocol can split the data into fragments of different lengths and forward them to the IP protocol. With TCP it is also possible to transfer data coming from different sources on the same line by multiplexing the data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The layout of this header is shown in Figure 2.8.

The Source Port field identifies the sender port, just as the Destination Port does for the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called Reserved in the picture) is unused for now and is set to zero. The flags field is used for additional information in the packet transmission.

Figure 2.8: TCP protocol header

The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet carries an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender so that the receiver delivers the data to its application immediately instead of buffering it. Finally, the RST flag is set to reset the connection.

Another important field is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. The Checksum field makes the packet transmission more reliable, since it is used to check the integrity of the segment. The next field in the TCP header is called the Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on what kind of options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In a TCP connection, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way a TCP connection is set up is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.

Furthermore, either side can request to finish the connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending any packets still in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client, which acknowledges it to end the communication.

Reliable delivery

TCP provides ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], in which a number of sequence numbers can be outstanding without acknowledgement. This window moves depending on the acknowledgements received, and its size can be modified by the receiver by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula used to estimate the RTT within a network is shown in Equation 2.1.

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
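To make Equation 2.1 concrete, the following sketch (with made-up sample values, not measurements from the thesis) updates the estimated RTT as an exponentially weighted moving average with α = 0.875, which lies in the advised 0.8-0.9 range:

def update_rtt(estimated_rtt, sample_rtt, alpha=0.875):
    # Equation 2.1: new estimate = alpha * old estimate + (1 - alpha) * sample
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

estimate = 0.0031                              # seconds, arbitrary starting estimate
for sample in (0.0029, 0.0035, 0.0046):        # hypothetical measured samples
    estimate = update_rtt(estimate, sample)
    print("EstimatedRTT = %.4f s" % estimate)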

2.2.2 Jitter

Jitter is the variation in the delay of the packets sent within a network [15]. A sender transmits many packets one straight after the other, with a certain spacing between them. However, network congestion, queueing, or configuration errors cause this spacing between packets to vary. The effect of jitter can be seen in Figure 2.12.

Jitter is a serious problem, since these fluctuations happen randomly and change very quickly over time. Therefore, it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short time in order to reorder them if necessary and release them with the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also always have a limited size, so if the buffer is full of packets, newly arriving packets will be dropped and will never reach their destination.


Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are captured in the next three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that are transmitted over the network during one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between latency and bandwidth

If we multiply both terms, we obtain the number of bits that can be contained in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is required, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that the link can transmit in theory. However, due to implementation inefficiencies or errors, a pair of nodes connected over the network with a bandwidth of 10 Mbps will usually achieve a much lower throughput (for instance 2 Mbps), so the data can be sent at 2 Mbps at most.
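Equations 2.6 and 2.7 can be evaluated directly; the following sketch uses illustrative values only (a 1 MB transfer over a 10 Mbps link with a 50 ms RTT):

def throughput(transfer_size_bits, rtt_s, bandwidth_bps):
    # Equation 2.7: transfer time = RTT + size / bandwidth
    transfer_time = rtt_s + transfer_size_bits / float(bandwidth_bps)
    # Equation 2.6: throughput = size / transfer time
    return transfer_size_bits / transfer_time

size = 8 * 10**6        # 1 MB expressed in bits
print("%.2f Mbps" % (throughput(size, 0.050, 10 * 10**6) / 1e6))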

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project.


2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection to the original receiver address. The goal of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to troubleshoot and manage network problems, examine security problems, and debug protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. Users can easily see a list of captured packets updating in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network. Some reasons to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file for future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump also provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver; it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may cache web sites. Each time a user from a local network asks for some URL, the proxy that receives the request stores a temporary copy of the resource. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way, proxies can make the delivery of packets within the network much faster, but this is not their only function. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block access between two networks. Proxies can also act as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually for every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage, and so on.
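A minimal sketch of this workflow with the boto 2.x EC2 API that was current at the time (the region, credentials, AMI ID and key name below are placeholders, not the ones used in the thesis):

import boto.ec2

# Credentials can also come from the ~/.boto configuration file instead
# of being passed explicitly here.
conn = boto.ec2.connect_to_region(
    "eu-west-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Launch one instance from a placeholder AMI; the instance type can be
# varied (t1.micro, m1.large, c1.medium, c1.xlarge, ...) as in the tests.
reservation = conn.run_instances(
    "ami-00000000", instance_type="t1.micro", key_name="my-key"
)
instance = reservation.instances[0]
print(instance.id, instance.state)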

2.3.3 Operating Systems

There are several kinds of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the availability and ease of use of network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and by employees of many companies and organizations from all over the world, with the goal of making it free software. The main advantages of Linux are low cost, stability, performance, network functionality, security, and so on. This operating system very seldom freezes up or slows down. It also provides high performance and support for networks, where client and server systems can be set up easily and quickly on a Linux computer. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many networking topics which are crucial for the next sections. It is important to have a deep knowledge of this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger-scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Client-server structure

The language chosen for programming is Python, a high-level programming language highly recommendable for network programming due to its ease of use in this field.


When programming the client for this application, it was necessary to set the server Inet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. To program the server, in turn, it is required to set the hostname and the same port opened by the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we make the socket listen for incoming connections from the client and accept them.
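A minimal sketch of the two endpoints just described (the address and port are placeholders; this is not the actual thesis script): the server binds, listens, accepts and echoes, while the client connects, sends and reads the reply.

import socket

SERVER_HOST, PORT = "0.0.0.0", 50007   # placeholder address and port

def run_server():
    # Create a TCP socket, bind hostname and port, then wait for a client.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((SERVER_HOST, PORT))
    srv.listen(1)
    conn, addr = srv.accept()          # blocks until the client connects
    data = conn.recv(4096)
    conn.sendall(data)                 # echo the data back, as in the tests
    conn.close()
    srv.close()

def run_client(server_address):
    # Create a TCP socket and connect it to the server address and port.
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((server_address, PORT))
    cli.sendall(b"some data")
    reply = cli.recv(4096)
    cli.close()
    return reply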

The packets required to establish a client-server connection are shown in Listing 3.1.

Listing 3.1: Establish connection

1 "0.665317" "192.168.1.24" "192.168.1.33" "TCP" "74" "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2 "0.669736" "192.168.1.33" "192.168.1.24" "TCP" "66" "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3 "0.669766" "192.168.1.24" "192.168.1.33" "TCP" "54" "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark by default shows relative sequence numbers starting at zero). The answer from the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1 "0.671945" "192.168.1.33" "192.168.1.24" "TCP" "60" "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2 "0.672251" "192.168.1.24" "192.168.1.33" "TCP" "54" "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance, and extract a traffic pattern.

Figure 3.2: Client-proxy-server structure

A proxy was set up in the middle of the client-server communication to capture and analyze the traffic, so that we could recreate the communication pattern and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts on an instance it was necessary to log in and install the libraries required by those scripts. Moreover, programs such as Tcpdump and Wireshark were installed on the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be done automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease of developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending packets with the SYN flag set to 1. This is done only once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1, and it indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1" "0.000000" "10.34.252.34" "10.235.11.67" "TCP" "74" "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2" "0.000054" "10.235.11.67" "10.34.252.34" "TCP" "74" "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3" "0.000833" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the DNS name of the server. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching for the server IP address

"4" "0.000859" "10.34.252.34" "10.235.11.67" "HTTP" "197" "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6" "0.001390" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7" "0.002600" "172.16.0.23" "10.235.11.67" "DNS" "166" "Standard query response 0xb33a"
"8" "0.002769" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9" "0.003708" "172.16.0.23" "10.235.11.67" "DNS" "124" "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between proxy and server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10" "0.003785" "10.235.11.67" "10.224.83.21" "TCP" "74" "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11" "0.438963" "10.224.83.21" "10.235.11.67" "TCP" "74" "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12" "0.439029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" response gets back to the data source, and the connection is ready to start sending data. In these simulations we decided to send data from time to time with random waiting periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
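A rough sketch of how such a data source can tunnel its TCP traffic through the proxy with an HTTP CONNECT request and then send data bursts at random intervals (the proxy address, server name, port and burst size are placeholders; this is not the thesis Client.py):

import random
import socket
import time

PROXY = ("proxy.example.internal", 3128)                     # placeholder Squid proxy
SERVER = "ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com"   # placeholder server name
PORT = 50007

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(PROXY)

# Ask the proxy to open a tunnel to the server (as in Listing 3.4).
sock.sendall(("CONNECT %s:%d HTTP/1.1\r\n\r\n" % (SERVER, PORT)).encode())
reply = sock.recv(1024)        # expect something like "HTTP/1.0 200 ..."
assert b"200" in reply

# Send a number of data bursts with a random pause between them.
burst = b"x" * 1980            # one data burst, echoed back by the server
for _ in range(200):
    sock.sendall(burst)
    echoed = sock.recv(4096)   # the server replies with the same data
    time.sleep(random.randint(1, 2))

sock.close()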

The eight packets which compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data between data source, proxy and server

"15" "0.466800" "10.34.252.34" "10.235.11.67" "TCP" "71" "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16" "0.466813" "10.235.11.67" "10.34.252.34" "TCP" "66" "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17" "0.466975" "10.235.11.67" "10.224.83.21" "TCP" "71" "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18" "0.467901" "10.224.83.21" "10.235.11.67" "TCP" "66" "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19" "0.468018" "10.224.83.21" "10.235.11.67" "TCP" "71" "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20" "0.468029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21" "0.468083" "10.235.11.67" "10.34.252.34" "TCP" "71" "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22" "0.508799" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from the data source to the proxy, which forwards everything to the server, and then all the way back, sending the data from the server to the data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times larger as well, and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To this end we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance; therefore the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for the simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are sent is high.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not pose a big problem for network performance. However, we can see that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many more high peaks, so the RTT in that case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation, the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect network performance.


Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2, we do not see large differences. Moreover, the lower we go in the table, the shorter the times should be; however, this does not apply in every case, so the type of instance is not very significant here. The simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT of 5 or 10 data sources with that of only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

In Table 3.3 we have the average number of TCP packets which have been retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning resent packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data bursts and the weakest instance: here there is an average of up to 67 lost packets. Moreover, we can see how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
c1.medium              5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
c1.xlarge              5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
c1.medium              5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
c1.xlarge              5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          1           0           6.5
t1.micro               5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
c1.medium              5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
c1.xlarge              5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies just a little in the different tests, most probably due to the fact that the data sent does not stress the server enough. However, the last analysis of the values related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet-loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, machine-to-machine (M2M) towards the same server. A method was needed to perform a proper extraction so that the traffic could be generated again. To find one, we looked into several publications on projects explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic, it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded on the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the features needed from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
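A minimal sketch of this kind of extraction with dpkt (illustrative only; the thesis script Extractpattern.py is not reproduced here, and the data port is a placeholder):

import dpkt

def collect(pcap_path, data_port=50007):
    """Yield (timestamp, total_length, payload) for TCP packets carrying data."""
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            # Keep only segments with payload addressed to the data port.
            if tcp.dport == data_port and len(tcp.data) > 0:
                yield ts, len(buf), tcp.data

for ts, length, payload in collect("simulation.pcap"):
    print(ts, length, len(payload))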

To recreate the traffic, the script had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found to be much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the way the data source sent packets in the simulations, so this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent; therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was present, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic, since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file. Knowing this information, the program is ready to replay the data in an accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed on an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It should be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
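A rough sketch of such a replay loop (an illustration under the assumptions above, not the actual Replaytraffic.py; the server address and port are placeholders):

import socket
import time

def replay(bursts, server=("server.example.internal", 50007)):
    """Resend recorded (timestamp, data) bursts with the original spacing."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server)
    start = time.time()
    t0 = bursts[0][0]                     # timestamp of the first recorded burst
    for ts, data in bursts:
        # Wait until the same relative time as in the original capture.
        delay = (ts - t0) - (time.time() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(data)                # send the whole burst at once
        sock.recv(len(data))              # the server echoes the burst back
    sock.close()

# bursts would be loaded from the file produced by the extraction step, e.g.:
# replay([(0.0, b"x" * 1980), (1.7, b"x" * 1980), (3.2, b"x" * 1980)])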

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions, and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from data source to proxy. This sniffed data was replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also allows receiving the same data back from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The diagram on the left shows the network traffic in the simulations with data source, proxy and server, while the diagram on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of the traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore, this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration. Therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances with which to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important characteristics when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.
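The core of this multiplying step can be sketched in a few lines of Python. The sketch below is only an illustration of the idea, not the actual Replaytraffic.py script: it assumes the recorded pattern has already been reduced to a list of (waiting time, burst size) pairs, and the server address is a placeholder.

import socket
import threading
import time

SERVER = ("ec2-54-217-136-250.eu-west-1.compute.amazonaws.com", 50007)  # placeholder address
PATTERN = [(1.2, 3960), (2.4, 3960), (1.7, 3960)]  # (waiting time in s, bytes per burst) from the pcap

def replay_client():
    # One simulated data source replaying the recorded pattern once
    sock = socket.create_connection(SERVER)
    try:
        for wait, nbytes in PATTERN:
            time.sleep(wait)             # keep the recorded spacing between bursts
            sock.sendall(b"x" * nbytes)  # send a burst of the recorded size
            sock.recv(4096)              # wait for the server answer, as in the simulation
    finally:
        sock.close()

# Scale up the load: start a new client every five seconds, as in the tests of Section 5.3
threads = []
for _ in range(80):
    t = threading.Thread(target=replay_client)
    t.start()
    threads.append(t)
    time.sleep(5)
for t in threads:
    t.join()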

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
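As an illustration, the configuration step could be as simple as editing a few constants at the top of Client.py. The variable names below are hypothetical, but the values correspond to the recorded session described in Section 5.3.

# Hypothetical configuration constants in Client.py (illustrative names)
DATA_PER_BURST = 3960        # bytes of data sent in each burst
REPETITIONS = 400            # maximum number of bursts per data source
MIN_WAIT, MAX_WAIT = 1, 3    # random waiting time between bursts, in seconds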

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replay clients was increased one at a time every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources, respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph appears to carry double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients, problems sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster, in this case when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server. We used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has similar peaks and number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance; the gap between graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80 clients graph where, although there are no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because there are packets coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients there is not even one segment lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we achieved some results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients, the RTT graph becomes smoother and its RTT average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table informs us about the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs had more varied values. These results showed a performance improvement in the network when using high quality instances and a deterioration when the number of clients was rising.



When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increased the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on TCP protocols, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol



Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using a m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.



them to the lower layers The most common application protocol is HTTP

2.1.1 HTTP protocol

The Hypertext Transfer Protocol (HTTP) [16] is a protocol in the application level used

for distributed collaborative hypermedia information systems This network protocol

is used for communication among users proxies or gateways to other Internet systems

HTTP is used to deliver files but another important function of this protocol is linked to

the transmission of resources A resource is a network data object that can be identified

by a URI Normally these resources are either files or outputs of a script Each HTTP

message has the general form shown in the List 21 [15]

Listing 2.1: HTTP message

START LINE <CRLF>
MESSAGE HEADER <CRLF>
<CRLF>
MESSAGE BODY <CRLF>

The first line shows whether this is a request or a response message. The next lines provide parameters and options for the message. There are different kinds of header lines in HTTP, and there is no limit on the number of lines that can be sent. The last part is a body of data sent after the header lines.

Overall operation

HTTP is a request/response protocol and the port used by default is 80, but it is possible to use other ports. Two important HTTP request methods are called GET and POST [16]; they are used to request and retrieve data from a specified resource. However, there is another request method which is very significant for this thesis, and its name is CONNECT [16]. This method is used to send data through a proxy that can act like a tunnel. Therefore we needed this method to establish a connection between client and server through the proxy.
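A minimal sketch of how a client can open such a tunnel from Python is shown below. The proxy address and port (Squid's default 3128 is assumed) and the target server name are placeholders; the actual scripts used in this thesis are described in Chapter 3.

import socket

PROXY = ("10.235.11.67", 3128)  # assumed Squid proxy address and port
TARGET = "ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007"  # server behind the proxy

sock = socket.create_connection(PROXY)
# Ask the proxy to open a tunnel towards the server with the CONNECT method
request = "CONNECT {0} HTTP/1.1\r\nHost: {0}\r\n\r\n".format(TARGET)
sock.sendall(request.encode("ascii"))
reply = sock.recv(4096).decode("ascii", "replace")
if "200" in reply.splitlines()[0]:
    # From here on, everything written to the socket goes straight to the server
    sock.sendall(b"data for the server")
sock.close()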

A few characteristics of HTTP communication must be pointed out There is not a

permanent connection when the request is sent the client disconnects from the server

The server will have to enable the connection again As a result client and server know

that there is a connection between them only during a request Therefore they cannot

keep information about the requests Any kind of data can be transmitted by HTTP as

long as client and server know how to manage the data A typical example of HTTP

request is shown in the Figure 22

To set up a communication with this protocol a client must open a connection sending

a request message to the server which returns a response message Afterwards the


Figure 22 HTTP request

server will close the connection First of all we will describe the initial line of the request

and response message Concerning the request message this first line consists of three

parts The first one is the HTTP request method The second part is the path of the

requested resource This part is called URI And finally the version of HTTP that is

being used This idea can be clearly seen in the List 22 This example was extracted

from the simulations made during the thesis

Listing 2.2: HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided in three parts The initial

part involves the version of HTTP used for the communication Afterwards there will

be a code [15] for the computer to understand the result of the request The first digit

indicates the class of response We have the codes shown in the List 23

Listing 2.3: HTTP request result

1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or sentence in English to describe the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified in four main groups [17]. The entity header involves information about either the request, the response or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called response header and is sent by a server in a response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4: HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body. There may also be text giving information

or warning of errors In a request it is in the body where the user enters data or uploads

files which will be sent to the server When the HTTP message contains a body there are

usually header lines that provide information about the body One of these header lines

is called Content-Type and it indicates the MIME and type of the data in the body For

instance texthtml or imagegif Another very common header line is Content-Length

which provides how many bytes were used in the body

2.1.2 IP

Internet Protocol (IP) is used to build and interconnect networks for the exchange of

packets [15] It is important to be clear about this layer to make sure the information is

going to expected points within the network created in the cloud

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning the delivery, IP has a service model called best effort, which provides an unreliable datagram delivery; therefore it is not ensured that the datagrams reach their destinations. In addition, this service model may cause more problems, since the packets can be delivered out of order as well as reach the destination more than once.


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3: Fields of the IP Header

The first field is the Version Type which indicates the IP version used in the transmis-

sion The Header Length identifies the length of the header in 32-bit words If there are

no options the header has 5 words (20 bytes) The next field Type of Service is used to

indicate the quality of the service The field Total Length indicates the length in bytes

(unlike in Header Length where the length was count in words) of the whole datagram

When it comes to the Identification Field the sender always marks each IP datagram

with an ID number before the transmission The goal is to have unique datagrams so

if several fragments arrive to the destination since all of them had the same ID value

the destination host can put together the fragments received If some fragment does not

arrive all the fragments with the same number will be discarded In the next field there

are up to three flags The first flag does not have any use for now it is set to 0 The

flag D allows the fragmentation of data into smaller pieces when this flag is set to 1 The

flag M indicates whether the datagram received is the last one of the stream (set to 0)

or there are more datagrams left (set to 1)

The Fragment Offset is a value used by the sender to indicate the position of the

datagrams within the stream in which they have been sent so the receiver can put them

in order The first byte of the third word of the header is the field TTL which set the

maximum time that a datagram may be on the network before being discarded The

main goal of this function is to discard datagrams that are within the network but never


reach the receiver The next field is called ProtocolProtocol and indicates the kind of

protocol that is expected in the datagram The IP Header also uses a simple Checksum

to verify the integrity of the header and the data during the transmission To send the

packet is required to fill the field Source Address with the IP address of the sender as

well as to fill the Destination Address with the IP address of the receiver There is also a

field to set up some options if they were required and a Padding set with zeros to ensure

that the length of the header is multiple of 32

Fragmentation and Reassembly

Since IP provides host-to-host service throughout so many different networks with diverse

technology it is required to manage datagrams so they can go over all the networks

There are two choices available to figure this problem out [15] The first one is to ensure

that every IP datagrams are small enough in order to fit inside a packet in any type

of network The second option is to use some technique to fragment and reassemble

packets when they are too big to go through some network This second option is the

most suitable since networks are continuously changing and can be especially difficult to

choose a specific size for the packet that fits in every network This second option is the

one used in the Amazon networks where we ran the tests It is significant to know how

the segments are fragmented to examine each segment sent and its respective answer In

this way the exchange of packets was more organized and the recreation of traffic pattern

was easier to make

This second option is based on the Maximum Transmission Unit (MTU) which is the

biggest IP datagram that can be carried in a frame Normally the host chooses the MTU

size to send IP datagrams If by chance the packets go over some network with smaller

MTU it will be required to use fragmentation For instance if a packet of 1420 bytes

(including 20 bytes of IP header) has to go through a network with 532 bytes of MTU

the datagram will be fragmented in three packets The first two packets will contain 512

bytes of data and another 20 bytes for the header. Therefore there will be 376 bytes left (1400 − 2 × 512), so the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look like Figure 2.4.

It should be noted that the amount of data bytes in each packet must always be a multiple of 8. During this process the router will set the M bit in the Flags field of the first and second datagrams to indicate that there are more packets coming. As regards the Offset field, in the first packet it is set to 0, because this datagram carries the first part of the original packet. However, the second datagram will have the Offset set to 64, since the first byte of data it carries is the 513th (512/8 = 64).


Figure 2.4: Datagram fragmentation
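The arithmetic of this example can be reproduced with a small helper function. This is only an illustration of the calculation above, not part of the thesis scripts.

def fragment(total_len, mtu, ip_header=20):
    # Split an IP datagram of total_len bytes into (data bytes, offset) pairs for a given MTU
    data_left = total_len - ip_header
    per_fragment = ((mtu - ip_header) // 8) * 8   # data per fragment must be a multiple of 8
    fragments, offset = [], 0
    while data_left > 0:
        size = min(per_fragment, data_left)
        fragments.append((size, offset))
        offset += size // 8                       # the Offset field counts 8-byte blocks
        data_left -= size
    return fragments

print(fragment(1420, 532))   # [(512, 0), (512, 64), (376, 128)]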

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link layer network. To develop a mapping between the link layer addresses and IP addresses, it is required to use the Address Resolution Protocol (ARP) technique, so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link layer of a particular server through this technique involves

the next steps [15] First of all the sender will check its ARP cache to find out if it has

already the link layer address (MAC) of the receiver If it is not there a new ARP request

message will be sent which carries its own IP and link layer addresses and the IP address

of the server desired This message is received by every device within the local network

since this message is a broadcast The receivers compare the searched IP address with

their own IP address The servers with different IP addresses will drop the packet but

the receiver which we are looking for will send an ARP reply message to the client This

server also will update its ARP cache with the link layer address of the client When the

sender receives the ARP reply the MAC address of the receiver is saved The required

steps can be seen in the picture 25

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]

The data link layer is divided in two different sublayers Media Access Control known as

MAC (defined by IEEE 8023) and MAC client (defined by IEEE 8022) The structure

is shown in the Figure 26

The MAC client must be one of the next two different types of sublayers The first one

is the Logical Link Control (LLC) which supplies the interface from the MAC sublayer

to the upper layers The other option is called bridge entity which provides an interface

between LANs that can be using the same (for instance Ethernet to Ethernet) or different

protocols

Concerning the MAC sublayer [18] this level takes charge of data encapsulation as-

sembling also the frames before sending them as well as of analyzing these frames and

detecting errors during the communication Moreover this sublayer is in charge of starting


Figure 25 ARP request

Figure 26 Ethernet layers in OSI model


frame transmissions and recovering them from communication errors

The physical layer enables the communication between the data link layer and the

respective physical layer of other systems In addition this layer provides significant

physical features of the Ethernet such as voltage levels timing but the most important

functions are related with data encoding and channel access This layer can code and

decode bits between binary and phase-encoded form About access to the channel this

level sends and receives the encoded data we spoke about before and detects collisions

in the packets exchange

2.1.4 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in the internet standard RFC

768 [20] It is used as transport protocol therefore its function is similar to the TCP

protocol but UDP is sometimes preferred since it is faster lighter and simpler than TCP

However it is less reliable UDP provides a best-effort service to an end system which

means that UDP does not guarantee the proper delivery of the datagrams Therefore

these protocols must not be used when a reliable communication is necessary

UDP header

UDP messages are sent within a single IP packet and the maximum number of bytes is

65527 for IPv6 [21] When a UDP datagram is sent the data and the header go together

in the IP network layer and the computer has to fill the fields of the UDP header in the

proper way The scheme of the UDP protocol is represented in the Figure 27

Among other things UDP is normally used to serve Domain Name System (DNS)

requests on port number 53 DNS is a protocol that transforms domain names into IP

addresses This is important in this thesis since the proxy between client and server

needs to work out the server IP address

Figure 27 UDP protocol header


The UDP header is composed by four fields [15] each one contains 2 bytes The

Source Port indicates the port from which the packet was sent and it is by default the

port where the reply should be addressed if there is no any change The Destination

Port is the internet destination address where the packet will be sent The field for the

Length indicates the total number of bytes used in the header and in the payload data

Finally, the Checksum is a scheme to detect possible errors during the transmission. Each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the Checksum. Both Checksums must match to ensure that no error happened during the transmission.
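To make the four 2-byte fields concrete, the sketch below builds a raw UDP header with Python's struct module; the port numbers and the zero checksum are arbitrary example values, not taken from the thesis captures.

import struct

source_port = 49588            # example source port
dest_port = 53                 # e.g. a DNS request
payload = b"example"
length = 8 + len(payload)      # 8-byte header plus the data
checksum = 0                   # 0 means "no checksum computed" in IPv4

# Four fields of 2 bytes each, in network (big-endian) byte order
udp_header = struct.pack("!HHHH", source_port, dest_port, length, checksum)
print(len(udp_header))         # 8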

UDP ports

UDP ports give a location to send and receive UDP messages These ports are used to

send different kinds of traffic facilitating and setting an order for the packet transmission

Since the UDP port field is only 16 bits long there are 65536 available ports From 0

to 1023 are well-known port numbers The destination port is usually one of these

well-known ports and normally each one of these ports is used for one application in

particular

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer and used when a reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams in order when they are coming from the IP protocol. In addition, this protocol allows the data management to create fragments of different lengths to forward them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line, multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header The scheme of this header is

shown in the picture 28

The field Source Port identifies the sender port as well as the Destination Port does

with the receiver port The fields Sequence Number and the Acknowledgement Number

will be explained deeply in the next section since it is important to know how they work

during a connection The Header Length field also called data Offset sets the size of the

TCP header keeping in mind that the length will be always a multiple of 32 bits The

next field (called reserved in the picture) is useless for now and it is declared to zero

The flags field is used for additional information in the packets transmission The SYN


Figure 28 TCP protocol header

flag is used to set up a TCP connection and the FIN flag to finish it The ACK indicates

that the packet is an acknowledgement The URG flag is to inform that the segment

contain urgent data The PUSH flag is activated by the sender in order for the receiver

to let the sender know that the packet was received Finally the RESET is set to restart

the connection

Other important issue is the windows size Through this field we can know the number

of bytes that the receiver can accept without acknowledgement With the Checksum

field the packets transmission will be more reliable since this field is used to check the

integrity of the header The next field in the TCP header is called Urgent Pointer and its

function is to inform where the regular data (non-urgent data) contained in the packet

begins There can be also different options in the header the length of this field is

variable depending on what kind of options there are available Finally there is a space

between the options and the data called Padding It is set with zeros and the goal is to

ensure that the length of the packet is multiple of 32 bits

TCP connection

To set up a connection TCP uses an algorithm called three-way handshake [15] in

which three packets are sent In TCP connections sender and receiver must agree on

a number of parameters When a connection is established these parameters are the

starting sequence number An example of the way to set up a TCP connection is shown

in the Figure 29

First of all the client sends a packet to start the communication the SYN flag is set

to 1 and there will be a number carried in the sequence number field When the server


Figure 29 Establishing a connection in TCP

responds it will send a packet with the acknowledgement number equal to the sequence

number of the first packet plus one and its own beginning sequence number Both the

ACK and then SYN flags will be set to 1 Finally the client responds with a packet

in which the acknowledgement number is one number higher than the sequence number

received from the server Obviously the flag ACK must be set to 1 again

Furthermore the client can also request to finish a connection The process to end

the communication starts with a packet sent by the client with the FIN flag activated

Once the server receives the packet it sends an acknowledgement with the FIN flag set

to 1 and keeps on sending the packets in progress Afterwards the client informs its

application that a FIN segment was received and sends another packet with the FIN flag

to the server to end the communication

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequences that do not need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 210 Sliding window method

The window moves to the right when the client receives the ACK allowing the client

to send more packets In the example represented in the Figure 210 the window ends

up two position to the right because the sender got two acknowledgements The client

cannot send more than three packets straight without any ACK received since the size

of the window is three

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and perfor-

mance

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured with several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network and so on. The RTT has an established minimum time, since it cannot be less than the time the signals take to go through the network. The formula to get the value of the RTT within a network is shown in Equation 2.1:

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
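Equation 2.1 can be written directly as a few lines of Python; the RTT samples below are invented, and α = 0.85 is simply one value inside the recommended range.

def estimate_rtt(samples, alpha=0.85):
    # Exponentially weighted moving average of RTT samples (Equation 2.1)
    estimated = samples[0]
    for sample in samples[1:]:
        estimated = alpha * estimated + (1 - alpha) * sample
    return estimated

print(estimate_rtt([24.0, 30.0, 22.0, 51.0, 25.0]))  # invented samples in milliseconds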

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15] A sender

will transmit many packets straight one after the other with a certain distance between

them However problems with network congestion queues or configuration errors cause

that this distance between packets varies The implications of the jitter in the pictures

can be seen in the Figure 212

Jitter is a great problem since these fluctuations happen randomly and change very

quickly in time Therefore it is crucial to correct this problem as much as possible

One solution for this problem is to set a buffer which receives the packets at irregular

intervals This buffer will hold these packets for a short space of time in order to reorder

them if necessary and leave the same distance between each packet The main problem

of this method is that this buffer adds delay to the transmission They also will always

have a limited size so if the buffer is full of packets the new packets that come will be

dropped and they will never arrive to their destination


Figure 212 Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network

to another Several factors affect to this parameter The first contributor to network

latency is the propagation delay which is basically the time a packet takes to get from

one point to another at the speed of light The second factor to keep in mind is the time

it takes to transmit data and this depends on the bandwidth and the size of the packet

The last contributor is related with the queueing delays in switches and bridges where

packets are usually stored for some time. These factors can be defined in the next three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that are transmitted within the network during a second [15]. There is an important relationship between bandwidth and latency. To visualize this idea it may help to think of a pipe through which the data passes: the bandwidth would be the diameter of the pipe and the latency the length of this pipe. A simple drawing of the relation between network latency and bandwidth is given in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that the pipe can contain at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is required, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to

another in a given time This concept is used to measure the performance or efficiency

of hard drives RAM and networks The throughput can be calculated with the next

formula

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

where RTT is the round trip time.
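As a small numeric illustration of Equations 2.6 and 2.7 (the values are invented):

rtt = 0.05                  # 50 ms round trip time
bandwidth = 45e6            # 45 Mbps
transfer_size = 9e6         # 9 Mbits to transfer

transfer_time = rtt + (1 / bandwidth) * transfer_size   # Equation 2.7 -> 0.25 s
throughput = transfer_size / transfer_time              # Equation 2.6 -> 36 Mbps
print(transfer_time, throughput / 1e6)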

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted. In practice, however, due to inefficiencies of implementation or errors, a couple of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at the most.

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project


2.3.1 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool to control attacks against SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit. This tool may terminate SSL/TLS and launch a new SSL/TLS connection with the same receiver address. The goal of this tool

is to be helpful to test and analyze networks This tool can work with TCP SSL HTTP

and HTTPS connections over IPv4 and IPv6

Wireshark

Wireshark [24] is a powerful network packet analyzer with a high number of functions

This tool can capture datagrams and show in detail everything that the packet carries

Overall the aim of using wireshark is to solve and manage network problems examine

security problems remove errors in protocol implementations This program displays

the characteristics of the packets in great detail splitting them up in different layers

With this program users can see easily a list with captured packets running in real time

the details of a selected packet and the packet content in hexadecimal and ASCII In

addition it is also possible to filter the datagrams in order to make easier the search for

the packets which makes wireshark very manageable

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network Some

reasons why it is interesting to use tcpdump are verify connectivity between hosts and

look into the traffic network This tool also allows us to pick out particular kinds of

traffic depending on the header information Moreover it is possible to save all the traffic

captured in a file in order to be used in a future analysis These tcpdump files can be also

opened with software like wireshark Moreover tcpdump provides many instructions to

capture packets in different ways which give us a broad range of possibilities to manage

the traffic

Proxy

Proxy [26] is a server used as a gateway between a local network and another much

wider network A proxy is located in the middle of the communication between sender

and receiver The proxy receives the incoming data from one port and it forwards this

information to the rest of the network by another port Proxies may cache web sites


This happens each time a user from a local network asks for some URL The proxy that

receives this request will store a temporary copy of the URL The next time that a user

asks for the same web site the proxy can send the cached copy to the user instead of

forwarding the request to the network to find again the URL We can see this process in

the picture below where the proxy asks for each web site only once An example of how

a proxy works and handle the incoming requests is shown in the Figure 214

Figure 214 Proxy operation

In this way proxies can make much faster the delivery of packets within the network

but this is not the only function they cover They may also be used to avoid that hackers

get internal addresses since these proxies can block the access between two networks

Proxies can take part as a component of a firewall

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and the Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, database, deployment, application services, monitoring, storage and so on.
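A minimal sketch of how boto 2 can be used to start an EC2 instance, in the spirit of the Simulation.py script described in Chapter 3, is shown below. The AMI identifier, key pair and security group are placeholders, and the credentials are assumed to be in the boto configuration file.

import time
import boto.ec2

conn = boto.ec2.connect_to_region("eu-west-1")   # credentials read from the boto config file

reservation = conn.run_instances(
    "ami-00000000",             # placeholder image for the server
    instance_type="m1.large",   # type of instance chosen for the test
    key_name="my-key",          # placeholder key pair
    security_groups=["default"])

instance = reservation.instances[0]
while instance.state != "running":
    time.sleep(5)
    instance.update()           # refresh the instance state from AWS
print(instance.public_dns_name)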

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux would be more suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from every part of the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance,

network functionality security and so on This operating system very seldom freezes up

or slows down It can also provide high performance and support for networks where

client and server systems can be set up easily and quickly on a computer with Linux

It is very secure as well since Linux asks the user for the permissions Nowadays this

operating system is used more and more in both homes and companies due to all its

functionalities Linux offers many network applications so it could be very useful for this

thesis

We have described in this chapter many issues about networks which are crucial in the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations We

started with a simple example M2M and we ended up adding a proxy in between and

simulating several clients These scenarios were analyzed to acquire a deep knowledge

about this framework in order to extract the pattern properly later on

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and server. This is an easy way to start setting up a connection and to design a methodology with tools for developing larger-scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The tool chosen to program is called Python. This is a high-level programming language, highly recommended for network programming due to its ease of use in this field.



When it comes to programming the client for this application, it was needed to set the server Inet address and a random port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to the socket. Finally, we made the socket wait for incoming packets from the client and accept the connection.
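A minimal sketch of such a client-server pair is given below, assuming the same port 50007 that appears later in the captures; it is an illustration, not the exact scripts used in the thesis.

import socket

PORT = 50007  # same port that appears in the captures

def run_server():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("", PORT))         # bind the hostname (any interface) and the port
    server.listen(1)                # wait for an incoming connection
    conn, addr = server.accept()    # accept the connection from the client
    data = conn.recv(4096)
    conn.sendall(data)              # answer the client
    conn.close()
    server.close()

def run_client(server_address):
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((server_address, PORT))   # connect to the server address through the port
    client.sendall(b"test data")
    print(client.recv(4096))
    client.close()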

Listing 3.1 shows the packets required to establish a client-server connection.

Listing 3.1: Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, and with a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, only with the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection at both points; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of communications and make realistic loads towards the server. In the beginning it was needed to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in the instance it was necessary to access it and install the required libraries used by the scripts. Moreover, some programs such as Tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script to create the scenario and make simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be done automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease for developing anything related to networks.
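As an illustration only, the following sketch shows how such a launcher can create instances with boto 2; the AMI id, key pair and security group names are placeholders and not the ones used in the thesis.

import time
import boto.ec2

def launch_instances(count, instance_type="t1.micro", ami="ami-xxxxxxxx"):
    # Connect to the region used in the thesis (eu-west-1).
    conn = boto.ec2.connect_to_region("eu-west-1")
    reservation = conn.run_instances(
        ami,
        min_count=count, max_count=count,
        instance_type=instance_type,
        key_name="my-key",                 # placeholder key pair
        security_groups=["taas-testing"])  # placeholder security group
    # Wait until every instance reports 'running' before using it.
    for inst in reservation.instances:
        while inst.update() != "running":
            time.sleep(5)
    return [inst.public_dns_name for inst in reservation.instances]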

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source. This segment is the response from the proxy, with the SYN and ACK flags set to 1. This indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is called the three-way handshake [31].

Listing 3.3 Establishing the data source-proxy connection

"1" "0.000000" "10.34.252.34" "10.235.11.67" "TCP" "74" "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2" "0.000054" "10.235.11.67" "10.34.252.34" "TCP" "74" "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3" "0.000833" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the DNS name of the server. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4 Searching the server IP address

"4" "0.000859" "10.34.252.34" "10.235.11.67" "HTTP" "197" "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6" "0.001390" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7" "0.002600" "172.16.0.23" "10.235.11.67" "DNS" "166" "Standard query response 0xb33a"
"8" "0.002769" "10.235.11.67" "172.16.0.23" "DNS" "108" "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9" "0.003708" "172.16.0.23" "10.235.11.67" "DNS" "124" "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between proxy and server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5 Establishing the proxy-server connection

"10" "0.003785" "10.235.11.67" "10.224.83.21" "TCP" "74" "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11" "0.438963" "10.224.83.21" "10.235.11.67" "TCP" "74" "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12" "0.439029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 Connection established response gets to the data source, therefore the connection is now ready to start sending data. In these simulations it was decided to send data from time to time, with random waiting periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
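A minimal sketch of this behaviour of the data source is shown below, assuming a Squid proxy reachable at a placeholder address and the echo server listening on port 50007; it opens the tunnel with CONNECT, waits for the 200 response and then sends data bursts separated by random waiting times.

import random
import socket
import time

PROXY = ("proxy.example.internal", 3128)   # placeholder Squid address and port
TARGET = "ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com:50007"  # placeholder

def send_bursts(payload, repetitions):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(PROXY)
    s.sendall(("CONNECT %s HTTP/1.1\r\n\r\n" % TARGET).encode())
    status = s.recv(4096).split(b"\r\n", 1)[0]
    if b"200" not in status:               # expect "HTTP/1.0 200 Connection established"
        raise RuntimeError("proxy refused the tunnel: %r" % status)
    for _ in range(repetitions):
        s.sendall(payload)                 # data burst towards the server
        s.recv(len(payload))               # the server echoes the data back
        time.sleep(random.randint(1, 2))   # random waiting time between bursts
    s.close()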

The eight packets which compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6 Exchange of data source-proxy-server

"15" "0.466800" "10.34.252.34" "10.235.11.67" "TCP" "71" "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16" "0.466813" "10.235.11.67" "10.34.252.34" "TCP" "66" "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17" "0.466975" "10.235.11.67" "10.224.83.21" "TCP" "71" "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18" "0.467901" "10.224.83.21" "10.235.11.67" "TCP" "66" "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19" "0.468018" "10.224.83.21" "10.235.11.67" "TCP" "71" "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20" "0.468029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21" "0.468083" "10.235.11.67" "10.34.252.34" "TCP" "71" "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22" "0.508799" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up in a different instance, and the number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data. Furthermore, Figure 3.4 represents the other simulation, with a heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3 Bytes through the proxy with data burst of 1980 bytes

Figure 3.4 Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense since the data sent is three times bigger as well, therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is higher.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do so, we created a similar environment, but in this case with a variable number of data sources. This whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to set one client in each instance; therefore the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5 Structure for the simulation

The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going through the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same for data bursts of 5940 bytes.

Figure 3.6 Bytes through the proxy with data burst of 1980 bytes

Figure 3.7 Bytes through the proxy with data burst of 5940 bytes

Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the frequency of the packets being sent is high with ten data sources.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8 Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

Figure 3.9 Average RTT with 10 data sources
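The RTT values reported here were extracted from the Wireshark/Tcpdump captures. As a rough illustration of how such values can be computed, the sketch below (not the thesis's analysis script) pairs each data segment sent by a given host with the first ACK that covers it, using the dpkt library; retransmissions are ignored for simplicity.

import socket
import dpkt

def rtt_samples(pcap_path, sender_ip):
    pending = {}   # expected acknowledgement number -> send timestamp
    samples = []
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            src = socket.inet_ntoa(ip.src)
            if src == sender_ip and len(tcp.data) > 0:
                # remember when this data segment was sent
                pending.setdefault(tcp.seq + len(tcp.data), ts)
            elif src != sender_ip and tcp.flags & dpkt.tcp.TH_ACK:
                sent = pending.pop(tcp.ack, None)
                if sent is not None:
                    samples.append(ts - sent)   # one RTT sample in seconds
    return samples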

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower the row in the table, the shorter the round-trip time should be; however, this does not apply in every case, therefore the type of instance is not very significant here. The simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type    1 source    3 sources    5 sources    10 sources
t1.micro                0.0031      0.0046       0.0033       0.0039
m1.large                0.0037      0.0035       0.0038       0.0032
c1.medium               0.0031      0.0035       0.0051       0.0048
c1.xlarge               0.0039      0.0043       0.0037       0.0042

Table 3.1 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which have been retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                0.0026      0.0022       0.0021       0.0029
m1.large                0.0026      0.0024       0.0028       0.0024
c1.medium               0.0028      0.0031       0.0025       0.0030
c1.xlarge               0.0026      0.0029       0.0029       0.0024

Table 3.2 RTT with data bursts of 5940 bytes

With this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance. Here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    1.5         0            2            2
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           2.5          0            0
c1.medium               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.xlarge               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            2.5

Table 3.3 Number of TCP retransmissions


Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    2           6            15           67
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           5.5          1            36
c1.medium               1980 bytes    0           0            0            2
                        5940 bytes    0           7            13.5         50.5
c1.xlarge               1980 bytes    0           0            0            5
                        5940 bytes    0.5         5            9            54.5

Table 3.4 Number of lost packets

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           1            0            6.5
                        5940 bytes    3           1            7.5          2.5
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.medium               1980 bytes    0           0            0            4.5
                        5940 bytes    0           0            2.5          2.5
c1.xlarge               1980 bytes    0.5         0            0            0.5
                        5940 bytes    0.5         0            0            0

Table 3.5 Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little in the different tests, most probably due to the fact that the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important to improve the performance. This was more noticeable when the number of data sources was high: the c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed to find a method describing how to perform a proper extraction so as to generate the traffic again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: this requires packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, i.e. the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from the pcap file, such as packet timestamp, length and data sent.
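The sketch below illustrates this kind of extraction with dpkt, assuming the pcap file recorded in the proxy and a placeholder data source address; the grouping of packets into bursts and the filtering of the proxy-related HTTP and DNS segments, described next, are simplified here.

import socket
import dpkt

def extract_payloads(pcap_path, source_ip="10.34.252.34"):
    """Return a list of (timestamp, payload) for the data segments
    sent by the data source, in capture order."""
    payloads = []
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            # keep only TCP segments with data coming from the data source
            if socket.inet_ntoa(ip.src) == source_ip and len(tcp.data) > 0:
                payloads.append((ts, tcp.data))
    return payloads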

To recreate the traffic, the script had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of packet by packet. This is exactly the same way the data source sent packets in the simulations, therefore this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations where the proxy was set up, a few extra packets were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic, since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves in a file the information gathered from each data burst as well as its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, this program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, like in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets in an accurate manner. It must be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
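A minimal sketch of such a replay loop is given below, assuming the extraction step produced a list of (time offset, payload) pairs and using a placeholder server address; the real Replaytraffic.py also reproduces the delay before the first data packet and the exact number of packets.

import socket
import time

def replay(bursts, server=("server.example.internal", 50007)):
    # bursts: list of (seconds_since_capture_start, payload) pairs
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(server)
    start = time.time()
    for offset, payload in bursts:
        delay = offset - (time.time() - start)
        if delay > 0:
            time.sleep(delay)         # reproduce the original timing
        s.sendall(payload)            # resend the burst directly to the server
        s.recv(len(payload))          # read the echoed reply, as in the capture
    s.close()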

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from data source to proxy. These sniffed data were then replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also allows receiving the same data from the server, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the network traffic in the simulations, with data source, proxy and server. Furthermore, the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

The results of following this strategy are shown in Figure 4.2.

Figure 4.1 Structure of traffic replayed M2M

In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2 Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not get progressively worse. Therefore this method to recreate the traffic is very accurate, regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly. This is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in amount of bytes sent and in duration, therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances where to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances in order to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulations, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads, depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important characteristics when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. And finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically, as a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of data sources was increased one at a time, every five seconds.
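In the thesis each data source ran in its own EC2 instance; the sketch below only approximates that ramp-up with threads, reusing a replay() helper like the one sketched in Chapter 4, to show the idea of adding one client every five seconds.

import threading
import time

def multiply(bursts, n_clients, server):
    threads = []
    for _ in range(n_clients):
        t = threading.Thread(target=replay, args=(bursts, server))
        t.start()                 # one more client starts replaying the pattern
        threads.append(t)
        time.sleep(5)             # ramp up: a new client every five seconds
    for t in threads:
        t.join()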

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is about twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1 Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should go up approximately until 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent faster. This happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2 Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved running the same tests but with a higher quality instance for the server. We used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance; the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure: here we can see that the network cannot exchange data faster once about 60 clients are running.

Figure 5.3 Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although there are no peaks standing out, the average is quite a lot higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4 Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients there is not even one segment lost. Nevertheless, with 80 clients in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we achieved some results relating to the number of bytes per second, where the exchange of packets always reached a limit before being able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot: with many clients, the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table informs about the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs had more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become progressively worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we highly increased the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes
3.2 RTT with data bursts of 5940 bytes
3.3 Number of TCP retransmissions
3.4 Number of lost packets
3.5 Number of duplicate ACKs
5.1 Percentage of lost packets

List of Figures

1.1 Flow diagram of the developed system
2.1 OSI model
2.2 HTTP request
2.3 Fields of the IP Header
2.4 Datagram fragmentation
2.5 ARP request
2.6 Ethernet layers in OSI model
2.7 UDP protocol header
2.8 TCP protocol header
2.9 Establishing a connection in TCP
2.10 Sliding window method
2.11 Example RTT interval
2.12 Jitter effect
2.13 Relation between Latency and Bandwidth
2.14 Proxy operation
3.1 Structure client server
3.2 Structure client-proxy-server
3.3 Bytes through the proxy with data burst of 1980 bytes
3.4 Bytes through the proxy with data burst of 5940 bytes
3.5 Structure for the simulation
3.6 Bytes through the proxy with data burst of 1980 bytes
3.7 Bytes through the proxy with data burst of 5940 bytes
3.8 Average RTT with 3 data sources
3.9 Average RTT with 10 data sources
4.1 Structure of traffic replayed M2M
4.2 Comparison between simulation and replayed traffic
5.1 Number of bytes over time in different tests
5.2 Bytes using an m1.large instance for the server
5.3 Bytes using a c1.xlarge instance for the server
5.4 Average RTT extracted from the traffic recreations

REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


Figure 2.2 HTTP request

server will close the connection. First of all, we will describe the initial line of the request and response messages. Concerning the request message, this first line consists of three parts. The first one is the HTTP request method. The second part is the path of the requested resource; this part is called the URI. And finally, the version of HTTP that is being used. This can be clearly seen in Listing 2.2. This example was extracted from the simulations made during the thesis.

Listing 2.2 HTTP request with CONNECT

CONNECT ec2-54-217-136-250.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1

The initial line of the response from the server is also divided into three parts. The initial part contains the version of HTTP used for the communication. Afterwards there is a code [15] for the computer to understand the result of the request; the first digit indicates the class of the response. The possible codes are shown in Listing 2.3.

Listing 2.3 HTTP request result

1xx: informational message
2xx: success in the connection
3xx: redirects the client to another URL
4xx: error linked to the client
5xx: error linked to the server

Finally, there is a word or sentence in English to describe the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header involves information about either the request, the response or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is the response header, sent by a server in response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4 HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files which will be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about it. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which states how many bytes are used in the body.
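Purely as an illustration of the elements described above (request line, header lines and body), a raw HTTP request could look as follows; the host, path and values are invented for the example.

request = (
    "POST /upload HTTP/1.1\r\n"        # request line: method, URI, version
    "Host: www.example.com\r\n"        # header lines
    "User-Agent: Mozilla/3.0\r\n"
    "Content-Type: text/html\r\n"      # MIME type of the body
    "Content-Length: 11\r\n"           # number of bytes in the body
    "\r\n"                             # blank line separates headers from body
    "hello world"                      # message body (11 bytes)
)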

2.1.2 IP

Internet Protocol (IP) is used to build and interconnect networks for the exchange of packets [15]. It is important to be clear about this layer to make sure the information goes to the expected points within the network created in the cloud.

IP occupies the network layer in the OSI model. IP runs on both hosts and routers, defining an infrastructure that allows these nodes and networks to operate as a single internetwork. Concerning delivery, IP has a service model called best effort, which provides an unreliable datagram delivery; therefore it is not ensured that the datagrams reach their destinations. In addition, this service model may cause more problems, since the packets can be delivered out of order as well as reach the destination more than once.


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3 Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The field Total Length indicates the length in bytes (unlike Header Length, where the length is counted in words) of the whole datagram. When it comes to the Identification field, the sender always marks each IP datagram with an ID number before the transmission. The goal is to have unique datagrams, so that if several fragments arrive at the destination, since all of them carry the same ID value, the destination host can put the received fragments back together. If some fragment does not arrive, all the fragments with the same number will be discarded. In the next field there are up to three flags. The first flag does not have any use for now; it is set to 0. The D (Don't Fragment) flag forbids the fragmentation of the datagram into smaller pieces when it is set to 1. The M flag indicates whether the datagram received is the last one of the stream (set to 0) or there are more datagrams left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the fragments within the stream in which they have been sent, so the receiver can put them in order. The first byte of the third word of the header is the TTL field, which sets the maximum time that a datagram may be on the network before being discarded. The main goal of this field is to discard datagrams that are within the network but never reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet it is required to fill the field Source Address with the IP address of the sender, as well as to fill the Destination Address with the IP address of the receiver. There is also a field to set up some options if they are required, and a Padding set with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides host-to-host service across many different networks with diverse technologies, it is required to manage datagrams so that they can travel over all the networks. There are two choices available to figure this problem out [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits in every network. This second option is the one used in the Amazon networks where we ran the tests. It is important to know how the segments are fragmented in order to examine each segment sent and its respective answer. In this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to make.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If by chance the packets go over some network with a smaller MTU, it will be required to use fragmentation. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus another 20 bytes for the header. Therefore there will be 376 bytes left (1400 - 2*512), so the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look like Figure 2.4.

It should be noted that the amount of data bytes in each packet must always be a multiple of 8. During this process the router will set the M bit in the flags of the first and second datagrams to indicate that there are more packets coming. As regards the Offset field, in the first packet it is set to 0, because this datagram carries the first part of the original packet. However, the second datagram will have the Offset set to 64, since its first byte of data is the 513th (512/8 = 64).
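The small helper below reproduces this arithmetic for the example above (1400 data bytes, 532-byte MTU, 20-byte header); it is only a worked illustration of how the fragment sizes, offsets and M flags are obtained.

def fragments(data_len=1400, mtu=532, header=20):
    per_frag = ((mtu - header) // 8) * 8         # 512 data bytes, multiple of 8
    offset, result = 0, []
    while data_len > 0:
        size = min(per_frag, data_len)
        more = 1 if data_len > size else 0       # M flag: more fragments follow
        result.append({"offset": offset // 8, "len": size, "M": more})
        offset += size
        data_len -= size
    return result

# fragments() returns:
# [{'offset': 0, 'len': 512, 'M': 1},
#  {'offset': 64, 'len': 512, 'M': 1},
#  {'offset': 128, 'len': 376, 'M': 0}]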

Figure 2.4 Datagram fragmentation

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link layer network. To develop a mapping between link layer addresses and IP addresses, it is required to use the Address Resolution Protocol (ARP) technique, so that the physical interface hardware on the node can understand the addressing scheme.

The method for getting the link layer address of a particular server through this technique involves the following steps [15]. First of all, the sender will check its ARP cache to find out if it already has the link layer (MAC) address of the receiver. If it is not there, a new ARP request message will be sent, which carries the sender's own IP and link layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own IP address. The servers with different IP addresses will drop the packet, but the receiver we are looking for will send an ARP reply message to the client. This server will also update its ARP cache with the link layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two different types of sublayers. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called bridge entity, which provides an interface between LANs that can be using the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them as well as analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and of recovering from communication errors.

Figure 2.5 ARP request

Figure 2.6 Ethernet layers in OSI model

The physical layer enables the communication between the data link layer and the respective physical layer of other systems. In addition, this layer defines significant physical features of Ethernet such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer encodes and decodes bits between binary and phase-encoded form. Concerning access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

2.1.4 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in RFC 768 [20]. It is used as a transport protocol, therefore its function is similar to that of the TCP protocol, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable. UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum payload is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together down to the IP network layer, and the computer has to fill in the fields of the UDP header in the proper way. The scheme of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 2.7: UDP protocol header

The UDP header is composed of four fields [15], each one 2 bytes long. The Source Port indicates the port from which the packet was sent and, unless changed, it is the port to which the reply should be addressed. The Destination Port is the port on the destination host to which the packet will be delivered. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.
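As an illustration of this layout, the following minimal Python sketch (not part of the thesis code; it only follows the field description above) unpacks the four 2-byte fields of a UDP header:

import struct

def parse_udp_header(segment):
    # The UDP header is 8 bytes: source port, destination port,
    # length (header + payload) and checksum, all 16-bit big-endian fields.
    src_port, dst_port, length, checksum = struct.unpack('!HHHH', segment[:8])
    payload = segment[8:length]
    return src_port, dst_port, length, checksum, payload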

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and setting an order for the packet transmission. Since the UDP port field is 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer and used when a reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up, where each endpoint is defined by two parameters: the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams in order when they come from the IP protocol. In addition, this protocol can split the data into fragments of different lengths before forwarding them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data, a task carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in Figure 2.8.

The Source Port field identifies the sender port, just as the Destination Port does with the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in more detail in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called reserved in the picture) is unused for now and is set to zero.

The flags field is used for additional information in the packet transmission.

Figure 2.8: TCP protocol header

The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender to ask the receiver to deliver the data to the application immediately, without waiting for further segments. Finally, the RST (reset) flag is set to restart the connection.

Another important field is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. The Checksum field makes the packet transmission more reliable, since it is used to check the integrity of the segment. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, the main ones are the starting sequence numbers of both sides. An example of the way to set up a TCP connection is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own starting sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.
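As a purely hypothetical numeric illustration (the values are not taken from the captures in this thesis): if the client picks 1000 as its initial sequence number, it sends SYN with Seq=1000; the server answers SYN, ACK with its own Seq=3000 and Ack=1001; and the client completes the handshake with an ACK carrying Seq=1001 and Ack=3001.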

Furthermore, either endpoint can request to finish the connection. The process starts with a packet, sent for example by the client, with the FIN flag activated. Once the server receives this packet it sends an acknowledgement and keeps on sending any packets still in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client, which acknowledges it to end the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the receiver by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until an acknowledgement of the packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula used to estimate the RTT within a network is shown in equation (2.1).

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
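A minimal Python sketch of equation (2.1), assuming α = 0.85 and a short list of hypothetical RTT samples in seconds (the values are placeholders, not measurements from this thesis):

def estimate_rtt(samples, alpha=0.85):
    # Exponentially weighted moving average of the sampled RTTs, as in equation (2.1)
    estimated = samples[0]
    for sample in samples[1:]:
        estimated = alpha * estimated + (1 - alpha) * sample
    return estimated

print(estimate_rtt([0.0031, 0.0046, 0.0029, 0.0040]))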

2.2.2 Jitter

Jitter is the variation in the delay of the packets sent within a network [15]. A sender transmits many packets one right after the other, with a certain spacing between them. However, network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter on the packet spacing can be seen in Figure 2.12.

Jitter is a serious problem, since these fluctuations happen randomly and change very quickly over time. Therefore, it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short period of time in order to reorder them if necessary and to leave the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, new incoming packets are dropped and never arrive at their destination.

Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a signal takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are captured in the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)
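For example, with purely hypothetical values: on a 3000 km link where the signal propagates at about 2 × 10^8 m/s, Propagation ≈ 15 ms; transmitting a 1500-byte packet on a 45 Mbps link gives Transmit ≈ 0.27 ms; the Queue term then depends entirely on the load of the intermediate devices.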

2.2.4 Bandwidth

This concept describes the number of bits that can be transmitted over the network in one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of this pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms, we obtain the number of bits that can be in the pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is required, the problem can be solved by simply adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

where RTT is the round trip time.
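A small Python sketch of equations (2.6) and (2.7) with hypothetical values (a 1 MB transfer, 50 ms RTT and 45 Mbps of bandwidth; these numbers are only an example, not measurements from this thesis):

transfer_size = 8 * 10**6            # bits (1 MB)
rtt = 0.050                          # seconds
bandwidth = 45 * 10**6               # bits per second

transfer_time = rtt + (1.0 / bandwidth) * transfer_size
throughput = transfer_size / transfer_time
print("Transfer time: %.3f s, throughput: %.1f Mbps" % (transfer_time, throughput / 10**6))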

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in theory. However, due to implementation inefficiencies or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at the most.

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.

2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection to the original receiver address. The goal of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security issues and debug protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are travelling over the network. Some reasons to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file for later analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.
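For example, a command of the form 'tcpdump -i eth0 -w capture.pcap tcp port 50007' (the interface name and port here are only placeholders) writes all matching TCP traffic to a pcap file, which can later be opened with Wireshark or read back with 'tcpdump -r capture.pcap'.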

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. This happens each time a user from the local network asks for some URL: the proxy that receives the request stores a temporary copy of the page. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block direct access between two networks. Proxies can therefore act as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most popular and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.
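A minimal sketch, based on the boto (version 2) EC2 interface, of how an instance could be launched; the region, AMI id and key name are placeholders, and the credentials are assumed to be stored in the boto configuration file:

import boto.ec2

# Open a connection to the EC2 service in the chosen region
conn = boto.ec2.connect_to_region('eu-west-1')

# Launch one instance from a given image; the reservation holds the started instances
reservation = conn.run_instances('ami-12345678',
                                 instance_type='t1.micro',
                                 key_name='my-key')
instance = reservation.instances[0]
print(instance.id)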

2.3.3 Operating Systems

There are several kinds of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the availability and ease of use of network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all over the world, developed as free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good network support, where client and server systems can be set up easily and quickly. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more both at home and in companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many networking topics which are crucial for the next sections. A good knowledge of this matter is needed later on, when it comes to analyzing and recreating network traffic.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework, in order to extract the traffic pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application in Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology, with tools, for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Client-server structure

The language chosen for programming is Python, a high level language that is highly recommendable for network programming due to how easy it is to handle in this field.


When it comes to programming the client for this application, it was necessary to set the server Inet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. To program the server, it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we made the socket wait for incoming packets from the client and accept the connection.
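A minimal sketch of these steps (a simplified, assumed version of the client and server used here, with placeholder address and port):

import socket

HOST, PORT = '192.168.1.33', 50007

def run_server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))          # bind hostname and port to the socket
    srv.listen(1)                   # wait for an incoming connection
    conn, addr = srv.accept()       # accept the client connection
    data = conn.recv(1024)
    conn.sendall(data)              # echo the received data back
    conn.close()

def run_client():
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((HOST, PORT))       # the three-way handshake happens here
    cli.sendall(b'hello')
    print(cli.recv(1024))
    cli.close()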

In Listing 3.1 the packets required to establish a client-server connection are shown.

Listing 3.1: Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is the three way handshake. Analyzing these segments, we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, with a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one side would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Client-proxy-server structure

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the communication pattern and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in and install the libraries required by those scripts. Moreover, programs such as Tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready we could move on to writing the script to create the scenario and run simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be done automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1. This indicates the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is the 3-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the server hostname. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching for the server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"  "10.235.11.67"  "DNS"  "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"  "10.235.11.67"  "DNS"  "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the proxy and the server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" response gets back to the data source, so the connection is now ready to start sending data. In these simulations it was decided to send data from time to time with random time periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
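A minimal sketch of such a data source (an assumption, not the exact client script of this thesis): it opens a tunnel through the proxy with HTTP CONNECT and then sends data bursts at random intervals. The addresses are taken from the captures above, while the proxy port (the Squid default) and the burst handling are simplifications:

import random
import socket
import time

PROXY = ('10.235.11.67', 3128)
SERVER = 'ec2-54-228-99-43.eu-west-1.compute.amazonaws.com'
PORT = 50007

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(PROXY)
sock.sendall(('CONNECT %s:%d HTTP/1.1\r\n\r\n' % (SERVER, PORT)).encode())
sock.recv(4096)                      # expect "HTTP/1.0 200 Connection established"

for _ in range(200):                 # number of data bursts
    sock.sendall(b'x' * 1980)        # one data burst towards the server
    sock.recv(4096)                  # the server echoes the data back
    time.sleep(random.randint(1, 2)) # random waiting time between bursts
sock.close()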

The eight packets which compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data between data source, proxy and server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was placed in a different instance, and the number was scaled up from one to ten. The network was first tested with a traffic load based on sending bursts of 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. Here we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with bursts of 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times larger than in Figure 3.3. This makes sense, since the data sent is three times larger as well, so around three times as many packets are needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the scale of the graph, but also because the frequency and amount of segments being sent in the second case is higher.

3.3 Loading test with several clients

After the simulations with one client it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to place one client in each instance, so the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5: Structure for the simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going through the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same for data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much larger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the packet sending frequency is high with ten data sources.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not pose a big problem for the network performance. However, we can see that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, so the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower we move in the table, the shorter the times should be; however, this does not apply in every case, so the type of instance is not very significant here. In terms of RTT, the simplest instance seems to be enough for these exchanges of data. Concerning the number of clients, there is a slight difference, especially when comparing the RTT of 5 or 10 data sources with that of only one. But in general the results are quite similar, because this amount of packets does not pose a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: Average RTT in seconds with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

Table 3.3 shows the average number of TCP packets which had to be retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: Average RTT in seconds with data bursts of 5940 bytes

With this instance the number of retransmissions is larger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the simulation with most difficulties in the communication was the one with 10 data sources, the heaviest data burst and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can see how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. It must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Overall in this chapter the traffic pattern was analyzed and the network was tested


Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
                       5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
                       5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
                       5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
                       5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
                       5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little in the different tests, most probably due to the fact that the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results, we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed to find a method describing how to perform a proper extraction so that the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: the recreated packets must carry the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
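A minimal sketch of this kind of extraction with dpkt (an assumed simplification, not the exact Extractpattern.py; the file name is a placeholder):

import dpkt

bursts = []
with open('simulation.pcap', 'rb') as f:
    for timestamp, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP):
            continue
        ip = eth.data
        if not isinstance(ip.data, dpkt.tcp.TCP):
            continue                 # ignore non-TCP packets such as the DNS lookups
        tcp = ip.data
        if tcp.data:                 # keep only segments that actually carry payload
            bursts.append((timestamp, len(buf), tcp.data))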

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of packet by packet. This is exactly the same way the data source sent packets in the simulations, so this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was present, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saved to a file the information gathered from each data burst, together with its timestamp; Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, this program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from where the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It must be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
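A minimal sketch of the replaying step (an assumed simplification of Replaytraffic.py; the server address, port and the burst list format are placeholders):

import socket
import time

def replay(bursts, server=('10.224.83.21', 50007)):
    # bursts is a list of (timestamp, data) tuples extracted from the capture
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server)
    start = time.time()
    first_ts = bursts[0][0]
    for ts, data in bursts:
        # wait until the same relative instant as in the original capture
        delay = (ts - first_ts) - (time.time() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(data)
        sock.recv(4096)              # the server echoes the data back
    sock.close()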

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them to draw important conclusions and check the accuracy of the method. To achieve this, we had to filter the data sent from data source to proxy. This sniffed data was replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy allows us to receive the same data back from the server as well, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the network traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of the traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end and the difference does not keep growing. Therefore, this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore, we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern at large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore, it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important characteristics when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources: we started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.
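A minimal sketch of this ramp-up (only an illustration: in the real tests every replaying client ran in its own EC2 instance, whereas threads are used here just to show the five-second staggering; replay() is the function sketched in Chapter 4):

import threading
import time

def multiply(bursts, n_clients=80):
    threads = []
    for _ in range(n_clients):
        t = threading.Thread(target=replay, args=(bursts,))
        t.start()                    # one more simulated client starts replaying
        threads.append(t)
        time.sleep(5)                # add a new data source every five seconds
    for t in threads:
        t.join()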

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources, respectively. We can see that the red line is about twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph has about double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon. After 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, so the graph should go up until approximately 400 seconds; nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster: here, when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look at Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients reaches much higher values using the best instance; the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure: here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although there are no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients there is not even one segment lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we achieved some results relating to the number of bytes per second, where the exchange of packets always reached a limit before being able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot: with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying this pattern directly from clients to the server.

Therefore, in the first part we deployed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the replayed M2M traffic. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected its performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were presented and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes increased correctly as the number of clients grew. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results showed a performance improvement in the network when using high-quality instances and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been shown how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used, and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on TCP, and it focuses mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes  41

3.2 RTT with data bursts of 5940 bytes  42

3.3 Number of TCP retransmissions  42

3.4 Number of lost packets  43

3.5 Number of duplicate ACKs  43

5.1 Percentage of lost packets  53


List of Figures

1.1 Flow diagram of the developed system  10

2.1 OSI model  12

2.2 HTTP request  14

2.3 Fields of the IP Header  16

2.4 Datagram fragmentation  18

2.5 ARP request  19

2.6 Ethernet layers in OSI model  19

2.7 UDP protocol header  20

2.8 TCP protocol header  22

2.9 Establishing a connection in TCP  23

2.10 Sliding window method  24

2.11 Example RTT interval  25

2.12 Jitter effect  26

2.13 Relation between Latency and Bandwidth  27

2.14 Proxy operation  29

3.1 Structure client server  31

3.2 Structure client proxy server  33

3.3 Bytes through the proxy with data burst of 1980 bytes  37

3.4 Bytes through the proxy with data burst of 5940 bytes  37

3.5 Structure for simulation  38

3.6 Bytes through the proxy with data burst of 1980 bytes  39

3.7 Bytes through the proxy with data burst of 5940 bytes  39

3.8 Average RTT with 3 data sources  40

3.9 Average RTT with 10 data sources  41

4.1 Structure of traffic replayed M2M  47

4.2 Comparison between simulation and replayed traffic  47

5.1 Number of bytes over time in different tests  51

5.2 Bytes using a m1.large instance for the server  51

5.3 Bytes using a c1.xlarge instance for the server  52

5.4 Average RTT extracted from the traffic recreations  52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol – HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


1xx: informational message

2xx: success in the connection

3xx: redirects the client to another URL

4xx: error linked to the client

5xx: error linked to the server

Finally, there is a word or sentence in English to describe the status of the connection.

Header lines offer information about the request or response, or about any object sent in the message body, which will be explained later. There are many different header lines, but they can be classified into four main groups [17]. The entity header involves information about either the request, the response or the information contained in the message body. A general header is used in both the request and the response. The request header is sent by a browser or a client to a server. Finally, the last kind of header is called the response header and is sent by a server in response to a request. The format of the header lines is "Header-Name: value". Two examples of header lines are shown in Listing 2.4.

Listing 2.4: HTTP header lines

User-agent: Mozilla/3.0
Host: www.amazon.com

Finally, an HTTP message may have a body with data after the header lines. In a response, the requested resource is always sent in its body; there may also be text giving information or warning of errors. In a request, it is in the body where the user enters data or uploads files which will be sent to the server. When the HTTP message contains a body, there are usually header lines that provide information about that body. One of these header lines is called Content-Type, and it indicates the MIME type of the data in the body, for instance text/html or image/gif. Another very common header line is Content-Length, which states how many bytes are used in the body.
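As a small illustration (not taken from the thesis code, and using Python 3's http.client with an example host name), the following sketch sends a GET request and prints the status line and the two header lines just described:

# Minimal sketch: issue a GET request and inspect the status code and header lines.
import http.client

conn = http.client.HTTPConnection("www.example.com", 80, timeout=10)
conn.request("GET", "/", headers={"User-Agent": "Mozilla/3.0"})
resp = conn.getresponse()

print(resp.status, resp.reason)          # e.g. 200 OK (a 2xx success code)
print(resp.getheader("Content-Type"))    # MIME type of the body
print(resp.getheader("Content-Length"))  # number of bytes in the body
body = resp.read()
conn.close()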

2.1.2 IP

Internet Protocol (IP) is used to build and interconnect networks for the exchange of

packets [15] It is important to be clear about this layer to make sure the information is

going to expected points within the network created in the cloud

IP occupies the network layer in the OSI model IP runs both hosts and routers

defining an infrastructure that allows these nodes and networks operate as a single in-

ternetwork Concerning the delivery IP has the service model called best effort which

provides an unreliable datagram delivery therefore it is not ensured that the datagrams

reaches their destinations In addition this service model may cause more problems since

the packets can be delivered out of order as well as get the destination more than once


IP Header

Figure 2.3 shows all the fields carried in the IP header.

Figure 2.3: Fields of the IP Header

The first field is the Version Type which indicates the IP version used in the transmis-

sion The Header Length identifies the length of the header in 32-bit words If there are

no options the header has 5 words (20 bytes) The next field Type of Service is used to

indicate the quality of the service The field Total Length indicates the length in bytes

(unlike in Header Length where the length was count in words) of the whole datagram

When it comes to the Identification Field the sender always marks each IP datagram

with an ID number before the transmission The goal is to have unique datagrams so

if several fragments arrive to the destination since all of them had the same ID value

the destination host can put together the fragments received If some fragment does not

arrive all the fragments with the same number will be discarded In the next field there

are up to three flags The first flag does not have any use for now it is set to 0 The

flag D allows the fragmentation of data into smaller pieces when this flag is set to 1 The

flag M indicates whether the datagram received is the last one of the stream (set to 0)

or there are more datagrams left (set to 1)

The Fragment Offset is a value used by the sender to indicate the position of the

datagrams within the stream in which they have been sent so the receiver can put them

in order The first byte of the third word of the header is the field TTL which set the

maximum time that a datagram may be on the network before being discarded The

main goal of this function is to discard datagrams that are within the network but never

21 Communication protocols 17

reach the receiver. The next field, called Protocol, indicates the kind of

protocol that is expected in the datagram The IP Header also uses a simple Checksum

to verify the integrity of the header and the data during the transmission To send the

packet is required to fill the field Source Address with the IP address of the sender as

well as to fill the Destination Address with the IP address of the receiver There is also a

field to set up some options if they were required and a Padding set with zeros to ensure

that the length of the header is multiple of 32

Fragmentation and Reassembly

Since IP provides host-to-host service throughout so many different networks with diverse

technology it is required to manage datagrams so they can go over all the networks

There are two choices available to figure this problem out [15] The first one is to ensure

that every IP datagrams are small enough in order to fit inside a packet in any type

of network The second option is to use some technique to fragment and reassemble

packets when they are too big to go through some network This second option is the

most suitable since networks are continuously changing and can be especially difficult to

choose a specific size for the packet that fits in every network This second option is the

one used in the Amazon networks where we ran the tests It is significant to know how

the segments are fragmented to examine each segment sent and its respective answer In

this way the exchange of packets was more organized and the recreation of traffic pattern

was easier to make

This second option is based on the Maximum Transmission Unit (MTU) which is the

biggest IP datagram that can be carried in a frame Normally the host chooses the MTU

size to send IP datagrams If by chance the packets go over some network with smaller

MTU it will be required to use fragmentation For instance if a packet of 1420 bytes

(including 20 bytes of IP header) has to go through a network with 532 bytes of MTU

the datagram will be fragmented in three packets The first two packets will contain 512

bytes of data plus another 20 bytes for the header each. Therefore there will be 376 bytes left (1400 − 2 × 512), so the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look as in Figure 2.4.

It should be noted that the number of data bytes in each packet must always be a multiple of 8. During this process the router will set the M bit in the Flags field of the first and second datagram, to indicate that there are more packets coming. As regards the Offset field, in the first packet it is set to 0, because this datagram carries the first part of the original packet. However, the second datagram will have the Offset set to 64, since its first byte of data is the 513th (512/8 = 64).


Figure 2.4: Datagram fragmentation
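The same arithmetic can be sketched in a few lines of Python; this is only an illustrative calculation of the example above, not code used in the thesis:

# Sketch: fragment a 1420-byte datagram (20-byte header + 1400 data bytes)
# for a link with an MTU of 532 bytes, as in the example above.
MTU, IP_HEADER = 532, 20
remaining = 1420 - IP_HEADER                    # 1400 data bytes to fragment
per_fragment = ((MTU - IP_HEADER) // 8) * 8     # data per fragment, multiple of 8 -> 512

offset_units = 0                                # the offset is expressed in 8-byte units
while remaining > 0:
    data = min(per_fragment, remaining)
    more = 1 if remaining > data else 0         # M flag: are more fragments coming?
    print("offset=%3d  data=%3d bytes  M=%d" % (offset_units, data, more))
    offset_units += data // 8
    remaining -= data
# Prints three fragments: offset 0 and 64 with 512 bytes (M=1), offset 128 with 376 bytes (M=0)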

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link-layer network. To develop a mapping between link-layer addresses and IP addresses, the Address Resolution Protocol (ARP) technique is required, so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link layer of a particular server through this technique involves

the next steps [15] First of all the sender will check its ARP cache to find out if it has

already the link layer address (MAC) of the receiver If it is not there a new ARP request

message will be sent which carries its own IP and link layer addresses and the IP address

of the server desired This message is received by every device within the local network

since this message is a broadcast The receivers compare the searched IP address with

their own IP address The servers with different IP addresses will drop the packet but

the receiver which we are looking for will send an ARP reply message to the client This

server also will update its ARP cache with the link layer address of the client When the

sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of the next two different types of sublayers The first one

is the Logical Link Control (LLC) which supplies the interface from the MAC sublayer

to the upper layers The other option is called bridge entity which provides an interface

between LANs that can be using the same (for instance Ethernet to Ethernet) or different

protocols

Concerning the MAC sublayer [18] this level takes charge of data encapsulation as-

sembling also the frames before sending them as well as of analyzing these frames and

detecting errors during the communication Moreover this sublayer is in charge of starting


Figure 2.5: ARP request

Figure 2.6: Ethernet layers in OSI model


frame transmissions and recovering them from communication errors

The physical layer enables the communication between the data link layer and the

respective physical layer of other systems In addition this layer provides significant

physical features of the Ethernet such as voltage levels timing but the most important

functions are related with data encoding and channel access This layer can code and

decode bits between binary and phase-encoded form About access to the channel this

level sends and receives the encoded data we spoke about before and detects collisions

in the packets exchange

2.1.4 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in the internet standard RFC

768 [20] It is used as transport protocol therefore its function is similar to the TCP

protocol but UDP is sometimes preferred since it is faster lighter and simpler than TCP

However it is less reliable UDP provides a best-effort service to an end system which

means that UDP does not guarantee the proper delivery of the datagrams Therefore

these protocols must not be used when a reliable communication is necessary

UDP header

UDP messages are sent within a single IP packet and the maximum number of bytes is

65527 for IPv6 [21] When a UDP datagram is sent the data and the header go together

in the IP network layer and the computer has to fill the fields of the UDP header in the

proper way The scheme of the UDP protocol is represented in the Figure 27

Among other things UDP is normally used to serve Domain Name System (DNS)

requests on port number 53 DNS is a protocol that transforms domain names into IP

addresses This is important in this thesis since the proxy between client and server

needs to work out the server IP address

Figure 27 UDP protocol header


The UDP header is composed by four fields [15] each one contains 2 bytes The

Source Port indicates the port from which the packet was sent and it is by default the

port where the reply should be addressed if there is no any change The Destination

Port is the internet destination address where the packet will be sent The field for the

Length indicates the total number of bytes used in the header and in the payload data

Finally the Checksum is a scheme to avoid possible errors during the transmission Each

message is accompanied by a number calculated by the transmitter and the receiving

station applies the same algorithm as the transmitter to calculate the Checksum. Both Checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages These ports are used to

send different kinds of traffic facilitating and setting an order for the packet transmission

Since the UDP port field is only 16 bits long there are 65536 available ports From 0

to 1023 are well-known port numbers The destination port is usually one of these

well-known ports and normally each one of these ports is used for one application in

particular
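As a minimal illustration (example address and port only, not code from the thesis), a UDP exchange in Python needs no connection establishment; the operating system fills in the header fields described above when sendto() is called:

# Minimal UDP sketch: send one datagram and receive it on a local socket.
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 50007))              # example address and port

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", ("127.0.0.1", 50007))    # one datagram, no handshake

data, addr = receiver.recvfrom(1024)             # addr holds the source IP and port
print(data, addr)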

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer and used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams in order when they come from the IP protocol. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. Its scheme is shown in Figure 2.8.

The field Source Port identifies the sender port as well as the Destination Port does

with the receiver port The fields Sequence Number and the Acknowledgement Number

will be explained deeply in the next section since it is important to know how they work

during a connection The Header Length field also called data Offset sets the size of the

TCP header keeping in mind that the length will be always a multiple of 32 bits The

next field (called reserved in the picture) is useless for now and it is declared to zero

The flags field is used for additional information in the packets transmission The SYN


Figure 2.8: TCP protocol header

flag is used to set up a TCP connection and the FIN flag to finish it The ACK indicates

that the packet is an acknowledgement The URG flag is to inform that the segment

contains urgent data. The PSH flag is activated by the sender to ask the receiver to deliver the data to the application immediately, without waiting for further segments. Finally, the RST flag is set to reset the connection.

Other important issue is the windows size Through this field we can know the number

of bytes that the receiver can accept without acknowledgement With the Checksum

field the packets transmission will be more reliable since this field is used to check the

integrity of the header The next field in the TCP header is called Urgent Pointer and its

function is to inform where the regular data (non-urgent data) contained in the packet

begins There can be also different options in the header the length of this field is

variable depending on what kind of options there are available Finally there is a space

between the options and the data called Padding It is set with zeros and the goal is to

ensure that the length of the packet is multiple of 32 bits

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way a TCP connection is set up is shown in Figure 2.9.

First of all the client sends a packet to start the communication the SYN flag is set

to 1 and there will be a number carried in the sequence number field When the server


Figure 2.9: Establishing a connection in TCP

responds it will send a packet with the acknowledgement number equal to the sequence

number of the first packet plus one, and with its own beginning sequence number. Both the ACK and the SYN flags will be set to 1. Finally the client responds with a packet

in which the acknowledgement number is one number higher than the sequence number

received from the server Obviously the flag ACK must be set to 1 again

Furthermore the client can also request to finish a connection The process to end

the communication starts with a packet sent by the client with the FIN flag activated

Once the server receives the packet it sends an acknowledgement with the FIN flag set

to 1 and keeps on sending the packets in progress Afterwards the client informs its

application that a FIN segment was received and sends another packet with the FIN flag

to the server to end the communication
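The handshake fields described here can also be observed, or crafted by hand, with a packet library such as Scapy, which is also referenced in this thesis [12]. The sketch below is only illustrative: the destination address and port are examples, it requires root privileges, and the kernel (not Scapy) would normally complete or reset the connection afterwards.

# Sketch with Scapy: craft the first SYN of the three-way handshake and
# inspect the SYN/ACK reply.
from scapy.all import IP, TCP, sr1

syn = IP(dst="192.168.1.33") / TCP(sport=45125, dport=50007, flags="S", seq=0)
synack = sr1(syn, timeout=2)                     # send the SYN and wait for one answer

if synack is not None and (synack[TCP].flags & 0x12) == 0x12:   # SYN (0x02) + ACK (0x10)
    print("SYN/ACK received, ack =", synack[TCP].ack)           # should be our seq + 1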

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and


its size can be modified by the server changing the value in the window size field

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without any ACK received, since the size of the window is three.
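A toy model of this idea can be written in a few lines of Python (purely illustrative; the window size and segment count are made up and losses are ignored):

# Toy model of the sliding window: at most `window` unacknowledged segments
# may be outstanding; every ACK received slides the window one position.
window = 3
total_segments = 8
next_to_send = 0
last_acked = -1                       # highest sequence number acknowledged so far

while last_acked < total_segments - 1:
    while next_to_send < total_segments and next_to_send - last_acked <= window:
        print("send segment", next_to_send)
        next_to_send += 1
    last_acked += 1                   # assume the next ACK arrives (no losses here)
    print("ACK for segment", last_acked)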

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum time, since it cannot be less than the time the signals take to go through the network. The


formula to get the value of the RTT within a network is shown in equation (2.1):

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
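Equation (2.1) can be applied directly to a series of measured samples; the sketch below is illustrative only, and the sample values (in seconds) are made up:

# Sketch of the estimator in equation (2.1); alpha is chosen in the 0.8-0.9
# range recommended for TCP.
def estimate_rtt(samples, alpha=0.85):
    estimated = samples[0]
    for sample in samples[1:]:
        estimated = alpha * estimated + (1 - alpha) * sample
    return estimated

print(estimate_rtt([0.0031, 0.0046, 0.0029, 0.0120, 0.0035]))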

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15] A sender

will transmit many packets straight one after the other with a certain distance between

them However problems with network congestion queues or configuration errors cause

that this distance between packets varies. The implications of jitter can be seen in Figure 2.12.

Jitter is a great problem since these fluctuations happen randomly and change very

quickly in time Therefore it is crucial to correct this problem as much as possible

One solution for this problem is to set a buffer which receives the packets at irregular

intervals This buffer will hold these packets for a short space of time in order to reorder

them if necessary and leave the same distance between each packet The main problem

of this method is that this buffer adds delay to the transmission They also will always

have a limited size so if the buffer is full of packets the new packets that come will be

dropped and they will never arrive to their destination


Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network

to another Several factors affect to this parameter The first contributor to network

latency is the propagation delay which is basically the time a packet takes to get from

one point to another at the speed of light The second factor to keep in mind is the time

it takes to transmit data and this depends on the bandwidth and the size of the packet

The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be combined in the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that are transmitted within the network during

a second [15] There is an important relationship between bandwidth and latency to talk

about To visualize this idea it may help to think in a pipe through where the data

pass The bandwidth would be the diameter of the pipe and the latency the length of

this pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

If we multiply both terms we will achieve the number of bits that can be transmitted

in this pipe at a given instant For instance a channel with 50 ms of latency and 45


Figure 2.13: Relation between Latency and Bandwidth

Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is requested just adding more pipes the problem is solved

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to

another in a given time This concept is used to measure the performance or efficiency

of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

Where RTT is the round trip time

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can nominally be transmitted, whereas throughput is what is achieved in practice. Due to inefficiencies of implementation or errors, a pair of nodes connected in a network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.
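A small Python sketch of equations (2.6) and (2.7), with example numbers only (a 1 MB transfer over the 50 ms / 45 Mbps channel used above):

# Sketch of equations (2.6) and (2.7).
def transfer_time(rtt, bandwidth_bps, size_bits):
    return rtt + size_bits / bandwidth_bps                            # equation (2.7)

def throughput(rtt, bandwidth_bps, size_bits):
    return size_bits / transfer_time(rtt, bandwidth_bps, size_bits)   # equation (2.6)

size_bits = 1 * 10**6 * 8                                  # a 1 MB transfer, in bits
print(throughput(0.050, 45 * 10**6, size_bits))            # well below the 45 Mbps bandwidth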

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project


2.3.1 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool for carrying out man-in-the-middle interception of SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit, which may terminate the SSL/TLS session and open a new SSL/TLS connection towards the original receiver address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a high number of functions

This tool can capture datagrams and show in detail everything that the packet carries

Overall, the aim of using Wireshark is to solve and manage network problems, examine security issues and remove errors in protocol implementations. This program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program, users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file for use in a future analysis; these tcpdump files can also be opened with software like Wireshark. Furthermore, tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.
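A capture saved this way can later be inspected programmatically. The thesis also refers to the dpkt library [36]; a minimal sketch (assuming a file named capture.pcap written with tcpdump's -w option) could look as follows:

# Sketch: read a tcpdump capture with dpkt and print basic TCP packet information.
import socket
import dpkt

with open("capture.pcap", "rb") as f:
    for timestamp, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP):
            continue                                   # skip non-IP frames (e.g. ARP)
        ip = eth.data
        if not isinstance(ip.data, dpkt.tcp.TCP):
            continue                                   # skip non-TCP packets
        tcp = ip.data
        print(timestamp,
              socket.inet_ntoa(ip.src), tcp.sport, "->",
              socket.inet_ntoa(ip.dst), tcp.dport,
              "payload", len(tcp.data), "bytes")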

Proxy

Proxy [26] is a server used as a gateway between a local network and another much

wider network A proxy is located in the middle of the communication between sender

and receiver The proxy receives the incoming data from one port and it forwards this

information to the rest of the network by another port Proxies may cache web sites

23 Tools strongly associated with this thesis 29

This happens each time a user from a local network asks for some URL The proxy that

receives this request will store a temporary copy of the URL The next time that a user

asks for the same web site the proxy can send the cached copy to the user instead of

forwarding the request to the network to find the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent hackers from obtaining internal addresses, since proxies can block the access between two networks. Proxies can also take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and the Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, database, deployment, application services, monitoring, storage and so on.
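As a hedged illustration of this kind of automation with boto 2 (the AMI id, key pair and security group below are placeholders, not values from the thesis; credentials are assumed to be in the boto configuration file):

# Sketch with boto 2: start one EC2 instance and wait until it is running.
import time
import boto.ec2

conn = boto.ec2.connect_to_region("eu-west-1")
reservation = conn.run_instances("ami-xxxxxxxx",          # placeholder AMI id
                                 instance_type="t1.micro",
                                 key_name="my-key",        # placeholder key pair
                                 security_groups=["default"])
instance = reservation.instances[0]
while instance.state != "running":
    time.sleep(5)
    instance.update()                                      # refresh the instance state
print(instance.public_dns_name)                            # address used to reach the machine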

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance,

network functionality security and so on This operating system very seldom freezes up

or slows down It can also provide high performance and support for networks where

client and server systems can be set up easily and quickly on a computer with Linux

It is very secure as well since Linux asks the user for the permissions Nowadays this

operating system is used more and more in both homes and companies due to all its

functionalities Linux offers many network applications so it could be very useful for this

thesis

We have described in this chapter many issues about networks which are crucial for the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and we ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework, in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and a server. This is an easy way to start setting up a connection and to design a methodology, with tools, for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The programming language chosen is Python, a high-level language very recommendable for network programming due to its ease of handling in this field.


When it comes to programming the client for this application, it was necessary to set the server Inet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we made the socket wait for incoming connections from the client and accept the connection.
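A minimal sketch of such a pair is shown below (illustrative only, with an example address and port; these are not the exact scripts used in the tests):

# server.py - bind, listen, accept one client and echo its data back.
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("", 50007))            # example port; an empty host means all local interfaces
srv.listen(1)
conn, addr = srv.accept()        # blocks until the three-way handshake completes
data = conn.recv(4096)
conn.sendall(data)               # echo the data back, as the test server does
conn.close()
srv.close()

# client.py - connect to the server address and send one data burst.
import socket

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("192.168.1.33", 50007))   # example server address and port
cli.sendall(b"x" * 1980)               # one data burst
reply = cli.recv(4096)
cli.close()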

In Listing 3.1 the packets required to establish a client-server connection are shown.

Listing 3.1: Establish connection

"0.665317","192.168.1.24","192.168.1.33","TCP","74","49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"

"0.669736","192.168.1.33","192.168.1.24","TCP","66","EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"

"0.669766","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1, in order to connect to the server, and with a random sequence number x (Wireshark shows by default relative sequence numbers starting at zero). The answer of the server has the SYN and ACK flags activated, with its own sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, only with the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

"0.671945","192.168.1.33","192.168.1.24","TCP","60","EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"

"0.672251","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]; the structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of communications and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. And secondly, to run scripts in an instance it was necessary to access it and install the libraries required by those scripts. Moreover, some programs such as Tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script to create the scenario and run simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data


sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be run automatically. This script creates a scenario comprised, in the simplest case, of three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server always go-

ing through the proxy The server must answer to those packets creating a normal

connection Obviously before the exchange of data began the data source established

connection sending packets with the flag SYN set to 1 This is just done once in the

whole communication

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1, and it indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers, sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is called the three-way handshake [31].

Listing 3.3: Establishing data source-proxy connection

"1","0.000000","10.34.252.34","10.235.11.67","TCP","74","45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"

"2","0.000054","10.235.11.67","10.34.252.34","TCP","74","ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"

"3","0.000833","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the DNS name of the server. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching server IP address

"4","0.000859","10.34.252.34","10.235.11.67","HTTP","197","CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"

"6","0.001390","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"

"7","0.002600","172.16.0.23","10.235.11.67","DNS","166","Standard query response 0xb33a"

"8","0.002769","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"

"9","0.003708","172.16.0.23","10.235.11.67","DNS","124","Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing proxy-server connection

"10","0.003785","10.235.11.67","10.224.83.21","TCP","74","33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"

"11","0.438963","10.224.83.21","10.235.11.67","TCP","74","50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"

"12","0.439029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 'Connection established' response gets to the data source, and the connection is ready to start sending data. In these simulations it was decided to send data from time to time, with random time periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
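A sketch of such a data source is shown below. It is illustrative only: the proxy port (3128, Squid's default) is an assumption, the server name is the one appearing in the captures above, and the burst size, repetition count and 1 or 2 second pauses follow the test description in this chapter.

# Data source sketch: open a tunnel through the Squid proxy with HTTP CONNECT,
# then send data bursts separated by random pauses.
import random
import socket
import time

PROXY = ("10.235.11.67", 3128)     # proxy address from the captures; port 3128 is assumed (Squid default)
SERVER = "ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007"

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(PROXY)
s.sendall(("CONNECT %s HTTP/1.1\r\n\r\n" % SERVER).encode())
assert b"200" in s.recv(4096)      # wait for the "Connection established" reply

for _ in range(200):               # up to 200 bursts per simulation
    s.sendall(b"x" * 1980)         # one 1980-byte data burst
    s.recv(4096)                   # the server echoes the data back
    time.sleep(random.choice([1, 2]))
s.close()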

The eight packets which compose the exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15","0.466800","10.34.252.34","10.235.11.67","TCP","71","45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"

"16","0.466813","10.235.11.67","10.34.252.34","TCP","66","ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"

"17","0.466975","10.235.11.67","10.224.83.21","TCP","71","33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"

"18","0.467901","10.224.83.21","10.235.11.67","TCP","66","50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"

"19","0.468018","10.224.83.21","10.235.11.67","TCP","71","50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"

"20","0.468029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"

"21","0.468083","10.235.11.67","10.34.252.34","TCP","71","ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"

"22","0.508799","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6 the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is sending data: first from data source to proxy, which forwards everything to the server, and then all the way around, sending the data from the server back to the data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy for the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with a heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data burst of 1980 bytes

Figure 3.4: Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well, and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is simulating the connection of several clients. To this end we created a similar environment, but in this case with a variable amount of data sources. This whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to set one client in each instance, so the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5: Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data burst of 1980 bytes

Figure 3.7: Bytes through the proxy with data burst of 5940 bytes


Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are being sent is high.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks; therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.


Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower in the table, the shorter the times should be; however, this does not apply in every case, therefore the type of instance is not very remarkable in these cases. The simplest instance seems to be enough for this exchange of data, speaking about RTT values. Concerning the number of clients, there is a slight difference, especially comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1micro                0.0031     0.0046      0.0033      0.0039
m1large                0.0037     0.0035      0.0038      0.0032
c1medium               0.0031     0.0035      0.0051      0.0048
c1xlarge               0.0039     0.0043      0.0037      0.0042

Table 31 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In the Table 33 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1micro) seems to have more difficulties in the communication.


Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1micro                0.0026     0.0022      0.0021      0.0029
m1large                0.0026     0.0024      0.0028      0.0024
c1medium               0.0028     0.0031      0.0025      0.0030
c1xlarge               0.0026     0.0029      0.0029      0.0024

Table 32 RTT with data bursts of 5940 bytes

With this instance the number of retransmissions is larger. Either way, there is no simulation that stands out concerning retransmitted packets.

The Table 34 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the simulation with the most difficulties in the communication was the one with 10 data sources, the heaviest data burst and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1micro one. Nevertheless, there is no very significant gap among the other three instances (m1large, c1medium, c1xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in the Table 35 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1xlarge instance, unlike with t1micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. It must be pointed out that in these simulations a c1xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1micro                1980 bytes   0          0           0           0
                       5940 bytes   1.5        0           2           2
m1large                1980 bytes   0          0           0           0
                       5940 bytes   0          2.5         0           0
c1medium               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1xlarge               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           2.5

Table 33 Number of TCP retransmissions

Overall, in this chapter the traffic pattern was analyzed and the network was tested under different conditions.


Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1micro                1980 bytes   0          0           0           0
                       5940 bytes   2          6           15          67
m1large                1980 bytes   0          0           0           0
                       5940 bytes   0          5.5         1           36
c1medium               1980 bytes   0          0           0           2
                       5940 bytes   0          7           13.5        50.5
c1xlarge               1980 bytes   0          0           0           5
                       5940 bytes   0.5        5           9           54.5

Table 34 Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1micro                1980 bytes   0          1           0           6.5
                       5940 bytes   3          1           7.5         2.5
m1large                1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1medium               1980 bytes   0          0           0           4.5
                       5940 bytes   0          0           2.5         2.5
c1xlarge               1980 bytes   0.5        0           0           0.5
                       5940 bytes   0.5        0           0           0

Table 35 Number of duplicate ACK

These tests reveal the influence of diverse factors on network performance. We have seen that the RTT varies only a little across the different tests, most probably because the data sent does not stress the server enough. However, the last analysis of the values related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance. This was more noticeable when the number of data sources was high: the c1xlarge instance solved a large part of the packet loss problem compared with the t1micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. A method was needed to perform a proper extraction so that the traffic could be generated again. To do so, we looked into several publications that explain how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, that is, the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

41 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded during the previous simulations in the proxy instance. The next step was to program a python script made especially to obtain the required features from every packet. The best option to make this possible was a python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of packet by packet, which is exactly the way the data source sent the data in the simulations.



Therefore, this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. This made it easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
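The following sketch illustrates the kind of extraction just described. It is not the thesis' extraction script (Extractpattern.py, described in the next section): the capture file name, the filtered ports and the gap used to separate two bursts are assumptions made for the example.

    # Sketch of the extraction step: read the capture with dpkt [36], keep only TCP
    # segments carrying application data, drop the DNS and HTTP segments needed only
    # by the proxy, and group the payload into bursts stamped with the time of their
    # first segment. File name, ports and burst gap are assumptions.
    import dpkt
    import pickle

    BURST_GAP = 0.5              # seconds of silence that close a burst (assumption)
    FILTERED_PORTS = (53, 80)    # DNS and HTTP are not part of the M2M pattern

    bursts = []                  # list of (timestamp of first packet, payload bytes)
    data, first_ts, last_ts = b'', None, None

    with open('simulation.pcap', 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP) or not isinstance(eth.data.data, dpkt.tcp.TCP):
                continue
            tcp = eth.data.data
            if not tcp.data or tcp.sport in FILTERED_PORTS or tcp.dport in FILTERED_PORTS:
                continue
            if last_ts is not None and ts - last_ts > BURST_GAP and data:
                bursts.append((first_ts, data))     # close the previous burst
                data, first_ts = b'', None
            if first_ts is None:
                first_ts = ts
            data += tcp.data
            last_ts = ts
    if data:
        bursts.append((first_ts, data))

    with open('pattern.pkl', 'wb') as out:
        pickle.dump(bursts, out)                    # read later by the replay script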

42 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, gathers the information from each data burst together with its timestamp and saves it in a file. The second script then accesses the information contained in this file. Knowing this information, the program is ready to replay the data in an accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. I have to point out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
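A minimal sketch of such a replay loop is given below. It assumes the burst file produced by the extraction sketch in section 41 and a placeholder server address; the real Replaytraffic.py may differ in its details.

    # Sketch of the replay step: send every saved burst over one TCP socket,
    # preserving the recorded time distance between bursts. Server address and
    # pattern file format are assumptions matching the extraction sketch.
    import pickle
    import socket
    import time

    SERVER = ('10.0.0.2', 5000)          # placeholder server address and port

    with open('pattern.pkl', 'rb') as f:
        bursts = pickle.load(f)          # list of (timestamp, payload) pairs

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(SERVER)

    start = time.time()
    first_ts = bursts[0][0]
    for ts, payload in bursts:
        delay = (ts - first_ts) - (time.time() - start)
        if delay > 0:
            time.sleep(delay)            # wait until this burst is due
        sock.sendall(payload)            # the whole burst is sent at once
        sock.recv(65535)                 # assumes the server answers every burst
    sock.close()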

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method. To achieve this, we had to filter the data sent from the data source to the proxy. This sniffed data was replayed twice M2M with the second script, so that across the whole network we send the same amount of data, but in this case directly from client to server. This strategy allows receiving the same data from the server as well, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in the Figure 41: the figure on the left shows the traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

The results of following this strategy are shown in the Figure 42.


Figure 41 Structure of traffic replayed M2M

The graphs compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing.


Therefore, this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly, because the server response is something we could not control. Another important issue is the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in the Figure 42, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore, we can move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the developed TaaS system are explained. Then, remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved, we demonstrate the reliability of the TaaS [3] system created.

51 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads, depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded in a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.
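A data source with exactly this configuration could look like the sketch below; it is only an illustration with a placeholder server address, not the Client.py script itself.

    # Sketch of the data source used in this simulation: bursts of 3960 bytes,
    # up to 400 repetitions, and a random pause of 1 to 3 seconds between bursts.
    # The server address is a placeholder.
    import random
    import socket
    import time

    SERVER = ('10.0.0.2', 5000)   # placeholder
    BURST_SIZE = 3960
    REPETITIONS = 400

    payload = b'x' * BURST_SIZE
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(SERVER)
    for _ in range(REPETITIONS):
        sock.sendall(payload)
        time.sleep(random.uniform(1, 3))   # unpredictable client behaviour
    sock.close()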

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased the number up to 80, which was considered enough clients to create heavy loads of traffic. Then we can compare the different results with each other and with the original simulation to extract interesting conclusions. The number of data sources was increased one at a time, every five seconds.
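The ramp-up can be pictured with the short sketch below. For simplicity it starts the sources as threads on a single machine, whereas in the thesis each data source runs in its own EC2 instance; replay_pattern() is a placeholder standing for the replay routine of section 42.

    # Sketch of the ramp-up: a new data source starts every five seconds until 80
    # of them are replaying the pattern. Threads are used here only for
    # illustration; replay_pattern() is a placeholder for the real replay routine.
    import threading
    import time

    MAX_CLIENTS = 80
    RAMP_INTERVAL = 5.0           # seconds between two new data sources

    def replay_pattern(client_id):
        pass                      # placeholder: replay the traffic towards the server

    threads = []
    for client_id in range(MAX_CLIENTS):
        t = threading.Thread(target=replay_pattern, args=(client_id,))
        t.start()
        threads.append(t)
        time.sleep(RAMP_INTERVAL)
    for t in threads:
        t.join()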

The Figure 51 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests, but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have twice the amount of bytes of the blue one in the Figure 51. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients problems in sending data appear very soon: after 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep going up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster, namely when the number of clients gets to about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 52 Bytes using a m1large instance for the server

Now we can compare the Figure 52 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1xlarge. If we look over the Figure 53 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in the Figure 52. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in the Figure 53. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyzing the RTT, as we can see in the Figure 54, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the graph for 80 clients, where, although no peaks stand out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 54 Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instance tested is shown in the Table 51. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1micro                0          0.011       0.044        0.091        0.128
m1large                0          0.027       0.053        0.128        0.154
c1medium               0.007      0           0.039        0.076        0.085
c1xlarge               0.007      0.004       0.067        0.120        0.125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably; with many clients the RTT graph becomes smoother and its average gets higher. Finally, the Table 51 gave us significant information about the sending of packets. This table shows the risk for packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

61 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instance to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs had more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration when the number of clients was raised.



When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not get worse and worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has also been proven that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instance.

Overall, there seem to be good results about the amount of bytes the server could handle and about the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

62 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of a test before starting it.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol



Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53


List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. Combs et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol – HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.



IP Header

The Figure 23 shows all the fields carried in the IP header.

Figure 23 Fields of the IP Header

The first field is the Version, which indicates the IP version used in the transmission. The Header Length identifies the length of the header in 32-bit words; if there are no options, the header has 5 words (20 bytes). The next field, Type of Service, is used to indicate the quality of the service. The field Total Length indicates the length of the whole datagram in bytes (unlike the Header Length, where the length is counted in words).

When it comes to the Identification Field the sender always marks each IP datagram

with an ID number before the transmission The goal is to have unique datagrams so

if several fragments arrive to the destination since all of them had the same ID value

the destination host can put together the fragments received If some fragment does not

arrive, all the fragments with the same number will be discarded. The next field contains up to three flags. The first flag has no use for now and is set to 0. The flag D (Don't Fragment), when set to 1, indicates that the datagram must not be fragmented into smaller pieces. The flag M (More Fragments) indicates whether the datagram received is the last one of the stream (set to 0) or whether there are more fragments left (set to 1).

The Fragment Offset is a value used by the sender to indicate the position of the

datagrams within the stream in which they have been sent so the receiver can put them

in order The first byte of the third word of the header is the field TTL which set the

maximum time that a datagram may be on the network before being discarded The

main goal of this function is to discard datagrams that are within the network but never


reach the receiver. The next field is called Protocol and indicates the kind of protocol that is expected in the datagram. The IP header also uses a simple Checksum to verify the integrity of the header during the transmission. To send the packet, the field Source Address must be filled with the IP address of the sender, and the Destination Address with the IP address of the receiver. There is also a field to set up options, if they are required, and a Padding filled with zeros to ensure that the length of the header is a multiple of 32 bits.

Fragmentation and Reassembly

Since IP provides a host-to-host service throughout many different networks with diverse technologies, datagrams must be managed so that they can go over all of them. There are two choices available to solve this problem [15]. The first one is to ensure that every IP datagram is small enough to fit inside a packet on any type of network. The second option is to use some technique to fragment and reassemble packets when they are too big to go through some network. This second option is the most suitable, since networks are continuously changing and it can be especially difficult to choose a specific packet size that fits every network. This second option is also the one used in the Amazon networks where we ran the tests. It is important to know how the segments are fragmented in order to examine each segment sent and its respective answer; in this way the exchange of packets was more organized and the recreation of the traffic pattern was easier to carry out.

This second option is based on the Maximum Transmission Unit (MTU), which is the biggest IP datagram that can be carried in a frame. Normally the host chooses the MTU size to send IP datagrams. If the packets happen to go over some network with a smaller MTU, fragmentation will be required. For instance, if a packet of 1420 bytes (including 20 bytes of IP header) has to go through a network with an MTU of 532 bytes, the datagram will be fragmented into three packets. The first two packets will contain 512 bytes of data plus another 20 bytes for the header. Therefore there will be 376 bytes left (1400 − 512 × 2), so the last datagram will carry those 376 bytes of data plus 20 bytes for the header. The result would look like the Figure 24.

It should be noted that the amount of data bytes in every fragment except the last must always be a multiple of 8. During this process the router will set the M bit in the flags of the first and second datagram to indicate that there are more fragments coming. As regards the Offset field, in the first packet it is set to 0, because this datagram carries the first part of the original packet. However, the second datagram will have the Offset set to 64, since its first byte of data is the 513th (512 / 8 = 64).


Figure 24 Datagram fragmentation
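The example above can be reproduced with a small helper function; this is only an illustration of the arithmetic, not code used in the thesis.

    # Fragmenting the data of an IP datagram for a given MTU. Offsets are in
    # 8-byte units and the M (more fragments) bit is 1 for all but the last fragment.
    IP_HEADER = 20

    def fragments(total_length, mtu):
        data_left = total_length - IP_HEADER
        per_fragment = ((mtu - IP_HEADER) // 8) * 8   # data per fragment, multiple of 8
        offset_bytes, result = 0, []
        while data_left > 0:
            size = min(per_fragment, data_left)
            data_left -= size
            result.append((offset_bytes // 8, size, 1 if data_left > 0 else 0))
            offset_bytes += size
        return result

    # The 1420-byte datagram over a 532-byte MTU gives:
    # [(0, 512, 1), (64, 512, 1), (128, 376, 0)]
    print(fragments(1420, 532))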

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link layer technology. To develop a mapping between link layer addresses and IP addresses, the Address Resolution Protocol (ARP) technique is used, so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out whether it already has the link layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, carrying the sender's own IP and link layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own; the hosts with a different IP address drop the packet, but the receiver we are looking for sends an ARP reply message back to the client. This server also updates its ARP cache with the link layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in the Figure 25.

213 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]

The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure

is shown in the Figure 26

The MAC client must be one of the next two different types of sublayers The first one

is the Logical Link Control (LLC) which supplies the interface from the MAC sublayer

to the upper layers The other option is called bridge entity which provides an interface

between LANs that can be using the same (for instance Ethernet to Ethernet) or different

protocols

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing received frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and of recovering from communication errors.

Figure 25 ARP request

Figure 26 Ethernet layers in OSI model

The physical layer enables the communication between the data link layer and the

respective physical layer of other systems In addition this layer provides significant

physical features of the Ethernet such as voltage levels timing but the most important

functions are related with data encoding and channel access This layer can code and

decode bits between binary and phase-encoded form About access to the channel this

level sends and receives the encoded data we spoke about before and detects collisions

in the packets exchange

214 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in the internet standard RFC

768 [20] It is used as transport protocol therefore its function is similar to the TCP

protocol but UDP is sometimes preferred since it is faster lighter and simpler than TCP

However it is less reliable UDP provides a best-effort service to an end system which

means that UDP does not guarantee the proper delivery of the datagrams Therefore

these protocols must not be used when a reliable communication is necessary

UDP header

UDP messages are sent within a single IP packet and the maximum number of bytes is

65527 for IPv6 [21] When a UDP datagram is sent the data and the header go together

in the IP network layer and the computer has to fill the fields of the UDP header in the

proper way The scheme of the UDP protocol is represented in the Figure 27

Among other things UDP is normally used to serve Domain Name System (DNS)

requests on port number 53 DNS is a protocol that transforms domain names into IP

addresses This is important in this thesis since the proxy between client and server

needs to work out the server IP address

Figure 27 UDP protocol header


The UDP header is composed by four fields [15] each one contains 2 bytes The

Source Port indicates the port from which the packet was sent and it is by default the

port to which the reply should be addressed if there is no change. The Destination Port identifies the port on the destination host to which the packet will be delivered. The field for the

Length indicates the total number of bytes used in the header and in the payload data

Finally the Checksum is a scheme to avoid possible errors during the transmission Each

message is accompanied by a number calculated by the transmitter and the receiving

station applies the same algorithm as the transmitter to calculate the Checksum. Both Checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages These ports are used to

send different kinds of traffic facilitating and setting an order for the packet transmission

Since the UDP port field is only 16 bits long there are 65536 available ports From 0

to 1023 are well-known port numbers The destination port is usually one of these

well-known ports and normally each one of these ports is used for one application in

particular

215 TCP protocol

Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer, used when a reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the segments coming from the IP protocol in order. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in the Figure 28.

The field Source Port identifies the sender port, just as the Destination Port does with the receiver port. The fields Sequence Number and Acknowledgement Number will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called reserved in the picture) has no use for now and is set to zero.

Figure 28 TCP protocol header

The flags field is used to carry additional control information. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet carries an acknowledgement. The URG flag informs that the segment contains urgent data. The PUSH flag tells the receiver to deliver the data to the application immediately instead of buffering it. Finally, the RST flag is used to reset the connection.

Another important field is the window size. Through this field we can know the number of bytes that the receiver can accept without an acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the segment. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called three-way handshake [15], in which three packets are sent. In a TCP connection, sender and receiver must agree on a number of parameters; when the connection is established, these parameters are the starting sequence numbers. An example of the way to set up a TCP connection is shown in the Figure 29.

Figure 29 Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and an initial sequence number is carried in the Sequence Number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server, and the ACK flag is set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending the packets still in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client to end the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 210 Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in the Figure 210, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without any ACK received, since the size of the window is three.

22 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and perfor-

mance

221 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula to estimate the RTT within a network is shown in the equation (2.1):

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT        (2.1)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in the Figure 211.

Figure 211 Example RTT interval
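Equation (2.1) is straightforward to apply in code; the sketch below uses α = 0.85, inside the advised range, and a handful of invented sample values purely for illustration.

    # Equation (2.1) applied to a list of RTT samples with alpha = 0.85.
    # The sample values are invented, only to show how the estimate smooths them.
    def estimated_rtt(samples, alpha=0.85):
        estimate = samples[0]
        for sample in samples[1:]:
            estimate = alpha * estimate + (1 - alpha) * sample
        return estimate

    print(estimated_rtt([0.0031, 0.0046, 0.0029, 0.0120, 0.0033]))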

222 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender will transmit many packets one right after the other, with a certain distance between them. However, problems with network congestion, queues or configuration errors cause this distance between packets to vary. The effect of jitter can be seen in the Figure 212.

Jitter is a great problem since these fluctuations happen randomly and change very

quickly in time Therefore it is crucial to correct this problem as much as possible

One solution for this problem is to set a buffer which receives the packets at irregular

intervals This buffer will hold these packets for a short space of time in order to reorder

them if necessary and leave the same distance between each packet The main problem

of this method is that this buffer adds delay to the transmission They also will always

have a limited size so if the buffer is full of packets the new packets that come will be

dropped and they will never arrive to their destination


Figure 212 Jitter effect

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and on the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be defined by the next three formulas:

Latency = Propagation + Transmit + Queue        (2.2)

Propagation = Distance / SpeedOfLight        (2.3)

Transmit = Size / Bandwidth        (2.4)

224 Bandwidth

This concept describes the number of bits that are transmitted within the network during a second [15]. There is an important relationship between bandwidth and latency. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of this pipe. A simple drawing with the relation between network latency and bandwidth is in the Figure 213.

Figure 213 Relation between Latency and Bandwidth

If we multiply both terms, we obtain the number of bits that can be transmitted in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits        (2.5)

If more bandwidth is required, the problem is solved by simply adding more pipes.

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the next formulas:

Throughput = TransferSize / TransferTime        (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize        (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in theory. However, due to inefficiencies of implementation or errors, a couple of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at the most.
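The formulas of this section can be put together in a few lines of code. The sketch below reuses the 50 ms / 45 Mbps channel from the bandwidth example and an invented 1 MB transfer size to show that the resulting throughput stays below the bandwidth.

    # Equations (2.2)-(2.7) in code form. The channel values come from the example
    # in section 224; the 1 MB transfer size is invented for illustration.
    SPEED_OF_LIGHT = 3.0e8                               # m/s, rough propagation speed

    def latency(distance, size, bandwidth, queue=0.0):
        propagation = distance / SPEED_OF_LIGHT          # (2.3)
        transmit = size / bandwidth                      # (2.4)
        return propagation + transmit + queue            # (2.2)

    def throughput(transfer_size, rtt, bandwidth):
        transfer_time = rtt + (1.0 / bandwidth) * transfer_size   # (2.7)
        return transfer_size / transfer_time                      # (2.6)

    print(50e-3 * 45e6)                               # bits held by the pipe, equation (2.5)
    print(latency(3.0e6, 12000, 45e6))                # 3000 km link, 1500-byte packet
    print(throughput(8e6, rtt=0.1, bandwidth=45e6))   # 1 MB transfer, assuming a 100 ms RTT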

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project


231 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool for performing man-in-the-middle attacks against SSL/TLS network connections: the connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection to the original destination address. The goal of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a high number of functions

This tool can capture datagrams and show in detail everything that the packet carries

Overall the aim of using wireshark is to solve and manage network problems examine

security problems remove errors in protocol implementations This program displays

the characteristics of the packets in great detail splitting them up in different layers

With this program users can see easily a list with captured packets running in real time

the details of a selected packet and the packet content in hexadecimal and ASCII In

addition it is also possible to filter the datagrams in order to make easier the search for

the packets which makes wireshark very manageable

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network Some

reasons why it is interesting to use tcpdump are verify connectivity between hosts and

look into the traffic network This tool also allows us to pick out particular kinds of

traffic depending on the header information Moreover it is possible to save all the traffic

captured in a file in order to be used in a future analysis These tcpdump files can be also

opened with software like wireshark Moreover tcpdump provides many instructions to

capture packets in different ways which give us a broad range of possibilities to manage

the traffic

Proxy

Proxy [26] is a server used as a gateway between a local network and another much

wider network A proxy is located in the middle of the communication between sender

and receiver The proxy receives the incoming data from one port and it forwards this

information to the rest of the network by another port Proxies may cache web sites


This happens each time a user from a local network asks for some URL The proxy that

receives this request will store a temporary copy of the URL The next time that a user

asks for the same web site the proxy can send the cached copy to the user instead of

forwarding the request to the network to find again the URL We can see this process in

the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in the Figure 214.

Figure 214 Proxy operation

In this way proxies can make much faster the delivery of packets within the network

but this is not the only function they cover They may also be used to avoid that hackers

get internal addresses since these proxies can block the access between two networks

Proxies can take part as a component of a firewall

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and the Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, database, deployment, application services, monitoring, storage and so on.
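As an illustration of how Boto can be used in this context, the minimal sketch below starts one EC2 instance with the boto 2.x API; the region, AMI id, key pair name and security group are placeholders that depend on the actual AWS account, and this is not a listing of the thesis scripts themselves.

# Minimal sketch of launching an EC2 instance with boto 2.x.
# Region, AMI id, key name and security group are placeholders, not values from this thesis.
import time
import boto.ec2

# Credentials can also be placed in the .boto configuration file instead of passing them here.
conn = boto.ec2.connect_to_region(
    'eu-west-1',
    aws_access_key_id='YOUR_ACCESS_KEY',
    aws_secret_access_key='YOUR_SECRET_KEY')

# Launch one t1.micro instance from a given AMI.
reservation = conn.run_instances(
    'ami-xxxxxxxx',
    instance_type='t1.micro',
    key_name='my-key-pair',
    security_groups=['default'])

instance = reservation.instances[0]

# Wait until the instance is running, then print its public address.
while instance.update() != 'running':
    time.sleep(5)
print(instance.public_dns_name)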

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world in order to make it free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionalities. Linux offers many network applications, so it could be very useful for this thesis.

We have described in this chapter many issues about networks which are crucial for the next sections. It is important to have a deep knowledge of these matters, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework in order to extract the traffic pattern properly later on.

3.1 Client-Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology, with tools, for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client-server

The tool chosen for the programming is Python. This is a high-level programming language, very advisable for network programming due to its ease of use in this field.

When it comes to programming the client for this application, it was necessary to set the server Inet address and a random port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to the socket. Finally, we made the socket wait for incoming packets from the client and accept the connection.
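A minimal sketch of such a pair of programs is given below (the port number, buffer size and server hostname are illustrative placeholders; the actual scripts written for this thesis may differ in detail):

# server side - minimal TCP server sketch
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('0.0.0.0', 50007))      # bind hostname and port to the socket
server.listen(1)                     # wait for incoming connections

conn, addr = server.accept()         # accept the connection from the client
data = conn.recv(4096)               # receive the data sent by the client
conn.sendall(data)                   # answer with the same data
conn.close()
server.close()

# client side - minimal TCP client sketch
import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('server.example.com', 50007))   # server Inet address and chosen port
client.sendall(b'hello')                        # exchange of data
reply = client.recv(4096)
client.close()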

The packets required to establish a client-server connection are shown in Listing 3.1.

Listing 3.1: Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0

This exchange of packets is called the three-way handshake. Analyzing these segments, we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, with a random sequence number x. Wireshark shows by default a relative sequence number starting at zero. The answer from the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Structure client-proxy-server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of communications and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in and install the libraries required by those scripts. Moreover, programs such as Tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests where there are several data sources. In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests would be run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data begins, the data source establishes the connection by sending packets with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source. This segment is the response from the proxy, with the SYN and ACK flags set to 1, indicating that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is called the 3-way handshake [31].

Listing 3.3: Establishing data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the DNS name of the destination server. Then the proxy looks up the IP address of that server by sending DNS packets. We can see this in Listing 3.4.

Listing 3.4: Searching server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"  "10.235.11.67"  "DNS"  "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"  "10.235.11.67"  "DNS"  "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 "200 OK Connection established" response gets to the data source, so the connection is now ready to start sending data. In these simulations it was decided to send data from time to time, with random time periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
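The behaviour of the data source described above can be sketched as follows. This is only a hedged illustration of tunnelling TCP data through a Squid proxy with the HTTP CONNECT method and sending bursts at random intervals; the proxy and server addresses, ports, burst size and number of repetitions are placeholders, not the exact configuration used in the simulations.

# Sketch of a data source that tunnels TCP data to the server through a Squid proxy
# using HTTP CONNECT, then sends data bursts at random intervals.
# Addresses, ports, burst size and repetitions are placeholders.
import random
import socket
import time

PROXY = ('proxy.example.com', 3128)          # Squid proxy address (placeholder)
SERVER = ('server.example.com', 50007)       # final destination (placeholder)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(PROXY)                          # TCP three-way handshake with the proxy

# Ask the proxy to open a tunnel to the server (the proxy resolves the name via DNS itself).
connect_req = 'CONNECT {0}:{1} HTTP/1.1\r\nHost: {0}:{1}\r\n\r\n'.format(*SERVER)
sock.sendall(connect_req.encode())

reply = sock.recv(4096).decode()
if '200' not in reply.splitlines()[0]:
    raise RuntimeError('proxy refused the tunnel: ' + reply.splitlines()[0])

# Send a few data bursts with a random waiting time between them.
for _ in range(10):
    sock.sendall(b'x' * 1980)                # one data burst
    echo = sock.recv(4096)                   # the server echoes the data back
    time.sleep(random.randint(1, 2))         # random pause of 1 or 2 seconds

sock.close()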

The eight packets which compose the exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6 the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which simultaneously replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is sending data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense since the data sent is three times bigger as well, and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do this we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance, so the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for simulation

The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, while Figure 3.7 does so with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes

Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency of the packets being sent is high.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances; however, for Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2, we do not see large differences. Moreover, the lower in the table, the shorter the time period should be; however, this does not apply in every case, therefore the type of instance is not very remarkable in these cases. The simplest instance seems to be enough for these exchanges of data as far as RTT values are concerned. Concerning the number of clients there is a slight difference, especially when comparing the RTT for 5 or 10 data sources with that for only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: Average RTT (in seconds) with data bursts of 1980 bytes

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: Average RTT (in seconds) with data bursts of 5940 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources show more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows the packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data bursts and the worst instance: here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one; nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes    0          0           0           0
t1.micro               5940 bytes    1.5        0           2           2
m1.large               1980 bytes    0          0           0           0
m1.large               5940 bytes    0          2.5         0           0
c1.medium              1980 bytes    0          0           0           0
c1.medium              5940 bytes    0          0           0           0
c1.xlarge              1980 bytes    0          0           0           0
c1.xlarge              5940 bytes    0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes    0          0           0           0
t1.micro               5940 bytes    2          6           15          67
m1.large               1980 bytes    0          0           0           0
m1.large               5940 bytes    0          5.5         1           36
c1.medium              1980 bytes    0          0           0           2
c1.medium              5940 bytes    0          7           13.5        50.5
c1.xlarge              1980 bytes    0          0           0           5
c1.xlarge              5940 bytes    0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes    0          1           0           6.5
t1.micro               5940 bytes    3          1           7.5         2.5
m1.large               1980 bytes    0          0           0           0
m1.large               5940 bytes    0          0           0           0
c1.medium              1980 bytes    0          0           0           4.5
c1.medium              5940 bytes    0          0           2.5         2.5
c1.xlarge              1980 bytes    0.5        0           0           0.5
c1.xlarge              5940 bytes    0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies just a little in the different tests, most probably due to the fact that the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance. This was more noticeable when the number of data sources was high: the c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. It was necessary to find a method describing how to perform a proper extraction so that the traffic can be generated again. To do so, we looked into several publications where it is explained how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets have to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, that is, the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features of every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the same way the data source sent packets in the simulations, therefore this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs and the traffic recreation was highly precise.
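A minimal sketch of this extraction step with dpkt is shown below. It is not the actual Extractpattern.py script, only an illustration that assumes plain Ethernet/IP/TCP frames in the capture; the file name and the data source address used for filtering are examples.

# Sketch of collecting per-burst data from a pcap file with dpkt.
# 'capture.pcap' is a placeholder file name; the data source address is taken from the
# example captures above but is only illustrative.
import socket
import dpkt

bursts = []                                        # list of (timestamp, payload) tuples
DATA_SOURCE = socket.inet_aton('10.34.252.34')     # packed address of the data source (example)

with open('capture.pcap', 'rb') as f:
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP):
            continue                               # skip ARP and other non-IP frames
        ip = eth.data
        if not isinstance(ip.data, dpkt.tcp.TCP):
            continue                               # keep only TCP, filtering out DNS over UDP
        tcp = ip.data
        # Keep only segments that carry data and come from the data source.
        if len(tcp.data) > 0 and ip.src == DATA_SOURCE:
            bursts.append((ts, tcp.data))

# The packets belonging to one burst can then be grouped and their payloads concatenated,
# while the timestamp of the first captured packet gives the initial time offset.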

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless for recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate and they are not noticeable in the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves in a file the information gathered from each data burst, as well as its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, this program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets in an accurate manner. I have to point out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
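The replay step can be sketched as follows. This is only an illustration of the idea of reading the saved bursts and resending them over a plain TCP socket with the original timing; the file format, server address and port are assumptions, not the actual Replaytraffic.py implementation.

# Sketch of replaying the extracted bursts M2M with the original timing.
# Assumed file format: one "timestamp<TAB>hex-payload" line per burst.
# The server address and port are placeholders.
import socket
import time

SERVER = ('server.example.com', 50007)        # placeholder server address

# Load the bursts saved by the extraction script.
bursts = []
with open('pattern.txt') as f:
    for line in f:
        ts, payload_hex = line.rstrip('\n').split('\t')
        bursts.append((float(ts), bytes.fromhex(payload_hex)))

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(SERVER)

start = time.time()
t0 = bursts[0][0]                             # timestamp of the first recorded burst
for ts, payload in bursts:
    # Sleep until the point in time where this burst was sent in the original capture.
    delay = (ts - t0) - (time.time() - start)
    if delay > 0:
        time.sleep(delay)
    sock.sendall(payload)                     # resend the whole burst at once
    sock.recv(4096)                           # read the server's reply

sock.close()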

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them to draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from data source to proxy. These very data were replayed twice M2M with the second script, so that over the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy also allows receiving the same data from the server, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the network traffic in the simulations with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern on a large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulations the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important considerations when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded in a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
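To give an idea of how the number of replaying clients can be scaled up in the Amazon cloud, the hedged sketch below launches several identical client instances with boto and uses a user-data script to start the replay on boot; the AMI id, key pair, security group and script path are placeholders, and the actual Simulation.py and Replaytraffic.py scripts of this thesis may orchestrate this differently (for instance, adding one client every few seconds).

# Sketch: launching N identical data-source instances with boto 2.x and starting the
# replay script on each via user data. AMI id, key pair, security group and script
# path are placeholders; this is not the thesis' actual orchestration logic.
import boto.ec2

NUM_CLIENTS = 10                              # number of data sources to simulate

conn = boto.ec2.connect_to_region('eu-west-1')   # credentials taken from the .boto file

# Shell script run by each instance at boot: it starts the traffic replay.
user_data = """#!/bin/bash
python /home/ubuntu/Replaytraffic.py
"""

reservation = conn.run_instances(
    'ami-xxxxxxxx',
    min_count=NUM_CLIENTS,
    max_count=NUM_CLIENTS,
    instance_type='t1.micro',
    key_name='my-key-pair',
    security_groups=['default'],
    user_data=user_data)

print('launched %d replay clients' % len(reservation.instances))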

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is about twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the amount of packets exchanged between 20 clients and the intended server; the same goes for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients problems in sending data appear very soon: after 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should go up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster; this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients and it has similar peaks and numbers of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher when using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the graph for 80 clients where, despite there being no peaks that stand out, the average is considerably higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients there is not even one segment lost. Nevertheless, with 80 clients in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap in packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 client   2 clients   10 clients   20 clients   80 clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably: with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table informs us about the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs showed more varied values. These results showed a performance improvement in the network when using high quality instances, and a deterioration as the number of clients was raised.

When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

The TaaS system created in this thesis is based on TCP and focuses mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of the tests before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK_PERM SACK Permitted

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client-server 31
3.2 Structure client-proxy-server 33
3.3 Bytes through the proxy with data bursts of 1980 bytes 37
3.4 Bytes through the proxy with data bursts of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data bursts of 1980 bytes 39
3.7 Bytes through the proxy with data bursts of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using a m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. Gao et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. Combs et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump & libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 19: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

21 Communication protocols 17

reach the receiver The next field is called ProtocolProtocol and indicates the kind of

protocol that is expected in the datagram The IP Header also uses a simple Checksum

to verify the integrity of the header and the data during the transmission To send the

packet is required to fill the field Source Address with the IP address of the sender as

well as to fill the Destination Address with the IP address of the receiver There is also a

field to set up some options if they were required and a Padding set with zeros to ensure

that the length of the header is multiple of 32

Fragmentation and Reassembly

Since IP provides host-to-host service throughout so many different networks with diverse

technology it is required to manage datagrams so they can go over all the networks

There are two choices available to figure this problem out [15] The first one is to ensure

that every IP datagrams are small enough in order to fit inside a packet in any type

of network The second option is to use some technique to fragment and reassemble

packets when they are too big to go through some network This second option is the

most suitable since networks are continuously changing and can be especially difficult to

choose a specific size for the packet that fits in every network This second option is the

one used in the Amazon networks where we ran the tests It is significant to know how

the segments are fragmented to examine each segment sent and its respective answer In

this way the exchange of packets was more organized and the recreation of traffic pattern

was easier to make

This second option is based on the Maximum Transmission Unit (MTU) which is the

biggest IP datagram that can be carried in a frame Normally the host chooses the MTU

size to send IP datagrams If by chance the packets go over some network with smaller

MTU it will be required to use fragmentation For instance if a packet of 1420 bytes

(including 20 bytes of IP header) has to go through a network with 532 bytes of MTU

the datagram will be fragmented in three packets The first two packets will contain 512

bytes of data and another 20 bytes for the header Therefore there will be 376 bytes

left (1400 ndash 5122) so that the last datagram will carry those 376 bytes of data plus 20

bytes for the header The result would look like in the Figure 24

It should be noted that the amount of data bytes in each packet must be always multiple

of 8 During this process the router will set the M bit in the Flag of the first and second

datagram to indicate that there are more packets coming As regards the offset field in

the first packet it is set to 0 because this datagram carries the first part of the original

packet However the second datagram will have the Offset set to 64 since the first byte

of data is the 513th (5128 bytes)

18 Related work

Figure 24 Datagram fragmentation

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link layer network To develop a mapping

between the link layer addresses and IP addresses is required to use the technic Address

Resolution Protocol (ARP) so that the physical interface hardware on the node can

understand the addressing scheme

The method to get the link layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out if it already has the link layer (MAC) address of the receiver. If it is not there, a new ARP request message is sent, carrying the sender's own IP and link layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the requested IP address with their own IP address. The servers with different IP addresses drop the packet, but the receiver we are looking for sends an ARP reply message to the client. This server also updates its ARP cache with the link layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.

213 Ethernet

Ethernet occupies both the data link and the physical layer in the OSI model [18][19]. The data link layer is divided into two different sublayers: the Media Access Control, known as MAC (defined by IEEE 802.3), and the MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two types of sublayer. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called a bridge entity, which provides an interface between LANs that can be using the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them as well as analyzing received frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering from transmission errors.

Figure 2.5: ARP request

Figure 2.6: Ethernet layers in the OSI model

The physical layer enables the communication between the data link layer and the corresponding physical layer of other systems. In addition, this layer defines important physical characteristics of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer can encode and decode bits between binary and phase-encoded form. Regarding channel access, this level sends and receives the encoded data mentioned above and detects collisions in the packet exchange.

2.1.4 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in RFC 768 [20]. It is used as a transport protocol, therefore its function is similar to the TCP protocol, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable. UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of datagrams. Therefore this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum number of payload bytes is 65527 for IPv6 [21]. When a UDP datagram is sent, the data and the header go together down to the IP network layer, and the computer has to fill in the fields of the UDP header in the proper way. The layout of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 2.7: UDP protocol header

The UDP header is composed of four fields [15], each of which contains 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port to which the reply should be addressed if nothing else is specified. The Destination Port is the port on the destination host to which the packet will be delivered. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm to calculate its own checksum. Both checksums must match to ensure that no error happened during the transmission.

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to separate different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers; the destination port is usually one of these well-known ports, and normally each one of them is used for one particular application.

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a protocol pertaining to the transport layer and used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams in order when they come up from the IP protocol. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. Its layout is shown in Figure 2.8.

Figure 2.8: TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port identifies the receiver port. The Sequence Number and Acknowledgement Number fields are explained in more detail in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, gives the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (marked reserved in the figure) is unused for now and is set to zero. The Flags field carries control information for the packet exchange. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet carries an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender to ask the receiver to deliver the data to the application immediately rather than buffering it. Finally, the RST flag is used to reset the connection.

Another important field is the Window Size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the segment. The next field in the TCP header is called the Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on which options are used. Finally, there is a space between the options and the data called Padding. It is set to zeros and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are exchanged. In a TCP connection, sender and receiver must agree on a number of parameters, such as the starting sequence numbers, when the connection is established. An example of how a TCP connection is set up is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a random number is carried in the Sequence Number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Naturally, the ACK flag is set to 1 again.

Furthermore, the client can also request to finish the connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it acknowledges it and keeps on sending the packets still in progress. Afterwards the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client, ending the communication in the other direction as well.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a range of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the Window Size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and perfor-

mance

2.2.1 RTT

Round trip time (RTT) is the time interval from the moment a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula used to estimate the RTT within a network is shown in Equation 2.1:

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
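As a small illustration (not part of the thesis code; the sample values below are invented), the exponentially weighted moving average of Equation 2.1 can be computed as follows:

def estimate_rtt(samples, alpha=0.85):
    """Smooth a list of SampleRTT measurements (in seconds) with Equation 2.1."""
    estimated = samples[0]                     # start from the first measurement
    for sample in samples:
        estimated = alpha * estimated + (1 - alpha) * sample
    return estimated

# alpha is chosen inside the advised 0.8-0.9 range
print(estimate_rtt([0.0031, 0.0046, 0.0029, 0.0038], alpha=0.85))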

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets one right after the other, with a certain spacing between them. However, problems such as network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter is illustrated in Figure 2.12.

Jitter is a serious problem, since these fluctuations happen randomly and change very quickly over time; therefore it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short time in order to reorder them if necessary and restore the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, new incoming packets are dropped and never arrive at their destination.

Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be expressed in the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that are transmitted within the network during one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe, and the latency the length of this pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms, we obtain the number of bits that can be contained in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is required, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) · TransferSize    (2.7)

where RTT is the round trip time.
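As a quick illustration (the values are invented, not measurements from the thesis), Equations 2.6 and 2.7 can be evaluated like this:

def transfer_time(rtt, bandwidth, transfer_size):
    """Equation 2.7: RTT plus the serialization time of the transfer, in seconds."""
    return rtt + transfer_size / bandwidth

def throughput(rtt, bandwidth, transfer_size):
    """Equation 2.6: effective bytes per second seen by the application."""
    return transfer_size / transfer_time(rtt, bandwidth, transfer_size)

# Example: a 1 MB transfer over a 10 Mbps link with 50 ms RTT
size = 1_000_000                      # bytes
bw = 10e6 / 8                         # 10 Mbps expressed in bytes per second
print(throughput(0.050, bw, size))    # below the raw bandwidth, as explained next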

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can in theory be transmitted over the link. However, due to implementation inefficiencies or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project

2.3.1 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections, intended for testing and analysis. Connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and launches a new SSL/TLS connection to the original receiver address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aims of using Wireshark are to solve and manage network problems, examine security problems and track down errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With it, users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze the packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file to be used in a future analysis; these tcpdump files can also be opened with software like Wireshark. Furthermore, tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives this request stores a temporary copy of the page. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. An example of how a proxy works and handles incoming requests, asking for each web site only once, is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block the access between two networks. Proxies can thus take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important ones and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.
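A minimal sketch of this workflow is shown below; the region, AMI id and key pair name are placeholders for illustration, not values used in the thesis:

import boto.ec2

# Credentials can also be stored in the boto configuration file instead
conn = boto.ec2.connect_to_region(
    "eu-west-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Launch one instance that could later run the server or a data source
reservation = conn.run_instances(
    "ami-xxxxxxxx",            # placeholder image id
    instance_type="t1.micro",
    key_name="my-keypair",     # placeholder key pair used for SSH access
)
instance = reservation.instances[0]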

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more both at home and in companies due to all its functionality. Linux offers many network applications, so it could be very useful for this thesis.

In this chapter we have described many issues about networks which are crucial for the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the work related to this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and a server. This is an easy way to start setting up a connection and to design a methodology with tools for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Client-server structure

The tool chosen for programming is Python, a high-level programming language very suitable for network programming due to its ease of use in this field.


When it comes to programming the client for this application, it was needed to set the server Inet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to this socket. Finally, we made the socket wait for incoming packets from the client and accept the connection.
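A minimal sketch of such a client-server pair is given below; it is illustrative rather than the exact scripts used in the thesis, and the host address and port are assumptions:

import socket

SERVER_HOST, PORT = "192.168.1.33", 50007   # illustrative server address and port

def run_server():
    # Create a TCP socket, bind hostname and port to it, and wait for a client
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, addr = srv.accept()
    data = conn.recv(1024)
    conn.sendall(data)        # echo the received data back to the client
    conn.close()
    srv.close()

def run_client():
    # Create a TCP socket and connect it to the server address and port
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((SERVER_HOST, PORT))
    cli.sendall(b"hello")
    print(cli.recv(1024))
    cli.close()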

In Listing 3.1 the packets required to establish a client-server connection are shown.

Listing 3.1: Establish connection

"0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
"0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
"0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments we can clearly see how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark by default shows relative sequence numbers starting at zero). The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one direction would be closed and the other endpoint could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

"0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
"0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]; the structure is shown in Figure 3.2. After setting up this connection we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Client-proxy-server structure

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of communications and generate realistic loads towards the server. In the beginning it was needed to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in there and install the libraries required by that script. Moreover, some programs such as Tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready we could move on to writing the script to create the scenario and run simulations. Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources.

In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests would be run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data begins, the data source establishes the connection by sending packets with the SYN flag set to 1; this is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1, and indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This exchange is shown in Listing 3.3 and is the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the server address by name. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching for the server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"  "10.235.11.67"  "DNS"  "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"  "10.235.11.67"  "DNS"  "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the two of them. In this way the whole data source-proxy-server communication path is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an 'HTTP/1.0 200 Connection established' response gets back to the data source, so the connection is now ready to start carrying data. In these simulations it was decided to send data from time to time, with random time periods in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was placed in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.
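The data-source loop can be pictured roughly as in the following sketch; it uses the parameters just described but is not the thesis script itself, the proxy address is a placeholder, and the HTTP CONNECT handshake with the Squid proxy is omitted for brevity:

import random
import socket
import time

PROXY_ADDR = ("10.235.11.67", 3128)   # illustrative proxy address and port
BURST = b"x" * 1980                   # 1980-byte data burst (5940 for the heavy load)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(PROXY_ADDR)              # the three-way handshake happens once here

for _ in range(200):                  # up to 200 repetitions per simulation
    sock.sendall(BURST)               # send one data burst towards the server
    sock.recv(4096)                   # the server echoes data back through the proxy
    time.sleep(random.choice([1, 2])) # random waiting time of 1 or 2 seconds

sock.close()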

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well, and therefore around three times the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do this we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance. Therefore the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for simulation

The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same for data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes

Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the frequency at which packets are sent is high with ten data sources.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many more high peaks, and therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower down in the table, the shorter the RTT should be; however, this does not apply in every case, therefore the type of instance is not very significant in these tests. As far as RTT values are concerned, the simplest instance seems to be enough for these exchanges of data. Concerning the number of clients there is a slight difference, especially comparing the RTT between 5 or 10 data sources and only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type    1 source    3 sources    5 sources    10 sources
t1.micro                0.0031      0.0046       0.0033       0.0039
m1.large                0.0037      0.0035       0.0038       0.0032
c1.medium               0.0031      0.0035       0.0051       0.0048
c1.xlarge               0.0039      0.0043       0.0037       0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

Server instance type    1 source    3 sources    5 sources    10 sources
t1.micro                0.0026      0.0022       0.0021       0.0029
m1.large                0.0026      0.0024       0.0028       0.0024
c1.medium               0.0028      0.0031       0.0025       0.0030
c1.xlarge               0.0026      0.0029       0.0029       0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

In Table 3.3 we have the average number of TCP packets that were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning resent packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the weakest instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. It must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 source    3 sources    5 sources    10 sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    1.5         0            2            2
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           2.5          0            0
c1.medium               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.xlarge               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            2.5

Table 3.3: Number of TCP retransmissions


Server instance type    Data burst    1 source    3 sources    5 sources    10 sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    2           6            15           67
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           5.5          1            36
c1.medium               1980 bytes    0           0            0            2
                        5940 bytes    0           7            13.5         50.5
c1.xlarge               1980 bytes    0           0            0            5
                        5940 bytes    0.5         5            9            54.5

Table 3.4: Number of lost packets

Server instance type    Data burst    1 source    3 sources    5 sources    10 sources
t1.micro                1980 bytes    0           1            0            6.5
                        5940 bytes    3           1            7.5          2.5
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.medium               1980 bytes    0           0            0            4.5
                        5940 bytes    0           0            2.5          2.5
c1.xlarge               1980 bytes    0.5         0            0            0.5
                        5940 bytes    0.5         0            0            0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies just a little in the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important to improve the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. It was needed to find a method that would allow a proper extraction so the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is significant to send the same number of packets [33].

4.1 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the needed features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying traffic. However, after some tests it was found out that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of being sent packet by packet. This is exactly the same way the data source sent packets in the simulations, and therefore this method was much better for recreating the traffic pattern obtained.
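A minimal dpkt sketch of this extraction step is shown below; the filename is illustrative and the burst-grouping heuristic (a pure ACK closes the current burst) is a simplification of what Extractpattern.py does, not its exact logic:

import dpkt

bursts = []          # list of (timestamp, payload) tuples, one entry per data burst
with open("capture.pcap", "rb") as f:
    current_data, current_ts = b"", None
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP) or not isinstance(eth.data.data, dpkt.tcp.TCP):
            continue                       # skip non-IP/TCP packets (ARP, DNS over UDP, ...)
        tcp = eth.data.data
        if tcp.data:                       # packet carries payload (PSH segments)
            if current_ts is None:
                current_ts = ts            # timestamp of the first packet of the burst
            current_data += tcp.data       # glue the packets of one burst together
        elif current_data:                 # a pure ACK closes the current burst
            bursts.append((current_ts, current_data))
            current_data, current_ts = b"", None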

Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the replay could start sending the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra packets were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing this information, it is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets in an accurate manner. It should be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
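The replay step can be sketched roughly as follows. This is a simplification rather than Replaytraffic.py itself: it assumes the extracted pattern is a list of (timestamp, data) bursts like the one produced in the previous sketch, and the server address is illustrative:

import socket
import time

SERVER_ADDR = ("10.224.83.21", 50007)      # illustrative server address and port

def replay(bursts):
    """Resend each recorded data burst at its original relative time offset."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(SERVER_ADDR)
    start, first_ts = time.time(), bursts[0][0]
    for ts, data in bursts:
        # Wait until the same amount of time has elapsed as in the capture
        delay = (ts - first_ts) - (time.time() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(data)                 # send the whole burst at once
        sock.recv(len(data))               # the server echoes the data back
    sock.close()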

When replaying traffic M2M it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this, we filtered the capture to keep the data sent from data source to proxy. This very data was then replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also allows receiving the same data back from the server, and therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the traffic in the simulations with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method to recreate the traffic is very accurate regardless of how long the simulation lasts. Only in a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in amount of bytes sent and in duration, therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances where to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable In the simulation

the data source characteristics could be easily modified For instance we could choose

amount of data to send frequency of data bursts and number of repetitions Therefore it

was possible to generate light normal or heavy traffic loads depending on the frequency

and the data sent In addition we could create short or long simulations choosing number

of repetitions The recordings obtained from these simulations are then multiplied M2M

so that the configuration is the same both in the simulations and the traffic recreations
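A hypothetical parameter block illustrates the kind of configuration described above; the option names below are illustrative, not the actual options of Client.py or Simulation.py:

simulation_config = {
    "instance_type": "m1.large",   # EC2 instance type used for the server
    "data_sources": 10,            # number of client instances to launch
    "burst_size": 3960,            # bytes of data per burst
    "repetitions": 400,            # number of bursts sent by each data source
    "wait_range": (1, 3),          # random waiting time between bursts, in seconds
}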

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when it comes to replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, the breaking point is examined to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server; the traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds; this feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a large enough number of clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources, respectively. We can appreciate that the red line is about twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests, but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 5.1; this is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after about 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster, namely when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients reaches much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we achieved some results relating to the number of bytes per second, where the exchange of packets always reached a limit before being able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk for packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we deployed the scenario we just mentioned in the Amazon Cloud. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs had more diverse values. These results showed a performance improvement in the network when using higher-quality instances and a deterioration as the number of clients was raised.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time; the similarity between both graphs was permanent. Therefore, with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increased the number of data sources. It has also been shown how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on TCP, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK_PERM Selective ACK Permitted

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using a m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing," 2011.

[2] J. Strickland, "How cloud computing works," http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing," http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay," https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances," http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic," http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite," http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol – HTTP/1.1," 1999.

[17] Tutorialspoint, "HTTP - Quick Guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol," 1980.

[21] S. Kollar, "Introduction to IPv6," 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website," http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators?" in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


Figure 2.4: Datagram fragmentation

Ethernet address resolution protocol

Nowadays Ethernet is the most widely used link layer network. To develop a mapping between link layer addresses and IP addresses, the Address Resolution Protocol (ARP) technique is required, so that the physical interface hardware on the node can understand the addressing scheme.

The method to get the link layer address of a particular server through this technique involves the following steps [15]. First of all, the sender checks its ARP cache to find out if it already has the link layer address (MAC) of the receiver. If it is not there, a new ARP request message is sent, which carries the sender's own IP and link layer addresses and the IP address of the desired server. This message is received by every device within the local network, since it is a broadcast. The receivers compare the searched IP address with their own IP address. The hosts with different IP addresses will drop the packet, but the receiver we are looking for will send an ARP reply message to the client. This host will also update its ARP cache with the link layer address of the client. When the sender receives the ARP reply, the MAC address of the receiver is saved. The required steps can be seen in Figure 2.5.
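As an illustration only (not part of the thesis scripts), the sketch below performs the lookup just described with scapy, one of the tools cited later in [12]: it broadcasts a "who-has" ARP request and reads the MAC address from the first reply. The target IP address is a placeholder.

# Minimal ARP lookup sketch using scapy [12]; target IP is a placeholder.
from scapy.all import ARP, Ether, srp

def resolve_mac(target_ip, timeout=2):
    # Broadcast "who-has target_ip" on the local network
    request = Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=target_ip)
    answered, _ = srp(request, timeout=timeout, verbose=False)
    for _, reply in answered:
        return reply.hwsrc          # MAC address taken from the ARP reply
    return None                     # no host answered

if __name__ == "__main__":
    print(resolve_mac("192.168.1.33"))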

2.1.3 Ethernet

Ethernet occupies both the data link and the physical layers in the OSI model [18][19]. The data link layer is divided into two different sublayers: Media Access Control, known as MAC (defined by IEEE 802.3), and MAC client (defined by IEEE 802.2). The structure is shown in Figure 2.6.

The MAC client must be one of two different types of sublayers. The first one is the Logical Link Control (LLC), which supplies the interface from the MAC sublayer to the upper layers. The other option is called bridge entity, which provides an interface between LANs using either the same (for instance Ethernet to Ethernet) or different protocols.

Concerning the MAC sublayer [18], this level takes charge of data encapsulation, assembling the frames before sending them, as well as analyzing these frames and detecting errors during the communication. Moreover, this sublayer is in charge of starting frame transmissions and recovering them from communication errors.

Figure 2.5: ARP request

Figure 2.6: Ethernet layers in OSI model

The physical layer enables the communication between the data link layer and the respective physical layer of other systems. In addition, this layer provides significant physical features of Ethernet, such as voltage levels and timing, but the most important functions are related to data encoding and channel access. This layer can encode and decode bits between binary and phase-encoded form. Regarding access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

2.1.4 UDP protocol

User Datagram Protocol (UDP) is an IP standard defined in the Internet standard RFC 768 [20]. It is used as a transport protocol, so its function is similar to the TCP protocol, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable. UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore, this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum size is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together to the IP network layer, and the computer has to fill the fields of the UDP header in the proper way. The scheme of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that transforms domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 2.7: UDP protocol header


The UDP header is composed of four fields [15], each containing 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port where the reply should be addressed if there is no change. The Destination Port is the port at the destination host to which the packet is sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission. Each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.
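As a small illustration (not part of the thesis scripts), the sketch below unpacks the four 2-byte fields just described from a raw UDP segment using Python's struct module; the example header bytes are fabricated.

# Sketch: unpack the four 2-byte UDP header fields described above.
import struct

def parse_udp_header(segment):
    # Network byte order (big-endian), four unsigned 16-bit values
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", segment[:8])
    payload = segment[8:]
    return src_port, dst_port, length, checksum, payload

# Fabricated example: source port 45125, destination port 53,
# length 12 bytes, checksum 0, followed by 4 bytes of payload.
example = struct.pack("!HHHH", 45125, 53, 12, 0) + b"data"
print(parse_udp_header(example))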

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a transport layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place in order the datagrams coming from the IP protocol. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. With TCP it is also possible to transfer data coming from different sources on the same line by multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in Figure 2.8.

The Source Port field identifies the sender port, just as the Destination Port does with the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length will always be a multiple of 32 bits. The next field (called reserved in the picture) is unused for now and is set to zero. The flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender to ask the receiver to deliver the data to the application without waiting for more segments. Finally, the RST flag is set to reset the connection.

Figure 2.8: TCP protocol header

Another important field is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the segment. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field is variable, depending on what kind of options are available. Finally, there is a space between the options and the data called Padding. It is filled with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called three-way handshake [15], in which three packets are exchanged. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way to set up a TCP connection is shown in Figure 2.9.

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number. Both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.

Figure 2.9: Establishing a connection in TCP

Furthermore, the client can also request to finish the connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending the packets in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client to end the communication in the other direction as well.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a range of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the receiver by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without any ACK received, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to travel through the network. The formula to estimate the RTT within a network is shown in Equation 2.1.

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT        (2.1)

Where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
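A minimal sketch of Equation 2.1 in Python is shown below; α = 0.85 is an arbitrary value inside the advised 0.8-0.9 range and the sample RTTs are invented.

# Sketch of the RTT estimator in Equation 2.1. The sample RTTs are invented.
def estimate_rtt(samples, alpha=0.85):
    estimated = samples[0]                       # seed with the first sample
    for sample in samples[1:]:
        estimated = alpha * estimated + (1 - alpha) * sample
    return estimated

print(estimate_rtt([0.0031, 0.0046, 0.0029, 0.0039]))   # smoothed RTT in seconds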

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets one right after the other, with a certain spacing between them. However, network congestion, queueing, or configuration errors cause this spacing between packets to vary. The effect of jitter can be seen in Figure 2.12.

Jitter is a serious problem, since these fluctuations happen randomly and change very quickly over time. Therefore, it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short period of time in order to reorder them if necessary and to restore the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also always have a limited size, so if the buffer is full of packets, newly arriving packets are dropped and never reach their destination.


Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a signal takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be expressed in the following three formulas:

Latency = Propagation + Transmit + Queue        (2.2)

Propagation = Distance / SpeedOfLight        (2.3)

Transmit = Size / Bandwidth        (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that can be transmitted within the network during one second [15]. There is an important relationship between bandwidth and latency worth discussing. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of this pipe. A simple drawing with the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that can be in transit in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10⁻³ s × 45 × 10⁶ bits/s = 2.25 × 10⁶ bits        (2.5)

If more bandwidth is required, the problem can be solved by simply adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime        (2.6)

TransferTime = RTT + TransferSize / Bandwidth        (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that the link can nominally transmit, whereas throughput is what is achieved in practice. Due to implementation inefficiencies or errors, a pair of nodes connected through a 10 Mbps link will usually see a much lower throughput (for instance 2 Mbps), so that data can be sent at 2 Mbps at most.

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project.


2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for performing man-in-the-middle interception of SSL/TLS network connections. Connections are intercepted and redirected to SSLsplit, which can terminate the SSL/TLS session and launch a new SSL/TLS connection to the original destination address. The goal of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture packets and show in detail everything that each packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security issues, and debug protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the packets in order to make the search easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are travelling over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file to be used in a future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump also provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may cache web sites. This happens each time a user from the local network asks for some URL: the proxy that receives the request stores a temporary copy of the page. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. We can see this process in the figure below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of content within the network much faster, but this is not the only function they cover. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block direct access between two networks. Proxies can also act as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services, mainly those offered by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually on every connection or add to the .boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage, and so on.

2.3.3 Operating Systems

There are several operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world, developed as free software. The main advantages of Linux are low cost, stability, performance, network functionality, security, and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good network support, where client and server systems can be set up easily and quickly on a Linux computer. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many networking topics which are crucial for the next sections. It is important to have deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and server. This is an easy way to start setting up a connection and to design a methodology, with tools, for developing larger-scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The tool chosen for programming is Python, a high-level programming language highly recommendable for network programming due to its ease of use in this field.


When it comes to programming the client for this application, it was necessary to set the server Internet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened by the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we made the socket wait for incoming connections from the client and accept the connection.
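A minimal sketch of the two roles just described is shown below; it is an illustration rather than the exact thesis script, and the port number is simply the one that appears in the later captures.

# Minimal sketch of the client and server roles described above.
import socket

PORT = 50007    # port used for illustration

def server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("", PORT))             # bind hostname (any interface) and port
    srv.listen(1)                    # wait for incoming connections
    conn, addr = srv.accept()        # accept the client connection
    data = conn.recv(1024)
    conn.sendall(data)               # echo the data back
    conn.close()

def client(server_address):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((server_address, PORT))   # triggers the three-way handshake
    sock.sendall(b"hello")
    print(sock.recv(1024))
    sock.close()                           # triggers the FIN exchange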

The packets required to establish the client-server connection are shown in Listing 3.1.

Listing 3.1: Establish connection

1  "0.665317","192.168.1.24","192.168.1.33","TCP","74","49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736","192.168.1.33","192.168.1.24","TCP","66","EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark by default shows relative sequence numbers starting at zero). The answer from the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then an ACK segment is sent as a response. This exchange must happen in both directions to close the connection at both ends, otherwise only one end would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1  "0.671945","192.168.1.33","192.168.1.24","TCP","60","EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behaviour of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance, and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the communication pattern and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts on an instance it was necessary to log in and install the required libraries used by that script. Moreover, some programs such as tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with Boto, so that the tests would be done automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1. This indicates the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is called the three-way handshake [31].

Listing 3.3: Establishing data source-proxy connection

"1","0.000000","10.34.252.34","10.235.11.67","TCP","74","45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2","0.000054","10.235.11.67","10.34.252.34","TCP","74","ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3","0.000833","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the server address. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching server IP address

"4","0.000859","10.34.252.34","10.235.11.67","HTTP","197","CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6","0.001390","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7","0.002600","172.16.0.23","10.235.11.67","DNS","166","Standard query response 0xb33a"
"8","0.002769","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9","0.003708","172.16.0.23","10.235.11.67","DNS","124","Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between proxy and server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing proxy-server connection

"10","0.003785","10.235.11.67","10.224.83.21","TCP","74","33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11","0.438963","10.224.83.21","10.235.11.67","TCP","74","50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12","0.439029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" response gets to the data source, so now the connection is ready to start sending data. In these simulations it was decided to send data from time to time, with random time periods in between. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15","0.466800","10.34.252.34","10.235.11.67","TCP","71","45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16","0.466813","10.235.11.67","10.34.252.34","TCP","66","ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17","0.466975","10.235.11.67","10.224.83.21","TCP","71","33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18","0.467901","10.224.83.21","10.235.11.67","TCP","66","50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19","0.468018","10.224.83.21","10.235.11.67","TCP","71","50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20","0.468029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21","0.468083","10.235.11.67","10.34.252.34","TCP","71","ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22","0.508799","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that data is being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which replied with the same data back to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources, and amounts of data. Each data source was set up in a different instance, and the number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy for the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with a heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data burst of 1980 bytes

Figure 3.4: Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well, and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do this we created a similar environment, but in this case with a variable number of data sources. All of this scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to set one client in each instance, therefore the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5: Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going through the proxy with data bursts of 1980 bytes, as Figure 3.7 does with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data burst of 1980 bytes

Figure 3.7: Bytes through the proxy with data burst of 5940 bytes


Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are being sent is high.

3.4 Performance results

In this section the performance of the network is analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8 packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients, and types of instance affect the network performance.


Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2, we do not see large differences. Moreover, the lower in the table, the shorter the time should be; however, this does not apply in every case, therefore the type of instance is not very significant in these cases. The simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially comparing the RTT with 5 or 10 data sources against only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions, and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

With this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data burst, and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    1.5        0           2           2
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          2.5         0           0
c1.medium              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.xlarge              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    2          6           15          67
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          5.5         1           36
c1.medium              1980 bytes    0          0           0           2
                       5940 bytes    0          7           13.5        50.5
c1.xlarge              1980 bytes    0          0           0           5
                       5940 bytes    0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          1           0           6.5
                       5940 bytes    3          1           7.5         2.5
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.medium              1980 bytes    0          0           0           4.5
                       5940 bytes    0          0           2.5         2.5
c1.xlarge              1980 bytes    0.5        0           0           0.5
                       5940 bytes    0.5        0           0           0

Table 3.5: Number of duplicate ACK

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little in the different tests, most probably due to the fact that the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, machine to machine (M2M) towards the same server. We needed to find a method for a proper extraction so that the traffic could be generated again. To do so, we looked into several publications that explain how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets have to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found to be much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet, which is exactly how the data source sent packets in the simulations; therefore this method was much better for recreating the obtained traffic pattern. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
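As an illustration of this step, a minimal sketch with dpkt could look like the following. The file name is a placeholder, and keeping only TCP segments that carry payload is an assumption made for the example; the real Extractpattern.py additionally groups the payloads into bursts and records the initial time offset.

import dpkt

def collect_packets(pcap_path='capture.pcap'):
    # Gather (timestamp, frame length, payload) for every TCP segment with data,
    # implicitly discarding DNS (UDP) and empty control segments.
    packets = []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP) or not ip.data.data:
                continue
            packets.append((ts, len(buf), ip.data.data))
    return packets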

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves in a file the information gathered from each data burst as well as its timestamp; Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. The second script then accesses the information contained in this file. Knowing this information, the program is ready to replay the data in an accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets accurately. It must be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
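A simplified sketch of that replay loop is shown below. It assumes the pattern is a list of (timestamp, data) pairs such as the one built by the extraction script; the server address is a placeholder, and port 50007 simply mirrors the port used in the simulations.

import socket
import time

def replay_pattern(pattern, server=('10.224.83.21', 50007)):
    # pattern: list of (capture timestamp, burst payload) ordered in time
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server)
    start = time.time()
    first_ts = pattern[0][0]
    for ts, data in pattern:
        # sleep until the burst is due at the same relative offset as captured
        wait = (ts - first_ts) - (time.time() - start)
        if wait > 0:
            time.sleep(wait)
        sock.sendall(data)      # send the whole burst at once
        sock.recv(4096)         # read the server answer, as in the simulations
    sock.close()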

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw conclusions and check the accuracy of the method. To achieve this, we had to filter the data sent from data source to proxy. The sniffed data were then replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also allows receiving the same data from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the diagram on the left shows the traffic in the simulations, with data source, proxy and server, while the diagram on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not grow worse and worse. Therefore this method of recreating the traffic is accurate regardless of how long the simulation lasts. Only in a few points do the graphs not match perfectly; this is because the server response is something we could not control. Another important issue is the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was shown in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the developed TaaS system are explained. Then remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure is very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important characteristics when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps to follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part concerns replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First, we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a large enough number of clients to create heavy traffic loads. Then we could compare the different results with the original simulation to extract interesting conclusions. The number of replay clients was increased one at a time, every five seconds.
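To illustrate how this gradual ramp-up can be driven, the sketch below starts one replay client every five seconds in its own thread. It is only an illustration: replay_pattern is the hypothetical function sketched in Chapter 4, and the client count and step are parameters, not the exact code of the multiplier.

import threading
import time

def ramp_up(pattern, server, n_clients=80, step=5.0):
    threads = []
    for _ in range(n_clients):
        t = threading.Thread(target=replay_pattern, args=(pattern, server))
        t.start()              # each thread replays the same recorded pattern
        threads.append(t)
        time.sleep(step)       # one new data source every 'step' seconds
    for t in threads:
        t.join()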

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is about twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is because at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of test but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph shows roughly double the amount of bytes of the blue one in Figure 5.1. This is the expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster, in this case when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance; the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure: here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is quite a bit higher than in the other tests. This graph also appears to be smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the larger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the packet-loss gap between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected server performance. Moreover, the number of clients could be greatly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs had more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration as the number of clients was raised.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity of both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been shown how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instances.

Overall, there are good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not as satisfactory. It must be said that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACKs 43
5.1 Percentage of lost packets 53

List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52

REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


Figure 2.5: ARP request

Figure 2.6: Ethernet layers in OSI model


frame transmissions and recovering them from communication errors.

The physical layer enables the communication between the data link layer and the respective physical layer of other systems. In addition, this layer provides significant physical features of the Ethernet, such as voltage levels and timing, but the most important functions are related to data encoding and channel access. This layer can code and decode bits between binary and phase-encoded form. Regarding access to the channel, this level sends and receives the encoded data mentioned before and detects collisions in the packet exchange.

2.1.4 UDP protocol

User Datagram Protocol (UDP) is defined in the Internet standard RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable. UDP provides a best-effort service to an end system, which means that UDP does not guarantee the proper delivery of the datagrams. Therefore this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum size is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together to the IP network layer, and the computer has to fill in the fields of the UDP header properly. The layout of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that translates domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server IP address.

Figure 2.7: UDP protocol header


The UDP header is composed of four fields [15], each of which contains 2 bytes. The Source Port indicates the port from which the packet was sent, and it is by default the port to which the reply should be addressed if nothing else is specified. The Destination Port is the port at the destination to which the packet will be sent. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.
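As a small illustration of this layout, the four 16-bit fields can be packed and unpacked with Python's struct module; the field values below are arbitrary and the checksum is left at zero.

import struct

# Source port, destination port, length (8-byte header + payload), checksum
header = struct.pack('!HHHH', 49588, 53, 8 + 20, 0)
src_port, dst_port, length, checksum = struct.unpack('!HHHH', header)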

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to separate different kinds of traffic, facilitating and ordering the packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of these ports is used for one particular application.

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a transport layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up; each endpoint is defined by two parameters, the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams coming from the IP protocol in order. In addition, this protocol can split the data into fragments of different lengths to forward them to the IP protocol. With TCP it is also possible to transfer data coming from different sources on the same line, multiplexing this data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The layout of this header is shown in Figure 2.8.

Figure 2.8: TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port identifies the receiver port. The Sequence Number and Acknowledgement Number fields are explained in more depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that the length will always be a multiple of 32 bits. The next field (called reserved in the picture) is unused for now and is set to zero. The flags field is used for additional information in the packet transmission: the SYN flag is used to set up a TCP connection and the FIN flag to finish it; the ACK flag indicates that the packet is an acknowledgement; the URG flag informs that the segment contains urgent data; the PSH flag is activated by the sender so that the receiver delivers the data to its application without delay; and the RST flag is set to reset the connection.

Another important issue is the window size. Through this field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the header. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field varies depending on which options are present. Finally, there is a space between the options and the data called Padding. It is set to zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In TCP connections, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of the way a TCP connection is set up is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, and with its own starting sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement, sets the FIN flag to 1 in its turn and keeps on sending the packets in progress. Afterwards, the client informs its application that a FIN segment was received and sends a final acknowledgement to the server to end the communication.

Reliable delivery

TCP provides ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured with several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula used to estimate the RTT within a network is shown in equation 2.1:

EstimatedRTT = α × EstimatedRTT + (1 - α) × SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.
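Equation 2.1 translates directly into a small helper function; the sketch below uses α = 0.875 as default, a value inside the 0.8 to 0.9 range recommended above.

def estimate_rtt(estimated_rtt, sample_rtt, alpha=0.875):
    # Exponentially weighted moving average of the RTT samples (equation 2.1)
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt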

Figure 2.11: Example RTT interval

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets in a row, one after the other, with a certain spacing between them. However, problems such as network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter is illustrated in Figure 2.12.

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly over time. Therefore it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short period of time in order to reorder them if necessary and leave the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also always have a limited size, so if the buffer is full of packets, newly arriving packets will be dropped and will never arrive at their destination.

Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are captured in the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that can be transmitted over the network during one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms, we obtain the number of bits that can be in transit in the pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is required, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the maximum number of bits per second that can be transmitted over the link. However, due to implementation inefficiencies or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually reach a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.
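A short calculation illustrates both formulas. It reuses the 50 ms / 45 Mbps channel from the previous section and assumes an arbitrary transfer of one megabyte (8 × 10^6 bits).

rtt = 50e-3              # seconds
bandwidth = 45e6         # bits per second
transfer_size = 8e6      # bits

delay_bw_product = rtt * bandwidth                # 2.25e6 bits, as in equation 2.5
transfer_time = rtt + transfer_size / bandwidth   # equation 2.7
throughput = transfer_size / transfer_time        # equation 2.6, roughly 35 Mbps
print(delay_bw_product, transfer_time, throughput)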

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.

2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections: connections are intercepted and redirected to SSLsplit. This tool can terminate the SSL/TLS connection and launch a new SSL/TLS connection towards the original destination address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems and remove errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic to a file to be used in a future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump also provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. This happens each time a user from the local network asks for some URL: the proxy that receives the request stores a temporary copy of the page. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. We can see this process in the figure below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent attackers from obtaining internal addresses, since proxies can block direct access between two networks. Proxies can thus act as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which is very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services, mainly those offered by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main areas in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.
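A minimal example of that workflow is sketched below, assuming the keys are passed explicitly rather than read from the boto configuration file; the region and the AMI identifier are placeholders.

import boto.ec2

# Create the connection object; the keys can also live in the .boto file
conn = boto.ec2.connect_to_region(
    'eu-west-1',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY')

# Launch one machine on which a client or server script can later be run
reservation = conn.run_instances('ami-xxxxxxxx', instance_type='t1.micro')
instance = reservation.instances[0]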

2.3.3 Operating Systems

There are several operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities for and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

We have described in this chapter many network issues which are crucial in the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The tool chosen for programming is Python, a high level programming language highly recommendable for network programming due to its ease of use in this field.


When programming the client for this application, it was necessary to set the server Internet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. To program the server, it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we make the socket wait for incoming connections from the client and accept the connection.
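A minimal sketch of those two programs follows; the address and port are placeholders, and the real scripts exchange application data once the connection is accepted.

import socket

HOST, PORT = '192.168.1.33', 50007   # placeholder server address and port

def run_server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, addr = srv.accept()        # wait for the incoming client connection
    data = conn.recv(1024)
    conn.sendall(data)               # answer the client
    conn.close()
    srv.close()

def run_client():
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((HOST, PORT))
    cli.sendall(b'hello')            # some application data
    reply = cli.recv(1024)
    cli.close()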

Listing 3.1 shows the packets required to establish a client-server connection.

Listing 3.1: Establish connection

"0.665317","192.168.1.24","192.168.1.33","TCP","74","49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
"0.669736","192.168.1.33","192.168.1.24","TCP","66","EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
"0.669766","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1, in order to connect to the server, and with a random sequence number x (Wireshark by default shows relative sequence numbers starting at zero). The answer from the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, only with the ACK flag set to 1 and acknowledgement number y+1.

To terminate the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then an ACK segment is sent as a response. This exchange must happen in both directions in order to close the connection at both endpoints; otherwise only one end would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

"0.671945","192.168.1.33","192.168.1.24","TCP","60","EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
"0.672251","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of the communications and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to access it and install the libraries required by those scripts. Moreover, some programs such as Tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script to create the scenario and run the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests where there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both the server and the data source were also programmed in Python, due to its ease for developing anything related to networks.
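A condensed sketch of that scenario-creation step with boto is given below. The AMI, key pair and security group names are placeholders, and the real Simulation.py additionally uploads and starts the client, proxy and server programs on the instances.

import time
import boto.ec2

def launch_scenario(conn, ami='ami-xxxxxxxx', instance_type='m1.large'):
    roles = {}
    for role in ('data source', 'proxy', 'server'):
        res = conn.run_instances(ami, instance_type=instance_type,
                                 key_name='my-key', security_groups=['default'])
        roles[role] = res.instances[0]
    # Wait until every instance is running before configuring the scenario
    for inst in roles.values():
        while inst.update() != 'running':
            time.sleep(5)
    return dict((role, inst.ip_address) for role, inst in roles.items())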

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending packets with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source; this segment is the response from the proxy, with the SYN and ACK flags set to 1. This indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This exchange, shown in Listing 3.3, is the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1","0.000000","10.34.252.34","10.235.11.67","TCP","74","45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2","0.000054","10.235.11.67","10.34.252.34","TCP","74","ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3","0.000833","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the server address. The proxy then looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching for the server IP address

"4","0.000859","10.34.252.34","10.235.11.67","HTTP","197","CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6","0.001390","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7","0.002600","172.16.0.23","10.235.11.67","DNS","166","Standard query response 0xb33a"
"8","0.002769","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9","0.003708","172.16.0.23","10.235.11.67","DNS","124","Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag set in order to establish the connection between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10" "0.003785" "10.235.11.67" "10.224.83.21" "TCP" "74" "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11" "0.438963" "10.224.83.21" "10.235.11.67" "TCP" "74" "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12" "0.439029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 (connection established) response reaches the data source, so the connection is now ready to start carrying data. In these simulations it was decided to send data from time to time, with random waiting periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
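A minimal sketch of such a data source is given below. It assumes the proxy accepts HTTP CONNECT requests on port 3128 (the ndl-aas service name seen in the captures); the proxy IP address, the server host name and the burst size are illustrative values, not necessarily the exact ones used in the simulations.

import random
import socket
import time

PROXY = ('10.235.11.67', 3128)    # placeholder proxy address and port
TARGET = 'ec2-x-x-x-x.eu-west-1.compute.amazonaws.com:50007'
BURST = b'x' * 1980               # one data burst of 1980 bytes

sock = socket.create_connection(PROXY)
# Ask the proxy to open a tunnel towards the server (CONNECT method).
sock.sendall(('CONNECT %s HTTP/1.1\r\n\r\n' % TARGET).encode())
sock.recv(4096)                   # expect "200 Connection established"

for _ in range(200):              # up to 200 repetitions
    sock.sendall(BURST)           # send one burst of data
    sock.recv(4096)               # read the data echoed back by the server
    time.sleep(random.randint(1, 2))  # random waiting time of 1 or 2 seconds
sock.close()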

The eight packets that compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data between data source, proxy and server

"15" "0.466800" "10.34.252.34" "10.235.11.67" "TCP" "71" "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16" "0.466813" "10.235.11.67" "10.34.252.34" "TCP" "66" "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17" "0.466975" "10.235.11.67" "10.224.83.21" "TCP" "71" "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18" "0.467901" "10.224.83.21" "10.235.11.67" "TCP" "66" "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19" "0.468018" "10.224.83.21" "10.235.11.67" "TCP" "71" "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20" "0.468029" "10.235.11.67" "10.224.83.21" "TCP" "66" "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21" "0.468083" "10.235.11.67" "10.34.252.34" "TCP" "71" "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22" "0.508799" "10.34.252.34" "10.235.11.67" "TCP" "66" "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 indicate that the segment carries data [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag set is carrying data: first from the data source to the proxy, which forwards everything to the server, and then all the way back, from the server to the data source.
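A minimal sketch of such an echo-style server is shown below; port 50007 is taken from the captures, while the buffer size is an arbitrary choice.

import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('0.0.0.0', 50007))   # port 50007, as seen in the captures
server.listen(5)

while True:
    conn, addr = server.accept()
    data = conn.recv(4096)
    while data:
        conn.sendall(data)        # reply with the same data that was received
        data = conn.recv(4096)
    conn.close()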

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was placed in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy for the simulation with 1980 bytes of data, while Figure 3.4 represents the simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times larger than in Figure 3.3. This makes sense, since the data sent is three times larger as well, and therefore around three times as many packets are needed. Another issue to point out is that Figure 3.4 is smoother than Figure 3.3. This is not only an effect of the scale of the graph, but also because the frequency and the number of segments being sent are higher in the second case.

3.3 Loading test with several clients

After the simulations with one client, it was time to stress the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources. As before, the whole scenario is created with a Python script. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to place one client in each instance, so the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for simulation

The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going through the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same for data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes

Figure 3.7 shows a much larger number of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the packet sending frequency is high.

3.4 Performance results

In this section the performance of the network is analyzed in several ways. First we look into the RTT values with different numbers of clients, and then we analyze other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all, we compare two graphs that represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

There is no great difference between these graphs, since the data sent did not pose a big problem for the network performance. However, we can see that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. The other graph has many higher peaks, so the RTT in that case is slightly higher. As expected, the more clients there are, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. Looking over Tables 3.1 and 3.2, we do not see large differences. Moreover, the lower in the table, the shorter the times should be; however, this does not hold in every case, so the type of instance is not very significant here. The simplest instance seems to be enough for these exchanges of data as far as RTT is concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this number of packets does not pose a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: Average RTT (in seconds) with data bursts of 1980 bytes

The next analysis concerned characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

Table 3.3 gives the average number of TCP packets that were retransmitted in each type of simulation. The numbers are low, and the tests with 10 data sources show more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulties in the communication.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: Average RTT (in seconds) with data bursts of 5940 bytes

With this instance the number of retransmissions is larger. Either way, no simulation stands out with regard to retransmitted packets.

Table 3.4 shows the packet losses. Here the differences among the tests are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data bursts and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can see how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one; nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    1.5        0           2           2
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          2.5         0           0
c1.medium              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.xlarge              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance.

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    2          6           15          67
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          5.5         1           36
c1.medium              1980 bytes    0          0           0           2
                       5940 bytes    0          7           13.5        50.5
c1.xlarge              1980 bytes    0          0           0           5
                       5940 bytes    0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes    0          1           0           6.5
                       5940 bytes    3          1           7.5         2.5
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.medium              1980 bytes    0          0           0           4.5
                       5940 bytes    0          0           2.5         2.5
c1.xlarge              1980 bytes    0.5        0           0           0.5
                       5940 bytes    0.5        0           0           0

Table 3.5: Number of duplicate ACKs

We have seen that the RTT varies only a little across the different tests, most probably because the data sent does not stress the server enough. However, the values measured in the last part of the analysis, related to network performance, gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked. Nevertheless, the kind of instance was also important for improving performance, and this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern is obtained in order to recreate it and later send this traffic, multiplied, machine-to-machine (M2M) towards the same server. A method had to be found for performing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, i.e. the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features of every packet. The best option for this was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from the pcap file, such as packet timestamp, length and data sent.
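A minimal sketch of this kind of extraction with dpkt is given below; the capture file name is illustrative, and only TCP segments that actually carry payload are kept.

import dpkt

packets = []
with open('proxy_capture.pcap', 'rb') as f:
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP):
            continue                       # skip non-IP frames (e.g. ARP)
        ip = eth.data
        if not isinstance(ip.data, dpkt.tcp.TCP):
            continue                       # skip non-TCP packets
        tcp = ip.data
        if len(tcp.data) > 0:              # keep only segments with payload
            packets.append((ts, len(buf), tcp.data))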

To recreate the traffic, the script initially extracted the data of each packet one at a time, in order to resend them when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data of all the packets involved in one data burst and put that data together again. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet, which is exactly how the data source sent packets in the simulations; this method is therefore much better for recreating the extracted traffic pattern. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent its first packet with data. This was very helpful when replaying the capture, since the data source then starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent, so it was easier to compare graphs and the traffic recreation was highly precise.

In the original simulations, where the proxy was in place, a few extra protocols were needed to establish the communication through the proxy: a couple of HTTP segments and a few DNS segments. These are meaningless when recreating M2M traffic, since there is no proxy in between. The best solution was to filter them out in the script written to extract the pattern. These segments would be very difficult to recreate, and their weight is so low that their absence is not noticeable during the simulation.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, gathers this information and saves it to a file: the data of each burst together with its timestamp. The second script then accesses the information contained in this file; knowing it, the program is ready to replay the data in an accurate and timely manner, and with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It should be pointed out that, to test the server, it is only necessary to run the second script, Replaytraffic.py, since it automatically calls the first one (Extractpattern.py) to obtain the traffic pattern.
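A minimal sketch of the replay step is shown below, assuming the extracted pattern has been saved as a list of (timestamp, payload) pairs; the server address and the file format (pickle) are assumptions made only for this illustration.

import pickle
import socket
import time

SERVER = ('10.224.83.21', 50007)   # placeholder server address

with open('pattern.pkl', 'rb') as f:
    bursts = pickle.load(f)        # list of (timestamp, payload) pairs

sock = socket.create_connection(SERVER)
start = time.time()
t0 = bursts[0][0]
for ts, payload in bursts:
    # Wait until the same relative time as in the original capture.
    delay = (ts - t0) - (time.time() - start)
    if delay > 0:
        time.sleep(delay)
    sock.sendall(payload)          # send the whole burst at once
    sock.recv(4096)                # read the data echoed by the server
sock.close()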

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method. To achieve this, we had to filter out the data sent from the data source to the proxy. This sniffed data is replayed twice M2M by the second script, so that the whole network carries the same amount of data, but in this case directly from client to server. This strategy also means that the same data is received back from the server, so the behaviour of the packets is very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the traffic in the simulations, with data source, proxy and server, and the figure on the right is the result of implementing the M2M strategy just described. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly, which is due to the fact that the server response is something we could not control. Another important issue is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the number of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2, by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern at large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we assess the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure is very flexible and highly customizable. In the simulation, the data source characteristics can be easily modified: for instance, we can choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It is therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we can create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important points when recreating traffic to stress a server [37][38]. First of all, the traffic to be replayed should not cover only a short time period. The same infrastructure and the same recorded traffic must be used when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, the breaking point is examined to figure out the server capacity.
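The sketch below illustrates this ramp-up of clients. In the actual tests every client ran in its own EC2 instance, so the threads and the replay_pattern() placeholder are used here only to show the idea of adding clients gradually.

import threading
import time

def replay_pattern(client_id):
    # Placeholder for the socket-based replay of the recorded bursts
    # (see the replay sketch in Chapter 4).
    pass

NUM_CLIENTS = 80
threads = []
for i in range(NUM_CLIENTS):
    t = threading.Thread(target=replay_pattern, args=(i,))
    t.start()                      # one more client starts replaying
    threads.append(t)
    time.sleep(5)                  # add one client every five seconds

for t in threads:
    t.join()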

5.2 Reproduce testing

The following is a description of the steps to follow in order to use the TaaS system created. First of all, we configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is recorded and automatically downloaded as a pcap file to the computer where Simulation.py is run. The second part is about replaying the traffic pattern. First we start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that varies from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased it up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We can then compare the different results with the original simulation to extract interesting conclusions. The number of players was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can see that the red line is about twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests, but with higher numbers of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the red graph is the same, but with up to 80 sources sending data. The black graph shows about twice the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is also twice as large. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after about 150 seconds the graph rises very little and slowly. In this test a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops rising after about 225 seconds. The communication therefore seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster, in this case when the number of clients gets to about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results of running the same tests with a higher-quality instance for the server; we used the type c1.xlarge. If we look at Figure 5.3 and compare it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is also a limit for the data sent in this figure. Here we can see that the network cannot exchange data faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph, where, despite there being no peaks that stand out, the average is considerably higher than in the other tests. This graph also appears smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the higher average RTT and the smoothness of the graph obtained. The rest of the tests have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instance tested is shown in Table 5.1. Here the results vary only slightly depending on the kind of instance used to set up the server; the most important differences are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets reaches, in some tests, a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type   1 client   2 clients   10 clients   20 clients   80 clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. The quantities of data were similar during the whole simulation. We then obtained results for the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients the server can handle varies depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably: with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we therefore developed in the Amazon cloud the scenario just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instance to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably, to find out how the server would manage heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were presented and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes increased correctly as the number of clients grew. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results showed a performance improvement in the network when using high-quality instances, and a deterioration as the number of clients rose.


When it came to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could assess the quality of this traffic recreator. We compared the number of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity of the two graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been proven that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instance.

Overall, there seem to be good results regarding the number of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of the tests before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.
[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.
[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.
[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol - HTTP/1.1", 1999.
[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol", 1980.
[21] S. Kollar, "Introduction to IPv6", 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.
[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.
[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


frame transmissions and recovering from communication errors.

The physical layer enables the communication between the data link layer and the corresponding physical layer of other systems. In addition, this layer defines significant physical features of Ethernet, such as voltage levels and timing, but its most important functions are related to data encoding and channel access. This layer encodes and decodes bits between binary and phase-encoded form. Concerning access to the channel, this level sends and receives the encoded data mentioned above and detects collisions in the packet exchange.

2.1.4 UDP protocol

The User Datagram Protocol (UDP) is an IP standard defined in RFC 768 [20]. It is used as a transport protocol, so its function is similar to that of TCP, but UDP is sometimes preferred since it is faster, lighter and simpler than TCP. However, it is less reliable: UDP provides a best-effort service to an end system, which means that it does not guarantee the proper delivery of datagrams. Therefore this protocol must not be used when reliable communication is necessary.

UDP header

UDP messages are sent within a single IP packet, and the maximum size is 65527 bytes for IPv6 [21]. When a UDP datagram is sent, the data and the header go together down to the IP network layer, and the computer has to fill in the fields of the UDP header properly. The layout of the UDP header is represented in Figure 2.7.

Among other things, UDP is normally used to serve Domain Name System (DNS) requests on port number 53. DNS is a protocol that translates domain names into IP addresses. This is important in this thesis, since the proxy between client and server needs to work out the server's IP address.

Figure 2.7: UDP protocol header

The UDP header is composed of four fields [15], each of two bytes. The Source Port indicates the port from which the packet was sent; by default it is also the port to which the reply should be addressed, if nothing is changed. The Destination Port is the destination port to which the packet will be sent. The Length field indicates the total number of bytes in the header and the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.
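As an illustration of this layout, the four 2-byte fields can be unpacked from a raw datagram as follows (a small sketch, not code used in the thesis):

import struct

def parse_udp_header(datagram):
    # '!HHHH' = four unsigned 16-bit fields in network byte order.
    src_port, dst_port, length, checksum = struct.unpack('!HHHH', datagram[:8])
    return src_port, dst_port, length, checksum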

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to carry different kinds of traffic, facilitating and ordering packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports 0 to 1023 are well-known port numbers. The destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

2.1.5 TCP protocol

The Transmission Control Protocol (TCP) is a transport-layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up, where each endpoint is defined by two parameters: the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams coming from the IP protocol in order. In addition, this protocol allows the data to be split into fragments of different lengths before forwarding them to the IP protocol. In TCP it is also possible to transfer data coming from different sources on the same line by multiplexing the data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. Its layout is shown in Figure 2.8.

Figure 2.8: TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port does for the receiver port. The Sequence Number and Acknowledgement Number fields are explained in more depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, gives the size of the TCP header, keeping in mind that the length is always a multiple of 32 bits. The next field (called Reserved in the figure) is currently unused and is set to zero. The flags field is used for additional information in the packet transmission. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet is an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender to ask the receiver to deliver the data to the receiving application immediately. Finally, the RST flag is used to reset the connection.

Other important fields are the following. Through the Window Size field we can know the number of bytes that the receiver can accept without acknowledgement. With the Checksum field the packet transmission becomes more reliable, since this field is used to check the integrity of the header. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field varies depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In a TCP connection, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of how a TCP connection is set up is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, together with its own initial sequence number. Both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server; obviously, the ACK flag is set to 1 again.

Furthermore, the client can also request to finish a connection. The process of ending the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps sending any packets still in progress, while its application is informed that a FIN segment was received. The server then sends its own packet with the FIN flag set, which the client acknowledges, ending the communication.

Reliable delivery

TCP provides ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the Window Size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without any ACK received, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects of network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an inherent minimum, since it cannot be less than the time the signals take to traverse the network. The formula used to estimate the RTT within a network is shown in Equation 2.1:

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.
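As a numerical illustration (the values are chosen here only as an example), with α = 0.9, a previous EstimatedRTT of 100 ms and a new SampleRTT of 120 ms, the new estimate would be 0.9 × 100 + 0.1 × 120 = 102 ms; a single slow sample therefore moves the estimate only slightly.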

Figure 2.11: Example RTT interval

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets in a row, one after the other, with a certain spacing between them. However, network congestion, queueing and configuration errors cause this spacing between packets to vary. The effect of jitter is illustrated in Figure 2.12.

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly in time; therefore it is crucial to correct this problem as much as possible. One solution is a buffer which receives the packets at irregular intervals, holds them for a short time in order to reorder them if necessary, and releases them with the same spacing between packets. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full of packets, newly arriving packets are dropped and never reach their destination.

Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] is the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors are combined in the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)

2.2.4 Bandwidth

This concept describes the number of bits that can be transmitted over the network in one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is given in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that can be in transit in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is required, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

where RTT is the round trip time.
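As a numerical illustration (the values are chosen only as an example), transferring 1 MB (8 × 10^6 bits) over a link with an RTT of 50 ms and a bandwidth of 45 Mbps gives TransferTime = 0.05 s + 8 × 10^6 bits / (45 × 10^6 bits/s) ≈ 0.23 s, and therefore a throughput of about 8 × 10^6 bits / 0.23 s ≈ 35 Mbps, clearly below the nominal bandwidth.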

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can nominally be transmitted; however, due to implementation inefficiencies or errors, a pair of nodes connected with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at most.

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.

2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for performing man-in-the-middle interception of SSL/TLS network connections. The connections are intercepted and redirected to SSLsplit, which may terminate the SSL/TLS session and launch a new SSL/TLS connection to the original destination address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aims of using Wireshark are to solve and manage network problems, examine security problems and debug protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With it, users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are travelling over the network. Some reasons to use tcpdump are verifying connectivity between hosts and looking into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file for future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump provides many options for capturing packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network on another port. Proxies may also cache web sites. This happens each time a user from the local network asks for some URL: the proxy that receives this request stores a temporary copy of the resource, and the next time a user asks for the same web site the proxy can send the cached copy instead of forwarding the request to the network to fetch the URL again. We can see this process in the figure below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent hackers from obtaining internal addresses, since proxies can block the access between two networks. Proxies can also take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and the Secret Key, which we can either give manually on every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine; these machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.
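As a minimal sketch of the kind of automation Boto enables (the region, AMI identifier and key pair name below are placeholders, not the values used in the thesis scripts):

import time
import boto.ec2

# The keys can also be stored in the boto configuration file instead.
conn = boto.ec2.connect_to_region(
    "eu-west-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY")

# Launch one t1.micro instance from a placeholder AMI.
reservation = conn.run_instances(
    "ami-xxxxxxxx", instance_type="t1.micro", key_name="my-keypair")
instance = reservation.instances[0]

# Wait until the instance is running, then print its public DNS name.
while instance.update() != "running":
    time.sleep(5)
print(instance.public_dns_name)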

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all over the world, in order to make it free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all these functionalities. Linux offers many network applications, so it could be very useful for this thesis.

In this chapter we have described many aspects of networking which are crucial in the next sections. A deep knowledge of this matter is needed later on, when it comes to analyzing and recreating network traffic.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework in order to extract the traffic pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The language chosen is Python, a high-level programming language that is highly recommendable for network programming due to its ease of handling in this field.


When it comes to programming the client for this application, it was necessary to set the server Inet address and a random port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. To program the server, it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we make the socket wait for incoming packets from the client and accept the connection.
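A minimal sketch of such a client-server pair in Python is shown below; the address and port are placeholders, and the echo behaviour only mimics the kind of exchange used later in the tests:

import socket

SERVER_HOST = "192.168.1.33"    # placeholder server address
PORT = 50007                    # placeholder port, the same on both sides

def run_server():
    # Create a TCP socket, bind hostname and port, then wait for a client.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((SERVER_HOST, PORT))
    srv.listen(1)
    conn, addr = srv.accept()       # accept the incoming connection
    data = conn.recv(1024)
    conn.sendall(data)              # echo the received data back
    conn.close()
    srv.close()

def run_client():
    # Create a TCP socket and connect it to the server address and port.
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((SERVER_HOST, PORT))
    cli.sendall(b"hello")
    print(cli.recv(1024))
    cli.close()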

The packets required to establish a client-server connection are shown in Listing 3.1.

Listing 3.1: Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags, the sequence number and the acknowledgement number take the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, with a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer from the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection, and an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one side would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behaviour of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]; the structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments, measure the performance and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the communication pattern and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in and install the libraries required by those scripts. Moreover, programs such as tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests where there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be run automatically. This script creates a scenario comprised, in the simplest case, of three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data begins, the data source establishes the connection by sending packets with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source: the response from the proxy with the SYN and ACK flags set to 1, which indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is called the three-way handshake [31].

Listing 3.3: Establishing data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT packet to the proxy indicating the DNS name of the server. The proxy then looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"  "10.235.11.67"  "DNS"  "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"  "DNS"  "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"  "10.235.11.67"  "DNS"  "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" response gets back to the data source, and the connection is ready for sending data. In these simulations it was decided to send data from time to time with random time periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose the exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.
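A hedged sketch of what such a data-source loop can look like is given below; the proxy address is a placeholder and the real client script of the thesis may differ in its details:

import random
import socket
import time

PROXY_ADDR = ("10.235.11.67", 3128)    # placeholder proxy address and port
DATA_BURST = b"x" * 1980               # one data burst (5940 bytes for the heavy load)
REPETITIONS = 200

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(PROXY_ADDR)               # the connection is established only once

for _ in range(REPETITIONS):
    sock.sendall(DATA_BURST)           # send one burst towards the server
    sock.recv(4096)                    # read (part of) the data echoed back
    time.sleep(random.choice([1, 2]))  # random waiting time of 1 or 2 seconds

sock.close()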

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy for the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data burst of 1980 bytes

Figure 3.4: Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times larger as well, so around three times as many packets are needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale of the graph, but also because the frequency and the amount of segments being sent in the second case are larger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To this end we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, as was the environment previously used for one client. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance, so the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data burst of 1980 bytes

Figure 3.7: Bytes through the proxy with data burst of 5940 bytes


Figure 3.7 shows a much larger amount of bytes exchanged than Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency of packets being sent is high.

3.4 Performance results

In this section the performance of the network is analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, so the RTT in that case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower down in the table, the shorter the times should be; however, this does not hold in every case, so the type of instance is not very significant here. The simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT of 5 or 10 data sources with that of only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type    1 source    3 sources    5 sources    10 sources
t1.micro                0.0031      0.0046       0.0033       0.0039
m1.large                0.0037      0.0035       0.0038       0.0032
c1.medium               0.0031      0.0035       0.0051       0.0048
c1.xlarge               0.0039      0.0043       0.0037       0.0042

Table 3.1: RTT with data bursts of 1980 bytes (values in seconds)

Server instance type    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                0.0026      0.0022       0.0021       0.0029
m1.large                0.0026      0.0024       0.0028       0.0024
c1.medium               0.0028      0.0031       0.0025       0.0030
c1.xlarge               0.0026      0.0029       0.0029       0.0024

Table 3.2: RTT with data bursts of 5940 bytes (values in seconds)

The next analysis concerned some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

Table 3.3 shows the average number of TCP packets that were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources show more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance: here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, since there are many more losses than in the simulations with only 1980 bytes. All instances give better results than the t1.micro one; nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    1.5         0            2            2
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           2.5          0            0
c1.medium               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.xlarge               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            2.5

Table 3.3: Number of TCP retransmissions

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    2           6            1.5          67
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           5.5          1            36
c1.medium               1980 bytes    0           0            0            2
                        5940 bytes    0           7            13.5         50.5
c1.xlarge               1980 bytes    0           0            0            5
                        5940 bytes    0.5         5            9            54.5

Table 3.4: Number of lost packets

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           1            0            6.5
                        5940 bytes    3           1            7.5          2.5
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.medium               1980 bytes    0           0            0            4.5
                        5940 bytes    0           0            2.5          2.5
c1.xlarge               1980 bytes    0.5         0            0            0.5
                        5940 bytes    0.5         0            0            0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies only a little across the different tests, most probably because the data sent does not stress the server enough. However, the last measurements related to network performance gave more interesting results: for example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance, and this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, machine to machine (M2M) towards the same server. We needed a method describing how to perform a proper extraction so that the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, that is, the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to write a Python script made especially to obtain the required features of every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
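A minimal sketch of this kind of extraction with dpkt is shown below; the file name is a placeholder and only the fields mentioned above are collected:

import dpkt

packets = []
with open("capture.pcap", "rb") as f:               # placeholder pcap file
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
            continue                                # keep only TCP over IPv4
        tcp = ip.data
        # Timestamp, total frame length and TCP payload of every packet.
        packets.append((ts, len(buf), tcp.data))

print("captured packets:", len(packets))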

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found to be much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet, which is exactly the way the data source sent packets in the simulations; therefore this method was much better for recreating the extracted traffic pattern. Moreover, the script works out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same relative time. The script also extracts the timestamp at which every data burst was sent, which made it easier to compare graphs and made the traffic recreation highly precise.

In the original simulations, where the proxy was set up, a few extra protocol exchanges were needed to establish the communication over the proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves to a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing it, the program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets accurately. It should be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
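A hedged sketch of the replay idea is shown below (it is not the actual Replaytraffic.py): given a list of (relative timestamp, burst payload) pairs produced by the extraction step, each burst is sent over a TCP socket at its recorded offset:

import socket
import time

SERVER_ADDR = ("10.224.83.21", 50007)      # placeholder server address and port

def replay(bursts):
    # bursts: list of (seconds_since_first_burst, payload_bytes) pairs.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(SERVER_ADDR)
    start = time.time()
    for offset, payload in bursts:
        # Wait until the recorded offset of this burst is reached.
        delay = offset - (time.time() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(payload)              # send the whole burst at once
        sock.recv(4096)                    # read the data echoed by the server
    sock.close()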

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method. To achieve this, we had to filter the data sent from data source to proxy. These very data were replayed twice M2M with the second script, so that over the whole network the same amount of data is sent, but in this case directly from client to server. This strategy also allows receiving the same data from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the network traffic in the simulations with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets is the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing; therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly, because the server response is something we could not control. Another important issue is the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was shown in Figure 4.2 by comparing the graph obtained in the simulation with the one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the M2M traffic pattern was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to choose easily among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important points when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps to follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically, as a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds; this feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased it up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replay clients was increased one at a time, every five seconds.
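A hedged sketch of this scaling step is shown below; the real multiplier deploys the replay clients in separate instances, so the threaded version here only illustrates the five-second staggering (replay() refers to the replay sketch given in the previous chapter):

import threading
import time

NUM_CLIENTS = 80            # scaled up from 1 to 80 in the experiments
START_INTERVAL = 5          # seconds between the start of two clients

def start_replay_clients(bursts):
    threads = []
    for _ in range(NUM_CLIENTS):
        t = threading.Thread(target=replay, args=(bursts,))
        t.start()                       # one more data source starts replaying
        threads.append(t)
        time.sleep(START_INTERVAL)      # wait five seconds before adding the next one
    for t in threads:
        t.join()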

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is about twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the results of the same sort of test but with higher numbers of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph shows about double the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after about 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster, reached when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between instance types. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is again a limit for the data sent in this figure; here we can see that the network cannot exchange data faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite no peaks standing out, the average is quite a bit higher than in the other tests. This graph also appears to be smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained; the rest of the tests show a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instance tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server; however, the most important differences are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets reaches, in some tests, a percentage that definitely affects the quality of service. Normally the results change noticeably between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server varies depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably: with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. It shows the risk to packet delivery of connecting many clients to the server at once; the number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we developed in the Amazon Cloud the scenario just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results showed a performance improvement in the network when using high quality instances, and a deterioration as the number of clients was raised.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes; this happens when we greatly increase the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on TCP, and it focuses mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes  41

3.2 RTT with data bursts of 5940 bytes  42

3.3 Number of TCP retransmissions  42

3.4 Number of lost packets  43

3.5 Number of duplicate ACKs  43

5.1 Percentage of lost packets  53

List of Figures

1.1 Flow diagram of the developed system  10

2.1 OSI model  12

2.2 HTTP request  14

2.3 Fields of the IP Header  16

2.4 Datagram fragmentation  18

2.5 ARP request  19

2.6 Ethernet layers in OSI model  19

2.7 UDP protocol header  20

2.8 TCP protocol header  22

2.9 Establishing a connection in TCP  23

2.10 Sliding window method  24

2.11 Example RTT interval  25

2.12 Jitter effect  26

2.13 Relation between Latency and Bandwidth  27

2.14 Proxy operation  29

3.1 Structure client server  31

3.2 Structure client proxy server  33

3.3 Bytes through the proxy with data burst of 1980 bytes  37

3.4 Bytes through the proxy with data burst of 5940 bytes  37

3.5 Structure for simulation  38

3.6 Bytes through the proxy with data burst of 1980 bytes  39

3.7 Bytes through the proxy with data burst of 5940 bytes  39

3.8 Average RTT with 3 data sources  40

3.9 Average RTT with 10 data sources  41

4.1 Structure of traffic replayed M2M  47

4.2 Comparison between simulation and replayed traffic  47

5.1 Number of bytes over time in different tests  51

5.2 Bytes using an m1.large instance for the server  51

5.3 Bytes using a c1.xlarge instance for the server  52

5.4 Average RTT extracted from the traffic recreations  52

REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol – HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump & Libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

The UDP header is composed of four fields [15], each one containing 2 bytes. The Source Port indicates the port from which the packet was sent, and by default it is the port to which a reply should be addressed if nothing is changed. The Destination Port is the port on the destination host to which the packet will be delivered. The Length field indicates the total number of bytes used in the header and in the payload data. Finally, the Checksum is a scheme to detect possible errors during the transmission: each message is accompanied by a number calculated by the transmitter, and the receiving station applies the same algorithm as the transmitter to calculate the checksum. Both checksums must match to ensure that no error happened during the transmission.
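Since all four fields are 16-bit values in network byte order, a header can be packed and unpacked directly with Python's struct module; the field values below are made-up examples:

import struct

# Build an example 8-byte UDP header: source port 12345, destination port 53,
# length 28 and checksum 0x1c46 (arbitrary example values).
header = struct.pack("!HHHH", 12345, 53, 28, 0x1c46)

src_port, dst_port, length, checksum = struct.unpack("!HHHH", header)
print(src_port, dst_port, length, hex(checksum))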

UDP ports

UDP ports give a location to send and receive UDP messages. These ports are used to send different kinds of traffic, facilitating and setting an order for packet transmission. Since the UDP port field is only 16 bits long, there are 65536 available ports. Ports from 0 to 1023 are well-known port numbers; the destination port is usually one of these well-known ports, and normally each of them is used for one particular application.

2.1.5 TCP protocol

Transmission Control Protocol (TCP) is a transport-layer protocol used when reliable delivery is required [15]. TCP is by far the most important protocol in this thesis, since our TaaS system is based on TCP sockets. With this protocol a communication between two endpoints can be set up, where each endpoint is defined by two parameters: the IP address and the TCP port number. The following are some of the main characteristics of this protocol. In TCP, the window size decides the amount of bytes that can be transferred before an acknowledgement from the receiver is required. With TCP it is possible to place the datagrams coming from the IP protocol back in order. In addition, this protocol manages the data to create fragments of different lengths and forwards them to the IP protocol. TCP can also transfer data coming from different sources on the same line by multiplexing the data; this task is carried out by the ports.

The TCP header is more complex than the UDP header. The scheme of this header is shown in Figure 2.8.

Figure 2.8: TCP protocol header

The Source Port field identifies the sender port, just as the Destination Port does with the receiver port. The Sequence Number and Acknowledgement Number fields will be explained in depth in the next section, since it is important to know how they work during a connection. The Header Length field, also called Data Offset, sets the size of the TCP header, keeping in mind that this length is always a multiple of 32 bits. The next field (called Reserved in the picture) is unused for now and is set to zero. The flags field carries additional information about the packets transmitted. The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet carries an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH flag is set by the sender to tell the receiver to deliver the data to the receiving application immediately. Finally, the RST flag is used to reset the connection.

Another important field is the window size. Through this field the receiver advertises the number of bytes it can accept without acknowledgement. The Checksum field makes the transmission more reliable, since it is used to check the integrity of the header. The next field in the TCP header is called Urgent Pointer, and its function is to indicate where the regular (non-urgent) data contained in the packet begins. The header can also carry different options; the length of this field varies depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros, and its goal is to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection TCP uses an algorithm called three-way handshake [15], in which three packets are exchanged. In a TCP connection, sender and receiver must agree on a number of parameters; when a connection is established, the main parameters are the starting sequence numbers. An example of the way a TCP connection is set up is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a random number is carried in the sequence number field. When the server responds, it sends a packet whose acknowledgement number is the sequence number of the first packet plus one, together with its own initial sequence number; both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server, and the ACK flag is again set to 1.

Furthermore, the client can also request to finish the connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending the packets still in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag towards the client, which acknowledges it to end the communication in both directions.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequence numbers that do not need to be acknowledged yet. This window moves depending on the acknowledgements received, and its size can be modified by the receiver by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without any ACK received, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of that packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to go through the network. The formula to estimate the RTT within a network is shown in equation 2.1:

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set; for TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
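As an illustration of equation 2.1, the following short Python snippet (an illustrative sketch, not part of the thesis scripts) updates the smoothed estimate with α = 0.85, one of the values recommended above for TCP:

# Small illustration of the EWMA in equation 2.1.
def update_rtt(estimated_rtt, sample_rtt, alpha=0.85):
    """Return the new smoothed RTT estimate after one sample."""
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

rtt = 0.100                            # initial estimate: 100 ms
for sample in (0.120, 0.080, 0.150):   # hypothetical measured samples
    rtt = update_rtt(rtt, sample)      # each new sample nudges the estimate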

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender transmits many packets one straight after the other, with a certain spacing between them. However, network congestion, queues, or configuration errors cause this spacing between packets to vary. The effect of jitter is illustrated in Figure 2.12.

Jitter is a significant problem, since these fluctuations happen randomly and change very quickly over time. Therefore, it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals, holds them for a short period of time in order to reorder them if necessary, and releases them with the same spacing between each packet. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full, newly arriving packets are dropped and never reach their destination.

Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be expressed in the next three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)
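As a worked example of equations 2.2 to 2.4, the following Python lines compute the latency of a single packet; the distance, packet size, bandwidth, and queueing delay are illustrative assumptions, not measured values from this thesis:

# Worked example of equations 2.2-2.4 with assumed values.
distance = 2.0e6          # metres between the two hosts
speed_of_light = 3.0e8    # m/s (ignoring the slower speed in fibre or copper)
size = 1500 * 8           # bits in a full Ethernet frame
bandwidth = 10e6          # 10 Mbps link
queue = 0.001             # 1 ms assumed queueing delay in switches

propagation = distance / speed_of_light      # ~6.7 ms
transmit = size / bandwidth                  # 1.2 ms
latency = propagation + transmit + queue     # ~8.9 ms in total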

2.2.4 Bandwidth

This concept describes the number of bits that can be transmitted over the network during one second [15]. There is an important relationship between bandwidth and latency. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of this pipe. A simple drawing of the relation between network latency and bandwidth is given in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that can be in transit in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10⁻³ s × 45 × 10⁶ bits/s = 2.25 × 10⁶ bits    (2.5)

If more bandwidth is requested, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM, and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + TransferSize / Bandwidth    (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can nominally be transmitted. However, due to implementation inefficiencies or errors, a pair of nodes connected with a bandwidth of 10 Mbps will usually see a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at the most.
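A small worked example of equations 2.6 and 2.7, with assumed values for transfer size, RTT, and bandwidth, shows why the measured throughput stays below the nominal bandwidth:

# Worked example of equations 2.6 and 2.7 (numbers are illustrative).
transfer_size = 8e6       # roughly 1 MB expressed in bits
rtt = 0.1                 # 100 ms round trip time
bandwidth = 10e6          # 10 Mbps nominal bandwidth

transfer_time = rtt + transfer_size / bandwidth   # 0.1 + 0.8 = 0.9 s
throughput = transfer_size / transfer_time        # ~8.9 Mbps, below the 10 Mbps bandwidth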

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.


2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for man-in-the-middle interception of SSL/TLS network connections. Connections are intercepted and redirected to SSLsplit, which may terminate the SSL/TLS session and launch a new SSL/TLS connection towards the original destination address. The goal of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP, and HTTPS connections over both IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. This tool can capture datagrams and show in detail everything that each packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security issues, and remove errors in protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze packets that are travelling over the network. Some reasons to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file for future analysis; these tcpdump files can also be opened with software like Wireshark. Tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities for managing the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. Each time a user from the local network asks for some URL, the proxy that receives this request stores a temporary copy of the content. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to fetch the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not their only function. They may also be used to prevent attackers from learning internal addresses, since proxies can block direct access between two networks. Proxies can also take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and the Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage, and so on.
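As a brief illustration of how Boto can be used, the following minimal sketch starts a single EC2 instance and waits until it is running. It assumes boto 2 with credentials either passed explicitly or stored in the boto configuration file; the region, AMI id, and key pair shown are placeholders, not the values used in this thesis.

# Minimal boto 2 sketch: launch one EC2 instance and wait for it.
import time
import boto.ec2

conn = boto.ec2.connect_to_region("eu-west-1",
                                  aws_access_key_id="ACCESS_KEY",       # or omit if set in ~/.boto
                                  aws_secret_access_key="SECRET_KEY")

reservation = conn.run_instances("ami-00000000",        # placeholder image id
                                 instance_type="t1.micro",
                                 key_name="my-keypair")
instance = reservation.instances[0]

while instance.state != "running":                      # poll until the machine is up
    time.sleep(5)
    instance.update()

print(instance.ip_address)                               # address used to reach the machine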

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux, and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all over the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security, and so on. This operating system very seldom freezes up or slows down. It also provides high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more both at home and in companies thanks to all these functionalities. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many networking concepts which are crucial in the next sections. It is important to have a deep knowledge of this matter, because it is needed later on when it comes to analyzing and recreating network traffic.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge of this framework in order to extract the traffic pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology, with tools, for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Client-server structure

The programming language chosen is Python, a high-level language highly recommended for network programming due to its ease of use in this field.


When it comes to programming the client for this application, it was necessary to set the server Inet address and a port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. To program the server, it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we make the socket wait for incoming connections from the client and accept them.
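The following minimal sketch shows the kind of socket code just described. It is an illustrative echo pair, not the actual client and server scripts of this thesis, and the address and port are placeholders:

# Minimal sketch of a TCP echo pair in Python (address and port are placeholders).
import socket

HOST, PORT = "192.168.1.33", 50007       # hypothetical server address and port

def run_server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, PORT))               # bind hostname and port to the socket
    srv.listen(1)                        # wait for an incoming connection
    conn, addr = srv.accept()            # accept the client (handshake completed)
    data = conn.recv(1024)
    conn.sendall(data)                   # echo the payload back
    conn.close()
    srv.close()

def run_client():
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((HOST, PORT))            # triggers the three-way handshake
    cli.sendall(b"hello server")
    reply = cli.recv(1024)
    cli.close()
    return reply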

The packets required to establish a client-server connection are shown in Listing 3.1.

Listing 3.1: Establishing a connection

"0.665317","192.168.1.24","192.168.1.33","TCP","74","49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
"0.669736","192.168.1.33","192.168.1.24","TCP","66","EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
"0.669766","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can clearly see that the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark shows by default a relative sequence number starting at zero). The answer of the server has the SYN and ACK flags activated, with its own sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, only with the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection, and an ACK segment comes back as a response. This exchange must happen in both directions to close the connection at both points; otherwise only one side would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminating a connection

"0.671945","192.168.1.33","192.168.1.24","TCP","60","EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
"0.672251","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behaviour of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments, measure the performance, and extract a traffic pattern.

Figure 3.2: Client-proxy-server structure

A proxy was set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the communication pattern and generate realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to log in and install the libraries required by those scripts. Moreover, programs such as tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests; it is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy, and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources, and proxy. Both server and data source were also programmed in Python, due to its ease for developing anything related to networks.
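A minimal sketch of how such a three-instance scenario could be brought up with boto 2 is shown below. It is not the thesis's Simulation.py; the AMI id, key pair, region, and instance types are placeholder assumptions chosen for illustration.

# Sketch: launch data source, proxy, and server instances and wait for them.
import time
import boto.ec2

def launch(conn, role, instance_type):
    res = conn.run_instances("ami-00000000",            # placeholder image id
                             instance_type=instance_type,
                             key_name="my-keypair")
    inst = res.instances[0]
    inst.add_tag("role", role)                          # label the machine by role
    return inst

conn = boto.ec2.connect_to_region("eu-west-1")          # credentials from the boto config
instances = [launch(conn, role, itype) for role, itype in
             [("data-source", "t1.micro"),
              ("proxy", "c1.medium"),
              ("server", "m1.large")]]

while any(i.state != "running" for i in instances):     # poll until the whole scenario is up
    time.sleep(5)
    for i in instances:
        i.update()

for i in instances:
    print(i.tags.get("role"), i.ip_address)             # addresses for the next steps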

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source: this segment is the response from the proxy, with the SYN and ACK flags set to 1, indicating that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is the 3-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1","0.000000","10.34.252.34","10.235.11.67","TCP","74","45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2","0.000054","10.235.11.67","10.34.252.34","TCP","74","ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3","0.000833","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the DNS name of the destination server. Then the proxy looks up the IP address of that server by sending DNS packets. We can see this in Listing 3.4.

Listing 3.4: Searching for the server IP address

"4","0.000859","10.34.252.34","10.235.11.67","HTTP","197","CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6","0.001390","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7","0.002600","172.16.0.23","10.235.11.67","DNS","166","Standard query response 0xb33a"
"8","0.002769","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9","0.003708","172.16.0.23","10.235.11.67","DNS","124","Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between proxy and server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10","0.003785","10.235.11.67","10.224.83.21","TCP","74","33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11","0.438963","10.224.83.21","10.235.11.67","TCP","74","50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12","0.439029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 OK "connection established" response gets to the data source, so the connection is ready to start sending data. In these simulations it was decided to send data from time to time with random waiting periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets which compose one exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data between data source, proxy, and server

"15","0.466800","10.34.252.34","10.235.11.67","TCP","71","45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16","0.466813","10.235.11.67","10.34.252.34","TCP","66","ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17","0.466975","10.235.11.67","10.224.83.21","TCP","71","33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18","0.467901","10.224.83.21","10.235.11.67","TCP","66","50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19","0.468018","10.224.83.21","10.235.11.67","TCP","71","50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20","0.468029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21","0.468083","10.235.11.67","10.34.252.34","TCP","71","ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22","0.508799","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that data is being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources, and amounts of data. Each data source was set up in a different instance, and the number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times larger than in Figure 3.3. This makes sense, since the data sent is three times larger as well, so around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is larger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources. This whole scenario is created with a Python script, as was the environment previously used for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to place one client in each instance, so the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for the simulation


The next two graphs represent the same two kinds of simulation carried out in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much larger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the packet frequency is high with ten data sources.

3.4 Performance results

In this section the performance of the network is analyzed in several ways. First we look into the RTT values with different numbers of clients, and then we analyze other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

There is no great difference between these graphs, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8; in the other graph there are many more high peaks, so the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients, and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2, we do not see large differences. Moreover, the lower in the table, the shorter the times should be; however, this does not hold in every case, so the type of instance is not very significant here. Regarding RTT, the simplest instance seems to be enough for these exchanges of data. Concerning the number of clients, there is a slight difference, especially when comparing the RTT for 5 or 10 data sources with that for only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

The next analysis concerned some characteristics of network performance, such as packet loss, TCP retransmissions, and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions.

Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

Moreover, examining Table 3.3 by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication, since with this instance the number of retransmissions is larger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with most difficulties in the communication, was the one with 10 data sources, the heaviest data bursts, and the weakest instance: here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one; nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
t1.micro               5940 bytes    1.5        0           2           2
m1.large               1980 bytes    0          0           0           0
m1.large               5940 bytes    0          2.5         0           0
c1.medium              1980 bytes    0          0           0           0
c1.medium              5940 bytes    0          0           0           0
c1.xlarge              1980 bytes    0          0           0           0
c1.xlarge              5940 bytes    0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          0           0           0
t1.micro               5940 bytes    2          6           15          67
m1.large               1980 bytes    0          0           0           0
m1.large               5940 bytes    0          5.5         1           36
c1.medium              1980 bytes    0          0           0           2
c1.medium              5940 bytes    0          7           13.5        50.5
c1.xlarge              1980 bytes    0          0           0           5
c1.xlarge              5940 bytes    0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst    1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes    0          1           0           6.5
t1.micro               5940 bytes    3          1           7.5         2.5
m1.large               1980 bytes    0          0           0           0
m1.large               5940 bytes    0          0           0           0
c1.medium              1980 bytes    0          0           0           4.5
c1.medium              5940 bytes    0          0           2.5         2.5
c1.xlarge              1980 bytes    0.5        0           0           0.5
c1.xlarge              5940 bytes    0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little across the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured in relation to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important to improve the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern is obtained in order to recreate it and later send this traffic, multiplied, machine to machine (M2M) towards the same server. It was necessary to look for a method describing how to perform a proper extraction so that the traffic can be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: this requires packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, and the time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to write a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length, and the data sent.
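The following sketch illustrates the kind of extraction dpkt makes possible. It is a simplified example rather than the thesis's Extractpattern.py, and the capture file name is a placeholder:

# Sketch: collect per-packet timestamp, frame length, and TCP payload with dpkt.
import dpkt

records = []                                      # (timestamp, length, payload) per TCP packet
with open("capture.pcap", "rb") as f:
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        ip = eth.data
        if not isinstance(ip, dpkt.ip.IP):        # skip non-IP frames (e.g. ARP)
            continue
        tcp = ip.data
        if not isinstance(tcp, dpkt.tcp.TCP):     # keep only TCP segments
            continue
        records.append((ts, len(buf), tcp.data))  # timestamp, frame length, data carried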

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found to be much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of packet by packet. This is exactly the same way the data source sent packets in the simulations, so this method was much better for recreating the extracted traffic pattern. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source starts to replay the data at the same point in time. The script also extracted the timestamp at which every data burst was sent; therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was used, a few extra packets were needed to establish the communication over this proxy: a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing this information, the program is ready to replay the data in an accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It should be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
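The following simplified sketch, which is not the actual Replaytraffic.py, illustrates the replay idea: bursts of data are sent over one TCP connection at their recorded time offsets. The server address, port, and burst list are placeholder assumptions.

# Sketch: replay recorded bursts over a plain TCP socket at their time offsets.
import socket
import time

SERVER, PORT = "10.224.83.21", 50007                     # hypothetical server endpoint
bursts = [(0.0, b"x" * 1980), (1.7, b"x" * 1980)]        # (offset in seconds, burst data)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((SERVER, PORT))                             # one connection, as in the simulations
start = time.time()
for offset, data in bursts:
    delay = offset - (time.time() - start)
    if delay > 0:
        time.sleep(delay)                                # wait until the recorded timestamp
    sock.sendall(data)                                   # send the whole burst at once
    sock.recv(4096)                                      # read the server's echo
sock.close()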

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them to draw important conclusions and check the accuracy of the method. To achieve this, we had to filter the data sent from data source to proxy. These very data were replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also means we receive the same data back from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the network traffic in the simulations with data source, proxy, and server, while the figure on the right is the result of implementing the M2M strategy just mentioned. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of the traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between the simulation and the replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore, this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore, we can move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then the most remarkable results obtained from recreating the pattern at large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS system created [3].

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allows us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure is very flexible and highly customizable. In the simulations the data source characteristics can be easily modified: for instance, we can choose the amount of data to send, the frequency of the data bursts, and the number of repetitions. Therefore it is possible to generate light, normal, or heavy traffic loads depending on the frequency and the data sent. In addition, we can create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps we must follow to use the TaaS system created. First of all, we configure the script Client.py, setting the amount of data, the repetitions, and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server; the traffic is recorded and downloaded automatically as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased the number up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replaying data sources was increased one at a time every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red, and blue graphs display the recreation of the real simulation for one, two, and ten data sources, respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is about ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at certain moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of test, but with higher numbers of data sources. The black graph was created from the amount of packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph shows roughly double the amount of bytes of the blue one in Figure 5.1, an expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test a new client was being added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops rising after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instance. The black graph represents ten clients, and it has similar peaks and numbers of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher when using the better instance; the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit to the data sent in this figure: here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks standing out, the average is quite a bit higher than in the other tests. This graph also appears to be smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the larger RTT average and the smoothness of the graph obtained; the rest of the tests have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instance tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server; however, the most important differences are related to the number of clients sending traffic. In most of the cases, with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20, and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we developed in the Amazon Cloud the scenario just mentioned. We then tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instance to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were presented and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions, and duplicated ACKs, showed more varied values. These results show a performance improvement in the network when using high quality instances, and a deterioration as the number of clients rises.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation; in this way we could determine the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity between both graphs was permanent, so with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been proven that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instance.

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol and focuses mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK_PERM Selective Acknowledgment Permitted

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using a m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ ("Does tcpreplay support sending traffic to a server?"). Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - quick guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark user's guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump & libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language: official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


Figure 2.8: TCP protocol header

The SYN flag is used to set up a TCP connection and the FIN flag to finish it. The ACK flag indicates that the packet carries an acknowledgement. The URG flag informs that the segment contains urgent data. The PSH (push) flag is activated by the sender so that the receiver delivers the data to the application immediately instead of buffering it. Finally, the RST (reset) flag is set to abort and restart the connection.

Another important field is the window size, which indicates the number of bytes that the receiver can accept without acknowledgement. The Checksum field makes the transmission of packets more reliable, since it is used to check the integrity of the header. The next field in the TCP header is the Urgent Pointer, whose function is to indicate where the regular (non-urgent) data contained in the packet begins. There can also be different options in the header; the length of this field varies depending on which options are present. Finally, there is a space between the options and the data called Padding. It is filled with zeros to ensure that the length of the header is a multiple of 32 bits.

TCP connection

To set up a connection, TCP uses an algorithm called the three-way handshake [15], in which three packets are sent. In a TCP connection, sender and receiver must agree on a number of parameters; when a connection is established, these parameters are the starting sequence numbers. An example of how a TCP connection is set up is shown in Figure 2.9.

Figure 2.9: Establishing a connection in TCP

First of all, the client sends a packet to start the communication: the SYN flag is set to 1 and a number is carried in the sequence number field. When the server responds, it sends a packet with the acknowledgement number equal to the sequence number of the first packet plus one, and with its own initial sequence number. Both the ACK and the SYN flags are set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server; the ACK flag is of course set to 1 again.

Furthermore, the client can also request to finish the connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives this packet, it sends an acknowledgement and keeps on sending the packets still in progress. Afterwards, the server informs its application that a FIN segment was received and sends its own packet with the FIN flag to the client, closing the communication in the other direction.

Reliable delivery

TCP provides ordered and reliable delivery, which is achieved through a method called the 'sliding window' [15], where it is possible to define a number of sequence numbers that do not yet need to be acknowledged. This window moves depending on the acknowledgements received, and its size can be modified by the receiver by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives an ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without receiving any ACK, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of the packet is received (ignoring retransmissions) [22]. This time is measured over several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the medium the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network, and so on. The RTT has an established minimum, since it cannot be less than the time the signals take to travel through the network. The formula used to estimate the RTT within a network is shown in equation 2.1:

EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT    (2.1)

where α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
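As a small illustration of equation 2.1, the following Python sketch (a hypothetical helper, not one of the thesis scripts) updates the estimated RTT every time a new sample is measured, assuming α = 0.85 and illustrative sample values:

# Minimal sketch of the exponentially weighted RTT estimator of equation 2.1.
# ALPHA and the sample values are illustrative assumptions, not thesis data.
ALPHA = 0.85  # recommended range for TCP: 0.8 - 0.9

def update_estimated_rtt(estimated_rtt, sample_rtt, alpha=ALPHA):
    """Return the new EstimatedRTT after observing one SampleRTT."""
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

if __name__ == "__main__":
    estimate = 0.0031                                    # initial estimate in seconds (~3 ms)
    for sample in (0.0029, 0.0035, 0.0040, 0.0032):      # measured samples
        estimate = update_estimated_rtt(estimate, sample)
        print("EstimatedRTT = %.4f s" % estimate)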

2.2.2 Jitter

Jitter is the variation in the delay of the packets sent within a network [15]. A sender transmits many packets one after the other with a certain spacing between them. However, network congestion, queueing or configuration errors cause this spacing between packets to vary. The effect of jitter is illustrated in Figure 2.12.

Jitter is a significant problem because these fluctuations happen randomly and change very quickly over time, so it is crucial to correct it as much as possible. One solution is to use a buffer that receives the packets at irregular intervals. The buffer holds these packets for a short period of time in order to reorder them if necessary and restore the same spacing between packets. The main drawback of this method is that the buffer adds delay to the transmission. Buffers also have a limited size, so if the buffer is full, newly arriving packets are dropped and never reach their destination.

Figure 2.12: Jitter effect
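As a rough illustration of how jitter can be quantified from a capture, the sketch below (a hypothetical helper with example timestamps, not one of the thesis scripts) computes the mean variation of inter-arrival times from a list of packet arrival times:

# Minimal sketch: estimate jitter as the mean absolute deviation of the
# inter-arrival times. The timestamps below are illustrative values in seconds.
def inter_arrival_times(timestamps):
    return [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]

def mean_jitter(timestamps):
    gaps = inter_arrival_times(timestamps)
    mean_gap = sum(gaps) / len(gaps)
    return sum(abs(g - mean_gap) for g in gaps) / len(gaps)

if __name__ == "__main__":
    arrivals = [0.000, 0.021, 0.039, 0.065, 0.080]   # example packet arrival times
    print("mean jitter: %.4f s" % mean_jitter(arrivals))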

2.2.3 Latency

Latency [15] indicates the time a message takes to travel from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a signal takes to get from one point to another at the speed of light in the medium. The second factor to keep in mind is the time it takes to transmit the data, which depends on the bandwidth and the size of the packet. The last contributor is the queueing delay in switches and bridges, where packets are usually stored for some time. These factors are defined by the following three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)
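A small Python sketch of equations 2.2-2.4 follows; the propagation speed, distance, burst size, bandwidth and queueing delay used here are assumed example values, not measurements from the thesis:

# Minimal sketch of the latency model in equations 2.2-2.4.
SPEED = 2.0e8   # assumed propagation speed in the medium (m/s), roughly 2/3 of c

def latency(distance_m, size_bits, bandwidth_bps, queue_s=0.0):
    propagation = distance_m / SPEED            # equation 2.3
    transmit = size_bits / bandwidth_bps        # equation 2.4
    return propagation + transmit + queue_s     # equation 2.2

if __name__ == "__main__":
    # e.g. a 1000 km link, one 5940-byte burst, 45 Mbps, 1 ms of queueing
    print("latency: %.4f s" % latency(1.0e6, 5940 * 8, 45e6, 0.001))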

2.2.4 Bandwidth

This concept describes the number of bits that can be transmitted over the network in one second [15]. There is an important relationship between bandwidth and latency. To visualize it, it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that can be in transit in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (2.5)

If more bandwidth is needed, the problem is solved simply by adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (2.7)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can in theory be transmitted. However, due to implementation inefficiencies or errors, two nodes connected with a bandwidth of 10 Mbps will usually see a much lower throughput (for instance 2 Mbps), so that data can be sent at 2 Mbps at most.
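A short Python sketch of equations 2.6 and 2.7 with assumed example values (not measurements from the thesis):

# Minimal sketch of the throughput model in equations 2.6 and 2.7.
def transfer_time(rtt_s, transfer_size_bits, bandwidth_bps):
    return rtt_s + transfer_size_bits / bandwidth_bps          # equation 2.7

def throughput(rtt_s, transfer_size_bits, bandwidth_bps):
    return transfer_size_bits / transfer_time(rtt_s, transfer_size_bits, bandwidth_bps)  # eq. 2.6

if __name__ == "__main__":
    # assumed values: 3 ms RTT, one 5940-byte transfer, 10 Mbps link
    bps = throughput(0.003, 5940 * 8, 10e6)
    print("throughput: %.2f Mbps" % (bps / 1e6))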

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful for developing this project.

2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool for performing man-in-the-middle interception of SSL/TLS network connections. Connections are intercepted and redirected to SSLsplit, which terminates the SSL/TLS session and initiates a new SSL/TLS connection to the original destination address. The goal of the tool is to help test and analyse networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a large number of functions. It can capture packets and show in detail everything that each packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security issues and debug protocol implementations. The program displays the characteristics of the packets in great detail, splitting them up into the different protocol layers. Users can easily see a list of captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is possible to filter the packets to make searching easier, which makes Wireshark very convenient to use.

Tcpdump

Tcpdump [25] is a tool to analyze the packets travelling over the network. Typical reasons to use tcpdump are to verify connectivity between hosts and to look into the network traffic. The tool also allows us to pick out particular kinds of traffic based on the header information. Moreover, it is possible to save all the captured traffic in a file for later analysis; these capture files can also be opened with software such as Wireshark. Tcpdump provides many options to capture packets in different ways, which gives a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider, network. The proxy sits in the middle of the communication between sender and receiver: it receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may also cache web sites. Each time a user from the local network requests a URL, the proxy receiving the request stores a temporary copy of the resource. The next time a user asks for the same web site, the proxy can return the cached copy instead of forwarding the request to the network to fetch the URL again. An example of how a proxy works and handles incoming requests is shown in Figure 2.14, where the proxy fetches each web site only once.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of content within the network much faster, but this is not their only function. They may also be used to prevent attackers from learning internal addresses, since proxies can block access between two networks; a proxy can act as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is necessary to provide the Access Key and Secret Key, which can either be given manually in every connection or added to the boto configuration file. In addition, connection objects must be created before launching a machine; these machines provide a stable and secure execution environment to run applications. The main areas in which Boto is involved are compute, databases, deployment, application services, monitoring, storage, and so on.
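As a hedged illustration of how such a connection might look with the boto 2 API (the region, AMI id and instance type below are placeholders, not values from the thesis; credentials are assumed to be in the boto configuration file):

# Minimal sketch: connect to EC2 with boto 2 and launch one instance.
# The region, AMI id and instance type are illustrative placeholders.
import time
import boto.ec2

def launch_instance(region="eu-west-1", ami="ami-00000000", itype="t1.micro"):
    conn = boto.ec2.connect_to_region(region)          # uses keys from the boto config
    reservation = conn.run_instances(ami, instance_type=itype)
    instance = reservation.instances[0]
    while instance.state != "running":                 # wait until the VM is up
        time.sleep(5)
        instance.update()
    return instance.ip_address

if __name__ == "__main__":
    print("instance running at", launch_instance())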

2.3.3 Operating Systems

There are several kinds of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and by employees of many companies and organizations from all over the world, with the aim of making it free software. The main advantages of Linux are low cost, stability, performance, network functionality, security, and so on. This operating system very seldom freezes up or slows down. It also provides high performance and good support for networks, where client and server systems can be set up easily and quickly. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more both at home and in companies due to all its functionality. Linux offers many network applications, so it is very useful for this thesis.

In this chapter we have described many networking topics which are crucial for the following sections. A solid knowledge of these matters is needed later on when analysing and recreating network traffic.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analysed to acquire a deep knowledge of this framework, in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part is to analyze the traffic in a very simple case between a single client and a server. This is an easy way to start setting up a connection and to design a methodology and tools for developing larger-scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The language chosen for programming is Python, a high-level language that is very suitable for network programming due to its ease of use in this field.

To program the client of this application it was necessary to set the server Internet address and a port for the exchange of data, and to create a socket and connect it to that address and port. To program the server it was necessary to set the hostname and the same port opened by the client, to create a socket and bind the hostname and port to it, and finally to make the socket wait for incoming connections from the client and accept them.
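The thesis scripts themselves are not reproduced here; the following minimal sketch shows how such a client and server could look with Python sockets (the address and port are placeholders, not the thesis configuration):

# Minimal sketch of the client-server pair described above.
# SERVER_HOST and PORT are placeholders, not the values used in the thesis.
import socket

SERVER_HOST = "192.168.1.33"   # server address (placeholder)
PORT = 50007                   # port used on both sides (placeholder)

def run_server():
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((socket.gethostname(), PORT))   # bind hostname and port to the socket
    srv.listen(1)                            # wait for an incoming connection
    conn, addr = srv.accept()                # accept the client connection
    data = conn.recv(4096)                   # read the data sent by the client
    conn.sendall(data)                       # echo it back
    conn.close()
    srv.close()

def run_client():
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    cli.connect((SERVER_HOST, PORT))         # the three-way handshake happens here
    cli.sendall(b"hello")                    # send some data
    reply = cli.recv(4096)
    cli.close()
    return reply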

In Listing 3.1 the packets required to establish the client-server connection are shown.

Listing 3.1: Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments we can clearly see that the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, carrying a random sequence number x (Wireshark by default displays relative sequence numbers starting at zero). The answer from the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client with only the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection, and an ACK segment is returned as a response. This exchange must happen in both directions to close the connection at both ends; otherwise only one end would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behaviour of the segments exchanged between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]; the structure is shown in Figure 3.2. After setting up this connection we sent traffic in order to analyze the segments sent, measure the performance, and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the communication pattern and generate realistic loads towards the server. At the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so: firstly, we had to configure the proxy to accept the incoming packets and forward them properly, and secondly, to run scripts on an instance it was necessary to log in and install the libraries required by the script. Moreover, programs such as tcpdump and Wireshark were installed on the proxy instance to sniff the traffic. When the proxy was ready we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters with regard to memory but also speed, which is important in these tests. It is advisable to use a high-performance instance for the proxy and the server in order to handle all the packets quickly, especially in later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be run automatically. This script creates a scenario comprised, in the simplest case, of three instances: data source, proxy and server. The script also gives the possibility of picking the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python due to its convenience for network-related development.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending a packet with the SYN flag set to 1. This is done only once in the whole communication.

When the packets were analyzed at the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source: this segment is the response from the proxy, with the SYN and ACK flags set to 1, indicating that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is the three-way handshake [31].

Listing 3.3: Establishing data source-proxy connection

"1","0.000000","10.34.252.34","10.235.11.67","TCP","74","45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2","0.000054","10.235.11.67","10.34.252.34","TCP","74","ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3","0.000833","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP CONNECT request to the proxy indicating the DNS name of the server. The proxy then looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching server IP address

"4","0.000859","10.34.252.34","10.235.11.67","HTTP","197","CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6","0.001390","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7","0.002600","172.16.0.23","10.235.11.67","DNS","166","Standard query response 0xb33a"
"8","0.002769","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9","0.003708","172.16.0.23","10.235.11.67","DNS","124","Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the proxy and the server. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing proxy-server connection

"10","0.003785","10.235.11.67","10.224.83.21","TCP","74","33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11","0.438963","10.224.83.21","10.235.11.67","TCP","74","50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12","0.439029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 Connection established" response reaches the data source, and the connection is ready to start sending data. In these simulations it was decided to send data from time to time, with random waiting periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

Listing 36 Exchange of data source-proxy-server

rdquo15rdquo rdquo0 466800rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo45125

36 Traffic Test

gt ndlminusaas [PSH ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval

=4294900280 TSecr =4294908845rdquo

rdquo16rdquo rdquo0 466813rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo66rdquo rdquo ndlminusaas gt 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval

=4294908852 TSecr =4294900280rdquo

rdquo17rdquo rdquo0 466975rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo71rdquo rdquo33271

gt 50007 [PSH ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval

=4294908852 TSecr =4294910381rdquo

rdquo18rdquo rdquo0 467901rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo50007

gt 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389

TSecr =4294908852rdquo

rdquo19rdquo rdquo0 468018rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo50007

gt 33271 [PSH ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval

=4294910389 TSecr =4294908852rdquo

rdquo20rdquo rdquo0 468029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852

TSecr =4294910389rdquo

rdquo21rdquo rdquo0 468083rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo71rdquo rdquo ndlminusaas gt 45125 [PSH ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval

=4294908852 TSecr =4294900280rdquo

rdquo22rdquo rdquo0 508799rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125

gt ndlminusaas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval

=4294900291 TSecr =4294908852rdquo

In Listing 3.6, the packets with the PSH flag set to 1 indicate that data is being carried in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data back to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from the server to the data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up on a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.
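A minimal sketch of how such a data source could generate these bursts follows; the burst size, repetition count and waiting times are the ones described above, but the code itself is an illustrative reconstruction (not the original client script), the address and port are placeholders, and the proxy CONNECT step is omitted:

# Minimal sketch of a data source: send a fixed-size burst up to 200 times,
# with a random wait of 1 or 2 seconds between bursts, and read the echo back.
import random
import socket
import time

SERVER_HOST = "10.224.83.21"   # server address (placeholder)
PORT = 50007                   # server port seen in the captures
BURST = b"x" * 1980            # 1980-byte burst (use 5940 for the heavy load)

def run_data_source(repetitions=200):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((SERVER_HOST, PORT))
    for _ in range(repetitions):
        sock.sendall(BURST)                  # send one data burst
        sock.recv(len(BURST))                # the server echoes the same data back
        time.sleep(random.choice((1, 2)))    # random waiting time of 1 or 2 seconds
    sock.close()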

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time at the proxy in the simulation with 1980-byte bursts, while Figure 3.4 represents the simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data burst of 1980 bytes

Figure 3.4: Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well, and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is higher.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources; the scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to set one client on each instance, so the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for simulation

The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data burst of 1980 bytes

Figure 3.7: Bytes through the proxy with data burst of 5940 bytes

Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the frequency at which packets are sent is high with ten data sources.

3.4 Performance results

In this section the performance of the network is analysed in several ways. First we look into the RTT values with different numbers of clients, and then we analyse other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

Figure 3.9: Average RTT with 10 data sources

In these graphs there is no great difference, since the data sent did not pose a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. The other graph has many higher peaks, so the RTT in that case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

As mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower in the table, the shorter the times should be; however, this does not hold in every case, so the type of instance is not very significant here. The simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not pose a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (seconds) with data bursts of 1980 bytes

The next analysis concerned characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show averages, since several simulations were carried out for each case.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (seconds) with data bursts of 5940 bytes

In Table 3.3 we have the average number of TCP packets that were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is larger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the simulation with the most difficulties in the communication was the one with 10 data sources, the heaviest data burst and the worst instance: there we see an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one; nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. It must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
                       5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
                       5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          1           0           6.5
                       5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
                       5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
                       5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies only a little across the different tests, most probably because the data sent does not stress the server enough. However, the last set of measurements related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

Having achieved these results, we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed a method for performing a proper extraction so that the traffic could be generated again. To that end we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, i.e. the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded on the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option for this was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
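The thesis scripts themselves are not listed here; the following minimal sketch shows the kind of dpkt loop such a script could use to pull the timestamp, frame length and TCP payload out of a pcap file (the file name and the filtering choices are illustrative assumptions):

# Minimal sketch: read a pcap file with dpkt and collect, for every TCP
# packet carrying data, its timestamp, frame length and payload.
import dpkt

def collect_packet_data(pcap_path):
    records = []
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue                     # skip ARP, UDP/DNS, etc.
            tcp = ip.data
            if len(tcp.data) == 0:
                continue                     # keep only segments that carry data
            records.append((ts, len(buf), tcp.data))
    return records

if __name__ == "__main__":
    for ts, length, payload in collect_packet_data("simulation.pcap")[:5]:
        print(ts, length, len(payload))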

To recreate the traffic, the script initially extracted the data of each packet one at a time, in order to resend them when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet. This is exactly how the data source sent packets in the simulations, so this method was much better for recreating the traffic pattern. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the capture, since the data source then started to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent, which made it easier to compare graphs and made the traffic recreation highly precise.

In the original simulations, where the proxy was present, a few extra protocols were needed to establish the communication through the proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless for recreating M2M traffic since there is no proxy in between. The best solution was to filter them out in the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing this information, the program is ready to replay the data in an accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed on an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It should be pointed out that to test the server it is only necessary to run the second script, Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
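A minimal sketch of how such a replayer could send the recorded bursts with the original timing follows; the burst list format, address and port are assumptions, and this is not the original Replaytraffic.py:

# Minimal sketch: replay recorded data bursts towards the server, preserving
# the time offsets extracted from the capture. 'bursts' is assumed to be a
# list of (offset_seconds, payload_bytes) tuples produced by the extraction step.
import socket
import time

SERVER_HOST = "10.224.83.21"   # server address (placeholder)
PORT = 50007                   # server port seen in the captures

def replay_bursts(bursts):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((SERVER_HOST, PORT))
    start = time.time()
    for offset, payload in bursts:
        delay = offset - (time.time() - start)
        if delay > 0:
            time.sleep(delay)            # wait until the original send time
        sock.sendall(payload)            # send the whole burst at once
        sock.recv(len(payload))          # the server echoes the data back
    sock.close()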

When replaying traffic M2M it was important to recreate the same traffic load as in the original capture; with this approach we could compare them, draw conclusions and check the accuracy of the method. To achieve this we had to filter the data sent from data source to proxy. This sniffed data is replayed twice M2M by the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also means the same data is received back from the server, so the behaviour of the packets is very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and number of packets is the same.

Figure 4.1: Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2 by comparing the graph obtained in the simulation with the one achieved by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then the most remarkable results obtained from recreating the pattern on a large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up on an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure is very flexible and highly customizable. In a simulation the data source characteristics can be easily modified: for instance, we can choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It is therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent, and to create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps to follow in order to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server; the traffic is then recorded and automatically downloaded as a pcap file to the computer where Simulation.py is run. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now with the chance to pick the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources: we started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We can then compare the different results with the original simulation to extract interesting conclusions. The number of replayers was increased one at a time, every five seconds.
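A minimal sketch of how this gradual ramp-up could be driven is shown below; start_replayer is a hypothetical stand-in for launching one replayer instance (it is not part of the thesis scripts), and only the 5-second interval comes from the description above:

# Minimal sketch: start one traffic replayer every 5 seconds until the
# requested number of data sources is running.
import time

def start_replayer(index):
    # Placeholder: in the real system this would start Replaytraffic.py on a
    # separate EC2 instance; here it only records the launch time.
    return {"client": index, "started_at": time.time()}

def ramp_up(total_clients=80, interval_s=5):
    running = []
    for i in range(total_clients):
        running.append(start_replayer(i))   # launch the i-th data source
        time.sleep(interval_s)              # add one new client every 5 seconds
    return running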

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is because at certain moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the amount of packets exchanged between 20 clients and the intended server. The same holds for the red graph, but in this case with up to 80 sources sending data. The black graph shows about double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon. After 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should go up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent faster; this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved running the same tests but with a higher quality instance for the server. We used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients and it has similar peaks and number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is considerably higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe the possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs had more diverse values. These results showed a performance improvement in the network when using high quality instances and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between both graphs was permanent. Therefore, with this TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we highly increased the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared among different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol and focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACKs 43
5.1 Percentage of lost packets 53

List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52

REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.
[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. Combs et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.
[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions" ("Does tcpreplay support sending traffic to a server?"), http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.
[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.
[17] Tutorialspoint, "HTTP - quick guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol", 1980.
[21] S. Kollar, "Introduction to IPv6", 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark user's guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump & libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.
[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.
[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


Figure 2.9: Establishing a connection in TCP

When the server responds, it will send a packet with the acknowledgement number equal to the sequence number of the first packet plus one, and its own beginning sequence number. Both the ACK and the SYN flags will be set to 1. Finally, the client responds with a packet in which the acknowledgement number is one higher than the sequence number received from the server. Obviously, the ACK flag must be set to 1 again.

Furthermore, the client can also request to finish a connection. The process to end the communication starts with a packet sent by the client with the FIN flag activated. Once the server receives the packet, it sends an acknowledgement with the FIN flag set to 1 and keeps on sending the packets in progress. Afterwards, the client informs its application that a FIN segment was received and sends another packet with the FIN flag to the server to end the communication.

Reliable delivery

TCP provides an ordered and reliable delivery, which is achieved through a method called 'sliding window' [15], where it is possible to define a number of sequences that do not need acknowledgements. This window moves depending on the acknowledgements received, and its size can be modified by the server by changing the value in the window size field.

Figure 2.10: Sliding window method

The window moves to the right when the client receives the ACK, allowing the client to send more packets. In the example represented in Figure 2.10, the window ends up two positions to the right because the sender got two acknowledgements. The client cannot send more than three packets in a row without any ACK received, since the size of the window is three.

2.2 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and performance.

2.2.1 RTT

Round trip time (RTT) is the time interval from when a packet is sent until the acknowledgement of the packet is received (ignoring retransmissions) [22]. This time is measured with several samples in order to achieve a reliable result. It depends on several factors, such as the data transfer rate of the connection, the material the network is made of, the distance between sender and receiver, the number of nodes the packets go through, the amount of traffic in the network and so on. The RTT has an established minimum time, since it cannot be less than the time the signals take to go through the network. The formula to estimate the RTT within a network is shown in equation 2.1.

EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT    (2.1)

Here α is a value (0 < α < 1) that must be set. For TCP it is advisable to fix this parameter between 0.8 and 0.9. An example of an exchange of packets and its direct relation with the RTT is set out in Figure 2.11.

Figure 2.11: Example RTT interval
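As an illustration of equation 2.1, the small helper below updates the estimate with each new sample; the function name and the value 0.85 are only examples inside the recommended 0.8-0.9 range.

def update_rtt_estimate(estimated_rtt, sample_rtt, alpha=0.85):
    # Exponentially weighted moving average from equation 2.1.
    return alpha * estimated_rtt + (1 - alpha) * sample_rtt

# Each new sample pulls the estimate only slightly towards the measured value.
estimate = 0.004                      # initial estimate of 4 ms
for sample in (0.003, 0.005, 0.004):
    estimate = update_rtt_estimate(estimate, sample)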

2.2.2 Jitter

Jitter is a variation in the delay of the packets sent within a network [15]. A sender will transmit many packets one right after the other, with a certain spacing between them. However, problems with network congestion, queues or configuration errors cause this spacing between packets to vary. The effect of jitter on the packets can be seen in Figure 2.12.

Jitter is a great problem, since these fluctuations happen randomly and change very quickly in time. Therefore it is crucial to correct this problem as much as possible. One solution is to set up a buffer which receives the packets at irregular intervals. This buffer holds the packets for a short space of time in order to reorder them if necessary and leave the same spacing between each packet. The main problem of this method is that the buffer adds delay to the transmission. Buffers also always have a limited size, so if the buffer is full of packets, the new packets that arrive will be dropped and will never reach their destination.

Figure 2.12: Jitter effect

2.2.3 Latency

Latency [15] indicates the time a message takes to go from one point of the network to another. Several factors affect this parameter. The first contributor to network latency is the propagation delay, which is basically the time a packet takes to get from one point to another at the speed of light. The second factor to keep in mind is the time it takes to transmit the data, and this depends on the bandwidth and the size of the packet. The last contributor is related to the queueing delays in switches and bridges, where packets are usually stored for some time. These factors can be combined in the next three formulas:

Latency = Propagation + Transmit + Queue    (2.2)

Propagation = Distance / SpeedOfLight    (2.3)

Transmit = Size / Bandwidth    (2.4)
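The following short calculation applies equations 2.2-2.4; the distance, packet size and bandwidth are arbitrary example values, and the queueing delay is assumed to be zero.

SPEED_OF_LIGHT = 3.0e8               # m/s, propagation speed assumed here

def latency(distance_m, size_bits, bandwidth_bps, queue_s=0.0):
    # Latency = Propagation + Transmit + Queue (equations 2.2-2.4).
    propagation = distance_m / SPEED_OF_LIGHT
    transmit = size_bits / bandwidth_bps
    return propagation + transmit + queue_s

# Example: a 1500-byte packet over 1000 km at 45 Mbps, ignoring queueing,
# gives roughly 3.3 ms of propagation plus 0.27 ms of transmit time.
print(latency(1000000.0, 1500 * 8, 45e6))    # about 0.0036 s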

2.2.4 Bandwidth

This concept describes the number of bits that are transmitted within the network during a second [15]. There is an important relationship between bandwidth and latency. To visualize this idea it may help to think of a pipe through which the data pass: the bandwidth would be the diameter of the pipe and the latency the length of the pipe. A simple drawing of the relation between network latency and bandwidth is shown in Figure 2.13.

Figure 2.13: Relation between Latency and Bandwidth

If we multiply both terms we obtain the number of bits that can be contained in this pipe at a given instant. For instance, a channel with 50 ms of latency and 45 Mbps of bandwidth will be able to contain:

50 · 10⁻³ s · 45 · 10⁶ bits/s = 2.25 · 10⁶ bits    (2.5)

If more bandwidth is required, the problem is solved by simply adding more pipes.

2.2.5 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas, where RTT is the round trip time:

Throughput = TransferSize / TransferTime    (2.6)

TransferTime = RTT + (1 / Bandwidth) · TransferSize    (2.7)

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can be transmitted in principle. However, due to inefficiencies of implementation or errors, a pair of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at the most.
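A small numeric illustration of equations 2.6 and 2.7 follows; the transfer size, RTT and bandwidth are example values only.

def transfer_time(transfer_size_bits, rtt_s, bandwidth_bps):
    # TransferTime = RTT + (1 / Bandwidth) * TransferSize (equation 2.7).
    return rtt_s + transfer_size_bits / bandwidth_bps

def throughput(transfer_size_bits, rtt_s, bandwidth_bps):
    # Throughput = TransferSize / TransferTime (equation 2.6).
    return transfer_size_bits / transfer_time(transfer_size_bits, rtt_s, bandwidth_bps)

# Example: a 1 MB (8e6 bit) transfer over a 10 Mbps link with 100 ms RTT
# only reaches about 8.9 Mbps of effective throughput.
print(throughput(8e6, 0.1, 10e6) / 1e6)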

2.3 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project.

2.3.1 Network Tools

In this section we describe some tools and applications related to computer network management.

SSLsplit

SSLsplit [23] is a tool to perform man-in-the-middle attacks against SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit. This tool may terminate the SSL/TLS session and launch a new SSL/TLS connection with the same receiver address. The goal of this tool is to be helpful for testing and analyzing networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a high number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aims of using Wireshark are to solve and manage network problems, examine security problems and remove errors in protocol implementations. This program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of captured packets running in real time, the details of a selected packet and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze the packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file in order to be used in a future analysis. These tcpdump files can also be opened with software like Wireshark. Moreover, tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.

Proxy

A proxy [26] is a server used as a gateway between a local network and another much wider network. A proxy is located in the middle of the communication between sender and receiver. The proxy receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may cache web sites. This happens each time a user from a local network asks for some URL: the proxy that receives this request will store a temporary copy of the URL. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to find the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 2.14.

Figure 2.14: Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not the only function they cover. They may also be used to prevent hackers from getting internal addresses, since these proxies can block the access between two networks. Proxies can take part as a component of a firewall.

2.3.2 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, database, deployment, application services, monitoring, storage and so on.
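As a rough illustration of this workflow, and under the assumption of boto 2 (the version referenced in [28]), a connection and a single instance could be created as sketched below; the AMI id, key pair name and credentials are placeholders.

import time
import boto.ec2

# Create the connection object for a region; the keys could instead be read
# from the boto configuration file.
conn = boto.ec2.connect_to_region(
    "eu-west-1",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
)

# Launch one t1.micro machine (placeholder AMI id and key pair name).
reservation = conn.run_instances(
    "ami-xxxxxxxx",
    instance_type="t1.micro",
    key_name="my-key-pair",
)
instance = reservation.instances[0]

# Wait until the instance is running; its public name can then be used to
# reach it over SSH and deploy the server, proxy or data-source scripts.
while instance.state != "running":
    time.sleep(5)
    instance.update()
print(instance.public_dns_name)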

2.3.3 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer with Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionality. Linux offers many network applications, so it could be very useful for this thesis.

We have described in this chapter many issues about networks which are crucial in the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example and ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework in order to extract the pattern properly later on.

3.1 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and a server. This is an easy way to start setting up a connection and to design a methodology with tools for developing larger scale testing later. The structure of this connection is shown in Figure 3.1.

Figure 3.1: Structure client server

The tool chosen to program with is called Python. This is a high-level programming language, very recommendable for network programming due to its ease of handling in this field.


When it comes to programming the client for this application, it was necessary to set the server Inet address and a random port for the exchange of data. It was also necessary to create a socket and connect it to that address through the port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to the socket. Finally, we made the socket wait for incoming packets from the client and accept the connection.
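A minimal sketch of the two sides described above is shown here. It is only an outline of the approach, not the exact scripts developed for the thesis; the hostname is an example taken from the captures and the data exchanged is a placeholder.

import socket

HOST = "ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"   # example server address
PORT = 50007                                                # same port opened on both sides

# --- client (run on the client machine): create a socket and connect it ---
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((HOST, PORT))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()

# --- server (run on the server machine): bind hostname and port, wait for
# --- the incoming connection, accept it and echo the received data back ---
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("", PORT))
server.listen(1)
conn, addr = server.accept()
data = conn.recv(1024)
conn.sendall(data)
conn.close()
server.close()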

In Listing 3.1 the packets required to establish a client-server connection are shown.

Listing 3.1: Establish connection

"0.665317","192.168.1.24","192.168.1.33","TCP","74","49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
"0.669736","192.168.1.33","192.168.1.24","TCP","66","EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
"0.669766","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can clearly see that the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1, in order to connect to the server, and with a random sequence number x. Wireshark shows by default a relative sequence number starting at zero. The answer of the server has the SYN and ACK flags activated, sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, only with the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection at both points, otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 3.2.

Listing 3.2: Terminate connection

"0.671945","192.168.1.33","192.168.1.24","TCP","60","EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
"0.672251","192.168.1.24","192.168.1.33","TCP","54","49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behavior of the segments between client and server.

3.2 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 3.2. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 3.2: Structure client proxy server

A proxy has been set up in the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of the communications and create realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. Secondly, to run scripts in an instance it was necessary to access it and install the libraries required by that script. Moreover, some programs such as Tcpdump or Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready, we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests when there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests would be run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed in Python, due to its ease for developing anything related to networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending packets with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source. This segment is the response from the proxy, with the SYN and ACK flags set to 1. It indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is called the 3-way handshake [31].

Listing 3.3: Establishing data source-proxy connection

"1","0.000000","10.34.252.34","10.235.11.67","TCP","74","45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2","0.000054","10.235.11.67","10.34.252.34","TCP","74","ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3","0.000833","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the DNS name of the server. Then the proxy looks for the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Searching server IP address

"4","0.000859","10.34.252.34","10.235.11.67","HTTP","197","CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6","0.001390","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7","0.002600","172.16.0.23","10.235.11.67","DNS","166","Standard query response 0xb33a"
"8","0.002769","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9","0.003708","172.16.0.23","10.235.11.67","DNS","124","Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing proxy-server connection

"10","0.003785","10.235.11.67","10.224.83.21","TCP","74","33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11","0.438963","10.224.83.21","10.235.11.67","TCP","74","50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12","0.439029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 OK (connection established) response gets to the data source, and the connection is ready to start sending data. In these simulations it was decided to send data from time to time with random time periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
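The core of the data source can be pictured as the loop sketched below, which sends bursts separated by a random pause. The burst size, number of repetitions and 1-2 second waiting time mirror the values used later in this chapter; the socket is assumed to be already connected (directly or through the proxy), and this is an illustration rather than the actual script.

import random
import time

def send_bursts(sock, burst=b"x" * 1980, repetitions=200):
    # Send data bursts with a random waiting time in between, as the data
    # sources in these simulations do. The server echoes the data back.
    for _ in range(repetitions):
        sock.sendall(burst)
        sock.recv(4096)
        time.sleep(random.choice((1, 2)))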

The eight packets which compose the exchange of data between data source and server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15","0.466800","10.34.252.34","10.235.11.67","TCP","71","45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16","0.466813","10.235.11.67","10.34.252.34","TCP","66","ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17","0.466975","10.235.11.67","10.224.83.21","TCP","71","33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18","0.467901","10.224.83.21","10.235.11.67","TCP","66","50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19","0.468018","10.224.83.21","10.235.11.67","TCP","71","50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20","0.468029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21","0.468083","10.235.11.67","10.34.252.34","TCP","71","ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22","0.508799","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way back, sending the data from server to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set up in a different instance and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data burst of 1980 bytes

Figure 3.4: Bytes through the proxy with data burst of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do so we created a similar environment, but in this case with a variable number of data sources. This whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to set one client in each instance, therefore the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5: Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, as Figure 3.7 does with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data burst of 1980 bytes

Figure 3.7: Bytes through the proxy with data burst of 5940 bytes

Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency of packets being sent is high.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances; in Figure 3.9, however, there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower in the table, the shorter the time period should be. However, this does not apply in every case, therefore the type of instance is not very remarkable in these cases. The simplest instance seems to be enough for these exchanges of data, as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially comparing the RTT between 5 or 10 data sources and only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT with data bursts of 1980 bytes

Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT with data bursts of 5940 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows the packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data bursts and the worst instance. Here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
                       5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
                       5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
                       5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
                       5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
                       5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies just a little in the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance. This was more noticeable when the number of data sources was high: the c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server later on. It was necessary to find a method describing how to carry out a proper extraction so as to generate the traffic again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as similarly as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a script in Python made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found to be much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the same way the data source sent packets in the simulations, therefore this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which were meaningless for recreating M2M traffic since there is no proxy in between. The best solution was to filter them out in the script written to extract the pattern. These segments would be very difficult to recreate and they are not noticeable during the simulation due to their very low weight.
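A condensed sketch of this extraction step is given below. It only illustrates the idea behind Extractpattern.py under simple assumptions (Ethernet frames in the capture, one known data-source address, grouping of segments into bursts omitted); it is not the actual script.

import socket
import dpkt

def extract_payloads(pcap_path, source_ip):
    # Collect (timestamp, payload) pairs for the data sent by one data source.
    # Keeping only TCP segments that carry data also filters out the HTTP
    # CONNECT and DNS packets belonging to the proxy setup.
    payloads = []
    f = open(pcap_path, "rb")
    try:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP):
                continue                                  # skip non-IP frames (e.g. ARP)
            tcp = ip.data
            if not isinstance(tcp, dpkt.tcp.TCP) or not tcp.data:
                continue                                  # keep only TCP segments with data
            if socket.inet_ntoa(ip.src) != source_ip:
                continue                                  # keep only the data-source direction
            payloads.append((ts, tcp.data))
    finally:
        f.close()
    return payloads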

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves in a file the information gathered from each data burst, as well as its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, the program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets in an accurate manner. It should be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
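The replay loop can be summarized as below: each burst waits until its original time offset has elapsed and is then sent directly to the server. This is a simplified sketch of the idea behind Replaytraffic.py, assuming the bursts are stored as (offset, payload) pairs relative to the first captured packet; it is not the actual script.

import socket
import time

def replay(bursts, server_addr):
    # Resend the extracted (offset, payload) bursts directly to the server,
    # reproducing the original burst timing of the capture.
    sock = socket.create_connection(server_addr)
    start = time.time()
    try:
        for offset, payload in bursts:
            delay = offset - (time.time() - start)
            if delay > 0:
                time.sleep(delay)
            sock.sendall(payload)     # same data as in the capture, sent M2M
            sock.recv(4096)           # read the server's echoed reply
    finally:
        sock.close()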

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from data source to proxy. These sniffed data were replayed twice M2M with the second script, so that across the whole network we send the same amount of data, but in this case directly from client to server. This strategy also allows receiving the same data back from the server, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the network traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the strategy mentioned before, M2M. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not get worse and worse.


Therefore, this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly. This is due to the fact that the server response is something that we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration. Therefore, this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore, we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances where to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
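The deployment side of this infrastructure relies on boto [28]. As a rough illustration of the idea (the AMI id, key name, security group and region are placeholders, and the real scripts do more bookkeeping), starting the server and a group of data sources can look like this:

import time
import boto.ec2

conn = boto.ec2.connect_to_region('eu-west-1')     # credentials are read from the .boto file

def launch(instance_type, count=1):
    # Start the requested number of instances and wait until they are running.
    reservation = conn.run_instances('ami-xxxxxxxx', min_count=count, max_count=count,
                                     key_name='taas-key', instance_type=instance_type,
                                     security_groups=['taas'])
    for instance in reservation.instances:
        while instance.state != 'running':
            time.sleep(5)
            instance.update()
    return reservation.instances

server = launch('c1.xlarge')[0]                    # server under test
sources = launch('m1.large', count=10)             # data sources that will replay the pattern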

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified. For instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore, it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.
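These options boil down to a handful of parameters in the data source script. A simplified view of them is sketched below, using the values of the capture replayed in Section 5.3 (3960-byte bursts, up to 400 repetitions, a random pause of 1 to 3 seconds); the names and the endpoint are illustrative, not the exact contents of Client.py.

import random
import socket
import time

DATA = b'x' * 3960            # bytes per data burst (1980 and 5940 were used for lighter and heavier loads)
REPETITIONS = 400             # number of bursts, i.e. how long the simulation lasts
WAIT_RANGE = (1, 3)           # random waiting time between bursts, in seconds

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('server-or-proxy-address', 50007))   # placeholder endpoint
for _ in range(REPETITIONS):
    sock.sendall(DATA)                             # one data burst
    sock.recv(4096)                                # the server echoes the burst back
    time.sleep(random.uniform(*WAIT_RANGE))
sock.close()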

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.
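In this thesis every data source runs in its own EC2 instance, but the ramp-up itself is simple. A single-machine approximation with threads, assuming a replay_pattern() function that behaves like the replay script of Section 4.2, could look like this:

import threading
import time

def ramp_up(replay_pattern, n_clients, interval=5):
    # Start one more replaying client every few seconds until
    # n_clients of them are sending the pattern towards the server.
    threads = []
    for _ in range(n_clients):
        t = threading.Thread(target=replay_pattern)
        t.start()
        threads.append(t)
        time.sleep(interval)
    for t in threads:
        t.join()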

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded, as a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First, we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
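Put together, one full test run follows the order below. The subprocess calls are only a sketch of the sequence; the amount of data, the type of instance and the number of clients are configured in the scripts themselves.

import subprocess

# 1. Record a reference session through the proxy; the capture is downloaded
#    automatically as a pcap file to the local computer.
subprocess.check_call(['python', 'Simulation.py'])

# 2. Start the server used for the replay on the chosen type of instance.
subprocess.check_call(['python', 'Servertoreplay.py'])

# 3. Extract the pattern, multiply it M2M towards that server
#    (Replaytraffic.py calls Extractpattern.py internally) and fetch the results.
subprocess.check_call(['python', 'Replaytraffic.py'])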

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First, we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation and extract interesting conclusions. The number of data sources was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources, respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same goes for the red graph, but in this case with up to 80 sources sending data. The black graph shows about double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients, problems sending data appear very soon. After 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved running the same tests but with a higher-quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance; the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure: here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the graph for 80 clients where, despite there being no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients there is not even one segment lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120       0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations; there were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time did not really differ until the number of clients increased considerably; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk, in terms of packet delivery, of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicate ACKs, had more diverse values. These results showed a performance improvement in the network when using high-quality instances, and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server. These results have shown the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of the tests before starting them.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes
3.2 RTT with data bursts of 5940 bytes
3.3 Number of TCP retransmissions
3.4 Number of lost packets
3.5 Number of duplicate ACK
5.1 Percentage of lost packets


List of Figures

1.1 Flow diagram of the developed system
2.1 OSI model
2.2 HTTP request
2.3 Fields of the IP Header
2.4 Datagram fragmentation
2.5 ARP request
2.6 Ethernet layers in OSI model
2.7 UDP protocol header
2.8 TCP protocol header
2.9 Establishing a connection in TCP
2.10 Sliding window method
2.11 Example RTT interval
2.12 Jitter effect
2.13 Relation between Latency and Bandwidth
2.14 Proxy operation
3.1 Structure client server
3.2 Structure client proxy server
3.3 Bytes through the proxy with data burst of 1980 bytes
3.4 Bytes through the proxy with data burst of 5940 bytes
3.5 Structure for simulation
3.6 Bytes through the proxy with data burst of 1980 bytes
3.7 Bytes through the proxy with data burst of 5940 bytes
3.8 Average RTT with 3 data sources
3.9 Average RTT with 10 data sources
4.1 Structure of traffic replayed M2M
4.2 Comparison between simulation and replayed traffic
5.1 Number of bytes over time in different tests
5.2 Bytes using an m1.large instance for the server
5.3 Bytes using a c1.xlarge instance for the server
5.4 Average RTT extracted from the traffic recreations


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.
[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 Instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.


[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ (Does tcpreplay support sending traffic to a server). Accessed January 2014.
[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol - HTTP/1.1", 1999.
[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet Technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol", 1980.
[21] S. Kollar, "Introduction to IPv6", 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] Ulf Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.


[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.
[34] P. M., Sandor Molnar and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


24 Related work

its size can be modified by the server changing the value in the window size field

Figure 210 Sliding window method

The window moves to the right when the client receives the ACK allowing the client

to send more packets In the example represented in the Figure 210 the window ends

up two position to the right because the sender got two acknowledgements The client

cannot send more than three packets straight without any ACK received since the size

of the window is three

22 Network Performance Metrics

In this section we focus on different aspects relating to network efficiency and perfor-

mance

221 RTT

Round trip time (RTT) is the time interval from a packet is sent to acknowledgement

of the packet is received (ignoring retransmissions) [22] This time is measured with

several samples in order to achieve a reliable result This time depends on several factors

such as the data transfer rate of the connection the material the network is made of

the distance between sender and receiver number of nodes the packets go through the

amount of traffic in the network and so on The RTT has a established minimum time

since it cannot be less than the time the signals take to go through the network The

22 Network Performance Metrics 25

formula to get the value of the RTT within a network is shown in the equation 21

EstimatedRTT = α lowast EstimatedRTT + (1 minus α) lowast SampleRTT (21)

Where α is a value (0 ltα lt1) that must be set For TCP it is advisable to fix this

parameter between 08 and 09 An example of exchange of packets and their direct

relation with the RTT is set out in the Figure 211

Figure 211 Example RTT interval

222 Jitter

Jitter is a variation in the delay of the packets sent within a network [15] A sender

will transmit many packets straight one after the other with a certain distance between

them However problems with network congestion queues or configuration errors cause

that this distance between packets varies The implications of the jitter in the pictures

can be seen in the Figure 212

Jitter is a great problem since these fluctuations happen randomly and change very

quickly in time Therefore it is crucial to correct this problem as much as possible

One solution for this problem is to set a buffer which receives the packets at irregular

intervals This buffer will hold these packets for a short space of time in order to reorder

them if necessary and leave the same distance between each packet The main problem

of this method is that this buffer adds delay to the transmission They also will always

have a limited size so if the buffer is full of packets the new packets that come will be

dropped and they will never arrive to their destination

26 Related work

Figure 212 Jitter effect

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network

to another Several factors affect to this parameter The first contributor to network

latency is the propagation delay which is basically the time a packet takes to get from

one point to another at the speed of light The second factor to keep in mind is the time

it takes to transmit data and this depends on the bandwidth and the size of the packet

The last contributor is related with the queueing delays in switches and bridges where

packets are usually stored for some time These factors can be defined in the next three

formula

Latency = Propagation+ Transmit+Queue (22)

Propagation = DistanceSpeedOfLight (23)

Transmit = SizeBandwidth (24)

224 Bandwidth

This concept describes the number of bits that are transmitted within the network during

a second [15] There is an important relationship between bandwidth and latency to talk

about To visualize this idea it may help to think in a pipe through where the data

pass The bandwidth would be the diameter of the pipe and the latency the length of

this pipe A simple draw with the relation among network latency and bandwidth is in

the Figure 213

If we multiply both terms we will achieve the number of bits that can be transmitted

in this pipe at a given instant For instance a channel with 50 ms of latency and 45

23 Tools strongly associated with this thesis 27

Figure 213 Relation between Latency and Bandwidth

Mbps of bandwidth will be able to contain

50 lowast 10minus3 s lowast 45 lowast 106 bitss = 225 lowast 106 bits (25)

If more bandwidth is requested just adding more pipes the problem is solved

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to

another in a given time This concept is used to measure the performance or efficiency

of hard drives RAM and networks The throughput can be calculated with the next

formula

Throughput = TransferSize TransferT ime (26)

TransferT ime = RTT + 1Bandwidth lowast TransferSize (27)

Where RTT is the round trip time

Throughput and bandwidth can be sometimes confusing terms Bandwidth refers to

the number of bits per second that can be transmitted in practice However due to

inefficiencies of implementation or errors a couple of nodes connected in the network

with a bandwidth of 10 Mbps will usually have a throughput much lower (for instance 2

Mbps) so that the data can be sent at 2 Mbps at the most

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project

28 Related work

231 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool to control attacks against SSLTLS network connections These

connections are intercepted and redirected to SSLsplit This tool may end SSLTLS and

launch a new SSLTLS connection with the same receiver address The goal of this tool

is to be helpful to test and analyze networks This tool can work with TCP SSL HTTP

and HTTPS connections over IPv4 and IPv6

Wireshark

Wireshark [24] is a powerful network packet analyzer with a high number of functions

This tool can capture datagrams and show in detail everything that the packet carries

Overall the aim of using wireshark is to solve and manage network problems examine

security problems remove errors in protocol implementations This program displays

the characteristics of the packets in great detail splitting them up in different layers

With this program users can see easily a list with captured packets running in real time

the details of a selected packet and the packet content in hexadecimal and ASCII In

addition it is also possible to filter the datagrams in order to make easier the search for

the packets which makes wireshark very manageable

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network Some

reasons why it is interesting to use tcpdump are verify connectivity between hosts and

look into the traffic network This tool also allows us to pick out particular kinds of

traffic depending on the header information Moreover it is possible to save all the traffic

captured in a file in order to be used in a future analysis These tcpdump files can be also

opened with software like wireshark Moreover tcpdump provides many instructions to

capture packets in different ways which give us a broad range of possibilities to manage

the traffic

Proxy

Proxy [26] is a server used as a gateway between a local network and another much

wider network A proxy is located in the middle of the communication between sender

and receiver The proxy receives the incoming data from one port and it forwards this

information to the rest of the network by another port Proxies may cache web sites

23 Tools strongly associated with this thesis 29

This happens each time a user from a local network asks for some URL The proxy that

receives this request will store a temporary copy of the URL The next time that a user

asks for the same web site the proxy can send the cached copy to the user instead of

forwarding the request to the network to find again the URL We can see this process in

the picture below where the proxy asks for each web site only once An example of how

a proxy works and handle the incoming requests is shown in the Figure 214

Figure 214 Proxy operation

In this way proxies can make much faster the delivery of packets within the network

but this is not the only function they cover They may also be used to avoid that hackers

get internal addresses since these proxies can block the access between two networks

Proxies can take part as a component of a firewall

232 Programming language

Several programming languages can be use for network programming Python [27] is one

of the most important and provides a library called Boto which could be very helpful

for this thesis

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web

Services (AWS) To use Boto is required to provide the Access Key and Secret Key which

we can either give manually in every connection or add in the boto file In addition it is

necessary to create connection objects before creating a machine These machines provide

a stable and secure execution environment to run applications Then main fields in which

30 Related work

Boto is involved are computer database deployment application services monitoring

storage and so on

233 Operating Systems

There are several sort of operating systems such as Microsoft Windows Linux and Mac

OS However the opportunities and ease to manage network tools are not the same in

all of them We believe that for the development of this thesis Linux would be more

suitable

Linux

Linux [29] is a computer operation system created by volunteers and employees of

many companies and organizations from every parts of the world in order to make of this

product free software The main advantages of Linux are low cost stability performance

network functionality security and so on This operating system very seldom freezes up

or slows down It can also provide high performance and support for networks where

client and server systems can be set up easily and quickly on a computer with Linux

It is very secure as well since Linux asks the user for the permissions Nowadays this

operating system is used more and more in both homes and companies due to all its

functionalities Linux offers many network applications so it could be very useful for this

thesis

We have described in this chapter many issues about networks which are crucial in

the next sections It is important to have a deep knowledge about this matter because

it is needed when it comes to analyze and recreate traffic network later on

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations We

started with a simple example M2M and we ended up adding a proxy in between and

simulating several clients These scenarios were analyzed to acquire a deep knowledge

about this framework in order to extract the pattern properly later on

31 Client Server Application

At this point after describing the related work of this thesis we are ready to develop

a client-server application with python [27] The goal of this part was to analyze the

traffic in a very simple case between single client and server This is an easy way to

start setting up a connection and design a methodology with tools for developing a larger

scale testing later The structure of this connection is shown in the Figure 31

Figure 31 Structure client server

The tool chosen to program is called python This is a high level programming language

very recommendable for network programming due to its ease of handling in this field

31

32 Traffic Test

When it comes to program the client for this application it was needed to set the

server Inet address and a random port for the exchange of data It was also necessary

to create a socket and connect it to the address and through the port mentioned before

In addition to program the server is required to set the hostname and the same port

opened in the client Moreover we have to create a socket and bind both hostname and

port to the socket Finally we made the socket wait for incoming packets from the client

and accept the connection

In the List 31 the required packets to establish a client-server connection are shown

Listing 31 Establish connection

1 rdquo0 6 65 3 17 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquoTCPrdquo rdquo74rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460

SACK PERM=1 TSval =4769150 TSecr=0 WS=64

2 rdquo0 6 69 7 36 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquoTCPrdquo rdquo66rdquo EtherNetminusIPminus1 gt rdquo49588rdquo [SYN ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS

=1452 WS=4 SACK PERM=1

3 rdquo0 6 69 7 66 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquoTCPrdquo rdquo54rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [ACK] Seq=1 Ack=1 Win=5888 Len=0

This exchange of packets is called the three way handshake Analyzing these segments

we can see clearly how the flags and the sequence and acknowledgement number have

the expected value The client sends a message with the flag SYN set to 1 in order to

connect to the server and with a random sequence number x Wireshark set by default

a relative sequence number starting with zero The answer of the server has the flags

SYN and ACK activated and with sequence number y and acknowledgement number

x+1 Finally a third packet is sent from the client only with the flag ACK set to 1 and

acknowledgement number y+1

When it comes to terminate the connection a packet with the flags FIN and ACK

activated is sent from the point where is wanted to close the connection Then there

is an ACK segment as a response This exchange must happened in both directions to

close the connection from both points otherwise only one point would be closed and the

other one could still send data These two packets are set out in the List 32

Listing 32 Terminate connection

1 rdquo0 6 71 9 45 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquoTCPrdquo rdquo60rdquo EtherNetminusIPminus1 gt 49588 [ FIN ACK] Seq=1 Ack=1 Win=182952 Len=0

32 Loading test with proxy 33

2 rdquo0 6 72 2 51 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 16 8 1 33 rdquo rdquoTCPrdquo rdquo54rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [ACK] Seq=1 Ack=2 Win=5888 Len=0

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

32 Loading test with proxy

In this section we start with the first part of the method which involves the creation of

the client-proxy-server scenario to run simulations with TCP sockets The connection is

set up with a proxy Squid in between [30] The structure is shown is the figure 32 After

setting up this connection we sent traffic in order to analyze the segments sent measure

the performance and extract a traffic pattern

Figure 32 Structure client proxy server

A proxy has been set up between the communication client-server to capture and

analyze the traffic so that we can recreate the pattern of communications and make

realistic loads towards the server In the beginning was needed to access to the instances

This is done through the port 22 and there are two main reasons to do so Firstly we

had to configure the proxy to accept the incoming packets and forward them properly

And secondly to run scripts in the instance it was necessary to access there and install

the required libraries to use that script Moreover some programs such as Tcpdump or

Wireshark were installed in the proxy instance to sniff the traffic When the proxy was

ready we could move on to write the script to create the scenario and make simulations

Several simulations were carried out with different types of instances [9] The sort of

ec2 instance matters regarding memory but also speed which is important in these tests

It is advisable to use a high performance instance for the proxy and the server in order

to handle all the packets quickly Especially in later tests when there are several data

34 Traffic Test

sources In order to develop these simulations we programmed the script Simulationpy

with boto so that the tests would be done automatically This script creates a scenario

comprised of in the simplest case three instances which are data source proxy and

server This script gives also the possibility of picking out the type of instance used for

the simulation Moreover after starting the instances the script set and initialized the

required server data sources and proxy Both server and data source were programmed

also with python due to its ease to develop anything related with networks

The goal of the data source is to send TCP packets towards the server always go-

ing through the proxy The server must answer to those packets creating a normal

connection Obviously before the exchange of data began the data source established

connection sending packets with the flag SYN set to 1 This is just done once in the

whole communication

When the packets were analyzed in the proxy it was possible to see how a TCP segment

with the flag SYN was sent towards the proxy Then another TCP packet arrived to the

data source This segment is the response from the proxy with the flags SYN and ACK

set to 1 This indicates the connection is established and the system is ready to exchange

information Finally the data source answers sending another packet to acknowledge the

previous packet This is shown in the list 33 and is called 3 way handshake [31]

Listing 33 Establishing data source-proxy connection

rdquo1rdquo rdquo0 000000rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo74rdquo rdquo45125 gt

ndlminusaas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK PERM=1

TSval=4294898611 TSecr=0 WS=16rdquo

rdquo2rdquo rdquo0 000054rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo74rdquo rdquo ndlminusaas

gt 45125 [SYN ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460

SACK PERM=1 TSval=4294908735 TSecr =4294898611 WS=128rdquo

rdquo3rdquo rdquo0 000833rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125 gt

ndlminusaas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164

TSecr =4294908735rdquo

When this connection is established the data source sends a HTTP packet to the

proxy indicating the DNS server address Then the proxy looks for the IP address of

that server sending DNS packets We can see this in the list 34

Listing 34 Searching server IP address

rdquo4rdquo rdquo0 000859rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoHTTPrdquo rdquo197rdquo rdquo

CONNECT ec2minus54minus228minus99minus43euminuswest minus1compute amazonaws com

50007 HTTP11 rdquo

32 Loading test with proxy 35

rdquo6rdquo rdquo0 001390rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0xb33a AAAA ec2minus54minus228minus99minus43euminuswest minus1

compute amazonaws comrdquo

rdquo7rdquo rdquo0 002600rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo166rdquo rdquo

Standard query response 0xb33a rdquo

rdquo8rdquo rdquo0 002769rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0 xa3f9 A ec2minus54minus228minus99minus43euminuswest minus1compute

amazonaws comrdquo

rdquo9rdquo rdquo0 003708rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo124rdquo rdquo

Standard query response 0 xa3f9 A 1 0 2 2 4 8 3 2 1 rdquo

Finally the proxy sends also a packet with the flag SYN activated to set up the com-

munication between them two In this way the whole communication data source-proxy-

server is ready to work This exchange of packets is shown in the list 35

Listing 35 Establishing proxy-server connection

rdquo10rdquo rdquo0 003785rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo74rdquo rdquo33271

gt 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK PERM=1

TSval=4294908736 TSecr=0 WS=128rdquo

rdquo11rdquo rdquo0 438963rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo74rdquo rdquo50007

gt 33271 [SYN ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460

SACK PERM=1 TSval=4294910381 TSecr =4294908736 WS=16rdquo

rdquo12rdquo rdquo0 439029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845

TSecr =4294910381rdquo

Then a HTTP10 200 OK connection established gets to the data source Therefore

now the connection is ready to start sending data In these simulations was decided to

send data from time to time with random time periods This makes the simulations be

more realistic since normally it is difficult to know when a client is going to communicate

with a server

The eight packets which compose the exchange of data between data source and server

are shown in the list 36

Listing 36 Exchange of data source-proxy-server

rdquo15rdquo rdquo0 466800rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo45125

36 Traffic Test

gt ndlminusaas [PSH ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval

=4294900280 TSecr =4294908845rdquo

rdquo16rdquo rdquo0 466813rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo66rdquo rdquo ndlminusaas gt 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval

=4294908852 TSecr =4294900280rdquo

rdquo17rdquo rdquo0 466975rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo71rdquo rdquo33271

gt 50007 [PSH ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval

=4294908852 TSecr =4294910381rdquo

rdquo18rdquo rdquo0 467901rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo50007

gt 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389

TSecr =4294908852rdquo

rdquo19rdquo rdquo0 468018rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo50007

gt 33271 [PSH ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval

=4294910389 TSecr =4294908852rdquo

rdquo20rdquo rdquo0 468029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852

TSecr =4294910389rdquo

rdquo21rdquo rdquo0 468083rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo71rdquo rdquo ndlminusaas gt 45125 [PSH ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval

=4294908852 TSecr =4294900280rdquo

rdquo22rdquo rdquo0 508799rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125

gt ndlminusaas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval

=4294900291 TSecr =4294908852rdquo

In this list 36 the packets with the PSH flag set to 1 denote that there is data being

sent in that segment [32] In these simulations the data source sent packets with data to

the server which simultaneously replayed with the same data to the data source Every

packet in the list 36 with the flag PSH activated is sending data First from data source

to proxy which forwards everything to the server And then all the way around sending

the data from server to data source

To test the performance of the scenario created many simulations were carried out

with different type of instances number of data sources and amount of data Each data

source was set in different instances and the number was scaled up from one up to ten

The network was firstly tested with a traffic load based on the sending of 1980 bytes of

data and later with a heavier load of 5940 bytes of data These loads were sent up to

32 Loading test with proxy 37

200 times with a random waiting time between them of either 1 or 2 seconds

The first simulations were carried out with only one data source In this section we

show the graphs with the number of bytes going through the proxy just with one client

connected The Figure 33 represents the number of bytes over time in the proxy in the

simulation with 1980 bytes of data Furthermore the Figure 34 represents the other

simulation with heavier traffic load The type of instance used is the same in both

examples

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected the average of bytes in the Figure 34 is approximately three times bigger

than in Figure 33 This makes sense since the data sent is three times bigger as well

38 Traffic Test

therefore is needed around triple number of packets Other issue to point out is that the

Figure 34 is smoother than the other one This is not only due to the effect of the scale

in the graph but also because the frequency and amount of segments being sent in the

second case is bigger

33 Loading test with several clients

After the simulations with one client it was time to test the server harder A realistic

way to do so is simulating the connection of several clients To do so we created a similar

environment but in this case with a variable amount of data sources All this scenario is

created with a python script as the environment used previously for one client At this

point the server was tested with up to ten data sources The scheme is shown in the

Figure 35 Using the Amazon cloud it is possible to use instances setting one client in

each instance Therefore proxy receives packets from different IP addresses as would be

in a real case

Figure 35 Structure for simulation

33 Loading test with several clients 39

The next two graphs represent the same two kinds of simulation developed in the

previous section but in this case with ten data sources the maximum number tested

The Figure 36 shows the number of bytes going over the proxy with data burst of 1980

bytes as well as Figure 37 does with data burst of 5940 bytes

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes

40 Traffic Test

The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyze in several ways First we

looked into the RTT values with different number of clients and then we analyzed other

important features explained later This becomes even more important in the last part

of the thesis when the number of clients is highly scaled up

First of all we compare two graphs which represent the average RTT of two simulations

differing only in the number of data sources For the Figure 38 packets were being sent

to the server from three different instances however in the Figure 39 there were up to

ten data sources working

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference since the data sent did not represent a big

problem in the network performance However we can appreciate that the lowest value

during the traffic exchange (2 ms approximately) last much longer in the Figure 38 In

the other graph there are many higher peaks therefore the RTT in this case is slightly

superior As expected the more clients the bigger congestion in the network and the

longer RTT

For every simulation the average RTT was calculated to very to what extent different

amount of data number of clients and type of instance affect to the network performance

34 Performance results 41

Figure 39 Average RTT with 10 data sources

As was mentioned before the RTT does not vary greatly If we look over the Table

31 and 32 we do not see large differences Moreover the lower in the table the shorter

period of time there should be However this does not apply in every case therefore the

type of instance is not very remarkable in these cases The simplest instance seems to be

enough for these exchange of data speaking about RTT values Concerning the number

of clients there is a slight difference especially comparing the RTT between 5 or 10 data

sources with only one But in general the results are quite similar because this amount

of packets do not represent serious problem for the network

Server instance type 1 source 3 sources 5 sources 10 sources

t1micro 00031 00046 00033 00039

m1large 00037 00035 00038 00032

c1medium 00031 00035 00051 00048

c1xlarge 00039 00043 00037 00042

Table 31 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet

loss TCP retransmissions and duplicate ACK It is remarkable that in this tests the

results were much more diverse The results show an average of packets since several

simulations were carried out for each case

In the Table 33 we have the average number of TCP packets which have been re-

transmitted in each type of simulation The number is low and the tests with 10 data

sources have more retransmissions Moreover examining this table by type of instance

the lowest quality instance (t1micro) seems to have more difficulties in the communica-

42 Traffic Test

Server instance type 1 Source 3 Sources 5 Sources 10 Sources

t1micro 00026 00022 00021 00029

m1large 00026 00024 00028 00024

c1medium 00028 00031 00025 00030

c1xlarge 00026 00029 00029 00024

Table 32 RTT with data bursts of 5940 bytes

tion With this instance the number of retransmissions is bigger Either way there is no

simulation that stands out concerning packets resent

The Table 34 shows packet losses Here the differences among the tests carried out

are considerably wider As expected the worst simulation with more difficulties in the

communication was the one with 10 data sources heaviest data burst and worst instance

Here there is an average of up to 67 lost packets Moreover we can appreciate how the

heaviest data burst is starting to create problems because there are many more losses

than in simulations with only 1980 bytes Every instances give better results than the

t1micro one Nevertheless there is no a very significant gap among these three instances

(m1large c1medium c1xlarge) The most important result in this tale concerns the

growth of packet loss as the number of data sources increases as well

Finally in the Table 35 we can check how many ACK were duplicated In this case

there are barely problems with the c1xlarge instance unlike with t1micro The table also

indicates the higher difficulty to send traffic properly with many data sources Finally

it must be pointed out that in these simulations a c1xlarge instance is enough to avoid

problems in the communication

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

15

0

0

0

2

0

2

m1large1980 bytes

5940 bytes

0

0

0

25

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1xlarge1980 bytes

5940 bytes

0

0

0

0

0

0

0

25

Table 33 Number of TCP retransmissions

Overall in this chapter the traffic pattern was analyzed and the network was tested

34 Performance results 43

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

2

0

6

0

15

0

67

m1large1980 bytes

5940 bytes

0

0

0

55

0

1

0

36

c1medium1980 bytes

5940 bytes

0

0

0

7

0

135

2

505

c1xlarge1980 bytes

5940 bytes

0

05

0

5

0

9

5

545

Table 34 Number of lost packets

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

3

1

1

0

75

65

25

m1large1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

25

45

25

c1xlarge1980 bytes

5940 bytes

05

05

0

0

0

0

05

0

Table 35 Number of duplicate ACK

to find out the influence of diverse factors in its performance We have seen how the

RTT vary just a little in the different tests Most probably due to the fact that the data

sent does not stress enough the server However in the last analysis the values measured

related with network performance gave more interesting results For example when it

comes to stress the network the number of clients is more significant than the type of

instance picked out Nevertheless the kind of instance was also important in to improve

the performance This was more noticeable when the number of data sources was high

The c1large instance solved a large part of the packet losses problem compared with the

t1micro one

After achieving these results we can move on to the next step where we extracted

the traffic pattern from these simulations All the process is explained in the following

chapter

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. A method had to be found for performing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications describing how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as similarly as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found to be much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet. This is exactly how the data source sent packets in the simulations, so this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
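As an illustration of this step, the following is a minimal sketch of the kind of extraction described above, using the dpkt library. The server port, the gap used to separate two bursts and the function name are assumptions made for the example, not the exact values used in Extractpattern.py.

import dpkt

SERVER_PORT = 50007   # assumption: destination port used by the data sources
BURST_GAP = 0.5       # assumption: time gap (in seconds) separating two bursts

def extract_bursts(pcap_path):
    # Returns a list of (relative timestamp, payload) pairs, one per data burst
    bursts = []
    first_ts = None
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            if first_ts is None:
                first_ts = ts                   # time origin of the capture
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP):
                continue                        # skip non-IP frames such as ARP
            tcp = ip.data
            if not isinstance(tcp, dpkt.tcp.TCP) or tcp.dport != SERVER_PORT:
                continue                        # drops the HTTP and DNS proxy segments
            if len(tcp.data) == 0:
                continue                        # SYN/ACK-only segments carry no data
            rel_ts = ts - first_ts
            if bursts and rel_ts - bursts[-1][0] < BURST_GAP:
                # packet belongs to the current burst: append its payload
                bursts[-1] = (bursts[-1][0], bursts[-1][1] + tcp.data)
            else:
                bursts.append((rel_ts, tcp.data))
    return bursts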

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst, together with its timestamp. The second script then accesses the information contained in this file. Knowing this information, the program is ready to replay the data in an accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from where the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets accurately. It must be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
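A minimal sketch of this replay step is shown below. It assumes the burst list produced by the extraction sketch in Section 4.1 and a placeholder server address; the real Replaytraffic.py also takes care of starting the extraction and downloading the results.

import socket
import time

SERVER = ('10.224.83.21', 50007)    # placeholder: server address and port

def replay_bursts(bursts):
    # bursts is a list of (relative timestamp, payload) pairs
    sock = socket.create_connection(SERVER)
    start = time.time()
    for rel_ts, payload in bursts:
        delay = rel_ts - (time.time() - start)
        if delay > 0:
            time.sleep(delay)       # keep the original spacing between bursts
        sock.sendall(payload)       # the whole burst is sent at once
        sock.recv(65535)            # read the echo coming back from the server
    sock.close()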

When replaying the traffic M2M it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter out the data sent from the data source to the proxy. This sniffed data was then replayed twice M2M with the second script, so that over the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy allows receiving the same data from the server as well, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the traffic network in the simulations, with data source, proxy and server. The figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not grow over time. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.
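Besides the visual comparison of the graphs, the similarity can be checked numerically on the three characteristics mentioned at the beginning of this chapter. The sketch below, with placeholder file names, compares the number of packets, the total number of bytes and the duration of the two captures using dpkt.

import dpkt

def summarize(pcap_path):
    # Returns (number of packets, total bytes, capture duration in seconds)
    count, total_bytes, first, last = 0, 0, None, None
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            count += 1
            total_bytes += len(buf)
            first = ts if first is None else first
            last = ts
    return count, total_bytes, last - first

print(summarize('simulation.pcap'))   # placeholder: capture taken at the proxy
print(summarize('replayed.pcap'))     # placeholder: capture of the M2M replay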

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the developed TaaS system are explained. Then remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
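The instances of this scenario are started through the Boto library introduced in Chapter 2. The following sketch shows the general idea; the region, AMI identifier, key pair and security group are placeholders, not the exact values used by the TaaS scripts.

import time
import boto.ec2

conn = boto.ec2.connect_to_region('eu-west-1')          # placeholder region
reservation = conn.run_instances(
    'ami-xxxxxxxx',                  # placeholder AMI with the server/client scripts
    min_count=1, max_count=1,
    instance_type='c1.xlarge',       # type of instance chosen for the server
    key_name='taas-key',             # placeholder key pair
    security_groups=['taas-sg'])     # placeholder security group
server = reservation.instances[0]
while server.state != 'running':     # wait until the instance is available
    time.sleep(5)
    server.update()
print(server.ip_address)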

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic that is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then, scale up the number of clients in order to increase the traffic towards the server until it stops working properly. Finally, look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
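Assuming the scripts are run without additional command line arguments (the parameters are set inside Client.py, as described above), a complete test run therefore follows this order:

python Simulation.py       # record the client-proxy-server session; the pcap file is downloaded
python Servertoreplay.py   # start the same server, picking the type of instance
python Replaytraffic.py    # extract the pattern and multiply it towards the server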

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time every five seconds.
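A sketch of this ramp-up is shown below; it reuses the replay_bursts() sketch from Chapter 4 and starts one new replaying client every five seconds until the chosen maximum is reached. The names and the use of threads are assumptions made for the illustration.

import threading
import time

MAX_CLIENTS = 80      # highest number of data sources tested
RAMP_DELAY = 5        # seconds between the start of two consecutive clients

def multiply(bursts):
    workers = []
    for _ in range(MAX_CLIENTS):
        t = threading.Thread(target=replay_bursts, args=(bursts,))
        t.start()                 # each thread replays the full pattern
        workers.append(t)
        time.sleep(RAMP_DELAY)    # add one client every five seconds
    for t in workers:
        t.join()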

Figure 5.1 represents the number of bytes sent in the different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is because at certain moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the red graph corresponds to up to 80 sources sending data. The black graph appears to carry twice the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is also twice as large. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients problems sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster, namely when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although there are no peaks standing out, the average is considerably higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the larger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets reaches, in some tests, a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before being able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot: with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we therefore developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicated ACKs, showed more varied values. These results showed a performance improvement in the network when using high quality instances and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared different types of server instances.

Overall, the results about the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not as satisfactory. I must say that, after testing many different TCP servers, the RTT behaved differently depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using an m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. Gao et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. Combs et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language: official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 27: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

22 Network Performance Metrics 25

formula to get the value of the RTT within a network is shown in the equation 21

EstimatedRTT = α lowast EstimatedRTT + (1 minus α) lowast SampleRTT (21)

Where α is a value (0 ltα lt1) that must be set For TCP it is advisable to fix this

parameter between 08 and 09 An example of exchange of packets and their direct

relation with the RTT is set out in the Figure 211

Figure 211 Example RTT interval

222 Jitter

Jitter is a variation in the delay of the packets sent within a network [15] A sender

will transmit many packets straight one after the other with a certain distance between

them However problems with network congestion queues or configuration errors cause

that this distance between packets varies The implications of the jitter in the pictures

can be seen in the Figure 212

Jitter is a great problem since these fluctuations happen randomly and change very

quickly in time Therefore it is crucial to correct this problem as much as possible

One solution for this problem is to set a buffer which receives the packets at irregular

intervals This buffer will hold these packets for a short space of time in order to reorder

them if necessary and leave the same distance between each packet The main problem

of this method is that this buffer adds delay to the transmission They also will always

have a limited size so if the buffer is full of packets the new packets that come will be

dropped and they will never arrive to their destination

26 Related work

Figure 212 Jitter effect

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network

to another Several factors affect to this parameter The first contributor to network

latency is the propagation delay which is basically the time a packet takes to get from

one point to another at the speed of light The second factor to keep in mind is the time

it takes to transmit data and this depends on the bandwidth and the size of the packet

The last contributor is related with the queueing delays in switches and bridges where

packets are usually stored for some time These factors can be defined in the next three

formula

Latency = Propagation+ Transmit+Queue (22)

Propagation = DistanceSpeedOfLight (23)

Transmit = SizeBandwidth (24)

224 Bandwidth

This concept describes the number of bits that are transmitted within the network during

a second [15] There is an important relationship between bandwidth and latency to talk

about To visualize this idea it may help to think in a pipe through where the data

pass The bandwidth would be the diameter of the pipe and the latency the length of

this pipe A simple draw with the relation among network latency and bandwidth is in

the Figure 213

If we multiply both terms we will achieve the number of bits that can be transmitted

in this pipe at a given instant For instance a channel with 50 ms of latency and 45

23 Tools strongly associated with this thesis 27

Figure 213 Relation between Latency and Bandwidth

Mbps of bandwidth will be able to contain

50 lowast 10minus3 s lowast 45 lowast 106 bitss = 225 lowast 106 bits (25)

If more bandwidth is requested just adding more pipes the problem is solved

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to

another in a given time This concept is used to measure the performance or efficiency

of hard drives RAM and networks The throughput can be calculated with the next

formula

Throughput = TransferSize TransferT ime (26)

TransferT ime = RTT + 1Bandwidth lowast TransferSize (27)

Where RTT is the round trip time

Throughput and bandwidth can be sometimes confusing terms Bandwidth refers to

the number of bits per second that can be transmitted in practice However due to

inefficiencies of implementation or errors a couple of nodes connected in the network

with a bandwidth of 10 Mbps will usually have a throughput much lower (for instance 2

Mbps) so that the data can be sent at 2 Mbps at the most

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project

28 Related work

231 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool to control attacks against SSLTLS network connections These

connections are intercepted and redirected to SSLsplit This tool may end SSLTLS and

launch a new SSLTLS connection with the same receiver address The goal of this tool

is to be helpful to test and analyze networks This tool can work with TCP SSL HTTP

and HTTPS connections over IPv4 and IPv6

Wireshark

Wireshark [24] is a powerful network packet analyzer with a high number of functions

This tool can capture datagrams and show in detail everything that the packet carries

Overall the aim of using wireshark is to solve and manage network problems examine

security problems remove errors in protocol implementations This program displays

the characteristics of the packets in great detail splitting them up in different layers

With this program users can see easily a list with captured packets running in real time

the details of a selected packet and the packet content in hexadecimal and ASCII In

addition it is also possible to filter the datagrams in order to make easier the search for

the packets which makes wireshark very manageable

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network Some

reasons why it is interesting to use tcpdump are verify connectivity between hosts and

look into the traffic network This tool also allows us to pick out particular kinds of

traffic depending on the header information Moreover it is possible to save all the traffic

captured in a file in order to be used in a future analysis These tcpdump files can be also

opened with software like wireshark Moreover tcpdump provides many instructions to

capture packets in different ways which give us a broad range of possibilities to manage

the traffic

Proxy

Proxy [26] is a server used as a gateway between a local network and another much

wider network A proxy is located in the middle of the communication between sender

and receiver The proxy receives the incoming data from one port and it forwards this

information to the rest of the network by another port Proxies may cache web sites

23 Tools strongly associated with this thesis 29

This happens each time a user from a local network asks for some URL The proxy that

receives this request will store a temporary copy of the URL The next time that a user

asks for the same web site the proxy can send the cached copy to the user instead of

forwarding the request to the network to find again the URL We can see this process in

the picture below where the proxy asks for each web site only once An example of how

a proxy works and handle the incoming requests is shown in the Figure 214

Figure 214 Proxy operation

In this way proxies can make much faster the delivery of packets within the network

but this is not the only function they cover They may also be used to avoid that hackers

get internal addresses since these proxies can block the access between two networks

Proxies can take part as a component of a firewall

232 Programming language

Several programming languages can be use for network programming Python [27] is one

of the most important and provides a library called Boto which could be very helpful

for this thesis

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web

Services (AWS) To use Boto is required to provide the Access Key and Secret Key which

we can either give manually in every connection or add in the boto file In addition it is

necessary to create connection objects before creating a machine These machines provide

a stable and secure execution environment to run applications Then main fields in which

30 Related work

Boto is involved are computer database deployment application services monitoring

storage and so on

233 Operating Systems

There are several sort of operating systems such as Microsoft Windows Linux and Mac

OS However the opportunities and ease to manage network tools are not the same in

all of them We believe that for the development of this thesis Linux would be more

suitable

Linux

Linux [29] is a computer operation system created by volunteers and employees of

many companies and organizations from every parts of the world in order to make of this

product free software The main advantages of Linux are low cost stability performance

network functionality security and so on This operating system very seldom freezes up

or slows down It can also provide high performance and support for networks where

client and server systems can be set up easily and quickly on a computer with Linux

It is very secure as well since Linux asks the user for the permissions Nowadays this

operating system is used more and more in both homes and companies due to all its

functionalities Linux offers many network applications so it could be very useful for this

thesis

We have described in this chapter many issues about networks which are crucial in

the next sections It is important to have a deep knowledge about this matter because

it is needed when it comes to analyze and recreate traffic network later on

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations We

started with a simple example M2M and we ended up adding a proxy in between and

simulating several clients These scenarios were analyzed to acquire a deep knowledge

about this framework in order to extract the pattern properly later on

31 Client Server Application

At this point after describing the related work of this thesis we are ready to develop

a client-server application with python [27] The goal of this part was to analyze the

traffic in a very simple case between single client and server This is an easy way to

start setting up a connection and design a methodology with tools for developing a larger

scale testing later The structure of this connection is shown in the Figure 31

Figure 31 Structure client server

The tool chosen to program is called python This is a high level programming language

very recommendable for network programming due to its ease of handling in this field

31

32 Traffic Test

When it comes to program the client for this application it was needed to set the

server Inet address and a random port for the exchange of data It was also necessary

to create a socket and connect it to the address and through the port mentioned before

In addition to program the server is required to set the hostname and the same port

opened in the client Moreover we have to create a socket and bind both hostname and

port to the socket Finally we made the socket wait for incoming packets from the client

and accept the connection

In the List 31 the required packets to establish a client-server connection are shown

Listing 31 Establish connection

1 rdquo0 6 65 3 17 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquoTCPrdquo rdquo74rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460

SACK PERM=1 TSval =4769150 TSecr=0 WS=64

2 rdquo0 6 69 7 36 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquoTCPrdquo rdquo66rdquo EtherNetminusIPminus1 gt rdquo49588rdquo [SYN ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS

=1452 WS=4 SACK PERM=1

3 rdquo0 6 69 7 66 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquoTCPrdquo rdquo54rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [ACK] Seq=1 Ack=1 Win=5888 Len=0

This exchange of packets is called the three way handshake Analyzing these segments

we can see clearly how the flags and the sequence and acknowledgement number have

the expected value The client sends a message with the flag SYN set to 1 in order to

connect to the server and with a random sequence number x Wireshark set by default

a relative sequence number starting with zero The answer of the server has the flags

SYN and ACK activated and with sequence number y and acknowledgement number

x+1 Finally a third packet is sent from the client only with the flag ACK set to 1 and

acknowledgement number y+1

When it comes to terminate the connection a packet with the flags FIN and ACK

activated is sent from the point where is wanted to close the connection Then there

is an ACK segment as a response This exchange must happened in both directions to

close the connection from both points otherwise only one point would be closed and the

other one could still send data These two packets are set out in the List 32

Listing 32 Terminate connection

1 rdquo0 6 71 9 45 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquoTCPrdquo rdquo60rdquo EtherNetminusIPminus1 gt 49588 [ FIN ACK] Seq=1 Ack=1 Win=182952 Len=0

32 Loading test with proxy 33

2 rdquo0 6 72 2 51 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 16 8 1 33 rdquo rdquoTCPrdquo rdquo54rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [ACK] Seq=1 Ack=2 Win=5888 Len=0

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

32 Loading test with proxy

In this section we start with the first part of the method which involves the creation of

the client-proxy-server scenario to run simulations with TCP sockets The connection is

set up with a proxy Squid in between [30] The structure is shown is the figure 32 After

setting up this connection we sent traffic in order to analyze the segments sent measure

the performance and extract a traffic pattern

Figure 32 Structure client proxy server

A proxy has been set up between the communication client-server to capture and

analyze the traffic so that we can recreate the pattern of communications and make

realistic loads towards the server In the beginning was needed to access to the instances

This is done through the port 22 and there are two main reasons to do so Firstly we

had to configure the proxy to accept the incoming packets and forward them properly

And secondly to run scripts in the instance it was necessary to access there and install

the required libraries to use that script Moreover some programs such as Tcpdump or

Wireshark were installed in the proxy instance to sniff the traffic When the proxy was

ready we could move on to write the script to create the scenario and make simulations

Several simulations were carried out with different types of instances [9] The sort of

ec2 instance matters regarding memory but also speed which is important in these tests

It is advisable to use a high performance instance for the proxy and the server in order

to handle all the packets quickly Especially in later tests when there are several data

34 Traffic Test

sources In order to develop these simulations we programmed the script Simulationpy

with boto so that the tests would be done automatically This script creates a scenario

comprised of in the simplest case three instances which are data source proxy and

server This script gives also the possibility of picking out the type of instance used for

the simulation Moreover after starting the instances the script set and initialized the

required server data sources and proxy Both server and data source were programmed

also with python due to its ease to develop anything related with networks

The goal of the data source is to send TCP packets towards the server always go-

ing through the proxy The server must answer to those packets creating a normal

connection Obviously before the exchange of data began the data source established

connection sending packets with the flag SYN set to 1 This is just done once in the

whole communication

When the packets were analyzed in the proxy it was possible to see how a TCP segment

with the flag SYN was sent towards the proxy Then another TCP packet arrived to the

data source This segment is the response from the proxy with the flags SYN and ACK

set to 1 This indicates the connection is established and the system is ready to exchange

information Finally the data source answers sending another packet to acknowledge the

previous packet This is shown in the list 33 and is called 3 way handshake [31]

Listing 33 Establishing data source-proxy connection

rdquo1rdquo rdquo0 000000rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo74rdquo rdquo45125 gt

ndlminusaas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK PERM=1

TSval=4294898611 TSecr=0 WS=16rdquo

rdquo2rdquo rdquo0 000054rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo74rdquo rdquo ndlminusaas

gt 45125 [SYN ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460

SACK PERM=1 TSval=4294908735 TSecr =4294898611 WS=128rdquo

rdquo3rdquo rdquo0 000833rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125 gt

ndlminusaas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164

TSecr =4294908735rdquo

When this connection is established the data source sends a HTTP packet to the

proxy indicating the DNS server address Then the proxy looks for the IP address of

that server sending DNS packets We can see this in the list 34

Listing 34 Searching server IP address

rdquo4rdquo rdquo0 000859rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoHTTPrdquo rdquo197rdquo rdquo

CONNECT ec2minus54minus228minus99minus43euminuswest minus1compute amazonaws com

50007 HTTP11 rdquo

32 Loading test with proxy 35

rdquo6rdquo rdquo0 001390rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0xb33a AAAA ec2minus54minus228minus99minus43euminuswest minus1

compute amazonaws comrdquo

rdquo7rdquo rdquo0 002600rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo166rdquo rdquo

Standard query response 0xb33a rdquo

rdquo8rdquo rdquo0 002769rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0 xa3f9 A ec2minus54minus228minus99minus43euminuswest minus1compute

amazonaws comrdquo

rdquo9rdquo rdquo0 003708rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo124rdquo rdquo

Standard query response 0 xa3f9 A 1 0 2 2 4 8 3 2 1 rdquo

Finally the proxy sends also a packet with the flag SYN activated to set up the com-

munication between them two In this way the whole communication data source-proxy-

server is ready to work This exchange of packets is shown in the list 35

Listing 35 Establishing proxy-server connection

rdquo10rdquo rdquo0 003785rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo74rdquo rdquo33271

gt 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK PERM=1

TSval=4294908736 TSecr=0 WS=128rdquo

rdquo11rdquo rdquo0 438963rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo74rdquo rdquo50007

gt 33271 [SYN ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460

SACK PERM=1 TSval=4294910381 TSecr =4294908736 WS=16rdquo

rdquo12rdquo rdquo0 439029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845

TSecr =4294910381rdquo

Then a HTTP10 200 OK connection established gets to the data source Therefore

now the connection is ready to start sending data In these simulations was decided to

send data from time to time with random time periods This makes the simulations be

more realistic since normally it is difficult to know when a client is going to communicate

with a server

The eight packets which compose the exchange of data between data source and server

are shown in the list 36

Listing 36 Exchange of data source-proxy-server

rdquo15rdquo rdquo0 466800rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo45125

36 Traffic Test

gt ndlminusaas [PSH ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval

=4294900280 TSecr =4294908845rdquo

rdquo16rdquo rdquo0 466813rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo66rdquo rdquo ndlminusaas gt 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval

=4294908852 TSecr =4294900280rdquo

rdquo17rdquo rdquo0 466975rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo71rdquo rdquo33271

gt 50007 [PSH ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval

=4294908852 TSecr =4294910381rdquo

rdquo18rdquo rdquo0 467901rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo50007

gt 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389

TSecr =4294908852rdquo

rdquo19rdquo rdquo0 468018rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo50007

gt 33271 [PSH ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval

=4294910389 TSecr =4294908852rdquo

rdquo20rdquo rdquo0 468029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852

TSecr =4294910389rdquo

rdquo21rdquo rdquo0 468083rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo71rdquo rdquo ndlminusaas gt 45125 [PSH ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval

=4294908852 TSecr =4294900280rdquo

rdquo22rdquo rdquo0 508799rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125

gt ndlminusaas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval

=4294900291 TSecr =4294908852rdquo

In this list 36 the packets with the PSH flag set to 1 denote that there is data being

sent in that segment [32] In these simulations the data source sent packets with data to

the server which simultaneously replayed with the same data to the data source Every

packet in the list 36 with the flag PSH activated is sending data First from data source

to proxy which forwards everything to the server And then all the way around sending

the data from server to data source

To test the performance of the scenario created many simulations were carried out

with different type of instances number of data sources and amount of data Each data

source was set in different instances and the number was scaled up from one up to ten

The network was firstly tested with a traffic load based on the sending of 1980 bytes of

data and later with a heavier load of 5940 bytes of data These loads were sent up to

32 Loading test with proxy 37

200 times with a random waiting time between them of either 1 or 2 seconds

The first simulations were carried out with only one data source In this section we

show the graphs with the number of bytes going through the proxy just with one client

connected The Figure 33 represents the number of bytes over time in the proxy in the

simulation with 1980 bytes of data Furthermore the Figure 34 represents the other

simulation with heavier traffic load The type of instance used is the same in both

examples

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected the average of bytes in the Figure 34 is approximately three times bigger

than in Figure 33 This makes sense since the data sent is three times bigger as well

38 Traffic Test

therefore is needed around triple number of packets Other issue to point out is that the

Figure 34 is smoother than the other one This is not only due to the effect of the scale

in the graph but also because the frequency and amount of segments being sent in the

second case is bigger

33 Loading test with several clients

After the simulations with one client it was time to test the server harder A realistic

way to do so is simulating the connection of several clients To do so we created a similar

environment but in this case with a variable amount of data sources All this scenario is

created with a python script as the environment used previously for one client At this

point the server was tested with up to ten data sources The scheme is shown in the

Figure 35 Using the Amazon cloud it is possible to use instances setting one client in

each instance Therefore proxy receives packets from different IP addresses as would be

in a real case

Figure 35 Structure for simulation

33 Loading test with several clients 39

The next two graphs represent the same two kinds of simulation developed in the

previous section but in this case with ten data sources the maximum number tested

The Figure 36 shows the number of bytes going over the proxy with data burst of 1980

bytes as well as Figure 37 does with data burst of 5940 bytes

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes

40 Traffic Test

The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyze in several ways First we

looked into the RTT values with different number of clients and then we analyzed other

important features explained later This becomes even more important in the last part

of the thesis when the number of clients is highly scaled up

First of all we compare two graphs which represent the average RTT of two simulations

differing only in the number of data sources For the Figure 38 packets were being sent

to the server from three different instances however in the Figure 39 there were up to

ten data sources working

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference since the data sent did not represent a big

problem in the network performance However we can appreciate that the lowest value

during the traffic exchange (2 ms approximately) last much longer in the Figure 38 In

the other graph there are many higher peaks therefore the RTT in this case is slightly

superior As expected the more clients the bigger congestion in the network and the

longer RTT

For every simulation the average RTT was calculated to very to what extent different

amount of data number of clients and type of instance affect to the network performance

34 Performance results 41

Figure 39 Average RTT with 10 data sources

As was mentioned before the RTT does not vary greatly If we look over the Table

31 and 32 we do not see large differences Moreover the lower in the table the shorter

period of time there should be However this does not apply in every case therefore the

type of instance is not very remarkable in these cases The simplest instance seems to be

enough for these exchange of data speaking about RTT values Concerning the number

of clients there is a slight difference especially comparing the RTT between 5 or 10 data

sources with only one But in general the results are quite similar because this amount

of packets do not represent serious problem for the network

Server instance type 1 source 3 sources 5 sources 10 sources

t1micro 00031 00046 00033 00039

m1large 00037 00035 00038 00032

c1medium 00031 00035 00051 00048

c1xlarge 00039 00043 00037 00042

Table 31 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet

loss TCP retransmissions and duplicate ACK It is remarkable that in this tests the

results were much more diverse The results show an average of packets since several

simulations were carried out for each case

In Table 3.3 we have the average number of TCP packets that were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication: with this instance the number of retransmissions is larger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the weakest instance; here there is an average of up to 67 lost packets. Moreover, we can see how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also shows the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    1.5         0            2            2
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           2.5          0            0
c1.medium               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.xlarge               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            2.5

Table 3.3: Number of TCP retransmissions


Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    2           6            15           67
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           5.5          1            36
c1.medium               1980 bytes    0           0            0            2
                        5940 bytes    0           7            13.5         50.5
c1.xlarge               1980 bytes    0           0            0            5
                        5940 bytes    0.5         5            9            54.5

Table 3.4: Number of lost packets

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           1            0            6.5
                        5940 bytes    3           1            7.5          2.5
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.medium               1980 bytes    0           0            0            4.5
                        5940 bytes    0           0            2.5          2.5
c1.xlarge               1980 bytes    0.5         0            0            0.5
                        5940 bytes    0.5         0            0            0

Table 3.5: Number of duplicate ACK

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies only a little across the different tests, most probably because the data sent does not stress the server enough. However, the values measured in the last analysis of network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results, we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed to find a method for performing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications describing how traffic generators create realistic traffic. From those publications we identified three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, i.e. the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to write a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially extracted the data of each packet one at a time, in order to resend them when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of being sent packet by packet. This is exactly how the data source sent packets in the simulations, so this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the capture, since the data source then started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. This made it easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was used, a few protocols were needed to establish the communication over this proxy. These were a couple of HTTP messages and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
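A condensed sketch of the kind of extraction described above is shown below. The burst-grouping threshold, the Squid proxy port and the pickle output format are assumptions; the real Extractpattern.py may differ.

import pickle

import dpkt

def extract_pattern(pcap_path, proxy_port=3128, gap=0.5):
    """Collect the client data bursts (timestamp, payload) from a proxy capture."""
    bursts = []                                          # list of [first_timestamp, payload]
    first_capture_ts = None
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            if first_capture_ts is None:
                first_capture_ts = ts                    # time reference of the whole capture
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            tcp = eth.data.data
            if not isinstance(tcp, dpkt.tcp.TCP) or not tcp.data:
                continue
            if tcp.dport != proxy_port:                  # keep only client -> proxy data segments
                continue
            if tcp.data.startswith(b'CONNECT'):          # drop the proxy set-up request
                continue
            if bursts and ts - bursts[-1][0] < gap:      # close in time: same data burst
                bursts[-1][1] += tcp.data
            else:
                bursts.append([ts, tcp.data])
    pattern = {'initial_delay': bursts[0][0] - first_capture_ts if bursts else 0.0,
               'bursts': [(ts, data) for ts, data in bursts]}
    with open('pattern.pkl', 'wb') as out:               # consumed later by the replay script
        pickle.dump(pattern, out)
    return pattern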

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, gathers all this information and saves it in a file, one entry per data burst together with its timestamp. The second script then accesses the information contained in this file. Knowing this information, the program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, so that it runs as fast as possible and sends the packets accurately. It must be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
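The replay side then only needs a socket, the stored bursts and a pause between them. The following is a simplified sketch of what Replaytraffic.py does, reusing the pickle format assumed above; the server address is a placeholder.

import pickle
import socket
import time

SERVER = ('server-public-dns', 50007)        # placeholder address of the EC2 server

def replay_pattern(pattern_file='pattern.pkl', server=SERVER):
    with open(pattern_file, 'rb') as f:
        pattern = pickle.load(f)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(server)
    time.sleep(pattern['initial_delay'])                  # start as late as in the original capture
    previous_ts = None
    for ts, data in pattern['bursts']:
        if previous_ts is not None:
            time.sleep(max(0.0, ts - previous_ts))        # keep the recorded spacing between bursts
        sock.sendall(data)                                # the whole burst is sent at once
        sock.recv(4096)                                   # read (part of) the echo from the server
        previous_ts = ts
    sock.close()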

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method. To achieve this, we had to filter the data sent from the data source to the proxy. This very data, once sniffed, was replayed twice M2M with the second script, so that over the whole network the same amount of data is sent, but in this case directly from client to server. This strategy also allows the same data to be received from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server response is something we could not control. Another important issue is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern on a large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It was therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.
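Instance handling of this kind can be done with boto [28]. The following is a minimal sketch of starting a server instance of a chosen type; the region, AMI id, key pair and security group names are placeholders, not the values used in this work.

import time

import boto.ec2

def launch_server(instance_type='m1.large'):
    """Start one EC2 instance for the server and wait until it is running (boto 2.x)."""
    conn = boto.ec2.connect_to_region('eu-west-1')        # credentials read from the boto config
    reservation = conn.run_instances(
        'ami-xxxxxxxx',                                    # placeholder AMI id
        instance_type=instance_type,
        key_name='taas-key',                               # placeholder key pair name
        security_groups=['taas-sg'])                       # placeholder security group
    instance = reservation.instances[0]
    while instance.state != 'running':
        time.sleep(5)
        instance.update()                                  # refresh the instance state
    return instance.public_dns_name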

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important points when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic that is going to be replayed should not cover too short a time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically, in a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that varies from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.
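A data source of this kind can be reduced to a few lines of Python. The sketch below uses the burst size, repetition count and waiting interval just mentioned, but it is only an illustration: the proxy address is a placeholder and the Squid CONNECT handshake is omitted.

import random
import socket
import time

def run_data_source(proxy_addr, burst_size=3960, repetitions=400):
    """Send fixed-size data bursts with a random pause between them."""
    payload = b'x' * burst_size                 # dummy data of the configured burst size
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(proxy_addr)                    # e.g. ('proxy-private-ip', 3128); placeholder
    for _ in range(repetitions):
        sock.sendall(payload)                   # one data burst sent at once
        sock.recv(4096)                         # read (part of) the data echoed by the server
        time.sleep(random.uniform(1, 3))        # random waiting time of 1 to 3 seconds
    sock.close()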

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased the number up to 80, which was considered enough clients to create heavy traffic loads. We could then compare the different results and the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.
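One simple way to drive such a ramp-up is to start a new replaying client at fixed intervals. The thread-based sketch below is an illustration only, since in the tests every client actually ran in its own EC2 instance; start_replay stands for any function that replays the pattern once.

import threading
import time

def ramp_up_clients(total_clients, start_replay, interval=5):
    """Start one replaying client every `interval` seconds and wait for all of them."""
    workers = []
    for client_id in range(total_clients):
        t = threading.Thread(target=start_replay, args=(client_id,))
        t.start()
        workers.append(t)
        time.sleep(interval)                    # a new data source every five seconds
    for t in workers:
        t.join()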

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can see that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at certain moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph has about twice the amount of bytes of the blue one in Figure 5.1, which is an expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after about 150 seconds the graph rises only a little, and slowly. In this test a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops rising after about 225 seconds. In this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients reaches about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no outstanding peaks, the average is quite a bit higher than in the other tests. This graph also appears to be smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instance tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server; however, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets reaches, in some tests, a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap in packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server varies depending on the kind of instance used. The Round Trip Time did not really differ until the number of clients increased considerably: with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk, in terms of packet delivery, of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes increased correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicated ACKs, showed more varied values. These results showed a performance improvement in the network when using high quality instances, and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not deteriorate over time: the similarity between both graphs was permanent. Therefore, with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been proven that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instance.

Overall, the results concerning the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not so satisfactory: I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server. These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

The TaaS system created in this thesis is based on the TCP protocol, and it focuses mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.
2. Test different types of servers, for instance an HTTP server.
3. Work out the cost of a test before starting it.
4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. Combs et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol - HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP Split Handshake: Practical Effects on Modern Network Equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


26 Related work

Figure 212 Jitter effect

223 Latency

Latency [15] indicates the time a message takes to go from one point of the network

to another Several factors affect to this parameter The first contributor to network

latency is the propagation delay which is basically the time a packet takes to get from

one point to another at the speed of light The second factor to keep in mind is the time

it takes to transmit data and this depends on the bandwidth and the size of the packet

The last contributor is related with the queueing delays in switches and bridges where

packets are usually stored for some time These factors can be defined in the next three

formula

Latency = Propagation+ Transmit+Queue (22)

Propagation = DistanceSpeedOfLight (23)

Transmit = SizeBandwidth (24)

224 Bandwidth

This concept describes the number of bits that are transmitted within the network during

a second [15] There is an important relationship between bandwidth and latency to talk

about To visualize this idea it may help to think in a pipe through where the data

pass The bandwidth would be the diameter of the pipe and the latency the length of

this pipe A simple draw with the relation among network latency and bandwidth is in

the Figure 213

If we multiply both terms we will achieve the number of bits that can be transmitted

in this pipe at a given instant For instance a channel with 50 ms of latency and 45

23 Tools strongly associated with this thesis 27

Figure 213 Relation between Latency and Bandwidth

Mbps of bandwidth will be able to contain

50 lowast 10minus3 s lowast 45 lowast 106 bitss = 225 lowast 106 bits (25)

If more bandwidth is requested just adding more pipes the problem is solved

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to

another in a given time This concept is used to measure the performance or efficiency

of hard drives RAM and networks The throughput can be calculated with the next

formula

Throughput = TransferSize TransferT ime (26)

TransferT ime = RTT + 1Bandwidth lowast TransferSize (27)

Where RTT is the round trip time

Throughput and bandwidth can be sometimes confusing terms Bandwidth refers to

the number of bits per second that can be transmitted in practice However due to

inefficiencies of implementation or errors a couple of nodes connected in the network

with a bandwidth of 10 Mbps will usually have a throughput much lower (for instance 2

Mbps) so that the data can be sent at 2 Mbps at the most

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project

28 Related work

231 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool to control attacks against SSLTLS network connections These

connections are intercepted and redirected to SSLsplit This tool may end SSLTLS and

launch a new SSLTLS connection with the same receiver address The goal of this tool

is to be helpful to test and analyze networks This tool can work with TCP SSL HTTP

and HTTPS connections over IPv4 and IPv6

Wireshark

Wireshark [24] is a powerful network packet analyzer with a high number of functions

This tool can capture datagrams and show in detail everything that the packet carries

Overall the aim of using wireshark is to solve and manage network problems examine

security problems remove errors in protocol implementations This program displays

the characteristics of the packets in great detail splitting them up in different layers

With this program users can see easily a list with captured packets running in real time

the details of a selected packet and the packet content in hexadecimal and ASCII In

addition it is also possible to filter the datagrams in order to make easier the search for

the packets which makes wireshark very manageable

Tcpdump

Tcpdump [25] is a tool to analyze packets that are going over the network Some

reasons why it is interesting to use tcpdump are verify connectivity between hosts and

look into the traffic network This tool also allows us to pick out particular kinds of

traffic depending on the header information Moreover it is possible to save all the traffic

captured in a file in order to be used in a future analysis These tcpdump files can be also

opened with software like wireshark Moreover tcpdump provides many instructions to

capture packets in different ways which give us a broad range of possibilities to manage

the traffic

Proxy

Proxy [26] is a server used as a gateway between a local network and another much

wider network A proxy is located in the middle of the communication between sender

and receiver The proxy receives the incoming data from one port and it forwards this

information to the rest of the network by another port Proxies may cache web sites

23 Tools strongly associated with this thesis 29

This happens each time a user from a local network asks for some URL The proxy that

receives this request will store a temporary copy of the URL The next time that a user

asks for the same web site the proxy can send the cached copy to the user instead of

forwarding the request to the network to find again the URL We can see this process in

the picture below where the proxy asks for each web site only once An example of how

a proxy works and handle the incoming requests is shown in the Figure 214

Figure 214 Proxy operation

In this way proxies can make much faster the delivery of packets within the network

but this is not the only function they cover They may also be used to avoid that hackers

get internal addresses since these proxies can block the access between two networks

Proxies can take part as a component of a firewall

232 Programming language

Several programming languages can be use for network programming Python [27] is one

of the most important and provides a library called Boto which could be very helpful

for this thesis

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web

Services (AWS) To use Boto is required to provide the Access Key and Secret Key which

we can either give manually in every connection or add in the boto file In addition it is

necessary to create connection objects before creating a machine These machines provide

a stable and secure execution environment to run applications Then main fields in which

30 Related work

Boto is involved are computer database deployment application services monitoring

storage and so on

233 Operating Systems

There are several sort of operating systems such as Microsoft Windows Linux and Mac

OS However the opportunities and ease to manage network tools are not the same in

all of them We believe that for the development of this thesis Linux would be more

suitable

Linux

Linux [29] is a computer operation system created by volunteers and employees of

many companies and organizations from every parts of the world in order to make of this

product free software The main advantages of Linux are low cost stability performance

network functionality security and so on This operating system very seldom freezes up

or slows down It can also provide high performance and support for networks where

client and server systems can be set up easily and quickly on a computer with Linux

It is very secure as well since Linux asks the user for the permissions Nowadays this

operating system is used more and more in both homes and companies due to all its

functionalities Linux offers many network applications so it could be very useful for this

thesis

We have described in this chapter many issues about networks which are crucial in

the next sections It is important to have a deep knowledge about this matter because

it is needed when it comes to analyze and recreate traffic network later on

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations We

started with a simple example M2M and we ended up adding a proxy in between and

simulating several clients These scenarios were analyzed to acquire a deep knowledge

about this framework in order to extract the pattern properly later on

31 Client Server Application

At this point after describing the related work of this thesis we are ready to develop

a client-server application with python [27] The goal of this part was to analyze the

traffic in a very simple case between single client and server This is an easy way to

start setting up a connection and design a methodology with tools for developing a larger

scale testing later The structure of this connection is shown in the Figure 31

Figure 31 Structure client server

The tool chosen to program is called python This is a high level programming language

very recommendable for network programming due to its ease of handling in this field

31

32 Traffic Test

When it comes to program the client for this application it was needed to set the

server Inet address and a random port for the exchange of data It was also necessary

to create a socket and connect it to the address and through the port mentioned before

In addition to program the server is required to set the hostname and the same port

opened in the client Moreover we have to create a socket and bind both hostname and

port to the socket Finally we made the socket wait for incoming packets from the client

and accept the connection

In the List 31 the required packets to establish a client-server connection are shown

Listing 31 Establish connection

1 rdquo0 6 65 3 17 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquoTCPrdquo rdquo74rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460

SACK PERM=1 TSval =4769150 TSecr=0 WS=64

2 rdquo0 6 69 7 36 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquoTCPrdquo rdquo66rdquo EtherNetminusIPminus1 gt rdquo49588rdquo [SYN ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS

=1452 WS=4 SACK PERM=1

3 rdquo0 6 69 7 66 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquoTCPrdquo rdquo54rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [ACK] Seq=1 Ack=1 Win=5888 Len=0

This exchange of packets is called the three way handshake Analyzing these segments

we can see clearly how the flags and the sequence and acknowledgement number have

the expected value The client sends a message with the flag SYN set to 1 in order to

connect to the server and with a random sequence number x Wireshark set by default

a relative sequence number starting with zero The answer of the server has the flags

SYN and ACK activated and with sequence number y and acknowledgement number

x+1 Finally a third packet is sent from the client only with the flag ACK set to 1 and

acknowledgement number y+1

When it comes to terminate the connection a packet with the flags FIN and ACK

activated is sent from the point where is wanted to close the connection Then there

is an ACK segment as a response This exchange must happened in both directions to

close the connection from both points otherwise only one point would be closed and the

other one could still send data These two packets are set out in the List 32

Listing 32 Terminate connection

1 rdquo0 6 71 9 45 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquoTCPrdquo rdquo60rdquo EtherNetminusIPminus1 gt 49588 [ FIN ACK] Seq=1 Ack=1 Win=182952 Len=0

32 Loading test with proxy 33

2 rdquo0 6 72 2 51 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 16 8 1 33 rdquo rdquoTCPrdquo rdquo54rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [ACK] Seq=1 Ack=2 Win=5888 Len=0

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

32 Loading test with proxy

In this section we start with the first part of the method which involves the creation of

the client-proxy-server scenario to run simulations with TCP sockets The connection is

set up with a proxy Squid in between [30] The structure is shown is the figure 32 After

setting up this connection we sent traffic in order to analyze the segments sent measure

the performance and extract a traffic pattern

Figure 32 Structure client proxy server

A proxy has been set up between the communication client-server to capture and

analyze the traffic so that we can recreate the pattern of communications and make

realistic loads towards the server In the beginning was needed to access to the instances

This is done through the port 22 and there are two main reasons to do so Firstly we

had to configure the proxy to accept the incoming packets and forward them properly

And secondly to run scripts in the instance it was necessary to access there and install

the required libraries to use that script Moreover some programs such as Tcpdump or

Wireshark were installed in the proxy instance to sniff the traffic When the proxy was

ready we could move on to write the script to create the scenario and make simulations

Several simulations were carried out with different types of instances [9] The sort of

ec2 instance matters regarding memory but also speed which is important in these tests

It is advisable to use a high performance instance for the proxy and the server in order

to handle all the packets quickly Especially in later tests when there are several data

34 Traffic Test

sources In order to develop these simulations we programmed the script Simulationpy

with boto so that the tests would be done automatically This script creates a scenario

comprised of in the simplest case three instances which are data source proxy and

server This script gives also the possibility of picking out the type of instance used for

the simulation Moreover after starting the instances the script set and initialized the

required server data sources and proxy Both server and data source were programmed

also with python due to its ease to develop anything related with networks

The goal of the data source is to send TCP packets towards the server always go-

ing through the proxy The server must answer to those packets creating a normal

connection Obviously before the exchange of data began the data source established

connection sending packets with the flag SYN set to 1 This is just done once in the

whole communication

When the packets were analyzed in the proxy it was possible to see how a TCP segment

with the flag SYN was sent towards the proxy Then another TCP packet arrived to the

data source This segment is the response from the proxy with the flags SYN and ACK

set to 1 This indicates the connection is established and the system is ready to exchange

information Finally the data source answers sending another packet to acknowledge the

previous packet This is shown in the list 33 and is called 3 way handshake [31]

Listing 33 Establishing data source-proxy connection

rdquo1rdquo rdquo0 000000rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo74rdquo rdquo45125 gt

ndlminusaas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK PERM=1

TSval=4294898611 TSecr=0 WS=16rdquo

rdquo2rdquo rdquo0 000054rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo74rdquo rdquo ndlminusaas

gt 45125 [SYN ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460

SACK PERM=1 TSval=4294908735 TSecr =4294898611 WS=128rdquo

rdquo3rdquo rdquo0 000833rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125 gt

ndlminusaas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164

TSecr =4294908735rdquo

When this connection is established the data source sends a HTTP packet to the

proxy indicating the DNS server address Then the proxy looks for the IP address of

that server sending DNS packets We can see this in the list 34

Listing 34 Searching server IP address

rdquo4rdquo rdquo0 000859rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoHTTPrdquo rdquo197rdquo rdquo

CONNECT ec2minus54minus228minus99minus43euminuswest minus1compute amazonaws com

50007 HTTP11 rdquo

32 Loading test with proxy 35

rdquo6rdquo rdquo0 001390rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0xb33a AAAA ec2minus54minus228minus99minus43euminuswest minus1

compute amazonaws comrdquo

rdquo7rdquo rdquo0 002600rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo166rdquo rdquo

Standard query response 0xb33a rdquo

rdquo8rdquo rdquo0 002769rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0 xa3f9 A ec2minus54minus228minus99minus43euminuswest minus1compute

amazonaws comrdquo

rdquo9rdquo rdquo0 003708rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo124rdquo rdquo

Standard query response 0 xa3f9 A 1 0 2 2 4 8 3 2 1 rdquo

Finally the proxy sends also a packet with the flag SYN activated to set up the com-

munication between them two In this way the whole communication data source-proxy-

server is ready to work This exchange of packets is shown in the list 35

Listing 35 Establishing proxy-server connection

rdquo10rdquo rdquo0 003785rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo74rdquo rdquo33271

gt 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK PERM=1

TSval=4294908736 TSecr=0 WS=128rdquo

rdquo11rdquo rdquo0 438963rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo74rdquo rdquo50007

gt 33271 [SYN ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460

SACK PERM=1 TSval=4294910381 TSecr =4294908736 WS=16rdquo

rdquo12rdquo rdquo0 439029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845

TSecr =4294910381rdquo

Then a HTTP10 200 OK connection established gets to the data source Therefore

now the connection is ready to start sending data In these simulations was decided to

send data from time to time with random time periods This makes the simulations be

more realistic since normally it is difficult to know when a client is going to communicate

with a server

The eight packets which compose the exchange of data between data source and server

are shown in the list 36

Listing 36 Exchange of data source-proxy-server

rdquo15rdquo rdquo0 466800rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo45125

36 Traffic Test

gt ndlminusaas [PSH ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval

=4294900280 TSecr =4294908845rdquo

rdquo16rdquo rdquo0 466813rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo66rdquo rdquo ndlminusaas gt 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval

=4294908852 TSecr =4294900280rdquo

rdquo17rdquo rdquo0 466975rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo71rdquo rdquo33271

gt 50007 [PSH ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval

=4294908852 TSecr =4294910381rdquo

rdquo18rdquo rdquo0 467901rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo50007

gt 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389

TSecr =4294908852rdquo

rdquo19rdquo rdquo0 468018rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo50007

gt 33271 [PSH ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval

=4294910389 TSecr =4294908852rdquo

rdquo20rdquo rdquo0 468029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852

TSecr =4294910389rdquo

rdquo21rdquo rdquo0 468083rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo71rdquo rdquo ndlminusaas gt 45125 [PSH ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval

=4294908852 TSecr =4294900280rdquo

rdquo22rdquo rdquo0 508799rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125

gt ndlminusaas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval

=4294900291 TSecr =4294908852rdquo

In this list 36 the packets with the PSH flag set to 1 denote that there is data being

sent in that segment [32] In these simulations the data source sent packets with data to

the server which simultaneously replayed with the same data to the data source Every

packet in the list 36 with the flag PSH activated is sending data First from data source

to proxy which forwards everything to the server And then all the way around sending

the data from server to data source

To test the performance of the scenario created many simulations were carried out

with different type of instances number of data sources and amount of data Each data

source was set in different instances and the number was scaled up from one up to ten

The network was firstly tested with a traffic load based on the sending of 1980 bytes of

data and later with a heavier load of 5940 bytes of data These loads were sent up to

32 Loading test with proxy 37

200 times with a random waiting time between them of either 1 or 2 seconds

The first simulations were carried out with only one data source In this section we

show the graphs with the number of bytes going through the proxy just with one client

connected The Figure 33 represents the number of bytes over time in the proxy in the

simulation with 1980 bytes of data Furthermore the Figure 34 represents the other

simulation with heavier traffic load The type of instance used is the same in both

examples

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected the average of bytes in the Figure 34 is approximately three times bigger

than in Figure 33 This makes sense since the data sent is three times bigger as well

38 Traffic Test

therefore is needed around triple number of packets Other issue to point out is that the

Figure 34 is smoother than the other one This is not only due to the effect of the scale

in the graph but also because the frequency and amount of segments being sent in the

second case is bigger

33 Loading test with several clients

After the simulations with one client it was time to test the server harder A realistic

way to do so is simulating the connection of several clients To do so we created a similar

environment but in this case with a variable amount of data sources All this scenario is

created with a python script as the environment used previously for one client At this

point the server was tested with up to ten data sources The scheme is shown in the

Figure 35 Using the Amazon cloud it is possible to use instances setting one client in

each instance Therefore proxy receives packets from different IP addresses as would be

in a real case

Figure 35 Structure for simulation

33 Loading test with several clients 39

The next two graphs represent the same two kinds of simulation developed in the

previous section but in this case with ten data sources the maximum number tested

The Figure 36 shows the number of bytes going over the proxy with data burst of 1980

bytes as well as Figure 37 does with data burst of 5940 bytes

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes

40 Traffic Test

The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all, we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 38, packets were being sent to the server from three different instances; however, in Figure 39 there were up to ten data sources working.

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 38. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.


Figure 39 Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 31 and 32 we do not see large differences. Moreover, the lower the row in the table, the shorter the RTT should be. However, this does not apply in every case, therefore the type of instance is not very significant in these cases. The simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type    1 source    3 sources    5 sources    10 sources
t1micro                 0.0031      0.0046       0.0033       0.0039
m1large                 0.0037      0.0035       0.0038       0.0032
c1medium                0.0031      0.0035       0.0051       0.0048
c1xlarge                0.0039      0.0043       0.0037       0.0042

Table 31 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 33 we have the average number of TCP packets which have been retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1micro) seems to have more difficulties in the communication.


Server instance type    1 Source    3 Sources    5 Sources    10 Sources
t1micro                 0.0026      0.0022       0.0021       0.0029
m1large                 0.0026      0.0024       0.0028       0.0024
c1medium                0.0028      0.0031       0.0025       0.0030
c1xlarge                0.0026      0.0029       0.0029       0.0024

Table 32 RTT with data bursts of 5940 bytes

With this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning resent packets.

Table 34 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with more difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance. Here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1micro one. Nevertheless, there is not a very significant gap among the other three instances (m1large, c1medium, c1xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 35 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1xlarge instance, unlike with t1micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1micro                 1980 bytes    0           0            0            0
                        5940 bytes    1.5         0            2            2
m1large                 1980 bytes    0           0            0            0
                        5940 bytes    0           2.5          0            0
c1medium                1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1xlarge                1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            2.5

Table 33 Number of TCP retransmissions

Overall in this chapter the traffic pattern was analyzed and the network was tested


Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1micro                 1980 bytes    0           0            0            0
                        5940 bytes    2           6            15           67
m1large                 1980 bytes    0           0            0            0
                        5940 bytes    0           5.5          1            36
c1medium                1980 bytes    0           0            0            2
                        5940 bytes    0           7            13.5         50.5
c1xlarge                1980 bytes    0           0            0            5
                        5940 bytes    0.5         5            9            54.5

Table 34 Number of lost packets

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1micro                 1980 bytes    0           1            0            6.5
                        5940 bytes    3           1            7.5          2.5
m1large                 1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1medium                1980 bytes    0           0            0            4.5
                        5940 bytes    0           0            2.5          2.5
c1xlarge                1980 bytes    0.5         0            0            0.5
                        5940 bytes    0.5         0            0            0

Table 35 Number of duplicate ACK

to find out the influence of diverse factors on its performance. We have seen that the RTT varies just a little in the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance. This was more noticeable when the number of data sources was high. The c1xlarge instance solved a large part of the packet loss problem compared with the t1micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. It was necessary to find a method for developing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications describing how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: this required packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets had to have a similar frequency, and the length of time from one packet to the next had to be as similar as possible to the capture we wanted to replay. Finally, to create realistic network traffic it was important to send the same number of packets [33].

41 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a script in Python made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the same way the data source sent packets in the simulations, and therefore this method was much better for recreating the traffic pattern obtained.


Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which were meaningless for recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
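The original Extractpattern.py is not reproduced here, but a minimal sketch of the extraction logic it implements, written with dpkt, could look as follows; the client address, the burst gap and the output format are assumptions made for the example.

import dpkt
import pickle
import socket

CLIENT_IP = socket.inet_aton("10.34.252.34")   # data source address (example value)
BURST_GAP = 0.5                                # seconds of silence assumed to separate two bursts

def extract_pattern(pcap_path, out_path):
    """Collect (timestamp, payload) tuples, one per data burst sent by the client."""
    bursts = []
    current_data, first_ts, last_ts = b"", None, None
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            # Keep only data segments coming from the data source ...
            if ip.src != CLIENT_IP or len(tcp.data) == 0:
                continue
            # ... and drop the proxy set-up request, which is not part of the M2M pattern
            if tcp.data.startswith(b"CONNECT"):
                continue
            if last_ts is not None and ts - last_ts > BURST_GAP and current_data:
                bursts.append((first_ts, current_data))
                current_data, first_ts = b"", None
            if first_ts is None:
                first_ts = ts
            current_data += tcp.data
            last_ts = ts
    if current_data:
        bursts.append((first_ts, current_data))
    with open(out_path, "wb") as out:
        pickle.dump(bursts, out)          # consumed later by the replay script
    return bursts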

42 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves in a file the information gathered from each data burst together with its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, this program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, like in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets in an accurate manner. It must be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
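A minimal sketch of this replay logic, in the spirit of Replaytraffic.py but not the original code: it loads the bursts saved by the extraction step and pushes each one to the server over a plain TCP socket, sleeping until the recorded offset of that burst is due so that the original timing is respected. The server address and file name are placeholders.

import pickle
import socket
import time

SERVER = ("ec2-54-228-99-43.eu-west-1.compute.amazonaws.com", 50007)  # placeholder address

def replay(pattern_file):
    with open(pattern_file, "rb") as f:
        bursts = pickle.load(f)          # list of (timestamp, payload) from the extraction step
    if not bursts:
        return
    capture_start = bursts[0][0]
    sock = socket.create_connection(SERVER)
    replay_start = time.time()
    for ts, payload in bursts:
        # Sleep until this burst is due, preserving the original inter-burst gaps
        delay = (ts - capture_start) - (time.time() - replay_start)
        if delay > 0:
            time.sleep(delay)
        sock.sendall(payload)
        sock.recv(65536)                 # the test server echoes the data back; read the reply
    sock.close()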

When replaying traffic M2M it was important to recreate the same traffic load as in the original capture. With this approach we could compare them to draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from data source to proxy. These very data were replayed twice M2M with the second script, so that over the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy allows receiving the same data from the server as well, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 41. The figure on the left shows the traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the strategy mentioned before, M2M. As we can see, the amount of data and the number of packets is the same.

The results of following this strategy are shown in the Figure 42 In the graphs we


Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in Figure 42, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not get worse and worse.


Therefore this method to recreate the traffic is very accurate regardless of the amount of time the simulation lasts. Only at a few points do the graphs not match perfectly. This is due to the fact that the server response is something that we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration. Therefore this approach to extract the traffic pattern is very accurate.

In this chapter we explained the method to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 42, comparing the graph obtained in the simulation with another one achieved by recreating the pattern obtained. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern on a large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

51 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances where to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances in order to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulations the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. And finally, we look into the breaking point to figure out the server capacity.

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically, in a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of players was increased one at a time, every five seconds.
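The thesis does not list the multiplier code, but the ramp-up just described can be sketched as follows, reusing the replay() function from the previous chapter: a new "player" replaying the same extracted pattern is started every five seconds until the chosen number of data sources is reached. Threads are used here for brevity, whereas the real tests used one EC2 instance per client, and the module name is assumed.

import threading
import time

from replaytraffic import replay    # the replay() sketch from the previous chapter (assumed module name)

NUM_PLAYERS = 80        # scale up to 80 simulated clients
RAMP_INTERVAL = 5       # seconds between the start of two consecutive players

def start_players(pattern_file):
    players = []
    for i in range(NUM_PLAYERS):
        t = threading.Thread(target=replay, args=(pattern_file,), name="player-%d" % i)
        t.start()
        players.append(t)
        time.sleep(RAMP_INTERVAL)    # stagger the players as in the recreations
    for t in players:
        t.join()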

Figure 51 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 51 Number of bytes over time in different tests

Figure 52 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 51. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients problems sending data appear very soon. After 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should go up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster; this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 52 Bytes using a m1large instance for the server

Now we can compare Figure 52 with the results achieved by running the same tests but with a higher quality instance for the server. We used the type c1xlarge. If we look over Figure 53 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent to those in Figure 52. However, the recreation with 80 clients goes much higher using the best instance; the gap between the graphs is about three times larger in Figure 53. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyzing RTT, as we can see in Figure 54, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although there are no peaks standing out, the average is considerably higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 54 Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 51. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1micro                 0           0.011        0.044         0.091         0.128
m1large                 0           0.027        0.053         0.128         0.154
c1medium                0.007       0            0.039         0.076         0.085
c1xlarge                0.007       0.004        0.067         0.120         0.125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we achieved some results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 51 gave us significant information about the sending of packets. This table informs about the risk, in terms of packet delivery, of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

61 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicate ACKs, had more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we highly increase the number of data sources. It has also been proven that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, this RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

62 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1 Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2 Test different types of servers, for instance an HTTP server.

3 Work out test costs before starting.

4 Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53


List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


Figure 213 Relation between Latency and Bandwidth

Mbps of bandwidth will be able to contain

50 × 10^-3 s × 45 × 10^6 bits/s = 2.25 × 10^6 bits    (25)

If more bandwidth is requested, the problem is solved just by adding more pipes.

225 Throughput

Throughput [15] is defined as the amount of data that can be sent from one host to another in a given time. This concept is used to measure the performance or efficiency of hard drives, RAM and networks. The throughput can be calculated with the following formulas:

Throughput = TransferSize / TransferTime    (26)

TransferTime = RTT + (1 / Bandwidth) × TransferSize    (27)

where RTT is the round trip time.

Throughput and bandwidth can sometimes be confusing terms. Bandwidth refers to the number of bits per second that can in theory be transmitted; however, due to inefficiencies of implementation or errors, a couple of nodes connected in the network with a bandwidth of 10 Mbps will usually have a much lower throughput (for instance 2 Mbps), so that the data can be sent at 2 Mbps at the most.
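To make formulas (26) and (27) concrete, here is a small worked example with assumed values: 1 MB of data (8 × 10^6 bits), 50 ms of RTT and 45 Mbps of bandwidth (the figures are only illustrative).

TransferTime = 0.050 s + (1 / (45 × 10^6 bits/s)) × 8 × 10^6 bits ≈ 0.050 s + 0.178 s = 0.228 s

Throughput = 8 × 10^6 bits / 0.228 s ≈ 35 × 10^6 bits/s ≈ 35 Mbps

So even on an otherwise idle 45 Mbps link, the effective throughput of this transfer is only about 35 Mbps, because the round trip time is paid before the data finishes arriving.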

23 Tools strongly associated with this thesis

We shall briefly describe a variety of tools which might be useful to develop this project


231 Network Tools

In this section we describe some tools and applications related with the computer network

management

SSLsplit

SSLsplit [23] is a tool for carrying out controlled attacks against SSL/TLS network connections. These connections are intercepted and redirected to SSLsplit. The tool may terminate the SSL/TLS connection and launch a new SSL/TLS connection to the original receiver address. The goal of this tool is to help test and analyze networks. It can work with TCP, SSL, HTTP and HTTPS connections over IPv4 and IPv6.

Wireshark

Wireshark [24] is a powerful network packet analyzer with a high number of functions. This tool can capture datagrams and show in detail everything that a packet carries. Overall, the aim of using Wireshark is to solve and manage network problems, examine security problems and remove errors in protocol implementations. This program displays the characteristics of the packets in great detail, splitting them up into the different layers. With this program users can easily see a list of the captured packets updated in real time, the details of a selected packet, and the packet content in hexadecimal and ASCII. In addition, it is also possible to filter the datagrams in order to make the search for packets easier, which makes Wireshark very manageable.

Tcpdump

Tcpdump [25] is a tool to analyze the packets that are going over the network. Some reasons why it is interesting to use tcpdump are to verify connectivity between hosts and to look into the network traffic. This tool also allows us to pick out particular kinds of traffic depending on the header information. Moreover, it is possible to save all the captured traffic in a file in order to use it in a future analysis. These tcpdump files can also be opened with software like Wireshark. Moreover, tcpdump provides many options to capture packets in different ways, which gives us a broad range of possibilities to manage the traffic.
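As an illustration (not taken from the thesis scripts), a capture like the ones analyzed in later chapters could be started on the proxy instance with an invocation along the following lines, wrapped here in Python for consistency with the rest of the tooling; the interface name, file name and port filter are assumptions.

import subprocess

# Start tcpdump on eth0, writing every packet that involves the application port
# to a pcap file that can later be opened with Wireshark or parsed with dpkt.
capture = subprocess.Popen(
    ["tcpdump", "-i", "eth0", "-w", "simulation.pcap", "tcp port 50007"])

# ... run the simulation while the capture is active ...

capture.terminate()   # stop the capture once the simulation is over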

Proxy

A proxy [26] is a server used as a gateway between a local network and another, much wider network. A proxy is located in the middle of the communication between sender and receiver. The proxy receives the incoming data on one port and forwards this information to the rest of the network through another port. Proxies may cache web sites. This happens each time a user from a local network asks for some URL: the proxy that receives this request will store a temporary copy of the URL. The next time a user asks for the same web site, the proxy can send the cached copy to the user instead of forwarding the request to the network to find the URL again. We can see this process in the picture below, where the proxy asks for each web site only once. An example of how a proxy works and handles the incoming requests is shown in Figure 214.

Figure 214 Proxy operation

In this way proxies can make the delivery of packets within the network much faster, but this is not their only function. They may also be used to prevent hackers from getting internal addresses, since proxies can block the access between two networks. Proxies can also take part as a component of a firewall.

232 Programming language

Several programming languages can be used for network programming. Python [27] is one of the most important and provides a library called Boto, which could be very helpful for this thesis.

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web Services (AWS). To use Boto it is required to provide the Access Key and Secret Key, which we can either give manually in every connection or add to the boto configuration file. In addition, it is necessary to create connection objects before creating a machine. These machines provide a stable and secure execution environment to run applications. The main fields in which Boto is involved are compute, databases, deployment, application services, monitoring, storage and so on.
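A minimal sketch of how this is typically done with the boto 2.x API that was current at the time; the region, AMI id, key pair and security group are placeholders, and the credentials are assumed to be in the boto configuration file.

import time
import boto.ec2

# Credentials can also be passed explicitly with aws_access_key_id / aws_secret_access_key
conn = boto.ec2.connect_to_region("eu-west-1")

# Launch one machine that will act, for example, as the proxy
reservation = conn.run_instances(
    "ami-12345678",            # placeholder AMI id
    instance_type="t1.micro",
    key_name="my-keypair",     # placeholder key pair
    security_groups=["default"])
instance = reservation.instances[0]

# Wait until the instance is running, then print the address used to reach it over SSH
while instance.update() != "running":
    time.sleep(5)
print(instance.public_dns_name)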

233 Operating Systems

There are several sorts of operating systems, such as Microsoft Windows, Linux and Mac OS. However, the opportunities and the ease of managing network tools are not the same in all of them. We believe that for the development of this thesis Linux is the most suitable.

Linux

Linux [29] is a computer operating system created by volunteers and employees of many companies and organizations from all parts of the world in order to make this product free software. The main advantages of Linux are low cost, stability, performance, network functionality, security and so on. This operating system very seldom freezes up or slows down. It can also provide high performance and support for networks, where client and server systems can be set up easily and quickly on a computer running Linux. It is very secure as well, since Linux asks the user for permissions. Nowadays this operating system is used more and more in both homes and companies due to all its functionalities. Linux offers many network applications, so it could be very useful for this thesis.

We have described in this chapter many issues about networks which are crucial in the next sections. It is important to have a deep knowledge about this matter, because it is needed when it comes to analyzing and recreating network traffic later on.

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations. We started with a simple M2M example, and we ended up adding a proxy in between and simulating several clients. These scenarios were analyzed to acquire a deep knowledge about this framework in order to extract the pattern properly later on.

31 Client Server Application

At this point, after describing the related work of this thesis, we are ready to develop a client-server application with Python [27]. The goal of this part was to analyze the traffic in a very simple case, between a single client and server. This is an easy way to start setting up a connection and to design a methodology with tools for developing larger scale testing later. The structure of this connection is shown in Figure 31.

Figure 31 Structure client server

The programming language chosen is Python. This is a high-level language, very suitable for network programming due to its ease of handling in this field.


When it comes to programming the client for this application, it was necessary to set the server Inet address and a random port for the exchange of data. It was also necessary to create a socket and connect it to the address and port mentioned before. In addition, to program the server it is required to set the hostname and the same port opened in the client. Moreover, we have to create a socket and bind both hostname and port to it. Finally, we make the socket wait for incoming packets from the client and accept the connection.
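The following is a minimal sketch of such a pair of programs (not the exact scripts used in the thesis); the address and port are examples, and here the server simply echoes back what it receives.

import socket

HOST, PORT = "192.168.1.33", 50007   # example address and port

def run_server():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((HOST, PORT))        # bind hostname and port to the socket
    server.listen(1)                 # wait for an incoming connection
    conn, addr = server.accept()     # accept the connection from the client
    data = conn.recv(1024)
    conn.sendall(data)               # echo the data back to the client
    conn.close()
    server.close()

def run_client():
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect((HOST, PORT))     # connect to the server address through the chosen port
    client.sendall(b"hello")
    reply = client.recv(1024)
    client.close()
    return reply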

In Listing 31 the required packets to establish a client-server connection are shown.

Listing 31 Establish connection

1  "0.665317"  "192.168.1.24"  "192.168.1.33"  "TCP"  "74"  "49588 > EtherNet-IP-1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSval=4769150 TSecr=0 WS=64"
2  "0.669736"  "192.168.1.33"  "192.168.1.24"  "TCP"  "66"  "EtherNet-IP-1 > 49588 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1452 WS=4 SACK_PERM=1"
3  "0.669766"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=1 Win=5888 Len=0"

This exchange of packets is called the three-way handshake. Analyzing these segments, we can see clearly how the flags and the sequence and acknowledgement numbers have the expected values. The client sends a message with the SYN flag set to 1 in order to connect to the server, with a random sequence number x. Wireshark by default shows a relative sequence number starting at zero. The answer of the server has the SYN and ACK flags activated, with sequence number y and acknowledgement number x+1. Finally, a third packet is sent from the client, only with the ACK flag set to 1 and acknowledgement number y+1.

When it comes to terminating the connection, a packet with the FIN and ACK flags activated is sent from the endpoint that wants to close the connection. Then there is an ACK segment as a response. This exchange must happen in both directions to close the connection at both endpoints; otherwise only one endpoint would be closed and the other one could still send data. These two packets are set out in Listing 32.

Listing 32 Terminate connection

1  "0.671945"  "192.168.1.33"  "192.168.1.24"  "TCP"  "60"  "EtherNet-IP-1 > 49588 [FIN, ACK] Seq=1 Ack=1 Win=182952 Len=0"
2  "0.672251"  "192.168.1.24"  "192.168.1.33"  "TCP"  "54"  "49588 > EtherNet-IP-1 [ACK] Seq=1 Ack=2 Win=5888 Len=0"

In this section we made a simple test, establishing and terminating a connection between client and server and checking the packets going through the network. This is a simple example to start looking into the behaviour of the segments between client and server.

32 Loading test with proxy

In this section we start with the first part of the method, which involves the creation of the client-proxy-server scenario to run simulations with TCP sockets. The connection is set up with a Squid proxy in between [30]. The structure is shown in Figure 32. After setting up this connection, we sent traffic in order to analyze the segments sent, measure the performance and extract a traffic pattern.

Figure 32 Structure client proxy server

A proxy has been set up in the middle of the client-server communication to capture and analyze the traffic, so that we can recreate the pattern of the communications and create realistic loads towards the server. In the beginning it was necessary to access the instances. This is done through port 22, and there are two main reasons to do so. Firstly, we had to configure the proxy to accept the incoming packets and forward them properly. And secondly, to run scripts in an instance it was necessary to access it and install the required libraries. Moreover, some programs such as Tcpdump and Wireshark were installed in the proxy instance to sniff the traffic. When the proxy was ready we could move on to writing the script that creates the scenario and runs the simulations.

Several simulations were carried out with different types of instances [9]. The sort of EC2 instance matters regarding memory but also speed, which is important in these tests. It is advisable to use a high performance instance for the proxy and the server in order to handle all the packets quickly, especially in the later tests where there are several data sources. In order to develop these simulations we programmed the script Simulation.py with boto, so that the tests could be run automatically. This script creates a scenario comprised of, in the simplest case, three instances: data source, proxy and server. The script also gives the possibility of picking out the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both server and data source were also programmed with Python, due to its ease for developing anything related with networks.

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data began, the data source established the connection by sending packets with the SYN flag set to 1. This is done just once in the whole communication.

When the packets were analyzed in the proxy, it was possible to see how a TCP segment with the SYN flag was sent towards the proxy. Then another TCP packet arrived at the data source. This segment is the response from the proxy, with the SYN and ACK flags set to 1. This indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 33 and is called the 3-way handshake [31].

Listing 33 Establishing data source-proxy connection

"1"  "0.000000"  "10.34.252.34"  "10.235.11.67"  "TCP"  "74"  "45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2"  "0.000054"  "10.235.11.67"  "10.34.252.34"  "TCP"  "74"  "ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3"  "0.000833"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the address of the destination server. Then the proxy looks up the IP address of that server by sending DNS queries. We can see this in Listing 34.

Listing 34 Searching server IP address

"4"  "0.000859"  "10.34.252.34"  "10.235.11.67"  "HTTP"  "197"  "CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6"  "0.001390"  "10.235.11.67"  "172.16.0.23"   "DNS"   "108"  "Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7"  "0.002600"  "172.16.0.23"   "10.235.11.67"  "DNS"   "166"  "Standard query response 0xb33a"
"8"  "0.002769"  "10.235.11.67"  "172.16.0.23"   "DNS"   "108"  "Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9"  "0.003708"  "172.16.0.23"   "10.235.11.67"  "DNS"   "124"  "Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag activated to set up the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 35.

Listing 35 Establishing proxy-server connection

"10"  "0.003785"  "10.235.11.67"  "10.224.83.21"  "TCP"  "74"  "33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11"  "0.438963"  "10.224.83.21"  "10.235.11.67"  "TCP"  "74"  "50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12"  "0.439029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an HTTP/1.0 200 OK (connection established) response reaches the data source. Therefore the connection is now ready to start sending data. In these simulations it was decided to send data from time to time, with random time periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.
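A minimal sketch of this client behaviour (tunnelling through the proxy with an HTTP CONNECT request and then pushing data bursts at random intervals) could look as follows; the addresses, ports, burst size and number of repetitions are examples rather than the original Client.py.

import random
import socket
import time

PROXY = ("10.235.11.67", 3128)                                       # Squid proxy address and port (examples)
TARGET = "ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007"    # final server (example)

def run_data_source(burst=b"x" * 1980, repetitions=200):
    sock = socket.create_connection(PROXY)
    # Ask the proxy to open a tunnel towards the server; it answers with "HTTP/1.0 200 ..."
    sock.sendall(("CONNECT %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (TARGET, TARGET)).encode())
    sock.recv(4096)
    for _ in range(repetitions):
        sock.sendall(burst)               # data burst forwarded by the proxy to the server
        sock.recv(4096)                   # the server echoes the same data back
        time.sleep(random.randint(1, 2))  # random waiting time of 1 or 2 seconds
    sock.close()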

The eight packets which compose the exchange of data between data source and server are shown in Listing 36.

Listing 36 Exchange of data source-proxy-server

"15"  "0.466800"  "10.34.252.34"  "10.235.11.67"  "TCP"  "71"  "45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16"  "0.466813"  "10.235.11.67"  "10.34.252.34"  "TCP"  "66"  "ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17"  "0.466975"  "10.235.11.67"  "10.224.83.21"  "TCP"  "71"  "33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18"  "0.467901"  "10.224.83.21"  "10.235.11.67"  "TCP"  "66"  "50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19"  "0.468018"  "10.224.83.21"  "10.235.11.67"  "TCP"  "71"  "50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20"  "0.468029"  "10.235.11.67"  "10.224.83.21"  "TCP"  "66"  "33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21"  "0.468083"  "10.235.11.67"  "10.34.252.34"  "TCP"  "71"  "ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22"  "0.508799"  "10.34.252.34"  "10.235.11.67"  "TCP"  "66"  "45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 36, the packets with the PSH flag set to 1 denote that there is data being sent in that segment [32]. In these simulations the data source sent packets with data to the server, which simultaneously replied with the same data to the data source. Every packet in Listing 36 with the PSH flag activated is carrying data: first from data source to proxy, which forwards everything to the server, and then all the way around, sending the data from server back to data source.

To test the performance of the scenario created, many simulations were carried out with different types of instances, numbers of data sources and amounts of data. Each data source was set in a different instance, and their number was scaled up from one up to ten. The network was first tested with a traffic load based on the sending of 1980 bytes of data, and later with a heavier load of 5940 bytes of data. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source In this section we

show the graphs with the number of bytes going through the proxy just with one client

connected The Figure 33 represents the number of bytes over time in the proxy in the

simulation with 1980 bytes of data Furthermore the Figure 34 represents the other

simulation with heavier traffic load The type of instance used is the same in both

examples

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected the average of bytes in the Figure 34 is approximately three times bigger

than in Figure 33 This makes sense since the data sent is three times bigger as well

38 Traffic Test

therefore is needed around triple number of packets Other issue to point out is that the

Figure 34 is smoother than the other one This is not only due to the effect of the scale

in the graph but also because the frequency and amount of segments being sent in the

second case is bigger

33 Loading test with several clients

After the simulations with one client it was time to test the server harder A realistic

way to do so is simulating the connection of several clients To do so we created a similar

environment but in this case with a variable amount of data sources All this scenario is

created with a python script as the environment used previously for one client At this

point the server was tested with up to ten data sources The scheme is shown in the

Figure 35 Using the Amazon cloud it is possible to use instances setting one client in

each instance Therefore proxy receives packets from different IP addresses as would be

in a real case

Figure 35 Structure for simulation

33 Loading test with several clients 39

The next two graphs represent the same two kinds of simulation developed in the

previous section but in this case with ten data sources the maximum number tested

The Figure 36 shows the number of bytes going over the proxy with data burst of 1980

bytes as well as Figure 37 does with data burst of 5940 bytes

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes

40 Traffic Test

The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8 packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many more high peaks, so the RTT in this case is slightly higher. As expected, the more clients there are, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.
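The thesis does not show how the average RTT values of Tables 3.1 and 3.2 were computed from the captures; one rough way to approximate them with the dpkt library [36] is to match each data segment leaving the client with the first ACK from the peer that covers it, as in the following sketch. The function name and arguments are illustrative, and retransmissions are ignored.

import socket
import dpkt

def average_rtt(pcap_path, client_ip):
    """Rough RTT estimate: time from a data segment sent by client_ip to the
    first ACK from the other endpoint that acknowledges it."""
    pending = {}    # expected ACK number -> timestamp the data segment was seen
    samples = []
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            src = socket.inet_ntoa(ip.src)
            if src == client_ip and len(tcp.data) > 0:
                # remember when this data left the client and which ACK should cover it
                pending[tcp.seq + len(tcp.data)] = ts
            elif src != client_ip and (tcp.flags & dpkt.tcp.TH_ACK):
                sent = pending.pop(tcp.ack, None)
                if sent is not None:
                    samples.append(ts - sent)
    return sum(samples) / len(samples) if samples else 0.0

# e.g. average_rtt("simulation.pcap", "10.34.252.34") returns a value in seconds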


Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower down in the table an instance type is, the shorter the RTT should be; however, this does not apply in every case, so the type of instance is not very significant here. Speaking only of RTT values, the simplest instance seems to be enough for these exchanges of data. Concerning the number of clients there is a slight difference, especially comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets that were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

With this instance the number of retransmissions is bigger. Either way, no simulation stands out in terms of retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the simulation with the most difficulties in the communication was the one with 10 data sources, the heaviest data burst and the worst instance; there we find an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also shows that it is harder to send traffic properly with many data sources. It must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions

Overall, in this chapter the traffic pattern was analyzed and the network was tested under several conditions.


Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
                       5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
                       5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes   0          1           0           6.5
                       5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
                       5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
                       5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

These tests were meant to find out the influence of diverse factors on its performance. We have seen that the RTT varies only a little in the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important to improve the performance; this was more noticeable when the number of data sources was high. The higher-end instances solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed to find a method for developing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications that explain how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets have to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to recreate realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the needed features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet. This is exactly the way the data source sent packets in the simulations, so this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent its first packet with data. This was very helpful when replaying the same capture, since the data source then started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent, so it was easier to compare graphs and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
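Extractpattern.py is not reproduced in the thesis; the sketch below only illustrates, under stated assumptions, the kind of extraction it performs with dpkt [36]: DNS and HTTP segments are filtered out by port, consecutive data segments from the data source are stitched back into bursts, and the burst offsets and lengths are saved for the replay script. The gap threshold, the port filter and the output format are assumptions, not the thesis' own choices.

import json
import socket
import dpkt

SKIP_PORTS = {53, 80}   # assumed filter for the DNS and HTTP segments of the proxy set-up

def extract_pattern(pcap_path, source_ip, out_path="pattern.json", burst_gap=0.5):
    bursts = []          # one entry per data burst: offset from the first packet + payload length
    first_ts = None
    last_ts = None
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if first_ts is None:
                first_ts = ts                       # time origin: first packet of the capture
            if tcp.sport in SKIP_PORTS or tcp.dport in SKIP_PORTS:
                continue                            # drop the proxy set-up segments
            if socket.inet_ntoa(ip.src) != source_ip or len(tcp.data) == 0:
                continue                            # keep only data segments sent by the data source
            if last_ts is None or ts - last_ts > burst_gap:
                bursts.append({"offset": ts - first_ts, "length": 0})
            bursts[-1]["length"] += len(tcp.data)   # stitch the burst's payload back together
            last_ts = ts
    with open(out_path, "w") as out:
        json.dump(bursts, out)
    return bursts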

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing this information, it is ready to replay the data in an accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It must be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
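Replaytraffic.py is likewise not listed; a minimal replay loop in the spirit described above, which reads the pattern file produced by the previous sketch and sends each burst at its recorded offset over a plain TCP socket, might look like this. The pattern format and the server address are carried over from that sketch and are assumptions.

import json
import socket
import time

def replay_pattern(pattern_path, server_addr):
    with open(pattern_path) as f:
        bursts = json.load(f)                     # [{"offset": seconds, "length": bytes}, ...]
    sock = socket.create_connection(server_addr)
    start = time.time()
    try:
        for burst in bursts:
            # wait until this burst's original offset from the start of the capture
            delay = burst["offset"] - (time.time() - start)
            if delay > 0:
                time.sleep(delay)
            sock.sendall(b"x" * burst["length"])  # same amount of data as in the capture
            sock.recv(65535)                      # drain the server's echo
    finally:
        sock.close()

# e.g. replay_pattern("pattern.json", ("server.example.internal", 50007))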

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this we had to filter the data sent from the data source to the proxy. These sniffed data were replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy allows us to receive the same data back from the server as well, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the traffic in the simulations with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern on a large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
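The provisioning code is not shown in the thesis either; with boto [28], the instance set-up described above could be sketched roughly as follows. The AMI id, the key pair name and the helper name are placeholders, and error handling is omitted.

import time
import boto.ec2

def launch_scenario(num_sources, server_type="c1.xlarge", source_type="t1.micro"):
    conn = boto.ec2.connect_to_region("eu-west-1")            # region seen in the captures
    ami = "ami-xxxxxxxx"                                      # placeholder Linux AMI
    server = conn.run_instances(ami, instance_type=server_type,
                                key_name="taas-key").instances[0]
    sources = conn.run_instances(ami, min_count=num_sources, max_count=num_sources,
                                 instance_type=source_type,
                                 key_name="taas-key").instances
    pending = [server] + sources
    while any(i.update() != "running" for i in pending):
        time.sleep(5)                                         # poll until every instance is running
    return server, sources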

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It was therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important considerations when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
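These steps can be chained in a small driver; the sketch below simply runs the scripts named above in order and assumes they take no command-line arguments, which the thesis does not specify.

import subprocess

# Step 1: record a real session (Client.py is assumed to be configured beforehand
# with the desired amount of data, repetitions and waiting time)
subprocess.check_call(["python", "Simulation.py"])

# Step 2: bring up the server used for the recreation, then multiply the pattern
# towards it; Replaytraffic.py calls Extractpattern.py internally
subprocess.check_call(["python", "Servertoreplay.py"])
subprocess.check_call(["python", "Replaytraffic.py"])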

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had this capture we were ready to multiply the traffic. First we performed several recreations, scaling up the number of data sources. We started with one data source and increased the number up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We can then compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.
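The thesis does not detail how the new clients are started every five seconds; one simple illustration, reusing the replay sketch from Chapter 4, is to launch one replay worker per client with a fixed stagger, as below. In the actual tests each data source ran in its own instance rather than in a thread, so this is only an approximation of the idea, and the module name is hypothetical.

import threading
import time

from replaytraffic import replay_pattern   # hypothetical module name for the replay sketch

def multiply_pattern(num_clients, pattern_path, server_addr, stagger=5):
    workers = []
    for _ in range(num_clients):
        t = threading.Thread(target=replay_pattern, args=(pattern_path, server_addr))
        t.start()                # a new replaying client starts...
        workers.append(t)
        time.sleep(stagger)      # ...every five seconds
    for t in workers:
        t.join()

# e.g. multiply_pattern(80, "pattern.json", ("server.example.internal", 50007))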

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests, but with a higher number of data sources. The black graph was created from the amount of packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph shows about double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep going up until approximately 400 seconds; nevertheless, it stops going up after about 225 seconds. In this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients gets to about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using a m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent to those in Figure 5.2. However, the recreation with 80 clients reaches much higher values using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph, where, although no peaks stand out, the average is quite a bit higher than in the other tests. This graph also appears smoother, because packets come and go more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is a notable change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 client   2 clients   10 clients   20 clients   80 clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we developed in the Amazon cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration as the number of clients was raised.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the original simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not get progressively worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes; this happened when we greatly increased the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instances.

Overall, the results concerning the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not as satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using a m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. Combs et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol - HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - quick guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


In this list 36 the packets with the PSH flag set to 1 denote that there is data being

sent in that segment [32] In these simulations the data source sent packets with data to

the server which simultaneously replayed with the same data to the data source Every

packet in the list 36 with the flag PSH activated is sending data First from data source

to proxy which forwards everything to the server And then all the way around sending

the data from server to data source

To test the performance of the scenario created many simulations were carried out

with different type of instances number of data sources and amount of data Each data

source was set in different instances and the number was scaled up from one up to ten

The network was firstly tested with a traffic load based on the sending of 1980 bytes of

data and later with a heavier load of 5940 bytes of data These loads were sent up to

32 Loading test with proxy 37

200 times with a random waiting time between them of either 1 or 2 seconds

The first simulations were carried out with only one data source In this section we

show the graphs with the number of bytes going through the proxy just with one client

connected The Figure 33 represents the number of bytes over time in the proxy in the

simulation with 1980 bytes of data Furthermore the Figure 34 represents the other

simulation with heavier traffic load The type of instance used is the same in both

examples

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected the average of bytes in the Figure 34 is approximately three times bigger

than in Figure 33 This makes sense since the data sent is three times bigger as well

38 Traffic Test

therefore is needed around triple number of packets Other issue to point out is that the

Figure 34 is smoother than the other one This is not only due to the effect of the scale

in the graph but also because the frequency and amount of segments being sent in the

second case is bigger

33 Loading test with several clients

After the simulations with one client it was time to test the server harder A realistic

way to do so is simulating the connection of several clients To do so we created a similar

environment but in this case with a variable amount of data sources All this scenario is

created with a python script as the environment used previously for one client At this

point the server was tested with up to ten data sources The scheme is shown in the

Figure 35 Using the Amazon cloud it is possible to use instances setting one client in

each instance Therefore proxy receives packets from different IP addresses as would be

in a real case

Figure 35 Structure for simulation

33 Loading test with several clients 39

The next two graphs represent the same two kinds of simulation developed in the

previous section but in this case with ten data sources the maximum number tested

The Figure 36 shows the number of bytes going over the proxy with data burst of 1980

bytes as well as Figure 37 does with data burst of 5940 bytes

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes

40 Traffic Test

The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyze in several ways First we

looked into the RTT values with different number of clients and then we analyzed other

important features explained later This becomes even more important in the last part

of the thesis when the number of clients is highly scaled up

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8 packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not pose a big problem for the network performance. However, we can see that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, so the RTT in this case is slightly higher. As expected, the more clients there are, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.
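The following is a minimal sketch of how such an average can be computed offline from a capture, using the dpkt library that is also employed later in this thesis. The capture file name, the server port and the simplification of ignoring retransmitted segments are assumptions of the example, not part of the thesis scripts.

import dpkt

SERVER_PORT = 50007   # assumption: the port used by the simulated server

sent = {}             # expected ACK number -> timestamp of the data segment
samples = []          # RTT samples in seconds

with open('capture.pcap', 'rb') as f:
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP) or not isinstance(eth.data.data, dpkt.tcp.TCP):
            continue
        tcp = eth.data.data
        if tcp.dport == SERVER_PORT and len(tcp.data) > 0:
            # a data segment is acknowledged by an ACK equal to seq + payload length
            sent.setdefault(tcp.seq + len(tcp.data), ts)
        elif tcp.sport == SERVER_PORT and (tcp.flags & dpkt.tcp.TH_ACK) and tcp.ack in sent:
            samples.append(ts - sent.pop(tcp.ack))

if samples:
    print('average RTT: %.4f s over %d samples' % (sum(samples) / len(samples), len(samples)))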


Figure 3.9: Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the further down the table we go, the shorter the times should be; however, this does not hold in every case, so the type of instance is not very significant in these tests. As far as RTT values are concerned, the simplest instance seems to be enough for this exchange of data. Concerning the number of clients there is a slight difference, especially when comparing the RTT of 5 or 10 data sources with that of only one. But in general the results are quite similar, because this amount of packets does not pose a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

The next analysis regarded some other characteristics of network performance, such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.
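As a rough illustration of how such counters can be obtained from a capture, the sketch below (with the same assumed file name and server port as before) counts a segment whose sequence number was already seen as a retransmission, and an ACK that repeats the previous acknowledgement number without carrying data as a duplicate ACK; this is a simplification of the heuristics used by tools such as Wireshark.

import dpkt

SERVER_PORT = 50007
seen_seq = set()
last_ack = None
retransmissions = duplicate_acks = 0

with open('capture.pcap', 'rb') as f:
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP) or not isinstance(eth.data.data, dpkt.tcp.TCP):
            continue
        tcp = eth.data.data
        if tcp.dport == SERVER_PORT and len(tcp.data) > 0:
            if tcp.seq in seen_seq:
                retransmissions += 1      # payload with a sequence number seen before
            seen_seq.add(tcp.seq)
        elif tcp.sport == SERVER_PORT and len(tcp.data) == 0 and (tcp.flags & dpkt.tcp.TH_ACK):
            if tcp.ack == last_ack:
                duplicate_acks += 1       # same ACK number repeated with no new data
            last_ack = tcp.ack

print('retransmissions: %d, duplicate ACKs: %d' % (retransmissions, duplicate_acks))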

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

With this instance the number of retransmissions is larger. Either way, there is no simulation that stands out as far as retransmitted packets are concerned.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can see how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
c1.medium              5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
c1.xlarge              5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
c1.medium              5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
c1.xlarge              5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
t1.micro               5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
c1.medium              5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
c1.xlarge              5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies just a little between the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet-loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed to find a method for developing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications describing projects that explain how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to write a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from the pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the way the data source sent packets in the simulations, so this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. This made it easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
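The following is a minimal sketch of this kind of burst extraction. The destination port, the gap threshold used to separate two bursts and the pickle output file are illustrative assumptions, not the actual code of Extractpattern.py.

import dpkt, pickle

DEST_PORT = 50007     # assumption: port towards which the data bursts were sent
BURST_GAP = 0.5       # assumption: more than 0.5 s of silence separates two bursts

bursts = []           # list of [time offset from first packet, payload of the burst]
first_ts = last_ts = None

with open('capture.pcap', 'rb') as f:
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP) or not isinstance(eth.data.data, dpkt.tcp.TCP):
            continue
        tcp = eth.data.data
        if first_ts is None:
            first_ts = ts                         # time of the first captured packet
        # keep only data-carrying segments towards the chosen port; the HTTP CONNECT
        # and DNS exchanges needed by the proxy use other ports and are skipped here
        if tcp.dport != DEST_PORT or len(tcp.data) == 0:
            continue
        if last_ts is None or ts - last_ts > BURST_GAP:
            bursts.append([ts - first_ts, b''])   # start a new burst
        bursts[-1][1] += tcp.data                 # append this payload to the current burst
        last_ts = ts

with open('pattern.pkl', 'wb') as out:            # consumed later by the replay script
    pickle.dump(bursts, out)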

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves the information gathered from each data burst, together with its timestamp, in a file. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. The second script then accesses the information contained in this file. Knowing this information, the program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. I have to point out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
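A minimal sketch of a replay loop in this spirit is shown below. The saved-pattern file format matches the extraction sketch in the previous section, and the server address, port and use of a single TCP connection are assumptions for illustration, not the actual Replaytraffic.py code.

import pickle, socket, time

SERVER = ('ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com', 50007)   # placeholder address

with open('pattern.pkl', 'rb') as f:
    bursts = pickle.load(f)            # list of (time offset, payload) pairs

sock = socket.create_connection(SERVER)
start = time.time()
for offset, payload in bursts:
    delay = offset - (time.time() - start)
    if delay > 0:
        time.sleep(delay)              # wait until the burst's original send time
    sock.sendall(payload)              # send the whole burst at once, as in the capture
    sock.recv(65536)                   # read the server's echo so buffers do not fill up
sock.close()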

When replaying traffic M2M it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method. To achieve this we had to filter the data sent from the data source to the proxy. This sniffed data was then replayed twice M2M with the second script, so that in the whole network we are sending the same amount of data (in the proxied simulation the same data crosses the network twice, once on each hop), but in this case directly from client to server. This strategy also allows us to receive the same data back from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the network traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the strategy mentioned before, M2M. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern on a large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. And finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded in a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
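For clarity, the same workflow can be written as a short driver; the script names are those of the thesis, but the command-line invocation through subprocess is only an assumption:

import subprocess

# Client.py is edited beforehand to set the amount of data, repetitions and waiting time
subprocess.check_call(['python', 'Simulation.py'])       # run the client-proxy-server simulation and download the pcap
subprocess.check_call(['python', 'Servertoreplay.py'])   # start the stand-alone server, choosing its instance type
subprocess.check_call(['python', 'Replaytraffic.py'])    # extract the pattern and multiply it towards the server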

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had this capture we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to draw interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.
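A minimal sketch of this ramp-up is shown below; the start_replayer() helper is a placeholder for whatever mechanism launches Replaytraffic.py on one client instance (for example over SSH), which is not detailed here:

import time

NUM_CLIENTS = 80        # maximum number of data sources used in these tests
RAMP_INTERVAL = 5       # seconds between starting two consecutive clients

def start_replayer(index):
    # placeholder: trigger Replaytraffic.py on the instance with the given index
    print('starting replaying client %d' % index)

for i in range(NUM_CLIENTS):
    start_replayer(i)
    time.sleep(RAMP_INTERVAL)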

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can see that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is because at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph shows about double the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients problems sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster; this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher-quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance; the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure: here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks that stand out, the average is quite a bit higher than in the other tests. This graph also appears to be smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients there is not even one segment lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we obtained some results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we built in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were presented and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicated ACKs, had more diverse values. These results showed a performance improvement in the network when using high-quality instances, and a deterioration as the number of clients was raised.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used, and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes
3.2 RTT with data bursts of 5940 bytes
3.3 Number of TCP retransmissions
3.4 Number of lost packets
3.5 Number of duplicate ACKs
5.1 Percentage of lost packets

List of Figures

1.1 Flow diagram of the developed system
2.1 OSI model
2.2 HTTP request
2.3 Fields of the IP Header
2.4 Datagram fragmentation
2.5 ARP request
2.6 Ethernet layers in OSI model
2.7 UDP protocol header
2.8 TCP protocol header
2.9 Establishing a connection in TCP
2.10 Sliding window method
2.11 Example RTT interval
2.12 Jitter effect
2.13 Relation between Latency and Bandwidth
2.14 Proxy operation
3.1 Structure client server
3.2 Structure client proxy server
3.3 Bytes through the proxy with data burst of 1980 bytes
3.4 Bytes through the proxy with data burst of 5940 bytes
3.5 Structure for simulation
3.6 Bytes through the proxy with data burst of 1980 bytes
3.7 Bytes through the proxy with data burst of 5940 bytes
3.8 Average RTT with 3 data sources
3.9 Average RTT with 10 data sources
4.1 Structure of traffic replayed M2M
4.2 Comparison between simulation and replayed traffic
5.1 Number of bytes over time in different tests
5.2 Bytes using an m1.large instance for the server
5.3 Bytes using a c1.xlarge instance for the server
5.4 Average RTT extracted from the traffic recreations

REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators?", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

23 Tools strongly associated with this thesis 29

This happens each time a user from a local network asks for some URL The proxy that

receives this request will store a temporary copy of the URL The next time that a user

asks for the same web site the proxy can send the cached copy to the user instead of

forwarding the request to the network to find again the URL We can see this process in

the picture below where the proxy asks for each web site only once An example of how

a proxy works and handle the incoming requests is shown in the Figure 214

Figure 214 Proxy operation

In this way proxies can make much faster the delivery of packets within the network

but this is not the only function they cover They may also be used to avoid that hackers

get internal addresses since these proxies can block the access between two networks

Proxies can take part as a component of a firewall

232 Programming language

Several programming languages can be use for network programming Python [27] is one

of the most important and provides a library called Boto which could be very helpful

for this thesis

Boto

Boto [28] offers a Python interface to several services offered mainly by Amazon Web

Services (AWS) To use Boto is required to provide the Access Key and Secret Key which

we can either give manually in every connection or add in the boto file In addition it is

necessary to create connection objects before creating a machine These machines provide

a stable and secure execution environment to run applications Then main fields in which

30 Related work

Boto is involved are computer database deployment application services monitoring

storage and so on

233 Operating Systems

There are several sort of operating systems such as Microsoft Windows Linux and Mac

OS However the opportunities and ease to manage network tools are not the same in

all of them We believe that for the development of this thesis Linux would be more

suitable

Linux

Linux [29] is a computer operation system created by volunteers and employees of

many companies and organizations from every parts of the world in order to make of this

product free software The main advantages of Linux are low cost stability performance

network functionality security and so on This operating system very seldom freezes up

or slows down It can also provide high performance and support for networks where

client and server systems can be set up easily and quickly on a computer with Linux

It is very secure as well since Linux asks the user for the permissions Nowadays this

operating system is used more and more in both homes and companies due to all its

functionalities Linux offers many network applications so it could be very useful for this

thesis

We have described in this chapter many issues about networks which are crucial in

the next sections It is important to have a deep knowledge about this matter because

it is needed when it comes to analyze and recreate traffic network later on

CHAPTER 3

Traffic Test

In this chapter we created the first scenarios to carry out the required simulations We

started with a simple example M2M and we ended up adding a proxy in between and

simulating several clients These scenarios were analyzed to acquire a deep knowledge

about this framework in order to extract the pattern properly later on

31 Client Server Application

At this point after describing the related work of this thesis we are ready to develop

a client-server application with python [27] The goal of this part was to analyze the

traffic in a very simple case between single client and server This is an easy way to

start setting up a connection and design a methodology with tools for developing a larger

scale testing later The structure of this connection is shown in the Figure 31

Figure 31 Structure client server

The tool chosen to program is called python This is a high level programming language

very recommendable for network programming due to its ease of handling in this field

31

32 Traffic Test

When it comes to program the client for this application it was needed to set the

server Inet address and a random port for the exchange of data It was also necessary

to create a socket and connect it to the address and through the port mentioned before

In addition to program the server is required to set the hostname and the same port

opened in the client Moreover we have to create a socket and bind both hostname and

port to the socket Finally we made the socket wait for incoming packets from the client

and accept the connection

In the List 31 the required packets to establish a client-server connection are shown

Listing 31 Establish connection

1 rdquo0 6 65 3 17 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquoTCPrdquo rdquo74rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [SYN] Seq=0 Win=5840 Len=0 MSS=1460

SACK PERM=1 TSval =4769150 TSecr=0 WS=64

2 rdquo0 6 69 7 36 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquoTCPrdquo rdquo66rdquo EtherNetminusIPminus1 gt rdquo49588rdquo [SYN ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS

=1452 WS=4 SACK PERM=1

3 rdquo0 6 69 7 66 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquoTCPrdquo rdquo54rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [ACK] Seq=1 Ack=1 Win=5888 Len=0

This exchange of packets is called the three way handshake Analyzing these segments

we can see clearly how the flags and the sequence and acknowledgement number have

the expected value The client sends a message with the flag SYN set to 1 in order to

connect to the server and with a random sequence number x Wireshark set by default

a relative sequence number starting with zero The answer of the server has the flags

SYN and ACK activated and with sequence number y and acknowledgement number

x+1 Finally a third packet is sent from the client only with the flag ACK set to 1 and

acknowledgement number y+1

When it comes to terminate the connection a packet with the flags FIN and ACK

activated is sent from the point where is wanted to close the connection Then there

is an ACK segment as a response This exchange must happened in both directions to

close the connection from both points otherwise only one point would be closed and the

other one could still send data These two packets are set out in the List 32

Listing 32 Terminate connection

1 rdquo0 6 71 9 45 rdquo rdquo 19 2 1 6 8 1 33 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquoTCPrdquo rdquo60rdquo EtherNetminusIPminus1 gt 49588 [ FIN ACK] Seq=1 Ack=1 Win=182952 Len=0

32 Loading test with proxy 33

2 rdquo0 6 72 2 51 rdquo rdquo 19 2 1 6 8 1 24 rdquo rdquo 19 2 16 8 1 33 rdquo rdquoTCPrdquo rdquo54rdquo rdquo49588rdquo gt

EtherNetminusIPminus1 [ACK] Seq=1 Ack=2 Win=5888 Len=0

In this section we made a simple test establishing and terminating a connection be-

tween client and server and checking the packets going through the network This is a

simple example to start looking into the behavior of the segments between client and

server

32 Loading test with proxy

In this section we start with the first part of the method which involves the creation of

the client-proxy-server scenario to run simulations with TCP sockets The connection is

set up with a proxy Squid in between [30] The structure is shown is the figure 32 After

setting up this connection we sent traffic in order to analyze the segments sent measure

the performance and extract a traffic pattern

Figure 32 Structure client proxy server

A proxy has been set up between the communication client-server to capture and

analyze the traffic so that we can recreate the pattern of communications and make

realistic loads towards the server In the beginning was needed to access to the instances

This is done through the port 22 and there are two main reasons to do so Firstly we

had to configure the proxy to accept the incoming packets and forward them properly

And secondly to run scripts in the instance it was necessary to access there and install

the required libraries to use that script Moreover some programs such as Tcpdump or

Wireshark were installed in the proxy instance to sniff the traffic When the proxy was

ready we could move on to write the script to create the scenario and make simulations

Several simulations were carried out with different types of instances [9] The sort of

ec2 instance matters regarding memory but also speed which is important in these tests

It is advisable to use a high performance instance for the proxy and the server in order

to handle all the packets quickly Especially in later tests when there are several data

34 Traffic Test

sources In order to develop these simulations we programmed the script Simulationpy

with boto so that the tests would be done automatically This script creates a scenario

comprised of in the simplest case three instances which are data source proxy and

server This script gives also the possibility of picking out the type of instance used for

the simulation Moreover after starting the instances the script set and initialized the

required server data sources and proxy Both server and data source were programmed

also with python due to its ease to develop anything related with networks

The goal of the data source is to send TCP packets towards the server always go-

ing through the proxy The server must answer to those packets creating a normal

connection Obviously before the exchange of data began the data source established

connection sending packets with the flag SYN set to 1 This is just done once in the

whole communication

When the packets were analyzed in the proxy it was possible to see how a TCP segment

with the flag SYN was sent towards the proxy Then another TCP packet arrived to the

data source This segment is the response from the proxy with the flags SYN and ACK

set to 1 This indicates the connection is established and the system is ready to exchange

information Finally the data source answers sending another packet to acknowledge the

previous packet This is shown in the list 33 and is called 3 way handshake [31]

Listing 33 Establishing data source-proxy connection

rdquo1rdquo rdquo0 000000rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo74rdquo rdquo45125 gt

ndlminusaas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK PERM=1

TSval=4294898611 TSecr=0 WS=16rdquo

rdquo2rdquo rdquo0 000054rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo74rdquo rdquo ndlminusaas

gt 45125 [SYN ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460

SACK PERM=1 TSval=4294908735 TSecr =4294898611 WS=128rdquo

rdquo3rdquo rdquo0 000833rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125 gt

ndlminusaas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164

TSecr =4294908735rdquo

When this connection is established the data source sends a HTTP packet to the

proxy indicating the DNS server address Then the proxy looks for the IP address of

that server sending DNS packets We can see this in the list 34

Listing 34 Searching server IP address

rdquo4rdquo rdquo0 000859rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoHTTPrdquo rdquo197rdquo rdquo

CONNECT ec2minus54minus228minus99minus43euminuswest minus1compute amazonaws com

50007 HTTP11 rdquo

32 Loading test with proxy 35

rdquo6rdquo rdquo0 001390rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0xb33a AAAA ec2minus54minus228minus99minus43euminuswest minus1

compute amazonaws comrdquo

rdquo7rdquo rdquo0 002600rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo166rdquo rdquo

Standard query response 0xb33a rdquo

rdquo8rdquo rdquo0 002769rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0 xa3f9 A ec2minus54minus228minus99minus43euminuswest minus1compute

amazonaws comrdquo

rdquo9rdquo rdquo0 003708rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo124rdquo rdquo

Standard query response 0 xa3f9 A 1 0 2 2 4 8 3 2 1 rdquo

Finally the proxy sends also a packet with the flag SYN activated to set up the com-

munication between them two In this way the whole communication data source-proxy-

server is ready to work This exchange of packets is shown in the list 35

Listing 35 Establishing proxy-server connection

rdquo10rdquo rdquo0 003785rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo74rdquo rdquo33271

gt 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK PERM=1

TSval=4294908736 TSecr=0 WS=128rdquo

rdquo11rdquo rdquo0 438963rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo74rdquo rdquo50007

gt 33271 [SYN ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460

SACK PERM=1 TSval=4294910381 TSecr =4294908736 WS=16rdquo

rdquo12rdquo rdquo0 439029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845

TSecr =4294910381rdquo

Then a HTTP10 200 OK connection established gets to the data source Therefore

now the connection is ready to start sending data In these simulations was decided to

send data from time to time with random time periods This makes the simulations be

more realistic since normally it is difficult to know when a client is going to communicate

with a server

The eight packets which compose the exchange of data between data source and server

are shown in the list 36

Listing 36 Exchange of data source-proxy-server

rdquo15rdquo rdquo0 466800rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo45125

36 Traffic Test

gt ndlminusaas [PSH ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval

=4294900280 TSecr =4294908845rdquo

rdquo16rdquo rdquo0 466813rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo66rdquo rdquo ndlminusaas gt 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval

=4294908852 TSecr =4294900280rdquo

rdquo17rdquo rdquo0 466975rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo71rdquo rdquo33271

gt 50007 [PSH ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval

=4294908852 TSecr =4294910381rdquo

rdquo18rdquo rdquo0 467901rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo50007

gt 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389

TSecr =4294908852rdquo

rdquo19rdquo rdquo0 468018rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo50007

gt 33271 [PSH ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval

=4294910389 TSecr =4294908852rdquo

rdquo20rdquo rdquo0 468029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852

TSecr =4294910389rdquo

rdquo21rdquo rdquo0 468083rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo71rdquo rdquo ndlminusaas gt 45125 [PSH ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval

=4294908852 TSecr =4294900280rdquo

rdquo22rdquo rdquo0 508799rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125

gt ndlminusaas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval

=4294900291 TSecr =4294908852rdquo

In this list 36 the packets with the PSH flag set to 1 denote that there is data being

sent in that segment [32] In these simulations the data source sent packets with data to

the server which simultaneously replayed with the same data to the data source Every

packet in the list 36 with the flag PSH activated is sending data First from data source

to proxy which forwards everything to the server And then all the way around sending

the data from server to data source

To test the performance of the scenario created many simulations were carried out

with different type of instances number of data sources and amount of data Each data

source was set in different instances and the number was scaled up from one up to ten

The network was firstly tested with a traffic load based on the sending of 1980 bytes of

data and later with a heavier load of 5940 bytes of data These loads were sent up to

32 Loading test with proxy 37

200 times with a random waiting time between them of either 1 or 2 seconds

The first simulations were carried out with only one data source In this section we

show the graphs with the number of bytes going through the proxy just with one client

connected The Figure 33 represents the number of bytes over time in the proxy in the

simulation with 1980 bytes of data Furthermore the Figure 34 represents the other

simulation with heavier traffic load The type of instance used is the same in both

examples

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected the average of bytes in the Figure 34 is approximately three times bigger

than in Figure 33 This makes sense since the data sent is three times bigger as well

38 Traffic Test

therefore is needed around triple number of packets Other issue to point out is that the

Figure 34 is smoother than the other one This is not only due to the effect of the scale

in the graph but also because the frequency and amount of segments being sent in the

second case is bigger

33 Loading test with several clients

After the simulations with one client it was time to test the server harder A realistic

way to do so is simulating the connection of several clients To do so we created a similar

environment but in this case with a variable amount of data sources All this scenario is

created with a python script as the environment used previously for one client At this

point the server was tested with up to ten data sources The scheme is shown in the

Figure 35 Using the Amazon cloud it is possible to use instances setting one client in

each instance Therefore proxy receives packets from different IP addresses as would be

in a real case

Figure 35 Structure for simulation

33 Loading test with several clients 39

The next two graphs represent the same two kinds of simulation developed in the

previous section but in this case with ten data sources the maximum number tested

The Figure 36 shows the number of bytes going over the proxy with data burst of 1980

bytes as well as Figure 37 does with data burst of 5940 bytes

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes

40 Traffic Test

The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyze in several ways First we

looked into the RTT values with different number of clients and then we analyzed other

important features explained later This becomes even more important in the last part

of the thesis when the number of clients is highly scaled up

First of all we compare two graphs which represent the average RTT of two simulations

differing only in the number of data sources For the Figure 38 packets were being sent

to the server from three different instances however in the Figure 39 there were up to

ten data sources working

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference since the data sent did not represent a big

problem in the network performance However we can appreciate that the lowest value

during the traffic exchange (2 ms approximately) last much longer in the Figure 38 In

the other graph there are many higher peaks therefore the RTT in this case is slightly

superior As expected the more clients the bigger congestion in the network and the

longer RTT

For every simulation the average RTT was calculated to very to what extent different

amount of data number of clients and type of instance affect to the network performance

34 Performance results 41

Figure 39 Average RTT with 10 data sources

As was mentioned before the RTT does not vary greatly If we look over the Table

31 and 32 we do not see large differences Moreover the lower in the table the shorter

period of time there should be However this does not apply in every case therefore the

type of instance is not very remarkable in these cases The simplest instance seems to be

enough for these exchange of data speaking about RTT values Concerning the number

of clients there is a slight difference especially comparing the RTT between 5 or 10 data

sources with only one But in general the results are quite similar because this amount

of packets do not represent serious problem for the network

Server instance type 1 source 3 sources 5 sources 10 sources

t1micro 00031 00046 00033 00039

m1large 00037 00035 00038 00032

c1medium 00031 00035 00051 00048

c1xlarge 00039 00043 00037 00042

Table 31 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet

loss TCP retransmissions and duplicate ACK It is remarkable that in this tests the

results were much more diverse The results show an average of packets since several

simulations were carried out for each case

In the Table 33 we have the average number of TCP packets which have been re-

transmitted in each type of simulation The number is low and the tests with 10 data

sources have more retransmissions Moreover examining this table by type of instance

the lowest quality instance (t1micro) seems to have more difficulties in the communica-

42 Traffic Test

Server instance type 1 Source 3 Sources 5 Sources 10 Sources

t1micro 00026 00022 00021 00029

m1large 00026 00024 00028 00024

c1medium 00028 00031 00025 00030

c1xlarge 00026 00029 00029 00024

Table 32 RTT with data bursts of 5940 bytes

tion With this instance the number of retransmissions is bigger Either way there is no

simulation that stands out concerning packets resent

Table 3.4 shows the packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance: here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 source    3 sources    5 sources    10 sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    1.5         0            2            2
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           2.5          0            0
c1.medium               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.xlarge               1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            2.5

Table 3.3: Number of TCP retransmissions

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance.

Server instance type    Data burst    1 source    3 sources    5 sources    10 sources
t1.micro                1980 bytes    0           0            0            0
                        5940 bytes    2           6            15           67
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           5.5          1            36
c1.medium               1980 bytes    0           0            0            2
                        5940 bytes    0           7            13.5         50.5
c1.xlarge               1980 bytes    0           0            0            5
                        5940 bytes    0.5         5            9            54.5

Table 3.4: Number of lost packets

Server instance type    Data burst    1 source    3 sources    5 sources    10 sources
t1.micro                1980 bytes    0           1            0            6.5
                        5940 bytes    3           1            7.5          2.5
m1.large                1980 bytes    0           0            0            0
                        5940 bytes    0           0            0            0
c1.medium               1980 bytes    0           0            0            4.5
                        5940 bytes    0           0            2.5          2.5
c1.xlarge               1980 bytes    0.5         0            0            0.5
                        5940 bytes    0.5         0            0            0

Table 3.5: Number of duplicate ACKs

We have seen how the RTT varies just a little across the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured for network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important to improve the performance. This was more noticeable when the number of data sources was high: the c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. A method had to be found for performing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications from projects explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets have to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from the pcap file, such as each packet's timestamp, length and payload.

To recreate the traffic, the script initially extracted the data of each packet one at a time, in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets belonging to one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of being sent packet by packet. This is exactly the way the data source sent packets in the simulations, therefore this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set, a few extra protocols were needed to establish the communication over the proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very small size.
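A minimal sketch of this extraction step is given below. It reads the proxy capture with dpkt, keeps only the TCP payload sent by the data source, groups consecutive payloads into bursts, and stores every burst together with its time offset from the first captured packet. The file names, the source address and the burst-separation threshold are illustrative assumptions rather than the actual contents of Extractpattern.py.

import socket
import pickle
import dpkt

SOURCE_IP = '10.34.252.34'   # data-source address as seen in the captures (example value)
BURST_GAP = 0.5              # seconds of silence taken to separate two bursts (assumption)

def extract_pattern(pcap_path, out_path):
    bursts = []              # list of [offset_from_first_packet, payload_bytes]
    first_ts = None
    last_data_ts = None
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            if first_ts is None:
                first_ts = ts                          # time of the very first captured packet
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue                               # this also drops the DNS (UDP) segments
            tcp = ip.data
            if socket.inet_ntoa(ip.src) != SOURCE_IP or not tcp.data:
                continue                               # keep only data segments from the source
            if tcp.data.startswith(b'CONNECT'):
                continue                               # skip the HTTP CONNECT sent to the proxy
            if last_data_ts is None or ts - last_data_ts > BURST_GAP:
                bursts.append([ts - first_ts, b''])    # start a new burst at this offset
            bursts[-1][1] += tcp.data                  # append the payload to the current burst
            last_data_ts = ts
    with open(out_path, 'wb') as out:
        pickle.dump(bursts, out)

# extract_pattern('proxy_capture.pcap', 'pattern.dat')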

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves in a file the information gathered from each data burst together with its timestamp; Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, this program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It has to be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
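The replay side can be pictured along the following lines: the saved bursts are loaded, the script waits until each burst's recorded offset has elapsed, and the whole burst is written at once to a TCP socket connected to the server. Again, this is only a hedged illustration of the approach described above; the file format, names and addresses are assumptions and not the actual Replaytraffic.py.

import pickle
import socket
import time

SERVER = ('10.224.83.21', 50007)   # server address and port seen in the captures (example)

def replay_pattern(pattern_path):
    with open(pattern_path, 'rb') as f:
        bursts = pickle.load(f)        # list of (offset_seconds, payload_bytes)

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(SERVER)
    start = time.time()
    try:
        for offset, payload in bursts:
            delay = offset - (time.time() - start)
            if delay > 0:
                time.sleep(delay)      # reproduce the recorded timing of the burst
            sock.sendall(payload)      # send the whole burst at once
            sock.recv(65536)           # the server echoes the data back
    finally:
        sock.close()

# replay_pattern('pattern.dat')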

When replaying the traffic M2M it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions, and check the accuracy of the method. To achieve this we had to filter out the data sent from the data source to the proxy. These very data were then replayed twice M2M with the second script, so that over the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy also means that the same amount of data is received back from the server, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the traffic in the simulations, with data source, proxy and server, whereas the figure on the right is the result of applying the strategy just mentioned, M2M. As we can see, the amount of data and the number of packets is the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server response is something that we could not control. Another important issue is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2, comparing the graph obtained in the simulation with the one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then the most remarkable results obtained from recreating the pattern at large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario used to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.
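The infrastructure relies on the boto library to drive Amazon EC2 from Python. As a rough illustration of how such a system can start a server or a client on a chosen instance type, a helper along the following lines could be used; the AMI identifier, region, key pair and security group are placeholders, and the real Simulation.py may be organised quite differently.

import time
import boto.ec2

def launch_node(instance_type='t1.micro', ami='ami-xxxxxxxx',
                region='eu-west-1', key_name='taas-key', group='taas-group'):
    """Start one EC2 instance of the requested type and wait until it is running."""
    conn = boto.ec2.connect_to_region(region)
    reservation = conn.run_instances(ami,
                                     instance_type=instance_type,
                                     key_name=key_name,
                                     security_groups=[group])
    instance = reservation.instances[0]
    while instance.state != 'running':
        time.sleep(5)
        instance.update()          # refresh the instance state from EC2
    return instance.ip_address

# server_ip = launch_node('c1.xlarge')   # e.g. deploy the server on a c1.xlarge instance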

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important points when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. And finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps that must be followed to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server; the traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
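Expressed as a small driver, the workflow above might look like the sketch below. The script names come from the text, whereas the idea of invoking them from a single place, and any arguments, are only assumptions made for illustration.

import subprocess

# 1. Record a session client -> proxy -> server (Client.py has already been configured).
subprocess.check_call(['python', 'Simulation.py'])

# 2. Start the server that will receive the replayed pattern (its instance type is chosen there).
subprocess.check_call(['python', 'Servertoreplay.py'])

# 3. Extract the pattern from the recorded pcap and replay it, multiplied, M2M.
subprocess.check_call(['python', 'Replaytraffic.py'])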

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.
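The behaviour of such a data source can be pictured with the short sketch below: a loop that sends a fixed-size burst and then pauses for a random interval between one and three seconds. The constants reflect the figures quoted above, while the variable names and the overall structure of the real Client.py are assumptions.

import random
import socket
import time

TARGET = ('10.224.83.21', 50007)   # example address; in the simulations the client actually connects through the Squid proxy
BURST_SIZE = 3960                  # bytes of data per burst
REPETITIONS = 400                  # number of bursts in the session

def run_data_source():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(TARGET)
    payload = b'x' * BURST_SIZE
    for _ in range(REPETITIONS):
        sock.sendall(payload)                   # send one data burst
        sock.recv(65536)                        # the server echoes the data back
        time.sleep(random.uniform(1.0, 3.0))    # random pause between bursts
    sock.close()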

Once we had a capture we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased the number up to 80, which was considered enough clients to create heavy traffic loads. Then we can compare the different results and the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests, but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same goes for the red graph, but in this case with up to 80 sources sending data. The black graph shows roughly double the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the recreation with 80 clients problems in sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should rise until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster; this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has similar peaks and a similar number of bytes sent to those in Figure 5.2. However, the recreation with 80 clients is much higher when using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks that stand out, the average is quite a bit higher than in the other tests. This graph appears smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the higher average RTT and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type    1 client    2 clients    10 clients    20 clients    80 clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before being able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably: with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon cloud the scenario just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs showed more varied values. These results showed a performance improvement in the network when using high quality instances, and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes

3.2 RTT with data bursts of 5940 bytes

3.3 Number of TCP retransmissions

3.4 Number of lost packets

3.5 Number of duplicate ACKs

5.1 Percentage of lost packets


List of Figures

1.1 Flow diagram of the developed system

2.1 OSI model

2.2 HTTP request

2.3 Fields of the IP Header

2.4 Datagram fragmentation

2.5 ARP request

2.6 Ethernet layers in OSI model

2.7 UDP protocol header

2.8 TCP protocol header

2.9 Establishing a connection in TCP

2.10 Sliding window method

2.11 Example RTT interval

2.12 Jitter effect

2.13 Relation between Latency and Bandwidth

2.14 Proxy operation

3.1 Structure client server

3.2 Structure client proxy server

3.3 Bytes through the proxy with data burst of 1980 bytes

3.4 Bytes through the proxy with data burst of 5940 bytes

3.5 Structure for simulation

3.6 Bytes through the proxy with data burst of 1980 bytes

3.7 Bytes through the proxy with data burst of 5940 bytes

3.8 Average RTT with 3 data sources

3.9 Average RTT with 10 data sources

4.1 Structure of traffic replayed M2M

4.2 Comparison between simulation and replayed traffic

5.1 Number of bytes over time in different tests

5.2 Bytes using an m1.large instance for the server

5.3 Bytes using a c1.xlarge instance for the server

5.4 Average RTT extracted from the traffic recreations


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. Combs et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol - HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump and libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators?", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyze in several ways First we

looked into the RTT values with different number of clients and then we analyzed other

important features explained later This becomes even more important in the last part

of the thesis when the number of clients is highly scaled up

First of all we compare two graphs which represent the average RTT of two simulations

differing only in the number of data sources For the Figure 38 packets were being sent

to the server from three different instances however in the Figure 39 there were up to

ten data sources working

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference since the data sent did not represent a big

problem in the network performance However we can appreciate that the lowest value

during the traffic exchange (2 ms approximately) last much longer in the Figure 38 In

the other graph there are many higher peaks therefore the RTT in this case is slightly

superior As expected the more clients the bigger congestion in the network and the

longer RTT

For every simulation the average RTT was calculated to very to what extent different

amount of data number of clients and type of instance affect to the network performance

34 Performance results 41

Figure 39 Average RTT with 10 data sources

As was mentioned before the RTT does not vary greatly If we look over the Table

31 and 32 we do not see large differences Moreover the lower in the table the shorter

period of time there should be However this does not apply in every case therefore the

type of instance is not very remarkable in these cases The simplest instance seems to be

enough for these exchange of data speaking about RTT values Concerning the number

of clients there is a slight difference especially comparing the RTT between 5 or 10 data

sources with only one But in general the results are quite similar because this amount

of packets do not represent serious problem for the network

Server instance type 1 source 3 sources 5 sources 10 sources

t1micro 00031 00046 00033 00039

m1large 00037 00035 00038 00032

c1medium 00031 00035 00051 00048

c1xlarge 00039 00043 00037 00042

Table 31 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet

loss TCP retransmissions and duplicate ACK It is remarkable that in this tests the

results were much more diverse The results show an average of packets since several

simulations were carried out for each case

In the Table 33 we have the average number of TCP packets which have been re-

transmitted in each type of simulation The number is low and the tests with 10 data

sources have more retransmissions Moreover examining this table by type of instance

the lowest quality instance (t1micro) seems to have more difficulties in the communica-

42 Traffic Test

Server instance type 1 Source 3 Sources 5 Sources 10 Sources

t1micro 00026 00022 00021 00029

m1large 00026 00024 00028 00024

c1medium 00028 00031 00025 00030

c1xlarge 00026 00029 00029 00024

Table 32 RTT with data bursts of 5940 bytes

tion With this instance the number of retransmissions is bigger Either way there is no

simulation that stands out concerning packets resent

The Table 34 shows packet losses Here the differences among the tests carried out

are considerably wider As expected the worst simulation with more difficulties in the

communication was the one with 10 data sources heaviest data burst and worst instance

Here there is an average of up to 67 lost packets Moreover we can appreciate how the

heaviest data burst is starting to create problems because there are many more losses

than in simulations with only 1980 bytes Every instances give better results than the

t1micro one Nevertheless there is no a very significant gap among these three instances

(m1large c1medium c1xlarge) The most important result in this tale concerns the

growth of packet loss as the number of data sources increases as well

Finally in the Table 35 we can check how many ACK were duplicated In this case

there are barely problems with the c1xlarge instance unlike with t1micro The table also

indicates the higher difficulty to send traffic properly with many data sources Finally

it must be pointed out that in these simulations a c1xlarge instance is enough to avoid

problems in the communication

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

15

0

0

0

2

0

2

m1large1980 bytes

5940 bytes

0

0

0

25

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1xlarge1980 bytes

5940 bytes

0

0

0

0

0

0

0

25

Table 33 Number of TCP retransmissions

Overall in this chapter the traffic pattern was analyzed and the network was tested

34 Performance results 43

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

2

0

6

0

15

0

67

m1large1980 bytes

5940 bytes

0

0

0

55

0

1

0

36

c1medium1980 bytes

5940 bytes

0

0

0

7

0

135

2

505

c1xlarge1980 bytes

5940 bytes

0

05

0

5

0

9

5

545

Table 34 Number of lost packets

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

3

1

1

0

75

65

25

m1large1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

25

45

25

c1xlarge1980 bytes

5940 bytes

05

05

0

0

0

0

05

0

Table 35 Number of duplicate ACK

to find out the influence of diverse factors in its performance We have seen how the

RTT vary just a little in the different tests Most probably due to the fact that the data

sent does not stress enough the server However in the last analysis the values measured

related with network performance gave more interesting results For example when it

comes to stress the network the number of clients is more significant than the type of

instance picked out Nevertheless the kind of instance was also important in to improve

the performance This was more noticeable when the number of data sources was high

The c1large instance solved a large part of the packet losses problem compared with the

t1micro one

After achieving these results we can move on to the next step where we extracted

the traffic pattern from these simulations All the process is explained in the following

chapter

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and send this traffic

multiplied M2M (Machine-to-Machine) towards the same server later on It was needed

to look up a method to know how to develop a proper extraction to generate the traffic

again To do so we looked into several documents with projects where was explained

how traffic generators create realistic traffic From those publications we obtained three

main characteristics to rebuild the traffic as similar as possible The first one is the packet

length [33][34] this required packets to be created with same amount of data as well as

equally long headers The second feature concerns packet timestamp and packet time

distribution [34][35] It was needed that the outbound packets had similar frequency the

length of time from one packet to the next must be as similar as possible to the capture

we wanted to replay Finally to create a realistic traffic network it was significant to

send the same number of packets [33]

41 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded in the proxy instance during the previous simulations. The next step was to write a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script initially extracted the data of each packet one at a time, in order to resend them when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of packet by packet, which is exactly how the data source sent its data in the simulations; therefore this method was much better for recreating the obtained traffic pattern. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent its first packet with data. This was very helpful when replaying the same capture, since the data source then started to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.
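As a rough illustration of this extraction step, the sketch below shows how dpkt can be used to walk a pcap file, keep only the TCP data segments sent by the data source, and group them into bursts stamped with their offset from the first captured packet. This is not the exact Extractpattern.py used in the thesis: the file name, the source address, the burst-splitting threshold and the output format are assumptions.

import dpkt
import socket

SOURCE_IP = "10.34.252.34"   # assumed data-source address
BURST_GAP = 0.5              # assumed: a silence longer than this starts a new burst

def extract_bursts(pcap_path):
    """Return a list of [offset_seconds, payload_bytes], one entry per data burst."""
    bursts = []
    first_ts = None
    last_ts = None
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            if first_ts is None:
                first_ts = ts                     # time origin = first packet in the capture
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue                          # DNS and other non-TCP traffic is ignored
            tcp = ip.data
            if socket.inet_ntoa(ip.src) != SOURCE_IP or not tcp.data:
                continue                          # keep only data segments from the data source
            # the real script would also drop the HTTP CONNECT segments used for the proxy setup
            if last_ts is None or ts - last_ts > BURST_GAP:
                bursts.append([ts - first_ts, b""])   # start a new burst
            bursts[-1][1] += tcp.data                 # reassemble the burst payload
            last_ts = ts
    return bursts

if __name__ == "__main__":
    for offset, data in extract_bursts("simulation.pcap"):
        print("burst at +%.3f s, %d bytes" % (offset, len(data)))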

In the original simulations, where the proxy was in place, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation because of their very low weight.

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script saves all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file. Knowing this information, the program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets accurately. It should be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
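A minimal sketch of the replay side is shown next. It assumes the burst list produced by the extraction step (relative timestamp plus the concatenated payload of each burst), opens a single TCP socket to the server, and pushes every burst at its recorded offset. The server address is a placeholder and the code is a simplification of what Replaytraffic.py does, not the script itself.

import socket
import time

SERVER = ("ec2-xx-xx-xx-xx.eu-west-1.compute.amazonaws.com", 50007)  # placeholder host and port

def replay(bursts):
    """bursts: list of (offset_seconds, payload_bytes) as produced by the extraction script."""
    sock = socket.create_connection(SERVER)
    start = time.time()
    try:
        for offset, data in bursts:
            delay = offset - (time.time() - start)   # wait until this burst is due
            if delay > 0:
                time.sleep(delay)
            sock.sendall(data)                       # send the whole burst at once
            sock.recv(65536)                         # read the echo the server sends back (simplified)
    finally:
        sock.close()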

When replaying traffic M2M it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method. To achieve this we had to filter the data sent from the data source to the proxy. These sniffed data were replayed twice M2M with the second script, so that the whole network carries the same amount of data, but in this case directly from client to server. This strategy also makes the server return the same data, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The diagram on the left shows the traffic in the simulations, with data source, proxy and server, while the diagram on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets is the same.

Figure 4.1 Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2 Comparison between simulation and replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly, because the server response is something we could not control. Another important issue is the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.
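The byte-over-time curves compared in Figure 4.2 can be produced by binning the captured bytes per time interval. A small sketch of that aggregation, again assuming dpkt and an arbitrary one-second bin width, could look like this:

import dpkt

def bytes_per_interval(pcap_path, bin_width=1.0):
    """Return the number of captured bytes per bin_width seconds of capture time."""
    bins = []
    first_ts = None
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            if first_ts is None:
                first_ts = ts
            index = int((ts - first_ts) / bin_width)
            while len(bins) <= index:
                bins.append(0)
            bins[index] += len(buf)   # count all captured bytes falling into this interval
    return bins

# the series for the original simulation and for the M2M recreation can then be plotted side by side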

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2 by comparing the graph obtained in the simulation with another one obtained by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the developed TaaS system are explained. Then remarkable results obtained from recreating the pattern on a large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.
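The exact provisioning code is not reproduced here, but the kind of call the TaaS scripts rely on, using boto [28] against EC2, looks roughly like the following sketch; the AMI id, key pair and security group names are placeholders.

import time
import boto.ec2

def launch_instances(count, instance_type="t1.micro"):
    """Start `count` EC2 instances of the given type and wait until they are running."""
    conn = boto.ec2.connect_to_region("eu-west-1")   # credentials are taken from the boto configuration
    reservation = conn.run_instances(
        "ami-xxxxxxxx",                              # placeholder AMI id
        min_count=count, max_count=count,
        instance_type=instance_type,
        key_name="taas-key",                         # placeholder key pair
        security_groups=["taas-sg"],                 # placeholder security group
    )
    for inst in reservation.instances:
        while inst.state != "running":
            time.sleep(5)
            inst.update()                            # poll EC2 until the instance is up
    return reservation.instances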

This TaaS infrastructure was very flexible and highly customizable. In the simulations the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.
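As an illustration, those knobs can be thought of as a small configuration block like the one below; the parameter names are made up, but the values correspond to the simulation described in Section 5.3.

# illustrative data-source configuration (names are assumptions, values from Section 5.3)
SIMULATION = {
    "data_bytes": 3960,    # amount of data per burst
    "repetitions": 400,    # number of data bursts to send
    "min_wait_s": 1,       # random waiting time between bursts is drawn
    "max_wait_s": 3,       # between these two bounds
}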

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps to follow in order to use the TaaS system created. First of all, we configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
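Put together, one run of the TaaS system boils down to invoking the scripts in this order. The sketch below only strings the steps together with subprocess and assumes the scripts take no extra command-line arguments, which may differ from the actual implementation.

import subprocess

# 0. Edit Client.py beforehand to set the amount of data, the repetitions and the waiting time.

# 1. Record a simulation (client -> proxy -> server); the pcap file is downloaded locally.
subprocess.check_call(["python", "Simulation.py"])

# 2. Start the server used for the replay, picking the instance type.
subprocess.check_call(["python", "Servertoreplay.py"])

# 3. Extract the pattern and replay it, multiplied, towards that server.
#    Replaytraffic.py calls Extractpattern.py internally.
subprocess.check_call(["python", "Replaytraffic.py"])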

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.
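The gradual ramp-up can be reproduced by starting one replaying client every five seconds. In the thesis each data source ran in its own EC2 instance; the threaded sketch below, which reuses the replay() function sketched in Section 4.2, is only meant to illustrate the staggered start.

import threading
import time

def ramp_up(bursts, total_clients=80, spacing_s=5):
    """Start one replaying client every spacing_s seconds, up to total_clients."""
    threads = []
    for _ in range(total_clients):
        t = threading.Thread(target=replay, args=(bursts,))   # replay() from the Section 4.2 sketch
        t.start()
        threads.append(t)
        time.sleep(spacing_s)        # one new data source every five seconds
    for t in threads:
        t.join()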

Figure 5.1 represents the number of bytes sent in the different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1 Number of bytes over time in different tests

Figure 5.2 shows the results of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the red graph is the same but with up to 80 sources sending data. The black graph shows about twice the amount of bytes of the blue one in Figure 5.1, an expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after 150 seconds the graph rises only a little, and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster; this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2 Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved running the same tests but with a higher-quality instance for the server; we used the type c1.xlarge. If we look at Figure 5.3 and compare it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3 Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite there being no peaks standing out, the average is quite a bit higher than in the other tests. This graph also appears smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4 Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet loss. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap in packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro     0       0.011   0.044   0.091   0.128
m1.large     0       0.027   0.053   0.128   0.154
c1.medium    0.007   0       0.039   0.076   0.085
c1.xlarge    0.007   0.004   0.067   0.120   0.125

Table 5.1 Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. We obtained results for the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients the server can handle varies depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes increased correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicate ACKs, showed more varied values. These results show a performance improvement in the network when using high-quality instances, and a deterioration as the number of clients rises.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not get progressively worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we can recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been proven that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, the results concerning the amount of bytes the server could handle and the growth of packet loss with the number of clients are good. However, the RTT results were not as satisfactory. It must be said that, after testing many different TCP servers, the RTT behaved differently depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using an m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language: official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.


(m1large c1medium c1xlarge) The most important result in this tale concerns the

growth of packet loss as the number of data sources increases as well

Finally in the Table 35 we can check how many ACK were duplicated In this case

there are barely problems with the c1xlarge instance unlike with t1micro The table also

indicates the higher difficulty to send traffic properly with many data sources Finally

it must be pointed out that in these simulations a c1xlarge instance is enough to avoid

problems in the communication

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

15

0

0

0

2

0

2

m1large1980 bytes

5940 bytes

0

0

0

25

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1xlarge1980 bytes

5940 bytes

0

0

0

0

0

0

0

25

Table 33 Number of TCP retransmissions

Overall in this chapter the traffic pattern was analyzed and the network was tested

34 Performance results 43

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

2

0

6

0

15

0

67

m1large1980 bytes

5940 bytes

0

0

0

55

0

1

0

36

c1medium1980 bytes

5940 bytes

0

0

0

7

0

135

2

505

c1xlarge1980 bytes

5940 bytes

0

05

0

5

0

9

5

545

Table 34 Number of lost packets

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

3

1

1

0

75

65

25

m1large1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

25

45

25

c1xlarge1980 bytes

5940 bytes

05

05

0

0

0

0

05

0

Table 35 Number of duplicate ACK

to find out the influence of diverse factors in its performance We have seen how the

RTT vary just a little in the different tests Most probably due to the fact that the data

sent does not stress enough the server However in the last analysis the values measured

related with network performance gave more interesting results For example when it

comes to stress the network the number of clients is more significant than the type of

instance picked out Nevertheless the kind of instance was also important in to improve

the performance This was more noticeable when the number of data sources was high

The c1large instance solved a large part of the packet losses problem compared with the

t1micro one

After achieving these results we can move on to the next step where we extracted

the traffic pattern from these simulations All the process is explained in the following

chapter

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and send this traffic

multiplied M2M (Machine-to-Machine) towards the same server later on It was needed

to look up a method to know how to develop a proper extraction to generate the traffic

again To do so we looked into several documents with projects where was explained

how traffic generators create realistic traffic From those publications we obtained three

main characteristics to rebuild the traffic as similar as possible The first one is the packet

length [33][34] this required packets to be created with same amount of data as well as

equally long headers The second feature concerns packet timestamp and packet time

distribution [34][35] It was needed that the outbound packets had similar frequency the

length of time from one packet to the next must be as similar as possible to the capture

we wanted to replay Finally to create a realistic traffic network it was significant to

send the same number of packets [33]

41 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics

of the packets We used the files recorded during the previous simulations in the proxy

instance The next step was to program a script in python made especially to obtain the

features needed from every packet The best option to make this possible was a python

library called dpkt [36] Using this library a script was written to collect the required

data from a pcap file such as packet time stamp length and data sent

To recreate the traffic the script had to extract the data of each packet one at a time in

order to resend them when replaying traffic However after some tests it was found out

that was much more accurate to gather the data from all the packets involved in one data

burst and put that data together again This way when it comes to replay the traffic

all the data contained in one burst is sent at once instead of sending the data packet by

packet This is exactly the same way the data source sent packets in the simulations

45

46 Traffic pattern extraction

therefore this method was much better to recreate the traffic pattern obtained Moreover

the script worked out the length of time elapsed from the first packet captured in the

simulation until the data source sent the first packet with data This was very helpful

when replaying the same capture since the data source started to replay the data at the

same time The script also extracted the timestamp when every data burst was sent

Therefore it was easier to compare graphs and the traffic recreation was highly precise

In the original simulations where the proxy was set a few protocols were needed to

establish the communication over this proxy These were a couple of HTTP and a few

DNS segments which were meaningless to recreate M2M traffic since there is no proxy in

between The best solution was to filter them out with the script written to extract the

pattern These segments would be very difficult to recreate and they are not noticeable

during the simulation due to their very low weight

42 Replaying traffic pattern

After analyzing the pcap file to obtain data sent and timestamps the same script must

save all this information so that a second script can use it to replay the packets properly

The first script saved in a file the information gathered from one data burst as well as the

timestamp Extractpatternpy is the script that obtains all this information and saves

it in the file mentioned Then the second script access to the information contained in

this file Knowing this information this program is ready to replay the data in the most

accurate and timely manner as well as with the same number of packets The script

resent the traffic using socket programming in python like in the simulations Both the

file and this script were deployed in an instance from where the packets were sent to the

server The script used to resend the traffic must be simple in order to run as fast as

possible and send the packets in an accurate manner I have to point out that in order to

test the server it is only necessary to run the second script named Replaytrafficpy since

it calls automatically the first script (Extractpatternpy) to obtain the traffic pattern

When replaying traffic M2M it was important to recreate the same traffic load than

in the original capture With this approach we could compare them to draw important

conclusions and check the accuracy of the method carried out To achieve this we had

to filter the data sent from data source to proxy These very data sniffed were replayed

twice M2M with the second script so that in the whole network we are sending the same

amount of data but in this case directly from client to server This strategy allows to

receiving the same data from the server as well therefore the behaviour of the packets

was very similar to the real case An example of this approach is represented in the Figure

41 The figure on the left shows the traffic network in the simulations with data source

proxy and server Furthermore the figure on the right is the result of implementing

the strategy mentioned before M2M As we can see the amount of data and number of

packets is the same

The results of following this strategy are shown in the Figure 42 In the graphs we

42 Replaying traffic pattern 47

Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, which compares the graph obtained in the simulation with another one achieved by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
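The scenario itself is built with the boto library (version 2), so launching the server and the data-source instances amounts to a few EC2 calls. The following sketch only illustrates the idea and is not the actual Simulation.py; the AMI id, key name and security group are placeholders.

import time
import boto.ec2

def launch(conn, ami, instance_type, count=1):
    # Start 'count' instances of the given type and wait until they are running.
    reservation = conn.run_instances(ami, min_count=count, max_count=count,
                                     instance_type=instance_type,
                                     key_name='my-key',             # placeholder
                                     security_groups=['default'])   # placeholder
    instances = reservation.instances
    while any(i.update() != 'running' for i in instances):
        time.sleep(5)
    return instances

conn = boto.ec2.connect_to_region('eu-west-1')                 # region seen in the captures
server = launch(conn, 'ami-xxxxxxxx', 'c1.xlarge')[0]          # server on a high performance instance
clients = launch(conn, 'ami-xxxxxxxx', 't1.micro', count=10)   # data sources
print(server.ip_address)
print([c.ip_address for c in clients])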

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It was therefore possible to generate light, normal or heavy traffic loads depending on the frequency and on the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same in the simulations and in the traffic recreations.
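In essence, a data source of this kind reduces to a small send loop whose parameters determine the load. The sketch below illustrates this; the parameter names are not the ones used in Client.py, and the values shown are those of the recording described in Section 5.3.

import random, socket, time

BURST_SIZE  = 3960       # bytes per data burst
REPETITIONS = 400        # number of bursts in one simulation
WAIT_RANGE  = (1, 3)     # random waiting time between bursts, in seconds

def run_data_source(server_addr):
    sock = socket.create_connection(server_addr)
    payload = b'x' * BURST_SIZE              # dummy payload of the configured size
    for _ in range(REPETITIONS):
        sock.sendall(payload)                # one data burst
        sock.recv(4096)                      # server echoes the data back
        time.sleep(random.uniform(*WAIT_RANGE))
    sock.close()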

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic to be replayed should not cover only a short time period. The same infrastructure and the same recorded traffic must be used when the traffic pattern is replayed. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, the breaking point is examined to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several tests, scaling up the number of data sources: we started with one data source and increased the count up to 80, which was considered enough clients to create heavy traffic loads. We can then compare the different results with the original simulation to extract interesting conclusions. The number of replay clients was increased one at a time, every five seconds.
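One simple way to realise the multiplier is to start one replay worker per simulated client and stagger the starts by five seconds; whether the workers run as threads on one instance or on separate EC2 instances is a deployment choice. The sketch below uses threads and reuses the replay() function sketched in Section 4.2; it is an illustration, not the multiplier implemented in Replaytraffic.py.

import threading, time
from replay_sketch import replay   # hypothetical module holding the replay() sketch

def multiply(num_clients, stagger=5):
    # Start one replay worker per simulated client, one every 'stagger' seconds.
    workers = []
    for _ in range(num_clients):
        t = threading.Thread(target=replay)
        t.start()
        workers.append(t)
        time.sleep(stagger)
    for t in workers:
        t.join()

if __name__ == '__main__':
    multiply(80)   # recreate the pattern with up to 80 concurrent data sources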

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at some moments several clients were not sending data, so there are large differences in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the results of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph shows roughly twice the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after 150 seconds the graph grows only very little and slowly. In this test a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops rising after about 225 seconds. The communication therefore seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. In this case the limit is hit when the number of clients reaches about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients and has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients reaches much higher values using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating 2 to 20 clients are not very different. This is because the server had no problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is considerably higher than in the other tests. This graph is also smoother, because packets come and go more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained; the rest of the tests show a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations
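The average RTT values of Figure 5.4 can be estimated directly from the recordings by matching each data segment to the first acknowledgement that covers it. The sketch below uses dpkt for this and deliberately ignores retransmissions and sequence-number wrap-around, so it only gives a rough estimate; the function name is ours.

import dpkt, socket

def average_rtt(pcap_path, client_ip):
    # Match each data segment sent by client_ip to the first ACK that covers it.
    pending = {}     # expected ACK number -> timestamp of the data segment
    samples = []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if socket.inet_ntoa(ip.src) == client_ip and len(tcp.data) > 0:
                pending.setdefault(tcp.seq + len(tcp.data), ts)
            elif socket.inet_ntoa(ip.src) != client_ip and tcp.flags & dpkt.tcp.TH_ACK:
                for expected in list(pending):
                    if tcp.ack >= expected:
                        samples.append(ts - pending.pop(expected))
    return sum(samples) / len(samples) if samples else None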


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four instance types tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is a noticeable change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data throughout the whole simulation. We then obtained results on the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we therefore developed in the Amazon Cloud the scenario just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs showed more diverse values. These results showed a performance improvement in the network when using high quality instances and a deterioration when the number of clients was rising.


When it came to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation; in this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes; this happens when we greatly increased the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instances.

Overall, there are good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not as satisfactory; I must say that, after testing many different TCP servers, the RTT behaved differently depending on the server. These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.
2. Test different types of servers, for instance an HTTP server.
3. Work out the cost of a test before starting it.
4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.
[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.
[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.
[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.
[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol", 1980.
[21] S. Kollar, "Introduction to IPv6", 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.
[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.
[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.



After analyzing the pcap file to obtain data sent and timestamps the same script must

save all this information so that a second script can use it to replay the packets properly

The first script saved in a file the information gathered from one data burst as well as the

timestamp Extractpatternpy is the script that obtains all this information and saves

it in the file mentioned Then the second script access to the information contained in

this file Knowing this information this program is ready to replay the data in the most

accurate and timely manner as well as with the same number of packets The script

resent the traffic using socket programming in python like in the simulations Both the

file and this script were deployed in an instance from where the packets were sent to the

server The script used to resend the traffic must be simple in order to run as fast as

possible and send the packets in an accurate manner I have to point out that in order to

test the server it is only necessary to run the second script named Replaytrafficpy since

it calls automatically the first script (Extractpatternpy) to obtain the traffic pattern

When replaying traffic M2M it was important to recreate the same traffic load than

in the original capture With this approach we could compare them to draw important

conclusions and check the accuracy of the method carried out To achieve this we had

to filter the data sent from data source to proxy These very data sniffed were replayed

twice M2M with the second script so that in the whole network we are sending the same

amount of data but in this case directly from client to server This strategy allows to

receiving the same data from the server as well therefore the behaviour of the packets

was very similar to the real case An example of this approach is represented in the Figure

41 The figure on the left shows the traffic network in the simulations with data source

proxy and server Furthermore the figure on the right is the result of implementing

the strategy mentioned before M2M As we can see the amount of data and number of

packets is the same

The results of following this strategy are shown in the Figure 42 In the graphs we

42 Replaying traffic pattern 47

Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method to obtain a pattern from the traffic generated

in the simulations Moreover the high precision of this approach was proven in the Figure

42 comparing the graph obtained in the simulation with another achieve recreating the

pattern obtained Therefore we can move on to the last part of the thesis where we

multiply this pattern to extract important results related with the server

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained Then re-

markable results obtained from recreating the pattern in large scale are shown and

analyzed to find out how the server responses to certain quantities of data With the

values achieved we demonstrate the reliability of the TaaS [3] system created

51 Design of TaaS for M2M communications

As in the case of the previous simulations the scenario to recreate the traffic pattern

M2M was created within the Amazon cloud First of all the server was set up in a EC2

instance The TaaS infrastructure created allowed us to choose among different type of

instances easily where to deploy servers or clients Once the server was running we could

proceed to pick out number and type of instances in order to multiply the traffic pattern

towards this very server When these data sources started to run the TaaS system

sniffed the ongoing traffic in the server to finally download automatically the recording

for further analysis

This TaaS infrastructure was very flexible and highly customizable In the simulation

the data source characteristics could be easily modified For instance we could choose

amount of data to send frequency of data bursts and number of repetitions Therefore it

was possible to generate light normal or heavy traffic loads depending on the frequency

and the data sent In addition we could create short or long simulations choosing number

of repetitions The recordings obtained from these simulations are then multiplied M2M

so that the configuration is the same both in the simulations and the traffic recreations

Replaying real traffic is the best way to determine precisely [37] in this case how

much traffic the server can handle providing a certain quality of service The following

are some important characteristics when it comes to recreate traffic to stress a server

[37][38] First of all the traffic which is going to be replayed should not be a short time

period We must use the same infrastructure and recorded traffic when it comes to replay

49

50 Multiplying Traffic Pattern

the traffic pattern The traffic must be sent from different points at once simulating in

this way the connection of different clients Then scale up the number of clients in order

to increase the traffic towards the server until it stops working properly And finally look

into the break to figure out the server capacity

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system

created First of all we must configure the script Clientpy setting amount of data

repetitions and random waiting time with the values that suit us Then we run the

script Simulationpy to exchange the information between client and server The traffic

then is recorded and downloaded automatically in a pcap file to the computer where

we run Simulationpy The second part is about replaying the traffic pattern First we

must start the script Servertoreplaypy to set up the same server used in simulations but

now we have the chance to pick out the type of instance for this server Finally we run

Replaytrafficpy that obtains the traffic pattern and create the multiplier to recreate this

pattern towards the server Another file is then downloaded to the computer from which

we can extract the results of the tests

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy The

simulation chosen lasted nearly fifteen minutes and there was one data source sending

3960 bytes of data up to 400 times Between data burst there is a waiting time that

can vary from 1 to 3 seconds This feature make the simulations more realistic since we

never know when a client is going to establish a connection with a server

Once we have got a capture we were ready to multiply the traffic First we performed

several simulations scaling up the number of data sources We started with one data

source and increased up to 80 which was considered an enough number of clients to

create heavy loads of traffic Then we can compare between the different results and the

original simulation to extract interesting results The number of player was increased

one at a time every five seconds

In the Figure 51 is represented the number of bytes sent in different tests The instance

used in this case to set up the server is m1large The black red and blue graphs display

the recreation of the real simulation for one two and ten data sources respectively We

can appreciate that the red line seems to be twice as high as the graph for one client

This is exactly the value expected In addition the highest graph is ten times higher than

the black one so that all the data sources sent the right number of bytes The graph with

ten clients is rougher with higher peaks This is due to the fact that in those moments

several clients were not sending data so there is a big difference in the number of bytes

53 Results of traffic recreation 51

exchanged

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with higher number of

data sources The black graph was created with the amount of packets exchanged among

20 clients and the intended server The same for the red graph but in this case with up

to 80 sources sending data The black graph appears to have double amount of bytes

than the blue one of the Figure 51 This is a expected result since the number of clients

is twice larger as well However the red graph is not four times bigger than the black

one and it is very rough In fact in the 80 clients recreation appears to be problems to

send data very soon After 150 seconds the graph goes up very little and slowly In this

test with 80 clients every 5 seconds a new client was being added therefore the graph

should go up approximately until 400 seconds Nevertheless it stops going up after about

225 seconds Therefore in this case the communication seems to reach a limit in the

exchange of packets where bytes cannot be sent faster In this case when the number

of clients get to about 30 (keeping in mind there is a slight delay when increasing the

number of data sources)

Figure 52 Bytes using a m1large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between types of instances. The black graph represents ten clients and it has similar peaks and a similar number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, despite no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the larger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations
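For reference, one rough way to obtain an average RTT like the one plotted in Figure 5.4 is to pair every data segment sent by a client with the ACK that acknowledges it. The sketch below does this with dpkt under simplifying assumptions (no retransmissions, ACKs matching the end of each burst exactly); the capture file name and client address are placeholders.

import socket
import dpkt

PCAP_FILE = 'server.pcap'        # placeholder capture file
CLIENT_IP = '10.34.252.34'       # placeholder client address

rtts = []
pending = {}                     # (src port, dst port, expected ack) -> send timestamp

with open(PCAP_FILE, 'rb') as f:
    for ts, buf in dpkt.pcap.Reader(f):
        ip = dpkt.ethernet.Ethernet(buf).data
        if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
            continue
        tcp = ip.data
        if socket.inet_ntoa(ip.src) == CLIENT_IP and len(tcp.data) > 0:
            # data leaving the client: remember when the matching ACK is expected
            pending.setdefault((tcp.sport, tcp.dport, tcp.seq + len(tcp.data)), ts)
        elif socket.inet_ntoa(ip.dst) == CLIENT_IP:
            key = (tcp.dport, tcp.sport, tcp.ack)
            if key in pending:
                rtts.append(ts - pending.pop(key))

if rtts:
    print('average RTT: %.4f s' % (sum(rtts) / len(rtts)))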


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets
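Loss figures of this kind can be approximated directly from a capture, for instance by counting data segments whose flow and sequence number have already been seen, i.e. retransmissions. The sketch below computes such a rough estimate with dpkt; it is only one possible approximation and not necessarily how Table 5.1 was produced.

import dpkt

seen = set()
data_segments = 0
retransmissions = 0

with open('server.pcap', 'rb') as f:   # placeholder capture file
    for ts, buf in dpkt.pcap.Reader(f):
        ip = dpkt.ethernet.Ethernet(buf).data
        if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
            continue
        tcp = ip.data
        if not tcp.data:
            continue                   # ignore segments that carry no data
        data_segments += 1
        key = (ip.src, ip.dst, tcp.sport, tcp.dport, tcp.seq)
        if key in seen:
            retransmissions += 1       # same flow and sequence number seen before
        else:
            seen.add(key)

if data_segments:
    print('retransmitted: %.3f %%' % (100.0 * retransmissions / data_segments))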

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results for the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. Round Trip Time was not really different until the number of clients increased quite a lot: with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. It shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance used, but especially with the number of clients.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we built the scenario just mentioned in the Amazon Cloud. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). The next step brought us to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and certain packet features must be as similar as possible between the simulations and the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicated ACKs, showed more varied values. These results showed a performance improvement in the network when using high quality instances, and a deterioration as the number of clients was raised.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time and the result was practically the same most of the time. It is remarkable that the recreation did not become progressively worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has been shown as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, the results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore, the good results, together with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on TCP, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing," 2011.

[2] J. Strickland, "How cloud computing works," http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing," http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi, and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay," https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances," http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic," http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite," http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol - HTTP/1.1," 1999.

[17] Tutorialspoint, "HTTP - quick guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol," 1980.

[21] S. Kollar, "Introduction to IPv6," 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] R. Sharpe, U. Lamping, and E. Warnicke, "Wireshark User's Guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website," http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar, and G. Szabo, "How to validate traffic generators," in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

then is recorded and downloaded automatically in a pcap file to the computer where

we run Simulationpy The second part is about replaying the traffic pattern First we

must start the script Servertoreplaypy to set up the same server used in simulations but

now we have the chance to pick out the type of instance for this server Finally we run

Replaytrafficpy that obtains the traffic pattern and create the multiplier to recreate this

pattern towards the server Another file is then downloaded to the computer from which

we can extract the results of the tests

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy The

simulation chosen lasted nearly fifteen minutes and there was one data source sending

3960 bytes of data up to 400 times Between data burst there is a waiting time that

can vary from 1 to 3 seconds This feature make the simulations more realistic since we

never know when a client is going to establish a connection with a server

Once we have got a capture we were ready to multiply the traffic First we performed

several simulations scaling up the number of data sources We started with one data

source and increased up to 80 which was considered an enough number of clients to

create heavy loads of traffic Then we can compare between the different results and the

original simulation to extract interesting results The number of player was increased

one at a time every five seconds

In the Figure 51 is represented the number of bytes sent in different tests The instance

used in this case to set up the server is m1large The black red and blue graphs display

the recreation of the real simulation for one two and ten data sources respectively We

can appreciate that the red line seems to be twice as high as the graph for one client

This is exactly the value expected In addition the highest graph is ten times higher than

the black one so that all the data sources sent the right number of bytes The graph with

ten clients is rougher with higher peaks This is due to the fact that in those moments

several clients were not sending data so there is a big difference in the number of bytes

53 Results of traffic recreation 51

exchanged

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with higher number of

data sources The black graph was created with the amount of packets exchanged among

20 clients and the intended server The same for the red graph but in this case with up

to 80 sources sending data The black graph appears to have double amount of bytes

than the blue one of the Figure 51 This is a expected result since the number of clients

is twice larger as well However the red graph is not four times bigger than the black

one and it is very rough In fact in the 80 clients recreation appears to be problems to

send data very soon After 150 seconds the graph goes up very little and slowly In this

test with 80 clients every 5 seconds a new client was being added therefore the graph

should go up approximately until 400 seconds Nevertheless it stops going up after about

225 seconds Therefore in this case the communication seems to reach a limit in the

exchange of packets where bytes cannot be sent faster In this case when the number

of clients get to about 30 (keeping in mind there is a slight delay when increasing the

number of data sources)

Figure 52 Bytes using a m1large instance for the server

Now we can compare the Figure 52 with the results achieve running the same tests

but with a higher quality instance for the server We used the type c1xlarge If we

52 Multiplying Traffic Pattern

look over the Figure 53 and compare with the previous one we can appreciate the wide

difference that there can be between type of instances The black graph represents ten

clients and it has similar peaks and number of bytes sent than in the Figure 52 However

the recreation with 80 clients is much higher using the best instance The gap between

graphs is about three times larger in Figure 53 Either way there is another limit for the

data sent in this figure Here we can see that the network cannot exchange data faster

after about 60 clients are running

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyze RTT as we can see in the Figure 54 the graphs obtained

from the tests simulating from 2 to 20 clients are not very different This is because

the server did not have problems to handle these amounts of clients as we could see in

the previous figures The most noticeable difference is in the 80 clients graph where

despite there are no peaks standing out the average is quite higher than in the other

tests This graph appears to be smoother as well because there are packets coming and

going more often than in the rest so there cannot be large variations Therefore the

main characteristics when recreating many clients is the bigger RTT average and the

smoothness of the graph obtained The rest of the test seems to have a similar RTT

Figure 54 Average RTT extracted from the traffic recreations

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore in the first part we developed in the Amazon Cloud the scenario we just

mentioned Then we tested it to find out how its behaviour and performance changed

under different conditions (number of clients type of instance etc) In the next step we

came down to the most important part of the thesis where we had to extract and recreate

the traffic pattern To achieve a correct pattern extraction and recreation the number of

bytes and some packet features must be as similar as possible in the simulations and in

the traffic replayed M2M When the script to recreate the traffic pattern was ready we

proceeded to replay the packets towards the server With the TaaS environment created

the server could be set up in different type of instances to analyze how this affected to

the server performance Moreover the number of clients could be highly scaled up to find

out how the server would manage to handle heavy traffic loads Finally after carrying

out many tests under different conditions the results were shown and explained trying

to estimate the reliability of the TaaS system developed

The results of the simulations show how the number of bytes was going up correctly as

the number of clients increased The RTT was very similar in all the tests since the traffic

load was not heavy enough to stress the server However some network performance

features such as packet loss tcp retransmissions and duplicated ACK had more different

values This results shown a performance improvement in the network when using high

quality instances and a deterioration when the number of clients was raising

55

56 Summary and Conclusions

When it comes to replay the traffic pattern the first step was to compare the results

obtained from recreating the packets with the simulation In this way we could find out

the quality of this traffic recreator We compared the amount of bytes sent over time and

the result was practically the same most of the time It is remarkable that the recreation

did not become worse and worse the similarity in both graphs was permanent Therefore

with this TaaS system developed we may recreate long simulations In the final results

obtained from the traffic pattern recreation we have shown how the server reached a

maximum exchange of bytes This happens when we highly increased the number of

data sources It has been proven as well how this limit can be higher or lower depending

on the instance quality The differences are slighter when measuring the RTT which was

very similar except when the number of data sources running was very high It was more

interesting the packet loss results The percentage of lost packets was much higher when

many clients were sending data There was also certain variance in these results when

we compared among different type of server instances

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

62 Future WorkIn this thesis the TaaS system created is based in TCP protocols and this system focused

on performance and scalability testing mainly As future work we propose to

1 Develop other cloud tests such as connectivity testing security testing compatibil-

ity testing etc

2 Test different type of servers for instance a HTTP server

3 Work out test costs before starting

4 The scripts may be modified to test different scenarios

CHAPTER 7

Appendices

57

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol

59

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 36: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

In order to develop these simulations, we programmed the script Simulation.py with boto so that the tests could be run automatically. This script creates a scenario comprising, in the simplest case, three instances: a data source, a proxy and a server. The script also makes it possible to pick the type of instance used for the simulation. Moreover, after starting the instances, the script sets up and initializes the required server, data sources and proxy. Both the server and the data source were also programmed in Python, owing to how easy it makes developing anything related to networking.
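Although Simulation.py itself is not reproduced in this document, the following minimal sketch illustrates how boto (version 2, as used in this thesis [28]) can start the three instances of the simplest scenario and wait for them to boot. The AMI identifier, key pair and security group names are placeholders rather than the values actually used, and the real script additionally configures and launches the server, proxy and data-source programs on the instances.

    import time
    import boto.ec2

    REGION = "eu-west-1"
    AMI_ID = "ami-xxxxxxxx"          # placeholder AMI; not the image used in the thesis

    def launch(conn, instance_type):
        """Start one EC2 instance and wait until it is running."""
        reservation = conn.run_instances(AMI_ID,
                                         instance_type=instance_type,
                                         key_name="thesis-key",          # assumed key pair name
                                         security_groups=["thesis-sg"])  # assumed security group
        instance = reservation.instances[0]
        while instance.state != "running":
            time.sleep(5)
            instance.update()
        return instance

    conn = boto.ec2.connect_to_region(REGION)
    server = launch(conn, "t1.micro")   # the instance type is a parameter of the simulation
    proxy = launch(conn, "t1.micro")
    source = launch(conn, "t1.micro")
    print(server.ip_address, proxy.ip_address, source.ip_address)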

The goal of the data source is to send TCP packets towards the server, always going through the proxy. The server must answer those packets, creating a normal connection. Obviously, before the exchange of data begins, the data source establishes the connection by sending a packet with the SYN flag set to 1. This is done only once in the whole communication.

When the packets were analyzed at the proxy, it was possible to see how a TCP segment with the SYN flag set was sent towards the proxy. Then another TCP packet arrived at the data source. This segment is the response from the proxy, with the SYN and ACK flags set to 1, and it indicates that the connection is established and the system is ready to exchange information. Finally, the data source answers by sending another packet to acknowledge the previous one. This is shown in Listing 3.3 and is called the three-way handshake [31].

Listing 3.3: Establishing the data source-proxy connection

"1","0.000000","10.34.252.34","10.235.11.67","TCP","74","45125 > ndl-aas [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294898611 TSecr=0 WS=16"
"2","0.000054","10.235.11.67","10.34.252.34","TCP","74","ndl-aas > 45125 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908735 TSecr=4294898611 WS=128"
"3","0.000833","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=1 Ack=1 Win=14608 Len=0 TSval=4294900164 TSecr=4294908735"

When this connection is established, the data source sends an HTTP packet to the proxy indicating the server's address. The proxy then looks up the IP address of that server by sending DNS queries. We can see this in Listing 3.4.

Listing 3.4: Looking up the server's IP address

"4","0.000859","10.34.252.34","10.235.11.67","HTTP","197","CONNECT ec2-54-228-99-43.eu-west-1.compute.amazonaws.com:50007 HTTP/1.1"
"6","0.001390","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xb33a AAAA ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"7","0.002600","172.16.0.23","10.235.11.67","DNS","166","Standard query response 0xb33a"
"8","0.002769","10.235.11.67","172.16.0.23","DNS","108","Standard query 0xa3f9 A ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"
"9","0.003708","172.16.0.23","10.235.11.67","DNS","124","Standard query response 0xa3f9 A 10.224.83.21"

Finally, the proxy also sends a packet with the SYN flag set in order to establish the communication between the two of them. In this way the whole data source-proxy-server communication is ready to work. This exchange of packets is shown in Listing 3.5.

Listing 3.5: Establishing the proxy-server connection

"10","0.003785","10.235.11.67","10.224.83.21","TCP","74","33271 > 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=4294908736 TSecr=0 WS=128"
"11","0.438963","10.224.83.21","10.235.11.67","TCP","74","50007 > 33271 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSval=4294910381 TSecr=4294908736 WS=16"
"12","0.439029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845 TSecr=4294910381"

Then an "HTTP/1.0 200 OK (connection established)" response reaches the data source, and the connection is ready to start carrying data. In these simulations it was decided to send data from time to time, with random waiting periods. This makes the simulations more realistic, since normally it is difficult to know when a client is going to communicate with a server.

The eight packets that compose one exchange of data between the data source and the server are shown in Listing 3.6.

Listing 3.6: Exchange of data source-proxy-server

"15","0.466800","10.34.252.34","10.235.11.67","TCP","71","45125 > ndl-aas [PSH, ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval=4294900280 TSecr=4294908845"
"16","0.466813","10.235.11.67","10.34.252.34","TCP","66","ndl-aas > 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval=4294908852 TSecr=4294900280"
"17","0.466975","10.235.11.67","10.224.83.21","TCP","71","33271 > 50007 [PSH, ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval=4294908852 TSecr=4294910381"
"18","0.467901","10.224.83.21","10.235.11.67","TCP","66","50007 > 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389 TSecr=4294908852"
"19","0.468018","10.224.83.21","10.235.11.67","TCP","71","50007 > 33271 [PSH, ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval=4294910389 TSecr=4294908852"
"20","0.468029","10.235.11.67","10.224.83.21","TCP","66","33271 > 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852 TSecr=4294910389"
"21","0.468083","10.235.11.67","10.34.252.34","TCP","71","ndl-aas > 45125 [PSH, ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval=4294908852 TSecr=4294900280"
"22","0.508799","10.34.252.34","10.235.11.67","TCP","66","45125 > ndl-aas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval=4294900291 TSecr=4294908852"

In Listing 3.6, the packets with the PSH flag set to 1 denote that data is being carried in that segment [32]. In these simulations the data source sent packets with data to the server, which immediately replied with the same data to the data source. Every packet in Listing 3.6 with the PSH flag set is carrying data: first from the data source to the proxy, which forwards everything to the server, and then all the way back, from the server to the data source.
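The data source and the server were plain Python socket programs. The following simplified sketch, which is only illustrative and not the exact code used in the tests, shows the behaviour just described: the client tunnels through the proxy with an HTTP CONNECT, sends bursts of data with random pauses, and the server echoes every burst back. The proxy port (3128, Squid's default) and the payload contents are assumptions.

    import random
    import socket
    import time

    def echo_server(port=50007):
        """Accept one connection and echo every received burst straight back."""
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("", port))
        srv.listen(1)
        conn, _ = srv.accept()
        while True:
            data = conn.recv(8192)
            if not data:
                break
            conn.sendall(data)      # reply with the same data, as observed in Listing 3.6
        conn.close()

    def data_source(proxy, server, port=50007, proxy_port=3128, burst=1980, repetitions=200):
        """Open a tunnel through the proxy, then send data bursts with random pauses."""
        sock = socket.create_connection((proxy, proxy_port))
        sock.sendall(("CONNECT %s:%d HTTP/1.1\r\n\r\n" % (server, port)).encode())
        sock.recv(4096)                          # expect "HTTP/1.0 200 ..." from the proxy
        for _ in range(repetitions):
            sock.sendall(b"x" * burst)           # one data burst
            received = 0
            while received < burst:              # read back the echoed burst
                received += len(sock.recv(8192))
            time.sleep(random.randint(1, 2))     # random waiting time of 1 or 2 seconds
        sock.close()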

To test the performance of the scenario created, many simulations were carried out with different types of instance, numbers of data sources and amounts of data. Each data source was placed in a different instance, and their number was scaled up from one to ten. The network was first tested with a traffic load based on sending 1980 bytes of data, and later with a heavier load of 5940 bytes. These loads were sent up to 200 times, with a random waiting time between them of either 1 or 2 seconds.

The first simulations were carried out with only one data source. In this section we show the graphs of the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time at the proxy in the simulation with 1980 bytes of data, and Figure 3.4 represents the simulation with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times larger than in Figure 3.3. This makes sense: since the data sent is three times larger, roughly triple the number of packets is needed. Another point worth noting is that Figure 3.4 is smoother than the other one. This is not only an effect of the scale of the graph, but also because the frequency and the number of segments being sent in the second case are larger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. We therefore created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to place one client in each instance, so the proxy receives packets from different IP addresses, as it would in a real case.

Figure 3.5: Structure for the simulation

The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going through the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same for data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes

Figure 3.7 shows a much larger number of bytes exchanged than Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency at which packets are sent is high.

3.4 Performance results

In this section the performance of the network is analyzed in several ways. First we look into the RTT values with different numbers of clients, and then we analyze other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all we compare two graphs that represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not pose a real problem for the network. However, we can see that the lowest value observed during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. The other graph has many higher peaks, so the RTT in that case is slightly higher. As expected, the more clients there are, the greater the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent the amount of data, the number of clients and the type of instance affect the network performance.

Figure 3.9: Average RTT with 10 data sources

As mentioned before, the RTT does not vary greatly. Looking over Tables 3.1 and 3.2, we do not see large differences. Moreover, the lower down the table, the shorter the RTT should in principle be; however, this does not hold in every case, so the type of instance is not very significant here. The simplest instance seems to be enough for this exchange of data as far as RTT values are concerned. Concerning the number of clients, there is a slight difference, especially when comparing the RTT of 5 or 10 data sources with only one. But in general the results are quite similar, because this amount of packets does not pose a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

The next analysis concerned some other characteristics of network performance, such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The values shown are averages, since several simulations were carried out for each case.

Table 3.3 contains the average number of TCP packets that were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources show more retransmissions. Moreover, examining the table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulty in the communication: with this instance the number of retransmissions is larger. Either way, no simulation stands out with regard to resent packets.

Table 3.4 shows the packet losses. Here the differences among the tests are considerably wider. As expected, the simulation with the most difficulties was the one with 10 data sources, the heaviest data burst and the weakest instance: here there is an average of up to 67 lost packets. We can also see how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one; nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates how much harder it is to send traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    1.5        0           2           2
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          2.5         0           0
c1.medium              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.xlarge              1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           2.5

Table 3.3: Number of TCP retransmissions (average over several runs)

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes    0          0           0           0
                       5940 bytes    2          6           15          67
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          5.5         1           36
c1.medium              1980 bytes    0          0           0           2
                       5940 bytes    0          7           13.5        50.5
c1.xlarge              1980 bytes    0          0           0           5
                       5940 bytes    0.5        5           9           54.5

Table 3.4: Number of lost packets (average over several runs)

Server instance type   Data burst    1 source   3 sources   5 sources   10 sources
t1.micro               1980 bytes    0          1           0           6.5
                       5940 bytes    3          1           7.5         2.5
m1.large               1980 bytes    0          0           0           0
                       5940 bytes    0          0           0           0
c1.medium              1980 bytes    0          0           0           4.5
                       5940 bytes    0          0           2.5         2.5
c1.xlarge              1980 bytes    0.5        0           0           0.5
                       5940 bytes    0.5        0           0           0

Table 3.5: Number of duplicate ACKs (average over several runs)

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen that the RTT varies only a little across the different tests, most probably because the data sent does not stress the server enough. However, the last measurements of network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked. Nevertheless, the kind of instance was also important for improving the performance, and this was more noticeable when the number of data sources was high: the c1.xlarge instance removed a large part of the packet-loss problem compared with the t1.micro one.

After achieving these results, we can move on to the next step, in which we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and, later on, send this traffic multiplied machine-to-machine (M2M) towards the same server. A method was needed for performing a proper extraction so that the traffic could be generated again. To this end, we looked into several publications that explain how traffic generators create realistic traffic. From those publications we obtained three main characteristics for rebuilding the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns packet timestamps and packet time distribution [34][35]: the outbound packets must have a similar frequency, i.e. the length of time from one packet to the next must be as close as possible to the capture we want to replay. Finally, to recreate realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain the most significant characteristics of the packets from a pcap file. We used the files recorded at the proxy instance during the previous simulations. The next step was to program a Python script written especially to obtain the required features of every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.

To recreate the traffic, the script at first extracted the data of each packet one at a time, in order to resend them when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of being sent packet by packet, which is exactly the way the data source sent its data in the simulations.

This made the method much better suited to recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then starts to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. This made it easier to compare graphs and made the traffic recreation highly precise.

In the original simulations, where the proxy was present, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic, since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
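While the exact extraction script is not listed here, a reduced sketch of the idea, using dpkt as described above, could look as follows. It keeps only the data-carrying TCP segments sent by the client, drops the HTTP CONNECT and any non-TCP traffic such as the DNS queries, and joins segments that belong to the same burst; the 0.5-second gap used to separate bursts is an assumption.

    import socket
    import dpkt

    def extract_bursts(pcap_path, source_ip, gap=0.5):
        """Group the payload bytes sent by source_ip into (timestamp, data) bursts."""
        bursts = []
        with open(pcap_path, "rb") as f:
            for ts, buf in dpkt.pcap.Reader(f):
                eth = dpkt.ethernet.Ethernet(buf)
                if not isinstance(eth.data, dpkt.ip.IP):
                    continue                     # ignore ARP and other non-IP frames
                ip = eth.data
                if not isinstance(ip.data, dpkt.tcp.TCP):
                    continue                     # ignore the DNS (UDP) traffic seen at the proxy
                tcp = ip.data
                if socket.inet_ntoa(ip.src) != source_ip or not tcp.data:
                    continue                     # keep only data-carrying segments from the client
                if tcp.data.startswith(b"CONNECT"):
                    continue                     # drop the HTTP CONNECT used only to reach the proxy
                if bursts and ts - bursts[-1][0] < gap:
                    bursts[-1] = (bursts[-1][0], bursts[-1][1] + tcp.data)  # same burst: join the data
                else:
                    bursts.append((ts, tcp.data))                          # new burst
        return bursts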

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves to a file the information gathered from each data burst, together with its timestamp; Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. The second script then accesses the information contained in this file. Knowing this information, it is ready to replay the data in an accurate and timely manner, and with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and to send the packets accurately. It should be pointed out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
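As an illustration of this replaying step, the sketch below resends a list of (timestamp, data) bursts, such as the one produced by the extraction sketch in Section 4.1, while preserving the original spacing in time. The pickle file name, its format and the server address are placeholders; the real scripts use their own intermediate file.

    import pickle
    import socket
    import time

    def replay(bursts, server, port=50007):
        """Resend every recorded burst to the server, preserving the original spacing in time."""
        sock = socket.create_connection((server, port))
        start = time.time()
        first_ts = bursts[0][0]
        for ts, data in bursts:
            delay = (ts - first_ts) - (time.time() - start)   # time until this burst is due
            if delay > 0:
                time.sleep(delay)
            sock.sendall(data)
            received = 0
            while received < len(data):
                received += len(sock.recv(65536))             # drain the server's echo, as in the simulations
        sock.close()

    if __name__ == "__main__":
        with open("pattern.pkl", "rb") as f:                  # file written by the extraction script (assumed name)
            replay(pickle.load(f), "server.example.com")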

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw conclusions and check the accuracy of the method. To achieve this, we had to filter out the data sent from the data source to the proxy. These captured data were then replayed twice M2M by the second script, so that the total amount of data flowing in the network is the same as in the proxy scenario (where every byte crosses both the client-proxy and the proxy-server links), but now sent directly from client to server. This strategy also means the same amount of data is received back from the server, so the behaviour of the packets is very similar to the real case. An example of this approach is represented in Figure 4.1: the figure on the left shows the traffic in the simulations, with data source, proxy and server, while the figure on the right is the result of implementing the M2M strategy just mentioned. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of the traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between the simulation and the replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a point where they differ slightly, and from there on the graphs are not exactly the same. However, they still keep a very similar trajectory until the end, and the difference does not keep growing. Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server's response is something we could not control. Another important issue is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in the number of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2 by comparing the graph obtained in the simulation with the one achieved by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern on a large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we assess the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario for recreating the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to choose easily among different types of instance on which to deploy servers or clients. Once the server was running, we could proceed to pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
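One simple way to automate that recording step is to run tcpdump on the server over SSH and then copy the capture back, as in the illustrative sketch below. The user name, key file, interface and capture duration are assumptions, not the exact mechanism of the TaaS scripts; the host name is the one seen in Listing 3.4.

    import subprocess

    SERVER = "ec2-54-228-99-43.eu-west-1.compute.amazonaws.com"   # illustrative server name
    KEY = "thesis-key.pem"                                        # assumed SSH key file

    def record_capture(seconds=900, port=50007):
        """Run tcpdump on the server for a while, then fetch the capture for analysis."""
        remote = "timeout %d sudo tcpdump -i eth0 -w /tmp/capture.pcap port %d" % (seconds, port)
        subprocess.call(["ssh", "-i", KEY, "ubuntu@" + SERVER, remote])
        subprocess.call(["scp", "-i", KEY, "ubuntu@%s:/tmp/capture.pcap" % SERVER, "."])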

This TaaS infrastructure is very flexible and highly customizable. In the simulation, the data source characteristics can be easily modified: for instance, we can choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It is therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we can create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when recreating traffic to stress a server [37][38]. First of all, the traffic to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, the breaking point is examined to figure out the server's capacity.

5.2 Reproduce testing

The following is a description of the steps to follow in order to use the TaaS system created. First of all, we configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is recorded and automatically downloaded as a pcap file to the computer where Simulation.py is run. The second part concerns replaying the traffic pattern. First we start the script Servertoreplay.py to set up the same server used in the simulations, but now with the chance to pick the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
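A complete test run therefore boils down to invoking the scripts in the order just described. The sketch below shows that assumed sequence; the scripts' exact command-line arguments are not documented here, so none are passed.

    import subprocess

    # Assumed workflow: script names as used in the text; any command-line arguments are omitted.
    subprocess.call(["python", "Simulation.py"])      # run the client-proxy-server simulation and download the capture
    subprocess.call(["python", "Servertoreplay.py"])  # start the stand-alone server on the chosen instance type
    subprocess.call(["python", "Replaytraffic.py"])   # extract the pattern and multiply it towards the server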

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several recreations, scaling up the number of data sources. We started with one data source and increased it up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We can then compare the different results with each other and with the original simulation to extract interesting conclusions. The number of replayers was increased one at a time, every five seconds.
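The gradual ramp-up can be pictured with the following illustrative sketch, which starts one replayer every five seconds using a replay function like the one sketched in Section 4.2. In the real system each data source ran in its own instance or process rather than in a thread of a single machine.

    import threading
    import time

    def ramp_up(replay_fn, bursts, server, clients=80, step=5):
        """Start one traffic replayer every 'step' seconds until 'clients' of them run in parallel."""
        threads = []
        for _ in range(clients):
            t = threading.Thread(target=replay_fn, args=(bursts, server))  # e.g. the replay() sketch of Section 4.2
            t.start()
            threads.append(t)
            time.sleep(step)                                               # one new client every five seconds
        for t in threads:
            t.join()                                                       # wait until every replayer has finished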

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources, respectively. We can see that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at some moments several clients were not sending data, so there is a large difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of test, but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the red graph is the same but with up to 80 sources sending data. The black graph shows about twice as many bytes as the blue one in Figure 5.1, which is the expected result since the number of clients is twice as large as well. However, the red graph is not four times higher than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after 150 seconds the graph rises only a little, and slowly. In this test a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops rising after about 225 seconds. The communication therefore seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster; in this case, when the number of clients reaches about 30 (keeping in mind that there is a slight delay while the number of data sources is still increasing).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests with a higher-quality instance for the server; we used the type c1.xlarge. Looking at Figure 5.3 and comparing it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients, and it has peaks and a number of bytes similar to those in Figure 5.2. However, the recreation with 80 clients reaches much higher values using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph: although no peaks stand out, the average is considerably higher than in the other tests. This graph also appears smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher average RTT and the smoothness of the graph obtained; the rest of the tests show a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations
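The averages in Figure 5.4 were derived from the recorded captures. As a rough illustration of how per-segment RTT samples can be extracted from such a pcap file, the sketch below pairs each data segment sent by a client with the first server ACK that covers it, skipping retransmitted segments in the spirit of Karn's algorithm [22]; it is a simplification, not the exact analysis used for the figure.

    import socket
    import dpkt

    def ack_rtts(pcap_path, client_ip):
        """Estimate RTT samples: time between a client data segment and the ACK that covers it."""
        pending = {}    # expected acknowledgement number -> send timestamp
        rtts = []
        with open(pcap_path, "rb") as f:
            for ts, buf in dpkt.pcap.Reader(f):
                eth = dpkt.ethernet.Ethernet(buf)
                if not isinstance(eth.data, dpkt.ip.IP) or not isinstance(eth.data.data, dpkt.tcp.TCP):
                    continue
                ip, tcp = eth.data, eth.data.data
                if socket.inet_ntoa(ip.src) == client_ip and tcp.data:
                    expected = (tcp.seq + len(tcp.data)) & 0xFFFFFFFF
                    if expected in pending:
                        del pending[expected]            # retransmission: do not sample it
                    else:
                        pending[expected] = ts
                elif socket.inet_ntoa(ip.dst) == client_ip and tcp.flags & dpkt.tcp.TH_ACK:
                    if tcp.ack in pending:
                        rtts.append(ts - pending.pop(tcp.ack))
        return sum(rtts) / len(rtts) if rtts else None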

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instance tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server; however, the most important differences in the results are related to the number of clients sending traffic. In most cases, for one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is a clear change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 client   2 clients   10 clients   20 clients   80 clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handles the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we therefore built the scenario just mentioned in the Amazon cloud. We then tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and certain packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instance to analyze how this affected its performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, in order to assess the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicated ACKs, showed more varied values. These results showed a performance improvement in the network when using high-quality instances, and a deterioration as the number of clients rose.

When it came to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation; in this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become progressively worse: the similarity between both graphs was permanent. Therefore, with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes; this happens when we greatly increase the number of data sources. It has also been shown that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. More interesting were the packet loss results: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instance.

Overall, the results concerning the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not as satisfactory. It must be said that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instance used, and so on. Therefore, the good results, together with the flexibility and the many options this environment offers, demonstrate the usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other kinds of cloud tests, such as connectivity testing, security testing, compatibility testing, etc.
2. Test different types of servers, for instance an HTTP server.
3. Work out the cost of a test before starting it.
4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Selective Acknowledgment Permitted

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes   41
3.2 RTT with data bursts of 5940 bytes   42
3.3 Number of TCP retransmissions   42
3.4 Number of lost packets   43
3.5 Number of duplicate ACKs   43
5.1 Percentage of lost packets   53


List of Figures

1.1 Flow diagram of the developed system   10
2.1 OSI model   12
2.2 HTTP request   14
2.3 Fields of the IP header   16
2.4 Datagram fragmentation   18
2.5 ARP request   19
2.6 Ethernet layers in the OSI model   19
2.7 UDP protocol header   20
2.8 TCP protocol header   22
2.9 Establishing a connection in TCP   23
2.10 Sliding window method   24
2.11 Example RTT interval   25
2.12 Jitter effect   26
2.13 Relation between latency and bandwidth   27
2.14 Proxy operation   29
3.1 Structure client-server   31
3.2 Structure client-proxy-server   33
3.3 Bytes through the proxy with data bursts of 1980 bytes   37
3.4 Bytes through the proxy with data bursts of 5940 bytes   37
3.5 Structure for the simulation   38
3.6 Bytes through the proxy with data bursts of 1980 bytes   39
3.7 Bytes through the proxy with data bursts of 5940 bytes   39
3.8 Average RTT with 3 data sources   40
3.9 Average RTT with 10 data sources   41
4.1 Structure of the traffic replayed M2M   47
4.2 Comparison between the simulation and the replayed traffic   47
5.1 Number of bytes over time in different tests   51
5.2 Bytes using an m1.large instance for the server   51
5.3 Bytes using a c1.xlarge instance for the server   52
5.4 Average RTT extracted from the traffic recreations   52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing," 2011.
[2] J. Strickland, "How cloud computing works," http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing," http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] O. Elloumi, D. Boswarthick and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS Replay," https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 instances," http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. C. et al., "tshark - dump and analyze network traffic," http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite," http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.
[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to Tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.
[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol - HTTP/1.1," 1999.
[17] Tutorialspoint, "HTTP - Quick Guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol," 1980.
[21] S. Kollar, "Introduction to IPv6," 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] R. Sharpe, U. Lamping and E. Warnicke, "Wireshark user's guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump/libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language - official website," http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.
[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.
[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators?," in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 37: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

32 Loading test with proxy 35

rdquo6rdquo rdquo0 001390rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0xb33a AAAA ec2minus54minus228minus99minus43euminuswest minus1

compute amazonaws comrdquo

rdquo7rdquo rdquo0 002600rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo166rdquo rdquo

Standard query response 0xb33a rdquo

rdquo8rdquo rdquo0 002769rdquo rdquo10 235 11 67rdquo rdquo172 16 0 23rdquo rdquoDNSrdquo rdquo108rdquo rdquo

Standard query 0 xa3f9 A ec2minus54minus228minus99minus43euminuswest minus1compute

amazonaws comrdquo

rdquo9rdquo rdquo0 003708rdquo rdquo172 16 0 23rdquo rdquo10 235 11 67rdquo rdquoDNSrdquo rdquo124rdquo rdquo

Standard query response 0 xa3f9 A 1 0 2 2 4 8 3 2 1 rdquo

Finally the proxy sends also a packet with the flag SYN activated to set up the com-

munication between them two In this way the whole communication data source-proxy-

server is ready to work This exchange of packets is shown in the list 35

Listing 35 Establishing proxy-server connection

rdquo10rdquo rdquo0 003785rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo74rdquo rdquo33271

gt 50007 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK PERM=1

TSval=4294908736 TSecr=0 WS=128rdquo

rdquo11rdquo rdquo0 438963rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo74rdquo rdquo50007

gt 33271 [SYN ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460

SACK PERM=1 TSval=4294910381 TSecr =4294908736 WS=16rdquo

rdquo12rdquo rdquo0 439029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=4294908845

TSecr =4294910381rdquo

Then a HTTP10 200 OK connection established gets to the data source Therefore

now the connection is ready to start sending data In these simulations was decided to

send data from time to time with random time periods This makes the simulations be

more realistic since normally it is difficult to know when a client is going to communicate

with a server

The eight packets which compose the exchange of data between data source and server

are shown in the list 36

Listing 36 Exchange of data source-proxy-server

rdquo15rdquo rdquo0 466800rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo45125

36 Traffic Test

gt ndlminusaas [PSH ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval

=4294900280 TSecr =4294908845rdquo

rdquo16rdquo rdquo0 466813rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo66rdquo rdquo ndlminusaas gt 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval

=4294908852 TSecr =4294900280rdquo

rdquo17rdquo rdquo0 466975rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo71rdquo rdquo33271

gt 50007 [PSH ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval

=4294908852 TSecr =4294910381rdquo

rdquo18rdquo rdquo0 467901rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo50007

gt 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389

TSecr =4294908852rdquo

rdquo19rdquo rdquo0 468018rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo50007

gt 33271 [PSH ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval

=4294910389 TSecr =4294908852rdquo

rdquo20rdquo rdquo0 468029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852

TSecr =4294910389rdquo

rdquo21rdquo rdquo0 468083rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo71rdquo rdquo ndlminusaas gt 45125 [PSH ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval

=4294908852 TSecr =4294900280rdquo

rdquo22rdquo rdquo0 508799rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125

gt ndlminusaas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval

=4294900291 TSecr =4294908852rdquo

In this list 36 the packets with the PSH flag set to 1 denote that there is data being

sent in that segment [32] In these simulations the data source sent packets with data to

the server which simultaneously replayed with the same data to the data source Every

packet in the list 36 with the flag PSH activated is sending data First from data source

to proxy which forwards everything to the server And then all the way around sending

the data from server to data source

To test the performance of the scenario created many simulations were carried out

with different type of instances number of data sources and amount of data Each data

source was set in different instances and the number was scaled up from one up to ten

The network was firstly tested with a traffic load based on the sending of 1980 bytes of

data and later with a heavier load of 5940 bytes of data These loads were sent up to

32 Loading test with proxy 37

200 times with a random waiting time between them of either 1 or 2 seconds

The first simulations were carried out with only one data source. In this section we show the graphs with the number of bytes going through the proxy with just one client connected. Figure 3.3 represents the number of bytes over time in the proxy in the simulation with 1980 bytes of data, while Figure 3.4 represents the other simulation, with the heavier traffic load. The type of instance used is the same in both examples.

Figure 3.3: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.4: Bytes through the proxy with data bursts of 5940 bytes

As expected, the average number of bytes in Figure 3.4 is approximately three times bigger than in Figure 3.3. This makes sense, since the data sent is three times bigger as well (5940 = 3 × 1980), and therefore around triple the number of packets is needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the effect of the scale in the graph, but also because the frequency and amount of segments being sent in the second case is bigger.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To do so we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to use several instances, setting one client in each instance. Therefore the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5: Structure for simulation


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going over the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much bigger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since with ten data sources the frequency of packets being sent is high.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features, explained later. This becomes even more important in the last part of the thesis, when the number of clients is highly scaled up.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances; in Figure 3.9, there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

In these graphs there is no great difference, since the data sent did not represent a big problem for the network performance. However, we can appreciate that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, therefore the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance.
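One way to obtain such an average from a capture is sketched below: every data-carrying segment sent towards the server is matched with the first ACK that acknowledges it. The sketch uses the dpkt library that is introduced in Chapter 4, ignores retransmissions, and is only an approximation of how the RTT figures can be produced, not necessarily the exact procedure used here.

import socket
import dpkt

def average_rtt(pcap_path, server_ip):
    # expected ACK number -> timestamp at which the segment was sent
    outstanding = {}
    samples = []
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if socket.inet_ntoa(ip.dst) == server_ip and len(tcp.data) > 0:
                outstanding[tcp.seq + len(tcp.data)] = ts      # segment towards the server
            elif socket.inet_ntoa(ip.src) == server_ip and tcp.flags & dpkt.tcp.TH_ACK:
                sent = outstanding.pop(tcp.ack, None)
                if sent is not None:
                    samples.append(ts - sent)                  # one RTT sample
    return sum(samples) / len(samples) if samples else None

# e.g. average_rtt("simulation.pcap", "10.224.83.21")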


Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2, we do not see large differences. Moreover, the lower in the table, the shorter the period of time should be. However, this does not apply in every case, therefore the type of instance is not very significant in these cases. Speaking about RTT values, the simplest instance seems to be enough for this exchange of data. Concerning the number of clients, there is a slight difference, especially comparing the RTT between 5 or 10 data sources and only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type   1 source   3 sources   5 sources   10 sources
t1.micro               0.0031     0.0046      0.0033      0.0039
m1.large               0.0037     0.0035      0.0038      0.0032
c1.medium              0.0031     0.0035      0.0051      0.0048
c1.xlarge              0.0039     0.0043      0.0037      0.0042

Table 3.1: RTT (in seconds) with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which were retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest quality instance (t1.micro) seems to have more difficulties in the communication.

Server instance type   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               0.0026     0.0022      0.0021      0.0029
m1.large               0.0026     0.0024      0.0028      0.0024
c1.medium              0.0028     0.0031      0.0025      0.0030
c1.xlarge              0.0026     0.0029      0.0029      0.0024

Table 3.2: RTT (in seconds) with data bursts of 5940 bytes

With this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out concerning retransmitted packets.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with more difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance. Here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is not a very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the higher difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
                       5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
                       5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
                       5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
                       5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
                       5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
                       5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
                       5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little across the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance. This was more noticeable when the number of data sources was high: the c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. It was necessary to look for a method describing how to perform a proper extraction so as to generate the traffic again. To do so we looked into several documents and projects where it was explained how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as similarly as possible. The first one is the packet length [33][34]: this required packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamp and packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic, it was important to send the same number of packets [33].

4.1 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a script in Python made especially to obtain the features needed from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
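A minimal sketch of this per-packet extraction with dpkt is given below; the function and file names are illustrative and not the exact code of the thesis script.

import socket
import dpkt

def packet_features(pcap_path):
    # Yield (timestamp, frame length, TCP payload, source IP, destination IP) per packet
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            yield (ts, len(buf), tcp.data,
                   socket.inet_ntoa(ip.src), socket.inet_ntoa(ip.dst))

# e.g. for ts, length, payload, src, dst in packet_features("simulation.pcap"): ...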

To recreate the traffic, the script at first extracted the data of each packet one at a time, in order to resend them when replaying traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data together again. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the same way the data source sent packets in the simulations;

therefore this method was much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which were meaningless for recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
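Building on the per-packet sketch above, the grouping into bursts, the filtering of traffic that does not come from the data source, and the saving of the pattern could be expressed roughly as follows. The data-source address, the gap threshold used to separate bursts, and the output file format are assumptions made for illustration; Extractpattern.py may implement these steps differently.

import json

def extract_bursts(pcap_path, source_ip, gap=0.5, out_file="pattern.json"):
    # Collect, for every data burst, the time of its first packet relative to the
    # start of the capture and the concatenated payload of the burst.
    bursts = []
    first_ts = None
    current = b""
    burst_ts = None
    last_ts = None
    for ts, length, payload, src, dst in packet_features(pcap_path):
        if first_ts is None:
            first_ts = ts                 # reference: first packet of the capture
        if src != source_ip or not payload:
            continue                      # keep only data-carrying packets from the source
        if current and ts - last_ts > gap:
            bursts.append([burst_ts - first_ts, current.decode("latin-1")])
            current = b""
        if not current:
            burst_ts = ts                 # send time of the burst
        current += payload
        last_ts = ts
    if current:
        bursts.append([burst_ts - first_ts, current.decode("latin-1")])
    with open(out_file, "w") as f:
        json.dump(bursts, f)              # consumed later by the replay script
    return bursts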

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saves in a file the information gathered from each data burst, as well as its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, the program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets in an accurate manner. I have to point out that, in order to test the server, it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
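A simplified version of this replay step, under the same assumptions as the extraction sketch in Section 4.1 (bursts stored as pairs of relative send time and payload), might look as follows; the server address is a placeholder, and the real Replaytraffic.py may behave differently in detail.

import json
import socket
import time

def replay(pattern_file="pattern.json", server=("10.224.83.21", 50007)):
    # Resend every recorded burst with the original inter-burst timing, M2M
    with open(pattern_file) as f:
        bursts = json.load(f)
    sock = socket.create_connection(server)   # direct connection, no proxy in between
    start = time.time()
    try:
        for rel_ts, payload in bursts:
            delay = rel_ts - (time.time() - start)
            if delay > 0:
                time.sleep(delay)              # wait until the original send time
            data = payload.encode("latin-1")
            sock.sendall(data)                 # send the whole burst at once
            sock.recv(len(data))               # the server echoes the data back
    finally:
        sock.close()

if __name__ == "__main__":
    replay()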

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from data source to proxy. These sniffed data were replayed twice M2M with the second script, so that in the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy allows us to receive the same data from the server as well, therefore the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the traffic network in the simulations, with data source, proxy and server. The figure on the right is the result of implementing the strategy mentioned before, M2M. As we can see, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be appreciated in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not get worse and worse.

Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only in a few points do the graphs not match perfectly. This is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in amount of bytes sent and in duration. Therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was proven in Figure 4.2, comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern on a large scale are shown and analyzed, to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances where to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic in the server and finally downloaded the recording automatically for further analysis.
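As an illustration of how such instances can be started programmatically, a short sketch using the boto library [28] is shown below. The AMI identifier, region, key pair and security group are placeholders, and the TaaS scripts of the thesis may use different calls or parameters.

import boto.ec2

def launch_instances(count, instance_type="m1.large",
                     ami="ami-00000000", region="us-east-1"):
    # Start `count` EC2 instances to act as data sources (or a single server)
    conn = boto.ec2.connect_to_region(region)
    reservation = conn.run_instances(
        ami,
        min_count=count,
        max_count=count,
        instance_type=instance_type,
        key_name="taas-key",                # placeholder key pair
        security_groups=["taas-testing"],   # placeholder security group
    )
    return reservation.instances

# e.g. launch_instances(10, instance_type="c1.xlarge")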

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important points when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time with the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically, in a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session in the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations scaling up the number of data sources. We started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy loads of traffic. Then we can compare the different results and the original simulation to extract interesting conclusions. The number of players was increased one at a time, every five seconds.
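The staggered start of the players can be pictured with the following threading sketch, which reuses the replay() function sketched in Section 4.2. In the thesis each player actually runs in its own EC2 instance, so this sketch only illustrates the five-second stagger described above.

import threading
import time

def multiply(num_clients, stagger=5):
    # Start num_clients replay players, adding a new one every `stagger` seconds
    players = []
    for i in range(num_clients):
        t = threading.Thread(target=replay, name="player-%d" % i)
        t.start()
        players.append(t)
        time.sleep(stagger)
    for t in players:
        t.join()

# e.g. multiply(80) recreates the heaviest test of this chapter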

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line seems to be twice as high as the graph for one client; this is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests, but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same goes for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon. After 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, therefore the graph should go up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster: in this case, when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved running the same tests but with a higher quality instance for the server. We used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has similar peaks and number of bytes sent as in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance; the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph, where, despite there being no peaks standing out, the average is quite a bit higher than in the other tests. This graph appears smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the bigger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, for one or two clients there is not even one segment lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown the results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table informs us about the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up in different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be highly scaled up to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicated ACKs showed more diverse values. These results showed a performance improvement in the network when using high quality instances, and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse: the similarity between both graphs was permanent. Therefore, with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been shown that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used, and so on. Therefore the good results, along with the flexibility and many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK_PERM Selective ACK Permitted

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes
3.2 RTT with data bursts of 5940 bytes
3.3 Number of TCP retransmissions
3.4 Number of lost packets
3.5 Number of duplicate ACKs
5.1 Percentage of lost packets

List of Figures

1.1 Flow diagram of the developed system
2.1 OSI model
2.2 HTTP request
2.3 Fields of the IP header
2.4 Datagram fragmentation
2.5 ARP request
2.6 Ethernet layers in OSI model
2.7 UDP protocol header
2.8 TCP protocol header
2.9 Establishing a connection in TCP
2.10 Sliding window method
2.11 Example RTT interval
2.12 Jitter effect
2.13 Relation between latency and bandwidth
2.14 Proxy operation
3.1 Structure client-server
3.2 Structure client-proxy-server
3.3 Bytes through the proxy with data bursts of 1980 bytes
3.4 Bytes through the proxy with data bursts of 5940 bytes
3.5 Structure for simulation
3.6 Bytes through the proxy with data bursts of 1980 bytes
3.7 Bytes through the proxy with data bursts of 5940 bytes
3.8 Average RTT with 3 data sources
3.9 Average RTT with 10 data sources
4.1 Structure of traffic replayed M2M
4.2 Comparison between simulation and replayed traffic
5.1 Number of bytes over time in different tests
5.2 Bytes using an m1.large instance for the server
5.3 Bytes using a c1.xlarge instance for the server
5.4 Average RTT extracted from the traffic recreations

REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - quick guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump & libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 38: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

36 Traffic Test

gt ndlminusaas [PSH ACK] Seq=132 Ack=40 Win=14608 Len=5 TSval

=4294900280 TSecr =4294908845rdquo

rdquo16rdquo rdquo0 466813rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo66rdquo rdquo ndlminusaas gt 45125 [ACK] Seq=40 Ack=137 Win=15616 Len=0 TSval

=4294908852 TSecr =4294900280rdquo

rdquo17rdquo rdquo0 466975rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo71rdquo rdquo33271

gt 50007 [PSH ACK] Seq=1 Ack=1 Win=14720 Len=5 TSval

=4294908852 TSecr =4294910381rdquo

rdquo18rdquo rdquo0 467901rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo50007

gt 33271 [ACK] Seq=1 Ack=6 Win=14480 Len=0 TSval=4294910389

TSecr =4294908852rdquo

rdquo19rdquo rdquo0 468018rdquo rdquo10 224 83 21rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo71rdquo rdquo50007

gt 33271 [PSH ACK] Seq=1 Ack=6 Win=14480 Len=5 TSval

=4294910389 TSecr =4294908852rdquo

rdquo20rdquo rdquo0 468029rdquo rdquo10 235 11 67rdquo rdquo10 224 83 21rdquo rdquoTCPrdquo rdquo66rdquo rdquo33271

gt 50007 [ACK] Seq=6 Ack=6 Win=14720 Len=0 TSval=4294908852

TSecr =4294910389rdquo

rdquo21rdquo rdquo0 468083rdquo rdquo10 235 11 67rdquo rdquo10 34 252 34rdquo rdquoTCPrdquo rdquo71rdquo rdquo ndlminusaas gt 45125 [PSH ACK] Seq=40 Ack=137 Win=15616 Len=5 TSval

=4294908852 TSecr =4294900280rdquo

rdquo22rdquo rdquo0 508799rdquo rdquo10 34 252 34rdquo rdquo10 235 11 67rdquo rdquoTCPrdquo rdquo66rdquo rdquo45125

gt ndlminusaas [ACK] Seq=137 Ack=45 Win=14608 Len=0 TSval

=4294900291 TSecr =4294908852rdquo

In this list 36 the packets with the PSH flag set to 1 denote that there is data being

sent in that segment [32] In these simulations the data source sent packets with data to

the server which simultaneously replayed with the same data to the data source Every

packet in the list 36 with the flag PSH activated is sending data First from data source

to proxy which forwards everything to the server And then all the way around sending

the data from server to data source

To test the performance of the scenario created many simulations were carried out

with different type of instances number of data sources and amount of data Each data

source was set in different instances and the number was scaled up from one up to ten

The network was firstly tested with a traffic load based on the sending of 1980 bytes of

data and later with a heavier load of 5940 bytes of data These loads were sent up to

32 Loading test with proxy 37

200 times with a random waiting time between them of either 1 or 2 seconds

The first simulations were carried out with only one data source In this section we

show the graphs with the number of bytes going through the proxy just with one client

connected The Figure 33 represents the number of bytes over time in the proxy in the

simulation with 1980 bytes of data Furthermore the Figure 34 represents the other

simulation with heavier traffic load The type of instance used is the same in both

examples

Figure 33 Bytes through the proxy with data burst of 1980 bytes

Figure 34 Bytes through the proxy with data burst of 5940 bytes

As expected the average of bytes in the Figure 34 is approximately three times bigger

than in Figure 33 This makes sense since the data sent is three times bigger as well

38 Traffic Test

therefore is needed around triple number of packets Other issue to point out is that the

Figure 34 is smoother than the other one This is not only due to the effect of the scale

in the graph but also because the frequency and amount of segments being sent in the

second case is bigger

33 Loading test with several clients

After the simulations with one client it was time to test the server harder A realistic

way to do so is simulating the connection of several clients To do so we created a similar

environment but in this case with a variable amount of data sources All this scenario is

created with a python script as the environment used previously for one client At this

point the server was tested with up to ten data sources The scheme is shown in the

Figure 35 Using the Amazon cloud it is possible to use instances setting one client in

each instance Therefore proxy receives packets from different IP addresses as would be

in a real case

Figure 35 Structure for simulation

33 Loading test with several clients 39

The next two graphs represent the same two kinds of simulation developed in the

previous section but in this case with ten data sources the maximum number tested

The Figure 36 shows the number of bytes going over the proxy with data burst of 1980

bytes as well as Figure 37 does with data burst of 5940 bytes

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes

40 Traffic Test

The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyze in several ways First we

looked into the RTT values with different number of clients and then we analyzed other

important features explained later This becomes even more important in the last part

of the thesis when the number of clients is highly scaled up

First of all we compare two graphs which represent the average RTT of two simulations

differing only in the number of data sources For the Figure 38 packets were being sent

to the server from three different instances however in the Figure 39 there were up to

ten data sources working

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference since the data sent did not represent a big

problem in the network performance However we can appreciate that the lowest value

during the traffic exchange (2 ms approximately) last much longer in the Figure 38 In

the other graph there are many higher peaks therefore the RTT in this case is slightly

superior As expected the more clients the bigger congestion in the network and the

longer RTT

For every simulation the average RTT was calculated to very to what extent different

amount of data number of clients and type of instance affect to the network performance

34 Performance results 41

Figure 39 Average RTT with 10 data sources

As was mentioned before the RTT does not vary greatly If we look over the Table

31 and 32 we do not see large differences Moreover the lower in the table the shorter

period of time there should be However this does not apply in every case therefore the

type of instance is not very remarkable in these cases The simplest instance seems to be

enough for these exchange of data speaking about RTT values Concerning the number

of clients there is a slight difference especially comparing the RTT between 5 or 10 data

sources with only one But in general the results are quite similar because this amount

of packets do not represent serious problem for the network

Server instance type 1 source 3 sources 5 sources 10 sources

t1micro 00031 00046 00033 00039

m1large 00037 00035 00038 00032

c1medium 00031 00035 00051 00048

c1xlarge 00039 00043 00037 00042

Table 31 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet

loss TCP retransmissions and duplicate ACK It is remarkable that in this tests the

results were much more diverse The results show an average of packets since several

simulations were carried out for each case

In the Table 33 we have the average number of TCP packets which have been re-

transmitted in each type of simulation The number is low and the tests with 10 data

sources have more retransmissions Moreover examining this table by type of instance

the lowest quality instance (t1micro) seems to have more difficulties in the communica-

42 Traffic Test

Server instance type 1 Source 3 Sources 5 Sources 10 Sources

t1micro 00026 00022 00021 00029

m1large 00026 00024 00028 00024

c1medium 00028 00031 00025 00030

c1xlarge 00026 00029 00029 00024

Table 32 RTT with data bursts of 5940 bytes

tion With this instance the number of retransmissions is bigger Either way there is no

simulation that stands out concerning packets resent

The Table 34 shows packet losses Here the differences among the tests carried out

are considerably wider As expected the worst simulation with more difficulties in the

communication was the one with 10 data sources heaviest data burst and worst instance

Here there is an average of up to 67 lost packets Moreover we can appreciate how the

heaviest data burst is starting to create problems because there are many more losses

than in simulations with only 1980 bytes Every instances give better results than the

t1micro one Nevertheless there is no a very significant gap among these three instances

(m1large c1medium c1xlarge) The most important result in this tale concerns the

growth of packet loss as the number of data sources increases as well

Finally in the Table 35 we can check how many ACK were duplicated In this case

there are barely problems with the c1xlarge instance unlike with t1micro The table also

indicates the higher difficulty to send traffic properly with many data sources Finally

it must be pointed out that in these simulations a c1xlarge instance is enough to avoid

problems in the communication

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

15

0

0

0

2

0

2

m1large1980 bytes

5940 bytes

0

0

0

25

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1xlarge1980 bytes

5940 bytes

0

0

0

0

0

0

0

25

Table 33 Number of TCP retransmissions

Overall in this chapter the traffic pattern was analyzed and the network was tested

34 Performance results 43

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

2

0

6

0

15

0

67

m1large1980 bytes

5940 bytes

0

0

0

55

0

1

0

36

c1medium1980 bytes

5940 bytes

0

0

0

7

0

135

2

505

c1xlarge1980 bytes

5940 bytes

0

05

0

5

0

9

5

545

Table 34 Number of lost packets

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

3

1

1

0

75

65

25

m1large1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

25

45

25

c1xlarge1980 bytes

5940 bytes

05

05

0

0

0

0

05

0

Table 35 Number of duplicate ACK

to find out the influence of diverse factors in its performance We have seen how the

RTT vary just a little in the different tests Most probably due to the fact that the data

sent does not stress enough the server However in the last analysis the values measured

related with network performance gave more interesting results For example when it

comes to stress the network the number of clients is more significant than the type of

instance picked out Nevertheless the kind of instance was also important in to improve

the performance This was more noticeable when the number of data sources was high

The c1large instance solved a large part of the packet losses problem compared with the

t1micro one

After achieving these results we can move on to the next step where we extracted

the traffic pattern from these simulations All the process is explained in the following

chapter

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and send this traffic

multiplied M2M (Machine-to-Machine) towards the same server later on It was needed

to look up a method to know how to develop a proper extraction to generate the traffic

again To do so we looked into several documents with projects where was explained

how traffic generators create realistic traffic From those publications we obtained three

main characteristics to rebuild the traffic as similar as possible The first one is the packet

length [33][34] this required packets to be created with same amount of data as well as

equally long headers The second feature concerns packet timestamp and packet time

distribution [34][35] It was needed that the outbound packets had similar frequency the

length of time from one packet to the next must be as similar as possible to the capture

we wanted to replay Finally to create a realistic traffic network it was significant to

send the same number of packets [33]

41 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics

of the packets We used the files recorded during the previous simulations in the proxy

instance The next step was to program a script in python made especially to obtain the

features needed from every packet The best option to make this possible was a python

library called dpkt [36] Using this library a script was written to collect the required

data from a pcap file such as packet time stamp length and data sent

To recreate the traffic the script had to extract the data of each packet one at a time in

order to resend them when replaying traffic However after some tests it was found out

that was much more accurate to gather the data from all the packets involved in one data

burst and put that data together again This way when it comes to replay the traffic

all the data contained in one burst is sent at once instead of sending the data packet by

packet This is exactly the same way the data source sent packets in the simulations

45

46 Traffic pattern extraction

therefore this method was much better to recreate the traffic pattern obtained Moreover

the script worked out the length of time elapsed from the first packet captured in the

simulation until the data source sent the first packet with data This was very helpful

when replaying the same capture since the data source started to replay the data at the

same time The script also extracted the timestamp when every data burst was sent

Therefore it was easier to compare graphs and the traffic recreation was highly precise

In the original simulations where the proxy was set a few protocols were needed to

establish the communication over this proxy These were a couple of HTTP and a few

DNS segments which were meaningless to recreate M2M traffic since there is no proxy in

between The best solution was to filter them out with the script written to extract the

pattern These segments would be very difficult to recreate and they are not noticeable

during the simulation due to their very low weight

42 Replaying traffic pattern

After analyzing the pcap file to obtain data sent and timestamps the same script must

save all this information so that a second script can use it to replay the packets properly

The first script saved in a file the information gathered from one data burst as well as the

timestamp Extractpatternpy is the script that obtains all this information and saves

it in the file mentioned Then the second script access to the information contained in

this file Knowing this information this program is ready to replay the data in the most

accurate and timely manner as well as with the same number of packets The script

resent the traffic using socket programming in python like in the simulations Both the

file and this script were deployed in an instance from where the packets were sent to the

server The script used to resend the traffic must be simple in order to run as fast as

possible and send the packets in an accurate manner I have to point out that in order to

test the server it is only necessary to run the second script named Replaytrafficpy since

it calls automatically the first script (Extractpatternpy) to obtain the traffic pattern

When replaying traffic M2M it was important to recreate the same traffic load than

in the original capture With this approach we could compare them to draw important

conclusions and check the accuracy of the method carried out To achieve this we had

to filter the data sent from data source to proxy These very data sniffed were replayed

twice M2M with the second script so that in the whole network we are sending the same

amount of data but in this case directly from client to server This strategy allows to

receiving the same data from the server as well therefore the behaviour of the packets

was very similar to the real case An example of this approach is represented in the Figure

41 The figure on the left shows the traffic network in the simulations with data source

proxy and server Furthermore the figure on the right is the result of implementing

the strategy mentioned before M2M As we can see the amount of data and number of

packets is the same

The results of following this strategy are shown in the Figure 42 In the graphs we

42 Replaying traffic pattern 47

Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method to obtain a pattern from the traffic generated

in the simulations Moreover the high precision of this approach was proven in the Figure

42 comparing the graph obtained in the simulation with another achieve recreating the

pattern obtained Therefore we can move on to the last part of the thesis where we

multiply this pattern to extract important results related with the server

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained Then re-

markable results obtained from recreating the pattern in large scale are shown and

analyzed to find out how the server responses to certain quantities of data With the

values achieved we demonstrate the reliability of the TaaS [3] system created

51 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
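As an illustration, the sketch below shows how such an instance could be started from Python with the boto library listed in the references [28]; the region, AMI identifier, key pair and security group names are placeholders rather than values taken from the thesis.

    import time
    import boto.ec2

    # Connect to a region and launch the server on a chosen instance type.
    conn = boto.ec2.connect_to_region('eu-west-1')
    reservation = conn.run_instances(
        'ami-xxxxxxxx',               # placeholder AMI prepared with the server script
        instance_type='c1.xlarge',    # e.g. t1.micro, m1.large, c1.medium, c1.xlarge
        key_name='taas-key',
        security_groups=['taas-sg'])
    server = reservation.instances[0]
    while server.update() != 'running':
        time.sleep(5)                 # wait until EC2 reports the instance as running
    print(server.ip_address)          # address the data sources will send traffic to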

This TaaS infrastructure was very flexible and highly customizable. In the simulations the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It was therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same in the simulations and in the traffic recreations.
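A minimal sketch of such a configurable data source is given below. The constant names and the server address are assumptions made for illustration; the real Client.py may be organised differently. The burst size, repetition count and waiting interval shown are the values used later in Section 5.3.

    import random
    import socket
    import time

    # Hypothetical configuration knobs (names assumed, not taken from Client.py)
    SERVER = ('server-address', 8000)   # placeholder address and port
    DATA_SIZE = 3960                    # bytes per data burst
    REPETITIONS = 400                   # number of bursts to send
    MIN_WAIT, MAX_WAIT = 1.0, 3.0       # random pause between bursts, in seconds

    def run_data_source():
        sock = socket.create_connection(SERVER)
        payload = b'x' * DATA_SIZE
        for _ in range(REPETITIONS):
            sock.sendall(payload)                        # one burst towards the server
            sock.recv(65536)                             # read whatever the server answers
            time.sleep(random.uniform(MIN_WAIT, MAX_WAIT))
        sock.close()

    if __name__ == '__main__':
        run_data_source()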

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when recreating traffic to stress a server [37][38]. First of all, the traffic to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, the breaking point is examined to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and downloaded automatically as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
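The sequence below sketches these steps as they might be driven from Python. It assumes the scripts are configured by editing their constants beforehand and are launched without command-line arguments, which the thesis does not specify.

    import subprocess

    # Part 1: run the client-proxy-server simulation; a pcap recording of the
    # session is downloaded automatically to this computer.
    subprocess.check_call(['python', 'Simulation.py'])

    # Part 2: start the server on the chosen instance type, then replay the
    # extracted pattern M2M (Replaytraffic.py calls Extractpattern.py itself).
    subprocess.check_call(['python', 'Servertoreplay.py'])
    subprocess.check_call(['python', 'Replaytraffic.py'])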

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several tests, scaling up the number of data sources: we started with one data source and increased up to 80, which was considered enough clients to create heavy traffic loads. We could then compare the different results with the original simulation to extract interesting conclusions. The number of replaying clients was increased one at a time, every five seconds.
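The ramp-up can be pictured with the small sketch below, which starts one replaying client every five seconds until the requested number is reached. Here replay_client stands for whatever entry point the replay script exposes and is an assumption of this sketch.

    import threading
    import time

    def ramp_up(num_clients, replay_client, spacing=5.0):
        # Start one extra data source every `spacing` seconds, as in the tests.
        threads = []
        for i in range(num_clients):
            t = threading.Thread(target=replay_client, args=(i,))
            t.start()
            threads.append(t)
            time.sleep(spacing)
        for t in threads:
            t.join()

    # e.g. ramp_up(80, replay_client) for the heaviest test in this section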

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at some moments several clients were not sending data, so there are larger swings in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph shows about double the amount of bytes of the blue one in Figure 5.1, an expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation, problems sending data appear very soon: after 150 seconds the graph rises only a little and slowly. In this test a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops rising after about 225 seconds. The communication therefore seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster, in this case when the number of clients reaches about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between instance types. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is again a limit for the data sent in this figure; here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analysing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as the previous figures showed. The most noticeable difference is in the 80-client graph where, despite there being no peaks that stand out, the average is quite a bit higher than in the other tests. This graph also appears smoother, because packets come and go more often than in the rest, so there cannot be large variations. Therefore the main characteristics when recreating many clients are the larger RTT average and the smoothness of the graph obtained; the rest of the tests have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations
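For reference, average RTT figures like the ones in Figure 5.4 can be approximated from a capture by pairing each data segment with the first acknowledgement that covers it. The sketch below does this with dpkt [36]; the capture filename and server port are placeholders, and retransmissions are ignored, so this is a simplification rather than the exact procedure used in the thesis.

    import dpkt

    def average_rtt(pcap_path, server_port=8000):
        # Map "expected ACK number" -> time the data segment was sent.
        pending = {}
        samples = []
        with open(pcap_path, 'rb') as f:
            for ts, buf in dpkt.pcap.Reader(f):
                eth = dpkt.ethernet.Ethernet(buf)
                ip = eth.data
                if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                    continue
                tcp = ip.data
                if tcp.dport == server_port and len(tcp.data) > 0:
                    # Data segment towards the server: remember when it left.
                    pending.setdefault(tcp.seq + len(tcp.data), ts)
                elif tcp.sport == server_port and tcp.flags & dpkt.tcp.TH_ACK:
                    # ACK from the server: close the matching sample, if any.
                    if tcp.ack in pending:
                        samples.append(ts - pending.pop(tcp.ack))
        return sum(samples) / len(samples) if samples else None

    print(average_rtt('capture.pcap'))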


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four instance types tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is a noticeable change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap in packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results for the number of bytes over time, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about packet delivery: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we therefore built the scenario just mentioned in the Amazon Cloud. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were presented and explained in order to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes grew correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicated ACKs, showed more varied values. These results showed a performance improvement in the network when using higher quality instances and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not deteriorate over time: the similarity between both graphs was constant, so with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been shown how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instances.

Overall, the results concerning the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not so satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of a test before starting it.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1  RTT with data bursts of 1980 bytes   41
3.2  RTT with data bursts of 5940 bytes   42
3.3  Number of TCP retransmissions   42
3.4  Number of lost packets   43
3.5  Number of duplicate ACK   43
5.1  Percentage of lost packets   53


List of Figures

1.1  Flow diagram of the developed system   10
2.1  OSI model   12
2.2  HTTP request   14
2.3  Fields of the IP Header   16
2.4  Datagram fragmentation   18
2.5  ARP request   19
2.6  Ethernet layers in OSI model   19
2.7  UDP protocol header   20
2.8  TCP protocol header   22
2.9  Establishing a connection in TCP   23
2.10 Sliding window method   24
2.11 Example RTT interval   25
2.12 Jitter effect   26
2.13 Relation between Latency and Bandwidth   27
2.14 Proxy operation   29
3.1  Structure client server   31
3.2  Structure client proxy server   33
3.3  Bytes through the proxy with data burst of 1980 bytes   37
3.4  Bytes through the proxy with data burst of 5940 bytes   37
3.5  Structure for simulation   38
3.6  Bytes through the proxy with data burst of 1980 bytes   39
3.7  Bytes through the proxy with data burst of 5940 bytes   39
3.8  Average RTT with 3 data sources   40
3.9  Average RTT with 10 data sources   41
4.1  Structure of traffic replayed M2M   47
4.2  Comparison between simulation and replayed traffic   47
5.1  Number of bytes over time in different tests   51
5.2  Bytes using a m1.large instance for the server   51
5.3  Bytes using a c1.xlarge instance for the server   52
5.4  Average RTT extracted from the traffic recreations   52


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.
[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.
[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.
[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol - HTTP/1.1", 1999.
[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol", 1980.
[21] S. Kollar, "Introduction to IPv6", 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.
[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.
[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 40: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

38 Traffic Test

therefore around three times as many packets are needed. Another issue to point out is that Figure 3.4 is smoother than the other one. This is not only due to the scale of the graph, but also because the frequency and the amount of segments sent in the second case are higher.

3.3 Loading test with several clients

After the simulations with one client, it was time to test the server harder. A realistic way to do so is to simulate the connection of several clients. To this end we created a similar environment, but in this case with a variable number of data sources. The whole scenario is created with a Python script, like the environment used previously for one client; a sketch of how such client instances could be launched is given after Figure 3.5. At this point the server was tested with up to ten data sources. The scheme is shown in Figure 3.5. Using the Amazon cloud it is possible to set one client in each instance, so the proxy receives packets from different IP addresses, as would happen in a real case.

Figure 3.5: Structure for simulation
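The provisioning code itself is not listed here; as an illustration only, the following is a minimal sketch of how a set of client instances could be started with the boto library [28] used in this work. The region, AMI id, key pair and security group are placeholders, not the values used in the thesis.

# Hedged sketch: start one EC2 instance per data source with boto (boto 2.x API).
# Region, AMI id, key pair and security group below are placeholders.
import boto.ec2

def launch_clients(num_clients, instance_type='t1.micro'):
    conn = boto.ec2.connect_to_region('eu-west-1')
    reservation = conn.run_instances(
        'ami-00000000',                 # placeholder image with the client script installed
        min_count=num_clients,
        max_count=num_clients,
        instance_type=instance_type,
        key_name='taas-key',            # placeholder key pair
        security_groups=['taas'])       # placeholder security group
    return reservation.instances

instances = launch_clients(10)          # e.g. ten data sources, one per instance
print([i.id for i in instances])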


The next two graphs represent the same two kinds of simulation developed in the previous section, but in this case with ten data sources, the maximum number tested. Figure 3.6 shows the number of bytes going through the proxy with data bursts of 1980 bytes, and Figure 3.7 does the same with data bursts of 5940 bytes.

Figure 3.6: Bytes through the proxy with data bursts of 1980 bytes

Figure 3.7: Bytes through the proxy with data bursts of 5940 bytes


Figure 3.7 shows a much larger amount of bytes exchanged compared to Figure 3.6, due to the heavier data load sent from the data sources. Both graphs are quite smooth, since the frequency at which packets are sent is high with ten data sources.

3.4 Performance results

In this section the performance of the network was analyzed in several ways. First we looked into the RTT values with different numbers of clients, and then we analyzed other important features explained later. This becomes even more important in the last part of the thesis, when the number of clients is scaled up considerably.

First of all we compare two graphs which represent the average RTT of two simulations differing only in the number of data sources. For Figure 3.8, packets were being sent to the server from three different instances, whereas in Figure 3.9 there were up to ten data sources working.

Figure 3.8: Average RTT with 3 data sources

There is no great difference between these graphs, since the data sent did not pose a big problem for the network performance. However, we can see that the lowest value during the traffic exchange (approximately 2 ms) lasts much longer in Figure 3.8. In the other graph there are many higher peaks, so the RTT in this case is slightly higher. As expected, the more clients, the bigger the congestion in the network and the longer the RTT.

For every simulation the average RTT was calculated, in order to verify to what extent different amounts of data, numbers of clients and types of instance affect the network performance; a sketch of how such an average could be computed from a capture is given below.
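The measurement code is not reproduced in this section; purely as an illustration, the following is a minimal sketch, assuming dpkt [36] and a capture taken on the sending side, of how an average RTT could be estimated by pairing each outgoing data segment with the first acknowledgement that covers it. The server address and file name are placeholders.

# Hedged sketch: estimate the average RTT from a pcap file by matching TCP data
# segments to the first ACK covering them (dpkt API; sequence wrap-around ignored).
import socket
import dpkt

SERVER_IP = socket.inet_aton('10.0.0.1')      # placeholder server address

def average_rtt(pcap_path):
    pending = {}                              # expected ack number -> send timestamp
    samples = []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            ip = dpkt.ethernet.Ethernet(buf).data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if ip.dst == SERVER_IP and len(tcp.data) > 0:
                pending.setdefault(tcp.seq + len(tcp.data), ts)   # outgoing data segment
            elif ip.src == SERVER_IP and tcp.flags & dpkt.tcp.TH_ACK:
                for expected in [e for e in pending if e <= tcp.ack]:
                    samples.append(ts - pending.pop(expected))    # ACK closes the segment
    return sum(samples) / len(samples) if samples else None

print(average_rtt('capture.pcap'))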


Figure 3.9: Average RTT with 10 data sources

As was mentioned before, the RTT does not vary greatly. If we look over Tables 3.1 and 3.2 we do not see large differences. Moreover, the lower a row is in the table, the shorter the RTT should be; however, this does not apply in every case, so the type of instance is not very significant here. As far as RTT values are concerned, the simplest instance seems to be enough for these exchanges of data. Concerning the number of clients there is a slight difference, especially when comparing the RTT for 5 or 10 data sources with that for only one. But in general the results are quite similar, because this amount of packets does not represent a serious problem for the network.

Server instance type    1 source    3 sources    5 sources    10 sources
t1.micro                0.0031      0.0046       0.0033       0.0039
m1.large                0.0037      0.0035       0.0038       0.0032
c1.medium               0.0031      0.0035       0.0051       0.0048
c1.xlarge               0.0039      0.0043       0.0037       0.0042

Table 3.1: RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet loss, TCP retransmissions and duplicate ACKs. It is remarkable that in these tests the results were much more diverse. The results show an average number of packets, since several simulations were carried out for each case.

In Table 3.3 we have the average number of TCP packets which have been retransmitted in each type of simulation. The number is low, and the tests with 10 data sources have more retransmissions. Moreover, examining this table by type of instance, the lowest-quality instance (t1.micro) seems to have more difficulties in the communication.


Server instance type    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                0.0026      0.0022       0.0021       0.0029
m1.large                0.0026      0.0024       0.0028       0.0024
c1.medium               0.0028      0.0031       0.0025       0.0030
c1.xlarge               0.0026      0.0029       0.0029       0.0024

Table 3.2: RTT with data bursts of 5940 bytes

With this instance the number of retransmissions is bigger. Either way, there is no simulation that stands out as far as retransmitted packets are concerned.

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the simulation with the most difficulties in the communication was the one with 10 data sources, the heaviest data burst and the worst instance; here there is an average of up to 67 lost packets. Moreover, we can see how the heaviest data burst starts to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also shows that it is harder to send traffic properly with many data sources. It must also be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
t1.micro                5940 bytes    1.5         0            2            2
m1.large                1980 bytes    0           0            0            0
m1.large                5940 bytes    0           2.5          0            0
c1.medium               1980 bytes    0           0            0            0
c1.medium               5940 bytes    0           0            0            0
c1.xlarge               1980 bytes    0           0            0            0
c1.xlarge               5940 bytes    0           0            0            2.5

Table 3.3: Number of TCP retransmissions

Overall, in this chapter the traffic pattern was analyzed and the network was tested in order to find out the influence of diverse factors on its performance.


Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           0            0            0
t1.micro                5940 bytes    2           6            15           67
m1.large                1980 bytes    0           0            0            0
m1.large                5940 bytes    0           5.5          1            36
c1.medium               1980 bytes    0           0            0            2
c1.medium               5940 bytes    0           7            13.5         50.5
c1.xlarge               1980 bytes    0           0            0            5
c1.xlarge               5940 bytes    0.5         5            9            54.5

Table 3.4: Number of lost packets

Server instance type    Data burst    1 Source    3 Sources    5 Sources    10 Sources
t1.micro                1980 bytes    0           1            0            6.5
t1.micro                5940 bytes    3           1            7.5          2.5
m1.large                1980 bytes    0           0            0            0
m1.large                5940 bytes    0           0            0            0
c1.medium               1980 bytes    0           0            0            4.5
c1.medium               5940 bytes    0           0            2.5          2.5
c1.xlarge               1980 bytes    0.5         0            0            0.5
c1.xlarge               5940 bytes    0.5         0            0            0

Table 3.5: Number of duplicate ACK

We have seen how the RTT varies just a little across the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured in relation to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance; this was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed to find a method for carrying out a proper extraction so that the traffic could be generated again. To do so we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets had to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and packet time distribution [34][35]: the outbound packets had to have a similar frequency, i.e. the length of time from one packet to the next must be as similar as possible to the capture we wanted to replay. Finally, to create realistic network traffic it was important to send the same number of packets [33].

4.1 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to program a Python script made especially to obtain the required features from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent; a minimal sketch of this collection step is shown below.
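Extractpattern.py itself is not reproduced here; the following is only a minimal sketch of how dpkt [36] can be used to collect the timestamp, the length and the payload of every TCP packet in a capture.

# Hedged sketch: collect (timestamp, source port, destination port, payload) for
# every TCP packet contained in a pcap file, using dpkt.
import dpkt

def collect_packets(pcap_path):
    records = []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            ip = dpkt.ethernet.Ethernet(buf).data
            if isinstance(ip, dpkt.ip.IP) and isinstance(ip.data, dpkt.tcp.TCP):
                tcp = ip.data
                records.append((ts, tcp.sport, tcp.dport, tcp.data))
    return records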

To recreate the traffic, the script initially had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it turned out to be much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of being sent packet by packet. This is exactly the same way the data source sent packets in the simulations;


therefore this method was much better suited to recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then started to replay the data at the same point in time. The script also extracted the timestamp at which every data burst was sent. This made it easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was in place, a few extra protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern (a sketch of this grouping and filtering is given at the end of this section). These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
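The grouping and filtering just described could look roughly like the sketch below. It is not the actual Extractpattern.py: the port numbers used for filtering and the gap that separates two bursts are assumptions made only for illustration, and the records are assumed to be the tuples produced by the previous sketch, already limited to the data source to proxy direction.

# Hedged sketch: drop DNS and proxy-related HTTP segments and merge consecutive
# payloads into data bursts, keeping the offset of each burst from the first packet.
SKIP_PORTS = {53, 80}        # assumption: ports of the segments to filter out
BURST_GAP = 0.5              # assumption: payloads closer than 0.5 s form one burst

def build_bursts(records):
    # records: (timestamp, source port, destination port, payload) tuples
    bursts = []              # list of (offset from first captured packet, burst payload)
    first_ts = records[0][0] if records else 0.0
    last_ts = None
    for ts, sport, dport, payload in records:
        if not payload or sport in SKIP_PORTS or dport in SKIP_PORTS:
            continue
        if bursts and last_ts is not None and ts - last_ts < BURST_GAP:
            offset, data = bursts[-1]
            bursts[-1] = (offset, data + payload)       # extend the current burst
        else:
            bursts.append((ts - first_ts, payload))     # start a new burst
        last_ts = ts
    return bursts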

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing this information, it is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations; a sketch of such a replay loop is shown below. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple, in order to run as fast as possible and send the packets accurately. It should be pointed out that, in order to test the server, it is only necessary to run the second script, Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
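As an illustration of the replay side, the following is a minimal sketch of how the saved bursts could be resent with Python sockets at their recorded offsets. It is not the actual Replaytraffic.py, and the server address and port are placeholders.

# Hedged sketch: resend the extracted bursts towards the server, respecting the
# offset at which each burst was originally sent. Address and port are placeholders.
import socket
import time

SERVER = ('10.0.0.1', 5000)

def replay(bursts):
    # bursts: list of (offset in seconds, burst payload) pairs from the extraction step
    sock = socket.create_connection(SERVER)
    start = time.time()
    for offset, payload in bursts:
        delay = offset - (time.time() - start)
        if delay > 0:
            time.sleep(delay)              # wait until this burst is due
        sock.sendall(payload)              # send the whole burst at once
        sock.recv(65536)                   # read the server reply, as in the simulations
    sock.close()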

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw important conclusions and check the accuracy of the method carried out. To achieve this we had to filter the data sent from the data source to the proxy. This sniffed data was replayed twice M2M with the second script, so that over the whole network we send the same amount of data, but in this case directly from client to server. This strategy also allows the same data to be received from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The figure on the left shows the traffic in the simulations, with data source, proxy and server. The figure on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets are the same.

The results of following this strategy are shown in Figure 4.2.


Figure 4.1: Structure of traffic replayed M2M

In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there on the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing.


Therefore this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly; this is because the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in amount of bytes sent and in duration, so this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running we could proceed to pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
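The automation of this capture is not listed in the thesis; a possible sketch, assuming SSH access to the server instance and tcpdump [25] installed on it, is the following. The host, key file and paths are placeholders.

# Hedged sketch: record traffic on the server instance with tcpdump over SSH and
# fetch the capture afterwards. Host, key and file names are placeholders.
import subprocess

HOST = 'ubuntu@server-instance'
KEY = 'taas-key.pem'

def record(seconds, remote_file='/tmp/recording.pcap', local_file='recording.pcap'):
    # 'timeout' stops the capture after the given number of seconds
    capture = 'sudo timeout {0} tcpdump -i eth0 -w {1}'.format(seconds, remote_file)
    subprocess.call(['ssh', '-i', KEY, HOST, capture])
    subprocess.call(['scp', '-i', KEY, '{0}:{1}'.format(HOST, remote_file), local_file])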

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same in both the simulations and the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important points when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when it comes to replaying


the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, the breaking point is examined to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick the type of instance for this server. Finally we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server. A minimal sketch of such a data source is shown below.
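As an illustration of this configuration, a data source of this kind could look roughly like the following sketch; it is not the actual Client.py, and the server address and port are placeholders.

# Hedged sketch: data source sending 3960-byte bursts up to 400 times, with a
# random waiting time of 1 to 3 seconds between bursts, as in the recorded session.
import random
import socket
import time

SERVER = ('10.0.0.1', 5000)    # placeholder address of the server (or proxy)
DATA_SIZE = 3960               # bytes per data burst
REPETITIONS = 400              # number of data bursts
payload = b'x' * DATA_SIZE

sock = socket.create_connection(SERVER)
for _ in range(REPETITIONS):
    sock.sendall(payload)                 # one data burst
    sock.recv(65536)                      # wait for the server reply
    time.sleep(random.uniform(1, 3))      # random waiting time between bursts
sock.close()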

Once we had a capture, we were ready to multiply the traffic. First we performed several recreations, scaling up the number of data sources: we started with one data source and increased it up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We can then compare the different results with the original simulation to extract interesting conclusions. The number of players was increased one at a time, every five seconds; a sketch of this staggered start is shown below.
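The staggered start could be implemented in several ways; as a minimal local sketch, the loop below starts one replaying data source every five seconds using threads, whereas in the actual tests each data source ran in its own EC2 instance. The replay function and the extracted bursts are the ones from the sketch in Section 4.2.

# Hedged sketch: add one new replaying data source every five seconds until the
# requested number of clients is reached.
import threading
import time

def start_sources(num_sources, replay_fn, bursts, interval=5):
    threads = []
    for _ in range(num_sources):
        t = threading.Thread(target=replay_fn, args=(bursts,))
        t.start()                     # this data source begins replaying immediately
        threads.append(t)
        time.sleep(interval)          # wait before adding the next data source
    for t in threads:
        t.join()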

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can see that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is because at those moments several clients were not sending data, so there is a big difference in the number of bytes


exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created with the amount of packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph has about double the amount of bytes of the blue one in Figure 5.1. This is the expected result, since the number of clients is also twice as large. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds; nevertheless, it stops rising after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. This happens when the number of clients reaches about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher-quality instance for the server; we used the type c1.xlarge. If we


look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between instance types. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these amounts of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and the RTT average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern and multiplying it directly from client to server.

In the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and certain packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results showed a performance improvement in the network when using high-quality instances and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes; this happened when we greatly increased the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.

Overall, there are good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory; it must be said that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results confirm the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore, the good results, along with the flexibility and many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and it focuses mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of a test before starting it.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using a m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions: Does tcpreplay support sending traffic to a server?", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] R. Sharpe, U. Lamping and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump & libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 41: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

33 Loading test with several clients 39

The next two graphs represent the same two kinds of simulation developed in the

previous section but in this case with ten data sources the maximum number tested

The Figure 36 shows the number of bytes going over the proxy with data burst of 1980

bytes as well as Figure 37 does with data burst of 5940 bytes

Figure 36 Bytes through the proxy with data burst of 1980 bytes

Figure 37 Bytes through the proxy with data burst of 5940 bytes

40 Traffic Test

The Figure 37 shows a much bigger amount of bytes exchanged compared to Figure

36 due to the heavier data load sent from data sources Both graphs are quite smooth

since the packet frequency that is being sent is high with ten data sources

34 Performance results

In this section the performance of the network was analyze in several ways First we

looked into the RTT values with different number of clients and then we analyzed other

important features explained later This becomes even more important in the last part

of the thesis when the number of clients is highly scaled up

First of all we compare two graphs which represent the average RTT of two simulations

differing only in the number of data sources For the Figure 38 packets were being sent

to the server from three different instances however in the Figure 39 there were up to

ten data sources working

Figure 38 Average RTT with 3 data sources

In these graphs there is no great difference since the data sent did not represent a big

problem in the network performance However we can appreciate that the lowest value

during the traffic exchange (2 ms approximately) last much longer in the Figure 38 In

the other graph there are many higher peaks therefore the RTT in this case is slightly

superior As expected the more clients the bigger congestion in the network and the

longer RTT

For every simulation the average RTT was calculated to very to what extent different

amount of data number of clients and type of instance affect to the network performance

34 Performance results 41

Figure 39 Average RTT with 10 data sources

As was mentioned before the RTT does not vary greatly If we look over the Table

31 and 32 we do not see large differences Moreover the lower in the table the shorter

period of time there should be However this does not apply in every case therefore the

type of instance is not very remarkable in these cases The simplest instance seems to be

enough for these exchange of data speaking about RTT values Concerning the number

of clients there is a slight difference especially comparing the RTT between 5 or 10 data

sources with only one But in general the results are quite similar because this amount

of packets do not represent serious problem for the network

Server instance type 1 source 3 sources 5 sources 10 sources

t1micro 00031 00046 00033 00039

m1large 00037 00035 00038 00032

c1medium 00031 00035 00051 00048

c1xlarge 00039 00043 00037 00042

Table 31 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet

loss TCP retransmissions and duplicate ACK It is remarkable that in this tests the

results were much more diverse The results show an average of packets since several

simulations were carried out for each case

In the Table 33 we have the average number of TCP packets which have been re-

transmitted in each type of simulation The number is low and the tests with 10 data

sources have more retransmissions Moreover examining this table by type of instance

the lowest quality instance (t1micro) seems to have more difficulties in the communica-

42 Traffic Test

Server instance type 1 Source 3 Sources 5 Sources 10 Sources

t1micro 00026 00022 00021 00029

m1large 00026 00024 00028 00024

c1medium 00028 00031 00025 00030

c1xlarge 00026 00029 00029 00024

Table 32 RTT with data bursts of 5940 bytes

tion With this instance the number of retransmissions is bigger Either way there is no

simulation that stands out concerning packets resent

The Table 34 shows packet losses Here the differences among the tests carried out

are considerably wider As expected the worst simulation with more difficulties in the

communication was the one with 10 data sources heaviest data burst and worst instance

Here there is an average of up to 67 lost packets Moreover we can appreciate how the

heaviest data burst is starting to create problems because there are many more losses

than in simulations with only 1980 bytes Every instances give better results than the

t1micro one Nevertheless there is no a very significant gap among these three instances

(m1large c1medium c1xlarge) The most important result in this tale concerns the

growth of packet loss as the number of data sources increases as well

Finally in the Table 35 we can check how many ACK were duplicated In this case

there are barely problems with the c1xlarge instance unlike with t1micro The table also

indicates the higher difficulty to send traffic properly with many data sources Finally

it must be pointed out that in these simulations a c1xlarge instance is enough to avoid

problems in the communication

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

15

0

0

0

2

0

2

m1large1980 bytes

5940 bytes

0

0

0

25

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1xlarge1980 bytes

5940 bytes

0

0

0

0

0

0

0

25

Table 33 Number of TCP retransmissions

Overall in this chapter the traffic pattern was analyzed and the network was tested

34 Performance results 43

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

2

0

6

0

15

0

67

m1large1980 bytes

5940 bytes

0

0

0

55

0

1

0

36

c1medium1980 bytes

5940 bytes

0

0

0

7

0

135

2

505

c1xlarge1980 bytes

5940 bytes

0

05

0

5

0

9

5

545

Table 34 Number of lost packets

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

3

1

1

0

75

65

25

m1large1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

25

45

25

c1xlarge1980 bytes

5940 bytes

05

05

0

0

0

0

05

0

Table 35 Number of duplicate ACK

to find out the influence of diverse factors in its performance We have seen how the

RTT vary just a little in the different tests Most probably due to the fact that the data

sent does not stress enough the server However in the last analysis the values measured

related with network performance gave more interesting results For example when it

comes to stress the network the number of clients is more significant than the type of

instance picked out Nevertheless the kind of instance was also important in to improve

the performance This was more noticeable when the number of data sources was high

The c1large instance solved a large part of the packet losses problem compared with the

t1micro one

After achieving these results we can move on to the next step where we extracted

the traffic pattern from these simulations All the process is explained in the following

chapter

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and send this traffic

multiplied M2M (Machine-to-Machine) towards the same server later on It was needed

to look up a method to know how to develop a proper extraction to generate the traffic

again To do so we looked into several documents with projects where was explained

how traffic generators create realistic traffic From those publications we obtained three

main characteristics to rebuild the traffic as similar as possible The first one is the packet

length [33][34] this required packets to be created with same amount of data as well as

equally long headers The second feature concerns packet timestamp and packet time

distribution [34][35] It was needed that the outbound packets had similar frequency the

length of time from one packet to the next must be as similar as possible to the capture

we wanted to replay Finally to create a realistic traffic network it was significant to

send the same number of packets [33]

41 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics

of the packets We used the files recorded during the previous simulations in the proxy

instance The next step was to program a script in python made especially to obtain the

features needed from every packet The best option to make this possible was a python

library called dpkt [36] Using this library a script was written to collect the required

data from a pcap file such as packet time stamp length and data sent

To recreate the traffic the script had to extract the data of each packet one at a time in

order to resend them when replaying traffic However after some tests it was found out

that was much more accurate to gather the data from all the packets involved in one data

burst and put that data together again This way when it comes to replay the traffic

all the data contained in one burst is sent at once instead of sending the data packet by

packet This is exactly the same way the data source sent packets in the simulations

45

46 Traffic pattern extraction

therefore this method was much better to recreate the traffic pattern obtained Moreover

the script worked out the length of time elapsed from the first packet captured in the

simulation until the data source sent the first packet with data This was very helpful

when replaying the same capture since the data source started to replay the data at the

same time The script also extracted the timestamp when every data burst was sent

Therefore it was easier to compare graphs and the traffic recreation was highly precise

In the original simulations where the proxy was set a few protocols were needed to

establish the communication over this proxy These were a couple of HTTP and a few

DNS segments which were meaningless to recreate M2M traffic since there is no proxy in

between The best solution was to filter them out with the script written to extract the

pattern These segments would be very difficult to recreate and they are not noticeable

during the simulation due to their very low weight

42 Replaying traffic pattern

After analyzing the pcap file to obtain data sent and timestamps the same script must

save all this information so that a second script can use it to replay the packets properly

The first script saved in a file the information gathered from one data burst as well as the

timestamp Extractpatternpy is the script that obtains all this information and saves

it in the file mentioned Then the second script access to the information contained in

this file Knowing this information this program is ready to replay the data in the most

accurate and timely manner as well as with the same number of packets The script

resent the traffic using socket programming in python like in the simulations Both the

file and this script were deployed in an instance from where the packets were sent to the

server The script used to resend the traffic must be simple in order to run as fast as

possible and send the packets in an accurate manner I have to point out that in order to

test the server it is only necessary to run the second script named Replaytrafficpy since

it calls automatically the first script (Extractpatternpy) to obtain the traffic pattern

When replaying traffic M2M it was important to recreate the same traffic load than

in the original capture With this approach we could compare them to draw important

conclusions and check the accuracy of the method carried out To achieve this we had

to filter the data sent from data source to proxy These very data sniffed were replayed

twice M2M with the second script so that in the whole network we are sending the same

amount of data but in this case directly from client to server This strategy allows to

receiving the same data from the server as well therefore the behaviour of the packets

was very similar to the real case An example of this approach is represented in the Figure

41 The figure on the left shows the traffic network in the simulations with data source

proxy and server Furthermore the figure on the right is the result of implementing

the strategy mentioned before M2M As we can see the amount of data and number of

packets is the same

The results of following this strategy are shown in the Figure 42 In the graphs we

42 Replaying traffic pattern 47

Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method to obtain a pattern from the traffic generated

in the simulations Moreover the high precision of this approach was proven in the Figure

42 comparing the graph obtained in the simulation with another achieve recreating the

pattern obtained Therefore we can move on to the last part of the thesis where we

multiply this pattern to extract important results related with the server

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained Then re-

markable results obtained from recreating the pattern in large scale are shown and

analyzed to find out how the server responses to certain quantities of data With the

values achieved we demonstrate the reliability of the TaaS [3] system created

51 Design of TaaS for M2M communications

As in the case of the previous simulations the scenario to recreate the traffic pattern

M2M was created within the Amazon cloud First of all the server was set up in a EC2

instance The TaaS infrastructure created allowed us to choose among different type of

instances easily where to deploy servers or clients Once the server was running we could

proceed to pick out number and type of instances in order to multiply the traffic pattern

towards this very server When these data sources started to run the TaaS system

sniffed the ongoing traffic in the server to finally download automatically the recording

for further analysis

This TaaS infrastructure was very flexible and highly customizable In the simulation

the data source characteristics could be easily modified For instance we could choose

amount of data to send frequency of data bursts and number of repetitions Therefore it

was possible to generate light normal or heavy traffic loads depending on the frequency

and the data sent In addition we could create short or long simulations choosing number

of repetitions The recordings obtained from these simulations are then multiplied M2M

so that the configuration is the same both in the simulations and the traffic recreations

Replaying real traffic is the best way to determine precisely [37] in this case how

much traffic the server can handle providing a certain quality of service The following

are some important characteristics when it comes to recreate traffic to stress a server

[37][38] First of all the traffic which is going to be replayed should not be a short time

period We must use the same infrastructure and recorded traffic when it comes to replay

49

50 Multiplying Traffic Pattern

the traffic pattern The traffic must be sent from different points at once simulating in

this way the connection of different clients Then scale up the number of clients in order

to increase the traffic towards the server until it stops working properly And finally look

into the break to figure out the server capacity

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
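As an illustration, the workflow can be summarized with the small sketch below; it assumes the scripts are invoked directly with the Python interpreter and take no mandatory command-line arguments, which may differ from the real scripts.

import subprocess

# Step 1: Client.py has already been edited with the desired burst size,
#         number of repetitions and random waiting time.

# Step 2: run the simulation (client -> proxy -> server); the capture is
#         downloaded automatically as a pcap file.
subprocess.check_call(['python', 'Simulation.py'])

# Step 3: start the plain server used for the replay, choosing the
#         instance type to be tested.
subprocess.check_call(['python', 'Servertoreplay.py'])

# Step 4: extract the pattern and multiply it towards the server;
#         Replaytraffic.py calls Extractpattern.py internally.
subprocess.check_call(['python', 'Replaytraffic.py'])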

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.
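A minimal sketch of such a data source, assuming a plain TCP socket client, is shown below; the server address is a placeholder, while the burst size, the number of repetitions and the 1-3 second random wait follow the values just mentioned.

import random
import socket
import time

SERVER = ('server-host', 5000)   # placeholder address and port
BURST = b'x' * 3960              # burst size used in the recorded session
REPETITIONS = 400

sock = socket.create_connection(SERVER)
for _ in range(REPETITIONS):
    sock.sendall(BURST)                 # send one data burst
    sock.recv(4096)                     # wait for the server reply
    time.sleep(random.uniform(1, 3))    # random waiting time between bursts
sock.close()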

Once we had a capture, we were ready to multiply the traffic. First we performed several tests, scaling up the number of data sources: we started with one data source and increased it up to 80, which was considered enough clients to create heavy traffic loads. We could then compare the different results with the original simulation to extract interesting conclusions. The number of replayers was increased one at a time, every five seconds.
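This ramp-up can be sketched as follows; replay_pattern is a hypothetical name standing for the routine that replays the recorded pattern once.

import threading
import time

def ramp_up(replay_pattern, max_clients=80, interval=5):
    # Start one new replayer every `interval` seconds until `max_clients`
    # data sources are sending traffic at once, then wait for them all.
    threads = []
    for i in range(max_clients):
        t = threading.Thread(target=replay_pattern, args=(i,))
        t.start()
        threads.append(t)
        time.sleep(interval)
    for t in threads:
        t.join()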

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is because at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests
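A curve like the one in Figure 5.1 can be derived from the recorded pcap file, for example with the dpkt library [36] already used for the pattern extraction; the sketch below simply buckets the captured bytes into one-second bins and leaves the plotting out.

import collections
import dpkt

def bytes_per_second(pcap_path):
    # Bucket the captured bytes into one-second bins.
    counts = collections.Counter()
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            counts[int(ts)] += len(buf)
    if not counts:
        return []
    start = min(counts)
    return [(t - start, counts[t]) for t in sorted(counts)]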

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph has about twice the amount of bytes of the blue one in Figure 5.1, which is an expected result since the number of clients is also twice as large. However, the red graph is not four times higher than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was being added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster; this happens when the number of clients reaches about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher-quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between instance types. The black graph represents ten clients and it has similar peaks and numbers of bytes sent as in Figure 5.2. However, the recreation with 80 clients goes much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four instance types tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap in packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type    1 Client    2 Clients    10 Clients    20 Clients    80 Clients
t1.micro                0           0.011        0.044         0.091         0.128
m1.large                0           0.027        0.053         0.128         0.154
c1.medium               0.007       0            0.039         0.076         0.085
c1.xlarge               0.007       0.004        0.067         0.120         0.125

Table 5.1: Percentage of lost packets
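Loss percentages such as those in Table 5.1 can be obtained, for instance, by counting Wireshark's lost-segment events with tshark [10]; the sketch below assumes tshark is installed and that the capture file path is passed in by the caller.

import subprocess

def loss_percentage(pcap_path):
    # Count packets matching a Wireshark display filter by listing them
    # with tshark and counting the output lines.
    def count(display_filter=None):
        cmd = ['tshark', '-r', pcap_path]
        if display_filter:
            cmd += ['-Y', display_filter]
        out = subprocess.check_output(cmd)
        return len(out.splitlines())

    total = count()                              # every captured packet
    lost = count('tcp.analysis.lost_segment')    # "previous segment not captured"
    return 100.0 * lost / total if total else 0.0

The same approach can be used with the tcp.analysis.retransmission and tcp.analysis.duplicate_ack display filters to reproduce retransmission and duplicate-ACK counts.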

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards we obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

In the first part we therefore built the scenario just mentioned in the Amazon Cloud. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and certain packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicated ACKs, showed more varied values. These results showed a performance improvement in the network when using high-quality instances and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time and the result was practically the same most of the time. It is remarkable that the recreation did not become progressively worse: the similarity of both graphs was maintained throughout. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes; this happened when we greatly increased the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instances.

Overall, the results about the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not so satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results have demonstrated the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53

List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52

REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol – HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - quick guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit – transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark user's guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language – official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.





34 Performance results 41

Figure 39 Average RTT with 10 data sources

As was mentioned before the RTT does not vary greatly If we look over the Table

31 and 32 we do not see large differences Moreover the lower in the table the shorter

period of time there should be However this does not apply in every case therefore the

type of instance is not very remarkable in these cases The simplest instance seems to be

enough for these exchange of data speaking about RTT values Concerning the number

of clients there is a slight difference especially comparing the RTT between 5 or 10 data

sources with only one But in general the results are quite similar because this amount

of packets do not represent serious problem for the network

Server instance type 1 source 3 sources 5 sources 10 sources

t1micro 00031 00046 00033 00039

m1large 00037 00035 00038 00032

c1medium 00031 00035 00051 00048

c1xlarge 00039 00043 00037 00042

Table 31 RTT with data bursts of 1980 bytes

The next analysis regarded some characteristics of network performance such as packet

loss TCP retransmissions and duplicate ACK It is remarkable that in this tests the

results were much more diverse The results show an average of packets since several

simulations were carried out for each case

In the Table 33 we have the average number of TCP packets which have been re-

transmitted in each type of simulation The number is low and the tests with 10 data

sources have more retransmissions Moreover examining this table by type of instance

the lowest quality instance (t1micro) seems to have more difficulties in the communica-

42 Traffic Test

Server instance type 1 Source 3 Sources 5 Sources 10 Sources

t1micro 00026 00022 00021 00029

m1large 00026 00024 00028 00024

c1medium 00028 00031 00025 00030

c1xlarge 00026 00029 00029 00024

Table 32 RTT with data bursts of 5940 bytes

tion With this instance the number of retransmissions is bigger Either way there is no

simulation that stands out concerning packets resent

Table 3.4 shows packet losses. Here the differences among the tests carried out are considerably wider. As expected, the worst simulation, with the most difficulties in the communication, was the one with 10 data sources, the heaviest data burst and the worst instance. Here there is an average of up to 67 lost packets. Moreover, we can appreciate how the heaviest data burst is starting to create problems, because there are many more losses than in the simulations with only 1980 bytes. Every instance gives better results than the t1.micro one. Nevertheless, there is no very significant gap among the other three instances (m1.large, c1.medium, c1.xlarge). The most important result in this table concerns the growth of packet loss as the number of data sources increases.

Finally, in Table 3.5 we can check how many ACKs were duplicated. In this case there are barely any problems with the c1.xlarge instance, unlike with t1.micro. The table also indicates the greater difficulty of sending traffic properly with many data sources. Finally, it must be pointed out that in these simulations a c1.xlarge instance is enough to avoid problems in the communication.

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   1.5        0           2           2
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          2.5         0           0
c1.medium              1980 bytes   0          0           0           0
c1.medium              5940 bytes   0          0           0           0
c1.xlarge              1980 bytes   0          0           0           0
c1.xlarge              5940 bytes   0          0           0           2.5

Table 3.3: Number of TCP retransmissions


Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          5.5         1           36
c1.medium              1980 bytes   0          0           0           2
c1.medium              5940 bytes   0          7           13.5        50.5
c1.xlarge              1980 bytes   0          0           0           5
c1.xlarge              5940 bytes   0.5        5           9           54.5

Table 3.4: Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           6.5
t1.micro               5940 bytes   3          1           7.5         2.5
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           4.5
c1.medium              5940 bytes   0          0           2.5         2.5
c1.xlarge              1980 bytes   0.5        0           0           0.5
c1.xlarge              5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACK

Overall, in this chapter the traffic pattern was analyzed and the network was tested to find out the influence of diverse factors on its performance. We have seen how the RTT varies just a little in the different tests, most probably because the data sent does not stress the server enough. However, in the last analysis the values measured related to network performance gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important in improving the performance. This was more noticeable when the number of data sources was high. The c1.xlarge instance solved a large part of the packet-loss problem compared with the t1.micro one.

After achieving these results we can move on to the next step, where we extracted the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern was obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed a method for performing a proper extraction so that the traffic could be generated again. To do so, we looked into several publications explaining how traffic generators create realistic traffic. From those publications we obtained three main characteristics for rebuilding the traffic as faithfully as possible. The first one is the packet length [33][34]; this required packets to be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamp and packet time distribution [34][35]: the outbound packets had to have a similar frequency, and the length of time from one packet to the next had to be as similar as possible to the capture we wanted to replay. Finally, to create realistic network traffic it was important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded during the previous simulations in the proxy instance. The next step was to write a Python script made especially to obtain the features needed from every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as packet timestamp, length and data sent.
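As an illustration of the kind of extraction just described, the short sketch below uses dpkt to walk through a capture and keep the timestamp, frame length and TCP payload of every data segment sent to the server. The file name and server port are assumptions made for the example; the actual thesis script is not reproduced here.

# Collect timestamp, frame length and payload of the data segments going to
# the server (sketch; file name and port are placeholders).
import dpkt

def collect_packets(pcap_path, server_port=8000):
    packets = []
    with open(pcap_path, 'rb') as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            if not isinstance(eth.data, dpkt.ip.IP):
                continue
            ip = eth.data
            if not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if tcp.dport != server_port or len(tcp.data) == 0:
                continue                     # keep only data segments towards the server
            packets.append((ts, len(buf), tcp.data))
    return packets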

To recreate the traffic, the script initially had to extract the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found that it was much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when it comes to replaying the traffic, all the data contained in one burst is sent at once instead of sending the data packet by packet. This is exactly the same way the data source sent packets in the simulations.


This method was therefore much better for recreating the traffic pattern obtained. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source started to replay the data at the same time. The script also extracted the timestamp at which every data burst was sent. Therefore it was easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was set up, a few protocols were needed to establish the communication over this proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out with the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
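A sketch of the burst-grouping idea follows. It reuses collect_packets() from the previous snippet (which already keeps only data segments, so proxy-related HTTP and DNS traffic is left out) and assumes that payloads separated by less than a small gap belong to the same data burst; the 0.5-second threshold is only an illustrative value, not the one used in the thesis scripts.

# Group per-packet payloads into data bursts and remember when each burst
# started relative to the first collected packet (sketch; gap value assumed).
def group_into_bursts(packets, max_gap=0.5):
    if not packets:
        return []
    capture_start = packets[0][0]
    bursts = []
    burst_data, burst_ts, last_ts = b'', packets[0][0], packets[0][0]
    for ts, _length, payload in packets:
        if burst_data and ts - last_ts > max_gap:
            bursts.append((burst_ts - capture_start, burst_data))
            burst_data, burst_ts = b'', ts
        burst_data += payload
        last_ts = ts
    bursts.append((burst_ts - capture_start, burst_data))
    return bursts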

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script saved to a file the information gathered from each data burst as well as its timestamp. Extractpattern.py is the script that obtains all this information and saves it in the file mentioned. Then the second script accesses the information contained in this file. Knowing this information, the program is ready to replay the data in the most accurate and timely manner, as well as with the same number of packets. The script resent the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets accurately. It should be pointed out that in order to test the server it is only necessary to run the second script, named Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
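The replay logic can be pictured with the short sketch below: it opens a TCP socket to the server and sends each burst at roughly the offset recorded in the capture. The host and port are placeholders, and the bursts variable is assumed to come from the grouping sketch above; this is not the actual Replaytraffic.py code.

# Replay previously extracted bursts towards the server, respecting the
# recorded send times relative to the start of the capture (sketch).
import socket
import time

def replay_bursts(bursts, host='10.0.0.1', port=8000):
    sock = socket.create_connection((host, port))
    start = time.time()
    try:
        for offset, data in bursts:
            delay = offset - (time.time() - start)
            if delay > 0:
                time.sleep(delay)   # wait until the recorded offset is reached
            sock.sendall(data)      # send the whole burst at once
    finally:
        sock.close()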

When replaying the traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them to draw important conclusions and check the accuracy of the method carried out. To achieve this, we had to filter the data sent from the data source to the proxy. These sniffed data were then replayed twice M2M with the second script, so that across the whole network we are sending the same amount of data, but in this case directly from client to server. This strategy allows receiving the same data from the server as well; therefore, the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1. The diagram on the left shows the traffic network in the simulations, with data source, proxy and server. The diagram on the right is the result of implementing the M2M strategy mentioned before. As we can see, the amount of data and the number of packets is the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2. In the graphs we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

As can be seen in Figure 4.2, both graphs follow a similar trajectory, going up and down at the same points, especially in the first half of the graph, where the traffic is exactly the same. Then there is a slightly different point, and from there the graphs are not exactly the same. However, we must point out that they still keep a very similar trajectory until the end, and the difference does not keep growing.


Therefore, this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly. This is due to the fact that the server response is something we could not control. Another important issue is related to the duration: both graphs seem to finish at approximately the same time. Overall, the graphs are very similar in the amount of bytes sent and in duration. Therefore this approach to extracting the traffic pattern is very accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2 by comparing the graph obtained in the simulation with another one achieved by recreating the extracted pattern. Therefore, we can move on to the last part of the thesis, where we multiply this pattern to extract important results related to the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern on a large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the case of the previous simulations, the scenario used to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances in order to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
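The reference list includes the boto library [28]; assuming it is what the TaaS scripts use for EC2 access, deploying a server instance of a chosen type could look like the sketch below, where the region, AMI id, key pair and security group are placeholders rather than values taken from the thesis.

# Launch an EC2 instance of a chosen type with boto 2 (sketch with
# placeholder AMI id, key pair and security group).
import time
import boto.ec2

def launch_server(instance_type='m1.large', region='us-east-1'):
    conn = boto.ec2.connect_to_region(region)
    reservation = conn.run_instances(
        'ami-00000000',                  # placeholder AMI id
        instance_type=instance_type,
        key_name='taas-key',             # placeholder key pair
        security_groups=['taas-sg'])     # placeholder security group
    instance = reservation.instances[0]
    while instance.state != 'running':   # poll until the instance is up
        time.sleep(5)
        instance.update()
    return instance.ip_address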

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. Therefore, it was possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.
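A data source with these knobs can be as simple as the following sketch, where the burst size, the number of repetitions and the waiting interval between bursts are parameters; the default values are only examples (Section 5.3 describes the configuration that was actually recorded), and the host and port are placeholders.

# Minimal configurable data source: send `repetitions` bursts of `burst_size`
# bytes, pausing a random time between bursts (sketch; defaults are examples).
import random
import socket
import time

def run_data_source(host='10.0.0.1', port=8000,
                    burst_size=3960, repetitions=400,
                    min_wait=1.0, max_wait=3.0):
    payload = b'x' * burst_size
    sock = socket.create_connection((host, port))
    try:
        for _ in range(repetitions):
            sock.sendall(payload)                           # one data burst
            time.sleep(random.uniform(min_wait, max_wait))  # random pause
    finally:
        sock.close()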

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic that is going to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern.


The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First, we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
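Put together, one test run boils down to invoking the scripts in that order. The snippet below only illustrates the sequence with subprocess; the real command-line interfaces of the scripts may differ, and Client.py is assumed to have been edited beforehand.

# Run one complete TaaS cycle: simulate, bring up the server, then replay
# (sketch; the scripts' actual invocation may differ).
import subprocess

def run_taas_cycle():
    subprocess.check_call(['python', 'Simulation.py'])      # record traffic to a pcap
    subprocess.check_call(['python', 'Servertoreplay.py'])  # set up the target server
    subprocess.check_call(['python', 'Replaytraffic.py'])   # extract and replay the pattern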

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several simulations, scaling up the number of data sources. We started with one data source and increased the number up to 80, which was considered enough clients to create heavy loads of traffic. Then we can compare the different results with the original simulation to extract interesting conclusions. The number of players was increased one at a time, every five seconds.
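Ramping up the load in that fashion can be sketched as follows: one replay client is started every five seconds in its own thread, reusing the replay_bursts() sketch from Chapter 4. The client count and interval mirror the description above; everything else is illustrative.

# Start `num_clients` replay clients, one every `interval` seconds, each
# replaying the extracted bursts towards the server (sketch).
import threading
import time

def ramp_up_clients(bursts, num_clients=80, interval=5.0,
                    host='10.0.0.1', port=8000):
    threads = []
    for _ in range(num_clients):
        t = threading.Thread(target=replay_bursts, args=(bursts, host, port))
        t.start()
        threads.append(t)
        time.sleep(interval)    # add one data source every five seconds
    for t in threads:
        t.join()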

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources, respectively. We can appreciate that the red line seems to be twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at those moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same holds for the red graph, but in this case with up to 80 sources sending data. The black graph appears to have double the amount of bytes of the blue one in Figure 5.1. This is an expected result, since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems in sending data appear very soon. After 150 seconds the graph goes up very little and slowly. In this test with 80 clients a new client was added every 5 seconds; therefore, the graph should go up until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster; this happens when the number of clients gets to about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher-quality instance for the server. We used the c1.xlarge type.


If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference that there can be between types of instances. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is quite a bit higher than in the other tests. This graph appears to be smoother as well, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the larger RTT average and the smoothness of the graph obtained. The rest of the tests seem to have a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients not even one segment is lost. Nevertheless, with 80 clients, in some tests the number of lost packets reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap in packet losses between 20 and 80 clients as the type of instance becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. Afterwards, we obtained some results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot. With many clients, the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets. This table shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The whole work was mainly divided into several steps: creating a client-proxy-server scenario, extracting the traffic pattern, and multiplying it directly from client to server.

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up greatly to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs had more varied values. These results showed a performance improvement in the network when using high-quality instances and a deterioration when the number of clients was rising.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between both graphs was permanent. Therefore, with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has also been proven how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when we compared different types of server instances.

Overall, there seem to be good results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients. However, the RTT results were not so satisfactory. It must be said that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results demonstrate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol, and the system focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41

3.2 RTT with data bursts of 5940 bytes 42

3.3 Number of TCP retransmissions 42

3.4 Number of lost packets 43

3.5 Number of duplicate ACK 43

5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10

2.1 OSI model 12

2.2 HTTP request 14

2.3 Fields of the IP Header 16

2.4 Datagram fragmentation 18

2.5 ARP request 19

2.6 Ethernet layers in OSI model 19

2.7 UDP protocol header 20

2.8 TCP protocol header 22

2.9 Establishing a connection in TCP 23

2.10 Sliding window method 24

2.11 Example RTT interval 25

2.12 Jitter effect 26

2.13 Relation between Latency and Bandwidth 27

2.14 Proxy operation 29

3.1 Structure client server 31

3.2 Structure client proxy server 33

3.3 Bytes through the proxy with data burst of 1980 bytes 37

3.4 Bytes through the proxy with data burst of 5940 bytes 37

3.5 Structure for simulation 38

3.6 Bytes through the proxy with data burst of 1980 bytes 39

3.7 Bytes through the proxy with data burst of 5940 bytes 39

3.8 Average RTT with 3 data sources 40

3.9 Average RTT with 10 data sources 41

4.1 Structure of traffic replayed M2M 47

4.2 Comparison between simulation and replayed traffic 47

5.1 Number of bytes over time in different tests 51

5.2 Bytes using an m1.large instance for the server 51

5.3 Bytes using a c1.xlarge instance for the server 52

5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing," 2011.

[2] J. Strickland, "How cloud computing works," http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing," http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi, and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay," https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances," http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic," http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite," http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol - HTTP/1.1," 1999.

[17] Tutorialspoint, "HTTP - quick guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol," 1980.

[21] S. Kollar, "Introduction to IPv6," 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe, and E. Warnicke, "Wireshark User's Guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website," http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnár, and G. Szabó, "How to validate traffic generators," in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 44: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

42 Traffic Test

Server instance type 1 Source 3 Sources 5 Sources 10 Sources

t1micro 00026 00022 00021 00029

m1large 00026 00024 00028 00024

c1medium 00028 00031 00025 00030

c1xlarge 00026 00029 00029 00024

Table 32 RTT with data bursts of 5940 bytes

tion With this instance the number of retransmissions is bigger Either way there is no

simulation that stands out concerning packets resent

The Table 34 shows packet losses Here the differences among the tests carried out

are considerably wider As expected the worst simulation with more difficulties in the

communication was the one with 10 data sources heaviest data burst and worst instance

Here there is an average of up to 67 lost packets Moreover we can appreciate how the

heaviest data burst is starting to create problems because there are many more losses

than in simulations with only 1980 bytes Every instances give better results than the

t1micro one Nevertheless there is no a very significant gap among these three instances

(m1large c1medium c1xlarge) The most important result in this tale concerns the

growth of packet loss as the number of data sources increases as well

Finally in the Table 35 we can check how many ACK were duplicated In this case

there are barely problems with the c1xlarge instance unlike with t1micro The table also

indicates the higher difficulty to send traffic properly with many data sources Finally

it must be pointed out that in these simulations a c1xlarge instance is enough to avoid

problems in the communication

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

15

0

0

0

2

0

2

m1large1980 bytes

5940 bytes

0

0

0

25

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1xlarge1980 bytes

5940 bytes

0

0

0

0

0

0

0

25

Table 33 Number of TCP retransmissions

Overall in this chapter the traffic pattern was analyzed and the network was tested

34 Performance results 43

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

2

0

6

0

15

0

67

m1large1980 bytes

5940 bytes

0

0

0

55

0

1

0

36

c1medium1980 bytes

5940 bytes

0

0

0

7

0

135

2

505

c1xlarge1980 bytes

5940 bytes

0

05

0

5

0

9

5

545

Table 34 Number of lost packets

Server instance type Data burst 1 Source 3 Sources 5 Sources 10 Sources

t1micro1980 bytes

5940 bytes

0

3

1

1

0

75

65

25

m1large1980 bytes

5940 bytes

0

0

0

0

0

0

0

0

c1medium1980 bytes

5940 bytes

0

0

0

0

0

25

45

25

c1xlarge1980 bytes

5940 bytes

05

05

0

0

0

0

05

0

Table 35 Number of duplicate ACK

to find out the influence of diverse factors in its performance We have seen how the

RTT vary just a little in the different tests Most probably due to the fact that the data

sent does not stress enough the server However in the last analysis the values measured

related with network performance gave more interesting results For example when it

comes to stress the network the number of clients is more significant than the type of

instance picked out Nevertheless the kind of instance was also important in to improve

the performance This was more noticeable when the number of data sources was high

The c1large instance solved a large part of the packet losses problem compared with the

t1micro one

After achieving these results we can move on to the next step where we extracted

the traffic pattern from these simulations All the process is explained in the following

chapter

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and send this traffic

multiplied M2M (Machine-to-Machine) towards the same server later on It was needed

to look up a method to know how to develop a proper extraction to generate the traffic

again To do so we looked into several documents with projects where was explained

how traffic generators create realistic traffic From those publications we obtained three

main characteristics to rebuild the traffic as similar as possible The first one is the packet

length [33][34] this required packets to be created with same amount of data as well as

equally long headers The second feature concerns packet timestamp and packet time

distribution [34][35] It was needed that the outbound packets had similar frequency the

length of time from one packet to the next must be as similar as possible to the capture

we wanted to replay Finally to create a realistic traffic network it was significant to

send the same number of packets [33]

41 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics

of the packets We used the files recorded during the previous simulations in the proxy

instance The next step was to program a script in python made especially to obtain the

features needed from every packet The best option to make this possible was a python

library called dpkt [36] Using this library a script was written to collect the required

data from a pcap file such as packet time stamp length and data sent

To recreate the traffic the script had to extract the data of each packet one at a time in

order to resend them when replaying traffic However after some tests it was found out

that was much more accurate to gather the data from all the packets involved in one data

burst and put that data together again This way when it comes to replay the traffic

all the data contained in one burst is sent at once instead of sending the data packet by

packet This is exactly the same way the data source sent packets in the simulations

45

46 Traffic pattern extraction

therefore this method was much better to recreate the traffic pattern obtained Moreover

the script worked out the length of time elapsed from the first packet captured in the

simulation until the data source sent the first packet with data This was very helpful

when replaying the same capture since the data source started to replay the data at the

same time The script also extracted the timestamp when every data burst was sent

Therefore it was easier to compare graphs and the traffic recreation was highly precise

In the original simulations where the proxy was set a few protocols were needed to

establish the communication over this proxy These were a couple of HTTP and a few

DNS segments which were meaningless to recreate M2M traffic since there is no proxy in

between The best solution was to filter them out with the script written to extract the

pattern These segments would be very difficult to recreate and they are not noticeable

during the simulation due to their very low weight

42 Replaying traffic pattern

After analyzing the pcap file to obtain data sent and timestamps the same script must

save all this information so that a second script can use it to replay the packets properly

The first script saved in a file the information gathered from one data burst as well as the

timestamp Extractpatternpy is the script that obtains all this information and saves

it in the file mentioned Then the second script access to the information contained in

this file Knowing this information this program is ready to replay the data in the most

accurate and timely manner as well as with the same number of packets The script

resent the traffic using socket programming in python like in the simulations Both the

file and this script were deployed in an instance from where the packets were sent to the

server The script used to resend the traffic must be simple in order to run as fast as

possible and send the packets in an accurate manner I have to point out that in order to

test the server it is only necessary to run the second script named Replaytrafficpy since

it calls automatically the first script (Extractpatternpy) to obtain the traffic pattern

When replaying traffic M2M it was important to recreate the same traffic load than

in the original capture With this approach we could compare them to draw important

conclusions and check the accuracy of the method carried out To achieve this we had

to filter the data sent from data source to proxy These very data sniffed were replayed

twice M2M with the second script so that in the whole network we are sending the same

amount of data but in this case directly from client to server This strategy allows to

receiving the same data from the server as well therefore the behaviour of the packets

was very similar to the real case An example of this approach is represented in the Figure

41 The figure on the left shows the traffic network in the simulations with data source

proxy and server Furthermore the figure on the right is the result of implementing

the strategy mentioned before M2M As we can see the amount of data and number of

packets is the same

The results of following this strategy are shown in the Figure 42 In the graphs we

42 Replaying traffic pattern 47

Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method to obtain a pattern from the traffic generated

in the simulations Moreover the high precision of this approach was proven in the Figure

42 comparing the graph obtained in the simulation with another achieve recreating the

pattern obtained Therefore we can move on to the last part of the thesis where we

multiply this pattern to extract important results related with the server

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained Then re-

markable results obtained from recreating the pattern in large scale are shown and

analyzed to find out how the server responses to certain quantities of data With the

values achieved we demonstrate the reliability of the TaaS [3] system created

51 Design of TaaS for M2M communications

As in the case of the previous simulations the scenario to recreate the traffic pattern

M2M was created within the Amazon cloud First of all the server was set up in a EC2

instance The TaaS infrastructure created allowed us to choose among different type of

instances easily where to deploy servers or clients Once the server was running we could

proceed to pick out number and type of instances in order to multiply the traffic pattern

towards this very server When these data sources started to run the TaaS system

sniffed the ongoing traffic in the server to finally download automatically the recording

for further analysis

This TaaS infrastructure was very flexible and highly customizable In the simulation

the data source characteristics could be easily modified For instance we could choose

amount of data to send frequency of data bursts and number of repetitions Therefore it

was possible to generate light normal or heavy traffic loads depending on the frequency

and the data sent In addition we could create short or long simulations choosing number

of repetitions The recordings obtained from these simulations are then multiplied M2M

so that the configuration is the same both in the simulations and the traffic recreations

Replaying real traffic is the best way to determine precisely [37] in this case how

much traffic the server can handle providing a certain quality of service The following

are some important characteristics when it comes to recreate traffic to stress a server

[37][38] First of all the traffic which is going to be replayed should not be a short time

period We must use the same infrastructure and recorded traffic when it comes to replay

49

50 Multiplying Traffic Pattern

the traffic pattern The traffic must be sent from different points at once simulating in

this way the connection of different clients Then scale up the number of clients in order

to increase the traffic towards the server until it stops working properly And finally look

into the break to figure out the server capacity

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system

created First of all we must configure the script Clientpy setting amount of data

repetitions and random waiting time with the values that suit us Then we run the

script Simulationpy to exchange the information between client and server The traffic

then is recorded and downloaded automatically in a pcap file to the computer where

we run Simulationpy The second part is about replaying the traffic pattern First we

must start the script Servertoreplaypy to set up the same server used in simulations but

now we have the chance to pick out the type of instance for this server Finally we run

Replaytrafficpy that obtains the traffic pattern and create the multiplier to recreate this

pattern towards the server Another file is then downloaded to the computer from which

we can extract the results of the tests

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy The

simulation chosen lasted nearly fifteen minutes and there was one data source sending

3960 bytes of data up to 400 times Between data burst there is a waiting time that

can vary from 1 to 3 seconds This feature make the simulations more realistic since we

never know when a client is going to establish a connection with a server

Once we have got a capture we were ready to multiply the traffic First we performed

several simulations scaling up the number of data sources We started with one data

source and increased up to 80 which was considered an enough number of clients to

create heavy loads of traffic Then we can compare between the different results and the

original simulation to extract interesting results The number of player was increased

one at a time every five seconds

In the Figure 51 is represented the number of bytes sent in different tests The instance

used in this case to set up the server is m1large The black red and blue graphs display

the recreation of the real simulation for one two and ten data sources respectively We

can appreciate that the red line seems to be twice as high as the graph for one client

This is exactly the value expected In addition the highest graph is ten times higher than

the black one so that all the data sources sent the right number of bytes The graph with

ten clients is rougher with higher peaks This is due to the fact that in those moments

several clients were not sending data so there is a big difference in the number of bytes

53 Results of traffic recreation 51

exchanged

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with higher number of

data sources The black graph was created with the amount of packets exchanged among

20 clients and the intended server The same for the red graph but in this case with up

to 80 sources sending data The black graph appears to have double amount of bytes

than the blue one of the Figure 51 This is a expected result since the number of clients

is twice larger as well However the red graph is not four times bigger than the black

one and it is very rough In fact in the 80 clients recreation appears to be problems to

send data very soon After 150 seconds the graph goes up very little and slowly In this

test with 80 clients every 5 seconds a new client was being added therefore the graph

should go up approximately until 400 seconds Nevertheless it stops going up after about

225 seconds Therefore in this case the communication seems to reach a limit in the

exchange of packets where bytes cannot be sent faster In this case when the number

of clients get to about 30 (keeping in mind there is a slight delay when increasing the

number of data sources)

Figure 52 Bytes using a m1large instance for the server

Now we can compare the Figure 52 with the results achieve running the same tests

but with a higher quality instance for the server We used the type c1xlarge If we

52 Multiplying Traffic Pattern

look over the Figure 53 and compare with the previous one we can appreciate the wide

difference that there can be between type of instances The black graph represents ten

clients and it has similar peaks and number of bytes sent than in the Figure 52 However

the recreation with 80 clients is much higher using the best instance The gap between

graphs is about three times larger in Figure 53 Either way there is another limit for the

data sent in this figure Here we can see that the network cannot exchange data faster

after about 60 clients are running

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyze RTT as we can see in the Figure 54 the graphs obtained

from the tests simulating from 2 to 20 clients are not very different This is because

the server did not have problems to handle these amounts of clients as we could see in

the previous figures The most noticeable difference is in the 80 clients graph where

despite there are no peaks standing out the average is quite higher than in the other

tests This graph appears to be smoother as well because there are packets coming and

going more often than in the rest so there cannot be large variations Therefore the

main characteristics when recreating many clients is the bigger RTT average and the

smoothness of the graph obtained The rest of the test seems to have a similar RTT

Figure 54 Average RTT extracted from the traffic recreations

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore in the first part we developed in the Amazon Cloud the scenario we just

mentioned Then we tested it to find out how its behaviour and performance changed

under different conditions (number of clients type of instance etc) In the next step we

came down to the most important part of the thesis where we had to extract and recreate

the traffic pattern To achieve a correct pattern extraction and recreation the number of

bytes and some packet features must be as similar as possible in the simulations and in

the traffic replayed M2M When the script to recreate the traffic pattern was ready we

proceeded to replay the packets towards the server With the TaaS environment created

the server could be set up in different type of instances to analyze how this affected to

the server performance Moreover the number of clients could be highly scaled up to find

out how the server would manage to handle heavy traffic loads Finally after carrying

out many tests under different conditions the results were shown and explained trying

to estimate the reliability of the TaaS system developed

The results of the simulations show how the number of bytes was going up correctly as

the number of clients increased The RTT was very similar in all the tests since the traffic

load was not heavy enough to stress the server However some network performance

features such as packet loss tcp retransmissions and duplicated ACK had more different

values This results shown a performance improvement in the network when using high

quality instances and a deterioration when the number of clients was raising

55

56 Summary and Conclusions

When it comes to replay the traffic pattern the first step was to compare the results

obtained from recreating the packets with the simulation In this way we could find out

the quality of this traffic recreator We compared the amount of bytes sent over time and

the result was practically the same most of the time It is remarkable that the recreation

did not become worse and worse the similarity in both graphs was permanent Therefore

with this TaaS system developed we may recreate long simulations In the final results

obtained from the traffic pattern recreation we have shown how the server reached a

maximum exchange of bytes This happens when we highly increased the number of

data sources It has been proven as well how this limit can be higher or lower depending

on the instance quality The differences are slighter when measuring the RTT which was

very similar except when the number of data sources running was very high It was more

interesting the packet loss results The percentage of lost packets was much higher when

many clients were sending data There was also certain variance in these results when

we compared among different type of server instances

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

6.2 Future Work

The TaaS system created in this thesis is based on the TCP protocol and focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other kinds of cloud testing, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of a test before starting it.

4. Modify the scripts to test different scenarios.


3.4 Performance results

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          0           0           0
t1.micro               5940 bytes   2          6           15          67
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          55          1           36
c1.medium              1980 bytes   0          0           0           2
c1.medium              5940 bytes   0          7           135         505
c1.xlarge              1980 bytes   0          0           0           5
c1.xlarge              5940 bytes   0.5        5           9           545

Table 3.4: Number of lost packets

Server instance type   Data burst   1 Source   3 Sources   5 Sources   10 Sources
t1.micro               1980 bytes   0          1           0           65
t1.micro               5940 bytes   3          1           75          25
m1.large               1980 bytes   0          0           0           0
m1.large               5940 bytes   0          0           0           0
c1.medium              1980 bytes   0          0           0           45
c1.medium              5940 bytes   0          0           25          25
c1.xlarge              1980 bytes   0.5        0           0           0.5
c1.xlarge              5940 bytes   0.5        0           0           0

Table 3.5: Number of duplicate ACKs

These tests allowed us to find out the influence of diverse factors on the performance of the system. We have seen that the RTT varied just a little across the different tests, most probably because the data sent did not stress the server enough. However, the other network performance values measured gave more interesting results. For example, when it comes to stressing the network, the number of clients is more significant than the type of instance picked out. Nevertheless, the kind of instance was also important for improving the performance, which was more noticeable when the number of data sources was high: the c1.xlarge instance solved a large part of the packet loss problem compared with the t1.micro one.
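As an illustration of how this kind of per-capture statistic can be gathered offline, the following sketch counts likely retransmissions and duplicate ACKs in a pcap file with the dpkt library that is introduced in the next chapter. It is only a minimal example under simple assumptions (no sequence-number wrap-around, a capture file named capture.pcap); it is not the exact method used to produce the tables above.

    import dpkt

    def tcp_stats(path):
        # Track the highest sequence number and last pure ACK seen per TCP flow.
        highest_seq = {}
        last_ack = {}
        retransmissions = 0
        duplicate_acks = 0
        with open(path, 'rb') as f:
            for ts, buf in dpkt.pcap.Reader(f):
                eth = dpkt.ethernet.Ethernet(buf)
                ip = eth.data
                if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                    continue
                tcp = ip.data
                flow = (ip.src, ip.dst, tcp.sport, tcp.dport)
                if len(tcp.data) > 0:
                    # Heuristic: a data segment whose sequence number was already covered.
                    if flow in highest_seq and tcp.seq < highest_seq[flow]:
                        retransmissions += 1
                    highest_seq[flow] = max(highest_seq.get(flow, 0), tcp.seq + len(tcp.data))
                elif tcp.flags & dpkt.tcp.TH_ACK:
                    # Heuristic: an empty ACK repeating the previous acknowledgement number.
                    if last_ack.get(flow) == tcp.ack:
                        duplicate_acks += 1
                    last_ack[flow] = tcp.ack
        return retransmissions, duplicate_acks

    print(tcp_stats('capture.pcap'))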

After achieving these results, we can move on to the next step, where we extract the traffic pattern from these simulations. The whole process is explained in the following chapter.

CHAPTER 4

Traffic Pattern Extraction

In this chapter a traffic pattern is obtained in order to recreate it and later send this traffic, multiplied, M2M (Machine-to-Machine) towards the same server. We needed a method for performing a proper extraction so that the traffic could be generated again. To find one, we looked into several publications that explain how traffic generators create realistic traffic. From those publications we obtained three main characteristics needed to rebuild the traffic as faithfully as possible. The first one is the packet length [33][34]: packets must be created with the same amount of data as well as equally long headers. The second feature concerns the packet timestamps and the packet time distribution [34][35]: the outbound packets must have a similar frequency, and the length of time from one packet to the next must be as similar as possible to the capture we want to replay. Finally, to create realistic network traffic it is important to send the same number of packets [33].

4.1 Collecting packet data

First of all, we needed to obtain from a pcap file the most significant characteristics of the packets. We used the files recorded in the proxy instance during the previous simulations. The next step was to write a Python script made especially to obtain the required features of every packet. The best option to make this possible was a Python library called dpkt [36]. Using this library, a script was written to collect the required data from a pcap file, such as each packet's timestamp, length and payload.
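A minimal sketch of this kind of extraction is shown below. The field selection is illustrative and the real Extractpattern.py may store the information differently; here every TCP packet is reduced to its capture timestamp, its total length and its payload.

    import dpkt

    def collect_packets(pcap_path):
        # Return a list of (timestamp, total_length, payload) tuples for every TCP packet.
        packets = []
        with open(pcap_path, 'rb') as f:
            for ts, buf in dpkt.pcap.Reader(f):
                eth = dpkt.ethernet.Ethernet(buf)
                if not isinstance(eth.data, dpkt.ip.IP):
                    continue
                ip = eth.data
                if not isinstance(ip.data, dpkt.tcp.TCP):
                    continue
                tcp = ip.data
                packets.append((ts, len(buf), tcp.data))
        return packets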

To recreate the traffic, the script initially extracted the data of each packet one at a time in order to resend them when replaying the traffic. However, after some tests it was found to be much more accurate to gather the data from all the packets involved in one data burst and put that data back together. This way, when the traffic is replayed, all the data contained in one burst is sent at once instead of being sent packet by packet. This is exactly how the data source sent its data in the simulations, so this method recreated the obtained traffic pattern much better. Moreover, the script worked out the length of time elapsed from the first packet captured in the simulation until the data source sent the first packet with data. This was very helpful when replaying the same capture, since the data source then started to replay the data at the same relative time. The script also extracted the timestamp at which every data burst was sent. This made it easier to compare graphs, and the traffic recreation was highly precise.

In the original simulations, where the proxy was in place, a few extra protocols were needed to establish the communication through the proxy. These were a couple of HTTP segments and a few DNS segments, which are meaningless when recreating M2M traffic since there is no proxy in between. The best solution was to filter them out in the script written to extract the pattern. These segments would be very difficult to recreate, and they are not noticeable during the simulation due to their very low weight.
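Such filtering can be done while the capture is being read, for example by discarding segments addressed to or from the DNS port and the HTTP port used by the proxy. The ports below are assumptions (they depend on how the proxy was configured), and this is only a sketch of the idea, not the filter actually used in Extractpattern.py.

    import dpkt

    HTTP_PORT = 80   # port assumed for the proxy's HTTP control traffic
    DNS_PORT = 53    # standard DNS port

    def is_proxy_overhead(ip):
        # Return True for the HTTP and DNS segments that only exist because of the proxy.
        proto = ip.data
        if isinstance(proto, dpkt.udp.UDP) and DNS_PORT in (proto.sport, proto.dport):
            return True
        if isinstance(proto, dpkt.tcp.TCP) and HTTP_PORT in (proto.sport, proto.dport):
            return True
        return False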

4.2 Replaying traffic pattern

After analyzing the pcap file to obtain the data sent and the timestamps, the same script must save all this information so that a second script can use it to replay the packets properly. The first script, Extractpattern.py, saves in a file the information gathered from each data burst together with its timestamp. The second script then accesses the information contained in this file; knowing it, the program is ready to replay the data in an accurate and timely manner, and with the same number of packets. The script resends the traffic using socket programming in Python, as in the simulations. Both the file and this script were deployed in an instance from which the packets were sent to the server. The script used to resend the traffic must be simple in order to run as fast as possible and send the packets accurately. It should be pointed out that, in order to test the server, it is only necessary to run the second script, Replaytraffic.py, since it automatically calls the first script (Extractpattern.py) to obtain the traffic pattern.
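The replay logic itself can be summarized with the following sketch. It assumes the pattern has already been loaded as a list of (time offset, payload) pairs and that the server answers every burst, as in the simulations; the function name, the host/port arguments and the 4096-byte read are illustrative assumptions rather than the exact code of Replaytraffic.py.

    import socket
    import time

    def replay_bursts(bursts, host, port):
        # bursts: list of (time_offset_in_seconds, payload_bytes) extracted from the capture.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.connect((host, port))
        start = time.time()
        for offset, payload in bursts:
            # Wait until the burst is due, reproducing the original timing of the capture.
            delay = offset - (time.time() - start)
            if delay > 0:
                time.sleep(delay)
            sock.sendall(payload)
            sock.recv(4096)   # read the server reply, as in the simulations (assumption)
        sock.close()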

When replaying traffic M2M, it was important to recreate the same traffic load as in the original capture. With this approach we could compare them, draw conclusions and check the accuracy of the method. To achieve this, we had to filter the data sent from the data source to the proxy. This very data was replayed twice M2M with the second script, so that across the whole network the same amount of data is sent, but in this case directly from client to server. This strategy also allows receiving the same data from the server, so the behaviour of the packets was very similar to the real case. An example of this approach is represented in Figure 4.1: the diagram on the left shows the traffic in the simulations, with data source, proxy and server, while the diagram on the right is the result of implementing the M2M strategy just mentioned. As can be seen, the amount of data and the number of packets are the same.

Figure 4.1: Structure of traffic replayed M2M

The results of following this strategy are shown in Figure 4.2, where we compare the number of bytes sent over time in the simulation with one client and in the M2M recreation.

Figure 4.2: Comparison between simulation and replayed traffic

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2 by comparing the graph obtained in the simulation with the one achieved by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained. Then, the most remarkable results obtained from recreating the pattern on a large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values achieved we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario to recreate the traffic pattern M2M was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick out the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
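Launching the server and the data-source instances can be automated with the boto library [28]. The following is only a sketch under assumed parameters (the region, AMI id, key pair and security group are placeholders), not the exact provisioning code of the TaaS system.

    import time
    import boto.ec2

    def launch_server(instance_type='m1.large'):
        # Connect to the region where the TaaS scenario is deployed (placeholder region).
        conn = boto.ec2.connect_to_region('eu-west-1')
        reservation = conn.run_instances('ami-00000000',          # placeholder AMI id
                                         instance_type=instance_type,
                                         key_name='taas-key',      # placeholder key pair
                                         security_groups=['taas-sg'])
        instance = reservation.instances[0]
        # Wait until EC2 reports the instance as running before using it.
        while instance.state != 'running':
            time.sleep(5)
            instance.update()
        return instance.public_dns_name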

This TaaS infrastructure was very flexible and highly customizable. In the simulation, the data source characteristics could be easily modified; for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It was therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same both in the simulations and in the traffic recreations.

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic which is going to be replayed should not cover only a short time period. We must use the same infrastructure and the recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, the breaking point is examined to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps to follow in order to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the number of repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded as a pcap file to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick out the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier that recreates this pattern towards the server. Another capture file is then downloaded to the computer, from which we can extract the results of the tests.
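As an example of the kind of configuration this step refers to, a data source could expose parameters like the ones below. The names are hypothetical; the real Client.py may use different variables, but the idea is the same: data size, number of repetitions and a random waiting time between bursts.

    import random
    import time

    # Hypothetical configuration block for Client.py (names are illustrative).
    DATA_BURST_BYTES = 1980        # amount of data sent in every burst
    REPETITIONS = 400              # number of bursts per simulation
    WAIT_RANGE_SECONDS = (1, 3)    # random pause between bursts

    def wait_between_bursts():
        # Sleep for a random time inside the configured range.
        time.sleep(random.uniform(*WAIT_RANGE_SECONDS))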

5.3 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, with one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.

Once we had a capture, we were ready to multiply the traffic. First we performed several recreations, scaling up the number of data sources: we started with one data source and increased up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We can then compare the different results with the original simulation to extract interesting conclusions. The number of data sources was increased one at a time, every five seconds.
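Conceptually, the multiplier ramps up the load in the way sketched below: one replay worker is started for each simulated data source, with a five-second pause between consecutive workers. It reuses the replay_bursts function sketched in the previous chapter, and the whole fragment is an assumption about the structure, not the literal code of the multiplier.

    import threading
    import time

    def start_data_sources(bursts, host, port, num_sources, ramp_seconds=5):
        # Start one replay thread per simulated data source, one every ramp_seconds.
        threads = []
        for _ in range(num_sources):
            worker = threading.Thread(target=replay_bursts, args=(bursts, host, port))
            worker.start()
            threads.append(worker)
            time.sleep(ramp_seconds)
        # Wait for every data source to finish its replay.
        for worker in threads:
            worker.join()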

Figure 5.1 represents the number of bytes sent in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks; this is because at certain moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests

Figure 5.2 shows the result of the same sort of test but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the same applies to the red graph, but in this case with up to 80 sources sending data. The black graph shows about twice the amount of bytes of the blue one in Figure 5.1, which is the expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops rising after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets where bytes cannot be sent any faster, here when the number of clients gets to about 30 (keeping in mind that there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using a m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests but with a higher-quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate the wide difference there can be between instance types. The black graph represents ten clients, and it has peaks and a number of bytes sent similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the best instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure; here we can see that the network cannot exchange data any faster once about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating from 2 to 20 clients are not very different. This is because the server did not have problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although there are no peaks standing out, the average is quite a bit higher than in the other tests. This graph is also smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher RTT average and the smoothness of the graph obtained. The rest of the tests show a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations

Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four instance types tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases, with one or two clients not even one segment is lost. Nevertheless, with 80 clients the number of lost packets in some tests reaches a percentage that definitely affects the quality of service. Normally there is an important change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap of packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120        0.125

Table 5.1: Percentage of lost packets

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter we could check how the server handled the traffic in different situations. There were similar quantities of data during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased quite a lot; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the sending of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments differs with the sort of instance, but especially with the number of clients used.


CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes  41
3.2 RTT with data bursts of 5940 bytes  42
3.3 Number of TCP retransmissions  42
3.4 Number of lost packets  43
3.5 Number of duplicate ACK  43
5.1 Percentage of lost packets  53

List of Figures

1.1 Flow diagram of the developed system  10
2.1 OSI model  12
2.2 HTTP request  14
2.3 Fields of the IP Header  16
2.4 Datagram fragmentation  18
2.5 ARP request  19
2.6 Ethernet layers in OSI model  19
2.7 UDP protocol header  20
2.8 TCP protocol header  22
2.9 Establishing a connection in TCP  23
2.10 Sliding window method  24
2.11 Example RTT interval  25
2.12 Jitter effect  26
2.13 Relation between Latency and Bandwidth  27
2.14 Proxy operation  29
3.1 Structure client server  31
3.2 Structure client proxy server  33
3.3 Bytes through the proxy with data burst of 1980 bytes  37
3.4 Bytes through the proxy with data burst of 5940 bytes  37
3.5 Structure for simulation  38
3.6 Bytes through the proxy with data burst of 1980 bytes  39
3.7 Bytes through the proxy with data burst of 5940 bytes  39
3.8 Average RTT with 3 data sources  40
3.9 Average RTT with 10 data sources  41
4.1 Structure of traffic replayed M2M  47
4.2 Comparison between simulation and replayed traffic  47
5.1 Number of bytes over time in different tests  51
5.2 Bytes using a m1.large instance for the server  51
5.3 Bytes using a c1.xlarge instance for the server  52
5.4 Average RTT extracted from the traffic recreations  52

REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.
[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] O. E. David Boswarthick and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.
[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ. Accessed January 2014.
[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding, J. Gettys, "Hypertext Transfer Protocol - HTTP/1.1", 1999.
[17] Tutorialspoint, "HTTP - quick guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol", 1980.
[21] S. Kollar, "Introduction to IPv6", 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] R. Sharpe, U. Lamping and E. Warnicke, "Wireshark user's guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.
[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.
[34] P. M. Sandor Molnar and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 46: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

CHAPTER 4

Traffic Pattern Extraction

In this section a traffic pattern was obtained in order to recreate it and send this traffic

multiplied M2M (Machine-to-Machine) towards the same server later on It was needed

to look up a method to know how to develop a proper extraction to generate the traffic

again To do so we looked into several documents with projects where was explained

how traffic generators create realistic traffic From those publications we obtained three

main characteristics to rebuild the traffic as similar as possible The first one is the packet

length [33][34] this required packets to be created with same amount of data as well as

equally long headers The second feature concerns packet timestamp and packet time

distribution [34][35] It was needed that the outbound packets had similar frequency the

length of time from one packet to the next must be as similar as possible to the capture

we wanted to replay Finally to create a realistic traffic network it was significant to

send the same number of packets [33]

41 Collecting packet data

First of all we needed to obtain from a pcap file the most significant characteristics

of the packets We used the files recorded during the previous simulations in the proxy

instance The next step was to program a script in python made especially to obtain the

features needed from every packet The best option to make this possible was a python

library called dpkt [36] Using this library a script was written to collect the required

data from a pcap file such as packet time stamp length and data sent

To recreate the traffic the script had to extract the data of each packet one at a time in

order to resend them when replaying traffic However after some tests it was found out

that was much more accurate to gather the data from all the packets involved in one data

burst and put that data together again This way when it comes to replay the traffic

all the data contained in one burst is sent at once instead of sending the data packet by

packet This is exactly the same way the data source sent packets in the simulations

45

46 Traffic pattern extraction

therefore this method was much better to recreate the traffic pattern obtained Moreover

the script worked out the length of time elapsed from the first packet captured in the

simulation until the data source sent the first packet with data This was very helpful

when replaying the same capture since the data source started to replay the data at the

same time The script also extracted the timestamp when every data burst was sent

Therefore it was easier to compare graphs and the traffic recreation was highly precise

In the original simulations where the proxy was set a few protocols were needed to

establish the communication over this proxy These were a couple of HTTP and a few

DNS segments which were meaningless to recreate M2M traffic since there is no proxy in

between The best solution was to filter them out with the script written to extract the

pattern These segments would be very difficult to recreate and they are not noticeable

during the simulation due to their very low weight

42 Replaying traffic pattern

After analyzing the pcap file to obtain data sent and timestamps the same script must

save all this information so that a second script can use it to replay the packets properly

The first script saved in a file the information gathered from one data burst as well as the

timestamp Extractpatternpy is the script that obtains all this information and saves

it in the file mentioned Then the second script access to the information contained in

this file Knowing this information this program is ready to replay the data in the most

accurate and timely manner as well as with the same number of packets The script

resent the traffic using socket programming in python like in the simulations Both the

file and this script were deployed in an instance from where the packets were sent to the

server The script used to resend the traffic must be simple in order to run as fast as

possible and send the packets in an accurate manner I have to point out that in order to

test the server it is only necessary to run the second script named Replaytrafficpy since

it calls automatically the first script (Extractpatternpy) to obtain the traffic pattern

When replaying traffic M2M it was important to recreate the same traffic load than

in the original capture With this approach we could compare them to draw important

conclusions and check the accuracy of the method carried out To achieve this we had

to filter the data sent from data source to proxy These very data sniffed were replayed

twice M2M with the second script so that in the whole network we are sending the same

amount of data but in this case directly from client to server This strategy allows to

receiving the same data from the server as well therefore the behaviour of the packets

was very similar to the real case An example of this approach is represented in the Figure

41 The figure on the left shows the traffic network in the simulations with data source

proxy and server Furthermore the figure on the right is the result of implementing

the strategy mentioned before M2M As we can see the amount of data and number of

packets is the same

The results of following this strategy are shown in the Figure 42 In the graphs we

42 Replaying traffic pattern 47

Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method to obtain a pattern from the traffic generated

in the simulations Moreover the high precision of this approach was proven in the Figure

42 comparing the graph obtained in the simulation with another achieve recreating the

pattern obtained Therefore we can move on to the last part of the thesis where we

multiply this pattern to extract important results related with the server

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained Then re-

markable results obtained from recreating the pattern in large scale are shown and

analyzed to find out how the server responses to certain quantities of data With the

values achieved we demonstrate the reliability of the TaaS [3] system created

51 Design of TaaS for M2M communications

As in the case of the previous simulations the scenario to recreate the traffic pattern

M2M was created within the Amazon cloud First of all the server was set up in a EC2

instance The TaaS infrastructure created allowed us to choose among different type of

instances easily where to deploy servers or clients Once the server was running we could

proceed to pick out number and type of instances in order to multiply the traffic pattern

towards this very server When these data sources started to run the TaaS system

sniffed the ongoing traffic in the server to finally download automatically the recording

for further analysis

This TaaS infrastructure was very flexible and highly customizable In the simulation

the data source characteristics could be easily modified For instance we could choose

amount of data to send frequency of data bursts and number of repetitions Therefore it

was possible to generate light normal or heavy traffic loads depending on the frequency

and the data sent In addition we could create short or long simulations choosing number

of repetitions The recordings obtained from these simulations are then multiplied M2M

so that the configuration is the same both in the simulations and the traffic recreations

Replaying real traffic is the best way to determine precisely [37] in this case how

much traffic the server can handle providing a certain quality of service The following

are some important characteristics when it comes to recreate traffic to stress a server

[37][38] First of all the traffic which is going to be replayed should not be a short time

period We must use the same infrastructure and recorded traffic when it comes to replay

49

50 Multiplying Traffic Pattern

the traffic pattern The traffic must be sent from different points at once simulating in

this way the connection of different clients Then scale up the number of clients in order

to increase the traffic towards the server until it stops working properly And finally look

into the break to figure out the server capacity

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system

created First of all we must configure the script Clientpy setting amount of data

repetitions and random waiting time with the values that suit us Then we run the

script Simulationpy to exchange the information between client and server The traffic

then is recorded and downloaded automatically in a pcap file to the computer where

we run Simulationpy The second part is about replaying the traffic pattern First we

must start the script Servertoreplaypy to set up the same server used in simulations but

now we have the chance to pick out the type of instance for this server Finally we run

Replaytrafficpy that obtains the traffic pattern and create the multiplier to recreate this

pattern towards the server Another file is then downloaded to the computer from which

we can extract the results of the tests

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy The

simulation chosen lasted nearly fifteen minutes and there was one data source sending

3960 bytes of data up to 400 times Between data burst there is a waiting time that

can vary from 1 to 3 seconds This feature make the simulations more realistic since we

never know when a client is going to establish a connection with a server

Once we have got a capture we were ready to multiply the traffic First we performed

several simulations scaling up the number of data sources We started with one data

source and increased up to 80 which was considered an enough number of clients to

create heavy loads of traffic Then we can compare between the different results and the

original simulation to extract interesting results The number of player was increased

one at a time every five seconds

In the Figure 51 is represented the number of bytes sent in different tests The instance

used in this case to set up the server is m1large The black red and blue graphs display

the recreation of the real simulation for one two and ten data sources respectively We

can appreciate that the red line seems to be twice as high as the graph for one client

This is exactly the value expected In addition the highest graph is ten times higher than

the black one so that all the data sources sent the right number of bytes The graph with

ten clients is rougher with higher peaks This is due to the fact that in those moments

several clients were not sending data so there is a big difference in the number of bytes

53 Results of traffic recreation 51

exchanged

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with higher number of

data sources The black graph was created with the amount of packets exchanged among

20 clients and the intended server The same for the red graph but in this case with up

to 80 sources sending data The black graph appears to have double amount of bytes

than the blue one of the Figure 51 This is a expected result since the number of clients

is twice larger as well However the red graph is not four times bigger than the black

one and it is very rough In fact in the 80 clients recreation appears to be problems to

send data very soon After 150 seconds the graph goes up very little and slowly In this

test with 80 clients every 5 seconds a new client was being added therefore the graph

should go up approximately until 400 seconds Nevertheless it stops going up after about

225 seconds Therefore in this case the communication seems to reach a limit in the

exchange of packets where bytes cannot be sent faster In this case when the number

of clients get to about 30 (keeping in mind there is a slight delay when increasing the

number of data sources)

Figure 52 Bytes using a m1large instance for the server

Now we can compare the Figure 52 with the results achieve running the same tests

but with a higher quality instance for the server We used the type c1xlarge If we

52 Multiplying Traffic Pattern

look over the Figure 53 and compare with the previous one we can appreciate the wide

difference that there can be between type of instances The black graph represents ten

clients and it has similar peaks and number of bytes sent than in the Figure 52 However

the recreation with 80 clients is much higher using the best instance The gap between

graphs is about three times larger in Figure 53 Either way there is another limit for the

data sent in this figure Here we can see that the network cannot exchange data faster

after about 60 clients are running

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyze RTT as we can see in the Figure 54 the graphs obtained

from the tests simulating from 2 to 20 clients are not very different This is because

the server did not have problems to handle these amounts of clients as we could see in

the previous figures The most noticeable difference is in the 80 clients graph where

despite there are no peaks standing out the average is quite higher than in the other

tests This graph appears to be smoother as well because there are packets coming and

going more often than in the rest so there cannot be large variations Therefore the

main characteristics when recreating many clients is the bigger RTT average and the

smoothness of the graph obtained The rest of the test seems to have a similar RTT

Figure 54 Average RTT extracted from the traffic recreations

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore in the first part we developed in the Amazon Cloud the scenario we just

mentioned Then we tested it to find out how its behaviour and performance changed

under different conditions (number of clients type of instance etc) In the next step we

came down to the most important part of the thesis where we had to extract and recreate

the traffic pattern To achieve a correct pattern extraction and recreation the number of

bytes and some packet features must be as similar as possible in the simulations and in

the traffic replayed M2M When the script to recreate the traffic pattern was ready we

proceeded to replay the packets towards the server With the TaaS environment created

the server could be set up in different type of instances to analyze how this affected to

the server performance Moreover the number of clients could be highly scaled up to find

out how the server would manage to handle heavy traffic loads Finally after carrying

out many tests under different conditions the results were shown and explained trying

to estimate the reliability of the TaaS system developed

The results of the simulations show how the number of bytes grew correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features, such as packet loss, TCP retransmissions and duplicate ACKs, showed more varied values. These results showed a performance improvement in the network when using higher quality instances and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with those of the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become worse and worse; the similarity between the two graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes; this happens when we greatly increase the number of data sources. It has also been shown that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also some variance in these results when comparing different types of server instances.
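The bytes-over-time comparison described above can be reproduced from the downloaded pcap files. One possible way, sketched below, uses dpkt (one of the libraries referenced in this thesis) to sum TCP payload bytes into one-second bins; the capture filename is a placeholder and the plotting step is left out.

    import dpkt
    from collections import defaultdict

    PCAP = "simulation.pcap"  # placeholder name for a recorded capture

    bytes_per_second = defaultdict(int)
    first_ts = None

    with open(PCAP, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            # Skip anything that is not TCP over IP (ARP, DNS over UDP, etc.).
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            if first_ts is None:
                first_ts = ts
            bytes_per_second[int(ts - first_ts)] += len(ip.data.data)  # TCP payload only

    for second in sorted(bytes_per_second):
        print("%d %d" % (second, bytes_per_second[second]))

Printing one "second bytes" pair per line gives a series that can be plotted and laid over the corresponding curve from the original simulation, which is essentially the comparison shown in Figure 4.2.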

Overall, the results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not as satisfactory; after testing many different TCP servers, the RTT behaved differently depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, demonstrate the usefulness of this system.
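As an indication of how those options can be selected programmatically, the sketch below uses boto (the EC2 library listed in the references) to launch a chosen number of client instances of a chosen type. The region, AMI id and key pair name are hypothetical placeholders, the real thesis scripts may differ, and AWS credentials are assumed to be configured in the environment.

    import boto.ec2

    # Illustrative placeholder values.
    REGION = "eu-west-1"
    CLIENT_AMI = "ami-00000000"   # image with the client/replayer script installed
    INSTANCE_TYPE = "t1.micro"    # instance type used for the traffic sources
    NUM_CLIENTS = 20              # how many data sources to launch

    conn = boto.ec2.connect_to_region(REGION)
    reservation = conn.run_instances(
        CLIENT_AMI,
        min_count=NUM_CLIENTS,
        max_count=NUM_CLIENTS,
        instance_type=INSTANCE_TYPE,
        key_name="taas-key")      # placeholder key pair name

    for instance in reservation.instances:
        print("launched %s (%s)" % (instance.id, instance.instance_type))

Scaling a test up or down then amounts to changing NUM_CLIENTS and INSTANCE_TYPE, which reflects the flexibility of the environment described above.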

6.2 Future Work

In this thesis the TaaS system created is based on TCP, and it focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes
3.2 RTT with data bursts of 5940 bytes
3.3 Number of TCP retransmissions
3.4 Number of lost packets
3.5 Number of duplicate ACK
5.1 Percentage of lost packets

List of Figures

1.1 Flow diagram of the developed system
2.1 OSI model
2.2 HTTP request
2.3 Fields of the IP Header
2.4 Datagram fragmentation
2.5 ARP request
2.6 Ethernet layers in OSI model
2.7 UDP protocol header
2.8 TCP protocol header
2.9 Establishing a connection in TCP
2.10 Sliding window method
2.11 Example RTT interval
2.12 Jitter effect
2.13 Relation between Latency and Bandwidth
2.14 Proxy operation
3.1 Structure client server
3.2 Structure client proxy server
3.3 Bytes through the proxy with data burst of 1980 bytes
3.4 Bytes through the proxy with data burst of 5940 bytes
3.5 Structure for simulation
3.6 Bytes through the proxy with data burst of 1980 bytes
3.7 Bytes through the proxy with data burst of 5940 bytes
3.8 Average RTT with 3 data sources
3.9 Average RTT with 10 data sources
4.1 Structure of traffic replayed M2M
4.2 Comparison between simulation and replayed traffic
5.1 Number of bytes over time in different tests
5.2 Bytes using a m1.large instance for the server
5.3 Bytes using a c1.xlarge instance for the server
5.4 Average RTT extracted from the traffic recreations

REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol -- HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 47: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

46 Traffic pattern extraction

therefore this method was much better to recreate the traffic pattern obtained Moreover

the script worked out the length of time elapsed from the first packet captured in the

simulation until the data source sent the first packet with data This was very helpful

when replaying the same capture since the data source started to replay the data at the

same time The script also extracted the timestamp when every data burst was sent

Therefore it was easier to compare graphs and the traffic recreation was highly precise

In the original simulations where the proxy was set a few protocols were needed to

establish the communication over this proxy These were a couple of HTTP and a few

DNS segments which were meaningless to recreate M2M traffic since there is no proxy in

between The best solution was to filter them out with the script written to extract the

pattern These segments would be very difficult to recreate and they are not noticeable

during the simulation due to their very low weight

42 Replaying traffic pattern

After analyzing the pcap file to obtain data sent and timestamps the same script must

save all this information so that a second script can use it to replay the packets properly

The first script saved in a file the information gathered from one data burst as well as the

timestamp Extractpatternpy is the script that obtains all this information and saves

it in the file mentioned Then the second script access to the information contained in

this file Knowing this information this program is ready to replay the data in the most

accurate and timely manner as well as with the same number of packets The script

resent the traffic using socket programming in python like in the simulations Both the

file and this script were deployed in an instance from where the packets were sent to the

server The script used to resend the traffic must be simple in order to run as fast as

possible and send the packets in an accurate manner I have to point out that in order to

test the server it is only necessary to run the second script named Replaytrafficpy since

it calls automatically the first script (Extractpatternpy) to obtain the traffic pattern

When replaying traffic M2M it was important to recreate the same traffic load than

in the original capture With this approach we could compare them to draw important

conclusions and check the accuracy of the method carried out To achieve this we had

to filter the data sent from data source to proxy These very data sniffed were replayed

twice M2M with the second script so that in the whole network we are sending the same

amount of data but in this case directly from client to server This strategy allows to

receiving the same data from the server as well therefore the behaviour of the packets

was very similar to the real case An example of this approach is represented in the Figure

41 The figure on the left shows the traffic network in the simulations with data source

proxy and server Furthermore the figure on the right is the result of implementing

the strategy mentioned before M2M As we can see the amount of data and number of

packets is the same

The results of following this strategy are shown in the Figure 42 In the graphs we

42 Replaying traffic pattern 47

Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method to obtain a pattern from the traffic generated

in the simulations Moreover the high precision of this approach was proven in the Figure

42 comparing the graph obtained in the simulation with another achieve recreating the

pattern obtained Therefore we can move on to the last part of the thesis where we

multiply this pattern to extract important results related with the server

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained Then re-

markable results obtained from recreating the pattern in large scale are shown and

analyzed to find out how the server responses to certain quantities of data With the

values achieved we demonstrate the reliability of the TaaS [3] system created

51 Design of TaaS for M2M communications

As in the case of the previous simulations the scenario to recreate the traffic pattern

M2M was created within the Amazon cloud First of all the server was set up in a EC2

instance The TaaS infrastructure created allowed us to choose among different type of

instances easily where to deploy servers or clients Once the server was running we could

proceed to pick out number and type of instances in order to multiply the traffic pattern

towards this very server When these data sources started to run the TaaS system

sniffed the ongoing traffic in the server to finally download automatically the recording

for further analysis

This TaaS infrastructure was very flexible and highly customizable In the simulation

the data source characteristics could be easily modified For instance we could choose

amount of data to send frequency of data bursts and number of repetitions Therefore it

was possible to generate light normal or heavy traffic loads depending on the frequency

and the data sent In addition we could create short or long simulations choosing number

of repetitions The recordings obtained from these simulations are then multiplied M2M

so that the configuration is the same both in the simulations and the traffic recreations

Replaying real traffic is the best way to determine precisely [37] in this case how

much traffic the server can handle providing a certain quality of service The following

are some important characteristics when it comes to recreate traffic to stress a server

[37][38] First of all the traffic which is going to be replayed should not be a short time

period We must use the same infrastructure and recorded traffic when it comes to replay

49

50 Multiplying Traffic Pattern

the traffic pattern The traffic must be sent from different points at once simulating in

this way the connection of different clients Then scale up the number of clients in order

to increase the traffic towards the server until it stops working properly And finally look

into the break to figure out the server capacity

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system

created First of all we must configure the script Clientpy setting amount of data

repetitions and random waiting time with the values that suit us Then we run the

script Simulationpy to exchange the information between client and server The traffic

then is recorded and downloaded automatically in a pcap file to the computer where

we run Simulationpy The second part is about replaying the traffic pattern First we

must start the script Servertoreplaypy to set up the same server used in simulations but

now we have the chance to pick out the type of instance for this server Finally we run

Replaytrafficpy that obtains the traffic pattern and create the multiplier to recreate this

pattern towards the server Another file is then downloaded to the computer from which

we can extract the results of the tests

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy The

simulation chosen lasted nearly fifteen minutes and there was one data source sending

3960 bytes of data up to 400 times Between data burst there is a waiting time that

can vary from 1 to 3 seconds This feature make the simulations more realistic since we

never know when a client is going to establish a connection with a server

Once we have got a capture we were ready to multiply the traffic First we performed

several simulations scaling up the number of data sources We started with one data

source and increased up to 80 which was considered an enough number of clients to

create heavy loads of traffic Then we can compare between the different results and the

original simulation to extract interesting results The number of player was increased

one at a time every five seconds

In the Figure 51 is represented the number of bytes sent in different tests The instance

used in this case to set up the server is m1large The black red and blue graphs display

the recreation of the real simulation for one two and ten data sources respectively We

can appreciate that the red line seems to be twice as high as the graph for one client

This is exactly the value expected In addition the highest graph is ten times higher than

the black one so that all the data sources sent the right number of bytes The graph with

ten clients is rougher with higher peaks This is due to the fact that in those moments

several clients were not sending data so there is a big difference in the number of bytes

53 Results of traffic recreation 51

exchanged

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with higher number of

data sources The black graph was created with the amount of packets exchanged among

20 clients and the intended server The same for the red graph but in this case with up

to 80 sources sending data The black graph appears to have double amount of bytes

than the blue one of the Figure 51 This is a expected result since the number of clients

is twice larger as well However the red graph is not four times bigger than the black

one and it is very rough In fact in the 80 clients recreation appears to be problems to

send data very soon After 150 seconds the graph goes up very little and slowly In this

test with 80 clients every 5 seconds a new client was being added therefore the graph

should go up approximately until 400 seconds Nevertheless it stops going up after about

225 seconds Therefore in this case the communication seems to reach a limit in the

exchange of packets where bytes cannot be sent faster In this case when the number

of clients get to about 30 (keeping in mind there is a slight delay when increasing the

number of data sources)

Figure 52 Bytes using a m1large instance for the server

Now we can compare the Figure 52 with the results achieve running the same tests

but with a higher quality instance for the server We used the type c1xlarge If we

52 Multiplying Traffic Pattern

look over the Figure 53 and compare with the previous one we can appreciate the wide

difference that there can be between type of instances The black graph represents ten

clients and it has similar peaks and number of bytes sent than in the Figure 52 However

the recreation with 80 clients is much higher using the best instance The gap between

graphs is about three times larger in Figure 53 Either way there is another limit for the

data sent in this figure Here we can see that the network cannot exchange data faster

after about 60 clients are running

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyze RTT as we can see in the Figure 54 the graphs obtained

from the tests simulating from 2 to 20 clients are not very different This is because

the server did not have problems to handle these amounts of clients as we could see in

the previous figures The most noticeable difference is in the 80 clients graph where

despite there are no peaks standing out the average is quite higher than in the other

tests This graph appears to be smoother as well because there are packets coming and

going more often than in the rest so there cannot be large variations Therefore the

main characteristics when recreating many clients is the bigger RTT average and the

smoothness of the graph obtained The rest of the test seems to have a similar RTT

Figure 54 Average RTT extracted from the traffic recreations

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore in the first part we developed in the Amazon Cloud the scenario we just

mentioned Then we tested it to find out how its behaviour and performance changed

under different conditions (number of clients type of instance etc) In the next step we

came down to the most important part of the thesis where we had to extract and recreate

the traffic pattern To achieve a correct pattern extraction and recreation the number of

bytes and some packet features must be as similar as possible in the simulations and in

the traffic replayed M2M When the script to recreate the traffic pattern was ready we

proceeded to replay the packets towards the server With the TaaS environment created

the server could be set up in different type of instances to analyze how this affected to

the server performance Moreover the number of clients could be highly scaled up to find

out how the server would manage to handle heavy traffic loads Finally after carrying

out many tests under different conditions the results were shown and explained trying

to estimate the reliability of the TaaS system developed

The results of the simulations show how the number of bytes was going up correctly as

the number of clients increased The RTT was very similar in all the tests since the traffic

load was not heavy enough to stress the server However some network performance

features such as packet loss tcp retransmissions and duplicated ACK had more different

values This results shown a performance improvement in the network when using high

quality instances and a deterioration when the number of clients was raising

55

56 Summary and Conclusions

When it comes to replay the traffic pattern the first step was to compare the results

obtained from recreating the packets with the simulation In this way we could find out

the quality of this traffic recreator We compared the amount of bytes sent over time and

the result was practically the same most of the time It is remarkable that the recreation

did not become worse and worse the similarity in both graphs was permanent Therefore

with this TaaS system developed we may recreate long simulations In the final results

obtained from the traffic pattern recreation we have shown how the server reached a

maximum exchange of bytes This happens when we highly increased the number of

data sources It has been proven as well how this limit can be higher or lower depending

on the instance quality The differences are slighter when measuring the RTT which was

very similar except when the number of data sources running was very high It was more

interesting the packet loss results The percentage of lost packets was much higher when

many clients were sending data There was also certain variance in these results when

we compared among different type of server instances

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

62 Future WorkIn this thesis the TaaS system created is based in TCP protocols and this system focused

on performance and scalability testing mainly As future work we propose to

1 Develop other cloud tests such as connectivity testing security testing compatibil-

ity testing etc

2 Test different type of servers for instance a HTTP server

3 Work out test costs before starting

4 The scripts may be modified to test different scenarios

CHAPTER 7

Appendices

57

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol

59

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 48: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

42 Replaying traffic pattern 47

Figure 41 Structure of traffic replayed M2M

compare the number of bytes sent over time in the simulation with one client and in the

recreation M2M

Figure 42 Comparison between simulation and replayed traffic

As can be appreciated in the Figure 42 both graphs follow a similar trajectory going

up and down in the same points Especially in the first half of the graph where the

traffic is exactly the same Then there is a point slightly different and from there the

graphs are not exactly the same However we must point out that they still keep a

very similar trajectory until the end and the difference does not get worse and worse

48 Traffic pattern extraction

Therefore this method to recreate the traffic is very accurate regardless of the amount of

time the simulation last Only in a few points the graphs do not match perfectly This is

due to the fact that the server response is something that we could not control Another

important issue is related with the duration Both graphs seem to finish approximately

at the same time Overall the graphs are very similar in amount of bytes sent and

duration Therefore this approach to extract the traffic pattern is very accurate

In this chapter we explained the method to obtain a pattern from the traffic generated

in the simulations Moreover the high precision of this approach was proven in the Figure

42 comparing the graph obtained in the simulation with another achieve recreating the

pattern obtained Therefore we can move on to the last part of the thesis where we

multiply this pattern to extract important results related with the server

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained Then re-

markable results obtained from recreating the pattern in large scale are shown and

analyzed to find out how the server responses to certain quantities of data With the

values achieved we demonstrate the reliability of the TaaS [3] system created

51 Design of TaaS for M2M communications

As in the case of the previous simulations the scenario to recreate the traffic pattern

M2M was created within the Amazon cloud First of all the server was set up in a EC2

instance The TaaS infrastructure created allowed us to choose among different type of

instances easily where to deploy servers or clients Once the server was running we could

proceed to pick out number and type of instances in order to multiply the traffic pattern

towards this very server When these data sources started to run the TaaS system

sniffed the ongoing traffic in the server to finally download automatically the recording

for further analysis

This TaaS infrastructure was very flexible and highly customizable In the simulation

the data source characteristics could be easily modified For instance we could choose

amount of data to send frequency of data bursts and number of repetitions Therefore it

was possible to generate light normal or heavy traffic loads depending on the frequency

and the data sent In addition we could create short or long simulations choosing number

of repetitions The recordings obtained from these simulations are then multiplied M2M

so that the configuration is the same both in the simulations and the traffic recreations

Replaying real traffic is the best way to determine precisely [37] in this case how

much traffic the server can handle providing a certain quality of service The following

are some important characteristics when it comes to recreate traffic to stress a server

[37][38] First of all the traffic which is going to be replayed should not be a short time

period We must use the same infrastructure and recorded traffic when it comes to replay

49

50 Multiplying Traffic Pattern

the traffic pattern The traffic must be sent from different points at once simulating in

this way the connection of different clients Then scale up the number of clients in order

to increase the traffic towards the server until it stops working properly And finally look

into the break to figure out the server capacity

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system

created First of all we must configure the script Clientpy setting amount of data

repetitions and random waiting time with the values that suit us Then we run the

script Simulationpy to exchange the information between client and server The traffic

then is recorded and downloaded automatically in a pcap file to the computer where

we run Simulationpy The second part is about replaying the traffic pattern First we

must start the script Servertoreplaypy to set up the same server used in simulations but

now we have the chance to pick out the type of instance for this server Finally we run

Replaytrafficpy that obtains the traffic pattern and create the multiplier to recreate this

pattern towards the server Another file is then downloaded to the computer from which

we can extract the results of the tests

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy The

simulation chosen lasted nearly fifteen minutes and there was one data source sending

3960 bytes of data up to 400 times Between data burst there is a waiting time that

can vary from 1 to 3 seconds This feature make the simulations more realistic since we

never know when a client is going to establish a connection with a server

Once we have got a capture we were ready to multiply the traffic First we performed

several simulations scaling up the number of data sources We started with one data

source and increased up to 80 which was considered an enough number of clients to

create heavy loads of traffic Then we can compare between the different results and the

original simulation to extract interesting results The number of player was increased

one at a time every five seconds

In the Figure 51 is represented the number of bytes sent in different tests The instance

used in this case to set up the server is m1large The black red and blue graphs display

the recreation of the real simulation for one two and ten data sources respectively We

can appreciate that the red line seems to be twice as high as the graph for one client

This is exactly the value expected In addition the highest graph is ten times higher than

the black one so that all the data sources sent the right number of bytes The graph with

ten clients is rougher with higher peaks This is due to the fact that in those moments

several clients were not sending data so there is a big difference in the number of bytes

53 Results of traffic recreation 51

exchanged

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with higher number of

data sources The black graph was created with the amount of packets exchanged among

20 clients and the intended server The same for the red graph but in this case with up

to 80 sources sending data The black graph appears to have double amount of bytes

than the blue one of the Figure 51 This is a expected result since the number of clients

is twice larger as well However the red graph is not four times bigger than the black

one and it is very rough In fact in the 80 clients recreation appears to be problems to

send data very soon After 150 seconds the graph goes up very little and slowly In this

test with 80 clients every 5 seconds a new client was being added therefore the graph

should go up approximately until 400 seconds Nevertheless it stops going up after about

225 seconds Therefore in this case the communication seems to reach a limit in the

exchange of packets where bytes cannot be sent faster In this case when the number

of clients get to about 30 (keeping in mind there is a slight delay when increasing the

number of data sources)

Figure 52 Bytes using a m1large instance for the server

Now we can compare the Figure 52 with the results achieve running the same tests

but with a higher quality instance for the server We used the type c1xlarge If we

52 Multiplying Traffic Pattern

look over the Figure 53 and compare with the previous one we can appreciate the wide

difference that there can be between type of instances The black graph represents ten

clients and it has similar peaks and number of bytes sent than in the Figure 52 However

the recreation with 80 clients is much higher using the best instance The gap between

graphs is about three times larger in Figure 53 Either way there is another limit for the

data sent in this figure Here we can see that the network cannot exchange data faster

after about 60 clients are running

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyze RTT as we can see in the Figure 54 the graphs obtained

from the tests simulating from 2 to 20 clients are not very different This is because

the server did not have problems to handle these amounts of clients as we could see in

the previous figures The most noticeable difference is in the 80 clients graph where

despite there are no peaks standing out the average is quite higher than in the other

tests This graph appears to be smoother as well because there are packets coming and

going more often than in the rest so there cannot be large variations Therefore the

main characteristics when recreating many clients is the bigger RTT average and the

smoothness of the graph obtained The rest of the test seems to have a similar RTT

Figure 54 Average RTT extracted from the traffic recreations

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore in the first part we developed in the Amazon Cloud the scenario we just

mentioned Then we tested it to find out how its behaviour and performance changed

under different conditions (number of clients type of instance etc) In the next step we

came down to the most important part of the thesis where we had to extract and recreate

the traffic pattern To achieve a correct pattern extraction and recreation the number of

bytes and some packet features must be as similar as possible in the simulations and in

the traffic replayed M2M When the script to recreate the traffic pattern was ready we

proceeded to replay the packets towards the server With the TaaS environment created

the server could be set up in different type of instances to analyze how this affected to

the server performance Moreover the number of clients could be highly scaled up to find

out how the server would manage to handle heavy traffic loads Finally after carrying

out many tests under different conditions the results were shown and explained trying

to estimate the reliability of the TaaS system developed

The results of the simulations show how the number of bytes was going up correctly as

the number of clients increased The RTT was very similar in all the tests since the traffic

load was not heavy enough to stress the server However some network performance

features such as packet loss tcp retransmissions and duplicated ACK had more different

values This results shown a performance improvement in the network when using high

quality instances and a deterioration when the number of clients was raising

55

56 Summary and Conclusions

When it comes to replay the traffic pattern the first step was to compare the results

obtained from recreating the packets with the simulation In this way we could find out

the quality of this traffic recreator We compared the amount of bytes sent over time and

the result was practically the same most of the time It is remarkable that the recreation

did not become worse and worse the similarity in both graphs was permanent Therefore

with this TaaS system developed we may recreate long simulations In the final results

obtained from the traffic pattern recreation we have shown how the server reached a

maximum exchange of bytes This happens when we highly increased the number of

data sources It has been proven as well how this limit can be higher or lower depending

on the instance quality The differences are slighter when measuring the RTT which was

very similar except when the number of data sources running was very high It was more

interesting the packet loss results The percentage of lost packets was much higher when

many clients were sending data There was also certain variance in these results when

we compared among different type of server instances

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

62 Future WorkIn this thesis the TaaS system created is based in TCP protocols and this system focused

on performance and scalability testing mainly As future work we propose to

1 Develop other cloud tests such as connectivity testing security testing compatibil-

ity testing etc

2 Test different type of servers for instance a HTTP server

3 Work out test costs before starting

4 The scripts may be modified to test different scenarios

CHAPTER 7

Appendices

57

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol


Therefore, this method of recreating the traffic is very accurate regardless of how long the simulation lasts. Only at a few points do the graphs not match perfectly, and this is due to the fact that the server response is something we could not control. Another important aspect is the duration: both graphs finish at approximately the same time. Overall, the graphs are very similar in both the amount of bytes sent and the duration, so this approach to extracting the traffic pattern is accurate.

In this chapter we explained the method used to obtain a pattern from the traffic generated in the simulations. Moreover, the high precision of this approach was demonstrated in Figure 4.2, which compares the graph obtained in the simulation with the one obtained by recreating the extracted pattern. We can therefore move on to the last part of the thesis, where we multiply this pattern to extract important results about the server.
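To make the idea of the extracted pattern concrete, the following minimal sketch shows how a list of (time offset, payload size) pairs could be pulled out of a recorded session with the dpkt library [36]. It is only an illustration of the approach, not the thesis script itself; the file name and the server port are hypothetical.

    # Sketch: extract a (time offset, payload size) pattern from a recorded session.
    import dpkt

    def extract_pattern(pcap_path, server_port=8000):
        pattern = []                      # list of (seconds since start, payload bytes)
        start = None
        with open(pcap_path, 'rb') as f:
            for ts, buf in dpkt.pcap.Reader(f):
                eth = dpkt.ethernet.Ethernet(buf)
                if not isinstance(eth.data, dpkt.ip.IP):
                    continue
                ip = eth.data
                if not isinstance(ip.data, dpkt.tcp.TCP):
                    continue
                tcp = ip.data
                # keep only data segments sent by the clients towards the server
                if tcp.dport == server_port and len(tcp.data) > 0:
                    if start is None:
                        start = ts
                    pattern.append((ts - start, len(tcp.data)))
        return pattern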

CHAPTER 5

Multiplying Traffic Pattern

In this chapter, some features of the TaaS system developed are explained. Then, remarkable results obtained from recreating the pattern at large scale are shown and analyzed to find out how the server responds to certain quantities of data. With the values obtained, we demonstrate the reliability of the TaaS [3] system created.

5.1 Design of TaaS for M2M communications

As in the previous simulations, the scenario used to recreate the M2M traffic pattern was created within the Amazon cloud. First of all, the server was set up in an EC2 instance. The TaaS infrastructure created allowed us to easily choose among different types of instances on which to deploy servers or clients. Once the server was running, we could proceed to pick the number and type of instances used to multiply the traffic pattern towards this very server. When these data sources started to run, the TaaS system sniffed the ongoing traffic at the server and finally downloaded the recording automatically for further analysis.
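As an illustration of how deploying such an instance can be automated, the sketch below uses boto 2.x [28] to start a server instance and wait until it is running. The region, AMI identifier, key name and security group are placeholders, not the values used in the thesis.

    # Sketch: launch a server instance in EC2 with boto 2.x (placeholder values).
    import time
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')        # example region
    reservation = conn.run_instances('ami-12345678',      # placeholder AMI
                                     instance_type='m1.large',
                                     key_name='taas-key',
                                     security_groups=['taas-test'])
    instance = reservation.instances[0]

    # Wait until the instance is running; its public name can then be handed
    # to the client and replay scripts.
    while instance.state != 'running':
        time.sleep(5)
        instance.update()
    print(instance.public_dns_name)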

This TaaS infrastructure was very flexible and highly customizable. In the simulations, the characteristics of the data sources could be easily modified: for instance, we could choose the amount of data to send, the frequency of the data bursts and the number of repetitions. It was therefore possible to generate light, normal or heavy traffic loads depending on the frequency and the amount of data sent. In addition, we could create short or long simulations by choosing the number of repetitions. The recordings obtained from these simulations are then multiplied M2M, so that the configuration is the same in both the simulations and the traffic recreations.
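A data source of this kind can be pictured as a small TCP client whose burst size, waiting time and number of repetitions are parameters. The sketch below is only illustrative; the host name and port are placeholders, and the burst size of 1980 bytes is just one of the values used in the earlier tests.

    # Sketch: a configurable data source (burst size, waiting time, repetitions).
    import random
    import socket
    import time

    def run_source(host, port, burst_size, repetitions, min_wait, max_wait):
        sock = socket.create_connection((host, port))
        payload = b'x' * burst_size
        try:
            for _ in range(repetitions):
                sock.sendall(payload)                          # one data burst
                time.sleep(random.uniform(min_wait, max_wait)) # pause between bursts
        finally:
            sock.close()

    # Example: light load with small bursts and long pauses.
    run_source('server.example.com', 8000, burst_size=1980,
               repetitions=50, min_wait=1.0, max_wait=3.0)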

Replaying real traffic is the best way to determine precisely [37], in this case, how much traffic the server can handle while providing a certain quality of service. The following are some important guidelines when it comes to recreating traffic to stress a server [37][38]. First of all, the traffic that is going to be replayed should not cover only a short time period. We must use the same infrastructure and recorded traffic when replaying the traffic pattern. The traffic must be sent from different points at once, simulating in this way the connection of different clients. Then, the number of clients is scaled up in order to increase the traffic towards the server until it stops working properly. Finally, we look into the breaking point to figure out the server capacity.

5.2 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system created. First of all, we must configure the script Client.py, setting the amount of data, the repetitions and the random waiting time to the values that suit us. Then we run the script Simulation.py to exchange the information between client and server. The traffic is then recorded and automatically downloaded, as a pcap file, to the computer where we run Simulation.py. The second part is about replaying the traffic pattern. First, we must start the script Servertoreplay.py to set up the same server used in the simulations, but now we have the chance to pick the type of instance for this server. Finally, we run Replaytraffic.py, which obtains the traffic pattern and creates the multiplier to recreate this pattern towards the server. Another file is then downloaded to the computer, from which we can extract the results of the tests.
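The multiplier can be imagined as a set of workers that each walk through the extracted (time offset, payload size) pattern against the server. The sketch below only illustrates that idea with hypothetical names; it is not the Replaytraffic.py script itself.

    # Sketch: replay an extracted pattern from several simulated clients at once.
    import socket
    import threading
    import time

    def replay_pattern(host, port, pattern):
        """pattern is a list of (seconds since start, payload size) tuples."""
        sock = socket.create_connection((host, port))
        start = time.time()
        try:
            for offset, size in pattern:
                delay = offset - (time.time() - start)
                if delay > 0:
                    time.sleep(delay)          # keep the original timing
                sock.sendall(b'x' * size)
        finally:
            sock.close()

    def multiply(host, port, pattern, n_clients):
        threads = [threading.Thread(target=replay_pattern, args=(host, port, pattern))
                   for _ in range(n_clients)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()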

5.3 Results of traffic recreation

The first step in replaying a real traffic pattern was to record a session at the proxy. The simulation chosen lasted nearly fifteen minutes, and there was one data source sending 3960 bytes of data up to 400 times. Between data bursts there is a waiting time that can vary from 1 to 3 seconds. This feature makes the simulations more realistic, since we never know when a client is going to establish a connection with a server.
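As a rough consistency check on these figures: with the waiting time drawn between 1 and 3 seconds, the average gap is about 2 seconds, so 400 bursts take roughly 400 × 2 s = 800 s, around 13 minutes, which agrees with the observed duration of nearly fifteen minutes once the transmission time itself is added. Each data source therefore sends about 400 × 3960 bytes, that is, roughly 1.6 MB in total.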

Once we had a capture, we were ready to multiply the traffic. First, we performed several tests scaling up the number of data sources. We started with one data source and increased the count up to 80, which was considered a sufficient number of clients to create heavy traffic loads. We can then compare the different results with each other and with the original simulation to extract interesting conclusions. The number of replay clients was increased one at a time, every five seconds.
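The gradual ramp-up can be sketched as a loop that launches one additional replay client at a fixed interval. The five-second interval and the maximum of 80 clients come from the test description; the rest of the code is purely illustrative.

    # Sketch: start one additional replay client every five seconds, up to a maximum.
    import threading
    import time

    def ramp_up(start_client, max_clients=80, interval=5.0):
        """start_client(i) is a callable that launches replay client number i."""
        workers = []
        for i in range(max_clients):
            t = threading.Thread(target=start_client, args=(i,))
            t.start()
            workers.append(t)
            time.sleep(interval)               # one new data source every 5 seconds
        for t in workers:
            t.join()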

Figure 5.1 shows the number of bytes sent over time in different tests. The instance used in this case to set up the server is m1.large. The black, red and blue graphs display the recreation of the real simulation for one, two and ten data sources respectively. We can appreciate that the red line is roughly twice as high as the graph for one client, which is exactly the value expected. In addition, the highest graph is ten times higher than the black one, so all the data sources sent the right number of bytes. The graph with ten clients is rougher, with higher peaks. This is due to the fact that at some moments several clients were not sending data, so there is a big difference in the number of bytes exchanged.

Figure 5.1: Number of bytes over time in different tests
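Curves such as the ones in Figure 5.1 can be derived from the recorded captures by bucketing the captured bytes per second. The sketch below, again based on dpkt, is only one possible way to produce that series; the file name is a placeholder.

    # Sketch: bytes observed in a capture, grouped per one-second interval.
    from collections import defaultdict
    import dpkt

    def bytes_per_second(pcap_path):
        buckets = defaultdict(int)
        start = None
        with open(pcap_path, 'rb') as f:
            for ts, buf in dpkt.pcap.Reader(f):
                if start is None:
                    start = ts
                buckets[int(ts - start)] += len(buf)
        return sorted(buckets.items())         # list of (second, bytes)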

Figure 5.2 shows the results of the same sort of tests but with a higher number of data sources. The black graph was created from the packets exchanged between 20 clients and the intended server; the red graph likewise, but in this case with up to 80 sources sending data. The black graph appears to carry about double the amount of bytes of the blue one in Figure 5.1, which is an expected result since the number of clients is twice as large as well. However, the red graph is not four times bigger than the black one, and it is very rough. In fact, in the 80-client recreation problems with sending data appear very soon: after 150 seconds the graph rises very little and slowly. In this test with 80 clients a new client was added every 5 seconds, so the graph should keep rising until approximately 400 seconds. Nevertheless, it stops going up after about 225 seconds. Therefore, in this case the communication seems to reach a limit in the exchange of packets, where bytes cannot be sent any faster. Here this happens when the number of clients gets to about 30 (keeping in mind there is a slight delay when increasing the number of data sources).

Figure 5.2: Bytes using an m1.large instance for the server

Now we can compare Figure 5.2 with the results achieved by running the same tests with a higher-quality instance for the server; we used the type c1.xlarge. If we look over Figure 5.3 and compare it with the previous one, we can appreciate how wide the difference between instance types can be. The black graph represents ten clients and has peaks and byte counts similar to those in Figure 5.2. However, the recreation with 80 clients is much higher using the better instance: the gap between the graphs is about three times larger in Figure 5.3. Either way, there is another limit for the data sent in this figure. Here we can see that the network cannot exchange data any faster after about 60 clients are running.

Figure 5.3: Bytes using a c1.xlarge instance for the server

When it comes to analyzing the RTT, as we can see in Figure 5.4, the graphs obtained from the tests simulating 2 to 20 clients are not very different. This is because the server had no problems handling these numbers of clients, as we could see in the previous figures. The most noticeable difference is in the 80-client graph where, although no peaks stand out, the average is considerably higher than in the other tests. This graph also appears to be smoother, because packets are coming and going more often than in the rest, so there cannot be large variations. Therefore, the main characteristics when recreating many clients are the higher average RTT and the smoothness of the graph obtained. The rest of the tests show a similar RTT.

Figure 5.4: Average RTT extracted from the traffic recreations
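Average RTT figures like the ones plotted in Figure 5.4 can be extracted from a capture with tshark [10]. The call below is a sketch and assumes a tshark version that accepts the -Y display filter option; the field tcp.analysis.ack_rtt holds the per-ACK RTT estimate.

    # Sketch: average of the per-ACK RTT estimates exposed by tshark.
    import subprocess

    def average_ack_rtt(pcap_path):
        out = subprocess.check_output(
            ['tshark', '-r', pcap_path, '-Y', 'tcp.analysis.ack_rtt',
             '-T', 'fields', '-e', 'tcp.analysis.ack_rtt'])
        values = [float(v) for v in out.decode().split() if v]
        return sum(values) / len(values) if values else 0.0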


Finally, we looked into another important feature of network performance: packet losses. The percentage of losses for the four types of instances tested is shown in Table 5.1. In these cases the results vary slightly depending on the kind of instance used to set up the server. However, the most important differences in the results are related to the number of clients sending traffic. In most of the cases with one or two clients, not even one segment is lost. Nevertheless, with 80 clients the number of lost packets reaches, in some tests, a percentage that definitely affects the quality of service. Normally there is a clear change in the results between 10, 20 and 80 clients, becoming worse and worse. One thing to point out is the decrease in the gap in packet losses between 20 and 80 clients as the instance type becomes higher and higher quality.

Server instance type   1 Client   2 Clients   10 Clients   20 Clients   80 Clients
t1.micro               0          0.011       0.044        0.091        0.128
m1.large               0          0.027       0.053        0.128        0.154
c1.medium              0.007      0           0.039        0.076        0.085
c1.xlarge              0.007      0.004       0.067        0.120       0.125

Table 5.1: Percentage of lost packets
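Loss figures such as those in Table 5.1 can be approximated from a capture by counting the segments that Wireshark's TCP analysis flags as lost. The sketch below makes the same assumption about the -Y option as above and counts flagged events rather than true wire losses, so it is only an approximation.

    # Sketch: rough loss percentage based on Wireshark's TCP analysis flags.
    import subprocess

    def count_packets(pcap_path, display_filter=None):
        cmd = ['tshark', '-r', pcap_path]
        if display_filter:
            cmd += ['-Y', display_filter]
        out = subprocess.check_output(cmd)
        return len(out.splitlines())           # tshark prints one line per packet

    def loss_percentage(pcap_path):
        total = count_packets(pcap_path)
        lost = count_packets(pcap_path, 'tcp.analysis.lost_segment')
        return 100.0 * lost / total if total else 0.0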

In this chapter we have shown many results obtained after recreating the extracted pattern. Overall, with the tests carried out and explained in this chapter, we could check how the server handled the traffic in different situations. Similar quantities of data were sent during the whole simulation. We then obtained results relating to the number of bytes per second, where the exchange of packets always reached a limit before we were able to simulate up to 80 clients at once. We observed that the maximum number of clients handled by the server can vary depending on the kind of instance used. The Round Trip Time was not really different until the number of clients increased considerably; with many clients the RTT graph becomes smoother and its average gets higher. Finally, Table 5.1 gave us significant information about the delivery of packets: it shows the risk to packet delivery of connecting many clients to the server at once. The number of lost segments varies with the sort of instance, but especially with the number of clients used.

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we wanted to develop a TaaS system for M2M communication. The work was mainly divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying it directly from clients to the server.

Therefore, in the first part we developed the scenario just mentioned in the Amazon Cloud. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and certain packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were presented and explained in order to estimate the reliability of the TaaS system developed.

The results of the simulations show that the number of bytes rose correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results showed a performance improvement in the network when using high-quality instances, and a deterioration as the number of clients rose.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not become progressively worse: the similarity between both graphs was permanent. Therefore, with the TaaS system developed we may recreate long simulations. In the final results obtained from the traffic pattern recreation, we have shown how the server reached a maximum exchange of bytes. This happens when we greatly increase the number of data sources. It has been proven as well that this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also a certain variance in these results when comparing different types of server instances.

Overall, the results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not so satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results give an estimate of the functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the types of instances used and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, prove the great usefulness of this system.

6.2 Future Work

In this thesis, the TaaS system created is based on the TCP protocol, and it focuses mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out the cost of a test before starting it.

4. Modify the scripts to test different scenarios.

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Selective Acknowledgment Permitted

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using an m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing", 2011.

[2] J. Strickland, "How cloud computing works", http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation", in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing", http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] D. Boswarthick, O. Elloumi and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay", https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)", http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances", http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic", http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite", http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation", http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay", http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions", http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys et al., "Hypertext Transfer Protocol – HTTP/1.1", 1999.

[17] Tutorialspoint, "HTTP - Quick Guide", http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies", http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol", 1980.

[21] S. Kollar, "Introduction to IPv6", 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols", 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception", http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] U. Lamping, R. Sharpe and E. Warnicke, "Wireshark User's Guide", http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap", http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking", http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python Programming Language - Official Website", http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0", https://pypi.python.org/pypi/boto. Accessed January 2014.

[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment", 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol", 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation", in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators", in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation", in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt", http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic", http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic", http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 50: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

CHAPTER 5

Multiplying Traffic Pattern

In this chapter some features of the TaaS system developed are explained Then re-

markable results obtained from recreating the pattern in large scale are shown and

analyzed to find out how the server responses to certain quantities of data With the

values achieved we demonstrate the reliability of the TaaS [3] system created

51 Design of TaaS for M2M communications

As in the case of the previous simulations the scenario to recreate the traffic pattern

M2M was created within the Amazon cloud First of all the server was set up in a EC2

instance The TaaS infrastructure created allowed us to choose among different type of

instances easily where to deploy servers or clients Once the server was running we could

proceed to pick out number and type of instances in order to multiply the traffic pattern

towards this very server When these data sources started to run the TaaS system

sniffed the ongoing traffic in the server to finally download automatically the recording

for further analysis

This TaaS infrastructure was very flexible and highly customizable In the simulation

the data source characteristics could be easily modified For instance we could choose

amount of data to send frequency of data bursts and number of repetitions Therefore it

was possible to generate light normal or heavy traffic loads depending on the frequency

and the data sent In addition we could create short or long simulations choosing number

of repetitions The recordings obtained from these simulations are then multiplied M2M

so that the configuration is the same both in the simulations and the traffic recreations

Replaying real traffic is the best way to determine precisely [37] in this case how

much traffic the server can handle providing a certain quality of service The following

are some important characteristics when it comes to recreate traffic to stress a server

[37][38] First of all the traffic which is going to be replayed should not be a short time

period We must use the same infrastructure and recorded traffic when it comes to replay

49

50 Multiplying Traffic Pattern

the traffic pattern The traffic must be sent from different points at once simulating in

this way the connection of different clients Then scale up the number of clients in order

to increase the traffic towards the server until it stops working properly And finally look

into the break to figure out the server capacity

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system

created First of all we must configure the script Clientpy setting amount of data

repetitions and random waiting time with the values that suit us Then we run the

script Simulationpy to exchange the information between client and server The traffic

then is recorded and downloaded automatically in a pcap file to the computer where

we run Simulationpy The second part is about replaying the traffic pattern First we

must start the script Servertoreplaypy to set up the same server used in simulations but

now we have the chance to pick out the type of instance for this server Finally we run

Replaytrafficpy that obtains the traffic pattern and create the multiplier to recreate this

pattern towards the server Another file is then downloaded to the computer from which

we can extract the results of the tests

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy The

simulation chosen lasted nearly fifteen minutes and there was one data source sending

3960 bytes of data up to 400 times Between data burst there is a waiting time that

can vary from 1 to 3 seconds This feature make the simulations more realistic since we

never know when a client is going to establish a connection with a server

Once we have got a capture we were ready to multiply the traffic First we performed

several simulations scaling up the number of data sources We started with one data

source and increased up to 80 which was considered an enough number of clients to

create heavy loads of traffic Then we can compare between the different results and the

original simulation to extract interesting results The number of player was increased

one at a time every five seconds

In the Figure 51 is represented the number of bytes sent in different tests The instance

used in this case to set up the server is m1large The black red and blue graphs display

the recreation of the real simulation for one two and ten data sources respectively We

can appreciate that the red line seems to be twice as high as the graph for one client

This is exactly the value expected In addition the highest graph is ten times higher than

the black one so that all the data sources sent the right number of bytes The graph with

ten clients is rougher with higher peaks This is due to the fact that in those moments

several clients were not sending data so there is a big difference in the number of bytes

53 Results of traffic recreation 51

exchanged

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with higher number of

data sources The black graph was created with the amount of packets exchanged among

20 clients and the intended server The same for the red graph but in this case with up

to 80 sources sending data The black graph appears to have double amount of bytes

than the blue one of the Figure 51 This is a expected result since the number of clients

is twice larger as well However the red graph is not four times bigger than the black

one and it is very rough In fact in the 80 clients recreation appears to be problems to

send data very soon After 150 seconds the graph goes up very little and slowly In this

test with 80 clients every 5 seconds a new client was being added therefore the graph

should go up approximately until 400 seconds Nevertheless it stops going up after about

225 seconds Therefore in this case the communication seems to reach a limit in the

exchange of packets where bytes cannot be sent faster In this case when the number

of clients get to about 30 (keeping in mind there is a slight delay when increasing the

number of data sources)

Figure 52 Bytes using a m1large instance for the server

Now we can compare the Figure 52 with the results achieve running the same tests

but with a higher quality instance for the server We used the type c1xlarge If we

52 Multiplying Traffic Pattern

look over the Figure 53 and compare with the previous one we can appreciate the wide

difference that there can be between type of instances The black graph represents ten

clients and it has similar peaks and number of bytes sent than in the Figure 52 However

the recreation with 80 clients is much higher using the best instance The gap between

graphs is about three times larger in Figure 53 Either way there is another limit for the

data sent in this figure Here we can see that the network cannot exchange data faster

after about 60 clients are running

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyze RTT as we can see in the Figure 54 the graphs obtained

from the tests simulating from 2 to 20 clients are not very different This is because

the server did not have problems to handle these amounts of clients as we could see in

the previous figures The most noticeable difference is in the 80 clients graph where

despite there are no peaks standing out the average is quite higher than in the other

tests This graph appears to be smoother as well because there are packets coming and

going more often than in the rest so there cannot be large variations Therefore the

main characteristics when recreating many clients is the bigger RTT average and the

smoothness of the graph obtained The rest of the test seems to have a similar RTT

Figure 54 Average RTT extracted from the traffic recreations

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore in the first part we developed in the Amazon Cloud the scenario we just

mentioned Then we tested it to find out how its behaviour and performance changed

under different conditions (number of clients type of instance etc) In the next step we

came down to the most important part of the thesis where we had to extract and recreate

the traffic pattern To achieve a correct pattern extraction and recreation the number of

bytes and some packet features must be as similar as possible in the simulations and in

the traffic replayed M2M When the script to recreate the traffic pattern was ready we

proceeded to replay the packets towards the server With the TaaS environment created

the server could be set up in different type of instances to analyze how this affected to

the server performance Moreover the number of clients could be highly scaled up to find

out how the server would manage to handle heavy traffic loads Finally after carrying

out many tests under different conditions the results were shown and explained trying

to estimate the reliability of the TaaS system developed

The results of the simulations show how the number of bytes was going up correctly as

the number of clients increased The RTT was very similar in all the tests since the traffic

load was not heavy enough to stress the server However some network performance

features such as packet loss tcp retransmissions and duplicated ACK had more different

values This results shown a performance improvement in the network when using high

quality instances and a deterioration when the number of clients was raising

55

56 Summary and Conclusions

When it comes to replay the traffic pattern the first step was to compare the results

obtained from recreating the packets with the simulation In this way we could find out

the quality of this traffic recreator We compared the amount of bytes sent over time and

the result was practically the same most of the time It is remarkable that the recreation

did not become worse and worse the similarity in both graphs was permanent Therefore

with this TaaS system developed we may recreate long simulations In the final results

obtained from the traffic pattern recreation we have shown how the server reached a

maximum exchange of bytes This happens when we highly increased the number of

data sources It has been proven as well how this limit can be higher or lower depending

on the instance quality The differences are slighter when measuring the RTT which was

very similar except when the number of data sources running was very high It was more

interesting the packet loss results The percentage of lost packets was much higher when

many clients were sending data There was also certain variance in these results when

we compared among different type of server instances

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

62 Future WorkIn this thesis the TaaS system created is based in TCP protocols and this system focused

on performance and scalability testing mainly As future work we propose to

1 Develop other cloud tests such as connectivity testing security testing compatibil-

ity testing etc

2 Test different type of servers for instance a HTTP server

3 Work out test costs before starting

4 The scripts may be modified to test different scenarios

CHAPTER 7

Appendices

57

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol

59

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 51: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

50 Multiplying Traffic Pattern

the traffic pattern The traffic must be sent from different points at once simulating in

this way the connection of different clients Then scale up the number of clients in order

to increase the traffic towards the server until it stops working properly And finally look

into the break to figure out the server capacity

52 Reproduce testing

The following is a description of all the steps we must follow to use the TaaS system

created First of all we must configure the script Clientpy setting amount of data

repetitions and random waiting time with the values that suit us Then we run the

script Simulationpy to exchange the information between client and server The traffic

then is recorded and downloaded automatically in a pcap file to the computer where

we run Simulationpy The second part is about replaying the traffic pattern First we

must start the script Servertoreplaypy to set up the same server used in simulations but

now we have the chance to pick out the type of instance for this server Finally we run

Replaytrafficpy that obtains the traffic pattern and create the multiplier to recreate this

pattern towards the server Another file is then downloaded to the computer from which

we can extract the results of the tests

53 Results of traffic recreation

The first step to replay a real traffic pattern was to record a session in the proxy The

simulation chosen lasted nearly fifteen minutes and there was one data source sending

3960 bytes of data up to 400 times Between data burst there is a waiting time that

can vary from 1 to 3 seconds This feature make the simulations more realistic since we

never know when a client is going to establish a connection with a server

Once we have got a capture we were ready to multiply the traffic First we performed

several simulations scaling up the number of data sources We started with one data

source and increased up to 80 which was considered an enough number of clients to

create heavy loads of traffic Then we can compare between the different results and the

original simulation to extract interesting results The number of player was increased

one at a time every five seconds

In the Figure 51 is represented the number of bytes sent in different tests The instance

used in this case to set up the server is m1large The black red and blue graphs display

the recreation of the real simulation for one two and ten data sources respectively We

can appreciate that the red line seems to be twice as high as the graph for one client

This is exactly the value expected In addition the highest graph is ten times higher than

the black one so that all the data sources sent the right number of bytes The graph with

ten clients is rougher with higher peaks This is due to the fact that in those moments

several clients were not sending data so there is a big difference in the number of bytes

53 Results of traffic recreation 51

exchanged

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with higher number of

data sources The black graph was created with the amount of packets exchanged among

20 clients and the intended server The same for the red graph but in this case with up

to 80 sources sending data The black graph appears to have double amount of bytes

than the blue one of the Figure 51 This is a expected result since the number of clients

is twice larger as well However the red graph is not four times bigger than the black

one and it is very rough In fact in the 80 clients recreation appears to be problems to

send data very soon After 150 seconds the graph goes up very little and slowly In this

test with 80 clients every 5 seconds a new client was being added therefore the graph

should go up approximately until 400 seconds Nevertheless it stops going up after about

225 seconds Therefore in this case the communication seems to reach a limit in the

exchange of packets where bytes cannot be sent faster In this case when the number

of clients get to about 30 (keeping in mind there is a slight delay when increasing the

number of data sources)

Figure 52 Bytes using a m1large instance for the server

Now we can compare the Figure 52 with the results achieve running the same tests

but with a higher quality instance for the server We used the type c1xlarge If we

52 Multiplying Traffic Pattern

look over the Figure 53 and compare with the previous one we can appreciate the wide

difference that there can be between type of instances The black graph represents ten

clients and it has similar peaks and number of bytes sent than in the Figure 52 However

the recreation with 80 clients is much higher using the best instance The gap between

graphs is about three times larger in Figure 53 Either way there is another limit for the

data sent in this figure Here we can see that the network cannot exchange data faster

after about 60 clients are running

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyze RTT as we can see in the Figure 54 the graphs obtained

from the tests simulating from 2 to 20 clients are not very different This is because

the server did not have problems to handle these amounts of clients as we could see in

the previous figures The most noticeable difference is in the 80 clients graph where

despite there are no peaks standing out the average is quite higher than in the other

tests This graph appears to be smoother as well because there are packets coming and

going more often than in the rest so there cannot be large variations Therefore the

main characteristics when recreating many clients is the bigger RTT average and the

smoothness of the graph obtained The rest of the test seems to have a similar RTT

Figure 54 Average RTT extracted from the traffic recreations

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore, in the first part we developed in the Amazon Cloud the scenario we just mentioned. Then we tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). In the next step we came to the most important part of the thesis, where we had to extract and recreate the traffic pattern. To achieve a correct pattern extraction and recreation, the number of bytes and some packet features must be as similar as possible in the simulations and in the traffic replayed M2M. When the script to recreate the traffic pattern was ready, we proceeded to replay the packets towards the server. With the TaaS environment created, the server could be set up on different types of instances to analyze how this affected the server performance. Moreover, the number of clients could be scaled up considerably to find out how the server would manage to handle heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were shown and explained, trying to estimate the reliability of the TaaS system developed.
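To make the replay step more concrete, the following simplified sketch shows the idea behind multiplying the pattern: each client thread opens a TCP connection to the server and reproduces an extracted pattern of (delay, burst size) pairs, and new data sources are added gradually. It is only an outline under assumed values; the server address, port, ramp interval and the example pattern are placeholders, not the actual ones used in the tests.

import socket
import threading
import time

SERVER = ("server.example.com", 5000)              # placeholder address and port
PATTERN = [(0.5, 1980), (0.5, 1980), (1.0, 5940)]  # placeholder (delay s, bytes) pattern

def replay_client(pattern, server):
    # One data source: reproduce the timing and size of the extracted bursts.
    sock = socket.create_connection(server)
    try:
        for delay, nbytes in pattern:
            time.sleep(delay)
            sock.sendall(b"x" * nbytes)  # only size and timing matter, not the content
    finally:
        sock.close()

def multiply_pattern(n_clients, ramp_seconds=5):
    # Start n_clients data sources, adding a new one every few seconds.
    threads = []
    for _ in range(n_clients):
        t = threading.Thread(target=replay_client, args=(PATTERN, SERVER))
        t.start()
        threads.append(t)
        time.sleep(ramp_seconds)
    for t in threads:
        t.join()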

The results of the simulations show how the number of bytes went up correctly as the number of clients increased. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, some network performance features such as packet loss, TCP retransmissions and duplicate ACKs showed more varied values. These results showed a performance improvement in the network when using higher quality instances and a deterioration as the number of clients was raised.


When it comes to replaying the traffic pattern, the first step was to compare the results obtained from recreating the packets with the simulation. In this way we could find out the quality of this traffic recreator. We compared the amount of bytes sent over time, and the result was practically the same most of the time. It is remarkable that the recreation did not degrade over time: the similarity between both graphs was permanent. Therefore, with this TaaS system we may recreate long simulations. In the final results obtained from the traffic pattern recreation we have shown how the server reached a maximum exchange of bytes. This happened when we greatly increased the number of data sources. It has also been shown how this limit can be higher or lower depending on the instance quality. The differences are slighter when measuring the RTT, which was very similar except when the number of data sources running was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data. There was also a certain variance in these results when comparing different types of server instances.
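A comparison like the one described above can be produced by bucketing the TCP payload bytes of each capture per second and plotting the two series together. The sketch below, based on the dpkt library cited in this thesis, shows one possible way to extract such a series; it is an assumed reconstruction, not the actual comparison script.

import dpkt

def bytes_per_second(pcap_path):
    # Sum the TCP payload bytes of a capture into one-second buckets.
    buckets = {}
    first_ts = None
    with open(pcap_path, "rb") as f:
        for ts, buf in dpkt.pcap.Reader(f):
            if first_ts is None:
                first_ts = ts
            eth = dpkt.ethernet.Ethernet(buf)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            second = int(ts - first_ts)
            buckets[second] = buckets.get(second, 0) + len(ip.data.data)
    return buckets

# Example: compare simulation and recreation captures (placeholder file names).
# sim = bytes_per_second("simulation.pcap")
# rec = bytes_per_second("recreation.pcap")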

Overall, the results regarding the amount of bytes the server could handle and the growth of packet loss with the number of clients seem good. However, the RTT results were not as satisfactory; I must say that, after testing many different TCP servers, the RTT behaved in a different way depending on the server.

These results indicate the good functionality and reliability of the TaaS for M2M system created. In addition, this structure offers many options, such as the number of clients, the type of instances used, and so on. Therefore, the good results, along with the flexibility and the many options this environment offers, demonstrate the usefulness of this system.

6.2 Future Work

In this thesis the TaaS system created is based on the TCP protocol and focused mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.

2. Test different types of servers, for instance an HTTP server.

3. Work out test costs before starting (a rough estimate is sketched below).

4. Modify the scripts to test different scenarios.
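For point 3, a very rough cost estimate can be obtained before a test run by multiplying the hourly price of the chosen instance type by the number of instances and the expected duration. The sketch below only illustrates the idea; the prices are placeholders and should be replaced with values from the current Amazon EC2 price list.

# Placeholder on-demand prices in USD per hour; check the current EC2 price list.
HOURLY_PRICE_USD = {
    "t1.micro": 0.020,
    "m1.large": 0.240,
    "c1.medium": 0.145,
    "c1.xlarge": 0.580,
}

def estimate_cost(instance_type, n_instances, hours):
    # Total cost = price per hour * number of instances * hours of testing.
    return HOURLY_PRICE_USD[instance_type] * n_instances * hours

# Example: a server plus one traffic-generating machine for three hours.
# print("estimated cost: $%.2f" % estimate_cost("m1.large", 2, 3))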

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TLS Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes 41
3.2 RTT with data bursts of 5940 bytes 42
3.3 Number of TCP retransmissions 42
3.4 Number of lost packets 43
3.5 Number of duplicate ACK 43
5.1 Percentage of lost packets 53


List of Figures

1.1 Flow diagram of the developed system 10
2.1 OSI model 12
2.2 HTTP request 14
2.3 Fields of the IP Header 16
2.4 Datagram fragmentation 18
2.5 ARP request 19
2.6 Ethernet layers in OSI model 19
2.7 UDP protocol header 20
2.8 TCP protocol header 22
2.9 Establishing a connection in TCP 23
2.10 Sliding window method 24
2.11 Example RTT interval 25
2.12 Jitter effect 26
2.13 Relation between Latency and Bandwidth 27
2.14 Proxy operation 29
3.1 Structure client server 31
3.2 Structure client proxy server 33
3.3 Bytes through the proxy with data burst of 1980 bytes 37
3.4 Bytes through the proxy with data burst of 5940 bytes 37
3.5 Structure for simulation 38
3.6 Bytes through the proxy with data burst of 1980 bytes 39
3.7 Bytes through the proxy with data burst of 5940 bytes 39
3.8 Average RTT with 3 data sources 40
3.9 Average RTT with 10 data sources 41
4.1 Structure of traffic replayed M2M 47
4.2 Comparison between simulation and replayed traffic 47
5.1 Number of bytes over time in different tests 51
5.2 Bytes using a m1.large instance for the server 51
5.3 Bytes using a c1.xlarge instance for the server 52
5.4 Average RTT extracted from the traffic recreations 52


REFERENCES

[1] Alexa Huth and James Cebula, "The Basics of Cloud Computing," 2011.

[2] J. Strickland, "How cloud computing works," http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.

[3] J. G. et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.

[4] Guru99, "Performance testing," http://www.guru99.com/performance-testing.html. Accessed January 2014.

[5] O. Elloumi, D. Boswarthick and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.

[6] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.

[7] OWASP, "Testing for WS Replay," https://www.owasp.org/index.php/Testing_for_WS_Replay_(OWASP-WS-007). Accessed January 2014.

[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2. Accessed January 2014.

[9] Amazon Web Services, Inc., "Amazon EC2 instances," http://aws.amazon.com/ec2/instance-types. Accessed January 2014.

[10] G. C. et al., "tshark - dump and analyze network traffic," http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.

[11] E. Software, "tcprewrite," http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.


[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding and J. Gettys, "Hypertext Transfer Protocol – HTTP/1.1," 1999.

[17] Tutorialspoint, "HTTP - quick guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol," 1980.

[21] S. Kollar, "Introduction to IPv6," 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] R. Sharpe, U. Lamping and E. Warnicke, "Wireshark user's guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump & libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website," http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.


[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.

[34] P. Megyesi, S. Molnar and G. Szabo, "How to validate traffic generators," in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 52: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

53 Results of traffic recreation 51

exchanged

Figure 51 Number of bytes over time in different tests

The Figure 52 shows the result of the same sort of tests but with higher number of

data sources The black graph was created with the amount of packets exchanged among

20 clients and the intended server The same for the red graph but in this case with up

to 80 sources sending data The black graph appears to have double amount of bytes

than the blue one of the Figure 51 This is a expected result since the number of clients

is twice larger as well However the red graph is not four times bigger than the black

one and it is very rough In fact in the 80 clients recreation appears to be problems to

send data very soon After 150 seconds the graph goes up very little and slowly In this

test with 80 clients every 5 seconds a new client was being added therefore the graph

should go up approximately until 400 seconds Nevertheless it stops going up after about

225 seconds Therefore in this case the communication seems to reach a limit in the

exchange of packets where bytes cannot be sent faster In this case when the number

of clients get to about 30 (keeping in mind there is a slight delay when increasing the

number of data sources)

Figure 52 Bytes using a m1large instance for the server

Now we can compare the Figure 52 with the results achieve running the same tests

but with a higher quality instance for the server We used the type c1xlarge If we

52 Multiplying Traffic Pattern

look over the Figure 53 and compare with the previous one we can appreciate the wide

difference that there can be between type of instances The black graph represents ten

clients and it has similar peaks and number of bytes sent than in the Figure 52 However

the recreation with 80 clients is much higher using the best instance The gap between

graphs is about three times larger in Figure 53 Either way there is another limit for the

data sent in this figure Here we can see that the network cannot exchange data faster

after about 60 clients are running

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyze RTT as we can see in the Figure 54 the graphs obtained

from the tests simulating from 2 to 20 clients are not very different This is because

the server did not have problems to handle these amounts of clients as we could see in

the previous figures The most noticeable difference is in the 80 clients graph where

despite there are no peaks standing out the average is quite higher than in the other

tests This graph appears to be smoother as well because there are packets coming and

going more often than in the rest so there cannot be large variations Therefore the

main characteristics when recreating many clients is the bigger RTT average and the

smoothness of the graph obtained The rest of the test seems to have a similar RTT

Figure 54 Average RTT extracted from the traffic recreations

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore in the first part we developed in the Amazon Cloud the scenario we just

mentioned Then we tested it to find out how its behaviour and performance changed

under different conditions (number of clients type of instance etc) In the next step we

came down to the most important part of the thesis where we had to extract and recreate

the traffic pattern To achieve a correct pattern extraction and recreation the number of

bytes and some packet features must be as similar as possible in the simulations and in

the traffic replayed M2M When the script to recreate the traffic pattern was ready we

proceeded to replay the packets towards the server With the TaaS environment created

the server could be set up in different type of instances to analyze how this affected to

the server performance Moreover the number of clients could be highly scaled up to find

out how the server would manage to handle heavy traffic loads Finally after carrying

out many tests under different conditions the results were shown and explained trying

to estimate the reliability of the TaaS system developed

The results of the simulations show how the number of bytes was going up correctly as

the number of clients increased The RTT was very similar in all the tests since the traffic

load was not heavy enough to stress the server However some network performance

features such as packet loss tcp retransmissions and duplicated ACK had more different

values This results shown a performance improvement in the network when using high

quality instances and a deterioration when the number of clients was raising

55

56 Summary and Conclusions

When it comes to replay the traffic pattern the first step was to compare the results

obtained from recreating the packets with the simulation In this way we could find out

the quality of this traffic recreator We compared the amount of bytes sent over time and

the result was practically the same most of the time It is remarkable that the recreation

did not become worse and worse the similarity in both graphs was permanent Therefore

with this TaaS system developed we may recreate long simulations In the final results

obtained from the traffic pattern recreation we have shown how the server reached a

maximum exchange of bytes This happens when we highly increased the number of

data sources It has been proven as well how this limit can be higher or lower depending

on the instance quality The differences are slighter when measuring the RTT which was

very similar except when the number of data sources running was very high It was more

interesting the packet loss results The percentage of lost packets was much higher when

many clients were sending data There was also certain variance in these results when

we compared among different type of server instances

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

62 Future WorkIn this thesis the TaaS system created is based in TCP protocols and this system focused

on performance and scalability testing mainly As future work we propose to

1 Develop other cloud tests such as connectivity testing security testing compatibil-

ity testing etc

2 Test different type of servers for instance a HTTP server

3 Work out test costs before starting

4 The scripts may be modified to test different scenarios

CHAPTER 7

Appendices

57

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol

59

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 53: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

52 Multiplying Traffic Pattern

look over the Figure 53 and compare with the previous one we can appreciate the wide

difference that there can be between type of instances The black graph represents ten

clients and it has similar peaks and number of bytes sent than in the Figure 52 However

the recreation with 80 clients is much higher using the best instance The gap between

graphs is about three times larger in Figure 53 Either way there is another limit for the

data sent in this figure Here we can see that the network cannot exchange data faster

after about 60 clients are running

Figure 53 Bytes using a c1xlarge instance for the server

When it comes to analyze RTT as we can see in the Figure 54 the graphs obtained

from the tests simulating from 2 to 20 clients are not very different This is because

the server did not have problems to handle these amounts of clients as we could see in

the previous figures The most noticeable difference is in the 80 clients graph where

despite there are no peaks standing out the average is quite higher than in the other

tests This graph appears to be smoother as well because there are packets coming and

going more often than in the rest so there cannot be large variations Therefore the

main characteristics when recreating many clients is the bigger RTT average and the

smoothness of the graph obtained The rest of the test seems to have a similar RTT

Figure 54 Average RTT extracted from the traffic recreations

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore in the first part we developed in the Amazon Cloud the scenario we just

mentioned Then we tested it to find out how its behaviour and performance changed

under different conditions (number of clients type of instance etc) In the next step we

came down to the most important part of the thesis where we had to extract and recreate

the traffic pattern To achieve a correct pattern extraction and recreation the number of

bytes and some packet features must be as similar as possible in the simulations and in

the traffic replayed M2M When the script to recreate the traffic pattern was ready we

proceeded to replay the packets towards the server With the TaaS environment created

the server could be set up in different type of instances to analyze how this affected to

the server performance Moreover the number of clients could be highly scaled up to find

out how the server would manage to handle heavy traffic loads Finally after carrying

out many tests under different conditions the results were shown and explained trying

to estimate the reliability of the TaaS system developed

The results of the simulations show how the number of bytes was going up correctly as

the number of clients increased The RTT was very similar in all the tests since the traffic

load was not heavy enough to stress the server However some network performance

features such as packet loss tcp retransmissions and duplicated ACK had more different

values This results shown a performance improvement in the network when using high

quality instances and a deterioration when the number of clients was raising

55

56 Summary and Conclusions

When it comes to replay the traffic pattern the first step was to compare the results

obtained from recreating the packets with the simulation In this way we could find out

the quality of this traffic recreator We compared the amount of bytes sent over time and

the result was practically the same most of the time It is remarkable that the recreation

did not become worse and worse the similarity in both graphs was permanent Therefore

with this TaaS system developed we may recreate long simulations In the final results

obtained from the traffic pattern recreation we have shown how the server reached a

maximum exchange of bytes This happens when we highly increased the number of

data sources It has been proven as well how this limit can be higher or lower depending

on the instance quality The differences are slighter when measuring the RTT which was

very similar except when the number of data sources running was very high It was more

interesting the packet loss results The percentage of lost packets was much higher when

many clients were sending data There was also certain variance in these results when

we compared among different type of server instances

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

62 Future WorkIn this thesis the TaaS system created is based in TCP protocols and this system focused

on performance and scalability testing mainly As future work we propose to

1 Develop other cloud tests such as connectivity testing security testing compatibil-

ity testing etc

2 Test different type of servers for instance a HTTP server

3 Work out test costs before starting

4 The scripts may be modified to test different scenarios

CHAPTER 7

Appendices

57

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol

59

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 54: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

53 Results of traffic recreation 53

Finally we looked into another important feature of network performance packet losses

The percentage of the losses for the four type of instances tested is shown in the Table

51 In these cases the results vary slightly depending on the kind of instance used to

set up the server However the most important differences in the results are related with

the number of clients sending traffic In most of the cases for one or two clients there is

no even one segment lost Nevertheless with 80 clients in some tests the number of lost

packets reaches a percentage that definitely affects to the quality of service Normally

there is an important change in the results between 10 20 and 80 becoming worse and

worse One thing to point out in the decrease in the gap of packet losses between 20 and

80 clients as the type of instance becomes higher and higher quality

Server instance type 1 Client 2 Clients 10 Clients 20 Clients 80 Clients

t1micro 0 0011 0044 0091 0128

m1large 0 0027 0053 0128 0154

c1medium 0007 0 0039 0076 0085

c1xlarge 0007 0004 0067 0120 0125

Table 51 Percentage of lost packets

In this chapter we have shown many results obtained after recreating the pattern ex-

tracted Overall with the tests carried out and explained in this chapter we could check

how the server could handle the traffic in different situations There were similar quan-

tities of data during the whole simulation Afterwards we achieved some results relating

to number of bytesseconds where the exchange of packets always reached a limit before

being able to simulate up to 80 clients at once We observed that the maximum number

of clients handled for the server can vary depending on the kind of instance used Round

Trip Time was not really different until the number of clients increases quite a lot With

many clients the RTT graph becomes smoother and its RTT average gets higher Finally

the Table 51 gave us significant information about the sending of packets This table

informs about the risk in the packet delivery of connecting many clients to the server at

once The number of lost segments is different with the sort of instance but especially

with the number of clients used

CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results

obtained We also describe the possible future work related with this project

61 Summary and Results

In this thesis we wanted to develop a TaaS system for a M2M communication The

whole work was mainly divided in several steps creating a client-proxy-server scenario

extracting traffic pattern and multiplying it directly from client to the server

Therefore in the first part we developed in the Amazon Cloud the scenario we just

mentioned Then we tested it to find out how its behaviour and performance changed

under different conditions (number of clients type of instance etc) In the next step we

came down to the most important part of the thesis where we had to extract and recreate

the traffic pattern To achieve a correct pattern extraction and recreation the number of

bytes and some packet features must be as similar as possible in the simulations and in

the traffic replayed M2M When the script to recreate the traffic pattern was ready we

proceeded to replay the packets towards the server With the TaaS environment created

the server could be set up in different type of instances to analyze how this affected to

the server performance Moreover the number of clients could be highly scaled up to find

out how the server would manage to handle heavy traffic loads Finally after carrying

out many tests under different conditions the results were shown and explained trying

to estimate the reliability of the TaaS system developed

The results of the simulations show how the number of bytes was going up correctly as

the number of clients increased The RTT was very similar in all the tests since the traffic

load was not heavy enough to stress the server However some network performance

features such as packet loss tcp retransmissions and duplicated ACK had more different

values This results shown a performance improvement in the network when using high

quality instances and a deterioration when the number of clients was raising

55

56 Summary and Conclusions

When it comes to replay the traffic pattern the first step was to compare the results

obtained from recreating the packets with the simulation In this way we could find out

the quality of this traffic recreator We compared the amount of bytes sent over time and

the result was practically the same most of the time It is remarkable that the recreation

did not become worse and worse the similarity in both graphs was permanent Therefore

with this TaaS system developed we may recreate long simulations In the final results

obtained from the traffic pattern recreation we have shown how the server reached a

maximum exchange of bytes This happens when we highly increased the number of

data sources It has been proven as well how this limit can be higher or lower depending

on the instance quality The differences are slighter when measuring the RTT which was

very similar except when the number of data sources running was very high It was more

interesting the packet loss results The percentage of lost packets was much higher when

many clients were sending data There was also certain variance in these results when

we compared among different type of server instances

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

62 Future WorkIn this thesis the TaaS system created is based in TCP protocols and this system focused

on performance and scalability testing mainly As future work we propose to

1 Develop other cloud tests such as connectivity testing security testing compatibil-

ity testing etc

2 Test different type of servers for instance a HTTP server

3 Work out test costs before starting

4 The scripts may be modified to test different scenarios

CHAPTER 7

Appendices

57

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol

59

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_


CHAPTER 6

Summary and Conclusions

In this last chapter we summarize the work carried out during this thesis and the results obtained. We also describe possible future work related to this project.

6.1 Summary and Results

In this thesis we set out to develop a TaaS system for M2M communications. The work was divided into several steps: creating a client-proxy-server scenario, extracting a traffic pattern, and multiplying that pattern directly from the clients to the server.

In the first part we therefore built this scenario in the Amazon Cloud and tested it to find out how its behaviour and performance changed under different conditions (number of clients, type of instance, etc.). The next step was the most important part of the thesis: extracting and recreating the traffic pattern. For the extraction and recreation to be correct, the number of bytes and certain packet features must be as similar as possible between the simulations and the replayed M2M traffic. Once the script that recreates the traffic pattern was ready, we replayed the packets towards the server. With the TaaS environment in place, the server could be deployed on different instance types to analyze how this affected its performance, and the number of clients could be scaled up considerably to find out how the server handles heavy traffic loads. Finally, after carrying out many tests under different conditions, the results were presented and explained in order to estimate the reliability of the TaaS system developed.
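To make this step concrete, the following minimal sketch illustrates one way the pattern extraction and replay could look in Python: per-packet payload sizes and inter-packet delays are read from a capture with dpkt [36] and an equivalent byte stream is re-sent over a real TCP connection. The capture file name, server address, port and multiplier are placeholder assumptions; this is not a reproduction of the thesis scripts.

import socket
import time

import dpkt  # packet parsing library referenced in this thesis [36]


def extract_pattern(pcap_path, server_port):
    """Return (delay_since_previous_packet, payload_size) pairs for the
    TCP payload bytes sent towards the given server port."""
    pattern = []
    previous_ts = None
    with open(pcap_path, "rb") as capture:
        for ts, buf in dpkt.pcap.Reader(capture):
            ip = dpkt.ethernet.Ethernet(buf).data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            tcp = ip.data
            if tcp.dport != server_port or len(tcp.data) == 0:
                continue
            delay = 0.0 if previous_ts is None else ts - previous_ts
            pattern.append((delay, len(tcp.data)))
            previous_ts = ts
    return pattern


def replay_pattern(pattern, server_host, server_port, multiplier=1):
    """Re-send the recorded byte pattern over a fresh TCP connection,
    multiplying every burst to increase the load on the server."""
    sock = socket.create_connection((server_host, server_port))
    try:
        for delay, size in pattern:
            time.sleep(delay)
            sock.sendall(b"x" * size * multiplier)  # dummy payload of the same size
    finally:
        sock.close()


if __name__ == "__main__":
    # Placeholder capture file, host and port; not values from the thesis.
    pattern = extract_pattern("simulation.pcap", server_port=5000)
    replay_pattern(pattern, "server.example.com", 5000, multiplier=3)

Running several such clients in parallel, or raising the multiplier, is what scales the recreated pattern up to the heavier loads used in the tests.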

The simulation results show that the number of bytes increased as expected when the number of clients grew. The RTT was very similar in all the tests, since the traffic load was not heavy enough to stress the server. However, other network performance metrics such as packet loss, TCP retransmissions and duplicate ACKs varied more: these results showed a performance improvement in the network when using higher-quality instances and a deterioration as the number of clients rose.
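These per-test counters can be pulled out of the captures with tshark [10]. The short sketch below, with an assumed capture file name, counts the TCP analysis events reported in Tables 3.3 to 3.5 using Wireshark display filters.

import subprocess

# Wireshark display filters for the TCP analysis events reported in
# Tables 3.3 to 3.5; the capture file name is a placeholder.
FILTERS = {
    "TCP retransmissions": "tcp.analysis.retransmission",
    "lost segments": "tcp.analysis.lost_segment",
    "duplicate ACKs": "tcp.analysis.duplicate_ack",
}


def count_events(pcap_path, display_filter):
    """Count the packets in a capture that match a tshark display filter."""
    output = subprocess.check_output(["tshark", "-r", pcap_path, "-Y", display_filter])
    return len(output.splitlines())


if __name__ == "__main__":
    for name, display_filter in FILTERS.items():
        print("%s: %d" % (name, count_events("test_run.pcap", display_filter)))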


When it comes to replaying the traffic pattern, the first step was to compare the results of recreating the packets with the simulation, which tells us how good the traffic recreator is. We compared the amount of bytes sent over time, and the result was practically the same most of the time. Remarkably, the recreation did not degrade over time: the similarity between the two graphs was constant, so the TaaS system developed can also recreate long simulations. The final results of the traffic pattern recreation show that the server reached a maximum exchange of bytes when we increased the number of data sources considerably, and that this limit is higher or lower depending on the instance quality. The differences are slighter for the RTT, which was very similar except when the number of running data sources was very high. The packet loss results were more interesting: the percentage of lost packets was much higher when many clients were sending data, and there was also some variance when comparing different types of server instance.
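The bytes-over-time comparison between the original simulation and the replayed traffic can be reproduced from the two captures by binning TCP payload bytes into one-second intervals, for instance with dpkt [36] as in the sketch below (the capture file names are placeholders, not the ones used in the thesis).

from collections import Counter

import dpkt  # [36]


def bytes_per_second(pcap_path):
    """Sum TCP payload bytes into one-second bins, keyed by the offset in
    seconds from the first packet of the capture."""
    bins = Counter()
    start = None
    with open(pcap_path, "rb") as capture:
        for ts, buf in dpkt.pcap.Reader(capture):
            ip = dpkt.ethernet.Ethernet(buf).data
            if not isinstance(ip, dpkt.ip.IP) or not isinstance(ip.data, dpkt.tcp.TCP):
                continue
            if start is None:
                start = ts
            bins[int(ts - start)] += len(ip.data.data)
    return bins


if __name__ == "__main__":
    # Placeholder capture names for the simulation and the replayed traffic.
    simulation = bytes_per_second("simulation.pcap")
    replayed = bytes_per_second("replayed.pcap")
    for second in sorted(set(simulation) | set(replayed)):
        print("%4d s  simulation: %8d B  replay: %8d B"
              % (second, simulation[second], replayed[second]))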

Overall, the results for the amount of bytes the server could handle and for the growth of packet loss with the number of clients are good. The RTT results were less satisfactory; after testing many different TCP servers, we found that the RTT behaved differently depending on the server.

These results indicate good functionality and reliability of the TaaS for M2M system created. In addition, the structure offers many options, such as the number of clients and the type of instances used. The good results, together with this flexibility, demonstrate the usefulness of the system.

6.2 Future Work

The TaaS system created in this thesis is based on TCP and focuses mainly on performance and scalability testing. As future work we propose to:

1. Develop other cloud tests, such as connectivity testing, security testing, compatibility testing, etc.
2. Test different types of servers, for instance an HTTP server.
3. Work out the cost of a test before starting it.
4. Modify the scripts to test different scenarios (a sketch of how a server instance could be provisioned programmatically follows this list).
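As a starting point for such modifications, the sketch below shows how the environment could provision the server on a chosen EC2 instance type with boto [28]. The region, AMI, key pair and security group are placeholder assumptions, not values taken from the thesis.

import boto.ec2  # boto 2.x, as listed in [28]

# Placeholder region, AMI, key pair and security group; credentials are
# taken from the usual boto configuration (environment or ~/.boto).
REGION = "eu-west-1"
AMI_ID = "ami-00000000"


def launch_server(instance_type):
    """Start one EC2 instance of the requested type and return it."""
    conn = boto.ec2.connect_to_region(REGION)
    reservation = conn.run_instances(
        AMI_ID,
        instance_type=instance_type,  # e.g. "m1.large" or "c1.xlarge"
        key_name="taas-key",
        security_groups=["taas-server"],
    )
    return reservation.instances[0]


if __name__ == "__main__":
    instance = launch_server("c1.xlarge")
    print("%s %s" % (instance.id, instance.state))

Calling launch_server with a different instance_type is enough to repeat the instance comparisons shown in Figures 5.2 and 5.3.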

CHAPTER 7

Appendices


List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic Input/Output System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure Call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TLS Transport Layer Security

TSecr Timestamp echo reply

TSval Timestamp value

UDP User Datagram Protocol


Win Window size

WS Window Scale

List of Tables

3.1 RTT with data bursts of 1980 bytes
3.2 RTT with data bursts of 5940 bytes
3.3 Number of TCP retransmissions
3.4 Number of lost packets
3.5 Number of duplicate ACK
5.1 Percentage of lost packets


List of Figures

1.1 Flow diagram of the developed system
2.1 OSI model
2.2 HTTP request
2.3 Fields of the IP Header
2.4 Datagram fragmentation
2.5 ARP request
2.6 Ethernet layers in OSI model
2.7 UDP protocol header
2.8 TCP protocol header
2.9 Establishing a connection in TCP
2.10 Sliding window method
2.11 Example RTT interval
2.12 Jitter effect
2.13 Relation between Latency and Bandwidth
2.14 Proxy operation
3.1 Structure client server
3.2 Structure client proxy server
3.3 Bytes through the proxy with data burst of 1980 bytes
3.4 Bytes through the proxy with data burst of 5940 bytes
3.5 Structure for simulation
3.6 Bytes through the proxy with data burst of 1980 bytes
3.7 Bytes through the proxy with data burst of 5940 bytes
3.8 Average RTT with 3 data sources
3.9 Average RTT with 10 data sources
4.1 Structure of traffic replayed M2M
4.2 Comparison between simulation and replayed traffic
5.1 Number of bytes over time in different tests
5.2 Bytes using a m1.large instance for the server
5.3 Bytes using a c1.xlarge instance for the server
5.4 Average RTT extracted from the traffic recreations


REFERENCES

[1] A. Huth and J. Cebula, "The Basics of Cloud Computing," 2011.
[2] J. Strickland, "How cloud computing works," http://computer.howstuffworks.com/cloud-computing/cloud-computing1.htm. Accessed January 2014.
[3] J. Gao et al., "A cloud-based TaaS infrastructure with tools for SaaS validation, performance and scalability evaluation," in 2012 IEEE 4th International Conference on Cloud Computing Technology and Science.
[4] Guru99, "Performance testing," http://www.guru99.com/performance-testing.html. Accessed January 2014.
[5] D. Boswarthick, O. Elloumi, and O. Hersent, M2M Communications: A Systems Approach. Hoboken, NJ, USA: Wiley, 2012.
[6] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed January 2014.
[7] OWASP, "Testing for WS Replay," https://www.owasp.org/index.php/Testing_for_WS_Replay_%28OWASP-WS-007%29. Accessed January 2014.
[8] Amazon Web Services, "Amazon Elastic Compute Cloud (Amazon EC2)," http://aws.amazon.com/ec2. Accessed January 2014.
[9] Amazon Web Services, Inc., "Amazon EC2 instances," http://aws.amazon.com/ec2/instance-types. Accessed January 2014.
[10] G. Combs et al., "tshark - dump and analyze network traffic," http://www.wireshark.org/docs/man-pages/tshark.html. Accessed January 2014.
[11] E. Software, "tcprewrite," http://tcpreplay.synfin.net/wiki/tcprewrite. Accessed January 2014.
[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.
[13] E. Software, "Welcome to Tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.
[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.
[15] L. Peterson and B. S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.
[16] R. Fielding, J. Gettys, et al., "Hypertext Transfer Protocol -- HTTP/1.1," 1999.
[17] Tutorialspoint, "HTTP - Quick Guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.
[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.
[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.
[20] J. Postel, "User Datagram Protocol," 1980.
[21] S. Kollar, "Introduction to IPv6," 2007.
[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.
[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.
[24] U. Lamping, R. Sharpe, and E. Warnicke, "Wireshark User's Guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.
[25] L. M. Garcia, "Tcpdump/libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.
[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.
[27] Python Software Foundation, "Python programming language, official website," http://www.python.org, 1990-2013. Accessed January 2014.
[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.
[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.
[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.
[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.
[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.
[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.
[34] S. Molnar, P. Megyesi, and G. Szabo, "How to validate traffic generators," in IEEE International Conference, 2013.
[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.
[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.
[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.
[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.

Page 56: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

56 Summary and Conclusions

When it comes to replay the traffic pattern the first step was to compare the results

obtained from recreating the packets with the simulation In this way we could find out

the quality of this traffic recreator We compared the amount of bytes sent over time and

the result was practically the same most of the time It is remarkable that the recreation

did not become worse and worse the similarity in both graphs was permanent Therefore

with this TaaS system developed we may recreate long simulations In the final results

obtained from the traffic pattern recreation we have shown how the server reached a

maximum exchange of bytes This happens when we highly increased the number of

data sources It has been proven as well how this limit can be higher or lower depending

on the instance quality The differences are slighter when measuring the RTT which was

very similar except when the number of data sources running was very high It was more

interesting the packet loss results The percentage of lost packets was much higher when

many clients were sending data There was also certain variance in these results when

we compared among different type of server instances

Overall there seems to be good results about the amount of bytes the server could

handle and the growth of packet loss with the number of clients However the RTT

results were not so satisfactory I must say that after testing many different TCP

servers this RTT behaved in a different way depending on the server

These results have estimated the good functionality and reliability of the TaaS for M2M

system created In addition this structure offers many options such as number of clients

type of instances used and so on Therefore the good results along with the flexibility

and many options this environment offers prove the great usefulness of this system

62 Future WorkIn this thesis the TaaS system created is based in TCP protocols and this system focused

on performance and scalability testing mainly As future work we propose to

1 Develop other cloud tests such as connectivity testing security testing compatibil-

ity testing etc

2 Test different type of servers for instance a HTTP server

3 Work out test costs before starting

4 The scripts may be modified to test different scenarios

CHAPTER 7

Appendices

57

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol

59

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 57: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

CHAPTER 7

Appendices

57

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol

59

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 58: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

List of Abbreviations

ACK Acknowledgement

ARP Address Resolution Protocol

ASCII American Standard Code for Information Interchange

ATM Asynchronous Transfer Mode

DNS Domain Name System

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

IEEE Institute of Electrical and Electronics Engineers

ICMP Internet Control Message Protocol

IP Internet Protocol

Len Length (amount of data)

M2M Machine-to-Machine

MAC Media Access Control

MSS Maximum Segment Size

MTU Maximum Transmission Unit

NAT Network Address Translation

NetBIOS Network Basic InputOutput System

OSI Open Systems Interconnection

PDU Protocol Data Unit

RPC Remote Procedure call

RTT Round Trip Time

SaaS Software as a Service

SACK PERM Sack Permission

Seq Sequence

SSH Secure Shell

SSL Secure Sockets Layer

TaaS Testing as a Service

TCP Transmission Control Protocol

TSecr Timestamp echo reply

TSL Transport Layer Security

TSval Timestamp value

UDP User Datagram Protocol

59

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 59: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

60 Appendices

Win Window size

WS Window Scale

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 60: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

List of Tables

31 RTT with data bursts of 1980 bytes 41

32 RTT with data bursts of 5940 bytes 42

33 Number of TCP retransmissions 42

34 Number of lost packets 43

35 Number of duplicate ACK 43

51 Percentage of lost packets 53

61

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 61: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

List of Figures

11 Flow diagram of the developed system 10

21 OSI model 12

22 HTTP request 14

23 Fields of the IP Header 16

24 Datagram fragmentation 18

25 ARP request 19

26 Ethernet layers in OSI model 19

27 UDP protocol header 20

28 TCP protocol header 22

29 Establishing a connection in TCP 23

210 Sliding window method 24

211 Example RTT interval 25

212 Jitter effect 26

213 Relation between Latency and Bandwidth 27

214 Proxy operation 29

31 Structure client server 31

32 Structure client proxy server 33

33 Bytes through the proxy with data burst of 1980 bytes 37

34 Bytes through the proxy with data burst of 5940 bytes 37

35 Structure for simulation 38

36 Bytes through the proxy with data burst of 1980 bytes 39

37 Bytes through the proxy with data burst of 5940 bytes 39

38 Average RTT with 3 data sources 40

39 Average RTT with 10 data sources 41

41 Structure of traffic replayed M2M 47

42 Comparison between simulation and replayed traffic 47

51 Number of bytes over time in different tests 51

52 Bytes using a m1large instance for the server 51

53 Bytes using a c1xlarge instance for the server 52

54 Average RTT extracted from the traffic recreations 52

63

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P Biondi and the Scapy community ldquoWelcome to scapyrsquos documentationrdquo http

wwwsecdevorgprojectsscapydoc Accessed January 2014

[13] E Software ldquoWelcome to tcpreplayrdquo httptcpreplaysynfinnet Accessed

January 2014

[14] E Software ldquoFrequently asked questionsrdquo httptcpreplaysynfinnetwiki

FAQDoestcpreplaysupportsendingtraffictoaserver Accessed January 2014

[15] LPeterson and S Davie Computer Networks A Systems Approach San Franciso

Morgan Kaufmann 2007

[16] R Fielding J Gettys ldquoHypertext Transfer Protocol ndash HTTP11rdquo 1999

[17] Tutorialspoint ldquoHttp - quick guiderdquo httpwwwtutorialspointcomhttp

http_quick_guidehtm Accessed January 2014

[18] C Systems ldquoEthernet technologiesrdquo httpdocwikiciscocomwiki

Ethernet_Technologies Accessed January 2014

[19] B Hill Cisco The Complete Reference McGraw-Hill Osborne Media 2002

[20] J Postel ldquoUser datagram protocolrdquo 1980

[21] S Kollar ldquoIntroduction to ipv6rdquo 2007

[22] P Karn and C Partridge ldquoImproving round-trip time estimates in reliable transport

protocolsrdquo 1991

[23] D Roethlisberger ldquoSslsplit transparent and scalable ssltls interceptionrdquo http

wwwroechSSLsplit Accessed January 2014

[24] R S Ulf Lamping and E Warnicke ldquoWireshark userrsquos guiderdquo httpwww

wiresharkorgdocswsug_html_chunked Accessed January 2014

[25] L M Garcia ldquoTcpdump libpcaprdquo httpwwwtcpdumporg 2010-2014 Ac-

cessed January 2014

[26] B Mitchell ldquoIntroduction to proxy servers in computer networkingrdquo http

compnetworkingaboutcomcsproxyserversaproxyservershtm 2014 Ac-

cessed January 2014

[27] P S Foundation ldquoPython programming language a official websiterdquo httpwww

pythonorg 1990-2013 Accessed January 2014

[28] M Garnaat ldquoboto 2240rdquo httpspypipythonorgpypiboto Accessed Jan-

uary 2014

67

[29] M Garrels Introduction to Linux A Hands on Guide 2008

[30] D Wessels Squid The Definitive Guide Sebastopol OrsquoReilly and Associates 2004

[31] T Beardsley and J Qian ldquoThe tcp split handshake Practical effects on modern

network equipmentrdquo 2010

[32] Information Sciences Institute University of Southern California ldquoTansmission con-

trol protocolrdquo 1981

[33] P Owezarski and N Larrieu ldquoA trace based method for realistic simulationrdquo in

IEEE International Conference on Communications 2004

[34] P M Sandor Molnar and G Szabo ldquoHow to validate traffic generatorsrdquo in IEEE

International Conference 2013

[35] C-Y K et al ldquoReal traffic replay over wlan with environment emulationrdquo in

IEEE Wireless Communications and Networking Conference Mobile and Wireless

Networks

[36] J Silverman ldquoMy documentation on dpktrdquo httpwwwcommercialventvac

comdpkthtml Accessed January 2014

[37] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed October

2014

[38] Triometric ldquoReplaying xml trafficrdquo httpwwwtriometricnetsolutions

travelreplaying-xml-traffichtml Accessed January 2014

Page 62: MASTER'S THESISdocshare04.docshare.tips/files/25293/252934504.pdftesting in the cloud and evaluating performance and scalability. TaaS can o er several kinds of cloud testing such

REFERENCES

[1] Alexa Huth and James Cebula ldquoThe Basics of Cloud Computingrdquo 2011

[2] J Strickland ldquoHow cloud computing worksrdquo httpcomputerhowstuffworks

comcloud-computingcloud-computing1htm Accessed January 2014

[3] J G et al ldquoA cloud-based taas infrastructure with tools for saas validation perfor-

mance and scalability evaluationrdquo in 2012 IEEE 4th International Conference on

Cloud Computing Technology and Science

[4] Guru99 ldquoPerformance testingrdquo httpwwwguru99comperformance-testing

html Accessed January 2014

[5] O E David Boswarthick and O Hersent M2M Communications A Systems Ap-

proach Hoboken NJ USA Willey 2012

[6] G Shields ldquoTesting network performance with real trafficrdquo

httpcommunitiesquestcomcommunitynmsblog20120921

testing-network-performance-with-real-traffic Accessed January

2014

[7] OWASP ldquoTesting for ws replayrdquo httpswwwowasporgindexphpTesting_

for_WS_Replay_28OWASP-WS-00729 Accessed January 2014

[8] A W Services ldquoAmazon elastic compute cloud (amazon ec2)rdquo httpaws

amazoncomec2 Accessed January 2014

[9] I Amazon Web Services ldquoAmazon ec2 instancesrdquo httpawsamazoncomec2

instance-types Accessed January 2014

[10] G C et al ldquotshark - dump and analyze network trafficrdquo httpwwwwireshark

orgdocsman-pagestsharkhtml Accessed January 2014

[11] E Software ldquotcprewriterdquo httptcpreplaysynfinnetwikitcprewrite Ac-

cessed January 2014

65

66

[12] P. Biondi and the Scapy community, "Welcome to Scapy's documentation," http://www.secdev.org/projects/scapy/doc. Accessed January 2014.

[13] E. Software, "Welcome to Tcpreplay," http://tcpreplay.synfin.net. Accessed January 2014.

[14] E. Software, "Frequently asked questions," http://tcpreplay.synfin.net/wiki/FAQ#Doestcpreplaysupportsendingtraffictoaserver. Accessed January 2014.

[15] L. Peterson and S. Davie, Computer Networks: A Systems Approach. San Francisco: Morgan Kaufmann, 2007.

[16] R. Fielding, J. Gettys, "Hypertext Transfer Protocol – HTTP/1.1," 1999.

[17] Tutorialspoint, "HTTP - Quick Guide," http://www.tutorialspoint.com/http/http_quick_guide.htm. Accessed January 2014.

[18] Cisco Systems, "Ethernet technologies," http://docwiki.cisco.com/wiki/Ethernet_Technologies. Accessed January 2014.

[19] B. Hill, Cisco: The Complete Reference. McGraw-Hill Osborne Media, 2002.

[20] J. Postel, "User Datagram Protocol," 1980.

[21] S. Kollar, "Introduction to IPv6," 2007.

[22] P. Karn and C. Partridge, "Improving round-trip time estimates in reliable transport protocols," 1991.

[23] D. Roethlisberger, "SSLsplit - transparent and scalable SSL/TLS interception," http://www.roe.ch/SSLsplit. Accessed January 2014.

[24] R. Sharpe, U. Lamping, and E. Warnicke, "Wireshark user's guide," http://www.wireshark.org/docs/wsug_html_chunked. Accessed January 2014.

[25] L. M. Garcia, "Tcpdump/libpcap," http://www.tcpdump.org, 2010-2014. Accessed January 2014.

[26] B. Mitchell, "Introduction to proxy servers in computer networking," http://compnetworking.about.com/cs/proxyservers/a/proxyservers.htm, 2014. Accessed January 2014.

[27] Python Software Foundation, "Python programming language - official website," http://www.python.org, 1990-2013. Accessed January 2014.

[28] M. Garnaat, "boto 2.24.0," https://pypi.python.org/pypi/boto. Accessed January 2014.


[29] M. Garrels, Introduction to Linux: A Hands on Guide, 2008.

[30] D. Wessels, Squid: The Definitive Guide. Sebastopol: O'Reilly and Associates, 2004.

[31] T. Beardsley and J. Qian, "The TCP split handshake: Practical effects on modern network equipment," 2010.

[32] Information Sciences Institute, University of Southern California, "Transmission Control Protocol," 1981.

[33] P. Owezarski and N. Larrieu, "A trace based method for realistic simulation," in IEEE International Conference on Communications, 2004.

[34] S. Molnar, P. Megyesi, and G. Szabo, "How to validate traffic generators," in IEEE International Conference, 2013.

[35] C.-Y. K. et al., "Real traffic replay over WLAN with environment emulation," in IEEE Wireless Communications and Networking Conference: Mobile and Wireless Networks.

[36] J. Silverman, "My documentation on dpkt," http://www.commercialventvac.com/dpkt.html. Accessed January 2014.

[37] G. Shields, "Testing network performance with real traffic," http://communities.quest.com/community/nms/blog/2012/09/21/testing-network-performance-with-real-traffic. Accessed October 2014.

[38] Triometric, "Replaying XML traffic," http://www.triometric.net/solutions/travel/replaying-xml-traffic.html. Accessed January 2014.
