THEORETICAL FOUNDATION 2.1 Data Backuplibrary.binus.ac.id/eColls/eThesisdoc/Bab2/Bab...


THEORETICAL FOUNDATION

2.1 Data Backup

In general, backup in a computing system can be defined as a process that supports the active main system in case it fails to function. More specifically, data backup is the activity of copying files or databases to backup media in order to preserve them and prevent data loss in the main system [5]. The backed-up data can reside on the backup media in the same form or in an archived form, depending on the need. Once the data has been backed up, there must be a way to retrieve it from the backup media to the main system or to another system. This process is called backup restoration or recovery.

Backup is usually a routine part of operations for large businesses with mainframes, as well as for administrators of smaller business computers. For personal computer users, backup is also necessary but often neglected. A 2009 Symantec survey on information protection in small and medium businesses found that these companies spent up to $16,000 annually on backup, recovery, and archiving, and $10,000 on disaster preparedness [2].

There are many methods of performing data backup, and the choice affects the performance of the backup process. This includes the data transfer speed, the comprehensiveness of the backed-up data, the storage required, the ease of data restoration, and the network performance (if a network is used). Ultimately, these factors also determine how much the company has to spend.


2.2 Classification of Data Backup Method

To classify data backup approaches, we first need to determine the bases on which a backup can be characterized. The author synthesizes the categories into six classifications of methods, based on architecture, storage location, media, frequency, scale, and attributes. Figure 3 maps this synthesized classification of data backup methods.


Figure 3: Classification of data backup


2.2.1 Backup Architecture

By architecture, backup can be classified as directly attached backup, client-server backup, and storage area network (SAN) backup. Table 1 briefly compares the characteristics of these backup architectures.

                 Directly Attached   Client-Server   SAN
Transfer rate    Fast                Slow            Fast
Implementation   Easy                Moderate        Difficult
Management       Difficult           Easy            Easy
Cost             Low                 Moderate        High

Table 1: Comparison of backup architecture

2.2.1.1 Directly Attached Backup

As its name suggests, this approach is performed without a network connection. In other words, the administrator needs to attach the backup media directly to the backup client as secondary storage [6]. The media can be an external hard disk, tape, DVD, etc. Directly attached media generally offer faster data transfer than network media such as Gigabit Ethernet or fiber optic cable.

Despite its considerably fast transfer rate, the non-network approach can become troublesome when the administrator has to handle many backup clients. The person in charge must attach the media to one backup client after another. Without supporting software, it is difficult for the administrator to manage the backup rotation, and backup management can become disorganized.

This non-network approach can offer a low-cost solution, since only the backup media need to be provided. Because it is easy to implement, this approach may be used by entry-level enterprises that have not implemented any backup management before. Large enterprises may also perform non-network backup to complement a more advanced backup solution.

2.2.1.2 Client-Server Backup

A client-server backup consists of backup clients, which hold the data to be backed up, and backup servers, which perform the backup and store the data. It is common for a company to have a many-to-one relationship in this kind of architecture. In other words, a dedicated server backs up several other servers or workstations, as shown in figure 4. The clients and the server are connected through a network, which can be of various types.

Figure 4: Client-server backup based on Preston

The backup server provides flexibility in managing the backup. Enterprise-level operating systems are available to suit the environment, and numerous software packages with various backup algorithms can aid the administrator. Centralized control also makes it easier to scale the number of clients. A server can also be equipped with many kinds of backup media, such as SATA or SAS hard disks or a tape drive, to keep either a backup image or a plain copy of the data.

The drawback of this method is that the client and server are mostly connected over a TCP/IP network. This creates a transfer-rate bottleneck: data moves from a fast input, such as a SAS disk, to a slower output, such as Gigabit Ethernet.
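A small calculation makes the bottleneck concrete: end-to-end throughput is limited by the slowest stage in the data path. The sketch below assumes a hypothetical 500 GB backup. It uses nominal interface rates (6 Gbit/s for SAS, 1 Gbit/s for Gigabit Ethernet). Real throughput is lower because of protocol overhead and disk behavior, so the results are only lower bounds on transfer time. The function names are illustrative.

```python
def effective_rate_gbps(*stage_rates_gbps: float) -> float:
    """End-to-end throughput is capped by the slowest stage in the data path."""
    return min(stage_rates_gbps)


def transfer_seconds(size_gb: float, *stage_rates_gbps: float) -> float:
    """Ideal transfer time for size_gb decimal gigabytes, ignoring protocol overhead."""
    size_gbit = size_gb * 8  # 1 byte = 8 bits
    return size_gbit / effective_rate_gbps(*stage_rates_gbps)


# Hypothetical 500 GB backup read from a SAS disk (nominal 6 Gbit/s)...
t_network = transfer_seconds(500, 6.0, 1.0)  # ...sent over Gigabit Ethernet (1 Gbit/s)
t_direct = transfer_seconds(500, 6.0, 6.0)   # ...copied to a directly attached SAS target

print(f"over Gigabit Ethernet: {t_network / 60:.1f} min")
print(f"direct SAS-to-SAS:     {t_direct / 60:.1f} min")
```

With these assumptions, the network path takes about 67 minutes and the direct path about 11 minutes. The SAN approach described below aims to remove exactly this gap.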

The client-server approach is a mid-cost solution, since a backup server and its storage must be provided. Because of its moderate implementation difficulty, this approach may be used by mid- to high-level enterprises that run several servers and many workstations in their daily operations.

2.2.1.3 Storage Area Network Backup

A SAN backup typically requires a set of storage media, possibly a combination of disk drives and tape drives forming a tape or disk library. A SAN is commonly managed by specialized software and speaks its own protocols. It connects to the backup clients through interfaces such as SCSI and Fibre Channel, as shown in figure 5 [8]. As a result, data transfer behaves as if the storage were connected internally to the client.

Figure 5: SAN architecture based on Preston

This does not merely mimic the transfer mechanism; it strongly affects the data transfer rate. The client-server approach generally speaks TCP/IP over media such as Ethernet. A SAN, by contrast, uses protocols such as SCSI-3 that enable the disk on the client to communicate directly with the backup tape or disk library. This eliminates the bottleneck of the client-server approach.

The SAN approach is a higher-cost solution, since the SAN system itself must be provided. Because it is more difficult to implement, this approach may be used by higher-level enterprises that run several servers with large amounts of data to back up.

2.2.2 Backup Storage Location

By storage location, backup can be classified as local area network (LAN) backup, wide area network (WAN) backup, and cloud backup. Table 2 briefly compares the characteristics of these storage locations.

                    LAN                  WAN         Cloud
Transfer rate       Fast                 Moderate    Slow
Implementation      Easy/Moderate        Difficult   Easy
Management          Moderate/Difficult   Moderate    Easy
Cost                Moderate             High        Moderate
Disaster Recovery   No                   Yes         Yes
Full Control        Yes                  Yes         No

Table 2: Comparison of backup storage location


2.2.2.1 LAN Backup

A backup system follows this approach when the backup media and the clients are located in the same building or connected within a local area network, that is, a single network. The connection can be directly attached storage or LAN media such as Fast Ethernet, Gigabit Ethernet, or fiber optic cable. This approach should perform better than the other storage locations, since distance strongly influences data transmission.

Having the backup media close by makes this solution responsive in handling backup clients within the local network. The person in charge of the backup manages the backup media (probably a server) and the clients through the network. Software is available to help the administrator manage the backup rotation, which makes backup management more organized.

The LAN backup approach is a mid-cost solution. It requires not only the backup media but also proper network implementation and management, such as bandwidth management [9]. This approach may suit entry- to mid-level enterprises that do not have branches in other areas.

2.2.2.2 WAN Backup

A backup system follows this approach when the backup media and the clients are located in different areas or buildings that still belong to the company, connected across networks, that is, different networks. The technology can be the internet or another type of WAN such as a VPN [10]. In practice, the transfer rate can be slower than that of LAN backup. There are also security issues if the data is transferred across the internet.

In terms of backup management, WAN backup has the same advantages as LAN backup. The software used is much the same, probably with a different configuration. In addition, WAN backup suits centralized backup management where the backup clients are spread across different areas. Since the backup media is located in a different area, WAN backup also provides disaster recovery, keeping data available if the main system is struck by a disaster.

The WAN backup approach can be costlier, since sufficient WAN technology must be provided, taking into account its bandwidth, security, etc. This approach may be used by higher-level enterprises that run several distributed servers for their daily operations.

2.2.2.3 Cloud Backup

Cloud backup is basically a WAN backup in which the backup media belong to a backup solution provider. The technology is much the same as WAN backup and most likely uses the internet, so the transfer rate depends heavily on internet traffic. Cloud backup faces the same security issues as WAN backup; moreover, the enterprise itself cannot have full control of the media.

In terms of backup optimization, cloud backup has slightly fewer advantages than WAN backup because the company has only partial control over the backup media. In terms of efficiency, however, cloud backup is convenient, since the company does not have to maintain the backup system [11]. Like WAN backup, cloud backup also provides disaster recovery.


Cloud backup can be cheaper than WAN backup; backup solution providers usually charge based on the size of the backed-up data. This approach suits low- to mid-range companies that may not have immense amounts of data and are permitted to use third-party services for data storage. Reputable cloud backup providers include Ahsay, Asigra, and Zmanda.

2.2.3 Backup Media

By storage media, backup can be classified as tape-based backup and disk-to-disk (D2D) backup. Table 3 briefly compares the characteristics of these backup media.

                 Tape       D2D
Transfer rate    Moderate   Fast
Implementation   Moderate   Easy
Management       Moderate   Easy
Cost             Low        Moderate
Durability       High       Moderate

Table 3: Comparison of backup media

2.2.3.1 Tape Backup

Tape backup is a form of data backup used to create an image of the data stored in a system at a specific point in time. The data are copied onto a reel of magnetic tape and can be used for archival purposes and future reference. Tape-based backup often uses a rotation strategy, which lets the backup system keep several snapshots of the main system data taken at different times [12].
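The text does not name a particular rotation strategy. As one common example, assumed here for illustration, a grandfather-father-son (GFS) scheme works as follows:

- daily tapes are reused within the week;
- weekly tapes are reused within the month;
- a monthly tape is kept for the archive.

The sketch below picks the tape label for a given date. The schedule (Monday to Thursday daily, Friday weekly, last Friday of the month monthly, no weekend backup) and the label names are hypothetical.

```python
import datetime as dt
from typing import Optional

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def gfs_tape_label(day: dt.date) -> Optional[str]:
    """Return the tape to use on `day` under an assumed grandfather-father-son rotation.

    Mon-Thu: one of four reusable daily ("son") tapes
    Friday: one of up to four reusable weekly ("father") tapes
    Last Friday of the month: a monthly ("grandfather") tape kept for archive
    Weekend: no backup (assumption)
    """
    weekday = day.weekday()  # Monday == 0
    if weekday < 4:
        return f"daily-{DAY_NAMES[weekday]}"
    if weekday == 4:
        # If the following Friday is in another month, this is the month's last Friday.
        if (day + dt.timedelta(days=7)).month != day.month:
            return f"monthly-{day:%Y-%m}"
        return f"weekly-{(day.day - 1) // 7 + 1}"
    return None


for d in (dt.date(2024, 5, 27), dt.date(2024, 5, 24), dt.date(2024, 5, 31), dt.date(2024, 6, 1)):
    print(d, gfs_tape_label(d))
```

Under these assumptions, the rotation cycles through at most four daily and four weekly tapes, while monthly tapes accumulate as snapshots of earlier states of the main system.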

There are several types of tape drive; for instance, LTO drives have transfer rates ranging from 30 MB/s to 280 MB/s [13]. Most tape backup systems allow only sequential access: data are read from the beginning of the recorded data and then traversed in the order in which they were recorded. Tape can also be more durable and shock resistant than a disk drive, because its magnetic band is less sensitive to shock and its mechanism is sequential.

It is not unusual for enterprises to perform tape backup, which is low cost, alongside other types of backup. For example, a company may also perform WAN backup to another server, providing quick data restoration or fault tolerance when the main system fails. A strategy of this type gives the company a high degree of data security, leaving only a slight possibility that data will be permanently lost.

2.2.3.2 Disk to Disk Backup

D2D backup uses hard disk drive media, storing data either as an image or as a plain folder of files. The data are copied onto a magnetic disk and can be used for quick restoration in case of data loss in the main system. D2D backup may not use a rotation strategy. Instead, it may simply overwrite or append to the existing backup with the content of the main system.

There are several types of disk drive interface; for instance, SATA II has a nominal interface speed of 3 Gbit/s and SAS of 6 Gbit/s [14]. Most D2D backup systems allow random data access. A disk drive may be less durable and less shock resistant than tape, because its random access mechanism keeps the disk spinning at high speed. Common disk drive failures include bad sectors and other types of crashes.

D2D backup costs more than tape backup. Enterprises use D2D backup in various strategies. For example, a company may keep a plain copy of directories while also performing WAN backup to another server, providing quick data restoration when a minor data loss occurs. A strategy of this type provides more data redundancy, which makes the backup mechanism more flexible.

2.2.4 Backup Frequency

By frequency, backup can be classified as hot backup, warm backup, and cold backup. Table 4 briefly compares the characteristics of these backup frequencies.

                      Hot                Warm                   Cold
Availability          High               Moderate               Low
Consistency           High               Moderate               Low
Network Requirement   High performance   Moderate performance   Low / no performance
Implementation        Difficult          Moderate               Easy
Cost                  High               Moderate               Low

Table 4: Comparison of backup frequency

2.2.4.1 Hot Backup

Hot backup requires the backup system to be up 24/7. No trigger is needed to start the backup process, since it runs automatically [15]. The idea is to perform the backup in real time whenever any data changes in the main system. Although this is called real-time backup, in practice no backup is truly real time because of network latency and similar delays.

Because the backup must be performed in real time, this method requires considerably high network performance. This may not be a problem if the backup media is located locally with low transfer latency. Otherwise, the implementer should provide a high-bandwidth, low-latency network, possibly a dedicated one, to ensure reliable data transfer.

Hot backup is commonly used by banking and commerce companies, where a large portion of the data belongs to customers and is highly valuable to record. For instance, a commerce company may use a database application that records customer transactions. It may rely on MySQL hot backup, replicating the data every time a new transaction occurs.

These features of a highly available backup system come at a rather high cost. It may be necessary to provide a high-performance network and backup media that perform reliably 24/7.

2.2.4.2 Warm Backup

Warm backup may also require the backup system to be up 24/7, or only during certain periods. The system may require some time, or no time, to trigger the backup process, depending on its availability [15]. The idea is to perform backups at intervals short enough that the backup does not become outdated, possibly using automatic scheduling such as daily and weekly backups. As a result, the backup media is less consistent with the main system than in hot backup.

Since the backup need not be performed in real time, this method relaxes the need for high network performance. As with hot backup, there should be no problem if the media is located locally. For remotely located backup media, higher latency can be tolerated. However, the network should at least provide enough bandwidth to complete a large backup before the next scheduled backup is due.
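The bandwidth requirement can be estimated from the backup size and the time available before the next scheduled backup. Let S be the amount of data to transfer in bytes and T the available window in seconds. The minimum sustained bandwidth in bit/s is given below. The 200 GB and 10-hour figures are hypothetical, and a real link needs headroom for protocol overhead and other traffic.

```latex
B_{\min} = \frac{8\,S}{T}
\qquad\text{e.g.}\qquad
B_{\min} = \frac{8 \times 200 \times 10^{9}\ \text{bit}}{10 \times 3600\ \text{s}}
         = \frac{1.6 \times 10^{12}\ \text{bit}}{3.6 \times 10^{4}\ \text{s}}
         \approx 4.4 \times 10^{7}\ \text{bit/s} \approx 44\ \text{Mbit/s}
```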

Warm backup is more commonly used by mid- to upper-level enterprises whose data mostly belong to the organization itself, such as documents or asset databases. For instance, a company may schedule daily automatic backups of document folders and a weekly backup of its asset database. The backup may run after work hours, so that each run captures a significant amount of change and avoids high network traffic during the working day.
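Automatic scheduling of this kind can be sketched as a function that computes the next start time of a daily or weekly job. The 22:00 start time, the Saturday slot for the weekly database backup, and the function name are hypothetical. In practice, this is usually delegated to a scheduler such as cron or the backup software's own scheduler; the sketch only shows the timing logic.

```python
import datetime as dt
from typing import Optional


def next_backup_run(now: dt.datetime, run_at: dt.time, weekday: Optional[int] = None) -> dt.datetime:
    """Return the next start time for a scheduled (warm) backup job.

    run_at: time of day for the job, e.g. after work hours
    weekday: None for a daily job, or 0 (Monday) to 6 (Sunday) for a weekly job
    """
    candidate = dt.datetime.combine(now.date(), run_at)
    if weekday is None:  # daily job
        if candidate <= now:
            candidate += dt.timedelta(days=1)
        return candidate
    days_ahead = (weekday - now.weekday()) % 7
    candidate += dt.timedelta(days=days_ahead)
    if candidate <= now:  # this week's slot has already passed
        candidate += dt.timedelta(days=7)
    return candidate


# Hypothetical schedule: documents daily at 22:00, asset database weekly on Saturday at 22:00.
now = dt.datetime(2024, 5, 27, 14, 30)  # a Monday afternoon
docs_next = next_backup_run(now, dt.time(22, 0))
db_next = next_backup_run(now, dt.time(22, 0), weekday=5)
print(docs_next, db_next)  # 2024-05-27 22:00:00 2024-06-01 22:00:00
```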

This kind of backup has a mid-range cost, lower than hot backup. It needs less network performance, and its backup media need not be reliable for 24/7 operation.

2.2.4.3 Cold Backup

Cold backup is the least frequent backup method. The system does not have to be up regularly or for any particular period; it may be up only when a backup is needed, and triggering the backup process may take longer [15]. The idea is to perform a backup once in a while, as needed, for various purposes including archiving. A high degree of consistency in the backup media is not a concern for cold backup.

Given its purpose, this method may not need network availability, since it performs backups only once in a while and often uses directly attached backup media. The main concern is generally the capacity and scalability of the backup media, that is, whether it can fit the data.

Cold backup is more widely used by small office/home office (SOHO) users and for personal use, where data does not grow rapidly. Some mid-level enterprises also use it, either because they are unaware of the risk of data loss or because they lack sufficient resources. Higher-level enterprises may also use this approach for archival purposes.

This kind of backup has a lower starting cost, which goes mainly to purchasing backup media. There is no need to install network media or more advanced backup media.

2.2.5 Backup Scale

By scale, backup can be classified as full backup, differential backup, incremental backup, and mirror backup. Table 5 briefly compares the characteristics of these backup scales.

                    Full    Differential   Incremental   Mirror
Comprehensiveness   High    Moderate       Low           Most
Backup Speed        Slow    Moderate       Fast          Slow
Restore Speed       Fast    Moderate       Slow          Fast
Storage Required    Large   Moderate       Low           Large

Table 5: Comparison of backup scale


2.2.5.1 Full Backup

Full backup copies the entire content of the specified filesystem or directories. The backup content is considered comprehensive and self-contained, since all specified files are replicated every time the backup process is triggered. It consumes a large amount of space on the backup media, because it copies all specified files for every rotation the system keeps. As a result, the total backup size can be a multiple of the original size of the specified data.

A full backup on its own can be illustrated as follows. Suppose we perform a full backup of a 500 MB folder. On the second day, the folder grows by 200 MB. When we perform the second full backup, it copies 700 MB, and the amount keeps increasing as the data grows.

Since this method always copies everything, it takes longer than differential and incremental backup. The latest rotation always holds the most comprehensive backup. As a result, restoration is easier and faster than with differential and incremental backup, because the administrator only has to choose the latest full backup. Full backup alone is suitable when the data is relatively small and does not grow rapidly.

Figure 6: Full backup (amount of data backed up per day, 1st to 4th day)

2.2.5.2 Differential Backup

Differential backup copies all files changed since the last full backup. The backup content complements the full backup, since each differential backup replicates only the files changed since then [16]. It may consume less space on the backup media, depending on how much has changed since the last full backup. As a result, the backup size can be smaller than with full backup.

Differential backup can be illustrated as follows. Suppose we perform a full backup of a 500 MB folder. On the second day, the folder grows by 200 MB, so a differential backup on that day copies 200 MB. If the data grows by another 300 MB the next day, the differential backup copies 500 MB, and so on.
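One common way to find the files a differential backup must copy is to compare each file's modification time with the time of the last full backup. The sketch below assumes that modification times are reliable, although users or tools can alter them. File names and timestamps are hypothetical. Real backup software may instead use archive bits, checksums, or a catalog of previous backups.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def changed_since(root: Path, since_ts: float) -> List[Path]:
    """Files under `root` modified after `since_ts` (seconds since the epoch).

    For a differential backup, `since_ts` is the time of the last full backup.
    For a standard incremental backup it would be the time of the previous backup of any kind.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.stat().st_mtime > since_ts:
                changed.append(path)
    return sorted(changed)


# Demo with hypothetical files: one untouched since the last full backup, one modified after it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "report.doc").write_text("unchanged since last full backup")
    (root / "sub" / "assets.db").write_text("modified afterwards")
    last_full_backup = 2_000_000_000.0  # hypothetical timestamp of the last full backup
    os.utime(root / "report.doc", (last_full_backup - 3600,) * 2)
    os.utime(root / "sub" / "assets.db", (last_full_backup + 3600,) * 2)
    to_copy = [p.relative_to(root).as_posix() for p in changed_since(root, last_full_backup)]

print(to_copy)  # ['sub/assets.db']
```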


Because it copies only the updated data, differential backup takes less time than full backup, provided the updated data is not immense. Each differential backup covers only the period since the last full backup. As a result, restoring the complete data is slower, because the last full backup is needed in addition to the latest differential backup. The combination of full and differential backup is suitable when the main data is large and does not grow rapidly.

Figure 7: Differential backup (amount of data backed up per day, 1st to 4th day)

2.2.5.3 Incremental Backup

Incremental backup copies all files changed since the last backup, whether full, differential, or incremental. The difference here is that incremental backup works with levels that can limit the growth of data per level [16]. When the growth of data exceeds the limit, the backup level is incremented and the limit is reset. The backup content also complements the full backup. It may consume less space on the backup media, depending on how much has changed since the last backup. As a result, the backup size can be smaller than with full and differential backup.

Incremental backup can be illustrated as follows. Suppose we perform a full backup of a 500 MB folder and limit the growth at 100 MB per level. On the second day, the folder grows by 200 MB, so an incremental backup on that day copies 200 MB to level 1. If the data grows by another 300 MB the next day, the incremental backup copies 300 MB to level 2, and so on.
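A short simulation of how much data each scheme copies per day reproduces the worked examples for full, differential, and incremental backup. It assumes the standard definitions: differential copies everything since the last full backup, and incremental copies only what changed since the previous backup. It also treats all growth as new data and does not model the per-level limit described above. The function name is illustrative.

```python
from typing import Dict, List


def copied_per_day(initial_mb: float, growth_mb: List[float]) -> Dict[str, List[float]]:
    """MB copied each day by full, differential, and (standard) incremental backup.

    Day 1 is always a full backup of `initial_mb`. `growth_mb[i]` is the new data
    added before day i + 2. Growth is treated as new data only (no edits or deletions).
    """
    full, diff, incr = [initial_mb], [initial_mb], [initial_mb]
    total, since_full = initial_mb, 0
    for g in growth_mb:
        total += g
        since_full += g
        full.append(total)       # full: copy everything again
        diff.append(since_full)  # differential: everything since the last full backup
        incr.append(g)           # incremental: only what changed since the previous backup
    return {"full": full, "differential": diff, "incremental": incr}


result = copied_per_day(500, [200, 300])
for scheme, sizes in result.items():
    print(f"{scheme:<13} copied per day: {sizes}  stored if all kept: {sum(sizes)} MB")
```

With the 500 MB folder and the 200 MB and 300 MB growth from the examples, the simulation reports the following amounts copied per day:

- full backup: 500, 700, and 1000 MB;
- differential backup: 500, 200, and 500 MB;
- incremental backup: 500, 200, and 300 MB.

If every backup is retained, the totals stored are 2200, 1200, and 1000 MB, which matches the storage ranking in Table 5.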

Because growth is limited in each level, incremental backup takes less time than differential backup when the updated data is immense. Restoring the complete data, however, is slower and more troublesome, because the latest backup at each level is needed in addition to the last full backup. The combination of full and incremental backup is suitable when the main data is large and grows rapidly.

Figure 8: Incremental backup (amount of data backed up per day, 1st to 4th day)
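The restoration trade-off can be sketched as a function that lists which backup sets must be restored, in order, to rebuild the newest state. It assumes the standard definitions (differential since the last full backup, incremental since the previous backup of any type) rather than the per-level scheme described above. The type labels "full", "diff", and "incr" are hypothetical.

```python
from typing import List


def restore_chain(history: List[str]) -> List[int]:
    """Indices (oldest first) of the backups needed to restore the newest state.

    `history` lists backup types in the order they were taken: "full", "diff" or "incr".
    """
    fulls = [i for i, kind in enumerate(history) if kind == "full"]
    if not fulls:
        raise ValueError("no full backup to restore from")
    last_full = fulls[-1]
    chain = [last_full]
    diffs = [i for i in range(last_full + 1, len(history)) if history[i] == "diff"]
    base = last_full
    if diffs:
        # The newest differential already contains every change since the last full backup.
        base = diffs[-1]
        chain.append(base)
    # Every incremental after that point must be applied in order.
    chain += [i for i in range(base + 1, len(history)) if history[i] == "incr"]
    return chain


print(restore_chain(["full", "diff", "diff", "diff"]))  # [0, 3]
print(restore_chain(["full", "incr", "incr", "incr"]))  # [0, 1, 2, 3]
```

For a week of differentials, only the full backup and the newest differential are needed. For a week of incrementals, every incremental since the full backup must be restored in order, which is why incremental restoration is slower.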

2.2.5.4 Mirror Backup

Mirror backup operates like full backup: it also copies the entire content of the specified filesystem or directories. The difference is that the result is stored as plain files, not compressed or in another special format. It consumes exactly as much space on the backup media as the data occupies in the main system.

This kind of backup is considered the most comprehensive and self-contained, since it also replicates file attributes such as read, write, and execute permissions. This can be useful for systems that require quick data restoration.
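A plain-file mirror that keeps permission bits can be sketched with Python's standard shutil module. By default, shutil.copytree copies files with shutil.copy2, which also copies permission bits and timestamps. The paths below are hypothetical. This call does not preserve ownership or ACLs.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def mirror(src: Path, dst: Path) -> None:
    """Copy `src` to `dst` as plain files, keeping permission bits and timestamps.

    shutil.copytree uses shutil.copy2 by default, which copies file data plus metadata
    (permission bits, modification time). Ownership and ACLs are not copied.
    """
    shutil.copytree(src, dst, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "main", Path(tmp) / "mirror"
    src.mkdir()
    script = src / "run.sh"
    script.write_text("#!/bin/sh\necho backup\n")
    os.chmod(script, 0o750)  # rwxr-x--- on POSIX systems
    mirror(src, dst)
    src_mode = stat.S_IMODE(script.stat().st_mode)
    dst_mode = stat.S_IMODE((dst / "run.sh").stat().st_mode)
    same_content = (dst / "run.sh").read_text() == script.read_text()

print(oct(src_mode), oct(dst_mode), same_content)
```

A strict mirror also removes files from the backup that were deleted from the main system; rsync's --delete option does this, for example. The sketch only adds and updates files.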

2.2.6 Backup Attributes

By attributes, backup can be classified as compressed backup and encrypted backup. Table 6 briefly compares the characteristics of these backup attributes.

               Compressed                Encrypted
Features       Reduce storage size       Provide confidentiality
Performed on   Server/Client side        Server/Client side
Technology     .tar, .zip, .gzip, .iso   TLS, SSL, GPG

Table 6: Comparison of backup attribute


2.2.6.1 Compressed Backup

Compressed backup attempts to make the backed-up data smaller than the original files or folders in the main system. Compression can take place on the client side or on the server/backup media side. The degree of compression is usually traded against CPU or processing overhead on the machine: the more the data is compressed, the more CPU power and time the process needs.

Client-side compression is useful when the client is not heavily loaded with other tasks. It reduces the amount of data transmitted, which can cut the network traffic the backup uses and ultimately speed up the backup process. Server-side compression can be used in the opposite circumstances. The format of a compressed backup varies with the software used. However, many packages rely on well-known archive and compression formats such as tar, zip, bzip2, gzip, and ISO to make restoration easier.
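The trade-off between compression ratio and CPU time can be observed with Python's standard zlib module. It offers compression levels from 1 (fastest) to 9 (best compression). The payload is hypothetical, highly repetitive text, so its ratios are far better than those of typical mixed data. The second part of the sketch packs the same data into a gzip-compressed tar archive, one of the formats mentioned above.

```python
import io
import tarfile
import time
import zlib

# Hypothetical, highly repetitive backup payload (log-like text compresses very well).
payload = b"2024-05-27 22:00:01 INFO backup job started for /home/docs\n" * 20_000


def compress_report(data: bytes, levels=(1, 6, 9)):
    """Compressed size and elapsed time for several zlib levels (1 = fastest, 9 = best)."""
    report = {}
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        assert zlib.decompress(packed) == data  # lossless: restoring must return the original
        report[level] = (len(packed), elapsed)
    return report


report = compress_report(payload)
for level, (size, secs) in report.items():
    print(f"level {level}: {len(payload)} -> {size} bytes in {secs * 1000:.1f} ms")

# The same data packaged as a gzip-compressed tar archive (.tar.gz), built in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="docs/backup.log")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
archive_size = len(buf.getvalue())

# Restoration: read the member back out of the archive.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    restored = tar.extractfile("docs/backup.log").read()
```

The measured times depend on the machine. Running the sketch where the CPU is idle (client side) or where it is idle on the server side mirrors the choice described above.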

2.2.6.2 Encrypted Backup

Encrypted backup converts the content into an unreadable form known as ciphertext in order to provide confidentiality for the backed-up data. This method is suitable when the company performs WAN or cloud backup over a network that is neither secure nor private. There are two options to consider: encrypting only during data transmission, or encrypting the actual contents so that they remain encrypted after the data is stored on the backup media.


If the company has full control of the remote backup media, encrypting only during transmission is suitable. This can be achieved by sending the data through an SSL tunnel. The advantage of this method is that no decryption key is needed during restoration, because the stored data is not encrypted. However, if the company does not control the remote backup media, encrypting the stored contents is the more suitable option. One way to do this is to use symmetric encryption on either the client or the server side. As a result, the administrator must move either the key or the backed-up data to the respective machine so that the data can be decrypted.
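Symmetric encryption of the stored contents can be sketched with Fernet from the third-party Python `cryptography` package, which this example assumes is installed. Fernet authenticates AES-128-CBC ciphertext with HMAC-SHA256. It is one possible choice, not the method prescribed by any product discussed here. The helper names are hypothetical. The same key encrypts and decrypts, which is why the key and the backed-up data have to meet on one machine for a restore.

```python
# Assumes the third-party "cryptography" package: pip install cryptography
from cryptography.fernet import Fernet, InvalidToken


def encrypt_backup(data: bytes, key: bytes) -> bytes:
    """Encrypt backup contents so they stay confidential on untrusted media."""
    return Fernet(key).encrypt(data)


def decrypt_backup(token: bytes, key: bytes) -> bytes:
    """Decrypt during restoration; raises InvalidToken on a wrong key or tampering."""
    return Fernet(key).decrypt(token)


if __name__ == "__main__":
    key = Fernet.generate_key()   # keep the key apart from the backup media
    archive = b"contents of a backup archive"
    stored = encrypt_backup(archive, key)
    assert stored != archive
    assert decrypt_backup(stored, key) == archive
    try:
        decrypt_backup(stored, Fernet.generate_key())
    except InvalidToken:
        print("restore refused: wrong key")
```

If the key is lost, the stored backup cannot be restored. Key storage therefore needs the same care as the backup itself.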


2.3 Example of Enterprise Backup Software

Based on the author's research into well-known enterprise-level backup software, three products were selected for comparison. Table 7 gives a brief comparison of Bacula, Amanda, and IBM Tivoli Storage Manager, partly taken from the Bacula community's research [17].

Feature | Bacula | Amanda | IBM TSM
Open source | Yes | Yes | No
Backup architecture | Client-Server | Client-Server, SAN** | Client-Server, SAN**
Backup location | LAN, WAN | LAN, WAN, Cloud | LAN, WAN
Backup media | Tape, Disk, DVD | Tape, Disk, DVD | Tape, Disk
Backup frequency | Hot*, Warm, Cold | Hot**, Warm, Cold | Hot, Warm, Cold
Backup scale | Full, Differential, Incremental, Consolidation | Full, Incremental-Differential | Full***, Incremental
Backup attribute | Compressed, Encrypted (TLS) | Compressed, Encrypted (SSL, AES, PGP) | Compressed
GUI | Yes (bat) | Yes (ZMC**) | Client & admin client
Multi-platform | Yes | Yes | Yes
MS Exchange support | Yes | Yes | Yes**
MSSQL & Oracle | No* | Yes** | Yes**

Table 7: Comparison of enterprise backup software

*) No built-in feature; the Bacula community provides a script to enable it.
**) Requires purchasing an additional license.
***) A full backup is performed only for the very first backup.


2.3.1 Bacula

Bacula [18] is a powerful Linux backup solution, and it is one of the few open source Linux backup solutions that is truly enterprise-ready. With this enterprise readiness, however, comes a level of complexity rarely found in other solutions. Unlike many other solutions, Bacula consists of several components:

Director. This is the application that supervises all of Bacula.

Console. This is how you communicate with the Bacula Director.

File. This is the application that’s installed on the machine to be backed up.

Storage. This application performs the reading and writing to your storage space.

Catalog. This application is responsible for the databases used.

Monitor. This application allows the administrator to keep track of the status of the various Bacula tools.
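How the Director, File, Storage and Catalog components listed above divide the work can be illustrated with a toy Python model. The classes and method names are hypothetical and are not Bacula's configuration or API. Console and Monitor are omitted. The model keeps one idea from the description: the Director supervises the job, data flows from the File component on the client to the Storage component, and the Catalog records where each file ended up.

```python
from dataclasses import dataclass, field


@dataclass
class FileDaemon:
    """Plays the File component installed on the machine to be backed up."""
    client: str
    files: dict  # path -> file contents

    def send_to(self, storage: "StorageDaemon") -> int:
        # Data flows from the client to storage; the Director only coordinates.
        return storage.write(self.client, dict(self.files))


@dataclass
class StorageDaemon:
    """Plays the Storage component that writes to the backup media."""
    volumes: list = field(default_factory=list)

    def write(self, client: str, data: dict) -> int:
        self.volumes.append((client, data))
        return len(self.volumes) - 1  # index of the volume just written


@dataclass
class Catalog:
    """Plays the Catalog: remembers which file of which job sits on which volume."""
    records: list = field(default_factory=list)

    def add(self, job: str, volume: int, paths) -> None:
        self.records.extend((job, volume, p) for p in paths)

    def locate(self, path: str):
        return [(job, vol) for job, vol, p in self.records if p == path]


@dataclass
class Director:
    """Plays the Director that supervises every job."""
    catalog: Catalog

    def run_backup(self, job: str, fd: FileDaemon, sd: StorageDaemon) -> int:
        volume = fd.send_to(sd)
        self.catalog.add(job, volume, sorted(fd.files))
        return volume


if __name__ == "__main__":
    catalog = Catalog()
    director = Director(catalog)
    storage = StorageDaemon()
    client = FileDaemon("web01", {"/etc/hosts": b"127.0.0.1 localhost"})
    director.run_backup("nightly-web01", client, storage)
    print(catalog.locate("/etc/hosts"))  # [('nightly-web01', 0)]
```

Keeping the catalog separate from the volumes lets a restore find the right volume without reading every volume on the backup media.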

Bacula is not the easiest backup solution to configure and use. It is, however, one

of the most powerful. So if you are looking for power and aren’t concerned about putting

in the time to get up to speed with the configuration, Bacula is your solution.

2.3.2 Amanda

Amanda [7] allows an administrator to set up a single backup server and back up multiple hosts to it. It is robust, reliable, and flexible. Amanda uses the native Linux dump and/or tar utilities to perform the backup. One useful feature is that Amanda can use Samba to back up Windows clients to the same Amanda server. Note that Amanda has separate applications for the server and the client. The key components of Amanda are as follows.

Amanda index server, which performs the backup process by sending dumpers to the backup clients.

Holding disk, which holds the backed-up data before it is flushed to the tape device, enabling the server to back up many clients at a time.

Virtual tape device, which lets a hard disk drive used for backup mimic tape backup rotations.

Tape type and dump type, which can be specified flexibly with many options such as the maximum storage, the network usage, and the compression and encryption type.
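The benefit of the holding disk can be sketched in Python under simplifying assumptions. Client dumps are stubbed by a hypothetical `dump_client` function, and the tape is modelled as a list that accepts one item at a time. Several dumpers write to the holding disk in parallel. The staged images are then flushed to tape one after another, because a tape drive takes a single sequential stream.

```python
from concurrent.futures import ThreadPoolExecutor


def dump_client(client: str):
    """Hypothetical dumper: fetch one client's backup image (stubbed here)."""
    return client, f"backup image of {client}".encode()


def run_with_holding_disk(clients, max_dumpers: int = 4):
    """Dump many clients in parallel to a holding disk, then flush serially to tape."""
    holding_disk = {}
    # Several dumpers can write to the random-access holding disk at once.
    with ThreadPoolExecutor(max_workers=max_dumpers) as pool:
        for client, image in pool.map(dump_client, clients):
            holding_disk[client] = image
    # A tape drive accepts one sequential stream, so images are flushed one by one
    # and removed from the holding disk once they are on tape.
    tape = []
    for client in sorted(holding_disk):
        tape.append((client, holding_disk.pop(client)))
    return tape


if __name__ == "__main__":
    for label, image in run_with_holding_disk(["db01", "web01", "mail01"]):
        print(label, len(image), "bytes")
```

Without the holding disk, each client would have to wait for exclusive use of the tape drive. With it, the slow network dumps overlap, and only the fast disk-to-tape copy is serialized.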

2.3.3 Tivoli Storage Manager

IBM Tivoli® Storage Manager [19] provides a wide range of storage management capabilities from a single point of control, helping companies cope with rapidly growing volumes of information. Its key features are as follows.

Backup and recovery management, which helps the administrator perform intelligent backups and restores using a progressive incremental backup and restore strategy, in which only new and changed files are backed up.

Hierarchical storage management, which helps the administrator perform policy-based management of file backup and archiving.

Archive management, which helps the administrator easily protect and manage documents that need to be kept for a certain period of time.


Advanced data reduction, which combines progressive incremental backup, source and target data deduplication, compression, and tape management to reduce the amount of data stored.
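Progressive incremental backup and target-side deduplication can be illustrated together in a small Python sketch. The class `ProgressiveIncrementalStore` is hypothetical, not Tivoli Storage Manager's implementation. It detects changes by comparing SHA-256 content hashes with the hashes recorded at the previous backup. Real products may rely on file attributes such as size and modification time instead. Identical contents are kept only once, at whole-file granularity, whereas real deduplication usually works on smaller chunks.

```python
import hashlib


class ProgressiveIncrementalStore:
    """Toy model: only new or changed files are sent, and identical contents are stored once."""

    def __init__(self):
        self.catalog = {}   # path -> content hash recorded at the last backup
        self.chunks = {}    # content hash -> bytes (deduplicated store)

    def backup(self, files: dict) -> list:
        """Back up `files` (path -> bytes); return the paths actually transferred."""
        sent = []
        for path, data in sorted(files.items()):
            digest = hashlib.sha256(data).hexdigest()
            if self.catalog.get(path) == digest:
                continue  # unchanged since the last backup: skip it
            sent.append(path)
            self.chunks.setdefault(digest, data)  # same content stored only once
            self.catalog[path] = digest
        return sent

    def restore(self, path: str) -> bytes:
        return self.chunks[self.catalog[path]]


if __name__ == "__main__":
    store = ProgressiveIncrementalStore()
    print(store.backup({"a.doc": b"v1", "copy.doc": b"v1", "b.xls": b"data"}))
    print(store.backup({"a.doc": b"v2", "copy.doc": b"v1", "b.xls": b"data"}))
```

After the first backup, only changed files cross the network, so no periodic full backup is needed. Deduplication further shrinks what is stored.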

2.4 Example of Backup Media

2.4.1 LTO Ultrium Tape

Linear Tape-Open (LTO) technology was developed jointly by HP, IBM, and Certance (formerly Seagate's tape business, now part of Quantum) to provide a clear and viable choice in an increasingly complex array of tape storage options. LTO Ultrium is an "open format" technology, which means that users have multiple sources of drives and media. The open nature of LTO also enables compatibility between different vendors' offerings.

Generally, LTO cartridges are either rewritable (RW) or write once, read many (WORM). In addition, LTO-5 provides built-in 2:1 compression and 256-bit AES encryption [20]. Table 8 gives examples of LTO-5 Ultrium, the latest LTO generation at the time of writing.

Specification | HP | IBM | Quantum
Capacity (compressed) | 3 TB | 3 TB | 3 TB
Transfer rate | 280 MB/s | 170 MB/s | 280 MB/s
Mode | RW, WORM | WORM | WORM

Table 8: Example of LTO-5 Ultrium specifications
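The capacities and rates in Table 8 can be turned into a rough backup-window estimate. The minimal Python sketch below assumes decimal units (1 TB = 1,000,000 MB, as storage vendors use) and a drive that sustains its listed rate for the whole job. In practice, the rate depends on how compressible the data is and whether the source can feed the drive fast enough. With the 2:1 compression mentioned above, the 3 TB figure corresponds to 1.5 TB of native capacity. The function name is hypothetical.

```python
def backup_window_hours(data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to stream `data_tb` terabytes at a sustained `rate_mb_s` MB/s.

    Uses decimal units as storage vendors do: 1 TB = 1,000,000 MB.
    """
    return data_tb * 1_000_000 / rate_mb_s / 3600


if __name__ == "__main__":
    # Filling a 3 TB (compressed) LTO-5 cartridge at the Table 8 rates.
    for vendor, rate in (("HP", 280), ("IBM", 170), ("Quantum", 280)):
        print(f"{vendor}: {backup_window_hours(3, rate):.2f} h")
    # HP: 2.98 h, IBM: 4.90 h, Quantum: 2.98 h
```

Under these assumptions, the gap between 280 MB/s and 170 MB/s adds almost two hours to a full cartridge. That difference matters when the backup must fit into a fixed nightly window.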


2.4.2 External Hard Disk

There are desktop external hard drives, which are based on 3.5-inch internal hard drives, and laptop (or portable) external hard drives, which are based on 2.5-inch internal hard drives. External hard drives are generally connected to a computer through one of the following connection types: USB 3.0, USB 2.0, FireWire 400, FireWire 800, or eSATA [21]. Portable external hard drives are also often bus-powered, meaning they require only one cable for both data and power. Table 9 gives examples of current-generation external hard disks.

Specification | WD MyBook 3.0 | Iomega eGo Portable | Seagate FreeAgent GoFlex Pro | Transcend StoreJet 25 Mobile
Capacity | 1 TB | 500 GB | 500 GB | 500 GB
Interface type | USB 3.0 | USB 2.0, FireWire, FireWire 800 | USB 2.0 | USB 2.0
Other features | Security lock slot | Drop Guard | N/A | One Click Backup, Shock Resistant

Table 9: Example of external hard disk specifications

2.4.3 Server Hard Disk

Server hard disks are generally divided into two categories according to their interface: SATA (Serial Advanced Technology Attachment) and SAS (Serial Attached SCSI). SAS hard disks suit critical, high-availability operations that require heavy I/O; for backup, they are better suited to hot backup operations. SATA is sufficient for less critical operations and for hot or warm backups with lighter I/O. Table 10 gives examples of these server hard disks.


Specification | HP ProLiant SATA | HP ProLiant SAS | IBM xSeries SATA | IBM xSeries SAS
Capacity range | 120 GB – 2 TB | 72 GB – 600 GB | 160 GB – 500 GB | 73 GB – 260 GB
RPM range | 5.4 K – 7.2 K | 7.2 K – 15 K | 7.2 K | 10 K – 15 K
Transfer rate | 1.5 Gb/s – 3 Gb/s | 3 Gb/s – 6 Gb/s | 2.8 Gb/s | 6 Gb/s

Table 10: Example of server hard disk specifications

HP: http://h18004.www1.hp.com/products/servers/proliantstorage/drives-enclosures/index.html
IBM: http://www-03.ibm.com/systems/storage/disk/hdd.html
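The transfer rates in Table 10 are interface line rates in gigabits per second, so they are not directly comparable with the MB/s figures used for tape. The minimal sketch below converts them. It assumes the 8b/10b encoding used by SATA and by SAS up to 6 Gb/s, in which every 8 data bits travel as 10 line bits. The result is only a ceiling set by the link. The sustained throughput of a spinning disk is typically much lower. The function name is hypothetical.

```python
def interface_limit_mb_s(line_rate_gbit_s: float) -> float:
    """Payload ceiling of a SATA or SAS link (up to 6 Gb/s) in MB/s.

    These links use 8b/10b encoding, so every 8 data bits are sent as 10 line bits.
    """
    data_bits_per_s = line_rate_gbit_s * 1e9 * 8 / 10
    return data_bits_per_s / 8 / 1e6  # bits -> bytes -> decimal megabytes


if __name__ == "__main__":
    for rate in (1.5, 3, 6):
        print(f"{rate} Gb/s -> {interface_limit_mb_s(rate):.0f} MB/s")
    # 1.5 Gb/s -> 150 MB/s
    # 3 Gb/s -> 300 MB/s
    # 6 Gb/s -> 600 MB/s
```

A useful rule of thumb follows: under 8b/10b encoding, the payload ceiling in MB/s is roughly the line rate in Gb/s multiplied by 100.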

2.4.4 Midline Enterprise Server

A server suitable for data backup should be chosen mainly for its storage capability, although its processing power and memory cannot be disregarded entirely. It should accommodate enough suitable storage media, which can be judged from the available storage controller (RAID controller) and the type and number of hard disk slots. Table 11 gives examples of branded midline enterprise servers within a suitable budget range.


Specification | HP ProLiant DL380 G6 | HP ProLiant DL380 G7 | IBM x3550 M2 | IBM x5550 M3
Processor family | Intel® Xeon® 5500 series, Intel® Xeon® 5600 series | Intel® Xeon® 5600 series | Intel® Xeon® 5600 series | Intel® Xeon® 5500 series
Memory type | PC3-10600R RDIMMs DDR3 or PC3-10600E UDIMMs DDR3 | PC3-10600R RDIMMs DDR3 or PC3-10600E RDIMMs DDR3 | RDIMMs DDR3 or UDIMMs DDR3 | RDIMMs DDR3 or UDIMMs DDR3
Maximum drive bays | (16) SFF SAS/SATA with optional second drive cage | (16) SFF SAS/SATA with optional second drive cage | (8) SFF SAS, SATA, SSD | (8) SFF SAS, SATA
Storage controller | Smart Array P410i Integrated | Smart Array P410i Integrated | Hardware RAID-0, -1, -1E or RAID-0, -1, -10, -5, -50 (with additional option -6, -60), model dependent | Hardware RAID-0, -1, -1E or RAID-0, -1, -10 or RAID-0, -1, -10, -5, -50 with 256 MB or 512 MB cache (additional option RAID-6, -60)
Power supply | 460 Watt hot plug | 460 Watt hot plug | 1/2; 675 W each | 1/2; 675 W each

Table 11: Example of midline enterprise server specifications

HP: http://h10010.www1.hp.com/wwpc/us/en/sm/WF04a/15351-15351-3328412-241644-241475.html

IBM: http://www-03.ibm.com/systems/x/hardware/rack/index.html
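The number of drive bays and the RAID levels offered by the controllers in Table 11 together determine how much backup space a server actually provides. The minimal Python sketch below estimates usable capacity for those RAID levels. It assumes identical drives and models RAID-1 as a two-drive mirror. It ignores hot spares, controller metadata and file-system overhead. The function name is hypothetical.

```python
def usable_capacity_gb(level: str, drives: int, size_gb: float, groups: int = 2) -> float:
    """Approximate usable capacity of a RAID set of identical drives.

    `groups` is used only by the nested levels 50 and 60 (number of RAID-5/6 sub-arrays).
    Hot spares, controller metadata and file-system overhead are ignored.
    """
    if level == "0":
        if drives < 2:
            raise ValueError("RAID-0 needs at least 2 drives")
        return drives * size_gb                   # striping only, no redundancy
    if level == "1":
        if drives != 2:
            raise ValueError("RAID-1 is modelled here as a 2-drive mirror")
        return size_gb
    if level == "1E":
        if drives < 3:
            raise ValueError("RAID-1E needs at least 3 drives")
        return drives * size_gb / 2               # every block is stored twice
    if level == "10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID-10 needs an even number of drives, at least 4")
        return drives // 2 * size_gb
    if level in ("5", "6"):
        parity = 1 if level == "5" else 2         # drives' worth of parity
        if drives < parity + 2:
            raise ValueError(f"RAID-{level} needs at least {parity + 2} drives")
        return (drives - parity) * size_gb
    if level in ("50", "60"):
        parity = 1 if level == "50" else 2        # parity drives per sub-array
        if groups < 2 or drives % groups or drives // groups < parity + 2:
            raise ValueError(
                f"RAID-{level} needs >= 2 equal sub-arrays of >= {parity + 2} drives")
        return (drives - parity * groups) * size_gb
    raise ValueError(f"unsupported RAID level {level!r}")


if __name__ == "__main__":
    # Eight 600 GB SAS drives (upper end of the HP SAS range in Table 10) in 8 bays.
    for lvl in ("0", "10", "5", "6", "50", "60"):
        print(f"RAID-{lvl}: {usable_capacity_gb(lvl, 8, 600):.0f} GB usable")
```

The sketch makes the trade-off visible. Higher-redundancy levels such as RAID-6 or RAID-10 give up capacity in exchange for surviving drive failures, which is usually worthwhile on a server whose purpose is to hold backups.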
