Improving the communication performance of distributed animation rendering using BitTorrent file system

Ekasit Kijsipongse (a), Namfon Assawamekin (b, corresponding author)
(a) National Electronics and Computer Technology Center, Pathumthani 12120, Thailand
(b) University of the Thai Chamber of Commerce, Bangkok 10400, Thailand

The Journal of Systems and Software 97 (2014) 178–191. http://dx.doi.org/10.1016/j.jss.2014.07.050

Article history: received 31 October 2013; received in revised form 19 June 2014; accepted 21 July 2014; available online 28 July 2014.

Keywords: animation rendering; distributed file system; peer-to-peer

This paper is an extended version of our previous conference publication (Assawamekin and Kijsipongse, 2013). The extensions include the implementation details, the security, load-balancing and fault-tolerance mechanisms, as well as various additional experimental results.

Abstract

Rendering is a crucial process in the production of computer-generated animation movies. It executes a computer program to transform 3D models into a series of still images, which will eventually be sequenced into a movie. Due to the size and complexity of 3D models, the rendering process becomes a tedious, time-consuming and unproductive task on a single machine. Accordingly, animation rendering is commonly carried out in a distributed computing environment where numerous computers execute in parallel to speed up the rendering process. Along with the distribution of the computation, data dissemination to all computers also needs mechanisms which allow large 3D models to be moved efficiently to the distributed computers, so as to reduce the time and cost of animation production. This paper presents and evaluates the BitTorrent file system (BTFS) for improving the communication performance of distributed animation rendering. The BTFS provides an efficient, secure and transparent distributed file system which decouples the applications from the complicated communication mechanism. By having data disseminated in a peer-to-peer manner and using a local cache, the rendering time can be reduced. A performance comparison with a production-grade 3D animation favorably shows that the BTFS outperforms traditional distributed file systems by more than 3 times in our test configuration.

1. Introduction

Animation rendering is a process that transforms 3D models into hundreds of thousands of image frames to be composed into a movie. Due to the size and complexity of 3D models, the rendering process is very computing intensive and time consuming. Rendering a single frame of an industrial-level animation can take several hours on a commodity machine. Accordingly, to speed up the animation production, the rendering process is typically distributed to a set of machines in a network, where each frame is rendered independently on a different machine to reduce the overall rendering time.

Several distributed rendering strategies exist today, ranging from render farms to volunteer-based rendering. A render farm consists of a cluster of computers connected to a local high-speed network, such as switched Ethernet, that is exclusively utilized for the rendering process. Render farms are only found in large companies which can afford the procurement and operation costs of such systems. In contrast, distributed rendering can also happen on a set of non-dedicated machines, such as the animators' workstations in a company: an animator works on his machine to create 3D models during the day time, and when he leaves he can turn it into a rendering machine for the night. On the other end of the spectrum, community/volunteer-based rendering (Distributed Rendering) allows the owners of personal computers to donate the idle time of their computers for rendering. The people participating in volunteer-based rendering could be friends, friends of friends, or even any trusted persons who are willing to share. Being volunteers, they can cease their participation in rendering at any time. For example, Renderfarm.fi, a large-scale volunteer-based rendering service, distributes rendering tasks across personal computers over the public Internet.

Our work is intended for volunteer-based distributed rendering in which a large number of volunteer computers are located in a wide area network. It is possible that some of them are co-located in the same network vicinity, since they are owned by classmates, for example. Generally, volunteer-based distributed rendering consists of a few central servers and a number of volunteer computers, called clients (Volunteer Computing).


The central servers are used to store and distribute data to all clients; there is no notion of data exchange between clients. To proceed with volunteer-based rendering, users submit a job to the servers, where a scheduler dispatches the job to many clients. Each job is associated with a rendering program, the necessary arguments and the 3D models, as well as the number of frames to render. These 3D model files have to be transferred to the clients before the rendering process can start on them. However, the transfer time is not trivial, due to the latency of the public Internet, which delays the rendering process. Besides, if too many clients request the data, the central servers can become overloaded, amplifying the problem.

In fact, a rendering job that is dispatched to run on different clients requires the same input data. So, when a client needs the input data for a job, the data may already exist on other clients, in whole or in part. Since the same data are used by several clients at almost the same time, there is a great opportunity to coordinate the data transfer among the clients in a peer-to-peer (P2P) manner. We apply the BitTorrent (Cohen, 2003) P2P file sharing protocol as the means to disseminate the rendering data to all clients. A client who has already downloaded the whole file, or parts of it, from the central servers can directly share the file with other clients. As a result, fewer requests are sent to the central servers. Thus, the data transfer time is reduced and the load on the central servers is alleviated.

To let any rendering application access and exchange data in accordance with the BitTorrent protocol without having the application modified, the P2P file sharing service must be made transparent. Transferring files from/to several peers across the network must be invisible to the applications, and the files should be treated the same way as usual files on local disks. As a result, it is necessary to implement the P2P file sharing service at the file system layer of the operating system, which isolates the applications from the complications of the communication mechanism. In our previous work, we introduced the BitTorrent file system (BTFS) (Assawamekin and Kijsipongse, 2013) as an efficient, transparent and distributed file system based on the BitTorrent protocol. The BTFS was successfully applied to improve the communication performance of volunteer-based distributed rendering.

In this paper, we extend our earlier work in several important aspects. Firstly, we address the security concern. As opposed to typical P2P file sharing applications, which exchange music and video with anonymous users, BTFS requires that data be exposed only to authorized users. The access control mechanism in BTFS prevents clients (peers) from accessing rendering data for which they have no permission. Furthermore, to keep unintended data disclosure minimal, data are always stored and transferred in encrypted form. Decryption keys are handed over to clients through encrypted channels. Secondly, we take into account performance and fault-tolerance concerns. Data are partitioned and replicated to multiple servers, which helps increase the performance of BTFS when there are not enough peers sharing the data. Similarly, fault-tolerance in BTFS is guaranteed by duplicating data and the executable components of BTFS on different servers. Lastly, we carry out an extensive performance evaluation of BTFS and present its expanded results.

The rest of this paper is organized as follows. Section 2 presents background knowledge and some existing research related to our work. The BTFS architecture and all the modules constituting it are elaborately described in Section 3. In Section 4, we describe the experiments of using BTFS in distributed rendering, as well as the evaluation results showing the advantages of BTFS in practical use. Finally, we conclude the paper with a discussion of our contributions and ongoing work in Section 5.

2. Background and related work

As part of our attempt at improving the communication performance of distributed animation rendering using a BitTorrent file system, we have investigated existing approaches and tools that address this problem. The aim of this section is to give some background knowledge and to review previous research studies related to our approach. We point out their common limitations, which the techniques presented in Section 3 are proposed to overcome.

2.1. Distributed file system

A distributed file system (DFS) is a technique for storing and accessing files on remote storage. Typically, a DFS uses one or more servers to store files. Any number of clients can read or update a file on the server in the same way as if it were stored locally on the client machines. A DFS can provide larger storage space than the clients' own storage, and it makes sharing a file among multiple clients easier. A DFS may provide redundancy by replicating files to many locations to protect users from data loss in the case of failure. A DFS usually organizes files into a hierarchical structure (directories), following the common convention of local files. We give a brief explanation of some well-known DFSs as follows.

The Network File System (NFS) (Sandberg, 1985) is one of the first-generation DFSs, mostly used in Unix environments. It is implemented on a simple client/server model which allows users on client computers to transparently access remote files shared from a centralized server. NFS is not easy to scale to a large number of clients, and the server poses a single point of failure. Likewise, Server Message Block/Common Internet File System (SMB/CIFS) is another popular DFS based on the client/server model for sharing files across both Unix and Windows platforms over intranets and the Internet. Similar to NFS, SMB/CIFS has the problems of scalability and a single point of failure. In the next-generation DFSs, such as the Andrew File System (AFS) (Howard et al., 1988) and Coda, a set of servers store the files, and they address the problem of location transparency so that all clients see a globally unique file namespace. Both AFS and Coda also make use of a persistent cache on the clients to speed up the access time for both file and directory data. Server replication is used to protect from data loss.

More advanced DFSs decouple data and metadata storage. The metadata provides information about where the data are stored. The metadata is managed by one or a few metadata servers, while the data is handled by a larger set of data servers. Typically, each client performs small operations against the metadata servers to locate the data servers for a file. Then, the bulk of the data communication happens directly between the client and the data servers. This design allows DFSs to scale better to large file transfers, and modern DFSs are designed on this principle. For example, the Hadoop distributed file system (HDFS) (Shvachko et al., 2010) offers fault-tolerance and supports storing very large files. Files are partitioned into blocks, each of which is stored on a different data server. HDFS replicates file blocks on many data servers for redundancy. The metadata server in HDFS, known as the NameNode, tracks the location of all blocks. HDFS may consist of hundreds or thousands of data servers; when one of these machines crashes, it can automatically recover from the failure. However, the NameNode of HDFS presents a single point of failure. Another major drawback is that HDFS files are immutable, i.e. once files are created they cannot be changed later. MooseFS has a similar design to HDFS. Each file is divided into chunks which are stored (and possibly replicated) on different servers. MooseFS is transparent and adopts the POSIX file system semantics. Unfortunately, there is only a single metadata server in HDFS and in MooseFS, so the metadata server inevitably presents a bottleneck and a single point of failure.


To resolve the single point of failure, MogileFS, an open source DFS which emphasizes data archiving and deployment on commodity hardware, spreads data and metadata over different servers. The metadata servers must be configured as a high-availability cluster. Each file is replicated according to its replication level, as defined by applications. However, MogileFS is not designed to be transparent to client applications. It provides application-level APIs to access files, and so it requires application modification. MogileFS supports neither the POSIX file system semantics nor a hierarchical namespace. GlusterFS, another highly fault-tolerant DFS with POSIX compliance, completely eliminates the use of metadata servers by applying a hashing algorithm to track the location of files on the data servers. Its design allows the performance to scale linearly with the number of data servers. GlusterFS requires that all data servers operate in a trusted network environment. Moreover, the access control is enforced at the client side, which is easy to circumvent.

2.2. BitTorrent file sharing protocol

BitTorrent (Cohen, 2003) is one of the most popular peer-to-peer (P2P) file sharing protocols for distributing and sharing media files (music, videos, games and e-books) over the Internet. When users want to download a file, they use BitTorrent software to locate other computers having the same file and begin downloading the file from several computers, known as peers, in parallel. Users who download files also upload the files to others when requested. The BitTorrent protocol operates on three main components, i.e. the .torrent, the tracker and the peer, each of which is described below.

2.2.1. .torrent
A user who wants to disseminate a file creates a small file called a .torrent that acts as the key to initiate the sharing of the file. The .torrent does not contain the content of the file; rather, it contains metadata about the file, such as its length, the hashing information for verifying integrity and the URL of the tracker(s). When another user wants to download the file, he/she first obtains the corresponding .torrent file, and then opens that .torrent file in a BitTorrent software to start exchanging the file with peers. Since the BitTorrent protocol does not specify its own facilities to deposit and search for .torrent files, users are required to distribute the .torrent files by other conventional means (like web blogs, email, news feeds, etc.).

2.2.2. Tracker
Trackers in the BitTorrent protocol play an important role in peer communication. A tracker helps peers locate one another by keeping track of the peers that are interested in the same file. Peers find out the tracker URL from the .torrent file, and then communicate with the tracker to get a list of the peers who are participating in sharing that file. This list may not contain all the possible peers, but only some randomly chosen peers, in order to even out the load on all peers. Once the list of peers is obtained, the file exchange (download/upload) process begins. The tracker neither directly involves itself in any data transfer nor keeps a copy of the file. Peers must report their own statistics to the tracker periodically, and in return they receive updated information about new peers to which they can connect, or about peers which have left. BitTorrent trackers are commonly implemented as HTTP/HTTPS servers.

2.2.3. Peer
A peer is a computer running an instance of BitTorrent software to exchange files with other peers. Each file is exchanged in pieces, such that a peer does not need to have all the pieces of the file, only some of them. A peer reports its IP address and port to the tracker for peer connections. Each peer polls the tracker for information about other peers with which to exchange pieces. Each piece can be downloaded concurrently from different peers. Peers continuously exchange pieces with other peers in the network until they have obtained every file piece needed to reassemble the original file. A peer who has the complete file is called a seeder. The BitTorrent protocol is primarily designed to reduce the download time for large and popular files. For example, the BitTorrent protocol gives peers an incentive to share, i.e. peers that upload more should be able to download more. In addition, the order in which pieces are selected for download from other peers follows the rarest-pieces-first algorithm, which rapidly increases the number of replicas of rare pieces. Several research studies (Cohen, 2003; Qiu and Srikant, 2004; Bharambe et al., 2006) have shown that the BitTorrent protocol can handle large distributions and scales well as the number of peers increases.

Other P2P file sharing systems have architectural designs rather different from the BitTorrent design explained above. For example, Napster, the first commercial P2P file sharing system, makes use of centralized servers to index and search for the files its clients have. However, the centralized servers limit its scalability and pose a single point of failure. Gnutella (Klingberg and Manfredi, 2013) is a fully decentralized P2P file sharing system. It has no centralized servers to index the files; the search for files is realized via message broadcasting, which makes it suffer from exponential growth of the network traffic. KaZaA is built on a hybrid architecture between Napster and Gnutella. KaZaA has no centralized servers; instead, it dynamically promotes some peers to be superpeers. Each superpeer becomes an index server for a number of ordinary peers that connect to it. An ordinary peer sends a request for files to its superpeer, which in turn communicates with other superpeers using Gnutella-like broadcasting to find the files. KaZaA scales better and offers more reliability than Napster and Gnutella. However, this approach still consumes the resources and bandwidth of the superpeers.

2.3. Related work

This section surveys and compares relevant research that attempts to utilize the BitTorrent protocol to improve the performance of data transfer in applications other than music and video file sharing. Much of this work has been carried out in the Grid and volunteer-based computing domains.

Kaplan et al. (2007) propose the GridTorrent framework to efficiently distribute data for scientific applications in a Grid computing environment. The framework consists of many components inspired by the BitTorrent protocol to handle task dispatching and file sharing. Another independent work from Zissimos et al. (2007), having the same name, GridTorrent, integrates BitTorrent with the Globus Grid middleware to provide a P2P file transfer mechanism which outperforms the GridFTP protocol for large files. They eliminate the use of ".torrent" metadata files, obtaining the necessary information from the Replica Location Service (RLS) instead. The RLS also functions like a BitTorrent tracker to find the locations of file replicas. Costa et al. (2008) apply BitTorrent to optimize data distribution in BOINC (Anderson, 2004), the middleware for volunteer computing. They show that BitTorrent can reduce the network load on the servers significantly while having minimal impact on the computing time at the clients. Wei et al. (2005, 2007) implement BitTorrent in computational Desktop Grid platforms like XtremWeb (Cappello et al., 2005) to collaboratively distribute data among users solving scientific problems. They point out that even though the BitTorrent protocol has more overhead than typical file transfer protocols, it can outperform them when distributing large files to a high number of nodes. BitDew (Fedak et al., 2008), a data management framework for Desktop Grids, also relies on BitTorrent in its data distribution service.

Table 1
Comparison of BitTorrent-based data dissemination in distributed computing environments.

Work | Application transparency | Name space | Access control | Torrent indexing and tracking | Application and testing platform
Kaplan et al. (2007) | No | Hierarchy | Unix-like file permission | No torrent indexing; a variant of the BitTorrent tracker | A synthetic application on 3 nodes
Zissimos et al. (2007) | No | GridTorrent URL | Grid security infrastructure | Globus Replica Location Service (RLS) | A synthetic application on 18 PlanetLab nodes
Costa et al. (2008) | No | Hierarchy | Not specified | Catalog service for torrent indexing; a native BitTorrent tracker | A synthetic application on 312 nodes from the Grid5000 testbed
Wei et al. (2005, 2007) | No | Flat file name (UUID) | Not specified | Catalog service for torrent indexing; a native BitTorrent tracker | A synthetic application on 64 nodes
Fedak et al. (2008) | No | Hierarchy | Not specified | Catalog service for torrent indexing; a native BitTorrent tracker | A bioinformatics application on over 400 nodes from the Grid5000 testbed
Our approach | Yes | Hierarchy | Access control list for create/read/write/delete | Catalog service for torrent indexing; a native BitTorrent tracker | Animation rendering on 8 nodes from 5 distributed sites

Although the aforementioned works have objectives similar to ours, they remain distinct. Table 1 highlights these differences. Firstly, none of them implements the BitTorrent-based data distribution mechanism in the file system layer, so application transparency is not accomplished: applications must be modified against the given APIs. Secondly, the namespace, which refers to how users logically identify their data, varies. Some works (Wei et al., 2005, 2007) employ a flat namespace. The hierarchical namespace adopted in (Kaplan et al., 2007; Costa et al., 2008; Fedak et al., 2008) provides more semantics to users than a flat namespace, since the organization of files into directory hierarchies is naturally recognized by most users. URLs are also used (Zissimos et al., 2007). Thirdly, different access control mechanisms are implemented. For example, Kaplan et al. (2007) support Unix-like file permissions, i.e. public, group and user level access. Zissimos et al. (2007) rely on the Grid security infrastructure. Others do not specify one. Next, torrent indexing and tracking describe how a client discovers the .torrent metadata and the peer information, respectively. Most previous works (Costa et al., 2008; Wei et al., 2005, 2007; Fedak et al., 2008) implement their own catalog service to store the .torrent files and use native BitTorrent trackers. Kaplan et al. (2007) generate the .torrent from task descriptions and use a variant of the BitTorrent tracker that includes additional features like access control. Zissimos et al. (2007) utilize the Replica Location Service (RLS) in Globus for both the torrent indexing and the tracking services. Lastly, their evaluations were carried out with scientific or even synthetic applications. None explores the advantages of using BitTorrent in distributed animation rendering.

In volunteer-based distributed animation rendering services such as Renderfarm.fi, BURP and vSwarm, the rendering program and its input files are downloaded from the centralized servers or from a set of mirror servers. According to the current information, the BitTorrent protocol has not yet been employed to speed up the data distribution to their clients.

3. Design and implementation

We design the BitTorrent file system (BTFS) to function at the file system layer of the Linux operating system, providing applications with a scalable, fault-tolerant and distributed file system with transparent P2P data dissemination and a persistent local cache. The BTFS is intended to improve the performance of data communication in distributed animation rendering, where the same dataset is repeatedly used over multiple remote machines. We assume that the BTFS works with an external job scheduler which is capable of dispatching jobs and managing the membership of the remote machines. The BTFS consists of 4 main components: the metadata server, the seeder, the tracker and the BTFS client, as shown in Fig. 1. The operations of these components are described below.

Fig. 1. BitTorrent file system architecture.

3.1. Metadata server

The metadata server provides information about files and directories, such as attributes (e.g. file size or last modified time), as well as the torrent information of each file. It does not store the content of the files. The BTFS client interacts with the metadata server to get/set the file metadata. Since files in BTFS are organized into a hierarchical namespace in which files are located under a particular directory, the metadata server is also responsible for directory listing. The torrent information associated with each file includes the filename, file size, hash information, number of pieces, and seeder and tracker URLs, like other .torrent files used in BitTorrent file sharing software. The BTFS clients are allowed to access the file metadata only if the permission on the requested file is granted.

We use Apache ZooKeeper 3.4.3 to implement the metadata server. ZooKeeper is a distributed coordination service for distributed systems. It stores data in a data structure called a znode, which can be organized into a hierarchy similar to a directory tree. We create the /btfs and /config znodes at the top level of the hierarchy. The /btfs znode holds the root directory of a BTFS namespace. This is where the file and directory entries of all BTFS users are placed; they appear as normal files and directories under the mount point on the BTFS client machines. The /config znode stores the global system configuration for the BTFS clients. We manage some client configuration through the /config znode so that configuration updates can be distributed to all clients easily.

Each file and directory in the BTFS namespace is associated with a descendant of the /btfs znode. The BTFS file (or directory) attributes are stored in the corresponding znode data. The essential attributes include the Universally Unique IDentifier (UUID), the last modification time, the file size and the encryption key. These attributes are partially mapped into the appropriate fields of the POSIX stat structure on the BTFS client machine, as described later. Each file has a different UUID, which is used internally by the BTFS components as a unique ID to refer to that particular file (similar to what an inode number is). Fig. 2 shows the znode tree structure and the corresponding BTFS files and directories. The ".torrent" znode in the znode tree is created as a child of a user file. This znode is invisible to the users and stores the torrent information of the file to be shared via the BitTorrent protocol.

Fig. 2. Mapping from ZooKeeper data hierarchy to BTFS file system: (a) ZooKeeper tree structure; (b) BTFS file system.


3.2. Seeder

A seeder has a responsibility like that of a central file server: to store and serve the content of the files. Each file in BTFS must be uploaded onto a seeder to ensure that the availability of the file is independent of the existence of BTFS clients. Files are always available on the seeder, and the BTFS clients can retrieve the files from the seeder as a last resort. Under normal operation, BTFS clients can download the files from both the seeder and other clients, whichever is best. Seeders should be located in a public network that any client can reach. In a more complex deployment, there can be multiple (and possibly remote) seeders over which the files are distributed, such that the load on a single seeder is reduced. It is also a great advantage to replicate a file on multiple seeders, as the file then becomes eligible for parallel download and is safe from a single server failure.

The BTFS clients manipulate (including upload and download) the files on the seeder via the WebDAV protocol.


Since each file on the seeder is referenced by its UUID, which is very unlikely to be duplicated, there is no need to maintain a directory structure of files on the seeder.
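As an illustration of this flat, UUID-keyed store, the sketch below shows the upload and download calls a client could make over WebDAV using the third-party requests library. The /repository/default path and the admin credentials mirror the global configuration example of Fig. 3 and are deployment-specific.

    import requests  # third-party HTTP client

    def _seeder_url(seeder, file_uuid):
        return "http://{ip}:{port}{path}/{uuid}".format(
            ip=seeder["ip"], port=seeder["port"], path=seeder["path"], uuid=file_uuid)

    def upload_to_seeder(seeder, file_uuid, encrypted_path):
        """PUT an encrypted file onto a seeder; the object name is just the UUID."""
        with open(encrypted_path, "rb") as f:
            resp = requests.put(_seeder_url(seeder, file_uuid), data=f,
                                auth=(seeder["username"], seeder["password"]))
        resp.raise_for_status()

    def download_from_seeder(seeder, file_uuid, dest_path):
        """GET a whole file directly from the seeder (the client's last resort
        when no peers hold the pieces)."""
        resp = requests.get(_seeder_url(seeder, file_uuid), stream=True,
                            auth=(seeder["username"], seeder["password"]))
        resp.raise_for_status()
        with open(dest_path, "wb") as f:
            for chunk in resp.iter_content(64 * 1024):
                f.write(chunk)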

3.3. Tracker

According to the BitTorrent protocol, trackers are required to coordinate all the BitTorrent peers in exchanging files. This is also the case for the BTFS clients, which run the protocol. Any number of native BitTorrent trackers, such as opentracker, or public trackers can be used.

3.4. BTFS client

The BTFS client is the core component that glues the other components together. It runs on the client machines where the rendering software executes jobs dispatched from the job scheduler. The implementation of the BTFS client is based on File System in Userspace (FUSE).


Table 2
Mapping from BTFS to POSIX attributes.

BTFS attributes | POSIX attributes | Description
Size | st_size | File size in bytes
Mtime | st_atime, st_mtime, st_ctime | Only the last modification time is maintained; it is copied to the last access and last status change times for performance reasons
UUID | – | Universally unique identifier of each file
IV | – | Initialization vector
Key | – | Encryption/decryption key

BTFS clients intercept all file system calls, such as open(), read() and write(), from any application which requests access to files in BTFS, and act upon them accordingly. For reading a file, BTFS clients contact the metadata server to get the attributes and torrent information of the file, and then create a BitTorrent client thread to download the file from the seeders and peers. The downloaded file is cached in the local storage of the client machine. The file handle of the cached file is passed to the requesting application for future operations. The cached file is kept as long as cache space is available; otherwise, the cache replacement algorithm is invoked to clear unused files. We use the Least Recently Used (LRU) policy for cache replacement, which removes the least recently used file first. Subsequent file reading operations are redirected to the cached file, if it exists, to reduce the network traffic. The cached file is also shared and exchanged with other BTFS clients via the BitTorrent protocol. Each BTFS client stores its cache information in a small embedded database.
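The actual client is implemented in C on top of FUSE; purely as a rough illustration of the cache-first read path described above, here is a Python sketch using the third-party fusepy bindings. The class name, cache layout and the stubbed _fetch() are hypothetical, and a working client would also need getattr(), the write path and LRU eviction.

    import errno
    import os
    from fuse import FUSE, Operations, FuseOSError  # third-party fusepy bindings

    class BTFSLikeFS(Operations):
        """open() ensures a local cached copy exists, fetching it if necessary;
        all read() calls are then served from that copy."""

        def __init__(self, cache_dir):
            self.cache_dir = cache_dir

        def _cached(self, path):
            # One cache file per BTFS path; the real client keys its cache by
            # file UUID and records entries in an embedded database.
            return os.path.join(self.cache_dir, path.lstrip("/").replace("/", "_"))

        def open(self, path, flags):
            local = self._cached(path)
            if not os.path.exists(local):
                self._fetch(path, local)  # metadata lookup + BitTorrent download
            return os.open(local, flags)

        def read(self, path, size, offset, fh):
            os.lseek(fh, offset, os.SEEK_SET)
            return os.read(fh, size)

        def release(self, path, fh):
            return os.close(fh)

        def _fetch(self, path, local):
            # Stub: a real client asks the metadata server for the torrent
            # information and joins the swarm; here we just report a miss.
            raise FuseOSError(errno.ENOENT)

    # FUSE(BTFSLikeFS("/var/cache/btfs"), "/home/user1/btfs", foreground=True)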

When writing a file, BTFS clients perform the writing operation locally on the cached file first. After the written file has been closed, the BTFS client generates a new UUID and uploads the file to the seeder under this UUID. Then, the BTFS client updates the file metadata on the metadata server to complete the write operation. BTFS clients can work with multiple seeders or trackers to improve efficiency and reliability, as described later. Other basic file operations, such as deleting a file, listing files, and creating and removing a directory, are also implemented; these require the BTFS clients to interact with the metadata server only.

3.5. Mapping to POSIX semantics

BTFS has POSIX-like file system semantics. The attributes of files and directories which are stored in the corresponding znode data are mapped into the POSIX attributes on the BTFS clients, if valid. Table 2 shows the mapping between the BTFS and POSIX attributes, including the file size and modification time. The UUID, IV and key attributes are used internally by the BTFS components. Other POSIX attributes, such as the inode number and permission mode, are not aligned with the BTFS, so they are assigned default values in order to maintain the normal POSIX operations on the client machines.

3.6. Consistency model

To improve the performance of BTFS over the wide area network, we employ a weak consistency model, which works sufficiently well in distributed animation rendering, where files are mostly read and are neither frequently nor simultaneously updated. The weak consistency model can greatly reduce the network latency, but with some trade-off in data consistency. Firstly, the attributes of files obtained from the metadata server are maintained in a memory cache at the BTFS clients until a time-to-expire (60 s in the current implementation). So, reading a file is not guaranteed to return the latest version of the file, due to stale attributes in the cache. Secondly, we deploy a write-back policy. Files are updated locally and then sent back to the metadata server and seeder after being closed. It is possible for other BTFS clients to read the old version of a file if they access it before the write back has completed. However, BTFS ensures that a file will never be corrupted, since the updated file is always assigned a new UUID. Both the old and the updated file content coexist on the seeders until the next garbage collection, which removes the old file from the seeders. Besides, when several BTFS clients try to update the same file concurrently, they do not corrupt the integrity of the file; the last writer wins.

When a file is updated in or removed from the BTFS, the associated znode is updated or removed, respectively. However, the files on the seeder remain intact for a period of time, so as not to disrupt any ongoing download. To reclaim the space on the seeder afterwards, a process runs periodically to perform garbage collection. The process collects all the UUIDs from the metadata server and compares them with all the files on the seeders. If a file has no reference from the metadata server and is not currently being accessed, the process removes it from the seeder.
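A sketch of that mark-and-sweep logic under the stated rules; the deletion action is only printed here, since the actual removal interface on the seeder is not detailed in the paper.

    def collect_garbage(live_uuids, seeder_files, in_use):
        """Reclaim every stored object whose UUID is no longer referenced by the
        metadata server and which no ongoing download is touching. All three
        arguments are sets of UUID strings."""
        doomed = seeder_files - live_uuids - in_use
        for file_uuid in doomed:
            print("would delete", file_uuid)  # e.g. a privileged WebDAV DELETE
        return doomed

    # Example: 'b' is unreferenced and idle, so only 'b' is reclaimed.
    collect_garbage(live_uuids={"a"}, seeder_files={"a", "b", "c"}, in_use={"c"})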

3.7. Security

Since BTFS is intended to be used over a public network, it is necessary to guard against security attacks such as unauthorized access and data corruption. The security mechanisms of BTFS are explained as follows.

3.7.1. Authentication, authorization and access control
BTFS requires all users to log in before accessing the file metadata on the metadata server. The BTFS client sends the user credential (username and password) for authentication when establishing the connection with the metadata server. If the credential is valid, the user is authenticated. Currently, we have developed a file-based authentication plug-in for ZooKeeper. The password file simply stores a list of the valid usernames and passwords of all users; the group file maintains the group membership information of all users. In future work, we plan to migrate the password and group information to an LDAP server for better management. To further authorize the user to access files or directories, BTFS relies on ZooKeeper's Access Control Lists (ACLs). The authenticated user is checked against the ACL to determine whether access to the file metadata is granted. When a new file or directory is created, BTFS clients duplicate the ACL from the parent directory by default.

Another essential security setting is required on the seeder: the BTFS clients must be allowed only to upload and download files. Other WebDAV operations, like deleting a file or listing a directory, must be strictly prohibited to BTFS clients; otherwise, malicious users could mount denial-of-service attacks by scanning and removing seed files from the seeder directly.

3.7.2. Data integrity
The data integrity of a file is supported by the 20-byte SHA1 message digest (hash) of the file's content, available in the torrent information which the BTFS client obtains from the metadata server. Thus, according to the BitTorrent protocol, the BTFS client can always ensure the data integrity by validating the downloaded file against the message digest.


If the validation fails, the downloaded file could be corrupted or tampered with, and it should be downloaded again.
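A minimal sketch of that per-piece validation; the piece size and data at the bottom are made-up examples.

    import hashlib

    def verify_pieces(data, piece_len, expected_sha1s):
        """Check downloaded content against the per-piece SHA1 digests carried in
        the torrent information. expected_sha1s is the concatenation of 20-byte
        digests, as in a .torrent; returned indices must be downloaded again."""
        bad = []
        for i in range(len(expected_sha1s) // 20):
            piece = data[i * piece_len:(i + 1) * piece_len]
            if hashlib.sha1(piece).digest() != expected_sha1s[i * 20:(i + 1) * 20]:
                bad.append(i)
        return bad

    data = b"x" * 1000
    digests = b"".join(hashlib.sha1(data[i:i + 256]).digest()
                       for i in range(0, 1000, 256))
    assert verify_pieces(data, 256, digests) == []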

3.7.3. Confidentiality
To protect the confidentiality of the data stored on the seeders and the data transmitted over the network during P2P file exchange, we encrypt the content of the file. The file is always stored as ciphertext on the seeders. When a BTFS client creates a file, it randomly generates an initialization vector (IV) and a key for encrypting/decrypting the file; both of these are stored in the BTFS attributes at the metadata server. The content of the file is encrypted with 128-bit AES in CTR mode during write. This encrypted file is uploaded to the seeders and shared with other BTFS clients. For another BTFS client to read the file, it must know the matching IV and key, which can only be obtained from the metadata server if the client is authorized. The encrypted file is downloaded into the local cache on each BTFS client and decrypted on the fly while reading. The encryption/decryption is done in the FUSE file system layer and is thus transparent to the applications.

Similarly, the messages between the BTFS clients and the metadata server which carry sensitive information, such as the username and password as well as the key, must not be sent in plaintext, since they could easily be exposed to eavesdroppers. So, a secure connection must be established between the BTFS clients and the metadata server. Unfortunately, the current ZooKeeper C client APIs do not support secure SSL connections (these are only available in the Java APIs, but we believe they will be available in future releases of ZooKeeper). The workaround that we employ for the moment is to wrap the connection between the metadata server and the BTFS clients with stunnel, which can secure the information and resist replay attacks.

3.8. Scalability and fault tolerance

BTFS has been designed to tolerate single points of failure. First, the metadata can be replicated to many servers by using ZooKeeper's master/slave replication. In the presence of some server failures, operation can continue as long as a majority of the metadata servers (a quorum) agree on the data consistency. Furthermore, BTFS clients can arbitrarily connect to any one of the metadata servers for reads, so the read load is evenly distributed over the multiple metadata servers. In the case of writes, all operations are forwarded to the master server. Since in animation rendering data are read more than written, file writes will not cause too much load on the master server.

For the seeders to scale, we can set up multiple seeders and let the BTFS client randomly choose one of them for uploading a file, so the load on the multiple seeders is likely to balance. If some seeders fail, they affect only the parts of the BTFS whose data are stored on those seeders, while the remaining parts are intact. Nevertheless, it should be emphasized that files on the failed seeders will not be lost if the files remain cached on online BTFS clients. We can further increase the level of fault-tolerance by replicating files on more than one seeder, in which case a BTFS client randomly selects the seeders to hold the replicas.

Trackers are also scalable. Each file can be associated with a different tracker to reduce the load on a single tracker. It is also possible to associate an individual file with multiple trackers to alleviate the problem of tracker failure. The BitTorrent protocol specification extension suggests that the client try to connect randomly to any tracker that is still functioning. However, it may happen that clients which share the same file connect to different trackers and thus see only a subset of the peers, so some of them cannot cooperate in the file transfer for an optimal result. Regarding this problem, our BTFS client is configurable to support a variation that contacts all the trackers, at the expense of additional network traffic.

    <?xml version="1.0" encoding="utf-8"?>
    <config>
      <replication>1</replication>
      <seeder>
        <ip>203.185.96.47</ip>
        <port>8080</port>
        <path>/repository/default</path>
        <username>admin</username>
        <password>admin</password>
      </seeder>
      <seeder>
        <ip>203.185.96.48</ip>
        <port>8080</port>
        <path>/repository/default</path>
        <username>admin</username>
        <password>admin</password>
      </seeder>
      <tracker>
        <url>http://203.185.96.47:6969/announce</url>
      </tracker>
      <tracker>
        <url>http://203.185.96.48:6969/announce</url>
      </tracker>
    </config>

Fig. 3. Global configuration file.

3.9. Global configuration management

The global configuration of the BTFS is stored in the /config/.btfsrc.xml znode on the metadata server. Each client reads the content of this znode during the initialization phase. The configuration is written in an XML format, which is exemplified in Fig. 3. The configuration is self-describing; for instance, the <seeder> sections define the information of the seeders, such as IP and port. The <replication> section defines the replication parameter, which specifies the number of file replicas to store on different seeders.
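For illustration, a small parser for the Fig. 3 format; the znode read in the trailing comment assumes a kazoo-style client and is hypothetical.

    import xml.etree.ElementTree as ET

    def parse_btfsrc(text):
        """Parse the global configuration of Fig. 3 into plain dicts and lists;
        the element names are exactly those in the figure."""
        root = ET.fromstring(text)
        return {
            "replication": int(root.findtext("replication")),
            "seeders": [{child.tag: child.text for child in seeder}
                        for seeder in root.findall("seeder")],
            "trackers": [tracker.findtext("url")
                         for tracker in root.findall("tracker")],
        }

    # config = parse_btfsrc(zk.get("/config/.btfsrc.xml")[0].decode())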

3.10. Operation

Since the BTFS is implemented as a file system in user space, a normal user can mount it onto a local directory tree where he/she has permission. The command to mount BTFS is as follows, where the first argument is the IP address of the metadata server and the second argument is the local mount point.

    $ btfsmount 192.168.1.1 /home/user1/btfs

4. Evaluation and experiments

This section describes the experimental setup and presents the results of micro- and macro-benchmarks, which determine the overhead of the BTFS read/write operations and the overall application performance when using BTFS in volunteer-based distributed animation rendering, respectively.

4.1. Testbed system configuration

We have carried out all the experiments on a testbed system consisting of a set of servers located at a central site and multiple clients at different remote sites. The testbed represents volunteer-based distributed animation rendering in which users donate their desktop or notebook computers to the rendering jobs of a specific project. The testbed system spans 5 institutional and corporate sites, i.e. NECTEC, INET, CAT, UTCC and CSLOXINFO, as illustrated in Fig. 4. They all connect to the public Internet. NECTEC is chosen as the central site to place all the servers. We allocate 7 clients from the remaining sites.


For the hardware specification, each client machine has 4 CPU cores, 4 GB RAM, and at least 50 GB of free hard-disk space. The CPU speed ranges between 2.2 and 2.4 GHz. All clients have a public IP address and connect to the Internet without any firewall. However, the original bandwidth between the sites, as measured by the ttcp network benchmarking tool, is as high as 100 MB/s, which is too fast in comparison to the current Internet speed of average home users. So, we set the bandwidth throttling for both egress and ingress traffic on all servers and clients to 10 Mb/s to be more realistic. Note that we sometimes observed the network bandwidth between some particular sites to be as low as 2 MB/s during the experiments, but this did not persist and occurred only rarely, from time to time. All machines are installed with Linux CentOS 6.2.

Fig. 4. Testbed system.

4.2. Micro-benchmark

To understand the overhead cost incurred by BTFS, we measure the time spent in each step of the read and write operations. In this experiment, we use only one seeder and one BTFS client. We vary the file size to read and write from 0.5 MB to 20 MB to see how the overhead grows. For a read operation, each BTFS client executes the following steps respectively:

(1) Get metadata: get the file metadata from the metadata server.
(2) Get torrent: get the torrent information of the file from the metadata server.
(3) Download file: download the file from the seeder and store it on the local disk.
(4) Update cache DB: update the cache information in the local embedded database.
(5) Read local: read the file locally.
(6) Decrypt: decrypt the file using the key obtained from the file metadata.


Fig. 5 shows the time taken by each step, in milliseconds, when reading files of different sizes, except that the download time is presented in seconds on the secondary Y-axis. Clearly, as the file becomes larger, the download time increases proportionally. The accumulated time of all the other steps is considered the overhead, and it grows with the file size. Most of the overhead time comes from the decryption step. However, the overhead time is less than a second and grows much more slowly than the download time. For files larger than 1 MB, this overhead is trivial. Although this operation breakdown differs with many factors, such as the bandwidth, the number of peers and the CPU speed, it still gives some useful insight.

Fig. 5. Read operation breakdown.


For a write operation, each BTFS client carries out the following steps respectively:

(1) Get metadata: get the parent's ACL from the metadata server.
(2) Write local: write the file to the local disk.
(3) Encrypt: encrypt the file using a generated key.
(4) Update cache DB: update the cache information in the local embedded database.
(5) Create torrent: calculate the file hash and create the torrent information.
(6) Upload file: upload the file to a seeder.
(7) Update metadata: update the file attributes and put the torrent information into the metadata server.

Fig. 6. Write operation breakdown.

Similarly, Fig. 6 shows the overhead and the upload time of a BTFS client when writing files of different sizes. The overhead is measured in milliseconds, whereas the upload time is presented in seconds on the secondary Y-axis. Overall, the overhead in writing a file is larger than that in reading a file, but it is less than a second in all cases. The time used for encryption is the largest part of the overhead. The upload, encryption, local write and torrent creation steps clearly grow as the file becomes larger. However, the growth rate of the upload time is higher than that of the overhead, which is likewise trivial when writing a file larger than 1 MB.

4.3. Macro-benchmark

This section presents the performance evaluation of BTFS on the testbed using a distributed animation rendering application with a real dataset. The characteristics of the benchmark dataset and the experimental results are as follows.

4.3.1. Render data and software
The data used in the experiments are from the Big Buck Bunny project, the open animation movie initiated by the Blender software development team. The project has publicly released all the 3D models, image and texture files that were used during its production. There are over 400 files, totaling 1.2 GB, in the entire dataset. These files are organized into 13 scenes separated by the top-level folders under the project directory. Each scene may further be broken into sub-scenes, each of which is composed of a main .blend file and references to many other files (including texture images and .blend files) in the project directory. Fig. 7 depicts the number of references (frequency) to each .blend file, sorted from the most frequent to the least, over all scenes. It is clear that many of the .blend files are commonly accessed, as shown by their high frequency. Therefore, it is beneficial for the BTFS to share and cache these files for distributed rendering. The rendering process finally produces a number of images. Since different images have distinct computational requirements, some images can finish rendering in a few minutes and use only a small amount of memory, while others may take an hour with much larger memory. The final animation is 10 min long, consisting of more than 15,000 image frames.

Fig. 7. Frequency of file access.

To test BTFS under different workloads, we select scenes having distinct computational requirements to represent small, medium and large jobs, as shown in Table 3. The input size is the total size of the main .blend file and all the referenced files required for rendering. The average rendering time is measured when all the files are stored on the local disk. Note that several scenes in Big Buck Bunny require more memory than what we have (4 GB) in the testbed, in which case they cause the rendering to fail. These too-large scenes, such as 01_intro/01.blend, were excluded from our consideration at the outset.

Table 3
Characteristics of the testing dataset.

Job size | Scene | No. of frames | Mem. (MB) | Avg. rendering time (min per frame) | Input size (MB)
Small | 12_peach/03.blend | 28 | 650 | 1:30 | 90
Medium | 01_intro/02.blend | 93 | 2500 | 4:50 | 40
Large | 02_rabbit/02.blend | 91 | 3500 | 9:06 | 290

We deploy DrQueue 0.63.4, a job scheduler for distributed render farms, to dispatch jobs to the client machines. DrQueue can detect client availability, supports fault-tolerance by job re-execution, and has a superb interface for end users. Basically, to submit a job, users use the DrQueue GUI to create a job description providing the path of the .blend file, the numbers of the start and end frames, and the other render parameters. Blender 2.49, the open source rendering software, is installed on all clients. For a client to execute a job, it is required that the input files (i.e. the .blend file and all the referenced files) of the job are always accessible from the client. Thus, an instance of a DFS is used to allow the clients to transparently access all the required files over the network.

4.3.2. Performance comparison of BTFS and the SMB file system
We compare the performance of BTFS with that of SMB, one of the most widely used DFSs. We set up Samba version 3.5.10, an open source SMB server software, at the NECTEC site as the repository holding the entire project dataset. For a client machine to access files from Samba, there are two possible SMB client software options: Fuse-SMB and Linux CIFS, which are different SMB client implementations. The former is a user-space SMB client that mounts the SMB file system based on FUSE; the latter is the kernel-based SMB client, which requires root permission to mount. We use a single BTFS seeder for this experiment. Jobs for the selected scenes are submitted to the testbed at 25% of full HD resolution (480 × 270 pixels). Then, we measure the total rendering time and take the average of 3 runs.

The total rendering time varies with the file system used. For the small job, it takes approximately 1260, 1621 and 5040 s to finish under BTFS, Linux CIFS and Fuse-SMB, respectively, in ascending order, as depicted in Fig. 8. Clearly, BTFS gives the best time, whereas Fuse-SMB gives the worst. To elaborate on how BTFS helps improve the performance, we plot the amount of data transferred from the server (the seeder, in the case of BTFS) over time for one such run, as illustrated in Fig. 9. It shows that Fuse-SMB performs the worst, since the rendering application requires data transfer almost all the time, delaying the rendering process. Linux CIFS is better than Fuse-SMB, as the data transfer in Linux CIFS happens only in the early period (around the first 1000 s) and then reduces gradually until the job finishes.


radually until job finishes. This is due to the internal differences ofheir SMB implementation. Most of the network traffic reduction inhe Linux CIFS comes from using memory pagecache which auto-

atically caches files in free memory. The BTFS has the least dataransfer from the server because of two main reasons. First, BTFSlients can share data among other peers making the data transferappen only at the first 500 s. The other reason is that BTFS canffectively utilize the local disk cache.

Fig. 8. Rendering time under different file systems.

For the medium job, rendering takes approximately 5837, 5740 and 7202 s to finish under BTFS, Linux CIFS and Fuse-SMB, respectively, as depicted in Fig. 8. Similar to the small job, Fuse-SMB performs the worst in terms of both rendering time and the amount of network traffic from the server, as shown in Fig. 10; its network traffic stays high the whole time. Both BTFS and Linux CIFS transfer data from the server only at the beginning. While BTFS has a rendering time similar to Linux CIFS, it generates half as much traffic.

Next, for the large job, the rendering process takes approximately 9248, 29,886 and 32,717 s under BTFS, Linux CIFS and Fuse-SMB, respectively, as depicted in Fig. 8. The performance of Fuse-SMB is the worst for the same reasons as in the small and medium jobs. It is observed that BTFS reduces the rendering time more than threefold compared with Fuse-SMB. Interestingly, Linux CIFS now has a rendering time close to that of Fuse-SMB. Fig. 11 shows the traffic load of the server under the different file systems. It reveals that Linux CIFS has continuous network activity throughout the run. This is because the large job requires the entire memory of the machine for rendering, leaving no free memory for the pagecache. As a result, Linux CIFS must reload files from the server for every image frame. In contrast, BTFS stores files in persistent storage (disk), which is independent of the pagecache. Thus, BTFS greatly reduces both the rendering time and the traffic load on the server.
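The traffic patterns above come down to where the cache lives. The following is a minimal sketch of a cache-first read path, with hypothetical function and path names rather than the actual BTFS internals: a file is fetched over BitTorrent only once and afterwards served from a persistent on-disk cache that, unlike the kernel pagecache, survives memory pressure.

import hashlib
import os

CACHE_DIR = "/var/cache/btfs"  # hypothetical persistent on-disk cache location

def read_file(logical_path: str, download) -> bytes:
    """Return file contents, downloading via BitTorrent only on a cache miss."""
    key = hashlib.sha1(logical_path.encode()).hexdigest()
    cached = os.path.join(CACHE_DIR, key)
    if os.path.exists(cached):            # hit: no seeder traffic at all
        with open(cached, "rb") as f:
            return f.read()
    data = download(logical_path)         # miss: fetch once from seeders/peers
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cached, "wb") as f:         # persist, unlike the kernel pagecache
        f.write(data)
    return data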

4.3.3. Load balance in BTFS with multiple seeders

In this experiment, we create multiple seeders for the purpose of load sharing. There are 4 seeders, each of which is capable of sending data at 1 kB/s. Data replication is disabled, so each file is uploaded to a random seeder. The entire Big Buck Bunny project is copied into the BTFS. Then, we run 7 remote clients that continuously read a random file from the project. The P2P data exchange is turned off so that clients download data from the seeders only. Fig. 12 depicts the load of each seeder over time. The load varies and is spread across all seeders. However, the load is not perfectly balanced, as the load on seeders 1 and 3 is higher than on the others, resulting in an actual aggregate bandwidth below 4 kB/s. From our observation, each seeder holds roughly 100 files, but seeders 1 and 3 hold more data, as shown in Table 4. Ideally, the load could be better balanced if the distribution of files took file size into account rather than relying on randomness alone, which remains for further investigation.
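As an illustration of the size-aware distribution suggested above, the sketch below (not part of BTFS; the per-seeder byte counters and file sizes are hypothetical) greedily uploads each new file to the seeder currently holding the fewest bytes, which keeps disk usage, and hence expected load, closer together than random placement.

import random

def pick_seeder_random(loads: dict) -> str:
    return random.choice(list(loads))    # placement used in this experiment

def pick_seeder_by_size(loads: dict) -> str:
    return min(loads, key=loads.get)     # greedy: least-loaded seeder by bytes

loads = {"seeder1": 0, "seeder2": 0, "seeder3": 0, "seeder4": 0}
for size_mb in [422, 259, 310, 229, 90, 40, 290]:  # illustrative file sizes (MB)
    s = pick_seeder_by_size(loads)
    loads[s] += size_mb
print(loads)  # byte counts stay close together, unlike random placement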

Fig. 9. Outbound network traffic from the server (seeder) for the small job.

Fig. 10. Outbound network traffic from the server (seeder) for the medium job.

Fig. 11. Outbound network traffic from the server (seeder) for the large job.

4.3.4. BTFS replication performance

In this experiment, we replicate files onto multiple seeders and measure the time for a single BTFS client to retrieve the files from different numbers of seeders. With replication, the BTFS client can download different parts of a file in parallel from multiple seeders. We vary the number of seeders from 1 to 4. Fig. 13 shows the transfer time (not including the rendering time) for the large job, i.e. 02_rabbit/02.blend and all 58 referenced files, when using multiple seeders. The transfer time decreases significantly as the number of seeders increases (and thus more replicas are available). The best performance is obtained with 4 seeders. Fig. 14 shows the transfer speedup against the number of seeders. Note that the transfer speedup is calculated as the ratio of the transfer time using a single seeder to that using multiple seeders. Although the speedup does not reach the ideal due to some incurred overhead, it remains compelling.
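For clarity, the speedup metric can be restated with a small worked example; the transfer times below are made up, while the measured values are those plotted in Figs. 13 and 14.

def speedup(t_single: float, t_multi: float) -> float:
    """Ratio of single-seeder transfer time to multi-seeder transfer time."""
    return t_single / t_multi

t1 = 300.0                                   # hypothetical time with 1 seeder (s)
for n, tn in [(2, 160.0), (3, 115.0), (4, 90.0)]:
    print(f"{n} seeders: speedup {speedup(t1, tn):.2f} (ideal {n})")
# Piece scheduling and tracker overhead keep the measured curve below the ideal.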


Fig. 12. Accumulative load of multiple seeders.

Fig. 13. Data transfer time of multiple replicas.

Fig. 14. Speedup of multiple replicas.

Table 4
Number of files and disk usage on each seeder.

                  Seeder 1   Seeder 2   Seeder 3   Seeder 4   Total
No. of files      119        103        112        101        435
Disk usage (MB)   422        259        310        229        1220

Table 5
Performance of the rendering process under different volunteer volatility (average On time).

                  2700 s              1800 s              900 s
                  Time (s)  # Rerun   Time (s)  # Rerun   Time (s)  # Rerun
BTFS + DrQueue    7525      11        11,347    23        15,840    31
BTFS + BOINC      7861      14        10,200    20        15,546    36

4.3.5. Evaluating BTFS under volunteer volatility

In this experiment, we demonstrate that BTFS can work under volatile conditions, where nodes dynamically join and leave the system at any time during job execution, and discuss how volunteer volatility affects the rendering process. To handle such volatility gracefully, the DrQueue job scheduler keeps monitoring the availability of the nodes. If a node leaves or fails, or the client process on the node ceases, DrQueue removes the node from the list of available nodes, and the job is rerun on another node. We emulate the volatility of a node by randomly starting and killing the client processes (including the BTFS client and Blender) on the testing nodes. The On and Off time periods of each node are independently modeled by two exponentially distributed random variables. We vary the average On time over 900, 1800 and 2700 s, while fixing the average Off time at 1800 s. We report the average rendering time and the number of rerun jobs for the medium job in Table 5. Intuitively, a larger On period gives longer availability, which is preferable for volunteer-based distributed rendering, as reflected in the lower rendering time and fewer rerun jobs. In all cases, BTFS works with DrQueue under the volatile environment to finish the jobs.
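The emulation just described can be sketched as follows, assuming hypothetical start_clients/stop_clients helpers that launch and kill the BTFS client and Blender on a node; each node independently alternates exponentially distributed On and Off periods.

import random
import time

MEAN_ON, MEAN_OFF = 1800.0, 1800.0  # seconds; MEAN_ON was varied over 900/1800/2700

def emulate_node(start_clients, stop_clients, cycles: int = 10) -> None:
    for _ in range(cycles):
        start_clients()
        time.sleep(random.expovariate(1.0 / MEAN_ON))   # node is available
        stop_clients()                                   # abrupt leave or failure
        time.sleep(random.expovariate(1.0 / MEAN_OFF))  # node is away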

4.3.6. Using BTFS with the BOINC volunteer computing platform

This experiment shows that BTFS can also be used with BOINC (Volunteer Computing), the well-known volunteer computing platform, for a volunteer-based animation rendering system. Here, we set up our own BOINC server at the NECTEC site. The BOINC platform is used only for job and user management, while data repository and distribution are delegated to BTFS. To avoid porting Blender to the BOINC platform, Blender is executed under the BOINC wrapper on client nodes. Rendering jobs are submitted to BOINC as shell scripts that mount the BTFS, invoke Blender to render a specific frame using data from BTFS, and unmount the BTFS when finished. Note that almost all of BOINC's job execution parameters, such as redundant computing or resource requirements, still function as normal. By using BTFS, users have an easier way to create jobs, as they do not need to pack and stage all relevant files into jobs as in native BOINC. Besides, the data can be distributed more efficiently among volunteer nodes, reducing the BOINC server load. We emulate the volunteer volatility by the method described in Section 4.3.5. The average rendering time and the number of rerun jobs are shown in the bottom row of Table 5. The results are not notably distinct from those of DrQueue, which shows that BTFS can work in tandem with the BOINC platform for volunteer-based distributed rendering.
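A Python rendition of such a job script might look like the sketch below; the actual jobs were shell scripts run under the BOINC wrapper, the btfs-mount helper and paths are illustrative, and fusermount is the standard FUSE unmount tool.

import subprocess
import sys

def render_one_frame(frame: int) -> None:
    subprocess.run(["btfs-mount", "/mnt/btfs"], check=True)  # hypothetical mount helper
    try:
        subprocess.run(
            ["blender", "-b", "/mnt/btfs/project/scene.blend",
             "-o", "/tmp/out/frame_#####",
             "-f", str(frame)],
            check=True,
        )
    finally:
        subprocess.run(["fusermount", "-u", "/mnt/btfs"], check=True)  # unmount the FUSE fs

if __name__ == "__main__":
    render_one_frame(int(sys.argv[1]))  # BOINC supplies the frame number per work unit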

5. Conclusions

The BitTorrent file system, or BTFS, provides a viable way to improve the communication performance of volunteer-based distributed animation rendering. It allows multiple rendering clients to share and exchange data in a P2P manner so that the transfer time is greatly reduced, allowing the rendering process to proceed more rapidly. The experiments carried out on a testbed using a production-grade 3D animation show that BTFS can lessen the overall rendering time and lower the traffic load on the server compared with traditional network file systems. BTFS is implemented at the file system layer to shield rendering applications from the complications of the P2P communication mechanism. Owing to this transparency, BTFS could be used not only for rendering but also in any other application domain that requires disseminating a large amount of common data to many distributed computers in a short time. Data in BTFS are stored securely by enforcing access control as well as encryption on files. BTFS is designed to have all components scalable and fault-tolerant. Replication is also supported to enhance the availability of data. Although the core components of BTFS are built around well-developed standard protocols whose stability, maturity and compatibility have been proven, deploying BTFS in very large-scale environments with thousands of computers remains challenging. In addition, the "last writer wins" consistency model used in BTFS might not be suitable for all applications. A release consistency model, which requires lock and unlock operations when accessing files, should be implemented in BTFS in the future, as some applications require it. Besides, partial file requests and data deduplication in BTFS are also worth investigating in future work.

References

Anderson, D.P., 2004. BOINC: a system for public-resource computing and storage. In: Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing (GRID 2004), Pittsburgh, PA, USA, pp. 4–10.

Apache ZooKeeper. Available from: http://zookeeper.apache.org/ (accessed 31.10.13).

Assawamekin, N., Kijsipongse, E., 2013. Design and implementation of BitTorrent file system for distributed animation rendering. In: Proceedings of the 17th International Computer Science and Engineering Conference (ICSEC 2013), Bangkok, Thailand, pp. 68–72.

Bharambe, A.R., Herley, C., Padmanabhan, V.N., 2006. Analyzing and improving a BitTorrent network's performance mechanisms. In: Proceedings of the 25th IEEE International Conference on Computer Communications (INFOCOM 2006), Barcelona, Spain, pp. 1–12.

Big Buck Bunny. Available from: http://www.bigbuckbunny.org/ (accessed 31.10.13).

BitTorrent Multitracker Metadata Extension. Available from: http://www.bittorrent.org/beps/bep_0012.html (accessed 31.10.13).

Blender. Available from: http://www.blender.org/ (accessed 31.10.13).

BURP: the Big and Ugly Rendering Project. Available from: http://burp.renderfarming.net/ (accessed 31.10.13).

Cappello, F., et al., 2005. Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid. Future Gener. Comput. Syst. 21 (3), 417–437.

Coda File System. Available from: http://www.coda.cs.cmu.edu/ (accessed 31.10.13).

Cohen, B., 2003. Incentives build robustness in BitTorrent. In: Workshop on Economics of Peer-to-Peer Systems, Berkeley, CA, USA.

Common Internet File System (CIFS). Available from: http://www.cifs.com/ (accessed 31.10.13).

Costa, F., et al., 2008. Optimizing the data distribution layer of BOINC with BitTorrent. In: Proceedings of the 2008 IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), Miami, FL, USA, pp. 1–8.

Distributed Rendering. Available from: http://www.isgtw.org/visualization/distributed-rendering/ (accessed 31.10.13).

DrQueue, the Open Source Distributed Render Queue. Available from: http://www.drqueue.org/ (accessed 31.10.13).

Fedak, G., He, H., Cappello, F., 2008. BitDew: a programmable environment for large-scale data management and distribution. In: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing (SC 2008), Austin, TX, USA.

Free Rendering by the People for the People. Available from: http://www.renderfarm.fi/ (accessed 31.10.13).

FUSE: Filesystem in Userspace. Available from: http://fuse.sourceforge.net/ (accessed 31.10.13).

Globus Toolkit. Available from: http://www.globus.org/toolkit/ (accessed 31.10.13).

GlusterFS. Available from: http://www.gluster.org/ (accessed 31.10.13).

Howard, J.H., et al., 1988. Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6 (1), 51–81.

Kaplan, A., Fox, G.C., Laszewski, G.v., 2007. GridTorrent framework: a high-performance data transfer and data sharing framework for scientific computing. In: Grid Computing Environments Workshop, Reno, NV, USA.

KaZaA Lite. Available from: http://kazaa-lite.en.softonic.com/ (accessed 31.10.13).

Klingberg, T., Manfredi, R. Gnutella Protocol Development. Available from: http://rfc-gnutella.sourceforge.net/src/rfc-0_6-draft.html (accessed 31.10.13).

MogileFS. Available from: https://github.com/mogilefs/ (accessed 31.10.13).

Moose File System. Available from: http://www.moosefs.org/ (accessed 31.10.13).

Napster. Available from: http://www.napster.com/ (accessed 31.10.13).

opentracker – An Open and Free BitTorrent Tracker. Available from: http://erdgeist.org/arts/software/opentracker/ (accessed 31.10.13).

PublicBitTorrent – An Open Tracker Project. Available from: http://publicbt.com/ (accessed 31.10.13).

Qiu, D., Srikant, R., 2004. Modeling and performance analysis of BitTorrent-like peer-to-peer networks. ACM SIGCOMM Comput. Commun. Rev. 34 (4), 367–378.

Samba – Opening Windows to a Wider World. Available from: http://www.samba.org/ (accessed 31.10.13).

Sandberg, R., et al., 1985. Design and implementation of the Sun network filesystem. In: USENIX 1985 Summer Conference Proceedings, pp. 119–130.

Shvachko, K., et al., 2010. The Hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), Lake Tahoe, NV, USA, pp. 1–10.

SMB for Fuse. Available from: http://www.ricardis.tudelft.nl/~vincent/fusesmb/ (accessed 31.10.13).

Stunnel. Available from: https://www.stunnel.org/index.html (accessed 31.10.13).

The Linux Kernel Archives. Available from: https://www.kernel.org/ (accessed 31.10.13).

Volunteer Computing. Available from: http://boinc.berkeley.edu/trac/wiki/VolunteerComputing/ (accessed 31.10.13).

vSwarm: Free Render Farm. Available from: http://www.vswarm.com/ (accessed 31.10.13).

Wei, B., Fedak, G., Cappello, F., 2005. Collaborative data distribution with BitTorrent for computational desktop grids. In: Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005), France.

Wei, B., Fedak, G., Cappello, F., 2007. Towards efficient data distribution on computational desktop grids with BitTorrent. Future Gener. Comput. Syst. 23 (8), 983–989.

Zissimos, A., et al., 2007. GridTorrent: optimizing data transfers in the grid with collaborative sharing. In: Proceedings of the 11th Panhellenic Conference on Informatics (PCI 2007), Patras, Greece.

Ekasit Kijsipongse received his Ph.D. in computer science from Mahidol University, Thailand in 2009. He currently works as a researcher at the National Electronics and Computer Technology Center, Thailand. His research interests include distributed and parallel systems, as well as grid and cloud computing.

Namfon Assawamekin received her Ph.D. in computer science from Mahidol University, Thailand in 2009. She is currently an assistant professor at the School of Science and Technology, University of the Thai Chamber of Commerce, Thailand. Her main research interests are in the areas of software engineering, Web engineering, ontologies, data and knowledge engineering and distributed systems.