A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached...

9
A NoSQL based Cached Storage Solution of GIS Web Service 1 *Jin Liu, 1 Yangping Wu, 1 Dan Zhuo, 2 Qiping Hu, 1 Yan Wang, 3 Wensheng Zhang *1,Corresponding Author State Key Lab.of Software Engineering, Computer School, Wuhan University, China, [email protected] 2 International School of Software, Wuhan University, China, 3 State Key Lab.of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Science, China Abstract One of challenges in GIS Web service is mass GIS data storage and processing. There are at least three obstacles that oppose dealing with this challenge: the mismatch between the frequent requirement of concurrent tasks for accessing data in GIS Web service and the poor performance of conventional storage mechanism such as relational databases; the insufficiency of relational databases to manage complex data types in GIS Web service; the horizontal scalability of storage for non-stop GIS Web service evolution. This paper proposes a distributed cached storage solution, as the mixture of MongoDB storage and Memcached cache, for GIS Web service. Both techniques are closely related to NoSQL that is a class of database management system identified by its non-adherence to the widely used relational database management system (RDBMS) model. A concise case that facilitates a better understanding of this work indicates its feasibility. Keywords: Cached Storage, GIS Web Service, NoSQL 1. Introduction The rapid development of geo-spatial information technology promotes various GIS applications maximize their utilities as Web service in different scales [1, 2, 22]. As this trend continues, mass GIS data storage and processing becomes one of challenges in GIS Web service [3], which can be easily found in different fields such as urban planning, water and electricity, meteorology, and so forth. Especially, the rise of Web 2.0 not only changes the appearance of GIS Web services but also demands scaling their overall performance. Against this background, there are at least three obstacles that oppose dealing with this challenge. The first is the mismatch between frequent requirement of concurrent tasks for accessing data in GIS Web service and the poor performance of conventional storage mechanism such as relational databases [3, 4]. The former are compute-intensive operations of reading and writing data in large scale, which is applicable to flash memory that satisfies fast response times [3, 4]. The latter is usually I/O-intensive operations that are appropriate to disk storage [3, 4]. Taking dynamic web pages for an example, they are deduced by computation and easy to vanish if they do not be cached. It is easy to find them serve thousands of users. When this happened, slow speed of disk operations of traditional relational databases cannot meet the concurrent data access in large scale, which needs quick response. The second is the insufficiency of relational databases to manage complex data types in GIS Web service [5]. Versatile types of mass data can be integrated into GIS Web services, involving geometry, word, voice, graph, hypertext, and so forth. Traditional rational databases that predefine their data schema cannot process last-minute data type that frequently finds in GIS Web service. The last is the horizontal scalability of storage for service evolution [6], which means extending storage units to perform service in larger scale and making all storage units working as a whole. This is especially important to a non- stop GIS Web service with millions of users. Although several efforts had been made to alleviate the above mentioned difficulties, each of them alone solve the challenge of mass data storage and processing in GIS Web service. To bridge the gap between concurrent tasks of accessing data in GIS Web service and slow I/O-intensive disk operations with regards to relational database, squid [7] as a caching proxy for the Web is developed to support HTTP, HTTPS, FTP, and the so forth. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages. Squid has extensive access controls and makes a server accelerator. It runs on most available operating systems, including Windows and is licensed A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang International Journal of Digital Content Technology and its Applications(JDCTA) Volume6,Number20,November 2012 doi:10.4156/jdcta.vol6.issue20.29 268

Transcript of A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached...

Page 1: A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached Storage Solution of GIS Web Service 1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu,

A NoSQL based Cached Storage Solution of GIS Web Service

1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu, 1Yan Wang, 3Wensheng Zhang *1,Corresponding AuthorState Key Lab.of Software Engineering, Computer School, Wuhan University,

China, [email protected] 2International School of Software, Wuhan University, China,

3State Key Lab.of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Science, China

Abstract

One of challenges in GIS Web service is mass GIS data storage and processing. There are at least three obstacles that oppose dealing with this challenge: the mismatch between the frequent requirement of concurrent tasks for accessing data in GIS Web service and the poor performance of conventional storage mechanism such as relational databases; the insufficiency of relational databases to manage complex data types in GIS Web service; the horizontal scalability of storage for non-stop GIS Web service evolution. This paper proposes a distributed cached storage solution, as the mixture of MongoDB storage and Memcached cache, for GIS Web service. Both techniques are closely related to NoSQL that is a class of database management system identified by its non-adherence to the widely used relational database management system (RDBMS) model. A concise case that facilitates a better understanding of this work indicates its feasibility.

Keywords: Cached Storage, GIS Web Service, NoSQL

1. Introduction

The rapid development of geo-spatial information technology promotes various GIS applications maximize their utilities as Web service in different scales [1, 2, 22]. As this trend continues, mass GIS data storage and processing becomes one of challenges in GIS Web service [3], which can be easily found in different fields such as urban planning, water and electricity, meteorology, and so forth. Especially, the rise of Web 2.0 not only changes the appearance of GIS Web services but also demands scaling their overall performance.

Against this background, there are at least three obstacles that oppose dealing with this challenge. The first is the mismatch between frequent requirement of concurrent tasks for accessing data in GIS Web service and the poor performance of conventional storage mechanism such as relational databases [3, 4]. The former are compute-intensive operations of reading and writing data in large scale, which is applicable to flash memory that satisfies fast response times [3, 4]. The latter is usually I/O-intensive operations that are appropriate to disk storage [3, 4]. Taking dynamic web pages for an example, they are deduced by computation and easy to vanish if they do not be cached. It is easy to find them serve thousands of users. When this happened, slow speed of disk operations of traditional relational databases cannot meet the concurrent data access in large scale, which needs quick response. The second is the insufficiency of relational databases to manage complex data types in GIS Web service [5]. Versatile types of mass data can be integrated into GIS Web services, involving geometry, word, voice, graph, hypertext, and so forth. Traditional rational databases that predefine their data schema cannot process last-minute data type that frequently finds in GIS Web service. The last is the horizontal scalability of storage for service evolution [6], which means extending storage units to perform service in larger scale and making all storage units working as a whole. This is especially important to a non-stop GIS Web service with millions of users.

Although several efforts had been made to alleviate the above mentioned difficulties, each of them alone solve the challenge of mass data storage and processing in GIS Web service. To bridge the gap between concurrent tasks of accessing data in GIS Web service and slow I/O-intensive disk operations with regards to relational database, squid [7] as a caching proxy for the Web is developed to support HTTP, HTTPS, FTP, and the so forth. It reduces bandwidth and improves response times by caching and reusing frequently-requested web pages. Squid has extensive access controls and makes a server accelerator. It runs on most available operating systems, including Windows and is licensed

A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang

International Journal of Digital Content Technology and its Applications(JDCTA) Volume6,Number20,November 2012 doi:10.4156/jdcta.vol6.issue20.29

268

Page 2: A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached Storage Solution of GIS Web Service 1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu,

under the GNU GPL[7]. When a user needs to access a web page, it sends the request to squid that acts on its behalf to take the page back from the services and forwards the page to the user’s client. Meanwhile, squid also keeps a copy of this page so that it can deliver to other users that request for the same page and achieve so-called high performance. However, there are still several deficiencies of Squid, including its low permeability, weak stability, poor speed up effect and uneasy deployment [8]. Memcached is another general purpose distributed memory caching system that is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source must be read [9]. Memcached runs on Unix, Linux, Windows and Mac OS X and is distributed under a permissive free software license [9]. Memcached's APIs provide a giant hash table distributed across multiple machines. When the table is full, subsequent inserts make older data to be purged in least recently used (LRU) order. Memcached is used by sites including YouTube, Reddit, Zynga, Facebook, Orange and Twitter [9]. Similar web cache techniques are deliberately omitted due to the limitation of paper length.

Generally speaking, these web cache techniques are closely related to the Brewer's CAP theorem, which indicates that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: consistency, availability and partition tolerance [10, 11]. Consistency means all nodes should see the same data at the same time. Availability means a guarantee that every request receives a response about whether it was successful or failed. Partition tolerance indicates that the system continues to operate despite arbitrary message loss or failure of part of the system. Versatile web cache techniques strive for availability and partition tolerance of the CAP theorem by getting the supports from tolerance to network partition. By doing so, it meets the requirement of frequent concurrent data access. By sacrificing the strict consistency, they usually satisfy the eventual consistency that means given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent [12].

To make up for the insufficiency of relational databases to manage complex data types in GIS Web service, key value store was introduced that allow the application developer to store schema-less data [13]. This data is usually consisting of a string which represents the key and the actual data which is considered to be the value in the "key-value" relationship. The data itself is usually some kind of primitive of the programming language or an object that is being marshaled by the programming languages bindings to the key value store. This replaces the need for fixed data model and makes the requirement for properly formatted data less strict. CouchDB is a key-value store implementation in Erlang that uses HTTP to communicate with clients and Javascript to generate "views" [23]. Tokyo Cabinet Tokyo Cabinet is a key-value store that supports three modes of operation: hash table mode, b-tree mode and table mode. Tokyo Cabinet Tokyo Cabinet is used by the high-load environment of the Japanese Facebook-equivalent Mixi [15]. Redis is an in-memory key-value store focusing on performance [15]. As an extended key-value store originally developed by Facebook, Cassandra was developed by some of the key engineers behind Amazon's famous Dynamo database [16]. Memcached is also an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering [9].

To surmount the last obstacle, Apache Cassandra is designed to handle very large amounts of data spread out across many commodity servers while providing a highly available service with no single point of failure [16, 18]. Cassandra provides a structured key-value store with tunable consistency. Keys map to multiple values, which are grouped into column families. The column families are fixed when a Cassandra database is created, but columns can be added to a family at any time. MongoDB provides auto-sharding to achieve storage scalability [17]. It can allocate databases, set and objects in databases to Shards without shutdown of services. Memcached can easily add units to clusters in a mass distributed system without any additional load because there is no communication between any two Memcached units [9]. This way avoids implosion of network traffic.

Inspired by these efforts, this paper proposes a distributed cached storage solution, as the mixture of MongoDB storage and Memcached cache, for GIS Web service. This is because both techniques satisfy good horizontal scalability of GIS Web services and provide sufficient support for complex data types. MongoDB storage and Memcached cache are both closely related to NoSQL that is a class of database management system identified by its non-adherence to the widely used relational database management system (RDBMS) model [19]. Moreover, it is obvious that the high speed of data access of Memcached as the auxiliary cache compensates for the poor performance of I/O operations of

A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang

269

Page 3: A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached Storage Solution of GIS Web Service 1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu,

MongoDB as the data Persistence layer. The simple design of memcached can promote quick deployment, ease of development, and solve many problems facing large data caches. This paper will implement principles of distributed key-value storage or cache to achieve high performance of concurrent data operations in distributed GIS Web services. To facilitate a better understanding of this work, the representation of the rest paper is closely accompanied by a concise case.

The rest paper is organized as follows: section 2 explains the primitive method of this paper. Section 3 presents the design details of the cached storage system and its experiment. Section 4 demonstrates the results of experiment. At last, section 5 concludes this paper. 2. Primitive method

Against the background of key-value store and distributed cache, a GIS Web service with cached storage should satisfy the following requirements:

a). The current whole data size is around 400GB, GIS data belongs to tiff files around 200kb, and the size of each file may grow coefficient.

b). The regular operations on GIS data are query and retrieval. Once a GIS data file is deposited into a database, it seldom to be changed.

c). The storage mechanism should be able to retrieval the geographic position information of GIS objects if it has been given the query information of rectangular area and objects’ IDs.

d). The storage is scalable with multiple interface protocols, especially http protocol. e). Under the condition of concurrent query with hundreds of users, the cost of each GIS image

should be less than 10 ms. f). The storage mechanism should provide a console for checking storage and cache indicators,

configuring the cached storage, and executing batch processing operations on GIS images.

The function of distributed cached mechanism is to decrease load of databases by prefecting common used data into the cache memory on different web nodes. Due to the distributed topology of cached storage, the capability of memory can be added with conversion. A running massive GIS image storage system shall meet heavy data load. This paper takes Memcached as its cache and MongoDB as its storage mechanism.

When the mass image storage system is running, there will be a lot of database load; therefore, it is necessary to use the caching system. In this system, we will use the Mongo DB in Key/value storage system and Memcached in distributed caching system. Memcached can be deployed at any web nodes and be accessed via network. Its scalability is hidden to terminal services that get concrete data by only providing Memcached keys of data.

MongoDB, as a database based on distributed file storage, provides a high-performance data storage solution with good scalability for Web applications. It takes a collection-oriented approach. It means that data is deposited into sets by groups and each group is a collection. Each collection that owns a unique identification name at a database can accommodate almost unlimited documents. The collection of MongoDB is similar to the concept of table in relational databases. But the collection of MongoDB is schema free, which means it is not necessary predefine the structure of data. Data with different structure can be deposited at the same database as key-value pairs. The key, as a character string, is used to uniquely identify a data document. The value can be almost any types, especially complex types. This storage style is called BSON (Binary Serialized Document Format) [20].

Until now the logical model of the cached storage mechanism for GIS Web service can be represented in figure 1.

In Figure 1, each rectangle means a function module. HIS module that means Humongous Image Storage is the core module of the cached storage of GIS Web service. MongoDB cluster plays as consistency layer for GIS Web service with good scalability. Memcached is deployed as the distributed cache. The module of Http Adapter is the adapter of Http protocol, which provide the function of data access from the outside of the GIS Web service. The data access module is isolated and provided adaptors of different protocols. The module of Http Adapter implements the primitive access protocol. The module of console is designed for system configuration. The module of 3rd App is the third party applications. The arrows indicate the dependency relationships between these modules. Thus, the HIS

A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang

270

Page 4: A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached Storage Solution of GIS Web Service 1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu,

module depends on MongoDB and Memcached. The module of 3rd App depends on HttpAdapter and HIS. HttpAdapter and Console depend on HIS.

Figure 1. Logical model of the cached storage mechanism for GIS Web service

The HIS module is an important component that is referenced by most other modules. The HIS

module is implemented as a dynamic link library to fold drivers of Memcached and MongoDB and provide unified access interfaces. The function of HIS can further be designed in details, as described in Figure 2. The design of HIS follows principles of High-cohesion in the inner module and loose couple among modules to guarantee the scalability of HIS so that similar storage or cache techniques can be incorporated into the system.

Figure 2. The function of HIS module in details

3. System and experiment design

File system, relational database and key-value database are three well-known techniques for data consistency layer. Each one corresponds with different concurrency mechanisms. Thus two kinds of magnanimous data accessing approach are introduced into the cached storage mechanism, i.e., the magnanimous data access with single thread and concurrent multithreads. The former tests the absolute efficiency of data access and the latter tests the ability of data processing for different consistency layer techniques in a concurrent computing environment.

To perform the single thread test, the system will test the performance of reading operation on file system, relational database and key-value database on Windows OS. The relational database is Oracle 10g database, which is known as a very stable and efficient rational database. The key-value database is MongoDB with a good reputation of flexible settings and good performance. The file system takes the Windows NTFS. The detailed testing environment parameters are described in Table 1.

By observing the changing trends of these samples with different storage or cache techniques, their performance can be compared. All the above mentioned techniques provide API with C#

A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang

271

Page 5: A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached Storage Solution of GIS Web Service 1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu,

programming language, including testing environment and data processing test under the environment of dot Net Framework 4.0.

Table 1. Experimental environment of the magnanimous data access with single thread

Type Description Operating system Windows Server 2003 SP2 full version

CPU Inter Core i5,4core,2.67GHZ

Memory 4GB,2.66GHZ

Hardware 1TB capacity,32MB cache,rotate speed 15000/s

File System NTFS Relational database Oracle DB 10g key/value database Mongo DB 1.6.4

The test involves several technical caution places. All testing cases randomly select IDs from

samples to query data. Different cases use corresponding drivers of consistency storage layers. A case suite is consisted of three cases: File System Case, Oracle Case and Mongo Case. The design of case suite makes it possible to perform batch process of testing cases, deduce unnecessary manual intervention and improve testing efficiency. A sample as the test imput is ID set of images. The size of a sample is the number of all the members in this sample. For each test case, it can accept samples with different sizes and perform test.

Figure 3. Test of single thread

According to the size and repeat rate of samples, the times of data access in a case test should be

calculated because the size of a sample can significantly influence the test result. Thus, it is a good choice of calculating the sample size and the repeat rate of samples to get neutral test result, than performing case test in fixed times. Since the efficiency of different consistency layers vary, the efficiency of data load also differs. The query cost is the sum of time for query and data load. The performance unit of statistical test is carried out around statistical suite unit. Each suite accepts a statistical sample. Each case in a suite uses this statistical sample. The statistical process is described in Figure 3. Step 1: Construct sample: the sample is produced according to original directory. Before this step, the

root of original data and the sample size should be identified. Along this root, there is a recursive procedure on the original data to gain IDs of images until the number of accessed images reaches to the sample size.

A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang

272

Page 6: A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached Storage Solution of GIS Web Service 1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu,

Step 2: Construct cases: a step of constructing file cases, Oracle database cases and MongoDB cases respectively.

Step 3: Assemble case suites: a step of organizing cases into case suite for batch execution. Step 4: Statistics of each case: For the file system, this step copies all files into a working directory

according to ID in the sample space and takes data in this direction for statistics. For Oracle database, this step builds a table with two columns. One column deposits image IDs with data type as varchar2. The other column deposits binary GIS image data with data type as blob. It will establish indexes about ID and insert GIS images of sample spaces into database. For MongoDB, this step inserts image IDs and image data as key-value pair into database. After all these have been done, the cumulative cost of data access is divided by the times of data access to get the mean cost of each query. The times of data access is the repeat rate of the sample multiplies the sample size. Each access is performed according to randomly selected IDs so that centralized data access can be avoided. After the statistic procedure is completed, it is necessary to destroy environment to avoid influence the next procedure. For the file system, it needs to delete all images. Similarly, Oracle database also does so but it will keep the table and indexes for the sake of reuse next time. For MongoDB, it deletes all data and reboots itself to release occupied memory.

Step 5: Record statistical results: this step records grouped statistical results, as described in Table 2.

Table 2. Single-threaded statistical results Summary Information

Case Ave Time/ms File System 70.12 Mongo DB 15.56 Relation DB 20.75 Repeat: 1000 Sample Size: 100000

Step 6: Establish a new sample space and repeat step 1 to step 5.

Concurrent performance test with multiple threads is to estimate the ability of concurrent

data query for each consistency layer technique. Before this test, massive GIS image data is induced to different consistency layers, which is a long course due to huge quantity of data. After that, the software Siege, as an http load tester and benchmarking utility for web load estimation, is used to measure the performance of different consistency layers under duress, to see how they will stand up to load on the Internet [21]. Siege lets the user hit a web server with a configurable number of concurrent simulated users. For each http request, the service will precede the following steps.

Step 1. Get the type parameter in the URL, taking an example as “http://localhost: 8080/query.aspx?type=mongo”.

Step 2. Produce random GIS image ID. Step 3. Query GIS image from consistency layers according to ID and data type. Step 4. Add image data into memory to reduce network flow and test time. Step 5. Return randomly produced ID.

There are many factors should be considered in concurrent test. To begin with, current test

needs many users request data access at the same time. Then final result can be calculated by collecting relative parameters in each data access. However, in most cases, the concurrent test can only be simulated with specific software due to the difficulty of inviting so many users to attend the test.

The test experiment takes the pressure test node as the client and GIS Web service node as service that deployed three kinds of storage techniques. By simulating multiple users’ concurrent behaviors to send a series of access requests, the request result is calculated. The concurrent test is performed on a data set with a fixed size, i.e., 200, 000 images. The experimental environment of concurrent test is

A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang

273

Page 7: A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached Storage Solution of GIS Web Service 1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu,

described in Table 3. The pressure test software Siege that works on Linux OS is easy to set the number of concurrent tasks and support URL parameter pass.

Table 3. The experimental environment of pressure test

The client operating system Ubuntu 10.10 The client pressure software Siege 2.7.0

The client CPU 1.4GHZ The client memory 512MB

The server operating system Windows Server 2008 Enterprise The server CPU Inter Core i5,4-core,2.67GHZ

The server memory 4GB,2.66GHZ

The server hardware 1TB capacity,32MB cache,rotate speed 15000/s

The web server IIS 7.5 The server file system NTFS

Relation database Oracle DB 10.2 Key/value database Mongo DB 1.6.4

Data set 200,000 tiff pictures,Each size 196KB

4. Experiment result

The result of the single thread test is illustrated in Figure 4. In this picture, the horizontal coordinate indicates the size of GIS image data sample and vertical coordinate presents the mean cost of each access with the time unit as microsecond. According to Figure 4, the access cost is increase in logarithm as the increase of data. NTFS has a clear advantage when the quantity of data is small, i.e., less than 20,000 files. But with the increase of data quantity, the performance of NTFS is not up to Oracle database and MongoDB. Although the access cost of Oracle database and MongoDB also increase in logarithm, they increase very slowly. This means that the performance of Oracle database and MongoDB is very stable so that they are qualified to be the consistency layers of mass GIS data.

Figure 4. The result of the single thread test

The result of concurrent performance test with multiple threads is as follows. A test result

of the pressure test is presented in Table 4. The test result of Table 4 is derived from 10 times access on Oracle database with 100 users. “Transactions” is the result of the number of users multiplying the repeat number, i.e., 100*10=1000. “Availability” can be the success rate for HTTP requests. Due to network delay or server performance, not every HTPP request will succeed. "Elapsed time" is the time of consumption in the pressure test process. “Data transferred” is the overall amount of data transfer process in the pressure test process. “Response time” is the average response time of each request. “Transaction rate”, as the average number of requests completed per second, is calculated by the overall request to divide the number of overall time (transactions/Elapsed Time). “Throughput" is the data quantity of

A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang

274

Page 8: A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached Storage Solution of GIS Web Service 1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu,

successful data transmission per second. “Concurrency” is the number of concurrent requests that can be handled in one time. “Successful transaction” is the number of successful requests. “Failed transactions" is the number of failed requests. “Longest transaction” is the longest time of request and “Shortest transaction” is the shortest time of request.

Table 4. The test result of pressure test with Siege

Transactions: 1000 hits Availability: 100.00 % Elapsed time: 29.86 secs Data transferred: 0.60 MB Response time: 2.31 secs Transaction rate: 33.49 trans/sec Throughput: 0.02 MB/sec Concurrency: 77.30 Successful transactions: 1000 Failed transactions: 0 Longest transaction: 6.33 Shortest transaction: 0.02

In these parameters, the parameter concurrency shows its ability of parallel processing. To

objectively understand the abilities of different techniques to process parallel transactions, the parallel rata is introduced, which is always less than or equal to 1. The parallel rata is the result of the value of concurrency divided by the number of parallel access.

The result of multi-thread experiment is indicated in Figure 5, where the horizontal coordinate indicates the concurrent rate and the vertical coordinate presents the number of concurrent users. According to the picture, the concurrent rate of Oracle database and MongoDB can reach to around 90 percent but the concurrent rate of the File system NTFS cannot move beyond 10 percent. It is obvious that the performance of Oracle database and MongoDB is very excellent when they process concurrent tasks.

Figure 5. The result of the multiple thread test

5. Conclusions

After comparison between the NoSQL key-value database MongoDB and the relational database Oracle and the analysis of the distributed cache system Memcached, a NoSQL based cached storage of GIS Web service is proposed, as a mixture of Memcache and MongoDB. The experiment of a concise case of GIS Web service indicates the feasibility of our approach with a single thread test and a multiple thread test. Acknowledgement This work was supported by the grants of the National Natural Science Foundation of China (61070013, U1135005, 61170305, 61070009), the Research on the Mechanism of Distributed

A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang

275

Page 9: A NoSQL based Cached Storage Solution of GIS Web Service · 2015-07-29 · A NoSQL based Cached Storage Solution of GIS Web Service 1*Jin Liu, 1Yangping Wu, 1Dan Zhuo, 2Qiping Hu,

Data Storage & Processing for Real-time GIS; the Open Fund Project of Key Lab. of Shanghai Information Security Management and Technology Research, the knowledge innovation program of the Chinese academy of sciences under grant No.Y1W1031PB1, the Project for the National Basic Research 12th Five Program under Grant No.0101050302 and the Science and Technology Commission of Wuhan Municipality ‘‘Chenguang Jihua’’ (201050231058). 10. References

[1] Katherine H. Weimer, Miriam Olivares and Robin A. Bedenbaugh, “GIS Day and Web Promotion:

Retrospective Analysis of U.S. ARL Libraries' Involvement”, Journal of Map & Geography Libraries: Advances in Geospatial Information, Collections & Archives, vol.8, no.1, pp.39-57, 2012.

[2] Kun Yang, Rui Huang, Fen Lin, ,Zhongzhi Shi, “Exploring Heterogeneous Semantic Data for Topic-Specific Crawling and Search”, International Journal of Digital Content Technology and its Applications, vol. 5, no. 7, pp. 335-349, 2011

[3] Yansheng Zhang, “A Server Side Caching System for Efficient Web Map Services”, International Conference on Embedded Software and Systems Symposia (ICESS Symposia'08), pp.32-37, 2008.

[4] Serdar Yeşilmurat and Veysi İşler, “Retrospective Adaptive Prefetching for Interactive Web GIS Applications”, Geoinformatica, vol.16, no.3, pp.435-466, 2012

[5] Stefan Steiniger and Andrew J. S. Hunter, “Free and Open Source GIS Software for Building a Spatial Data Infrastructure”, Geospatial Free and Open Source Software in the 21st Century, Springer LNGC, Part 5, pp.247-261, 2012

[6] Puyam S. Singh, Dibyajyoti Chutia and Singuluri Sudhakar, “Development of a Web Based GIS Application for Spatial Natural Resources Information System Using Effective Open Source Software and Standards”, Journal of Geographic Information System, 4, pp.261-266, 2012

[7] Squid, Wikipedia, June 12, 2012, http://en.wikipedia.org/wiki/Squid [8] Duane Wessels, “Squid: The Definitive Guide”, O'Reilly, USA, 2004. [9] Alan Harris, “Distributed Caching via Memcached”, ASP.NET 4 CMS, pp.165-196, 2010. [10] L. Y. Cao and M. T. Özsu, “Evaluation of Strong Consistency Web Caching Techniques”, World

Wide Web, vol. 5, no.2, pp. 95-123, 2002. [11] S. Gilbert, “Perspectives on the CAP Theorem”, IEEE Computer, vol.45, no.2, pp. 30-36, 2012. [12] D. Abadi, “Consistency Tradeoffs in Modern Distributed Database System Design: CAP is Only

Part of the Story”, IEEE Computer, vol.45, no.2, pp.37-42, 2012. [13] NoSQL, WikiPedia, June 12, 2012, http://en.wikipedia.org/wiki/NoSQL [14] Toshihiko Yamakami, “Sub-culture-driven Mobile Internet Business Models”, International

Conference on Mobile Business, pp.76-81, 2008. [15] Paola Lecca, Lorenzo Dematté, Adaoha E.C. Ihekwaba and Corrado Priami, “Redi: A Simulator of

Stochastic Biochemical Reaction-Diffusion Systems”, International Conference on Advances in System Simulation, pp. 82-87, 2010.

[16] Chen Feng,Yongqiang Zou and Zhiwei Xu, “CCIndex for Cassandra: A Novel Scheme for Multi-dimensional Range Queries in Cassandra”, International Conference on Found in: Semantics, Knowledge and Grid, pp.130-136, 2011.

[17] Pouria Pirzadeh, Junichi Tatemura, Oliver Po and Hakan Hacıgümüş, “Performance Evaluation of Range Queries in Key Value Stores”, Journal of Grid Computing, vol.10, no.1, pp.109-132, 2012

[18] Christian Baun, Marcel Kunze, Tobias Kurze and Viktor Mauch, “Private Cloud-Infrastrukturen und Cloud-Plattformen”, Informatik-Spektrum, vol.34, no 3, pp. 242-254, 2011.

[19] C. von der Weth and A. Datta, “Multiterm Keyword Search in NoSQL Systems”, IEEE Internet Computing, vol.16, no.1, pp. 34-42, 2012.

[20] BSON, Wikipedia, June 15, 2012, http://en.wikipedia.org/wiki/BSON [21] Siege, Oct 15, 2011, JoeDog.org, http://www.joedog.org/siege/index.php [22] Practicability of Dataspace Systems Hamid Turab Mirza, Ling Chen, Gencai Chen , “Practicability

of Dataspace Systems”, International Journal of Digital Content Technology and its Applications, vol. 4, no. 3, pp. 233-243, 2010.

[23] Antonella Di Stefano, Francesca Gangemi and Corrado Santoro, "ERESYE: artificial intelligence in Erlang programs", ERLANG '05 Proceedings of the 2005 ACM SIGPLAN workshop on Erlang, pp. 62-71, 2005.

A NoSQL based Cached Storage Solution of GIS Web Service Jin Liu, Yangping Wu, Dan Zhuo, Qiping Hu, Yan Wang, Wensheng Zhang

276