
Research on Relational Database and NoSQL Based on XML Data

Jia Xinwei1,a , Shen Guicheng2,b

1Graduate Department, Beijing Wuzi University, Beijing 101149, China

2School of Information, Beijing Wuzi University, Beijing 101149, China

[email protected] , [email protected],

Keywords: NoSQL, Relational Database, XML, MongoDB, Time Complexity

Abstract. With the rapid development of electronic commerce, huge volumes of transaction data are produced every day. How to store these data effectively and mine knowledge from them has become an urgent problem. This paper first introduces the NoSQL database MongoDB and its advantages. We then compare the write and read performance of MongoDB and MySQL when storing XML documents. The experiments show that MongoDB performs better than MySQL, and that NoSQL is a good choice for querying massive data.

Introduction

With the rise of Web 2.0, traditional databases have struggled to meet the demands of ultra-large-scale, highly concurrent, purely dynamic Web 2.0 websites: high-performance concurrent reading and writing, efficient storage of and frequent access to huge amounts of data, and high scalability and availability. NoSQL databases came into being in response [1].

MongoDB is a document database that provides high performance, high availability, and easy scalability. It originated in a PaaS project at the company 10gen in 2007. The basic unit of data in MongoDB is the document, which is a set of key-value pairs; a collection is a set of documents. A document is similar to a row in a relational database, and a collection is similar to a table. Data in MongoDB have a flexible schema. Values in a document can be of many types, including embedded documents (a document stored as the value of a key in another document) and arrays. A MongoDB deployment can host a number of databases, and each database holds a set of collections. MongoDB stores documents on disk in the BSON (Binary JSON) serialization format, which is easy to query and update. Because joins are not central to the document model, data can be scaled horizontally. Indexes in MongoDB are similar to the indexes in other database systems. Every MongoDB collection has a default index on the _id key. In addition to this _id index, MongoDB supports user-defined indexes on a single field of a document as well as other index types such as compound indexes, multikey indexes, and geospatial indexes [2, 3].

XML (eXtensible Markup Language) is a markup language designed to describe arbitrary logical relationships among data. XML focuses on the definition of data structure: users can define markup elements flexibly and organize the logical schema of their data. Data in XML format can be exchanged between different information systems without changing data types, as long as the systems follow a predefined XML Schema. As a standard for data exchange, XML has been widely used in cross-platform environments, especially in electronic commerce, where huge amounts of data need to be exchanged between servers or between server and browser.

With the rapid development of electronic commerce, huge volumes of transaction data are produced every day. Choosing a suitable database in which to store these data for efficient management and data mining is an important research focus. In this article we take the storage of XML data in MongoDB, a representative NoSQL database, and in MySQL as an example, and compare the query and write performance of the two. The experiments lead to the conclusion that MongoDB performs better than MySQL.

Applied Mechanics and Materials Vols. 713-715 (2015) pp 2329-2334. (2015) Trans Tech Publications, Switzerland. Submitted: 07.11.2014. Accepted: 08.11.2014. doi:10.4028/www.scientific.net/AMM.713-715.2329

Data Structure

XML Data Structure. In this article we assume that a retailer has a number of branches, each of which produces many transaction records every day. Each branch sends its transaction records to the database for data mining. The transaction records in XML format are shown below.

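<Transactions>
  <Transaction TradeID="...">
    <TradeDetaile>
      <UserID>C11010000</UserID>
      <TradeAmount>2</TradeAmount>
      <TradeMoney>7441.97</TradeMoney>
      <TradeTime>2014-07-11 16:08:22</TradeTime>
    </TradeDetaile>
    <GoodsDetailes>
      <GoodsDetaile>
        <No>1</No>
        <ItemID>1100110111101</ItemID>
        <Type>Food</Type>
        <Price>6066.08</Price>
      </GoodsDetaile>
      <GoodsDetaile>
        <No>2</No>
        <ItemID>1100110111001</ItemID>
        <Type>Fruit</Type>
        <Price>1375.89</Price>
      </GoodsDetaile>
    </GoodsDetailes>
  </Transaction>
</Transactions>

Fig.1 XML Schema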

The Transactions element is the root node of the XML document. It contains Transaction elements, each of which represents one transaction record and is identified by a unique TradeID attribute. A Transaction element has two child nodes, a TradeDetaile element and a GoodsDetailes element. The TradeDetaile node gives an overview of the transaction record and has the child nodes UserID, TradeAmount, TradeMoney, and TradeTime. The GoodsDetailes element can contain several GoodsDetaile elements, which store the item-level details of the transaction record. Each GoodsDetaile has the child nodes No, ItemID, Type, and Price.
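The element structure described above can be read with a few lines of code. The following is a minimal sketch in Python using the standard-library xml.etree.ElementTree module; the experiments in this article were written in Java, so this is an illustration rather than the authors' program. The TradeID value "T0001" and the function name parse_transactions are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample shaped after the structure described above.
# The TradeID value "T0001" is a made-up placeholder.
SAMPLE = """
<Transactions>
  <Transaction TradeID="T0001">
    <TradeDetaile>
      <UserID>C11010000</UserID>
      <TradeAmount>2</TradeAmount>
      <TradeMoney>7441.97</TradeMoney>
      <TradeTime>2014-07-11 16:08:22</TradeTime>
    </TradeDetaile>
    <GoodsDetailes>
      <GoodsDetaile>
        <No>1</No>
        <ItemID>1100110111101</ItemID>
        <Type>Food</Type>
        <Price>6066.08</Price>
      </GoodsDetaile>
      <GoodsDetaile>
        <No>2</No>
        <ItemID>1100110111001</ItemID>
        <Type>Fruit</Type>
        <Price>1375.89</Price>
      </GoodsDetaile>
    </GoodsDetailes>
  </Transaction>
</Transactions>
"""


def parse_transactions(xml_text):
    """Return one (trade, goods) pair per Transaction element."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for tx in root.findall("Transaction"):
        # Overview fields: the TradeID attribute plus the TradeDetaile children.
        trade = {"TradeID": tx.get("TradeID")}
        for child in tx.find("TradeDetaile"):
            trade[child.tag] = child.text.strip()
        # Item-level fields: one dict per GoodsDetaile element.
        goods = [
            {field.tag: field.text.strip() for field in item}
            for item in tx.find("GoodsDetailes").findall("GoodsDetaile")
        ]
        records.append((trade, goods))
    return records


records = parse_transactions(SAMPLE)
```

All values are kept as strings here; converting TradeMoney and Price to numbers is left to the code that stores the records.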

MongoDB Data Structure. The XML data are mapped to MongoDB as follows: each transaction record is a Transaction document. The keys of a Transaction document are _id, which holds a unique, automatically generated and indexed ObjectId value, together with TradeID, UserID, TradeAmount, TradeMoney, TradeTime, and GoodsDetailes. The value of GoodsDetailes is an embedded document whose keys are No, ItemID, Type, and Price; GoodsDetailes can hold a number of GoodsDetaile entries. MongoDB collections have an index on the _id field by default, and we create a B+ index on the field of interest, TradeAmount. The data in MongoDB are shown in Fig.2.

Mechatronics Engineering and Modern Information Technologies in Industrial Engineering

Fig.2 Data in MongoDB
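As a rough illustration of the document layout, the sample transaction can be written as a nested Python dictionary. This is a sketch under assumptions: the TradeID value "T0001" is made up, the GoodsDetaile entries are represented as an array of embedded documents, and _id is omitted because MongoDB generates it on insert.

```python
from datetime import datetime

# One Transaction document. "T0001" is a made-up TradeID; representing
# GoodsDetailes as a list of embedded documents is an assumption.
transaction_doc = {
    "TradeID": "T0001",
    "UserID": "C11010000",
    "TradeAmount": 2,
    "TradeMoney": 7441.97,
    "TradeTime": datetime(2014, 7, 11, 16, 8, 22),
    "GoodsDetailes": [
        {"No": 1, "ItemID": "1100110111101", "Type": "Food", "Price": 6066.08},
        {"No": 2, "ItemID": "1100110111001", "Type": "Fruit", "Price": 1375.89},
    ],
}

# The item details travel with the transaction, so totalling them
# needs no second table and no join.
items_total = round(sum(g["Price"] for g in transaction_doc["GoodsDetailes"]), 2)
```

Because the goods are embedded, a single read of the document returns both the overview and the item details.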

MySQL Data Structure. The data structure in MySQL consists of a TradeDetaile table and a GoodsDetaile table. If we created only one table and put all transaction records in it, every GoodsDetaile row would repeat its TradeDetaile values, which would result in great redundancy. The fields of the TradeDetaile table are TradeID, UserID, TradeAmount, TradeMoney, and TradeTime. The fields of the GoodsDetaile table are TradeID, No, ItemID, Type, and Price. The TradeID field of the GoodsDetaile table is a foreign key that references the TradeDetaile table.

We create B+ indexes on all fields of the two tables in MySQL, although more indexes do not necessarily mean better performance; indexing every field merely suits the experiment in this article. An index is useful when a query returns a small portion of the data. If more than half of the data are returned, an index is not suitable. Frequent inserts, deletes, and updates also force index updates, which are costly.
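The two-table layout can be sketched as follows. This example uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, so the SQL dialect and index implementation differ from the experiment. The column types and index names are guesses, and "T0001" is a made-up TradeID.

```python
import sqlite3

# Field names come from the text; everything else here is an assumption.
SCHEMA = {
    "TradeDetaile": ["TradeID", "UserID", "TradeAmount", "TradeMoney", "TradeTime"],
    "GoodsDetaile": ["TradeID", "No", "ItemID", "Type", "Price"],
}

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE TradeDetaile (
    "TradeID"     TEXT PRIMARY KEY,
    "UserID"      TEXT,
    "TradeAmount" INTEGER,
    "TradeMoney"  REAL,
    "TradeTime"   TEXT
);
CREATE TABLE GoodsDetaile (
    "TradeID" TEXT REFERENCES TradeDetaile("TradeID"),
    "No"      INTEGER,
    "ItemID"  TEXT,
    "Type"    TEXT,
    "Price"   REAL
);
""")

# One index per field, mirroring the experimental setup described above.
for table, columns in SCHEMA.items():
    for column in columns:
        conn.execute(
            f'CREATE INDEX "idx_{table}_{column}" ON "{table}" ("{column}")'
        )

conn.execute(
    "INSERT INTO TradeDetaile VALUES (?, ?, ?, ?, ?)",
    ("T0001", "C11010000", 2, 7441.97, "2014-07-11 16:08:22"),
)
conn.executemany(
    "INSERT INTO GoodsDetaile VALUES (?, ?, ?, ?, ?)",
    [
        ("T0001", 1, "1100110111101", "Food", 6066.08),
        ("T0001", 2, "1100110111001", "Fruit", 1375.89),
    ],
)

# Reassembling a full transaction requires joining the two tables on TradeID.
rows = conn.execute(
    'SELECT t."TradeID", g."Type", g."Price" '
    "FROM TradeDetaile AS t, GoodsDetaile AS g "
    'WHERE t."TradeID" = g."TradeID" ORDER BY g."No"'
).fetchall()
```

The overview values are stored once in TradeDetaile rather than once per item, which is the redundancy the two-table design avoids.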

Compare MongoDB and MySQL

Experiment Environment. All the experiments in this section are run in the same hardware and software environment. We create mock data sets of different sizes, with all values generated randomly. A Java program with the same class structure stores each data set in MongoDB and in MySQL, and we compare the write performance of the two databases by the time they take. We compare query performance in the same way. All the programs in the experiment are written in Java. The version of MongoDB is 2.2.3 and the version of MySQL is 1.2.12.

The XML data are generated randomly in the experiment. The Transactions element is the root node of the XML document. Each Transaction node has one TradeDetaile node and one GoodsDetailes node. GoodsDetailes contains between 1 and 5 GoodsDetaile nodes, the number being chosen at random. The values in each GoodsDetaile lie in a predetermined range. The number of XML documents ranges from 1 thousand to 100 thousand.
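A generator for mock data of this kind might look like the sketch below, written in Python for illustration (the experiment itself used Java). The value ranges, the fixed timestamp, the list of types, and the function names make_transaction and make_document are all assumptions; the article does not state the ranges it used.

```python
import random
import xml.etree.ElementTree as ET

TYPES = ["Food", "Fruit"]  # the two types that appear in the sample record


def make_transaction(trade_id, rng):
    """Build one Transaction element with 1 to 5 random GoodsDetaile nodes."""
    items = [
        {
            "No": no,
            "ItemID": rng.randrange(10**12, 10**13),    # hypothetical ID range
            "Type": rng.choice(TYPES),
            "Price": round(rng.uniform(1.0, 10000.0), 2),  # hypothetical range
        }
        for no in range(1, rng.randint(1, 5) + 1)
    ]
    tx = ET.Element("Transaction", TradeID=str(trade_id))
    detail = ET.SubElement(tx, "TradeDetaile")
    overview = {
        "UserID": f"C{rng.randrange(10**7, 10**8)}",
        "TradeAmount": len(items),
        "TradeMoney": round(sum(item["Price"] for item in items), 2),
        "TradeTime": "2014-07-11 16:08:22",  # fixed timestamp for simplicity
    }
    for tag, value in overview.items():
        ET.SubElement(detail, tag).text = str(value)
    goods = ET.SubElement(tx, "GoodsDetailes")
    for item in items:
        node = ET.SubElement(goods, "GoodsDetaile")
        for tag, value in item.items():
            ET.SubElement(node, tag).text = str(value)
    return tx


def make_document(count, seed=0):
    """Build a Transactions root holding `count` random transactions."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    root = ET.Element("Transactions")
    for trade_id in range(1, count + 1):
        root.append(make_transaction(trade_id, rng))
    return root


root = make_document(1000)
```

Seeding the generator keeps the data identical across runs, which helps when the same data set has to be written to two databases.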

Writing Comparison. We store the XML documents in the two databases by the same process. First we parse the XML document, read the value of each node, and put the values in a list; then we take the values from the list and store them in the two databases. These steps are repeated until all the XML documents are stored in the database. MySQL is accessed through the JDBC driver. The write times for different amounts of data are shown in Fig.3.


Fig.3 Writing-time comparison

From Fig.3 we can conclude that the write performance of MongoDB is better than that of MySQL in this experiment. When the amount of data is less than 10 thousand, the write time of MongoDB stays at a steady level, whereas the write time of MySQL increases linearly with the amount of data. When the amount of data exceeds 10 thousand, the write time of MongoDB begins to increase linearly while the write time of MySQL increases dramatically. The times used to compare write performance are relative, because they include the time spent parsing the XML documents; since both databases incur the same parsing cost, this does not affect the result of the comparison.

Query Comparison. First, we compare single-table queries, in which all the required data are read from one table. The experiment queries the transaction records whose TradeMoney is greater than 14000 and returns the TradeDetaile information. Let the number of records be $n$. About half of the records have TradeMoney greater than 14000, and they are distributed uniformly in the database. As there is an index on the TradeAmount field in both databases, the amount of data to be scanned is $\frac{n}{2}$, and the time complexity is $O(n)$. The query in MySQL is select * from TradeDetaile where TradeMoney>14000. The read times for data volumes from 1 thousand to 100 thousand are shown in Fig.4.

Fig.4 Read-time in Single-table

As shown in Fig.4, the query performance of MongoDB is significantly better than that of MySQL. The read time of MongoDB stays within 200 ms (milliseconds) for data volumes from 1 thousand to 100 thousand. When the amount of data is less than 10 thousand, the read time of MySQL is about 500 ms; when it exceeds 10 thousand, the read time increases significantly with the amount of data.

We now compare join-table queries. The experiment queries the transaction records whose TradeMoney is greater than 14000 and returns both the TradeDetaile and the GoodsDetaile information, which requires a join between the TradeDetaile table and the GoodsDetaile table. Let the number of records be $n$. About half of the records have TradeMoney greater than 14000, and they are distributed uniformly in the database. The time of the projection operation is negligible. First MySQL scans the TradeDetaile table to get the tuples whose TradeMoney is greater than 14000. As there is an index on TradeMoney, the amount of data to scan in the TradeDetaile table is $\frac{n}{2}$. Then MySQL scans the GoodsDetaile table to get the tuples whose TradeID equals the TradeID of a selected TradeDetaile tuple. For each TradeID, MySQL scans $\frac{n}{2}$ tuples in the GoodsDetaile table, so the total number of tuples scanned there is $\frac{n}{2} \cdot \frac{n}{2} = \frac{n^2}{4}$. For this query, MySQL scans about $\frac{n}{2} + \frac{n^2}{4}$ tuples, and the time complexity is $O(n^2)$. The query in MySQL is select t.*, g.* from TradeDetaile As t, GoodsDetaile As g where t.TradeMoney>14000 and t.TradeID=g.TradeID. In MongoDB, by contrast, each transaction record is stored in a single document, much like a tuple in MySQL, so the amount of data to scan is $\frac{n}{2}$ and the time complexity is $O(n)$.
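The counting argument above can be checked with a small simulation. The sketch below, in Python, mirrors the cost model of the text rather than the behaviour of a real MySQL query optimizer. It assumes one GoodsDetaile row per transaction stored in TradeID order, lets every second transaction stand for the half with TradeMoney above 14000, and stops each GoodsDetaile scan at the first match. The function names are made up for the example.

```python
def join_scan_count(n):
    """Count tuples touched by the index scan plus the nested-loop join."""
    # One GoodsDetaile row per trade, stored in TradeID order (an assumption).
    goods = list(range(n))
    # Trades 0, 2, 4, ... stand for the half with TradeMoney > 14000.
    matching = range(0, n, 2)
    touched = len(matching)           # the index delivers the matching trades
    for trade_id in matching:
        for row_trade_id in goods:    # sequential scan of GoodsDetaile
            touched += 1
            if row_trade_id == trade_id:
                break                 # stop at the single matching row
    return touched


def document_scan_count(n):
    """Count documents read when goods are embedded in each transaction."""
    # Only the matching documents are read; their goods come along with them.
    return len(range(0, n, 2))


join_100 = join_scan_count(100)       # 100/2 + 100**2/4
join_200 = join_scan_count(200)       # 200/2 + 200**2/4
docs_100 = document_scan_count(100)   # 100/2
docs_200 = document_scan_count(200)   # 200/2
```

Doubling n roughly quadruples the join count but only doubles the document count, which is the gap between $O(n^2)$ and $O(n)$.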

Fig.5 shows the read times for data volumes from 1 thousand to 100 thousand. The read time of MongoDB stays at about 200 ms over the whole range. When the amount of data is less than 10 thousand, the read time of MySQL is low, but above 10 thousand it increases dramatically.

Fig.5 Read-time in Join-tables

When more than half of the data must be returned, an index loses its advantage, and a full table scan is more efficient than an index lookup. If we need to return all the Transactions information, MySQL performs a join-table query whose time complexity is $O(n^2)$, while MongoDB scans the documents sequentially with a time complexity of $O(n)$. When the amount of data is very large, the read performance of MongoDB is significantly better than that of MySQL.

With index optimization only, a query for a single random record scans on average $\log_2 n$ tuples in each table, or $2\log_2 n$ tuples in total for the two MySQL tables, while the number is only $\log_2 n$ in MongoDB.
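The logarithmic lookup cost can be illustrated by counting the keys a binary search inspects. The Python sketch below is a simplification: it models an index as a sorted list searched by bisection, not as a real B+ tree, and the function name probes is made up for the example.

```python
def probes(sorted_keys, target):
    """Return how many keys a binary search inspects before it stops."""
    lo, hi, count = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        count += 1
        if sorted_keys[mid] == target:
            return count
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return count


n = 1023                      # 2**10 - 1 keys form a perfectly balanced search
keys = list(range(n))
one_lookup = max(probes(keys, k) for k in keys)   # worst case for one index
two_lookups = 2 * one_lookup                      # one lookup in each of two tables
```

With 1023 keys, a lookup never inspects more than 10 keys, so touching two tables costs at most 20 inspections while a single collection costs at most 10.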


Conclusions

Web systems with large amounts of data should avoid join-table queries on large tables. Querying MongoDB is similar to a single-table query in a relational database: it turns a complex join-table query into a single-table query, providing high read performance on massive amounts of data. In addition, MongoDB presents the data as documents without joining tables, so users can observe and analyze the data more intuitively.

The experimental results above show that MongoDB performs better than MySQL, but NoSQL databases are not meant to replace relational databases. Document-oriented NoSQL databases mainly address the need for high query performance over massive data stores. Almost all the features of MongoDB can be found in relational databases, which provide a powerful feature set; but users who deal with large amounts of non-relational data may not need a complex relational database. In this case, a NoSQL database is a good choice.

Acknowledgment

This work has been supported by Beijing Key Laboratory (No: BZ0211) and Key Scientific

Research Project of Beijing WuZi University.

References

[1] Wenlong Wang, Essential MongoDB: Management and Development (China Machine Press, 2011, 1st edn).

[2] Kristina Chodorow, Michael Dirolf, MongoDB: The Definitive Guide (Posts and Telecom Press, 2011, 1st edn).

[3] Kyle Banker, MongoDB in Action (Posts and Telecom Press, 2012, 1st edn).

[4] Shan Wang, Shixuan Sa, Introduction to Database Systems (Higher Education Press, 2006, 4th edn).

[5] Ben Forta, MySQL Crash Course (Posts and Telecom Press, 2009, 1st edn).

[6] He Shengtao, MongoDB in network behavior analysis and control system, Network Security, 2013, 5.


