Research on Relational Database and NoSQL Based on XML Data
Jia Xinwei1,a , Shen Guicheng2,b
1Graduate Department, Beijing Wuzi University, Beijing101149, China
2School of Information, Beijing Wuzi University, Beijing101149, China
[email protected] , [email protected],
Keywords: NO_SQL, Relational Database, XML, MongoDB, Time Complexity
Abstract. With the rapid development of electronic commerce, a huge volume of transaction data is
produced every day. How to store these data effectively and mine knowledge from them has become
an urgent problem. This paper first introduces the NoSQL database MongoDB and its advantages. We
then compare the write and read performance of MongoDB and MySQL when storing XML documents.
Our experiments show that MongoDB performs better than MySQL, and we conclude that NoSQL is a
good choice for querying massive data.
Introduction
With the rise of Web 2.0 on the Internet, traditional databases look increasingly out of their
depth. They struggle to provide high-performance concurrent reading and writing, efficient storage
of and frequent access to huge amounts of data, and the high scalability and high availability
demanded by ultra-large-scale, highly concurrent, purely dynamic Web 2.0 websites. NoSQL databases
came into being in response to these needs [1].
MongoDB is a document database that provides high performance, high availability, and easy
scalability. It originated in a PaaS project at the company 10gen in 2007. The basic unit of data
in MongoDB is the document, which is a set of key-value pairs; a collection is a set of documents.
A document is similar to a row in a relational database, and a collection is similar to a table.
Data in MongoDB has a flexible schema. Values in a document can have many types, including
embedded documents (a document stored as the value of a key in another document) and arrays. A
MongoDB deployment can host a number of databases, and each database holds a set of collections.
MongoDB stores documents on disk in the BSON (Binary JSON) serialization format, which is easy to
query and update. Because joins play little role in the document model, the data is well suited to
horizontal scaling. Indexes in MongoDB are similar to indexes in other database systems. Every
MongoDB collection has a default index on the _id key. In addition to this MongoDB-defined _id
index, MongoDB supports user-defined indexes on a single field of a document as well as other
index types such as compound indexes, multikey indexes, and geospatial indexes [2, 3].
XML (eXtensible Markup Language) is a markup language designed to describe arbitrary logical
relationships in data. XML focuses on the definition of data structure. Users can define markup
elements flexibly and organize the logical schema of their data. Data in XML format can be
exchanged between different information systems without changing data types, as long as the
systems follow a predefined XML Schema. As a standard for data exchange, XML is widely used in
cross-platform environments, especially in electronic commerce, where huge amounts of data need to
be exchanged between servers or between server and browser.
With the rapid development of electronic commerce, a huge volume of transaction data is produced
every day. Choosing a suitable database to store these data for efficient management and data
mining is an important research focus. In this article we take MongoDB, a representative NoSQL
database, and MySQL as examples, store XML data in both, and compare their query and write
performance. Our experiments lead to the conclusion that MongoDB performs better than MySQL.
Applied Mechanics and Materials Vols. 713-715 (2015) pp 2329-2334. Submitted: 07.11.2014. Accepted: 08.11.2014. (2015) Trans Tech Publications, Switzerland. doi:10.4028/www.scientific.net/AMM.713-715.2329
Data Structure
XML Data Structure. In this article we assume that a retailer has a number of branches, each of
which produces many transaction records every day. Each branch sends its transaction records to
the database for data mining. Transaction records in XML format look as follows.
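    <Transactions>
      <Transaction TradeID="...">
        <TradeDetaile>
          <UserID>C11010000</UserID>
          <TradeAmount>2</TradeAmount>
          <TradeMoney>7441.97</TradeMoney>
          <TradeTime>2014-07-11 16:08:22</TradeTime>
        </TradeDetaile>
        <GoodsDetailes>
          <GoodsDetaile>
            <No>1</No>
            <ItemID>1100110111101</ItemID>
            <Type>Food</Type>
            <Price>6066.08</Price>
          </GoodsDetaile>
          <GoodsDetaile>
            <No>2</No>
            <ItemID>1100110111001</ItemID>
            <Type>Fruit</Type>
            <Price>1375.89</Price>
          </GoodsDetaile>
        </GoodsDetailes>
      </Transaction>
    </Transactions>
Fig.1 XML Schema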
The Transactions element is the root node of the XML document. It contains Transaction elements,
each of which represents one transaction record and is identified by a unique TradeID attribute. A
Transaction element has two child nodes: a TradeDetaile element and a GoodsDetailes element. The
TradeDetaile node is the overview of the transaction record and has the child nodes UserID,
TradeAmount, TradeMoney, and TradeTime. The GoodsDetailes element contains one or more
GoodsDetaile elements, which store the item-level details of the transaction. Each GoodsDetaile
has the child nodes No, ItemID, Type, and Price.
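As an illustration of the structure just described, the following minimal sketch reads one record with Python's standard xml.etree.ElementTree module. It is not the parser used in the experiment (which was written in Java); the TradeID value "T0001" and the function name parse_transactions are hypothetical, and the element names follow the description above.

```python
import xml.etree.ElementTree as ET

# One record shaped like Fig.1. The TradeID value "T0001" is hypothetical.
SAMPLE_XML = """
<Transactions>
  <Transaction TradeID="T0001">
    <TradeDetaile>
      <UserID>C11010000</UserID>
      <TradeAmount>2</TradeAmount>
      <TradeMoney>7441.97</TradeMoney>
      <TradeTime>2014-07-11 16:08:22</TradeTime>
    </TradeDetaile>
    <GoodsDetailes>
      <GoodsDetaile>
        <No>1</No>
        <ItemID>1100110111101</ItemID>
        <Type>Food</Type>
        <Price>6066.08</Price>
      </GoodsDetaile>
      <GoodsDetaile>
        <No>2</No>
        <ItemID>1100110111001</ItemID>
        <Type>Fruit</Type>
        <Price>1375.89</Price>
      </GoodsDetaile>
    </GoodsDetailes>
  </Transaction>
</Transactions>
"""


def parse_transactions(xml_text):
    """Return one dict per Transaction element, with its goods as a nested list."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for tx in root.findall("Transaction"):
        detail = tx.find("TradeDetaile")
        goods = [
            {
                "No": int(g.findtext("No")),
                "ItemID": g.findtext("ItemID"),
                "Type": g.findtext("Type"),
                "Price": float(g.findtext("Price")),
            }
            for g in tx.find("GoodsDetailes").findall("GoodsDetaile")
        ]
        records.append({
            "TradeID": tx.get("TradeID"),  # attribute on the Transaction element
            "UserID": detail.findtext("UserID"),
            "TradeAmount": int(detail.findtext("TradeAmount")),
            "TradeMoney": float(detail.findtext("TradeMoney")),
            "TradeTime": detail.findtext("TradeTime"),
            "GoodsDetailes": goods,
        })
    return records


records = parse_transactions(SAMPLE_XML)
```

In the sample values, the two item prices add up to the TradeMoney of the record, which is a convenient sanity check on a parsed record.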
MongoDB Data Structure. The XML data is mapped to MongoDB as follows: each transaction record is a
Transaction document. The keys of a Transaction document are _id (a unique, automatically
generated and indexed ObjectId value), TradeID, UserID, TradeAmount, TradeMoney, TradeTime, and
GoodsDetailes. The value of GoodsDetailes is an embedded document whose keys are No, ItemID, Type,
and Price; a GoodsDetailes value can hold a number of GoodsDetaile entries. MongoDB collections
have an index on the _id field by default, and we additionally create a B+ tree index on the field
of interest, TradeAmount. The data in MongoDB is shown in Fig.2.
2330 Mechatronics Engineering and Modern Information Technologies in Industrial Engineering
Fig.2 Data in MongoDB
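Since the figure itself is not reproduced here, the following sketch suggests what one Transaction document might look like, written as a plain Python dictionary. It assumes that GoodsDetailes is stored as an array of embedded documents; the TradeID value is hypothetical, and the _id key is omitted because MongoDB generates it on insertion.

```python
import json

# Hypothetical Transaction document. _id is omitted (generated by MongoDB on insert).
# Assumption: GoodsDetailes is an array of embedded GoodsDetaile documents.
transaction_doc = {
    "TradeID": "T0001",  # hypothetical value
    "UserID": "C11010000",
    "TradeAmount": 2,
    "TradeMoney": 7441.97,
    "TradeTime": "2014-07-11 16:08:22",
    "GoodsDetailes": [
        {"No": 1, "ItemID": "1100110111101", "Type": "Food", "Price": 6066.08},
        {"No": 2, "ItemID": "1100110111001", "Type": "Fruit", "Price": 1375.89},
    ],
}

# The item details travel with the transaction, so no second lookup is needed.
item_types = [g["Type"] for g in transaction_doc["GoodsDetailes"]]

# JSON text is used here only to display the nesting; MongoDB itself stores BSON.
as_json = json.dumps(transaction_doc, indent=2)
```

The point of the nesting is that the overview fields and the item details are read together as one unit.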
MySQL Data Structure. The data structure in MySQL consists of a TradeDetaile table and a
GoodsDetaile table. If we created only one table and put all transaction records in it, every
GoodsDetaile row would repeat its TradeDetaile data, resulting in great redundancy. The fields of
the TradeDetaile table are TradeID, UserID, TradeAmount, TradeMoney, and TradeTime. The fields of
the GoodsDetaile table are TradeID, No, ItemID, Type, and Price. The TradeID field of the
GoodsDetaile table is a foreign key referencing the TradeDetaile table.
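The two-table layout can be sketched as follows. To keep the example runnable without a MySQL server, it uses Python's built-in sqlite3 module as a stand-in; the column types, the composite primary key, and the TradeID value "T0001" are assumptions, not the schema used in the experiment.

```python
import sqlite3

# sqlite3 is only a stand-in for MySQL so that the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE TradeDetaile (
    TradeID     TEXT PRIMARY KEY,
    UserID      TEXT NOT NULL,
    TradeAmount INTEGER NOT NULL,
    TradeMoney  REAL NOT NULL,
    TradeTime   TEXT NOT NULL
);
CREATE TABLE GoodsDetaile (
    TradeID TEXT NOT NULL REFERENCES TradeDetaile(TradeID),  -- foreign key
    "No"    INTEGER NOT NULL,
    ItemID  TEXT NOT NULL,
    Type    TEXT NOT NULL,
    Price   REAL NOT NULL,
    PRIMARY KEY (TradeID, "No")
);
''')

# One overview row, plus one row per purchased item.
conn.execute(
    "INSERT INTO TradeDetaile VALUES (?, ?, ?, ?, ?)",
    ("T0001", "C11010000", 2, 7441.97, "2014-07-11 16:08:22"),
)
conn.executemany(
    "INSERT INTO GoodsDetaile VALUES (?, ?, ?, ?, ?)",
    [
        ("T0001", 1, "1100110111101", "Food", 6066.08),
        ("T0001", 2, "1100110111001", "Fruit", 1375.89),
    ],
)

# Reassembling a full transaction requires joining the two tables on TradeID.
rows = conn.execute('''
    SELECT t.TradeID, g."No", g.Type, g.Price
    FROM TradeDetaile AS t
    JOIN GoodsDetaile AS g ON g.TradeID = t.TradeID
    ORDER BY g."No"
''').fetchall()
```

Splitting the data this way stores the overview fields once per transaction, at the price of a join whenever the item details are needed.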
We create B+ tree indexes on all fields of the two tables in MySQL. More indexes do not
necessarily mean better performance; we index every field only to suit the experiment in this
article. An index is useful when a query returns a small portion of the data. If more than half of
the total data is returned, an index is not suitable. Frequent inserts, deletes, and updates also
force index updates, which are costly.
Compare MongoDB and MySQL
Experiment Environment. All the experiments in this section were run in the same hardware and
software environment. We create mock data sets of different sizes in which all values are
generated randomly. We store each data set in MongoDB and in MySQL with a Java program that uses
the same class structure for both, and we compare the write performance of the two databases by
the time they take. We compare query performance in the same way. All the programs in the
experiment are written in Java. The version of MongoDB is 2.2.3 and the version of MySQL is
1.2.12.
The XML data in the experiment is generated randomly. The Transactions element is the root node of
the XML document. Each Transaction node has one TradeDetaile node and one GoodsDetailes node. A
GoodsDetailes node has between 1 and 5 GoodsDetaile nodes, the number being chosen at random. The
values in a GoodsDetaile lie in a predetermined range. The number of XML documents ranges from 1
thousand to 100 thousand.
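A mock data generator along these lines could look like the sketch below. The value ranges, the list of item types, the identifier formats, and the function names are all hypothetical, since the article does not specify them; only the rule of 1 to 5 GoodsDetaile entries per transaction comes from the text. The sketch also assumes that TradeAmount counts the items and that TradeMoney is the sum of their prices.

```python
import random

ITEM_TYPES = ["Food", "Fruit", "Clothing", "Toys"]  # hypothetical categories


def make_transaction(trade_no, rng):
    """Build one random transaction with 1 to 5 goods entries."""
    n_goods = rng.randint(1, 5)  # inclusive on both ends
    goods = [
        {
            "No": i + 1,
            "ItemID": str(rng.randint(10**12, 10**13 - 1)),  # 13-digit id, assumed format
            "Type": rng.choice(ITEM_TYPES),
            "Price": round(rng.uniform(1.0, 9000.0), 2),  # assumed price range
        }
        for i in range(n_goods)
    ]
    return {
        "TradeID": f"T{trade_no:07d}",  # assumed id format
        "UserID": f"C{rng.randint(0, 99999999):08d}",
        "TradeAmount": n_goods,  # assumption: number of items
        "TradeMoney": round(sum(g["Price"] for g in goods), 2),
        "TradeTime": "2014-07-11 16:08:22",  # fixed timestamp for simplicity
        "GoodsDetailes": goods,
    }


def make_dataset(n, seed=0):
    """Return n random transactions; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [make_transaction(i, rng) for i in range(n)]


dataset = make_dataset(1000)
```

Seeding the generator lets both databases be loaded with exactly the same records, which keeps the comparison fair.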
Writing Comparison. We store the XML documents in the two databases by the same process. First we
parse the XML document and get the value of each node, and then put the values in a list; finally
we read the values from the list and store them in the two databases. These steps are repeated
until all XML documents are stored in the database. The driver used to access MySQL is JDBC. The
write times for the different amounts of data are shown in Fig.3.
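The measurement loop can be sketched as a small timing harness. This is an assumption about how such a comparison might be structured, not the Java code used in the experiment; write_one stands for whatever function stores a single record in a given database, and a plain list is used below as a placeholder store.

```python
import time


def time_writes(write_one, records):
    """Return the elapsed seconds for writing every record through write_one."""
    start = time.perf_counter()
    for record in records:
        write_one(record)
    return time.perf_counter() - start


# Placeholder store: appending to a list stands in for a database insert.
sink = []
records = [{"TradeID": f"T{i}"} for i in range(1000)]
elapsed = time_writes(sink.append, records)
```

Passing the same records to two different write_one functions, one per database, yields directly comparable timings.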
Fig.3 Writing-time comparison
From Fig.3 we can conclude that the write performance of MongoDB is better than that of MySQL in
this experiment. When the amount of data is less than 10 thousand, the write time of MongoDB stays
at a steady level, whereas the write time of MySQL increases linearly with the amount of data.
When the amount of data is more than 10 thousand, the write time of MongoDB begins to increase
linearly, while the write time of MySQL increases dramatically. In this experiment, the time used
to compare write performance is a relative time, because it includes the time to parse the XML
documents, but this has no influence on the result of the comparison.
Query Comparison. First, we compare a single-table query, in which all the data to be read is in
one table. The experiment queries the transaction records whose TradeMoney is greater than 14000
and returns the TradeDetaile information. Let the number of records be $n$. About half of the
records have TradeMoney greater than 14000, and these records are distributed uniformly in the
database. As there is an index on the TradeAmount field in both databases, the amount of data that
needs to be scanned is $n/2$, and the time complexity is $O(n)$. The query in MySQL is
select * from TradeDetaile where TradeMoney>14000. We measure the read time for data sizes from
1 thousand to 100 thousand, as shown in Fig.4.
Fig.4 Read-time in Single-table
As shown in Fig.4, the query performance of MongoDB is significantly better than that of MySQL.
The read time of MongoDB stays within 200 ms (milliseconds) for data sizes from 1 thousand to
100 thousand. When the amount of data is less than 10 thousand, the read time of MySQL is about
500 ms; when the amount of data exceeds 10 thousand, the read time increases significantly with
the amount of data.
We now compare a join query. The experiment queries the transaction records whose TradeMoney is
greater than 14000 and returns both the TradeDetaile and the GoodsDetaile information, which
requires a join of the TradeDetaile table and the GoodsDetaile table. Let the number of records be
$n$. About half of the records have TradeMoney greater than 14000, and these records are
distributed uniformly in the database. The cost of the projection operation is negligible. First,
MySQL scans the TradeDetaile table to get the tuples whose TradeMoney is greater than 14000. As
there is an index on TradeMoney, the amount of data to scan in the TradeDetaile table is $n/2$.
Then MySQL scans the GoodsDetaile table to get the tuples whose TradeID equals the TradeID of a
selected TradeDetaile tuple. For each TradeID, MySQL scans $n/2$ tuples in the GoodsDetaile table,
so the total number of tuples scanned there is $n^2/4$. For this query, MySQL scans about
$(n^2 + 2n)/4$ tuples in total, and the time complexity is $O(n^2)$. The query in MySQL is
select t.*, g.* from TradeDetaile As t, GoodsDetaile As g where t.TradeMoney>14000 and t.TradeID=g.TradeID.
In MongoDB, by contrast, each transaction record is stored in a single document, like one tuple in
MySQL, so the amount of data to scan is $n/2$ and the time complexity is $O(n)$.
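The difference in growth rate can be illustrated by counting comparisons in a toy model. The sketch below is a deliberately naive, unindexed nested-loop join written for illustration; it is not how MySQL actually executes the query, and its constants differ from the estimate above. The function names and the synthetic data (one goods row per trade, alternating TradeMoney values) are hypothetical.

```python
def nested_loop_join(trades, goods, threshold):
    """Naive join: for every qualifying trade, scan all goods rows."""
    comparisons = 0
    result = []
    for t in trades:
        comparisons += 1  # test the TradeMoney predicate
        if t["TradeMoney"] > threshold:
            for g in goods:
                comparisons += 1  # test the TradeID equality
                if g["TradeID"] == t["TradeID"]:
                    result.append((t, g))
    return result, comparisons


def embedded_scan(docs, threshold):
    """Document model: goods are nested, so one pass over the documents suffices."""
    comparisons = 0
    result = []
    for d in docs:
        comparisons += 1  # test the TradeMoney predicate
        if d["TradeMoney"] > threshold:
            result.append(d)
    return result, comparisons


n = 100
# Half of the trades exceed the threshold of 14000, as assumed in the analysis.
trades = [{"TradeID": i, "TradeMoney": 20000 if i % 2 else 10000} for i in range(n)]
goods = [{"TradeID": i, "Type": "Food"} for i in range(n)]
docs = [dict(t, GoodsDetailes=[g]) for t, g in zip(trades, goods)]

joined, join_cost = nested_loop_join(trades, goods, 14000)
found, scan_cost = embedded_scan(docs, 14000)
```

In this toy model the join performs n + (n/2)·n comparisons while the document scan performs n, so doubling n roughly quadruples the first count and only doubles the second.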
Fig.5 shows the read time for data sizes from 1 thousand to 100 thousand. The read time of MongoDB
stays at about 200 ms over the whole range. When the amount of data is less than 10 thousand, the
read time of MySQL is at a low level, but beyond 10 thousand it increases dramatically.
Fig.5 Read-time in Join-tables
When more than half of the data must be returned, an index loses its advantage, and a full table
scan is more efficient than an index lookup. If we need to return all the Transactions
information, MySQL performs a join query whose time complexity is $O(n^2)$, while MongoDB scans
the documents sequentially with time complexity $O(n)$. When the amount of data is very large, the
read performance of MongoDB is significantly better than that of MySQL.
With index optimization only, a query for one randomly chosen record scans on average $\log_2 n$
tuples in each table, for a total of $2\log_2 n$ in MySQL, while the number is only $\log_2 n$ in
MongoDB.
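To put numbers on this estimate, the short calculation below evaluates it for the largest data size in the experiment. It assumes the logarithmic lookup model stated above and uses n = 100,000; the variable names are hypothetical.

```python
import math

n = 100_000  # largest data size used in the experiment

# Average tuples examined per indexed lookup, under the log2(n) model above.
per_table = math.log2(n)

# MySQL touches two tables (TradeDetaile and GoodsDetaile); MongoDB one collection.
mysql_lookups = 2 * per_table
mongodb_lookups = per_table
```

For this n the model gives between 16 and 17 tuples per table, so the indexed point lookup differs by only a constant factor of two, unlike the join on large result sets.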
Conclusions
Any Web system with a large amount of data should avoid join queries on large tables. A query in
MongoDB is similar to a single-table query in a relational database: it turns a complex join query
into a single-table query, providing high read performance on massive amounts of data. In
addition, MongoDB presents the data in document form without joining tables, so users can observe
and analyze the data more intuitively.
The experimental results above show that MongoDB performs better than MySQL, but NoSQL databases
are not meant to replace relational databases. Document-oriented NoSQL databases mainly address
query performance on massive stored data. Almost all the features of MongoDB can be found in
relational databases, which provide a powerful feature set; but when users deal with large amounts
of non-relational data, they may not need a complex relational database. In this case, a NoSQL
database is a good choice.
Acknowledgment
This work has been supported by Beijing Key Laboratory (No: BZ0211) and Key Scientific
Research Project of Beijing WuZi University.
References
[1] Wenlong Wang, Essential MongoDB: Management and Development (China Machine Press, 2011, 1st edn).
[2] Kristina Chodorow, Michael Dirolf, MongoDB: The Definitive Guide (Posts and Telecom Press, 2011, 1st edn).
[3] Kyle Banker, MongoDB in Action (Posts and Telecom Press, 2012, 1st edn).
[4] Shan Wang, Shixuan Sa, Introduction to Database Systems (Higher Education Press, 2006, 4th edn).
[5] Ben Forta, MySQL Crash Course (Posts and Telecom Press, 2009, 1st edn).
[6] He Shengtao, MongoDB in network behavior analysis and control system, Network Security, 2013, 5.