
    Research on Relational Database and NoSQL Based on XML Data

    Jia Xinwei 1,a, Shen Guicheng 2,b

    1 Graduate Department, Beijing Wuzi University, Beijing 101149, China

    2 School of Information, Beijing Wuzi University, Beijing 101149, China


    Keywords: NoSQL, Relational Database, XML, MongoDB, Time Complexity

    Abstract. With the rapid development of electronic commerce, huge volumes of transaction data are produced every day. How to store these data effectively and mine knowledge from them has become an urgent key problem. This paper first introduces the NoSQL database MongoDB and its advantages. We then compare the write and read performance of MongoDB and MySQL when storing XML documents. Experiments show that MongoDB outperforms MySQL, and that NoSQL is a good choice for querying massive data.

    Introduction

    With the rise of Web 2.0, traditional relational databases have become increasingly inadequate for ultra-large-scale, highly concurrent, purely dynamic Web 2.0 websites. Such sites require high-performance concurrent reads and writes, efficient storage of and frequent access to huge amounts of data, and high scalability and availability. NoSQL databases emerged in response [1].

    MongoDB is a document database that provides high performance, high availability, and easy scalability. It originated in 2007 in a PaaS project at the company 10gen. The basic unit of data in MongoDB is the document: a document is a set of key-value pairs, and a collection is a set of documents. A document is analogous to a row in a relational database, and a collection to a table. Data in MongoDB has a flexible schema: documents in the same collection need not share the same fields, and field values can take many types, including embedded documents (a document stored as the value of a key in another document) and arrays. A MongoDB deployment can host a number of databases, and each database holds a set of collections.
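The deployment, database, collection, and document hierarchy described above can be pictured with plain Python dictionaries and lists. This is a minimal sketch for illustration only, not the MongoDB driver API: the database name `shop`, the collection name `transactions`, the fields `Address` and `Tags`, and the helpers `get_path` and `find` are all hypothetical. It shows an embedded document, an array value, two documents with different fields in one collection, and a lookup that follows a dotted path into an embedded document, similar in spirit to MongoDB's dot notation.

```python
# Hypothetical in-memory picture of MongoDB's data model:
# deployment -> databases -> collections -> documents (key-value pairs).
# This is NOT the MongoDB driver API; it only mirrors the concepts.

deployment = {
    "shop": {                              # a database
        "transactions": [                  # a collection: a list of documents
            {
                "_id": 1,
                "UserID": "C11010000",
                "Address": {"City": "Beijing"},   # embedded document
                "Tags": ["food", "fruit"],        # array value
            },
            # Flexible schema: this document has different fields.
            {"_id": 2, "UserID": "C11010001", "Tags": []},
        ]
    }
}


def get_path(doc, dotted):
    """Follow a dotted key path such as 'Address.City' into embedded documents."""
    value = doc
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # path does not exist in this document
        value = value[key]
    return value


def find(collection, dotted, expected):
    """Return the documents whose value at the dotted path equals `expected`."""
    return [d for d in collection if get_path(d, dotted) == expected]


coll = deployment["shop"]["transactions"]
print([d["_id"] for d in find(coll, "Address.City", "Beijing")])  # [1]
```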

    MongoDB stores documents on disk in BSON (Binary JSON), a binary serialization format that is easy to query and update. Because joins are rarely needed, document data scales horizontally. Indexes in MongoDB are similar to indexes in other database systems. Every MongoDB collection has an index on the _id field by default. Beyond this _id index, MongoDB supports user-defined indexes on a single field of a document, as well as compound indexes, multikey indexes, and even geospatial indexes [2, 3].
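To make the idea of a multikey index concrete, the sketch below builds a toy single-field index that maps each value to the `_id`s of the documents containing it; when the field holds an array, every distinct element gets its own entry. This is a hypothetical in-memory model (the `Types` field and the `build_index` helper are made up for illustration); real MongoDB indexes are B-tree structures maintained by the server.

```python
from collections import defaultdict


def build_index(collection, field):
    """Toy index on one field: value -> list of _id values.

    If the field holds an array, each distinct element is indexed
    separately, which is the idea behind a multikey index.
    """
    index = defaultdict(list)
    for doc in collection:
        if field not in doc:
            continue  # this toy version simply skips documents without the field
        value = doc[field]
        keys = value if isinstance(value, list) else [value]
        for key in dict.fromkeys(keys):  # distinct elements, original order
            index[key].append(doc["_id"])
    return dict(index)


docs = [
    {"_id": 1, "Types": ["Food", "Fruit"]},
    {"_id": 2, "Types": ["Fruit", "Fruit"]},
    {"_id": 3, "Types": "Food"},
]

id_index = build_index(docs, "_id")      # like the default _id index
type_index = build_index(docs, "Types")  # multikey: one entry per array element
print(type_index["Fruit"])  # [1, 2]
print(type_index["Food"])   # [1, 3]
```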

    XML (eXtensible Markup Language) was designed to describe arbitrary logical relationships among data, and it focuses on defining data structure. Users can define markup elements flexibly and organize the logical schema of their data. As long as the systems involved follow a predefined XML Schema, data in XML format can be exchanged between different information systems without converting data types. As a standard for data exchange, XML is widely used in cross-platform environments, especially in electronic commerce, where huge amounts of data must be exchanged between servers or between a server and a browser.

    With the rapid development of electronic commerce, huge volumes of transaction data are produced every day, and choosing a suitable database to store these data for efficient management and data mining is an important research focus. In this article we take the storage of XML data in MongoDB, a representative NoSQL database, and in MySQL as an example, and compare the query and write performance of the two. Our experiments show that MongoDB outperforms MySQL.

    Applied Mechanics and Materials Vols. 713-715 (2015) pp 2329-2334. Submitted: 07.11.2014. Accepted: 08.11.2014. © (2015) Trans Tech Publications, Switzerland. doi:10.4028/www.scientific.net/AMM.713-715.2329

    Data Structure

    XML Data Structure. In this article we assume that a retailer has a number of branches, each of which produces many transaction records every day. Each branch transfers its transaction records to the database for data mining. A transaction record in XML format is shown below.

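    <Transactions>
      <Transaction TradeID="…">
        <TradeDetaile>
          <UserID>C11010000</UserID>
          <TradeAmount>2</TradeAmount>
          <TradeMoney>7441.97</TradeMoney>
          <TradeTime>2014-07-11 16:08:22</TradeTime>
        </TradeDetaile>
        <GoodsDetailes>
          <GoodsDetaile>
            <No>1</No>
            <ItemID>1100110111101</ItemID>
            <Type>Food</Type>
            <Price>6066.08</Price>
          </GoodsDetaile>
          <GoodsDetaile>
            <No>2</No>
            <ItemID>1100110111001</ItemID>
            <Type>Fruit</Type>
            <Price>1375.89</Price>
          </GoodsDetaile>
        </GoodsDetailes>
      </Transaction>
    </Transactions>

    Fig.1 XML Schema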

    The Transactions element is the root node of the XML document. It may contain several Transaction elements, each representing one transaction record and identified by a unique TradeID attribute. A Transaction element has two child nodes: a TradeDetaile element and a GoodsDetailes element. The TradeDetaile node summarizes the transaction record and has the child nodes UserID, TradeAmount, TradeMoney, and TradeTime. The GoodsDetailes element can contain several GoodsDetaile elements, which store the details of the items in the transaction. Each GoodsDetaile has the child nodes No, ItemID, Type, and Price.
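The element layout just described can be read with Python's standard `xml.etree.ElementTree` module. The sketch below parses a record modelled on Fig.1 and checks two relationships that hold in that sample: TradeAmount equals the number of GoodsDetaile elements (2), and TradeMoney equals the sum of their Price values (6066.08 + 1375.89 = 7441.97). Treating these as general rules is an assumption inferred from the single sample, and the TradeID value `T0001` is a made-up placeholder because the original value is not given.

```python
import xml.etree.ElementTree as ET

# Record modelled on Fig.1; TradeID="T0001" is a hypothetical placeholder.
SAMPLE = """
<Transactions>
  <Transaction TradeID="T0001">
    <TradeDetaile>
      <UserID>C11010000</UserID>
      <TradeAmount>2</TradeAmount>
      <TradeMoney>7441.97</TradeMoney>
      <TradeTime>2014-07-11 16:08:22</TradeTime>
    </TradeDetaile>
    <GoodsDetailes>
      <GoodsDetaile>
        <No>1</No><ItemID>1100110111101</ItemID><Type>Food</Type><Price>6066.08</Price>
      </GoodsDetaile>
      <GoodsDetaile>
        <No>2</No><ItemID>1100110111001</ItemID><Type>Fruit</Type><Price>1375.89</Price>
      </GoodsDetaile>
    </GoodsDetailes>
  </Transaction>
</Transactions>
"""


def check_transaction(tx):
    """Return True if TradeAmount counts the items and TradeMoney sums their prices."""
    detail = tx.find("TradeDetaile")
    items = tx.findall("GoodsDetailes/GoodsDetaile")
    amount_ok = int(detail.findtext("TradeAmount")) == len(items)
    total = sum(float(g.findtext("Price")) for g in items)
    # Compare with a small tolerance because the prices are binary floats.
    money_ok = abs(float(detail.findtext("TradeMoney")) - total) < 0.005
    return amount_ok and money_ok


root = ET.fromstring(SAMPLE.strip())
for tx in root.findall("Transaction"):
    print(tx.get("TradeID"), check_transaction(tx))  # T0001 True
```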

    MongoDB Data Structure. In MongoDB, each transaction record is stored as one Transaction document. Its fields are: _id, holding a unique, automatically generated ObjectId value that is indexed; TradeID; UserID; TradeAmount; TradeMoney; TradeTime; and GoodsDetailes. The value of GoodsDetailes is an embedded document that can hold a number of GoodsDetaile entries, each with the fields No, ItemID, Type, and Price. MongoDB collections have an index on the _id field by default, and we additionally create a B+ tree index on the field of interest, TradeAmount. The data in MongoDB is shown in Fig.2.

    2330 Mechatronics Engineering and Modern Information Technologies in Industrial Engineering

    Fig.2 Data in MongoDB
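The mapping from the XML record to the Transaction document can be sketched as a function that lifts the TradeDetaile children to top-level fields and embeds the goods items. This is an illustrative sketch, not the authors' Java code: it models GoodsDetailes as an array of embedded documents, stores TradeTime as a string, and uses a hypothetical TradeID value `T0001`. `_id` is omitted because an ObjectId is generated on insert when it is absent; with a Python driver such as PyMongo, the resulting dictionary could be passed to `Collection.insert_one`.

```python
import xml.etree.ElementTree as ET

# One Transaction element modelled on Fig.1 (the TradeID value is a placeholder).
XML = """<Transaction TradeID="T0001">
  <TradeDetaile>
    <UserID>C11010000</UserID><TradeAmount>2</TradeAmount>
    <TradeMoney>7441.97</TradeMoney><TradeTime>2014-07-11 16:08:22</TradeTime>
  </TradeDetaile>
  <GoodsDetailes>
    <GoodsDetaile><No>1</No><ItemID>1100110111101</ItemID><Type>Food</Type><Price>6066.08</Price></GoodsDetaile>
    <GoodsDetaile><No>2</No><ItemID>1100110111001</ItemID><Type>Fruit</Type><Price>1375.89</Price></GoodsDetaile>
  </GoodsDetailes>
</Transaction>"""


def to_document(tx):
    """Convert a <Transaction> element into a MongoDB-style nested dict."""
    d = tx.find("TradeDetaile")
    return {
        # _id is left out; an ObjectId is generated on insert if absent.
        "TradeID": tx.get("TradeID"),
        "UserID": d.findtext("UserID"),
        "TradeAmount": int(d.findtext("TradeAmount")),
        "TradeMoney": float(d.findtext("TradeMoney")),
        "TradeTime": d.findtext("TradeTime"),
        # Goods items become an array of embedded documents.
        "GoodsDetailes": [
            {
                "No": int(g.findtext("No")),
                "ItemID": g.findtext("ItemID"),
                "Type": g.findtext("Type"),
                "Price": float(g.findtext("Price")),
            }
            for g in tx.findall("GoodsDetailes/GoodsDetaile")
        ],
    }


doc = to_document(ET.fromstring(XML))
print(doc["TradeAmount"], [g["Type"] for g in doc["GoodsDetailes"]])  # 2 ['Food', 'Fruit']
```

Embedding the items keeps one transaction in one document, which is what later lets MongoDB answer the join-style query by reading documents alone.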

    MySQL Data Structure. The data in MySQL is split into a TradeDetaile table and a GoodsDetaile table. If we created only one table and put all transaction records in it, the TradeDetaile information would be repeated in every GoodsDetaile row, causing heavy redundancy. The fields of the TradeDetaile table are TradeID, UserID, TradeAmount, TradeMoney, and TradeTime. The fields of the GoodsDetaile table are TradeID, No, ItemID, Type, and Price, where TradeID is a foreign key referencing the TradeDetaile table.
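The two-table layout can be written down as SQL DDL. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the column types, the composite primary key on (TradeID, No), and the TradeID value `T0001` are assumptions, and MySQL syntax for the same tables would differ slightly. It loads the Fig.1 record as one TradeDetaile row plus one GoodsDetaile row per item, and the foreign key rejects goods rows whose TradeID has no matching trade.

```python
import sqlite3

# sqlite3 stands in for MySQL here; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE TradeDetaile (
    TradeID     TEXT PRIMARY KEY,
    UserID      TEXT,
    TradeAmount INTEGER,
    TradeMoney  REAL,
    TradeTime   TEXT
);
CREATE TABLE GoodsDetaile (
    TradeID TEXT REFERENCES TradeDetaile(TradeID),  -- foreign key to TradeDetaile
    No      INTEGER,
    ItemID  TEXT,
    Type    TEXT,
    Price   REAL,
    PRIMARY KEY (TradeID, No)
);
""")

# The Fig.1 record: one TradeDetaile row and one GoodsDetaile row per item.
conn.execute("INSERT INTO TradeDetaile VALUES (?, ?, ?, ?, ?)",
             ("T0001", "C11010000", 2, 7441.97, "2014-07-11 16:08:22"))
conn.executemany("INSERT INTO GoodsDetaile VALUES (?, ?, ?, ?, ?)", [
    ("T0001", 1, "1100110111101", "Food", 6066.08),
    ("T0001", 2, "1100110111001", "Fruit", 1375.89),
])
print(conn.execute("SELECT COUNT(*) FROM GoodsDetaile").fetchone()[0])  # 2
```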

    We create B+ tree indexes on all fields of both MySQL tables. More indexes are not always better; we index every field only to suit the experiments in this article. An index pays off when a query returns a small portion of the data; if a query returns more than half of the data, an index is not suitable. In addition, frequent inserts, deletes, and updates force the indexes to be updated as well, which is costly.
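The selectivity argument can be illustrated with a toy sorted index. In the sketch below (hypothetical helpers, not MySQL internals), the index is a sorted list of (key, row_id) pairs, roughly the leaf level of a B+ tree; a range query finds its starting point by binary search and then walks the remaining entries, so its cost grows with the number of matching rows. The `choose_access_path` function encodes the "more than half" rule of thumb from the text; real query optimizers make this choice with cost estimates rather than a fixed threshold.

```python
import bisect


def build_sorted_index(values):
    """Toy single-field index: (key, row_id) pairs sorted by key."""
    return sorted((v, row_id) for row_id, v in enumerate(values))


def index_range_gt(index, threshold):
    """Row ids whose key is strictly greater than threshold.

    Binary search locates the first qualifying entry in O(log n);
    walking the rest of the entries costs O(k) for k matching rows.
    """
    start = bisect.bisect_right(index, (threshold, float("inf")))
    return [row_id for _, row_id in index[start:]]


def choose_access_path(matching_rows, total_rows):
    """Rule of thumb from the text: use the index only for a small fraction."""
    return "index" if matching_rows / total_rows < 0.5 else "full scan"


trade_money = [15000.0, 9000.0, 14000.0, 20000.0]
idx = build_sorted_index(trade_money)
hits = index_range_gt(idx, 14000)
print(hits, choose_access_path(len(hits), len(trade_money)))  # [0, 3] full scan
```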

    Comparison of MongoDB and MySQL

    Experiment Environment. All experiments in this section were run in the same hardware and software environment. We created mock data sets of different sizes, with all values generated randomly. A Java program with the same class structure stores each data set in MongoDB and in MySQL, and we compare the write performance of the two databases by the time they take. Query performance is compared in the same way. All programs in the experiment are written in Java. The MongoDB version is 2.2.3 and the MySQL version is 1.2.12.

    The XML data in the experiment is created randomly. The Transactions element is the root node of the XML document. Each Transaction node has one TradeDetaile node and one GoodsDetailes node, and each GoodsDetailes node has between 1 and 5 GoodsDetaile nodes, chosen randomly. The values in each GoodsDetaile fall within a predetermined range. The amount of data ranges from 1 thousand to 100 thousand records.
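A generator for such mock data might look like the sketch below. Only the shape follows the text (one TradeDetaile per record and 1 to 5 GoodsDetaile items chosen at random); the value ranges, the extra Type names, and the ID formats are assumptions modelled loosely on Fig.1, since the paper does not give its predetermined ranges. As in the Fig.1 sample, TradeAmount is set to the number of items and TradeMoney to the sum of their prices. A seeded `random.Random` makes each data set reproducible, which helps when the same data must be loaded into both databases.

```python
import random

ITEM_TYPES = ["Food", "Fruit", "Drink", "Daily"]  # "Drink" and "Daily" are made up


def make_transaction(rng, trade_no):
    """One random record with the TradeDetaile/GoodsDetailes shape of Fig.1."""
    n_items = rng.randint(1, 5)  # 1..5 GoodsDetaile items, inclusive
    goods = [
        {
            "No": k + 1,
            "ItemID": "".join(rng.choice("01") for _ in range(13)),
            "Type": rng.choice(ITEM_TYPES),
            "Price": round(rng.uniform(10.0, 7000.0), 2),  # assumed price range
        }
        for k in range(n_items)
    ]
    trade_time = "2014-07-{:02d} {:02d}:{:02d}:{:02d}".format(
        rng.randint(1, 31), rng.randint(0, 23), rng.randint(0, 59), rng.randint(0, 59)
    )
    return {
        "TradeID": "T{:06d}".format(trade_no),
        "UserID": "C{:08d}".format(rng.randint(0, 99_999_999)),
        "TradeAmount": n_items,
        "TradeMoney": round(sum(g["Price"] for g in goods), 2),
        "TradeTime": trade_time,
        "GoodsDetailes": goods,
    }


def make_dataset(n, seed=42):
    """n records; the same seed always yields the same data set."""
    rng = random.Random(seed)
    return [make_transaction(rng, i) for i in range(n)]


for size in (1_000, 10_000):
    data = make_dataset(size)
    print(size, sum(t["TradeAmount"] for t in data))  # total number of goods items
```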

    Writing Comparison. We store the XML documents in both databases using the same process. First we parse the XML document and extract the value of each node into a list; then we read the values from the list and store them in the database. These steps are repeated until the whole XML document has been stored. MySQL is accessed through the JDBC driver. The write times for different amounts of data are shown in Fig.3.


    Fig.3 Writing-time comparison

    Fig.3 shows that MongoDB's write performance is better than MySQL's in this experiment. Below 10 thousand records, MongoDB's write time stays roughly constant, whereas MySQL's write time grows linearly with the amount of data. Above 10 thousand records, MongoDB's write time begins to grow linearly while MySQL's grows dramatically. The write times measured here are relative, because they include the time spent parsing the XML documents; since both databases go through the same parsing process, this does not affect the comparison.
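The write procedure described above (parse the XML, collect node values in a list, then store them) can be sketched in Python with `sqlite3` standing in for MySQL over JDBC. Everything here is an assumption for illustration: the hypothetical `make_xml` generator produces simple one-item records, the schema is minimal, and the values are inserted in one batch with `executemany` inside a single transaction, whereas the original Java program may have inserted record by record. As in the experiment, the measured time includes XML parsing.

```python
import sqlite3
import time
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE TradeDetaile (TradeID TEXT PRIMARY KEY, UserID TEXT,
                           TradeAmount INTEGER, TradeMoney REAL, TradeTime TEXT);
CREATE TABLE GoodsDetaile (TradeID TEXT, No INTEGER, ItemID TEXT,
                           Type TEXT, Price REAL);
"""


def make_xml(n):
    """Hypothetical test input: n transactions with one goods item each."""
    parts = ["<Transactions>"]
    for i in range(n):
        parts.append(
            f'<Transaction TradeID="T{i:06d}"><TradeDetaile>'
            f"<UserID>C{i:08d}</UserID><TradeAmount>1</TradeAmount>"
            f"<TradeMoney>{100 + i}.00</TradeMoney>"
            f"<TradeTime>2014-07-11 16:08:22</TradeTime></TradeDetaile>"
            f"<GoodsDetailes><GoodsDetaile><No>1</No><ItemID>{i:013d}</ItemID>"
            f"<Type>Food</Type><Price>{100 + i}.00</Price></GoodsDetaile>"
            f"</GoodsDetailes></Transaction>"
        )
    parts.append("</Transactions>")
    return "".join(parts)


def parse_to_rows(xml_text):
    """Parse the XML and collect the node values into two lists of tuples."""
    root = ET.fromstring(xml_text)
    trades, goods = [], []
    for tx in root.findall("Transaction"):
        tid = tx.get("TradeID")
        d = tx.find("TradeDetaile")
        trades.append((tid, d.findtext("UserID"), int(d.findtext("TradeAmount")),
                       float(d.findtext("TradeMoney")), d.findtext("TradeTime")))
        for g in tx.findall("GoodsDetailes/GoodsDetaile"):
            goods.append((tid, int(g.findtext("No")), g.findtext("ItemID"),
                          g.findtext("Type"), float(g.findtext("Price"))))
    return trades, goods


def timed_write(conn, xml_text):
    """Parse and store one XML document; returns elapsed seconds (parse included)."""
    start = time.perf_counter()
    trades, goods = parse_to_rows(xml_text)
    with conn:  # commit the whole batch as one transaction
        conn.executemany("INSERT INTO TradeDetaile VALUES (?, ?, ?, ?, ?)", trades)
        conn.executemany("INSERT INTO GoodsDetaile VALUES (?, ?, ?, ?, ?)", goods)
    return time.perf_counter() - start


for n in (1_000, 10_000):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(n, f"{timed_write(conn, make_xml(n)):.3f} s")
```

Batching is a deliberate choice in this sketch: inserting one row per transaction would mostly measure commit overhead rather than the database's write path.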

    Query Comparison. We first compare a single-table query, in which all the required data resides in one table. The experiment queries the transaction records whose TradeMoney exceeds 14000 and returns their TradeDetaile information. Let $n$ be the number of records. About half of the records have TradeMoney above 14000, and these records are distributed uniformly in the database. As there is an index on the TradeAmount field in both databases, the amount of data to be scanned is $n/2$, and the time complexity is $O(n)$. The query in MySQL is select * from TradeDetaile where TradeMoney>14000. The read times for 1 thousand to 100 thousand records are shown in Fig.4.

    Fig.4 Read-time in Single-table

    As Fig.4 shows, MongoDB's query performance is significantly better than MySQL's. MongoDB's read time stays within 200 ms for 1 thousand to 100 thousand records. Below 10 thousand records, MySQL's read time is about 500 ms; above 10 thousand, it increases significantly with the amount of data.
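The single-table experiment can be reproduced in miniature with `sqlite3` standing in for MySQL. The sketch assumes TradeMoney is uniform on [0, 28000], so that about half of the rows exceed 14000 as the analysis supposes; the table is reduced to two columns and the index name `idx_money` is made up. The number of returned rows grows linearly with $n$, which is the $O(n)$ behaviour derived above; timings are left out because they depend on the machine.

```python
import random
import sqlite3


def load_trades(n, seed=1):
    """Create an in-memory TradeDetaile table with n random TradeMoney values."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TradeDetaile (TradeID INTEGER PRIMARY KEY, TradeMoney REAL)")
    conn.execute("CREATE INDEX idx_money ON TradeDetaile (TradeMoney)")
    conn.executemany(
        "INSERT INTO TradeDetaile VALUES (?, ?)",
        ((i, rng.uniform(0, 28000)) for i in range(n)),
    )
    return conn


def single_table_query(conn, threshold=14000):
    """The query from the text: return TradeDetaile rows above the threshold."""
    sql = "SELECT * FROM TradeDetaile WHERE TradeMoney > ?"
    return conn.execute(sql, (threshold,)).fetchall()


for n in (1_000, 10_000, 100_000):
    rows = single_table_query(load_trades(n))
    print(n, len(rows), round(len(rows) / n, 2))  # the ratio stays near 0.5
```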

    We now compare a join-table query. The experiment queries the transaction records whose TradeMoney exceeds 14000 and returns both their TradeDetaile and GoodsDetaile information, which requires joining the TradeDetaile and GoodsDetaile tables. Again let $n$ be the number of records, about half of which have TradeMoney above 14000, distributed uniformly in the database; the time of the projection operation is negligible. First, MySQL scans the TradeDetaile table for tuples with TradeMoney above 14000. As there is an index on TradeMoney, it scans $n/2$ tuples of TradeDetaile. Next, MySQL scans the GoodsDetaile table for tuples whose TradeID equals the TradeID of each qualifying TradeDetaile tuple. For each TradeID it scans $n/2$ tuples of GoodsDetaile, so this step scans $\frac{n}{2}\cdot\frac{n}{2} = \frac{n^2}{4}$ tuples in total. Altogether MySQL scans about $\frac{n}{2} + \frac{n^2}{4} = \frac{n^2 + 2n}{4}$ tuples, and the time complexity is $O(n^2)$. The query in MySQL is select t.*, g.* from TradeDetaile As t, GoodsDetaile As g where t.TradeMoney>14000 and t.TradeID=g.TradeID. In MongoDB, by contrast, each transaction record, including its goods details, is stored in a single document, much like a tuple in MySQL, so only $n/2$ documents need to be scanned and the time complexity is $O(n)$.
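The tuple counts in this derivation can be checked with a small simulation. The sketch below makes simplifying assumptions that are ours, not the paper's: exactly every second trade has TradeMoney above 14000, each trade has exactly one GoodsDetaile row stored in TradeID order, the TradeMoney index is modelled as handing over only the qualifying rows, and the inner scan of GoodsDetaile stops at the first match. Under these assumptions the relational count comes out to exactly $n/2 + n^2/4$, while the document store touches only the $n/2$ qualifying documents. It counts tuples for the plan described in the text and is not a model of MySQL's optimizer.

```python
def make_tables(n):
    """n trades (n even): even TradeIDs have TradeMoney > 14000, odd ones do not.
    Each trade has exactly one GoodsDetaile row, stored in TradeID order."""
    trades = [{"TradeID": i, "TradeMoney": 20000.0 if i % 2 == 0 else 10000.0}
              for i in range(n)]
    goods = [{"TradeID": i, "Price": 1.0} for i in range(n)]
    return trades, goods


def make_documents(n):
    """The same data as documents with the goods embedded in each transaction."""
    trades, goods = make_tables(n)
    return [dict(t, GoodsDetailes=[g]) for t, g in zip(trades, goods)]


def relational_join_visits(trades, goods, threshold=14000):
    """Tuples touched by the join plan analysed in the text."""
    # Model the TradeMoney index as yielding only qualifying rows: n/2 visits.
    outer = [t for t in trades if t["TradeMoney"] > threshold]
    visits = len(outer)
    for t in outer:
        # Linear scan of GoodsDetaile until the matching TradeID is found.
        for g in goods:
            visits += 1
            if g["TradeID"] == t["TradeID"]:
                break
    return visits


def document_visits(documents, threshold=14000):
    """Documents touched when goods are embedded: only the qualifying ones."""
    return sum(1 for d in documents if d["TradeMoney"] > threshold)


for n in (100, 1000, 2000):
    trades, goods = make_tables(n)
    print(n, relational_join_visits(trades, goods), document_visits(make_documents(n)))
# 100 2550 50
# 1000 250500 500
# 2000 1001000 1000
```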

    Fig.5 shows the read times for 1 thousand to 100 thousand records. MongoDB's read time stays at about 200 ms across this range. Below 10 thousand records, MySQL's read time is low, but above 10 thousand it increases dramatically.

    Fig.5 Read-time in Join-tables

    When more than half of the data must be returned, an index loses its advantage, and a full table scan is more efficient than an index lookup. To return all Transactions information, MySQL must perform a join-table query with time complexity $O(n^2)$, while MongoDB scans the documents sequentially with time complexity $O(n)$. When the amount of data is very large, MongoDB's read performance is therefore significantly better than MySQL's.

    When the only optimization is indexing, looking up a single random record requires scanning on average about $\log_2 n$ tuples in each table, or $2\log_2 n$ in total for the two tables, whereas MongoDB needs only about $\log_2 n$.
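The $\log_2 n$ figures correspond to binary search over a sorted index. The sketch below counts probes for a classic binary search (a stand-in for an index lookup; `binary_search_probes`, `relational_lookup_probes`, and `document_lookup_probes` are illustrative helpers) and checks them against the worst-case bound $\lfloor \log_2 n \rfloor + 1$. A relational lookup of one transaction searches both TradeDetaile and GoodsDetaile, so it costs up to twice the bound, while an embedded document needs a single search.

```python
import math


def binary_search_probes(sorted_keys, target):
    """Classic binary search; returns (found, number of keys probed)."""
    lo, hi, probes = 0, len(sorted_keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] == target:
            return True, probes
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, probes


def relational_lookup_probes(trade_keys, goods_keys, trade_id):
    """One search in TradeDetaile plus one in GoodsDetaile (about 2 log2 n)."""
    return (binary_search_probes(trade_keys, trade_id)[1]
            + binary_search_probes(goods_keys, trade_id)[1])


def document_lookup_probes(doc_keys, trade_id):
    """A single search, since the goods are embedded (about log2 n)."""
    return binary_search_probes(doc_keys, trade_id)[1]


n = 100_000
keys = list(range(n))  # sorted TradeIDs; one goods row per trade for simplicity
bound = math.floor(math.log2(n)) + 1
sample = range(0, n, 997)
print(bound,
      max(relational_lookup_probes(keys, keys, k) for k in sample),
      max(document_lookup_probes(keys, k) for k in sample))
```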


    Conclusions

    Web systems with large amounts of data should strongly avoid join-table queries on large tables. Querying MongoDB resembles a single-table query in a relational database: it turns a complex join-table query into a single-table query, providing high read performance on massive amounts of data. In addition, MongoDB presents the data as documents without joining tables, so users can observe and analyze the data more intuitively.

    The experimental results above show that MongoDB performs better than MySQL, but NoSQL databases are not a replacement for relational databases. Document-oriented NoSQL databases mainly address the need for high query performance when storing massive data. Almost all of MongoDB's features can be found in relational databases, which offer a powerful feature set; however, users handling large amounts of non-relational data may not need a complex relational database. In such cases, a NoSQL database is a good choice.

    Acknowledgment

    This work has been supported by Beijing Key Laboratory (No: BZ0211) and Key Scientific

    Research Project of Beijing WuZi University.

    References

    [1] Wenlong Wang, Essential MongoDB: Management and Development (China Machine Press, 2011, 1st edn).

    [2] Kristina Chodorow, Michael Dirolf, MongoDB: The Definitive Guide (Posts and Telecom Press, 2011, 1st edn).

    [3] Kyle Banker, MongoDB in Action (Posts and Telecom Press, 2012, 1st edn).

    [4] Shan Wang, Shixuan Sa, Introduction to Database Systems (Higher Education Press, 2006, 4th edn).

    [5] Ben Forta, MySQL Crash Course (Posts and Telecom Press, 2009, 1st edn).

    [6] He Shengtao, MongoDB in network behavior analysis and control system, Network Security, 2013, 5.

