
A Middleware Design for Multiple Embedded Database Systems

Jhe-Hao Hu, Chin-Hsien Wu, and Chang-Hong Lin

Department of Electronic Engineering

National Taiwan University of Science and Technology, Taipei, Taiwan

Email: {M9702120, chwu, chlin}@mail.ntust.edu.tw

Abstract—As embedded systems and consumer electronic devices have become popular, they have adopted high-capacity storage systems such as flash-memory cards or solid-state drives (SSDs). Many embedded database systems (EDBSs) have also emerged to maintain data on these storage systems. However, it is complicated and time-consuming to port an application from one embedded database system to another. In this paper, we design a middleware for multiple embedded database systems by considering their different interfaces and overhead. With the help of the middleware, users can conveniently write applications that easily adopt various embedded database systems. Furthermore, the middleware can leverage various embedded database systems for better performance at reasonable cost.

Index Terms—Embedded Systems, Database Systems, Middleware Design

I. INTRODUCTION

Most embedded systems and consumer electronic devices have adopted NAND flash memory as their storage medium because it offers high capacity, low power consumption, non-volatility, and shock resistance. Many embedded database systems [4], [5], such as SQLite [1], Berkeley DB [2], and GDBM [3], have been proposed for NAND-based storage systems. Each embedded database system has its own application programming interface (API), so it is inconvenient to write applications that work well with all embedded database systems. This observation motivates our research.

We design a middleware for multiple embedded database systems that provides a standard API. The middleware should integrate various embedded database systems so that programmers can use it to write applications transparently over different embedded database systems. The middleware provides the same basic operations as the embedded database systems themselves, such as insert, delete, select, and join. Furthermore, the middleware should reduce programming overhead and improve system performance when one embedded database system is replaced by another.

The rest of this paper is organized as follows: Section II introduces the motivation and the related work on some popular embedded database systems. Section III presents the middleware design for multiple embedded database systems. Section IV implements the middleware on real EDBSs. Section V presents experiments with the middleware. Section VI gives the conclusion and future work.

II. RELATED WORK AND MOTIVATION

We choose two embedded database systems, SQLite [1] and Berkeley DB [2], to test the middleware design. SQLite [1] is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine. It is a widely used SQL database engine, and its source code is open. Berkeley DB [2] is a fast, open-source embedded database that is used in several well-known open-source products such as the Linux and BSD Unix operating systems, the Apache Web server, the OpenLDAP directory, and the OpenOffice productivity suite.

Fig. 1. Traditional applications and embedded database systems

Traditional embedded database systems each have their own APIs, so it is inconvenient to write applications that work well with all of them. An application is tightly coupled to its embedded database system, and one application can run on only one embedded database system, as shown in Fig. 1. The middleware helps applications run on any embedded database system because it provides a standard API. Programmers can also reduce their programming work when the embedded database system in use is replaced by another one.

III. A MIDDLEWARE DESIGN

A. Overview

Each embedded database system has its own application programming interface. A programmer who uses embedded database system A has to learn its APIs. If the system environment does not support embedded database system A, the programmer might have to modify the original application for another embedded database system. Rewriting applications in this way is complicated and time-consuming.

2010 IEEE 14th International Symposium on Consumer Electronics

978-1-4244-6673-3/10/$26.00 ©2010 IEEE

Fig. 2. A middleware structure

We propose a middleware that works with multiple embedded database systems. As shown in Fig. 2, applications can run on EDBS A, B, and C through the middleware. The middleware acts as a transparent communication layer between applications and embedded database systems. Applications do not need to be modified when one embedded database system is replaced by another; the middleware handles the change process and reduces its overhead. Furthermore, the middleware can also choose a better embedded database system for a given access pattern.

B. Design Issues

1) Standard API: The middleware provides programmers with a standard API to access various embedded database systems. The standard API includes five basic operations: create, insert, delete, select, and join. Since the middleware might support various embedded database systems, it must understand their operation mechanisms and programming setup. Only by taking these issues into account can a standard API be designed well for programmers.

2) Data format: Since different embedded database systems might have different data formats, the parameters for insert, delete, select, and join operations might differ. The middleware should understand each data format and hide the system-specific parameters. When the middleware is used with multiple embedded database systems, programmers use only one unified data format and parameter set to write applications.
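As a rough illustration of a unified data format, the sketch below keeps one record in a single struct and renders it either as a SQL INSERT statement (for an SQL-based EDBS) or as a key/value pair (for a key/value EDBS). The names `Record`, `record_to_sql`, and `record_to_kv`, and the "name|phone" value encoding, are assumptions made for this example and do not come from the paper; the fields follow the phone-book schema used in Section V.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical unified record: one row of the phone-book schema. */
typedef struct {
    int  id;         /* primary key */
    char name[51];   /* VARCHAR(50) plus terminator */
    char phone[51];  /* VARCHAR(50) plus terminator */
} Record;

/* Render the record as a SQL INSERT statement for an SQL-based backend.
 * Returns 0 on success, -1 if the buffer is too small.
 * Values are not escaped here; real code must escape embedded quotes. */
int record_to_sql(const Record *r, const char *table, char *out, size_t cap)
{
    assert(r && table && out);
    int n = snprintf(out, cap, "INSERT INTO %s VALUES(%d,'%s','%s');",
                     table, r->id, r->name, r->phone);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Render the record as a key/value pair for a key/value backend:
 * key = decimal primary key, value = "name|phone" (assumed encoding). */
int record_to_kv(const Record *r, char *key, size_t kcap,
                 char *val, size_t vcap)
{
    assert(r && key && val);
    int k = snprintf(key, kcap, "%d", r->id);
    int v = snprintf(val, vcap, "%s|%s", r->name, r->phone);
    if (k < 0 || (size_t)k >= kcap) return -1;
    if (v < 0 || (size_t)v >= vcap) return -1;
    return 0;
}
```

The application only ever fills in a `Record`; which rendering is used would be decided inside the middleware, so the backend-specific parameters stay hidden.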

Fig. 3. Conversion from embedded database system A to embedded database system B

3) Conversion Overhead: When an application is moved to another embedded database system, the middleware faces a conversion problem. As shown in Fig. 3, if an application wants to change its original embedded database system, the middleware handles the data conversion process: it reads the data from the original embedded database system and writes them into the new one. This conversion process inevitably causes overhead, and the middleware should reduce that overhead as much as possible.
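The conversion step described above can be sketched as a read-all, write-all loop over an abstract backend interface. This is only a minimal sketch: `Backend`, `MemStore`, `Row`, and `convert` are hypothetical names, and the in-memory store stands in for a real EDBS so that the example runs without SQLite or Berkeley DB.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ROWS 16

/* Hypothetical row and in-memory store standing in for a real EDBS. */
typedef struct { int id; char name[32]; } Row;
typedef struct { Row rows[MAX_ROWS]; size_t count; } MemStore;

/* Minimal backend interface that the middleware would call. */
typedef struct {
    void  *state;
    size_t (*count)(void *state);
    int    (*read_at)(void *state, size_t i, Row *out);
    int    (*insert)(void *state, const Row *in);
} Backend;

static size_t mem_count(void *s) { return ((MemStore *)s)->count; }

static int mem_read_at(void *s, size_t i, Row *out)
{
    MemStore *m = (MemStore *)s;
    if (i >= m->count) return -1;
    *out = m->rows[i];
    return 0;
}

static int mem_insert(void *s, const Row *in)
{
    MemStore *m = (MemStore *)s;
    if (m->count >= MAX_ROWS) return -1;
    m->rows[m->count++] = *in;
    return 0;
}

/* Wrap an in-memory store in the backend interface. */
Backend mem_backend(MemStore *m)
{
    Backend b = { m, mem_count, mem_read_at, mem_insert };
    return b;
}

/* Copy every row from src to dst; returns rows copied, or -1 on error. */
long convert(Backend *src, Backend *dst)
{
    assert(src && dst);
    size_t n = src->count(src->state);
    for (size_t i = 0; i < n; i++) {
        Row r;
        if (src->read_at(src->state, i, &r) != 0) return -1;
        if (dst->insert(dst->state, &r) != 0) return -1;
    }
    return (long)n;
}
```

The cost of this loop grows with the number of stored records, which is the conversion overhead the middleware should try to keep small.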

4) Benchmark: The middleware can play a smart role by testing the efficiency of the embedded database systems and then suggesting a better one, because different embedded database systems have different strengths. For example, some embedded database systems have fast read and write performance, some have short search times, and some require fewer system resources. The middleware can provide related benchmarks for programmers and determine which embedded database system is suitable.
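One simple way to realize such a benchmark is to time the same workload on each backend and suggest the one with the smallest measured time. The sketch below assumes this min-time policy, which the paper does not specify; `Workload`, `time_workload`, `pick_fastest`, and `busy_workload` are hypothetical names, and the busy loop is only a stand-in for real database operations.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* A workload is any function that runs n operations against one backend. */
typedef void (*Workload)(size_t n);

/* Stand-in workload: a busy loop instead of real database calls. */
static volatile unsigned long sink;
void busy_workload(size_t n)
{
    for (size_t i = 0; i < n; i++) sink += (unsigned long)i;
}

/* Measure the CPU time, in seconds, one workload needs for n operations. */
double time_workload(Workload w, size_t n)
{
    assert(w != NULL);
    clock_t start = clock();
    w(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Return the index of the smallest measured time (the suggested backend). */
size_t pick_fastest(const double *seconds, size_t count)
{
    assert(seconds != NULL && count > 0);
    size_t best = 0;
    for (size_t i = 1; i < count; i++)
        if (seconds[i] < seconds[best]) best = i;
    return best;
}
```

A fuller benchmark would time reads, writes, and searches separately and might also track memory use, since the text notes that backends differ along all of these dimensions.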

IV. A MIDDLEWARE IMPLEMENTATION

In this section, we implement a middleware that supports two real EDBSs: SQLite [1] and Berkeley DB [2], which were introduced in Section II. We first describe how SQLite and Berkeley DB work and then present the design of a standard API for the middleware.

A. SQLite

SQLite is an open-source, lightweight embedded database system that was created by D. Richard Hipp in the C language. SQLite has the following features:

• Unlike most other SQL databases, SQLite does not have a separate server process. SQLite reads and writes directly to ordinary disk files.

• Transactions are atomic, consistent, isolated, and durable (ACID), even after system crashes and power failures.

• It implements most of SQL-92.

• Small code footprint: less than 300KB fully configured, or less than 180KB with optional features omitted.

• Sources are in the public domain and may be used for any purpose.

SQLite uses the following structures and APIs to access data:

1) SQLite Structure:

• typedef struct sqlite3 sqlite3;

Each open SQLite database is represented by a pointer to an instance of the opaque structure named "sqlite3". It is useful to think of an sqlite3 pointer as an object.

2) SQLite API: SQLite has basic pointer structures and basic APIs for opening and closing the database.

• int sqlite3_open(const char *filename, sqlite3 **ppDb).

This function opens the SQLite database file whose name is given by the filename argument and returns the database handle through ppDb.

• int sqlite3_close(sqlite3 *).

This function is the destructor for the sqlite3 object.

SQLite writes and reads data in databases through SQL-92 statements.


• int sqlite3_exec(sqlite3 *, const char *sql, int (*callback)(void *, int, char **, char **), void *, char **errmsg /* Error msg written here */).

Programmers write SQL-92 statements and pass them to sqlite3_exec() to operate on databases. For example, to create a table, set char *sql to "create table table_name(table_spec)" and call sqlite3_exec().
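As a small illustration of the "create table" case, the sketch below assembles the SQL text that would then be passed to sqlite3_exec(). The helper name `build_create_sql` is hypothetical, and the call to sqlite3_exec() itself is left out so that the example compiles without the SQLite library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "CREATE TABLE <table><spec>;" into out.
 * spec is the parenthesized column list,
 * e.g. "(a INTEGER PRIMARY KEY, b INT)".
 * Returns 0 on success, -1 if out is too small. */
int build_create_sql(const char *table, const char *spec,
                     char *out, size_t cap)
{
    assert(table && spec && out);
    int n = snprintf(out, cap, "CREATE TABLE %s%s;", table, spec);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Checking the return value of snprintf against the buffer size avoids silently passing a truncated statement to the database engine.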

B. Berkeley DB

Berkeley DB is an open-source product from Oracle that provides developers with fast, reliable, local persistence with zero administration. Often deployed as an embedded database, Berkeley DB provides high performance, reliability, scalability, and availability for applications. However, it does not support SQL-92. Berkeley DB has the following features:

• ACID transactions.

• Indexed and sequential retrieval (Btree, Queue, Hash).

• Programmatic administration and management, with zero human administration.

• Source code is openly available under an open-source license.

Berkeley DB supports many programming languages, such as C, C++, Java, Perl, and PHP. We use its C API to design our middleware, and we introduce its structures and C APIs below.

1) Berkeley DB Structure: Berkeley DB [2] does not use SQL-92 to control databases; instead, it has specific structures for accessing data in databases.

• typedef struct { void *data; /* a pointer to a string */ u_int32_t size; /* The length of data, in bytes. */ } DBT;

A DBT stores data and its length. Berkeley DB writes data to and reads data from a database through DBTs.

• typedef struct DB DB;

DB is the handle for a Berkeley DB database.

2) Berkeley DB API: Berkeley DB uses the following APIs to access data. The signatures are shown in simplified form; the full C signatures take additional transaction and flag arguments.

• db_create(DB **dbp, DB_ENV *dbenv, u_int32_t flags).

The db_create() function creates a DB structure that is the handle for a Berkeley DB database. The function allocates memory for the structure and returns a pointer to the structure in the memory to which dbp refers.

• DB->open(DB *db, const char *db_name, DBTYPE type).

The DB->open() method opens the database. The available DBTYPE values are Btree, Hash, Queue, and Recno.

• DB->close(DB *db).

The DB->close() method flushes any cached database information to disk, closes any open cursors, frees any allocated resources, and closes any underlying files.

• DB->put(DB *db, DBT *key, DBT *data).

The DB->put() method stores key/data pairs in the database. Its default behavior is to insert the new key/data pair, replacing any previously existing key if duplicates are disallowed, or adding a duplicate data item if duplicates are allowed.

• DB->get(DB *db, DBT *key, DBT *data).

The DB->get() method retrieves key/data pairs from the database. The address and length of the data associated with the specified key are returned in the structure to which data refers.

C. Standard API Design

The middleware integrates various embedded database systems, each of which has its own APIs. A standard API should therefore be designed to work well with all of them.

According to our observations, SQLite supports SQL-92 and the concept of tables, whereas Berkeley DB has no table concept. If we want to insert data with the same primary keys into Berkeley DB, Berkeley DB has to store these data in different database files. When we design a standard API, we should add this table concept and resolve the potential conflicts between different embedded database systems. The standard APIs are listed as follows:

• void create(char *db_name, char *table_name, int flag /* which database will be used */, char *spec /* Like: (a INTEGER PRIMARY KEY, b INT, e TEXT) */).

This function creates a table and an initial database.

• void insert(char *db_name, char *table_name, int flag, char *data).

This function inserts data into a database.

• void deletedata(char *db_name, char *table_name, int flag).

This function deletes data from a database.

• void selectdata(char *db_name, char *table_name, int flag, int num, char *spec).

This function searches for data in a database.

• void joindata(char *db_name, char *table1_name, int num1, char *table2_name, int num2, int flag).

This function joins two tables.
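The role of the flag parameter in the API above can be sketched as a lookup into a table of per-backend function pointers. This is a minimal sketch under stated assumptions: the flag values, the `EdbsOps` table, and the name `mw_insert` are hypothetical, the stub backends only count calls instead of touching SQLite or Berkeley DB, and the function returns an int status for illustration even though the paper's API returns void.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values for the `flag` parameter. */
enum { EDBS_SQLITE = 0, EDBS_BERKELEYDB = 1, EDBS_COUNT };

/* One entry per backend; each function would wrap the native API. */
typedef struct {
    const char *name;
    int (*insert)(const char *db_name, const char *table_name,
                  const char *data);
} EdbsOps;

/* Stub backends that only count calls, so the sketch runs without
 * SQLite or Berkeley DB installed. */
static int sqlite_calls, bdb_calls;

static int sqlite_insert(const char *d, const char *t, const char *v)
{ (void)d; (void)t; (void)v; return ++sqlite_calls; }

static int bdb_insert(const char *d, const char *t, const char *v)
{ (void)d; (void)t; (void)v; return ++bdb_calls; }

static const EdbsOps backends[EDBS_COUNT] = {
    { "SQLite",      sqlite_insert },
    { "Berkeley DB", bdb_insert    },
};

/* Middleware entry point: the same call for every backend, selected by
 * flag. Returns -1 for an unknown flag. */
int mw_insert(const char *db_name, const char *table_name, int flag,
              const char *data)
{
    assert(db_name && table_name && data);
    if (flag < 0 || flag >= EDBS_COUNT) return -1;
    return backends[flag].insert(db_name, table_name, data);
}
```

With this layout, switching the embedded database system only changes the flag value passed by the application, and supporting another EDBS means adding one more row to the table.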

V. EXPERIMENT

In this section, we test our middleware and measure its performance and overhead. Our experiments were performed on a dual-core 2.0 GHz Intel Pentium machine with 3 GB of RAM running SUSE Linux 2.6.21.6. In the experiments, our data specification is a phone book: (ID INTEGER PRIMARY KEY, Name VARCHAR(50), PhoneNumber VARCHAR(50)). We create 1000, 5000, and 10000 records according to this specification and execute insert and join operations on these records.

Using SQLite, we inserted 1000, 5000, and 10000 records. The execution time differs only slightly between running with the middleware and running without it, as shown in Fig. 4. This means that the middleware caused only a little overhead for SQLite, because it merely acts as a communication layer between the application and SQLite.

Using Berkeley DB [2], we also inserted 1000, 5000, and 10000 records, as shown in Fig. 5. The execution time with the middleware was longer than that without the middleware. This is because Berkeley DB does not provide the table concept, and the middleware incurs overhead in adding it. However, this overhead is required when a standard API is implemented.

Fig. 4. Insertion time for SQLite with and without the middleware

Fig. 5. Insertion time for Berkeley DB with and without the middleware

Fig. 6. Join time for SQLite and Berkeley DB using middleware

For the join function through the middleware, SQLite was faster than Berkeley DB, as shown in Fig. 6. Since Berkeley DB does not provide the table concept, the middleware must resolve this conflict when different embedded database systems are integrated, and the extra conversion needed to emulate tables causes overhead. As a result, SQLite might have better performance when join operations are required.

According to Fig. 4, Fig. 5, and Fig. 6, the middleware's overhead comes from adding the table concept to embedded database systems, especially Berkeley DB. The middleware design therefore gives programmers flexibility but might cause overhead when integrating different embedded database systems.

Conversion also causes extra overhead, so we advise programmers to weigh this conversion overhead against the expected performance. If programmers want to change from one embedded database system to another, the middleware reads the data from the original embedded database system and inserts them into the new one. The advantage is that programmers never have to handle the conversion process themselves.

VI. CONCLUSION

As embedded systems and consumer electronic devices have become popular, they have adopted flash-memory cards or solid-state drives as their storage systems. Many embedded database systems have also emerged to maintain data on these storage systems. Because it is complicated for users to port an application from one embedded database system to another, we propose a middleware design to resolve the issue. The contributions of the paper are as follows:

• The middleware provides a standard development API for multiple embedded database systems and reduces programming overhead.

• According to the development environment, the middleware handles the change process between embedded database systems, so programmers do not need to rewrite applications.

• The middleware can provide related benchmarks for programmers and determine which embedded database system is suitable.

For future research, we should further explore different application characteristics and different workloads in embedded database systems. More research and tool designs for optimizing the middleware for different embedded applications might prove very rewarding.

VII. ACKNOWLEDGMENTS

This paper is supported in part by research grants from the National Science Council under Grants 98-2221-E-011-091- and 98-2221-E-011-103-.

REFERENCES

[1] SQLite3. http://www.sqlite.org/

[2] BerkeleyDB. http://www.oracle.com/technology/products/berkeley-db/index.html/

[3] GDBM. http://www.gnu.org/software/gdbm/

[4] GyeJeong Kim, SeungCheon Baek, HyunSook Lee, HanDeok Lee, Moon Jeung Joe, "LGeDBMS: a small DBMS for embedded system with flash memory," 32nd International Conference on Very Large Data Bases, 2006, pp. 1255-1258.

[5] Sang-Won Lee, Gap-Joo Na, Jae-Myung Kim, Joo-Hyung Oh, Sang-Woo Kim, "Research issues in next generation DBMS for mobile platforms," 9th Intl. Conf. on Human Computer Interaction with Mobile Devices and Services, 2007, pp. 457-461.