An Open Flexible and Multilevel Data Storing and Processing Platform for Very Large Scale Sensor...


  • 7/30/2019 An Open Flexible and Multilevel Data Storing and Processing Platform for Very Large Scale Sensor Network_2012


    An Open, Flexible and Multilevel Data Storing and Processing Platform for Very Large Scale Sensor Network

    Jing LIU*, Jing CHN*, Li PENG**, Xixin CAO*, RenChun Lian***, Ping WANG*

    * School of Software and Microelectronics, Peking University, China

    ** School of IoT Engineering, Jiangnan University, China

    *** Inspeed Communication Co., Ltd, China

    Ij@, @

    Abstract: All kinds of sensor networks have been deployed all over the world. By combining some of these networks, we can obtain a larger sensor network. The data of a very large scale sensor network are polymorphous, heterogeneous, large in quantity and time-limited. In its applications, how to store and process these data has become a key technology.

    The major contribution of this paper is a data processing model based on cloud computing for very large scale sensor networks. In this flexible model, the huge volume of sensor data and node information is stored in multilevel data storage, including the local databases of different devices and the distributed databases of the cloud. Different kinds of computation are decomposed and distributed to different nodes, mainly according to their differences in computing ability and power supply. The clouds make the development and deployment of applications based on these very large scale data very easy.

    We have set up a platform based on open source software to verify this model. The platform is composed of sensor nodes, WiFi and ZigBee networks, embedded gateways, local servers and clouds. Different databases, including SQLite, TinyDB, MySQL, MongoDB and Cassandra, are used on different types of devices to store data. We have also developed some cloud applications for it. The results show that the platform based on the new computation model is flexible, makes new applications easy to develop, and is energy-balanced for sensor nodes.

    Keywords: sensor network, cloud computing, large scale, model, flexible

    I. INTRODUCTION

    A sensor network is a network of spatially distributed devices equipped with sensors, which are used to collect physical data and monitor environmental conditions at different locations (e.g. [1], [2]). In a typical sensor network, nodes cooperate to collect raw data and return it to the application back end. The application back end, which is usually a local server, stores and analyses the data and drives actuators.

    In the last few years, all kinds of sensor networks have been deployed all over the world. If we combine some of these networks, we get a larger sensor network, which makes applications over a larger area possible. Sensor networks vary in architecture and implementation. The data of a very large scale sensor network are polymorphous, heterogeneous, large in quantity and time-limited. In a very large scale sensor network, how to manage the sensing and computational resources, and how to store and process these data, has become a key technology.

    Another trend in information and communication technology is cloud computing. A cloud is a virtualized platform which offers open and uniform access to extensible computational resources, storage and software services. Three cloud computing models have been proposed. IaaS (Infrastructure as a Service) offers computer infrastructure resources in the cloud (e.g. [3], [4]). PaaS (Platform as a Service) offers computational resources with a complete supporting environment (e.g. [5], [6]). SaaS (Software as a Service) offers online access to software [7].

    The cloud computing model can easily handle massive data storing and processing work, so it seems suitable as the back end of a very large scale sensor network. But there are still many problems to be solved if we want these two technologies to cooperate and produce new value, such as: (1) How to store the data? If all the data are sent to the data centers of the cloud at sampling time, the inflow of massive data into the wide-area networks may cause network congestion. (2) Where to process the data? If all the sensor data are processed in the cloud, the communication latency may be intolerable for applications with real-time demands.

    To solve these problems we propose a flexible and multilevel data processing model based on cloud computing. In this model the massive sensor data and node information are stored in multilevel data storage, including the local databases of different devices and the distributed databases of the cloud. Different kinds of computation are decomposed and distributed to different nodes, mainly according to their differences in computing ability and power supply. The clouds make the development and deployment of applications based on these very large scale data very easy.

    SN 9788999 9 Fb 9 T


    This paper is structured as follows. Section II provides three deployment scenarios of a large scale sensor network. In Section III, we propose a multilevel storage model for the application demands of large scale sensor networks. Section IV describes a uniform data access model. Section V describes the application model. Section VI presents our verification platform. Finally, Section VII presents some conclusions and future work.

    II. LARGE SCALE SENSOR NETWORK DEPLOYMENT SCENARIOS

    A large scale sensor network can provide much useful sensor data, such as temperature, humidity, location, light, sound and images. These data can serve many useful applications. Different applications may have different demands on data sources and data processing. In this section, we consider three typical applications that may be deployed on a large scale sensor network and analyse their requirements for data storing and processing.

    We consider a very large sensor network covering the national expressways. This network consists of many small sensor networks, each of which covers a small area such as an entrance, an exit or a service area. There are many potential applications of such a network.

    A. Vehicle tracking

    Vehicles can be tracked in different ways, for example by image or RFID. Vehicle tracking is a hard real-time application. When the police office wants to track a fugitive car all over the country, the data must be processed and an answer given quickly at places such as the entrances and the exits. So the computation must be delivered to the local servers, and there is no need to store historical data for this scenario.

    B. Traffic dispatching

    Traffic dispatching needs real-time data on traffic flow, weather and so on. Traffic scheduling is a soft real-time application, as a lag of several minutes will not have harmful consequences. If all the traffic-related data are sent to the data centers of the cloud, the inflowing data may cause network congestion. So the statistical information must be computed on the local servers, but global decisions can only be made in the cloud, as only the cloud has wide-area information. If the scheduling algorithm is based only on real-time data, there is no need to store the collected data in the cloud; but if it is a prediction algorithm based on historical data, the collected data must be stored in the cloud.
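The split between local statistics and global decisions can be illustrated with a minimal Python sketch. It assumes each raw reading is a (checkpoint, vehicles per minute) pair; the function name `summarize_traffic` and the record layout are hypothetical and do not come from the paper. The point is only that many raw samples collapse into one small summary per checkpoint before anything crosses the wide-area network.

```python
from collections import defaultdict
from statistics import mean


def summarize_traffic(readings):
    """Reduce raw (checkpoint, vehicles_per_minute) samples to one summary per checkpoint.

    Only this summary, not the raw samples, would be forwarded to the cloud.
    """
    by_checkpoint = defaultdict(list)
    for checkpoint, vehicles_per_minute in readings:
        by_checkpoint[checkpoint].append(vehicles_per_minute)
    return {
        checkpoint: {
            "samples": len(values),
            "mean_flow": mean(values),
            "peak_flow": max(values),
        }
        for checkpoint, values in by_checkpoint.items()
    }


# Hypothetical raw samples held on one local server.
raw = [("entrance-1", 40), ("entrance-1", 60), ("exit-3", 10)]
summary = summarize_traffic(raw)
```

A prediction algorithm based on historical data would additionally need the raw or summarized series stored in the cloud, which this sketch does not cover.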

    C. Expressway planning

    Unlike traffic dispatching, expressway planning does not need real-time data but does need a huge amount of historical data. The cloud data centers can collect statistical data from the local servers and complete the computation using cloud computing resources.

    III. DATA STORING MODEL

    The above scenarios show that different applications have different data demands. In this section, we propose a multilevel model for storing the sensor data of a very large scale sensor network.

    In our model, heterogeneous sensors are distributed over a very large area. Sensors that are close in space are grouped into a sensor network. Each group has coordinators and gateways, which are responsible for collecting raw data, storing them in local storage, and acting as access points to the wide-area networks. One or more sensor networks are connected to a local server, which provides local back-up storage and computing ability. Several clouds work above these local servers and fetch the data that applications need.

    Raw sensor data are adjusted at the sensor nodes and gateways. The metadata of a sensor network are stored at its gateway. These metadata usually include sensor type, node type (as a node may contain several different sensors), node MAC address, node ID, node location, sensor ID and so on. The local servers gather data from the gateways and store them in a local database. As different sensor networks have different internal data formats, the local servers have to transform these data into the same format and provide uniform access. The sensor ID is a good example. To save energy, short IDs are used inside a small sensor network, but to distinguish sensor nodes in the large network, short IDs must be transformed into globally unique IDs. The metadata of the sensor networks and local servers are used in this transformation. Historical data are stored on the local servers. Clouds fetch the data of interest from local servers for different applications and store them in the data centers. Clouds also provide platform and software resources for these applications. Data sent by different data storage nodes can be aggregated in order to reduce redundancy and minimize network traffic load.
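The short-ID transformation can be sketched as follows. This is only an illustration: the paper does not give the ID format, so the `server/network/short-id` layout, the `NetworkMetadata` class and both function names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkMetadata:
    """Hypothetical metadata a local server holds about one attached sensor network."""
    server_id: str
    network_id: str


def to_global_id(meta: NetworkMetadata, short_id: int) -> str:
    """Build a globally unique sensor ID from network metadata and a short in-network ID."""
    return f"{meta.server_id}/{meta.network_id}/{short_id:04d}"


def normalize(meta, raw_readings):
    """Rewrite (short_id, value) pairs from one network into a uniform record format."""
    return [
        {"sensor_id": to_global_id(meta, short_id), "value": value}
        for short_id, value in raw_readings
    ]


net_a = NetworkMetadata(server_id="srv1", network_id="wsn1")
net_b = NetworkMetadata(server_id="srv1", network_id="wsn2")
# The same short ID 7 is used in both networks but maps to distinct global IDs.
records = normalize(net_a, [(7, 21.5)]) + normalize(net_b, [(7, 19.0)])
```

Prefixing with the server and network identifiers keeps the in-network IDs short, which is the energy argument made above, while still letting the cloud tell nodes apart.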

    Figure 1. Multilevel data storing model

    IV. DATA ACCESS MODEL

    Different applications require different sensor data, which are stored in different places. We need a method to bind these data sets to applications. In this section we propose a uniform data access model. We first analyse four important properties of sensor data.

    A. Space

    Sensor data are sampled at a certain location, and applications need to access the sensor data of certain areas or locations. The space information of sensor data can be specified by longitude, latitude and height. Some relevant space information is also attached to the data if the data relate to a special and meaningful object. For an RFID sensor used to monitor passing vehicles at the entrance of an expressway, for example, we can attach the entrance ID to this space data.

    B. Time

    Sensor data are sampled at a certain time, and different applications need sensor data from different times. Real-time applications use real-time data, while other applications need historical data.

    C. Real sensor data and virtual sensor data

    We divide the data into two categories: real sensor data and virtual sensor data. When we use an RFID sensor to track passing vehicles, we need real data sampled from the sensor. But when we want to know the temperature at some point, there may not be a temperature sensor exactly at this point; we can deduce the value from the nearby sensors, and this is virtual data. So real sensor data are data that can be obtained directly from a sensor, and virtual sensor data are data deduced from related data.
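One possible way to deduce a virtual reading from nearby sensors is inverse distance weighting. The paper does not say which deduction method it uses, so the choice of method, the function name `virtual_reading` and the flat (x, y) coordinates below are all assumptions for illustration.

```python
import math


def virtual_reading(target, real_sensors, power=2.0):
    """Estimate a value at `target` (x, y) from nearby real sensors.

    `real_sensors` is a list of ((x, y), value) pairs. Closer sensors get
    larger weights (inverse distance weighting).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for position, value in real_sensors:
        distance = math.dist(target, position)
        if distance == 0.0:
            return value  # a real sensor sits exactly at the target point
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no real sensors available")
    return weighted_sum / weight_total


# Two hypothetical temperature sensors, equally far from the queried point.
sensors = [((0.0, 0.0), 20.0), ((2.0, 0.0), 24.0)]
midpoint_temperature = virtual_reading((1.0, 0.0), sensors)
```

The deduction method chosen here directly affects how precise the virtual value is.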

    D. Data precision

    The precision of sensor data is mainly affected by two factors: the precision of the sensor for real sensor data, and the method used to deduce the virtual sensor data.

    Figure 2. Uniform data access model

    To provide uniform access to the sensor data, we must consider these properties. In our model the local server is responsible for describing the data access interface. The specifications of the data access interfaces are stored on a global server. Through this global server, the clouds know where to get the data they are interested in and how to deploy applications. Figure 2 shows this architecture.

    In our model, we use a nested rectangular structure to describe the space property of a sensor data set. Figure 3 shows an example of a local server area. This server is connected to two networks: sensor network 1 and sensor network 2. The outer, larger rectangle represents the local server area, and it contains two smaller rectangles which represent the sensor network 1 area and the sensor network 2 area respectively.
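A nested rectangular structure of this kind can be sketched in a few lines of Python. The `Area` class, its field names and the coordinates are hypothetical; the sketch only shows how a point query can descend from the local server area into the sensor network area that covers it.

```python
from dataclasses import dataclass, field


@dataclass
class Area:
    """A named rectangle in (longitude, latitude) that may contain smaller areas."""
    name: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    children: list = field(default_factory=list)

    def contains(self, lon, lat):
        return self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat

    def locate(self, lon, lat):
        """Return the names of the nested areas covering the point, outermost first."""
        if not self.contains(lon, lat):
            return []
        path = [self.name]
        for child in self.children:
            inner = child.locate(lon, lat)
            if inner:
                path.extend(inner)
                break
        return path


# Made-up coordinates: one local server area enclosing two sensor network areas.
server_area = Area("local server", 0, 0, 10, 10, children=[
    Area("sensor network 1", 1, 1, 4, 4),
    Area("sensor network 2", 6, 6, 9, 9),
])
```

For example, `server_area.locate(2, 2)` walks into the first inner rectangle, while a point between the two networks resolves only to the local server area.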

    Figure 3. Uniform data access model (nested rectangles labelled local server area and sensor network area)

    We also use a uniform method to describe the time attribute of every sensor data set. If a sensor data set is described by [start time, end time], the set contains historical sensor data sampled between the start time and the end time. [start time, now) means the set contains sensor data from the start time to now, as well as the real-time data. [now, now] means the set contains only real-time sensor data.
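The three time descriptors can be modelled with a small value type. This is a sketch under assumptions: the class name `TimeRange`, the use of `None` to stand for the symbol `now`, and the closed-interval test in `covers` are illustrative choices, not the paper's format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeRange:
    """Time attribute of a sensor data set; None stands for the symbol `now`."""
    start: Optional[float]  # seconds since the epoch, or None for `now`
    end: Optional[float]    # seconds since the epoch, or None for `now`

    def has_history(self):
        return self.start is not None

    def has_realtime(self):
        return self.end is None

    def covers(self, timestamp, now):
        """True if a sample taken at `timestamp` belongs to the set at time `now`."""
        start = now if self.start is None else self.start
        end = now if self.end is None else self.end
        return start <= timestamp <= end


history_only = TimeRange(start=100.0, end=200.0)      # [start time, end time]
history_and_live = TimeRange(start=100.0, end=None)   # [start time, now)
live_only = TimeRange(start=None, end=None)           # [now, now]
```

An application that needs only real-time data could filter the registered data sets with `has_realtime()`, and a planning application with `has_history()`.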

    Figure 4 shows a simple XML format description of the data access interface for the above example.

    Figure 4. XML format description for data access interface

    V. APPLICATION MODEL

    Many sensor network applications have a large amount of data to deal with. Google's map-reduce [8] and Hadoop [9] are effective tools for massive data computations. Figure 5 shows the principle of map-reduce.

    Briefly, map-reduce uses a map function to map the data stored in files into key-value pairs. All the produced pairs are routed by a master controller to one of several reduce processes, and all the pairs with the same key wind up at the same reduce process. The reduce processes use a reduce function to combine the values associated with one key to produce a single result for that key. The master is a controller that monitors the map and reduce processes and is able to redo them if a process fails.
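The map, route-by-key and reduce stages described above can be shown in a single-process Python sketch. It is not Hadoop's or Google's API; `map_reduce`, `map_reading` and `reduce_mean` are hypothetical names, and the master's monitoring and retry of failed processes is left out.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map, group pairs by key (the routing step), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_reading(record):
    """Emit one (sensor type, value) pair per input record."""
    sensor_type, value = record
    yield sensor_type, value


def reduce_mean(key, values):
    """Combine all values that share a key into their mean."""
    return sum(values) / len(values)


readings = [("temperature", 20.0), ("humidity", 55.0), ("temperature", 24.0)]
result = map_reduce(readings, map_reading, reduce_mean)
```

In a real deployment the groups would be spread over several reduce processes; here the dictionary plays that role so the data flow stays visible.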

    Figure 5. Map-reduce principle

    The map-reduce model is a natural way to implement data-intensive applications in parallel. But when it is used in very large scale sensor networks, there are still some drawbacks. (1) In the map-reduce model the input data are stored in files where the master can easily get them. But in large sensor networks, the data are stored in different places with different formats. (2) Sensor network applications usually need to deal with real-time data. The time delay of map-reduce may be intolerable for real-time applications. (3) A key-value map is not enough to describe sensor data sets. The space and time attributes of sensor data give natural hints of how to map the data.

    To solve these problems, we propose a new parallel computation model for sensor networks. Figure 6 shows this model.

    Figure 6. New computation model for sensor network

    The steps of this new model are as follows. (1) The application submits its data access requirements to the data access server. (2) The data access server returns the data access description to the application. (3) According to the data access description, the application program is deployed to the workers in the application containers of the local servers. (4) The workers on the local servers bind data to the application and run the map function. (5) The workers on the local servers push data to the workers on the cloud. (6) The workers on the cloud store the data in the cloud data centers, run the reduce function and write the result to the output. The master monitors the map and reduce processes and tries to redo them if a process fails.
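The six steps can be traced in a compact in-memory sketch. All class and function names here (`DataAccessServer`, `LocalServer`, `CloudWorker`, `run_application`) are hypothetical stand-ins for the components named in the text, deployment is reduced to passing a function, and the master's failure handling is omitted.

```python
from collections import defaultdict


class DataAccessServer:
    """Steps 1-2: answers a data access request with the local servers holding the data."""

    def __init__(self):
        self._catalog = defaultdict(list)  # data type -> local servers

    def register(self, data_type, local_server):
        self._catalog[data_type].append(local_server)

    def lookup(self, data_type):
        return list(self._catalog[data_type])


class LocalServer:
    """Steps 3-5: hosts the deployed map function, binds local data, pushes pairs upward."""

    def __init__(self, name, data):
        self.name = name
        self.data = data  # data type -> list of locally stored values

    def run_map(self, data_type, map_fn, cloud_worker):
        for value in self.data.get(data_type, []):
            for key, mapped in map_fn(self.name, value):
                cloud_worker.push(key, mapped)


class CloudWorker:
    """Step 6: stores pushed pairs, then reduces each key to a single result."""

    def __init__(self):
        self.store = defaultdict(list)

    def push(self, key, value):
        self.store[key].append(value)

    def run_reduce(self, reduce_fn):
        return {key: reduce_fn(values) for key, values in self.store.items()}


def run_application(access_server, data_type, map_fn, reduce_fn):
    cloud = CloudWorker()
    for server in access_server.lookup(data_type):   # steps 1-2
        server.run_map(data_type, map_fn, cloud)     # steps 3-5
    return cloud.run_reduce(reduce_fn)               # step 6


access = DataAccessServer()
access.register("humidity", LocalServer("srv1", {"humidity": [40, 50]}))
access.register("humidity", LocalServer("srv2", {"humidity": [70]}))
counts = run_application(
    access, "humidity",
    map_fn=lambda server, value: [(server, value)],
    reduce_fn=len,
)
```

The map function runs where the data live and only its output travels to the cloud, which is the main difference from the file-based map-reduce described earlier.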

    Several changes have been introduced in this model. Application containers on the local servers provide the execution environment for the map functions. They also bind data to the local workers. Real-time demands can be satisfied by local computation. Local workers push data to the workers on the cloud, so the cloud can allocate enough computing resources to deal with the data as required.

    In the vehicle tracking scenario, for example, the application first looks up all the useful sensor data sources from the data access server. The data sources may include RFID signals, images or videos. The program is deployed to the application containers on the local servers according to the lookup result. The data sources are bound to the application by the application containers. Different methods are used to deal with different data types: simple ID comparison is used for RFID signals, and image analysis techniques are used for images and videos. The results are pushed to different workers on the cloud. These cloud workers are distinguished by the object they track, so each object's movement track can be synthesized on the cloud.
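For the RFID case, the local comparison and the cloud-side track synthesis might look like the sketch below. The tag values, location names and both function names are made up for illustration, and image or video analysis is not covered.

```python
from collections import defaultdict


def match_sightings(local_server, rfid_reads, wanted_tags):
    """Local step: simple ID comparison of RFID reads against the tracked tags.

    `rfid_reads` is a list of (timestamp, tag) pairs seen at this location.
    """
    return [
        (tag, timestamp, local_server)
        for timestamp, tag in rfid_reads
        if tag in wanted_tags
    ]


def synthesize_tracks(sightings):
    """Cloud step: group sightings by tracked object and order them by time."""
    tracks = defaultdict(list)
    for tag, timestamp, location in sightings:
        tracks[tag].append((timestamp, location))
    return {tag: [loc for _, loc in sorted(points)] for tag, points in tracks.items()}


wanted = {"TAG-42"}
# Sightings pushed by two local servers, arriving out of time order.
pushed = (
    match_sightings("exit-9", [(300, "TAG-42"), (310, "TAG-07")], wanted)
    + match_sightings("entrance-2", [(100, "TAG-42")], wanted)
)
tracks = synthesize_tracks(pushed)
```

Only matching reads leave the local server, so untracked vehicles generate no wide-area traffic.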

    In the expressway planning scenario, the workers on the local servers are responsible for collecting statistical data from the local servers. These historical data are stored in the cloud data centers. The workers on the cloud analyse these data to produce the planning result.

    VI. PLATFORM TO VERIFY THE MODEL

    To verify the effectiveness of this model, we have set up a platform using open source software.

    We have used five types of sensor: temperature, light, humidity, location and carbon dioxide density sensors. We set up four WSNs (wireless sensor networks) based on the WiFi and ZigBee protocols. Four Mini6410 ARM11 embedded gateways are used to collect data from the sensor networks. Each of two local servers connects to two gateways. Data are formatted on the local servers, and the uniform access descriptions are sent to a global data access server. We also set up two cloud computation environments based on the Cassandra and MongoDB distributed databases. Different databases, including SQLite, TinyDB, MySQL, MongoDB and Cassandra, are used on different types of devices to store data. SQLite and TinyDB are used on gateways to store the metadata of the sensor networks. MySQL is used on local servers to store sensor data and historical data. MongoDB and Cassandra are used to store application-related data.
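As an illustration of the gateway level, the following sketch stores sensor network metadata in SQLite through Python's standard `sqlite3` module. The table name, column names and sample rows are assumptions; the paper lists the kinds of metadata kept at the gateway but not a schema.

```python
import sqlite3

# In-memory database for illustration; a gateway would use a file on local storage.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sensor_metadata (
        sensor_id   INTEGER PRIMARY KEY,
        node_id     INTEGER NOT NULL,
        node_mac    TEXT NOT NULL,
        node_type   TEXT NOT NULL,
        sensor_type TEXT NOT NULL,
        location    TEXT NOT NULL
    )
    """
)
# Made-up rows: node 10 carries two sensors, node 11 carries one.
rows = [
    (1, 10, "00:11:22:33:44:55", "zigbee-end-device", "temperature", "room A"),
    (2, 10, "00:11:22:33:44:55", "zigbee-end-device", "humidity", "room A"),
    (3, 11, "00:11:22:33:44:66", "wifi-node", "light", "room B"),
]
conn.executemany("INSERT INTO sensor_metadata VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()

# One node may carry several sensors, so look sensors up by node ID.
sensors_on_node_10 = [
    row[0]
    for row in conn.execute(
        "SELECT sensor_type FROM sensor_metadata WHERE node_id = ? ORDER BY sensor_id",
        (10,),
    )
]
```

An embedded single-file database of this kind suits a resource-constrained gateway because it needs no separate server process.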

    We have developed some cloud applications for this platform. In one application, we track the highest temperature. Workers on the local servers report the highest temperature of their associated WSNs to the worker on the cloud. The worker on the cloud tracks the node with the highest temperature and shows its information. In another application, temperature, humidity, light and carbon dioxide density data are collected by four workers on the local servers, one worker per WSN. Four workers on the cloud analyse these data and output the environmental indexes of the four areas.
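The highest-temperature application reduces to a two-level maximum, which a short sketch makes concrete. The function names, node names and readings are hypothetical; only the division of work between local workers and the cloud worker follows the text.

```python
def local_highest(wsn_name, readings):
    """Local worker: report the hottest node of one WSN as (temperature, wsn, node)."""
    node, temperature = max(readings.items(), key=lambda item: item[1])
    return temperature, wsn_name, node


def cloud_highest(reports):
    """Cloud worker: keep the hottest node among the per-WSN reports."""
    temperature, wsn_name, node = max(reports)
    return {"wsn": wsn_name, "node": node, "temperature": temperature}


# One report per WSN; readings map node name to temperature in degrees Celsius.
reports = [
    local_highest("wsn1", {"n1": 21.0, "n2": 23.5}),
    local_highest("wsn2", {"n1": 26.0}),
    local_highest("wsn3", {"n1": 19.5, "n2": 20.0}),
    local_highest("wsn4", {"n1": 24.0}),
]
hottest = cloud_highest(reports)
```

Each WSN sends a single tuple to the cloud regardless of how many nodes it contains.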

    The results show that the platform based on the new computation model is flexible, makes new applications easy to develop, and is energy-balanced for sensor nodes.

    VII. CONCLUSIONS

    In this paper, we argue that the traditional map-reduce computation model is not enough for large sensor networks, as the data of a very large scale sensor network are polymorphous, heterogeneous, large in quantity and time-limited, and there are many real-time application demands. So we propose a new computation model for very large sensor networks, which introduces a multilevel storage model, uniform data access and local application containers.

    But there are still many problems worth exploring. We list some of them as follows: (1) How to give a uniform description of a sensor data set precisely and effectively. (2) The data structure used to describe the data access information on the data access server. (3) How to describe the data precision property. (4) How to bind the data sources to the workers in the local application container.

    REFERENCES

    [1] Akyildiz I F, Su W, Sankarasubramaniam Y, Cayirci E. Wireless sensor networks: A survey [J]. Computer Networks, 2002, 38(4): 393-422.

    [2] Ren Fengyuan, Huang Haining, Lin Chuang. Wireless sensor networks [J]. Journal of Software, 2003, 14(2): 1148-1157.

    [3] Amazon Elastic Compute Cloud (EC2), available online: http://aws.amazon.com/ec2/, accessed July 2010.

    [4] SliceHost Cloud Services, available online: http://www.slicehost.com/

    [5] Google App Engine, available online at: http://code.google.com/appengine/

    [6] Sales Force, available online at: http://www.salesforce.com/platform

    [7] A. Dubey and D. Wagle, "Delivering software as a service," The McKinsey Quarterly, May 2007.

    [8] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Comm. ACM, vol. 51, no. 1, pp. 107-113, 2008.

    [9] Apache, "Hadoop," http://hadoop.apache.org, 2006.
