
    Distributed Data Stream Classification for Wireless Sensor Networks

    Mohamed Medhat Gaber
    Centre for Distributed Systems and Software Engineering
    Monash University
    [email protected]

    Ary Mazharuddin Shiddiqi
    Clayton School of Information Technology
    Monash University
    [email protected]

    ABSTRACT

    It has been established experimentally that in-network processing is the acceptable mode of operation in wireless sensor networks. However, this solution is limited by the resource constraints of the sensor nodes, especially when running traditional data mining techniques, which tend to consume resources rapidly. Data stream mining algorithms, in turn, still fall short given the limited computational capabilities of the nodes: they need to adapt in real time to the availability of resources. Distributed processing is also essential to produce a global model of the data streams emanating from the network. In this paper, we propose a novel distributed data stream classification technique that adapts to the availability of resources in wireless sensor networks.

    1. INTRODUCTION

    Mining data streams in wireless sensor networks has many important scientific and security applications. However, the realization of such applications faces two main constraints. The first is that sensor nodes are battery powered, so running applications must have a low battery footprint. Consequently, in-network data processing is the acceptable solution. The second is the resource constraints of each node in the network, such as memory and processing power [2].

    Many applications in wireless sensor networks require event detection and classification. Unsupervised learning techniques have recently been proposed for this problem. Although these methods are applicable, they do not address running on resource-constrained computing environments by adapting to the availability of resources. The problem has instead been addressed by proposing lightweight techniques. However, a lightweight technique may still cause the sensor node to stop processing when resource availability is low: experimental results have shown that typical stream mining algorithms can cause the device to shut down. In addition, unsupervised learning may fail to detect events of interest, because it can produce impure clusters that contain instances of two or more classes.

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC'10 March 22-26, 2010, Sierre, Switzerland. Copyright 2010 ACM 978-1-60558-638-0/10/03 ...$10.00.

    In this paper, we propose the use of distributed classification of data streams in wireless sensor networks for event detection and classification. The proposed technique can adapt to the availability of resources and works in a distributed setting using ensemble classification. The technique is named RA-Class in reference to its resource-awareness capabilities. Experimental results on a real dataset show high accuracy while the technique adapts to resource availability.

    The rest of the paper is organized as follows. Section 2 briefly reviews the related work. The proposed technique is given in Section 3. Section 4 discusses the experimental results. Finally, the paper is concluded in Section 5.

    2. RELATED WORK

    The resource-adaptive framework proposed by Gaber and Yu [3] is the closest work to the research reported in this paper. Our research uses that framework to adapt to variations in resource availability on a single node. The framework uses three settings that are adjusted in response to resource availability during the mining process. The input settings are termed Algorithm Input Granularity (AIG); the output and processing settings are termed Algorithm Output Granularity (AOG) and Algorithm Processing Granularity (APG), respectively. The input settings include sampling, load shedding, and creating data synopses. The output settings include the knowledge structures created, or the levels of output granularity. Changing the error rate of approximation algorithms, or using randomization, represents the processing granularity. The three settings are collectively termed Algorithm Granularity Settings (AGS).
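As a rough illustration of how the three granularity settings might respond to resource availability, consider the minimal Python sketch below. It is not the framework of [3] itself: the class name `GranularitySettings`, the function `adapt`, the pairing of each setting with one resource (battery, CPU, memory), the `low` cut-off and the step sizes are all assumptions made for this example.

```python
from dataclasses import dataclass, replace


@dataclass
class GranularitySettings:
    """Hypothetical container for the three Algorithm Granularity Settings."""
    sampling_interval: int = 1          # AIG: process every n-th record
    randomization_factor: float = 1.0   # APG: fraction of stored entries examined
    threshold: float = 0.5              # AOG: larger radius, fewer output entries


def adapt(settings, battery, cpu, memory, low=0.3):
    """Return adjusted settings for resource levels given in [0, 1].

    Assumed rule: when a resource falls below `low`, coarsen the setting
    paired with it. Neither the pairing nor the step sizes come from [3].
    """
    new = replace(settings)  # copy, so the caller's settings are untouched
    if battery < low:
        new.sampling_interval += 1
    if cpu < low:
        new.randomization_factor = max(0.1, new.randomization_factor * 0.5)
    if memory < low:
        new.threshold *= 1.5
    return new
```

For example, `adapt(GranularitySettings(), battery=0.2, cpu=0.9, memory=0.9)` only increases the sampling interval and leaves the other two settings unchanged.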

    3. ADAPTIVE CLASSIFICATION IN WIRELESS SENSOR NETWORKS

    RA-Class follows a procedure similar to LWClass, proposed by Gaber et al. [2]. However, RA-Class extends LWClass in two respects:

    • RA-Class uses all the algorithm granularity settings (input, processing and output), whereas LWClass uses only the algorithm output granularity.

    • RA-Class works in a distributed environment using an ensemble approach, whereas LWClass is designed for centralized processing.

    The algorithm starts by examining each incoming streaming record and determines whether the record will be assigned to a specific stored entry or stored as a new entry. An update function changes the settings of the algorithm when needed. In the proposed RA-Class, three settings can be adjusted: the sampling interval, the randomization factor and the threshold value.

    The result of the RA-Class algorithm is a list of entries, each associated with a class label and a weight. The weight is the number of data stream records represented by that entry.
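The per-record step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `update_entries`, the dictionary layout of an entry, the use of Euclidean distance, the restriction to same-class entries, and the running-mean update of the entry centre are all hypothetical choices. The text only states that a record is either assigned to a stored entry or stored as a new one, and that the weight counts the records an entry represents.

```python
import math


def update_entries(entries, record, label, threshold):
    """Absorb a labeled record into the closest same-class entry, or add a new entry.

    Each entry is a dict with keys 'centre', 'label' and 'weight'.
    The running-mean merge is an assumption made for this sketch.
    """
    best, best_dist = None, math.inf
    for entry in entries:
        if entry["label"] != label:
            continue
        dist = math.dist(entry["centre"], record)
        if dist < best_dist:
            best, best_dist = entry, dist

    if best is not None and best_dist <= threshold:
        w = best["weight"]
        # Move the centre towards the new record and count the record in the weight.
        best["centre"] = [(c * w + x) / (w + 1) for c, x in zip(best["centre"], record)]
        best["weight"] = w + 1
    else:
        entries.append({"centre": list(record), "label": label, "weight": 1})
    return entries
```

Raising `threshold` makes entries absorb more records, so the list stays shorter, which is one way a threshold setting could trade accuracy for memory.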

    In a distributed environment, one of the nodes may run out of battery. We therefore use a mechanism that handles this scenario by preserving the most recent list of stored entries produced during the data stream classification process.
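The paper does not spell out this mechanism, so the sketch below is only one plausible reading: when a node's battery falls below a critical level, it hands its entry list to a neighbour. The names `select_neighbour` and `migrate`, the node dictionaries, the `critical` level, the choice of the neighbour with the most remaining battery, and merging by simple concatenation are all assumptions made for illustration.

```python
def select_neighbour(neighbours):
    """Pick the neighbour with the most remaining battery (assumed criterion)."""
    return max(neighbours, key=lambda node: node["battery"])


def migrate(dying, neighbours, critical=0.1):
    """Hand the dying node's entries to a neighbour once its battery is critical.

    Returns the receiving node, or None if no migration took place.
    """
    if dying["battery"] > critical or not neighbours:
        return None
    target = select_neighbour(neighbours)
    target["entries"].extend(dying["entries"])  # merge by concatenation (assumption)
    dying["entries"] = []
    return target
```

A real merge would probably combine nearby entries of the same class rather than concatenate the lists, but that detail is not given in the text.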

    To decide a class label for an unlabeled streaming record in a distributed environment, each RA-Class node first finds a label using its own knowledge. The labeling technique used by the node needs to determine the closest entry efficiently. In this research, we use the K-NN algorithm with K = 2 for fast classification of streaming inputs. We then use an ensemble approach to classify any unlabeled streaming record: each node contributes to the election by giving a vote to each class label, while also providing an error rate. The error rate states the assurance level of a vote.
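A minimal sketch of the two stages, local K-NN labeling and ensemble voting, is given below. Several details are assumptions, since the text does not specify them: entries are dictionaries with a 'centre' and a 'label', the two nearest entries vote with weight 1/distance, and a node's vote counts as `1 - error_rate` in the election. The function names `knn_label` and `ensemble_label` are hypothetical.

```python
import math
from collections import defaultdict


def knn_label(entries, record, k=2):
    """Label a record from its k closest entries, weighting each by 1/distance."""
    nearest = sorted(entries, key=lambda e: math.dist(e["centre"], record))[:k]
    scores = defaultdict(float)
    for entry in nearest:
        # The small constant avoids division by zero for an exact match.
        scores[entry["label"]] += 1.0 / (math.dist(entry["centre"], record) + 1e-9)
    return max(scores, key=scores.get)


def ensemble_label(votes):
    """Combine (label, error_rate) votes; a vote counts as 1 - error_rate (assumed)."""
    totals = defaultdict(float)
    for label, error_rate in votes:
        totals[label] += 1.0 - error_rate
    return max(totals, key=totals.get)
```

With this weighting, two moderately confident votes for one class can outweigh a single confident vote for another: `ensemble_label([("a", 0.4), ("b", 0.1), ("a", 0.3)])` returns `"a"`.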

    4. EXPERIMENTAL EVALUATION

    We ran our experiments on Sun SPOT sensor nodes from Sun Microsystems using the SunSPOT API version 3.0. To evaluate the performance of our algorithms, we conducted a set of experiments to assess accuracy. We used the Iris dataset from the UCI Machine Learning Repository [1]. The dataset contains 150 entries with 4 attributes.

    The main goal is to test the validity of RA-Class in a distributed environment on real Sun SPOT devices. We use three nodes that run RA-Class and then apply the ensemble approach for the classification process. We divided the dataset into three disjoint subsets of equal size. We then simulated 1500 data stream records drawn randomly from each subset to feed each node. We used the Sun SPOT LEDs to indicate the ongoing process, as shown in Figure 1.
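The data preparation described above might look like the following sketch. It makes several assumptions: integer indices stand in for the 150 Iris entries, drawing is done with replacement (each subset has only 50 entries), 1500 is taken as the stream length per node, and a fixed seed is used only for repeatability. The helper names `split_disjoint` and `simulate_stream` are hypothetical.

```python
import random


def split_disjoint(items, parts, rng):
    """Shuffle `items` and cut them into `parts` disjoint, equally sized subsets."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    size = len(shuffled) // parts
    return [shuffled[i * size:(i + 1) * size] for i in range(parts)]


def simulate_stream(subset, length, rng):
    """Draw `length` records at random, with replacement, from one subset."""
    return [rng.choice(subset) for _ in range(length)]


rng = random.Random(0)           # fixed seed, for repeatability only
indices = range(150)             # stand-in for the 150 Iris entries
subsets = split_disjoint(indices, 3, rng)
streams = [simulate_stream(subset, 1500, rng) for subset in subsets]
```

Each node would then consume one element of `streams`, so no two nodes ever see the same original entry.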

    After the classification process, we tested the accuracy of RA-Class using 15 randomly selected entries from the Iris dataset. This test is done on a single node functioning as a testing node. The final entry lists of the remaining two nodes are transferred to the testing node, and the accuracy test is then performed using the ensemble algorithm. This arrangement is used only for efficiency, so that the accuracy testing is easier to observe. Over ten repetitions of the experiment, the distributed RA-Class achieved 88.0% accuracy over the total entries tested. The result shows that, in a real distributed environment, the distributed RA-Class still produces a better result than a single node running RA-Class. This is due to the ensemble algorithm, which raises the accuracy level.

    Figure 1: Distributed RA-Class on Sun SPOT. (a) Classification process; (b) Critical situation and neighbour selection; (c) Migration process; (d) Sleeping mode.

    The above experiment does not consider the migration and merging processes, which clearly affect accuracy. To measure the accuracy when migration and merging take place, we conducted the same experiment with two nodes contributing to the classification and one dying node. The distributed RA-Class achieved 84.67% accuracy on average over ten different runs of the experiment.
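As a sanity check on the reported figures, ten runs of 15 test entries give 150 test classifications in total. The counts of correct classifications used below (132 and 127) are not reported in the paper; they are inferred here on the assumption that accuracy is pooled over all 150 classifications, and the helper `accuracy_percent` is hypothetical.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage of correctly classified test entries."""
    return 100.0 * correct / total


runs, tests_per_run = 10, 15
total = runs * tests_per_run  # 150 test classifications

for correct in (132, 127):    # inferred counts, not reported in the paper
    print(f"{correct}/{total} = {accuracy_percent(correct, total):.2f}%")
```

Under that assumption, 132 correct classifications correspond to 88.00% and 127 to 84.67%, so migration and merging would cost about five additional misclassifications out of 150.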

    5. CONCLUSION

    This paper explored the validity of an adaptive classification technique, which we termed Resource-Aware Classification (RA-Class), for processing data streams in wireless sensor networks. The algorithm has been tested on a real testbed using Sun SPOT sensor nodes from Sun Microsystems. The results show high accuracy while adapting to the scarce availability of resources.

    6. REFERENCES

    [1] Asuncion, A., and Newman, D. J. (2007). UCI Machine Learning Repository [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, School of Information and Computer Science.

    [2] Gaber, M. M., Krishnaswamy, S., and Zaslavsky, A., On-board Mining of Data Streams in Sensor Networks, in Advanced Methods of Knowledge Discovery from Complex Data, (Eds.) Sanghamitra Bandyopadhyay, Ujjwal Maulik, Lawrence Holder and Diane Cook, pp. 307-335, Springer Verlag, 2005.

    [3] Gaber, M. M., and Yu, P. S., A Holistic Approach for Resource-aware Adaptive Data Stream Mining, Journal of New Generation Computing, Volume 25, Number 1, November 2006, pp. 95-115, Ohmsha, Ltd., and Springer Verlag.

    [4] Phung, N. D., Gaber, M. M., and Rohm, U., Resource-aware Online Data Mining in Wireless Sensor Networks, Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2007, pp. 139-146, IEEE Press.
