SENSEYE: A MULTI-TIER HETEROGENEOUS CAMERA
SENSOR NETWORK
A Dissertation Presented
by
PURUSHOTTAM KULKARNI
Submitted to the Graduate School of the
University of Massachusetts Amherst in partial fulfillment
of the requirements for the degree of
DOCTOR OF PHILOSOPHY
February 2007
Computer Science
© Copyright by Purushottam Kulkarni 2007
All Rights Reserved
SENSEYE: A MULTI-TIER HETEROGENEOUS CAMERA
SENSOR NETWORK
A Dissertation Presented
by
PURUSHOTTAM KULKARNI
Approved as to style and content by:
Prashant Shenoy, Co-chair
Deepak Ganesan, Co-chair
James Kurose, Member
Mark Corner, Member
C. Mani Krishna, Member
W. Bruce Croft, Department Chair
Computer Science
To Nanna, Aai and Pappa.
ACKNOWLEDGMENTS
My six-year stay at Amherst for my doctoral degree has been a memorable experience—
made possible by a lot of people. I am immensely grateful to you all.
First and foremost, I am grateful to the Computer Science Department at the University of Massachusetts, Amherst, which gave me an opportunity to pursue the Ph.D. program.
I am indebted to my advisers Prof. Prashant Shenoy and Prof. Deepak Ganesan. Prashant
provided valuable guidance and mentoring throughout my stay at Amherst. I am also grate-
ful to Deepak for advising me for my dissertation. I have learnt several aspects of research
and teaching from both, which I hope to follow.
I would like to thank my thesis committee members, Prof. Jim Kurose, Prof. C. Mani
Krishna and Prof. Mark Corner, for agreeing to be part of my dissertation committee and
for their feedback.
I took several classes at the University of Massachusetts, Amherst and at the University
of Minnesota, Duluth, which not only taught me important subjects but also the importance
of teaching. I am thankful to all my teachers for this experience.
Prof. Rich Maclin, my Master's thesis adviser at the University of Minnesota, Duluth,
encouraged me to pursue the Ph.D. program, and I will always be grateful for his advice.
I am grateful to the Computer Science Department Staff, including Sharon Mallory,
Pauline Hollister and Karren Sacco, for helping with all the administrative matters. Thanks
to Tyler Trafford for helping with all the computer hardware and equipment issues.
While at Amherst, I was fortunate to meet and make a lot of friends, who made my stay
memorable. My sincere thanks to all my friends, Sudarshan Vasudevan, Hema Raghavan,
Ramesh Nallapati, Bhuvan Urgaonkar, Nasreen Abdul-Jaleel, Sharad Jaiswal, Kausalya
Murthy, Swati Birla, Preyasee Kamath, Sreedhar Bunga, Rati Sharma, Sourya Ray, Koushik
Dutta, Ambarish Karmalkar, Smita Ramnarian, Satyanarayan Ray Pitambar Mohapatra,
Neil Naik, Stephanie Jo Kent, Hema Dave, Dheeresh Mamidi, Pranesh Venugopal and
many others. Special thanks to Tejal Kanitkar, Ashwin Gambhir, Ashish Deshpande and
Anoop George Ninan who helped and motivated me in several ways to complete the Ph.D.
program. I will forever cherish the wonderful memories with you all.
Lastly, I am grateful to my parents, Nanna, Aai and Pappa, my brother, Dhananjay, and
sister, Renuka, for their support, encouragement and patience. Thank you for everything.
ABSTRACT
SENSEYE: A MULTI-TIER HETEROGENEOUS CAMERA
SENSOR NETWORK
FEBRUARY 2007
PURUSHOTTAM KULKARNI
B.E., PUNE INSTITUTE OF COMPUTER TECHNOLOGY, PUNE, INDIA
M.S., UNIVERSITY OF MINNESOTA, DULUTH, MN
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST
Directed by: Professor Prashant Shenoy and Professor Deepak Ganesan
Rapid technological developments in sensing devices, embedded platforms and wireless
communication have led to a large research focus on sensor networks. Traditional sensor
networks have been designed as networks of homogeneous sensor nodes. Single-tier
networks consisting of homogeneous nodes achieve only a subset of application requirements
and often sacrifice others. In this thesis, I propose the notion of multi-tier heterogeneous
sensor networks: sensors organized hierarchically into multiple tiers. With intelligent use
of resources across tiers, multi-tier heterogeneous sensor networks have the potential to
simultaneously achieve the conflicting goals of network lifetime, sensing reliability and
functionality.
I consider a class of sensor networks, camera sensor networks: wireless networks with
image sensors. I address the automatic configuration and initialization of camera sensor
networks, as well as their design.
Like any sensor network, initialization of cameras is an important prerequisite for camera
sensor network applications. Since camera sensor networks have varying degrees of
infrastructure support and resource constraints, no single initialization procedure is
appropriate. I propose the notions of accurate and approximate initialization to initialize
cameras with varying capabilities and resource constraints. I have developed and empirically
evaluated Snapshot, an accurate calibration protocol tailored for sensor network
deployments. I have also developed approximate initialization techniques that estimate the
degree of overlap and region of overlap at each camera, and I demonstrate how these
estimates can be used to instantiate camera sensor network applications. Unlike manual
calibration, which is inefficient, error prone, and can take hours to calibrate several
cameras, the automated calibration protocol is accurate and fast: it calibrates a single
camera in tens of seconds and easily scales to calibrate several cameras in minutes. The
approximate techniques demonstrate the feasibility of initializing low-power,
resource-constrained cameras with limited or no infrastructure support.
With regard to the design of camera sensor networks, I present the design and
implementation of SensEye, a multi-tier heterogeneous camera sensor network, and address
the energy-reliability tradeoff. Multi-tier networks provide several levels of reliability
and energy usage based on the type of sensor used for each application task. Using SensEye,
I demonstrate how multi-tier networks can simultaneously achieve the system goals of
energy efficiency and reliability.
TABLE OF CONTENTS

ACKNOWLEDGMENTS ... v
ABSTRACT ... vii
LIST OF TABLES ... xiii
LIST OF FIGURES ... xiv

CHAPTER

1. INTRODUCTION ... 1
   1.1 Motivation ... 1
   1.2 Initialization of Camera Sensor Networks ... 4
   1.3 Design of Camera Sensor Networks ... 6
       1.3.1 Single-Tier Sensor Networks ... 6
       1.3.2 Multi-Tier Heterogeneous Sensor Networks ... 7
   1.4 Thesis Contributions ... 9
   1.5 Structure of the thesis ... 10

2. RELATED WORK ... 11
   2.1 System Issues ... 11
       2.1.1 Sensor Platforms ... 11
       2.1.2 Multimedia Sensor Networks ... 12
       2.1.3 Power management ... 13
       2.1.4 Heterogeneous Sensor Networks ... 13
   2.2 Initialization and Configuration ... 14
       2.2.1 Sensor Placement ... 14
       2.2.2 Sensor Initialization ... 14
       2.2.3 Camera Calibration ... 15
   2.3 Applications ... 15
       2.3.1 Video Surveillance ... 15
       2.3.2 Localization and Positioning ... 16
       2.3.3 Track Prediction ... 16

3. ACCURATE CALIBRATION OF CAMERA SENSOR NETWORKS ... 18
   3.1 Introduction ... 18
       3.1.1 Research Contributions ... 20
   3.2 Snapshot: An Accurate Camera Calibration Protocol ... 22
       3.2.1 Snapshot Design ... 22
             3.2.1.1 Camera Location Estimation ... 23
             3.2.1.2 Estimating θ1 through θ6 ... 26
             3.2.1.3 Camera Orientation Estimation ... 27
             3.2.1.4 Eliminating False Solutions ... 30
       3.2.2 Determining Visual Range and Overlap ... 31
       3.2.3 Iterative Refinement of Estimates ... 32
       3.2.4 Snapshot Protocol ... 34
   3.3 Snapshot Error Analysis ... 35
       3.3.1 CRB-based Error Analysis ... 36
       3.3.2 Empirical Error Analysis ... 40
   3.4 An Object Localization and Tracking Application ... 41
   3.5 Experimental Evaluation ... 43
       3.5.1 Experimental Setup ... 43
       3.5.2 Camera Location Estimation Accuracy ... 44
             3.5.2.1 Effect of Iteration on Estimation Error ... 44
       3.5.3 Camera Orientation Estimation Error ... 45
       3.5.4 Comparison With Lower Bound Error ... 46
       3.5.5 Sensitivity Analysis ... 47
       3.5.6 Object Localization ... 48
       3.5.7 Runtime Scalability ... 50
   3.6 Conclusions ... 50

4. APPROXIMATE INITIALIZATION OF CAMERA SENSOR NETWORKS ... 52
   4.1 Introduction ... 52
       4.1.1 Research Contributions ... 54
   4.2 Problem Formulation ... 55
   4.3 Approximate Initialization ... 56
       4.3.1 Determining the Degree of Overlap ... 56
             4.3.1.1 Estimating k-overlap ... 57
             4.3.1.2 Handling skewed reference point distributions ... 58
             4.3.1.3 Approximate Tessellation ... 59
       4.3.2 Determining the Region of Overlap ... 60
   4.4 Applications ... 62
       4.4.1 Duty-Cycling ... 62
       4.4.2 Triggered Wakeup ... 62
   4.5 Prototype Implementation ... 64
   4.6 Experimental Evaluation ... 66
       4.6.1 Simulation Setup ... 67
       4.6.2 Degree of overlap estimation ... 67
             4.6.2.1 Initialization with uniform distribution of reference points ... 67
             4.6.2.2 Initialization with skewed distribution of reference points ... 68
       4.6.3 Region of overlap estimation ... 70
       4.6.4 Implementation Results ... 72
   4.7 Conclusions ... 73

5. ENERGY-RELIABILITY TRADEOFF IN MULTI-TIER SENSOR NETWORKS ... 74
   5.1 Background and System Model ... 74
       5.1.1 Camera Sensor Network Tasks ... 74
       5.1.2 System Model ... 75
   5.2 Design Principles ... 77
   5.3 SensEye Design ... 78
       5.3.1 Object Detection ... 79
       5.3.2 Inter-Tier Wakeup ... 79
       5.3.3 Object Recognition ... 80
       5.3.4 Object Tracking ... 81
       5.3.5 Object Localization ... 81
   5.4 SensEye Implementation ... 83
       5.4.1 Hardware Architecture ... 84
       5.4.2 Software Architecture ... 85
   5.5 Experimental Evaluation ... 87
       5.5.1 Component Benchmarks ... 88
       5.5.2 Comparison of SensEye with a Single-Tier Network ... 90
             5.5.2.1 Energy Usage ... 91
             5.5.2.2 Sensing Reliability ... 92
       5.5.3 Tracking at Tier 1 and Tier 2 ... 94
       5.5.4 Coverage with Tier 3 Retargetable Cameras ... 95
       5.5.5 Sensitivity to System Parameters ... 96
   5.6 Conclusions ... 97

6. SUMMARY AND FUTURE WORK ... 98
   6.1 Automatic Accurate Calibration Protocol ... 98
   6.2 Approximate Initialization of Camera Sensor Networks ... 100
   6.3 Energy-Reliability Tradeoff in Multi-Tier Sensor Networks ... 100
   6.4 Future Work ... 101

BIBLIOGRAPHY ... 103
LIST OF TABLES

Table                                                                       Page

1.1 Type of calibration suited for different types of sensors. ... 6
2.1 Different sensor platforms and their characteristics. ... 12
2.2 Different camera sensors and their characteristics. ... 12
5.1 SensEye Tier 1 (with CMUcam) latency breakup and energy usage. Total latency is 136 ms and total energy usage is 167.24 mJ. ... 88
5.2 SensEye Tier 1 (with Cyclops) latency breakup and energy usage. ... 88
5.3 SensEye Tier 2 latency and energy usage breakup. The total latency is 4 seconds and total energy usage is 4.71 J. † This is measured on an optimized Stargate node with no peripherals attached. ... 90
5.4 Number of wakeups and energy usage of a single-tier system. Total energy usage of both Stargates when awake is 2924.9 J. Total missed detections are 5. ... 92
5.5 Number of wakeups and energy usage of each SensEye component. Total energy usage when components are awake with CMUcam is 466.8 J and with Cyclops is 299.6 J. Total missed detections are 8. ... 92
LIST OF FIGURES

Figure                                                                      Page

1.1 A typical sensor network consisting of sensors for sampling and sink nodes for data collection. ... 2
1.2 A multi-tier heterogeneous camera sensor network. ... 7
3.1 Left Handed Coordinate System. ... 23
3.2 Projection of reference points on the image plane through the lens. ... 24
3.3 Geometric representation of possible camera locations. ... 24
3.4 Vector representation of reference points and their projections. ... 25
3.5 Relationship between object location and its projection. ... 29
3.6 Polyhedron representing the visual range of the camera. ... 31
3.7 Object localization using two cameras. ... 42
3.8 Empirical CDF of error in estimation of camera location. ... 45
3.9 Effect of number of reference points on location estimation error. ... 46
3.10 Empirical CDF of error in estimating orientations with the CMUcam. ... 47
3.11 Comparison of empirical error with lower bounds with and without considering error due to Cricket. ... 48
3.12 Sensitivity of estimation to uncertainty in reference point location. ... 49
3.13 Empirical CDF of error in estimation of object's location. ... 49
3.14 Runtime of different calibration tasks. ... 50
4.1 Different degrees of overlap (k-overlap) for a camera. ... 56
4.2 k-overlap estimation with distribution of reference points. ... 57
4.3 Region of overlap estimation using reference points and Voronoi tessellation. ... 60
4.4 Estimating reference point locations without ranging information. ... 61
4.5 Region of overlap for triggered wakeup. ... 63
4.6 Setup and software architecture of prototype implementation. ... 65
4.7 Evaluation of k-overlap estimation scheme with uniform distribution of reference points. ... 68
4.8 Evaluation of weighted k-overlap estimation with skewed distribution of reference points. ... 69
4.9 Evaluation of the weighted k-overlap estimation scheme. ... 70
4.10 Region of overlap estimation and wakeup heuristic performance. ... 71
4.11 Initialization using prototype implementation. ... 72
5.1 A multi-tier SensEye hardware architecture. ... 76
5.2 Software architecture of SensEye. ... 78
5.3 3D object localization using views from two cameras. ... 81
5.4 Prototype of a Tier 1 Mote and CMUcam and a Tier 2 Stargate, webcam and a Mote. ... 85
5.5 SensEye Software Architecture. ... 86
5.6 Placement of Tier 1 Motes and Tier 2 Stargates in SensEye. ... 90
5.7 SensEye sensing reliability and coverage. ... 93
5.8 Tracking at Tier 1 and Tier 2 in SensEye. ... 94
5.9 Sensitivity to SensEye system parameters. ... 96
CHAPTER 1
INTRODUCTION
1.1 Motivation
A sensor network, a wireless network of spatially distributed sensors, senses and
monitors an environment to detect events. Several interesting sensor network applications
exist; a few examples are: (i) Surveillance and Tracking: The focus of surveillance and
tracking applications is to monitor an area of interest and report events of interest.
Surveillance and tracking applications [17, 43, 22] can use a deployment of sensors to
detect and recognize objects of interest and coordinate to track their movement. (ii)
Disaster Response: In emergency scenarios, where existing infrastructure has been damaged,
a quick deployment of sensors [32] provides valuable feedback for relief operations. (iii)
Environmental and Habitat Monitoring: Sensors deployed in forests or in remote locations
monitor environmental phenomena [56, 35] such as temperature, soil moisture, humidity,
presence of chemicals and solar radiation, which in turn can be used for recording
observations and forecasting. Sensor network applications like landslide detection and
prediction [1, 53] can be used to determine the scope of a landslide and also to predict
occurrences. Sensors have also been used in volcanoes [64] to aid in-depth study of
volcanic activity. Further, sensor nodes can be placed in the natural habitats of animals
[33, 24, 67] to study their movements and environmental conditions. (iv) Seismic Structure
Monitoring: A network of seismic sensors monitors stress levels and seismic activity of
buildings [66] and bridges. Such systems are used to detect and localize damage to a
structure and also to quantify its severity.
Figure 1.1. A typical sensor network consisting of sensors for sampling and sink nodes for data collection.
Figure 1.1 shows a typical sensor network, consisting of sensors spread within an area
of interest and sink nodes that gather events and sampled data from the sensors. The
sensors most often communicate with each other and with the sink nodes over a wireless
network. As in several of the example applications presented above, sensors are deployed
in areas with no infrastructure, such as remote forests, volcanoes, or disaster areas where
existing infrastructure has been destroyed, or on moving objects like animals. As a result,
an important characteristic of sensor networks is the unavailability of a constant power
supply; most often the nodes are battery powered, making energy-efficient operation a
primary design parameter. Battery-powered sensors, to conserve their limited energy, have
limited computation and communication capabilities. Sensor network deployments using these
resource-constrained devices have introduced a variety of interesting research challenges.
A few of the important research problems in the design of sensor networks fall into the
following categories:
• Sensor platforms: Low-power embedded platforms are required for energy-efficient
operation and to interface with sensors of different modalities based on application
needs.

• Programming tools: Programming paradigms and operating system support are required
to add and modify functionality at sensors to drive applications.

• Deployment: Techniques are required to guide placement of sensors in an area
of interest to meet coverage requirements. Further, if nodes are randomly placed,
the degree of sensing coverage needs to be inferred.

• Initialization: Once sensors are deployed they need to be initialized, either relative
to other sensors or absolutely with respect to some global reference. A few initialization
parameters that are estimated are location, orientation, clock synchronization and the
set of neighbors.

• Networking: Sensors often need to coordinate with each other to transmit useful
data to the sink or collaborate to execute application tasks. Several networking issues
arise to meet these requirements: different types of radios and their characteristics,
transmission protocols and routing algorithms.

• Data Management: Data collected at sensors has to be processed and transmitted to
the entities interested in it. Issues in data management include aggregation to reduce
volume, multi-resolution storage and archival, and efficient dissemination.
Limited resources and the lack of a continuous power supply are the two main constraints
in designing solutions to research problems in the sensor network domain.
I focus on a class of sensor networks called camera sensor networks: wireless networks
with camera sensors. A few examples of camera sensor network applications are ad-hoc
surveillance, environmental monitoring, and live virtual tours. Regardless of the end
application, camera sensors perform several common tasks such as object detection, object
recognition and tracking. The object detection task detects the appearance of objects,
recognition identifies objects of interest, and tracking follows object movements. A
characteristic of camera sensors that differentiates them from other modalities, like
temperature, acoustic and vibration sensors, is that they are directional. Each image
sensor points in a certain direction, and for a given location it can have different
orientations, resulting in different viewing regions.
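Directionality can be made concrete with a small sketch. Modeling a camera's viewing region as a 2D circular sector is an illustrative simplification (real viewing regions are 3D frusta, as in Figure 3.6); all function names and numbers below are hypothetical, not from this thesis.

```python
import math

def in_view(cam_xy, cam_heading_deg, fov_deg, max_range, point_xy):
    """Return True if point_xy lies inside the camera's 2D viewing region.

    The region is a circular sector: camera at cam_xy, pointing along
    cam_heading_deg, with angular width fov_deg and sensing radius
    max_range. (All values are illustrative.)
    """
    dx = point_xy[0] - cam_xy[0]
    dy = point_xy[1] - cam_xy[1]
    if math.hypot(dx, dy) > max_range:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest angular difference between bearing and heading, in [-180, 180].
    diff = (bearing - cam_heading_deg + 180) % 360 - 180
    return abs(diff) <= fov_deg / 2

# Two cameras at the same location but with different orientations see
# different regions: the point (5, 0) lies east of the origin.
print(in_view((0, 0), 0, 60, 10, (5, 0)))    # camera facing east -> True
print(in_view((0, 0), 180, 60, 10, (5, 0)))  # camera facing west -> False
```

The same location with two headings yields opposite answers, which is exactly the property that distinguishes cameras from omnidirectional modalities such as temperature sensors.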
As mentioned above, research issues in camera sensor networks are several: development
of energy-efficient camera sensors, deployment and configuration of nodes, task allocation
and coordination for collaborative task execution, resource allocation at each node for
various tasks, communication of images and events to interested entities, and efficient
storage and archival of image data. In this thesis, I address the issues of automatic
initialization of camera sensors with varying resource capabilities and the design of
multi-tier heterogeneous camera sensor networks to achieve the simultaneous goals of
energy efficiency and reliability.
1.2 Initialization of Camera Sensor Networks
Like any sensor network, initialization of cameras is an important prerequisite for camera
sensor network applications. Monitoring and surveillance applications using an ad-hoc
camera sensor network require initialization of the cameras with location and orientation
information. The camera calibration parameters of orientation and location are essential
for localization and tracking of detected objects. While internal calibration parameters,
like focal length, scaling factor and distortion, can be estimated a priori, the external
parameters can be estimated only after the camera sensors are placed in an environment.
Further, in cases where camera sensors have no infrastructure support and are
resource-constrained, estimating exact camera parameters may not be feasible. In spite of
such limitations, cameras need to be initialized with information to enable applications.
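To see why location and orientation are prerequisites for localization, consider a minimal 2D sketch (this is an illustration of ray intersection, not the Snapshot protocol itself): once two calibrated cameras each convert an object's image position into an absolute bearing, the object lies at the intersection of the two bearing rays. The function name and the coordinates below are hypothetical.

```python
import math

def localize(cam1, bearing1_deg, cam2, bearing2_deg):
    """Intersect two 2D bearing rays to localize an object.

    cam1/cam2 are (x, y) camera positions; each bearing is the absolute
    direction (degrees) to the object, obtained from the camera's
    calibrated orientation plus the object's position in its image.
    """
    t1, t2 = math.radians(bearing1_deg), math.radians(bearing2_deg)
    # Ray i: (x, y) = cam_i + s_i * (cos t_i, sin t_i).
    d1 = (math.cos(t1), math.sin(t1))
    d2 = (math.cos(t2), math.sin(t2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        raise ValueError("rays are parallel; these two cameras cannot triangulate")
    rx, ry = cam2[0] - cam1[0], cam2[1] - cam1[1]
    s1 = (rx * d2[1] - ry * d2[0]) / denom   # solve cam1 + s1*d1 = cam2 + s2*d2
    return (cam1[0] + s1 * d1[0], cam1[1] + s1 * d1[1])

# An object at (4, 4): camera A at the origin sees it at 45 degrees,
# camera B at (8, 0) sees it at 135 degrees.
print(localize((0, 0), 45, (8, 0), 135))
```

An error in either camera's estimated location or orientation shifts the rays and hence the computed object position, which is why accurate external calibration matters for tracking applications.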
Manual calibration of cameras is one possibility, but it is highly error prone, inefficient
and can take a long time (on the order of hours to calibrate several cameras). Several
vision-based calibration techniques have been studied, but they are not well suited for
camera sensor networks. Vision-based techniques often rely on high-fidelity images and
abundant processing power, calibrate only a single camera or a few cameras, and depend on
knowing the exact locations of reference points (landmarks), assumptions that most often
do not hold in sensor networks. In this thesis, I propose Snapshot, an automatic
calibration protocol for camera sensors. Snapshot leverages the capabilities of position
sensors for efficient and accurate estimation of the external parameters of cameras, and
reduces the time required for accurate calibration of camera networks from the order of
hours to the order of minutes. The Cricket [38] mote sensors, which localize themselves
using ultrasound beacons, are used as a calibration device, along with images captured by
the camera. I aim to analytically characterize the different errors introduced by the
calibration process and empirically compare them with those due to Snapshot.
However, accurate calibration techniques are not feasible for deployments of ad-hoc
low-power camera sensors for the following reasons: (i) Resource constraints: Accurate
calibration of cameras is fairly compute intensive. Low-power cameras do not have the
computation capabilities to execute these complex mathematical tasks. (ii) Availability of
landmarks: In many scenarios, ad-hoc camera sensor networks are deployed in remote
locations for monitoring mountainous and forest habitats or for monitoring natural
disasters such as floods or forest fires. No landmarks may be available in remote
uninhabited locations, and infrastructure support such as positioning technologies may be
unavailable or destroyed, making it difficult to define new landmarks.
One solution to initialize deployments of ad-hoc low-power camera sensors is to equip
each camera sensor with a positioning device, such as a GPS receiver [5], and a directional
digital compass [11], which enable direct determination of the node's location and
orientation. However, today's GPS technology has far too much error to be practical for
calibration purposes (GPS can localize an object only to within 5-15 m of its actual
position). Ultrasound-based positioning and ranging technology [42] is an alternative that
provides greater accuracy. But additional positioning hardware both consumes more energy
on battery-powered nodes and, in some cases, can be prohibitive due to its cost. As a
result, accurate calibration is not always feasible for initialization of
resource-constrained camera sensor networks with limited or no infrastructure support. In
this thesis, I answer the fundamental question of whether it is possible to initialize
resource-constrained camera sensors
Sensor               Computation Capability    Type of Calibration
Cyclops              Limited                   Approximate
CMUcam + Mote        Limited                   Approximate
Webcam + Stargate    Abundant                  Accurate
PTZ + Stargate       Abundant                  Accurate

Table 1.1. Type of calibration suited for different types of sensors.
without known landmarks or without extensive use of positioning technology. I propose the
notion of approximate initialization, which determines relative relationships between
sensor nodes and supports application requirements. Approximate initialization depends on
very limited infrastructure support and is well suited for low-power camera sensors with
limited computational resources.
Table 1.1 shows different types of sensor nodes, their computation capabilities and a
suitable calibration technique for each. In this thesis, I study both accurate and
approximate camera initialization techniques to support application requirements.
1.3 Design of Camera Sensor Networks
1.3.1 Single-Tier Sensor Networks
One common way to design sensor networks is as single-tier homogeneous networks
[15, 16]. Given a set of application requirements and tasks, an appropriate sensor
and embedded platform is chosen for the entire network. The choice of hardware is
guided by the most demanding application task to be performed at each node. All sensor
nodes execute the same set of tasks and coordinate in a distributed manner to achieve
the application requirements.
A homogeneous network has several design choices, with network lifetime being an
important primary constraint. For increased network lifetime, low-power sensor nodes,
e.g., cell-phone-class cameras, can be deployed. Low-power nodes address the
Figure 1.2. A multi-tier heterogeneous camera sensor network.
lifetime constraint but often have lower reliability and functionality. Low-power
cell-phone-class cameras yield low-resolution, coarse-grained images, sacrificing
reliability for network lifetime. Another design choice is to optimize the network for
high reliability and functionality by using high-resolution webcams. These sensors produce
high-resolution images, resulting in better reliability, but sacrifice lifetime, as each
node consumes considerably more power than a cell-phone-class camera. As a result, there
exists a tradeoff between the design choices of lifetime (or energy efficiency) and
reliability and functionality. A similar tradeoff exists between energy efficiency and
latency of detection. Sensor nodes deployed to optimize energy efficiency result in
higher-latency detections, as nodes sleep for longer durations to save energy. Nodes
deployed to minimize detection latency achieve lower energy efficiency, as they are asleep
for shorter durations. Energy efficiency and cost are similarly conflicting design
choices. Thus, a single choice along the axes of power, reliability and cost results in a
sensor network that sacrifices one or more of the other key requirements. As a result,
homogeneous networks often achieve only a subset of the design goals and sacrifice others.
1.3.2 Multi-Tier Heterogeneous Sensor Networks
In this thesis, I propose a novel multi-tier design of sensor networks consisting of het-
erogeneous sensors. A multi-tier sensor network is a hierarchical network of heterogeneous
sensors as shown in Figure 1.2. The network consists of sensors with different capabilities
and power requirements at each tier. Referring to Figure 1.2, Tier 1 consists of low power
cell-phone class cameras, whereas Tier 2 consists of high power webcams. Tier 1 sensor
nodes can be used when energy efficiency is a primary constraint and nodes from Tier 2
when reliability is a primary constraint. Intelligent usage of nodes at each tier has the poten-
tial to reconcile the conflicting goals of energy efficiency and reliability and overcome the
drawbacks of homogeneous single-tier networks. Further, with intelligent node placement
and a wakeup mechanism, high-power nodes at higher tiers can be used only when required,
yielding energy benefits. Covering a region with nodes from multiple tiers provides extra
sensing and computation resources, yielding redundancy benefits.
Multi-tier heterogeneous sensor networks provide an interesting balance of cost, cover-
age, functionality, and reliability. For instance, the lower tier of such a system can employ
cheap, untethered elements that can provide dense coverage with low reliability. However,
reliability concerns can be mitigated by seeding such a network with a few expensive, more
reliable sensors at a higher tier to compensate for the variability in the lower tier. Similarly,
a mix of low-fidelity, low-cost sensors and high-fidelity, high-cost sensors can be used to
achieve a balance between cost and functionality. Application performance can also be
improved by exploiting alternate sensing capabilities that may reduce energy requirements
without sacrificing system reliability. As a result, multi-tier sensor networks can exploit
the spectrum of available heterogeneous sensors to reconcile conflicting system goals and
overcome drawbacks of homogeneous networks.
This thesis addresses research challenges in multi-tier sensor network design and im-
plementation with focus on camera sensor networks. The rest of this chapter describes
contributions of the thesis and research issues addressed.
1.4 Thesis Contributions
My thesis makes the following contributions related to initialization of camera sen-
sor networks and energy-reliability tradeoff in multi-tier heterogeneous camera sensor net-
works:
• Initialization of Camera Sensors
In a multi-tier heterogeneous sensor network, different tiers consist of sensor nodes
with different resource constraints and capabilities. Like any other sensor network,
accurate and rapid initialization of these nodes is an important prerequisite to support
application requirements. Due to the varying capabilities at each node and the nature
of infrastructure support, a uniform solution to initialize nodes at all tiers is not feasible.
In this thesis, I propose techniques for accurate and approximate initialization
according to resource availability at nodes.
The contributions of this thesis related to initialization of camera sensors are as fol-
lows:
– I have developed an automated calibration protocol tailored for camera sensor
networks.
– I have shown that using the Cricket position sensors to automate the calibration
procedure does not affect accuracy.
– I have developed techniques to initialize camera sensor networks with no or
limited infrastructure support.
– I have demonstrated techniques to exploit the approximate initialization information
to enable applications. The effective error at the application level was
found to be acceptable when camera sensors were initialized using the approximation
techniques.
– The proposed accurate and approximate initialization methods demonstrate the
feasibility of initializing low-power, low-fidelity camera sensors quickly and
efficiently.
• Energy-Reliability Tradeoff in Multi-Tier Sensor Networks
In this thesis, I argue that multi-tier heterogeneous sensor networks can achieve si-
multaneous system goals which are seldom possible in single-tier homogeneous net-
works. I study the energy-reliability tradeoff and demonstrate that a multi-tier net-
work can simultaneously achieve these conflicting goals.
– I have designed and implemented a multi-tier camera sensor network and demon-
strated its benefits over a single-tier homogeneous network.
– Using the tasks of object detection, recognition, and tracking, I have quantified
energy usage and latency benchmarks across different tiers.
– I studied and quantified the energy-reliability tradeoff of a multi-tier camera
network and found that a multi-tier network can obtain comparable reliability
with substantial energy savings.
1.5 Structure of the Thesis
The rest of the thesis is structured as follows: A review of related work is presented in
Chapter 2. The issue of automatic accurate camera calibration is described in Chapter 3.
Chapter 4 describes approximate initialization of low-power camera sensors with no or
limited infrastructure support. Chapter 5 describes SensEye, a multi-tier camera sensor
network developed to study the tradeoffs of energy usage and object detection accuracy,
and the use of predictions for energy-usage optimizations in multi-tier networks. Chapter 6
summarizes the contributions of this thesis.
CHAPTER 2
RELATED WORK
This thesis draws upon numerous research efforts in camera sensors. There has been
work in the broad areas of system-level issues, initialization and configuration of sensors,
and the design of various applications using camera sensor networks. This chapter gives an
overview of these related research efforts and places the contributions of this thesis in context.
2.1 System Issues
2.1.1 Sensor Platforms
Recent technological developments have led to the emergence of a variety of sensors,
networked embedded platforms, and communication technologies. Networked platforms
vary from embedded PCs and PDA-class Intel Stargates [55] to Crossbow Telos nodes [40] and
Motes [37] (see Table 2.1). Commonly used communication technologies range from infrared
(IR) and Bluetooth to RF-based standards like 802.11 and 802.15.4. Several sensing
modalities interface with these embedded platforms to sense and monitor different kinds
of phenomena: acoustic, vision, temperature, humidity, vibration, light, etc.
Considering camera sensors, the available sensors range
from high-end pan-tilt-zoom (PTZ) cameras, Webcams [31] to low-fidelity cellphone class
cameras like Cyclops [45] and CMUcams [9] (see Table 2.2). The available choices span
the spectrum of form factor, cost, reliability, functionality and power consumption. These
developments in turn have resulted in a major research focus in the field of sensor networks
and its applications.
Platform    Type                          Resources
Mica Mote   Atmega128 (6 MHz)             84 mW, 4 KB RAM, 512 KB Flash
Telos       TI MSP430 (8 MHz)             40 mW, 10 KB RAM, 48 KB Flash
Yale XYZ    OKI ArmThumb (2-57 MHz)       7-160 mW, 32 KB RAM, 2 MB external
Stargate    XScale PXA255 (100-400 MHz)   170-400 mW, 32 MB RAM, Flash and CF card slots

Table 2.1. Different sensor platforms and their characteristics.
Camera      Power    Cost     Features
Cyclops     33 mW    $300     128x128, fixed-angle, 10 fps
CMUCam      200 mW   $200     176x255, fixed-angle, 26 fps
Web-Cam     600 mW   $75      640x480, auto-focus, 30 fps
PTZ Camera  1 W      $1000    1024x768, retargetable pan-tilt-zoom, 30 fps

Table 2.2. Different camera sensors and their characteristics.
2.1.2 Multimedia Sensor Networks
Several studies have focused on single-tier camera sensor networks. Panoptes [62]
is an example of a video sensor node built using an Intel StrongARM PDA platform with a
Logitech Webcam as the vision sensor. The node uses the 802.11 wireless interface and can
be used to set up a video-based monitoring network. Panoptes is an instance of a single-tier
sensor network, not a multi-tier network like SensEye. A higher-tier node of SensEye
is similar to Panoptes, with additional support for network wakeups and an optimized
wakeup-from-suspend energy-saving capability. Panoptes also incorporates compression,
filtering, buffering, and adaptation mechanisms for the video stream, which can be used by
higher-tier nodes of SensEye. Other types of multimedia sensors, like audio sensors [61],
have also been used for calibration and localization applications.
2.1.3 Power Management
Power management schemes, like wake-on-wireless [14] and Triage [6], are techniques
to efficiently use limited battery power and thus extend the lifetime of sensor platforms. The
wake-on-wireless solution uses an incoming call to wake up the PDA and reduces power
consumption by shutting down the PDA when not in use. Triage is a software architecture
for tiered micro-servers, which contain more than one subsystem with different capabilities
and power requirements. The architecture uses an approach called Hierarchical Power
Management [54], which through intelligent software control reduces the amount of time a
higher-power tier must remain on by executing tasks whenever possible at lower tiers. The
SensEye higher-tier nodes are optimized using both of the above solutions.
2.1.4 Heterogeneous Sensor Networks
Several sensor network applications comprise heterogeneous and hybrid sensor
nodes. "Do-Not-Disturb" [20] is a heterogeneous sensor network that uses
acoustic and motion sensors for low-level sampling and resource-rich Stayton nodes
(equipped with Intel XScale processors). The low-power sensors transmit noise-level
readings to resource-rich nodes for correlation and fusion tasks, and to identify and send
alert messages to appropriate nodes. The cane-toad monitoring application [24] also uses a
prototype consisting of heterogeneous sensors: low-power Mica2 nodes and resource-rich
Stargate nodes. The low-power nodes are used for high-frequency acoustic sampling and
the Stargates for compute-intensive machine learning tasks and calculation of Fast Fourier
Transforms. While both these applications use the higher-tier resource-rich nodes for
computation and communication services, SensEye also uses the higher-tier nodes to sense
and to increase reliability. Tenet [19] describes a generic architecture and the associated
research challenges of tiered sensor networks. The tiered architecture consists of resource-rich
master nodes which are responsible for data fusion and application logic tasks. Master
nodes can further task lower-level mote nodes to perform basic sampling tasks. The
architecture and research challenges discussed overlap with the motivation for SensEye and
are similar to our work [28].
2.2 Initialization and Configuration
2.2.1 Sensor Placement
An important criterion in sensor networks is placement and coverage. Single-tier
placement of cameras is studied in [60]. The paper solves the problem of efficiently placing
cameras to cover a given area while meeting task-specific constraints. This method
solves the single-tier placement problem and is useful for placing each tier of
SensEye independently. Some of these techniques apply to placement of nodes in SensEye
but need to be extended for multi-tier settings.
2.2.2 Sensor Initialization
Parameters commonly estimated as part of the initialization procedure of sensor
networks are location, orientation, the set of neighbors, and route setup. Localization of sensor
nodes in ad-hoc networks for localizing events and geographic routing is discussed in [52].
The technique depends on the presence of a few beacon nodes and localizes nodes using
distributed iterative algorithms. In [46], the authors develop techniques to estimate virtual
coordinates for nodes with very limited or no location information. The virtual
coordinates assigned to nodes are used for geographic routing purposes. I have borrowed
ideas from [36] and [51], which derive lower bounds for location and orientation estimates
of sensor nodes. Both studies identify the sources of error and use Cramer-Rao bound
analysis based on Euclidean distance and angle-of-arrival measurements to derive error
bounds. In this thesis, I apply a similar analysis to localization based on the relation
between a set of reference points and their projection locations.
2.2.3 Camera Calibration
Camera calibration using a set of known reference points is well studied in the computer
vision community. The methods developed in [58, 59, 68] are examples of techniques that
estimate both the intrinsic and extrinsic parameters of a camera using a set of known reference
points. The goal of these efforts is to estimate a complete set of about twelve camera
parameters. As a result, the methods require a larger number of reference points, are
compute-intensive, and require multiple stages to determine all parameters. Snapshot is
designed to estimate only the extrinsic parameters and requires only four known reference
locations to estimate a camera's parameters. A recent effort [63] has proposed techniques
to estimate only the extrinsic parameters and also requires four reference points; however,
it requires three of the four reference locations to be collinear. Snapshot is similar
to some of these calibration techniques proposed by the vision community, but differs in its
use of the Cricket position sensors to automate the protocol. Further, our empirical
evaluation shows that the use of Cricket introduces very small error.
2.3 Applications
2.3.1 Video Surveillance
A distributed video surveillance sensor network is described in [17]. The video sensor
network is used to solve the problem of attending to events in the presence of limited
computation, limited bandwidth, and many event occurrences. The system implements
processing at cameras to filter out uninteresting and redundant events and tracks abnormal
movements. An example of a single-tier video surveillance and monitoring system is
VSAM [43]. The main objective of the system is to use multiple cooperative video sensors
for continuous tracking and coverage. The system develops sophisticated techniques for
target detection, classification, and tracking, and a central control unit to assign sensors to
tracking tasks. A framework for single-tier multi-camera surveillance is presented in [30].
The emphasis of the study is efficient tracking using multi-source spatio-temporal data
fusion, hierarchical description and representation of events, and learning-based
classification. The system uses a hierarchical master-slave configuration, where each slave
camera station tracks local movements and relays information to the master for fusion and
global representation. While our general aim is to build similar systems, we focus on
systems, networking, and performance issues in a multi-tier network using video
surveillance as an application. The vision algorithms and cooperation techniques of the
above systems can extend the capabilities of SensEye.
2.3.2 Localization and Positioning
Localization is well studied in the sensor networks community [21, 52, 65]. These
techniques assume a sensor node capable of position estimation. For example, a temperature
sensor can use its RF wireless communication link to send and receive beacons for location
estimation. Snapshot does not require any position-estimation capability on the nodes and
directly uses the imaging capability of the cameras for localization and calibration.
Several positioning and self-localization systems have been proposed in the literature.
Active Badge [2] is a locationing system based on IR signals, where the IR signals emitted
by badges are used for location estimation. A similar successor system based on ultrasound
signals is the Active Bat [3]. Several other systems, like RADAR [4], use RF signal-strength
measurements for triangulation-based localization. While most of these techniques are
used indoors, GPS [5] is used for outdoor localization. Any of these methods can be
used by the Snapshot calibration device instead of the Cricket; each has its own advantages
and disadvantages, and a suitable positioning system can be chosen based on the
environment and desired error characteristics.
2.3.3 Track Prediction
Bayesian techniques like the Kalman filter [26] and its variants [18, 50] have been used
extensively for track and trajectory prediction in several applications. [39] and [29] have used
Kalman filters to model user mobility and predict trajectories in cellular and ATM networks
for advance resource reservation, advance route establishment, and efficient seamless
handoff across base stations. [34] used switching Kalman filters to track meteorological
features over time to determine future radar sensing decisions. While these applications use
Kalman filters for track prediction and for optimizing the performance of single-tier networks,
as part of SensEye we aim to use Bayesian prediction techniques to improve the performance
of multi-tier networks. Techniques are required to account for prediction uncertainty and to
balance the tradeoff between energy usage and reliability across multiple tiers.
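The prediction machinery referenced above can be illustrated with a minimal one-dimensional, constant-position Kalman filter. This is a generic sketch, not SensEye's predictor; the noise parameters q and r and the sample track values are hypothetical.

```python
def kalman_1d(zs, q=1e-3, r=0.25):
    """Minimal 1-D constant-position Kalman filter: predict, then
    correct with each noisy measurement z.  q is the process noise,
    r the measurement noise; returns the filtered estimates."""
    x, p = zs[0], 1.0          # initial state and variance
    out = []
    for z in zs:
        p += q                 # predict step inflates uncertainty
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # correct with the measurement
        p *= (1 - k)
        out.append(x)
    return out

# Hypothetical noisy track of an object hovering around position 5.
track = [4.8, 5.2, 5.1, 4.9, 5.0, 5.3]
print(kalman_1d(track))
```

The gain k shrinks as the variance p settles, so later measurements perturb the estimate less; a constant-velocity variant would add a state for speed, which is what trajectory prediction typically needs.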
CHAPTER 3
ACCURATE CALIBRATION OF CAMERA SENSOR NETWORKS
3.1 Introduction
Typical applications of camera sensor networks include active monitoring of remote
environments and surveillance tasks such as object detection, recognition, and tracking.
Video surveillance and monitoring involves interaction and coordination between multiple
cameras, for instance, to hand-off tracking responsibilities for a moving object from one
camera to another. If a camera has an estimate of the tracked object’s location and knows
the set of cameras that view objects in that region, it can handoff tracking responsibilities
to the appropriate next camera. Object localization is one technique used to estimate
an object's location and combine it with information regarding the location and orientation
of other cameras for effective handoff. The procedure used to calculate a camera's
parameters1 (location, orientation, focal length, skew factor, and distortion) is known as
camera calibration.
Calibration of camera sensors is a necessary prerequisite for interactions and coordination
among cameras. Once a camera is calibrated, its viewing range can be used to estimate the
overlap and spatial relationships with other calibrated cameras in the network. Redundant
coverage of a region by overlapping cameras can be exploited to increase the lifetime of
the network: cameras can be intelligently duty-cycled, with a minimal subset of cameras
guaranteeing complete coverage while the others are in power-save mode.
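The duty-cycling idea, keeping awake a minimal subset of cameras that still covers the region, can be sketched as a greedy set cover. This is an illustrative sketch, not an algorithm from the thesis; the camera coverage sets are hypothetical, and greedy selection only approximates the true minimum subset.

```python
def min_cover(cameras, region):
    """Greedily pick a small subset of cameras whose combined coverage
    includes every cell of the monitored region.  `cameras` maps a
    camera id to the set of region cells that camera views."""
    uncovered = set(region)
    active = []
    while uncovered:
        # Pick the camera covering the most still-uncovered cells.
        best = max(cameras, key=lambda c: len(cameras[c] & uncovered))
        if not cameras[best] & uncovered:
            break  # remaining cells are not coverable by any camera
        active.append(best)
        uncovered -= cameras[best]
    return active

# Hypothetical overlapping deployment: region cells 0-5, three cameras.
cams = {"A": {0, 1, 2}, "B": {2, 3}, "C": {3, 4, 5}}
print(min_cover(cams, range(6)))  # ['A', 'C'] -- camera B can sleep
```

Here cameras A and C already cover all six cells, so B can stay in power-save mode until the duty-cycle schedule rotates the roles.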
Automated camera calibration is well studied in the computer vision community [58,
59, 63, 68]. Many of these techniques are based on the classical Tsai method—they require
1In our work we focus only on external camera parameters and assume internal parameters to be known or estimated a priori.
a user to specify reference points on a grid whose true locations are known in the physi-
cal world and use the projection of these points on the camera image plane to determine
camera parameters. However, such vision-based calibration techniques may not be directly
applicable to camera sensor networks for the following reasons. First, the vision-based
systems tend to use high-resolution cameras as well as high-end workstations for image
and video processing; consequently, calibration techniques can leverage the availability of
high-resolution images and abundance of processing power. Neither assumption is true
in sensor networks. Such networks may employ low-power, low-fidelity cameras such as
the CMUcam [48] or Cyclops [45] that have coarse-grain imaging capabilities; at best, a
mix of low-end and a few high-end cameras can be assumed for such environments. Fur-
ther, the cameras may be connected to nodes such as the Crossbow Motes [37] or Intel
Stargates [55] that have one or two orders of magnitude less computational resources than
PC-class workstations. Calibration techniques for camera sensor networks need to work
well with low-resolution cameras and should be computationally efficient.
Second, vision-based calibration techniques have been designed to work with a single
camera or a small group of cameras. In contrast, a camera sensor network may comprise
tens or hundreds of cameras and calibration techniques will need to scale to these larger
environments. Further, camera sensor networks are designed for ad-hoc deployment, for
instance, in environments with disasters such as fires or floods. Since quick deployment
is crucial in such environments, it is essential to keep the time required for calibrating the
system to a minimum. Thus, calibration techniques need to be scalable and designed for
quick deployment.
Third, vision-based camera calibration techniques are designed to determine both in-
trinsic parameters (e.g., focal length, lens distortion, principal point) and extrinsic param-
eters (e.g., location and orientation) of a camera. Due to the large number of unknowns,
the calibration process typically involves many tens of measurements of reference points
and is computationally intensive. In contrast, calibrating a camera sensor network involves
only determining external parameters such as camera location and orientation, and may be
amenable to simpler, more efficient techniques that are better suited to resource-constrained
sensor platforms.
Automated localization techniques are a well-studied problem in the sensor community
and a slew of techniques have been proposed. Localization techniques employ beacons
(e.g., IR [2], ultrasound [3], RF [4]) and use sophisticated triangulation techniques to de-
termine the location of a node. Most of these technique have been designed for general-
purpose sensor networks, rather than camera sensor networks in particular. Nevertheless,
they can be employed during calibration, since determining the node location is one of the
tasks performed during calibration. However, localization techniques are by themselves
not sufficient for calibration. Cameras are directional sensors and camera calibration also
involves determining other parameters such as the orientation of the camera (where a cam-
era is pointing) as well as its range (what it can see). In addition, calibration is also used
to determine overlap between neighboring cameras. Consequently, calibration is a harder
problem than pure localization.
The design of an automated calibration technique that is cost-effective and yet scalable,
efficient, and quickly deployable is the subject of this chapter.
3.1.1 Research Contributions
In this chapter, we propose Snapshot, a novel wireless protocol for calibrating camera
sensor networks. Snapshot advances prior work in vision-based calibration and sensor
localization in important ways. Unlike vision-based techniques that require tens of reference
points for calibration and impose restrictions on the placement of these points in space,
Snapshot requires only four reference points to calibrate each camera sensor and allows
these points to be randomly chosen without restrictions. Both properties are crucial for
sensor networks, since fewer reference points and fewer restrictions enable faster calibration
and reduce the computational overhead for subsequent processing. Further, unlike
sensor localization techniques that depend on wireless beacons, Snapshot does not require
any specialized positioning equipment on the sensor nodes. Instead, it leverages the
inherent picture-taking abilities of the cameras and the on-board processing on the sensor
nodes to calibrate each node. Our results show Snapshot yields accuracies comparable to
those obtained by using positioning devices such as the ultrasound-based Cricket on each node.
Our techniques can be instantiated into a simple, quick, and easy-to-use wireless
calibration protocol: a wireless calibration device is used to define reference points for each
camera sensor, which then uses principles from geometry, optics, and elementary machine
vision to calibrate itself. When more than four reference points are available, a sensor can
use median filtering and maximum likelihood estimation techniques to improve the accuracy
of its estimates.
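The refinement step can be sketched as a coordinate-wise median over the location estimates produced by different subsets of reference points. The estimates below are hypothetical, and this illustrates only the median-filtering idea, not Snapshot's exact estimator.

```python
from statistics import median

def refine(estimates):
    """Coordinate-wise median of per-subset camera-location estimates,
    a simple robust filter that discards outlier solutions."""
    xs, ys, zs = zip(*estimates)
    return (median(xs), median(ys), median(zs))

# Hypothetical estimates from different 4-point subsets; the last one
# is an outlier produced by a bad subset.
est = [(1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (0.9, 1.9, 3.1), (5.0, 9.0, -2.0)]
print(refine(est))  # (1.05, 2.05, 2.95)
```

Unlike a mean, the median is barely pulled toward the outlier, which is why robust filtering is attractive when some reference-point subsets yield poor solutions.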
We have implemented Snapshot on a testbed of CMUcam sensors connected to wireless
Stargate nodes and have conducted a detailed experimental evaluation of Snapshot using
our prototype implementation. Our experiments yield the following key results:
1. Feasibility: By comparing the calibration accuracies of low- and high-resolution
cameras, we show that it is feasible to calibrate low-resolution cameras such as
CMUcams without a significant loss in accuracy.
2. Accuracy: We show that Snapshot can localize a camera to within a few centimeters
of its actual location and determine its orientation with a median error of 1.3-2.5
degrees. More importantly, our experiments indicate that this level of accuracy is
sufficient for tasks such as object tracking. We show that a system calibrated with
Snapshot can localize an external object to within 11 centimeters of its actual
location, which is adequate for most tracking scenarios.
3. Efficiency: We show that the Snapshot algorithm can be implemented on Stargate
nodes with running times on the order of a few seconds.
4. Scalability: We show that Snapshot can calibrate a camera sensor in about 20 seconds
on current hardware. Since a human needs only to specify a few reference
points using the wireless calibration device, a process that takes a few seconds per
sensor, Snapshot can scale to networks containing tens of camera sensors.
3.2 Snapshot: An Accurate Camera Calibration Protocol
Snapshot is an automatic calibration protocol for camera sensor networks. It relies on
infrastructure support for the localization of reference points using ultrasound beacons:
Cricket position sensors, which can localize themselves using ultrasound beacons, are used
as the calibration device. Snapshot estimates the camera location coordinates and orientation
from the acquired images using principles of optics and by solving a non-linear optimization
problem. These capabilities are available on a high-end sensor node, making Snapshot an
example of accurate calibration.
As part of the study of Snapshot, we have empirically studied the accuracy of location
and orientation estimates for two different types of cameras, the CMUCam and the Sony
VAIO webcam. As part of this study, I am developing techniques to analytically characterize
the error in the calibration procedure and plan to compare the empirical results with the
results from the analysis.
3.2.1 Snapshot Design
The basic Snapshot protocol involves taking pictures of a small, randomly placed
calibration device. To calibrate each camera sensor, at least four pictures of the device are
necessary, and no three positions of the device may lie along a straight line. Each position
of the calibration device serves as a reference point; the coordinates of each reference point
are assumed to be known and can be automatically determined by equipping the calibration
device with a locationing sensor (e.g., GPS or an ultrasound Cricket receiver).
Figure 3.1. Left Handed Coordinate System.
Next, we describe how Snapshot uses the pictures and coordinates of the calibration device
to estimate camera parameters. We also discuss how the estimates can be refined when
additional reference points are available.
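The placement condition above, that no three reference points lie on a straight line, can be checked with a cross-product test. A minimal sketch follows; the point sets are hypothetical, and the thesis does not specify that Snapshot performs the check in this particular form.

```python
import numpy as np

def no_three_collinear(points, tol=1e-9):
    """Verify that no three of the given 3-D points are collinear.
    Three points are collinear iff the cross product of the two
    vectors spanning them has (near-)zero magnitude."""
    pts = [np.asarray(p, dtype=float) for p in points]
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            for k in range(j + 1, len(pts)):
                cross = np.cross(pts[j] - pts[i], pts[k] - pts[i])
                if np.linalg.norm(cross) < tol:
                    return False
    return True

# Hypothetical reference-point placements.
good = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
bad  = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (1, 1, 1)]  # first three collinear
print(no_three_collinear(good), no_three_collinear(bad))  # True False
```

Running such a check while placing the calibration device would let a deployment reject a degenerate set of reference points before solving for the camera's parameters.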
3.2.1.1 Camera Location Estimation:
We begin with the intuition behind our approach. Without loss of generality, we assume all
coordinate systems are left-handed (see Figure 3.1) and that the z-axis of the camera coordinate
system is collinear with the camera's optical axis. Consider a camera sensor C whose
coordinates need to be determined. Suppose that four reference points R1, R2, R3 and R4
are given along with their coordinates for determining the camera location. No assumption
is made about the placement of these points in three-dimensional space, except that
the points be in visual range of the camera and that no three of them lie along a straight
line. Consider the first two reference points R1 and R2 as shown in Figure 3.2. Suppose
that point objects placed at R1 and R2 project images P1 and P2, respectively, on the
camera's image plane, as shown in Figure 3.2. Further, let θ1 be the angle subtended by
the two reference points at the camera. Since θ1 is also the angle subtended by P1 and P2 at
the camera lens, we assume that it can be computed using elementary optics (as discussed
later). Given θ1, R1 and R2, the problem of finding the camera location reduces to finding
a point in space where R1 and R2 subtend an angle of θ1. With only two reference points,
there are infinitely many such points. To see why,
consider Figure 3.3(a), which depicts the problem in two dimensions. Given R1 and R2, the
set of possible camera locations lies on the arc R1CR2 of a circle such that R1R2 is a
Figure 3.2. Projection of reference points on the image plane through the lens.
(a) Arc depicting possible solutions in two dimensions. (b) Football-like surface of possible solutions in three dimensions.
Figure 3.3. Geometric representation of possible camera locations.
chord of the circle and θ1 is the angle subtended by this chord on the circle. From elementary
geometry, it is known that a chord of a circle inscribes a constant angle on any point of
the corresponding arc. Since we have chosen the circle such that chord R1R2 inscribes an
angle of θ1 on it, the camera can lie on any point of the arc R1CR2. This intuition can be
generalized to three dimensions by rotating the arc R1CR2 in space with the chord R1R2 as
the axis (see Figure 3.3(b)). Doing so yields a three-dimensional surface of possible camera
locations. The nature of the surface depends on the value of θ1: the surface is shaped like
a football when θ1 > 90°, is a sphere when θ1 = 90°, and a double crown when θ1 < 90°.
The camera can lie on any point of this surface.
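The inscribed-angle property underlying this construction can be verified numerically: every point on the arc sees the chord under the same angle. The chord endpoints and sample arc points below are arbitrary choices on the unit circle, used only to illustrate the geometry.

```python
import math

def angle_at(c, r1, r2):
    """Angle (degrees) subtended at point c by points r1 and r2 (2-D)."""
    v1 = (r1[0] - c[0], r1[1] - c[1])
    v2 = (r2[0] - c[0], r2[1] - c[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

# Chord endpoints R1, R2 on the unit circle; every candidate camera
# position C on the major arc sees the chord under the same angle.
r1 = (math.cos(0.3), math.sin(0.3))
r2 = (math.cos(2.1), math.sin(2.1))
angles = [angle_at((math.cos(t), math.sin(t)), r1, r2)
          for t in (2.5, 3.0, 4.0, 5.5)]
print(angles)  # all equal: half the central angle, degrees(0.9)
```

The central angle between the endpoints is 2.1 - 0.3 = 1.8 radians, so every point on the major arc subtends half of that, 0.9 radians, matching the constancy claimed above.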
Next, consider the third reference point R3. Considering points R1 and R3, we obtain
another surface that consists of all possible locations at which R1 and R3 subtend a known
angle θ2. Since the camera must lie on both surfaces, it follows that the
(a) Vectors v1 and v2 from the camera center at (x, y, z) to reference points R1 at (x1, y1, z1) and R2 at (x2, y2, z2), with angle θ1 between them. (b) Projections P1 and P2 of the reference points at (-px1, -py1, -f) and (-px2, -py2, -f) on the image plane, with the lens center C at (0, 0, 0) and focal length f.
Figure 3.4. Vector representation of reference points and their projections.
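Following Figure 3.4(b), each projection at pixel offset (px, py) defines the direction (-px, -py, -f) from the lens center, so the angle θ1 falls out of a dot product. A sketch with hypothetical pixel coordinates and a hypothetical focal length expressed in pixel units:

```python
import math

def angle_between_projections(p1, p2, f):
    """Angle (radians) subtended at the lens by two image-plane
    projections p1=(px1, py1) and p2=(px2, py2) for focal length f.
    Each projection defines the direction (-px, -py, -f) from the
    lens center, per Figure 3.4(b)."""
    u1 = (-p1[0], -p1[1], -f)
    u2 = (-p2[0], -p2[1], -f)
    dot = sum(a * b for a, b in zip(u1, u2))
    n1 = math.sqrt(sum(a * a for a in u1))
    n2 = math.sqrt(sum(a * a for a in u2))
    return math.acos(dot / (n1 * n2))

# Hypothetical projections (pixel units) and focal length.
theta = angle_between_projections((40.0, 10.0), (-25.0, 5.0), 150.0)
print(math.degrees(theta))
```

This is how the angles θ1 through θ6 referenced in the text can be obtained from the images alone, since the projection coordinates and focal length are the only inputs.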
set of possible locations is given by the intersection of these two surfaces. The intersection
of the two surfaces is a closed curve, and the set of possible camera locations is reduced to
any point on this curve.
Finally, if we consider the pair of reference points R2 and R3, we obtain a third surface
of all possible camera locations. The intersection of the first surface and the third yields
a second curve of possible camera locations. The camera lies on the intersection of these
two curves, and the curves can intersect in multiple points. The number of possible camera
locations can be reduced further to at most 4 by introducing the fourth reference point R4.
Although 4 reference points give us up to 4 possible camera locations, we observe that, in
reality, only one of these locations can generate the same projections as R1, R2, R3, and
R4 on the image plane. Using elementary optics, it is easy to eliminate the false solutions
and determine the true and unique location of the camera.
With this intuition, the details of our technique are as follows. Consider a camera C placed at coordinates (x, y, z), and four reference points R1, ..., R4 with coordinates (x1, y1, z1), ..., (x4, y4, z4). The line joining the camera C with each of these reference points defines a vector. For instance, as shown in Figure 3.4(a), the line joining C and R1 defines a vector CR1, denoted by v1. The components of v1 are given by

v1 = CR1 = {x1 − x, y1 − y, z1 − z}
Similarly, the vector joining points C and Ri, denoted by vi, is given as

vi = CRi = {xi − x, yi − y, zi − z},   1 ≤ i ≤ 4
As shown in Figure 3.4(a), let θ1 denote the angle between vectors v1 and v2. The dot product of vectors v1 and v2 is given as

v1 · v2 = |v1||v2| cos θ1   (3.1)

By definition of the dot product,

v1 · v2 = (x1 − x)(x2 − x) + (y1 − y)(y2 − y) + (z1 − z)(z2 − z)   (3.2)

The magnitude of vector v1 is given as

|v1| = √((x1 − x)² + (y1 − y)² + (z1 − z)²)

The magnitude of v2 is defined similarly. Substituting these values into Equation 3.2, we get

cos(θ1) = [(x1 − x)(x2 − x) + (y1 − y)(y2 − y) + (z1 − z)(z2 − z)] / (|v1| · |v2|)   (3.3)
Let θ2 through θ6 denote the angles between vectors v1 and v3, v1 and v4, v2 and v3, v2 and v4, and v3 and v4, respectively. Similar expressions can be derived for θ2, θ3, ..., θ6.

The angles θ1 through θ6 can be computed using elementary optics and vision, as discussed next. Given these angles and the coordinates of the four reference points, the above expressions yield six quadratic equations with three unknowns: x, y, and z. A non-linear solver can be used to numerically solve for these unknowns.
3.2.1.2 Estimating θ1 through θ6:
We now present a technique to compute the angle between any two vectors vi and vj. Consider any two reference points R1 and R2 as shown in Figure 3.4(a). Figure 3.4(b)
shows the projection of these points through the camera lens onto the image plane. The image plane in a digital camera consists of a CMOS sensor that takes a picture of the camera view. Let P1 and P2 denote the projections of the reference points on the image plane as shown in Figure 3.4(b), and let f denote the focal length of the lens. For simplicity, we define all points with respect to the camera's coordinate system: the center of the lens is assumed to be the origin in this coordinate system. Since the image plane is at a distance f from the lens, all points on the image plane are at a distance f from the origin. By taking a picture of the reference points, the coordinates of P1 and P2 can be determined. These are simply the pixel coordinates where the reference points project their image on the CMOS sensor; these pixels can be located in the image using a simple vision-based object recognition technique.2 Let the resulting coordinates of P1 and P2 be (−px1, −f, −pz1) and (−px2, −f, −pz2), respectively. We define vectors u1 and u2 as the lines joining the camera (i.e., the origin C) to the points P1 and P2. Then, the angle θ1 between the two vectors u1 and u2 can be determined from their dot product:

cos(θ1) = (u1 · u2) / (|u1||u2|)

The inverse cosine then yields θ1, which is also the angle subtended by the original reference points at the camera.
Using the above technique to estimate θ1–θ6, we can then solve our six quadratic equations using a non-linear optimization algorithm [10] to estimate the camera location.
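As an illustration, the six-equation system can be attacked numerically. The sketch below builds the cosine expressions of Equation 3.3 for all six reference-point pairs and recovers a synthetic camera position with a hand-rolled Gauss–Newton iteration; the function names, the synthetic coordinates, and the solver choice are ours for illustration and are not part of Snapshot, which uses the non-linear solver of [10].

```python
import numpy as np

def angle_cos(c, p, q):
    """Cosine of the angle subtended at camera position c by points p and q (Eq. 3.3)."""
    v1, v2 = p - c, q - c
    return v1.dot(v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

def locate_camera(refs, cos_thetas, guess, iters=50):
    """Solve the six quadratic equations for (x, y, z) with a Gauss-Newton iteration."""
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
    c = np.asarray(guess, dtype=float)
    for _ in range(iters):
        r = np.array([angle_cos(c, refs[i], refs[j]) - ct
                      for (i, j), ct in zip(pairs, cos_thetas)])
        J = np.zeros((6, 3))                      # forward-difference Jacobian
        eps = 1e-6
        for k in range(3):
            d = np.zeros(3)
            d[k] = eps
            rk = np.array([angle_cos(c + d, refs[i], refs[j]) - ct
                           for (i, j), ct in zip(pairs, cos_thetas)])
            J[:, k] = (rk - r) / eps
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        c = c + step
        if np.linalg.norm(step) < 1e-10:
            break
    return c

# Synthetic setup: a known camera and four non-coplanar reference points.
true_cam = np.array([1.0, 2.0, 3.0])
refs = np.array([[0, 0, 0], [4, 0, 0], [0, 5, 0], [1, 1, 6]], dtype=float)
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
cos_thetas = [angle_cos(true_cam, refs[i], refs[j]) for i, j in pairs]
est = locate_camera(refs, cos_thetas, guess=[0.5, 1.5, 2.0])
```

Note that because the system has up to four solutions, the iteration converges to the solution nearest the initial guess; the disambiguation of Section 3.2.1.4 is still needed in general.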
3.2.1.3 Camera Orientation Estimation:
We now describe the technique employed by Snapshot to determine the camera's orientation along the three axes. We assume that the camera location has already been estimated using the technique in the previous section. Given the camera location (x, y, z),
2 In Snapshot, the calibration device contains a colored LED and the vision-based recognizer must locate this LED in the corresponding image.
our technique uses three reference points to determine the pan, tilt, and roll of the camera.
Intuitively, given the camera location, we need to align the camera in space so that the three reference points project images at the same locations as in the picture taken by the camera. Put another way, consider a ray of light emanating from each reference point. The camera needs to be aligned so that each ray of light pierces the image plane at the same pixel where the image of that reference point is located. One reference point is sufficient to determine the pan and tilt of the camera using this technique, and three reference points are sufficient to uniquely determine all three parameters: pan, tilt and roll. Our technique uses the actual coordinates of three reference points and the pixel coordinates of their corresponding images to determine the unknown rotation matrix R that represents the pan, tilt and roll of the camera.
Assume that the camera is positioned at coordinates (x, y, z) and that the camera has a pan of α degrees, a tilt of β degrees, and a roll of γ degrees. The pan, tilt and roll rotations can be represented as matrices, and can be used to calculate locations of points in the camera's coordinate space. The composite matrix for the pan, tilt and roll rotations of the camera that results in its orientation is given by
R = [ cos(γ)   0  sin(γ) ]   [ 1    0        0      ]   [ cos(α)  −sin(α)  0 ]
    [ 0        1  0      ] × [ 0  cos(β)   sin(β)   ] × [ sin(α)   cos(α)  0 ]
    [ −sin(γ)  0  cos(γ) ]   [ 0  −sin(β)  cos(β)   ]   [ 0        0       1 ]

  = [ r11 r12 r13 ]
    [ r21 r22 r23 ]   (3.4)
    [ r31 r32 r33 ]
If an object is located at (xi, yi, zi) in the world coordinates, the object's location in the camera coordinates (x′i, y′i, z′i) can be computed via Equation 3.5:

[ x′i ]       [ xi − x ]
[ y′i ] = R × [ yi − y ]   (3.5)
[ z′i ]       [ zi − z ]
where the composite rotation matrixR is given by Equation 3.4.
Figure 3.5. Relationship between object location and its projection.
Intuitively, we can construct and solve a set of linear equations (see Equation 3.6), where (x1, y1, z1), (x2, y2, z2), and (x3, y3, z3) are the world coordinates of 3 reference points, and (x′1, y′1, z′1), (x′2, y′2, z′2), and (x′3, y′3, z′3) are the corresponding camera coordinates, to estimate R, and then estimate α, β, and γ from R.
It is easy to see that as long as these three reference points are not co-linear, the matrix

[ x1 − x  y1 − y  z1 − z ]
[ x2 − x  y2 − y  z2 − z ]
[ x3 − x  y3 − y  z3 − z ]

is non-singular, and hence the three sets of linear equations in Equation 3.6 have a unique solution for Rᵀ:

[ x1 − x  y1 − y  z1 − z ]          [ x′1  y′1  z′1 ]
[ x2 − x  y2 − y  z2 − z ] × Rᵀ  =  [ x′2  y′2  z′2 ]   (3.6)
[ x3 − x  y3 − y  z3 − z ]          [ x′3  y′3  z′3 ]
As shown in Figure 3.5, an object's location in the camera coordinates and the projection of the object on the image plane have the following relation:

[ x′i ]    Di     [ pxi ]
[ y′i ] = ---- ×  [ f   ]   (3.7)
[ z′i ]    Dp     [ pzi ]

where:

Di = √((xi − x)² + (yi − y)² + (zi − z)²) and
Dp = √(pxi² + f² + pzi²)
Di and Dp represent the magnitudes of the object-to-camera-center vector and the projection-on-image-plane-to-camera-center vector, respectively.

Therefore, we can compute the location of an object in the camera coordinate system using Equation 3.7, given the camera location and focal length, and the object location and its projection. The actual location of each reference point and its location in the camera coordinates can then be used in Equation 3.6 to determine the rotation matrix R. Given R, we can obtain pan α, tilt β, and roll γ using Equation 3.4 as follows:
α = arctan(r21/r22) − 180°   if r21/cos(β) < 0 and r22/cos(β) < 0
    arctan(r21/r22) + 180°   if r21/cos(β) ≥ 0 and r22/cos(β) < 0
    arctan(r21/r22)          otherwise

β = arcsin(r23)   (3.8)

γ = arctan(r13/r33) − 180°   if r13/cos(β) < 0 and r33/cos(β) < 0
    arctan(r13/r33) + 180°   if r13/cos(β) ≥ 0 and r33/cos(β) < 0
    arctan(r13/r33)          otherwise
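The decomposition can be checked mechanically. The sketch below composes R as in Equation 3.4 and recovers the angles; using arctan2 folds the ±180° case analysis of Equation 3.8 into a single call. This is an illustrative reconstruction, not Snapshot's code, and it assumes cos(β) > 0:

```python
import numpy as np

def rotation(alpha, beta, gamma):
    """Composite R = Ry(gamma) @ Rx(beta) @ Rz(alpha), as in Equation 3.4 (radians)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    Rz = np.array([[ca, -sa, 0], [sa, ca, 0], [0, 0, 1]])   # pan
    Rx = np.array([[1, 0, 0], [0, cb, sb], [0, -sb, cb]])   # tilt
    Ry = np.array([[cg, 0, sg], [0, 1, 0], [-sg, 0, cg]])   # roll
    return Ry @ Rx @ Rz

def euler_from(R):
    """Recover (pan, tilt, roll) per Equation 3.8; arctan2 handles the quadrants."""
    beta = np.arcsin(R[1, 2])               # r23
    alpha = np.arctan2(R[1, 0], R[1, 1])    # r21, r22
    gamma = np.arctan2(R[0, 2], R[2, 2])    # r13, r33
    return alpha, beta, gamma

# Round trip: compose a rotation, then recover the three angles from it.
a, b, g = 0.4, -0.2, 0.7
recovered = euler_from(rotation(a, b, g))
```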
3.2.1.4 Eliminating False Solutions:
Recall from Section 3.2.1.1 that our six quadratic equations yield up to four possible solutions for the camera location. Only one of these solutions is the true camera location. To
eliminate false solutions, we compute the pan, tilt and roll for each computed location using
three reference points. The fourth reference point is then used to eliminate false solutions as
follows: for each computed location and orientation, we project the fourth reference point
onto the camera’s image plane. The projected coordinates are then matched to the actual
pixel coordinates of the reference point in the image. The projected coordinates will match
the pixel coordinates only for the true camera location. Thus, the three false solutions can
be eliminated by picking the solution with the smallest re-projection error. The chosen
solution is always guaranteed to be the correct camera location.
Figure 3.6. Polyhedron representing the visual range of the camera.
3.2.2 Determining Visual Range and Overlap
Once the location and orientation of each camera have been determined, the next task is
to determine the visual range of each camera and the overlap of viewable regions between
neighboring cameras. The overlap between cameras is an indication of the redundancy in
sensor coverage in the environment. Overlapping cameras can also be used to localize and
track moving objects in the environment.
The visual range of a camera can be approximated as a polyhedron, as shown in Figure 3.6. The apex of the polyhedron is the location of the camera C (also the lens center) and the height of the pyramid is the maximum viewable distance of the camera. An object within the volume of the polyhedron is in the visual range of the camera.
Although a camera can view infinitely distant objects, such objects will appear as point
objects in any picture taken by the camera and are not useful for tasks such as object
detection and recognition. Thus, it is necessary to artificially restrict the viewable range
of the camera; the maximum viewable distance is determined in an application-specific
manner and depends on the sizes of the objects being monitored (the larger the object, the
greater is the maximum viewable distance of each camera). Assuming that this distance is determined offline, Snapshot can then precisely determine the polyhedron that encompasses the viewable range of the camera (assuming no obstacles such as walls are present to cut
off this polyhedron).
Assume that the camera location (x, y, z) is given. We also assume that the size of the camera CMOS sensor is known (specifications for digital cameras typically specify the size of the internal CMOS sensor). Since the CMOS sensor is placed at a focal-length distance from the lens, the coordinates of the four corners of the sensor can be determined relative to the camera location (x, y, z). As shown in Figure 3.6, the polyhedron is fully defined by specifying the vectors CP, CQ, CR and CS, which constitute its four edges. Further, CP = (d/f) · AC, where AC is the line segment joining the edge of the CMOS sensor to the center of the lens, and d is the maximum viewable distance of the camera. Since the coordinates of points A and C are known, the vector AC is known, and CP can then be determined. The four edges of the polyhedron can be determined in this fashion.
To determine if two cameras overlap, we need to determine if their corresponding polyhedrons intersect (the intersection indicates the region in space viewable from both cameras). To determine if two polyhedrons intersect, we consider each surface of the first polyhedron and determine if one of the edges of the other polyhedron intersects this surface. For instance, does the line segment CP intersect any of the four surfaces of the other polyhedron? If any edge intersects a surface of the other polyhedron, then the two cameras have overlapping viewable regions. The intersection of a line segment with a plane can be easily represented in vector algebra using vector cross and dot products [25], and we omit the specific details due to space constraints.
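The edge-against-surface test can be written directly with cross and dot products. The sketch below is one standard formulation (a Möller–Trumbore-style segment/triangle test), offered here as an illustration; the reference [25] may formulate the test differently.

```python
import numpy as np

def segment_hits_triangle(p0, p1, a, b, c, eps=1e-12):
    """True if the segment p0->p1 crosses the triangular face (a, b, c).
    Uses only cross and dot products (Moller-Trumbore style)."""
    d = p1 - p0
    e1, e2 = b - a, c - a
    h = np.cross(d, e2)
    det = e1.dot(h)
    if abs(det) < eps:            # segment parallel to the face's plane
        return False
    s = p0 - a
    u = s.dot(h) / det            # first barycentric coordinate
    if u < 0 or u > 1:
        return False
    q = np.cross(s, e1)
    v = d.dot(q) / det            # second barycentric coordinate
    if v < 0 or u + v > 1:
        return False
    t = e2.dot(q) / det           # position along the segment
    return 0 <= t <= 1

# A segment piercing a triangle in the z = 0 plane, and one shifted far away.
p0, p1 = np.array([0., 0., -1.]), np.array([0., 0., 1.])
tri = (np.array([-1., -1., 0.]), np.array([2., -1., 0.]), np.array([-1., 2., 0.]))
```

Running the test over every edge of one polyhedron against every triangular face of the other implements the overlap check described above.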
3.2.3 Iterative Refinement of Estimates
While Snapshot requires only four reference points to calibrate a camera sensor, the estimates of the camera location and orientation can be improved if additional reference points are available. Suppose that n reference points, n ≥ 4, are available for a particular sensor node. Then (n choose 4) unique subsets of four reference points can be constructed from these n points. For each subset of four points, we can compute the location and orientation of the camera using the techniques outlined in the previous sections. This yields (n choose 4) different estimates of the camera location and orientation. These estimates can be refined to obtain the final solution using one of three methods:

Least Squares Method: This technique picks the one solution from the (n choose 4) solutions that most accurately reflects the camera location and orientation. To do so, the technique uses each computed camera location and orientation to re-project all reference points on the camera image plane and chooses the solution that yields the minimum error between the projected coordinates and the actual coordinates in the image. The solution that yields the minimum error is the one that minimizes the following expression:
Σᵢ₌₁ⁿ || (f / y′i) × (x′i, y′i, z′i)ᵀ − Pi ||²   (3.9)

where (x′i, y′i, z′i)ᵀ is the location of reference point i in camera coordinates according to Equation 3.5, and Pi = (pxi, f, pzi)ᵀ is the real projection of reference point i.
Median Filter Method: This method simply takes the median of each estimated parameter, namely x, y, z, pan α, tilt β, and roll γ. These median values are then chosen as the final estimates of each parameter. Note that while the least squares method picks one of the (n choose 4) initial solutions as the final solution, the median filter method can yield a final solution that is different from all (n choose 4) initial solutions (since the median of each parameter is computed independently, the final solution need not correspond to any of the initial solutions). The median filter method is simple and cost-effective, and it performs well when n is large.
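The median filter method is easy to state in code. In this illustrative sketch (the names and numbers are ours), each row is one subset's (x, y, z, α, β, γ) estimate; because the median is taken per parameter, the refined solution can mix components from different subsets, which is exactly why it need not equal any initial solution:

```python
import numpy as np

def median_refine(estimates):
    """Median-filter refinement: per-parameter median over all subset estimates.
    Each row of `estimates` is (x, y, z, pan, tilt, roll) from one 4-point subset."""
    return np.median(np.asarray(estimates), axis=0)

# Three subset estimates; the third is an outlier that the median discards.
ests = [[1.0, 2.0, 3.0, 0.1, 0.2, 0.3],
        [1.1, 1.9, 3.1, 0.1, 0.2, 0.3],
        [9.0, 9.0, 9.0, 1.5, 1.5, 1.5]]
refined = median_refine(ests)
```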
Maximum Likelihood Estimation: The MLE method [13] uses the initial estimates
as its initial guess and searches through the state space to choose a solution that minimizes
an error term. We choose the same error function as the least squares method: the search
should yield a solution that yields the least error when projecting the reference points on
the camera image plane.
Minimizing Equation 3.9 by searching through the parameter state space is a non-linear minimization problem that can be solved numerically using the Levenberg-Marquardt algorithm. The algorithm requires an initial guess of R and (x, y, z): our estimates from Snapshot can be used as this initial guess. Note that MLE is computationally more expensive than the median filter method or the least squares method. While its advantage diminishes when n is large, it can yield better accuracy when n is small.
Choosing between these methods involves a speed versus accuracy tradeoff. In general,
the first two methods are more suitable if calibration speed is more important. The MLE
method should be chosen when calibration accuracy is more important or ifn is small.
3.2.4 Snapshot Protocol
This section describes how the estimation techniques presented in the previous section can be instantiated into a simple wireless protocol for automatically calibrating each camera sensor. The protocol assumes that each sensor node has a wireless interface that enables wireless communication to and from the camera. The calibration process involves the use of a wireless calibration device, which is a piece of hardware that performs the following tasks. First, the device is used to define the reference points during calibration: the location of the device defines a reference point, whose coordinates are automatically determined by equipping the device with a positioning sensor (e.g., ultrasound-based Cricket). Second, the device also serves as a point object for pictures taken by the camera sensors. To ensure that the device can be automatically detected in an image by vision processing algorithms, we equip the device with a bright LED (which then serves as the point object in an image). Third, the device serves as a "wireless remote" for taking pictures during the calibration phase. The device is equipped with a switch that triggers a broadcast packet on the wireless channel. The packet contains the coordinates of the device at that instant and includes an image capture command that triggers a snapshot at all camera sensors in its wireless range.
Given such a device, the protocol works as follows. A human assists the calibration
process by walking around with the calibration device. The protocol involves holding the
device at randomly chosen points and initiating the trigger. The trigger broadcasts a packet to all cameras in range with a command to take a picture (if the sensor node is asleep, the trigger first wakes up the node using a wake-on-wireless mechanism). The broadcast packet also includes the coordinates of the current position of the device. Each camera then
processes the picture to determine if the LED of the calibration device is visible to it. If so,
the pixel coordinates of the device and the transmitted coordinates of the reference point
are recorded. Otherwise the camera simply waits for the next trigger. When at least four
reference points become available, the sensor node then processes this data to determine
the location, orientation and range of the camera. These parameters are then broadcast so
that neighboring cameras can subsequently use them for determining the amount of overlap
between cameras. Once a camera calibrates itself, a visual cue is provided by turning on an
LED on the sensor node so that the human assistant can move on to other sensors.
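The camera-side behavior can be sketched as simple event handling. Everything below (the packet format, detect_led, and calibrate) is a hypothetical stand-in written for illustration, not the Snapshot implementation:

```python
# Hypothetical camera-node handler for calibration trigger packets.
class CameraNode:
    def __init__(self):
        self.samples = []          # (pixel coords, reference-point coords) pairs
        self.calibrated = False
        self.pose = None

    def on_trigger(self, packet, image):
        """Handle one broadcast trigger: the packet carries the device's coordinates."""
        pixel = self.detect_led(image)
        if pixel is None:
            return                 # LED not visible; wait for the next trigger
        self.samples.append((pixel, packet["coords"]))
        if len(self.samples) >= 4 and not self.calibrated:
            self.pose = self.calibrate(self.samples)
            self.calibrated = True # would light the LED cue for the human assistant

    def detect_led(self, image):
        # Stand-in for the vision-based LED recognizer.
        return image.get("led_pixel")

    def calibrate(self, samples):
        # Stand-in for the estimation techniques of Sections 3.2.1.1-3.2.1.3.
        return {"n_points": len(samples)}

node = CameraNode()
for k in range(5):                 # five triggers from the calibration device
    node.on_trigger({"coords": (k, 0, 0)}, {"led_pixel": (10 + k, 20)})
```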
3.3 Snapshot Error Analysis
Referring to the description of the Snapshot protocol in Section 3.2.4, the two main sources that contribute to errors in the estimation of the camera's location are:

• Projection point location: The calibration protocol uses the projection point on the camera image plane to calculate the camera parameters. Errors introduced in the location of the projection point, due to lens distortion, skew and the object detection algorithm, contribute to the error in the calibrated parameters.
• Reference point location: The locations of the reference points are used to calibrate the camera, and are in turn calculated based on range estimates to fixed beacons. The error in the ultrasound-based range estimation introduces an error in the location of the reference point, which in turn influences the error in the calibrated camera parameters.
In this section, we present two techniques to analyze the effect of the above errors on
the calibration parameters.
3.3.1 CRB-based Error Analysis
We first derive a lower bound on the camera location estimation error using the Cramer-Rao Bound (CRB) [27, 36, 51]. The CRB gives a lower bound on the expected covariance for an unbiased estimator. We use the CRB to derive a lower bound on the Euclidean distance error between the exact camera location and the estimated camera location. Let C = (xc, yc, zc) be the location of a camera and C′ = (xc′, yc′, zc′) its estimate. The error covariance matrix is defined as:

V = E{(C′ − C)(C′ − C)ᵀ}   (3.10)
The lower bound on the error covariance is the Cramer-Rao Bound and is calculated as

CRB = [J(C)]⁻¹   (3.11)

where the matrix J(C) is the Fisher information matrix. Fisher information is the amount of information an observed/measured random variable possesses about an unobservable parameter upon which the probability distribution of the measured variable depends. The Fisher information matrix is given by

J(C) = E{[∂/∂C ln fX(x; C)][∂/∂C ln fX(x; C)]ᵀ}   (3.12)

where x = (u, v) is the measured location of the reference point projection on the image plane.
Consider a reference point at location (xi, yi, zi) and the camera at C = (xc, yc, zc). The coordinates of the projection point on the image plane (u, v) are given by:

u = (xi′ − xc′)/(zi′ − zc′) × f,   v = (yi′ − yc′)/(zi′ − zc′) × f,

where f is the focal length of the camera and

[xi′ yi′ zi′]ᵀ = R × [xi yi zi]ᵀ
[xc′ yc′ zc′]ᵀ = R × [xc yc zc]ᵀ

where R is the composite rotation matrix. For this analysis, we assume R to be the identity matrix, as we are interested only in the error relative to each camera and not in the absolute error, which depends on the camera's orientation in the reference coordinate system.
For the purpose of this analysis, we assume the error in measuring the coordinates X = (u, v) on the image plane is Gaussian, and the probability density function is given by

fU(u; R, C) = N(u(C, Rp), σ²)   (3.13)
fU(v; R, C) = N(v(C, Rp), σ²)   (3.14)

where u(C, Rp) and v(C, Rp) are the true coordinates on the image plane, σ² is the variance, and Rp is the reference point. As stated above, the projection coordinates of a reference point depend on the reference point's location and the camera location, denoted by u(C, Rp) and v(C, Rp). Let D denote the array of parameters on which the projections depend. Since the u coordinate depends on the x and z intercepts, Du = [xc zc xr zr], and the v coordinate on the y and z intercepts, Dv = [yc zc yr zr], where (xr, yr, zr) represents the reference point location and (xc, yc, zc) the camera location.
Based on Equations 3.12 and 3.14, the information matrix for the u coordinate on the image plane can be represented as

Ju(C, Rp) = [Gu(C, Rp)]ᵀ Σ⁻¹ [Gu(C, Rp)]   (3.15)

[Gu(C, Rp)]ij = ∂u(C, Ri) / ∂Duj   (3.16)

where i is the index of the ith reference point, j is the jth dependent variable, and Σ is the covariance matrix corresponding to the projection error. The entries of Gu(C, Rp) for each reference point i are:

Gu(C, Rp)i = [ ∂u(C, Ri)/∂xc   ∂u(C, Ri)/∂zc   ∂u(C, Ri)/∂xr   ∂u(C, Ri)/∂zr ]   (3.17)

The Fisher information matrix Ju(C, Rp) can be computed from Equation 3.12 using the matrix Gu(C, Rp). We can similarly estimate Jv(C, Rp), the Fisher information matrix for the v projection coordinate, using Gv(C, Rp).
Next, we consider the error introduced due to uncertainty in the location of the reference point. In this case too, the error in the location of the reference point is assumed to be Gaussian, and the probability distribution is given by

f0(Rp) = N(µ0, Σ0)   (3.18)

where Rp is a vector of reference point locations, µ0 the exact locations of the reference points, and Σ0 a diagonal matrix of associated uncertainties. For the Gaussian probability distribution of prior information, the information matrix is given by

J0 = [ 0    0   ]
     [ 0  Σ0⁻¹ ]   (3.19)
Under the assumption that the information provided by the measurements is independent of the a priori information, the total Fisher information matrix is the sum of the two. From Equations 3.16 and 3.19, the total information is given by

Jtu = Ju(C, Rp) + J0   (3.20)
Jtv = Jv(C, Rp) + J0   (3.21)

The total Fisher information matrices can be used to determine the CRB for the coordinates of the camera location as

Cu ≥ Jtu⁻¹   (3.22)
Cv ≥ Jtv⁻¹   (3.23)
The lower bound on the expected localization error is calculated using the expected variances of the camera's coordinates obtained from the equations in 3.23:

Err = √(var(xc) + var(yc) + var(zc))   (3.24)

var(xc) = Cu(1, 1)   (3.25)
var(yc) = Cv(1, 1)   (3.26)
var(zc) = ½ × [Cu(2, 2) + Cv(2, 2)]   (3.27)
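The bound can be evaluated numerically. The sketch below is a simplified illustration that treats only the camera coordinates as unknowns, takes R = I, and omits the reference-point prior J0; it forms a finite-difference Jacobian G, the Fisher information J = GᵀΣ⁻¹G, and the resulting error bound in the spirit of Equations 3.15–3.24. All names and the synthetic geometry are ours:

```python
import numpy as np

def crb_location(cam, refs, f=1.0, sigma=1.0):
    """Lower bound on the camera-location error from the (u, v) measurements.
    Simplified: only (xc, yc, zc) unknown; R = identity, as in Section 3.3.1."""
    def proj(c):
        # u, v image coordinates of every reference point
        u = f * (refs[:, 0] - c[0]) / (refs[:, 2] - c[2])
        v = f * (refs[:, 1] - c[1]) / (refs[:, 2] - c[2])
        return np.concatenate([u, v])
    eps = 1e-6
    G = np.zeros((2 * len(refs), 3))   # Jacobian of all measurements w.r.t. C
    base = proj(cam)
    for k in range(3):
        d = np.zeros(3)
        d[k] = eps
        G[:, k] = (proj(cam + d) - base) / eps
    J = G.T @ G / sigma**2             # Fisher information (cf. Eq. 3.15)
    cov = np.linalg.inv(J)             # Cramer-Rao bound on the covariance
    return np.sqrt(np.trace(cov))      # bound on expected Euclidean error (cf. Eq. 3.24)

cam = np.array([0.0, 0.0, 5.0])
refs = np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 1], [0, -1, 1], [1, 1, 2]], float)
err4 = crb_location(cam, refs[:4])
err5 = crb_location(cam, refs)         # extra reference point tightens the bound
```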
While the above technique calculates a lower bound on the error in the location estimation of a camera, a similar bound can be derived for the error in orientation estimation. Techniques to calculate this bound need further investigation and are not explored in this thesis. Further, the Cramer-Rao Bound-based analysis assumes the location errors of the reference objects and the projection errors are independent. Techniques can be developed to analytically study the error characteristics of estimating calibration parameters when these errors are assumed to be correlated.
3.3.2 Empirical Error Analysis
As described in Section 3.2.4, the locations of reference points are estimated using a positioning system (the ultrasound-based Cricket location system) and are then used for calibration. The estimated locations of reference points have uncertainties due to errors in the ultrasound-based range estimates. The average location error using Cricket (measured in terms of Euclidean distance) is empirically estimated to be 3-5 cm. The error in reference point locations impacts the calculated calibration parameters, and we study the sensitivity of the calibrated parameters to these errors. Consider four reference points with true locations (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) and (x4, y4, z4), which estimate the location of the camera as (xc, yc, zc) and the orientation angles as α, β and γ. Further, we assume that the error in each dimension of the reference point location is specified by a normal distribution N(0, σ²), with zero mean and variance σ². Given n reference points, an error component is added to each reference point (xi, yi, zi) as follows:

x′i = xi + ex   (3.28)
y′i = yi + ey   (3.29)
z′i = zi + ez   (3.30)
where ex, ey, ez are randomly sampled from the normal distribution N. The (n choose 4) updated reference point subsets are then used to compute the camera location (x′c, y′c, z′c) and orientation parameters α′, β′, γ′. The relative error in calibration as a result of the error in reference point locations is measured as:

locerr = √((x′c − xc)² + (y′c − yc)² + (z′c − zc)²)   (3.31)
panerr = ||α′ − α||   (3.32)
tilterr = ||β′ − β||   (3.33)
rollerr = ||γ′ − γ||   (3.34)
where locerr is the relative location error, measured as the Euclidean distance between the estimated camera locations, and panerr, tilterr and rollerr are the relative orientation errors of the pan, tilt and roll angles, respectively.

The sensitivity of the calibration parameters is estimated by measuring the relative location and orientation errors for different (increasing) variances of the error distribution. We test sensitivity for random error, where errors in each dimension of every reference point are sampled independently, and for correlated error, where errors for each dimension are sampled randomly but are the same for all reference points. We present the experimental results of the sensitivity analysis in Section 3.5.
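The perturbation procedure of Equations 3.28-3.34 can be sketched as follows; the random/correlated distinction matters because correlated noise shifts all reference points together and so preserves their relative geometry. The helper names and the σ value are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

def perturb(refs, sigma, correlated=False):
    """Add N(0, sigma^2) noise per Equations 3.28-3.30. Correlated: one (ex, ey, ez)
    sample shared by all reference points; random: a fresh sample per point."""
    if correlated:
        e = rng.normal(0.0, sigma, size=3)         # same offset for every point
        return refs + e
    return refs + rng.normal(0.0, sigma, size=refs.shape)

def loc_err(est, true):
    """Relative location error, Equation 3.31 (Euclidean distance)."""
    return float(np.linalg.norm(np.asarray(est) - np.asarray(true)))

refs = np.array([[0., 0., 0.], [4., 0., 0.], [0., 5., 0.], [1., 1., 6.]])
noisy = perturb(refs, sigma=0.04)                  # ~4 cm, Cricket-like error
corr = perturb(refs, sigma=0.04, correlated=True)  # all points shifted together
```

Re-running calibration on each perturbed subset and recording locerr, panerr, tilterr and rollerr over many draws yields the sensitivity curves reported in Section 3.5.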
3.4 An Object Localization and Tracking Application
In general, the accuracy desired from the calibration phase depends on the application
that will subsequently use this calibrated sensor network. To determine how calibration
errors impact application accuracy, we consider a simple object localization and tracking
example. This scenario assumes that the calibrated sensor network is used to detect external
objects and track them as they move through the environment. Tracking is performed by
continuously computing the coordinates of the moving object. A camera sensor network
can employ triangulation techniques to determine the location of an object—if an object
is simultaneously visible from at least two cameras, and if the locations and orientations
of these cameras are known, then the location of the object can be calculated by taking
pictures of the object and using its pixel coordinates to compute its actual location.
To see how this is done, consider Figure 3.7, which depicts an object O that is simultaneously visible in cameras C1 and C2. Since both cameras are looking at the same object, the lines connecting the centers of the cameras to the object should intersect at the object O. Since the location of each camera is known, a triangle C1OC2 can be constructed as shown in the figure. Let D1 and D2 denote the distances between the object and the two cameras, respectively, and let D12 denote the distance between the two cameras. Note that D12 can
Figure 3.7. Object localization using two cameras.
be computed as the Euclidean distance between the coordinates C1 and C2, while D1 and D2 are unknown quantities. Let θ1, θ2 and φ denote the internal angles of the triangle as shown in the figure. Then the Sine theorem for a triangle from elementary trigonometry states that

D1/sin(θ1) = D2/sin(θ2) = D12/sin(φ)   (3.35)
The angles θ1 and θ2 can be computed by taking pictures of the object and using its pixel coordinates as follows. Suppose that the object projects an image at pixel coordinates (−px1, −pz1) at camera C1, and let f1 denote the focal length of camera C1. Then the projection vector v1 = (px1, f1, pz1) is the vector joining the pixel coordinates to the center of the lens, and this vector lies along the direction of the object from the camera center. If v is the vector along the direction of the line connecting the two cameras, then the angle θ1 can be calculated using the vector dot product:

v · v1 = |v| × |v1| × cos(θ1)   (3.36)

The angle θ2 can be computed similarly, and the angle φ is then determined as (180 − θ1 − θ2). Given θ1, θ2 and φ and the distance between the two cameras D12, the values of D1 and D2 can be computed using the Sine theorem as stated above.
Given the distances of the object from the cameras (as given by D1 and D2) and the directions along which the object lies (as defined by the projection vectors v1 and v2), the object location can be easily computed. Note that the orientation matrices of the cameras must also be accounted for when determining the world coordinates of the object using each camera. In practice, due to calibration errors, the object locations as estimated by the two cameras are not identical. We take the mid-point of the two estimates as the location of the object.
Thus, two overlapping cameras can coordinate with one another to triangulate the lo-
cation of an external object. We will use this object localization application in our exper-
imental evaluation to quantify the impact of calibration errors on the application tracking
error.
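The triangulation steps above can be sketched end-to-end. The code below assumes noise-free, world-frame direction vectors (i.e., the camera orientation matrices have already been applied to the projection vectors); the names are illustrative:

```python
import numpy as np

def localize(c1, c2, v1, v2):
    """Triangulate an object's position from two calibrated cameras.
    c1, c2: camera centers; v1, v2: world-frame directions toward the object."""
    v = c2 - c1                                   # baseline between the cameras
    d12 = np.linalg.norm(v)
    cos1 = v.dot(v1) / (d12 * np.linalg.norm(v1))     # angle at camera 1 (Eq. 3.36)
    cos2 = (-v).dot(v2) / (d12 * np.linalg.norm(v2))  # angle at camera 2
    th1, th2 = np.arccos(cos1), np.arccos(cos2)
    phi = np.pi - th1 - th2                       # third internal angle
    d1 = d12 * np.sin(th2) / np.sin(phi)          # law of sines (cf. Eq. 3.35)
    d2 = d12 * np.sin(th1) / np.sin(phi)
    p1 = c1 + d1 * v1 / np.linalg.norm(v1)        # estimate from each camera
    p2 = c2 + d2 * v2 / np.linalg.norm(v2)
    return (p1 + p2) / 2                          # mid-point of the two estimates

c1, c2 = np.array([0., 0., 0.]), np.array([10., 0., 0.])
obj = np.array([4., 6., 2.])
est = localize(c1, c2, obj - c1, obj - c2)        # ideal, noise-free directions
```

With calibration error, v1 and v2 no longer intersect exactly and the two per-camera estimates differ, which is why the mid-point is used.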
3.5 Experimental Evaluation
In this section, we evaluate the efficacy of Snapshot, quantify the impact of using Cricket, and evaluate the impact of Snapshot on our object tracking application.
3.5.1 Experimental Setup
The setup to evaluate the accuracy of Snapshot and its sensitivity to system parameters consisted of placing the two types of cameras, the CMUcam and the Sony MotionEye webcam, at several locations. To simplify accurate location measurements, we marked a grid on which to place the position sensor objects. Each camera took several pictures to estimate the parameters. The difference between the estimated parameter value and the actual value is reported as the measurement error. The Cricket sensors on the objects received beacons from a set of pre-calibrated Cricket sensor nodes placed on the ceiling of a room. A digital compass was attached to the two cameras in order to measure the exact orientation angles.
3.5.2 Camera Location Estimation Accuracy
To evaluate Snapshot's performance for camera location estimation, we place tens of reference points in the space, and take pictures of these reference points at different locations and orientations. We measure the locations of these reference points by hand (referred to as "without Cricket"), which can be considered the objects' real locations, and by Cricket [42] (referred to as "with Cricket"), where we observed a 2-5 cm error.

For each picture, we take all combinations of any four reference points in view (with no 3 points on the same line), and estimate the camera's location accordingly. We consider the distance between the estimated camera location and the real camera location to be the location estimation error.
As shown in Figure 3.8(a), our results show: (i) the median errors using the webcam without Cricket and with Cricket are 4.93 cm and 9.05 cm, respectively; (ii) the lower quartile and upper quartile errors without Cricket are 3.14 cm and 7.13 cm; (iii) the lower quartile and upper quartile errors with Cricket are 6.33 cm and 12.79 cm; (iv) the median filter (referred to as M.F.) improves the median error to 3.16 cm and 7.68 cm without Cricket and with Cricket, respectively.

Figure 3.8(b) shows: (i) the median errors using the CMUcam without Cricket and with Cricket are 6.98 cm and 12.01 cm, respectively; (ii) the lower quartile and upper quartile errors without Cricket are 5.03 cm and 10.38 cm; (iii) the lower quartile and upper quartile errors with Cricket are 8.76 cm and 15.97 cm; (iv) the median filter improves the median error to 5.21 cm and 10.58 cm without Cricket and with Cricket, respectively.
3.5.2.1 Effect of Iteration on Estimation Error
As our protocol proceeds, the number of available reference points increases. As a result,
the number of combinations of four reference points also increases, and more location
estimates become available to the median filter. Consequently, we can eliminate tails and
outliers better. In this section, we study the effect of the protocol's iterations on
camera location estimation error by plotting the median error versus the number of
available reference points.

Figure 3.8. Empirical CDF of error in estimation of camera location: (a) webcam, (b)
CMUcam. Each panel plots probability against error (cm) for the without-Cricket and
with-Cricket cases and their median-filtered (M.F.) variants.
Figure 3.9 shows: (i) the median errors using the webcam drop from 4.93 cm to 2.13 cm and
from 9.05 cm to 6.25 cm as the number of reference points varies from 4 to 16, without
and with Cricket, respectively; (ii) the median errors using the CMUcam drop from 6.98 cm
to 2.07 cm and from 12.01 cm to 9.59 cm as the number of reference points varies from 4
to 16, without and with Cricket, respectively. The difference in the location estimation
errors (with and without Cricket) is due to the position estimation errors in Cricket and
also due to errors in the values of the camera intrinsic parameters.
3.5.3 Camera Orientation Estimation Error
Next, we evaluate Snapshot's accuracy in estimating the camera orientation parameters.
We used the two cameras, the CMUcam and the Sony MotionEye webcam, to capture images of
reference points at different camera locations and orientations. We used the camera
location estimated from the exact locations of the reference points, and from the
Cricket-reported locations of the reference points, to estimate the orientation
parameters of the camera. The orientation of the camera was computed using the estimated
camera location.
Figure 3.9. Effect of number of reference points on location estimation error (median
error in cm versus number of reference points, for the CMUcam and webcam, with and
without Cricket).
We compared the estimated orientation angles with the measured angles to calculate error.
Figure 3.10(a) shows the CDF of the error estimates of the pan, tilt, and roll
orientations using the CMUcam. Figure 3.10(b) shows the CDF of the error of the three
orientations when Cricket is used for location estimation. The cumulative error plots
follow the same trends for each of the orientation angles. The median roll orientation
error, both with and without Cricket for camera location estimation, is 1.2 degrees. In
both cases, the 95th percentile error is less than 5 degrees for the pan and tilt
orientations and less than 3 degrees for the roll orientation. The slight discrepancies
in the error measurements of the two cases are due to the use of the digital compass to
measure the orientation of the camera.
Thus, we conclude that Cricket's positioning errors do not add significant error to the
estimation of camera orientation parameters. In our experiments, we find that a median
location estimation error of 11 cm does not affect the orientation estimation
significantly.
3.5.4 Comparison With Lower Bound Error
Next, we compare the empirical error using Snapshot to calibrate cameras with the
expected lower bounds on error obtained using Cramer-Rao bound analysis. As discussed in
Section 3.3, the error in estimating the camera parameters is affected by the projection
error and by the error in the location of the reference points. Figure 3.11 reports
results with a variance of 3 pixels in the projection error and an 8 cm error in each
dimension of the reference point location.

Figure 3.10. Empirical CDF of error in estimating orientations (pan, tilt, and roll)
with the CMUcam: (a) without Cricket, (b) with Cricket. Each panel plots probability
against error (degrees).

Figure 3.11(a) reports the error when only the projection error is considered; both the
empirical error and the lower bound decrease as reference points are added. Comparing
the lower bound with the empirical error, the difference is 4-5 cm with few reference
points and decreases to 2-3 cm with more than 10 reference points. Figure 3.11(b)
reports the comparison between the empirical error and the lower bound when both the
projection error and the error in reference point location are considered. As can be
seen, for both the CMUcam and the webcam, this lower bound is greater than the one
obtained when only the projection error is considered. Further, the error in reference
point location dominates the calibration error and remains almost constant even as the
number of reference points increases. The trend in the empirical error is similar to
that of the lower bound, differing from the lower bound by 3 cm with the webcam and
5-6 cm with the CMUcam.
3.5.5 Sensitivity Analysis
As described in Section 3.3.2, we evaluate the sensitivity of the calibrated parameters
to uncertainty in reference point locations. We varied the standard deviation of the
error distribution in each dimension from 1 cm to 8 cm and numerically computed its
impact on the calibration parameters.

Figure 3.11. Comparison of empirical error with lower bounds, (a) without and (b) with
considering error due to Cricket. Each panel plots distance error (cm) against the number
of reference points for the CMUCam (352x288) and Webcam (640x480), together with the
corresponding Cramer-Rao bounds, CRB(352x288) and CRB(640x480).

As shown in Figure 3.12(a), the estimated locations are less sensitive to correlated
error but highly sensitive to random error. Further, the results in Figure 3.12(b) show
that: (i) orientation estimation is insensitive to correlated error; the mean error is
always very close to zero; and (ii) orientation estimation is very sensitive to random
error; the mean error increases by a factor of four as the standard deviation increases
from 1 cm to 8 cm. The calibrated parameters are less sensitive to correlated errors
because all reference points have the same error magnitude, and the camera location
shifts in the direction of the error without affecting the estimated orientation. With
random errors in each dimension of the reference points, all reference points shift in
different directions by different offsets, and as a result, the calibration errors are
larger. The error in a real Cricket system is neither fully correlated nor fully random;
it lies between these two cases and has intermediate sensitivity. The previous
experimental results verify this hypothesis.
3.5.6 Object Localization
In this section, we study the performance of object localization using Snapshot. We use
Snapshot to estimate the camera locations and orientations, and then in turn use the
calibrated parameters to triangulate an object via the technique described in Section 3.4.

Figure 3.12. Sensitivity of estimation to uncertainty in reference point location: (a)
location (mean error in cm) and (b) orientation with the CMUcam (mean error in degrees),
plotted against the standard deviation (cm) of random and correlated errors in each
dimension.

Figure 3.13. Empirical CDF of error in estimation of object's location (webcam and
CMUcam, with and without Cricket).

As in Section 3.5.2, we use the empirical CDF of the object's location estimation error
to measure performance. Our results (see Figure 3.13) show that: (i) the median
localization error using webcams is 4.94 cm and 5.45 cm without and with Cricket,
respectively; (ii) the median localization error using CMUcams is 11.10 cm and 11.73 cm
without and with Cricket, respectively; (iii) localization without Cricket outperforms
localization using Cricket for all cameras; and (iv) localization using webcams
outperforms that with the CMUcams due to their higher fidelity.
Task                        Duration (ms)
Snap Image                  178 ± 2
Recognize Object Location   52 ± 0.1
Location Estimation         18365 ± 18

Figure 3.14. Runtime of different calibration tasks.
3.5.7 Runtime Scalability
Using our prototype implementation, we measure the runtime of the Snapshot protocol.
Figure 3.14 reports the runtime of the different tasks of the Snapshot calibration
protocol executing on the Intel Stargate platform with the camera attached to a USB
connector (transferring an image over the serial cable of the CMUcam requires additional
time). As seen from the table, the location estimation task, which uses a non-linear
solver, has the highest execution time. The time to calibrate an individual camera is
4 x (178 ms + 52 ms), the time to snap four images and recognize the location of the
object in each, plus 18365 ms for the location and orientation estimation, for a total
of 19285 ms, i.e., about 19.3 seconds. Thus, with a time of approximately 20 seconds to
calibrate a single camera, Snapshot can easily calibrate tens of cameras in a few
minutes.
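The per-camera total can be checked with a short script (a sketch; the constant names are ours, with the task runtimes taken from Figure 3.14):

```python
# Back-of-the-envelope check of the per-camera calibration time,
# using the task runtimes from Figure 3.14 (milliseconds).
SNAP_MS = 178        # snap one image
RECOGNIZE_MS = 52    # locate the reference object in one image
ESTIMATE_MS = 18365  # non-linear location/orientation estimation

def calibration_time_ms(num_images=4):
    """Total time to calibrate one camera: capture and process
    `num_images` reference-object sightings, then run the solver once."""
    return num_images * (SNAP_MS + RECOGNIZE_MS) + ESTIMATE_MS

total = calibration_time_ms()
print(total)                 # 19285 ms, i.e., ~19.3 s per camera
print(20 * total / 1000.0)   # ~386 s: twenty cameras calibrate in minutes
```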
3.6 Conclusions
In this chapter, we presented Snapshot, an automated calibration protocol that is
explicitly designed and optimized for sensor networks. Our techniques draw upon
principles from vision, optics, and geometry, and are designed to work with the
low-fidelity, low-power camera sensors that are typical in sensor networks. Our
experiments showed that Snapshot yields an error of 1-2.5 degrees when determining
camera orientation and 5-10 cm when determining camera location. We argued that this
error is tolerable in practice, since a Snapshot-calibrated sensor network can track
moving objects to within 11 cm of their actual locations. Finally, our measurements
showed that Snapshot can calibrate a camera sensor within 20 seconds, enabling it to
calibrate a sensor network containing tens of cameras within minutes.
CHAPTER 4
APPROXIMATE INITIALIZATION OF CAMERA SENSOR NETWORKS
4.1 Introduction
Wireless sensor networks have received considerable research attention over the past
decade, and rapid advances in technology have led to a spectrum of choices of image sen-
sors, embedded platforms, and communication capabilities. Consequently, camera sensor
networks—a class of wireless sensor networks consisting of low-power imaging sensors
[44, 49]—have become popular for applications such as environmental monitoring and
surveillance.
Regardless of the end-application, camera sensor networks perform several common
tasks such as object detection, recognition, and tracking. While object detection involves
determining when a new object appears in range of the camera sensors, recognition in-
volves determining the type of the object, and tracking involves using multiple camera
sensors to continuously monitor the object as it moves through the environment. To effec-
tively perform these tasks, the camera sensor network needs to be calibrated at setup time.
Calibration involves determining the location and orientation of each camera sensor. The
location of a camera is its position (3D coordinates) in a reference coordinate system, while
orientation is the direction in which the camera points. By determining these parameters
for all sensors, it is possible to determine the viewable range of each camera and what por-
tion of the environment is covered by one or more cameras. The relationship with other
nearby cameras, in particular, the overlap in the viewable ranges of neighboring cameras
can be determined. This information can be used by applications to determine which cam-
era should be used to sense an object at a certain location, how to triangulate the position
of an object using overlapping cameras, and how to handoff tracking responsibilities from
one camera to another as the object moves.
Calibration of camera sensors is well-studied in the computer vision community and a
number of techniques to accurately estimate the location and orientation of cameras have
been proposed [23, 59, 68]. These techniques assume that the coordinates of a few landmarks
are known a priori and use the projection of these landmarks on the camera’s image plane,
in conjunction with principles of optics, to determine a camera’s coordinates and orienta-
tion.1 In certain cases locations of landmarks are themselves determined using range esti-
mates from known locations; for instance, a positioning technology such as Cricket can be
used to determine the coordinates of landmarks from known beacon locations. However,
these techniques are not feasible for deployments of ad-hoc low power camera sensors
for the following reasons: (i)Resource constraints:Vision-based techniques for accu-
rate calibration of cameras are compute intensive. Low-power cameras do not have the
computation capabilities to execute these complex mathematical tasks. Further, images of
low-power cameras are often of low fidelity and not well suited for high precision calibra-
tion, (ii) Availability of landmarks: In many scenarios, ad-hoc camera sensor networks
are deployed in remote locations for monitoring mountainous and forest habitats or for
monitoring natural disasters such as floods or forest fires. No landmarks may be available
in remote inhabited locations, and infrastructure support such as positioning technologies
may be unavailable or destroyed, making it difficult to define new landmarks.
One solution that eliminates the need for landmarks is to equip each camera sensor with a
positioning device such as GPS [5] and a directional digital compass [11], which enable
direct determination of the node's location and orientation. However, today's GPS
technology has far too much error to be practical for calibration purposes (GPS can
localize an object only to within 5-15 m of its actual position). Ultrasound-based
positioning and
1 Vision-based calibration techniques can also determine a camera's internal parameters, such as the camera focal length and lens distortion, in addition to external parameters such as location and orientation.
ranging technology [42] is an alternative that provides greater accuracy. However, using
additional hardware with low-power cameras both consumes more energy and, in some cases,
can be prohibitively costly. As a result, accurate calibration is not always feasible
when initializing resource-constrained camera sensor networks with limited or no
infrastructure support.
Due to these constraints, in this thesis we ask a fundamental question: is it possible to
initialize camera sensors without the use of known landmarks or any positioning
technology? In scenarios where accurate camera calibration is not feasible, determining
relative relationships between nearby sensor nodes may be the only available option. This
raises the following questions:
• How can we determine relative locations and orientations of camera sensors without
use of known landmarks or positioning infrastructure?
• What kind of accuracy can these approximate initialization techniques provide?
• What is the performance of applications based on approximate initialization?
4.1.1 Research Contributions
To address the above challenges, in this thesis we propose novel approximate
initialization techniques for camera sensors. Our techniques rely only on the inherent
picture-taking ability of cameras and judicious use of on-board computational resources
to initialize each camera relative to the other cameras in the system. No infrastructure
support for beaconing, range estimation, or triangulation is assumed. Our initialization
techniques are computationally lightweight, easily instantiable in environments with
little or no infrastructure support, and well suited for resource-constrained camera
sensors.
Our techniques rely on two key parameters—the degree of overlap of a camera with other
cameras, and the region of overlap for each camera. We present approximate techniques to
estimate these parameters by taking pictures of a randomly placed reference object. To
quantify the accuracy of our methods, we implement two techniques—duty-cycling and
triggered wakeup—that exploit this initialization information to effectively perform
these tasks.
We have implemented our initialization techniques on a testbed of Cyclops [44] cameras
and Intel Crossbow Motes [37] and have conducted a detailed evaluation using the testbed
and simulations. Our experiments yield the following results:
• Our approximate initialization techniques can estimate both the k-overlap and the
region of overlap to within 10% of the actual values.
• The approximation techniques can handle and correct for skews in the distribution of
reference point locations.
• The application-level accuracy using our techniques is 95-100% for determining the
duty-cycle parameter and 80% for a triggered wakeup application.
4.2 Problem Formulation
We consider a wireless network of camera sensors that are deployed in an ad-hoc fash-
ion with no a priori planning. Each sensor node is assumed to consist of a low-power
imaging sensor such as the Cyclops [44] or the CMUCam [49] connected to an embedded
sensor platform such as the Crossbow Mote [37] or the Telos node [41]. No positioning
hardware is assumed to be present on the nodes or in the environment. Given such an ad-
hoc deployment of camera sensors, our goal is to determine the following parameters for
each node:
• Degree of overlap, which is the fraction of the viewable range that overlaps with
other nearby cameras; specifically, we are interested in the k-overlap, which is the
fraction of the viewable region that overlaps with exactly k other cameras.
• Region of overlap, which is the spatial volume within the viewable region that
overlaps with another camera. While the degree of overlap indicates the extent of the
viewable region that overlaps with another camera, it does not indicate which portion of
the viewable range is covered by another camera. The region of overlap captures this
spatial overlap and is defined as the 3D intersection of the viewable regions of any
pair of cameras.

Figure 4.1. Different degrees of overlap (k-overlap) for a camera: regions labeled k1,
k2, and k3 are covered by exactly one, two, and three cameras, respectively.
Our goal is to estimate these parameters using the inherent picture-taking capability of
cameras. We assume the presence of a reference object that can be placed at random
locations in the environment; while the coordinates of the reference object are unknown,
the sensors can take pictures to determine whether the object can be viewed
simultaneously by two or more cameras from a particular location. Our goal is to design
techniques that use this information to determine the degree and region of overlap for
the various nodes. The physical dimensions of the reference object, as well as the focal
length f of each camera, are assumed to be known a priori.
4.3 Approximate Initialization
In this section, we describe approximate techniques to determine the degree of overlap
and the region of overlap for camera sensors.
4.3.1 Determining the Degree of Overlap
As indicated earlier, the degree of overlap is defined by the k-overlap, which is the
fraction of the viewing area simultaneously covered by exactly k cameras. Thus, the
1-overlap is the fraction of a camera's viewable region that does not overlap with any
other sensor; the 2-overlap is the fraction of the region viewable to itself and one
other camera, and so on. This is illustrated in Figure 4.1, where k1 denotes the region
covered by a single camera, and k2 and k3 denote the regions covered by two and three
cameras, respectively. It follows that the union of the k-overlap regions of a camera is
exactly the total viewable range of that camera (i.e., the sum of the k-overlap
fractions is 1). Our goal is to determine the k-overlap for each camera, k = 1 . . . n,
where n is the total number of sensors in the system.

Figure 4.2. k-overlap estimation with distribution of reference points: (a) uniform
distribution, (b) skewed distribution, (c) weighted approximation.
4.3.1.1 Estimating k-overlap
Our approximate technique employs random sampling of the three-dimensional space to
determine the k-overlap for each camera sensor. This is done by placing an easily
identifiable reference object at randomly chosen locations and having the camera sensors
take pictures of the object. Let each object location be denoted a reference point (with
unknown coordinates). Each camera then processes its pictures to determine which
reference points are visible to it. By determining the subset of the reference points
that are visible to multiple cameras, we can estimate the k-overlap fractions for the
various sensors. Suppose that r_i reference points from the total set are visible to
camera i. Of these r_i reference points, let r_i^k denote the number that are
simultaneously visible to exactly k cameras. Assuming a uniform distribution of
reference points in the environment, the k-overlap for camera i is given by

    O_i^k = r_i^k / r_i                                            (4.1)

Depending on the density of reference points, the error in the estimate of O_i^k can be
controlled. The procedure is illustrated in Figure 4.2(a), where there are 16 reference
points visible to camera 1, of which 8 are visible only to itself, 4 are visible to
cameras 1 and 3, and another 4 to cameras 1, 2, and 3. This yields a 1-overlap of 0.5
and a 2-overlap and 3-overlap of 0.25 for camera 1. The k-overlaps for other cameras can
be similarly determined.
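The uniform-sampling estimate of Equation 4.1 can be sketched as follows; the data-structure names are illustrative, not from the dissertation's implementation:

```python
# Sketch of the uniform-sampling k-overlap estimate (Equation 4.1).
# `visible_to` maps each reference-point id to the set of cameras that saw it.
from collections import Counter

def k_overlap(camera, visible_to, num_cameras):
    """Return {k: O_i^k} for `camera`: of the reference points it sees,
    the fraction seen by exactly k cameras (including itself)."""
    seen = [cams for cams in visible_to.values() if camera in cams]
    counts = Counter(len(cams) for cams in seen)
    total = len(seen)
    return {k: counts.get(k, 0) / total for k in range(1, num_cameras + 1)}

# The Figure 4.2(a) example: 16 points visible to camera 1,
# 8 seen only by it, 4 shared with camera 3, 4 shared with cameras 2 and 3.
points = {i: {1} for i in range(8)}
points.update({i: {1, 3} for i in range(8, 12)})
points.update({i: {1, 2, 3} for i in range(12, 16)})
print(k_overlap(1, points, 3))  # {1: 0.5, 2: 0.25, 3: 0.25}
```

By construction the fractions sum to 1, matching the observation that the k-overlap regions partition the camera's viewable range.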
4.3.1.2 Handling skewed reference point distributions
The k-overlap estimation technique presented above assumes a uniform distribution of
reference points in the environment. In reality, due to the ad-hoc nature of the
deployment and the need to calibrate the system online in the field, the placement of
reference objects at randomly chosen locations will not be uniform. The resulting error
due to a non-uniform distribution is illustrated in Figure 4.2(b), where our technique
estimates the 1-, 2-, and 3-overlap for camera 1 as 2/3, 2/9, and 1/9, as opposed to the
true values of 1/2, 1/4, and 1/4, respectively. Thus, we need to enhance our technique
to handle skews in the reference point distribution and correct for them.
The basic idea behind our enhancement is to assign a weight to each reference point,
where the weight denotes the volume that the point represents. Specifically, points in
densely populated regions are given smaller weights and those in sparsely populated
regions are given higher weights. Since a higher weight can compensate for the scarcity
of reference points in a sparsely populated region, we can correct for skewed
distributions of reference points.
Our enhancement is based on the computational geometry technique called Voronoi
tessellation [7]. In two dimensions, a Voronoi tessellation of a set of points is the
partitioning of the plane into convex polygons such that each polygon contains a single
generating point and all points within a polygon are closest to the corresponding
generating point. Figure 4.2(c) shows a skewed distribution of reference points in the
2D viewing area of a camera and the corresponding Voronoi tessellation. Each reference
point is contained within a cell, with all points in the cell closest to the
corresponding reference point. Given a skewed distribution of reference points, it
follows that densely situated points will be contained within smaller polygons, and
sparsely situated points within larger polygons. Since the size of each polygon reflects
the density of points in its neighborhood, it can be used as an approximation of the
area represented by each point. Voronoi tessellations extend to points in three
dimensions, with each point contained within a 3D cell instead of a polygon.
Using the Voronoi tessellation, each reference point is assigned a weight approximately
equal to the volume of the cell in which it lies. The k-overlap is then computed as

    O_i^k = w_i^k / w_i                                            (4.2)

where w_i^k is the cumulative weight of all reference points that are simultaneously
visible to exactly k cameras and w_i is the total weight of all the cells in the
viewable region of camera i. Observe that when the reference points are uniformly
distributed, each point gets an equal weight, and the above equation reduces to
Equation 4.1.
As a final caveat, Voronoi tessellation requires the coordinates of reference points in
order to partition the viewable region into cells or polygons. Since reference point coor-
dinates are unknown, our techniques must estimate them during the initialization phase
(without using any infrastructure support). We describe how to do this in Section 4.3.2.
4.3.1.3 Approximate Tessellation
Since tessellation is a compute-intensive procedure that might overwhelm the limited
computational resources of a sensor node, we have developed an approximation. Instead of
tessellating the 3D viewing region of a camera into polyhedra, a computationally
expensive task, the viewing region is discretized into small cubes. For each cube, the
viewable reference point closest to the center of the cube is found, and the volume of
the cube is added to the weight of that reference point. When all cubes have been
assigned and their volumes added to the respective reference points, the weight of each
reference point reflects the local density—points in less dense regions have higher
weights than points in denser regions—thereby yielding an approximation of the
tessellation process.
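The cube-discretization step and the weighted k-overlap of Equation 4.2 can be sketched together; the grid bounds, cell size, and function names here are illustrative assumptions:

```python
# Sketch of the approximate tessellation: discretize the viewing volume into
# cubes and credit each cube's volume to its nearest visible reference point,
# then form the weighted k-overlap of Equation 4.2.
import itertools
import math

def approximate_weights(ref_points, bounds, step):
    """ref_points: list of (x, y, z); bounds: ((x0,x1), (y0,y1), (z0,z1)).
    Returns one weight per reference point: the total volume of the cubes
    whose centers are closest to it."""
    weights = [0.0] * len(ref_points)
    cube_vol = step ** 3
    axes = [[lo + step / 2 + i * step
             for i in range(int((hi - lo) / step))] for lo, hi in bounds]
    for center in itertools.product(*axes):
        nearest = min(range(len(ref_points)),
                      key=lambda j: math.dist(center, ref_points[j]))
        weights[nearest] += cube_vol
    return weights

def weighted_k_overlap(weights, cams_per_point, camera):
    """O_i^k = w_i^k / w_i over the points visible to `camera`."""
    mine = [(w, len(c)) for w, c in zip(weights, cams_per_point) if camera in c]
    w_i = sum(w for w, _ in mine)
    overlap = {}
    for w, k in mine:
        overlap[k] = overlap.get(k, 0.0) + w / w_i
    return overlap
```

When the reference points are spread uniformly, every point accumulates a similar weight and the estimate collapses back to the unweighted ratio of Equation 4.1, as the text observes.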
4.3.2 Determining the Region of Overlap
Since the k-overlap only indicates the extent of overlap but does not specify where the
overlap exists, our techniques also determine the region of overlap for each camera. As
before, we assume a reference object placed at randomly chosen locations. Using these
points, a Voronoi tessellation of the viewing area is first obtained for each camera.
The region of overlap for any two cameras C_i and C_j is simply the union of the cells
containing all reference points simultaneously visible to the two cameras. Figure 4.3
shows the Voronoi tessellation of the 2D viewing region of camera 1, the reference
points viewable by cameras 1 and 2, and the approximate region of overlap (shaded
region) for (C1, C2). Thus, our approximate tessellation (described in Section 4.3.1.3)
can be used to determine the region of overlap for all pairs of cameras in the system.
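Given per-point cell weights from the tessellation, the region of overlap of a camera pair reduces to a simple lookup over the shared cells; this is an illustrative sketch with made-up names:

```python
# Sketch: approximate the region of overlap of two cameras as the set of
# Voronoi cells (represented by their point indices and volumes) whose
# reference points both cameras can see.
def region_of_overlap(cams_per_point, weights, cam_a, cam_b):
    """Return (indices of shared cells, their total volume)."""
    shared = [j for j, cams in enumerate(cams_per_point)
              if cam_a in cams and cam_b in cams]
    return shared, sum(weights[j] for j in shared)

cells = [{1}, {1, 2}, {1, 2}, {2}, {1, 2, 3}]
vols = [2.0, 1.0, 1.5, 3.0, 0.5]
print(region_of_overlap(cells, vols, 1, 2))  # ([1, 2, 4], 3.0)
```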
Estimating reference point locations: As indicated before, the tessellation process
requires the locations of reference points. Since no infrastructure is available, we present
a technique to estimate these locations using principles of optics. While it is
impossible to determine the absolute coordinates of a reference point without
infrastructure support, it is possible to determine the coordinates of a reference point
relative to each camera. A key insight is that if each camera can determine the
coordinates of visible reference points relative to itself, then tessellation is
feasible—absolute coordinates are not required.
Figure 4.4. Estimating reference point locations without ranging information: (a) the
relation between object and image in 2D, (b) estimation of the reference point location
from the image centroid and the focal length f.
Assuming the origin lies at the center of the lens, the relative coordinates of a point
are defined as (d_r, v_r), where d_r is its distance from the origin and v_r is a vector
from the origin in the direction of the reference point that defines its orientation in
3D space.
We illustrate how to determine the distance d_r from the camera in two dimensions. We
assume that the size of the reference object is known a priori, say s. The focal length
f is also known. The camera first estimates the size of the image projected by the
object—this is done by computing the bounding box around the image, determining its size
in pixels, and using the size of the CMOS sensor to convert that many pixels into a
physical size. If s' denotes the size of the image projected by the reference object on
the camera, then from Figure 4.4(a) the following condition holds:

    tan θ = s / d_r = s' / f                                       (4.3)

Since s, s', and f are known, d_r can be computed. A similar idea holds in 3D space,
where the area of the object, rather than its size, has to be considered.
Next, to determine the orientation of the reference point relative to the camera,
suppose the reference object projects an image at pixel coordinates (x, y) on the image
plane of the camera. Then the vector v_r has the same orientation as the vector joining
the centroid of the image to the center of the lens (i.e., the origin). As shown in
Figure 4.4(b), the vector PO = (x, y, f) has the same orientation as v_r, where O is the
origin and P is the centroid of the image with coordinates (-x, -y, -f). Since (x, y)
can be determined by processing the image and f is known, the relative orientation of
the reference point can be determined.
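Putting Equation 4.3 and the projection-line direction together, the camera-relative coordinates can be recovered as sketched below; the function and variable names are ours, and all inputs are assumed to be in consistent physical units:

```python
# Sketch of recovering a reference point's camera-relative coordinates:
# distance from the similar-triangles relation tan(theta) = s/d_r = s'/f,
# direction from the image centroid and focal length.
import math

def relative_location(s, s_img, f, px, py):
    """s: known object size; s_img: its measured image size on the sensor;
    f: focal length; (px, py): image centroid on the image plane.
    Returns (d_r, unit vector v_r from the lens center toward the point)."""
    d_r = s * f / s_img                      # rearranged Equation 4.3
    v = (px, py, f)                          # direction of the projection line
    norm = math.sqrt(px * px + py * py + f * f)
    return d_r, tuple(c / norm for c in v)

# A 10 cm object imaging to 0.5 cm with a 2 cm focal length lies 40 cm away.
d, v = relative_location(s=10.0, s_img=0.5, f=2.0, px=0.0, py=0.0)
print(d, v)  # 40.0 (0.0, 0.0, 1.0)
```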
4.4 Applications
In this section, we describe how cameras that are initialized approximately can satisfy
application requirements.
4.4.1 Duty-Cycling
Duty-cycling is a technique to operate sensors in cycles of ON and OFF durations to
increase lifetime while providing the desired event-detection reliability, and also to
bound the maximum time to detect an event. The duty-cycle parameter d is commonly
defined as the fraction of time a sensor is ON. An important criterion in deciding the
duty-cycle parameter is the degree of overlap. Sensors with high coverage redundancy can
be operated at low duty cycles while still providing the desired event-detection
probability, whereas those with lower redundancy require higher duty cycles. One
technique to set the duty-cycle parameter based on the degree of overlap is as follows:

    d_i = Σ_{k=1}^{n} O_i^k × (1/k)                                (4.4)

where d_i is the duty-cycle parameter of camera i, O_i^k is the fraction of k-overlap
with the neighboring cameras, and n is the total number of cameras. The intuition is to
duty-cycle each camera in proportion to its degree of overlap with neighboring cameras.
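Equation 4.4 can be sketched in a few lines; the intuition is that a region covered by k cameras needs each of them awake only about 1/k of the time:

```python
# Sketch of the duty-cycle rule in Equation 4.4.
def duty_cycle(k_overlaps):
    """k_overlaps: {k: O_i^k} for camera i, with fractions summing to 1.
    Returns d_i, the fraction of time the camera stays ON."""
    return sum(frac / k for k, frac in k_overlaps.items())

# The camera of Figure 4.2(a): half its view is private (k=1),
# a quarter shared with one other camera, a quarter with two others.
print(duty_cycle({1: 0.5, 2: 0.25, 3: 0.25}))  # ≈ 0.708
```

A camera with no overlap (O_i^1 = 1) gets d_i = 1 and must stay ON continuously, while heavily overlapped cameras are allowed to sleep most of the time.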
4.4.2 Triggered Wakeup
Object tracking involves continuous monitoring of an object—as the object moves from
the range of one camera to another, tracking responsibilities are transferred via a handoff.
Since cameras may be duty-cycled, such a hand-off involves a triggered wakeup to ensure
62
thresholdDistance
Projectionline Object
Image
Figure 4.5. Region of overlap for triggered wakeup.
that the destination camera is awake. A naive solution is to send triggered wakeups to all
overlapping cameras and have one of them take over the tracking. While doing so ensures
seamless handoffs, it is extremely wasteful in terms of energy by triggering unnecessary
wakeups. A more intelligent technique is to determine the trajectory of the object, use the
region of overlap to determine which camera is best positioned to take over tracking duties,
and wake only that camera.
However, since the object location is unknown to the sensor network, its trajectory cannot
be accurately determined. The only known information about the object is the image
it projects onto the camera's image plane—the object is known to lie along a line that
connects the image to the center of the lens. As shown in Figure 4.5, we refer to this line as
the projection line, the line on which the object must lie. We can exploit this information
to design an intelligent triggered wakeup technique. Any camera whose region of overlap
intersects with the projection line can potentially view the object and is a candidate for a
handoff. To determine all such cameras, we first determine the set of reference points within
a specific distance threshold of the line (see Figure 4.5): equidistant points along the length
of the projection line are chosen, and reference points within the distance threshold of these
points are identified. Next, the set of neighboring cameras that can view these reference
points is determined (using information gathered during our initialization process). One or
more of these cameras can then be woken up. Depending
on the extent of overlap with the projection line, candidate cameras are prioritized and
woken up in priority order—the camera with highest overlap has the highest probability
of detecting the object on wakeup and is woken up first. Two important parameters of the
scheme are the distance threshold and the maximum number of cameras to be woken up. A
large distance threshold will capture many reference points and yield many candidates for
wakeup, while a small threshold will ignore overlapping cameras. The maximum number
of cameras to be woken up bounds the redundancy in viewing the same object by multiple
cameras—a small limit may miss the object whereas a large limit may result in wasteful
wakeups. We discuss the effect of these parameters as part of the experimental evaluation.
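The selection steps above can be sketched as follows. This is our own illustrative implementation, not SensEye code: ref_points maps a reference-point id to its estimated 3-D position (from the initialization phase), viewers maps it to the set of cameras that reported seeing it, and samples/max_range governing the equidistant line sampling are assumptions:

```python
import math

def dist(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def wakeup_candidates(origin, direction, ref_points, viewers,
                      dist_threshold, max_cams, samples=50, max_range=100.0):
    """origin/direction define the projection line (direction is a unit
    vector from the lens center along the line the object must lie on)."""
    # Choose equidistant points along the projection line, out to max_range.
    line_pts = [tuple(o + (t / samples) * max_range * d
                      for o, d in zip(origin, direction))
                for t in range(samples + 1)]
    # Reference points within the distance threshold of the line.
    near = [rid for rid, p in ref_points.items()
            if any(dist(p, q) <= dist_threshold for q in line_pts)]
    # Score each neighboring camera by how many near-line points it views.
    score = {}
    for rid in near:
        for cam in viewers.get(rid, ()):
            score[cam] = score.get(cam, 0) + 1
    # Wake the highest-scoring cameras first, bounded by max_cams.
    ranked = sorted(score, key=score.get, reverse=True)
    return ranked[:max_cams]
```

The score acts as a proxy for the extent of overlap with the projection line, so the camera most likely to see the object on wakeup is triggered first.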
4.5 Prototype Implementation
System Design The approximate initialization procedure involves taking pictures of
reference points (or objects). Reference points are objects, such as a ball with a unique color
or a light source, that can be easily identified by processing images at each camera. Each
camera after taking a picture, processes the image to determine if it can view a reference
point. If a reference point is visible to a camera, it calculates the location of the reference
point on its image plane and if possible estimates the location of the reference point. The
location can be estimated using an approximation of the distance of the reference point
from the camera. The distance can be determined if dimensions of the reference object
are known a priori along with the size of its image on the camera's image plane. The
image location and object-distance information is exchanged with all other cameras in
the network. The data recorded at each camera can be stored as a table of tuples:
< R_k : C_i, u_i, v_i, d_i, C_j, u_j, v_j, d_j, ... >
where R_k is the kth reference point visible to camera i, (u_i, v_i) is the projection location
of the reference point in the image plane, and d_i is the distance of the reference point from
the camera. The tuple also stores information from each camera that can view the refer-
ence point simultaneously. Based on this information collected at each camera, techniques
described above are used to initialize cameras.
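A minimal sketch of the distance estimate and the per-reference-point record described above; the helper name, focal length, and all tabulated values are illustrative, not measurements from the prototype:

```python
def estimate_distance(focal_len, object_size, image_size):
    """Pinhole-model approximation used during initialization:
    distance ~ f * (real object size / size of its image).
    All quantities must be in consistent units."""
    return focal_len * object_size / image_size

# ViewTable sketch: one entry per reference point R_k listing, for every
# camera C_i that saw it simultaneously, the image-plane projection (u, v)
# and the estimated distance d (values here are made up).
view_table = {
    "R1": [("C1", 12, 40, 58.0), ("C2", -8, 33, 61.5)],
}
```

For example, a 1.5-unit-wide reference object imaged at 0.05 units through a lens of focal length 2.0 would be placed at an estimated distance of 60 units.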
Figure 4.6. Setup and software architecture of prototype implementation. (a) Network
setup; (b) software architecture.
The network setup for our prototype implementation is shown in Figure 4.6(a). The
network consists of 8 cameras covering a region of 8 ft × 6 ft × 17 ft. The cameras are
equidistantly placed along the longest side, each at a height of 3 ft, facing each other and
viewing inside the volume. The depth-of-view of each camera is 8 ft and the horizontal
and vertical viewing regions are 7 ft and 6 ft respectively. The setup is used to estimate and
compare the k-overlap and region of overlap for each camera.
Hardware Components We used the Cyclops [44] camera sensor in our prototype
implementation to evaluate the approximate initialization techniques. The Cyclops camera
sensor consists of an ADCM-1700 CMOS camera module and supports image resolutions of
32x32, 64x64 and 128x128. The Cyclops node also has an on-board ATMEL ATmega128L
micro-controller, 512 KB external SRAM and 512 KB Flash memory. The on-board pro-
cessing capabilities of the Cyclops are used for object detection and to determine the size of
the object's image. Each Cyclops sensor is connected to a Crossbow Mote (referred to as the
HostMote) and they communicate with each other via the I2C interface. The HostMote is
also used to receive and send wireless messages and store initialization information on be-
half of the Cyclops. A mote is also used as a remote control to send synchronized sampling
triggers to detect reference points during the initialization process. We experimented with
different objects as reference points—small balls with unique colors, a bulb, and a glowing
ball.
Software Components Both the Cyclops sensors and the Crossbow Motes run
TinyOS [57]. Each Cyclops communicates with its attached mote using the I2C interface
and the motes communicate with each other via their wireless interface (see Figure 4.6(b)).
Cyclops Onboard Tasks: Each Cyclops is responsible for taking images and processing
them locally to detect the reference objects. On receiving a trigger from the HostMote each
Cyclops takes a picture and processes it to detect and recognize reference objects. The
results are communicated back to the HostMote.
HostMote Tasks: The HostMote drives each Cyclops to detect reference objects and
stores all the initialization information for each camera. Once a reference object is detected,
the HostMote estimates the distance of the object from the camera and transmits a broadcast
message indicating visibility of the reference object, the coordinates of the object on its
image plane, and the distance of the object from the camera. Further, the HostMote receives
similar broadcasts from other nodes and maintains the ViewTable, a table of tuples
representing the viewability information of each reference point.
Trigger Mote Tasks: The trigger mote is used as a remote control for synchronized
detection of the reference object. Once a reference object is placed in a location, the trigger
mote sends a wireless broadcast trigger to all HostMotes, which in turn trigger the attached
Cyclops sensors.
4.6 Experimental Evaluation
In this section we present a detailed experimental evaluation of the approximate initial-
ization techniques using both simulation- and implementation-based experiments. Specif-
ically, we evaluate the accuracy of the approximate initialization procedure in estimating
the degree of overlap and region of overlap of camera sensors. In addition, we evaluate
the effect of skew in the location of reference points on the accuracy of estimation. Further,
we also evaluate the performance of a triggered wakeup application, which demonstrates
effective use of the region of overlap information.
4.6.1 Simulation Setup
The simulation setup used for evaluation consisted of a cubical region with dimen-
sions 150x150x150. Two cases are used: one with 4 cameras and the other with 12 cameras.
In the first case, 4 cameras are placed at locations (75,0,75), (75,150,75), (0,75,75), and
(150,75,75), oriented perpendicular to the side plane looking inwards. The k-overlap at
each camera is as follows: 1-overlap: 0.54, 2-overlap: 0.23, 3-overlap: 0.07 and 4-overlap:
0.16. In the second case, an additional 8 cameras are placed at the 8 corners of the cube and
each of them is oriented inwards with the central axis pointing towards the center of the
cube.
A uniform distribution of reference points was simulated by uniformly distributing
points in the cubical viewing region. To simulate a skewed distribution, a fraction of refer-
ence points were distributed in a smaller region at the center of the viewing region and the
rest were distributed in the entire viewing area. For example, a region of size 25x25x25 at
the center of the viewing region, in different cases, had at least 25%, 33%, 50%, 66% and
75% of the total points within its boundary. We also used restricted regions of sizes 50x50x50
and 75x75x75 with varying fractions of skew in our evaluation.
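A sketch of how such a skewed sample can be generated (our own helper, not the simulator's code; parameter defaults follow the 150x150x150 region and the 25x25x25 centered restricted cube described above):

```python
import random

def skewed_points(n, skew_frac, region=150.0, restricted=25.0, seed=None):
    """Generate n reference points: skew_frac of them fall uniformly inside
    a restricted cube centered in the viewing region, the rest uniformly
    anywhere in the region."""
    rng = random.Random(seed)
    lo = (region - restricted) / 2.0          # restricted cube is centered
    n_skew = int(skew_frac * n)
    pts = []
    for i in range(n):
        if i < n_skew:                        # point inside the restricted cube
            pts.append(tuple(lo + rng.uniform(0.0, restricted) for _ in range(3)))
        else:                                 # point anywhere in the region
            pts.append(tuple(rng.uniform(0.0, region) for _ in range(3)))
    return pts
```

Setting skew_frac to 0 recovers the uniform case, while values like 0.25-0.75 reproduce the skew fractions used in the evaluation below.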
4.6.2 Degree of overlap estimation
In this section we present an evaluation of the techniques used to estimate k-overlap, the
degree-of-overlap metric, and its use to estimate the duty-cycling parameter.
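For concreteness, the non-weighted estimator can be sketched as below: among the reference points a camera views, the fraction seen simultaneously by exactly k cameras approximates its k-overlap O_i^k. This sketch is ours; the weighted variant, defined earlier in the chapter, additionally corrects for the spatial distribution of the sample points and is omitted here:

```python
from collections import Counter

def k_overlap_nonweighted(viewers, camera, num_cams):
    """viewers maps a reference-point id to the set of cameras that saw it.
    Returns [O^1, O^2, ..., O^n] for the given camera: the fraction of its
    visible reference points seen by exactly k cameras in total."""
    # Reference points this camera can see, with the full viewer set of each.
    mine = [cams for cams in viewers.values() if camera in cams]
    counts = Counter(len(cams) for cams in mine)
    total = len(mine)
    return [counts.get(k, 0) / total for k in range(1, num_cams + 1)]
```

Because every sampled point counts equally, a skewed sample biases this estimate toward the overlap structure of the densely sampled region, which is exactly the weakness the weighted scheme addresses.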
4.6.2.1 Initialization with uniform distribution of reference points
Figure 4.7 plots the error in k-overlap estimation using the four-camera setup with a uni-
form distribution of reference points. The absolute difference between the approximate estimate
Figure 4.7. Evaluation of the k-overlap estimation scheme with uniform distribution of
reference points.
and the exact k-overlap fraction, averaged over the 4 cameras, is reported as the error. The error
in k-overlap estimation using both the non-weighted and weighted techniques is similar.
Figure 4.7 also plots the effect of the number of viewable reference points—reference
points viewable by at least a single camera—on k-overlap estimation. The error in k-
overlap estimation decreases with an increase in the number of reference points for both the
non-weighted and weighted schemes. Error in 1-overlap estimation with the weighted scheme
decreases from 0.075 with 50 reference points to 0.04 with 150 reference points.
4.6.2.2 Initialization with skewed distribution of reference points
Figure 4.8 plots the k-overlap estimates with a non-uniform distribution of reference
points. The results are averaged over the different fractions of skew within a restricted re-
gion of 25x25x25. As seen from the figure, the weighted scheme accounts for skew better
than the non-weighted scheme—with most benefits for 1-overlap and 4-overlap estimation.

Figure 4.8. Evaluation of weighted k-overlap estimation with skewed distribution of
reference points.

The non-weighted scheme performs poorly as it only counts the number of simultaneously
viewable points, while the weighted scheme accounts for the spatial distribution
of the points. Further, with an increase in the number of reference points, the error with the
weighted scheme decreases, whereas that with the non-weighted scheme remains the same.
Figure 4.9(a) plots the k-overlap with 150 reference points, and it shows that the weighted
scheme performs better than the non-weighted scheme: the error with the non-weighted
scheme for 1-overlap and 4-overlap is worse by factors of 4 and 6 respectively.
Figure 4.9(b) plots the error in 1-overlap estimation with 150 reference points and vary-
ing skew. As skew increases, so does the error in both the non-weighted and weighted
schemes—with the error of the weighted scheme being smaller than that of the non-weighted
scheme. The increase in error is also more gradual with the weighted scheme than with the
non-weighted scheme. The error with the non-weighted scheme increases from 0.26 to 0.49
as the skew fraction increases from 25% to 75%; the corresponding values for the weighted
scheme are 0.045 and 0.09 respectively.
Figure 4.9. Evaluation of the weighted k-overlap estimation scheme. (a) k-overlap; (b)
effect of skew; (c) duty-cycle parameter.
Duty-Cycling The percentage error in duty-cycle parameter estimation (see Section 4.4.1)
using the k-overlap estimates is shown in Figure 4.9(c). As seen from the figure, the error using
the non-weighted scheme is close to 24% and remains unchanged as the number of reference
points increases, whereas the error with the weighted scheme is 5% even with only 50 points and
decreases to nearly zero with more than 150 points.
From the results presented above, we conclude that the weighted k-overlap estimation
scheme is well suited to estimate the degree of overlap of cameras. The scheme performs
identically to the non-weighted scheme with a uniform distribution of reference points and sig-
nificantly better with non-uniform distributions. The application-level error in determining
the duty-cycle parameter using the weighted scheme is close to zero.
4.6.3 Region of overlap estimation
In this section we present an evaluation of region of overlap estimation and the triggered
wakeup heuristic that uses this estimate. Figure 4.10(a) plots results evaluating the effect of
the number of reference points on region of overlap estimation. The percentage error reported
is the absolute error between the estimated volume corresponding to a region of overlap and
the exact volume. As seen in Figure 4.10(a), with a uniform distribution of reference points, the
percentage error of all four cameras follows a similar trend. With 50 reference points the
percentage error for the four cameras is between 21-23% and with 100 reference points is
Figure 4.10. Region of overlap estimation and wakeup heuristic performance. (a) Effect of
number of reference points; (b) effect of number of cameras; (c) effect of distance threshold.
12-14%. With a higher number of reference points the error decreases and so does the stan-
dard deviation: with 200 reference points the error is 7-8% and with 250 points it is 6-7%.
The above results show that the region of overlap between pairs of cameras can be estimated
with low error—6-7% with a uniform distribution in our setup.
Wakeup Heuristic Next, we evaluate the effectiveness of the wakeup heuristic based on
the region of overlap estimates with the 12-camera setup. Figure 4.10(b) plots the effect of
the maximum number of cameras triggered on the fraction of positive wakeups, i.e., the fraction
of cases when at least one of the triggered cameras could view the object. As seen from the
figure, as the maximum number of cameras triggered per wakeup increases, the fraction
of positive wakeups increases. Further, the fraction also increases with an increase in the total
number of reference points in the environment. The fraction of positive wakeups with a maximum
of 2 cameras to be triggered is 0.7 and 0.88 for 100 and 300 reference points respectively,
with a distance threshold (see Section 4.4.2) of 20 inches. With a maximum of 5 cameras
to be triggered the corresponding fractions are 0.77 and 0.93 respectively. The result shows
that the wakeup heuristic based on the region of overlap estimate can achieve a high fraction of
positive wakeups—close to 80% accuracy with 2 cameras woken up per trigger.
Figure 4.11. Initialization using prototype implementation. (a) k-overlap error; (b)
region-of-overlap error; (c) distance estimation error (percentage error vs. true distance
in inches).

Per-camera errors from panels (a) and (b):

Camera    k-overlap error    Region-of-overlap error
   1           1.5%                  2.4%
   2           7.1%                  2.0%
   3           4.9%                  6.4%
   4           5.8%                 10.8%
   5           8.7%                  3.0%
   6           3.1%                  4.7%
   7           7.9%                  4.3%
   8           6.7%                  0.65%
Another parameter that influences the performance of the heuristic is the distance thresh-
old—the distance along the projection of the object's image used to approximate overlap-
ping cameras. As shown in Figure 4.10(c), as the distance threshold increases from 10 to
20 with 200 reference points, the fraction of positive wakeups increases and then remains rela-
tively constant for a maximum of 2, 3, 4 and 5 triggered cameras. With just one camera to be
woken up per trigger, the fraction of positive wakeups decreases as the distance threshold
increases further (beyond 20). This indicates that the distance threshold is an important
factor affecting the performance of the heuristic; for our setup a threshold of 20 yields the
best performance.
4.6.4 Implementation Results
In this section, we evaluate the estimation of k-overlap and region of overlap using our
prototype implementation. As described in Section 4.5, we use 8 cameras in our setup and
a light bulb (1.5 in in diameter) as a reference object placed uniformly in the region viewed
by the cameras. Figure 4.11(a) shows the average k-overlap percentage error at each camera.
The percentage error in k-overlap estimation over all cameras is 2-9%.
We also evaluate the accuracy of region of overlap estimate between pairs of cameras
in the 8-camera setup. Figure 4.11(b) tabulates the average percentage error estimating the
region of overlap between pairs of cameras. The average error in estimating the region of
overlap between pairs of cameras varies from 1-11% for our setup. An important factor
that affects the region of overlap estimate is the estimated distance of the object from the
camera. Figure 4.11(c) plots the percentage error in estimating the distance of the object
from the camera based on its image size. As can be seen from the figure, the error varies
from 2-12%. For our setup, the region of overlap estimates show that the error stays below
11% in spite of the error in distance estimation of the object.
Our results show that the approximate initialization techniques are feasible in real-
world deployments and for our setup had errors close to 10%.
4.7 Conclusions
In this chapter, we argued that traditional vision-based techniques for accurately cal-
ibrating cameras are not directly suitable for ad-hoc deployments of sensor networks in
remote locations. We proposed approximate techniques to determine the relative locations
and orientations of camera sensors without any use of landmarks or positioning technol-
ogy. By randomly sampling the environment with a reference object, we showed how to
determine the degree and range of overlap for each camera and how this information can
be exploited for duty cycling and triggered wakeups. We implemented our techniques on a
Mote testbed. Our experimental results showed that our approximate techniques can esti-
mate the degree and region of overlap to within 10% of their actual values, and that this error
is tolerable at the application level for effective duty-cycling and wakeups.
CHAPTER 5
ENERGY-RELIABILITY TRADEOFF IN MULTI-TIER SENSOR NETWORKS
In this chapter I will present the design and implementation of SensEye, a multi-tier
heterogeneous camera sensor network, to demonstrate the benefits of multi-tier sensor net-
works. A simple surveillance application implemented using SensEye is used to study the
tradeoff between energy efficiency and sensing reliability.
5.1 Background and System Model
This section describes the common processing tasks of camera sensor networks for
surveillance applications and the system model of SensEye.
5.1.1 Camera Sensor Network Tasks
A camera sensor network will need to perform several processing tasks in order to
obtain useful information from the video and images acquired by various camera sensors.
Two sample applications are surveillance and monitoring in a disaster response to provide
visual feedback, and monitoring of rare species in remote forests. Both applications have
numerous characteristics in common and involve three key tasks.
Object detection: First, the application needs to detect the presence of a new object
whenever it enters the monitored environment. To illustrate, the rare species monitoring
application needs to detect the presence of each animal that enters the monitored environ-
ment, while the surveillance application needs to detect vehicles or people that enter the
monitored area. A good detection algorithm will minimize the latency to detect each new
object that enters the monitored area.
Object recognition: Once a new object is detected, it needs to be classified to deter-
mine its type (e.g., a car versus a truck, a tiger versus a deer). This process, referred to
as object recognition, enables the application to determine if the object is of interest and
whether further processing is warranted. For instance, a surveillance system may be inter-
ested in counting the number of trucks on a highway but not cars. In this work, I assume
that an image database of all interesting objects is available a priori, and the recognition
step involves determining if the newly detected object matches one of the objects in this
database.
Object tracking: Assuming the new object is of interest to the application, it can be
tracked as it moves through the environment. Tracking involves multiple tasks: (i) comput-
ing the current location of the object and its trajectory, (ii) handoff of tracking responsibility
as an object moves out of visual range of one camera sensor and into the range of another,
and (iii) streaming video or a sequence of still images of the object to a logging store or a
monitoring station.
The goal is to devise a hardware and software architecture to perform these tasks so as
to optimize power consumption, without sacrificing performance metrics such as latency
and reliability. As explained earlier, rather than choosing a single platform and a single type
of camera sensor, the thesis focuses on multi-tier networks where the detection, recognition
and tracking may be performed on different nodes and cameras to achieve the above goal.
5.1.2 System Model
SensEye is a camera sensor network comprising multiple tiers (see Figure 5.1). A
canonical sensor node within each tier is assumed to be equipped with a camera sensor,
a micro-controller, and a radio as well as on-board RAM and flash memory. Nodes are
assumed to be tetherless and battery-powered, and consequently, the overall constraint for
each tier is energy. Within each tier, nodes are assumed to be homogeneous, while dif-
ferent tiers are assumed to be heterogeneous with respect to their capabilities. In general,
Figure 5.1. A multi-tier SensEye hardware architecture.
we assume that the processing, networking, and imaging capabilities improve as we proceed from a
lower tier to a higher tier, at the expense of increased power consumption. Consequently,
to maximize application lifetime, the overall application should use tier-specific resources
judiciously and should execute its tasks on the most energy-efficient tier that has sufficient
resources to meet the needs of that task. Thus, different tasks will execute on different tiers,
and the various tiers of the camera sensor network will need to interact and coordinate to achieve
application goals. Given these intra- and inter-tier interactions, application design becomes
more complex—the application designer needs to carefully map various tasks to different
tiers and carefully design the various interactions between tasks.
One of the goals of SensEye is to illustrate these tradeoffs while demonstrating the over-
all benefits of the multi-tier approach. To do so, SensEye assumes a three-tier architecture
(see Figure 5.1). The lowest tier in SensEye comprises Mote nodes [37] equipped with
900MHz radios and low-fidelity Cyclops or CMUcam camera sensors. The second SensEye
tier comprises Stargate [55] nodes equipped with web-cams. Each Stargate is equipped
with an embedded 400MHz XScale processor that runs Linux and a web-cam that can cap-
ture higher fidelity images than Tier 1 cameras. Each Tier 2 node also consists of two
radios—an 802.11 radio that is used by Stargate nodes to communicate with each other, and
a 900MHz radio that is used to communicate with Motes in Tier 1. The third tier of SensEye
contains a sparse deployment of high-resolution pan-tilt-zoom cameras connected to
embedded PCs. The camera sensors at this tier are retargetable and can be utilized to fill
small gaps in coverage provided by Tier 2 and to provide additional redundancy for tasks
such as localization.
Nodes in each tier and across tiers are assumed to communicate using their wireless
radios in ad-hoc mode; no base-stations are assumed in this environment. The radio inter-
face at each tier is assumed to be individually duty-cycled to meet the application requirements
of latency and the lifetime constraint on each node. Consequently, the application tasks need
to be designed carefully since the radios on the nodes (and the nodes themselves) are not
“always-on”.
Given the above system model, the key design principles for the design and implementation
of SensEye are presented next.
5.2 Design Principles
The design of the SensEye multi-tier camera sensor network is based on the following
principles.
• Principle 1: Map each task to the least powerful tier with sufficient resources: In
order to judiciously use energy resources, each sensing and processing task should
be mapped to the least powerful tier that is still capable of executing it reliably within
the latency requirements of the application—running the task on a more capable tier
will only consume more energy than is necessary.
• Principle 2: Exploit wakeup-on-demand: To conserve energy, the processor, radio
and the sensor on each node are duty-cycled. Our system employs triggers to wake
up a node in an on-demand fashion and only when necessary. For example, a higher-
fidelity camera can be woken up to acquire a high-resolution image only after a new
object is detected by a lower tier. By putting more energy-constrained higher-tier
Figure 5.2. Software architecture of SensEye.
nodes in sleep mode and using triggers to wake them up on-demand, our system can
maximize network lifetime.
• Principle 3: Exploit redundancy in coverage: The system should exploit overlaps in
the coverage of cameras whenever possible. For example, two cameras with overlap-
ping coverage can be used to localize an object and compute its(x, y, z) coordinates
in the environment; this information can then be used to intelligently wakeup other
nodes or to determine the trajectory of the object. Thus, redundancy in sensor cover-
age should be exploited to improve energy-efficiency or performance.
5.3 SensEye Design
SensEye seeks to provide low-latency and high-reliability event detection as well as
energy-efficient operation—conflicting goals in a homogeneous single-tier network.
Task allocation in SensEye is a point solution in the space of all possible allocation
permutations across tiers. The static task allocation is based on the power requirements
of each task and the power requirements and capabilities of nodes at each tier. Figure 5.2
shows the different components of SensEye and the mapping of each task to the corresponding tier.
Following is a description of each task and its instantiation in SensEye.
5.3.1 Object Detection
The first task of a camera sensor network is to detect the presence of objects as they en-
ter a region of interest. Low-latency detection is possible with always-on nodes or with
dense node deployment and efficient duty-cycling. Always-on nodes reduce the energy effi-
ciency of operation, whereas periodic sampling can be used both to bound the latency of detection
and to improve energy efficiency.
In general, object detection is the simplest task and hence is assigned to Tier 1, the tier
with the least power requirements and the lowest image fidelity. Tier 1 nodes wake up period-
ically, acquire an image of the environment, and process the image to detect the presence of
objects. The sampling rate can be varied to change the bound on detection latency and also
the energy usage of the node. Nodes are initialized randomly for non-synchronized duty
cycles. Nodes perform object detection using a simple frame-differencing mechanism. An
image of the background is stored at each node and is used for frame differencing on each
captured image. The frame difference is passed through a simple threshold-based noise
filter to get a cleaned foreground image. The number of foreground pixels along with a
thresholding mechanism is used to detect new objects in the environment.
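The detection pipeline above—frame differencing against a stored background, a threshold-based noise filter, and a foreground-pixel count—can be sketched as follows. Frames are represented as flat lists of gray levels, and both threshold values are illustrative, not the prototype's:

```python
def detect_object(frame, background, noise_thresh=15, count_thresh=40):
    """Return True if a new object appears in frame relative to background."""
    # Frame difference plus threshold-based noise filter: a pixel is
    # foreground if it differs from the background by more than noise_thresh.
    foreground = sum(1 for p, b in zip(frame, background)
                     if abs(p - b) > noise_thresh)
    # An object is reported if enough foreground pixels survive the filter.
    return foreground >= count_thresh
```

On a Cyclops-class node this amounts to one pass over a 64x64 (or 128x128) image, which is why the task fits the least capable tier.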
5.3.2 Inter-Tier Wakeup
In SensEye the higher-tier nodes are asleep by default, to avoid using nodes with
higher power requirements and to conserve energy. Once Tier 1 nodes detect an object's pres-
ence, one or more Tier 2 or Tier 3 nodes need to be woken up for further processing, i.e.,
recognition and tracking.
Two important aspects of inter-tier wakeup are: (i) intelligent wakeup of the appropriate
higher-tier nodes and (ii) inter-tier wakeup latency. Intelligent wakeup of higher-tier nodes
can be achieved assuming the location and coverage information of each sensor is known and,
further, that the object can be localized. Object localization (described in further detail in Sec-
tion 5.3.5) is possible if more than one camera sensor views the object simultaneously. The
object's location, along with the coverage and location information of each camera, can be
used for intelligent wakeup of higher-tier nodes, reducing wasteful high-power wakeups.
If an object is observed by a single sensor, all higher-tier nodes with overlapping coverage
need to be woken up to ensure high reliability.
The separation of detection and recognition tasks across tiers introduces latency. The la-
tency includes the delay in receiving the wakeup signal and the delay in transition from the
sleep to wakeup state. SensEye uses several optimizations to reduce the inter-tier wakeup
latency. The wakeup begins with the transmission of a short wakeup packet from Tier 1.
Low-power always-on components at higher tiers process these packets and transition the
higher power subsystems from sleep to wakeup for further processing. The techniques
used are similar to those used in Triage [6] and wake-on-wireless [14]. Further, nodes at
higher tiers load the bare minimum device drivers needed for operation—thereby keeping the
transition times small during wakeup.
5.3.3 Object Recognition
Once an object’s presence is detected, the next step is to recognize and classify it.
Higher tier nodes capable of acquiring high fidelity images are used for this purpose. The
recognition task is used to identify objects of interest, e.g., identify whether an object is
a person, which is of interest, or a truck, which is not. In SensEye, recognition
involves obtaining an image of the environment, isolating the object from the fore-
ground, identifying object features, and using similarity analysis in conjunction with an im-
age database. High-fidelity images result in high-accuracy recognition but also require
greater processing—both available at the higher tiers of SensEye. Several sophisticated recog-
nition techniques have been studied and developed by the computer vision community.
In this work, SensEye uses a simple pixel-based comparison as a proof-of-concept object
recognition technique. A connected-components [47] algorithm isolates objects from the
foreground and a color-matching heuristic matches the object to the image database.
Figure 5.3. 3D object localization using views from two cameras.
5.3.4 Object Tracking
Tracking of moving objects involves multiple sensing and processing tasks: continuous
object detection as the object moves through the fields of view of cameras, object recognition
to ensure that the object of interest to the application is tracked across cameras, and finally
trajectory prediction to estimate the movement pattern of the object.
Object tracking in SensEye involves a combination of detection, localization, inter-tier
wakeup, and recognition. As the object moves through the covered region, different
Tier 1 nodes detect the target. If multiple nodes detect the target, localization can be used
to accurately pinpoint its location, and continuous localization can be used to track
the path of the moving object. Our current prototype can handle slow moving objects;
trajectory prediction schemes for fast moving objects (using techniques such as [69]) are the
subject of ongoing research. Future SensEye mechanisms can enable acquired images or
video to be displayed at a monitoring station or logged to a persistent store.
5.3.5 Object Localization
An object's location can be determined if more than one camera sensor can view and
detect the object simultaneously. Localization enables several optimizations that improve
the performance of SensEye. Localization at Tier 1 can be used to intelligently wake up the
appropriate higher tier nodes and reduce wasteful wakeups. Tier 1 nodes can further
steer Tier 3 nodes in the direction of the object based on its location. Higher tier nodes
track an object's movement based on its location and can also use it for track prediction.
The localization scheme implemented in SensEye works in a 3D setting and assumes
that cameras are calibrated at system setup, i.e., their locations and orientations are known
relative to a global reference frame. Localization as implemented in SensEye consists of
three steps, as shown in Figure 5.3. The three steps are described below.
Step 1: Calculation of vector along direction of object location.
As shown in Figure 5.3(a), the camera coordinate space is assumed to be the following:
the image plane is the X-Y plane, and the central axis perpendicular to the image plane
is the Z axis. The center of the camera lens is at point P2 : (0, 0, f), where f is the
focal length of the lens, and the centroid of the image of the object on the image plane is
P1 : (x, y, 0). The vector v along which the object's centroid lies is therefore computed
as v = P2 − P1 = (−x, −y, f). The object's centroid is calculated by processing the
image, isolating the object, and computing a bounding box around it.
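Step 1 can be sketched as follows. This is a minimal illustration; the bounding box is assumed to come from the frame differencing stage, and the centroid is approximated by the bounding-box center.

```python
def object_direction(bbox, focal_length):
    """Compute the vector v = P2 - P1 = (-x, -y, f) along which the
    object's centroid lies, given its bounding box on the image plane.

    bbox is ((x_min, y_min), (x_max, y_max)) in image-plane coordinates."""
    (x_min, y_min), (x_max, y_max) = bbox
    # Centroid P1 = (x, y, 0), approximated as the bounding-box center.
    x = (x_min + x_max) / 2.0
    y = (y_min + y_max) / 2.0
    # P2 = (0, 0, f) is the lens center, so v = P2 - P1 = (-x, -y, f).
    return (-x, -y, focal_length)
```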
Step 2: Transforming vector to global reference frame
To translate the object's vector v from the camera's reference frame to the global reference
frame, the rotation and translation matrices obtained during calculation of the camera
orientations are used. Each camera's orientation consists of a translation and two rotations.
The translation from the global reference origin to the camera location is denoted by a
translation matrix T. Figure 5.3(b) shows the orientation of a camera as a composite of
two rotations. Initially, the camera is assumed to be positioned with its central axis along the
Z axis and its image plane parallel to the global X-Y plane. First, the camera is rotated
by an angle θ in the counter-clockwise direction about the Z axis, resulting in X' and
Y' as the new X and Y axes. Next, the camera is rotated by an angle φ in the clockwise
direction about the X' axis, resulting in Y'' and Z' as the new Y and Z axes. The two rotations
are represented by a rotation matrix R, which can be used to reverse transform the vector
calculated in Step 1 to the global reference frame. If v1 and v2 are the two vectors along
the direction of the object location from cameras 1 and 2 respectively, the two corresponding
vectors in the global reference frame are:
v′1 = R1 · v1                                          (5.1)
v′2 = R2 · v2                                          (5.2)

where R1 and R2 are the composite rotation and translation matrices. The matrix R takes
the following form:

        | cos θ   −sin θ cos φ   −sin θ sin φ   a |
    R = | sin θ    cos θ cos φ    cos θ sin φ   b |    (5.3)
        |   0        −sin φ          cos φ      c |
        |   0           0              0        1 |
where θ and φ are the rotation angles described in Step 2, and a, b and c are the translation
magnitudes along the global X, Y and Z reference axes respectively.
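A sketch of Step 2 is given below. This is an illustration, not the Mote implementation: the 3x3 rotation block of Equation 5.3 maps a camera-frame direction vector into the global frame, and one way to interpret the translation column (a, b, c) is as the camera's position, which serves as the origin of the ray used in Step 3.

```python
import math

def rotation_matrix(theta, phi):
    """3x3 rotation block of the composite matrix R in Equation 5.3:
    a rotation by theta about Z followed by a rotation by phi about X'."""
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    return [[ct, -st * cp, -st * sp],
            [st,  ct * cp,  ct * sp],
            [0.0, -sp,      cp]]

def to_global_frame(v_cam, theta, phi):
    """Rotate a camera-frame direction vector into the global frame."""
    R = rotation_matrix(theta, phi)
    return [sum(R[i][j] * v_cam[j] for j in range(3)) for i in range(3)]
```

With theta = 90 degrees and phi = 0, a camera-frame X direction maps onto the global Y axis, as expected for a rotation about Z.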
Step 3: Object Location using Closest Point of Approach
Given the two vectors v′1 and v′2, their intersection is the location of the object, as shown in
Figure 5.3(c). Since the lines are in three dimensions, they are not guaranteed to intersect,
especially due to errors in centroid computation and camera calibration. A standard
technique for approximating the intersection is the Closest Point of Approach [12],
which gives the shortest distance between the two lines in three dimensions. We use
this method to obtain the points CP1 and CP2, the closest points on vectors v′1 and v′2
respectively. The location of the object is given by the mid-point of CP1 and CP2.
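Step 3 can be sketched as follows: a pure-Python illustration of the standard closest-point-of-approach computation, where each ray is given by a camera position and the globally transformed direction vector from Step 2 (the parameter names are illustrative).

```python
def localize(p1, d1, p2, d2):
    """Approximate the intersection of two 3D rays p1 + s*d1 and p2 + t*d2
    via the Closest Point of Approach: find CP1 and CP2, the mutually
    closest points on the two lines, and return their mid-point."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w0 = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b          # zero only if the lines are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    cp1 = [p + s * q for p, q in zip(p1, d1)]   # CP1 on line 1
    cp2 = [p + t * q for p, q in zip(p2, d2)]   # CP2 on line 2
    return [(u + v) / 2.0 for u, v in zip(cp1, cp2)]
```

For two rays that truly intersect, CP1 and CP2 coincide and the mid-point is the intersection itself; with noisy centroids the mid-point splits the residual gap between the two lines.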
Note that camera calibration and localization in 2D are simpler cases of the more general
3D technique presented above.
5.4 SensEye Implementation
This section describes the implementation of SensEye based on the design discussed in
the previous section.
5.4.1 Hardware Architecture
Our SensEye implementation uses four types of cameras: the Agilent Cyclops [45],
the CMUcam vision sensor [9, 48], a Logitech Quickcam Pro webcam and a Sony PTZ
camera; and three platforms: Crossbow Motes [37], Intel Stargates [55] and a mini-ITX
embedded PC. SensEye is a three-tier network, with the first two tiers shown in Figure 5.4.
Tier 1: Tier 1 of SensEye comprises a low-power camera sensor such as the Cyclops [45]
connected to a low-power Mote [37] sensor platform. The Cyclops camera is currently
available only as a prototype. Therefore, the Cyclops platform is used for our individual
component benchmarks, and we substitute it with a similarly constrained but higher power
CMUcam for our multi-tier experiments.
The Cyclops platform comprises an Agilent ADCM-1700 CMOS camera module, an
ATmega128 micro-controller and a Xilinx FPGA. The board attaches to a Mote using a
standard 32-pin connector and communicates with it over a UART. The software distribution for
Cyclops [45] provides support for frame capture, frame differencing and object detection.
The CMUcam is a less power-optimized camera that comprises an OmniVision OV7620
CMOS camera and an SX52 micro-controller. The CMUcam connects to a Mote using
a serial interface, as shown in Figure 5.4(a). The CMUcam has a command set for its
micro-controller that can be used to wake up the CMUcam, set camera parameters, capture
images, and perform frame differencing and tracking.
Tier 2: A typical Tier 2 sensor comprises a more capable platform and camera, plus a
wakeup circuit that wakes the node from the sleep or suspend state upon receiving a trigger
from a Tier 1 node. In our implementation, as shown in Figure 5.4(b), an Intel Stargate
sensor platform is used along with an attached Mote that acts as the wakeup trigger. Since
the Stargate does not have hardware support for being woken up by the Mote, a relay circuit
described in Turducken [54] is used for this purpose. The Logitech webcam connects to
the Stargate through the USB port.
Figure 5.4. Prototype of (a) a Tier 1 Mote and CMUcam and (b) a Tier 2 Stargate, webcam and Mote.
Tier 3: A Tier 3 node comprises a Sony SNC-RZ30N PTZ camera connected to an
embedded PC running Linux.
5.4.2 Software Architecture
The software framework of SensEye is shown in Figure 5.5. The description of our
software framework assumes that Tier 1 comprises Motes connected to CMUcam cameras;
substituting a CMUcam with a Cyclops involves minimal change to the architecture. The
first two tiers of SensEye comprise four software components: (i) the CMUcam Frame
Differentiator, (ii) the Mote-level Detector, (iii) the Wakeup Mote, and (iv) Object Recognition
at the Stargate. Each component's functionality is described below.
Tier 1 Frame Differentiator: The Tier 1 cameras receive periodic instructions from
the Mote to capture an image for differencing. On each such instruction, the CMUcam
captures the image in view, quantizes it into a smaller resolution frame, performs frame
differencing with the reference background frame and sends the result back to the Mote.
Frame differencing highlights the image areas where objects are present (by
non-zero difference values). The CMUcam has two modes of frame differencing: (i) a
low resolution mode, where it converts the current image (of 88 × 143 or 176 × 255 pixels) to
an 8 × 8 grid for differencing, or (ii) a high resolution mode, where a 16 × 16 grid is used for
Figure 5.5. SensEye Software Architecture.
differencing. The frame differencing is at a very coarse level and hence has relatively high
error when estimating the location of the object or its bounding box.
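The coarse-grid differencing can be sketched as follows. This is an illustration of the idea, not the CMUcam firmware; grayscale images (lists of pixel rows) are assumed for simplicity.

```python
def grid_difference(current, background, grid=8):
    """Quantize two equal-sized grayscale images into a grid x grid
    array of mean intensities and return the absolute cell-by-cell
    differences, as in the CMUcam's low resolution mode."""
    h, w = len(current), len(current[0])
    def cell_mean(img, gy, gx):
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        x0, x1 = gx * w // grid, (gx + 1) * w // grid
        pixels = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
        return sum(pixels) / len(pixels)
    return [[abs(cell_mean(current, gy, gx) - cell_mean(background, gy, gx))
             for gx in range(grid)]
            for gy in range(grid)]
```

Cells with large values mark where an object appeared; the coarse grid is what makes the subsequent centroid estimate relatively imprecise.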
Mote-Level Detector: The function of the Tier 1 Mote is to control the CMUcam
and send object detection triggers to the higher level nodes. On startup, the Mote sends
initialization commands to the CMUcam to set its background and frame differencing
parameters. Periodically, based on its sampling rate, the Mote sends commands to the
CMUcam to capture an image and perform frame differencing; the CMUcam responds
with the frame difference result. The Mote compares the returned result against a
user-specified threshold to decide whether an event (object appearance or object motion)
has occurred. If an event is detected, the Mote broadcasts a trigger for the higher tier; if
not, the Mote sleeps until the next sampling time. Additionally, the Mote duty-cycles the
CMUcam by putting it to sleep between two sampling instances.
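The detector loop can be sketched as follows. This is pseudocode-style Python, not the TinyOS implementation; the camera and radio objects and their methods are assumptions standing in for the CMUcam command interface and the Mote radio.

```python
import time

def mote_detector(camera, radio, threshold, sampling_period, num_samples):
    """Tier 1 control loop: initialize the CMUcam, then periodically
    sample it, compare the frame-difference result against the
    user-specified threshold, and broadcast a trigger to the higher
    tier when an event is detected."""
    camera.set_background()                 # startup initialization
    for _ in range(num_samples):
        camera.wake()
        diff = camera.frame_difference()    # capture + differencing
        camera.sleep()                      # duty-cycle the camera
        if diff > threshold:
            radio.broadcast_trigger()       # event: object appeared or moved
        time.sleep(sampling_period)         # idle until the next sample
```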
Wakeup Mote: The Mote connected to the Stargate receives triggers from the lower
tier Motes and is the interface between the two tiers. On receiving a trigger, the Mote
decides whether to wake up the Stargate for further processing; typically, the localized
coordinates are used for this purpose. Rather than computing the object coordinates
at a Tier 1 Mote, which would require significant coordination between the Tier 1 nodes,
our implementation relies on a Tier 2 Mote to compute these coordinates: the Tier 1 nodes
simply piggyback parameters such as θ, φ and the centroid of the image of the object onto
their wakeup packets. The Tier 2 Mote then uses the techniques described in Section 5.3.5 to
derive the coordinates. The Stargate is woken up if the object location is within its
field of view; otherwise the trigger is ignored.
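The decision logic at the Tier 2 Mote can be sketched as follows. This is an illustration only: the trigger records, localization routine and field-of-view test are simplified stand-ins for the actual implementation.

```python
def handle_triggers(triggers, localize, in_field_of_view, wake_stargate):
    """Process wakeup triggers from Tier 1. When two triggers carry
    piggybacked parameters for the same object, localize it and wake
    the Stargate only if the location falls within its field of view."""
    if len(triggers) < 2:
        # Single detection: cannot localize, so wake conservatively.
        wake_stargate()
        return
    location = localize(triggers[0], triggers[1])
    if in_field_of_view(location):
        wake_stargate()
    # Otherwise the trigger is ignored and the Stargate stays suspended.
```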
High Resolution Object Detection and Recognition: Once the Stargate is woken up,
it captures the current image in view of the webcam. Frame differencing and connected
component labeling [47] are performed on the captured image against the reference
background image, yielding the pixels and boundaries where potential objects appear
in the image. Smoothing techniques based on color threshold filtering and averaging of
neighboring regions are used to remove noise pixels. Each potential object then has to be
recognized. In our current implementation, we use an averaging scheme based on the pixel
colors of the object: the scheme produces the average value of the red, green and blue
components of the object, which is matched against a library of objects, and the closest
match is declared as the object's classification. SensEye can be extended with
sophisticated classification techniques, face recognition and other vision algorithms. We
evaluate a face recognition system in the experimental section to get an idea of its latency
and power requirements.
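The averaging scheme can be sketched as follows. This is an illustrative version; the object library and the squared-distance metric are assumptions, not the exact heuristic used in the prototype.

```python
def average_color(pixels):
    """Mean (R, G, B) over the pixels belonging to an object."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def classify(pixels, library):
    """Match an object's average color against a library of known
    objects and return the label of the closest entry."""
    avg = average_color(pixels)
    def distance(color):
        return sum((a - b) ** 2 for a, b in zip(avg, color))
    return min(library, key=lambda label: distance(library[label]))
```

A small library maps labels to reference colors; the object is declared to be whichever library entry lies closest to its average color.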
PTZ Controller: The Tier 3 retargetable cameras are used to fill gaps in coverage and
to provide additional coverage redundancy. The pan and tilt values for the PTZ cameras are
computed using the localization techniques described earlier. The cameras export an HTTP
API for program-controlled camera movement, and we use one such HTTP-based camera
driver [8] to retarget the Tier 3 PTZ cameras.
5.5 Experimental Evaluation
This section presents a detailed experimental evaluation of SensEye. Specifically, power
consumption, sensing reliability and latency, and camera benchmarks that characterize
individual components are evaluated and used to compare single-tier and multi-tier SensEye
systems.
Mode                      Latency (ms)   Average Current (mA)   Power Consumption (mW)   Energy Usage (mJ)
Mote Processing           136            19.7                   98.5                     13.4
CMUcam Object Detection   132            194.25                 1165.5                   153.8

Table 5.1. SensEye Tier 1 (with CMUcam) latency breakup and energy usage. Total latency is 136 ms and total energy usage is 167.24 mJ.
[Current (mA) vs. time (s) trace showing modes A and B]

Mode                  Latency (ms)   Current (mA)   Power (mW)   Energy Usage (mJ)
A: Object Detection   892            11             33           29.5
B: Idle               –              0.34           1            –

Table 5.2. SensEye Tier 1 (with Cyclops) latency breakup and energy usage.
5.5.1 Component Benchmarks
The latency and energy usage benchmarks for Tier 1 and Tier 2 are reported in this
section.
Since minimizing energy usage is an important goal of SensEye, the power consumption
and latency of each hardware and software component in its different modes of operation are
systematically studied. Tables 5.1 and 5.2 report the latency, average power consumption and
energy usage for object detection at Tier 1, and Table 5.3 provides a similar breakdown
for object recognition at Tier 2.
Tier 1: As seen from Table 5.1, 97% of the total latency of object detection at Tier
1, i.e., 132 ms out of 136 ms, is due to CMUcam processing (frame capture and frame
differencing). Also, due to its higher power requirements, the CMUcam uses 92% of the energy,
i.e., 153.8 mJ out of 167.2 mJ. In contrast, the Cyclops (see Table 5.2) is much more
energy efficient than the CMUcam: it consumes 33 mW for 892 ms, which
is better than the CMUcam by a factor of 5.67 in terms of energy usage. However, the
latency of detection at the Cyclops is around 900 ms, more than 6 times that
of the CMUcam. This latency is an artifact of the current Cyclops hardware and
can be reduced to around 200 ms with optimizations expected in future revisions of the
node. A breakup of the energy consumption of the Cyclops camera for detection is given
in Table 5.2.
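The energy figures follow directly from power x latency, so the arithmetic behind the factor-of-5.67 comparison can be checked as follows (all values taken from Tables 5.1 and 5.2):

```python
# Energy (mJ) = power (mW) x latency (s).
cmucam_energy = 1165.5 * 0.132        # CMUcam object detection: ~153.8 mJ
mote_energy   = 98.5 * 0.136          # Mote processing: ~13.4 mJ
tier1_total   = cmucam_energy + mote_energy   # ~167.2 mJ total at Tier 1

cyclops_energy = 33 * 0.892           # ~29.4 mJ (Table 5.2 rounds to 29.5)
ratio = tier1_total / 29.5            # ~5.67x advantage for Cyclops
```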
Tier 2: The processing tasks at Tier 2 of SensEye can be divided into: wakeup of
the Stargate from suspend, stabilization after wakeup before programs start executing, camera
initialization, frame grabbing, the vision algorithm for detection and recognition, and finally the
shutdown procedure for suspend, as shown in Table 5.3. The total latency at Tier 2 to
complete all operations is 4 seconds. The largest delays are during camera initialization (1.28 s)
and shutdown for suspend (1 s), with corresponding energy usages of 1725.4 mJ and 768.5
mJ. The lowest latency task is the algorithm for object detection and recognition, with
a latency of 105 ms and the least energy usage, 144.2 mJ.
The comparison of energy consumption and latency reveals some of the benefits of using
a two-tier rather than a single-tier camera sensor network. Every wakeup-to-shutdown
cycle at Tier 2 consumes around 28 times as much energy as a similar task at Tier 1 comprising
CMUcams; when Tier 1 comprises Cyclops cameras instead of CMUcams,
the energy usage ratio is 142. There are two reasons for this large difference in
energy consumption between tiers. First, the latency of Linux operating system
wakeup from the suspend state is significantly greater than the wakeup latency on a highly
limited Mote platform running TinyOS. Second, the Stargate platform consumes significantly
greater power than a Mote during the wakeup period. The net effect of greater latency
[Current (mA) vs. time (s) trace showing modes A through G]

Mode                     Latency (ms)   Current (mA)   Power (mW)   Energy Usage (mJ)
A: Wakeup                366            201.6          1008         368.9
B: Wakeup Stabilization  924            251.2          1256.5       1161
C: Camera Initialization 1280           269.6          1348         1725.4
D: Frame Grabber         325            330.6          1653         537.2
E: Object Recognition    105            274.7          1373.5       144.2
F: Shutdown              1000           153.7          768.5        768.5
G: Suspend               –              3              15†          –

Table 5.3. SensEye Tier 2 latency and energy usage breakup. The total latency is 4 seconds and total energy usage is 4.71 J. † Measured on an optimized Stargate node with no peripherals attached.
Figure 5.6. Placement of Tier 1 Motes (M1-M4) and Tier 2 Stargates (S1, S2) in SensEye.
and greater power consumption results in significantly greater total energy consumption
for Tier 2.
5.5.2 Comparison of SensEye with a Single-Tier Network
Next, I present an evaluation of the full SensEye system and compare it to a single-tier
implementation. The comparison is along two axes: energy consumption and sensing
reliability. Sensing reliability is defined as the fraction of objects that are accurately detected
and recognized.
The experimental setup consisted of circular objects projected onto a wall with an area
of 3 m × 1.65 m. Objects appeared at random locations sequentially and stayed for a specified
duration; only one object was present in the viewable area at any time. Object appearances
were interspersed with periods of no object being present in the viewable area. A set
of four Motes, each connected to a CMUcam, constituted Tier 1, and two Stargates, each
connected to a webcam, constituted Tier 2 of SensEye. Tier 1 Motes used a sampling period
of 5 seconds and their start times were randomized. The object appearance time was set to
7 seconds and the interval between appearances to 30 seconds. The single-tier system
consisted of the two Stargate nodes, which were woken up every 5 seconds for object
detection; this differs from SensEye, where a Stargate is woken up only on a trigger from
Tier 1. The nodes at both tiers were placed so that each tier covered the
entire viewable region, as shown in Figure 5.6. The experiment used 50 object appearances
for measuring the energy and reliability metrics.
5.5.2.1 Energy Usage
Tables 5.4 and 5.5 report the number of wakeups and details of detection at each component
of the single-tier system and SensEye respectively. As can be seen from the tables,
the Stargates of the single-tier system wake up more often than the Stargates at Tier 2 of
SensEye: a total of 621 wakeups occur in the single-tier system, whereas 58 wakeups
occur at Tier 2 of SensEye. The higher number of wakeups with the single-tier system is due
to the periodic sampling of the region to detect objects. Out of the total 621 wakeups, an object is
detected only 74 times in the single-tier system, whereas in SensEye Tier 1 performs the initial
detection and the Tier 2 Stargates are woken up fewer times, resulting in lower energy
usage. The Tier 1 sensor nodes are cumulatively woken up 1216 times. The energy usage
of SensEye during the experiment is 466.8 J, compared to 2924.9 J for the single-tier
system, a factor of 6.26 reduction. If the CMUcams in SensEye were replaced by Cyclops
cameras, a factor of 9.75 reduction in energy usage would be obtained.
Component    Total Wakeups   Object Found   No Object Found   Energy Usage (J)
Stargate 1   311             32             279               1464.8
Stargate 2   310             42             268               1460.1

Table 5.4. Number of wakeups and energy usage of the single-tier system. Total energy usage of both Stargates when awake is 2924.9 J. Total missed detections: 5.
Component    Total Wakeups   Object Found   No Object Found   Energy Usage (J)   Cyclops Expected Energy (J)
Mote 1       304             15             289               50.7               8.96
Mote 2       304             23             281               50.7               8.96
Mote 3       304             27             277               50.7               8.96
Mote 4       304             10             294               50.7               8.96
Stargate 1   27              23             4                 127.17             127.17
Stargate 2   29              25             4                 136.59             136.59

Table 5.5. Number of wakeups and energy usage of each SensEye component. Total energy usage when components are awake is 466.8 J with CMUcams and 299.6 J with Cyclops. Total missed detections: 8.
As reported in [45], the Cyclops with a Mote consumes 1 mW in its sleep state, whereas
an optimized Stargate consumes 15 mW in suspend mode. The CMUcam, which is highly
unoptimized, has a power consumption of 464 mW in sleep mode. Thus, in the suspend
state, the Tier 2 node consumes more than an order of magnitude more power than Tier
1 nodes with Cyclops cameras. For our experimental setting of 30 seconds of idle time
between objects, this corresponds to an energy reduction by a factor of 33 for SensEye.
5.5.2.2 Sensing Reliability
Next, I compare the reliability of detection and recognition of the two systems in the
experimental setup described above. The single-tier system detected 45 of the 50 object
appearances and SensEye detected 42, a 6% decrease in sensing reliability. The result
shows the efficacy of using SensEye instead of a single-tier network, as SensEye provides
Figure 5.7. SensEye sensing reliability and coverage: (a) percentage of undetected objects vs. duration in view (seconds); (b) percentage of undetected moving objects vs. speed (m/s).
similar detection performance (6% more missed detections) at an order of magnitude lower
energy requirements.
The sensing reliability of SensEye depends on the time for which an object is in
view, the sampling period at Tier 1, and the speed of the object if it is moving. Since varying
the sampling period is equivalent to varying the time for which an object is in view, we
study the effect of different in-view times on sensing reliability. Figure 5.7(a) plots the
fraction of undetected objects for in-view times of 5, 7 and 9 seconds. As seen
from the figure, when an object is in view for 5 seconds, 52% of objects are not detected. With
an in-view time of 9 seconds the percentage drops to zero, and 7 seconds yields an
intermediate value of 16% undetected objects.
To study the effect of the speed of moving objects on sensing reliability, an experiment
was conducted in which objects moved across the viewable area. Each object started from a
random point on one side of the rectangular area and exited from another random point on
the other side. The sampling period used at the Tier 1 nodes was 5 seconds. Figure 5.7(b)
plots the percentage of undetected objects at different speeds of the moving object. As can
be seen, at the slowest considered speed of 0.2 m/s, a sampling period of 5 seconds is able
to detect all objects at least once, while a speed of 0.6 m/s results in 62% undetected objects.
Figure 5.8. Tracking at Tier 1 and Tier 2 in SensEye: (a) coverage redundancy, plotting for each component (M1-M4, S1, S2) the fraction of wakeups with single-coverage and overlapping-coverage detections; (b) localization accuracy (% error) of the camera sensors using the CMUcam (8x8 and 16x16) and the webcam (80x60).
The trend shown is intuitive: for a given sampling rate, higher speeds lead to more undetected
objects. Based on the desired probability of detection, the plots can be used to choose
sampling rates for different object movement speeds.
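As a simple illustration of how such a choice might be made (the field width and guard factor below are hypothetical parameters, not values from the experiments): to sample a moving object at least once while it crosses a camera's field of view, the sampling period must not exceed the in-view time, which is the traversal distance divided by the object's speed.

```python
def max_sampling_period(field_width_m, speed_mps, guard=1.0):
    """Upper bound on the Tier 1 sampling period (seconds) that still
    allows one sample while the object crosses the field of view.
    guard > 1 samples more often to tolerate detection failures."""
    in_view_time = field_width_m / speed_mps
    return in_view_time / guard
```

For instance, under the assumption of a 1 m wide field of view, a 0.2 m/s object allows a period of up to 5 seconds, while faster objects require proportionally shorter periods.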
5.5.3 Tracking at Tier 1 and Tier 2
Since multiple tiers cover a given region of interest, an object can be localized at either
tier for tracking. The deployment densities at each tier are different and differ in
spatial coverage redundancy. Further, the fidelity of the images acquired by sensors at different
tiers also varies. In this section, I present experiments that quantify spatial redundancy
and sensing reliability by studying the localization opportunities and localization accuracy
for objects.
If an object can be simultaneously viewed by more than a single camera, it can be
localized. Figure 5.8(a) plots, for each tier, the cases when only a single camera and when
multiple cameras covered and detected an object. As can be seen, due to the greater spatial
redundancy at Tier 1 than at Tier 2, more objects can be detected by more than a single
camera simultaneously. As a result, 54% of the objects can be localized by sensor nodes at
Tier 1, while only 28% of objects can be localized by nodes at Tier 2; these results are in
line with the spatial coverage redundancy of nodes at each tier.
Another important metric for tracking is the localization accuracy provided by the nodes
at each tier. Figure 5.8(b) is a scatter plot of localization accuracy for objects using the
CMUcam and the webcam. The CMUcam uses 8 × 8 and 16 × 16 matrix representations
of the captured image (converted from 88 × 143 and 176 × 255 pixels respectively) for frame
differencing. This is representative of the centroid computation expected on Cyclops
nodes, since these devices are resource-constrained in both memory and computation
capability. The webcam uses an 80 × 60 representation calculated from a 320 × 240
pixel image. As seen from the figure, the webcam has the least localization error and
the CMUcam using an 8 × 8 representation the largest. The average errors for the three
configurations are 35%, 20.5% and 4.85% respectively.
Based on the above experiments, Tier 1 nodes can localize as much as twice the number
of objects as Tier 2, but with 15-20% more error. The trends depicted in the figure
indicate that if coarse location information suffices, Tier 1 based localization is sufficient;
if accurate location information is required, localization should be performed at the second
tier of SensEye.
5.5.4 Coverage with Tier 3 Retargetable Cameras
To test the coverage and retargeting capability of the Tier 3 PTZ cameras, the number of
times a Tier 3 node successfully views an object is measured. The experimental setup had
40% overlapping coverage among Tier 1 nodes, and the PTZ camera could view at most a
quarter of the total coverage area at any time. When an object was detected by more than
one Tier 1 node, the previously described 3D localization techniques were used to calculate
the pan and tilt values and retarget the Tier 3 camera. Out of 50 object appearances,
the PTZ camera could view 46, a 92% success rate. The experiment verifies that 3D
Figure 5.9. Sensitivity to SensEye system parameters: (a) effect of the sampling period (seconds) on power consumption (mW) at Tier 1 for the Mote, the CMUcam and their total; (b) effect of the confidence threshold on the maximum detection distance (feet) at Tier 1.
localization techniques along with retargetable cameras have a high success rate and are
useful to improve coverage.
5.5.5 Sensitivity to System Parameters
SensEye has several tunable parameters that affect energy usage and sensing reliability.
In this section, I explore the sensitivity to two important system parameters: the sampling
rate and the camera detection threshold.
The power consumption at Tier 1 is a function of the sampling period used to probe
the CMUcam and check for object detections. Figure 5.9(a) plots the power consumption
at a Mote for increasing values of the sampling period. The sampling period is varied from
100 ms to 10 seconds, and the power consumption at these two extremes is 137 mW and 105.7
mW respectively. While the power consumption decreases with increasing sampling period,
as expected, it quickly plateaus, since the large sleep power consumption of the CMUcam
dominates at longer sampling periods.
From a sensing reliability perspective, each Mote uses a confidence threshold to
compare against the confidence with which the CMUcam reports a detection; the threshold
determines when triggers are sent to Tier 2. A higher threshold means closer objects will
be detected more easily than farther objects, whereas a lower threshold can more easily detect
objects at larger distances. This trend is verified by the plot shown in Figure 5.9(b). We
varied the confidence threshold from 30 to 100 and measured the maximum distance at
which objects are flagged as detected and a trigger sent to Tier 2. As can be seen in the
figure, a threshold of 30 can detect objects up to a distance of 6.5 feet, while with thresholds
greater than 80 the maximum distance drops to less than 1 foot. Choosing a good threshold
is important, since it controls the false positives and false negatives, and hence the energy
consumption and reliability of the system.
5.6 Conclusions
In this chapter, I argued for the benefits of using a multi-tier camera sensor network
over single-tier networks and presented the design and implementation of SensEye, a
multi-tier camera sensor network. Using an implementation of a surveillance application on
SensEye and extensive experiments, we demonstrated that a multi-tier network can achieve
an order of magnitude reduction in energy usage compared to a single-tier network,
without sacrificing reliability. I also evaluated the effect of several system parameters
on SensEye and tested its ability to track objects and use the retargetable PTZ cameras.
Further, the implementation was used to benchmark the energy usage and latency of each
component at each tier.
CHAPTER 6
SUMMARY AND FUTURE WORK
In this thesis, I considered a class of sensor networks, camera sensor networks:
wireless networks with image sensors. I addressed the issues of automatic configuration,
initialization and design of camera sensor networks. I proposed the notions of accurate
and approximate initialization to initialize cameras with varying capabilities and resource
constraints. Compared to manual calibration, which can take a long time (on the order of
hours) to calibrate several cameras and is inefficient and error prone, the automated calibration
protocol is accurate and greatly reduces calibration time: tens of seconds
to calibrate a single camera, scaling easily to several cameras in the order of
minutes. The approximate techniques demonstrate the feasibility of initializing low-power,
resource-constrained cameras with no or limited infrastructure support. Further, I proposed
multi-tier heterogeneous sensor networks to address the drawbacks of single-tier homogeneous
networks. I designed and built SensEye, a multi-tier heterogeneous camera sensor network,
and used it to demonstrate how multi-tier networks can achieve the simultaneous system
goals of energy efficiency and reliability. In this chapter, I summarize the contributions of
this thesis and future work to extend its scope.
6.1 Automatic Accurate Calibration Protocol
Vision-based techniques are not directly suitable for calibrating camera sensor networks,
as they assume the presence of landmarks and abundant resources. In this thesis, I presented
Snapshot, an automated calibration protocol that is explicitly designed and optimized for
sensor networks. Our techniques are based on principles from optics and geometry and
are designed to work with the low-fidelity, low-power camera sensors that are typical of
sensor networks. The experimental evaluation of our prototype implementation showed that it is
feasible to employ Snapshot to calibrate low-resolution cameras and computationally
feasible to run Snapshot on resource-constrained sensor nodes. Specifically, our experiments
showed that Snapshot yields an error of 1-2.5 degrees when determining camera
orientation and 5-10 cm when determining camera location. We argued that this error is
tolerable in practice, since a Snapshot-calibrated sensor network can track moving
objects to within 11 cm of their actual locations. Our measurements showed that Snapshot
can calibrate a camera sensor within 20 seconds, enabling it to calibrate a sensor network
containing tens of cameras within minutes. I have developed techniques to analyze the
effect of the errors inherent in the calibration protocol on the estimated parameters. Both the
empirical error analysis and the derivation of a lower bound on the expected error verify that
Snapshot's errors are very small.
The specific contributions of Snapshot are as follows:
• Designed and implemented an automated calibration protocol tailored for camera
sensor networks.
• Demonstrated that the use of Cricket position sensors to automate the calibration
procedure does not affect accuracy.
• Performed a sensitivity analysis of the automated calibration protocol and characterized
its errors. The empirical error of Snapshot was found to be tolerable when compared
to the analytical lower bounds.
• Showed that the application-level error using Snapshot is small. The error in a
tracking application using Snapshot-calibrated cameras is on the order of 10 cm.
• Demonstrated, through the implementation of Snapshot, a protocol that is efficient,
fast, accurate, and feasible for accurately initializing camera sensor networks.
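The core of such a calibration step can be viewed as fitting a pinhole camera model to reference points whose world coordinates are known (as Cricket beacons provide). The sketch below illustrates this idea; it is not the Snapshot implementation itself, and the focal length, Euler-angle pose parameterization, and use of a generic least-squares solver are my own assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def rotation(yaw, pitch, roll):
    """ZYX Euler-angle rotation matrix."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def project(pose, points, f):
    """Pinhole projection of Nx3 world points for a camera pose
    (x, y, z, yaw, pitch, roll) with focal length f in pixels."""
    t, angles = pose[:3], pose[3:]
    cam = (points - t) @ rotation(*angles)  # world -> camera frame
    return f * cam[:, :2] / cam[:, 2:3]

def calibrate(points, pixels, f, guess):
    """Recover the camera pose by minimizing reprojection error over
    reference points with known world coordinates (e.g. beacons)."""
    resid = lambda p: (project(p, points, f) - pixels).ravel()
    return least_squares(resid, guess).x
```

With a handful of non-coplanar reference points, the six pose parameters (location and orientation) are recovered by minimizing the pixel reprojection error, which is the geometric essence of landmark-free-infrastructure calibration with beacon-assisted reference positions.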
6.2 Approximate Initialization of Camera Sensor Networks
Accurate calibration techniques are not feasible for deployments of ad-hoc, low-power
camera sensors, due to limited resources and the lack of landmark nodes for beaconing. I
proposed approximate techniques to determine the relative locations and orientations of
camera sensors without any use of landmarks or positioning technologies. The techniques
determine the degree and range of overlap for each camera, and I showed how this information
can be exploited for duty cycling and triggered wakeups. I implemented these techniques
on a Mote testbed and conducted a detailed experimental evaluation. The results show that
our approximate techniques can estimate the degree and region of overlap to within 10%
of their actual values, and that this error is tolerable at the application level for effective
duty cycling and wakeups.
The specific contributions related to approximate initialization are as follows:
• Developed techniques to initialize camera sensor networks with no or limited
infrastructure support.
• Demonstrated techniques that exploit the approximate initialization information to
enable applications. The effective error at the application level was found to be
acceptable when camera sensors were initialized using the approximation techniques.
• Demonstrated, through the proposed approximate initialization methods, the feasibility
of initializing low-power, low-fidelity camera sensors quickly and efficiently.
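One landmark-free way to estimate the degree of overlap, in the spirit of the techniques above, is to compare which detection events each camera reports: cameras that repeatedly see the same objects must have overlapping viewable regions. The sketch below is illustrative only; the set-based event model, the threshold value, and the function names are my assumptions, not the thesis's exact algorithm.

```python
def overlap_degree(seen_by_a, seen_by_b):
    """Estimated fraction of camera A's viewable region shared with B,
    approximated by the fraction of A's detection events B also saw.
    Each argument is a set of event (object sighting) identifiers."""
    if not seen_by_a:
        return 0.0
    return len(seen_by_a & seen_by_b) / len(seen_by_a)

def wakeup_candidates(active, detections, threshold=0.3):
    """Cameras whose estimated overlap with the active camera is high
    enough to warrant a triggered wakeup when it detects an object."""
    return [cam for cam, seen in detections.items()
            if cam != active
            and overlap_degree(detections[active], seen) >= threshold]
```

A duty-cycling policy could then keep heavily overlapped cameras asleep while one covers the shared region, waking the others only on demand.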
6.3 Energy-Reliability Tradeoff in Multi-Tier Sensor Networks
In this thesis, I argued for the benefits of using a multi-tier camera sensor network
over single-tier networks. Multi-tier networks provide several levels of reliability and
energy usage, based on the type of sensor used for application tasks. I presented the design
and implementation of SensEye, a multi-tier camera sensor network. Using an implementation
of a surveillance application on SensEye and extensive experiments, I demonstrated
that a multi-tier network can achieve an order-of-magnitude reduction in energy usage
compared to a single-tier network, without sacrificing reliability. I also evaluated the effect
of several system parameters on SensEye and tested its ability to track objects and use
retargetable PTZ cameras. Further, the implementation was used to benchmark the energy
usage and latency of components at each tier.
The specific contributions of the energy-reliability tradeoff study using SensEye are as
follows:
• Designed and implemented a multi-tier camera sensor network and demonstrated its
benefits over a single-tier homogeneous network.
• Using the tasks of object detection, recognition, and tracking, quantified the energy
usage and latency benchmarks across the different tiers.
• Studied and quantified the energy-reliability tradeoff of a multi-tier camera network,
finding that a multi-tier network can obtain comparable reliability with substantial
energy savings.
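The source of the energy savings can be illustrated with a back-of-the-envelope model in which a low-power detection tier runs continuously and a high-power tier is woken only for detected events. The power figures and wake-up cost below are invented for illustration and are not SensEye's measured benchmarks.

```python
def energy_single_tier(duration_s, p_high=2.5):
    """Energy (J) if the high-power camera tier stays on all the time."""
    return p_high * duration_s

def energy_multi_tier(duration_s, event_windows,
                      p_low=0.05, p_high=2.5, wake_cost=0.5):
    """Energy (J) when a low-power tier runs continuously and the
    high-power tier is woken only for event windows (start, end) in s."""
    energy = p_low * duration_s  # always-on detection tier
    for start, end in event_windows:
        energy += wake_cost + p_high * (end - start)
    return energy
```

When events are sparse, the always-on cost is dominated by the low-power tier, so total energy drops by roughly the ratio of the two tiers' power draws, which is where the order-of-magnitude reduction comes from.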
6.4 Future Work
Numerous challenges remain in the design and operation of multi-tier sensor networks
that extend the scope of the work presented in this thesis. One of the initial design
decisions is that of placement, coverage, and task allocation. Given a fixed budget, in
terms of cost or number of nodes, an initial decision has to be made regarding the number
of tiers and the number of nodes at each tier. These decisions are closely related to the
placement policies and coverage requirements of the applications. Solutions are required
to place sensors into multiple tiers while satisfying coverage guarantees. Further, task
allocation policies are needed to map application tasks to each tier. I am interested in
developing solutions that answer these initial configuration questions in a holistic manner.
I am also interested in exploring research problems related to the dynamic behavior of
multi-tier sensor networks. Sensors can fail over time, become overloaded, or their
remaining energy may have to be conserved to increase lifetime. In such cases, dynamic
policies are needed to migrate tasks across tiers to maintain similar system guarantees, or
to decrease reliability to increase lifetime. These policies need to account for the varied
capabilities and requirements at multiple tiers and, moreover, need to be distributed. I aim
to develop such adaptive policies to handle the dynamic behavior of multi-tier sensor
systems.
As part of a broader goal, I am interested in real-world deployments of sensor network
applications and in the use of sensor networks to disseminate geographical data via CDNs.
Related to deployment, I am interested in studying the use of energy-harvesting sensors.
Due to the additional capability of periodically recharging their batteries, such nodes
change the tradeoff between energy usage and other system metrics. I am interested in
studying this effect through deployments and building solutions optimized for it. To aid
quick deployment and ease of prototyping for evaluating proposed ideas, I would like to
set up a generic sensor network testbed. I envision the testbed consisting of various
hardware platforms, including a variety of sensor and embedded platforms. Related to data
dissemination, CDNs disseminating geographical data and sensor networks used to retrieve
data from an area often deal with spatially correlated data. This spatial correlation can be
exploited by CDN proxies and sensor network edge proxies to provide approximate
responses with spatial consistency bounds.