
CARNEGIE MELLON

Department of Electrical and Computer Engineering

Power Consumption and Performance of a Wearable

Computing System

Denis P. Reilly

1998

Advisor: Prof. Siewiorek


© 1998

by Denis P. Reilly

Denis P. Reilly - All Rights Reserved

Abstract

The Smart Module project generates a family of interoperable modules supporting real-time

Speech Recognition, Language Translation, and Speech Synthesis. The hardware, software, and

physical form factor of the modules function less like a computer and more like a tool or an

appliance.

Power Consumption and Performance are key design goals for wearable systems. This

paper examines the effect that various design factors have on these metrics. While processor speed

and type do affect power consumption and performance, memory size and type of secondary

storage have the greatest influence on these goals. In particular, the time it takes to swap

information out of main memory can reduce the performance of a wearable system by more than

half. Finally, a method for modeling Speech Recognition performance to within 10% accuracy is

discussed.

Project Supervisor: Daniel P. Siewiorek

Title: Professor

This project was funded under DARPA Contract DABT 63-95-C-0026

Acknowledgments

I would like to thank my advisor, Dan Siewiorek, for the valuable advice he gave me in the

writing of this paper, as well as the Smart Module Project leader, Asim Smailagic, for giving me

the chance to make a contribution to this project. I would also like to thank the writers of the Smart

Module application software, Ralf Brown, "Ravi" Ravishankar and Kevin Lenzo. A special word

of thanks goes to John Dorsey, my "Special Programming Buddy", who helped me with my first

explorations into Unix Daemon programming, as well as the wonderful world of Newtonscript.

I’d also like to thank some of the people who I met here at CMU, most notably Phil

Koopman, who took me around Pittsburgh when I first came to CMU for Open House, only to be

the second reader on this project a year later. Things really do come full circle, I suppose. In

addition, I’ve encountered too many people in ICES and on D-level for me to possibly thank them

all individually here, so I won’t try - you all know who you are.

Finally, I’d like to thank my family and especially my fiancée, Sarah, for being there for me

this past year and for helping me keep a reasonable level of sanity.

Section 1

Smart Module Overview

1.1 Introduction

One of the main goals of "wearable computing" is to design small, dedicated, truly portable

computer hardware, tightly integrated with its software, that performs a specific function. This

changes the concept of how a computer can be used. Bringing computing-intensive applications to

a wearable platform means that users have mobile access to those applications at any time and any

place, much like a pocketknife can be used while hiking. A well designed wearable computer

should make using computing-intensive applications almost as easy and intuitive as using a hand

tool.

There are several criteria that can be of use when designing a wearable system:

• Keep the latencies involved with running the OS and the application as low as possible (as close to "instant response" as possible).

• Make the battery life as long as possible (reduce power consumption).

• Make the interface to the software as intuitive as possible.

• Make the form factor of the device as unobtrusive as possible (this involves making the device lightweight and operable in multiple orientations).

If all these criteria are met, then the end result will be a computer that operates like a hand

tool instead of like a computer. The perfect wearable computer would be more like a Swiss Army

Knife, performing its function extremely well with no latency in an intuitive manner and with an

easily portable form factor, and running close to forever without changing batteries.

This paper will focus on the first two goals of improving performance and reducing power

consumption. These goals seem to be inherently contradictory at first glance: any computing device

that runs at a high clock frequency will tend to consume more power. This paper examines how

close the Smart Module project is to achieving these goals.

The Smart Module project adds two more criteria to the wearable computer design process.

These wearable devices must be modular; they should be usable in different configurations. They

must also be scalable; existing code should be easily portable to the modules. By using a well-

known OS, the modules have the potential to run a wide variety of applications supported by their

hardware. The OS chosen was Red Hat Linux 5.0, because it is free, customizable, and the

applications already ran on the Linux platform. Linux itself is also scalable in that it runs on a

wide range of configurations from multiprocessor Pentium II servers to 386-based machines.

[Figure 1 diagram: System Controller and LTM modules, shown as "Smart Modules Used Independently" and "Smart Modules Used Together"]

Figure 1. Overall Smart Module System Architecture


The Smart Module project started out as an implementation of the PANLITE Text-to-Text

Language Translation software on a single wearable platform. [1] This implementation performed

English-to-Croatian translation through a serial link to a controller device. This controller device

could be any device that accepts serial input - here, a Newton MessagePad 2000 was used. The

current implementation adds a Croatian-to-English back-translator to the Language Translator

Module (LTM), as well as a second Speech-To-Text and Text-To-Speech Module (SRM,

Speech Recognition Module) with English Speech Recognition and Croatian Speech Synthesis

capabilities. Since a single serial link was insufficient for routing information between three

devices, the IP routing functionality built into Linux was used to connect the devices via Serial

PPP.

As shown in Figure 1, these modules can be used together or independently. The LTM

translates any textual information that is sent to it, regardless of the source, as long as the source

uses the proper communications protocol. Similarly, the SRM will convert speech into text and

vice-versa, regardless of the source of speech or text. When only one function is desired, only one

module needs to be used. But when linked together, the modules combine their functions into a

comprehensive foreign language synthesis system.

1.2 Smart Module Software Architecture

The LTM runs the PANLITE language translation software, and the SRM runs CMU’s

Sphinx II Speech Recognition Software and Phonebox Speech Synthesis software. This paper will

not deal with the particulars of how each application works: interested readers are directed to

information in [4] [5].

The Smart Module system has two distinct kinds of processes: the Server-Application

Group and the System Controller. A Server-Application Group consists of a UNIX background

process which communicates with an application, such as PANLITE, via Inter-Process

Communication within a module. The server process also communicates with the System

Controller over the TCP/IP Network. The System Controller keeps track of what servers are

present on which modules, and coordinates the flow of information between the servers. It is

possible to interface any number of applications with one server process. Figure 2 illustrates how

data flows between the different pieces of Smart Module software.

[Figure 2 diagram; labels: System Controller, TCP/IP, IPC, Data and Commands, Data, Application]

Figure 2. Flow of Data in Smart Module Software

If a device were designed with sufficient memory, hard disk space, and user input

capability, it would be possible to host multiple Server-Application Groups and even the System

Controller on a single device, assuming this device can function at a sufficient level of

performance. This also allows for some measure of flexibility, as new specialized functions, such

as concurrent translation to additional languages, can be supported by either adding a new

hardware module or incorporating a new server into an existing hardware module.

The System Controller process does not usually need human interaction. In the current

configuration of the Smart Module System, the System Controller operates on a Newton

MessagePad 2000 to give the user a chance to correct the Speech Recognition output if some

words are misrecognized. If future modules contain applications with near-perfect accuracy, the


System Controller can be reduced to a simple state machine residing on one of the modules. Or,

since the System Controller is not very processor intensive, it can be included in another, less

powerful module. Whenever that module needs access to the more intensive computation, it can

connect to the appropriate Smart Module and have it do all the processing.

The key factors that determine how many processes can be run on a module are memory,

storage space, and available CPU cycles. Of these, the most important factor is memory, as will be

demonstrated later in this paper. Because it is desirable to run these applications with as little

latency as possible, the entirety of an application’s working data set should be able to stay memory

resident. If any of the application’s active data set is swapped out of fast memory, as most modern

OS’s will do when memory is exhausted, the slow disk accesses will drive the overall latency of

the application to unacceptable levels. Ideally, then, it is only desirable to add servers and

applications to a hardware module when there is sufficient memory for it, and avoid using swap

space altogether.

1.3 Smart Module Hardware Architecture

The Smart Modules have a very simple system-level hardware architecture. The heart of

this architecture is the Cardio processor card, which combines the processor and many of the

motherboard chips into one package, about the size of a PCMCIA card. The hardware architecture

of the modules is illustrated in Figure 3.

All the necessary signals for the ISA and IDE buses come out of the Cardio card. The

Cardio also supports two serial ports, which are used for communication between the modules.

Video Out and Keyboard capabilities are also provided on the board, and they are used in the

project for debugging purposes.


The ISA and IDE buses both typically operate at 8 MHz, with a width of 16 bits. The

ISA bus is limited to 8 MB/s throughput, while the IDE interface can be pushed up to 13 MB/s

throughput. [6] Main memory is significantly faster than this - although the Cardio data sheet [7]

does not have complete information on the internal memory bus of the Cardio, a reasonable

speculation is that the 133 MHz 586-based Cardio has at least a 33 MHz system bus with a

width of 32 bits. Even with a wait state, this speculative memory architecture can move 66

MB per second. The main memory is indeed significantly faster than the ISA or IDE buses.
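The throughput figures above follow from bus width and clock rate. The following is a minimal sketch of that arithmetic, assuming (this is not stated in the Cardio data sheet) that a wait state doubles the number of clock cycles per transfer; the function name `peak_throughput_mb_s` is illustrative only.

```python
def peak_throughput_mb_s(clock_mhz, width_bits, cycles_per_transfer=1):
    """Peak bus throughput in MB/s, taking 1 MB as 10**6 bytes."""
    bytes_per_transfer = width_bits / 8
    return clock_mhz * bytes_per_transfer / cycles_per_transfer


# Theoretical ceiling of a 16-bit bus clocked at 8 MHz, one transfer per clock
narrow_bus_ceiling = peak_throughput_mb_s(8, 16)

# Speculated 32-bit, 33 MHz memory bus; one wait state is assumed to mean
# two clock cycles per transfer
memory_with_wait = peak_throughput_mb_s(33, 32, cycles_per_transfer=2)

print(narrow_bus_ceiling, memory_with_wait)
```

Under these assumptions the memory bus moves 66 MB/s, which is well above the 8 MB/s and 13 MB/s quoted for the ISA and IDE interfaces.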

[Figure 3 diagram; labels: Cardio (Processor, Memory, Chipsets); PCMCIA HD; ISA; ESS 1888; Keyboard, Mouse, VGA (For Debugging Only); Serial Ports (Communications to other Modules)]

Figure 3. Smart Module Hardware Diagram

The secondary storage drives are of the Type II and Type III PCMCIA form factor, but

these drives also support an IDE interface. The PCMCIA socket that is on the Smart Modules is

wired directly into the IDE bus, and there is no PCMCIA controller in the hardware design. While

this precludes the use of anything other than hard disks in the PCMCIA slots, it does save space in

the overall design.

The SRM also includes a sound chip, the ESS 1888, that is used in many commercial

sound boards, most notably the SoundBlaster™ series. This chip is wired directly into the ISA bus


according to ESS reference designs; the result is that the software running on the Cardio sees the

ESS chip as a normal sound card attached to the ISA bus, and deals with it accordingly.

1.4 Smart Module Communication

The original Smart Module communications infrastructure consisted of a simple serial link

between the System Controller and the LTM module. This configuration was adequate for

communications between two devices over one link. But this infrastructure became inadequate

when a third device was needed. A second serial link could have been added between modules, but

some type of forwarding scheme would have been necessary between the modules. One would

have to implement this scheme directly in the server software. This scenario would necessitate

reprogramming the server every time a new module is added, reducing the overall modularity of

the system.

[Figure 4 diagram: "Original SM Communications Interface" vs. "New SM Communications Interface" (Serial PPP Network, forwarding done automatically)]

Figure 4. Simple Serial Link vs. Serial PPP


The communications infrastructure has been changed to a TCP/IP based network running

over serial PPP links, as detailed above in Figure 4. TCP/IP can be built directly into the Linux

kernel, eliminating the need to deal with network particulars in the Server software. It also

supports packet forwarding directly in the kernel. Finally, it can be utilized over a variety of

communications media, opening up the possibility of eventually replacing the serial PPP link with

a serial or PCMCIA-based wireless solution. It is even possible for the system to communicate

with any TCP/IP based intranet or the Internet, if a module is configured as a gateway with a

connection to an outside network.

[Figure 5 diagram; labels: System Controller, Translator]

Figure 5. The Smart Module Virtual Network

Using this networking scheme, the position of each module in the physical network does

not matter; the System Controller simply sends out all communications for all modules over the

same link, creating a virtual network as shown in Figure 5. The modules themselves handle

routing. New modules added to the system can have the capability to modify each other’s routing

tables automatically. Currently, because all of the modules used are physically connected with each

other, the Linux PPP server automatically configures the routing tables of the modules. But if more

modules are added to the system, a dynamic routing protocol must be used to modify the tables of

a module that may not be physically connected to the module that is added.


Section 2

Power/Performance Analysis

2.1 Factors affecting Power Consumption and Performance

Improving power consumption and performance seem to be contradictory goals at first

glance: any computing device that runs at a high clock frequency will tend to consume more

power, and a sure way to ensure that a device consumes less power is to operate it at a lower clock

frequency. The following are factors that should affect the overall power consumption of the

system:

- Increases in raw processor clock speed between processors of the same generation should

increase the overall power consumption of the system. In CMOS devices, power consumption is

directly proportional to the rate at which the system clock operates, since most of

the power is consumed when transistors switch.

- Advances in technology between microprocessor generations have the potential to enable

more advanced processors to do more on the same power budget. This means that, depending on

the goals of the microprocessor designers, a processor from the next generation can operate just as

fast, or even faster, than processors of the current generation while using less power. While this

does not hold true for all microprocessor families, it is important that any processor used in a

wearable system be more power efficient than its less advanced siblings, and that the increase in

performance does not come at the expense of a proportional increase in power consumption.


- Using a flash disk for secondary storage should consume less power than using a

spinning disk drive for many reasons, most notably because flash disks have no moving parts.

- Regardless of the type of secondary storage used, performance will decrease and power

consumption will increase if swap space is used excessively by the applications. This is due to the

increased disk activity when swapping. The Cardio cards do support additional memory, but this

memory must be added on the ISA bus, making that memory much slower than spinning disk

drives, and degrading the overall performance of the system.
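The clock-speed factor in the list above is usually expressed with the standard first-order model of CMOS switching power. The symbols below (activity factor $\alpha$, switched capacitance $C$, supply voltage $V_{dd}$, clock frequency $f$) are textbook notation and are not quantities measured in this paper:

```latex
P_{dynamic} \approx \alpha \, C \, V_{dd}^{2} \, f
```

At a fixed voltage and capacitance, power scales linearly with $f$. A newer process generation can lower $C$ and $V_{dd}$, which is how a faster processor can stay within the same power budget.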

2.2 Method of Testing

Power consumption measurements were taken by monitoring the current consumed by each

module. Each module was tested at a constant 7.2 volts. The current was measured by a multimeter

at intervals of 400 ms. The multimeter output all its measurements over a serial link, through which

the measurements were recorded. Using this equipment, an accurate measurement can be taken of

the current flowing through the device (and thus, the power consumed) at any point in time.

Appendix D has a typical graph of power consumption over time.
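The measurement procedure above reduces to multiplying each current reading by the supply voltage. The following is a minimal sketch, assuming each reading is held for one full 400 ms sample period; the sample currents are invented for illustration and are not data from this paper.

```python
SUPPLY_VOLTS = 7.2       # constant test voltage from the text
SAMPLE_PERIOD_S = 0.4    # multimeter sampling interval (400 ms) from the text


def power_watts(current_amps):
    """Instantaneous power for one current reading: P = V * I."""
    return SUPPLY_VOLTS * current_amps


def energy_joules(current_samples_amps):
    """Approximate energy by holding each reading for one sample period."""
    return sum(power_watts(i) for i in current_samples_amps) * SAMPLE_PERIOD_S


# Hypothetical readings in amps, made up for this example
samples = [0.25, 0.25, 0.75, 0.75, 0.25]
average_power = sum(power_watts(i) for i in samples) / len(samples)
total_energy = energy_joules(samples)
print(average_power, total_energy)
```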

The parts of the module that contribute to power consumption are the Cardio module

(which includes the processor, memory, and support chipsets), the IDE drive, the sound chip, and

the serial ports. Although it is possible only to measure the power consumption of the module as a

whole, the global power consumption statistics will yield clues as to how much power each part is

consuming.

Performance can be quantified in many different ways. For this experiment, it is considered

in terms of how quickly the end-user sees results after initiating a transaction - the less latency that


the end-user perceives, the better. This means that the performance of the system actually takes into

account various factors: the speed of the processor and memory, the latency of the application task,

the communications overhead, and the latency of the system controller. Improving any one of these

factors will in turn improve the overall performance of the system. However, the only factor that

can be changed is the speed of the processor and memory.

The power and performance measurements were taken using a body of 10 English test

sentences for the Recognizer and Translator, and their 10 Croatian translations for the Synthesizer.

These sentences were chosen so that each application would have sentences of varying difficulty to

interpret. The additional Swapping measurements were taken using an additional body of 24

sentences sent to the Translator in rapid-fire succession.

2.3 Configurations Tested

It is necessary to keep the Smart Module server software consistent across all the

configurations tested for the sake of comparison. However, there are different hardware

parameters that can be modified in order to test the effect of various factors on power consumption

and performance:

• Cardio Cards:

Different Cardio cards were used in the Smart Module system to analyze the effect of

processor type, system speed, and memory on the performance of the system. These cards are

summarized in Table 1.


Processor   Speed (MHz)   Memory (MB)
586         133           32
486         100           32
486         100           16
486         75            16

Table 1. Cardio Cards used for experiments

• Memory: 32MB/16MB

Some Cardios may not have enough memory for all applications to function effectively

without swapping. The effect of swapping on system performance is examined.

• Secondary Storage: Spinning Disk Vs Flash Disk

Spinning disks are the standard secondary storage device for most computing applications. Flash

Disks generally consume less power than spinning devices and they have faster read access times,

but they have slower write access times.

2.4 Power Profile of the Modules

From the power consumption data listed in Appendix A, it is possible to develop a power

profile for the Smart Module system. The OS of the Smart Module system is pared down to the

point that the processor is at nearly 0% usage when the main application is not running (idle

mode), and nearly 100% usage when the main application is running (Full On Mode). Therefore,

a graph of power consumption over time would resemble what is shown in Figure 6.


[Figure 6 plot: Power (W) vs. Time; regions labeled Idle, Suspend, Full On, Idle]

Figure 6. Model of Power Consumption over Time

According to this model, each module has three states that apply to Power Consumption:

Suspend, Idle, and Full On. Each state has an approximate power consumption value associated

with it. In addition, each state transition has a latency value (in seconds) associated with it. This

state diagram is depicted in Figure 7.

Figure 7. Power Consumption States of Smart Module System


While this model works well when the system operates with a flash disk drive, it alone

does not adequately describe the behavior of the system when using a spinning disk drive. The

spinning disk drive can also be represented by a state diagram, operating concurrently with the

state diagram for the module. This diagram is included as Figure 8.

Figure 8. State Diagram for Spinning Disk Drive

Power Down mode occurs when the power to the disk drive is cut off. This can occur if

and only if the module is in suspend mode. High Spin mode occurs when the disk drive is being

accessed. Low Spin mode is a power-saving mode implemented by the Linux Kernel.

The new model of power consumption vs. time for each module using a spinning disk

drive is the cross-product of these two state diagrams. Since the disk drive’s Power Down mode

can only occur during the module’s Suspend mode, and vice versa, that reduces the total number

of states in the new model from nine to five. These are Suspend, Idle & High Spin, Idle & Low

Spin, Full On & High Spin, Full On & Low Spin.
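The reduction from nine states to five can be checked by enumerating the cross-product and discarding the combinations ruled out above. This is a minimal sketch; the list names and the helper `is_reachable` are illustrative and do not come from the Smart Module software.

```python
from itertools import product

MODULE_STATES = ["Suspend", "Idle", "Full On"]
DISK_STATES = ["Power Down", "Low Spin", "High Spin"]


def is_reachable(module_state, disk_state):
    """Suspend and Power Down occur together or not at all."""
    return (module_state == "Suspend") == (disk_state == "Power Down")


all_pairs = list(product(MODULE_STATES, DISK_STATES))
valid_pairs = [pair for pair in all_pairs if is_reachable(*pair)]

print(len(all_pairs), len(valid_pairs))
for module_state, disk_state in valid_pairs:
    print(module_state, "&", disk_state)
```

The surviving pairs are Suspend with Power Down, plus Idle and Full On each combined with Low Spin and High Spin.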


[Figure 9 plot: Power (W) vs. Time; regions labeled Suspend, Idle & High Spin, Idle & Low Spin, Full On & High Spin, Full On & Low Spin]

Figure 9. New Model of Power Consumption over Time

Since the total power that is consumed by an active module is composed of the idle power,

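the power attributed to the spinning disk (in whatever state it is in), and the power contributed by

the application process, the total power can be modeled by the following formula:

$$\text{Power}_{Total} = \text{Power}_{Idle} + \text{Power}_{Disk} + \text{Power}_{Application}$$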

2.5 Power Consumption and Performance Analysis

From the power consumption data listed in Appendix A, the distribution of power

throughout the system can be determined as shown in Table 2. These figures were obtained using

the 133 MHz 586 Cardio card with 32 MB of memory.

Module   $\text{Power}_{Suspend}$   $\text{Power}_{Idle}$
SRM      680 mW                     1.9 W
LTM      160 mW                     1.4 W

Table 2. Power Consumption in Suspend and Idle States.


The Smart Modules consume their absolute minimum amount of power while they are in

suspend mode, when they consume a mere 160 mW. While the system is running, but idle, about

1.48 W of power could be directly attributed to the Cardio card and various support chips.

As the SRM module is currently designed, the sound chip is always activated, even when

the module is in suspend mode, and it always consumes a minimum of 500 mW. This means that

the SRM, even when in suspend mode, consumes 660-680 mW. This makes a significant

difference in terms of how long a module can be suspended without changing batteries.

While the module is in its idle mode, it consumes a minimum of 1.4 W. Again, with the

sound chip on, the SRM will consume a minimum of about 1.9 W in idle mode. These figures

were measured while each module was idle and running off the Flash disk drives.

Module   Low Spin   High Spin
SRM      0.6 W      1.2 W
LTM      0.6 W      0.9 W

Table 3. $\text{Power}_{Disk}$ during Idle Loop (133 MHz 586 w/32 MB RAM)

The spinning drive consumes between 0.9 W and 1.2 W of power while it is spinning, as

shown in Table 3. This can bring the total idle power consumption, $\text{Power}_{Idle} + \text{Power}_{Disk}$, to a

maximum of 3.1 W for the SRM when the drive is in full spin mode. The spinning drive has the

ability to partially spin down, however, and save approximately 400 mW to 600 mW when it is in

this low-spin state.

Finally, the applications themselves consume a certain amount of power, $\text{Power}_{Application}$, as

shown in Table 4.


Function      Power
Recognition   3.4 W
Synthesis     3.5 W
Translation   2.0 W

Table 4. Additional Power consumed by Applications ($\text{Power}_{Application}$)

The fact that the sound chip always uses about 0.5 W of power makes for an inefficient

suspend mode on the SRM. The SRM consumes slightly under 100 mA while in suspend mode,

while the LTM, which does not have the sound chip on board, consumes only 25 mA in suspend

mode. This means that the LTM would last four times as long on batteries as the SRM if the

modules are kept in suspend mode.

The battery life of the synthesizer module over a variety of duty cycles is shown in Figure

10. Given an ideal 1150 mAH battery, the battery life is represented by:

$$\text{Battery Life (Hrs)} = \frac{1150\ \text{mAH}}{\text{Total Current Consumed}}$$

Modifying this equation to be in terms of power leads to:

$$\text{Battery Life (Hrs)} = \frac{1150\ \text{mAH} \times 7.2\ \text{V}}{\text{Total Power Consumed}}$$

Expressing the Total Power Consumed in terms of the sum of the time that the module is

running an application and the time that the module is idle (or suspended) yields:

$$\text{Battery Life (Hrs)} = \frac{1150\ \text{mAH} \times 7.2\ \text{V}}{(\text{Power}_{App} \times \text{Duty Cycle}_{App}) + (\text{Power}_{Idle} + \text{Power}_{Disk}) \times \text{Duty Cycle}_{Idle}}$$

or,

$$\text{Battery Life (Hrs)} = \frac{1150\ \text{mAH} \times 7.2\ \text{V}}{(\text{Power}_{App} \times \text{Duty Cycle}_{App}) + (\text{Power}_{Suspend} \times \text{Duty Cycle}_{Idle})}$$

For the purposes of this discussion, $\text{Duty Cycle}_{Idle} = 1 - \text{Duty Cycle}_{App}$
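The battery-life relation can be evaluated numerically. The following is a minimal sketch, not the author's tooling: the function name `battery_life_hours` and its parameters are illustrative. It assumes that the application power term means the total power drawn while the application runs. For the SRM on a flash disk that is taken here as 1.9 W idle plus 3.5 W for synthesis, which is an interpretation of Tables 2 and 4 rather than a figure stated in the text.

```python
BATTERY_MAH = 1150     # ideal battery capacity from the text
SUPPLY_VOLTS = 7.2     # test voltage from the text


def battery_life_hours(power_app_w, power_rest_w, duty_cycle_app):
    """Battery life from the duty-cycle-weighted average power.

    power_app_w:  total power while the application runs (assumption)
    power_rest_w: power the rest of the time (idle plus disk, or suspend)
    """
    duty_cycle_rest = 1.0 - duty_cycle_app
    average_power_w = (power_app_w * duty_cycle_app
                       + power_rest_w * duty_cycle_rest)
    capacity_wh = BATTERY_MAH / 1000 * SUPPLY_VOLTS
    return capacity_wh / average_power_w


# Assumed SRM values: 5.4 W while synthesizing (1.9 W + 3.5 W),
# 0.68 W while suspended (Table 2)
suspended_only = battery_life_hours(5.4, 0.68, duty_cycle_app=0.0)
twenty_percent = battery_life_hours(5.4, 0.68, duty_cycle_app=0.2)
print(round(suspended_only, 2), round(twenty_percent, 2))
```

With these assumed inputs, a module that stays in suspend lasts about 12 hours, and a 20% duty cycle against suspend gives roughly 5 hours.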


[Figure 10 plot: "Battery Life Of Synthesizer Module"; x-axis: Duty Cycle; curves: High Spin, Low Spin, Flash Disk, Suspend Mode]

Figure 10. Battery Life of Synthesizer Module using 1150 mAH battery

The vastly reduced power dissipation and performance characteristics of flash technology

for secondary storage make it an obvious choice for wearable devices, even considering the high

cost of flash disks. While using flash memory does not affect the modules’ power dissipation

while sleeping, it does reduce power during operation, and increase battery life, as seen in Figure

10. This data indicates that when the module is performing only Speech Synthesis with a 20% duty

cycle, using a flash disk will extend the life of a 1150 mAH battery by approximately 33% over a

spinning disk drive. It should be noted, however, that if the drive is configured to spin down at

certain times, then the actual battery life of the spinning disk drive in the above graph will be

somewhere between the High Spin and the Low Spin curves.

The effect of the use of the Suspend Mode can also be seen on the graph. With this curve,

the Duty Cycle refers to the amount of time the application is running vs. time in suspend mode.


By clever utilization of Suspend Mode features in module design, the effective life of a module on

one battery can increase to almost 5 hours with a 20% duty cycle, and this value can go up to 6 1/2

hours if the sound chip is not powered while in suspend mode.

While the Advanced Power Management that is included in each Cardio system enables the

module to control when it goes to sleep, it should not affect power while the unit is awake.

However, when certain types of legitimate APM calls are used, it was found that the module will

actually consume more power when it is idle than when it is running an application. This is

apparently because the Cardio does not correctly implement the CPU Idle call in the APM

specification [3]. If the Linux APM driver makes these calls when the CPU is idle, as indicated in

the specification, the idle power actually goes up from about 2.8 W (in High Spin mode) to close

to 6W.

[Figure 11 plots: "LTM with CPU Idle Calls" and "LTM without CPU Idle Calls"; x-axis: Time (400 ms intervals)]

Figure 11. Effect of CPU IDLE calls on Power Consumption


Another function that the Linux kernel’s APM driver performs is the spinning down of the

spinning disk drive. When there have been no disk accesses for about 12 seconds, the hard drive

spins down, conserving about 0.5 W. The drive stays in this state until another disk access is made.

It is extremely hard to determine a "duty cycle" for disk spinning when the applications are

running, since the probability of accessing the disk at a given time is highly dependent on what

data is resident in memory from previous operations. Nevertheless, the full-spin power

consumption can be treated as an upper bound and the low-spin power consumption as a lower

bound for the actual power consumed by the module.

Cardio        Spinning Drive   Flash Drive
586/133/32    1.889 xRT        1.824 xRT
486/100/32    2.402 xRT        2.355 xRT
486/100/16    2.280 xRT        2.320 xRT
486/75/16     3.046 xRT        2.826 xRT

Table 5. Performance of Synthesizer Module

Table 5 reports the Performance characteristics of the Recognizer application over a range

of configurations. The performance statistics listed are the Sphinx software’s own internal measure

of performance vs. Real Time. As expected, the more advanced cards performed the task faster

than the less advanced cards.

Simultaneously considering power and performance yields the trade-off chart depicted in

Figure 12. The error value used for the error bars in Figures 12 and 13 was calculated as the

average of the standard deviations of 20 identical utterances of four selected sentences in the

sample set across all four different Cardio cards. While the error varied from 5% to 10% of the

mean performance value for a particular sentence, it stayed rather consistent across processor

configurations. The calculated error value that was used for all error bars on the graph was seven

percent of the performance value for that point.


[Figure 12 plot: "Power vs Performance for Recognition Software (Idle Power Included)"; x-axis: Performance (multiple of Real Time); points: the Cardio configurations, each with spinning (Spin) and flash (Flsh) disks]

Figure 12. Total Power vs. Performance for Recognition Application

Figure 12 above illustrates the relationship between power and performance for the

different tested configurations. The ideal relationship would be one in which very little power is

consumed to do work with minimal latency. Since the ideal relationship is a point arbitrarily close

to the origin on this graph, the distance of each point from the origin would be a measure of how

well each configuration fares. Since performance is measured as a ratio of latency vs. real time,

power should be expressed as a ratio as well - the total power over the idle power for the module

at high spin, which is 3.1 W. Once that axis is normalized, the distance of each point from the

origin is a measure of how well that configuration optimizes power consumption and performance

concurrently.
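The distance metric described above can be sketched as follows. This is an illustration under stated assumptions, not the author's calculation: the function name is hypothetical, and the total power for the example point is assumed to be the sum of the idle, high-spin disk, and recognition figures from Tables 2 to 4 rather than a measured value from Appendix A.

```python
import math

IDLE_POWER_HIGH_SPIN_W = 3.1   # normalization constant from the text


def distance_from_origin(total_power_w, performance_xrt):
    """Euclidean distance of a (performance, normalized power) point."""
    normalized_power = total_power_w / IDLE_POWER_HIGH_SPIN_W
    return math.hypot(performance_xrt, normalized_power)


# Assumed total power for the 586/133/32 with a spinning disk:
# 1.9 W idle + 1.2 W high-spin disk + 3.4 W recognition
total_power = 1.9 + 1.2 + 3.4

# 1.889 xRT is the spinning-drive entry for this card in Table 5
d = distance_from_origin(total_power, 1.889)
print(round(d, 3))
```

Smaller distances are better. Under these assumptions the example point comes out at about 2.82.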


[Figure 13 plot: "Power vs Performance for Recognition Software (Normalized to Idle Power)"; x-axis: Performance (multiple of Real Time); points: the Cardio configurations, each with spinning (Spin) and flash (Flsh) disks]

Figure 13. Total Normalized Power vs. Performance for Recognition Application

Cardio Used            Dist. from Origin
486/75/16 Spinning     3.6288
486/100/16 Spinning    3.1939
486/100/32 Spinning    3.3063
586/133/32 Spinning    2.8243
486/75/16 Flash        3.2284
486/100/16 Flash       2.9323
486/100/32 Flash       2.9766
586/133/32 Flash       2.4625

Table 6. Distance from Origin for Points in Figure 13.

Table 6 clearly shows the advantages, from a power consumption standpoint, of flash

disks over spinning disks. It also shows a clear progression from the least advanced configuration,


the 75 MHz 486/16, to the most advanced card, the 133 MHz 586/32, in terms of power

consumption and performance.

As a final test, the effect that disk swapping has both on power consumption and

performance was examined on the LTM. This module runs two translation processes, one that

translates from English to Croatian, and one that translates from Croatian to English. Even with 32

MB of memory, these processes still require 1-5 MB of swap space to coexist.

Future versions of the PANLITE software will address this issue and operate within the memory

constraints. But the current software provides an estimate of the effect that swapping has on

performance.

[Figure: "Swapping Penalty". Bar chart comparing the Swapping and No Swapping cases; vertical axis runs from 10 to 70.]

Figure 14. Swapping Penalty of the LTM

This experiment used the 133 MHz 586-based Cardio with 32 MB of memory. It consisted

of translating a series of 24 sentences of varying lengths. The first test was done with only one

translation process active; this process was able to use the available memory without requiring

swap space. The second test was done with both translation processes active, even though only

one was used. With both processes active, about 5 MB of disk space was actively used as swap

space. The effect of swapping on the system was dramatic - although the average power

consumption of the system decreased by 4%, the running time of the experiment more than

doubled.
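For a battery-powered module, the quantity that matters is energy, the product of average power and running time. The sketch below works this out from the swapping-experiment figures in Appendix A.4 (3.9868 W for 28.4 s without swapping, 3.8369 W for 64 s with swapping). The function name is an assumption made for illustration.

```python
def energy_joules(avg_power_w, run_time_s):
    """Energy drawn from the battery: average power times running time."""
    return avg_power_w * run_time_s


# Average power (W) and total demo time (s) from Appendix A.4.
no_swap = energy_joules(3.9868, 28.4)
with_swap = energy_joules(3.8369, 64.0)

power_change = (3.8369 - 3.9868) / 3.9868
time_ratio = 64.0 / 28.4

print(f"no swap:   {no_swap:.1f} J")
print(f"with swap: {with_swap:.1f} J")
print(f"power change: {power_change:+.1%}, time ratio: {time_ratio:.2f}x")
```

The small drop in average power is far outweighed by the longer running time, so the swapping run uses more than twice the energy for the same 24 sentences.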

Section 3

Performance Forecasting

Now that the performance and power consumption of these systems have been determined

and analyzed, the next logical step would be to develop a model to predict power and performance

values for potential new systems before actually porting the applications to these platforms. This

would be a good way to test candidate platforms before implementation of the solution.

Power consumption is a bit easier to predict than performance. While applications are

running, they will consume as much processor and system resources as they can possibly use. To

determine the maximum power, a benchmark should be run that activates every resource the

processor has, and the result will be a reasonable upper-bound on power consumption. The

benchmark used should be chosen such that any peripherals used by the application also experience

full utilization. Actual power consumption will be determined by how efficient the memory system

is, as well as the duty cycle of the application.
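One simple way to turn a benchmark-derived upper bound into an expected value is a duty-cycle weighted average. This is an assumed first-order model, not one given in the text: it ignores memory-system efficiency, and the function name and sample numbers are hypothetical.

```python
def estimate_average_power(peak_power_w, idle_power_w, duty_cycle):
    """Weighted average of peak and idle power.

    duty_cycle is the fraction of time the application keeps the system
    fully busy (0.0 = always idle, 1.0 = always at the benchmark peak).
    """
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * peak_power_w + (1.0 - duty_cycle) * idle_power_w


# Hypothetical platform: 7.0 W under the benchmark, 3.0 W idle, busy half the time.
print(f"{estimate_average_power(7.0, 3.0, 0.5):.2f} W")
```

The estimate never exceeds the benchmark peak, which matches the role of the benchmark as an upper bound.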

But determining performance across processor generations and families can be a difficult

task. Synthetic benchmarks are a useful tool in determining performance. However, synthetic

benchmarks often use narrow, restrictive algorithms that, at best, can never truly mimic the

performance of a specific application or, at worst, take advantage of an architectural feature that

would go unnoticed in a real application, giving misleading numbers. However, since the smart

module applications are computing-intensive, a synthetic benchmark would serve as a good first-

order approximation of potential performance.

For the case of the Speech Recognition application, performance is measured as a ratio of

the total number of seconds to process an utterance over the length of the utterance in seconds. The

absolute lower bound to this performance ratio is 1. (Even if it were computationally possible to

get faster-than-real-time results, it is impossible for a recognition system to recognize an utterance

before it is uttered!) Therefore, when looking at performance from different processors, it is the

portion of the performance ratio that is above one that is significant.

Assume that the relationship between performance and a synthetic benchmark is linear in

nature. (Due to the high number of occurrences of floating-point calculations in the Sphinx code, the

BYTEmark Floating Point index was used.) Since overall performance increases when the

corresponding performance ratio goes down, this relationship can be expressed as:

\[ \text{Performance Ratio} = \frac{\text{Constant}}{\text{Benchmark Index}} \]

or, if P = Performance Ratio and B = Benchmark Index from a given platform,

\[ \frac{P_1 B_1}{B_2} = P_2 \]

when comparing two platforms.

If it is further assumed, as stated above, that the lower bound of the performance ratio

occurs when the performance ratio equals 1, then it should be possible to take this factor out of the

performance equation. (While this is not entirely accurate, since Sphinx does do some work before

the utterance is finished, this inaccuracy will serve to ensure that the equation gives a lower

bound on performance.) This makes the final equation:

\[ (P_1 - 1) \, B_1 = (P_2 - 1) \, B_2 \]

or,

\[ \frac{(P_1 - 1) \, B_1}{B_2} + 1 = P_2 \]
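As a minimal sketch, the relation P2 = (P1 - 1) * B1 / B2 + 1 can be written as a small Python function. The function name and the two benchmark index values are hypothetical; only the baseline ratio of 3.046 (the 486-based 75 MHz Cardio with a spinning disk, Appendix A) is a measured value.

```python
def predict_performance_ratio(p1, b1, b2):
    """Predict the performance ratio on a new platform.

    p1: measured performance ratio (multiple of real time) on the baseline
    b1: benchmark index of the baseline platform
    b2: benchmark index of the candidate platform
    Only the portion of the ratio above 1 is scaled by b1 / b2.
    """
    if p1 < 1.0:
        raise ValueError("performance ratio cannot be below 1")
    if b1 <= 0 or b2 <= 0:
        raise ValueError("benchmark indices must be positive")
    return (p1 - 1.0) * b1 / b2 + 1.0


# Baseline ratio from Appendix A; the index values 0.25 and 0.50 are
# placeholders chosen for illustration, not measured BYTEmark results.
predicted = predict_performance_ratio(p1=3.046, b1=0.25, b2=0.50)
print(f"predicted ratio: {predicted:.3f}")
```

However large the candidate's benchmark index becomes, the prediction approaches 1 from above, which respects the real-time lower bound.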

Using this equation, and using the performance statistics from the 486-based 75-MHz

Cardio as a basis, Figure 15 predicts performance over a range of actual BYTEmark FP index

values.

While this does serve as a rather crude approximation, it is an upper bound for the

measured value of performance vs. real time, as denoted by the boxes being below the predicted

curve. The error bars on the curve represent the seven percent measurement variation that

was observed experimentally in Section 2.5. If the relationship between performance and the benchmark index

proves to be non-linear on a particular platform, then this equation may have to be modified. But at

least for the tested platforms, this equation can be used to give a first-order approximation of

Recognizer performance.

[Figure: "Predicting Performance of Recognizer". Horizontal axis: BYTEmark FP Index, 0.200 to 0.500. Vertical axis: 1.0000 to 3.5000. Measured points for the tested Cardios are plotted against the predicted curve.]

Figure 15. Predicting the Performance of the Recognizer Application using Synthetic Benchmarks.

Section 4

Summary

4.1 Conclusions

Power consumption and performance are key design goals for wearable systems. And even

though it is easy to maximize one at the expense of the other, it is possible to improve on both

concurrently.

Memory size and type of secondary storage have a great impact on the power consumption

and performance of these systems. If main memory is not large enough to hold the working sets of

all applications without swapping, then the performance of the overall system could be cut by more

than half. If flash cards are used for secondary storage, then the overall power consumption of the

system could be improved by as much as 20%.

Of potentially more importance is the ability to test future hardware platforms for their

fitness to host applications for wearable systems before actually porting the applications. The

methods outlined in Section 3 will give a good first-order approximation for the performance that

the Speech Recognizer application might achieve on a new platform.

4.2 Future Work

There are some simple ways this work can be extended: future modules should have a way

to disable peripheral chips like the Sound chip when the module enters the suspend state, reducing

the power consumed while suspended. And it would be interesting to see how the system performs

when the Translator is modified so swapping does not occur when two translators attempt to share

32 Megabytes of RAM.

Larger flash cards will be available soon, and when they arrive it will be possible to

examine the benefit of flash technology on the Translation and Synthesis applications.

But the real challenge that lies ahead is not just to continue optimizing performance of these

modules, but to integrate them into some of the other wearable computing projects that are

organized by the Institute for Complex Engineered Systems. The System Controller process does

not require much processing power to operate, and can be easily integrated into an existing

wearable system. Then, when a user needs "expert knowledge" such as Language Translation or

Speech Synthesis, he or she can just plug in the proper module and have access to that knowledge

in any remote location.

Appendix A

Experimental Results

A.1. General Notes

In the tables of results, the different Cardios are indicated by the following abbreviations:

A = 486-based 75 MHz Cardio with 16 Megabytes of memory.

B = 486-based 100 MHz Cardio with 16 Megabytes of memory.

C = 486-based 100 MHz Cardio with 32 Megabytes of memory.

D = 586-based 133 MHz Cardio with 32 Megabytes of memory.

All Power values are in Watts, and all time values are in Seconds, unless otherwise

indicated.

A.2. Recognizer Data

Summary of Recognizer Data

                   Spinning Disk                                  Flash Disk
                   A        B        C        D        A        B            C        D
Suspend Pwr        0.6840   0.6768   0.6678   0.6840   0.7056   [illegible]  0.6768   0.6678
Spin Pwr           3.0960   3.3192   3.3480   3.1608   n/a      n/a          n/a      n/a
Spin Down Pwr      2.5056   2.5848   2.5992   2.4408   1.9440   2.0304       2.0088   1.8504
Total Demo Time    66.2     57.2     59.6     40.8     68.4     57.6         58.8     45.6
Latency (x RT)     3.046    2.280    2.402    1.889    2.826    2.320        2.355    1.824

A.3. Synthesizer Data

Summary of Synthesizer Data

                   A        B        C        D
Max Pwr            6.876    7.7760   7.7616   7.0920
Min Pwr            3.1608   3.3192   3.3264   [illegible]
Avg Demo Pwr       6.274    6.2072   7.4880   6.7450
Total Demo Time    70       59.2     54.4     34.0

A.4. Translator Data

The Translator Module was not able to function correctly with the 16MB cards when both

translators were active at the same time, so all measurements were taken with the 32 MB cards.

Summary of Translator Data

                   C            D
Suspend Pwr        0.1632       0.1584
Max Pwr            [illegible]  4.8096
Min Pwr            2.9952       2.8224
Avg Demo Pwr       4.0006       3.7975
Total Demo Time    64           63.6

Swapping Experiment

                   No Swap      W/ Swap
Max Pwr            6.372        5.0904
Min Pwr            2.8008       2.808
Avg Demo Pwr       3.9868       3.8369
Total Demo Time    28.4         64

Appendix B

Demo Sentence Set

B.1. Sentences for Main Experiment

These are the sentences that are used for the Recognizer and Translator experiments, along

with their Croatian translations that are used in the Synthesizer experiment. All translations are

obtained directly from the PANLITE translator. While the Translator should process these

sentences with little difficulty, the current version of the Synthesizer has trouble with some of the

Croatian words. Also, the Recognizer generally has an easier time recognizing the earlier sentences

than the later sentences, since the earlier sentences are present in the Language Model of Sphinx.

Are there any mine fields on this map?

Ima li ikakvih minskih polja na ovoj mapi li?

Has anyone been hurt by mine explosions?

Nekoga bio pouredzen u svoje eksplozijama li?

Please show me on this map.

Molim vas pokažite mi na ovoj mapi.

Speak into the microphone more slowly.

Govorite u mikrofon još sporo.

It is important that we locate landmines.

Važno je važno se locirati zakopane mine.

How long are you staying in this town?

Koliko ćete se zadrzati u ovom gradu li?

Who can I speak with about that?

Može dane razgovara sa oko je li?

How much is that worth?

Koliko je taj vrednost li?

Why are you here?

Zašto se vas ovdje li?

When will he return?

Kad će on izbjeglica li?

B.2. Sentences for Swapping Experiment

For the second set of experiments that dealt with swapping in the LTM, the following set of

sentences was used. Each sentence was sent to the translator in a separate command packet, in

rapid-fire succession.

Put down the gun.

Come with us.

Where does this road lead?

We cannot protect you if you cross the border alone.

Do not copy this document.

A fee is required if you want to use our name.

The problem is that we do not have any place to train.

The construction will continue for two more weeks.

This office is closed.

I cannot help you right now.

I am the commander of this division.

I am not carrying any weapons.

This is a camera.

Direct all your complaints to my commanding officer in Sarajevo.

Please tell us where the minefield is.

Read this report for your own safety.

If we do not succeed, we risk failure.

Please do not touch the equipment.

It is very sensitive.

We do not accept dinars as currency.

I do not know your cousin in the United States.

How much are these cigarettes?

I will give you this pack of cigarettes if you stop bothering us.

Turn around and lie on the ground.

Appendix C

Power Consumption Over Time

C.1. Typical Power Consumption Graph

The following graph illustrates the power consumption of one module configuration over

time. (This example used the 586-based card running at 133 MHz with 32 Megs of memory and

spinning disk drive.) Here, many of the power-related events discussed in the paper can be seen

graphically, along with a discussion of their relevance.

• Ticks 1-12: The module is in sleep mode, consuming approximately 680 mW.

• Ticks 13-35: The TCP/IP connection is established. While this appears to contribute

substantially to the power consumed in the system, most of the power consumption in this area

can be attributed instead to bringing the system out of suspend mode and spinning up the HD

from rest.

• Ticks 36-78: As the experiment starts with the ten sentences listed in Appendix B, we see that

the hard drive is in Full Spin Mode here, consuming 3.16 W when idle.

• Ticks 79-149: The Hard Drive spins down in the middle of sentence #2. The module now

consumes 2.4 W when idle.

• Ticks 150-278: As the experiment continues, the hard drive spins up at tick 150 and down

again at tick 184.

• Ticks 278-279: This spike is attributed to closing the TCP link.

• Ticks 280-305: Closing the TCP link caused the drive to spin up again.
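A trace sampled at 400 ms ticks can be converted to energy by summing power times tick length. The sketch below assumes a synthetic trace built only from the two idle levels mentioned in the list (about 0.68 W asleep and 3.16 W at full spin); the function names and the trace itself are illustrative, not measured data.

```python
TICK_SECONDS = 0.4  # one sample every 400 ms, per the graph's time axis


def energy_joules(power_samples_w, tick_seconds=TICK_SECONDS):
    """Approximate energy as the sum of (power * tick length) over all samples."""
    return sum(p * tick_seconds for p in power_samples_w)


def average_power_w(power_samples_w):
    """Mean of the sampled power values."""
    return sum(power_samples_w) / len(power_samples_w)


# Synthetic trace: 12 ticks asleep at 0.68 W, then 43 ticks idle at full spin
# (3.16 W). A real trace would also contain the activity spikes.
trace = [0.68] * 12 + [3.16] * 43
print(f"energy: {energy_joules(trace):.2f} J over {len(trace) * TICK_SECONDS:.1f} s")
print(f"average power: {average_power_w(trace):.3f} W")
```

The same two functions could be applied to any slice of a recorded trace, for example to compare the cost of a spin-up against the savings from a spun-down interval.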

[Figure: "Cardio D with Spinning Disk". Horizontal axis: Time (400 ms Ticks). Vertical axis: power.]

Figure C-1. Power Consumption of Synthesizer Module over time

Bibliography

[1] C. K. Christakos. Optimizing a Language Translation Application for Mobile Use. Master's

Project, Carnegie Mellon University, 1997.

[2] J. Dorsey. "Smart Module Networking", personal communication, 1998.

[3] Intel Corporation and Microsoft Corporation. APM BIOS Interface Specification,

Revision 1.2, 1996.

[4] R. E. Frederking and R. Brown. The Pangloss-lite machine translation system. Expanding MT

Horizons: Proceedings of the Second Conference of the Association for Machine

Translation in the Americas, pages 268-272, 1996.

[5] M. Ravishankar. Efficient Algorithms for Speech Recognition. Ph.D. Thesis, Carnegie Mellon

University, May 1996. Tech Report CMU-CS-96-143.

[6] W. Rosch. The Winn L. Rosch Hardware Bible, Third Edition. Sams Publishing, Indiana,

1994.

[7] Epson Corporation. Epson CARDIO 486-D4 Data Sheet, 1997.
